Detection and Estimation of Signals in Noise
Detection and Estimation of Signals in Noise
in Noise
Dr. Robert Schober
Department of Electrical and Computer Engineering
University of British Columbia
Contents
1 Basic Elements of a Digital Communication
System 1
1.1 Transmitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Communication Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Physical Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.2 Mathematical Models for Communication Channels . . . . . . . . . . . . 5
Course Outline
1. Basic Elements of a Digital Communication System
5. Bandlimited Channels
6. Equalization
Channel
1.1 Transmitter
a) Information Source
– analog signal: e.g. audio or video signal
– digital signal: e.g. data, text
b) Source Encoder
Objective: Represent the source signal as efficiently as
possible, i.e., with as few bits as possible
⇒ minimize the redundancy in the source
encoder output bits
c) Channel Encoder
Objective: Increase reliability of received data
⇒ add redundancy in a controlled manner to
information bits
d) Digital Modulator
Objective: Transmit most efficiently over the
(physical) transmission channel
⇒ map the input bit sequence to a signal waveform
which is suitable for the transmission channel
Examples: Binary modulation:
bit 0 → s0(t)
bit 1 → s1(t)
⇒ 1 bit per channel use
M –ary modulation:
we map b bits to one waveform
⇒ we need M = 2b different waveforms to represent
all possible b–bit combinations
⇒ b bit/(channel use)
1.2 Receiver
a) Digital Demodulator
Objective: Reconstruct transmitted data symbols (binary or
M –ary from channel–corrupted received signal
b) Channel Decoder
Objective: Exploit redundancy introduced by channel encoder
to increase reliability of information bits
Note: In modern receivers demodulation and decoding is
sometimes performed in an iterative fashion.
c) Source Decoder
Objective: Reconstruct original information signal from
output of channel decoder
a) Types
– wireline
– optical fiber
– optical wireless channel
– wireless radio frequency (RF) channel
– underwater acoustic channel
– storage channel (CD, disc, etc.)
b) Impairments
– noise from electronic components in transmitter and receiver
– amplifier nonlinearities
– other users transmitting in same frequency band at the same
time (co–channel or multiuser interference)
– linear distortions because of bandlimited channel
– time–variance in wireless channels
For the design of the transmitter and the receiver we need a simple
mathematical model of the physical communication channel that
captures its most important properties. This model will vary from
one application to another.
s(t) α r(t)
n(t)
ejϕ
s(t) α r(t)
n(t)
n(t)
c(t): channel impulse response; ∗: linear convolution
r(t) = c(t) ∗ s(t) + n(t)
Z∞
= c(τ ) s(t − τ )dτ + n(t)
−∞
d) Multiuser Channel
Two users:
s1(t)
r(t)
s2(t)
n(t)
K–user channel:
X
K
r(t) = sk (t) + n(t)
k=1
e) Other Channels
– time–variant channels
– stochastic (random) channels
– fading channels
– multiple–input multiple–output (MIMO) channels
– ...
Motivation:
2.1 Probability
Definitions:
Axioms of Probability
P (Ā) = 1 − P (A)
A ∩ B 6= /◦ then P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
Example:
Fair die
– S = {1, 2, 3, 4, 5, 6}
– A = {1, 2, 5}, B = {3, 4, 5}
– Ā = {3, 4, 6}
– D = A ∪ B = {1, 2, 3, 4, 5}
– E = A ∩ B = {5}
1
– P (1) = P (2) = . . . = P (6) = 6
Conditional Probability
Definition:
P (A, B)
P (A|B) =
P (B)
Bayes’ Theorem:
From
P (A, B) = P (A|B) P (B) = P (B|A) P (A)
we get
P (B|A) P (A)
P (A|B) =
P (B)
Statistical Independence
If observing B does not change the probability of observing A, i.e.,
P (A|B) = P (A),
then A and B are statistically independent.
In this case:
P (A, B) = P (A|B) P (B)
= P (A) P (B)
Thus, two events A and B are statistically independent if and only if
Example:
lim F (x) = 1
x→∞
d
F (x) ≥ 0
dx
Example:
1. Fair die X = X(s) = s
F (x)
1/6
x
1 2 3 4 5 6
Note: X is a discrete random variable.
F (x)
1
Definition:
dF (x)
p(x) = , −∞ < x < ∞
dx
Properties:
p(x) ≥ 0
Zx
F (x) = p(u)du
−∞
Z∞
p(x)dx = 1
−∞
Probability that x1 ≤ X ≤ x2
Zx2
P (x1 ≤ X ≤ x2) = p(x)dx = F (x2) − F (x1)
x1
Example:
Fair die
p(x)
1/6
x
1 2 3 4 5 6
Joint CDF:
FXY (x, y) = P (X ≤ x, Y ≤ y)
Zx Zy
= pXY (u, v) du dv
−∞ −∞
Joint PDF:
∂2
pXY (x, y) = FXY (x, y)
∂x∂y
Marginal densities
Z∞
pX (x) = pXY (x, y) dy
−∞
Z∞
pY (y) = pXY (x, y) dx
−∞
Conditional PDF
pXY (x, y)
pX|Y (x|y) =
pY (y)
Conditional CDF
Zx
FX|Y (x|y) = pX|Y (u|y) du
−∞
Rx
pXY (u, y) du
−∞
=
pY (y)
Statistical Independence
X and Y are statistical independent if and only if
pXY (x, y) = pX (x) pY (y)
FXY (x, y) = FX (x) FY (y)
CDF
FZ (z) = P (X ≤ x, Y ≤ y) = FXY (x, y)
PDF
pZ (z) = pXY (x, y)
Special Cases:
= pX (x) dx
−∞
y−b
= FX
a
– PDF
∂ ∂ ∂x
pY (y) = FY (y) = FX (x)
∂y ∂y ∂x
x=(y−b)/a
∂x ∂
= FX (x)
∂y ∂x
x=(y−b)/a x=(y−b)/a
1 y−b
= pX
a a
d
with g (xi) = dx g(x)
′
x=xi
Example:
Y = aX 2 + b, a > 0
Roots: r
y−b
ax2 + b = y ⇒ x1/2 = ±
a
g ′ (xi):
d
g(x) = 2ax
dx
PDF: q q
y−b
pX a
pX − y−ba
pY (y) = q + q
y−b
2a a 2a y−b
a
– PDF:
with
∗ gi−1 = gi−1 (y1, y2, . . . , yn )
∗ Jacobian of transformation
−1
∂ ( g1 ) ∂ (gn−1 )
···
∂y. 1 ∂y1
...
J = .−1
.
∂ ( g1 ) ∂ (gn−1 )
∂yn
··· ∂yn
∗ |J |: Determinant of matrix J
– Solution:
From x1 = y − x2 we obtain the joint PDF of Y and X2
⇒ pY X2 (y, x2) = pX1X2 (x1, x2)
x1 =y−x2
Z∞
= pX1X2 (x1, y − x1) dx1
−∞
Z∞
pY (y) = pX1 (x1)pX2 (y − x1) dx1 = pX1 (x1) ∗ pX2 (x2)
−∞
Expected value of Y :
Z∞ Z∞
E{Y } = E{g(X)} = ··· g(x) pX (x) dx1 . . . dxn
−∞ −∞
Z∞ Z∞ Z∞
= x2 pX (x) dx + m2X pX (x) dx − 2 mX x pX (x) dx
−∞ −∞ −∞
Z∞
= x2 pX (x) dx − m2X
−∞
= E{X 2} − (E{X})2
2
Complex case: σX = E{|X|2} − |E{X}|2
∗ Solution:
ψY (jv) = E{ejvY }
= E{ejv(X1+X2)}
= E{ejvX1 } E{ejvX2 }
= ψX1 (jv) ψX2 (jv)
Covariance matrix
M = E{(X − mX ) (X − mX )H }
= R − mX mH
X
CDF
Zx Zx
1 2 /(2σ 2 )
F (x) = p(u) du = √ e−(u−mX ) du
2πσ
−∞ −∞
x−mX
√
Z2σ
1 2 −t2 1 1 x − mX
= √ e dt = + erf √
2 π 2 2 2σ
−∞
= 1 − erf(x)
Gaussian Q–function
The integral over the tail [x, ∞) of a normal distribution (= Gaus-
sian distribution with mX = 0, σ 2 = 1) is referred to as the Gaus-
sian Q–function:
Z∞
1 2
Q(x) = √ e−t /2 dt
2π
x
Characteristic function
Z∞
ψ(jv) = ejvx p(x) dx
−∞
Z∞
1 2 2
= ejvx √ e(x−mX ) /(2σ ) dx
2πσ
−∞
2 σ 2 /2
= ejvmX −v
Moments
Central moments:
k 1 · 3 · · · (k − 1)σ k even k
E{(X − mX ) } = µk =
0 odd k
Non–central moments
k
X k
E{X k } = miX µk−i
i=0
i
Note: All higher order moments of a Gaussian RV can be expressed
in terms of its first and second order moments.
Y
n
2 σ 2 /2
= ejvmi −v i
i=1
2 σ 2 /2
= ejvmY −v Y
with
X
n
mY = mi
i=1
X
n
σY2 = σi2
i=1
Given:
– Vector X = (X1, X2, . . . , Xn)T of n Gaussian RVs
PDF
1 1
p(x) = p exp − (x − mX )T M −1 (x − mX )
(2π)n/2 |M | 2
Special case: n = 2
m1 σ12 µ12
mX = , M=
m2 µ12 σ22
with the joint central moment
µ12 = E{(X1 − m1)(X2 − m2)}
get
1
p(x1, x2) = p
2πσ1σ2 1 − ρ2
σ22(x1 − m1)2 − 2ρσ1σ2(x1 − m1)(x2 − m2) + σ12(x2 − m2)2
· exp −
2σ12σ22(1 − ρ2)
Observe that for ρ = 0 (X1 and X2 uncorrelated) the joint PDF
can be factored into p(x1, x2) = pX1 (x1)·pX2 (x2). This means that
two uncorrelated Gaussian RVs are also statistically independent.
Note that this is not true for other distributions. On the other
hand, statistically independent RVs are always uncorrelated.
Linear transformation
– Given: Linear transformation Y = A X, where A denotes a
non–singular matrix
2
Most important case: X and Y are uncorrelated and σX = σY2 =
σ 2 (in this case, Z is also referred to as a proper Gaussian RV)
PDF
pZ (z) = pXY (x, y) = pX (x) pY (y)
1 2 2 1 2 2
= √ e−(x−mX ) /(2σ ) · √ e−(y−mY ) /(2σ )
2πσ 2πσ
1 −((x−mX )2+(y−mY )2 )/(2σ2 )
= e
2πσ 2
1 −|z−mZ |2/σ2
= 2 e Z
πσZ
with
mZ = E{Z} = mX + jmY
and
σZ2 = E{|Z − mZ |2} = σX
2
+ σY2 = 2σ 2
PDF
1 H −1
pZ (z) = exp −(z − mZ ) M Z (z − mZ )
π n |M Z |
with
mZ = E{z} = mX + jmY
and
M Z = E{(z − mZ )(z − mZ )H } = M X + M Y
The “tail probability” (area under the tail of PDF) often has to
be evaluated to determine the error probability of digital commu-
nication systems
Chernoff Bound
The tail probability is given by
Z∞
P (X ≥ δ) = p(x) dx
δ
Z∞
= g(x) p(x) dx
−∞
= E{g(X)}
eα(X−δ)
g(X)
1
δ X
Therefore, we get the bound
P (X ≥ δ) = E{g(X)}
≤ E{eα(X−δ)}
= e−α δ E{eα X }
P (X ≥ δ) ≤ e−αopt δ E{eαopt X }
2
For n → ∞ Y is a Gaussian RV with zero mean and variance σX .
For a stationary process φ(t1, t2) does not depend on the specific
time instances t1, t2, but on the difference τ = t1 − t2:
E{Xt1 Xt2 } = φ(t1, t2) = φ(t1 − t2) = φ(τ )
Note that φ(τ ) = φ(−τ ) (φ(·) is an even function) since E{Xt1 Xt2 } =
E{Xt2 Xt1 } is valid.
Example:
ACF of an uncorrelated stationary process: φ(τ ) = δ(τ )
φ(τ )
Wide–sense stationarity:
If the first and second order moments of a stochastic process are
invariant to any time shift τ , the process is referred to as wide–
sense stationary process. Wide–sense stationary processes are not
necessarily stationary in the strict sense.
Gaussian process:
Since Gaussian RVs are fully specified by their first and second
order moments, in this special case wide–sense stationarity auto-
matically implies stationarity in the strict sense.
Ergodicity:
We refer to a process X(t) as ergodic if its statistical averages
can also be calculated as time–averages of sample functions. Only
(wide–sense) stationary processes can be ergodic.
For example, if X(t) is ergodic and one of its sample functions
(i.e., one of its realizations) is denoted as x(t), the mean and the
and
ZT
1
φ(τ ) = lim x(t)x(t + τ ) dt,
T →∞ 2T
−T
respectively. In practice, it is usually assumed that a process is
(wide–sense) stationary and ergodic. Ergodicity is important since
in practice only sample functions of a stochastic process can be
observed!
Let X(t) and Y (t) denote two stochastic processes and consider
the RVs Xti = X(ti), i = 1, 2, . . . , n, and Yt′j = Y (t′j ), j =
1, 2, . . . , m at times t1 > t2 > . . . > tn and t′1 > t′2 > . . . > t′m,
respectively. The two stochastic processes are fully characterized
by their joint PDF
p(xt1 , xt2 , . . . , xtn , yt′1 , yt′2 , . . . , yt′m )
Statistical independence
Two processes X(t) and Y (t) are statistical independent if and
only if
p(xt1 , xt2 , . . . , xtn , yt′1 , yt′2 , . . . , yt′m ) =
p(xt1 , xt2 , . . . , xtn ) p(yt′1 , yt′2 , . . . , yt′m )
Uncorrelated processes
Two processes X(t) and Y (t) are uncorrelated if and only if
µXY (t1, t2) = 0
holds.
as
φZZ (t1, t2) = E{Zt1 Zt∗2 }
= E{(Xt1 + jYt1 )(Xt2 − jYt2 )}
= φXX (t1, t2) + φY Y (t1, t2) + j(φY X (t1, t2) − φXY (t1, t2))
where φXX (t1, t2), φY Y (t1, t2) and φY X (t1, t2), φXY (t1, t2) denote
the ACFs and the CCFs of X(t) and Y (t), respectively.
Note that our definition of φZZ (t1, t2) differs from the Textbook,
where φZZ (t1, t2) = 12 E{Zt1 Zt∗2 } is used!
If Z(t) is a stationary process we get
φZZ (t1, t2) = φZZ (t2 + τ, t2) = φZZ (τ )
with τ = t1 − t2.
We can also establish the symmetry relation
φZZ (τ ) = φ∗ZZ (−τ )
Z∞
Φ(f ) = F{φ(τ )} = φ(τ ) e−j2πf τ dτ
−∞
Z∞
φ(τ ) = F −1 {Φ(f )} = Φ(f ) ej2πf τ df
−∞
Example:
Power spectrum of an uncorrelated stationary process:
Φ(f ) = F{δ(τ )} = 1
Φ(f )
1
= E{|Xt|2} ≥ 0
Z∞
Φ∗(f ) = φ∗ (τ ) ej2πf τ dτ
−∞
Z∞
= φ∗ (−τ ) e−j2πf τ dτ
−∞
Z∞
= φ(τ ) e−j2πf τ dτ
−∞
= Φ(f )
Cross–correlation spectrum
Consider the random processes X(t) and Y (t) with CCF φXY (τ ).
The cross–correlation spectrum ΦXY (f ) is defined as
Z∞
ΦXY (f ) = φXY (τ ) e−j2πf τ dτ
−∞
Let the signal x(t) be the input to the system h(t). Then the
output y(t) of the system can be expressed as
Z∞
y(t) = h(τ ) x(t − τ ) dτ
−∞
Mean of Y (t)
Z∞
mY = E{Y (t)} = h(τ ) E{X(t − τ )} dτ
−∞
Z∞
= mX h(τ ) dτ = mX H(0)
−∞
ACF of Y (t)
φY Y (t1, t2) = E{Yt1 Yt∗2 }
Z∞ Z∞
= h(α) h∗(β) E{X(t1 − α)X ∗(t2 − β)} dα dβ
−∞ −∞
Z∞ Z∞
= h(α) h∗(β) φXX (t1 − t2 + β − α) dα dβ
−∞ −∞
Z∞ Z∞
= h(α) h∗(β) φXX (τ + β − α) dα dβ
−∞ −∞
= φY Y (τ )
φY Y (τ ) = φhh (τ ) ∗ φXX (τ )
is valid.
As an example, we may choose H(f ) = 1 for f1 ≤ f ≤ f2 and
H(f ) = 0 outside this interval, and obtain
Zf2
ΦXX (f ) df ≥ 0
f1
Z∞
= h(α) φXX (t1 − t2 − α) dα
−∞
= h(τ ) ∗ φXX (τ )
= φY X (τ )
with τ = t1 − t2.
Cross–spectrum
ΦY X (f ) = F{φY X (τ )} = H(f ) ΦXX (f )
ACF
Z∞ Z∞
φ[n, k] = E{XnXk∗} = xn x∗k p(xn, xk ) dxndxk
−∞ −∞
Covariance function
µ[n, k] = φ[n, k] − E{Xn}E{Xk∗}
Power spectrum
The power spectrum of X[n] is the (discrete–time) Fourier trans-
form of the ACF φ[λ]
X
∞
Φ(f ) = F{φ[λ]} = φ[λ]e−j2πf λ
λ=−∞
Z1/2
φ[λ] = F −1 {Φ(f )} = Φ(f ) ej2πf λ df
−1/2
– Frequency response
X
∞
H(f ) = F{h[n]} = h[n] e−j2πf n
n=−∞
– Mean of Y [n]
X
∞
mY = E{Y [n]} = h[k] E{X[n − k]}
k=−∞
X
∞
= mX h[k]
k=−∞
= mX H(0)
– ACF of Y [n]
Using the deterministic ”system” ACF
X
∞
φhh[λ] = h[λ] ∗ h∗[−λ] = h∗[k]h[k + λ]
k=−∞
Mean of X(t)
mX (t) = E{X(t)}
X
∞
= E{a[n]} g(t − nT )
n=−∞
X
∞
= ma g(t − nT )
n=−∞
ACF of X(t)
φXX (t + τ, t) = E{X(t + τ )X ∗(t)}
X
∞ X
∞
= E{a∗[n]a[m]} g ∗(t − nT )g(t + τ − mT )
n=−∞ m=−∞
X
∞ X
∞
= φaa [m − n] g ∗(t − nT )g(t + τ − mT )
n=−∞ m=−∞
Time–average ACF
The ACF φXX (t + τ, t) depends on two parameters t and τ . Often
we are only interested in the time–average ACF defined as
ZT /2
1
φ̄XX (τ ) = φXX (t + τ, t) dt
T
−T /2
S(f )
B
−fc fc f
0, f <0
u(f ) = 1/2, f = 0 .
1, f >0
u(f )
1
1/2
j, f < 0
H(f ) = F{h(t)} = 0, f = 0 .
−j, f > 0
H(f )
j
f
−j
Example:
S(f )
−fc fc f
S+(f )
2
fc f
Sb(f )
√
2
Quadrature Modulation
Problem Statement: Assume we have two real–valued low-
pass signals x(t) and y(t) whose spectra X(f ) = F{x(t)} and
Y (f ) = F{y(t)} are zero for |f | > f0. We wish to transmit these
lowpass signals over a passband channel in the frequency range
fc − f0 ≤ |f | ≤ fc + f0, where fc > f0. How should we generate
the corresponding passband signal?
Solution: As shown before, the corresponding passband signal
can be generated as
√ √
s(t) = 2 x(t)cos(2πfct) − 2 y(t)sin(2πfct).
The lowpass signals x(t) and y(t) are modulated using the (or-
thogonal) carriers cos(2πfct) and sin(2πfct), respectively. x(t) and
y(t) are also referred to as the inphase and quadrature compo-
nent, respectively, and the modulation scheme is called quadrature
modulation. Often x(t) and y(t) are also jointly addressed as the
quadrature components of sb(t).
√
2 cos(2πfct)
x(t)
s(t)
y(t)
√
− 2 sin(2πfct)
Demodulation
At the receiver side, the quadrature components x(t) and y(t) have
to be extracted from the passband signal s(t).
Using the relation
1
sb (t) = x(t) + jy(t) = √ [s(t) + j ŝ(t)] e−j2πfct ,
2
we easily get
1
x(t) = √ [s(t)cos(2πfct) + ŝ(t)sin(2πfct)]
2
1
y(t) = √ [ŝ(t)cos(2πfct) − s(t)sin(2πfct)]
2
+ x(t)
+
−
Hilbert y(t)
Transform +
HLP(f ) x(t)
s(t)
HLP(f ) y(t)
√
− 2 sin(2πfct)
HLP(f )
1
−fLP fLP f
Energy of sb (t)
For calculation of the energy of sb(t), we need to express the spec-
trum S(f ) of s(t) as a function of Sb (f ):
S(f ) = F{s(t)}
Z∞ n√ o
j2πfc t
= Re 2sb(t)e e−j2πf t dt
−∞
√ Z∞
2
= sb(t)ej2πfct + s∗b (t)e−j2πfct e−j2πf t dt
2
−∞
1
= √ [Sb (f − fc ) + Sb∗(−f − fc)]
2
Z∞ Z∞
E= |Sb (f )|2 df = |sb (t)|2 dt
−∞ −∞
H(f ), f ≥ 0
Hb (f − fc) =
0, f <0
equivalent?
Obviously, we have
R(f ) = F{r(t)} = H(f )S(f )
1
= √ [Sb (f − fc) + S ∗ (−f − fc)] [Hb (f − fc ) + Hb∗(−f − fc )]
2
Since both s(t) and h(t) have narrowband character
Sb (f − fc )Hb∗(−f − fc) = 0
Sb∗ (−f − fc )Hb(f − fc) = 0
is valid, and we get
1
R(f ) = √ [Sb (f − fc)Hb (f − fc ) + S ∗(−f − fc)Hb∗(−f − fc)]
2
1
= √ [Rb(f − fc) + Rb∗(−f − fc)]
2
This result shows that we can get the output r(t) of the linear system
h(t) by transforming the output rb (t) of the equivalent baseband system
hb (t) into passband. This is an important result since it shows that,
without loss of generality, we can perform linear filtering operations
always in the equivalent baseband domain. This is very convenient for
example if we want to simulate a communication system that operates
in passband using a computer.
ΦN N (f )
B
−fc fc f
Stationarity of n(t)
Since n(t) is assumed to be wide–sense stationary, the correlation func-
tions φXX (τ ) = E{x(t + τ )x∗(t)}, φY Y (τ ) = E{y(t + τ )y ∗ (t)}, and
φXY (τ ) = E{x(t + τ )y ∗(t)} have to fulfill certain conditions as will be
shown in the following.
The ACF of n(t) is given by
φNN (τ, t + τ ) = E{n(t)n(t + τ )}
= E {2 [x(t)cos(2πfct) − y(t)sin(2πfct)]
[x(t + τ )cos(2πfc(t + τ )) − y(t + τ )sin(2πfc(t + τ ))]}
= [φXX (τ ) + φY Y (τ )] cos(2πfcτ )
+ [φXX (τ ) − φY Y (τ )] cos(2πfc(2t + τ ))
− [φY X (τ ) − φXY (τ )] sin(2πfcτ )
− [φY X (τ ) + φXY (τ )] sin(2πfc(2t + τ )),
ACF of z(t)
Using the above conditions, the ACF φZZ (τ ) of z(t) is easily calculated
as
φZZ (τ ) = E{z(t + τ )z ∗(t)}
= φXX (τ ) + φY Y (τ ) + j(φY X (τ ) − φXY (τ ))
= 2φXX (τ ) + j2φY X (τ )
White Noise
In the region of interest (i.e., where the transmit signal has non–zero
frequency components), ΦNN (f ) can often be approximated as flat,
i.e.,
N0/2, fc − B/2 ≤ |f | ≤ fc + B/2
ΦNN (f ) =
0, otherwise
ΦN N (f )
B
N0/2
−fc fc f
ΦZZ (f )
N0
− B2 B f
2
This means that x(t) and y(t) are mutually uncorrelated, white pro-
cesses with equal variances.
Note that although white noise is convenient for analysis, it is not phys-
ically realizable since it would have an infinite variance.
n(t)
equivalent
z(t)
Linear Independence
Vector x is linearly independent of vectors v j , 1 ≤ j ≤ n, if there
is no set of coefficients aj , 1 ≤ j ≤ n, for which
Xn
x= aj v j
j=1
is valid.
Triangle Inequality
The triangle inequality states that
||v 1 + v 2|| ≤ ||v 1|| + ||v 2||,
where equality holds if and only if v 1 = a v 2, where a is positive
real valued, i.e., a ≥ 0.
Cauchy–Schwarz Inequality
The Cauchy–Schwarz inequality states that
|v 1 • v 2| ≤ ||v 1|| ||v 2||
is true. Equality holds if v 1 = b v 2, where b is an arbitrary
(complex–valued) scalar.
Gram–Schmidt Procedure
– Enables construction of an orthonormal basis for a given set
of vectors.
– Given: Set of n–dimensional vectors v j , 1 ≤ j ≤ m, which
span an n1–dimensional vector space with n1 ≤ max{n, m}.
– Objective: Find orthonormal basis vectors uj , 1 ≤ j ≤ n1.
– Procedure:
1. First Step: Normalize first vector of set (the order is arbi-
trary).
v1
u1 =
||v 1||
2. Second Step: Identify that part of v 2 which is orthogonal
to u1.
u′2 = v 2 − (v 2 • u1) u1,
where (v 2 • u1) u1 is the projection of v 2 onto u1. There-
fore, it is easy to show that u′2 • u1 = 0 is valid. Thus, the
second basis vector u2 is obtained by normalization of u′2
u′2
u2 =
||u′2||
3. Third Step:
u′3 = v 3 − (v 3 • u1) u1 − (v 3 • u2) u2
u′3
u3 =
||u′3||
4. Repeat until v m has been processed.
– Remarks:
1. The number n1 of basis vectors is smaller or equal max{n, m},
i.e., n1 ≤ max{n, m}.
2. If a certain vector v j is linearly dependent on the previously
found basis vectors ui , 1 ≤ i ≤ j − 1, u′j = 0 results and
we proceed with the next element v j+1 of the set of vectors.
3. The found set of basis vectors is not unique. Different sets
of basis vectors can be found e.g. by changing the processing
order of the vectors v j , 1 ≤ j ≤ m.
Inner Product
The inner product of two (complex–valued) signals x1(t) and x2(t)
is defined as
Zb
< x1(t), x2(t) > = x1(t)x∗2(t) dt, b ≥ a,
a
Energy
The energy of a signal x(t) is defined as
Z∞
E = ||x(t)||2 = |x(t)|2 dt.
−∞
Linear Independence
m signals are linearly independent if and only if no signal of the
set can be represented as a linear combination of the other m − 1
signals.
Triangle Inequality
Similar to the vector space case the triangle inequality states
||x1(t) + x2(t)|| ≤ ||x1(t)|| + ||x2(t)||,
where equality holds if and only if x1(t) = ax2(t), a ≥ 0.
Cauchy–Schwarz Inequality
| < x1(t), x2(t) > | ≤ ||x1(t)|| ||x2(t)||,
where equality holds if and only if x1(t) = bx2(t), where b is
arbitrary complex.
Given:
– (Complex–valued) signal s(t) with finite energy
Z∞
Es = |s(t)|2 dt
−∞
Objective:
Find “best” approximation for s(t) in terms of fn (t), 1 ≤ n ≤ K.
The approximation ŝ(t) is given by
X
K
ŝ(t) = snfn (t),
n=1
Optimize Coefficients sn
In order to find the optimum coefficients sn, 1 ≤ n ≤ K, which
minimize Ee , we have to differentiate Ee with respect to s∗n , 1 ≤
n ≤ K,
Z∞ " X
K
#
∂Ee
∗
= s(t) − sk fk (t) fn∗ (t) dt = 0, n = 1, . . . , K,
∂sn
−∞ k=1
Emin = 0
If Emin = 0 is valid, we can represent s(t) as
X
K
s(t) = sk fk (t),
k=1
Gram–Schmidt Procedure
– Given: M signal waveforms {si (t), i = 1, . . . , M }.
– Objective: Construct a set of orthogonal waveforms {fi(t)}
from the original set {si(t)}.
– Procedure
1. First Step:
s1(t)
f1 (t) = √ ,
E1
where the energy E1 of signal s1 (t) is given by
Z∞
E1 = |s1(t)|2 dt
−∞
2. Second Step:
Calculate the inner product of s2(t) and f1(t)
Z∞
c12 = s2(t)f1∗(t) dt
−∞
with
Z∞
Ej = |fj′ (t)|2 dt, j = 2, 3, . . .
−∞
3. kth Step
X
k−1
fk′ (t) = sk (t) − cik fi (t)
i=1
with
Z∞
cik = sk (t)fi∗(t) dt
−∞
Example:
s1 (t) s2(t)
1 1
2 3
2 t t
−1
1. First Step:
Signal s1(t) has energy
Z∞
E1 = |s1(t)|2 dt = 2.
−∞
f1(t)
√1
2
2 t
2. Second Step:
The inner product between the first basis function f1(t) and
Therefore, we obtain
f2′ (t) = s2(t) − c12f1(t)
= s2(t) − s1(t)
Since the energy of f2′ (t) is E2 = 1, the second basis function
is
f2(t) = s2(t) − s1(t)
f2(t)
2 3
t
−1
with coefficients
Z∞
skn = sk (t)fn∗(t) dt, n = 1, 2, . . . , N.
−∞
Example:
Continuation of
√previous example:
Since s1(t) = 2f1(t), the signal space representation of s1(t) is
simply
√
s1 = [ 2 0]T .
For s21 and s22 we get, respectively,
Z∞ √
∗
s21 = s2(t)f1 (t) dt = 2
−∞
and
Z∞
s22 = s2(t)f2∗(t) dt = 1.
−∞
Consequently, the signal space representation of s2(t) is
√
s2 = [ 2 1]T .
A graphical representation of the signal space for this example is
given below.
f2(t)
1 s2
1 s1 f1(t)
Using the signal space representation, e.g. the inner product be-
Inner Product
We express the inner product between sm (t) and sk (t) in terms of
the inner product between sbm (t) and sbk (t).
Z∞ Z∞
sm (t)s∗k (t) dt = 2 Re sbm (t)ej2πfct Re sbk (t)ej2πfct dt
−∞ −∞
Z∞
2 j2πfc t
= sbm (t)e + s∗bm (t)e−j2πfct
4
−∞
j2πfc t
sbk (t)e + s∗bk (t)e−j2πfct dt
Using a generalized form of Parsevals Theorem
Z∞ Z∞
x(t)y ∗(t) dt = X(f )Y ∗(f ) df
−∞ −∞
and the fact that both sm (t) and sk (t) are narrowband signals, the
above integral can be simplified to
Z∞ Z∞
1
sm(t)s∗k (t) dt = ∗
[Sbm (f − fc )Sbk (f − fc) +
2
−∞ −∞
∗
Sbm (−f − fc )Sbk (−f − fc)] df
Z∞
1 ∗ ∗
= [Sbm (f )Sbk (f ) + Sbm (f )Sbk (f )] df
2
−∞
Z∞
1
= [sbm (t)s∗bk (t) + s∗bm (t)sbk (t)] dt
2
−∞
Z∞
This result shows that the inner product of the passband signals is
identical to the real part of the inner product of the corresponding
baseband signals.
< sm (t), sk (t) > = Re{< sbm (t), sbk (t) >}
If we have a signal space representation of passband and baseband
signals respectively, we can rewrite the above equation as
sm • sk = Re{sbm • sbk },
where sm and sk denote the signals space representation of sm (t)
and sk (t), respectively, whereas sbm and sbk denote those of sbm (t)
and sbk (t), respectively.
Correlation ρkm
In general, the (cross–)correlation between two signals allows to
quantify the “similarity” of the signals. The complex correlation
ρbkm of two baseband signals sbk (t) and sbm (t) is defined as
< sbk (t), sbm (t) >
ρbkm = √
Ek Em
Z∞
1
= √ sbk (t)s∗bm (t) dt
Ek Em
−∞
sbk • sbm
= p
||sbk || · ||sbm ||
Similarly, the correlation of the passband signals is defined as
< sk (t), sm (t) >
ρkm = √
Ek Em
Z∞
1
= √ sk (t)sm(t) dt
Ek Em
−∞
sk • sm
= p
||sk || · ||sm ||
= Re{ρbkm}
This important result shows once again, that the baseband rep-
resentation is really equivalent to the passband signal. We will
see later that the Euclidean distance between signals used for digi-
tal communication determines the achievable error probability, i.e.,
Modulation:
serial
{an} select sm(t)
waveform
parallel
k bits
Transmitted Waveform
The transmitted waveform s(t) for continuous transmission is given
by
X
∞
s(t) = sm (t − kT ).
k=−∞
Example:
sb(t)
d
T
−T T 2T t
− Td
Signal Energy
ZT
Em = |sbm (t)|2 dt
0
ZT
= A2m |g(t)|2 dt = A2m Eg
0
Example:
1. M = 2
m=1 2
p p
−d Eg d Eg
m=1 2 3 4
00 01 11 10
= 2 Eg d.
Special Case: M = 2
In this case, the signal set contains only two functions:
√
s1(t) = − 2dg(t)cos(2πfct)
√
s2(t) = 2dg(t)cos(2πfct).
Obviously,
s1(t) = −s2(t)
is valid. The correlation in that case is
ρ12 = −1.
Because of the above properties, s1(t) and s2(t) are referred to as
antipodal signals. 2PAM is an example for antipodal modulation.
Signal Energy
ZT
Em = |sbm (t)|2 dt
0
ZT
= |g(t)|2 dt = Eg
0
This means that all signals of the set have the same energy.
Example:
1. M = 2
2 m=1
2. M = 4
m=1
3
4
3. M = 8
3 2
4
m=1
5
6 8
7
Euclidean Distance
demn = ||sbm − sbn ||
qp p
= | Eg e jΘ m − Eg ejΘn |2
q
= Eg (2 − 2Re{ej(Θm−Θn) })2
q
= 2Eg (1 − cos[2π(m − n)/M ])
The minimum Euclidean distance of M PSK is given by
q
e
dmin = 2Eg (1 − cos[2π/M ])
Signal Energy
ZT
Em = |sbm (t)|2 dt
0
ZT
= |Am|2 |g(t)|2 dt = |Am|2Eg .
0
Euclidean Distance
demn = ||sbm − sbn ||
qp
= | Eg (Am − An )|2
p
= Eg |Am − An|
p p
= Eg (Acm − Acn )2 + (Asm − Asn )2
How to Choose Amplitudes
So far, we have not specified the amplitudes Acm and Asm . In
principle, arbitrary sets of amplitudes Acm and Asm can be cho-
sen. However, in practice, usually the spacing of the amplitudes is
equidistant, i.e.,
√
Acm , Asm ∈ {±d, ±3d, . . . , ±( M − 1)d}.
In that case, the minimum distance of M QAM is
e
p
dmin = 2 Eg d
Example:
M = 16:
Example:
Assume N1 = 2, N2 = 3:
f
fc + 2∆f
fc + ∆f
fc
T 2T t
Example:
M = 2:
s2
s1
Euclidean Distance
demn = ||sm − sn||
√
= 2E.
This
√ means the Euclidean distance between any two signal points is
e
√2E. Therefore, the minimum Euclidean distance is also dmin =
2E.
Biorthogonal Signals
A set of 2M biorthogonal signals is derived from a set of M or-
thogonal signals {sm(t)} by also including the negative signals
{−sm(t)}.
For biorthogonal signals√the Euclidean distance
√ between pairs of
signals is either demn = 2E or demn = 2 E. The correlation is
either ρmn = −1 or ρmn = 0.
Example:
M = 2:
s2
s3 = −s1 s1
s4 = −s2
Simplex Signals
– Consider a set of M orthogonal waveforms sm (t), 1 ≤ m ≤ M .
– The mean of the waveforms is
√
1 XM
E
s̄ = sm = 1M ,
M m=1 M
where 1M is the M –dimensional all–ones column vector.
– In practice, often zero–mean waveforms are preferred. There-
fore, it is desirable to modify the orthogonal signal set to a
signal set with zero mean.
– The mean can be removed by
s′m = sm − s̄.
The set {s′m } has zero mean and the waveforms s′m are called
simplex signals.
– Energy
||s′m||2 = ||sm − s̄||2 = ||sm ||2 − 2sm • s̄ + ||s̄||2
1 1 1
= E−2 E + E =E 1−
M M M
We observe that simplex waveforms require less energy than
orthogonal waveforms
√ to achieve the same minimum Euclidean
distance demin = 2E.
– Correlation
s′m • s′n −1/M 1
ρmn = ′ = = −
||sm|| · ||s′n|| 1 − 1/M M −1
Simplex waveforms are equally correlated.
DPSK
In DPSK, the bits are mapped to the difference of two consecutive
signal phases. If we denote the phase difference by ∆Θm , then in
M DPSK k = log2 M bits are mapped to
∆Θm = 2π(m − 1)/M, m ∈ {1, 2 . . . , M }.
If we drop again the symbol index m, and introduce the time index
k, the absolute phase Θ[k] of the transmitted symbol is obtained
as
Θ[k] = Θ[k − 1] + ∆Θ[k].
Since the information is carried in the phase difference, in the
receiver detection can be performed based on the phase difference
of the waveforms received in two consecutive symbol intervals, i.e.,
knowledge of the absolute phase of the received signal is not re-
quired.
− T1 1
T f
CPFSK Signal
– We first consider the PAM signal
X
∞
d(t) = I[k]g(t − kT ),
k=−∞
T t
X
k−1 ZkT
= 4πfdT I[n] g(τ − nT ) dτ
n=−∞
|
−∞
{z }
1
2
Z
(k+1)T
X
k−1
= 2πfdT I[n] +4πfdT I[k]q(t − kT )
| {z
n=−∞
}
Θ[k]
= Θ[k] + 2πhI[k]q(t − kT ),
where
h = 2fdT : modulation index
Θ[k]: accumulated phase memory
q(t)
1
2
T t
CPM Waveform
The CPM waveform in complex baseband representation is given
by
r
E
sb (t) = exp(j[φ(t, I) + φ0]),
T
whereas the corresponding passband signal is
r
2E
s(t) = cos[2πfct + φ(t, I) + φ0],
T
where the information–carrying phase in the interval kT ≤ t ≤
(k + 1)T is given by
X
k
φ(t, I) = 2πh I[n]q(t − nT ).
n=−∞
Example:
1
1 2
2T
T t T t
L = 2:
g(t) q(t)
1
1 2
4T
2T t 2T t
1 1
T 2
T t T t
L = 2:
g(t) q(t)
1/2
2T t 2T t
BT = 1.0
BT = 0.1
−5 −1 1 5 t/T
Phase Tree
The phase trajectories φ(t, I) of all possible input sequences
{I[k]} can be represented by a phase tree. For construction of
the tree it is assumed that the phase at a certain time t0 (usually
t0 = 0) is known. Usually, φ(0, I) = 0 is assumed.
Example:
Binary CPFSK
In the interval kT ≤ t ≤ (k + 1)T , we get
X
k−1
φ(t, I) = πh I[n] + 2πhI[k]q(t − kT ),
n=−∞
φ(t, I)
2hπ
hπ
T 2T t
−hπ
−2hπ
Phase Trellis
The phase trellis is obtained by plotting the phase trajectories
modulo 2π. If h is furthermore given by
q
h= ,
p
where q and p are relatively prime integers, it can be shown that
φ(kT, I) can assume only a finite number of different values.
These values are called the (terminal) states of the trellis.
Example:
Full response CPM with odd q
In that case, we get
X
k
φ(kT, I) = πh I[n]
n=−∞
q
= π {0, ±1, ±2, . . .}.
p
From the above equation we conclude that φ(kT, I) mod2π can
assume only the values
πq πq πq
ΘS = 0, , 2 , . . . , (2p − 1)
p p p
If we further specialize this result to CPFSK with h = 12 , which is
also referred to as minimum shift–keying (MSK), we get
π 3π
ΘS = 0, , π,
2 2
3π
2
−1 +1
π +1 −1
φ(t, I) −1 +1
−1
π
2 +1 −1
+1
0
T 2T t
Number of States
More generally, it can be shown that for general M –ary CPM the
number S of states is given by
pM L−1, even q
S=
2pM L−1, odd q
The number of states is an important indicator for the receiver
complexity required for optimum detection of the corresponding
CPM scheme.
Phase State Diagram
Yet another representation of the CPM phase results if we just
display the terminal phase states along with the possible transitions
between these states. This representation is referred to as phase
state diagram and is more compact than the trellis representation.
However, all information related to time is not contained in the
phase state diagram.
Example:
1
Binary CPFSK with h = 2
−1
0 π
2
+1
+1 −1 +1 −1
+1
3π
2 −1 π
g(t)
j
Although the equivalence between MSK and OQPSK only holds
for the special g(t) given above, in practice often other transmit
pulse shapes are employed for OQPSK. In that case, OQPSK does
not have a constant envelope but the variation of the envelope is
still smaller than in case of conventional QPSK.
Important Properties of CPM
– CPM signals have a constant envelope, i.e., |sb (t)| = const.
Therefore, efficient nonlinear amplifiers can be employed to
boost the CPM transmit signal.
– For low–to–moderate M (e.g. M ≤ 4) CPM can be made more
power and bandwidth efficient than simpler linear modulation
schemes such as PSK or QAM.
– Receiver design for CPM is more complex than for linear mod-
ulation schemes.
Given:
Passband signal
√ j2πfc t
s(t) = 2Re sb (t)e
with baseband signal
X
∞
sb (t) = I[k]g(t − kT )
k=−∞
where
– I[k]: Sequence of symbols, e.g. PAM, PSK, or QAM symbols
Note: I[k] is complex valued for PSK and QAM.
ACF of sb (t)
φSb Sb (t + τ, t) = E{s∗b (t)sb(t + τ )}
( ! !)
X∞ X
∞
=E I ∗[k]g ∗(t − kT ) I[k]g(t + τ − kT )
k=−∞ k=−∞
X
∞ X
∞
= E{I ∗[k]I[n]} g ∗(t − kT )g(t + τ − nT )
n=−∞ k=−∞
Since both mean µSb (t) and ACF φSb Sb (t + τ, t) are periodic, sb (t)
is a cyclostationary process with period T .
X
∞ X
∞ ZT /2
1
= φII [λ] g ∗(t − kT ) ·
T
λ=−∞ k=−∞
−T /2
g(t + τ − kT − λT ) dt
X
∞ X
∞ Z
T /2−kT
1
= φII [λ] g ∗ (t′) ·
T
λ=−∞ k=−∞
−T /2−kT
g(t + τ − λT ) dt′
′
X
∞ Z∞
1
= φII [λ] g ∗ (t′)g(t′ + τ − λT ) dt′
T
λ=−∞ −∞
and obtain
1 X
∞
φ̄Sb Sb (τ ) = φII [λ]φgg (τ − λT )
T
λ=−∞
we obtain for the average power spectral density of sb (t) the elegant
expression
1
ΦSb Sb (f ) = |G(f )|2ΦII (f )
T
Example:
1. Rectangular Pulse
g(t)
T t
The frequency response of g(t) is given by
sin(πf T ) −jπf T
G(f ) = AT e
πf T
Thus, the spectrum of sb (t) is
2
sin(πf T )
ΦSbSb (f ) = σI2A2 T + A2 |µI |2δ(f )
πf T
1.2
0.8
ΦSbSb (f )
0.6
0.4
0.2
0
−3 −2 −1 0 1 2 3
fT
1.2
0.8
g(t)
0.6
0.4
0.2
0
−0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4
0.25
0.2
ΦSbSb (f )
0.15
0.1
0.05
0
−3 −2 −1 0 1 2 3
fT
−1
10
−2
10
−3
10
−4
10
ΦSbSb (f )
−5
10
−6
10
−7
10
−8
10
−9
10
−10
10
−3 −2 −1 0 1 2 3
fT
Example:
In this chapter, we derive the optimum receiver structures for the modu-
lation schemes introduced in Chapter 3 and analyze their performance.
Problem Formulation
– We first consider memoryless linear modulation formats. In
symbol interval 0 ≤ t ≤ T , information is transmitted using
one of M possible waveforms sm (t), 1 ≤ m ≤ M .
– The received passband signal r(t) is corrupted by real–valued
AWGN n(t):
r(t) = sm (t) + n(t), 0 ≤ t ≤ T.
n(t), N0/2
sm(t) r(t)
z(t), N0
sbm(t) rb(t)
r(t) r Decision
Demodulator Detector
4.1.1 Demodulation
= smk + nk , 1≤k≤N
f1∗ (t)
RT
0 (·) dt r1
f2∗ (t)
RT
0 (·) dt r2
r(t)
...
...
fN∗ (t)
RT
0 (·) dt rN
Since n′ (t) does not lie in the signal space spanned by the basis
functions of sm (t), it is irrelevant for detection of sm (t). There-
fore, without loss of optimality, we can estimate the transmitted
waveform sm(t) from r instead of r(t).
Properties of nk
– nk is a Gaussian random variable (RV), since n(t) is Gaussian.
– Mean:
T
Z
E{nk } = E n(t)fk∗(t) dt
0
ZT
= E{n(t)}fk∗(t) dt
0
= 0
– Covariance:
T T ∗
Z Z
E{nk nm } = E n(t)fk (t) dt n(t)fm(t) dt
∗ ∗ ∗
0 0
ZT ZT
= E{n(t)n∗(τ )} fk∗ (t)fm(τ ) dt dτ
| {z }
0 0 N0
2 δ(t−τ )
ZT
N0
= fm (t)fk∗(t)dt
2
0
N0
=
δ[k − m]
2
where δ[k] denotes the Kronecker function
1, k=0
δ[k] =
0, k 6= 0
We conclude that the N noise components are zero–mean, mu-
tually uncorrelated Gaussian RVs.
Conditional pdf of r
r can be expressed as
r = sm + n
with
sm = [sm1 sm2 . . . smN ]T ,
n = [n1 n2 . . . nN ]T .
Therefore, conditioned on sm vector r is Gaussian distributed and
we obtain
p(r|sm) = pn(r − sm)
YN
= pn(rk − smk ),
k=1
where pn(n) and pn(nk ) denote the pdfs of the Gaussian noise
vector n and the components nk of n, respectively. pn(nk ) is
given by
1 n2k
pn(nk ) = √ exp −
πN0 N0
since nk is a real–valued Gaussian RV with variance σn2 = N20 .
Therefore, p(r|sm ) can be expressed as
YN
1 (rk − smk )2
p(r|sm ) = √ exp −
πN0 N0
k=1
XN
(rk − smk )2
1 k=1
= exp − , 1≤m≤M
(πN0)N/2 N0
p(r|sm ) will be used later to find the optimum estimate for sm (or
equivalently sm (t)).
Role of n′(t)
We consider the correlation between rk and n′(t):
E{n′(t)rk∗} = E{n′(t)(smk + nk )∗}
= E{n′(t)} s∗mk + E{n′(t)n∗k }
| {z }
=0
X
N
= E n(t) − nj fj (t) n∗k
j=1
ZT X
N
∗
= E{n(t)n (τ )} fk (τ ) dτ − E{nj n∗k } fj (t)
| {z } | {z }
0 N0 j=1 N0
2 δ(t−τ ) 2 δ[j−k]
N0 N0
= fk (t) − fk (t)
2 2
= 0
We observe that r and n′(t) are uncorrelated. Since r and n′(t)
are Gaussian distributed, they are also statistically independent.
Therefore, n′(t) cannot provide any useful information that is rele-
vant for the decision, and consequently, r forms a sufficient statis-
tic for detection of sm (t).
y1(t)
f1∗ (T − t) r1
y2(t)
r(t) f2∗ (T − t) r2
...
...
yN (t)
fN∗ (T − t) rN
sample at
t=T
Example:
s(t) h(t)
A A
T t T t
y(t)
y(T )
T 2T t
Problem Formulation:
Solution:
m̂ = argmax {p(r|sm̃)}
m̃
The above MAP and ML decision rules are very general. They can be
applied to any channel as long as we are able to find an expression for
p(r|sm̃ ).
= argmax {ln(p(r|sm̃))}
m̃
( )
N 1 X
N
= argmax −ln(πN0) − |rk − sm̃k |2
m̃ 2 N0
( N ) k=1
X
= argmin |rk − sm̃k |2
m̃
k=1
2
= argmin ||r − sm̃ ||
m̃
Interpretation:
We select that vector sm̃ which has the minimum Euclidean dis-
tance
Example:
4QAM
m̂ = 2 m̂ = 1
m̂ = 3 m̂ = 4
Alternative Representation:
Using the expansion
||r − sm̃ ||2 = ||r||2 − 2Re{r • sm̃} + ||sm̃||2,
we observe that ||r||2 is independent of sm̃. Therefore, we can
further simplify the ML decision rule
2
m̂ = argmin ||r − sm̃||
m̃
= argmin −2Re{r • sm̃ } + ||sm̃ ||2
m̃
2
= argmax 2Re{r • sm̃} − ||sm̃ ||
m̃
T
Z 1
∗
= argmax Re r(t)sm̃(t) dt − Em̃ ,
m̃ 2
0
with
ZT
Em̃ = |sm̃ (t)|2 dt.
0
If we are dealing with passband signals both r(t) and s∗m̃ (t) are
real–valued, and we obtain
T
Z 1
m̂ = argmax r(t)sm̃(t) dt − Em̃
m̃ 2
0
s1 (t) 1
− 2
E1
RT
0 (·) dt
s2 (t) 1
− 2
E2 select m̃
RT
(·) dt that
0
corresponds
r(t) ... m̂
...
to maximum
sN (t) 1 input
− 2
EN
RT
0 (·) dt
Example:
T t
z(t), N0
Am sbm(t) rb m̂
g(t) Demodulation Detection
1. Demodulator
– Energy of transmit pulse
ZT
Eg = |g(t)|2 dt = a2T
0
– Correlation demodulator
ZT
rb = rb (t)f ∗(t) dt
0
ZT
1
= √ rb (t) dt
T
0
ZT ZT
1 1
= √ sbm (t) dt + √ z(t) dt
T T
| 0
{z } | 0
{z }
sbm z
= sbm + z
sbm is given by
ZT
1
sbm = √ Am g(t) dt
T
0
ZT
1
= √ Am a dt
T
√ 0 p
= a T Am = Eg Am.
On the other hand, the noise variance is
σz2 = E{|z|2}
ZT ZT
1
= E{z(t)z ∗(τ )} dt dτ
T | {z }
0 0 N0 δ(t−τ )
= N0
– pdf p(rb|Am):
p !
2
1 |rb − Eg Am |
p(rb|Am ) = exp −
πN0 N0
2. Optimum Detector
The ML decision rule is given by
m̂ = argmax {ln(p(rb|Am̃ ))}
m̃
p
= argmax {−|rb − Eg Am̃|2}
m̃
p
= argmin {|rb − Eg Am̃ |2}
m̃
m̂ = 1 m̂ = 2 m̂ = 3 m̂ = 4
1. Binary PAM (M = 2)
From the example in the previous section we know that the
detector input signal in this case is
p
rb = Eg Am + z, m = 1, 2,
where the noise variance of the complex baseband noise is σz2 =
N0.
Decision Regions
m̂ = 1 m̂ = 2
p p
− Eg d Eg d
Therefore, we get
Z∞ p !
2
1 (rR − (− Eg d))
P (e|s1) = √ exp − drR
πN0 N0
0
Z∞
1 x2
= √ exp − dx
2π r 2
2Eg
N0 d
r !
2Eg
= Q d
N0
√ p √
where we have used the substitution x = 2(rR + Eg d)/ N0
and the Q–function is defined as
Z∞ 2
1 t
Q(x) = √ exp − dt
2π 2
x
Note that binary PSK (BPSK) yields the same BEP as 2PAM.
Decision Rule
The ML decision rule is given by
m̂ = argmax 2r • sm̃ − ||sm̃||2
m̃
= argmax {r • sm̃ } ,
m̃
−1
10
−2
10
−3
10
2FSK
BER
−4
10
−5
10
2PAM
−6
10
0 2 4 6 8 10 12 14
Eb/N0 [dB]
– Signal Space
2PAM 2FSK √
d12 Eb
d12
√ √ √
− Eb Eb Eb
– Rule of Thumb:
In general, for a given average energy of the signal points,
the BEP of a linear modulation scheme is larger if the min-
imum Euclidean distance of the signals in the signal space
is smaller.
1 X
M
ES = Em
M m=1
1 X
M
2
= Eg d (2m − 1 − M )2
M
" m=1 M #
Eg d2 X XM XM
= 4 m2 −4(M + 1) m + (M + 1)2
M
| {z }
m=1
| {z } m=1
m=1
| {z }
1 M(M+1)(2M+1) 1 M(M+1) M(M+1)2
6 2
M2 − 1 2
= d Eg
3
m̂ = 1 m̂ = 2 m̂ = M
0
10
−1
M = 64
10
−2
M = 32
10
−3
10
−4
10 M = 16
−5
SEP
10
−6
10
−7
10
M =2 M =4 M =8
−8
10
0 2 4 6 8 10 12 14 16 18 20 22
Eb/N0 [dB]
−1
M = 64
10
10
−2 M = 32
−3
10
10
−4
M =4
10
−5 M = 16
SEP
10
−6 M =8
−7
10
M =2
−8
10
0 2 4 6 8 10 12 14 16 18 20 22
Eb/N0 [dB]
0
10
−1
10
−2
10
−3
10
10
−4 M = 64
10
−5 M = 16
SEP
10
−6 M =4
−7
10
−8
10
0 2 4 6 8 10 12 14 16 18 20 22
Eb/N0 [dB]
Although exact (and complicated) expressions for the SEP and BEP of
most regular linear modulation formats exist, it is sometimes more con-
venient to employ simple bounds and approximation. In this sections,
we derive the union upper bound valid for arbitrary signal constella-
tions and a related approximation for the SEP.
We consider an M –ary modulation scheme with M signal points
sm, 1 ≤ m ≤ M , in the signal space.
We denote the pairwise error probability of two signal points sµ
and sν , µ 6= ν by
PEP(sµ → sν ) = P (sµ transmitted, sν , detected)
1 XX
M M
PM ≤ PEP(sµ → sν )
M µ=1 ν=1
ν6=µ
M =4
0
10
−1
10
−2
10
−3
10
−4
10
10
−5 4PAM
SEP
−6
10 4PSK
4QAM
−7
10
−8
10
0 2 4 6 8 10 12 14 16
Eb/N0 [dB]
M = 16
0
10
−1
10
−2
10
−3
10
−4
10
−5
10
SEP
−6
10
10
−7
16QAM 16PSK 16PAM
−8
10
6 8 10 12 14 16 18 20 22 24 26
Eb/N0 [dB]
M = 64
0
10
−1
10
−2
10
−3
10
−4
10
64PAM
−5
10
SEP
−6
64PSK
10
10
−7
64QAM
−8
10
10 15 20 25 30 35
Eb/N0 [dB]
Passband Signal
We assume that the received passband signal can be modeled as
√
r(t) = 2Re ejφsbm (t) + z(t) ej2πfct ,
where z(t) is complex AWGN with power spectral density Φzz (f ) =
N0 , and φ is an unknown, random but constant phase.
– φ may originate from the local oscillator or the transmission
channel.
– φ is often modeled as uniformly distributed, i.e.,
1
2π
, −π ≤ φ < π
pΦ(φ) =
0, otherwise
pΦ(φ)
1/(2π)
−π π φ
Baseband Signal
The received baseband signal is given by
rb (t) = ejφsbm (t) + z(t).
Optimal Demodulation
Since the unknown phase results just in the multiplication of the
transmitted baseband waveform sbm (t) by a constant factor ejφ,
demodulators that are optimum for φ = 0 are also optimum for
φ 6= 0. Therefore, both correlation demodulation and matched–
filter demodulation are also optimum if the channel phase is un-
known. The demodulated signal in interval kT ≤ t ≤ (k + 1)T
can be written as
rb [k] = ejφsbm [k] + z[k],
where the components of the AWGN vector z[k] are mutually
independent, zero–mean complex Gaussian processes with variance
σz2 = N0 .
ejφ z(t)
ejφ z[k]
e−j(·)
Phase φ̂
Estimation
ejφ z[k]
T (·)∗
e−jφ
T (·)∗
SEPCD CD
DPSK ≈ 2SEPPSK
where zeff [k] denotes the effective noise in the decision vari-
able d[k]. It can be shown that zeff [k] is a white process with
variance
σz2eff = 2σz2 + σz4
2
N0 N0
= 2 +
Eg Eg
For high SNRs we can approximate σz2eff as
σz2eff ≈ 2σz2
N0
= 2 .
Eg
– Comparison
We observe that the variance σz2eff of the effective noise in the
decision variable for DD is twice as high as that for CD. How-
ever, for small M the distribution of zeff [k] is significantly dif-
ferent from a Gaussian distribution. Therefore, for small M
a direct comparison of DD and CD is difficult and requires a
detailed BEP or SEP analysis. On the other hand, for large
M ≥ 8 the distribution of zeff [k] can be well approximated
as Gaussian. Therefore, we expect that at high SNRs DPSK
with DD requires approximately a 3 dB higher Eb /N0 ratio to
achieve the same BEP as CD. For M = 2 and M = 4 this dif-
ference is smaller. At a BEP of 10−5 the loss of DD compared
to CD is only about 0.8 dB and 2.3 dB for M = 2 and M = 4,
respectively.
0
10
−1
10
−2
10
−3
10
−4
10
4DPSK
BEP
−5
10
−6
10
2PSK
−7 4PSK
10 2DPSK
−8
10
0 5 10 15
Eb/N0 [dB]
we finally obtain
1 ||rb||2 + ||sbm̃ ||2 2
p(rb|sbm̃ ) = exp − · I0 |r b • sbm̃ |
(πN0)N N0 N0
4.5
3.5
2.5
I0(x)
1.5
0.5
0
0 0.5 1 1.5 2 2.5 3
ML Detection
With the above expression for p(rb |sbm̃ ) the noncoherent ML de-
cision rule becomes
m̂ = argmax {p(r|sbm̃ )}.
m̃
Simplification
The above ML decision rule depends on N0, which may not be
desirable in practice, since N0 (or equivalently the SNR) has to be
estimated. However, for x ≫ 1 the approximation
ln[I0(x)] ≈ x
holds. Therefore, at high SNR (or small N0 ) the above ML metric
can be simplified to
2
m̂ = argmax −||sbm̃ || + 2|rb • sbm̃ | ,
m̃
we obtain
√ 2 2
rb1 E r 0
b
>
b1
• √
rb2 • 0 rb2 Eb
|r1|2 > |r2|2.
In other words, the ML decision rule can be simplified to
m̂ = 1 if |r1|2 > |r2|2
m̂ = 2 if |r1|2 < |r2|2
Example:
Binary FSK
– In this case, in the interval 0 ≤ t ≤ T we have
r
Eb
sb1 (t) =
rT
Eb j2π∆f t
sb2 (t) = e
T
– If sb1(t) and sb2(t) are orthogonal, the basis functions are
(
√1 , 0≤t≤T
f1 (t) = T
0, otherwise
(
√1 ej2π∆f t , 0≤t≤T
f2 (t) = T
0, otherwise
– Receiver Structure
rb1
f1∗ (T − t) | · |2
rb(t) m̂ = 1
d>
<
0
rb2 − m̂ = 2
f2∗ (T − t) | · |2
sample
at
t=T
p ZT
1
= Ebejφ ej2π∆f t dt + z1
T
0
T
p
1 j2π∆f t
= Ebejφ
e + z1
j2π∆f T
0
p 1
= Ebejφ ejπ∆f T ejπ∆f T − e−jπ∆f T + z1
j2π∆f T
p
= Ebej(π∆f T +φ) sinc(π∆f T ) + z1
0
10
−1
10
−2
10
10
−3 noncoherent
−4
10
coherent
BEP
−5
10
−6
10
−7
10
−8
10
0 2 4 6 8 10 12 14 16
Eb/N0 [dB]
ML Detection
The ML decision rule is
|sbm̃|2 2
m̂ = argmax − + ln I0 |rbs∗bm̃ |
m̃ N0 N0
We decide for m̂ = 1 if
p
2Eb 2
− + ln I0 | 2Eb rb | > ln(I0(0))
N0 N0
p
2 2Eb
ln I0 2Eb |rb | >
N0 N0
| √{z }
≈ N2 2Eb |rb |
0
r
Eb
|rb | >
2
Therefore, the (approximate) ML decision rule can be simplified
to
q
m̂ = 1 if |rb | > E2b
q
m̂ = 2 if |rb | < E2b
m̂ = 1
m̂ = 2
with
d[k] = rb [k]rb∗[k − 1].
Obviously, for N = 2 the ML (MSDD) decision rule is identical to
the heuristically derived conventional differential detection decision
rule. For N > 2 the gap to coherent detection becomes smaller.
Disadvantage of MSDD
The trial vector ã has N − 1 elements and each element has M
possible values. Therefore, there are M N−1 different trial vectors
ã. This means for MSDD we have to make M N−1 /(N − 1) tests
per (scalar) symbol decision. This means the complexity of MSDD
grows exponentially with N . For example, for M = 4 we have to
make 4 and 8192 tests for N = 2 and N = 9, respectively.
Alternatives
In order to avoid the exponential complexity of MSDD there are
two different approaches:
1. The first approach is to replace the above brute–force search
with a smarter approach. In this smarter approach the trial
vectors ã are sorted first. The resulting fast decoding algorithm
still performs optimum MSDD but has only a complexity of
N log(N ) (Mackenthun, 1994). For fading channels a similar
approach based on sphere decoding exists (Pauli et al.).
2. The second approach is suboptimum but achieves a similar
performance as MSDD. Complexity is reduced by using de-
cision feedback of previously decided symbols. The resulting
scheme is referred to as decision–feedback differential detection
(DF–DD). The complexity of this approach is only linear in N
(Edbauer, 1992).
CPM Modulation
– CPM Transmit Signal
Recall that the CPM transmit signal in complex baseband rep-
resentation is given by
r
E
sb (t, I) = exp(j[φ(t, I) + φ0]),
T
where I is the sequence {I[k]} of information bearing symbols
I[k] ∈ {±1, ±3, . . . , ±(M − 1)}, φ(t, I) is the information
carrying phase, and φ0 is the initial carrier phase. Without
loss of generality, we assume φ0 = 0 in the following.
– Information Carrying Phase
In the interval kT ≤ t ≤ (k + 1)T the phase φ(t, I) can be
written as
X
L−1
φ(t, I) = Θ[k] + 2πh I[k − ν]q(t − [k − ν]T )
ν=1
+2πhI[k]q(t − kT ),
where Θ[k] represents the accumulated phase up to time kT , h
is the so–called modulation index, and q(t) is the phase shaping
pulse with
0, t<0
q(t) = monotonic, 0 ≤ t ≤ LT
1/2, t > LT
– Trellis Diagram
If h = q/p is a rational number with relative prime integers
q and p, CPM can be described by a trellis diagram whose
number of states S is given by
pM L−1, even q
S=
2pM L−1, odd q
Received Signal
The received complex baseband signal is given by
rb (t) = sb (t, I) + z(t)
with complex AWGN z(t), which has a power spectral density of
ΦZZ (f ) = N0 .
ML Detection
– Since the CPM signal has memory, ideally we have to observe
the entire received signal rb (t), −∞ ≤ t ≤ ∞, in order to
make a decision on any I[k] in the sequence of transmitted
signals.
– The conditional pdf p(rb(t)|sb(t, Ĩ)) is given by
Z∞
1
p(rb(t)|sb(t, Ĩ)) ∝ exp − |rb (t) − sb(t, Ĩ)|2 dt ,
N0
−∞
Viterbi Algorithm
– The exponential complexity of brute–force search can be avoided
using the so–called Viterbi algorithm (VA).
– Introducing the definition
Z
(k+1)T
– The above steps are carried out for all symbol intervals and at
the end of the transmission, a decision is made on the trans-
mitted sequence. Alternatively, at time kT we may use the
˜ − k0] corresponding to the surviving path with the
symbol I[k
largest accumulated metric as estimate for I[k − k0]. It has
been shown that as long as the decision delay k0 is large enough
(e.g. k0 ≥ 5 log2(S)), this method yields the same performance
as true ML detection.
– The complexity of the VA is exponential in the number of
states, but only linear in the length of the transmitted se-
quence.
Remarks:
– At the expense of a certain loss in performance the complexity
of the VA can be further reduced by reducing the number of
states.
– Alternative implementations receivers for CPM are based on
Laurent’s decomposition of the CPM signal into a sum of PAM
signals
z(t)
C(f )
−W W f
and
ν(t) = gR (t) ∗ z(t)
z(t)
Proof:
X
∞ Z
(2m+1)/(2T )
= X(f )ej2πf kT df
m=−∞
(2m−1)/(2T )
X
∞ Z )
1/(2T
′
= X(f ′ + m/T )ej2π(f +m/T )kT df ′
m=−∞
−1/(2T )
Z )
1/(2T
X
∞
′
= X(f ′ + m/T ) ej2πf kT df ′
−1/(2T ) |
m=−∞
{z }
B(f ′ )
Z )
1/(2T
Since
X
∞
B(f ) = X(f + m/T )
m=−∞
B(f )
− 2T1 −W W 1
2T
f
2. T = 1/(2W )
If the symbol duration T is equal to 1/(2W ), we get
1
=W
2T
and B(f ) = T is achieved for
T, |f | ≤ W
X(f ) = .
0, |f | > W
This corresponds to
sin π Tt
x(t) =
π Tt
B(f )
− 2T1 1
2T
f
3. T > 1/(2W )
In this case, we have
1
<W
2T
and B(f ) consists of overlapping, shifted replica of X(f ). ISI–
free transmission is possible if X(f ) is chosen properly.
Example:
X(f )
T
T
2
− 2T1 1
2T
f
B(f )
T
T
2
− 2T1 1
2T
f
0.9
0.8
0.7
0.6
0.5
β=1
X(f )/T
0.4
0.3 β = 0.35
0.2
0.1
β=0
0
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
fT
0.8
0.6
0.4
β = 0.35
0.2
x(t)
β=1
−0.2
β=0
−0.4
−4 −3 −2 −1 0 1 2 3 4
t/T
√
Nyquist–Filters
The spectrum of x(t) is given by
GT (f ) GR (f ) = X(f )
ISI–free transmission is of course possible for any choice of GT (f )
and GR (f ) as long as X(f ) fulfills the Nyquist criterion. However,
the SNR maximizing optimum choice is
GT (f ) = G(f )
GR (f ) = αG∗(f ),
i.e., gR (t) is matched to gT (t). α is a constant. In that case, X(f )
is given by
X(f ) = α|G(f )|2
and consequently, G(f ) can be expressed as
r
1
G(f ) = X(f ).
α
√
G(f ) is referred to as Nyquist–Filter. In practice, a delay τ0
may be added to GT (f ) and/or GR (f ) to make them causal filters
in order to ensure physical realizability of the filters.
z(t)
kT
I[k] rb(t) rb[k]
gT (t) c(t) gR(t)
√
gT (t) and gR (t) are Nyquist–Impulses that are matched to each
other.
Equivalent Discrete–Time Channel Model
z[k]
p
I[k] Eg rb[k]
Proof:
– Signal Component
The overall channel is given by
x[k] = gT (t) ∗ c(t) ∗ gR (t)
t=kT
= gT (t) ∗ gR (t) ,
t=kT
where we have used the fact that the channel acts as an ideal
lowpass filter. We assume
gR (t) = g(t)
1
gT (t) = p g ∗(−t),
Eg
√
where g(t) is a Nyquist–Impulse, and Eg is given by
Z∞
Eg = |g(t)|2 dt
−∞
– Noise Component
z(t) is a complex AWGN process with power spectral density
∗ Mean
1 Z∞
E{z[k]} = E p ∗
z(τ )g (kT + τ ) dτ
Eg
−∞
Z∞
1
= p E{z(τ )}g ∗(kT + τ ) dτ
Eg
−∞
= 0
∗ ACF
φZZ [λ] = E{z[k + λ]z ∗[k]}
Z∞ Z∞
1
= E{z(τ1)z ∗(τ2)} g ∗ ([k + λ]T + τ1)
Eg | {z }
−∞ −∞ N0 δ(τ1 −τ2 )
·g(kT + τ2) dτ1dτ2
Z∞
N0
= g ∗ ([k + λ]T + τ1)g(kT + τ1) dτ1
Eg
−∞
N0
= Eg δ[λ]
Eg
= N0 δ[λ]
This shows that z[k] is a discrete–time AWGN process with
variance σZ2 = N0 and the proof is complete.
– Channel c(t)
In general, the channel c(t) is not ideal, i.e., |C(f )| is not a
constant over the range of frequencies where GT (f ) is non–zero.
Therefore, linear distortions are inevitable.
– Transmit Filter gT (t)
√
The transmit filter gT (t) may or may not be a Nyquist–Filter,
e.g. in the North American D–AMPS mobile phone system a
square–root raised cosine filter with roll–off factor β = 0.35 is
used, whereas in the European EDGE mobile communication
system a linearized Gaussian minimum–shift keying (GMSK)
pulse is employed.
– Receive Filter gR (t)
√
We assume that the receive filter gR (t) is a Nyquist–Filter.
Therefore, the filtered, sampled noise z[k] = gR (t) ∗ z(t)|t=kT
is white Gaussian noise (WGN).
Ideally, gR (t) consists of a filter matched to gT (t) ∗ c(t) and a
noise whitening filter. The drawback of this approach is that
gR (t) depends on the channel, which may change with time in
wireless applications. Therefore, in practice often a fixed but
√
suboptimum Nyquist–Filter is preferred.
X
L−1
rb [k] = h[l]I[k − l] + z[k]
l=0
z[k]
ML Detection
For ML detection, we need the pdf p(rb |I) which is given by
X 2
1
K+L−2 X
L−1
p(rb |I) ∝ exp − rb [k] − h[l]I[k − l] .
N0
k=0 l=0
is valid, i.e., Λ[k +1] can be calculated recursively from Λ[k], which
renders the application of the VA possible.
For M –ary modulation an ISI channel of length L can be described
by a trellis diagram with M L−1 states since the signal component
X
L−1
˜ − l]
h[l]I[k
l=0
Example:
[1, −1]
[−1, 1]
[−1, −1]
–k=0
Arbitrarily and without loss of optimality, we may set the ac-
cumulated metric corresponding to state S[k] at time k = 0
equal to zero
Λ(S[0], 0) = Λ([1, 1], 0) = 0.
Note that there is only one accumulated metric at time k = 0
since S[0] is known at the receiver.
–k=1
˜
The accumulated metric corresponding to S[1] = [I[0], 1] is
given by
˜ 0)
Λ(S[1], 1) = Λ(S[0], 0) + λ(S[0], I[0],
˜ 0)
= λ(S[0], I[0],
Since there are two possible states, namely S[1] = [1, 1] and
S[1] = [−1, 1], there are two corresponding accumulated met-
rics at time k = 1.
–k=2
˜
Now, there are 4 possible states S[2] = [I[1], ˜ and for each
I[0]]
state a corresponding accumulated metric
˜ 1)
Λ(S[2], 2) = Λ(S[1], 1) + λ(S[1], I[1],
has to be calculated.
–k=3
At k = 3 two branches emanate in each state S[3]. However,
since of the two paths that emanate in the same state S[3] that
path which has the smaller accumulated metric Λ(S[3], 3) also
will have the smaller metric at time k = K +L−2 = K +1, we
need to retain only the path with the smaller Λ(S[3], 3). This
path is also referred to as the surviving path. In mathematical
terms, the accumulated metric for state S[3] is given by
˜ 2)}
Λ(S[3], 3) = argmin {Λ(S[2], 2) + λ(S[2], I[2],
˜
I[2]
[1, −1]
[−1, 1]
[−1, −1]
–k≥4
All following steps are similar to that at time k = 3. In each
step k we retain only M L−1 = 4 surviving paths and the cor-
responding accumulated branch metrics.
– Termination of Trellis
Since we assume that for k ≥ K, I[k] = 1 is transmitted, the
end part of the trellis is as shown below.
k =K −2 k =K −1 k=K k =K +1
[1, 1]
[1, −1]
[−1, 1]
[−1, −1]
Since only M L−1 paths are retained at each step of the VA, the
complexity of the VA is linear in the sequence length K, but
exponential in the length L of the overall channel impulse response.
If the VA is implemented as described above, a decision can be
made only at time k = K + L − 2. However, the related delay may
be unacceptable for large sequence lengths K. Fortunately, empir-
ical studies have shown that the surviving paths tend to merge
relatively quickly, i.e., at time k a decision can be made on the
symbol I[k − k0] if the delay k0 is chosen large enough. In prac-
tice, k0 ≈ 5(L − 1) works well and gives almost optimum results.
I[0] d[0] ˆ
I[0]
h[k] h∗[−k]
= EhσZ2 = EhN0
Therefore, this corresponds to the transmission of I[0] over a non–
ISI channel with ES /N0 ratio
ES Eh2 Eh
= = ,
N0 EhN0 N0
and the related SEP or BEP can be calculated easily. For example,
for the BEP of BPSK we obtain
r !
Eh
PMF = Q 2 .
N0
For the true BEP of MLSE we get
PMLSE ≥ PMF .
The above bound is referred to as the matched–filter (MF) bound.
The tightness of the MF bound largely depends on the underlying
channel. For example, for a channel with L = 2, h[0] = h[1] = 1
Example:
−1
10
−2
10
−3
BEP
10
−4
10
−5
10
3 4 5 6 7 8 9 10
Eb/N0 [dB]
Since MLSE becomes too complex for long channel impulse re-
sponses, in practice, often suboptimum equalizers with a lower
complexity are preferred.
The most simple suboptimum equalizer is the so–called linear
equalizer. Roughly speaking, in LE a linear filter
F (z) = Z {f [k]}
X∞
= f [k]z −k
k=−∞
z[k]
rb[k] d[k]
I[k] H(z) F (z) ˆ
I[k]
1
F (z) =
H(z)
where we assume that H(z) has no roots on the unit circle. Since
in most practical applications H(z) can be modeled as a filter with
finite impulse response (FIR), F (z) will be an IIR filter in general.
Obviously, the resulting overall channel transfer function is
Hov (z) = H(z)F (z) = 1,
and we arrive at the equivalent channel model shown below.
e[k]
d[k] ˆ
I[k] I[k]
= T Φee (ej2πf T ) df
−1/(2T )
Z )
1/(2T
N0
= T df.
|H(ej2πf T )|2
−1/(2T )
E{|I[k]|2} 1
SNRIIR−ZF = =
σe2 Z )
1/(2T
N0
T df
|H(ej2πf T )|2
−1/(2T )
Example:
We consider a channel with two coefficients and energy 1
1
H(z) = p (1 − cz −1 ),
1 + |c|2
where c is complex. The equalizer filter is given by
p z
F (z) = 1 + |c|2
z−c
In the following, we consider two cases: |c| < 1 and |c| > 1.
1. |c| < 1
In this case, a stable, causal impulse response is obtained.
p
f [k] = 1 + |c|2 ck u[k],
where u[k] denotes the unit step function. The corresponding
error variance is
Z )
1/(2T
N0
σe2 = T df
|H(ej2πf T )|2
−1/(2T )
Z )
1/(2T
= N0 T |F (ej2πf T )|2 df
−1/(2T )
X
∞
= N0 |f [k]|2
k=−∞
X
∞
= N0 (1 + |c|2) |c|2k
k=0
2
1 + |c|
= N0 .
1 − |c|2
0.9
0.8
0.7
0.6
N0 SNRIIR−ZF
0.5
0.4
0.3
0.2
0.1
0
0 1 2 3 4 5 6
|c|
Obviously, the SNR drops to zero as |c| approaches one, i.e., as the
rb[k]
T T ... T
d[k] ˆ − k0 ]
I[k
subject to
hov [k0] = 1,
where hov [k] denotes the overall impulse response (channel and
equalizer filter).
Although D is a convex function of the equalizer coefficients, it
is in general difficult to find the optimum filter coefficients. An
exception is the special case when the binary eye at the equalizer
input is open
1 X∞
|h[k]| < 1
|h[k1]| k=−∞
k6=k1
hov[k]
k0 k
LF −1 LF −1
2 2
f = H −1 [0 0 . . . 0 1 0 . . . 0]T
Example:
We assume
1
H(z) = p (1 − cz −1 ),
1 + |c|2
and k0 = qF /2.
√
1. First we assume
√ c = 0.5, i.e., h[k] is given by h[0] = 2/ 5 and
h[1] = 1/ 5.
qF = 2 qF = 2
1.5
1
1 0.8
0.6
hov[k]
0.5
0.4
f [k]
0 0.2
0
−0.5
−0.2
−1 −0.4
0 0.5 1 1.5 2 0 2 4 6
k k
q =8 q =8
F F
1
1
0.8
0.5 0.6
0.4
hov[k]
f [k]
0
0.2
−0.5 0
0 2 4 6 8 0 5 10 15
k k
0 0
f [k]
−0.5
−0.5
−1
−1.5 −1
0 0.5 1 1.5 2 0 2 4 6
k k
qF = 8 qF = 8
1.5
1
1
0.8
0.5
0.6
0
hov[k]
0.4
f [k]
−0.5
0.2
−1
0
−1.5
0 2 4 6 8 0 5 10 15
k k
Objective
Minimize the variance of the error signal
ˆ
e[k] = d[k] − I[k].
z[k]
rb[k] d[k]
I[k] H(z) F (z) ˆ
I[k]
e[k]
– Cost Function
The cost function for filter optimization is given by
J = E{|e[k]|2}
( !
X∞
= E f [m]rb [k − m] − I[k]
m=−∞
!)
X∞
f ∗ [m]rb∗[k − m] − I ∗ [k] ,
m=−∞
tiation
∂f ∗[κ]
= 1
∂f ∗[κ]
∂f [κ]
= 0
∂f ∗[κ]
∂|f [κ]|2
= f [κ].
∂f ∗ [κ]
We observe that the error signal and the input of the MMSE
filter must be orthogonal. This is referred to as the orthogo-
nality principle of MMSE optimization.
The above condition can be modified to
E {e[k]rb∗[k − κ]} = E {d[k]rb∗[k − κ]} − E {I[k]rb∗[k − κ]}
The individual terms on the right hand side of the above equa-
tion can be further simplified to
X
∞
E {d[k]rb∗[k − κ]} = f [m] E {rb [k − m]rb∗[k − κ]},
| {z }
m=−∞ φrr [κ−m]
and
X
∞
E {I[k]rb∗[k − κ]} = h∗[m] E {I[k]I ∗[k − κ − m]}
| {z }
m=−∞ φII [κ+m]
X
∞
= h∗[−µ]φII [κ − µ],
µ=−∞
Error Variance
The error variance σe2 is given by
σe2 = φII [0] − φdI [0]
= 1 − φdI [0].
σe2 can be calculated most easily from the the power spectral den-
sity
Φee (z) = ΦII (z) − ΦdI (z)
= 1 − F (z)H(z)
H(z)H ∗ (1/z ∗)
= 1−
H(z)H ∗ (1/z ∗) + N0
N0
= .
H(z)H ∗ (1/z ∗ ) + N0
More specifically, σe2 is given by
Z )
1/(2T
σe2 = T Φee(ej2πf T ) df
−1/(2T )
or
Z )
1/(2T
N0
σe2 = T df
|H(ej2πf T )|2 + N0
−1/(2T )
Z )
1/(2T
= T (1 − Φee (ej2πf T )) df
−1/(2T )
= 1 − σe2 < 1.
Since hov [0] < 1 is valid, MMSE equalization is said to be biased.
SNR
The decision variable d[k] may be rewritten as
d[k] = I[k] + e[k]
= hov [0]I[k] + e[k] + (1 − hov [0])I[k],
| {z }
=e′ [k]
where e′[k] does not contain I[k]. Using φee[λ] = −φeI [λ] the
variance of e′ [k] is given by
σe2′ = E{|e[k]|2}
= (1 − hov [0])2 + 2(1 − hov [0])φeI [0] + σe2
= σe4 − 2σe4 + σe2
= σe2 − σe4.
Therefore, the SNR for MMSE equalization with IIR filters is given
by
h2ov [0]
SNRIIR−MMSE =
σe2′
(1 − σe2)2
= 2
σe (1 − σe2)
which yields
1 − σe2
SNRIIR−MMSE =
σe2
Example:
We consider again the channel with one root and transfer function
1
H(z) = p (1 − cz −1 ),
1 + |c|2
where c is a complex constant. After some straightforward manip-
ulations the error variance is given by
σe2 = E{|e[k]|2}
Z )
1/(2T
N0
= T df
|H(ej2πf T )|2 + N0
−1/(2T )
N0 1
= p ,
1 + N0 1 − β 2
where β is defined as
2|c|
β= .
(1 + N0)(1 + |c|2)
It is easy to check that for N0 → 0, i.e., for high SNRs σe2 ap-
proaches the error variance for linear ZF equalization.
We illustrate the SNR for two different cases.
1. |c| = 0.5
20
MMSE
ZF
15
10
SNR [dB]
−5
0 2 4 6 8 10 12 14 16 18 20
1/N0 [dB]
2. |c| = 0.95
15
MMSE
ZF
10
5
SNR [dB]
−5
−10
−15
0 2 4 6 8 10 12 14 16 18 20
1/N0 [dB]
As expected, for high input SNRs (small noise variances), the (out-
put) SNR for ZF equalization approaches that for MMSE equal-
ization. For larger noise variances, however, MMSE equalization
yields a significantly higher SNR, especially if H(z) has zeros close
to the unit circle.
d[k]
rb[k] F (z) ˆ − k0 ]
I[k
e[k]
Error Signal
The error signal e[k] is given by
e[k] = d[k] − I[k − k0],
where we allow for a decision delay k0 ≥ 0 to account for non–
ˆ − k0] = I[k − k0] for
causal components, and we assume again I[k
the sake of mathematical tractability.
Cost Function
The cost function for filter optimization is given by
J(f ) = E{|e[k]|2}
n H H o
H
= E f rb − I[k − k0] f rb − I[k − k0]
= f H E{rb rH } f − f H E{rb I ∗[k − k0]}
| {z } b | {z }
Φrr ϕrI
2
−E{I[k − k0]rH b }f + E{|I[k − k0 ]| }
= f H Φrr f − f H ϕrI − ϕH rI f + 1,
where φrI [λ] = E{rb[k + λ]I ∗[k]}. Note that for independent,
identically distributed input data and AWGN, we get
φrr [λ] = h[λ] ∗ h∗[−λ] + N0 δ[λ]
φrI [λ] = h[k0 + λ].
This completely specifies Φrr and ϕrI .
Filter Optimization
The optimum filter coefficient vector can be obtained be setting
the gradient of J(f ) equal to zero
∂J(f )
= 0.
∂f ∗
For calculation of this gradient, we use the following rules for dif-
ferentiation of scalar functions with respect to (complex) vectors:
∂ H
f Xf = Xf
∂f ∗
∂ H
f x = x
∂f ∗
∂ H
x f = 0,
∂f ∗
where X and x denote a matrix and a vector, respectively.
With these rules we obtain
∂J(f )
= Φrr f − ϕrI = 0
∂f ∗
or
Φrr f = ϕrI .
f opt = Φ−1
rr ϕrI
Error Variance
The minimum error variance is given by
σe2 = J(f opt)
= 1 − ϕH −1
rI Φrr ϕrI
= 1 − ϕHrI f opt
1 − σe2
SNRFIR−MMSE =
σe2
Example:
Below, we show f [k] and hov [k] for a channel with one root and
transfer function
1
H(z) = p (1 − cz −1 ).
1 + |c|2
We consider the case c = 0.8 and different noise variances σZ2 = N0.
Furthermore, we use qF = 12 and k0 = 7.
N0 = 0.2 N0 = 0.2
1.5
1
1 0.8
0.5 0.6
hov[k]
f [k]
0.4
0
0.2
−0.5
0
−1 −0.2
0 2 4 6 8 10 12 0 5 10
k k
N = 0.001 N = 0.001
0 0
1.5
1
1 0.8
0.5 0.6
0.4
hov[k]
f [k]
0
0.2
−0.5
0
−1 −0.2
0 2 4 6 8 10 12 0 5 10
k k
We observe that the residual ISI in hov [k] is smaller for the smaller
noise variance, since the MMSE filter approaches the ZF filter for
N0 → 0. Also the bias decreases with decreasing noise variance,
i.e., hov [k0] approaches 1.
z[k]
z[k]
I[k] H(z) 1 ˆ
I[k]
H(z)
T P (z) P (z) T
feedforward filter feedback filter
General DFE
If the predictor has infinite length, the above DFE scheme cor-
responds to optimum zero–forcing (ZF) DFE. However, the DFE
concept can be generalized of course allowing for different filter
optimization criteria. The structure of a general DFE scheme is
shown below.
z[k]
B(z) − 1
is causal and monic (b[k] = 1). Note that F (z) may also be
an FIR filter. F (z) and B(z) can be optimized according to any
suitable criterion, e.g. ZF or MMSE criterion.
Typical Example:
0.8
0.6
h[k] ∗ f [k]
0.4
0.2
−0.2
−8 −6 −4 −2 0 2 4 6 8
k
0.8
0.6
0.4
b[k]
0.2
−0.2
0 1 2 3 4 5 6 7 8
k
Properties of DFE
– The feedforward filter has to suppress only the pre–cursor ISI.
This imposes fewer constraints on the feedforward filter and
therefore, the noise enhancement for DFE is significantly smaller
than for linear equalization.
– The post–cursors are canceled by the feedback filter. This
causes no additional noise enhancement since the slicer elimi-
nates the noise before feedback.
– Feedback of wrong decisions causes error propagation. Fortu-
z[k]
F (z)
I[k] H(z) 1 ˆ
I[k]
H(z)
T P (z) P (z) T
B(z) − 1
Overall Channel
The overall forward channel is given by
Hov (z) = H(z)F (z)
Hmin (z)
=
hmin[0]
X
L−1
hmin[m] −m
= z
m=0
h min [0]
This means the FFF filter F (z) transforms the channel H(z) into
its (scaled) minimum phase equivalent. The FBF is given by
Hmin (z)
B(z) − 1 = −1
hmin[0]
X
L−1
hmin[m] −m
= z
h
m=1 min
[0]
Therefore, assuming error–free feedback the equivalent overall chan-
nel including forward and backward part is an ISI–free channel with
gain 1.
Noise
The FFF F (z) is an allpass filter since
∗
∗ ∗ 1 Hmin (z)Hmin (1/z ∗)
F (z)F (1/z ) =
|hmin [0]|2 H(z)H ∗ (1/z ∗)
1
= .
|hmin [0]|2
Therefore, the noise component ν[k] is AWGN with variance
N0
σν2 = .
|hmin[0]|2
ν[k]
d[k] ˆ
I[k] I[k]
SNR
Obviously, the SNR of optimum ZF–DFE is given by
1
SNRZF−DFE =
σν2
|hmin[0]|2
= .
N0
Furthermore, it can be shown that hmin[0] can be calculated in
closed form as a function of H(z). This leads to
!
R )
1/(2T
|H(ej2πf T )| 2
SNRZF−DFE = exp T ln N0 df .
−1/(2T )
Example:
– Channel
We assume again a channel with one root and transfer function
1
H(z) = p (1 − cz −1 ).
1 + |c|2
– Normalized Minimum–Phase Equivalent
If |c| ≤ 1, H(z) is already minimum phase and we get
Hmin (z)
Pe(z) =
hmin[0]
= 1 − cz −1 , |c| ≤ 1.
If |c| > 1, the root of H(z) has to be mirrored into the unit
circle. Therefore, Hmin (z) will have a zero at z = 1/c∗, and we
get
Hmin (z)
Pe(z) =
hmin [0]
1
= 1 − ∗ z −1 , |c| > 1.
c
– Filters
The FFF is given by
√ 1 2, |c| ≤ 1
1+|c|
F (z) = ∗
√ 1 2 z−1/c
z−c , |c| > 1
1+|c|
– SNR
The SNR can be calculated to
Z
1/(2T ) j2πf T 2
|H(e )|
SNRZF−DFE = exp T ln df .
N0
−1/(2T )
Z
1/(2T ) −2πf T 2
|1 − ce |
= exp T ln df .
(1 + |c|2)N0
−1/(2T )
0.9
0.8 ZF−DFE
0.7
0.6
N0 SNR
0.5 ZF−LE
0.4
0.3
0.2
0.1
0
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
|c|
z[k]
B(z) − 1
Optimization Criterion
In optimum MMSE–DFE, we optimize FFF and FBF for mini-
mization of the variance of the error signal
e[k] = d[k] − I[k].
This error variance can be expressed as
2
J = E |e[k]|
( !
X∞ X
∞
= E f [κ]rb [k − κ] − b[κ]I[k − κ] − I[k]
κ=−∞ κ=1
!)
X
∞ X
∞
f ∗ [κ]rb∗[k − κ] − b∗[κ]I ∗[k − κ] − I ∗[k] .
κ=−∞ κ=1
FFF Optimization
Differentiating J with respect to f ∗[ν], −∞ < ν < ∞, yields
∂J X∞
∗
= f [κ] E{r [k − κ]r [k − ν]}
∂f ∗[ν] | b
{z b }
κ=−∞ φrr [ν−κ]
X
∞
− b[κ] E{I[k − κ]rb∗[k − ν]} − E{I[k]rb∗[k − ν]}
| {z } | {z }
κ=1 φIr [ν−κ] φIr [ν]
This results in
H ∗ (1/z ∗)
F (z) = B(z).
H(z)H ∗ (1/z ∗) + N0
Recall that
H ∗ (1/z ∗)
FLE(z) =
H(z)H ∗ (1/z ∗) + N0
is the optimum filter for linear MMSE equalization. This means
the optimum FFF for MMSE–DFE is the cascade of a optimum
linear equalizer and the FBF B(z).
FBF Optimization
The Z–transform E(z) of the error signal e[k] is given by
E(z) = F (z)Z(z) + (F (z)H(z) − B(z))I(z).
Adopting the optimum F (z), we obtain for the Z–transform of
where el [k] denotes the error signal at the output of the optimum
linear MMSE equalizer, and Φel el (z) is the Z–transform of the
autocorrelation sequence of el [k].
The optimum FBF filter will minimize the variance of el [k]. There-
fore, the optimum prediction–error filter for el [k] is the optimum
filter B(z). Consequently, the optimum FBF can be defined as
1
B(z) = ,
Q(z)
where Q(z) is obtained by spectral factorization of Φel el (z)
N0
Φel el (z) =
H(z)H ∗ (1/z ∗) + N0
= σe2Q(z)Q∗ (1/z ∗ ).
The coefficients of q[k], k ≥ 1, can be calculated recursively as
X
k−1
k−µ
q[k] = q[µ]β[k − µ], k≥1
µ=0
k
with
Z )
1/(2T
β[µ] = T ln Φel el (ej2πf T ) ej2πµf T df.
−1/(2T )
z[k]
F (z)
d[k]
I[k] H(z) FLE(z) B(z) ˆ
I[k]
linear prediction−error
MMSE filter for el [k] B(z) − 1
equalizer
feedback filter
Overall Channel
The overall forward transfer function is given by
Hov (z) = F (z)H(z)
H(z)H ∗ (1/z ∗ )
= B(z)
H(z)H ∗ (1/z ∗) + N0
N0
= 1− B(z)
H(z)H ∗ (1/z ∗) + N0
σe2
= B(z) − ∗ .
B (1/z ∗ )
Remark
For high SNRs, ZF–DFE and MMSE–DFE become equivalent. In
that case, the noise variance is comparatively small and also the
MMSE criterion leads to a complete elimination of the ISI.
Error Variance J
The error variance can be obtained as
J = E{|e[k]|2}
= f H E{rb [k]rH H H
b [k]}f + b E{I[k − k0 − 1]I [k − k0 − 1]}b
−f H E{rb [k]I H [k − k0 − 1]}b − bH E{I[k − k0 − 1]r H
b [k]}f
−f H E{rb [k]I ∗[k − k0]} − E{rHb [k]I[k − k0 ]}f
+bH E{I[k − k0 − 1]I ∗[k − k0]} + E{I H [k − k0 − 1]I[k − k0]}b.
Since I[k] is an i.i.d. sequence and the noise z[k] is white, the
Optimum Filters
The optimum filter settings can be obtained by differentiating J
with respect to f ∗ and b∗, respectively.
∂J 2
∗ = (Φhh + σn I)f − Hb − h
∂f
∂J H
∗ = b−H f
∂b
Setting the above equations equal to zero and solving for f , we get
−1
f opt = Φhh − HH H + σn2 I h
Bias
The bias is given by
h[k0] = f H
opt h = 1 − Jmin < 1.