Slides2010 PDF
Erik G. Larsson
Linköping University, ISY, Communication Systems
Detection and Estimation Theory E. G. Larsson
Welcome
1
Detection and Estimation Theory E. G. Larsson
Organization
10 course meetings, 2h each
Examination:
Extensive homework. Oral exam where necessary
Homework to be done individually, but oral cooperation OK
Copying of homework from other students is considered a serious offense
Deadlines are generous but to be strictly observed
Course webpage: please visit regularly, some notes can be downloaded
Active participation in the course seminars is expected
2
Detection and Estimation Theory E. G. Larsson
Book
H. van Trees, Detection and modulation theory, part I
A classic, influential textbook for almost 40 years...
Some alternative books/reference sources (somewhat different focus)
S. Kay, Fundamentals of SSP: Estimation & Detection theory
(Parts I & II)
V. Poor, Introduction to detection and estimation
Do read the book. Lecture notes are not a replacement for the book.
Lectures will highlight some important ideas, but we won't have time to cover everything.
If you find typos in the notes, please report them to me!
3
Detection and Estimation Theory E. G. Larsson
Lecture 1
4
Detection and Estimation Theory E. G. Larsson
Some terminology/motivating examples
Detection
Was a 0 or 1 transmitted? (What is the probability it was a 0?)
Is there a target present? (And is the target friend or foe?)
Is treatment A better than treatment B?
Is the coin fair?
What is the rank of the covariance matrix?
Estimation:
What is the mean/standard deviation of the distribution?
What is the carrier frequency?
What is the angle to the target?
5
Detection and Estimation Theory E. G. Larsson
Binary hypothesis testing
Data r generated by one of two possible models: H
0
, H
1
The task is to detect what model was used, given r.
Was wrong: H
0
true, detector said H
1
; vice versa.
P(H
1
|H
0
) = P
F
: probability of false alarm
P(H
0
|H
1
) = P
M
: probability of miss
Was right: H
1
true, detector said H
1
; vice versa.
P(H
1
|H
1
) = P
D
: probability of detection
P(H
0
|H
0
): no special name
6
Detection and Estimation Theory E. G. Larsson
Two philosophies
Bayesian
Source selects H_i at random, according to P(H_0) = P_0, P(H_1) = P_1
The result of the detection is the ratio P(H_0|data) / P(H_1|data)
Often, need a hard decision (which H_i is most likely given the data?)
Deterministic (orthodox, classical)
Source is deterministic: either H_0 or H_1 is correct
The result of the detection is one of the H_i, at some prescribed P_D, P_F
In statistics, there is great controversy between these philosophies. In engineering, we try to be pragmatic.
7
Detection and Estimation Theory E. G. Larsson
Bayesian approach to the binary problem: H_i random
Decision regions: Z_i = {r : detector says H_i}
Bayes cost:
R = C_00 P(H_0, H_0) + C_10 P(H_1, H_0) + C_11 P(H_1, H_1) + C_01 P(H_0, H_1)
  = C_00 P(H_0|H_0) P_0 + C_10 P(H_1|H_0) P_0 + C_11 P(H_1|H_1) P_1 + C_01 P(H_0|H_1) P_1
  = C_00 P_0 ∫_{Z_0} p_{r|H_0}(R|H_0) dR + C_10 P_0 ∫_{Z_1} p_{r|H_0}(R|H_0) dR
    + C_11 P_1 ∫_{Z_1} p_{r|H_1}(R|H_1) dR + C_01 P_1 ∫_{Z_0} p_{r|H_1}(R|H_1) dR
Decision rule that minimizes R (for C_10 > C_00, C_01 > C_11):
Say H_1 if:  Λ(R) ≜ p_{r|H_1}(R|H_1) / p_{r|H_0}(R|H_0)  >  P_0(C_10 − C_00) / [P_1(C_01 − C_11)]  ≜ η
8
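A minimal numerical sketch of this Bayes likelihood-ratio test (not from the slides): a scalar example with assumed Gaussian models r ~ N(−1, σ²) under H_0 and r ~ N(+1, σ²) under H_1, assumed priors and uniform costs, so the threshold reduces to η = P_0/P_1.

```python
import numpy as np
from scipy.stats import norm

# Assumed models for illustration: H0: r ~ N(-1, sigma^2), H1: r ~ N(+1, sigma^2)
sigma = 1.0
P0, P1 = 0.5, 0.5
C00, C11 = 0.0, 0.0
C10, C01 = 1.0, 1.0                         # uniform costs -> min-P(e) test

eta = P0 * (C10 - C00) / (P1 * (C01 - C11))  # Bayes threshold on the likelihood ratio

def likelihood_ratio(r):
    """Lambda(r) = p(r|H1)/p(r|H0) for the Gaussian example."""
    return norm.pdf(r, loc=+1, scale=sigma) / norm.pdf(r, loc=-1, scale=sigma)

def bayes_detector(r):
    """Return 1 if H1 is decided, else 0."""
    return int(likelihood_ratio(r) > eta)

# Quick Monte Carlo check of the Bayes risk (here: the error probability)
rng = np.random.default_rng(0)
N = 100_000
true_h1 = rng.random(N) < P1
r = np.where(true_h1, +1.0, -1.0) + sigma * rng.standard_normal(N)
decisions = likelihood_ratio(r) > eta
print("P(e) estimate:", np.mean(decisions != true_h1))
```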
Detection and Estimation Theory E. G. Larsson
Sufficient statistic
Sometimes the likelihood ratio Λ(R) depends on R only via a simpler function l(R) of R
Then l(R) contains all relevant information about R for the detection
l(R) is called a sufficient statistic
Think of sufficient statistics as data reduction
9
Detection and Estimation Theory E. G. Larsson
Min-error-probability detector
With C_00 = C_11 = 0 and C_10 = C_01 = 1, R will be the probability that the detector was wrong:
R = P_0 P(H_1|H_0) + P_1 P(H_0|H_1) = P_0 P_F + P_1 P_M = P_e
The hypothesis test is
Λ(R) ≷ η = P_0 / P_1 = P_0 / (1 − P_0)
This decision rule is called the minimum-probability-of-error receiver
If P_0 = P_1 = 1/2, the threshold is η = 1
10
Detection and Estimation Theory E. G. Larsson
Ex.: BI-AGN
Model (binary antipodal signaling in additive Gaussian noise): r = s + e, s ∈ {s_0, s_1}, e ~ N(0, σ²)
Performance characterized by P(e)
Optimal receiver?
Probability of error?
Behavior of the threshold when σ² → 0, σ² → ∞?
Behavior of P(e) when σ² → 0, σ² → ∞?
11
Detection and Estimation Theory E. G. Larsson
BI-AGN, cont.
Example: s_0 = −1, s_1 = +1, P(s_0) = 1/4, P(s_1) = 3/4
[Figure: error probability (10^−4 ... 10^0) versus noise level (10^−1 ... 10^2), log-log scale, for the optimal ("opt") and a suboptimal ("subopt") receiver]
12
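A sketch of the comparison suggested by the figure: closed-form P(e) for the example above, for the Bayes threshold versus a prior-ignoring threshold at 0 (taking that as the suboptimal receiver is an assumption for illustration).

```python
import numpy as np
from scipy.stats import norm

Q = norm.sf                    # Q(x) = P(N(0,1) > x)

P0, P1 = 0.25, 0.75            # P(s0 = -1), P(s1 = +1), as in the example above
sigmas = np.logspace(-1, 2, 6)

def pe(gamma, sigma):
    """Error probability of the threshold test 'decide s1 if r > gamma'."""
    return P0 * Q((gamma + 1) / sigma) + P1 * Q((1 - gamma) / sigma)

for sigma in sigmas:
    g_opt = 0.5 * sigma**2 * np.log(P0 / P1)   # Bayes (min-P(e)) threshold
    g_sub = 0.0                                # suboptimal: ignores the priors
    print(f"sigma={sigma:8.3f}  P(e) opt={pe(g_opt, sigma):.4f}  subopt={pe(g_sub, sigma):.4f}")
```

Note how the optimal P(e) saturates at P_0 = 1/4 for large noise (always decide the more likely symbol), while the suboptimal receiver tends to 1/2.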
Detection and Estimation Theory E. G. Larsson
Reliability information: Soft decisions (not in book)
Sometimes (especially in communications with coding) one is interested not only in taking a decision but also in knowing how likely it is that one was right.
The Bayesian framework easily allows us to do this.
Usually soft decisions are expressed in terms of log-likelihood ratios
LLR(b) = log[ P(b=1) / P(b=0) ]
Think of LLR(b) as our belief in b:
LLR(b) = log[ P(b=1) / P(b=0) ] =
  +∞, I am sure that b=1 (P(b=1) = 1)
  −∞, I am sure that b=0 (P(b=1) = 0)
  0, I have no idea about b (P(b=1) = 1/2)
13
Detection and Estimation Theory E. G. Larsson
Additivity of LLRs
Suppose our belief in b a priori is LLR_a priori(b). We then observe a variable r that depends on b. What is our belief in b after we observe r?
LLR_a posteriori(b|r) = log[ P(b=1|r) / P(b=0|r) ] = log[ p(r|b=1)P(b=1) / (p(r|b=0)P(b=0)) ]
  = log[ p(r|b=1) / p(r|b=0) ] + log[ P(b=1) / P(b=0) ]
  = LLR(r|b) + LLR_a priori(b)
Note: if we have two independent observations r_1 and r_2, then
LLR(r_1, r_2|b) = log[ p(r_1|b=1)p(r_2|b=1) / (p(r_1|b=0)p(r_2|b=0)) ] = LLR(r_1|b) + LLR(r_2|b)
These results can be exploited when combining several observations
14
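A small illustration of LLR additivity (the numbers are assumptions): two independent Gaussian observations of an antipodal bit are combined with a prior LLR, using the per-observation channel LLR 2r/σ² derived on the next slide.

```python
import numpy as np

def llr_gaussian(r, sigma2):
    """Channel LLR for antipodal signaling b=0 -> s=-1, b=1 -> s=+1 in N(0, sigma2) noise."""
    return 2.0 * r / sigma2

# Two independent noisy looks at the same bit, plus a prior belief (values assumed)
llr_prior = 0.0                        # no prior preference
r1, r2 = 0.4, -0.1                     # two observations
sigma2_1, sigma2_2 = 0.5, 2.0          # the second look is noisier, so it counts less

llr_post = llr_prior + llr_gaussian(r1, sigma2_1) + llr_gaussian(r2, sigma2_2)
p_b1 = 1.0 / (1.0 + np.exp(-llr_post))  # P(b=1 | r1, r2) from the LLR
print(f"posterior LLR = {llr_post:.3f},  P(b=1|r1,r2) = {p_b1:.3f}")
```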
Detection and Estimation Theory E. G. Larsson
Ex.: BI-AGN channel revisited
Recall: r = s + e, s = ±1 (say b = 0 ↔ s = −1, b = 1 ↔ s = +1), e ~ N(0, σ²)
LLR_a posteriori(b|r) = LLR_a priori(b) + log[ (1/√(2πσ²)) e^{−|r−1|²/(2σ²)} / ( (1/√(2πσ²)) e^{−|r+1|²/(2σ²)} ) ] = LLR_a priori(b) + 2r/σ²
The scaled amplitude r adds to our belief about b.
The smaller σ², the more we believe in the observation r.
If σ² → 0, we will believe completely in r.
If σ² → ∞, we will ignore r, and resort to our a priori belief.
15
Detection and Estimation Theory E. G. Larsson
Ex.: Detection theory for model comparison
A coin is tossed n times; we got h heads and t tails (n = h + t). Is it fair?
Need to define precisely what we mean by "fair". E.g.
  H_0: P_t = P_h = 1/2 (fair)
  H_1: P_t ~ U[0, 1], P_h = 1 − P_t (biased)
What we need to know before tossing is P(H_0)/P(H_1)
What we know after tossing is P(H_1|r)/P(H_0|r)
Compute the a posteriori belief (soft decision):
P(H_1|r) / P(H_0|r) = [ P(H_1) / P(H_0) ] · [ P(r|H_1) / P(r|H_0) ] = ...
In Bayesian detection the result always depends on a priori assumptions
16
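A sketch of the coin example: with the uniform prior on P_t under H_1, the marginal likelihood of the observed sequence is the Beta integral h! t! / (n+1)!, so the Bayes factor has a closed form (the toss counts below are made up).

```python
from math import lgamma, log, exp

def log_bayes_factor(h, t):
    """log[ P(data|H1) / P(data|H0) ] for a coin tossed n = h + t times.
    H0: P_head = 1/2.  H1: P_head uniform on [0,1], so the marginal likelihood
    of the observed sequence is the Beta integral h! t! / (n+1)!."""
    n = h + t
    log_p_H1 = lgamma(h + 1) + lgamma(t + 1) - lgamma(n + 2)   # log B(h+1, t+1)
    log_p_H0 = n * log(0.5)
    return log_p_H1 - log_p_H0

# Posterior odds = prior odds * Bayes factor
for h, t in [(5, 5), (8, 2), (80, 20)]:
    bf = exp(log_bayes_factor(h, t))
    print(f"h={h:3d}, t={t:3d}:  P(r|H1)/P(r|H0) = {bf:9.3f}")
```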
Detection and Estimation Theory E. G. Larsson
Deterministic (non-Bayesian) Approach
Source viewed as deterministic. That is, H_0 or H_1 is deterministically true.
The probabilities P_0, P_1 are not defined! Nor is R or P(e)!
Neyman-Pearson (CFAR) criterion:
  max P_D   subject to   P_F ≤ α
Can show: the NP criterion leads to the decision rule
  Λ(R) ≜ p_{r|H_1}(R|H_1) / p_{r|H_0}(R|H_0) ≷ η,   where ∫_η^∞ p_{Λ|H_0}(λ) dλ = α
By choosing η, P_D can be traded for P_F
17
Detection and Estimation Theory E. G. Larsson
Ex.: Detecting a radar signal
18
Detection and Estimation Theory E. G. Larsson
Receiver Operating Characteristics (ROC)
For any detector, the pair (P_D, P_F) characterizes performance (and the Bayesian cost is a function of (P_D, P_F))
A plot of P_D versus P_F fully characterizes the detector performance and is called the ROC curve
The ROC is always defined (both under the Bayesian and the orthodox paradigm)
Two important properties:
  Always concave
  Slope at a point (P_F, P_D) = threshold η required to reach that point
Probability of error: P(e) = P_0 P_F + P_1 P_M (if P_0, P_1 are defined!)
19
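A small sketch of an ROC computation for the Gaussian shift problem (the detection index d = 2 is an assumed value), also illustrating the slope property: the ROC slope at (P_F, P_D) equals the likelihood-ratio threshold used to reach that point.

```python
import numpy as np
from scipy.stats import norm

# Sufficient statistic l ~ N(0,1) under H0 and l ~ N(d,1) under H1 (d assumed)
d = 2.0
PF = np.array([1e-4, 1e-3, 1e-2, 0.1, 0.5])
gamma = norm.isf(PF)                   # threshold on l giving each P_F
PD = norm.sf(gamma - d)                # P_D = Q(Q^{-1}(P_F) - d)
eta = np.exp(d * gamma - d**2 / 2)     # LR threshold = slope of the ROC at (P_F, P_D)

for pf, pd, e in zip(PF, PD, eta):
    print(f"P_F={pf:8.1e}  P_D={pd:.4f}  threshold (ROC slope)={e:10.3f}")
```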
Detection and Estimation Theory E. G. Larsson
Binary tests: some important points
For the Bayesian approach,
  The criterion was to minimize the cost R (e.g. P(e))
  One can also compute soft decisions (a posteriori beliefs)
For the deterministic approach,
  Cannot talk about P_0, P_1, P(e), R
  The criterion is to maximize P_D for given P_F.
Both approaches (Bayesian & deterministic) lead to a likelihood ratio test.
An LR test reduces the problem dimensionality from dim(r) to 1 (one)
Performance of any detector is characterized by
  the P_D, P_F tradeoff via the ROC (always)
  the Bayes cost (e.g. error probability), only for a random source.
20
Detection and Estimation Theory E. G. Larsson
Lecture 2
21
Detection and Estimation Theory E. G. Larsson
M-ary hypothesis testing
Often, we want to decide among M > 2 hypotheses.
E.g., communications (>BPSK), target discrimination in radar
We focus on the Bayes criterion, because in this case the extension is straightforward.
Bayes cost:
R = Σ_{i=0}^{M−1} Σ_{j=0}^{M−1} C_ij P_j P(i|j)
P_j = the chance that the source selected H_j
P(i|j) = the probability that the detector says H_i when H_j was true
C_ij = the cost of deciding on H_i when actually H_j was true
22
Detection and Estimation Theory E. G. Larsson
Decision regions
Main result: Define, for i = 1, ..., M−1 (H_i arbitrarily ordered)
Λ_i(r) = p_{r|H_i}(r|H_i) / p_{r|H_0}(r|H_0)
Then, regardless of the underlying distributions, the decision regions in Λ-space (Λ = [Λ_1, ..., Λ_{M−1}]^T) are separated by hyperplanes in R^{M−1}
Example: For M = 3 (see book), decide "H_1 or H_2" vs "H_0 or H_2" according to
  P_1(C_01 − C_11) Λ_1(r) ≷ P_0(C_10 − C_00) + P_2(C_12 − C_02) Λ_2(r)
decide "H_2 or H_1" vs "H_0 or H_1" according to
  P_2(C_02 − C_22) Λ_2(r) ≷ P_0(C_20 − C_00) + P_1(C_21 − C_01) Λ_1(r)
and decide "H_2 or H_0" vs "H_1 or H_0" according to
  P_2(C_12 − C_22) Λ_2(r) ≷ P_0(C_20 − C_10) + P_1(C_21 − C_11) Λ_1(r)
The detector reduces the problem dimensionality from dim(r) to M − 1
23
Detection and Estimation Theory E. G. Larsson
Minimum-probability-of-error detector
With C_ii = 0, and C_ij = 1 for j ≠ i, the Bayes cost is
R = Σ_{i=0}^{M−1} Σ_{j=0}^{M−1} C_ij P_j P(i|j) = Σ_{i=0}^{M−1} Σ_{j=0, j≠i}^{M−1} P_j P(i|j) = 1 − Σ_{i=0}^{M−1} P(i, i) = P(e)
Optimal detector that minimizes P(e): max_i P(H_i|r)
Proof: see book. Short proof: Find Z_i that maximize
P(c) = 1 − P(e) = Σ_i P(H_i, H_i) = Σ_{i=0}^{M−1} P(H_i|H_i) P(H_i)
  = Σ_{i=0}^{M−1} P(H_i) ∫_{Z_i} p(r|H_i) dr = ∫ Σ_{i=0}^{M−1} I_i(r) p(r|H_i) P(H_i) dr
24
Detection and Estimation Theory E. G. Larsson
Reliability calculations
The important quantity is the posterior:
P(H_i|r) = p(r|H_i) P(H_i) / p(r)
It can be obtained from the likelihoods and the priors:
P(H_i|r) = p(r|H_i) P(H_i) / p(r) = p(r|H_i) P(H_i) / Σ_{j=0}^{M−1} p(r|H_j) P(H_j)
The last operation, marginalization, is the fundamental tool in all Bayesian inference.
25
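A minimal sketch of this marginalization, computing P(H_i|r) from log-likelihoods and priors in a numerically stable way (the numbers are placeholders).

```python
import numpy as np

def posteriors(log_likelihoods, priors):
    """P(H_i | r) from log p(r|H_i) and P(H_i), using
    p(r) = sum_j p(r|H_j) P(H_j), computed in the log domain for stability."""
    log_post = np.asarray(log_likelihoods) + np.log(priors)
    log_post -= np.max(log_post)          # avoid overflow before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Toy numbers (assumed): three hypotheses, likelihoods evaluated at the observed r
print(posteriors(log_likelihoods=[-1.2, -0.3, -4.0], priors=[0.5, 0.3, 0.2]))
```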
Detection and Estimation Theory E. G. Larsson
Bit-level reliability information for multiple hypotheses
If each hypothesis is defined by K independent bits b_1, ..., b_K:
LLR(b_k|r) = log[ Σ_{H_i: b_k(H_i)=1} P(H_i|r) / Σ_{H_i: b_k(H_i)=0} P(H_i|r) ]
  = log[ Σ_{H_i: b_k(H_i)=1} p(r|H_i) P(H_i) / Σ_{H_i: b_k(H_i)=0} p(r|H_i) P(H_i) ]
  = log[ Σ_{H_i: b_k(H_i)=1} p(r|H_i) Π_{j=1}^K P(b_j = b_j(H_i)) / Σ_{H_i: b_k(H_i)=0} p(r|H_i) Π_{j=1}^K P(b_j = b_j(H_i)) ]
  = log[ ( Σ_{H_i: b_k(H_i)=1} p(r|H_i) Π_{j=1, j≠k}^K P(b_j = b_j(H_i)) ) P(b_k = 1) / ( ( Σ_{H_i: b_k(H_i)=0} p(r|H_i) Π_{j=1, j≠k}^K P(b_j = b_j(H_i)) ) P(b_k = 0) ) ]
  = log[ Σ_{H_i: b_k(H_i)=1} p(r|H_i) Π_{j=1, j≠k}^K P(b_j = b_j(H_i)) / Σ_{H_i: b_k(H_i)=0} p(r|H_i) Π_{j=1, j≠k}^K P(b_j = b_j(H_i)) ] + LLR_a priori(b_k)
This is called a soft-in soft-out detector
26
Detection and Estimation Theory E. G. Larsson
M-ary tests: some important points
Hard detection:
The optimal detector changes the dimension from dim(r) to M − 1
The decision regions in Λ-space (likelihood ratios) are polyhedra, regardless of the underlying distributions
The decision regions in r-space are not necessarily polyhedra (except for Gaussian problems, see later)
Soft detection:
Computing reliability information (soft decisions) is a marginalization
problem
27
Detection and Estimation Theory E. G. Larsson
Gaussian problems
Often, r is Gaussian under either hypothesis
Why are Gaussian distributions so common in (electrical) engineering? Possible answers:
  the central-limit theorem (noise is a sum of many sources),
  mathematical beauty/simplicity (quadratic forms after taking the log),
  the Gaussian has maximal entropy under a power constraint
We shall first study the binary problem to discriminate between
  H_0: r ~ N(m_0, K_0)
  H_1: r ~ N(m_1, K_1)
Two important cases: (A) K_0 = K_1 = K. (B) m_0 = m_1 = m
28
Detection and Estimation Theory E. G. Larsson
A: Gaussians, equal covariance (K_0 = K_1 = K)
The optimal test is given by l(r) = (m_1 − m_0)^T K^{−1} r, which is Gaussian under both hypotheses with
E[l|H_0] = (m_1 − m_0)^T K^{−1} m_0,  E[l|H_1] = (m_1 − m_0)^T K^{−1} m_1,  var[l] = (m_1 − m_0)^T K^{−1} (m_1 − m_0)
Consider l'(r) ≜ [ l(r) − (m_1 − m_0)^T K^{−1}(m_1 + m_0)/2 ] / [ (m_1 − m_0)^T K^{−1}(m_1 − m_0)/2 ]
  ~ N(−1, (2/d)²) under H_0,   N(+1, (2/d)²) under H_1
where d ≜ √( (m_1 − m_0)^T K^{−1}(m_1 − m_0) )
The test is equivalent to the BI-AGN problem
E.g. for P(H_0) = P(H_1) = 1/2:  P(e) = Q(d/2)
29
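A numerical sketch of case A (the means and covariance below are assumptions): compute the sufficient statistic l(r) = (m_1 − m_0)^T K^{−1} r, the distance d, and check P(e) = Q(d/2) by Monte Carlo for equal priors.

```python
import numpy as np
from scipy.stats import norm

# Binary Gaussian problem with common covariance K (all numbers assumed)
m0 = np.array([0.0, 0.0])
m1 = np.array([1.0, 2.0])
K = np.array([[2.0, 0.5],
              [0.5, 1.0]])

dm = m1 - m0
w = np.linalg.solve(K, dm)            # K^{-1}(m1 - m0): weights of the sufficient statistic
d = np.sqrt(dm @ w)                   # d^2 = (m1-m0)^T K^{-1} (m1-m0)
print(f"d = {d:.3f},  P(e) theory = {norm.sf(d / 2):.4f}")

# Monte Carlo check: l(r) = w^T r compared with the midpoint threshold (equal priors)
rng = np.random.default_rng(1)
L = np.linalg.cholesky(K)
N = 200_000
h = rng.integers(0, 2, N)
r = np.where(h[:, None] == 1, m1, m0) + rng.standard_normal((N, 2)) @ L.T
thr = 0.5 * (m0 + m1) @ w
print("P(e) Monte Carlo:", np.mean(((r @ w) > thr) != (h == 1)))
```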
Detection and Estimation Theory E. G. Larsson
Special case: K = σ² I
Here d = ‖m_1 − m_0‖/σ (Euclidean distance).
P(e) is a special case of the above.
Alternative, brute-force proof of P(e) for P(H_0) = P(H_1) = 1/2:
P(e|H_0) = P( ‖r − m_1‖ ≤ ‖r − m_0‖ | H_0 )
  = P( ‖r‖² + ‖m_1‖² − 2r^T m_1 ≤ ‖r‖² + ‖m_0‖² − 2r^T m_0 | H_0 )
  = P( ‖m_1‖² − ‖m_0‖² − 2(m_0 + e)^T(m_1 − m_0) ≤ 0 )
  = P( ‖m_1‖² + ‖m_0‖² − 2m_0^T m_1  [ = ‖m_1 − m_0‖² ]  ≤  2e^T(m_1 − m_0)  [ ~ N(0, 4σ²‖m_1 − m_0‖²) ] )
  = P( ‖m_1 − m_0‖² / √(4σ²‖m_1 − m_0‖²) ≤ N(0, 1) ) = Q( ‖m_1 − m_0‖/(2σ) )
30
Detection and Estimation Theory E. G. Larsson
General K
If K is diagonal, l(r) = Σ_{i=1}^N [ (m_1 − m_0)_i / σ_i² ] r_i
Components of r are weighted with 1/σ_i²
The distance is d² = (m_1 − m_0)^T K^{−1}(m_1 − m_0) = Σ_{i=1}^N (m_1 − m_0)_i² / σ_i²
For nondiagonal K, write
K = TΛT^T
The problem is equivalent to a diagonal-K problem after rotating the space by T^T:
r' = T^T r,  m_i' = T^T m_i
31
Detection and Estimation Theory E. G. Larsson
B. Gaussians, equal mean
Consider now the problem of observing r and deciding among
  H_0: r ~ N(m, K_0)
  H_1: r ~ N(m, K_1)
Can show: the test is given by
l(r) = (r − m)^T (K_0^{−1} − K_1^{−1}) (r − m) ≷ γ
If K_0 = σ_n² I and K_1 = (σ_s² + σ_n²) I, then with l'(r) = ‖r − m‖²:
l'(r) ~ σ_n² χ²_N under H_0,   (σ_n² + σ_s²) χ²_N under H_1
(a chi-square with N degrees of freedom, scaled by the per-component variance)
32
Detection and Estimation Theory E. G. Larsson
Gaussians, equal mean, cont.
Fig 2.35 from text here
33
Detection and Estimation Theory E. G. Larsson
M-ary Gaussian problem in white noise
Suppose r ~ N(m_i, σ² I) (N-dimensional AWGN vector channel)
Optimal detector: min_i { ‖r − m_i‖² − 2σ² log P(H_i) }
Error probability: P(e) = 1 − Σ_i P(H_i) ∫_{Z_i} p(r|H_i) dr,  with p(r|H_i) = (2πσ²)^{−N/2} exp( −‖r − m_i‖²/(2σ²) )
Reliability information:
LLR(b_k|r) = log[ Σ_{m: b_k=1} exp( −‖r − m‖²/(2σ²) ) P(m) / Σ_{m: b_k=0} exp( −‖r − m‖²/(2σ²) ) P(m) ]
(In practice, use log(e^a + e^b) = max(a, b) + log(1 + e^{−|a−b|}).)
34
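A sketch of the bit-LLR computation for a small constellation (the 4-point labelling and noise level are assumptions), including the exact log-sum-exp evaluation and the max-log shortcut mentioned above.

```python
import numpy as np

def bit_llrs(r, points, bits, sigma2, max_log=False):
    """LLR(b_k | r) for an AWGN vector channel r = m_i + noise, noise ~ N(0, sigma2*I).
    `points` is (M, N) with the signal vectors m_i, `bits` is (M, K) with the bit
    labels b_k(H_i); equal priors are assumed for simplicity."""
    metrics = -np.sum((r - points) ** 2, axis=1) / (2 * sigma2)   # log p(r|H_i) + const
    llrs = []
    for k in range(bits.shape[1]):
        m1, m0 = metrics[bits[:, k] == 1], metrics[bits[:, k] == 0]
        if max_log:                                   # max-log approximation
            llrs.append(m1.max() - m0.max())
        else:                                         # exact: log-sum-exp of each group
            llrs.append(np.logaddexp.reduce(m1) - np.logaddexp.reduce(m0))
    return np.array(llrs)

# 4-point example (Gray-labelled QPSK-like constellation, chosen for the demo)
points = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=float)
bits = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
r = np.array([0.8, -0.2])
print("exact  :", bit_llrs(r, points, bits, sigma2=0.5))
print("max-log:", bit_llrs(r, points, bits, sigma2=0.5, max_log=True))
```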
Detection and Estimation Theory E. G. Larsson
Example of r-space decision regions (N = 2, M = 4)
[Figure: r-space decision regions on a [−3, 3] × [−3, 3] grid, with signal points s_1, ..., s_4 marked, for
 (a) equiprobable hypotheses,
 (b) P_1 = 0.4, P_2 = P_3 = P_4 = 0.2,
 (c) P_1 = 0.85, P_2 = P_3 = P_4 = 0.05]
A larger a priori probability gives a larger decision region
A decision region does not necessarily contain its own signal point
35
Detection and Estimation Theory E. G. Larsson
Signal sets
Consider the signal set {m_i}
In the AWGN vector channel, the signal space is invariant to translation/rotation. Proof: change variables in the P(e) expression
The translation α that minimizes the average energy E[‖m_i − α‖²] is α = E[m_i]
Proof: For any possible translation α,
E[‖m − α‖²] = E[‖m − E[m] + E[m] − α‖²]
  = E[‖m − E[m]‖²]  (indep. of α)  +  ‖E[m] − α‖²  (zero if α = E[m])  +  2(E[m] − α)^T E[m − E[m]]  (= 0)
Example: Recall that the binary antipodal set is 3 dB better than the orthogonal one
36
Detection and Estimation Theory E. G. Larsson
Gaussian problems, some important points
Most expressions are quadratic forms
Both Λ-space and r-space decision regions become polyhedra
Linear transformations (e.g., noise whitening) preserve Gaussianity
37
Detection and Estimation Theory E. G. Larsson
Lecture 3
38
Detection and Estimation Theory E. G. Larsson
Estimation Theory
Data r is observed from a model that depends on a parameter θ. Given r, we want to determine an estimate θ̂(r) of the parameter θ.
Bayesian philosophy
  The true θ is random (selected by the source) according to p(θ)
  Want to find θ̂(r) which is close to θ on the average (avg. over θ)
  What we know before seeing r is p(θ). What we know after seeing r is p(θ|r)
Deterministic philosophy
  θ is deterministic
  p(θ) is not defined, nor is p(θ|r)
  Want to find θ̂(r) which is good (in some sense) for a range of θ
39
Detection and Estimation Theory E. G. Larsson
Bayesian estimation
To an error, associate a cost: C = f(θ − θ̂)
The Bayesian estimate is the estimate θ̂(r) which minimizes the average cost:
R = E[C]  (average over θ, r)
Some important examples of cost (scalar θ):
1. C = |θ − θ̂|²,  R = E[|θ − θ̂|²]  (mean-square error, MSE)
2. C = 0 if |θ − θ̂| ≤ Δ, C = 1 if |θ − θ̂| > Δ, where Δ is small  (min-prob. of wrong guess)
3. C = |θ − θ̂|  (absolute error)
40
Detection and Estimation Theory E. G. Larsson
The MMSE estimator
1. With R = E[|θ − θ̂|²] the optimal θ̂(r) is the posterior (conditional) mean
θ̂(r) = E[θ|r] = ∫ θ p(θ|r) dθ
Proof: Consider R = ∫∫ (θ − θ̂(r))² p(θ|r) p(r) dθ dr = ...
Alternatively: Let f(r) be an estimate of θ. Show that Q(f(r)) = E[(θ − f(r))²|r] is minimized for f(r) = E[θ|r].
Note: θ is a random variable. θ̂(r) is a function of r.
2. With C = 0 for |θ − θ̂| ≤ Δ and C = 1 for |θ − θ̂| > Δ, we have
R = E[C] = P( |θ̂(r) − θ| > Δ )
For small Δ, R is minimized by the MAP estimate θ̂ = argmax_θ p(θ|r)
If p(θ) is flat over all θ of interest (no a priori information),
θ̂ = argmax_θ p(r|θ)
The MAP estimate reduces to the ML estimate.
3. With R = E[|θ − θ̂|], the optimal estimate is the median of p(θ|r). Suppose p(θ|r) is symmetric around E[θ|r], and that C(·) is symmetric and nondecreasing (e.g. C = |θ − θ̂|²).
This means that most sensible cost functions yield the same Bayesian estimate as the MMSE cost function does.
Thus, for most estimation in practice, it is enough to use the MMSE estimate.
43
Detection and Estimation Theory E. G. Larsson
Extension to vector valued parameters
MMSE: C = ‖θ − θ̂‖², R = E[‖θ − θ̂‖²]. Estimate: θ̂(r) = E[θ|r]
Example: Suppose r = Hθ + e where θ ~ N(μ, P), e ~ N(0, Q)
Then E[θ|r] = μ + P H^T (H P H^T + Q)^{−1} (r − Hμ)
Proof: Consider the joint density
[θ; r] ~ N( [μ; Hμ], [P, P H^T; H P, H P H^T + Q] )
Can show that C ≜ Cov(θ|r) = P − P H^T (H P H^T + Q)^{−1} H P
The error is E[‖θ − θ̂‖²] = Tr{C}
The reduction in uncertainty about θ learned from r is
P − C = P H^T (H P H^T + Q)^{−1} H P
44
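A sketch of the MMSE estimator for the linear Gaussian model above; mu, P and Q mirror the prior mean, prior covariance and noise covariance (the symbol P for the prior covariance and the test numbers are our assumptions).

```python
import numpy as np

def mmse_linear_gaussian(r, H, mu, P, Q):
    """Posterior mean and covariance of theta in r = H theta + e,
    theta ~ N(mu, P), e ~ N(0, Q)."""
    S = H @ P @ H.T + Q                     # covariance of r
    G = P @ H.T @ np.linalg.inv(S)          # "gain" matrix
    theta_hat = mu + G @ (r - H @ mu)       # E[theta | r]
    C = P - G @ H @ P                       # Cov(theta | r)
    return theta_hat, C

# Small sanity check with assumed numbers
rng = np.random.default_rng(0)
H = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
mu = np.zeros(2); P = np.eye(2); Q = 0.1 * np.eye(3)
theta = rng.multivariate_normal(mu, P)
r = H @ theta + rng.multivariate_normal(np.zeros(3), Q)
theta_hat, C = mmse_linear_gaussian(r, H, mu, P, Q)
print("true :", theta, "\nMMSE :", theta_hat, "\nerror bars:", np.sqrt(np.diag(C)))
```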
Detection and Estimation Theory E. G. Larsson
Example
Consider
r = θ + e,  θ ~ N(μ, σ_θ²),  e ~ N(0, σ²)
Then
E[θ|r] = μ / (1 + σ_θ²/σ²) + [ (σ_θ²/σ²) / (1 + σ_θ²/σ²) ] r
The estimate is a weighted sum of μ and r
At high SNR (σ_θ²/σ² → ∞) we trust the data (r).
At low SNR (σ_θ²/σ² → 0) we resort to the prior knowledge of θ.
45
Detection and Estimation Theory E. G. Larsson
Deterministic estimation
The MMSE estimate is good and simple. Why look for other estimators?
  Evaluating E[θ|r] = ∫ θ p(θ|r) dθ can be very hard computationally
  One may not know a reasonable a priori density p(θ)
  Or one may just refuse to accept the Bayesian philosophy
In the deterministic framework, θ is constant but θ̂(r) is random (it is a function of r).
The performance of an estimator can be quantified in many ways, e.g.:
  Bias: E[θ̂] − θ
  Error covariance / MSE matrix: E[(θ̂ − θ)(θ̂ − θ)^T]
46
Detection and Estimation Theory E. G. Larsson
How to find an estimator?
Maximum likelihood: θ̂ = argmax_θ p(r|θ)
  Usually good. Asymptotically optimal under broad conditions.
  (Writing p(r|θ) is some notation misuse. Write p(r; θ) instead.)
Moment-based estimates
Experience, trial-and-error, ad hoc techniques
What is the best estimator?
47
Detection and Estimation Theory E. G. Larsson
Example
Consider
r_i = θ + e_i,  e_i i.i.d. N(0, σ²)
Estimate θ.
Clearly, the following sample-mean estimator
θ̂ = (1/N) Σ_{i=1}^N r_i ~ N(θ, σ²/N)
is unbiased and has variance σ²/N. (This is actually the ML estimate.)
Is this the lowest possible variance we can get?
Is there a better estimator?
48
Detection and Estimation Theory E. G. Larsson
Cramer-Rao bound
Classic/fundamental theorem: Suppose E[θ̂] = θ (unbiased). Then Cov(θ̂ − θ) ⪰ I^{−1}, where I is the matrix with elements
[I]_{ij} = −E[ ∂² log p(r|θ) / (∂θ_i ∂θ_j) ] = E[ ( ∂ log p(r|θ)/∂θ_i )( ∂ log p(r|θ)/∂θ_j ) ],   i, j = 1, ..., N
I is called the Fisher information matrix (FIM)
49
Detection and Estimation Theory E. G. Larsson
Some properties of the FIM
The CRB = I^{−1} has many uses:
  benchmark a particular estimator
  predict the fundamental hardness of an estimation problem
  say something about accuracy before an estimator has been found
An estimate is called efficient if it attains the CRB
ML is usually (asymptotically) efficient. Indeed θ̂_ML is asymptotically N(θ, I^{−1}) under weak conditions. For linear Gaussian problems this is true even in finite samples.
Ex: r = Hθ + e, e ~ N(0, Q)
Then the ML estimate is unbiased, efficient and given by
θ̂_ML = (H^T Q^{−1} H)^{−1} H^T Q^{−1} r ~ N(θ, (H^T Q^{−1} H)^{−1})
50
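A sketch checking that the ML/GLS estimate of the linear model attains the CRB (H, Q and theta below are assumed test values).

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear model r = H theta + e, e ~ N(0, Q); numbers are assumptions for the demo
N, p = 50, 2
H = np.column_stack([np.ones(N), np.arange(N)])
Q = 0.5 * np.eye(N)
theta = np.array([1.0, 0.2])

Qinv = np.linalg.inv(Q)
F = H.T @ Qinv @ H                       # Fisher information matrix
CRB = np.linalg.inv(F)                   # covariance of the efficient ML estimate

# Monte Carlo: the ML/GLS estimate attains the CRB in this linear Gaussian problem
est = []
for _ in range(2000):
    r = H @ theta + rng.multivariate_normal(np.zeros(N), Q)
    est.append(np.linalg.solve(F, H.T @ Qinv @ r))   # (H^T Q^-1 H)^-1 H^T Q^-1 r
est = np.array(est)
print("CRB diag      :", np.diag(CRB))
print("empirical var :", est.var(axis=0))
```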
Detection and Estimation Theory E. G. Larsson
Some properties of the FIM, cont.
Adding more unknowns makes the problem harder: 1/I_ii ≤ [I^{−1}]_ii
Unwanted unknowns are called nuisance parameters
If I_ij = 0, then θ_i and θ_j are said to be decoupled
If θ̂ ~ N(θ, I^{−1}), then I^{−1} defines error ellipsoids
The CRB is usually a good prediction of the variance at high SNR / large number of samples, but at finite SNR it fails to predict the threshold effect for nonlinear problems
Better bounds than the CRB do exist (e.g. the Barankin bound, the Bhattacharyya bound)
51
Detection and Estimation Theory E. G. Larsson
CRB: An important example
Consider a Gaussian model
r ~ N( μ(θ), C(θ) )
Then
I_ij = ( dμ(θ)/dθ_i )^T C^{−1}(θ) ( dμ(θ)/dθ_j ) + (1/2) Tr{ C^{−1}(θ) (dC(θ)/dθ_i) C^{−1}(θ) (dC(θ)/dθ_j) }
the Slepian-Bangs formula
Example: CRB for the linear model
52
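A sketch of the Slepian-Bangs formula with numerical derivatives; the example model (linear mean, white noise with unknown variance) is an assumption chosen so the answer is easy to sanity-check.

```python
import numpy as np

def slepian_bangs_fim(theta, mean_fn, cov_fn, eps=1e-6):
    """FIM for r ~ N(mu(theta), C(theta)) via the Slepian-Bangs formula,
    with the derivatives taken numerically (central differences)."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    Cinv = np.linalg.inv(cov_fn(theta))

    def d(fn, i):
        e = np.zeros(p); e[i] = eps
        return (fn(theta + e) - fn(theta - e)) / (2 * eps)

    dmu = [d(mean_fn, i) for i in range(p)]
    dC = [d(cov_fn, i) for i in range(p)]
    I = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            I[i, j] = dmu[i] @ Cinv @ dmu[j] \
                      + 0.5 * np.trace(Cinv @ dC[i] @ Cinv @ dC[j])
    return I

# Example: mean H*theta[:2], covariance theta[2]*I (i.e. unknown noise variance)
H = np.column_stack([np.ones(10), np.arange(10.0)])
mean_fn = lambda th: H @ th[:2]
cov_fn = lambda th: th[2] * np.eye(10)
I = slepian_bangs_fim([1.0, 0.5, 0.3], mean_fn, cov_fn)
print("CRB =\n", np.linalg.inv(I))
```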
Detection and Estimation Theory E. G. Larsson
Notes
say something in the lecture about how to handle nuisance parameters
with Bayesian estimation and with classical estimation
53
Detection and Estimation Theory E. G. Larsson
Lecture 4
54
Detection and Estimation Theory E. G. Larsson
Composite hypotheses: M = 2
Sometimes there are unknown parameters under H_0 and H_1
E.g.:
  H_0: r_i = e_i
  H_1: r_i = s_i + e_i
with s_i known but e_i i.i.d. N(0, σ²); σ² unknown
In general, θ_0 is unknown under H_0 and θ_1 is unknown under H_1
Usually the θ_i are nuisance parameters, of no independent interest
As before, there is a Bayesian and a deterministic approach
If the test does not depend on the θ_i, the LR test is said to be uniformly most powerful (UMP)
A UMP test does not exist for most interesting problems we encounter.
55
Detection and Estimation Theory E. G. Larsson
Bayesian approach
If p(θ_i) is known, then integrate out the unknown:
p(r|H_i) = ∫ p(r|H_i, θ_i) p(θ_i) dθ_i
Likelihood ratio (binary test):
Λ = p(r|H_1) / p(r|H_0) = ∫ p(r|H_1, θ_1) p(θ_1) dθ_1 / ∫ p(r|H_0, θ_0) p(θ_0) dθ_0
Conceptually straightforward
Requires the a priori densities of the nuisance parameters to be known
The integrals can be hard to compute
56
Detection and Estimation Theory E. G. Larsson
Deterministic approach
Consider the likelihood ratio with θ_i replaced by estimates θ̂_i:
Λ = p(r|H_1, θ_1) / p(r|H_0, θ_0)  →  p(r|H_1, θ̂_1) / p(r|H_0, θ̂_0)
With maximum-likelihood estimates,
Λ_GLRT = p(r|H_1, θ̂_{1,ML}) / p(r|H_0, θ̂_{0,ML}) = max_{θ_1} p(r|H_1, θ_1) / max_{θ_0} p(r|H_0, θ_0)
This is called the generalized likelihood ratio test (GLRT)
The GLRT often has good properties and is very useful in practice
57
Detection and Estimation Theory E. G. Larsson
Useful example: GLRT for Linear Gaussian Model
Consider r = Hθ + e, e ~ N(0, σ² I); generalize by prewhitening.
GLR tests for two problems (Π_H ≜ H(H^T H)^{−1} H^T denotes the projection onto the columns of H):
Test H_0: θ = 0 vs H_1: θ ≠ 0
  σ² known:    ‖Π_H r‖² / σ²  ≷ γ
  σ² unknown:  ‖Π_H r‖² / ‖r − Π_H r‖²  ≷ γ
Test H_0: Aθ = b vs H_1: Aθ ≠ b
  σ² known:    (Aθ̂ − b)^T [ A(H^T H)^{−1} A^T ]^{−1} (Aθ̂ − b) / σ²  ≷ γ
  σ² unknown:  (Aθ̂ − b)^T [ A(H^T H)^{−1} A^T ]^{−1} (Aθ̂ − b) / ‖r − Π_H r‖²  ≷ γ
where θ̂ = (H^T H)^{−1} H^T r. Note: ‖r − Π_H r‖² = ‖r − Hθ̂‖² = ‖r‖² − ‖Π_H r‖²
Note that the tests for unknown σ² are scaling invariant
58
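A sketch of the scale-invariant GLRT statistic for H_0: θ = 0 with unknown σ² (the ratio of projected to residual energy); H and the test signal are assumed values.

```python
import numpy as np

def glrt_stat(r, H):
    """Scale-invariant GLRT statistic for H0: theta = 0 vs H1: theta != 0
    in r = H theta + e, e ~ N(0, sigma2 I) with sigma2 unknown:
    T = ||P_H r||^2 / ||P_H_perp r||^2 (a monotone function of the GLR)."""
    theta_hat = np.linalg.lstsq(H, r, rcond=None)[0]
    r_par = H @ theta_hat                      # projection of r onto the columns of H
    r_perp = r - r_par
    return (r_par @ r_par) / (r_perp @ r_perp)

rng = np.random.default_rng(0)
N, p = 100, 3
H = rng.standard_normal((N, p))
noise = rng.standard_normal(N)

print("H0 (no signal)  :", glrt_stat(noise, H))
print("H1 (signal on)  :", glrt_stat(H @ np.array([0.5, -0.3, 0.2]) + noise, H))
print("H0, r scaled x10:", glrt_stat(10 * noise, H))   # unchanged: scaling invariance
```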
Detection and Estimation Theory E. G. Larsson
Asymptotic performance of GLRT (not in book)
Consider the test
  H_0: p(r|H_0, θ_r, θ_s),  θ_r = θ_r0,  θ_s nuisance
  H_1: p(r|H_1, θ_r, θ_s),  θ_r ≠ θ_r0
The GLRT decides H_1 if T(r) = max_{θ_r, θ_s} p(r|H_1, θ_r, θ_s) / max_{θ_s} p(r|H_0, θ_r0, θ_s) > γ
Asymptotically (large number of samples), e.g. Kay-II p. 205:
2 log T(r) ~ χ²_r under H_0,   r = dim(θ_r)
           ~ χ'²_r(λ) (noncentral) under H_1
where
λ = (θ_r1 − θ_r0)^T [ I_{θ_r θ_r} − I_{θ_r θ_s} I_{θ_s θ_s}^{−1} I_{θ_s θ_r} ] (θ_r1 − θ_r0)
and the I's are evaluated at θ_s, θ_r0
59
Detection and Estimation Theory E. G. Larsson
Composite hypotheses: M > 2
Bayesian solution:
Eliminate θ_i by marginalization,
p(r|H_i) = ∫ p(r|H_i, θ_i) p(θ_i) dθ_i
and use the standard Bayesian-cost criterion
With the min-P(e) criterion, just compute
argmax_i P(H_i|r) = argmax_i [ P(H_i) ∫ p(r|θ_i, H_i) p(θ_i|H_i) dθ_i ]
This integral can be hard to find in closed form!
60
Detection and Estimation Theory E. G. Larsson
M > 2, deterministic approach
For M > 2, the GLRT is not defined.
Suppose one simply tried to find max_i [ max_{θ_i} p(r|H_i, θ_i) ]
Unfortunately, this decision rule usually cannot do the job. If the models are inclusive, θ_i = [θ_{i−1}, ϑ_i], it will always choose H_i over H_{i−1}:
max_{θ_i} p(r|θ_i) = max_{θ_{i−1}} max_{ϑ_i} p(r|θ_{i−1}, ϑ_i) ≥ max_{θ_{i−1}} p(r|θ_{i−1}, ϑ_i = fixed)
Examples (model selection):
  Regression-like polynomial fit with unknown polynomial order
  FIR (tapped delay line) coefficient identification with unknown length
  Determining the rank of a sample covariance matrix
  Estimating the number of targets in radar
Fundamental difficulty: a more flexible model fits the data better.
61
Detection and Estimation Theory E. G. Larsson
M > 2, deterministic approach, cont.
Many methods rely on penalization of models with many parameters:
max_i [ log( max_{θ_i} p(r|H_i, θ_i) ) − γ_i ]
where γ_i depends on the number of elements in θ_i and in r; typically γ_i grows as dim(θ_i) log(dim(r)). Examples: AIC, MDL, ...
What we want is Occam's razor:
  Among two explanations for the same data, choose the simplest
The deterministic approach needs ad hoc regularization. How is it with the Bayesian criterion: do we need to penalize flexible models there too, i.e., bias the a priori probabilities P(H_i) to favor the simpler model? No.
62
Detection and Estimation Theory E. G. Larsson
Bayesian composite hypothesis testing: an
approximation
A tool we need: the saddlepoint (Laplace) approximation to a function q(x) (think of q(x) as an un-normalized pdf):
log q(x) ≈ log q(x_0) − (1/2)(x − x_0)^T A (x − x_0) + H.O.T.
Then
∫ q(x) dx ≈ q(x_0) ∫ e^{−(1/2)(x−x_0)^T A(x−x_0)} dx = q(x_0) √|2πA^{−1}|
Apply this to P(θ_i|r, H_i) ≈ N(θ̂_i, Q_i),   θ̂_i = argmax_{θ_i} P(θ_i|r, H_i)
Q_i determines the uncertainty ellipsoid (error bars) for θ̂_i
The accuracy of the Gaussian approximation increases with the size of the data record
63
Detection and Estimation Theory E. G. Larsson
Bayesian approach, cont.
Use the saddlepoint approximation on the posterior density:
P(H_i|r) ∝ P(r|H_i) P(H_i) = P(H_i) ∫ P(r|θ_i, H_i) P(θ_i|H_i) dθ_i      (integrand ∝ P(θ_i|r, H_i))
  ≈ P(H_i) · P(r|θ̂_i, H_i) P(θ̂_i|H_i)  (peak of the integrand, at θ_i = θ̂_i)  · √|2πQ_i|  (Laplace factor)
  = P(H_i)  [a priori]  ·  P(r|θ̂_i, H_i)  [best-fit likelihood]  ·  P(θ̂_i|H_i) √|2πQ_i|  [Occam factor]
The Occam factor will automatically favor the less flexible model.
There is no need to penalize flexibility nor to bias P(H_i) towards simpler models.
64
Detection and Estimation Theory E. G. Larsson
Estimation and Hypothesis testing in practice
If a Bayesian approach is computationally feasible, use it
  Given the assumptions, inference is unique: p(θ|r) = p(r|θ)p(θ) / ∫ p(r|θ')p(θ')dθ'
  Predictions average over the models: p(t|r) = Σ_i p(t|r, H_i) p(H_i|r)  (≈ p(t|r, Ĥ_i) if one model dominates)
If a full Bayesian approach is not feasible (many problems, unfortunately)
  ML and GLRT produce good results for many engineering problems
  Ad hoc or mixed techniques (e.g. empirical Bayes) often work well
65
Detection and Estimation Theory E. G. Larsson
Performance bounds for hypothesis testing
Often, the exact performance of hypothesis tests is hard to compute
One can then resort to bounds and approximations.
Main points:
  Chernoff bounds on P_F, P_M for binary tests
  Approximations to P_F, P_M for binary tests
  Union bound for M-ary tests (not in book)
The algebra is sometimes tedious, but the results are useful
Derivations for self-study (Van Trees-I Section 2.7).
66
Detection and Estimation Theory E. G. Larsson
Chernoff bounds on P_F, P_M
Consider the binary LR test log Λ(r) = log[ p(r|H_1)/p(r|H_0) ] ≷ γ
Define μ(s) = log ∫ (p(r|H_1))^s (p(r|H_0))^{1−s} dr
Then for s, 0 ≤ s ≤ 1, such that μ̇(s) = γ:
P_F ≤ exp( μ(s) − s μ̇(s) )
and
P_M ≤ exp( μ(s) + (1 − s) μ̇(s) )
Bound on P(e): If P(H_0) = P(H_1) = 1/2, then γ = 0 and one can show that P(e) ≤ (1/2) exp( μ(s̄) ) where μ̇(s̄) = 0
67
Detection and Estimation Theory E. G. Larsson
Approximations to P_D, P_F
Suppose N = dim(r) is large
Then for s √(μ̈(s)) > 3, s ≥ 0:
P_F ≈ [ 1 / ( s √(2π μ̈(s)) ) ] exp( μ(s) − s μ̇(s) )
And for (1 − s) √(μ̈(s)) > 3, s ≤ 1:
P_M ≈ [ 1 / ( (1 − s) √(2π μ̈(s)) ) ] exp( μ(s) + (1 − s) μ̇(s) )
If the r_i are i.i.d. Gaussian under H_0 and H_1, this approximation also holds for small N
68
Detection and Estimation Theory E. G. Larsson
Chernoff Bound on the Q-function
Consider the binary test with P(H_0) = P(H_1) = 1/2:
r ~ N(−1, σ²) under H_0,   N(+1, σ²) under H_1
Recall that P(e) = Q(1/σ)
One has μ(s) = 2s(s − 1)/σ², and with s = 1/2, P(e) ≤ (1/2) exp( −1/(2σ²) )
Combining these we find the useful bound
Q(x) ≤ (1/2) exp( −x²/2 ),  x ≥ 0
69
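A two-line numerical check of the bound Q(x) ≤ (1/2) exp(−x²/2).

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(0, 6, 7)
print(np.column_stack([x, norm.sf(x), 0.5 * np.exp(-x**2 / 2)]))  # Q(x) vs its Chernoff bound
```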
Detection and Estimation Theory E. G. Larsson
Union bound for M-ary hypothesis testing
Consider M-ary hypothesis testing, with the min-P(e) criterion.
Test: max_i f_i(r), where f_i(r) = P(H_i|r). Note: the f_i(r) are random variables.
We have
P(e|H_n) = P( ∪_{k=0,...,M−1; k≠n} { f_k(r) ≥ f_n(r) } | H_n ) ≤ Σ_{k=0, k≠n}^{M−1} P( f_k(r) ≥ f_n(r) | H_n )
Hence
P(e) = Σ_{n=0}^{M−1} P(e|H_n) P(H_n) ≤ Σ_{n=0}^{M−1} P(H_n) Σ_{k=0, k≠n}^{M−1} P( f_k(r) ≥ f_n(r) | H_n )
Here P( f_k(r) ≥ f_n(r) ) can be bounded/approximated as for a binary test!
70
Detection and Estimation Theory E. G. Larsson
Example: Union bound for AWGN channel
Suppose r ~ N(μ_i, σ² I) (AWGN vector channel)
Assuming equally likely signals, P(H_n) = 1/M, we have the union bound
P(e) = Σ_{n=0}^{M−1} P(e|H_n) P(H_n) ≤ (1/M) Σ_{n=0}^{M−1} Σ_{k=0, k≠n}^{M−1} Q( ‖μ_n − μ_k‖ / (2σ) )
For small σ (high SNR), the inner sum is dominated by the terms for which ‖μ_n − μ_k‖ = d_min(n) ≜ min_{k, k≠n} ‖μ_n − μ_k‖
For given H_n there are, say, K(n) terms with ‖μ_n − μ_k‖ = d_min(n)
For K(n) = K (a uniform set), then P(e) ≲ K · Q( d_min / (2σ) )
71
Detection and Estimation Theory E. G. Larsson
Lecture 5
72
Detection and Estimation Theory E. G. Larsson
Representation of deterministic waveforms
Need a vector (finite or countable) representation of waveforms:
x(t), 0 ≤ t ≤ T  ↔  {x_i}_{i=1}^N ≜ x
Ex.: Sampling (in time)
More generally: expansion in a basis
x_N(t) = Σ_{i=1}^N x_i φ_i(t)
Preferably {φ_i(t)}_{i=1}^∞ should be such that x_N(t) can approximate the x(t) of interest better and better when N → ∞
73
Detection and Estimation Theory E. G. Larsson
Orthonormal set (ON) representations
{φ_i(t)}_{i=1}^N is an ON set over [0, T] if
∫_0^T φ_i(t) φ_j(t) dt = 1 for i = j, 0 for i ≠ j
Suppose x(t) = Σ_{i=1}^N x_i φ_i(t). Then
Recovery of the coefficients: x_i = ∫_0^T x(t) φ_i(t) dt
Filter interpretation: x_i = y_i(T), y_i(t) = S_{h_i}{x(t)}, h_i(t) = φ_i(T − t)
Inner product: x^T y = Σ_{i=1}^N x_i y_i = ∫_0^T x(t) y(t) dt
Special case: ‖x‖² = Σ_{i=1}^N x_i² = ∫_0^T x²(t) dt
74
Detection and Estimation Theory E. G. Larsson
Examples of some common ON signal representations
Stepwise constant basis: φ_n(t) = (1/√T)( u(t − nT) − u(t − (n + 1)T) )
Bandlimited signal: if ω_x < π/T_s then
x(t) = Σ_{n=−∞}^∞ x(nT_s) sin( ω_p(t − nT_s) ) / ( ω_p(t − nT_s) ),   ω_p = π/T_s
(the ON basis functions are these sinc pulses scaled by 1/√T_s, with coefficients √T_s x(nT_s))
Fourier series over [0, T]: x(t) = Σ_{n=0}^∞ x_{2n} cos( n 2πt/T ) + x_{2n+1} sin( n 2πt/T )
ON representation for a finite number of waveforms x_i(t): Gram-Schmidt
75
Detection and Estimation Theory E. G. Larsson
Complete ON (CON) set representation
An ON basis is CON if
lim_{N→∞} ∫_0^T ( x(t) − Σ_{i=1}^N x_i φ_i(t) )² dt = 0
for all x(t) with ∫_0^T x²(t) dt < ∞
The difficulty in handling CON sets with rigor lies in the representation of pathological functions. In engineering we try to be pragmatic:
  Think of N as very large but finite
  x_N(t) as a good approximation of x(t) (relative to ∫_0^T x²(t) dt)
  In practice, pathological cases occur rarely
76
Detection and Estimation Theory E. G. Larsson
Random process (x(t))
Mean: m(t) = E[x(t)]
Correlation: R(t, u) = E[x(t)x(u)]   (cf. R = E[x x^T])
Covariance: K(t, u) = E[(x(t) − m(t))(x(u) − m(u))] = K(u, t)   (cf. K = E[(x − m)(x − m)^T])
Inner product with a deterministic function:
E[ ∫ x(t)f(t)dt ] = ∫ f(t)m(t)dt,   cf. E[a^T x] = a^T m
var[ ∫ x(t)f(t)dt ] = ∫∫ f(t)K(t, u)f(u)dt du   (note: ≥ 0, K(t, u) is p.d.),   cf. var[a^T x] = a^T K a
77
Detection and Estimation Theory E. G. Larsson
Stationarity of random process x(t)
Covariance stationary if K(t, u) = K(t − u)
Correlation stationary if R(t, u) = R(t − u)
Spectrum:
S(ω) = ∫ R(τ) e^{−jωτ} dτ,   R(τ) = (1/2π) ∫ S(ω) e^{jωτ} dω
Inner product with a deterministic function (covariance stationary process):
var[ ∫ x(t)f(t)dt ] = ∫∫ f(t)K(t − u)f(u)dt du
(cf. E[a^T x] = a^T m, var[a^T x] = a^T K a where K is Toeplitz)
78
Detection and Estimation Theory E. G. Larsson
Karhunen-Loeve expansion of random process x(t)
Looking for a CON set such that
lim_{N→∞} E[ ( x(t) − Σ_{i=1}^N x_i φ_i(t) )² ] = 0,   0 ≤ t ≤ T
It turns out that a solution is the set of eigenfunctions {φ_i(t)} of K(t, u):
λ_j φ_j(t) = ∫_0^T K(t, u) φ_j(u) du,   0 ≤ t ≤ T
It holds that E[(x_i − m_i)(x_j − m_j)] = λ_i for i = j, 0 for i ≠ j, where E[x_i] = m_i.
The coefficients are uncorrelated (independent for a Gaussian process)
79
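A sketch of how the Karhunen-Loeve eigenfunctions can be obtained numerically: discretize K(t, u) on a grid and solve the matrix eigenproblem (the exponential covariance used here is an assumed example).

```python
import numpy as np

# Discretize K(t,u) on a grid; the grid step dt plays the role of the integral's du.
T, n = 1.0, 400
t = np.linspace(0, T, n)
dt = t[1] - t[0]
sigma2, a = 1.0, 5.0
K = sigma2 * np.exp(-a * np.abs(t[:, None] - t[None, :]))   # example covariance kernel

lam, phi = np.linalg.eigh(K * dt)                 # eigenvalues of the integral operator
idx = np.argsort(lam)[::-1]
lam, phi = lam[idx], phi[:, idx] / np.sqrt(dt)    # normalize so that int phi^2 dt = 1

print("largest eigenvalues :", lam[:5])
print("int phi_0^2 dt      =", np.sum(phi[:, 0]**2) * dt)
```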
Detection and Estimation Theory E. G. Larsson
Some properties of λ_j φ_j(t) = ∫_0^T K(t, u) φ_j(u) du
There exists at least one solution {φ_i(t), λ_i}. We shall assume ∫_0^T φ_i²(t) dt = 1
If {φ_i(t), λ} & {φ_j(t), λ} are solutions, then {a φ_i(t) + b φ_j(t), λ} is a solution
If {φ_i(t), λ_i} & {φ_j(t), λ_j} are solutions with λ_i ≠ λ_j, then ∫_0^T φ_i(t) φ_j(t) dt = 0
Mercer's theorem: for p.d. K(t, u) (convergence uniform over t, u)
K(t, u) = Σ_{i=1}^∞ λ_i φ_i(t) φ_i(u),   0 ≤ t ≤ T, 0 ≤ u ≤ T
Energy: ∫_0^T K(t, t) dt = Σ_{i=1}^∞ λ_i   ( = E[ ∫_0^T x²(t) dt ] if E[x(t)] = 0 )
Many properties analogous to those in linear algebra (cf. λa = Ka)!
80
Detection and Estimation Theory E. G. Larsson
Karhunen-Loeve expansion, cont.
Proof of the expansion (on the board):
  Suppose {φ_i(t), λ_i} solve λ_j φ_j(t) = ∫_0^T K(t, u) φ_j(u) du over [0, T]
  Show that the x_i are uncorrelated and compute their variance
  Show pointwise convergence in the mean-square sense
The main usefulness is that if the λ_i are sorted in decreasing order, the expansion x_N can be truncated → a finite-dimensional problem
The integral equation can be very tedious to solve
Solutions have been tabulated for the main interesting cases
81
Detection and Estimation Theory E. G. Larsson
Bandlimited spectra and degrees of freedom
Consider a stationary bandlimited process:
S(ω) = P_0 for |ω| < ω_0, 0 for |ω| ≥ ω_0;   K(τ) = P sin(ω_0 τ)/(ω_0 τ), with P = P_0 ω_0/π the power
Suppose the process is observed over [−T/2, T/2].
The integral equation is
λ_i φ_i(t) = ∫_{−T/2}^{T/2} P sin( ω_0(t − u) ) / ( ω_0(t − u) ) φ_i(u) du
The solutions are prolate spheroidal functions
One can show that the number of significant eigenvalues λ_i is ≈ 2BT, where B = ω_0/(2π)
The process is said to have 2BT degrees of freedom
82
Detection and Estimation Theory E. G. Larsson
White noise
White noise has covariance function and spectrum
K(t, u) = σ² δ(t − u),   S(ω) = σ²
White noise does not exist in practice
The integral equation is
λ φ_i(t) = ∫_0^T σ² δ(t − u) φ_i(u) du
Any CON set can be used as a basis
(And the coefficients of the series expansion are uncorrelated regardless of the basis)
All eigenvalues λ_i = σ²
White noise can be more rigorously defined as the limiting derivative of an N-term approximation to a Wiener process (book, for self-study)
83
Detection and Estimation Theory E. G. Larsson
Lecture 6
84
Detection and Estimation Theory E. G. Larsson
Detection and Estimation in Continuous Time
The physical reality is continuous-time (t ∈ R)
We need to be able to use waveforms as observations
Communications: TX waveforms s_i(t) represent information symbols
  r(t) = s_i(t) + w(t),   i = 1, ..., M
Fading channels: the received waveform depends on channel parameters θ
  r(t) = s_i(t, θ) + w(t),   i = 1, ..., M
Radar target detection:
  H_0: r(t) = w(t)
  H_1: r(t) = s(t, θ) + w(t)
85
Detection and Estimation Theory E. G. Larsson
General approach
Expand the observed waveform in an ON basis over 0 ≤ t ≤ T:
r_n = ∫_0^T r(t) φ_n(t) dt   and (if the series converges)   r(t) = Σ_{n=1}^∞ r_n φ_n(t)
In general, choose the ON basis {φ_n(t)} to make the problem simple.
Approximate: work with p(r_1, ..., r_K | ·) instead of p(r(t) | ·)
Exact for K < ∞ if {r_k}_{k=K+1}^∞ are independent of the hypothesis, otherwise approximate.
If all series converge, we can let K → ∞ (caution: white noise!)
Think of white noise as bandlimited (only ≈ 2BT eigenvalues ≠ 0)
86
Detection and Estimation Theory E. G. Larsson
Simple binary detection in white Gaussian noise
Consider
  H_0: r(t) = w(t),  0 ≤ t ≤ T
  H_1: r(t) = √E s(t) + w(t),  0 ≤ t ≤ T,    with ∫_0^T s²(t)dt = 1
A natural basis: pick φ_1(t) = s(t), and {φ_n(t)}_{n=2}^∞ arbitrary (ON)
Only r_1 = ∫_0^T r(t)s(t)dt depends on H_i. Under
  H_0: r_1 ~ N(0, N_0/2)
  H_1: r_1 ~ N(√E, N_0/2)
The optimal test was derived in Lecture 1 (recall: log Λ = (2√E/N_0) r_1 − E/N_0).
Performance: P_D = Q( Q^{−1}(P_F) − √(2E/N_0) ) depends only on E/N_0!
In white noise, the signal shape (and T) is unimportant.
The optimal structure is a correlation (matched filter) receiver
87
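A simulation sketch of the matched-filter (correlator) detector above; the pulse shape, E, N_0 and the target P_F are assumptions, and the discrete-time noise variance N_0/(2Δt) emulates white noise of spectral height N_0/2.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
T, dt = 1.0, 1e-3
t = np.arange(0, T, dt)
s = np.sqrt(2 / T) * np.sin(2 * np.pi * 5 * t)     # unit-energy pulse (any shape works)
E, N0 = 4.0, 2.0
PF_target = 1e-2

d = np.sqrt(2 * E / N0)
gamma = norm.isf(PF_target) * np.sqrt(N0 / 2)      # threshold on r1 = int r(t)s(t)dt
print("theory: P_D =", norm.sf(norm.isf(PF_target) - d))

trials, hits, false_alarms = 20_000, 0, 0
for _ in range(trials):
    w = rng.standard_normal(t.size) * np.sqrt(N0 / (2 * dt))   # white-noise samples
    r1_h0 = np.sum(w * s) * dt                                  # correlator output, H0
    r1_h1 = np.sum((np.sqrt(E) * s + w) * s) * dt               # correlator output, H1
    false_alarms += r1_h0 > gamma
    hits += r1_h1 > gamma
print("simulated: P_F =", false_alarms / trials, " P_D =", hits / trials)
```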
Detection and Estimation Theory E. G. Larsson
General binary detection in white Gaussian noise
Now
  H_0: r(t) = √E_0 s_0(t) + w(t),  0 ≤ t ≤ T
  H_1: r(t) = √E_1 s_1(t) + w(t),  0 ≤ t ≤ T,    with ∫_0^T s_i²(t)dt = 1
With r̃(t) = r(t) − √E_0 s_0(t):
  H_0: r̃(t) = w(t),  0 ≤ t ≤ T
  H_1: r̃(t) = √(ΔE) Δs(t) + w(t),  0 ≤ t ≤ T
where ΔE = ∫_0^T ( √E_1 s_1(t) − √E_0 s_0(t) )² dt,   Δs(t) = ( √E_1 s_1(t) − √E_0 s_0(t) ) / √(ΔE)
Note: ΔE = E_0 + E_1 − 2ρ√(E_0 E_1), where ρ = ∫_0^T s_0(t) s_1(t) dt
Performance is P_D = Q( Q^{−1}(P_F) − √(2ΔE/N_0) )
Best performance if the signals are antipodal, worst if they are parallel
88
Detection and Estimation Theory E. G. Larsson
General M-ary detection in white Gaussian noise
Now: H_i: r(t) = s_i(t) + w(t), 0 ≤ t ≤ T, i = 1, ..., M, with ∫_0^T s_i²(t) dt = E_i
Take an ON basis {φ_n(t)}_{n=1}^∞ such that s_i(t) = Σ_{n=1}^N s_in φ_n(t), N ≤ M
Consider r_n = ∫_0^T r(t) φ_n(t) dt
For n > N, the r_n do not depend on i
For n = 1, ..., N, r_n = ∫_0^T ( s_i(t) + w(t) ) φ_n(t) dt ~ N( s_in, N_0/2 )
With s_i = [s_i1, ..., s_iN]^T we have r_N = [r_1 ... r_N]^T ~ N( s_i, (N_0/2) I )
Optimal receiver (Lecture 2): min_i { ‖r_N − s_i‖² − N_0 log P(H_i) }
Basis receiver: uses N correlators/filters, N minimal (draw sketch!)
89
Detection and Estimation Theory E. G. Larsson
Correlation receiver for M-ary detection
First note that s_i(t) has no component along {φ_n(t)}_{n=N+1}^∞, so
∫_0^T s_i(t) r(t) dt = ∫_0^T Σ_{n=1}^N s_in φ_n(t) r(t) dt = Σ_{n=1}^N s_in r_n = s_i^T r_N
Now consider
‖r_N − s_i‖² − N_0 log P(H_i)
  = ‖r_N‖² + ‖s_i‖² − 2 s_i^T r_N − N_0 log P(H_i)
  = ‖r_N‖² + E_i − 2 ∫_0^T s_i(t) r(t) dt − N_0 log P(H_i)
Optimal receiver: max_i { 2 ∫_0^T s_i(t) r(t) dt − E_i + N_0 log P(H_i) }
Correlation receiver (uses M correlators / matched filters), draw sketch
90
Detection and Estimation Theory E. G. Larsson
Example: QPSK
Consider
s_i(t) = √(2E/T) cos( ω_c t + i π/2 ),   0 ≤ t ≤ T,   ω_c T/(2π) ∈ Z
for i = 1, 2, 3, 4
Basis? Show that N = 2, M = 4
Signal points?
Performance:
P(e) = P( w_1 < −√(E/2)  ∪  w_2 < −√(E/2) ) = 1 − ( 1 − Q( √(E/N_0) ) )²
91
Detection and Estimation Theory E. G. Larsson
The case of orthogonal waveforms
Suppose
  the H_i are equally likely
  the signals are orthogonal and have equal energy: ∫_0^T s_i(t) s_j(t) dt = E for i = j, 0 for i ≠ j
Then
P(e) = 1 − (1/√(2π)) ∫_{−∞}^∞ ( 1 − Q(t) )^{M−1} exp( −(1/2)( t − √(2E/N_0) )² ) dt
92
Detection and Estimation Theory E. G. Larsson
{φ_n(t)}_{n=1}^N from {s_m(t)}_{m=1}^M: Gram-Schmidt
Take φ_0(t) ≜ s_0(t) / √( ∫ |s_0(t)|² dt ). Clearly ∫ φ_0²(t) dt = 1 and s_0(t) ∈ sp(φ_0(t))
Suppose {s_m(t)}_{m=1}^{L−1} ∈ sp( {φ_n(t)}_{n=1}^{K−1} ).
Compute ψ_K(t) = s_L(t) − Σ_{n=0}^{K−1} φ_n(t) ∫ s_L(τ) φ_n(τ) dτ
If ∫ |ψ_K(t)|² dt = 0, then s_L(t) ∈ sp( {φ_k(t)}_{k=1}^{K−1} ). Set L := L + 1
If ∫ |ψ_K(t)|² dt ≠ 0, take φ_K(t) = ψ_K(t) / √( ∫ |ψ_K(t)|² dt ). Set L := L + 1, K := K + 1
Continue until L = M, i.e., the set of signals is exhausted.
Note: sp{φ_n(t)}_{n=1}^N = sp{s_m(t)}_{m=1}^M but N = dim{s_m(t)}_{m=1}^M ≤ M
93
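A sketch of Gram-Schmidt on sampled waveforms, with the inner products approximated by Riemann sums; the four phase-shifted sinusoids are an assumed example and, as expected, span only N = 2 dimensions.

```python
import numpy as np

def gram_schmidt(signals, dt, tol=1e-10):
    """ON basis for a set of sampled waveforms (rows of `signals`).
    Inner products approximate int x(t)y(t)dt via a Riemann sum * dt."""
    basis = []
    for s in signals:
        psi = s.astype(float)
        for phi in basis:
            psi = psi - phi * (np.sum(s * phi) * dt)   # remove components along the basis
        energy = np.sum(psi**2) * dt
        if energy > tol:                               # a new dimension was found
            basis.append(psi / np.sqrt(energy))
    return np.array(basis)

# Example: 4 phase-shifted sinusoids (QPSK-like) span only N = 2 dimensions
dt = 1e-3
t = np.arange(0, 1, dt)
signals = np.array([np.cos(2 * np.pi * 3 * t + i * np.pi / 2) for i in range(4)])
basis = gram_schmidt(signals, dt)
print("number of waveforms M =", len(signals), ", dimension N =", len(basis))
print("Gram matrix of the basis:\n", np.round(basis @ basis.T * dt, 3))
```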
Detection and Estimation Theory E. G. Larsson
Estimation in White Gaussian Noise
Consider estimation of θ from r(t) = s(t, θ) + w(t), 0 ≤ t ≤ T
Approximate r(t) ≈ r_K(t) = Σ_{i=1}^K r_i φ_i(t) where r_i = ∫_0^T r(t) φ_i(t) dt
Note: the r_i are independent with r_i ~ N( s_i(θ), N_0/2 ),   s_i(θ) = ∫_0^T s(t, θ) φ_i(t) dt
Also Σ_{i=1}^K r_i s_i(θ) = ∫_0^T r_K(t) s_K(t, θ) dt   and   Σ_{i=1}^K s_i²(θ) = ∫_0^T s_K²(t, θ) dt
Then consider the (approximate) normalized likelihood
l(θ) = log[ p(r_K|θ) / p(r_K|no signal) ] = (1/N_0) Σ_{i=1}^K [ 2 r_i s_i(θ) − s_i²(θ) ]
  ≈ (1/N_0) [ 2 ∫_0^T r(t) s(t, θ) dt − ∫_0^T s²(t, θ) dt ],   K large enough
94
Detection and Estimation Theory E. G. Larsson
Linear estimation
In a linear problem, r(t) = A s(t) + w(t)
The likelihood is: l(A) = (1/N_0) [ 2A ∫_0^T r(t)s(t)dt − A² ∫_0^T s²(t)dt ]
E.g. maximum likelihood:
Â_ML = ∫_0^T r(t)s(t)dt / ∫_0^T s²(t)dt
E.g. MMSE, assuming A ~ N(0, σ_A²):
Â_MMSE = E[A|r(t)] = argmax_A ( l(A) + log p(A) )
  = argmax_A [ (1/N_0)( 2A ∫_0^T r(t)s(t)dt − A² ∫_0^T s²(t)dt ) − A²/(2σ_A²) ]
  = ∫_0^T r(t)s(t)dt / ( N_0/(2σ_A²) + ∫_0^T s²(t)dt )
95
Detection and Estimation Theory E. G. Larsson
Nonlinear estimation
Consider r(t) = s(t, θ) + w(t)
CRB(θ) = 1 / ( −E[ d²l(θ)/dθ² ] ) = (N_0/2) / ∫_0^T ( ∂s(t, θ)/∂θ )² dt
Example: Delay estimation, s(t, θ) = f(t − θ). Likelihood (for T large):
l(θ) = (1/N_0) [ 2 ∫_0^T r(t) f(t − θ) dt − ∫_0^T f²(t − θ) dt ] ∝ ∫_0^T r(t) f(t − θ) dt
CRB(θ) = (N_0/2) / ∫_0^T ( df(t)/dt )² dt = (N_0/2) · [ (1/2π) ∫ ω² |F(ω)|² dω ]^{−1}
Usually, in nonlinear problems we observe a threshold phenomenon
96
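A sketch of ML delay estimation by maximizing the correlation ∫ r(t)f(t − τ)dt on a grid, together with the CRB above (the pulse shape, delay and noise level are assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 1e-3
t = np.arange(0, 1, dt)
f = np.exp(-((t - 0.2) / 0.02) ** 2)          # reference pulse f(t)
true_delay = 0.137
N0 = 1e-3

fd = np.interp(t - true_delay, t, f, left=0, right=0)       # f(t - tau_true)
r = fd + rng.standard_normal(t.size) * np.sqrt(N0 / (2 * dt))

# ML: correlate r(t) with shifted copies of f (grid resolution limits the accuracy)
taus = np.arange(0, 0.5, dt)
corr = [np.sum(r * np.interp(t - tau, t, f, left=0, right=0)) * dt for tau in taus]
tau_ml = taus[int(np.argmax(corr))]

dfdt = np.gradient(f, dt)
crb = (N0 / 2) / (np.sum(dfdt**2) * dt)
print(f"true delay {true_delay:.4f}, ML estimate {tau_ml:.4f}, sqrt(CRB) = {np.sqrt(crb):.2e}")
```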
Detection and Estimation Theory E. G. Larsson
Lecture 7
97
Detection and Estimation Theory E. G. Larsson
Detection in colored Gaussian noise
Consider a transmitted signal s(t) that is nonzero only for 0 ≤ t ≤ T (s(t) = 0 elsewhere)
Observation:
  H_0: r(t) = n(t),   T_i ≤ t ≤ T_f
  H_1: r(t) = s(t) + n(t),   T_i ≤ t ≤ T_f
Note that r(t) for t < 0 or t > T may contain information about H_i!
Covariance function: K(t, u) = (N_0/2) δ(t − u) + K_c(t, u)
We will need to invert the spectrum; the white noise component will help!
Detection & estimation are similar; we need to construct the LR
98
Detection and Estimation Theory E. G. Larsson
Preliminaries: Inverse kernel
We will need to work with the inverse kernel K^{−1}(t, t') defined by
∫_{T_i}^{T_f} K^{−1}(t, u) K(u, t') du = δ(t − t')   (cf. K^{−1}K = I)
If
K(t, t') = Σ_{i=1}^∞ λ_i φ_i(t) φ_i(t')
then (this series is divergent and must be interpreted as an operator)
K^{−1}(t, t') = Σ_{i=1}^∞ (1/λ_i) φ_i(t) φ_i(t')
Cf. K = TΛT^T ⇒ K^{−1} = TΛ^{−1}T^T
99
Detection and Estimation Theory E. G. Larsson
Direct Karhunen-Loeve approach to colored noise
Consider K-term approximations to r(t), s(t) and let K → ∞ (with l ≜ 2 log Λ, twice the log-likelihood ratio):
l(r_K) = 2 log[ p(r_K|H_1) / p(r_K|H_0) ] = Σ_{n=1}^K ( 2 r_n s_n − s_n² ) / λ_n
  = Σ_{n=1}^K (1/λ_n) ∫_{T_i}^{T_f} ∫_{T_i}^{T_f} [ 2 r(t) φ_n(t) s(t') φ_n(t') − s(t) φ_n(t) φ_n(t') s(t') ] dt dt'
  = ∫_{T_i}^{T_f} ∫_{T_i}^{T_f} ( 2r(t) − s(t) ) [ Σ_{n=1}^K (1/λ_n) φ_n(t) φ_n(t') ] s(t') dt dt'     (the bracket → K^{−1}(t, t') as K → ∞)
  → 2 ∫_{T_i}^{T_f} r(t) [ ∫_{T_i}^{T_f} K^{−1}(t, t') s(t') dt' ] dt − ∫_{T_i}^{T_f} s(t) [ ∫_{T_i}^{T_f} K^{−1}(t, t') s(t') dt' ] dt
Cf. 2 s^T K^{−1} r − s^T K^{−1} s in discrete time
100
Detection and Estimation Theory E. G. Larsson
Interpretation 1: Matched template
With
g(t) = ∫_{T_i}^{T_f} K^{−1}(t, t') s(t') dt'
we have
l = 2 ∫_{T_i}^{T_f} r(t) g(t) dt − ∫_{T_i}^{T_f} s(t) g(t) dt
Note that g(t) can be found without knowing K^{−1}(t, t'), from
s(t) = ∫_{T_i}^{T_f} K(t, t') g(t') dt'
Proof: ...
101
Detection and Estimation Theory E. G. Larsson
Interpretation 2: Noise-whitening
Suppose we can find h(t, t') such that
K^{−1}(t, t') = ∫_{T_i}^{T_f} h(u, t) h(u, t') du,   T_i < t < T_f, T_i < t' < T_f
Then we can write l = 2 ∫_{T_i}^{T_f} r̃(t) s̃(t) dt − ∫_{T_i}^{T_f} s̃²(t) dt where
r̃(t) = ∫_{T_i}^{T_f} h(t, t') r(t') dt'
s̃(t) = ∫_{T_i}^{T_f} h(t, t') s(t') dt'
The whitened noise ñ(t) = ∫_{T_i}^{T_f} h(t, t') n(t') dt' satisfies E[ ñ(t) ñ(t') ] = δ(t − t')   (proof!)
102
Detection and Estimation Theory E. G. Larsson
Performance of detectors
Detection performance is P_D = Q( Q^{−1}(P_F) − d ), where
d² = ∫_{T_i}^{T_f} s̃²(t) dt = ∫_{T_i}^{T_f} [ ∫_{T_i}^{T_f} h(t, t') s(t') dt' ] [ ∫_{T_i}^{T_f} h(t, t'') s(t'') dt'' ] dt
   = ∫_{T_i}^{T_f} ∫_{T_i}^{T_f} s(t) K^{−1}(t, t') s(t') dt dt'
Special cases in which the equations can be solved:
  Stationary noise: K(t, t') = K(t − t')
  Infinite interval: T_i = −∞, T_f = ∞
  Separable kernel: K(t, t') = (N_0/2) δ(t − t') + Σ_{k=1}^K λ_k φ_k(t) φ_k(t')
  Rational spectra
We next look at some special cases
104
Detection and Estimation Theory E. G. Larsson
Stationary noise and infinite observation interval
Suppose T_i = −∞, T_f = ∞. The inverse kernel satisfies
∫ K^{−1}(u − t) K(t' − u) du = δ(t − t')
The whitening filter is a (non-unique) solution to
K^{−1}(t − t') = ∫ h(u − t) h(u − t') du
After Fourier transformation,
S_{K^{−1}}(jω) = 1 / S_K(jω) = |H(jω)|²
In practice h(t) must be chosen causal
105
Detection and Estimation Theory E. G. Larsson
Separable kernel (finite number of interferers)
Suppose K(t, t') = (N_0/2) δ(t − t') + Σ_{k=1}^K λ_k φ_k(t) φ_k(t')
Then
K^{−1}(t, t') = (2/N_0) [ δ(t − t') − Σ_{k=1}^K ( λ_k / (N_0/2 + λ_k) ) φ_k(t) φ_k(t') ]
and
g(t) = ∫_{T_i}^{T_f} K^{−1}(t, t') s(t') dt' = (2/N_0) [ s(t) − Σ_{k=1}^K ( λ_k / (N_0/2 + λ_k) ) s_k φ_k(t) ]   (s_k = ∫ s(t) φ_k(t) dt)
Interpretation: MMSE filtering. At high SNR, project out the interference
Cf. ( (N_0/2) I + HH^T )^{−1} = (2/N_0) [ I − H( (N_0/2) I + H^T H )^{−1} H^T ]
106
Detection and Estimation Theory E. G. Larsson
Lecture 8
107
Detection and Estimation Theory E. G. Larsson
Composite hypothesis testing in continuous time
Consider
  H_0: r(t) = s_0(t, θ) + n(t)
  H_1: r(t) = s_1(t, θ) + n(t),   T_i ≤ t ≤ T_f
As before, θ can be deterministic or random (with p(θ))
Examples:
  Communications: fading channel, time dispersion
  Radar: unknown phase, RCS, ...
General Bayesian approach: average p(r_K|θ) over θ
p(r_K) = ∫ p(r_K|θ) p(θ) dθ ∝ ∫ p(θ) exp( −Σ_{k=1}^∞ ( s_k(θ) − r_k )² / (2λ_k) ) dθ
108
Detection and Estimation Theory E. G. Larsson
Some important examples
Random phase (θ):
  H_0: r(t) = n(t)
  H_1: r(t) = s(t) cos( ω_c t + φ(t) + θ ) + n(t),   T_i ≤ t ≤ T_f
Random amplitude and phase (A, θ):
  H_0: r(t) = n(t)
  H_1: r(t) = A s(t) cos( ω_c t + φ(t) + θ ) + n(t),   T_i ≤ t ≤ T_f
Unknown dispersive channel (h(t)):
  H_0: r(t) = n(t)
  H_1: r(t) = ∫_0^∞ h(τ) s(t − τ) dτ + n(t),   T_i ≤ t ≤ T_f
109
Detection and Estimation Theory E. G. Larsson
Random phase
In white noise (λ_k = N_0/2),
Λ = ∫_0^{2π} p(θ) exp( (1/N_0) ∫_0^T [ 2 r(t) s(t) cos( ω_c t + φ(t) + θ ) − s²(t) cos²( ω_c t + φ(t) + θ ) ] dt ) dθ
If E_s is independent of θ, then on quadrature form, with z = ∫_0^T r(t) s(t) e^{j(ω_c t + φ(t))} dt:
Λ ∝ ∫_0^{2π} p(θ) exp( (2/N_0) Re{ z e^{jθ} } ) dθ
110
Detection and Estimation Theory E. G. Larsson
Random phase, uniform density
For p(θ) = 1/(2π) over [0, 2π]:
Λ ∝ (1/2π) ∫_0^{2π} exp( (2/N_0) Re{ z e^{jθ} } ) dθ = I_0( (2/N_0) |z| )
where
I_0(x) = (1/2π) ∫_0^{2π} e^{x cos θ} dθ
For hard decisions this is equivalent to the quadrature-squared receiver:
|z|² = ( ∫_0^T r(t) s(t) cos( ω_c t + φ(t) ) dt )² + ( ∫_0^T r(t) s(t) sin( ω_c t + φ(t) ) dt )²
111
Detection and Estimation Theory E. G. Larsson
Random amplitude and phase: Rayleigh fading
Suppose A is Rayleigh. Then r(t) has the same distribution as
r(t) = A_1 cos( ω_c t + φ(t) ) s(t)  [ ≜ s_1(t) ]  +  A_2 sin( ω_c t + φ(t) ) s(t)  [ ≜ s_2(t) ]  +  n(t)
where A_1, A_2 are i.i.d. N(0, σ²) and s_1(t), s_2(t) are orthogonal
Likelihood ratio:
Λ = ∫∫ (1/(2πσ²)) e^{ −(A_1² + A_2²)/(2σ²) } exp( (2/N_0) ∫_0^T r(t)( A_1 s_1(t) + A_2 s_2(t) ) dt ) exp( −(1/N_0) ∫_0^T ( A_1 s_1(t) + A_2 s_2(t) )² dt ) dA_1 dA_2
The optimal test is
( ∫_0^T s_1(t) r(t) dt )² + ( ∫_0^T s_2(t) r(t) dt )²  ≷ γ
112
Detection and Estimation Theory E. G. Larsson
Ex. Communication with orthogonal FSK in Rayleigh
fading
Consider a Rayleigh fading channel, say E[A²] = 1, θ ~ U[0, 2π]
Consider (suppose ∫_0^T s²(t) dt = 1):
  H_0: r(t) = A s(t) cos( ω_0 t + θ ) + n(t)
  H_1: r(t) = A s(t) cos( ω_1 t + θ ) + n(t),   T_i ≤ t ≤ T_f
Optimal receiver?
Probability of error P(e)? (Many different cases of interest.)
  A, θ known at the receiver (perfect sync)
  A known but θ unknown (phase uncertainty)
  neither A nor θ known (phase and amplitude uncertainty)
113