Detection and Estimation Theory
Sau-Hsuan Wu
            Institute of Communications Engineering
                 National Chiao Tung University
Institute of Communications Engineering, EE, NCTU     1
Introduction to Detection and Estimation Theory       Sau-Hsuan Wu
    Detection theory
        Determine whether or not an event of interest occurs
        E.g. Whether or not an aircraft occurs
    Estimation theory
        Determine the values of parameters pertaining to an event
         of interest
        E.g. The altitude and position of an aircraft
 Institute of Communications Engineering, EE, NCTU                   2
Introduction to Detection and Estimation Theory         Sau-Hsuan Wu
    Under what basis do we do detection and estimation
        The signal source model and the observations
 Institute of Communications Engineering, EE, NCTU                 3
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    Another application of detection and estimation theory
        Determine the existence of a submarine
        Estimate the bearing
 Institute of Communications Engineering, EE, NCTU              4
Introduction to Detection and Estimation Theory         Sau-Hsuan Wu
    Many more applications
        Biomedicine: estimate the heart rate or even heart diseases
        Image analysis: estimate the size, position and orientation of
         an object in an image
        Seismology: estimate the underground distance of an oil
         deposit
        Communications: estimate the signal distortion and
         determine the transmitted symbols
        Economics: estimate the Don-Jones industrial average
 Institute of Communications Engineering, EE, NCTU                    5
Introduction to Detection and Estimation Theory          Sau-Hsuan Wu
    The mathematical estimation problem
        We want to know (or estimate) an unknown quantity 
        This quantity, , may not be obtained directly, or it is a
         notion (quantity) derived from a set of observations
        E.g. the Don-Jones average
        We seem to have a model x[n]=A + Bn + w[n], n=0,…,N-1
 Institute of Communications Engineering, EE, NCTU                    6
Introduction to Detection and Estimation Theory        Sau-Hsuan Wu
       The observations are corrupted by noise w[n]
       The noise may be modeled as a white Gaussian noise N(0,2)
       We want to determine  = [A, B] from the set of corrupted
        observations x = x[0],…, x[N-1] which can be modeled as
        P(x ; A, B) =
 Institute of Communications Engineering, EE, NCTU                7
Introduction to Detection and Estimation Theory     Sau-Hsuan Wu
    Our goal is to determine  according to the observations,
     namely
        Examples:
            x[n] = A1 + w[n],          n=0,…,N-1
            x[n] = A2 + B2 n + w[n],   n=0,…,N-1
        Â1= x[n]/N
          and ? Can we try the least square method
    How many other kinds of methods can we use
        Minimum variance unbiased estimator (MVU)
        Best linear unbiased estimator (BLUE)
        Maximum likelihood estimator (ML)
        Bayesian estimator: MMSE, MAP, and …
 Institute of Communications Engineering, ECE, NCTU              8
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    The mathematical detection problem
        We want to determine the presence or absence of an object
        Similarly, we have some observations x[n], n=0,…,N-1
        Based on our problem setting, x[n] can be modeled as
                         H0 : x[n] = s0 + w[n]
                         H1 : x[n] = s1 + w[n]
 Institute of Communications Engineering, ECE, NCTU                  9
Introduction to Detection and Estimation Theory       Sau-Hsuan Wu
        In this special case, s0 = 0 and s1 = 1
        Then, we have two hypotheses for x[n] in this case
                           H0 : x[n] = w[n]
                           H1 : x[n] = 1 + w[n]
    Suppose the noise w[n] ~ N(0, 2)
        And we have one observation only
 Institute of Communications Engineering, ECE, NCTU             10
Introduction to Detection and Estimation Theory    Sau-Hsuan Wu
       A reasonable approach is to decide H1 if x[0] > 1/2
       Type I error: We decide H1 when H0 is true (False Alarm)
       Type II error: We decide H0 when H1 is true (Miss)
 Institute of Communications Engineering, ECE, NCTU            11
Introduction to Detection and Estimation Theory    Sau-Hsuan Wu
    As the threshold  increases, one error increases and the
     other decreases
        P(H0 ; H1) = 1- P(H1 ; H1)
 Institute of Communications Engineering, ECE, NCTU              12
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    In general, we have multiple observations x[n], n=0,…,N-1
                         H0 : x[n] = w[n]
                         H1 : x[n] = A + w[n]
        We would decide H1 if
                      T = x[n]/N > 
        The detection performance increases with d2 = NA2/ 2
    Other approach for detection
        Neyman-Pearson approach
    When H0 and H1 are thought of random variables
        Maximum likelihood (ML) detector
        Maximum a posterior (MAP) detector
        Composite hypothesis testing
 Institute of Communications Engineering, ECE, NCTU              13
   Detection Theory
Institute of Communications Engineering, ECE, NCTU   14
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    Neyman-Pearson (NP) approach
        We have observations x[n], n=0,…, N-1, x{x[0],…,x[N-1]}
        Given P(x ; Hi ), we decide H1 if P(x ; H1) / P(x ; H0) > 
        Goal: maximize PD = P(H1 ; H1)
                subject to PFA = P(H1 ; H0)  
         is called the significance level
    For example, H0 : x[n] = w[n]
                  H1 : x[n] = A + w[n], n=0,…, N-1,
 Institute of Communications Engineering, ECE, NCTU               15
Introduction to Detection and Estimation Theory   Sau-Hsuan Wu
 Institute of Communications Engineering, ECE, NCTU         16
Introduction to Detection and Estimation Theory   Sau-Hsuan Wu
 
Introduction to Detection and Estimation Theory             Sau-Hsuan Wu
    Bayesian approaches
        In some detection problems, it makes sense to assign
         probabilities to the various hypotheses Hi
        Define a detector (x) = P(decide H1|x) [0, 1]
        The error probability of detection becomes
         Pe = P(H1 | H0) P(H0) + P(H0 | H1) P(H1)
            = P(H0) (x) P(x |H0)dx + P(H1)[1- (x)] P(x |H1)dx
            = P(H1) +  (x)[P(x |H0) P(H0)-P(x |H1) P(H1)] dx
        We want to design a (x) [0, 1] to minimize Pe
            (x) = 0 if P(x |H0) P(H0)-P(x |H1) P(H1) > 0
            (x) = 1 if P(x |H0) P(H0)-P(x |H1) P(H1)  0
            Or, (x) = 1 if P(x |H1) / P(x |H0)  (x)  P(H0) / P(H1) = 
                      = 0 if (x) < 
 Institute of Communications Engineering, ECE, NCTU                       18
Introduction to Detection and Estimation Theory              Sau-Hsuan Wu
        The function (x)=P(x |H1) / P(x |H0) is the likelihood ratio
        And the detection process is called the likelihood ratio test
    Maximum a posteriori (MAP) detector
        We may rewrite
            (x) = 0 if P(x |H0) P(H0)-P(x |H1) P(H1) > 0
            (x) = 1 if P(x |H0) P(H0)-P(x |H1) P(H1)  0
        Into
            (x) = arg max {0, 1} P(x |H) P(H )
                  = arg max {0, 1} P(H|x) P(x) = arg max {0, 1} P(H|x)
    Maximum likelihood (ML) detector
        If P(H0) = P(H1)
            (x) = arg max {0, 1} P(x |H)
 Institute of Communications Engineering, ECE, NCTU                            19
Introduction to Detection and Estimation Theory    Sau-Hsuan Wu
    For example, H0 : x[n] = w[n]
                  H1 : x[n] = A + w[n], n = 0,…, N-1
        P(H0) = P(H1) = ½ and w[n] ~ N(0, 2)
        As a result,  = P(H0)/P(H1) = 1, and
        (x) = P(x |H1)/P(x |H0) >  = 1 yields
        Or equivalently
 Institute of Communications Engineering, ECE, NCTU          20
Introduction to Detection and Estimation Theory   Sau-Hsuan Wu
       Thus, we decide H1 if
       Given that
       As a result
       Since Q(-x) = 1-Q(x), we have
 Institute of Communications Engineering, ECE, NCTU         21
Introduction to Detection and Estimation Theory        Sau-Hsuan Wu
    For MAP detector: arg max {0, 1} P(H|x)
        Thus, we decide H1 if P(H1|x) > P(H0|x)
        Suppose we have only one observation x[0]
        The decision region varies with the prior probability
         (a) P(H0) = P(H1) = ½         (b) P(H0) = ¼, P(H1) = ¾
 Institute of Communications Engineering, ECE, NCTU               22
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    In more general cases, we want to design a detector (x) to
     minimize the average cost of decision errors
        Consider a cost function Cab of deciding Ha when Hb is true
        Given the general set of costs {C00, C01, C10, C11}
        The average cost, or also called Bayesian Risk, is given by
         R = C00 P(H0 | H0) P(H0) + C01 P(H0 | H1) P(H1)
             + C10 P(H1 | H0) P(H0) + C11 P(H1 | H1) P(H1)
            = C00 P(H0)[1- (x)]P(x |H0)dx + C10P(H0) (x)P(x |H0)dx
            + C01 P(H1)[1- (x)]P(x |H1)dx + C11P(H1) (x)P(x |H1)dx
        R = C00 P(H0) + C01 P(H1) +
          (x)[(C10 – C00)P(H0)P(x |H0) + (C11 – C01)P(H1)P(x |H1)]dx
        (x) = 0 if                     >0
         (x) = 1 if                     0
 Institute of Communications Engineering, ECE, NCTU               23
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    An alternative form
        [(C10 – C00)P(H0) P(x |H0) +(C11 – C01)P(H1) P(x |H1)]
         = [C10 P(H0|x) + C11P(H1|x) - C00P(H0|x) - C01P(H1|x)]P(x)
          [C1(x) - C0(x)] P(x)
        where Ci(x)=Ci0P(H0|x)+Ci1P(H1|x) is the cost of deciding Hi
        Therefore,
         (x) = 0 if C1(x) - C0(x) > 0 or C1(x) > C0(x)
         (x) = 1 if C1(x) - C0(x)  0
        Or, we can define two decision regions
         R1  {x|C1(x)  C0(x)}
         R0  {x|C1(x) > C0(x)}
            (x) = 1 if x  R1
             (x) = 0 if x  R0
 Institute of Communications Engineering, ECE, NCTU               24
Introduction to Detection and Estimation Theory             Sau-Hsuan Wu
    Multiple hypothesis testing
        Suppose we must decide among M hypotheses for which
         H0 : x ~ P(x |H0), H1 : x ~ P(x |H1), …, HM-1 : x ~ P(x |HM-1)
        The average cost is given by
         R = i j Cij P(Hi | Hj) P(Hj)
        Redefine the decision rule
         i(x)=P(decide Hi | x)[0, 1], and i i(x) =1
        R = i jCij  i(x)P(Hj |x) P(x)dx
        Redefine the cost of Ci(x) = j CijP(Hj|x)
        And define the decision region for Hi as
         Ri  {x|Ci(x)  Cj(x),  j  i }
        The decision rule i(x) that minimizes R is
            i(x) = 1 if x Ri , and i(x) = 0 otherwise
 Institute of Communications Engineering, ECE, NCTU                   25
   Estimation Theory
Institute of Communications Engineering, ECE, NCTU   26
Introduction to Detection and Estimation Theory       Sau-Hsuan Wu
    Let us recall the problems we talked about
        x[n] = A1 + w[n],              n=0,…,N-1
        x[n] = A2 + B2 n + w[n],       n=0,…,N-1
    We define a parameter  to represent the quantities to be
     estimated, e.g.  = A1 and  = [A2, B2] in the above cases
    We model the data by its probability density function
     (PDF), assuming that the data are inherently random
        As an example:
        We have a class of PDFs where each one is different due to a
         different value of , i.e. the PDFs are parameterized by  .
        The parameter  is assumed deterministic but unknown
 Institute of Communications Engineering, EE, NCTU                 27
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    We conclude for the time being that we are hoping to
     have an estimator that gives
        An unbiased mean of the estimate:
        A minimum variance for the estimate:
    Is an unbiased estimator always the optimal estimator?
    Consider a widely used optimality criterion:
     the minimum mean squared error (MSE) criterion
 Institute of Communications Engineering, EE, NCTU             28
Introduction to Detection and Estimation Theory            Sau-Hsuan Wu
    As an example, consider the modified estimator
    We attempt to find the ‘a’ which yields the minimum MSE
        Since E(Ă)=aA and var(Ă) = a2 2 / N, we have
         mse (Ă) = a2 2 / N + (a-1)2A2
        Taking the derivative w.r.t. to a and setting it to zero leads to
         aopt = A2 / (A2 + 2/ N )
        The optimal value of a depends upon the unknown parameter
         A  The estimator is not realizable
        How do we resolve this dilemma?
            Since, mse (Ă) = var(Ă) + b2, as an alternative,
             we set b=0 and search for the estimator that minimizes var(Ă)
              minimum variance unbiased (MVU) estimator
 Institute of Communications Engineering, EE, NCTU                       29
Introduction to Detection and Estimation Theory             Sau-Hsuan Wu
    However, does a MVU estimator always exist for all ?
        Example:
            x[0] ~ N(,1)
            x[1] ~ N(,1) if   0
                  ~ N(,2) if   0
            Both of the two estimators are unbiased
            Therefore
            None of the estimator has a variance uniformly less than or
             equal to 18/36
 Institute of Communications Engineering, EE, NCTU                         30
Introduction to Detection and Estimation Theory        Sau-Hsuan Wu
    Is there a systematic way to find the MVU if it exists?
    We start by defining the set of data that is sufficient for
     estimation?
    What do we mean by sufficiency in estimation?
        We want to have a set of data T(x) such that given T(x), any
         individual data x(n) is statistically independent of A
    Suppose Â1= x[n]/N, then are the followings sufficient?
        S1 = {x[0], x[1],…, x[N-1]}
        S2 = x[n]
        GivenT0 = x[n] , do we still need the individual data?
 Institute of Communications Engineering, EE, NCTU                  31
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    We say the conditional PDF : p( x | x[n] = T0 ; A)
     should not depend on A if T0 is sufficient
    E.g.
    For (a), a value of A near A0 is more likely even given T0
    For (b), however, p( x | x[n] = T0 ; A) is a constant
 Institute of Communications Engineering, EE, NCTU             32
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    Now, we need to determine p( x | x[n] = T0 ; A) to
     show x[n] = T0 is sufficient
    By Baye’s rule
    Since T(x) is a direct function of x,
    Clearly, we have
 Institute of Communications Engineering, EE, NCTU             33
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    Thus, we have
 Institute of Communications Engineering, EE, NCTU             34
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    Since T(x) = x[n] ~ N(NA, N2)
    Thus
     which does not depend on A
 Institute of Communications Engineering, EE, NCTU             35
Introduction to Detection and Estimation Theory          Sau-Hsuan Wu
    In general, to identify potential sufficient statistics is
     difficult
    An efficient procedure for finding the sufficient statistics
     is to employ the Neyman-Fisher factorization theorem
    Observe that
    If we can factor p(x;) into p(x;)=g(T(x),) h(x)
     where
        g is a function depends on x only through T(x)
        h is a function depends only on x
         T(x) is a sufficient statistics for 
        The converse is also true
         If T(x) is a sufficient statistics  p(x;)=g(T(x),) h(x)
 Institute of Communications Engineering, EE, NCTU                    36
Introduction to Detection and Estimation Theory           Sau-Hsuan Wu
    Recall p(x;A)
    On the other hand, we want to estimate 2 of y[n]=A+x[n]
        Suppose A is given, then define x[n] = y[n]-A
        Clearly, T(x) = x2[n] is a sufficient statistics for 2
 Institute of Communications Engineering, EE, NCTU                  37
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    Ex. we want to estimate the phase of a sinusoid
      x[n]=A cos(2 f0 n + ) + w[n], n=0,1,…,N-1
        Suppose A and f0 are given
        Expand the exponent
 Institute of Communications Engineering, EE, NCTU             38
Introduction to Detection and Estimation Theory      Sau-Hsuan Wu
    In this case, no single sufficient statistics exists, however
                             g(T1(x),T2 (x),  )
                                       h(x)
 Institute of Communications Engineering, EE, NCTU                   39
Introduction to Detection and Estimation Theory              Sau-Hsuan Wu
    The r statistics T1(x), T2(x),…, Tr(x) are jointly sufficient
     if p(x | T1(x), T2(x),…, Tr(x) ; ) does not depend on 
    If p(x ; ) = g(T1(x), T2(x),…, Tr(x) , ) h(x)
      {T1(x), T2(x),…, Tr(x)} are sufficient statistics for 
    Now, we know how to obtain the sufficient statistics
    How do we apply them to help obtain the MVU estimator?
    The Rao-Blackwell-Lehmann-Scheffe Theorem
        If is an unbiased estimator of  and T(x) is a sufficient
         statistic for , then               is unbiased and
            A valid estimator for  (not dependent on )
            Of lesser or equal variance than that of   for all 
            If T(x) is complete, then     is the MVU estimator
 Institute of Communications Engineering, EE, NCTU                     40
Introduction to Detection and Estimation Theory        Sau-Hsuan Wu
    Finally, a statistic is complete if there is only one function,
     say g, of the statistic that is unbiased
                                   is solely a function of T(x)
      If T(x) is complete  is unique and unbiased
    Besides,                        for any unbiased estimator
    Then,                                  must be the MVU
    In summary, the MVU can be found by
        Taking any unbiased      and carrying out
        Alternatively, since there is only one function of T(x) that
         leads to an unbiased estimator
          find the unique g(T(x)) that makes                 unbiased
 Institute of Communications Engineering, EE, NCTU                   41