Detection and Estimation Theory
Sau-Hsuan Wu
Institute of Communications Engineering
National Chiao Tung University
Detection theory
Determine whether or not an event of interest occurs
E.g. whether or not an aircraft is present
Estimation theory
Determine the values of parameters pertaining to an event
of interest
E.g. The altitude and position of an aircraft
On what basis do we perform detection and estimation?
The signal source model and the observations
Another application of detection and estimation theory
Determine the presence of a submarine
Estimate its bearing
Many more applications
Biomedicine: estimate the heart rate or even detect heart disease
Image analysis: estimate the size, position and orientation of
an object in an image
Seismology: estimate the underground distance of an oil
deposit
Communications: estimate the signal distortion and
determine the transmitted symbols
Economics: estimate the Dow Jones industrial average
The mathematical estimation problem
We want to know (or estimate) an unknown quantity θ
This quantity, θ, may not be obtained directly, or it is a
notion (quantity) derived from a set of observations
E.g. the Dow Jones industrial average
We seem to have a model x[n] = A + Bn + w[n], n = 0,…,N−1
The observations are corrupted by noise w[n]
The noise may be modeled as white Gaussian noise, w[n] ~ N(0, σ²)
We want to determine θ = [A, B] from the set of corrupted
observations x = {x[0],…, x[N−1]}, which can be modeled as
p(x ; A, B) = (2πσ²)^(−N/2) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)² ]
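To make this concrete, here is a minimal sketch (not from the slides; N, A, B, and σ are arbitrary illustration values) that simulates the line-plus-noise model and evaluates the likelihood above at candidate parameter values:

```python
# Minimal sketch: simulate x[n] = A + B*n + w[n] with w[n] ~ N(0, sigma^2)
# and evaluate the Gaussian likelihood p(x; A, B). All numeric values
# are arbitrary illustration choices.
import numpy as np

rng = np.random.default_rng(0)
N, A, B, sigma = 50, 1.0, 0.05, 0.5
n = np.arange(N)
x = A + B * n + sigma * rng.standard_normal(N)   # corrupted observations

def likelihood(x, A, B, sigma):
    """p(x; A, B) for the linear model in white Gaussian noise."""
    resid = x - (A + B * np.arange(len(x)))
    return (2 * np.pi * sigma**2) ** (-len(x) / 2) * np.exp(
        -0.5 * np.sum(resid**2) / sigma**2)

print(likelihood(x, A, B, sigma))        # likelihood at the true (A, B)
print(likelihood(x, A + 1.0, B, sigma))  # typically far smaller
```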
Our goal is to determine θ from the observations,
namely to form an estimator θ̂ = g(x[0],…, x[N−1])
Examples:
x[n] = A1 + w[n], n=0,…,N−1
x[n] = A2 + B2·n + w[n], n=0,…,N−1
Â1 = (1/N) Σ_{n=0}^{N−1} x[n]
What about Â2 and B̂2? Can we try the least-squares method?
(See the sketch after the list below.)
How many other kinds of methods can we use?
Minimum variance unbiased estimator (MVU)
Best linear unbiased estimator (BLUE)
Maximum likelihood estimator (ML)
Bayesian estimator: MMSE, MAP, and …
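As hinted above, least squares is a natural first attempt for both example models. A minimal sketch (assumed setup, not the slides' own code; all numeric values are illustrative):

```python
# Minimal sketch: least-squares estimates for the two example models.
import numpy as np

rng = np.random.default_rng(1)
N, A2, B2, sigma = 100, 2.0, 0.1, 1.0
n = np.arange(N)
x = A2 + B2 * n + sigma * rng.standard_normal(N)

# Model 1: x[n] = A1 + w[n]  ->  the LS solution is the sample mean
A1_hat = x.mean()

# Model 2: x[n] = A2 + B2*n + w[n]  ->  minimize ||x - H @ [A2, B2]||^2
H = np.column_stack([np.ones(N), n])              # observation matrix
A2_hat, B2_hat = np.linalg.lstsq(H, x, rcond=None)[0]
print(A1_hat, (A2_hat, B2_hat))
```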
The mathematical detection problem
We want to determine the presence or absence of an object
Similarly, we have some observations x[n], n=0,…,N-1
Based on our problem setting, x[n] can be modeled as
H0 : x[n] = s0 + w[n]
H1 : x[n] = s1 + w[n]
In this special case, s0 = 0 and s1 = 1
Then, we have two hypotheses for x[n] in this case
H0 : x[n] = w[n]
H1 : x[n] = 1 + w[n]
Suppose the noise w[n] ~ N(0, σ²)
And suppose we have only one observation, x[0]
A reasonable approach is to decide H1 if x[0] > 1/2
Type I error: We decide H1 when H0 is true (False Alarm)
Type II error: We decide H0 when H1 is true (Miss)
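Under the Gaussian noise assumption, both error probabilities follow directly from the Q-function. A minimal sketch (σ is an arbitrary illustration value):

```python
# Minimal sketch: Type I / Type II error probabilities of the rule
# "decide H1 if x[0] > 1/2" when w[n] ~ N(0, sigma^2), s0 = 0, s1 = 1.
from scipy.stats import norm

sigma, thresh = 0.5, 0.5
P_false_alarm = norm.sf(thresh, loc=0.0, scale=sigma)   # P(x[0] > 1/2; H0)
P_miss        = norm.cdf(thresh, loc=1.0, scale=sigma)  # P(x[0] < 1/2; H1)
print(P_false_alarm, P_miss)  # equal here, since 1/2 sits midway
```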
As the threshold increases, one error increases and the
other decreases
P(H0 ; H1) = 1 − P(H1 ; H1)
In general, we have multiple observations x[n], n=0,…,N-1
H0 : x[n] = w[n]
H1 : x[n] = A + w[n]
We would decide H1 if
T = (1/N) Σ_{n=0}^{N−1} x[n] > γ
The detection performance increases with d² = NA²/σ²
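A minimal sketch of this scaling (the threshold is set for a fixed false-alarm rate, in the spirit of the Neyman-Pearson approach listed below; all numeric values are illustrative):

```python
# Minimal sketch: detection probability versus d^2 = N*A^2/sigma^2 for
# the sample-mean detector at a fixed false-alarm rate. For this
# Gaussian mean-shift problem, P_D = Q(Q^{-1}(P_FA) - sqrt(d2)).
import numpy as np
from scipy.stats import norm

A, sigma, P_FA = 0.5, 1.0, 0.01
for N in (1, 10, 100):
    d2 = N * A**2 / sigma**2
    P_D = norm.sf(norm.isf(P_FA) - np.sqrt(d2))
    print(N, d2, P_D)  # P_D climbs toward 1 as d^2 grows
```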
Other approaches to detection
Neyman-Pearson approach
When H0 and H1 are thought of as random events:
Maximum likelihood (ML) detector
Maximum a posteriori (MAP) detector
Composite hypothesis testing
Detection Theory
Neyman-Pearson (NP) approach
We have observations x[n], n=0,…, N−1, x ≜ {x[0],…, x[N−1]}
Given P(x ; Hi), we decide H1 if P(x ; H1) / P(x ; H0) > γ
Goal: maximize PD = P(H1 ; H1)
subject to PFA = P(H1 ; H0) ≤ α
α is called the significance level
For example, H0 : x[n] = w[n]
H1 : x[n] = A + w[n], n=0,…, N−1
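For this example the test statistic is the sample mean, so the NP threshold can be set in closed form. A minimal sketch (all numeric values assumed for illustration):

```python
# Minimal sketch: NP design for the DC-level example. Under H0,
# T = mean(x) ~ N(0, sigma^2/N); under H1, T ~ N(A, sigma^2/N).
import numpy as np
from scipy.stats import norm

N, A, sigma, alpha = 20, 0.3, 1.0, 0.05
scale = sigma / np.sqrt(N)
gamma = norm.isf(alpha, loc=0.0, scale=scale)  # P(T > gamma; H0) = alpha
P_D = norm.sf(gamma, loc=A, scale=scale)       # resulting P(T > gamma; H1)
print(gamma, P_D)
```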
Bayesian approaches
In some detection problems, it makes sense to assign
probabilities to the various hypotheses Hi
Define a detector φ(x) = P(decide H1 | x) ∈ [0, 1]
The error probability of detection becomes
Pe = P(H1 | H0) P(H0) + P(H0 | H1) P(H1)
= P(H0) ∫ φ(x) P(x |H0) dx + P(H1) ∫ [1 − φ(x)] P(x |H1) dx
= P(H1) + ∫ φ(x) [ P(x |H0) P(H0) − P(x |H1) P(H1) ] dx
We want to design a φ(x) ∈ [0, 1] to minimize Pe
φ(x) = 0 if P(x |H0) P(H0) − P(x |H1) P(H1) > 0
φ(x) = 1 if P(x |H0) P(H0) − P(x |H1) P(H1) ≤ 0
Or, φ(x) = 1 if L(x) ≜ P(x |H1) / P(x |H0) > P(H0) / P(H1) ≜ γ
φ(x) = 0 if L(x) < γ
The function L(x) = P(x |H1) / P(x |H0) is the likelihood ratio
and the detection process is called the likelihood ratio test
Maximum a posteriori (MAP) detector
We may rewrite
φ(x) = 0 if P(x |H0) P(H0) − P(x |H1) P(H1) > 0
φ(x) = 1 if P(x |H0) P(H0) − P(x |H1) P(H1) ≤ 0
into
φ(x) = arg max_{H ∈ {H0, H1}} P(x |H) P(H)
= arg max_{H ∈ {H0, H1}} P(H|x) P(x) = arg max_{H ∈ {H0, H1}} P(H|x)
Maximum likelihood (ML) detector
If P(H0) = P(H1)
φ(x) = arg max_{H ∈ {H0, H1}} P(x |H)
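A minimal sketch contrasting the Bayesian/MAP threshold with the ML threshold on a single observation (the priors, σ, and x[0] are arbitrary illustration choices):

```python
# Minimal sketch: LRT with MAP threshold gamma = P(H0)/P(H1) versus the
# ML threshold gamma = 1, for H0: x ~ N(0, sigma^2), H1: x ~ N(1, sigma^2).
from scipy.stats import norm

sigma, P_H0, P_H1 = 1.0, 0.25, 0.75
x0 = 0.4                                                 # one observation
L = norm.pdf(x0, 1.0, sigma) / norm.pdf(x0, 0.0, sigma)  # likelihood ratio
print(L > P_H0 / P_H1)  # MAP decision (True -> decide H1)
print(L > 1.0)          # ML decision (equal priors implied)
```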
For example, H0 : x[n] = w[n]
H1 : x[n] = A + w[n], n = 0,…, N-1
P(H0) = P(H1) = ½ and w[n] ~ N(0, σ²)
As a result, γ = P(H0)/P(H1) = 1, and
L(x) = P(x |H1)/P(x |H0) = exp[ −(1/(2σ²)) ( Σ_{n=0}^{N−1} (x[n] − A)² − Σ_{n=0}^{N−1} x[n]² ) ] > γ = 1
Or equivalently, taking logarithms and simplifying (for A > 0),
(1/N) Σ_{n=0}^{N−1} x[n] > A/2
Thus, we decide H1 if x̄ ≜ (1/N) Σ_{n=0}^{N−1} x[n] > A/2
Given that x̄ ~ N(0, σ²/N) under H0 and x̄ ~ N(A, σ²/N) under H1
As a result,
Pe = ½ P(x̄ > A/2 ; H0) + ½ P(x̄ < A/2 ; H1)
= ½ Q( (A/2)/√(σ²/N) ) + ½ [ 1 − Q( (−A/2)/√(σ²/N) ) ]
Since Q(−x) = 1 − Q(x), we have Pe = Q( √(NA²/(4σ²)) )
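A minimal Monte Carlo check of this closed form (sketch; all numeric values are arbitrary):

```python
# Minimal sketch: verify Pe = Q(sqrt(N*A^2/(4*sigma^2))) for the
# equal-prior ML rule "decide H1 if mean(x) > A/2" by simulation.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
N, A, sigma, trials = 10, 1.0, 1.0, 200_000
H1_true = rng.random(trials) < 0.5                 # equal priors
x_bar = A * H1_true + (sigma / np.sqrt(N)) * rng.standard_normal(trials)
Pe_mc = np.mean((x_bar > A / 2) != H1_true)
Pe_theory = norm.sf(np.sqrt(N * A**2 / (4 * sigma**2)))
print(Pe_mc, Pe_theory)  # should agree to ~3 decimal places
```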
For the MAP detector: φ(x) = arg max_{H ∈ {H0, H1}} P(H|x)
Thus, we decide H1 if P(H1|x) > P(H0|x)
Suppose we have only one observation x[0]
The decision region varies with the prior probabilities, e.g.,
(a) P(H0) = P(H1) = ½ versus (b) P(H0) = ¼, P(H1) = ¾
In more general cases, we want to design a detector φ(x) to
minimize the average cost of decision errors
Consider a cost function Cab of deciding Ha when Hb is true
Given the general set of costs {C00, C01, C10, C11}
The average cost, also called the Bayes risk, is given by
R = C00 P(H0 | H0) P(H0) + C01 P(H0 | H1) P(H1)
+ C10 P(H1 | H0) P(H0) + C11 P(H1 | H1) P(H1)
= C00 P(H0) ∫ [1 − φ(x)] P(x |H0) dx + C10 P(H0) ∫ φ(x) P(x |H0) dx
+ C01 P(H1) ∫ [1 − φ(x)] P(x |H1) dx + C11 P(H1) ∫ φ(x) P(x |H1) dx
R = C00 P(H0) + C01 P(H1) +
∫ φ(x) [ (C10 − C00) P(H0) P(x |H0) + (C11 − C01) P(H1) P(x |H1) ] dx
To minimize R:
φ(x) = 0 if (C10 − C00) P(H0) P(x |H0) + (C11 − C01) P(H1) P(x |H1) > 0
φ(x) = 1 if (C10 − C00) P(H0) P(x |H0) + (C11 − C01) P(H1) P(x |H1) ≤ 0
An alternative form
(C10 − C00) P(H0) P(x |H0) + (C11 − C01) P(H1) P(x |H1)
= [ C10 P(H0|x) + C11 P(H1|x) − C00 P(H0|x) − C01 P(H1|x) ] P(x)
≜ [ C1(x) − C0(x) ] P(x)
where Ci(x) = Ci0 P(H0|x) + Ci1 P(H1|x) is the conditional cost of deciding Hi
Therefore,
φ(x) = 0 if C1(x) − C0(x) > 0, i.e., C1(x) > C0(x)
φ(x) = 1 if C1(x) − C0(x) ≤ 0
Or, we can define two decision regions
R1 ≜ { x | C1(x) ≤ C0(x) }
R0 ≜ { x | C1(x) > C0(x) }
φ(x) = 1 if x ∈ R1
φ(x) = 0 if x ∈ R0
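A minimal sketch of this minimum-conditional-cost rule for the one-sample Gaussian example (the costs and priors are arbitrary; making C10 > C01 penalizes false alarms more heavily, which pushes the decision threshold above the earlier value of 1/2):

```python
# Minimal sketch: decide the hypothesis with the smaller conditional
# cost C_i(x), for H0: x ~ N(0,1), H1: x ~ N(1,1).
from scipy.stats import norm

P_H0, P_H1 = 0.5, 0.5
C00, C11, C10, C01 = 0.0, 0.0, 5.0, 1.0  # false alarms cost 5x misses

def decide(x0):
    # posteriors up to the common factor p(x), which cancels
    w0 = norm.pdf(x0, 0.0, 1.0) * P_H0
    w1 = norm.pdf(x0, 1.0, 1.0) * P_H1
    C0 = C00 * w0 + C01 * w1   # conditional cost of deciding H0
    C1 = C10 * w0 + C11 * w1   # conditional cost of deciding H1
    return 1 if C1 <= C0 else 0

print([decide(x) for x in (0.2, 0.8, 1.5, 2.5)])  # threshold ~ 0.5 + ln(5)
```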
Multiple hypothesis testing
Suppose we must decide among M hypotheses for which
H0 : x ~ P(x |H0), H1 : x ~ P(x |H1), …, HM-1 : x ~ P(x |HM-1)
The average cost is given by
R = Σ_i Σ_j Cij P(Hi | Hj) P(Hj)
Redefine the decision rule
φi(x) = P(decide Hi | x) ∈ [0, 1], with Σ_i φi(x) = 1
R = Σ_i Σ_j Cij ∫ φi(x) P(Hj |x) P(x) dx
Redefine the conditional cost Ci(x) = Σ_j Cij P(Hj|x)
And define the decision region for Hi as
Ri ≜ { x | Ci(x) ≤ Cj(x), ∀ j ≠ i }
The decision rule φi(x) that minimizes R is
φi(x) = 1 if x ∈ Ri, and φi(x) = 0 otherwise
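A minimal sketch for M = 3 with uniform error costs (Cij = 1 for i ≠ j, Cii = 0), in which case minimizing Ci(x) reduces to the MAP rule arg max_i P(Hi|x); the means and priors are arbitrary illustration choices:

```python
# Minimal sketch: M-ary minimum-cost decision with uniform error costs,
# which reduces to argmax_i P(Hi|x). Hypothesis Hi: x ~ N(means[i], 1).
import numpy as np
from scipy.stats import norm

means  = np.array([0.0, 1.0, 2.0])
priors = np.array([0.5, 0.3, 0.2])

def decide(x0):
    post = norm.pdf(x0, means, 1.0) * priors  # proportional to P(Hi|x)
    return int(np.argmax(post))

print([decide(x) for x in (-0.5, 0.8, 1.9, 3.0)])
```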
Estimation Theory
Let us recall the problems we talked about
x[n] = A1 + w[n], n=0,…,N-1
x[n] = A2 + B2 n + w[n], n=0,…,N-1
We define a parameter θ to represent the quantities to be
estimated, e.g. θ = A1 and θ = [A2, B2] in the above cases
We model the data by its probability density function
(PDF), assuming that the data are inherently random
As an example, we have a class of PDFs, each one different due to a
different value of θ, i.e. the PDFs are parameterized by θ
The parameter θ is assumed deterministic but unknown
We conclude for the time being that we are hoping to
have an estimator θ̂ that gives
An unbiased mean of the estimate: E(θ̂) = θ
A minimum variance for the estimate: var(θ̂) as small as possible
Is an unbiased estimator always the optimal estimator?
Consider a widely used optimality criterion:
the minimum mean squared error (MSE) criterion, mse(θ̂) = E[(θ̂ − θ)²]
As an example, consider the modified estimator Ă = (a/N) Σ_{n=0}^{N−1} x[n] for the DC level A
We attempt to find the ‘a’ which yields the minimum MSE
Since E(Ă) = aA and var(Ă) = a²σ²/N, we have
mse(Ă) = a²σ²/N + (a − 1)²A²
Taking the derivative w.r.t. a and setting it to zero leads to
a_opt = A² / (A² + σ²/N)
The optimal value of a depends upon the unknown parameter
A ⇒ the estimator is not realizable
How do we resolve this dilemma?
Since mse(Ă) = var(Ă) + b², where b is the bias, as an alternative
we set b = 0 and search for the estimator that minimizes var(Ă)
⇒ the minimum variance unbiased (MVU) estimator
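A minimal numeric check of this result (sketch; A, σ, and N are arbitrary, and the point is precisely that a_opt involves the unknown A):

```python
# Minimal sketch: sweep a and confirm the MSE minimizer matches
# a_opt = A^2 / (A^2 + sigma^2/N), which depends on the unknown A.
import numpy as np

A, sigma, N = 1.0, 1.0, 10
a = np.linspace(0.0, 1.5, 301)
mse = a**2 * sigma**2 / N + (a - 1)**2 * A**2
a_opt = A**2 / (A**2 + sigma**2 / N)
print(a[np.argmin(mse)], a_opt)  # both ~0.91 for these values
```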
However, does an MVU estimator always exist for all θ?
Example:
x[0] ~ N(θ, 1)
x[1] ~ N(θ, 1) if θ ≥ 0
~ N(θ, 2) if θ < 0
Consider the estimators θ̂1 = (x[0] + x[1])/2 and θ̂2 = (2x[0] + x[1])/3
Both of the two estimators are unbiased
Therefore, var(θ̂1) = 18/36 and var(θ̂2) = 20/36 for θ ≥ 0,
while var(θ̂1) = 27/36 and var(θ̂2) = 24/36 for θ < 0
Neither estimator has a variance uniformly less than or
equal to 18/36 ⇒ no MVU estimator exists for this problem
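A minimal Monte Carlo check of these variances (sketch; θ = ±1 are arbitrary representatives of the two regimes):

```python
# Minimal sketch: estimate var(theta1_hat) and var(theta2_hat) by
# simulation; expect (18, 20)/36 for theta >= 0 and (27, 24)/36 otherwise.
import numpy as np

rng = np.random.default_rng(3)
for theta in (1.0, -1.0):
    s1 = 1.0 if theta >= 0 else np.sqrt(2.0)   # std of x[1] per regime
    x0 = theta + rng.standard_normal(500_000)
    x1 = theta + s1 * rng.standard_normal(500_000)
    t1 = (x0 + x1) / 2
    t2 = (2 * x0 + x1) / 3
    print(theta, 36 * t1.var(), 36 * t2.var())
```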
Is there a systematic way to find the MVU estimator if it exists?
We start by defining the set of data that is sufficient for
estimation
What do we mean by sufficiency in estimation?
We want a statistic T(x) such that, given T(x), the
individual data x[n] carry no further information about A
Suppose Â1 = (1/N) Σ_{n=0}^{N−1} x[n]; then are the following sufficient?
S1 = {x[0], x[1],…, x[N−1]}
S2 = Σ_{n=0}^{N−1} x[n]
Given T0 = Σ_{n=0}^{N−1} x[n], do we still need the individual data?
We say the conditional PDF p(x | Σ x[n] = T0 ; A)
should not depend on A if T0 is sufficient
E.g., consider two candidate conditional PDFs, (a) and (b):
for (a), a value of A near A0 is more likely even given T0
for (b), however, p(x | Σ x[n] = T0 ; A) is a constant in A
Now, we need to determine p(x | Σ x[n] = T0 ; A) to
show that Σ x[n] = T0 is sufficient
By Bayes' rule,
p(x | T(x) = T0 ; A) = p(x, T(x) = T0 ; A) / p(T(x) = T0 ; A)
Since T(x) is a direct function of x, the joint PDF collapses:
p(x, T(x) = T0 ; A) = p(x ; A) δ(T(x) − T0)
Clearly, we have
p(x | T(x) = T0 ; A) = p(x ; A) δ(T(x) − T0) / p(T0 ; A)
Thus, we have, on the set where T(x) = T0,
p(x ; A) ∝ exp[ −(1/(2σ²)) ( Σ_{n=0}^{N−1} x[n]² − 2A·T0 + N·A² ) ]
Since T(x) = Σ_{n=0}^{N−1} x[n] ~ N(NA, Nσ²),
p(T0 ; A) ∝ exp[ −(T0 − NA)²/(2Nσ²) ] = exp[ −(1/(2σ²)) ( T0²/N − 2A·T0 + N·A² ) ]
Thus the A-dependent terms cancel in the ratio, and
p(x | T(x) = T0 ; A) ∝ exp[ −(1/(2σ²)) ( Σ x[n]² − T0²/N ) ] δ(T(x) − T0)
which does not depend on A ⇒ T(x) = Σ x[n] is a sufficient statistic for A
In general, identifying potential sufficient statistics is
difficult
An efficient procedure for finding sufficient statistics
is to employ the Neyman-Fisher factorization theorem:
If we can factor p(x ; θ) into p(x ; θ) = g(T(x), θ) h(x)
where
g is a function depending on x only through T(x)
h is a function depending only on x
⇒ T(x) is a sufficient statistic for θ
The converse is also true
If T(x) is a sufficient statistic ⇒ p(x ; θ) = g(T(x), θ) h(x)
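As a worked instance (standard algebra for the DC-level-in-WGN model; not spelled out on the slide), the factorization for estimating A is:

```latex
p(\mathbf{x};A)
 = \frac{1}{(2\pi\sigma^2)^{N/2}}
   \exp\!\Big[-\tfrac{1}{2\sigma^2}\textstyle\sum_{n=0}^{N-1}(x[n]-A)^2\Big]
 = \underbrace{\exp\!\Big[-\tfrac{1}{2\sigma^2}\big(NA^2-2A\,T(\mathbf{x})\big)\Big]}_{g(T(\mathbf{x}),\,A)}
   \underbrace{\frac{1}{(2\pi\sigma^2)^{N/2}}
   \exp\!\Big[-\tfrac{1}{2\sigma^2}\textstyle\sum_{n=0}^{N-1}x[n]^2\Big]}_{h(\mathbf{x})},
 \qquad T(\mathbf{x})=\sum_{n=0}^{N-1}x[n]
```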
Recall that p(x ; A) factors in exactly this way (see the sketch above), with T(x) = Σ x[n] sufficient for A
On the other hand, suppose we want to estimate σ² of y[n] = A + x[n]
Suppose A is given; then define x[n] = y[n] − A
Clearly, T(x) = Σ_{n=0}^{N−1} x²[n] is a sufficient statistic for σ²
Ex. we want to estimate the phase θ of a sinusoid
x[n] = A cos(2π f0 n + θ) + w[n], n=0,1,…,N−1
Suppose A and f0 are given
Expand the exponent of p(x ; θ):
Σ (x[n] − A cos(2π f0 n + θ))² = Σ x[n]² − 2A Σ x[n] cos(2π f0 n + θ) + A² Σ cos²(2π f0 n + θ)
with cos(2π f0 n + θ) = cos(2π f0 n) cos θ − sin(2π f0 n) sin θ
In this case, no single sufficient statistic exists; however, with
T1(x) = Σ x[n] cos(2π f0 n) and T2(x) = Σ x[n] sin(2π f0 n),
the PDF factors as p(x ; θ) = g(T1(x), T2(x), θ) · h(x),
so T1(x) and T2(x) are jointly sufficient for θ
The r statistics T1(x), T2(x),…, Tr(x) are jointly sufficient
if p(x | T1(x), T2(x),…, Tr(x) ; θ) does not depend on θ
If p(x ; θ) = g(T1(x), T2(x),…, Tr(x), θ) h(x)
⇒ {T1(x), T2(x),…, Tr(x)} are sufficient statistics for θ
Now, we know how to obtain the sufficient statistics
How do we apply them to help obtain the MVU estimator?
The Rao-Blackwell-Lehmann-Scheffé theorem
If θ̌ is an unbiased estimator of θ and T(x) is a sufficient
statistic for θ, then θ̂ = E(θ̌ | T(x)) is unbiased and
A valid estimator for θ (not dependent on θ)
Of lesser or equal variance than that of θ̌, for all θ
If T(x) is complete, then θ̂ is the MVU estimator
Finally, a statistic is complete if there is only one function,
say g, of the statistic that is unbiased
θ̂ = E(θ̌ | T(x)) is solely a function of T(x)
If T(x) is complete ⇒ θ̂ is unique and unbiased
Besides, the same θ̂ results for any unbiased estimator θ̌
Then, θ̂ must be the MVU estimator
In summary, the MVU estimator can be found by
Taking any unbiased θ̌ and carrying out θ̂ = E(θ̌ | T(x))
Alternatively, since there is only one function of T(x) that
leads to an unbiased estimator,
find the unique g(T(x)) that makes θ̂ = g(T(x)) unbiased
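As a worked instance of the first route (standard result, not spelled out on the slide): for the DC level A in WGN with T(x) = Σ x[n], start from the unbiased θ̌ = x[0]; since x[0] and T(x) are jointly Gaussian,

```latex
\hat{A} = E\big[x[0]\,\big|\,T(\mathbf{x})\big]
        = A + \frac{\operatorname{cov}(x[0],\,T(\mathbf{x}))}{\operatorname{var}(T(\mathbf{x}))}\,\big(T(\mathbf{x})-NA\big)
        = A + \frac{\sigma^2}{N\sigma^2}\big(T(\mathbf{x})-NA\big)
        = \frac{1}{N}\sum_{n=0}^{N-1}x[n] = \bar{x}
```

Note that the unknown A cancels, so the result is a valid estimator (it does not depend on A), and it is the sample mean, the MVU estimator for this problem.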