CHAPTER THREE: Eigenanalysis and Performance Surface
CHAPTER FOUR: Search Methods
Adaptive Filters: Theory and Applications
Behrouz Farhang-Boroujeny
EIGENVALUES AND EIGENVECTORS
Let R be an N-by-N correlation matrix. A nonzero N-by-1 vector q is said to be an eigenvector of R if it satisfies the equation

Rq = λq    (4.2)

for some scalar constant λ. The scalar λ is called the eigenvalue of R associated with the eigenvector q.
We note that if q is an eigenvector of R, then for any nonzero scalar a, aq is also an eigenvector of R, corresponding to the same eigenvalue λ.
To find the eigenvalues and eigenvectors of R, we note that Eq. (4.2) may be rearranged as

(R − λI)q = 0.

This homogeneous system has a nonzero solution q only if det(R − λI) = 0, which is known as the characteristic equation of R.
Example: find the eigenvalues and eigenvectors of R = [2 0; 0 3].
Solution:
det(R − λI) = 0  ⇒  det([2 0; 0 3] − λ·[1 0; 0 1]) = 0
det([2−λ 0; 0 3−λ]) = 0  ⇒  (2 − λ)(3 − λ) = 0
λ = 2 and λ = 3
For λ = 2:
(R − λI)q = 0  ⇒  [2−2 0; 0 3−2] · [x; y] = [0; 0]
[0·x + 0·y; 0·x + 1·y] = [0; 0]  ⇒  y = 0
q1 = [x; 0] where x ≠ 0; for simplicity, q1 = [1; 0].
In the same manner, for λ = 3:
q2 = [0; y] where y ≠ 0; for simplicity, q2 = [0; 1].
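These results are easy to confirm numerically. The following is a minimal NumPy sketch (added for illustration, not part of the original slides); it recovers the same eigenvalues and unit-length eigenvectors:

```python
import numpy as np

R = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# eigh is used because R is symmetric (Hermitian); it returns real eigenvalues
# in ascending order and orthonormal eigenvectors as the columns of Q.
eigenvalues, Q = np.linalg.eigh(R)
print(eigenvalues)   # [2. 3.]
print(Q)             # columns are q1 = [1, 0]^T and q2 = [0, 1]^T
```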
Example 1: Find the eigenvalues and the eigenvectors of the matrix A = [1 3; 4 2].
Solution:
A − λI = [1 3; 4 2] − λ·[1 0; 0 1] = [1−λ 3; 4 2−λ]
In general, to form A − λI from A, one simply subtracts λ from the diagonal entries.
∴ The characteristic equation is
det(A − λI) = (1 − λ)(2 − λ) − 3·4 = 0
∴ 2 − λ − 2λ + λ² − 12 = 0
λ² − 3λ − 10 = 0  ⇒  (λ − 5)(λ + 2) = 0
So the eigenvalues are λ1 = 5 and λ2 = −2.
For λ1 = 5:
Let V = [x; y] be the eigenvector of the matrix A corresponding to λ1 = 5.
∵ (A − λI)·V = 0
∴ ([1 3; 4 2] − 5·[1 0; 0 1]) · [x; y] = [0; 0]
[1−5 3; 4 2−5] · [x; y] = [0; 0]
[−4x + 3y; 4x − 3y] = [0; 0]
So
−4x + 3y = 0
4x − 3y = 0
Since the second equation is the negative of the
first, any solution to the first equation is also a
solution to the second. So it suffices to solve the
first equation.
x = (3/4)·y

y:  1    4   8   ...
x:  3/4  3   6   ...

So any multiple of the vector V1 = [3; 4] is an eigenvector for λ1 = 5.
For λ2 = −2:
Let V = [x; y] be the eigenvector of the matrix A corresponding to λ2 = −2.
∵ (A − λI)·V = 0
∴ ([1 3; 4 2] + 2·[1 0; 0 1]) · [x; y] = [0; 0]
[1+2 3; 4 2+2] · [x; y] = [0; 0]
[3x + 3y; 4x + 4y] = [0; 0]
So
3x + 3y = 0
4x + 4y = 0
Since the second equation is a multiple of the first, any solution to the first equation is also a solution to the second. So it suffices to solve the first equation:
x = −y

y:  −1   1    2   ...
x:   1  −1   −2   ...

So any multiple of the vector V2 = [1; −1] is an eigenvector for λ2 = −2.
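A numerical check of Example 1 (again a sketch added for illustration, not from the original slides) confirms both eigenpairs. Note that NumPy returns unit-length eigenvectors, which are scalar multiples of the hand-computed V1 = [3; 4] and V2 = [1; −1]:

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [4.0, 2.0]])

# A is not symmetric, so the general eig routine is used here.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                    # [ 5. -2.]  (order may differ)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # Each column v satisfies A v = lambda v up to floating-point error.
    print(lam, v, np.allclose(A @ v, lam * v))
```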
PROPERTIES OF EIGENVALUES AND EIGENVECTORS
Some of the properties derived here are directly related to the fact that the correlation matrix R is Hermitian and nonnegative definite.
A matrix A, in general, is said to be Hermitian (or self-adjoint) if A^H = A, where A^H = (A*)^T denotes the conjugate (Hermitian) transpose of A.
The N-by-N Hermitian matrix A is said to be nonnegative definite, or positive semidefinite, if
v^H · A · v ≥ 0
for every N-by-1 complex vector v.
The fact that A is Hermitian implies that v^H · A · v is real-valued.
In practice, the correlation matrix R is almost always positive definite.
1. The eigenvalues of the correlation matrix R are all real and nonnegative.
2. If qi and qj are two eigenvectors of the correlation matrix R that correspond to two of its distinct eigenvalues, then qi^H · qj = 0. In other words, eigenvectors associated with the distinct eigenvalues of the correlation matrix R are mutually orthogonal.
3. Assume the eigenvectors q0, q1, . . . , qN−1 are all normalized to have a length of unity, and define the N-by-N matrix Q = [q0 q1 · · · qN−1]. Q is then a unitary matrix, i.e., Q^H · Q = I. This implies that the matrices Q and Q^H are the inverse of each other.
4. For any N-by-N correlation matrix R, one can always find a set of N mutually orthogonal eigenvectors. Such a set may be used as a basis to express any vector in the N-dimensional space of complex vectors.
5. Unitary Similarity Transformation. The correlation matrix R can always be decomposed as
R = Q · Λ · Q^H,
where Q = [q0 q1 · · · qN−1] is the unitary matrix of eigenvectors and Λ = diag(λ0, λ1, . . . , λN−1) is the diagonal matrix of the associated eigenvalues.
6. Let λ0, λ1, . . . , λN−1 be the eigenvalues of the correlation matrix R. Then
λ0 + λ1 + · · · + λN−1 = tr[R],
where tr[R] denotes the trace of R and is defined as the sum of the diagonal elements of R.
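Properties 1, 3, 5, and 6 can be verified numerically. The sketch below (added for illustration; the correlation matrix with autocorrelation r(k) = 0.8^|k| is an arbitrary example, not one from the slides) builds a symmetric Toeplitz correlation matrix and checks each claim:

```python
import numpy as np

N = 4
k = np.arange(N)
R = 0.8 ** np.abs(k[:, None] - k[None, :])   # Toeplitz correlation matrix, r(k) = 0.8**|k|

lam, Q = np.linalg.eigh(R)                   # real eigenvalues, orthonormal eigenvectors
Lam = np.diag(lam)

print(np.all(lam >= 0))                      # Property 1: real, nonnegative eigenvalues
print(np.allclose(Q.T @ Q, np.eye(N)))       # Property 3: Q is unitary (orthogonal here)
print(np.allclose(Q @ Lam @ Q.T, R))         # Property 5: R = Q Lambda Q^H
print(np.isclose(lam.sum(), np.trace(R)))    # Property 6: sum of eigenvalues = tr[R]
```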
7. Minimax Theorem (Courant–Fischer). Arrange the eigenvalues of R in decreasing order, λ0 ≥ λ1 ≥ · · · ≥ λN−1, and define the Rayleigh quotient r(q) = (q^H · R · q)/(q^H · q). Then λ0 is the maximum of r(q) over all nonzero q, and
λi = min over subspaces S with dim(S) = i of { max over nonzero q ⊥ S of r(q) },
for i = 1, 2, . . . , N − 1.
8. The eigenvalues of the correlation matrix R of a discrete-time stationary stochastic process {x(n)} are bounded by the minimum and maximum values of the power spectral density, Φxx(e^jω), of the process:
min over ω of Φxx(e^jω) ≤ λi ≤ max over ω of Φxx(e^jω).
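Property 8 can be illustrated with the correlation sequence r(k) = a^|k|, whose power spectral density Φxx(e^jω) = (1 − a²)/(1 − 2a·cos ω + a²) has minimum (1 − a)/(1 + a) and maximum (1 + a)/(1 − a). The sketch below (an added illustration with arbitrary parameters, not from the slides) checks that all eigenvalues of the N-by-N correlation matrix fall inside these bounds:

```python
import numpy as np

a, N = 0.8, 8
k = np.arange(N)
R = a ** np.abs(k[:, None] - k[None, :])      # correlation matrix with r(k) = a**|k|

lam = np.linalg.eigvalsh(R)
psd_min = (1 - a) / (1 + a)                   # minimum of the PSD, attained at omega = pi
psd_max = (1 + a) / (1 - a)                   # maximum of the PSD, attained at omega = 0

print(lam)
print(np.all((lam >= psd_min) & (lam <= psd_max)))   # True: eigenvalues bounded by the PSD
```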
9. Karhunen–Loève expansion. The tap-input vector x(n) of a stationary process may be expanded as x(n) = Σ (i = 0 to N−1) κi(n)·qi, where κi(n) = qi^H · x(n). The coefficients κi(n) are mutually uncorrelated, with E[|κi(n)|²] = λi.
THE CANONICAL FORM OF
THE ERROR-PERFORMANCE SURFACE
We recall from the last lecture that the performance function (mean-squared error, MSE) of a transversal Wiener filter with a real-valued input sequence x(n) and a desired output sequence d(n) is
ξ = E[e²(n)] = E[d²(n)] − 2·w^T·p + w^T·R·w,
where R = E[x(n)·x^T(n)] is the correlation matrix of the tap-input vector x(n) and p = E[d(n)·x(n)] is the cross-correlation vector.
Also, we recall that the optimum value of the Wiener filter tap-weight vector is obtained from the Wiener-Hopf equation
R·wo = p.
The performance function ξ may be rearranged as follows:
ξ = ξmin + (w − wo)^T · R · (w − wo),
where ξmin = E[d²(n)] − wo^T·p is the minimum mean-squared error, attained at w = wo.
Next, we use the eigen-decomposition to express the correlation matrix R of the tap-input vector in terms of its eigenvalues and associated eigenvectors (see Appendix E in the Haykin textbook):
R = Q · Λ · Q^T.
Defining v = w − wo and the rotated coordinates v′ = Q^T·v, the quadratic term becomes v^T·R·v = v′^T·Λ·v′.
This new formulation of the mean-square error contains no cross-product terms, as shown by
ξ = ξmin + Σ (k = 0 to N−1) λk · v′k²,
where v′k is the kth component of the vector v′. This is the canonical form of the error-performance surface: its principal axes are aligned with the eigenvectors of R, and the curvature along the kth axis is set by λk.
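The canonical form is easy to verify numerically for an arbitrary choice of R, p, and E[d²(n)]. The sketch below (added for illustration; all numbers are assumed, not taken from the slides) evaluates the MSE at a random tap-weight vector both directly and through the canonical form:

```python
import numpy as np

# Assumed second-order statistics for a two-tap filter (illustrative only).
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.7, 0.3])
Ed2 = 2.0                                  # E[d^2(n)]

w_o = np.linalg.solve(R, p)                # Wiener-Hopf solution
xi_min = Ed2 - w_o @ p                     # minimum MSE

lam, Q = np.linalg.eigh(R)

w = np.array([1.3, -0.4])                  # an arbitrary tap-weight vector
v = w - w_o
v_prime = Q.T @ v

xi_direct = Ed2 - 2 * w @ p + w @ R @ w    # quadratic form of the MSE
xi_canonical = xi_min + np.sum(lam * v_prime**2)
print(np.isclose(xi_direct, xi_canonical)) # True: the two expressions agree
```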
Example 4.3: Consider the case where a two-tap
transversal Wiener filter is characterized by the
following parameters:
We want to explore the performance surface of this
filter for values of α ranging from 0 to 1.
The performance function of the filter is obtained
by substituting the above parameters in Eq. (4.81).
This gives
Solving the Wiener–Hopf equation to obtain the
optimum tap weights of the filter, we obtain:
To convert this to its canonical form, we should
first find the eigenvalues and eigenvectors of R.
To find the eigenvalues of R, we should solve the
characteristic equation
SEARCH METHODS
1. METHOD OF STEEPEST DESCENT
We recall from Chapter 2 that the optimum tap-weight vector wo is the one that minimizes the performance function
ξ = E[e²(n)],
where e(n) = d(n) − y(n) is the estimation error of the Wiener filter. Also, we recall that the performance function ξ can be expanded as
ξ = E[d²(n)] − 2·w^T·p + w^T·R·w.
Here, we assume that R and p are available. This direct approach, however, requires difficult arithmetic circuitry, since it involves the (computationally challenging) inversion of the matrix R, and it is not suitable for many applications. Therefore, in the next section we introduce another approach to finding the tap-weight vector w.
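For reference, the direct (non-iterative) solution is a one-liner in NumPy. The sketch below (an added illustration with assumed statistics) solves the Wiener-Hopf equation without forming R⁻¹ explicitly:

```python
import numpy as np

# Assumed statistics of a two-tap example (illustrative values only).
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.7, 0.3])

# Solving R w_o = p with a linear solver is cheaper and numerically safer
# than computing the inverse, but the cost still grows as O(N^3).
w_o = np.linalg.solve(R, p)
print(w_o)
```

For large N, or when the statistics change over time, the iterative search methods discussed next avoid this cost.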
ITERATIVE SEARCH METHOD
In this chapter we present a set of algorithms that
iteratively search for the minimum of the cost function.
These algorithms rely on (at least) the gradient of the cost function, and so they are often called deterministic gradient algorithms.
In order for the cost function to depend only on the filter w,
the statistics Rx and rxd must be given.
In this way, these algorithms solve the Wiener-Hopf
equation iteratively, most of them without requiring the
(computationally challenging) inversion of the matrix Rx.
However, all the information from the environment is
captured in the second-order statistics and these
algorithms do not have a learning mechanism for adapting
to changes in the environment. In the next chapter we will
see how adaptive filters solve this issue.
STEEPEST DESCENT ALGORITHM
The method of steepest descent is a general scheme
that uses the following steps to search for the
minimum point of any convex function of a set of
parameters:
1. Start with an initial guess of the parameters whose
optimum values are to be found for minimizing the
function.
2. Find the gradient of the function with respect to
these parameters at the present point.
3. Update the parameters by taking a step in the
opposite direction of the gradient vector obtained in
Step 2. This corresponds to a step in the direction of
steepest descent in the cost function at the present
point. Furthermore, the size of the step taken is
chosen proportional to the size of the gradient
vector.
4. Repeat Steps 2 and 3 until no further significant change is observed in the parameters.
To implement this procedure in the case of the transversal filter shown in Figure 5.1, we recall from Chapter 2 that the gradient of the performance function is
∇ξ = 2(R·w − p).
Using this in Step 3, the steepest-descent update of the tap-weight vector becomes
w(k + 1) = w(k) − μ·∇ξ(k) = w(k) − 2μ(R·w(k) − p),
where μ is a positive scalar called the step-size parameter and k denotes the iteration index.
As we shall soon show, the convergence of w(k) to the optimum solution wo, and the speed at which this convergence takes place, depend on the size of the step-size parameter μ. A large step size may result in divergence of this recursive equation, which may be written in the form
w(k + 1) = (I − 2μR)·w(k) + 2μp,
where I is the N-by-N identity matrix.
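The recursion is only a few lines of code. The sketch below (an added illustration; the statistics R and p are assumed example values, and the loop implements the update w(k+1) = w(k) − 2μ(Rw(k) − p) given above) iterates until the weights settle near the Wiener solution:

```python
import numpy as np

# Assumed example statistics (not from the slides).
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.7, 0.3])
w_o = np.linalg.solve(R, p)                 # target of the iteration

lam_max = np.linalg.eigvalsh(R).max()
mu = 0.4 / lam_max                          # step size chosen inside 0 < mu < 1/lambda_max

w = np.zeros(2)                             # initial guess w(0) = 0
for k in range(200):
    gradient = 2 * (R @ w - p)              # gradient of the MSE at the current point
    w = w - mu * gradient                   # step against the gradient
print(w, w_o)                               # w converges toward the Wiener solution
```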
THE V-AXES
We substitute for p from Eq. (5.6), i.e., p = R·wo. Also, we subtract wo from both sides of Eq. (5.11) and rearrange the result to obtain
v(k + 1) = (I − 2μR)·v(k),
where v(k) = w(k) − wo. This is the tap-weight update equation in terms of the v-axes.
THE V′-AXES
Recall that R has the following unitary similarity decomposition:
R = Q · Λ · Q^T.
Defining the rotated coordinates v′(k) = Q^T·v(k), the update equation becomes
v′(k + 1) = (I − 2μΛ)·v′(k).
Since Λ is diagonal, this vector recursive equation (Eq. (5.18)) may be separated into the scalar recursive equations
v′i(k + 1) = (1 − 2μλi)·v′i(k),  for i = 0, 1, . . . , N − 1.
For every mode to converge, we require |1 − 2μλi| < 1 for all i, i.e., the step-size parameter μ is selected so that
0 < μ < 1/λmax,
where λmax is the largest eigenvalue of R.
Starting with an initial value w(0) = [w0(0) w1(0)]^T and letting the recursive equation (5.29) run, we get two sequences of the tap-weight variables w0(k) and w1(k).
LEARNING CURVE
The curve obtained by plotting ξ(k) as a function of the iteration index, k, is called the learning curve. As can be seen from Eq. (5.31), the learning curve of the steepest-descent algorithm consists of a sum of N exponentially decaying terms, each of which corresponds to one of the modes of convergence of the algorithm. Each exponential term may be characterized by a time constant, obtained by matching the geometric decay factor of that mode to an exponential decay of the form e^(−k/τ).
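The sketch below (an added illustration reusing the assumed R, p, and E[d²(n)] from the earlier examples) computes the learning curve ξ(k) for the steepest-descent recursion and prints the per-mode decay factors 1 − 2μλi, whose magnitudes determine how many iterations each mode needs to die out:

```python
import numpy as np

# Assumed example statistics (illustrative only).
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.7, 0.3])
Ed2 = 2.0

w_o = np.linalg.solve(R, p)
xi_min = Ed2 - w_o @ p
lam = np.linalg.eigvalsh(R)
mu = 0.1

w = np.zeros(2)
learning_curve = []
for k in range(100):
    learning_curve.append(Ed2 - 2 * w @ p + w @ R @ w)   # xi(k) at the current weights
    w = w - 2 * mu * (R @ w - p)                         # steepest-descent update

print(1 - 2 * mu * lam)            # geometric decay factor of each mode
print(learning_curve[-1], xi_min)  # xi(k) approaches xi_min as k grows
```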
The existence of two distinct time constants on the learning curve in
Figure 5.5 is clearly observed.
EFFECT OF EIGENVALUE SPREAD
Our study in the last two sections shows that the
performance of the steepest-descent algorithm is highly
dependent on the eigenvalues of the correlation matrix R.
In general, a wider spread of the eigenvalues results in a
poorer performance of the steepest-descent algorithm.
To gain further insight into this property of the steepest-descent algorithm, we find the optimum value of the step-size parameter μ, i.e., the value that results in the fastest possible convergence of the algorithm.
THE GEOMETRICAL RATIO FACTOR
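As a numerical illustration of how the geometric ratio factors 1 − 2μλi govern convergence, the sketch below (an added example; the two correlation matrices and the balanced step size μ = 1/(λmin + λmax) are assumptions for illustration, not values from the slides) compares a small and a large eigenvalue spread:

```python
import numpy as np

def worst_ratio(R):
    """Magnitude of the slowest geometric ratio factor |1 - 2*mu*lambda_i|
    when the step size is balanced as mu = 1/(lambda_min + lambda_max)."""
    lam = np.linalg.eigvalsh(R)
    mu = 1.0 / (lam.min() + lam.max())
    return np.max(np.abs(1 - 2 * mu * lam))

R_small_spread = np.array([[1.0, 0.1],
                           [0.1, 1.0]])   # eigenvalue spread close to 1
R_large_spread = np.array([[1.0, 0.9],
                           [0.9, 1.0]])   # eigenvalue spread of 19

print(worst_ratio(R_small_spread))        # ~0.1: fast decay of the slowest mode
print(worst_ratio(R_large_spread))        # ~0.9: slow decay, many more iterations needed
```

With this balanced step size, the dominant ratio equals (λmax − λmin)/(λmax + λmin), so a wider eigenvalue spread pushes it toward 1 and slows convergence.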
NEWTON’S METHOD
Our discussions in the last few sections show that
the steepest-descent algorithm may suffer from
slow modes of convergence, which arise due to
the spread in the eigenvalues of the correlation
matrix R.
This means that if we could somehow remove the effect of the eigenvalue spread, we could obtain much better convergence performance. This is exactly what Newton's method does.
To derive Newton's method for the quadratic case, we start from the steepest-descent algorithm given in Eq. (5.10). Using p = R·wo, Eq. (5.10) becomes
w(k + 1) = w(k) − 2μR·(w(k) − wo).
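A common form of Newton's recursion for this quadratic cost premultiplies the gradient by R⁻¹, so that every mode decays at the same rate regardless of the eigenvalue spread. The following sketch (an added illustration; the statistics are assumed example values and the update shown is the standard Newton step for a quadratic cost) demonstrates this behavior:

```python
import numpy as np

# Assumed example statistics (illustrative only).
R = np.array([[1.0, 0.9],
              [0.9, 1.0]])                  # large eigenvalue spread
p = np.array([0.7, 0.3])
w_o = np.linalg.solve(R, p)
mu = 0.25

w_sd = np.zeros(2)                          # steepest descent
w_nt = np.zeros(2)                          # Newton's method
for k in range(50):
    grad_sd = 2 * (R @ w_sd - p)
    w_sd = w_sd - mu * grad_sd
    grad_nt = 2 * (R @ w_nt - p)
    w_nt = w_nt - mu * np.linalg.solve(R, grad_nt)   # Newton step: R^{-1} times the gradient

print(np.linalg.norm(w_sd - w_o))           # still noticeably off for this eigenvalue spread
print(np.linalg.norm(w_nt - w_o))           # essentially converged
```

With the Newton step, all modes share the same decay factor 1 − 2μ, so the convergence rate no longer depends on the eigenvalues of R.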