Iterative Methods for Solving Linear Systems
Dr. Nachiketa Mishra
Indian Institute of Information Technology Design and Manufacturing Kancheepuram
Chennai-600127, India
March 26, 2024
Iterative method
▶ Solving a linear system of equations Ax = b can be approached using two
main methods: direct methods and iterative methods.
▶ Indirect methods are almost always iterative in nature: a simple process is
applied repeatedly to generate a sequence of successive approximations to the solution.
▶ For large linear systems containing thousands of equations, iterative
methods often have decisive advantages over direct methods in terms of
speed and demands on computer memory.
▶ Sometimes, if the accuracy requirements are not stringent, a modest number
of iterations will suffice to produce an acceptable solution.
▶ For sparse systems (in which a large proportion of the elements in A are 0),
iterative methods are often very efficient.
Iterative methods
To convey the general idea, we describe two fundamental iterative methods.
Example: Consider the linear system
[  7  −6 ] [ x1 ]   [  3 ]
[ −8   9 ] [ x2 ] = [ −4 ]
How can it be solved by an iterative process?
Solution: A straightforward procedure would be to solve the ith equation for
the ith unknown as follows:

x_1^(k) = (6/7) x_2^(k−1) + 3/7
x_2^(k) = (8/9) x_1^(k−1) − 4/9
Jacobi method
▶ This is known as the Jacobi method or iteration.
▶ Initially, we select for x_1^(0) and x_2^(0) the best available guess for the solution,
or simply set them to 0.
▶ The equations above then generate what we hope are improved values, x_1^(1)
and x_2^(1).
▶ The process is repeated a prescribed number of times or until a certain
precision appears to have been achieved in the vector (x_1^(k), x_2^(k))^T.
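To make the scheme concrete, here is a minimal sketch (an illustration, not part of the original slides) of the Jacobi sweep for the 2×2 example above, in plain Python:

```python
# Jacobi iteration for  7*x1 - 6*x2 = 3,  -8*x1 + 9*x2 = -4
x1, x2 = 0.0, 0.0                      # initial guess x^(0) = (0, 0)^T
for k in range(50):
    # both updates use only the values from the previous iterate
    x1_new = (6.0 / 7.0) * x2 + 3.0 / 7.0
    x2_new = (8.0 / 9.0) * x1 - 4.0 / 9.0
    x1, x2 = x1_new, x2_new
print(x1, x2)                          # drifts toward the solution (0.2, -0.26667)
```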
Jacobi iteration
Here are some selected values of the iterates of the Jacobi method for this
example:
k        x_1^(k)       x_2^(k)
0        0.00000       0.00000
10       0.14865      -0.19820
20       0.18682      -0.24909
30       0.19662      -0.26215
40       0.19913      -0.26551
50       0.19978      -0.26637
Gauss-Seidel method
▶ It is apparent that this iterative process could be modified so the newest
value x_1^(k) is used immediately in the second equation. The resulting
method is called the Gauss-Seidel method or iteration.
▶ Its equations are

x_1^(k) = (6/7) x_2^(k−1) + 3/7
x_2^(k) = (8/9) x_1^(k) − 4/9
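As a companion to the Jacobi sketch above, a minimal Gauss-Seidel sweep for the same system (again just an illustration) only changes which value of x1 feeds the second update:

```python
# Gauss-Seidel iteration for the same 2x2 system
x1, x2 = 0.0, 0.0
for k in range(50):
    x1 = (6.0 / 7.0) * x2 + 3.0 / 7.0   # uses x2 from the previous sweep
    x2 = (8.0 / 9.0) * x1 - 4.0 / 9.0   # uses the x1 computed just above
print(x1, x2)                            # approaches (0.2, -0.26667) faster than Jacobi
```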
Gauss-Seidel iteration
▶ Some of the output from the Gauss-Seidel method follows:
k        x_1^(k)       x_2^(k)
0        0.00000       0.00000
10       0.21978      -0.24909
20       0.20130      -0.26531
30       0.20009      -0.26659
40       0.20001      -0.26666
50       0.20000      -0.26667
▶ Both the Jacobi and the Gauss-Seidel iterates seem to be converging to the
same limit, and the latter is converging faster.
▶ Also, notice that, in contrast to a direct method, the precision we obtain in
the solution depends on when the iterative process is halted.
Basic idea : splitting methods
▶ We now consider iterative methods in a more general mathematical setting.
A general type of iterative process for solving the system
Ax = b (1)
can be described as follows: A certain matrix Q, called the splitting matrix,
is prescribed, and the original problem is rewritten in the equivalent form
Qx = (Q − A)x + b (2)
▶ Equation (2) suggests an iterative process, defined by writing
Qx^(k) = (Q − A)x^(k−1) + b    (k ≥ 1)    (3)
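A minimal NumPy sketch of this splitting iteration (the function name and arguments here are illustrative, not from the slides); each step solves a system with the splitting matrix Q:

```python
import numpy as np

def splitting_iteration(A, b, Q, x0, num_iters=50):
    """Iterate Q x^(k) = (Q - A) x^(k-1) + b, as in Equation (3)."""
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        x = np.linalg.solve(Q, (Q - A) @ x + b)   # solve with Q at every step
    return x

# Choosing Q = diag(A) reproduces the Jacobi iteration on the earlier example:
A = np.array([[7.0, -6.0], [-8.0, 9.0]])
b = np.array([3.0, -4.0])
print(splitting_iteration(A, b, np.diag(np.diag(A)), np.zeros(2)))
```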
Basic idea : splitting methods
▶ The initial vector x^(0) can be arbitrary; if a good guess of the solution is
available, it should be used for x^(0).
▶ We shall say that the iterative method in Equation (3) converges if it
converges for any initial vector x^(0).
▶ A sequence of vectors x^(1), x^(2), . . . can be computed from Equation (3), and
our objective is to choose Q so that these two conditions are met:
1. The sequence x^(k) is easily computed.
2. The sequence x^(k) converges rapidly to a solution.
Convergence
▶ In this section, we shall see that both of these conditions follow if it is easy
to solve Qx^(k) = y and if Q^{-1} approximates A^{-1}.
▶ Observe, to begin with, that if the sequence x^(k) converges, say to a
vector x, then x is automatically a solution. Indeed, if we simply take the
limit in Equation (3) and use the continuity of the algebraic operations, the
result is
Qx = (Q − A)x + b (4)
which means that Ax = b.
▶ To assure that Equation (1) has a solution for any vector b, we shall assume
that A is nonsingular. We shall assume that Q is nonsingular as well, so that
Equation (3) can be solved for the unknown vector x (k) .
Convergence analysis
▶ Having made these assumptions, we can use the following equation for the
theoretical analysis:
x^(k) = (I − Q^{-1}A) x^(k−1) + Q^{-1}b    (5)
▶ It is to be emphasized that Equation (5) is convenient for the analysis, but
in numerical work x^(k) is almost always obtained by solving Equation (3)
without the use of Q^{-1}.
▶ Observe that the actual solution x satisfies the equation
x = (I − Q^{-1}A) x + Q^{-1}b    (6)
▶ Thus, x is a fixed point of the mapping
x ↦ (I − Q^{-1}A) x + Q^{-1}b    (7)
Convergence analysis
▶ By subtracting the terms in Equation (6) from those in Equation (5), we
obtain
x^(k) − x = (I − Q^{-1}A)(x^(k−1) − x)    (8)
▶ Now select any convenient vector norm and its subordinate matrix norm.
We obtain from Equation (8)
∥x^(k) − x∥ ≤ ∥I − Q^{-1}A∥ ∥x^(k−1) − x∥    (9)
▶ By repeating this step, we eventually arrive at the inequality
∥x^(k) − x∥ ≤ ∥I − Q^{-1}A∥^k ∥x^(0) − x∥    (10)
Convergence of iterative method
▶ Thus, if ∥I − Q^{-1}A∥ < 1, we can conclude at once that
lim_{k→∞} ∥x^(k) − x∥ = 0    (11)
for any x^(0).
Theorem
If ∥I − Q^{-1}A∥ < 1 for some subordinate matrix norm, then the sequence
produced by Equation (3) converges to the solution of Ax = b for any initial
vector x^(0).
Richardson method
▶ As an illustration of these concepts, we consider the Richardson method, in
which Q is chosen to be the identity matrix. Equation (3) in this case reads
as follows
x^(k) = (I − A)x^(k−1) + b = x^(k−1) + r^(k−1)    (12)
where r^(k−1) is the residual vector, defined by r^(k−1) = b − Ax^(k−1).
▶ According to Theorem 1, the Richardson iteration will produce a solution to
Ax = b (in the limit) if ∥I − A∥ < 1 for some subordinate matrix norm.
▶ Try this method on the example above and check which method converges
faster; a short sketch of the iteration follows below.
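A minimal sketch of the Richardson iteration (illustrative names; by Theorem 1 it converges only when ∥I − A∥ < 1 in some subordinate norm, so it may diverge on systems that Jacobi or Gauss-Seidel handle):

```python
import numpy as np

def richardson(A, b, x0, num_iters=100):
    """Richardson iteration x^(k) = x^(k-1) + r^(k-1), as in Equation (12)."""
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        r = b - A @ x        # residual vector r^(k-1)
        x = x + r
    return x
```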
Convergence of Jacobi method
▶ Another illustration of our basic theory is provided by the Jacobi iteration, in
which Q is the diagonal matrix whose diagonal entries are the same as those
in the matrix A = (a_ij).
▶ In this case, the generic element of Q^{-1}A is a_ij/a_ii. The diagonal elements
of this matrix are all 1, and hence,

∥I − Q^{-1}A∥_∞ = max_{1≤i≤n} ∑_{j=1, j≠i}^{n} |a_ij/a_ii|    (13)
Convergence of Jacobi method
Theorem
If A is diagonally dominant, then the sequence produced by the Jacobi iteration
converges to the solution of Ax = b for any starting vector.
Proof: Diagonal dominance means that
|a_ii| > ∑_{j=1, j≠i}^{n} |a_ij|    (1 ≤ i ≤ n)
From Equation (13), we then conclude that
∥I − Q^{-1}A∥_∞ < 1
By Theorem 1, the Jacobi iteration converges.
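A small NumPy check of this sufficient condition (the helper name is illustrative): it evaluates the row sums in Equation (13) and reports whether ∥I − Q^{-1}A∥_∞ < 1, i.e. whether A is strictly diagonally dominant.

```python
import numpy as np

def jacobi_norm_check(A):
    """Return ||I - Q^{-1}A||_inf for Q = diag(A), as in Equation (13)."""
    d = np.abs(np.diag(A))
    off_row_sums = np.sum(np.abs(A), axis=1) - d    # sum over j != i of |a_ij|
    return np.max(off_row_sums / d)

A = np.array([[7.0, -6.0], [-8.0, 9.0]])
print(jacobi_norm_check(A) < 1)    # True: the example system is diagonally dominant
```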
General theory for convergence
▶ Our next task is to develop some of the theory for arbitrary linear iterative
processes. We consider that such a process has been defined by an equation
of the form
x^(k) = Gx^(k−1) + c    (14)
in which G is a prescribed n × n matrix and c is a prescribed vector in Rn .
▶ Notice that the iteration defined in Equation (3) will be included in any
general theory that we may develop for Equation (14); namely, we can set
G = I − Q^{-1}A and c = Q^{-1}b. We want to find a necessary and sufficient
condition on G so that the iteration of Equation (14) will converge for any
starting vector.
Preliminary
▶ The eigenvalues of a matrix A are the complex numbers λ for which the
matrix A − λ I is not invertible. These numbers are then the roots of the
characteristic equation of A
det(A − λ I ) = 0
▶ The spectral radius of A is defined by the equation
ρ(A) = max{|λ | : det(A − λ I ) = 0}
Thus, ρ(A) is the smallest number such that a circle with that radius
centered at 0 in the complex plane will contain all the eigenvalues of A.
▶ A matrix A is said to be similar to a matrix B if there is a nonsingular
matrix S such that S^{-1}AS = B.
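For experimentation, the spectral radius is easy to evaluate numerically; a minimal sketch (illustrative helper name) using NumPy's eigenvalue routine:

```python
import numpy as np

def spectral_radius(M):
    """rho(M): the largest |lambda| over the eigenvalues of M."""
    return np.max(np.abs(np.linalg.eigvals(M)))

G = np.array([[0.0, 6.0 / 7.0], [8.0 / 9.0, 0.0]])   # Jacobi iteration matrix of the 2x2 example
print(spectral_radius(G))                             # about 0.873, so that iteration converges
```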
Theorems
Theorem
Every square matrix is similar to a (possibly complex) upper triangular matrix
whose off-diagonal elements are arbitrarily small.
Theorem
The spectral radius function satisfies the equation
ρ(A) = inf_{∥·∥} ∥A∥
in which the infimum is taken over all subordinate matrix norms.
Understanding
▶ Theorem 4 tells us that, for any matrix A, the spectral radius is a lower bound
on ∥A∥ for every subordinate matrix norm, and that some subordinate matrix
norm has a value arbitrarily close to the spectral radius.
▶ We now give a necessary and sufficient condition on the iteration matrix G
for convergence of the associated iterative method.
Necessary and Sufficient Conditions for Iterative Method
Convergence
Theorem
In order that the iteration formula
x^(k) = Gx^(k−1) + c
produce a sequence converging to (I − G)^{-1}c, for any starting vector x^(0), it is
necessary and sufficient that the spectral radius of G be less than 1.
Proof
▶ Suppose that ρ(G ) < 1. By Theorem 4, there is a subordinate matrix norm
such that ∥G ∥ < 1. We write
x^(1) = Gx^(0) + c
x^(2) = G^2 x^(0) + Gc + c
x^(3) = G^3 x^(0) + G^2 c + Gc + c
▶ The general formula is
x^(k) = G^k x^(0) + ∑_{j=0}^{k−1} G^j c    (15)
Proof
▶ Using the vector norm that engendered our matrix norm, we have
∥G^k x^(0)∥ ≤ ∥G^k∥ ∥x^(0)∥ ≤ ∥G∥^k ∥x^(0)∥ → 0 as k → ∞
▶ Since ∥G∥ < 1, the geometric (Neumann) series converges and we have
∑_{j=0}^{∞} G^j c = (I − G)^{-1} c
Thus, by letting k → ∞ in Equation (15), we obtain
lim_{k→∞} x^(k) = (I − G)^{-1} c
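As a quick numerical sanity check (purely illustrative), one can sum the series for the Jacobi iteration matrix of the earlier 2×2 example, for which ∥G∥ < 1, and compare with (I − G)^{-1}c:

```python
import numpy as np

G = np.array([[0.0, 6.0 / 7.0], [8.0 / 9.0, 0.0]])   # Jacobi iteration matrix of the example
c = np.array([3.0 / 7.0, -4.0 / 9.0])                # c = Q^{-1} b

partial_sum = np.zeros(2)
term = c.copy()                           # term holds G^j c, starting at j = 0
for j in range(200):
    partial_sum += term
    term = G @ term
print(partial_sum)                        # partial sums of the series
print(np.linalg.solve(np.eye(2) - G, c))  # (I - G)^{-1} c, the limit (0.2, -0.26667)
```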
Proof
▶ For the converse, suppose that ρ(G ) ≥ 1. Select u and λ so that
Gu = λu,    |λ| ≥ 1,    u ≠ 0
▶ Let c = u and x^(0) = 0. By Equation (15), x^(k) = ∑_{j=0}^{k−1} G^j u = ∑_{j=0}^{k−1} λ^j u. If
λ = 1, then x^(k) = ku, and this diverges as k → ∞. If λ ≠ 1, then
x^(k) = (λ^k − 1)(λ − 1)^{-1} u, and this diverges also because lim_{k→∞} λ^k does
not exist.
Gauss-Seidel method convergence
Corollary
The iteration formula in (3), that is, Qx^(k) = (Q − A)x^(k−1) + b, will produce a
sequence converging to the solution of Ax = b, for any x^(0), if ρ(I − Q^{-1}A) < 1.
Theorem
If A is diagonally dominant, then the Gauss-Seidel method converges for any
starting vector.
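A short NumPy check of this condition for the Gauss-Seidel splitting (helper name illustrative): Q is taken as the lower triangular part of A, and we report ρ(I − Q^{-1}A).

```python
import numpy as np

def gauss_seidel_rho(A):
    """Spectral radius of I - Q^{-1}A for the Gauss-Seidel splitting Q = tril(A)."""
    Q = np.tril(A)                                    # lower triangle of A, diagonal included
    G = np.eye(A.shape[0]) - np.linalg.solve(Q, A)    # iteration matrix I - Q^{-1}A
    return np.max(np.abs(np.linalg.eigvals(G)))

A = np.array([[7.0, -6.0], [-8.0, 9.0]])
print(gauss_seidel_rho(A))    # about 0.76 < 1, so Gauss-Seidel converges on the example
```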
Proof of theorem
▶ By Corollary 6, it suffices to prove that
ρ(I − Q^{-1}A) < 1
To this end, let λ be any eigenvalue of I − Q^{-1}A. Let x be a corresponding
eigenvector. We assume, with no loss of generality, that ∥x∥_∞ = 1. We have
now
(I − Q^{-1}A)x = λx
or Qx − Ax = λ Qx
▶ Since Q is the lower triangular part of A, including its diagonal,
− ∑_{j=i+1}^{n} a_ij x_j = λ ∑_{j=1}^{i} a_ij x_j    (1 ≤ i ≤ n)
▶ By transposing terms in this equation, we obtain
λ a_ii x_i = −λ ∑_{j=1}^{i−1} a_ij x_j − ∑_{j=i+1}^{n} a_ij x_j    (1 ≤ i ≤ n)
▶ Select an index i such that |x_i| = 1 ≥ |x_j| for all j. Then,
|λ| |a_ii| ≤ |λ| ∑_{j=1}^{i−1} |a_ij| + ∑_{j=i+1}^{n} |a_ij|
▶ Solving for |λ| and using the diagonal dominance of A, we get
|λ| ≤ ( ∑_{j=i+1}^{n} |a_ij| ) ( |a_ii| − ∑_{j=1}^{i−1} |a_ij| )^{-1} < 1
SOR method
▶ The Successive Over-Relaxation (SOR) method is an iterative technique
used to solve linear systems of equations. It is particularly effective for
diagonally dominant or symmetric positive definite matrices. The SOR
method introduces a relaxation parameter (ω) to accelerate convergence.
▶ Given a linear system of equations Ax = b, where A is the coefficient matrix,
x is the vector of unknowns, and b is the right-hand side vector, the SOR
method iterates as follows:
SOR Method
1. Start with an initial guess x (0) .
2. For each iteration k, update each component of x using the SOR formula:
x_i^(k+1) = (1 − ω) x_i^(k) + (ω/A_ii) ( b_i − ∑_{j=1}^{i−1} A_ij x_j^(k+1) − ∑_{j=i+1}^{n} A_ij x_j^(k) )

where i = 1, 2, . . . , n (where n is the size of the system), x^(k) is the current
approximation of the solution at iteration k, x^(k+1) is the updated
approximation at iteration k + 1, and ω is the relaxation parameter.
3. Repeat step 2 until the change in x between consecutive iterations is below
a specified tolerance, or for a fixed number of iterations.
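A compact NumPy implementation of these steps (illustrative, with an ∞-norm stopping test standing in for the "specified tolerance"):

```python
import numpy as np

def sor(A, b, omega, x0, tol=1e-8, max_iters=1000):
    """Successive Over-Relaxation following the component-wise formula above."""
    n = len(b)
    x = np.array(x0, dtype=float)
    for k in range(max_iters):
        x_old = x.copy()
        for i in range(n):
            # entries j < i already hold the (k+1)-st values, entries j > i the k-th
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (1 - omega) * x_old[i] + (omega / A[i, i]) * (b[i] - sigma)
        if np.linalg.norm(x - x_old, ord=np.inf) < tol:   # change between sweeps
            break
    return x
```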
SOR Method
▶ The choice of the relaxation parameter ω can significantly affect the
convergence rate of the SOR method. An optimal value of ω depends on
the matrix A, and finding the best value often involves experimentation or
using heuristic methods.
▶ The SOR method can converge faster than the Gauss-Seidel method for
certain matrices, especially when ω is chosen appropriately. However,
selecting an optimal ω value can be challenging, and improper choices may
lead to slow convergence or divergence.
SOR example
Consider the system of linear equations:
4x1 + x2 − x3 = 4
x1 + 4x2 − 2x3 = 3
2x1 − x2 + 5x3 = 5
We can write this system in matrix form as Ax = b, where
    [ 4   1  −1 ]        [ x1 ]        [ 4 ]
A = [ 1   4  −2 ] ,  x = [ x2 ] ,  b = [ 3 ]
    [ 2  −1   5 ]        [ x3 ]        [ 5 ]
Example
▶ To apply the SOR method, we need to rewrite the system in an iterative
form. The SOR iteration formula is:
x_i^(k+1) = (1 − ω) x_i^(k) + (ω/A_ii) ( b_i − ∑_{j=1}^{i−1} A_ij x_j^(k+1) − ∑_{j=i+1}^{n} A_ij x_j^(k) )
▶ Let’s choose an initial guess x^(0) = (0, 0, 0)^T and a relaxation parameter
ω = 1.2. We’ll perform three iterations to approximate the solution.
Example
▶ Iteration 1:
x_1^(1) = (1 − 1.2) × 0 + (1.2/4) × (4 − 1 × 0 − (−1) × 0) = 1.2
Similarly, calculate x_2^(1) and x_3^(1).
▶ Iteration 2: Update x_1^(2), x_2^(2), and x_3^(2) using the SOR formula with x^(1) as
the initial guess.
▶ Iteration 3: Update x_1^(3), x_2^(3), and x_3^(3) using the SOR formula with x^(2) as
the initial guess.
Repeat the iterations until convergence criteria are met (e.g., small change in x).
The final values of x should approximate the solution to the linear system.
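Putting the pieces together, a self-contained sketch of these three SOR sweeps on the example system (the in-place update mirrors the formula above; a direct solve is printed for comparison):

```python
import numpy as np

A = np.array([[4.0, 1.0, -1.0],
              [1.0, 4.0, -2.0],
              [2.0, -1.0, 5.0]])
b = np.array([4.0, 3.0, 5.0])

omega = 1.2
x = np.zeros(3)                      # initial guess x^(0)
for k in range(3):                   # three sweeps, as in the example
    for i in range(3):
        # x[:i] already updated this sweep; x[i:] still holds the previous values
        sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (1 - omega) * x[i] + (omega / A[i, i]) * (b[i] - sigma)
    print("iteration", k + 1, x)     # after the first sweep, x[0] == 1.2 as on the slide

print("direct solve:", np.linalg.solve(A, b))
```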