MATRIX CHAIN MULTIPLICATION
(DYNAMIC PROGRAMMING)
Presented by: Arup Jyoti Dutta
Dynamic Programming
• An algorithm design technique (like divide and
conquer)
• Divide and conquer
– Partition the problem into independent subproblems
– Solve the subproblems recursively
– Combine the solutions to solve the original problem
Dynamic Programming
• Applicable when subproblems are not independent
– Subproblems share subsubproblems
E.g.: Combinations (Pascal's rule):
  C(n, k) = C(n-1, k) + C(n-1, k-1)
  C(n, 0) = C(n, n) = 1
– A divide and conquer approach would repeatedly solve the
common subproblems
– Dynamic programming solves every subproblem just once
and stores the answer in a table
Example: Combinations
Comb(6, 4)
= Comb(5, 3) + Comb(5, 4)
= (Comb(4, 2) + Comb(4, 3)) + (Comb(4, 3) + Comb(4, 4))
= ((Comb(3, 1) + Comb(3, 2)) + (Comb(3, 2) + Comb(3, 3)))
  + ((Comb(3, 2) + Comb(3, 3)) + 1)
= (3 + (Comb(2, 1) + Comb(2, 2))) + ((Comb(2, 1) + Comb(2, 2)) + 1)
  + ((Comb(2, 1) + Comb(2, 2)) + 1) + 1
= 3 + 2 + 1 + 2 + 1 + 1 + 2 + 1 + 1 + 1 = 15

C(n, k) = C(n-1, k) + C(n-1, k-1)
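The expansion above can be checked with a short memoized implementation (Python here as an illustrative sketch; the deck itself uses pseudocode). Memoization is exactly the dynamic-programming fix: each subproblem, such as Comb(3, 2), is computed only once.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def comb(n, k):
    """C(n, k) via Pascal's rule: C(n, k) = C(n-1, k) + C(n-1, k-1).

    The lru_cache memoizes results, so each subproblem (e.g. Comb(3, 2),
    which appears three times in the expansion above) is solved exactly once.
    """
    if k == 0 or k == n:
        return 1
    return comb(n - 1, k) + comb(n - 1, k - 1)

print(comb(6, 4))  # 15, the total reached in the expansion above
```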
Dynamic Programming Algorithm
1. Characterize the structure of an optimal
solution
2. Recursively define the value of an optimal
solution
3. Compute the value of an optimal solution in a
bottom-up fashion
4. Construct an optimal solution from computed
information (not always necessary)
Matrix Chain
Multiplication
Matrix Chain Multiplication
• Given some matrices to multiply, determine the best order to
multiply them so you minimize the number of single element
multiplications.
– i.e. Determine the way the matrices are parenthesized.
• First off, it should be noted that matrix multiplication is
associative, but not commutative. But since it is associative,
we always have:
• ((AB)(CD)) = (A(B(CD))), or any other grouping as long as the
matrices are in the same consecutive order.
• BUT NOT: ((AB)(CD)) = ((BA)(DC))
Matrix Chain Multiplication
• It may appear that the amount of work done
won’t change if you change the
parenthesization of the expression, but we
can prove that is not the case!
• Let us use the following example:
– Let A be a 2x10 matrix
– Let B be a 10x50 matrix
– Let C be a 50x20 matrix
Matrix Chain Multiplication
• Let’s get back to our example: We will show that the way we group
matrices when multiplying A, B, C matters:
– Let A be a 2x10 matrix
– Let B be a 10x50 matrix
– Let C be a 50x20 matrix
• Consider computing A(BC):
– # multiplications for (BC) = 10x50x20 = 10000, creating a 10x20
answer matrix
– # multiplications for A(BC) = 2x10x20 = 400
– Total multiplications = 10000 + 400 = 10400.
• Consider computing (AB)C:
– # multiplications for (AB) = 2x10x50 = 1000, creating a 2x50 answer
matrix
– # multiplications for (AB)C = 2x50x20 = 2000,
– Total multiplications = 1000 + 2000 = 3000
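The two totals can be tallied directly. A minimal sketch (the helper name mult_cost is ours, not from the slides): multiplying a p x q matrix by a q x r matrix costs p*q*r scalar multiplications.

```python
def mult_cost(p, q, r):
    """Scalar multiplications to multiply a p x q matrix by a q x r matrix."""
    return p * q * r

# A: 2x10, B: 10x50, C: 50x20
# A(BC): BC first (10x50 times 50x20 -> 10x20), then A times that result
a_bc = mult_cost(10, 50, 20) + mult_cost(2, 10, 20)   # 10000 + 400
# (AB)C: AB first (2x10 times 10x50 -> 2x50), then that result times C
ab_c = mult_cost(2, 10, 50) + mult_cost(2, 50, 20)    # 1000 + 2000
print(a_bc, ab_c)  # 10400 3000
```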
Matrix-Chain Multiplication
Problem: given a sequence A1, A2, …, An, compute
the product:
A1 A2 … An
• Matrix compatibility:
  C = A B:                C = A1 A2 … Ai Ai+1 … An:
  colA = rowB             coli = rowi+1
  rowC = rowA             rowC = rowA1
  colC = colB             colC = colAn
MATRIX-MULTIPLY(A, B)
if columns[A] ≠ rows[B]
  then error "incompatible dimensions"
  else for i ← 1 to rows[A]
         do for j ← 1 to columns[B]
              do C[i, j] ← 0
                 for k ← 1 to columns[A]
                   do C[i, j] ← C[i, j] + A[i, k] · B[k, j]

Total: rows[A] · cols[A] · cols[B] scalar multiplications
[Figure: A (rows[A] x cols[A]) times B (rows[B] x cols[B]) gives C]
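A direct Python rendering of the pseudocode, as a sketch (list-of-lists matrices, 0-indexed):

```python
def matrix_multiply(A, B):
    """Triple-loop product: rows[A] * cols[A] * cols[B] scalar multiplications."""
    if len(A[0]) != len(B):
        raise ValueError("incompatible dimensions")
    C = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(A[0])):
                C[i][j] += A[i][k] * B[k][j]
    return C

print(matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```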
Matrix-Chain Multiplication
• In what order should we multiply the matrices?
A1 A2 … An
• Parenthesize the product to get the order in which
matrices are multiplied
• E.g.: A1 A2 A3 = ((A1 A2) A3)
= (A1 (A2 A3))
• Which one of these orderings should we choose?
– The order in which we multiply the matrices has a
significant impact on the cost of evaluating the product
Example
A1 A2 A3
• A1: 10 x 100
• A2: 100 x 5
• A3: 5 x 50
1. ((A1 A2) A3): A1 A2 = 10 x 100 x 5 = 5,000 (10 x 5)
((A1 A2) A3) = 10 x 5 x 50 = 2,500
Total: 7,500 scalar multiplications
2. (A1 (A2 A3)): A2 A3 = 100 x 5 x 50 = 25,000 (100 x 50)
(A1 (A2 A3)) = 10 x 100 x 50 = 50,000
Total: 75,000 scalar multiplications
one order of magnitude difference!!
Matrix-Chain Multiplication:
Problem Statement
• Given a chain of matrices A1, A2, …, An, where
Ai has dimensions pi-1 x pi, fully parenthesize the
product A1 A2 … An in a way that minimizes the
number of scalar multiplications.
A1        A2        …  Ai         Ai+1       …  An
p0 x p1   p1 x p2      pi-1 x pi  pi x pi+1     pn-1 x pn
1. The Structure of an Optimal
Parenthesization
• Notation:
  Ai…j = Ai Ai+1 … Aj,  i ≤ j
• Suppose that an optimal parenthesization of Ai…j
splits the product between Ak and Ak+1, where
i ≤ k < j
  Ai…j = Ai Ai+1 … Aj
       = (Ai Ai+1 … Ak)(Ak+1 … Aj)
       = Ai…k Ak+1…j
2. A Recursive Solution
• Subproblem:
determine the minimum cost of parenthesizing
  Ai…j = Ai Ai+1 … Aj  for 1 ≤ i ≤ j ≤ n
• Let m[i, j] = the minimum number of
multiplications needed to compute Ai…j
  – full problem (A1..n): m[1, n]
  – i = j: Ai…i = Ai, so m[i, i] = 0 for i = 1, 2, …, n
2. A Recursive Solution
Consider the subproblem of parenthesizing
  Ai…j = Ai Ai+1 … Aj  for 1 ≤ i ≤ j ≤ n
• Assume that the optimal parenthesization splits
the product Ai Ai+1 … Aj at k (i ≤ k < j):
  Ai…j = Ai…k Ak+1…j

  m[i, j] = m[i, k] + m[k+1, j] + pi-1 pk pj

where m[i, k] = min # of multiplications to compute Ai…k,
m[k+1, j] = min # of multiplications to compute Ak+1…j, and
pi-1 pk pj = # of multiplications to compute Ai…k Ak+1…j
2. A Recursive Solution (cont.)
m[i, j] = m[i, k] + m[k+1, j] + pi-1pkpj
• We do not know the value of k
– There are j – i possible values for k: k = i, i+1, …, j-1
• Minimizing the cost of parenthesizing the product Ai Ai+1
Aj becomes:
m[i, j] = 0                                          if i = j
m[i, j] = min {m[i, k] + m[k+1, j] + pi-1 pk pj}     if i < j
          i ≤ k < j
3. Computing the Optimal Costs
m[i, j] = 0                                          if i = j
m[i, j] = min {m[i, k] + m[k+1, j] + pi-1 pk pj}     if i < j
          i ≤ k < j
• Computing the optimal solution recursively takes
exponential time!
• How many subproblems? Θ(n2)
  – Parenthesize Ai…j for 1 ≤ i ≤ j ≤ n
  – One problem for each choice of i and j
[Figure: the n x n table of subproblems, indexed by i and j]
3. Computing the Optimal Costs (cont.)
m[i, j] = 0                                          if i = j
m[i, j] = min {m[i, k] + m[k+1, j] + pi-1 pk pj}     if i < j
          i ≤ k < j
• How do we fill in the table m[1..n, 1..n]?
  – Determine which entries of the table are used in computing m[i, j]:
    Ai…j = Ai…k Ak+1…j
  – Both subproblems Ai…k and Ak+1…j are strictly shorter chains than Ai…j
  – Idea: fill in m so that it corresponds to solving problems of increasing
    length
3. Computing the Optimal Costs (cont.)
m[i, j] = 0                                          if i = j
m[i, j] = min {m[i, k] + m[k+1, j] + pi-1 pk pj}     if i < j
          i ≤ k < j
• Length = 1: i = j, i = 1, 2, …, n
• Length = 2: j = i + 1, i = 1, 2, …, n-1
• m[1, n] gives the optimal solution to the problem
• Compute rows from bottom to top and from left to right
[Figure: the m table, filled diagonal by diagonal of increasing chain length]
Example: min {m[i, k] + m[k+1, j] + pi-1 pk pj}
                { m[2, 2] + m[3, 5] + p1 p2 p5    k=2
m[2, 5] = min   { m[2, 3] + m[4, 5] + p1 p3 p5    k=3
                { m[2, 4] + m[5, 5] + p1 p4 p5    k=4
• Values m[i, j] depend only on values that have been
previously computed
[Figure: 6 x 6 m table highlighting the entries m[2, 5] depends on]
Example: min {m[i, k] + m[k+1, j] + pi-1 pk pj}
Compute A1 A2 A3
• A1: 10 x 100   (p0 x p1)
• A2: 100 x 5    (p1 x p2)
• A3: 5 x 50     (p2 x p3)

m table:        1      2      3
          3  7500  25000      0
          2  5000      0
          1     0
         (rows: j, columns: i)

m[i, i] = 0 for i = 1, 2, 3
m[1, 2] = m[1, 1] + m[2, 2] + p0 p1 p2    (A1 A2)
        = 0 + 0 + 10 * 100 * 5 = 5,000
m[2, 3] = m[2, 2] + m[3, 3] + p1 p2 p3    (A2 A3)
        = 0 + 0 + 100 * 5 * 50 = 25,000
m[1, 3] = min { m[1, 1] + m[2, 3] + p0 p1 p3 = 75,000   (A1 (A2 A3))
              { m[1, 2] + m[3, 3] + p0 p2 p3 = 7,500    ((A1 A2) A3)
Matrix-Chain-Order(p)

MATRIX-CHAIN-ORDER(p)
1  n ← length[p] - 1
2  for i ← 1 to n
3    do m[i, i] ← 0
4  for l ← 2 to n              ▹ l is the chain length
5    do for i ← 1 to n - l + 1
6         do j ← i + l - 1
7            m[i, j] ← ∞
8            for k ← i to j - 1
9              do q ← m[i, k] + m[k+1, j] + pi-1 pk pj
10                if q < m[i, j]
11                  then m[i, j] ← q
12                       s[i, j] ← k
13 return m and s

Running time: O(n3)
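In Python, the bottom-up computation can be sketched like this (1-indexed tables to match the slides; p[i-1] x p[i] are the dimensions of Ai):

```python
def matrix_chain_order(p):
    """Bottom-up DP over chain lengths: O(n^3) time, O(n^2) space.

    Returns (m, s): m[i][j] = min scalar multiplications for Ai..Aj,
    s[i][j] = split point k used by an optimal parenthesization.
    """
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):               # chain length l
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float('inf')
            for k in range(i, j):                # try every split point
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

# The worked example: A1 = 10x100, A2 = 100x5, A3 = 5x50
m, s = matrix_chain_order([10, 100, 5, 50])
print(m[1][3], s[1][3])  # 7500 2  -> split ((A1 A2) A3)
```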
4. Construct the Optimal Solution
• s[i, j] = value of k such that the optimal
parenthesization of Ai Ai+1 … Aj splits
the product between Ak and Ak+1

  s        1   2   3   4   5   6
     6     3   3   3   5   5   -
     5     3   3   3   4   -
     4     3   3   3   -
     3     1   2   -
     2     1   -
     1     -
    (rows: j, columns: i)

• s[1, 6] = 3    A1..6 = A1..3 A4..6
• s[1, 3] = 1    A1..3 = A1..1 A2..3
• s[4, 6] = 5    A4..6 = A4..5 A6..6
4. Construct the Optimal Solution (cont.)
PRINT-OPT-PARENS(s, i, j)
if i = j
  then print "A"i
  else print "("
       PRINT-OPT-PARENS(s, i, s[i, j])
       PRINT-OPT-PARENS(s, s[i, j] + 1, j)
       print ")"
Example: A1 … A6  →  ((A1 (A2 A3)) ((A4 A5) A6))
Trace of PRINT-OPT-PARENS on s[1..6, 1..6]:

P-O-P(s, 1, 6)    s[1, 6] = 3
  i = 1, j = 6: "("  P-O-P(s, 1, 3)    s[1, 3] = 1
    i = 1, j = 3: "("  P-O-P(s, 1, 1) → "A1"
                       P-O-P(s, 2, 3)    s[2, 3] = 2
      i = 2, j = 3: "("  P-O-P(s, 2, 2) → "A2"
                         P-O-P(s, 3, 3) → "A3"
                    ")"
                  ")"
  …
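As a runnable sketch, the same recursion can return the string instead of printing it (the dict below hand-copies only the entries of the slide's s table that the trace actually reaches):

```python
def print_optimal_parens(s, i, j):
    """Rebuild the optimal parenthesization from the split table s."""
    if i == j:
        return f"A{i}"
    k = s[(i, j)]  # optimal split: Ai..k times Ak+1..j
    return "(" + print_optimal_parens(s, i, k) + print_optimal_parens(s, k + 1, j) + ")"

# Split points from the slide: s[1,6]=3, s[1,3]=1, s[2,3]=2, s[4,6]=5, s[4,5]=4
s = {(1, 6): 3, (1, 3): 1, (2, 3): 2, (4, 6): 5, (4, 5): 4}
print(print_optimal_parens(s, 1, 6))  # ((A1(A2A3))((A4A5)A6))
```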
Elements of dynamic programming
When should we apply the method of Dynamic
Programming?
Two key ingredients:
- Optimal substructure
- Overlapping subproblems
Elements of Dynamic Programming
• Optimal Substructure
– An optimal solution to a problem contains within it
optimal solutions to subproblems
– The optimal solution to the entire problem is built in a
bottom-up manner from optimal solutions to
subproblems
• Overlapping Subproblems
– If a recursive algorithm revisits the same subproblems
over and over, the problem has overlapping
subproblems
Parameters of Optimal Substructure
• How many subproblems are used in an
optimal solution for the original problem?
– Matrix multiplication: two subproblems (subproducts Ai..k, Ak+1..j)
• How many choices do we have in determining
which subproblems to use in an optimal
solution?
– Matrix multiplication: j - i choices for k (splitting the
product)
Parameters of Optimal Substructure
• Intuitively, the running time of a dynamic
programming algorithm depends on two
factors:
– Number of subproblems overall
– How many choices we look at for each
subproblem
• Matrix multiplication:
– Θ(n2) subproblems (1 ≤ i ≤ j ≤ n)
– At most n-1 choices per subproblem
⇒ Θ(n3) overall
Elements of dynamic programming (cont.)
Overlapping subproblems: (cont.)
[Figure: recursion tree of RECURSIVE-MATRIX-CHAIN(p, 1, 4); nodes are labeled
i..j, and subchains such as 1..2, 2..3, 3..4 appear in several subtrees]
The recursion tree of RECURSIVE-MATRIX-CHAIN(p, 1, 4). The computations
performed in a shaded subtree are replaced by a single table lookup in
MEMOIZED-MATRIX-CHAIN(p, 1, 4).
Matrix-Chain multiplication (Contd.)
RECURSIVE-MATRIX-CHAIN (p, i, j)
1 if i = j
2 then return 0
3 m[i,j] ←∞
4 for k←i to j-1
5 do q←RECURSIVE-MATRIX-CHAIN (p, i, k)
+ RECURSIVE-MATRIX-CHAIN (p, k+1, j)+ pi-1 pk pj
6 if q < m[i,j]
7 then m[i,j] ←q
8 return m[i,j]
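A direct Python translation of RECURSIVE-MATRIX-CHAIN, as a sketch. It is exponential-time precisely because it recomputes the same subchains over and over:

```python
def recursive_matrix_chain(p, i, j):
    """Plain divide and conquer, no table: exponential running time.

    p[i-1] x p[i] are the dimensions of Ai; returns the minimum number
    of scalar multiplications for the chain Ai..Aj.
    """
    if i == j:
        return 0
    best = float('inf')
    for k in range(i, j):
        q = (recursive_matrix_chain(p, i, k)
             + recursive_matrix_chain(p, k + 1, j)
             + p[i - 1] * p[k] * p[j])
        best = min(best, q)
    return best

print(recursive_matrix_chain([10, 100, 5, 50], 1, 3))  # 7500
```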
Elements of dynamic programming (cont.)
Overlapping subproblems: (cont.)
Time to compute m[i, j] by RECURSIVE-MATRIX-CHAIN:
We assume that the execution of lines 1-2 and 6-7 takes at least unit time.

T(1) ≥ 1
T(n) ≥ 1 + Σ_{k=1}^{n-1} (T(k) + T(n-k) + 1)    for n > 1
     = 2 Σ_{i=1}^{n-1} T(i) + n
Elements of dynamic programming (cont.)
Overlapping subproblems: (cont.)
We guess that T(n) = Ω(2^n), and verify it with the
substitution method by showing T(n) ≥ 2^(n-1):

T(n) ≥ 2 Σ_{i=1}^{n-1} 2^(i-1) + n
     = 2 Σ_{i=0}^{n-2} 2^i + n
     = 2 (2^(n-1) - 1) + n
     = 2^n + n - 2
     ≥ 2^(n-1)
Elements of dynamic programming (cont.)
Memoization
There is a variation of dynamic programming that often offers the
efficiency of the usual dynamic-programming approach while
maintaining a top-down strategy.
The idea is to memoize the natural, but inefficient, recursive
algorithm.
We maintain a table with subproblem solutions, but the control
structure for filling in the table is more like the recursive algorithm.
Elements of dynamic programming (cont.)
• Memoization (cont.)
• An entry in a table for the solution to each subproblem is maintained.
• Each table entry initially contains a special value to indicate that the
entry has yet to be filled.
• When the subproblem is first encountered during the execution of
the recursive algorithm, its solution is computed and then stored in
the table.
• Each subsequent time that the problem is encountered, the value
stored in the table is simply looked up and returned.
Elements of dynamic programming (cont.)
MEMOIZED-MATRIX-CHAIN(p)
1 n ← length[p] - 1
2 for i ← 1 to n
3   do for j ← i to n
4        do m[i,j] ← ∞
5 return LOOKUP-CHAIN(p, 1, n)
Elements of dynamic programming (cont.)
Memoization (cont.)
LOOKUP-CHAIN(p, i, j)
1 if m[i,j] < ∞
2   then return m[i,j]
3 if i = j
4   then m[i,j] ← 0
5   else for k ← i to j-1
6          do q ← LOOKUP-CHAIN(p, i, k)
                  + LOOKUP-CHAIN(p, k+1, j) + pi-1 pk pj
7             if q < m[i,j]
8               then m[i,j] ← q
9 return m[i,j]
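The memoized pair translates to Python as follows (a sketch; ∞ in the table doubles as the "not yet filled" marker, just as in the pseudocode):

```python
def memoized_matrix_chain(p):
    """Top-down DP: MEMOIZED-MATRIX-CHAIN driving LOOKUP-CHAIN."""
    n = len(p) - 1
    INF = float('inf')
    m = [[INF] * (n + 1) for _ in range(n + 1)]  # INF marks "not yet filled"

    def lookup_chain(i, j):
        if m[i][j] < INF:
            return m[i][j]                        # already solved: table lookup
        if i == j:
            m[i][j] = 0
        else:
            for k in range(i, j):
                q = (lookup_chain(i, k) + lookup_chain(k + 1, j)
                     + p[i - 1] * p[k] * p[j])
                if q < m[i][j]:
                    m[i][j] = q
        return m[i][j]

    return lookup_chain(1, n)

print(memoized_matrix_chain([10, 100, 5, 50]))  # 7500
```

It computes the same m[1, n] as the bottom-up version, but fills only the table entries the recursion actually touches.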