Applied Algorithms
CSCI-B505 / INFO-I500
Lecture 10.
Dynamic Programming - I
M. Oguzhan Kulekci
• Dynamic Programming
• Fibonacci
• Binomial Coefficients
• Edit distance
• Some Variants of Edit Distance
• Longest Common Subsequence
Dynamic Programming
- Find the minimum or maximum of a combinatorial challenge (combinatorial opt.)
- Exhaustive search guarantees the optimum, but very expensive
- Greedy approach (!) is more reasonable, but no guarantees (in general)
- Dynamic programming aims to compute the optimum with a good complexity by
storing the results of some prior computations for the sake of some others later.
- DP is particularly useful when there is a reductive solution but with
significant overlaps between the recursive steps.
Bad Recursions
• The same computation is repeated many times in the recursion tree!
Example: Fibonacci with recursion …
• Time complexity is exponential, ≈ 1.6^n. Why?
• F(n) > 1.6^n for large n, and the recursion builds F(n) by summing its leaves, each contributing at most 1, so it must reach at least that many leaves.
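The repeated work can be observed directly by counting calls. The following is a minimal sketch, assuming the usual base cases F(0) = 0 and F(1) = 1; the name `fib_naive` and the `counter` argument are illustrative choices, not taken from the slides.

```python
def fib_naive(n, counter):
    """Plain recursive Fibonacci; counter[0] tallies how many calls are made."""
    counter[0] += 1
    if n < 2:              # assumed base cases: F(0) = 0, F(1) = 1
        return n
    # Both branches recompute the same subproblems independently.
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


calls = [0]
value = fib_naive(20, calls)
print(value, calls[0])     # the call count grows roughly like F(n) itself
```

Raising n by one multiplies the call count by roughly 1.6, which is the exponential behavior described above.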
Bad Recursions
• A better recursion, with caching!
• Allocating O(n) space returns the result in O(n) time.
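One way to add the cache is sketched below. It assumes a dictionary keyed by n; the name `fib_memo` is hypothetical. Each value is computed once and then looked up, so the work is O(n) at the price of O(n) stored entries.

```python
def fib_memo(n, cache=None):
    """Recursive Fibonacci that stores each F(k) the first time it is computed."""
    if cache is None:
        cache = {}
    if n < 2:                      # base cases: F(0) = 0, F(1) = 1
        return n
    if n not in cache:
        # Computed only once per n; later requests hit the cache.
        cache[n] = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    return cache[n]


print(fib_memo(50))
```

The recursion depth still grows linearly with n, so for very large n an iterative version is the safer choice in Python.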
Simple DP for Fibonacci
Allocating O(1) space returns
the result in O(n) time.
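A possible bottom-up version keeps only the last two values, which is where the O(1) space comes from. This is a sketch with an illustrative name (`fib_iter`), not the code from the slide.

```python
def fib_iter(n):
    """Bottom-up Fibonacci keeping only the two most recent values."""
    prev, curr = 0, 1              # F(0), F(1)
    for _ in range(n):
        # Slide the window one step: (F(k), F(k+1)) -> (F(k+1), F(k+2)).
        prev, curr = curr, prev + curr
    return prev


print([fib_iter(i) for i in range(10)])
```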
Binomial Coefficients
$\binom{n}{k} = \frac{n!}{(n-k)! \cdot k!}$
• It can be directly calculated. However, overflows are possible even with small n, k values.
• Recursive calculation avoids such overflows:
$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$
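The recurrence for binomial coefficients can be tabulated row by row, as in Pascal's triangle. The sketch below assumes the boundary values C(i, 0) = C(i, i) = 1; the function name `binomial` is illustrative. Python integers do not overflow, so the overflow concern applies to fixed-width integer types in other languages, where the additive recurrence avoids forming the large intermediate factorials.

```python
def binomial(n, k):
    """C(n, k) via the recurrence C(n, k) = C(n-1, k-1) + C(n-1, k)."""
    if k < 0 or k > n:
        return 0
    # table[i][j] holds C(i, j); entries never written stay 0, which is C(i, j) for j > i.
    table = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        table[i][0] = 1                          # C(i, 0) = 1
        for j in range(1, min(i, k) + 1):
            table[i][j] = table[i - 1][j - 1] + table[i - 1][j]
    return table[n][k]


print(binomial(5, 2), binomial(10, 3))
```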
Approximate String Matching
Given two strings s1 and s2, what is the minimum number of steps needed to alter s1 so that it becomes s2?
The alterations we are allowed to make are:
• Substitution: Change a specific symbol of s1 to match a symbol of s2,
e.g., s1: shot, s2: spot
• Insertion: Insert a new symbol into s1 to match a corresponding symbol of s2,
e.g., s1: ago, s2: agog
• Deletion: Delete a symbol from s1,
e.g., s1: hour, s2: our
We will assume each operation has an equal cost of 1.
Approximate String Matching by Recursion
Let edit distance between P1P2…Pi and T1T2…Tj be D[i, j].
There are three possibilities:
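1. D[i, j] = D[i − 1, j − 1] + (1 | 0): Pi is aligned with Tj; cost 0 if Pi = Tj (match), cost 1 otherwise (substitution)
2. D[i, j] = D[i, j − 1] + 1: extra symbol on T, indel(Tj); either an insertion into P or a deletion from T, thus named indel
3. D[i, j] = D[i − 1, j] + 1: extra symbol on P, indel(Pi); either an insertion into T or a deletion from P
D[i, j] is the minimum of these three values.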
Approximate String Matching by Recursion
D[i, j] = D[i − 1,j − 1] + (1 | 0)
D[i, j] = D[i, j − 1] + 1
D[i, j] = D[i − 1,j] + 1
Recursive program with exponential time (≈ 3^n), since at each recursion three children are born.
D[5][6]
D[4][5] D[5][5] D[4][6]
D[3][4] D[4][4] D[3][5] D[4][4] D[5][4] D[4][5] D[3][5] D[4][5] D[3][6]
BAD RECURSION ! ?
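The three-way recursion can be written down directly, which makes the repeated subproblems easy to see. This is a minimal sketch under the unit-cost assumption; the function name and the 0-based string indexing (Pi is `p[i - 1]`) are choices made here, not taken from the slides.

```python
def edit_distance_recursive(p, t, i=None, j=None):
    """Edit distance between p[:i] and t[:j] by plain recursion (exponential time)."""
    if i is None:
        i, j = len(p), len(t)
    if i == 0:                     # nothing left of p: insert the j remaining symbols
        return j
    if j == 0:                     # nothing left of t: delete the i remaining symbols
        return i
    match_or_sub = edit_distance_recursive(p, t, i - 1, j - 1) + (0 if p[i - 1] == t[j - 1] else 1)
    indel_t = edit_distance_recursive(p, t, i, j - 1) + 1
    indel_p = edit_distance_recursive(p, t, i - 1, j) + 1
    return min(match_or_sub, indel_t, indel_p)


print(edit_distance_recursive("shot", "spot"))
```

Only short strings are practical here, because every call spawns three more.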
Approximate String Matching by Dynamic Programming
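[Figure: the DP matrix D, with P along the rows and T along the columns. It highlights the initial row, the initial column, and cell D[i, j] (row Pi, column Tj) together with the three neighbors it depends on: D[i − 1, j − 1], D[i − 1, j], and D[i, j − 1].]
• Initial row and column values are fixed: D[0, j] = j and D[i, 0] = i.
• We just fill the matrix with a row-major or column-major traversal.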
Approximate String Matching by Dynamic Programming
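• The edit distance between P and T is the value in the bottom-right cell of the matrix.
• The DP solution of edit distance works in O(n ⋅ m) time and space, where n and m are the lengths of the two strings.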
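A row-major fill of the matrix might look like the following sketch. It assumes unit costs and the fixed borders D[i, 0] = i and D[0, j] = j; the name `edit_distance` is illustrative.

```python
def edit_distance(p, t):
    """Unit-cost edit distance via the full (n+1) x (m+1) DP matrix."""
    n, m = len(p), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                # initial column: delete i symbols of p
    for j in range(m + 1):
        d[0][j] = j                # initial row: insert j symbols of t
    for i in range(1, n + 1):      # row-major traversal
        for j in range(1, m + 1):
            cost = 0 if p[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,   # match or substitution
                          d[i][j - 1] + 1,          # indel on t
                          d[i - 1][j] + 1)          # indel on p
    return d[n][m]                 # bottom-right cell


print(edit_distance("thou-shalt", "you-should"))
```

Each cell is computed once from three already-filled neighbors, giving the O(n ⋅ m) time and space bound.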
Approximate String Matching by Dynamic Programming
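• The parent information can be separately maintained to reconstruct the path.
• It is also possible to recover it from the cost matrix itself.
Example: transforming "thou-shalt" into "you-should" (M: match, S: substitution, I: insertion, D: deletion):
t h o u - s h a l t
y o u - s h o u l d
D S M M M M M I S M S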
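Recovering the path from the cost matrix alone can be sketched as follows: start at the bottom-right cell and step to any neighbor that explains the current value. The tie-breaking order (diagonal first, then insertion, then deletion) is an arbitrary choice made here, so the resulting script may differ from the one on the slide while having the same cost. The name `edit_script` is hypothetical.

```python
def edit_script(p, t):
    """Return a string over M/S/I/D describing one optimal way to turn p into t."""
    n, m = len(p), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if p[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i][j - 1] + 1, d[i - 1][j] + 1)

    ops = []
    i, j = n, m
    while i > 0 or j > 0:          # walk back from the bottom-right cell
        cost = 0 if (i > 0 and j > 0 and p[i - 1] == t[j - 1]) else 1
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost:
            ops.append("M" if cost == 0 else "S")
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append("I")        # symbol t[j-1] is inserted
            j -= 1
        else:
            ops.append("D")        # symbol p[i-1] is deleted
            i -= 1
    return "".join(reversed(ops))


print(edit_script("thou-shalt", "you-should"))
```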
Some Variants of the Edit Distance
• Edit distance has many different variants to solve different problems.
• All fill the DP matrix, with some slight differences that have a significant effect:
• Different cost functions
• The position of the final result on the matrix
• Different initialization of the matrix
• Different traceback actions
Here are some examples.
Some Variants of the Edit Distance
Substring Matching: On a long text T we aim to spot P, wherever it matches best.
Example: Searching for the best alignment of a relatively short DNA sequence on a
long human DNA sequence, around 3 gigabases long.
Edit Distance (initialization):
      A  T  T  C  T  G  A  C  T  A  C  A  T
   0  1  2  3  4  5  6  7  8  9 10 11 12 13
G  1
A  2
T  3
T  4
A  5
C  6
A  7

Substring Match (initialization):
      A  T  T  C  T  G  A  C  T  A  C  A  T
   0  0  0  0  0  0  0  0  0  0  0  0  0  0
G  1
A  2
T  3
T  4
A  5
C  6
A  7

For substring match, find the minimum cost on the last row.
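The two changes for substring matching, a zero initial row and a minimum taken over the last row, can be sketched as below. The function name `best_substring_match` and the returned pair (cost, end position in T, counted 1-based) are assumptions made for this illustration.

```python
def best_substring_match(p, t):
    """Cheapest alignment of p against any substring of t.

    Returns (cost, end), where end is the 1-based position in t at which
    the best-matching substring ends (the leftmost one on ties).
    """
    n, m = len(p), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]   # initial row stays 0: a match may start anywhere
    for i in range(n + 1):
        d[i][0] = i                             # initial column as in plain edit distance
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if p[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i][j - 1] + 1, d[i - 1][j] + 1)
    last_row = d[n]
    end = min(range(m + 1), key=lambda j: last_row[j])   # minimum on the last row
    return last_row[end], end


print(best_substring_match("GATTACA", "ATTCTGACTACAT"))
```

For the strings on the slide, the best match is the substring GACTACA, which differs from GATTACA by one substitution.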
Some Variants of the Edit Distance
Longest Common Subsequence: Between P and T we look for the longest match of a possibly scattered sequence of symbols.
P: democrats
T: republicans
LCS(P,T): ecas
    d e m o c r a t s
  0 0 0 0 0 0 0 0 0 0
r 0 0 0 0 0 0 1 1 1 1
e 0 0 1 1 1 1 1 1 1 1
p 0 0 1 1 1 1 1 1 1 1
u 0 0 1 1 1 1 1 1 1 1
b 0 0 1 1 1 1 1 1 1 1
l 0 0 1 1 1 1 1 1 1 1
i 0 0 1 1 1 1 1 1 1 1
c 0 0 1 1 1 2 2 2 2 2
a 0 0 1 1 1 2 2 3 3 3
n 0 0 1 1 1 2 2 3 3 3
s 0 0 1 1 1 2 2 3 3 4
Another way is to use normal edit distance, but preventing substitutions! How?
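The LCS table can be filled and traced back as in the sketch below, with T on the rows and P on the columns to mirror the slide. The second function is one possible reading of the "preventing substitutions" idea: the diagonal move is allowed only on a match, and under that assumption the resulting distance equals |P| + |T| − 2 ⋅ |LCS|. Both function names are hypothetical.

```python
def lcs(p, t):
    """Longest common subsequence of p and t (table rows follow t, columns follow p)."""
    n, m = len(p), len(t)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if t[i - 1] == p[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    out = []
    i, j = m, n
    while i > 0 and j > 0:         # trace back from the bottom-right cell
        if t[i - 1] == p[j - 1]:
            out.append(t[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def indel_distance(p, t):
    """Edit distance with substitutions forbidden: only matches, insertions, deletions."""
    n, m = len(p), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(d[i - 1][j], d[i][j - 1]) + 1
            if p[i - 1] == t[j - 1]:            # diagonal move only on a match
                best = min(best, d[i - 1][j - 1])
            d[i][j] = best
    return d[n][m]


p, t = "democrats", "republicans"
print(lcs(p, t), indel_distance(p, t))
```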
Edit Distance Computation Challenges
The matrix needs O(n ⋅ m), quadratic, space. Is there a way to reduce it?
The computation takes O(n ⋅ m) time. Would it be possible to speed it up via
parallelization?
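On the space question, one commonly used idea is that each row depends only on the row above it, so two rows suffice when only the distance (not the traceback) is needed. The sketch below assumes unit costs, under which the distance is symmetric and the shorter string can be placed along the row; the function name is illustrative.

```python
def edit_distance_two_rows(p, t):
    """Unit-cost edit distance keeping only the previous and current rows."""
    if len(t) > len(p):
        p, t = t, p                # symmetric under unit costs: keep the row short
    prev = list(range(len(t) + 1))             # initial row
    for i in range(1, len(p) + 1):
        curr = [i] + [0] * len(t)              # first entry is the initial column
        for j in range(1, len(t) + 1):
            cost = 0 if p[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match or substitution
                          curr[j - 1] + 1,     # indel on t
                          prev[j] + 1)         # indel on p
        prev = curr
    return prev[len(t)]


print(edit_distance_two_rows("thou-shalt", "you-should"))
```

The price is that the full path can no longer be read back from the matrix, since earlier rows are discarded.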
Reading assignment
• Read the Dynamic Programming chapters from the textbooks, particularly from Cormen and Skiena.