GonzalezGauss Zhao 2020
Abstract
In this paper, we explore the concept of dynamic programming. Dynamic programming can be used in a multitude of fields, ranging from board games such as chess and checkers to predicting how RNA is structured. In order to explain aspects of dynamic programming, we include background information covering induction, counting and combinatorics, probability theory, and time and space complexity.
Contents
1 Introduction
2 Background Information
2.1 Mathematical Induction
2.2 Time Complexity and Runtime
2.2.1 Big-O Notation
2.2.2 Brute-force versus Dynamic Programming Methods
2.3 Counting and Useful Formulas
2.4 Expected Values
2.5 Polynomial Time (P), Nondeterministic Polynomial Time (NP) and Space Complexity
2.5.1 Turing Machine
2.5.2 Polynomial and Nondeterministic Polynomial Time
2.5.3 Space Complexity
3.3.2 International Rules of Checkers
3.3.3 Tracing Backwards
3.3.4 Algorithm
3.3.5 Runtime
3.3.6 Simplifications
3.4 Go Dominant Strategy
3.4.1 Problem Statement
3.4.2 International Rules of Go
3.4.3 Case Examples
3.4.4 Impossible Scenarios
3.4.5 Runtime
3.4.6 Brute-Force Method
1 Introduction
In this paper, we provide concepts important to the understanding of dynamic programming. These topics are either utilized later in the paper or allow for a deeper and more contextual understanding of subjects which we do not cover. We include inductive reasoning, time complexity, permutation formulas, expected values, and polynomial versus nondeterministic polynomial time. We then explore dynamic programming, examining the Longest Common Subsequence (LCS) problem, the dominant strategy of checkers, and the dominant strategy of Go. This information should also demonstrate the effectiveness of dynamic programming, as well as show the numerous applications that dynamic programming has across various mathematical fields. Finally, we recognize what can be learned from this paper and identify avenues for future exploration.

Dynamic programming is applicable in graph theory, game theory, AI and machine learning, economics and finance, and bioinformatics, as well as in calculating shortest paths, which is used in GPS navigation. Dynamic programming is also used in laptops and phones, which cache data so that it does not need to be retrieved from servers each time it is needed. In fact, you may be using dynamic programming without realizing it.
2 Background Information
2.1 Mathematical Induction
Induction is a way of proving a statement that depends on a natural number. There are two main methods of induction - weak and strong induction. Weak induction works like this:
1. Initial step (base case) - This is the lowest case we look at (typically n = 1 or n = 0). This case must be proven directly, without relying on the induction machinery.
2. Induction hypothesis - This step assumes that the claim holds true when n = k.
3. Induction step - This step demonstrates that the claim is true when n = k + 1, using the assumption from step 2.
Induction works because of its recursive nature. Once we establish that the base case is true, say n = 1, we can use it to prove that the next case, n = 2, is true, then use that to prove n = 3 is true, and so on.
The difference between weak and strong induction lies in the induction hypothesis: weak induction assumes only that the kth case is true, while strong induction assumes that all cases up to and including the kth are true.
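To state the contrast symbolically (writing P(n) for the claim at n; this notation is ours, introduced only for illustration): weak induction uses the implication
\[ P(k) \implies P(k+1), \]
while strong induction uses
\[ \big(P(1) \wedge P(2) \wedge \cdots \wedge P(k)\big) \implies P(k+1), \]
in both cases together with the base case P(1).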
Example 1. A problem that can be solved through induction is calculating the sum of the first n natural numbers. The formula for the sum of the first n natural numbers is
\[ 1 + 2 + 3 + \cdots + (n-1) + n = \frac{n(n+1)}{2}. \]
To prove it through induction, we first test the base case, n = 1. The base case is true because $1 = \frac{(1)(2)}{2}$. Next, we assume that the formula works when n = k and prove
\[ 1 + 2 + 3 + \cdots + k + (k+1) = \frac{(k+1)(k+2)}{2}, \]
which means it still works at n = k + 1. The left side of the equation can be changed from
\[ 1 + 2 + 3 + \cdots + k + (k+1) \quad \text{to} \quad \frac{k(k+1)}{2} + (k+1) \]
because we assumed the equation was true when n = k. Next, using simple algebra we see that
\[ \frac{k(k+1)}{2} + (k+1) = (k+1)\left(\frac{k}{2} + 1\right) = \frac{(k+1)(k+2)}{2}, \]
and this completes the proof.
Definition 2.1 (Big-O Notation). If g(x) is a real or complex function and f(x) is a real function, then g(x) is O(f(x)) if and only if |g(x)| is at most a positive constant multiple of f(x) for all sufficiently large values of x.
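For example (an illustrative instance, not one used later in the paper), the function $g(x) = 3x^2 + 5x$ is $O(x^2)$, since $|3x^2 + 5x| \le 4x^2$ for all $x \ge 5$.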
We introduce the topic of the Turing machine to help describe time and space complexity.
Figure 1: A Turing Machine.
Figure 2: A Visual Representation of Common Computational Complexity Sets.
3.1.1 Bellman Equation
The value function $V(x_0)$ of an infinite-horizon dynamic decision problem can be rephrased as the maximum of the summation of every action's payoff within the problem.
The Bellman equation rephrases value functions to quantify and identify the recursive means by which dynamic programming is applied. It defines the value function $V(x_0)$ as the maximum of the payoff in state $x_0$ plus the value function $V(x_1)$ of the state that follows.
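In its standard form the equation can be written as follows (the notation here is our own: $F$ for the per-period payoff, $a_0$ for the action chosen in state $x_0$, $\beta$ for a discount factor, and $T$ for the transition rule giving the next state):
\[ V(x_0) = \max_{a_0} \Big\{ F(x_0, a_0) + \beta \, V(x_1) \Big\}, \qquad x_1 = T(x_0, a_0). \]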
The Bellman equation has widespread applications in economics, particularly in idealized problems concerning economic growth, resource extraction, and public finance. The equation can also be adapted to stochastic optimal control problems.
Each value L[a, b] of the function L is considered a subproblem in the dynamic programming formulation.
Definition 3.1. L[a, b] refers to the length of the longest common subsequence between the first a characters of A and the first b characters of B. The inequalities 1 ≤ a ≤ n and 1 ≤ b ≤ n must hold.
The algorithm compares the last elements of the two prefixes. This comparison results in two cases: either the last elements are the same, or they are not.
In the case where the last elements match, we add one to the length of the previous longest common subsequence, L[a − 1, b − 1]. This is because the matching last characters of A and B are part of the longest common subsequence, so its length must be one more than the length of the previous common subsequence.
In the case where the last elements differ, the longest common subsequence has length L[a − 1, b] or L[a, b − 1], whichever is longer. This is because the last element does not add to the longest common subsequence, so its length does not change.
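The two cases above can be restated as a single recurrence in the paper's L-notation (with the added convention that $L[a, 0] = L[0, b] = 0$ for empty prefixes):
\[ L[a, b] = \begin{cases} L[a-1, b-1] + 1 & \text{if } A[a] = B[b], \\ \max\big(L[a-1, b],\; L[a, b-1]\big) & \text{if } A[a] \ne B[b]. \end{cases} \]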
Example 1. Consider two example sequences A and B. Let us assume that A = [1, 2, 3, 4, 5] and B = [1, 6, 7, 8, 5]. Following the algorithm above, we can see that L[1, 1] = 1. Using this value, we can find L[2, 1] and L[1, 2], from which we can then find L[2, 2], L[1, 3], and L[3, 1]. By repeating this process, we can use previously stored values to find L[n, n]. If we follow this method, we will only calculate each L-value once.
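A minimal Python sketch of this table-filling procedure (the function name and the list-based table are our own choices, not taken from the paper):

    def lcs_length(A, B):
        """Length of the longest common subsequence of sequences A and B."""
        n, m = len(A), len(B)
        # L[a][b] = length of the LCS of the first a elements of A and the
        # first b elements of B; row and column 0 correspond to empty prefixes.
        L = [[0] * (m + 1) for _ in range(n + 1)]
        for a in range(1, n + 1):
            for b in range(1, m + 1):
                if A[a - 1] == B[b - 1]:
                    # Last elements match: extend the previous LCS by one.
                    L[a][b] = L[a - 1][b - 1] + 1
                else:
                    # Last elements differ: keep the better of the two shorter prefixes.
                    L[a][b] = max(L[a - 1][b], L[a][b - 1])
        return L[n][m]

    print(lcs_length([1, 2, 3, 4, 5], [1, 6, 7, 8, 5]))  # prints 2

Running it on the sequences above returns 2, corresponding to the common subsequence [1, 5], and each L-value is computed exactly once.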
3.2.4 Runtime
By repeating this process for each element of the sequences, we can find the length of the longest common subsequence. The algorithm computes each value of the L-function only once, which is why it is efficient. Each L-function calculation simply compares the last element of each prefix, so each comparison runs in O(1) time. Ultimately, to calculate L[n, n], the algorithm must go through every element of one sequence and compare it to every element of the other sequence; this amounts to $n^2$ L-function calculations. Finally, multiplying the time it takes to complete one L-function calculation by the total number of L-function calculations results in a runtime of $O(n^2)$. This is in the time complexity class P.
3.3.4 Algorithm
Let us look at an 8 by 8 board and disregard crowned pieces. We will assign a win, loss, or draw state to each individual scenario. These labels mean that, from that point on, if both players were to play perfectly, that would be the only possible result.
We first look at scenarios in which the opposing player has only one piece left, which we can automatically consider a win.
After understanding that any scenario with one piece left can force a win, we can then observe the two-piece scenarios that could have preceded it. We then verify that each such scenario, under optimal play, still has the value Win or Draw; if not, we disregard that case. After limiting the number of two-piece scenarios, we can move on to three-piece scenarios, again finding the optimal value possible for the preceding scenarios and disregarding the rest. By repeating this process, we can trace a chain of optimal moves back to the beginning of the game, thus creating a means of optimal play.
The algorithm does not necessarily need to go through every sequence of connected cases. An optimal move is a move that leads into another optimal move. This means we only have to look at our move and the opponent's move to guarantee an optimal move. Essentially, we will only have to look at two moves total to determine whether a move is optimal or not, which will have an effect on our runtime.
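As a sketch of this win/loss labeling with memoization, the following Python fragment solves a toy take-1-or-2 subtraction game in place of checkers (a stand-in chosen so the code stays self-contained; the real algorithm would substitute checkers move generation and a checkers terminal test):

    from functools import lru_cache

    WIN, LOSS = 1, -1  # state labels from the perspective of the player to move

    def legal_moves(n):
        # Toy stand-in for checkers move generation: remove 1 or 2 counters.
        return [n - take for take in (1, 2) if n - take >= 0]

    @lru_cache(maxsize=None)
    def state_value(n):
        # A position with no moves left is a loss for the player to move.
        if n == 0:
            return LOSS
        # An optimal move is one that leaves the opponent in the worst position.
        return max(-state_value(m) for m in legal_moves(n))

    print([state_value(n) for n in range(7)])  # [-1, 1, 1, -1, 1, 1, -1]

Each position is labeled once and then reused, which is exactly the saving that the runtime analysis below relies on.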
3.3.5 Runtime
What is the runtime of our algorithm? First, let us generalize the board to be k by k. This means that each player has $\frac{k}{2}\left(\frac{k}{2} - 1\right)$ pieces, which written in Big-O notation is $O(k^2)$ because the dominant term is $k^2$ after simplification. Next, we find how many moves each player can make. There are two types of pieces, regular pieces and crowned pieces, and each can move in two different ways: capturing a piece or moving regularly.
Definition 3.2. Regular Piece: A regular non-capturing piece can move in 2 one-tile ways, and a regular capturing piece can move in up to $2^k$ ways. Assume the regular capturing piece can jump in both directions it can move, and that after the jump it can capture in both directions again. This can be repeated until the end of the board in the worst case, giving $2^k$ possibilities. Written in Big-O notation, a regular non-capturing piece is O(1) because the number of moves is constant, and a regular capturing piece is $O(2^k)$.
Definition 3.3. Crowned Piece: A crowned non-capturing piece can move in 4 unlimited-tile ways; each move is at most k tiles, because the worst case is a piece in a corner moving to the opposite corner, which is k tiles in one direction. A crowned capturing piece can move in $2 \times 2^k$ ways, because the crowned piece is like the regular piece but it can also move backwards. Written in Big-O notation, a crowned non-capturing piece is O(k) because there are at most 2k possible destinations, and a crowned capturing piece is $O(2^k)$.
The number of subproblems can be determined by combining our factors together.
To determine the payoff state of a subproblem, we can check the state of the $O(k^2 \times 2^k)$ possible board configurations derived from this board. Figure 5 illustrates a system of subproblems, where each subproblem is connected to the subproblem that preceded it. Determining a state takes O(1) because of our usage of recursion, so the total runtime of solving a subproblem is $O(k^2 \times 2^k)$.
There are $O(k^2)$ spaces where pieces can be placed on a checkers board, each with three possible states: a white piece, a black piece, or no piece. This means that there are $O(3^{k^2})$ possible board configurations. Therefore, the total runtime of the applied dynamic programming algorithm is $O(k^2 \times 2^k \times 3^{k^2})$. This is in the set EXPTIME.
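To get a sense of scale, substituting k = 8 into this bound (a rough back-of-the-envelope calculation, not a figure from the paper) gives
\[ k^2 \times 2^k \times 3^{k^2} = 64 \times 256 \times 3^{64} \approx 1.6 \times 10^{4} \times 3.4 \times 10^{30} \approx 5.6 \times 10^{34}. \]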
3.3.6 Simplifications
We can simplify checkers to significantly reduce the runtime of the algorithm. One massive simplification is to not include capturing pieces. The capturing part of checkers takes a lot of time and space to process because the $O(2^k)$ factor in the runtime grows extremely fast. Another simplification is to remove the crowned-piece aspect of the game. This limits each piece to at most 2 possible moves and allows each subproblem to be solved in $O(k^2)$ time. The total algorithm then has a runtime of $O(k^2 \times 3^{k^2})$.
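Under the same back-of-the-envelope treatment with k = 8, the simplified bound becomes
\[ k^2 \times 3^{k^2} = 64 \times 3^{64} \approx 2.2 \times 10^{32}, \]
roughly a factor of $2^k = 256$ smaller than the unsimplified bound.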
Figure 7: The Beginnings of a Go Game.
1. The board should remain square. Any other shape would distort the game-play. In some scenarios it would be impossible to capture stones or create eyes because of how constraining the shape of the board would be: players would be able to hem in stones more easily, and opponents would struggle to defend their stones.
2. Go is a game of significant complexity, and this complexity should be reflected in the size of the board. Pieces interact with one another in ways that would be hard to replicate on smaller boards. Yet we also wish to limit the number of calculations.
3.4.5 Runtime
Figure 8: An Impossible Scenario.

Bottom-Up Method. Previously, when solving the checkers dominant strategy, the tracing-backwards, or top-down, method was used. The bottom-up method is the alternative to the top-down method. It starts from the first possible boards and checks which configurations are reachable from each successive board state, until the algorithm reaches a stopping point where the win/loss state is known. The algorithm then looks back on preceding configurations to validate that, with perfect play, the board configuration leads to a win.
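For contrast with the memoized top-down sketch given for checkers, the same toy subtraction game can be labeled bottom-up by filling a table outward from the terminal position (again a self-contained stand-in, not Go itself):

    WIN, LOSS = 1, -1

    def solve_bottom_up(max_n):
        # Label positions 0..max_n of the toy take-1-or-2 game without recursion.
        value = [LOSS] * (max_n + 1)   # position 0 has no moves: a loss for the mover
        for n in range(1, max_n + 1):
            # Every successor of n has already been labeled, so we can look back at it.
            successors = [n - take for take in (1, 2) if n - take >= 0]
            value[n] = max(-value[m] for m in successors)
        return value

    print(solve_bottom_up(6))  # [-1, 1, 1, -1, 1, 1, -1], matching the top-down labels

The total work is exactly the product described next: the number of positions multiplied by the number of moves examined per position.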
The total runtime of the bottom-up method is equal to the number of board positions multiplied by the number of possible moves from each of these board positions.
The number of possible board configurations is given because, as previously stated, there are three possible cases per intersection. Board configurations are derived from different arrangements of these cases. With every intersection providing three cases, there are $3^I$ total possible board configurations, where I is the number of intersections. This runtime is within the set EXPTIME.
Therefore, for the traditional 19 × 19 board with 361 intersections, the runtime of calculating the dominant strategy via the bottom-up method is $121.34 \times 3^{361}$, or approximately $2.1124 \times 10^{174}$.
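As a quick check on this figure (taking the 121.34 move factor as given),
\[ 3^{361} \approx 1.74 \times 10^{172}, \qquad 121.34 \times 1.74 \times 10^{172} \approx 2.11 \times 10^{174}, \]
which agrees with the stated value.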