REPEATED STRATEGIC GAME
Consider the prisoner’s dilemma game with possible actions $C_i$ for $P_i$ cooperating with the other player
and $D_i$ for $P_i$ defecting against the other player. (Earlier, these actions were called quiet and fink, respectively.)
The payoff matrix for the game is assumed to be as follows:
\[
\begin{array}{c|cc}
 & C_2 & D_2 \\ \hline
C_1 & (2, 2) & (0, 3) \\
D_1 & (3, 0) & (1, 1)
\end{array}
\]
We want to consider repeated play of this game, either finitely or infinitely many times. To simplify the
situation, we consider the players making simultaneous moves with the current move unknown to the other
player. This is defined formally on page 206. We use a game graph rather than a game tree to represent this
game. See Figure 1.
FIGURE 1. Game graph for the repeated prisoner’s dilemma: at each stage the edges are labeled by the four action profiles (C, C), (C, D), (D, C), and (D, D).
Let $a^{(t)} = (a_1^{(t)}, a_2^{(t)})$ be the action profile at the $t^{th}$ stage. The one-step payoff is assumed to depend
only on the action profile at that stage, $u_i(a^{(t)})$. There is a discount factor $0 < \delta < 1$ to bring this quantity
back to an equivalent value at the first stage, $\delta^{t-1} u_i(a^{(t)})$. For a finitely repeated game of $T$ stages (finite
horizon), the total payoff for $P_i$ is
\[
U_i(a^{(1)}, \dots, a^{(T)}) = u_i(a^{(1)}) + \delta\, u_i(a^{(2)}) + \cdots + \delta^{T-1} u_i(a^{(T)})
= \sum_{t=1}^{T} \delta^{t-1} u_i(a^{(t)}).
\]
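Since this is just a geometrically weighted finite sum, it is straightforward to evaluate numerically. The following Python sketch (the names PAYOFF and discounted_payoff are ours, not from the book) computes the finite-horizon payoff for the matrix above.

```python
# A minimal sketch (not from the notes): the payoff matrix above and the
# finite-horizon discounted payoff U_i = sum_{t=1}^T delta^{t-1} u_i(a^{(t)}).
PAYOFF = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}

def discounted_payoff(profiles, delta, player):
    """Discounted total payoff for `player` (0 or 1) over a finite list of action profiles."""
    return sum(delta ** t * PAYOFF[a][player] for t, a in enumerate(profiles))

# Three stages of mutual cooperation at delta = 0.9: 2 + 2(0.9) + 2(0.81) = 5.42
print(discounted_payoff([("C", "C")] * 3, 0.9, player=0))
```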
There are a couple of ways to understand the discounting. If $r > 0$ is an interest rate, then capital $V_1$ at
the first stage is worth $V_t = (1+r)^{t-1} V_1$ at the $t^{th}$ stage ($t-1$ steps later). Thus, the value of $V_t$ measured at the
first stage is $V_t/(1+r)^{t-1}$. In this context, the discount factor is $\delta = 1/(1+r)$. If the payoff is not money but
satisfaction, then $\delta$ measures the extent to which the player wants rewards now, i.e., how impatient the player
is. See the book for further explanation.
For a finitely repeated prisoner’s dilemma game with payoffs as above, at the last stage both players
optimize their payoff by selecting $D_i$. Given this choice, the choice that optimizes the payoff at stage
$T-1$ is again $D_i$. By backward induction, both players will select $D$ at each stage. See Section 14.4.
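The step driving the backward induction is that $D$ strictly dominates $C$ in the one-shot game. A minimal check (the dictionary payoff_p1 is our own encoding of the first coordinates of the matrix above):

```python
# Sketch of the observation driving the backward induction: in the one-shot
# game, D gives P1 a strictly higher payoff than C against either action of P2.
payoff_p1 = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
for other in ("C", "D"):
    better = payoff_p1[("D", other)] > payoff_p1[("C", other)]
    print(f"against {other}: D beats C? {better}")  # True in both cases
```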
INFINITELY REPEATED GAMES (INFINITE HORIZON)
For the rest of this section, we consider an infinitely repeated game starting at stage one (infinite horizon).
The discounted payoff for player Pi is given by
\[
U_i(\{a^{(t)}\}_{t=1}^{\infty}) = \sum_{t=1}^{\infty} \delta^{t-1} u_i(a^{(t)}).
\]
If $\{w_t\}_{t=1}^{\infty}$ is the stream of payoffs (for one of the players), then the discounted sum is
\[
U(\{w_t\}_{t=1}^{\infty}) = \sum_{t=1}^{\infty} \delta^{t-1} w_t.
\]
If all the payoffs are the same value, $w_t = c$ for all $t$, then
\[
U(\{c\}_{t=1}^{\infty}) = \sum_{t=1}^{\infty} \delta^{t-1} c
= c \sum_{k=0}^{\infty} \delta^{k}
= \frac{c}{1-\delta},
\quad\text{so}\quad
c = (1-\delta)\, U(\{c\}_{t=1}^{\infty}).
\]
For this reason, the quantity
\[
\tilde{U}(\{w_t\}_{t=1}^{\infty}) = (1-\delta)\, U(\{w_t\}_{t=1}^{\infty})
\]
is called the discounted average. The quantity $\tilde{U}(\{w_t\}_{t=1}^{\infty})$ is such that if the same payoff is repeated
infinitely many times, then that same payoff is returned by $\tilde{U}$. Applying this to actions, the quantity
\[
\tilde{U}_i(\{a^{(t)}\}_{t=1}^{\infty}) = (1-\delta)\, U_i(\{a^{(t)}\}_{t=1}^{\infty})
\]
is the discounted average payoff of the action stream.
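A quick numerical sanity check of the constant-stream property (a sketch with our own function name; the infinite sum is approximated by a long truncation):

```python
# Sketch (names are ours): the discounted average of a truncated stream.
# For a long constant stream w_t = c, the value returned is approximately c.
def discounted_average(stream, delta):
    """(1 - delta) * sum_t delta^(t-1) w_t over a finite list approximating the stream."""
    return (1 - delta) * sum(delta ** t * w for t, w in enumerate(stream))

print(discounted_average([2] * 10_000, delta=0.9))  # approximately 2.0
```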
SOME NASH EQUILIBRIUM STRATEGIES
We describe some strategies as reactions to action profiles that have gone before. We only describe
situations where both players use the same rules to define their strategies. In describing the strategy for Pi ,
we let j be the other player. Thus, if i = 1 then j = 2, and if i = 2 then j = 1. We then describe a manner
in which to understand these strategies in terms of a modified game graph.
Defection Strategy. In this strategy, both players select D in response to any history of actions. It is easy
to check that this is a Nash equilibrium.
Grim Trigger Strategy. (page 426) The strategy for $P_i$ is given by
\[
s_i(a^{(1)}, \dots, a^{(t-1)}) =
\begin{cases}
C_i & \text{if } t = 1 \text{ or } a_j^{(\ell)} = C \text{ for all } 1 \le \ell \le t-1, \\
D_i & \text{if } a_j^{(\ell)} = D \text{ for some } 1 \le \ell \le t-1.
\end{cases}
\]
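As a sketch, this rule can be written as a function of the opponent's past actions (the function name is ours):

```python
# Sketch (function name is ours): grim trigger as a rule on the opponent's history.
def grim_trigger(opponent_history):
    """Cooperate at t = 1 and as long as the opponent has always cooperated."""
    return "C" if all(a == "C" for a in opponent_history) else "D"

print(grim_trigger([]))               # "C": first stage
print(grim_trigger(["C", "C"]))       # "C": no defection observed
print(grim_trigger(["C", "D", "C"]))  # "D": a single defection triggers D forever
```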
We next describe this strategy in terms of states of the two players. The states are defined
so that the action of the strategy for player $P_i$ depends only on the state of $P_i$. These states can be used to
determine a new game tree that has a vertex at each stage for each pair of states of the two players.
For the grim trigger strategy, there are two states for $P_i$:
\begin{align*}
C_i &= \{t = 1\} \cup \{(a^{(1)}, \dots, a^{(t-1)}) : a_j^{(\ell)} = C_j \text{ for all } 1 \le \ell \le t-1\}, \\
D_i &= \{(a^{(1)}, \dots, a^{(t-1)}) : a_j^{(\ell)} = D_j \text{ for some } 1 \le \ell \le t-1\}.
\end{align*}
The strategy of Pi is to select Ci if the state is Ci and to select Di if the state is Di . The transitions between
the states depend only on the action of the other player at the last stage. This situation can be represented
by the game tree in Figure 2.
FIGURE 2. Game tree for grim trigger: each vertex is a pair of states for the two players, and the edges are labeled by the action profiles $(C_1, C_2)$, $(C_1, D_2)$, $(D_1, C_2)$, and $(D_1, D_2)$.
As given in the book, rather than giving a game tree, it is easier to give a figure presenting the transitions
and states (of only one player). See Figure 3.
FIGURE 3. States and transitions for grim trigger: state $C_i$ (play $C_i$) remains at $C_i$ when $P_j$ plays $C_j$ and moves to $D_i$ (play $D_i$) when $P_j$ plays $D_j$; state $D_i$ is absorbing (transition $*$).
We next check that if both players use the grim trigger strategy the result is a Nash equilibrium. Since we
start in state (C1 , C2 ), applying the strategy will keep both players in the same states. The one step payoff
at each stage is 2. Assume that P2 maintains the strategy and P1 deviates at stage T by selecting D1 . Then,
P2 selects C2 for t = T and selects D2 for t > T . The greatest payoff for P1 results from selecting D1 for
t > T . Thus, if P1 selects D1 for t = T , then the greatest payoff from that stage onward is
\[
3\delta^{T} + \delta^{T+1} + \delta^{T+2} + \cdots
= 3\delta^{T} + \delta^{T+1}\left(1 + \delta + \delta^{2} + \cdots\right)
= 3\delta^{T} + \frac{\delta^{T+1}}{1-\delta}.
\]
If $P_1$ plays the original strategy, the payoff from the $T^{th}$ stage onward is
\[
2\delta^{T} + 2\delta^{T+1} + 2\delta^{T+2} + \cdots = \frac{2\delta^{T}}{1-\delta}.
\]
Therefore, the grim trigger strategy is a Nash equilibrium provided that
\begin{align*}
\frac{2\delta^{T}}{1-\delta} &\ge 3\delta^{T} + \frac{\delta^{T+1}}{1-\delta}, \\
2 &\ge 3(1-\delta) + \delta = 3 - 2\delta, \\
2\delta &\ge 1, \\
\delta &\ge \tfrac{1}{2}.
\end{align*}
This shows that if both players are patient enough so that $\delta \ge 1/2$, then the grim trigger strategy is a Nash
equilibrium.
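A numerical check of this threshold (a sketch; the common factor $\delta^{T}$ is divided out of both payoffs):

```python
# Sketch: the two payoffs from stage T onward, with the common factor delta^T
# divided out, so the comparison matches the displayed inequality.
for delta in (0.3, 0.5, 0.7):
    stay = 2 / (1 - delta)             # follow grim trigger: 2/(1 - delta)
    deviate = 3 + delta / (1 - delta)  # best deviation: 3 + delta/(1 - delta)
    print(f"delta = {delta}: stay = {stay:.3f}, deviate = {deviate:.3f}")
# Staying is at least as good exactly when delta >= 1/2.
```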
Tit-for-tat Strategy. (page 427, Section 14.7.3) We describe this strategy in terms of states of the players.
For the tit-for-tat strategy, there are two states for $P_i$ that depend only on the action of $P_j$ in the last period:
\begin{align*}
C_i &= \{t = 1\} \cup \{(a^{(1)}, \dots, a^{(t-1)}) : a_j^{(t-1)} = C_j\}, \\
D_i &= \{(a^{(1)}, \dots, a^{(t-1)}) : a_j^{(t-1)} = D_j\}.
\end{align*}
The transitions between states are given in Figure 4.
FIGURE 4. States and transitions for tit-for-tat: state $C_i$ remains at $C_i$ on $C_j$ and moves to $D_i$ on $D_j$; state $D_i$ moves back to $C_i$ on $C_j$ and remains at $D_i$ on $D_j$.
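A short simulation sketch (names are ours) shows the two behaviors used below: mutual tit-for-tat stays at $(C, C)$, and alternates once the players are out of phase:

```python
# Sketch (names are ours): both players follow tit-for-tat, so each stage simply
# swaps the previous action profile (each player copies the other's last action).
def simulate_tit_for_tat(first_profile, rounds=6):
    profiles = [first_profile]
    for _ in range(rounds - 1):
        a1, a2 = profiles[-1]
        profiles.append((a2, a1))  # each player repeats the other's last action
    return profiles

print(simulate_tit_for_tat(("C", "C")))  # stays at (C, C) forever
print(simulate_tit_for_tat(("D", "C")))  # alternates (D, C), (C, D), (D, C), ...
```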
We next check that the tit-for-tat strategy by both players is also a Nash equilibrium for $\delta \ge 1/2$. Assume
that $P_2$ maintains the strategy and $P_1$ deviates by selecting $D_1$ at the $T^{th}$ stage. The payoff for the original
strategy starting at the $T^{th}$ stage is
\[
\frac{2\delta^{T}}{1-\delta}.
\]
The other possibilities for actions by $P_1$ include (a) $D_1$ for $t \ge T$, (b) alternating $D_1$ and $C_1$ forever, and
(c) $D_1$ for $k$ times and then $C_1$. (The latter returns $P_2$ to the original state, so it is enough to calculate this
segment of the payoffs. Note that the book ignores the last case.) We check these three cases in turn.
(a) If $P_1$ uses $D_1$ for $t \ge T$, then $P_2$ uses $C_2$ for $t = T$ and then $D_2$ for $t > T$. The payoff for these choices
is
\[
3\delta^{T} + \delta^{T+1} + \delta^{T+2} + \cdots = 3\delta^{T} + \frac{\delta^{T+1}}{1-\delta}.
\]
For tit-for-tat to be a Nash equilibrium, we need
\begin{align*}
\frac{2\delta^{T}}{1-\delta} &\ge 3\delta^{T} + \frac{\delta^{T+1}}{1-\delta}, \\
2 &\ge 3(1-\delta) + \delta = 3 - 2\delta, \\
2\delta &\ge 1, \\
\delta &\ge \tfrac{1}{2}.
\end{align*}
(b) If $P_1$ alternates $D_1$ and $C_1$, then $P_2$ alternates $C_2$ and $D_2$. The payoff for $P_1$ is
\[
3\delta^{T} + (0)\delta^{T+1} + 3\delta^{T+2} + \cdots
= 3\delta^{T}\left(1 + \delta^{2} + \delta^{4} + \cdots\right)
= \frac{3\delta^{T}}{1-\delta^{2}}.
\]
In order for tit-for-tat to be a Nash equilibrium, we need
\begin{align*}
\frac{2\delta^{T}}{1-\delta} &\ge \frac{3\delta^{T}}{1-\delta^{2}}, \\
2(1+\delta) &\ge 3, \\
2\delta &\ge 1, \\
\delta &\ge \tfrac{1}{2}.
\end{align*}
We get the same condition on $\delta$ as in case (a).
(c) If $P_1$ selects $D_1$ for $k$ stages and then $C_1$, then $P_2$ will select $C_2$ and then $D_2$ for $k$ stages. At the end,
$P_2$ is back in state $C_2$. The payoffs for these $k+1$ stages of the original strategy and the deviation are
\[
2\delta^{T} + \cdots + 2\delta^{T+k}
\quad\text{and}\quad
3\delta^{T} + \delta^{T+1} + \cdots + \delta^{T+k-1} + (0)\delta^{T+k}.
\]
Thus, we need
\begin{align*}
2\delta^{T} + \cdots + 2\delta^{T+k} &\ge 3\delta^{T} + \delta^{T+1} + \cdots + \delta^{T+k-1}, \quad\text{or} \\
-1 + \delta + \cdots + \delta^{k-1} + 2\delta^{k} &\ge 0.
\end{align*}
If $\delta \ge 1/2$, then
\begin{align*}
2\delta^{k} + \delta^{k-1} + \cdots + \delta - 1
&\ge 2\left(\tfrac{1}{2}\right)^{k} + \left(\tfrac{1}{2}\right)^{k-1} + \cdots + \tfrac{1}{2} - 1 \\
&= 2\left(\tfrac{1}{2}\right)^{k-1} + \left(\tfrac{1}{2}\right)^{k-2} + \cdots + \tfrac{1}{2} - 1 \\
&\;\;\vdots \\
&= 2\left(\tfrac{1}{2}\right) - 1 = 0,
\end{align*}
where each step combines the two leading terms using $2(1/2)^{k} = (1/2)^{k-1}$.
Thus, the condition is satisfied. This checks all the possible deviations, so the tit-for-tat strategy is a Nash
equilibrium for δ ≥ 1/2.
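A numerical check of the case (c) inequality (a sketch; at $\delta = 1/2$ the margin is exactly $0$ for every $k$, in agreement with the telescoping computation above):

```python
# Sketch: the case (c) quantity -1 + delta + ... + delta^(k-1) + 2 delta^k.
# At delta = 1/2 it is exactly 0 for every k, matching the telescoping argument.
def case_c_margin(delta, k):
    return -1 + sum(delta ** ell for ell in range(1, k)) + 2 * delta ** k

for k in (1, 2, 5, 10):
    print(k, case_c_margin(0.5, k), case_c_margin(0.6, k) > 0)
```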
Limited punishment Strategy. (Section 14.7.2) In this strategy, each player has $k+1$ states for some
$k \ge 2$. For $P_i$, starting in state $P_{i,0}$, if the other player selects $D_j$, then there is a transition to $P_{i,1}$, then a
transition to $P_{i,2}$, \dots, $P_{i,k}$, and then back to $P_{i,0}$. The transitions from $P_{i,\ell}$ for $1 \le \ell \le k$ do not depend
on the actions of either player. For the limited punishment strategy, the actions of $P_i$ are $C_i$ in state $P_{i,0}$ and
$D_i$ in states $P_{i,\ell}$ for $1 \le \ell \le k$. See Figure 5 for the case of $k = 2$. See the book for the case of $k = 3$.
FIGURE 5. States and transitions for limited punishment with $k = 2$: state $P_{i,0}$ (play $C_i$) moves to $P_{i,1}$ (play $D_i$) when $P_j$ plays $D_j$; the transitions from $P_{i,1}$ to $P_{i,2}$ and from $P_{i,2}$ back to $P_{i,0}$ occur regardless of the actions.
If $P_1$ selects $D_1$ at some stage, then $P_2$ will select $C_2$ and then $D_2$ for the next $k$ stages. The maximum payoff
for $P_1$ is obtained by selecting $D_1$ for all of these $k+1$ stages. The payoffs for $P_1$ are $2 + 2\delta + \cdots + 2\delta^{k}$ for
the limited punishment strategy, which results in all $C$ for both players, and $3 + \delta + \cdots + \delta^{k}$ for the deviation.
Therefore, we need
\begin{align*}
3 + \delta + \cdots + \delta^{k} &\le 2 + 2\delta + \cdots + 2\delta^{k}, \\
1 &\le \delta + \cdots + \delta^{k} = \delta\,\frac{1-\delta^{k}}{1-\delta}, \\
1 - \delta &\le \delta - \delta^{k+1}, \quad\text{and} \\
g_k(\delta) = 1 - 2\delta + \delta^{k+1} &\le 0.
\end{align*}
To check that this is true for $\delta$ large enough, we use calculus:
\begin{align*}
g_k(1) &= 0, \\
g_k\!\left(\tfrac{1}{2}\right) &= 1 - 1 + \left(\tfrac{1}{2}\right)^{k+1} > 0, \\
g_k'(\delta) &= -2 + (k+1)\delta^{k}, \quad\text{and} \\
g_k'(1) &= -2 + k + 1 > 0 \quad\text{since } k \ge 2.
\end{align*}
There is only one $\bar{\delta}$ in $(0, 1)$ such that $g_k'(\bar{\delta}) = 0$:
\[
\bar{\delta}^{k} = \frac{2}{k+1}, \qquad \bar{\delta} = \left(\frac{2}{k+1}\right)^{1/k}.
\]
Therefore, there is a $\tfrac{1}{2} \le \delta_k^{*} \le \bar{\delta} < 1$ such that $g_k(\delta) \le 0$ for $\delta_k^{*} \le \delta < 1$. For this range of $\delta$, the limited
punishment strategy is a Nash equilibrium.
The book mentions that δ2∗ ≈ 0.62 and δ3∗ ≈ 0.55.
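These thresholds can be verified numerically by finding the root of $g_k$ in $(1/2, 1)$. A bisection sketch (function names are ours):

```python
# Sketch (names are ours): locate delta_k^* as the root of
# g_k(delta) = 1 - 2 delta + delta^(k+1) in (1/2, 1) by bisection.
def g(delta, k):
    return 1 - 2 * delta + delta ** (k + 1)

def delta_star(k, tol=1e-10):
    lo, hi = 0.5, 0.999999  # g > 0 at lo; g < 0 at hi since g(1) = 0 and g'(1) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid, k) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(delta_star(2), 4))  # 0.618, matching the quoted 0.62
print(round(delta_star(3), 4))  # 0.5437, close to the quoted 0.55
```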
Existence of many Nash equilibria. The book states that it is possible to realize many different payoffs
with Nash equilibria. See Theorem 435.1. In particular, there are uncountably many different payoffs for
different Nash equilibria.
SUBGAME PERFECT EQUILIBRIA: SECTIONS 14.9 & 14.10
The following is a criterion for a subgame perfect equilibrium.
Definition 1. One deviation property: No player can increase her payoff by changing her action at the start
of any subgame in which she is the first mover, given the other players’ strategy and the rest of her own
strategy.
The point is that deviations need only be checked one stage at a time.
Proposition (438.1). A strategy profile in an infinitely repeated game with discount factor $0 < \delta < 1$ is a subgame
perfect equilibrium iff it satisfies the one deviation property.
Defection Strategy. This is obviously a subgame perfect equilibrium since the same choice is made at every
vertex and it is a Nash equilibrium.
Grim Trigger Strategy. (Section 14.10.1) This is not subgame perfect as given. Starting at the pair of states
$(C_1, D_2)$, it is not a Nash equilibrium. Since $P_2$ is playing the grim trigger in state $D_2$, she will pick $D_2$ at every
stage. Player $P_1$ will play $C_1$ at the first stage of the subgame and then $D_1$ at every stage after that. The payoff for $P_1$ is
\[
0 + \delta + \delta^{2} + \cdots.
\]
However, if $P_1$ changes to always playing $D_1$, then the payoff is
\[
1 + \delta + \delta^{2} + \cdots,
\]
which is larger. Therefore, this is not a Nash equilibrium on the subgame with root pair of states $(C_1, D_2)$.
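A one-line numerical comparison of the two streams (a sketch; any $0 < \delta < 1$ gives the same conclusion):

```python
# Sketch: payoffs for P1 in the subgame rooted at (C1, D2); the deviation to
# all-D beats following grim trigger for every 0 < delta < 1.
delta = 0.9
follow = 0 + delta / (1 - delta)   # stream 0, 1, 1, 1, ...
deviate = 1 / (1 - delta)          # stream 1, 1, 1, 1, ...
print(follow, deviate, deviate > follow)  # approximately 9.0, 10.0, True
```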
A slight modification leads to a subgame perfect equilibrium. Keep the states the same, but make a
transition from Ci to Di if the action of either player is D. See Figure 6. This gives a subgame perfect
equilibrium for δ ≥ 1/2.
FIGURE 6. States and transitions for the modified grim trigger: state $C_i$ remains at $C_i$ when both players play $C$, moves to $D_i$ when either player plays $D$, and state $D_i$ is absorbing.
Limited punishment Strategy. (Section 14.10.2) This can also be modified to make a subgame perfect
equilibrium: Make the transition from Pi,0 to Pi,1 when either player takes the action D. The rest is the
same.
Tit-for-tat Strategy. (Section 14.10.3) The four combinations of states for the two players are (C1 , C2 ),
(C1 , D2 ), (D1 , C2 ), and (D1 , D2 ). We need to check that the strategy is a Nash equilibrium on a subgame
starting at any of these four state profiles.
(i) (C1 , C2 ): The analysis we gave to show that it was a Nash equilibrium applies and shows that it is
true for δ ≥ 1/2.
(ii) $(C_1, D_2)$: If both players adhere to the strategy, then the actions will be
\[
(C_1, D_2),\ (D_1, C_2),\ (C_1, D_2),\ \dots,
\]
with payoff
\[
0 + 3\delta + (0)\delta^{2} + 3\delta^{3} + \cdots
= 3\delta\left(1 + \delta^{2} + \delta^{4} + \cdots\right)
= \frac{3\delta}{1-\delta^{2}}.
\]
If $P_1$ instead starts by selecting $D_1$, then the actions will be
\[
(D_1, D_2),\ (D_1, D_2),\ \dots
\]
with payoff
\[
1 + \delta + \delta^{2} + \cdots = \frac{1}{1-\delta}.
\]
So we need
\begin{align*}
\frac{3\delta}{1-\delta^{2}} &\ge \frac{1}{1-\delta}, \\
3\delta &\ge 1 + \delta, \\
2\delta &\ge 1, \\
\delta &\ge \tfrac{1}{2}.
\end{align*}
(iii) $(D_1, C_2)$: If both players adhere to the strategy, then the actions will be
\[
(D_1, C_2),\ (C_1, D_2),\ (D_1, C_2),\ \dots,
\]
with payoff
\[
3 + (0)\delta + 3\delta^{2} + (0)\delta^{3} + \cdots
= 3\left(1 + \delta^{2} + \delta^{4} + \cdots\right)
= \frac{3}{1-\delta^{2}}.
\]
If $P_1$ instead starts by selecting $C_1$, then the actions will be
\[
(C_1, C_2),\ (C_1, C_2),\ \dots
\]
with payoff
\[
2 + 2\delta + 2\delta^{2} + \cdots = \frac{2}{1-\delta}.
\]
So we need
\begin{align*}
\frac{3}{1-\delta^{2}} &\ge \frac{2}{1-\delta}, \\
3 &\ge 2 + 2\delta, \\
1 &\ge 2\delta, \\
\delta &\le \tfrac{1}{2}.
\end{align*}
(iv) $(D_1, D_2)$: If both players adhere to the strategy, then the actions will be
\[
(D_1, D_2),\ (D_1, D_2),\ (D_1, D_2),\ \dots,
\]
with payoff
\[
1 + \delta + \delta^{2} + \cdots = \frac{1}{1-\delta}.
\]
If $P_1$ instead starts by selecting $C_1$, then the actions will be
\[
(C_1, D_2),\ (D_1, C_2),\ \dots
\]
with payoff
\[
0 + 3\delta + (0)\delta^{2} + 3\delta^{3} + \cdots
= 3\delta\left(1 + \delta^{2} + \delta^{4} + \cdots\right)
= \frac{3\delta}{1-\delta^{2}}.
\]
So we need
\begin{align*}
\frac{1}{1-\delta} &\ge \frac{3\delta}{1-\delta^{2}}, \\
1 + \delta &\ge 3\delta, \\
1 &\ge 2\delta, \\
\delta &\le \tfrac{1}{2}.
\end{align*}
For all four of these conditions to hold, we need $\delta = 1/2$. Thus, tit-for-tat is a subgame perfect equilibrium only for the single value $\delta = 1/2$.
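The four comparisons can be checked numerically in one place (a sketch; at $\delta = 1/2$ each holds with equality):

```python
# Sketch: the four subgame payoff comparisons for tit-for-tat evaluated directly.
def conditions(d):
    return (
        2 / (1 - d) >= 3 + d / (1 - d),      # (i)   root (C1, C2), cf. case (a)
        3 * d / (1 - d**2) >= 1 / (1 - d),   # (ii)  root (C1, D2)
        3 / (1 - d**2) >= 2 / (1 - d),       # (iii) root (D1, C2)
        1 / (1 - d) >= 3 * d / (1 - d**2),   # (iv)  root (D1, D2)
    )

for d in (0.4, 0.5, 0.6):
    print(d, all(conditions(d)))  # only d = 0.5 satisfies all four, with equality
```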