CS 224: Advanced Algorithms Fall 2014
Lecture 4 — September 16, 2014
Prof. Jelani Nelson Scribe: Albert Wu
1 Cuckoo Hashing
Let us say we have an array A of size m = 4n and two random hash functions g, h. We try to insert
x into A[g(x)], potentially kicking out the item already there and moving it to its other hash
location. Note that this might cascade.
If a sequence of item moves goes on for ≥ C · lg n steps, we give up, pick new g and h, and rebuild
the entire data structure.
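To make the insertion procedure concrete, here is a minimal Python sketch. The class name, the hash-function stand-ins, and the constant C are our own illustrative choices; the lecture does not fix an implementation:

```python
import random

class CuckooTable:
    """Minimal sketch of cuckoo hashing: one array, two hash functions."""

    def __init__(self, n):
        self.m = 4 * n                       # array of size m = 4n
        self.A = [None] * self.m
        self._pick_hash_functions()

    def _pick_hash_functions(self):
        # Stand-ins for the truly random hash functions g and h.
        a, b = random.randrange(1, 2**31), random.randrange(2**31)
        c, d = random.randrange(1, 2**31), random.randrange(2**31)
        self.g = lambda x: (a * hash(x) + b) % self.m
        self.h = lambda x: (c * hash(x) + d) % self.m

    def insert(self, x, C=6):
        pos = self.g(x)
        # Follow the chain of evictions, giving up after ~C * lg n steps.
        for _ in range(C * self.m.bit_length()):
            if self.A[pos] is None:
                self.A[pos] = x
                return
            self.A[pos], x = x, self.A[pos]  # kick out the current occupant
            gx, hx = self.g(x), self.h(x)
            pos = hx if pos == gx else gx    # move it to its other cell
        self._rebuild(x)                     # chain too long: pick new g, h

    def _rebuild(self, pending):
        items = [y for y in self.A if y is not None] + [pending]
        self.A = [None] * self.m
        self._pick_hash_functions()
        for y in items:
            self.insert(y)

    def contains(self, x):
        # Lookups only ever probe the two possible cells for x.
        return self.A[self.g(x)] == x or self.A[self.h(x)] == x
```

Note that lookups probe only A[g(x)] and A[h(x)], which is what gives cuckoo hashing its worst-case O(1) query time.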
Claim: $E[\text{time to insert } x] = O(1)$.
Proof: A cuckoo graph has m vertices (one per cell of A) and n edges (since for each x, we connect
g(x) to h(x)).
Consider the path we get from an insertion of x. We could get a simple path, a single cycle, or
a double cycle. Let us define the following random variables: $T$, the runtime; $P_k$, the indicator
random variable for the path having length at least $k$; $C_k$, the indicator random variable for a
single-cycle configuration of length $\ge k$; and $D$, the indicator random variable for a double-cycle
configuration. Note also that if the insertion process takes more than $N = C \log n$ steps, then one
of $D = 1$, $P_N = 1$, or $C_N = 1$ must have occurred. Since such a failure triggers a rebuild, i.e.,
reinserting all $n$ items at expected cost $E[T]$ each, we have
$$E[T] = \sum_k E[P_k] + \sum_k E[C_k] + P(\text{go on for more than } C \log n \text{ steps}) \cdot n \cdot E[T]$$
$$\le \sum_k E[P_k] + \sum_k E[C_k] + \left(P(D = 1) + E[P_N] + E[C_N]\right) \cdot n \cdot E[T] \quad (1)$$
Let us consider $E[P_k]$. Fix $x_2, x_3, \ldots, x_{k+1}$, and fix the assignment of the path to $k+1$ vertices.
The probability we see exactly this path is $\frac{1}{m} \cdot \frac{1}{m^{2k}} \cdot 2^k$. For the union bound, note that the number of
total possible vertex assignments is $m^{k+1}$, and the number of ways to choose the edges is
$n(n-1)\cdots(n-k+1) \le n^k$. Then, by the union bound, we know that
$$E[P_k] \le n^k \cdot m^{k+1} \cdot \frac{1}{m} \cdot \frac{1}{m^{2k}} \cdot 2^k = \left(\frac{2n}{m}\right)^k = \frac{1}{2^k},$$
using $m = 4n$.
Now, let us bound $E[C_k]$. For $C_k$, let us define three types of edges: the forward edges, the
backward edges, and the edges on the subsequent path created by the other hash function. One of
these types must contain at least $k/3$ edges, giving us a bound similar to the path analysis:
$E[C_k] \le \frac{1}{2^{k/3}}$.
For $D$, we want $P(D = 1)$. Let $t$ denote the number of distinct vertices (which will also be the
number of distinct edges, not including edges labeled with x) in the double-cycle graph. Let $D_t$ be
the indicator random variable for having a tour of this type with $t$ vertices. We know that
$$P(D = 1) = \sum_t P(D_t = 1) \quad (2)$$
Let us look for a particular configuration with $t$ vertices. The probability we see this configuration is
$\frac{1}{m^2} \cdot \frac{1}{(m^2)^t} \cdot 2^t$ (the extra $1/m^2$ comes from requiring x to hash to its two vertices). Union bounding
over all configurations: we have at most $m^t$ choices of vertices, at most $n^t$ choices of edges, and
at most $t^3$ choices for the start of the first cycle, the length of the first cycle, and the start of the
second cycle. Thus
$$P(D_t = 1) \le t^3 \cdot \frac{(2mn)^t}{m^{2t+2}},$$
which is at most $\frac{1}{n^2} \cdot \frac{t^3}{2^t}$ since $m = 4n$. Thus Eq. (2) converges and is $O(1/n^2)$.
Now, in Eq. (1), the probability of going on for more than $N$ steps is at most $P(D = 1) + E[P_N] +
E[C_N]$. By setting $C$ large enough, this is $O(1/n^2)$, dominated by the $P(D = 1)$ term. Rearranging
terms thus gives $E[T] = O(1)$, as desired.
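Spelling out the rearrangement step: plugging the bounds above into Eq. (1) gives
$$E[T] \le \sum_k \frac{1}{2^k} + \sum_k \frac{1}{2^{k/3}} + O(1/n^2) \cdot n \cdot E[T] = O(1) + O(1/n) \cdot E[T],$$
so $(1 - O(1/n)) \cdot E[T] = O(1)$, and hence $E[T] = O(1)$.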
2 Last Thing on Hashing
Let us talk about the “power of two choices.” Recall hashing with chaining: if we choose a perfectly
random hash function, then with high probability the length of the longest list is $O\left(\frac{\lg n}{\lg \lg n}\right)$.
[Azar, Broder, Karlin, Upfal, SICOMP ‘99] Pick 2 random hash functions g, h. When inserting x,
place it in the least loaded of A[g(x)] and A[h(x)]. Now, with high probability, the heaviest bin
has at most $\frac{\ln \ln n}{\ln 2} + \Theta(1)$ items.
What about the power of $d$ choices? We only improve by a constant factor, i.e., $\frac{\ln \ln n}{\ln d} + \Theta(1)$ items
in the heaviest bin.
[Vöcking JACM ‘03] Break up the bins into $d$ groups, each of size $n/d$. When inserting an item,
check one random location in each group. Put the item in the least loaded of these, breaking ties
by placing it in the leftmost group. Now, the maximum load is $\Theta\left(\frac{\ln \ln n}{d}\right)$.
To see more, see the survey by Mitzenmacher, Richa, and Sitaraman [3].
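As a quick sanity check of these bounds, here is a small simulation; it is our own illustration (the function name max_load and the parameter values are arbitrary), not something from the lecture:

```python
import random

def max_load(n, d):
    """Throw n balls into n bins; each ball goes to the least loaded of
    d independent uniformly random bins. Return the maximum bin load."""
    load = [0] * n
    for _ in range(n):
        choices = [random.randrange(n) for _ in range(d)]
        best = min(choices, key=lambda b: load[b])  # least loaded choice
        load[best] += 1
    return max(load)

n = 100_000
print("one choice :", max_load(n, 1))  # grows like lg n / lg lg n; ~8 here
print("two choices:", max_load(n, 2))  # ln ln n / ln 2 + Theta(1); typically 4
```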
Intuition for power of 2 choices:
Let $B_i$ be the number of bins with load $\ge i$. Let the height of x, $H(x)$, be such that x is the $H(x)$-th
item inserted into that bin.
Let $Q_x$ be the indicator random variable for the event that $H(x) \ge i + 1$. The probability that
$H(x) \ge i + 1$ is at most $\left(\frac{B_i}{n}\right)^2$, since both of x's choices must land in bins of load $\ge i$. So, if
everything is as expected, $B_{i+1} \le n \cdot \left(\frac{B_i}{n}\right)^2$, i.e., $\frac{B_{i+1}}{n} \le \left(\frac{B_i}{n}\right)^2$.
Let’s say that $\frac{B_{10}}{n} \le \frac{1}{2}$. Then $\frac{B_{10+j}}{n} \le \frac{1}{2^{2^j}}$. We are done once $\frac{B_{10+j}}{n} < \frac{1}{n}$, which happens when
$j \ge \lg \lg n$.
More rigorous details:
Below we outline how a more rigorous proof would go.
Define $\alpha_6 = \frac{n}{2e}$ and $\alpha_{i+1} = \frac{e \alpha_i^2}{n}$. If $E_i$ is the event that $B_i \le \alpha_i$, we will show that with high
probability all events $E_i$ occur.
First, $P(E_6) = 1$, because at most $n/6$ bins can have load $\ge 6$ and $\frac{n}{2e} > \frac{n}{6}$.
We would now like to show that $P(\wedge_i E_i)$ is large. By the union bound, this is at least
$$1 - \sum_i P(\neg E_i) \ge 1 - P(\neg E_6) - \sum_i \left(P(\neg E_{i+1} \mid E_i) + P(\neg E_i)\right) = 1 - \sum_i \left(P(\neg E_{i+1} \mid E_i) + P(\neg E_i)\right) \quad (3)$$
using $P(\neg E_{i+1}) \le P(\neg E_{i+1} \mid E_i) + P(\neg E_i)$ and $P(\neg E_6) = 0$.
It thus suffices to bound $P(\neg E_{i+1} \mid E_i)$ and $P(\neg E_i)$.
Lemma 1.
$$P(\neg E_{i+1} \mid E_i) \le \frac{P\left(\mathrm{Bin}\left(n, \left(\frac{\alpha_i}{n}\right)^2\right) > \alpha_{i+1}\right)}{P(E_i)}$$
where $\mathrm{Bin}(n, p)$ is a binomial random variable with parameters $n$ and $p$. That is, it is the sum of $n$
independent Bernoulli random variables, each with expectation $p$. Recall that a Bernoulli random
variable is supported in $\{0, 1\}$.
Proof. For an item $j$, let the height $H(j)$ be such that $j$ is the $H(j)$-th ball inserted into its bin.
Let $Y_j$ be an indicator random variable for the event $H(j) \ge i + 1$. Then certainly $B_{i+1} \le \sum_j Y_j$.
It thus suffices to upper bound $P(\sum_j Y_j > \alpha_{i+1} \mid E_i)$.
By Bayes’ rule,
$$P\Big(\sum_j Y_j > \alpha_{i+1} \;\Big|\; E_i\Big) = \frac{P\big(\big(\sum_j Y_j > \alpha_{i+1}\big) \wedge E_i\big)}{P(E_i)}$$
We then want to bound the numerator of the right hand side. Let $X_j$ be a Bernoulli random
variable with $E[X_j] = (\alpha_i/n)^2$. We will introduce the following “coupling” argument, which defines
two sets of random variables $\{X_j\}, \{\tilde{Y}_j\}$ on the same probability space. Imagine picking uniform
random variables $U_j, U_j'$ in $[0, 1)$. If both $U_j, U_j' \le \alpha_i/n$, then we set $X_j = 1$; else we set $X_j = 0$.
Now, imagine labeling the points $a_0 = 0/n, a_1 = 1/n, \ldots, a_n = n/n$ on the interval $[0, 1]$. As we
will describe, these points correspond to the $n$ bins, in reverse sorted order by load. When generated,
$U_j$ and $U_j'$ land in $[a_{t-1}, a_t)$ and $[a_{t'-1}, a_{t'})$, respectively, for some $t, t'$. We then imagine placing
a ball in the least loaded of bins $t, t'$ (recall $t = 1$ corresponds to the heaviest bin). If we are at
a point where $E_i$ no longer holds, then we set $\tilde{Y}_j = 0$. Otherwise we set $\tilde{Y}_j = 1$ iff $H(j) \ge i + 1$
according to this process. Now observe two things:
(a) $\tilde{Y}_j \le X_j$ always (with probability 1). Therefore
$$P\Big(\sum_j \tilde{Y}_j > \alpha_{i+1}\Big) \le P\Big(\sum_j X_j > \alpha_{i+1}\Big) \quad (4)$$
(b) At any point in the above defined probability space where both $E_i$ and $\sum_j Y_j > \alpha_{i+1}$ hold, it
also holds that $\sum_j \tilde{Y}_j > \alpha_{i+1}$. Thus
$$P\Big(\big(\sum_j Y_j > \alpha_{i+1}\big) \wedge E_i\Big) \le P\Big(\sum_j \tilde{Y}_j > \alpha_{i+1}\Big) \quad (5)$$
Combining Eqs. (4) and (5) concludes the proof.
We know $P(E_6) = 1$ (and equivalently $P(\neg E_6) = 0$). By an inductive argument, once we upper
bound $P(\neg E_i)$, we can invoke Lemma 1: upper bounding $P(\mathrm{Bin}(n, (\alpha_i/n)^2) > \alpha_{i+1})$ yields a
bound on $P(\neg E_{i+1} \mid E_i)$ (since the inductive hypothesis gives an upper bound on $P(\neg E_i)$, and
thus a lower bound on $P(E_i) = 1 - P(\neg E_i)$). One can bound
$P(\mathrm{Bin}(n, (\alpha_i/n)^2) > \alpha_{i+1}) \le e^{-C \alpha_i^2 / n}$ via the Chernoff bound (calculation left as an exercise to
the reader!). For the reader interested in seeing all the calculations worked out, see the notes at
http://www.cs.berkeley.edu/~sinclair/cs271/n15.pdf.
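For intuition, here is one way the Chernoff calculation can go (a sketch; the exercise may intend a different route). Let $\mu = E[\mathrm{Bin}(n, (\alpha_i/n)^2)] = \alpha_i^2/n$, so that $\alpha_{i+1} = e \alpha_i^2/n = e\mu$. The multiplicative Chernoff bound $P(X \ge a\mu) \le (e^{a-1}/a^a)^\mu$ with $a = e$ gives
$$P\left(\mathrm{Bin}\left(n, (\alpha_i/n)^2\right) > \alpha_{i+1}\right) \le \left(\frac{e^{e-1}}{e^e}\right)^{\mu} = e^{-\mu} = e^{-\alpha_i^2/n},$$
i.e., the stated bound with $C = 1$.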
3 Next Time
We will talk about data structures + amortized analysis, heaps (binomial and Fibonacci [2]), and
splay trees [4].
For heaps, we store $n$ items with (comparable) keys. We can insert(x), decreaseKey(x, k), and
deleteMin(). Dijkstra’s algorithm uses heaps in its implementation, and its runtime is $n \cdot$ insert $+$
$m \cdot$ decreaseKey $+ n \cdot$ deleteMin if there are $n$ vertices and $m$ edges. With binary heaps, all operations
take $O(\log n)$ time, and thus Dijkstra runs in time $O((m + n) \log n)$. We will see that Fibonacci heaps
support insert and decreaseKey each in $O(1)$ amortized time, and deleteMin in $O(\log n)$ amortized
time, thus speeding up Dijkstra to $O(m + n \log n)$.
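As a concrete reference point, here is a short Dijkstra sketch using Python’s binary heap. Since heapq has no decreaseKey, this version pushes a fresh entry and skips stale ones, a standard workaround rather than the heap interface described above:

```python
import heapq

def dijkstra(adj, s):
    """Single-source shortest paths; adj[u] is a list of (v, weight) pairs."""
    dist = {s: 0}
    pq = [(0, s)]                    # binary heap ordered by tentative distance
    while pq:
        d, u = heapq.heappop(pq)     # plays the role of deleteMin
        if d > dist[u]:
            continue                 # stale entry; skip it
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))  # "decrease" by pushing a new entry
    return dist

# Example on a small hypothetical graph: prints {0: 0, 1: 2, 2: 3}
print(dijkstra({0: [(1, 2), (2, 5)], 1: [(2, 1)], 2: []}, 0))
```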
References
[1] Yossi Azar, Andrei Z. Broder, Anna R. Karlin, Eli Upfal. Balanced Allocations. SIAM J.
Comput., 29(1):180–200, 1999.
[2] Michael L. Fredman, Robert Endre Tarjan. Fibonacci heaps and their uses in improved network
optimization algorithms. J. ACM, 34(3):596–615, 1987.
[3] Michael Mitzenmacher, Andréa W. Richa, Ramesh Sitaraman. Chapter 9: The Power of Two
Random Choices: A Survey Of Techniques And Results. Handbook of Randomized Computing.
2001. Kluwer Academic Publishers.
[4] Daniel Dominic Sleator, Robert Endre Tarjan. Self-Adjusting Binary Search Trees. J. ACM,
32(3):652–686, 1985.
[5] Berthold Vöcking. How asymmetry helps load balancing. J. ACM, 50(4):568–589, 2003.