CS 224: Advanced Algorithms Fall 2014
Lecture 4 — September 16, 2014
Prof. Jelani Nelson Scribe: Albert Wu
1 Cuckoo Hashing
Let us say we have an array A of size m = 4n and two random hash functions g, h. We try to insert
x into A[g(x)], potentially kicking out the item already there and moving it to its other hash
location. Note that this might cascade.
If a sequence of item moves goes on for ≥ C · lg n steps, we give up, pick new g and h, and rebuild
the entire data structure.
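To make the insertion procedure concrete, here is a minimal Python sketch. The class name, the hash-function stand-ins, and the constant C are our own illustrative choices; the lecture does not fix an implementation:

```python
import random

class CuckooTable:
    """Minimal sketch of cuckoo hashing: one array, two hash functions."""

    def __init__(self, n):
        self.m = 4 * n                       # array of size m = 4n
        self.A = [None] * self.m
        self._pick_hash_functions()

    def _pick_hash_functions(self):
        # Stand-ins for the truly random hash functions g and h.
        a, b = random.randrange(1, 2**31), random.randrange(2**31)
        c, d = random.randrange(1, 2**31), random.randrange(2**31)
        self.g = lambda x: (a * hash(x) + b) % self.m
        self.h = lambda x: (c * hash(x) + d) % self.m

    def insert(self, x, C=6):
        pos = self.g(x)
        # Follow the chain of evictions, giving up after ~C * lg n steps.
        for _ in range(C * self.m.bit_length()):
            if self.A[pos] is None:
                self.A[pos] = x
                return
            self.A[pos], x = x, self.A[pos]  # kick out the current occupant
            gx, hx = self.g(x), self.h(x)
            pos = hx if pos == gx else gx    # move it to its other cell
        self._rebuild(x)                     # chain too long: pick new g, h

    def _rebuild(self, pending):
        items = [y for y in self.A if y is not None] + [pending]
        self.A = [None] * self.m
        self._pick_hash_functions()
        for y in items:
            self.insert(y)

    def contains(self, x):
        # Lookups only ever probe the two possible cells for x.
        return self.A[self.g(x)] == x or self.A[self.h(x)] == x
```

Note that lookups probe only A[g(x)] and A[h(x)], which is what gives cuckoo hashing its worst-case O(1) query time.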
Claim: $E[\text{time to insert } x] = O(1)$.
Proof: A cuckoo graph has m vertices (one per cell of A) and n edges (since for each x, we connect
g(x) to h(x)).
Consider the path we get from an insertion of x. We could get a simple path, a single cycle, or
a double cycle. Let us define the following random variables: $T$, the runtime; $P_k$, the indicator
random variable for the path having length at least $k$; $C_k$, the indicator random variable for a
single-cycle configuration of length $\ge k$; and $D$, the indicator random variable for a double-cycle
configuration. Note also that if the insertion process takes more than $N = C \log n$ steps, then one
of $D = 1$, $P_N = 1$, or $C_N = 1$ must have occurred. Since such a failure triggers a rebuild, i.e.,
reinserting all $n$ items at expected cost $E[T]$ each, we have
$$E[T] = \sum_k E[P_k] + \sum_k E[C_k] + P(\text{go on for more than } C \log n \text{ steps}) \cdot n \cdot E[T]$$
$$\le \sum_k E[P_k] + \sum_k E[C_k] + \left(P(D = 1) + E[P_N] + E[C_N]\right) \cdot n \cdot E[T] \quad (1)$$
Let us consider $E[P_k]$. Fix $x_2, x_3, \ldots, x_{k+1}$, and fix the assignment of the path to $k+1$ vertices.
The probability we see exactly this path is $\frac{1}{m} \cdot \frac{1}{m^{2k}} \cdot 2^k$. For the union bound, note that the number of
total possible vertex assignments is $m^{k+1}$, and the number of ways to choose the edges is
$n(n-1)\cdots(n-k+1) \le n^k$. Then, by the union bound, we know that
$$E[P_k] \le n^k \cdot m^{k+1} \cdot \frac{1}{m} \cdot \frac{1}{m^{2k}} \cdot 2^k = \left(\frac{2n}{m}\right)^k = \frac{1}{2^k},$$
using $m = 4n$.
Now, let us bound $E[C_k]$. For $C_k$, let us define three types of edges: the forward edges, the
backward edges, and the edges on the subsequent path created by the other hash function. One of
these types must contain at least $k/3$ edges, giving us a bound similar to the path analysis:
$E[C_k] \le \frac{1}{2^{k/3}}$.
For $D$, we want $P(D = 1)$. Let $t$ denote the number of distinct vertices (which will also be the
number of distinct edges, not including edges labeled with x) in the double-cycle graph. Let $D_t$ be
the indicator random variable for having a tour of this type with $t$ vertices. We know that
$$P(D = 1) = \sum_t P(D_t = 1) \quad (2)$$
Let us look for a particular configuration with $t$ vertices. The probability we see this configuration is
$\frac{1}{m^2} \cdot \frac{1}{(m^2)^t} \cdot 2^t$ (the extra $1/m^2$ comes from requiring x to hash to its two vertices). Union bounding
over all configurations: we have at most $m^t$ choices of vertices, at most $n^t$ choices of edges, and
at most $t^3$ choices for the start of the first cycle, the length of the first cycle, and the start of the
second cycle. Thus
$$P(D_t = 1) \le t^3 \cdot \frac{(2mn)^t}{m^{2t+2}},$$
which is at most $\frac{1}{n^2} \cdot \frac{t^3}{2^t}$ since $m = 4n$. Thus Eq. (2) converges and is $O(1/n^2)$.
Now, in Eq. (1), the probability of going on for more than $N$ steps is at most $P(D = 1) + E[P_N] +
E[C_N]$. By setting $C$ large enough, this is $O(1/n^2)$, dominated by the $P(D = 1)$ term. Rearranging
terms thus gives $E[T] = O(1)$, as desired.
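Spelling out the rearrangement step: plugging the bounds above into Eq. (1) gives
$$E[T] \le \sum_k \frac{1}{2^k} + \sum_k \frac{1}{2^{k/3}} + O(1/n^2) \cdot n \cdot E[T] = O(1) + O(1/n) \cdot E[T],$$
so $(1 - O(1/n)) \cdot E[T] = O(1)$, and hence $E[T] = O(1)$.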
2 Last Thing on Hashing
Let us talk about the “power of two choices.” Recall hashing with chaining: if we choose a perfectly
random hash function, then with high probability the length of the longest list is $O\left(\frac{\lg n}{\lg \lg n}\right)$.
[Azar, Broder, Karlin, Upfal, SICOMP ‘99] Pick 2 random hash functions g, h. When inserting x,
place it in the least loaded of A[g(x)] and A[h(x)]. Now, with high probability, the heaviest bin
has at most $\frac{\ln \ln n}{\ln 2} + \Theta(1)$ items.
What about the power of $d$ choices? We only improve by a constant factor, i.e., $\frac{\ln \ln n}{\ln d} + \Theta(1)$ items
in the heaviest bin.
[Vöcking JACM ‘03] Break up the bins into $d$ groups, each of size $n/d$. When inserting an item,
check one random location in each group. Put the item in the least loaded of these, breaking ties
by placing it in the leftmost group. Now, the maximum load is $\Theta\left(\frac{\ln \ln n}{d}\right)$.
To see more, see the survey by Mitzenmacher, Richa, and Sitaraman [3].
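As a quick sanity check of these bounds, here is a small simulation; it is our own illustration (the function name max_load and the parameter values are arbitrary), not something from the lecture:

```python
import random

def max_load(n, d):
    """Throw n balls into n bins; each ball goes to the least loaded of
    d independent uniformly random bins. Return the maximum bin load."""
    load = [0] * n
    for _ in range(n):
        choices = [random.randrange(n) for _ in range(d)]
        best = min(choices, key=lambda b: load[b])  # least loaded choice
        load[best] += 1
    return max(load)

n = 100_000
print("one choice :", max_load(n, 1))  # grows like lg n / lg lg n; ~8 here
print("two choices:", max_load(n, 2))  # ln ln n / ln 2 + Theta(1); typically 4
```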
Intuition for power of 2 choices:
Let $B_i$ be the number of bins with load $\ge i$. Let the height of x, $H(x)$, be such that x is the $H(x)$-th
item inserted into that bin.
Let $Q_x$ be the indicator random variable for the event that $H(x) \ge i + 1$. The probability that
$H(x) \ge i + 1$ is at most $\left(\frac{B_i}{n}\right)^2$, since both of x's choices must land in bins of load $\ge i$. So, if
everything is as expected, $B_{i+1} \le n \cdot \left(\frac{B_i}{n}\right)^2$, i.e., $\frac{B_{i+1}}{n} \le \left(\frac{B_i}{n}\right)^2$.
Let’s say that $\frac{B_{10}}{n} \le \frac{1}{2}$. Then $\frac{B_{10+j}}{n} \le \frac{1}{2^{2^j}}$. We are done once $\frac{B_{10+j}}{n} < \frac{1}{n}$, which happens when
$j \ge \lg \lg n$.
More rigorous details:
Below we outline how a more rigorous proof would go.
Define $\alpha_6 = \frac{n}{2e}$ and $\alpha_{i+1} = \frac{e \alpha_i^2}{n}$. If $E_i$ is the event that $B_i \le \alpha_i$, we will show that with high
probability all events $E_i$ occur.
First, $P(E_6) = 1$, because at most $n/6$ bins can have load $\ge 6$ and $\frac{n}{2e} > \frac{n}{6}$.
We would now like to show that $P(\wedge_i E_i)$ is large. By the union bound, this is at least
$$1 - \sum_i P(\neg E_i) \ge 1 - P(\neg E_6) - \sum_i \left(P(\neg E_{i+1} \mid E_i) + P(\neg E_i)\right) = 1 - \sum_i \left(P(\neg E_{i+1} \mid E_i) + P(\neg E_i)\right) \quad (3)$$
using $P(\neg E_{i+1}) \le P(\neg E_{i+1} \mid E_i) + P(\neg E_i)$ and $P(\neg E_6) = 0$.
It thus suffices to bound $P(\neg E_{i+1} \mid E_i)$ and $P(\neg E_i)$.
Lemma 1.
$$P(\neg E_{i+1} \mid E_i) \le \frac{P\left(\mathrm{Bin}\left(n, \left(\frac{\alpha_i}{n}\right)^2\right) > \alpha_{i+1}\right)}{P(E_i)}$$
where $\mathrm{Bin}(n, p)$ is a binomial random variable with parameters $n$ and $p$. That is, it is the sum of $n$
independent Bernoulli random variables, each with expectation $p$. Recall that a Bernoulli random
variable is supported in $\{0, 1\}$.
Proof. For an item $j$, let the height $H(j)$ be such that $j$ is the $H(j)$-th ball inserted into its bin.
Let $Y_j$ be an indicator random variable for the event $H(j) \ge i + 1$. Then certainly $B_{i+1} \le \sum_j Y_j$.
It thus suffices to upper bound $P(\sum_j Y_j > \alpha_{i+1} \mid E_i)$.
By Bayes’ rule,
$$P\Big(\sum_j Y_j > \alpha_{i+1} \;\Big|\; E_i\Big) = \frac{P\big(\big(\sum_j Y_j > \alpha_{i+1}\big) \wedge E_i\big)}{P(E_i)}$$
We then want to bound the numerator of the right hand side. Let $X_j$ be a Bernoulli random
variable with $E[X_j] = (\alpha_i/n)^2$. We will introduce the following “coupling” argument, which defines
two sets of random variables $\{X_j\}, \{\tilde{Y}_j\}$ on the same probability space. Imagine picking uniform
random variables $U_j, U_j'$ in $[0, 1)$. If both $U_j, U_j' \le \alpha_i/n$, then we set $X_j = 1$; else we set $X_j = 0$.
Now, imagine labeling the points $a_0 = 0/n, a_1 = 1/n, \ldots, a_n = n/n$ on the interval $[0, 1]$. As we
will describe, these points correspond to the $n$ bins, in reverse sorted order by load. When generated,
$U_j$ and $U_j'$ land in $[a_{t-1}, a_t)$ and $[a_{t'-1}, a_{t'})$, respectively, for some $t, t'$. We then imagine placing
a ball in the least loaded of bins $t, t'$ (recall $t = 1$ corresponds to the heaviest bin). If we are at
a point where $E_i$ no longer holds, then we set $\tilde{Y}_j = 0$. Otherwise we set $\tilde{Y}_j = 1$ iff $H(j) \ge i + 1$
according to this process. Now observe two things:
(a) $\tilde{Y}_j \le X_j$ always (with probability 1). Therefore
$$P\Big(\sum_j \tilde{Y}_j > \alpha_{i+1}\Big) \le P\Big(\sum_j X_j > \alpha_{i+1}\Big) \quad (4)$$
(b) At any point in the above defined probability space where both $E_i$ and $\sum_j Y_j > \alpha_{i+1}$ hold, it
also holds that $\sum_j \tilde{Y}_j > \alpha_{i+1}$. Thus
$$P\Big(\big(\sum_j Y_j > \alpha_{i+1}\big) \wedge E_i\Big) \le P\Big(\sum_j \tilde{Y}_j > \alpha_{i+1}\Big) \quad (5)$$
Combining Eqs. (4) and (5) concludes the proof.
We know $P(E_6) = 1$ (and equivalently $P(\neg E_6) = 0$). By an inductive argument, once we upper
bound $P(\neg E_i)$, we can invoke Lemma 1: upper bounding $P(\mathrm{Bin}(n, (\alpha_i/n)^2) > \alpha_{i+1})$ yields a
bound on $P(\neg E_{i+1} \mid E_i)$ (since the inductive hypothesis gives an upper bound on $P(\neg E_i)$, and
thus a lower bound on $P(E_i) = 1 - P(\neg E_i)$). One can bound
$P(\mathrm{Bin}(n, (\alpha_i/n)^2) > \alpha_{i+1}) \le e^{-C \alpha_i^2 / n}$ via the Chernoff bound (calculation left as an exercise to
the reader!). For the reader interested in seeing all the calculations worked out, see the notes at
http://www.cs.berkeley.edu/~sinclair/cs271/n15.pdf.
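For intuition, here is one way the Chernoff calculation can go (a sketch; the exercise may intend a different route). Let $\mu = E[\mathrm{Bin}(n, (\alpha_i/n)^2)] = \alpha_i^2/n$, so that $\alpha_{i+1} = e \alpha_i^2/n = e\mu$. The multiplicative Chernoff bound $P(X \ge a\mu) \le (e^{a-1}/a^a)^\mu$ with $a = e$ gives
$$P\left(\mathrm{Bin}\left(n, (\alpha_i/n)^2\right) > \alpha_{i+1}\right) \le \left(\frac{e^{e-1}}{e^e}\right)^{\mu} = e^{-\mu} = e^{-\alpha_i^2/n},$$
i.e., the stated bound with $C = 1$.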
3 Next Time
We will talk about data structures + amortized analysis, heaps (binomial and Fibonacci [2]), and
splay trees [4].
For heaps, we store $n$ items with (comparable) keys. We can insert(x), decreaseKey(x, k), and
deleteMin(). Dijkstra’s algorithm uses heaps in its implementation, and its runtime is $n \cdot$ insert $+$
$m \cdot$ decreaseKey $+ n \cdot$ deleteMin if there are $n$ vertices and $m$ edges. With binary heaps, all operations
take $O(\log n)$ time, and thus Dijkstra runs in time $O((m + n) \log n)$. We will see that Fibonacci heaps
support insert and decreaseKey each in $O(1)$ amortized time, and deleteMin in $O(\log n)$ amortized
time, thus speeding up Dijkstra to $O(m + n \log n)$.
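As a concrete reference point, here is a short Dijkstra sketch using Python’s binary heap. Since heapq has no decreaseKey, this version pushes a fresh entry and skips stale ones, a standard workaround rather than the heap interface described above:

```python
import heapq

def dijkstra(adj, s):
    """Single-source shortest paths; adj[u] is a list of (v, weight) pairs."""
    dist = {s: 0}
    pq = [(0, s)]                    # binary heap ordered by tentative distance
    while pq:
        d, u = heapq.heappop(pq)     # plays the role of deleteMin
        if d > dist[u]:
            continue                 # stale entry; skip it
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))  # "decrease" by pushing a new entry
    return dist

# Example on a small hypothetical graph: prints {0: 0, 1: 2, 2: 3}
print(dijkstra({0: [(1, 2), (2, 5)], 1: [(2, 1)], 2: []}, 0))
```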
References
[1] Yossi Azar, Andrei Z. Broder, Anna R. Karlin, Eli Upfal. Balanced Allocations. SIAM J.
Comput., 29(1):180–200, 1999.
[2] Michael L. Fredman, Robert Endre Tarjan. Fibonacci heaps and their uses in improved network
optimization algorithms. J. ACM, 34(3):596–615, 1987.
[3] Michael Mitzenmacher, Andréa W. Richa, Ramesh Sitaraman. Chapter 9: The Power of Two
Random Choices: A Survey Of Techniques And Results. Handbook of Randomized Computing.
2001. Kluwer Academic Publishers.
[4] Daniel Dominic Sleator, Robert Endre Tarjan. Self-Adjusting Binary Search Trees. J. ACM,
32(3):652–686, 1985.
[5] Berthold Vöcking. How asymmetry helps load balancing. J. ACM, 50(4):568–589, 2003.