
Algorithms

Péter Gács, with Dóra Erdős


Freely using the textbooks by
Kleinberg-Tardos and Cormen-Leiserson-Rivest-Stein,
and the slides of Kevin Wayne

Computer Science Department


Boston University

Spring 2020
It is best not to print these slides, but rather to download them
frequently, since they will probably evolve during the semester.
Class structure: please, refer to the course homepage.
Stable matching

• Matchings in a graph. Perfect matchings.


• Nodes: persons. Each orders all possible partners by preference.
• Women, say A, B, C. Men, say X, Y, Z.
• Instability in a matching: a non-matched pair that would be an
improvement for both participants.
Preference table (in each cell, the first number is the man's rank of the woman, the second the woman's rank of the man):

      X      Y      Z
A    1, 2   2, 1   1, 3
B    2, 1   1, 2   2, 3
C    3, 1   3, 2   3, 3

So X ranks the women A, B, C; Y ranks B, A, C; Z ranks A, B, C; while A ranks the men Y, X, Z, and B and C both rank them X, Y, Z.

The original figure marks three matchings on this table:
• one that is unstable: Y proposes to B;
• one that is stable, even if Z and C are unhappy . . . ;
• another stable matching.
(For this instance the two stable matchings are {A–X, B–Y, C–Z} and {A–Y, B–X, C–Z}.)
Question 1 Is there always a stable matching? The answer is not
obvious. If for example everybody can be paired with
everybody (no sex distinction) the answer is no. (This is the
stable roommate problem.)
Question 2 Provided there is a stable matching, how to find it
(efficiently, by an algorithm)?

Example (Stable roommate problem)


Rankings of each of A, B, C, D:
A : (B, C, D), B : (C, A, D), C : (A, B, D), D : (A, B, C).
• D is ranked last by everyone else.
• Every non-D is ranked first by some non-D.
So in every matching, the partner of D will be part of an instability
with the one by whom he/she is ranked first.
Brute force

• We can always find if there is a stable matching and find one,


by going through all possible perfect matchings between n men
and n women.
• What is the number of those matchings?
An algorithm

Question 1: we give an algorithm (invented by Gale and Shapley).


This answers also Question 2.
The algorithm simply says: “men propose, women dispose”. That
is, starting from an empty matching, repeatedly an unmatched man
proposes to the woman he prefers most (among the ones not
matched to somebody they prefer better).

Theorem Terminates in O(n²) steps.

Indeed, each step benefits some woman (by giving her a match or
improving her match), without harming any other woman.

Theorem When the algorithm stops, we have a stable matching.

Indeed, no man has any reason to switch, since no woman he


prefers more is available to him.
Discussion

Questions
• Whom does the end result of this algorithm benefit: men or
women?
• How unique is the matching M* we found?

• “Politically correct” version: men-women replaced with


hospitals-students. This is also a major real application.
• Example of an approach to problem solution: introduce some
new order to even where there is not any.
Of course, there are many possible such ideas (tricks,
heuristics). Generating them is only one part of the problem
solution, frequently the smaller one: checking whether the idea
works is just as important! (Example, RSA: Adleman’s role was
to break the cryptosystems proposed by Rivest and Shamir—not
all, only the first 41.)
Definition A valid partner of a person is one with whom he/she is matched in some stable matching.

Theorem In M*,
(a) each man gets the best valid partner,
(b) each woman gets the worst valid partner.

The proof refers to "hospitals-students" in place of "men-women".

Hospital-optimality of M*

• Suppose hospital h is the first one rejected by a student s with whom it is paired in some stable matching M.
• The rejection happens when h′ proposes, whom s prefers to h.
• h′ prefers s to its pair s′ in M, else it would have been rejected by s′ before proposing to s; but it had no rejection yet.
• Then h′–s is unstable in M: contradiction.
Student-pessimality of M*

• Suppose that h is not the worst valid partner of its pair s in M*: s has a less preferred valid partner, h′, matched with it in a stable matching M.
• Let s′ be paired with h in M.
• By hospital-optimality, h prefers s to s′.
• Then h–s is unstable in M: contradiction.
Questions
• Example (when not doing “men propose, women dispose”) of
an infinite series of switches through a cycle of unstable
matchings? Each switch joins an unstable pair and the partners
they leave behind.
• Does every possible algorithm need Ω(n²) steps?
• This is a lower bound question. Such questions can be very hard;
they need the consideration of all possible algorithms.
• Assume that inspecting each preference is an extra step. How
many preferences do we have to inspect in the worst case?
Think of this as a game between the algorithm asking about the
preferences and an adversary supplying the data.
Solved exercises

1 Suppose that among the n women there are k rich ones, and
also among the n men there are k rich ones. The preference
lists are such that everybody prefers rich persons to the others.
Show that in each stable matching, rich men are married with
rich women.
2 Not all pairs are acquainted, and only acquainted pairs are
allowed to match. There is a natural notion of stable partial
matching for this case. Show that there is always such a
matching.
Five example problems

Instead of writing, say, O(n log n), I will just write n log n. But
Big-Oh is always understood here, see below.
Interval scheduling A simple greedy algorithm gives n log n.
Weighted interval scheduling Dynamic programming, n log n
Bipartite matching nk
Independent set NP-complete. No known algorithm is much
better than brute force (exponential).
Competitive facility location (say Shell and Mobil)
PSPACE-complete
Note that all these are variations on the theme of independent set.
Asymptotic analysis

Let f(n), g(n) be some positive functions. The following asymptotic notation will be used.

• f(n) = O(g(n)), or f(n) < g(n), if there is a constant c > 0 with f(n) ≤ c · g(n) for all n.
• f(n) = o(g(n)) or, equivalently, f(n) ≪ g(n), if lim_{n→∞} f(n)/g(n) = 0.
• f(n) = Ω(g(n)) if there is a constant c > 0 with f(n) ≥ c · g(n) for all n. This is the same as g(n) = O(f(n)), but we generally expect a simple formula on the right-hand side.
• f(n) = Θ(g(n)), or f(n) ≍ g(n), if both f(n) = O(g(n)) and f(n) = Ω(g(n)).
The most important function classes: log, logpower, linear, power, exponential.
Some simplification rules.
• Addition: take the maximum. Do this always to simplify expressions. Warning: do it only if the number of terms is constant!
• An expression f(n)^g(n) is generally worth rewriting as 2^(g(n) log f(n)). For example, n^(log n) = 2^((log n)·(log n)) = 2^(log² n).
• But sometimes we make the reverse transformation:

3^(log n) = 2^((log n)·(log 3)) = (2^(log n))^(log 3) = n^(log 3).

The last form is easiest to understand, showing n to a constant power log 3.
Examples

n / log log n + log² n ∼ n / log log n.

Indeed, log log n ≪ log n ≪ n^(1/2), hence

n / log log n ≫ n^(1/2) ≫ log² n.
Order the following functions by growth rate:

n² − 3 log log n ∼ n²,
log n / n,
log log n,
n log² n,
3 + 1/n ∼ 1,
(5n)/2^n,
√(n / log log n),
(1.2)^(n−1) + √n + log n ∼ (1.2)^n.

Solution:

(5n)/2^n ≪ log n/n ≪ 1 ≪ log log n ≪ √(n / log log n) ≪ n log² n ≪ n² ≪ (1.2)^n.


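As a sanity check, one can evaluate these functions at a large concrete n and sort by value. A small sketch (Python, not part of the original notes; n = 1000 is chosen so that the asymptotic order is already visible):

import math

def lg(x):            # base-2 logarithm
    return math.log(x, 2)

n = 1000
fs = [
    ("(5n)/2^n",                      5 * n / 2**n),
    ("log n / n",                     lg(n) / n),
    ("3 + 1/n",                       3 + 1 / n),
    ("log log n",                     lg(lg(n))),
    ("sqrt(n / log log n)",           math.sqrt(n / lg(lg(n)))),
    ("n log^2 n",                     n * lg(n)**2),
    ("n^2 - 3 log log n",             n**2 - 3 * lg(lg(n))),
    ("(1.2)^(n-1) + sqrt(n) + log n", 1.2**(n - 1) + math.sqrt(n) + lg(n)),
]
for name, value in sorted(fs, key=lambda p: p[1]):
    print(f"{name:34s} {value:.3g}")      # prints them in the order claimed above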
Sums: the art of simplification

Arithmetic series its rate of growth is n times that of its largest term: 1 + 2 + · · · + n = n(n + 1)/2 = Θ(n²).
Geometric series its rate of growth is equal to the rate of growth of its largest term.

Example

log n! = log 2 + log 3 + · · · + log n = Θ(n log n).

Indeed, upper bound: log n! < n log n.


Lower bound:

log n! > log(n/2) + log(n/2 + 1) + · · · + log n > (n/2) log(n/2)


= (n/2)(log n − 1) = (1/2)n log n − n/2.
Example Prove the following, via rough estimates:

1 + 2³ + 3³ + · · · + n³ = Θ(n⁴),
1/3 + 2/3² + 3/3³ + 4/3⁴ + · · · < ∞.
Example

1 + 1/2 + 1/3 + · · · + 1/n = Θ(log n).

Indeed, for n = 2^k − 1, upper bound:

1 + (1/2 + 1/2) + (1/4 + 1/4 + 1/4 + 1/4) + (1/8 + · · · ) + · · ·
= 1 + 1 + · · · + 1 (k times).

Lower bound:

1/2 + (1/4 + 1/4) + (1/8 + 1/8 + 1/8 + 1/8) + (1/16 + · · · ) + · · ·
= 1/2 + 1/2 + · · · + 1/2 (k times).
Review of data structures
RAM

RAM (random access machine): a simple model that corresponds


reasonably to the real, complex machines we use.
Memory an array of n (a large, fixed number) memory locations M[i], each containing a word of some width w, in bits.
Normally, w ≈ log n, so an address of any memory location fits into a word.
Primitive operations A machine-language program is a sequence
of these:
• reading/writing a memory location (given by an address)
• simple arithmetic/logic on words
• branching based on a simple condition, say M [0] > 0.
Computing time Number of primitive operations taken during the
execution of the program.
Higher-level constructs (like functions, recursion) must be
translated by a compiler into a machine-language program.
What is a data structure

Abstract data type Does not say how it represents the data but
does say what operations it is expected to carry out efficiently.
Example: stack (last-in-first-out), queue (first-in-first-out),
priority queue, dictionary.
Data structure A way of representing data that is one level up
from just words in memory.
• Most common: array. Can be fixed-length or
variable-length. Has many-many uses.
Implementation of variable length: copy into a new array
of double size (or half size) if needed.
• Other data structures may consist of some records that
contain fields, and some of these fields are links to other
records. All these fit together in a particular way
described by the structure.
Examples: linked list, binary tree, heap, hash table.
Linked lists

• A (say, doubly) linked list has the advantage that an item can be
inserted/deleted at a given position in it, at unit cost. It
implements stacks and queues easily.
• In practice, implementation by an array (if needed,
variable-length) is often easier and faster.
• If only one end changes (as for a stack), a single array suffices.
• If insertion/deletion at just one place is needed, use two arrays
(the second one in reverse order).
List:  a b c d e f g h

A:  a b c          (the front, in order; A.length = 3)
B:  h g f e d      (the back, in reverse order; B.length = 5)
Inverting an array

Suppose that an array A of length n contains all the numbers


1, . . . , n in some order. This is also called a permutation. Example

A[1] = 3, A[2] = 2, A[3] = 4, A[4] = 1.

Example of A: Hospital h assigns students 1, 2, 3, 4 the preferences


3, 2, 4, 1.
We may frequently want another array B, its inverse (also a
permutation), that is B[i] = j iff A[ j] = i. So we have

B[1] = 4, B[2] = 2, B[3] = 1, B[4] = 3.

Example of B: Hospital h lists the students in its order of


preferences (starting with the most preferred) as 4, 2, 1, 3.
The array A can be inverted into an array B by the command:
foreach i do
B[A[i]] = i
index   1  2  3  4  5  6  7  8  9 10 11 12
A       4  3  7  5  1  8 11 12  9  2  6 10
B       .  .  2  1  4  .  3  .  .  .  .  .    (after the steps i = 1, . . . , 4)
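A direct transcription of this loop into Python (a sketch, assuming the 1-based convention of the slides with A[0] unused):

def invert(A):
    """Invert a permutation A[1..n] stored in a 1-based list (A[0] unused)."""
    n = len(A) - 1
    B = [0] * (n + 1)
    for i in range(1, n + 1):
        B[A[i]] = i          # B[A[i]] = i, exactly as in the pseudocode
    return B

A = [0, 4, 3, 7, 5, 1, 8, 11, 12, 9, 2, 6, 10]    # the 12-element example above
print(invert(A))   # [0, 5, 10, 2, 1, 4, 11, 3, 6, 9, 12, 7, 8]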
Implementing Gale-Shapley

Let us implement the algorithm just using arrays.


Input data in two 2-dimensional arrays (indexed from 1):
HrankS[h, s] = the rank given by hospital h to student s.
SrankH[s, h] = the rank given by student s to hospital h. Set
SrankH[s, 0] ← n + 1.
Preprocess for each hospital, invert the array HrankS[h, s]:
HbestS[h, k] shows the kth best student for hospital h.
Matching in two one-dimensional arrays:
Smatch[s] = the hospital matched to student s, 0 if unmatched.
HmatchR[h] = the rank of the student whom hospital h tries to
match. Initially, HmatchR[h] ← 1.
Unmatched hospitals in a stack, represented by an array U[i].
Initially U[i] = i. Variable L keeps the index of its last filled
element U[L]. Initially, L = n.
GS(HrankS, SrankH): // the Gale-Shapley algorithm
  foreach h, s do // invert student ranks
    HbestS[h, HrankS[h, s]] ← s
  while L ≠ 0 do // work on last unmatched hospital
    h ← U[L]
    s ← HbestS[h, HmatchR[h]]
    h′ ← Smatch[s]
    if SrankH[s, h] > SrankH[s, h′] then // s prefers its current match: h tries next best
      HmatchR[h] ← HmatchR[h] + 1
    else
      Smatch[s] ← h
      if h′ = 0 then // one fewer unmatched hospital
        L ← L − 1
      else // h′ gets unmatched, will try next best
        U[L] ← h′
        HmatchR[h′] ← HmatchR[h′] + 1
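A runnable Python sketch of this array-based implementation (not the notes' code; it assumes HrankS and SrankH are (n+1) × (n+1) nested lists with row and column 0 unused, following the 1-based convention above):

def gale_shapley(HrankS, SrankH):
    """Hospital-proposing Gale-Shapley. HrankS[h][s] = rank (1 = best) that
    hospital h gives student s; SrankH[s][h] = rank student s gives hospital h.
    Returns Smatch, where Smatch[s] is the hospital matched to student s."""
    n = len(HrankS) - 1
    # Preprocess: HbestS[h][k] = the k-th best student for hospital h.
    HbestS = [[0] * (n + 1) for _ in range(n + 1)]
    for h in range(1, n + 1):
        for s in range(1, n + 1):
            HbestS[h][HrankS[h][s]] = s
    for s in range(1, n + 1):
        SrankH[s][0] = n + 1        # sentinel: any hospital beats "unmatched"
    Smatch = [0] * (n + 1)          # 0 means unmatched
    HmatchR = [1] * (n + 1)         # rank of the student h tries next
    U = list(range(1, n + 1))       # stack of unmatched hospitals
    while U:
        h = U[-1]                   # work on the last unmatched hospital
        s = HbestS[h][HmatchR[h]]
        h2 = Smatch[s]              # s's current match (h2 plays the role of h′)
        if SrankH[s][h] > SrankH[s][h2]:   # s prefers its current match: reject h
            HmatchR[h] += 1
        else:
            Smatch[s] = h
            U.pop()
            if h2 != 0:             # h2 got unmatched, will try its next best
                HmatchR[h2] += 1
                U.append(h2)
    return Smatch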
Comparing data structures

We may ask the cost (say, in time) to implement various operations


of an abstract data type. Some examples:
Stack operations: push, pop, peek.
Linked list implements each in constant, that is O(1) time.
Variable-length array implements these operations in constant
amortized time. This means that the cost of n operations
(starting from an empty stack) is O(n), though some push
operations may take more than constant time.
Implementation: when the array reaches size 2^k then we double its size, at cost O(2^k). This cost can be charged to the 2^(k−1) push operations made since the previous doubling.
Priority queue of size n. Each record has a key, and the “first”
record is the one with the smallest key. Operations: Insert,
find-first, delete, delete-first, change-key.
Sorted linked list find-first, delete-first in constant time. Others
in time O(n).
Binary heap (recall from the data structure course!): all these
operations in O(log n) time.
We will use this as a component of some algorithms.
Dictionary of size n. Each item has an identifying key.
Operations: insert, find (by the key), delete.
Hash table On average, each operation costs O(1). In the
worst case, it may cost n.
Most software (including Python interpreters, Java
compilers) implement dictionaries via hash tables. In
Python, they are convenient even when containing just 3-4
items.
Balanced tree Each operation costs O(log n).
Large databases use balanced trees.
Graph representation

• A directed graph G = (V, E) is given by a set V of vertices


(nodes) and a set E of edges.
• Edge e = (u, v), u is the start node and v the end node. If u = v
then e is a loop edge.
• Undirected graph: each edge (u, v) can also be written as (v, u),
so it can be represented as the set {u, v}.
• We will always say whether the graph we are talking about is
directed or undirected.
• Parallel edges: when there is more than one edge of the same
form (u, v).
Unless we say so, our graphs will have no parallel edges, and
even no loop edges.
• Typically, we will use the notation n = |V |, m = |E|, that is our
graph has n vertices and m edges. But there will be many other
cases!
• In this graph, the number of vertices is n = 23. By the way, also
the number of edges is m = n = 23, since here each vertex has
exactly one outgoing edge: the outdegree is 1. The indegree
varies from 0 to 3.
• In undirected graphs, we just talk about the degree of a vertex.
Representing a graph in the computer

• For simplicity, assume no parallel edges: for a pair u, v, at most


one edge (u, v).
• With each node and edge, there can be some extra information
to be stored, (call it a label).
Example: vertices represent cities, edges represent roads
between them; the roads have length.
Several possibilities for representing a graph: we will choose one
based on:
• What questions to answer?
• How important is to save on storage?
Some questions:
1 Given vertices u, v, find out if there is an edge (u, v).
2 Given vertex u, find all edges leaving u, or all edges entering u.
Let V = {v1, . . . , vn}. A natural data structure for answering question 1 is an adjacency matrix: an array A[i, j] where

A[i, j] = 1 if (vi, vj) ∈ E, and 0 otherwise.

Or, if edge (u, v) has a label l(u, v) representing its length, we could denote the lack of an edge by the symbol ∞, and set

A[i, j] = l(vi, vj) if (vi, vj) ∈ E, and ∞ otherwise.
Example in which the vertices (and the rows and columns of the
matrix) are numbered from 0, and we omit the entries with ∞:
      0   1   2   3   4   5   6   7
 0    ·   5   ·   ·   9   ·   ·   8
 1    ·   ·  12  15   ·   ·   ·   4
 2    ·   ·   ·   3   ·   ·  11   ·
 3    ·   ·   ·   ·   ·   ·   9   ·
 4    ·   ·   ·   ·   ·   4  20   5
 5    ·   ·   1   ·   ·   ·  13   ·
 6    ·   ·   ·   ·   ·   ·   ·   ·
 7    ·   ·   7   ·   ·   6   ·   ·

(· marks an omitted ∞ entry; the figure also shows the graph drawing itself, with these edge lengths.)

A directed graph can have, even without loops, n(n − 1) ≈ n² edges. We call a graph dense if the number of edges is m = Ω(n²), and sparse if it is ≪ n².
It wastes storage to represent a sparse graph by an adjacency
matrix. It may be better to represent it by an array of adjacency
lists. This is an array B where for each i, B[i] points to a (linked)
list of all the edges leaving vi . List i, below represents an edge
(vi , v j ) by the pair ( j : l(vi , v j )).

0: (1:5), (4:9), (7:8)
1: (2:12), (3:15), (7:4)
2: (3:3), (6:11)
3: (6:9)
4: (5:2), (6:20), (7:5)
5: (2:1), (6:13)
6:
7: (2:7), (5:6)

(The figure shows the graph drawing next to these lists.)
Many algorithms actually rely on the adjacency lists, so this is not
just an economical data structure but also a useful one. Its storage
requirement is O(n + m) which can be much less than the O(n²)
needed by the adjacency matrix.
• Most practical examples of graphs are sparse. Think of a Netflix
database, where each user rates each film she has seen. Say, 50
million users, up to 100 films seen by each.
• A graph that might be rather dense: consider a large collection
of documents, say all articles published by the science publisher
Elsevier (say, 3 million). Define a graph whose vertices are all
English words in some dictionary, say 20 thousand words. Two
words are connected by an edge if there is some document in
which they appear together. A label could be added, showing
the number of documents in which they appear together.
• Answering the question: “is (u, v) an edge?”:
In the adjacency list representation, worst case: up to n steps, if
vertex u has a high degree. Indeed, we may have to scan the
whole list.
• If the adjacency matrix is not economical but we still want fast edge queries, we may use a dictionary (implemented as a hash table). Two possibilities:
• One dictionary for the whole graph, storing all edges (u, v).
• An own dictionary for each vertex u, for all the edges leaving u.
This makes it also easy to list all edges leaving u.
Sometimes we will not want to choose between data structures, but
use several! For example, lists for incoming edges, lists for
outgoing edges.
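A minimal sketch (not from the notes) of the "own dictionary for each vertex" idea in Python, using a few edges of the example graph above:

# adj[u] is a dictionary mapping each out-neighbor v of u to the label l(u, v).
adj = {
    0: {1: 5, 4: 9, 7: 8},
    1: {2: 12, 3: 15, 7: 4},
    2: {3: 3, 6: 11},
}

def has_edge(adj, u, v):
    """Answer "is (u, v) an edge?" in expected O(1) time."""
    return v in adj.get(u, {})

def out_edges(adj, u):
    """List all edges leaving u, with their labels."""
    return list(adj.get(u, {}).items())

print(has_edge(adj, 0, 4))   # True
print(out_edges(adj, 1))     # [(2, 12), (3, 15), (7, 4)]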
Breadth-first search

In a directed (or undirected) graph G = (V, E), an algorithm relying


on adjacency lists: breadth-first search. Passing through all vertices
reachable from some source vertex s, in a certain order:
1 Find vertices reachable from s in one step.
2 Find vertices reachable from s in two steps.
3 And so on.
The algorithm proceeds in stages. Let L_k (the kth layer) be the set of vertices reachable in exactly k steps. In stage k, we have found the set I_k = ⋃_{i<k} L_i (the kth interior) of vertices reachable in < k steps, and are just finding L_k.

Visited vertices: I_k ∪ L_k = ⋃_{i=0}^{k} L_i.
Details of stage k: For each vertex in L_{k−1} ⊂ I_k, add its unvisited neighbors (using its adjacency list) to L_k.
• We can run the algorithm without dividing into stages, by creating a single queue

Q = L′_{k−1} ∪ L′_k

where L′_{k−1} is the part of L_{k−1} yet to be processed, and L′_k is the part of L_k already added, with L′_k in the back.
• The single step to run repeatedly:

Take off a vertex u from the front of queue Q, visit all its
non-visited neighbors, add them to the end of Q.
• To every new vertex v visited, add the information on the node
u from whose edge list it was taken: we will call it the parent
u = v.parent of v. This way we create a tree called the shortest
path tree: following the parents repeatedly to the root s of the
tree gives a shortest path.
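A Python sketch of this queue-based breadth-first search (not the notes' code); adj[u] is assumed to be the adjacency list of u, and every vertex appears as a key:

from collections import deque

def bfs_tree(adj, s):
    """Breadth-first search from s. Returns the parent map of the shortest
    path tree (parent[s] is None)."""
    parent = {s: None}
    Q = deque([s])
    while Q:
        u = Q.popleft()                 # take u off the front of the queue
        for v in adj[u]:
            if v not in parent:         # v not visited yet
                parent[v] = u           # u becomes the parent of v
                Q.append(v)             # add v to the back of the queue
    return parent

def shortest_path(parent, t):
    """Follow parents from t back to the source; returns the path s..t."""
    path = []
    while t is not None:
        path.append(t)
        t = parent[t]
    return path[::-1]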
BFS example

• The graph is a 4 × 4 grid with rows a–d and columns 1–4; the edge list of each node is ordered by (east, north, west, south).
• The record (u : v1, v2, v3) means removing u from the front of the queue and adding its newly visited neighbors to the queue.
• Starting point b2. (In the figure, visited nodes and the queue Q are marked, blue arrows point to parents, and the state is shown after processing a2.)

(b2 : b3, a2, b1, c2), (b3 : b4, a3, c3), (a2 : a1),
(b1 : c1), (c2 : d2), (b4 : a4, c4),
(a3 :), (c3 : d3), (a1 :), (c1 : d1), (d2 :),
(a4 :), (c4 : d4), (d3 :), (d1 :), (d4 :)
Connected components

• Let G = (V, E) be an undirected graph, and s ∈ V a vertex. The


set of vertices connected to s by a path (possibly by a “path” of
length 0, so including s) is called the connected component of s.
• Clearly, if u is connected to v then v is also connected to u.
Their components coincide, and if not then they are disjoint. So
the set of vertices is a disjoint union V = C1 ∪ · · · ∪ Ck of
components: it is partitioned into the components.
• G is connected if it consists of a single component.
• We saw an algorithm that finds the connected component of a
vertex s, namely all vertices reachable from s: breadth-first
search.
An undirected graph with 3 components.
Directed graphs

• The case of directed graphs is more complex: a directed path


(v0 , v1 , . . . , vk ) is such that for each i, (vi , vi+1 ) is an edge.
Unless saying otherwise, by a “path” in a directed graph we
mean a directed path.
We say that v is reachable from u if it is reachable on a directed path. We denote it as

u →* v.

• This relation is transitive: (u →* v) ∧ (v →* w) ⇒ (u →* w).
• In a directed graph, if u →* v and v →* u then we say that u and v are strongly connected. The strongly connected component of u is the set of all vertices strongly connected to it (including itself, connected by a "path" of length 0).
• Again, the set of vertices is a disjoint union of such components.
There are only 2 strong components of size > 1. The red edges run
within the components. How many components are here?

Merging each component into a single vertex (shown black here),


this new component graph is acyclic: has no directed cycle.
Acyclic graphs

• A graph is acyclic if and only if u →* v and v →* u implies u = v. (Prove it!) For an acyclic graph G = (V, E) we will also use the notation u ≤ v for u →* v.
• A relation ≤ that is transitive with (u ≤ v) ∧ (v ≤ u) ⇒ (u = v)
is called a partial order. (Partial since it may happen that
neither u ≤ v nor v ≤ u holds.)
Topological sort

• If the vertices of a graph G = (V, E) are listed in an order


v1 , . . . , vn with the property that if (vi , v j ) is an edge then i < j,
then this list is called a topological sort.
• Clearly not possible if G has any directed cycle. But always
exists if G is acyclic: we will even see an algorithm for it.
• The topological sort extends the partial order defined by G into
a complete order.
• If the order is already complete then G contains a directed path
going through all vertices. (Prove it!)
(Figure: a small acyclic graph on the vertices a, b, c, d.)

This acyclic graph has two possible topological sorts: a, b, c, d and a, c, b, d. Adding the edge (b, c) turns it into a complete order (unique topological sort).
Depth-first search

Note For this material the textbook by CLRS (Cormen,


Leiserson, Rivest, Stein) gives more useful detail than
Kleinberg-Tardos.

• Another way to visit all vertices of a (directed or undirected)


graph by following edges: from every newly visited vertex,
follow an edge immediately to other unvisited vertices.
Backtrack if there is none.
• Very simple to define using recursion.
• The path tree it defines is very different from shortest path
trees: typically, it is narrow with long paths instead of wide
with short paths.
• Its nice properties come handy sometimes (like here for
topological sort).
• Depth-first search on an undirected graph: a 4 × 4 grid with rows a–d and columns 1–4. The adjacency lists are ordered (east, north, west, south). Starting from vertex c3.
• Produces a long skinny path tree (parent directions shown in the figure).

• Breadth-first search on the same undirected graph, with the adjacency lists ordered the same way, starting from vertex c3.
• Produces a short bushy path tree (parent directions shown in the figure).
• Use the adjacency list representation, take the vertices from
some listIn containing all elements of V in some order.
• The record u representing a vertex has a Boolean field u.visited.
• After processing each vertex, add it to the end of listOut.

DFS-visit(G, u, listOut): // Visit all vertices reachable from u


u.visited ← True
foreach out-neighbor v of u do
if not v.visited then
v.parent ← u
DFS-visit(G, v, listOut)
add u to listOut

DFS(G):
  listOut ← []
  foreach u in V do u.visited ← False
  foreach u in V do
    if not u.visited then
      DFS-visit(G, u, listOut)
  return listOut
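A Python sketch of DFS-visit and DFS (not the notes' code); adj[u] is the adjacency list of u, and every vertex appears as a key. By the corollary below, reversing the returned list topologically sorts an acyclic graph.

def dfs(adj):
    """Depth-first search over all vertices. Returns listOut: the vertices in
    order of finishing."""
    visited = set()
    list_out = []

    def visit(u):                    # DFS-visit(G, u, listOut)
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        list_out.append(u)           # u is finished: add it to the output

    for u in adj:
        if u not in visited:
            visit(u)
    return list_out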
(Figure: a directed graph whose vertices are labeled with their discover/finish times: 1/16, 2/15, 3/14, 4/13, 5/11, 6/10, 7/9, 8/8, 12/12, 17/20, 18/18, 19/19, 21/23, 22/22, 24/24, 25/25, 26/34, 27/33, 28/32, 29/31, 30/30, 35/35, 36/36.)

Example discover/finish times of depth-first search. Output in order of finish times:
8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 22, 23, 24, 25, 30, 31, 32, 33, 34, 35, 36.
Theorem If u →* v, but not v →* u, then depth-first search outputs v before u.

Proof.
• Suppose that DFS-visit(G, u, listOut) was called before DFS-visit(G, v, listOut). Then DFS-visit(G, v, listOut) was called recursively from within the call DFS-visit(G, u, listOut), therefore v will be output before u, since u is output only at the end of DFS-visit(G, u, listOut).
• Suppose that DFS-visit(G, u, listOut) was called after DFS-visit(G, v, listOut). Since not v →* u, the call to DFS-visit(G, v, listOut) ends before the call to DFS-visit(G, u, listOut), and outputs v earlier.

Corollary If G is acyclic then depth-first search outputs the


vertices in reverse topological order.
Finding the strong components

• For a directed graph G we can define the transpose graph G^T = (V, E^T) by reversing the edges of G: same vertices, but (u, v) ∈ E^T iff (v, u) ∈ E.
• The adjacency list of G^T represents G by listing for each vertex u the incoming edges of G (these are the outgoing edges of G^T).
• The graphs G, G^T are different, but have the same strong components.
• We will see that taking the output list listOut of DFS(G) in reverse order and running a depth-first search of G^T on it, we obtain the strong components of G.
• Each strong component will be stored in its own list strongComps[i]: its elements are strongComps[i][0], strongComps[i][1], . . ..
findStrongComps(G):
  listOut ← DFS(G)
  foreach u in V do u.visited ← False
  strongComps ← []; i ← 0
  for j = n downto 1 do
    u ← listOut[j]
    if not u.visited then
      strongComps[i] ← []
      DFS-visit(G^T, u, strongComps[i])
      add strongComps[i] to strongComps
      i ← i + 1
  return strongComps
The program processes listOut in decreasing order.
• When DFS-visit(G^T, u, strongComps[i]) is called, the whole strong component of u is still unvisited (else u would have been visited, too). All its elements are added to strongComps[i].
• Nothing else is added: indeed, suppose v →* u, but not u →* v. Then by the Theorem, v comes after u in listOut, so it has been processed before the call to DFS-visit(G^T, u, ·).
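A Python sketch of findStrongComps (not the notes' code), reusing the dfs() sketch given earlier; adj[u] is the adjacency list of u:

def transpose(adj):
    """Reverse all edges: adj_t[v] lists u for every edge (u, v) of adj."""
    adj_t = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            adj_t[v].append(u)
    return adj_t

def strong_components(adj):
    """Strong components by the two-pass method of the slides."""
    list_out = dfs(adj)                  # first pass: DFS on G
    adj_t = transpose(adj)
    visited = set()
    comps = []

    def visit(u, comp):                  # DFS-visit on the transpose graph
        visited.add(u)
        comp.append(u)
        for v in adj_t[u]:
            if v not in visited:
                visit(v, comp)

    for u in reversed(list_out):         # process listOut in decreasing order
        if u not in visited:
            comp = []
            visit(u, comp)
            comps.append(comp)
    return comps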
(Figure: the same graph with its discover/finish times, now with the strong components marked.)

Strong components, as discovered by the algorithm (by their original finish times):

{36}, {35}, {34}, {33}, {32}, {31}, {30}, {25}, {24}, {23}, {22},
{20}, {19}, {18}, {16, 12, 13, 14, 15}, {11, 8, 9, 10}.
Greedy algorithms

A greedy algorithm generally assumes


• An objective function f (x 1 , . . . , x n ) to optimize that depends on
some choices x 1 , . . . , x n . (Say, we need to maximize f (·).)
• A way to estimate, roughly, the contribution of each choice x i to
the final value, but without taking into account how our choice
will constrain the later choices.
The algorithm makes the choice with the best contribution.
• Is generally fast.
• May not find the optimum, but sometimes it does.
• Even when it does not find the optimum it may find a good
approximation to it.
Interval scheduling (activity selection)

Given: activities

(si , f i )

with starting time si and finishing time f i .


Goal: to perform the largest number of activities. Example:

(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10),
(8, 11), (8, 12), (2, 13), (12, 14).
Greedy algorithm Repeatedly choose the activity with the
smallest f i compatible with the ones already chosen.
Rationale This restricts our later choices least.
On the example:

(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10),
(8, 11), (8, 12), (2, 13), (12, 14).

Chosen: (1, 4), (5, 7), (8, 11), (12, 14).


Is this correct? Yes, by design, since we always choose an activity
compatible with the previous ones.
Is this best? By induction, it is sufficient to see that in the first step, the greedy choice is best possible. It is, since if an activity is left available after some other choice, it is also left available after the greedy choice.
How efficient? The cost is dominated by the cost of sorting.
Other possible strategies Sorting by starting time, say, we get only (0, 6), (6, 10), (12, 14).
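A Python sketch of the greedy rule (not the notes' code), run on the example list of activities:

def interval_scheduling(activities):
    """Repeatedly take the activity with the smallest finish time compatible
    with the ones already chosen. activities is a list of (s_i, f_i)."""
    chosen = []
    last_finish = float("-inf")
    for s, f in sorted(activities, key=lambda a: a[1]):  # sort by finish time
        if s >= last_finish:            # compatible with everything chosen so far
            chosen.append((s, f))
            last_finish = f
    return chosen

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10),
        (8, 11), (8, 12), (2, 13), (12, 14)]
print(interval_scheduling(acts))   # [(1, 4), (5, 7), (8, 11), (12, 14)]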
Interval partitioning

Suppose our intervals are activities to be performed on some


computers, and all must be scheduled. The goal is to schedule
them on the smallest number of computers.
Lower bound If some point of time is contained in k activites then
we need at least k computers. Example: activities
(1, 5), (3, 9), (6, 8), (4, 6). The activities (1, 5), (3, 9), (4, 6) all
contain point 4, so we need at least 3 computers. Let the depth
d be the size of the largest such overlap.
A “greedy” algorithm • Sort the activities by starting time.
• Repeatedly, schedule the next activity on the first available
computer.
Could this need > d computers? No: with this algorithm, at the first time when k computers do not suffice, the newly arrived activity overlaps the current activity on each of the k computers; since activities are processed by starting time, all k + 1 of these activities contain the new activity's starting point, so the depth is at least k + 1.
Implementing interval partitioning
Input: Array A with A[i] = (s_i, f_i), i = 1, . . . , n, sorted by starting time.
Output: List C[j] of activities i assigned to computer j.
For computer j, let F[j] be the finishing time of its last scheduled activity. To always find the j with the smallest F[j], we can use a priority queue Q of records (j, F[j]) ordered by key F[j].

intervalPartitioning(A):
  k ← 1; F[1] ← 0; insert (1, F[1]) into Q
  for i = 1 to n do
    Let F[j] be the minimum key in Q
    if s_i < F[j] then // no computer is free: open a new one
      k ← k + 1; j ← k
    else
      delete the minimum (j, F[j]) from Q
    append i to C[j]; F[j] ← f_i
    insert (j, F[j]) into Q
Cost of interval partitioning

Since Q can be implemented by a heap, each iteration costs at most


O(log d) steps, so the total time cost is O(n log d).
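A Python sketch of this heap-based implementation (not the notes' code), using the standard heapq module:

import heapq

def interval_partitioning(activities):
    """Greedy interval partitioning. Returns a list of "computers", each a
    list of the activities (s, f) assigned to it."""
    computers = []          # computers[j] = activities assigned to computer j
    heap = []               # entries (F[j], j): finish time of j's last activity
    for s, f in sorted(activities):             # process by starting time
        if heap and heap[0][0] <= s:             # earliest-finishing computer is free
            _, j = heapq.heappop(heap)
        else:                                    # need a new computer
            j = len(computers)
            computers.append([])
        computers[j].append((s, f))
        heapq.heappush(heap, (f, j))
    return computers

print(len(interval_partitioning([(1, 5), (3, 9), (6, 8), (4, 6)])))   # 3 = the depth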
Minimize maximum lateness

Now each task i has a duration t i and a deadline di (and only one
computer to process them). We choose start time si , then
f i = si + t i . Minimize the maximum of all latenesses L i = f i − di
among all tasks.
“Greedy” algorithm Earliest deadline first.
Why optimal?

Consider a different order. Look at the first consecutive pair for which this order is different from the order of their deadlines: say d2 < d3 but task 3 is scheduled right before task 2 (s3 < s2).

..............d2............d3...................
s3..............................f3.............f2
s2.............f2..............................f3

L2 − L3 = (s3 + t3 + t2 − d2) − (s3 + t3 − d3) = t2 − d2 + d3 > 0,

so L2 > L3. After the swap, L2 decreases by t3, while the new L3 is

s3 + t2 + t3 − d3 < s3 + t2 + t3 − d2 = old L2.

So the larger of the two latenesses decreased, and the smaller one, though increased, stays below the old maximum; no other task is affected, so the maximum lateness does not increase.
(We used an exchange argument: if a permutation π differs from the original order ι then it has an inversion in a consecutive pair; the swap brings π closer to ι.)
Other possible choices

With counterexamples: order by length t i , by slack di − t i .


(We may leave this to the lab.)
Task partitioning

• Suppose that we have again tasks i = 1, . . . , n with (integer)


durations t i , no deadlines, and 2 computers. What is the
shortest time we can finish them all?
• It is very difficult to solve this problem exactly: maybe all algorithms for it take exponential time.
• Suggest a greedy algorithm! How far can its result get from the
optimum?
Vertex cover

For an undirected graph G = (V, E), a vertex cover is a set S of


vertices such that every edge has one of its ends in S. Example:

• Finding the smallest possible vertex cover is known to be hard.


(Is the cover in the example optimal?)
• What would be a greedy algorithm giving at least a “rather
good” vertex cover?
A bad graph for the greedy vertex cover algorithm

Row 1 has 16 points. Points of row i connect to disjoint groups of


size i in the first row. The number of points in rows 2-16 is
8 + 5 + 4 + 3 + 2 + 2 + 2 + 1 + · · · + 1 = 34. The greedy algorithm
picks all these, instead of just the 16 points of the first row.
• The example (with n in place of 16) shows that a vertex cover
given by the greedy algorithm can be Ω(log n) larger than the
optimum. (Details below.)
• It is a theorem (proof skipped) that this is the worst possibility:
it is at most O(log n) times larger than the optimum. (Also true
for the more general problem of set cover.)
A non-greedy algorithm that is never that bad

• Repeatedly: pick an uncovered edge. Add both ends to the


cover. (Not optimal: the yellow vertices are not needed.)
• If we picked k edges, these are disjoint, so every vertex cover
must have size ≥ k. Our vertex cover has size 2k, so it is
guaranteed to be at most twice worse than the optimum.
• The example does not say that the greedy vertex cover
algorithm is always bad: only that in some bad cases it can be
much worse than this non-greedy one.
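A Python sketch of this non-greedy 2-approximation (not the notes' code):

def matching_vertex_cover(edges):
    """Repeatedly pick an uncovered edge and add both of its ends to the cover."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge (u, v) is still uncovered
            cover.add(u)
            cover.add(v)
    return cover

# Example: a path 1-2-3-4. The optimum cover is {2, 3}; this returns {1, 2, 3, 4},
# which is at most twice as large.
print(matching_vertex_cover([(1, 2), (2, 3), (3, 4)]))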
Analyzing the counterexample

Each row i ≥ 2 has ⌊n/i⌋ points. If i ≤ n/2 then

⌊n/i⌋ ≥ n/i − 1 = (n − i)/i ≥ (n/2)/i.

So our sum is at least

(n/2) (1/2 + 1/3 + · · · + 1/(n/2)).

Let us lower-bound 1/2 + 1/3 + · · · + 1/k. Write it as

1/2 + (1/3 + 1/4) + (1/5 + · · · + 1/8) + · · · .

There are ⌊log k⌋ full groups, each with sum ≥ 1/2.

The sum is ≥ ⌊log k⌋/2 = Ω(log k) = Ω(log n) for k = n/2.
Shortest paths
Composite words

Sometimes, it takes some thinking to see that a problem is a


shortest path problem.
Example: how to break up some composite words?
Personaleinkommensteuerschätzungskommissionsmitglieds-
reisekostenrechnungsergänzungsrevisionsfund (Mark Twain)
With a German dictionary, break into relatively few components.
Graph points all division points of the word, including start and
end.
Edges if the word between the points is in the dictionary (maybe
without the “s” at the end).
Path between the start and end corresponds to a legal breakup.
(Note that this graph is acyclic.)
• The word breakup problem is an example where the graph is
given only implicitly: To find out whether there is an edge
between two points, you must make a dictionary lookup.
• We may want to minimize those lookups, but minimizing two
different objective functions simultaneously (the number of
division points and the number of lookups) is generally not
possible.
(For minimizing lookups, depth-first search seems better.)
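A Python sketch of the word-breakup problem as a shortest-path computation over the implicit graph of division points (not the notes' code; in_dictionary and the tiny word set below are made up for illustration):

def break_word(word, in_dictionary):
    """Break `word` into the fewest dictionary components, or return None.
    Vertices are division points 0..len(word); edge (i, j) exists when
    word[i:j] is in the dictionary."""
    n = len(word)
    best = [None] * (n + 1)     # best[j] = fewest components covering word[:j]
    prev = [None] * (n + 1)     # prev[j] = start of the last component
    best[0] = 0
    for j in range(1, n + 1):
        for i in range(j):
            if best[i] is not None and in_dictionary(word[i:j]):
                if best[j] is None or best[i] + 1 < best[j]:
                    best[j] = best[i] + 1
                    prev[j] = i
    if best[n] is None:
        return None
    parts, j = [], n
    while j > 0:                # walk back along the chosen division points
        parts.append(word[prev[j]:j])
        j = prev[j]
    return parts[::-1]

words = {"book", "keep", "keeper", "er"}
print(break_word("bookkeeper", lambda w: w in words))   # ['book', 'keeper']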
Car racing

(Figure: a race track grid, with start and finish marked.)

In each step, the speed vector can change only by 1 in each


direction. We have to start and arrive with speed 1, vertical
direction. There is a graph in which this is a shortest path problem.
Vertices (point, speed vector) pairs (p, v).
Edges between (p1 , v1 ) and (p2 , v2 ): if p2 − p1 = v1 , |v2 − v1 |∞ ≤ 1.
Here |(x, y)|∞ = max(|x|, | y|) is the so-called maximum norm.
Shortest paths with edge weights (lengths)

• Weight of edge e = (u, v): l(e) = l(u, v).


• Weight of path: the sum of the weights of its edges.
• Shortest path: lightest path.
• Distance δ(u, v) is the length of lightest path from u to v.
Variants of the problem:
• Single-pair (from source s to destination t).
• Single-source s: to all reachable points. Returns a tree of lightest paths, represented by the parent function v ↦ v.π.
• All-pairs.
Negative weights? These are also interesting, but first we assume
that all weights are nonnegative.
Dijkstra’s algorithm

• Let d(u) be the distance of point u from s.


• Follow the idea of breadth-first search. At any given time, for
some value x, we will have already found the set S of all points
u with d(u) < x and possibly some with d(u) = x.
• Key observation: if v* is the next closest point (to be added to S), then it is among those reachable by an edge from some u ∈ S: indeed, some such u is on any shortest path to v*, and d(v*) = d(u) + l(u, v*).
• Maintain the set Q of all points v not in S but reachable by an edge from S. Maintain on it a distance upper bound

d′(v) = min_{u∈S} ( d(u) + l(u, v) ).

As we have seen, d′(v*) = d(v*) (but generally d′(v) ≠ d(v) for v ≠ v*).
• The next element to take off Q and to add to S is v* = arg min_{v∈Q} d′(v). Set d(v*) = d′(v*).
• Check all v ∉ S in the adjacency list of v*. If d(v*) + l(v*, v) < d′(v), set d′(v) ← d(v*) + l(v*, v) and add v to Q if it is not there yet.
(Figure: the weighted directed graph of the earlier example, with source s = 0 and destination t marked.)

We show S′ = S \ {0} and Q as lists of v(d)u, where the (current) shortest path to v has length d and last link (u, v).

S′ = {1(5)0}, Q = {2(17)1, 3(20)1, 4(9)0, 7(8)0}
S′ = {1(5)0, 7(8)0}, Q = {2(15)7, 3(20)1, 4(9)0, 5(14)7}
S′ = {1(5)0, 4(9)0, 7(8)0}, Q = {2(15)7, 3(20)1, 5(13)4, 6(29)4}
S′ = {1(5)0, 4(9)0, 5(13)4, 7(8)0}, Q = {2(14)5, 3(20)1, 6(26)5}
S′ = {1(5)0, 2(14)5, 4(9)0, 5(13)4, 7(8)0}, Q = {3(17)2, 6(25)2}
S′ = {1(5)0, 2(14)5, 3(17)2, 4(9)0, 5(13)4, 7(8)0}, Q = {6(25)2}
Implementing Dijkstra’s algorithm
Compute for every v distance d[v] from s and parent π[v].
To find v ∗ = arg min v∈Q d 0 (v) efficiently, use a priority queue Q of
pairs (δ, v) with value δ.

Dijkstra(G, s):
foreach v do
d[v] ← ∞; done[v] ← False
d[s] ← 0
Set up priority queue Q with (d[s], s) ∈ Q
while Q is not empty do
(δ, u) ← removeMin(Q)
if not done[u] then
done[u] ← True
for out-neighbors v of u do
γ ← d[u] + l(u, v)
if γ < d[v] then
π[v] ← u; d[v] ← γ; add (γ, v) to Q
• In this implementation, a vertex v gets onto Q in a new copy
every time v is reached on a new edge. But Q is still at most as
large as the number of edges, so log |Q| ≤ 2 log n.
• In a version with just one copy of each vertex on the heap, we
need a heap implementation allowing the removal of items (in
some other applications they are necessary). Most
implementations in the usual libraries don’t have this function.
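A Python sketch of this implementation with the standard heapq module (not the notes' code), run on the edge lists of the worked example:

import heapq

def dijkstra(adj, s):
    """Dijkstra's algorithm in the "multiple copies on the heap" style above.
    adj[u] = list of (v, l(u, v)) with l >= 0. Returns distances d and parents pi."""
    d = {s: 0}
    pi = {s: None}
    done = set()
    Q = [(0, s)]
    while Q:
        delta, u = heapq.heappop(Q)
        if u in done:
            continue                    # a stale copy of u on the heap
        done.add(u)
        for v, l in adj[u]:
            gamma = d[u] + l
            if v not in d or gamma < d[v]:
                d[v] = gamma
                pi[v] = u
                heapq.heappush(Q, (gamma, v))
    return d, pi

adj = {0: [(1, 5), (4, 9), (7, 8)], 1: [(2, 12), (3, 15), (7, 4)],
       2: [(3, 3), (6, 11)], 3: [(6, 9)], 4: [(5, 4), (6, 20), (7, 5)],
       5: [(2, 1), (6, 13)], 6: [], 7: [(2, 7), (5, 6)]}
print(dijkstra(adj, 0)[0])   # d[2] = 14, d[3] = 17, d[6] = 25, as in the trace above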
Minimum spanning trees

• With respect to connectivity, another important algorithmic


problem is to find the smallest number of edges that still leaves
an undirected graph connected. More generally, edges have
weights, and we want the lightest tree.
• Negative weights are also allowed: this allows to ask for the
heaviest tree, too.
• Generic algorithm: Repeatedly, add some edge that does not
form a cycle with earlier selected edges.
A cut of a graph G is any partition V = S ∪ T, S ∩ T = ∅. It respects edge set A if no edge of A crosses the cut.

Theorem If the edge set A is a subset of some lightest spanning


tree, S a cut respecting A then after adding any lightest edge across
S to A, the resulting A0 still belongs to some lightest spanning tree.
Prim’s algorithm

Keep adding a lightest edge adjacent to the already constructed


tree.
Implement this similarly to Dijkstra’s algorithm: maintain a set Q of
neighbors v adjacent to the tree S, organize it as a priority queue.
The main difference to Dijkstra’s algorithm is the key value v.key:
Prim: smallest edge length (so far) from the current tree T to v.
Dijkstra: smallest path length (so far) from the source s to v.
(Figure: the weighted graph of the earlier examples, now taken as undirected, starting from s = 0.)

We show S′ = S \ {0} and Q as lists of v(d)u, where d is the length of the (current) smallest edge (u, v) from S.

S′ = {}, Q = {1(5)0, 7(8)0, 4(9)0}
S′ = {1(5)0}, Q = {2(12)1, 3(15)1, 4(9)0, 7(4)1}
S′ = {1(5)0, 7(4)1}, Q = {2(7)7, 3(15)1, 4(5)7, 5(6)7}
S′ = {1(5)0, 7(4)1, 4(5)7}, Q = {2(7)7, 3(15)1, 5(6)7, 5(4)4, 6(20)4}
S′ = {1(5)0, 7(4)1, 4(5)7, 5(4)4}, Q = {2(1)5, 3(15)1, 6(13)5}
S′ = {1(5)0, 7(4)1, 4(5)7, 5(4)4, 2(1)5}, Q = {3(3)2, 6(11)2}
S′ = {1(5)0, 7(4)1, 4(5)7, 5(4)4, 2(1)5, 3(3)2}, Q = {6(9)3}
Solved exercises

Question Given a graph with edge costs and an edge e, what is an


algorithm to decide whether there is a minimum spanning tree
containing e?
Answer Build a tree by Prim’s algorithm but starting from e, and
compare it with one built from scratch.
Implementing Prim’s algorithm
Compute for every v a parent π[v].
(Meaning of d[v] is different from Dijkstra.)

Prim(G, s):
foreach v do
done[v] ← False; d[v] ← ∞
d[s] ← −∞ // 0 for Dijkstra.
Set up priority queue Q with (d[s], s) ∈ Q
while Q is not empty do
(δ, u) ← removeMin(Q)
if not done[u] then
done[u] ← True
for neighbors v of u do
if not done[v] then
γ = l(u, v) // d[u] + l(u, v) for Dijkstra.
if γ < d[v] then
π[v] ← u; d[v] ← γ; add (γ, v) to Q
Kruskal’s algorithm

Another minimum spanning tree algorithm based on the same


theorem:

Kruskal’s algorithm Keep increasing a forest (cycle-free graph)


F , starting from the set of all points and no edges. Keep adding to
F the shortest one among all edges of G that do not create a cycle
(they connect two different trees of F ) .

Efficient implementation needs a smart way to track the


components of the graph; see below.
Clustering

Example Netflix has a database of a lot of customers i,


i = 1, . . . , n. Each has a set F_i of films they have seen. For
customers i, j, let

l(i, j) = |F_i Δ F_j| = |F_i \ F_j| + |F_j \ F_i|

be the number of films that one of them has seen and the other one
has not. This “distance” is an indicator of their difference in tastes.
Netflix wants to classify its customers: create clusters of them. One
way is (not the best, too rigid), for a given δ, to break them up into
the largest number of subsets with the property that customers i, j
in different groups are at distance l(i, j) > δ from each other.

Kruskal’s algorithm solves the clustering problem: just stop adding


edges when they become larger than δ.
Union-find

• To implement Kruskal’s algorithm, we need to keep track of a


family of disjoint sets (in case of the Kruskal algorithm, the
trees in the forest).
• Now concentrate on this task, abstracting away from the
spanning tree problem.
• The Union-Find abstract data type has a family of disjoint sets
S1 , . . . , Sk , Si ⊂ {1, . . . , n}. Two operations:
• Find-Set(x): Given an element x ∈ {1, . . . , n}, find Si with x ∈ Si .
• Union(i, j): given i, j, replace sets Si , S j with their union Si ∪ S j .
Implementing Union-Find

A common idea: represent each set Si by some root element


ui ∈ Si . We will write Si = S(ui ).
How to answer Find-Set questions?
First idea • For each element v let rep[v] = u where v ∈ S(u).
• Also, for each set S(u) a list (say, linked) of its elements.

• Find-Set(x) returns rep[x]. Fast.


• Union(u, v): for example rep[u] ← v.
Add the list of S(v) to the list of S(u).
rep[w] ← u for all w ∈ S(v). This part is slow.
(Figure: the family of disjoint sets shown as root elements with linked lists of their members, before and after Union(4, 10).)


Second idea A tree T (u) with root u for each set S(u).
• parent[v] belongs to the set of v, parent[u] = u for root u.
• rank[u]: (bound on) the height of the tree under u.

• Union(u, v):
if rank[u] ≤ rank[v] then parent[u] ← v.
If rank[u] = rank[v] then rank[v] ← rank[v] + 1.
Fast.
• Find-Set(x) is slower: follow parents to the root.
  But each path has length ≤ log n. Indeed, a tree whose root has rank r contains at least 2^r vertices (prove it by induction on the Union operations), so all ranks, and hence all path lengths, are ≤ log n.
(Figure: the same sets represented as trees, before and after Union(4, 10): the root of the lower-rank tree becomes a child of the other root.)

Union(4, 10)
Path compression

The log n cost per Find-Set() is not bad, but can be improved by the
following trick.
• With each Find-Set(), set the parent of all passed elements to
the root.
See the CLRS book for how much this saves you in amortized cost: for n operations, the cost is O(nα(n)) where α(n) grows much slower than even log n.
(Figure: the tree before and after Find-Set(13): every element passed on the way from 13 to the root 10 becomes a child of the root.)

Find-Set(13)
Find-Set(x): // With path compression
  if x ≠ parent[x] then
    parent[x] ← Find-Set(parent[x])
  return parent[x]

Union(x, y):
  x′ ← Find-Set(x); y′ ← Find-Set(y)
  if rank[x′] > rank[y′] then
    parent[y′] ← x′
  else
    parent[x′] ← y′
    if rank[x′] = rank[y′] then
      rank[y′] ← rank[y′] + 1
Implementing the Kruskal algorithm

Kruskal(G):
// Outputs the edges of a minimum spanning tree.

Sort the edges of G by weight into a list E


foreach v do
parent[v] = v; rank[v] = 0
while E is not empty do
remove the smallest-weight edge (u, v) from E
if Find-Set(u) ≠ Find-Set(v) then
output edge (u, v)
Union(u, v)

Note The forest of the Kruskal algorithm consists of trees, and


the Union-Find data structure consists of trees on the same sets.
But these trees have nothing to do with each other!
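A Python sketch of Kruskal's algorithm on top of Union-Find with union by rank and path compression (not the notes' code):

def make_sets(vertices):
    """Union-Find: returns find and union functions over the given vertices."""
    parent = {v: v for v in vertices}
    rank = {v: 0 for v in vertices}

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])     # path compression
        return parent[x]

    def union(x, y):
        x, y = find(x), find(y)
        if rank[x] > rank[y]:
            parent[y] = x
        else:
            parent[x] = y
            if rank[x] == rank[y]:
                rank[y] += 1

    return find, union

def kruskal(vertices, edges):
    """edges is a list of (weight, u, v). Returns the edges of a minimum
    spanning tree (or forest) as (u, v, weight) triples."""
    find, union = make_sets(vertices)
    tree = []
    for w, u, v in sorted(edges):            # process edges by weight
        if find(u) != find(v):               # u, v are in different trees of the forest
            tree.append((u, v, w))
            union(u, v)
    return tree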
Divide and conquer

An efficient algorithm can frequently be obtained using the


following idea:
1 Divide into subproblems of equal size.
2 Solve subproblems.
3 Combine results.
In order to handle subproblems, a more general procedure is often
needed.
Merge sort

1 Subproblems: sorting A[1 . . n/2] and A[n/2 + 1 . . n]


2 Sort these.
3 Merge the two sorted arrays of size n/2.
The more general procedures now are the ones that sort and merge
arbitrary parts of an array.
Merge(A, p, q, r): // Merges A[p . . q] and A[q + 1 . . r].
  n1 ← q − p + 1; n2 ← r − q
  create arrays L[1 . . n1 + 1] and R[1 . . n2 + 1]
  for i ← 1 to n1 do L[i] ← A[p + i − 1]
  for j ← 1 to n2 do R[j] ← A[q + j]
  L[n1 + 1] ← ∞; R[n2 + 1] ← ∞
  i ← 1, j ← 1
  for k ← p to r do
    if L[i] ≤ R[j] then A[k] ← L[i]; i++
    else A[k] ← R[j]; j++

Why the ∞ business? These sentinel values allow us to avoid an extra case for when L or R is exhausted. This is also why we used new arrays for the input, rather than the output.
Merge-Sort(A, p, r): // Sorts A[p . . r].
if p < r then
q ← b(p + r)/2c
Merge-Sort(A, p, q)
Merge-Sort(A, q + 1, r)
Merge(A, p, q, r)

Analysis, for the worst-case running time T(n):

T(n) ≤ c₁ if n = 1, and
T(n) ≤ 2T(n/2) + c₂n otherwise.
Resolving the recursive inequality

Assume that n = 2^k. When it is not, we just consider the smallest number n₀ = 2^k > n (assuming that we sort a larger array).

T(n) ≤ c₂n + 2T(n/2)                        (1)
T(n/2) ≤ c₂n/2 + 2T(n/4)                    (2)
T(n) ≤ c₂n + c₂n + 4T(n/4)        substituted (2) into (1)
...
T(n) ≤ kc₂n + 2^k T(n/2^k) ≤ n(c₁ + c₂ log n) = O(n log n).
Recursion tree

n Work on top level

n/2 n/2 Total work on level 1

n/4 n/4 n/4 n/4 Total work on level 2

...
Nonrecursive version

Perform first the jobs at the bottom level, then those on the next level, and so on. In pass k, neighboring sorted runs of length 2^(k−1) are merged pairwise into sorted runs of length 2^k, which pass k + 1 then merges again.

(Figure: two consecutive passes of merges over the array.)
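A Python sketch of this nonrecursive version (not the notes' code):

def merge_sort(a):
    """Bottom-up merge sort: pass k merges runs of length 2^(k-1) into runs of 2^k."""
    n = len(a)
    width = 1
    while width < n:
        for lo in range(0, n, 2 * width):           # merge a[lo:mid] and a[mid:hi]
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            left, right = a[lo:mid], a[mid:hi]
            i = j = 0
            for k in range(lo, hi):
                if j >= len(right) or (i < len(left) and left[i] <= right[j]):
                    a[k] = left[i]; i += 1
                else:
                    a[k] = right[j]; j += 1
        width *= 2
    return a

print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))   # [1, 2, 2, 3, 4, 5, 6, 7]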
Closest pair of points

In the plane, for points p, q, let d(p, q) denote the distance.

Question Given a set P of n points in the plane, find a pair


p, q ∈ P with d(p, q) minimal.

A brute-force solution takes O(n²) steps. Can we do better?

For simplicity, let n = 2^k.
• Let Px = the points of P, sorted by the x coordinate.
Let Q = be the first n/2 points in Px , and R the rest.
• Find the smallest distance δQ in Q and the smallest distance δR
in R. Let δ = min(δQ , δR ).
We found the closest pair unless it is some (q, r) with q ∈ Q, r ∈ R.
Next we deal with this possibility, called the boundary case (q, r).

(Figure: the point set split by a vertical line at x = m into the left half Q and the right half R.)
• Let m = the maximal x coordinate of points in Q.
Let S = the set of points in P whose x coordinate differs from m
by at most δ.
• If (q, r) is a boundary case then both q and r are in S. How to
find them?
• Let S y be the list in which the set S is sorted by the y
coordinate. The crucial observation:

Lemma q, r are within 15 positions of each other in S y .

It follows that we can find all such q, r in linear time: indeed, for
every q ∈ S y we only have to check the next 15 ones to see if they
can form a boundary pair (q, r).
Proof of the lemma. Vertical line: x coordinate m. Partition the strip around it into little squares of side δ/2.

Each little square contains at most one point of S, since two points in the same square would lie on the same side of the line and be at distance < δ. If q has the smaller y coordinate then r must be in one of the 16 little squares above, so it is within 15 positions of q in S_y.
Running time analysis

• Sort in both the x and the y directions, to get Px and P y .


No more sorting needed during the recursive calls.
• Let F (n) be the running time of the algorithm after this sorting.

F (2) = 1,
F (n) ≤ 2F (n/2) + cn.

Same recursion as for merge sort, same resolution: total


running time is O(n log n).
Integer multiplication

Some applications (say in cryptography) require the multiplication


of large integers. We will use binary notation:

(10110)₂ = 0·1 + 1·2 + 1·2² + 0·2³ + 1·2⁴,
(x_{n−1} x_{n−2} . . . x1 x0)₂ = x0 + x1·2 + · · · + x_{n−1}·2^(n−1).

Multiplying two numbers:

X Y = (x0 + x1·2 + · · · + x_{n−1}·2^(n−1))(y0 + y1·2 + · · · + y_{n−1}·2^(n−1)).

School method: needs at least to compute all products x_i y_j, so costs Ω(n²). As old as the positional number system (thousands of years). It was first improved only around 1960.
Divide and conquer

X0 = (x_{n/2−1} . . . x1 x0)₂ = x0 + 2x1 + · · · + 2^(n/2−1) x_{n/2−1},
X1 = (x_{n−1} x_{n−2} . . . x_{n/2})₂,
X = X0 + 2^(n/2) X1,   Y = Y0 + 2^(n/2) Y1,
X Y = X0 Y0 + 2^(n/2) (X0 Y1 + X1 Y0) + 2^n X1 Y1.

• Gives a recursive inequality for running time:

T(n) ≤ 4T(n/2) + cn,

since we computed the X_i Y_j and then made some additions.

• Does not save anything: even the stronger inequality T(n) ≤ 4T(n/2) allows T(n) = n².
The Karatsuba trick

We don't need X0 Y1 and X1 Y0 separately, only their sum. This appears as part of a single product

(X0 − X1)(Y0 − Y1) = X0 Y0 + X1 Y1 − (X0 Y1 + X1 Y0)

of maximum n-bit numbers. Its other parts are already computed!

New algorithm:

RecMult(X, Y):
  Find X0, X1, Y0, Y1
  p ← RecMult(X0, Y0)
  q ← RecMult(X1, Y1)
  r ← RecMult(X0 − X1, Y0 − Y1)
  return p + 2^(n/2) (p + q − r) + 2^n q
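A Python sketch of RecMult (not the notes' code; Python already has exact big integers, so this only illustrates the recursion, including the sign bookkeeping for the product (X0 − X1)(Y0 − Y1)):

def rec_mult(x, y):
    """Karatsuba multiplication of nonnegative integers."""
    if x < 2 or y < 2:                               # small numbers: multiply directly
        return x * y
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    x1, x0 = x >> half, x & ((1 << half) - 1)        # X = X0 + 2^half * X1
    y1, y0 = y >> half, y & ((1 << half) - 1)
    p = rec_mult(x0, y0)
    q = rec_mult(x1, y1)
    r = rec_mult(abs(x0 - x1), abs(y0 - y1))
    same_sign = (x0 >= x1) == (y0 >= y1)             # sign of (X0 - X1)(Y0 - Y1)
    middle = p + q - (r if same_sign else -r)        # = X0*Y1 + X1*Y0
    return p + (middle << half) + (q << (2 * half))

print(rec_mult(1234567, 7654321) == 1234567 * 7654321)   # True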
Running time analysis
Again assuming n = 2^k:

M(n) ≤ cn + 3M(n/2)                                  (3)
M(n/2) ≤ cn/2 + 3M(n/4)                              (4)
M(n) ≤ cn + (3/2)cn + 9M(n/4)        where we substituted (4) into (3)
M(n) ≤ cn + (3/2)cn + (3/2)²cn + 27M(n/8),
...
M(n) ≤ cn(1 + (3/2) + (3/2)² + · · · + (3/2)^(k−1)) + 3^k M(n/2^k),   with M(n/2^k) = M(1) = 1,

1 + (3/2) + (3/2)² + · · · + (3/2)^(k−1)
  ≤ (3/2)^(k−1) (1 + (2/3) + (2/3)² + · · · ) = (3/2)^(k−1) · 3 = 3^k / 2^(k−1),

M(n) ≤ c · 2^k · 3^k / 2^(k−1) + 3^k = (2c + 1) 3^k,
3^k = 2^(k log 3) = n^(log 3).

(Sum of geometric series: for q < 1, 1 + q + q² + . . . = 1/(1 − q).)
• This algorithm is faster than the school method: O(n^(log 3)) ≈ O(n^1.585) in place of O(n²).
• Only the first step towards faster multiplication: the current best algorithm has complexity O(n log n), faster than O(n^λ) for any λ > 1.
Applications
• The same algorithm works for base 10; it applies even when X, Y are polynomials:

X(z) = x0 + x1 z + x2 z² + · · · + x_{n−1} z^(n−1).

• The coefficient of z^k in the product X(z)Y(z) is

x0 y_k + x1 y_{k−1} + · · · + x_k y0.

Called the convolution of (x0, . . . , x_{n−1}) and (y0, . . . , y_{n−1}).

• Take two independent random variables A, B with integer values: A takes value i ≥ 0 with probability p_i, B takes it with probability q_i. Then A + B takes value k with probability

p0 q_k + p1 q_{k−1} + · · · + p_k q0.

So this is the convolution of (p0 , p1 , . . . ) and (q0 , q1 , . . . ), and


can now be computed faster!
Solved exercises

1 Find the maximum of a unimodal function.


2 Find the maximum difference between any two elements ai , a j
of a sequence a1 , . . . , an .
Dynamic programming
Weighted interval scheduling

Interval scheduling problem, but with additional complication:


each task i = 1, . . . , n with start and finish times si < f i has some
value vi . Instead of maximizing the number of scheduled tasks, we
want to maximize the total value.
Example Task Ti = si (vi ) f i : starting time s, endtime f , value v.

T1 = 0(2)3, T2 = 1(4)5, T3 = 4(4)6,
T4 = 2(7)9, T5 = 7(2)10, T6 = 8(3)11.

(Figure: the six tasks drawn as intervals on a time line from 0 to 11, labeled with their values.)

Earliest-finish-time-first (the greedy rule of the unweighted problem) would choose T1, T3, T5. But its total value is smaller than that of T1, T3, T6.
Recursion idea Compute the optimum not just for the whole
system, but for many subsystems as well:
OPT(i) = the maximum total value of all possible selections
from tasks 1, 2, . . . , i.
Order by finish time again: f1 ≤ · · · ≤ f n . The last task before i
disjoint from it is

p(i) = max{ j : f j ≤ si },

Recursion:

OPT(i) = max of the two quantities below :


OPT(i − 1), passing Ti
vi + OPT(p(i)). including Ti .

On the example:

i 1 2 3 4 5 6
Ti 0(2)3 1(4)5 4(4)6 2(7)9 7(2)10 8(3)11
p(i) 0 0 1 0 3 3
OPT(i) 2 4 6 7 8 9
Use a table

• Just calling the recursive formula is unwise, leads to


exponential blowup.
Example: with, say, p(i) = i − 2 for each i:

n
n−1 n−2
n−2 n−3 n−3 n−4
n−3 n−4 n−4 n−5 n−4 n−5 n−5 n−6

• Save the computed values OPT(1), OPT(2), . . . , OPT(n) in some


array M . Instead of the recursive calls, just refer to M :

for i = 0 to n do M[i] ← 0
for i = 1 to n do
  M[i] ← max(M[i − 1], v_i + M[p(i)])
Memoization

General idea: memoization, (or caching).


• Applies when
• Some recursive calls would be repeated many times.
• There is enough memory to store the results for all of them.
• Store all these results in a table (in programming, a “static”, or
“global” array).
• In each call, first check whether the result is already in the
table: if yes, just return it, else compute it and store it.

RWS(i): // Recursive weighted scheduling; M[0] = 0 initially
  if M[i] is not defined then
    M[i] ← max(RWS(i − 1), v_i + RWS(p(i)))
  return M[i]
Running time found by tracing: at most twice that of the
non-recursive version above.
How to find which tasks are selected? Backwards (like following
parents in Dijkstra’s algorithm), collecting tasks in a set S:

i ← n; S ← ∅
while i > 0 do
  if M[i] = M[i − 1] then i−−
  else S ← S ∪ {i}; i ← p(i)
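A Python sketch putting the pieces together (not the notes' code): sorting by finish time, computing p(i) by binary search, filling the table M, and backtracking; run on the example tasks.

from bisect import bisect_right

def weighted_interval_scheduling(tasks):
    """tasks is a list of (s, f, v); returns (best total value, chosen tasks)."""
    tasks = sorted(tasks, key=lambda t: t[1])        # order by finish time
    finish = [f for _, f, _ in tasks]
    n = len(tasks)
    # p[i-1] = number of tasks whose finish time is <= s_i (the last disjoint task)
    p = [bisect_right(finish, tasks[i - 1][0]) for i in range(1, n + 1)]
    M = [0] * (n + 1)
    for i in range(1, n + 1):
        M[i] = max(M[i - 1], tasks[i - 1][2] + M[p[i - 1]])
    chosen, i = [], n                                 # backtrack to find the tasks
    while i > 0:
        if M[i] == M[i - 1]:
            i -= 1
        else:
            chosen.append(tasks[i - 1])
            i = p[i - 1]
    return M[n], chosen[::-1]

tasks = [(0, 3, 2), (1, 5, 4), (4, 6, 4), (2, 9, 7), (7, 10, 2), (8, 11, 3)]
print(weighted_interval_scheduling(tasks))   # (9, [(0, 3, 2), (4, 6, 4), (8, 11, 3)])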
Other examples

• Fibonacci numbers.
• Binomial coefficients.
(In both cases, the algorithm is not as bad as the naive recursion,
but by far not as good as computing the known formulas.)
Sequence alignment

• A problem with many applications: for example the diff


program, biology, voice recognition.
• Two sequences of symbols from some alphabet S:
X = (x 1 , . . . , x m ), and Y = ( y1 , . . . , yn ).
An alignment is given by two sets of indices

1 ≤ i1 < i2 < · · · < ik ≤ m and 1 ≤ j1 < j2 < · · · < jk ≤ n

such that x ip is matched to y jp , for all p = 1, . . . , k.


• Example: match the words "kolor" and "colour". We could match k with c, o with o, l with l, o with o, and r with r, leaving u unmatched.
There is a distance d(r, s) ≥ 0 between symbols r, s ∈ S, which is also the penalty for matching them with each other. There is also a penalty δ ≥ 0 for every symbol unmatched (passed), so the total penalty is

((m − k) + (n − k))δ + Σ_{p=1}^{k} d(x_{i_p}, y_{j_p}).

Example: Let δ = 1, and d(r, s) = 0 if r = s and 2 otherwise.


In matching kolor with colour, the total penalty is 1 + 2, since
there is one unmatched symbol “u” and one mismatched pair (k,c).
A larger example

X = (r, p, q, q, r, a, x, y, b, b, x, y, a, b, w, v),
Y = (p, q, q, r, a, x, b, y, y, u, v, w, v).

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
r p q q r a x y b b x y a b w v
p q q r a x b y y u v w v
1 2 3 4 5 6 7 8 9 10 11 12 13

i = 2 3 4 5 6 7 10 11 12 15 16
j = 1 2 3 4 5 6 7 8 9 12 13
Black symbols matched with no penalty. Red symbols matched,
penalty 1. Blue symbols passed, penalty 1 for each.
Total penalty: 8.
Dynamic programming algorithm

Let A[i, j] be the minimum penalty for matching x 1 . . . x i with


y1 . . . y j . Of course, A[0, 0] = 0, A[0, j] = jδ, A[i, 0] = iδ for
i, j > 0. Recursion:

A[i, j] = min of the three quantities below :


d(x i , y j ) + A[i − 1, j − 1], matching x i , y j
δ + A[i − 1, j], passing x i
δ + A[i, j − 1]. passing y j

This allows us to fill in the array A[i, j], for example row by row.

A[i, j] = min( d(x_i, y_j) + A[i−1, j−1], δ + A[i−1, j], δ + A[i, j−1] ).

        c   o   l   o   u   r
    0   1   2   3   4   5   6
k   1   2   3   4   5   6   7
o   2   3   2   3   4   5   6
l   3   4   3   2   3   4   5
o   4   5   4   3   2   3   4
r   5   6   5   4   3   4   3

(Arrows in the original figure mark, at each cell, which of the three predecessors attains the minimum, e.g. at position (1, 1).)
Find the alignment S going backwards:

i, j ← m, n; S ← ∅
while i > 0, j > 0 do
  (i′, j′) ← one of (i − 1, j − 1), (i, j − 1), (i − 1, j)
             which gives the minimum A[i, j] above.
  if (i′, j′) = (i − 1, j − 1) then S ← S ∪ {(i, j)}
  (i, j) ← (i′, j′)
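A Python sketch of the whole computation, table filling plus backtracking (not the notes' code), with the kolor/colour example (δ = 1, mismatch cost 2):

def align(X, Y, delta=1, d=lambda r, s: 0 if r == s else 2):
    """Returns the minimum penalty and the matched index pairs (1-based)."""
    m, n = len(X), len(Y)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1): A[i][0] = i * delta      # x_1..x_i all passed
    for j in range(1, n + 1): A[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = min(d(X[i-1], Y[j-1]) + A[i-1][j-1],   # match x_i, y_j
                          delta + A[i-1][j],                 # pass x_i
                          delta + A[i][j-1])                 # pass y_j
    S, i, j = [], m, n                                  # backtrack
    while i > 0 and j > 0:
        if A[i][j] == d(X[i-1], Y[j-1]) + A[i-1][j-1]:
            S.append((i, j)); i, j = i - 1, j - 1
        elif A[i][j] == delta + A[i-1][j]:
            i -= 1
        else:
            j -= 1
    return A[m][n], S[::-1]

print(align("kolor", "colour"))   # (3, [(1, 1), (2, 2), (3, 3), (4, 4), (5, 6)])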
Graph representation
(Figure: a grid of points (i, j), with columns labeled y1, . . . , y4 and rows labeled x1, . . . , x3.)

For x1, . . . , x_m and y1, . . . , y_n: points (i, j), where i represents the position between x_i and x_{i+1}.

• Horizontal and vertical edges have length δ.


• Diagonal edge ending in (i, j) has length d(x i , y j ).
• Now A[m, n] is the length of the shortest path from the left top
to the right bottom. (Our algorithm is not slower than Dijkstra’s
for computing it.)
• Find the alignment going backwards using A[i, j], or record the
parents while computing the distances A[i, j] (as in Dijkstra).
Linear space via divide and conquer

(Not covered in Spring 2020)


• Computing A[·, ·] can be done in linear space, since we need
only two neighboring rows at a time. But we don’t only need
the number A[m, n], we also want to know the optimal
alignment!
• New idea: first just find the crossing point of the optimal path
in the middle column.
f (i, j) = the length of shortest path from (0, 0) to (i, j).
g(i, j) = the length of shortest path from (i, j) to (m, n).
Observation: for any k,

A[m, n] = min_q ( f(q, k) + g(q, k) ).

The point (q, k) where the minimum is achieved belongs to a shortest path.
Space-saving alignment algorithm

Let a linear-space algorithm AlignmentCosts(X , Y ) return just the


one-dimensional array A[1 : m, n].

DC-Align(X, Y):   // returns a shortest path P
    if m ≤ 2 or n ≤ 2 then compute directly
    f[1 : m] ← AlignmentCosts(X[1 : m], Y[1 : n/2])
        // f[q] = cost of aligning X[1 : q] with Y[1 : n/2]
    g[1 : m] ← AlignmentCosts(X[1 : m], Y[n/2 + 1 : n]), run in the reverse direction
        // g[q] = cost of aligning X[q + 1 : m] with Y[n/2 + 1 : n]
    q ← arg min_q (f[q] + g[q])
    P1 ← DC-Align(X[1 : q], Y[1 : n/2])
    P2 ← DC-Align(X[q + 1 : m], Y[n/2 + 1 : n])
    return P1 + (q, n/2) + P2

The space requirement is linear, since each recursive call reuses the
space of the earlier ones. (Exercise: prove it rigorously.)
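A sketch of the linear-space cost computation assumed above (our own Python illustration of AlignmentCosts): only two rows of the table are kept, and the last column A[·, n] is collected.

def alignment_costs(X, Y, delta, d):
    # Return the column A[0..m, n]: for each q, the cost of aligning the
    # prefix X[1..q] with all of Y, using only two rows of the table.
    m, n = len(X), len(Y)
    prev = [j * delta for j in range(n + 1)]
    col = [prev[n]]                          # A[0, n]
    for i in range(1, m + 1):
        cur = [i * delta] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(d(X[i - 1], Y[j - 1]) + prev[j - 1],  # match
                         delta + prev[j],                      # pass x_i
                         delta + cur[j - 1])                   # pass y_j
        col.append(cur[n])                   # A[i, n]
        prev = cur
    return col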
Analysis

The time analysis is more complex. We have (assuming n is a


power of 2 for transparency)

T(m, 2) ≤ cm = (c/2) · 2m,
T(m, n) ≤ cmn + max_q ( T(q, n/2) + T(m − q, n/2) ).

We took the worst possible q.


Let us guess that the total time is still ≤ d mn for some constant d.
Find out how large must d be for the proof to work.
The base case works if d ≥ c/2.
By the above inequality and the inductive assumption

T(m, n) ≤ cmn + dqn/2 + d(m − q)n/2 = (c + d/2)mn.

So if d ≥ 2c then also T (m, n) ≤ d mn by the induction step.


Shortest paths with possible negative costs

We have seen a shortest (smallest-cost) path algorithm already


(Dijkstra), but it could not handle negative costs.
Negative cycle: Consider a merchant: on certain edges he spends
money on traveling and buying merchandise, on others he
makes profit by selling (negative cost). A cycle is profitable if
the sum of edge costs along it is negative.
• If there is such a cycle then going around it repeatedly we
can drive the cost towards −∞.
• If there is no such cycle, then it is sufficient to look for
paths of length ≤ n − 1.
(Figure: an example graph with edge costs, some negative; one copy is annotated with the distance from s to each node, another with the distance from each node to t.)

Wanted: an algorithm that in a graph G = (V, E) with a cost


function cuv and no negative cycle, for any start and end nodes
s, t, provides a smallest-cost path.
Idea: Compute for all v, i the length

M [i, v]

of the smallest-cost path from v to t using at most i edges.


• If i = 0 then M[0, t] = 0 and M[0, v] = ∞ for v ≠ t.
• If i > 0 then the following recursive relation holds:

M[i, u] = min( M[i − 1, u], min_{(u,v)∈E} (c_{uv} + M[i − 1, v]) ),

allowing to fill in an array for M[i, v].


• The running time is O(nm) where m is the number of edges.
Indeed, each edge is touched at most n times.
Improving space-efficiency
We can use the space of M [i − 1, v] to store M [i, v], and call it
simply M [v]. The number i is used now just as a counter of
repetition.
first[u] stores the first node of the current smallest-cost path from u
(like a parent link).

M [t] ← 0
for v ≠ t do M[v] ← ∞
for i = 1 to n do
didImprove ← False
for u ∈ V do
for v : (u, v) ∈ E do
if cuv + M [v] < M [u] then
M [u] ← cuv + M [v]
first[u] ← v
didImprove ← True
if not didImprove then break
• This will stop at some i < n if M [v] does not change for any v.
• Following the (u, first[u]) edges we get a smallest-cost path to t.
• If we get into a cycle then it is a negative cycle.
• Computing the shortest path s = v0 , v1 , . . . , vk = t:

v_{i+1} = arg min_u ( w(v_i, u) + M[u] ).

Then of course M[v_i] = w(v_i, v_{i+1}) + M[v_{i+1}].

(Figure: the example graph again, with the distance to t written at each node.)
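The same space-efficient algorithm in Python (a sketch for illustration; the graph is a list of (u, v, cost) edges, and M[v] is the cost of the cheapest v-to-t path):

def bellman_ford_to_t(nodes, edges, t):
    # M[v] = smallest cost of a v-to-t path; first[v] = next node on such a path.
    # edges is a list of (u, v, cost) triples; assumes no negative cycle.
    INF = float('inf')
    M = {v: INF for v in nodes}
    first = {v: None for v in nodes}
    M[t] = 0
    for _ in range(len(nodes)):
        improved = False
        for (u, v, cost) in edges:
            if M[v] + cost < M[u]:
                M[u] = M[v] + cost
                first[u] = v
                improved = True
        if not improved:
            break                      # early termination, as in the pseudocode
    return M, first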
The knapsack problem

Going to the market, to sell some items in my inventory. My


knapsack has volume b.
• Given: volumes b ≥ a1 , . . . , an > 0 of the items 1, 2, . . . , n, and
their integer values v1 ≥ · · · ≥ vn > 0.
• Find a subset i1 < · · · < ik of them fitting into the knapsack:
ai1 + · · · + aik ≤ b.
Maximize the sum of their values vi1 + · · · + vik .
• Expressed differently: x i ∈ {0, 1}, where x i = 1 means picking
item i.

maximize v1 x 1 + · · · + vn x n
subject to a1 x 1 + · · · + an x n ≤ b,
x i = 0, 1, i = 1, . . . , n.
Special cases

Subset sum problem find i1 , . . . , ik with ai1 + · · · + aik = b.


Obtained by setting vi = ai . Now if there is a solution with
value b, we are done.
Partition problem Given numbers a1 , . . . , an , find i1 , . . . , ik such
that ai1 + · · · + aik is as close as possible to (a1 + · · · + an )/2.
• The solution given here differs from the one in
Kleinberg-Tardos; helps later with the approximation algorithm.
• Compute for each integer p the smallest volume

mn (p)

that gives value ≥ p. (Upper bound on the total value:


V = v1 + · · · + vn .)
• The optimum is max{ p : mn (p) ≤ b }.
Dynamic programming solution

Subproblem using only the first k items:

mk(p) = min{ a1 x1 + · · · + ak xk : v1 x1 + · · · + vk xk ≥ p, xi ∈ {0, 1} }.

If the set is empty the minimum is ∞.


Memoization: array m[k, p] = mk (p). Notation |x|+ = max(x, 0).

for k = 0 to n do m[k, 0] ← 0
for p = 0 to V do
if p > 0 then m[0, p] = ∞
for k = 1 to n do
m[k, p] ← min(m[k − 1, p], ak + m[k − 1, |p − vk |+ ])

The min decides whether to include item k or not.


Another application: money changer problem. Produce the sum b
using smallest number of coins of denominations a1 , . . . , an (at
most one of each). Corresponds to a case of the knapsack problem
in which the volumes are all 1, and the values are ai .
Example

Let {v1 , . . . , v5 } = {1, 2, 4, 6, 10}, ai = vi + 1, b = 13.


Array m[k, p]. Recall

m[k, p] ← min(m[k − 1, p], ak + m[k − 1, |p − vk |+ ]).

The empty spaces in the array below have value ∞ (no items).

k\p   1  2  3  4  5  6  7  8  9 10 11 12 ...
 1    2
 2    2  3  5
 3    2  3  5  5  7  8 10
 4    2  3  5  5  7  7  9 10 12 12 14
 5    2  3  5  5  7  7  9 10 11 11 13 14

Complexity: O(nV) steps (counting additions as single steps).
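The table-filling in Python (a sketch; values and volumes play the roles of the v_i and a_i above):

def knapsack_min_volume(values, volumes, b):
    # m[k][p] = smallest total volume of a subset of the first k items
    # with total value >= p; the optimum is the largest p with m[n][p] <= b.
    n, V = len(values), sum(values)
    INF = float('inf')
    m = [[0] * (V + 1) for _ in range(n + 1)]
    for p in range(1, V + 1):
        m[0][p] = INF
        for k in range(1, n + 1):
            skip = m[k - 1][p]
            take = volumes[k - 1] + m[k - 1][max(p - values[k - 1], 0)]
            m[k][p] = min(skip, take)
    return max(p for p in range(V + 1) if m[n][p] <= b)

# the example above: optimum 11 (the items of values 10 and 1 fit into b = 13)
print(knapsack_min_volume([1, 2, 4, 6, 10], [2, 3, 5, 7, 11], 13))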


Is the running time polynomial?

Assume that ai and b are also integers.


• A time complexity bound on an algorithm is generally given as
a function of the length of input (measured in bits).
• Each number vi written in binary has length ⌈log vi⌉; a number of length m has size > 2^{m−1}.
• The length of the input here is—essentially—the sum of the
lengths of the numbers: a1 , a2 , . . . , an , b, v1 , . . . , vn :
L ≤ ∑_i log a_i + log b + ∑_i log v_i + 2n + 1.

Assume log vi , log ai ≤ m, then L ≤ (m + 1)(2n + 1).


• We compute a table of size V n, which can be as large as 2^{m−1} n.
In case m = n this is about 2^{n−1} n, exponential in the input size L.
So our algorithm is very expensive when the numbers vi involved
are large.
• An algorithm that is polynomial as a function of the size of the
numbers in the input (as opposed to their representation
length) is called pseudo-polynomial. The dynamic
programming algorithm for the knapsack problem is such an
algorithm.
• No polynomial algorithm is known for knapsack, not even for
the subset sum problem: we will see that these problems are
really hard (NP-complete).
• On the other hand, our algorithm can be adapted to
approximate the optimum to within a factor of 1 + ε, in time
polynomial in (the input size and) 1/ε.
See later in the course.
Matchings

Example (Workers and jobs) Suppose that we have n workers


and n jobs. Each worker is capable of performing some of the jobs.
Is it possible to assign each worker to a different job, so that
workers get jobs they can perform?

It depends. If each worker is familiar only with the same one job
(say, digging), then no.
• Bipartite graph: left set A (workers), right set B (jobs).
• Matching, perfect matching.

N(S)
S
Example with no perfect
matching:

For S ⊆ A let

N(S) ⊆ B

be the set of all neighbors of the nodes of S. A perfect matching


clearly needs the no bottleneck property:
For every S ⊆ A we have |N(S)| ≥ |S|.
Example (No bottleneck) 6 tribes partition an island into
hunting territories of 100 square miles each. 6 species of tortoise,
with disjoint habitats of 100 square miles each.
Can each tribe pick a tortoise living on its territory, with different
tribes choosing different totems?

No bottleneck here: the combined hunting area of any k tribes


intersects with at least k tortoise habitats.
Example (No bottleneck) At a dance party, with 300 students,
every boy knows 50 girls and every girl knows 50 boys. Can they
all dance simultaneously so that only pairs who know each other
dance with each other?

Theorem If every node of a bipartite graph has the same degree


d ≥ 1 then it contains a perfect matching.

• Bipartiteness is necessary, even if all degrees are the same.

• Bipartiteness and positive degrees are insufficient.


The no bottleneck property is also sufficient:

Theorem (The Marriage Theorem)


A bipartite graph has a perfect matching if and only if |A| = |B| and for every S ⊆ A we have |N(S)| ≥ |S|.

Proposition The condition


implies the same condition for all
S ⊆ B.

Prove this as an exercise.


Flow networks
A model, generalizing the matching problem:
• Directed graph. Source s, sink t.
• Capacity: c(u, v) on all edges (u, v) showing the amount of
material that can flow from u to v. (We may have
c(u, v) ≠ c(v, u).) If there is no edge (u, v) then set c(u, v) = 0.
• Flow function: f (u, v) ≤ c(u, v) on all edges (u, v) showing the
amount of material going from u to v. No point of sending both
from u to v and v to u: define f′(u, v) = f(u, v) − f(v, u); then

f′(v, u) = −f′(u, v).

In our examples we will always show f′ in place of f, and only in its positive direction.
• What goes into a vertex u different from s, t, also goes out:
∑_v f′(u, v) = 0.
(Figure: an example flow network with nodes s, v1, v2, v3, v4, t; each edge is labeled f/c, or just c when it carries no flow.)
The notation f /c means flow f along an edge with capacity c.


• Our goal is to maximize the value |f| = ∑_v f(s, v).
Application to matching

(Figure: the source s on the left, the bipartite graph A–B in the middle, the sink t on the right.)

• n points on left, n on right. Edges directed to right, with unit


capacity from s to A and from B to t, and any capacity ≥ 1
(even ∞) between A and B.
• Perfect matching → flow of value n.
• Flow of value n → perfect matching?
Example Recall the theorem saying that if the bipartite graph is
regular then there is always a perfect matching.
In the dance party where every boy knows 50 girls and every girl
knows 50 boys, in the corresponding flow network we could just
send a flow of 1/50 along every edge from boys to girls. This is a
maximum flow, but it does not give us a matching.

Fortunately (as will be seen), there is always an integer maximum


flow.
Residual network, augmenting path

Given a flow f, the residual capacity is

c_f(u, v) = c(u, v) − f′(u, v) = c(u, v) − f(u, v) + f(v, u).

The residual network G_f may have edges (with positive capacity) that were not in the original network. An augmenting path is an s-t path in G_f (with some flow along it). (How does it change the original flow?)

(Figure: the example network with a flow, and below it the corresponding residual network G_f.)

G f has the edges along which flow can still be sent.


If f (u, v) > 0 then sending from v to u means decreasing f (u, v).
We obtained:

(Figure: the improved flow on the example network.)

This cannot be improved: look at the cut (S, T ) with T = {v3 , t}.
Cuts

Cut (S, T) is a partition of V with s ∈ S, t ∈ T.
Net flow f(S, T) = ∑_{u∈S, v∈T} f(u, v).
Capacity c(S, T) = ∑_{u∈S, v∈T} c(u, v). Obviously, f(S, T) ≤ c(S, T).

(Figure: the example network with a flow and a cut (S, T) drawn through it.)

In this example, c(S, T ) = 26, f (S, T ) = 12.


Lemma f (S, T ) = | f |, the value of the flow.

Corollary The value of any flow is bounded by the capacity of


any cut.
Theorem (Max-flow, min-cut) The following properties of a
flow f are equivalent.
1 | f | = c(S, T ) for some cut (S, T ).
2 f is a maximum flow.
3 There are no augmenting paths to f .

The equivalence of the first two statements says that the size of the
maximum flow is equal to the size of the minimum cut.
Proof: 1 ⇒ 2 and 2 ⇒ 3 are obvious. The crucial step is 3 ⇒ 1 .
Given f with no augmenting paths, we construct (S, T ): let S be
the nodes reachable from s in the residual network G f .
Proof of the marriage theorem

Using Max-Flow Min-Cut. Assume there is no perfect matching in


the bipartite graph G = (A ∪ B, E), with |A| = |B| = n. We find a
bottleneck H ⊆ A with |N(H)| < |H|.
Flow network over A ∪ B ∪ {s, t} as before. Since there is no perfect
matching, the maximum flow has size < n. So there is a cut (S, T ),
s ∈ S, t ∈ T , with c(S, T ) < n.
Let H = S ∩ A, H′ = N(H). We have c(H, H′ ∩ T) ≥ |H′ ∩ T|, as each element of H′ ∩ T gets at least one edge from H. Then

n > c(S, T) = c({s}, T) + c(H, H′ ∩ T) + c(H′ ∩ S, {t})
            ≥ n − |H| + |H′ ∩ T| + |H′ ∩ S| = n − |H| + |H′|,

so |H| > |H′|.

(Figure: the flow network with the sets H = S ∩ A, T ∩ A, H′ ∩ S and H′ ∩ T indicated.)
Efficient flow algorithms

• Does the Ford-Fulkerson algorithm terminate? Not necessarily


(if capacities are not integers), unless we choose the
augmenting paths carefully.
• Integer capacities: always terminates, but may take
exponentially long.
Network derived from the bipartite matching problem: each
capacity is 1, so we terminate in polynomial time.
• Dinic-Edmonds-Karp: use breadth-first search for the augmenting paths (a code sketch follows this list). We will analyze it, but the following is more efficient:
• Goldberg: Push-relabel algorithm. Push as much pre-flow as the
capacities bear, accumulating excess in the nodes. Push back the
excesses. The process is regulated by a height function (labels).
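A minimal Python sketch of the augmenting-path method with breadth-first search (the Edmonds-Karp variant mentioned above); capacities are assumed to be given in a matrix cap:

from collections import deque

def edmonds_karp(cap, s, t):
    # Max flow by augmenting along a shortest (BFS) s-t path in the
    # residual network, repeated until no augmenting path remains.
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    value = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:            # no augmenting path: the flow is maximum
            return value
        bottleneck = float('inf')      # residual capacity of the path found
        v = t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:                  # augment along the path
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        value += bottleneck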
Goldberg algorithm

• Goldberg’s algorithm is “greedy”: it will push so much along the edges that it will initially violate the flow property. Excess at point v: e_f(v) = ∑_u f(u, v).
• The flow function is called an s-t pre-flow if all excesses other than at s are nonnegative. (It is a flow if e_f(v) = 0 for all v ≠ s, t.)
Goldberg’s algorithm:
• Will work with pre-flows without an augmenting path.
• Keeps adjusting the pre-flows until all excess is eliminated: with
no augmenting path, we reach optimum.
• Integer labeling function h(v) of nodes called the height.
Let n be the number of nodes.
Following properties will be maintained:
Source and sink heights h(s) = n, h(t) = 0.
Steepness bound If an edge (u, v) is in the residual network G_f then h(u) ≤ h(v) + 1. (Thus if h(u) − h(v) > 1 the edge (u, v) is saturated.)
These imply that there is no augmenting path, since the height
could not sink fast enough along it.
Initialization h(s) ← n, and h(v) ← 0 for all v ≠ s.
    f(e) ← c(e) for every edge leaving s,
    f(e) ← 0 for all other e.
Pushing or relabeling While there is a node u ≠ s, t with e_f(u) > 0
    (for additional efficiency choose one with the highest h(u)):

if there is an edge (u, v) in the residual network G f with


h(u) > h(v) then
push as much of the excess into f (u, v) as c(u, v) allows
else
h(u) ← h(u) + 1

Maintains the source and sink heights and the steepness bound.
• Termination in O(n³) steps: proof below.
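A direct Python transcription of this algorithm (a sketch for small, dense graphs; cap is a capacity matrix and the highest-height rule is used when choosing the active node):

def push_relabel(cap, s, t):
    # Keep a valid height function, push excess along residual edges that
    # go downhill, and relabel (raise) a node when it cannot push.
    n = len(cap)
    f = [[0] * n for _ in range(n)]
    h = [0] * n
    h[s] = n
    excess = [0] * n
    for v in range(n):                      # saturate all edges leaving s
        if cap[s][v] > 0:
            f[s][v] = cap[s][v]
            f[v][s] = -cap[s][v]
            excess[v] += cap[s][v]
            excess[s] -= cap[s][v]
    def active():
        return [u for u in range(n) if u not in (s, t) and excess[u] > 0]
    nodes = active()
    while nodes:
        u = max(nodes, key=lambda v: h[v])  # active node of maximum height
        pushed = False
        for v in range(n):
            if cap[u][v] - f[u][v] > 0 and h[u] > h[v]:
                d = min(excess[u], cap[u][v] - f[u][v])
                f[u][v] += d
                f[v][u] -= d
                excess[u] -= d
                excess[v] += d
                pushed = True
                if excess[u] == 0:
                    break
        if not pushed:
            h[u] += 1                       # relabel
        nodes = active()
    return sum(f[s][v] for v in range(n))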
Example
A path a →(8) b →(5) c →(3) d →(8) e, with the edge capacities in parentheses.
Below, we will write 3(5) for a vertex with height 3 and excess 5, and f/c for the flow on an edge of capacity c. The source a starts at height 5.

5 -8/8-> 0(8) -5->   0    -3->   0    -8->   0
5 -8/8-> 1(3) -5/5-> 0(5) -3->   0    -8->   0
5 -5/8-> 6    -5/5-> 0(5) -3->   0    -8->   0
5 -5/8-> 6    -5/5-> 1(2) -3/3-> 0(3) -8->   0
5 -5/8-> 6(2) -3/5-> 7    -3/3-> 0(3) -8->   0
5 -3/8-> 6    -3/5-> 7    -3/3-> 0(3) -8->   0
5 -3/8-> 6    -3/5-> 7    -3/3-> 1    -3/8-> 0(3)
(Figure: panels (a)–(g) drawing the same run as a sequence of snapshots of the path with its flows.)
Push-relabel on the example

(A sequence of animation frames, best viewed in presentation mode: the algorithm is run on the earlier flow network s, v1, . . . , v4, t. Each frame shows the current heights and excesses of the nodes and the flow f/c on each edge as pushes and relabelings are performed; near the end there are several back-and-forths between v2 and v4 before the remaining excess is pushed back to s.)
Claim If e f (u) > 0 then there is an augmenting path from u to s.

Indeed, let B be the set of vertices u with no augmenting path from


u to s. No flow comes into B, but then no excess can be created in
B, the sum of excesses is 0.
• The claim and the steepness bound imply h(u) ≤ h(s) + n − 1,
hence h(u) ≤ 2n − 1. Hence the number of relabeling
operations is bounded by (n − 2)(2n − 1) ≤ 2n².
A push operation on edge (u, v) is saturating if it results in
f (u, v) = c(u, v).

Claim On each edge (u, v) there are at most n saturating pushes.


So the total number of saturating pushes is ≤ 2mn.

Indeed, between each two saturating pushes, the height must


increase by at least 2 (at the push in the opposite direction
h(v) > h(u)). Now recall the bound 2n − 1 on maximum height.
• We will bound by 4n³ the number of non-saturating pushes.

Claim

a At each value H of the maximum height of nodes with excess,


from each node u of height H there is at most one
non-saturating push.
b H changes at most 4n² times.

To prove a : a nonsaturating push eliminates the excess of u, and u


can get a new excess only from a neighbor with height above h(u).
To prove b : H can also decrease; however, it can increase only by
a relabel operation, of which there are < 2n².
• More sophisticated analysis shows a bound O(n²√m) in place of O(n³).
Edmonds-Karp algorithm

(Not covered in Spring 2020)

Lemma
In the Edmonds-Karp algorithm, the shortest-path distance δ f (s, v)
increases monotonically with each augmentation.

Proof: Let δ_f(s, u) be the distance of u from s in G_f, and let f′ be the augmented flow. Assume, by contradiction, δ_{f′}(s, v) < δ_f(s, v) for some v; let v be the one among these with smallest δ_{f′}(s, v). Let u → v be a shortest path edge in G_{f′}, and

d := δ_f(s, u) (= δ_{f′}(s, u)); then δ_{f′}(s, v) = d + 1.

Edge (u, v) is new in G_{f′}; so (v, u) was a shortest path edge in G_f, giving δ_f(s, v) = d − 1. But δ_{f′}(s, v) = d + 1 contradicts δ_{f′}(s, v) < δ_f(s, v).
An edge is said to be critical, when it has just been filled to capacity.

Lemma Between every two times that an edge (u, v) is critical,


δ f (s, u) increases by at least 2.

Proof: When it is critical, δ_f(s, v) = δ_f(s, u) + 1. Then it disappears until some flow f′. When it reappears, then (v, u) is critical, so

δ_{f′}(s, u) = δ_{f′}(s, v) + 1 ≥ δ_f(s, v) + 1 = δ_f(s, u) + 2.

Corollary We have a polynomial algorithm.

Proof: Just bound the number of possible augmentations, noticing


that each augmentation makes some edge critical.
Let n = |V |, m = |E|. Each edge becomes critical at most n/2 times.
Therefore there are at most m · n/2 augmentations. Each
augmentation may take O(m) steps: total bound is

O(m²n).

There are better algorithms: Goldberg’s push-relabel algorithm,


also given in your book, achieves O(n³).
Project selection

Network flow theory has many applications. Sometimes it takes


ingenuity to apply it: see the following example.
• Set of possible projects to choose from: P = {1, 2, . . . , n}.
Project i brings profit pi : positive or negative (then it is a cost).
• Dependencies: acyclic directed graph G = (P, E).
Edge (i, j): project i requires project j, too.
• Not a time ordering. Example:
4: a wedding shower in which we could collect p4 = 1000
dollars, but then we have to:
• 5: buy food beforehand for −p5 = 200 dollars, and
• 8: clean up afterwards for −p8 = 250 dollars.
(Figure: an example dependency graph; each node is a project labeled with its profit (positive) or cost (negative), and edges point to the projects it requires. A feasible set of projects is marked in green.)

A subset S ⊆ P is feasible if with every project in it, it contains all


others on which it depends: u ∈ S, (u, v) ∈ E ⇒ v ∈ S.
Example: the set of green projects.
Goal: A feasible set S with maximum total profit p(S) = ∑_{i∈S} p_i.
The solution introduces a flow network. Add source and sink s, t.
Capacities:
1 Edges (i, j) ∈ E have capacity c(i, j) = ∞.
2 Edges (s, i) with pi > 0 have capacity c(s, i) = pi .
3 Edges ( j, t) with p j < 0 have capacity c( j, t) = −p j .
A cut (S, T) has finite capacity if and only if S′ = S \ {s} is feasible.

(Figure: the example network with the source s at the bottom, the sink t at the top, and a finite-capacity cut drawn through it.)

Cut with capacity (7 + 9) + (1 + 2 + 3 + 4 + 2).


If S is feasible:

c(S, T) = ∑_{i∈T: p_i>0} p_i − ∑_{j∈S: p_j<0} p_j.    (5)

Obvious upper bound on the total profit: C = ∑_{i: p_i>0} p_i.

Claim p(S′) = C − c(S, T).

Indeed: the first sum of (5) is the amount of profits we lose, and
the second sum is the amount of the costs we incur.
• To maximize the profit, find a minimum cut.
Randomized algorithms

• A randomized algorithm uses some source of randomness as its


input, in addition to the input data. Surprisingly, this frequently
helps.
• There are cases when no deterministic algorithm can solve a
certain problem, but more frequently, bringing in randomness
results in a more efficient algorithm.
• We will have to learn (or recall) some facts from basic
probability theory along the way.
Contention resolution

• Assume that n processes P1 , . . . , Pn must access a database. This


happens in rounds. In each round, only one access is possible:
if more than one process makes an attempt, they all fail. There
is no communication between them, and no coordinator to help
them.
• Idea: each process makes an attempt in each round, with
some probability 0 < p < 1 (say p = 0.1). We want to estimate
the time it takes for all processes to succeed with high
probability.
Events and their probabilities

• When we randomize, certain events acquire probabilities. The


probability of event A is denoted generally by Pr(A).
• For example, let event A(i, t) happen if process Pi makes an
attempt at time t. By definition, Pr(A(i, t)) = p.
• For an event A, let ¬A be the event that A does not happen.
Then Pr(¬A) = 1 − Pr(A). For example, the probability that
process Pi does not make an attempt at time t is 1 − p.
• For events A, B, let A ∪ B be the event that at least one of these
happens. Knowing Pr(A) and Pr(B) does not generally suffice
to know Pr(A ∪ B), but at least we know the union bound

Pr(A ∪ B) ≤ Pr(A) + Pr(B).

This becomes an equality for mutually exclusive events, A, B,


that is if A ∩ B = ∅ (see next).
• For two events A, B we write A ∩ B for the event that occurs if
both A and B occur. If Pr(A) > 0 then we denote by

Pr(B|A) = Pr(A ∩ B) / Pr(A)

the conditional probability that B occurs provided that A


occurs.
For example, the conditional probability that the six-sided die
comes up with an even number of points provided it shows
more than 1, is 3/5.
• We say that B is independent of A if Pr(B|A) = Pr(B), that is if

Pr(A ∩ B) = Pr(A) · Pr(B).

In general, knowing the probabilities of A and B does not allow


yet finding the probability of A ∩ B. But in this case it does.
If A is independent of B then for example also ¬A is
independent of B: the answer of any question about A is
independent of the answer of any question about B.
• We say that event C is independent of events A, B if its
conditional probability is the same no matter what we assume
about A, B. So

Pr(C) = Pr(C|A ∩ B) = Pr(C|A ∩ (¬B)) = Pr(C|B) = · · ·

We say that the set of events A, B, C is independent if each of


them is independent of the rest.
• This is equivalent to saying that no matter what we ask about
A, B and C, the probability of the combined event is the
product of the probabilities of its constituents.
For example, we assume that each process attempts at time t
independently with probability p. Then the probability that
process 1 attempts and processes 2,3 don’t is

Pr(A(1, t) ∩ ¬A(2, t) ∩ ¬A(3, t)) = p(1 − p)(1 − p).


Bayes’s Theorem

Let B1 , . . . , Bn be mutually exclusive events of positive probability


such that Pr(B1 ) + · · · + Pr(Bn ) = 1. Then for an arbitrary event A
the following holds:

Pr(A) = Pr(A | B1 )Pr(B1 ) + · · · + Pr(A | Bn )Pr(Bn ).

This fact is sometimes called Bayes’s Theorem, or the theorem of


total probability.
What is the probability of the event S(i, t) that at time t process Pi
attempts and the other processes don’t (so Pi succeeds)?

Pr(S(i, t)) = Pr( A(i, t) ∩ ⋂_{j≠i} ¬A(j, t) ) = p(1 − p)^{n−1}.

The best strategy is to choose the value p that maximizes this.


Calculus shows that the maximizing choice is p = 1/n, and then we get

Pr(S(i, t)) = (1/n) (1 − 1/n)^{n−1}.
We will estimate this also using calculus.
Estimating (1 + x) for products

Below we will explain the following two important inequalities


from calculus:
e^{x/(1+x)} ≤ 1 + x ≤ e^x.    (6)

Now, let us just apply them. Writing x = −1/n gives


e^{−1/(n−1)} ≤ 1 − 1/n ≤ e^{−1/n}.    (7)
Hence

e^{−1} ≤ (1 − 1/n)^{n−1},
1/(en) ≤ (1/n)(1 − 1/n)^{n−1} = Pr(S(i, t)).
Let F(i, t) be the event that process Pi does not succeed in any of the rounds 1, . . . , t: F(i, t) = ⋂_{r=1}^{t} ¬S(i, r). The events at different times are also independent of each other:

Pr(F(i, t)) = ∏_{r=1}^{t} (1 − Pr(S(i, r))) ≤ (1 − 1/(en))^t.

Using the second inequality of (7):

Pr(F(i, t)) ≤ (1 − 1/(en))^t ≤ e^{−t/(en)}.

Let F_t = ⋃_{i=1}^{n} F(i, t) be the event that some process does not succeed by time t ≥ k · en. By the union bound:

Pr(F_t) ≤ n · e^{−k}.

Choosing for example k = 2 ln n, we get the probability bound


n · n^{−2} = 1/n. So if not every process succeeded within 2en ln n
steps, then some rare disaster of probability < 1/n happened.
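A small simulation of the protocol in Python (for intuition only; the bound 2en ln n is the one derived above):

import math
import random

def contention_resolution(n, seed=0):
    # Each of n processes attempts in every round independently with
    # probability 1/n; a round succeeds for process i if i alone attempts.
    # Returns the number of rounds until all processes have succeeded.
    random.seed(seed)
    pending = set(range(n))
    rounds = 0
    while pending:
        rounds += 1
        attempts = [i for i in range(n) if random.random() < 1.0 / n]
        if len(attempts) == 1 and attempts[0] in pending:
            pending.discard(attempts[0])
    return rounds

n = 50
print(contention_resolution(n), "rounds; bound from the analysis:",
      round(2 * math.e * n * math.log(n)))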
Estimating 1 + x

The above application is typical for the calculations you encounter


in probability theory. When many events are involved, the formulas
needed to calculate the probabilities become complex. The tools of
calculus are used to approximate them. Now let us prove the
inequalities (6) used above.
The important inequality 1 + x ≤ e x comes from the convexity of
the exponential function: the tangent line y = 1 + x of the curve
y = e x is below it. Inverting the same inequality gives a bound
from the other side:
1/(1 + x) = 1 − x/(1 + x) ≤ e^{−x/(1+x)},
1 + x ≥ e^{x/(1+x)}.
Random variables

In a probability space, a random variable is some quantity X such


that events of the kind a ≤ X < b have probabilities assigned to
them.

Example We toss a 6-sided die twice. Let X i be the number of


points coming up in the ith toss. Then Pr { X i = j } = 1/6 for
j = 1, . . . , 6. Let Y = X 1 + X 2 . Then

i            2     3     4     5     6     7     8     9    10    11    12
Pr { Y = i } 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Expected value

If the outcome of some experiment is a number X that can have


values x 1 , x 2 , . . . with probabilities p1 , p2 , . . . respectively, then the
expected value of X is defined as E X = p1 x 1 + p2 x 2 + . . ..

Examples
• If Z is a random variable whose values are the possible
outcomes of a toss of a 6-sided die, then

E Z = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.

• If Y is the random variable that is 1 if Z ≥ 5, and 0 otherwise,


then

E Y = 1 · Pr { Z ≥ 5 } + 0 · Pr { Z < 5 } = Pr { Z ≥ 5 } .
Sum theorem

Theorem For random variables X , Y (on the same sample


space):

E(X + Y ) = E X + E Y.

Example For the number X of spots on top after a toss of a die,


let A be the event 2|X , and B the event X > 1. Dad gives me a
dime if A occurs and Mom gives one if B occurs. What is my
expected win?
Let IA be the random variable that is 1 if A occurs and 0 otherwise.

E(IA + IB ) = E IA + E IB = Pr(A) + Pr(B) = 1/2 + 5/6 dimes.


Chebyshev-Markov inequality

We want to estimate the probability that runtime is much larger


than the expected value. The following theorem helps:

Theorem (Chebyshev-Markov inequality) Let X ≥ 0 be a


random variable with E X = µ, and let c > 0 be any constant. Then

Pr { X > cµ } ≤ 1/c.

Indeed, let X′ = 0 when X ≤ cµ and X′ = cµ when X > cµ. Then

cµ · Pr { X > cµ } = E X′ ≤ E X = µ.

Example: the probability that some randomized algorithm takes


longer than 40n is less than 1/10, since the expected time is
bounded by 4n. (There are much better estimates for this
probability, but they use similar principles.)
Geometric random variable
• How many times do you have to toss a 6-sided die to have the
number 2 come up? The probability that it comes up is 1/6
each time. The number of tosses needed is a random variable.
• In general, for some probability p > 0, consider repeated
independent experiments in which the probability of success is
always p. Let T be the (random) number of experiments
needed until the first success. Then

Pr { T = k } = p(1 − p)^{k−1}.

This is called the geometric random variable.


• Many applications need the expected value of T . It is

E T = ∑_{k=1}^{∞} k p(1 − p)^{k−1} = 1/p.

So the expected number of throws of the die is 6.


Conditional expectation
Using conditional probabilities, one can also define the conditional
expected value E[X | A] of some random variable X with respect to
some event A of positive probability. The analogue of Bayes’s
Theorem holds for this case. Let B1 , . . . , Bn be mutually exclusive
events of positive probability such that Pr(B1 ) + · · · + Pr(Bn ) = 1.
Then for an arbitrary random variable X we have

E X = E[X | B1 ]Pr(B1 ) + · · · + E[X | Bn ]Pr(Bn ). (8)

Example Toss a die. If 1 comes up (event E) then toss it until an


even number comes up. Otherwise toss it until a 5 or 6 comes up.
Let S be the number of tosses after the first one to finish.

E S = E[S | E]Pr(E) + E[S | ¬E]Pr(¬E)


= 2 · (1/6) + 3 · (5/6).
Randomized caching

• A cache in a computer is a part of memory with very fast access


time. It contains a limited number k of items of the same size.
• The runtime system faces a sequence of requests for memory
items: σ = s1 , s2 , . . . , sn .
• If a memory item s is requested that is not in the cache then this
is a cache miss. Then it takes much longer to bring s into the
cache. Also, it will evict some other item from the cache.
• Next time when s is requested again, it can be accessed fast
(provided it has not been evicted from the cache in the meantime).
• Looking for a good eviction policy, minimizing the number of
cache misses.
• In case the whole sequence σ of requests is known then there is
an optimal policy: evict an item that will be requested farthest
in the future. (Its optimality was proved in the greedy
algorithms section of Kleinberg-Tardos.)
• This policy is offline: it must know all future requests in
advance. Normally this sequence is not known in advance: we
need an online policy, deciding only on the basis of the past, not
the future.
• Example: evict the least recently used item (LRU).
• Every online policy will be good on some sequences and bad on
others. How to compare them?
Idea: compare the performance to that of the ideal offline policy
(evict the item requested farthest in the future). We call this
number of cache misses the offline optimum.

Claim For every (deterministic) online policy there is a sequence


σ that produces a number of cache misses k times larger than the
offline optimum.

Indeed, let the sequence σ = s1 s2 , . . . , sn consist of repeated


requests to items 1, 2, . . . , k + 1.
• If si+1 is made to be the item evicted at time i then every
request results in a cache miss.
• But the farthest-in-future policy can always answer at least
k − 1 requests before the next cache miss.
Here is a randomized eviction policy. We will prove that its
expected number of cache misses is at most O(log k) times larger
than the offline optimum.
• The policy will mark some of the items in the cache. It works as
follows, when an item x is requested.

if x is not in the cache and all items in the cache are marked
then
remove all marks
mark x
if x is not in the cache then
evict a random unmarked item
Example Cache size k = 4.
The requests: 1, 8, 9, 4, 1, 3, 4, 3, 5, 6, 1.

A star (∗) shows marking. Cache and request:

(1, 8, 9, 4) ← 1,      (1∗, 8, 9, 4) ← 3,
(1∗, 8, 3∗, 4) ← 4,    (1∗, 8, 3∗, 4∗) ← 3,
(1∗, 8, 3∗, 4∗) ← 5,   (1∗, 5∗, 3∗, 4∗) ← 6,
(1, 6∗, 3, 4) ← 1,     (1∗, 6∗, 3, 4).
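The marking policy in Python (a sketch we add for illustration; eviction picks a uniformly random unmarked item):

import random

def marking_cache(requests, k, seed=0):
    # On a miss, if everything is marked, unmark all (new phase); then mark
    # the requested item and, on a miss, evict a random unmarked item.
    # Returns the number of cache misses.
    random.seed(seed)
    cache, marked, misses = set(), set(), 0
    for x in requests:
        if x not in cache and marked == cache:
            marked = set()                       # start a new phase
        if x not in cache:
            misses += 1
            if len(cache) >= k:
                victim = random.choice(sorted(cache - marked))
                cache.discard(victim)
            cache.add(x)
        marked.add(x)
    return misses

print(marking_cache([1, 8, 9, 4, 1, 3, 4, 3, 5, 6, 1], 4))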
Analysis

Let us lower-bound the optimal offline number of cache misses.


• Divide the sequence of requests into phases. Each phase starts
with a unmarked cache and ends once the marks are removed.
• A requested item is called fresh if it has not been marked in the
previous phase. Else it is called stale.
• In phase j, let c j be the number of fresh requests.
Let r be the total number of phases.
Claim The optimal number of cache misses is at least (1/2) ∑_{j=1}^{r} c_j.

This follows (with some reasoning) from the fact that no matter
what policy is used, in phases j − 1 and j together there are k + c j
distinct element requests, hence at least c j cache misses.
Claim The expected number of cache misses in our algorithm is
at most
(H(k) + 1) ∑_{j=1}^{r} c_j,

where H(k) = 1 + 1/2 + · · · + 1/k = O(log k).

To prove it, estimate E X j where X j is the number of cache misses in


phase j.
• Each of the c j fresh requests is a cache miss.
• Consider the stale requests. For the ith request to some distinct
stale item s, let A_i = 1 if it is a cache miss and 0 otherwise. The
number of stale cache misses is ∑_i A_i. The expected value of
this number is ∑_i E A_i.
• E Ai = Pr { the ith request is a cache miss }. We will estimate
this probability.
(Figure: a phase of requests with k = 16; red requests are fresh (c = 2 of them so far), and the blue request, the i = 4th distinct stale item requested in the phase, is a stale cache miss.)

Red requests are fresh. Blue request is a stale cache miss.


• From the k items in the cache in phase j − 1 (these are now the
stale ones), already i − 1 have been marked in phase j, so
k − i + 1 are unmarked.
• Suppose that c ≤ c j fresh requests were already made. Then c
elements among the k − i + 1 have been evicted.
• The set of c evicted elements is chosen uniformly from all
subsets of size c. So the probability for the ith request to belong
to this set is
c/(k − i + 1) ≤ c_j/(k − i + 1).

E X_j ≤ c_j + c_j (1/k + 1/(k−1) + · · ·) = c_j + c_j ∑_{i=1}^{k} 1/i = c_j (1 + H(k)).

Calculus shows H(k) = ln k + O(1), but upper bound is easier:

H(k) = 1 + 1/2 + · · · + 1/k


= 1 + (1/2 + 1/3) + (1/4 + · · · + 1/7) + . . .
≤ 1 + 1 + 1 + . . . (log k times).

For a sequence of requests σ let f (σ) be the optimal number of


cache misses. We proved:

Theorem On σ, the randomized caching algorithm achieves an expected number of cache misses that is O(log k) · f(σ).
Randomized selection

(Not covered in Spring 2020)


Let A be an array of n numbers a1 , . . . , an . For simplicity assume
they are all distinct.
• The median is the ⌊n/2⌋th number in the sorted order of
elements of A.
• The task of looking for the median is solved by sorting A. But
that takes n log n steps (we count comparisons). Is not there a
faster method?
• Some faster methods are based on divide-and-conquer. Since
they are recursive, we first generalize: we want to compute
Select(A, k), the kth element in the sorted order.
Idea:

GenericSelect(A, k):
if n = k = 1 then return a1
Pick some index r (by some method).
A1 ← all elements of A that are ≤ a r
A2 ← all elements of A that are > a r
if k ≤ |A1 | then
return GenericSelect(A1 , k)
else
return GenericSelect(A2 , k − |A1 |)
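A Python sketch of this scheme with the random choice of r discussed below (assumes distinct elements):

import random

def randomized_select(A, k):
    # Return the kth smallest element of A (1-based); expected O(n) time.
    if len(A) == 1:
        return A[0]
    pivot = random.choice(A)                 # the random choice of a_r
    A1 = [a for a in A if a <= pivot]        # elements <= pivot
    A2 = [a for a in A if a > pivot]         # elements > pivot
    if k <= len(A1):
        return randomized_select(A1, k)
    return randomized_select(A2, k - len(A1))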

If we could guarantee max(|A1|, |A2|) ≤ αn for some constant α < 1, then the recursive analysis would promise an O(n) estimate.
But how to pick such a number r? Idea: Pick a random r, hoping it
is good enough on average.
But how to pick such a number r? Idea: Pick a random r, hoping it
is good enough on average.
RandomizedSelect(A, k) is essentially the same as GenericSelect(A, k),
except that r is a random variable R with Pr { R = i } = 1/n for
i = 1, . . . , n. Let X be the running time, and T(n) its expected
value. Using Bayes's Theorem (8) for conditional expectations:

T (n) = E X = E[X | R = 1]Pr { R = 1 } + · · · + E[X | R = n]Pr { R = n } .

If R = i then the recursion is called either in A1 or in A2 ; in the


worse case in the larger one.
Adding the cost n of partitioning into A1 and A2 :

E[X | R = i] ≤ n + T (max(i, n − i)),


T(n) ≤ n + (1/n) (T(n − 1) + T(n − 2) + · · · + T(n − 2) + T(n − 1))
     = n + (2/n) (T(n − 1) + T(n − 2) + · · · + T(n/2 + 1)).
We know T(1) = O(1). Instead of resolving the complex estimate

T(n) ≤ n + (2/n) (T(n − 1) + T(n − 2) + · · · + T(n/2)),
try to prove T (n) ≤ cn for some constant c: try to choose a c that
allows proving this by induction:

T(n) ≤ n + (2c/n) ((n − 1) + (n − 2) + · · · + n/2).

The sum of an arithmetic series of k terms with starting term a and
ending term b is k(a + b)/2. So the sum inside is (n/2) · ((n − 1) + n/2)/2 < 3n²/8.
Substituting:

T(n) < n + (2c/n) · (3n²/8) = n(1 + 3c/4).
Choosing c with 1 + 3c/4 = c, that is c = 4, completes the proof.
• The Kleinberg-Tardos book gives a different proof of
T (n) = O(n), also worth seeing.
• Similarly, it gives a different proof than the one given below that
randomized Quicksort has expected running time O(n log n).
Average-case analysis of Quicksort

(Not covered in 2020)


Please, review the definition of deterministic and randomized
Quicksort from your textbook. What is randomized is the choice of
the pivot elements.
The meaning of “average” is different for the two cases.
• In the deterministic case the running time on the worst possible
input order is Ω(n2 ). Averaging is over all possible input orders:
that average is O(n log n).
However, it is not sufficiently reassuring to say that Quicksort
works well on random orders. The typical arrays that we will
have to sort are probably not random at all, for example they
may be partially sorted.
• If we introduce randomness ourselves (we randomize), we
don’t have to rely on the fact that the input array is random.
The expected running time means averaging over all possible
choices of the pivots. We do not average over inputs—this
works for every possible input. The average obtained is
O(n log n). The worst case (taking the worst possible pivot
sequence) is still Ω(n2 ).
Let the sorted order be z1 < z2 < · · · < zn . If i < j then let

Zi j = {zi , zi+1 , . . . , z j }.

Let the random variable Ci j be defined to be 1 if zi and z j will be


compared sometime during the sort, and 0 otherwise.
Every comparison happens during some partition, with the pivot
element. Let πi j be the first (random) pivot element entering Zi j . A
little thinking shows:

Lemma We have Ci j = 1 if and only if πi j ∈ {zi , z j }. Also, for


every x ∈ Zi j , we have

Pr{ π_ij = x } = 1/(j − i + 1).
It follows that Pr{ C_ij = 1 } = E C_ij = 2/(j − i + 1). The expected number of comparisons is

∑_{1≤i<j≤n} E C_ij = ∑_{1≤i<j≤n} 2/(j − i + 1) ≤ 2(n − 1) (1/2 + 1/3 + · · · + 1/n).

From analysis we know that the harmonic function


H(n) = 1 + 21 + 13 + · · · + 1n = ln n + O(1). Hence the average
complexity is ≤ 2n ln n = O(n log n).
Hash tables
(Not covered in Spring 2020)
Problem A very large universe U of possible items, each with a
different key. The set S ⊂ U of n of actual items that may
eventually occur is much smaller. We want to store the items in
a data structure in a way that they can be
• stored fast as they come, and found fast later.
Solution ideas • Balanced search trees: you must have seen
them in a data structures course.
• Hash function h(k), hash table T [0 . . m − 1]. Key k hashes
to hash value (bucket) h(k). This can be in practice faster,
but its analysis is more complex.
Problem with hashing Collisions.
Resolution Chaining, open hashing, and so on.
Uniform hashing assumption • Items arrive “randomly”.
• Search takes Θ(1 + n/m), on average, since the average
list length is n/m.
Hash functions

What do we need? The hash function should spread the


(hopefully randomly incoming) elements of the universe as
uniformly as possible over the table, to minimize the chance of
collisions.
Keys into natural numbers It is easier to work with numbers than
with words, so we translate words into numbers.
For example, a string of bytes can be treated as a base 256
integer, possibly adding up such integers for different segments
of the word.
Randomization: universal hashing

To guarantee the uniform hashing assumption, instead of assuming


that items arrive “randomly”, we choose a random hash
function, h(·, r), where r is a parameter chosen randomly from
some set H.

Definition The family h(·, ·) is universal if for all x ≠ y ∈ U we


have
Pr { h(x, r) = h(y, r) } ≤ 1/m.

If the values h(x, r) and h( y, r) are pairwise independent, then the


probability is exactly 1/m (the converse is not always true). Thus,
from the point of view of collisions, universality is at least as good
as pairwise independence.
Using a universal hash function

Once we have chosen the random parameter r, we will keep it


fixed.
• No matter how we fix r, if the universe U is big then there is
some set S r ⊂ U with |S r | = n such that h(k, r) maps all
elements of S r to the same position in the table.
• But if we assume that somehow the set S ⊂ U was fixed before
we choose r (we just don’t know S), or chosen independently of
r, then universality helps bounding the expected number of
collisions.
An example universal hash function

We assume that our table size m is a prime number.


(There are tables of prime numbers, it will be easy to find one
between, say, m and 4m: by Chebyshev’s theorem for every k there
is a prime between k and 2k.)
Let d > 0 be an integer dimension. We break up our key x into a
sequence

x = (x 1 , x 2 , . . . , x d ), 0 ≤ x i < m.

(If x is a bit string, break it into segments of size log m.) Fix the
random coefficients 0 ≤ ri < m, i = 1, . . . , d, therefore the number
of possible random inputs is |H| = md .

h(x, r) = r1 x 1 + · · · + rd x d mod m.
We use the notation a ≡ b (mod m) for a mod m = b mod m. This
is the same as requiring m|(a − b).

Fact Let p be a prime number, d ≢ 0 (mod p), and ad ≡ bd (mod p); then a ≡ b (mod p).

Indeed, by the fundamental theorem of arithmetic, if a prime


number divides a product, it must divide one of its factors. Here, p
divides (a − b)d. It does not divide d, so it divides a − b.
Let us show that our random hash function is universal. Assume
(x1, . . . , xd) ≠ (y1, . . . , yd). We show that
Pr { h(x, r) = h(y, r) } ≤ 1/m. There is an i with xi ≠ yi; we might
as well assume x1 ≠ y1. If h(x, r) = h(y, r) then

0 ≡ h(x, r) − h( y, r) ≡ r1 (x 1 − y1 ) + A (mod m),


A ≡ r1 ( y1 − x 1 ) (mod m),

where A only depends on the random numbers r2 , . . . , rd . No


matter how we fix r2 , . . . , rd , there are m equally likely ways to
choose r1 . According to the Fact above, only one of these choices
gives r1 ( y1 − x 1 ) ≡ A (mod m), so the probability of this happening
(conditionally on fixing r2 , . . . , rd ) is 1/m. Since this probability is
the same under all conditions, it is equal to 1/m.
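A small Python sketch of this family (m must be prime; the key is given as a tuple of d base-m digits):

import random

def make_universal_hash(m, d):
    # Return a random member h(x) of the family
    # h(x, r) = (r_1 x_1 + ... + r_d x_d) mod m, with m prime and
    # x a tuple of d integers in [0, m).
    r = [random.randrange(m) for _ in range(d)]
    def h(x):
        return sum(ri * xi for ri, xi in zip(r, x)) % m
    return h

# example: a table of prime size m = 97, keys split into d = 4 digits
h = make_universal_hash(97, 4)
print(h((12, 34, 56, 78)))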
Closest pair of points revisited

(Not covered in Spring 2020)


Using hashing, we will give a more efficient and more general
algorithm to find the closest pair of points among n points.
Set of points P in the plane, say inside the unit square [0, 1] × [0, 1].
New strategy, using an appropriate data structure P(δ).
• We proceed by stages. In each stage, shortest distance upper
bound δ.
• Take points p1 , p2 , . . . from P in random order. For point pi
check whether it is closer than δ to any of the points
p1 , . . . , pi−1 in the data structure P(δ).
If yes, update the structure P(δ) with the new distance δ.
Otherwise store pi into P(δ).
Questions:
• What is the data structure?
• How long does this take, on average?
Partition the unit square into a grid of subsquares of side δ/2:
they can be denoted as S_δ(k, l) for k, l = 1, . . . , ⌈2/δ⌉. To each
point p ∈ P, let

Q δ (p)

be the square Sδ (k, l) in the partition where it belongs to.


• P(δ) is a hash table for the squares Q δ (pi ), for i = 1, 2, . . . as
keys. With each key Q δ (pi ) we store pi as a value. There is no
p j with j < i in the same square, since then we would have
d(pi , p j ) < δ.
• If for the new pi we have d(pi , p j ) < δ for some j < i, then p j is
in one of the 25 squares centered around Sδ (k, l) = Q δ (pi ): so
we check only the 25 elements Sδ (k0 , l 0 ) in the table, where
|k − k0 |, |l − l 0 | ≤ 2.
• If a new shortest distance δ is found then the new squares
Q δ (p j ) are reinserted into the new table P(δ) for j = 1, . . . , i.
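A Python sketch of this strategy (our own illustration, using a dictionary as the hash table P(δ); assumes the points are distinct):

import random
from math import hypot

def closest_pair(points):
    # Keep the current smallest distance delta, store each point under its
    # delta/2 grid cell, and compare a new point only with the points in the
    # 5x5 block of cells around its own cell.
    pts = points[:]
    random.shuffle(pts)

    def cell(p, d):
        return (int(p[0] // (d / 2)), int(p[1] // (d / 2)))

    def build(prefix, d):
        # at most one point per cell, since stored points are >= d apart
        return {cell(q, d): q for q in prefix}

    delta = hypot(pts[0][0] - pts[1][0], pts[0][1] - pts[1][1])
    table = build(pts[:2], delta)
    for i in range(2, len(pts)):
        p = pts[i]
        cx, cy = cell(p, delta)
        best = delta
        for dx in range(-2, 3):
            for dy in range(-2, 3):
                q = table.get((cx + dx, cy + dy))
                if q is not None:
                    best = min(best, hypot(p[0] - q[0], p[1] - q[1]))
        if best < delta:
            delta = best
            table = build(pts[:i + 1], delta)   # rebuild with the new delta
        else:
            table[cell(p, delta)] = p
    return delta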
Running time

Lookups and distance computations: At most 25 for each point.


Insertions: Let X_i = 1 if stage i causes the shortest distance to change, 0 otherwise. Stage i has 1 + i X_i insertions, so the total is

n + ∑_{i=1}^{n} i X_i.

Lemma E X i = 2/i.

Indeed, let p, q be the closest pair of points among p1 , . . . , pi .


Pr { p or q comes last } = 2/i. Expected number of insertions:

n + ∑_{i=1}^{n} i · (2/i) = 3n.
Polynomial reduction, formally
• The computational problems in question are given by some
mapping from (say, binary) string inputs, or instances to string
outputs: so it is a function A : Σ∗ → Σ∗ where Σ is some finite
alphabet.
• Even if we talk about other objects as inputs or outputs (graphs,
numbers), they will eventually be encoded into strings for
processing by our machines.
• We say that polynomial-time computable functions τ, φ
reduce problem A to problem B if for every possible input x, we
have

A(x) = φ(B(τ(x))).

So we encode the input x of A into an input τ(x) of problem B,


and then transform the solution y = B(τ(x)) into a solution
φ( y) of problem A. (If A(x), B(x) ∈ {0, 1} then we don’t need
the transformation φ.)
• Sometimes we allow for a more general kind of reduction: a
polynomial-time algorithm solving A in which asking and
(magically) getting an answer B( y) on any particular input y
counts only as one step.
• The imaginary “black box” device that answers our queries is
sometimes called an oracle, and the computation using it an
oracle computation.
When a reduction exists from problem A to problem B we can write
A ≤ B. This relation is transitive (creates a partial order): if A ≤ B
and B ≤ C then A ≤ C. Indeed, suppose that
• A is reduced to B with the help of the polynomial-time functions
τ1 , φ1 by A(x) = φ1 (B(τ1 (x))).
• Similarly B is reduced to C by B( y) = φ2 (C(τ2 ( y))).
The functions τ(x) = τ2 (τ1 (x)) and φ( y) = φ1 (φ2 ( y)) are also
polynomial-time, hence we have a polynomial reduction
A(x) = φ(C(τ(x))).
Some more examples:
• The sequence alignment problem was defined earlier.
Input: two sequences X = (x 1 , . . . , x m ), Y = ( y1 , . . . , yn ), of
symbols in some alphabet Γ , and penalties δ for deletion, and
distance function d(x, y) over the alphabet Γ (as penalty of
mismatching symbols).
Output: two sequences of indices i1 , . . . , ik , j1 , . . . , jk at which
the sequences must be aligned for minimizing the total penalty.
• Reduction: to the shortest path problem in a graph G(X , Y, δ, d)
with edge lengths, with source and destination vertices s, t. Any
algorithm finding a shortest path in G from s to t finds an
optimal alignment between X and Y .
(Figure: the grid of points (i, j) as before.) For X = (x1, . . . , xm) and Y = (y1, . . . , yn): points (i, j), where i represents the position between xi and xi+1.

• Horizontal and vertical edges have length δ.


• Diagonal edge ending in (i, j) has length d(x i , y j ).
• Now A[m, n] is the length of the shortest path from the left top
to the right bottom. (Our algorithm is not slower than Dijkstra’s
for computing it.)
• Find the alignment going backwards using A[i, j], or record the
parents while computing the distances A[i, j] (as in Dijkstra).
There are two very different uses of reduction from a problem A to
problem B.
1 We may have a good algorithm for solving B, and now the
reduction provides also a good algorithm for solving A.
2 We know (or suspect) that there is no good algorithm for
solving A. By the reduction now we know (or suspect) that
there is no good algorithm for solving B either, since then the
reduction would provide also a good algorithm for B.
In NP-completeness theory we mostly use reductions for the second
purpose.
We discussed the subset sum problem as special case of the
knapsack problem: Given positive integers a1 , . . . , an , c, decide
whether there are x 1 , . . . , x n ∈ {0, 1} such that

a1 x 1 + · · · + an x n = c. (9)

A different problem where we have 2 equations to satisfy:

a1 x 1 + · · · + an x n = c,
(10)
b1 x 1 + · · · + bn x n = d.

Of course (9) can be reduced to (10) in a trivial way: just let the
second equation be, say, 0=0 (or equivalent to the first one).
But we will reduce (10) to (9) also, so (9) is just as difficult as (10).
a1 x 1 + · · · + an x n = c, (11)
b1 x 1 + · · · + bn x n = d. (12)

Multiply the second equation by M = a1 + · · · + an + c + 1 and add


it to the first one:

(a1 + M b1 )x 1 + · · · + (an + M bn )x n = c + M d. (13)

Let us see that if x 1 , . . . , x n solves (13) then it solves both (11)


and (12). The remaindering distributes into the sum:

(u + v) mod M = ((u mod M ) + (v mod M )) mod M . (14)

Taking the remainder of the two sides of (13) modulo M and


using (14) repeatedly, we get back (11). Indeed, for example
a1 + M b1 mod M = a1 since a1 < M . Subtracting (11) from (13)
and dividing by M we get back (12).
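A small Python check of the reduction on an example (brute force over all 0-1 assignments; the numbers are made up for illustration):

from itertools import product

def combine(a, c, b, d):
    # Combine the two equations into one, as above: multiply the second
    # by M = a_1 + ... + a_n + c + 1 and add it to the first.
    M = sum(a) + c + 1
    return [ai + M * bi for ai, bi in zip(a, b)], c + M * d

a, c = [3, 5, 2], 7      # first equation
b, d = [1, 4, 6], 5      # second equation
A, C = combine(a, c, b, d)
for x in product([0, 1], repeat=3):
    both = (sum(ai * xi for ai, xi in zip(a, x)) == c and
            sum(bi * xi for bi, xi in zip(b, x)) == d)
    one = sum(Ai * xi for Ai, xi in zip(A, x)) == C
    assert both == one   # the combined equation has exactly the same solutions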
NP problems

Many of the problems we have seen, even if they are not solvable
polynomially, possess a weaker but useful property: if a solution is
offered, we can use it, we don’t have to trust it blindly: it can be
verified in polynomial time.

Examples
• Perfect matching.
• Compositeness of an integer.
• Large independent set in a graph.
• An NP problem is given by a verification relation

V (x, w)

over strings x, w.
x is the input, or instance.
w is a potential witness, or certificate (or a “solution”).
• We require V to be computable in time polynomial in the length
of the input x.
In detail: there is a constant c such that for every n, every input
string x of length n, the response V (x, w) is computable in time
O(n^c). String w is a witness if |w| = O(n^c), and V(x, w) = True.
Examples
Perfect matching problem: input is a bipartite graph G. Potential
witness: a set of edges M . Verification function: checks
whether M is a perfect matching for G (and thus a witness).
Compositeness problem: input is an integer x. Potential witness:
an integer w. Verification function: checks whether w is a
proper divisor of x.
Large independent set problem: input is a pair (G, k), graph G
and integer k. Potential witness: a set U of vertices in G.
Verification function: checks whether U is an independent set
of size ≥ k.
Decision problems, search problems

To every NP problem defined by a verification relation V (x, w)


belong two related questions:
Decision problem Given x, is there a w satisfying V (x, w)?
Search problem Given input x, find a witness w satisfying V (x, w)
(or say if there is none).
Formally, a set S of strings is in NP if it is the decision problem for
some polynomial-computable verification relation V (x, w), that is if

x ∈ S ⇔ ∃wV (x, w).


• Solving the search problem also solves the decision problem.
An algorithm solving the decision problem sometimes helps
solving the search problem (see one of your homework
questions, on subset sum).
• But other times it does not.
• The compositeness question on number x is the same as the
question whether x is a prime number (that is not composite).
There is a (nontrivial!) polynomial algorithm for deciding
primality.
• The search problem is asking to find a proper divisor of the
number x. If there was a polynomial algorithm for this, then
repeating it would give a polynomial algorithm of factorization of
an input x (into prime divisors). Most of modern practical
cryptography (the basis of a lot our secure internet transactions)
is relying on the fact that no efficient factorization algorithm is
known.
Optimization problems

A number of NP problems are related to optimization problems.


For example, the “large independent set problem” is related to the
problem of finding the size of the largest independent set.
• Clearly, a solution to the largest independent set problem would
answer any question of the kind: “given G, k, does G have an
independent set of size ≥ k?”.
• But there is also a reduction in the other direction. Indeed, a
black-box reduction just could ask questions about (G, 1),
(G, 2),. . ., (G, n). The last k for which the answer is affirmative,
is the size of the maximum independent set.
• The above reduction can be speeded up via binary search.
• Clearly to each optimization problem belongs also a search
problem: asking not just for the size of the largest independent
set but also for an independent set of maximum size.
Boolean formulas

x i ∈ {0, 1}.

¬x = 1 − x, x ∧ y = min(x, y), x ∨ y = max(x, y),

negation, conjunction, disjunction. Example formula:

F (x 1 , x 2 , x 3 , x 4 ) = (x 1 ∨ ¬x 2 ) ∧ (x 2 ∨ ¬x 3 ∨ x 4 ).

Such a formula defines a Boolean function. An assignment (say


x 1 = 0, x 2 = 0, x 3 = 1, x 4 = 0) allows to compute a value (in our
example, F (0, 0, 1, 0) = 0).
Some important rules transforming a formula without changing the
Boolean function it represents: distributive and de Morgan rules,

(x ∨ y) ∧ z = (x ∧ z) ∨ ( y ∧ z),
(x ∧ y) ∨ z = (x ∨ z) ∧ ( y ∨ z),
¬(x ∧ y) = ¬x ∨ ¬ y, ¬(x ∨ z) = ¬x ∧ ¬z.
Disjunctive normal form (DNF): a disjunction of clauses, each a
conjunction of variables and negated variables (called literals).
Example:

(x 1 ∧ ¬x 2 ) ∨ (x 2 ∧ ¬x 3 ∧ x 4 ) ∨ ¬x 4

Conjunctive normal form (CNF): a conjunction of clauses, each a


disjunction of literals. Example:

(x 1 ∨ ¬x 2 ) ∧ (x 2 ∨ ¬x 3 ∨ x 4 ) ∧ ¬x 4 .
Fact Each Boolean formula is equivalent (as Boolean function)
to a conjunctive normal form, and also to a disjunctive normal
form.

You can find these normal forms by applying the distributivity and
de Morgan rules.

Note If you start for example from a CNF, the equivalent DNF
may be exponentially larger (applications of the distributive rule
can double the size repeatedly).

Fact Every Boolean function f (x 1 , . . . , x n ) can be represented by


a formula.

Indeed, a disjunctive normal form (of exponential size) can easily


be read off from the table of values of f (x 1 , . . . , x n ), as we go over
all possible assignments.
Satisfiability

• An assignment (a1 , a2 , a3 , a4 ) satisfies F , if F (a1 , a2 , a3 , a4 ) = 1.


Example: (¬x ∨ y) ∧ x is satisfied by x = 1, y = 1.
And no assignment satisfies (¬x ∨ y) ∧ x ∧ ¬ y .
• The formula is satisfiable if it has some satisfying assignment.
So it is unsatisfiable if it is always false. Example: x ∧ ¬x.
• The formula is a tautology if it is always true, that is its
negation is unsatisfiable. Example: x ∨ ¬x.
• Satisfiability problem FSAT: given a formula F (x 1 , . . . , x n )
decide whether it is satisfiable.
Special cases:
• SAT: the satisfiability problem for conjunctive normal forms.
• A 3-CNF is a conjunctive normal form in which each clause
contains at most 3 literals—it gives rise to 3SAT.
The 3SAT problem sounds especially basic. Asks to satisfy some
very simple constraints: each a disjunction clause with up to three
literals.
NP-completeness

Theorem (Cook-Levin) Every NP problem is reducible to 3SAT.

Significance of this result If a polynomial algorithm could be


found for SAT then every NP problem could be solved in
polynomial time. In other words, a fast algorithm to check
solutions to some problem (witnesses to a verification relation)
would guarantee also a fast algorithm to decide whether there
is any solution at all.
Usual conclusion This is unlikely, so probably SAT is hard.
• A problem to which every NP problem is reducible is called
NP-hard. If it is also in NP (thus a decision problem for some
verification relation) then it is called NP-complete.
Now we know that SAT is NP-complete.
• A problem can be NP-hard and not NP-complete even just by its
form: when it is not a decision problem but, say, an
optimization problem or a search problem.
• If we reduce an NP-complete problem A to some problem B
then this shows that B is also NP-hard. (If B is also in NP then
B is also NP-complete.)
(Diagram: P ⊆ NP; the NP-complete problems are the NP-hard problems that lie inside NP; NP-hard problems may lie outside NP.)
Chains of reductions have shown the NP-completeness of many
well-know problems, helping to explain their difficulty. We will
show some of these reductions, starting with the large independent
set problem.
Proof outline for the Cook-Levin theorem
Boolean circuits

Our computers are built up from Boolean circuits like this one:
(Figure: a circuit with inputs x1, x2, x3, x4 and gates ¬, ∨, ∧ feeding into one another, with one output gate.)
It computes ((¬x 1 ∨ x 2 ) ∨ ¬x 3 ) ∧ ¬((¬x 1 ∨ x 2 ) ∨ x 4 ),


but is more economical: it reuses ¬x 1 ∨ x 2 .
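A circuit can be represented as a list of gates, each referring to earlier gates, so that a shared gate is evaluated only once. The following Python sketch (our own encoding, not from the notes) evaluates the circuit described above.

    def eval_circuit(gates, inputs):
        # Each gate is ('IN', name), ('NOT', i), ('AND', i, j) or ('OR', i, j),
        # where i, j are indices of earlier gates in the list.
        val = []
        for g in gates:
            if g[0] == 'IN':
                val.append(inputs[g[1]])
            elif g[0] == 'NOT':
                val.append(not val[g[1]])
            elif g[0] == 'AND':
                val.append(val[g[1]] and val[g[2]])
            else:  # 'OR'
                val.append(val[g[1]] or val[g[2]])
        return val[-1]  # the last gate is the output

    # ((¬x1 ∨ x2) ∨ ¬x3) ∧ ¬((¬x1 ∨ x2) ∨ x4); gate 5 (¬x1 ∨ x2) is shared.
    gates = [('IN', 'x1'), ('IN', 'x2'), ('IN', 'x3'), ('IN', 'x4'),  # 0..3
             ('NOT', 0),       # 4: ¬x1
             ('OR', 4, 1),     # 5: ¬x1 ∨ x2  (reused below)
             ('NOT', 2),       # 6: ¬x3
             ('OR', 5, 6),     # 7: (¬x1 ∨ x2) ∨ ¬x3
             ('OR', 5, 3),     # 8: (¬x1 ∨ x2) ∨ x4
             ('NOT', 8),       # 9
             ('AND', 7, 9)]    # 10: output
    print(eval_circuit(gates, {'x1': True, 'x2': False, 'x3': False, 'x4': False}))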
Boolean circuits are missing an important computer component:
memory units. Still, polynomial-time functions are computable by
polynomial-size circuits:

Theorem (Poly-Sized Circuits) Suppose that a function f(x) is
computable in time O(n^c) for some constant c, that is, computable
in polynomial time. Then for every n there is a Boolean circuit C_n
of size O(n^{2c}), that is, of polynomial size, which computes f(x)
from every input x of length n.

The idea of the proof is that for every step of the computation, we
represent the state of memory of our computer by a “layer” of the
Boolean circuit.
Since circuits also compute Boolean functions, satisfiability is
defined for them as well. The CSAT problem’s inputs are circuits.

Lemma The CSAT problem is NP-complete.

• Consider any NP problem A with polynomial-time verification


relation VA(x, w).
Sketch of its reduction to CSAT:
• By the Poly-Sized Circuits Theorem, an O(n^c) algorithm
computing V_A gives rise for each n to a circuit C_{A,n} of size O(n^{2c})
computing V_A(x, w) for all |x| = n, |w| ≤ n^c.
• Hardwiring the input string x we get a circuit CA,n,x with input
w. Its satisfying assignments w are just the witnesses for
VA(x, w).
CSAT to 3SAT

Given a circuit C, we introduce a new variable yi for the output of


each node, along with a constraint saying it computes what it is
supposed to. For example if it is an OR gate then we put the
formula x i ∨ x j ⇔ yk into conjunctive normal form:

(¬x_i ∨ y_k) ∧ (¬x_j ∨ y_k) ∧ (¬y_k ∨ x_i ∨ x_j).

Let y_N be the output of the circuit. The 3SAT formula
F_C(x_1, . . . , x_n, y_1, . . . , y_N) that is the conjunction of all these
constraints is true exactly when each y_i equals the value computed by
its gate from the inputs x_1, . . . , x_n in the circuit C. Therefore the 3CNF

FC (x 1 , . . . , x n , y1 , . . . , yN ) ∧ yN

is satisfiable if and only if the circuit C is.
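The clause patterns used above are easy to generate mechanically. Here is a small Python sketch of the gate-by-gate translation (the helper names are our own; literals are again encoded as signed integers):

    def or_gate_clauses(i, j, k):
        # y_k ⇔ (x_i ∨ x_j):  (¬x_i ∨ y_k) ∧ (¬x_j ∨ y_k) ∧ (¬y_k ∨ x_i ∨ x_j)
        return [[-i, k], [-j, k], [-k, i, j]]

    def and_gate_clauses(i, j, k):
        # y_k ⇔ (x_i ∧ x_j):  (x_i ∨ ¬y_k) ∧ (x_j ∨ ¬y_k) ∧ (y_k ∨ ¬x_i ∨ ¬x_j)
        return [[i, -k], [j, -k], [k, -i, -j]]

    def not_gate_clauses(i, k):
        # y_k ⇔ ¬x_i:  (¬x_i ∨ ¬y_k) ∧ (x_i ∨ y_k)
        return [[-i, -k], [i, k]]

    # For a whole circuit: give each gate a fresh variable y_k, collect the
    # clauses of every gate, and add the unit clause [y_N] forcing the
    # output variable to be true.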


SAT to independent set

How to satisfy a CNF C_1 ∧ C_2 ∧ · · · ∧ C_m? Each clause must be


satisfied. For example if C1 = x ∨ y ∨ ¬z then one of x, y, ¬z must
be made true. If x appears in C1 and ¬x in C3 then both cannot be
made true.
So the task is this:
Pick one literal per clause, but if you picked a literal from some clause
you cannot pick its negation from elsewhere.
(¬x ∨ y) ∧ (¬y ∨ z) ∧ (x ∨ ¬z ∨ y) ∧ ¬y

(Figure: the graph G_F for this formula, one vertex per literal occurrence, with edges inside each clause and between opposite literals.)

(¬x ∨ y) ∧ (¬y ∨ z) ∧ (x ∨ ¬z ∨ y) ∧ ¬y ∧ (z ∨ x)

(Figure: the corresponding graph for this extended formula.)
Graph G F for conjunctive normal form F = C1 ∧ · · · ∧ Cm :
• Assign a vertex to each occurrence of each literal.
• Connect literals within each clause.
• Connect each literal to each occurrence of its negation.
Each independent set corresponds to a set of literals that can be
made true simultaneously, and which contains at most one literal
per clause. So F is satisfiable if and only if G_F has an independent
set of size m (one literal from every clause).
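The construction is easy to carry out mechanically; here is a Python sketch (our own encoding) building G_F from a CNF given as a list of clauses of signed integers:

    from itertools import combinations

    def graph_for_cnf(cnf):
        # One vertex (i, j) per occurrence: the literal in position j of clause i.
        vertices = [(i, j) for i, clause in enumerate(cnf)
                           for j in range(len(clause))]
        edges = set()
        for i, clause in enumerate(cnf):           # edges inside each clause
            for j, k in combinations(range(len(clause)), 2):
                edges.add(((i, j), (i, k)))
        for (i, j), (k, l) in combinations(vertices, 2):   # opposite literals
            if cnf[i][j] == -cnf[k][l]:
                edges.add(((i, j), (k, l)))
        return vertices, edges

    # The first example above, with x = 1, y = 2, z = 3:
    V, E = graph_for_cnf([[-1, 2], [-2, 3], [1, -3, 2], [-2]])
    print(len(V), len(E))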
Integer programming
Reduction of 3SAT to 0-1 linear programming
• Equations of the form x + x 0 = 1 where x 0 represents ¬x.
• Turn a clause of the form x ∨ y ∨ ¬z into an inequality

x + y + z 0 ≥ 1.

• Turn inequalities into equations: x + y + z 0 ≥ 1 into

x + y + z0 + t1 + t2 = 3

using some new variables t i . So, satisfying a CNF is reduced to


solving a set of equations

∑_{j=1}^{n} a_{ij} x_j = b_i,    i = 1, . . . , m

in 0-1 variables x_j where all a_{ij} ∈ {0, 1}.


• Reduction of many equations to one: the same trick as used to
reduce two equations to one. Multiply the i-th equation by M^{i−1},
where M = 1 + max_i (b_i + ∑_j a_{ij}), and add them all up:

A_1 x_1 + · · · + A_n x_n = B

where for example A_1 = a_{11} + a_{21} M + · · · + a_{m1} M^{m−1}.


This is a reduction to the subset sum problem, showing that it is
NP-complete.
• We gave a dynamic programming algorithm for the subset sum
problem (actually, for the more general knapsack problem). We
remarked that it is not polynomial when the coefficients are
large. Here the A j will be large.
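As an illustration of the combining step above, here is a Python sketch (our own code) that merges a system of 0-1 equations with 0-1 coefficients into a single subset sum instance using powers of M; equation i (0-based here) is multiplied by M^i:

    def combine_equations(A, b):
        # A[i][j] ∈ {0, 1};  combine the equations sum_j A[i][j]*x_j = b[i]
        # into a single equation sum_j C[j]*x_j = B.
        m, n = len(A), len(A[0])
        # M exceeds every attainable left- or right-hand side, so no carries
        # between the base-M digits can occur.
        M = 1 + max(b[i] + sum(A[i]) for i in range(m))
        C = [sum(A[i][j] * M**i for i in range(m)) for j in range(n)]
        B = sum(b[i] * M**i for i in range(m))
        return C, B

    # Toy system: x1 + x2 = 1 and x2 + x3 = 1.
    print(combine_equations([[1, 1, 0], [0, 1, 1]], [1, 1]))  # ([1, 5, 4], 5)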
Traveling salesman

Another famous group of NP-complete (or NP-hard) problems:


• A Hamiltonian cycle of a graph is a cycle that passes through
every vertex exactly once. The problem: given a graph G, does
it have a Hamiltonian cycle?
• Given a graph G and two vertices s, t in it, what is the length of
the longest path between s and t?
In contrast, we have seen that the shortest path problem is in P.
• Given a set of cities with connecting roads between them that
have lengths, what is the length of the shortest tour for a
traveling salesman who needs to pass through all of the cities?
These problems reduce rather easily to each other, but it is a little
tricky to show NP-completeness.
Similar problems in P and in NP

In a number of cases, changing a parameter turns a problem from


polynomial to NP-complete.
• 3SAT is NP-complete.
2SAT is in P (the set of polynomial-time solvable problems).
• 3-coloring is NP-complete.
2-coloring (deciding whether a graph is bipartite) is in P.
• 3-partite matching (whether a 3-partite graph can be covered
by disjoint triangles) is NP-complete.
2-partite matching (whether a bipartite graph has a perfect
matching) is in P.
• Traveling Salesman problem is NP-hard.
Chinese Postman problem (finding the shortest route passing
through all edges of a graph) is in P (not easy).
• Finding the largest independent set of vertices in a graph is
NP-hard.
Finding the largest independent set of edges (that is a
matching) is solvable in polynomial time. We have seen a
polynomial algorithm for bipartite graphs, but there is one also
for general graphs (more complex).
The Chinese Postman problem can be reduced to this.
• Finding a proper divisor of an integer x is probably hard (this is
the factorization problem), even if not NP-hard.
The trial division algorithm (just try all possible witnesses
1 < w < x) is not polynomial, as the input length is log x.
• Finding the greatest common divisor of integers x, y (and with it a
common divisor larger than 1, if one exists) is solvable in polynomial
time by the Euclidean algorithm, using the identity

gcd(x, y) = gcd(y, x mod y)

in a recursion.
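For completeness, here is the Euclidean algorithm in Python; the number of iterations is proportional to the number of digits, so it is polynomial in the input length.

    def gcd(x, y):
        # uses the identity gcd(x, y) = gcd(y, x mod y)
        while y != 0:
            x, y = y, x % y
        return x

    print(gcd(252, 105))  # 21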
Linear programming

(Not covered in Spring 2020)


Reformulation of vertex cover: given an undirected graph
G = (V, E), linear inequalities

x u + x v ≥ 1 for all (u, v) ∈ E,


x u ≥ 0,
X
x u ≤ k.
u

When x_u must be integers, the solvability is equivalent to the
question whether G has a vertex cover of size ≤ k.
• Relaxation of the above question: allow x_u to be real numbers,
so we are looking for a fractional vertex cover.
Is this still an NP problem? Not obvious, since maybe there are
only witnesses x u that cannot be expressed with a small number
of bits.
• The fractional vertex cover problem is a special linear program.
A general linear program for some (integer) coefficients
ai j , bi , c j :
Find a solution x 1 , . . . , x n for the linear inequalities

ai1 x 1 + · · · + ain x n ≤ bi , i = 1, . . . , m

(the constraints), maximizing c1 x 1 + · · · + cn x n (the objective


function).
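For instance, the fractional vertex cover above can be handed to an off-the-shelf LP solver. The sketch below (our own illustration, assuming SciPy is available) uses scipy.optimize.linprog, which minimizes c·x subject to A_ub·x ≤ b_ub, so each constraint x_u + x_v ≥ 1 is written as −x_u − x_v ≤ −1.

    import numpy as np
    from scipy.optimize import linprog

    def fractional_vertex_cover(n, edges, costs):
        A_ub = np.zeros((len(edges), n))
        for i, (u, v) in enumerate(edges):
            A_ub[i, u] = A_ub[i, v] = -1.0      # -x_u - x_v <= -1
        b_ub = -np.ones(len(edges))
        res = linprog(costs, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n)
        return res.x

    # A triangle with unit costs: the optimum is x_u = 1/2 at every vertex.
    print(fractional_vertex_cover(3, [(0, 1), (1, 2), (0, 2)], [1, 1, 1]))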
Theorem The solvability (in real numbers) of a set of linear
inequalities in integer coefficients (just the constraints of the above
program) is an NP problem: more precisely, if there is a solution
then there is one in which x_j = p_j/q_j with polynomial-length integers
p_j, q_j.

The proof uses linear algebra.

Theorem (famous) There is a polynomial algorithm to solve


(in real numbers) a set of linear inequalities in integer coefficients.
The maximum flow problem is a special linear program. Indeed, in
it, given capacities c(u, v) on the edges of a directed graph
G = (V, E) with source and target vertices s, t ∈ V , we are looking
for a flow function f (u, v) satisfying the inequalities

f(u, v) + f(v, u) = 0,    for all edges (u, v),
f(u, v) ≤ c(u, v),    for all edges (u, v),
∑_{v:(u,v)∈E} f(u, v) = 0    for all u ∈ V \ {s, t},

and maximizing ∑_{v:(s,v)∈E} f(s, v). This is a linear program for the
variables f(u, v).


• Known algorithms for general linear programs are more
complex than those for just solving the maximum flow problem.
Overview: representative problems

Interval scheduling Solved by a greedy method, near-linear time.


Weighted interval scheduling Solved by dynamic programming,
near-linear time.
Bipartite matching Solved by a more complex method, still
polynomial time.
Large independent set NP-complete, takes possibly exponential
time.
Competitive facility location Two players alternatingly select
nodes in a graph, each of which has some value. Cannot select
neighbor of an already selected node.
Question: Can player 1 achieve a total value of at least B, no
matter how player 2 plays?
This is a PSPACE-complete problem, believed to be even much
harder than NP problems. (It is NP-hard.)
Approximation algorithms

(Not covered in Spring 2020)


For some problems exact solution seems hopeless. For those that
were optimization problems, we can hope for a solution that
approximates the optimum. We have seen already some examples:
• In some example homework we may have shown that certain
activity selection algorithms, though not optimal, approximate
the optimal number of activities within a factor of 2.
• A simple algorithm for the knapsack problem approximates the
optimum within a factor of 2. (This is valuable since no
polynomial algorithm is known for the knapsack problem.)
• A simple greedy algorithm for maximum matching also finds a
matching that is not worse than half of the optimum.
Now we will see some more interesting examples—showing that
even if a problem is proven difficult, giving up is not the right
answer.
Center selection

Suppose we have a set of places V , and a set of towns U ⊆ V . Our


goal is to build a set C ⊆ V of service centers in such a way that
every town is close to at least one of these centers. More precisely,
we call the distance d(u, v) of two places the cost of going from
place u to place v. If C ⊆ V is a set of k centers, then let

d(v, C) = min_{c∈C} d(v, c),    r(C) = max_{u∈U} d(u, C).

So r(C) is the distance of the town farthest from the set C: it is


called the covering radius of C.

Center selection problem Given a distance function d(·, ·) over


the set V of places, a set of towns U ⊆ V and a number k, find a set
of k centers C with minimum r(C).

This formalizes the requirement that no town should be too far from C.


Properties of distance

We will consider a restricted version of the problem, in which the


cost d(u, v) satisfies the following two properties:
Symmetry d(u, v) = d(v, u): the cost of going from u to v is the
same as going from v to u.
Triangle inequality d(u, w) ≤ d(u, v) + d(v, w).
Though both requirements are reasonable, there are natural cases
when they are not satisfied. For example, d(u, v) may not be
symmetric in a city with one-way streets. And if going from town u
to town w the only way is via town v, but this forces an overnight
stay in a hotel for a price p, then it may happen that
d(u, w) = d(u, v) + d(v, w) + p.
A bad greedy algorithm

Natural idea: Keep adding places in such a way that the covering
radius is always smallest.

Example The above greedy algorithm can give an arbitrarily


bad result. Let V = {−1, 0, 1}, U = {−1, 1}, k = 2, and let the distance
be the absolute value of the difference. Then this algorithm chooses point 0 first, and for
example point 1 second. This gives a covering radius 1, while the
optimum is 0.
An algorithm

Here is an algorithm that is not too bad: we only use towns as


centers. Keep adding to C the town that is farthest from it.

Add-Farthest(k)
C ← {u} for some arbitrary initial town u ∈ U
for i = 2 to k do
find the town v ∈ U farthest from C, that is
r(C) = d(v, C)
C ← C ∪ {v}
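A minimal Python sketch of Add-Farthest (our own code, with the distance given as a function d):

    def add_farthest(U, d, k, start):
        C = [start]
        for _ in range(k - 1):
            # the town realizing the current covering radius r(C)
            v = max(U, key=lambda u: min(d(u, c) for c in C))
            C.append(v)
        return C

    # The 3x3 grid example discussed next, started from (-1, -1):
    grid = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
    euclid = lambda p, q: ((p[0] - q[0])**2 + (p[1] - q[1])**2) ** 0.5
    print(add_farthest(grid, euclid, 2, (-1, -1)))  # picks (1, 1) as the second center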
Algorithm Add-Farthest(k) is not optimal.

Example Let U = V be the set of 9 points on a 3 × 3 square grid


with the ordinary Euclidean distance:

V = { (x, y) : x ∈ {−1, 0, 1}, y ∈ {−1, 0, 1} },

k = 2. If u_1 = (x_1, y_1), u_2 = (x_2, y_2) then

d(u_1, u_2) = ((x_1 − x_2)^2 + (y_1 − y_2)^2)^{1/2}.

If we start with v_1 = (−1, −1), then the algorithm chooses


v2 = (1, 1), so C = {(−1, −1), (1, 1)}.
Now r(C) = 2, since (1, −1) is at distance 2 from C. But even the
single point (0, 0) is better: r({(0, 0)}) = √2.
Algorithm Add-Farthest(k) is not too bad:

Theorem If C ∗ is an optimal set and algorithm Add-Farthest(k)


computes a set C then

r(C) ≤ 2r(C ∗ ).

Proof. Let r∗ = r(C∗). For a contradiction assume r(C) > 2r∗.
Then the towns of C are at distance > 2r∗ from each other. For
each town u in C there is some place u′ in C∗ in the ball of radius
r∗ with center u. If u ≠ v then u′ ≠ v′, hence this map covers all of
C∗ (both sets have k elements). Indeed, if u′ = v′ then by the triangle
inequality d(u, v) ≤ d(u, u′) + d(u′, v) ≤ 2r∗, a contradiction. So the
balls of radius 2r∗ around towns of C include all balls of radius r∗
around the places of C∗, and so cover all towns, contradicting the
assumption r(C) > 2r∗.
Vertex cover

See also the description under greedy algorithms.


Consider an undirected graph G = (V, E). A set of vertices H is a
vertex cover if every edge has at least one end in H: if e = {u, v} is
an edge, then H ∩ {u, v} ≠ ∅.
• Our problem is to find a minimal-size vertex cover. We will see
later that this is a really hard problem, so we can only hope to
find an approximation.
• More generally, suppose that each vertex has some cost, or
weight: the cost of vertex v is c v ≥ 0. We want to find the
vertex cover whose cost is minimal, that is
c(H) = ∑_{v∈H} c_v

is as small as possible.
• The vertex cover question, even with unit costs, is interesting.
Consider, for example, the following theorem.

Theorem In a bipartite graph, the size of the minimum vertex


cover is equal to the size of the maximum matching.

It is easy to see that the size of each vertex cover bounds the size
of each maximum matching. On the other hand the equality is
not trivial: it follows from the max-flow min-cut theorem.
• The vertex cover problem is probably very hard: we will see
later that it is NP-hard, which makes it likely that even the best
algorithm for solving it is not much faster than brute-force
(exponential). On the other hand, there are some interesting
approximation algorithms for it.
Greedy approach

• There is a natural greedy algorithm, repeating the following step:


Choose a vertex with the largest degree, then delete it from
the graph.
This is not a bad algorithm, but there are some nasty graph
examples in which it approximates the optimum only within a
factor of O(log n) (see the example under greedy algorithms).
• On the other hand, the following non-greedy algorithm finds a
vertex cover that is at most twice as large as the optimum, in
the case where each vertex has unit cost. It repeats the
following step:
Find an edge that is not covered yet. Choose both of its endpoints.
Delete the endpoints from the graph. (A short code sketch follows after this list.)
• In what follows we develop an algorithm for the more general
case, with possibly non-unit costs.
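Here is a minimal Python sketch of that second, non-greedy rule (our own code, unit costs): the chosen edges form a matching, and any vertex cover must contain at least one endpoint of each of them, so the returned set is at most twice the optimum.

    def vertex_cover_2approx(edges):
        H = set()
        for u, v in edges:
            if u not in H and v not in H:   # the edge is not covered yet
                H.add(u)
                H.add(v)                    # take both endpoints
        return H

    # A path 0-1-2-3: the algorithm may return {0, 1, 2, 3}; the optimum {1, 2} has size 2.
    print(vertex_cover_2approx([(0, 1), (1, 2), (2, 3)]))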
Dual approach via prices

• Sometimes it pays to start by finding a bound from the other


side on our solution: this helps estimate how far we are from
the optimum. This is similar to as if we approached the
maximum flow problem from the side of trying to find a small
cut: it is called a dual method.
• Here, the dual approach looks for a lower bound on the vertex
cover cost. For this, we introduce the notion of prices.
• Suppose that a number pe ≥ 0 is assigned to each edge. We say
that these constitute a valid set of prices, if for each vertex u,
∑_{e∋u} p_e ≤ c_u.

We call this the fairness condition for vertex u. The idea is that
an edge e must pay to each of its end vertices pe for covering it,
but no vertex should receive more in total than its cost.
Prices help lower-bound the optimum. For a set of prices p, let
s(p) = ∑_{e∈E} p_e. Let H be a vertex cover and p a set of prices
obeying fairness. Then c(H) ≥ s(p). Indeed,

c(H) = ∑_{u∈H} c_u ≥ ∑_{u∈H} ∑_{e∋u} p_e ≥ ∑_{e∈E} p_e = s(p).    (15)

Our algorithm will just create some prices that push up the lower
bound (15) as much as possible. Every time it touches a price, it
pushes it up until one of the two fairness inequalities at its ends
becomes equality. In this case, we will call that vertex tight. When
there is no more increase possible, the tight vertices form a vertex
cover.

for each edge e do


increase pe until one of its two inequalities becomes tight
return the set H of tight vertices
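A Python sketch of this pricing step (our own code; costs is a dictionary of the c_u, edges a list of pairs):

    def pricing_vertex_cover(costs, edges):
        paid = {u: 0 for u in costs}     # total price charged so far to vertex u
        for u, v in edges:
            # raise p_e by as much as the fairness inequalities at u and v allow
            slack = min(costs[u] - paid[u], costs[v] - paid[v])
            paid[u] += slack
            paid[v] += slack
        # after each edge is processed, at least one of its endpoints is tight,
        # so the tight vertices form a vertex cover
        return {u for u in costs if paid[u] == costs[u]}

    # Star with center 0 (cost 3) and leaves 1, 2, 3 (cost 2 each):
    print(pricing_vertex_cover({0: 3, 1: 2, 2: 2, 3: 2}, [(0, 1), (0, 2), (0, 3)]))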
Let us show that the cost c(H) in the vertex cover obtained is at
most 2 times the (lower bound to the) optimum:
c(H) = ∑_{u∈H} c_u = ∑_{u∈H} ∑_{e∋u} p_e ≤ 2 ∑_{e∈E} p_e = 2s(p).

The factor 2 is needed, since some edges on the left-hand side were
counted twice. But we have seen that s(p) is a lower bound to the
cost of every vertex cover, in particular of the optimal ones.

Note Our new algorithm for finding the prices pe is very simple:
it is just a greedy one. On the other hand, the idea to find a vertex
cover via the dual approach of edge prices is non-trivial.
Strong duality

Here is an exact theorem relating to the edge prices, similar to the


max-flow min-cut theorem. Let a fractional vertex cover be a set of
values x = (x v : v ∈ V ) satisfying the following inequalities:

x v ≥ 0 for all vertices v,


x u + x v ≥ 1 for all edges {u, v}.

Let c(x) = ∑_{v∈V} c_v x_v be the cost of the fractional vertex cover. It is
easy to see that for all fractional vertex covers x and prices p
obeying fairness, we have c(x) ≥ s(p). The strong duality theorem
says that min x c(x) = max p s(p). We will not prove it here.
Approximating for the knapsack problem

(Not in a lecture in Fall 2013, can be skipped.)


Recall the knapsack problem. Given: volumes b ≥ a1 , . . . , an > 0,
and integer values v1 ≥ · · · ≥ vn > 0.

maximize v1 x 1 + · · · + vn x n
subject to a1 x 1 + · · · + an x n ≤ b,
x i = 0, 1, i = 1, . . . , n.

Optimal solution x^∗, giving OPT = ∑_i v_i x_i^∗.

Let v = ∑_i v_i. We solved this problem using dynamic


programming, in O(vn) arithmetic operations: this was not
polynomial since v can be exponential in the input size.
Idea: break each v_i into a polynomial-size number of big chunks,
for approximation. Let r > 0, v_i′ = ⌊v_i/r⌋.

maximize ∑_i v_i′ x_i
subject to ∑_i a_i x_i ≤ b,
x_i = 0, 1,    i = 1, . . . , n.

Optimal solution x′.
For each ε, we will choose r in such a way that
• The value ∑_i v_i x_i′ approximates the original OPT within a
factor 1 − ε.
• The runtime is polynomial in n, 1/ε.
This is called an approximation scheme.
Assume v_1 = max_i v_i. For the optimal solution x′ of the changed
problem, estimate the ratio (∑_i v_i x_i′)/OPT, where OPT = ∑_i v_i x_i^∗.
We have

∑_i (v_i/r) x_i′ ≥ ∑_i v_i′ x_i′ ≥ ∑_i v_i′ x_i^∗ ≥ ∑_i (v_i/r) x_i^∗ − n,
∑_i v_i x_i′ ≥ OPT − r · n = OPT − ε v_1,

where we set r = ε v_1/n. This gives

(∑_i v_i x_i′)/OPT ≥ 1 − ε v_1/OPT ≥ 1 − ε.

With v = ∑_i v_i, the number of operations is of the order of

nv/r ≤ n^2 v/(v_1 ε) ≤ n^3/ε,

which is polynomial in n, 1/ε.
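A compact Python sketch of this approximation scheme (our own implementation of the rounding idea; it assumes every single item fits, a_i ≤ b, as in the problem statement):

    from math import floor

    def knapsack_fptas(values, volumes, b, eps):
        n = len(values)
        r = eps * max(values) / n                  # the rounding granularity
        vp = [int(floor(v / r)) for v in values]   # v_i' = floor(v_i / r)
        V = sum(vp)
        INF = float('inf')
        # best[w] = (smallest volume achieving rounded value w, items used)
        best = [(0, [])] + [(INF, None)] * V
        for i in range(n):                         # value-based 0-1 knapsack DP
            for w in range(V, vp[i] - 1, -1):
                vol, items = best[w - vp[i]]
                if vol + volumes[i] < best[w][0]:
                    best[w] = (vol + volumes[i], items + [i])
        # return the items of the largest rounded value fitting into volume b
        w = max(w for w in range(V + 1) if best[w][0] <= b)
        return best[w][1]

    print(knapsack_fptas([60, 55, 48], [3, 2, 2], 4, 0.1))  # [1, 2]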
