Graph Algorithms
Basic Definitions and Applications
Undirected Graphs
• Undirected graph. G = (V, E)
• V = nodes.
• E = edges between pairs of nodes.
• Captures pairwise relationship between objects.
• Graph size parameters: n = |V|, m = |E|.
V = { 1, 2, 3, 4, 5, 6, 7, 8 }
E = { 1-2, 1-3, 2-3, 2-4, 2-5, 3-5, 3-7, 3-8, 4-5, 5-6, 7-8 }
n = 8
m = 11
Some Graph Applications
Graph               Nodes                    Edges
transportation      street intersections     highways
communication       computers                fiber optic cables
World Wide Web      web pages                hyperlinks
social              people                   relationships
food web            species                  predator-prey
software systems    functions                function calls
scheduling          tasks                    precedence constraints
circuits            gates                    wires
Terrorist Network
Social network graph.
• Node: people.
• Edge: relationship between two people.
One week of Enron emails
The evolution of FCC lobbying coalitions
Framingham heart study
Graph Representation: Adjacency Matrix
Adjacency matrix. n-by-n matrix with A_uv = 1 if (u, v) is an edge.
Two representations of each edge.
Space proportional to n^2.
Checking if (u, v) is an edge takes Θ(1) time.
Identifying all edges takes Θ(n^2) time.
    1 2 3 4 5 6 7 8
1   0 1 1 0 0 0 0 0
2   1 0 1 1 1 0 0 0
3   1 1 0 0 1 0 1 1
4   0 1 0 0 1 0 0 0
5   0 1 1 1 0 1 0 0
6   0 0 0 0 1 0 0 0
7   0 0 1 0 0 0 0 1
8   0 0 1 0 0 0 1 0
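A minimal Python sketch of the adjacency-matrix representation for the example graph (assuming the 11-edge list with 7-8 included, as the matrix shows). The helper names `adjacency_matrix`, `has_edge`, and `all_edges` are illustrative, and nodes 1..n are stored at 0-based indices.

```python
def adjacency_matrix(n, edges):
    """Build an n-by-n 0/1 matrix for an undirected graph on nodes 1..n."""
    A = [[0] * n for _ in range(n)]  # Theta(n^2) space regardless of m
    for u, v in edges:
        # Two representations of each edge: A[u][v] and A[v][u].
        A[u - 1][v - 1] = 1
        A[v - 1][u - 1] = 1
    return A


def has_edge(A, u, v):
    """Theta(1): a single array lookup."""
    return A[u - 1][v - 1] == 1


def all_edges(A):
    """Theta(n^2): every entry of the upper triangle is inspected."""
    n = len(A)
    return [(u + 1, v + 1) for u in range(n) for v in range(u + 1, n) if A[u][v]]


EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
         (4, 5), (5, 6), (7, 8)]
A = adjacency_matrix(8, EDGES)
print(has_edge(A, 4, 5), has_edge(A, 1, 4))  # True False
print(len(all_edges(A)))                     # 11
```

Even when m is small, `all_edges` pays for all n^2 entries; that is the cost the adjacency list avoids.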
Graph Representation: Adjacency List
Adjacency list. Node-indexed array of lists.
Two representations of each edge.
deg(u) = number of neighbors of u.
Space proportional to m + n.
Checking if (u, v) is an edge takes O(deg(u)) time.
Identifying all edges takes Θ(m + n) time.
1: 2, 3
2: 1, 3, 4, 5
3: 1, 2, 5, 7, 8
4: 2, 5
5: 2, 3, 4, 6
6: 5
7: 3, 8
8: 3, 7
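The same graph as adjacency lists, in a minimal Python sketch (a dict of lists stands in for the node-indexed array; the helper names are illustrative). Each edge appears twice, so the lists hold 2m entries in total.

```python
def adjacency_list(n, edges):
    """Node-indexed lists of neighbors for an undirected graph on nodes 1..n."""
    adj = {u: [] for u in range(1, n + 1)}  # Theta(m + n) space overall
    for u, v in edges:
        # Two representations of each edge: v in adj[u] and u in adj[v].
        adj[u].append(v)
        adj[v].append(u)
    return adj


def has_edge(adj, u, v):
    """O(deg(u)): scan u's neighbor list."""
    return v in adj[u]


def all_edges(adj):
    """Theta(m + n): visit each list once, reporting each edge once (u < v)."""
    return [(u, v) for u in adj for v in adj[u] if u < v]


EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
         (4, 5), (5, 6), (7, 8)]
adj = adjacency_list(8, EDGES)
print(adj[3])               # [1, 2, 5, 7, 8]
print(has_edge(adj, 7, 8))  # True
```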
Paths and Connectivity
Def. A path in an undirected graph G = (V, E) is a sequence P of nodes v1, v2, …, vk-1, vk
with the property that each consecutive pair vi, vi+1 is joined by an edge in E.
Def. A path is simple if all nodes are distinct.
Def. An undirected graph is connected if for every pair of nodes u and v, there is a path
between u and v.
Cycles
Def. A cycle is a path v1, v2, …, vk-1, vk in which v1 = vk, k > 2, and the first k-1 nodes are
all distinct.
cycle C = 1-2-4-5-3-1
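The path and cycle definitions can be checked mechanically. A small Python sketch, assuming edges are stored as a set of unordered pairs (`frozenset`); `is_path`, `is_simple_path`, and `is_cycle` are hypothetical helper names that follow the definitions above literally.

```python
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
         (4, 5), (5, 6), (7, 8)]
E = {frozenset(e) for e in EDGES}  # undirected: {u, v} == {v, u}


def is_path(E, nodes):
    """Each consecutive pair v_i, v_{i+1} must be joined by an edge."""
    return all(frozenset((a, b)) in E for a, b in zip(nodes, nodes[1:]))


def is_simple_path(E, nodes):
    """A path whose nodes are all distinct."""
    return is_path(E, nodes) and len(set(nodes)) == len(nodes)


def is_cycle(E, nodes):
    """v1 = vk, k > 2, and the first k-1 nodes are all distinct."""
    k = len(nodes)
    return (k > 2 and nodes[0] == nodes[-1]
            and len(set(nodes[:-1])) == k - 1
            and is_path(E, nodes))


print(is_cycle(E, [1, 2, 4, 5, 3, 1]))  # True
```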
Trees
Def. An undirected graph is a tree if it is connected and does not contain a cycle.
Theorem. Let G be an undirected graph on n nodes. Any two of the following statements
imply the third.
G is connected.
G does not contain a cycle.
G has n-1 edges.
Rooted Trees
Rooted tree. Given a tree T, choose a root node r and orient each edge away from r.
Importance. Models hierarchical structure.
[Figure: a tree, and the same tree rooted at 1; labels mark the root r, the parent of v, and a child of v.]
Phylogeny Trees
Phylogeny trees. Describe evolutionary history of species.
GUI Containment Hierarchy
• GUI containment hierarchy. Describes the organization of GUI widgets.
Reference:
http://java.sun.com/docs/books/tutorial/uiswing/overview/anatomy.html
Graph Traversal
Connectivity
s-t connectivity problem. Given two nodes s and t, is there a path between s and t?
s-t shortest path problem. Given two nodes s and t, what is the length of the shortest
path between s and t?
Applications.
Facebook.
Maze traversal.
Erdos number.
Kevin Bacon number.
Fewest number of hops in a communication network.
Breadth First Search
BFS intuition. Explore outward from s in all possible directions, adding nodes one "layer" at a
time. Effect: find “shallow” paths to nodes.
[Figure: BFS layers L1, L2, …, Ln-1 spreading outward from s.]
BFS algorithm.
L0 = { s }.
L1 = all neighbors of L0.
L2 = all nodes that do not belong to L0 or L1, and that have an edge to a node in L1.
Li+1 = all nodes that do not belong to an earlier layer, and that have an edge to a node in Li.
Theorem. For each i, Li consists of all nodes at distance exactly i
from s. There is a path from s to t iff t appears in some layer.
Implementing BFS
Q: What’s a good way to implement the above algorithm?
A: Use a queue for the “frontier”
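A minimal Python sketch of BFS with a FIFO queue as the frontier, returning the layers L0, L1, … (assuming adjacency lists stored in a dict; `bfs_layers` is an illustrative name). Because the queue releases nodes in nondecreasing distance order, each newly discovered node belongs either to the current last layer or to a new one.

```python
from collections import deque


def bfs_layers(adj, s):
    """Return the BFS layers [L0, L1, ...] from s, using a queue as the frontier."""
    dist = {s: 0}
    layers = [[s]]
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:              # v is not in any earlier layer
                dist[v] = dist[u] + 1      # v joins the layer after u's
                if dist[v] == len(layers):
                    layers.append([])
                layers[dist[v]].append(v)
                queue.append(v)
    return layers


EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
         (4, 5), (5, 6), (7, 8)]
adj = {u: [] for u in range(1, 9)}
for u, v in EDGES:
    adj[u].append(v)
    adj[v].append(u)
print(bfs_layers(adj, 1))  # [[1], [2, 3], [4, 5, 7, 8], [6]]
```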
Breadth First Search
Property. Let T be a BFS tree of G = (V, E), and let (x, y) be an edge of G. Then the levels
of x and y differ by at most 1.
[Figure: BFS tree with layers L0, L1, L2, L3.]
Breadth First Search: Analysis
Theorem. The above implementation of BFS runs in O(m + n) time if the graph is given by its
adjacency list representation.
Pf.
Easy to prove O(n^2) running time:
– at most n lists Li
– each node occurs on at most one list; for loop runs at most n times
– when we consider node u, there are at most n incident edges (u, v),
and we spend O(1) processing each edge
Actually runs in O(m + n) time:
– when we consider node u, there are deg(u) incident edges (u, v)
– total time processing edges is Σu∈V deg(u) = 2m ▪
each edge (u, v) is counted exactly twice
in sum: once in deg(u) and once in deg(v)
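A quick numeric check of the identity Σu∈V deg(u) = 2m on the example graph (assuming the 11-edge list with 7-8 included):

```python
from collections import Counter

EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
         (4, 5), (5, 6), (7, 8)]
deg = Counter()
for u, v in EDGES:
    deg[u] += 1  # the edge counts once toward deg(u)...
    deg[v] += 1  # ...and once toward deg(v)
print(sum(deg.values()), 2 * len(EDGES))  # 22 22
```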
Connected Component
Connected component. Find all nodes reachable from s.
Connected component containing node 1 = { 1, 2, 3, 4, 5, 6, 7, 8 }.
Q1: Finding connected components
• Give an algorithm to find the set of all connected components of an
undirected graph.
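One standard approach, sketched in Python: run BFS from every node not yet explored; each run collects exactly one component. Every node and edge is handled once overall, so the total is O(m + n). The example adds a hypothetical second component {9, 10, 11} and an isolated node 12 to the graph above.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph given as adjacency lists."""
    seen = set()
    components = []
    for s in adj:
        if s in seen:
            continue
        # BFS from an unexplored node s collects the component containing s.
        seen.add(s)
        component = [s]
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.append(v)
                    queue.append(v)
        components.append(component)
    return components


# Nodes 9..12 and edges 9-10, 10-11 are hypothetical additions.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
         (4, 5), (5, 6), (7, 8), (9, 10), (10, 11)]
adj = {u: [] for u in range(1, 13)}
for u, v in EDGES:
    adj[u].append(v)
    adj[v].append(u)
print([sorted(c) for c in connected_components(adj)])
# [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11], [12]]
```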
Connected Component
Connected component. Find all nodes reachable from s.
[Figure: explored set R containing s, and an edge (u, v) with u in R and v not in R: it's safe to add v.]
Theorem. Upon termination, R is the connected component containing s.
BFS = explore in order of distance from s.
Flood Fill
Flood fill. Given a lime green pixel in an image, change the color of the entire blob of
neighboring lime pixels to blue.
[Figure: recolor lime green blob to blue.]
Node: pixel.
Edge: two neighboring lime pixels.
Blob: connected component of lime pixels.
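A minimal Python sketch of flood fill as BFS on the pixel graph, assuming pixels are neighbors when they share a side (4-neighborhood) and the image is a list of rows of one-letter colors ('L' for lime, 'B' for blue, '.' for anything else); `flood_fill` and the color codes are illustrative.

```python
from collections import deque


def flood_fill(grid, r, c, new_color):
    """Recolor the blob (connected component of same-colored pixels) containing (r, c).

    Nodes are pixels; edges join side-adjacent pixels of the original color.
    """
    old = grid[r][c]
    if old == new_color:
        return grid
    rows, cols = len(grid), len(grid[0])
    grid[r][c] = new_color
    queue = deque([(r, c)])
    while queue:
        i, j = queue.popleft()
        for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= a < rows and 0 <= b < cols and grid[a][b] == old:
                grid[a][b] = new_color  # recoloring doubles as the "explored" mark
                queue.append((a, b))
    return grid


image = [list("LL."),
         list(".L."),
         list("L.L")]
for row in flood_fill(image, 0, 0, "B"):
    print("".join(row))
# BB.
# .B.
# L.L
```

The two corner lime pixels in the last row touch the blob only diagonally, so under the assumed 4-neighborhood they form separate blobs and keep their color.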
Depth-first search
Use recursion
DFS intuition. Explore outward from s along one path as far as possible, and backtrack
when you cannot progress. Effect: find faraway nodes.
DFS(u):
  Mark u as “Explored” and add u to R
  For each edge (u, v) incident to u
    If v is not marked “Explored” then
      Recursively call DFS(v)
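A direct Python transcription of the DFS pseudocode, assuming adjacency lists in a dict; it also records parent pointers so the DFS tree T can be inspected. `dfs` and `is_ancestor` are illustrative names. Python's default recursion limit (roughly 1000 frames) makes this recursive form suitable only for small or shallow graphs; an explicit stack avoids the limit.

```python
def dfs(adj, s):
    """Recursive DFS from s. Returns R (nodes in the order explored)
    and the parent pointers of the DFS tree T."""
    explored = set()
    R = []
    parent = {s: None}

    def visit(u):
        explored.add(u)   # mark u as "Explored" and add u to R
        R.append(u)
        for v in adj[u]:  # for each edge (u, v) incident to u
            if v not in explored:
                parent[v] = u  # (u, v) becomes an edge of T
                visit(v)

    visit(s)
    return R, parent


def is_ancestor(parent, a, b):
    """True if a lies on the tree path from b up to the root."""
    while b is not None:
        if b == a:
            return True
        b = parent[b]
    return False


EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
         (4, 5), (5, 6), (7, 8)]
adj = {u: [] for u in range(1, 9)}
for u, v in EDGES:
    adj[u].append(v)
    adj[v].append(u)
R, parent = dfs(adj, 1)
print(R)  # [1, 2, 3, 5, 4, 6, 7, 8]
# Every edge of G (tree or not) joins an ancestor-descendant pair in T.
print(all(is_ancestor(parent, x, y) or is_ancestor(parent, y, x) for x, y in EDGES))  # True
```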
Depth-first search
Property. For a given recursive call DFS(u), all nodes marked “Explored” between the
beginning and end of this recursive call are descendants of u in T.
Theorem. Let T be a depth-first search tree, let x and y be nodes in T, and let (x,y) be an
edge of G that is not an edge of T. Then one of x or y is an ancestor of the other.
BFS and DFS trees
We have a connected graph G = (V, E) and a specific vertex u. Suppose we compute a
DFS tree rooted at u, and obtain a tree T that includes all nodes of G. Suppose we then
compute a BFS tree rooted at u, and obtain the same tree T.
Prove that G = T.
Answer
Suppose G has an edge e = {a, b} that does not belong to T.
As T is a DFS tree, one of the two ends must be an ancestor of the other; say a is an
ancestor of b.
(*) Since T is a BFS tree, the distance of the two nodes from u in T can differ at most by
one.
But if a is an ancestor of b, and (*) holds, then a must be the direct parent of b. This
means that {a, b} is an edge in T. Contradiction.
Q4: Finding a cycle
Given a graph G, determine if it has a cycle. If so, the algorithm should output this cycle.
Answer: Assume that G is connected; otherwise work on the connected components.
Run BFS from an arbitrary node s, and obtain a BFS tree T. If every edge of G appears in
the tree, then G = T and there is no cycle.
Otherwise, there is an edge e = (v, w) that is in G but not in T. Consider the least
common ancestor u of v and w in T. We get a cycle from edge e and paths u-v and u-w
in T.
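A Python sketch of this answer, assuming a simple connected graph (no self-loops or parallel edges) stored as adjacency lists; `find_cycle` is an illustrative name. It builds the BFS tree, finds an edge (v, w) outside it, and walks up parent pointers from v and w to their least common ancestor u.

```python
from collections import deque


def find_cycle(adj, s):
    """Return a cycle [v, ..., u, ..., w, v] in the component of s, or None."""
    # Build the BFS tree T from s: parent pointers and depths.
    parent, depth = {s: None}, {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)
    # Look for an edge (v, w) of G that is not an edge of T.
    for v in parent:
        for w in adj[v]:
            if parent[v] == w or parent[w] == v:
                continue  # tree edge
            # Climb from v and w to their least common ancestor u.
            a, b = [v], [w]
            while depth[a[-1]] > depth[b[-1]]:
                a.append(parent[a[-1]])
            while depth[b[-1]] > depth[a[-1]]:
                b.append(parent[b[-1]])
            while a[-1] != b[-1]:
                a.append(parent[a[-1]])
                b.append(parent[b[-1]])
            # a = v..u and b = w..u; the cycle is v..u..w, closed by the edge (w, v).
            return a + b[-2::-1] + [v]
    return None  # every edge is in T, so G = T has no cycle


EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
         (4, 5), (5, 6), (7, 8)]
adj = {u: [] for u in range(1, 9)}
for u, v in EDGES:
    adj[u].append(v)
    adj[v].append(u)
print(find_cycle(adj, 1))  # [2, 1, 3, 2]
```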
Testing Bipartiteness
Bipartite Graphs
Def. An undirected graph G = (V, E) is bipartite if the nodes can be colored red or blue
such that every edge has one red and one blue end.
Applications.
Stable marriage: men = red, women = blue.
Scheduling: machines = red, jobs = blue.
[Figure: a bipartite graph.]
Testing Bipartiteness
Testing bipartiteness. Given a graph G, is it bipartite?
Many graph problems become:
– easier if the underlying graph is bipartite (matching)
– tractable if the underlying graph is bipartite (independent set)
Before attempting to design an algorithm, we need to understand the structure of
bipartite graphs.
[Figure: a bipartite graph G on v1, …, v7, and another drawing of G.]
An Obstruction to Bipartiteness
Lemma. If a graph G is bipartite, it cannot contain an odd length cycle.
Pf. Not possible to 2-color the odd cycle, let alone G.
[Figure: a bipartite (2-colorable) graph and a non-bipartite (not 2-colorable) graph.]
Bipartite Graphs
Lemma. Let G be a connected graph, and let L0, …, Lk be the layers produced by BFS
starting at node s. Exactly one of the following holds.
(i) No edge of G joins two nodes of the same layer, and G is bipartite.
(ii) An edge of G joins two nodes of the same layer, and G contains an
odd-length cycle (and hence is not bipartite).
[Figure: BFS layers L1, L2, L3 illustrating case (i) and case (ii).]
Pf. (i)
Suppose no edge joins two nodes in the same layer.
By the BFS property (the levels of the two ends of an edge differ by at most 1), this implies all edges join nodes on successive levels.
Bipartition: red = nodes on odd levels, blue = nodes on even levels
Pf. (ii)
Suppose (x, y) is an edge with x, y in the same level Lj.
Let z = lca(x, y) be the lowest common ancestor of x and y.
Let Li be level containing z.
Consider cycle that takes edge from x to y,
then path from y to z, then path from z to x.
Its length is 1 + (j-i) + (j-i), which is odd. ▪
[Figure: the cycle formed by edge (x, y), the path from y to z, and the path from z to x.]
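The lemma translates directly into a bipartiteness test. A minimal Python sketch, assuming a connected graph given as adjacency lists; `bipartite_or_odd_cycle` and `cycle_graph` are illustrative names. In case (i) it returns the red side (odd layers); in case (ii) it returns the odd cycle built from the same-layer edge and the two tree paths to z.

```python
from collections import deque


def bipartite_or_odd_cycle(adj, s):
    """BFS from s in a connected graph.

    Case (i): returns (True, red), where red = nodes on odd layers.
    Case (ii): returns (False, cycle), an odd cycle x, y, ..., z, ..., x.
    """
    layer, parent = {s: 0}, {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in layer:
                layer[v], parent[v] = layer[u] + 1, u
                queue.append(v)
    for x in layer:
        for y in adj[x]:
            if layer[x] == layer[y]:
                # Same layer Lj: climb in lockstep until both reach z = lca(x, y).
                a, b = [x], [y]
                while a[-1] != b[-1]:
                    a.append(parent[a[-1]])
                    b.append(parent[b[-1]])
                # Edge x-y, path y..z, path z..x: 1 + 2(j - i) edges, which is odd.
                return False, [x] + b + a[-2::-1]
    return True, {v for v in layer if layer[v] % 2 == 1}


def cycle_graph(k):
    """Hypothetical test input: the cycle 1-2-...-k-1 as adjacency lists."""
    return {i: [i % k + 1, (i - 2) % k + 1] for i in range(1, k + 1)}


print(bipartite_or_odd_cycle(cycle_graph(5), 1))  # (False, [3, 4, 5, 1, 2, 3])
ok, red = bipartite_or_odd_cycle(cycle_graph(6), 1)
print(ok, sorted(red))                            # True [2, 4, 6]
```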
Obstruction to Bipartiteness
• Corollary. A graph G is bipartite iff it contains no odd-length cycle.
[Figure: a 5-cycle C; a bipartite (2-colorable) graph vs. a non-bipartite (not 2-colorable) graph.]
Q1: Destroying paths
Suppose that an n-node undirected graph G = (V, E) contains two nodes s and t
such that the distance between s and t is strictly greater than n/2. Show that
there must exist some node v, not equal to either s or t, such that deleting v
from G destroys all s-t paths.
Give an algorithm with running time O(m + n) to find such a node.
Answer
Run BFS starting from s. Let d be the layer where you encounter t. By assumption, d >
n/2.
Now we claim that one of the layers L1, …, Ld-1 has a single node. Why? If not, each of these
d-1 layers has at least 2 nodes, so together they contain at least 2(d-1) ≥ n-1 nodes (since
d > n/2). Adding s and t, which are not in these layers, gives at least n+1 nodes. But G has
only n nodes.
Now let Li be a layer containing a single node v. Suppose we delete v. Consider the set
X of all nodes in layers L0, …, Li-1. This set contains s but cannot contain t, since t is in Ld with d > i.
By the properties of BFS, any edge out of a node in X leads either to another node of X or to a
node in Li. But v is the only node in Li, so once v is deleted no edge leaves X, and no s-t path remains.
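The argument gives an O(m + n) algorithm: one BFS from s, then a scan of layers L1, …, Ld-1 for a singleton. A Python sketch, assuming adjacency lists in a dict; `separating_node` and the six-node example graph (where dist(1, 6) = 4 > 6/2) are hypothetical.

```python
def separating_node(adj, s, t):
    """If dist(s, t) > n/2, return a node v (not s or t) whose deletion
    destroys all s-t paths. One BFS, O(m + n)."""
    layers, seen = [[s]], {s}
    # Build BFS layers until t is found or the frontier is empty.
    while layers[-1] and t not in seen:
        nxt = []
        for u in layers[-1]:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        layers.append(nxt)
    if t not in seen:
        return None  # no s-t path to destroy
    d = len(layers) - 1  # t lies in layer Ld
    for i in range(1, d):
        if len(layers[i]) == 1:
            return layers[i][0]
    return None  # only possible when d <= n/2


# Hypothetical example: n = 6, s = 1, t = 6, layers [1], [2, 3], [4], [5], [6].
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
adj = {u: [] for u in range(1, 7)}
for u, v in EDGES:
    adj[u].append(v)
    adj[v].append(u)
print(separating_node(adj, 1, 6))  # 4
```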
Q2: Interference-free paths
Consider the following robotics question. You have an undirected graph G = (V,E) that
represents the floor plan of a building, and there are two robots located at nodes a
and b. The robot at node a wants to move to node c; the robot at node b wants to
move to location d.
This is done using a schedule: a function that at each time step, specifies that a robot
moves across a single edge. A schedule is interference-free if there is no point at
which the two robots occupy nodes that are at a distance ≤ r from one another. (We
assume that a-b and c-d are sufficiently far apart.)
Give an algorithm to tell if there is an interference-free schedule that the robots can use.
Answer
Consider not the graph G itself but the “product” H of G with itself.
Nodes of H: pairs (u,v) where u, v are nodes of G.
Edges of H: ((u,v), (u’, v’)) where
1. Either u = u’ and there is an edge between v and v’ in G
2. Or v = v’ and there is an edge between u and u’ in G
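Now delete from H all nodes (u, v) where there would be interference, i.e., where u and v are at distance ≤ r in G, getting a graph H’.
Check if there is a path from (a,b) to (c,d) in H’.
Complexity: O(mn + n^2). Computing the distances takes one BFS per node of G, O(n(m + n)) in total, and H’ has at most n^2 nodes and O(mn) edges.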
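A Python sketch of the product-graph approach, assuming one robot moves per time step (as in the schedule definition), distance means shortest-path distance in G, and G is given as adjacency lists. H’ is generated on the fly during BFS rather than stored. `interference_free`, `bfs_distances`, and the small test graphs are hypothetical.

```python
from collections import deque
from math import inf


def bfs_distances(adj, s):
    """Shortest-path distances from s in an unweighted graph."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def interference_free(adj, a, b, c, d, r):
    """Can robot 1 go a -> c and robot 2 go b -> d without ever being
    within distance r of each other?"""
    # All-pairs distances: one BFS per node, O(n(m + n)) in total.
    dist = {u: bfs_distances(adj, u) for u in adj}

    def allowed(u, v):  # node (u, v) of H survives in H'
        return dist[u].get(v, inf) > r

    if not (allowed(a, b) and allowed(c, d)):
        return False
    seen = {(a, b)}
    queue = deque([(a, b)])
    while queue:
        u, v = queue.popleft()
        if (u, v) == (c, d):
            return True
        # Edges of H: robot 1 moves (v fixed) or robot 2 moves (u fixed).
        moves = [(u2, v) for u2 in adj[u]] + [(u, v2) for v2 in adj[v]]
        for state in moves:
            if state not in seen and allowed(*state):
                seen.add(state)
                queue.append(state)
    return False


path5 = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
cycle6 = {i: [i % 6 + 1, (i - 2) % 6 + 1] for i in range(1, 7)}
print(interference_free(path5, 1, 5, 2, 4, 1))   # True
print(interference_free(path5, 1, 5, 5, 1, 1))   # False: robots cannot pass on a path
print(interference_free(cycle6, 1, 4, 4, 1, 1))  # True: both rotate around the cycle
```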
Connectivity in Directed Graphs
Directed Graphs
Directed graph. G = (V, E)
Edge (u, v) goes from node u to node v.
Ex. Web graph - hyperlink points from one web page to another.
Directedness of graph is crucial.
Modern web search engines exploit hyperlink structure to rank web pages by
importance.
World Wide Web
Web graph.
• Node: web page.
• Edge: hyperlink from one page to another.
[Figure: a small web graph with pages such as facebook.com, twitter.com, instagram.com, and hbo.com.]
Road network
Vertex = intersection; edge = one-way street.
Political blogosphere graph
• Vertex = political blog; edge = link.
Ecological Food Web
Food web graph.
• Node = species.
• Edge = from prey to predator.
Reference: http://www.twingroves.district96.k12.il.us/Wetlands/Salamander/SalGraphics/salfoodweb.giff
Graph search
Directed reachability. Given a node s, find all nodes reachable from s.
Directed s-t shortest path problem. Given two nodes s and t, what is the length of the
shortest path between s and t?
Graph search. BFS extends naturally to directed graphs.
Web crawler. Start from web page s. Find all web pages linked from s, either directly or
indirectly.
Strong Connectivity
Def. Nodes u and v are mutually reachable if there is a path from u to v and also a path
from v to u.
Def. A graph is strongly connected if every pair of nodes is mutually reachable.
Lemma. Let s be any node. G is strongly connected iff every node is reachable from s,
and s is reachable from every node.
Pf. (⇒) Follows from the definition.
(⇐) Path from u to v: concatenate the u-s path with the s-v path.
Path from v to u: concatenate the v-s path with the s-u path. ▪
[Figure: paths from u to v and from v to u through s; it is OK if the paths overlap.]
Strong Connectivity: Algorithm
Theorem. Can determine if G is strongly connected in O(m + n) time.
Pf.
Pick any node s.
Run BFS from s in G.
Run BFS from s in Grev, the graph obtained by reversing the orientation of every edge in G.
Return true iff all nodes reached in both BFS executions.
Correctness follows immediately from previous lemma. ▪
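A minimal Python sketch of the two-BFS test, assuming nodes are given as a list and edges as ordered pairs; `reachable` and `is_strongly_connected` are illustrative names.

```python
from collections import deque


def reachable(adj, s):
    """Set of nodes reachable from s by BFS along directed edges."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_strongly_connected(nodes, edges):
    adj = {u: [] for u in nodes}
    rev = {u: [] for u in nodes}  # G_rev: every edge reversed
    for u, v in edges:
        adj[u].append(v)
        rev[v].append(u)
    s = nodes[0]  # any node works
    # Every node reachable from s in G, and s reachable from every node (BFS in G_rev).
    return len(reachable(adj, s)) == len(nodes) and len(reachable(rev, s)) == len(nodes)


print(is_strongly_connected([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))  # True
print(is_strongly_connected([1, 2, 3], [(1, 2), (2, 3)]))          # False
```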
[Figure: a strongly connected graph and a graph that is not strongly connected.]
Strong components
• Def. A strong component is a maximal subset of
mutually reachable nodes.
• Theorem. [Tarjan 1972] Can find all strong components
in O(m + n) time.
DAGs and Topological Ordering
Directed Acyclic Graphs
Def. A DAG is a directed graph that contains no directed cycles.
Ex. Precedence constraints: edge (vi, vj) means vi must precede vj.
Def. A topological order of a directed graph G = (V, E) is an ordering of its nodes as v1,
v2, …, vn so that for every edge (vi, vj) we have i < j.
[Figure: a DAG on v1, …, v7, and a topological ordering v1, v2, …, v7 of it.]
Precedence Constraints
Precedence constraints. Edge (vi, vj) means task vi must occur before vj.
Applications.
Course prerequisite graph: course vi must be taken before vj.
Compilation: module vi must be compiled before vj.
Pipeline of computing jobs: output of job vi needed to determine input of job vj.
Directed Acyclic Graphs
Lemma. If G has a topological order, then G is a DAG.
Pf. (by contradiction)
Suppose that G has a topological order v1, …, vn and that G also has a directed cycle
C. Let's see what happens.
Let vi be the lowest-indexed node in C, and let vj be the node just before vi; thus (vj, vi)
is an edge.
By our choice of i, we have i < j.
On the other hand, since (vj, vi) is an edge and v1, …, vn is a topological order, we must
have j < i, a contradiction. ▪
[Figure: the directed cycle C drawn over the supposed topological order v1, …, vi, …, vj, …, vn.]
Directed Acyclic Graphs
Lemma. If G has a topological order, then G is a DAG.
Q. Does every DAG have a topological ordering?
Q. If so, how do we compute one?
Directed Acyclic Graphs
Lemma. If G is a DAG, then G has a node with no incoming edges.
Pf. (by contradiction)
Suppose that G is a DAG and every node has at least one incoming edge. Let's see
what happens.
Pick any node v, and begin following edges backward from v. Since v has at least one
incoming edge (u, v) we can walk backward to u.
Then, since u has at least one incoming edge (x, u), we can walk backward to x.
Repeat until we visit a node, say w, twice; since G has only n nodes, this happens within n + 1 steps.
Let C denote the sequence of nodes encountered between successive visits to w. C is
a cycle. ▪
Directed Acyclic Graphs
Lemma. If G is a DAG, then G has a topological ordering.
Pf. (by induction on n)
Base case: true if n = 1.
Given DAG on n > 1 nodes, find a node v with no incoming edges.
G - { v } is a DAG, since deleting v cannot create cycles.
By inductive hypothesis, G - { v } has a topological ordering.
Place v first in topological ordering; then append nodes of G - { v }
in topological order. This is valid since v has no incoming edges. ▪
Topological Sorting Algorithm: Running Time
Theorem. Algorithm finds a topological order in O(m + n) time.
Pf.
Maintain the following information:
– count[w] = remaining number of incoming edges
– S = set of remaining nodes with no incoming edges
Initialization: O(m + n) via single scan through graph.
Update: to delete v
– remove v from S
– decrement count[w] for all edges from v to w, and add w to S if count[w] hits 0
– this is O(1) per edge ▪
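A Python sketch of the algorithm with exactly this bookkeeping (the queue-based method is widely known as Kahn's algorithm). `count` and `S` follow the proof above; keeping `S` as a FIFO queue is an assumption, since any set discipline works. The course-prerequisite DAG in the example is hypothetical.

```python
from collections import deque


def topological_order(nodes, edges):
    adj = {u: [] for u in nodes}
    count = {u: 0 for u in nodes}  # count[w] = remaining number of incoming edges
    for u, v in edges:             # initialization: one scan, O(m + n)
        adj[u].append(v)
        count[v] += 1
    S = deque(u for u in nodes if count[u] == 0)  # remaining nodes with no incoming edges
    order = []
    while S:
        v = S.popleft()  # "delete" v: place it next in the order
        order.append(v)
        for w in adj[v]:  # O(1) per edge
            count[w] -= 1
            if count[w] == 0:
                S.append(w)
    if len(order) < len(nodes):
        raise ValueError("graph has a directed cycle, so no topological order exists")
    return order


# Hypothetical prerequisites: 1 before 2 and 3, both before 4, 4 before 5.
print(topological_order([1, 2, 3, 4, 5], [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]))
# [1, 2, 3, 4, 5]
```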
Question
Can you have multiple topological orderings for a graph?
[Figure: the DAG on v1, …, v7 from the earlier example.]
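In general, yes: a DAG can have more than one topological order. A brute-force Python sketch (exponential time, for small examples only) lists them all by trying every node with no remaining incoming edges at each step; `all_topological_orders` and the four-node "diamond" DAG a → b, a → c, b → d, c → d are hypothetical.

```python
def all_topological_orders(nodes, edges):
    adj = {u: [] for u in nodes}
    indeg = {u: 0 for u in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    orders, order, placed = [], [], set()

    def extend():
        if len(order) == len(nodes):
            orders.append(list(order))
            return
        # Any unplaced node with no remaining incoming edges may go next.
        for v in nodes:
            if v not in placed and indeg[v] == 0:
                placed.add(v)
                order.append(v)
                for w in adj[v]:
                    indeg[w] -= 1
                extend()
                for w in adj[v]:  # undo the choice and try the next candidate
                    indeg[w] += 1
                order.pop()
                placed.remove(v)

    extend()
    return orders


print(all_topological_orders(["a", "b", "c", "d"],
                             [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
# [['a', 'b', 'c', 'd'], ['a', 'c', 'b', 'd']]
```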
Q3: Reachability game
Suppose you have a bipartite directed graph with nodes in the two partitions colored red
and blue, and two players: Red and Blue.
Red and Blue play a game where a token gets moved along edges of the graph. At each
point, the player whose name matches the color of the current node pushes the token.
Initially the token is at s (a red node).
The objective of the game is that Red wants the token to avoid a certain set of blue
nodes X. Blue wants the token to get to X at some point in the game; Red wants to
avoid this. If the token gets to X at any point, the game is over and Blue wins. Aside
from this there is no time bound on the game.
Can you give an algorithm that, given the graph, s, and X, can tell if Red has a strategy to win this game?
Strongly connected components
Theorem. Can find all strongly connected components of G in O(m + n) time.
Pf.
Do a DFS in G, ranking nodes u in decreasing order of finishing time f(u).
Run DFS in Grev with root nodes selected in order according to the above rank.
Return each DFS tree in the second DFS as a separate component. ▪
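A Python sketch of the two-pass procedure above (commonly attributed to Kosaraju and Sharir; the Tarjan 1972 result cited earlier is a different, one-pass algorithm). It assumes nodes as a list and edges as ordered pairs; `strong_components` and the five-node example are hypothetical. Recursion keeps it close to the proof, so it suits small graphs only.

```python
def strong_components(nodes, edges):
    adj = {u: [] for u in nodes}
    rev = {u: [] for u in nodes}  # G_rev
    for u, v in edges:
        adj[u].append(v)
        rev[v].append(u)

    # Pass 1: DFS in G; `finish` lists nodes in increasing finishing time f(u).
    seen, finish = set(), []

    def dfs_g(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                dfs_g(v)
        finish.append(u)  # u finishes after all nodes first reached from it

    for u in nodes:
        if u not in seen:
            dfs_g(u)

    # Pass 2: DFS in G_rev, choosing roots in decreasing order of f(u).
    assigned, components = set(), []

    def dfs_rev(u, component):
        assigned.add(u)
        component.append(u)
        for v in rev[u]:
            if v not in assigned:
                dfs_rev(v, component)

    for u in reversed(finish):
        if u not in assigned:
            component = []
            dfs_rev(u, component)
            components.append(component)  # each DFS tree is one strong component
    return components


print(strong_components([1, 2, 3, 4, 5],
                        [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4)]))
# [[1, 3, 2], [4, 5]]
```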