Module 3 Final
Greedy Approach
As discussed in the previous module, greedy algorithms are generally used for optimization problems –
either to minimize or to maximize the value of some objective functions. The greedy approach constructs
a solution through a sequence of steps, each expanding a partially constructed solution obtained so far,
until a complete solution to the problem is reached. Although such an approach may not work for some
computational tasks, there are many for which it is optimal.
In the previous module we saw two maximization problems – the knapsack problem, where we maximize
the value of items that can be taken in a knapsack of a given capacity, and the job sequencing problem,
where we try to find the maximum profit that can be obtained by optimally scheduling a subset of tasks.
In this module we will explore three greedy algorithms (Prim’s, Kruskal’s and Dijkstra’s) that work on
graphs and another one to create an optimal code tree.
The following problem arises naturally in many practical situations: given n points, connect them in the
cheapest possible way so that there will be a path between every pair of points. It has direct applications
to the design of all kinds of networks— including communication, computer, transportation, and
electrical—by providing the cheapest way to achieve connectivity. It identifies clusters of points in data
sets.
We can represent the points given by vertices of a graph, possible connections by the graph’s edges, and
the connection costs by the edge weights. Then the question can be posed as the minimum spanning tree
problem.
A spanning tree of a connected graph is its connected acyclic subgraph (i.e., a tree) that contains all the
vertices of the graph. If such a graph has weights assigned to its edges, a minimum spanning tree (MST)
is its spanning tree of the smallest weight, where the weight of a tree is defined as the sum of the weights
on all its edges.
The minimum spanning tree problem is the problem of finding a minimum spanning tree for a given
weighted connected graph.
The figure below shows a weighted graph and its three spanning trees. The spanning trees have weights of
6, 9 and 8, and the minimum spanning tree is the tree that has weight 6.
A brute-force approach (using exhaustive search) to find a minimum spanning tree is to list all possible
spanning trees and pick the one with the minimum cost among them; this approach has exponential
time complexity.
All the well-known efficient algorithms for finding minimum spanning trees are applications of the greedy
method. We apply the greedy method by iteratively choosing objects (edges in our case) to join a growing
collection, by incrementally picking an object that minimizes the value of an objective function. In the case
of the minimal spanning tree problem, the objective function is the sum of edge weights in the spanning
tree.
Two classic algorithms for the minimum spanning tree problem are Prim’s algorithm and Kruskal’s
algorithm. Both these algorithms solve the problem by building the final spanning tree edge by edge.
However, the way they select the edges differs, though both of them always yield an optimal solution.
Prim’s algorithm
Prim’s algorithm constructs a minimum spanning tree through a sequence of expanding subtrees. The
initial subtree in such a sequence consists of a single vertex selected arbitrarily from the set V of the graph’s
vertices. On each iteration, the algorithm expands the current tree in the greedy manner by simply attaching
to it the nearest vertex not in that tree. (By the nearest vertex, we mean a vertex not in the tree connected
to a vertex in the tree by an edge of the smallest weight. Ties can be broken arbitrarily). The algorithm
stops after all the graph’s vertices have been included in the tree being constructed.
Since the algorithm expands the tree by exactly one vertex on each of its iterations, the total number of such
iterations is n − 1, where n is the number of vertices in the graph.
The table below shows the growth of the tree after every iteration of the loop in the algorithm.
The running time of Prim’s algorithm depends on the data structures chosen for the graph itself and for the
priority queue of the set V − VT, whose vertex priorities are the distances to the nearest tree vertices. Since
there are |V| − 1 iterations of the main loop, the algorithm's running time will be in Θ(|V|²) if the graph is
represented by its weight matrix and the priority queue is implemented as an unordered array. If the graph
is represented by its adjacency lists and the priority queue is implemented as a min-heap, the running time
of the algorithm is in O(|E| log |V|).
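As a rough illustration (a minimal sketch of my own, not the text's pseudocode), the following C program implements the Θ(|V|²) variant: the graph is a weight matrix, the "priority queue" is just an unordered array of distances, and INF is a sentinel I introduce to mean "no edge". The example graph in main is arbitrary.

#include <stdio.h>
#define V 5
#define INF 1000000   /* stands in for "no edge" */

/* Prim's algorithm: weight-matrix graph, unordered-array priority queue -> Theta(V^2). */
void prim(int w[V][V]) {
    int dist[V], parent[V], inTree[V];
    for (int i = 0; i < V; i++) { dist[i] = INF; parent[i] = -1; inTree[i] = 0; }
    dist[0] = 0;                            /* start from an arbitrary vertex, here 0 */
    for (int it = 0; it < V; it++) {
        int u = -1;
        for (int v = 0; v < V; v++)         /* pick the nearest vertex not yet in the tree */
            if (!inTree[v] && (u == -1 || dist[v] < dist[u])) u = v;
        inTree[u] = 1;
        if (parent[u] != -1)
            printf("edge %d-%d weight %d\n", parent[u], u, w[parent[u]][u]);
        for (int v = 0; v < V; v++)         /* update distances of the fringe vertices */
            if (!inTree[v] && w[u][v] < dist[v]) { dist[v] = w[u][v]; parent[v] = u; }
    }
}

int main(void) {
    int w[V][V] = {                         /* an arbitrary 5-vertex weighted graph */
        {INF, 2, INF, 6, INF},
        {2, INF, 3, 8, 5},
        {INF, 3, INF, INF, 7},
        {6, 8, INF, INF, 9},
        {INF, 5, 7, 9, INF}
    };
    prim(w);
    return 0;
}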
We can stop here as 5 edges are already added to the MST. The MST of
a graph with |V| nodes can have only |V| − 1 edges; adding any other
edge will create a cycle.
Though both algorithms (Prim’s and Kruskal’s) appear to be simple, Kruskal’s algorithm has to check
whether the addition of the next edge to the edges already selected would create a cycle. A new cycle is
created if and only if the new edge connects two vertices already connected by a path, i.e., if and only if
the two vertices belong to the same connected component. Note also that each connected component of a
subgraph generated by Kruskal's algorithm is a tree because it has no cycles. One efficient way to check
whether adding a new edge to this set of disconnected trees would create a cycle is the union-find
algorithm.
With an efficient union find algorithm, the running time of Kruskal's algorithm will be dominated by the
time needed for sorting the edge weights of a given graph. Hence, with an efficient sorting algorithm, the
time efficiency of Kruskal's algorithm will be in O(|E| log |E|).
Kruskal's algorithm requires a dynamic partition of some n-element set S into a collection of disjoint subsets
S1, S2, ..., Sk.
1. After being initialized as a collection of n one-element subsets, each containing a different element
of S, the collection is subjected to a sequence of intermixed union and find operations.
2. The number of union operations in any such sequence must be bounded above by n − 1 because
each union increases a subset’s size at least by 1 and there are only n elements in the entire set S.
This is represented by an abstract data type for a collection of disjoint subsets of a finite set. Let us use an array
of integers, called par[]. If we are dealing with n items, the i-th element of the array holds the parent
of the i-th item. Initially, all entries are set to -1.
for(int i = 0; i < n; i++)   /* every item starts as the root of its own one-node tree */
    par[i] = -1;
1. makeset(x) creates a one-element set {x}. It is assumed that this operation can be applied to each of
the elements of set S only once.
2. find(x) returns the subset containing x. The task is to find the representative of the set of a given element;
the representative is always the root of the tree. So we implement find() by traversing the
parent array until we hit a node that is a root (marked by -1 in par[]).
while(par[u] != -1)   /* climb parent links until the root (marked by -1) is reached */
    u = par[u];
3. union(x, y) constructs the union of the disjoint subsets Sx and Sy containing x and y, respectively,
and adds it to the collection to replace Sx and Sy, which are deleted from it. The task is to combine
two sets and make one. It takes two elements as input and finds the representatives of their sets
using the Find operation, and finally puts either one of the trees (representing the set) under the root
node of the other tree.
if(u != v)            /* u and v are the roots of two different trees */
    if(u < v)
        par[v] = u;   /* make the smaller-numbered root the representative */
    else
        par[u] = v;
Applying makeset(i) six times, once for each element in the set, initializes the structure to the
collection of six singleton sets: {1}, {2}, {3}, {4}, {5}, {6}.
Performing union(1, 4) and union(5, 2) yields {1, 4}, {5, 2}, {3}, {6}.
If union(4, 5) is now called, we get the sets {1, 4, 5, 2}, {3}, {6}.
Most implementations of this abstract data type assign a common representative object for each object in
the subset. This representative object is usually one of the members of the subset. Some implementations
do not impose any specific constraints on such a representative. Some implementations, however, require
the smallest element of each subset to be used as the subset’s representative.
The representative object of any two objects x and y will be different unless both of them are in the same
subset. It is usually assumed that the set elements are (or can be mapped into) integers.
Edges arranged in ascending order of weights: 1-2, 4-5, 0-1, 1-5, 2-5, 0-5, 3-5, 0-4, 2-3, 3-4.
There are two principal alternatives for implementing this data structure.
The first one, called the quick find, optimizes the time efficiency of the find operation; the second one,
called the quick union, optimizes the union operation.
Under this scheme (quick union), the implementation of makeset(x) requires assigning a value to the
corresponding element of the parent array. The time efficiency of this operation is obviously in O(1), and
hence the initialization of n singleton subsets is in O(n). A single find(x) operation takes O(n) time in the
worst case, while union(x, y), applied to two roots, takes constant time, O(1).
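Putting the pieces together, here is a minimal C sketch of Kruskal's algorithm (my own, not the text's code): sort the edge list with qsort, then use the par[]-based find and union above to reject edges that would create a cycle. The edge list and weights in main, and the helper name unite (union is a C keyword), are assumptions made for the example.

#include <stdio.h>
#include <stdlib.h>

typedef struct { int u, v, w; } Edge;

int par[100];                        /* parent array: -1 marks a root */

int find(int u) {                    /* climb parent links to the root */
    while (par[u] != -1) u = par[u];
    return u;
}

void unite(int u, int v) {           /* attach one root under the other */
    if (u < v) par[v] = u; else par[u] = v;
}

int cmp(const void *a, const void *b) {
    return ((const Edge *)a)->w - ((const Edge *)b)->w;
}

int main(void) {
    int n = 6;                       /* number of vertices, assumed for the example */
    Edge e[] = { {0,1,4}, {1,2,1}, {4,5,2}, {1,5,5}, {2,5,6},
                 {0,5,7}, {3,5,8}, {0,4,9}, {2,3,10}, {3,4,11} };
    int m = (int)(sizeof e / sizeof e[0]);

    for (int i = 0; i < n; i++) par[i] = -1;   /* makeset for every vertex */
    qsort(e, m, sizeof(Edge), cmp);            /* sort edges by weight: O(E log E) */

    int added = 0;
    for (int i = 0; i < m && added < n - 1; i++) {
        int ru = find(e[i].u), rv = find(e[i].v);
        if (ru != rv) {                        /* different components: no cycle */
            unite(ru, rv);
            printf("take edge %d-%d weight %d\n", e[i].u, e[i].v, e[i].w);
            added++;
        }
    }
    return 0;
}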
There are many applications where finding the weight (length) of paths between two nodes in a graph is
required. For example, in a road network, the interconnection structure of a set of roads is modelled as a
graph whose vertices are intersections and dead ends, and whose edges are defined by the segments
of road that exist between pairs of such vertices. In such contexts, we often want to find the shortest
path between two vertices in the road network.
The application of Dijkstra's algorithm is limited to graphs with nonnegative weights only.
Dijkstra's algorithm finds the shortest paths to a graph's vertices in order of their distance from a given
source. First, it finds the shortest path from the source to the vertex nearest to it, then to a second nearest,
and so on. The vertices adjacent to the vertices already in the tree constructed so far are referred to as
"fringe vertices"; they are the candidates from which Dijkstra's algorithm selects the next vertex nearest to the source.
To facilitate the algorithm’s operations, we label each vertex with two labels. The numeric label d indicates
the length of the shortest path from the source to that vertex. The other label indicates the name of the next-
to-last vertex on such a path, i.e., the parent of the vertex in the tree being constructed. With such labelling,
finding the next nearest vertex u∗ becomes a simple task of finding a fringe vertex with the smallest d
value.
The labeling and mechanics of Dijkstra 's algorithm are quite similar to those used by Prim's algorithm.
Both of them construct an expanding subtree of vertices by selecting the next vertex from the priority
queue of the remaining vertices. It is important not to mix them up, however. They solve different problems
and therefore operate with priorities computed in a different manner: Dijkstra's algorithm compares path
lengths and therefore must add edge weights, while Prim's algorithm compares the edge weights as given.
Example: Use Dijkstra’s algorithm to find the shortest path from a to all other vertices of the graph given
below.
a-b : 3
a-b-d : 3+2=5
a-b-c : 3+4=7
a-b-d-e : 3+2+4=9
The shortest paths (identified by following nonnumeric labels backward from a destination vertex in the
left column to the source) and their lengths (given by numeric labels of the tree vertices) are as follows:
from a to b : a − b of length 3
from a to d : a − b − d of length 5
from a to c : a − b − c of length 7
from a to e : a − b − d − e of length 9
Dijkstra’s algorithm is very similar to Prim’s algorithm except that values assigned to the vertices in the
priority queue are the path weights rather than edge weights. The time efficiency of Dijkstra's algorithm
depends on the data structures used for implementing the priority queue and for representing an input graph
itself. For graphs represented by their weight matrix and the priority queue implemented as an unordered
array, the time complexity is Θ(|V|²). For graphs represented by their adjacency lists and the priority queue
implemented as a min-heap, it is in O(|E| log |V|).
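As a minimal illustration (my own sketch, not the text's pseudocode), here is the Θ(|V|²) variant in C: a weight matrix plus an unordered array of d-values, where the only change from the Prim sketch earlier is that fringe priorities are path lengths d[u] + w(u, v) rather than edge weights. The example figure is not reproduced in these notes, so the matrix below is my guess at a graph that reproduces the shortest paths listed above (vertex 0 plays the role of source a, and 1, 2, 3, 4 play b, c, d, e).

#include <stdio.h>
#define V 5
#define INF 1000000   /* no edge / not yet reached */

/* Dijkstra: nonnegative weights, weight-matrix graph, unordered array -> Theta(V^2). */
void dijkstra(int w[V][V], int src) {
    int d[V], parent[V], done[V];
    for (int i = 0; i < V; i++) { d[i] = INF; parent[i] = -1; done[i] = 0; }
    d[src] = 0;
    for (int it = 0; it < V; it++) {
        int u = -1;
        for (int v = 0; v < V; v++)              /* fringe vertex with the smallest d */
            if (!done[v] && (u == -1 || d[v] < d[u])) u = v;
        if (d[u] == INF) break;                  /* remaining vertices are unreachable */
        done[u] = 1;
        for (int v = 0; v < V; v++)              /* relax the edges out of u */
            if (!done[v] && w[u][v] != INF && d[u] + w[u][v] < d[v]) {
                d[v] = d[u] + w[u][v];
                parent[v] = u;
            }
    }
    for (int v = 0; v < V; v++)
        printf("vertex %d: distance %d, parent %d\n", v, d[v], parent[v]);
}

int main(void) {
    int w[V][V] = {                 /* assumed graph: a-b 3, a-d 7, b-c 4, b-d 2, c-d 5, c-e 6, d-e 4 */
        {INF, 3, INF, 7, INF},
        {3, INF, 4, 2, INF},
        {INF, 4, INF, 5, 6},
        {7, 2, 5, INF, 4},
        {INF, INF, 6, 4, INF}
    };
    dijkstra(w, 0);                 /* prints distances 3, 7, 5, 9 for b, c, d, e */
    return 0;
}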
Another way of getting a coding scheme that yields a shorter bit string on average is to assign shorter
code words to more frequent characters and longer code words to less frequent characters. But this
introduces a problem: how can we tell how many bits of an encoded text represent each symbol? To
avoid this problem we use a prefix code, in which no code word is a prefix of the code word of another symbol. To
construct such a prefix code we apply a greedy algorithm invented by David Huffman.
Huffman's Algorithm
Step 1 Initialize n one-node trees and label them with the characters of the alphabet. Record the frequency
of each character in its tree's root to indicate the tree's weight. (More generally, the weight of a tree will be
equal to the sum of the frequencies in the tree's leaves.)
Step 2 Repeat the following operation until a single tree is obtained. Find two trees with the smallest weights
(ties can be broken arbitrarily). Make them the left and right subtrees of a new tree and record the sum of
their weights in the root of the new tree as its weight.
The tree constructed by the above algorithm is called a Huffman tree, and the code obtained from it is
called a Huffman code.
Huffman Tree (HT) – HT is a binary tree that minimizes the weighted path length from the root to the
leaves for the predefined leaf weights.
Huffman Codes (HC) – HC is an optimal variable-length prefix encoding scheme that assigns bit strings
to symbols based on their frequencies in a given text.
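As a rough illustration of the two steps above, here is a compact C sketch (my own, not the text's). For simplicity it finds the two lowest-weight trees by scanning an array instead of using a priority queue, merges them, and finally prints each symbol's code word by walking back up through parent links. The alphabet and frequencies in main are made up for the example.

#include <stdio.h>

#define NSYM 5
#define NNODE (2 * NSYM - 1)

double weight[NNODE];
int parent[NNODE], bit[NNODE];   /* bit[i]: is node i the 0- or 1-child of its parent */

int main(void) {
    char sym[NSYM] = {'A', 'B', 'C', 'D', '_'};            /* hypothetical alphabet */
    double freq[NSYM] = {0.35, 0.1, 0.2, 0.2, 0.15};        /* hypothetical frequencies */

    for (int i = 0; i < NNODE; i++) { parent[i] = -1; bit[i] = 0; }
    for (int i = 0; i < NSYM; i++) weight[i] = freq[i];

    /* Step 2: repeatedly merge the two current roots of smallest weight */
    for (int next = NSYM; next < NNODE; next++) {
        int a = -1, b = -1;
        for (int i = 0; i < next; i++) {
            if (parent[i] != -1) continue;                  /* already merged into a tree */
            if (a == -1 || weight[i] < weight[a]) { b = a; a = i; }
            else if (b == -1 || weight[i] < weight[b]) b = i;
        }
        parent[a] = next; bit[a] = 0;
        parent[b] = next; bit[b] = 1;
        weight[next] = weight[a] + weight[b];
    }

    /* read each symbol's code word by following parent links, then reverse it */
    for (int i = 0; i < NSYM; i++) {
        char code[NNODE]; int len = 0;
        for (int j = i; parent[j] != -1; j = parent[j]) code[len++] = (char)('0' + bit[j]);
        printf("%c: ", sym[i]);
        for (int k = len - 1; k >= 0; k--) putchar(code[k]);
        putchar('\n');
    }
    return 0;
}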
Example
Consider the five-symbol alphabet {A, B, C, D, _} with the following occurrence frequencies/probabilities
in a text made up of these symbols:
Encode the string DAD and decode the bit string 1001101101110101.
Solution: The Huffman tree construction for this input is shown in Figure below.
With the occurrence frequencies/probabilities given and the codeword lengths obtained, the average
(mean) number of bits per symbol in this code is 2.25.
Had we used a fixed-length encoding for the same alphabet, we would have to use at least 3 bits per
symbol. Thus, for this example, Huffman's code achieves a compression ratio (a standard measure of a
compression algorithm's effectiveness) of (3 − 2.25)/3 = 0.25 = 25%. In other words, Huffman's encoding of
the text will use 25% less memory than its fixed-length encoding.
Compression ratio = [(number of bits per symbol with fixed-length encoding − average number of bits per symbol) / number of bits per symbol with fixed-length encoding] × 100
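In symbols (with p_i the probability of the i-th symbol and l_i the length of its code word), the two quantities used in this example are:

\bar{l} = \sum_i p_i \, l_i = 2.25 \text{ bits per symbol}, \qquad
\text{compression ratio} = \frac{3 - \bar{l}}{3} \times 100\% = \frac{3 - 2.25}{3} \times 100\% = 25\%.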
We want to store a hypothetical text document in compressed form. The document contains words
consisting of only 6 characters (a, b, c, d, e, f). We have scanned the document and counted the occurrences of
each character in the document. The result is as follows:
Example 3: Construct a Huffman code for the following data, encode the string
ABACABAD, and decode 100010111001010.
Symbol:     A      B      C      D      -
Frequency:  0.4    0.1    0.2    0.15   0.15
There are three major variations of this transform-and-conquer idea that differ in what we transform a given instance into:
1. Transformation to a simpler or more convenient instance of the same problem; we call it instance
simplification.
Checking element uniqueness in an array: the brute-force algorithm takes O(n²) time. We can
transform the array by sorting it and then check for uniqueness by comparing adjacent elements in the
sorted array, which can be done in O(n log n) overall, as sketched below.
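A minimal C sketch of this presorting idea (my own example, not from the text): sort with qsort, then scan adjacent elements.

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns 1 if all elements are distinct: O(n log n) sort plus O(n) scan. */
int all_unique(int a[], int n) {
    qsort(a, n, sizeof(int), cmp_int);
    for (int i = 1; i < n; i++)
        if (a[i] == a[i - 1]) return 0;   /* duplicates end up adjacent after sorting */
    return 1;
}

int main(void) {
    int a[] = {7, 3, 9, 3, 1};
    printf("%s\n", all_unique(a, 5) ? "unique" : "not unique");
    return 0;
}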
A simple binary search tree (BST) can be transformed into a balanced BST (AVL tree) to
improve the performance in searching a key in a BST.
The Least Common Multiple (LCM) of two numbers, 'a' and 'b', can be calculated using the
formula: lcm(a, b) = (|a * b|) / gcd(a, b), where gcd(a, b) is the Greatest Common Divisor (GCD) of 'a'
and 'b'.
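For example, a small C sketch (my own) using Euclid's algorithm for gcd, dividing before multiplying to reduce the risk of overflow:

#include <stdio.h>

long long gcd(long long a, long long b) {      /* Euclid's algorithm */
    while (b != 0) { long long r = a % b; a = b; b = r; }
    return a;
}

long long lcm(long long a, long long b) {
    return (a / gcd(a, b)) * b;                /* lcm(a, b) = |a * b| / gcd(a, b) */
}

int main(void) {
    printf("lcm(24, 36) = %lld\n", lcm(24, 36));   /* prints 72 */
    return 0;
}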
A heap is a data structure that is commonly used to implement a priority queue. It is also employed in
heapsort algorithm.
A priority queue is a multiset of items with an orderable characteristic called an item’s priority, with the
following operations:
1. Finding an item with the highest (i.e., largest) priority; a heap supporting this is called a max-heap,
while a min-heap can be used to find the smallest item.
2. Deleting an item with the highest priority (the lowest priority in the case of a min-heap)
3. Adding a new item to the multiset
A heap can be defined as a binary tree with keys assigned to its nodes, one key per node, provided the
following two conditions are met:
1. The shape property—the binary tree is essentially complete (or simply complete), i.e., all its levels
are full except possibly the last level, where only some rightmost leaves may be missing.
2. The parental dominance or heap property—the key in each node is greater than (less than in the
case of min-heap) or equal to the keys in its children. (This condition is considered automatically
satisfied for all leaves.)
The figure below shows some trees which are heaps and a few others which are not heaps
Note that key values in a heap are ordered top down; that is, a sequence of values on any path from the
root to a leaf is decreasing (non-increasing, if equal keys are allowed). However, there is no left-to-right
order in key values; that is, there is no relationship among key values for nodes either on the same level of
the tree or, more generally, in the left and right subtrees of the same node.
1. There exists exactly one essentially complete binary tree with n nodes. Its height is equal to ⌊log2 n⌋.
2. The root of a heap always contains its largest element.
3. A node of a heap considered with all its descendants is also a heap.
4. A heap can be implemented as an array by recording its elements in the top-down, left-to-right
fashion.
There are two alternative approaches to constructing a heap for a given list of keys: the bottom-up heap
construction algorithm and the top-down heap construction algorithm.
The bottom-up heap construction algorithm initializes the essentially complete binary tree with n nodes by placing keys in the order
given and then "heapifies" the tree as follows. Starting with the last parental node, the algorithm checks
whether the parental dominance holds for the key at this node. If it does not, the algorithm exchanges the
node's key K with the larger key of its children and checks whether the parental dominance holds for K in
its new position. This process continues until the parental dominance requirement for K is satisfied. After
completing the "heapification" of the subtree rooted at the current parental node, the algorithm proceeds
to do the same for the node's immediate predecessor. The algorithm stops after this is done for the tree's
root.
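A compact C sketch of the bottom-up construction described above (my own, with 0-based array indices, so the children of node i are 2i + 1 and 2i + 2; the keys in main are arbitrary):

#include <stdio.h>

/* Sift the key at index i down until parental dominance holds (max-heap). */
void sift_down(int a[], int n, int i) {
    while (2 * i + 1 < n) {
        int big = 2 * i + 1;                            /* left child */
        if (big + 1 < n && a[big + 1] > a[big]) big++;  /* right child is larger */
        if (a[i] >= a[big]) break;                      /* dominance holds: stop */
        int t = a[i]; a[i] = a[big]; a[big] = t;        /* swap and continue below */
        i = big;
    }
}

/* Bottom-up heap construction: heapify every parental node, last one first. */
void build_heap(int a[], int n) {
    for (int i = n / 2 - 1; i >= 0; i--)
        sift_down(a, n, i);
}

int main(void) {
    int a[] = {2, 9, 7, 6, 5, 8};
    build_heap(a, 6);
    for (int i = 0; i < 6; i++) printf("%d ", a[i]);    /* prints a valid max-heap order */
    printf("\n");
    return 0;
}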
The top-down heap construction algorithm constructs a heap by successive insertions of a new key into a previously constructed heap;
it starts with an empty heap and stops when all elements have been inserted into the heap.
First, attach a new node with key K in it after the last leaf of the existing heap. Then shift K up to its
appropriate place in the new heap as follows.
Compare K with its parent’s key: if the latter is greater than or equal to K, stop (the structure is a
heap); otherwise, swap these two keys and compare K with its new parent. This swapping continues
until K is not greater than its last parent or it reaches the root (illustrated in Figure).
The figure below shows the sequence of key insertions done in the top-down construction of a heap for the
keys 2,8,6,1,10.
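A matching C sketch of this key-insertion (sift-up) step, again with 0-based indices where the parent of node i is (i − 1) / 2; main inserts the keys 2, 8, 6, 1, 10 mentioned above:

#include <stdio.h>

/* Insert key into the heap a[0..*n-1], growing it by one and sifting the key up. */
void heap_insert(int a[], int *n, int key) {
    int i = (*n)++;
    a[i] = key;                                  /* attach after the last leaf */
    while (i > 0 && a[(i - 1) / 2] < a[i]) {     /* parent is smaller: swap upward */
        int p = (i - 1) / 2;
        int t = a[p]; a[p] = a[i]; a[i] = t;
        i = p;
    }
}

int main(void) {
    int a[16], n = 0;
    int keys[] = {2, 8, 6, 1, 10};
    for (int k = 0; k < 5; k++) heap_insert(a, &n, keys[k]);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);   /* prints 10 8 6 1 2 */
    printf("\n");
    return 0;
}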
The deletion operation is normally performed to delete the root element of the heap; this is the dequeue
operation when the heap is used as a priority queue. The algorithm for deleting an element from a heap is given
below.
The figure below shows the use of the algorithm to delete from heap.
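The figure itself is not reproduced here; as a stand-in, a minimal C sketch of root deletion (my own): swap the root with the last leaf, shrink the heap by one, and sift the new root down with the same sift_down routine as in the construction sketch.

#include <stdio.h>

void sift_down(int a[], int n, int i) {          /* restore parental dominance below i */
    while (2 * i + 1 < n) {
        int big = 2 * i + 1;
        if (big + 1 < n && a[big + 1] > a[big]) big++;
        if (a[i] >= a[big]) break;
        int t = a[i]; a[i] = a[big]; a[big] = t;
        i = big;
    }
}

/* Delete the root (maximum) of the heap a[0..*n-1] and return it. */
int delete_max(int a[], int *n) {
    int max = a[0];
    a[0] = a[*n - 1];          /* move the last leaf to the root */
    (*n)--;                    /* the heap shrinks by one */
    sift_down(a, *n, 0);       /* re-heapify from the root */
    return max;
}

int main(void) {
    int a[] = {10, 8, 6, 1, 2};
    int n = 5;
    printf("deleted %d\n", delete_max(a, &n));        /* deletes 10 */
    for (int i = 0; i < n; i++) printf("%d ", a[i]);  /* prints 8 2 6 1 */
    printf("\n");
    return 0;
}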
Inserting an element into a heap of n elements cannot require more key comparisons than the heap’s height.
Since the height of a heap with n nodes is about log2 n, the time efficiency of insertion is in O(log n).
The efficiency of deletion is determined by the number of key comparisons needed to “heapify” the tree
after the swap has been made and the size of the tree is decreased by 1. Since this cannot require more key
comparisons than twice the heap’s height, the time efficiency of deletion is in O(log n) as well.
Heap sort
Heapsort is a sorting algorithm that uses a heap data structure to sort elements efficiently. It has a time
complexity of O(n log n) in the worst, average, and best cases. This is a two-stage algorithm that works
as follows:
Stage 1 (heap construction): Construct a heap for the given array.
Stage 2 (maximum deletions): Apply the root-deletion operation n − 1 times to the remaining heap.
In stage 2, the element deleted is placed immediately after the last element of the heap in the
underlying array structure.
For a given array of numbers, the steps in Stage 1 are shown in the figure below.
In Stage 2, each step deletes the maximum element from the heap and places it at the array position immediately
after the boundary of the (shrunken) heap. Deletion essentially involves swapping the root element with the last leaf node
of the heap and then re-heapifying the resulting tree. The heap structure and the array elements after each
deletion and re-heapification are shown in the figures below. The array elements which are no longer part of the heap
appear at the end of the array representation.
The time efficiency of heapsort is Θ(n log n) in both the worst and average cases. Each deletion
from the heap takes at most on the order of log n key comparisons, and since there are n − 1 such deletions, Stage 2 of heapsort
has a complexity of O(n log n). Stage 1, which constructs the heap, also has O(n log n) complexity. Hence
the overall complexity of heapsort is Θ(n log n).
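Combining the two stages, a minimal C heapsort sketch (my own): bottom-up construction, then repeated root deletions that park each maximum just past the shrinking heap boundary.

#include <stdio.h>

void sift_down(int a[], int n, int i) {
    while (2 * i + 1 < n) {
        int big = 2 * i + 1;
        if (big + 1 < n && a[big + 1] > a[big]) big++;
        if (a[i] >= a[big]) break;
        int t = a[i]; a[i] = a[big]; a[big] = t;
        i = big;
    }
}

void heap_sort(int a[], int n) {
    for (int i = n / 2 - 1; i >= 0; i--)             /* Stage 1: build a max-heap */
        sift_down(a, n, i);
    for (int end = n - 1; end > 0; end--) {          /* Stage 2: n - 1 root deletions */
        int t = a[0]; a[0] = a[end]; a[end] = t;     /* max goes just past the heap boundary */
        sift_down(a, end, 0);                        /* re-heapify the shrunken heap */
    }
}

int main(void) {
    int a[] = {2, 9, 7, 6, 5, 8};
    heap_sort(a, 6);
    for (int i = 0; i < 6; i++) printf("%d ", a[i]);   /* prints 2 5 6 7 8 9 */
    printf("\n");
    return 0;
}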
Dynamic programming is more often used in optimization problems though the technique can be used in
other problems also. We will first look at non-optimization problems where dynamic programming is used.
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ..., which can be defined by the simple recurrence F(n) = F(n − 1) + F(n − 2) for n ≥ 2, with F(0) = 0 and F(1) = 1.
If we try to use the recurrence directly to compute the nth Fibonacci number F(n), we would have to recompute
the same values of this function for smaller numbers many times. For example, computation of F(10)
requires computation of F(9) and F(8). The computation of F(9) also requires computation of F(8). If we
use recursive calls, F(8) will be computed twice. The time complexity of the algorithm can be reduced by
computing F(8) only once. Dynamic programming achieves this by noting down the result of the first
computation of F(8) and using this value in other computations where F(8) is required.
The classic dynamic programming approach works bottom up: it solves all the smaller subproblems of a given problem to obtain the final solution. The iterative version of computing a Fibonacci
number follows this approach of bottom-up dynamic programming. There also exists a top-down variation
of dynamic programming which uses memory functions.
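Both flavours can be sketched in a few lines of C (my own example): a bottom-up loop that fills F(0)..F(n) in order, and a top-down memoized recursion that records each F(i) the first time it is computed, which is a simple form of a memory function.

#include <stdio.h>

/* Bottom-up dynamic programming: fill the table from F(0) upward. */
long long fib_bottom_up(int n) {
    long long f[93];                  /* F(92) is the largest value that fits in 64 bits */
    f[0] = 0; f[1] = 1;
    for (int i = 2; i <= n; i++) f[i] = f[i - 1] + f[i - 2];
    return f[n];
}

/* Top-down with a memory function: compute F(i) once, then reuse the stored value. */
long long memo[93];
long long fib_memo(int n) {
    if (n < 2) return n;
    if (memo[n] == 0) memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
    return memo[n];
}

int main(void) {
    printf("%lld %lld\n", fib_bottom_up(10), fib_memo(10));   /* prints 55 55 */
    return 0;
}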
The binomial coefficient C(n, k) can be computed bottom up by filling a table (Pascal's triangle) row by
row, using the recurrence C(i, j) = C(i − 1, j − 1) + C(i − 1, j) with C(i, 0) = C(i, i) = 1; a sketch of this
computation is given after the analysis below.
What is the time efficiency of this algorithm? The algorithm's basic operation is addition. The inner loop
is executed about i times until the value of i reaches k; once i becomes greater than k, the loop is executed about k times.
Hence, the total number of additions, and therefore the time efficiency of the algorithm, is in Θ(nk).
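The pseudocode itself is not reproduced in these notes; as a stand-in, a straightforward C sketch of the Pascal's-triangle fill (the table size in the sketch assumes n ≤ 50):

#include <stdio.h>

/* C(n, k) by dynamic programming: fill Pascal's triangle row by row, Theta(nk) additions. */
long long binomial(int n, int k) {
    long long c[51][51] = {{0}};                       /* assumes n <= 50 */
    for (int i = 0; i <= n; i++)
        for (int j = 0; j <= (i < k ? i : k); j++)
            if (j == 0 || j == i)
                c[i][j] = 1;                           /* edges of the triangle */
            else
                c[i][j] = c[i - 1][j - 1] + c[i - 1][j];   /* the one addition per cell */
    return c[n][k];
}

int main(void) {
    printf("C(10, 4) = %lld\n", binomial(10, 4));      /* prints 210 */
    return 0;
}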
Warshall's algorithm is for computing the transitive closure of a directed graph and Floyd's algorithm is
for the all-pairs shortest-paths problem. These algorithms are based on essentially the same idea, which we
can interpret as an application of the dynamic programming technique.
The adjacency matrix A of an unweighted directed graph is the Boolean matrix that has 1 in its ith row and
jth column if there is a directed edge from the ith vertex to the jth vertex and 0 otherwise. A transitive closure
of the digraph is a Boolean matrix that has 1 in its ith row and jth column if there is a path from ith vertex to
the jth vertex and 0 otherwise.
An example of a digraph, its adjacency matrix, and its transitive closure are given below.
One way to find out whether there is a path from i to j is to run either DFS or BFS from node i. This can
be repeated for all nodes to construct the transitive closure matrix.
Warshall’s algorithm, named after Stephen Warshall who discovered it, provides a more efficient method
to find the transitive closure. It is convenient to assume that the digraph’s vertices, and hence the rows and
columns of the adjacency matrix, are numbered from 1 to n. Warshall’s algorithm constructs the transitive
closure through a series of n × n Boolean matrices:
R(0) does not allow any intermediate vertices in its paths (only direct edges); it is the adjacency matrix.
R(1) allows only vertex 1 to be used as an intermediate vertex.
R(2) allows only vertices 1 and 2 to be used as intermediate vertices.
….
R(n) reflects paths that can use all n vertices of the digraph as intermediate vertices and hence is nothing else
but the digraph’s transitive closure.
The entry in the ith row and jth column of R(k) tells us whether there is a path from vertex i to vertex j that
uses only vertices numbered 1 to k as intermediate vertices. For example, R(0) is the original adjacency
matrix, which shows direct edges (i.e., 0 intermediate vertices), and R(1) has a 1 at [i, j] iff there is either a
direct edge from i to j or a path from i to j with vertex 1 as its only intermediate vertex (i.e., an edge from
i to 1 and an edge from 1 to j).
The elements of matrix R(k) are computed from R(k−1) using the formula: R(k)[i, j] = R(k−1)[i, j] or (R(k−1)[i, k] and R(k−1)[k, j]).
The following method can be used for generating elements of matrix R(k)
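(The text's pseudocode is not reproduced in these notes; the C sketch below is my own. It applies the formula above while overwriting a single matrix in place, which is safe because row k and column k do not change during the k-th pass. The digraph in main is arbitrary.)

#include <stdio.h>
#define N 4

/* Warshall's algorithm: r starts as the adjacency matrix, ends as the transitive closure. */
void warshall(int r[N][N]) {
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (r[i][k] && r[k][j])        /* a path i -> k -> j exists */
                    r[i][j] = 1;
}

int main(void) {
    int r[N][N] = {                 /* an arbitrary digraph on 4 vertices */
        {0, 1, 0, 0},
        {0, 0, 0, 1},
        {0, 0, 0, 0},
        {1, 0, 1, 0}
    };
    warshall(r);
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) printf("%d ", r[i][j]);
        printf("\n");
    }
    return 0;
}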
Time complexity: with three nested loops over the n vertices, Warshall's algorithm runs in Θ(n³).
The all-pairs shortest-paths problem asks for the shortest distance from each vertex to every other vertex in a
weighted graph (undirected or directed). Floyd’s algorithm is an all-pairs shortest-path algorithm which
uses concepts similar to the ones used in Warshall’s algorithm. It is applicable to both undirected
and directed weighted graphs provided that they do not contain a cycle of negative length.
Floyd’s algorithm records the lengths of shortest paths in an n × n matrix D called the distance matrix: the
element dij in the ith row and the jth column of this matrix indicates the length of the shortest path from the
ith vertex to the jth vertex.
Like Warshall’s algorithm, Floyd’s algorithm computes the shortest paths as a series of n × n distance
matrices:
The entry in the ith row and jth column of D(k) gives the weight (length) of the shortest path from vertex i to
vertex j that uses only vertices numbered 1 to k as intermediate vertices.
D(0) is the original adjacency matrix, which shows the weight of the direct edges (i.e., no intermediate
vertices in the path). The entry in the ith row, jth column of this matrix is initialized with the weight of the
edges in the graph. If there is no edge between i and j in the graph, dij(0) has value infinity. The figure below
shows the adjacency matrix and the final distance matrix of a directed graph.
The elements of matrix D(k) are computed from D(k−1) using the formula: D(k)[i, j] = min( D(k−1)[i, j], D(k−1)[i, k] + D(k−1)[k, j] ).
This formula tells us that the shortest path from vertex i to j through vertices numbered 1 to k is the
minimum among the following
1. Distance of the path from i to j that has only intermediate nodes numbered 1 to k-1 (i.e., dij(k-1))
and
2. Sum of the distances of the two paths: Path from i to k that uses only intermediate nodes numbered
1 to k-1 , and path from k to j solely through nodes numbered 1 to k-1 (dik(k-1) + dkj(k-1))
Time complexity: like Warshall's algorithm, Floyd's algorithm uses three nested loops over the n vertices and therefore runs in Θ(n³).
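A matching C sketch of Floyd's algorithm (my own): the same triple loop, with min and addition in place of or and and. INF stands for "no edge" and the example weight matrix in main is arbitrary, not the one from the figure.

#include <stdio.h>
#define N 4
#define INF 1000000   /* "no edge"; kept small enough that adding two values cannot overflow */

/* Floyd's algorithm: d starts as the weight matrix and ends as the distance matrix. */
void floyd(int d[N][N]) {
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (d[i][k] + d[k][j] < d[i][j])   /* going through k is shorter */
                    d[i][j] = d[i][k] + d[k][j];
}

int main(void) {
    int d[N][N] = {                 /* an arbitrary weighted digraph on 4 vertices */
        {0,   INF, 3,   INF},
        {2,   0,   INF, INF},
        {INF, 7,   0,   1},
        {6,   INF, INF, 0}
    };
    floyd(d);
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) printf("%7d", d[i][j]);
        printf("\n");
    }
    return 0;
}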
For illustration, we will see how two entries in D(3) are calculated. Note that the vertex numbered
3 is c.