Optimal merge patterns:-
An optimal merge pattern is a way of merging two or more sorted
files into a single sorted file at minimum total cost.
If we have two sorted files containing n and m records
respectively, they can be merged together to obtain one
sorted file in time O(n + m).
The formula for the external merging cost is:
$$\text{Cost} = \sum_{i=1}^{n} f(i)\, d(i)$$
where f(i) is the number of records in file i and d(i) is the depth
of that file's leaf in the merge tree.
If more than 2 files need to be merged, it can be done in pairs.
For example, to merge 4 files A, B, C, D: first merge A with
B to get X1, then merge X1 with C to get X2, and finally merge X2 with D
to get X3 as the output file.
An optimal merge pattern corresponds to a binary merge tree with
minimum weighted external path length. The function Tree
algorithm uses the greedy rule (always merge the two smallest
files first) to get a two-way merge tree for n files.
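Question: we have 6 files of sizes 2, 3, 5, 7, 9, 13 and we
have to find the optimal merge cost.
Step 1:- merge 2 and 3 to get 5 (remaining: 5, 5, 7, 9, 13)
Step 2:- merge 5 and 5 to get 10 (remaining: 7, 9, 10, 13)
Step 3:- merge 7 and 9 to get 16 (remaining: 10, 13, 16)
Step 4:- merge 10 and 13 to get 23 (remaining: 16, 23)
Step 5:- merge 16 and 23 to get 39 (the root of the merge tree)
The optimal merge cost is the sum of all internal nodes of the
merge tree, so the
Optimal Merge Cost: 5+10+16+23+39=93.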
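The greedy rule can be sketched in Python as follows. This is a minimal sketch, not the notes' own algorithm listing: the function name `optimal_merge_cost` is an assumption, and a min heap from the standard `heapq` module stands in for the sorted list of files.

```python
import heapq

def optimal_merge_cost(sizes):
    """Return the minimum total cost of merging sorted files two at a time."""
    heap = list(sizes)
    heapq.heapify(heap)           # min heap of current file sizes
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)   # smallest file
        b = heapq.heappop(heap)   # second smallest file
        total += a + b            # cost of this merge = size of the merged file
        heapq.heappush(heap, a + b)
    return total

print(optimal_merge_cost([2, 3, 5, 7, 9, 13]))  # 93
```

Each merged size added to `total` corresponds to one internal node of the merge tree.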
Huffman coding:-
Huffman Coding is a famous Greedy Algorithm.
It is used for the lossless compression of data.
It uses variable-length encoding.
It assigns a variable-length code to each character.
The code length of a character depends on how frequently it occurs in the given text.
The character which occurs most frequently gets the shortest code.
The character which occurs least frequently gets the longest code.
It is also known as Huffman Encoding.
Prefix Rule-
Huffman Coding implements a rule known as a prefix rule.
This prevents ambiguity while decoding.
It ensures that the code assigned to any character is not a prefix of the code assigned to
any other character.
Major Steps in Huffman Coding-
There are two major steps in Huffman Coding-
1. Building a Huffman Tree from the input characters.
2. Assigning code to the characters by traversing the Huffman Tree.
Huffman Tree-
The steps involved in the construction of Huffman Tree are as follows-
Step-01:
Create a leaf node for each character of the text.
Leaf node of a character contains the occurring frequency of that character.
Step-02:
Arrange all the nodes in increasing order of their frequency value.
Step-03:
Take the first two nodes, which have the minimum frequencies, and
create a new internal node.
The frequency of this new node is the sum of the frequencies of those two nodes.
Make the first node the left child and the other node the right child of the newly created
node.
Step-04:
Keep repeating Step-02 and Step-03 until all the nodes form a single tree.
The tree finally obtained is the desired Huffman Tree.
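The construction steps above can be sketched in Python. This is only an illustrative sketch: the function name `huffman_codes`, the tuple representation of internal nodes, and the sample frequencies are assumptions. A min heap replaces the repeated re-sorting of Step-02.

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build a Huffman tree from {character: frequency}; return {character: code}."""
    tie = count()  # unique tie-breaker so the heap never compares tree nodes
    # A leaf is a character; an internal node is a (left, right) tuple.
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:               # a single character still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # minimum frequency node
        f2, _, right = heapq.heappop(heap)   # next minimum frequency node
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: 0 = left, 1 = right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

# Hypothetical frequencies, chosen only for illustration.
sample = {"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}
codes = huffman_codes(sample)
print({ch: len(code) for ch, code in codes.items()})
```

The exact bit patterns depend on the left/right convention and on how ties are broken, so only the code lengths should be compared between implementations.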
Time Complexity-
The time complexity analysis of Huffman Coding is as follows-
extractMin( ) is called 2 x (n-1) times if there are n nodes.
As extractMin( ) calls minHeapify( ), it takes O(logn) time.
Thus, Overall time complexity of Huffman Coding becomes O(nlogn).
Here, n is the number of unique characters in the given text.
Important Formulas-
The following 2 formulas are important to solve the problems based on Huffman Coding-
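Formula-01:
Average code length per character
= $\sum_i (\text{frequency}_i \times \text{code length}_i) \,/\, \sum_i \text{frequency}_i$
Formula-02:
Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= $\sum_i (\text{frequency}_i \times \text{code length}_i)$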
PRACTICE PROBLEM BASED ON HUFFMAN CODING-
Problem-
A file contains the following characters with the frequencies as shown. If Huffman Coding is
used for data compression, determine-
1. Huffman Code for each character
2. Average code length
3. Length of Huffman encoded message (in bits)
Characters Frequencies
a 10
e 15
i 12
o 3
u 4
s 13
t 1
Solution-
First let us construct the Huffman Tree.
Huffman Tree is constructed in the following steps-
[Figures: Step-01 to Step-07 of the Huffman Tree construction]
Now,
We assign weight to all the edges of the constructed Huffman Tree.
Let us assign weight ‘0’ to the left edges and weight ‘1’ to the right edges.
Rule
If you assign weight ‘0’ to the left edges, then assign weight ‘1’ to the right
edges.
If you assign weight ‘1’ to the left edges, then assign weight ‘0’ to the right
edges.
Any of the above two conventions may be followed.
But follow the same convention at the time of decoding that is adopted at the
time of encoding.
After assigning weight to all the edges, the modified Huffman Tree is-
Now, let us answer each part of the given problem one by one-
1. Huffman Code For Characters-
To write Huffman Code for any character, traverse the Huffman Tree from root node to the
leaf node of that character.
Following this rule, the Huffman Code for each character is-
a = 111
e = 10
i = 00
o = 11001
u = 1101
s = 01
t = 11000
From here, we can observe-
Characters occurring less frequently in the text are assigned longer codes.
Characters occurring more frequently in the text are assigned shorter codes.
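Parts 2 and 3 of the problem follow from the codes and frequencies listed above. The short Python check below is a sketch under one assumption: the average code length is taken to be the frequency-weighted mean of the code lengths. The helper names `encoded_length` and `average_code_length` are illustrative.

```python
# Frequencies and Huffman codes copied from the problem and its solution.
freq = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}
codes = {"a": "111", "e": "10", "i": "00", "o": "11001",
         "u": "1101", "s": "01", "t": "11000"}

def encoded_length(freq, codes):
    """Total bits = sum of frequency x code length over all characters."""
    return sum(freq[ch] * len(codes[ch]) for ch in freq)

def average_code_length(freq, codes):
    """Average bits per character = total bits / total number of characters."""
    return encoded_length(freq, codes) / sum(freq.values())

print(encoded_length(freq, codes))                 # 146
print(round(average_code_length(freq, codes), 2))  # 2.52
```

So the encoded message is 146 bits long for 58 characters, about 2.52 bits per character.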
Fractional Knapsack Problem:-
The fractional knapsack problem is a variant of the knapsack problem in
which items may be broken into fractions in order to maximize the profit:
any fraction of an item may be placed in the knapsack, and it earns the
same fraction of that item's profit.
This problem can be solved using two techniques:
o Brute-force approach: The brute-force approach tries all the possible
solutions with all the different fractions, but it is a time-consuming
approach.
o Greedy approach: In the greedy approach, we calculate the ratio of
profit/weight for each item and select items accordingly. The item with
the highest ratio is selected first.
There are basically three greedy strategies for selecting the items:
o The first strategy is to select the item with the maximum profit.
o The second strategy is to select the item with the minimum
weight.
o The third strategy is to select the item with the maximum ratio of
profit/weight.
Of these, only the third strategy is guaranteed to give the optimal solution.
Consider the below example:
Objects: 1 2 3 4 5 6 7
Profit (P): 10 15 7 8 9 4
Weight (w): 1 3 5 4 1 3 2
W (Weight of the knapsack): 15
N (no of items): 7
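The profit/weight strategy can be sketched in Python as below. The function name `fractional_knapsack` and the three-item data set are hypothetical, chosen only for illustration; they are not the values from the table above. Weights are assumed to be positive.

```python
def fractional_knapsack(profits, weights, capacity):
    """Greedy by profit/weight ratio; returns the maximum achievable profit."""
    # Sort items by ratio, highest first.
    items = sorted(zip(profits, weights),
                   key=lambda pw: pw[0] / pw[1], reverse=True)
    total = 0.0
    remaining = capacity
    for profit, weight in items:
        if remaining <= 0:
            break
        take = min(weight, remaining)      # whole item, or the fraction that fits
        total += profit * take / weight
        remaining -= take
    return total

# Illustrative data: ratios are 6, 5 and 4; the last item is taken as 20/30.
print(fractional_knapsack([60, 100, 120], [10, 20, 30], 50))  # 240.0
```

Sorting dominates the running time, so this sketch takes O(n log n) for n items.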
Dijkstra Algorithm:-
Dijkstra Algorithm is a very famous greedy algorithm.
It is used for solving the single source shortest path problem.
It computes the shortest path from one particular source node to all other remaining nodes of
the graph.
Conditions-
It is important to note the following points regarding Dijkstra Algorithm-
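Dijkstra algorithm finds shortest paths only to the vertices reachable from the source, so the
graph is usually assumed to be connected (unreachable vertices keep the distance ∞).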
Dijkstra algorithm works only for those graphs that do not contain any negative weight
edge.
The basic Dijkstra algorithm does not output the shortest paths themselves.
It only provides the value or cost of the shortest paths.
By making a minor modification (recording the predecessor Π of each vertex), the shortest
paths can be easily obtained.
Dijkstra algorithm works for directed as well as undirected graphs.
Algorithm-
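dist[s] ← 0                          // s is the source vertex
Π[s] ← NIL
for all v ∈ V - {s}
    do dist[v] ← ∞
       Π[v] ← NIL
S ← ∅                                // set of vertices already visited
Q ← V                                // queue initially contains all vertices
while Q ≠ ∅
    do u ← mindistance (Q, dist)     // remove from Q the vertex with minimum dist
       S ← S ∪ {u}
       for all v ∈ neighbors[u]
           do if dist[v] > dist[u] + w(u, v)
               then dist[v] ← dist[u] + w(u, v)
                    Π[v] ← u
return dist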
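A runnable Python sketch of the same idea is given below. It is an assumption-laden illustration rather than a literal translation: the adjacency-list format, the function name `dijkstra`, and the sample graph are hypothetical, and a binary heap with lazy deletion plays the role of mindistance(Q, dist).

```python
import heapq

def dijkstra(graph, source):
    """graph: {vertex: [(neighbor, weight), ...]} with non-negative weights.
    Returns (dist, prev): shortest distances and predecessor of each vertex."""
    dist = {v: float("inf") for v in graph}
    prev = {v: None for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)       # vertex with minimum tentative distance
        if u in visited:
            continue                     # stale heap entry, already finalized
        visited.add(u)
        for v, w in graph[u]:
            if dist[v] > d + w:          # relaxation step
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, prev

# Hypothetical directed graph for illustration.
g = {"A": [("B", 4), ("C", 1)], "B": [("D", 1)],
     "C": [("B", 2), ("D", 5)], "D": []}
dist, prev = dijkstra(g, "A")
print(dist)  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

Following `prev` backwards from any vertex to the source recovers the shortest path itself.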
Minimum Spanning Tree (MST):-
In a weighted graph, a minimum spanning tree is a spanning tree whose
weight is less than or equal to that of every other spanning tree of the
same graph. The weight of a spanning tree is the sum of the weights of
its edges. In real-world situations, this weight can represent distance,
congestion, traffic load, or any arbitrary value assigned to the edges.
Example of minimum spanning tree
Let's understand the minimum spanning tree with the help of an
example.
The sum of the edges of the above graph is 16. Now, some of the
possible spanning trees created from the above graph are -
So, the minimum spanning tree that is selected from the above
spanning trees for the given weighted graph is -
Applications of minimum spanning tree
The applications of the minimum spanning tree are given as
follows -
o Minimum spanning tree can be used to design water-supply
networks, telecommunication networks, and electrical grids.
o It can be used to find paths in the map.
Minimum Spanning-Tree Algorithm
We shall learn about the two most important spanning tree algorithms here −
Kruskal's Algorithm
Prim's Algorithm
Kruskal’s Algorithm
Kruskal’s Algorithm is a famous greedy algorithm.
It is used for finding the Minimum Spanning Tree (MST) of a given graph.
To apply Kruskal’s algorithm, the given graph must be weighted,
connected and undirected.
Kruskal’s Algorithm Implementation-
The implementation of Kruskal’s Algorithm is explained in the following
steps-
Step-01:
Sort all the edges from low weight to high weight.
Step-02:
Take the edge with the lowest weight and use it to connect the vertices of
graph.
If adding an edge creates a cycle, then reject that edge and go for the
next least weight edge.
Step-03:
Keep adding edges until all the vertices are connected and a Minimum
Spanning Tree (MST) is obtained.
Thumb Rule to Remember
The above steps may be reduced to the following thumb rule-
Simply draw all the vertices on the paper.
Connect these vertices using edges with minimum weights such that no cycle gets
formed.
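The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: vertices are numbered 0 to V-1, edges are given as (weight, u, v) tuples, the cycle check uses a simple union-find structure, and the function name `kruskal` and the sample graph are hypothetical.

```python
def kruskal(num_vertices, edges):
    """edges: list of (weight, u, v). Returns (mst_edges, total_weight)."""
    parent = list(range(num_vertices))   # union-find: each vertex is its own set

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):        # Step-01: low weight to high weight
        ru, rv = find(u), find(v)
        if ru != rv:                     # different sets, so no cycle is created
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
            if len(mst) == num_vertices - 1:
                break                    # all vertices are connected
    return mst, total

# Hypothetical 4-vertex graph for illustration.
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
print(kruskal(4, edges))  # ([(0, 1, 1), (1, 2, 2), (2, 3, 4)], 7)
```

The edge (0, 2) of weight 3 is rejected because vertices 0 and 2 are already connected through vertex 1.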
Kruskal’s Algorithm Time Complexity-
Worst case time complexity of Kruskal’s Algorithm
= O(ElogV) or O(ElogE)
Analysis-
The edges are maintained as a min heap.
The next edge can be obtained in O(logE) time if the graph has E edges.
Construction of the heap takes O(E) time.
So, Kruskal’s Algorithm takes O(ElogE) time.
The value of E can be at most O(V^2), so logE is at most 2logV.
So, O(logV) and O(logE) are the same.
Special Case-
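If the edges are already sorted, then there is no need to construct a min
heap.
So, the time for deletions from the min heap is saved.
In this case, the time complexity of Kruskal’s Algorithm = O(E + V), treating each
cycle check (union-find operation) as taking nearly constant time.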
PRACTICE PROBLEMS BASED ON KRUSKAL’S ALGORITHM-
Problem-01:
Construct the minimum spanning tree (MST) for the given graph using
Kruskal’s Algorithm-
Solution-
To construct MST using Kruskal’s Algorithm,
Simply draw all the vertices on the paper.
Connect these vertices using edges with minimum weights such that no
cycle gets formed.
[Figures: Step-01 to Step-07 of the MST construction using Kruskal’s Algorithm]
Since all the vertices have been connected / included in the MST, we
stop.
Weight of the MST
= Sum of all edge weights
= 10 + 25 + 22 + 12 + 16 + 14
= 99 units
Prim’s Algorithm-
Prim’s Algorithm is a famous greedy algorithm.
It is used for finding the Minimum Spanning Tree (MST) of a given graph.
To apply Prim’s algorithm, the given graph must be weighted, connected
and undirected.
Prim’s Algorithm Implementation-
The implementation of Prim’s Algorithm is explained in the following steps-
Step-01:
Randomly choose any vertex.
The vertex connecting to the edge having least weight is usually selected.
Step-02:
Find all the edges that connect the tree to new vertices.
Find the least weight edge among those edges and include it in the existing
tree.
If including that edge creates a cycle, then reject that edge and look for the
next least weight edge.
Step-03:
Keep repeating step-02 until all the vertices are included and Minimum
Spanning Tree (MST) is obtained.
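A Python sketch of these steps is shown below. It rests on several assumptions: the graph is an undirected adjacency list with every edge listed in both directions, a binary heap holds the candidate edges, and the function name `prim` and the sample graph are hypothetical.

```python
import heapq

def prim(graph, start):
    """graph: {vertex: [(neighbor, weight), ...]}, undirected and connected.
    Returns (mst_edges, total_weight)."""
    visited = {start}                                   # Step-01: starting vertex
    heap = [(w, start, v) for v, w in graph[start]]     # edges leaving the tree
    heapq.heapify(heap)
    mst, total = [], 0
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)                   # least weight candidate edge
        if v in visited:
            continue                                    # would create a cycle, reject
        visited.add(v)
        mst.append((u, v, w))
        total += w
        for nxt, w2 in graph[v]:
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return mst, total

# Hypothetical 4-vertex undirected graph for illustration.
g = {0: [(1, 1), (2, 3)],
     1: [(0, 1), (2, 2), (3, 5)],
     2: [(1, 2), (0, 3), (3, 4)],
     3: [(2, 4), (1, 5)]}
print(prim(g, 0))  # ([(0, 1, 1), (1, 2, 2), (2, 3, 4)], 7)
```

This heap-of-edges variant runs in O(ElogE), which is the same order as O(ElogV).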
Prim’s Algorithm Time Complexity-
Worst case time complexity of Prim’s Algorithm is-
O(ElogV) using binary heap
O(E + VlogV) using Fibonacci heap
Time Complexity Analysis
If adjacency list is used to represent the graph, then using breadth first search, all the
vertices can be traversed in O(V + E) time.
We traverse all the vertices of graph using breadth first search and use a min heap for
storing the vertices not yet included in the MST.
To get the minimum weight edge, we use min heap as a priority queue.
Min heap operations like extracting the minimum element and decreasing a key value take
O(logV) time.
So, overall time complexity
= O(E + V) x O(logV)
= O((E + V)logV)
= O(ElogV)
This time complexity can be improved and reduced to O(E + VlogV) using Fibonacci
heap.
PRACTICE PROBLEMS BASED ON PRIM’S ALGORITHM-
Problem-01:
Construct the minimum spanning tree (MST) for the given graph using Prim’s
Algorithm-
Solution-
The above discussed steps are followed to find the minimum cost spanning
tree using Prim’s Algorithm-
[Figures: Step-01 to Step-06 of the MST construction using Prim’s Algorithm]
Since all the vertices have been included in the MST, we stop.
Now, Cost of Minimum Spanning Tree
= Sum of all edge weights
= 10 + 25 + 22 + 12 + 16 + 14
= 99 units
Problem-02:
Using Prim’s Algorithm, find the cost of minimum spanning tree (MST) of the
given graph-
Solution-
The minimum spanning tree obtained by the application of Prim’s Algorithm on
the given graph is as shown below-
Now, Cost of Minimum Spanning Tree
= Sum of all edge weights
= 1 + 4 + 2 + 6 + 3 + 10
= 26 units
Job sequencing with deadlines:-
In the job sequencing problem, the objective is to find a sequence of jobs
that are completed within their deadlines and give maximum profit.
Solution: -
Let us consider a set of n given jobs, each associated
with a deadline; profit is earned only if a job is completed by its
deadline. These jobs need to be ordered in such a way that the profit
is maximum.
It may happen that not all of the given jobs can be completed within
their deadlines.
Assume the deadline of the ith job Ji is di and the profit received from this job is
Pi. The optimal solution of this algorithm is then a feasible solution
with maximum profit.
Thus, D(i) > 0 for 1 <= i <= n.
Initially, these jobs are ordered according to profit,
i.e. P1>=P2>=P3>=……>=Pn.
Analysis:-
In this algorithm, we are using two loops, one nested within
the other; hence the complexity of this algorithm is O(n^2).
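The two nested loops can be seen in the following Python sketch. It assumes, as is common for this problem but not stated in the notes, that every job takes one unit of time and that deadlines are positive integers. The function name `job_sequencing` and the five sample jobs are hypothetical.

```python
def job_sequencing(jobs):
    """jobs: list of (name, deadline, profit); each job takes one time unit.
    Returns (scheduled job names in time order, total profit)."""
    ordered = sorted(jobs, key=lambda j: j[2], reverse=True)   # P1 >= P2 >= ... >= Pn
    max_deadline = max((d for _, d, _ in jobs), default=0)
    slots = [None] * (max_deadline + 1)       # slots[1..max_deadline]; index 0 unused
    total = 0
    for name, deadline, profit in ordered:    # outer loop over jobs
        for t in range(deadline, 0, -1):      # inner loop: latest free slot first
            if slots[t] is None:
                slots[t] = name
                total += profit
                break                         # job placed; otherwise it is rejected
    return [s for s in slots[1:] if s is not None], total

# Hypothetical jobs for illustration: (name, deadline, profit).
jobs = [("a", 2, 100), ("b", 1, 19), ("c", 2, 27), ("d", 1, 25), ("e", 3, 15)]
print(job_sequencing(jobs))  # (['c', 'a', 'e'], 142)
```

Jobs b and d are rejected because the only slot before their deadline is already occupied by a more profitable job.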