Sorting
Sorting is one of the most common applications in computer science. Sorting is the
process of arranging the elements of a list in either ascending or descending order.
Sorts are generally classified as:
1. Internal sorting
2. External sorting
If all the elements to be sorted are present in main memory, the sort is called
internal sorting. If, on the other hand, some of the elements to be sorted are
kept on secondary storage, it is called external sorting.
1.Internal sorting:
Internal sorting algorithms are grouped into several classifications depending on
their general approach to sorting. The classifications are insertion, selection and
exchange.
Some sorting algorithms in different classifications are:
i. Insertion
a. Insertion sort
b. Shell sort
ii. Selection
a. Selection sort
b. Heap sort
iii. Exchange
a. Bubble sort
b. Quick sort
Heap sort:
Heap sort is a comparison-based internal sorting technique built on the binary heap
data structure. It is similar to selection sort: we first find the maximum element
and place it at the end, then repeat the same process for the remaining elements.
A complete binary tree is a binary tree in which every level, except possibly the
last, is completely filled, and the nodes of the last level are as far left as
possible. A binary heap is a complete binary tree in which items are stored in a
special order such that the value in a parent node is greater than or equal to (or
less than or equal to) the values in its two child nodes. The former is called a
max-heap and the latter a min-heap. The heap can be represented by a binary tree
or an array.
Array based representation for Binary Heap:
Since a binary heap is a complete binary tree, it can be easily represented as an
array, and the array-based representation is space-efficient. If the parent node is
stored at index i, the left child is at index 2 * i + 1 and the right child at index
2 * i + 2 (assuming the indexing starts at 0).
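The index formulas above can be wrapped in small helpers. This is a minimal sketch: the function names are illustrative, and the parent formula (i - 1) / 2 is not stated in the notes but follows from the two child formulas under integer division.

```c
#include <assert.h>

/* Index arithmetic for a binary heap stored in a 0-based array. */
int heap_left(int i)  { return 2 * i + 1; }
int heap_right(int i) { return 2 * i + 2; }

/* Parent of node i; only meaningful for i >= 1 (the root has no parent). */
int heap_parent(int i)
{
    assert(i >= 1);
    return (i - 1) / 2;   /* integer division undoes both child formulas */
}
```

For example, the children of index 1 are at indices 3 and 4, and both of them report index 1 as their parent.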
Heap Sort Algorithm for sorting in increasing order:
Heap sort algorithm requires two phases:
1. Construct heap
2. sort heap
1.Construct heap:
In this phase, an array-based list is transformed into a max-heap. An array-based
list can be viewed as a binary tree. The binary tree is a max-heap if each node i
is greater than or equal to its left (2i+1) and right (2i+2) children. To
transform a list into a heap, follow the given steps:
i. Start at the last non-leaf node of the list. Its index is given by the formula
i = (n/2) - 1, where n is the number of elements in the list, i is the index of the
parent node, and the division is integer division.
ii. Transform the subtree rooted at index i into a heap.
iii. Decrement the index i to point to the previous parent node.
iv. Repeat steps ii and iii as long as i >= 0.
Algorithm to construct max_heap:
for(i=n/2-1;i>=0;i--)
heapify(a,n,i);
To transform the subtree rooted at index i into a heap, follow the given steps:
i. Suppose a[lc] and a[rc] are the left and right children of a[i] in the array or
list a, where lc = 2i+1 and rc = 2i+2.
ii. Find the largest of a[i], a[lc] and a[rc]. The index of the largest element is
stored in m, which starts as m = i.
if(lc<n && a[lc]>a[m]) m=lc;
if(rc<n && a[rc]>a[m]) m=rc;
iii. If a[i] < a[m], interchange a[i] with a[m]; otherwise the subtree is already
a heap.
iv. If the values have been interchanged in step iii, the subtree rooted at m may
no longer be a heap. So we apply steps ii and iii to that subtree, and repeat until
no interchange is needed or a leaf is reached.
Ex:
Input data: 4, 10, 3, 5, 1
        4(0)
       /    \
   10(1)    3(2)
   /   \
5(3)   1(4)
The numbers in brackets represent the indices in the array representation of the data.
Applying the heap procedure to index 1 (no change, because 10 is already larger than
both of its children):
        4(0)
       /    \
   10(1)    3(2)
   /   \
5(3)   1(4)
Applying the heap procedure to index 0 (4 is exchanged with 10, and then with 5):
       10(0)
       /    \
    5(1)    3(2)
   /   \
4(3)   1(4)
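The example above can be reproduced with a short sketch. It uses an iterative variant of the heap procedure (a loop instead of recursion); the names sift_down and build_max_heap are illustrative and do not come from the notes.

```c
#include <assert.h>

/* Restore the max-heap property for the subtree rooted at i,
   looking only at the first n elements of a. Iterative variant. */
void sift_down(int a[], int n, int i)
{
    assert(n >= 0 && i >= 0);
    for (;;) {
        int m = i;              /* index of the largest of node and children */
        int lc = 2 * i + 1;
        int rc = 2 * i + 2;
        if (lc < n && a[lc] > a[m]) m = lc;
        if (rc < n && a[rc] > a[m]) m = rc;
        if (m == i) break;      /* subtree is already a heap */
        int t = a[i]; a[i] = a[m]; a[m] = t;
        i = m;                  /* continue in the subtree that was changed */
    }
}

/* Heapify every non-leaf node, from the last one back to the root. */
void build_max_heap(int a[], int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)
        sift_down(a, n, i);
}
```

Calling build_max_heap on the array 4, 10, 3, 5, 1 with n = 5 leaves the array as 10, 5, 3, 4, 1, which is the final tree shown in the example.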
2.sort heap:
In this phase, the root node of the heap is the largest element in the tree. The
following steps are used to sort the heap.
i. Swap the root node with the last node of the tree. Now the last node is in its
proper place.
ii. Leave out the last element and consider the remaining elements as the new list.
iii. The new list may not be in heap form. Therefore restore the heap by applying
the heap procedure at the root of the new list.
iv. Repeat i, ii and iii until all elements are placed at their proper place.
Algorithm to sort heap:
for(i=n-1;i>=0;i--)
{
    swap(a[i],a[0]);
    heapify(a,i,0);   /* only the first i elements still form the heap */
}
Time complexity or sort efficiency of heap sort:
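The sorting loop of heap sort starts at the end of the array and moves through the
heap one element at a time until it reaches the first element. Therefore it loops n
times. In each iteration the heap is restored by moving down one branch of the binary
tree from the root towards a leaf. Following a branch from the root to a leaf takes
at most about log n steps, because a complete binary tree with n nodes has height
floor(log2 n). The construction phase calls the heap procedure about n/2 times, so
it also stays within this bound.
This leads to the conclusion that the time complexity of heap sort is O(n log n).
It is asymptotically more efficient than selection sort, which takes O(n^2) time.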
/* Program to arrange elements of list in increasing order using heap sort */
#include <stdio.h>
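/* Restore the max-heap property for the subtree rooted at index i,
   considering only the first n elements of a. */
void heapify(int a[], int n, int i)
{
    int t, m, lc, rc;
    m = i;              /* index of the largest value seen so far */
    lc = 2 * i + 1;     /* left child */
    rc = 2 * i + 2;     /* right child */
    if (lc < n && a[lc] > a[m])
        m = lc;
    if (rc < n && a[rc] > a[m])
        m = rc;
    if (m != i)
    {
        t = a[i];
        a[i] = a[m];
        a[m] = t;
        heapify(a, n, m);   /* the swap may have broken the subtree at m */
    }
}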
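void heap_sort(int a[], int n)
{
    int i, t;
    /* Phase 1: build a max-heap, starting at the last non-leaf node */
    for (i = (n / 2) - 1; i >= 0; i--)
        heapify(a, n, i);
    /* Phase 2: move the current maximum to position i, then restore
       the heap on the remaining i elements */
    for (i = n - 1; i >= 0; i--)
    {
        t = a[i];
        a[i] = a[0];
        a[0] = t;
        heapify(a, i, 0);
    }
}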
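int main(void)
{
    int a[20], n, i;
    printf("enter array size\n");
    if (scanf("%d", &n) != 1 || n < 1 || n > 20)
        return 1;       /* the array holds at most 20 elements */
    printf("enter array elements\n");
    for (i = 0; i < n; i++)
        scanf("%d", &a[i]);
    heap_sort(a, n);
    printf("sorted array is \n");
    for (i = 0; i < n; i++)
        printf("%3d", a[i]);
    return 0;
}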
2.External sorting:
This type of sorting algorithm can handle large amounts of data. External sorting is
required when the data being sorted does not fit into main memory and must instead
reside in external memory, usually a hard disk.
External sorting algorithms are external memory algorithms. External sorting uses a
sort-merge strategy. In the sorting phase, chunks of data small enough to fit in
main memory are read, sorted and written out to a temporary file. Each sorted
chunk is called a run. In the merge phase, the sorted sublists are combined.
Ex:
Assume that a file of 2300 records needs to be sorted.
Assume the record size and the memory available for our sort program allow a
maximum array size of 500 records.
We begin by reading and sorting the first 500 records and writing them to a merge
output file. The remaining 1800 records are kept in secondary storage. After writing
out the first merge run, we read the second 500 records, sort them and write them to
an alternate merge output file. We repeat this process for all records, which gives
four runs of 500 records and a final run of 300 records. This first processing of
the data into merge runs is known as the sort phase.
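The sort phase can be sketched in memory as follows. This is only an illustration under stated assumptions: an int array stands in for the file, each record is a single int, and the names make_runs and cmp_int are hypothetical. A real external sort would read each chunk from disk and write each run to a temporary file.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of ints. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sort each consecutive chunk of at most run_size records in place.
   Every sorted chunk is one merge run. Returns the number of runs. */
int make_runs(int records[], int n, int run_size)
{
    assert(n >= 0 && run_size > 0);
    int runs = 0;
    for (int start = 0; start < n; start += run_size) {
        int len = (n - start < run_size) ? n - start : run_size;
        qsort(records + start, (size_t)len, sizeof records[0], cmp_int);
        runs++;
    }
    return runs;
}
```

With n = 2300 and run_size = 500 the function produces 5 runs, the last of which holds the remaining 300 records.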
After completing the sort phase we proceed with the merge phase. Each complete
reading and merging of the input files to one or more output merge files is called a
merge phase. There are three different merging techniques:
1. natural merge
2. balanced two-way merge
3. polyphase merge
1.Natural merge:
A natural merge merges a constant number of input files (merge runs) to one output
merge file. Between each merge phase a distribution phase is required to
redistribute the merge runs to the input files for remerging.
The following figure shows natural merge with two input merge files and one output
merge file.
In natural merge all merge runs are written into one file. Unless all records are
completely ordered, the merge runs must be distributed to the two input files
between each merge phase.
This merge is inefficient because it needs more read and write operations.
2.Balanced merge:
A balanced merge uses a constant number of input files and the same number of
output merge files. Because multiple merge files are created in each merge phase,
no distribution phase is required. The following figure shows balanced two-way
merge:
Four merge files are required in the balanced two-way merge. The first merge
phase merges the first merge run on file1 with the first merge run on file2 and
writes it to file3. It then merges the second merge run on file1 with the second
merge run on file2 and writes it to file4. At this point all merge runs on file2 have
been processed, so we copy the remaining merge run on file1 to file3. This copying
of 300 records is wasted effort. To eliminate this step and to make the merge
process more efficient we can use polyphase merge.
3.Polyphase merge:
In polyphase merge, a constant number of input files are merged to one output
merge file, and input files are reused as output files when their input has been
completely merged. The following figure shows polyphase merge:
Here the first merge run on file1 is merged with the first merge run on file2.
Then the second merge run on file1 is merged with the second merge run on file2.
Both merged runs are written to file3. At this point file2 is empty and the first
merge phase is complete. Close file2 and open it as the output file, and close file3
and open it as input. The third merge run on file1 is then merged with the first
merge run on file3, with the merged data being written to file2. Because file1 is
now empty, merge phase 2 is complete. Close file1 and open it as output, and close
file2 and open it as input. There is now only one merge run on each input file, so
the sort is complete when these two merge runs are merged to file1.
Merge sort:
Merge sort is a divide-and-conquer algorithm based on the idea of breaking down a
list into several sublists until each sublist consists of a single element, and then
merging those sublists in a manner that results in a sorted list.
Divide the unsorted list into N sublists, each containing 1 element.
Take adjacent pairs of singleton lists and merge them to form lists of 2
elements. The N lists now become N/2 lists of size 2.
Repeat the process until a single sorted list is obtained.
While comparing two sublists for merging, the first element of each list is taken
into consideration. While sorting in ascending order, the element with the lesser
value becomes the next element of the sorted list. This procedure is repeated until
both the smaller sublists are empty and the new combined sublist comprises all the
elements of both the sublists.
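The three steps above describe merge sort from the bottom up: start with sublists of size 1 and keep merging adjacent pairs. The sketch below follows that description directly. The names merge_pair and merge_sort_bottom_up are illustrative, and the heap-allocated scratch buffer is a design choice of this sketch, not something the notes prescribe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge sorted src[lo..mid-1] and src[mid..hi-1] into dst[lo..hi-1]. */
static void merge_pair(const int src[], int dst[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid) dst[k++] = src[i++];   /* leftovers of the left sublist */
    while (j < hi)  dst[k++] = src[j++];   /* leftovers of the right sublist */
}

/* Bottom-up merge sort. Returns 0 on success, -1 if the scratch
   buffer could not be allocated (a is left unchanged in that case). */
int merge_sort_bottom_up(int a[], int n)
{
    assert(n >= 0);
    if (n < 2) return 0;
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) return -1;

    /* width is the size of the already sorted sublists: 1, 2, 4, ... */
    for (int width = 1; width < n; width *= 2) {
        for (int lo = 0; lo < n; lo += 2 * width) {
            int mid = (lo + width < n) ? lo + width : n;
            int hi  = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_pair(a, tmp, lo, mid, hi);
        }
        memcpy(a, tmp, (size_t)n * sizeof *tmp);
    }
    free(tmp);
    return 0;
}
```

The stable comparison src[i] <= src[j] keeps equal elements in their original order, the same choice the recursive program in these notes makes.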
Let’s consider the following image
At each step a list of size M is being divided into 2 sublists of size M/2, until no
further division can be done. To understand better, consider a smaller
array A containing the elements (9,7,8).
At the first step this list of size 3 is divided into 2 sublists, the first consisting of
elements (9,7) and the second one being (8). Now, the first list consisting of
elements (9,7) is further divided into 2 sublists consisting of
elements (9) and (7) respectively.
As no further breakdown of this list can be done, as each sublist consists of a
maximum of 1 element, we now start to merge these lists. The 2 sub-lists formed in
the last step are then merged together in sorted order using the procedure
mentioned above leading to a new list (7,9). Backtracking further, we then need to
merge the list consisting of element (8) too with this list, leading to the new sorted
list (7,8,9).
Merge sort program:
#include<stdio.h>
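/* Merge the sorted halves a[low..mid] and a[mid+1..high] using the
   temporary array b (at most 20 elements, matching main). */
void merge(int a[], int low, int mid, int high)
{
    int i, j, k, b[20];
    i = low;        /* next element of the left half */
    j = mid + 1;    /* next element of the right half */
    k = low;        /* next free position in b */
    while (i <= mid && j <= high)
    {
        if (a[i] <= a[j])
        {
            b[k] = a[i];
            i++;
            k++;
        }
        else
        {
            b[k] = a[j];
            j++;
            k++;
        }
    }
    /* copy whatever is left in either half */
    for (; i <= mid; i++, k++)
        b[k] = a[i];
    for (; j <= high; j++, k++)
        b[k] = a[j];
    /* copy the merged result back into a */
    for (i = low; i <= high; i++)
        a[i] = b[i];
}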
void merge_sort(int a[], int low, int high)
{
    int mid;
    if (low < high)
    {
        mid = (low + high) / 2;
        merge_sort(a, low, mid);        /* sort the left half */
        merge_sort(a, mid + 1, high);   /* sort the right half */
        merge(a, low, mid, high);       /* combine the two sorted halves */
    }
}
int main(void)
{
    int i, n, a[20];
    printf("\n Enter the number of elements: ");
    if (scanf("%d", &n) != 1 || n < 1 || n > 20)
        return 1;       /* the arrays hold at most 20 elements */
    printf("\n Enter elements of array:");
    for (i = 0; i < n; i++)
        scanf("%d", &a[i]);
    merge_sort(a, 0, n - 1);
    printf("\n Array Elements after sorting: ");
    for (i = 0; i < n; i++)
        printf("%3d", a[i]);
    return 0;
}
Time Complexity:
The list of size n is repeatedly halved, which gives about log n levels of sublists,
and merging all the sublists on one level takes O(n) time, so the worst-case running
time of this algorithm is O(n log n).
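The same bound can be read off a recurrence. The following is a sketch under two simplifying assumptions that are not part of the notes: n is a power of two, and merging sublists of total size n costs at most cn for some constant c.

```latex
\begin{aligned}
T(1) &= c,\\
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= \dots \\
     &= 2^{k}\,T(n/2^{k}) + k\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad (k = \log_2 n) \\
     &= cn + cn\log_2 n = O(n\log n).
\end{aligned}
```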
GRAPHS
A graph contains a set of points known as nodes (or vertices) and a set of links known
as edges (or arcs) which connect the vertices.
A graph is defined as a collection of vertices and arcs which connect the vertices
in the graph. A graph G is represented as G = ( V , E ), where V is the set of
vertices and E is the set of edges.
Example: graph G can be defined as G = ( V , E ) where V = {A,B,C,D,E} and E =
{(A,B),(A,C),(A,D),(B,D),(C,D),(B,E),(E,D)}. This is a graph with 5 vertices and 7
edges.
Graph Terminology
1.Vertex : An individual data element of a graph is called a vertex. A vertex
is also known as a node. In the above example graph, A, B, C, D and E are the
vertices.
2.Edge : An edge is a connecting link between two vertices. An edge is also known as
an arc. An edge is represented as (starting vertex, ending vertex).
In the above graph, the link between vertices A and B is represented as (A,B).
Edges are of three types:
1.Undirected Edge - An undirected edge is a bidirectional edge. If there is an
undirected edge between vertices A and B then edge (A , B) is equal to edge (B ,
A).
2.Directed Edge - A directed edge is a unidirectional edge. If there is a directed
edge between vertices A and B then edge (A , B) is not equal to edge (B , A).
3.Weighted Edge - A weighted edge is an edge with a cost on it.
Types of Graphs
1.Undirected Graph
A graph with only undirected edges is said to be an undirected graph.
2.Directed Graph
A graph with only directed edges is said to be a directed graph.
3.Complete Graph
A graph in which every vertex is adjacent to all other vertices present in the graph
is known as a complete graph. A complete undirected graph has n(n-1)/2 edges,
where n is the number of vertices present in the graph.
The following figure shows a complete graph.
4.Cycle Graph
A graph having a cycle is called a cycle graph. A cycle is a closed simple path,
that is, a path whose first and last nodes are the same.
5.Acyclic Graph
A graph without a cycle is called an acyclic graph.
6. Weighted Graph
A graph is said to be weighted if a non-negative value is assigned to each edge
of the graph. The value may represent, for example, the length between two
vertices. A weighted graph is also called a network.
Outgoing Edge
A directed edge is said to be an outgoing edge of its origin vertex.
Incoming Edge
A directed edge is said to be an incoming edge of its destination vertex.
Degree
The total number of edges connected to a vertex is said to be the degree of that vertex.
Indegree
The total number of incoming edges connected to a vertex is said to be the indegree
of that vertex.
Outdegree
The total number of outgoing edges connected to a vertex is said to be the outdegree
of that vertex.
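The indegree and outdegree definitions can be illustrated with a short sketch. It assumes a directed graph stored as a plain list of edges with vertices numbered from 0; the struct edge type and the function name in_out_degree are hypothetical and chosen only for this example.

```c
#include <assert.h>

struct edge { int from, to; };   /* directed edge from -> to */

/* Count the incoming and outgoing edges of vertex v in a directed edge list
   of m edges. The degree of v is the sum of the two counts. */
void in_out_degree(const struct edge edges[], int m, int v,
                   int *indeg, int *outdeg)
{
    assert(m >= 0 && indeg != 0 && outdeg != 0);
    *indeg = 0;
    *outdeg = 0;
    for (int k = 0; k < m; k++) {
        if (edges[k].to == v)   (*indeg)++;    /* v is the destination */
        if (edges[k].from == v) (*outdeg)++;   /* v is the origin */
    }
}
```

For the edges 0 -> 1, 0 -> 2 and 1 -> 2, vertex 0 has indegree 0 and outdegree 2, while vertex 2 has indegree 2 and outdegree 0.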
Self-loop
An edge (undirected or directed) is a self-loop if its two endpoints coincide.
Adjacent nodes
When there is an edge from one node to another then these
nodes are called adjacent nodes.
Path
A path is a sequence of vertices in which there is an edge from each vertex to its
successor.
If e1 and e2 are the edges between the pairs of vertices (v1,v3) and (v1,v2)
respectively, then v3 v1 v2 is a path.
Length of a path
The number of edges in a path is called the length of that path. In the
following, the length of the path is 3.
Sub Graph
A graph S is said to be a sub graph of a graph G if all the vertices and all the
edges of S are in G, and each edge of S has the same end vertices in S as in G. A
subgraph of G is a graph G’ such that V(G’) ⊆ V(G) and E(G’) ⊆ E(G).
Connected Graph
A graph G is said to be connected if there is at least one path between every pair
of vertices in G. Otherwise, G is disconnected.
(Figure: a connected graph G and a disconnected graph G.)
This graph is disconnected because the vertex v1 is not connected
with the other vertices of the graph.
ADT of Graph:
Structure Graph is
objects: a nonempty set of vertices and a set of undirected edges,
where each edge is a pair of vertices
functions: for all graph ∈ Graph, v, v1 and v2 ∈ Vertices
Graph Create() ::= return an empty graph
Graph InsertVertex(graph, v) ::= return a graph with v inserted. v has no edge.
Graph InsertEdge(graph, v1, v2) ::= return a graph with a new edge
between v1 and v2
Graph DeleteVertex(graph, v) ::= return a graph in which v and all
edges incident to it are removed
Graph DeleteEdge(graph, v1, v2) ::= return a graph in which the edge
(v1, v2) is removed
Boolean IsEmpty(graph) ::= if (graph == empty graph) return TRUE else
return FALSE
List Adjacent(graph, v) ::= return a list of all vertices that are adjacent
to v
Graph Representations
A graph data structure is represented using the following representations:
1. Adjacency Matrix
2. Adjacency List
1.Adjacency Matrix
In this representation, a graph is represented using a matrix of size (total
number of vertices) by (total number of vertices); for example, a graph with 4
vertices is represented using a matrix of size 4 x 4.
In this matrix, rows and columns both represent vertices.
This matrix is filled with either 1 or 0. Here, 1 represents that there is an edge from
the row vertex to the column vertex and 0 represents that there is no edge from the row
vertex to the column vertex.
Adjacency Matrix : let G = (V, E) with n vertices, n ≥ 1. The adjacency matrix
of G is a 2-dimensional n × n matrix, A, where
A(i, j) = 1 iff (vi, vj) ∈ E(G) (⟨vi, vj⟩ for a digraph)
A(i, j) = 0 otherwise.
The matrix is symmetric in the case of an undirected graph, while it may be
asymmetric if the graph is directed. This matrix is also called a Boolean matrix or
bit matrix.
(Figure: graph G1 and the adjacency matrix of G1.)
In the case of a weighted graph, the entries are the weights of the edges between the
vertices. The adjacency matrix for a weighted graph is called a cost adjacency
matrix.
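As an illustration, the sketch below builds the adjacency matrix of an undirected graph from an edge list. The numbering A = 0, B = 1, C = 2, D = 3, E = 4, the fixed bound MAX_V and the name build_adj_matrix are assumptions made for this example only.

```c
#include <assert.h>
#include <string.h>

#define MAX_V 5   /* illustrative upper bound on the number of vertices */

/* Fill adj with the adjacency matrix of an undirected graph that has
   n vertices (numbered 0..n-1) and the m edges listed in edges. */
void build_adj_matrix(int adj[MAX_V][MAX_V], int n, int edges[][2], int m)
{
    assert(n >= 0 && n <= MAX_V && m >= 0);
    memset(adj, 0, sizeof(int) * MAX_V * MAX_V);
    for (int k = 0; k < m; k++) {
        int u = edges[k][0], v = edges[k][1];
        assert(u >= 0 && u < n && v >= 0 && v < n);
        adj[u][v] = 1;
        adj[v][u] = 1;   /* undirected: the matrix is symmetric */
    }
}
```

For the example graph with edges (A,B), (A,C), (A,D), (B,D), (C,D), (B,E), (E,D), the resulting matrix contains 14 ones, two for each of the 7 undirected edges. For a directed graph the second assignment would be dropped.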
2.Adjacency List:
In this representation, the n rows of the adjacency matrix are represented as n
linked lists. There is an array Adj[1, 2, . . . . . n] of pointers where, for
1 ≤ v ≤ n, Adj[v] points to a linked list containing the vertices which are adjacent
to v (i.e. the vertices that can be reached from v by a single edge). If the edges
have weights then these weights may also be stored in the linked list elements.
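A minimal sketch of this representation in C is given below. Unlike the Adj[1..n] notation above, it numbers vertices from 0, as C arrays do. The struct adj_node type and the function names are illustrative, and new neighbours are added at the front of the list simply because that is the cheapest place to insert.

```c
#include <assert.h>
#include <stdlib.h>

struct adj_node {
    int vertex;               /* a vertex adjacent to the list owner */
    struct adj_node *next;
};

/* Record that v can be reached from u by one edge.
   Returns 0 on success, -1 if memory could not be allocated. */
int adj_list_add(struct adj_node *adj[], int u, int v)
{
    assert(adj != NULL && u >= 0 && v >= 0);
    struct adj_node *node = malloc(sizeof *node);
    if (node == NULL) return -1;
    node->vertex = v;
    node->next = adj[u];      /* insert at the front of the list of u */
    adj[u] = node;
    return 0;
}

/* Number of neighbours stored for u, i.e. the length of the list of u. */
int adj_list_length(struct adj_node *adj[], int u)
{
    int len = 0;
    for (struct adj_node *p = adj[u]; p != NULL; p = p->next)
        len++;
    return len;
}

/* Release all list nodes of the first n vertices. */
void adj_list_free(struct adj_node *adj[], int n)
{
    for (int u = 0; u < n; u++) {
        while (adj[u] != NULL) {
            struct adj_node *next = adj[u]->next;
            free(adj[u]);
            adj[u] = next;
        }
    }
}
```

For an undirected edge between u and v, adj_list_add would be called twice, once in each direction. A weight field could be added to struct adj_node for weighted graphs.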
(Figure: graph G, the adjacency list of G and the adjacency matrix of G.)
Spanning tree:
A spanning tree is a subgraph of a connected graph G that contains all the vertices
of G and the minimum number of edges needed to keep them connected. A spanning tree
does not have cycles, and it cannot be disconnected.
A connected graph G can have more than one spanning tree.
All possible spanning trees of G have the same number of edges and vertices; a
spanning tree on n vertices has n-1 edges.
The spanning tree does not have cycles.
Removing one edge from the spanning tree makes it disconnected.
Adding one edge to the spanning tree creates a cycle or loop. Therefore
the spanning tree is maximally acyclic.
A complete graph with n vertices has n^(n-2) spanning trees.
For a complete graph with n = 3, 3^(3-2) = 3^1 = 3 spanning trees are possible.
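The properties above suggest a simple check: a set of edges on n vertices is a spanning tree exactly when it has n-1 edges and contains no cycle. The sketch below tests this with a union-find (disjoint set) structure, a technique that is not covered in these notes and is used here only as a convenient cycle test. The names is_spanning_tree, st_find and ST_MAX are illustrative.

```c
#include <assert.h>

#define ST_MAX 64   /* illustrative upper bound on the number of vertices */

/* Find the representative of the set containing x. */
static int st_find(int parent[], int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   /* path halving */
        x = parent[x];
    }
    return x;
}

/* Returns 1 if the m edges form a spanning tree on vertices 0..n-1,
   and 0 otherwise. */
int is_spanning_tree(int n, int edges[][2], int m)
{
    assert(n >= 1 && n <= ST_MAX && m >= 0);
    if (m != n - 1) return 0;            /* a tree on n vertices has n-1 edges */
    int parent[ST_MAX];
    for (int i = 0; i < n; i++) parent[i] = i;
    for (int k = 0; k < m; k++) {
        int u = edges[k][0], v = edges[k][1];
        assert(u >= 0 && u < n && v >= 0 && v < n);
        int ru = st_find(parent, u);
        int rv = st_find(parent, v);
        if (ru == rv) return 0;          /* this edge would close a cycle */
        parent[ru] = rv;                 /* join the two components */
    }
    return 1;
}
```

For the complete graph on 3 vertices, each of the 3 ways of choosing two of its three edges passes the check, while the full triangle does not.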
APPLICATIONS OF GRAPH DATA STRUCTURE:
In computer science, graphs are used to represent the flow of
computation.
Google Maps uses graphs for building transportation systems, where the
intersection of two (or more) roads is considered to be a vertex and the
road connecting two vertices is considered to be an edge; thus its
navigation system is based on an algorithm that calculates the shortest
path between two vertices.
In Facebook, users are considered to be the vertices, and if they are
friends then there is an edge running between them. Facebook’s friend
suggestion algorithm uses graph theory. Facebook is an example
of an undirected graph.
In the World Wide Web, web pages are considered to be the vertices.
There is an edge from a page u to another page v if there is a link to page v
on page u. This is an example of a directed graph.
In operating systems, we come across the resource allocation graph,
where each process and each resource is considered to be a vertex. Edges
are drawn from resources to the process they are allocated to, or from a
requesting process to the requested resource.
Graph traversal techniques:
Graph traversal is a technique used for searching for a vertex in a graph. Graph
traversal also decides the order in which the vertices are visited in the search
process. A graph traversal finds the edges to be used in the search process without
creating loops. That means that using graph traversal we visit all the vertices of
the graph without getting into a looping path.
There are two graph traversal techniques:
1.Depth First Search (DFS)
2.Breadth First Search (BFS)
1.Depth First Search traversal:
DFS traversal of a connected graph produces a spanning tree as the final result,
that is, a graph with no cycles.
We use a stack data structure to implement DFS traversal of a graph. Because the
steps below may push the same vertex more than once (once for every edge through
which it is reached), the stack may have to hold more entries than there are
vertices; one entry per edge endpoint, plus one for the start vertex, is always
enough.
We use the following steps to implement DFS traversal...
Step 1: Define a stack large enough for the traversal, as described above.
Step 2: Mark all vertices as unvisited initially.
Step 3: Start the traversal from any vertex. Push the initial vertex onto the stack.
Step 4: Until the stack is empty, repeat steps 5 and 6.
Step 5: Pop the top element of the stack and name it x.
Step 6: If x is not visited, mark it as visited, find all unvisited adjacent vertices
w of x, and push each w onto the stack.
Step 7: When the stack becomes empty, produce the final spanning tree by
removing the unused edges from the graph.
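The DFS steps can be sketched in C as follows. The sketch assumes the graph is given as an adjacency matrix with vertices numbered from 0; DFS_MAX and dfs_order are illustrative names. The stack is sized DFS_MAX * DFS_MAX because, with these steps, a vertex can be pushed once for every neighbour that reaches it. The function records only the visiting order and does not build the spanning tree of step 7.

```c
#include <assert.h>

#define DFS_MAX 16   /* illustrative upper bound on the number of vertices */

/* Iterative DFS over an adjacency matrix, starting at start.
   Writes the visiting order into order[] and returns the number
   of vertices visited. */
int dfs_order(int adj[DFS_MAX][DFS_MAX], int n, int start, int order[])
{
    assert(n >= 1 && n <= DFS_MAX && start >= 0 && start < n);
    int stack[DFS_MAX * DFS_MAX];   /* a vertex may be pushed more than once */
    int visited[DFS_MAX] = {0};     /* step 2: everything starts unvisited */
    int top = 0, count = 0;

    stack[top++] = start;           /* step 3 */
    while (top > 0) {               /* step 4 */
        int x = stack[--top];       /* step 5 */
        if (visited[x]) continue;   /* already handled through another edge */
        visited[x] = 1;             /* step 6 */
        order[count++] = x;
        /* push neighbours in reverse so the lowest-numbered one is popped first */
        for (int w = n - 1; w >= 0; w--)
            if (adj[x][w] && !visited[w])
                stack[top++] = w;
    }
    return count;
}
```

On the example graph with V = {A,B,C,D,E} numbered 0 to 4, starting at A, this sketch visits the vertices in the order A, B, D, C, E. A different push order would give a different but equally valid depth-first order.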
2.Breadth First Search traversal:
BFS traversal of a connected graph produces a spanning tree as the final result. A
spanning tree is a graph without any loops. We use a queue data structure with a
maximum size equal to the total number of vertices in the graph to implement BFS
traversal of a graph.
We use the following steps to implement BFS traversal...
Step 1: Define a queue whose size is the total number of vertices in the graph.
Step 2: Select any vertex as the starting point for the traversal. Visit that vertex,
mark it as visited and insert it into the queue.
Step 3: Remove the front node from the queue.
Step 4: Find all the unvisited adjacent vertices of the removed node, insert them
into the queue and mark those nodes as visited.
Step 5: Repeat steps 3 and 4 until the queue becomes empty.
Step 6: When the queue becomes empty, produce the final spanning tree by
removing the unused edges from the graph.
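The BFS steps can be sketched in the same style. The sketch assumes an adjacency matrix with vertices numbered from 0; BFS_MAX and bfs_order are illustrative names. Because a vertex is marked as visited at the moment it is inserted, it enters the queue at most once, so a queue with one slot per vertex is sufficient.

```c
#include <assert.h>

#define BFS_MAX 16   /* illustrative upper bound on the number of vertices */

/* BFS over an adjacency matrix, starting at start.
   Writes the visiting order into order[] and returns the number
   of vertices visited. */
int bfs_order(int adj[BFS_MAX][BFS_MAX], int n, int start, int order[])
{
    assert(n >= 1 && n <= BFS_MAX && start >= 0 && start < n);
    int queue[BFS_MAX];             /* each vertex is enqueued at most once */
    int visited[BFS_MAX] = {0};
    int head = 0, tail = 0, count = 0;

    visited[start] = 1;             /* step 2 */
    queue[tail++] = start;
    while (head < tail) {           /* step 5 */
        int x = queue[head++];      /* step 3 */
        order[count++] = x;
        for (int w = 0; w < n; w++) {        /* step 4 */
            if (adj[x][w] && !visited[w]) {
                visited[w] = 1;
                queue[tail++] = w;
            }
        }
    }
    return count;
}
```

On the example graph with V = {A,B,C,D,E} numbered 0 to 4, starting at A, this sketch visits A, B, C, D, E: first A, then all neighbours of A, then the remaining vertex E.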
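Time complexity: the time complexity of both BFS and DFS to visit all nodes is
O(V+E) when the graph is stored as an adjacency list, where V is the number of
vertices and E is the number of edges. With an adjacency matrix, finding the
neighbours of every vertex takes O(V^2) time.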
Ex: