DAA Unit 1 Notes
(AKTU)
Rohit Mishra
(Assistant Professor)
Syllabus Unit-1
Introduction: Algorithms, Analyzing Algorithms, Complexity of Algorithms, Growth of Functions, Performance
Measurements, Sorting and Order Statistics - Shell Sort, Quick Sort, Merge Sort, Heap Sort, Comparison of Sorting
Algorithms, Sorting in Linear Time.
Unit-1
Introduction
1.1 Algorithm
An algorithm is any well-defined computational procedure that takes some value, or set of
values, as input and produces some value, or set of values, as output. An algorithm is thus a
sequence of computational steps that transform the input into the output.
For example, given the input sequence {31, 41, 59, 26, 41, 58}, a sorting algorithm returns as
output the sequence {26, 31, 41, 41, 58, 59}. Such an input sequence is called an instance of the
sorting problem.
Instance: An instance of a problem consists of the input needed to compute a solution to the
problem.
An algorithm is said to be correct if, for every input instance, it halts with the correct output.
There are two aspects of algorithmic performance:
• Time
- Instructions take time.
- How fast does the algorithm perform?
- What affects its runtime?
• Space
- Data structures take space
- What kind of data structures can be used?
- How does choice of data structure affect the runtime?
1.1.1 Analysis of Algorithms
Analysis is performed with respect to a computational model.
• We will usually use a generic uniprocessor random-access machine (RAM)
o All memory is equally expensive to access
o No concurrent operations
o All reasonable instructions take unit time
- Except, of course, function calls
o Constant word size
- Unless we are explicitly manipulating bits
Input Size:
• Time and space complexity
o This is generally a function of the input size
- E.g., sorting, multiplication
• How we characterize input size depends on the problem:
o Sorting: number of input items
o Multiplication: total number of bits
o Graph algorithms: number of nodes & edges
o Etc
Running Time:
• Number of primitive steps that are executed
o Except for the time to execute a function call, most statements require roughly the
same amount of time, e.g.:
- y = m * x + b
- c = 5 / 9 * (t - 32)
- z = f(x) + g(y)
• We can be more exact if need be
Analysis:
• Worst case
o Provides an upper bound on running time
o An absolute guarantee
• Average case
o Provides the expected running time
o Very useful, but treat with care: what is “average”?
- Random (equally likely) inputs
- Real-life inputs
Statement Effort
InsertionSort(A, n) {
for i = 2 to n { c1 n
key = A[i] c2 (n-1)
j = i - 1; c3 (n-1)
while (j > 0) and (A[j] > key) { c4 Σ t_i
A[j+1] = A[j]; c5 Σ (t_i - 1)
j = j - 1; c6 Σ (t_i - 1)
}
A[j+1] = key; c7 (n-1)
}
}
Here t_i is the number of times the while-loop test is executed for that value of i, and each sum
runs over i = 2 to n.
Best case -- the array is already sorted, so the inner loop test fails immediately (t_i = 1):
T(n) = an - b = ϴ(n)
T(n) is a linear function.
Worst case -- the inner loop body is executed for all previous elements, so
t_i = i
and T(n) is a quadratic function: T(n) = ϴ(n^2).
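As a working version of the analysis above, here is a small Python sketch of insertion sort (0-indexed, unlike the 1-indexed pseudocode; the function name is illustrative):

```python
def insertion_sort(a):
    """Sort list a in place and return it (0-indexed version of the pseudocode)."""
    for i in range(1, len(a)):          # pseudocode: for i = 2 to n
        key = a[i]
        j = i - 1
        # Shift elements of the sorted prefix a[0..i-1] that are > key one slot right.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                  # Insert key into its correct position.
    return a

print(insertion_sort([31, 41, 59, 26, 41, 58]))  # → [26, 31, 41, 41, 58, 59]
```

On an already-sorted input the while loop never runs, giving the linear best case; on a reverse-sorted input it runs i times for each i, giving the quadratic worst case.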
Average case: The “average case” is often roughly as bad as the worst case.
The MERGE procedure merges the two sorted subarrays A[p..q] and A[q+1..r] into a single
sorted subarray A[p..r]:
MERGE (A, p, q, r)
1 n1 ← q - p + 1
2 n2 ← r - q
3 create arrays L[1..n1 + 1] and R[1..n2 + 1]
4 for i ← 1 to n1
5 do L[i] ← A[p + i - 1]
6 for j ← 1 to n2
7 do R[j] ← A[q + j]
8 L[n1 + 1] ← ∞
9 R[n2 + 1] ← ∞
10 i ← 1
11 j ← 1
12 for k ← p to r
13 do if L[i] ≤ R[j]
14 then A[k] ← L[i]
15 i ← i + 1
16 else A[k] ← R[j]
17 j ← j + 1
Example:
Fig: The MERGE procedure applied to a sample array; it recursively sorts the subarrays and combines the solutions.
MERGE-SORT (A, p, r)
1 if p < r
2 then q ← floor((p + r)/2)
3 MERGE-SORT (A, p, q)
4 MERGE-SORT (A, q + 1, r)
5 MERGE (A, p, q, r)
The recurrence for the worst-case running time T(n) of merge sort:
T(n) = ϴ(1), if n = 1
T(n) = 2T(n/2) + ϴ(n), if n > 1
The solution of the above recurrence is ϴ(n log n).
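The two procedures above translate directly into Python. A minimal sketch (0-indexed, with float('inf') standing in for the ∞ sentinels of lines 8-9 of MERGE; names are illustrative):

```python
def merge(a, p, q, r):
    """Merge sorted subarrays a[p..q] and a[q+1..r] using infinity sentinels."""
    left = a[p:q + 1] + [float('inf')]       # L[1..n1+1], sentinel at the end
    right = a[q + 1:r + 1] + [float('inf')]  # R[1..n2+1]
    i = j = 0
    for k in range(p, r + 1):
        # The sentinel guarantees neither index runs off the end.
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1

def merge_sort(a, p, r):
    """Sort a[p..r] in place by divide and conquer."""
    if p < r:
        q = (p + r) // 2
        merge_sort(a, p, q)
        merge_sort(a, q + 1, r)
        merge(a, p, q, r)

data = [31, 41, 59, 26, 41, 58]
merge_sort(data, 0, len(data) - 1)
print(data)  # → [26, 31, 41, 41, 58, 59]
```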
We say InsertionSort’s run time is O(n^2). Properly, we should say its run time is in O(n^2). Read O as
“Big-O” (you’ll also hear it called “order”).
In general, a function f(n) is O(g(n)) if there exist positive constants c and n0 such that
0 ≤ f(n) ≤ c g(n) for all n ≥ n0.
Example.
1. Functions in O(n^2):
n^2/1000, n^1.9, n^2, n^2 + n, 1000n^2 + 50n
2. Show 2n^2 = O(n^3):
0 ≤ f(n) ≤ c g(n) Definition of O(g(n))
0 ≤ 2n^2 ≤ c n^3 Substitute
0 ≤ 2n^2/n^3 ≤ c n^3/n^3 Divide by n^3
0 ≤ 2/n ≤ c Simplify
Determine c: 2/n is largest when n = 1, so the inequality is satisfied by c = 2.
Determine n0: with c = 2, 0 ≤ 2/n0 ≤ 2 requires n0 ≥ 1, so it is satisfied by n0 = 1.
Therefore 0 ≤ 2n^2 ≤ 2n^3 ∀ n ≥ n0 = 1.
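A quick numeric sanity check of the constants derived above (the tested range of n is an arbitrary choice):

```python
# Verify 0 <= 2n^2 <= c*n^3 for c = 2 and all n >= n0 = 1 over a sample range.
c, n0 = 2, 1
for n in range(n0, 1000):
    assert 0 <= 2 * n**2 <= c * n**3, n
print("2n^2 <= 2n^3 holds for all tested n >= 1")
```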
Big O Fact
A polynomial of degree k is O(n^k).
Proof:
Suppose f(n) = b_k n^k + b_{k-1} n^{k-1} + … + b_1 n + b_0. Let a_i = |b_i|. Then for n ≥ 1:
f(n) ≤ a_k n^k + a_{k-1} n^{k-1} + … + a_1 n + a_0
≤ n^k (a_k + a_{k-1} + … + a_1 + a_0)
≤ c n^k, where c = Σ a_i.
Theorem: f(n) is ϴ(g(n)) iff f(n) is both O(g(n)) and Ω(g(n)).
f(n) is ω(g(n)) if for any constant c > 0 there exists n0 > 0 such that 0 ≤ c g(n) < f(n) ∀ n ≥ n0.
Intuitively,
• o() is like <
• ω() is like >
• ϴ() is like =
• Ω() is like ≥
• O() is like ≤
1.4 Recurrences
A recurrence is an equation or inequality that describes a function in terms of its value on
smaller inputs. For example, the worst-case running time T (n) of the MERGE-SORT
procedure by the recurrence
T(n) = 2T(floor(n/2)) + n
We guess that the solution is T(n) = O(n lg n). The substitution method requires us to prove that
T(n) ≤ c n lg n for an appropriate choice of the constant c > 0. We start by assuming that this
bound holds for all positive m < n, in particular for m = floor(n/2), yielding
T(floor(n/2)) ≤ c floor(n/2) lg(floor(n/2)). Substituting into the recurrence yields
T(n) ≤ 2(c floor(n/2) lg(floor(n/2))) + n
≤ c n lg(n/2) + n
= c n lg n - c n lg 2 + n
= c n lg n - c n + n
≤ c n lg n,
where the last step holds as long as c ≥ 1.
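The bound can also be checked numerically. A small sketch for T(n) = 2T(floor(n/2)) + n, with an arbitrary base value T(1) = 1; with c = 2 the bound c n lg n holds for n ≥ 2 (at n = 1, n lg n = 0, so the base case is handled separately, as usual):

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    # Worst-case merge-sort recurrence with an arbitrary base value T(1) = 1.
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

c = 2
for n in range(2, 2000):
    assert T(n) <= c * n * math.log2(n), n
print("T(n) <= 2 n lg n for all tested n >= 2")
```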
Fig: Constructing a recursion tree for the recurrence T(n) = 3T(n/4) + cn^2. Part (a) shows T(n),
which progressively expands in (b)-(d) to form the recursion tree. The fully expanded tree in part
(d) has height log_4 n (it has log_4 n + 1 levels).
The above fig shows how we derive the recursion tree for T(n) = 3T(n/4) + cn^2. For
convenience, we assume that n is an exact power of 4 so that all subproblem sizes are
integers. Part (a) of the figure shows T(n), which we expand in part (b) into an equivalent tree
representing the recurrence. The cn^2 term at the root represents the cost at the top level of
recursion, and the three subtrees of the root represent the costs incurred by the subproblems
of size n/4. Part (c) shows this process carried one step further by expanding each node with
cost T(n/4) from part (b). The cost for each of the three children of the root is c(n/4)^2. We
continue expanding each node in the tree by breaking it into its constituent parts as
determined by the recurrence.
Because subproblem sizes decrease by a factor of 4 each time, the subproblem size for a node
at depth i is n/4^i. The subproblem size reaches n = 1 when n/4^i = 1, i.e., at depth i = log_4 n,
so the tree has log_4 n + 1 levels. At depth i, for i = 0, 1, 2, …, log_4 n - 1, each node has cost
c(n/4^i)^2, and there are 3^i nodes, so the total cost over all nodes at depth i is
3^i c(n/4^i)^2 = (3/16)^i cn^2. The last level, at depth log_4 n, has 3^(log_4 n) = n^(log_4 3)
nodes, each contributing cost T(1), for a total cost of n^(log_4 3) T(1), which is ϴ(n^(log_4 3))
since we assume T(1) is constant.
Now we add up the costs over all levels to determine the cost for the entire tree:
T(n) = Σ_{i=0}^{log_4 n - 1} (3/16)^i cn^2 + ϴ(n^(log_4 3))
< Σ_{i=0}^{∞} (3/16)^i cn^2 + ϴ(n^(log_4 3))
= (16/13) cn^2 + ϴ(n^(log_4 3))
= O(n^2).
Thus, we have derived a guess of T(n) = O(n^2). Now we can use the substitution method to
verify that our guess was correct, that is, T(n) = O(n^2) is an upper bound for the recurrence
T(n) = 3T(floor(n/4)) + ϴ(n^2). We want to show that T(n) ≤ d n^2 for some constant d > 0. Using the
same constant c > 0 as before, we have
T(n) ≤ 3T(floor(n/4)) + cn^2
≤ 3d (floor(n/4))^2 + cn^2
≤ 3d (n/4)^2 + cn^2
= (3/16) d n^2 + cn^2
≤ d n^2,
where the last step holds as long as d ≥ (16/13)c.
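A small numeric sketch of this derivation, taking c = 1 and an arbitrary constant base case: the ratio T(n)/n^2 stays below 16/13 ≈ 1.23, matching the geometric-series bound.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    # Recursion-tree recurrence with c = 1 and an arbitrary constant base case.
    if n < 4:
        return 1
    return 3 * T(n // 4) + n * n

# The ratio T(n)/n^2 should approach 16/13 ≈ 1.2308 from below as n grows.
for k in range(2, 10):
    n = 4 ** k              # exact powers of 4, as in the analysis
    print(n, T(n) / n**2)

assert all(T(4 ** k) / (4 ** k) ** 2 < 16 / 13 for k in range(2, 10))
```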
The master method gives bounds for recurrences of the form
T(n) = a T(n/b) + f(n),
where a ≥ 1 and b > 1 are constants, f(n) is an asymptotically positive function, and
where we interpret n/b to mean either floor (n/b) or ceil (n/b). Then T(n) has the following
asymptotic bounds:
1. If f(n) = O(n^(log_b a - ϵ)) for some constant ϵ > 0, then T(n) = ϴ(n^(log_b a)).
2. If f(n) = ϴ(n^(log_b a)), then T(n) = ϴ(n^(log_b a) lg n).
3. If f(n) = Ω(n^(log_b a + ϵ)) for some constant ϵ > 0, and if a f(n/b) ≤ c f(n) for some
constant c < 1 and all sufficiently large n (the regularity condition), then T(n) = ϴ(f(n)).
In each of the three cases, we compare the function f(n) with the function n^(log_b a). Intuitively, the
larger of the two functions determines the solution to the recurrence. In case 1 the function n^(log_b a)
is larger, and the solution is T(n) = ϴ(n^(log_b a)). In case 3 the function f(n) is larger, and the
solution is T(n) = ϴ(f(n)). In case 2 the two functions are the same size, and the solution is
T(n) = ϴ(n^(log_b a) lg n).
In the first case, not only must f(n) be smaller than n^(log_b a), it must be polynomially smaller. In the
third case, not only must f(n) be larger than n^(log_b a), it also must be polynomially larger and in
addition satisfy the “regularity” condition that a f(n/b) ≤ c f(n). Note also that the three cases do not
cover all possibilities: some functions fall into the gap between cases 1 and 2, and others between
cases 2 and 3, because the comparison is not polynomially larger or smaller, or because the regularity
condition fails in case 3.
Example 1. The given recurrence is
T(n) = 9T(n/3) + n
Sol: a = 9, b = 3, f(n) = n
n^(log_b a) = n^(log_3 9) = ϴ(n^2)
Since f(n) = O(n^(log_3 9 - ϵ)), where ϵ = 1, case 1 applies:
T(n) = ϴ(n^(log_b a)) when f(n) = O(n^(log_b a - ϵ))
Thus the solution is T(n) = ϴ(n^2).
Example 2. T(n) = T(2n/3) + 1,
in which a = 1, b = 3/2, f(n) = 1, and n^(log_b a) = n^(log_{3/2} 1) = n^0 = 1. Case 2
applies, since f(n) = ϴ(n^(log_b a)) = ϴ(1), and thus the solution to the recurrence is T(n) = ϴ(lg n).
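The driving comparison in these examples is just the exponent log_b a. A small sketch computing it for both recurrences (this only computes the exponent; deciding polynomial difference and regularity still needs the reasoning above):

```python
import math

# Example 1: T(n) = 9T(n/3) + n. Compare f(n) = n with n^(log_3 9) = n^2: case 1.
e1 = math.log(9, 3)       # log_3 9, approximately 2
print("log_3 9 =", e1)

# Example 2: T(n) = T(2n/3) + 1. Compare f(n) = 1 with n^(log_{3/2} 1) = n^0 = 1: case 2.
e2 = math.log(1, 3 / 2)   # log_{3/2} 1 = 0
print("log_{3/2} 1 =", e2)
```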
1.5 Heap Sort
The (binary) heap data structure is an array object that we can view as a nearly complete binary
tree. An array A that represents a heap is an object with two attributes: A.length, which (as usual)
gives the number of elements in the array, and A.heap-size, which represents how many
elements in the heap are stored within array A. That is, although A[1..A.length]
may contain numbers, only the elements in A[1..A.heap-size], where 0 ≤ A.heap-size ≤
A.length, are valid elements of the heap. The root of the tree is A[1], and given the index i of
a node, we can easily compute the indices of its parent, left child, and right child:
Parent(i)
return floor (i/2)
Left(i)
return 2i
Right(i)
return 2i + 1
There are two kinds of binary heaps: max-heaps and min-heaps. In both kinds, the values in
the nodes satisfy a heap property, the specifics of which depend on the kind of heap. In a
max-heap, the max-heap property is that for every node i other than the root,
A[parent(i)]≥A[i]
That is, the value of a node is at most the value of its parent.
A min-heap is organized in the opposite way; the min-heap property is that for every node i
other than the root,
A[parent(i)]≤A[i]
For the heapsort algorithm, we use max-heaps.
Figure: The action of MAX-HEAPIFY(A, 2), where heap-size[A] = 10. (a) The initial
configuration, with A[2] at node i = 2 violating the max-heap property since it is not larger
than both children. The max-heap property is restored for node 2 in (b) by exchanging A[2]
with A[4], which destroys the max-heap property for node 4. The recursive call
MAX-HEAPIFY(A, 4) now has i = 4. After swapping A[4] with A[9], as shown in (c), node 4
is fixed up, and the recursive call MAX-HEAPIFY(A, 9) yields no further change to the data
structure.
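The MAX-HEAPIFY pseudocode itself is not reproduced in these notes, but the figure describes its behavior. A Python sketch of the standard procedure (0-indexed, so node i = 2 of the figure is index 1; names are illustrative):

```python
def max_heapify(a, i, heap_size):
    """Restore the max-heap property at index i (0-indexed), assuming the
    subtrees rooted at its children are already max-heaps."""
    left, right = 2 * i + 1, 2 * i + 2       # 0-indexed children
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]  # float the larger child up
        max_heapify(a, largest, heap_size)   # recurse into the affected subtree

# The figure's example: A[2] = 4 (1-indexed) violates the max-heap property.
A = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
max_heapify(A, 1, len(A))                    # node i = 2 in 1-indexed terms
print(A)  # → [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

The two swaps printed here are exactly the (a)-(c) steps of the figure: 4 exchanges with 14, then with 8.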
1.5.1.2 Complexity
We can describe the running time of MAX-HEAPIFY on a subtree of size n by the recurrence
T(n) ≤ T(2n/3) + ϴ(1),
since the children's subtrees each have size at most 2n/3 (the worst case occurs when the bottom
level of the tree is exactly half full). The solution to this recurrence, by case 2 of the master
theorem, is T(n) = O(lg n).
1.5.2.1 BUILD-MAX-HEAP(A)
1. A.heap-size = A.length
2. for i = floor(A.length/2) downto 1
3. MAX-HEAPIFY(A, i)
The time required by MAX-HEAPIFY when called on a node of height h is O(h), so the running
time of the above algorithm is O(n).
Figure: The operation of BUILD-MAX-HEAP, showing the data structure before the call to
MAX-HEAPIFY in line 3 of BUILD-MAX-HEAP. (a) A 10-element input array A and the
binary tree it represents. The figure shows that the loop index i refers to node 5 before the call
MAX-HEAPIFY(A, i). (b) The data structure that results. The loop index i for the next
iteration refers to node 4. (c)-(e) Subsequent iterations of the for loop in BUILD-MAX-HEAP.
Observe that whenever MAX-HEAPIFY is called on a node, the two subtrees of that node are
both max- heaps. (f) The max-heap after BUILD-MAX-HEAP finishes.
1.5.3.1 HEAPSORT(A)
1. BUILD-MAX-HEAP(A)
2. for i = A.length downto 2
3. exchange A[1] ↔ A[i]
4. A.heap-size = A.heap-size - 1
5. MAX-HEAPIFY(A, 1)
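Putting the pieces together, a self-contained Python sketch of heapsort (0-indexed; max_heapify and build_max_heap follow the procedures described above; names are illustrative):

```python
def max_heapify(a, i, heap_size):
    # Sift a[i] down until the max-heap property holds (0-indexed).
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)

def build_max_heap(a):
    # Leaves are trivially max-heaps; heapify the internal nodes bottom-up.
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))

def heapsort(a):
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current maximum to its final slot
        max_heapify(a, 0, end)        # the heap shrinks by one each iteration
    return a

print(heapsort([31, 41, 59, 26, 41, 58]))  # → [26, 31, 41, 41, 58, 59]
```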
Figure: The operation of HEAPSORT. (a) The max-heap data structure just after it has been built by
BUILD-MAX-HEAP. (b)-(j) The max-heap just after each call of MAX-HEAPIFY in line 5. The value
of i at that time is shown. Only lightly shaded nodes remain in the heap. (k) The resulting sorted array
A.
The key to the algorithm is the PARTITION procedure, which rearranges the subarray
A[p..r] in place.
PARTITION(A, p, r)
1 x ← A[r]
2 i ← p - 1
3 for j ← p to r - 1
4 do if A[j] ≤ x
5 then i ← i + 1
6 exchange A[i] ↔ A[j]
7 exchange A[i + 1] ↔ A[r]
8 return i + 1
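A Python sketch of PARTITION together with the usual recursive QUICKSORT driver (the driver is not shown in these notes and is assumed here; 0-indexed, names illustrative):

```python
def partition(a, p, r):
    """Partition a[p..r] around pivot a[r]; return the pivot's final index."""
    x = a[r]                      # line 1: pivot
    i = p - 1                     # line 2
    for j in range(p, r):         # lines 3-6
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # line 7: place pivot between partitions
    return i + 1                      # line 8

def quicksort(a, p, r):
    # Standard recursive driver (assumed, not shown in the notes): sort a[p..r] in place.
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)

data = [2, 8, 7, 1, 3, 5, 6, 4]   # the figure's sample array
quicksort(data, 0, len(data) - 1)
print(data)  # → [1, 2, 3, 4, 5, 6, 7, 8]
```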
Figure: The operation of PARTITION on a sample array. Lightly shaded array elements are
all in the first partition with values no greater than x. Heavily shaded elements are in the
second partition with values greater than x. The unshaded elements have not yet been put in
one of the first two partitions, and the final white element is the pivot. (a) The initial array
and variable settings. None of the elements have been placed in either of the first two
partitions. (b) The value 2 is "swapped with itself" and put in the partition of smaller values.
(c)-(d) The values 8 and 7 are added to the partition of larger values. (e) The values 1 and 8
are swapped, and the smaller partition grows. (f) The values 3 and 8 are swapped, and the
smaller partition grows. (g)-(h) The larger partition grows to include 5 and 6 and the loop
terminates. (i) In lines 7-8, the pivot element is swapped so that it lies between the two partitions.
Figure: The operation of COUNTING-SORT on an input array A[1..8], where each
element of A is a nonnegative integer no larger than k = 5. (a) The array A and the
auxiliary array C after line 4. (b) The array C after line 7. (c)-(e) The output array B and
the auxiliary array C after one, two, and three iterations of the loop in lines 9-11,
respectively. Only the lightly shaded elements of array B have been filled in. (f) The final
sorted output array B.
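The COUNTING-SORT pseudocode is not reproduced in these notes, but the figure describes its phases (counting, prefix sums, stable placement). A Python sketch of the standard stable procedure (0-indexed; names are illustrative):

```python
def counting_sort(a, k):
    """Stable sort of nonnegative integers in a, each at most k."""
    c = [0] * (k + 1)
    for x in a:                 # count occurrences (figure: C after line 4)
        c[x] += 1
    for i in range(1, k + 1):   # prefix sums: c[i] = number of elements <= i (line 7)
        c[i] += c[i - 1]
    b = [0] * len(a)
    for x in reversed(a):       # place right-to-left to keep the sort stable (lines 9-11)
        c[x] -= 1
        b[c[x]] = x
    return b

# The figure's example: 8 elements, each no larger than k = 5.
print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))  # → [0, 0, 2, 2, 3, 3, 3, 5]
```

Because no comparisons are made, the running time is ϴ(n + k) rather than the ϴ(n lg n) lower bound for comparison sorts.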
RadixSort(A, d)
for i=1 to d
StableSort(A) on digit i
Given n d-digit numbers in which each digit can take on up to k possible values,
RADIXSORT correctly sorts these numbers in Θ(d(n + k)) time.
Example:
Figure: The operation of radix sort on a list of seven 3-digit numbers. The leftmost
column is the input. The remaining columns show the list after successive sorts on
increasingly significant digit positions. Shading indicates the digit position sorted on to
produce each list from the previous one.
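A Python sketch of radix sort on base-10 digits, using a stable counting sort for each pass as the theorem requires (names are illustrative; d is the number of digits):

```python
def stable_sort_on_digit(a, d):
    """Stable counting sort of a on decimal digit d (d = 0 is least significant)."""
    c = [0] * 10
    for x in a:
        c[(x // 10**d) % 10] += 1
    for i in range(1, 10):
        c[i] += c[i - 1]
    b = [0] * len(a)
    for x in reversed(a):          # right-to-left placement keeps the sort stable
        digit = (x // 10**d) % 10
        c[digit] -= 1
        b[c[digit]] = x
    return b

def radix_sort(a, d):
    # Sort on digits from least to most significant; stability does the rest.
    for i in range(d):
        a = stable_sort_on_digit(a, i)
    return a

# The figure's seven 3-digit numbers:
print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
# → [329, 355, 436, 457, 657, 720, 839]
```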
Example:
Figure: The operation of BUCKET-SORT for n = 10. (a) The input array A[1..10]. (b) The
array B[0..9] of sorted lists (buckets) after line 8 of the algorithm. Bucket i holds values
in the half-open interval [i/10, (i + 1)/10). The sorted output consists of a concatenation in
order of the lists B[0], B[1], …, B[9]. To analyze the running time, observe that all lines except
line 5 take O(n) time in the worst case.
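A Python sketch of bucket sort for inputs uniformly distributed in [0, 1), with n equal-width buckets; Python's built-in sort stands in for the per-bucket insertion sort of the classic algorithm (names are illustrative):

```python
def bucket_sort(a):
    """Sort values in [0, 1) by distributing them into n equal-width buckets."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(n * x)].append(x)   # bucket i holds [i/n, (i+1)/n)
    out = []
    for b in buckets:
        b.sort()          # each bucket is expected to hold O(1) elements on average
        out.extend(b)     # concatenate the buckets in order
    return out

# The figure's ten sample values:
print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
# → [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```

With uniformly distributed inputs, the expected total time of the per-bucket sorts is O(n), giving an average-case running time of ϴ(n).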