Sorting and Searching
Chapter 7 of textbook 1
Ph.D. Truong Dinh Huy
Sorting: The Problem Space
General problem
Given a set of N orderable items, put them in order
Without (significant) loss of generality, assume:
• Items are integers
• Ordering is ≤ (increasing order)
In-place Sorting
Sorting algorithms may be performed in-place, that is, with the allocation of
at most Θ(1) additional memory (e.g., a fixed number of local variables)
Other sorting algorithms require the allocation of a second array of equal
size
• Requires Θ(n) additional memory
We will prefer in-place sorting algorithms
Classifications
The operations of a sorting algorithm are based on the actions
performed:
• Insertion
• Exchanging
• Selection
• Merging
• Distribution
Run-time
The run times of the sorting algorithms we will look at fall into one of three
categories:
Θ(n)    Θ(n ln(n))    O(n^2)
We will examine best-, average- and worst-case scenarios for each
algorithm
The run-time may change significantly based on the scenario
Run-time
We will review the more traditional O(n^2) sorting algorithms:
• Insertion sort
Some of the faster Θ(n log n) sorting algorithms:
• Heap sort, Merge sort, and Quick sort
And linear-time sorting algorithms
• Bucket sort, Radix Sort
• We must make assumptions about the data
Lower-bound Run-time
Any sorting algorithm must examine each entry in the array at
least once
• Consequently, all sorting algorithms must be Ω(n)
We will not be able to achieve Θ(n) behaviour without additional
assumptions
Insertion Sort
(1) Initially P = 1, so the first P elements (just the first one) are sorted.
(2) Insert the (P+1)th element into its proper place among the first P,
so that the first P+1 elements are sorted.
(3) Increment P and go back to step (2) until the whole list is sorted.
The (P+1)th element is inserted as follows:
Store the (P+1)th element in a temporary variable, temp.
If the Pth element is greater than temp, set the (P+1)th element
equal to the Pth one.
If the (P-1)th element is greater than temp, set the Pth element
equal to the (P-1)th one, and so on.
Continue like this until some kth element is less than or equal
to temp, or you have moved past the first position.
Set the (k+1)th element equal to temp. If no element was less than or
equal to temp, take k = 0, so that temp goes into the first position.
Extended Example
Need to sort:
34 8 64 51 32 21
P = 1;
We look at the first element only and change nothing.
P = 2;
Temp = 8;
34 > Temp, so the second element is set to 34.
We have reached the beginning of the list, so we stop there and set the
first position equal to Temp.
After the second pass:
8 34 64 51 32 21
Now, the first two elements are sorted.
Set P = 3.
Temp = 64; 34 < 64, so we stop at once and the 3rd position keeps 64.
After the third pass: 8 34 64 51 32 21
P = 4, Temp = 51; 51 < 64, so we have 8 34 64 64 32 21;
34 < 51, so we stop at the 2nd position and set the 3rd position = Temp.
The list is 8 34 51 64 32 21
Now P = 5, Temp = 32; 32 < 64, so 8 34 51 64 64 21;
32 < 51, so 8 34 51 51 64 21; next, 32 < 34, so 8 34 34 51 64 21;
next, 32 > 8, so we stop at the first position and set the second position = 32.
We have 8 32 34 51 64 21
Now P = 6, Temp = 21; 64, 51, 34 and 32 each move one position to the right,
then 21 > 8, so we stop at the first position and set the second position = 21.
We have 8 21 32 34 51 64
Pseudo Code
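Assume that the list is stored in an array, A (this can be done with a
linked list as well)
void InsertionSort(int A[], int N)
{
    for (int P = 1; P < N; P++)
    {
        int Temp = A[P];          // element to insert
        int j;
        // shift every larger element one place to the right
        for (j = P; j > 0 && A[j-1] > Temp; j--)
            A[j] = A[j-1];
        A[j] = Temp;              // drop Temp into the gap
    }
}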
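To check the extended example against the pseudo code, the sketch below runs the same loop on a std::vector and records the array after every pass. The function name insertionSortTrace and the snapshot list are hypothetical additions for tracing, not part of the algorithm itself.

```cpp
#include <cstddef>
#include <vector>

// Sorts A in increasing order with insertion sort and returns a copy of A
// taken after each pass (the snapshots exist only for illustration).
std::vector<std::vector<int>> insertionSortTrace(std::vector<int>& A) {
    std::vector<std::vector<int>> passes;
    for (std::size_t P = 1; P < A.size(); ++P) {
        int temp = A[P];              // element to insert
        std::size_t j = P;
        // Shift every larger element one slot to the right.
        for (; j > 0 && A[j - 1] > temp; --j) {
            A[j] = A[j - 1];
        }
        A[j] = temp;                  // drop temp into the gap
        passes.push_back(A);          // record the state after this pass
    }
    return passes;
}
```

Called on 34 8 64 51 32 21, the first snapshot is 8 34 64 51 32 21 and the last is 8 21 32 34 51 64, matching the passes worked out by hand. Note that the code uses 0-based indices, while the example counts positions from 1.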
Quiz
Sort the sequence 3, 1, 4, 1, 5, 9, 2, 6, 5 using insertion sort
Inversions
Consider the following three lists:
1 16 12 26 25 35 33 58 45 42 56 67 83 75 74 86 81 88 99 95
1 17 21 42 24 27 32 35 45 47 57 23 66 69 70 76 87 85 95 99
22 20 81 38 95 84 99 12 79 44 26 87 96 10 48 80 1 31 16 92
To what degree are these three lists unsorted?
The first list requires only a few exchanges to make it sorted
The second list has two entries significantly out of order;
however, most entries (13) are in place
The third list would, by any reasonable definition, be
significantly unsorted
Given any list of n numbers, there are
$\binom{n}{2} = \frac{n(n-1)}{2}$
pairs of numbers
For example, the list (1, 3, 5, 4, 2, 6) contains the following 15 pairs:
(1, 3) (1, 5) (1, 4) (1, 2) (1, 6)
(3, 5) (3, 4) (3, 2) (3, 6)
(5, 4) (5, 2) (5, 6)
(4, 2) (4, 6)
(2, 6)
You may note that 11 of these pairs of numbers are in
order:
(1, 3) (1, 5) (1, 4) (1, 2) (1, 6)
(3, 5) (3, 4) (3, 6)
(5, 6)
(4, 6)
(2, 6)
The remaining four pairs are reversed, or inverted:
(3, 2)
(5, 4) (5, 2)
(4, 2)
Inversions
Given a permutation of n elements
$a_0, a_1, \ldots, a_{n-1}$
an inversion is defined as a pair of entries which are reversed
That is, $(a_j, a_k)$ forms an inversion if
$j < k$ but $a_j > a_k$
Therefore, the permutation
1, 3, 5, 4, 2, 6
contains four inversions:
(3, 2) (5, 4) (5, 2) (4, 2)
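The definition translates directly into a double loop over all pairs with j < k. The following is a minimal sketch; the function name listInversions is a hypothetical choice, and the brute-force scan takes O(n^2) time, so it is meant for checking small examples only.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns every inversion (a[j], a[k]) with j < k and a[j] > a[k],
// in the order in which a left-to-right scan finds them.
std::vector<std::pair<int, int>> listInversions(const std::vector<int>& a) {
    std::vector<std::pair<int, int>> inversions;
    for (std::size_t j = 0; j < a.size(); ++j) {
        for (std::size_t k = j + 1; k < a.size(); ++k) {
            if (a[j] > a[k]) {
                inversions.emplace_back(a[j], a[k]);
            }
        }
    }
    return inversions;
}
```

For the permutation 1, 3, 5, 4, 2, 6 this returns the four pairs (3, 2), (5, 4), (5, 2) and (4, 2).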
Number of Inversions
There are $\binom{n}{2} = \frac{n(n-1)}{2}$ pairs of numbers in any set of n objects
Consequently, each pair contributes to
• the set of ordered pairs, or
• the set of inversions
For a random ordering, we would expect approximately half of all pairs, or
$\frac{1}{2}\binom{n}{2} = \frac{n(n-1)}{4} = O(n^2)$, inversions
Any algorithm that sorts by exchanging adjacent elements requires Ω(n^2) time on average, because each such exchange removes at most one inversion
Number of Inversions
Let us consider the number of inversions in our first three lists:
1 16 12 26 25 35 33 58 45 42 56 67 83 75 74 86 81 88 99 95
1 17 21 42 24 27 32 35 45 47 57 23 66 69 70 76 87 85 95 99
22 20 81 38 95 84 99 12 79 44 26 87 96 10 48 80 1 31 16 92
Each list has 20 entries, and therefore:
• There are $\binom{20}{2} = \frac{20(20-1)}{2} = 190$ pairs
• On average, 190/2 = 95 pairs would form inversions
Number of Inversions
The first list
1 16 12 26 25 35 33 58 45 42 56 67 83 75 74 86 81 88 99 95
has 13 inversions:
(16, 12) (26, 25) (35, 33) (58, 45) (58, 42) (58, 56) (45, 42)
(83, 75) (83, 74) (83, 81) (75, 74) (86, 81) (99, 95)
This is well below 95, the expected number of inversions
• Therefore, this is likely not to be a random list
Number of Inversions
The second list
1 17 21 42 24 27 32 35 45 47 57 23 66 69 70 76 87 85 95 99
also has 13 inversions:
(42, 24) (42, 27) (42, 32) (42, 35) (42, 23) (24, 23) (27, 23)
(32, 23) (35, 23) (45, 23) (47, 23) (57, 23) (87, 85)
This, too, is not a random list
Number of Inversions
The third list
22 20 81 38 95 84 99 12 79 44 26 87 96 10 48 80 1 31 16 92
has 100 inversions:
(22, 20) (22, 12) (22, 10) (22, 1) (22, 16) (20, 12) (20, 10) (20, 1) (20, 16) (81, 38)
(81, 12) (81, 79) (81, 44) (81, 26) (81, 10) (81, 48) (81, 80) (81, 1) (81, 16) (81, 31)
(38, 12) (38, 26) (38, 10) (38, 1) (38, 16) (38, 31) (95, 84) (95, 12) (95, 79) (95, 44)
(95, 26) (95, 87) (95, 10) (95, 48) (95, 80) (95, 1) (95, 16) (95, 31) (95, 92) (84, 12)
(84, 79) (84, 44) (84, 26) (84, 10) (84, 48) (84, 80) (84, 1) (84, 16) (84, 31) (99, 12)
(99, 79) (99, 44) (99, 26) (99, 87) (99, 96) (99, 10) (99, 48) (99, 80) (99, 1) (99, 16)
(99, 31) (99, 92) (12, 10) (12, 1) (79, 44) (79, 26) (79, 10) (79, 48) (79, 1) (79, 16)
(79, 31) (44, 26) (44, 10) (44, 1) (44, 16) (44, 31) (26, 10) (26, 1) (26, 16) (87, 10)
(87, 48) (87, 80) (87, 1) (87, 16) (87, 31) (96, 10) (96, 48) (96, 80) (96, 1) (96, 16)
(96, 31) (96, 92) (10, 1) (48, 1) (48, 16) (48, 31) (80, 1) (80, 16) (80, 31) (31, 16)
This may be a random list
Complexity Analysis of Insertion Sort
Worst case analysis
The worst case occurs when, for every j, the inner loop has to move all of the elements A[1], . . . , A[j - 1]
(which happens when A[j] = key is smaller than all of them); that takes Θ(j - 1) time. In total, we
have:
T(n) = Θ(1) + Θ(2) + … + Θ(n - 1) = Θ(n^2)
Average case analysis
The time complexity of the algorithm depends on the number of inversions in the array:
T(n) = O(n + d), where d is the number of inversions. On average d = n(n - 1)/4, so T(n) = O(n^2)
Note: If the number of inversions in an array is low, the running time is low
Best case analysis
T(n) = Θ(n) when d = O(n), that is, when the array has very few inversions or is already in increasing
order (d = 0)
Additional Space requirement: O(1)
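The link between running time and inversions can be checked experimentally. The sketch below is an instrumented insertion sort that counts how many times the inner loop shifts an element; each shift moves one entry past the value being inserted and so removes exactly one inversion. The name countShifts is a hypothetical helper, and the function sorts a copy so that the caller's data is left untouched.

```cpp
#include <cstddef>
#include <vector>

// Runs insertion sort on a copy of the input and returns the number of
// element shifts performed by the inner loop. This equals d, the number
// of inversions in the input.
std::size_t countShifts(std::vector<int> a) {
    std::size_t shifts = 0;
    for (std::size_t p = 1; p < a.size(); ++p) {
        int temp = a[p];
        std::size_t j = p;
        for (; j > 0 && a[j - 1] > temp; --j) {
            a[j] = a[j - 1];
            ++shifts;             // one inversion removed
        }
        a[j] = temp;
    }
    return shifts;
}
```

On the three 20-entry lists from the inversion slides this returns 13, 13 and 100. A sorted array gives 0 shifts (the best case) and a reversed array of n entries gives n(n - 1)/2 (the worst case).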
Quiz: Selection Sort
One way to sort is to select the smallest value in the group and bring
it to the top of the list. Continue this process until the entire list is
selected
Give the code and analyze the worst-, average-, and best-case of this
algorithm. Compare this algorithm with insertion sort.
Merge Sort
The merge sort algorithm is defined recursively:
• If the list is of size 1, it is sorted—we are done;
• Otherwise:
• Divide an unsorted list into two sub-lists,
• Sort each sub-list recursively using merge sort, and
• Merge the two sorted sub-lists into a single sorted list
• This is a divide-and-conquer algorithm
Merge Sort (2)
Need to sort:
34 8 64 51 32 21
Sort 34 8 64: 8, 34, 64
Sort 51 32 21: 21, 32, 51
Merge the two:
We have 8, 21, 32, 34, 51, 64
Question: How quickly can we recombine the two
sub-lists into a single sorted list?
Merging Example
Consider the two sorted arrays and an empty array
Define three indices at the start of each array
At each step we compare the two current entries, copy the smaller one down,
and increment the corresponding indices:
• We compare 2 and 3: 2 < 3, so copy 2 down
• We compare 3 and 7: copy 3 down
• We compare 5 and 7: copy 5 down
• We compare 18 and 7: copy 7 down
• We compare 18 and 12: copy 12 down
• We compare 18 and 16: copy 16 down
• We compare 18 and 33: copy 18 down
• We compare 21 and 33: copy 21 down
• We compare 24 and 33: copy 24 down
We would continue until we have passed beyond the
limit of one of the two arrays
After this, we simply copy over all remaining entries in
the non-empty array
Merging Two Lists
Programming a merge is straightforward:
• the sorted arrays, array1 and array2, are of size n1
and n2, respectively, and
• we have an empty array, arrayout, of size n1 + n2
Define three variables
int i1 = 0, i2 = 0, k = 0;
which index into these three arrays
Merging Two Lists
We can then run the following loop:
//...
int i1 = 0, i2 = 0, k = 0;
while ( i1 < n1 && i2 < n2 ) {
    if ( array1[i1] < array2[i2] ) {
        arrayout[k] = array1[i1];
        ++i1;
    } else {
        arrayout[k] = array2[i2];
        ++i2;
    }
    ++k;
}
Merging Two Lists
We're not finished yet: we have to empty out the remaining
array
for ( ; i1 < n1; ++i1, ++k ) {
    arrayout[k] = array1[i1];
}
for ( ; i2 < n2; ++i2, ++k ) {
    arrayout[k] = array2[i2];
}
Analysis of merging
• The bodies of the loops run a total of n1 + n2 times
• Hence, merging may be performed in Θ(n1 + n2) = Θ(n)
time
Problem: This algorithm does not merge the two arrays in-place
• It always requires the allocation of a new
array
• Therefore, the memory requirements are also Θ(n)
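The claim that the loop bodies run n1 + n2 times in total can be checked by wrapping the three loops in a function and counting. This is a sketch only: the MergeResult struct, the steps counter and the name mergeCounted are hypothetical additions, and std::vector stands in for the raw arrays used above.

```cpp
#include <cstddef>
#include <vector>

struct MergeResult {
    std::vector<int> out;   // the merged, sorted output
    std::size_t steps;      // total number of loop-body executions
};

// Merges two sorted vectors into a newly allocated one and counts how
// often the bodies of the three loops run.
MergeResult mergeCounted(const std::vector<int>& a1,
                         const std::vector<int>& a2) {
    MergeResult r{std::vector<int>(a1.size() + a2.size()), 0};
    std::size_t i1 = 0, i2 = 0, k = 0;
    while (i1 < a1.size() && i2 < a2.size()) {
        if (a1[i1] < a2[i2]) {
            r.out[k++] = a1[i1++];
        } else {
            r.out[k++] = a2[i2++];
        }
        ++r.steps;
    }
    // Empty out whichever input still has entries.
    for (; i1 < a1.size(); ++i1, ++k, ++r.steps) r.out[k] = a1[i1];
    for (; i2 < a2.size(); ++i2, ++k, ++r.steps) r.out[k] = a2[i2];
    return r;
}
```

Merging the sub-lists 8, 34, 64 and 21, 32, 51 from the earlier example yields 8, 21, 32, 34, 51, 64 with steps equal to 6 = n1 + n2. Every step writes exactly one output entry, which is why the count cannot be anything else.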
The Merge Sort Algorithm
The algorithm:
• Split the list into two approximately equal sub-lists
• Recursively call merge sort on both sub lists
• Merge the resulting sorted lists
The Merge Sort Algorithm
Question:
• we split the list into two sub-lists and sorted them
• how should we sort those lists?
Answer (theoretical):
• if the size of these sub-lists is > 1, use merge sort again
• if the sub-lists are of length 1, do nothing: a list of length
one is sorted
Code of Merge Sort
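One possible way to write the algorithm described above is sketched here. It is not necessarily the version intended for this slide: the names mergeSort, mergeSortRange and mergeRanges are hypothetical, a single temporary vector is allocated once and shared by all recursive calls, and the comparison uses <= (rather than the < of the merging loop shown earlier) so that equal entries keep their original order.

```cpp
#include <cstddef>
#include <vector>

// Merges the sorted ranges a[lo, mid) and a[mid, hi) through tmp,
// then copies the merged result back into a.
void mergeRanges(std::vector<int>& a, std::vector<int>& tmp,
                 std::size_t lo, std::size_t mid, std::size_t hi) {
    std::size_t i1 = lo, i2 = mid, k = lo;
    while (i1 < mid && i2 < hi) {
        // <= takes from the left half on ties, which keeps the sort stable.
        tmp[k++] = (a[i1] <= a[i2]) ? a[i1++] : a[i2++];
    }
    while (i1 < mid) tmp[k++] = a[i1++];
    while (i2 < hi) tmp[k++] = a[i2++];
    for (k = lo; k < hi; ++k) a[k] = tmp[k];
}

// Sorts a[lo, hi) recursively.
void mergeSortRange(std::vector<int>& a, std::vector<int>& tmp,
                    std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;              // size 0 or 1: already sorted
    std::size_t mid = lo + (hi - lo) / 2; // two approximately equal halves
    mergeSortRange(a, tmp, lo, mid);
    mergeSortRange(a, tmp, mid, hi);
    mergeRanges(a, tmp, lo, mid, hi);
}

// Entry point: allocates the Θ(n) temporary array once.
void mergeSort(std::vector<int>& a) {
    std::vector<int> tmp(a.size());
    mergeSortRange(a, tmp, 0, a.size());
}
```

With the input 34 8 64 51 32 21 the first split is 34 8 64 and 51 32 21, as in the earlier example, and the result is 8 21 32 34 51 64.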
Run-time
The recurrence T(n) = 2T(n/2) + n gives T(n) = Θ(n log n)
The following table summarizes the run-times of merge sort
Case      Run Time       Comments
Worst     Θ(n log(n))    No worst case
Average   Θ(n log(n))
Best      Θ(n log(n))    No best case
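The recurrence T(n) = 2T(n/2) + n can be solved by unrolling it. The derivation below is a sketch under two simplifying assumptions that the slides do not state: n is a power of two, and T(1) is a constant.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + n \\
     &= 4\,T(n/4) + 2n \\
     &= 8\,T(n/8) + 3n \\
     &\;\;\vdots \\
     &= 2^{k}\,T(n/2^{k}) + kn .
\end{align*}
% The unrolling stops when n / 2^k = 1, that is, when k = \log_2 n:
\[
T(n) = n\,T(1) + n\log_2 n = \Theta(n \log n).
\]
```

Each level of the recursion does a total of n work in its merges, and there are log_2 n levels, which is where the n log n comes from.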
Why is it not Θ(n^2)?
When we are merging, we are comparing values
• What prevents us from performing Θ(n^2) comparisons?
• During the merging process, if 2 came from the second half and was only
compared to 3, then it was not compared to any of the other
entries in the first array
• In this case, one comparison removes every inversion between 2 and the entries remaining in the first array
Space Complexity
- Additional memory required is O(log n + n) = O(n)
- Each recursive function call places its local variables, parameters, etc., on a stack
• The depth of the recursion tree is O(log n)
- Temporary array for merging: O(n)
- Merge sort is hardly ever used for main-memory sorts, because of:
• the extra space it requires
• the additional work spent copying to and from the temporary array
Quiz
• Sort 3, 1, 4, 1, 5, 9, 2, 6 using merge sort.
Summary
• Introduction of sorting
• Insertion sort
• Merge sort
• Next week: Quick Sort, Bucket Sort, and Searching