Internal Sorting
Department of Computer Science
HCMC University of Technology, Viet Nam
04, 2014
Introduction
Sorting is one of the most important concepts and one of the
most common applications in computing
Classification
• Internal sort: all data are held in primary memory
• External sort: the data are too large to fit in primary memory
Properties
• Stability: items with equal keys keep their relative
input order in the output
• Efficiency: measured by the number of comparisons and the
number of moves
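To make the stability property concrete, here is a minimal C sketch that checks whether an output is a stable ordering. The `Record` type, its `seq` field (the position the record had in the input) and the function name `is_stable_sorted` are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Hypothetical record: `key` is the sort key, `seq` is the position
   the record had in the input. */
typedef struct {
    int key;
    int seq;
} Record;

/* Returns true when `out` is sorted by key and records with equal
   keys still appear in their input order (increasing seq). */
bool is_stable_sorted(const Record *out, int n) {
    for (int i = 1; i < n; i++) {
        if (out[i - 1].key > out[i].key)
            return false;               /* not sorted by key */
        if (out[i - 1].key == out[i].key &&
            out[i - 1].seq > out[i].seq)
            return false;               /* equal keys were reordered */
    }
    return true;
}
```

A sort is stable if this check passes on its output for every possible input, not just for one sample.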
Insertion Sort
• Divide the list into two parts: sorted and unsorted
• At each step, insert the first element of the unsorted
part into its proper position in the sorted part
[Figure: the list drawn as a sorted part followed by an unsorted part]
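Example (each column shows the array after one more insertion step)
Initial  Step 1  Step 2  Step 3  Step 4  Step 5  Step 6  Step 7
9        9       8       6       6       6       6       6
12       12      9       8       8       8       8       8
8        8       12      9       9       9       9       9
6        6       6       12      12      12      12      12
22       22      22      22      22      15      15      15
15       15      15      15      15      22      22      22
30       30      30      30      30      30      30      25
25       25      25      25      25      25      25      30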
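Insertion Sort Implementation
Input: unsorted array arr of n elements
Output: arr sorted in ascending order
for (i = 1; i < n; i++) {
    tmp = arr[i];                  /* first element of the unsorted part */
    for (j = i - 1; j >= 0 && tmp < arr[j]; j--)
        arr[j + 1] = arr[j];       /* shift larger elements one slot right */
    arr[j + 1] = tmp;              /* insert into the gap */
}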
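The loop above can be wrapped into a self-contained C function as follows. This is only a sketch: the function name `insertion_sort` and the choice of `int` keys are assumptions, not part of the slides.

```c
/* Sorts arr[0..n-1] into ascending order, in place.
   Assumed name and signature, for illustration only. */
void insertion_sort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int tmp = arr[i];               /* element to insert */
        int j = i - 1;
        while (j >= 0 && tmp < arr[j]) {
            arr[j + 1] = arr[j];        /* shift larger element right */
            j--;
        }
        arr[j + 1] = tmp;               /* place it in the gap */
    }
}
```

Because the inner test uses a strict `<`, an element never moves past an equal key, so this version is stable.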
Selection Sort
• Divide the list into two parts: sorted and unsorted
• At each step, select the smallest/largest element in
the unsorted part and move it to the sorted part
[Figure: the list drawn as a sorted part followed by an unsorted part]
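Example (each column shows the array after one more selection step)
Initial  Step 1  Step 2  Step 3  Step 4  Step 5  Step 6  Step 7
9        6       6       6       6       6       6       6
12       12      8       8       8       8       8       8
8        8       12      9       9       9       9       9
6        9       9       12      12      12      12      12
22       22      22      22      22      15      15      15
15       15      15      15      15      22      22      22
30       30      30      30      30      30      30      25
25       25      25      25      25      25      25      30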
Selection Sort Implementation
Input: unsorted array arr of n elements
Output: arr sorted in ascending order
for (i = 0; i < n - 1; i++) {
    lowindex = i;                      /* index of the smallest key so far */
    for (j = i + 1; j < n; j++)
        if (arr[j] < arr[lowindex])
            lowindex = j;
    swap(arr, i, lowindex);            /* move it to the end of the sorted part */
}
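The slide code calls `swap` without defining it. A self-contained C sketch, assuming `swap(arr, i, j)` simply exchanges two array elements (the helper body and the name `selection_sort` are assumptions), might look like this:

```c
/* Assumed helper: exchanges arr[i] and arr[j]. */
static void swap(int arr[], int i, int j) {
    int tmp = arr[i];
    arr[i] = arr[j];
    arr[j] = tmp;
}

/* Sorts arr[0..n-1] into ascending order, in place. */
void selection_sort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int lowindex = i;               /* smallest key found so far */
        for (int j = i + 1; j < n; j++)
            if (arr[j] < arr[lowindex])
                lowindex = j;
        swap(arr, i, lowindex);         /* at most one swap per step */
    }
}
```

Selection sort performs at most n-1 swaps, but the long-distance swap can move an element past an equal key, so this form is not stable.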
Bubble Sort
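• Divide the list into two parts: sorted and unsorted
• At each step, the smallest/largest element in the unsorted
part is bubbled toward the sorted part by swapping adjacent
elements that are out of order
[Figure: the list drawn as a sorted part followed by an unsorted part]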
Example (each column shows the array after one more pass)
Initial  Pass 1  Pass 2  Pass 3
9        6       6       6
12       9       8       8
8        12      9       9
6        8       12      12
22       15      15      15
15       22      22      22
30       25      25      25
25       30      30      30
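The slides give no code for bubble sort. The following C sketch bubbles the smallest element of the unsorted part toward the front, which matches the passes in the example. The function name and the early exit when a pass makes no swap are assumptions added for illustration.

```c
#include <stdbool.h>

/* Sorts arr[0..n-1] into ascending order, in place.
   After pass i, arr[0..i] holds the i+1 smallest keys in order. */
void bubble_sort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        bool swapped = false;
        for (int j = n - 1; j > i; j--) {
            if (arr[j] < arr[j - 1]) {  /* adjacent pair out of order */
                int tmp = arr[j];
                arr[j] = arr[j - 1];
                arr[j - 1] = tmp;
                swapped = true;
            }
        }
        if (!swapped)
            break;                      /* no swap: already sorted */
    }
}
```

On the example input the third pass makes no swap, so the loop stops there.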
Shell Sort
• Invented by Donald L. Shell (1959)
• Also called diminishing-increment sort
• Divide the data into K segments (K is the increment)
• The elements of each segment are dispersed throughout the data
Indices 0 1 2 3 4 5 6 7 8 9 with k = 3:
Segment 1: 0, 0+k, 0+2k, 0+3k (indices 0, 3, 6, 9)
Segment 2: 1, 1+k, 1+2k (indices 1, 4, 7)
Segment 3: 2, 2+k, 2+2k (indices 2, 5, 8)
Shell Sort Algorithm
• In each iteration, sort the K segments using insertion sort
• Reduce K after each iteration; the last iteration uses K == 1
Step 0  k=5  Sorted Seg.  Combine
24      24   6            6
12      12   5            5
14      14   2            2
7       7    1            1
22      22   9            9
6       6    8            8
5       5    12           12
16      16   14           14
34      34   7            7
9       9    22           22
8       8    24           24
62      62   62           62
2       2    16           16
1       1    34           34
Step 1 (k=3): the same procedure is applied to the combined result of step 0
Shell Sort Implementation
for (k = first_incremental_value; k >= 1; k = next_incremental_value)
    for (segment = 0; segment < k; segment++)
        segmentSort(segment, k);       /* insertion sort on one segment */

void segmentSort(int segment, int k) {
    for (current = segment + k; current < size; current = current + k) {
        tmp = data[current];
        for (walker = current - k;
             walker >= 0 && tmp < data[walker];
             walker = walker - k)
            data[walker + k] = data[walker];   /* shift by one increment */
        data[walker + k] = tmp;
    }
}
Shell Sort Discussion
• Increment values should not be multiples of each
other
• Next increment 2k+1: 1, 3, 7, 15, 31, ...
• Next increment 3k+1: 1, 4, 13, 40, ...
• Time complexity observed in experiments: about O(n^1.25)
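As a sketch of how the pieces fit together, the following C code runs the segment sort with the 3k+1 increments (1, 4, 13, 40, ...). The function names, the explicit `size` parameter and the way the starting increment is chosen are assumptions for illustration.

```c
/* Insertion sort on the elements segment, segment+k, segment+2k, ... */
static void segment_sort(int data[], int size, int segment, int k) {
    for (int current = segment + k; current < size; current += k) {
        int tmp = data[current];
        int walker = current - k;
        while (walker >= 0 && tmp < data[walker]) {
            data[walker + k] = data[walker];   /* shift by one increment */
            walker -= k;
        }
        data[walker + k] = tmp;
    }
}

/* Shell sort using the increments ..., 40, 13, 4, 1. */
void shell_sort(int data[], int size) {
    int k = 1;
    while (3 * k + 1 < size)
        k = 3 * k + 1;                  /* largest increment below size */
    for (; k >= 1; k /= 3)              /* integer division: 13 -> 4 -> 1 */
        for (int segment = 0; segment < k; segment++)
            segment_sort(data, size, segment, k);
}
```

The final pass with k == 1 is a plain insertion sort, which guarantees a sorted result whatever increments came before it.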
Heap Sort
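• Build a heap from the data
• At each step, take the root of the heap and put it in
the sorted part, then restore the heap (reheap)
[Figure: a heap next to the sorted part; the heap shrinks by one element per step]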
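Example (array contents after each operation; the sorted part grows from the end)
heap  swap  reheap  swap  reheap  swap  reheap  swap  reheap
12    6     10      7     8       4     7       2     6
10    10    6       6     6       6     6       6     2
8     8     8       8     7       7     4       4     4
2     2     2       2     2       2     2       7     7
4     4     4       4     4       8     8       8     8
7     7     7       10    10      10    10      10    10
6     12    12      12    12      12    12      12    12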
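The swap and reheap steps of the example can be sketched in C as below, assuming a max-heap stored in the array itself with the children of index i at 2i+1 and 2i+2. The names `sift_down` (the reheap step) and `heap_sort` are assumptions, not taken from the slides.

```c
/* Reheap: move a[root] down until both children are no larger.
   Only a[0..size-1] is treated as part of the heap. */
static void sift_down(int a[], int root, int size) {
    for (;;) {
        int child = 2 * root + 1;               /* left child */
        if (child >= size)
            break;                              /* root is a leaf */
        if (child + 1 < size && a[child + 1] > a[child])
            child++;                            /* pick the larger child */
        if (a[root] >= a[child])
            break;                              /* heap order restored */
        int tmp = a[root];
        a[root] = a[child];
        a[child] = tmp;
        root = child;
    }
}

/* Sorts a[0..n-1] into ascending order, in place. */
void heap_sort(int a[], int n) {
    for (int i = n / 2 - 1; i >= 0; i--)        /* build the max-heap */
        sift_down(a, i, n);
    for (int end = n - 1; end > 0; end--) {
        int tmp = a[0];                         /* swap root with last */
        a[0] = a[end];
        a[end] = tmp;
        sift_down(a, 0, end);                   /* reheap the smaller heap */
    }
}
```

Starting from the heap 12 10 8 2 4 7 6, the first swap and reheap give 10 6 8 2 4 7 12, as in the third column of the example.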
Divide-And-Conquer Meta Algorithm
Partition the data into parts
for (each part)
    apply Divide-And-Conquer to that part
Combine the resulting parts
Merge Sort
8 4 12 6 33 42 16 7
• Partition the array into two parts
8 4 12 6 | 33 42 16 7
• Sort the two parts (recursively or iteratively)
4 6 8 12 | 7 16 33 42
• Merge the two ordered parts
4 6 7 8 12 16 33 42
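The three steps above can be sketched as a recursive C function. This is one possible arrangement, assuming a single temporary buffer of n integers; the function names and the error return value are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the ordered runs a[lo..mid] and a[mid+1..hi] through tmp. */
static void merge(int a[], int tmp[], int lo, int mid, int hi) {
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];  /* ties take the left run */
    while (i <= mid)
        tmp[k++] = a[i++];
    while (j <= hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo + 1) * sizeof(int));
}

static void merge_sort_range(int a[], int tmp[], int lo, int hi) {
    if (lo >= hi)
        return;                          /* zero or one element */
    int mid = lo + (hi - lo) / 2;        /* partition into two parts */
    merge_sort_range(a, tmp, lo, mid);   /* sort each part */
    merge_sort_range(a, tmp, mid + 1, hi);
    merge(a, tmp, lo, mid, hi);          /* merge the ordered parts */
}

/* Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int merge_sort(int a[], int n) {
    if (n < 2)
        return 0;
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    merge_sort_range(a, tmp, 0, n - 1);
    free(tmp);
    return 0;
}
```

Taking the left element on ties keeps equal keys in input order, so this merge is stable.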
Quick Sort
8 4 12 6 33 42 16 7 5
• Based on the pivot (here 8), partition the array into three
parts: less than the pivot, the pivot, and greater than or equal to the pivot
4 6 7 5 | 8 | 12 33 42 16
• Sort the first and the last parts (recursively or
iteratively)
4 5 6 7 | 8 | 12 16 33 42
• Append the three ordered parts
4 5 6 7 8 12 16 33 42
Pivot Selection
• C. A. R. Hoare (1962): the first element
  • Simple
  • May produce unbalanced parts
• R. C. Singleton (1969): the median of the first, middle
and last elements
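A median-of-three selection can be sketched in C as follows. The function name, the inclusive `left`/`right` bounds and the choice to return an index rather than a value are assumptions for illustration.

```c
/* Returns the index (left, middle or right) whose key is the median
   of key[left], key[middle] and key[right]. */
int median_of_three(const int key[], int left, int right) {
    int mid = left + (right - left) / 2;
    int a = key[left], b = key[mid], c = key[right];
    if ((a <= b && b <= c) || (c <= b && b <= a))
        return mid;                     /* middle key lies between the others */
    if ((b <= a && a <= c) || (c <= a && a <= b))
        return left;                    /* first key lies between the others */
    return right;
}
```

For the array 8 4 12 6 33 42 16 7 5 the three candidates are 8, 33 and 5, so the function returns index 0 (key 8).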
Partition
int partition(int key[], int left, int right, int pivot) {
    /* left starts one position before the range to partition */
    do {
        while (key[++left] < pivot);                       /* skip keys < pivot */
        while ((left < right) && key[--right] >= pivot);   /* skip keys >= pivot */
        myswap(key, left, right);                          /* exchange the misplaced pair */
    } while (left < right);
    return left;                                           /* first position of the >= part */
}
[Figure: left moves right and right moves left until the two indices meet]
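The slides do not show how `partition` is called. The C sketch below is one way to drive it, under these assumptions: the pivot is taken from the middle position, it is moved to the right end before partitioning, and `partition` is called with `left` one position before the range. The names `quick_sort` and the body of `myswap` are also assumptions.

```c
/* Assumed helper: exchanges key[i] and key[j]. */
static void myswap(int key[], int i, int j) {
    int tmp = key[i];
    key[i] = key[j];
    key[j] = tmp;
}

/* Partition as on the slide; returns the first index of the >= part. */
static int partition(int key[], int left, int right, int pivot) {
    do {
        while (key[++left] < pivot)
            ;
        while ((left < right) && key[--right] >= pivot)
            ;
        myswap(key, left, right);
    } while (left < right);
    return left;
}

/* Sorts key[i..j] (inclusive bounds) into ascending order. */
void quick_sort(int key[], int i, int j) {
    if (j <= i)
        return;                          /* zero or one element */
    int pivotindex = i + (j - i) / 2;    /* assumed pivot choice */
    myswap(key, pivotindex, j);          /* park the pivot at the right end */
    int k = partition(key, i - 1, j, key[j]);
    myswap(key, k, j);                   /* put the pivot in its final place */
    quick_sort(key, i, k - 1);
    quick_sort(key, k + 1, j);
}
```

Parking the pivot at position j acts as a sentinel: the first inner loop always stops there at the latest, so `left` never runs off the array.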
Radix-Exchange Sort
7   4   1   6   3   2   5   0
111 100 001 110 011 010 101 000
• Based on the bit representation of the keys
• At each step, based on the corresponding bit,
partition the array into two parts: bit == 0 and bit == 1
1   3   2   0   | 7   4   6   5
001 011 010 000 | 111 100 110 101
• Sort these two parts (recursively on the next
bit, or iteratively)
0 1 2 3 | 4 5 6 7
• Append the two ordered parts
0 1 2 3 4 5 6 7
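Example (partitioning on bit 2, the leftmost bit, then bit 1, then bit 0)
Key  Binary  Bit 2  Bit 1  Bit 0  Result
7    111     011    001    000    0
4    100     000    000    001    1
2    010     010    010    010    2
6    110     001    011    011    3
5    101     101    101    100    4
1    001     110    100    101    5
0    000     100    110    110    6
3    011     111    111    111    7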
Radix-Exchange Sort Implementation
void radixSort(int[] a, int l, int r, int mask) {
    int i = l, j = r;
    if (r <= l || mask == 0) return;
    while (j != i) {
        while ((a[i] & mask) == 0 && (i < j)) i++;   /* bit is 0: stays left */
        while ((a[j] & mask) != 0 && (j > i)) j--;   /* bit is 1: stays right */
        swap(a, i, j);
    }
    if ((a[r] & mask) == 0) j++;     /* every key in a[l..r] has this bit 0 */
    radixSort(a, l, j - 1, mask >> 1);
    radixSort(a, j, r, mask >> 1);
}
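The same idea as a self-contained C sketch, including the initial mask. It assumes unsigned keys that fit in `bits` bits (with `bits` no larger than the width of `unsigned`); the function names and the `bits` parameter are assumptions for illustration.

```c
/* Partition a[l..r] on the bit selected by mask, then recurse on the
   next lower bit. Keys with the bit clear end up on the left. */
static void exchange_sort(unsigned a[], int l, int r, unsigned mask) {
    int i = l, j = r;
    if (r <= l || mask == 0)
        return;
    while (j != i) {
        while ((a[i] & mask) == 0 && i < j)
            i++;                         /* bit is 0: already on the left */
        while ((a[j] & mask) != 0 && j > i)
            j--;                         /* bit is 1: already on the right */
        unsigned tmp = a[i];             /* exchange the misplaced pair */
        a[i] = a[j];
        a[j] = tmp;
    }
    if ((a[r] & mask) == 0)
        j++;                             /* whole range has this bit clear */
    exchange_sort(a, l, j - 1, mask >> 1);
    exchange_sort(a, j, r, mask >> 1);
}

/* Sorts a[0..n-1], assuming every key is smaller than 2^bits. */
void radix_exchange_sort(unsigned a[], int n, int bits) {
    if (n < 2 || bits < 1)
        return;
    exchange_sort(a, 0, n - 1, 1u << (bits - 1));
}
```

With the keys 7 4 2 6 5 1 0 3 and `bits` set to 3, the first partition gives 3 0 2 1 5 6 4 7, matching the example.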
Quick Sort vs. Radix-Exchange Sort
Similarities
• Both partition the array
• Both sort the subarrays recursively
Differences
• Partitioning method
  • Radix-exchange partitions the array on the bit at the
  current position
  • Quick sort partitions the array around the pivot value
• Time complexity
  • Radix-exchange: O(bn), where b is the number of bits per key
  • Quick sort: O(n log2 n) on average
Empirical Comparison
Sort       10      100   1K    10K     100K     1M       Up     Down
Insertion  .00023  .007  0.66  64.98   7381.0   674420   0.04   129.05
Bubble     .00035  .020  2.25  277.94  27691.0  2820680  70.64  108.69
Selection  .00039  .012  0.69  72.47   7356.0   780000   69.76  69.58
Shell      .00034  .008  0.14  1.99    30.2     554      0.44   0.79
Shell/O    .00034  .008  0.12  1.91    29.0     530      0.36   0.64
Merge      .00050  .010  0.12  1.61    19.3     219      0.83   0.79
Merge/O    .00024  .007  0.10  1.31    17.2     197      0.47   0.66
Quick      .00048  .008  0.11  1.37    15.7     162      0.37   0.40
Quick/O    .00031  .006  0.09  1.14    13.6     143      0.32   0.36
Heap       .00050  .011  0.16  2.08    26.7     391      1.57   1.56
Heap/O     .00033  .007  0.11  1.61    20.8     334      1.01   1.04
Radix/4    .00838  .081  0.79  7.99    79.9     808      7.97   7.97
Radix/8    .00799  .044  0.40  3.99    40.0     404      4.00   3.99
Table: Running time in milliseconds. /O marks an optimized version;
Up and Down are inputs already in ascending and descending order.
Summary
• Internal sort requires all elements to be available in
primary memory
• Insertion, Selection and Bubble Sort are simple but
perform poorly on large inputs
• Shell, Merge, Quick and Heap Sort are more complex
but perform well
• Radix Sort does not compare whole key values; it works on
the individual digits (bits) of the keys