| 1 | +# Heap |
| 2 | + |
| 3 | +A [heap](#heap) is a widely used data structure in computing applications such as priority queues and heap sort<sup>[[1]](https://en.wikipedia.org/wiki/Heap_(data_structure))</sup>. |
| 4 | + |
| 5 | +Derived from the general [tree](overview.md) structure, a heap is either a [max-heap or a min-heap](#max-heap-and-min-heap). The term usually refers to a binary heap, which is conceptually a binary tree and can be stored either in a static array or in dynamically allocated tree nodes. |
| 6 | + |
| 7 | +A static array is usually preferred because new entries are always inserted from left to right within the bottom level of the tree. The heap is therefore a [complete binary tree](overview.md): every level is full except possibly the last, which is filled from the left. Figuratively, |
| 8 | + |
| 9 | +<figure style="text-align:center"> |
| 10 | + <img src="../images/binary_heap.png" /> |
| 11 | + <figcaption>Figure 1. Binary Heap in a Static Array Implementation</figcaption> |
| 12 | +</figure> |
| 13 | + |
| 14 | +New elements are appended to the end of the array, and entries are removed only at the _root_. Both operations trigger the [heap maintenance steps](#heap-property-maintenance) that restore the [max-heap or min-heap](#max-heap-and-min-heap) property. |
| 15 | + |
| 16 | +_Note: with 1-based array indices, node positions are computed as PARENT(i) = ⌊i/2⌋, the index of node i's parent; LEFT(i) = 2i, the index of node i's left child; and RIGHT(i) = 2i + 1, the index of node i's right child_. |
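
The index arithmetic in the note can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function names are hypothetical, and the `*0` variants are the usual 0-based equivalents, added because Python lists start at index 0.

```python
# 1-based layout, as used in the note (index 0 of the array is unused).
def parent(i):
    return i // 2  # floor division gives the parent index


def left(i):
    return 2 * i


def right(i):
    return 2 * i + 1


# 0-based equivalents (an assumption for Python lists, not from the notes).
def parent0(i):
    return (i - 1) // 2


def left0(i):
    return 2 * i + 1


def right0(i):
    return 2 * i + 2
```

In both layouts, `parent(left(i)) == i` and `parent(right(i)) == i`, which is what lets the tree live in a flat array without explicit pointers.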
| 17 | + |
| 18 | +## Max Heap and Min Heap |
| 19 | + |
| 20 | +In a max heap, the key of each node is larger than or equal to the keys of its children, so the largest element is stored at the root. Specifically, given an array _A_ for entry storage, for every node i other than the root: |
| 21 | + |
| 22 | +_A[PARENT(i)]_ ⩾ _A[i]_ |
| 23 | + |
| 24 | +Similarly, in a min heap the key of each node is smaller than or equal to the keys of its children, so the smallest entry is stored at the root. Then, |
| 25 | + |
| 26 | +_A[PARENT(i)]_ ⩽ _A[i]_ |
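
The two inequalities can be checked directly on an array. The sketch below assumes a 0-based Python list, so the parent of index `i` is `(i - 1) // 2`; the function names are illustrative only.

```python
def is_max_heap(a):
    """True if every non-root entry is <= its parent (0-based list)."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))


def is_min_heap(a):
    """True if every non-root entry is >= its parent (0-based list)."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))
```

For example, `is_max_heap([16, 14, 10, 8, 7, 9, 3, 2, 4, 1])` holds, while `is_max_heap([1, 3, 2])` does not because the root is smaller than its children.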
| 27 | + |
| 28 | +The [Heap Sort](../sorting/heap-sort.md) algorithm uses a max heap, while a min heap is the more common choice for building a [priority queue](http://pages.cs.wisc.edu/~vernon/cs367/notes/11.PRIORITY-Q.html). |
| 29 | + |
| 30 | +_Note: only the max heap is used in the discussion that follows_. |
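
As one illustration of a min heap backing a priority queue, Python's standard `heapq` module treats a plain list as a min heap. The task names and priorities below are made up for the example.

```python
import heapq

tasks = []  # heapq maintains the min-heap property on this list
heapq.heappush(tasks, (2, "write report"))
heapq.heappush(tasks, (1, "fix bug"))
heapq.heappush(tasks, (3, "refactor"))

# heappop always removes the entry with the smallest priority value.
order = [heapq.heappop(tasks)[1] for _ in range(3)]
```

Here `order` comes out sorted by priority value, lowest first, regardless of insertion order.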
| 31 | + |
| 32 | +## Heap Property Maintenance |
| 33 | + |
| 34 | +MAX_HEAPIFY is the key procedure for heap property maintenance. Given an array A and an index i, it assumes that the subtrees rooted at LEFT(i) and RIGHT(i) are both max heaps. If A[i] is smaller than either of its children, MAX_HEAPIFY moves A[i] down the heap until the subtree rooted at i satisfies the max heap property. |
| 35 | + |
| 36 | +Note that MAX_HEAPIFY should only be applied where a single heap property violation exists, namely at index i. In a top-down fashion, |
| 37 | + |
| 38 | +<pre> |
| 39 | +<code> |
| 40 | +MAX_HEAPIFY(A, i) |
| 41 | +    l = left(i) |
| 42 | +    r = right(i) |
| 43 | +    if l ⩽ heap-size(A) and A[l] > A[i] |
| 44 | +        largest = l |
| 45 | +    else |
| 46 | +        largest = i |
| 47 | +    if r ⩽ heap-size(A) and A[r] > A[largest] |
| 48 | +        largest = r |
| 49 | +    if largest ≠ i |
| 50 | +        swap A[i] and A[largest] |
| 51 | +        MAX_HEAPIFY(A, largest) |
| 52 | +</code> |
| 53 | +</pre> |
| 54 | + |
| 55 | +_Note: the recursive call is needed because, after A[i] is swapped with its left or right child, the subtree rooted at that child may itself violate the heap property_. |
| 56 | + |
| 57 | +The work done in each call of MAX_HEAPIFY, excluding the recursive call, takes constant time, and the total number of calls is bounded by Ο(_h_), where _h_ is the height of the heap. Therefore, the time complexity of MAX_HEAPIFY is Ο(log(n)) for an n-entry heap. |
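
A possible Python rendering of the pseudocode above is sketched here. It assumes a 0-based list, so the children of `i` are `2*i + 1` and `2*i + 2`, and it adds an optional `heap_size` parameter (not in the pseudocode signature) in place of `heap-size(A)`.

```python
def max_heapify(a, i, heap_size=None):
    """Sift a[i] down until the subtree rooted at i is a max heap.

    Assumes the subtrees rooted at the children of i are already max heaps.
    """
    if heap_size is None:
        heap_size = len(a)
    l, r = 2 * i + 1, 2 * i + 2  # 0-based child indices
    largest = i
    if l < heap_size and a[l] > a[largest]:
        largest = l
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        # The swapped child's subtree may now violate the property.
        max_heapify(a, largest, heap_size)


a = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]  # only index 1 violates the property
max_heapify(a, 1)
```

After the call, the value 4 has moved down two levels and `a` is a valid max heap.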
| 58 | + |
| 59 | +## Build a Heap |
| 60 | + |
| 61 | +Given unordered input _A_ stored in a static array, building a max heap from it involves repeated calls to MAX_HEAPIFY. Specifically, |
| 62 | + |
| 63 | +``` |
| 64 | +BUILD_MAX_HEAP(A) |
| 65 | +    for i = ⌊length(A)/2⌋ downto 1 |
| 66 | +        MAX_HEAPIFY(A, i) |
| 67 | +``` |
| 68 | + |
| 69 | +where all leaves of the heap lie at indices ⌊length(A)/2⌋ + 1 through length(A), and each leaf is trivially a max heap. The process works bottom-up and produces a max heap regardless of how many heap property violations the input contains. |
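
The bottom-up construction can be sketched in Python as below. The sketch assumes a 0-based list, so the last internal node sits at index `len(a) // 2 - 1`, and it uses an iterative sift-down helper (a hypothetical name) that does the same work as MAX_HEAPIFY.

```python
def sift_down(a, i, n):
    """Iterative MAX_HEAPIFY on a 0-based list with heap size n."""
    while True:
        l, r, largest = 2 * i + 1, 2 * i + 2, i
        if l < n and a[l] > a[largest]:
            largest = l
        if r < n and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Rearrange the list a in place into a max heap."""
    n = len(a)
    # Indices n // 2 .. n - 1 are leaves, so start at the last internal node.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)


data = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_max_heap(data)
```

After the call, `data[0]` holds the maximum and every parent is at least as large as its children.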
| 70 | + |
| 71 | +### Algorithm Analysis |
| 72 | + |
| 73 | +At the bottom level of the heap there are at most 2<sup>h</sup> nodes, none of which costs anything to heapify; at the level above the bottom there are 2<sup>h-1</sup> nodes, each costing at most 1 swap; and so on. Figuratively, |
| 74 | + |
| 75 | +<figure style="text-align:center"> |
| 76 | + <img src="../images/build_max_heap.png" /> |
| 77 | + <figcaption>Figure 2. Build Max Heap Total Work</figcaption> |
| 78 | +</figure> |
| 79 | + |
| 80 | +In general, at level j (counted upward from the bottom) there are 2<sup>h-j</sup> nodes, each costing at most j swaps. Summing over all levels, |
| 81 | + |
| 82 | +<figure style="text-align:center"> |
| 83 | + <img src="../images/build_max_heap_1.png" /> |
| 84 | +</figure> |
| 85 | + |
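| 86 | +Using the [infinite geometric series](https://en.wikipedia.org/wiki/Geometric_series#Proof_of_convergence), the sum of j/2<sup>j</sup> over all j ⩾ 0 converges to 2; thus, for a heap whose bottom level is full, so that n = 2<sup>h+1</sup> - 1, |
| 87 | + |
| 88 | +Τ(n) ⩽ 2<sup>h</sup> · 2 = 2<sup>h+1</sup> = n + 1 = Ο(n) |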
| 89 | + |
| 90 | +Since building the heap must access every input at least once, the running time is also Ω(n), which gives the tight bound Θ(n). |
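
The bound can be checked numerically with a short Python sketch. It assumes a heap of height `h` whose bottom level is full (so `n = 2**(h + 1) - 1`) and sums the per-level swap bound `j * 2**(h - j)` from the analysis; the function name is illustrative.

```python
def total_swap_bound(h):
    """Sum over levels j = 0..h of (nodes at level j) * (max swaps per node)."""
    return sum(j * 2 ** (h - j) for j in range(h + 1))


for h in range(1, 11):
    n = 2 ** (h + 1) - 1  # node count when every level is full
    assert total_swap_bound(h) <= n + 1
```

For `h = 3` the sum is 4 + 4 + 3 = 11, comfortably below n + 1 = 16, which matches the linear bound derived above.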
| 91 | + |
| 92 | +## Additional References |
| 93 | + |
| 94 | +1. Data Structures: Heaps. https://www.youtube.com/watch?v=t0Cq6tVNRBA |
| 95 | + |
| 96 | +2. Lecture Notes: Heapsort analysis. http://www.cs.umd.edu/~meesh/351/mount/lectures/lect14-heapsort-analysis-part.pdf |