Class 2 – Abstract Data Types
Abstract Data: An abstract data type (ADT) is a set of objects together with a set of
operations on those objects.
Common ADTs:
● Stacks: LIFO (Last-In-First-Out) data structure.
  ○ Operations: push, pop, peek, isEmpty.
● Queues: FIFO (First-In-First-Out) data structure.
  ○ Operations: enqueue, dequeue, peek, isEmpty.
● Lists: Ordered collection of elements.
  ○ Operations: insert, delete, search, traverse.
● Trees: Hierarchical data structure.
  ○ Types: binary trees, binary search trees, AVL trees, etc.
● Graphs: Non-linear data structure with nodes and edges.
  ○ Types: directed graphs, undirected graphs, weighted graphs, etc.
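As a quick sketch of the first two ADTs above, a stack and a queue can be built on Python's built-in list and collections.deque (the variable names here are illustrative):

```python
from collections import deque

# Stack: LIFO -- push and pop happen at the same end of a list.
stack = []
stack.append(1)    # push
stack.append(2)    # push
top = stack.pop()  # pop: removes the last element pushed (2)

# Queue: FIFO -- enqueue at one end, dequeue at the other.
queue = deque()
queue.append("a")        # enqueue
queue.append("b")        # enqueue
first = queue.popleft()  # dequeue: removes the first element added ("a")
```

deque is used for the queue because popping from the front of a plain list is O(n), while deque.popleft() is O(1).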
Maps:
Maps or dictionaries are abstract data types that store key-value pairs. This
allows for efficient retrieval of values based on their corresponding keys.
Key Components of a Map ADT:
● Keys: Unique identifiers used to access values.
● Values: Data associated with the keys.
● Operations:
  ○ insert(key, value): Adds a new key-value pair to the map.
  ○ get(key): Retrieves the value associated with the given key.
  ○ remove(key): Removes the key-value pair from the map.
  ○ contains(key): Checks if the map contains the given key.
  ○ size(): Returns the number of key-value pairs in the map.
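A minimal sketch of the Map ADT, backed by Python's built-in dict (the class and method names simply mirror the operations listed above):

```python
class Map:
    """Map ADT: stores key-value pairs with the operations listed above."""

    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        self._data[key] = value          # add or overwrite a pair

    def get(self, key):
        return self._data.get(key)       # None if the key is absent

    def remove(self, key):
        self._data.pop(key, None)        # no error if the key is absent

    def contains(self, key):
        return key in self._data

    def size(self):
        return len(self._data)

m = Map()
m.insert("apple", 3)
m.insert("pear", 5)
```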
Common Implementations of Maps:
● Hash Tables [unsorted]: Use hash functions to map keys to indices in an array.
  ○ Advantages: Efficient for most operations.
  ○ Disadvantages: Can suffer from collisions (multiple keys mapped to the same index).
● Binary Search Trees: Store key-value pairs in a tree structure, with keys sorted in a
specific order.
  ○ Advantages: Efficient for ordered operations (e.g., finding the minimum or maximum value).
  ○ Disadvantages: Can degenerate to O(n) per operation in the worst case (e.g., when
keys are inserted in sorted order and the tree becomes unbalanced).
● Red-Black Trees: A self-balancing binary search tree that maintains balance
properties.
  ○ Advantages: Guaranteed logarithmic time for all operations.
  ○ Disadvantages: More complex implementation than plain binary search trees.
Hash Tables:
Hash tables are a type of data structure that use a hash function to map keys to indices in
an array. This allows for efficient storage and retrieval of key-value pairs.
Key Components of a Hash Table:
● Hash Function: A function that takes a key as input and returns an integer index within
the array.
● Array: The underlying data structure that stores the key-value pairs.
● Collision Handling: A mechanism to handle cases where multiple keys map to the same
index (collision).
Common Collision Handling Techniques:
A collision occurs when two keys map to the same cell in the table.
1. Separate Chaining:
a. Each index in the array points to a linked list that stores all key-value pairs that
hashed to that index.
b. Requires additional memory.
2. Open Addressing: When a collision occurs, the algorithm probes other indices in the
array until an empty slot is found.
a. Linear probing: Search sequentially from the collision point.
b. Quadratic probing: Search using a quadratic function of the probe index.
c. Double hashing: Use a second hash function to determine the probe sequence.
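A minimal sketch of separate chaining (technique 1 above); the class name, table size, and use of Python's built-in hash are illustrative choices:

```python
class ChainedHashTable:
    """Hash table with separate chaining: each bucket is a list of
    (key, value) pairs that hashed to that index."""

    def __init__(self, size=13):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: append to the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

t = ChainedHashTable()
t.put(18, "a")
t.put(31, "b")  # 18 and 31 both hash to 5 (mod 13): chained in one bucket
```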
Worst-Case Performance: In the worst case, when all keys hash to the same index, operations can
degrade to O(n).
Linear Probing
When a collision occurs, linear probing sequentially searches for the next available index in
the hash table.
How Linear Probing Works:
1. Hash the key: Calculate the hash value using the hash function.
2. Check the index: If the corresponding index in the hash table is empty, insert the
key-value pair.
3. Collision: If the index is occupied, increment the index by 1 and repeat step 2.
x = [18, 41, 22, 44, 59, 32, 31]
N = 13 // table size
hash(x) = x mod N = [5, 2, 9, 5, 7, 6, 5]
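The example above can be traced with a short sketch (insert-only; the function name is illustrative):

```python
def linear_probe_insert(keys, N=13):
    """Insert keys with linear probing; returns the table (None = empty)."""
    table = [None] * N
    for x in keys:
        i = x % N                  # 1. hash the key
        while table[i] is not None:
            i = (i + 1) % N        # 3. collision: step to the next index
        table[i] = x               # 2. empty slot found: insert
    return table

table = linear_probe_insert([18, 41, 22, 44, 59, 32, 31])
# 44 collides with 18 at index 5 and lands at 6; 32 walks 6, 7 to reach 8;
# 31 walks 5, 6, 7, 8, 9 to reach 10 -- an example of clustering.
```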
Cons of Linear Probing:
● Clustering: Can lead to clustering of elements, where consecutive indices are
filled. This can degrade performance, especially for high load factors.
● Deletion difficulties: Deleting elements can create "holes" in the table, which can
make subsequent searches less efficient.
Quadratic Probing:
Quadratic probing is not guaranteed to find an empty bucket (for example, when the table
is more than half full, or when the table size is not prime).
Collision: If the index is occupied, offset the original index by a quadratic function of the
probe number: probe (i + j²) mod N for j = 1, 2, 3, …, where i is the original hash index.
x = [18, 41, 31, 54, 28, 44, 15]
N = 13 // table size
hash(x) = x mod N = [5, 2, 5, 2, 2, 5, 2]
When a collision occurs, put it in

A[(i + j²) % N],  j = 1, 2, 3, …, N−1 (one j per iteration)

where i = hash(x).

Array   Hash (x % size)   calculation
18      5
41      2
31      5 -> 6            (5 + 1²) % 13 = 6
54      2 -> 3            (2 + 1²) % 13 = 3
28      2 -> 11           (2 + 1²) % 13 = 3 and (2 + 2²) % 13 = 6 are occupied; (2 + 3²) % 13 = 11
44      5 -> 9            (5 + 1²) % 13 = 6 is occupied; (5 + 2²) % 13 = 9
15      2 -> 1            j = 1…4 give 3, 6, 11, 5 (all occupied); (2 + 5²) % 13 = 1
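The trace above can be reproduced with a short sketch (insert-only; the function name is illustrative):

```python
def quadratic_probe_insert(keys, N=13):
    """Insert keys probing A[(i + j*j) % N] for j = 0, 1, 2, ..."""
    table = [None] * N
    for x in keys:
        i = x % N                        # original hash index
        for j in range(N):
            slot = (i + j * j) % N       # j = 0 is the unprobed index
            if table[slot] is None:
                table[slot] = x
                break
        else:
            # Quadratic probing is not guaranteed to find an empty bucket.
            raise RuntimeError("no empty bucket reached")
    return table

table = quadratic_probe_insert([18, 41, 31, 54, 28, 44, 15])
```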
Double Hashing
Double hashing aims to minimize clustering and improve performance by using two hash functions.
How Double Hashing Works:
1. Primary hash function: computes the initial index in the table.
2. Secondary hash function: computes the step size between probes.
3. Probe sequence: Use the secondary hash value to determine the probe sequence.
The probe sequence is calculated as follows:
○ index = (initial_hash + j * secondary_hash) % table_size
○ j starts from 0 and increments with each probe (j = 0 gives the initial index).
Common Secondary Hash Function:
d2(k) = q − (k mod q)
where
▪ q < N
▪ q and N are prime
Collision:
(Hash1(x) + j * Hash2(x)) mod N
where
j = 1, 2, 3, …, N−1 (one j per iteration)
N = table_size
Example (N = 13, q = 7), inserting 18, 41, 22, 44, 59, 32, 31, 73 — final table:

index: 0    1    2    3    4    5    6    7    8    9    10   11   12
key:   31        41             18   32   59   73   22   44

Arr   Hash1 (x % N)   Hash2 (q − x % q)   probes         calculation
18    5               3                   5
41    2               1                   2
22    9               6                   9
44    5               5                   5 -> 10        (5 + 1 * 5) % 13 = 10
59    7               4                   7
32    6               3                   6
31    5               4                   5 -> 0         (5 + 1 * 4) % 13 = 9 is occupied; (5 + 2 * 4) % 13 = 0
73    8               4                   8
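The table above can be traced with a short sketch (insert-only; the function name is illustrative):

```python
def double_hash_insert(keys, N=13, q=7):
    """Insert keys probing (h1 + j * h2) % N for j = 0, 1, 2, ...
    h2(x) = q - (x % q) is always in 1..q, so the step is never 0."""
    table = [None] * N
    for x in keys:
        h1, h2 = x % N, q - x % q
        j = 0
        while table[(h1 + j * h2) % N] is not None:
            j += 1                       # collision: take another step of size h2
        table[(h1 + j * h2) % N] = x
    return table

table = double_hash_insert([18, 41, 22, 44, 59, 32, 31, 73])
```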
Load Factor:
The load factor of a hash table is the ratio of the number of elements stored in the table to
the total size of the table. It is calculated as:
Load factor (α) = Number of elements (n) / Table size (N)
α = n / N
If α = 0, the expected number of probes p is a constant.
As α → 1, p → ∞ (the expected number of probes grows without bound).
The ideal load factor is α < 0.5. Then, the expected running time of a
search/insertion/deletion is O(1).
Rehashing
1. Create a new hash table with a new size.
2. Remove all elements from the old table (one at a time).
3. Insert them into the new table (one at a time).
What’s the size of the new table?
It’s usually decided by the target load factor:
new table size = number of elements / load_factor
N = n / α
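The three rehashing steps above can be sketched as follows (linear probing and a target load factor of 0.5 are illustrative choices):

```python
def rehash(old_table, target_alpha=0.5):
    """Create a larger table sized N = n / alpha and reinsert every element."""
    elements = [x for x in old_table if x is not None]
    n = len(elements)
    new_size = max(1, int(n / target_alpha))  # N = n / alpha
    new_table = [None] * new_size             # 1. create a new table
    for x in elements:                        # 2./3. move elements one at a time
        i = x % new_size
        while new_table[i] is not None:       # linear probing into the new table
            i = (i + 1) % new_size
        new_table[i] = x
    return new_table

new_table = rehash([18, None, 41, None, 31])  # 3 elements / 0.5 -> size 6
```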
Cuckoo hashing
It's named after the cuckoo bird, which lays its eggs in other birds' nests, causing the
original eggs to be evicted. In cuckoo hashing, when a collision occurs, one of the
conflicting elements is evicted and rehashed to a different location.
How it works:
1. Multiple hash functions: Cuckoo hashing uses two independent hash functions, h1
and h2.
2. Initial insertion:
a. Compute h1 and h2 for the key.
b. Initially try to place it in table 1 using h1.
3. Collision:
a. The new element kicks out the current occupant and takes its place.
b. The evicted element uses h2 to get placed in the 2nd table.
key   h1   h2
A     0    2
B     0    0
D     1    0
C     1    4
F     3    4
E     3    2

Inserting in the order A, B, D, C, F, E (each key goes into table 1 at h1, evicting any
occupant, which moves to table 2 at its own h2):

Tab 1: index 0 = B, index 1 = C, index 3 = E
Tab 2: index 0 = D, index 2 = A, index 4 = F
What if the second table is also occupied?
- evict and try to place the evicted in table 1.
- This process continues. If it cycles indefinitely (causing many displacements),
the algorithm detects a cycle or limit of moves and triggers a rehashing
(resizing the hash tables and recalculating the hash functions) to resolve the
issue.
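The eviction loop described above can be sketched as follows; the hash values are taken from the table in this section, and the function name, table size, and move limit are illustrative:

```python
def cuckoo_insert(key, t1, t2, h1, h2, max_loop=16):
    """Insert key using cuckoo hashing with two tables. Each eviction
    bounces the displaced element to the other table. Returns False
    when the move limit is hit (a likely cycle: rehashing is needed)."""
    cur, in_first = key, True
    for _ in range(max_loop):
        if in_first:
            i = h1(cur)
            cur, t1[i] = t1[i], cur   # place cur, capture the evicted occupant
        else:
            i = h2(cur)
            cur, t2[i] = t2[i], cur
        if cur is None:
            return True               # nothing was evicted: insertion done
        in_first = not in_first       # evicted element tries the other table
    return False                      # cycle detected: trigger rehashing

N = 5
t1, t2 = [None] * N, [None] * N
h1_map = {"A": 0, "B": 0, "D": 1, "C": 1, "F": 3, "E": 3}
h2_map = {"A": 2, "B": 0, "D": 0, "C": 4, "F": 4, "E": 2}
for k in ["A", "B", "D", "C", "F", "E"]:
    cuckoo_insert(k, t1, t2, h1_map.__getitem__, h2_map.__getitem__)
```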
Advantages of cuckoo hashing:
● High load factors: Cuckoo hashing can handle very high load factors (ratio of
elements to buckets) without significant performance degradation.
● Constant time operations: On average, operations like insertion, deletion, and
search have constant time complexity.
● No clustering: Unlike linear probing or quadratic probing, cuckoo hashing doesn't
suffer from clustering issues.
Disadvantages of cuckoo hashing:
● Worst-case performance: In rare cases, cuckoo hashing can require a large
number of rehashings, leading to linear time complexity.
● Complexity: The implementation of cuckoo hashing can be more complex than
other collision resolution techniques.
Binary Heap
A heap is a binary tree satisfying the following properties:
▪ Heap-Order: for every node v other than the root,
  key(v) ≥ key(parent(v))
▪ Complete Binary Tree: let h be the height of the heap
  ➢ for i = 0, …, h−1, there are 2^i nodes of depth i
  ➢ at depth h, the external nodes are arranged to the left of the tree
Height of a Heap
• Theorem: A heap storing n keys has height O(log n).
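The heap-order property above can be checked directly on the array form of a complete binary tree, where the parent of index v is (v − 1) // 2 (the function name is illustrative):

```python
def is_min_heap(a):
    """Check the heap-order property: every node's key is >= its parent's.
    In the array encoding of a complete binary tree, the parent of
    index v is (v - 1) // 2."""
    return all(a[v] >= a[(v - 1) // 2] for v in range(1, len(a)))

# A complete binary tree stored level by level; its height is O(log n).
heap = [2, 5, 3, 9, 6, 4]
```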