Data Structure and Algorithms [CO2003]
Chapter 9 - Hash
Lecturer: Vuong Ba Thinh
Contact: [email protected]
Faculty of Computer Science and Engineering
Hochiminh city University of Technology
Contents
1. Basic concepts
2. Hash functions
3. Collision resolution
Lecturer: Vuong Ba Thinh Contact: [email protected] Data Structure and Algorithms [CO2003] 1 / 44
Outcomes
L.O.5.1 - Depict the following concepts: hashing table, key,
collision, and collision resolution.
L.O.5.2 - Describe hashing functions using pseudocode and give
examples to show their algorithms.
L.O.5.3 - Describe collision resolution methods using pseudocode
and give examples to show their algorithms.
L.O.5.4 - Implement hashing tables using C/C++.
L.O.5.5 - Analyze the complexity and develop an experiment (program)
to evaluate the methods supplied for hashing tables.
L.O.1.2 - Analyze algorithms and use Big-O notation to
characterize the computational complexity of algorithms composed
by using the following control structures: sequence, branching, and
iteration (not recursion).
Basic concepts
Sequential search: O(n)
Binary search: O(log₂ n)
→ Both require several key comparisons before the target is found.
Search complexity (number of key comparisons):
Size         Binary    Sequential (average)    Sequential (worst case)
16           4         8                       16
50           6         25                      50
256          8         128                     256
1,000        10        500                     1,000
10,000       14        5,000                   10,000
100,000      17        50,000                  100,000
1,000,000    20        500,000                 1,000,000
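The figures in the table can be reproduced with a few lines of C++. This is only a sketch: the function names are hypothetical, and the rounding convention for binary search (the smallest c with 2^c ≥ n) is inferred from the table rather than stated in the slides.

```cpp
#include <cassert>
#include <cstdint>

// Binary search column: smallest c such that 2^c >= n, i.e. ceil(log2 n).
// (Assumed convention, chosen because it matches the table.)
int binaryComparisons(std::uint64_t n) {
    assert(n >= 1);
    int count = 0;
    std::uint64_t reach = 1;
    while (reach < n) {
        reach *= 2;
        ++count;
    }
    return count;
}

// Sequential search: about half the list on average, the whole list at worst.
std::uint64_t sequentialAverage(std::uint64_t n) { return n / 2; }
std::uint64_t sequentialWorst(std::uint64_t n) { return n; }
```

For example, binaryComparisons(1000000) gives 20, while sequentialAverage(1000000) gives 500000.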
Is there a search algorithm whose complexity is O(1)?
YES
Figure 1: Each key has only one address
Home address: address produced by a hash function.
Prime area: memory that contains all the home addresses.
Synonyms: a set of keys that hash to the same location.
Collision: the location where the data is to be inserted is already
occupied by a synonym's data.
Ideal hashing:
No location collision
Compact address space
Hash functions
Direct hashing
Modulo division
Digit extraction
Mid-square
Folding
Rotation
Pseudo-random
Direct Hashing
The address is the key itself:
hash(Key) = Key
Advantage: there is no collision.
Disadvantage: the address space (storage size) is as large as the key
space.
Modulo division
Address = Key mod listSize
Fewer collisions if listSize is a prime number.
Example:
Numbering system to handle 1,000,000 employees
Data space to store up to 300 employees
hash(121267) = 121267 mod 307 = 2
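A minimal C++ sketch of modulo division follows. The function name moduloHash is hypothetical, and keys are assumed to be non-negative integers.

```cpp
#include <cassert>

// Modulo-division hash: maps a non-negative key to an address in [0, listSize).
int moduloHash(long key, int listSize) {
    assert(key >= 0 && listSize > 0);
    return static_cast<int>(key % listSize);
}
```

With the values from the example, moduloHash(121267, 307) returns 2.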
Digit extraction
Address = selected digits from Key
Example (selecting the first, third, and fourth digits):
379452→394
121267→112
378845→388
160252→102
045128→051
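The selection used in the example can be sketched in C++ as follows. The function name extractDigits is hypothetical; the key is handled as a string (an assumption made here) so that a leading zero such as the one in 045128 is preserved.

```cpp
#include <cassert>
#include <string>

// Digit extraction: builds the address from the 1st, 3rd, and 4th digits
// of a six-digit key. The key is a string so leading zeros survive.
std::string extractDigits(const std::string& key) {
    assert(key.size() == 6);
    return std::string{key[0], key[2], key[3]};
}
```

For instance, extractDigits("379452") yields "394" and extractDigits("045128") yields "051".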
Mid-square
Address = middle digits of Key^2
Example:
9452 * 9452 = 89340304→3403
Disadvantage: Key^2 can be too large to store.
Variation: square only a portion of the key.
Example (squaring the first three digits and keeping digits 3 to 5 of the six-digit square):
379452: 379 * 379 = 143641→364
121267: 121 * 121 = 014641→464
045128: 045 * 045 = 002025→202
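Both forms of mid-square can be sketched in C++. The function names are hypothetical, and the digit positions are taken from the examples above (a four-digit key for the basic form, a six-digit key for the variation). Note that a key such as 045128 must be written as 45128 in C++ source, because a leading 0 marks an octal literal.

```cpp
#include <cassert>
#include <cstdint>

// Mid-square on a four-digit key: square it (up to 8 digits) and keep
// the middle four digits of the square.
int midSquare(int key) {
    assert(key >= 0 && key <= 9999);
    std::int64_t square = static_cast<std::int64_t>(key) * key;
    return static_cast<int>((square / 100) % 10000);
}

// Variation: square only the first three digits of a six-digit key and
// keep digits 3..5 of the (zero-padded) six-digit square.
int midSquarePartial(int key) {
    assert(key >= 0 && key <= 999999);
    int prefix = key / 1000;       // first three digits of the key
    int square = prefix * prefix;  // at most 999 * 999 = 998001
    return (square / 10) % 1000;
}
```

Here midSquare(9452) returns 3403, and midSquarePartial(379452) returns 364.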
Folding
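The key is divided into parts whose size matches the address size.
Example:
Key = 123|456|789
Fold shift: the parts are added as they are.
123 + 456 + 789 = 1368
→ 368 (the leading digit is discarded to fit the address size)
Fold boundary: the first and last parts are reversed before adding.
321 + 456 + 987 = 1764
→ 764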
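The two folding variants can be sketched in C++ as below. The function names are hypothetical, and the sketch assumes a nine-digit key held as a string and a three-digit address, as in the example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Fold shift: split a nine-digit key into three 3-digit parts, add them,
// and keep the low three digits of the sum.
int foldShift(const std::string& key) {
    assert(key.size() == 9);
    int sum = 0;
    for (std::size_t i = 0; i < 9; i += 3) {
        sum += std::stoi(key.substr(i, 3));
    }
    return sum % 1000;
}

// Fold boundary: same as fold shift, but the first and last parts are
// reversed before the addition.
int foldBoundary(const std::string& key) {
    assert(key.size() == 9);
    std::string left = key.substr(0, 3);
    std::string mid = key.substr(3, 3);
    std::string right = key.substr(6, 3);
    std::reverse(left.begin(), left.end());
    std::reverse(right.begin(), right.end());
    return (std::stoi(left) + std::stoi(mid) + std::stoi(right)) % 1000;
}
```

For the key "123456789", foldShift returns 368 and foldBoundary returns 764.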
Rotation
Hashing keys that are identical except for the last character may
create synonyms.
The key is rotated before hashing: the last character is moved to the front.
original key    rotated key
600101          160010
600102          260010
600103          360010
600104          460010
600105          560010
Used in combination with fold shift (two-digit parts):
original key    rotated key
600101 → 62     160010 → 26
600102 → 63     260010 → 36
600103 → 64     360010 → 46
600104 → 65     460010 → 56
600105 → 66     560010 → 66
Rotation spreads the data more evenly across the address space.
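Rotation followed by fold shift can be sketched in C++ as follows. The function names are hypothetical; the two-digit part size and the final reduction to two digits are assumptions inferred from the addresses in the table.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Rotation: move the last digit of the key to the front.
std::string rotateKey(std::string key) {
    assert(!key.empty());
    std::rotate(key.begin(), key.end() - 1, key.end());
    return key;
}

// Fold shift with two-digit parts on a six-digit key. The sum is reduced
// to two digits (an assumption; every sum in the table is below 100).
int foldShiftPairs(const std::string& key) {
    assert(key.size() == 6);
    int sum = 0;
    for (std::size_t i = 0; i < 6; i += 2) {
        sum += std::stoi(key.substr(i, 2));
    }
    return sum % 100;
}
```

For example, foldShiftPairs("600101") gives 62, while foldShiftPairs(rotateKey("600101")) gives 26.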
Pseudo-random
Address = (a × Key + c) mod listSize
For maximum efficiency, a and c should be prime numbers.
Example:
Key = 121267
a = 17
c = 7
listSize = 307
Address = (17 * 121267 + 7) mod 307
        = (2061539 + 7) mod 307
        = 2061546 mod 307
        = 41
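The calculation above can be sketched in C++. The function name pseudoRandomHash is hypothetical; a 64-bit type is used for the intermediate product so that a × Key does not overflow for keys of this size.

```cpp
#include <cassert>
#include <cstdint>

// Pseudo-random hash: (a * key + c) mod listSize, computed in 64 bits.
int pseudoRandomHash(std::int64_t key, std::int64_t a, std::int64_t c,
                     int listSize) {
    assert(key >= 0 && a >= 0 && c >= 0 && listSize > 0);
    return static_cast<int>((a * key + c) % listSize);
}
```

With the values from the example, pseudoRandomHash(121267, 17, 7, 307) returns 41.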
Collision resolution
Except for direct hashing, none of the hash functions above is a
one-to-one mapping
→ collision resolution methods are required.
Each collision resolution method can be combined with any of the
hash functions.
Open addressing
Linked list resolution
Bucket hashing
Open addressing
When a collision occurs, the table is probed for an unoccupied location
in which to place the new element.
Hash function:
h : U → {0, 1, 2, ..., m − 1}
where U is the set of keys and {0, 1, 2, ..., m − 1} is the set of addresses.
Hash and probe function:
hp : U × {0, 1, 2, ..., m − 1} → {0, 1, 2, ..., m − 1}
where U is the set of keys, the second argument is the probe number, and the result is an address.
Algorithm hashInsert(ref T <array>, val k <key>)
Inserts key k into table T.
    i = 0
    while i < m do
        j = hp(k, i)
        if T[j] = nil then
            T[j] = k
            return j
        else
            i = i + 1
        end
    end
    return error: hash table overflow
End hashInsert
Algorithm hashSearch(val T <array>, val k <key>)
Searches for key k in table T.
    i = 0
    while i < m do
        j = hp(k, i)
        if T[j] = k then
            return j
        else if T[j] = nil then
            return nil
        else
            i = i + 1
        end
    end
    return nil
End hashSearch
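The two algorithms can be sketched in C++ as shown below. This is one possible rendering, not the course's reference implementation: the names OpenTable and makeLinearTable are hypothetical, keys are assumed to be non-negative int values, an empty std::optional plays the role of nil, and linear probing with h(k) = k mod m is used only as a placeholder for hp.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Open-addressing table; an empty optional stands for nil.
struct OpenTable {
    std::vector<std::optional<int>> slots;
    std::function<int(int, int)> hp;  // hp(k, i): address tried on probe i

    OpenTable(int m, std::function<int(int, int)> probe)
        : slots(m), hp(std::move(probe)) {
        assert(m > 0);
    }

    // hashInsert: returns the address used, or nullopt on table overflow.
    std::optional<int> insert(int k) {
        const int m = static_cast<int>(slots.size());
        for (int i = 0; i < m; ++i) {
            const int j = hp(k, i);
            if (!slots[j]) {
                slots[j] = k;
                return j;
            }
        }
        return std::nullopt;
    }

    // hashSearch: returns the address of k, or nullopt if k is absent.
    std::optional<int> search(int k) const {
        const int m = static_cast<int>(slots.size());
        for (int i = 0; i < m; ++i) {
            const int j = hp(k, i);
            if (!slots[j]) return std::nullopt;  // empty slot ends the search
            if (*slots[j] == k) return j;
        }
        return std::nullopt;
    }
};

// Hypothetical helper: a table that probes linearly from k mod m.
OpenTable makeLinearTable(int m) {
    return OpenTable(m, [m](int k, int i) { return (k % m + i) % m; });
}
```

With a table of size 7, inserting 10 stores it at address 3, and inserting 17 (a synonym of 10) stores it at address 4. Searching for 24 then probes addresses 3, 4, and 5 and stops at the empty slot.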
There are different methods:
Linear probing
Quadratic probing
Double hashing
Key offset
Linear Probing
When a home address is occupied, go to the next address (the
current address + 1):
hp(k, i) = (h(k) + i) mod m
Advantages:
quite simple to implement
data tend to remain near their home address (significant for disk
addresses)
Disadvantages:
produces primary clustering
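Primary clustering can be observed by counting probes during insertion. The sketch below is illustrative only: the function name linearInsert is hypothetical, h(k) = k mod m is an assumed hash function, and −1 is an assumed marker for an empty slot.

```cpp
#include <cassert>
#include <vector>

// Inserts key with linear probing, hp(k, i) = (k mod m + i) mod m.
// Empty slots hold -1. Returns the number of probes used, or -1 if the
// table is full.
int linearInsert(std::vector<int>& table, int key) {
    const int m = static_cast<int>(table.size());
    assert(m > 0 && key >= 0);
    for (int i = 0; i < m; ++i) {
        const int j = (key % m + i) % m;
        if (table[j] == -1) {
            table[j] = key;
            return i + 1;
        }
    }
    return -1;
}
```

In a table of size 11, inserting 22, 33, and 44 (all with home address 0) takes 1, 2, and 3 probes and fills addresses 0 to 2. Inserting 12 afterwards (home address 1, not a synonym of the others) still needs 3 probes, because it has to walk through the cluster.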
Quadratic Probing
The address increment is the collision probe number squared:
hp(k, i) = (h(k) + i^2) mod m
Advantages:
reduces the primary clustering produced by linear probing
Disadvantages:
time is required to square the probe number
produces secondary clustering: synonyms follow the same probe sequence,
h(k1) = h(k2) → hp(k1, i) = hp(k2, i)
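The probe sequence of quadratic probing depends only on the home address, which is what causes secondary clustering. A small C++ sketch (the function name quadraticSequence is hypothetical) makes this visible:

```cpp
#include <cassert>
#include <vector>

// Lists the addresses (home + i^2) mod m for i = 0 .. m-1.
std::vector<int> quadraticSequence(int home, int m) {
    assert(m > 0 && home >= 0 && home < m);
    std::vector<int> sequence;
    for (int i = 0; i < m; ++i) {
        sequence.push_back((home + i * i) % m);
    }
    return sequence;
}
```

With m = 7 and home address 3, the sequence is 3, 4, 0, 5, 5, 0, 4. Any two keys whose home address is 3 (for example 10 and 17 under h(k) = k mod 7) follow exactly this sequence. The sequence also revisits addresses, so in this case not every slot is reached.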
Double Hashing
Using two hash functions:
hp(k, i) = (h1(k) + i × h2(k)) mod m
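A C++ sketch of the double hashing probe function is given below. The slides do not specify h1 and h2, so the choices here, h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)), are assumptions; they are a common pairing because h2 is never zero. The function name doubleHashProbe is hypothetical.

```cpp
#include <cassert>

// Double hashing probe: (h1(k) + i * h2(k)) mod m.
// Assumed hash functions: h1(k) = k mod m, h2(k) = 1 + (k mod (m - 1)).
int doubleHashProbe(int key, int i, int m) {
    assert(key >= 0 && i >= 0 && m > 1);
    const int h1 = key % m;
    const int h2 = 1 + key % (m - 1);  // in [1, m-1], never zero
    return (h1 + i * h2) % m;
}
```

With m = 7, keys 10 and 17 share the home address 3, but their second probes differ: doubleHashProbe(10, 1, 7) is 1 and doubleHashProbe(17, 1, 7) is 2. Synonyms therefore do not have to follow the same probe sequence.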
Key Offset
The new address is a function of the collision address and the key.
offset = ⌊key / listSize⌋
newAddress = (collisionAddress + offset) mod listSize
hp(k, i) = (hp(k, i − 1) + ⌊k / m⌋) mod m
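One step of key offset can be sketched in C++ as follows. The function name keyOffsetNext and the sample key 166702 are illustrative choices, not taken from the slides; integer division supplies the floor for non-negative keys.

```cpp
#include <cassert>

// Key offset: the next address is the collision address plus
// floor(key / listSize), taken modulo listSize.
int keyOffsetNext(int key, int collisionAddress, int listSize) {
    assert(key >= 0 && listSize > 0);
    assert(collisionAddress >= 0 && collisionAddress < listSize);
    const int offset = key / listSize;  // integer division = floor here
    return (collisionAddress + offset) % listSize;
}
```

For key 166702 and listSize 307, the home address under modulo division is 1 and the offset is 543, so keyOffsetNext(166702, 1, 307) returns 237. If the offset happens to be a multiple of listSize, the address does not change, which an implementation would need to guard against.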
Open addressing
Hash and probe function hp: for every key k, the probe sequence
{hp(k, 0), hp(k, 1), ..., hp(k, m − 1)} should be a permutation of
{0, 1, ..., m − 1}, so that every address in the table is eventually probed.
Linked List Resolution
Major disadvantage of open addressing: each collision resolution
increases the probability of future collisions.
→ Use linked lists to store synonyms.
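A minimal C++ sketch of linked list resolution is shown below. It is one possible layout rather than the course's reference design: the name ChainedTable is hypothetical, keys are assumed to be non-negative int values, h(k) = k mod m is an assumed hash function, and each address simply owns a std::forward_list of its synonyms.

```cpp
#include <algorithm>
#include <cassert>
#include <forward_list>
#include <vector>

// Each address heads a singly linked list holding all keys that hash there.
struct ChainedTable {
    std::vector<std::forward_list<int>> heads;

    explicit ChainedTable(int m) : heads(m) { assert(m > 0); }

    int home(int key) const {
        return key % static_cast<int>(heads.size());
    }

    // Synonyms are pushed onto the list at their home address.
    void insert(int key) { heads[home(key)].push_front(key); }

    // Only the list at the home address has to be scanned.
    bool contains(int key) const {
        const auto& chain = heads[home(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }
};
```

With m = 7, the synonyms 10 and 17 both end up in the list at address 3 and leave every other address untouched.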
Bucket hashing
Hashing data to buckets that can hold multiple pieces of data.
Each bucket has an address and collisions are postponed until the
bucket is full.
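Bucket hashing can be sketched in C++ as follows. The name BucketTable is hypothetical, h(k) = k mod m is an assumed hash function, and the sketch stops at reporting a full bucket; how that collision is then resolved is left open, as in the slides.

```cpp
#include <cassert>
#include <vector>

// Each address is a bucket that holds up to bucketSize keys.
struct BucketTable {
    int bucketSize;
    std::vector<std::vector<int>> buckets;

    BucketTable(int m, int size) : bucketSize(size), buckets(m) {
        assert(m > 0 && size > 0);
    }

    // Returns true if the key fits in its home bucket, false if the bucket
    // is full (a collision that still needs a resolution method).
    bool insert(int key) {
        assert(key >= 0);
        auto& bucket = buckets[key % static_cast<int>(buckets.size())];
        if (static_cast<int>(bucket.size()) >= bucketSize) return false;
        bucket.push_back(key);
        return true;
    }
};
```

With 7 buckets of size 2, the keys 3 and 10 both fit in bucket 3 without a collision; only the third synonym, 17, finds the bucket full.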