Data Structure and Algorithms [CO2003]

Chapter 9 - Hash

Lecturer: Duc Dung Nguyen, PhD.


Contact: [email protected]

Faculty of Computer Science and Engineering


Ho Chi Minh City University of Technology
Contents

1. Basic concepts

2. Hash functions

3. Collision resolution

Lecturer: Duc Dung Nguyen, PhD. Contact: [email protected] Data Structure and Algorithms [CO2003] 1 / 44
Outcomes

• L.O.5.1 - Depict the following concepts: hash table, key, collision, and collision resolution.
• L.O.5.2 - Describe hash functions using pseudocode and give examples to show their algorithms.
• L.O.5.3 - Describe collision resolution methods using pseudocode and give examples to show their algorithms.
• L.O.5.4 - Implement hash tables using C/C++.
• L.O.5.5 - Analyze the complexity of the methods supplied for hash tables and develop experiments (programs) to evaluate them.
• L.O.1.2 - Analyze algorithms and use Big-O notation to characterize the computational complexity of algorithms composed using the following control structures: sequence, branching, and iteration (not recursion).

Basic concepts

• Sequential search: $O(n)$
• Binary search: $O(\log_2 n)$

→ Both require several key comparisons before the target is found.

Basic concepts

Search complexity:
Size        Binary   Sequential (Average)   Sequential (Worst Case)
16          4        8                      16
50          6        25                     50
256         8        128                    256
1,000       10       500                    1,000
10,000      14       5,000                  10,000
100,000     17       50,000                 100,000
1,000,000   20       500,000                1,000,000

Basic concepts

Is there a search algorithm whose complexity is O(1)?


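YES: hashing, which computes the address of a key directly, needs only O(1) key comparisons on average.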

Basic concepts

Figure 1: Each key has only one address


Basic concepts

• Home address: address produced by a hash function.


• Prime area: memory that contains all the home addresses.
• Synonyms: a set of keys that hash to the same location.
• Collision: occurs when the home address of the data to be inserted is already occupied by a synonym.
• Ideal hashing:
  • No collisions (every key maps to a distinct location)
  • Compact address space

Hash functions

• Direct hashing
• Modulo division
• Digit extraction
• Mid-square
• Folding
• Rotation
• Pseudo-random

Direct Hashing

The address is the key itself:


hash(Key) = Key

Direct Hashing

• Advantage: there is no collision.


• Disadvantage: the address space (storage size) is as large as the key space.

Modulo division

Address = Key mod listSize

• Fewer collisions if listSize is a prime number.

• Example:
A numbering system that can handle 1,000,000 employees, but data space for at most 300 employees.
Choose listSize = 307, the smallest prime above 300:
hash(121267) = 121267 mod 307 = 2

Digit extraction

Address = selected digits from Key

Example (the 1st, 3rd, and 4th digits are selected):
379452→394
121267→112
378845→388
160252→102
045128→051

Mid-square

Address = middle digits of $Key^2$


Example:
9452 * 9452 = 89340304→3403

Mid-square

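• Disadvantage: $Key^2$ can be very large (it has up to twice as many digits as the key).

• Variation: square only a portion of the key, then take the middle digits.
Example (square the first three digits, then keep the middle three digits of the 6-digit result):
379452: 379 * 379 = 143641 → 364
121267: 121 * 121 = 014641 → 464
045128: 045 * 045 = 002025 → 202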

Folding

The key is divided into parts whose size matches the address size.

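Example:
Key = 123|456|789

• Fold shift: the parts are added as they are.
123 + 456 + 789 = 1368 → 368 (the leading digit is discarded)

• Fold boundary: the left and right parts are reversed before adding.
321 + 456 + 987 = 1764 → 764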

Rotation

• Hashing keys that are identical except for the last character may create synonyms.
• The key is rotated before hashing (here, the last digit moves to the front).

original key   rotated key
600101         160010
600102         260010
600103         360010
600104         460010
600105         560010

Rotation

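• Used in combination with fold shift (2-digit parts here):

original key     rotated key
600101 → 62      160010 → 26
600102 → 63      260010 → 36
600103 → 64      360010 → 46
600104 → 65      460010 → 56
600105 → 66      560010 → 66

→ Rotation spreads the data more evenly across the address space: the original keys map to the consecutive addresses 62 to 66, while the rotated keys map to 26, 36, 46, 56, and 66.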

Pseudo-random

Address = (a * Key + c) mod listSize

For maximum efficiency, a and c should be prime numbers.

Pseudo-random

Example:
Key = 121267, a = 17, c = 7, listSize = 307
Address = (17 * 121267 + 7) mod 307
        = (2061539 + 7) mod 307
        = 2061546 mod 307
        = 41

Collision resolution

• Except for direct hashing, none of the hash functions above is a one-to-one mapping
→ Collision resolution methods are required.

• Any collision resolution method can be used with any of the hash functions.

Collision resolution

• Open addressing
• Linked list resolution
• Bucket hashing

Open addressing

When a collision occurs, the prime area is searched for an unoccupied element in which to place the new data.

Open addressing

Hash function:
$h : U \to \{0, 1, 2, \ldots, m - 1\}$
where $U$ is the set of keys and $\{0, 1, 2, \ldots, m - 1\}$ is the set of addresses.

Open addressing

Hash and probe function:

$hp : U \times \{0, 1, 2, \ldots, m - 1\} \to \{0, 1, 2, \ldots, m - 1\}$

which maps a key in the set of keys $U$ and a probe number to an address.

Open Addressing

Algorithm hashInsert(ref T <array>, val k <key>)
Inserts key k into table T.

i = 0
while i < m do
    j = hp(k, i)
    if T[j] = nil then
        T[j] = k
        return j
    else
        i = i + 1
    end
end
return error: “hash table overflow”
End hashInsert
Open Addressing

Algorithm hashSearch(val T <array>, val k <key>)
Searches for key k in table T.

i = 0
while i < m do
    j = hp(k, i)
    if T[j] = k then
        return j
    else if T[j] = nil then
        return nil
    else
        i = i + 1
    end
end
return nil
End hashSearch
Open Addressing

There are different methods:


• Linear probing
• Quadratic probing
• Double hashing
• Key offset

Linear Probing

• When a home address is occupied, go to the next address (the current address + 1):
hp(k, i) = (h(k) + i) mod m

Linear Probing

• Advantages:
  • quite simple to implement
  • data tend to remain near their home address (significant for disk addresses)

• Disadvantages:
  • produces primary clustering: occupied addresses form long runs, and any key that hashes into a run makes it longer

Quadratic Probing

• The address increment is the collision probe number squared:

$hp(k, i) = (h(k) + i^2) \bmod m$

Quadratic Probing

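• Advantages:
  • avoids primary clustering, so it usually works much better than linear probing

• Disadvantages:
  • time required to square numbers
  • produces secondary clustering: keys with the same home address follow the same probe sequence
    $h(k_1) = h(k_2) \Rightarrow hp(k_1, i) = hp(k_2, i)$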

Double Hashing

• Using two hash functions:

$hp(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$

Key Offset

• The new address is a function of the collision address and the key.

$offset = \lfloor key / listSize \rfloor$
$newAddress = (collisionAddress + offset) \bmod listSize$

$hp(k, i) = (hp(k, i - 1) + \lfloor k / m \rfloor) \bmod m$

Open addressing

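Hash and probe function:

$hp : U \times \{0, 1, 2, \ldots, m - 1\} \to \{0, 1, 2, \ldots, m - 1\}$

which maps a key in the set of keys $U$ and a probe number to an address.

For every key k, $\langle hp(k, 0), hp(k, 1), \ldots, hp(k, m - 1) \rangle$ is a permutation of $\{0, 1, \ldots, m - 1\}$, so every address is eventually probed and hashInsert reports overflow only when the table is full.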

Linked List Resolution

• Major disadvantage of open addressing: each collision resolution increases the probability of future collisions.
→ Use linked lists to store synonyms.

Bucket hashing

• Hashing data to buckets that can hold multiple pieces of data.


• Each bucket has an address and collisions are postponed until the bucket is full.
