
Data Structure and Algorithms [CO2003]

Chapter 9 - Hash

Lecturer: Vuong Ba Thinh


Contact: [email protected]

Faculty of Computer Science and Engineering


Ho Chi Minh City University of Technology
Contents

1. Basic concepts

2. Hash functions

3. Collision resolution

Lecturer: Vuong Ba Thinh Contact: [email protected] Data Structure and Algorithms [CO2003] 1 / 44
Outcomes

• L.O.5.1 - Depict the following concepts: hash table, key, collision, and collision resolution.

• L.O.5.2 - Describe hash functions using pseudocode and give examples to show how they work.

• L.O.5.3 - Describe collision resolution methods using pseudocode and give examples to show how they work.

• L.O.5.4 - Implement hash tables using C/C++.

• L.O.5.5 - Analyze the complexity of the methods supplied for hash tables and develop experiments (programs) to evaluate them.

• L.O.1.2 - Analyze algorithms and use Big-O notation to characterize the computational complexity of algorithms composed of the following control structures: sequence, branching, and iteration (not recursion).

Basic concepts

• Sequential search: O(n)

• Binary search: O(log2 n)

→ Both require several key comparisons before the target is found.


Search complexity (number of key comparisons):

Size         Binary   Sequential (average)   Sequential (worst case)
16           4        8                      16
50           6        25                     50
256          8        128                    256
1,000        10       500                    1,000
10,000       14       5,000                  10,000
100,000      17       50,000                 100,000
1,000,000    20       500,000                1,000,000
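Each row of the table can be reproduced with simple formulas. The following minimal C++ sketch assumes that binary search needs ⌈log2 n⌉ comparisons (the smallest k with 2^k ≥ n) and that sequential search needs n/2 comparisons on average and n in the worst case. The function names are illustrative.

```cpp
// Comparison counts behind the search-complexity table (illustrative names).

// Binary search: smallest k such that 2^k >= n, i.e. ceil(log2 n) for n >= 1.
int binaryComparisons(long long n) {
    int k = 0;
    long long reach = 1;
    while (reach < n) {
        reach *= 2;
        ++k;
    }
    return k;
}

// Sequential search, average case: half of the list is examined.
long long sequentialAverage(long long n) { return n / 2; }

// Sequential search, worst case: every element is examined.
long long sequentialWorst(long long n) { return n; }
```

The integer loop avoids floating-point `log2`, so exact powers of two such as 16 and 256 give exactly 4 and 8.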

Is there a search algorithm whose complexity is O(1)?

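Yes: hashing, which computes the target's address directly from its key, can find it in O(1) time on average.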

Figure 1: Each key has only one address

• Home address: the address produced by a hash function.

• Prime area: the memory that contains all the home addresses.

• Synonyms: a set of keys that hash to the same location.

• Collision: occurs when the home address of the data to be inserted is already occupied by a synonym.

• Ideal hashing:

  • No location collisions
  • Compact address space

Hash functions

• Direct hashing

• Modulo division

• Digit extraction

• Mid-square

• Folding

• Rotation

• Pseudo-random

Direct Hashing

The address is the key itself:

hash(Key) = Key


• Advantage: there are no collisions.

• Disadvantage: the address space (storage size) is as large as the key space.
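Here is a minimal C++ sketch of a direct-hashing table. Integer keys in [0, keySpace) and string data are hypothetical choices. The vector must hold one slot for every possible key, and that is exactly the disadvantage listed above.

```cpp
#include <optional>
#include <string>
#include <vector>

// Direct hashing sketch: one slot per possible key, so hash(Key) = Key.
// Keys are assumed to be integers in [0, keySpace).
struct DirectTable {
    std::vector<std::optional<std::string>> slots;

    explicit DirectTable(int keySpace) : slots(keySpace) {}

    // The address is the key itself.
    static int hash(int key) { return key; }

    // No collision is possible: each key owns its own slot.
    void insert(int key, const std::string& value) { slots[hash(key)] = value; }

    // Empty optional means no data stored for this key.
    std::optional<std::string> search(int key) const { return slots[hash(key)]; }
};
```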

Modulo division

Address = Key mod listSize

• Collisions tend to be fewer when listSize is a prime number.

• Example: a numbering system that can handle 1,000,000 employees, with data space for up to 300 employees (listSize = 307, a prime):

hash(121267) = 121267 mod 307 = 2
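The example can be checked with a one-line C++ function. The name `moduloHash` is illustrative, and keys are assumed to be non-negative.

```cpp
// Modulo division: Address = Key mod listSize (keys assumed non-negative).
int moduloHash(long long key, int listSize) {
    return static_cast<int>(key % listSize);
}
```

Adding listSize to a key always produces a synonym. For example, 121267 and 121574 (= 121267 + 307) both hash to address 2.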

Digit extraction

Address = selected digits from Key

Example (the first, third, and fourth digits are selected):

379452 → 394
121267 → 112
378845 → 388
160252 → 102
045128 → 051
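Here is a C++ sketch that takes the selected positions as a parameter. That the examples use positions 1, 3, and 4 is inferred from the examples, not stated on the slide, and the function name is hypothetical. The key is handled as a digit string so that leading zeros, as in 045128, are kept.

```cpp
#include <string>
#include <vector>

// Digit extraction: build the address from the digits at the given
// 1-based positions of a fixed-width digit-string key.
std::string extractDigits(const std::string& key, const std::vector<int>& positions) {
    std::string address;
    for (int p : positions) {
        address += key[p - 1];  // convert 1-based position to 0-based index
    }
    return address;
}
```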

Mid-square

Address = middle digits of Key^2

Example:

9452 * 9452 = 89340304 → 3403


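• Disadvantage: Key^2 can be too large (it can have up to twice as many digits as the key).

• Variation: square only a portion of the key.

Example (the first three digits are squared, and three digits are taken from the middle of the six-digit result):

379452: 379 * 379 = 143641 → 364
121267: 121 * 121 = 014641 → 464
045128: 045 * 045 = 002025 → 202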
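Both mid-square calculations can be reproduced with integer arithmetic. In this C++ sketch (function names are hypothetical), the square is treated as a number with a fixed digit count that includes leading zeros. When the middle digits cannot be centred exactly, the extra digit is dropped on the left, which matches the slide's three-digit examples.

```cpp
// 10^e for small non-negative e.
long long powerOf10(int e) {
    long long result = 1;
    for (int n = 0; n < e; ++n) result *= 10;
    return result;
}

// Take `width` digits from the middle of `value`, viewed as a number with
// `totalDigits` digits (leading zeros included). When the middle cannot be
// centred exactly, the extra digit is dropped on the left.
long long middleDigits(long long value, int totalDigits, int width) {
    const int droppedRight = (totalDigits - width) / 2;
    return (value / powerOf10(droppedRight)) % powerOf10(width);
}

// Mid-square: square the whole key and keep `width` middle digits.
long long midSquare(long long key, int keyDigits, int width) {
    return middleDigits(key * key, 2 * keyDigits, width);
}

// Variation: square only the first `prefixDigits` digits of the key.
long long midSquarePrefix(long long key, int keyDigits, int prefixDigits, int width) {
    const long long part = key / powerOf10(keyDigits - prefixDigits);
    return middleDigits(part * part, 2 * prefixDigits, width);
}
```

For example, `midSquare(9452, 4, 4)` gives 3403 and `midSquarePrefix(379452, 6, 3, 3)` gives 364. Pass the key 045128 as 45128, because a leading 0 makes a C++ integer literal octal.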

Folding

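The key is divided into parts whose size matches the address size.

Example:

Key = 123|456|789

Fold shift (the parts are added as they are):
123 + 456 + 789 = 1368 → 368

Fold boundary (the leftmost and rightmost parts are reversed before adding):
321 + 456 + 987 = 1764 → 764

In both cases the carry digit beyond the three-digit address size is discarded.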
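Here is a C++ sketch of fold shift and fold boundary. It assumes the key arrives as a digit string whose length is a multiple of the part size and, as in 1368 → 368, that any carry beyond the address size is discarded. The function names are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Split a digit string into consecutive parts of partSize digits.
std::vector<std::string> splitKey(const std::string& key, std::size_t partSize) {
    std::vector<std::string> parts;
    for (std::size_t i = 0; i < key.size(); i += partSize) {
        parts.push_back(key.substr(i, partSize));
    }
    return parts;
}

// 10^e, used to keep only the last partSize digits of the sum.
int powerOf10(std::size_t e) {
    int result = 1;
    for (std::size_t n = 0; n < e; ++n) result *= 10;
    return result;
}

// Add the parts and drop any carry beyond partSize digits.
int sumParts(const std::vector<std::string>& parts, std::size_t partSize) {
    int sum = 0;
    for (const std::string& p : parts) sum += std::stoi(p);
    return sum % powerOf10(partSize);
}

// Fold shift: the parts are added as they are.
int foldShift(const std::string& key, std::size_t partSize) {
    return sumParts(splitKey(key, partSize), partSize);
}

// Fold boundary: the leftmost and rightmost parts are reversed before adding
// (assumes the key has at least two parts).
int foldBoundary(const std::string& key, std::size_t partSize) {
    std::vector<std::string> parts = splitKey(key, partSize);
    std::reverse(parts.front().begin(), parts.front().end());
    std::reverse(parts.back().begin(), parts.back().end());
    return sumParts(parts, partSize);
}
```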

Rotation

• Hashing keys that are identical except for the last character may create synonyms.

• The key is rotated before hashing (here, the last digit is moved to the front):

Original key    Rotated key
600101          160010
600102          260010
600103          360010
600104          460010
600105          560010


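• Used in combination with fold shift (two-digit parts, two-digit address):

Original key → address    Rotated key → address
600101 → 62               160010 → 26
600102 → 63               260010 → 36
600103 → 64               360010 → 46
600104 → 65               460010 → 56
600105 → 66               560010 → 66

Without rotation the addresses are consecutive (62 to 66). With rotation they are spread more evenly across the address space (26 to 66).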
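The rotation table can be reproduced in two steps: rotate the last digit to the front, then apply fold shift with two-digit parts. The two-digit parts and address size are inferred from the example addresses. This is a C++ sketch with hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Rotation: move the last digit of the key to the front ("600101" -> "160010").
std::string rotateKey(const std::string& key) {
    return key.back() + key.substr(0, key.size() - 1);
}

// Fold shift with two-digit parts and a two-digit address (carry dropped).
int foldShift2(const std::string& key) {
    int sum = 0;
    for (std::size_t i = 0; i < key.size(); i += 2) {
        sum += std::stoi(key.substr(i, 2));
    }
    return sum % 100;
}
```

For example, `foldShift2("600101")` gives 62 and `foldShift2(rotateKey("600101"))` gives 26, matching the first row.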

Pseudo-random

Address = (a * Key + c) mod listSize

For maximum efficiency, a and c should be prime numbers.


Example:

Key = 121267, a = 17, c = 7, listSize = 307

Address = (17 * 121267 + 7) mod 307
        = (2061539 + 7) mod 307
        = 2061546 mod 307
        = 41
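The formula translates directly into C++ (the function name is illustrative). The sketch uses 64-bit arithmetic so that a * Key does not overflow for keys of this size.

```cpp
// Pseudo-random hashing: Address = (a * Key + c) mod listSize.
// 64-bit arithmetic keeps a * Key from overflowing for keys of this size.
long long pseudoRandomHash(long long key, long long a, long long c, long long listSize) {
    return (a * key + c) % listSize;
}
```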

Collision resolution

• Except for direct hashing, none of these hash functions is a one-to-one mapping.

→ Collision resolution methods are required.

• Each collision resolution method can be combined with any of the hash functions.


• Open addressing

• Linked list resolution

• Bucket hashing

Open addressing

When a collision occurs, the table is searched for an unoccupied element in which to place the new element.


Hash function:

h : U → {0, 1, 2, ..., m − 1}

where U is the set of keys and {0, 1, 2, ..., m − 1} is the set of addresses.


Hash and probe function:

hp : U × {0, 1, 2, ..., m − 1} → {0, 1, 2, ..., m − 1}

where U is the set of keys, the second factor {0, 1, 2, ..., m − 1} is the set of probe numbers, and the result is an address.


Algorithm hashInsert(ref T <array>, val k <key>)
Inserts key k into table T.
    i = 0
    while i < m do
        j = hp(k, i)
        if T[j] = nil then
            T[j] = k
            return j
        else
            i = i + 1
        end
    end
    return error: hash table overflow
End hashInsert


Algorithm hashSearch(val T <array>, val k <key>)
Searches for key k in table T.
    i = 0
    while i < m do
        j = hp(k, i)
        if T[j] = k then
            return j
        else if T[j] = nil then
            return nil
        else
            i = i + 1
        end
    end
    return nil
End hashSearch
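The two algorithms above can be sketched in C++ as a small class. The choices here are illustrative, not from the slides: non-negative int keys, std::nullopt standing in for nil, -1 returned for overflow or an absent key, and the probe function hp supplied by the caller.

```cpp
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Open-addressing table following hashInsert / hashSearch above.
// Assumptions: non-negative int keys, std::nullopt plays the role of nil,
// and the caller supplies the probe function hp(k, i, m).
class OpenAddressTable {
public:
    using Probe = std::function<int(int, int, int)>;

    OpenAddressTable(int m, Probe hp) : slots_(m), hp_(std::move(hp)) {}

    // hashInsert: returns the slot used, or -1 on hash table overflow.
    int insert(int k) {
        const int m = static_cast<int>(slots_.size());
        for (int i = 0; i < m; ++i) {
            const int j = hp_(k, i, m);
            if (!slots_[j]) {  // T[j] = nil
                slots_[j] = k;
                return j;
            }
        }
        return -1;  // all m probes hit occupied slots
    }

    // hashSearch: returns the slot holding k, or -1 (nil) if k is absent.
    int search(int k) const {
        const int m = static_cast<int>(slots_.size());
        for (int i = 0; i < m; ++i) {
            const int j = hp_(k, i, m);
            if (slots_[j] == k) return j;
            if (!slots_[j]) return -1;  // an empty slot ends the search
        }
        return -1;
    }

private:
    std::vector<std::optional<int>> slots_;
    Probe hp_;
};
```

Suppose hp is linear probing with home address k mod 7 in a 7-slot table. Keys 10 and 17 are then synonyms with home address 3, so the second insert lands in slot 4. A search for 24 probes slots 3 and 4, then stops at the empty slot 5.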


There are different probing methods:

• Linear probing

• Quadratic probing

• Double hashing

• Key offset

Linear Probing

• When the home address is occupied, go to the next address (the current address + 1):

hp(k, i) = (h(k) + i) mod m
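Here is linear probing as a C++ probe function, assuming the home address is h(k) = k mod m (an illustrative choice). Listing hp(k, 0), ..., hp(k, m − 1) shows that the probe walks through consecutive slots and wraps around at the end of the table.

```cpp
#include <vector>

// Home address, assumed to be h(k) = k mod m.
int homeAddress(int k, int m) { return k % m; }

// Linear probing: hp(k, i) = (h(k) + i) mod m.
int linearProbe(int k, int i, int m) { return (homeAddress(k, m) + i) % m; }

// Full probe sequence hp(k, 0), ..., hp(k, m - 1) for one key.
std::vector<int> probeSequence(int k, int m) {
    std::vector<int> sequence;
    for (int i = 0; i < m; ++i) {
        sequence.push_back(linearProbe(k, i, m));
    }
    return sequence;
}
```

For key 12 in a 7-slot table, the sequence is 5, 6, 0, 1, 2, 3, 4.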


• Advantages:

  • quite simple to implement
  • data tend to remain near their home address (significant for disk addresses)

• Disadvantages:

  • produces primary clustering: occupied slots form long runs, and every key that hashes into a run makes it longer

Quadratic Probing

• The address increment is the collision probe number squared:

hp(k, i) = (h(k) + i^2) mod m
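Here is quadratic probing in the same C++ style, again assuming h(k) = k mod m. In a 7-slot table, keys 5 and 12 share home address 5. They therefore follow the identical probe sequence 5, 6, 2, 0, ..., which is the secondary clustering listed among the disadvantages.

```cpp
// Home address, assumed to be h(k) = k mod m.
int homeAddress(int k, int m) { return k % m; }

// Quadratic probing: hp(k, i) = (h(k) + i^2) mod m.
// i * i is computed in 64 bits to avoid overflow for large tables.
int quadraticProbe(int k, int i, int m) {
    const long long step = static_cast<long long>(i) * i;
    return static_cast<int>((homeAddress(k, m) + step) % m);
}
```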


• Advantages:

  • works much better than linear probing, because it avoids primary clustering

• Disadvantages:

  • time required to square numbers
  • produces secondary clustering: keys with the same home address follow the same probe sequence

    h(k1) = h(k2) → hp(k1, i) = hp(k2, i)

Double Hashing

• Uses two hash functions:

hp(k, i) = (h1(k) + i * h2(k)) mod m
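Here is a C++ sketch of double hashing. The slide does not fix h1 and h2, so this sketch assumes the common textbook pair h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)), which keeps h2 from being 0. With m = 7, keys 5 and 12 both have home address 5. Unlike with quadratic probing, their sequences split after the first probe (slot 4 and slot 6 respectively).

```cpp
// Double hashing: hp(k, i) = (h1(k) + i * h2(k)) mod m.
// Assumed choices: h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)).
// h2(k) lies in 1..m-1, so with m prime it is coprime to m and the
// probe sequence visits every slot.
int h1(int k, int m) { return k % m; }

int h2(int k, int m) { return 1 + k % (m - 1); }

int doubleHashProbe(int k, int i, int m) {
    const long long address = h1(k, m) + static_cast<long long>(i) * h2(k, m);
    return static_cast<int>(address % m);
}
```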

Key Offset

• The new address is a function of the collision address and the key:

offset = ⌊key / listSize⌋
newAddress = (collisionAddress + offset) mod listSize

hp(k, i) = (hp(k, i − 1) + ⌊k / m⌋) mod m, with hp(k, 0) = h(k)
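Here is key offset as a C++ sketch, assuming h(k) = k mod m as the home address (the function name is hypothetical). For key 121267 and m = 307, the offset is ⌊121267 / 307⌋ = 395. The probes are therefore 2, then (2 + 395) mod 307 = 90, then (90 + 395) mod 307 = 178.

```cpp
// Key offset: offset = floor(k / m) and hp(k, i) = (hp(k, i - 1) + offset) mod m,
// starting from hp(k, 0) = h(k), here assumed to be k mod m.
// Keys are assumed non-negative, so integer division is the floor.
int keyOffsetProbe(long long k, int i, int m) {
    const long long offset = k / m;
    long long address = k % m;  // hp(k, 0): the home address
    for (int step = 1; step <= i; ++step) {
        address = (address + offset) % m;  // move on from the collision address
    }
    return static_cast<int>(address);
}
```

One caveat: if the offset is a multiple of m, every probe returns the home address. For example, k = 94249 = 307 × 307 with m = 307 has offset 307, so its probe sequence is not a permutation of the table addresses.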

Open addressing

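Hash and probe function:

hp : U × {0, 1, 2, ..., m − 1} → {0, 1, 2, ..., m − 1}

where U is the set of keys, the second factor is the set of probe numbers, and the result is an address.

For every key k, the probe sequence {hp(k, 0), hp(k, 1), ..., hp(k, m − 1)} should be a permutation of {0, 1, ..., m − 1}, so that every slot is eventually probed.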
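Whether a probe function has this permutation property can be checked mechanically for a given key. In this C++ sketch (hypothetical names), linear probing passes. Quadratic probing with m = 7 and key 12 fails: it revisits slot 0 at i = 4 and never reaches slots 1, 3, and 4.

```cpp
#include <functional>
#include <vector>

// True if hp(k, 0), ..., hp(k, m - 1) visits every address 0..m-1 exactly once.
bool isPermutation(const std::function<int(int, int, int)>& hp, int k, int m) {
    std::vector<bool> seen(m, false);
    for (int i = 0; i < m; ++i) {
        const int j = hp(k, i, m);
        if (j < 0 || j >= m || seen[j]) return false;  // out of range or repeated
        seen[j] = true;
    }
    return true;  // m distinct in-range addresses: a permutation
}
```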

Linked List Resolution

• Major disadvantage of open addressing: each collision resolution increases the probability of future collisions.

→ Use linked lists to store synonyms.
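Linked list resolution can be sketched in C++ with one std::list per home address. The assumptions are illustrative: integer keys, h(k) = k mod m, and new synonyms appended at the end of the list. With m = 7, the synonyms 5, 12, and 19 all go into the list at address 5 instead of spilling into neighbouring slots.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Linked list resolution (separate chaining). Assumptions: int keys,
// home address h(k) = k mod m, and new synonyms appended to the list.
class ChainedTable {
public:
    explicit ChainedTable(int m) : lists_(m) {}

    // A collision only lengthens one list; other addresses are unaffected.
    void insert(int k) { lists_[home(k)].push_back(k); }

    // Only the synonyms at k's home address are examined.
    bool search(int k) const {
        for (int x : lists_[home(k)]) {
            if (x == k) return true;
        }
        return false;
    }

    // Number of synonyms stored at one home address.
    std::size_t listLength(int address) const { return lists_[address].size(); }

private:
    int home(int k) const { return k % static_cast<int>(lists_.size()); }

    std::vector<std::list<int>> lists_;
};
```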

Bucket hashing

• Data are hashed to buckets that can each hold multiple pieces of data.

• Each bucket has an address, and collisions are postponed until the bucket is full.
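A bucket table can be sketched in C++ as a vector of small fixed-capacity buckets. Three things here are assumptions for illustration: the capacity, h(k) = k mod numBuckets, and the choice to simply report a full bucket rather than resolve the overflow. With 7 buckets of capacity 3, the synonyms 5, 12, and 19 all fit in bucket 5. Only the fourth synonym, 26, causes a collision.

```cpp
#include <cstddef>
#include <vector>

// Bucket hashing sketch. Assumptions: int keys, home bucket
// h(k) = k mod numBuckets, and a fixed capacity per bucket.
class BucketTable {
public:
    BucketTable(int numBuckets, std::size_t capacity)
        : buckets_(numBuckets), capacity_(capacity) {}

    // Returns false only when the home bucket is already full, i.e. when a
    // collision actually occurs; resolving it is left to another method.
    bool insert(int k) {
        std::vector<int>& bucket = buckets_[home(k)];
        if (bucket.size() >= capacity_) return false;
        bucket.push_back(k);
        return true;
    }

    // Search scans only the home bucket.
    bool search(int k) const {
        for (int x : buckets_[home(k)]) {
            if (x == k) return true;
        }
        return false;
    }

private:
    int home(int k) const { return k % static_cast<int>(buckets_.size()); }

    std::vector<std::vector<int>> buckets_;
    std::size_t capacity_;
};
```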
