Algorithmics
CT065-3-3
Hashing
Level 3 Computing (Software Engineering)
Prepared by: Tan Choon Ling First Prepared on: 09-10-06 Last Modified on: 09-10-06
Quality checked by:
Copyright 2006 Asia Pacific Institute of Information Technology
Topic & Structure of Lesson
Hashing
Hashing Strategies
Truncation
Folding
Modular Arithmetic
Collisions
Open Addressing using Linear Probing
Chaining
Module Code and Module Title Title of Slides Slide 2 (of 22)
Learning Outcomes
By the end of this lesson you should
be able to:
Differentiate between the various hashing
strategies
Identify techniques to resolve collisions
Searching
Consider the problem of searching an array for a
given value
If the array is not sorted, the search requires O(n)
time
If the value isn't there, we need to search all n elements
If the value is there, we search n/2 elements on average
If the array is sorted, we can do a binary search
A binary search requires O(log n) time
About equally fast whether the element is found or not
It doesn't seem like we could do much better
How about an O(1), that is, constant time search?
We can do it if the array is organized in a particular way
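The two searches above can be sketched as follows. This is only an illustration: the function names and the sample values are assumptions, and the binary search relies on Python's standard bisect module rather than a hand-written loop.

```python
from bisect import bisect_left


def linear_search(values, target):
    """Scan each element in turn: O(n) comparisons in the worst case."""
    for index, value in enumerate(values):
        if value == target:
            return index
    return -1  # not found: every element was examined


def binary_search(sorted_values, target):
    """Halve the search range at each step: O(log n), sorted input only."""
    index = bisect_left(sorted_values, target)
    if index < len(sorted_values) and sorted_values[index] == target:
        return index
    return -1


unsorted = [51, 17, 38, 20, 24]
ordered = sorted(unsorted)  # [17, 20, 24, 38, 51]
print(linear_search(unsorted, 38))  # 2
print(binary_search(ordered, 38))   # 3
```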
Hashing
Suppose we were to come up with a
magic function that, given a value to
search for, would tell us exactly where in
the array to look
If it's in that location, it's in the array
If it's not in that location, it's not in the
array
This function would have no other purpose
This function is called a hash function
Hashing
A hash table employs a function, h, that
maps key values to table index values
e.g. Student records for a class could be
stored in an array C of size 10000 by
truncating the student's ID number to its
last four digits:
h(IDNum) = IDNum % 10000
Given an ID number k, the corresponding record
would be found at C[h(k)]
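A minimal sketch of the student-record example, assuming 8-digit ID numbers and made-up records. The helper names store and lookup are hypothetical. Each slot keeps the ID next to the record so that a lookup can tell whether the slot really holds the requested student. Collisions are not handled here.

```python
TABLE_SIZE = 10000


def h(id_num):
    """Hash function from the example: keep the last four digits."""
    return id_num % TABLE_SIZE


# The array C from the example; None marks an empty slot.
C = [None] * TABLE_SIZE


def store(id_num, record):
    """Place the record at index h(id_num), overwriting whatever is there."""
    C[h(id_num)] = (id_num, record)


def lookup(id_num):
    """Look in exactly one slot; return the record or None."""
    entry = C[h(id_num)]
    if entry is not None and entry[0] == id_num:
        return entry[1]
    return None


store(20061234, "Student A")   # hypothetical ID and record
print(h(20061234))             # 1234
print(lookup(20061234))        # Student A
print(lookup(19991234))        # None (same slot, different ID)
```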
Hashing
Terminology
h is a hash function
k hashes to slot h(k)
the hash value of k is h(k)
Hashing Properties
a good hash function should:
Be easy and quick to compute
Achieve an even distribution of the key values that
actually occur across the index range supported by
the table
a hash function will take a key value and:
chop it up into pieces, and
mix the pieces together in some fashion, and
compute an index that will be uniformly distributed
across the available range
Hashing Properties
The hash function h(x) = x mod 8 gives
x      17  20  24  38  51
h(x)    1   4   0   6   3
The hash function h(x) = (x div 2) mod 8 gives
x      17  20  24  38  51
h(x)    0   2   4   3   1
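The two tables can be reproduced with a few lines of code. The names h1 and h2 are labels chosen here for the two functions; "div" is taken to mean integer division.

```python
def h1(x):
    """h(x) = x mod 8"""
    return x % 8


def h2(x):
    """h(x) = (x div 2) mod 8, with div as integer division"""
    return (x // 2) % 8


keys = [17, 20, 24, 38, 51]
print([h1(x) for x in keys])  # [1, 4, 0, 6, 3]
print([h2(x) for x in keys])  # [0, 2, 4, 3, 1]
```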
Hashing Strategies
Truncation
Ignore part of the key and use the remaining
part directly as the index
e.g. if the keys are 8-digit numbers and the
hash table has 1000 entries, then the first,
fourth and eighth digits could form the hash
value
21296876 maps to 296
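A short sketch of this truncation rule, assuming keys are non-negative integers of at most eight digits (shorter keys are padded with leading zeros, which is a choice made here). The function name is hypothetical.

```python
def truncation_hash(key):
    """Keep the 1st, 4th and 8th digits of an 8-digit key as the index."""
    digits = f"{key:08d}"  # pad to eight digits (assumption for short keys)
    return int(digits[0] + digits[3] + digits[7])


print(truncation_hash(21296876))  # 296
```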
Hashing Strategies
Folding
Break up the key in parts and combine them
in some way
e.g. if the keys are 8-digit numbers and the
hash table has 1000 entries, break up a key
into three, three and two digits, add them up
and, if necessary, truncate the sum
21296876 maps to 212 + 968 + 76 = 1256,
which is then reduced mod 1000 to 256
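The folding example can be sketched like this, assuming 8-digit integer keys and a table of 1000 entries as in the slide. The function name and the default table_size parameter are illustrative.

```python
def folding_hash(key, table_size=1000):
    """Split an 8-digit key into 3 + 3 + 2 digits, add, then reduce."""
    digits = f"{key:08d}"
    parts = [int(digits[0:3]), int(digits[3:6]), int(digits[6:8])]
    return sum(parts) % table_size  # the sum can exceed the table size


print(folding_hash(21296876))  # 212 + 968 + 76 = 1256, reduced to 256
```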
Hashing Strategies
Modular Arithmetic
Convert the key to an integer, and then take
that integer modulo the size of the table
e.g. with a table of 1000 entries, 21296876 maps to 876
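A sketch of modular hashing with a table of 1000 entries. The slide does not say how a non-numeric key becomes an integer; reading a string's UTF-8 bytes as one large integer is only one possible conversion, assumed here for illustration.

```python
def modular_hash(key, table_size=1000):
    """Convert the key to an integer, then take it mod the table size."""
    if isinstance(key, str):
        # Assumed conversion: treat the UTF-8 bytes as a big-endian integer.
        key = int.from_bytes(key.encode("utf-8"), "big")
    return key % table_size


print(modular_hash(21296876))  # 876
print(modular_hash("AB"))      # 65 * 256 + 66 = 16706, so 706
```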
Collisions
When two values hash to the same array
location, this is called a collision
Collisions are normally handled on a first-come,
first-served basis: the first value
that hashed to the location gets it
Handling Collisions
Open addressing
Idea:
Store all elements in the hash table itself.
If a collision occurs, find another slot. (How?)
When searching for an element examine slots until
the element is found or it is clear that it is not in the
table.
The sequence of slots to be examined (probed) is
computed in a systematic way.
It is possible to fill up the table so that you can't insert
any more elements.
idea: extendible hash tables?
Handling Collisions
Open addressing
Probing must be done in a systematic way
(why?)
There are several ways to determine a
probe sequence:
linear probing
quadratic probing
double hashing
random probing
Handling Collisions
Linear Probing: start with the original
hash index, say K, and search the table
sequentially
[Figure: a record with fields Name, Address, ID, Major, Level, ... is hashed by H( ) to table index K]
Table Index   Status
0             Vacant
1             Filled
2             Filled
..
K             Filled
K+1           Filled
K+2           Vacant
..
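A minimal sketch of open addressing with linear probing. It assumes integer keys, key mod table size as the hash function, and a probe sequence that wraps around to index 0 at the end of the table. The class name and methods are hypothetical, and deletion is left out.

```python
class LinearProbingTable:
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size  # None marks a vacant slot

    def _probe(self, key):
        """Yield K, K+1, K+2, ... (wrapping around), visiting each slot once."""
        start = key % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key):
        """Store the key in the first vacant slot of its probe sequence."""
        for index in self._probe(key):
            if self.slots[index] is None or self.slots[index] == key:
                self.slots[index] = key
                return index
        raise OverflowError("hash table is full")

    def contains(self, key):
        """Probe until the key or a vacant slot is found."""
        for index in self._probe(key):
            if self.slots[index] is None:
                return False  # a vacant slot ends the search
            if self.slots[index] == key:
                return True
        return False  # every slot examined


table = LinearProbingTable(8)
print(table.insert(17))  # 1
print(table.insert(25))  # 2 (slot 1 is taken)
print(table.insert(33))  # 3 (slots 1 and 2 are taken)
```

All three keys hash to index 1, so the second and third are pushed along to the next vacant slots, as in the figure.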
Handling Collisions
Problem with Linear Probing
Clustering
The probabilities that each slot will be hit are
no longer uniform.
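The clustering effect can be made concrete with a small calculation. Assuming the next key is equally likely to hash to any home index, the sketch below works out where linear probing would actually place it. The function name and the sample table are illustrative.

```python
from fractions import Fraction


def next_fill_probabilities(occupied):
    """For each vacant slot, the chance that the next inserted key lands there.

    occupied is a list of booleans, one per slot. Every home index is
    assumed to be equally likely.
    """
    size = len(occupied)
    counts = {i: 0 for i in range(size) if not occupied[i]}
    if not counts:
        return {}  # table is full: nothing can be inserted
    for home in range(size):
        index = home
        while occupied[index]:  # linear probing past filled slots
            index = (index + 1) % size
        counts[index] += 1
    return {i: Fraction(c, size) for i, c in counts.items()}


# Slots 1, 2 and 3 form a cluster in a table of 8 slots.
occupied = [False, True, True, True, False, False, False, False]
probabilities = next_fill_probabilities(occupied)
print(probabilities[4])  # 1/2
print(probabilities[5])  # 1/8
```

The vacant slot just after the cluster receives every key that hashes into the cluster, so the cluster tends to grow.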
Handling Collisions
Quadratic Probing: attempt to scatter
the effect of collisions across the table
in a more distributed way
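The slide gives no formula for quadratic probing. A commonly used form, assumed here, examines slot (K + i*i) mod m on the i-th attempt. The sketch compares it with the linear sequence; the function names and the table size of 17 are illustrative.

```python
def linear_probe_sequence(home, table_size, attempts):
    """Slots K, K+1, K+2, ... modulo the table size."""
    return [(home + i) % table_size for i in range(attempts)]


def quadratic_probe_sequence(home, table_size, attempts):
    """Slots K, K+1, K+4, K+9, ... modulo the table size (assumed form)."""
    return [(home + i * i) % table_size for i in range(attempts)]


print(linear_probe_sequence(3, 17, 5))     # [3, 4, 5, 6, 7]
print(quadratic_probe_sequence(3, 17, 5))  # [3, 4, 7, 12, 2]
```

The quadratic sequence jumps away from the home slot more quickly, but depending on the table size it may not visit every slot.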
Resolving Collisions
Chaining / closed addressing
Idea : put all elements that hash to the
same slot in a linked list (chain). The slot
contains a pointer to the head of the list.
The load factor indicates the average
number of elements stored in a chain. It
could be less than, equal to, or larger than
1.
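A small sketch of how chains form and how the load factor is computed. Plain Python lists stand in for the linked lists, and key mod table size is assumed as the hash function. The function names and the sample keys are illustrative.

```python
def build_chains(keys, table_size):
    """Put every key into the chain of the slot it hashes to."""
    chains = [[] for _ in range(table_size)]
    for key in keys:
        chains[key % table_size].append(key)
    return chains


def load_factor(chains):
    """Average number of elements per chain: n divided by table size."""
    return sum(len(chain) for chain in chains) / len(chains)


keys = [17, 20, 24, 38, 51, 25, 33]
print(build_chains(keys, 8)[1])            # [17, 25, 33]
print(load_factor(build_chains(keys, 8)))  # 0.875
print(load_factor(build_chains(keys, 4)))  # 1.75
```

With 7 keys, a table of 8 slots gives a load factor below 1 and a table of 4 slots gives one above 1.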
Resolving Collisions
Chaining
Insert: O(1) worst case
Delete: O(1) worst case, assuming a doubly-linked list
(it's O(1) after the element has been found)
Search: depends on the length of the chain
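The costs above can be seen in a sketch of a chained table built on doubly-linked nodes. Integer keys and key mod table size are assumed. The class and method names are hypothetical, and insert does not check for duplicates.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class ChainedHashTable:
    def __init__(self, size):
        self.size = size
        self.heads = [None] * size  # each slot points to the head of a chain

    def insert(self, key):
        """O(1): link a new node at the head of the chain."""
        slot = key % self.size
        node = Node(key)
        node.next = self.heads[slot]
        if node.next is not None:
            node.next.prev = node
        self.heads[slot] = node
        return node

    def search(self, key):
        """Walk the chain: cost grows with the chain length."""
        node = self.heads[key % self.size]
        while node is not None and node.key != key:
            node = node.next
        return node  # None if the key is absent

    def delete(self, node):
        """O(1) once the node is known: unlink it from its neighbours."""
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.heads[node.key % self.size] = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None


table = ChainedHashTable(8)
for key in (17, 25, 33):        # all three hash to slot 1
    table.insert(key)
table.delete(table.search(25))  # find the node first, then unlink it
print(table.search(25))         # None
print(table.search(17).key)     # 17
```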
Summary
Hashing
Hashing Strategies
Truncation
Folding
Modular Arithmetic
Collisions
Open Addressing using Linear Probing
Chaining
Next Lesson
String/Pattern Matching Algorithms
Brute Force / Naive Search
Knuth-Morris-Pratt
Rabin-Karp