Data Structures and Algorithms 2
Prof. Ahmed Guessoum
The National Higher School of AI
Chapter 5
Hashing
Motivating Example
We want to store a list whose elements
are integers between 1 and 5
We will define an array of size 5, and if
the list has element j, then j is stored in
A[j-1], otherwise A[j-1] contains 0.
The complexity of the find operation is O(1)
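A minimal C++ sketch of this direct-addressing idea (class and member names are illustrative, not part of the course code):

#include <array>

// Direct-address table for integers in the range 1..5:
// element j is stored in A[j-1]; 0 marks an empty slot.
class DirectTable
{
  public:
    void insert( int j )     { A[ j - 1 ] = j; }
    void remove( int j )     { A[ j - 1 ] = 0; }
    bool find( int j ) const { return A[ j - 1 ] != 0; }   // O(1)

  private:
    std::array<int,5> A{ };   // value-initialized: all slots start at 0
};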
The space used for storage is called a "hash table".
The ideal hash table data structure is simply an array of some
fixed size, TableSize, containing the data items.
A search for an item is performed on some part (i.e.
data member) of the item, called the key.
For example, an item could consist of a string that
serves as the key (for instance, a name that is part of a
large employee structure) plus additional data
members.
The common convention is to have the table run from 0
to TableSize − 1.
Hashing
Each key is mapped into some number in the range 0 to
TableSize − 1.
The mapping is called a hash function, h, which ideally is
simple to compute and
maps any two distinct keys to different cells.
Since there is a finite number of cells and a very large
supply of keys, this is clearly impossible, so
we seek a hash function that
distributes the keys evenly among the cells.
Hash Functions
Suppose that the hash table H has size M.
There is a hash function h which maps each key to an
integer p between 0 and M − 1, and the
element is placed in position p in the hash
table.
The hash value for key j is h(j).
If h(j) = k, then the element is added to H[k],
i.e. at position k in H.
Example of an ideal hash table

    Khalid   25000
    Tariq    31250
    Aicha    27500
    Asma     28200

How to choose a hash function?
How to decide on the table size?
What do we do with collisions?
Choice of a Hash Function
If the input keys are integers, then Key mod TableSize
is generally a reasonable strategy, unless Key happens
to have some undesirable properties.
One has to be careful in the design of the hash
function.
E.g., suppose TableSize = 10 and the keys all end in
zero; then the standard hash function above is clearly a bad
choice!
It is often a good idea to ensure that the table size is
prime.
When the input keys are random integers, this function is
not only very simple to compute but also distributes the
keys evenly.
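A one-line sketch of this strategy (assuming non-negative integer keys and a prime table size chosen by the caller):

// Key mod TableSize: reasonable for (random, non-negative) integer keys,
// ideally with a prime table size.
int hash( int key, int tableSize )
{
    return key % tableSize;
}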
Choosing a hash function
Usually, the keys are strings; in this case, the hash
function needs to be chosen carefully.
One option is to add up the ASCII values of the
characters in the string.
Consider Example 1 of a hash function:

int hash( const string & key, int tableSize )
{
    int hashVal = 0;

    for( char ch : key )
        hashVal += ch;

    return hashVal % tableSize;
}
The previous hash function is simple to implement and
computes an answer quickly.
However, if the table size is large, the function does not
distribute the keys well (fairly evenly).
E.g., suppose that TableSize = 10,007 (a prime number).
Suppose all the keys are eight or fewer characters long.
Since an ASCII character has an integer value <= 127,
the values produced by the hash function are between 0
and 1,016 (which is 127 × 8).
This is clearly not an even distribution over the hash
table! (About 90% of the table will never be used!)
Example 2 of a hash function
int hash( const string & key, int tableSize )
{
    return ( key[ 0 ] + 27 * key[ 1 ] + 729 * key[ 2 ] ) % tableSize;
}
27 is the number of English letters plus the blank character; 729 is 27².
This hash function is easy to compute.
It examines only the first three characters.
If characters are random and table size is 10,007, as before,
then we would expect a reasonably equitable distribution.
In fact, examining a dictionary shows that there are only 2,851
distinct combinations of the first three characters, not
17,576 (= 26³). Even if none of these combinations collided, only
28% of the table would actually be hashed to.
Example 3 of a hash function
unsigned int hash( const string & key, int tableSize )
{
    unsigned int hashVal = 0;

    for( char ch : key )
        hashVal = 37 * hashVal + ch;

    return hashVal % tableSize;
}
Involves all characters in the key and can generally be
expected to distribute well: the code computes a polynomial
function of 37 (evaluated by Horner's rule), i.e.
key[0]·37^(KeySize−1) + key[1]·37^(KeySize−2) + … + key[KeySize−1],
and brings the result into the table range.
The hash function takes advantage of the fact that
overflow is allowed and uses unsigned int to avoid
introducing a negative number.
This hash function gives a reasonable distribution over the
table, though not necessarily the best.
It does have the merit of extreme simplicity and is
reasonably fast.
If the keys are very long, the hash function will take too
long to compute.
A common practice in this case is not to use all the
characters.
E.g., for a street-address key: use a couple of characters
from the street address, a couple from the city, and a couple
from the zip code (a sketch follows).
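A hedged sketch of this idea; the stride of 8 and the multiplier 37 are illustrative choices, not a prescribed rule:

#include <string>
using std::string;

// Hash a long key (e.g. a full postal address) by sampling only some
// of its characters instead of all of them.
unsigned int partialHash( const string & key, int tableSize )
{
    unsigned int hashVal = 0;

    for( size_t i = 0; i < key.size( ); i += 8 )   // every 8th character
        hashVal = 37 * hashVal + key[ i ];

    return hashVal % tableSize;
}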
Handling Collisions: Separate Chaining
Separate chaining approach: keep a linked list of all the
elements that hash to the same value.
Suppose that the keys are the first 10 perfect squares and
the hash function is hash(x) = x mod 10.
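Working this out: the squares 1, 4, 9, 16, 25, 36, 49, 64, 81, 100 hash to cells 1, 4, 9, 6, 5, 6, 9, 4, 1, 0 respectively, so cell 0 holds {100}, cell 1 holds {1, 81}, cell 4 holds {4, 64}, cell 5 holds {25}, cell 6 holds {16, 36}, cell 9 holds {9, 49}, and the other cells have empty lists.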
Operations using Hashing
Search in Hash Table: use the hash function to
determine which list to traverse. Then search the
appropriate list.
Insertion in Hash Table: check the appropriate list to
see if the element is already there (if duplicates are expected, an
extra counter data member is incremented instead). Otherwise,
insert it at the front of the list: this is convenient, and newly
inserted elements are likely to be accessed again soon (see the
sketch after this slide).
Deletion of an element: do the hashing, then delete from
the linked list.
Note: the hash tables in this chapter work only for
objects that provide a hash function and equality
operators (operator== and/or operator!=). What about Comparables?
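A minimal separate-chaining sketch of these operations for integer keys (a simplification with illustrative names; the textbook's version is templated and also tracks the load factor):

#include <algorithm>
#include <list>
#include <vector>

// Separate-chaining hash table for ints: one linked list per cell.
class IntChainedHashTable
{
  public:
    explicit IntChainedHashTable( int size = 101 ) : lists( size ) { }

    bool contains( int x ) const
    {
        const auto & whichList = lists[ myHash( x ) ];
        return std::find( whichList.begin( ), whichList.end( ), x ) != whichList.end( );
    }

    bool insert( int x )
    {
        auto & whichList = lists[ myHash( x ) ];
        if( std::find( whichList.begin( ), whichList.end( ), x ) != whichList.end( ) )
            return false;              // duplicate: not inserted
        whichList.push_front( x );     // insert at the front of the list
        return true;
    }

    bool remove( int x )
    {
        auto & whichList = lists[ myHash( x ) ];
        auto itr = std::find( whichList.begin( ), whichList.end( ), x );
        if( itr == whichList.end( ) )
            return false;
        whichList.erase( itr );
        return true;
    }

  private:
    std::vector<std::list<int>> lists;   // the array of linked lists

    // Simple hash; assumes non-negative keys for a sensible distribution.
    size_t myHash( int x ) const
        { return static_cast<size_t>( x ) % lists.size( ); }
};

Insertion is done at the front of the list, matching the convention above.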
Hash function implementation
Use of function object template (C++11)
template <typename Key>
class hash
{
  public:
    size_t operator() ( const Key & k ) const;
};
The type size_t is an unsigned integral type that represents the size
of an object; it is guaranteed to be able to store an array index.
On a 32-bit system size_t is typically 32 bits wide; on a 64-bit system, 64 bits.
Default implementation of the hash function template
for the standard type string:

template <>
class hash<string>
{
  public:
    size_t operator()( const string & key ) const
    {
        size_t hashVal = 0;

        for( char ch : key )
            hashVal = 37 * hashVal + ch;

        return hashVal;
    }
};
Alternatives to Linked Lists?
Any scheme could be used besides linked lists to
resolve the collisions.
A binary search tree or even another hash table
would work.
If the table is large and the hash function is good,
all the lists should be short, so basic separate
chaining does not attempt anything more
complicated.
Load factor of a hash table
Load factor λ of a hash table:
λ = number of elements in the hash table / table size.
In the previous example, λ = 1.0.
Usually, a threshold is set on λ to trigger the
rehashing: i.e. expanding the table and re-
calculating the hash codes of the already stored entries.
Time required to perform a search = constant time
to evaluate the hash function + time to traverse
the list.
Hash tables without linked lists: probing hash tables
Hashing with separate chaining has the disadvantage
that using linked lists can slow the algorithm down.
An alternative approach (to resolving collisions with
linked lists) is to try alternative cells until an empty
cell is found.
More formally, cells h0(x), h1(x), h2(x), . . . are tried in
succession, where
hi(x) = (hash(x) + f (i)) mod TableSize, with f(0) = 0.
f is the collision resolution strategy.
All the data go inside the table, so a bigger table is
needed in this approach.
Generally, the load factor λ should be kept below 0.5.
Linear Probing
Linear probing: f is a linear function of i, typically f(i) = i,
i.e. trying cells sequentially (with wraparound) in search of an
empty cell.
e.g. with hash(x) = x mod 10 and linear probing, insert 89, 18,
49, 58, 69
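Working it out: 89 goes to cell 9 and 18 to cell 8; 49 collides with 89 at cell 9 and wraps around to cell 0; 58 finds cells 8, 9 and 0 occupied and lands in cell 1; 69 finds cells 9, 0 and 1 occupied and lands in cell 2.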
Linear Probing (cont.)
As long as the table is big enough, a free cell can always
be found, but the time to do so can get quite large.
Worse, even if the table is relatively empty, blocks of
occupied cells start forming. Because of this effect, known as
primary clustering, any key that hashes into the cluster
will require several attempts to resolve the collision, and
it will then be added to the cluster.
It can be shown that the expected number of probes
using linear probing is roughly 1/2 (1 + 1/(1 − λ)²) for
insertions and unsuccessful searches, and
1/2 (1 + 1/(1 − λ)) for successful searches.
Quadratic probing
Quadratic probing: a collision resolution method that
eliminates the primary clustering problem of linear
probing.
Collision function is quadratic. A popular choice is f(i) = i².
Probing properties
For linear probing, it is a bad idea to let the hash table get
nearly full, because performance degrades.
For quadratic probing, the situation is even more drastic:
There is no guarantee of finding an empty cell once
the table gets more than half full, or
even before the table gets half full if the table size is not
prime.
This is because at most half of the table can be used as
alternative locations to resolve collisions.
Theorem 5.1: If quadratic probing is used, and the table size
is prime, then a new element can always be inserted if the
table is at least half empty.
Code for hash tables using probing strategies: see the textbook (a sketch is given below).
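As an illustrative sketch (not the textbook's exact code), here is a quadratic-probing table for non-negative ints; it uses the standard trick of computing f(i) = f(i−1) + 2i − 1 incrementally instead of evaluating i² directly, and assumes a prime table size with the load factor kept below 0.5:

#include <vector>

// Open-addressing table with quadratic probing; -1 marks an empty cell.
class IntQuadraticHashTable
{
  public:
    explicit IntQuadraticHashTable( int size = 101 ) : cells( size, -1 ) { }

    bool contains( int x ) const
        { return cells[ findPos( x ) ] == x; }

    bool insert( int x )
    {
        int pos = findPos( x );
        if( cells[ pos ] == x )
            return false;          // already present
        cells[ pos ] = x;          // (rehashing when lambda > 0.5 is omitted here)
        return true;
    }

  private:
    static constexpr int EMPTY = -1;
    std::vector<int> cells;

    // Quadratic probing: successive probes are h(x), h(x)+1, h(x)+4, h(x)+9, ...
    // computed incrementally via f(i) = f(i-1) + 2i - 1.
    int findPos( int x ) const
    {
        int offset = 1;
        int currentPos = x % static_cast<int>( cells.size( ) );

        while( cells[ currentPos ] != EMPTY && cells[ currentPos ] != x )
        {
            currentPos += offset;   // advance to the next probe
            offset += 2;
            if( currentPos >= static_cast<int>( cells.size( ) ) )
                currentPos -= static_cast<int>( cells.size( ) );
        }
        return currentPos;
    }
};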
Double Hashing
For double hashing, one popular choice is
f (i) = i · hash2(x).
This formula says that we apply a second hash function
to x and probe at a distance hash2(x), 2hash2(x), . . . ,
and so on.
A poor choice of hash2(x) would be disastrous.
For instance, the obvious choice hash2(x) = x mod 9
would not help if 99 were inserted into the input in the
previous examples.
Thus, the function must never evaluate to zero.
It is also important to make sure all cells can be probed.
A function such as hash2(x) = R − (x mod R), with R a prime
smaller than TableSize, will work well.
Below: the same example as before, with R = 7.
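Working it out with hash(x) = x mod 10 and hash2(x) = 7 − (x mod 7): 89 and 18 go to cells 9 and 8 as before; 49 collides at cell 9 and, since hash2(49) = 7, lands in cell (9 + 7) mod 10 = 6; 58 collides at cell 8 and, since hash2(58) = 5, goes to cell 3; 69 collides at cell 9 and, since hash2(69) = 1, goes to cell 0.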
Important reminder: the size of the table should be prime!
(Size 10 in the example was used only for the convenience of mod 10.)
Rehashing
If the hash table gets close to full, the running time for the
operations will start taking too long, and insertions might
fail (for open addressing with quadratic probing).
Solution: a hash table of bigger size (roughly twice as big) is used,
with a new hash function;
compute the new hash value for each element of the
original table and insert it in the new table.
The old hash table is subsequently deleted.
This operation is called Rehashing.
It should be done infrequently.
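A hedged sketch of the rehashing step for a separate-chaining table stored as a vector of lists (simplified: the real version would pick the next prime size and reuse the table's own hash function):

#include <list>
#include <utility>
#include <vector>

// Rehash: build a table roughly twice as large and re-insert every element,
// computing its hash with the NEW table size.
void rehash( std::vector<std::list<int>> & lists )
{
    std::vector<std::list<int>> oldLists = std::move( lists );

    // New table roughly twice as big (ideally the next prime after 2 * old size).
    lists = std::vector<std::list<int>>( 2 * oldLists.size( ) );

    for( const auto & chain : oldLists )
        for( int x : chain )
            lists[ static_cast<size_t>( x ) % lists.size( ) ].push_front( x );
}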
Rehashing example
Insert 13, 15, 24, 6 into a hash table of size 7,
with h(x) = x mod 7 and linear probing:

    0: 6   1: 15   2: (empty)   3: 24   4: (empty)   5: (empty)   6: 13

(6 hashes to cell 6, which is already occupied by 13, and wraps around to cell 0.)
If 23 is then inserted (linear probing puts it in cell 2), the
table becomes more than 70% full:

    0: 6   1: 15   2: 23   3: 24   6: 13

A new table is created; 17 is the next prime
number about twice as large as 7.
New hash function: h(x) = x mod 17.
Exercise: You can easily check the new hash table with
these data elements.
When to rehash?
Rehashing can be implemented in several ways with
quadratic probing
Rehash as soon as the table is half full.
The other extreme is to rehash only when an insertion
fails (even with probing).
A third, middle-of-the-road strategy is to rehash when
the table reaches a certain load factor.
Since performance does degrade as the load factor
increases, the third strategy, implemented with a good
cutoff, could be best.
Implementation of rehashing for quadratic probing:
see the textbook, which also gives rehashing for a separate-chaining hash table.