Types of Hashing in Data Structure
There are two primary hashing techniques for handling collisions in a hash table:
Open hashing / separate chaining / closed addressing
Closed hashing / open addressing
1. Open Hashing / Separate Chaining
Separate chaining is one of the most commonly used collision-resolution techniques in data
structures, and it relies on a linked list. Keys that hash to the same index are chained
together to form a singly linked list known as a chain, so every member of a given chain
hashes to the same position. Also known as closed addressing, open hashing does not avoid
hash collisions; it resolves them by using an array of linked lists.
Example:
Using the simple hash function h(key) = key mod table size,
let us take a simple hash table with table size = 6.
Sequence of keys = 50, 600, 74, 113, 14, 99
Therefore,
Index  Chain
0      600
1      (empty)
2      50 -> 74 -> 14
3      99
4      (empty)
5      113
(Keys 74 and 14 both hash to index 2, where they collide with 50, so they are appended to
the chain at that index.)
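The chaining example above can be reproduced with a short Python sketch. It is a minimal illustration, assuming integer keys and the hash function h(key) = key mod table size from the example; the class name ChainedHashTable is hypothetical, and a Python list stands in for each linked-list chain.

```python
# Separate chaining sketch: a Python list stands in for each linked-list chain.
class ChainedHashTable:
    def __init__(self, size: int) -> None:
        self.size = size
        # One (initially empty) chain per slot.
        self.buckets: list[list[int]] = [[] for _ in range(size)]

    def _hash(self, key: int) -> int:
        # h(key) = key mod table size, as in the example.
        return key % self.size

    def insert(self, key: int) -> None:
        # A colliding key is appended to the end of the chain at its index.
        self.buckets[self._hash(key)].append(key)

    def contains(self, key: int) -> bool:
        # A lookup only scans the one chain the key hashes to.
        return key in self.buckets[self._hash(key)]


table = ChainedHashTable(6)
for key in [50, 600, 74, 113, 14, 99]:
    table.insert(key)

for index, chain in enumerate(table.buckets):
    print(index, chain)
```

Index 2 ends up holding [50, 74, 14], and indexes 1 and 4 stay empty, matching the table above.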
Advantages of open hashing:
- Easy and basic to implement.
- The table never fills up; more keys can always be added to the chains.
- Comparatively less sensitive to the load factor than closed hashing.
- Mostly used when the number and frequency of keys to be stored in the hash table are
unknown.
Disadvantages of open hashing:
- Wastage of space: some slots of the table may never be used.
- With a long chain, search time increases.
- Poor cache performance compared to closed hashing.
2. Closed hashing (Open addressing)
Open addressing stores all entry records within the array itself, rather than in linked
lists. The phrase 'open addressing' refers to the notion that the hash value of an item
does not by itself determine its location or address. To insert a new entry, the hash
index is computed first, and the array is checked starting at that index. If the slot at
the hashed index is empty, the entry is inserted there; otherwise, a probe sequence is
followed until an empty slot is found.
The procedure used to move through the slots is known as the probe sequence. Different
probe sequences differ in the interval between successive probes.
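The insert procedure described above can be sketched generically in Python. This is a minimal sketch, assuming integer keys, a table stored as a Python list with None marking an empty slot, and a hypothetical offset(key, i) function that gives the distance of the i-th probe from the hashed index; changing that one function is what distinguishes the probing techniques that follow.

```python
from typing import Callable, Optional


def insert_open_addressing(
    table: list[Optional[int]],
    key: int,
    offset: Callable[[int, int], int],
) -> int:
    """Store key in the first empty slot of its probe sequence and return that slot.

    The i-th probe is (key % T + offset(key, i)) % T, where T = len(table).
    """
    size = len(table)
    home = key % size  # the hashed index; the probe sequence starts here
    for i in range(size):
        slot = (home + offset(key, i)) % size
        if table[slot] is None:  # None marks an empty slot
            table[slot] = key
            return slot
    raise RuntimeError(f"no empty slot found for key {key} within {size} probes")


# Example with a linear offset, offset(key, i) = i.
table: list[Optional[int]] = [None] * 5
for key in [7, 12, 2]:
    print(key, "->", insert_open_addressing(table, key, lambda k, i: i))
```

All three keys hash to index 2, so 7 takes slot 2, 12 moves on to slot 3, and 2 moves on to slot 4.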
In closed hashing, three techniques are commonly used to resolve collisions:
A. Linear Probing
Linear probing checks the hash table sequentially, starting from the hashed index. If that
slot is already occupied, the next slot is checked, wrapping around to the beginning of the
table when the end is reached. In linear probing, the interval between probes is fixed,
usually at 1.
The formula for linear probing: index = (hash(n) + i) % T, for i = 0, 1, 2, …
Here hash(n) is the index computed using a hash function (for example,
hash(n) = n % hashTableSize), and T is the table size.
If slot index = hash(n) % T is full, then the next slot index is calculated
by adding 1: (hash(n) + 1) % T.
The sequence goes as:
index = hash(n) % T
(hash(n) + 1) % T
(hash(n) + 2) % T
(hash(n) + 3) % T … and so on.
Example:
For a hash table, Table Size = 20
Keys = 3,2,46,6,11,13,53,12,70,90
Therefore,
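Sl. No  Key  Hash      Index  Index (after linear probing)
1       3    3 % 20    3      3
2       2    2 % 20    2      2
3       46   46 % 20   6      6
4       6    6 % 20    6      7
5       11   11 % 20   11     11
6       13   13 % 20   13     13
7       53   53 % 20   13     14
8       12   12 % 20   12     12
9       70   70 % 20   10     10
10      90   90 % 20   10     15
(Key 6 collides with 46 at index 6 and moves to 7; key 53 collides with 13 at index 13 and
moves to 14. Key 90 hashes to index 10, which holds 70; slots 11, 12, 13 and 14 are already
occupied by 11, 12, 13 and 53, so 90 is placed at index 15.)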
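The linear probing example above can be checked with a small Python sketch. It assumes integer keys, hash(n) = n % T, and a list with None marking empty slots; linear_probe_insert is a hypothetical helper name.

```python
from typing import Optional


def linear_probe_insert(table: list[Optional[int]], key: int) -> int:
    """Insert key using linear probing and return the slot it lands in."""
    size = len(table)
    home = key % size  # hash(n) = n % T
    for i in range(size):
        slot = (home + i) % size  # (hash(n) + i) % T, wrapping around the table
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table: list[Optional[int]] = [None] * 20
keys = [3, 2, 46, 6, 11, 13, 53, 12, 70, 90]
# Insert in the given order; the dict keeps insertion order.
placement = {key: linear_probe_insert(table, key) for key in keys}
print(placement)
```

The placements agree with the final column of the table: key 90 has to step past the run of occupied slots 10 to 14 before landing at 15, an instance of the primary clustering that linear probing is prone to.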
B. Quadratic Probing
The only difference between linear and quadratic probing is the spacing between successive
probes. When the hashed index slot for an entry is already taken, the table is traversed
until an open slot is found, but the offset from the initial hashed index grows as
successive values of a quadratic polynomial; in the simplest form, 1 x 1, 2 x 2, 3 x 3, …
are added to the hashed index.
The formula for quadratic probing: index = (hash(n) + i x i) % T, for i = 0, 1, 2, …
Here hash(n) is the index computed using a hash function, and T is the
table size.
If slot index = hash(n) % T is full, then the next slot index is calculated
by adding 1 x 1: (hash(n) + 1 x 1) % T. If that slot is also full, 2 x 2 is added
instead, and so on.
The sequence goes as:
index = hash(n) % T
(hash(n) + 1 x 1) % T
(hash(n) + 2 x 2) % T
(hash(n) + 3 x 3) % T … and so on
Example:
For a hash table, Table Size = 7
Keys = 22,30,50
Thus,
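Sl. No  Key  Hash     Index  Index (after quadratic probing)
1       22   22 % 7   1      1
2       30   30 % 7   2      2
3       50   50 % 7   1      5
(Key 50 hashes to index 1, which holds 22. The first probe, (1 + 1 x 1) % 7 = 2, is
occupied by 30; the second, (1 + 2 x 2) % 7 = 5, is free, so 50 is placed at index 5.)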
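A minimal Python sketch of the quadratic probing example, assuming integer keys, hash(n) = n % T, and a list with None marking empty slots; quadratic_probe_insert is a hypothetical helper name.

```python
from typing import Optional


def quadratic_probe_insert(table: list[Optional[int]], key: int) -> int:
    """Insert key using quadratic probing and return the slot it lands in."""
    size = len(table)
    home = key % size  # hash(n) = n % T
    for i in range(size):
        slot = (home + i * i) % size  # (hash(n) + i x i) % T
        if table[slot] is None:
            table[slot] = key
            return slot
    # Quadratic probing may skip free slots, so this can fire before the table is full.
    raise RuntimeError(f"no empty slot found for key {key} within {size} probes")


table: list[Optional[int]] = [None] * 7
placement = {key: quadratic_probe_insert(table, key) for key in [22, 30, 50]}
print(placement)
```

Unlike linear probing, quadratic probing does not necessarily visit every slot, so an insert can fail even though the table has free space. Choosing a prime table size (such as 7 here) and keeping the table less than half full guarantees that an empty slot is found.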
C. Double Hashing
Double hashing uses a second hash function to determine the interval between probes. It is
an effective way of reducing clustering: because the additional hash function calculates the
increment for the probe sequence, keys that share the same first hash value usually follow
different probe sequences.
The formula for double hashing: index = (hash(key) + i * hash2(key)) % T, where T is the
size of the table and i = 0, 1, 2, …
The sequence goes as follows:
index = hash(x) % T
(hash(x) + 1*hash2(x)) % T
(hash(x) + 2*hash2(x)) % T
(hash(x) + 3*hash2(x)) % T … and so on
Example:
For a hash table, Table Size = 7
Keys = 27, 43, 92, 72
First hash function: h1(k) = k % 7
Second hash function: h2(k) = 1 + (k % 5)
Thus,
Sl. No  Key  Hash     Index  Index (after double hashing)
1       27   27 % 7   6      6
2       43   43 % 7   1      1
3       92   92 % 7   1      4
4       72   72 % 7   2      2
(Key 92 hashes to index 1, which holds 43. With i = 1:
[h1(92) + 1 * h2(92)] % 7 = [1 + 1 * (1 + 92 % 5)] % 7 = (1 + 3) % 7 = 4,
so 92 is placed at index 4. Key 72 hashes to index 2, which is free, so no probing is
needed.)
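A minimal Python sketch of double hashing, using the keys that appear in the worked table (27, 43, 92, 72, inserted in that order), table size 7, h1(k) = k % 7, and the second hash h2(k) = 1 + (k % 5) taken from the worked calculation. The helper double_hash_insert is hypothetical, and None marks an empty slot.

```python
from typing import Optional


def h1(key: int, size: int) -> int:
    return key % size  # first hash: the home slot


def h2(key: int) -> int:
    return 1 + key % 5  # second hash from the worked example; always 1..5, never 0


def double_hash_insert(table: list[Optional[int]], key: int) -> int:
    """Insert key using double hashing and return the slot it lands in."""
    size = len(table)
    for i in range(size):
        slot = (h1(key, size) + i * h2(key)) % size  # (h1 + i * h2) % T
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError(f"no empty slot found for key {key} within {size} probes")


table: list[Optional[int]] = [None] * 7
placement = {key: double_hash_insert(table, key) for key in [27, 43, 92, 72]}
print(placement)
```

Because h2 always returns a value from 1 to 5, the step is never a multiple of the prime table size 7, so each key's probe sequence eventually visits every slot. A second hash that could return 0 would keep probing the same occupied slot, which is why the "1 +" term matters.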