UNIT-II
SYLLABUS
Dictionaries: Linear List Representation, Skip List Representation,
Operations – Insertion, Deletion and Searching.
Hash table representation: Hash Functions, Collision Resolution -
Separate Chaining, Open Addressing - Linear Probing, Quadratic
Probing, Double Hashing, Rehashing, Extendible Hashing.
ACE Engineering College(Autonomous)
Linear Search
Linear search is implemented using the following steps:
Step 1 - Read the search element from the user.
Step 2 - Compare the search element with the first element in the list.
Step 3 - If both are matched, then display "Given element is found!!!" and terminate the function.
Step 4 - If both are not matched, then compare the search element with the next element in the list.
Step 5 - Repeat steps 3 and 4 until the search element is compared with the last element in the list.
Step 6 - If the last element in the list also doesn't match, then display "Element is not found!!!" and terminate the function.
Example search element: 18
Time Complexity: O(n)
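The steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference code; the function name linear_search and the sample list are assumptions made for the example.

```python
def linear_search(items, target):
    """Return the index of target in items, or -1 if it is absent."""
    for index, value in enumerate(items):
        if value == target:       # Step 3: match found, stop searching
            return index
    return -1                     # Step 6: reached the end without a match


# Hypothetical sample list (the list on the original slide is not available).
data = [65, 20, 10, 55, 32, 12, 50, 99]
print(linear_search(data, 12))    # index of 12 in the list
print(linear_search(data, 18))    # 18 is not in the list
```

In the worst case every element is compared once, which is where the O(n) bound comes from.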
Binary Search
Binary search is implemented on a sorted list using the following steps:
Step 1 - Read the search element from the user.
Step 2 - Find the middle element in the sorted list.
Step 3 - Compare the search element with the middle
element in the sorted list.
Step 4 - If both are matched, then display "Given
element is found!!!" and terminate the function.
Step 5 - If both are not matched, then check whether
the search element is smaller or larger than the
middle element.
Step 6 - If the search element is smaller than middle element,
repeat steps 2, 3, 4 and 5 for the left sublist of the
middle element.
Step 7 - If the search element is larger than middle element,
repeat steps 2, 3, 4 and 5 for the right sublist of the
middle element.
Step 8 - Repeat the same process until we find the search
element in the list or until sublist contains only one element.
Step 9 - If that element also doesn't match with the search element, then display "Element is not found in the list!!!" and terminate the function.
Time Complexity: O(log n)
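The binary search steps can be sketched iteratively in Python. This is only an illustrative version; the name binary_search and the sample data are assumptions, and the input list must already be sorted.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2           # Step 2: find the middle element
        if sorted_items[mid] == target:   # Step 4: match found
            return mid
        if target < sorted_items[mid]:    # Step 6: continue in the left sublist
            high = mid - 1
        else:                             # Step 7: continue in the right sublist
            low = mid + 1
    return -1                             # Step 9: sublist exhausted, not found


# Hypothetical sorted sample list.
data = [10, 12, 20, 32, 50, 55, 65, 80, 99]
print(binary_search(data, 80))
print(binary_search(data, 25))
```

Each comparison halves the remaining sublist, which gives the O(log n) bound.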
Drawbacks
The main drawback of these techniques is:
As the number of elements increases, the time taken to perform the search also increases.
This becomes problematic when the total number of elements becomes too large.
Hashing
Hashing is another approach, in which the time required to search for an element
does not depend on the total number of elements.
Hashing is an effective way to reduce the number of comparisons needed to search
for an element in a data structure.
Using a hashing data structure, a given element is searched in constant time
on average.
Hashing is the process of indexing and retrieving an element (data) in
a data structure to provide a faster way of finding the element
using a hash key.
Advantage of Hashing
Unlike other searching techniques, hashing is extremely efficient.
The time taken to perform the search does not depend upon
the total number of elements.
On average, it completes the search with constant time complexity O(1);
the worst case is slower when many keys collide.
Hashing (Static Hashing)
There are two concepts in hashing:
Hash table
Hash function
Hash table is a data structure used for storing and retrieving data very
quickly.
Insertion of data in the hash table is based on the key value.
Hash function is a function which takes a piece of data (i.e. key) as input
and produces an integer (i.e. hash value) as output which maps the data to a
particular index in the hash table.
Hashing Functions
There are various hash functions
Division method
Mid square method
Multiplicative hash function
Digit folding
Division Method
The hash function depends upon the remainder of division.
The divisor is the table length.
The formula to calculate the hash key is:
H(key) = record % size
For example:
If the records 54, 72, 89, 37 are to be placed in the hash table and the
table size is 10:
54 % 10 = 4
72 % 10 = 2
89 % 10 = 9
37 % 10 = 7
Resulting hash table:
Index  Value
0
1
2      72
3
4      54
5
6
7      37
8
9      89
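The division method example can be reproduced with a few lines of Python. This is a small sketch; the name division_hash and the use of None for an empty slot are assumptions for illustration.

```python
def division_hash(key, size):
    """Division method: the index is the remainder of key divided by size."""
    return key % size


size = 10
table = [None] * size                     # None marks an empty slot (assumption)
for record in (54, 72, 89, 37):
    table[division_hash(record, size)] = record

for index, value in enumerate(table):
    print(index, value)
```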
Mid Square Method
In this method, the key is squared and the middle or mid part of
the result is used as index.
Consider, the key 3111 then
3111^2 = 9678321
For the hash table of size 1000
H(3111)=783(the middle 3 digits)
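A possible Python sketch of the mid square method is shown here. The function name mid_square_hash is hypothetical, and the rule for picking the "middle" digits (centred, leaning left when the split is uneven) is an assumption, since the slide only shows the 7-digit case.

```python
def mid_square_hash(key, digits):
    """Square the key and return the middle `digits` digits of the result."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2   # centre the window (assumed convention)
    return int(squared[start:start + digits])


print(mid_square_hash(3111, 3))   # middle 3 digits of 9678321
```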
Multiplicative Hash Function
The given record is multiplied by some constant value.
The formula for computing the hash key is:
H(key) = floor(p * (fractional part of (key * A)))
where p is an integer constant and
A is a constant real number.
Donald Knuth suggested using the constant A = 0.6180339887
If the key is 107 and p = 50, then
key * A = 107 * 0.6180339887 = 66.1296367909 (fractional part 0.1296367909)
H(key) = floor(50 * 0.1296367909)
= floor(6.481839545)
= 6
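The formula can be checked with a short Python sketch. It applies the formula as written, taking the fractional part of key * A before multiplying by p, so the result always lies between 0 and p - 1. The function name multiplicative_hash is an assumption, and A is computed as (sqrt(5) - 1) / 2, the value behind Knuth's suggested constant.

```python
import math

A = (math.sqrt(5) - 1) / 2        # approximately 0.6180339887


def multiplicative_hash(key, p, a=A):
    """Return floor(p * frac(key * a)), an index in the range 0 .. p-1."""
    fractional = (key * a) % 1.0   # keep only the fractional part
    return math.floor(p * fractional)


print(multiplicative_hash(107, 50))
```

For key 107 and p = 50 the fractional part is about 0.1296, so the hash value is 6.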
Digit Folding
The key is divided into separate parts and using some simple
operation these parts are combined to produce the hash key.
For example, consider the record 12365412.
It is divided into separate parts as: 123 654 12
Add all these parts:
H(key) = 123 + 654 + 12
= 789
The record will be placed at location 789 in the hash table.
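Digit folding can be sketched in Python as below. The name folding_hash is hypothetical. The final modulo by the table size is an assumption added so that the sum always fits in the table; the slide's example does not need it because 789 is already a valid index for a table of 1000 slots.

```python
def folding_hash(key, part_len, table_size):
    """Split the key's digits into parts of part_len digits and add them."""
    digits = str(key)
    parts = [int(digits[i:i + part_len])
             for i in range(0, len(digits), part_len)]
    return sum(parts) % table_size   # modulo is an added safeguard (assumption)


print(folding_hash(12365412, 3, 1000))   # parts are 123, 654 and 12
```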
COLLISION
Collision occurs when the hash function maps two different keys
to the same location. Obviously, two records cannot be stored in the
same location.
Similarly, when there is no room for a key in the hash table, the
situation is called overflow.
Collision
Example:
Consider the hash function
H(key) = recordkey % 10, hash table size = 10
The record keys are: 131, 44, 43, 78, 19, 36, 57 and 77
131 % 10 = 1
44 % 10 = 4
43 % 10 = 3
78 % 10 = 8
19 % 10 = 9
36 % 10 = 6
57 % 10 = 7
77 % 10 = 7    Collision (index 7 already holds 57)
Hash table before inserting 77:
Index  Value
0
1      131
2
3      43
4      44
5
6      36
7      57
8      78
9      19
From index 7, if we look for the next vacant position at the subsequent indices 8 and 9,
there is also no place in the hash table. This situation is called overflow.
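The collision in this example can be detected programmatically. The sketch below stores each key at its home index and records any key whose home index is already taken; the name find_collisions and the tuple layout are assumptions for illustration, and no collision resolution is attempted here.

```python
def find_collisions(keys, size):
    """Place keys at key % size; report keys whose home slot is occupied."""
    table = [None] * size
    collisions = []                     # (key, home index, key already there)
    for key in keys:
        slot = key % size
        if table[slot] is None:
            table[slot] = key
        else:
            collisions.append((key, slot, table[slot]))
    return table, collisions


table, collisions = find_collisions([131, 44, 43, 78, 19, 36, 57, 77], 10)
print(table)
print(collisions)
```

Only key 77 collides: its home index 7 is already occupied by 57.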
Collision Resolution/Overflow Handling
A method used to solve the problem of collision, called a collision
resolution technique, is therefore applied.
The two most popular methods of resolving collisions are:
Collision resolution by open addressing
Collision resolution by chaining
Open Addressing
Once a collision takes place, open addressing computes new positions
using a probe sequence and the next record is stored in that position.
In this technique of collision resolution, all the values are stored in the
hash table.
The hash table will contain two types of values- either
sentinel value (for example, -1) or
a data value.
The presence of sentinel value indicates that the location contains no data
value at present but can be used to hold a value.
The process of examining memory locations in the hash table is called
probing.
Open Addressing Techniques
Open addressing can be implemented using:
Linear probing
Quadratic probing
Double hashing
Rehashing
Linear Probing
When two records demand the same home bucket in the hash table, the
collision can be resolved by placing the second record in the next empty
bucket found by moving linearly down the table.
In linear probing, the hash table is represented as a one-dimensional array
with indices ranging from 0 to hash table size - 1.
Before inserting, initialize all slots in the table to be empty.
This allows us to detect collisions and overflows when we insert into the hash table.
Linear Probing
For example, Keys: 131, 4, 8, 7, 21, 5, 31, 61, 9, 29
Hash table size: 10
Step 1: Initialize all locations with -1.
Step 2: First put 131, 4, 8, 7.
131 % 10 = 1    4 % 10 = 4    8 % 10 = 8    7 % 10 = 7
Next key is 21.
H(key) = 21 % 10 = 1    Collision (131)
To resolve this collision we move linearly down and place the element at the
next empty location, index 2.
Next, 5 will be placed at index 5.
31 and 61 also follow linear probing: 31 is placed at index 3 and 61 at index 6.
Next, record 9 will be placed at index 9.
The final record is 29. It collides at index 9 and runs past the end of the table;
to handle it we move back to index 0.
Index 0 is empty, so 29 will be placed there.
Final hash table:
Index  Value
0      29
1      131
2      21
3      31
4      4
5      5
6      61
7      7
8      8
9      9
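The worked example can be reproduced with a short Python sketch of linear probing. The names insert_linear and search_linear are assumptions, and -1 is used as the sentinel for an empty slot, as in the example. Deletion is not handled.

```python
EMPTY = -1                                # sentinel for an unused slot


def insert_linear(table, key):
    """Insert key using linear probing; return the index where it was stored."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i) % size          # wrap around to index 0 past the end
        if table[slot] == EMPTY:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


def search_linear(table, key):
    """Return the index of key, or -1 if it is not in the table."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i) % size
        if table[slot] == EMPTY:          # an empty slot ends the probe sequence
            return -1
        if table[slot] == key:
            return slot
    return -1


table = [EMPTY] * 10
for key in (131, 4, 8, 7, 21, 5, 31, 61, 9, 29):
    insert_linear(table, key)
print(table)
```

The modulo in the probe step is what moves 29 from index 9 back to index 0.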
GATE Question
Consider a hash table of size seven, with starting index zero, and a hash
function (3x + 4) mod 7. Assuming the hash table is initially empty, which of
the following is the contents of the table when the sequence 1, 3, 8, 10 is
inserted into the table using closed hashing? Note that '_' denotes an empty
location in the table. (GATE 2007)
(A) 8, _, _, _, _, _, 10
(B) 1, 8, 10, _, _, _, 3
(C) 1, _, _, _, _, _, 3
(D) 1, 10, 8, _, _, _, 3
Answer: B
1. What is hashing? 2 marks (R15) Nov/Dec 2016
2. Explain various hashing methods with suitable examples.
3. What is collision? Explain different collision resolution techniques with
examples. 10 marks (R15) Nov/Dec 2016
4. Discuss linear probing. R18 Dec 2019
HASHING
OPEN ADDRESSING
QUADRATIC PROBING
Limitations of Linear Probing
One major problem with linear probing is primary clustering.
Primary clustering is a process in which a block of data is formed in the
hash table when collisions are resolved.
For example: 19, 18, 39, 29, 8
19 % 10 = 9
18 % 10 = 8
39 % 10 = 9
29 % 10 = 9
8 % 10 = 8
Index  Value
0      39
1      29
2      8
3
4
5
6
7
8      18
9      19
A cluster is formed at indices 0 to 2. The rest of the table (indices 3 to 7) is empty.
The primary clustering problem can be solved by quadratic probing.
Quadratic Probing
Quadratic probing is similar to linear probing.
It operates by taking the original hash value and adding successive values of
an arbitrary quadratic polynomial to the starting value.
This method uses the following formula:
H_i(key) = (Hash(key) + i^2) % m    // where m can be the table size or any prime number
Example on Quadratic Probing
For example, Hash table size = 10, H_i(key) = (Hash(key) + i^2) % m
Elements to be inserted are: 37, 90, 55, 22, 11, 17, 49, 87
We will fill the hash table step by step.
37 % 10 = 7    90 % 10 = 0    55 % 10 = 5    22 % 10 = 2    11 % 10 = 1
17 % 10 = 7    Collision
We will apply quadratic probing to insert 17 into the hash table.
We will choose value i = 1, 2, 3, ..., whichever is applicable.
Consider i = 1, then
(7 + 1^2) % 10 = 8, when i = 1
The bucket 8 is empty, hence we will place the element at index 8.
Next, 49 % 10 = 9
Now to place 87: 87 % 10 = 7, collision, so we will use quadratic probing.
(7 + 1) % 10 = 8    Collision
(7 + 4) % 10 = 1    Collision
(7 + 9) % 10 = 6    This bucket is free, so we can place 87 at index 6.
Final hash table:
Index  Value
0      90
1      11
2      22
3
4
5      55
6      87
7      37
8      17
9      49
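The quadratic probing example can be traced with the following Python sketch. The name insert_quadratic and the use of None for an empty slot are assumptions. The probe loop is capped at the table size, after which the insert is reported as failed.

```python
def insert_quadratic(table, key):
    """Insert key using H_i(key) = (key % m + i*i) % m; return the index used."""
    m = len(table)
    home = key % m
    for i in range(m):                    # i = 0 is the home position itself
        slot = (home + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no empty slot found on the probe sequence")


table = [None] * 10
for key in (37, 90, 55, 22, 11, 17, 49, 87):
    insert_quadratic(table, key)
print(table)
```

Key 17 lands at index 8 after one probe, and key 87 lands at index 6 after three probes (8, 1, then 6).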
Limitations of Quadratic Probing
Secondary clustering: keys that hash to the same home position follow the same
probe sequence, so they keep colliding with one another.
In addition, when using quadratic probing there is no guarantee of finding an
empty cell once the table becomes more than half full, or
even before this if the table size is composite, because collisions must be resolved
using half of the table at most.
For example,
if our hash table has three slots, then records that hash to slot 0 can probe only
to slots 0 and 1 (that is, the probe sequence will never visit slot 2 in the table). Thus,
if slots 0 and 1 are full, then the record cannot be inserted even though the table is
not full!
A more realistic example is a table with 105 slots. The probe sequence starting from
any given slot will only visit 23 other slots in the table. If all 24 of these slots should
happen to be full, even if other slots in the table are empty, then the record cannot
be inserted because the probe sequence will continually hit only those same 24 slots.
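The slot counts quoted above can be verified by enumerating the probe sequence. The sketch below collects every index that (home + i^2) % m can reach; the name reachable_slots is an assumption. Since i^2 % m repeats with period m, trying i = 0 .. m-1 is enough.

```python
def reachable_slots(home, m):
    """Set of indices visited by the quadratic probe sequence from home."""
    return {(home + i * i) % m for i in range(m)}


print(sorted(reachable_slots(0, 3)))      # slots reachable in a 3-slot table
print(len(reachable_slots(0, 105)))       # slots reachable in a 105-slot table
print(len(reachable_slots(0, 11)))        # a prime table size for comparison
```

With 105 slots only 24 distinct indices are reachable (the home slot plus 23 others), while a prime size such as 11 reaches 6 of its 11 slots, a little over half.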
Quadratic Probing
Secondary Clustering example: 19,18,39,29,8
Hash the following keys into a table of size 11. Use any two collision resolution
techniques. 23, 0, 52, 61, 78, 33, 100, 8, 90, 10, 14
HASHING
OPEN ADDRESSING
DOUBLE HASHING
Double Hashing
Double Hashing is a technique in which a second hash function is applied to the key when collision
occurs.
Double hashing computes the probe position as:
(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE
is the size of the hash table.
(We repeat, increasing i, each time a collision occurs.)
The second hash function gives the number of positions to jump from the point of collision
to insert.
There are two important rules to be followed for the second function:
It must never evaluate to zero.
It must make sure that all cells can be probed.
The formula to be used for double hashing is:
H1(key)=key %Table size
H2(key)=M-(key % M)
Where M is a prime number smaller than the size of the table.
Example on Double Hashing:
H1(key) = key % Table size
H2(key) = M - (key % M)
Consider the following elements to be placed in the hash table of size 10:
37, 90, 45, 22, 17, 49, 55
Initially insert the elements using the formula for H1(key).
Insert 37, 90, 45, 22:
H1(37) = 37 % 10 = 7
H1(90) = 90 % 10 = 0
H1(45) = 45 % 10 = 5
H1(22) = 22 % 10 = 2
Now if 17 is to be inserted, then
H1(17) = 17 % 10 = 7    Collision, so we have to apply the second hash function.
H2(key) = M - (key % M), where M is a prime number smaller than the size of the table. For table size 10, the prime number is 7.
Hence M = 7.
H2(17) = 7 - (17 % 7) = 7 - 3 = 4    // from index 7 we have to take 4 jumps
Therefore 17 will be placed at index (7 + 4) % 10 = 1.
Next, 49 will be placed at 49 % 10 = 9.
Now to insert number 55:
H1(55) = 55 % 10 = 5    Collision
H2(55) = 7 - (55 % 7) = 7 - 6 = 1    // one jump from index 5
Therefore 55 will be placed at index 6.
Final hash table:
Index  Value
0      90
1      17
2      22
3
4
5      45
6      55
7      37
8
9      49
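The double hashing example can be reproduced with the Python sketch below. The name insert_double and the parameter m_prime (the prime M = 7 from the example) are assumptions, and None marks an empty slot.

```python
def insert_double(table, key, m_prime=7):
    """Insert key at (H1 + i * H2) % size, with H2 = M - (key % M)."""
    size = len(table)
    h1 = key % size
    h2 = m_prime - (key % m_prime)        # always in 1 .. M, so never zero
    for i in range(size):
        slot = (h1 + i * h2) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no empty slot found on the probe sequence")


table = [None] * 10
for key in (37, 90, 45, 22, 17, 49, 55):
    insert_double(table, key)
print(table)
```

One design note: with a table size of 10, a step of 2, 4, 5 or 6 shares a factor with the size, so the probe sequence cannot visit every cell. Choosing a prime table size avoids this, which is one way to satisfy the rule that all cells can be probed.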
Practice Question on Double Hashing
Keys: 46,28,21,35,57,39,19,50
h1(x) = x mod 11
h2(x) = M - (x mod M)    // where M = 7
Rehashing
When the hash table becomes too full in open addressing
hashing, successive insertion operations take more time to
complete. To overcome this situation, the rehashing technique can be
used.
In rehashing, another hash table is built that is about
twice as big. The entire original hash table is scanned,
the new hash value is computed for each element, and the element is inserted
in the new table.
For example: insert 13, 15, 24, 6, 23 into a hash table of size 7.
Hash function: h(x) = x mod 7
Rehashing(cont..)
Since the table is now more than 70 percent full (5 of 7 slots), a new table is created.
The size of this table will be 17, because this is the first prime that is at least
twice as large as the old table size.
New hash function: h(x) = x mod 17
Now the old table is scanned and the elements are inserted into the new table.
Rehashing can be implemented in several ways:
Rehash as soon as the table is half full.
Rehash only when an insertion fails.
Rehash when the table reaches a certain load factor.
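The rehashing example can be sketched in Python as follows. The slides do not say which collision resolution is used, so linear probing is an assumption here, as are the helper names is_prime, next_prime, insert and rehash. The new size is the first prime that is at least twice the old size.

```python
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


def insert(table, key):
    """Linear probing insert with h(x) = x mod table size (assumed scheme)."""
    size = len(table)
    for i in range(size):
        slot = (key % size + i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


def rehash(old_table):
    """Build a table about twice as big and reinsert every stored key."""
    new_table = [None] * next_prime(2 * len(old_table))
    for key in old_table:                 # scan the old table top to bottom
        if key is not None:
            insert(new_table, key)
    return new_table


old = [None] * 7
for key in (13, 15, 24, 6, 23):
    insert(old, key)
new = rehash(old)
print(old)
print(new)
```

Under these assumptions the old table holds 6, 15, 23, 24 and 13 at indices 0, 1, 2, 3 and 6, and the new 17-slot table holds 6, 23, 24, 13 and 15 at indices 6, 7, 8, 13 and 15.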
Extendible hashing
The major problem with open hashing or closed hashing is that collisions
could cause several blocks to be examined during a find, even for a
well-distributed hash table. Also, when the table gets too full, an
extremely expensive rehashing step must be performed.
To avoid these problems, extendible hashing is used.
In extendible hashing, the hash table is stored in main memory
and the buckets are stored on disk.
Each value in the hash table is a pointer to a bucket in
secondary memory.
The hash function is applied to the input value to generate the bucket pointer
and perform the store and retrieve operations.
Extendible hashing(cont..)
Example: suppose our data consists of 6-bit integer values. The root of the
tree contains four pointers determined by the leading two bits of the data.
Each leaf has up to m = 4 elements.
In each leaf the first two bits are identical; this is indicated by the number in
parentheses. D represents the number of bits used by the root, which is
sometimes known as the directory.
The number of entries in the directory is thus 2^D.
D_L is the number of leading bits that all the elements of some leaf L have in
common. D_L <= D
Directory (D = 2) and its four leaves:
00        01        10        11
000100    010100    100000    111000
001000    011000    101000    111001
001010              101100
001011              101110
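The directory lookup in this example can be sketched in Python. The sketch only shows how the leading D bits of a 6-bit key select a leaf; it does not implement leaf splitting or directory doubling. The name build_directory and the use of in-memory lists in place of disk buckets are assumptions for illustration.

```python
def build_directory(keys, depth, bits=6):
    """Group keys into leaves by their leading `depth` bits."""
    directory = {format(i, f"0{depth}b"): [] for i in range(2 ** depth)}
    for key in keys:
        pattern = format(key, f"0{bits}b")     # key as a fixed-width bit string
        directory[pattern[:depth]].append(pattern)
    return directory


keys = [0b000100, 0b001000, 0b001010, 0b001011,
        0b010100, 0b011000,
        0b100000, 0b101000, 0b101100, 0b101110,
        0b111000, 0b111001]
directory = build_directory(keys, depth=2)
for prefix, leaf in directory.items():
    print(prefix, leaf)
```

With D = 2 the directory has 2^2 = 4 entries, and the leaves for 00 and 10 are already at the limit of m = 4 elements.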
Gate Questions on Hashing
Given the following input (4322, 1334, 1471, 9679, 1989, 6171, 6173, 4199)
and the hash function x mod 10, which of the following statements are true? (GATE 2004)
i) 9679, 1989, 4199 hash to the same value
ii) 1471, 6171 hash to the same value
iii) All elements hash to the same value
iv) Each element hashes to a different value
(a) i only
(b) ii only
(c) i and ii only
(d) iii or iv
Answer: C
Gate Questions on Hashing
A hash table contains 10 buckets and uses linear probing to resolve collisions.
The key values are integers and the hash function used is key % 10. If the
values 43, 165, 62, 123, 142 are inserted in the table, in what location would
the key value 142 be inserted? (GATE 2005)
(A) 2
(B) 3
(C) 4
(D) 6
Answer: D
Gate Questions on Hashing
The keys 12, 18, 13, 2, 3, 23, 5 and 15 are inserted into an initially empty hash
table of length 10 using open addressing with hash function h(k) = k mod 10
and linear probing. What is the resultant hash table?
Answer: C