Hashing

Hashing is a search technique that allows for constant time complexity in searching elements, unlike linear and binary searches which depend on the number of elements. It utilizes a hash table to store data indexed by a hash key generated from a hash function, and handles collisions through methods like separate chaining and open addressing. Various hash functions, such as division, mid-square, folding, and multiplication methods, are used to map keys to indices in the hash table.

Uploaded by

hp1509032014
Copyright
© © All Rights Reserved

Hashing

In search techniques such as linear search and binary search, the time required to
search for an element depends on the total number of elements in the array or list.
As the number of elements increases, the search time also increases: linearly for
linear search and logarithmically for binary search.

Hashing is another approach, in which the time required to search for an element
doesn't depend on the number of elements. Using a hashing data structure, an
element can be searched in constant time on average. Hashing is an effective way to
reduce the number of comparisons needed to search for an element in a data structure.

Hashing is the process of indexing and retrieving an element (data) in a data
structure to provide a faster way of finding the element using the hash key.

Here, the hash key is a value which provides the index where the actual data is
likely to be stored in the data structure.

In this data structure, we use a concept called a hash table to store data. All the
data values are inserted into the hash table based on the hash key value.

The hash key value is used to map the data to an index in the hash table, and the
hash key is generated for every data item using a hash function. That means every
entry in the hash table is placed according to the key value generated by the hash
function.

A hash table is defined as follows...

A hash table is just an array which maps a key (data) into the data structure
with the help of a hash function, such that insertion, deletion and search
operations can be performed with constant time complexity on average (i.e. O(1)).

Hash tables are used to perform operations like insertion, deletion and search
very quickly. Generally, every hash table makes use of a function, which we'll call
the hash function, to map the data into the hash table.

A hash function is defined as follows...

A hash function is a function which takes a piece of data (i.e. key) as input and
outputs an integer (i.e. hash value) which maps the data to a particular index
in the hash table.

[Figure: basic concept of hashing and hash table]

What is Collision?
Since a hash function maps a key, which may be a big integer or a string, to a small
number, there is a possibility that two keys result in the same value. The situation
where a newly inserted key maps to an already occupied slot in the hash table is
called a collision, and it must be handled using some collision handling technique.

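Example: keys 24, 19, 32, 44 are inserted into a hash table of size 6 (indices 0 to 5)
using H(k) = k mod 6.

H(24) = 24 % 6 = 0
H(19) = 19 % 6 = 1
H(32) = 32 % 6 = 2
H(44) = 44 % 6 = 2 (collision: slot 2 already holds 32)

Index  Key
0      24
1      19
2      32
3
4
5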
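
The collision in this example can be reproduced with a few lines of Python. This is only a minimal sketch: the names `h`, `table` and `collisions` are illustrative choices and do not come from the notes.

```python
# Minimal sketch: detect collisions for H(k) = k mod 6.
TABLE_SIZE = 6  # slots 0 to 5

def h(key):
    """Division hash: map an integer key to a slot index."""
    return key % TABLE_SIZE

table = [None] * TABLE_SIZE
collisions = []  # (new key, slot, key already stored in that slot)

for key in [24, 19, 32, 44]:
    slot = h(key)
    if table[slot] is None:
        table[slot] = key
    else:
        # The slot is occupied: this is a collision and is only recorded here.
        collisions.append((key, slot, table[slot]))

print(table)       # [24, 19, 32, None, None, None]
print(collisions)  # [(44, 2, 32)]
```

The sketch only reports the collision; it does not resolve it. Resolving it is the job of a collision handling technique.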

How to handle Collisions?

There are mainly two methods to handle collisions:

1) Separate Chaining
2) Open Addressing

1. Separate Chaining:
The idea is to make each cell of the hash table point to a linked list of records that
have the same hash function value.

Let us consider a simple hash function, “key mod 7”, and the sequence of keys 50,
700, 76, 85, 92, 73, 101.

Index  Chain
0      700
1      50 -> 85 -> 92
2
3      73 -> 101
4
5
6      76
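
A minimal sketch of separate chaining for the same keys follows. It assumes Python lists as a stand-in for the linked lists, and the function names `insert` and `search` are illustrative.

```python
# Sketch of separate chaining with h(k) = k mod 7.
# Each slot holds a Python list that plays the role of the linked list.
TABLE_SIZE = 7

def h(key):
    return key % TABLE_SIZE

def insert(table, key):
    """Append the key to the chain of the slot it hashes to."""
    table[h(key)].append(key)

def search(table, key):
    """Scan only the chain of the slot the key hashes to."""
    return key in table[h(key)]

table = [[] for _ in range(TABLE_SIZE)]
for key in [50, 700, 76, 85, 92, 73, 101]:
    insert(table, key)

for index, chain in enumerate(table):
    print(index, chain)
# 0 [700]
# 1 [50, 85, 92]
# 2 []
# 3 [73, 101]
# 4 []
# 5 []
# 6 [76]
```

A search for 92 only walks the chain at index 1, so its cost depends on the chain length and not on the total number of keys.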

Advantages:
1) Simple to implement.
2) The hash table never fills up; we can always add more elements to a chain.
3) Less sensitive to the hash function or load factor.
4) It is mostly used when it is unknown how many keys may be inserted or deleted,
and how frequently.

Disadvantages:
1) Cache performance of chaining is not good, as keys are stored in linked lists.
Open addressing provides better cache performance, as everything is stored in the
same table.
2) Wastage of space (some parts of the hash table are never used).
3) If a chain becomes long, search time can become O(n) in the worst case.
4) Uses extra space for links.

2. Open Addressing

Like separate chaining, open addressing is a method for handling collisions. In
open addressing, all elements are stored in the hash table itself. So at any point, the
size of the table must be greater than or equal to the total number of keys (note that
we can increase the table size by copying the old data if needed).

Insert(k): Keep probing until an empty slot is found. Once an empty slot is found,
insert k.

Search(k): Keep probing until the slot’s key becomes equal to k or an empty
slot is reached.

Delete(k): If we simply delete a key, then a later search may fail. So slots of deleted
keys are marked specially as “deleted”.
Insert can place an item in a deleted slot, but search doesn’t stop at a deleted slot.
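
The three operations can be sketched as below. This is an illustration under stated assumptions, not a definitive implementation: it assumes integer keys, the hash function k mod size, and a probe step of 1. The class name and the `EMPTY` / `DELETED` markers are hypothetical.

```python
# Sketch of open addressing with "deleted" markers (probe step of 1 assumed).
EMPTY = object()    # slot that has never been used
DELETED = object()  # slot whose key was removed

class OpenAddressingTable:
    def __init__(self, size):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield every slot index once, starting at key mod size."""
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def insert(self, key):
        for index in self._probe(key):
            if self.slots[index] is EMPTY or self.slots[index] is DELETED:
                self.slots[index] = key   # deleted slots may be reused
                return index
        raise OverflowError("hash table is full")

    def search(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return None               # a never-used slot ends the search
            if slot is not DELETED and slot == key:
                return index              # deleted slots are skipped over
        return None

    def delete(self, key):
        index = self.search(key)
        if index is not None:
            self.slots[index] = DELETED   # mark the slot, do not empty it
        return index

t = OpenAddressingTable(7)
for key in [50, 85, 92]:   # all three keys hash to slot 1
    t.insert(key)
t.delete(85)               # slot 2 is now marked as deleted
print(t.search(92))        # 3: the search continues past the deleted slot
```

If the slot of 85 were reset to empty instead of being marked, the search for 92 would stop at slot 2 and wrongly report that 92 is missing.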

Open addressing is done in the following ways:

a) Linear Probing: In linear probing, we linearly probe for the next slot. The typical
gap between two probes is 1, as in the example below.
Let hash(x) be the slot index computed using the hash function and S be the table size.

If slot hash(x) % S is full, then we try (hash(x) + 1) % S

If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S

If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S

and so on, until an empty slot is found.

Let us consider a simple hash function, “key mod 7”, and the sequence of keys 50,
700, 76, 85, 92, 73, 101.

H(k) = k mod 7, for example 85 % 7 = 1

H’(k, i) = (H(k) + i) mod 7

Clustering: The main problem with linear probing is clustering. Many consecutive
elements form groups, and it then takes longer to find a free slot or to search for an
element.

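Example: H(k) = k mod 10, H’(k, i) = (H(k) + i) mod 10

Keys: 43, 135, 72, 23, 99, 19, 82

43 % 10 = 3
135 % 10 = 5
72 % 10 = 2
23 % 10 = 3, occupied, so (3 + 1) mod 10 = 4
99 % 10 = 9
19 % 10 = 9, occupied, so (9 + 1) mod 10 = 0
82 % 10 = 2, slots 2 to 5 are occupied, so (2 + 4) mod 10 = 6

Index  Key
0      19
1
2      72
3      43
4      23
5      135
6      82
7
8
9      99

In the worst case, searching with linear probing takes O(n) time.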
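
The linear probing examples can be checked with a short sketch. The function name `linear_probe_insert` and the `probes` dictionary are illustrative; the dictionary records how many slots were examined for each key, which gives a rough picture of clustering.

```python
# Sketch: insert keys with linear probing, H'(k, i) = (k mod S + i) mod S.
def linear_probe_insert(keys, size):
    table = [None] * size
    probes = {}  # key -> number of slots examined before the key was placed
    for key in keys:
        for i in range(size):
            index = (key % size + i) % size
            if table[index] is None:
                table[index] = key
                probes[key] = i + 1
                break
        else:
            raise OverflowError("hash table is full")
    return table, probes

table, probes = linear_probe_insert([43, 135, 72, 23, 99, 19, 82], 10)
print(table)   # [19, None, 72, 43, 23, 135, 82, None, None, 99]
print(probes)  # {43: 1, 135: 1, 72: 1, 23: 2, 99: 1, 19: 2, 82: 5}
```

Key 82 needs five probes because slots 2 to 5 have grown into one cluster. That is the clustering effect in miniature.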

b) Quadratic Probing: We look for the i^2’th slot in the i’th iteration.

Let hash(x) be the slot index computed using the hash function.

If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S

If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S

If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S

Example: H(k) = k mod 10

H’(k, i) = (H(k) + i^2) mod 10

Keys: 42, 16, 91, 33, 18, 27, 36, 62
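
The notes list the keys but not the resulting table, so the sketch below works the example through. It assumes at most S probes per key, which is a choice made here and not something the notes state. The function name is illustrative.

```python
# Sketch: quadratic probing, H'(k, i) = (k mod S + i*i) mod S.
def quadratic_probe_insert(table, key):
    """Return the slot used, or None if no probe finds a free slot."""
    size = len(table)
    for i in range(size):  # assumption: give up after S probes
        index = (key % size + i * i) % size
        if table[index] is None:
            table[index] = key
            return index
    return None

table = [None] * 10
placed = {key: quadratic_probe_insert(table, key)
          for key in [42, 16, 91, 33, 18, 27, 36, 62]}
print(placed)  # {42: 2, 16: 6, 91: 1, 33: 3, 18: 8, 27: 7, 36: 0, 62: None}
print(table)   # [36, 91, 42, 33, None, None, 16, 27, 18, None]
```

Under these assumptions 62 is not placed even though slots 4, 5 and 9 are still free: starting from slot 2, the probe sequence for a table of size 10 only ever visits slots 2, 3, 6, 1, 8 and 7. Quadratic probing can miss free slots in this way, particularly when the table size is not prime.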

c) Double Hashing: We use another hash function hash2(x) and look for the
i*hash2(x)’th slot in the i’th iteration.

Let hash(x) be the slot index computed using the hash function.

If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S

If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S

If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
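
A minimal sketch of double hashing follows. The notes do not specify hash2, so the second hash function used here, hash2(x) = R - (x mod R) with R a prime smaller than S, is an assumption (a commonly used choice). The keys and the values S = 7 and R = 5 are also illustrative.

```python
# Sketch: double hashing. hash2 is an assumed choice; the notes do not give one.
S = 7  # table size
R = 5  # assumed prime smaller than S

def hash1(key):
    return key % S

def hash2(key):
    return R - (key % R)  # always between 1 and R, so the step is never 0

def double_hash_insert(table, key):
    for i in range(S):
        index = (hash1(key) + i * hash2(key)) % S
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("no free slot found")

table = [None] * S
for key in [50, 85, 92]:  # all three keys have hash1 equal to 1
    double_hash_insert(table, key)
print(table)  # [None, 50, None, None, 92, None, 85]
```

Although 50, 85 and 92 collide on slot 1, 85 and 92 use different step sizes (5 and 3), so they land in slots 6 and 4 and do not form a cluster next to slot 1.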

Comparison of the above three:

1) Linear probing has the best cache performance, but suffers from clustering. One
more advantage of linear probing is that it is easy to compute.
2) Quadratic probing lies between the other two in terms of cache performance and
clustering.
3) Double hashing has poor cache performance but no clustering. Double hashing
requires more computation time, as two hash functions need to be computed.

Open Addressing vs. Separate Chaining

Advantages of Chaining:
1) Chaining is simpler to implement.
2) In chaining, the hash table never fills up; we can always add more elements to a
chain. In open addressing, the table may become full.
3) Chaining is less sensitive to the hash function or load factor.
4) Chaining is mostly used when it is unknown how many keys may be inserted or
deleted, and how frequently.
5) Open addressing requires extra care to avoid clustering and to keep the load
factor low.

Advantages of Open Addressing:
1) Cache performance of chaining is not good, as keys are stored in linked lists.
Open addressing provides better cache performance, as everything is stored in the
same table.
2) Chaining wastes space (some parts of the hash table are never used). In
open addressing, a slot can be used even if no input maps to it.
3) Chaining uses extra space for links.

Rehashing:

Rehashing is a technique in which the table is resized, i.e. the size of the table is
doubled by creating a new table.

It is preferable if the total size of the table is a prime number.

Rehashing is performed in the following situations:

1) When the table is completely full.
2) When an insertion fails due to overflow.

Example:

Keys: 37, 90, 55, 22, 17, 49, 87

Table size = 10

H(key) = key mod table size
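
The notes do not show the finished example, so the following is only a sketch under stated assumptions: collisions are resolved with linear probing, and the new size is the first prime at or above twice the old size. The helpers `is_prime`, `next_prime`, `insert` and `rehash` are hypothetical names.

```python
# Sketch of rehashing (assumes linear probing and a prime new size).
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def next_prime(n):
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n

def insert(table, key):
    """Linear probing insert; returns False on overflow."""
    size = len(table)
    for i in range(size):
        index = (key % size + i) % size
        if table[index] is None:
            table[index] = key
            return True
    return False

def rehash(table):
    """Create a larger table and re-insert every key with the new size."""
    new_table = [None] * next_prime(2 * len(table))
    for key in table:
        if key is not None:
            insert(new_table, key)
    return new_table

table = [None] * 10
for key in [37, 90, 55, 22, 17, 49, 87]:
    insert(table, key)
print(table)        # [90, 87, 22, None, None, 55, None, 37, 17, 49]

bigger = rehash(table)
print(len(bigger))  # 23
```

Every key is hashed again with the new table size, because key mod 10 and key mod 23 generally give different indices. In the larger table of this sketch, each of the seven keys lands directly in its home slot.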

Types of Hash functions

1. Division Method.

2. Mid Square Method.

3. Folding Method.

4. Multiplication Method.

1. Division Method:

This is the simplest and easiest method to generate a hash value. The hash
function divides the value k by M and then uses the remainder obtained.

Formula:

h(k) = k mod M

Here,
k is the key value, and
M is the size of the hash table.

It is best if M is a prime number, as that helps the keys to be distributed more
uniformly. The hash function is dependent upon the remainder of a
division.

Example:

k = 12345
M = 95
h(12345) = 12345 mod 95
= 90

k = 1276
M = 11
h(1276) = 1276 mod 11
= 0

So 1276 is stored at index 0 of a table with indices 0 to 10.

Keys 54, 72, 89, 37 with table size M = 10:

H(54) = 54 % 10 = 4
H(72) = 72 % 10 = 2
H(89) = 89 % 10 = 9
H(37) = 37 % 10 = 7

Index  Key
0
1
2      72
3
4      54
5
6
7      37
8
9      89


Advantages:

1. This method can be used with any value of M.

2. The division method is very fast since it requires only a single division
operation.

Disadvantages:

1. This method can lead to poor performance, since consecutive keys map to
consecutive hash values in the hash table.

2. Sometimes extra care should be taken to choose the value of M.
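
A short sketch of the division method shows why the choice of M matters. The function name `division_hash` and the keys 10 to 50 are illustrative assumptions.

```python
def division_hash(key, m):
    """Division method: h(k) = k mod M."""
    return key % m

print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0

# Keys that are all multiples of 10: a poor match for M = 10.
keys = [10, 20, 30, 40, 50]
print([division_hash(k, 10) for k in keys])  # [0, 0, 0, 0, 0]
print([division_hash(k, 11) for k in keys])  # [10, 9, 8, 7, 6]
```

With M = 10 every key collides at index 0, while the prime M = 11 spreads the same keys over five different slots.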

2. Mid Square Method:

The mid-square method is a very good hashing method. It involves two steps to
compute the hash value:

1. Square the value of the key k, i.e. k^2.

2. Extract the middle r digits as the hash value.

Formula:

h(k) = middle r digits of (k x k)

Here,
k is the key value.

The value of r can be decided based on the size of the table.

Example:

k = 60
r = 2
k x k = 60 x 60
= 3600
h(60) = 60

The middle two digits of 3600 are 60, so the hash value obtained is 60.
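
The mid-square steps can be sketched as below. The notes do not say which digits count as the middle when the split cannot be exactly centred, so the rule used here (lean towards the left) is an assumption, as is the function name `mid_square_hash`.

```python
def mid_square_hash(key, r):
    """Square the key and return its middle r digits as an integer."""
    squared = str(key * key).zfill(r)  # pad so that at least r digits exist
    start = (len(squared) - r) // 2    # assumed rule: lean left if off-centre
    return int(squared[start:start + r])

print(mid_square_hash(60, 2))    # 60  (60 x 60 = 3600, middle digits "60")
print(mid_square_hash(3111, 3))  # 783 (3111 x 3111 = 9678321, middle "783")
```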

3. Digit Folding Method:

This method involves two steps:

1. Divide the key value k into a number of parts, i.e. k1, k2, k3, …, kn, where
each part has the same number of digits except for the last part, which can have
fewer digits than the other parts.

2. Add the individual parts. The hash value is obtained by ignoring the last
carry, if any.

Formula:

k = k1, k2, k3, k4, …, kn
s = k1 + k2 + k3 + k4 + … + kn
h(k) = s

Here,
s is obtained by adding the parts of the key k.

Example:

k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51
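
A minimal sketch of digit folding follows. It assumes parts of two digits by default and treats "ignoring the last carry" as keeping only the lowest `part_len` digits of the sum. Both the function name and that reading are assumptions.

```python
def folding_hash(key, part_len=2):
    """Split the key into parts of part_len digits, add them, drop the carry."""
    digits = str(key)
    parts = [int(digits[i:i + part_len])
             for i in range(0, len(digits), part_len)]
    total = sum(parts)
    return total % (10 ** part_len)  # keep only the lowest part_len digits

print(folding_hash(12345))   # 51  (12 + 34 + 5 = 51, no carry)
print(folding_hash(987654))  # 28  (98 + 76 + 54 = 228, the carry 2 is dropped)
```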

4. Multiplication Method

This method involves the following steps:

1. Choose a constant value A such that 0 < A < 1.

2. Multiply the key value with A.

3. Extract the fractional part of kA.

4. Multiply the result of the above step by the size of the hash table i.e. M.

5. The resulting hash value is obtained by taking the floor of the result obtained
in step 4.

Formula:

h(k) = floor(M (kA mod 1))

Here,
M is the size of the hash table.
k is the key value.
A is a constant value.

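Example:

k = 12345
Donald Knuth suggested using A = (sqrt(5) - 1) / 2 = 0.6180339887...
M = 100

h(12345) = floor[100 (12345 x 0.6180339887 mod 1)]
= floor[100 (7629.6296 mod 1)]
= floor[62.96]
= 62

Example:

Let key = 107, and assume M = 50

A = 0.6180339887

h(107) = floor[50 (107 x 0.6180339887 mod 1)]
= floor[50 (66.1296 mod 1)]
= floor[50 x 0.1296]
= floor[6.48]
= 6

107 will be placed at index 6 in the hash table.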
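
The five steps of the multiplication method fit in a few lines. This sketch assumes A = (sqrt(5) - 1) / 2, which is about 0.6180339887, and uses floating-point arithmetic, which is accurate enough for small keys like these. The function name `multiplication_hash` is illustrative.

```python
import math

A = (math.sqrt(5) - 1) / 2  # about 0.6180339887

def multiplication_hash(key, m, a=A):
    """h(k) = floor(M * (k*A mod 1))."""
    fractional = (key * a) % 1.0  # fractional part of k*A
    return math.floor(m * fractional)

print(multiplication_hash(107, 50))     # 6
print(multiplication_hash(12345, 100))  # 62
```

Unlike the division method, the table size M only scales the result here, so M does not need to be prime.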

Extraction

In this method, some digits are extracted from the key to form the address location
in the hash table.

For example:

Suppose the first, third and fourth digits from the left are selected for the hash key.

497824

478 -> the key can be stored at location 478 in a hash table of size 1000
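
The extraction step can be sketched as follows. The function name `extraction_hash` and its `positions` parameter are hypothetical. Positions are counted from the left, starting at 1, to match the wording of the example.

```python
def extraction_hash(key, positions=(1, 3, 4)):
    """Build the hash from selected digit positions (1-based, from the left)."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))

print(extraction_hash(497824))  # 478
```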

Mid-square example: for k = 3111, k x k = 9678321, and the middle three digits give

H(3111) = 783
