UNIT III: SPECIAL TREES AND HASHING
AVL Tree
B Tree
Trie
Hashing
Separate Chaining
Open Addressing
Linear Probing & Quadratic Probing
Double Hashing & Rehashing
AVL Tree
An AVL tree (named after its inventors Adelson-Velsky and Landis) is a self-balancing binary search tree.
An AVL tree can be defined as a height-balanced binary search tree in which each node is associated with a balance factor, calculated by subtracting the height of the node's right sub-tree from the height of its left sub-tree.
Balance Factor(k) = height(left(k)) - height(right(k))
In an AVL tree, the balance factor of every node is either 0, 1, or -1.
If the balance factor of a node is 1, its left sub-tree is one level higher than its right sub-tree.
If the balance factor of a node is 0, its left and right sub-trees have equal height.
If the balance factor of a node is -1, its left sub-tree is one level lower than its right sub-tree.
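As a small illustration, the balance factor can be computed directly from the subtree heights. The following Python sketch is illustrative only; the names Node, height and balance_factor are assumptions, not from the slides.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def height(node):
    # The height of an empty subtree is taken as -1, so a single leaf has height 0.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # Balance Factor(k) = height(left(k)) - height(right(k))
    return height(node.left) - height(node.right)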
BALANCED AVL TREE
AVL Tree Operations
As with a BST, the operations commonly performed on an AVL tree are:
1. Search Operation
2. Insertion Operation
3. Deletion Operation
After performing any operation on an AVL tree, the balance factor of each node is checked.
AVL Tree Rotations
1. Left Rotation (LL Rotation)
2. Right Rotation (RR Rotation)
3. Left-Right Rotation (LR Rotation)
4. Right-Left Rotation (RL Rotation)
LL Rotation
◦ When a node is added into the right subtree of the right subtree and the tree becomes unbalanced, a single left rotation is performed.
RR Rotation
If a node is added to the left subtree of the left subtree and the AVL tree becomes unbalanced, a single right rotation is performed.
Left-Right Rotation:
◦ A left-right rotation is a combination in which a left rotation is performed first, followed by a right rotation.
Right-Left Rotation:
◦ A right-left rotation is a combination in which a right rotation is performed first, followed by a left rotation.
Insertion in AVL Tree
To insert an element into an AVL tree, follow these steps:
• Insert the element into the AVL tree in the same way insertion is performed in a BST.
• After insertion, check the balance factor of each node of the resulting tree.
Now, the following two cases are possible:
Case-01:
After the insertion, the balance factor of each node is either 0, 1, or -1.
In this case, the tree is considered balanced.
Conclude the operation.
Insert the next element, if any.
Case-02:
After the insertion, the balance factor of at least one node is not 0, 1, or -1.
In this case, the tree is considered imbalanced.
Perform the suitable rotation to balance the tree.
After the tree is balanced, insert the next element, if any.
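The rotation cases above can be combined into a working insertion routine. The Python sketch below is illustrative only, assuming names such as AVLNode, rotate_left, rotate_right and insert that are not from the slides; it performs an ordinary BST insertion and then applies the appropriate single or double rotation wherever a node's balance factor leaves the range -1..1.

class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0                      # a leaf has height 0

def h(n):
    return n.height if n else -1

def update(n):
    n.height = 1 + max(h(n.left), h(n.right))

def rotate_right(y):                         # single right rotation (left-left case)
    x = y.left
    y.left = x.right
    x.right = y
    update(y); update(x)
    return x

def rotate_left(x):                          # single left rotation (right-right case)
    y = x.right
    x.right = y.left
    y.left = x
    update(x); update(y)
    return y

def insert(root, key):
    # Step 1: ordinary BST insertion.
    if root is None:
        return AVLNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    update(root)
    # Step 2: check the balance factor and rotate if it is outside -1..1.
    bf = h(root.left) - h(root.right)
    if bf > 1:                               # left subtree is too tall
        if key < root.left.key:              # left-left: single right rotation
            return rotate_right(root)
        root.left = rotate_left(root.left)   # left-right: double rotation
        return rotate_right(root)
    if bf < -1:                              # right subtree is too tall
        if key > root.right.key:             # right-right: single left rotation
            return rotate_left(root)
        root.right = rotate_right(root.right)  # right-left: double rotation
        return rotate_left(root)
    return root

# Building the tree from the sequence used in the example below:
root = None
for k in [50, 20, 60, 10, 8, 15, 32, 46, 11, 48]:
    root = insert(root, k)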
Construct an AVL tree for the following sequence of numbers:
50, 20, 60, 10, 8, 15, 32, 46, 11, 48
Step-01: Insert 50
Step-02: Insert 20
Step-03: Insert 60
Step-04: Insert 10
Step-05: Insert 8
Step-06: Insert 15
Step-07: Insert 32
Step-08: Insert 46
Step-09: Insert 11
Step-10: Insert 48
AVL Deletion
Deleting a node from an AVL tree is similar to deletion in a binary search tree.
Deletion may also disturb the balance factors of the AVL tree, and therefore the tree needs to be rebalanced in order to maintain the AVL property.
For this purpose, we need to perform rotations.
Delete 30
Pros and cons
Pros of AVL Trees
Faster search operations than counterparts such as red-black trees, because AVL trees are more strictly balanced.
AVL trees self-balance, so search, insert and delete all run in O(log n) time.
It is still a BST (with balancing), so items can be traversed in sorted order.
Cons of AVL Trees
Slower inserts and deletes: even a slight imbalance must be corrected with rotations, which adds to the time taken by inserts and deletes.
It is more difficult to implement than a normal BST, though easier than a red-black tree.
B - Tree
In search trees like binary search tree, AVL Tree, Red-Black
tree, etc., every node contains only one value (key) and a
maximum of two children.
But there is a special type of search tree, called a B-Tree, in which a node can contain more than one value (key) and can have more than two children.
A B-Tree is a self-balanced search tree in which a node can hold multiple keys and can have more than two children.
PROPERTIES OF B TREE
B-Tree of Order m has the following properties...
Property 1 - All leaf nodes must be at the same level.
Property 2 - All nodes except the root must have at least ⌈m/2⌉ - 1 keys and at most m - 1 keys.
Property 3 - All non-leaf nodes except the root (i.e. all internal nodes) must have at least ⌈m/2⌉ children.
Property 4 - If the root node is a non-leaf node, then it must have at least 2 children.
Property 5 - A non-leaf node with n - 1 keys must have n children.
Property 6 - All the key values in a node must be in ascending order.
Operations on a B-Tree
The following operations are performed on a B-Tree...
Search
Insertion
Deletion
Insertion Operation in B-Tree
•Step 1 - Check whether the tree is empty.
•Step 2 - If the tree is empty, then create a new node with the new key value and insert it into the tree as the root node.
•Step 3 - If the tree is not empty, then find the suitable leaf node to which the new key value should be added, using binary search tree logic.
•Step 4 - If that leaf node has an empty position, add the new key value to that leaf node, keeping the key values within the node in ascending order.
•Step 5 - If that leaf node is already full, split the leaf node by sending its middle value up to the parent node. Repeat this until the value being moved up fits into a node.
•Step 6 - If the splitting reaches the root node, then the middle value becomes the new root of the tree and the height of the tree increases by one.
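These steps can be expressed compactly in code using the classic minimum-degree formulation of a B-Tree (t = minimum degree, so every node except the root holds between t-1 and 2t-1 keys; this corresponds to even orders m = 2t rather than the order-3 example that follows). The Python sketch below is illustrative only, and names such as BTreeNode, split_child and _insert_nonfull are assumptions, not from the slides.

class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf

class BTree:
    def __init__(self, t=2):                     # t=2 gives a 2-3-4 tree (order 4)
        self.t = t
        self.root = BTreeNode(leaf=True)

    def split_child(self, parent, i):
        # Split the full child parent.children[i] and move its middle key up.
        t = self.t
        child = parent.children[i]
        new = BTreeNode(leaf=child.leaf)
        mid = child.keys[t - 1]
        new.keys = child.keys[t:]
        child.keys = child.keys[:t - 1]
        if not child.leaf:
            new.children = child.children[t:]
            child.children = child.children[:t]
        parent.keys.insert(i, mid)
        parent.children.insert(i + 1, new)

    def insert(self, key):
        root = self.root
        if len(root.keys) == 2 * self.t - 1:     # root is full: grow in height
            new_root = BTreeNode(leaf=False)
            new_root.children.append(root)
            self.split_child(new_root, 0)
            self.root = new_root
        self._insert_nonfull(self.root, key)

    def _insert_nonfull(self, node, key):
        i = len(node.keys) - 1
        if node.leaf:
            # Insert the key into the leaf, keeping keys in ascending order.
            node.keys.append(None)
            while i >= 0 and key < node.keys[i]:
                node.keys[i + 1] = node.keys[i]
                i -= 1
            node.keys[i + 1] = key
        else:
            while i >= 0 and key < node.keys[i]:
                i -= 1
            i += 1
            if len(node.children[i].keys) == 2 * self.t - 1:
                self.split_child(node, i)        # split the full child first
                if key > node.keys[i]:
                    i += 1
            self._insert_nonfull(node.children[i], key)

# Usage sketch: tree = BTree(t=2); for k in range(1, 11): tree.insert(k)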
Example
Construct a B-Tree of Order 3 by inserting values 1,2,3,4,5,6,7,8,9,10
Delete Operation
The delete operation has more rules than insert and search operations.
The following algorithm applies:
•Run the search operation and find the target key in the nodes.
•Three cases apply, based on the location of the target key, as explained in the following sections.
If the target key is in the leaf node
Case 1: The target is in a leaf node that has more than the minimum number of keys.
Deleting the key will not violate the B-Tree properties, so it is simply removed.
Case 2: The target is in a leaf node that has only the minimum number of keys.
Deleting the key directly would violate the B-Tree properties.
The target node can borrow a key from its immediate left or immediate right sibling.
A sibling can lend a key only if it has more than the minimum number of keys.
The key is borrowed through the parent node: the sibling's extreme key (the maximum of a left sibling, or the minimum of a right sibling) moves up to the parent, the parent's separating key moves down into the target node, and then the target value is removed.
If the target key is in an internal node
•Choose either the in-order predecessor or the in-order successor.
•In the case of the in-order predecessor, the maximum key from the target's left subtree is selected.
•In the case of the in-order successor, the minimum key from the target's right subtree is selected.
•If the target key's in-order predecessor has more than the minimum number of keys, the target key can be replaced with the maximum of the in-order predecessor.
•If the target key's in-order predecessor does not have more than the minimum number of keys, look for the in-order successor's minimum key instead.
•If both the in-order predecessor and the in-order successor have only the minimum number of keys, then merge the predecessor and successor.
If the target key is in a root node
•Replace the target with the maximum element of the in-order predecessor subtree.
•If, after deletion, the target node has fewer than the minimum number of keys, it borrows a key from its sibling via the sibling's parent.
•The parent's separating key moves down into the target node, and the maximum key of the sibling moves up into the parent.
Example:
Delete H
Since H is in a leaf and the leaf has more than the minimum number of keys, this
is easy. We move the I over where the H had been. This gives:
Delete R
Since R is not in a leaf, we find its successor (the next item in ascending order),
which happens to be S, and move S up to replace the R.
Target is in a leaf node, but no sibling has more than the minimum number of keys
Search for the key.
Merge the leaf with a sibling, pulling down the separating key from the parent node.
The total number of keys in the merged node is now more than the minimum.
The target key can then be removed from the merged node.
Delete P
Although P is in a leaf, this leaf does not have an extra key; the deletion
results in a node with only one key, which is not acceptable for a B-tree
of order 5.
If the sibling node to the immediate left or right has an extra key, we can
then borrow a key from the parent and move a key up from this sibling.
In our specific case, the sibling to the right has an extra key. So, the
successor of P, which is S, is moved down from the parent, and the T is
moved up.
Delete D
Although D is in a leaf, the leaf has no extra keys, nor do the siblings to
the immediate right or left.
In such a case the leaf has to be combined with one of these two siblings.
This includes moving down the parent's key that was between those of
these two leaves.
In our example, let's combine the leaf containing E with the leaf
containing A B. We also move down the C.
Search Operation in B-Tree
•Step 1 - Read the search element from the user.
•Step 2 - Compare the search element with the first key value of the root node of the tree.
•Step 3 - If both match, then display "Given node is found!!!" and terminate the function.
•Step 4 - If they do not match, then check whether the search element is smaller or larger than that key value.
•Step 5 - If the search element is smaller, then continue the search in the corresponding left subtree.
•Step 6 - If the search element is larger, then compare the search element with the next key value in the same node, and repeat steps 3, 4, 5 and 6 until an exact match is found or until the search element has been compared with the last key value in a leaf node.
•Step 7 - If the last key value in the leaf node also does not match, then display "Element is not found" and terminate the function.
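The search procedure above is short enough to show directly. The following Python sketch is illustrative, assuming a node layout with a sorted keys list, a children list and a leaf flag (as in the earlier B-Tree insertion sketch):

def btree_search(node, key):
    i = 0
    # Skip the keys that are smaller than the search element.
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return node                    # "Given node is found!!!"
    if node.leaf:
        return None                    # "Element is not found"
    return btree_search(node.children[i], key)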
Trie
The word "Trie" is an excerpt from the word "retrieval".
Trie is a sorted tree-based data-structure that stores the set
of strings.
It has the number of pointers equal to the number of
characters of the alphabet in each node.
It can search a word in the dictionary with the help of the
word's prefix.
For example, if we assume that all strings are formed from
the letters 'a' to 'z' in the English alphabet, each trie node
can have a maximum of 26 points.
56
Properties of the Trie for a set of the string:
The root node of the trie always represents the null node.
Each child of nodes is sorted alphabetically.
Each node can have a maximum of 26 children (A to Z).
Each node (except the root) can store one letter of the alphabet.
The diagram below depicts a trie representation for the words bell, bear, bore, bat, ball, stop, stock, and stack.
https://youtu.be/mFY0J5W8Udk
https://youtu.be/zeMa9sg-VJM?si=EgWExSafOaKLxzYJ
https://youtu.be/AYcsTOeFVas?si=qiblM7abn7zG0-RO
Hashing
• Hashing is a process of generating an index or address based on the data.
• For example, file systems use a hash table to generate the disk location from the filename.
• A good hash function is one which generates distinct addresses for distinct file names.
• Hashing is used to perform insertions, deletions, and finds in constant average time.
Hashing
Hash Table:
A hash table is a data structure that stores data elements at positions computed from their keys. The ideal hash table is a fixed-size (TableSize) array containing the keys.
Hash Function:
The mapping of a key to some number in the range 0 to TableSize-1 of the hash table is called a hash function.
It is used to put data into the hash table and also to retrieve data from the hash table. Thus the hash function is used to implement the hash table.
Hash Key:
The integer returned by the hash function is called the hash key. For numeric keys, one simple hash function is Key mod TableSize.
Hashing
Characteristics of a Good Hashing Function
1. The hash function should be simple to compute.
2. The number of collisions while placing records/keys in the hash table should be small. Ideally no collision should occur; such a function is called a perfect hash function.
3. The hash function should produce keys (bucket indices) that are distributed uniformly over the array.
4. The hash function should depend upon every bit of the key. Thus a hash function that simply extracts a portion of the key is not suitable.
Hashing
Types of Hash function:-
There are different types of hash function. They are:
1. Division method
2. Mid square
3. Digit folding
Hashing
1. Division method
The hash function depends upon the remainder of division. Typically the divisor
is table length.
Example:
If the records 54, 72, 89, 37 are to be placed in a hash table of size 10, then
H(key) = key % table size
54 % 10 = 4 (places record 54 at index 4 of the hash table)
72 % 10 = 2
89 % 10 = 9
37 % 10 = 7
Resulting hash table:
Index: 0   1   2   3   4   5   6   7   8   9
Key:   -   -   72  -   54  -   -   37  -   89
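A tiny demonstration of the division method (illustrative code, not from the slides):

table_size = 10
table = [None] * table_size
for key in [54, 72, 89, 37]:
    index = key % table_size          # 54 -> 4, 72 -> 2, 89 -> 9, 37 -> 7
    table[index] = key
print(table)   # [None, None, 72, None, 54, None, None, 37, None, 89]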
Hashing
2. Mid square:
In the mid-square method, the key is squared and the middle part of the result is used as the index.
For example, to place a record with key 3111:
3111² = 9678321
For a hash table of size 1000,
H(3111) = 783 (the middle 3 digits)
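A small mid-square sketch (illustrative names, not from the slides):

def mid_square(key, digits=3):
    squared = str(key * key)                   # 3111 * 3111 = 9678321
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])  # take the middle digits

print(mid_square(3111))                        # 783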
Hashing
Collision
Definition:
The situation in which the hash function returns the same hash key for more than one
record is called collision.
Example:
Consider a hash function.
H(key) = recordkey%10 having the hash table of size 10.
The record keys to be placed are
131, 44, 43, 78, 19, 36, 57 and 77
Now if we try to place 77 in the hash table then we get the hash
key to be 7 and index 7 already has the record key 57. This situation
is called collision.
From the index 7 we look for next vacant position at subsequent indices
8, 9 then we find that there is no room to place 77 in the hash table.
This situation is called overflow.
Collision Resolution Techniques
The techniques which are used to resolve or overcome collision while inserting
data into the hash table are called collision resolution techniques.
There are two main methods for resolving collisions and overflows in the hash table:
1. Chaining or separate chaining.
2. Open addressing
   Linear probing
   Quadratic probing
   Double hashing
Collision Resolution Techniques
1. Separate chaining
In this method, a linked list of all elements that hash to the same value is kept. The linked
list has a header node. Any new element inserted will be inserted in the beginning of the
list.
Example:
Consider the keys to be placed in their home buckets are
131, 3, 4, 21, 61, 24, 7, 97, 8, 9
Then we will apply a hash function as
H(key) = key % D
where D is the size of table.
Here D = 10.
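A minimal separate-chaining hash table sketch (the class and method names are illustrative, not from the slides):

class ChainedHashTable:
    def __init__(self, size=10):
        self.size = size
        self.buckets = [[] for _ in range(size)]    # one chain (list) per slot

    def _hash(self, key):
        return key % self.size                      # H(key) = key % D

    def insert(self, key):
        # New elements are inserted at the beginning of the chain.
        self.buckets[self._hash(key)].insert(0, key)

    def search(self, key):
        return key in self.buckets[self._hash(key)]

t = ChainedHashTable()
for k in [131, 3, 4, 21, 61, 24, 7, 97, 8, 9]:
    t.insert(k)
print(t.buckets[1])    # chain for index 1: [61, 21, 131]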
Collision Resolution Techniques
2. Open Addressing
Open Addressing is an alternative method to resolve collision with linked lists. If a
collision occurs, alternative cells are tried until an empty cell is found. Because all the data
go inside the table, a bigger table is needed for open addressing hashing than for separate
chaining hashing.
There are three methods in open addressing. They are:
i. Linear Probing
ii. Quadratic Probing
iii. Double Hashing
Collision Resolution Techniques
Linear Probing:
This is the easiest method of handling collisions. If a collision occurs, alternative cells are tried until an empty cell is found.
In the linear probing method, the hash table is represented as a one-dimensional array with indices that range from 0 to the desired table size minus one.
Example:
Consider the following keys that are to be inserted into the hash table:
131, 4, 8, 7, 21, 5, 31, 61, 9, 29
Initially, we put the following keys in the hash table: 131, 4, 8, 7.
We use the division hash function, which means the keys are placed using
H(key) = key % tablesize
For instance, the element 131 is placed at
H(131) = 131 % 10 = 1
so 131 is placed at index 1. Continuing in this fashion, we place 4, 8 and 7.
Collision Resolution Techniques
Now the next key to be inserted is 21. According to the hash function,
H(21) = 21 % 10 = 1
But index 1 is already occupied by 131, i.e. a collision occurs. To resolve the collision we move linearly down to the next empty location, so 21 is placed at index 2. The next element, 5, is placed at index 5.
The hash table after the insertion of 21 and 5 is given below.
Collision Resolution Techniques
After placing the record keys 31 and 61, the hash table will be as shown.
The next record key is 9; according to the division hash function it demands index 9, so we place 9 at index 9. The final record key is 29, which also hashes to index 9. But index 9 is already occupied and there is no next empty bucket, as the table ends at index 9. The overflow occurs.
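A linear probing sketch (illustrative code, not from the slides). Following the slides' example, this version probes index, index+1, index+2, ... without wrapping around, so running past the last slot is reported as overflow; many implementations instead wrap around with a modulo.

def linear_probe_insert(table, key):
    size = len(table)
    for slot in range(key % size, size):
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no room for key %d (overflow)" % key)

table = [None] * 10
for k in [131, 4, 8, 7, 21, 5, 31, 61, 9]:
    linear_probe_insert(table, k)
print(table)   # [None, 131, 21, 31, 4, 5, 61, 7, 8, 9]
# linear_probe_insert(table, 29) would now raise OverflowError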
Collision Resolution Techniques
Quadratic Probing
In quadratic probing, the collision function is quadratic. It eliminates the primary clustering problem. If a collision occurs, alternative cells are tried until an empty cell is found. The alternative cells are calculated using the function
F(i) = i²
The index to try when a collision occurs is given by the formula
H = (Hash(key) + i²) mod m
where m is the table size.
Collision Resolution Techniques
Example:
If we have to insert the following elements into a hash table of size 10:
37, 90, 55, 22, 11, 17, 49, 87
We will fill the hash table step by step.
Collision Resolution Techniques
Now if we want to place 17, a collision occurs, since 17 % 10 = 7 and 37 is already present at that location. Hence we apply quadratic probing to insert this record into the hash table:
H = (Hash(key) + i²) mod 10
With i = 0: (17 + 0²) % 10 = 7 (occupied)
With i = 1: (17 + 1²) % 10 = 8
Index 8 is empty, so the element is placed at index 8.
Then comes 49, which is placed at index 9, since 49 % 10 = 9.
Collision Resolution Techniques
To place 87:
(87 + 0²) % 10 = 7 ... already occupied
(87 + 1²) % 10 = 8 ... already occupied
(87 + 2²) % 10 = 1 ... already occupied
(87 + 3²) % 10 = 6, so 87 is placed at index 6.
It is observed that, to be able to place all the required elements in the hash table, the size of the divisor (m) should be about twice as large as the total number of elements.
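A quadratic probing sketch (illustrative code, not from the slides): on a collision the cells (Hash(key) + i²) mod m are tried for i = 0, 1, 2, ...

def quadratic_probe_insert(table, key):
    m = len(table)
    for i in range(m):
        slot = (key % m + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("could not place key %d" % key)

table = [None] * 10
for k in [37, 90, 55, 22, 11, 17, 49, 87]:
    quadratic_probe_insert(table, k)
print(table)   # [90, 11, 22, None, None, 55, 87, 37, 17, 49]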
Collision Resolution Techniques
Double Hashing
Double hashing is a technique in which a second hash function is applied to the key when a collision occurs.
Applying the second hash function gives the number of positions from the point of collision at which to try the insertion.
There are two important rules for the second hash function:
• It must never evaluate to zero.
• It must make sure that all cells can be probed.
The formulas used for double hashing are
H1(key) = key mod tablesize
H2(key) = M - (key mod M)
where M is a prime number smaller than the size of the table.
Collision Resolution Techniques
Double Hashing
Example:
Consider the following elements to be placed in a hash table of size 10:
37, 90, 45, 22, 49, 17, 55
Initially, insert the elements using the formula for H1.
Insert 37, 90, 45, 22, 49.
Collision Resolution Techniques
Double Hashing
Example:
Now if 17 is to be inserted, then
H1(17) = 17 % 10 = 7. A collision occurs, since index 7 is already filled.
Now, using the second hash function with M = 7:
H2(17) = 7 - (17 % 7) = 7 - 3 = 4
That means we have to insert the element 17 four positions away from the point of collision.
In short, we take 4 jumps (wrapping around the table), so 17 is placed at index 1.
Next, we insert the number 55.
Collision Resolution Techniques
Double Hashing
Example:
Insert the number 55.
H1(55) = 55 % 10 = 5
H2(55) = 7 - (55 % 7) = 7 - 6 = 1
We take one jump from index 5, so 55 is placed at index 6.
Finally, the hash table looks like this.
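A double hashing sketch (illustrative code, not from the slides). The probe sequence is (H1(key) + i * H2(key)) mod size, with H2(key) = M - (key mod M):

def double_hash_insert(table, key, M=7):
    size = len(table)
    h1 = key % size
    h2 = M - (key % M)                 # never zero, so every probe moves forward
    for i in range(size):
        slot = (h1 + i * h2) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("could not place key %d" % key)

table = [None] * 10
for k in [37, 90, 45, 22, 49, 17, 55]:
    double_hash_insert(table, k)
print(table)   # [90, 17, 22, None, None, 45, 55, 37, None, 49]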
Collision Resolution Techniques
Rehashing
• Rehashing is a technique in which the table is resized, i.e., the size of the table is roughly doubled by creating a new table.
• It is preferable for the new table size to be a prime number. Rehashing is required in situations such as:
  - when the table is completely full,
  - with quadratic probing, when the table is half full,
  - when insertions fail due to overflow.
• In such situations, we transfer the entries from the old table to the new table by recomputing their positions using a suitable hash function.
Collision Resolution Techniques
Rehashing
Consider that we have to insert the elements 37, 90, 55, 22, 17, 49 and 87. The table size is 10 and we will use the hash function
H(key) = key mod tablesize
Collision Resolution Techniques
Rehashing
Now the table is almost full, and if we try to insert more elements, collisions will occur and eventually further insertions will fail. Hence we rehash by growing the table. The old table size is 10, so we double it for the new table; but since 20 is not a prime number, we prefer to make the new table size 23.
The hash function will be
H(key) = key mod 23
Now the hash table is sufficiently large to accommodate new insertions.
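A rehashing sketch (illustrative code, not from the slides): every existing key is re-inserted into a larger table (here 23 slots) with its position recomputed, using linear probing for any collisions.

def rehash(keys, new_size=23):
    table = [None] * new_size
    for key in keys:
        index = key % new_size                  # recompute the position
        while table[index] is not None:         # resolve collisions linearly
            index = (index + 1) % new_size
        table[index] = key
    return table

new_table = rehash([37, 90, 55, 22, 17, 49, 87])
# 37 -> 14, 90 -> 21, 55 -> 9, 22 -> 22, 17 -> 17, 49 -> 3, 87 -> 18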
TRIE
• TRIE data structure is an advanced data structure used for storing
and searching strings efficiently.
• TRIE data structure is also known as a Prefix Tree or a Digital
Tree.
• TRIE comes from the word reTRIEval which means to find or get
something back.
• Dictionaries can be implemented efficiently using a TRIE data
structure and Tries are also used for the autocomplete features
that we see in the search engines.
• TRIE data structure is faster than binary search trees and hash
tables for storing and retrieving data.
• We can do prefix-based searching easily with the help of a TRIE.
TRIE
• Each node of a trie consists of two things:
  A character
  A boolean value used to indicate whether this character represents the end of a word.
Properties of the Trie for a set of strings:
  The root node of the trie always represents the null node.
  Each child of a node is sorted alphabetically.
  Each node can have a maximum of 26 children (A to Z).
  Each node (except the root) can store one letter of the alphabet.
TRIE
The diagram below depicts a trie representation for the words: cat, car, dog, pick, pickle
TRIE
The diagram below depicts a trie representation for the words: ball, bat, bear, bell, bore, stack, stock, stop
Basic Operations in TRIE
• Insertion
• Deletion
• Search
Basic Operations in Tries
Insertion
• This operation is used to insert new strings into the Trie data structure.
Basic Operations in Tries
2. Searching in Trie Data Structure:
• This operation is used to search whether a string is
present in the Trie data structure or not.
• There are two search approaches in the Trie data
structure.
Find whether the given word exists in Trie.
Find whether any word that starts with the
given prefix exists in Trie.
Basic Operations in Tries
2. Searching in Trie Data Structure:
2.1 Searching Prefix in Trie Data Structure:
Search for the prefix “an” in the Trie Data Structure.
Basic Operations in Tries
2. Searching in Trie Data Structure:
2.2 Searching for a Complete Word in the Trie Data Structure:
This is similar to prefix search, but additionally we have to check whether a word actually ends at the last character of the searched string.
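The insertion and the two search operations described above fit into a very small class. The following Python sketch is illustrative only (the names TrieNode, Trie, search and starts_with are assumptions, not from the slides):

class TrieNode:
    def __init__(self):
        self.children = {}         # letter -> TrieNode
        self.is_end = False        # True if a word ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()     # the root represents the null (empty) string

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True         # mark the end of the word

    def search(self, word):        # is the complete word present?
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix): # does any word start with this prefix?
        return self._walk(prefix) is not None

    def _walk(self, chars):
        node = self.root
        for ch in chars:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

t = Trie()
for w in ["cat", "car", "dog", "pick", "pickle"]:
    t.insert(w)
print(t.search("pick"), t.starts_with("pi"), t.search("pi"))   # True True False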
Basic Operations in Tries
3. Deletion in Trie Data Structure
• This operation is used to delete strings from the Trie data
structure.
• There are two cases when deleting a word from Trie.
The deleted word shares a common prefix with other
words in Trie.
The deleted word does not share any common prefix
with other words in Trie.
Basic Operations in Tries
3. Deletion in Trie Data Structure
3.1 The deleted word shares a common prefix with other words in Trie.
As shown in the following figure, the deleted word "and" shares a common prefix with another word, "ant": they share the prefix "an".
The solution for this case is to delete all the nodes starting from the end of the prefix to the last character of the given word.
Basic Operations in Tries
3. Deletion in Trie Data Structure
3.2 The deleted word does not share any common prefix with other words in the Trie.
As shown in the following figure, the word "geek" does not share a common prefix with any other word.
The solution for this case is simply to delete all the nodes of that word.
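Both deletion cases can be handled with one recursive routine that removes nodes bottom-up only while they have no other children and do not end another word, so shared prefixes are preserved. This sketch is illustrative and assumes the Trie class from the earlier sketch:

def delete(node, word, depth=0):
    # Returns True if 'node' itself can be removed by its parent.
    if node is None:
        return False
    if depth == len(word):
        node.is_end = False                      # un-mark the end of the word
        return not node.children                 # removable only if it has no children
    ch = word[depth]
    child = node.children.get(ch)
    if child is not None and delete(child, word, depth + 1):
        del node.children[ch]
        return not node.children and not node.is_end
    return False

# delete(t.root, "pickle") keeps "pick" because they share the prefix "pick";
# delete(t.root, "dog") removes the whole d-o-g branch, since no other word shares it.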
Applications of Trie
1. Autocomplete Feature: Autocomplete provides suggestions based on what
you type in the search box. Trie data structure is used to implement
autocomplete functionality.
Applications of Trie
2. Spell Checkers: If the word typed does not appear in
the dictionary, then it shows suggestions based on what
you typed.
It is a 3-step process that includes :
• Checking for the word in the data dictionary.
• Generating potential suggestions.
• Sorting the suggestions with higher priority on top.