Unit 4 DSA
So far we have studied linear data structures such as the array, linked list, stack and
queue, in which all the elements are arranged sequentially. Different data structures suit
different kinds of data. The following points are considered when choosing a data structure:
○ What type of data needs to be stored?: It might be a possibility that a certain data
structure can be the best fit for some kind of data.
○ Cost of operations: We usually want to minimize the cost of the most frequently
performed operations. For example, if we mostly perform the search operation on a
simple list, we can store its elements in a sorted array and use binary search.
Binary search is fast because it halves the search space at every step.
○ Memory usage: Sometimes, we want a data structure that uses less memory.
○ A tree data structure is a non-linear data structure because it does not store its
elements in a sequential manner. It is a hierarchical structure, as the elements in a
tree are arranged in multiple levels.
○ In the Tree data structure, the topmost node is known as a root node. Each node
contains some data, and data can be of any type. In the above tree structure, the
node contains the name of the employee, so the type of data would be a string.
○ Each node contains some data and the link or reference of other nodes that can
be called children.
○ Root: The root node is the topmost node in the tree hierarchy. In other words, the
root node is the one that doesn't have any parent. In the above structure, node
numbered 1 is the root node of the tree. If a node is directly linked to some other
node, it would be called a parent-child relationship.
○ Child node: If a node is a direct descendant of another node, then it is known as a
child node of that node.
○ Parent: If the node contains any sub-node, then that node is said to be the parent
of that sub-node.
○ Sibling: The nodes that have the same parent are known as siblings.
○ Leaf Node:- The node of the tree, which doesn't have any child node, is called a
leaf node. A leaf node is the bottom-most node of the tree. There can be any
number of leaf nodes present in a general tree. Leaf nodes can also be called
external nodes.
○ Internal nodes: A node that has at least one child node is known as an internal node.
○ Recursive data structure: The tree is also known as a recursive data structure. A
tree can be defined recursively: it has a distinguished node, the root, and the root
contains a link to the root of each of its subtrees, which are themselves trees. The
left subtree is shown in yellow in the below figure, and the right subtree is shown in
red. The left subtree can be further split into subtrees shown in three different
colors. Recursion means reducing something in a self-similar manner. This recursive
property of the tree data structure is used in various applications.
○ Number of edges: If there are n nodes, then there are n-1 edges. Each arrow in
the structure represents a link or path. Each node, except the root node, has
exactly one incoming link, known as an edge, which represents its parent-child
relationship.
○ Depth of node x: The depth of node x can be defined as the length of the path
from the root to the node x. One edge contributes one-unit length in the path. So,
the depth of node x can also be defined as the number of edges between the root
node and the node x. The root node has 0 depth.
○ Height of node x: The height of node x can be defined as the number of edges on the
longest path from node x down to a leaf node. A leaf node has height 0.
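The depth and height definitions above can be sketched in C as follows. This is a minimal illustration that assumes a binary node with left and right links; the names height and depth_of are hypothetical helpers, and the convention that an empty subtree has height -1 is chosen here so that a leaf gets height 0.

```c
#include <assert.h>
#include <stddef.h>

/* Binary node, used here only for illustration. */
struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Height of node x: number of edges on the longest path from x down to a
   leaf. An empty subtree is given height -1 so that a leaf has height 0. */
int height(const struct node *x)
{
    if (x == NULL)
        return -1;
    int hl = height(x->left);
    int hr = height(x->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Depth of the node holding key: number of edges from root to that node,
   or -1 if the key is absent. No ordering of keys is assumed, so both
   subtrees may be searched. */
int depth_of(const struct node *root, int key)
{
    if (root == NULL)
        return -1;
    if (root->data == key)
        return 0;
    int d = depth_of(root->left, key);
    if (d < 0)
        d = depth_of(root->right, key);
    return d < 0 ? -1 : d + 1;
}
```

For a root with two children, where one child has a single child of its own, height(root) is 2 and the depth of that grandchild is also 2.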
Based on the properties of the Tree data structure, trees are classified into various
categories.
Implementation of Tree
The tree data structure can be created by creating the nodes dynamically with the help
of the pointers. The tree in the memory can be represented as shown below:
The above figure shows the representation of the tree data structure in the memory. In
the above structure, the node contains three fields. The second field stores the data; the
first field stores the address of the left child, and the third field stores the address of the
right child.
struct node
{
    int data;              /* value stored in the node */
    struct node *left;     /* address of the left child */
    struct node *right;    /* address of the right child */
};
The above structure can only be used for binary trees, because a node of a binary tree
can have at most two children, whereas a node of a generic tree can have more than two.
The structure of the node for generic trees would therefore be different from that of
the binary tree.
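One possible node layout for a generic tree is the first-child/next-sibling form sketched below. This is only one of several options and is not taken from the notes; the names gnode, gnode_new, gnode_add_child and gnode_degree are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Each node points to its first child and to its next sibling,
   so a node can have any number of children. */
struct gnode {
    int data;
    struct gnode *first_child;
    struct gnode *next_sibling;
};

/* Allocate a node with no children and no siblings. */
struct gnode *gnode_new(int data)
{
    struct gnode *n = malloc(sizeof *n);
    assert(n != NULL);   /* sketch only: real code should handle failure */
    n->data = data;
    n->first_child = NULL;
    n->next_sibling = NULL;
    return n;
}

/* Attach child at the front of parent's list of children. */
void gnode_add_child(struct gnode *parent, struct gnode *child)
{
    child->next_sibling = parent->first_child;
    parent->first_child = child;
}

/* Degree of a node: the number of children it has. */
int gnode_degree(const struct gnode *n)
{
    int count = 0;
    for (const struct gnode *c = n->first_child; c != NULL; c = c->next_sibling)
        count++;
    return count;
}
```

With this layout the two pointer fields are enough for any degree, at the cost of walking the sibling list to reach a particular child.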
Applications of trees
○ Organize data: It is used to organize data for efficient insertion, deletion and
searching. For example, a balanced binary search tree takes O(log n) time to search for
an element.
○ Trie: It is a special kind of tree that is used to store the dictionary. It is a fast and
efficient way for dynamic spell checking.
○ B-Tree and B+Tree: B-Tree and B+Tree are the tree data structures used to
implement indexing in databases.
○ Routing table: The tree data structure is also used to store the data in routing
tables in the routers.
○ General tree: The general tree is one of the types of tree data structure. In the
general tree, a node can have anywhere from 0 to n children. There is
no restriction imposed on the degree of the node (the number of children that a
node can have). The topmost node in a general tree is known as a root node.
The children of the root node are the roots of its subtrees.
There can be n number of subtrees in a general tree. In the general tree, the
subtrees are unordered.
The downward edges of a node connect it to its child nodes. The root node is
labeled with level 0. The nodes that have the same parent are known as siblings.
○ Binary tree: Here, the name binary itself suggests two. In a binary tree, each node
can have at most two child nodes, i.e., a node can have 0, 1 or 2 children.
○ Binary Search tree: A binary search tree is a non-linear, node-based data structure
in which each node is connected to at most two child nodes. A node can be represented
with three fields, i.e., data part, left-child pointer, and right-child pointer.
Every node in the left subtree must contain a value less than the value of the root
node, and the value of each node in the right subtree must be greater than the
value of the root node.
A node can be created with the help of a user-defined data type known as struct, as
shown below:
struct node
{
    int data;
    struct node *left;
    struct node *right;
};
The above is the node structure with three fields: the first is the data field, the second
is the left pointer of the node type, and the third is the right pointer of the node type.
○ AVL tree: It is one of the types of the binary tree, or we can say that it is a variant
of the binary search tree. An AVL tree satisfies the property of the binary tree as well
as of the binary search tree. It is a self-balancing binary search tree that was
invented by Adelson-Velsky and Landis. Here, self-balancing means balancing
the heights of the left subtree and right subtree. This balancing is measured in terms
of the balancing factor.
We can consider a tree as an AVL tree if it obeys the binary search tree property as well
as the balancing-factor condition. The balancing factor can be defined as the difference
between the height of the left subtree and the height of the right subtree. For every node
in an AVL tree, the value of the balancing factor must be either 0, -1, or 1.
○ Red-Black Tree: The Red-Black tree is a binary search tree, so we should first know
about the binary search tree. In a binary search tree, the values in the left subtree of
a node should be less than the value of that node, and the values in the right subtree
should be greater than the value of that node. Searching a binary search tree takes
O(log n) time in the average case, O(1) in the best case, and O(n) in the worst case.
When any operation is performed on the tree, we want our tree to be balanced so that all
the operations like searching, insertion, deletion, etc., take less time, and all these
operations will have the time complexity of O(log n).
The red-black tree is a self-balancing binary search tree. The AVL tree is also a height
balancing binary search tree, so why do we require a Red-Black tree? In the AVL tree,
a deletion may require rotations at many levels of the tree, but in the Red-Black tree,
an insertion requires at most 2 rotations and a deletion at most 3. Each node contains
one extra bit that represents either the red or black color of the node to ensure the
balancing of the tree.
○ Splay tree: The splay tree is also a binary search tree, in which the
recently accessed element is moved to the root position of the tree by performing
some rotation operations. Here, splaying means moving the recently accessed node to
the root. It is a self-adjusting binary search tree having no explicit balance
condition like the AVL tree.
The height of a splay tree may not be balanced, i.e., the heights of the
left and right subtrees may differ, but the operations in a splay tree take O(log n)
amortized time, where n is the number of nodes.
A splay tree cannot be considered a height-balanced tree. Instead, it restructures
itself by rotations after each operation, which keeps the amortized cost low.
○ Treap: The name Treap comes from Tree and Heap. So, it
comprises the properties of both the binary search tree and the heap. In a binary search
tree, each node in the left subtree must be less than or equal to the value of the
root node and each node in the right subtree must be greater than or equal to the
value of the root node. In a (min-)heap, both right and left subtrees contain
larger keys than the root; therefore, the root node contains the
lowest value.
In the treap data structure, each node has both a key and a priority, where the key follows
the binary search tree ordering and the priority follows the heap ordering.
The Treap data structure follows two properties which are given below:
○ Key of the right child >= key of the current node and key of the left child <= key of
the current node (binary search tree property)
○ Priority of each child must be greater than the priority of its parent (heap property)
○ B-tree: A B-tree is a balanced m-way tree where m defines the order of the tree. Till
now, we read that a node contains only one key, but a node of a B-tree can have more than
one key and more than 2 children. It always maintains the sorted data. In a binary
tree, it is possible that leaf nodes can be at different levels, but in a B-tree, all the
leaf nodes must be at the same level.
○ For minimum children, a leaf node has 0 children, the root node has a minimum of 2
children and an internal node has a minimum of ceiling of m/2 children. For example, if
the value of m is 5, a node can have at most 5 children and an internal node
must have a minimum of 3 children.
The root node must contain a minimum of 1 key and all other nodes must contain at least
ceiling of m/2 minus 1 keys.
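The limits above are simple arithmetic on the order m. The sketch below works them out in C; the function names are hypothetical, and the restriction m >= 2 is an assumption added here for the sanity check.

```c
#include <assert.h>

/* Ceiling of m/2 using integer arithmetic. */
int ceil_half(int m)
{
    assert(m >= 2);   /* assumption: an order below 2 is not meaningful */
    return (m + 1) / 2;
}

/* Limits for a B-tree of order m, as stated in the text. */
int btree_max_children(int m)          { return m; }
int btree_max_keys(int m)              { return m - 1; }
int btree_min_children_internal(int m) { return ceil_half(m); }
int btree_min_keys_nonroot(int m)      { return ceil_half(m) - 1; }
```

For m = 5 this gives at most 5 children and 4 keys per node, and at least 3 children and 2 keys for an internal node other than the root.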
○ Preorder traversal
○ Inorder traversal
○ Postorder traversal
So, in this article, we will discuss the above-listed techniques of traversing a tree. Now,
let's start discussing the ways of tree traversal.
Preorder traversal
This technique follows the 'root left right' policy. It means that the root node is visited
first, then the left subtree is traversed recursively, and finally the right subtree is
traversed recursively. As the root node is traversed before (or pre) the left and right
subtree, it is called preorder traversal.
So, in a preorder traversal, each node is visited before both of its subtrees.
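Algorithm
Until all nodes of the tree are visited:
Step 1 - Visit the root node.
Step 2 - Traverse the left subtree recursively.
Step 3 - Traverse the right subtree recursively.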
Example
So, for left subtree B, first, the root node B is traversed itself; after that, its left subtree D
is traversed. Since node D does not have any children, move to right subtree E. As node
E also does not have any children, the traversal of the left subtree of root node A is
completed.
Now, move towards the right subtree of root node A, that is C. So, for right subtree C, first
the root node C is traversed; after that, its left subtree F is traversed. Since node
F does not have any children, move to the right subtree G. As node G also does not have
any children, traversal of the right subtree of root node A is completed.
Therefore, all the nodes of the tree are traversed. So, the output of the preorder traversal
of the above tree is - A → B → D → E → C → F → G
Postorder traversal
This technique follows the 'left right root' policy. It means that the left subtree of the
root node is traversed first, then the right subtree is traversed recursively, and finally
the root node is traversed. As the root node is traversed after (or post) the left and right
subtree, it is called postorder traversal.
So, in a postorder traversal, each node is visited after both of its subtrees.
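Algorithm
Until all nodes of the tree are visited:
Step 1 - Traverse the left subtree recursively.
Step 2 - Traverse the right subtree recursively.
Step 3 - Visit the root node.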
Now, start applying the postorder traversal on the above tree. First, we traverse the left
subtree B that will be traversed in postorder. After that, we will traverse the right subtree
C in postorder. And finally, the root node of the above tree, i.e., A, is traversed.
So, for left subtree B, first, its left subtree D is traversed. Since node D does not have any
children, traverse the right subtree E. As node E also does not have any children, move
to the root node B. After traversing node B, the traversal of the left subtree of root node
A is completed.
Now, move towards the right subtree of root node A that is C. So, for right subtree C, first
its left subtree F is traversed. Since node F does not have any children, traverse the right
subtree G. As node G also does not have any children, therefore, finally, the root node of
the right subtree, i.e., C, is traversed. The traversal of the right subtree of root node A is
completed.
At last, traverse the root node of a given tree, i.e., A. After traversing the root node, the
postorder traversal of the given tree is completed.
Therefore, all the nodes of the tree are traversed. So, the output of the postorder
traversal of the above tree is - D → E → B → F → G → C → A
Inorder traversal
This technique follows the 'left root right' policy. It means that the left subtree is
visited first, then the root node is traversed, and finally the right subtree is
traversed. As the root node is traversed between the left and right subtree, it is named
inorder traversal.
So, in the inorder traversal, each node is visited in between its subtrees.
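Algorithm
Until all nodes of the tree are visited:
Step 1 - Traverse the left subtree recursively.
Step 2 - Visit the root node.
Step 3 - Traverse the right subtree recursively.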
Example
Now, start applying the inorder traversal on the above tree. First, we traverse the left
subtree B that will be traversed in inorder. After that, we will traverse the root node A.
And finally, the right subtree C is traversed in inorder.
So, for left subtree B, first, its left subtree D is traversed. Since node D does not have any
children, so after traversing it, node B will be traversed, and at last, right subtree of node
B, that is E, is traversed. Node E also does not have any children; therefore, the traversal
of the left subtree of root node A is completed.
At last, move towards the right subtree of root node A that is C. So, for right subtree C;
first, its left subtree F is traversed. Since node F does not have any children, node C will
be traversed, and at last, a right subtree of node C, that is, G, is traversed. Node G also
does not have any children; therefore, the traversal of the right subtree of root node A is
completed.
As all the nodes of the tree are traversed, the inorder traversal of the given tree is
completed. The output of the inorder traversal of the above tree is -
D→B→E→A→F→C→G
Each of the traversal techniques discussed above visits every node exactly once, so its
time complexity is O(n), where n is the number of nodes. The space complexity is O(1) if
we do not consider the stack size for function calls. Otherwise, the space complexity of
these techniques is O(h), where 'h' is the tree's height.
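The three traversals differ only in where the root is visited relative to the two recursive calls. The sketch below shows this in C; it assumes a node holding a one-character label (as in the A to G example) and, as a choice made here for illustration, writes the visited labels into a caller-supplied buffer instead of printing them.

```c
#include <assert.h>
#include <stddef.h>

/* Binary node holding a one-character label. */
struct node {
    char data;
    struct node *left;
    struct node *right;
};

/* Each function appends the labels it visits to out[] and advances *len.
   The caller must provide a buffer with room for every node. */
void preorder(const struct node *n, char *out, size_t *len)
{
    assert(out != NULL && len != NULL);
    if (n == NULL)
        return;
    out[(*len)++] = n->data;         /* root */
    preorder(n->left, out, len);     /* left subtree */
    preorder(n->right, out, len);    /* right subtree */
}

void inorder(const struct node *n, char *out, size_t *len)
{
    assert(out != NULL && len != NULL);
    if (n == NULL)
        return;
    inorder(n->left, out, len);      /* left subtree */
    out[(*len)++] = n->data;         /* root */
    inorder(n->right, out, len);     /* right subtree */
}

void postorder(const struct node *n, char *out, size_t *len)
{
    assert(out != NULL && len != NULL);
    if (n == NULL)
        return;
    postorder(n->left, out, len);    /* left subtree */
    postorder(n->right, out, len);   /* right subtree */
    out[(*len)++] = n->data;         /* root */
}
```

On the example tree (A with children B and C, B with children D and E, C with children F and G), these produce ABDECFG, DBEAFCG and DEBFGCA respectively, matching the outputs worked out above.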
Binary Tree
A binary tree is a tree in which each node can have a maximum of two children. Here, the
name binary itself suggests 'two'; therefore, each node can have either 0, 1 or 2 children.
The above tree is a binary tree because each node contains at most two children.
The logical representation of the above tree is given below:
In the above tree, node 1 contains two pointers, i.e., left and a right pointer pointing to
the left and right node respectively. The node 2 contains both the nodes (left and right
node); therefore, it has two pointers (left and right). The nodes 3, 5 and 6 are the leaf
nodes, so all these nodes contain NULL pointer on both left and right parts.
The full binary tree is also known as a strict binary tree. A tree can only be considered
a full binary tree if each node contains either 0 or 2 children. The full binary
tree can also be defined as the tree in which each node except the leaf nodes contains
2 children.
The complete binary tree is a tree in which all the levels are completely filled except
possibly the last level. In the last level, all the nodes must be as far left as possible.
In a complete binary tree, the nodes should be added from the left.
A tree is a perfect binary tree if all the internal nodes have 2 children, and all the leaf
nodes are at the same level.
The below tree is not a perfect binary tree because all the leaf nodes are not at the
same level.
Note: All perfect binary trees are complete binary trees as well as full binary trees,
but the converse is not true, i.e., not all complete binary trees and full binary trees
are perfect binary trees.
Degenerate Binary Tree
The degenerate binary tree is a tree in which all the internal nodes have only one
child.
The above tree is a degenerate binary tree because all the nodes have only one child. It
is also known as a right-skewed tree as all the nodes have a right child only.
The above tree is also a degenerate binary tree because all the nodes have only one
child. It is also known as a left-skewed tree as all the nodes have a left child only.
Balanced Binary Tree
The balanced binary tree is a tree in which, for every node, the heights of the left and
right subtrees differ by at most 1. For example, the AVL tree is a balanced binary tree;
the Red-Black tree is also kept balanced, although under a looser rule.
The above tree is a balanced binary tree because the difference between the heights of
the left subtree and right subtree is zero.
The above tree is not a balanced binary tree because the difference between the heights
of the left subtree and the right subtree is greater than 1.
Binary Search tree
A binary search tree follows some order to arrange the elements. In a Binary search tree,
the values in the left subtree of a node must be smaller than the value of that node, and
the values in its right subtree must be greater. This rule is applied recursively to the
left and right subtrees of the root.
In the above figure, we can observe that the root node is 40, and all the nodes of the left
subtree are smaller than the root node, and all the nodes of the right subtree are greater
than the root node.
Similarly, we can see that the left child of the root node is greater than its own left
child and smaller than its own right child. So, it also satisfies the property of binary
search tree. Therefore, we can say that the tree in the above image is a binary search tree.
Suppose we change the value of node 35 to 55 in the above tree, and check whether the
tree is still a binary search tree.
In the above tree, the value of the root node is 40, which is greater than its left child 30
but smaller than the right child of 30, i.e., 55. Since 55 lies in the left subtree of 40
but is greater than 40, the tree does not satisfy the property of
Binary search tree. Therefore, the above tree is not a binary search tree.
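The check described above can be sketched in C by passing down the range of values each subtree is allowed to hold. The names is_bst and is_bst_range are hypothetical, the tree is assumed to hold distinct int keys, and the test tree in the usage note reuses only the nodes named in the text (40, 30 and 35).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Returns 1 if every key in the subtree lies strictly between lo and hi.
   long long bounds are used so that INT_MIN and INT_MAX are valid keys. */
static int is_bst_range(const struct node *n, long long lo, long long hi)
{
    if (n == NULL)
        return 1;
    if (n->data <= lo || n->data >= hi)
        return 0;
    return is_bst_range(n->left, lo, n->data)
        && is_bst_range(n->right, n->data, hi);
}

/* Returns 1 if the whole tree satisfies the binary search tree property. */
int is_bst(const struct node *root)
{
    return is_bst_range(root, (long long)INT_MIN - 1, (long long)INT_MAX + 1);
}
```

With 40 at the root, 30 as its left child and 35 as the right child of 30, is_bst returns 1. After changing 35 to 55 it returns 0, even though 55 is still greater than its own parent 30. This is why comparing each node only with its direct children is not enough.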
○ Searching an element in the Binary search tree is easy as we always have a hint
about which subtree contains the desired element.
○ As compared to array and linked lists, insertion and deletion operations are faster
in BST.
Suppose the data elements are - 45, 15, 79, 90, 10, 55, 12, 20, 50
○ First, we have to insert 45 into the tree as the root of the tree.
○ Then, read the next element; if it is smaller than the root node, insert it as the root
of the left subtree, and move to the next element.
○ Otherwise, if the element is larger than the root node, then insert it as the root of
the right subtree.
Step 2 - Insert 15.
As 15 is smaller than 45, insert it as the root node of the left subtree.
Step 3 - Insert 79.
As 79 is greater than 45, insert it as the root node of the right subtree.
Step 4 - Insert 90.
90 is greater than 45 and 79, so it will be inserted as the right subtree of 79.
Step 5 - Insert 10.
10 is smaller than 45 and 15, so it will be inserted as the left subtree of 15.
Step 6 - Insert 55.
55 is larger than 45 and smaller than 79, so it will be inserted as the left subtree of 79.
Step 7 - Insert 12.
12 is smaller than 45 and 15 but greater than 10, so it will be inserted as the right
subtree of 10.
Step 8 - Insert 20.
20 is smaller than 45 but greater than 15, so it will be inserted as the right subtree of
15.
Step 9 - Insert 50.
50 is greater than 45 but smaller than 79 and 55. So, it will be inserted as the left subtree
of 55.
Now, the creation of binary search tree is completed. After that, let's move towards the
operations that can be performed on Binary search tree.
1. First, compare the element to be searched with the root element of the tree.
2. If the root matches the target element, then return the node's location.
3. If it does not match, then check whether the item is less than the root element; if it
is smaller than the root element, then move to the left subtree.
4. If it is larger than the root element, then move to the right subtree.
5. Repeat the above procedure recursively until a match is found.
6. If the element is not found or not present in the tree, then return NULL.
Now, let's understand the searching in binary tree using an example. We are taking the
binary search tree formed above. Suppose we have to find node 20 from the below tree.
Algorithm to search an element in Binary search tree
Search (root, item)
Step 1 - if (root = NULL) or (item = root → data)
             return root
         else if (item < root → data)
             return Search(root → left, item)
         else
             return Search(root → right, item)
         END if
Step 2 - END
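The search algorithm above translates almost line for line into C. The sketch below assumes the int-keyed struct node used earlier in these notes; the function name search mirrors the pseudocode.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Returns the node holding item, or NULL if item is not in the tree. */
struct node *search(struct node *root, int item)
{
    if (root == NULL || root->data == item)
        return root;                        /* found, or not present */
    if (item < root->data)
        return search(root->left, item);    /* continue in the left subtree */
    return search(root->right, item);       /* continue in the right subtree */
}
```

On the tree built from 45, 15, 79, 90, 10, 55, 12, 20, 50, searching for 20 visits 45, then 15, then 20. Only one subtree is followed at each step, so the cost is proportional to the height of the tree.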
In a binary search tree, we must delete a node in such a way that the
property of BST is not violated. To delete a node from BST, three possible
situations can occur -
○ The node to be deleted is a leaf node
○ The node to be deleted has only one child
○ The node to be deleted has two children
When the node to be deleted is a leaf node
It is the simplest case to delete a node in BST. Here, we have to replace the leaf node
with NULL and simply free the allocated space.
We can see the process of deleting a leaf node from BST in the below image. In the below
image, suppose we have to delete node 90. As the node to be deleted is a leaf node, it
will be replaced with NULL, and the allocated space will be freed.
When the node to be deleted has only one child
In this case, we have to replace the target node with its child and then free the space
allocated to the target node. The child, together with its own subtree, takes the position
that the deleted node had in the tree.
We can see the process of deleting a node with one child from BST in the below image.
In the below image, suppose we have to delete the node 79. As the node to be deleted
has only one child, it will be replaced with its child 55.
The space allocated to node 79 can then be freed.
When the node to be deleted has two children
This case of deleting a node in BST is a bit more complex than the other two cases. In such
a case, the steps to be followed are listed as follows -
○ First, find the inorder successor of the node to be deleted.
○ After that, replace that node with the inorder successor until the target node is
placed at the leaf of tree.
○ And at last, replace the node with NULL and free up the allocated space.
The inorder successor is required when the right child of the node is not empty. We can
obtain the inorder successor by finding the minimum element in the right subtree of the
node.
We can see the process of deleting a node with two children from BST in the below
image. In the below image, suppose we have to delete node 45 that is the root node, as
the node to be deleted has two children, so it will be replaced with its inorder successor.
Now, node 45 will be at the leaf of the tree so that it can be deleted easily.
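The three deletion cases can be sketched in C as below. This is one common formulation, not necessarily the exact procedure in the figures: for the two-children case it copies the inorder successor's value into the node and then deletes the successor from the right subtree. The names node_new, min_node and bst_delete are hypothetical, and nodes are assumed to be allocated with malloc.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Allocate a leaf node. */
struct node *node_new(int data)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);   /* sketch only: real code should handle failure */
    n->data = data;
    n->left = n->right = NULL;
    return n;
}

/* Leftmost (smallest) node of a non-empty subtree. */
static struct node *min_node(struct node *n)
{
    while (n->left != NULL)
        n = n->left;
    return n;
}

/* Delete key from the subtree rooted at root; returns the new subtree root. */
struct node *bst_delete(struct node *root, int key)
{
    if (root == NULL)
        return NULL;                                  /* key not present */
    if (key < root->data) {
        root->left = bst_delete(root->left, key);
    } else if (key > root->data) {
        root->right = bst_delete(root->right, key);
    } else if (root->left != NULL && root->right != NULL) {
        /* two children: take the inorder successor's value, then
           remove the successor from the right subtree */
        struct node *succ = min_node(root->right);
        root->data = succ->data;
        root->right = bst_delete(root->right, succ->data);
    } else {
        /* leaf or one child: the child (possibly NULL) takes this place */
        struct node *child = (root->left != NULL) ? root->left : root->right;
        free(root);
        return child;
    }
    return root;
}
```

On the tree built from 45, 15, 79, 90, 10, 55, 12, 20, 50, deleting 90 removes a leaf, deleting 79 afterwards puts 55 in its place, and deleting 45 then replaces the root value with its inorder successor 50.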
A new key in BST is always inserted at a leaf position. To insert an element in BST, we
start searching from the root node; if the key to be inserted is less than the root node,
we search for an empty location in the left subtree. Otherwise, we search for an empty
location in the right subtree and insert the data there. Insertion in BST is similar to
searching, as we always have to maintain the rule that the left subtree is smaller than
the root, and the right subtree is larger than the root.
Now, let's see the process of inserting a node into BST using an example.
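A minimal sketch of insertion in C is given below, assuming int keys and nodes allocated with malloc. The name bst_insert is hypothetical, and ignoring duplicate keys is a choice made here, not something the notes specify.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Insert key into the subtree rooted at root; returns the subtree root.
   The new key always ends up in a newly created leaf. */
struct node *bst_insert(struct node *root, int key)
{
    if (root == NULL) {
        /* empty location found: create the new leaf here */
        struct node *n = malloc(sizeof *n);
        assert(n != NULL);   /* sketch only: real code should handle failure */
        n->data = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->data)
        root->left = bst_insert(root->left, key);
    else if (key > root->data)
        root->right = bst_insert(root->right, key);
    /* equal keys are ignored in this sketch */
    return root;
}
```

Inserting 45, 15, 79, 90, 10, 55, 12, 20, 50 in that order, starting from an empty tree, reproduces the tree built step by step earlier: for example, 12 ends up as the right child of 10 and 50 as the left child of 55.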
The complexity of the Binary Search tree
1. Time Complexity
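Insertion, deletion and search all take time proportional to the height of the tree:
O(log n) in the average case and O(n) in the worst case, where n is the number of
nodes. The worst case occurs when the tree is skewed.
2. Space Complexity
Operations    Space complexity
Insertion     O(n)
Deletion      O(n)
Search        O(n)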
An AVL Tree can be defined as a height balanced binary search tree in which each node is
associated with a balance factor, which is calculated by subtracting the height of its
right sub-tree from that of its left sub-tree.
An AVL tree is given in the following figure. We can see that the balance factor associated
with each node is between -1 and +1. Therefore, it is an example of an AVL tree.
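The balance factor can be computed directly from its definition, as in the C sketch below. The helper names are hypothetical. Heights are counted in edges with an empty subtree taken as -1, which is an assumption of this sketch; any consistent convention gives the same difference. The check recomputes heights at every node, so it is simple rather than efficient.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Height in edges; an empty subtree counts as -1 so a leaf has height 0. */
int height(const struct node *n)
{
    if (n == NULL)
        return -1;
    int hl = height(n->left);
    int hr = height(n->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balance factor = height of left subtree - height of right subtree. */
int balance_factor(const struct node *n)
{
    assert(n != NULL);
    return height(n->left) - height(n->right);
}

/* Returns 1 if every node has a balance factor of -1, 0 or 1, else 0. */
int is_height_balanced(const struct node *n)
{
    if (n == NULL)
        return 1;
    int bf = balance_factor(n);
    return bf >= -1 && bf <= 1
        && is_height_balanced(n->left)
        && is_height_balanced(n->right);
}
```

For a chain of three nodes linked only through right children, the top node has balance factor -2, so the chain is not an AVL tree.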
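Complexity
Since the height of an AVL tree is limited to O(log n), search, insertion and deletion
take O(log n) time in both the average case and the worst case.
Operations on AVL tree
○ Insertion: Insertion may lead to a violation of the AVL tree property, and therefore
the tree may need balancing.
○ Deletion: Deletion is performed in the same way as in a binary search tree. Deletion
may also disturb the balance of the tree.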
Why AVL Tree?
The AVL tree controls the height of the binary search tree by not letting it become skewed.
The time taken for all operations in a binary search tree of height h is O(h). However, this
becomes O(n) if the BST is skewed (i.e., the worst case). By limiting the height
to O(log n), the AVL tree imposes an upper bound of O(log n) on each operation, where n is
the number of nodes.
AVL Rotations
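We perform rotation in an AVL tree only if the balance factor of some node is other than
-1, 0, or 1. There are basically four types of rotations, which are as follows:
1. LL rotation: the inserted node is in the left subtree of the left subtree of A
2. RR rotation: the inserted node is in the right subtree of the right subtree of A
3. LR rotation: the inserted node is in the right subtree of the left subtree of A
4. RL rotation: the inserted node is in the left subtree of the right subtree of A
where node A is the node whose balance factor is other than -1, 0, 1.
The first two rotations, LL and RR, are single rotations and the next two, LR and
RL, are double rotations. For a tree to be unbalanced, its height must be at least 2.
Let us understand each rotation.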
1. RR Rotation
When the BST becomes unbalanced because a node is inserted into the right subtree of the
right subtree of A, we perform RR rotation. RR rotation is an anticlockwise rotation,
which is applied on the edge below a node having balance factor -2.
In the above example, node A has balance factor -2 because a node C is inserted in the right
subtree of A's right subtree. We perform the RR rotation on the edge below A.
2. LL Rotation
When the BST becomes unbalanced because a node is inserted into the left subtree of the
left subtree of C, we perform LL rotation. LL rotation is a clockwise rotation, which is
applied on the edge below a node having balance factor 2.
In the above example, node C has balance factor 2 because a node A is inserted in the left
subtree of C's left subtree. We perform the LL rotation on the edge below C.
3. LR Rotation
Double rotations are a bit tougher than the single rotations explained above.
LR rotation = RR rotation + LL rotation, i.e., first an RR rotation is performed on the
subtree and then an LL rotation is performed on the full tree. By full tree we mean the
first node on the path from the inserted node whose balance factor is other than -1, 0, or 1.
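State and action at each step:
1. A node B has been inserted into the right subtree of A, which is the left subtree
of C, so C becomes unbalanced with balance factor 2.
2. The RR rotation is performed first, on the subtree rooted at A. A becomes the left
subtree of B, and B is now on the left of C.
3. The LL rotation is then performed on the full tree, i.e., on node C. Node C becomes
the right subtree of node B, and A is the left subtree of B.
4. The balance factor of each node is now -1, 0, or 1, i.e., the BST is balanced now.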
4. RL Rotation
As already discussed, double rotations are a bit tougher than the single rotations
explained above. RL rotation = LL rotation + RR rotation, i.e., first an LL rotation is
performed on the subtree and then an RR rotation is performed on the full tree. By full tree
we mean the first node on the path from the inserted node whose balance factor is other
than -1, 0, or 1.
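State and action at each step:
1. A node B has been inserted into the left subtree of C, which is the right subtree
of A, so A becomes unbalanced with balance factor -2.
2. The LL rotation is performed first, on the subtree rooted at C. C becomes the right
subtree of B.
3. The RR rotation is then performed on the full tree, i.e., on node A. Node C has now
become the right subtree of node B, and node A has become the left subtree of B.
4. The balance factor of each node is now -1, 0, or 1, i.e., the BST is balanced now.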
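The four rotations can be sketched in C as pointer rearrangements, with the double rotations built from the single ones as described above. The function names rotate_rr, rotate_ll, rotate_lr and rotate_rl are hypothetical; each returns the new root of the rotated subtree, which the caller must link back into the tree. Updating stored heights or balance factors is left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* RR case: anticlockwise (left) rotation around a. */
struct node *rotate_rr(struct node *a)
{
    assert(a != NULL && a->right != NULL);
    struct node *b = a->right;
    a->right = b->left;   /* b's left subtree moves under a */
    b->left = a;
    return b;
}

/* LL case: clockwise (right) rotation around c. */
struct node *rotate_ll(struct node *c)
{
    assert(c != NULL && c->left != NULL);
    struct node *b = c->left;
    c->left = b->right;   /* b's right subtree moves under c */
    b->right = c;
    return b;
}

/* LR case: RR rotation on the left subtree, then LL rotation on c. */
struct node *rotate_lr(struct node *c)
{
    assert(c != NULL);
    c->left = rotate_rr(c->left);
    return rotate_ll(c);
}

/* RL case: LL rotation on the right subtree, then RR rotation on a. */
struct node *rotate_rl(struct node *a)
{
    assert(a != NULL);
    a->right = rotate_ll(a->right);
    return rotate_rr(a);
}
```

In each of the three-node examples above (nodes A, B, C), the result is the same balanced shape: B at the root, A as its left child and C as its right child.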
M-way Trees
Before learning about B-Trees we need to know what M-way trees are, and how B-tree is
a special type of M-way tree. An M-way(multi-way) tree is a tree that has the following
properties:
● Each node in the tree can have at most m children and m-1 key fields.
● The keys in any node of the tree are arranged in sorted (ascending) order.
● The keys in the first K children are less than the Kth key of this node.
● The keys in the last (m-K) children are higher than the Kth key.
The above image shows a 4-way tree, where each node can have at most 3 (4-1) key
fields and at most 4 children. It is also a 4-way search tree.
M-way search trees have an advantage over plain M-way trees: the ordering of the keys
makes the search and update operations much more efficient. However, they can become
unbalanced, which brings us back to the problem of searching for a key in a skewed
tree, which is not much of an advantage.
If we want to search for a value say X in an M-way search tree and currently we are at a
node that contains key values from Y1, Y2, Y3,.....,Yk. Then in total 4 cases are possible
to deal with this scenario, these are:
● If X < Y1, then we need to recursively traverse the left subtree of Y1.
● If X > Yk, then we need to recursively traverse the right subtree of Yk.
● If X = Yi, for some i, then we are done, and can return.
● Last and only remaining case is that when for some i we have Yi < X < Y(i+1), then
in this case we need to recursively traverse the subtree that is present in between
Yi and Y(i+1).
For example, consider the 3-way search tree that is shown above, and say we want to
search for a node having key (X) equal to 60. Considering the above cases, for the
root node the second condition applies (60 > 40), and hence we move one level
down to the right subtree of 40. Now, only the last condition is valid, hence we traverse
the subtree which is in between 55 and 70. And finally, while traversing down, we
find the value that we were looking for.
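The four search cases can be sketched in C as a single scan over the keys of each node. The node layout below (mnode, with nkeys, keys and child fields), the function name mway_search and the fixed order M = 3 are assumptions made for illustration, chosen to match the 3-way example.

```c
#include <assert.h>
#include <stddef.h>

#define M 3   /* order of the tree; an assumption matching the example */

struct mnode {
    int nkeys;                /* number of keys in use, at most M-1 */
    int keys[M - 1];          /* keys in ascending order */
    struct mnode *child[M];   /* child[i] holds keys between keys[i-1] and keys[i] */
};

/* Returns the node containing x, or NULL if x is not in the tree. */
const struct mnode *mway_search(const struct mnode *n, int x)
{
    while (n != NULL) {
        assert(n->nkeys <= M - 1);
        int i = 0;
        /* skip every key smaller than x */
        while (i < n->nkeys && x > n->keys[i])
            i++;
        if (i < n->nkeys && x == n->keys[i])
            return n;          /* X = Yi: found */
        /* i == 0: X < Y1; i == nkeys: X > Yk; otherwise Yi < X < Y(i+1) */
        n = n->child[i];
    }
    return NULL;
}
```

For the example search, the root holds 40, so the scan moves to the child to the right of 40. That node holds 55 and 70, and the search continues in the subtree between them, where 60 is found.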
B Tree
B Tree is a specialized m-way tree that is widely used for disk access. Each node of a
B-Tree of order m can have at most m-1 keys and m children. One of the main reasons for
using a B tree is its capability to store a large number of keys in a single node, which
keeps the height of the tree relatively small even for a large number of keys.
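A B tree of order m contains all the properties of an M way tree. In addition, it contains
the following properties.
1. Every node in a B-Tree contains at most m children.
2. Every node in a B-Tree except the root node and the leaf nodes contains at least
ceiling of m/2 children.
3. The root node must have at least 2 children, unless it is a leaf.
4. All leaf nodes must be at the same level.
It is not necessary that all the nodes contain the same number of children, but each
internal node other than the root must have at least ceiling of m/2 children.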
While performing some operations on a B Tree, a property of the B Tree may be violated,
such as the minimum number of children a node can have. To maintain the properties of the
B Tree, nodes of the tree may be split or joined.
Operations
Searching :
Searching in B Trees is similar to that in Binary search tree. For example, suppose we
search for an item 49 in the following B Tree. The process will be something like the
following:
1. Compare item 49 with root node 78. Since 49 < 78, move to its left
sub-tree.
Searching in a B tree depends upon the height of the tree. The search algorithm takes
O(log n) time to search any element in a B tree.
Inserting
Insertions are done at the leaf node level. The following algorithm needs to be followed
in order to insert an item into a B Tree.
1. Traverse the B Tree in order to find the appropriate leaf node at which the key
can be inserted.
2. If the leaf node contains less than m-1 keys, then insert the element in
increasing order.
3. Else, if the leaf node contains m-1 keys, then follow the following steps.
○ Insert the new element in the increasing order of elements.
○ Split the node into two nodes at the median.
○ Push the median element up to its parent node.
○ If the parent node also contains m-1 keys, then split it too by
following the same steps.
Example:
Insert the key 8 into the B Tree of order 5 shown in the following image.
The node now contains 5 keys, which is greater than the maximum of 5 - 1 = 4 keys.
Therefore, split the node at the median, i.e. 8, and push it up to its parent node, shown
as follows.
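The insert-then-split behaviour can be sketched in a few lines. This is a minimal sketch that assumes distinct keys; the names `Node`, `_insert` and `insert` are illustrative, and the starting leaf [5, 6, 9, 10] is a hypothetical stand-in for the node in the image.

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty for a leaf

def _insert(node, key, m):
    """Insert key below node. Return (median, right_node) if node had to split."""
    if not node.children:                # leaf: insert in increasing order
        insort(node.keys, key)
    else:
        i = bisect_left(node.keys, key)
        split = _insert(node.children[i], key, m)
        if split:                        # child split: absorb its median here
            median, right = split
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:          # still within the m-1 key limit
        return None
    mid = len(node.keys) // 2            # overflow: split at the median
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right

def insert(root, key, m):
    """Insert key into a B Tree of order m and return the (possibly new) root."""
    split = _insert(root, key, m)
    if split:                            # the root itself split: tree grows by one level
        median, right = split
        return Node([median], [root, right])
    return root

root = insert(Node([5, 6, 9, 10]), 8, 5)
print(root.keys, [c.keys for c in root.children])
```

With order 5, inserting 8 into the full leaf makes it hold 5 keys, so the median 8 moves up and the leaf splits into [5, 6] and [9, 10].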
Deletion
Deletion is also performed at the leaf nodes. The key which is to be deleted can either
be in a leaf node or in an internal node. The following algorithm needs to be followed in
order to delete a key from a B Tree.
2. If the leaf node contains more than the minimum number of keys (⌈m/2⌉ - 1 for a
node other than the root), then delete the desired key from the node.
3. If the leaf node would be left with fewer than the minimum number of keys, then
complete the keys by taking an element from the right or left sibling.
○ If the left sibling contains more than the minimum number of keys, then push
its largest element up to its parent and move the intervening element down to
the node where the key is deleted.
○ If the right sibling contains more than the minimum number of keys, then push
its smallest element up to the parent and move the intervening element down to
the node where the key is deleted.
4. If neither sibling contains more than the minimum number of keys, then create a new
leaf node by joining the two leaf nodes and the intervening element of the parent node.
5. If the parent is left with fewer than the minimum number of keys, then apply the
above process to the parent too.
If the key which is to be deleted is in an internal node, then replace it with its
in-order successor or predecessor. Since the successor or predecessor will always be in
a leaf node, the process then continues as if the key were being deleted from that
leaf node.
Example 1
Delete the key 53 from the B Tree of order 5 shown in the following figure.
53 is present in the right child of element 49. Delete it.
Now, 57 is the only element left in the node. The minimum number of elements that must
be present in a node of a B Tree of order 5 is 2, so the node has too few. Its left and
right siblings also do not have enough elements to lend one; therefore, merge the node
with its left sibling and the intervening element of the parent, i.e. 49.
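The two repair steps used during deletion, borrowing from a sibling and merging with a sibling, can be sketched as small helpers. This is only a sketch of those two steps, not a full deletion routine; the names `borrow_from_left` and `merge_with_left` and the key values other than 49 and 57 are hypothetical.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def borrow_from_left(parent, i):
    """Move one key from children[i-1] through the parent into children[i]."""
    left, node = parent.children[i - 1], parent.children[i]
    node.keys.insert(0, parent.keys[i - 1])   # intervening key moves down
    parent.keys[i - 1] = left.keys.pop()      # left sibling's largest key moves up
    if left.children:                         # internal nodes also hand over a child
        node.children.insert(0, left.children.pop())

def merge_with_left(parent, i):
    """Join children[i-1], the intervening parent key and children[i] into one node."""
    left, node = parent.children[i - 1], parent.children[i]
    left.keys += [parent.keys.pop(i - 1)] + node.keys
    left.children += node.children
    parent.children.pop(i)

# Order 5: after deleting 53 the middle leaf holds only 57 (minimum is 2 keys),
# and neither sibling has a key to spare, so the leaf is merged with its left sibling.
parent = Node([49, 63], [Node([23, 34]), Node([57]), Node([70, 75])])
merge_with_left(parent, 1)
print(parent.keys, [c.keys for c in parent.children])
```

After the merge, the parent has lost one key, which is why step 5 of the algorithm re-checks the parent.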
Application of B tree
A B Tree is used to index data and provides fast access to the actual data stored on
disk, since accessing a value stored in a large database on disk is a very time-consuming
process.
Searching an un-indexed and unsorted database containing n key values needs O(n)
running time in the worst case. However, if we use a B Tree to index this database, it can
be searched in O(log n) time in the worst case.
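To see why a large order keeps the tree shallow, the following sketch counts how many levels are needed to hold n keys when every node is completely full (the best case). The function name `min_levels` and the sample values are assumptions chosen for illustration.

```python
def min_levels(n, m):
    """Fewest levels in which an order-m tree with full nodes can hold n keys."""
    levels, capacity, nodes = 0, 0, 1
    while capacity < n:
        capacity += nodes * (m - 1)   # each node on this level holds m-1 keys
        nodes *= m                    # each node has m children on the next level
        levels += 1
    return levels

print(min_levels(1_000_000, 101))   # order 101: 3 levels
print(min_levels(1_000_000, 2))     # one key per node, like a binary tree: 20 levels
```

Since each level visited typically costs one disk access, fewer levels means fewer disk reads per search.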
B+ Tree
B+ Tree is an extension of B Tree which allows efficient insertion, deletion and search
operations.
In a B Tree, keys and records can both be stored in the internal as well as the leaf nodes,
whereas in a B+ Tree, records (data) can only be stored in the leaf nodes, while internal
nodes can only store key values.
The leaf nodes of a B+ Tree are linked together in the form of a singly linked list to
make search queries more efficient.
B+ Trees are used to store large amounts of data which cannot be stored in the main
memory. Because the size of main memory is always limited, the internal nodes
(keys to access records) of the B+ Tree are stored in the main memory, whereas the leaf
nodes are stored in the secondary memory.
The internal nodes of a B+ Tree are often called index nodes. A B+ Tree of order 3 is shown
in the following figure.
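The benefit of linking the leaves can be illustrated with a range query. The sketch below is a simplified illustration: the `Leaf` class, the `range_scan` function and the sample keys are hypothetical, and the scan starts at the first leaf for brevity, whereas a real B+ Tree would first descend through the index nodes to the leaf containing the lower bound.

```python
class Leaf:
    def __init__(self, keys, records):
        self.keys = keys          # sorted keys held in this leaf
        self.records = records    # one record per key
        self.next = None          # link to the next leaf on the right

def range_scan(first_leaf, low, high):
    """Collect the records with low <= key <= high by walking the leaf chain."""
    out = []
    leaf = first_leaf
    while leaf is not None:
        for k, r in zip(leaf.keys, leaf.records):
            if k > high:          # keys are sorted, so nothing further can match
                return out
            if k >= low:
                out.append(r)
        leaf = leaf.next
    return out

a = Leaf([10, 20], ["r10", "r20"])
b = Leaf([30, 40], ["r30", "r40"])
c = Leaf([50, 60], ["r50", "r60"])
a.next, b.next = b, c
print(range_scan(a, 20, 50))
```

Once the scan reaches the leaf level it never has to go back up to the index nodes.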
Advantages of B+ Tree
1. Records can be fetched in an equal number of disk accesses.
5. Search queries are faster, as the data is stored only in the leaf nodes.
B Tree vs B+ Tree
| SN | B Tree | B+ Tree |
|----|--------|---------|
| 1 | Search keys cannot be stored repeatedly. | Redundant search keys can be present. |
| 2 | Data can be stored in leaf nodes as well as internal nodes. | Data can only be stored in the leaf nodes. |
| 3 | Searching for some data is a slower process, since data can be found in internal nodes as well as in the leaf nodes. | Searching is comparatively faster, as data can only be found in the leaf nodes. |
| 4 | Deletion from internal nodes is complicated and time-consuming. | Deletion is never a complex process, since an element is always deleted from the leaf nodes. |
| 5 | Leaf nodes cannot be linked together. | Leaf nodes are linked together to make search queries more efficient. |
Insertion in B+ Tree
Step 1: Insert the new key into the appropriate leaf node.
Step 2: If the leaf doesn't have the required space, split the node and copy the middle
key to the next index node.
Step 3: If the index node doesn't have the required space, split the node and move the
middle element up to the next index node.
Example :
Insert the value 195 into the B+ Tree of order 5 shown in the following figure.
195 will be inserted in the right sub-tree of 120, after 190. Insert it at the desired
position.
The node now contains more than the maximum number of elements, i.e. 4; therefore, split
it and place the median key up in the parent.
Now, the index node contains 6 children and 5 keys, which violates the B+ Tree
properties; therefore, we need to split it, shown as follows.
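The two kinds of split differ in what happens to the middle key. The sketch below assumes the common convention that a leaf split copies the separator up (so it also stays in the leaf), while an index split moves the middle key up. The function names and the key lists are hypothetical and only loosely modelled on the example.

```python
def split_leaf(keys):
    """Split an overfull leaf. The separator is copied up and stays in the right leaf."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]

def split_index(keys):
    """Split an overfull index node. The middle key moves up and leaves this level."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

# Order 5: a leaf may hold at most 4 keys, so 5 keys force a split.
print(split_leaf([129, 154, 190, 195, 200]))
# An index node with 5 keys (6 children) must split as well.
print(split_index([60, 78, 108, 120, 190]))
```

Copying the key at the leaf level keeps every record reachable from the leaves, while the index levels only need the key as a signpost.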
Deletion in B+ Tree
Step 1: Delete the key and data from the leaves.
Step 2: If the leaf node contains fewer than the minimum number of elements, merge the
node with its sibling and delete the key in between them.
Step 3: If the index node contains fewer than the minimum number of elements, merge the
node with its sibling and move down the key in between them.
Example
Delete the key 200 from the B+ Tree shown in the following figure.
200 is present in the right sub-tree of 190, after 195. Delete it.
Merge the two nodes by using 195, 190, 154 and 129.
Now, element 120 is the single element present in the node, which violates the B+
Tree properties. Therefore, we need to merge it by using 60, 78, 108 and 120.
Threaded Binary Tree
In the linked representation of binary trees, more than one half of the link fields contain
NULL values, which results in wastage of storage space. If a binary tree consists of n
nodes, then n+1 of its 2n link fields contain NULL values. To manage the space
effectively, a method was devised by Perlis and Thornton in which the NULL links are
replaced with special links known as threads. Such binary trees with threads are known
as threaded binary trees. Each link field of a node in a threaded binary tree contains
either a link to a child node or a thread to another node in the tree.
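The n+1 count can be checked on a small tree. The following sketch uses an illustrative `Node` class and a hypothetical six-node tree; it counts the nodes and the NULL link fields in one pass.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def count_nodes_and_nulls(node):
    """Return (number of nodes, number of NULL link fields) for the subtree."""
    if node is None:
        return 0, 1                      # an empty link is one NULL field
    left_nodes, left_nulls = count_nodes_and_nulls(node.left)
    right_nodes, right_nulls = count_nodes_and_nulls(node.right)
    return left_nodes + right_nodes + 1, left_nulls + right_nulls

# Hypothetical tree whose inorder traversal is D, B, E, A, C, F.
tree = Node("A",
            Node("B", Node("D"), Node("E")),
            Node("C", None, Node("F")))
print(count_nodes_and_nulls(tree))   # 6 nodes, 7 NULL links
```

Of the 2n = 12 link fields in this tree, 7 are NULL, which is the space that threads put to use.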
Types of Threaded Binary Tree
In one-way threaded binary trees, a thread will appear in either the right or the left link
field of a node. If it appears in the right link field of a node, then it will point to the next
node that will appear on performing inorder traversal. Such trees are called right threaded
binary trees. If the thread appears in the left link field of a node, then it will point to the
node's inorder predecessor. Such trees are called left threaded binary trees. Left threaded
binary trees are used less often, as they don't yield the advantages of right threaded
binary trees. In one-way threaded binary trees, the right link field of the last node and the
left link field of the first node contain NULL. In order to distinguish threads from normal
links, they are represented by dotted lines.
In the above figure, the inorder traversal of the binary tree yields D, B, E, A, C, F.
When this tree is represented as a right threaded binary tree, the right link field of leaf
node D, which contains a NULL value, is replaced with a thread that points to node B,
which is the inorder successor of node D. In the same way, the other nodes whose right
link field contains a NULL value will contain a thread to their inorder successor.
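The main advantage of right threads is that inorder traversal needs neither recursion nor a stack. The sketch below assumes a boolean flag `rthread` that marks a right link as a thread; the class `TNode`, the helper names, and the tree shape (chosen so that its inorder traversal is D, B, E, A, C, F) are illustrative assumptions.

```python
class TNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
        self.rthread = False   # True when right is a thread, not a child link

def leftmost(node):
    while node.left is not None:
        node = node.left
    return node

def inorder(root):
    """Inorder traversal of a right threaded binary tree without a stack."""
    out = []
    node = leftmost(root) if root is not None else None
    while node is not None:
        out.append(node.data)
        if node.rthread:                  # thread: jump straight to the successor
            node = node.right
        elif node.right is not None:      # real child: go to its leftmost node
            node = leftmost(node.right)
        else:                             # last node: right link is NULL
            node = None
    return out

A, B, C, D, E, F = (TNode(x) for x in "ABCDEF")
A.left, A.right = B, C
B.left, B.right = D, E
C.right = F
D.right, D.rthread = B, True   # thread from D to its inorder successor B
E.right, E.rthread = A, True   # thread from E to its inorder successor A
print(inorder(A))
```

Node F is the last node in inorder, so its right link stays NULL, as described above.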
In two-way threaded binary trees, a right link field that contains a NULL value is
replaced by a thread that points to the node's inorder successor, and a left link field
that contains a NULL value is replaced by a thread that points to the node's inorder
predecessor.
In the above figure, the inorder traversal of the binary tree yields D, B, E, G, A, C, F. If
we consider the two-way threaded binary tree, the left link field of node E, which contains
NULL, is replaced by a thread pointing to its inorder predecessor, i.e. node B. Similarly,
for node G, whose right and left link fields both contain NULL values, both are replaced by
threads such that the right link field points to its inorder successor and the left link field
points to its inorder predecessor. In the same way, the NULL link fields of the other nodes
are filled with threads.
In the above figure of the two-way threaded binary tree, we notice that no left thread is
possible for the first node and no right thread is possible for the last node. This is
because they don't have any inorder predecessor and successor respectively. This is
indicated by threads pointing nowhere. So, in order to maintain the uniformity of threads,
we maintain a special node called the header node. The header node does not contain
any data part; its left link field points to the root node and its right link field points to
itself. If this header node is included in the two-way threaded binary tree, then this node
becomes the inorder predecessor of the first node and the inorder successor of the last
node. Now the thread in the left link field of the first node and the thread in the right link
field of the last node point to the header node.
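The header-node arrangement can be sketched as follows. This is one possible way to build the threads, assuming boolean flags `lthread` and `rthread` on each node and a non-empty tree; the names `Node`, `add_threads` and `inorder`, and the tree shape (chosen so that its inorder traversal is D, B, E, G, A, C, F), are assumptions made for the example.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right
        self.lthread = self.rthread = False   # True when the link is a thread

def add_threads(root):
    """Replace every NULL link with a thread and return the header node."""
    header = Node(None)
    header.left, header.right = root, header
    order = []                                # nodes in inorder sequence
    def walk(n):
        if n is not None:
            walk(n.left); order.append(n); walk(n.right)
    walk(root)
    for i, n in enumerate(order):
        pred = order[i - 1] if i > 0 else header            # first node -> header
        succ = order[i + 1] if i + 1 < len(order) else header  # last node -> header
        if n.left is None:
            n.left, n.lthread = pred, True
        if n.right is None:
            n.right, n.rthread = succ, True
    return header

def inorder(header):
    """Traverse using threads only; the walk ends on returning to the header."""
    out, node = [], header.left
    while not node.lthread:                   # go to the first (leftmost) node
        node = node.left
    while node is not header:
        out.append(node.data)
        if node.rthread:
            node = node.right                 # follow the thread to the successor
        else:
            node = node.right                 # real child: find its leftmost node
            while not node.lthread:
                node = node.left
    return out

D, G, F = Node("D"), Node("G"), Node("F")
E = Node("E", None, G)
root = Node("A", Node("B", D, E), Node("C", None, F))
header = add_threads(root)
print(inorder(header))
```

With the header in place, no link field is left NULL: the left thread of D and the right thread of F both lead to the header.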