UNIT 2 Part2

Uploaded by

Vsarchana Qa
UNIT 2 – part 2

Physical database design issues:


The main issues considered during physical database design are the storage format for each attribute (choosing the data type), the grouping of attributes into physical records, the arrangement of records into file organizations, the selection of structures for storing and connecting files so that data retrieval is efficient, and the preparation of strategies for handling queries against the database that optimize performance.

Physical Database Design Issues


The database design process begins with logical design, using E-R diagrams, normalisation, etc., and is followed by physical design.
The key issues in physical database design are:

o The purpose of physical database design is to translate the logical description of data into the technical specifications for storing and retrieving data in the DBMS.
o The aim is to create a design for storing data that gives adequate performance and ensures database integrity, recoverability and security.

Some of the basic inputs needed for Physical Database Design are:

o Normalised relations
o Attribute definitions - choosing the data type for each attribute from the logical data model to minimize storage space and maximize data integrity.
o Data usage (retrieved, entered, deleted, updated) - arranging similarly structured records in secondary memory so that individual records and groups of records can be stored, retrieved and updated rapidly.
o Requirements for security, recovery, backup, retention and integrity - protecting data and recovering it after errors are found.
o DBMS characteristics - selecting structures for storing and connecting files to make retrieving related data more efficient.
o Performance criteria, such as required response times with respect to volume estimates - preparing strategies for handling queries against the database that optimize performance and take advantage of the file organizations and indexes that have been specified.

Given these inputs, the physical database design decisions to be taken are:

o 1. Estimate the size and usage patterns of the database (data volume and access frequencies). This is the first step in physical database design.
o 2. Choose the data type for each attribute from the logical data model so as to minimize storage space and maximize data integrity (optimising attribute data types and, where needed, modifying the logical design).
o 3. Specify the file organisation.
o 4. Select indexes and other structures for storing and connecting files, to make retrieving related data more efficient.
o 5. Prepare strategies for handling queries against the database.

Database integrity and performance are important issues to consider.

The following decisions affect these issues during physical database design.

1. Choosing the storage format for each attribute from the logical data model.

2. Grouping attributes from the logical data model into physical records.

3. Arranging similarly structured records in secondary memory so that individual records and groups of records can be stored, retrieved, and updated rapidly.

4. Selecting structures for storing and connecting files to make retrieving related
data more efficient.

5. Preparing strategies for handling queries against the database that will
optimize performance and take advantage of the file organizations and indexes
that you have specified.

How the database is stored on hard disks:
The records in a database are stored in files. Physically, the data is stored in electromagnetic form on a storage device. The storage devices used in database systems are classified as follows:
1. Primary Memory
2. Secondary Memory
3. Tertiary Memory

Types of Memory

1. Primary Memory
The primary memory of a server is the type of data storage that is directly accessible by the central processing unit, meaning that it doesn’t require any other devices to read from it. For primary memory to work reliably, the electric power supply, the hardware backup system, the supporting devices and the cooling that moderates the system temperature must all function properly.
• These devices are considerably smaller in capacity than other storage, and they are volatile: their contents are lost when power is removed.
• In terms of performance and speed, primary memory devices are the fastest, and this speed comes at the expense of capacity: the faster the memory, the smaller it tends to be.
• Primary memory devices are usually more expensive per unit of storage because of their speed and performance.
The cache is one type of primary memory.
• Cache Memory: Cache memory is a special, very high-speed memory used to keep pace with a high-speed CPU. It is costlier than main memory or disk storage but more economical than CPU registers. It acts as a buffer between RAM and the CPU.
2. Secondary Memory
Secondary storage devices, as the name suggests, store data that will be needed at a later point in time for various purposes or database actions. These storage systems are therefore sometimes used as backup units as well. Devices that are plugged in or connected as peripherals fall under this category, unlike primary memory, which the CPU addresses directly. The capacity of this group of devices is noticeably larger than that of primary devices and smaller than that of tertiary devices.
• Secondary storage is non-volatile: it holds data for as long as it is needed, and the data stays until the user deletes it. Secondary storage devices are slower than primary storage devices but faster than tertiary devices.
• It usually has a higher capacity than primary storage, although actual capacities keep changing as technology advances.
Some commonly used secondary memory types that are present in almost every system are:
• Flash Memory: Flash memory, also known as flash storage, is a type of non-volatile memory that erases data in units called blocks and rewrites data at the byte level. Flash memory is widely used for storage and data transfer in consumer devices, enterprise systems, and industrial applications. Like hard drives, flash memory retains data after the power has been turned off; unlike hard drives, it has no moving parts.
• Magnetic Disk Storage: A magnetic disk is a type of secondary memory consisting of a flat disc covered with a magnetic coating that holds information. It is used to store programs and files. Magnetization in one direction represents 1, and magnetization in the opposite direction represents 0.
3. Tertiary Memory
For data storage, tertiary memory refers to devices that can hold a large amount of data without being constantly connected to the server or its peripherals. A device of this type is attached from outside, either to a server or to the device where the database is stored.
• Tertiary storage provides more space than the other types of memory but is the slowest, and its cost per unit of storage is lower than that of primary and secondary storage. It is commonly used for backups, that is, for making copies of data from servers and databases.
• As with secondary devices, the contents of tertiary devices can be deleted when they are no longer needed.
Some commonly used tertiary memory types are:
• Optical Storage: It is a type of storage where reading and
writing are to be performed with the help of a laser. Typically
data written on CDs and DVDs are examples of Optical
Storage.
• Tape Storage: Tape storage uses magnetic tape to store data. It is used to keep data for a long time and for backups that allow recovery in case of data loss.
Memory Hierarchy
A computer system has a hierarchy of memory. The CPU has direct access to its built-in registers and to main memory. Accessing main memory takes longer than the CPU needs to execute an instruction, so cache memory is introduced to reduce this difference in speed. The data most frequently accessed by the CPU resides in cache memory, which provides faster access than main memory. The fastest memory is also the most expensive. Large storage devices are slower and less expensive than CPU registers and cache memory, but they can store a much greater amount of data.
1. Magnetic Disks
Present-day computer systems use hard disk drives as secondary storage devices. Magnetic disks store information using magnetism. Hard disks are made of metal platters coated with a magnetizable material, and the platters are mounted on a spindle. As the read/write head moves over a platter, it magnetizes or de-magnetizes the spot under it. Each spot can be read as either 0 (zero) or 1 (one). Formatted hard disks store data in a well-defined layout. Each platter is divided into many concentric circles, called tracks, and each track contains a number of sectors. Data on a hard disk is typically stored in sectors of 512 bytes.
2. Redundant Array of Independent Disks(RAID)
In the Redundant Array of Independent Disks technology, two or
more secondary storage devices are connected so that the
devices operate as one storage medium. A RAID array consists
of several disks linked together for a variety of purposes. Disk
arrays are categorized by their RAID levels.
• RAID 0: At this level, disks are organized in a striped array. Data is divided into blocks, and the blocks are distributed across the disks. Because the disks can be written and read in parallel, performance and speed improve. Level 0 provides no parity and no backup, so a single disk failure loses data.

Raid-0

• RAID 1: Mirroring is used in RAID 1. When data is sent to the RAID controller, it writes a copy of the data to every disk in the array. In case of a disk failure, RAID level 1 provides 100% redundancy.

Raid-1

• RAID 2: The data in RAID 2 is striped across different disks, and an Error Correction Code (ECC) is computed using a Hamming code. Striping is done at the bit level: each bit within a word is stored on a separate disk, and the ECC codes for the data words are saved on a separate set of disks. Because of its complex structure and high cost, RAID 2 is rarely deployed commercially.

Raid-2

• RAID 3: Data is striped across multiple disks in RAID 3. A parity bit is generated for each data word and stored on a separate disk. Thus, the array can survive the failure of a single disk.

Raid-3

• RAID 4: This level writes entire blocks of data onto the data disks, then generates the parity and stores it on a separate disk. At level 3, bytes are striped, while at level 4, blocks are striped. Both levels 3 and 4 require a minimum of three disks.
Raid-4

• RAID 5: The data blocks in RAID 5 are written to different disks, but the parity is distributed across all the disks rather than being stored on a separate dedicated disk.

Raid-5

• RAID 6: The RAID 6 level extends the level 5 concept. Two independent parities are generated and stored, distributed across multiple disks. This level needs at least four disk drives.
Raid-6
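The parity idea behind RAID 3, 4 and 5 can be sketched in a few lines. This is only an illustration, assuming XOR parity over one stripe; the block contents and the helper name `xor_blocks` are made up for the example and do not describe any particular RAID controller.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of several equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe, each assumed to live on a different disk.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the second block: XOR-ing the
# surviving data blocks with the parity block reproduces the lost block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, any single missing block of a stripe can be recomputed from the remaining blocks, which is why these levels tolerate one disk failure.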

Storage Hierarchy
Besides the storage devices mentioned above, other devices are also used in day-to-day life. In the storage hierarchy they are ordered from the fastest at the top to the slowest at the bottom.
Storage Hierarchy

A DBMS must balance the utilization of primary, secondary, and tertiary memory. Secondary memory meets long-term storage demands, tertiary memory can be used for archiving, and primary memory guarantees quick access for active data. Using various storage types strategically in accordance with needs and patterns of data access is essential for optimal database performance.
File organization and its types
File Organization
o A file is a collection of records. Records can be accessed using the primary key. The type and frequency of access that can be supported depend on the file organization used for a given set of records.
o File organization is a logical relationship among the various records. It defines how file records are mapped onto disk blocks.
o File organization describes the way in which records are stored in blocks and the way blocks are placed on the storage medium.
o The first approach to mapping the database to files is to use several files and store records of only one fixed length in any given file. An alternative approach is to structure the files so that they can hold records of multiple lengths.
o Files of fixed-length records are easier to implement than files of variable-length records.

Objective of file organization


o Records should be selected optimally, i.e., as fast as possible.
o Insert, delete and update transactions on the records should be quick and easy.
o Insert, update or delete operations should not introduce duplicate records.
o Records should be stored efficiently, at minimal storage cost.
Types of file organization:

There are various methods of file organization. Each method has pros and cons with respect to access or selection. The programmer chooses the file organization method best suited to the requirements.

Types of file organization are as follows:

o Sequential file organization
o Heap file organization
o Hash file organization
o B+ file organization
o Indexed sequential access method (ISAM)
o Cluster file organization
Sequential File Organization
This is the simplest method of file organization. In this method, records are stored sequentially. It can be implemented in two ways:

1. Pile File Method:


o It is quite a simple method. Records are stored in sequence, i.e., one after another, in the order in which they are inserted into the table.
o To update or delete a record, the record is first searched for in the memory blocks. When it is found, it is marked for deletion, and (for an update) the new version of the record is inserted.

Insertion of the new record:

Suppose we have a sequence of records R1, R3 and so on up to R9 and R8. Each record is simply a row in the table. If we want to insert a new record R2 into the sequence, it is placed at the end of the file.

2. Sorted File Method:


o In this method, the new record is first inserted at the end of the file, and then the sequence is sorted in ascending or descending order. Records are sorted on the primary key or on any other key.
o When a record is modified, the record is updated and the file is sorted again, so that the updated record ends up in the right place.
Insertion of the new record:

Suppose there is a preexisting sorted sequence of records R1, R3 and so on up to R6 and R7. If a new record R2 has to be inserted into the sequence, it is first inserted at the end of the file, and then the sequence is sorted.
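The difference between the two insertion methods can be sketched as follows. This is a minimal illustration, assuming each record is represented only by its integer key (1 stands for R1, and so on) and the file by a Python list; the function names `pile_insert` and `sorted_insert` are made up for the example.

```python
def pile_insert(file_records, record):
    """Pile file: the new record simply goes to the end of the file."""
    file_records.append(record)

def sorted_insert(file_records, record):
    """Sorted file: append at the end, then re-sort the whole sequence."""
    file_records.append(record)
    file_records.sort()

# Pile file holding R1, R3, R9, R8 (keys only), then R2 arrives.
pile_file = [1, 3, 9, 8]
pile_insert(pile_file, 2)

# Sorted file holding R1, R3, R6, R7 (keys only), then R2 arrives.
sorted_file = [1, 3, 6, 7]
sorted_insert(sorted_file, 2)
```

After the calls, the pile file keeps R2 at the end, while the sorted file has R2 between R1 and R3. The extra sort on every insert is the cost listed under the cons of the sorted file method.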

Pros of sequential file organization


o It is a fast and efficient method for processing huge amounts of data in order.
o Files can easily be stored on cheaper storage media such as magnetic tapes.
o It is simple in design and requires little effort to store the data.
o It is used when most of the records have to be accessed, as in calculating students' grades or generating salary slips.
o It is used for report generation and statistical calculations.

Cons of sequential file organization


o It wastes time because we cannot jump directly to a particular record; we have to move through the file sequentially.
o The sorted file method takes more time and space for sorting the records.

Heap file organization


o It is the simplest and most basic type of organization. It works with data blocks. In heap file organization, records are inserted at the end of the file. Inserting records requires no sorting or ordering.
o When a data block is full, the new record is stored in some other block. This block need not be the very next data block; the DBMS can select any data block in memory to store new records. The heap file is also known as an unordered file.
o Every record in the file has a unique id, and every page in the file is of the same size. It is the DBMS's responsibility to store and manage the new records.

Insertion of a new record


Suppose we have five records R1, R3, R6, R4 and R5 in a heap, and we want to insert a new record R2. If data block 3 is full, R2 is inserted into any data block selected by the DBMS, say data block 1.
To search, update or delete data in a heap file, we need to traverse the data from the start of the file until we reach the requested record.

If the database is very large, searching, updating or deleting a record is time-consuming because the records are not sorted or ordered: we may need to check all the data before we find the requested record.
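A heap file can be sketched as a list of fixed-capacity blocks. The sketch below is an illustration only: the block capacity of 2 and the first-free-block placement policy are assumptions (a real DBMS may pick any block), and the class name `HeapFile` is made up for the example.

```python
BLOCK_CAPACITY = 2  # assumed number of records per data block

class HeapFile:
    def __init__(self, num_blocks):
        self.blocks = [[] for _ in range(num_blocks)]

    def insert(self, record):
        """Store the record in the first block with free space (no ordering)."""
        for block_no, block in enumerate(self.blocks):
            if len(block) < BLOCK_CAPACITY:
                block.append(record)
                return block_no
        # Every block is full: allocate a new block at the end of the file.
        self.blocks.append([record])
        return len(self.blocks) - 1

    def search(self, key):
        """Scan from the start of the file; also report how many blocks were read."""
        blocks_read = 0
        for block in self.blocks:
            blocks_read += 1
            for record in block:
                if record[0] == key:
                    return record, blocks_read
        return None, blocks_read

heap = HeapFile(num_blocks=3)
for key in (1, 3, 6, 4, 5, 2):          # R1, R3, R6, R4, R5, then R2
    heap.insert((key, f"R{key}"))

found, blocks_read = heap.search(2)      # R2 sits in the last block
```

Here the search for R2 has to read all three blocks before it succeeds, which is the linear scan cost described above.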

Pros of Heap file organization


o It is a very good method of file organization for bulk insertion. If a large amount of data needs to be loaded into the database at one time, this method is best suited.
o For a small database, fetching and retrieving records is faster than with sequential file organization.

Cons of Heap file organization


o This method is inefficient for large databases because it takes time to search for or modify a record.

Hash File Organization


Hash file organization computes a hash function on some fields of the records. The output of the hash function determines the location of the disk block where the record is to be placed.
When a record has to be retrieved using the hash key columns, the address is generated and the whole record is retrieved from that address. In the same way, when a new record has to be inserted, the address is generated from the hash key and the record is inserted directly. The same process applies to delete and update.

In this method, there is no need to search or sort the entire file. Because placement depends only on the hash value, records end up scattered across memory in no particular key order.
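The address computation can be sketched like this. It is a minimal illustration, assuming an integer key, four buckets standing in for disk blocks, and a simple modulo hash function; the names `bucket_address`, `insert` and `lookup` and the sample records are hypothetical.

```python
NUM_BUCKETS = 4  # assumed number of disk blocks (buckets)

def bucket_address(key):
    """Assumed hash function: the key modulo the number of buckets."""
    return key % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(record):
    """Compute the address from the hash key and place the record there directly."""
    buckets[bucket_address(record["id"])].append(record)

def lookup(key):
    """Compute the same address and look only inside that one bucket."""
    for record in buckets[bucket_address(key)]:
        if record["id"] == key:
            return record
    return None

for emp_id, name in [(103, "Asha"), (104, "Ravi"), (105, "Meena")]:
    insert({"id": emp_id, "name": name})

hit = lookup(104)
miss = lookup(999)
```

Only the bucket at the computed address is examined, so neither insert nor lookup touches the rest of the file.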
B+ File Organization
o B+ tree file organization is an advanced form of the indexed sequential access method. It uses a tree-like structure to store records in a file.
o It uses the same key-index concept, where the primary key is used to sort the records. For each primary key, an index value is generated and mapped to the record.
o The B+ tree is similar to a binary search tree (BST), but a node can have more than two children. In this method, all the records are stored only at the leaf nodes. Intermediate nodes act as pointers to the leaf nodes and do not contain any records.
In the example B+ tree:
o There is one root node of the tree, i.e., 25.
o There is an intermediary layer of nodes. They do not store actual records; they hold only pointers to the leaf nodes.
o The node to the left of the root holds a value smaller than the root and the node to the right holds a larger value, i.e., 15 and 30 respectively.
o The leaf level holds the actual values, i.e., 10, 12, 17, 20, 24, 27 and 29.
o Searching for any record is easy because all the leaf nodes are at the same level (the tree is balanced).
o Any record can be reached by traversing a single path from the root.

Pros of B+ tree file organization


o Searching is very easy because all the records are stored only in the leaf nodes, which are sorted and connected as a sequential linked list.
o Traversing the tree structure is easy and fast.
o The size of the B+ tree has no restrictions, so the number of records can increase or decrease and the tree can grow or shrink accordingly.
o It is a balanced tree structure, and inserts, updates and deletes do not degrade the performance of the tree.

Cons of B+ tree file organization


o This method is inefficient for static tables, whose data rarely changes.

Indexed sequential access method (ISAM)


ISAM is an advanced form of sequential file organization. In this method, records are stored in the file using the primary key. An index value is generated for each primary key and mapped to the record. This index contains the address of the record in the file.
If a record has to be retrieved based on its index value, the address of the data block is fetched and the record is retrieved from memory.

Pros of ISAM:
o Because each record's index entry holds the address of its data block, searching for a record in a huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index is based on the primary key values, we can retrieve the data for a given range of values. In the same way, partial values can be searched easily, e.g., student names starting with 'JA'.

Cons of ISAM
o This method requires extra disk space to store the index values.
o When new records are inserted, the files have to be reconstructed to maintain the sequence.
o When a record is deleted, the space it used needs to be released; otherwise, the performance of the database will degrade.

Cluster file organization


o When records of two or more tables are stored in the same file, the file is known as a cluster. Such a file holds two or more tables in the same data block, and the key attributes used to map these tables together are stored only once.
o This method reduces the cost of searching for related records in different files.
o Cluster file organization is used when the tables frequently need to be joined with the same condition. These joins return only a few records from both tables, for example the records for particular departments only. The method is not suited to retrieving the records of all departments at once.

In this method, we can directly insert, update or delete any record. Data is sorted on the key used for searching. The cluster key is the key on which the tables are joined.

Types of Cluster file organization:


Cluster file organization is of two types:
1. Indexed Clusters:

In an indexed cluster, records are grouped based on the cluster key and stored together. The EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster: all the records are grouped on the cluster key DEP_ID.

2. Hash Clusters:

It is similar to the indexed cluster. In a hash cluster, instead of storing the records based on the cluster key itself, we generate a hash value for the cluster key and store together the records that have the same hash value.
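The clustering idea can be sketched with the EMPLOYEE and DEPARTMENT tables grouped on DEP_ID. This is a rough illustration only: the dictionary `clusters` stands in for data blocks, and the helper names and sample rows are hypothetical.

```python
from collections import defaultdict

# One entry per cluster key (DEP_ID). The key is stored once, and the
# department row and its employee rows are kept together.
clusters = defaultdict(lambda: {"department": None, "employees": []})

def add_department(dep_id, dep_name):
    clusters[dep_id]["department"] = dep_name

def add_employee(dep_id, emp_name):
    clusters[dep_id]["employees"].append(emp_name)

def join_for_department(dep_id):
    """Join EMPLOYEE and DEPARTMENT for one DEP_ID by reading a single cluster."""
    cluster = clusters[dep_id]
    return [(dep_id, cluster["department"], emp) for emp in cluster["employees"]]

add_department(10, "Sales")
add_department(20, "Accounts")
add_employee(10, "Asha")
add_employee(20, "Ravi")
add_employee(10, "Meena")

sales_rows = join_for_department(10)
```

The join for one department reads only that department's cluster, which is why the method suits frequent joins on the same condition. A hash cluster would apply a hash function to DEP_ID before choosing the cluster.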

Pros of Cluster file organization


o Cluster file organization is useful when there are frequent requests to join the tables with the same join condition.
o It gives efficient results when there is a 1:M mapping between the tables.

Cons of Cluster file organization


o This method performs poorly for very large databases.
o If the join condition changes, this method cannot be used: traversing the file with a different join condition takes a lot of time.
o This method is not suitable for tables with a 1:1 relationship.

TYPES of INDEXES
Indexing in DBMS – Types of Indexes in Database
A database index is a data structure that improves the speed of data access. However, it comes at the cost of additional write operations and the storage space needed to hold the index. The index helps locate data quickly without having to search every row of the database. The process of creating an index for a database is known as indexing. This guide covers the various types of indexes in a DBMS (database management system), with examples.

Real life example of Indexing


1. The first few pages of a book contain its index, which tells you which topic is covered on which page. This helps you quickly locate a topic in the book. Without the index, you would have to scan the entire book to find the topic, which would take a long time.

2. In a library, the books are arranged on the shelves in alphabetical order. If you are looking for a book starting with the letter ‘A’, you go to shelf ‘A’. Here the shelf label ‘A’ is the index. If the books were not arranged alphabetically on the shelves, it would take a very long time to find a book.

Index structure in Database


The most common index data structure contains two fields.

1. The first field is the search key: the column that a user can use to access the record quickly. For example, a user searching for a student in the database can use the student id as a search key to quickly locate the student record.
2. The second field contains the address of the student record in the database. Indexing doesn’t replicate the whole database; it creates an index that refers to the actual data, so this field is a reference to the data. If a user is searching for the student with student id “S01”, then S01 is the search key and the second field of the index contains the address where the student data, such as name, age and address, is stored.
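The two-field structure can be sketched as a mapping from search key to record address. This is an illustration only: the list `data_file` stands in for the stored records, list positions stand in for disk addresses, and the sample students are made up.

```python
# The data file: records stored in arrival order (position = assumed address).
data_file = [
    {"student_id": "S03", "name": "Ravi", "age": 21},
    {"student_id": "S01", "name": "Asha", "age": 20},
    {"student_id": "S02", "name": "Meena", "age": 22},
]

# The index: field 1 is the search key, field 2 is the address of the record.
# It refers to the data; it does not copy the records themselves.
index = {record["student_id"]: address for address, record in enumerate(data_file)}

def find_student(student_id):
    """Look up the address in the index, then fetch the record at that address."""
    address = index.get(student_id)
    return None if address is None else data_file[address]

student = find_student("S01")
```

The lookup goes through the index to a single address instead of scanning every record in the data file.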
Indexing Methods

Ordered indices

The indices are usually sorted to make searching faster. Indices that are sorted are known as ordered indices.

Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long. Their IDs start with 1, 2, 3 and so on, and we have to search for the employee with ID 543.

o In the case of a database with no index, we have to search the disk blocks from the start until we reach 543. The DBMS reads the record after reading 543*10 = 5430 bytes.
o In the case of an index, assuming each index entry is 2 bytes long, the DBMS reads the record after reading 542*2 = 1084 bytes, which is far less than in the previous case.
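The arithmetic in this example can be checked with a short calculation. The 2-byte index entry size is an assumption inferred from the 542*2 figure in the text; the variable names are illustrative.

```python
RECORD_SIZE = 10        # bytes per record, as stated in the example
INDEX_ENTRY_SIZE = 2    # bytes per index entry (assumed from the 542*2 figure)
target_id = 543

# Without an index: read every record up to and including record 543.
bytes_without_index = target_id * RECORD_SIZE

# With an index: skip over the 542 index entries that precede the target.
bytes_with_index = (target_id - 1) * INDEX_ENTRY_SIZE

savings_ratio = bytes_without_index / bytes_with_index
```

Under these assumptions the indexed search reads roughly one fifth of the bytes that the full scan reads.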
Primary Index
o If the index is created on the basis of the primary key of the table, it is known as primary indexing. Primary keys are unique to each record, so there is a 1:1 relation between index entries and records.
o As primary keys are stored in sorted order, the searching operation is quite efficient.
o The primary index can be classified into two types: dense index and sparse index.

1. Dense Index
In a dense index, there is an index entry for every record in the database. For example, if a student table contains 100 records, a dense index has 100 entries, one for each record in the table.

If more than one record has the same search key, the dense index points to the first record in the database that has that search key.

The name dense reflects the fact that every record in the database has a corresponding entry in the index file, so the index file is very dense.
Advantages of dense indexes:
1. Searching for a record is faster compared to other indexes.
2. The database doesn’t need to be sorted in any order to generate a dense index.

Disadvantages of dense indexes:


1. It requires more space, as the index file is huge because it contains entries for all records.
2. More write operations are needed to generate the index file.
3. It requires more maintenance, as a change to any record requires maintenance of the index file.

2. Sparse Index
In a sparse index, entries for only some of the data items are maintained in the index file. Unlike a dense index, where every record has an entry in the index file, a sparse index has only one entry per block of data items. For sparse indexing, the database needs to be sorted.

For example, let’s say we are creating a sparse index file for a student database that contains records for 100 students.

Student records are divided into blocks, where every block contains two records. If the index file contains entries for alternate records only, we need to maintain just 50 entries, whereas a dense index would need 100 entries in the index file.

Advantages of sparse indexing:


1. It requires less storage space for the index file, as it stores entries for a few records instead of all records. A smaller index is also faster to scan.
2. Since fewer entries need to be maintained in the index file, generating a sparse index file requires fewer write operations.
3. It requires less maintenance compared to dense indexes.

Disadvantages of sparse indexing:


1. Searching is a little slower than with dense indexes: not every record has a corresponding index entry, so the right block must first be located (for example by a binary search on the index) and then searched for the record.
2. A sparse index requires the file to be sorted.
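The 100-student example can be sketched to compare the two index types. This is an illustration under stated assumptions: student keys are the integers 1 to 100, each block holds two records, and the names `dense_index`, `sparse_keys` and `sparse_lookup` are made up for the example.

```python
from bisect import bisect_right

RECORDS_PER_BLOCK = 2
keys = list(range(1, 101))                      # 100 student ids, already sorted
blocks = [keys[i:i + RECORDS_PER_BLOCK]
          for i in range(0, len(keys), RECORDS_PER_BLOCK)]

# Dense index: one entry per record (key -> block number).
dense_index = {key: block_no
               for block_no, block in enumerate(blocks)
               for key in block}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0] for block in blocks]

def sparse_lookup(key):
    """Binary-search the sparse index, then search inside the chosen block."""
    block_no = bisect_right(sparse_keys, key) - 1
    if block_no < 0:
        return None
    return block_no if key in blocks[block_no] else None

dense_block = dense_index[42]
sparse_block = sparse_lookup(42)
```

Both lookups reach the same block, but the sparse index holds 50 entries instead of 100 and relies on the file being sorted.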
Difference between Dense and Sparse indexes

DESCRIPTION | DENSE | SPARSE
1. Performance | Search is faster, as an index entry is present for every data item. | Write operations to generate the index are faster, as entries for only a few records need to be generated.
2. Prerequisite | No prerequisites. | It requires the database to be sorted.
3. Storage | More storage space is required. | Less storage space is required.
4. Maintenance | Requires more time, as every insert, update and delete operation in the database requires maintenance of the index file. | Requires less maintenance, as there are fewer index entries than in a dense index.
3. Clustered Index
As the name suggests, in a clustered index, records of a similar type are grouped together to form a cluster, and an index entry is created for each cluster and maintained in the clustered index file.

For example:
Let’s say students are assigned to multiple courses and we are creating an index on the course_id field. In this case, all the students assigned to a particular course_id form a cluster, and the index entry for that course_id points to this cluster.

This helps in quickly locating a record within a particular cluster: the cluster is limited in size and smaller than the whole database, so searching for a record is faster.

One type of clustered indexing is primary indexing, in which the data is sorted on the search key. With this type of indexing, searching is even faster because the records are sorted.
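The course_id example can be sketched as one index entry per cluster. The enrolment rows below are made-up sample data, and the dictionary `clustered_index` is only a stand-in for a clustered index file.

```python
from collections import defaultdict

# (student_id, course_id) pairs; hypothetical sample data.
enrolments = [
    ("S01", "C10"), ("S02", "C20"), ("S03", "C10"),
    ("S04", "C30"), ("S05", "C20"),
]

# One index entry per course_id, pointing to the cluster of its students.
clustered_index = defaultdict(list)
for student_id, course_id in enrolments:
    clustered_index[course_id].append(student_id)

def students_in(course_id):
    """Jump to the cluster for this course_id and return only its members."""
    return clustered_index.get(course_id, [])

c10_students = students_in("C10")
```

A search for a student in course C10 only has to look through that cluster, not through all five enrolments.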
4. Non-clustered or secondary indexing
In non-clustered indexing, the indexing is done on multiple levels. This indexing is also known as secondary indexing.

For example, let’s say we have records of 300 students in the database. Instead of creating index entries for all 300 records at the root level, we create entries for the 1st, 101st and 201st student records. This top-level index is kept in primary memory such as RAM. In this way we have divided the complete index file into three groups.

The second level of the index is stored on the hard disk. The top-level index in RAM refers to this second-level file, which in turn points to the actual data block on disk.
1. Multilevel index

B+ Tree structure
o The B+ tree is a balanced multi-way search tree (not a binary tree: a node can have more than two children). It follows a multi-level index format.
o In the B+ tree, leaf nodes hold the actual data pointers. The B+ tree ensures that all leaf nodes remain at the same height.
o In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support random access as well as sequential access.

Structure of B+ Tree
o In the B+ tree, every leaf node is at an equal distance from the root node. The B+ tree is of order n, where n is fixed for every B+ tree.
o It contains internal nodes and leaf nodes.

Internal node
o An internal node of the B+ tree, other than the root node, contains at least n/2 pointers.
o At most, an internal node of the tree contains n pointers.
Leaf node
o A leaf node of the B+ tree contains at least n/2 record pointers and n/2 key values.
o At most, a leaf node contains n record pointers and n key values.
o Every leaf node of the B+ tree contains one block pointer P that points to the next leaf node.

Searching a record in B+ Tree

Suppose we have to search for 55 in a B+ tree.

First, we fetch the intermediary node, which directs us to the leaf node that can contain a record for 55.

In the intermediary node, we find the branch between the 50 and 75 keys. This branch leads to the third leaf node, where the DBMS performs a sequential search to find 55.
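The search for 55 can be sketched with a two-level tree. Only the keys 50 and 75 and the third leaf (50, 55, 65, 70) come from the example; the separator 25 and the contents of the other leaves are assumptions added to make the sketch complete.

```python
from bisect import bisect_right

# Keys in the intermediary node (25 is an assumed extra separator).
separator_keys = [25, 50, 75]

# Leaf nodes from left to right; only the third leaf follows the example.
leaves = [
    [10, 15, 20],
    [25, 30, 35],
    [50, 55, 65, 70],
    [75, 80, 85],
]

def search(key):
    """Pick the branch in the intermediary node, then scan that leaf sequentially."""
    leaf_no = bisect_right(separator_keys, key)   # branch between 50 and 75 for key 55
    for position, leaf_key in enumerate(leaves[leaf_no]):
        if leaf_key == key:
            return leaf_no, position
    return None

result = search(55)
```

With zero-based numbering, `result` points at leaf 2 (the third leaf) and position 1 within it, matching the path described above.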

B+ Tree Insertion

Suppose we want to insert a record 60 into the same structure. It belongs in the 3rd leaf node, after 55. The tree is balanced and this leaf node is already full, so we cannot simply insert 60 there.

In this case, we have to split the leaf node so that the record can be inserted into the tree without affecting the fill factor, balance and order.
With 60 added, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and its key in the parent node is currently 50. We split the leaf node in the middle so that the balance of the tree is not altered, grouping (50, 55) and (60, 65, 70) into 2 leaf nodes.

For these two to be leaf nodes, the intermediate node cannot branch from 50 alone. It needs 60 added to it, with a pointer to the new leaf node.

This is how we insert an entry when there is an overflow. In the normal case, it is easy to find the leaf node where the entry fits and place it there.
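The leaf split can be sketched as follows. This is a simplified illustration of the overflow case only: the helper name `split_leaf` is made up, and `parent_keys` holds just the two intermediate keys (50 and 75) that the example mentions.

```python
from bisect import insort

def split_leaf(keys):
    """Split an overflowing sorted leaf in the middle.

    Returns the left leaf, the right leaf, and the key to copy up
    into the intermediate node (the first key of the right leaf).
    """
    middle = len(keys) // 2
    left, right = keys[:middle], keys[middle:]
    return left, right, right[0]

# The full 3rd leaf (50, 55, 65, 70) receives the new key 60.
overflowing_leaf = sorted([50, 55, 65, 70] + [60])

left_leaf, right_leaf, separator = split_leaf(overflowing_leaf)

# The intermediate node gains the separator so it can point to the new leaf.
parent_keys = [50, 75]
insort(parent_keys, separator)
```

The split produces the leaves (50, 55) and (60, 65, 70), and 60 is added to the intermediate node, as in the walkthrough above.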
B+ Tree Deletion

Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from the intermediate node as well as from the 4th leaf node. Once 60 is removed from the intermediate node, the tree no longer satisfies the rules of the B+ tree, so we need to re-arrange it to keep it balanced.

After deleting 60 from the B+ tree and re-arranging the nodes, the tree is balanced again.
