Dbms 4
Class DBMS
Multi-version Protocol
1. Dirty Reads: A transaction reads data written by another, uncommitted
transaction; if that transaction later rolls back, the value read never
actually existed in the database.
2. Non-Repeatable Reads: A transaction reads the same row twice but gets
different data because another transaction modifies it between the reads.
UNIT 4 1
3. Phantom Reads: A transaction re-executes a query and finds rows that were
inserted (or misses rows that were deleted) by another committed transaction
in the meantime.
4. Lost Updates: Two transactions read the same value and both update it; the
second write overwrites the first, so one of the updates is lost.
Types of Lock
1. Shared Lock (S): A shared lock is also known as a read-only lock. As the
name suggests, it can be shared between transactions, because a transaction
holding this lock is not permitted to update the data item. An S-lock is
requested using the lock-S instruction.
2. Exclusive Lock (X): The data item can be both read and written. This lock
is exclusive and cannot be held simultaneously with any other lock on the
same data item. An X-lock is requested using the lock-X instruction.
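As a sketch of the S/X compatibility rules above, assuming a minimal in-memory lock table (the LockTable class and its method names are illustrative, not a real DBMS API):

```python
class LockTable:
    def __init__(self):
        # item -> list of (transaction_id, mode) currently granted
        self.granted = {}

    def compatible(self, item, mode):
        """S is compatible with S; X is compatible with nothing."""
        holders = self.granted.get(item, [])
        if not holders:
            return True
        if mode == "S":
            return all(m == "S" for _, m in holders)
        return False  # X conflicts with any existing lock

    def request(self, txn, item, mode):
        if self.compatible(item, mode):
            self.granted.setdefault(item, []).append((txn, mode))
            return True
        return False  # caller must wait for the lock


locks = LockTable()
print(locks.request("T1", "A", "S"))  # True: first lock on A
print(locks.request("T2", "A", "S"))  # True: S is shared with S
print(locks.request("T3", "A", "X"))  # False: X conflicts with the S locks
```

Note how two S-locks coexist on item A while the X-lock request is refused, which is exactly the read-only sharing described above.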
Simplistic Lock Protocol
It is the simplest way of locking data during a transaction. Simplistic
lock-based protocols require every transaction to obtain a lock on a data
item before inserting, deleting, or updating it. The data item is unlocked
after the transaction completes.
Pre-claiming Lock Protocol
Before initiating execution, the transaction requests the DBMS for locks on
all the data items it will use.
If all the locks are granted, this protocol allows the transaction to begin.
When the transaction completes, it releases all the locks.
If any lock is not granted, the transaction rolls back and waits until all
the locks are granted.
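A hedged sketch of the pre-claiming idea described above: a transaction asks for all its locks before it starts, and if any lock is unavailable it acquires none and must retry later (the `held` set and function names are illustrative):

```python
held = set()  # data items currently locked by some transaction

def preclaim(items):
    """Grant all requested locks atomically, or none of them."""
    if any(i in held for i in items):
        return False  # at least one lock unavailable: roll back and wait
    held.update(items)  # all granted: the transaction may begin
    return True

def release(items):
    held.difference_update(items)  # unlock everything after commit

print(preclaim({"A", "B"}))  # True: both locks are free
print(preclaim({"B", "C"}))  # False: B is already held
release({"A", "B"})
print(preclaim({"B", "C"}))  # True after the release
```

Because all locks are taken before execution begins, a pre-claiming transaction can never block mid-flight holding some locks while waiting for others, which is why this scheme avoids deadlock at the cost of reduced concurrency.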
Two-Phase Locking (2PL)
The two-phase locking protocol divides the execution of a transaction into
three parts:
In the first part, when the execution of the transaction starts, it seeks
permission for the locks it requires.
In the second part, the transaction acquires all the locks. The third part
starts as soon as the transaction releases its first lock.
In the third part, the transaction cannot demand any new locks. It only
releases the acquired locks.
There are two phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be
acquired by the transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing lock held by the transaction
may be released, but no new locks can be acquired.
If lock conversion is allowed, then upgrading a lock (from lock-S(a) to
lock-X(a)) is allowed only in the growing phase, and downgrading a lock
(from lock-X(a) to lock-S(a)) must be done only in the shrinking phase.
The following example shows how unlocking and locking work with 2PL. The
lock point is the moment at which a transaction acquires its last lock.
Transaction T1:
Lock point: at operation 3
Transaction T2:
Lock point: at operation 6
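Since the schedule table itself is not reproduced here, the lock-point idea can be sketched as a small checker, assuming an illustrative ("lock"/"unlock", item) operation format:

```python
def follows_2pl(ops):
    """ops: list of ("lock"|"unlock", item) pairs.
    Returns (ok, lock_point_index): ok is False if any lock is
    acquired after the first unlock, which violates 2PL."""
    unlocked = False
    lock_point = None
    for i, (op, _item) in enumerate(ops):
        if op == "lock":
            if unlocked:
                return False, None  # lock after an unlock breaks 2PL
            lock_point = i  # last lock so far; final value is the lock point
        elif op == "unlock":
            unlocked = True  # growing phase is over
    return True, lock_point

# T1 acquires all its locks first (lock point at index 2, i.e. the third
# operation), then only releases them.
t1 = [("lock", "A"), ("lock", "B"), ("lock", "C"),
      ("unlock", "A"), ("unlock", "B"), ("unlock", "C")]
print(follows_2pl(t1))  # (True, 2)

bad = [("lock", "A"), ("unlock", "A"), ("lock", "B")]
print(follows_2pl(bad))  # (False, None): lock acquired after an unlock
```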
The only difference between 2PL and strict 2PL is that strict 2PL does not
release a lock after using it. Strict 2PL waits until the whole transaction
commits, and only then releases all the locks at once.
Timestamp Ordering Protocol
The priority of the older transaction is higher, which is why it executes
first. To determine the timestamp of a transaction, this protocol uses
system time or a logical counter.
The protocol manages concurrent execution such that the timestamps
determine the serializability order. The timestamp ordering protocol ensures
that any conflicting read and write operations are executed in timestamp order.
Whenever a transaction T issues a write_item(X) operation:
If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T and
reject the operation. Otherwise, execute the write_item(X) operation of T
and set W_TS(X) to TS(T).
Whenever a transaction T issues a read_item(X) operation:
If W_TS(X) > TS(T), then abort and roll back T and reject the operation.
Otherwise, execute the read_item(X) operation of T and set R_TS(X) to the
larger of TS(T) and the current R_TS(X).
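The read and write checks can be sketched in a few lines, assuming in-memory R_TS/W_TS tables and illustrative function names:

```python
r_ts = {}  # item -> largest timestamp of any transaction that read it
w_ts = {}  # item -> largest timestamp of any transaction that wrote it

def read_item(ts, item):
    """Basic TO check for read_item(X) by a transaction with timestamp ts."""
    if w_ts.get(item, 0) > ts:
        return "abort"  # a younger transaction already wrote the item
    r_ts[item] = max(r_ts.get(item, 0), ts)
    return "ok"

def write_item(ts, item):
    """Basic TO check for write_item(X) by a transaction with timestamp ts."""
    if r_ts.get(item, 0) > ts or w_ts.get(item, 0) > ts:
        return "abort"  # a younger transaction already read or wrote the item
    w_ts[item] = ts
    return "ok"

print(read_item(5, "X"))    # ok: X is untouched so far
print(write_item(10, "X"))  # ok: sets W_TS(X) = 10
print(read_item(7, "X"))    # abort: W_TS(X) = 10 > TS(T) = 7
```

The final read aborts because a transaction with a larger timestamp has already written X, which is exactly the out-of-timestamp-order conflict the protocol rejects.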
Whenever the basic TO algorithm detects two conflicting operations that occur
in an incorrect order, it rejects the later of the two operations by aborting
the transaction that issued it.
Precedence Graph for TS ordering
However, the schedule may not be cascade-free, and may not even be
recoverable.
Advantages
High Concurrency: Timestamp-based concurrency control allows for a
high degree of concurrency by ensuring that transactions do not interfere
with each other.
Disadvantages
Limited Granularity: The granularity of timestamp-based concurrency
control is limited to the precision of the timestamp. This can lead to
situations where transactions are unnecessarily blocked, even if they do not
conflict with each other.
Timestamp Synchronization: Timestamp-based concurrency control
requires that all transactions have synchronized clocks. If the clocks are not
synchronized, it can lead to incorrect ordering of transactions.
Deadlock Handling
A deadlock is a condition in a multi-user database environment where
transactions are unable to complete because each is waiting for resources
held by other transactions. This results in a cycle of dependencies in which
no transaction can proceed.
Characteristics of Deadlock
Mutual Exclusion: Only one transaction can hold a particular resource at a
time.
Hold and Wait: Transactions holding resources may request additional
resources held by others.
Deadlock Avoidance
When a database would otherwise get stuck in a deadlock, it is better to
avoid the deadlock than to abort or restart the transactions involved, since
aborting and restarting wastes time and resources.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, the
DBMS should detect whether the transaction is involved in a deadlock. The
lock manager maintains a wait-for graph to detect a deadlock cycle in the
database; a cycle in this graph indicates a deadlock.
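A minimal cycle check over such a wait-for graph can be sketched as follows; nodes are transactions and an edge T1 -> T2 means T1 is waiting for a lock held by T2 (the transaction names and dict representation are illustrative):

```python
def has_cycle(wait_for):
    """wait_for: dict mapping a transaction to the set of transactions
    it waits for. Returns True iff the graph contains a cycle (deadlock)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GRAY  # on the current DFS path
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:
                return True  # back edge: cycle found
            if color.get(u, WHITE) == WHITE and u in wait_for and visit(u):
                return True
        color[t] = BLACK  # fully explored
        return False

    return any(color[t] == WHITE and visit(t) for t in wait_for)

# T1 waits for T2 and T2 waits for T1: the classic two-transaction deadlock.
print(has_cycle({"T1": {"T2"}, "T2": {"T1"}}))  # True
print(has_cycle({"T1": {"T2"}, "T2": set()}))   # False
```

When a cycle is found, a real DBMS breaks the deadlock by choosing a victim transaction in the cycle and rolling it back.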
Deadlock Prevention
The deadlock prevention method is suitable for large databases. If resources
are allocated in such a way that a deadlock never occurs, then the deadlock
is prevented.
Wait-Die scheme: In this, an older transaction must wait for the younger one
to release its data items. The number of aborts and rollbacks is higher in
this technique.
Wound-Wait scheme: In this, an older transaction never waits for a younger
one; instead, the younger transaction is rolled back. The number of aborts
and rollbacks is lower.
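The two prevention schemes can be contrasted in a short sketch; `ts` is a transaction's timestamp, where a smaller timestamp means an older transaction (the function names are illustrative):

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older requester waits; a younger requester dies."""
    return "wait" if requester_ts < holder_ts else "abort"

def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: an older requester wounds (aborts) the holder;
    a younger requester waits."""
    return "wound holder" if requester_ts < holder_ts else "wait"

# A transaction with ts=1 is older than one with ts=9:
print(wait_die(1, 9))    # wait: the older transaction waits for the younger
print(wait_die(9, 1))    # abort: the younger requester is rolled back
print(wound_wait(1, 9))  # wound holder: the younger holder is rolled back
print(wound_wait(9, 1))  # wait: the younger requester waits
```

In both schemes only the younger transaction is ever aborted, and the aborted transaction keeps its original timestamp when restarted, so it eventually becomes old enough to proceed; this is what makes both schemes starvation-free.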
Validation-Based Protocol
In this protocol, updates made by an executing transaction are not applied
directly to the database. All updates are applied to local copies of data
items kept for the transaction.
1. Read phase: In this phase, transaction T is read and executed. It reads
the values of the various data items and stores them in temporary local
variables. It can perform all its write operations on these temporary
variables without updating the actual database.
2. Validation phase: The validation phase examines the reads and writes of
the transaction that may overlap with other transactions. Each transaction
is assigned the timestamps Start(Ti), Validation(Ti), and Finish(Ti).
The validation phase for Ti checks that, for every other transaction Tj, one
of the following conditions must hold for Ti to pass validation:
1. Finish(Tj) < Start(Ti): Tj finishes its execution (i.e., completes its
write phase) before Ti starts its execution (its read phase), so
serializability is indeed maintained.
2. Ti begins its write phase after Tj completes its write phase, and the
read set of Ti is disjoint from the write set of Tj.
3. Tj completes its read phase before Ti completes its read phase, and both
the read set and the write set of Ti are disjoint from the write set of Tj.
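The three conditions above can be sketched as a validation test; each transaction is represented as a dict with its read-phase start, read-phase end (validation point), write-phase end (finish), and read/write sets, where all field names are illustrative:

```python
def validate(ti, others):
    """True if Ti passes validation against every other transaction Tj."""
    for tj in others:
        rs_disjoint = not (ti["read_set"] & tj["write_set"])
        ws_disjoint = not (ti["write_set"] & tj["write_set"])
        if tj["finish"] < ti["start"]:
            continue  # condition 1: Tj finished before Ti started
        if tj["finish"] < ti["read_end"] and rs_disjoint:
            continue  # condition 2: Ti writes after Tj; read set safe
        if tj["read_end"] < ti["read_end"] and rs_disjoint and ws_disjoint:
            continue  # condition 3: both of Ti's sets disjoint from Tj's writes
        return False  # no condition holds: Ti must be rolled back
    return True

ti = {"start": 10, "read_end": 20, "read_set": {"A"}, "write_set": {"B"}}
tj = {"finish": 5, "read_end": 4, "write_set": {"A"}}
print(validate(ti, [tj]))  # True: Tj finished before Ti even started
```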
Advantages:
1. Avoids cascading rollbacks: This validation-based scheme avoids cascading
rollbacks, since the final write operations to the database are performed
only after the transaction passes the validation phase. If the transaction
fails validation, no update is performed on the database, so no dirty read
can happen, and the possibility of a cascading rollback is eliminated.
2. Avoids deadlock: Since a strict timestamp-based technique is used to
maintain a specific order of transactions, deadlock is not possible in this
scheme.
Disadvantages:
1. Starvation: There is a possibility of starvation for long-running
transactions: a sequence of conflicting short transactions can cause
repeated restarts of the long transaction. To avoid starvation, the
conflicting transactions must be temporarily blocked for some time, to let
the long transaction finish.
Multiple Granularity
Granularity: It is the size of the data item allowed to be locked.
Multiple Granularity:
It can be defined as hierarchically breaking up the database into blocks
which can be locked.
It makes it easy to decide whether to lock or unlock a data item.
This type of hierarchy can be graphically represented as a tree.
The first (highest) level represents the entire database.
The second level represents nodes of type area. The database consists of
exactly these areas.
Each area has child nodes known as files. No file can be present in more
than one area.
Finally, each file contains child nodes known as records. A file contains
exactly those records that are its child nodes, and no record is present in
more than one file.
Hence, the levels of the tree starting from the top level are as follows:
1. Database
2. Area
3. File
4. Record
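The database/area/file/record hierarchy above can be sketched as a tree in which locking a node implicitly locks its whole subtree (intention locks, which a real multiple-granularity protocol also needs, are omitted for brevity; the tree layout is illustrative):

```python
# Parent-to-children map for a tiny database/area/file/record tree.
tree = {
    "db":    ["area1"],
    "area1": ["file1", "file2"],
    "file1": ["rec1", "rec2"],
    "file2": ["rec3"],
    "rec1": [], "rec2": [], "rec3": [],
}

# Invert the tree so we can walk from a node up to the root.
parent = {c: p for p, cs in tree.items() for c in cs}

def is_locked(node, explicit):
    """A node is locked if it or any of its ancestors holds an explicit lock."""
    while node is not None:
        if node in explicit:
            return True
        node = parent.get(node)
    return False

explicit = {"file1"}  # one coarse lock on the whole file
print(is_locked("rec2", explicit))  # True: inherited from file1
print(is_locked("rec3", explicit))  # False: rec3 is under file2
```

Locking the single node `file1` implicitly covers both of its records, which is the whole point of coarse granularity: fewer locks to manage at the cost of less concurrency.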
Failure Classification
To identify where the problem has occurred, we generalize failures into the
following categories:
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches
a point from which it cannot go any further. If only a few transactions or
processes are affected, this is called a transaction failure.
Reasons for a transaction failure include logical errors (an internal
condition such as bad input, data not found, or overflow) and system errors
(the DBMS itself terminates an active transaction, for example because of a
deadlock).
2. System Crash
A system crash can occur due to a power failure or other hardware or
software failure. Example: an operating system error.
Fail-stop assumption: in a system crash, non-volatile storage is assumed not
to be corrupted.
3. Disk Failure
A disk failure occurs when a hard-disk drive or other storage drive fails.
This was a common problem in the early days of technology.
Disk failure can occur due to the formation of bad sectors, a disk head
crash, unreachability of the disk, or any other failure that destroys all or
part of the disk storage.
Storage
A database system provides an ultimate view of the stored data, but the data
itself is stored in the form of bits and bytes across different storage
devices.
In this section, we will take an overview of the various types of storage
devices used for accessing and storing data:
Primary Storage
Secondary Storage
Tertiary Storage
Primary Storage
It is the primary storage area that offers quick access to the stored data.
Primary storage is also known as volatile storage, because this type of
memory does not store data permanently; as soon as the system suffers a
power cut or a crash, the data is lost. Main memory and cache are the types
of primary storage.
Main Memory: It is responsible for operating on the data available in the
storage medium. The main memory handles each instruction of the computer.
It can store gigabytes of data on a system, but it is usually too small to
store an entire database. Finally, the main memory loses its whole content
if the system shuts down due to a power failure or other reason.
Cache: It is one of the costliest storage media, but also the fastest. A
cache is a tiny storage medium that is usually maintained by the computer
hardware. While designing algorithms and query processors for data
structures, designers keep cache effects in mind.
Secondary Storage
Secondary storage is also called online storage. It is the storage area that
allows the user to save and store data permanently. This type of memory does
not lose data due to a power failure or system crash, which is why it is
also called non-volatile storage.
There are some commonly described secondary storage media which are
available in almost every type of computer system:
Flash Memory: Flash memory stores data in devices such as USB (Universal
Serial Bus) keys, which are plugged into the USB slots of a computer system.
USB keys help transfer data between computer systems and come in various
size limits. Unlike main memory, flash memory retains stored data even after
a power cut. Flash storage is also commonly used in server systems for
caching frequently used data, which leads to high performance while storing
larger amounts of data than main memory.
Magnetic Disk Storage: This type of storage medium is also known as online
storage. A magnetic disk is used for storing data for a long time and is
capable of storing an entire database. It is the responsibility of the
computer system to make data available from disk to main memory for access,
and if the system modifies the data, the modified data must be written back
to the disk. A great strength of a magnetic disk is that its data is not
affected by a system crash or failure, but a disk failure can destroy all or
part of the stored data.
Tertiary Storage
It is storage that is external to the computer system. It has the slowest
access speed, but it is capable of storing a large amount of data. It is
also known as offline storage, and tertiary storage is generally used for
data backup. Typical tertiary storage devices include optical disks and
magnetic tapes.
Storage Hierarchy
Besides the above, various other storage devices reside in the computer
system. These storage media can be organized on the basis of data access
speed, cost per unit of data, and the medium's reliability. Thus, we can
arrange the storage media into a hierarchy according to speed and cost, as
described below.
In this hierarchy, the higher levels are expensive but fast. Moving down,
the cost per bit decreases while the access time increases. The storage
media from main memory upward are volatile, while everything below main
memory is non-volatile.
Log-Based Recovery
The atomicity property of a DBMS states that either all the operations of a
transaction must be performed or none. The modifications made by an aborted
transaction should not be visible in the database, and the modifications
made by a committed transaction should be visible. To achieve atomicity, we
must first output to stable storage information describing the
modifications, without modifying the database itself.
Log-based recovery is a technique used in database management systems (DBMS)
to recover a database to a consistent state in the event of a failure or
crash. It relies on transaction logs, which are records of all the
transactions performed on the database.
Other types of log records are:
1. Undo: using a log record, set the data item specified in the log record
back to its old value.
2. Redo: using a log record, set the data item specified in the log record
to its new value.
1. Transaction Ti needs to be undone if the log contains the record
<Ti start> but does not contain either the record <Ti commit> or the record
<Ti abort>.
2. Transaction Ti needs to be redone if the log contains the record
<Ti start> and either the record <Ti commit> or the record <Ti abort>.
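The undo/redo decision can be sketched as a single scan over the log, assuming an illustrative (kind, transaction) record format:

```python
def classify(log):
    """Scan the log and decide which transactions to undo (started but
    neither committed nor aborted) and which to redo (committed)."""
    started, finished, committed = set(), set(), set()
    for kind, txn in log:
        if kind == "start":
            started.add(txn)
        elif kind in ("commit", "abort"):
            finished.add(txn)
            if kind == "commit":
                committed.add(txn)
    undo = started - finished  # no commit/abort record: roll back
    redo = committed           # committed: reapply its new values
    return undo, redo

log = [("start", "T1"), ("start", "T2"), ("commit", "T1"), ("start", "T3")]
undo, redo = classify(log)
print(sorted(undo))  # ['T2', 'T3']: active at the crash, must be undone
print(sorted(redo))  # ['T1']: committed, must be redone
```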
Use of Checkpoints: When a system crash occurs, we must consult the log to
determine which transactions need to be redone and which need to be undone.
In principle, we would need to search the entire log to determine this
information.
There are two major difficulties with this approach:
1. The search process is time-consuming.
2. Most of the transactions that need to be redone have already written
their updates into the database; redoing them causes no harm but makes
recovery take longer.
To reduce this overhead, the system periodically performs a checkpoint, so
that recovery only needs to examine the log from the most recent checkpoint
onward.
ARIES
The ARIES recovery algorithm proceeds in three phases:
1) Analysis:
During the analysis phase, the log is scanned forward from the most recent
checkpoint record to reconstruct the transaction table and the dirty page
table, building a snapshot of what the system looked like at the time of the
crash. Dirty pages contain data that has been changed but not yet written to
disk. The transactions that were active at the time of the crash are
identified.
2) Redo:
The log is scanned forward from the earliest update that may not have
reached disk, and logged updates are reapplied, repeating history to bring
the database to the state it was in at the time of the crash.
3) Undo:
The log is scanned backward, and the updates of all transactions that were
active at the time of the crash are rolled back, so that only the effects of
committed transactions remain.
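A much-simplified sketch of the analysis phase follows; real ARIES also tracks LSNs from checkpoint records and compensation log records, and the record format here is purely illustrative:

```python
def analysis(log):
    """Scan the log forward, rebuilding the active-transaction table and
    the dirty page table (page -> LSN of the first update that may not
    have reached disk)."""
    active = set()  # transactions with no end record yet
    dirty = {}      # page -> recLSN of the earliest change not on disk
    for lsn, rec in enumerate(log):
        kind, txn = rec[0], rec[1]
        if kind == "update":
            active.add(txn)
            page = rec[2]
            dirty.setdefault(page, lsn)  # keep the earliest LSN per page
        elif kind == "end":
            active.discard(txn)
    return active, dirty

log = [("update", "T1", "P1"), ("update", "T2", "P2"),
       ("end", "T1"), ("update", "T2", "P1")]
active, dirty = analysis(log)
print(active)  # {'T2'}: T1 ended before the crash
print(dirty)   # {'P1': 0, 'P2': 1}: recLSN per dirty page
```

The redo phase would then start from the smallest recLSN in `dirty` (here LSN 0) and reapply updates, while the undo phase would roll back the transactions left in `active`.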
Write-Ahead Logging: Any change to a database object is first recorded in
the log. The log record must be written to stable storage before the change
to the database object is written to disk.
Advantages:
1) It is simple and flexible.
2) It supports concurrency control protocols.