
GMR Institute of Technology, Rajam, AP GMRIT/ADM/F-44

(An Autonomous Institution Affiliated to JNTUGV, AP) REV.: 00

Cohesive Teaching – Learning Practices (CTLP)

Class 5th Sem. – B. Tech Department: ECE


Course Database Management Systems Course Code 21IT304
Prepared by Mr. N. L. V. Venu Gopal
Lecture Topic Introduction to Lock Management, Lock Based Concurrency Control: 2PL, Strict
2PL, Concurrency without Locking: Timestamp-Based Concurrency Control, Optimistic
Concurrency Control. Introduction to ARIES - the Log - the Write-Ahead Log
Protocol - Checkpointing, Indexing: Types of Single-Level Ordered Indexes, Multilevel
Indexes, Different Types of Data: Structured, Semi-Structured and Unstructured Data
Course Outcome (s) CO5, CO6 Program Outcome (s) PO1, PO4, PO12
Duration 60 Min Lecture 37-48 Unit IV
Pre-requisite (s) SQL

1. Objective
 To impart the knowledge of Locking based protocols.
 To impart the knowledge of Concurrency control techniques.
 To impart the knowledge of Timestamp-Based Concurrency Control.
 To impart the knowledge of Locking protocols.
 To impart the knowledge of Concurrency control techniques and the optimistic concurrency
control technique.
 To impart the knowledge of Failure occurrence and recovery concepts.
 To impart the knowledge of ARIES recovery and its algorithms.
 To impart the knowledge of Indexing, single-level indexing and multi-level indexing.
 To impart the knowledge of Structured, Semi-Structured and Unstructured Data.

2. Intended Learning Outcomes (ILOs)

At the end of this session the students will be able to:

A. Comprehend the concepts of the Two-Phase Locking Protocol.


B. Demonstrate the RTS and WTS of a given transaction.
C. Apply concurrency control techniques on a database.
D. Reduce the conflicts in a relation.
E. Apply normalization techniques in a database project.

3. 2D Mapping of ILOs with Knowledge Dimension and Cognitive Learning Levels of RBT

Cognitive Learning Levels


Knowledge
Remember Understand Apply Analyze Evaluate Create
Dimension
Factual A

Conceptual A,B
Procedural C,D,E
Meta Cognitive

4. Teaching Methodology

 Power Point Presentation, Chalk Talk, visual presentation

5. Evocation

6. Deliverables

Lecture -37 Locking


Locking is used to lock some data in a database so that only one user/session may update that particular data
at the same time.

 Transactions use locks to deny access to other transactions and so prevent incorrect
updates.
 A lock holds the resource until the entire transaction is completed.
 It is the most widely used approach to ensure serializability.

LOCK MANAGEMENT:

Locking protocols are used in Database Management Systems as a means of concurrency control. Multiple
transactions may request a lock on the same data item simultaneously.

Hence we require a mechanism to manage the locking requests made by transactions. Such a
mechanism is called lock management.

It relies on the process of message passing, where transactions and the lock manager exchange messages to
handle the locking and unlocking of data items.

LOCK BASED CONCURRENCY CONTROL

CONCURRENCY CONTROL:

Concurrency control in a Database Management System is a procedure for managing simultaneous
operations without conflicting with each other.

It ensures that database transactions are performed concurrently and accurately to produce correct
results without violating the data integrity of the database.

Purpose Of Concurrency Control:

 To enforce Isolation.
 To resolve read-write and write-write conflict issues.
 To preserve database consistency.
 Concurrency control helps to ensure serializability.

LOCK - BASED PROTOCOLS:

Lock-based protocols in a Database Management System are a mechanism in which a transaction cannot
read or write the data until it acquires an appropriate lock.

Lock-based protocols help to eliminate the concurrency problems in DBMS for simultaneous transactions
by locking or isolating a particular transaction to a single user.

All lock requests are made to the concurrency-control manager. Transactions proceed only once the
lock request is granted.

 To access the data item – acquire a lock.

 After completion of the transaction – release the lock.

Note: All the data items must be accessed in a mutually exclusive manner. There
are mainly two types of locks with which a data item can be locked:

1. Shared lock
2. Exclusive lock

Shared lock:

A shared lock is also called a read-only lock. With a shared lock, the data item can be shared between
transactions. It is denoted by S.

If a transaction has a shared lock on a data item:

 It can read but cannot update the data item.

For example, if a transaction T1 has obtained a shared lock on item Q, then T1 can read, but cannot
write, Q. If any other transaction T2 wants to read the data item Q, the database will let it read by
placing another shared lock.

More than one transaction can hold a shared lock on the same data item simultaneously.

 Reads do not conflict with each other.

Exclusive lock:

An exclusive lock is also called a write lock. With an exclusive lock, the data item cannot be shared
between transactions.

It is denoted by X.

If a transaction has an exclusive lock on a data item:

 It can read and update the data item.

For example, if a transaction T1 has obtained an exclusive lock on item Q, then T1 can both read and
write Q. If any other transaction T2 wants to read or write item Q, the exclusive lock prevents this
operation.

An exclusive lock cannot be held concurrently with any other lock on the same data item. A transaction may
unlock the data item after finishing the write operation.

Lock Compatibility Matrix:

 A transaction may be granted a lock on an item if the requested lock is compatible with
locks already held on the item by other transactions.
 Any number of transactions can hold shared locks on an item, but if any transaction
holds an exclusive lock on the item, no other transaction may hold any lock on the item.
 If a lock cannot be granted, the requesting transaction is made to wait until all
incompatible locks held by other transactions have been released. Only then is the lock
granted. (A small sketch of this compatibility check is given below, after the upgrade/downgrade note.)

Some systems allow a transaction to:

 Upgrade a shared lock to an exclusive lock.

 Downgrade an exclusive lock to a shared lock.
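The compatibility rules above can be captured in a small table. The following is a minimal, illustrative Python sketch (the class and method names are our own, not from any particular DBMS): a requested lock is granted only if it is compatible with every lock already held on the item by other transactions.

```python
# Minimal sketch of a lock-compatibility check (class and method names are
# illustrative, not from any particular DBMS). S = shared, X = exclusive.
COMPATIBLE = {
    ('S', 'S'): True,   # many readers may share an item
    ('S', 'X'): False,  # a writer must wait for readers
    ('X', 'S'): False,  # readers must wait for a writer
    ('X', 'X'): False,  # only one writer at a time
}

class SimpleLockManager:
    def __init__(self):
        # item -> list of (transaction_id, lock_mode) currently granted
        self.granted = {}

    def request(self, txn, item, mode):
        """Grant the lock only if it is compatible with every lock already
        held on the item by other transactions; otherwise the caller waits."""
        holders = self.granted.setdefault(item, [])
        for other_txn, other_mode in holders:
            if other_txn != txn and not COMPATIBLE[(other_mode, mode)]:
                return False          # incompatible: transaction must wait
        holders.append((txn, mode))
        return True

    def release(self, txn, item):
        self.granted[item] = [(t, m) for t, m in self.granted.get(item, [])
                              if t != txn]

lm = SimpleLockManager()
print(lm.request('T1', 'B', 'S'))   # True: shared lock granted
print(lm.request('T2', 'B', 'S'))   # True: shared locks are compatible
print(lm.request('T3', 'B', 'X'))   # False: exclusive lock must wait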

Example for lock protocol

T1 (shared lock)              T2 (exclusive lock)
Lock-S(B)
Read(B)
Unlock(B)
                              Lock-X(B)
                              Read(B)
                              Balance := Balance - 50
                              Write(B)
                              Unlock(B)

Here the transactions T1 and T2 want to access the same data item B. T1 holds a shared lock on B while
reading it, so T2's request for an exclusive lock on B cannot be granted until T1 releases its lock. Once T1
unlocks B, T2 acquires the exclusive lock, reads B, updates the balance and writes it back.

Problem with simple locking

Consider an execution in which T1 holds an exclusive lock over data item B and T2 holds a shared lock
over data item A. If T2 now requests a lock on B while T1 requests a lock on A, neither request can be
granted. This leads to deadlock, and neither transaction can proceed with further execution.
When a transaction needs to wait for an indefinite period to acquire a lock, it leads to
starvation. This happens mainly because the waiting scheme for locks is not managed properly.

Summary:

1. Locking means holding the resources until the transaction is completed. It
prevents incorrect updates.
2. To manage the locking requests made by transactions we require a
mechanism, which is called lock management.
3. Concurrency control is the procedure of managing simultaneous
transactions without any conflicts. It ensures serializability and enforces isolation.
4. There are mainly two modes of locking: 1. Shared lock – can read but cannot
update the data item, and 2. Exclusive lock – can read and update the data item.
5. A lock compatibility matrix is used, which states whether a data item can be
locked by two transactions at the same time.
6. If a first transaction waits for a second transaction to release a lock and the second
transaction waits for the first transaction to release a lock, it leads to deadlock.

Lecture -38 Strict 2PL


Two-phase Locking Protocol (2PL protocol)
If locking as well as unlocking is performed in two phases, a transaction is considered to follow the
Two-Phase Locking protocol. The two phases are known as the growing and shrinking phases.

Two-phase locking guarantees conflict serializability, but just like the two sides of a coin it has a few
cons too. The protocol raises transaction processing costs, reduces the amount of concurrency in a schedule,
and may have unintended consequences. The likelihood of establishing deadlocks is one bad result.

 Growing Phase: In the growing phase, the transaction only obtains locks; it cannot release any lock.
The growing phase ends at the lock point, when the transaction has acquired all the locks it needs.

 Shrinking Phase: In the shrinking phase, the transaction only releases locks; no new lock can be
obtained. (A small sketch of this discipline is given below.)
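As a rough illustration (not any vendor's implementation), the following Python sketch enforces the two-phase discipline: once a transaction releases its first lock it enters the shrinking phase, and any later attempt to acquire a lock is rejected.

```python
# Illustrative sketch of the two-phase locking discipline (names are ours,
# not from a real system): after the first unlock, no further lock is allowed.
class TwoPhaseTransaction:
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.locks = set()
        self.shrinking = False    # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            # Acquiring a lock after the first release would violate 2PL.
            raise RuntimeError(f"{self.txn_id}: cannot lock {item} in shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True     # the lock point has passed
        self.locks.discard(item)

t1 = TwoPhaseTransaction("T1")
t1.lock("A")
t1.lock("B")       # growing phase: locks may still be acquired
t1.unlock("A")     # first release: shrinking phase begins
try:
    t1.lock("C")   # violates 2PL, so the sketch rejects it
except RuntimeError as err:
    print(err)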

Two-Phase Locking Types (2PL types)
Two-phase Locking is further classified into three types :

1. Strict two-phase locking protocol :

 The transaction may release its shared locks after the lock point.
 The transaction cannot release any exclusive lock until the transaction commits.
 Because exclusive (write) locks are held until commit, no other transaction can read or overwrite
uncommitted data, so the strict two-phase locking protocol avoids cascading rollbacks (cascading
schedules).

2. Rigorous two-phase locking protocol :

 The transaction does not release any of its locks, neither shared nor exclusive, until it commits or aborts.
 Serializability is guaranteed in the rigorous two-phase locking protocol.
 Freedom from deadlock is not guaranteed in the rigorous two-phase locking protocol.

3. Conservative two-phase locking protocol :

 The transaction must lock all the data items it requires before the transaction begins execution.
 If any of the data items is not available for locking, then none of the data items is locked and the
transaction waits.
 The read and write sets of data items need to be known before the transaction begins. This is
normally not possible.
 The conservative two-phase locking protocol is deadlock-free.
 The conservative two-phase locking protocol does not ensure a strict schedule.

Cascading Roll Back in 2PL

Deadlock in 2PL

T1 T2

Lock-X(R1) Lock-X(R2)

Read(R1) Read(R2)

Lock-X(R2) Lock-X(R1)

Lecture -39 Concurrency Control Techniques

Several problems that arise when numerous transactions execute simultaneously in a random manner are
referred to as Concurrency Control Problems.

Dirty Read Problem

The dirty read problem in DBMS occurs when a transaction reads the data that has been updated by
another transaction that is still uncommitted. It arises due to multiple uncommitted transactions executing
simultaneously.

Example: Consider two transactions A and B performing read/write operations on a data DT in the
database DB. The current value of DT is 1000: The following table shows the read/write operations in A
and B transactions.

Time A B
T1 READ(DT) ------
T2 DT=DT+500 ------
T3 WRITE(DT) ------
T4 ------ READ(DT)
T5 ------ COMMIT
T6 ROLLBACK ------

Transaction A reads the value of data DT as 1000 and modifies it to 1500, which gets stored in the
temporary buffer. Transaction B reads DT as 1500 and commits, so the value of DT permanently changes to
1500 in the database DB. Then a server error occurs in transaction A and it has to roll back to the initial
value, i.e., 1000. Transaction B has thus read and committed a value that never became permanent, which is
the dirty read problem.

Unrepeatable Read Problem

The unrepeatable read problem occurs when two or more different values of the same data are read during
the read operations in the same transaction.

Example: Consider two transactions A and B performing read/write operations on a data DT in the
database DB. The current value of DT is 1000: The following table shows the read/write operations in A
and B transactions.

Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DT=DT+500 ------
T4 WRITE(DT) ------
T5 ------ READ(DT)

Transaction A and B initially read the value of DT as 1000. Transaction A modifies the value of DT from
1000 to 1500 and then again transaction B reads the value and finds it to be 1500. Transaction B finds two
different values of DT in its two different read operations.

Phantom Read Problem

In the phantom read problem, data is read through two different read operations in the same transaction. In
the first read operation, a value of the data is obtained but in the second operation, an error is obtained
saying the data does not exist.

Example: Consider two transactions A and B performing read/write operations on a data DT in the
database DB. The current value of DT is 1000: The following table shows the read/write operations in A
and B transactions.

Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DELETE(DT) ------
T4 ------ READ(DT)

Transaction B initially reads the value of DT as 1000. Transaction A deletes the data DT from the database
DB and then again transaction B reads the value and finds an error saying the data DT does not exist in the
database DB.

Lost Update Problem

The Lost Update problem arises when an update made by one transaction is overwritten by an update made
by another transaction, so the first update is lost.

Example: Consider two transactions A and B performing read/write operations on a data DT in the
database DB. The current value of DT is 1000: The following table shows the read/write operations in A
and B transactions.

Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DT=DT+500 ------
T4 WRITE(DT) ------
T5 ------ DT=DT+300
T6 ------ WRITE(DT)

Transactions A and B both read the value of DT as 1000. Transaction A modifies the value of DT from 1000
to 1500 and writes it. Transaction B, still working with the value 1000 it read earlier, computes 1300 and
writes it, overwriting A's write. The update done by transaction A has therefore been lost.

Incorrect Summary Problem

The Incorrect Summary problem occurs when an aggregate computed over data items is wrong. This happens
when a transaction tries to sum two data items using an aggregate function and the value of one of the items
gets changed by another transaction in the middle of the computation.

Example: Consider two transactions A and B performing read/write operations on two data DT1 and DT2
in the database DB. The current value of DT1 is 1000 and DT2 is 2000: The following table shows the
read/write operations in A and B transactions.

Time A B
T1 READ(DT1) ------
T2 add=0 ------
T3 add=add+DT1 ------
T4 ------ READ(DT2)
T5 ------ DT2=DT2+500
T6 READ(DT2) ------
T7 add=add+DT2 ------

Transaction A reads the value of DT1 as 1000. It uses an aggregate function SUM which calculates the
sum of the two data items DT1 and DT2 in the variable add, but in between, the value of DT2 gets changed
from 2000 to 2500 by transaction B. The variable add uses the modified value of DT2 and gives the resultant
sum as 3500 instead of 3000.

Concurrency Control Protocols

To avoid concurrency control problems and to maintain consistency and serializability during the execution
of concurrent transactions some rules are made. These rules are known as Concurrency Control Protocols.

Lock-Based Protocols

Timestamp-Based Protocols

According to this protocol, every transaction has a timestamp attached to it. The timestamp is based on the
time at which the transaction entered the system. In addition, read and write timestamps are associated with
every data item; they record the time at which the latest read and write operations on that item were
performed, respectively.

Timestamp Ordering Protocol:

The timestamp ordering protocol uses timestamp values of the transactions to resolve the conflicting pairs
of operations. Thus, ensuring serializability among transactions. Following are the denotations of the terms
used to define the protocol for transaction A on the data item DT:

Terms                                 Denotations
Timestamp of transaction A            TS(A)
Read time-stamp of data item DT       R-timestamp(DT)
Write time-stamp of data item DT      W-timestamp(DT)

Following are the rules on which the Time-ordering protocol works:

1. When transaction A is going to perform a read operation on data item DT:

 TS(A) < W-timestamp(DT): Transaction will rollback. If the timestamp of transaction A at which it
has entered in the system is less than the write timestamp of DT that is the latest time at which DT
has been updated then the transaction will roll back.

 TS(A) >= W-timestamp(DT): Transaction will be executed. If the timestamp of transaction A at


which it has entered in the system is greater than or equal to the write timestamp of DT that is the
latest time at which DT has been updated then the read operation will be executed.

 If the read is executed, R-timestamp(DT) is updated to the maximum of its current value and TS(A).

2. When transaction A is going to perform a write operation on data item DT:

 TS(A) < R-timestamp(DT): Transaction will rollback. If the timestamp of transaction A at which it
has entered in the system is less than the read timestamp of DT that is the latest time at which DT
has been read then the transaction will rollback.

 TS(A) < W-timestamp(DT): Transaction will rollback. If the timestamp of transaction A at which it
has entered in the system is less than the write timestamp of DT that is the latest time at which DT
has been updated then the transaction will rollback.

 In all other cases the write operation is executed and W-timestamp(DT) is set to TS(A).

Thomas' Write Rule: This rule alters the timestamp-ordering protocol to make the schedule view
serializable. For the case TS(A) < W-timestamp(DT), the timestamp-ordering protocol would roll the
transaction back, but according to Thomas' Write Rule the obsolete write operation is simply ignored.
(A sketch of these rules follows.)
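The read/write rules and Thomas' Write Rule can be summarized in a short, hedged Python sketch. The function names and the dictionary layout are illustrative assumptions, not a production protocol.

```python
# Illustrative timestamp-ordering checks. RTS/WTS are the per-data-item
# read and write timestamps; ts_a is the timestamp of transaction A.
def ts_read(ts_a, item):
    """Return True if the transaction may read the item, False for rollback."""
    if ts_a < item['WTS']:
        return False                        # item written by a younger txn: rollback
    item['RTS'] = max(item['RTS'], ts_a)    # remember the latest read
    return True

def ts_write(ts_a, item, value, thomas_rule=False):
    """Return 'ok', 'ignore' (Thomas' Write Rule) or 'rollback'."""
    if ts_a < item['RTS']:
        return 'rollback'                   # a younger txn has already read the item
    if ts_a < item['WTS']:
        # Obsolete write: plain timestamp ordering rolls back, while
        # Thomas' Write Rule simply skips the write.
        return 'ignore' if thomas_rule else 'rollback'
    item['value'], item['WTS'] = value, ts_a
    return 'ok'

dt = {'value': 1000, 'RTS': 0, 'WTS': 0}
print(ts_read(5, dt))                           # True, RTS becomes 5
print(ts_write(3, dt, 1200))                    # 'rollback': TS(A) < R-timestamp(DT)
print(ts_write(7, dt, 1500))                    # 'ok', WTS becomes 7
print(ts_write(6, dt, 1400, thomas_rule=True))  # 'ignore': obsolete write skipped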

Validation Based Protocol


This protocol executes a transaction through the following three phases:

Read phase: In this phase, the transaction reads the data items and stores the values produced by its
operations in a local buffer. No modification is made to the database.

Validation phase: In this phase, validation tests are performed to check whether the values of data
present in the local buffer can replace the original values in the database without causing any harm to
serializability.

Validation Test: Validation tests are performed on a transaction A executing concurrently with a transaction
B such that TS(A) < TS(B). The transactions must satisfy one of the following conditions:

 Finish(A) < Start(B): Transaction A finishes all its operations before transaction B
starts; hence the serializability order is maintained.

 Start(B) < Finish(A) < Validate(B): The set of data items written by transaction A during its write
phase must not intersect with the set of data items read by transaction B.

Write phase: If the transaction passes the tests of the validation phase, then the values get copied to the
database, otherwise the transaction rolls back.

Example: Consider two transactions A and B performing read/write operations on two data DT1 and DT2
in the database DB. The current value of DT1 is 1000 and DT2 is 2000: The following table shows the
read/write operations in A and B transactions.

Time  A               B
T1    READ(DT1)       ------
T2    ------          READ(DT1)
T3    ------          DT1=DT1-100
T4    ------          READ(DT2)
T5    ------          DT2=DT2+500
T6    READ(DT2)       ------
T7    PRINT(DT2-DT1)  ------
T8    ------          ------
T9    ------          ------
T10   ------          WRITE(DT1)
T11   ------          WRITE(DT2)

The schedule passes the validation test of the validation phase because the timestamp of transaction B is
less than that of transaction A. Note that the write operations are applied only after both transactions have
been validated; all the operations before the final writes are performed on the local buffer.

Lecture -40 Optimistic Concurrency Control Technique
It is a concurrency control method applied to transactional systems such as relational database
management systems and software transactional memory. Optimistic concurrency control transactions
involve these phases:

Begin: Record a timestamp marking the transaction's beginning.


Modify: Read database values, and tentatively write changes.
Validate: Check whether other transactions have modified data that this transaction has used (read or
written). This includes transactions that completed after this transaction's start time, and optionally,
transactions that are still active at validation time.
Commit/Rollback: If there is no conflict, make all changes take effect. If there is a conflict, resolve it,
typically by aborting the transaction, although other resolution schemes are possible. Care must be
taken to avoid a time-of-check to time-of-use bug, particularly if this phase and the previous one are
not performed as a single atomic operation.

Transaction processing, concurrency control and recovery issues have played a major role in
conventional databases. Generally optimistic control demonstrates a few improvements over pessimistic
concurrency controls like two–phase locking or time stamp based protocol.

Optimistic concurrency control:


All data items are updated at the end of the transaction; at that point, if any data item is found inconsistent
with respect to the value it had when it was read, the transaction is rolled back.
Conflicts are checked only at the end of the transaction; no checking is done while the transaction is
executing. Checks are all made at once, so the transaction execution overhead is low. Updates are not
applied until end-transaction; they are applied to local copies in a transaction space.

Phases
The optimistic concurrency control has three phases, which are explained below

Read Phase
Various data items are read and stored in temporary variables (local copies). All operations are performed in
these variables without updating the database.
Validation Phase
Checks are made to ensure that serializability will not be violated if the transaction updates are actually
applied to the database. Any change in the values causes the transaction to roll back.
The transaction timestamps are used, and the write-sets and read-sets are maintained.
To check that transaction A does not interfere with transaction B, the following must hold (a small sketch of
this check is given below):

TransB completes its write phase before TransA starts the read phase.
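A minimal sketch of the validation test, under the assumption that each transaction records its read set, write set and phase-boundary times (the structure and condition names are illustrative):

```python
# Illustrative validation test for transaction B against an earlier
# transaction A (TS(A) < TS(B)); the dictionary layout is an assumption.
def validate(a, b):
    """Return True if B can be validated against A, per the two conditions."""
    if a['finish'] < b['start']:
        # A finished entirely before B started: trivially serializable.
        return True
    if b['start'] < a['finish'] < b['validate']:
        # Overlap: A's write set must not intersect B's read set.
        return not (a['write_set'] & b['read_set'])
    return False

a = {'start': 1, 'validate': 4, 'finish': 5,
     'read_set': {'DT1', 'DT2'}, 'write_set': {'DT1'}}
b = {'start': 3, 'validate': 6, 'finish': 7,
     'read_set': {'DT2'}, 'write_set': {'DT2'}}
print(validate(a, b))   # True: A wrote only DT1, which B never read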

Lecture -41 Failures Occurring in Transactions and Recovery

Failure in terms of a database can be defined as its inability to execute the specified transaction or loss of

data from the database. A DBMS is vulnerable to several kinds of failures and each of these failures

needs to be managed differently. There are many reasons that can cause database failures such as

network failure, system crash, natural disasters, carelessness, sabotage(corrupting the data intentionally),

software errors, etc.

Failure Classification in DBMS

A failure in DBMS can be classified as:

Transaction Failure:

If a transaction is not able to execute or it comes to a point from where the transaction becomes

incapable of executing further then it is termed as a failure in a transaction.

Reason for a transaction failure in DBMS:

 Logical error: A logical error occurs if a transaction is unable to execute because of some

mistakes in the code or due to the presence of some internal faults.

 System error: Where the termination of an active transaction is done by the database system
itself due to some system issue or because the database management system is unable to proceed

with the transaction. For example– The system ends an operating transaction if it reaches a

deadlock condition or if there is an unavailability of resources.

System Crash:

A system crash usually occurs when there is some sort of hardware or software breakdown. Some other

problems which are external to the system and cause the system to abruptly stop or eventually crash

include failure of the transaction, operating system errors, power cuts, main memory crash, etc.

These types of failures are often termed soft failures and are responsible for the data losses in the volatile

memory. It is assumed that a system crash does not have any effect on the data stored in the non-volatile

storage and this is known as the fail-stop assumption.

Data-transfer Failure:

When a disk failure occurs amid data-transfer operation resulting in loss of content from disk storage

then such failures are categorized as data-transfer failures. Some other reason for disk failures includes

disk head crash, disk unreachability, formation of bad sectors, read-write errors on the disk, etc.

In order to quickly recover from a disk failure caused amid a data-transfer operation, the backup copy of

the data stored on other tapes or disks can be used. Thus it is a good practice to back up your data

frequently.

Lecture -42 Recovery Techniques


 If the database is damaged (restore or re-update):
- Keep track of the current state of transactions, so that in case the database is damaged the last
backup of the files can be restored.
- Reapply the updates of committed transactions using the log file.

Example:
Social media – WhatsApp.
- WhatsApp is a good example of database recovery. It backs up to Google Drive, which helps it
keep track of the current status of media, texts, documents, etc.
- If WhatsApp is uninstalled, it can recover all the previous data from the last backup in
Google Drive whenever it is re-installed.

 If the database is inconsistent (need to undo and redo changes using before- and
after-images):
- If a transaction crashes, the recovery manager may undo
transactions, i.e., reverse the operations of a transaction.

- The undo operations reverse the changes that caused the inconsistency.


- It may be necessary to redo some transactions to ensure their updates reach secondary
storage. (A checkpoint enables updates to the database that are in progress to be made
permanent.)
- No backup is needed; the database can be restored using the before-image and after-
image in the log file.

Three main Recovery Techniques:

i. Deferred update - no undo/redo algorithm

ii. Immediate update - undo/redo algorithm

iii. Shadow paging - provides durability and atomicity

(i) Deferred update:

 All transaction updates are not written to the database until after the
transaction has reached its commit point.
 If the transaction fails before reaching its commit point, it will not have
changed the database in any way, so UNDO is not required.
 Before reaching the commit point, all transaction updates are recorded in the
local transaction workspace.
 It may be necessary to redo the updates of committed transactions, as their
effect may not have reached the database.
 Transaction operations do not immediately update the database.
 The database is updated only after the transaction is committed. (A toy sketch of this
scheme is given below.)
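The deferred-update idea can be sketched in a few lines of Python (an illustrative toy, not a real recovery manager): updates accumulate in a local workspace and touch the database only at commit, so a failed transaction needs no undo.

```python
# Illustrative deferred-update transaction: nothing touches the database
# before commit, so a failed transaction needs no undo.
database = {'B': 1000}

class DeferredTxn:
    def __init__(self):
        self.workspace = {}              # local copies of updated items

    def write(self, item, value):
        self.workspace[item] = value     # recorded locally, database untouched

    def commit(self):
        database.update(self.workspace)  # redo-style apply at the commit point

    def abort(self):
        self.workspace.clear()           # nothing to undo in the database

t = DeferredTxn()
t.write('B', 950)
print(database['B'])   # still 1000: the update has not been applied yet
t.commit()
print(database['B'])   # 950: applied only after commit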

Advantages:
1) Any changes made to the data by a transaction are first recorded in a log fileand
applied to the database on commit.

Disadvantages:
1) Whenever any transaction is executed, the updates are not made immediately
to the database.
2) Increased time taken to recover in case of a system failure.

(ii) Immediate update:


If any transaction fails to reach its commit point, the effect of its operations must be undone, i.e., the
transaction must be rolled back; hence we require both undo and redo.
 In the immediate update approach, the data is updated even before the transaction
reaches its commit point.
 Undo operations are performed in the reverse order in which they were written to the log.
 It is essential that log records are written before the write to the database.

Write-ahead log protocol


Log records describing a change are written to stable storage before the change itself is applied to the
database. This makes it possible, after a system crash, to redo committed work and to roll back the changes
made by uncommitted transactions. The mechanism that provides this guarantee is called Write-Ahead
Logging (WAL).

Advantages:
1) Whenever any transaction is executed, the updates are made directly to the
database, and a log file is also maintained which contains both the old and new values.

Disadvantages:
1) Frequent I/O operations while the transaction is active.

(iii) Shadow paging:


 Shadow paging is a recovery technique that provides atomicity and durability in a
database system.

 Maintain two page tables during the life of a transaction:
- Current page table
- Shadow page table
 The shadow page table is created when the transaction starts, by copying the
current page table. After this, the shadow page table is saved
on disk.
 The current page table is used during the transaction. After the transaction, both tables
become identical.
 When a transaction begins, all the entries of the current page table are copied
to the shadow page table. In simple words, the ith entry of the current page
table and the shadow page table point to the same address or data.

Advantages:
1) Whenever any failure occurs, the recovery is faster.
2) No undo/redo operations need to be performed.

Disadvantages:
1) It is hard to extend the algorithm to allow transactions to run concurrently.
2) Data is fragmented (the process or state of breaking or being broken into parts).

Lecture -43 Checkpointing

The Checkpoint is used to declare a point before which the DBMS was in a consistent state, and all

transactions were committed. During transaction execution, such checkpoints are traced. After

execution, transaction log files will be created. Upon reaching the savepoint/checkpoint, the log file is

destroyed by saving its update to the database. Then a new log is created with upcoming execution

operations of the transaction and it will be updated until the next checkpoint and the process continues.

Whenever transaction logs are created in a real-time environment, it eats up lots of storage space. Also

keeping track of every update and its maintenance may increase the physical space of the system.

Eventually, the transaction log file may become too large to manage as its size keeps growing. This can be

addressed with checkpoints. The methodology utilized for removing all previous transaction logs and

storing them in permanent storage is called a Checkpoint.

Steps to Use Checkpoints in the Database

1. Write the begin_checkpoint record into a log.

2. Collect checkpoint data in stable storage.

3. Write the end_checkpoint record into a log.
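A hedged sketch of the three steps above follows; the record formats and names are assumptions made for illustration, not any specific system's layout.

```python
# Illustrative checkpointing: bracket the checkpoint data with
# begin_checkpoint / end_checkpoint records in the log.
log = []                       # the sequential log, newest record last

def take_checkpoint(transaction_table, dirty_page_table):
    log.append({'type': 'begin_checkpoint'})               # step 1
    checkpoint_data = {                                     # step 2
        'active_transactions': dict(transaction_table),
        'dirty_pages': dict(dirty_page_table),
    }
    log.append({'type': 'end_checkpoint',                   # step 3
                'data': checkpoint_data})
    return len(log) - 1        # position (LSN) of the end_checkpoint record

pos = take_checkpoint({'T4': 'active'}, {'P7': 120})
print(log[pos]['type'], log[pos]['data']['active_transactions'])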

The behavior when the system crashes and recovers when concurrent transactions are executed is

shown below:

Transactions and operations of the above diagram (relative to the checkpoint and the failure):

Transaction 1 (T1): START before the checkpoint; COMMIT after the checkpoint.
Transaction 2 (T2): START and COMMIT, both after the checkpoint.
Transaction 3 (T3): START and COMMIT, both after the checkpoint.
Transaction 4 (T4): START after the checkpoint; still active at the FAILURE.

 The recovery system reads the logs backward from the end to the last checkpoint i.e. from T4

to T1.

 It will keep track of two lists – Undo and Redo.

 Whenever there is a log with the instructions <Tn, start> and <Tn, commit>, or only <Tn, commit>,

then it will put that transaction in the Redo List. T2 and T3 contain <Tn, Start> and <Tn, Commit>,

whereas T1 has only <Tn, Commit>, since it started before the checkpoint. Here, T1, T2, and T3 are in the redo list.

 Whenever a log record with no commit or abort instruction is found, that transaction is put

in the Undo List. Here, T4 has <Tn, Start> but no <Tn, commit>, as it is an ongoing transaction.

T4 will be put on the undo list.

All the transactions in the redo list and their previous logs are removed and then redone before saving

their logs. All the transactions in the undo list are undone and their logs are deleted. (A short sketch of this
classification is given below.)
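The redo/undo classification described above can be expressed as a short scan over the log records since the last checkpoint. This is a simplified sketch; real systems track much more state.

```python
# Classify transactions into redo and undo lists from a simplified log
# (only 'start' and 'commit' actions are modelled here).
log = [('T1', 'commit'),             # T1 started before the checkpoint
       ('T2', 'start'), ('T2', 'commit'),
       ('T3', 'start'), ('T3', 'commit'),
       ('T4', 'start')]              # T4 never committed before the failure

def classify(log_records):
    started, committed = set(), set()
    for txn, action in log_records:
        (started if action == 'start' else committed).add(txn)
    redo = committed                 # committed transactions are redone
    undo = started - committed       # started but never committed: undone
    return sorted(redo), sorted(undo)

print(classify(log))   # (['T1', 'T2', 'T3'], ['T4'])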

Types of Checkpoints

There are basically two main types of Checkpoints:

1. Automatic Checkpoint

2. Manual Checkpoint

1. Automatic Checkpoint: These checkpoints occur very frequently like every hour or every day.

These intervals are set by the database administrator. They are generally used by heavy databases as

they are frequently updated, and we can recover the data easily in case of failure.

2. Manual Checkpoint: These are the checkpoints that are manually set by the database

administrator. Manual checkpoints are generally used for smaller databases. They are taken much less

frequently, only when they are set by the database administrator.

Relevance of Checkpoints

A checkpoint is a feature that supports the C (consistency) in the ACID properties of an RDBMS. A checkpoint is used

for recovery if there is an unexpected shutdown in the database. Checkpoints work at set intervals

and write all dirty pages (modified pages) from the buffer to the physical data files on disk. This

is also known as the hardening of dirty pages. It is a dedicated process and runs

automatically in SQL Server at specific intervals. A checkpoint serves as the synchronization point between the database and the

transaction log.

Advantages of Checkpoints

 Checkpoints help us in recovering the transaction of the database in case of a random shutdown

of the database.

 It enhances the consistency of the database when multiple transactions are executing

in the database simultaneously.

 It speeds up the data recovery process.

 Checkpoints work as a synchronization point between the database and the transaction log file

in the database.

 Checkpoint records in the log file are used to prevent unnecessary redo operations.

 Since dirty pages are flushed out continuously in the background, it has a very low overhead

and can be done frequently.

 Checkpoints provide the baseline information needed for the restoration of the lost state in the

event of a system failure.

 A database checkpoint keeps track of change information and enables incremental database

backup.

 A database storage checkpoint can be mounted, allowing regular file system operations to be

performed.

 Database checkpoints can be used for application solutions which include backup, recovery or

database modifications.

Disadvantages of Checkpoints

1. Database storage checkpoints can only be used to restore from logical errors (E.g. a human error).

2. Because all the data blocks are on the same physical device, database storage checkpoints cannot be

used to restore files due to a media failure.

Real-Time Applications of Checkpoints

1. Backup and Recovery

2. Performance Optimization

3. Auditing

1. Checkpoint and Recovery

A checkpoint is one of the key tools which helps in the recovery process of the database. In case of a

system failure, DBMS can find the information stored in the checkpoint to recover the database till its

last known stage.

The speed of recovery in case of a system failure depends on the duration of the checkpoint set by the

database administrator. For Example, if the checkpoint interval is set to a shorter duration, it helps in

faster recovery and vice-versa. However, if checkpoints have to be written to disk more frequently, this can also impact

the performance.

2. Importance of Checkpoint in Performance Optimization

Checkpoint plays an essential role in the recovery of the database, but it also plays a vital role in

improving the performance of DBMS, and this can be done by reducing the amount of work that

should be done during recovery. It can discard any unnecessary information which helps to keep the

database clean and better for optimization purposes.

Another way in which checkpoint is used to improve the performance of the database is by reducing

the amount of data that is to be read from the disk in case of recovery. Analyzing the checkpoints

clearly helps in minimizing the data that is to be read from the disk, which improves the recovery time,

and in that way it helps in Performance Optimization.

3. Checkpoints and Auditing

Checkpoints can be used for different purposes. Besides Performance Optimization, they can also be used for

Auditing purposes. Checkpoints help view the database’s history and identify any problem that

happened at any particular time.

In case of any type of failure, database administrators can use the checkpoint to determine when it has

happened and what amount of data has been affected.

Lecture -44 ARIES Recovery
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES)
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is based on the Write
Ahead Log (WAL) protocol. Every update operation writes a log record which is one of the
following:
1. Undo-only log record: Only the before image is logged. Thus, an undo operation can be
done to retrieve the old data.
2. Redo-only log record: Only the after image is logged. Thus, a redo operation can be
attempted.
3. Undo-redo log record: Both before images and after images are logged.

In it, every log record is assigned a unique and monotonically increasing log sequence number
(LSN). Every data page has a page LSN field that is set to the LSN of the log record
corresponding to the last update on the page. WAL requires that the log record corresponding
to an update make it to stable storage before the data page corresponding to that update is
written to disk. For performance reasons, each log write is not immediately forced to disk. A
log tail is maintained in main memory to buffer log writes. The log tail is flushed to disk when
it gets full. A transaction cannot be declared committed until the commit log record makes it to
disk.

Once in a while the recovery subsystem writes a checkpoint record to the log. The checkpoint
record contains the transaction table and the dirty page table. A master log record is maintained
separately, in stable storage, to store the LSN of the latest checkpoint record that made it to
disk. On restart, the recovery subsystem reads the master log record to find the checkpoint’s
LSN, reads the checkpoint record, and starts recovery from there on.

The recovery process actually consists of 3 phases:

1. Analysis:
The recovery subsystem determines the earliest log record from which the next pass must
start. It also scans the log forward from the checkpoint record to construct a snapshot of
what the system looked like at the instant of the crash.
2. Redo:
Starting at the earliest LSN, the log is read forward and each update redone.
3. Undo:
The log is scanned backward and updates corresponding to loser transactions are undone.
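The three passes can be pictured with the following skeleton. This is a deliberately simplified sketch: real ARIES also uses the dirty page table, compensation log records and per-page LSN comparisons, which are omitted here.

```python
# Deliberately simplified outline of ARIES restart recovery over an
# in-memory log; apply_redo/apply_undo are hypothetical helpers.
def apply_redo(rec):
    print('redo', rec['lsn'], rec['txn'])

def apply_undo(rec):
    print('undo', rec['lsn'], rec['txn'])

def aries_restart(log, checkpoint_lsn):
    # 1. Analysis: scan forward from the checkpoint to find loser transactions.
    active = set()
    for rec in log[checkpoint_lsn:]:
        if rec['type'] == 'update':
            active.add(rec['txn'])
        elif rec['type'] == 'commit':
            active.discard(rec['txn'])

    # 2. Redo: repeat history forward from the earliest relevant LSN.
    for rec in log[checkpoint_lsn:]:
        if rec['type'] == 'update':
            apply_redo(rec)

    # 3. Undo: walk the log backward, undoing updates of loser transactions.
    for rec in reversed(log):
        if rec['type'] == 'update' and rec['txn'] in active:
            apply_undo(rec)

log = [{'lsn': 0, 'type': 'update', 'txn': 'T1'},
       {'lsn': 1, 'type': 'commit', 'txn': 'T1'},
       {'lsn': 2, 'type': 'update', 'txn': 'T2'}]   # T2 is a loser
aries_restart(log, checkpoint_lsn=0)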

Lecture -45 ARIES Recovery Algorithm

Write Ahead Logging

Most crash recovery systems are built around a STEAL/NO-FORCE approach, accepting the risks of
writing possibly uncommitted data to disk to gain the performance improvements of not having to
force all commits to disk. The STEAL policy imposes the need to UNDO transactions and the NO-
FORCE policy imposes the need to REDO transactions. Databases rely on a log that stores transaction
data and system state with enough information to make it possible to undo or redo transactions to
ensure atomicity and durability of the database.

A database log is a file on disk that stores a sequential list of operations on the database. Each entry in
the log is called a log record. During normal database operation, one or more log entries are written to
the log file for each update to the database performed by a transaction. Each entry has a unique
sequence number called the Log Sequence Number, or LSN that uniquely identifies each record in the
log. A second type of log record is a checkpoint. Periodically, these records are written to describe the
overall state of the database at a certain point in time and may contain information about the contents
of the buffer pool, active transactions, or other details depending on the implementation of the
recovery system. The log file contains enough information that we are able to undo the effects of any
incomplete or aborted transactions, and that we are able to redo any effects of committed transactions

that haven’t been flushed to disk.

Log records are only valuable if they can ensure recovery from failure. Write Ahead Logging (WAL) is
a protocol stating that the database must write to disk the log file records corresponding to database
changes before those changes to the database can be written to the main database files. The WAL
protocol ensures that:

1. All log records updating a page are written to non-volatile storage before the page itself is over-
written in non-volatile storage.
2. A transaction is not considered committed until all of its log records have been written to non-
volatile storage.

WAL ensures that all log records for an updated page are written to non-volatile storage before the
page itself is allowed to be over-written in non-volatile storage. This ensures that UNDO information
required by a STEAL policy will be present in the log in the event of a crash. It also ensures that a
transaction is not considered committed until all of its log records (including its commit record) have
been written to non-volatile storage. This ensures that REDO information required by NO-FORCE
policy will be present in the log.
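The two WAL conditions can be made concrete with a toy buffer manager. This is illustrative only; the flushed-LSN and pageLSN bookkeeping follows the conventions described above, and the function names are our own.

```python
# Toy enforcement of the WAL rule: flush the log before flushing a page,
# and force the commit record before declaring a transaction committed.
log = []               # in-memory log tail
flushed_lsn = -1       # highest LSN known to be on stable storage

def append_log(record):
    log.append(record)
    return len(log) - 1            # the record's LSN

def flush_log(up_to_lsn):
    global flushed_lsn
    flushed_lsn = max(flushed_lsn, up_to_lsn)   # pretend these records hit disk

def write_page_to_disk(page):
    # Condition 1: the log record for the page's last update must be durable
    # before the page itself overwrites the copy on disk.
    if page['page_lsn'] > flushed_lsn:
        flush_log(page['page_lsn'])
    print('page', page['id'], 'written, pageLSN', page['page_lsn'])

def commit(txn):
    # Condition 2: a transaction commits only once its commit record is durable.
    lsn = append_log({'txn': txn, 'type': 'commit'})
    flush_log(lsn)
    print(txn, 'committed at LSN', lsn)

page = {'id': 'P1', 'page_lsn': append_log({'txn': 'T1', 'type': 'update'})}
write_page_to_disk(page)   # forces the log tail out first
commit('T1')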

To highlight the performance advantages achieved with write-ahead logging, imagine that a
transaction updates 1000 objects in the database and that each of those objects resides on
different pages on disk. Without write ahead logging, the DBMS would need to write 1000 pages to
disk to successfully complete the transaction. With WAL and a STEAL/NO-FORCE policy, we can
update the data in-memory and write a single log record to disk that includes all information necessary
to REDO the 1000 object update. Writing the remaining 1000 pages to disk can be done
asynchronously without impacting running user-facing transactions.

Lecture -46 Single-Level Indexing

Indexing is used to quickly retrieve particular data from the database. Formally we can define Indexing
as a technique that uses data structures to optimize the searching time of a database query in DBMS.
Indexing reduces the number of disk accesses required to reach a particular piece of data by internally
creating an index table.

Indexing is achieved by creating Index-table or Index.

Index usually consists of two columns which are a key-value pair. The two columns of the index
table(i.e., the key-value pair) contain copies of selected columns of the tabular data of the database.

Here, Search Key contains the copy of the Primary Key or the Candidate Key of the database table.
Generally, we store the selected Primary or Candidate keys in a sorted manner so that we can reduce
the overall query time or search time(from linear to binary).

Data Reference contains a set of pointers that holds the address of the disk block. The pointed disk
block contains the actual data referred to by the Search Key. Data Reference is also called Block
Pointer because it uses block-based addressing.

Indexing Attributes
Indexing is a data structure technique to efficiently retrieve records from the database files based on
some attributes on which the indexing has been done. Indexing in database systems is similar to what
we see in books.

Indexing is defined based on its indexing attributes. Indexing can be of the following types −

 Primary Index − Primary index is defined on an ordered data file. The data file is ordered on
a key field. The key field is generally the primary key of the relation.
 Secondary Index − Secondary index may be generated from a field which is a candidate key
and has a unique value in every record, or a non-key with duplicate values.
 Clustering Index − Clustering index is defined on an ordered data file. The data file is ordered
on a non-key field.

Ordered Indexing is of two types −

 Dense Index
 Sparse Index

Dense Index

In a dense index, there is an index record for every search key value in the database. This makes
searching faster but requires more space to store the index records themselves. Index records contain the
search key value and a pointer to the actual record on the disk.

Sparse Index

In a sparse index, index records are not created for every search key. An index record here contains a
search key and a pointer to the data on the disk. To search for a record, we first follow the index record to
reach the approximate location of the data. If the data we are looking for is not where we directly reach by
following the index, the system starts a sequential search until the desired data is found. (A toy lookup
sketch is given below.)
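The difference between dense and sparse lookup can be seen in a few lines. This is a toy sketch over a sorted file; the block structure and variable names are assumptions made for illustration.

```python
import bisect

# Toy sorted data file split into blocks; the sparse index keeps one entry
# per block (the dense variant would keep one entry per search-key value).
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
sparse_index = [(blk[0], i) for i, blk in enumerate(blocks)]  # (first key, block no)

def sparse_lookup(key):
    keys = [k for k, _ in sparse_index]
    # Find the last index entry whose key is <= the search key ...
    pos = bisect.bisect_right(keys, key) - 1
    if pos < 0:
        return None
    # ... then scan sequentially inside the block it points to.
    block = blocks[sparse_index[pos][1]]
    return key if key in block else None

print(sparse_lookup(50))   # 50: the index points to block 1, the scan finds it
print(sparse_lookup(55))   # None: not present in the pointed-to block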

Lecture -47 Multilevel Indexes

Index records comprise search-key values and data pointers. Multilevel index is stored on the disk
along with the actual database files. As the size of the database grows, so does the size of the indices.
There is an immense need to keep the index records in the main memory so as to speed up the search
operations. If single-level index is used, then a large size index cannot be kept in memory which leads
to multiple disk accesses.

Multi-level Index helps in breaking down the index into several smaller indices in order to make the
outermost level so small that it can be saved in a single disk block, which can easily be accommodated
anywhere in the main memory.
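The effect of a multilevel index is simply a chain of single-level lookups, one per level. The following is a minimal two-level sketch; the layout (two entries per inner block) is an illustrative assumption.

```python
import bisect

# Two-level index over a sorted file: the small outer index fits in memory
# and points to blocks of the inner (first-level) index.
inner_index = [(10, 'D0'), (40, 'D1'), (70, 'D2'), (100, 'D3')]  # (key, data block)
inner_blocks = [inner_index[0:2], inner_index[2:4]]              # 2 entries per block
outer_index = [(10, 0), (70, 1)]                                 # (first key, inner block no)

def multilevel_lookup(key):
    # Level 1: search the outer index kept in main memory.
    pos = bisect.bisect_right([k for k, _ in outer_index], key) - 1
    inner = inner_blocks[outer_index[pos][1]]
    # Level 2: read one inner index block, then follow its data-block pointer.
    pos = bisect.bisect_right([k for k, _ in inner], key) - 1
    return inner[pos][1]

print(multilevel_lookup(85))   # 'D2': outer -> inner block 1 -> data block D2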

B+ Tree
A B+ tree is a balanced search tree that follows a multi-level index format. The leaf nodes of a
B+ tree hold the actual data pointers. A B+ tree ensures that all leaf nodes remain at the same height, thus
staying balanced. Additionally, the leaf nodes are connected in a linked list; therefore, a B+ tree can support
random access as well as sequential access.

Structure of B+ Tree

Every leaf node is at equal distance from the root node. A B+ tree is of the order n where n is fixed for
every B+ tree.

Internal nodes –
 Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
 At most, an internal node can contain n pointers.

Leaf nodes −

 Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
 At most, a leaf node can contain n record pointers and n key values.
 Every leaf node contains one block pointer P to point to next leaf node and forms a linked list.

B+ Tree Insertion

 B+ trees are filled from bottom and each entry is done at the leaf node.
 If a leaf node overflows −
o Split node into two parts.
o Partition at i = ⌊(m+1)/2⌋.
o First i entries are stored in one node.
o Rest of the entries (i+1 onwards) are moved to a new node.
o The i-th key is duplicated at the parent of the leaf.
 If a non-leaf node overflows −
o Split node into two parts.
o Partition the node at i = ⌈(m+1)/2⌉.
o Entries up to i are kept in one node.
o Rest of the entries are moved to a new node.
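The split positions above can be checked with a tiny helper. This illustrates only the partition arithmetic stated in the rules, not a full B+ tree implementation.

```python
# Illustration of the split arithmetic only (not a full B+ tree).
def split_leaf(entries):
    """Split an overflowing leaf (m+1 entries) at i = floor((m+1)/2);
    the i-th key is duplicated at the parent, per the rules above."""
    i = len(entries) // 2            # len(entries) == m + 1
    left, right = entries[:i], entries[i:]
    copied_up = entries[i - 1]       # the i-th key, copied to the parent
    return left, right, copied_up

def split_internal(keys):
    """Split an overflowing internal node at i = ceil((m+1)/2)."""
    i = (len(keys) + 1) // 2         # ceiling of (m+1)/2
    return keys[:i], keys[i:]

print(split_leaf([5, 10, 15, 20, 25]))       # ([5, 10], [15, 20, 25], 10)
print(split_internal([5, 10, 15, 20, 25]))   # ([5, 10, 15], [20, 25])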

B+ Tree Deletion

 B+ tree entries are deleted at the leaf nodes.


 The target entry is searched and deleted.
o If it is an internal node, delete and replace with the entry from the left position.
 After deletion, underflow is tested,
o If underflow occurs, distribute the entries from the nodes left to it.
 If distribution is not possible from left, then
o Distribute from the nodes right to it.
 If distribution is not possible from left or from right, then
o Merge the node with left and right to it.

Lecture -48 Structured, Semi-Structured and Unstructured Data

Types of data:
i. Structured data
ii. Semi-Structured data
iii. Unstructured data
Structured data
 Structured data is data whose elements are addressable for effective analysis.
 It has been organized into a formatted repository that is typically a database.
 It concerns all data which can be stored in database SQL in a table with rows
andcolumns.
 They have relational keys and can easily be mapped into pre-designed fields.
 Structured data depends on the existence of a data model – a model of how data
canbe stored, processed and accessed.
 Because of a data model, each field is discrete and can be accesses separately
orjointly along with data from other fields.
 This makes structured data extremely powerful (i.e. it is possible to quickly
aggregatedata from various locations in the database.)
 Structured data is is considered the most ‘traditional’ form of data
storage.Example: Relational data, Excel files or SQL databases.

Semi-Structured data
 Semi-structured data is information that does not reside in a relational database but that
has some organizational properties that make it easier to analyze.
 With some processing, such data can be stored in a relational database (which can be very hard
for some kinds of semi-structured data), but the semi-structured form exists to avoid that effort.
 It contains tags or other markers to separate semantic elements and enforce hierarchies
of records and fields within the data.
 This makes it less complex to analyse than unstructured data, though more complex than structured data.
 It is also known as a self-describing structure. (A small illustration follows below.)
Example: JSON and XML data
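As a small illustration of the difference, the same record can be held as a fixed-schema row or as self-describing JSON. This is a toy example, not tied to any particular system, and the field names are made up.

```python
import json

# The same record held two ways: a fixed-schema row (structured) and a
# self-describing JSON document (semi-structured). Field names are made up.
structured_row = ("S101", "Asha", 2021)          # (student_id, name, join_year)

semi_structured = json.dumps({
    "student_id": "S101",
    "name": "Asha",
    "join_year": 2021,
    "electives": ["DBMS", "Networks"]            # a field other records may lack
})

print(structured_row)
print(semi_structured)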
Unstructured data
 Unstructured data is data which is not organized in a predefined manner or does not
have a predefined data model.
 Thus it is not a good fit for a mainstream relational database.
 So for unstructured data there are alternative platforms for storing and managing it; it is
increasingly prevalent in IT systems and is used by organizations in a variety of business
intelligence and analytics applications.
 Unstructured information is typically text-heavy, but may contain data such as dates,
numbers, and facts as well.
 This results in irregularities and ambiguities that make it difficult to understand using
traditional programs as compared to data stored in structured databases.
Example: Word, PDF, text, media logs, audio, video
Difference between structured, semi-structured and unstructured data

Technology:
- Structured data: based on a relational database table.
- Semi-structured data: based on XML/RDF (Resource Description Framework).
- Unstructured data: based on character and binary data.

Transaction management:
- Structured data: matured transaction management and various concurrency techniques.
- Semi-structured data: transaction management adapted from the DBMS, not matured.
- Unstructured data: no transaction management and no concurrency.

Version management:
- Structured data: versioning over tuples, rows and tables.
- Semi-structured data: versioning over tuples or graphs is possible.
- Unstructured data: versioned as a whole.

Flexibility:
- Structured data: schema-dependent and less flexible.
- Semi-structured data: more flexible than structured data but less flexible than unstructured data.
- Unstructured data: more flexible; absence of schema.

Scalability:
- Structured data: very difficult to scale the DB schema.
- Semi-structured data: scaling is simpler than for structured data.
- Unstructured data: more scalable.

Robustness:
- Structured data: very robust.
- Semi-structured data: new technology, not very widespread.
- Unstructured data: not specified.

Query performance:
- Structured data: structured queries allow complex joining.
- Semi-structured data: queries over anonymous nodes are possible.
- Unstructured data: only textual queries are possible.

Means of data organization:
- Structured data: organized by means of a relational database.
- Semi-structured data: partially organized by means of XML/RDF.
- Unstructured data: based on simple character and binary data.

7. Keywords

 Transaction
 Locking
 2PL, Strict 2PL
 Concurrency Control
 WR, WW, RW conflicts
 ARIES
 WAL Protocol
 Indexing
 Structures
 Log Record

8. Sample Questions

Remember:

1. Define Locking
2. List out the Locking protocols.
3. Define 2 PL.
4. What is Time Stamp?
5. Define RTS.
6. Define ARIES
7. List the types of Indexing
8. Define the purpose of the LSN in a log record.
9. What is a Sparse index?
10. Define Dense index.
11. List out the types of data

Understand:
1. Explain Lock based protocols.
2. Explain briefly Concurrency Control Techniques with example.
3. Explain types of Conflicts in Concurrency Scheduling.
4. Explain Optimistic concurrency control technique.
5. Explain ARIES algorithm in detail.
6. Explain different types of Indexing with examples.
7. Explain checkpoints in detail.
8. Explain the insertion and deletion operations in a B+ tree with an example.

9. Stimulating Question (s)

-----

10. Mind Map

11. Student Summary

At the end of this session, the facilitator (Teacher) shall randomly pick-up few students to
summarize the deliverables.

12. Reading Materials

1. Database Management Systems, Raghu Ramakrishnan, Johannes Gehrke,


Tata McGraw-Hill, 3rd Edition, page no: 192-198
2. Database System Concepts, Silberschatz, Korth, McGraw Hill, 5th Edition

13. Scope for Mini Project

-------------------------

