
Government College of Engineering and Research, Avasari

Department of Computer Engineering

DATABASE MANAGEMENT
SYSTEM
TE 2019 Course

Prof. K. B. Sadafale
Assistant Professor
Syllabus
Unit I: Introduction to Database Management Systems and ER Model

UNIT II: SQL and PL/SQL

UNIT III: Relational Database Design

UNIT IV: Database Transaction Management

UNIT V: NoSQL Databases

UNIT VI: Advances in Databases


UNIT IV: Database Transaction Management
Introduction to Database Transaction, Transaction states,
ACID properties, Concept of Schedule, Serial Schedule.
Serializability: Conflict and View, Cascaded Aborts,
Recoverable and Non-recoverable Schedules.
Concurrency Control: Lock-based, Time-stamp based
Deadlock handling.
Recovery methods: Shadow-Paging and Log-Based
Recovery, Checkpoints.
Log-Based Recovery: Deferred Database Modifications
and Immediate Database Modifications.
Unit IV
Database Transaction Management
Basic concept of a Transaction.
Transaction
A sequence of actions that is treated as one atomic unit of work.
– Read, write, commit, abort

Database transaction
A unit of work performed within a database management system.
Governed by four ACID properties:
 Atomicity,
 Consistency,
 Isolation,
 Durability

 Has a unique starting point, some actions, and one end point.

 A transaction is a unit of work that completes as a unit or fails as
a unit.
Transactions
Many enterprises use databases to store information
about their state

e.g., Balances of all depositors at a bank
When an event occurs in the real world that changes
the state of the enterprise,
a program is executed to change the database state in a
corresponding way

e.g., Bank balance must be updated when deposit is
made
Such a program is called a transaction.
Transaction Concept
 A transaction is a unit of program execution that accesses and
possibly updates various data items.
 E.g. transaction to transfer $50 from account A to account B:

1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
 Two main issues to deal with:

 Failures of various kinds, such as hardware failures and


system crashes.
 Concurrent execution of multiple transactions.
Example of Fund Transfer
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Atomicity requirement

if the transaction fails after step 3 and before step 6, money will be “lost”
leading to an inconsistent database state
Failure could be due to software or hardware

the system should ensure that updates of a partially executed transaction
are not reflected in the database
Durability requirement
once the user has been notified that the transaction has completed (i.e., the
transfer of the $50 has taken place),
The updates to the database by the transaction must persist even if there are
software or hardware failures.
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Consistency requirement
in above example:

the sum of A and B is unchanged by the execution of the transaction
In general, consistency requirements include
Explicitly specified integrity constraints such as
primary keys and foreign keys
Implicit integrity constraints

e.g. sum of balances of all accounts, minus sum of loan
amounts must equal value of cash-in-hand

A transaction must see a consistent database.

During transaction execution the database may be temporarily inconsistent.

When the transaction completes successfully, the database must be
consistent.
Erroneous transaction logic can lead to inconsistency.
Isolation requirement
if between steps 3 and 6, another transaction T2 is allowed to access
the partially updated database, it will see an inconsistent database
(the sum A + B will be less than it should be).

T1                          T2
1. read(A)
2. A := A – 50
3. write(A)
                            read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
Isolation can be ensured trivially by running transactions serially,
that is, one after the other.
However, executing multiple transactions concurrently has significant
benefits.
The ACID Properties
 Atomicity:
All actions in the transaction happen, or none happen.
A real-world event either happens or does not happen

Student either registers or does not register
Similarly, the system must ensure that either the corresponding
transaction runs to completion or, if not, it has no effect at all
 Consistency:
If each transaction is consistent, and the DB starts consistent, it ends
up consistent.
 Isolation:
Execution of one transaction is isolated from that of other
transactions.
 Durability:
If a transaction commits, its effects persist.
The system must ensure that once a transaction commits, its effect
on the database state is not lost in spite of subsequent failures
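The four properties can be tried out with Python's built-in sqlite3 module; a minimal sketch, where the account table and the transfer helper are illustrative, not part of the slides:

```python
import sqlite3

# In-memory database with two illustrative accounts A and B.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit: both updates
    commit together, or neither takes effect (rollback on any error)."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()       # durability: effects persist once committed
    except Exception:
        conn.rollback()     # atomicity: partial updates are undone
        raise

transfer(conn, "A", "B", 50)
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 950, 'B': 2050}; the sum A+B is unchanged (consistency)
```

The commit/rollback pair is what makes the two UPDATE statements behave as one unit rather than two independent writes.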
Concurrent execution of user programs is
essential for good DBMS performance.
 A user’s program may carry out many operations
on the data retrieved from the database,
 but the DBMS is only concerned with what data
is read from / written to the database.
 A transaction is the DBMS’s abstract view of a
user program: a sequence of reads and writes.
Transaction State
Active
The initial state; the transaction stays in this state while it is
executing
Partially committed
After the final statement has been executed.
Failed
After the discovery that normal execution can no longer proceed.

Aborted
After the transaction has been rolled back and the database
restored to its state prior to the start of the transaction.
Two options after it has been aborted:

restart the transaction
can be done only if no internal logical error

kill the transaction
Committed
After successful completion.
Transaction State (Cont.)
Concept of Schedule
In the fields of databases and transaction processing (transaction
management), a schedule of a system is an abstract model to describe
execution of transactions running in the system.
Often it is a list of operations (actions) ordered by time, performed by
a set of transactions that are executed together in the system.
Schedule – a sequence of instructions that specifies the chronological
order in which instructions of concurrent transactions are executed

a schedule for a set of transactions must consist of all instructions of
those transactions

must preserve the order in which the instructions appear in each
individual transaction.
A transaction that successfully completes its execution will have a
commit instruction as its last statement

by default, a transaction is assumed to execute a commit instruction as its
last step
A transaction that fails to complete its execution successfully will have
an abort instruction as its last statement.
Concurrent execution
Transaction processing systems usually allow multiple transactions to run
concurrently.
Concurrent execution of multiple transactions causes several complications
with consistency of data.
There are two good reasons for allowing concurrency:
1) Improved throughput and resource utilization.
2) Reduced waiting time.

1) Improved throughput and resource utilization.
“Throughput is the number of transactions executed in a given amount of
time.”
A transaction consists of many steps.
Some involve I/O activity; others involve CPU activity.
The CPU and the disks in a computer system can operate in parallel.
The parallelism of the CPU and the I/O system can therefore be exploited
to run multiple transactions in parallel.
While one transaction is reading or writing data on disk, another can be
running on the CPU.
All of this increases the throughput of the system; correspondingly, the
processor and disk utilization also increase.
Thus the processor and disks spend less time idle.

2) Reduced waiting time.
There may be a mix of transactions running on the system, some short and
some long.
If transactions run serially, a short transaction may have to wait for a
preceding long transaction to complete,
which can lead to unpredictable delays in running a transaction.
If the transactions are operating on different parts of the database, it is
better to let them run concurrently, sharing the CPU cycles and disk
accesses among them.
Concurrent execution reduces the unpredictable delays in running
transactions.
It also reduces the average response time: the average time for a
transaction to be completed after it has been submitted.
The motivation for using concurrent execution in a database is essentially
the same as the motivation for using multiprogramming in an operating
system.
Example
Let T1 and T2 be two transactions.
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance
from A to B.
 Suppose the current values of A and B are $1000 and $2000
respectively.
 Suppose the two transactions are executed in the order T1
followed by T2.
A serial schedule in which T1 is followed by T2:

Schedule 1

Read(1000)
A = 1000 − 50 = 950
Write(950)
Read(2000)
B = 2000 + 50 = 2050
Write(2050)
Read(950)
temp = 95
A = 950 − 95 = 855
Write(855)
Read(2050)
B = 2050 + 95 = 2145
Write(2145)
The final values of account A and B after the execution of schedule1
are $855 and $2145 respectively.
Thus the sum A+B is preserved.
The execution sequences that represent the chronological order in
which instructions are executed in the system are called schedules.
 A serial schedule where T2 is followed by T1

Schedule 2
The given two transactions can be executed concurrently.
Let T1 and T2 be the transactions defined previously.
One possible concurrent schedule is given below.
The following schedule is not a serial schedule, but it is equivalent to
Schedule 1.

Schedule 3
Read(1000)
A = 1000 − 50 = 950
Write(950)
Read(950)
temp = 95
A = 950 − 95 = 855
Write(855)
Read(2000)
B = 2000 + 50 = 2050
Write(2050)
Read(2050)
B = 2050 + 95 = 2145
Write(2145)
After the execution of schedule 3 above, we arrive at the same
state as the one in which the transactions are executed serially in
the order T1 followed by T2.
The sum A+B is preserved.
Not all concurrent executions result in a correct state.
The following concurrent schedule 4 does not preserve the value
of (A + B).

Schedule 4
T1: Read(1000)
T1: A = 1000 − 50 = 950
T2: Read(1000)
T2: temp = 100
T2: A = 1000 − 100 = 900
T2: Write(900)
T2: Read(2000)
T1: Write(950)
T1: Read(2000)
T1: B = 2000 + 50 = 2050
T1: Write(2050)
T2: B = 2000 + 100 = 2100
T2: Write(2100)
The concurrent schedule 4 does not preserve
the value of (A + B).
After the execution of schedule 4, we arrive at
a state where the final values of accounts A and
B are $950 and $2100 respectively: $3050 in total,
even though the two transactions only move money
between the accounts.
This final state is an inconsistent state.
We can ensure consistency of the database
under concurrent execution by making sure
that any schedule that is executed has the same
effect as a serial schedule.
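The arithmetic of both outcomes can be checked by replaying the schedules on plain variables; a sketch, assuming the same $50 and 10% transfers as above:

```python
# Replay schedule 1 (serial: T1 then T2) on plain Python variables.
A, B = 1000, 2000
A = A - 50                  # T1: read(A), A := A - 50, write(A)
B = B + 50                  # T1: read(B), B := B + 50, write(B)
temp = A * 0.10             # T2: read(A), temp := 10% of A
A = A - temp                # T2: write(A)
B = B + temp                # T2: read(B), write(B)
serial_final = (A, B)
print(serial_final, A + B)  # (855.0, 2145.0) 3000.0 -> sum preserved

# Replay a bad interleaving (schedule 4): T2 reads A before T1 writes it back.
A, B = 1000, 2000
a1 = A - 50                 # T1: read(A); local value 950, not yet written
temp = A * 0.10             # T2: read(A) still sees 1000; temp = 100
A = A - temp                # T2: write(A) -> 900
b2 = B                      # T2: read(B) sees 2000
A = a1                      # T1: write(A) -> 950, clobbering T2's write
B = B + 50                  # T1: read(B), write(B) -> 2050
B = b2 + temp               # T2: write(B) -> 2100, clobbering T1's write
bad_final = (A, B)
print(bad_final, A + B)     # (950, 2100.0) 3050.0 -> sum NOT preserved
```

Each transaction is internally correct; only the interleaving of the reads and writes loses money.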
Serializability
A transaction schedule is serializable if its outcome (e.g., the resulting
database state) is equal to the outcome of its transactions executed serially,
i.e., sequentially without overlapping in time.
Serializability is the major correctness criterion for concurrent transaction
execution.
Basic assumption – each transaction preserves database consistency.
Thus serial execution of a set of transactions preserves database
consistency.
A (possibly concurrent) schedule is serializable if it is equivalent to a
serial schedule.
We must first understand which schedules ensure consistency and
which do not.
Transactions are programs, so it is computationally difficult to
determine exactly what operations a transaction performs and how
operations of various transactions interact.
We consider only two operations: read and write.
Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
Simplified view of transactions

We ignore operations other than read and write instructions

We assume that transactions may perform arbitrary
computations on data in local buffers in between reads and
writes.

Our simplified schedules consist of only read and write
instructions.
Conflict Serializability
Consider a schedule S in which there are two
consecutive instructions Ii and Ij of transactions Ti and Tj
respectively (i not equal to j).
If Ii and Ij refer to different data items, then we can
swap Ii and Ij
without affecting the results of any instruction in the
schedule.
If Ii and Ij refer to the same data item Q, then the order
of the two steps may matter.
Since we are dealing with only read and write
instructions,
there are four cases that we need to consider.
1) Ii = Read(Q), Ij = Read(Q).
The order of Ii and Ij does not matter.
The same value of Q is read by Ti and Tj.

2) Ii = Read(Q), Ij = Write(Q)
If Ii comes before Ij, then Ti does not read the value of Q that is
written by Tj in instruction Ij.
If Ij comes before Ii, then Ti reads the value of Q that is written by Tj.
The order of Ii and Ij matters.

3) Ii = Write(Q), Ij = Read(Q)
The order of Ii and Ij matters, for reasons similar to those of the
previous case.

4) Ii = Write(Q), Ij = Write(Q)
The order of these instructions does not directly affect either Ti or Tj.
However, the value obtained by the next read(Q), and the final value
of Q, depend on which write comes last, so the order matters.
We say that Ii and Ij conflict
if they are operations by different transactions on
the same data item,
and at least one of these instructions is a write operation.
Instructions Ii and Ij of transactions Ti and Tj,
respectively, conflict if and only if there exists some
item Q accessed by both Ii and Ij,
and at least one of these instructions writes Q.
1. Ii = read(Q), Ij = read(Q). Ii and Ij don’t conflict.
2. Ii = read(Q), Ij = write(Q). They conflict.
3. Ii = write(Q), Ij = read(Q). They conflict.
4. Ii = write(Q), Ij = write(Q). They conflict.
Consider the following schedule for the concept of conflicting
instructions.

Write(A) of T1 conflicts with the Read(A) instruction of T2.
The Write(A) of T2 does not conflict with the Read(B) of T1.
Reason: the two instructions access different data items, A and B.
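The four cases collapse into a two-line test. A sketch, where the (transaction, action, item) tuple format is an illustrative convention:

```python
def conflicts(op1, op2):
    """Two operations conflict iff they belong to different transactions,
    touch the same data item, and at least one of them is a write."""
    t1, action1, item1 = op1
    t2, action2, item2 = op2
    return t1 != t2 and item1 == item2 and "write" in (action1, action2)

print(conflicts(("T1", "write", "A"), ("T2", "read", "A")))   # True
print(conflicts(("T2", "write", "A"), ("T1", "read", "B")))   # False: items differ
print(conflicts(("T1", "read", "Q"), ("T2", "read", "Q")))    # False: both reads
```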
View Serializability
Let S and S´ be two schedules with the same set of transactions.
S and S´ are view equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q
in schedule S, then transaction Ti must, in schedule S´, also read
the initial value of Q.
2. For each data item Q if transaction Ti executes read(Q) in
schedule S, and that value was produced by transaction Tj (if
any), then transaction Ti must in schedule S´ also read the value
of Q that was produced by transaction Tj .
3. For each data item Q, the transaction (if any) that performs the
final write(Q) operation in schedule S must perform the final
write(Q) operation in schedule S´.

As can be seen, view equivalence is also based purely on reads and
writes alone.
Conditions 1 and 2 ensure that each transaction reads the same
values in both schedules and therefore performs the same
computation.
Condition 3, coupled with conditions 1 and 2, ensures that both
schedules result in the same final system state.
Consider the following schedules

Schedule 1 Schedule 2 Schedule 3


In the above schedules, schedule 1 is not view equivalent to
schedule 2,
since in schedule 1 the value of account A read by
transaction T2 was produced by T1,
whereas this does not hold in schedule 2.

Schedule 1 Schedule 2
However, schedule 1 is view equivalent to schedule 3,
because the values of accounts A and B read by transaction T2
were produced by T1 in both schedules.

Schedule 1 Schedule 3

We say that a schedule S is view serializable if it is view
equivalent to a serial schedule.
Testing for serializability
How do we determine, given a particular schedule S, whether
the schedule is serializable?
A simple and efficient method for determining conflict
serializability of a schedule is the precedence graph, or
directed graph.
This graph consists of a pair G = (V, E) where V is a set of
vertices and E is a set of edges.
The set of vertices consists of all the transactions
participating in the schedule.
The set of edges consists of all edges Ti → Tj for which
one of three conditions holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
If an edge Ti → Tj exists in the precedence graph, then in any
serial schedule S’ equivalent to S, Ti must appear before Tj.
Consider schedule 1 as follows.

The precedence graph for this is as shown:

T1 → T2
Consider schedule 2 as follows.

The precedence graph for this is as shown:

T2 → T1

Consider schedule 4 as follows.

The precedence graph for this is as shown:

T1 → T2 and T2 → T1

It contains the edge T1 → T2 because T1 executes read(A) before T2
executes write(A).
It also contains the edge T2 → T1 because T2 executes read(B)
before T1 executes write(B).

If the precedence graph for S has a cycle, then schedule S is not
conflict serializable.
If the graph contains no cycles, then schedule S is conflict
serializable.

There exists no efficient algorithm to test for view serializability.
However, conflict serializability is a sufficient condition: if a
schedule is conflict serializable, it is also view serializable.
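The precedence-graph test can be sketched directly; the schedule encoding and function names are illustrative:

```python
def precedence_graph(schedule):
    """Build edges Ti -> Tj for each pair of conflicting operations where
    Ti's operation comes first. schedule = [(txn, action, item), ...]"""
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "write" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle; an acyclic precedence graph means
    the schedule is conflict serializable."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def visit(node, path, done):
        if node in path:
            return True
        if node in done:
            return False
        path.add(node)
        found = any(visit(n, path, done) for n in graph.get(node, ()))
        path.discard(node)
        done.add(node)
        return found
    return any(visit(u, set(), set()) for u in graph)

# Schedule 4 above: T1 reads A before T2 writes it, and T2 reads B
# before T1 writes it -> edges in both directions -> a cycle.
s4 = [("T1", "read", "A"), ("T2", "write", "A"),
      ("T2", "read", "B"), ("T1", "write", "B")]
edges = precedence_graph(s4)
print(edges)             # {('T1', 'T2'), ('T2', 'T1')}
print(has_cycle(edges))  # True -> not conflict serializable
```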
Recoverability
We need to address the effect of transaction failures during
concurrent execution.
If transaction Ti fails, we need to undo the effects of this
transaction to ensure the atomicity property of transactions.
It is also necessary to ensure that any transaction Tj that is
dependent on Ti (meaning Tj has read data written by Ti) is
aborted as well.
To achieve this, we need to place restrictions on the types of
schedules permitted in the system.
Recoverable and Non-recoverable
Schedules
Recoverable schedule — if a transaction Tj reads a data item previously
written by a transaction Ti, the commit operation of Ti appears before
the commit operation of Tj.
Consider the following schedule.

T9 is a transaction that performs only one instruction: read(A).
Suppose that T9 commits immediately after executing read(A),
so T9 commits before T8 does.
Suppose T8 fails before it commits.
Since T9 has read the value of data item A written by T8, we must abort
T9 to ensure transaction atomicity.
However, T9 has already committed and cannot be
aborted.
Thus we have a situation where it is impossible to
recover correctly from the failure of T8.
The commit happening immediately after the read(A)
instruction is an example of a non-recoverable
schedule.
Most database systems require that all schedules be recoverable.
Cascadeless schedules
To recover correctly from the failure of a transaction
Ti, we may have to roll back several transactions.
Consider the following schedule.

Transaction T10 writes a value of A that is read by transaction
T11.
Transaction T11 writes a value of A that is read by transaction
T12.
Suppose that at this point T10 fails.
T10 must be rolled back.
Since T11 is dependent on T10, T11 must be rolled back;
since T12 is dependent on T11, T12 must be rolled back.
This phenomenon, in which a single transaction failure leads to a
series of transaction rollbacks, is called cascading rollback.
Cascading rollback is undesirable, since it leads to the undoing of
a significant amount of work.
It is desirable to restrict schedules to those where cascading
rollbacks cannot occur.
Such schedules are called cascadeless schedules.
Cascadeless schedule: one where, for each pair of transactions
Ti and Tj such that Tj reads a data item previously written by Ti,
the commit operation of Ti appears before the read operation of
Tj.
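Recoverability can be checked mechanically from a schedule's reads-from pairs and its commit order; a sketch with an illustrative event-tuple format:

```python
def reads_from(schedule):
    """Triples (Tj, Ti, item): Tj read an item whose last writer was Ti."""
    last_writer, pairs = {}, []
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read" and last_writer.get(item) not in (None, txn):
            pairs.append((txn, last_writer[item], item))
    return pairs

def is_recoverable(schedule):
    """Recoverable: whenever Tj reads from Ti, Ti commits before Tj does."""
    commit_pos = {t: i for i, (t, a, _) in enumerate(schedule) if a == "commit"}
    for tj, ti, _ in reads_from(schedule):
        if commit_pos.get(ti, float("inf")) > commit_pos.get(tj, float("inf")):
            return False
    return True

# Non-recoverable: T9 reads A written by T8, then commits before T8 does.
s = [("T8", "write", "A"), ("T9", "read", "A"),
     ("T9", "commit", None), ("T8", "commit", None)]
print(is_recoverable(s))   # False
```

Swapping the two commit events makes the same schedule recoverable.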
Concurrency Control
Concurrency control is a mechanism by which the
system controls the interaction among
concurrent transactions.

Lock-Based Protocols
One way to ensure serializability is to require that
while one transaction is accessing a data item, no other
transaction can modify that data item.
The most common method used to implement this
requirement is to hold a lock on that item.
Locks
A lock is a mechanism to control concurrent access to a
data item
Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read as
well as written.
X-lock is requested using lock-X instruction.
2. shared (S) mode. Data item can only be read.
S-lock is requested using lock-S instruction.
Lock requests are made to concurrency-control
manager. Transaction can proceed only after request is
granted.
Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
Locking as above is not sufficient to guarantee
serializability — if A and B get updated in-between the
read of A and B, the displayed sum would be wrong.
A locking protocol is a set of rules followed by all
transactions while requesting and releasing locks.
Locking protocols restrict the set of possible schedules.
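The S/X compatibility rule (any number of readers, or exactly one writer) can be sketched as a tiny lock table; class and method names are illustrative:

```python
class LockTable:
    """Grants lock-S if no other transaction holds X on the item; grants
    lock-X only if no other transaction holds any lock on the item."""
    def __init__(self):
        self.held = {}   # item -> {txn: "S" or "X"}

    def request(self, txn, item, mode):
        holders = self.held.setdefault(item, {})
        others = {t: m for t, m in holders.items() if t != txn}
        if mode == "S" and "X" not in others.values():
            holders[txn] = "S"
            return "granted"
        if mode == "X" and not others:
            holders[txn] = "X"
            return "granted"
        return "wait"    # conflicting holder: the requester must block

    def unlock(self, txn, item):
        self.held.get(item, {}).pop(txn, None)

lt = LockTable()
print(lt.request("T1", "A", "S"))   # granted
print(lt.request("T2", "A", "S"))   # granted: S is compatible with S
print(lt.request("T3", "A", "X"))   # wait: X conflicts with the S holders
lt.unlock("T1", "A"); lt.unlock("T2", "A")
print(lt.request("T3", "A", "X"))   # granted once all S locks are released
```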
Pitfalls of Lock-Based Protocols
 Consider the partial schedule

 Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to
wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to
wait for T4 to release its lock on A.
 Such a situation is called a deadlock.
 To handle a deadlock, one of T3 or T4 must be rolled back
and its locks released.
The potential for deadlock exists in most locking
protocols.

Starvation is also possible if the concurrency-control
manager is badly designed.

For example:

A transaction may be waiting for an X-lock on an item, while a
sequence of other transactions request and are granted an S-lock
on the same item.

The same transaction is repeatedly rolled back due to deadlocks.

The concurrency-control manager can be designed to
prevent starvation.
The Two-Phase Locking Protocol
This is a protocol which ensures conflict-
serializable schedules.
Under 2PL, locks are acquired and released in
two phases.
Phase 1: Growing Phase

Locks are acquired and no locks are released.
Phase 2: Shrinking Phase

Locks are released and no locks are acquired.
The protocol assures serializability.
It can be proved that the transactions can be
serialized in the order of their lock points (i.e. the
point where a transaction acquires its final lock).
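Whether a transaction's lock/unlock sequence obeys 2PL can be checked by a single scan; a sketch with an illustrative event format (note that the earlier transaction T2, which re-locks after unlocking, violates the protocol):

```python
def obeys_2pl(events):
    """True iff the transaction never acquires a lock after its first
    unlock: a growing phase strictly followed by a shrinking phase."""
    shrinking = False
    for action in events:
        if action.startswith("unlock"):
            shrinking = True
        elif shrinking:      # any lock request after an unlock breaks 2PL
            return False
    return True

print(obeys_2pl(["lock-X(A)", "lock-S(B)", "unlock(A)", "unlock(B)"]))  # True
print(obeys_2pl(["lock-S(A)", "unlock(A)", "lock-S(B)", "unlock(B)"]))  # False
```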
The Two-Phase Locking Protocol (Cont.)
Two-phase locking does not ensure freedom from
deadlocks
Cascading roll-back is possible under two-phase locking.
To avoid this, follow a modified protocol called strict
two-phase locking.
Here a transaction must hold all its exclusive locks till it
commits/aborts.
Rigorous two-phase locking is even stricter: here all
locks are held till commit/abort.
In this protocol transactions can be serialized in the
order in which they commit.
There can be conflict serializable schedules that cannot
be obtained if two-phase locking is used.
Deadlock Handling
A system is deadlocked if there is a set of transactions such
that every transaction in the set is waiting for another
transaction in the set.
E.g.: there exists a set of waiting transactions
{T0, T1, T2, …, Tn}
such that T0 is waiting for a data item that T1 holds, T1
is waiting for a data item that T2 holds, …, Tn-1 is
waiting for a data item that Tn holds, and Tn is waiting for a
data item that T0 holds.
None of the transactions can make progress in such a situation.

There are two principal methods for dealing with the
deadlock problem:

1. A deadlock prevention protocol ensures that the
system will never enter a deadlock state.

2. With a deadlock detection and deadlock recovery
scheme, we allow the system to enter a deadlock
state, and then try to recover by using a deadlock
detection and deadlock recovery scheme.
Deadlock prevention
There are two approaches to deadlock prevention:

1. Ensure that no cyclic waits can occur by ordering the
requests for locks.
2. Transaction rollback instead of waiting for a lock.

The first approach requires that each transaction lock all its
data items before it begins execution.
Another approach for preventing deadlocks is to impose an
ordering of all data items, and to require that a transaction
lock data items only in a sequence consistent with the ordering.
More Deadlock Prevention Strategies

The following schemes use transaction timestamps for
the sake of deadlock prevention alone.
wait-die scheme — non-preemptive

An older transaction may wait for a younger one to release a data item.

Younger transactions never wait for older ones; they are rolled
back instead.

A transaction may die several times before acquiring the needed data
item.
wound-wait scheme — preemptive

An older transaction wounds (forces rollback of) a younger transaction
instead of waiting for it.

Younger transactions may wait for older ones.

There may be fewer rollbacks than in the wait-die scheme.
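Both schemes reduce to one decision function; a sketch, assuming a smaller timestamp means an older transaction:

```python
def resolve(scheme, requester_ts, holder_ts):
    """Decide what the requesting transaction does when its lock request
    conflicts with a lock held by another transaction."""
    older = requester_ts < holder_ts
    if scheme == "wait-die":            # non-preemptive
        return "wait" if older else "die"           # young requester rolls back
    if scheme == "wound-wait":          # preemptive
        return "wound holder" if older else "wait"  # young holder is preempted
    raise ValueError(scheme)

print(resolve("wait-die", 5, 10))      # wait: older may wait for younger
print(resolve("wait-die", 10, 5))      # die: younger requester is rolled back
print(resolve("wound-wait", 5, 10))    # wound holder: younger holder is rolled back
print(resolve("wound-wait", 10, 5))    # wait: younger may wait for older
```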
Deadlock detection and deadlock recovery
Deadlock detection
Deadlocks can be described by a wait-for graph, which
consists of a pair G = (V, E), where

V is a set of vertices (all the transactions in the system)

E is a set of edges; each element is an ordered pair Ti → Tj.
If Ti → Tj is in E, then there is a directed edge from Ti to Tj,
implying that Ti is waiting for Tj to release a data item.
When Ti requests a data item currently being held by Tj,
the edge Ti → Tj is inserted in the wait-for graph. This edge is
removed only when Tj is no longer holding a data item
needed by Ti.
The system is in a deadlock state if and only if the wait-for
graph has a cycle.
A deadlock-detection algorithm must be invoked periodically to
look for cycles.
Deadlock Detection (Cont.)

[Figure: a wait-for graph without a cycle, and a wait-for graph with a cycle]

Deadlock Recovery
When the detection algorithm determines that a deadlock
exists, the system must recover from the deadlock.
The most common solution is to roll back one or more
transactions to break the deadlock.
Three actions need to be taken:
 1. Selection of a victim
 Some transaction will have to be rolled back (made a victim) to break the
deadlock.
 Select as victim the transaction that will incur minimum cost.
 2. Rollback
Once we have decided that a particular transaction
must be rolled back,
we must determine how far this transaction should be
rolled back.
Total rollback: abort the transaction and then restart it.

Partial rollback: roll back the transaction only as far as
necessary to break the deadlock.

 3. Starvation
 In a system where the selection of victims is based primarily
on cost factors,
 it may happen that the same transaction is always picked as a
victim.
 This transaction never completes its designated task; thus
there is starvation.
 We must ensure that a transaction can be picked as a victim
only a (small) finite number of times.
 The most common solution is to include the number of
rollbacks in the cost factor.
Timestamp-Based Protocols
With each transaction Ti in the system, the database system
associates a unique fixed timestamp,
denoted by TS(Ti).
If an old transaction Ti has timestamp TS(Ti) and a new transaction Tj
is assigned timestamp TS(Tj), then TS(Ti) < TS(Tj).
The protocol manages concurrent execution such that the
timestamps determine the serializability order.
If TS(Ti) < TS(Tj), then the system must ensure that the produced
schedule is equivalent to a serial schedule in which transaction Ti
appears before transaction Tj.
In order to assure such behavior, the protocol
maintains for each data item Q two timestamp
values:

W-timestamp(Q) is the largest timestamp of any
transaction that executed write(Q) successfully.

R-timestamp(Q) is the largest timestamp of any
transaction that executed read(Q) successfully.
The timestamp-ordering protocol
The timestamp-ordering protocol ensures that any
conflicting read and write operations are executed in
timestamp order.
This protocol operates as follows:

Suppose a transaction Ti issues a read(Q).

1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a
value of Q that was already overwritten. Hence, the read
operation is rejected, and Ti is rolled back.
2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is
executed, and R-timestamp(Q) is set to the maximum
of R-timestamp(Q) and TS(Ti).
Suppose that transaction Ti issues write(Q).
If TS(Ti) < R-timestamp(Q), then the value of Q
that Ti is producing was needed previously, and
the system assumed that value would never be
produced.
Hence, the write operation is rejected, and Ti is
rolled back.
Similarly, if TS(Ti) < W-timestamp(Q), then Ti is
attempting to write an obsolete value of Q; the write
is rejected and Ti is rolled back.
Otherwise, the write operation is executed,
and W-timestamp(Q) is set to TS(Ti).
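The read and write rules can be sketched as a small scheduler; the class is illustrative, and it includes the check that rejects writes older than W-timestamp(Q), as in the usual presentation of the basic protocol:

```python
class TimestampScheduler:
    """Timestamp-ordering checks for read(Q) / write(Q). A rejected
    operation means the issuing transaction is rolled back."""
    def __init__(self):
        self.rts = {}   # R-timestamp(Q), 0 if never read
        self.wts = {}   # W-timestamp(Q), 0 if never written

    def read(self, ts, q):
        if ts < self.wts.get(q, 0):      # Q was already overwritten "later"
            return "reject"
        self.rts[q] = max(self.rts.get(q, 0), ts)
        return "ok"

    def write(self, ts, q):
        if ts < self.rts.get(q, 0):      # a younger txn already read the old Q
            return "reject"
        if ts < self.wts.get(q, 0):      # a younger txn already wrote Q
            return "reject"              # (basic protocol; no Thomas' rule)
        self.wts[q] = ts
        return "ok"

s = TimestampScheduler()
print(s.write(2, "Q"))   # ok: W-timestamp(Q) becomes 2
print(s.read(1, "Q"))    # reject: TS 1 < W-timestamp(Q) = 2
print(s.read(3, "Q"))    # ok: R-timestamp(Q) becomes 3
print(s.write(2, "Q"))   # reject: TS 2 < R-timestamp(Q) = 3
```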
Recovery methods
There are various types of failure that may occur in a
system,
each of which must be dealt with in a different
manner.
The simplest failures do not result in any loss of
information in the system.
The more difficult failures involve the loss of information
in the system.
Types of failure
Transaction failure:

Logical errors: the transaction cannot complete due to
some internal error condition.

System errors: the database system must
terminate an active transaction due to an error
condition (e.g., deadlock).

System crash: a power failure or other hardware or
software failure causes the system to crash.

Disk failure: a head crash or similar disk failure
destroys all or part of disk storage.
Recovery Algorithms
Recovery algorithms are techniques to ensure
database consistency and transaction atomicity
and durability despite failures.

Recovery algorithms have two parts


1. Actions taken during normal transaction processing
to ensure enough information exists to recover
from failures.
2. Actions taken after a failure to recover the database
contents to a state that ensures atomicity,
consistency and durability.
Recovery and Atomicity
Modifying the database without ensuring that the transaction
will commit may leave the database in an inconsistent state.

Consider transaction Ti that transfers $50 from account A to
account B; the goal is either to perform all database modifications
made by Ti or none at all.

Several output operations may be required for Ti (to output A
and B). A failure may occur after one of these modifications
has been made but before all of them are made.
We study two approaches:

Log-Based Recovery, and

Shadow-Paging

We assume (initially) that transactions run
serially, that is, one after the other.
Log-Based Recovery
A log is kept on stable storage.

The log is a sequence of log records, and maintains a record of
update activities on the database.
When transaction Ti starts, it registers itself by writing a
<Ti start> log record.
Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where
V1 is the value of X before the write, and V2 is the value to be written to
X.

The log record notes that Ti has performed a write on data item X; X had
value V1 before the write, and will have value V2 after the write.
When Ti finishes its last statement, the log record <Ti commit> is written.
We assume for now that log records are written directly to stable
storage (that is, they are not buffered).
Two approaches using logs

Deferred database modification

Immediate database modification
Deferred Database Modification
The deferred database modification scheme records all
modifications to the log, but defers (postpones) all the
writes until after partial commit.
Assume that transactions execute serially
Transaction starts by writing <Ti start> record to log.
A write(X) operation results in a log record <Ti, X, V>
being written, where V is the new value for X

Note: old value is not needed for this scheme
The write is not performed on X at this time, but is
deferred.
When Ti partially commits, <Ti commit> is written to the
log
Finally, the log records are read and used to actually
execute the previously deferred writes.
Deferred Database Modification (Cont.)
During recovery after a crash, a transaction needs to be redone if and
only if both <Ti start> and <Ti commit> are there in the log.
Redoing a transaction Ti ( redoTi) sets the value of all data items
updated by the transaction to the new values.
Crashes can occur while
– the transaction is executing the original updates, or
– recovery action is being taken.
Example transactions T0 and T1 (T0 executes before T1):
    T0: read(A)            T1: read(C)
        A := A - 50            C := C - 100
        write(A)               write(C)
        read(B)
        B := B + 50
        write(B)
Deferred Database Modification (Cont.)
Below we show the log as it appears at three instants in time (the
figure is not reproduced here): in case (a) the crash occurs before
<T0 commit> is written, in case (b) after <T0 commit> but before
<T1 commit>, and in case (c) after <T1 commit>.
If the log on stable storage at the time of the crash is as in case:
(a) No redo actions need to be taken.
(b) redo(T0) must be performed, since <T0 commit> is present.
(c) redo(T0) must be performed, followed by redo(T1), since
<T0 commit> and <T1 commit> are present.
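The redo-only rule above can be sketched in a few lines of Python (an illustrative toy, not a real recovery manager; log records are tuples, and deferred modification logs only the new value):

```python
# Sketch of redo-only recovery under deferred database modification.
# Log records: ("start", T), ("update", T, item, new_value), ("commit", T).
def recover_deferred(log, db):
    # A transaction is redone iff both <T start> and <T commit> are logged;
    # since every commit follows a start, checking for commit is enough.
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "update":
            _, t, item, new_value = rec
            if t in committed:
                db[item] = new_value  # redo: install the new value
    return db

# Case (c) from the slides: both T0 and T1 committed before the crash.
log = [
    ("start", "T0"),
    ("update", "T0", "A", 950),
    ("update", "T0", "B", 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("update", "T1", "C", 600),
    ("commit", "T1"),
]
db = {"A": 1000, "B": 2000, "C": 700}  # values on disk at crash time
recover_deferred(log, db)
print(db)  # {'A': 950, 'B': 2050, 'C': 600}
```

With the same log truncated before <T0 commit> (case (a)), the loop would find no committed transactions and leave the database untouched, which is exactly why no undo is ever needed in this scheme.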
Immediate Database Modification
The immediate database modification scheme allows
database updates of an uncommitted transaction to be made
as the writes are issued.
– Since undoing may be needed, update log records must have both the old
value and the new value.
The update log record must be written before the database item is
written.
– We assume that the log record is output directly to stable storage.
– This can be extended to postpone log record output, so long as, prior
to execution of an output(B) operation for a data block B, all log
records corresponding to items in B have been flushed to stable
storage.
Output of updated blocks can take place at any time before
or after transaction commit.
The order in which blocks are output can be different from the
order in which they are written.
Immediate Database Modification Example

Log                      Write            Output
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                         A = 950
                         B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                         C = 600
                                          BB, BC
<T1 commit>
                                          BA

Note: BX denotes the block containing X.
Immediate Database Modification (Cont.)
Recovery procedure has two operations instead of one:
– undo(Ti) restores the value of all data items updated by Ti to their old
values, going backwards from the last log record for Ti.
– redo(Ti) sets the value of all data items updated by Ti to the new values,
going forward from the first log record for Ti.
Both operations must be idempotent:
– that is, even if the operation is executed multiple times, the effect is the
same as if it were executed once;
– needed since operations may get re-executed during recovery.
When recovering after failure:
– Transaction Ti needs to be undone if the log contains the record
<Ti start>, but does not contain the record <Ti commit>.
– Transaction Ti needs to be redone if the log contains both the record
<Ti start> and the record <Ti commit>.
Undo operations are performed first, then redo operations.
Immediate DB Modification Recovery Example
Below we show the log as it appears at three instants in time (the
figure is not reproduced here): in case (a) the crash occurs before
<T0 commit> is written, in case (b) after <T0 commit> but before
<T1 commit>, and in case (c) after <T1 commit>.
Recovery actions in each case are:
(a) undo(T0): B is restored to 2000 and A to 1000.
(b) undo(T1) and redo(T0): C is restored to 700, and then A and B are
set to 950 and 2050 respectively.
(c) redo(T0) and redo(T1): A and B are set to 950 and 2050
respectively; then C is set to 600.
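The undo-then-redo procedure can be sketched as follows (an illustrative toy with tuple-shaped log records, not a real recovery manager):

```python
# Sketch of undo/redo recovery under immediate database modification.
# Log records: ("start", T), ("update", T, item, old, new), ("commit", T).
def recover_immediate(log, db):
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Undo first: scan backwards, restoring OLD values for transactions
    # with <start> but no <commit>.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] in started - committed:
            db[rec[2]] = rec[3]  # old value
    # Then redo: scan forwards, installing NEW values for committed ones.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            db[rec[2]] = rec[4]  # new value
    # Both passes are idempotent: running them again changes nothing.
    return db

# Case (b) from the slides: T0 committed, T1 did not.
log = [
    ("start", "T0"),
    ("update", "T0", "A", 1000, 950),
    ("update", "T0", "B", 2000, 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("update", "T1", "C", 700, 600),
]
db = {"A": 950, "B": 2050, "C": 600}  # disk state at crash time
recover_immediate(log, db)
print(db)  # {'A': 950, 'B': 2050, 'C': 700}
```

Note how the result matches case (b): C is restored to 700 by undo(T1), while A and B keep the values redone for T0.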
Shadow Paging
Shadow paging is an alternative to log-based recovery; this scheme is useful if
transactions execute serially.
The database is partitioned into a number of fixed-length blocks, which are
referred to as pages.
The pages need not be stored in any particular order on the disk;
therefore a page table is maintained that points to the blocks on the disk.
The key idea behind the shadow paging technique is to maintain two page tables
during the lifetime of a transaction: the current page table and the shadow page
table.
The shadow page table is stored in nonvolatile storage, so that the state of the
database prior to transaction execution may be recovered.
– The shadow page table is never modified during execution.
– The current page table may be changed when a transaction performs a write
operation.
To start with, both page tables are identical. Only the current page table is used
for data item accesses during execution of the transaction.
Whenever any page is about to be written for the first time:
– A copy of this page is made onto an unused page.
– The current page table is then made to point to the copy.
– The update is performed on the copy.
Sample Page Table
[figure: a page table pointing to pages on disk; not reproduced]
Example of Shadow Paging
[figure: shadow and current page tables after a write to page 4; not reproduced]
Shadow Paging (Cont.)
To commit a transaction:
1. Flush all modified pages in main memory to disk.
2. Output the current page table to disk.
3. Make the current page table the new shadow page table, as follows:
– Keep a pointer to the shadow page table at a fixed (known)
location on disk.
– To make the current page table the new shadow page table,
simply update the pointer to point to the current page table on disk.
Once the pointer to the shadow page table has been written, the
transaction is committed.
No recovery is needed after a crash; new transactions can start
right away, using the shadow page table.
Pages not pointed to from the current/shadow page table should be
freed (garbage collected).
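The copy-on-write and pointer-swap mechanics can be sketched like this (hypothetical in-memory structures; a real system keeps the page tables and the fixed pointer on disk, and the pointer update must be a single atomic disk write):

```python
# Shadow paging sketch: copy-on-write pages plus a single pointer swap.
disk_pages = {1: "p1-v0", 2: "p2-v0", 3: "p3-v0", 4: "p4-v0"}
next_free = 5                              # next unused disk page number

shadow_table = {1: 1, 2: 2, 3: 3, 4: 4}    # logical page -> disk page
current_table = dict(shadow_table)         # initially identical
db_pointer = "shadow"                      # fixed-location pointer on disk

def write_page(logical, data):
    """First write to a page copies it; the update goes to the copy only."""
    global next_free
    if current_table[logical] == shadow_table[logical]:
        disk_pages[next_free] = disk_pages[current_table[logical]]  # copy
        current_table[logical] = next_free                          # repoint
        next_free += 1
    disk_pages[current_table[logical]] = data  # shadow copy stays untouched

def commit():
    """The single pointer update is the commit point; old shadow
    pages become garbage to be collected later."""
    global shadow_table, db_pointer
    shadow_table = dict(current_table)
    db_pointer = "current"

write_page(4, "p4-v1")   # the write to page 4 from the figure above
commit()
print(current_table[4])  # now points at the new copy, not disk page 4
```

A crash before `commit()` simply discards the current table: the shadow table still points at the untouched original pages, so no recovery work is needed.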
Checkpoints
Problems with the recovery procedure as discussed earlier:
1. Searching the entire log is time-consuming.
2. We might unnecessarily redo transactions which have already
output their updates to the database.
Streamline the recovery procedure by periodically performing
checkpointing:
1. Output all log records currently residing in main
memory onto stable storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record <checkpoint> onto stable storage.
Checkpoints (Cont.)
During recovery we need to consider only the most
recent transaction Ti that started before the
checkpoint, and transactions that started after Ti:
1. Scan backwards from the end of the log to find the most recent
<checkpoint> record.
2. Continue scanning backwards until a record <Ti start> is
found.
3. Only the part of the log following the above start record need be
considered; the earlier part of the log can be ignored during
recovery, and can be erased whenever desired.
4. For all transactions (starting from Ti or later) with no <Ti
commit>, execute undo(Ti). (Done only in the case of
immediate modification.)
5. Scanning forward in the log, for all transactions starting
from Ti or later with a <Ti commit>, execute redo(Ti).
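Steps 1 and 2 of the scan can be sketched as follows (an illustrative toy assuming serial execution, with tuple-shaped log records as in the earlier sketches):

```python
# Find where recovery must begin: the <Ti start> of the most recent
# transaction that started before the latest <checkpoint> record.
def recovery_start(log):
    # Step 1: scan backwards for the most recent <checkpoint>.
    cp = max(i for i, rec in enumerate(log) if rec[0] == "checkpoint")
    # Step 2: continue backwards until a <Ti start> is found
    # (with serial execution this is the transaction active at the
    # checkpoint, if any).
    for i in range(cp, -1, -1):
        if log[i][0] == "start":
            return i
    return 0  # no start before the checkpoint: begin at the log's start

log = [
    ("start", "T0"), ("commit", "T0"),   # finished before the checkpoint
    ("start", "T1"),                     # active at the checkpoint
    ("checkpoint",),
    ("update", "T1", "C", 700, 600),
    ("commit", "T1"),
]
print(recovery_start(log))  # 2: everything before <T1 start> is ignored
```

Everything before the returned index has already reached disk (step 2 of checkpointing flushed the buffers), so T0 needs neither undo nor redo; only T1 is reprocessed.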
Example of Checkpoints
Consider transactions T1, T2, T3, and T4, a checkpoint taken at time Tc,
and a system failure at time Tf (the timeline figure is not reproduced
here): T1 completes before the checkpoint; T2 and T3 commit after the
checkpoint but before the failure; T4 is still active at the failure.
– T1 can be ignored (its updates were already output to disk
at the checkpoint).
– T2 and T3 must be redone.
– T4 must be undone.
End of Unit IV