Unit-2 Notes - Part-2
Transaction Operations
The low-level operations performed in a transaction are −
• commit − A signal to specify that the transaction has been successfully completed in
its entirety and will not be undone.
• rollback − A signal to specify that the transaction has been unsuccessful and so all
temporary changes in the database are undone. A committed transaction cannot be
rolled back.
• Consistency − A transaction should take the database from one consistent state to
another consistent state. It should not adversely affect any data item in the database.
Goal:
The goal of transaction management in a distributed database is to control the execution of
transactions so that:
1. Transactions have atomicity, durability, serializability and isolation properties.
• CPU and main memory utilization
• Control messages
• Response time
• Availability
Distributed Transactions
A distributed transaction is a database transaction in which two or more network hosts are
involved. Usually, hosts provide transactional resources, while the transaction manager is
responsible for creating and managing a global transaction that encompasses all operations
against such resources.
• After each slave has locally completed its transaction, it sends a “DONE” message to
the controlling site. When the controlling site has received “DONE” message from
all slaves, it sends a “Prepare” message to the slaves.
• The slaves vote on whether they still want to commit or not. If a slave wants to
commit, it sends a “Ready” message.
• A slave that does not want to commit sends a “Not Ready” message. This may
happen when the slave has conflicting concurrent transactions or there is a timeout.
• After the controlling site has received “Ready” message from all the slaves −
o The slaves apply the transaction and send a “Commit ACK” message to the
controlling site.
o When the controlling site receives “Commit ACK” message from all the
slaves, it considers the transaction as committed.
• After the controlling site has received the first “Not Ready” message from any slave −
o The slaves abort the transaction and send an “Abort ACK” message to the
controlling site.
o When the controlling site receives the “Abort ACK” message from all the slaves,
it considers the transaction as aborted.
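The voting step above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: messages are modelled as plain values, there are no timeouts or site failures, and the names Vote and run_commit_round are hypothetical, not part of any real system.

```python
from enum import Enum


class Vote(Enum):
    READY = "Ready"
    NOT_READY = "Not Ready"


def run_commit_round(votes):
    """Decide the outcome of one voting round at the controlling site.

    votes maps each slave name to the Vote it sent after "Prepare".
    Returns (decision, acks), where acks maps each slave to the
    acknowledgement it sends back to the controlling site.
    """
    if not votes:
        raise ValueError("at least one slave is required")
    # The transaction commits only if every slave voted "Ready".
    all_ready = all(vote is Vote.READY for vote in votes.values())
    decision = "committed" if all_ready else "aborted"
    ack = "Commit ACK" if all_ready else "Abort ACK"
    acks = {slave: ack for slave in votes}
    return decision, acks


if __name__ == "__main__":
    print(run_commit_round({"s1": Vote.READY, "s2": Vote.NOT_READY}))
```

A single “Not Ready” vote is enough to abort the whole transaction, which is why the controlling site can act on the first such message.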
Concurrency control for distributed Transactions
Locking-based concurrency control systems can use either one-phase or two-phase locking
protocols.
• Centralized
Figure 3.2 Centralized (sites 1 to 5; message phases: Prepare, Ready or Abort, Commit or Abort, ACK)
• Hierarchical
(Figure: sites 1 to 5; message phases: Prepare, Ready or Abort, Commit or Abort, ACK)
• Linear
(Figure: sites 1, 2, 3, 4 in a defined order; messages: Prepare or Ready, Commit or Abort)
• Distributed
(Figure: sites 1 to 5)
Concurrency Control
Concurrency control techniques ensure that transactions executing simultaneously still
maintain their ACID properties and that the resulting schedules are serializable.
• Serial Schedules − In a serial schedule, at any point of time, only one transaction is
active, i.e. there is no overlapping of transactions. This is depicted in the following
graph −
Conflicts in Schedules
In a schedule comprising multiple transactions, a conflict occurs when two active
transactions perform incompatible operations. Two operations are said to be in conflict
when all of the following three conditions hold simultaneously −
• At least one of the operations is a write_item() operation, i.e. it tries to modify the
data item.
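A conflict test between two operations can be sketched as follows. The notes above list only the write condition; the other two conditions used here (the operations belong to different transactions and access the same data item) are the usual textbook ones and are an assumption of this sketch. The names Op and in_conflict are hypothetical.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str   # transaction identifier, e.g. "T1"
    kind: str  # "read" or "write"
    item: str  # name of the data item accessed


def in_conflict(a: Op, b: Op) -> bool:
    """Return True when all three conflict conditions hold."""
    return (
        a.txn != b.txn                       # assumed: different transactions
        and a.item == b.item                 # assumed: same data item
        and "write" in (a.kind, b.kind)      # from the notes: at least one write
    )


if __name__ == "__main__":
    print(in_conflict(Op("T1", "read", "X"), Op("T2", "write", "X")))
```

Two reads of the same item never conflict, because neither of them modifies the item.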
Serializability
A serializable schedule of ‘n’ transactions is a parallel schedule which is equivalent to a
serial schedule comprising the same ‘n’ transactions. A serializable schedule retains the
correctness of a serial schedule while achieving the better CPU utilization of a parallel schedule.
Equivalence of Schedules
Equivalence of two schedules can be of the following types −
• Result equivalence − Two schedules producing identical results are said to be result
equivalent.
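• View equivalence − Two schedules are view equivalent if every read in one sees the
value produced by the same write (or the same initial value) as in the other, and the
final write on each data item is the same in both.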
Distributed deadlocks
Distributed deadlocks can occur in distributed systems when distributed transactions or
concurrency control is being used. Distributed deadlocks can be detected either by
constructing a global wait-for graph from local wait-for graphs at a deadlock detector or by
a distributed algorithm like edge chasing.
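The first detection approach, building a global wait-for graph from local ones, can be sketched as below. It assumes each site reports its local graph as a dictionary mapping a waiting transaction to the set of transactions it waits for; the function names are hypothetical, and edge chasing is not shown.

```python
def merge_wait_for_graphs(local_graphs):
    """Union several local wait-for graphs into one global graph."""
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


def has_deadlock(graph):
    """Return True if the wait-for graph contains a cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: a cycle of waiting transactions
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


if __name__ == "__main__":
    site_a = {"T1": {"T2"}}  # at site A, T1 waits for T2
    site_b = {"T2": {"T1"}}  # at site B, T2 waits for T1
    print(has_deadlock(site_a), has_deadlock(site_b))
    print(has_deadlock(merge_wait_for_graphs([site_a, site_b])))
```

In the usage example neither local graph contains a cycle, yet the merged graph does. A deadlock of this kind is invisible to any single site.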
Transaction processing in a distributed database system is also distributed, i.e. the same
transaction may be processed at more than one site. The two main deadlock handling
concerns in a distributed database system that are not present in a centralized system
are transaction location and transaction control. Once these concerns are addressed,
deadlocks are handled through any of deadlock prevention, deadlock avoidance or deadlock
detection and removal.
Transaction Location
Transactions in a distributed database system are processed in multiple sites and use data
items in multiple sites. The amount of data processing is not uniformly distributed among
these sites. The time period of processing also varies. Thus the same transaction may be
active at some sites and inactive at others. When two conflicting transactions are located at a
site, it may happen that one of them is in an inactive state. This condition does not arise in a
centralized system. This concern is called the transaction location issue.
This concern may be addressed by the Daisy Chain model. In this model, a transaction carries
certain details when it moves from one site to another. Some of the details are the list of
tables required, the list of sites required, the list of visited tables and sites, the list of tables
and sites that are yet to be visited and the list of acquired locks with types. After a
transaction terminates by either commit or abort, the information should be sent to all the
concerned sites.
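The details a transaction carries in the Daisy Chain model can be pictured as a small record. The sketch below is only one possible layout; the class name, field names and lock-type strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TravellingTransaction:
    txn_id: str
    sites_required: list
    tables_required: list
    sites_visited: list = field(default_factory=list)
    tables_visited: list = field(default_factory=list)
    # Acquired locks with their types, e.g. {"accounts": "exclusive"}.
    locks: dict = field(default_factory=dict)

    def sites_pending(self):
        """Sites that are required but not yet visited."""
        return [s for s in self.sites_required if s not in self.sites_visited]

    def tables_pending(self):
        """Tables that are required but not yet visited."""
        return [t for t in self.tables_required if t not in self.tables_visited]

    def move_to(self, site):
        """Record that the transaction has moved to the given site."""
        if site not in self.sites_required:
            raise ValueError(f"{site} is not required by {self.txn_id}")
        if site not in self.sites_visited:
            self.sites_visited.append(site)


if __name__ == "__main__":
    t = TravellingTransaction("T1", ["A", "B"], ["accounts"])
    t.move_to("A")
    print(t.sites_pending())
```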
Transaction Control
Transaction control is concerned with designating and controlling the sites required for
processing a transaction in a distributed database system. There are many options regarding
the choice of where to process the transaction and how to designate the center of control,
like −
The site where the transaction enters is designated as the controlling site. The controlling
site sends messages to the sites where the data items are located to lock the items. Then it
waits for confirmation. When all the sites have confirmed that they have locked the data
items, the transaction starts. If any site or communication link fails, the transaction has to wait
until it has been repaired.
• In case of site or link failure, a transaction may have to wait a long time until the sites
recover. Meanwhile, at the running sites, the items remain locked. This may prevent
other transactions from executing.
• If the controlling site fails, it cannot communicate with the other sites. These sites
continue to keep the locked data items in their locked state, thus resulting in
blocking.
• Distributed Wound-Die
• Distributed Wait-Wait
Alternatively, deadlock detection algorithms can use timers. Each transaction is associated
with a timer which is set to a time period in which a transaction is expected to finish. If a
transaction does not finish within this time period, the timer goes off, indicating a possible
deadlock.
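A timer-based check might look like the sketch below. It assumes the caller supplies the current time from some monotonic clock, and it only reports suspects: an expired timer indicates a possible deadlock, not a confirmed one. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TransactionTimer:
    txn_id: str
    started_at: float         # start time in seconds
    expected_duration: float  # time in which the transaction should finish

    def expired(self, now: float) -> bool:
        """True when the transaction has run longer than expected."""
        return now - self.started_at > self.expected_duration


def suspected_deadlocks(timers, now):
    """Return the ids of transactions whose timers have gone off."""
    return [t.txn_id for t in timers if t.expired(now)]


if __name__ == "__main__":
    timers = [TransactionTimer("T1", 0.0, 5.0), TransactionTimer("T2", 8.0, 5.0)]
    print(suspected_deadlocks(timers, now=10.0))
```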
Another tool used for deadlock handling is a deadlock detector. In a centralized system,
there is one deadlock detector. In a distributed system, there can be more than one deadlock
detector. A deadlock detector can find deadlocks for the sites under its control. There are
three alternatives for deadlock detection in a distributed system, namely −
• Distributed Deadlock Detector − All the sites participate in detecting deadlocks and
removing them.
A timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamps
are usually assigned in the order in which transactions are submitted to the system. The
timestamp of a transaction T is written TS(T).
Timestamp Ordering Protocol –
The main idea of this protocol is to order the transactions by their timestamps. A
schedule in which the transactions participate is then serializable, and the only equivalent
serial schedule permitted has the transactions in the order of their timestamp values. Put
simply, the schedule is equivalent to the particular serial order corresponding to the order of
the transaction timestamps. The algorithm must ensure that, for each item accessed
by conflicting operations in the schedule, the order in which the item is accessed does not
violate this ordering. To ensure this, it keeps two timestamp values for each database
item X: a read timestamp R_TS(X) and a write timestamp W_TS(X).
These algorithms ensure that transactions commit in the order dictated by their timestamps.
An older transaction should commit before a younger transaction, since the older transaction
enters the system before the younger one.
• Late Transaction Rule − If a younger transaction has written a data item, then an
older transaction is not allowed to read or write that data item. This rule prevents the
older transaction from committing after the younger transaction has already
committed.
• Younger Transaction Rule − A younger transaction can read or write a data item
that has already been written by an older transaction.
Basic Timestamp Ordering –
Every transaction is issued a timestamp based on when it enters the system. If an
old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned a timestamp TS(Tj)
such that TS(Ti) < TS(Tj). The protocol manages concurrent execution such that the
timestamps determine the serializability order. The timestamp ordering protocol ensures that
any conflicting read and write operations are executed in timestamp order. Whenever a
transaction T tries to issue an R_item(X) or a W_item(X), the Basic TO algorithm compares
the timestamp of T with R_TS(X) and W_TS(X) to ensure that the timestamp order is not
violated. The Basic TO protocol is described by the following two cases.
1. Whenever a transaction T issues a W_item(X) operation, check the following
conditions:
• If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T and reject
the operation; else,
• Execute the W_item(X) operation of T and set W_TS(X) to TS(T).
2. Whenever a transaction T issues an R_item(X) operation, check the following
conditions:
• If W_TS(X) > TS(T), then abort and roll back T and reject the operation; else,
• If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set
R_TS(X) to the larger of TS(T) and the current R_TS(X).
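The two cases can be written out as a minimal Python sketch. It assumes timestamps are positive integers and that R_TS(X) and W_TS(X) start at 0 for an item nobody has touched; the class and function names are hypothetical, and the rollback itself is represented only by raising an exception.

```python
class Aborted(Exception):
    """Raised when an operation arrives too late and its transaction must abort."""


class Item:
    def __init__(self, value=None):
        self.value = value
        self.r_ts = 0  # R_TS(X); 0 is an assumed initial value
        self.w_ts = 0  # W_TS(X); 0 is an assumed initial value


def write_item(item, ts, value):
    """Case 1: transaction with timestamp ts issues W_item(X)."""
    if item.r_ts > ts or item.w_ts > ts:
        raise Aborted(f"write by TS={ts} rejected")
    item.value = value
    item.w_ts = ts


def read_item(item, ts):
    """Case 2: transaction with timestamp ts issues R_item(X)."""
    if item.w_ts > ts:
        raise Aborted(f"read by TS={ts} rejected")
    item.r_ts = max(item.r_ts, ts)
    return item.value


if __name__ == "__main__":
    x = Item(10)
    print(read_item(x, 5))    # allowed: nobody younger has written X
    try:
        write_item(x, 3, 99)  # rejected: a younger transaction already read X
    except Aborted as exc:
        print(exc)
```

Note that a rejected operation never blocks. The transaction is aborted immediately, which is why no transaction ever waits under this protocol.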
Whenever the Basic TO algorithm detects two conflicting operations that occur in the incorrect
order, it rejects the later of the two operations by aborting the transaction that issued it.
Schedules produced by Basic TO are guaranteed to be conflict serializable. Because no
transaction ever waits under timestamp ordering, the schedule is also deadlock free.
One drawback of the Basic TO protocol is that cascading rollback is still possible. Suppose
we have transactions T1 and T2, and T2 has used a value written by T1. If T1 is aborted and
resubmitted to the system, then T2 must also be aborted and rolled back. So the problem of
cascading aborts remains.
To summarize the advantages and disadvantages of the Basic TO protocol:
• The timestamp ordering protocol ensures serializability.
• The timestamp protocol ensures freedom from deadlock, as no transaction ever waits.
• The schedule may not be cascade free, and may not even be recoverable.
Optimistic Concurrency Control Algorithm
In systems with low conflict rates, the task of validating every transaction for serializability
may lower performance. In these cases, the test for serializability is postponed to just before
commit. Since the conflict rate is low, the probability of aborting transactions which are not
serializable is also low. This approach is called optimistic concurrency control technique.
In this approach, a transaction’s life cycle is divided into the following three phases −
• Commit Phase − A transaction writes back modified data item in memory to the
disk.
Rule 1 − Given two transactions Ti and Tj, if Ti is reading the data item which Tj is writing,
then Ti’s execution phase cannot overlap with Tj’s commit phase. Tj can commit only after
Ti has finished execution.
Rule 2 − Given two transactions Ti and Tj, if Ti is writing the data item that Tj is reading,
then Ti’s commit phase cannot overlap with Tj’s execution phase. Tj can start executing only
after Ti has already committed.
Rule 3 − Given two transactions Ti and Tj, if Ti is writing the data item which Tj is also
writing, then Ti’s commit phase cannot overlap with Tj’s commit phase. Tj can start to
commit only after Ti has already committed.
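The three rules can be checked mechanically once each transaction's read set, write set and phase intervals are known. The sketch below assumes phases are half-open time intervals (start, end) and tests only the "cannot overlap" part of each rule, not the ordering of the phases; Txn, overlaps and violated_rules are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    read_set: frozenset
    write_set: frozenset
    execution: tuple  # (start, end) of the execution phase
    commit: tuple     # (start, end) of the commit phase


def overlaps(a, b):
    """True when half-open intervals a and b share any point in time."""
    return a[0] < b[1] and b[0] < a[1]


def violated_rules(ti, tj):
    """Return the numbers of the rules that the pair (Ti, Tj) breaks."""
    broken = []
    # Rule 1: Ti reads an item that Tj writes.
    if ti.read_set & tj.write_set and overlaps(ti.execution, tj.commit):
        broken.append(1)
    # Rule 2: Ti writes an item that Tj reads.
    if ti.write_set & tj.read_set and overlaps(ti.commit, tj.execution):
        broken.append(2)
    # Rule 3: Ti and Tj write the same item.
    if ti.write_set & tj.write_set and overlaps(ti.commit, tj.commit):
        broken.append(3)
    return broken


if __name__ == "__main__":
    t1 = Txn("T1", frozenset({"X"}), frozenset(), (0, 5), (6, 7))
    t2 = Txn("T2", frozenset(), frozenset({"X"}), (1, 3), (4, 6))
    print(violated_rules(t1, t2))
```

In the usage example T2 starts committing its write of X at time 4 while T1, which reads X, is still executing until time 5, so the pair breaks Rule 1.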