
Unit III

MANAGEMENT OF DISTRIBUTED TRANSACTIONS

Management of Distributed Transactions - Framework for Transaction Management - Supporting Atomicity of Distributed Transactions - Concurrency Control for Distributed Transactions - Architectural Aspects of Distributed Transactions - Concurrency Control - Foundation of Distributed Concurrency Control - Distributed Deadlocks - Concurrency Control based on Timestamps - Optimistic Methods for Distributed Concurrency Control

A transaction is a program comprising a collection of database operations that is executed as a single logical unit of data processing. The operations performed in a transaction include one or more database operations such as insert, delete, update or retrieve.
• read_item() − reads a data item from storage into main memory.

• modify_item() − changes the value of the item in main memory.

• write_item() − writes the modified value from main memory back to storage.

Transaction Operations
The low level operations performed in a transaction are −

• begin_transaction − A marker that specifies start of transaction execution.

• read_item or write_item − Database operations that may be interleaved with main-memory operations as part of the transaction.

• end_transaction − A marker that specifies end of transaction.

• commit − A signal to specify that the transaction has been successfully completed in
its entirety and will not be undone.
• rollback − A signal to specify that the transaction has been unsuccessful and so all
temporary changes in the database are undone. A committed transaction cannot be
rolled back.
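To make these operations concrete, here is a minimal Python sketch (not from the notes; SimpleDB and the funds-transfer example are assumed purely for illustration) showing how the low-level operations above frame a transaction over an in-memory store.

# Minimal sketch (assumed): an in-memory store supporting the low-level
# transaction operations listed above.

class SimpleDB:
    def __init__(self):
        self.storage = {"A": 100, "B": 50}   # stable storage
        self.buffer = {}                      # main-memory copies

    def begin_transaction(self):
        self.buffer = {}                      # marker: start of execution

    def read_item(self, name):
        # read the data item from storage into main memory
        self.buffer.setdefault(name, self.storage[name])
        return self.buffer[name]

    def modify_item(self, name, value):
        # change the value of the item in main memory only
        self.buffer[name] = value

    def commit(self):
        # write all modified values from main memory back to storage
        self.storage.update(self.buffer)
        self.buffer = {}

    def rollback(self):
        # discard temporary changes; storage is left untouched
        self.buffer = {}

db = SimpleDB()
db.begin_transaction()
a = db.read_item("A")
db.modify_item("A", a - 10)      # debit A
b = db.read_item("B")
db.modify_item("B", b + 10)      # credit B
db.commit()                      # end_transaction followed by commit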

Desirable Properties of Transactions


Any transaction must maintain the ACID properties, viz. Atomicity, Consistency, Isolation,
and Durability.

• Atomicity − This property states that a transaction is an atomic unit of processing, that is, it is either performed in its entirety or not performed at all. No partial update should exist.

• Consistency − A transaction should take the database from one consistent state to another consistent state. It should not adversely affect any data item in the database.

• Isolation − A transaction should be executed as if it were the only one in the system. There should not be any interference from other concurrently running transactions.

• Durability − If a committed transaction brings about a change, that change should be durable in the database and not lost in case of any failure.
States of a transaction
Active: Initial state and during the execution
Partially committed: After the final statement has been executed
Committed: After successful completion
Failed: After the discovery that normal execution can no longer proceed
Aborted: After the transaction has been rolled back and the database restored to its state prior to the start of the transaction. The transaction may then be restarted or killed.
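A minimal Python sketch (assumed, not part of the notes) of the state transitions listed above; the dictionary of allowed moves is illustrative only.

# Minimal sketch (assumed) of the transaction states and the legal
# transitions between them.

ALLOWED = {
    "active":              {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "committed":           set(),
    "aborted":             set(),   # a restarted transaction begins a new life as "active"
}

def transition(state, new_state):
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

s = "active"
s = transition(s, "partially_committed")   # final statement executed
s = transition(s, "committed")             # successful completion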

Goal:
The goal of transaction management in a distributed database is to control the execution of transactions so that:
1. Transactions have the atomicity, durability, serializability and isolation properties.
2. Their cost, in terms of CPU and main memory utilization, number of control messages, and response time, is minimized.
3. Their availability is maximized.

Distributed Transactions
A distributed transaction is a database transaction in which two or more network hosts are
involved. Usually, hosts provide transactional resources, while the transaction manager is
responsible for creating and managing a global transaction that encompasses all operations
against such resources.

Supporting Atomicity of Distributed Transactions


Logs:
A log contains information for undoing or redoing all actions which are performed by
transactions. The log record contains:
• Identifier of the transaction
• Identifier of the record
• Type of action (insert, delete, modify)
• Old record value
• New record value
• Auxiliary information for the recovery procedure
Recovery procedures:
When a failure occurs, a recovery procedure reads the log file and performs the following operations:
• Determine all non-committed transactions that have to be undone.
• Determine all transactions that need to be redone.
• Undo the transactions determined in step 1 and redo the transactions determined in step 2.
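The undo/redo procedure can be sketched in Python as follows, assuming a simplified log format (a list of dictionaries carrying the transaction id, record id, old value and new value); this is an illustration, not the notes' actual implementation.

# Minimal sketch (assumed) of the recovery procedure described above.
# Commit records mark the transactions that must be redone.

def recover(log, database):
    committed = {r["txn"] for r in log if r["type"] == "commit"}

    # Steps 1 and 2: classify transactions found in the log
    to_redo = committed
    to_undo = {r["txn"] for r in log if r["type"] == "update"} - committed

    # Step 3a: undo non-committed transactions, scanning the log backwards
    for r in reversed(log):
        if r["type"] == "update" and r["txn"] in to_undo:
            database[r["record"]] = r["old"]

    # Step 3b: redo committed transactions, scanning the log forwards
    for r in log:
        if r["type"] == "update" and r["txn"] in to_redo:
            database[r["record"]] = r["new"]

db = {"x": 0}
log = [
    {"type": "update", "txn": "T1", "record": "x", "old": 0, "new": 1},
    {"type": "commit", "txn": "T1"},
    {"type": "update", "txn": "T2", "record": "x", "old": 1, "new": 5},
    # T2 never committed, so its change is undone
]
recover(log, db)
print(db)   # {'x': 1}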

Recovery of distributed transactions


Each site has a local transaction manager (LTM) which is capable of implementing local transactions.

Figure 3.1 Reference Model of distributed transaction recovery

The relationship between distributed transaction management and local transaction management is represented in the reference model. At the bottom level we have the local transaction managers, which do not need to communicate with each other. The LTMs implement the interface local_begin, local_commit, and local_abort. At the next higher level we have the distributed transaction manager (DTM). The DTM is by its nature a distributed level; it is implemented by a set of local DTM agents which exchange messages among themselves. The DTM implements the interface begin_transaction, commit, abort, and create. At the highest level we have the distributed transaction itself, constituted by the root agent and the other agents.

The 2-phase commit protocol


Distributed two-phase commit reduces the vulnerability of one-phase commit protocols. The
steps performed in the two phases are as follows −

Phase 1: Prepare Phase

• After each slave has locally completed its transaction, it sends a “DONE” message to
the controlling site. When the controlling site has received a “DONE” message from all the slaves, it sends a “Prepare” message to the slaves.

• The slaves vote on whether they still want to commit or not. If a slave wants to
commit, it sends a “Ready” message.

• A slave that does not want to commit sends a “Not Ready” message. This may
happen when the slave has conflicting concurrent transactions or there is a timeout.

Phase 2: Commit/Abort Phase

• After the controlling site has received a “Ready” message from all the slaves −

o The controlling site sends a “Global Commit” message to the slaves.

o The slaves apply the transaction and send a “Commit ACK” message to the controlling site.

o When the controlling site receives a “Commit ACK” message from all the slaves, it considers the transaction as committed.

• After the controlling site has received the first “Not Ready” message from any slave −

o The controlling site sends a “Global Abort” message to the slaves.

o The slaves abort the transaction and send an “Abort ACK” message to the controlling site.

o When the controlling site receives an “Abort ACK” message from all the slaves, it considers the transaction as aborted.
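The controlling site's decision logic in the two phases can be sketched as follows; the Slave class, its method names and the vote strings are hypothetical, and a real coordinator would also add timeouts and write its decisions to a log.

# Minimal sketch (assumed) of the controlling site's logic in the
# two-phase commit protocol described above.

def two_phase_commit(slaves):
    # Phase 1: Prepare - collect votes after all slaves have reported DONE
    votes = [slave.prepare() for slave in slaves]   # "ready" or "not ready"

    if all(v == "ready" for v in votes):
        # Phase 2: Global Commit
        for slave in slaves:
            slave.commit()            # slave replies with Commit ACK
        return "committed"
    else:
        # Phase 2: Global Abort - triggered by the first "not ready" vote
        for slave in slaves:
            slave.abort()             # slave replies with Abort ACK
        return "aborted"

class Slave:
    def __init__(self, vote): self.vote = vote
    def prepare(self): return self.vote
    def commit(self): print("commit applied")
    def abort(self): print("transaction aborted")

print(two_phase_commit([Slave("ready"), Slave("ready")]))       # committed
print(two_phase_commit([Slave("ready"), Slave("not ready")]))   # aborted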
Concurrency Control for Distributed Transactions

Locking Based Concurrency Control Protocols


Locking-based concurrency control protocols use the concept of locking data items.
A lock is a variable associated with a data item that determines whether read/write
operations can be performed on that data item. Generally, a lock compatibility matrix is used
which states whether a data item can be locked by two transactions at the same time.

Locking-based concurrency control systems can use either one-phase or two-phase locking
protocols.

One-phase Locking Protocol


In this method, each transaction locks an item before use and releases the lock as soon as it
has finished using it. This locking method provides for maximum concurrency but does not
always enforce serializability.

Two-phase Locking Protocol


In this method, all locking operations precede the first lock-release or unlock operation. The transaction comprises two phases. In the first phase, the transaction acquires all the locks it needs and does not release any lock. This is called the expanding or growing phase. In the second phase, the transaction releases its locks and cannot request any new locks. This is called the shrinking phase.

Every transaction that follows the two-phase locking protocol is guaranteed to be serializable. However, this approach provides low parallelism between two conflicting transactions.
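A minimal sketch of the growing and shrinking phases, assuming a single shared lock table and exclusive locks only (no lock compatibility matrix); the class and method names are illustrative, not the notes' own.

# Minimal sketch (assumed) of two-phase locking: all lock acquisitions
# happen in the growing phase, all releases in the shrinking phase, and no
# lock may be requested once the first lock has been released.

class TwoPhaseLockingTxn:
    def __init__(self, lock_table, txn_id):
        self.locks = lock_table        # shared dict: item -> holder
        self.txn_id = txn_id
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("cannot acquire locks in the shrinking phase")
        holder = self.locks.get(item)
        if holder not in (None, self.txn_id):
            raise RuntimeError(f"{item} is locked by {holder}; must wait")
        self.locks[item] = self.txn_id
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True          # the first release starts the shrinking phase
        self.held.discard(item)
        self.locks.pop(item, None)

lock_table = {}
t1 = TwoPhaseLockingTxn(lock_table, "T1")
t1.lock("A"); t1.lock("B")             # growing phase
t1.unlock("A"); t1.unlock("B")         # shrinking phase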

Architectural Aspects of Distributed Transactions

• Structure of the computation

• Communication of a distributed transaction

• Sessions and datagrams:
The communication between processes or servers can be performed through sessions or datagrams. Sessions have a basic advantage: the authentication and identification functions need to be performed only once, and then messages can be exchanged without repeating these operations.

Communication structure for commit protocols

• Centralized

(Message rounds: Prepare, Ready or Abort, Commit or Abort, ACK, exchanged between the coordinator and each participant.)

Figure 3.2 Centralized

• Hierarchical

(Message rounds: Prepare, Ready or Abort, Commit or Abort, ACK, propagated through the intermediate nodes of the tree.)

Figure 3.3 Hierarchical

Figure 3.4 Hierarchical

• Linear
(Prepare or Ready messages flow along the chain of sites 1 -> 2 -> 3 -> 4; Commit or Abort messages flow back along the same chain.)

Figure 3.5 Linear

An ordering is defined among the sites.

• Distributed
(Message rounds: Prepare, Ready or Abort, sent by every participant to all the others; no further messages are required for the decision.)

Figure 3.6 Distributed

Concurrency Control
Concurrency controlling techniques ensure that multiple transactions are executed
simultaneously while maintaining the ACID properties of the transactions and serializability
in the schedules.

Serializability in distributed database


In a system with a number of simultaneous transactions, a schedule is the total order of execution of operations. Given a schedule S comprising n transactions, say T1, T2, T3, …, Tn, for any transaction Ti the operations in Ti must execute as laid down in the schedule S.
Types of Schedules
There are two types of schedules −

• Serial Schedules − In a serial schedule, at any point of time, only one transaction is
active, i.e. there is no overlapping of transactions. This is depicted in the following
graph −

Figure 3.7 Serial Schedules


• Parallel Schedules − In parallel schedules, more than one transaction is active simultaneously, i.e. the transactions contain operations that overlap in time. This is depicted in the following graph −

Figure 3.8 Parallel Schedules

Conflicts in Schedules
In a schedule comprising multiple transactions, a conflict occurs when two active transactions perform non-compatible operations. Two operations are said to be in conflict when all of the following three conditions exist simultaneously −

• The two operations are parts of different transactions.

• Both the operations access the same data item.

• At least one of the operations is a write_item() operation, i.e. it tries to modify the
data item.

Serializability
A serializable schedule of ‘n’ transactions is a parallel schedule that is equivalent to a serial schedule comprising the same ‘n’ transactions. A serializable schedule combines the correctness of a serial schedule with the better CPU utilization of a parallel schedule.

Equivalence of Schedules
Equivalence of two schedules can be of the following types −

• Result equivalence − Two schedules producing identical results are said to be result
equivalent.

• View equivalence − Two schedules that perform similar action in a similar manner
are said to be view equivalent.

• Conflict equivalence − Two schedules are said to be conflict equivalent if both contain the same set of transactions and have the same order of conflicting pairs of operations.
Serial schedules have low resource utilization and low throughput. To improve this, two or more transactions are run concurrently. But concurrency of transactions may lead to inconsistency in the database. To avoid this, we need to check whether these concurrent schedules are serializable or not.
Conflict Serializable: A schedule is called conflict serializable if it can be transformed into a
serial schedule by swapping non-conflicting operations.
Conflicting operations: Two operations are said to be conflicting if all of the following conditions are satisfied:
• They belong to different transactions
• They operate on the same data item
• At least one of them is a write operation
Example:
• The pair (R1(A), W2(A)) is conflicting because the operations belong to two different transactions, act on the same data item A, and one of them is a write operation.
• Similarly, (W1(A), W2(A)) and (W1(A), R2(A)) are also conflicting pairs.
• On the other hand, (R1(A), W2(B)) is a non-conflicting pair because the operations act on different data items.
• Similarly, (W1(A), W2(B)) is a non-conflicting pair.
Consider the following schedule:
S1: R1(A), W1(A), R2(A), W2(A), R1(B), W1(B), R2(B), W2(B)
If Oi and Oj are two operations in a transaction and Oi < Oj (Oi is executed before Oj), the same order must also hold in the schedule. Using this property, we can extract the two transactions of schedule S1 as:
T1: R1(A), W1(A), R1(B), W1(B)
T2: R2(A), W2(A), R2(B), W2(B)
Possible Serial Schedules are: T1->T2 or T2->T1
-> Swapping non-conflicting operations R2(A) and R1(B) in S1, the schedule becomes,
S11: R1(A), W1(A), R1(B), W2(A), R2(A), W1(B), R2(B), W2(B)
-> Similarly, swapping non-conflicting operations W2(A) and W1(B) in S11, the
schedule becomes,
S12: R1(A), W1(A), R1(B), W1(B), R2(A), W2(A), R2(B), W2(B)
S12 is a serial schedule in which all operations of T1 are performed before starting any operation of T2. Since S1 has been transformed into the serial schedule S12 by swapping non-conflicting operations, S1 is conflict serializable.
Let us take another Schedule:
S2: R2(A), W2(A), R1(A), W1(A), R1(B), W1(B), R2(B), W2(B)
Two transactions will be:
T1: R1(A), W1(A), R1(B), W1(B)
T2: R2(A), W2(A), R2(B), W2(B)
Possible Serial Schedules are: T1->T2 or T2->T1
Original Schedule is:
S2: R2(A), W2(A), R1(A), W1(A), R1(B), W1(B), R2(B), W2(B)
Swapping non-conflicting operations R1(A) and R2(B) in S2, the schedule becomes,
S21: R2(A), W2(A), R2(B), W1(A), R1(B), W1(B), R1(A), W2(B)
Similarly, swapping non-conflicting operations W1(A) and W2(B) in S21, the schedule
becomes,
S22: R2(A), W2(A), R2(B), W2(B), R1(B), W1(B), R1(A), W1(A)
In schedule S22, all operations of T2 are performed first, but operations of T1 are not in
order (order should be R1(A), W1(A), R1(B), W1(B)). So S2 is not conflict serializable.
Conflict Equivalent: Two schedules are said to be conflict equivalent when one can be
transformed to another by swapping non-conflicting operations. In the example discussed
above, S11 is conflict equivalent to S1 (S1 can be converted to S11 by swapping non-
conflicting operations). Similarly, S11 is conflict equivalent to S12 and so on.
Note 1: Although S2 is not conflict serializable, it is still conflict equivalent to S21 and S22, because S2 can be converted to S21 and S22 by swapping non-conflicting operations.
Note 2: A schedule that is conflict serializable is always conflict equivalent to one of the serial schedules. Schedule S1 discussed above (which is conflict serializable) is equivalent to the serial schedule (T1->T2).
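The conflict-serializability test illustrated above can also be automated with the standard precedence (serialization) graph: add an edge Ti -> Tj for every conflicting pair in which Ti's operation appears first, then check the graph for a cycle. The Python sketch below is an assumed illustration, not part of the original notes.

# Minimal sketch (assumed) of the precedence-graph test for conflict
# serializability.

def conflict_serializable(schedule):
    # schedule: list of (txn, op, item), e.g. ("T1", "R", "A")
    edges = set()
    for i, (ti, oi, xi) in enumerate(schedule):
        for tj, oj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (oi, oj):
                edges.add((ti, tj))        # Ti's conflicting operation comes first

    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    # cycle detection by depth-first search
    def has_cycle(node, path):
        if node in path:
            return True
        return any(has_cycle(n, path | {node}) for n in graph.get(node, ()))

    return not any(has_cycle(n, set()) for n in graph)

S1 = [("T1","R","A"),("T1","W","A"),("T2","R","A"),("T2","W","A"),
      ("T1","R","B"),("T1","W","B"),("T2","R","B"),("T2","W","B")]
S2 = [("T2","R","A"),("T2","W","A"),("T1","R","A"),("T1","W","A"),
      ("T1","R","B"),("T1","W","B"),("T2","R","B"),("T2","W","B")]
print(conflict_serializable(S1))   # True  (equivalent to T1 -> T2)
print(conflict_serializable(S2))   # False (cycle between T1 and T2)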

Distributed deadlocks
Distributed deadlocks can occur in distributed systems when distributed transactions or concurrency control is being used. Distributed deadlocks can be detected either by constructing a global wait-for graph from local wait-for graphs at a deadlock detector or by a distributed algorithm like edge chasing.
Transaction processing in a distributed database system is also distributed, i.e. the same
transaction may be processing at more than one site. The two main deadlock handling
concerns in a distributed database system that are not present in a centralized system
are transaction location and transaction control. Once these concerns are addressed,
deadlocks are handled through any of deadlock prevention, deadlock avoidance or deadlock
detection and removal.

Transaction Location
Transactions in a distributed database system are processed in multiple sites and use data
items in multiple sites. The amount of data processing is not uniformly distributed among
these sites. The time period of processing also varies. Thus the same transaction may be
active at some sites and inactive at others. When two conflicting transactions are located in a
site, it may happen that one of them is in inactive state. This condition does not arise in a
centralized system. This concern is called transaction location issue.

This concern may be addressed by Daisy Chain model. In this model, a transaction carries
certain details when it moves from one site to another. Some of the details are the list of
tables required, the list of sites required, the list of visited tables and sites, the list of tables
and sites that are yet to be visited and the list of acquired locks with types. After a
transaction terminates by either commit or abort, the information should be sent to all the
concerned sites.

Transaction Control
Transaction control is concerned with designating and controlling the sites required for
processing a transaction in a distributed database system. There are many options regarding
the choice of where to process the transaction and how to designate the center of control,
like −

• One server may be selected as the center of control.


• The center of control may travel from one server to another.
• The responsibility of controlling may be shared by a number of servers.
Distributed Deadlock Prevention
Just like in centralized deadlock prevention, in distributed deadlock prevention approach, a
transaction should acquire all the locks before starting to execute. This prevents deadlocks.

The site where the transaction enters is designated as the controlling site. The controlling
site sends messages to the sites where the data items are located to lock the items. Then it
waits for confirmation. When all the sites have confirmed that they have locked the data
items, transaction starts. If any site or communication link fails, the transaction has to wait
until they have been repaired.
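A minimal sketch of this pre-acquisition scheme, with a hypothetical Site class standing in for the remote lock managers; a real system would send these lock requests as messages over the network and handle timeouts.

# Minimal sketch (assumed): the controlling site asks every data site to
# lock the needed items and starts the transaction only after all sites
# have confirmed.

class Site:
    def __init__(self):
        self.locked = set()
    def lock_all(self, items):
        if self.locked & set(items):
            return False                  # some item is already locked; no confirmation
        self.locked |= set(items)
        return True
    def unlock_all(self, items):
        self.locked -= set(items)

def preacquire_and_run(items_by_site, run_transaction):
    confirmations = {site: site.lock_all(items) for site, items in items_by_site.items()}
    if all(confirmations.values()):
        run_transaction()                 # every item is locked up front, so no deadlock
        for site, items in items_by_site.items():
            site.unlock_all(items)
        return "committed"
    return "waiting until the failed sites or links recover"

s1, s2 = Site(), Site()
print(preacquire_and_run({s1: ["A"], s2: ["B"]}, lambda: None))   # committed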

Though the implementation is simple, this approach has some drawbacks −

• Pre-acquisition of locks requires a long time because of communication delays. This increases the time required for the transaction.

• In case of site or link failure, a transaction has to wait for a long time so that the sites
recover. Meanwhile, in the running sites, the items are locked. This may prevent
other transactions from executing.

• If the controlling site fails, it cannot communicate with the other sites. These sites
continue to keep the locked data items in their locked state, thus resulting in
blocking.

Distributed Deadlock Avoidance


As in a centralized system, distributed deadlock avoidance handles deadlocks prior to their occurrence. Additionally, in distributed systems, the transaction location and transaction control issues need to be addressed. Due to the distributed nature of the transaction, the following conflicts may occur −

• Conflict between two transactions in the same site.


• Conflict between two transactions in different sites.
In case of conflict, one of the transactions may be aborted or allowed to wait as per
distributed wait-die or distributed wound-wait algorithms.
Let us assume that there are two transactions, T1 and T2. T1 arrives at Site P and tries to
lock a data item which is already locked by T2 at that site. Hence, there is a conflict at Site
P. The algorithms are as follows −

• Distributed Wait-Die

o If T1 is older than T2, T1 is allowed to wait. T1 can resume execution after


Site P receives a message that T2 has either committed or aborted
successfully at all sites.

o If T1 is younger than T2, T1 is aborted. The concurrency control at Site P


sends a message to all sites where T1 has visited to abort T1. The controlling
site notifies the user when T1 has been successfully aborted in all the sites.

• Distributed Wound-Wait

o If T1 is older than T2, T2 needs to be aborted. If T2 is active at Site P, Site P


aborts and rolls back T2 and then broadcasts this message to other relevant
sites. If T2 has left Site P but is active at Site Q, Site P broadcasts that T2 has been aborted; Site Q then aborts and rolls back T2 and sends this message to all sites.

o If T1 is younger than T2, T1 is allowed to wait. T1 can resume execution after


Site P receives a message that T2 has completed processing.
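The two rules can be condensed into the following decision functions (an assumed Python sketch, not part of the notes; a smaller timestamp means an older transaction).

# Minimal sketch (assumed) of the timestamp-based choices described above,
# applied when the requesting transaction conflicts with the lock holder.

def wait_die(ts_requesting, ts_holding):
    # older requester waits; younger requester dies (is aborted)
    return "wait" if ts_requesting < ts_holding else "abort requester"

def wound_wait(ts_requesting, ts_holding):
    # older requester wounds (aborts) the holder; younger requester waits
    return "abort holder" if ts_requesting < ts_holding else "wait"

# T1 (older, ts=5) requests an item held by T2 (younger, ts=9) at Site P
print(wait_die(5, 9))     # "wait": T1 waits for T2 to finish
print(wound_wait(5, 9))   # "abort holder": T2 is aborted and rolled back at all sites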

Distributed Deadlock Detection


Just like the centralized deadlock detection approach, deadlocks are allowed to occur and are removed if detected. The system does not perform any checks when a transaction places a lock request. For implementation, global wait-for graphs are created. The existence of a cycle in the global wait-for graph indicates a deadlock. However, it is difficult to spot deadlocks since transactions wait for resources across the network.

Alternatively, deadlock detection algorithms can use timers. Each transaction is associated
with a timer which is set to a time period in which a transaction is expected to finish. If a
transaction does not finish within this time period, the timer goes off, indicating a possible
deadlock.

Another tool used for deadlock handling is a deadlock detector. In a centralized system, there is one deadlock detector. In a distributed system, there can be more than one deadlock detector. A deadlock detector can find deadlocks for the sites under its control. There are three alternatives for deadlock detection in a distributed system, namely −

• Centralized Deadlock Detector − One site is designated as the central deadlock


detector.
• Hierarchical Deadlock Detector − A number of deadlock detectors are arranged in
hierarchy.

• Distributed Deadlock Detector − All the sites participate in detecting deadlocks and
removing them.
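A minimal sketch of the centralized alternative: each site ships its local wait-for graph to the designated detector, which merges them into the global wait-for graph and searches for a cycle. The data layout (edge sets per site) is assumed for illustration.

# Minimal sketch (assumed) of centralized deadlock detection over a
# global wait-for graph merged from local wait-for graphs.

def build_global_wfg(local_wfgs):
    # each local graph is a set of (waiting_txn, holding_txn) edges
    global_wfg = {}
    for edges in local_wfgs:
        for waiter, holder in edges:
            global_wfg.setdefault(waiter, set()).add(holder)
    return global_wfg

def find_deadlock(wfg):
    def cycle(node, path):
        if node in path:
            return True
        return any(cycle(n, path | {node}) for n in wfg.get(node, ()))
    return any(cycle(t, set()) for t in wfg)

site1 = {("T1", "T2")}          # at site 1, T1 waits for T2
site2 = {("T2", "T1")}          # at site 2, T2 waits for T1
print(find_deadlock(build_global_wfg([site1, site2])))   # True: distributed deadlock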

Time and time stamps in a distributed database

A timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamps are usually assigned in the order in which the transactions are submitted to the system. The timestamp of a transaction T is denoted TS(T).
Timestamp Ordering Protocol –
The main idea for this protocol is to order the transactions based on their Timestamps. A
schedule in which the transactions participate is then serializable and the only equivalent
serial schedule permitted has the transactions in the order of their timestamp values. Stated simply, the schedule is equivalent to the particular serial order corresponding to the order of the transaction timestamps. The algorithm must ensure that, for each item accessed by conflicting operations in the schedule, the order in which the item is accessed does not violate the serializability order. To ensure this, two timestamp values are maintained for each database item X.

• W_TS(X) is the largest timestamp of any transaction that executed write(X) successfully.

• R_TS(X) is the largest timestamp of any transaction that executed read(X) successfully.

Timestamp Concurrency Control Algorithms


Timestamp-based concurrency control algorithms use a transaction’s timestamp to
coordinate concurrent access to a data item to ensure serializability. A timestamp is a unique
identifier given by DBMS to a transaction that represents the transaction’s start time.

These algorithms ensure that transactions commit in the order dictated by their timestamps.
An older transaction should commit before a younger transaction, since the older transaction
enters the system before the younger one.

Timestamp-based concurrency control techniques generate serializable schedules such that


the equivalent serial schedule is arranged in order of the age of the participating transactions.

Some of timestamp based concurrency control algorithms are −

• Basic timestamp ordering algorithm.


• Conservative timestamp ordering algorithm.
• Multiversion algorithm based upon timestamp ordering.
Timestamp-based ordering follows three rules to enforce serializability −
• Access Rule − When two transactions try to access the same data item
simultaneously, for conflicting operations, priority is given to the older transaction.
This causes the younger transaction to wait for the older transaction to commit first.

• Late Transaction Rule − If a younger transaction has written a data item, then an
older transaction is not allowed to read or write that data item. This rule prevents the
older transaction from committing after the younger transaction has already
committed.

• Younger Transaction Rule − A younger transaction can read or write a data item
that has already been written by an older transaction.
Basic Timestamp Ordering –
Every transaction is issued a timestamp based on when it enters the system. Suppose an old transaction Ti has timestamp TS(Ti); a new transaction Tj is assigned a timestamp TS(Tj) such that TS(Ti) < TS(Tj). The protocol manages concurrent execution such that the timestamps determine the serializability order. The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. Whenever a transaction T tries to issue a R_item(X) or a W_item(X), the Basic TO algorithm compares the timestamp of T with R_TS(X) and W_TS(X) to ensure that the timestamp order is not violated. The Basic TO protocol is described by the following two cases.
1. Whenever a transaction T issues a W_item(X) operation, check the following conditions:
• If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T and reject the operation; else
• Execute the W_item(X) operation of T and set W_TS(X) to TS(T).
2. Whenever a transaction T issues a R_item(X) operation, check the following conditions:
• If W_TS(X) > TS(T), then abort and roll back T and reject the operation; else
• If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set R_TS(X) to the larger of TS(T) and the current R_TS(X).
Whenever the Basic TO algorithm detects two conflicting operations that occur in the incorrect order, it rejects the later of the two operations by aborting the transaction that issued it. Schedules produced by Basic TO are guaranteed to be conflict serializable. As already discussed, using timestamps ensures that the schedule will be deadlock free. One drawback of the Basic TO protocol is that cascading rollback is still possible. Suppose we have transactions T1 and T2, and T2 has used a value written by T1. If T1 is aborted and resubmitted to the system, then T2 must also be aborted and rolled back. So the problem of cascading aborts still prevails.
To summarize the advantages and disadvantages of the Basic TO protocol:
• The timestamp ordering protocol ensures serializability.
• The timestamp protocol ensures freedom from deadlock, as no transaction ever waits.
• But the schedule may not be cascadeless, and may not even be recoverable.
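The two cases of the Basic TO checks can be sketched as follows, keeping R_TS(X) and W_TS(X) in dictionaries; this is an assumed illustration of the rules above, not production code.

# Minimal sketch (assumed) of the Basic TO checks: reject any operation
# that arrives out of timestamp order.

read_ts, write_ts = {}, {}      # R_TS(X) and W_TS(X), defaulting to 0

def write_item(ts, x):
    if read_ts.get(x, 0) > ts or write_ts.get(x, 0) > ts:
        return "abort and roll back the transaction"   # a younger txn already accessed X
    write_ts[x] = ts
    return "write executed"

def read_item(ts, x):
    if write_ts.get(x, 0) > ts:
        return "abort and roll back the transaction"   # a younger txn already wrote X
    read_ts[x] = max(read_ts.get(x, 0), ts)
    return "read executed"

print(write_item(10, "X"))   # write executed, W_TS(X) = 10
print(read_item(5, "X"))     # abort: a transaction with timestamp 10 already wrote X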
Optimistic Concurrency Control Algorithm
In systems with low conflict rates, the task of validating every transaction for serializability
may lower performance. In these cases, the test for serializability is postponed to just before
commit. Since the conflict rate is low, the probability of aborting transactions which are not
serializable is also low. This approach is called optimistic concurrency control technique.

In this approach, a transaction’s life cycle is divided into the following three phases −

• Execution Phase − A transaction fetches data items into memory and performs operations upon them.

• Validation Phase − A transaction performs checks to ensure that committing its changes to the database passes the serializability test.

• Commit Phase − A transaction writes the modified data items in memory back to the disk.

This algorithm uses three rules to enforce serializability in the validation phase −

Rule 1 − Given two transactions Ti and Tj, if Ti is reading the data item which Tj is writing,
then Ti’s execution phase cannot overlap with Tj’s commit phase. Tj can commit only after
Ti has finished execution.

Rule 2 − Given two transactions Ti and Tj, if Ti is writing the data item that Tj is reading,
then Ti’s commit phase cannot overlap with Tj’s execution phase. Tj can start executing only
after Ti has already committed.

Rule 3 − Given two transactions Ti and Tj, if Ti is writing the data item which Tj is also
writing, then Ti’s commit phase cannot overlap with Tj’s commit phase. Tj can start to
commit only after Ti has already committed.
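A minimal sketch of these validation rules, assuming each transaction records its read set, write set, and logical start/end times of its execution and commit phases, and that validation happens immediately before the commit phase starts; all names are illustrative, not part of the original notes.

# Minimal sketch (assumed) of the three validation rules above.

class Txn:
    def __init__(self, read_set, write_set, exec_start, exec_end, commit_end=None):
        self.read_set, self.write_set = set(read_set), set(write_set)
        self.exec_start, self.exec_end = exec_start, exec_end
        self.commit_end = commit_end          # set once the commit phase finishes

def validate(tj, committed):
    tj_commit_start = tj.exec_end             # validation happens just before the commit phase
    for ti in committed:
        # Rule 1: Ti read what Tj writes -> Tj may commit only after Ti finished executing
        if ti.read_set & tj.write_set and not tj_commit_start > ti.exec_end:
            return False
        # Rule 2: Ti wrote what Tj reads -> Tj may start executing only after Ti committed
        if ti.write_set & tj.read_set and not tj.exec_start > ti.commit_end:
            return False
        # Rule 3: both write the same item -> Tj may start committing only after Ti committed
        if ti.write_set & tj.write_set and not tj_commit_start > ti.commit_end:
            return False
    return True

t1 = Txn(read_set={"A"}, write_set={"A"}, exec_start=1, exec_end=3, commit_end=4)
t2 = Txn(read_set={"A"}, write_set={"B"}, exec_start=5, exec_end=7)
print(validate(t2, [t1]))   # True: T1 committed before T2's conflicting phases began
t3 = Txn(read_set={"A"}, write_set={"A"}, exec_start=2, exec_end=8)
print(validate(t3, [t1]))   # False: T3 read an item T1 wrote but started before T1 committed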
