Concurrency control
Concurrency control in Database Management Systems (DBMS) refers to the techniques used to manage
simultaneous access to data by multiple users or transactions, ensuring that transactions operate correctly
and consistently in a multi-user environment. Concurrency control prevents problems such as lost
updates, uncommitted data, and inconsistent reads that can occur when multiple transactions are
executed concurrently.
Transaction
In the context of databases, a transaction is a unit of work performed against a database
management system (DBMS). It's a logical unit of work that must be either entirely completed or
aborted (rolled back), ensuring data integrity and consistency.
Properties of a Transaction
In a database management system (DBMS), a transaction is a logical unit of work that is performed as a
single, indivisible operation. Transactions must exhibit four key properties, often abbreviated as ACID:
1. Atomicity: This property ensures that either all operations of a transaction are completed
successfully, or none of them are. If any part of a transaction fails, the entire transaction is
aborted, and the database is left unchanged (i.e., in the state it was in before the transaction
started).
2. Consistency: This property ensures that a transaction can only bring the database from one valid
state to another. It guarantees that integrity constraints, such as foreign key constraints or
uniqueness constraints, are not violated during the execution of a transaction.
3. Isolation: This property ensures that the intermediate state of a transaction is not visible to other
transactions until the transaction is committed. This prevents interference between concurrent
transactions, ensuring that each transaction is executed as if it were the only transaction running
in the system.
4. Durability: This property ensures that once a transaction has been committed, its effects persist
even in the event of a system failure. The changes made by the transaction are recorded in non-
volatile storage (such as a disk) and are not lost.
These properties are crucial for ensuring the reliability and integrity of data in a DBMS, especially in multi-
user environments where multiple transactions may be executed concurrently.
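As a quick illustration of atomicity, the sketch below uses Python's built-in sqlite3 module; the accounts table, the balances, and the simulated failure are invented for the example, not taken from the text.

```python
# Sketch: atomicity with Python's built-in sqlite3 module.
# The table, rows, and "crash" are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    # A transfer is one logical unit of work: both updates happen, or neither.
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
    raise RuntimeError("simulated crash mid-transaction")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
    conn.commit()
except RuntimeError:
    conn.rollback()  # atomicity: the partial debit is undone

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # both rows unchanged: {'alice': 100, 'bob': 50}
```

Because the failure occurred before commit, the rollback restores the state the database was in before the transaction started.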
Concurrency Control Problems
Concurrency control is a crucial aspect of database management systems (DBMS) to ensure the
consistency and integrity of data in a multi-user environment. However, it introduces several challenges,
known as concurrency control problems. Here are some common problems and their explanations:
1. Lost Updates: This problem occurs when two or more transactions read the same data item and
then update it independently, each based on the value it originally read. The transaction that
writes last overwrites the earlier write, so the earlier transaction's update is lost.
2. Uncommitted Dependency (Dirty Read): A transaction reads a data item that has been modified
by another transaction but not yet committed. If the other transaction is rolled back, the data
read by the first transaction becomes invalid.
3. Inconsistent Analysis (Non-repeatable Read): A transaction reads the same data item multiple
times but obtains different results because another transaction has modified the data item in
between the reads.
4. Phantom Reads: A transaction reads a set of records that satisfy a certain condition, but when it
tries to read the same records again, it finds that there are new records that satisfy the condition.
This occurs due to the insertion or deletion of records by other transactions.
5. Concurrency Control Overhead: Implementing concurrency control mechanisms such as locking
or timestamping adds overhead to the system, affecting its performance.
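To make the first of these problems concrete, here is a deterministic Python sketch of a lost update; the balance variable and the two interleaved "transactions" are illustrative assumptions.

```python
# Deterministic illustration of the lost-update problem.
# Two "transactions" both read balance = 100, then each writes back a result
# computed from its stale local copy, so one update is silently overwritten.

db = {"balance": 100}

# Step 1: both transactions read the same committed value.
t1_value = db["balance"]          # T1 reads 100
t2_value = db["balance"]          # T2 reads 100

# Step 2: each computes its update from the value it read.
db["balance"] = t1_value + 50     # T1 writes 150
db["balance"] = t2_value - 30     # T2 writes 70, overwriting T1's update

print(db["balance"])  # 70, not the expected 120: T1's +50 is lost
```

With proper concurrency control, T2 would have been forced to read the value after T1's update, yielding 120.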
Concurrency Control Protocols
Concurrency control protocols are the sets of rules maintained in order to solve the concurrency
control problems in the database. They ensure that concurrent transactions can execute properly while
maintaining database consistency. Via the concurrency control protocols, the concurrent execution of
transactions is provided with atomicity, consistency, isolation, durability, and serializability.
Lock-based concurrency control protocol
Timestamp-based concurrency control protocol
Lock-based Protocol
In a lock-based protocol, each transaction must acquire locks before it starts accessing or modifying
data items. There are two types of locks used in databases.
Shared Lock : A shared lock, also known as a read lock, allows multiple transactions to read a
data item simultaneously. A transaction holding a shared lock can only read the data item; it
cannot modify it.
Exclusive Lock : An exclusive lock, also known as a write lock, allows a transaction to update a
data item. Only one transaction can hold an exclusive lock on a data item at a time. While a
transaction holds an exclusive lock on a data item, no other transaction is allowed to acquire a
shared or exclusive lock on the same data item.
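The behaviour of shared and exclusive locks can be sketched as a small readers-writer lock in Python; this is a minimal illustration, not a production lock manager (a real DBMS would also handle deadlock detection, lock upgrades, and fairness).

```python
# Minimal shared/exclusive (readers-writer) lock sketch using threading.Condition.
import threading

class SharedExclusiveLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # number of shared-lock holders
        self._writer = False       # True while an exclusive lock is held

    def acquire_shared(self):
        with self._cond:
            while self._writer:                    # readers wait out any writer
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:   # a writer needs the item alone
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

In the database setting, a transaction would call acquire_shared before reading an item and acquire_exclusive before writing it, releasing locks according to the locking protocol in use.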
There are two kinds of lock-based protocol commonly used in databases:
Two Phase Locking Protocol : Two-phase locking (2PL) is a widely used technique which ensures strict
ordering of lock acquisition and release. It divides each transaction's locking activity into two phases.
Growing Phase : In this phase, the transaction acquires all the locks it needs. It may acquire
new locks, but it cannot release any lock during this phase.
Shrinking Phase : In this phase, the transaction releases the locks it has acquired. Once it
releases its first lock, it cannot acquire any further locks.
Strict Two Phase Locking Protocol : This is almost identical to the two-phase locking protocol; the only
difference is that under plain two-phase locking a transaction may release its locks before it commits,
whereas under strict two-phase locking a transaction is only allowed to release its locks when it
commits (or aborts).
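The two-phase rule can be checked mechanically: a transaction's lock trace obeys 2PL if no lock is acquired after the first release. A minimal sketch, with the trace format as an assumption:

```python
# Sketch: check whether a transaction's lock trace obeys two-phase locking,
# i.e. no lock is acquired after the first lock has been released.

def is_two_phase(trace):
    """trace: sequence of ("lock", item) / ("unlock", item) actions."""
    shrinking = False
    for action, _item in trace:
        if action == "unlock":
            shrinking = True                 # the growing phase is over
        elif action == "lock" and shrinking:
            return False                     # lock after unlock violates 2PL
    return True

print(is_two_phase([("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]))  # True
print(is_two_phase([("lock", "A"), ("unlock", "A"), ("lock", "B")]))                   # False
```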
Timestamp based Protocol
In this protocol, each transaction has a timestamp attached to it: the time at which the
transaction entered the system.
The timestamp-ordering protocol resolves conflicting pairs of operations by comparing the
timestamp values of the transactions involved, thereby guaranteeing that conflicting operations
take place in timestamp order.
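The basic timestamp-ordering rules can be sketched as follows; the read/write functions and the "abort" return value are simplified assumptions (a real system would restart the aborted transaction with a new timestamp).

```python
# Sketch of the basic timestamp-ordering rules. Each transaction carries the
# timestamp assigned on entry; each data item tracks the largest timestamps
# of the transactions that have read and written it.

read_ts, write_ts = {}, {}

def read(ts, item):
    if ts < write_ts.get(item, 0):
        return "abort"                       # a younger transaction already wrote item
    read_ts[item] = max(read_ts.get(item, 0), ts)
    return "ok"

def write(ts, item):
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return "abort"                       # item was read/written by a younger transaction
    write_ts[item] = ts
    return "ok"

r1 = read(1, "A")    # T1 (timestamp 1) reads A
w2 = write(2, "A")   # T2 (timestamp 2) writes A
w1 = write(1, "A")   # T1 tries to overwrite T2's newer write
print(r1, w2, w1)    # ok ok abort
```

T1's late write is rejected because allowing it would put the conflicting writes out of timestamp order.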
Serializability in DBMS
A schedule is an order in which the operations of multiple transactions execute in a concurrent
environment.
Serial Schedule: A schedule in which the transactions execute one after the other is called a serial
schedule. It is always consistent in nature.
For example: Consider following two transactions T1 and T2.
T1 | T2
----------|----------
Read(A) |
Write(A) |
Read(B) |
Write(B) |
| Read(A)
| Write(A)
| Read(B)
| Write(B)
All the operations of transaction T1 on data items A and then B execute first, and then all the
operations of transaction T2 on data items A and B execute.
Non-Serial Schedule: A schedule in which the operations of different transactions are interleaved. This
may lead to conflicts in the result or inconsistency in the resultant data.
For example, consider the following two transactions:
T1 | T2
----------|----------
Read(A) |
Write(A) |
| Read(A)
| Write(B)
Read(A) |
Write(B) |
| Read(B)
| Write(B)
The above schedule is non-serial and may result in inconsistency or conflicts in the data.
What is serializability? How is it tested?
Serializability is the property that ensures that the concurrent execution of a set of transactions produces
the same result as if these transactions were executed one after the other without any overlapping, i.e.,
serially.
Why is Serializability Important?
In a database system, for performance optimization, multiple transactions often run concurrently. While
concurrency improves performance, it can introduce several data inconsistency problems if not managed
properly. Serializability ensures that even when transactions are executed concurrently, the database
remains consistent, producing a result that's equivalent to a serial execution of these transactions.
Testing for serializability in DBMS
Testing for serializability in a DBMS involves verifying if the interleaved execution of transactions
maintains the consistency of the database. The most common way to test for serializability is using a
precedence graph (also known as a serializability graph or conflict graph).
Types of Serializability
1. Conflict Serializability
2. View Serializability
Conflict Serializability
Conflict serializability is a form of serializability where the order of non-conflicting operations is not
significant. It determines if the concurrent execution of several transactions is equivalent to some serial
execution of those transactions.
Two operations are said to be in conflict if:
They belong to different transactions.
They access the same data item.
At least one of them is a write operation.
Examples of non-conflicting operations
T1 | T2
----------|----------
Read(A) | Read(A)
Read(A) | Read(B)
Write(B) | Read(A)
Read(B) | Write(A)
Write(A) | Write(B)
Examples of conflicting operations
T1 | T2
----------|----------
Read(A) | Write(A)
Write(A) | Read(A)
Write(A) | Write(A)
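The three-part conflict test above can be written directly as a small predicate; the (transaction, action, item) tuple format is an assumption made for the example.

```python
# Sketch: the three-part conflict test from the text. An operation is a tuple
# (transaction, action, item), e.g. ("T1", "read", "A").

def conflicts(op1, op2):
    (t1, a1, x1), (t2, a2, x2) = op1, op2
    return (
        t1 != t2                       # they belong to different transactions
        and x1 == x2                   # they access the same data item
        and "write" in (a1, a2)        # at least one of them is a write
    )

print(conflicts(("T1", "read", "A"), ("T2", "write", "A")))   # True
print(conflicts(("T1", "read", "A"), ("T2", "read", "A")))    # False: both are reads
print(conflicts(("T1", "write", "A"), ("T2", "write", "B")))  # False: different items
```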
A schedule is conflict serializable if it can be transformed into a serial schedule (i.e., a schedule with no
overlapping transactions) by swapping non-conflicting operations. If it is not possible to transform a given
schedule to any serial schedule using swaps of non-conflicting operations, then the schedule is not conflict
serializable.
To determine if S is conflict serializable:
Precedence Graph (Serialization Graph): Create a graph where:
Nodes represent transactions.
Draw an edge from \( T_i \) to \( T_j \) if an operation in \( T_i \) precedes and conflicts with an operation
in \( T_j \).
For the given example:
T1 | T2
----------|----------
Read(A) |
| Read(A)
Write(A) |
| Read(B)
| Write(B)
\( R_1(A) \) and \( W_1(A) \) belong to the same transaction, so they do not conflict and no edge is
drawn.
\( R_2(A) \) conflicts with \( W_1(A) \) and precedes it, so there's an edge from T2 to T1.
There are no other conflicting pairs.
The graph has nodes T1 and T2 with an edge from T2 to T1. There are no cycles in this graph.
Decision: A cycle is a path by which we can start from one node and reach the same node again. Since
the precedence graph does not have any cycles, the schedule S is conflict serializable. The equivalent
serial schedule, based on the graph, is T2 followed by T1.
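The precedence-graph test for this example can be sketched in Python; the schedule encoding is an assumption, and the graph construction and cycle check follow the procedure described above.

```python
# Sketch: build a precedence graph for a schedule and test conflict
# serializability by checking for cycles (DFS). The schedule below is the
# example from the text: R1(A), R2(A), W1(A), R2(B), W2(B).

def precedence_graph(schedule):
    edges = set()
    for i, (t1, a1, x1) in enumerate(schedule):
        for t2, a2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (a1, a2):
                edges.add((t1, t2))        # earlier conflicting op comes first: t1 -> t2
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    visiting, done = set(), set()
    def dfs(node):
        if node in visiting:
            return True                    # back edge: cycle found
        if node in done:
            return False
        visiting.add(node)
        found = any(dfs(n) for n in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found
    return any(dfs(n) for n in graph)

schedule = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"),
            ("T2", "R", "B"), ("T2", "W", "B")]
edges = precedence_graph(schedule)
print(edges)             # {('T2', 'T1')}: R2(A) precedes and conflicts with W1(A)
print(has_cycle(edges))  # False, so the schedule is conflict serializable
```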
View Serializability
View Serializability is one of the types of serializability in DBMS that ensures the consistency of a database
schedule. Unlike conflict serializability, which cares about the order of conflicting operations, view
serializability only cares about the final outcome. That is, two schedules are view equivalent if they have:
Initial Read: The same set of initial reads (i.e., a read by a transaction with no preceding write by
another transaction on the same data item).
Updated Read: For any other writes on a data item in between, if a transaction \(T_j\) reads the
result of a write by transaction \(T_i\) in one schedule, then \(T_j\) should read the result of a
write by \(T_i\) in the other schedule as well.
Final Write: The same set of final writes (i.e., a write by a transaction with no subsequent writes
by another transaction on the same data item).
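These three conditions can be checked mechanically by computing, for each schedule, who each read reads from and who performs the final write of each item; a minimal sketch, with the (transaction, action, item) encoding as an assumption.

```python
# Sketch: test view equivalence of two schedules over the same transactions.
# Operations are (transaction, action, item) tuples with action "R" or "W".

def view_profile(schedule):
    reads_from = {}   # (reader txn, item) -> writer txn (None = initial read)
    last_write = {}   # item -> transaction performing the latest write so far
    for t, action, item in schedule:
        if action == "R":
            reads_from[(t, item)] = last_write.get(item)
        else:
            last_write[item] = t
    # The profile captures initial/updated reads and final writes.
    return reads_from, last_write

def view_equivalent(s1, s2):
    return view_profile(s1) == view_profile(s2)

# Classic blind-write example: interleaved schedule vs. serial T1, T2, T3.
s_interleaved = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
s_serial      = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A"), ("T3", "W", "A")]
print(view_equivalent(s_interleaved, s_serial))  # True: view serializable
```

Both schedules have T1 performing the initial read of A and T3 performing the final write, and there are no other reads, so the profiles match.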
Let's understand view serializability with an example:
Consider three transactions \(T_1\), \(T_2\), and \(T_3\), where \(T_2\) and \(T_3\) perform blind
writes (writes with no preceding read).
Schedule S:
| Transaction T1 | Transaction T2 | Transaction T3 |
|---------------------|---------------------|---------------------|
| Read(A) | | |
| | Write(A) | |
| Write(A) | | |
| | | Write(A) |
| Commit | Commit | Commit |
Serial Schedule S' (\(T_1\) then \(T_2\) then \(T_3\)):
| Transaction T1 | Transaction T2 | Transaction T3 |
|---------------------|---------------------|---------------------|
| Read(A) | | |
| Write(A) | | |
| | Write(A) | |
| | | Write(A) |
| Commit | Commit | Commit |
Here,
1. Initial Read: In both S and S', the initial read of A is performed by \(T_1\) (no transaction writes
A before it).
2. Updated Read: There are no other reads in either schedule, so this condition holds trivially.
3. Final Write: In both S and S', the final write of A is performed by \(T_3\).
All three conditions hold, so S is view equivalent to the serial schedule S', and S is therefore view
serializable. Note that S is not conflict serializable: \(R_1(A)\) precedes and conflicts with
\(W_2(A)\) (edge \(T_1\) to \(T_2\)), and \(W_2(A)\) precedes and conflicts with \(W_1(A)\) (edge
\(T_2\) to \(T_1\)), so the precedence graph contains a cycle. View serializability is thus a weaker,
more permissive condition than conflict serializability.