Unit 4:
1. Serializability of schedules
Scheduling is the technique of preserving the order of the operations from one transaction to
another while executing such concurrent transactions. A series of operations from one transaction to
another transaction is known as a schedule.
Transactions are a set of instructions that perform operations on databases. When multiple transactions are
running concurrently, then a sequence is needed in which the operations are to be performed because at a
time, only one operation can be performed on the database. This sequence of operations is known as
Schedule, and this process is known as Scheduling.
When multiple transactions execute simultaneously in an unmanageable manner, then it might lead to
several problems, which are known as concurrency problems. In order to overcome these problems,
scheduling is required.
Schedule
A series of operations from one transaction to another transaction is known as a schedule. It is used to
preserve the order of the operations in each individual transaction.
Schedule, as the name suggests, is a process of lining the transactions and executing them one by one.
When there are multiple transactions that are running in a concurrent manner and the order of operation
is needed to be set so that the operations do not overlap each other, Scheduling is brought into play and
the transactions are timed accordingly.
Types of Schedules
There are mainly two types of scheduling -
1. Serial Schedule
2. Non-serial Schedule
Further, they are divided into their subcategories, as shown below.
1. Serial Schedules:
Schedules in which the transactions are executed non-interleaved are called serial schedules, i.e., a
serial schedule is one in which no transaction starts until the running transaction has ended.
Example: Consider the following schedule involving two transactions T1 and T2.
2. Non-Serial Schedules:
A non-serial schedule is a schedule in which the operations of the transactions are interleaved, i.e.,
the transactions overlap or switch places. Because such schedules carry out actual database operations,
multiple transactions run at once, and these transactions may work on the same data set.
If a non-serial schedule can be transformed into a corresponding serial schedule, it is said to be
serializable. Simply said, a non-serial schedule is a serializable schedule if it yields the same results
as a serial schedule. Therefore, it is crucial that non-serial schedules can be serialized so that the
database is consistent both before and after the transactions are executed.
Example:
Transaction-1 Transaction-2
R(a)
W(a)
R(b)
W(b)
Transaction-1 Transaction-2
R(b)
R(a)
W(b)
W(a)
We can observe that Transaction-2 begins its execution before Transaction-1 is finished, and they are
both working on the same data, i.e., “a” and “b”, interchangeably. Here “R” stands for Read and “W” stands for Write.
a. Serializable:
Serializability is used to maintain the consistency of the database. It is mainly used with
non-serial schedules to verify whether a schedule will lead to any inconsistency. A serial
schedule, on the other hand, does not need a serializability check because it starts a
transaction only when the previous transaction is complete. A non-serial schedule of n
transactions is said to be serializable only when it is equivalent to some serial schedule of
the same n transactions. Since concurrency is allowed in this case, multiple transactions can
execute concurrently. A serializable schedule therefore helps in improving both resource
utilization and CPU throughput. Serializable schedules are of two types:
1. Conflict Serializable:
A schedule is called conflict serializable if it can be transformed into a serial
schedule by swapping non-conflicting operations. Two operations are said to be
conflicting if all of the following conditions are satisfied:
They belong to different transactions
They operate on the same data item
At least one of them is a write operation
2. View Serializable:
A schedule is called view serializable if it is view equivalent to a serial schedule (no
overlapping transactions). Every conflict serializable schedule is also view serializable,
but a view serializable schedule that contains blind writes may not be conflict
serializable.
b. Non-Serializable:
The non-serializable schedule is divided into two types, Recoverable and Non-recoverable
Schedule.
2. Conflict & view serializable schedule
Types of Serializability
There are two ways to check whether any non-serial schedule is serializable.
Types of Serializability – Conflict & View
Conflict Serializable
Conflict serializability is a correctness criterion used in concurrency
control. It guarantees that the outcome of concurrent transactions is the
same as if the transactions were executed consecutively.
Conflict serializable schedules: A schedule is called conflict serializable if
it can be transformed into a serial schedule by swapping non-conflicting
operations.
Non-conflicting operations: When two operations of different transactions
operate on separate data items, or on the same data item but both of them
are read operations, they are said to be non-conflicting.
Conflicting Operations
Two operations are said to be conflicting if all conditions are satisfied:
They belong to different transactions
They operate on the same data item
At least one of them is a write operation
Example:
(R1(A), W2(A)) is a conflicting pair because the operations belong to
two different transactions, act on the same data item A, and one of them
is a write operation.
Similarly, the (W1(A), W2(A)) and (W1(A), R2(A)) pairs are
also conflicting.
On the other hand, the (R1(A), W2(B)) pair is non-conflicting
because the operations act on different data items.
Similarly, the (W1(A), W2(B)) pair is non-conflicting.
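The three conditions can be written as a small predicate. This is only an illustrative sketch: the `Op` tuple and the `conflicts` function are hypothetical names chosen for these notes, not part of any DBMS.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """One operation of a schedule (hypothetical helper type)."""
    kind: str   # "R" for read, "W" for write
    txn: int    # transaction number, e.g. 1 for T1
    item: str   # data item, e.g. "A"


def conflicts(p: Op, q: Op) -> bool:
    """Return True only when all three conflict conditions hold."""
    return (
        p.txn != q.txn                 # they belong to different transactions
        and p.item == q.item           # they operate on the same data item
        and "W" in (p.kind, q.kind)    # at least one of them is a write
    )


print(conflicts(Op("R", 1, "A"), Op("W", 2, "A")))  # True
print(conflicts(Op("R", 1, "A"), Op("W", 2, "B")))  # False
```

Two reads of the same item by different transactions fail the third condition, so they are non-conflicting.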
Consider the following schedule:
S1: R1(A), W1(A), R2(A), W2(A), R1(B), W1(B), R2(B), W2(B)
If Oi and Oj are two operations in a transaction and Oi < Oj (Oi is executed
before Oj), the same order will follow in the schedule as well. Using this
property, we can get the two transactions of schedule S1:
T1: R1(A), W1(A), R1(B), W1(B)
T2: R2(A), W2(A), R2(B), W2(B)
Possible Serial Schedules are: T1->T2 or T2->T1
-> R1(B) does not conflict with W2(A) or R2(A) because they operate on
different data items. Swapping R1(B) first with W2(A) and then with R2(A)
in S1, the schedule becomes,
S11: R1(A), W1(A), R1(B), R2(A), W2(A), W1(B), R2(B), W2(B)
-> Similarly, swapping the non-conflicting operation W1(B) first with W2(A)
and then with R2(A) in S11, the schedule becomes,
S12: R1(A), W1(A), R1(B), W1(B), R2(A), W2(A), R2(B), W2(B)
S12 is a serial schedule in which all operations of T1 are performed before
starting any operation of T2. Since S1 has been transformed into the serial
schedule S12 by swapping only non-conflicting operations, S1 is conflict
serializable.
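Swapping by hand becomes tedious for longer schedules. A common alternative test, not described in these notes, is the precedence graph: draw an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj, and the schedule is conflict serializable exactly when the graph has no cycle. The sketch below applies that test to S1; all function names are hypothetical.

```python
import re


def parse(schedule: str):
    """Turn 'R1(A), W1(A)' into [('R', 1, 'A'), ('W', 1, 'A')]."""
    found = re.findall(r"([RW])(\d+)\((\w+)\)", schedule)
    return [(kind, int(txn), item) for kind, txn, item in found]


def precedence_edges(ops):
    """Edge (i, j): an operation of Ti conflicts with a later operation of Tj."""
    edges = set()
    for a in range(len(ops)):
        for b in range(a + 1, len(ops)):
            (k1, t1, x1), (k2, t2, x2) = ops[a], ops[b]
            if t1 != t2 and x1 == x2 and "W" in (k1, k2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {n: 0 for n in graph}  # 0 = unvisited, 1 = in progress, 2 = done

    def visit(n):
        state[n] = 1
        for m in graph[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in graph)


def is_conflict_serializable(schedule: str) -> bool:
    return not has_cycle(precedence_edges(parse(schedule)))


S1 = "R1(A), W1(A), R2(A), W2(A), R1(B), W1(B), R2(B), W2(B)"
print(precedence_edges(parse(S1)))    # {(1, 2)}
print(is_conflict_serializable(S1))   # True
```

The only edge is T1 -> T2, which matches the equivalent serial order T1->T2 found by swapping.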
2. View Serializability
o A schedule is view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it is also view serializable.
2. Updated Read
In schedule S1, if Ti is reading A which is updated by Tj then in S2 also, Ti should
read A which is updated by Tj.
3. Deadlock Handling
What is Deadlock:
In a database management system (DBMS), a deadlock occurs when two or more transactions are
waiting for each other to release resources, such as locks on database objects, that they need to
complete their operations. As a result, none of the transactions can proceed, leading to a situation where
they are stuck or “deadlocked.”
Deadlocks can happen in multi-user environments when two or more transactions are running
concurrently and try to access the same data in a different order. When this happens, one transaction
may hold a lock on a resource that another transaction needs, while the second transaction may hold a
lock on a resource that the first transaction needs. Both transactions are then blocked, waiting for the
other to release the resource they need.
A deadlock is a condition where two or more transactions are waiting indefinitely for one another to give
up locks. Deadlock is said to be one of the most feared complications in DBMS because none of the tasks
involved ever finishes; each remains in a waiting state forever.
Example – let us understand the concept of Deadlock with an example :
Suppose Transaction T1 holds a lock on some rows in the Students table and needs to update some
rows in the Grades table. Simultaneously, Transaction T2 holds locks on those very rows (which T1
needs to update) in the Grades table but needs to update the rows in the Students table held by
Transaction T1.
Now the main problem arises. Transaction T1 will wait for Transaction T2 to give up its lock, and
similarly, Transaction T2 will wait for Transaction T1 to give up its lock. As a consequence, all activity
comes to a halt and remains at a standstill forever unless the DBMS detects the deadlock and aborts one
of the transactions.
Another Example:
Deadlock Avoidance
It is always better to avoid a deadlock than to let the database get stuck in one and then restart
or abort the database. The deadlock avoidance method is suitable for smaller databases, whereas the
deadlock prevention method is suitable for larger databases.
One method of avoiding deadlock is using application-consistent logic. In the example given above,
transactions that access Students and Grades should always access the tables in the same order. In this
way, in the scenario described above, Transaction T1 simply waits for Transaction T2 to release the lock
on Grades before it begins. When Transaction T2 releases the lock, Transaction T1 can proceed freely.
Another method for avoiding deadlock is to apply both the row-level locking mechanism
Deadlock Detection
When a transaction waits indefinitely to obtain a lock, the database management system should detect
whether the transaction is involved in a deadlock or not.
The lock manager maintains a wait-for graph to detect deadlock cycles in the database.
o The wait-for graph is one of the methods for detecting a deadlock situation. This method is
suitable for smaller databases. In this method, a graph is drawn based on the transactions and their
locks on the resources. If the graph has a closed loop or a cycle, then there is a deadlock.
o The wait-for graph is maintained by the system for every transaction that is waiting for some
data held by the others. The system keeps checking whether there is any cycle in the graph.
The wait-for graph for the above scenario is shown below:
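In code form, the same check can be sketched as a cycle search over a dictionary that maps each waiting transaction to the transactions holding the locks it needs. The function name `find_deadlock` and the dictionary layout are assumptions made for illustration; a real lock manager keeps this information in its own internal structures.

```python
def find_deadlock(wait_for):
    """Return the transactions on a wait-for cycle, or None if there is none.

    wait_for maps a waiting transaction to the set of transactions
    that hold locks it is waiting for.
    """
    path, finished = [], set()

    def visit(txn):
        if txn in path:                      # came back to a transaction on the path
            return path[path.index(txn):]
        if txn in finished:                  # already explored, no cycle through it
            return None
        path.append(txn)
        for holder in wait_for.get(txn, ()):
            cycle = visit(holder)
            if cycle:
                return cycle
        path.pop()
        finished.add(txn)
        return None

    for txn in list(wait_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None


# T1 waits for T2 (Grades rows) and T2 waits for T1 (Students rows).
print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # ['T1', 'T2']
print(find_deadlock({"T1": {"T2"}}))                # None
```

Once a cycle is reported, the DBMS can abort one of the listed transactions to break the deadlock.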
Deadlock prevention: For a large database, the deadlock prevention method is suitable. A deadlock can
be prevented if the resources are allocated in such a way that a deadlock never occurs. The DBMS
analyzes the operations to determine whether they can create a deadlock situation. If they can, that
transaction is never allowed to execute.
Wait-Die scheme
In this scheme, if a transaction requests a resource that is already held with a conflicting lock by
another transaction, the DBMS simply checks the timestamps of both transactions. It allows the older
transaction to wait until the resource is available for execution.
Let's assume there are two transactions Ti and Tj, and let TS(T) be the timestamp of any transaction T. If Tj
holds a lock on a resource and Ti requests that resource, then the following
actions are performed by the DBMS:
1. If TS(Ti) < TS(Tj), Ti is the older transaction and Tj holds the resource, so Ti is
allowed to wait until the data item is available. That means if an older transaction is
waiting for a resource locked by a younger transaction, the older transaction is
allowed to wait until the resource is available.
2. If TS(Ti) > TS(Tj), Ti is the younger transaction and the resource is held by the older
transaction Tj, so Ti is killed and restarted later with a random delay but with the same
timestamp.
Wound wait scheme
o In the wound-wait scheme, if an older transaction requests a resource that is held by a
younger transaction, the older transaction forces the younger one to abort and release
the resource. After a short delay, the younger transaction is restarted but with the same
timestamp.
o If the older transaction holds a resource that is requested by the younger transaction, then the
younger transaction is asked to wait until the older one releases it.
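The two schemes differ only in what happens on each side of the timestamp comparison. The sketch below puts them side by side; the function names and the returned strings are hypothetical, and a smaller timestamp means an older transaction.

```python
def wait_die(ts_requester: int, ts_holder: int) -> str:
    """Wait-die: an older requester waits, a younger requester is rolled back."""
    if ts_requester < ts_holder:
        return "requester waits"
    return "requester aborts"


def wound_wait(ts_requester: int, ts_holder: int) -> str:
    """Wound-wait: an older requester aborts the holder, a younger requester waits."""
    if ts_requester < ts_holder:
        return "holder aborts"
    return "requester waits"


# Older transaction (timestamp 5) requests a lock held by a younger one (timestamp 9).
print(wait_die(5, 9))     # requester waits
print(wound_wait(5, 9))   # holder aborts

# Younger transaction (timestamp 9) requests a lock held by an older one (timestamp 5).
print(wait_die(9, 5))     # requester aborts
print(wound_wait(9, 5))   # requester waits
```

In both schemes only the younger transaction is ever aborted, and it restarts with its original timestamp, so it eventually becomes the oldest and cannot starve.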
4. Concurrency Control
Concurrency Control is the management procedure that is required for controlling concurrent execution of
the operations that take place on a database.
Executing a single transaction at a time increases the waiting time of the other
transactions, which may delay the overall execution. Hence, to increase the
overall throughput and efficiency of the system, several transactions are executed concurrently.
Concurrency control is a very important concept of DBMS which ensures the simultaneous
execution or manipulation of data by several processes or users without resulting in data
inconsistency.
Concurrency control provides a procedure that is able to control concurrent execution of the
operations in the database.
But before knowing about concurrency control, we should know about concurrent execution.
o In a multi-user system, multiple users can access and use the same database at one time, which is
known as concurrent execution. It means that the same database is used
simultaneously by different users.
o While working with database transactions, multiple users often need to use the database
to perform different operations at the same time, and in that case concurrent execution
is performed.
o The simultaneous execution should be done in such a manner that no
operation affects the other executing operations, thus maintaining the consistency of the
database. Concurrent execution of transaction operations therefore raises
several challenging problems that need to be solved.
Concurrency control is provided in a database to:
(i) enforce isolation among transactions.
(ii) preserve database consistency through consistency preserving execution of transactions.
(iii) resolve read-write and write-read conflicts.
Problems with Concurrent Execution
In a database transaction, the two main operations are READ and WRITE. These operations must be
managed during the concurrent execution of transactions, because if they are interleaved without
control, the data may become inconsistent. The following
problems occur with the concurrent execution of operations:
Problem 1: Lost Update Problem (W-W Conflict)
The problem occurs when two different database transactions perform read/write operations on the
same database items in an interleaved manner (i.e., concurrent execution), which makes the values of the
items incorrect and hence makes the database inconsistent.
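A small simulation makes the lost update visible. Everything in this sketch is an assumption chosen for illustration: the transaction names TX and TY, the starting balance of 300, the amounts, and the `run` helper that replays a list of steps against a single account A.

```python
def run(schedule, balance):
    """Replay (transaction, action, amount) steps against account A."""
    db = {"A": balance}
    local = {}                        # each transaction's private copy of A
    for txn, action, amount in schedule:
        if action == "read":
            local[txn] = db["A"]
        elif action == "add":
            local[txn] += amount
        elif action == "write":
            db["A"] = local[txn]
    return db["A"]


# TX withdraws 50 and TY deposits 100, one after the other.
serial = [("TX", "read", 0), ("TX", "add", -50), ("TX", "write", 0),
          ("TY", "read", 0), ("TY", "add", 100), ("TY", "write", 0)]

# Same operations, but TY reads A before TX has written it back.
interleaved = [("TX", "read", 0), ("TY", "read", 0), ("TX", "add", -50),
               ("TX", "write", 0), ("TY", "add", 100), ("TY", "write", 0)]

print(run(serial, 300))        # 350
print(run(interleaved, 300))   # 400, the withdrawal made by TX is lost
```

TY overwrites the value written by TX because it computed its result from a stale read, which is exactly the W-W conflict described above.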
Problem 2: Dirty Read Problem (W-R Conflict)
The dirty read problem occurs when one transaction updates an item of the database and then
fails, and before the update is rolled back, the updated item is read by another
transaction. This creates a Write-Read conflict between the two transactions.
For example:
Consider two transactions TX and TY in the below diagram performing read/write operations on
account A where the available balance in account A is $300:
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on
it. There are two types of lock:
1. Shared lock:
o It is also known as a Read-only lock. Under a shared lock, the data item can only be read by the
transaction.
o It can be shared between transactions because a transaction that holds a shared lock cannot
update the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: only one transaction can hold it at a time, so multiple transactions cannot
modify the same data simultaneously.
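The rules above amount to a compatibility table: a shared lock (S) is compatible only with other shared locks, and an exclusive lock (X) is compatible with nothing. A minimal sketch, with the hypothetical names `COMPATIBLE` and `can_grant`:

```python
# (lock already held by another transaction, lock being requested) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested: str, held_by_others: list) -> bool:
    """Grant the request only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)


print(can_grant("S", ["S", "S"]))  # True, readers can share the item
print(can_grant("X", ["S"]))       # False, a writer must wait for the reader
print(can_grant("X", []))          # True, nobody else holds a lock
```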
There are four types of lock protocols available:
1. Simplistic lock protocol
It is the simplest way of locking data during a transaction. Simplistic lock-based protocols require every
transaction to get a lock on the data before it inserts, deletes, or updates it. The data item is unlocked
after the transaction completes.
2. Pre-claiming Lock Protocol
o Pre-claiming Lock Protocols evaluate the transaction to list all the data items on which it needs
locks.
o Before initiating execution of the transaction, the protocol requests the DBMS for locks on all those
data items.
o If all the locks are granted, then this protocol allows the transaction to begin. When the transaction
is completed, it releases all the locks.
o If all the locks are not granted, then the transaction rolls back and waits
until all the locks are granted.
3. Two-phase locking (2PL)
There are two phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be acquired by the transaction,
but none can be released.
Shrinking phase: In the shrinking phase, existing lock held by the transaction may be released, but no
new locks can be acquired.
In the below example, if lock conversion is allowed then the following phase can happen:
1. Upgrading of lock (from S(a) to X (a)) is allowed in growing phase.
2. Downgrading of lock (from X(a) to S(a)) must be done in shrinking phase.
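The two-phase rule can be checked mechanically: once a transaction has released any lock, it must not acquire another. The sketch below assumes a transaction is given as a list of hypothetical ("lock", item) and ("unlock", item) steps, and it treats an upgrade as a "lock" step and a downgrade as an "unlock" step, in line with the two rules above.

```python
def follows_2pl(actions) -> bool:
    """Return True if no lock is acquired after the first release."""
    shrinking = False
    for kind, _item in actions:
        if kind == "unlock":
            shrinking = True          # the shrinking phase has begun
        elif kind == "lock" and shrinking:
            return False              # acquiring in the shrinking phase breaks 2PL
    return True


good = [("lock", "a"), ("lock", "b"), ("unlock", "a"), ("unlock", "b")]
bad = [("lock", "a"), ("unlock", "a"), ("lock", "b"), ("unlock", "b")]

print(follows_2pl(good))  # True
print(follows_2pl(bad))   # False
```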
Timestamp Ordering Protocol
o The Timestamp Ordering Protocol is used to order the transactions based on their timestamps.
The order of the transactions is simply the ascending order of their creation times.
o The older transaction has the higher priority, which is why it executes first. To determine the
timestamp of a transaction, this protocol uses the system time or a logical counter.
o The lock-based protocol manages the order between conflicting pairs among transactions
at execution time, whereas timestamp-based protocols start working as soon as a transaction is
created.
o Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the
system at time 007 and transaction T2 entered the system at time 009. T1 has the higher
priority, so it executes first because it entered the system first.
o The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operations
on each data item.
Basic Timestamp ordering protocol works as follows:
1. Check the following conditions whenever a transaction Ti issues a Read(X) operation:
o If W_TS(X) > TS(Ti), then the operation is rejected.
o If W_TS(X) <= TS(Ti), then the operation is executed.
o After an executed read, the read timestamp R_TS(X) is updated to the larger of R_TS(X) and TS(Ti).
2. Check the following conditions whenever a transaction Ti issues a Write(X) operation:
o If TS(Ti) < R_TS(X), then the operation is rejected.
o If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise the operation is
executed and W_TS(X) is set to TS(Ti).
Where,
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the Read timestamp of data item X.
W_TS(X) denotes the Write timestamp of data item X.
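The read and write checks can be sketched as two small functions over a data item that remembers R_TS and W_TS. The `Item` class and the function names are hypothetical, timestamps are plain integers starting from 0, and a return value of False stands for "rejected, roll the transaction back".

```python
class Item:
    """A data item X with its read and write timestamps."""

    def __init__(self):
        self.r_ts = 0   # R_TS(X): largest timestamp that has read X
        self.w_ts = 0   # W_TS(X): largest timestamp that has written X


def read(item: Item, ts: int) -> bool:
    """Read(X) issued by a transaction with timestamp ts."""
    if item.w_ts > ts:                 # X was already written by a younger transaction
        return False
    item.r_ts = max(item.r_ts, ts)
    return True


def write(item: Item, ts: int) -> bool:
    """Write(X) issued by a transaction with timestamp ts."""
    if ts < item.r_ts or ts < item.w_ts:   # a younger transaction already read or wrote X
        return False
    item.w_ts = ts
    return True


X = Item()
print(write(X, 9))   # True, T2 (timestamp 9) writes X
print(read(X, 7))    # False, T1 (timestamp 7) is too old to read T2's value
print(read(X, 9))    # True
print(write(X, 7))   # False
```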
3. Validation Based Protocol (don’t read for UT 3)
The validation-based protocol is also known as the optimistic concurrency control technique. In the
validation-based protocol, the transaction is executed in the following three phases:
1. Read phase: In this phase, the transaction T is read and executed. It reads the values of
various data items and stores them in temporary local variables. It performs all its write
operations on the temporary variables without updating the actual database.
2. Validation phase: In this phase, the temporary variable values are validated against the actual
data to see whether serializability is violated.
3. Write phase: If the transaction passes validation, then the temporary results are
written to the database; otherwise the transaction is rolled back.
Here each phase has the following different timestamps:
Start(Ti): It contains the time when Ti started its execution.
Validation (Ti): It contains the time when Ti finishes its read phase and starts its validation phase.
Finish(Ti): It contains the time when Ti finishes its write phase.
o This protocol determines the timestamp used to serialize the transaction from the
timestamp of the validation phase, as it is the phase which actually determines whether the
transaction will commit or roll back.
o Hence TS(T) = Validation(T).
o Serializability is determined during the validation process. It can't be decided in advance.
o While executing the transactions, the protocol allows a greater degree of concurrency and works
best when there are few conflicts.
o Thus transactions incur fewer rollbacks.
5. Distributed Databases
A distributed database represents multiple interconnected databases spread out across several sites
connected by a network. Since the databases are all connected, they appear as a single database to the
users.
A distributed database is basically a database that is not limited to one system; it is spread over different
sites, i.e., on multiple computers or over a network of computers. A distributed database system is
located on various sites that don’t share physical components. This may be required when a particular
database needs to be accessed by various users globally. It needs to be managed such that, to the users,
it looks like one single database.
Distributed databases utilize multiple nodes. They scale horizontally and develop a distributed system.
More nodes in the system provide more computing power, offer greater availability, and resolve
the single point of failure issue.
Different parts of the distributed database are stored in several physical locations, and the processing
requirements are distributed among processors on multiple database nodes.
Types:
1. Homogeneous Database:
In a homogeneous database, all sites store the database identically. The operating system,
database management system, and the data structures used are the same at all sites. Hence,
they’re easy to manage.
2. Heterogeneous Database:
In a heterogeneous distributed database, different sites can use different schemas and software, which can
lead to problems in query processing and transactions. Also, a particular site might be completely
unaware of the other sites. Different computers may use different operating systems and different database
applications. They may even use different data models for the database. Hence, translations are required
for different sites to communicate.
Distributed Data Storage :
There are 2 ways in which data can be stored on different sites. These are:
1. Replication –
In this approach, the entire relation is stored redundantly at 2 or more sites. If the entire database is
available at all sites, it is a fully redundant database. Hence, in replication, systems maintain copies of
data.
This is advantageous as it increases the availability of data at different sites. Also, query requests
can now be processed in parallel.
However, it has certain disadvantages as well. Data needs to be constantly updated. Any change made
at one site needs to be recorded at every site where that relation is stored, or else it may lead to inconsistency.
This is a lot of overhead. Also, concurrency control becomes far more complex as concurrent access
now needs to be checked over a number of sites.
2. Fragmentation –
In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and each of the
fragments is stored at the sites where it is required. It must be made sure that the fragments are
such that they can be used to reconstruct the original relation (i.e., there isn’t any loss of data).
Fragmentation is advantageous because it doesn’t create copies of data, so consistency is not a problem.
Fragmentation of relations can be done in two ways:
Horizontal fragmentation – Splitting by rows –
The relation is fragmented into groups of tuples so that each tuple is assigned to at least one
fragment.
Vertical fragmentation – Splitting by columns –
The schema of the relation is divided into smaller schemas. Each fragment must contain a
common candidate key so as to ensure a lossless join.
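Both kinds of fragmentation, and the reconstruction of the original relation, can be illustrated with plain Python lists. The `students` relation, its columns, and its rows are made-up sample data; `id` plays the role of the common candidate key.

```python
students = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ben", "city": "Delhi"},
    {"id": 3, "name": "Chen", "city": "Pune"},
]

# Horizontal fragmentation: split by rows, here on the value of "city".
pune_site = [row for row in students if row["city"] == "Pune"]
other_site = [row for row in students if row["city"] != "Pune"]

# Reconstruction is the union of the fragments.
rebuilt_h = sorted(pune_site + other_site, key=lambda row: row["id"])

# Vertical fragmentation: split by columns, keeping the key "id" in each fragment.
names = [{"id": row["id"], "name": row["name"]} for row in students]
cities = [{"id": row["id"], "city": row["city"]} for row in students]

# Reconstruction is a join of the fragments on the shared key.
city_by_id = {row["id"]: row["city"] for row in cities}
rebuilt_v = [{**row, "city": city_by_id[row["id"]]} for row in names]

print(rebuilt_h == students)  # True
print(rebuilt_v == students)  # True
```

Without the shared key in both vertical fragments there would be no way to match a name to its city, which is why the key is required for a lossless join.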
Applications of Distributed Database:
It is used in corporate management information systems.
It is used in multimedia applications.
It is used in military control systems, hotel chains, etc.
It is also used in manufacturing control systems.
6. Comparative study of OODBMS Vs RDBMS
RDBMS:
RDBMS stands for Relational Database Management System. It is a database management system
based on the relational model i.e. the data and relationships are represented by a collection of inter-
related tables. It is a DBMS that enables the user to create, update, administer and interact with a
relational database. RDBMS is the basis for SQL, and for all modern database systems like MS SQL
Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
OODBMS:
OODBMS stands for Object-Oriented Database Management System. It is a DBMS where data is
represented in the form of objects, as used in object-oriented programming. OODB implements object-
oriented concepts such as classes of objects, object identity, polymorphism, encapsulation, and
inheritance. An object-oriented database can store more complex data than a relational database. Some
examples of OODBMS are Versant Object Database, Objectivity/DB, ObjectStore, Caché and ZODB.
Difference Between RDBMS and OODBMS:
BASIS | RDBMS | OODBMS
Long Form | Stands for Relational Database Management System. | Stands for Object-Oriented Database Management System.
Way of storing data | Stores data in entities, defined as tables that hold specific information. | Stores data as objects.
Data Complexity | Handles comparatively simpler data. | Handles larger and more complex data than RDBMS.
Grouping | An entity type refers to a collection of entities that share a common definition. | A class describes a group of objects that have common relationships, behaviors, and similar properties.
Data Handling | Stores only data. | Stores data as well as the methods to use it.
Main Objective | Data independence from the application program. | Data encapsulation.
Key | A primary key distinctively identifies an object in a table. | An object identifier (OID) is an unambiguous, long-term name for any type of object or entity.
Data Retrieval | SQL (Structured Query Language) | Object Query Language (OQL)
Scalability | Limited scalability due to rigid schema | Highly scalable due to flexible schema
Concurrency Control | Fine-grained locking | Optimistic concurrency control
Data Relationships | Relational data is stored in tables and linked via foreign keys | Faster for complex object-oriented queries
Performance | Efficient for complex queries involving multiple tables | Faster for complex object-oriented queries
Flexibility | Limited flexibility due to fixed schema | Highly flexible due to object-oriented nature
Data Persistence | Data is stored in tables on disk | Data is stored in objects in memory or on disk
Examples | MySQL, Oracle, SQL Server | db4o, Versant, Objectivity/DB