Module 4 DBMS
SQL stands for Structured Query Language. It is used for storing and managing data in a relational database management system (RDBMS) and is the standard language for relational database systems. It enables a user to create, read, update, and delete relational databases and tables. All RDBMSs, such as MySQL, Informix, Oracle, MS Access, and SQL Server, use SQL as their standard database language. SQL allows users to query the database in a number of ways, using English-like statements.
Rules:
Structured Query Language is not case sensitive, although SQL keywords are conventionally written in uppercase. SQL statements are not tied to text lines: a single SQL statement can be written on one text line or across several. Using SQL statements, you can perform most of the actions in a database. SQL is based on tuple relational calculus and relational algebra.
SQL process:
When an SQL command is executed on any RDBMS, the system figures out the best way to carry out the request, and the SQL engine determines how to interpret the task. Various components take part in this process, such as the optimization engine, the query engine, and the query dispatcher. All the non-SQL queries are handled by the classic query engine, but the SQL query engine won't handle logical files.
Characteristics of SQL
SQL is easy to learn. It is used to access data from relational database management systems and can execute queries against the database. SQL is used to describe the data, to define the data in the database and manipulate it when needed, and to create and drop databases and tables. It is also used to create views, stored procedures, and functions in a database, and it allows users to set permissions on tables, procedures, and views.
Advantages of SQL
1) High speed: Using SQL queries, the user can quickly and efficiently retrieve a large number of records from a database.
2) No coding needed: Standard SQL makes it very easy to manage a database system; it does not require a substantial amount of code.
3) Well-defined standards: SQL databases use long-established standards that have been adopted by ISO and ANSI.
4) Portability: SQL can be used on laptops, PCs, servers, and even some mobile phones.
5) Multiple data views: Using the SQL language, users can define different views of the database structure.
SQL Data Types
An SQL data type defines the values that a column can contain. Every column in a database table is required to have a name and a data type.
SQL Commands
SQL commands are instructions used to communicate with the database and to perform specific tasks, functions, and queries on data. SQL can perform various tasks such as creating a table, adding data to tables, dropping a table, modifying a table, and setting permissions for users.
There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.
1. Data Definition Language (DDL)
o DDL changes the structure of a table: creating a table, deleting a table, altering a table, and so on.
o All DDL commands are auto-committed, which means they permanently save all the changes in the database.
Here are some commands that come under DDL:
o CREATE
o ALTER
o DROP
o TRUNCATE
a. CREATE: It is used to create a new table in the database.
Syntax:
CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPE, ...);
Example:
CREATE TABLE EMPLOYEE (Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);
b. DROP: It is used to delete both the structure of a table and the records stored in it.
Syntax:
DROP TABLE table_name;
Example:
DROP TABLE EMPLOYEE;
c. ALTER: It is used to alter the structure of the database. This change could be either to
modify the characteristics of an existing attribute or probably to add a new attribute.
Syntax (to add a new column):
ALTER TABLE table_name ADD column_name COLUMN-definition;
Syntax (to modify an existing column):
ALTER TABLE table_name MODIFY (column_definitions ...);
Example:
ALTER TABLE STU_DETAILS ADD (ADDRESS VARCHAR2(20));
d. TRUNCATE: It is used to delete all the rows from a table while keeping the table structure intact.
Syntax:
TRUNCATE TABLE table_name;
Example:
TRUNCATE TABLE EMPLOYEE;
2. Data Manipulation Language (DML)
o DML commands are used to modify the database. They are responsible for all forms of changes in the database.
o DML commands are not auto-committed, which means they do not permanently save the changes to the database on their own; they can be rolled back.
Here are some commands that come under DML:
o INSERT
o UPDATE
o DELETE
a. INSERT: The INSERT statement is an SQL query used to insert data into a row of a table.
Syntax:
INSERT INTO TABLE_NAME (col1, col2, col3, ..., colN) VALUES (value1, value2, value3, ..., valueN);
Or
INSERT INTO TABLE_NAME VALUES (value1, value2, value3, ..., valueN);
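For instance, with a students table like the one used in the UPDATE example below (the column names are illustrative):
INSERT INTO students (Student_Id, User_Name) VALUES ('3', 'Jassy');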
b. UPDATE: This command is used to update or modify the value of a column in the
table.
Syntax:
UPDATE table_name SET column_name1 = value1 [, column_name2 = value2, ...] [WHERE condition];
Example:
UPDATE students SET User_Name = 'Scott' WHERE Student_Id = '3';
c. DELETE: It is used to remove one or more rows from a table.
Syntax:
DELETE FROM table_name [WHERE condition];
Example:
DELETE FROM students WHERE User_Name = 'Scott';
3. Data Control Language (DCL)
DCL commands are used to grant authority to, and take it back from, database users. Here are some commands that come under DCL:
o GRANT
o REVOKE
a. GRANT: It is used to give users access privileges to a database.
Example:
GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;
b. REVOKE: It is used to take back permissions from a user.
Example:
REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;
4. Transaction Control Language (TCL)
TCL commands can only be used with DML commands such as INSERT, DELETE, and UPDATE. DDL operations are automatically committed in the database, which is why TCL commands cannot be used while creating tables or dropping them. Here are some commands that come under TCL:
o COMMIT
o ROLLBACK
o SAVEPOINT
a. Commit: Commit command is used to save all the transactions to the database.
Syntax:
COMMIT;
Example:
DELETE FROM CUSTOMERS WHERE AGE = 25;
COMMIT;
b. Rollback: Rollback command is used to undo transactions that have not already been
saved to the database.
Syntax:
ROLLBACK;
Example:
DELETE FROM CUSTOMERS WHERE AGE = 25;
ROLLBACK;
c. SAVEPOINT: It is used to roll the transaction back to a certain point without rolling
back the entire transaction.
Syntax:
SAVEPOINT SAVEPOINT_NAME;
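SAVEPOINT is normally used together with ROLLBACK TO. A short sketch of the three TCL commands working together (the CUSTOMERS table and its values are illustrative):
INSERT INTO CUSTOMERS VALUES (1, 'Ramesh');
SAVEPOINT sp1;
INSERT INTO CUSTOMERS VALUES (2, 'Suresh');
ROLLBACK TO sp1; -- undoes only the second insert
COMMIT;          -- makes the first insert permanent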
5. Data Query Language (DQL)
DQL is used to fetch data from the database. It uses only one command:
o SELECT
a. SELECT: This is the same as the projection operation of relational algebra. It is used to select attributes based on the condition described by the WHERE clause.
Syntax:
SELECT expressions FROM tables [WHERE conditions];
For example:
SELECT emp_name FROM employee WHERE age > 20;
Database Objects
A database object is a structure used to hold or reference data. Anything we make with the CREATE command is known as a database object, and it can be used to hold and manipulate data. Some examples of database objects are tables, views, indexes, sequences, and synonyms.
Example (creating a view):
CREATE VIEW salvu50 AS SELECT employee_id ID_NUMBER, last_name NAME,
salary*12 ANN_SALARY FROM employees WHERE department_id = 50;
Example (creating an index):
CREATE INDEX emp_last_name_idx ON employees(last_name);
Syntax:
CREATE [PUBLIC] SYNONYM synonym FOR object;
In the syntax:
PUBLIC : creates a synonym accessible to all users
synonym : is the name of the synonym to be created
object : identifies the object for which the synonym is created
Example (creating a synonym):
CREATE SYNONYM d_sum FOR dept_sum_vu;
Views
Views in SQL are considered a virtual table. A view also contains rows and columns. To create a view, we can select fields from one or more tables present in the database. A view can contain either specific rows, based on a certain condition, or all the rows of a table.
Student_Detail
STU_ID NAME ADDRESS
1 Stephan Delhi
2 Kathrin Noida
3 David Ghaziabad
4 Alina Gurugram
Student_Marks
ID NAME MARKS AGE
1 Stephan 97 19
2 Kathrin 86 21
3 David 74 18
4 Alina 90 20
5 John 96 18
1. Creating a View
A view can be created using the CREATE VIEW statement. A view can be created from a single table or from multiple tables.
Syntax:
CREATE VIEW view_name AS SELECT column1, column2, ... FROM table_name WHERE condition;
Example:
CREATE VIEW DetailsView AS SELECT NAME, ADDRESS FROM Student_Detail WHERE STU_ID < 4;
Just like a table, the view can be queried to display its data:
SELECT * FROM DetailsView;
Output:
NAME ADDRESS
Stephan Delhi
Kathrin Noida
David Ghaziabad
A view from multiple tables can be created by simply including multiple tables in the SELECT statement.
In the following example, a view named MarksView is created from the two tables Student_Detail and Student_Marks.
Query:
CREATE VIEW MarksView AS SELECT Student_Detail.NAME, Student_Detail.ADDRESS, Student_Marks.MARKS FROM Student_Detail, Student_Marks WHERE Student_Detail.NAME = Student_Marks.NAME;
To display the data of the view:
SELECT * FROM MarksView;
NAME ADDRESS MARKS
Stephan Delhi 97
Kathrin Noida 86
David Ghaziabad 74
Alina Gurugram 90
2. Deleting a View
A view can be deleted using the DROP VIEW statement.
Syntax:
DROP VIEW view_name;
SQL SEQUENCES
A sequence is a database object that automatically generates an ordered series of unique numeric values. Sequences are most often used to generate primary key values.
Example:
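The following is a typical Oracle-style sequence definition and use (the names and settings here are illustrative):
CREATE SEQUENCE dept_deptid_seq
START WITH 120
INCREMENT BY 10
MAXVALUE 9999
NOCACHE
NOCYCLE;
INSERT INTO departments (department_id, department_name) VALUES (dept_deptid_seq.NEXTVAL, 'Support');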
Indexes are special lookup tables used to retrieve data from the database very quickly. An index speeds up SELECT queries and WHERE clauses, but it slows down data input with INSERT and UPDATE statements. Indexes can be created or dropped without affecting the data. An index in a database is just like the index at the back of a book: to find all pages in a book that discuss a certain topic, you first refer to the index, which lists all the topics alphabetically, and are then referred to one or more specific page numbers.
SYNONYM
A SYNONYM provides another name for a database object, referred to as the original object, which may exist on the local server or on another server. A synonym belongs to a schema, and its name should be unique within that schema. A synonym cannot be the original object for another synonym, and a synonym cannot refer to a user-defined function.
The query below returns one row for each synonym in the database. It provides details about synonym metadata, such as the name of the synonym and the name of the base object.
SELECT * FROM sys.synonyms;
Assertions in DBMS
When a constraint involves two or more tables, the table constraint mechanism is sometimes awkward and the results may not come out as expected. To cover such situations, SQL supports the creation of assertions, which are constraints not associated with only one table. An assertion statement ensures that a certain condition will always hold in the database. The DBMS checks the assertion whenever modifications are made to the corresponding tables.
• Assertions are conditions that the database must always satisfy.
• An assertion can involve one or more tables and one or more attributes. Some constraints cannot be expressed by using only domain constraints or referential-integrity constraints; for example,
• "Every department must have at least five courses offered every semester" must be expressed as an assertion.
Syntax:
CREATE ASSERTION assertion_name CHECK (condition);
Eg: The total length of all movies by a given studio shall not exceed 10,000 minutes.
CREATE ASSERTION sumLength CHECK (10000 >= ALL (SELECT SUM(length) FROM Movies GROUP BY studioName));
Cursor
A cursor is a mechanism that provides a way to select multiple rows of data from the database and then process each row individually inside a PL/SQL program. The cursor first points at row 1; once that row is processed, it advances to row 2, and so on.
TYPES OF CURSORS
1. IMPLICIT
2. EXPLICIT
IMPLICIT
• Implicit cursors are created by default when DML statements such as INSERT, UPDATE, and DELETE are executed.
• The user is not aware of this happening and cannot control or process the information.
• When an implicit cursor is working, the DBMS performs the open, fetch, and close operations automatically.
EXPLICIT
• Explicit cursors are programmer-defined cursors for gaining more control over the context area.
• An explicit cursor should be defined in the declaration section of the PL/SQL block, as in the sketch below.
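A minimal PL/SQL sketch of an explicit cursor, assuming an employees table with employee_id and last_name columns (the names here are illustrative):
DECLARE
   CURSOR c_emp IS SELECT employee_id, last_name FROM employees;
   v_id   employees.employee_id%TYPE;
   v_name employees.last_name%TYPE;
BEGIN
   OPEN c_emp;                        -- open the cursor
   LOOP
      FETCH c_emp INTO v_id, v_name;  -- fetch one row at a time
      EXIT WHEN c_emp%NOTFOUND;       -- stop when no rows remain
      DBMS_OUTPUT.PUT_LINE(v_id || ' ' || v_name);
   END LOOP;
   CLOSE c_emp;                       -- release the context area
END;
/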
Triggers are stored programs that are automatically executed, or fired, when certain events occur. Triggers are written to be executed in response to any of the following events −
• A database manipulation (DML) statement (DELETE, INSERT, or UPDATE)
• A database definition (DDL) statement (CREATE, ALTER, or DROP).
• A database operation (SERVERERROR, LOGON, LOGOFF, STARTUP, or
SHUTDOWN).
Triggers can be defined on the table, view, schema, or database with which the event is
associated.
• User events (such as logon and logoff).
Benefits of Triggers
Triggers can be written for purposes such as automatically generating derived column values, enforcing referential integrity, event logging, auditing, and preventing invalid transactions.
Creating Triggers
The syntax for creating a trigger is −
CREATE [OR REPLACE] TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF}
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF col_name]
ON table_name
[REFERENCING OLD AS o NEW AS n]
[FOR EACH ROW]
WHEN (condition)
DECLARE
   Declaration-statements
BEGIN
   Executable-statements
EXCEPTION
   Exception-handling-statements
END;
Where,
• CREATE [OR REPLACE] TRIGGER trigger_name − Creates or replaces an existing
trigger with the trigger_name.
• {BEFORE | AFTER | INSTEAD OF} − This specifies when the trigger will be
executed. The INSTEAD OF clause is used for creating trigger on a view.
• {INSERT [OR] | UPDATE [OR] | DELETE} − This specifies the DML operation.
• [OF col_name] − This specifies the column name that will be updated.
• [ON table_name] − This specifies the name of the table associated with the
trigger.
• [REFERENCING OLD AS o NEW AS n] − This allows you to refer to new and old values for various DML statements, such as INSERT, UPDATE, and DELETE.
• [FOR EACH ROW] − This specifies a row-level trigger, i.e., the trigger will be
executed for each row being affected. Otherwise the trigger will execute just
once when the SQL statement is executed, which is called a table level trigger.
• WHEN (condition) − This provides a condition for rows for which the trigger
would fire. This clause is valid only for row-level triggers.
Example
The following program creates a row-level trigger for the customers table that fires for INSERT, UPDATE, or DELETE operations performed on the CUSTOMERS table. The trigger displays the salary difference between the old values and the new values −
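A sketch of such a trigger, assuming the CUSTOMERS table has ID and SALARY columns:
CREATE OR REPLACE TRIGGER display_salary_changes
BEFORE DELETE OR INSERT OR UPDATE ON customers
FOR EACH ROW
WHEN (NEW.ID > 0)
DECLARE
   sal_diff NUMBER;
BEGIN
   -- :OLD and :NEW hold the row values before and after the change
   sal_diff := :NEW.salary - :OLD.salary;
   DBMS_OUTPUT.PUT_LINE('Old salary: ' || :OLD.salary);
   DBMS_OUTPUT.PUT_LINE('New salary: ' || :NEW.salary);
   DBMS_OUTPUT.PUT_LINE('Salary difference: ' || sal_diff);
END;
/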
Types of Triggers
Row Triggers − A row trigger is fired once for each row affected by the triggering statement.
BEFORE triggers run the trigger action before the triggering statement is run. They are useful in situations such as eliminating unnecessary processing or deriving specific column values before the statement completes.
AFTER triggers run the trigger action after the triggering statement is run.
Triggers may also be fired by:
• System events
• User events
• DDL statements
• DML statements
Stored Procedures
A stored procedure is a named block of SQL and procedural statements that is stored in the database and can be invoked whenever needed. The most important part is the parameters, which are used to pass values to the procedure. There are three different types of parameters:
1. IN:
This is the default parameter mode for a procedure. It always receives values from the calling program.
2. OUT:
This parameter always sends values to the calling program.
3. IN OUT:
This parameter performs both operations: it receives values from the calling program as well as sends values to it.
Example:
Imagine a table named emp_table stored in the database. We write a procedure to increase the salary of an employee by 1000, as sketched below.
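A minimal sketch of such a procedure, assuming emp_table has columns emp_id and salary (the names here are illustrative):
CREATE OR REPLACE PROCEDURE update_salary (p_emp_id IN NUMBER)
IS
BEGIN
   -- raise the given employee's salary by 1000
   UPDATE emp_table SET salary = salary + 1000 WHERE emp_id = p_emp_id;
END;
/
-- Calling the procedure for employee 101:
BEGIN
   update_salary(101);
END;
/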
Stored Functions
A stored function is a set of SQL statements that performs some operation and returns a single value. Just like MySQL's built-in functions, it can be called from within a MySQL statement. By default, a stored function is associated with the default database. The CREATE FUNCTION statement requires the CREATE ROUTINE database privilege.
Syntax:
The syntax for the CREATE FUNCTION statement is:
CREATE FUNCTION function_name (func_parameter1 type, func_parameter2 type, ...)
RETURNS datatype
[characteristics]
func_body
Parameters used:
1. function_name:
It is the name by which the stored function is called. The name should not be the same as a native (built-in) function. To associate the routine explicitly with a specific database, the function name should be given as database_name.func_name.
2. func_parameter:
It is the argument whose value is used by the function inside its body. You cannot specify IN, OUT, or INOUT for these parameters. The parameter declaration inside the parentheses is given as func_parameter type, where type is a valid MySQL data type.
3. datatype:
It is the data type of the value returned by the function.
4. characteristics:
The CREATE FUNCTION statement is accepted only if at least one of the characteristics {DETERMINISTIC, NO SQL, or READS SQL DATA} is specified in its declaration.
Example:
The following function finds the number of years an employee has been with the company; a sketch of its body follows.
DELIMITER //
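-- A sketch of such a function; it assumes the employee's joining date is passed in as a DATE.
CREATE FUNCTION no_of_years (hire_date DATE) RETURNS INT
DETERMINISTIC
BEGIN
   -- TIMESTAMPDIFF returns the number of whole years between the two dates
   RETURN TIMESTAMPDIFF(YEAR, hire_date, CURDATE());
END //
DELIMITER ;
-- Usage (assuming an employee table with a hire_date column):
-- SELECT emp_name, no_of_years(hire_date) AS years_in_company FROM employee;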
Embedded SQL
• Embedded SQL allows SQL statements to be placed inside a host-language program (for example, C). Host variables carry data between the program and the database.
• To be set apart, host variables must be defined within a special section known as a declare section:
EXEC SQL BEGIN DECLARE SECTION;
char EmployeeID[7];
double Salary;
EXEC SQL END DECLARE SECTION;
• Each host variable must be assigned a unique name, even if the variables are declared in different declare sections.
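A short sketch of how a host variable is then used inside an embedded SQL statement (the Employee table is illustrative; the leading colon marks a host variable):
EXEC SQL SELECT Salary INTO :Salary
         FROM Employee
         WHERE EmployeeID = :EmployeeID;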
Dynamic SQL
• Objective: composing and executing new (not previously compiled) SQL statements at run-time. For example:
– a program accepts SQL statements from the keyboard at run-time;
– a point-and-click operation is translated into a certain SQL query.
• Dynamic update is relatively simple; a dynamic query can be complex, because the type and number of retrieved attributes are unknown at compile time.
Example
varchar sqlupdatestring[256];
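A sketch of how that statement string might be composed and executed at run-time (Pro*C-style embedded SQL; the table and values are illustrative):
strcpy((char *) sqlupdatestring.arr,
       "UPDATE employees SET salary = salary + 1000 WHERE employee_id = 101");
sqlupdatestring.len = strlen((char *) sqlupdatestring.arr);
EXEC SQL EXECUTE IMMEDIATE :sqlupdatestring;  /* parse and run the statement now */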
• Primary Storage − The memory storage that is directly accessible to the CPU
comes under this category. CPU's internal memory (registers), fast memory
(cache), and main memory (RAM) are directly accessible to the CPU, as they are
all placed on the motherboard or CPU chipset. This storage is typically very
small, ultra-fast, and volatile. Primary storage requires continuous power supply
in order to maintain its state. In case of a power failure, all its data is lost.
• Secondary Storage − Secondary storage devices are used to store data for
future use or as backup. Secondary storage includes memory devices that are
not a part of the CPU chipset or motherboard, for example, magnetic disks,
optical disks (DVD, CD, etc.), hard disks, flash drives, and magnetic tapes.
• Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since
such storage devices are external to the computer system, they are the slowest
in speed. These storage devices are mostly used to take the back up of an entire
system. Optical disks and magnetic tapes are widely used as tertiary storage.
Memory Hierarchy
A computer system has a well-defined hierarchy of memory. A CPU has direct access to its main memory as well as to its inbuilt registers. The access time of main memory is obviously longer than one CPU cycle. To minimize this speed mismatch, cache memory is introduced. Cache memory provides the fastest access time, and it contains the data most frequently accessed by the CPU.
The memory with the fastest access is the costliest one. Larger storage devices offer
slow speed and they are less expensive, however they can store huge volumes of data
as compared to CPU registers or cache memory.
Storage Hierarchy
Besides the above, various other storage devices reside in the computer system. These
storage media are organized on the basis of data accessing speed, cost per unit of data
to buy the medium, and by medium's reliability. Thus, we can create a hierarchy of
storage media on the basis of its cost and speed.
In this hierarchy, the higher levels are expensive but fast. Moving down, the cost per bit decreases while the access time increases. The storage media from main memory upward are volatile, and everything below main memory is non-volatile.
Magnetic Disks
Hard disk drives are the most common secondary storage devices in present computer
systems. These are called magnetic disks because they use the concept of
magnetization to store information. Hard disks consist of metal disks coated with
magnetizable material. These disks are placed vertically on a spindle. A read/write head
moves in between the disks and is used to magnetize or de-magnetize the spot under
it. A magnetized spot can be recognized as 0 (zero) or 1 (one).
Hard disks are formatted in a well-defined order to store data efficiently. A hard disk
plate has many concentric circles on it, called tracks. Every track is further divided
into sectors. A sector on a hard disk typically stores 512 bytes of data.
RAID 0
In this level, a striped array of disks is implemented. The data is broken down into
blocks and the blocks are distributed among disks. Each disk receives a block of data to
write/read in parallel. It enhances the speed and performance of the storage device.
There is no parity and backup in Level 0.
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a
copy of data to all the disks in the array. RAID level 1 is also called mirroring and
provides 100% redundancy in case of a failure.
RAID 2
RAID 2 records Error Correction Codes using Hamming distance for its data, striped on different disks. As in level 0, each data bit in a word is recorded on a separate disk, and the ECC codes of the data words are stored on a different set of disks. Due to its complex structure and high cost, RAID 2 is not commercially available.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for the data word is stored on a different disk. This technique makes it possible to recover from single disk failures.
RAID 4
In this level, an entire block of data is written onto data disks and then the parity is
generated and stored on a different disk. Note that level 3 uses byte-level striping,
whereas level 4 uses block-level striping. Both level 3 and level 4 require at least three
disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for the data block stripes are distributed among all the data disks rather than being stored on a dedicated parity disk.
RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated
and stored in distributed fashion among multiple disks. Two parities provide additional
fault tolerance. This level requires at least four disk drives to implement RAID.
File Structure
A file is a sequence of records stored in binary format. A disk drive is formatted into several
blocks that can store records. File records are mapped onto those disk blocks.
File Organization
File Organization defines how file records are mapped onto disk blocks. There are four types of file organization used to organize file records −
• Heap File Organization
• Sequential File Organization
• Hash File Organization
• Clustered File Organization
File Operations
Operations on database files can be broadly classified into two categories −
• Update Operations
• Retrieval Operations
Update operations change the data values by insertion, deletion, or update. Retrieval
operations, on the other hand, do not alter the data but retrieve them after optional
conditional filtering. In both types of operations, selection plays a significant role. Other
than creation and deletion of a file, there could be several operations, which can be
done on files.
• Open − A file can be opened in one of the two modes, read mode or write
mode. In read mode, the operating system does not allow anyone to alter data.
In other words, data is read only. Files opened in read mode can be shared
among several entities. Write mode allows data modification. Files opened in
write mode can be read but cannot be shared.
• Locate − Every file has a file pointer, which tells the current position where the
data is to be read or written. This pointer can be adjusted accordingly. Using find
(seek) operation, it can be moved forward or backward.
• Read − By default, when files are opened in read mode, the file pointer points to
the beginning of the file. There are options where the user can tell the operating
system where to locate the file pointer at the time of opening a file. The very
next data to the file pointer is read.
• Write − User can select to open a file in write mode, which enables them to edit
its contents. It can be deletion, insertion, or modification. The file pointer can be
located at the time of opening or can be dynamically changed if the operating
system allows doing so.
• Close − This is the most important operation from the operating system’s point
of view. When a request to close a file is generated, the operating system
o removes all the locks (if in shared mode),
o saves the data (if altered) to the secondary storage media, and
o releases all the buffers and file handlers associated with the file.
The organization of data inside a file plays a major role here. The process of locating the file pointer at a desired record inside a file varies based on whether the records are arranged sequentially or clustered.
Indexing
Data is stored in the form of records. Every record has a key field, which helps it to be
recognized uniquely.
Indexing is a data structure technique to efficiently retrieve records from the database
files based on some attributes on which the indexing has been done. Indexing in
database systems is similar to what we see in books.
Indexing is defined based on its indexing attributes. Indexing can be of the following
types −
• Primary Index − Primary index is defined on an ordered data file. The data file is
ordered on a key field. The key field is generally the primary key of the relation.
• Secondary Index − Secondary index may be generated from a field which is a
candidate key and has a unique value in every record, or a non-key with
duplicate values.
• Clustering Index − Clustering index is defined on an ordered data file. The data
file is ordered on a non-key field.
Ordered Indexing is of two types −
• Dense Index
• Sparse Index
Dense Index
In a dense index, there is an index record for every search-key value in the database. This makes searching faster but requires more space to store the index records themselves. Index records contain the search-key value and a pointer to the actual record on the disk.
Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first proceed by the index record to reach the actual location of the data. If the data we are looking for is not where we land by following the index, the system starts a sequential search until the desired data is found.
Multilevel Index
Index records comprise search-key values and data pointers. Multilevel index is stored
on the disk along with the actual database files. As the size of the database grows, so
does the size of the indices. There is an immense need to keep the index records in the
main memory so as to speed up the search operations. If single-level index is used,
then a large size index cannot be kept in memory which leads to multiple disk accesses.
Multi-level Index helps in breaking down the index into several smaller indices in order
to make the outermost level so small that it can be saved in a single disk block, which
can easily be accommodated anywhere in the main memory.
B+ Tree
A B+ tree is a balanced search tree that follows a multi-level index format. The leaf nodes of a B+ tree hold the actual data pointers. A B+ tree ensures that all leaf nodes remain at the same height, and thus the tree is balanced. Additionally, the leaf nodes are linked together in a linked list; therefore, a B+ tree supports both random access and sequential access.
Structure of B+ Tree
Every leaf node is at an equal distance from the root node. A B+ tree is of order n, where n is fixed for every B+ tree.
Internal nodes −
• Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
• At most, an internal node can contain n pointers.
Leaf nodes −
• Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
• At most, a leaf node can contain n record pointers and n key values.
• Every leaf node contains one block pointer P to point to the next leaf node, forming a linked list.
B+ Tree Insertion
• B+ trees are filled from the bottom, and each entry is made at a leaf node.
Hashing
For a huge database structure, it can be almost impossible to search all the index values through all the levels and then reach the destination data block to retrieve the desired data. Hashing is an effective technique for computing the direct location of a data record on the disk without using an index structure.
Hashing uses hash functions with search keys as parameters to generate the address of
a data record.
Hash Organization
• Bucket − A hash file stores data in bucket format. Bucket is considered a unit of
storage. A bucket typically stores one complete disk block, which in turn can
store one or more records.
• Hash Function − A hash function, h, is a mapping function that maps all the set
of search-keys K to the address where actual records are placed. It is a function
from search keys to bucket addresses.
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if a mod-4 hash function is used, it generates only four possible bucket addresses. The output address is always the same for a given key, and the number of buckets provided remains unchanged at all times.
Operation
• Insertion − When a record is required to be entered using static hash, the hash
function h computes the bucket address for search key K, where the record will
be stored.
Bucket address = h(K)
• Search − When a record needs to be retrieved, the same hash function can be
used to retrieve the address of the bucket where the data is stored.
• Delete − This is simply a search followed by a deletion operation.
Bucket Overflow
The condition of bucket overflow is known as a collision. This is a fatal state for any static hash function. In this case, overflow chaining can be used.
• Overflow Chaining − When buckets are full, a new bucket is allocated for the
same hash result and is linked after the previous one. This mechanism is
called Closed Hashing.
• Linear Probing − When a hash function generates an address at which data is
already stored, the next free bucket is allocated to it. This mechanism is
called Open Hashing.
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the
size of the database grows or shrinks. Dynamic hashing provides a mechanism in which
data buckets are added and removed dynamically and on-demand. Dynamic hashing is
also known as extended hashing.
Hash function, in dynamic hashing, is made to produce a large number of values and
only a few are used initially.
Organization
The prefix of an entire hash value is taken as a hash index. Only a portion of the hash value is used for computing bucket addresses. Every hash index has a depth value to signify how many bits are used for computing a hash function. These bits can address 2^n buckets. When all these bits are consumed − that is, when all the buckets are full − the depth value is increased linearly and twice as many buckets are allocated.
Operation
• Querying − Look at the depth value of the hash index and use those bits to
compute the bucket address.
• Update − Perform a query as above and update the data.
• Deletion − Perform a query to locate the desired data and delete the same.
• Insertion − Compute the address of the bucket
o If the bucket is already full.
▪ Add more buckets.
▪ Add additional bits to the hash value.
▪ Re-compute the hash function.
o Else
▪ Add data to the bucket,
o If all the buckets are full, perform the remedies of static hashing.
Hashing is not favorable when the data is organized by some ordering and the queries require a range of data; it performs best when data is discrete and random. Hashing algorithms are more complex to implement than indexing, but all hash operations are done in constant time.
DBMS - Transaction
A transaction can be defined as a group of tasks. A single task is the minimum processing unit
which cannot be divided further.
Properties of transaction
Atomicity − This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none. There must be no state in a database where a
transaction is left partially completed. States should be defined either before the execution of
the transaction or after the execution/abortion/failure of the transaction.
Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the database
was in a consistent state before the execution of a transaction, it must remain consistent after
the execution of the transaction as well.
Durability − The database should be durable enough to hold all its latest updates even if the
system fails or restarts. If a transaction updates a chunk of data in a database and commits, then
the database will hold the modified data. If a transaction commits but the system fails before the
data could be written on to the disk, then that data will be updated once the system springs
back into action.
Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if each were the only transaction in the system. No transaction will affect the existence of any other transaction.
Serializability
When multiple transactions are being executed by the operating system in a multiprogramming
environment, there are possibilities that instructions of one transaction are interleaved with
some other transaction.
Serial Schedule − It is a schedule in which transactions are aligned in such a way that one
transaction is executed first. When the first transaction completes its cycle, then the next
transaction is executed. Transactions are ordered one after the other. This type of schedule is
called a serial schedule, as transactions are executed in a serial manner.
States of Transactions
During its lifetime, a transaction passes through several well-defined states: active, partially committed, committed, failed, and aborted.
Concurrent Transactions
A transaction is a unit of database processing which contains a set of operations. For example,
deposit of money, balance enquiry, reservation of tickets etc.
Every transaction starts with delimiters begin transaction and terminates with end transaction
delimiters. The set of operations within these two delimiters constitute one transaction.
There are three possible ways in which a transaction can be executed. These are as follows −
1. Serial execution.
2. Parallel execution.
3. Concurrent execution.
Advantages
1. It increases throughput, which is the number of transactions completed per unit time.
Disadvantage
The disadvantage is that the execution of concurrent transactions may result in inconsistency.
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. There are two types of lock:
1. Shared lock:
It is also known as a read-only lock. With a shared lock, the data item can only be read by the transaction. The lock can be shared between transactions because, while holding it, a transaction cannot update the data item.
2. Exclusive lock:
With an exclusive lock, the data item can be both read and written by the transaction. The lock is exclusive: under it, multiple transactions cannot modify the same data simultaneously. An example of both modes is sketched below.
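Where the DBMS exposes locking directly (Oracle-style syntax shown purely for illustration), the two lock modes look like this:
LOCK TABLE employees IN SHARE MODE;     -- shared (read-only) lock on the table
LOCK TABLE employees IN EXCLUSIVE MODE; -- exclusive (read/write) lock on the table
SELECT salary FROM employees WHERE employee_id = 101 FOR UPDATE; -- row-level exclusive lock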
Schedule
1. Serial Schedule
A serial schedule is a type of schedule where one transaction is executed completely before another transaction starts. In a serial schedule, when the first transaction completes its cycle, the next transaction is executed.
For example, suppose there are two transactions T1 and T2, each with some operations. If there is no interleaving of operations, there are the following two possible outcomes:
1. Execute all the operations of T1, followed by all the operations of T2.
2. Execute all the operations of T2, followed by all the operations of T1.
In the given figure (a), Schedule A shows the serial schedule where T1 is followed by T2.
In the given figure (b), Schedule B shows the serial schedule where T2 is followed by T1.
2. Non-serial Schedule
It contains many possible orders in which the system can execute the individual operations of
the transactions.
In the given figure (c) and (d), Schedule C and Schedule D are the non-serial schedules. It has
interleaving of operations.
3. Serializable schedule
The serializability of schedules is used to find non-serial schedules that allow the transaction to
execute concurrently without interfering with one another.
It identifies which schedules are correct when executions of the transaction have interleaving of
their operations.
A non-serial schedule will be serializable if its result is equal to the result of its transactions
executed serially.
Locks
Simplistic lock protocols are the simplest way of locking data during a transaction. Simplistic lock-based protocols require every transaction to obtain a lock on the data before inserting, deleting, or updating it, and to unlock the data item after the transaction completes.
Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs locks. Before initiating execution, the transaction requests the DBMS for locks on all those data items. If all the locks are granted, this protocol allows the transaction to begin; when the transaction completes, it releases all the locks. If any lock is not granted, the transaction rolls back and waits until all the locks are granted.
The two-phase locking protocol (2PL) divides the execution of the transaction into three parts. In the first part, when execution starts, the transaction seeks permission for the locks it requires. In the second part, the transaction acquires all the locks. The third phase starts as soon as the transaction releases its first lock; in this phase, the transaction cannot demand any new locks and only releases the acquired locks.
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks, the transaction continues to execute normally. The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock immediately after using it; it waits until the whole transaction commits and then releases all the locks at one time. Strict-2PL therefore does not have a shrinking phase of lock release.
Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing lock held by the transaction may be released,
but no new locks can be acquired
Deadlock
In a multi-process system, deadlock is an unwanted situation that arises in a shared
resource environment, where a process indefinitely waits for a resource that is held by
another process.
For example, assume a set of transactions {T0, T1, T2, ..., Tn}. T0 needs a resource X to complete its task. Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2. T2 is waiting for resource Z, which is held by T0. Thus, all the processes wait for each other to release resources. In this situation, none of the processes can finish their task. This situation is known as a deadlock.
Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the
transactions involved in the deadlock are either rolled back or restarted.
Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the
operations, where transactions are about to execute. The DBMS inspects the operations
and analyzes if they can create a deadlock situation. If it finds that a deadlock situation
might occur, then that transaction is never allowed to be executed.
There are deadlock prevention schemes that use timestamp ordering mechanism of
transactions in order to predetermine a deadlock situation.
Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data item) that is already held with a conflicting lock by another transaction, one of two possibilities may occur −
• If TS(Ti) < TS(Tj) − that is, Ti, which is requesting a conflicting lock, is older than Tj − then Ti is allowed to wait until the data item is available.
• If TS(Ti) > TS(Tj) − that is, Ti is younger than Tj − then Ti dies. Ti is restarted later with a random delay but with the same timestamp.
Wound-Wait Scheme
In this scheme, if a transaction requests to lock a resource (data item) that is already held with a conflicting lock by another transaction, one of two possibilities may occur −
• If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is, Ti wounds Tj. Tj is restarted later with a random delay but with the same timestamp.
• If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
This scheme allows the younger transaction to wait; but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item. In both cases, the transaction that enters the system at a later stage is aborted.
Deadlock Avoidance
Aborting a transaction is not always a practical approach. Instead, deadlock avoidance mechanisms can be used to detect any deadlock situation in advance. Methods like the "wait-for graph" are available, but they are suitable only for systems where transactions are lightweight and hold fewer instances of resources. In a bulky system, deadlock prevention techniques may work well.
Wait-for Graph
This is a simple method to track whether any deadlock situation may arise. For each transaction entering the system, a node is created. When a transaction Ti requests a lock on an item, say X, which is held by some other transaction Tj, a directed edge is created from Ti to Tj. If Tj releases item X, the edge between them is dropped and Ti locks the data item.
The system maintains this wait-for graph for every transaction waiting for some data
items held by others. The system keeps checking if there's any cycle in the graph.
Here, we can use any of the two following approaches −
• First, do not allow any request for an item, which is already locked by another
transaction. This is not always feasible and may cause starvation, where a
transaction indefinitely waits for a data item and can never acquire it.
• The second option is to roll back one of the transactions. It is not always feasible to roll back the younger transaction, as it may be more important than the older one. With the help of some relative algorithm, a transaction is chosen to be aborted. This transaction is known as the victim, and the process is known as victim selection.
Data Backup
Loss of Volatile Storage
A volatile storage like RAM stores all the active logs, disk buffers, and related data. In
addition, it stores all the transactions that are being currently executed. What happens
if such a volatile storage crashes abruptly? It would obviously take away all the logs and
active copies of the database. It makes recovery almost impossible, as everything that is
required to recover the data is lost.
Following techniques may be adopted in case of loss of volatile storage −
• We can have checkpoints at multiple stages so as to save the contents of the
database periodically.
• A state of active database in the volatile memory can be
periodically dumped onto a stable storage, which may also contain logs and
active transactions and buffer blocks.
• <dump> can be marked on a log file whenever the database contents are dumped from the volatile memory to stable storage.
Recovery
• When the system recovers from a failure, it can restore the latest dump.
• It can maintain a redo-list and an undo-list as checkpoints.
• It can recover the system by consulting undo-redo lists to restore the state of all
transactions up to the last checkpoint.
Database Backup & Recovery from Catastrophic Failure
A catastrophic failure is one where a stable, secondary storage device gets corrupt.
With the storage device, all the valuable data that is stored inside is lost. We have two
different strategies to recover data from such a catastrophic failure −
• Remote backup − Here a backup copy of the database is stored at a remote location from where it can be restored in case of a catastrophe.
• Alternatively, database backups can be taken on magnetic tapes and stored at a
safer place. This backup can later be transferred onto a freshly installed database
to bring it to the point of backup.
Databases that have grown large are too bulky to be backed up frequently. In such cases, we have techniques to restore a database just by looking at its logs. So all we need to do here is take a backup of all the logs at frequent intervals. The database can be backed up once a week, while the logs, being very small, can be backed up every day or as frequently as possible.
Remote Backup
Remote backup provides a sense of security in case the primary location where the
database is located gets destroyed. Remote backup can be offline or real-time or
online. In case it is offline, it is maintained manually.
Online backup systems are more real-time and lifesavers for database administrators
and investors. An online backup system is a mechanism where every bit of the real-time
data is backed up simultaneously at two distant places. One of them is directly
connected to the system and the other one is kept at a remote place as backup.
As soon as the primary database storage fails, the backup system senses the failure and
switches the user system to the remote storage. Sometimes this is so instant that the
users can’t even realize a failure.
Data Recovery
Crash Recovery
DBMS is a highly complex system with hundreds of transactions being executed every
second. The durability and robustness of a DBMS depends on its complex architecture
and its underlying hardware and system software. If it fails or crashes amid transactions,
it is expected that the system would follow some sort of algorithm or techniques to
recover lost data.
Failure Classification
To see where the problem has occurred, a failure can be divided into various
categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from
where it can’t go any further. This is called transaction failure where only a few
transactions or processes are hurt.
Reasons for a transaction failure could be −
• Logical errors − Where a transaction cannot complete because it has some code
error or any internal error condition.
• System errors − Where the database system itself terminates an active
transaction because the DBMS is not able to execute it, or it has to stop because
of some system condition. For example, in case of deadlock or resource
unavailability, the system aborts an active transaction.
System Crash
There are problems − external to the system − that may cause the system to stop
abruptly and cause the system to crash. For example, interruptions in power supply
may cause the failure of underlying hardware or software failure.
Examples may include operating system errors.
Disk Failure
In early days of technology evolution, it was a common problem where hard-disk drives
or storage drives used to fail frequently.
Disk failures include formation of bad sectors, unreachability to the disk, disk head
crash or any other failure, which destroys all or a part of disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be
divided into two categories −
• Volatile storage − As the name suggests, a volatile storage cannot survive
system crashes. Volatile storage devices are placed very close to the CPU;
normally they are embedded onto the chipset itself. For example, main memory
and cache memory are examples of volatile storage. They are fast but can store
only a small amount of information.
• Non-volatile storage − These memories are made to survive system crashes.
They are huge in data storage capacity, but slower in accessibility. Examples may
include hard-disks, magnetic tapes, flash memory, and non-volatile (battery
backed up) RAM.
Log-based Recovery
Log is a sequence of records, which maintains the records of actions performed by a
transaction. It is important that the logs are written prior to the actual modification and
stored on a stable storage media, which is failsafe.
Log-based recovery works as follows −
• The log file is kept on a stable storage media.
• When a transaction enters the system and starts execution, it writes a log record about it:
<Tn, Start>
Checkpoint
Keeping and maintaining logs in real time and in real environment may fill out all the
memory space available in the system. As time passes, the log file may grow too big to
be handled at all. Checkpoint is a mechanism where all the previous logs are removed
from the system and stored permanently in a storage disk. Checkpoint declares a point
before which the DBMS was in consistent state, and all the transactions were
committed.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the
following manner −
• The recovery system reads the logs backwards from the end to the last
checkpoint.
• It maintains two lists, an undo-list and a redo-list.
• If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it puts the transaction in the redo-list.
• If the recovery system sees a log with <Tn, Start> but no commit or abort record, it puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed; all the transactions in the redo-list are redone and their previous logs are removed.
Database Recovery:
The database is prone to failures due to inconsistency, network failure, errors, or accidental damage. So database recovery techniques are highly important for bringing a database back into a working state after a failure. Four different recovery techniques are available for the database.
Immediate Backup
Immediate backups are kept on floppy disks, hard disks, or magnetic tapes. These come in handy when a technical fault occurs in the primary database, such as a system failure, disk crash, or network failure. Damage due to virus attacks can be repaired using the immediate backup.
Archival Backup
Archival backups are kept on mass storage devices such as magnetic tape, CD-ROMs, Internet servers, etc. They are very useful for recovering data after a disaster such as a fire, earthquake, or flood. An archival backup should be kept at a site different from the one where the system is functioning; kept at a separate place, it remains safe from theft and intentional destruction by user staff.
4. Shadow Paging:
This technique can be used for data recovery instead of transaction logs. In shadow paging, a database is divided into several fixed-size disk pages, say n; a current directory is then created, having n entries, with each entry pointing to a disk page in the database. The current directory is transferred to main memory.
When a transaction begins executing, the current directory is copied into a shadow directory, and the shadow directory is saved on disk. The transaction itself uses the current directory: during transaction execution, all modifications are made through the current directory, while the shadow directory is never modified.