Unit-4 DBMS Material
Query processing
Deadlock
A deadlock is a condition in which two or more transactions are executing and each
transaction is waiting for another to release a resource, so none of them can ever finish. All the
transactions wait indefinitely and not a single transaction completes.
Wait-for graph
[Figure: wait-for graph showing Table 1 held by Transaction 1 and waited for by Transaction 2]
In the above figure there are two transactions, 1 and 2, and two tables, table 1 and
table 2.
Transaction 1 holds table 1 and waits for table 2. Transaction 2 holds table 2 and waits for
table 1.
Table 1 is wanted by transaction 2 but is held by transaction 1, and in the same way
table 2 is wanted by transaction 1 but is held by transaction 2. Neither transaction can
proceed until it gets the table it is waiting for. The diagram of these dependencies is called
a wait-for graph, because it shows which transaction has to wait for which other transaction.
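A deadlock corresponds to a cycle in the wait-for graph. The following is a minimal sketch of cycle detection, assuming the graph is stored as a dictionary from each transaction to the set of transactions it waits for. The function name has_deadlock and the labels T1 and T2 are illustrative and not part of any particular DBMS.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for maps each transaction to the set of transactions it waits for.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(txn):
        colour[txn] = GREY
        for other in wait_for.get(txn, ()):
            state = colour.get(other, WHITE)
            if state == GREY:
                # We came back to a transaction on the current path: a cycle.
                return True
            if state == WHITE and visit(other):
                return True
        colour[txn] = BLACK
        return False

    return any(colour.get(t, WHITE) == WHITE and visit(t) for t in wait_for)


# Transaction 1 waits for transaction 2 (holder of table 2) and vice versa.
graph = {"T1": {"T2"}, "T2": {"T1"}}
print(has_deadlock(graph))
```

If one of the two edges is removed, the graph has no cycle and the function reports no deadlock.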
When a deadlock occurs
A deadlock occurs when two separate processes compete for resources that are held by one
another.
Deadlocks can occur in any concurrent system where processes wait for each other and a
cyclic chain can arise, with each process waiting for the next one in the chain.
Deadlock can occur in any system that satisfies the following four conditions:
1. Mutual Exclusion Condition: only one process at a time can use a resource, that is,
each resource is either assigned to exactly one process or is available.
2. Hold and Wait Condition: processes already holding resources may request new
resources.
3. No Preemption Condition: a resource can be released only voluntarily by the process
holding it, after that process has completed its task. Previously granted resources
cannot be forcibly taken away from any process.
4. Circular Wait Condition: two or more processes form a circular chain where each
process requests a resource that the next process in the chain holds.
[Figure: two resource allocation graphs with processes P1, P2, P3 and resources R1, R2, R3, R4]
Deadlock Recovery:
When a detection algorithm determines that a deadlock exists, several alternatives are
available. One possibility is to inform the operator about the deadlock and let the operator
deal with the problem manually.
Another possibility is to let the system recover from the deadlock automatically. Two
options are mainly used to break the deadlock.
Process termination
Kill a Process
In this approach, the process responsible for the deadlock is terminated. However, selecting
which process to kill can be difficult. As a result, the system primarily kills a process that
is no longer doing useful work.
Rollback
The system can be rolled back to an earlier safe state. This requires checkpoints to be
recorded at every state.
When a deadlock is identified, all allocations are rolled back and the system returns to the
previous safe state.
Situation                       Wait/Die    Wound/Wait
O needs a resource held by Y    O waits     Y dies
Y needs a resource held by O    Y dies      Y waits
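The table can be read as a decision rule based on transaction timestamps. The sketch below assumes that O is the older transaction, Y is the younger one, and a smaller timestamp means an older transaction. The function name resolve and its return strings are illustrative only.

```python
def resolve(scheme, requester_ts, holder_ts):
    """Decide what happens when a requester needs a resource the holder owns.

    Assumption: a smaller timestamp means an older transaction.
    """
    requester_is_older = requester_ts < holder_ts
    if scheme == "wait-die":
        # Older requester waits; younger requester is rolled back (dies).
        return "requester waits" if requester_is_older else "requester dies"
    if scheme == "wound-wait":
        # Older requester wounds (aborts) the younger holder;
        # younger requester waits.
        return "holder dies" if requester_is_older else "requester waits"
    raise ValueError(f"unknown scheme: {scheme}")


# O has timestamp 1 (older), Y has timestamp 2 (younger).
for scheme in ("wait-die", "wound-wait"):
    print(scheme, "| O requests from Y:", resolve(scheme, 1, 2))
    print(scheme, "| Y requests from O:", resolve(scheme, 2, 1))
```

In both schemes the transaction that is rolled back is always the younger one, which is why neither scheme can produce a cycle of waiting transactions.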
resources may be allocated and remain unused for long periods. Also, a process
requiring a popular resource may have to wait indefinitely; as such a resource may
always be allocated to some process, resulting in resource starvation.
3. No Preemption Condition: If a process that is holding some resources requests
another resource that cannot be immediately allocated to it, then all resources
currently being held are released. Preempted resources are added to the list of
resources for which the process is waiting. The process will be restarted only when it can
regain its old resources, as well as the new ones that it is requesting.
4. Circular Wait Condition: Approaches that avoid circular waits include disabling
interrupts during critical sections and using a hierarchy to determine a partial
ordering of resources, with every process requesting resources in increasing order of
that enumeration.
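The resource-ordering approach to avoiding circular wait can be sketched as follows. This is a minimal illustration under assumed names: the resources table1 and table2, their ranks, and the helper functions are hypothetical. Every thread sorts the resources it needs by rank before locking them, so two threads can never hold locks in opposite orders.

```python
import threading

# Hypothetical resources, each given a fixed rank in one global ordering.
locks = {"table1": threading.Lock(), "table2": threading.Lock()}
rank = {"table1": 1, "table2": 2}


def acquire_in_order(names):
    """Acquire the named locks in increasing rank and return that order."""
    ordered = sorted(names, key=rank.__getitem__)
    for name in ordered:
        locks[name].acquire()
    return ordered


def release_all(ordered):
    """Release locks in the reverse of the order they were acquired."""
    for name in reversed(ordered):
        locks[name].release()


def run_transaction(names, log, label):
    held = acquire_in_order(names)
    try:
        log.append(label)  # stand-in for the real work on both tables
    finally:
        release_all(held)


log = []
# The two transactions ask for the tables in opposite orders.
t1 = threading.Thread(target=run_transaction, args=(["table1", "table2"], log, "T1"))
t2 = threading.Thread(target=run_transaction, args=(["table2", "table1"], log, "T2"))
t1.start()
t2.start()
t1.join()
t2.join()
print(sorted(log))
```

Both threads finish because each one locks table1 before table2, regardless of the order in which the tables were requested.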
Transaction – T0      Transaction – T1
Read (A)              Read (C)
A = A - 100           C = C - 200
Write (A)             Write (C)
Read (B)
B = B + 100
Write (B)
The following figure shows the transaction log for above two transactions at three
different instances of time.
[Figure: transaction log table with columns Time, Tc, Tf and rows T1, T2, T3, T4]
updated or none of them, so that the databases remain synchronized.
• In the two phase commit protocol, one node acts as the coordinator and all
other participating nodes are known as cohorts or participants.
• Coordinator – the component that coordinates with all the participants.
• Cohorts (Participants) – every node other than the coordinator is a participant.
• As the name suggests, the two phase commit protocol involves two phases.
1. The first phase is the Commit Request phase, or phase 1.
2. The second phase is the Commit phase, or phase 2.
Commit Request Phase (Obtaining Decision)
To commit the transaction, the coordinator sends a “ready to commit” request
to each cohort.
The coordinator then waits until it has received a reply (a “vote”) on the request
from every cohort.
Each participant votes by sending a message back to the coordinator as follows:
It votes YES if it is prepared to commit.
It may vote NO for any reason, typically because it cannot prepare the transaction due
to a local failure.
It may delay its vote because the cohort is busy with other work.
Commit Phase (Performing Decision)
If the coordinator receives a YES response from all cohorts, it decides to commit. The
transaction is now officially committed. Otherwise, it has either received a NO response or
given up waiting for some cohort, so it decides to abort.
The coordinator sends its decision (COMMIT or ABORT) to all participants.
Participants acknowledge receipt of the commit or abort decision by replying DONE.
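The two phases can be sketched from the coordinator's point of view as follows. This is a simplified illustration, not a full protocol implementation: messages are modelled as plain dictionary values, a vote of None is assumed to stand for a cohort that did not reply before the coordinator gave up waiting, and the function name two_phase_commit is hypothetical.

```python
def two_phase_commit(votes):
    """Run one round of the protocol from the coordinator's side.

    votes maps cohort name -> "YES", "NO", or None (no reply in time).
    Returns the decision and the acknowledgements from the cohorts.
    """
    # Phase 1 (commit request): the coordinator has asked every cohort to
    # prepare and now inspects the votes it collected.
    all_yes = all(vote == "YES" for vote in votes.values())

    # Phase 2 (commit): commit only if every cohort voted YES; a NO vote
    # or a missing vote leads to abort.
    decision = "COMMIT" if all_yes else "ABORT"

    # Every cohort receives the same decision and replies DONE.
    acks = {cohort: "DONE" for cohort in votes}
    return decision, acks


print(two_phase_commit({"node1": "YES", "node2": "YES"})[0])
print(two_phase_commit({"node1": "YES", "node2": "NO"})[0])
print(two_phase_commit({"node1": "YES", "node2": None})[0])
```

The key property is that all cohorts receive the same decision, so either all of them commit or none of them do.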
Explain Shadow Paging Technique.
Concept
Shadow paging is an alternative to transaction-log based recovery techniques.
Here, the database is considered to be made up of fixed size disk blocks, called pages. These
pages are mapped to physical storage using a table, called the page table.
The page table is indexed by the page number of the database. The information about the
physical pages in which database pages are stored is kept in this page table.
This technique is similar to the paging technique used by operating systems to allocate
memory, particularly to manage virtual memory.
The following figure depicts the concept of shadow paging.
Execution of Transaction
During the execution of the transaction, two page tables are maintained.
1. Current Page Table: Used to access data items during transaction execution.
2. Shadow Page Table: Original page table, and will not get modified during
transaction execution.
Whenever any page is about to be written for the first time:
1. A copy of this page is made onto a free page.
2. The current page table is made to point to the copy.
3. The update is made on this copy.
At the start of the transaction, both tables are the same and point to the same pages.
The shadow page table is never changed, and is used to restore the database if
a failure occurs. However, current page table entries may change
during transaction execution, as it is used to record all updates made to the
database.
When the transaction completes, the current page table becomes the shadow
page table. At this time, the transaction is considered to have committed.
The following figure explains the working of this technique.
As shown in this figure, two pages (pages 2 and 5) are affected by a
transaction and copied to new physical pages. The current page table
points to these pages.
The shadow page table continues to point to the old pages, which are not
changed by the transaction. So, this table and these pages are used for undoing the
transaction.
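The copy-on-first-write behaviour described above can be sketched in memory as follows. This is only an illustration under simplifying assumptions: the disk is modelled as a dictionary, free pages are handed out by a counter, and the class name ShadowPagedDB and its methods are hypothetical. A real system would also have to write the page tables to stable storage.

```python
class ShadowPagedDB:
    """Toy model of shadow paging with a single active transaction."""

    def __init__(self, pages):
        self.disk = dict(enumerate(pages))       # physical page -> contents
        self.shadow = {i: i for i in self.disk}  # logical page -> physical page
        self.current = dict(self.shadow)         # starts identical to shadow
        self.next_free = len(self.disk)          # next unused physical page

    def read(self, page_no):
        return self.disk[self.current[page_no]]

    def write(self, page_no, value):
        if self.current[page_no] == self.shadow[page_no]:
            # First write to this page: copy it to a free physical page
            # and make the current page table point to the copy.
            self.disk[self.next_free] = self.disk[self.current[page_no]]
            self.current[page_no] = self.next_free
            self.next_free += 1
        self.disk[self.current[page_no]] = value  # update the copy only

    def commit(self):
        # The current page table becomes the new shadow page table.
        self.shadow = dict(self.current)

    def abort(self):
        # Discard the changes by going back to the shadow page table.
        self.current = dict(self.shadow)


db = ShadowPagedDB(["a", "b", "c"])
db.write(1, "B")
print(db.read(1))   # value seen inside the transaction
db.abort()
print(db.read(1))   # original value restored from the shadow table
```

Note that the physical page used by the aborted write is never reclaimed in this sketch, which corresponds to the garbage collection cost of the technique.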
Advantages
No overhead of maintaining a transaction log.
Recovery is faster, as no redo or undo operations are required.
Disadvantages
Copying the entire page table is very expensive.
Data are scattered or fragmented.
After each transaction, free pages need to be collected by a garbage collector.
It is difficult to extend this technique to allow concurrent transactions.
Query processing includes the translation of high level queries into low level expressions that can
be used at the physical level of the file system, query optimization, and the actual execution of the
query to get the result.
Block Diagram of Query Processing is as:
It is done in the following steps:
Step-1:
Parser: During the parse call, the database converts the query into relational algebra and then
performs the following checks: syntax check, semantic check and shared pool check.
The checks are as follows (refer to the detailed diagram):
1. Syntax check – determines whether the SQL is syntactically valid. Example:
SELECT * FORM employee
Here the misspelling of FROM is reported by this check.
2. Semantic check – determines whether the statement is meaningful or not. Example:
a query that contains a table name which does not exist is caught by this check.
3. Shared pool check – every query is given a hash code during its execution. This check
determines whether that hash code already exists in the shared pool. If it does, the
database will not take the additional steps for optimization and execution.
Hard Parse and Soft Parse –
If the query is new and its hash code does not exist in the shared pool, the query has to
pass through additional steps; this is known as hard parsing. If the hash code already exists, the
query does not pass through the additional steps and goes directly to the execution engine (refer to the
detailed diagram). This is known as soft parsing.
Hard parse includes the following steps – Optimizer and Row source generation.
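The difference between a hard parse and a soft parse can be sketched as a cache lookup keyed by a hash of the query text. This is an illustration only: the dictionary shared_pool, the function parse, the choice of SHA-256 as the hash, and the placeholder plan string are all assumptions, not the internal structures of any real database.

```python
import hashlib

# Hypothetical shared pool: hash of the SQL text -> cached execution plan.
shared_pool = {}


def parse(sql):
    """Return ("soft parse" or "hard parse", plan) for the given SQL text."""
    key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    if key in shared_pool:
        # Hash code found: reuse the stored plan and skip optimization.
        return "soft parse", shared_pool[key]
    # Hash code not found: run the extra steps (optimizer and row source
    # generation), represented here by a placeholder string.
    plan = f"plan for: {sql}"
    shared_pool[key] = plan
    return "hard parse", plan


query = "SELECT * FROM employee"
print(parse(query)[0])  # first time the query is seen
print(parse(query)[0])  # same text again
```

Because the lookup is on the query text, two statements that differ even slightly in their text hash to different keys in this sketch and are parsed separately.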
Step-2:
Optimizer: During the optimization stage, the database must perform a hard parse at least once for
every unique DML statement, and it performs optimization during this parse. The database never
optimizes DDL unless it includes a DML component, such as a subquery, that requires optimization.
Optimization is a process in which multiple execution plans for satisfying a query are examined and
the most efficient plan is selected for execution.
The database catalog stores the execution plans, and the optimizer then passes the lowest cost plan on
for execution.
Row Source Generation –
The row source generator is software that receives the optimal execution plan from the
optimizer and produces an iterative execution plan that is usable by the rest of the database.
The iterative plan is a binary program that, when executed by the SQL engine, produces the
result set.
Step-3:
Execution Engine: Finally, it runs the query and displays the required result.
Query optimization is the process of selecting an efficient execution plan for evaluating the query.
After parsing, the parsed query is passed to the query optimizer, which generates different execution
plans to evaluate the parsed query and selects the plan with the least estimated cost.
The catalog manager helps the optimizer choose the best plan by supplying the information needed to
estimate the cost of each plan.
Query optimization is used for accessing the database in an efficient manner. It is the art of obtaining
desired information in a predictable, reliable and timely manner. Formally, query optimization is defined as the
process of transforming a query into an equivalent form which can be evaluated more efficiently. The
essence of query optimization is to find an execution plan that minimizes the time needed to evaluate a query.
To achieve this optimization goal, we need to accomplish two main tasks. The first is to find the best
plan and the second is to reduce the time involved in executing the query plan.
Query processing in a DBMS has three different phases, which are as follows:
Parsing and translation
Optimization
Evaluation.
Usually, user queries are submitted to DBMS as SQL queries. During the parsing and translation phase, the
given query is translated into its internal form. In generating the internal form of the query, the parser
checks the syntax of the user's query, verifies that the relation names appearing in the query are names of
the relations in the database and so on. The system constructs a parse tree representation of the query,
which it then translates into a relational algebra expression.
This query is then translated into either of the following relational algebra expressions as follows:-
After parsing and translation into a relational algebra expression, the query is transformed into a form,
usually a query tree or graph, that can be handled by the optimization engine.
During the optimization phase, the optimization engine performs various analyses on the query data. It
applies various rules to the internal data structures of the query to transform these structures into
equivalent and efficient representations. It then generates valid evaluation plans based upon the rules
applied. From the generated evaluation plans, the best evaluation plan is determined and
passed on to the query execution engine. The final phase in processing a query is the evaluation phase.
During the evaluation phase, the best evaluation plan chosen by the optimization engine is
executed.
The next step is an optimization step that transforms the initial algebraic query, using relational algebra
transformations, into other algebraic queries until the best one is found. A query execution plan is then
formed. Represented as a query tree, it includes information about the access method available for
each relation as well as the algorithms used in computing the relational operations in the tree. The next
step is the code generator, where code is generated for the selected query execution plan. This code is
then executed by the run time database processor to produce the query result. The run time database
processor has the task of running the query code, whether in compiled or interpreted mode, to produce
the query result. If a run time error results, an error message is generated by the run time database
processor.
Query optimization is the process of choosing the most efficient or the most favorable way of executing
an SQL statement. Query optimization is the art and science of applying rules to rewrite the tree of
operators that is invoked in a query so as to produce an optimal plan. A plan is said to be optimal if it
returns the answer in the least time or by using the least space.
Cost-Based Optimization:
For a given query and environment, the Optimizer assigns a numerical cost to
each step of a possible plan and then adds these values together to get a cost estimate for the plan, that is, for
the possible strategy. After calculating the costs of all possible plans, the Optimizer chooses the plan
with the lowest cost estimate. For that reason, the Optimizer is sometimes
referred to as the Cost-Based Optimizer. Below are some of the features of cost-based optimization:
1. Cost-based optimization is based on the cost of the query to be optimized.
2. The query can use many different paths depending on the available indexes, sorting methods, constraints,
etc.
3. The aim of query optimization is to choose the most efficient path for implementing the query at the
lowest possible cost.
4. The query Optimizer needs to provide the cost of executing each algorithm so that the most
suitable one can be selected for an operation.
5. The cost of an algorithm also depends upon the cardinality of the input.
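The selection of the lowest cost plan can be sketched as follows. The plan names and per-step cost figures are invented for illustration and do not come from any real optimizer; the function name cheapest_plan is also hypothetical. The cost of a plan is taken to be the sum of the costs of its steps, as described above.

```python
def cheapest_plan(plans):
    """Return (name, total cost) of the plan with the lowest estimated cost.

    plans maps a plan name to a list of per-step cost estimates.
    """
    # Add the step costs together to get one estimate per plan.
    totals = {name: sum(steps) for name, steps in plans.items()}
    best = min(totals, key=totals.get)
    return best, totals[best]


# Invented example figures, in arbitrary cost units.
candidate_plans = {
    "full table scan, then sort": [100.0, 40.0],
    "index scan": [12.0, 5.0],
}
print(cheapest_plan(candidate_plans))
```

A real optimizer usually cannot enumerate every possible plan, so it searches only part of the plan space; the comparison step itself works as in this sketch.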
Cost Estimation:
To estimate the cost of the different available execution plans or execution strategies, the query tree is
viewed and studied as a data structure that contains a series of basic operations which are linked in order
to perform the query. The cost of each operation in the query depends on the way in
which the operation is carried out and on its selectivity, that is, the proportion of its input that forms the output. It is
also important to know the expected cardinality of an operation's output. The cardinality of the output is
very important because it forms the input to the next operation.
The cost of optimization of the query depends upon the following:
1. Cardinality-
Cardinality is the number of rows that are returned by performing the operations
specified by the query execution plan. The estimates of the cardinality must be accurate, as they strongly
affect all the possibilities of the execution plan.
2. Selectivity-
Selectivity refers to the proportion of rows that are selected. Whether a particular row of a table
is selected depends on the condition: a row is selected when it satisfies
the condition. The condition to be satisfied can be anything,
depending upon the situation.
3. Cost-
Cost refers to the estimated amount of work needed to execute a plan. The measure of cost
depends upon the work done or the number of resources used, such as disk I/O, CPU time and memory.
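The relationship between selectivity and cardinality can be sketched with a small calculation. The sketch assumes that selectivity is expressed as a fraction between 0 and 1 and that the conditions are independent of each other, so their selectivities can be multiplied. That independence is a common simplification and does not hold for every data set. The function name and the figures are illustrative.

```python
def estimate_cardinality(input_rows, selectivities):
    """Estimate the output rows after applying conditions in sequence.

    Assumption: each selectivity is a fraction in [0, 1] and the
    conditions are independent of each other.
    """
    rows = float(input_rows)
    for s in selectivities:
        if not 0.0 <= s <= 1.0:
            raise ValueError("selectivity must be between 0 and 1")
        # The output of one operation is the input of the next.
        rows *= s
    return rows


# 10000 input rows; the first condition keeps 25%, the second keeps 50%.
print(estimate_cardinality(10000, [0.25, 0.5]))
```

A poor selectivity estimate early in the plan affects every later operation, because each estimated output becomes the input of the next operation.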