Ch-2 (B) Overview of Query Processing
2.2 Query Processing
The activities involved in retrieving data from the database.
In declarative languages such as SQL, which are suitable for
human use but ill suited as the system's internal
representation of a query:
The user specifies what data is required rather than how it is
to be retrieved.
This gives the responsibility for selecting the best strategy to
the DBMS,
Which prevents users from choosing strategies that are
known to be inefficient
And gives the DBMS more control over system performance
Cont…
The aims of query processing
To transform a query written in a high-level language into
a low-level language (implementing the relational algebra)
To determine the strategy that is the most cost-effective and
efficient.
To execute the strategy to retrieve the required data.
[Figure: Phases of query processing. At compile time, Query Decomposition (using the system catalog) produces a relational algebra expression (an intermediate form of the query); Query Optimization (using database statistics) produces an execution plan; Code Generation produces the generated code. At runtime, Query Execution runs the generated code against the main databases and returns the query output.]
2.2.1 Query Decomposition
The aims of query decomposition
To transform a high-level query into a relational algebra
query.
To check that the query is syntactically and semantically
correct.
The typical stages of query decomposition are:
Analysis
Normalization
Semantic analysis
Simplification and
Query restructuring
1) Analysis
In this stage,
The query is lexically and syntactically analyzed using the
techniques of programming language compilers.
Verifies that the relations and attributes specified in the
query are defined in the system catalog.
Verifies that any operations applied to database objects are
appropriate for the object type.
Checks that the operations on attributes do not conflict
with the types of the attributes, e.g., a comparison >
operation applied to an attribute of type string.
Transforms the query into some internal representation
1) Analysis
Example:
Table Name: Staff
StaffNo fName lName Position Sex DOB Salary branchNo
Cont…
On completion of this stage, the high-level query has been
transformed into some internal representation that is more
suitable for processing.
The internal form that is typically chosen is some kind of tree,
which is constructed as follows:
A leaf node is created for each base relation in the query
A non-leaf node is created for each intermediate relation
produced by a relational algebra operation.
The root of the tree represents the result of the query
The sequence of operations is derived from the leaves to the
root
Cont…
Example: Find all managers who work at a London branch
SELECT * FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND (s.position = 'Manager' AND b.city = 'London');
[Figure: Example relational algebra tree. The leaves are the base relations Staff and Branch; the root is the join ⋈s.branchNo=b.branchNo. After restructuring, the tree corresponds to σposition='Manager'(Staff) ⋈Staff.branchNo=Branch.branchNo σcity='London'(Branch).]
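To make the tree representation concrete, here is a minimal sketch in Python (the Node class and field names are illustrative, not any DBMS's actual internals) that builds the tree for the example query: leaf nodes for the base relations, non-leaf nodes for the selections and the join, and the root representing the result.

    # A minimal sketch of the internal tree representation described above.
    # Node names and fields are illustrative, not a particular DBMS's API.
    class Node:
        def __init__(self, op, children=(), detail=None):
            self.op = op                  # 'relation', 'select', 'join', ...
            self.children = list(children)
            self.detail = detail          # relation name or predicate text

    # Leaf nodes: one per base relation in the query
    staff  = Node('relation', detail='Staff')
    branch = Node('relation', detail='Branch')

    # Non-leaf nodes: one per relational algebra operation
    sel_staff  = Node('select', [staff],  "position = 'Manager'")
    sel_branch = Node('select', [branch], "city = 'London'")

    # Root: the join whose result is the answer to the query
    root = Node('join', [sel_staff, sel_branch],
                "Staff.branchNo = Branch.branchNo")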
2) Normalization
Converts the query into a normalized form that can be more
easily manipulated.
There are two different normal forms: conjunctive normal
form and disjunctive normal form.
Conjunctive normal form
A sequence of conjuncts that are connected with the
∧ (AND) operator.
Each conjunct contains one or more terms connected by the
∨ (OR) operator.
(p11 ∨ p12 ∨ ··· ∨ p1n) ∧ ··· ∧ (pm1 ∨ pm2 ∨ ··· ∨ pmn)
A conjunctive selection contains only those tuples that satisfy all
conjuncts
Cont…
Disjunctive normal form
A sequence of disjuncts that are connected with the ∨ (OR)
operator.
Each disjunct contains one or more terms connected by the
∧ (AND) operator.
(p11 ∧ p12 ∧ ··· ∧ p1n) ∨ ··· ∨ (pm1 ∧ pm2 ∧ ··· ∧ pmn)
A disjunctive selection contains those tuples formed by the union
of all tuples that satisfy the disjuncts.
Example:
(position='Manager' ∨ salary > 20000) ∧ branchNo = 'B003'    (conjunctive normal form)
(position='Manager' ∧ branchNo = 'B003') ∨ (salary > 20000 ∧ branchNo = 'B003')    (disjunctive normal form)
Cont…
Example: Consider the following query:
Find the names of employees who have been working on
project P1 for 12 or 24 months.
The query in SQL:
SELECT ENAME FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO AND ASG.PNO = 'P1' AND DUR = 12 OR DUR = 24
The qualification in conjunctive normal form:
EMP.ENO=ASG.ENO ∧ ASG.PNO='P1' ∧ (DUR=12 ∨ DUR=24)
The qualification in disjunctive normal form:
(EMP.ENO=ASG.ENO ∧ ASG.PNO='P1' ∧ DUR=12) ∨
(EMP.ENO=ASG.ENO ∧ ASG.PNO='P1' ∧ DUR=24)
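As a rough illustration of normalization (assuming the sympy library is available; it is not part of these slides), the sketch below models each atomic predicate of the EMP/ASG example as a Boolean symbol and converts the qualification into both normal forms:

    # Sketch: normalizing a qualification with sympy's Boolean algebra helpers.
    from sympy import symbols
    from sympy.logic.boolalg import to_cnf, to_dnf

    # Symbols standing for the atomic predicates of the example query
    join_pred, pno_p1, dur_12, dur_24 = symbols('join_pred pno_p1 dur_12 dur_24')

    # The intended meaning of "for 12 or 24 months" groups the DUR terms
    qualification = join_pred & pno_p1 & (dur_12 | dur_24)

    print(to_cnf(qualification))  # conjunction of disjuncts (conjunctive normal form)
    print(to_dnf(qualification))  # disjunction of conjuncts (disjunctive normal form)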
3) Semantic Analysis
Applied to normalized queries
Rejects contradictory queries:
Qualification condition cannot be satisfied by any tuple
Rejects incorrectly formulated queries:
Condition components do not contribute to generation of the
result.
A query is contradictory if its predicate cannot be satisfied
by any tuple.
Example:
(position='Manager' ∧ position='Assistant') ∨ salary > 20000
The first disjunct is contradictory (no tuple can have both positions), so it is
equivalent to False ∨ salary > 20000, which could be simplified to (salary > 20000).
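A small sketch of the contradiction check, again assuming sympy is available: if a conjunct (together with the obvious domain constraint that a tuple has only one position) is unsatisfiable, it is contradictory and can be replaced by False.

    # Sketch: detecting a contradictory conjunct with a satisfiability check.
    from sympy import symbols
    from sympy.logic.inference import satisfiable

    is_manager, is_assistant, high_salary = symbols('is_manager is_assistant high_salary')

    # position = 'Manager' and position = 'Assistant' cannot both hold;
    # that domain constraint is encoded explicitly as mutual exclusion.
    conjunct  = is_manager & is_assistant
    exclusion = ~(is_manager & is_assistant)

    if not satisfiable(conjunct & exclusion):
        print("contradictory conjunct: replace it with False")

    # The full qualification, False OR high_salary, then reduces to high_salary.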
4) Simplification
The objectives of the simplification stage are:
To detect redundant qualifications,
To eliminate common subexpressions, and
To transform the query to a semantically equivalent but
more easily and efficiently computed form.
Typically:
Access restrictions,
View definitions, and
Integrity constraints are considered at this stage, some of
which may also introduce redundancy.
If the user does not have the appropriate access to all the
components of the query, the query must be rejected.
Cont…
View definition
CREATE VIEW Staff3 AS SELECT staffNo, fName, lName,
salary, branchNo FROM Staff WHERE branchNo=‘B003’;
Cont…
Assuming that the user has the appropriate access privileges, an
initial optimization is to apply the well-known idempotency rules of
Boolean algebra, such as:
● p ∧ p = p        ● p ∧ false = false    ● p ∧ true = p
● p ∧ ¬p = false   ● p ∧ (p ∨ q) = p      ● p ∨ p = p
● p ∨ false = p    ● p ∨ true = true
● p ∨ ¬p = true    ● p ∨ (p ∧ q) = p
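These are exactly the rules a Boolean simplifier applies; a quick sketch, assuming sympy is available:

    # Sketch: the rules above applied automatically by a Boolean simplifier.
    from sympy import symbols
    from sympy.logic.boolalg import simplify_logic

    p, q = symbols('p q')

    print(simplify_logic(p & (p | q)))   # p        (p AND (p OR q) = p)
    print(simplify_logic(p | (p & q)))   # p        (p OR (p AND q) = p)
    print(simplify_logic(p & ~p))        # False    (p AND NOT p = false)
    print(simplify_logic(p | ~p))        # True     (p OR NOT p = true)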
Generally, in view resolution:
The view select-list is translated into the corresponding select-list of the
view-defining query,
The from-list of the query is modified to hold the names of the base
tables,
Qualifications from the WHERE clauses are combined, and
GROUP BY and HAVING clauses are modified.
5) Query restructuring
The final stage of query decomposition
The query is restructured to provide a more efficient
implementation.
Rewriting a query using relational algebra operators
Modifying the relational algebra expression to provide a more
efficient implementation
Heuristic approach: uses transformation rules to convert
one relational algebra expression into an equivalent form that
is known to be more efficient.
Example:
Apply the 'Select' before the 'Join' when finding managers working at a London branch.
Transformation Rules
Used in restructuring the query
In listing these rules, we use three relations R, S, and T, with
R defined over the attributes A = {A1, A2, …, An},
S defined over the attributes B = {B1, B2, …, Bn},
p, q, and r denoting predicates, and
L, L1, L2, M, M1, M2, and N denoting sets of attributes.
1. Conjunctive selection operations can cascade into individual
selection operations (and vice versa)
σp∧q∧r(R) = σp(σq(σr(R)))
Example
σbranchNo='B003' ∧ salary>15000(Staff) = σbranchNo='B003'(σsalary>15000(Staff))
Cont…
2. Commutativity of selection operations
σp(σq(R)) = σq(σp(R))
Example
σbranchNo='B003'(σsalary>15000(Staff)) = σsalary>15000(σbranchNo='B003'(Staff))
3. In a sequence of projection operations, only the last in the sequence
is required.
∏L∏M ··· ∏N(R) = ∏L(R)
Example
∏lName(∏branchNo,lName(Staff)) = ∏lName(Staff)
Cont…
4. Commutativity of selection and projection
If the predicate p involves only the attributes in the projection list,
then the selection and projection operations commute:
∏A1,…,Am(σp(R)) = σp(∏A1,…,Am(R))
Example
∏fName,lName(σlName='Beech'(Staff)) = σlName='Beech'(∏fName,lName(Staff))
5. Commutativity of Theta join (and Cartesian product)
R ⋈p S = S ⋈p R        R × S = S × R
Example:
σposition='Manager'(Staff) ⋈Staff.branchNo=Branch.branchNo σcity='London'(Branch)
= σcity='London'(Branch) ⋈Staff.branchNo=Branch.branchNo σposition='Manager'(Staff)
Cont…
10. Commutativity of projection and union
∏L(R ∪ S) = ∏L(R) ∪ ∏L(S)
11. Associativity of Theta join (and Cartesian product)
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
(R × S) × T = R × (S × T)
12. Associativity of union and intersection (but not set difference)
(R ∪ S) ∪ T = R ∪ (S ∪ T)
(R ∩ S) ∩ T = R ∩ (S ∩ T)
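As a small self-contained check of rules 1 and 4 (a sketch over made-up sample tuples, not tied to any DBMS), selections and projections over Python dictionaries behave exactly as the rules state:

    # Sketch: checking transformation rules 1 and 4 on a toy Staff relation.
    staff = [
        {'staffNo': 'SG5',  'lName': 'Brand', 'position': 'Manager',   'salary': 24000, 'branchNo': 'B003'},
        {'staffNo': 'SG37', 'lName': 'Beech', 'position': 'Assistant', 'salary': 12000, 'branchNo': 'B003'},
        {'staffNo': 'SL21', 'lName': 'White', 'position': 'Manager',   'salary': 30000, 'branchNo': 'B005'},
    ]

    def select(pred, rel):            # sigma
        return [t for t in rel if pred(t)]

    def project(attrs, rel):          # pi (duplicates ignored for simplicity)
        return [{a: t[a] for a in attrs} for t in rel]

    # Rule 1: a conjunctive selection equals a cascade of selections
    assert select(lambda t: t['branchNo'] == 'B003' and t['salary'] > 15000, staff) == \
           select(lambda t: t['branchNo'] == 'B003',
                  select(lambda t: t['salary'] > 15000, staff))

    # Rule 4: selection and projection commute when the predicate
    # uses only projected attributes
    assert project(['lName'], select(lambda t: t['lName'] == 'Beech', staff)) == \
           select(lambda t: t['lName'] == 'Beech', project(['lName'], staff))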
Cont…
Example: To justify the importance of query restructuring
Find all managers who work at a London branch
SELECT * FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND (s.position = 'Manager' AND b.city = 'London');
The equivalent relational algebra queries corresponding to this SQL
statement are:
1. σ(position='Manager') ∧ (city='London') ∧ (Staff.branchNo=Branch.branchNo)(Staff × Branch)
2. σ(position='Manager') ∧ (city='London')(Staff ⋈Staff.branchNo=Branch.branchNo Branch)
3. σposition='Manager'(Staff) ⋈Staff.branchNo=Branch.branchNo σcity='London'(Branch)
Cont…
Assume there are:
1000 tuples in Staff and 50 tuples in Branch,
50 managers (one for each branch), and
5 London branches.
We compare these queries based on the number of disk
accesses required.
Assume there are no indexes or sort keys on either relation
and that the results of any intermediate operations are stored on
disk.
Assume tuples are accessed one at a time, and
Main memory is large enough to process entire relations for
each relational algebra operation.
Cont…
The first query calculates the Cartesian product of Staff and
Branch, which requires:
(1000 + 50) disk accesses to read the relations,
(1000 * 50) disk accesses to write the intermediate relation of
(1000 * 50) tuples that is the result of the Cartesian product, and
Another (1000 * 50) disk accesses to read each of
these tuples again to test them against the selection
predicate,
Giving a total cost of (1000 + 50) + 2*(1000*50) = 101,050
disk accesses.
Cont…
The second query joins Staff and Branch on the branch
number branchNo, which requires:
(1000 + 50) disk accesses to read each of the relations.
The join of the two relations has 1000 tuples, one for each
member of staff (a member of staff can only work at one
branch), so writing this intermediate result requires 1000 disk accesses.
The selection operation then requires 1000 disk accesses to read
the result of the join back,
Giving a total cost of 2*1000 + (1000 + 50) = 3050 disk
accesses.
The final query
First reads each Staff tuple to determine the manager tuples,
which requires 1000 disk accesses and produces a relation
with 50 tuples.
The second selection operation reads each Branch tuple to
determine the London branches, which requires 50 disk accesses
and produces a relation with 5 tuples.
Writing these two intermediate relations requires (50 + 5) disk accesses,
and the final join of the reduced Staff and Branch relations requires
another (50 + 5) disk accesses to read them,
Giving a total cost of 1000 + 50 + 2*(50 + 5) = 1160 disk
accesses.
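The arithmetic behind the three totals can be reproduced with a short sketch (the variable names are illustrative); it simply applies the stated assumptions about reading, writing, and re-reading intermediate results:

    # Sketch: disk-access estimates for the three equivalent strategies.
    n_staff, n_branch = 1000, 50      # tuples per relation
    n_managers, n_london = 50, 5      # tuples surviving each selection

    # 1. Cartesian product, then selection
    cart = n_staff * n_branch
    cost1 = (n_staff + n_branch) + cart + cart          # read, write product, re-read
    # 2. Join, then selection (join has one tuple per staff member)
    cost2 = (n_staff + n_branch) + n_staff + n_staff    # read, write join, re-read
    # 3. Selections first, then join of the reduced relations
    cost3 = (n_staff + n_branch) + (n_managers + n_london) + (n_managers + n_london)

    print(cost1, cost2, cost3)        # 101050 3050 1160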
Cont…
Exercise 4:
Find all books that have a price greater than 300 and whose
authors reside in New York.
Assume there are 100 tuples in the Catalog table and 50 tuples in the
Author table.
There are 40 books with a price greater than 300 in the
Catalog table and 10 authors who reside in New York in the
Author table.
Cont…
Further assume that there are no indexes or sort keys on either
relation and that the results of any intermediate operations are
stored on disk.
Also assume tuples are accessed one at a time, and
Main memory is large enough to process entire relations for
each relational algebra operation.
2.2.2 Query Optimization
The activity of choosing an efficient execution strategy for
processing a query.
An important aspect of query processing is query optimization.
As there are many equivalent transformations of the same
high-level query, the aim of query optimization is to choose the one
that minimizes resource usage.
Generally, the optimization criteria are:
Reduce the total execution time of the query:
Minimize the sum of the execution times of all the individual
operations that make up the query
Reduce the number of disk accesses
Reduce the response time of the query:
Ensure good use of resources
Maximize parallel operations (pipelining)
Cont…
Both methods (criteria) of query optimization depend on
database statistics to properly evaluate the different options
that are available.
The accuracy and currency of these statistics have a
significant bearing on the efficiency of the execution strategy
chosen.
The statistics cover information about relations, attributes and
indexes.
Example: The system catalog may store statistics
Giving the cardinality of relations
The number of distinct values for each attribute and
The number of levels in a multilevel index
Cont…
Keeping the statistics current can be problematic:
If the DBMS updates the statistics every time a tuple is
inserted, updated, or deleted, this would have a significant
impact on performance during peak periods.
An alternative approach is to update the statistics on a periodic
basis, for example nightly, or whenever the system is idle.
Various DBMS implementations have used different
optimization techniques to obtain efficient execution plans.
Some of the techniques are
Syntactical Optimization
Semantic Optimization
Heuristic (Rule-based) Optimization
Cost-based Optimization
Dynamic and Static query optimization
1) Dynamic query optimization
Query decomposition and optimization are carried out every time
the query is run.
Advantage: all information required to select an optimum
strategy is up to date.
Disadvantages:
The performance of the query is affected because the
query has to be parsed, validated, and optimized before it
can be executed.
In some cases, the number of execution strategies analyzed
needs to be reduced to keep the overhead costs
within acceptable limits, which may result in a strategy that is
not the best being selected, or the best strategy being
left out.
Cont…
2) Static query optimization
The query is parsed, validated, and optimized once, in an approach
similar to that taken by a compiler for a programming language.
The DBMS can analyze a large number of alternative strategies before
selecting the optimum strategy.
More suitable for queries that are executed frequently.
Advantages
The runtime overhead is removed
More time available to evaluate a larger number of execution
strategies.
Disadvantage:
The execution strategy that is chosen as being optimal when
the query is compiled may no longer be optimal when the
query is run.
A) Syntactical Optimization
Relies on the user's understanding of both the underlying
database schema and the distribution of the data within the
tables.
Tables are joined in the original order specified by the user.
Can be extremely efficient when accessing data in a relatively
static environment.
The drawbacks of this technique are:
It is up to the user to find the more efficient method of accessing
the data.
When queries change dynamically (e.g., embedded queries), they
need to be recompiled to improve their data access
performance.
B) Semantic optimization
Operates on the premise that the optimizer has a basic
understanding of the actual database schema.
C) Heuristic query optimization
Heuristic: problem-solving by experimental methods
Applying general rules to choose the most appropriate
internal query representation
Based on transformation rules for relational algebra operators
Used by most DBMSs to determine the best strategies.
Heuristic rules include:
Performing selections and projections as early as possible,
Computing common expressions only once and storing the result,
Combining a Cartesian product with a subsequent selection
whose predicate represents a join condition into a join
operation, and
Using associativity of binary operations to rearrange leaf
nodes so that the leaf nodes with the most restrictive
selections are executed first.
D) Cost-based query optimization
A method of optimizing the query by choosing the strategy that
results in the minimum cost.
The optimizer needs specific information about the stored data.
This information is system dependent and may include:
File sizes,
File structure types,
Available primary and secondary indexes, and
Attribute selectivity (the percentage of tuples expected to be
retrieved for a given predicate).
Its goal is not to produce the 'optimal' execution plan for
retrieving data, but to provide a reasonable execution plan.
Cont…
The cost of executing a query includes the following
components:
Secondary storage access cost:
The cost of accessing, reading, searching for, and
writing data blocks that reside on secondary storage.
Usually more important than the other components.
Basically, most database systems compare different
execution strategies in terms of the number of block
transfers between secondary storage and main
memory.
Storage cost:
The cost of storing any intermediate files that are
generated by an execution strategy for the query.
Cont…
Computation cost:
The cost of performing in-memory operations on the data
buffers during query execution, for example sorting and
merging records.
Memory usage cost:
The cost pertaining to the number of memory buffers
needed during query execution.
Communication cost:
The cost of communicating the query from the source to the
database and then returning the query results to where the query
originated.
Requires more attention in distributed database systems, where
the communication cost is the most significant.
Cost Estimation and statistics
Cost estimation depends on statistical information held in the
system catalog.
The dominant cost in query processing is usually that of disk
accesses, which are slow compared with memory accesses.
Many of the cost estimates are based on the cardinality of the
relation.
The success of estimating the size and cost of intermediate
relational algebra operations depends on the amount and
currency of the statistical information that the DBMS holds.
If we wish to maintain accurate statistics, then every time a
relation is modified we must also update the statistics.
Cont…
Typically, we would expect a DBMS to hold the following
types of information in its system catalog:
For each base relation R:
nTuples(R) (nR): the number of tuples (records) in
relation R (that is, its cardinality).
sR: the size of a tuple of relation R in bytes.
bFactor(R) (fR): the blocking factor of R (that is, the
number of tuples of R that fit into one block).
nBlocks(R) (bR): the number of blocks required to store
R. If the tuples of R are stored physically together, then:
nBlocks(R) = ⌈nTuples(R) / bFactor(R)⌉, i.e. bR = ⌈nR / fR⌉
Cont…
For each multilevel index I on attribute set A:
nLevelsA(I): the number of levels in I.
nLfBlocksA(I): the number of first-level (leaf) index
blocks in I.
For each attribute A of base relation R:
nDistinctA(R) (V(A, R)): the number of distinct
values that appear in relation R for attribute A,
equal to the size of ∏A(R); if A is a key of relation R, it equals nTuples(R).
minA(R), maxA(R): the minimum and maximum
possible values for attribute A in relation R.
SCA(R): the selection cardinality of attribute A in
relation R, which is the average number of tuples that satisfy
an equality condition on attribute A.
Cont…
If A is a key attribute of R, or the selection condition forces A to take
a single specified value, then SCA(R) = 1; otherwise
SCA(R) = ⌈nTuples(R) / nDistinctA(R)⌉.
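A minimal sketch of how these catalog statistics combine, using the ceiling formulas above (the numbers are made up):

    # Sketch: deriving nBlocks and the selection cardinality SCA from base statistics.
    import math

    n_tuples   = 3000   # nTuples(R)
    b_factor   = 30     # bFactor(R): tuples per block
    n_distinct = 500    # nDistinctA(R) for some attribute A

    n_blocks = math.ceil(n_tuples / b_factor)     # nBlocks(R) = ceil(nR / fR) = 100
    sca      = math.ceil(n_tuples / n_distinct)   # SCA(R) = 6 tuples per equality match
    sca_key  = 1                                  # if A is a key, exactly one tuple matches

    print(n_blocks, sca, sca_key)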
Selection operation (S = σp(R))
The selection operation in the relational algebra works on a
single relation R.
There are a number of different implementations for the
selection operation, depending on the structure of the file in
which the relation is stored, and on whether the attribute(s)
involved in the predicate have been indexed (or hashed).
The costs are given in terms of secondary storage accesses;
other costs, such as computation time and storage cost, are
ignored for the time being, as they are less significant.
The commonly used search algorithms and their associated
costs are discussed in the following slides.
Cont…
The main strategies that we consider are:
(S1) Linear search (unordered file, no index):
Retrieve every record in the file, and test whether its
attribute values satisfy the selection condition.
(S2) Binary search (ordered file, no index):
If the selection condition involves an equality comparison
on a key attribute on which the file is ordered, binary
search (which is more efficient than linear search) can be
used.
(S3) Using a primary index or hash key to retrieve a single
record:
If the selection condition involves an equality comparison
on a key attribute with a primary index (or a hash key),
use the primary index (or the hash key) to retrieve the
record.
Cont…
(S4) Using a primary index to retrieve multiple records:
If the comparison condition is >, ≥, <, or ≤ on a key field
with a primary index, use the index to find the record
satisfying the corresponding equality condition, and then
retrieve all the subsequent (or preceding) records in the ordered file.
(S5) Using a clustering index to retrieve multiple records:
If the selection condition involves an equality comparison
on a non-key attribute with a clustering index, use the
clustering index to retrieve all the records satisfying the
selection condition.
(S6) Using a secondary (B+-tree) index:
On an equality comparison, this search method can be
used to retrieve a single record if the indexing field has
unique values (is a key) or to retrieve multiple records if
the indexing field is not a key.
Can also be used to retrieve records on conditions
involving >, >=, <, or <= (i.e., for range queries).
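To make strategies S1 and S2 concrete, here is a rough sketch of a linear scan versus a binary search over an ordered file, using an in-memory Python list as a stand-in for block-by-block file access:

    # Sketch: linear search (S1) vs binary search on an ordered file (S2).
    import bisect

    def linear_select(records, predicate):
        # S1: examine every record, regardless of file order
        return [r for r in records if predicate(r)]

    def binary_select_eq(sorted_records, key, value):
        # S2: equality on the ordering key; O(log n) probes instead of O(n)
        # (building the keys list is for clarity; a real file probes blocks directly)
        keys = [key(r) for r in sorted_records]
        i = bisect.bisect_left(keys, value)
        out = []
        while i < len(sorted_records) and key(sorted_records[i]) == value:
            out.append(sorted_records[i])
            i += 1
        return out

    staff = sorted([{'staffNo': 'SG37', 'salary': 12000},
                    {'staffNo': 'SG5',  'salary': 24000},
                    {'staffNo': 'SL21', 'salary': 30000}],
                   key=lambda r: r['staffNo'])

    print(linear_select(staff, lambda r: r['salary'] > 15000))
    print(binary_select_eq(staff, lambda r: r['staffNo'], 'SG5'))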
Cont…
(J1) Nested-loop join (brute force):
For each record t of R (the outer loop), retrieve every record
s of S (the inner loop) and test whether the two records satisfy
the join condition t[A] = s[B].
(J3) Sort-merge join:
If the records of R and S are physically sorted on the join
attributes A and B respectively, the records of each file are
scanned only once each for matching with the other file, unless
both A and B are non-key attributes, in which case the method
needs to be modified slightly.
Cont…
(J4) Hash-join:
The records of files R and S are both hashed to the same
hash file, using the same hashing function on the join
attributes A of R and B of S as hash keys.
A single pass through the first file (say R) hashes its records into
the hash file buckets; a single pass through the other file (S) then
hashes each of its records to the appropriate bucket, where the
record is combined with all matching records from R.
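A compact sketch of the hash-join idea just described: build a hash table on R's join attribute, then make a single pass over S probing it (in-memory Python, ignoring the partitioning to disk that a real DBMS would do):

    # Sketch: in-memory hash join of R and S on R.A = S.B.
    from collections import defaultdict

    def hash_join(r_records, s_records, a, b):
        buckets = defaultdict(list)
        for r in r_records:                  # build phase: hash R on attribute A
            buckets[r[a]].append(r)
        result = []
        for s in s_records:                  # probe phase: single pass over S
            for r in buckets.get(s[b], []):  # combine with all matching R records
                result.append({**r, **s})
        return result

    staff  = [{'staffNo': 'SG5', 'branchNo': 'B003'},
              {'staffNo': 'SL21', 'branchNo': 'B005'}]
    branch = [{'branchNo': 'B003', 'city': 'London'}]

    print(hash_join(staff, branch, 'branchNo', 'branchNo'))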
Cost
Block nested-loop join
nBlocks(R) + (nBlocks(R) * nBlocks(S)), if the buffer has
only one block for R and one for S
nBlocks(R) + ⌈nBlocks(S) * (nBlocks(R) / (nBuffer − 2))⌉, if
(nBuffer − 2) blocks are available for R
nBlocks(R) + nBlocks(S), if all blocks of R can be read
into the database buffer
Indexed nested-loop join
nBlocks(R) + nTuples(R) * (nLevelsA(I) + 1), if the join
attribute A in S is the primary key
nBlocks(R) + nTuples(R) * (nLevelsA(I) + ⌈SCA(S) / bFactor(S)⌉),
for a clustering index I on attribute A
Cost…
Sort-merge join
nBlocks(R) * ⌈log2(nBlocks(R))⌉ + nBlocks(S) * ⌈log2(nBlocks(S))⌉, for the sorts
nBlocks(R) + nBlocks(S), for the merge
Hash join
3 * (nBlocks(R) + nBlocks(S)), if the hash index is held in
memory
2 * (nBlocks(R) + nBlocks(S)) * ⌈log_(nBuffer−1)(nBlocks(S)) − 1⌉
+ nBlocks(R) + nBlocks(S), otherwise
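The cost formulas above can be turned into a small calculator; this is only a sketch that follows the slide notation (nBlocks, nBuffer, SCA), with made-up inputs in the example calls:

    # Sketch: estimated block accesses for some of the join strategies listed above.
    import math

    def block_nested_loop(nBlocksR, nBlocksS, nBuffer):
        if nBuffer - 2 >= nBlocksR:                    # all of R fits in the buffer
            return nBlocksR + nBlocksS
        if nBuffer > 2:                                # (nBuffer - 2) blocks for R
            return nBlocksR + math.ceil(nBlocksS * (nBlocksR / (nBuffer - 2)))
        return nBlocksR + nBlocksR * nBlocksS          # one block each for R and S

    def sort_merge(nBlocksR, nBlocksS):
        sorts = (nBlocksR * math.ceil(math.log2(nBlocksR)) +
                 nBlocksS * math.ceil(math.log2(nBlocksS)))
        return sorts + nBlocksR + nBlocksS             # sort both relations, then merge

    def hash_join_cost(nBlocksR, nBlocksS, in_memory=True, nBuffer=100):
        if in_memory:
            return 3 * (nBlocksR + nBlocksS)
        passes = math.ceil(math.log(nBlocksS, nBuffer - 1) - 1)
        return 2 * (nBlocksR + nBlocksS) * passes + nBlocksR + nBlocksS

    print(block_nested_loop(200, 100, nBuffer=12))
    print(sort_merge(200, 100), hash_join_cost(200, 100))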
2.2.3 Materialization and Pipelining
Materialization
The process in which the results of intermediate relational
algebra operations are written temporarily to disk.
Pipelining
Also known as on-the-fly processing.
Used to improve the performance of queries.
In this case the results of one operation are passed to another
operation without creating a temporary relation to hold the
intermediate result,
Saving the cost of creating temporary relations and reading
the results back in.
Cont…
A buffer is created for each pair of adjacent operations to
hold the tuples being passed from the operations to second
one.
One drawback ,with pipelining is that the inputs to operation
are not necessarily available all at once for processing.
Example:
Position=’Manager’ and salary>20000(Staff)
If we assume that there is an index on the salary attribute, we
use the cascade of solution rule to transform this selection
into two operations:
Position=’Manager’(salary>20000(Staff))
63
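Python generators give a rough feel for pipelining: each operation pulls tuples from the one below it on demand, so no temporary relation is materialized between the two selections of the example (a sketch, not how any particular DBMS implements it):

    # Sketch: pipelined (on-the-fly) evaluation of
    # sigma position='Manager'(sigma salary>20000(Staff)) using generators.
    def scan(relation):
        for tuple_ in relation:          # producer: reads the base relation
            yield tuple_

    def select(pred, source):
        for tuple_ in source:            # consumer/producer: no temporary relation
            if pred(tuple_):
                yield tuple_

    staff = [{'staffNo': 'SG5', 'position': 'Manager', 'salary': 24000},
             {'staffNo': 'SG37', 'position': 'Assistant', 'salary': 12000}]

    pipeline = select(lambda t: t['position'] == 'Manager',
                      select(lambda t: t['salary'] > 20000, scan(staff)))

    for t in pipeline:                   # tuples flow through one at a time
        print(t)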
2.3 Query Optimization in Oracle
Oracle supports the two approaches to query optimization
Rule-based and
Cost-based
Cont…
An unbounded range scan, using the index on the rooms
column from the WHERE condition (rooms > 7). This access
path has rank 11.
Cont…
However, Oracle does not gather statistics automatically but
makes it the user's responsibility to generate these statistics
and keep them current, for example:
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('Manager');
QUESTIONS
[Figure: Disk structure; a disk page (block) is the same size as a memory page.]
Suppose you were given a chance to visit 10 pre-selected
cities in Ethiopia. The only constraint is time.
Would you visit the cities in just any order?
Or would you place the 10 cities in groups based on their proximity to
each other, start with one group, and move on to the next
group?
The important point here is that, with the second plan, you would have visited the
cities in a more organized manner, and the time constraint
mentioned earlier would have been dealt with efficiently.