QUERY PROCESSING AND OPTIMIZATION

After comprehensive study of this chapter, you will be able to understand:
* Concept of Query Processing
* Query Trees and Heuristics for Query Optimization
* Choice of Query Execution Plans
* Cost-Based Optimization
OVERVIEW OF QUERY PROCESSING
Energy efficiency is an important feature in designing and executing databases. The aims of query processing are to transform a query written in a high-level language, typically SQL, into a correct and efficient execution strategy expressed in a low-level language (implementing the relational algebra), and to execute the strategy to retrieve the required data. Thus, query processing is the set of activities involved in parsing, validating, optimizing, and executing a query.
The steps involved in processing a query are shown in Figure 3.1. They are:
1. Parsing and translation
2. Optimization
3. Evaluation
[Figure: Query in high-level language -> Query Optimizer -> Query Evaluation Engine -> Query Output]
Figure 3.1: Steps in query processing
Parsing and Translating the Query
The main work of a query processor is to convert a query string into query objects, i.e., to convert the query submitted by the user into a form understood by the query processing engine. It converts the search string into definite instructions. The query parser must analyze the query language, i.e., recognize and interpret operators (AND, OR, NOT, +, -, etc.), place the operators into groups, etc. The basic job of the parser is to extract the tokens (e.g., keywords, operators, operands, literal strings, etc.) and convert them into their corresponding internal data elements (i.e., relational algebra operations and operands) and structures (i.e., query tree, query graph). The parser also verifies the validity and syntax of the query string.
Optimizing the Query
In this stage, the query optimizer chooses an evaluation plan for the query, along with the implementation methods to be employed for each relational operator.
Example 3.1: Consider the following SQL query:
SELECT Stu_name, Stu_address
FROM Student
WHERE age < 25;
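This query can be translated into either of the following relational-algebra expressions:
* $\sigma_{\text{age} < 25}(\Pi_{\text{Stu\_name},\, \text{Stu\_address}}(\text{Student}))$
* $\Pi_{\text{Stu\_name},\, \text{Stu\_address}}(\sigma_{\text{age} < 25}(\text{Student}))$
Strictly, the first form is valid only if the projection also retains age, since the selection condition refers to it.
This can be represented as either of the following query trees:
[Figure: two query trees, one for each expression above, each with Student as the leaf]
Figure 3.2: Query Tree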
After parsing and translation into a relational algebra expression, the query is then transformed into a form, usually a query tree or query graph, that can be handled by the optimization engine.
The optimization engine then performs various analyses on the query data, generating a number of valid evaluation plans. From there, it determines the most appropriate evaluation plan to execute.
After the evaluation plan has been selected, it is passed into the DBMS query-execution engine (also referred to as the runtime database processor), where the plan is executed and the results are returned.
MEASURING OF QUERY COST
The cost of a query is the time taken by the query to hit the database and return the result. It involves the query processing time, i.e., the time taken to parse and translate the query, optimize it, evaluate it, execute it and return the result to the user. This is called the cost of the query. Executing the optimized query involves hitting the primary and secondary memory based on the file organization method. Depending on the file organization and the indexes used, the time taken to retrieve the data may vary. Query cost considers a number of different resources, such as:
* The number of disk accesses / the number of disk block transfers / the size of the table
* Time taken by the CPU for executing the query
The time taken by the CPU is negligible in most systems when compared with the number of disk accesses. If we consider the number of block transfers as the main component in calculating the cost of a query, it includes more sub-components. Those are:
* Rotational latency: time taken to bring and spin the required data under the read-write head of the disk.
* Seek time: time taken to position the read-write head over the required track or cylinder.
* Sequential I/O: reading data that are stored in contiguous blocks of the disk.
* Random I/O: reading data that are stored in different blocks that are not contiguous.
For simplicity we just use the number of block transfers from disk and the number of seeks as the cost measures of a query-evaluation plan. Suppose a query needs to seek S times to fetch a record and b blocks need to be returned to the user. The disk I/O cost is calculated as below:
Query Cost $= b \times t_r + S \times t_s$
Where,
* $b$: number of block transfers
* $S$: number of seeks
* $t_r$ (tr): time to transfer one block
* $t_s$ (ts): time for one seek
The values of tr and ts must be calibrated for the disk system used. For example, tr = 0.1 ms and ts = 4 ms if the block size is 4 KB and the transfer rate is 40 MB per second. With this, we can easily calculate the estimated cost of the given query evaluation plan.
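As a minimal sketch of the cost formula above, the following Python function computes the estimated disk I/O cost in milliseconds. The function name `query_cost` and the sample numbers of transfers and seeks are assumptions made for illustration; the default values of tr and ts are the ones quoted in the text.

```python
def query_cost(b, S, tr=0.1, ts=4.0):
    """Estimated disk I/O cost in ms: b block transfers plus S seeks."""
    return b * tr + S * ts


# Hypothetical plan: 100 block transfers and 4 seeks.
cost_ms = query_cost(b=100, S=4)
print(f"Estimated cost: {cost_ms:.1f} ms")
```

With these assumed inputs the transfers contribute about 10 ms and the seeks about 16 ms, which illustrates why a plan with few seeks can beat a plan with fewer transfers.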
Generally, for estimating the cost, we consider the worst case that could happen. We assume that initially the data is read from the disk only. But there may be a chance that the information is already present in the main memory. However, users usually ignore this effect, and due to this, the actual cost of execution comes out less than the estimated value.
The response time, i.e., the time required to execute the plan, could be used for estimating the cost of the query evaluation plan. But due to the following reasons, it becomes difficult to calculate the response time without actually executing the query evaluation plan:
* The response time depends on the contents of the buffer when the query begins execution; this information is not available when the query is optimized, and is hard to account for even if it were available.
* In a system with multiple disks, the response time depends on how accesses are distributed among disks, which is hard to estimate without detailed knowledge of data layout on the disks.
SELECTION OPERATION (σ)
Queries are ultimately reduced to a number of file scan operations on the underlying physical file structures. For each relational operation there can exist several different access paths to the particular records needed. The query execution engine can have a multitude of specialized algorithms designed to process particular relational operation and access path combinations.
Selections Using File Scans
File scans are search algorithms that locate and retrieve records that fulfill a selection condition. The Select operation must search through the data files for records meeting the selection criteria. The following are some simple (one-attribute) selection algorithms:
* A1 (linear search): Retrieve every record in the file, and test whether its attribute values satisfy the selection condition.
  o Worst-case cost $= b_r \times t_r + t_s$, where $b_r$ is the number of blocks containing records from relation r.
  o If the selection is on a key attribute, the scan can stop on finding the record: average cost $= (b_r/2) \times t_r + t_s$.
  Linear search is slow, but it is general because it can be applied regardless of the ordering of the file, the availability of indices, or the nature of the selection operation.
* A2 (binary search): If the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search (which is more efficient than linear search) can be used.
  o Worst-case cost $= \lceil \log_2(b_r) \rceil \times (t_r + t_s)$
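The two file-scan cost formulas can be compared numerically. The sketch below is illustrative only: the function names are made up here, and the file size of 1,000 blocks is an assumed value, with tr = 0.1 ms and ts = 4 ms taken from the cost-measurement section.

```python
import math


def linear_search_cost(b_r, tr=0.1, ts=4.0, stop_on_key=False):
    """A1: one initial seek plus b_r transfers in the worst case.
    For an equality on a key attribute the scan stops early,
    reading b_r / 2 blocks on average."""
    blocks = b_r / 2 if stop_on_key else b_r
    return blocks * tr + ts


def binary_search_cost(b_r, tr=0.1, ts=4.0):
    """A2: ceil(log2(b_r)) probes, each costing a seek and a transfer."""
    return math.ceil(math.log2(b_r)) * (tr + ts)


# Assumed file of 1,000 blocks.
print(linear_search_cost(1000))                    # worst case
print(linear_search_cost(1000, stop_on_key=True))  # average, key attribute
print(binary_search_cost(1000))
```

Binary search touches far fewer blocks, but every probe pays a seek, so its advantage in elapsed time is smaller than the block counts alone suggest.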
Selections Using Indices
A search algorithm that makes use of an index is called an index scan, and the index structure is called an access path.
* A3 (primary index, equality on key): For an equality comparison on a key attribute with a primary index, we can use the index to retrieve a single record that satisfies the corresponding equality condition.
  o Cost $= (h_i + 1) \times (t_r + t_s)$, where $h_i$ is the height of the index.
* A4 (primary index, equality on non-key): For an equality comparison on a non-key attribute with a primary index, we can use the index to retrieve multiple records (spread over b successive blocks) that satisfy the corresponding equality condition.
  o Cost $= h_i \times (t_r + t_s) + t_s + t_r \times b$, where $h_i$ is the height of the index.
* A5 (secondary index, equality): A selection specifying an equality condition can use a secondary index. This strategy retrieves a single record if the indexing field is a key, or multiple records if the indexing field is not a key.
  o Retrieve a single record if the search key is a candidate key: cost $= (h_i + 1) \times (t_r + t_s)$, where $h_i$ is the height of the index.
  o Retrieve multiple records if the search key is not a candidate key; each of the n matching records may be on a different block: cost $= (h_i + n) \times (t_r + t_s)$, where $h_i$ is the height of the index.
  For a large number n of matching records, this can be very expensive and can cost even more than a linear scan.
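To make the index-scan formulas concrete, here is a small sketch. The function names, the index height of 3, the 500 matching records and the 1,000-block file are all assumed values chosen for illustration; tr = 0.1 ms and ts = 4 ms are the values used earlier in the chapter.

```python
def primary_index_key_cost(h_i, tr=0.1, ts=4.0):
    """A3: traverse the index (h_i levels) and fetch one record."""
    return (h_i + 1) * (tr + ts)


def primary_index_nonkey_cost(h_i, b, tr=0.1, ts=4.0):
    """A4: traverse the index, then one seek and b sequential transfers."""
    return h_i * (tr + ts) + ts + tr * b


def secondary_index_cost(h_i, n=1, tr=0.1, ts=4.0):
    """A5: traverse the index, then one seek and transfer per record."""
    return (h_i + n) * (tr + ts)


def linear_scan_cost(b_r, tr=0.1, ts=4.0):
    """A1 worst case, included only for comparison."""
    return b_r * tr + ts


# Assumed scenario: index height 3, 500 matching records, 1,000-block file.
via_index = secondary_index_cost(h_i=3, n=500)
via_scan = linear_scan_cost(b_r=1000)
print(f"secondary index: {via_index:.1f} ms, linear scan: {via_scan:.1f} ms")
```

Under these assumptions the secondary index is roughly twenty times slower than the linear scan, because each of the 500 records costs its own seek.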
Selections Involving Comparisons
We assume that the relation is sorted on attribute A. Consider a selection of the form $\sigma_{A \le v}(r)$ or $\sigma_{A \ge v}(r)$. We can implement the selection either by using linear search or binary search, or by using indices in one of the following ways:
* A6 (primary index, comparison): A primary ordered index (for example, a primary B-tree index) can be used when the selection condition is a comparison.
  o For $\sigma_{A \ge v}(r)$, use the index to find the first tuple $\ge v$ and scan the relation sequentially from there.
  o For $\sigma_{A \le v}(r)$, just scan the relation sequentially till the first tuple $> v$, without using any index.
* A7 (secondary index, comparison): We can use a secondary ordered index to guide retrieval for comparison conditions involving $<$, $\le$, $\ge$, or $>$.
  o For $\sigma_{A \ge v}(r)$, use the index to find the first index entry $\ge v$ and scan the index sequentially from there, to find pointers to records.
  o For $\sigma_{A \le v}(r)$, just scan the leaf pages of the index, finding pointers to records, till the first entry $> v$.
The secondary index provides pointers to the records, but to get the actual records we have to fetch them by using the pointers. This step may require an I/O operation for each record fetched, since consecutive records may be on different disk blocks; as before, each I/O operation requires a disk seek and a block transfer. If the number of retrieved records is large, using the secondary index may be even more expensive than using linear search. Therefore, the secondary index should be used only if very few records are selected.
Implementation of Complex Selections
So far, we have considered only simple selection conditions of the form A op B, where op is an equality or comparison operation. We now consider more complex selection predicates.
* Conjunction: A conjunctive selection is a selection of the form $\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$.
* Disjunction: A disjunctive selection is a selection of the form $\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$. A disjunctive condition is satisfied by the union of all records satisfying the individual, simple conditions $\theta_i$.
* Negation: The result of a selection $\sigma_{\neg\theta}(r)$ is the set of tuples of r for which the condition $\theta$ evaluates to false. In the absence of nulls, this set is simply the set of tuples in r that are not in $\sigma_{\theta}(r)$.
The following algorithms implement such selections:
* A8 (conjunctive selection using one index): We first check if there is an access path available for an attribute in one of the simple conditions. To reduce the cost, we choose a $\theta_i$ and one of algorithms A1 through A7 for which the combination results in the least cost for $\sigma_{\theta_i}(r)$. The remaining conditions are then tested on the retrieved records in memory. The cost of algorithm A8 is given by the cost of the chosen algorithm.
* A9 (conjunctive selection using composite index): An appropriate composite (multiple-key) index may be available for some conjunctive selections. If an index exists on the combined attribute fields, then the index can be searched directly.
* A10 (conjunctive selection by intersection of identifiers): This algorithm requires indices with record pointers on the fields involved in the individual conditions. The algorithm uses the corresponding index for each condition and takes the intersection of all the obtained sets of record pointers. It then fetches the records from the file; if some conditions do not have appropriate indices, they are tested in memory.
* A11 (disjunctive selection by union of identifiers): Indices can only be used if there is an index for all conditions; otherwise, a linear scan of the relation has to be performed anyway. The algorithm uses the corresponding index for each condition and takes the union of all the obtained sets of record pointers. It then fetches the records from the file.
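Algorithms A10 and A11 can be illustrated with a small in-memory sketch. Everything here is a stand-in chosen for illustration: the `records` dictionary plays the role of the data file keyed by record identifier, `build_index` is a hypothetical helper that maps each attribute value to a set of record identifiers, and the attribute names are invented.

```python
# Hypothetical data file: record identifier -> record.
records = {
    1: {"dept": "CS", "year": 1},
    2: {"dept": "CS", "year": 2},
    3: {"dept": "EE", "year": 2},
    4: {"dept": "CS", "year": 2},
}


def build_index(recs, attr):
    """Stand-in for an index: attribute value -> set of record ids."""
    index = {}
    for rid, rec in recs.items():
        index.setdefault(rec[attr], set()).add(rid)
    return index


dept_index = build_index(records, "dept")
year_index = build_index(records, "year")


def select_by_intersection(pointer_sets):
    """A10: intersect the pointer sets, then fetch the records."""
    rids = set.intersection(*pointer_sets)
    return [records[rid] for rid in sorted(rids)]


def select_by_union(pointer_sets):
    """A11: union the pointer sets, then fetch the records."""
    rids = set.union(*pointer_sets)
    return [records[rid] for rid in sorted(rids)]


lookups = [dept_index.get("CS", set()), year_index.get(2, set())]
print(select_by_intersection(lookups))  # dept = 'CS' AND year = 2
print(select_by_union(lookups))         # dept = 'CS' OR year = 2
```

Sorting the identifiers before fetching mirrors a common practice of visiting the file in physical order, although this toy version has no notion of blocks.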
SORTING
Sorting in a database system is important for two reasons:
1. A query may specify that the output should be sorted.
2. The processing of some relational query operations, e.g., the join operation, can be implemented more efficiently based on sorted relations.
For relations that fit in memory, techniques like quick-sort can be used, and for relations that do not fit in memory an external sort-merge algorithm can be used.
External Sort-Merge Algorithm
Sorting of relations that do not fit in memory is called external sorting. The most commonly used technique for external sorting is the external sort-merge algorithm. Let M denote the memory size (in pages).
1. Create sorted runs. Initialize i = 0. Repeat the following till the end of the relation (let the final value of i be N):
   a) Read M blocks of the relation into memory
   b) Sort the in-memory blocks
   c) Write the sorted data to run $R_i$
   d) Increment i
2. Merge the runs (N-way merge). We assume that N < M; in that case a single merge pass suffices, using N blocks of memory to buffer the input runs and one block to buffer the output. If N ≥ M, several merge passes are required. In each pass, contiguous groups of M - 1 runs are merged. A pass reduces the number of runs by a factor of M - 1, and creates runs longer by the same factor. Repeated passes are performed till all runs have been merged into one.
[Figure: the initial relation is split into sorted runs (create runs), which are combined in merge pass 1 and merge pass 2 to give the sorted output]
Figure 3.3: External sorting using sort-merge.
Figure 3.3 illustrates the steps of the external sort-merge for an example relation. For illustration purposes, we assume that only one tuple fits in a block ($f_r = 1$), and we assume that memory holds at most three blocks. During the merge stage, two blocks are used for input and one for output.
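The two stages can be simulated in memory with a short sketch. This is an illustration under simplifying assumptions, not a disk-based implementation: each tuple stands for one block (as in the figure), Python lists stand in for run files, the function names are invented here, and the sample tuples are hypothetical.

```python
import heapq


def create_runs(relation, M):
    """Stage 1: read M blocks at a time, sort them, emit one run each."""
    return [sorted(relation[i:i + M]) for i in range(0, len(relation), M)]


def merge_pass(runs, M):
    """One merge pass: merge contiguous groups of M - 1 runs,
    leaving one memory block free for output."""
    fan_in = M - 1
    return [list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)]


def external_sort(relation, M=3):
    """Return the sorted relation and the number of merge passes used."""
    if M < 3:
        raise ValueError("need at least 3 blocks: 2 for input, 1 for output")
    runs = create_runs(relation, M)
    passes = 0
    while len(runs) > 1:
        runs = merge_pass(runs, M)
        passes += 1
    return (runs[0] if runs else []), passes


# Hypothetical 12-tuple relation, one tuple per block, memory of 3 blocks.
relation = [("g", 24), ("a", 19), ("d", 31), ("c", 33), ("b", 14), ("e", 16),
            ("r", 16), ("d", 21), ("m", 3), ("p", 2), ("d", 7), ("a", 14)]
result, passes = external_sort(relation, M=3)
print(passes, result)
```

With 12 blocks and M = 3, stage 1 produces four runs, and two merge passes (4 runs to 2 runs to 1 run) are needed.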
Cost Analysis of External Sort-Merge
Let $b_r$ denote the number of blocks containing records of relation r.
The initial number of runs $= \lceil b_r / M \rceil$.
Since the number of runs decreases by a factor of M - 1 in each merge pass, the total number of merge passes required $= \lceil \log_{M-1}(b_r / M) \rceil$.
The first stage reads every block of the relation and writes it out again, giving a total of $2 b_r$ block transfers. Each merge pass reads every block of the relation once and writes it out once, with two exceptions. First, the final pass can produce the sorted output without writing its result to disk. Second, there may be runs that are not read in or written out during a pass.
Ignoring the second exception, the total number of block transfers for external sorting of the relation $= b_r \times (2 \lceil \log_{M-1}(b_r / M) \rceil + 1)$.
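The transfer count can be evaluated with a few lines of Python. The sketch below counts merge passes by repeated ceiling division rather than with a floating-point logarithm, which gives the same value as $\lceil \log_{M-1}(b_r/M) \rceil$ without rounding problems; the function name and the sample sizes are assumptions for illustration.

```python
import math


def external_sort_transfers(b_r, M):
    """Return (merge passes, block transfers) for sorting b_r blocks
    with M memory blocks, using b_r * (2 * passes + 1)."""
    if M < 3:
        raise ValueError("need at least 3 memory blocks")
    runs = math.ceil(b_r / M)            # initial number of runs
    passes = 0
    while runs > 1:                      # each pass merges M - 1 runs
        runs = math.ceil(runs / (M - 1))
        passes += 1
    return passes, b_r * (2 * passes + 1)


print(external_sort_transfers(12, 3))     # small case: 12 blocks, M = 3
print(external_sort_transfers(1000, 11))  # assumed larger case
```

For 12 blocks and 3 memory blocks this gives 2 merge passes and 60 block transfers.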
JOINING
Like selection, the join operation can be implemented in a variety of ways. Choosing among the join algorithms is critical in minimizing a query's execution time. The following are 5 well-known types of join algorithms:
* Nested-Loop Join
* Block Nested-Loop Join
* Indexed Nested-Loop Join
* Sort-Merge Join
* Hash Join
Nested-Loop Join
This algorithm consists of an inner for loop nested within an outer for loop. To illustrate this algorithm, we will use the following notations:
r, s     Relations r and s
t_r      Tuple (record) in relation r
t_s      Tuple (record) in relation s
n_r      Number of records in relation r
n_s      Number of records in relation s
b_r      Number of blocks with records in relation r
b_s      Number of blocks with records in relation s
Here is a sample pseudo-code listing for joining the two relations r and s utilizing the nested-for loop:
for each tuple t_r in r {
    for each tuple t_s in s {
        if join condition is true for (t_r, t_s)
            add tuple t_r·t_s to the result;
    }
}
Figure 3.4: Nested-loop join
In the algorithm, t_r and t_s are the tuples of relations r and s, respectively. The notation t_r·t_s is a tuple constructed by concatenating the attribute values of tuples t_r and t_s.
With the help of the algorithm, we understood the following points:
* The nested-loop join does not need any indexing, similar to a linear file scan for accessing the data.
* The nested-loop join does not care about the given join condition. It is suitable for every join condition.
* The nested-loop join algorithm is expensive in nature, because it computes and examines each pair of tuples in the given two relations.
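A runnable version of the nested-loop join is sketched below. Relations are modelled as lists of dictionaries, the join condition is passed in as a function, and the sample relations and attribute names are hypothetical.

```python
def nested_loop_join(r, s, condition):
    """Pair every tuple of r with every tuple of s and keep the pairs
    that satisfy the join condition (any predicate is allowed)."""
    result = []
    for t_r in r:                      # outer relation
        for t_s in s:                  # inner relation
            if condition(t_r, t_s):
                result.append({**t_r, **t_s})  # concatenate attributes
    return result


# Hypothetical sample relations.
student = [{"stu_id": 1, "name": "Asha"}, {"stu_id": 2, "name": "Bikash"}]
marks = [{"m_stu_id": 1, "score": 80},
         {"m_stu_id": 1, "score": 65},
         {"m_stu_id": 3, "score": 90}]

equi = nested_loop_join(student, marks,
                        lambda a, b: a["stu_id"] == b["m_stu_id"])
print(equi)
```

Because the condition is an arbitrary function, the same code handles non-equality conditions such as `a["stu_id"] < b["m_stu_id"]`; the price is that all n_r × n_s pairs are examined.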
Block Nested-Loop Join
If the buffer is too small to hold either relation entirely in memory, we can still obtain a major saving in block accesses if we process the relations on a per-block basis, rather than on a per-tuple basis. Figure 3.5 shows the block nested-loop join, which is a variant of the nested-loop join where every block of the inner relation is paired with every block of the outer relation. Within each pair of blocks, every tuple in one block is paired with every tuple in the other block, to generate all pairs of tuples. As before, all pairs of tuples that satisfy the join condition are added to the result.
for each block B_r of r {
    for each block B_s of s {
        for each tuple t_r in B_r {
            for each tuple t_s in B_s {
                if join condition is true for (t_r, t_s)
                    add tuple t_r·t_s to the result;
            }
        }
    }
}
Figure 3.5: Block nested-loop join
The primary difference in cost between the block nested-loop join and the basic nested-loop join is that, in the worst case, each block in the inner relation s is read only once for each block in the outer relation, instead of once for each tuple in the outer relation. Clearly, it is more efficient to use the smaller relation as the outer relation, in case neither of the relations fits in memory.
Index Nested-Loop Join
This algorithm is the same as the Nested-Loop Join, except that an index file on the inner relation's (s) join attribute is used instead of a data-file scan on s. Each index lookup in the inner loop is essentially an equality selection on s utilizing one of the selection algorithms.
Sort-Merge Join
This algorithm can be used to perform natural joins and equi-joins and requires that each relation (r and s) be sorted by the common attributes between them ($R \cap S$). The details of how this algorithm works will not be presented here. However, it is worth pointing out that each record in r and s is scanned only once, thus producing a worst-case and best-case cost of $b_r + b_s$.
Variations of the Sort-Merge Join algorithm are used, for instance, when the data files are in unsorted order, but there exist secondary indices for the two relations.
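For readers who want a concrete picture of the merge step, here is one possible in-memory sketch. It is an illustration only: the function name, the dictionary-based tuples and the single join attribute are assumptions, and the sort is done with Python's built-in `sorted` rather than an external sort.

```python
def sort_merge_join(r, s, attr):
    """Equi-join r and s on attr by sorting both and merging."""
    r_sorted = sorted(r, key=lambda t: t[attr])
    s_sorted = sorted(s, key=lambda t: t[attr])
    result = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        r_val, s_val = r_sorted[i][attr], s_sorted[j][attr]
        if r_val < s_val:
            i += 1                     # advance the side with the smaller key
        elif r_val > s_val:
            j += 1
        else:
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][attr] == r_val:
                j_end += 1
            # Pair every matching r tuple with the whole s group.
            while i < len(r_sorted) and r_sorted[i][attr] == r_val:
                for k in range(j, j_end):
                    result.append({**r_sorted[i], **s_sorted[k]})
                i += 1
            j = j_end
    return result


# Hypothetical sample relations sharing the attribute "k".
r = [{"k": 2, "a": "y"}, {"k": 1, "a": "x"}, {"k": 2, "a": "z"}]
s = [{"k": 3, "b": "r"}, {"k": 2, "b": "p"}, {"k": 2, "b": "q"}]
print(sort_merge_join(r, s, "k"))
```

When a join value repeats on both sides, the matching group of s is revisited for each matching tuple of r, so the "scanned once" property holds for the file pointers but not for every individual comparison.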
Hash Join
Like the sort-merge join, the hash join algorithm can be used to perform natural joins and equi-joins. The concept behind the hash join algorithm is to partition the tuples of each given relation into sets. The partitioning is done on the basis of the same hash value on the join attributes. The hash function provides the hash value. The main goal of using the hash function in the algorithm is to reduce the number of comparisons and increase the efficiency of completing the join operation on the relations.
For example, suppose there are two tuples a and b that satisfy the join condition. This means they have the same value for the join attributes. Suppose that both a and b have the hash value i. It implies that tuple a is in partition $a_i$, and tuple b is in partition $b_i$. Thus, we only compare the a tuples in $a_i$ with the b tuples in $b_i$. There is no need to compare them with the tuples in any other partition. This is how the hash join operation works.
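The partitioning idea can be sketched as follows. This is an in-memory illustration under stated assumptions: Python's built-in `hash` stands in for the hash function, the number of partitions is arbitrary, lists stand in for partition files, and the function and attribute names are hypothetical.

```python
def hash_join(r, s, attr, n_partitions=4):
    """Equi-join r and s on attr by partitioning both on a hash of attr."""
    r_parts = [[] for _ in range(n_partitions)]
    s_parts = [[] for _ in range(n_partitions)]
    for t in r:
        r_parts[hash(t[attr]) % n_partitions].append(t)
    for t in s:
        s_parts[hash(t[attr]) % n_partitions].append(t)

    result = []
    # Only partitions with the same index can contain matching tuples.
    for r_i, s_i in zip(r_parts, s_parts):
        table = {}                       # build phase on the s partition
        for t_s in s_i:
            table.setdefault(t_s[attr], []).append(t_s)
        for t_r in r_i:                  # probe phase with the r partition
            for t_s in table.get(t_r[attr], []):
                result.append({**t_r, **t_s})
    return result


# Hypothetical sample relations sharing the attribute "k".
r = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}, {"k": 2, "a": "z"}]
s = [{"k": 2, "b": "p"}, {"k": 2, "b": "q"}, {"k": 3, "b": "r"}]
print(hash_join(r, s, "k"))
```

The order of the output depends on how tuples fall into partitions, so callers that need a particular order would sort the result afterwards.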
EVALUATION OF EXPRESSION
We have studied how individual relational operations are carried out. Now we consider how to evaluate an expression containing multiple operations. The obvious way to evaluate an expression is simply to evaluate one operation at a time, in an appropriate order. There are two approaches to evaluating a query execution tree:
* Materialization: Compute the result of an evaluation primitive and materialize (store) the new relation on the disk.
* Pipelining: Pass on tuples to parent operations even while an operation is still being executed.
Materialization
It is easiest to understand intuitively how to evaluate an expression by looking at a pictorial representation of the expression in an operator tree.
Example 3.2: Consider the expression:
$\Pi_{L}(\sigma_{\theta}(\text{Department}) \bowtie \text{Staff})$
where $\theta$ is a selection condition on Department and $L$ is the list of projected attributes.
[Figure: operator tree with the projection at the root, the join below it, and the selection on Department and the Staff relation as the join inputs]
Figure 3.6: Pictorial representation of an expression (query tree).
In this method, the given expression is evaluated one relational operation at a time, with each operation evaluated in an appropriate sequence or order. After each operation is evaluated, its output is materialized in a temporary relation for subsequent use. The example in figure 3.6 is computed as follows:
1. Compute $\sigma_{\theta}(\text{Department})$ and store the result as relation1
2. Compute Staff $\bowtie$ materialized relation1 and store the result as relation2
3. Compute $\Pi_{L}$ on materialized relation2
By repeating the process, we will eventually evaluate the operation at the root of the tree, giving the final result of the expression. In our example, we get the final result by executing the projection operation at the root of the tree, using as input the temporary relation created by the join.
The disadvantage of this type of evaluation is its higher cost: it needs to construct temporary relations to materialize the results of the evaluated operations. These temporary relations are written to disk unless they are small in size.
Double buffering (using two buffers, with one continuing execution of the algorithm while the other is being written out) allows the algorithm to execute more quickly by performing CPU activity in parallel with I/O activity. The number of seeks can be reduced by allocating extra blocks to the output buffer, and writing out multiple blocks at once.
Pipelining
In this method, the DBMS does not store the records in temporary tables. Instead, it executes a query and passes its result to the next query to process, and so on. It processes the queries one after the other, and each uses the result of the previous query for its processing. Pipelining evaluates multiple operations simultaneously by passing the results of one operation to the next one without storing the tuples on the disk.
In the example of figure 3.6, all three operations can be placed in a pipeline, which passes the results of the selection to the join as they are generated. In turn, it passes the results of the join to the projection as they are generated. The memory requirements are low, since results of an operation are not stored for long. However, as a result of pipelining, the inputs to the operations are not available all at once for processing.
Creating a pipeline of operations can provide two benefits:
* It eliminates the cost of reading and writing temporary relations, reducing the cost of query evaluation.
* It can start generating query results quickly, if the root operator of a query evaluation plan is combined in a pipeline with its inputs. This can be quite useful if the results are displayed to a user as they are generated, since otherwise there may be a long delay before the user sees any query results.
Implementation of pipelining
Pipelines can be executed in either of two ways:
a. Demand-driven (or Lazy evaluation) Pipelining: In this method, the result of lower-level queries is not passed to the higher level automatically. It is passed to the higher level only when it is requested by the higher level. Each level retains its result values, and they are transferred to the next level only when requested.
b. Producer-driven (or Eager) Pipelining: In this method, the lower-level queries eagerly pass the results to higher-level queries. They do not wait for the higher-level queries to request the results. The lower-level queries create a buffer to store the results, and the higher-level queries pull the results from the buffer for their use. If the buffer is full, then the lower-level query waits for the higher-level query to empty it.
Demand-driven and producer-driven pipelining are hence also called PULL and PUSH pipelining, respectively.
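Demand-driven (pull) pipelining maps naturally onto Python generators, where each operator produces a tuple only when its parent asks for one. The sketch below is illustrative: the operator names, the sample Department and Staff tuples, and the attribute names `building`, `dept_id` and `staff_name` are all assumptions, and the inner join input is simply held in a list.

```python
def scan(relation):
    """Leaf operator: yield stored tuples one at a time."""
    for t in relation:
        yield t


def select_op(predicate, child):
    """Pass on only the tuples that satisfy the predicate."""
    for t in child:
        if predicate(t):
            yield t


def join_op(left, right_relation, attr):
    """Join each incoming left tuple with the (in-memory) right relation."""
    right = list(right_relation)
    for t_l in left:
        for t_r in right:
            if t_l[attr] == t_r[attr]:
                yield {**t_l, **t_r}


def project_op(attrs, child):
    """Keep only the listed attributes."""
    for t in child:
        yield {a: t[a] for a in attrs}


# Hypothetical data for the Department and Staff relations.
department = [{"dept_id": 1, "building": "A"}, {"dept_id": 2, "building": "B"}]
staff = [{"staff_name": "Sita", "dept_id": 1},
         {"staff_name": "Ram", "dept_id": 2},
         {"staff_name": "Hari", "dept_id": 1}]

plan = project_op(
    ["staff_name"],
    join_op(select_op(lambda d: d["building"] == "A", scan(department)),
            staff, "dept_id"))

first = next(plan)   # only now do the lower operators do any work
rest = list(plan)    # pull the remaining results
print(first, rest)
```

No intermediate relation is stored: the first result is available as soon as one selected Department tuple has been joined and projected.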
QUERY OPTIMIZATION
The function of the query optimization engine is to find an evaluation plan that reduces the overall execution cost of a query. We have seen in the previous sections that the costs for performing particular operations such as select and join can vary quite dramatically.
Example 3.3: Consider 2 relations r and s, with the following characteristics:
$n_r$ = 10,000 = Number of tuples in r
$n_s$ = 1,000 = Number of tuples in s
$b_r$ = 1,000 = Number of blocks with tuples in r
$b_s$ = 100 = Number of blocks with tuples in s
Selecting a single record from r on a non-key attribute can have,
* a cost of $\lceil \log_2(b_r) \rceil = 10$ (binary search) or
* a cost of $b_r / 2 = 500$ (linear search).
Joining r and s can have,
* a cost of $n_r \times b_s + b_r = 1{,}001{,}000$ (nested-loop join) or
* a cost of $3(b_r + b_s) + 4 n_h = 43{,}300$ (hash join, where $n_h = 10{,}000$).
Notice that the costs of the 2 selects differ by a factor of 50, and those of the 2 joins by a factor of ~23. Clearly, selecting lower-cost methods can result in substantially better performance.
Query optimization strategies for lowering the cost of queries include: cost-based optimization, heuristic-based optimization and semantic-based optimization.
Cost-based Optimization
This process of selecting a lower-cost mechanism is known as cost-based optimization. It is based on the cost of the query. The query can use different paths based on indexes, constraints, sorting methods, etc. This method mainly uses statistics like record size, number of records, number of records per block, number of blocks, table size, whether the whole table fits in a block, organization of tables, uniqueness of column values, etc. Some of the features of cost-based optimization are as follows:
* It is based on the cost of the query that is to be optimized.
* The query can use a lot of paths based on the value of indexes, available sorting methods, constraints, etc.
* The aim of query optimization is to choose the most efficient path of implementing the query at the lowest possible cost in the form of an algorithm.
* The cost of executing the algorithm needs to be provided by the query optimizer so that the most suitable query can be selected for an operation.
* The cost of an algorithm also depends upon the cardinality of the input.
Heuristic-based Optimization
Heuristic optimization transforms the query tree by using a set of rules (heuristics) that typically (but not in all cases) improve execution performance. Some common heuristic rules are:
* Perform selection early (reduces the number of tuples)
* Perform projection early (reduces the number of attributes)
* Perform the most restrictive selection and join operations (i.e., those with the smallest result size) before other similar operations
Initially, a query tree is generated from the SQL statement. The query tree is then transformed into a more efficient query tree via a series of tree modifications, each of which hopefully reduces the execution time. A single query tree is produced at the end.
Semantic-based Optimization
This strategy uses constraints specified on the database schema (such as unique attributes and other more complex constraints) in order to modify one query into another query that is more efficient to execute.
Example 3.4: Consider the following SQL query:
SELECT e.lname, m.lname
FROM EMPLOYEE as e, EMPLOYEE as m
WHERE e.super_ssn=m.ssn and e.salary>m.salary;
This query retrieves the names of employees who earn more than their supervisors. Suppose that we had a constraint on the database schema that stated that no employee can earn more than his or her direct supervisor. If the semantic query optimizer checks for the existence of this constraint, it does not need to execute the query at all because it knows that the result of the query will be empty. This may save considerable time if the constraint checking can be done efficiently. However, searching through many constraints to find those that are applicable to a given query and that may semantically optimize it can also be quite time-consuming. With the inclusion of active rules and additional metadata in database systems, semantic query optimization techniques are being gradually incorporated into the DBMS.
The optimizer can transform the expression and tree into equivalent forms.
Example 3.5: Consider the following SQL query:
SELECT Stu_name, Marks_Obtained
FROM Student, Marks
WHERE Stu_id = 10 AND Sub_id = 20;
The corresponding relational algebra expression is:
$\Pi_{\text{Stu\_name},\, \text{Marks\_Obtained}}(\sigma_{\text{Stu\_id} = 10}(\sigma_{\text{Sub\_id} = 20}(\text{Student} \times \text{Marks})))$
[Figure: expression tree with the projection at the root, the two selections below it, and the Cartesian product of Student and Marks at the bottom]
Figure 3.7: Initial expression tree
Suppose the Student and Marks relations both have 100 records each and the number of records with Stu_id=10 is 50. Note that the Cartesian product resulting in 10,000 records can be reduced by 50% if the $\sigma_{\text{Stu\_id} = 10}$ operation is performed first. We can also combine the Sub_id=20 and Cartesian product operations into a more efficient join operation, as well as eliminating any unneeded columns before the expensive join is performed. The diagram below shows this better, "optimized" version of the tree.
[Figure: transformed expression tree with the selections and projections pushed below the join]
Figure 3.8: Tree after transformation (optimized tree of figure 3.7)
In relational algebra, there are several definitions and theorems the query optimizer can use to transform the query. For instance, the definition of equivalent relations states that the set of attributes (domain) of each relation must be the same; because they are sets, the order does not matter. Here is a partial list of relational algebra theorems:
1. Cascade of σ: A select with conjunctive conditions on the attribute list is equivalent to a cascade of selects upon selects:
   $\sigma_{A_1 \wedge A_2 \wedge \dots \wedge A_n}(r) \equiv \sigma_{A_1}(\sigma_{A_2}(\dots(\sigma_{A_n}(r))\dots))$
2. Commutativity of σ: The select operation is commutative:
   $\sigma_{A_1}(\sigma_{A_2}(r)) \equiv \sigma_{A_2}(\sigma_{A_1}(r))$
3. Cascade of Π: A cascade of project operations is equivalent to the last project operation of the cascade:
   $\Pi_{AttList_1}(\Pi_{AttList_2}(\dots(\Pi_{AttList_n}(r))\dots)) \equiv \Pi_{AttList_1}(r)$
4. Commuting σ with Π: If the select condition c involves only the attributes $A_1, A_2, \dots, A_n$ of the project list, the Π and σ operations can be commuted:
   $\Pi_{A_1, A_2, \dots, A_n}(\sigma_c(r)) \equiv \sigma_c(\Pi_{A_1, A_2, \dots, A_n}(r))$
5. Commutativity of ⋈ (or ×): The join and Cartesian product operations are commutative:
   $r \bowtie s \equiv s \bowtie r$ and $r \times s \equiv s \times r$
6. Commuting σ with ⋈ (or ×): Select can be commuted with join (or Cartesian product) as follows:
   a. If all of the attributes in the select's condition are in relation r, then
      $\sigma_c(r \bowtie s) \equiv (\sigma_c(r)) \bowtie s$
   b. Given a select condition c composed of conditions c1 and c2, where c1 contains only attributes from r and c2 contains only attributes from s, then
      $\sigma_c(r \bowtie s) \equiv (\sigma_{c_1}(r)) \bowtie (\sigma_{c_2}(s))$
7. Commutativity of set operations (∪, ∩, −): Union and intersection operations are commutative; but the difference operation is not:
   $r \cup s \equiv s \cup r$, $r \cap s \equiv s \cap r$, $r - s \ne s - r$
8. Associativity of ⋈, ×, ∪ and ∩: All four of these operations are individually associative. Let θ be any one of these operators, then:
   $(r \mathbin{\theta} s) \mathbin{\theta} t \equiv r \mathbin{\theta} (s \mathbin{\theta} t)$
9. Commuting σ with set operations (∪, ∩, −): Let θ be any one of the three set operations, then:
   $\sigma_c(r \mathbin{\theta} s) \equiv (\sigma_c(r)) \mathbin{\theta} (\sigma_c(s))$
10. Commuting Π with ∪: Project and union operations can be commuted:
   $\Pi_{A_1, \dots, A_n}(r \cup s) \equiv (\Pi_{A_1, \dots, A_n}(r)) \cup (\Pi_{A_1, \dots, A_n}(s))$
Using these theorems, an algorithm can be defined to transform the original expression/tree created by the parser into a more optimized query. Some of the key concepts can be summarized as follows:
1. One primary objective is to reduce the size of the intermediate relations, both in terms of bytes per record as well as number of records, as soon as possible so that subsequent operations will have less data to process and thus execute quicker.
2. Operations, such as conjunctive selections, should be broken down into their equivalent set of smaller units to allow the individual units to be moved into "better" positions within the query tree.
3. Combine Cartesian products with corresponding selects to create joins; optimized join algorithms like the sort-merge join and hash join can be orders of magnitude more efficient.
4. Move selects and projects as far down the tree as possible, as these operations will produce smaller intermediate relations that can be processed more quickly by the operations above.
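The effect of moving selections below a Cartesian product can be checked on the 100-record Student and Marks relations of Example 3.5. The sketch below is illustrative: the generated tuples, the choice that half of the Marks records have Sub_id = 20, and the helper names `cross` and `select` are assumptions.

```python
from itertools import product


def cross(r, s):
    """Cartesian product of two relations given as lists of dicts."""
    return [{**a, **b} for a, b in product(r, s)]


def select(predicate, relation):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# Hypothetical data: 100 students (50 with Stu_id 10), 100 marks records.
student = [{"Stu_id": 10 if i < 50 else 11, "Stu_name": f"S{i}"}
           for i in range(100)]
marks = [{"Sub_id": 20 if j % 2 == 0 else 21, "Marks_Obtained": j}
         for j in range(100)]


def is_stu_10(t):
    return t["Stu_id"] == 10


def is_sub_20(t):
    return t["Sub_id"] == 20


# Initial tree: both selections sit above the Cartesian product.
full_product = cross(student, marks)
late = select(is_stu_10, select(is_sub_20, full_product))

# Transformed tree: each selection is pushed to its own input.
half_product = cross(select(is_stu_10, student), marks)
early = cross(select(is_stu_10, student), select(is_sub_20, marks))

print(len(full_product), len(half_product), len(early))
print(late == early)
```

Both trees return the same tuples, but the intermediate result shrinks from 10,000 records to 5,000 when one selection is pushed down, and further when both are.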
Choice of Evaluation Plans
The query optimization engine typically generates a set of candidate evaluation plans. Some will, in heuristic theory, produce a faster, more efficient execution. Others may, by prior historical results, be more efficient than the theoretical models; this can very well be the case for queries dependent on the semantic nature of the data to be processed. Still others can be more efficient due to "outside agencies" such as network congestion, competing applications on the same CPU, etc. The result is a set of plans from which the query execution engine can choose the evaluation plan to execute at any given time.

1. Explain query processing in detail with example.
2. What are query optimization techniques? Explain.
3. How are query processing and query optimization related?
4. Write the appropriate relational-algebra expression of the following SQL query and draw their query trees.
   SELECT Stu_name, Dept_id
   FROM Student
   WHERE Dept_id <= 2;
5. How do you measure the cost of query? Explain.
6. Define access path. Write the formula to calculate the cost of searching algorithm selections using indices.
7. Explain the external sort-merge algorithm with suitable example.
8. What are the main strategies for implementing the Join operation?
9. What do you mean by evaluation expression?
10. Explain how materialization evaluation works with example.
11. Explain pipelining approach of evaluation of expression in detail.
12. Contrast cost estimation and heuristic rules with regard to query optimization.
13. Discuss semi-join and antijoin as operations to which nested queries may be mapped; provide an example of each.
14. How outer join and non-equi join ...
15. How does a query tree represent a relational algebra expression? What is meant by an execution of a query tree? Discuss the rules for transformation of query trees, and identify when each rule should be applied during optimization.
16. What is meant by semantic query optimization? How does it differ from other query optimization techniques?
17. What is the difference between pipelining and materialization?
18. What are the problems associated with keeping views materialized?
19. What do you mean by query processing? What are the various steps involved in query processing? Explain with the help of a block diagram.
20. Discuss the cost components for a cost function that is used to estimate query execution cost. Which cost components are used most often as the basis for cost functions?