
Lec 17

The document discusses query optimization in relational database management systems (DBMS), focusing on finding the best query execution plan through cost estimation and search algorithms. It highlights the challenges of enumerating multiple execution plans, particularly for single and multiple-table queries, and emphasizes the use of dynamic programming to reduce search space. The query optimizer is identified as the most complex component of DBMS, tasked with exploring alternative plans and estimating their costs.


Advanced Database Systems

Spring 2025

Lecture #17:
Query Optimisation: Searching

R&G: Chapter 15

QUERY OPTIMISATION
Plan space

Cost estimation

Search algorithm

FINDING THE “BEST” QUERY PLAN


Holy grail of any DBMS implementation

Challenge: There may be more than one way to answer a given query
Which one of the join operators should we pick?

With which parameters (block size, buffer allocation, …)?

Which join ordering?



FINDING THE “BEST” QUERY PLAN


The query optimiser
1. Enumerates all possible query execution plans
If this yields too many plans, at least enumerate the “promising” plan candidates

2. Determines the cost (quality) of each plan

3. Chooses the best one as the final execution plan

Ideally: Want to find the best plan. Practically: Avoid worst plans!

ENUMERATION OF ALTERNATIVE PLANS


There are two main cases:
Single-table plans (base case)

Multiple-table plans (induction)

Single-table queries include selects, projects, and group-by / aggregate


Consider each available access path (file scan vs. index)
Choose the one with the least estimated cost

SINGLE-TABLE PLANS: COST ESTIMATES

Index I on primary key matches selection:
Cost is (Height(I) + 1) + 1 for a B+ tree (variant B or C)

Clustered index I matching selection:
(NPages(I) + NPages(R)) * selectivity (approximately)

Non-clustered index I matching selection:
(NPages(I) + NTuples(R)) * selectivity (approximately)

Sequential scan of file:
NPages(R)

Recall: Must also charge for duplicate elimination if required



SINGLE-TABLE PLAN: EXAMPLE

SELECT * FROM Sailors
WHERE rating = 8

Statistics:
NTuples(Sailors) = 40,000
NPages(Sailors) = 500
NKeys(rating) = 10
NPages(I) = 50

If we have an index I on rating:
Cardinality
= 1/NKeys(rating) · NTuples(Sailors) = 1/10 · 40,000 = 4000 tuples
Clustered index
1/NKeys(rating) · (NPages(I) + NPages(Sailors)) = 1/10 · (50 + 500) = 55 pages are retrieved
Unclustered index
1/NKeys(rating) · (NPages(I) + NTuples(Sailors)) = 1/10 · (50 + 40,000) = 4005 pages are retrieved

Costs on indexes are approximate as we might not need to retrieve all index pages

If we have an index I on sid:


Doing an index scan retrieves all pages & tuples
Clustered index: ~ (50 + 500) pages retrieved. Unclustered index: ~ (50 + 40,000) pages retrieved

Doing a file scan retrieves all file pages: 500
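
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the approximate formulas from the cost-estimate slide: the `Stats` class and the function names are made up for illustration, and the selectivity of the equality predicate is assumed to be 1/NKeys (uniformly distributed values).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Catalog statistics (class and field names are illustrative)."""
    n_tuples: int       # NTuples(R)
    n_pages: int        # NPages(R)
    n_index_pages: int  # NPages(I)
    n_keys: int         # NKeys(attribute), distinct values of the attribute


def cardinality(s: Stats) -> float:
    # Equality selection: selectivity is assumed to be 1 / NKeys.
    return s.n_tuples / s.n_keys


def clustered_index_cost(s: Stats) -> float:
    # (NPages(I) + NPages(R)) * selectivity
    return (s.n_index_pages + s.n_pages) / s.n_keys


def unclustered_index_cost(s: Stats) -> float:
    # (NPages(I) + NTuples(R)) * selectivity: up to one page fetch per tuple
    return (s.n_index_pages + s.n_tuples) / s.n_keys


def file_scan_cost(s: Stats) -> int:
    # Sequential scan reads every page of the file.
    return s.n_pages


sailors = Stats(n_tuples=40_000, n_pages=500, n_index_pages=50, n_keys=10)
print(cardinality(sailors))             # 4000.0 tuples
print(clustered_index_cost(sailors))    # 55.0 pages
print(unclustered_index_cost(sailors))  # 4005.0 pages
print(file_scan_cost(sailors))          # 500 pages
```

With these statistics the unclustered index on rating (4005 pages) is far more expensive than a plain file scan (500 pages), while the clustered index (55 pages) is the cheapest access path.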



MULTIPLE-TABLE PLANS


We have translated the query into a graph of query blocks
Query blocks are essentially a multi-way product of relations with projections on top

Task: enumerate all possible execution plans


I.e., all possible 2-way join combinations for each query block

Example: three-way join
12 possible re-orderings, 2 shown here:
(R ⋈ S) ⋈ T
S ⋈ (T ⋈ R)

ENORMOUS SEARCH SPACE

# of relations n    # of different join trees
2                   2
3                   12
4                   120
5                   1,680
6                   30,240
7                   665,280
8                   17,297,280
10                  17,643,225,600

We have not even considered different join algorithms!

We need to restrict the search space!
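
The numbers in the table match the closed form (2(n-1))! / (n-1)!, which counts all binary join trees over n relations when the left and right inputs of each join are distinguished. The slide does not state this formula, so the sketch below is an observation that reproduces the table; the function name is illustrative.

```python
from math import factorial


def join_tree_count(n: int) -> int:
    """Number of ordered binary join trees over n relations: (2(n-1))! / (n-1)!."""
    if n < 1:
        raise ValueError("need at least one relation")
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (2, 3, 4, 5, 6, 7, 8, 10):
    print(n, f"{join_tree_count(n):,}")
```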

MULTIPLE-TABLE QUERY PLANNING

Fundamental decision in IBM’s System R (late 1970s):
Only consider left-deep join trees

✓ left-deep: ((R ⋈ S) ⋈ T) ⋈ U
✗ bushy (everything else): (R ⋈ S) ⋈ (T ⋈ U)
✗ right-deep: T ⋈ (U ⋈ (S ⋈ R))

LEFT-DEEP JOIN TREES

DBMSs often prefer left-deep join trees

The inner (rhs) relation is always a base relation
Allows the use of index nested loops join
Allows for fully pipelined plans where intermediate results are not written to temporary files
Should be factored into global cost calculation

Example: ((R ⋈ S) ⋈ T) ⋈ U

Not all left-deep trees are fully pipelined (e.g., sort-merge join)

Pipelining requires non-blocking operators

Modern DBMSs may also consider non left-deep join trees



MULTI-TABLE QUERY PLANNING

System R-style join order enumeration
Left-deep tree #1, Left-deep tree #2…
E.g., ((R ⋈ S) ⋈ T) ⋈ U and ((S ⋈ T) ⋈ U) ⋈ R
Eliminate plans with cross products immediately

Enumerate the plans for each operator
Hash, Sort-Merge, Nested Loop…

Enumerate the access paths for each table
Index #1, Index #2, Sequential scan…

Use dynamic programming to reduce the number of cost estimations



THE PRINCIPLE OF OPTIMALITY
The best overall plan is composed of best decisions on the subplans
Optimal result has optimal substructure

For example, the best left-deep plan to join tables R, S, T is either:


(The best plan for joining R, S) ⨝ T

(The best plan for joining R, T) ⨝ S

(The best plan for joining S, T) ⨝ R

This is great!
When optimising a subplan (e.g., R ⨝ S), don’t worry how it will be used later (e.g., when joining with T)!

When optimising a higher-level plan (e.g., R ⨝ S ⨝ T), reuse the best results of subplans (e.g., R ⨝ S)!
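
The optimal-substructure idea can be sketched as a memoised recursion over sets of relations. Everything concrete here is an assumption for illustration: the cardinalities, the join selectivities, and the toy cost model (cost of a plan = sum of the sizes of its intermediate results). A real optimiser would also cost access paths and join algorithms.

```python
from fractions import Fraction
from functools import lru_cache
from math import prod

# Hypothetical statistics, not taken from the lecture.
CARD = {"R": 1000, "S": 100, "T": 10}
SEL = {frozenset("RS"): Fraction(1, 100),  # predicate R.A = S.A
       frozenset("ST"): Fraction(1, 10)}   # predicate S.B = T.B


def result_size(rels: frozenset) -> Fraction:
    """Estimated size of joining rels (a cross product if no predicate applies)."""
    size = Fraction(prod(CARD[r] for r in rels))
    for pair, sel in SEL.items():
        if pair <= rels:
            size *= sel
    return size


@lru_cache(maxsize=None)
def best_left_deep(rels: frozenset) -> tuple:
    """Return (cost, plan) of the cheapest left-deep plan joining rels."""
    if len(rels) == 1:
        (r,) = rels
        return (Fraction(0), r)
    candidates = []
    for inner in sorted(rels):
        # Reuse the best plan for the remaining relations (computed once, cached).
        sub_cost, sub_plan = best_left_deep(rels - {inner})
        cost = sub_cost + result_size(rels)
        candidates.append((cost, f"({sub_plan} ⨝ {inner})"))
    return min(candidates, key=lambda c: c[0])


cost, plan = best_left_deep(frozenset("RST"))
print(cost, plan)  # cost 1100: join S and T first, then add R
```

Each subset of relations is optimised exactly once, no matter how many larger plans reuse it, which is where the savings over enumerating every join order come from.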

EXAMPLE: DYNAMIC PROGRAMMING

SELECT * FROM R, S, T
WHERE R.A = S.A
AND S.B = T.B

Pass #1 (best 1-relation plans): Find best access path to each relation (index vs. full table scans)

Search graph: {R, S, T} → {R⋈S, T} or {T⋈S, R} → {R⋈S⋈T}

EXAMPLE: DYNAMIC PROGRAMMING

SELECT * FROM R, S, T
WHERE R.A = S.A
AND S.B = T.B

Pass #2 (best 2-relation plans): determine best join order (R ⨝ S or S ⨝ R), choose best candidate

Candidates from {R, S, T} to {R⋈S, T}:
Hash Join R.a = S.a
Sort-Merge Join R.a = S.a

Candidates from {R, S, T} to {T⋈S, R}:
Sort-Merge Join S.b = T.b
Hash Join T.b = S.b

Retained candidates:
{R⋈S, T}: Hash Join R.a = S.a
{T⋈S, R}: Hash Join T.b = S.b

EXAMPLE: DYNAMIC PROGRAMMING

SELECT * FROM R, S, T
WHERE R.A = S.A
AND S.B = T.B

Pass #3 (best 3-relation plans): best 2-relation plans + one other relation

Candidates from {R⋈S, T} (Hash Join R.a = S.a) to {R⋈S⋈T}:
Hash Join S.b = T.b
Sort-Merge Join S.b = T.b

Candidates from {T⋈S, R} (Hash Join T.b = S.b) to {R⋈S⋈T}:
Sort-Merge Join S.a = R.a
Hash Join S.a = R.a

Retained candidates:
From {R⋈S, T}: Hash Join S.b = T.b
From {T⋈S, R}: Sort-Merge Join S.a = R.a

Final plan:
{R, S, T} → {T⋈S, R} via Hash Join T.b = S.b → {R⋈S⋈T} via Sort-Merge Join S.a = R.a

INTERESTING ORDERS
System R-style query optimisers also consider interesting orders
Sorting orders of the input tables that may be beneficial later in the query plan
E.g., for a sort-merge join, projection with duplicate removal, order-by clause

Determined by ORDER BY and GROUP BY clauses in the input query or join attributes of subsequent joins (to facilitate merging)

For each subset of relations, retain only:


Cheapest plan overall, plus

Cheapest plan for each interesting order of the tuples
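
The retention rule above can be sketched as a small pruning function. The `Plan` tuple, the `prune` function, and all costs are hypothetical and only illustrate the rule: for each subset of relations, keep the cheapest plan overall plus the cheapest plan per interesting order.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    relations: frozenset   # set of relations the plan joins
    order: Optional[str]   # attribute the output is sorted on, None if unsorted
    cost: float
    description: str


def prune(plans, interesting_orders):
    """Keep the cheapest plan per relation set, plus the cheapest per interesting order."""
    best = {}
    for p in plans:
        keys = [(p.relations, None)]  # every plan competes for "cheapest overall"
        if p.order in interesting_orders:
            keys.append((p.relations, p.order))
        for key in keys:
            if key not in best or p.cost < best[key].cost:
                best[key] = p
    # A plan may win more than one slot; return each retained plan once.
    return list(dict.fromkeys(best.values()))


reserves = frozenset({"Reserves"})
candidates = [
    Plan(reserves, None, 1000, "file scan"),         # made-up costs
    Plan(reserves, "bid", 1200, "index scan on bid"),
    Plan(reserves, "sid", 1500, "index scan on sid"),
    Plan(reserves, "sid", 1700, "sort on sid after file scan"),
]
for plan in prune(candidates, {"bid", "sid"}):
    print(plan.description, plan.cost)
```

Here the two index scans survive even though the file scan is cheaper, because their sort orders may save a sort later in the plan; the more expensive plan sorted on sid is dropped.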



EXAMPLE

SELECT S.sid, COUNT(*) AS number
FROM Sailors S
JOIN Reserves R ON S.sid = R.sid
JOIN Boats B ON R.bid = B.bid
WHERE B.color = 'red'
GROUP BY S.sid

Available indexes:
Sailors: B+ tree on sid
Reserves: Clustered B+ tree on bid, B+ tree on sid
Boats: B+ tree on color

Pass 1: Best plan for each relation
Sailors, Reserves: File scan
Boats: B+ tree on color
Also B+ tree on Sailors.sid as interesting order (output sorted on sid)
Also B+ tree on Reserves.bid as interesting order (output sorted on bid)
Also B+ tree on Reserves.sid as interesting order (output sorted on sid)

EXAMPLE: PASS 2
Pass 2: Best 2-relation plans

// for each left-deep logical plan
foreach plan P in Pass 1:
    foreach FROM table T not in P:
        // for each physical plan
        foreach access method M on T:
            foreach join method ⨝:
                generate P ⨝ M(T)

Eliminate cross products


Retain cheapest plan for each (pair of relations, order)
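
A runnable sketch of the Pass 2 generation loop for the Sailors / Reserves / Boats query, including cross-product elimination. The access-method labels and the list of join methods are assumptions chosen for illustration, and costing and the "retain cheapest" step are deliberately left out.

```python
from itertools import product

# Join predicates of the example query: S.sid = R.sid and R.bid = B.bid.
JOIN_EDGES = {frozenset({"Sailors", "Reserves"}), frozenset({"Reserves", "Boats"})}

# Hypothetical labels for the access paths of each table.
ACCESS_METHODS = {
    "Sailors": ["file scan", "B+ tree on sid"],
    "Reserves": ["file scan", "clustered B+ tree on bid", "B+ tree on sid"],
    "Boats": ["file scan", "B+ tree on color"],
}
JOIN_METHODS = ["nested loops", "sort-merge", "hash"]

# Pass 1 output as listed in the example: best plans plus interesting orders.
PASS1_PLANS = [
    ("Sailors", "file scan"),
    ("Sailors", "B+ tree on sid"),
    ("Reserves", "file scan"),
    ("Reserves", "clustered B+ tree on bid"),
    ("Reserves", "B+ tree on sid"),
    ("Boats", "B+ tree on color"),
]


def pass2_candidates(pass1_plans):
    """Yield (outer table, outer access, join method, inner table, inner access)."""
    for outer, outer_access in pass1_plans:
        for inner in ACCESS_METHODS:
            if inner == outer:
                continue
            if frozenset({outer, inner}) not in JOIN_EDGES:
                continue  # no join predicate: this would be a cross product
            for inner_access, join in product(ACCESS_METHODS[inner], JOIN_METHODS):
                yield (outer, outer_access, join, inner, inner_access)


candidates = list(pass2_candidates(PASS1_PLANS))
print(len(candidates))  # 63 candidate 2-relation plans under these assumptions
```

Each candidate would then be costed, and only the cheapest plan per (pair of relations, order) kept for Pass 3.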

EXAMPLE: PASS 3
Using Pass 2 plans as outer relations, generate plans for the next join in the same way as Pass 2

Example: the marked subplan is the best plan for { Reserves, Boats } and provides an interesting order on Boats.bid and Reserves.bid

Plan tree:
⋈ sid=sid (INDEX NESTED LOOPS)
    ⋈ bid=bid (SORT MERGE)
        σ color='red' on Boats (INDEX SCAN)
        Reserves (SCAN)
    Sailors (INDEX SCAN)

Then, add cost for group-by / aggregate:
This is the cost to sort the result by sid
… unless it has already been sorted by a previous operator

Finally, choose the cheapest plan



SUMMARY
Query optimisation is an important task in a relational DBMS

Explores a set of alternative plans


Must prune search space; typically, left-deep plans only
Uses dynamic programming for join orderings

Must estimate cost of each plan that is considered


Must estimate the size of result and cost for each plan node

Query optimiser is the most complex part of database systems!
