Advanced Database Systems
Spring 2025
Lecture #17:
Query Optimisation: Searching
R&G: Chapter 15
QUERY OPTIMISATION
Plan space
Cost estimation
Search algorithm
FINDING THE “BEST” QUERY PLAN
Holy grail of any DBMS implementation
Challenge: There may be more than one way to answer a given query
Which one of the join operators should we pick?
With which parameters (block size, buffer allocation, …)?
Which join ordering?
The query optimiser
1. Enumerates all possible query execution plans
If this yields too many plans, at least enumerate the “promising” plan candidates
2. Determines the cost (quality) of each plan
3. Chooses the best one as the final execution plan
Ideally: Want to find the best plan. Practically: Avoid worst plans!
ENUMERATION OF ALTERNATIVE PLANS
There are two main cases:
Single-table plans (base case)
Multiple-table plans (induction)
Single-table queries include selects, projects, and group-by / aggregate
Consider each available access path (file scan vs. index)
Choose the one with the least estimated cost
SINGLE-TABLE PLANS: COST ESTIMATES
Index I on primary key matches selection:
Cost is (Height(I) + 1) + 1 for a B+ tree (variant B or C): (Height(I) + 1) index page I/Os to reach the leaf, plus 1 I/O to fetch the data page
Clustered index I matching selection:
(NPages(I) + NPages(R)) * selectivity (approximately)
Non-clustered index I matching selection:
(NPages(I) + NTuples(R)) * selectivity (approximately)
Sequential scan of file
NPages(R)
Recall: Must also charge for duplicate elimination if required
SINGLE-TABLE PLAN: EXAMPLE
SELECT * FROM Sailors
WHERE rating = 8
Statistics: NTuples(Sailors) = 40,000; NPages(Sailors) = 500; NKeys(rating) = 10; NPages(I) = 50
If we have an index I on rating:
Cardinality = 1/NKeys(rating) · NTuples(Sailors) = 1/10 · 40,000 = 4,000 tuples
Clustered index
1/NKeys(rating) · (NPages(I) + NPages(Sailors)) = 1/10 · (50 + 500) = 55 pages are retrieved
Unclustered index
1/NKeys(rating) · (NPages(I) + NTuples(Sailors)) = 1/10 · (50 + 40,000) = 4,005 pages are retrieved
Costs on indexes are approximate, as we might not need to retrieve all index pages
If we have an index I on sid (which does not match the selection on rating):
Doing an index scan retrieves all pages & tuples
Clustered index: ~ (50 + 500) pages retrieved. Unclustered index: ~ (50 + 40,000) pages retrieved
Doing a file scan retrieves all file pages: 500
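The cost formulas and the rating = 8 example can be checked with a short sketch. It assumes, as the slides do, that the selectivity of an equality predicate is 1/NKeys and that the cheapest estimated access path is chosen; the function names are illustrative only. Note that with an unclustered index on rating, the file scan wins.

```python
from fractions import Fraction

def clustered_index_cost(npages_i, npages_r, sel):
    # (NPages(I) + NPages(R)) * selectivity: matching tuples are stored together
    return (npages_i + npages_r) * sel

def unclustered_index_cost(npages_i, ntuples_r, sel):
    # (NPages(I) + NTuples(R)) * selectivity: roughly one page I/O per matching tuple
    return (npages_i + ntuples_r) * sel

def file_scan_cost(npages_r):
    # Sequential scan reads every page of the file
    return npages_r

# Statistics from the example; selectivity of rating = 8 is 1 / NKeys(rating)
NTUPLES, NPAGES, NKEYS, NPAGES_I = 40_000, 500, 10, 50
sel = Fraction(1, NKEYS)

with_clustered = {"clustered index on rating": clustered_index_cost(NPAGES_I, NPAGES, sel),
                  "file scan": file_scan_cost(NPAGES)}
with_unclustered = {"unclustered index on rating": unclustered_index_cost(NPAGES_I, NTUPLES, sel),
                    "file scan": file_scan_cost(NPAGES)}

for paths in (with_clustered, with_unclustered):
    best = min(paths, key=paths.get)  # least estimated cost wins
    print(best, paths[best])
# clustered index on rating 55
# file scan 500
```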
MULTIPLE-TABLE PLANS
We have translated the query into a graph of query blocks
Query blocks are essentially a multi-way product of relations with projections on top
Task: enumerate all possible execution plans
I.e., all possible 2-way join combinations for each query block
Example: three-way join
12 possible re-orderings
Two shown here: (R ⋈ S) ⋈ T and S ⋈ (T ⋈ R)
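The 12 re-orderings of a three-way join can be enumerated directly. This sketch (the function `join_trees` is hypothetical) builds every binary join tree as nested tuples, treating A ⋈ B and B ⋈ A as different trees because the outer/inner roles differ.

```python
def join_trees(relations):
    """Yield every binary join tree over `relations` as nested tuples.
    (A, B) and (B, A) count as different trees, since outer/inner roles differ."""
    rels = tuple(relations)
    if len(rels) == 1:
        yield rels[0]
        return
    n = len(rels)
    # Each bitmask picks a non-empty, proper subset for the left input of the root join
    for mask in range(1, 2 ** n - 1):
        left = [r for i, r in enumerate(rels) if mask >> i & 1]
        right = [r for i, r in enumerate(rels) if not mask >> i & 1]
        for lt in join_trees(left):
            for rt in join_trees(right):
                yield (lt, rt)

trees = list(join_trees("RST"))
print(len(trees))  # 12
```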
ENORMOUS SEARCH SPACE
# of relations (n)    # of different join trees
2                     2
3                     12
4                     120
5                     1,680
6                     30,240
7                     665,280
8                     17,297,280
10                    17,643,225,600
We have not even considered different join algorithms!
We need to restrict the search space!
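The numbers in the table follow the closed form (2(n−1))!/(n−1)!: there are n! orderings of the leaves and Catalan(n−1) binary tree shapes. A short sketch (the name `num_join_trees` is illustrative) reproduces the table:

```python
from math import factorial

def num_join_trees(n):
    # n! orderings of the leaves times Catalan(n - 1) binary tree shapes
    # = n! * (2(n-1))! / ((n-1)! * n!) = (2(n-1))! / (n-1)!
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (2, 3, 4, 5, 6, 7, 8, 10):
    print(f"{n:>2}  {num_join_trees(n):,}")
```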
MULTIPLE-TABLE QUERY PLANNING
Fundamental decision in IBM’s System R (late 1970s):
Only consider left-deep join trees
✓ Left-deep: ((R ⋈ S) ⋈ T) ⋈ U
✗ Bushy (everything else): (R ⋈ S) ⋈ (T ⋈ U)
✗ Right-deep: T ⋈ (U ⋈ (S ⋈ R))
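Restricting to left-deep trees shrinks the space considerably: a left-deep tree is fully determined by the order in which the base relations are joined, so there are n! of them. The sketch below (the generator `left_deep_trees` is hypothetical) enumerates them as nested tuples.

```python
from functools import reduce
from itertools import permutations

def left_deep_trees(relations):
    # Each permutation (r1, ..., rn) fixes exactly one left-deep tree
    # ((r1 ⋈ r2) ⋈ r3) ⋈ ... ⋈ rn, whose inner inputs are all base relations.
    for order in permutations(relations):
        yield reduce(lambda outer, inner: (outer, inner), order)

trees = list(left_deep_trees("RSTU"))
print(len(trees))  # 24 = 4!, versus 120 join trees in total for n = 4
print(trees[0])    # ((('R', 'S'), 'T'), 'U')
```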
LEFT-DEEP JOIN TREES
DBMSs often prefer left-deep join trees, e.g., ((R ⋈ S) ⋈ T) ⋈ U
The inner (right-hand side) relation is always a base relation
Allows the use of index nested loops join
Allows for fully pipelined plans, where intermediate results are not written to temporary files
This should be factored into the global cost calculation
Not all left-deep trees are fully pipelined (e.g., sort-merge join)
Pipelining requires non-blocking operators
Modern DBMSs may also consider non-left-deep join trees
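The two advantages above can be illustrated with Python generators. This is a minimal sketch, not a DBMS implementation: a dict-based "index" stands in for an existing B+ tree on each base inner relation, and all names and data are hypothetical. Each join yields tuples one at a time, so the next join consumes them immediately and no intermediate result is materialised.

```python
from collections import defaultdict

def scan(table):
    # Base-table access path: stream rows one at a time
    yield from table

def build_index(table, key):
    # Hypothetical stand-in for an existing index on a base relation:
    # maps a key value to the list of rows with that value
    index = defaultdict(list)
    for row in table:
        index[row[key]].append(row)
    return index

def index_nested_loops(outer, inner_index, key):
    # For each outer tuple, probe the inner base relation's index.
    # Output is produced tuple by tuple, so the next join can consume it
    # immediately (pipelining) without writing a temporary file.
    for o in outer:
        for i in inner_index.get(o[key], []):
            yield {**o, **i}

R = [{"a": 1, "r": "r1"}, {"a": 2, "r": "r2"}]
S = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
T = [{"b": 10, "t": "t1"}]

# Left-deep plan (R ⋈ S) ⋈ T: each inner (S, then T) is a base relation with an index
plan = index_nested_loops(
    index_nested_loops(scan(R), build_index(S, "a"), "a"),
    build_index(T, "b"), "b")
result = list(plan)
print(result)  # [{'a': 1, 'r': 'r1', 'b': 10, 't': 't1'}]
```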
MULTI-TABLE QUERY PLANNING
System R-style join order enumeration
Left-deep tree #1, Left-deep tree #2… (e.g., ((R ⋈ S) ⋈ T) ⋈ U and ((S ⋈ T) ⋈ U) ⋈ R)
Eliminate plans with cross products immediately
Enumerate the plans for each operator
Hash, Sort-Merge, Nested Loop…
Enumerate the access paths for each table
Index #1, Index #2, Sequential scan…
Use dynamic programming to reduce the number of cost estimations
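Cross-product elimination can be sketched as a filter over left-deep join orders: each newly added inner relation must share a join predicate with the relations already joined. The join graph below is an assumption taken from the R, S, T query used in the dynamic programming example (R.A = S.A, S.B = T.B); the function name is hypothetical.

```python
from itertools import permutations

def connected_left_deep_orders(relations, join_edges):
    """Yield join orders (r1, ..., rn) for left-deep trees in which every newly
    added inner relation shares a join predicate with the outer subplan,
    i.e. no step is a cross product."""
    edges = {frozenset(e) for e in join_edges}
    for order in permutations(relations):
        joined = {order[0]}
        for rel in order[1:]:
            if not any(frozenset((rel, j)) in edges for j in joined):
                break  # rel would be joined via a cross product
            joined.add(rel)
        else:
            yield order

orders = list(connected_left_deep_orders("RST", [("R", "S"), ("S", "T")]))
print(orders)  # [('R', 'S', 'T'), ('S', 'R', 'T'), ('S', 'T', 'R'), ('T', 'S', 'R')]
```

Only 4 of the 3! = 6 left-deep orders survive; orders starting with R ⋈ T or T ⋈ R are dropped.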
THE PRINCIPLE OF OPTIMALITY
The best overall plan is composed of best decisions on the subplans
Optimal result has optimal substructure
For example, the best left-deep plan to join tables R, S, T is one of:
(The best plan for joining R, S) ⨝ T
(The best plan for joining R, T) ⨝ S
(The best plan for joining S, T) ⨝ R
This is great!
When optimising a subplan (e.g., R ⨝ S), don’t worry how it will be used later (e.g., when joining with T)!
When optimising a higher-level plan (e.g., R ⨝ S ⨝ T), reuse the best results of subplans (e.g., R ⨝ S)!
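For left-deep plans, the principle of optimality can be written as a recurrence over sets of relations. Here best(𝒮) is the cheapest plan for the set 𝒮, the choice ranges over the last (inner) relation R and the set 𝒥 of available join methods; this notation is introduced only for illustration.

```latex
\begin{aligned}
\mathrm{best}(\{R\}) &= \text{cheapest access path for } R \\
\mathrm{best}(\mathcal{S}) &= \operatorname*{arg\,min}_{R \in \mathcal{S},\; \bowtie \in \mathcal{J}}
  \mathrm{cost}\big(\mathrm{best}(\mathcal{S} \setminus \{R\}) \bowtie R\big),
  \qquad |\mathcal{S}| \ge 2
\end{aligned}
```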
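EXAMPLE: DYNAMIC PROGRAMMING
SELECT * FROM R, S, T
WHERE R.A = S.A
AND S.B = T.B
Pass #1 (best 1-relation plans): Find the best access path to each relation (index vs. full table scan)
Search space: {R}, {S}, {T} → R ⋈ S (then add T) or T ⋈ S (then add R) → R ⋈ S ⋈ T
(R ⋈ T is not considered: there is no predicate between R and T, so it would be a cross product)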
Pass #2 (best 2-relation plans): Determine the best join order (R ⨝ S or S ⨝ R) and join method for each pair, and choose the best candidate
R ⋈ S: Hash Join (R.a = S.a) or Sort-Merge Join (R.a = S.a)
T ⋈ S: Sort-Merge Join (S.b = T.b) or Hash Join (T.b = S.b)
After pruning, only the cheapest plan for each pair remains:
R ⋈ S: Hash Join (R.a = S.a)
T ⋈ S: Hash Join (T.b = S.b)
Pass #3 (best 3-relation plans): Combine each best 2-relation plan with the one remaining relation
(R ⋈ S) ⋈ T: Hash Join (S.b = T.b) or Sort-Merge Join (S.b = T.b)
(T ⋈ S) ⋈ R: Sort-Merge Join (S.a = R.a) or Hash Join (S.a = R.a)
Keep the cheapest way to add the remaining relation to each 2-relation plan:
(R ⋈ S) ⋈ T: Hash Join (S.b = T.b), on top of Hash Join (R.a = S.a)
(T ⋈ S) ⋈ R: Sort-Merge Join (S.a = R.a), on top of Hash Join (T.b = S.b)
Finally, keep only the cheapest 3-relation plan:
(T ⋈ S) ⋈ R: Hash Join (T.b = S.b), followed by Sort-Merge Join (S.a = R.a)
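The three passes can be condensed into a small dynamic program over subsets of relations. This is a sketch under loud assumptions: all costs are hypothetical numbers chosen so that the outcome mirrors the example, the inner relation's access cost is simply added once per join, and interesting orders are ignored. Only the cheapest plan per subset is kept, and subsets reachable only through a cross product (here {R, T}) never get a plan.

```python
from itertools import combinations

# Hypothetical join predicates of the example query: R.A = S.A and S.B = T.B
EDGES = {frozenset("RS"), frozenset("ST")}
# Hypothetical cost of the best access path for each base relation (Pass #1)
ACCESS_COST = {"R": 100, "S": 50, "T": 20}
# Hypothetical join costs per (outer relations, inner relation) and join method
JOIN_COST = {
    (frozenset("R"), "S"): {"HJ": 30, "SMJ": 45},
    (frozenset("S"), "R"): {"HJ": 35, "SMJ": 50},
    (frozenset("S"), "T"): {"HJ": 25, "SMJ": 40},
    (frozenset("T"), "S"): {"HJ": 20, "SMJ": 35},
    (frozenset("RS"), "T"): {"HJ": 60, "SMJ": 70},
    (frozenset("ST"), "R"): {"HJ": 55, "SMJ": 40},
}

def joinable(outer, inner):
    # True if some join predicate links `inner` to a relation already in `outer`
    return any(frozenset((inner, o)) in EDGES for o in outer)

def best_plans(relations):
    # Pass #1: best single-relation plans
    best = {frozenset([r]): (ACCESS_COST[r], r) for r in relations}
    # Pass #2, #3, ...: best left-deep plan for each subset of k relations
    for k in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, k)):
            candidates = []
            for inner in subset:
                outer = subset - {inner}
                # Inner is a base relation; skip missing subplans and cross products
                if outer not in best or not joinable(outer, inner):
                    continue
                outer_cost, outer_plan = best[outer]
                for method, cost in JOIN_COST[(outer, inner)].items():
                    total = outer_cost + ACCESS_COST[inner] + cost
                    candidates.append((total, f"({outer_plan} {method} {inner})"))
            if candidates:
                best[subset] = min(candidates)  # prune: keep only the cheapest
    return best

plans = best_plans("RST")
print(plans[frozenset("RST")])  # (230, '((T HJ S) SMJ R)')
```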
INTERESTING ORDERS
System R-style query optimisers also consider interesting orders
Sorting orders of the input tables that may be beneficial later in the query plan
E.g., for a sort-merge join, projection with duplicate removal, order-by clause
Determined by ORDER BY and GROUP BY clauses in the input query or by join attributes of subsequent joins (to facilitate merging)
For each subset of relations, retain only:
Cheapest plan overall, plus
Cheapest plan for each interesting order of the tuples
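The retention rule can be sketched as a pruning function applied to the candidate plans for one subset of relations. The `Plan` record and the candidate costs below are hypothetical; the point is that a more expensive plan survives only if it is the cheapest one producing an interesting order.

```python
from typing import NamedTuple, Optional

class Plan(NamedTuple):
    cost: int
    order: Optional[str]  # attribute the output is sorted on, or None
    desc: str

def prune(plans, interesting_orders):
    """Plans for ONE subset of relations -> plans to retain:
    the cheapest overall, plus the cheapest for each interesting order."""
    kept = [min(plans, key=lambda p: p.cost)]
    for attr in interesting_orders:
        sorted_on_attr = [p for p in plans if p.order == attr]
        if sorted_on_attr:
            best = min(sorted_on_attr, key=lambda p: p.cost)
            if best not in kept:
                kept.append(best)
    return kept

# Hypothetical candidate plans for joining Reserves and Boats
candidates = [
    Plan(120, None, "hash join"),
    Plan(150, "bid", "sort-merge join on bid"),
    Plan(400, "bid", "nested loops over clustered index on bid"),
    Plan(130, "color", "index nested loops via color index"),
]
for p in prune(candidates, interesting_orders=["bid", "sid"]):
    print(p)
```

The plan sorted on color is dropped: it is neither the cheapest overall nor sorted on an interesting attribute.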
EXAMPLE
SELECT S.sid, COUNT(*) AS number
FROM Sailors S
JOIN Reserves R ON S.sid = R.sid
JOIN Boats B ON R.bid = B.bid
WHERE B.color = 'red'
GROUP BY S.sid
Available indexes:
Sailors: B+ tree on sid
Reserves: Clustered B+ tree on bid, B+ tree on sid
Boats: B+ tree on color
Pass 1: Best plan for each relation
Sailors, Reserves: File scan
Boats: B+ tree on color
Also B+ tree on Sailors.sid as interesting order (output sorted on sid)
Also B+ tree on Reserves.bid as interesting order (output sorted on bid)
Also B+ tree on Reserves.sid as interesting order (output sorted on sid)
EXAMPLE: PASS 2
Pass 2: Best 2-relation plans
// for each left-deep logical plan
foreach plan P in Pass 1:
    foreach FROM table T not in P:
        // for each physical plan
        foreach access method M on T:
            foreach join method ⨝:
                generate P ⨝ M(T)
Eliminate cross products
Retain cheapest plan for each (pair of relations, order)
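A runnable version of the Pass 2 loop is sketched below. The catalog, the cost model (a per-method overhead proportional to the inner input's cost) and the assumption that only sort-merge join produces sorted output are all hypothetical; Pass 1 is simplified to one cheapest plan per relation. The loop keeps the cheapest plan per (pair of relations, output order).

```python
from typing import NamedTuple, Optional

class Plan(NamedTuple):
    tables: frozenset      # relations covered by this (sub)plan
    order: Optional[str]   # attribute the output is sorted on, or None
    cost: int              # estimated cost (hypothetical units)
    desc: str              # human-readable plan

# Hypothetical catalog for the Sailors/Reserves/Boats example
JOIN_ATTR = {frozenset({"Sailors", "Reserves"}): "sid",
             frozenset({"Reserves", "Boats"}): "bid"}
ACCESS = {"Sailors": {"FileScan": 500},
          "Reserves": {"FileScan": 1000},
          "Boats": {"IdxScan": 10}}
# Hypothetical per-method overhead, proportional to the inner input's cost
OVERHEAD = {"HJ": 2, "SMJ": 3}

def join_attr(tables, t):
    # Attribute of a join predicate linking t to the subplan, or None
    for other in tables:
        attr = JOIN_ATTR.get(frozenset({other, t}))
        if attr is not None:
            return attr
    return None

def pass2(pass1):
    best = {}
    for p in pass1:                                   # each Pass 1 plan
        for t in ACCESS.keys() - p.tables:            # each FROM table not in P
            attr = join_attr(p.tables, t)
            if attr is None:                          # eliminate cross products
                continue
            for m, access_cost in ACCESS[t].items():  # each access method M on T
                for j, k in OVERHEAD.items():         # each join method
                    cost = p.cost + access_cost + k * access_cost
                    order = attr if j == "SMJ" else None  # assume only SMJ output is sorted
                    plan = Plan(p.tables | {t}, order, cost, f"({p.desc} {j} {t} via {m})")
                    key = (plan.tables, plan.order)
                    if key not in best or plan.cost < best[key].cost:
                        best[key] = plan              # cheapest per (pair, order)
    return best

pass1 = [Plan(frozenset({t}), None, min(ms.values()), t) for t, ms in ACCESS.items()]
for (tables, order), plan in pass2(pass1).items():
    print(sorted(tables), order, plan.cost, plan.desc)
```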
EXAMPLE: PASS 3
Using Pass 2 plans as outer relations, generate plans for the next join in the same way as Pass 2
Example: the marked subplan is the best plan for {Reserves, Boats} and provides an interesting order on Boats.bid and Reserves.bid
⋈ sid=sid (INDEX NESTED LOOPS)
    ⋈ bid=bid (SORT MERGE)   ← marked subplan
        σ color='red'
            Boats (INDEX SCAN)
        Reserves (SCAN)
    Sailors (INDEX SCAN)
Then, add the cost for group-by / aggregate:
This is the cost to sort the result by sid
… unless it has already been sorted by a previous operator
Finally, choose the cheapest plan
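The final step can be sketched as adding the group-by cost to each complete plan before picking the cheapest. The sort cost model is an assumption (a two-pass external merge sort that reads and writes every page in each pass, i.e. 4 × pages I/Os), and the candidate costs are hypothetical. A plan whose output is already sorted on sid can win even though its joins are more expensive.

```python
def total_cost(plan_cost, output_order, output_pages, group_by="sid"):
    # GROUP BY sid is evaluated by sorting on sid.
    # Assumption: an external merge sort that needs two passes, each reading
    # and writing every page, i.e. 2 * 2 * output_pages page I/Os.
    sort_cost = 0 if output_order == group_by else 4 * output_pages
    return plan_cost + sort_cost

# Hypothetical final candidates: (cost so far, sort order of output, result pages)
candidates = {
    "plan A": total_cost(1500, "bid", 100),  # cheaper joins, but needs a sort on sid
    "plan B": total_cost(1800, "sid", 100),  # dearer joins, already sorted on sid
}
best = min(candidates, key=candidates.get)
print(best, candidates[best])  # plan B 1800
```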
SUMMARY
Query optimisation is an important task in a relational DBMS
Explores a set of alternative plans
Must prune search space; typically, left-deep plans only
Uses dynamic programming for join orderings
Must estimate cost of each plan that is considered
Must estimate the size of the result and the cost for each plan node
The query optimiser is the most complex part of a database system!