Advanced Database Systems

Spring 2025

Lecture #17:
Query Optimisation: Searching

R&G: Chapter 15

QUERY OPTIMISATION
Plan space

Cost estimation

Search algorithm

FINDING THE “BEST” QUERY PLAN

Holy grail of any DBMS implementation

Challenge: There may be more than one way to answer a given query
Which one of the join operators should we pick?

With which parameters (block size, buffer allocation, …)?

Which join ordering?

FINDING THE “BEST” QUERY PLAN

The query optimiser
1. Enumerates all possible query execution plans
If this yields too many plans, at least enumerate the “promising” plan candidates

2. Determines the cost (quality) of each plan

3. Chooses the best one as the final execution plan

Ideally: Want to find the best plan. Practically: Avoid worst plans!

ENUMERATION OF ALTERNATIVE PLANS

There are two main cases:
Single-table plans (base case)

Multiple-table plans (induction)

Single-table queries include selects, projects, and group-by / aggregate

Consider each available access path (file scan vs. index)
Choose the one with the least estimated cost

SINGLE-TABLE PLANS: COST ESTIMATES

Index I on primary key matches selection:
Cost is (Height(I) + 1) + 1 for a B+ tree (variant B or C)

Clustered index I matching selection:

(NPages(I) + NPages(R)) * selectivity (approximately)

Non-clustered index I matching selection:

(NPages(I) + NTuples(R)) * selectivity (approximately)

Sequential scan of file

NPages(R)

Recall: Must also charge for duplicate elimination if required

SINGLE-TABLE PLAN: EXAMPLE

SELECT * FROM Sailors
WHERE rating = 8

NTuples(Sailors) = 40,000
NPages(Sailors) = 500
NKeys(rating) = 10
NPages(I) = 50

If we have an index I on rating:

Cardinality
= 1/NKeys(rating) · NTuples(Sailors) = 1/10 · 40,000 = 4,000 tuples

Clustered index
1/NKeys(rating) · (NPages(I) + NPages(Sailors)) = 1/10 · (50 + 500) = 55 pages are retrieved
Unclustered index
1/NKeys(rating) · (NPages(I) + NTuples(Sailors)) = 1/10 · (50 + 40,000) = 4,005 pages are retrieved

Costs on indexes are approximate as we might not need to retrieve all index pages

If we have an index I on sid:

Doing an index scan retrieves all pages & tuples
Clustered index: ~ (50 + 500) pages retrieved. Unclustered index: ~ (50 + 40,000) pages retrieved

Doing a file scan retrieves all file pages: 500
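
The arithmetic for the index on rating can be reproduced with a short Python sketch. The `Stats` container and the function names are illustrative assumptions, not part of the lecture; the formulas are the approximate ones from the cost-estimate slide, and the selectivity 1/NKeys assumes uniformly distributed rating values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Catalog statistics for one relation and one index (illustrative names)."""
    n_tuples: int       # NTuples(R)
    n_pages: int        # NPages(R)
    n_index_pages: int  # NPages(I)
    n_keys: int         # NKeys(attribute) for the indexed attribute


def clustered_index_cost(s: Stats, selectivity: float) -> float:
    # (NPages(I) + NPages(R)) * selectivity
    return (s.n_index_pages + s.n_pages) * selectivity


def unclustered_index_cost(s: Stats, selectivity: float) -> float:
    # (NPages(I) + NTuples(R)) * selectivity
    return (s.n_index_pages + s.n_tuples) * selectivity


def file_scan_cost(s: Stats) -> float:
    # A sequential scan reads every page of the file.
    return float(s.n_pages)


sailors = Stats(n_tuples=40_000, n_pages=500, n_index_pages=50, n_keys=10)
sel = 1 / sailors.n_keys  # equality predicate rating = 8

costs = {
    "clustered index on rating": clustered_index_cost(sailors, sel),
    "unclustered index on rating": unclustered_index_cost(sailors, sel),
    "file scan": file_scan_cost(sailors),
}
best = min(costs, key=costs.get)
for name, cost in costs.items():
    print(f"{name}: {cost:.0f} pages")
print("cheapest:", best)
```

Under these statistics the clustered index wins (about 55 pages), while an unclustered index (about 4,005 pages) would lose to the plain file scan (500 pages).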

MULTIPLE-TABLE PLANS

We have translated the query into a graph of query blocks
Query blocks are essentially a multi-way product of relations with projections on top

Task: enumerate all possible execution plans

I.e., all possible 2-way join combinations for each query block

Example: three-way join

12 possible re-orderings, 2 shown here:
(R ⋈ S) ⋈ T
S ⋈ (T ⋈ R)

ENORMOUS SEARCH SPACE

# of relations n    # of different join trees
2                   2
3                   12
4                   120
5                   1,680
6                   30,240
7                   665,280
8                   17,297,280
10                  17,643,225,600

We have not even considered different join algorithms!

We need to restrict the search space!
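
The counts in the table agree with the closed form (2n-2)!/(n-1)!, which counts binary join trees of every shape with every assignment of the n relations to the leaves. The slide does not state this formula; it is supplied here as an assumption about how the table was produced, and `join_tree_count` is a hypothetical helper name.

```python
from math import factorial


def join_tree_count(n: int) -> int:
    """Number of join trees (all shapes, all leaf orders) over n relations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Catalan(n-1) tree shapes times n! leaf orders simplifies to this quotient.
    return factorial(2 * n - 2) // factorial(n - 1)


for n in (2, 3, 4, 5, 6, 7, 8, 10):
    print(f"{n:>2}  {join_tree_count(n):,}")
```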

MULTIPLE-TABLE QUERY PLANNING

Fundamental decision in IBM’s System R (late 1970s):
Only consider left-deep join trees

✓ left-deep: ((R ⋈ S) ⋈ T) ⋈ U
✗ bushy (everything else): (R ⋈ S) ⋈ (T ⋈ U)
✗ right-deep: T ⋈ (U ⋈ (S ⋈ R))

LEFT-DEEP JOIN TREES

DBMSs often prefer left-deep join trees, e.g. ((R ⋈ S) ⋈ T) ⋈ U
The inner (rhs) relation is always a base relation
Allows the use of index nested loops join
Allows for fully pipelined plans where intermediate results are not written to temporary files
Should be factored into global cost calculation

Not all left-deep trees are fully pipelined (e.g., sort-merge join)

Pipelining requires non-blocking operators

Modern DBMSs may also consider non left-deep join trees
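
To see how far the left-deep restriction shrinks the plan space, note that every ordering of the relations gives exactly one left-deep tree, so there are n! of them. The sketch below enumerates them for four relations; `left_deep` is a hypothetical helper introduced only for this illustration.

```python
from functools import reduce
from itertools import permutations


def left_deep(order):
    """Render a join order as a left-deep tree, e.g. ((R ⋈ S) ⋈ T)."""
    return reduce(lambda outer, inner: f"({outer} ⋈ {inner})", order)


relations = ["R", "S", "T", "U"]
trees = [left_deep(p) for p in permutations(relations)]

print(len(trees))  # 4! = 24 left-deep trees, versus 120 join trees of any shape
print(trees[0])    # (((R ⋈ S) ⋈ T) ⋈ U)
```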

MULTI-TABLE QUERY PLANNING

System R-style join order enumeration
Left-deep tree #1, Left-deep tree #2…
E.g., ((R ⋈ S) ⋈ T) ⋈ U and ((S ⋈ T) ⋈ U) ⋈ R

Eliminate plans with cross products immediately

Enumerate the plans for each operator
Hash, Sort-Merge, Nested Loop…

Enumerate the access paths for each table

Index #1, Index #2, Sequential scan…

Use dynamic programming to reduce the number of cost estimations

THE PRINCIPLE OF OPTIMALITY
The best overall plan is composed of best decisions on the subplans
Optimal result has optimal substructure

For example, the best left-deep plan to join tables R, S, T is either:

(The best plan for joining R, S) ⨝ T

(The best plan for joining R, T) ⨝ S

(The best plan for joining S, T) ⨝ R

This is great!
When optimising a subplan (e.g., R ⨝ S), don’t worry how it will be used later (e.g., when joining with T)!

When optimising a higher-level plan (e.g., R ⨝ S ⨝ T), reuse the best results of subplans (e.g., R ⨝ S)!
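
A minimal sketch of how the principle of optimality turns into dynamic programming over subsets of relations, restricted to left-deep trees and skipping cross products. Everything numeric here is hypothetical: the cardinalities, the selectivities, and the cost model (sum of estimated intermediate result sizes) are chosen only for illustration and are not the lecture's cost model.

```python
from itertools import combinations

# Hypothetical statistics, chosen only for illustration.
card = {"R": 1_000, "S": 100, "T": 10}
# Join predicates R.A = S.A and S.B = T.B with assumed selectivities.
sel = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}


def connected(subset, rel):
    """True if rel shares a join predicate with some relation in subset."""
    return any(frozenset((r, rel)) in sel for r in subset)


def out_size(subset_size, subset, rel):
    """Estimated size of (subset ⋈ rel): sizes times matching selectivities."""
    size = subset_size * card[rel]
    for r in subset:
        size *= sel.get(frozenset((r, rel)), 1.0)
    return size


def best_left_deep(relations):
    # best[S] = (cost, result size, join order) of the cheapest plan found for S
    best = {frozenset([r]): (0.0, float(card[r]), (r,)) for r in relations}
    for k in range(2, len(relations) + 1):
        for combo in combinations(relations, k):
            subset = frozenset(combo)
            for rel in combo:  # rel is the inner (right) input of the last join
                rest = subset - {rel}
                if rest not in best or not connected(rest, rel):
                    continue  # no plan for rest, or this would be a cross product
                cost, size, order = best[rest]
                new_size = out_size(size, rest, rel)
                candidate = (cost + new_size, new_size, order + (rel,))
                if subset not in best or candidate[0] < best[subset][0]:
                    best[subset] = candidate  # reuse the best subplan for rest
    return best[frozenset(relations)]


cost, size, order = best_left_deep(["R", "S", "T"])
print(order, cost)
```

Only the best plan per subset is stored, so the three-relation step looks up the stored plans for {R, S} and {S, T} rather than re-deriving them. No plan for {R, T} is ever built, because R and T share no join predicate.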

EXAMPLE: DYNAMIC PROGRAMMING

SELECT * FROM R, S, T
WHERE R.A = S.A
AND S.B = T.B

Pass #1 (best 1-relation plans): Find best access path to each relation (index vs. full table scans)

[Figure: search graph leading from the base relations R, S, T through the two-relation subplans R⋈S and T⋈S to the full join R⋈S⋈T]

EXAMPLE: DYNAMIC PROGRAMMING

SELECT * FROM R, S, T
WHERE R.A = S.A
AND S.B = T.B

Pass #2 (best 2-relation plans): determine best join order (R ⨝ S or S ⨝ R), choose best candidate

Candidates for R⋈S: Hash Join (R.A = S.A), Sort-Merge Join (R.A = S.A)
Candidates for T⋈S: Sort-Merge Join (S.B = T.B), Hash Join (T.B = S.B)

Retained after costing: Hash Join (R.A = S.A) for R⋈S, Hash Join (T.B = S.B) for T⋈S

EXAMPLE: DYNAMIC PROGRAMMING

SELECT * FROM R, S, T
WHERE R.A = S.A
AND S.B = T.B

Pass #3 (best 3-relation plans): best 2-relation plans + one other relation

Candidates on top of R⋈S: Hash Join (S.B = T.B), Sort-Merge Join (S.B = T.B)
Candidates on top of T⋈S: Sort-Merge Join (S.A = R.A), Hash Join (S.A = R.A)

Retained per subplan: Hash Join (S.B = T.B) on top of R⋈S, Sort-Merge Join (S.A = R.A) on top of T⋈S

Final plan for R⋈S⋈T: T⋈S by Hash Join (T.B = S.B), then Sort-Merge Join with R (S.A = R.A)

INTERESTING ORDERS
System R-style query optimisers also consider interesting orders
Sorting orders of the input tables that may be beneficial later in the query plan
E.g., for a sort-merge join, projection with duplicate removal, order-by clause

Determined by ORDER BY and GROUP BY clauses in the input query or join attributes of subsequent joins (to facilitate merging)

For each subset of relations, retain only:

Cheapest plan overall, plus

Cheapest plan for each interesting order of the tuples
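
The retention rule can be sketched as a pruning function. The `Plan` record, the plan labels, and all costs below are hypothetical and serve only to show which candidates survive.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    relations: frozenset   # set of relations the plan joins
    order: Optional[str]   # attribute the output is sorted on, or None
    cost: float            # estimated cost (hypothetical units)
    name: str              # label used only for printing


def prune(plans, interesting_orders):
    """Keep, per relation set, the cheapest plan overall plus the cheapest
    plan for each interesting order."""
    kept = {}
    for p in plans:
        keys = [(p.relations, None)]             # competes for "cheapest overall"
        if p.order in interesting_orders:
            keys.append((p.relations, p.order))  # competes for its sort order
        for key in keys:
            if key not in kept or p.cost < kept[key].cost:
                kept[key] = p
    return set(kept.values())


rb = frozenset({"Reserves", "Boats"})
candidates = [
    Plan(rb, None, 900.0, "hash join"),
    Plan(rb, "bid", 1200.0, "sort-merge join"),
    Plan(rb, "bid", 1500.0, "sort-merge join (explicit sorts)"),
    Plan(rb, None, 2000.0, "block nested loops"),
]
survivors = prune(candidates, interesting_orders={"bid", "sid"})
for p in sorted(survivors, key=lambda p: p.cost):
    print(p.name, p.cost)
```

The sort-merge plan survives although it costs more than the hash join, because its sorted output may save a sort later in the plan.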

EXAMPLE

SELECT S.sid, COUNT(*) AS number
FROM Sailors S
JOIN Reserves R ON S.sid = R.sid
JOIN Boats B ON R.bid = B.bid
WHERE B.color = 'red'
GROUP BY S.sid

Sailors:
B+ tree on sid
Reserves:
Clustered B+ tree on bid
B+ tree on sid
Boats:
B+ tree on color

Pass 1: Best plan for each relation
Sailors, Reserves: File scan
Boats: B+ tree on color
Also B+ tree on Sailors.sid as interesting order (output sorted on sid)
Also B+ tree on Reserves.bid as interesting order (output sorted on bid)
Also B+ tree on Reserves.sid as interesting order (output sorted on sid)

EXAMPLE: PASS 2
Pass 2: Best 2-relation plans

// for each left-deep logical plan
foreach plan P in Pass 1:
    foreach FROM table T not in P:
        // for each physical plan
        foreach access method M on T:
            foreach join method ⨝:
                generate P ⨝ M(T)

Eliminate cross products

Retain cheapest plan for each (pair of relations, order)
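
The Pass 2 loop can be written out in Python for the Sailors/Reserves/Boats example. This is a sketch of candidate generation only: costing and the final retention step are left out, the list of join methods is an assumption, and the tuple layout is a hypothetical representation of a plan.

```python
# Pass 1 survivors from the example: (table, access method, output order).
pass1 = [
    ("Sailors", "file scan", None),
    ("Sailors", "B+ tree on sid", "sid"),
    ("Reserves", "file scan", None),
    ("Reserves", "B+ tree on bid", "bid"),
    ("Reserves", "B+ tree on sid", "sid"),
    ("Boats", "B+ tree on color", None),
]
# Access methods available on each table.
access = {
    "Sailors": ["file scan", "B+ tree on sid"],
    "Reserves": ["file scan", "B+ tree on bid", "B+ tree on sid"],
    "Boats": ["file scan", "B+ tree on color"],
}
# Join graph of the query: S.sid = R.sid and R.bid = B.bid.
join_preds = {frozenset(("Sailors", "Reserves")), frozenset(("Reserves", "Boats"))}
join_methods = ["nested loops", "sort-merge", "hash"]  # assumed operator set


def pass2(pass1_plans):
    for outer, outer_access, _order in pass1_plans:   # foreach plan P in Pass 1
        for inner in access:                          # foreach FROM table T not in P
            if inner == outer:
                continue
            if frozenset((outer, inner)) not in join_preds:
                continue                              # eliminate cross products
            for inner_access in access[inner]:        # foreach access method M on T
                for method in join_methods:           # foreach join method
                    yield (outer, outer_access, method, inner, inner_access)


candidates = list(pass2(pass1))
print(len(candidates))
```

Sailors and Boats share no join predicate, so no candidate pairs them directly; each remaining candidate would then be costed and pruned per (pair of relations, order).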

EXAMPLE: PASS 3
Using Pass 2 plans as outer relations, generate plans for the next join in the same way as Pass 2

Example: the marked subplan is the best plan for { Reserves, Boats } and provides an interesting order on Boats.bid and Reserves.bid

[Figure: plan tree. Root: index nested loops join (sid=sid) whose outer input is the marked subplan and whose inner input is an index scan on Sailors. Marked subplan: sort-merge join (bid=bid) of σ color='red' over an index scan on Boats with a scan of Reserves.]

Then, add cost for group-by / aggregate:
This is the cost to sort the result by sid
… unless it has already been sorted by a previous operator

Finally, choose the cheapest plan

SUMMARY
Query optimisation is an important task in a relational DBMS

Explores a set of alternative plans

Must prune search space; typically, left-deep plans only
Uses dynamic programming for join orderings

Must estimate cost of each plan that is considered

Must estimate the size of result and cost for each plan node

Query optimiser is the most complex part of database systems!
