🔎 Query Optimization
Query optimization is performed by the query optimizer of a DBMS. The goal
is to select the best available strategy for executing a query, based on
the information available, so that the resulting plan is as efficient and
cost-effective as possible. Strictly speaking, the optimizer selects the best
available strategy rather than a guaranteed optimal one. Most RDBMSs use a
tree as the internal representation of a query.
🌳 Query Trees and Heuristics
Heuristic optimization proceeds in the following steps:
1. The scanner and parser generate the initial query representation.
2. The representation is optimized according to heuristic rules.
3. A query execution plan is developed.
The plan executes groups of operations based on the available access paths
and the files involved.
Example heuristic rule:
Apply SELECT and PROJECT operations before JOIN to reduce the size of files
to be joined.
Query Tree: Represents a relational algebra expression.
Query Graph: Represents a relational calculus expression.
A query tree is a tree data structure corresponding to an extended relational
algebra expression.
Input relations of the query are represented as leaf nodes.
Relational algebra operations are represented as internal nodes.
Execution starts at the leaf nodes and ends at the root node.
A query graph displays relation nodes as single circles, constants as double
circles, selection/join conditions as edges, and attributes to be retrieved in
square brackets. Query trees are preferable because they show the order of
operations for query execution, which is not possible in query graphs.
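The query tree structure described above can be sketched in Python. This is a minimal illustration, assuming relations are lists of dictionaries; the class names `Relation`, `Select`, `Project`, and `Join` and the sample data are hypothetical and not taken from any DBMS. Leaf nodes hold input relations, internal nodes hold operations, and evaluation runs from the leaves up to the root.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Relation:
    """Leaf node: an input relation (hypothetical in-memory table)."""
    rows: list

    def evaluate(self):
        return self.rows


@dataclass
class Select:
    """Internal node: keeps the rows that satisfy a predicate."""
    child: object
    predicate: Callable

    def evaluate(self):
        return [r for r in self.child.evaluate() if self.predicate(r)]


@dataclass
class Project:
    """Internal node: keeps only the listed attributes."""
    child: object
    attrs: tuple

    def evaluate(self):
        return [{a: r[a] for a in self.attrs} for r in self.child.evaluate()]


@dataclass
class Join:
    """Internal node: combines row pairs that satisfy the join condition."""
    left: object
    right: object
    condition: Callable

    def evaluate(self):
        left_rows = self.left.evaluate()
        right_rows = self.right.evaluate()
        return [{**l, **r} for l in left_rows for r in right_rows
                if self.condition(l, r)]


employee = Relation([
    {"ssn": 1, "lname": "Smith", "dno": 5},
    {"ssn": 2, "lname": "Wong", "dno": 4},
])
department = Relation([
    {"dnumber": 5, "dname": "Research"},
    {"dnumber": 4, "dname": "Administration"},
])

# Root is PROJECT; its subtree is SELECT over JOIN over the two leaves.
tree = Project(
    Select(
        Join(employee, department, lambda e, d: e["dno"] == d["dnumber"]),
        lambda row: row["dname"] == "Research",
    ),
    ("lname",),
)
result = tree.evaluate()
print(result)
```

Calling `evaluate()` on the root triggers evaluation of the children first, which mirrors the leaf-to-root execution order that a query tree makes explicit.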
🔄 Heuristic Optimization of Query Trees
Many different query trees can represent the same query and yield the
same results. The optimizer transforms an initial, inefficient tree into an
equivalent, final query tree.
Example transformations:
1. Moving SELECT operations down the query tree.
2. Applying the more restrictive SELECT operation first.
3. Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
4. Moving PROJECT operations down the query tree.
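The effect of moving a SELECT down the tree can be checked on toy data. The sketch below assumes relations are lists of dictionaries; the helper functions and the generated data are hypothetical. It evaluates the same query two ways and compares the results and the number of row pairs the join has to examine.

```python
def join(left, right, cond):
    """Nested-loop join over two lists of dictionaries."""
    return [{**l, **r} for l in left for r in right if cond(l, r)]


def select(rows, pred):
    return [r for r in rows if pred(r)]


def same_department(e, d):
    return e["dno"] == d["dnumber"]


def is_d1(row):
    return row["dname"] == "D1"


employee = [{"ssn": i, "dno": i % 4} for i in range(100)]
department = [{"dnumber": d, "dname": f"D{d}"} for d in range(4)]

# Initial tree: JOIN first, SELECT afterwards.
joined = join(employee, department, same_department)
late = select(joined, is_d1)
pairs_late = len(employee) * len(department)

# Transformed tree: SELECT pushed below the JOIN.
small_department = select(department, is_d1)
early = join(employee, small_department, same_department)
pairs_early = len(employee) * len(small_department)

print(len(late), len(early), pairs_late, pairs_early)
```

Both trees return the same rows, but the transformed tree joins against a one-row relation, so the join examines a quarter of the row pairs in this example.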
💡 Summary of Heuristics for Algebraic Optimization
Apply operations that reduce the size of intermediate results first.
Perform SELECT and PROJECT operations as early as possible.
Apply the most restrictive SELECT and JOIN operations before others.
⚙️ Choice of Query Execution Plans
Materialized Evaluation: The result of an operation is stored as a
temporary relation.
Pipelined Evaluation: Operation results are forwarded directly to the
next operation in the query sequence.
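The difference between the two evaluation styles can be sketched with Python lists and generators. This is an illustration only: lists stand in for temporary relations, generators stand in for pipelined operators, and the `scan`, `select`, and `project` helpers are hypothetical.

```python
def scan(rows, stats):
    """Yield rows one at a time, counting how many were read."""
    for row in rows:
        stats["scanned"] += 1
        yield row


def select(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row


def project(rows, attrs):
    for row in rows:
        yield {a: row[a] for a in attrs}


def high_salary(row):
    return row["salary"] >= 3000


table = [{"id": i, "salary": 1000 * i} for i in range(10)]

# Materialized evaluation: each operator's full result is stored in a
# temporary list before the next operator starts.
mat_stats = {"scanned": 0}
temp_selected = list(select(scan(table, mat_stats), high_salary))
temp_projected = list(project(temp_selected, ("id",)))
mat_first = temp_projected[0]

# Pipelined evaluation: each row flows through all operators as soon as
# it is produced, so the first result is available after a partial scan.
pipe_stats = {"scanned": 0}
pipeline = project(select(scan(table, pipe_stats), high_salary), ("id",))
pipe_first = next(pipeline)

print(mat_first, mat_stats["scanned"], pipe_first, pipe_stats["scanned"])
```

In this sketch the materialized version reads all ten rows before producing anything, while the pipelined version produces its first result after reading four rows and never builds a temporary relation.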
Nested Subquery Optimization
Unnesting: Removing the nested query and converting the inner and
outer queries into one block. Queries with nested subqueries
connected by IN or ANY can be converted into a single block query.
Alternate technique: Creating temporary result tables from subqueries
and using them in joins.
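Unnesting can be illustrated with Python's built-in `sqlite3` module. The schema and rows below are hypothetical, and SQLite is used only because it ships with Python. The two statements are equivalent here because `dnumber` is a key of `department`, so the join cannot produce duplicate employee rows; that is an assumption of this sketch, not a general rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn INTEGER PRIMARY KEY, lname TEXT, dno INTEGER);
CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT, location TEXT);
INSERT INTO employee VALUES
    (1, 'Smith', 5), (2, 'Wong', 4), (3, 'Zelaya', 4), (4, 'Borg', 1);
INSERT INTO department VALUES
    (5, 'Research', 'Houston'),
    (4, 'Administration', 'Stafford'),
    (1, 'Headquarters', 'Houston');
""")

# Nested form: two query blocks connected by IN.
nested_sql = """
SELECT lname FROM employee
WHERE dno IN (SELECT dnumber FROM department WHERE location = 'Houston')
ORDER BY lname
"""

# Unnested form: a single block with a join.
unnested_sql = """
SELECT e.lname
FROM employee e JOIN department d ON e.dno = d.dnumber
WHERE d.location = 'Houston'
ORDER BY e.lname
"""

nested_rows = conn.execute(nested_sql).fetchall()
unnested_rows = conn.execute(unnested_sql).fetchall()
print(nested_rows, unnested_rows)
```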
Subquery (View) Merging Transformation
Inline View: A FROM clause subquery.
View Merging Operation: Merges the tables in the view with the
tables from the outer query block. Views containing select-project-join
operations are considered simple views and can be subjected to view-
merging.
Group-By View-Merging
Delaying the GROUP BY operation until after the joins may reduce the data
subjected to grouping if the joins have low join selectivity.
Performing Group By early may reduce the amount of data subjected
to subsequent joins.
The optimizer determines whether to merge GROUP-BY views based
on estimated costs.
💾 Materialized Views
A view is defined in the database as a query.
A materialized view stores the results of that query, either temporarily or
permanently, so that the results do not have to be recomputed.
Incremental View Maintenance
Update views incrementally by applying only the changes made since the last
update instead of recomputing the view. This applies to views defined with
join, selection, projection, intersection, and aggregation operations.
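As a minimal sketch of incremental maintenance for an aggregation view, the code below keeps a per-department `(sum, count)` summary and applies only the inserted and deleted rows. The function names and data are hypothetical, and the sketch assumes every deleted row was present in the base relation.

```python
def recompute(rows):
    """Full recomputation: total salary and row count per department."""
    view = {}
    for dno, salary in rows:
        total, count = view.get(dno, (0, 0))
        view[dno] = (total + salary, count + 1)
    return view


def apply_delta(view, inserted=(), deleted=()):
    """Incremental maintenance: touch only the groups that changed."""
    for dno, salary in inserted:
        total, count = view.get(dno, (0, 0))
        view[dno] = (total + salary, count + 1)
    for dno, salary in deleted:
        total, count = view[dno]
        if count == 1:
            del view[dno]  # group becomes empty
        else:
            view[dno] = (total - salary, count - 1)
    return view


base = [(5, 30000), (5, 40000), (4, 25000)]
view = recompute(base)

inserted = [(4, 43000), (1, 55000)]
deleted = [(5, 30000)]
apply_delta(view, inserted, deleted)

# The incrementally maintained view should match a full recomputation.
new_base = [(5, 40000), (4, 25000), (4, 43000), (1, 55000)]
expected = recompute(new_base)
print(view == expected)
```

Keeping the count next to the sum is what makes deletions cheap here: the view can tell when a group disappears without rescanning the base relation.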
📊 Use of Selectivities in Cost-Based Optimization
The query optimizer estimates and compares query execution costs using
different strategies, choosing the lowest cost estimate. This process is suited
to compiled queries.
Cost-based query optimization: For a given query subexpression,
multiple equivalence rules may apply.
Cost metric: Includes space and time requirements.
Scope of query optimization: A query block.
Cost components for query execution:
Access cost to secondary storage
Disk storage cost
Computation cost
Memory usage cost
Communication cost
🗂️ Catalog Information Used in Cost Functions
Information stored in the DBMS catalog and used by the optimizer:
File size
Organization
Number of levels of each multilevel index
Number of distinct values of an attribute
Attribute selectivity, which allows calculation of the selection cardinality:
the average number of records that satisfy an equality selection condition
on that attribute.
Histograms
Tables or data structures that record information about the distribution of
data.
RDBMS stores histograms for important attributes.
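The catalog statistics above feed simple estimation formulas. The sketch below assumes the usual uniformity assumption for equality conditions (selection cardinality = number of records / number of distinct values) and an equi-width histogram for range conditions; the function names and sample values are hypothetical.

```python
def selection_cardinality(num_records, num_distinct):
    """Average records matching an equality condition, assuming uniformity."""
    return num_records / num_distinct


def build_equiwidth_histogram(values, low, high, num_buckets):
    """Count the values falling into each of num_buckets equal-width ranges."""
    width = (high - low) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        idx = min(int((v - low) / width), num_buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_range(counts, width, low, query_low, query_high):
    """Estimate rows with query_low <= value < query_high.

    Assumes values are spread uniformly inside each bucket.
    """
    total = 0.0
    for i, count in enumerate(counts):
        bucket_low = low + i * width
        bucket_high = bucket_low + width
        overlap = max(0.0, min(query_high, bucket_high) - max(query_low, bucket_low))
        total += count * overlap / width
    return total


equality_estimate = selection_cardinality(10000, 50)

# Skewed data: most values are small.
values = [10, 12, 15, 18, 20, 22, 30, 40, 45, 90]
counts, width = build_equiwidth_histogram(values, 0, 100, 4)

with_histogram = estimate_range(counts, width, 0, 0, 25)
without_histogram = len(values) * (25 - 0) / (100 - 0)  # global uniformity
print(equality_estimate, counts, with_histogram, without_histogram)
```

On this skewed data the histogram estimate for the range [0, 25) matches the true count of six rows, whereas assuming a uniform distribution over the whole domain would estimate 2.5.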
Cost Functions for the JOIN Operation
BR and BS denote the number of blocks of relations R and S.
| Operation | Cost |
|---|---|
| J1: Nested-loop join | For three memory buffer blocks: BR + (BR ∗ BS). For n memory buffer blocks: BR + BR ∗ (BS / (n − 2)) |
| J2: Index-based nested-loop join | For a secondary index with selection cardinality S: BR + ((BR ∗ S) / m) |
| J3: Sort-merge join | Cost of sorting must be added if sorting is needed |
| J4: Partition-hash join | N/A |
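The nested-loop cost (J1) can be turned into a small calculator. The sketch below uses the common textbook block nested-loop form, which reads the outer relation once and the inner relation once per chunk of n − 2 outer blocks, rounding the number of chunks up; treat that rounding and the block counts in the example as assumptions. The cost of writing the result is ignored.

```python
import math


def nested_loop_cost(b_outer, b_inner, n_buffers=3):
    """Estimated block accesses for a block nested-loop join.

    b_outer, b_inner: number of blocks in the outer and inner relations.
    n_buffers: available memory buffer blocks (at least 3).
    """
    if n_buffers < 3:
        raise ValueError("need at least three buffer blocks")
    chunks = math.ceil(b_outer / (n_buffers - 2))
    return b_outer + chunks * b_inner


# Hypothetical sizes: a 13-block relation joined with a 2000-block relation.
small_outer = nested_loop_cost(13, 2000)           # three buffers
large_outer = nested_loop_cost(2000, 13)           # three buffers
more_buffers = nested_loop_cost(13, 2000, n_buffers=7)
print(small_outer, large_outer, more_buffers)
```

With three buffers, using the smaller relation as the outer one costs fewer block accesses in this example, and adding buffers reduces the number of passes over the inner relation.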
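Join Selectivity and Cardinality
Join selectivity: The ratio of the size of the join result to the size of
the Cartesian product of the two relations.
Join cardinality: The number of tuples in the join result, that is, the join
selectivity multiplied by the sizes of the two relations.
Semi-Join: Unnesting a subquery connected by EXISTS, IN, or ANY generally
leads to a semi-join.
Anti-Join: Unnesting a subquery connected by NOT EXISTS, NOT IN, or ALL
generally leads to an anti-join.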
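Semi-join and anti-join can be sketched over lists of dictionaries. The sketch assumes the usual correspondence, in which a semi-join answers an EXISTS-style subquery and an anti-join answers a NOT EXISTS-style subquery; the helper names and data are hypothetical.

```python
def semi_join(left, right, left_key, right_key):
    """Left rows that have at least one match on the right (each at most once)."""
    keys = {r[right_key] for r in right}
    return [l for l in left if l[left_key] in keys]


def anti_join(left, right, left_key, right_key):
    """Left rows that have no match on the right."""
    keys = {r[right_key] for r in right}
    return [l for l in left if l[left_key] not in keys]


departments = [
    {"dnumber": 5, "dname": "Research"},
    {"dnumber": 4, "dname": "Administration"},
    {"dnumber": 7, "dname": "Archive"},
]
employees = [
    {"ssn": 1, "dno": 5},
    {"ssn": 2, "dno": 5},
    {"ssn": 3, "dno": 4},
]

# Departments that have at least one employee.
staffed = [d["dname"] for d in semi_join(departments, employees, "dnumber", "dno")]
# Departments that have no employees.
empty = [d["dname"] for d in anti_join(departments, employees, "dnumber", "dno")]
print(staffed, empty)
```

Unlike a regular join, the semi-join returns Research only once even though two employees match it, which is why it preserves the meaning of the original nested query.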
Multirelation Queries and JOIN Ordering Choices
Left-Deep Join Tree
Right-Deep Join Tree
Bushy Join Tree
Left-deep trees are generally preferred because they work well with common
join algorithms and can produce fully pipelined plans.
Dynamic Programming Algorithm
The structure of an optimal solution is characterized.
The value of the optimal solution is defined recursively.
The optimal solution and its value are computed in a bottom-up fashion.
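The bottom-up idea can be sketched as a dynamic program over subsets of relations that builds the cheapest left-deep order. This is a simplified illustration, not any system's actual algorithm: the cost of a plan is assumed to be the sum of its intermediate result sizes, join predicates are assumed independent, and the cardinalities and selectivities are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def best_left_deep_order(card, sel):
    """Return (cost, result_size, order) for the cheapest left-deep plan.

    card: {relation: number of rows}
    sel:  {frozenset({a, b}): join selectivity}; a missing pair is treated
          as a Cartesian product (selectivity 1).
    """
    rels = sorted(card)
    # Base case: single relations cost nothing to "join".
    best = {frozenset([r]): (Fraction(0), Fraction(card[r]), (r,)) for r in rels}
    # Build solutions for larger subsets from solutions for smaller ones.
    for size in range(2, len(rels) + 1):
        for combo in combinations(rels, size):
            subset = frozenset(combo)
            for last in combo:
                rest = subset - {last}
                cost_rest, size_rest, order_rest = best[rest]
                factor = Fraction(1)
                for r in rest:
                    factor *= sel.get(frozenset({r, last}), Fraction(1))
                out_size = size_rest * card[last] * factor
                cost = cost_rest + out_size
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, out_size, order_rest + (last,))
    return best[frozenset(rels)]


card = {"DEPARTMENT": 50, "EMPLOYEE": 10000, "PROJECT": 2000}
sel = {
    frozenset({"DEPARTMENT", "EMPLOYEE"}): Fraction(1, 10000),
    frozenset({"DEPARTMENT", "PROJECT"}): Fraction(1, 50),
}
cost, result_size, order = best_left_deep_order(card, sel)
print(cost, result_size, order)
```

Each subset's best plan is computed once and reused by every larger subset that contains it, which is what keeps the search from enumerating every permutation separately.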
Physical optimization involves execution decisions at the physical level.
Cost-based physical optimization: Top-down or bottom-up approach.
Physical level heuristics: For selections, use index scans whenever
possible.
💰 Example to Illustrate Cost-Based Query Optimization
Consider query Q2 and its query tree. Evaluate potential join orders.
PROJECT ⋈ DEPARTMENT ⋈ EMPLOYEE
DEPARTMENT ⋈ PROJECT ⋈ EMPLOYEE
DEPARTMENT ⋈ EMPLOYEE ⋈ PROJECT
EMPLOYEE ⋈ DEPARTMENT ⋈ PROJECT
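One way to see why only these four orders are listed is to enumerate all orderings and discard those that would need a Cartesian product. The sketch below assumes that Q2 joins PROJECT with DEPARTMENT and DEPARTMENT with EMPLOYEE, with no direct join condition between PROJECT and EMPLOYEE; that join graph is an assumption, since the query text is not shown here.

```python
from itertools import permutations

# Assumed join graph for Q2: pairs of relations linked by a join condition.
JOIN_EDGES = {
    frozenset({"PROJECT", "DEPARTMENT"}),
    frozenset({"DEPARTMENT", "EMPLOYEE"}),
}


def connected(relation, already_joined):
    """True if relation has a join condition with any relation joined so far."""
    return any(frozenset({relation, other}) in JOIN_EDGES
               for other in already_joined)


def orders_without_cartesian_product(relations):
    """Left-deep join orders in which every step uses a join condition."""
    valid = []
    for order in permutations(relations):
        if all(connected(order[i], order[:i]) for i in range(1, len(order))):
            valid.append(order)
    return valid


orders = orders_without_cartesian_product(["PROJECT", "DEPARTMENT", "EMPLOYEE"])
for order in orders:
    print(" JOIN ".join(order))
```

Of the six possible orderings, the two that place PROJECT next to EMPLOYEE at the start are dropped because their first step would be a Cartesian product.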
ℹ️ Additional Issues Related to Query Optimization
Displaying the system’s query execution plan:
Oracle syntax:
EXPLAIN PLAN FOR <SQL query>
IBM DB2 syntax:
EXPLAIN PLAN SELECTION [additional options] FOR <SQL-query>
SQL Server syntax:
SET SHOWPLAN_TEXT ON
or
SET SHOWPLAN_XML ON
or
SET SHOWPLAN_ALL ON
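For experimenting without access to the systems listed above, SQLite offers a comparable statement, `EXPLAIN QUERY PLAN`, which can be run from Python's built-in `sqlite3` module. This is a different DBMS from the three named above, and the table and index names below are hypothetical. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn INTEGER PRIMARY KEY, lname TEXT, dno INTEGER);
CREATE INDEX idx_employee_dno ON employee (dno);
""")

plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT lname FROM employee WHERE dno = 5"
).fetchall()

# The last column of each row holds the human-readable plan step.
plan_text = [row[-1] for row in plan_rows]
for step in plan_text:
    print(step)
```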
Size Estimation of Other Operations
Projection
Set operations
Aggregation
Outer join
Plan Caching
Plan stored by the query optimizer for later use by the same queries with
different parameters.
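A plan cache can be sketched as a dictionary keyed by the parameterized query text. The `PlanCache` class and `toy_optimizer` function are hypothetical stand-ins; a real optimizer would also have to decide when a cached plan is no longer valid, which this sketch ignores.

```python
class PlanCache:
    """Reuses a plan for queries that differ only in parameter values."""

    def __init__(self, optimizer):
        self._optimizer = optimizer
        self._plans = {}
        self.hits = 0
        self.misses = 0

    def get_plan(self, sql_template):
        plan = self._plans.get(sql_template)
        if plan is None:
            self.misses += 1
            plan = self._optimizer(sql_template)  # expensive step, done once
            self._plans[sql_template] = plan
        else:
            self.hits += 1
        return plan


def toy_optimizer(sql_template):
    """Placeholder for real optimization; returns a label instead of a plan."""
    return f"PLAN({sql_template})"


cache = PlanCache(toy_optimizer)
template = "SELECT lname FROM employee WHERE dno = ?"

# The same template is executed with different parameter values.
plans = [cache.get_plan(template) for _dno in (4, 5, 1)]
print(cache.misses, cache.hits)
```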
Top K-Results Optimization
Limits strategy generation.
An Example of Query Optimization in Data Warehouses
Star Transformation Optimization: Goal is to access a reduced set of
data from the fact table and avoid a full table scan.
Classic star transformation
Bitmap index star transformation
Joining back
🚀 Overview of Query Optimization in Oracle
The physical optimizer is cost-based, with the scope being a single query
block. It calculates cost based on object statistics, estimated resource use,
and memory needed. The global query optimizer integrates logical
transformation and physical optimization phases to generate an optimal
plan for the entire query tree. Adaptive optimization uses a feedback loop to
improve on previous decisions.
Additional Oracle Features
Array Processing
Hints: Specified by the application developer and embedded in the
SQL statement text. Types include access path, join order, join
method, and enabling/disabling a transformation.
Outlines: Used to preserve execution plans.
SQL Plan Management
Semantic Query Optimization
Uses constraints specified on the database schema.
Goal: Modify one query into another that is more efficient to execute.
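One simple case of semantic optimization is a query whose condition contradicts a schema constraint, so it can be answered as empty without reading any data. The sketch below assumes a hypothetical constraint that salaries lie between 0 and 100,000 and represents conditions as inclusive `(low, high)` bounds; the function names and data are made up for illustration.

```python
def contradicts(constraint, predicate):
    """Both arguments are inclusive (low, high) bounds on the same attribute."""
    low = max(constraint[0], predicate[0])
    high = min(constraint[1], predicate[1])
    return low > high


def run_query(rows, attr, predicate, constraint, stats):
    """Return rows whose attr lies within predicate, skipping impossible scans."""
    if contradicts(constraint, predicate):
        return []  # answered from the constraint alone, no scan needed
    result = []
    for row in rows:
        stats["scanned"] += 1
        if predicate[0] <= row[attr] <= predicate[1]:
            result.append(row)
    return result


SALARY_CONSTRAINT = (0, 100_000)  # assumed schema constraint
employees = [{"ssn": i, "salary": 20_000 + 10_000 * i} for i in range(8)]

stats = {"scanned": 0}
impossible = run_query(employees, "salary", (150_000, float("inf")),
                       SALARY_CONSTRAINT, stats)
scanned_for_impossible = stats["scanned"]

possible = run_query(employees, "salary", (80_000, float("inf")),
                     SALARY_CONSTRAINT, stats)
print(impossible, scanned_for_impossible, len(possible), stats["scanned"])
```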