CSH2D3 - Database System
05 | Query Processing
Query Processing
Goals of the Meeting
01 02
Students knows the Students understand
basic process of query how to translate SQL
processing Queries into
Relational Algebra
Expression (RAE)
Query Processing 2
Outline
Steps of Query Processing
Relational Algebra Expression
Query Processing
Steps of Query Processing
Query Processing 4
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Query Processing
Basic Steps in Query Processing (Cont.)
• Parsing and translation
• translate the query into its internal form. This is then translated into
relational algebra.
• Parser checks syntax, verifies relations
• Optimization
• Each relational algebra operation can be evaluated using one of several
different algorithms
• Annotated expression specifying detailed evaluation strategy is called an
evaluation-plan
• Amongst all equivalent evaluation plans choose the one with lowest cost.
• Evaluation
• The query-execution engine takes a query-evaluation plan, executes that
plan, and returns the answers to the query.
Query Processing
Relational Algebra
Query Processing 7
Relational Algebra
• A procedural language consisting of a set of operations that take
one or two relations as input and produce a new relation as their
result.
• Operators
• select:
• project:
• cartesian product: x
• join: ⋈
• union:
• set-intersection:
• set-difference: –
• assignment:
• rename:
Query Processing
Select Operation
• The select operation selects tuples that satisfy a given predicate.
• Notation: p (r)
• p is called the selection predicate
• Example: select those tuples of the instructor relation where the
instructor is in the “Physics” department.
• Query
SELECT * FROM instructor WHERE dept_name = ‘Physics’
• Relational Algebra (RA)
dept_name=“Physics” (instructor)
• Result
Query Processing
Select Operation (Cont.)
• We allow comparisons using
=, , >, . <.
in the selection predicate.
• We can combine several predicates into a larger predicate by using the connectives:
(and), (or), (not)
• Example: Find the instructors in Physics with a salary greater than $90,000, we write:
• Query: SELECT * FROM instructor WHERE dept_name = ‘Physics’ AND salary > 90000
• RA:
dept_name=“Physics” salary > 90,000 (instructor)
• The select predicate may include comparisons between two attributes.
• Example, find all departments whose name is the same as their building name:
• Query: SELECT * FROM department WHERE dept_name = building
• RA:
dept_name=building (department)
Query Processing
Project Operation
• A unary operation that returns its argument relation, with certain
attributes left out.
• Notation:
A1,A2,A3 ….Ak (r)
where A1, A2, …, Ak are attribute names and r is a relation name.
• The result is defined as the relation of k columns obtained by erasing the
columns that are not listed
• Duplicate rows removed from result, since relations are sets
Query Processing
Project Operation Example
• Example: eliminate the dept_name attribute of instructor
• Query: SELECT id, name, salary FROM instructor
• RA:
ID, name, salary (instructor)
• Result:
Query Processing
Composition of Relational Operations
• The result of a relational-algebra operation is relation and therefore of
relational-algebra operations can be composed together into a
relational-algebra expression.
• Consider the query -- Find the names of all instructors in the Physics
department.
• Query: SELECT name FROM instructor WHERE dept_name = ‘Physics’
• RAE:
name( dept_name =“Physics” (instructor))
• Instead of giving the name of a relation as the argument of the
projection operation, we give an expression that evaluates to a relation.
Query Processing
Cartesian-Product Operation
• The Cartesian-product operation (denoted by X) allows us to combine
information from any two relations.
• Example: the Cartesian product of the relations instructor and teaches is
written as:
• Query: SELECT * FROM instructor CROSS JOIN teaches
or SELECT * FROM instructor, teaches
• RA:
instructor X teaches
• We construct a tuple of the result out of each possible pair of tuples: one
from the instructor relation and one from the teaches relation (see next
slide)
• Since the instructor ID appears in both relations we distinguish between
these attribute by attaching to the attribute the name of the relation from
which the attribute originally came.
• instructor.ID
• teaches.ID
Query Processing
The instructor X teaches table
Query Processing
Join Operation
• The Cartesian-Product
instructor X teaches
associates every tuple of instructor with every tuple of teaches.
• Most of the resulting rows have information about instructors who did
NOT teach a particular course.
• To get only those tuples of “instructor X teaches “ that pertain to
instructors and the courses that they taught, we write:
• Query: SELECT * FROM instructor, teaches WHERE instructor.id = teaches.id
• RAE:
instructor.id = teaches.id (instructor x teaches ))
• We get only those tuples of “instructor X teaches” that pertain to
instructors and the courses that they taught.
• The result of this expression, shown in the next slide
Query Processing
Join Operation (Cont.)
• The table corresponding to:
instructor.id = teaches.id (instructor x teaches))
Query Processing
Join Operation (Cont.)
• The join operation allows us to combine a select operation and a
Cartesian-Product operation into a single operation.
• Consider relations r (R) and s (S)
• Let “theta” be a predicate on attributes in the schema R “union” S. The
join operation r ⋈𝜃 s is defined as follows:
𝒓 ⋈𝜽 𝒔 = 𝝈𝜽 (𝒓 × 𝒔)
• Thus
instructor.id = teaches.id (instructor x teaches ))
• Can equivalently be written as
• SELECT * FROM instructor JOIN teaches ON instructor.id = teaches.id
instructor ⋈ Instructor.id = teaches.id teaches
Query Processing
Join Operation (Cont.)
• Natural join :
• Inner join :
• Outer join :
Left Outer Join Full Outer Join Right Outer Join
Query Processing
Union Operation
• The union operation allows us to combine two relations
• Notation: r s
• For r s to be valid.
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (example: 2nd
column of r deals with the same type of values as does the
2nd column of s)
• Example: to find all courses taught in the Fall 2017 semester, or in the
Spring 2018 semester, or in both
• Query: SELECT course_id FROM section WHERE semester=‘Fall’ AND year=2017
UNION SELECT course_id FROM section WHERE semester=‘Spring’ AND
year=2018
• RAE:
course_id ( semester=“Fall” Λ year=2017 (section))
course_id ( semester=“Spring” Λ year=2018 (section))
Query Processing
Union Operation (Cont.)
• Result of:
course_id ( semester=“Fall” Λ year=2017 (section))
course_id ( semester=“Spring” Λ year=2018 (section))
Query Processing
Set-Intersection Operation
• The set-intersection operation allows us to find tuples that are in both the input
relations.
• Notation: r s
• Assume:
• r, s have the same arity
• attributes of r and s are compatible
• Example: Find the set of all courses taught in both the Fall 2017 and the Spring 2018
semesters.
• Query: SELECT course_id FROM section WHERE semester=‘Fall’ AND year=2017 INTERSECT SELECT
course_id FROM section WHERE semester=‘Spring’ AND year=2018
• RAE:
course_id ( semester=“Fall” Λ year=2017 (section))
course_id ( semester=“Spring” Λ year=2018 (section))
• Result
Query Processing
Set-Intersection Operation Example 2
Query Processing
Set-Difference Operation
• The set-difference operation allows us to find tuples that are in one relation
but are not in another.
• Notation r – s
• Set differences must be taken between compatible relations.
• r and s must have the same arity
• attribute domains of r and s must be compatible
• Example: to find all courses taught in the Fall 2017 semester, but not in the
Spring 2018 semester
• Query: SELECT course_id FROM section WHERE semester=‘Fall’ AND year=2017 MINUS
SELECT course_id FROM section WHERE semester=‘Spring’ AND year=2018
• RAE:
course_id ( semester=“Fall” Λ year=2017 (section)) −
course_id ( semester=“Spring” Λ year=2018 (section))
Query Processing
Set-Difference Operation Example 2
Query Processing
Assignment Operation
• It is convenient at times to write a relational-algebra expression by
assigning parts of it to temporary relation variables.
• The assignment operation is denoted by and works like assignment
in a programming language.
• Example: Find all instructor in the “Physics” and Music department.
Physics dept_name=“Physics” (instructor)
Music dept_name=“Music” (instructor)
Physics Music
• With the assignment operation, a query can be written as a sequential
program consisting of a series of assignments followed by an expression
whose value is displayed as the result of the query.
Query Processing
Rename Operation
• The results of relational-algebra expressions do not have a
name that we can use to refer to them. The rename
operator, , is provided for that purpose
• The expression:
x (E)
returns the result of expression E under the name x
• Another form of the rename operation:
x(A1,A2, .. An) (E)
returns the result of expression E under the name x, and
with the attributes renamed to A1 , A2 , …., An .
Query Processing
Rename Operation Example
• SELECT * FROM countries nation:
nation (countries)
• SELECT country_id AS id, country_name AS name, region_id
FROM countries nation:
nation (id, name, region_id) (countries)
• SELECT country_id AS id, country_name AS name FROM
countries nation:
id, name (nation (id, name, region_id) (countries) )
Query Processing
Equivalent Queries
• There is more than one way to write a query in relational algebra.
• Example: Find information about courses taught by instructors in the
Physics department with salary greater than 90,000
• Query 1
dept_name=“Physics” salary > 90,000 (instructor)
• Query 2
dept_name=“Physics” ( salary > 90.000 (instructor))
• The two queries are not identical; they are, however, equivalent -- they
give the same result on any database.
Query Processing
Equivalent Queries
• There is more than one way to write a query in relational algebra.
• Example: Find information about courses taught by instructors in the
Physics department
• Query 1
dept_name=“Physics” (instructor ⋈ instructor.ID = teaches.ID teaches)
• Query 2
(dept_name=“Physics” (instructor)) ⋈ instructor.ID = teaches.ID teaches
• The two queries are not identical; they are, however, equivalent -- they
give the same result on any database.
Query Processing
Query Processing 31
Exercises
Given the employee database as follow:
employee (empID, person_name, street, city)
works (empID, compID, salary)
company (compID, company_name, city)
Give an expression in the relational algebra to express each of the following queries:
1. Find the name and city of each employee who does not lives in “Miami”
2. Find the name of each employee whose salary is greater than equal to $100000.
3. Find the name and salary of each employee whose salary is between $50000 and
$100000.
4. Find the name of each employee who lives in “Miami” or whose salary is lower than
$100000.
5. Find the company name and name of each employee who does not work for “BigBank”.
6. Find the company name, city, and name of each employee who lives in the same city as
the company for which she or he works.
Query Processing 32
References
Silberschatz, Korth, and Sudarshan. Database System Concepts – 7th
Edition. McGraw-Hill. 2019.
Slides adapted from Database System Concepts Slide.
Source: https://www.db-book.com/db7/slides-dir/index.html
Query Processing 33