Database Systems
Session 4
Chapter 2 - Relational Model of Data - Part 3
Objectives
1 Understand why we need an Algebraic Query Language
2 Understand how an Algebraic Query Language works
Contents
1 Why do we need an Algebraic Query Language?
2 An Algebraic Query Language
2.4. An Algebraic Query Language
2.4.1. Why do we need a special Query Language?
Why do we need a new kind of programming
language for databases?
The surprising answer is that relational algebra
(RA) is useful because it is less powerful than C
or Java. By limiting what we can do in RA, we get
two huge rewards:
Ease of programming
The ability of the compiler to produce highly
optimized code
WHY Algebraic Query Language?
Query processing pipeline: SQL query → Parser → Relational Algebra Expression → Query Optimizer → Query Execution Plan → Code generator → Executable Code
Source: Analysis of Execution Plans in Query Optimization (Dr. Sunita M. et al. Principal, Institute of Compute, 2012).
Query Optimization
Input: Company A: 1000 employees (20 women)
Output: ID, Name (Condition: the 5 women with the
lowest income)
Note: π: projection; σ: selection
Employee Table
ID Name DOB FDW Sex Addr Income
1 Ng Van A 11/2/2000 1/1/2022 1 Hai phong 315
2 Le Thi B 21/12/1984 1/1/2018 0 Ninh Thuan 245
… … …
999 Phan Hong H 19/6/1997 1/1/2022 0 Can Tho 110
1000 Tran Van C 29/3/1985 1/5/2012 1 Ha Noi 1000
σSex=0 (πID, Name(Employee_sort_by_income))
Query Optimization
Pseudo-code (ver.1)
// Sort all N = 1000 records by income, ascending
for (int i = 0; i < N - 1; i++)
    for (int j = i + 1; j < N; j++) {
        if (record[i].Income > record[j].Income) {
            swap(record[i], record[j]);
        }
    }
// Extract the first 5 women from the sorted records
int iCount = 0;
for (int i = 0; i < N; i++)
    if (record[i].Sex == 0) {
        print(record[i].ID, record[i].Name);
        iCount++;
        if (iCount == 5) break;
    }
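Query Optimization
Pseudo-code (ver.2)
// Extract the women into list L
for (int i = 0; i < N; i++)
    if (record[i].Sex == 0) {
        append record[i] to L;
    }
// Sort L (n = 20 records) by income, ascending
for (int i = 0; i < len(L) - 1; i++)
    for (int j = i + 1; j < len(L); j++) {
        if (L[i].Income > L[j].Income) {
            swap(L[i], L[j]);
        }
    }
// Print the first 5 records of L
for (int i = 0; i < 5 && i < len(L); i++)
    print(L[i].ID, L[i].Name);
πID, Name(σSex=0 (Emp_extract_by_gender))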
Query Optimization
Cost comparison (N = 1000 employees, n = 20 women):
Ver.1: sort all N records, then scan them: O(N²) + N ≈ 10⁶ + 1000 steps
Ver.2: scan all N records, then sort the n women: N + O(n²) ≈ 1000 + 20² = 1400 steps
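The cost gap between the two versions can be checked with a small experiment. The sketch below is only an illustration under stated assumptions: the employee data is synthetic (random but distinct incomes, 20 women among 1000 records), and the names `exchange_sort`, `plan_v1`, and `plan_v2` are hypothetical helpers, not part of any DBMS. A "step" is counted as one income comparison or one test of the sex attribute.

```python
import random

def exchange_sort(records):
    """Sort a copy of records by ascending income; also count comparisons."""
    recs = list(records)
    comparisons = 0
    for i in range(len(recs) - 1):
        for j in range(i + 1, len(recs)):
            comparisons += 1
            if recs[i]["income"] > recs[j]["income"]:
                recs[i], recs[j] = recs[j], recs[i]
    return recs, comparisons

def plan_v1(employees, k=5):
    """Ver.1: sort all N records, then scan for the first k women."""
    ordered, cost = exchange_sort(employees)
    result = []
    for rec in ordered:
        cost += 1  # one test of the sex attribute
        if rec["sex"] == 0:
            result.append((rec["id"], rec["name"]))
            if len(result) == k:
                break
    return result, cost

def plan_v2(employees, k=5):
    """Ver.2: select the n women first, then sort only those."""
    women = [rec for rec in employees if rec["sex"] == 0]
    cost = len(employees)  # one test of the sex attribute per record
    ordered, comparisons = exchange_sort(women)
    cost += comparisons
    return [(rec["id"], rec["name"]) for rec in ordered[:k]], cost

# Synthetic data (assumption): 1000 employees, 20 women (sex == 0), distinct incomes.
random.seed(0)
incomes = random.sample(range(100, 10_000), 1000)
employees = [
    {"id": i + 1, "name": f"emp{i + 1}", "sex": 0 if i < 20 else 1, "income": incomes[i]}
    for i in range(1000)
]
random.shuffle(employees)

top_v1, cost_v1 = plan_v1(employees)
top_v2, cost_v2 = plan_v2(employees)
print(top_v1 == top_v2, cost_v1, cost_v2)
```

Both plans return the same five women, but the exchange sort performs N(N-1)/2 comparisons, so ver.2 costs 1000 + 190 = 1190 steps while ver.1 already spends 499,500 comparisons on sorting alone. This is the kind of rewrite a query optimizer can apply automatically once the query is expressed in relational algebra.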
2.4.2. What is an Algebra?
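An algebra, in general, consists of
operators and atomic operands. In arithmetic,
for example, the atomic operands are variables
such as x and constants such as 15, and the
operators are +, −, × and ÷.
RA is another example of an algebra: its
atomic operands are:
Variables that stand for relations
Constants, which are finite relations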
2.4.2. Relational algebra definition
Relational algebra is an offshoot of the algebra
of sets: it defines a set of operators on relations.
Each operator takes one or more relations as
operands and produces a new relation.
2.4.3 Overview of relational algebra
4 kinds of operations of the traditional RA:
Set operations: UNION (∪), INTERSECTION
(∩), DIFFERENCE/EXCEPT (–);
Operations that remove parts of a relation:
SELECTION (σ) and PROJECTION (π);
Operations that combine the tuples of two
relations: Cartesian PRODUCT (×), JOIN (⋈);
Renaming operation (ρ);
Example: Set Union Sell1 ∪ Sell2
Sell1         Sell2         Sell1 ∪ Sell2
A  B  C       A  B  C       A  B  C
a1 b1 c1      a1 b1 c1      a1 b1 c1
a1 b1 c2      a2 b2 c2      a1 b1 c2
a2 b1 c1                    a2 b1 c1
                            a2 b2 c2
Sell1 ∪ Sell2 = {t | t ∈ Sell1 ∨ t ∈ Sell2}
2.4.4. Set Operations on Relations - Set Union
Sells1 Sells2
bar beer price bar beer price
Joe's Bud 2.50 Joe's Bud 2.50
Joe's Miller 2.75 Joe's Miller 2.75
Sue's Bud 2.50 Sue's Miller 3.00
Sells1 ∪ Sells2
bar beer price
Joe's Bud 2.50
Joe's Miller 2.75
Sue's Bud 2.50
Sue's Miller 3.00
Example: Set Except Sell1 – Sell2
Sell1         Sell2         Sell1 – Sell2
A  B  C       A  B  C       A  B  C
a1 b1 c1      a1 b1 c1      a1 b1 c2
a1 b1 c2      a2 b2 c2      a2 b1 c1
a2 b1 c1
Sell1 – Sell2 = {t | t ∈ Sell1 ∧ t ∉ Sell2}
2.4.4. Set Operations on Relations - Set Except
Sells1 Sells2
bar beer price bar beer price
Joe's Bud 2.50 Joe's Bud 2.50
Joe's Miller 2.75 Joe's Miller 2.75
Sue's Bud 2.50 Sue's Miller 3.00
Sells1 \ Sells2 or Sells1 – Sells2
bar beer price
Sue's Bud 2.50
Example: Set Intersection Sell1 ∩ Sell2
Sell1         Sell2         Sell1 ∩ Sell2
A  B  C       A  B  C       A  B  C
a1 b1 c1      a1 b1 c1      a1 b1 c1
a1 b1 c2      a2 b2 c2
a2 b1 c1
Sell1 ∩ Sell2 = {t | t ∈ Sell1 ∧ t ∈ Sell2}
2.4.4. Set Operations on Relations - Set Intersection
Sells1 Sells2
bar beer price bar beer price
Joe's Bud 2.50 Joe's Bud 2.50
Joe's Miller 2.75 Joe's Miller 2.75
Sue's Bud 2.50 Sue's Miller 3.00
Sells1 ∩ Sells2
bar beer price
Joe's Bud 2.50
Joe's Miller 2.75
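The three set operations map directly onto Python's built-in set type when each tuple of a relation is stored as a Python tuple. The sketch below reuses the Sells1 and Sells2 data from the examples; the variable names `union`, `difference`, and `intersection` are chosen here for illustration, and both relations are assumed to share the schema (bar, beer, price).

```python
# Each relation is a set of (bar, beer, price) tuples.
Sells1 = {
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
}
Sells2 = {
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Miller", 3.00),
}

union = Sells1 | Sells2          # tuples in Sells1 or Sells2
difference = Sells1 - Sells2     # tuples in Sells1 but not in Sells2
intersection = Sells1 & Sells2   # tuples in both relations

for name, rel in [("union", union), ("difference", difference), ("intersection", intersection)]:
    print(name, sorted(rel))
```

Because sets never hold duplicates, the union contains four tuples rather than six, matching the set semantics used in the slides.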
2.4.5. Projection
R1 := πL (R2)
L is a list of attributes from the schema of R2.
R1 is constructed by looking at each tuple of
R2, extracting the attributes on list L, in the
order specified, and creating from those
components a tuple for R1.
Eliminate duplicate tuples, if any.
L ⊆ schema(R2): R1 = πL(R2) = {t[L] | t ∈ R2}
Example: Projection
Relation Sells:
bar beer price
Joe’s Bud 2.50
Joe’s Miller 2.75
Sue’s Bud 2.50
Sue’s Miller 3.00
Prices := πbeer, price(Sells):
beer price
Bud 2.50
Miller 2.75
Miller 3.00
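Projection can be sketched in a few lines. The version below is a minimal illustration, assuming a relation is represented as a schema (tuple of attribute names) plus a set of rows; `project` is a hypothetical helper name. Using a set for the result removes duplicate tuples automatically.

```python
def project(schema, rows, attrs):
    """π_attrs: keep only the listed attributes, in the given order."""
    positions = [schema.index(a) for a in attrs]
    new_rows = {tuple(row[p] for p in positions) for row in rows}
    return tuple(attrs), new_rows

sells_schema = ("bar", "beer", "price")
sells_rows = {
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Miller", 3.00),
}

prices_schema, prices_rows = project(sells_schema, sells_rows, ["beer", "price"])
print(prices_schema, sorted(prices_rows))
```

The two (Bud, 2.50) tuples collapse into one, so Prices has three tuples even though Sells has four.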
Extended Projection
Using the same πL operator, we allow
the list L to contain arbitrary expressions
involving attributes:
1. Arithmetic on attributes, e.g., A+B->C.
2. Duplicate occurrences of the same
attribute.
Example: Extended Projection
R= (A B)
1 2
3 4
πA+B->C, A, A (R) = C A1 A2
3 1 1
7 3 3
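Extended projection can be sketched by letting each output column be computed by a function of the input tuple. This is an illustration only: `extended_project` is a hypothetical helper, rows are assumed to be dictionaries keyed by attribute name, and the output names C, A1, A2 follow the example above.

```python
def extended_project(rows, outputs):
    """outputs is a list of (new_name, function) pairs applied to each row."""
    schema = [name for name, _ in outputs]
    result = []
    for row in rows:
        new_row = tuple(func(row) for _, func in outputs)
        if new_row not in result:  # keep set semantics: no duplicate tuples
            result.append(new_row)
    return schema, result

R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]

schema, rows = extended_project(
    R,
    [
        ("C", lambda t: t["A"] + t["B"]),  # A+B -> C
        ("A1", lambda t: t["A"]),          # first copy of A
        ("A2", lambda t: t["A"]),          # second copy of A
    ],
)
print(schema, rows)
```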
2.4.6. Selection
R1 := σC (R2)
C is a condition (as in “if” statements) that
refers to attributes of R2; it may use the
comparisons >, <, ≤, ≥, =, ≠ and the logical
connectives ∧, ∨, ¬.
R1 is all those tuples of R2 that satisfy C.
σC(R2) = {t | t ∈ R2 ∧ C(t) = True}
Example: Selection
Relation Sells:
bar beer price
Joe’s Bud 2.50
Joe’s Miller 2.75
Sue’s Bud 2.50
Sue’s Miller 3.00
JoeMenu := σbar=“Joe’s”(Sells):
bar beer price
Joe’s Bud 2.50
Joe’s Miller 2.75
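Selection is a filter over the tuples of a relation. A minimal sketch, assuming rows are dictionaries keyed by attribute name and the condition is passed as a Python function; `select` is a hypothetical helper name.

```python
def select(rows, condition):
    """σ_condition: keep exactly the rows for which condition(row) is true."""
    return [row for row in rows if condition(row)]

Sells = [
    {"bar": "Joe's", "beer": "Bud", "price": 2.50},
    {"bar": "Joe's", "beer": "Miller", "price": 2.75},
    {"bar": "Sue's", "beer": "Bud", "price": 2.50},
    {"bar": "Sue's", "beer": "Miller", "price": 3.00},
]

JoeMenu = select(Sells, lambda t: t["bar"] == "Joe's")
for row in JoeMenu:
    print(row)
```

The schema of the result is the same as the schema of the input; only the number of tuples changes.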
2.4.7. Cartesian Product
R3 := R1 × R2
Pair each tuple t1 of R1 with each tuple t2
of R2.
Concatenation t1t2 is a tuple of R3.
Schema of R3 is the attributes of R1 and
then R2, in order.
But beware: if an attribute A has the same name
in R1 and R2, use R1.A and R2.A.
Example: R3 := R1 × R2
R1( A, B )
    1  2
    3  4
R2( B, C )
    5  6
    7  8
    9  10
R3( A, R1.B, R2.B, C )
    1  2     5     6
    1  2     7     8
    1  2     9     10
    3  4     5     6
    3  4     7     8
    3  4     9     10
R1 × R2 = {t1t2 | t1 ∈ R1 ∧ t2 ∈ R2}
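The product, including the renaming of clashing attributes, can be sketched as follows. This is an illustration under simple assumptions: relations are a schema list plus a list of row tuples, and `product` is a hypothetical helper that prefixes only the attribute names shared by both schemas.

```python
def product(name1, schema1, rows1, name2, schema2, rows2):
    """Cartesian product; shared attribute names are qualified as Rel.attr."""
    shared = set(schema1) & set(schema2)
    schema = [f"{name1}.{a}" if a in shared else a for a in schema1]
    schema += [f"{name2}.{a}" if a in shared else a for a in schema2]
    rows = [t1 + t2 for t1 in rows1 for t2 in rows2]  # concatenate every pair
    return schema, rows

R1_schema, R1_rows = ["A", "B"], [(1, 2), (3, 4)]
R2_schema, R2_rows = ["B", "C"], [(5, 6), (7, 8), (9, 10)]

R3_schema, R3_rows = product("R1", R1_schema, R1_rows, "R2", R2_schema, R2_rows)
print(R3_schema)
for row in R3_rows:
    print(row)
```

The result has 2 × 3 = 6 tuples, one for every pairing of a tuple from R1 with a tuple from R2.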
2.4.8. Natural Join
A useful join variant (natural join)
connects two relations by:
Equating attributes of the same name, and
Projecting out one copy of each pair of
equated attributes.
Denoted R3 := R1 ⋈ R2.
Example: Natural Join
Sells( bar, beer, price ) Bars( bar, addr )
Joe’s Bud 2.50 Joe’s Maple St.
Joe’s Miller 2.75 Sue’s River Rd.
Sue’s Bud 2.50
Sue’s Coors 3.00
BarInfo := Sells ⋈ Bars
Note: Bars.name has become Bars.bar to make the natural
join “work.”
BarInfo( bar, beer, price, addr )
Joe’s Bud 2.50 Maple St.
Joe’s Miller 2.75 Maple St.
Sue’s Bud 2.50 River Rd.
Sue’s Coors 3.00 River Rd.
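A natural join can be sketched as a nested loop that keeps only the pairs agreeing on every shared attribute and then drops the duplicate copy of those attributes. The code is a minimal illustration, not how a real DBMS executes joins; `natural_join` is a hypothetical helper, and relations are assumed to be a schema list plus a list of row tuples.

```python
def natural_join(schema1, rows1, schema2, rows2):
    """Join on all attributes with the same name; keep one copy of each."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in common]
    schema = list(schema1) + extra
    result = []
    for t1 in rows1:
        for t2 in rows2:
            agree = all(t1[schema1.index(a)] == t2[schema2.index(a)] for a in common)
            if agree:
                row = t1 + tuple(t2[schema2.index(a)] for a in extra)
                if row not in result:
                    result.append(row)
    return schema, result

sells_schema = ["bar", "beer", "price"]
sells_rows = [
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Coors", 3.00),
]
bars_schema = ["bar", "addr"]
bars_rows = [("Joe's", "Maple St."), ("Sue's", "River Rd.")]

barinfo_schema, barinfo_rows = natural_join(sells_schema, sells_rows, bars_schema, bars_rows)
print(barinfo_schema)
for row in barinfo_rows:
    print(row)
```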
2.4.9 Theta Join
R ⋈θ S
The result of a theta join consists of all
combinations of tuples from the two relations
R and S that satisfy the condition θ.
Example: Theta Join
R( A, B )    S( C, D )
   1  1         2  2
   1  2         3  2
   2  3         4  1
R ⋈B>=C S
A B C D
1 2 2 2
2 3 2 2
2 3 3 2
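The theta join in this example can be reproduced with a short sketch. It assumes rows are plain tuples, R has schema (A, B), S has schema (C, D), and the condition θ is supplied as a Python function of one tuple from each relation; `theta_join` is a hypothetical helper name.

```python
def theta_join(rows_r, rows_s, theta):
    """Pair every tuple of R with every tuple of S and keep pairs satisfying theta."""
    return [r + s for r in rows_r for s in rows_s if theta(r, s)]

R = [(1, 1), (1, 2), (2, 3)]  # schema (A, B)
S = [(2, 2), (3, 2), (4, 1)]  # schema (C, D)

# Condition B >= C: B is position 1 of an R tuple, C is position 0 of an S tuple.
result = theta_join(R, S, lambda r, s: r[1] >= s[0])
print(result)
```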
2.4.10. Renaming
The ρ operator gives a new schema to
a relation.
R1 := ρR1(A1,…,An)(R2) makes R1 be a
relation with attributes A1,…,An and
the same tuples as R2.
Simplified notation: R1(A1,…,An) := R2.
Example: Renaming
Bars( name, addr )
Joe’s Maple St.
Sue’s River Rd.
R(bar, addr) := Bars
R( bar, addr )
Joe’s Maple St.
Sue’s River Rd.
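Renaming changes only the schema, never the tuples. A minimal sketch, assuming a relation is a pair (schema, rows); `rename` is a hypothetical helper name.

```python
def rename(relation, new_attrs):
    """ρ: return the same tuples under a new list of attribute names."""
    old_attrs, rows = relation
    if len(new_attrs) != len(old_attrs):
        raise ValueError("new schema must have the same number of attributes")
    return tuple(new_attrs), rows

Bars = (("name", "addr"), {("Joe's", "Maple St."), ("Sue's", "River Rd.")})

R = rename(Bars, ["bar", "addr"])
print(R[0], sorted(R[1]))
```

After this step R has an attribute called bar, so it can be combined with Sells in a natural join.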
2.4.11 Combining Operations to Form Queries
Movies (title, year, length, genre, studioName)
What are the titles and years of movies made by
Fox that are at least 100 minutes long?
To answer this question, see the steps
represented as an expression tree:
            π title, year
                 |
                 ∩
           /           \
σlength >= 100     σstudioName = ‘Fox’
       |                  |
     Movies             Movies
As a single expression:
πtitle, year(σlength >= 100(Movies) ∩ σstudioName = ‘Fox’(Movies))
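The expression tree can be evaluated bottom-up: two selections, an intersection, then a projection. The sketch below is an illustration only; the sample rows are assumptions made for this example (the last row is made up), and `select` and `project` are hypothetical helpers.

```python
SCHEMA = ("title", "year", "length", "genre", "studioName")

# Sample rows for illustration only; the last row is invented.
Movies = {
    ("Star Wars", 1977, 124, "sciFi", "Fox"),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount"),
    ("Short Fox Film", 2001, 80, "drama", "Fox"),
}

def select(rows, condition):
    """σ: keep rows whose attribute dictionary satisfies the condition."""
    return {t for t in rows if condition(dict(zip(SCHEMA, t)))}

def project(rows, attrs):
    """π: keep only the listed attributes of each row."""
    positions = [SCHEMA.index(a) for a in attrs]
    return {tuple(t[p] for p in positions) for t in rows}

long_movies = select(Movies, lambda m: m["length"] >= 100)      # left leaf branch
fox_movies = select(Movies, lambda m: m["studioName"] == "Fox")  # right leaf branch
answer = project(long_movies & fox_movies, ["title", "year"])    # ∩, then π

# An equivalent plan: one selection with both conditions combined by AND.
same_answer = project(
    select(Movies, lambda m: m["length"] >= 100 and m["studioName"] == "Fox"),
    ["title", "year"],
)
print(answer, answer == same_answer)
```

The second plan reads Movies once instead of twice, which is the kind of equivalent rewriting an optimizer may choose.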
2.4.12 Relationships among operations
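Some operators can be expressed through others. Two standard identities are R ∩ S = R – (R – S) and R ⋈θ S = σθ(R × S). The sketch below checks both on the small relations used in the earlier examples; it is a spot check on sample data, not a proof, and the variable names are chosen here for illustration.

```python
from itertools import product

Sells1 = {("Joe's", "Bud", 2.50), ("Joe's", "Miller", 2.75), ("Sue's", "Bud", 2.50)}
Sells2 = {("Joe's", "Bud", 2.50), ("Joe's", "Miller", 2.75), ("Sue's", "Miller", 3.00)}

# Intersection written with difference only: R ∩ S = R – (R – S)
intersection_via_difference = Sells1 - (Sells1 - Sells2)

R = {(1, 1), (1, 2), (2, 3)}  # schema (A, B)
S = {(2, 2), (3, 2), (4, 1)}  # schema (C, D)

# Theta join written as a selection over the Cartesian product.
cross = {r + s for r, s in product(R, S)}           # tuples (A, B, C, D)
theta_via_select = {t for t in cross if t[1] >= t[2]}  # condition B >= C

print(intersection_via_difference == (Sells1 & Sells2))
print(sorted(theta_via_select))
```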
2.4.13 Exercises
A database schema consists of 4 relations:
Product(maker, model, type)
PC(model, speed, ram, hd, price)
Laptop(model, speed, ram, hd, screen, price)
Printer(model, color, type, price)
https://github.com/himahb/hima226/blob/master/computers.sql
2.4.13 Exercises
Sample data for the 4 relations: Product, PC, Laptop, Printer
2.4.13 Exercises
Write relational algebra expressions to answer the following queries:
What types of product are made by manufacturer A?
What PC models have a speed of at least 3.00?
What types of product are made by both manufacturers A and D?
What types of product are made by manufacturer A but not by manufacturer D?
Which manufacturers make laptops with a hard disk of at least 100GB?
Find the model numbers and prices of all products (of any type) made by
manufacturer B.
Find the model numbers of all color laser printers.
Find those manufacturers that sell laptops, but not PCs.
Find those hard-disk sizes that occur in two or more PCs.
Find the manufacturers of the computer with the highest available speed.
Find those manufacturers of at least two different computers (PC or laptop) with
speeds of at least 2.80.
Summary
RA is useful as a query language precisely
because it is less powerful than C or Java.
RA is an algebra: its atomic operands are:
Variables that stand for relations
Constants, which are finite relations
The six primitive operators of RA are: Selection,
Projection, Product, Union, Difference (Except)
and Rename
Other operators of RA are: Natural Join, Theta
Join, …