Lesson 3
Advanced SQL
3.1
Introduction
This chapter addresses queries on multiple tables, which include joins and sub-queries.
Two other topics critical to web-based development are embedded SQL and dynamic SQL. The sub-query is
an important topic in SQL; this semester we shall learn to build several sub-queries.
3.2
Objective
Build syntax for JOINs - equijoin, natural join, outer join, inner join, self join
Write sub-queries. Differentiate and compare sub-query with join
Build equivalent queries using JOIN and sub-query
Compare correlated and non-correlated sub-queries
Combine queries using UNION
CREATE and DROP Indexes
Use conditional expressions
Recognize the need for Transaction Management
Determine the use and location of triggers and stored procedures. Differentiate between stored
procedures and triggers
Differentiate between procedures and functions
Determine the need for embedded SQL and dynamic SQL in web based applications
3.3
Reading Assignment
Read
Page 289, Learning Objectives and Introduction
Summary on page 329 to 330
All definitions on left and right margins on pages 290 to 327
3.4
Working with more than one table
Table 3.1 lists types of joins with a brief explanation. Note, CROSS JOIN and UNION are included in the
table, although they are not strictly joins.
Join                Results
CROSS JOIN          Cartesian product of tables. Combines each row from the first table with each row from the second table.
NATURAL JOIN        Returns all requested results; values of common columns are returned only once, so the duplicate column is eliminated.
INNER JOIN          Returns only the records that have matching records in both tables. Used in many situations, but not suitable in all of them.
EQUI JOIN           Matches rows only on field values that are equal. Returns all requested results, including the values of the common columns.
OUTER JOIN          Returns all rows from one table even if no match is found in the other table.
LEFT OUTER JOIN     Joins rows based on matched values. The result includes unmatched rows from the table on the left of the JOIN clause.
RIGHT OUTER JOIN    Joins rows based on matched values. The result includes unmatched rows from the table on the right of the JOIN clause.
FULL OUTER JOIN     Joins rows based on matched values and returns rows from both tables, including unmatched rows. Not supported in MySQL.
SELF JOIN           Joins the table to itself.
UNION               Returns a table that includes all data from each table. Both tables (or views) must have the same number of columns, and the data types of corresponding columns must be compatible. Strictly not a join.
Table 3.1: Joins
3.5
JOIN
A JOIN clause combines two or more tables. The tables are from the same database. Almost all joins
are constructed by matching fields from the tables.
Data is stored in individual tables; it is the relation between the tables that makes the data meaningful.
JOINs put that relation to work [4, Fehily, Page 193]. SQL's strength is in JOIN operations.
NATURAL JOIN
The result of a NATURAL JOIN is the set of records from table 1 and table 2 that have equal values in their
common attribute; the common attribute appears only once in the result.
A NATURAL JOIN is one of the original eight operators. The following example is adapted from the back inside
cover of [1]. Given the minimal metadata, perform a NATURAL JOIN.
a1  b1
a2  b1
a3  b2
Table 3.2: Table 1

b1  c1
b2  c2
b3  c3
Table 3.3: Table 2
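The NATURAL JOIN of Table 1 and Table 2 can be checked with a short script. This is a minimal sketch, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names Table1 and Table2 and the column names A, B and C are assumptions made for illustration, since the tables above carry no column headings.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column names A, B, C are assumed; B is the common attribute.
    CREATE TABLE Table1 (A TEXT, B TEXT);
    CREATE TABLE Table2 (B TEXT, C TEXT);
    INSERT INTO Table1 VALUES ('a1', 'b1'), ('a2', 'b1'), ('a3', 'b2');
    INSERT INTO Table2 VALUES ('b1', 'c1'), ('b2', 'c2'), ('b3', 'c3');
""")

# NATURAL JOIN matches rows on the common column B and returns B only once.
natural_rows = conn.execute(
    "SELECT * FROM Table1 NATURAL JOIN Table2 ORDER BY A"
).fetchall()

for row in natural_rows:
    print(row)
```

The row with b3 in Table 2 has no partner in Table 1, so it does not appear in the result.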
Using JOINs
Consider the following three tables: Team, Player and PlayerTeam. Table Team has five teams; TeamID
is the primary key. Table Player has six players. Players are assigned to teams; the assignments are shown
in table PlayerTeam.
TeamID  Team
21      Woqag
22      Zumey
23      Gavop
24      Turoz
25      Nibeg
Table 3.4: Team
PlayerID  PlayerName
445       Bix
446       Lay
447       Vow
448       Cox
449       Sal
450       Kar
Table 3.5: Player
PlayerID  TeamID
445       21
446       23
447       23
448       24
449       24
Table 3.6: PlayerTeam
Figure 3.1: ERD Player Team
Player Kar has not been assigned to a team; teams 22 (Zumey) and 25 (Nibeg) have no players assigned.
3.5.1
Exercise
Use PlayerID and TeamID from table 3.6 to write, in tabular form, Player Name and Team Name.
CROSS JOIN
A cross join does not apply any predicate or filter, so it has limited practical use in data processing. Be
careful when using a cross join; the result is often not what you expect. Mathematically, a CROSS JOIN is
a product (specifically, a Cartesian product), one of the original eight operators. A NATURAL JOIN,
discussed earlier, is a join operator. Always examine the result of a join operation and verify it against your
expected outcome. A missing or incorrectly specified join condition in the WHERE clause can result in a cross join.
Aside: A predicate is an operator, or a function, that returns TRUE or FALSE.
A cross join can be run in two ways:
SELECT * FROM <tableA> CROSS JOIN <tableB>;
or simply
SELECT * FROM <tableA>, <tableB>;
SELECT * FROM Team, Player; results in 30 rows (5 teams x 6 players). In this case it is not a useful result.
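The row count can be confirmed with a small script. This is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL; the column name TeamName follows the queries used later in this lesson, and the column types are assumptions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team (TeamID INTEGER PRIMARY KEY, TeamName TEXT);
    CREATE TABLE Player (PlayerID INTEGER PRIMARY KEY, PlayerName TEXT);
    INSERT INTO Team VALUES
        (21, 'Woqag'), (22, 'Zumey'), (23, 'Gavop'), (24, 'Turoz'), (25, 'Nibeg');
    INSERT INTO Player VALUES
        (445, 'Bix'), (446, 'Lay'), (447, 'Vow'),
        (448, 'Cox'), (449, 'Sal'), (450, 'Kar');
""")

# Both forms pair every team with every player.
explicit = conn.execute("SELECT * FROM Team CROSS JOIN Player").fetchall()
implicit = conn.execute("SELECT * FROM Team, Player").fetchall()

print(len(explicit), len(implicit))
```

Both statements return the same 30 combinations, most of which pair a player with a team he or she does not belong to.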
Question: Think of a use of a cartesian product, i.e. CROSS JOIN
Exercise: Run the script deck.sql.
Aside: Note the method of inserting rows using a single INSERT statement
INNER JOIN
INNER JOIN is the most common type of join, although it is not suitable in all situations. It returns only
the records that have matching records in both tables. The INNER JOIN in figure 3.2 lists the players
and the team assigned to each player.
INNER JOIN has predictable results when referential integrity is enforced. Referential integrity needs to
be preserved if the results of an INNER JOIN are to match the expected results. In the sample tables,
player Kar does not appear in the result because he/she is not associated with a team. Also, teams 22 (Zumey)
and 25 (Nibeg) do not appear in the result.
The following SQL statement joins three tables:
SELECT PlayerName, TeamName
FROM Player
INNER JOIN
PlayerTeam USING( PlayerID )
INNER JOIN
Team USING( TeamID );
Figure 3.2: Inner Join
An INNER JOIN is suited for tables that do not have NULL values in the join columns. Rows with NULL in a
join column are omitted without an error or warning. NULL values in one table do not match any values in
another table, not even NULL values.
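The three-table INNER JOIN from figure 3.2 can be run against the sample data as follows. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the column types are assumptions, and the ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team (TeamID INTEGER PRIMARY KEY, TeamName TEXT);
    CREATE TABLE Player (PlayerID INTEGER PRIMARY KEY, PlayerName TEXT);
    CREATE TABLE PlayerTeam (PlayerID INTEGER, TeamID INTEGER);
    INSERT INTO Team VALUES
        (21, 'Woqag'), (22, 'Zumey'), (23, 'Gavop'), (24, 'Turoz'), (25, 'Nibeg');
    INSERT INTO Player VALUES
        (445, 'Bix'), (446, 'Lay'), (447, 'Vow'),
        (448, 'Cox'), (449, 'Sal'), (450, 'Kar');
    INSERT INTO PlayerTeam VALUES
        (445, 21), (446, 23), (447, 23), (448, 24), (449, 24);
""")

# Only players with an assignment, and only teams with a player, survive.
inner_rows = conn.execute("""
    SELECT PlayerName, TeamName
    FROM Player
    INNER JOIN PlayerTeam USING (PlayerID)
    INNER JOIN Team USING (TeamID)
    ORDER BY Player.PlayerID
""").fetchall()

# Comparing NULL with NULL yields NULL (None in Python), not TRUE.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

for row in inner_rows:
    print(row)
print(null_equals_null)
```

Kar, Zumey and Nibeg are silently absent from the result, which is why the result of an INNER JOIN should always be compared with the expected outcome.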
LEFT OUTER JOIN
LEFT JOIN joins two tables; the result contains the rows that match in both tables and also the unmatched
rows from the table on the left of the JOIN clause, with NULL in the columns of the table on the right.
# LEFT OUTER JOIN
# Showing player Kar does not belong to any team
SELECT Player.PlayerID, PlayerName, PlayerTeam.TeamID
FROM Player LEFT JOIN
PlayerTeam ON PlayerTeam.PlayerID = Player.PlayerID;
Figure 3.3: Left Outer Join
# LEFT OUTER JOIN
# Showing team Zumey and team Nibeg do not have any players
SELECT Team.TeamID, TeamName, PlayerTeam.PlayerID
FROM Team LEFT JOIN
PlayerTeam ON PlayerTeam.TeamID = Team.TeamID;
Figure 3.4: Left Outer Join
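The two LEFT OUTER JOIN queries above can be tried on the sample data. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the column types, the ORDER BY clauses and the IS NULL filter in the second query are additions made for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team (TeamID INTEGER PRIMARY KEY, TeamName TEXT);
    CREATE TABLE Player (PlayerID INTEGER PRIMARY KEY, PlayerName TEXT);
    CREATE TABLE PlayerTeam (PlayerID INTEGER, TeamID INTEGER);
    INSERT INTO Team VALUES
        (21, 'Woqag'), (22, 'Zumey'), (23, 'Gavop'), (24, 'Turoz'), (25, 'Nibeg');
    INSERT INTO Player VALUES
        (445, 'Bix'), (446, 'Lay'), (447, 'Vow'),
        (448, 'Cox'), (449, 'Sal'), (450, 'Kar');
    INSERT INTO PlayerTeam VALUES
        (445, 21), (446, 23), (447, 23), (448, 24), (449, 24);
""")

# Every player is kept; a player without a team gets NULL (None) for TeamID.
left_rows = conn.execute("""
    SELECT Player.PlayerID, PlayerName, PlayerTeam.TeamID
    FROM Player LEFT JOIN
         PlayerTeam ON PlayerTeam.PlayerID = Player.PlayerID
    ORDER BY Player.PlayerID
""").fetchall()

# Filtering on the NULL side keeps only the teams without any player.
empty_teams = conn.execute("""
    SELECT Team.TeamID, TeamName
    FROM Team LEFT JOIN
         PlayerTeam ON PlayerTeam.TeamID = Team.TeamID
    WHERE PlayerTeam.PlayerID IS NULL
    ORDER BY Team.TeamID
""").fetchall()

print(left_rows[-1])
print(empty_teams)
```

Testing the right-hand column for NULL is a common way to list the unmatched rows only.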
RIGHT OUTER JOIN
RIGHT JOIN joins two tables; the result contains the rows that match in both tables and also the unmatched
rows from the table on the right of the JOIN clause, with NULL in the columns of the table on the left.
# RIGHT OUTER JOIN
# Showing team Zumey and team Nibeg do not have any players
SELECT Team.TeamID, TeamName, PlayerTeam.PlayerID
FROM PlayerTeam RIGHT JOIN
Team ON PlayerTeam.TeamID = Team.TeamID;
Figure 3.5: Right Outer Join
FULL OUTER JOIN
Joins rows based on matched values and returns rows from both tables, including the unmatched rows of each.
Not supported in MySQL; not often used. The same result is obtained by combining a LEFT OUTER JOIN and a
RIGHT OUTER JOIN with UNION.
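The UNION workaround can be sketched on the sample tables. This is an illustrative example, not a definitive implementation: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the second SELECT is written as a LEFT JOIN with the tables in reverse order, which is equivalent to a RIGHT JOIN and also runs on older SQLite versions. Note that UNION removes duplicate rows, so this emulation assumes the joined result has no duplicates worth keeping.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team (TeamID INTEGER PRIMARY KEY, TeamName TEXT);
    CREATE TABLE Player (PlayerID INTEGER PRIMARY KEY, PlayerName TEXT);
    CREATE TABLE PlayerTeam (PlayerID INTEGER, TeamID INTEGER);
    INSERT INTO Team VALUES
        (21, 'Woqag'), (22, 'Zumey'), (23, 'Gavop'), (24, 'Turoz'), (25, 'Nibeg');
    INSERT INTO Player VALUES
        (445, 'Bix'), (446, 'Lay'), (447, 'Vow'),
        (448, 'Cox'), (449, 'Sal'), (450, 'Kar');
    INSERT INTO PlayerTeam VALUES
        (445, 21), (446, 23), (447, 23), (448, 24), (449, 24);
""")

full_rows = conn.execute("""
    -- All players, with or without a team
    SELECT PlayerName, TeamName
    FROM Player
    LEFT JOIN PlayerTeam ON PlayerTeam.PlayerID = Player.PlayerID
    LEFT JOIN Team ON Team.TeamID = PlayerTeam.TeamID
    UNION
    -- All teams, with or without a player (a RIGHT JOIN with tables swapped)
    SELECT PlayerName, TeamName
    FROM Team
    LEFT JOIN PlayerTeam ON PlayerTeam.TeamID = Team.TeamID
    LEFT JOIN Player ON Player.PlayerID = PlayerTeam.PlayerID
""").fetchall()

for row in full_rows:
    print(row)
```

The result holds the five matched pairs plus one row for Kar without a team and one row each for Zumey and Nibeg without a player.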
SELF JOIN
A table joined with itself is a SELF JOIN. It happens in a unary relationship, when a foreign key in a table
references the primary key of the same table. Think of a join on two copies of the same table; rows of one
copy are matched with rows of the other copy. There is no explicit statement for a SELF JOIN; the table is
named twice in the query, with a different alias each time. Examples of SELF JOIN: Consider an employee
table with EmployeeID, Name and ManagerID. Most employees will have a manager whose ID will be associated
with the employee record.
A drug will have contraindications with another drug. DrugID, Name and ContraindicationID will be in
the same record.
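The employee example can be sketched as follows. This is a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the table name Employee, the aliases e and m, and the four sample rows are hypothetical and are not taken from the SELF JOIN Demo.sql script.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ManagerID refers to EmployeeID in the same table (unary relationship).
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,
        Name       TEXT,
        ManagerID  INTEGER REFERENCES Employee (EmployeeID)
    );
    -- Hypothetical sample rows; Ana has no manager.
    INSERT INTO Employee VALUES
        (1, 'Ana', NULL), (2, 'Ben', 1), (3, 'Cy', 1), (4, 'Dee', 2);
""")

# The same table appears twice: alias e is the employee, alias m the manager.
self_rows = conn.execute("""
    SELECT e.Name AS Employee, m.Name AS Manager
    FROM Employee AS e
    INNER JOIN Employee AS m ON e.ManagerID = m.EmployeeID
    ORDER BY e.EmployeeID
""").fetchall()

for row in self_rows:
    print(row)
```

Because this is an INNER JOIN, the employee without a manager is left out; a LEFT JOIN would keep that row with NULL in the Manager column.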
Exercise: Run the script SELF JOIN Demo.sql and SELF JOIN Query.sql
Final Note
In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents; they can replace each other.
In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause; CROSS JOIN is used otherwise.
3.6
GROUP BY
GROUP BY is used with aggregate functions; it provides a summary of rows. Examples of aggregate
functions are AVG, SUM and COUNT.
The following SQL statement adds the total number of people living in cities in each of the countries:
SELECT countrycode, SUM( population )
FROM city
GROUP BY countrycode;
Figure 3.6: Using GROUP BY clause
Run the query and observe the results. Remember to USE world;
Now remove the GROUP BY clause and run the query again.
SELECT countrycode, SUM( population ) FROM city;
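It now adds all population figures from all the rows in the city table. The arithmetic is correct, but it
associates the aggregate with the first available country, with CountryCode AFG. (MySQL 5.7.5 and later,
with the default ONLY_FULL_GROUP_BY mode, reject this query with an error instead.)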
Now try this query, with the GROUP BY clause but without the aggregate function:
SELECT countrycode, population FROM city GROUP BY countrycode;
Where the server accepts this query, the result may appear correct, but it simply takes the first city
available for each country and displays its population. As a rule, plan your query, know the expected result
and verify the result.
These observations tell us that GROUP BY is meant to work together with aggregate functions.
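The effect of GROUP BY can be seen on a very small table. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the city table below is a reduced, hypothetical version of the one in the world database, and the city names and population figures are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical version of world.city with invented figures.
    CREATE TABLE city (name TEXT, countrycode TEXT, population INTEGER);
    INSERT INTO city VALUES
        ('A1', 'AFG', 100), ('A2', 'AFG', 50),
        ('B1', 'ALB', 30),
        ('D1', 'DZA', 200), ('D2', 'DZA', 170);
""")

# One summary row per country code.
per_country = conn.execute("""
    SELECT countrycode, SUM(population)
    FROM city
    GROUP BY countrycode
    ORDER BY countrycode
""").fetchall()

# Without GROUP BY the aggregate covers the whole table.
grand_total = conn.execute("SELECT SUM(population) FROM city").fetchone()[0]

print(per_country)
print(grand_total)
```

The per-country sums add up to the grand total, which is a quick way to verify a grouped result.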
3.6.1
GROUP BY and HAVING
The HAVING clause qualifies groups. The query in figure 3.6 is expanded here to limit the result to countries
with a sum of city populations less than 200000.
SELECT countrycode, SUM( population ) AS Sum_Population
FROM city
GROUP BY countrycode
HAVING SUM( population ) < 200000;
Figure 3.7: GROUP BY qualified by HAVING
In figure 3.7 the HAVING clause qualifies the countries. Observe the same aggregate SUM( population )
in the field list and the HAVING clause. The WHERE clause does not allow aggregates; it qualifies rows in a table.
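Caution: The following query, although syntactically correct in MySQL, should be avoided. In the processing
order in figure 2.1 on page 8, the HAVING clause is processed before the SELECT clause, so standard SQL does
not allow an alias in the HAVING clause. MySQL accepts the alias as an extension, but the query is not
portable and its result must be verified; repeat the aggregate in the HAVING clause instead.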
SELECT countrycode, SUM(population) AS Sum_Population
FROM city
GROUP BY countrycode
HAVING Sum_Population < 200000;
Figure 3.8: Syntactically correct query with incorrect results
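The difference between HAVING and WHERE can be seen by running both on the same data. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the city table is a reduced, hypothetical version of the one in the world database with invented figures, and the thresholds 200 and 100 are scaled down to match them.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical version of world.city with invented figures.
    CREATE TABLE city (name TEXT, countrycode TEXT, population INTEGER);
    INSERT INTO city VALUES
        ('A1', 'AFG', 100), ('A2', 'AFG', 50),
        ('B1', 'ALB', 30),
        ('D1', 'DZA', 200), ('D2', 'DZA', 170);
""")

# HAVING qualifies groups: keep countries whose total is below 200.
having_rows = conn.execute("""
    SELECT countrycode, SUM(population) AS Sum_Population
    FROM city
    GROUP BY countrycode
    HAVING SUM(population) < 200
    ORDER BY countrycode
""").fetchall()

# WHERE qualifies rows: drop cities of 100 or more before grouping.
where_rows = conn.execute("""
    SELECT countrycode, SUM(population) AS Sum_Population
    FROM city
    WHERE population < 100
    GROUP BY countrycode
    ORDER BY countrycode
""").fetchall()

print(having_rows)
print(where_rows)
```

The aggregate is repeated in the HAVING clause rather than referenced through its alias, which keeps the query portable across database systems.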
3.7
Sub-query
A query nested inside another query is known as a subquery.
A correlated subquery, also known as a synchronized subquery, uses values from the outer query. The
subquery is evaluated once for each row processed by the outer query. A non-correlated subquery does not
depend on the outer query and is evaluated once for the entire outer query.
Source: http://en.wikipedia.org/wiki/Correlated_subquery
The above reference provides examples of using a correlated subquery.
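The two kinds of subquery can be compared side by side. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the city table is a reduced, hypothetical version of the one in the world database with invented figures, and the aliases outer_city and inner_city are chosen for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical version of world.city with invented figures.
    CREATE TABLE city (name TEXT, countrycode TEXT, population INTEGER);
    INSERT INTO city VALUES
        ('A1', 'AFG', 100), ('A2', 'AFG', 50),
        ('B1', 'ALB', 30),
        ('D1', 'DZA', 200), ('D2', 'DZA', 170);
""")

# Non-correlated: the inner query does not refer to the outer query,
# so one overall average serves every outer row.
above_overall = conn.execute("""
    SELECT name
    FROM city
    WHERE population > (SELECT AVG(population) FROM city)
    ORDER BY name
""").fetchall()

# Correlated: the inner query uses outer_city.countrycode, so the
# average is worked out per country of the current outer row.
above_country = conn.execute("""
    SELECT name
    FROM city AS outer_city
    WHERE population > (SELECT AVG(population)
                        FROM city AS inner_city
                        WHERE inner_city.countrycode = outer_city.countrycode)
    ORDER BY name
""").fetchall()

print(above_overall)
print(above_country)
```

The inner query of the second statement cannot be run on its own because it refers to outer_city, which is a practical test for recognising a correlated subquery.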
3.8
Learning Activities
Use the world database to generate correlated subqueries. Identify the inner and outer queries
Is the result of the LEFT OUTER JOIN the same as the table on the left of the JOIN clause? Justify your
answer.
3.9
Review Questions
1. A join operation:
(a) is used to combine indexing operations
(b) joins two tables with a common attribute to form a single table or view; the common attribute
must be a primary key in both tables
(c) joins two tables with a common attribute to be combined into a single table or view
(d) joins two disparate tables to be combined into a single table or view
2. A join in which rows that do not have matching values in common columns are still included in the
result table is called a(n):
(a) outer join
(b) union join
(c) equi-join
(d) natural join
(e) inner join
3. An operation to join a table to itself is called a:
(a) self join
(b) inner join
(c) natural join
(d) outer join
(e) equi join
4. A type of query that is placed within a WHERE or HAVING clause of another query is called a:
(a) PL/SQL
(b) subquery
(c) embedded SQL
(d) dynamic SQL
(e) trigger
5. Identify the clause that takes a value of TRUE if a subquery returns one or more rows in an intermediate results table.
(a) IN
(b) HAVING
(c) EXTENTS
(d) EXISTS
(e) WHERE
6. A type of subquery where processing the inner query depends on data from the outer query.
(a) correlated
(b) non-correlated
(c) inner
(d) outer
(e) embedded
7. A subquery that is executed once for the entire outer query is
(a) embedded
(b) correlated
(c) non-correlated
(d) outer
(e) dynamic
8. Which one of the sub-queries does not depend on data from the outer query
(a) inner
(b) embedded
(c) correlated
(d) non-correlated
(e) outer
9. Identifying specific attributes in the SELECT clause, instead of using SELECT * will help reduce
network traffic
(a) True
(b) False
10. A unit of work that changes the state of a database is a/an
(a) trigger
(b) stored procedure
(c) embedded query
(d) transaction
(e) dynamic SQL query
11. The SQL statement
SELECT * FROM <table>;
is a transaction
(a) True
(b) False
12. Identify the clause that is used to combine output from multiple queries into a single result table.
(a) COLLATE
(b) INTERSECT
(c) DIVIDE
(d) UNION
(e) SELF JOIN
13. SELECT * FROM Student, Course; will result in a
(a) NATURAL JOIN
(b) SELF JOIN
(c) OUTER JOIN
(d) CROSS JOIN
(e) INNER JOIN
14. A join operation is performed on two tables. The common field that is used for the join operation
has a few NULL values in each of the two tables. The join operation will:
(a) match NULL values from the first table but not from the second table
(b) match NULL values from the second table but not from the first table
(c) match NULL values from both tables, since it is a join operation
(d) not match NULL records from any of the two tables
15. Which one of the two SQL statements will filter countries with the letter C occurring only in the
middle of the three letter CountryCode
(a) SELECT countrycode, population
FROM city
WHERE countrycode LIKE '%C%';
(b) SELECT countrycode, population
FROM city
WHERE countrycode LIKE '_C_';
16. An alternative term used for CROSS JOIN is
(a) NATURAL JOIN
(b) SELF JOIN
(c) OUTER JOIN
(d) CARTESIAN PRODUCT
(e) INNER JOIN
17. SQL statements in a program written in another language such as C or Java are called
(a) dynamic SQL
(b) embedded SQL
(c) sub-query
(d) join
(e) union
18. A named set of SQL statements that are executed when a data modification occurs are called:
(a) Sub-queries
(b) Trapdoors
(c) Stored procedures
(d) Triggers
(e) PL/SQL
19. Identify the commands that need to be called explicitly, i.e. they cannot run automatically when a
transaction has taken place
(a) Sub-queries
(b) Trapdoors
(c) Stored procedures
(d) Triggers
(e) PL/SQL
20. Which one of the following returns values and takes input parameters?
(a) procedures
(b) functions
21. Embedded SQL statements can create a flexible and accessible interface for the user
(a) True
(b) False
22. Embedded SQL statements help enforce security by granting permission to required applications
instead of granting permission to users
(a) True
(b) False
23. SQL statements are built by DBMS at the time a user or procedure requests data from a DBMS or
a transaction is performed. The name given to such a concept is
(a) Dynamic view
(b) Material view
(c) Dynamic SQL
(d) Triggers
(e) PL/SQL
Transaction Management
24. Which statement undoes a transaction
(a) COMMIT
(b) UNDO
(c) REDO
(d) ROLLBACK
25. Which statement makes changes to a database permanent?
(a) COMMIT
(b) UNDO
(c) REDO
(d) ROLLBACK
(e) SAVE
26. Transaction management is needed when:
(a) a transaction consists of just one SQL command
(b) there is a security risk
(c) multiple SQL commands must be run as a single transaction
(d) there is more than one database administrator
(e) triggers are used
3.10
Summary
This chapter has a few intermediate and advanced SQL topics that are covered in later semesters; they
include triggers, stored procedures, embedded SQL, dynamic SQL and user-defined data types. Sub-queries
(nested queries) are used in situations that are complex. In a sub-query, the results of one or more
queries are evaluated first; these results then serve as a parameter to another query. Correlated sub-queries,
which use values from the outer query, are also used where several tables are queried.