SQL3
SQL3 - Querying Complex Objects
Retrieving data from a database consists of 3 main processes:
1.Formulation of an information/data request - the query,
2.Query execution by the dbms query processor and
3.Result presentation.
A DB query language will always provide specification of the selection criteria for the
desired information and can also include information for the remaining processes, as
illustrated in the query examples of Figure 2.3b and 2.3c. Note here that the SQL query
includes elements of the execution process (the joins) and the desired result, while the text
IR query only gives the selection criteria, assuming that the system will provide the
selection process as well as determine the result presentation.
Figure 2.3a: A multiple media DB
Q1: List the titles of database texts written by Joan Nordbotten.
SELECT D.Title
FROM PERSON P, AUTHOR A, DOCUMENT D
WHERE P.Name = 'Joan Nordbotten'
AND P.Id = A.PId
AND A.DId = D.Id
AND D.Title LIKE '%database%';
Result: a list of titles
Figure 2.3b: SQL query
Q2: Find texts on database management using SQL3 or MSQL or MSQL+.
Search within Document.body for:
Database ADJ1 Management AND (SQL3 OR MSQL%)
1: ADJ=adjacent
Result: a set of documents
Figure 2.3c: IR query
Figure 2.3: Query language examples
For structured/relational databases, the query language utilizes the well-defined DB
structure that gives names and dimensions to relations/tables and their attributes/columns,
for query specification, execution and result presentation. The SQL query in Figure
2.3b specifies the desired data and how it is to be retrieved. Assuming that there is
a publication date attribute, Date, in the Document table, the query could be extended to further
specify the result presentation by concluding the query with an
ORDER BY D.Date
clause and adding the D.Date reference to the SELECT clause.
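As a minimal sketch of this extension, the following Python snippet runs the Figure 2.3b query, extended with ORDER BY D.Date, against an in-memory SQLite database. The table and column names follow the query text; the Date column, the sample rows and the function name titles_by_author are assumptions made for illustration only.

```python
import sqlite3


def titles_by_author(author_name):
    """Run the Figure 2.3b query, extended with D.Date and ORDER BY D.Date."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PERSON   (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE DOCUMENT (Id INTEGER PRIMARY KEY, Title TEXT, Date TEXT);
        CREATE TABLE AUTHOR   (PId INTEGER, DId INTEGER);
        -- Hypothetical sample rows
        INSERT INTO PERSON VALUES (1, 'Joan Nordbotten'), (2, 'Another Author');
        INSERT INTO DOCUMENT VALUES
            (10, 'Multimedia database systems',     '2003-05-01'),
            (11, 'Introduction to database design', '1998-09-15'),
            (12, 'Information retrieval basics',    '2001-01-20'),
            (13, 'Database tuning',                 '2000-03-03');
        INSERT INTO AUTHOR VALUES (1, 10), (1, 11), (1, 12), (2, 13);
    """)
    rows = conn.execute("""
        SELECT D.Title, D.Date
        FROM PERSON P, AUTHOR A, DOCUMENT D
        WHERE P.Name = ?
          AND P.Id = A.PId
          AND A.DId = D.Id
          AND D.Title LIKE '%database%'
        ORDER BY D.Date
    """, (author_name,)).fetchall()
    conn.close()
    return rows


for title, pub_date in titles_by_author("Joan Nordbotten"):
    print(pub_date, title)
```

The selection criteria are unchanged; only the last clause and the extra column in the SELECT list affect how the result is presented.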
SQL was designed for the regularly structured data of relational databases and for exact
match, Boolean queries, i.e. selection of data that exactly match the search values given in
the query conditions. It is currently the best known of the structured DB query
languages and has formed a framework for query languages for other (non-relational)
database types such as object-oriented and spatial databases. However, SQL (and SQL2)
do not support access to complex structures.
7.1 New Features in SQL3
SQL3 was accepted as the new standard for SQL in 1999, after more than 7 years of
debate. Basically, SQL3 includes data definition and management techniques from
Object-Oriented DBMSs (oo-dbms), while maintaining the relational dbms platform. Based on this
merger of concepts and techniques, DBMSs that support SQL3 are called
Object-Relational DBMSs, or or-dbms.
The most central data modelling notions included in SQL3 are illustrated in Figure 2.4 and
support specification of:
•Classification hierarchies,
•Embedded structures that support composite attributes,
•Collection data-types (sets, lists/arrays, and multi-sets) that can be used for multi-
valued attribute types,
•Large OBject types, LOBs, within the DB, as opposed to requiring external
storage, and
•User defined data-types and functions (UDT/UDF) that can be used to define
complex structures and derived attribute value calculations, among many other
function extensions.
Figure 2.4: DMS support for complex data-types
Query formulation in SQL3 remains based in the structured, relational model, though
several functional additions have been made to support access to the new structures and
data types. Note: there are syntactic differences between or-dbms implementations of this new functionality.
7.1.1 Accessing hierarchic structures
Hierarchic structures can be used at 2 levels, illustrated in Figure 2.4 for:
1.Distinguishing roles between entity-types and
2.Detailing attribute components.
A cascaded dot notation has been added to the SQL3 syntax to support specification of
access paths within these structures. For example, the following statement selects the
names and pictures of students from Bergen, Norway, using the OR DB specification
given by the SQL3 declarations in Figure 5.3a.
Fig.5.3a: SQL3 entity type specifications
Fig.5.3b: SQL3(=SQL2) relationship specification
Figure 5.3: Entity and relationship specification in SQL3
SELECT name, picture FROM Student
WHERE address.city = 'Bergen'
AND address.country = 'Norway';
The SQL3 query processor recognizes that Student is a sub-type of Person and that the
attributes name, picture and address are inherited from Person, making it unnecessary for
the user to:
•specify the Person table in the FROM clause,
•use the dot notation to specify the parent entity-type Person in the SELECT or
WHERE clauses, or
•specify an explicit join between the levels in the entity-type hierarchy,
here Student to Person.
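To make the implicit join concrete, the sketch below emulates the hierarchy in a plain relational engine (SQLite, via Python). Figure 5.3a is not reproduced here, so the storage tables person_row and student_row, the flattened address_city and address_country columns, the sample rows and the function students_in are all hypothetical. A view named Student plays the role of the or-dbms: it performs the Student to Person join once, so the query itself never mentions Person.

```python
import sqlite3


def students_in(city, country):
    """Return (name, picture) for students with an address in the given city."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical relational storage for the Person/Student hierarchy
        CREATE TABLE person_row (
            id INTEGER PRIMARY KEY, name TEXT, picture BLOB,
            address_city TEXT, address_country TEXT);
        CREATE TABLE student_row (
            id INTEGER PRIMARY KEY REFERENCES person_row(id), level INTEGER);
        -- The view hides the sub-type/super-type join, as an or-dbms would
        CREATE VIEW Student AS
            SELECT p.id, p.name, p.picture,
                   p.address_city, p.address_country, s.level
            FROM student_row s JOIN person_row p ON p.id = s.id;
    """)
    conn.executemany(
        "INSERT INTO person_row VALUES (?, ?, ?, ?, ?)",
        [(1, "Kari", b"\x01", "Bergen", "Norway"),
         (2, "Ola", b"\x02", "Oslo", "Norway"),
         (3, "Liv", b"\x03", "Bergen", "Norway")])  # Liv is a Person, not a Student
    conn.executemany("INSERT INTO student_row VALUES (?, ?)", [(1, 3), (2, 5)])
    rows = conn.execute("""
        SELECT name, picture FROM Student
        WHERE address_city = ? AND address_country = ?
    """, (city, country)).fetchall()
    conn.close()
    return rows


print(students_in("Bergen", "Norway"))
```

SQLite has no cascaded dot notation, so address.city is written here as address_city; in SQL3 the path expression would be used instead.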
7.1.2 Accessing multi-valued structures
SQL3 supports multi-valued (MV) attributes using a number of different implementation
techniques. Basically, MV attribute structures can be defined as ordered or unordered sets
and implemented as lists, arrays or tables either embedded in the parent table or
'normalized' to a linked table. Note: a specific or-dbms will support implementation of only a few of these options.
In our example in Figure 5.3a, Person.address is a multi-valued complex attribute, defined
as a set of addresses. In execution of the previous query the query processor must search
each City and Country combination for the result. If the query intent is to locate students
with a home address in Bergen, Norway and we assume that the address set has been
implemented as an ordered array in which the 1st address is the home address, the query
should be specified as:
SELECT name, picture FROM Student
WHERE address[1].city = 'Bergen'
AND address[1].country = 'Norway';
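The difference between searching every address and searching only the first one can be sketched with the 'normalized' linked-table implementation mentioned above. The snippet uses SQLite through Python; the student and address tables, the pos column that records the array position, and the sample rows are assumptions for illustration and are not taken from Figure 5.3a.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per element of the address collection; pos keeps the order
    CREATE TABLE address (
        student_id INTEGER REFERENCES student(id),
        pos INTEGER, city TEXT, country TEXT,
        PRIMARY KEY (student_id, pos));
    INSERT INTO student VALUES (1, 'Kari'), (2, 'Ola'), (3, 'Liv');
    INSERT INTO address VALUES
        (1, 1, 'Bergen',    'Norway'),   -- Kari: home address in Bergen
        (1, 2, 'Oslo',      'Norway'),
        (2, 1, 'Oslo',      'Norway'),
        (2, 2, 'Bergen',    'Norway'),   -- Ola: second address in Bergen
        (3, 1, 'Trondheim', 'Norway');
""")

# Corresponds to address.city = 'Bergen': any element of the set may match
ANY_ADDRESS = """
    SELECT s.name FROM student s
    WHERE EXISTS (SELECT 1 FROM address a
                  WHERE a.student_id = s.id
                    AND a.city = 'Bergen' AND a.country = 'Norway')
    ORDER BY s.name
"""

# Corresponds to address[1].city = 'Bergen': only the first element is tested
HOME_ADDRESS = """
    SELECT s.name FROM student s JOIN address a ON a.student_id = s.id
    WHERE a.pos = 1 AND a.city = 'Bergen' AND a.country = 'Norway'
    ORDER BY s.name
"""

any_address = [row[0] for row in conn.execute(ANY_ADDRESS)]
home_address = [row[0] for row in conn.execute(HOME_ADDRESS)]
print(any_address, home_address)
```

With this sample data the first query also returns a student whose Bergen address is not the home address, which is exactly the imprecision the indexed form avoids.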
7.1.3 Utilizing user defined data types (UDT)
User defined functions can be used in either the SELECT or WHERE clauses, as shown in
the following example, again based on the DB specification given in Figure 5.3a.
SELECT Avg (age) FROM Student
WHERE Level > 4
AND age > 22;
In this query age is calculated by the function defined for Person.age. The SQL3 processor
must calculate the relevant student.age for each graduate student (assuming
that Level represents the number of years of higher education) and then calculate the
average age of this group.
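A derived attribute of this kind can be sketched with SQLite's support for user defined functions, registered from Python with create_function. The function name age_f echoes the Age_f function mentioned later in this chapter; the student table, its birthdate column, the fixed reference date and the sample rows are assumptions made so that the example is reproducible.

```python
import sqlite3
from datetime import date

REFERENCE_DATE = date(2024, 1, 1)  # assumed "today", fixed for reproducibility


def age_f(birthdate_iso):
    """Age in whole years at REFERENCE_DATE for an ISO 'YYYY-MM-DD' birthdate."""
    born = date.fromisoformat(birthdate_iso)
    years = REFERENCE_DATE.year - born.year
    # Subtract one year if the birthday has not yet occurred in the reference year
    if (REFERENCE_DATE.month, REFERENCE_DATE.day) < (born.month, born.day):
        years -= 1
    return years


conn = sqlite3.connect(":memory:")
conn.create_function("age_f", 1, age_f)  # make the function callable from SQL
conn.executescript("""
    CREATE TABLE student (
        id INTEGER PRIMARY KEY, name TEXT, birthdate TEXT, level INTEGER);
    INSERT INTO student VALUES
        (1, 'Kari', '1995-06-15', 5),
        (2, 'Ola',  '2003-03-01', 5),
        (3, 'Liv',  '1990-12-31', 6),
        (4, 'Per',  '1980-01-01', 2);
""")

# The function is used both in the SELECT clause and in the WHERE clause
avg_age = conn.execute("""
    SELECT AVG(age_f(birthdate)) FROM student
    WHERE level > 4
      AND age_f(birthdate) > 22
""").fetchone()[0]
print(avg_age)
```

As in the SQL3 case, the function is evaluated for each candidate row before the aggregate is computed.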
7.1.4 Accessing large objects
SQL3 has added data-types and storage support for unstructured binary and character large
objects, BLOB and CLOB respectively, that can be used to store multimedia documents.
However, no new query functionality has been added to access the content of these LOB
data, though most SQL3 implementations have extended the LIKE operator so that it can
also search through CLOB data. Thus, access to BLOB/CLOB data must be based on
search conditions in the metadata of formatted columns or on use of the LIKE operator.
Some or-dbms implementations have extended other character string operators to operate
on CLOB data, such as
•LOCATE, which returns the position of the first character or bit string within a
LOB that matches the search string and
•concatenation, substring, and length calculation.
Note that LIKE, concatenation, substring and length are original SQL operators that have
been extended to function with LOBs, while LOCATE is a new SQL3 operator. An
example of using the LIKE operator, based on the MDB defined in Figure 5.3a is
SELECT Description FROM Course
WHERE Description LIKE '%data management%'
OR Description LIKE '%information management%';
Note that the LIKE operator does not make use of any index; rather, it searches serially
through the CLOB for the pattern given in the query specification. This is a very time-consuming
operation!
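The serial nature of such a search can be observed in a small experiment. The sketch below uses SQLite through Python as a stand-in for an or-dbms: a TEXT column plays the role of the CLOB, SQLite's instr function is used as an analogue of LOCATE, and EXPLAIN QUERY PLAN reports how the LIKE query is evaluated. The course table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    INSERT INTO course VALUES
        (1, 'DB1', 'An introduction to data management and SQL.'),
        (2, 'IR1', 'Principles of information management in libraries.'),
        (3, 'ALG', 'Algorithms and data structures.');
""")

QUERY = """
    SELECT name FROM course
    WHERE description LIKE '%data management%'
       OR description LIKE '%information management%'
    ORDER BY id
"""

matches = [row[0] for row in conn.execute(QUERY)]

# The fourth column of each EXPLAIN QUERY PLAN row is a textual description
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

# instr returns the 1-based position of the first match, or 0 if there is none
position = conn.execute(
    "SELECT instr(description, 'data') FROM course WHERE id = 1"
).fetchone()[0]

print(matches)
print(plan)
print(position)
```

The plan reports a scan of the whole table, since a pattern with a leading '%' gives the engine no prefix to look up.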
7.1.5 Result presentation
While there are no new presentation operators in SQL3, both complex and derived
attributes can be used as presentation criteria in the standard clauses GROUP BY, HAVING
and ORDER BY. However, large objects (LOBs) cannot be used, since 2 LOBs are unlikely
to be identical and have no logical order. SQL3 expands embedded attributes, displaying
them in 1 'column' or as multiple rows.
Depending on the or-dbms implementation, the result set is presented either in full, as the first
'n' rows, or one tuple at a time. If an attribute of a relation in the result set is defined as a
large object, LOB, its presentation may fill one or more screens/pages for each tuple.
SQL3, as a relational language using exact match selection criteria, has no concept of
degrees of relevance and thus no support for ranking the tuples in the result set by
semantic nearness to the query. Providing this functionality will require user defined
output functions, or specialized document processing subsystems as provided by some
OR-DBMS vendors.
These topics will be presented in some detail in the following chapters.
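As a rough illustration of what a user defined ranking function might look like, the sketch below registers a simple scoring function in SQLite from Python and uses it in ORDER BY. The function term_hits, its scoring rule (a plain count of query-term occurrences), the course table and the sample rows are all hypothetical; real document subsystems use considerably more refined relevance measures.

```python
import sqlite3


def term_hits(text, terms):
    """Hypothetical relevance score: total occurrences of the query terms."""
    lowered = text.lower()
    return sum(lowered.count(term) for term in terms.lower().split())


conn = sqlite3.connect(":memory:")
conn.create_function("term_hits", 2, term_hits)
conn.executescript("""
    CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    INSERT INTO course VALUES
        (1, 'DB1', 'Data management basics.'),
        (2, 'DB2', 'Advanced data management: managing data in distributed data stores.'),
        (3, 'ALG', 'Algorithms and complexity.');
""")

# Rows are ranked by score instead of being returned in arbitrary order
ranked = conn.execute("""
    SELECT name, term_hits(description, 'data management') AS score
    FROM course
    WHERE term_hits(description, 'data management') > 0
    ORDER BY score DESC, name
""").fetchall()
print(ranked)
```

The selection itself is still an exact match test (score > 0); only the ordering of the result reflects a degree of relevance.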
7.2 An example SQL3 query
The SQL3 query given in Figure 7.1 refers to the University DB described in Figure
5.1 and defined in SQL3 in Figure 5.3.
The Figure 7.1 query uses the DB2/SQL3 syntax to express a query for the following
information request.
"Select the names and ages of students who are over 25 and have taken an advanced
Data Management course within the last 3 years."
The clauses of the query are described below. Note that the terms and phrases in the
English query have been translated to the DB terminology defined in Figure 5.3.
1.Specifies the data to be retrieved by the query.
In this case,
•Student.Name and Age are inherited from the related Person record, extracted
by the DBMS/query processor through a join on the primary key fields.
Note: no join from Person to Student is specified in the query.
•Age will be calculated by the Age_f function prior to presentation.
•Course.Name and Course.Level form the criteria for the sorted presentation
specified in line 8.
•Course.Description will cause an output problem since each output row will
have a CLOB attribute, which is large. DB2 presents the result row by row
using the atomic attributes as a 'header' for the CLOB attribute.
2.Specifies the tables used in the query and gives each a short synonym for use in
the query.
Note that it is not necessary (or correct) to specify the Person table. The or-dbms
'knows' how to locate the attributes to be inherited by sub-entity types.
3.Specifies the join criteria for the tables. The result is a single table in which each
row has S.Id = T.Sid and T.Cid = C.Id.
Note that if the CLOB representing Course.Description is stored within the Course
record, this query join will
•first move a large amount of data from disk to memory prior to the join, and
•the join result will be very large, containing a CLOB within each row of the
table.
An alternative, used by DB2, is to store only a link or pointer to the media object,
called a locator, in the CLOB field of the Course table. This reduces the size of the
table containing a CLOB and reduces the time required to manipulate it. The media
object is only fetched for result presentation to the user or when it is assigned to a
program variable.
4.Lines 4 through 7 specify the selection criteria that must be matched in order for a row to
become part of the output set. In line 4, Course.Level > 1 is used to indicate an advanced
course.
The comparison value depends on knowledge of the course codes in the application
domain.
5.Course.Description has been defined as a CLOB data type for which the
character-string comparison operator LIKE can be used. If a word or phrase is the
search criterion, then it will not match the whole attribute value and must be enclosed
in '%' to indicate that any preceding and following characters (text) are acceptable.
In this example, texts containing both data and management will satisfy the query
no matter where these words appear. Note: if the phrase "data management" or
"database management" were explicitly required, then line 5 of the query should be
rewritten as:
AND (C.Description LIKE '%data management%' OR C.Description LIKE
'%database management%').
6.Uses the Person.Age function to restrict the set of students. Note that, through
inheritance, both attributes and functions defined in Person can be used in Student.
7.Specifies a date calculation to restrict the set of students. Note that the relationship
has been implemented as a table and becomes searchable as such.
8.Finally, those rows from the join specified in line 3 that satisfy the criteria
specified in lines 4-7 are sorted (ordered by) Course.Level and
then Course.Name before output to the user.
Retrieval efficiency for Q1 would be enhanced if there were indexes
on TakenBy.Sid and TakenBy.Cid. Indexes are specified at system generation by either the:
•DB designer, using the Create Index statement, or
•DMS for integrity control of the uniqueness requirement for primary keys.
Unfortunately, SQL3 does not support index generation for CLOB data types (Chamberlin
1998).
To create a document index, an OR-DBMS extender must be used. Alternatively, the
application could use the UDT/UDF functions of SQL3 to implement an indexing routine
and an extension to the query processor to use the index in data retrieval.
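Both sources of indexes can be sketched in SQLite through Python. The Course and TakenBy tables follow the names used in Q1, while the column types, the index names TakenBy_Sid and TakenBy_Cid and the probe query are assumptions for illustration. The designer creates two indexes explicitly, and the system creates one on its own to enforce the primary key of Course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A non-integer primary key makes SQLite build its own unique index
    CREATE TABLE Course  (Id TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE TakenBy (Sid INTEGER, Cid TEXT, Date TEXT);
    -- Indexes specified by the DB designer
    CREATE INDEX TakenBy_Sid ON TakenBy (Sid);
    CREATE INDEX TakenBy_Cid ON TakenBy (Cid);
""")

# Column 1 of each PRAGMA index_list row holds the index name
course_indexes = [row[1] for row in conn.execute("PRAGMA index_list('Course')")]
takenby_indexes = sorted(
    row[1] for row in conn.execute("PRAGMA index_list('TakenBy')"))

# Ask the planner how it would look up one student's course registrations
plan = [row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT Cid FROM TakenBy WHERE Sid = 42")]

print(course_indexes)
print(takenby_indexes)
print(plan)
```

The plan shows the lookup on Sid being answered through the designer's index rather than by a scan of TakenBy.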
Q1: "Select the names and ages of students who are over 25 and have taken an advanced Data
Management course in the last 3 years."
1. SELECT S.Name, Age, C.Name, C.Level, C.Description
2. FROM Student S, Course C, TakenBy T
3. WHERE S.Id = T.Sid and T.Cid = C.Id
4. AND C.Level > 1
5. AND C.Description LIKE '%data%'
AND C.Description LIKE '%management%'
6. AND S.Age > 25
7. AND (CURRENT DATE - T.Date) < 3 YEARS
8. ORDER BY C.Level, C.Name
Figure 7.1: DB2/SQL3 query utilizing inheritance
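For readers without access to an or-dbms, Q1 can be approximated in a plain relational engine. The sketch below uses SQLite through Python and is an emulation under stated assumptions, not the DB2 form: the Person.BirthDate column, the sample rows and the fixed TODAY value are hypothetical, the common table expression StudentV stands in for inheritance of Name and the derived Age, and the DB2 expression (CURRENT DATE - T.Date) < 3 YEARS is replaced by a comparison using SQLite's date function.

```python
import sqlite3

TODAY = "2024-06-01"  # assumed current date, fixed so the result is reproducible

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person  (Id INTEGER PRIMARY KEY, Name TEXT, BirthDate TEXT);
    CREATE TABLE Student (Id INTEGER PRIMARY KEY REFERENCES Person(Id));
    CREATE TABLE Course  (Id INTEGER PRIMARY KEY, Name TEXT, Level INTEGER,
                          Description TEXT);
    CREATE TABLE TakenBy (Sid INTEGER, Cid INTEGER, Date TEXT);
    INSERT INTO Person VALUES
        (1, 'Kari', '1990-02-10'), (2, 'Ola', '2002-08-30'), (3, 'Liv', '1985-11-05');
    INSERT INTO Student VALUES (1), (2), (3);
    INSERT INTO Course VALUES
        (100, 'Advanced DB', 2, 'Management of multimedia data in an or-dbms.'),
        (101, 'Intro DB',    1, 'Basic data management.'),
        (102, 'Compilers',   2, 'Parsing and code generation.');
    INSERT INTO TakenBy VALUES
        (1, 100, '2023-01-15'),   -- satisfies every criterion
        (2, 100, '2023-01-15'),   -- student is not over 25
        (3, 100, '2019-01-15'),   -- taken more than 3 years ago
        (1, 101, '2023-09-01'),   -- not an advanced course
        (3, 102, '2023-09-01');   -- description does not match
""")

Q1 = """
WITH StudentV AS (
    -- Explicit version of what inheritance provides: Name and derived Age
    SELECT S.Id, P.Name,
           CAST(strftime('%Y', :today) AS INTEGER)
             - CAST(strftime('%Y', P.BirthDate) AS INTEGER)
             - (strftime('%m-%d', :today) < strftime('%m-%d', P.BirthDate)) AS Age
    FROM Student S JOIN Person P ON P.Id = S.Id
)
SELECT S.Name, S.Age, C.Name, C.Level, C.Description
FROM StudentV S, Course C, TakenBy T
WHERE S.Id = T.Sid AND T.Cid = C.Id
  AND C.Level > 1
  AND C.Description LIKE '%data%'
  AND C.Description LIKE '%management%'
  AND S.Age > 25
  AND T.Date > date(:today, '-3 years')
ORDER BY C.Level, C.Name
"""

rows = conn.execute(Q1, {"today": TODAY}).fetchall()
for row in rows:
    print(row)
```

The main clause keeps the structure of lines 1 through 8 in Figure 7.1; everything inside StudentV is work that the or-dbms performs without being asked.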
7.3 Query optimization
The goal of any query processor is to execute each query as efficiently as possible.
Efficiency here can be measured in both response time and correctness.
The traditional, relational DB approach to query optimization is to transform the query to
an execution tree, and then execute query elements according to a sequence that reduces
the search space as quickly as possible and delays execution of the most expensive (in
time) elements as long as possible. A commonly used execution heuristic is:
1.Execute all select and project operations on single tables first, in order to eliminate
unnecessary rows and columns from the result set.
2.Execute join operations to further reduce the result set.
3.Execute operations on media data, since these can be very time consuming.
4.Prepare the result set for presentation.
Using the example from the query in Figure 7.1, a near optimal execution plan would be to
execute the statements in the following order:
1.Clauses 4, 6 and 7 in any order. Each of these clauses reduces the number of
rows in its respective table.
2.Clause 3. The join will further reduce the number of course tuples that satisfy the
age and time constraints. This will be a reasonably quick operation if:
•There are indexes on TakenBy.Sid and TakenBy.Cid so that an index join
can be performed, and
•The Course.Description CLOB has been stored outside of the Course table and
is represented by a link to its location.
3.Clause 5 will now search only course descriptions that meet all other selection
criteria. This will still be a time consuming serial search.
4.Finally, clause 8 will order the result set for presentation through the layout
specified in clause 1.
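The ordering above can be traced in a few lines of ordinary Python, with lists of dictionaries standing in for the tables. This is only a sketch of the heuristic, not of how any particular query processor is built: the function run_q1, the pre-computed Age values and the sample rows are hypothetical, and the three-year cutoff is computed naively (it assumes that today is not 29 February).

```python
from datetime import date


def run_q1(students, courses, taken_by, today):
    """Evaluate Q1 in the order: selections, join, text search, sort."""
    # Step 1: single-table selections (clauses 4, 6 and 7)
    advanced = {c["Id"]: c for c in courses if c["Level"] > 1}
    over_25 = {s["Id"]: s for s in students if s["Age"] > 25}
    cutoff = today.replace(year=today.year - 3)
    recent = [t for t in taken_by if t["Date"] > cutoff]

    # Step 2: join (clause 3); the dictionaries act as indexes on Id
    joined = [(over_25[t["Sid"]], advanced[t["Cid"]])
              for t in recent
              if t["Sid"] in over_25 and t["Cid"] in advanced]

    # Step 3: the expensive serial text search runs last (clause 5)
    def matches(description):
        text = description.lower()
        return "data" in text and "management" in text

    searched = len(joined)  # number of descriptions actually scanned
    hits = [(s, c) for s, c in joined if matches(c["Description"])]

    # Step 4: order and lay out the result (clauses 8 and 1)
    hits.sort(key=lambda pair: (pair[1]["Level"], pair[1]["Name"]))
    rows = [(s["Name"], s["Age"], c["Name"], c["Level"]) for s, c in hits]
    return rows, searched


students = [{"Id": 1, "Name": "Kari", "Age": 34},
            {"Id": 2, "Name": "Ola", "Age": 21},
            {"Id": 3, "Name": "Liv", "Age": 38}]
courses = [
    {"Id": 100, "Name": "Advanced DB", "Level": 2,
     "Description": "Management of multimedia data."},
    {"Id": 101, "Name": "Intro DB", "Level": 1,
     "Description": "Basic data management."},
    {"Id": 102, "Name": "Compilers", "Level": 2,
     "Description": "Parsing and code generation."}]
taken_by = [{"Sid": 1, "Cid": 100, "Date": date(2023, 1, 15)},
            {"Sid": 2, "Cid": 100, "Date": date(2023, 1, 15)},
            {"Sid": 3, "Cid": 100, "Date": date(2019, 1, 15)},
            {"Sid": 1, "Cid": 101, "Date": date(2023, 9, 1)},
            {"Sid": 3, "Cid": 102, "Date": date(2023, 9, 1)}]

rows, searched = run_q1(students, courses, taken_by, date(2024, 6, 1))
print(rows, searched)
```

With this sample data only two of the five registrations survive the selections and the join, so only two descriptions are searched instead of one per registration.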