DBMS Unit-4
Table of Contents
INTRODUCTION TO SQL
CHARACTERISTICS OF SQL
ADVANTAGES OF SQL
SQL DATA TYPES AND LITERALS
TYPES OF SQL COMMANDS
SQL OPERATORS
SUBQUERIES
AGGREGATE FUNCTIONS
CURSORS
JOINS
UNIONS
TABLES, VIEWS AND INDEXES
VIEWS
INDEXES
UNION, INTERSECTION and DIFFERENCE
EMBEDDED SQL
INTRODUCTION TO SQL:
Relational database management systems and services form one of the main branches of the IT industry,
where technological developments are taking place at a very rapid pace. We see hundreds of new
database and related products being released every month. Major software vendors are churning out
database products with more and more features with each release of their database offerings. The
competition in the database market segment is so intense that survival itself is extremely difficult. Other
than the database vendors, there are thousands of companies that develop applications (for example,
ERP, e-commerce and banking) which are implemented on one of the relational database management
systems. In addition, there are thousands of people who implement, maintain and use these systems.
Databases are becoming more prominent by the day. Today, almost all applications, from spreadsheets
and word processors to statistical analysis tools, have the capability to integrate seamlessly with
databases, get the required data from them and use it for tasks ranging from statistical
analysis to mail-merging.
Structured Query Language (SQL) is the standard command set used to communicate with
relational database management systems. All tasks related to relational data management (creating
tables, querying the database for information, modifying the data in the database, deleting it,
granting access to users, and so on) can be done using SQL. Different database vendors use different
dialects of SQL, but the basic features of all these flavors are the same: they have the same
base, the ANSI SQL standard. Implementation of the advanced features of SQL differs from vendor to
vendor, but here also the concepts are the same. SQL skills are therefore very much transferable. In other
words, except for the vendor-specific enhancements, SQL is the same.
CHARACTERISTICS OF SQL
SQL usage by its very nature is extremely flexible. It uses a free-form syntax that gives users the ability
to structure SQL statements in the way best suited to them. Each SQL request is parsed by the RDBMS
before execution, to check for proper syntax and to optimize the request. Unlike certain programming
languages, SQL does not require a statement to start in a particular column or to be finished on a single line.
The same SQL request can be written in a variety of ways.
The origins of SQL lie in the felt need for such a flexible query language. The fact that SQL was
developed after the need for it was specified is evident from the relatively few commands it has.
Throughout its life cycle, SQL has received natural extensions to its functional capabilities, and what was
originally intended as a query language has now become a complete database language.
ADVANTAGES OF SQL
The advantages of SQL are:
□ SQL is a high-level language that provides a greater degree of abstraction than procedural
languages. It is so fashioned that the programmer can specify what data is needed but need not
specify how to retrieve it. SQL is coded without embedded data-navigational instructions;
navigation is taken care of by the DBMS.
□ SQL enables end-users and systems personnel to deal with any of the database
management systems where it is available. The increased acceptance and availability of SQL are also
in its favor.
□ Applications written in SQL can be easily ported across systems. Such porting could be required
when the underlying Database Management System needs to be upgraded or changed.
□ SQL as a language is independent of the way it is implemented internally. A query returns the
same result regardless of whether optimizing has been done with indexes or not. This is because
SQL specifies what is required and not how it should be done.
□ The language, while being simple and easy to learn, can handle complex situations.
□ The results to be expected are well defined. The language has a sound theoretical base and there
is no ambiguity about the way a query will interpret the data and produce the result.
□ SQL is not merely a query language. The same language can be used to define data structures,
control access to the data, and delete, insert and modify occurrences of the data.
□ All SQL operations are performed at a set level. One select statement can retrieve multiple rows,
and one modify statement can modify multiple rows. This set-at-a-time feature of SQL makes it
more powerful than the record-at-a-time processing techniques employed in languages
like COBOL. For example, one would have to code a full-fledged COBOL program of more than 60
to 70 lines to achieve the result of the single SQL statement UPDATE
emptable SET bonus=1000 WHERE emp_no= 86096.
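The set-at-a-time behaviour can be sketched with a variation of the statement above. This is only an illustration: the column dept_no is an assumption and does not appear in these notes.

```sql
-- Assumed table layout: emptable(emp_no, dept_no, bonus); dept_no is hypothetical.
-- One statement updates every row that satisfies the WHERE clause,
-- with no explicit loop over the individual records.
UPDATE emptable
SET    bonus = 1000
WHERE  dept_no = 10;
```

If department 10 has fifty employees, all fifty rows are changed by this one statement.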
SQL DATA TYPES AND LITERALS
Note: In the following discussions we will be using the term implementation-defined very frequently.
Implementation-defined means that an implementation is free to decide how it will implement the SQL
feature in question, but the result of the decision must be well documented.
CHARACTER(n)
This data type represents a fixed length string of exactly 'n' characters, where 'n' is an integer
greater than zero. CHARACTER is an abbreviation for CHARACTER(1) and CHAR is an abbreviation for
CHARACTER.
CHARACTER VARYING(n)
This data type represents a varying length string whose maximum length is 'n' characters. Here also 'n' is
a positive integer. VARCHAR is an abbreviation for CHARACTER VARYING or CHAR VARYING.
NUMERIC(p,q)
This data type represents a decimal number of 'p' digits and a sign, with an assumed decimal point 'q' digits
from the right. Both 'p' and 'q' are integers; 'p' must be greater than zero, and 'q' may be zero but
must be less than or equal to 'p'. NUMERIC(p) is an abbreviation for NUMERIC(p,0). NUMERIC is an
abbreviation for NUMERIC(p), where 'p' is implementation-defined.
DECIMAL(p,q)
DECIMAL(p,q) represents a decimal number of 'm' digits and a sign, with an assumed decimal point 'q' digits
from the right, where 0 <= q <= p <= m and p > 0. 'p', 'q' and 'm' should be integers. DECIMAL(p) is an
abbreviation for DECIMAL(p,0), and DEC is an abbreviation for DECIMAL. DECIMAL is an abbreviation for
DECIMAL(p), where 'p' is implementation-defined.
INTEGER
INTEGER represents a signed integer—decimal or binary. INT is an abbreviation for INTEGER. Whether
the INTEGER is decimal or binary is implementation-defined.
SMALLINT
This data type represents a signed integer, decimal or binary. Whether SMALLINT is decimal or
binary is implementation-defined, but it must be the same as for INTEGER. The actual precision
of INT and SMALLINT is also implementation-defined, but the precision of SMALLINT should not exceed that
of INT.
FLOAT(p)
FLOAT(p) represents a floating point number. FLOAT is an abbreviation for FLOAT(p), where 'p' is
implementation-defined. REAL is an alternative term for FLOAT(s), where 's' is implementation-defined.
DOUBLE PRECISION is an alternative spelling for FLOAT(d), where 'd' is implementation-defined. The
actual precisions 's' and 'd' for REAL and DOUBLE PRECISION are implementation-defined, but 'd' must be
greater than 's'.
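As a rough sketch, the data types described above might be combined in a table definition as follows. The table and column names are hypothetical, and the precision actually provided for each column is implementation-defined.

```sql
-- Hypothetical table; names and sizes are illustrative only.
CREATE TABLE EMPLOYEE_DEMO (
    EMP_CODE  CHARACTER(6),           -- fixed length: exactly 6 characters
    EMP_NAME  CHARACTER VARYING(40),  -- varying length: at most 40 characters
    SALARY    NUMERIC(9,2),           -- 9 digits, 2 of them after the decimal point
    BONUS     DECIMAL(7,2),           -- at least 7 digits, 2 after the decimal point
    DEPT_NO   SMALLINT,               -- precision does not exceed that of INTEGER
    PROJECTS  INTEGER,
    RATING    REAL,                   -- approximate numeric
    SCORE     DOUBLE PRECISION        -- approximate numeric, more precise than REAL
);
```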
LITERALS
There are four kinds of scalar literal values supported in SQL. They are:
□ Character strings
□ Bit strings
□ Exact numeric
□ Approximate numeric
Character String:
Character strings are written as a sequence of characters enclosed in single quotes. The single quote
character is represented within a character string by two single quotes. Some examples of character
strings are:
□ 'Mathews Leon'
□ 'Structured Query Language'
□ 'SQL-92'
□ 'Don''t Do; Delegate'
Bit String:
A bit string is written either as a sequence of 0s and 1s enclosed in single quotes and preceded by the
letter 'B', or as a sequence of hexadecimal digits enclosed in single quotes and preceded by the letter 'X'.
Some examples are given below:
B'1011011'
B'1'
B'0'
X'1'
X'A5'
Exact Numeric:
These literals are written as a signed or unsigned decimal number possibly with a decimal point. Some
examples of exact numeric literals are given below.
9
90
90.00
0.9
0.09
+99.99
-99.99
Approximate Numeric:
Approximate numeric literals are written as exact numeric literals followed by the letter E followed by a
signed or unsigned integer. Some examples are:
9E9
99.99E9
+999E-9
0.99E9
-9.99E-9
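To show the literals in context, here is a minimal sketch that stores one character string, one exact numeric and one approximate numeric literal. The table LITERAL_DEMO and its columns are assumptions made only for this illustration. Bit string literals are left out because many products do not support them.

```sql
-- Hypothetical table used only to show literals inside a statement.
CREATE TABLE LITERAL_DEMO (
    LABEL   VARCHAR(40),
    AMOUNT  NUMERIC(6,2),
    RATIO   FLOAT
);

-- 'Don''t Do; Delegate' is stored as Don't Do; Delegate:
-- the doubled single quote stands for one quote character.
INSERT INTO LITERAL_DEMO (LABEL, AMOUNT, RATIO)
VALUES ('Don''t Do; Delegate', -99.99, -9.99E-9);
```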
digits, with the actual precision being implementation-defined. The default and maximum values of 'p'
for NUMERIC and DECIMAL data types are implementation-defined. The data types FLOAT, REAL and
DOUBLE PRECISION are known as approximate numeric data types. The maximum and default values of
the precision for FLOAT are implementation-defined. Exact and approximate numeric data types are
collectively known as numeric data types.
For example, using SQL statements, you can create tables, modify them, delete them, query the data
in the tables, insert data into the tables, modify and delete the data, decide who gets to see the data, and
so on. An SQL statement is a simple set of instructions to the RDBMS to perform an action. The
statement contains reserved words and has a specific syntax. A partially constructed SQL statement
cannot be executed. SQL statements are divided into the following categories:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Query Language (DQL)
Data Control Language (DCL)
Data Administration Statements (DAS)
Transaction Control Statements (TCS)
(Note: Only the categories highlighted in bold typeface are in your BCA syllabus.)
SQL is a very powerful language that benefits all types of users of the RDBMS. You use SQL to perform all
tasks with the RDBMS. SQL is a very flexible language that supports your development
efforts. SQL enables you to work with large groups of data rather than restricting you to single rows of
data. Additionally, SQL permits the results of one query to be the input to another query statement.
SQL relieves the user of the burden of deciding on the correct access method. When using SQL
you specify what information you want and not how it should be retrieved. The task of deciding
how to get the information is left to the RDBMS. The RDBMS uses its own internal optimizer to
determine the fastest and best means of accessing the data. This simplifies your application and reduces
development overhead.
By defining SQL as the sole database access language, relational database management systems
reduce the threat to data security and of data compromise. You cannot bypass SQL to access the
RDBMS.
Data manipulation language commands let users insert, modify and delete the data in the database. SQL
provides three data manipulation statements—INSERT, UPDATE and DELETE.
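A minimal sketch of the three statements, reusing the emptable table from the earlier UPDATE example. It assumes that emp_no and bonus are the only columns that must be supplied, and the employee number 86097 is made up.

```sql
-- Add a new row.
INSERT INTO emptable (emp_no, bonus)
VALUES (86097, 500);

-- Change a value in the existing row.
UPDATE emptable
SET    bonus = 750
WHERE  emp_no = 86097;

-- Remove the row again.
DELETE FROM emptable
WHERE  emp_no = 86097;
```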
Data Query Language (DQL) (Not in your syllabus but need to know)
This is one of the most commonly used SQL statements. This SQL statement enables the users to query
one or more tables to get the information they want. SQL has only one data query statement—SELECT.
Data Administration Statements (DAS) (Not in your syllabus but need to know)
Data administration commands allow the user to perform audits and analysis on operations within the
database. They are also used to analyze the performance of the system. Two data administration
commands are START AUDIT and STOP AUDIT. Remember that data
administration is different from database administration: database administration is the overall
administration of the database, and data administration is only a subset of that.
Transaction Control Statements (TCS) (Not in your syllabus but need to know)
Transaction control statements manage the changes made by DML
statements. For example, COMMIT makes the changes of a transaction permanent. Some of the transaction
control statements are COMMIT, ROLLBACK, SAVEPOINT and SET TRANSACTION.
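The following sketch shows how these statements might work together. How a transaction starts is product-specific (many products start one implicitly with the first DML statement), the savepoint name is arbitrary, and both employee rows are assumed to exist in emptable.

```sql
UPDATE emptable SET bonus = 1000 WHERE emp_no = 86096;

SAVEPOINT after_first_update;

UPDATE emptable SET bonus = 2000 WHERE emp_no = 86097;

-- Undo only the work done after the savepoint (the second UPDATE).
ROLLBACK TO SAVEPOINT after_first_update;

-- Make the remaining change (the first UPDATE) permanent.
COMMIT;
```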
SQL OPERATORS
Operators and conditions are used to perform operations such as addition, subtraction or comparison
on the data items in an SQL statement. Operators are represented by single characters or reserved
words. A condition is an expression of several operators or expressions that evaluates to True, False or
Unknown. There are two types of operators—binary and unary. The unary operator operates on only
one operand. For example to indicate that 10 is a negative number we use the unary operator '-' and
write -10. The binary operator operates on two operands. Examples are multiplication, addition, etc.
Operators and conditions are necessary features of any computer language. They enable you to perform
arithmetic, data comparisons and a variety of other data manipulations that are necessary to support
your application requirements. These are tools that can assist you in selecting the information that you
need. Operators are used to manipulate individual data items and return a result.
ARITHMETIC OPERATORS
Arithmetic operators are used in SQL expressions to add, subtract, multiply, divide and negate data
values. The result of this expression is a numeric value.
The following is an example of a unary operator (the unary minus negates a value):
SELECT AUTHOR
FROM CATALOG
WHERE -PRICE = -40;
Never use a double minus sign (--) in an arithmetic expression to indicate a double negative or the
subtraction of a negative number. The -- is reserved in SQL to indicate the beginning of comments.
Hence anything after the -- will be treated as a comment.
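One way to subtract a negative number safely is to separate the two minus signs with parentheses, as in this small sketch. The column alias ADJUSTED_PRICE is an illustrative name.

```sql
-- Intended: PRICE minus (-10), which is PRICE + 10.
-- Writing PRICE --10 would turn everything after -- into a comment,
-- so the negative operand is wrapped in parentheses instead.
SELECT TITLE, PRICE - (-10) AS ADJUSTED_PRICE
FROM CATALOG;
```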
COMPARISON OPERATORS
These are used to compare one expression with another. The result of a comparison is True, False or
Unknown. The comparison operators are given in Table 14.2.
Numeric data types are mutually comparable. If different data types are to be compared, as we have
mentioned before, one value is converted to the same data type as the other and then the comparison is
done. The chosen data type is the 'higher' of the two using the following ordering: SMALLINT, INTEGER,
DECIMAL, NUMERIC, REAL, FLOAT, DOUBLE PRECISION. CHAR and VARCHAR are compatible and
comparable if they are taken from the same character set. For example, ASCII characters cannot be
compared to graphics characters, English cannot be compared to Chinese, and so on. In most cases this
will not be a problem, as there will be only one character set. The comparison takes the shorter of the
two strings and pads it with spaces. The strings are then compared position by position from left to right,
using the collating sequence of the character set (usually ASCII or EBCDIC). Datetime values must
be of the same data type to be comparable. For example, you cannot compare a DATE data type with a
TIME data type. The following are some examples where the comparison operators are used:
Get the titles of all the books whose price is greater than 100.
SELECT TITLE FROM CATALOG
WHERE PRICE > 100;
Get the titles of all the books whose year of publication is 1998.
SELECT TITLE FROM CATALOG
WHERE YEAR = 1998;
Get the titles of all the books that are not in the FICTION category.
SELECT TITLE FROM CATALOG
WHERE CATEGORY <> 'FICTION';
Get the titles of all the books whose price is less than or equal to 500.
SELECT TITLE FROM CATALOG
WHERE PRICE <= 500;
Get the titles of all the books whose price is greater than or equal to 800.
SELECT TITLE FROM CATALOG
WHERE PRICE >= 800;
Row Comparison
SQL2 provides the ability to compare a row as a whole. This feature is yet to be implemented in
most products but is very useful in the case of composite columns. For example, consider the following
row constructors A and B:
A = (10, 20, 30, 40)
B = (10, 20, 40, 40)
Now if you do the comparison A = B, it will become
(10, 20, 30, 40) = (10, 20, 40, 40) which is
(10 = 10) AND (20 = 20) AND (30 = 40) AND (40 = 40) which is
(TRUE AND TRUE AND FALSE AND TRUE) which is FALSE.
The other comparison operators (<>, >, <, >= and <=) can also be applied to rows. For the ordering
operators the rows are compared column by column from left to right, and the first unequal pair decides the result.
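In a query, a row comparison might look like the sketch below. It assumes a product that supports SQL2 row constructors, and it uses the CATALOG columns AUTHOR and YEAR that appear in other queries in these notes. The author name and year are sample values.

```sql
-- Row comparison: both columns are compared in one predicate.
SELECT TITLE
FROM CATALOG
WHERE (AUTHOR, YEAR) = ('Cussler', 1997);

-- Column-by-column form with the same meaning, for products
-- that do not support row constructors.
SELECT TITLE
FROM CATALOG
WHERE AUTHOR = 'Cussler' AND YEAR = 1997;
```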
IS [NOT] NULL
The IS NULL predicate is a test for a null value. It is the only way to test whether a column or
expression is NULL or not. SQL2 has extended the NULL checking so that now you can check whether a
row is NULL or not. A row is said to be NULL if all the values in that row are NULL. Consider the following
examples:
(1, 2, 5) IS NULL - False
(1, NULL, 5) IS NULL - False
(1, NULL, 5) IS NOT NULL - False
(NULL, NULL, NULL) IS NULL - True
(NULL, NULL, NULL) IS NOT NULL - False
NOT (1, 2, 5) IS NULL - True
NOT (1, NULL, 5) IS NULL - True
NOT (1, NULL, 5) IS NOT NULL - True
NOT (NULL, NULL, NULL) IS NULL - False
NOT (NULL, NULL, NULL) IS NOT NULL - True
where the comparison operator is any one of the following: =, <>, <, <=, > and >=. SOME is a different
word for ANY (or in other words, SOME and ANY perform the same function). In general, an ALL
condition evaluates to TRUE if and only if the corresponding comparison condition without the ALL
evaluates to TRUE for all the rows in the table represented by the table expression. Similarly, an ANY
condition evaluates to TRUE if and only if the corresponding comparison condition without the ANY
evaluates to TRUE for any of the rows in the table represented by the table expression. If the table is
empty, then the ALL condition returns TRUE while the ANY condition returns FALSE. The meaning of ALL
and ANY is summarized in Table 14.4.
Consider the following query: "Get the names of all the books whose price is greater than the
maximum of the category averages." For the above query, first you will have to find the average price
of the book in each of the categories, then the maximum of those category averages and then find the
titles whose price is greater than the maximum category average. Here is how it is implemented:
SELECT TITLE FROM CATALOG
WHERE PRICE > ALL (SELECT AVG(PRICE)
FROM CATALOG GROUP BY CATEGORY);
The formats of ALL and ANY are, as we have seen, 'x comparison-operator ALL (table-expression)' and
'x comparison-operator ANY (table-expression)', where the comparison-operator is any of the following: =,
<>, <, <=, >, >=. In the rules below, 'y' stands for a value in the result of evaluating the table expression.
□ 'x comparison-operator ALL (table-expression)' evaluates to true if the expression 'x comparison-operator y'
evaluates to true for every 'y'.
□ 'x comparison-operator ALL (table-expression)' evaluates to false if the expression 'x comparison-operator y'
evaluates to false for at least one 'y'.
□ 'x comparison-operator ALL (table-expression)' evaluates to unknown in all cases when it does not
evaluate to either true or false.
□ 'x comparison-operator ANY (table-expression)' evaluates to true if the expression 'x comparison-operator y'
evaluates to true for at least one 'y'.
□ 'x comparison-operator ANY (table-expression)' evaluates to false if the expression 'x comparison-operator y'
evaluates to false for every 'y'.
□ 'x comparison-operator ANY (table-expression)' evaluates to unknown in all cases when it does not
evaluate to either true or false.
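For contrast with the ALL query shown earlier, here is a sketch of the same comparison written with ANY. It returns the books whose price is greater than at least one category average, in effect greater than the lowest of the category averages.

```sql
SELECT TITLE FROM CATALOG
WHERE PRICE > ANY (SELECT AVG(PRICE)
                   FROM CATALOG GROUP BY CATEGORY);
```

Writing SOME in place of ANY gives the same query.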
[NOT] EXISTS
In SQL, an existentially quantified condition is represented by an expression of the form 'EXISTS
(SELECT ... FROM ...)'. Such an expression evaluates to true only if the result of evaluating the
subquery represented by the (SELECT ... FROM ...) is nonempty, or in other words, only if there exists at
least one row that satisfies the subquery. EXISTS is a test for a nonempty set: if there are any rows in
the result set it is TRUE, otherwise it is FALSE. When using it, it is better to use SELECT * instead of
specifying a column name. This lets the query optimizer decide which column to use. If some columns
have indexes, those can be used to answer the query; the optimizer can access just the index and
never has to search the table. Consider the following query: "Get the names of all the books that are in
the CATALOG table and for which an order is placed."
SELECT TITLE FROM CATALOG
WHERE EXISTS
(SELECT * FROM ORDER_DETAILS
WHERE ORDER_DETAILS.BOOKID = CATALOG.BOOKID);
In the above SQL, the subquery 'SELECT * FROM ORDER_DETAILS WHERE ORDER_DETAILS.BOOKID
= CATALOG.BOOKID' refers to the current row of CATALOG, so conceptually it is evaluated once for each
row of CATALOG. If any rows satisfy the subquery for a given book, EXISTS is true and the title of that
book is returned. The query therefore fetches the titles of all the books that figure in the order table.
Similar to EXISTS, we can use NOT EXISTS also. For example, if you want to get the names of all the books
that are not in the order table, you can use the following query:
SELECT TITLE FROM CATALOG
WHERE NOT EXISTS
(SELECT * FROM ORDER_DETAILS
WHERE ORDER_DETAILS.BOOKID = CATALOG.BOOKID);
In the above SQL, the subquery looks for order rows for each book, and the NOT EXISTS
clause keeps only those books for which no such row is found, that is, the books that are in the CATALOG
table but not in the order table. The EXISTS condition never evaluates to unknown. The expression
'EXISTS (table-expression)' returns false if the table-expression evaluates to an empty table and true in
all other cases. Even when nulls are involved, EXISTS returns a definite true or false instead of the
unknown truth-value, which can give unexpected results. So the use of EXISTS in
cases involving nulls should be avoided.
[NOT] LIKE
LIKE is a very powerful and useful clause. For example, if you want to get the details of the
books whose publisher's name starts with 'M', use LIKE as follows:
SELECT TITLE, AUTHOR, PUBLISHER FROM CATALOG
WHERE PUBLISHER LIKE 'M%';
In general the LIKE clause takes the form scalar-expression LIKE literal [ESCAPE character], where the
scalar expression represents the value of the string. In the literal, the '_' character stands for any single
character, '%' stands for any sequence of n characters (where n may be zero) and all other characters
stand for themselves. The following are some examples of the usage of LIKE:
□ NAME LIKE '%al%' - Will evaluate to true if NAME contains the string al anywhere inside it.
□ PUBLISHER LIKE 'M_n%' - Will evaluate to true if PUBLISHER starts with M and has n as the third
character, with any one character in between.
□ NAME LIKE '%c_' - Will evaluate to true if NAME is at least 2 characters long and the last but
one character is 'c'.
ESCAPE Clause
If the ESCAPE clause and a character are specified, the special interpretation given to the
literal characters '_' and '%' can be disabled. In the following example the backslash character '\' is
specified as the ESCAPE character, which means that the special interpretation given to '_' and '%' can
be disabled by preceding such characters with a backslash. So the query
SELECT TITLE, AUTHOR, PUBLISHER FROM CATALOG
WHERE PUBLISHER LIKE '%\_%' ESCAPE '\';
will return any publisher name with an underscore ( _ ) in it. NOT LIKE is also available. For example,
PUBLISHER NOT LIKE '%E%' will evaluate to true if PUBLISHER does not contain the letter 'E'.
[NOT] IN
If you want to get the rows which contain certain values, the best way to do it is to use the IN
conditional expression. For example, suppose you want to get the title, author and publisher names of all
books published in 1993, 1996 or 1998. The SQL will be as follows:
SELECT TITLE, AUTHOR, PUBLISHER, YEAR FROM CATALOG
WHERE YEAR IN (1993, 1996, 1998);
The same result can be obtained by using individual comparisons connected together using OR.
The following SQL will get the same result as above:
SELECT TITLE, AUTHOR, PUBLISHER, YEAR FROM CATALOG
WHERE YEAR = 1993 OR YEAR = 1996 OR YEAR = 1998;
Another interesting point is that we can specify NOT IN. If we specify NOT IN for the above example, then
all the rows for which the publication year is none of 1993, 1996 and 1998 will be retrieved.
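The NOT IN form of the example, together with its expansion into individual comparisons, can be sketched as follows. Note that the expanded form joins the comparisons with AND, not OR.

```sql
SELECT TITLE, AUTHOR, PUBLISHER, YEAR FROM CATALOG
WHERE YEAR NOT IN (1993, 1996, 1998);

-- Same result written with individual comparisons.
SELECT TITLE, AUTHOR, PUBLISHER, YEAR FROM CATALOG
WHERE YEAR <> 1993 AND YEAR <> 1996 AND YEAR <> 1998;
```

In both forms a row whose YEAR is NULL is not returned, because the condition evaluates to unknown for that row.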
[NOT] BETWEEN
BETWEEN can be used to get those items that fall within a range. For example, consider the query: 'Get
all the details of the books whose price is in the range of 10 to 25, both inclusive'. The SQL will be:
SELECT TITLE, AUTHOR, PUBLISHER, PRICE FROM CATALOG
WHERE PRICE BETWEEN 10 AND 25;
As with IN, the same result can be obtained by using two individual comparisons connected together using
an AND. The following SQL will get the same result as above:
SELECT TITLE, AUTHOR, PUBLISHER, PRICE FROM CATALOG
WHERE PRICE >= 10 AND PRICE <= 25;
As with IN, we can also specify NOT BETWEEN. For example, the query
SELECT TITLE, AUTHOR, PUBLISHER, PRICE FROM CATALOG
WHERE PRICE NOT BETWEEN 10 AND 60;
will retrieve all the books whose price is not between 10 and 60 dollars, that is, either
above 60 dollars or below 10 dollars.
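Just as BETWEEN expands to two comparisons joined by AND, NOT BETWEEN expands to two comparisons joined by OR. The NOT BETWEEN query can therefore also be sketched as:

```sql
-- Equivalent to: WHERE PRICE NOT BETWEEN 10 AND 60
SELECT TITLE, AUTHOR, PUBLISHER, PRICE FROM CATALOG
WHERE PRICE < 10 OR PRICE > 60;
```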
LOGICAL OPERATORS
A logical operator is used to produce a single result by combining two separate conditions (or, in
the case of NOT, by inverting one condition). Table 14.5 shows the logical operators and their definitions.
We have seen that we can create compound search conditions by combining together conditional
expressions using AND, OR and NOT. AND and OR can be used to combine more than one conditional
expression. We have also seen that parentheses can be used to enforce the desired order of evaluation.
When more than two search conditions are combined with AND, OR and NOT, according to the
standard, NOT has the highest precedence followed by AND followed by OR. The truth table for
evaluating the conditional expressions is given below (Table 14.6):
The following example uses the logical operator OR to select books which were published before 1999 or
which are in the category 'Fiction' or 'Business'.
SELECT TITLE FROM CATALOG
WHERE YEAR < 1999 OR CATEGORY IN ('Fiction', 'Business');
If AND is used instead of OR, then the query will return all books which were published before 1999 and
fall into the category 'Fiction' or 'Business'.
SELECT TITLE FROM CATALOG
WHERE YEAR < 1999 AND CATEGORY IN ('Fiction', 'Business');
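The precedence rule (NOT, then AND, then OR) matters as soon as AND and OR appear in the same condition. The sketch below shows the same three conditions grouped in two different ways. The price limit of 200 is an arbitrary value chosen for illustration.

```sql
-- Without parentheses AND is evaluated before OR, so this is read as
-- YEAR < 1999 OR (CATEGORY = 'Fiction' AND PRICE > 200).
SELECT TITLE FROM CATALOG
WHERE YEAR < 1999 OR CATEGORY = 'Fiction' AND PRICE > 200;

-- Parentheses force the OR to be evaluated first.
SELECT TITLE FROM CATALOG
WHERE (YEAR < 1999 OR CATEGORY = 'Fiction') AND PRICE > 200;
```

The first query returns every book published before 1999 whatever its price. The second returns only books priced above 200.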
SET OPERATORS
Set operators combine the results of two separate queries into a single result. Not all implementations
support INTERSECT and MINUS (called EXCEPT in the SQL standard), so check whether your implementation
supports these features before using them.
UNION [ALL] is supported by all SQL-based products. Table 14.7 lists the set operators and their
definitions.
The following statement returns only those rows that exist in both sets of query results, specifically only
those books which are in the CATALOG table and also in the ORDER_DETAILS table:
SELECT TITLE FROM CATALOG INTERSECT
SELECT TITLE FROM ORDER_DETAILS;
You can have multiple set operators in a single SQL statement. The RDBMS evaluates these set
operators as having equal precedence and therefore evaluates them from left to right.
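The difference between UNION and UNION ALL can be sketched with the same two tables. As in the INTERSECT example, this assumes that ORDER_DETAILS has a TITLE column.

```sql
-- UNION removes duplicate rows from the combined result.
SELECT TITLE FROM CATALOG
UNION
SELECT TITLE FROM ORDER_DETAILS;

-- UNION ALL keeps every row, so a title that is present in both
-- tables appears more than once.
SELECT TITLE FROM CATALOG
UNION ALL
SELECT TITLE FROM ORDER_DETAILS;
```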
OPERATOR PRECEDENCE
Precedence defines the order that the DBMS uses when evaluating the different operators in the same
expression. Every operator has a pre-defined precedence.
The DBMS evaluates operators with the highest precedence first before evaluating the operators of
lower precedence. Operators of equal precedence are evaluated from left to right. The order of
precedence is given in Table 14.8.
QUERIES
To query data from tables in a database, we use the SELECT statement. The SELECT statement
has many different options that you can use to retrieve the data that you want. The result of a
SELECT statement is another table, derived from the tables used for the SELECT operation. The
fact that the result of a SELECT statement is another table is referred to as the closure property of
relational systems.
The closure property means that, since the result of a SELECT operation is another table, it is
possible to apply another SELECT operation on the result. It also means that the SELECT operations can
be nested.
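One way to see the closure property at work is to apply a SELECT to the result of another SELECT, as in the sketch below. It assumes a product that allows a subquery in the FROM clause. The name BOOKS_1998 is an arbitrary label for the intermediate result.

```sql
-- The inner SELECT produces a table of the books published in 1998;
-- the outer SELECT then queries that result like any other table.
SELECT TITLE
FROM (SELECT TITLE, PRICE
      FROM CATALOG
      WHERE YEAR = 1998) BOOKS_1998
WHERE PRICE > 100;
```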
The syntax of the SELECT statement with most of the options is given below:
SELECT [ALL | DISTINCT]
scalar-expression(s)
FROM table(s)
[WHERE conditional-expression]
[GROUP BY column(s)]
[HAVING conditional-expression]
[ORDER BY column(s)];
In the above syntax only the SELECT clause and the FROM clause are required. The other
four clauses (WHERE, GROUP BY, HAVING and ORDER BY) are optional. You include them in the SELECT
statement only when you require the functions they provide.
The SELECT clause lists the items (column names, computed values, aggregate
functions, etc.) to be retrieved. The FROM clause specifies the table or tables from which the data has to
be retrieved.
The WHERE clause tells SQL to include only certain rows of data in the result set. It is in the
WHERE clause that you specify the search criteria. For example, you might code something like 'WHERE
PRICE > 20' in your SQL statement to get the items whose price is greater than 20. You can combine
multiple conditions in the WHERE clause using the logical operators AND and OR.
The GROUP BY clause specifies a summary query. This is usually used with aggregate functions like SUM,
AVG, MAX, MIN, etc. For example, the clause GROUP BY PUBLISHER will group the result set based
on the publisher name.
The HAVING clause tells SQL to include only certain groups produced by the GROUP BY clause in
the query result set. HAVING is the equivalent of the WHERE clause for groups: it is used to specify the
search condition that a group must satisfy when a GROUP BY clause is specified.
The ORDER BY clause sorts the results based on the data in one or more columns, in
ascending or descending order. If neither is specified, the result set is sorted in ascending
order, which is the default. If you want the results to be sorted in descending order, then you will
have to specify the keyword DESC. For example, '...ORDER BY PRICE DESC...' will produce a result set
where the items are shown in descending order of price. If the ORDER BY clause is omitted altogether,
the order of the rows in the result set is not guaranteed.
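A query that uses all the clauses described above might look like the following sketch. It uses the CATALOG columns that appear in other queries in these notes. The alias AVG_PRICE and the limits 1990 and 200 are illustrative choices.

```sql
-- Average price per publisher for books published after 1990,
-- keeping only publishers whose average price exceeds 200,
-- with the highest average listed first.
SELECT PUBLISHER, AVG(PRICE) AS AVG_PRICE
FROM CATALOG
WHERE YEAR > 1990            -- filters individual rows before grouping
GROUP BY PUBLISHER           -- one result row per publisher
HAVING AVG(PRICE) > 200      -- filters whole groups after grouping
ORDER BY AVG_PRICE DESC;     -- sorts the final result
```

The WHERE condition is applied to individual rows and the HAVING condition to whole groups, which is why the aggregate function appears only in HAVING.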
We will now see the different flavors of the SELECT statement. The SELECT statement appears in
many forms, and a good understanding of which form to use is essential for any database user. We will
use a table called BOOK for the examples. The BOOK table contains the details of books such as ID, title,
author name, publisher name, year of publication and price. The table (given in Chapter 13) is repeated
below for easy reference:
Table 17.1 BOOK Table
ID   Title             Author    Publisher         Year  Price
A01  A Painted House   Grisham   Random House      2001  195.55
A02  Abduction         Cook      Pan Books         2000  360
A03  Airport           Hailey    Corgi Books       1968  175.45
B01  Biplane           Bach      Dell Books        1966  283.35
B02  Bloodline         Sheldon   Warner Books      1977  100.15
B03  Blue Gold         Cussler   Simon & Schuster  2000  285
C01  Catch 22          Heller    Random House      1994  250.1
D01  Doctors           Segal     Bantam Books      1988  150
D02  Dragon            Cussler   Harper Collins    1990  123.55
F01  Flood Tide        Cussler   Simon & Schuster  1997  414.5
H01  Hawaii            Michener  Mandarin Books    1959  124.5
H02  Hotel             Hailey    Corgi Books       1965  175.75
I01  Icon              Forsyth   Corgi Books       1996  182.35
I02  Illusions         Bach      Dell Books        1997  330
I03  Inca Gold         Cussler   Harper Collins    1994  124
I04  Invasion          Cook      Pan Books         1997  177.9
O01  One               Bach      Dell Books        1988  289
P01  Prizes            Segal     Bantam Books      1995  256.1
S01  Serpent           Cussler   Simon & Schuster  1999  532.8
S02  Sheba             Higgins   Signet Books      1995  125
T01  The Class         Segal     Bantam Books      1965  145.8
T03  The Simple Truth  Baldacci  Simon & Schuster  1997  211
T04  Thunder Point     Higgins   Signet Books      1993  95.65
T05  Timeline          Crichton  Century Books     1999  623.3
V01  Vector            Cook      Macmillan         1999  424.8
In a SELECT statement, the asterisk (*) is shorthand for a list of all the column
names in the table(s), in the left-to-right order in which the columns appear in the table(s). Thus the
SELECT statement
SELECT * FROM BOOK;
is the same as
SELECT ID, TITLE, AUTHOR, PUBLISHER, YEAR, PRICE FROM BOOK;
Qualified Retrieval
Consider the query 'Get the title, author name and publisher of all books published in 1999
whose price is greater than 300'. The SQL will be something like this:
SELECT TITLE, AUTHOR, PUBLISHER FROM BOOK
WHERE YEAR = 1999 AND PRICE > 300;
You can use all the comparison operators (=, <>, <, <=, > and >=) in the WHERE clause. One
thing to be noted is that the inequality comparison operator (not equal to) is expressed differently in
different implementations. According to the SQL standard it is written as '<>', but in DB2 and SQL/DS it
is written as '¬=', in MS-SQL Server it can also be written as '!=', and so on.
The conditional expression following the WHERE clause can consist of a simple comparison, or it
can consist of multiple comparisons and other kinds of conditional expressions all combined together
using AND, OR and NOT, with parentheses, if required, to indicate the desired order of evaluation.
Select Using IN
If you want to get the rows which contain certain values, the best way to do it is to use the IN
conditional expression.
For example, suppose you want to get the title, author and publisher names of all books published in 1999
or 2000. The SQL will be as follows:
SELECT TITLE, AUTHOR, PUBLISHER, YEAR FROM BOOK
WHERE YEAR IN (1999, 2000);
A range condition such as 'PRICE BETWEEN 400 AND 600' can, like IN, also be expressed by using two
individual comparisons connected together using an AND. The following SQL gets the books priced from
400 to 600 Rupees, both limits included:
SELECT TITLE, AUTHOR, PUBLISHER, PRICE FROM BOOK
WHERE PRICE >= 400 AND PRICE <= 600;
Like IN, here also, we can specify NOT BETWEEN. For example the query
SELECT TITLE, AUTHOR, PUBLISHER, PRICE FROM BOOK
WHERE PRICE NOT BETWEEN 400 AND 600;
will retrieve all the books whose price is not between 400 and 600 Rupees, that is, above 600 Rupees or below
400 Rupees, as shown below:
Title             Author    Publisher         Price
A Painted House   Grisham   Random House      195.55
Abduction         Cook      Pan Books         360
Airport           Hailey    Corgi Books       175.45
Biplane           Bach      Dell Books        283.35
Bloodline         Sheldon   Warner Books      100.15
Blue Gold         Cussler   Simon & Schuster  285
Catch 22          Heller    Random House      250.1
Doctors           Segal     Bantam Books      150
Dragon            Cussler   Harper Collins    123.55
Hawaii            Michener  Mandarin Books    124.5
Hotel             Hailey    Corgi Books       175.75
Icon              Forsyth   Corgi Books       182.35
Illusions         Bach      Dell Books        330
Inca Gold         Cussler   Harper Collins    124
Invasion          Cook      Pan Books         177.9
One               Bach      Dell Books        289
Prizes            Segal     Bantam Books      256.1
Sheba             Higgins   Signet Books      125
The Class         Segal     Bantam Books      145.8
The Runner        Reich     Headline Books    199.95
The Simple Truth  Baldacci  Simon & Schuster  211
Thunder Point     Higgins   Signet Books      95.65
Timeline          Crichton  Century Books     623.3
In general the LIKE clause takes the form scalar-expression LIKE literal [ESCAPE character], where
the scalar expression represents the value of the string. In the literal, the '_' character stands for any
single character, '%' stands for any sequence of n characters (where n may be zero) and all other
characters stand for themselves. The following are some examples of the usage of LIKE:
NAME LIKE "%al%" - Will evaluate to true if NAME contains the string 'al' anywhere inside it.
PUBLISHER LIKE "M_n%" - Will evaluate to true if PUBLISHER starts with 'M' and has 'n' as the
third character.
NAME LIKE "%c_" - Will evaluate to true if NAME is at least 2 characters long and the last but
one character is 'c'.
ESCAPE Clause
If the ESCAPE clause is specified with a character, the special interpretation given to the
literal characters '_' and '%' can be disabled. In the following example the backslash character '\' is
specified as the ESCAPE character, which means that the special interpretation given to '_' and '%' can
be disabled by preceding such characters with a backslash. So the query
SELECT TITLE, AUTHOR, PUBLISHER FROM BOOK
WHERE PUBLISHER LIKE "%\_%" ESCAPE "\";
will return the books of any publisher whose name has an underscore (_) in it. NOT LIKE is also available. For example,
'PUBLISHER NOT LIKE "%E%"' will evaluate to true if PUBLISHER does not contain the letter 'E'.
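The effect of the wildcards and of the ESCAPE clause can be checked with a short sketch. It assumes Python's built-in sqlite3 module as the engine and a hypothetical one-column table PUB; the publisher name 'Pan_Books' is invented purely to have a value containing an underscore. String literals are written with single quotes here, as the SQL standard requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PUB (NAME TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO PUB VALUES (?)",
    [("Macmillan",), ("Mandarin Books",), ("Pan Books",), ("Pan_Books",)],
)

def names(condition):
    """Return the publisher names that satisfy the given WHERE condition."""
    sql = "SELECT NAME FROM PUB WHERE " + condition + " ORDER BY NAME"
    return [row[0] for row in conn.execute(sql)]

# '_' matches exactly one character, '%' matches any run of characters.
third_char_n = names("NAME LIKE 'M_n%'")

# Unescaped, the underscore is a wildcard and also matches the space.
wildcard = names("NAME LIKE 'Pan_Books'")

# Escaped with the backslash, the underscore stands only for itself.
literal = names(r"NAME LIKE '%\_%' ESCAPE '\'")

print(third_char_n, wildcard, literal)
```

Note that in SQLite the LIKE operator ignores case for ASCII letters by default; other systems may compare case-sensitively.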
Note: Other forms of SELECT (selecting computed values, selecting involving NULLs, grouping while selecting,
ordering while selecting, and the AND, OR and NOT operations) can be found in the reference book and on the
internet.
SUBQUERIES
Subqueries are nested SELECT statements. In other words, they are 'SELECT...FROM...
WHERE...' expressions nested inside another such expression. Subqueries enable the user to base the
search criteria of one SELECT statement on the results of another SELECT statement. The ability to use a
query within a query (a nested query) was the original reason for the word 'structured' in the name
Structured Query Language.
A subquery is often the most natural way of expressing a query, as it parallels the English language
description of the query. For example, 'Get the names of all the departments which have employees
whose salary is greater than 20000' can be expressed with the help of a subquery as follows:
SELECT DEPT_NAME
FROM DEPARTMENT_TABLE
WHERE DEPT_ID IN (SELECT DISTINCT DEPT_ID
FROM EMPLOYEE_TABLE WHERE SALARY > 20000);
What is a Subquery?
Suppose we want to find out the details about book club members who have placed orders for books. If
we were doing it manually, we would first look into the ORDER_SUMMARY table to find out the member IDs
and then, for each ID, look up the MEMBER table to get the member details. If we use SQL statements
in this process, we will first execute the following SQL on the ORDER_SUMMARY table:
SELECT DISTINCT MEMBER_ID
FROM ORDER_SUMMARY;
The above query will get the member IDs in the ORDER_SUMMARY table: C01, C02, C03 and C05. Now we can
get the details about these members by querying the MEMBER table with the above result as
follows:
SELECT * FROM MEMBER
WHERE MEMBER_ID IN ("C01", "C02", "C03", "C05");
As mentioned above, subqueries are queries that appear within the WHERE or HAVING clause of another
SQL statement. Subqueries provide an easy and efficient way to handle queries that require the results
from another query. They allow you to combine the two queries above so that you can get the results
using a single SQL statement as follows:
SELECT * FROM MEMBER
WHERE MEMBER_ID IN (SELECT DISTINCT MEMBER_ID
FROM ORDER_SUMMARY);
Note that the keyword DISTINCT is used in the subquery to eliminate the duplicate member IDs.
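The equivalence between the two-step lookup and the single statement with a subquery can be sketched as follows. The sketch assumes Python's built-in sqlite3 module; the table layouts, member names and order rows are hypothetical sample data, not the tables used in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE MEMBER (MEMBER_ID TEXT PRIMARY KEY, NAME TEXT);
    CREATE TABLE ORDER_SUMMARY (ORDER_NO INTEGER, MEMBER_ID TEXT);
    INSERT INTO MEMBER VALUES ('C01', 'Asha'), ('C02', 'Ben'),
                              ('C03', 'Chitra'), ('C04', 'Dev');
    INSERT INTO ORDER_SUMMARY VALUES (1, 'C01'), (2, 'C03'), (3, 'C01');
    """
)

# Step 1 of the manual approach: collect the member IDs that have orders.
ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT MEMBER_ID FROM ORDER_SUMMARY ORDER BY MEMBER_ID")]

# Step 2: feed those IDs into a second query as a literal IN list.
placeholders = ",".join("?" * len(ids))
two_step = conn.execute(
    "SELECT NAME FROM MEMBER WHERE MEMBER_ID IN (" + placeholders + ") "
    "ORDER BY MEMBER_ID", ids).fetchall()

# The subquery form does both steps in one statement.
one_step = conn.execute(
    """
    SELECT NAME FROM MEMBER
    WHERE MEMBER_ID IN (SELECT DISTINCT MEMBER_ID FROM ORDER_SUMMARY)
    ORDER BY MEMBER_ID
    """
).fetchall()

print(ids, two_step, one_step)
```

Both approaches return the same members; the subquery form simply avoids carrying the intermediate result through the application.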
Subqueries, in the form discussed here, appear as part of the WHERE clause or HAVING clause. In the WHERE clause
they help to select the individual rows that appear in the query results. In the HAVING clause, they help to
select the row groups that appear in the query results. The subquery, as seen from the above examples,
is always enclosed in parentheses. It is another SELECT statement with a FROM clause and optional
WHERE, GROUP BY and HAVING clauses. Subqueries are almost identical to ordinary SELECT
statements, but there are a few differences:
A subquery must produce a single column of data as its result. In other words, the subquery can
have only a single select item in its SELECT clause. So you cannot use 'SELECT *' in a subquery
unless the table you are referring to has only one column.
The ORDER BY clause cannot be specified in a subquery. Since the results of the subquery are
used internally and are not displayed to the user, ordering does not make much sense. You can
specify the ORDER BY clause in the main query to order the results.
A subquery cannot be a UNION; only a single SELECT statement is allowed.
Even though there are many ways in which subqueries can be formed, they typically are expressed as
one SELECT statement connected to another in one of the following ways:
Using the IN or NOT IN predicate
Specifying the equality (=) or inequality (<>) predicate
Specifying a predicate using an operator (<, >, <=, >=, ANY, ALL, EXISTS, etc.)
Note: For the types of subquery (nested subqueries, parallel subqueries and correlated subqueries), you can
refer to the book and internet sources.
AGGREGATE FUNCTIONS
The aggregate functions greatly enhance the power of the SQL statement. They let you summarize the
data from the tables. An aggregate function takes an entire column of data as its argument and produces a single
data item that summarizes the column. The aggregate functions provided by SQL are:
COUNT( )
COUNT(*)
SUM()
AVG()
MAX()
MIN()
GENERAL RULES
SQL provides six aggregate functions. These are powerful tools and can improve the data retrieval power
considerably. There are some rules which must be followed while using these functions. They are:
For SUM and AVG the argument must be of numeric type.
Except for the special case COUNT(*), the argument may be preceded by the keyword DISTINCT
to eliminate duplicate values before the function is applied to a column. The alternative to
DISTINCT is ALL, which is the default. DISTINCT is legal for MAX and MIN but meaningless.
The special function COUNT(*) is used to count all rows without any duplicate elimination, and
so the keyword DISTINCT is not allowed for this function.
The argument cannot involve any aggregate function references or table expressions at any level
of nesting. For example, the SQL 'SELECT AVG(MIN(QTY)) AS AVERAGE' is illegal.
Any NULL in the column is eliminated before the function is applied, regardless of whether
DISTINCT is specified or not, except in the case of COUNT(*), where nulls are handled like normal
values.
When using MIN and MAX with string data, the comparison of the strings is dependent on
the character set that is being used. On computers using the ASCII character set, digits come before
letters in the sorting sequence and all uppercase characters come before the lowercase
characters. On machines that use the EBCDIC character set, the order is lowercase characters,
uppercase characters and then digits. Because of this difference in collating sequence, a query
using the ORDER BY clause can produce different results on the two systems, and hence there will
be differences in the results of the MIN and MAX functions.
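The rules about NULLs, DISTINCT and nested aggregates can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module and a hypothetical four-row table EMP in which one BASIC value is NULL; the names and figures are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (NAME TEXT, DEPTID TEXT, BASIC INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("A", "D1", 8000), ("B", "D1", None), ("C", "D2", 6000), ("D", "D2", 6000)],
)

count_all, count_basic, count_distinct, total, average = conn.execute(
    """
    SELECT COUNT(*),               -- counts every row, NULLs included
           COUNT(BASIC),           -- the NULL is eliminated first
           COUNT(DISTINCT BASIC),  -- duplicates are eliminated as well
           SUM(BASIC),
           AVG(BASIC)              -- average over the non-NULL values only
    FROM EMP
    """
).fetchone()
print(count_all, count_basic, count_distinct, total, average)

# Nesting one aggregate inside another is rejected by the engine.
try:
    conn.execute("SELECT AVG(MIN(BASIC)) FROM EMP")
    nested_allowed = True
except sqlite3.Error:
    nested_allowed = False
print(nested_allowed)
```

AVG divides by three rather than four here, because the row with the NULL salary is ignored.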
EMPNO  NAME           DEPTID  BASIC    HRA      DEDUCTIONS  TAX
100    Joseph Heller  D1      8000.00  1600.00  2000.00     2400.00
COUNT() AND COUNT(*)
COUNT() is used to count the number of values in a column. COUNT(*) is used to count the number of
rows of the query results. Consider the following examples:
Get the number of rows in the employee table.
SELECT COUNT(*) FROM EMPLOYEE;
Get the number of employees in the department 'D1' with basic pay less than 6000.
SELECT COUNT(NAME) FROM EMPLOYEE
WHERE DEPTID = "D1"
AND BASIC < 6000;
SUM()
SUM() is used to find the sum of the values in a column. The following examples illustrate its usage:
Find the total basic pay for all the employees in the organization.
SELECT SUM(BASIC) FROM EMPLOYEE;
The SQL will return 71800.00
Find the total basic pay for all the employees in the department 'D1'.
SELECT SUM(BASIC) FROM EMPLOYEE
WHERE DEPTID = "D1";
The SQL will return 25800.00
Find the total basic pay for all the employees in the department 'D1’ whose basic pay is greater than
6000.
SELECT SUM(BASIC) FROM EMPLOYEE
WHERE DEPTID = "D1"
AND BASIC > 6000;
The SQL will return 15500.00
Find the total pay for all the employees in the department 'D1’.
SELECT SUM(BASIC + HRA - DEDUCTIONS - TAX) FROM EMPLOYEE
WHERE DEPTID = "D1";
The SQL will return 20585.00
Find the total pay for all the employees in the department 'D1’ whose basic pay is greater than 6000.
SELECT SUM(BASIC + HRA - DEDUCTIONS - TAX) FROM EMPLOYEE
WHERE DEPTID = "D1"
AND BASIC > 6000;
The SQL will return 11450.00
AVG()
AVG() is used to find the average of the values in a column. Consider the following queries:
Find the average pay of the employees in the department 'D1' whose HRA is greater than 1000.
SELECT AVG(BASIC + HRA - DEDUCTIONS - TAX) FROM EMPLOYEE
WHERE DEPTID = 'D1' AND HRA > 1000;
The SQL will return 5408.33
Find the names of all the employees whose basic pay is greater than the average basic pay.
SELECT NAME FROM EMPLOYEE
WHERE BASIC > (SELECT AVG(BASIC) FROM EMPLOYEE);
The SQL will return the following:
Name
Joseph Heller
Erich Segal
Jeffrey Archer
Sidney Sheldon
Robert Ludlum
Jack Higgins
MAX() AND MIN()
Find the name of the employee who gets the maximum basic pay.
SELECT NAME FROM EMPLOYEE
WHERE BASIC = (SELECT MAX (BASIC) FROM EMPLOYEE);
The SQL will return the following:
Name
Joseph Heller
Get the department ID, the average, maximum and minimum basic pay of all the departments.
SELECT DEPTID, AVG(BASIC), MAX(BASIC), MIN(BASIC) FROM EMPLOYEE
GROUP BY DEPTID;
The following will be the result of the above query:
DEPTID  AVG(BASIC)  MAX(BASIC)  MIN(BASIC)
D1      6450.00     8000.00     4800.00
D2      6466.67     7000.00     6000.00
D3      5450.00     5600.00     5300.00
D4      5233.33     6800.00     4200.00
INSERT STATEMENT
This statement, as the name suggests, is used for inserting rows into a table. The general syntax of the
INSERT statement is as follows:
INSERT
INTO table-name [(column [, column] ...)]
VALUES (literal [, literal] ...);
Or
INSERT
INTO table-name [(column [, column] ...)]
subquery;
In the first format a single row is inserted into the table, having the specified values for the specified columns.
The first literal corresponds to the first column, the second literal corresponds to the second column and
so on. In the second format the subquery is evaluated and a copy of the result (usually multiple rows)
is inserted into the table. Here also the one-to-one correspondence between the result columns and the column
names holds. In both cases, omitting the list of columns is equivalent to specifying all columns of the
target table in their left-to-right order within that table. But this practice of omitting the list of columns
is not recommended.
INSERT INTO BOOK (ID, TITLE, AUTHOR, PUBLISHER, YEAR, PRICE)
VALUES ('B2T', '60 Minutes Software', 'Parkinson', 'Wiley', 1996, 450);
Or
INSERT INTO BOOK
VALUES ('B2T', '60 Minutes Software', 'Parkinson', 'Wiley', 1996, 450);
Now the BOOK table will have one more row added to it. Let us take another example: we do not
know the PRICE of the book, but still want to add it to the table. We can do it as follows:
INSERT INTO BOOK (ID, TITLE, AUTHOR, PUBLISHER, YEAR)
VALUES ('B2T', '60 Minutes Software', 'Parkinson', 'Wiley', 1996);
A new row is created in the BOOK table with the specified values in the ID, Title, Author, Publisher
and Year columns. Here there are a few points that should be noted about the unfilled columns:
If NOT NULL WITH DEFAULT is specified for a column, the default value will be set for that
column (0 for numeric columns, spaces for character columns, etc.).
If NOT NULL is specified, then the INSERT will fail and the database will remain unchanged.
If NOT NULL is not specified, the column is set to null.
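The three cases for unfilled columns can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and a hypothetical, simplified BOOK table. SQLite has no NOT NULL WITH DEFAULT clause, so the sketch approximates it with NOT NULL DEFAULT 0, which is an assumption about how that option maps onto this engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE BOOK (
        ID    TEXT PRIMARY KEY,
        TITLE TEXT NOT NULL,                 -- must always be supplied
        YEAR  INTEGER NOT NULL DEFAULT 0,    -- filled with 0 when omitted
        PRICE REAL                           -- nullable: NULL when omitted
    )
    """
)

# YEAR and PRICE are left out of the column list.
conn.execute("INSERT INTO BOOK (ID, TITLE) VALUES ('X01', 'Untitled Draft')")
row = conn.execute("SELECT ID, TITLE, YEAR, PRICE FROM BOOK").fetchone()
print(row)

# Leaving out the NOT NULL column TITLE makes the INSERT fail.
try:
    conn.execute("INSERT INTO BOOK (ID, PRICE) VALUES ('X02', 100)")
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True

row_count = conn.execute("SELECT COUNT(*) FROM BOOK").fetchone()[0]
print(insert_failed, row_count)
```

After the failed INSERT the table still holds only the first row, which matches the statement that the database remains unchanged.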
UPDATE STATEMENT
The UPDATE statement is used to modify or update an already existing row or rows of a table. The
syntax for UPDATE statement is:
UPDATE table-name
SET column = scalar-expression
[, column = scalar-expression] ...
[WHERE condition];
All rows in the table which satisfy the condition will be updated in accordance with the assignments in the
SET clause. If the WHERE clause is omitted, all rows in the table will be updated. Some examples of the
update operation are given below:
1. Change the PRICE of the book B01 to 600.
UPDATE BOOK
SET PRICE = 600
WHERE ID = ‘B01’;
2. Increase the price of all books, which are published before 1997 by 20%
UPDATE BOOK
SET PRICE = PRICE * 1.2
WHERE YEAR < 1997;
3. Change the publisher name for all 'Microsoft Press' books to 'MP' and increase the price by 10%.
UPDATE BOOK
SET PUBLISHER = 'MP',
PRICE = PRICE * 1.10
WHERE PUBLISHER = 'Microsoft Press';
4. Change the publisher name for all 'ITP' books to 'International Thompson Press' and reduce the
price by 15.
UPDATE BOOK
SET PUBLISHER = 'International Thompson Press',
PRICE = PRICE - 15
WHERE PUBLISHER = 'ITP';
5. Change the title of the book 'Abduction' to 'Abduction (Collector's Edition)', the year of publishing to
2001 and the price to 850. Note that the apostrophe inside the string literal is written as two single quotes.
UPDATE BOOK
SET TITLE = 'Abduction (Collector''s Edition)',
YEAR = 2001,
PRICE = 850
WHERE TITLE = 'Abduction';
DELETE STATEMENT
The DELETE statement is used to delete an already existing row or rows from a table. The syntax
for DELETE statement is:
DELETE
FROM table-name [WHERE conditional-expression];
All rows in the table, which satisfy the condition, will be deleted. If the WHERE clause is omitted all rows
will be deleted. Consider the following examples.
3. Delete all books whose price is greater than the average price
DELETE FROM BOOK
WHERE PRICE > (SELECT AVG(PRICE) FROM BOOK);
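How the WHERE clause limits the rows touched by UPDATE and DELETE can be sketched with a tiny example. It assumes Python's built-in sqlite3 module and a hypothetical three-row BOOK table; the cursor attribute rowcount reports how many rows each statement affected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOOK (ID TEXT, YEAR INTEGER, PRICE REAL)")
conn.executemany(
    "INSERT INTO BOOK VALUES (?, ?, ?)",
    [("A", 1990, 100.0), ("B", 1999, 200.0), ("C", 2000, 600.0)],
)

# Only book A was published before 1997, so only its price rises by 20%.
updated = conn.execute(
    "UPDATE BOOK SET PRICE = PRICE * 1.2 WHERE YEAR < 1997").rowcount

# The average price is now (120 + 200 + 600) / 3, so only book C is above it.
deleted = conn.execute(
    "DELETE FROM BOOK WHERE PRICE > (SELECT AVG(PRICE) FROM BOOK)").rowcount

remaining = conn.execute("SELECT ID, PRICE FROM BOOK ORDER BY ID").fetchall()
print(updated, deleted, remaining)
```

Running either statement without its WHERE clause would affect all three rows, which is why it is worth checking the condition with a SELECT first.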
CURSORS
INTRODUCTION
The SQL standard does not permit a multi-row retrieval operation to be executed as a statement
in its own right, except in direct SQL. This is because the standard is primarily concerned with the use of
SQL in conjunction with a host language, that is, embedded SQL.
There are two types of embedded SQL SELECT statements: singleton SELECTs and cursor
SELECTs. SQL statements operate on a set of data and return a set of data. Host language programs, on
the other hand, operate on a row at a time. A singleton SELECT is simply an SQL SELECT statement that
returns a single row. The singleton SELECT differs from the ordinary SELECT statement in that it contains
an INTO clause. The INTO clause is where you code the host variables that accept the data returned by
the RDBMS. But if such a SELECT statement returns more than one row, the values of the first row are
placed in the host variables and the RDBMS will issue an error code. So in your application program, if the
SELECT can return more than one row, then you must use cursors.
A cursor is an SQL object that is associated with a specific table expression. The RDBMS uses
cursors to navigate through a set of rows returned by an embedded SQL SELECT statement. A cursor can
be compared to a pointer. The programmer declares a cursor and defines the SQL statement for the
cursor. After that you can use the cursor like a sequential file. The cursor is opened, rows are fetched
from the cursor, one row at a time, and at the end of processing the cursor is closed.
CURSOR OPERATIONS
The four operations that must be performed for the successful working of the cursor are:
DECLARE - This statement defines the cursor, gives it a name and assigns an SQL
statement to it. The DECLARE statement does not execute the SQL statement but
merely defines it.
OPEN - This makes the cursor ready for row retrieval. OPEN is an executable statement.
It reads the SQL search fields, executes the SQL statement and sometimes builds the
result table.
FETCH - This statement returns data from the result table one row at a time to the host
variables. If the result table is not built at the OPEN time, it is built during FETCH.
CLOSE - Releases all resources used by the cursor.
When cursors are used to process multiple rows, the cursor is DECLAREd and OPENed and the
FETCH statement is coded in a loop that reads and processes each row. At the end of the processing,
that is when there are no more rows to be fetched, the FETCH statement returns an SQLCODE of +100
indicating no more rows. The cursor is then CLOSEd.
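The open, fetch-in-a-loop, close pattern described above can be sketched in a host language. The sketch below is an analogy only, not embedded SQL: it assumes Python's built-in sqlite3 module, whose cursor objects follow the same row-at-a-time idea. The EMPLOYEE rows are hypothetical sample data, and a fetch result of None plays the role that SQLCODE +100 plays in embedded SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMPNO INTEGER, NAME TEXT, SALARY REAL)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(100, "Joseph Heller", 8000.0), (101, "Erich Segal", 7500.0)],
)

cur = conn.cursor()                       # roughly DECLARE: create the cursor
cur.execute(                              # roughly OPEN: run the SELECT
    "SELECT EMPNO, NAME, SALARY FROM EMPLOYEE ORDER BY EMPNO")

seen = []
while True:
    row = cur.fetchone()                  # FETCH: one row at a time
    if row is None:                       # no more rows (like SQLCODE +100)
        break
    empno, name, salary = row             # the "host variables"
    seen.append(name)

cur.close()                               # CLOSE: release the cursor
print(seen)
```

The loop body is where a host program would process each row, for example by deciding whether to update or delete it.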
You can modify or delete a row by using the SQL statements UPDATE and DELETE. But if you
want to read a row and depending upon the values in the row, you want to modify, delete or do
nothing, you can do that with a cursor. This is accomplished with a cursor and a special clause of
UPDATE and DELETE statements usable only by embedded SQL statements, namely WHERE CURRENT
OF. The cursor is declared with a special FOR UPDATE OF clause.
The following code segment gives how a cursor is implemented in COBOL. The standard does
support other languages like FORTRAN, PL/I, etc. as well, and the rules of using cursors are essentially
the same in all the languages.
CURSOR POSITIONS
When a cursor is open, it designates a certain collection of rows and a certain ordering for that
collection. It also designates a certain position with respect to that ordering. The possible positions are:
On some specific row ('on' state)
Before some specific row ('before’ state)
After some specific row ('after' state)
Cursor state is affected by a variety of operations. OPEN positions the cursor before the first row.
FETCH NEXT positions the cursor on the next row or, if there is no next row, after the last row. FETCH
PRIOR positions the cursor on the prior row or, if there is no prior row, before the first row. There are
other FETCH formats like FIRST, LAST, ABSOLUTE n, RELATIVE n, etc., which are discussed in the section
on the FETCH statement. If the cursor is on some row and that row is deleted through that cursor (using a DELETE
CURRENT operation), the cursor is positioned before the next row or, if there is no next row, after the last row.
All cursors are in the closed state at transaction initiation and are forced into the closed state at
transaction termination. While the transaction is being executed, the same cursor can be opened and
closed any number of times.
If INSENSITIVE is specified, OPEN effectively causes a separate copy of the table to be created
and the cursor accesses that copy rather than the table. The updates that affect the table while the
cursor is open will not be visible through the cursor. INSENSITIVE means that users of the cursor will not
see any updates that are happening to the table. UPDATE and DELETE CURRENT operations are not allowed
using an INSENSITIVE cursor.
If SCROLL is specified, all forms of FETCH are legal against the cursor. If SCROLL is omitted then
the only FETCH that is permitted is FETCH NEXT.
If READ ONLY is specified, UPDATE and DELETE CURRENT operations using the cursor will not be
allowed. If FOR UPDATE is specified, then the table must be updateable. If the OF column-commalist is
specified, then every column name in the commalist must be the unqualified name of a column of the
table. Omitting the OF column-commalist is equivalent to specifying all column names of the table.
UPDATE CURRENT operations using the cursor on columns not identified in the OF column-commalist
will not be allowed. UPDATE CURRENT operations on columns that are mentioned in the
ORDER BY clause will not be allowed. UPDATE CURRENT operations on columns mentioned in the FOR
UPDATE clause and not mentioned in the ORDER BY clause will be allowed.
DELETE CURRENT operations using the cursor will be allowed if FOR UPDATE is specified. If
neither READ ONLY nor FOR UPDATE is specified, the default depends on the other clauses: if INSENSITIVE,
SCROLL or ORDER BY is specified, the cursor is assumed to be READ ONLY; if none of these clauses is
specified, FOR UPDATE is assumed.
ORDER BY Clause
The ORDER BY clause specifies an item list on which the cursor is to be ordered. The syntax of
the ORDER BY clause is as follows:
ORDER BY {column | unsigned-integer} [ASC | DESC]
Usually each ORDER BY clause consists of a column name of the table. The optional specification
ASC or DESC indicates the ascending or descending order - ASC being the default.
DECLARE TSTCR CURSOR
FOR SELECT TITLE, AUTHOR, PRICE
FROM BOOK
ORDER BY TITLE;
Otherwise the ORDER BY clause can contain an integer indicating the ordinal position of the
column instead of the column name. Consider the following example:
DECLARE TSTCR CURSOR
FOR SELECT CATEGORY, AVG(PRICE)
FROM BOOK
GROUP BY CATEGORY
ORDER BY 2;
The integer refers to the ordinal (left-to-right) position of the column. This feature makes it
possible to define an ordering on the basis of the column that does not have a proper name. For
example, in the above definition, the 2 refers to the average price. But you can always use an alias in the
ORDER BY clause instead of using the integer as shown below:
DECLARE TSTCR CURSOR
FOR SELECT CATEGORY, AVG(PRICE) AS AVGPRICE
FROM BOOK
GROUP BY CATEGORY
ORDER BY AVGPRICE;
The target-commalist is the commalist of parameters or host variables. The row-selector can be any of
the following:
NEXT
PRIOR
FIRST
LAST
ABSOLUTE n
RELATIVE n
NEXT is the only legal row-selector if the cursor is not defined with a SCROLL option. If the cursor has a
SCROLL option, then the other values are legal. If row-selector is omitted then NEXT is assumed. FROM
is optional, but required if the row selector is explicitly specified.
The meanings of NEXT, PRIOR, FIRST and LAST have been discussed before and are self-explanatory.
'ABSOLUTE n' refers to the nth row in the ordered table that the cursor is associated with.
A negative value for n means counting that many rows backward from the end of the table. 'RELATIVE n' refers to the
nth row in the table relative to the row on which the cursor is currently positioned. The n in ABSOLUTE
and RELATIVE row-selectors can be a literal, parameter or host variable of exact numeric data type with
a scale of zero.
The target-commalist must contain exactly one target for each column retrieved. Consider the
following example, which fetches the next row:
FETCH NEXT FROM LACRYQ1
INTO :EMPNO,
:NAME,
:DEPT,
:SALARY
Each column to be updated must have been mentioned in a FOR UPDATE clause in cursor
definition. For example consider the following cursor definition:
The SALARY column is mentioned in the FOR UPDATE clause. So the following UPDATE
statement will update the SALARY column.
UPDATE EMPLOYEE
SET SALARY = WS-SALNEW
WHERE CURRENT OF LACRYQ1
The cursor must be open and must be updateable. The row on which the cursor is positioned will be
deleted. For example
DELETE
FROM EMPLOYEE
WHERE CURRENT OF LACRYQ1
Closing the Cursor-CLOSE
To explicitly close the cursor, you can use the CLOSE statement. The format of the CLOSE statement is
given below:
CLOSE cursor-name
JOINS
The capability of retrieving data from multiple tables using a single SQL statement is one of the
most powerful and useful features of any RDBMS. The more tables involved in the SELECT statement,
the more complex the SQL.
It is the availability of Join operation, almost more than anything else that distinguishes
relational from non-relational systems. Join is a query in which data is retrieved from more than one
table. A join matches data from two or more tables, based on the values of one or more columns in each
table. All matches are combined; creating a resulting row that is the concatenation of the columns from
each table where specified columns match.
details)—relies on the join operation to enable you to perform ad hoc queries that will combine the
related data which resides in more than one table. So we can say that join is one of the key operations
of the relational model.
Get the order number and member name for all orders in the ORDER_SUMMARY table.
SELECT ORDER_SUMMARY.ORDER_NO, MEMBER.NAME
FROM ORDER_SUMMARY, MEMBER
WHERE ORDER_SUMMARY.MEMBER_ID = MEMBER.MEMBER_ID;
Use of Aliases
You can use aliases to improve the readability of the SQL. Also, you need not key in the table name each
and every time. As we have seen, when the columns in different tables are being joined, we need to
qualify the column names with their corresponding table names. Giving aliases to tables can make this
job easier. Aliases can be letters, numbers or a combination. Make the aliases short and easy to
remember. For example, a join of the EMPLOYEE and SALARY tables written with aliases would be as follows:
SELECT NAME, PAY
FROM EMPLOYEE E, SALARY S
WHERE E.EMPNO = S.EMPNO;
In some database implementations (for example MS-Access) you need to use the keyword 'AS' for
aliasing instead of the space. In such cases you will have to specify 'EMPLOYEE AS E' instead of
'EMPLOYEE E'. Check the database manual for details.
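Get the title and publisher names of all books that are priced above 1000. Here, to get the
required information, we have to join the tables CATALOG and PUBLISHER. Note that the join condition
itself is an equality comparison on PUBLISHER_ID, so this is an equijoin; the greater-than comparison on
PRICE is an additional restriction on the rows. A join whose join condition uses a comparison operator
other than equality (for example, a greater-than comparison between columns of the two tables) is a
non-equijoin (sometimes called a 'greater-than join').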
SELECT TITLE, NAME
FROM CATALOG, PUBLISHER
WHERE CATALOG.PUBLISHER_ID = PUBLISHER.PUBLISHER_ID
AND PRICE >1000;
WHERE Clause
The WHERE clause in a join-SELECT can include other conditions in addition to the join condition.
Get the title, author name, country and price of all the books with India based authors and price
less than 500.
SELECT TITLE, NAME, COUNTRY, PRICE
FROM CATALOG, AUTHOR
WHERE CATALOG.AUTHOR_ID = AUTHOR.AUTHOR_ID
AND AUTHOR.COUNTRY = "India"
AND CATALOG.PRICE <500;
Natural Join
An equijoin, by definition, produces a result containing two identical columns (the join columns). If one of
these columns is eliminated, then the join is called the 'natural join'. Consider the following example:
SELECT CATALOG.BOOK_ID, CATALOG.TITLE, CATALOG.AUTHOR_ID,
CATALOG.PUBLISHER_ID, CATALOG.CATEGORY_ID, CATALOG.YEAR, CATALOG.PRICE,
CATEGORY.DESCRIPTION
FROM CATALOG, CATEGORY
WHERE CATALOG.CATEGORY_ID = CATEGORY.CATEGORY_ID
AND CATALOG.PRICE >1000;
The above query will get a result set that contains 58 rows. It does not make sense and does not appear
to be correct. In order to clarify the situation, let us first remove the duplicates. We can do it by using
the keyword DISTINCT in the select clause. So the SQL is now as follows:
SELECT DISTINCT C1.TITLE, C1.PRICE
FROM CATALOG C1, CATALOG C2
WHERE C1.PRICE = C2.PRICE;
Outer Joins
When tables are joined, the rows which contain matching values in the join predicates are returned.
Sometimes, you may want both matching and non-matching rows returned for the tables that are being
joined. This kind of operation is known as an outer join.
An outer join is an extended form of the ordinary (inner) join. It differs from the inner join in
that the rows in one table having no matching rows in the other table will also appear in the results
table, with nulls in the other attribute positions, instead of being ignored as is the case with the inner
join. Consider the following tables: the MEMBER table and the ORDER_SUMMARY table. If you do an
inner join then you will get only the rows containing matching values, as shown by the following SQL:
SELECT MEMBER.NAME, ORDER_SUMMARY.ORDER_NO, ORDER_SUMMARY.AMOUNT
FROM MEMBER, ORDER_SUMMARY
WHERE MEMBER.MEMBER_ID = ORDER_SUMMARY.MEMBER_ID;
As the result indicates, the inner join loses some information: the members who have not placed
an order will not figure in the results table. The outer join preserves such information, and this distinction
is the whole point of the outer join. Initially, most RDBMSs did not support outer joins explicitly and one had
to find roundabout solutions to get the desired results. The SQL-92 standard added explicit outer
join support to the language. The syntax of an outer join expression is:
table-reference [NATURAL] outer-join-type
JOIN table-reference
[ON conditional-expression]
[USING (column-commalist)]
The 'outer-join-type' can be any one of the following:
LEFT [OUTER]
RIGHT [OUTER]
FULL [OUTER]
The implementation varies depending on the RDBMS. But according to the SQL-92 syntax, if the
keyword JOIN is prefixed with LEFT, RIGHT or FULL (with the optional keyword OUTER) then the join in
question is an outer join. If NATURAL is specified, neither an ON clause nor a USING clause can be
specified; otherwise, exactly one of the two must be specified.
The RDBMS constructs the left outer join like the inner join, except that it retains non-matching
rows from the left table in the result and places null values in the attributes that come from the right
table. Variations include the right outer join, which retains non-matching rows from the right table, and
the full outer join, which retains non-matching rows from both tables.
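The difference between the inner join and the left outer join can be tried out with a small sketch. It uses Python's built-in sqlite3 module only as a convenient way to run the SQL; the member names, order numbers and amounts are made-up sample data, and the column layout of MEMBER and ORDER_SUMMARY is assumed from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MEMBER (MEMBER_ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE ORDER_SUMMARY (ORDER_NO INTEGER PRIMARY KEY,
                                MEMBER_ID INTEGER, AMOUNT INTEGER);
    -- Hypothetical sample rows: Meera has not placed any order.
    INSERT INTO MEMBER VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
    INSERT INTO ORDER_SUMMARY VALUES (101, 1, 450), (102, 2, 1200);
""")

# Inner join: only members with a matching order appear.
inner_rows = conn.execute("""
    SELECT MEMBER.NAME, ORDER_SUMMARY.ORDER_NO, ORDER_SUMMARY.AMOUNT
    FROM MEMBER JOIN ORDER_SUMMARY
      ON MEMBER.MEMBER_ID = ORDER_SUMMARY.MEMBER_ID
    ORDER BY MEMBER.MEMBER_ID
""").fetchall()

# Left outer join: every member appears; missing order data becomes null.
left_rows = conn.execute("""
    SELECT MEMBER.NAME, ORDER_SUMMARY.ORDER_NO, ORDER_SUMMARY.AMOUNT
    FROM MEMBER LEFT OUTER JOIN ORDER_SUMMARY
      ON MEMBER.MEMBER_ID = ORDER_SUMMARY.MEMBER_ID
    ORDER BY MEMBER.MEMBER_ID
""").fetchall()

print(inner_rows)
print(left_rows)
```

The member without an order is absent from the inner join but appears in the left outer join with nulls (None in Python) in the ORDER_NO and AMOUNT positions.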
UNIONS
The union operation combines two sets of rows into a single set composed of all the rows in
either or both of the two original sets, provided the two original sets are union compatible (whereas a
join combines two sets of columns into a single set). For union compatibility:
The two sets must contain the same number of columns.
Each column of the first set must be either the same data type as the corresponding
column of the second set or convertible to the same data type as corresponding column
of the second set.
The syntax for UNION is:
SELECT statement
UNION [ALL]
SELECT statement
Consider the following query. Get the details of all authors and publishers who are from 'India'. The SQL
is given below:
SELECT NAME, CITY, COUNTRY
FROM AUTHOR
WHERE COUNTRY = 'India'
UNION
SELECT NAME, CITY, COUNTRY
FROM PUBLISHER
WHERE COUNTRY = 'India';
Note that the two select lists contain the same number of items and the data types are
compatible. Check the user's manual of the RDBMS to see what data type combinations are compatible
in your implementation. In the above result you can see that 5 rows are from the AUTHOR table and 4
are from the PUBLISHER table.
Order of Evaluation
Parentheses can be used to force a particular order of evaluation if multiple UNIONs are
involved. For example, A UNION ALL (B UNION C) and (A UNION ALL B) UNION C are not equivalent: the
first keeps duplicates between A and the rest, while the second removes all duplicates. But if all the
operators are UNIONs, or all are UNION ALLs, then parentheses are unnecessary.
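The effect of the grouping can be checked with a minimal sketch, run here through Python's built-in sqlite3 module. The tables A, B and C and their single column X are hypothetical. SQLite does not accept parenthesized SELECT operands directly, so a subquery in the FROM clause stands in for each pair of parentheses; other RDBMSs may allow the parentheses as written in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (X INTEGER);
    CREATE TABLE B (X INTEGER);
    CREATE TABLE C (X INTEGER);
    -- The same value in all three tables makes the difference visible.
    INSERT INTO A VALUES (1);
    INSERT INTO B VALUES (1);
    INSERT INTO C VALUES (1);
""")

# A UNION ALL (B UNION C): the duplicate between A and (B UNION C) is kept.
first = conn.execute("""
    SELECT X FROM A
    UNION ALL
    SELECT X FROM (SELECT X FROM B UNION SELECT X FROM C)
""").fetchall()

# (A UNION ALL B) UNION C: the final UNION removes every duplicate.
second = conn.execute("""
    SELECT X FROM (SELECT X FROM A UNION ALL SELECT X FROM B)
    UNION
    SELECT X FROM C
""").fetchall()

print(len(first), len(second))
```

The first query returns two rows and the second returns one, which is why the placement of the parentheses matters when UNION and UNION ALL are mixed.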
TABLES
Tables are the basic building blocks in any relational database management system. They
contain the rows and columns of your data. You can create, modify and delete tables using the data
definition language (DDL) commands.
A table in a relational system consists of a row of column headings, together with zero or more
rows of data values. In this chapter we will see how to create, alter and delete a table.
In most SQL implementations, the user who creates the table is usually its owner. The owner of
a table is responsible for granting others access to his/her tables. When you create a table using SQL,
you need to specify the details like table name, column names and their data types, default values for
each column, etc.
Creating a Table
You create a table using the CREATE TABLE statement. The CREATE TABLE statement creates a
new base table.
A base table is an autonomous named table. By autonomous, we mean that the table exists by
its own right, unlike a view, which does not exist in its own right but is derived from one or more base
tables. The CREATE TABLE statement has two formats. Format 1 is the general form and is as follows:
CREATE TABLE base-table-name
(Column-1-definition
[,Column-2-definition]...
[,Column-n-definition]
[,Primary-key-definition]
[,Alternate-key-definitions]
[,Foreign-key-definitions]);
The keywords NULL and NOT NULL are optional. The default is to permit nulls, but not all
implementations follow it. So it is good practice to spell out your choice, even if you are accepting the
defaults. If you specify the NULL option, the RDBMS will insert a null in the column if the user does not
specify a value. If you specify the NOT NULL option, the column must have a value; if you don't specify
a value for that column, the system will reject the entry and return an error message. When
you use the NOT NULL option, you can also specify either the WITH DEFAULT option or the UNIQUE option.
If you specify the NOT NULL WITH DEFAULT option, the RDBMS will substitute a default value when none is
supplied (for example, 0 for numeric data types, spaces for character data types and so on). If NOT NULL
UNIQUE is specified, the RDBMS will ensure that the values for that column are unique or, in other words,
no duplicates will be allowed. An example of the CREATE TABLE statement is given below:
CREATE TABLE BOOK
(ISBN CHAR(10) NOT NULL,
TITLE CHAR(30) NOT NULL WITH DEFAULT,
AUTHOR CHAR(30) NOT NULL WITH DEFAULT,
PUBLISHER CHAR(30) NOT NULL WITH DEFAULT,
YEAR INTEGER NOT NULL WITH DEFAULT,
PRICE INTEGER NULL,
PRIMARY KEY (ISBN));
This statement will create an empty base table named BOOK. The table will have six columns
and the primary key will be ISBN.
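The behaviour of NOT NULL and default values can be observed with a small sketch, run through Python's built-in sqlite3 module. This is an adaptation, not the statement above verbatim: SQLite has no WITH DEFAULT keyword, so explicit DEFAULT clauses ('' and 0) are assumed in its place, and the ISBN value is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE BOOK
    (ISBN      CHAR(10) NOT NULL,
     TITLE     CHAR(30) NOT NULL DEFAULT '',
     AUTHOR    CHAR(30) NOT NULL DEFAULT '',
     PUBLISHER CHAR(30) NOT NULL DEFAULT '',
     YEAR      INTEGER  NOT NULL DEFAULT 0,
     PRICE     INTEGER,
     PRIMARY KEY (ISBN))
""")

# Only ISBN is supplied: defaults fill the NOT NULL columns, PRICE stays null.
conn.execute("INSERT INTO BOOK (ISBN) VALUES ('0000000001')")
row = conn.execute("SELECT TITLE, YEAR, PRICE FROM BOOK").fetchone()

# Omitting ISBN violates its NOT NULL constraint, so the entry is rejected.
try:
    conn.execute("INSERT INTO BOOK (TITLE) VALUES ('No ISBN')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(row, rejected)
```

The first insert succeeds with the defaults substituted, while the second is refused with an error because no value is available for ISBN.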
The second format of the CREATE TABLE statement allows the user to create a base table that
has the same structure as some existing table. The syntax is as follows:
CREATE TABLE base-table-name LIKE table-name;
For example the statement 'CREATE TABLE CATALOG LIKE BOOK;' will create a table called
CATALOG with the same structure as BOOK. The important thing that should be noted is that when a
table is created from an existing table, only the structure is copied; the primary, alternate and foreign
key definitions are not inherited.
Most implementations will have system-specific limits on the number of columns that you can
define for a single table. There will also be restrictions on the row length (the total number of bytes that
all the columns use). Usually the limit is large enough so that you won't have to think about it. But if you
are defining a table with large number of columns or with columns that are very large, then you might
want to check the limits to make sure that it is not exceeded.
Modifying a Table
An existing base table can be modified by using the ALTER TABLE statement. The format of the
ALTER TABLE statement is as follows.
ALTER TABLE base-table-name
ADD column data-type [NULL |NOT NULL WITH DEFAULT];
In the following example another column, DISCOUNT, with data type INTEGER will be added to
the BOOK table:
ALTER TABLE BOOK
ADD DISCOUNT INTEGER NULL;
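A minimal sketch of adding a column to a table that already holds data, run through Python's built-in sqlite3 module. The cut-down BOOK table and its one row are assumptions made to keep the example short; SQLite's spelling of the statement (ADD COLUMN) is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOOK (ISBN CHAR(10) NOT NULL, PRICE INTEGER)")
conn.execute("INSERT INTO BOOK VALUES ('0000000001', 250)")

# Add the new nullable column to the existing table.
conn.execute("ALTER TABLE BOOK ADD COLUMN DISCOUNT INTEGER")

# PRAGMA table_info lists one row per column; index 1 is the column name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(BOOK)")]
existing = conn.execute("SELECT ISBN, PRICE, DISCOUNT FROM BOOK").fetchall()

print(columns)
print(existing)
```

Rows that existed before the change simply carry a null in the new DISCOUNT column.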
Consider another example (the same example we used for the first normal form). Here the initial
definition of the CONTACTS table was as follows:
After normalizing to get to the first normal form we made the following changes:
Since the ALTER TABLE does not provide any facility to delete a column from a table, we deleted
the table and created two new ones. Then we modified those tables using the ALTER TABLE statement
to add the primary and foreign key so that it conformed to the first normal form.
Using the ALTER TABLE statement, new columns can be added and primary and foreign key
specifications can be added or removed. But alternate key specifications cannot be changed using the
ALTER TABLE statement. The important thing to remember here is that the ALTER TABLE statement
supports neither a change to the width or data type of an existing column nor the deletion of
an existing column.
Deleting a Table
An existing base table can be deleted at any time by using the DROP TABLE statement. The
syntax of this statement is:
DROP TABLE base-table-name
The specified base table is removed from the system. All indexes and views defined for the table
are also automatically dropped. For example, the command 'DROP TABLE BOOK;' will delete the table
named BOOK along with its contents, indexes and any views defined for that table.
VIEWS
When you look into the night sky, you see stars. When you look through a telescope also you
see stars. But the stars are not actually in the telescope; you are just looking at the stars in the sky with a
different perspective, a much closer view. There are two kinds of tables: base tables and views. A base
table is a physical table that is not defined in terms of other tables. In other words, it is autonomous and
exists in its own right.
Views, on the other hand, are not autonomous and do not exist in their own right. A view is a
named table that is represented not by its own physically separate stored data, but by its definition in
terms of other named tables (base tables or views). In other words, views and base tables are
analogous to the telescope and the stars. When users see a view, they see the same data that is in the
database tables, but perhaps with a different perspective. And just as the telescope does not contain
any stars, views don't contain any data. Instead, a view is a virtual table, deriving its data from base tables.
Creating a View
A view is created or defined using the CREATE VIEW statement. The general syntax of a view
definition is given below:
CREATE VIEW view-name [(column [,column]...)]
AS subquery
[WITH CHECK OPTION];
The subquery cannot include either UNION or ORDER BY. The clause 'WITH CHECK OPTION'
indicates that UPDATE and INSERT operations against the view are to be checked to ensure that the
UPDATEd or INSERTEd row satisfies the view-defining condition. You can think of a view as a stored
query. That is because, as seen above, the view is defined with a query. Consider the BOOK table. It has
six columns, ID, TITLE, AUTHOR, PUBLISHER, YEAR, PRICE. Suppose a situation arises where we would
want certain users to see only the ID, TITLE, PUBLISHER, YEAR and PRICE columns and that too only for
books published after 1995. We can create a view definition as follows:
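CREATE VIEW BOOKV
AS SELECT ID, TITLE, PUBLISHER, YEAR, PRICE
FROM BOOK
WHERE YEAR > 1995;
Note: the AUTHOR column of BOOK is not part of the view.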
The effect of the above SQL statement is the creation of a view called BOOKV. This view will be based on
the BOOK table. Once the view has been created, to the user it is as if there really is a table in the
database with the specified name. In the above example, it is as if there is a table called BOOKV. The
only difference is that the rows of the BOOK table that do not satisfy the view definition will not be
visible to the user. This is shown in Table 15.1. The view can be referred to by its name, or alternatively
one can introduce a synonym or an alias for the view. The view has five columns (ID, TITLE, PUBLISHER,
YEAR and PRICE) corresponding to the ID, TITLE, PUBLISHER, YEAR and PRICE columns of the base table BOOK.
If column names are not specified explicitly in the view definition statement then the view inherits the
column names of the source of the view. Column names must be specified explicitly for all columns of
the view if:
Any column of the view is derived from a function, an operational expression or a literal.
Two or more columns of the view would otherwise have the same name.
For example, in the following view definition, there is no name that can be inherited for the
maximum-price column, because it is derived from a function; hence all the column names must be specified.
CREATE VIEW MAXPRICE (PUBLISHER, MAX_PRICE)
AS SELECT PUBLISHER, MAX(PRICE)
FROM BOOK
GROUP BY PUBLISHER;
The result of the above view is shown in Table 15.2.
Publisher           Max Price
Bantam Books        256.1
Century Books       623.3
Corgi Books         182.35
Dell Books          330
Harper Collins      124
Headline Books      199.95
Macmillan           424.8
Mandarin Books      124.5
Pan Books           360
Random House        250.1
Signet Books        125
Simon & Schuster    532.8
Warner Books        100.15
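Both kinds of view discussed so far (a row-and-column subset, and a view with explicitly named columns over an aggregate) can be tried with a small sketch through Python's built-in sqlite3 module. The three BOOK rows are invented sample data, and MAX_PRICE is an assumed name for the derived column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BOOK (ID TEXT PRIMARY KEY, TITLE TEXT, AUTHOR TEXT,
                       PUBLISHER TEXT, YEAR INTEGER, PRICE REAL);
    -- Hypothetical sample rows.
    INSERT INTO BOOK VALUES
        ('B01', 'Title One',   'Author A', 'Pan Books',  1998, 360.0),
        ('B02', 'Title Two',   'Author B', 'Pan Books',  1990, 150.0),
        ('B03', 'Title Three', 'Author C', 'Dell Books', 2001, 330.0);

    -- Column names are inherited from the source columns.
    CREATE VIEW BOOKV AS
        SELECT ID, TITLE, PUBLISHER, YEAR, PRICE
        FROM BOOK
        WHERE YEAR > 1995;

    -- MAX(PRICE) has no name to inherit, so all columns are named explicitly.
    CREATE VIEW MAXPRICE (PUBLISHER, MAX_PRICE) AS
        SELECT PUBLISHER, MAX(PRICE)
        FROM BOOK
        GROUP BY PUBLISHER;
""")

bookv_columns = [d[0] for d in conn.execute("SELECT * FROM BOOKV").description]
bookv_ids = [r[0] for r in conn.execute("SELECT ID FROM BOOKV ORDER BY ID")]
max_prices = conn.execute(
    "SELECT PUBLISHER, MAX_PRICE FROM MAXPRICE ORDER BY PUBLISHER").fetchall()

print(bookv_columns, bookv_ids, max_prices)
```

The view BOOKV exposes neither the AUTHOR column nor the book published in 1990, yet it is queried exactly like a table.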
Now the question is whether the update is a valid one or not. Can it be accepted? If it can be,
then it will have the effect of removing that row from the view, since that row will no longer satisfy the
view-defining condition 'WHERE YEAR > 1995'. Similarly, consider the following INSERT operation:
INSERT INTO BOOKV (ID, TITLE, PUBLISHER, YEAR, PRICE)
VALUES ('B04', 'Business Notes', 'Potter', 1984, 560);
The above row will disappear from the view as soon as it is inserted, because once again the row does
not satisfy the view-defining condition 'WHERE YEAR > 1995'. The CHECK OPTION is provided to avoid
such situations. If the clause 'WITH CHECK OPTION' is included in the view-definition, then all INSERTS
and UPDATES on that view will be checked to ensure that the newly INSERTed or UPDATEd rows do not
violate the view-defining conditions. If there are some values, that do not satisfy the view-definition,
then the update or insert will not take place. Thus if the definition of the BOOKV had the 'WITH CHECK
OPTION' clause, then the above two operations would have been rejected.
If the definition of the view involves DISTINCT at the outermost level, then that view is not
updateable.
If the definition of the view includes a nested subquery that refers to the base table on which
the view is defined, then it is not updateable.
If the FROM clause in the view definition involves multiple range variables, then it is not
updateable.
A view defined on a non-updateable view is not updateable.
Advantages of Views
Some of the major advantages of using views are listed below:
Data Security - Views allow you to set up different security levels for the same base table, thus
shielding or protecting certain data from people who do not have proper authority. For
example, for the use of the data entry clerk in the personnel department, a view of the employee
master table can be created which excludes confidential details about a person, such as
salary.
The views allow the same data to be seen by different users in different ways at the same time.
Views can be used to present additional information like derived columns.
Views can be used to hide complex queries. Developers can hide complex queries using a view.
The benefit is that users can issue simple queries against the view and the view will take care of
all the complicated work. For example, a developer might hide a join query by creating a view
and the user who uses the view will not feel any difference.
Using Views
Whether to use views, and if so to what extent, is a subject of hot debate among DBMS
professionals. Some support the liberal use of views but some do not. The best
strategy is somewhere in between. By following the guidelines given below we can optimize the use of
views:
Do not create one view per base table.
Create a view only when a stated and rational goal can be achieved by the view. Each view must
have a specific logical use before it is created.
There are 7 basic uses for which views are the best. They are:
To provide row and column level security
To ensure efficient access paths
To ensure proper data derivation
To mask complexity from the user
To provide domain support
To rename columns
To provide solutions that cannot be accomplished without views.
If the view you are creating does not belong to any of the categories mentioned above, then you should
reanalyze your decision.
Dropping a View
As we have seen, views are defined by the CREATE VIEW statement. Unlike base tables, which can
be changed with ALTER TABLE, views have no ALTER VIEW statement. If you want to delete or remove an
existing view you can do so by using the DROP VIEW statement. The syntax of the DROP VIEW statement is:
DROP VIEW view-name {RESTRICT | CASCADE}
If RESTRICT is specified and the view is referenced in any other view definition or in an
integrity constraint, the DROP VIEW will fail. If there are no such references, the DROP VIEW
will succeed and the view will be deleted. If CASCADE is specified, the DROP VIEW will always succeed
and any referencing views and integrity constraints will automatically be dropped too. For example, the
statement 'DROP VIEW BOOKV;' will delete the BOOKV view.
INDEXES
An index is a structure that provides faster access to the rows of a table based on the values of
one or more columns. The index stores data values and pointers to the rows where those data values
occur. In the index the data values are sorted and stored in ascending or descending order. So, the
RDBMS can quickly search the index to find a particular data value and hence the row associated with it.
Suppose the BOOK table contains thousands of records. If there were no indexes defined for the
BOOK table, the only way the RDBMS could find the details of a book with a given ISBN would be to
sequentially search all the rows, one row at a time, till the matching ISBN is found. Then it would have
to examine the remaining
rows, to make sure that it has found all the matching rows in the table. With an index for the ISBN
column, the RDBMS can zoom in on the requested data with less effort. It can search the index to find
the requested value and then follow the pointer to find the requested row(s) of the table. Searching an
index is much faster than searching the table because as we said earlier, the index is sorted and its rows
are very small. Finding the rows of the table from an index is also faster, because the index tells the
RDBMS, where exactly the rows are stored.
So indexes speed up the execution of SQL statements with search conditions, which refer to the
indexed columns. One disadvantage of the index is that it is a separate database object and needs
additional disk space. Another disadvantage is that the index must be updated every time a row is
inserted, deleted or modified. This imposes additional overheads for the INSERT, DELETE and UPDATE
operations.
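How an index changes the way a search is carried out can be seen by asking the RDBMS for its query plan. The sketch below does this with Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the table contents, the index name INX_ISBN and the helper function plan are assumptions for illustration, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOOK (ISBN TEXT, TITLE TEXT, PRICE REAL)")
# One thousand made-up books with ten-digit, zero-padded ISBN strings.
conn.executemany(
    "INSERT INTO BOOK VALUES (?, ?, ?)",
    [(f"{n:010d}", f"Title {n}", 100.0 + n) for n in range(1000)],
)

query = "SELECT * FROM BOOK WHERE ISBN = '0000000500'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_without_index = plan(query)   # sequential scan of the whole table
conn.execute("CREATE INDEX INX_ISBN ON BOOK (ISBN)")
plan_with_index = plan(query)      # search through the index

print(plan_without_index)
print(plan_with_index)
```

Before the index exists the plan is a scan of BOOK; afterwards the plan names INX_ISBN, showing that the search condition on the indexed column is answered through the index.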
Creating an Index
Indexes are created using the CREATE INDEX statement. The general form of the CREATE INDEX
statement is:
CREATE [UNIQUE] INDEX index-name
ON base-table (column [order] [,column [order]]...)
Each 'order' specification is ASC (ascending) or DESC (descending), ASC being the default. The
left to right sequence of naming columns in the CREATE INDEX statement corresponds to major-to-
minor ordering in the usual way.
For example, the statement 'CREATE INDEX X ON T (P, Q DESC, R)' creates an index called X on the base
table T in which entries are ordered by ascending R-value within descending Q-value within
ascending P-value. The columns P, Q and R need not be contiguous, nor need they all be of the same
data type, nor need they be all fixed length or all varying length.
As we have seen earlier, once created, the index is automatically managed by the RDBMS to reflect
the updates on the base table, till the index is dropped.
The 'UNIQUE' option in the CREATE INDEX statement specifies that no two rows in the indexed base
table will be allowed to take the same value for the indexed column or column combination at the same
time, or in other words no duplicates will be allowed. The RDBMS will reject any attempt to introduce a
duplicate value.
Indexes like the base tables can be created and dropped at any time. Any number of indexes
can be built on a single base table.
Types of Indexes
There are different types of indexes. Most systems allow indexes involving more than one
column (composite indexes) and indexes that prevent duplication of data (unique indexes). Another
option is the clustered index where the indexes are sorted both logically and physically. We will see
these indexes in a little more detail.
Composite Indexes
When an index is made up of more than one column it is called a composite index. Composite
indexes are used when two or more columns are best searched as a unit because of their logical
relationship. For example, an index made on the first name and last name is a composite index.
Suppose you have an EMPLOYEE table with two columns, F_NAME and L_NAME. You can build a
composite index as follows:
CREATE INDEX INX_NAME
ON EMPLOYEE (L_NAME, F_NAME);
Composite index columns do not have to be specified in the same order as in the CREATE TABLE
statement. You can use any order you want. For better performance, it is a good idea to start with the
column that you use most often in searches.
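Why the leading column of a composite index matters can be illustrated with a short sketch using Python's built-in sqlite3 module. The extra columns EMP_NO and SALARY, the names searched for and the helper function plan are assumptions; the behaviour shown is SQLite's on a table without collected statistics, and other optimizers may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_NO INTEGER, F_NAME TEXT, L_NAME TEXT, SALARY REAL)")
conn.execute("CREATE INDEX INX_NAME ON EMPLOYEE (L_NAME, F_NAME)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Search on the leading column of the index: the index can be used.
by_last_name = plan("SELECT * FROM EMPLOYEE WHERE L_NAME = 'Rao'")

# Search on the second column alone: the index ordering does not help.
by_first_name = plan("SELECT * FROM EMPLOYEE WHERE F_NAME = 'Anil'")

print(by_last_name)
print(by_first_name)
```

A search on L_NAME is resolved through INX_NAME, while a search on F_NAME alone falls back to a scan, which is the reason for putting the most frequently searched column first.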
Unique Indexes
A unique index is one in which no two rows are permitted to have the same index value; in other
words, no duplicate values are allowed. Unique indexes are usually created on the primary key (or any of
the candidate keys) of a table. You specify the index as a unique index by using the keyword UNIQUE. For
unique indexes, the RDBMS checks for duplicate values when the index is created (if data already exists)
and each time new data is added. The following example creates a unique index:
CREATE UNIQUE INDEX INX_ID
ON BOOK (ID);
A unique index should be created only if the column on which the index is being built has
uniqueness. For example, creating a unique index on the first or last names will not be a good idea, as it
does not make any sense and will create problems if there is more than one person with the same last
name. But a unique index on the employee number or ISBN will be a good idea, as the chances of two
rows having the same employee number or ISBN are nil. Furthermore, a unique index serves as an
integrity check. For example, a duplicate employee number or ISBN reflects some kind of error in the data
entry.
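The integrity check performed by a unique index can be demonstrated with a minimal sketch through Python's built-in sqlite3 module. The two-column BOOK table and the titles are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOOK (ID TEXT, TITLE TEXT)")
conn.execute("CREATE UNIQUE INDEX INX_ID ON BOOK (ID)")
conn.execute("INSERT INTO BOOK VALUES ('B01', 'First Title')")

# A second row with the same ID violates the unique index and is rejected.
try:
    conn.execute("INSERT INTO BOOK VALUES ('B01', 'Second Title')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM BOOK").fetchone()[0]
print(duplicate_rejected, row_count)
```

The attempt to insert the duplicate ID raises an error and the table still holds a single row.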
Clustered Indexes
Many RDBMSs offer you the choice of making your index clustered or non-clustered. When you
create a clustered index, the system will re-sort the rows of the table whenever there is a change
made to the index. In other words, when you use a clustered index, the rows in the table will be in
the same order as that of the index. Since a clustered index controls the physical location of data, there
can be only one clustered index per table, most often created on the primary key.
In a non-clustered index, the physical order of the rows is not the same as their indexed order.
There can be as many non-clustered indexes per table as you wish. Retrieval through a clustered index
is generally faster than through a non-clustered one. A clustered index is usually very advantageous
when many rows with
contiguous values are being retrieved. But clustered indexes make the data addition and modification
process slower, as the data in the table also have to be sorted along with that in the index.
Dropping an Index
Indexes can be dropped explicitly using the DROP INDEX command. Whenever the base table is dropped
the indexes for that table are automatically dropped. The syntax of the DROP INDEX command is 'DROP
INDEX index-name'. Once the command is executed, the index is destroyed or removed from the
catalog.
Using Indexes
In general it is a good idea to create an index for columns that are used frequently in search conditions.
Indexing also makes a lot of sense when queries against a table are more frequent than inserts and
updates. The RDBMS usually establishes an index for the primary key of a table, because it anticipates
that access to the table will most frequently be using the primary key. Given below are some guidelines
for index creations:
A unique index on the primary key prevents duplicates and guarantees that every value in the
primary key column will in fact uniquely identify the row.
A column that is often accessed in a sorted order should be indexed so that the system can take
advantage of the indexed order.
Columns that are regularly used in joins should be indexed, because the RDBMS can then perform the
join faster.
If the number of update operations (INSERT, UPDATE and DELETE) is high compared to the
querying operations, then indexing the table will hurt performance, because every time the table
is updated, the index also has to be updated.
Indexing on columns that have only a few distinct values (for example, the column SEX that can
have three values: male, female and unknown) will not bring any real performance
improvement.
Indexing small tables with only a few rows does not improve performance, as the RDBMS in all
probability will do a table scan rather than an index scan.
Indexes are very useful and make the data access very fast if used properly. To use them well, you
must understand how the query optimizer of your RDBMS works and what kinds of operations are being
performed on the tables.
O
Tom Clancy          Special Forces
Robert Ludlum       Sigma Protocol
Jack Higgins        Edge of Danger
Steve Martini       The Jury
Now consider the two relations CATALOG and ORDER. The CATALOG contains the entire list of
books in the bookshop and ORDER contains the books that are ordered by a customer. For simplicity we
will call the CATALOG relation C and the ORDER relation O.
The two relations, as we can see, are union compatible. Now we will see the four set-oriented
operations.
UNION - The result of this operation, denoted by C ∪ O, is the relation that includes all tuples
that are either in C or in O or in both C and O. Duplicates are eliminated.
INTERSECTION - The result of the intersection operation is a relation that includes all tuples that
are in both C and O. The intersection operation is denoted by C ∩ O.
DIFFERENCE - The difference operation is denoted by C - O. The result of the difference
operation is the relation that contains all tuples in C but not in O.
The results of the UNION, INTERSECTION and DIFFERENCE operations on relations C and O are given in
Table 12.2:
C ∪ O
Robin Cook          Shock
Matthew Reilly      Area 7
Tom Clancy          Special Forces
David Baldacci      Last Man Standing
Ken Follet          Jackdaws
Robert Ludlum       Sigma Protocol
Nicholas Sparks     The Rescue
Jack Higgins        Edge of Danger
Steve Martini       The Jury
C ∩ O
Tom Clancy          Special Forces
Robert Ludlum       Sigma Protocol
C - O
Robin Cook          Shock
Matthew Reilly      Area 7
David Baldacci      Last Man Standing
Ken Follet          Jackdaws
Nicholas Sparks     The Rescue
Notice that both UNION and INTERSECTION operations are commutative and associative operations.
This means that the following are TRUE:
A ∪ B = B ∪ A and A ∩ B = B ∩ A
A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C
The DIFFERENCE operation is not commutative. This means that A - B is not the same as B - A or, in
other words, A - B ≠ B - A.
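The three set operations on C and O can be reproduced with the SQL operators UNION, INTERSECT and EXCEPT. The sketch below runs them through Python's built-in sqlite3 module; the column names AUTHOR and TITLE and the helper function rows are assumptions, and the contents of C are taken from the rows shown for C ∩ O and C - O in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE C (AUTHOR TEXT, TITLE TEXT);
    CREATE TABLE O (AUTHOR TEXT, TITLE TEXT);
    INSERT INTO C VALUES
        ('Robin Cook', 'Shock'), ('Matthew Reilly', 'Area 7'),
        ('Tom Clancy', 'Special Forces'),
        ('David Baldacci', 'Last Man Standing'), ('Ken Follet', 'Jackdaws'),
        ('Robert Ludlum', 'Sigma Protocol'), ('Nicholas Sparks', 'The Rescue');
    INSERT INTO O VALUES
        ('Tom Clancy', 'Special Forces'), ('Robert Ludlum', 'Sigma Protocol'),
        ('Jack Higgins', 'Edge of Danger'), ('Steve Martini', 'The Jury');
""")

def rows(op, left, right):
    # Apply one set operator to two union-compatible tables.
    sql = f"SELECT * FROM {left} {op} SELECT * FROM {right}"
    return set(conn.execute(sql).fetchall())

union = rows("UNION", "C", "O")
intersection = rows("INTERSECT", "C", "O")
c_minus_o = rows("EXCEPT", "C", "O")
o_minus_c = rows("EXCEPT", "O", "C")

print(len(union), len(intersection), len(c_minus_o), len(o_minus_c))
```

Swapping the operands leaves the UNION and INTERSECT results unchanged, whereas C - O and O - C contain different tuples.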
EMBEDDED SQL
Embedded SQL refers to SQL statements included in a programming language. The programming
language in which the SQL statements are included is called the host language. Some of the common
host languages are C, COBOL, Pascal, FORTRAN, PL/I, etc. The program is written in the host language
and whenever data access or data manipulation is needed the SQL statements are embedded. This
embedded SQL source code is submitted to an SQL precompiler, which processes the SQL statements.
Variables of the host language can be referenced in the embedded SQL statements, thus allowing the
values calculated by the program to be used by the SQL statements. The host language variables (also
known as host variables) are used by the embedded SQL statements to receive the results of the SQL
queries thus allowing the programming language to process the retrieved values. Special program
variables (called null indicators) are used to assign and retrieve the NULL values to and from the
database.
Embedded SQL
An embedded SQL program contains a mix of SQL and programming language statements. So an
embedded SQL program cannot be directly submitted to the compiler for the programming language.
Instead, before the language compiler is called upon, the SQL precompiler removes all the SQL
statements and replaces them with equivalent function calls in the host language. The SQL precompiler
then processes the SQL statements, checks the syntax, optimizes the SQL, and creates the application
plan. At the same time the host language statements are compiled by the host language's own compiler
and the executable code is created. These two executables (the executable of the program and the
application plan) are combined together to form the final executable of the embedded SQL program.
This is illustrated in Figure 22.2. It shows how the embedded SQL program is made into the executable
version in a 'DB2-COBOL' environment. But the process is similar for other product-language
combinations also.
Precompilation
The DB2 application program contains COBOL code with SQL statements embedded in it. The COBOL
compiler will not be able to recognize the SQL statements and will give compilation errors. So before
running the COBOL compiler, the SQL statements must be removed from the source code.
DCLGEN Command
The DCLGEN or Declaration Generator command is used to produce a COBOL copybook, which
contains an SQL DECLARE TABLE statement along with the WORKING-STORAGE host-variable definitions
for each column of the table. When the DCLGEN command is issued, DB2 reads the catalog to determine
the structure of the table and builds the COBOL copybook.
DCLGEN is not a required step because whatever is generated using the DCLGEN command can
be hard-coded in the application program. But it is good practice to run the DCLGEN command for every
table that will be embedded in a COBOL program. Then every program that accesses that table can
INCLUDE the generated copybook. This reduces a lot of unnecessary coding. But one thing that must be
remembered is that DCLGEN will generate the host variables with the same names as the column names,
so if the program uses two tables that have common column names, you must edit the copybook and
change the names.
Binding
The BIND command is a type of compiler for SQL statements. BIND reads SQL statements from
DBRMs (database request modules, the output of the precompiler) and produces a mechanism to access
data as directed by the SQL statements being bound. There
are two types of BINDs—BIND PLAN and BIND PACKAGE. BIND PLAN accepts one or more DBRMs
produced from previous DB2 precompilations, one or more packages produced from previous BIND
PACKAGE commands or a combination of DBRMs and Package lists as input. The output of the BIND
PLAN is an application plan containing the executable logic representing optimized access paths to DB2
data. An application plan is executable only with the corresponding load module. Before you can run a
DB2 program, an application plan name must be specified.
The BIND PACKAGE command accepts a DBRM as input and produces a single package
containing the optimized access path logic. You can then bind the packages into an application plan
using the BIND PLAN command. A package is not executable and cannot be specified when a DB2
program is being run. You must bind a package into a plan before using it. BIND performs the following
functions:
Reads the SQL statements in the DBRM and checks the syntax of those statements.
Checks that the DB2 tables and columns being accessed conform to the corresponding DB2
catalog information.
Performs authorization validations.
Optimizes the SQL statements into efficient access paths.
Link Editing
After compilation the compiled source is link-edited to an executable load module. The
appropriate DB2 host language interface module also must be included in the link-edit step. The
interface module is based on the environment in which the program will execute. The output of the link-
edit step is an executable load module, which can be run with a plan containing the program's DBRM or
package.
branching and looping facilities, input/output functions, etc. The SQL handles the database
access and manipulation.
The use of the precompiler shifts the CPU intensive parsing and optimization to the
development phase. So the resulting executable program will be very efficient in its CPU usage.
The DBRM produced by the precompiler provides portability of applications. An application
program can be written and tested on one system and then its executable program and DBRM
can be moved to another system. After the BIND program on the new system creates a new
application plan and installs it in the database, the application program can use it without
recompilation.
The program's run-time interface to the private database routines is transparent to the
application programmer. The programmer works with the embedded SQL at the source code
level and does not have to worry about other database related issues.
Automatic Rebinding
The application plan is optimized for the database structure as it exists and the plan is placed in
the database by the BIND program. But if the structure changes later, any application plan that refers
to the changed structure will result in an error. To handle this situation, the RDBMS stores with the
application plan, a copy of the SQL statements from which the plan was produced. The RDBMS also
keeps track of the changes made to the database objects upon which each application plan is
dependent. If any of the objects is modified, then the RDBMS automatically marks the plan as invalid
and the next time the program tries to use the plan, the RDBMS will rebind the SQL statements to
produce a new application plan. This process is called 'automatic rebinding' and is completely
transparent to the user. There is one drawback, however. The RDBMS detects any change to the
database structure that makes the plan invalid, but changes that merely make a better plan possible,
such as the creation of a new index, go undetected, and the program continues to
use the old application plan. In such circumstances the user has to explicitly run the BIND program to
rebind the plan.
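The invalidation logic described above can be mimicked with a toy model. This Python sketch is not how DB2 or any real RDBMS is implemented; the class ToyRDBMS, its methods and the plan name P1 are all hypothetical. It keeps the SQL text with each plan, marks a plan invalid when a table it depends on is altered, rebinds it on the next use, and deliberately does nothing when an index is created.

```python
class ToyRDBMS:
    """Toy model of plan invalidation and automatic rebinding."""

    def __init__(self):
        self.indexes = {}    # table name -> list of index names
        self.plans = {}      # plan name -> dict describing the stored plan
        self.bind_count = 0  # how many times BIND has run

    def _choose_path(self, table):
        # Stand-in for optimization: use an index if one exists, else scan.
        names = self.indexes.get(table, [])
        return "index " + names[0] if names else "table scan"

    def bind(self, plan_name, sql, table):
        # The SQL text is stored with the plan so it can be rebound later.
        self.plans[plan_name] = {
            "sql": sql,
            "table": table,
            "path": self._choose_path(table),
            "valid": True,
        }
        self.bind_count += 1

    def alter_table(self, table):
        # A structural change invalidates every plan that depends on the table.
        for plan in self.plans.values():
            if plan["table"] == table:
                plan["valid"] = False

    def create_index(self, table, index_name):
        # A new index does not break existing plans, so none is invalidated.
        self.indexes.setdefault(table, []).append(index_name)

    def run(self, plan_name):
        plan = self.plans[plan_name]
        if not plan["valid"]:
            # Automatic rebinding: re-optimize from the stored SQL text.
            self.bind(plan_name, plan["sql"], plan["table"])
            plan = self.plans[plan_name]
        return plan["path"]


db = ToyRDBMS()
db.bind("P1", "SELECT name FROM emp WHERE emp_no = ?", "emp")
db.create_index("emp", "idx_emp_no")
path_after_index = db.run("P1")   # old plan is still used
db.alter_table("emp")
path_after_alter = db.run("P1")   # plan is rebound and picks up the index
```

In this model the new index is ignored until something else forces a rebind, which is the drawback the text describes. An explicit call to bind would have the same effect as running the BIND program by hand.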
Every embedded SQL program should include either a host variable called SQLCODE (SQLCOD in
FORTRAN) or a host variable called SQLSTATE (SQLSTA in FORTRAN) or both. After any SQL
statement has been executed, a status value is returned to the program in SQLCODE or
SQLSTATE or both.
Host variables should have data types appropriate to the purposes for which they are used. A
host variable used as a target must have a data type that is compatible with that of the SQL
value to be assigned to that target. Similarly, a host variable that is to be used as a source should
have a data type that is compatible with that of the SQL column to which values of that source
are to be assigned.
Host variables and SQL columns can have the same name.
Every executable SQL statement should in principle be followed by a test of the returned
SQLCODE or SQLSTATE value. The WHENEVER statement is provided to simplify this process.
Error Handling
When you type an SQL statement in interactive mode and there is an error, it is immediately
displayed to you. You can make the corrections and resubmit the SQL statement. In the case of embedded
SQL, errors should be trapped by the application program. Embedded SQL statements can produce two
types of errors—compile-time errors and run-time errors.
Compile-time Errors
These include misspelled column names, table names, and keywords, misplaced commas,
parentheses, etc. These errors are detected by the precompiler and reported to the programmer. The
programmer can correct the mistakes and compile the program again.
Run-time Errors
Problems like lack of appropriate privileges, single value SELECT statements returning more than one
row, etc. can be detected only during run-time and the application should have the necessary error
handling procedure to handle these errors.
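A minimal sketch of application-level handling for one such run-time problem, the single-value SELECT that returns more than one row, is given here in Python with SQLite as a stand-in. SQLite does not raise an error in this case, so the check is made by the application itself. The function name fetch_single_value, the status strings and the emp table are assumptions for illustration.

```python
import sqlite3


def fetch_single_value(conn, sql, params=()):
    """Run a query expected to return one value; report problems as a status."""
    try:
        # Two rows are enough to tell "exactly one" from "more than one".
        rows = conn.execute(sql, params).fetchmany(2)
    except sqlite3.Error as exc:
        return ("error", str(exc))
    if not rows:
        return ("not_found", None)
    if len(rows) > 1:
        return ("too_many_rows", None)
    return ("ok", rows[0][0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_no INTEGER, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, "SALES"), (2, "SALES"), (3, "HR")])

one = fetch_single_value(conn, "SELECT dept FROM emp WHERE emp_no = ?", (3,))
many = fetch_single_value(conn, "SELECT emp_no FROM emp WHERE dept = ?", ("SALES",))
none = fetch_single_value(conn, "SELECT dept FROM emp WHERE emp_no = ?", (99,))
```

The caller inspects the returned status before using the value, much as an embedded SQL program inspects the status returned after each statement.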
SQLCA
The RDBMS reports run-time errors to the application program through the SQL Communication
Area (SQLCA). The SQLCA is a data structure that contains error variables and status indicators. By
examining the SQLCA, the application program can determine the success or failure of the embedded SQL
statements and take appropriate action. You should include the SQLCA in all embedded SQL application
programs. In the case of the DB2-COBOL example, this is accomplished by coding the following statements
in your WORKING-STORAGE section:
EXEC SQL
INCLUDE SQLCA
END-EXEC.
As mentioned, the SQLCA contains fields that communicate information about the success or
failure of an embedded SQL statement. The most important
among them is the SQLCODE, which contains the return code passed by DB2 to the application program.
The return code provides information about the execution of the last SQL statement. A value of zero
indicates successful execution, a positive value indicates successful execution but with an exception,
and a negative value indicates failure during execution. By examining the value of the SQLCODE, the
cause of a failure can be pinpointed and appropriate action taken.
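The three-way interpretation of SQLCODE can be written as a small decision function. This Python sketch only mirrors the rule stated above (zero, positive, negative). The function names and the actions chosen for each outcome are hypothetical. The sample inputs +100 (commonly documented for DB2 as "row not found") and -811 (a single-row SELECT returned more than one row) are used purely as illustrations and should be checked against the DB2 codes reference before being relied on.

```python
def classify_sqlcode(sqlcode):
    """Map an SQLCODE value to the three outcomes described in the text."""
    if sqlcode == 0:
        return "success"
    if sqlcode > 0:
        return "warning"   # statement executed, but with an exception condition
    return "error"         # statement failed during execution


def handle(sqlcode):
    """Hypothetical policy: continue on success, log warnings, stop on errors."""
    actions = {
        "success": "continue",
        "warning": "log and continue",
        "error": "roll back and stop",
    }
    outcome = classify_sqlcode(sqlcode)
    return outcome, actions[outcome]


# Sample SQLCODE values used only as inputs to the classifier.
results = [handle(code) for code in (0, 100, -811)]
```

A real program would make this test after every executable SQL statement, or let the WHENEVER statement generate the equivalent checks.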