Intro to Databases: Summary and Revision Notes
Data: Raw, unsorted facts and figures that are available but carry no meaning on their own.
Information: Data that has been organized so that it can be understood as having meaning.
CP1500 – Revision Notes
DB Structure of DBMS
One-to-one relationship: One occurrence of an entity can associate with only one occurrence of
another entity.
One-to-many relationship: One occurrence of an entity can associate with many occurrences of
another entity.
Many-to-many relationship: Many occurrences of an entity can associate with many occurrences
of another entity.
ER Model Basics
A database “schema” in the ER Model can be represented pictorially using ER diagrams.
Entity: An entity is described (in DB) using a set of attributes
Entity Set: All entities in an entity set have the same set of attributes (unless ISA hierarchy applies)
Each entity set has a key.
Each attribute has a domain.
Key Constraints
Consider Works_In: An employee can work in many departments; a dept can have many employees.
In contrast, each dept has at most one manager, according to the key constraint on Manages.
Aggregation
Used when we have to model a relationship involving entity sets and a relationship set.
Aggregation: indicates that a relationship set participates in another relationship set.
Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
[Figure: ER diagram illustrating aggregation with a Monitors relationship (until) involving Employees (ssn, name, lot); other surviving labels: started_on, since, dname, pid, pbudget, did, budget]
Summary of Conceptual Design
Conceptual design follows requirements analysis,
Constraints
Entity Integrity Constraint: states that no primary key value can be null. This is because the primary
key value is used to uniquely identify individual tuples.
The entity integrity constraint is violated if the primary key of a new tuple is null.
Referential Integrity Constraint: specified between two relations and is used to maintain the
consistency among tuples of the two relations.
Referential integrity is violated if the value of any foreign key refers to a tuple that does
not exist in the referenced relation.
Domain constraints are violated if an attribute value is given that does not appear in the
corresponding domain.
Key constraints are violated if a key value in the new tuple already exists in another tuple in
the same relation.
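The violations listed above can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module; the Students table, its rows and the helper is_rejected are made up for illustration, and the CHECK clause only stands in for a domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table. NOT NULL is stated explicitly because SQLite does not
# imply it for a non-INTEGER PRIMARY KEY column.
conn.execute("""
    CREATE TABLE Students (
        sid  CHAR(20) NOT NULL PRIMARY KEY,
        name CHAR(20),
        age  INTEGER CHECK (age BETWEEN 0 AND 150)  -- stands in for a domain
    )
""")
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 18)")


def is_rejected(sql):
    """Return True if the DBMS refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Entity integrity: the primary key of the new tuple is null.
null_key = is_rejected("INSERT INTO Students VALUES (NULL, 'Smith', 19)")
# Key constraint: the key value already exists in another tuple.
duplicate_key = is_rejected("INSERT INTO Students VALUES ('53666', 'Smith', 19)")
# Domain constraint: the value lies outside the declared range.
out_of_domain = is_rejected("INSERT INTO Students VALUES ('53688', 'Smith', -5)")

print(null_key, duplicate_key, out_of_domain)
```

All three inserts are refused, so the table still holds only the original Jones tuple.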
Types of Sets
Subset: A set S2 is a subset of another set S1 if every element in S2 is in S1. S1 may have exactly
the same elements as S2.
Proper Subset: A set S2 is a proper subset of another set S1 if every element in S2 is in S1 and S1
has some elements, which are not in S2.
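A quick check of the two definitions, as a minimal sketch using Python's built-in set type; the element values are made up for illustration.

```python
s1 = {"a", "b", "c"}
s2 = {"a", "b"}

# Subset: every element of s2 is in s1 (the sets may also be equal).
s2_subset_of_s1 = s2 <= s1
s1_subset_of_itself = s1 <= s1

# Proper subset: a subset where s1 also has elements that are not in s2.
s2_proper_subset_of_s1 = s2 < s1
s1_proper_subset_of_itself = s1 < s1

print(s2_subset_of_s1, s1_subset_of_itself,
      s2_proper_subset_of_s1, s1_proper_subset_of_itself)
```

A set is always a subset of itself but never a proper subset of itself.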
Key Information
Primary Key: A set of attributes that uniquely identifies a particular entity (or relationship).
Candidate Key: A minimal set of attributes that uniquely identifies a tuple. The primary key is the key selected by the DBA from among the group of candidate keys.
Foreign Key: A set of fields in one relation used to refer to a tuple in another relation (a foreign key can also refer to its own relation).
Super key: A set of attributes that contains a key.
[Figures omitted: ER diagrams whose layout did not survive extraction. Surviving labels: Dependents (pname, ssn, age), Monitors (ssn, pid, did, until), Reports_To (supervisor, subordinate), Projects (pid, started_on, pbudget), and the attributes ssn, name, lot, cost, did, dname, budget, since.]
note: The relational model is the most widely used model. Recent competitors, the object-oriented
model and the object-relational model, are also starting to emerge.
Relational Query Languages (SQL)
(Example relation instance: cardinality = 3, degree = 5, all rows distinct.)
A major strength of the relational model: supports simple, powerful querying of data.
Queries can be written intuitively, and the DBMS is responsible for efficient evaluation.
The key: precise semantics for relational queries.
SQL was developed by IBM (System R) in the 1970s.
A standard is needed since it is used by many vendors.
Standards: SQL-86, SQL-89 (minor revision), SQL-92 (major revision, current standard), SQL-99 (major extensions)
eg. Find all 18-year-old students:
SELECT *
FROM Student S
WHERE S.age = 18
note: renaming for simplification
eg. Find A grade enrolled students (query over multiple relations):
SELECT S.name, E.cid
FROM Student S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'A'
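To see what the two queries return, here is a minimal sketch that runs them with Python's built-in sqlite3 module. The table definitions and the sample rows are assumptions chosen for illustration, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, just enough to exercise both queries.
conn.executescript("""
    CREATE TABLE Student (sid CHAR(20) PRIMARY KEY, name CHAR(20),
                          login CHAR(20), age INTEGER, gpa REAL);
    CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2));
    INSERT INTO Student VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4);
    INSERT INTO Student VALUES ('53688', 'Smith', 'smith@eecs', 18, 3.2);
    INSERT INTO Student VALUES ('53650', 'Smith', 'smith@math', 19, 3.8);
    INSERT INTO Enrolled VALUES ('53666', 'Carnatic101', 'C');
    INSERT INTO Enrolled VALUES ('53650', 'Topology112', 'A');
""")

# Single-relation query: S is an alias (renaming) for Student.
eighteen = conn.execute(
    "SELECT * FROM Student S WHERE S.age = 18"
).fetchall()

# Two-relation query: the join condition S.sid = E.sid pairs each
# enrollment with its student before the grade filter applies.
a_grades = conn.execute("""
    SELECT S.name, E.cid
    FROM Student S, Enrolled E
    WHERE S.sid = E.sid AND E.grade = 'A'
""").fetchall()

print(eighteen)
print(a_grades)
```

With these rows the first query returns the two 18-year-olds and the second returns the single pair (Smith, Topology112).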
eg: Students take only one course; further, no two students in the course are given the same grade.
CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid),
UNIQUE (cid, grade))
note: the UNIQUE clause declares another (candidate) key besides the primary key.
Foreign Keys, Referential Integrity
Foreign key: Set of fields in one relation used to ‘refer’ to a tuple in another relation (note: must
correspond to primary key of the second relation). Like a ‘logical pointer’.

Enrolled
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
eg. sid is a foreign key referring to Students:
CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES Students)
note: Only students listed in the Students relation should be allowed to enroll for courses.
If all foreign key constraints are enforced, referential integrity is achieved (ie no dangling
references).
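A minimal sketch of a dangling reference being refused, using Python's built-in sqlite3 module. The rows and the helper attempt are made up for illustration, and note that SQLite only enforces foreign keys after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20));
    CREATE TABLE Enrolled (
        sid CHAR(20), cid CHAR(20), grade CHAR(2),
        PRIMARY KEY (sid, cid),
        FOREIGN KEY (sid) REFERENCES Students
    );
    INSERT INTO Students VALUES ('53666', 'Jones');
    INSERT INTO Enrolled VALUES ('53666', 'Carnatic101', 'C');
""")


def attempt(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Enrolling a sid that is not in Students would create a dangling reference.
dangling_insert = attempt(
    "INSERT INTO Enrolled VALUES ('99999', 'History105', 'B')")
# Deleting a student who is still referenced would also leave one behind.
delete_referenced = attempt("DELETE FROM Students WHERE sid = '53666'")

print(dangling_insert, delete_referenced)
```

Both statements are rejected, which is what keeps the references in Enrolled from dangling.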
Views
A view is just a relation, but we store a definition, rather than a set of tuples. (note: views can be
dropped using DROP VIEW command)
eg. CREATE VIEW YoungActiveStudent (name,grade)
AS SELECT S.name, E.grade
FROM Student S, Enrolled E
WHERE S.sid=E.sid and S.age<21
note: Views can be used to present necessary information (a summary), while hiding details in
underlying relation.
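Because only the definition is stored, a view reflects later changes to the underlying relations. The following minimal sketch illustrates this with Python's built-in sqlite3 module; the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base relations plus the view from the example above.
conn.executescript("""
    CREATE TABLE Student (sid CHAR(20) PRIMARY KEY, name CHAR(20), age INTEGER);
    CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2));
    INSERT INTO Student VALUES ('53666', 'Jones', 18);
    INSERT INTO Student VALUES ('53650', 'Smith', 25);
    INSERT INTO Enrolled VALUES ('53666', 'Carnatic101', 'C');
    INSERT INTO Enrolled VALUES ('53650', 'Topology112', 'A');
    CREATE VIEW YoungActiveStudent (name, grade) AS
        SELECT S.name, E.grade
        FROM Student S, Enrolled E
        WHERE S.sid = E.sid AND S.age < 21;
""")

before = conn.execute("SELECT * FROM YoungActiveStudent").fetchall()

# Add a young, enrolled student to the base relations only.
conn.execute("INSERT INTO Student VALUES ('53688', 'Lee', 19)")
conn.execute("INSERT INTO Enrolled VALUES ('53688', 'Reggae203', 'B')")

# The view is re-evaluated from its definition, so the new tuple appears.
after = sorted(conn.execute("SELECT * FROM YoungActiveStudent").fetchall())

print(before)
print(after)
```

The view exposes only name and grade, hiding sid, cid and age from whoever queries it.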
Division
Not supported as a primitive operator, but useful for expressing certain queries (eg. Find sailors who have reserved
all boats)
Let A have 2 fields, x and y; B have only field y:
ie. A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple in A.
Or: If the set of y values (boats) associated with an x value (sailor) in A contains all y values in B, the x value is
in A/B.
In general, x and y can be any lists of fields; y is the list of fields in B, and x ∪ y is the list of fields of A.

Example instances:
A (sno, pno): (s1,p1), (s1,p2), (s1,p3), (s1,p4), (s2,p1), (s2,p2), (s3,p2), (s4,p2), (s4,p4)
B1 (pno): p2
B2 (pno): p2, p4
B3 (pno): p1, p2, p4
A/B1 (sno): s1, s2, s3, s4
A/B2 (sno): s1, s4
A/B3 (sno): s1
Division is not an essential operator; just a useful shorthand (also true of joins, but joins are so
common that systems implement joins specially)
idea: For A/B, compute all x values that are not ‘disqualified’ by some y value in B.
An x value is disqualified if, by attaching a y value from B, we obtain an xy tuple that is not in A.
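The ‘disqualified’ idea translates almost word for word into code. The following is a minimal sketch in Python with relations modelled as sets; the function name divide is hypothetical, and the instances of A and B1 to B3 are my reading of the example figure.

```python
def divide(a, b):
    """Compute A/B, where A is a set of (x, y) pairs and B a set of y values."""
    all_x = {x for x, _ in a}
    # x is disqualified if attaching some y from B gives a pair missing from A.
    disqualified = {x for x in all_x for y in b if (x, y) not in a}
    return all_x - disqualified


A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"),
     ("s3", "p2"),
     ("s4", "p2"), ("s4", "p4")}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}

print(sorted(divide(A, B1)))  # ['s1', 's2', 's3', 's4']
print(sorted(divide(A, B2)))  # ['s1', 's4']
print(sorted(divide(A, B3)))  # ['s1']
```

The larger B gets, the more x values are disqualified, so the quotient can only shrink.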
Query Examples
eg. Find names of sailors who’ve reserved boat #103
1: π sname ((σ bid=103 Reserves) ⋈ Sailors)
2: π sname (σ bid=103 (Reserves ⋈ Sailors))
eg. Find the names of sailors who’ve reserved all ‘interlake’ boats
note: Uses division; schemas of the input relations to / must be carefully chosen:
1: ρ (Temp_sids, (π sid,bid Reserves) / (π bid (σ btype=’interlake’ Boats)))
π sname (Temp_sids ⋈ Sailors)
Summary
The relational model has rigorously defined query languages that are simple and powerful.
Relational algebra is more operational; useful as internal representation for query evaluation plans.
Several ways of expressing a given query; a query optimizer should choose the most efficient
version.
UNION: Can be used to compute the union of any two union-compatible sets of tuples (from SQL queries).
eg. Find sid’s of sailors who’ve reserved a red or a green boat
1:
(SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red')
UNION
(SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='green')
2:
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid
AND (B.color='red' OR B.color='green')
INTERSECT: Can be used to compute the intersection of any two union-compatible sets of tuples.
Included in the SQL/92 standard, but some systems don’t support it.
Contrast symmetry of the UNION and INTERSECT queries with how much the other versions differ.
eg. Find sid’s of sailors who’ve reserved a red and a green boat
1:
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid
AND B.color='red'
INTERSECT
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid
AND B.color='green'
2:
SELECT S.sid
FROM Sailors S, Boats B1, Reserves R1,
Boats B2, Reserves R2
WHERE S.sid=R1.sid AND R1.bid=B1.bid
AND S.sid=R2.sid AND R2.bid=B2.bid
AND (B1.color='red' AND B2.color='green')
note: Also possible using IN
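The set-operator versions and their alternatives should agree on any instance. The following minimal sketch checks that on a tiny made-up instance using Python's built-in sqlite3 module. The rows, the BY_COLOR template and the helper sids are assumptions for illustration, and the parentheses around each SELECT are dropped because SQLite does not accept them around the operands of UNION or INTERSECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical instance: sailor 1 has a red and a green boat,
# sailor 2 only a green one, sailor 3 only a blue one.
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname CHAR(10));
    CREATE TABLE Boats (bid INTEGER PRIMARY KEY, color CHAR(10));
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    INSERT INTO Sailors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Boats VALUES (101, 'red'), (102, 'green'), (103, 'blue');
    INSERT INTO Reserves VALUES (1, 101), (1, 102), (2, 102), (3, 103);
""")

BY_COLOR = ("SELECT S.sid FROM Sailors S, Boats B, Reserves R "
            "WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = '{}'")


def sids(sql):
    """Run a query and collect the returned sids as a set."""
    return {row[0] for row in conn.execute(sql)}


either_union = sids(BY_COLOR.format("red") + " UNION " + BY_COLOR.format("green"))
either_or = sids(
    "SELECT S.sid FROM Sailors S, Boats B, Reserves R "
    "WHERE S.sid = R.sid AND R.bid = B.bid "
    "AND (B.color = 'red' OR B.color = 'green')")

both_intersect = sids(
    BY_COLOR.format("red") + " INTERSECT " + BY_COLOR.format("green"))
both_join = sids(
    "SELECT S.sid FROM Sailors S, Boats B1, Reserves R1, Boats B2, Reserves R2 "
    "WHERE S.sid = R1.sid AND R1.bid = B1.bid "
    "AND S.sid = R2.sid AND R2.bid = B2.bid "
    "AND B1.color = 'red' AND B2.color = 'green'")

print(either_union, either_or, both_intersect, both_join)
```

Swapping UNION for INTERSECT is a one-word change, whereas the join-based "and" query needs a second copy of Boats and Reserves.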
Division in SQL
eg. Find sailors who have reserved all boats
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS ((SELECT B.bid
FROM Boats B)
EXCEPT
(SELECT R.bid
FROM Reserves R
WHERE R.sid=S.sid))
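The query reads as "there is no boat left over once this sailor's reserved boats are removed". The following minimal sketch runs it with Python's built-in sqlite3 module on a made-up instance; the inner parentheses are dropped because SQLite does not accept them around the operands of EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical instance: Ann has reserved both boats, Bob only one.
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname CHAR(10));
    CREATE TABLE Boats (bid INTEGER PRIMARY KEY);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    INSERT INTO Sailors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Boats VALUES (101), (102);
    INSERT INTO Reserves VALUES (1, 101), (1, 102), (2, 101);
""")

# For each sailor S: all boats EXCEPT the boats S reserved must be empty.
reserved_all = conn.execute("""
    SELECT S.sname
    FROM Sailors S
    WHERE NOT EXISTS (SELECT B.bid
                      FROM Boats B
                      EXCEPT
                      SELECT R.bid
                      FROM Reserves R
                      WHERE R.sid = S.sid)
""").fetchall()

print(reserved_all)
```

Bob is excluded because boat 102 survives the EXCEPT for him, so the NOT EXISTS test fails.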
Aggregate Operators
Significant extension of relational algebra.
COUNT (*)
COUNT ( [DISTINCT] A)
SUM ( [DISTINCT] A)
AVG ( [DISTINCT] A)
MAX (A)
MIN (A)
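A minimal sketch of the operators above on a three-row table, using Python's built-in sqlite3 module. The Sailors rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: two sailors share rating 7.
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname CHAR(10),
                          rating INTEGER, age REAL);
    INSERT INTO Sailors VALUES (1, 'Ann', 7, 30.0);
    INSERT INTO Sailors VALUES (2, 'Bob', 7, 40.0);
    INSERT INTO Sailors VALUES (3, 'Cy', 9, 50.0);
""")

stats = conn.execute("""
    SELECT COUNT(*),                  -- number of tuples
           COUNT(DISTINCT S.rating),  -- number of different ratings
           SUM(S.rating),             -- 7 + 7 + 9
           SUM(DISTINCT S.rating),    -- 7 + 9, duplicates removed first
           AVG(S.age),
           MAX(S.age),
           MIN(S.age)
    FROM Sailors S
""").fetchone()

print(stats)
```

DISTINCT removes duplicate values before the operator is applied, which is why the two sums differ.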
Conceptual Evaluation
The cross-product of relation-list is computed, tuples that fail qualification are discarded,
‘unnecessary’ fields are deleted, and the remaining tuples are partitioned into groups by the value of
attributes in grouping-list.
The group-qualification is then applied to eliminate some groups. (note: Expressions in group-
qualification must have a single value per group)
In effect, an attribute in group-qualification that is not an argument of an aggregate op must also
appear in grouping-list. (SQL does not exploit primary key semantics here!)
One answer tuple is generated per qualifying group.
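The evaluation steps above can be traced by hand and compared with what a DBMS returns. The following minimal sketch does so for a hypothetical query ("youngest adult sailor for each rating that has more than one adult sailor") using Python's built-in sqlite3 module; the rows and the query are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical (sid, rating, age) tuples.
rows = [(1, 7, 30.0), (2, 7, 40.0), (3, 9, 50.0), (4, 3, 16.0), (5, 3, 25.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?)", rows)

sql_answer = conn.execute("""
    SELECT S.rating, MIN(S.age)
    FROM Sailors S
    WHERE S.age >= 18          -- qualification
    GROUP BY S.rating          -- grouping-list
    HAVING COUNT(*) > 1        -- group-qualification
""").fetchall()

# The same steps by hand (one relation, so the cross-product is just rows).
qualifying = [r for r in rows if r[2] >= 18]       # discard failing tuples
groups = defaultdict(list)
for _, rating, age in qualifying:                  # drop sid, partition by rating
    groups[rating].append(age)
manual_answer = sorted(
    (rating, min(ages))                            # one answer tuple per group
    for rating, ages in groups.items()
    if len(ages) > 1                               # eliminate small groups
)

print(sql_answer, manual_answer)
```

Rating 3 drops out because the WHERE clause removes the 16-year-old before grouping, leaving a group of one.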
Embedded SQL
SQL commands can be called from within a host language (eg: C or COBOL) program.
SQL statements can refer to host variables (including special variables used to return status).
Must include a statement to connect to the right database.
SQL relations are (multi-) sets of records, with no a priori bound on the number of records. No
such data structure in C.
SQL supports a mechanism called a cursor to handle this.
Cursors
Can declare a cursor on a relation or query statement (which generates a relation).
Can open a cursor, and repeatedly fetch a tuple then move the cursor, until all tuples have been
retrieved.
Can use a special clause, called ORDER BY, in queries that are accessed through a cursor, to
control tuple order. (note: fields in ORDER BY clause must appear in SELECT clause)
The ORDER BY clause, which orders answer tuples, is only allowed in the context of a cursor.
Can also modify/delete tuple pointed to by a cursor.
eg: Cursor that gets names of sailors who’ve reserved a red boat, in alphabetical order
EXEC SQL DECLARE sinfo CURSOR FOR
SELECT S.sname
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
ORDER BY S.sname
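Embedded SQL in C needs a precompiler, so as an analogy only, here is a minimal sketch of the same open, fetch, close pattern using the cursor object of Python's built-in sqlite3 module. The rows are made up for illustration and the query is sorted in ascending (alphabetical) order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical instance: Lubber and Dustin reserved the red boat.
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname CHAR(10));
    CREATE TABLE Boats (bid INTEGER PRIMARY KEY, color CHAR(10));
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    INSERT INTO Sailors VALUES (1, 'Lubber'), (2, 'Dustin'), (3, 'Horatio');
    INSERT INTO Boats VALUES (101, 'red'), (102, 'green');
    INSERT INTO Reserves VALUES (1, 101), (2, 101), (3, 102);
""")

sinfo = conn.cursor()          # plays the role of DECLARE sinfo CURSOR
sinfo.execute("""
    SELECT S.sname
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
    ORDER BY S.sname
""")                           # plays the role of OPEN

names = []
while True:
    row = sinfo.fetchone()     # FETCH one tuple and advance the cursor
    if row is None:            # no more tuples
        break
    names.append(row[0])
sinfo.close()                  # CLOSE

print(names)
```

The host program only ever holds one tuple at a time, which is how a cursor bridges the gap between a multiset of records and ordinary host variables.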
General Constraints
Useful when more general ICs than keys are involved.
Can use queries to express constraint.
note: Constraints can be named.
eg: CREATE TABLE Reserves
( sname CHAR(10),
bid INTEGER,
day DATE,
PRIMARY KEY (bid,day),
CONSTRAINT noInterlakeRes
CHECK ('Interlake' <>
( SELECT B.bname
FROM Boats B
WHERE B.bid=bid)))
eg: CREATE TABLE Sailors
( sid INTEGER,
sname CHAR(10),
rating INTEGER,
age REAL,
PRIMARY KEY (sid),
CONSTRAINT ratingValue
CHECK ( rating >= 1
AND rating <= 10 ))
Constraints Over Multiple Relations
eg: Number of boats + sailors < 100
CREATE TABLE Sailors
( sid INTEGER,
sname CHAR(10),
rating INTEGER,
age REAL,
PRIMARY KEY (sid),
CHECK
( (SELECT COUNT (S.sid) FROM Sailors S)
+ (SELECT COUNT (B.bid) FROM Boats B) < 100 ))
note: This table constraint is awkward and wrong. If Sailors is empty, the number of Boats tuples can be anything! ASSERTION is the right
solution; not associated with either table:
CREATE ASSERTION smallClub
CHECK
( (SELECT COUNT (S.sid) FROM Sailors S)
+ (SELECT COUNT (B.bid) FROM Boats B) < 100 )
Triggers
Trigger: procedure that starts automatically if specified changes occur to the DBMS. Three parts:
Event (activates the trigger)
Condition (tests whether the trigger should run)
Action (what happens if the trigger runs)
eg:
CREATE TRIGGER youngSailorUpdate
AFTER INSERT ON SAILORS
REFERENCING NEW TABLE NewSailors
FOR EACH STATEMENT
INSERT
INTO YoungSailors(sid, name, age, rating)
SELECT sid, name, age, rating
FROM NewSailors N
WHERE N.age <= 18
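The statement-level syntax above is not accepted by every system. As a hedged variant, the following minimal sketch expresses the same idea as a row-level trigger in SQLite (which supports only FOR EACH ROW triggers), run through Python's built-in sqlite3 module. The table layouts and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, name CHAR(10),
                          age REAL, rating INTEGER);
    CREATE TABLE YoungSailors (sid INTEGER, name CHAR(10),
                               age REAL, rating INTEGER);

    CREATE TRIGGER youngSailorUpdate
    AFTER INSERT ON Sailors               -- event
    FOR EACH ROW
    WHEN NEW.age <= 18                    -- condition
    BEGIN                                 -- action
        INSERT INTO YoungSailors (sid, name, age, rating)
        VALUES (NEW.sid, NEW.name, NEW.age, NEW.rating);
    END;

    INSERT INTO Sailors VALUES (1, 'Ann', 17.0, 5);
    INSERT INTO Sailors VALUES (2, 'Bob', 30.0, 8);
""")

young = conn.execute("SELECT sid, name FROM YoungSailors").fetchall()
print(young)
```

Only the insert of the 17-year-old satisfies the condition, so the action runs once and YoungSailors ends up with a single tuple.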
Summary
SQL was an important factor in the early acceptance of the relational model; more natural than
earlier, procedural query languages.
Relationally complete; in fact, significantly more expressive power than relational algebra.
Even queries that can be expressed in RA can often be expressed more naturally in SQL.
Many alternative ways to write a query; optimizer should look for most efficient evaluation plan.
In practice, users need to be aware of how queries are optimized and evaluated for best results.
NULL for unknown field values brings many complications
Embedded SQL allows execution within a host language; cursor mechanism allows retrieval of one
record at a time
APIs such as ODBC and JDBC introduce a layer of abstraction between application and DBMS
SQL allows specification of rich integrity constraints
Triggers respond to changes in the database
Normalization: Done to minimize redundancy and to minimize insertion, deletion and update anomalies in a
database.
First Normal Form (1NF): All values are atomic. All attributes of a relation must contain only one
value per tuple.
Second Normal Form (2NF): All non-key attributes must be fully functionally dependent on the
primary key.
Third Normal Form (3NF): There must be no transitive dependencies between non-key attributes.
Boyce-Codd Normal Form (BCNF): All determinants must be candidate keys.
Deletion anomaly: If we delete all employees with rating 5, we lose the information about the
wage for rating 5!
Refining an ER Diagram
1st diagram translated: Workers(S,N,L,D,S) Departments(D,M,B)
Lots associated with workers.
Suppose all workers in a dept are assigned the same lot: D → L
Redundancy; fixed by: Workers2(S,N,D,S) Dept_Lots(D,L)
Can fine-tune this: Workers2(S,N,D,S) Departments(D,M,B,L)
[Figure, before: Employees (ssn, name, lot), Works_In (since), Departments (did, dname, budget)]
[Figure, after: Employees (ssn, name), Works_In (since), Departments (did, dname, budget, lot)]
eg: Contracts (cid, sid, jid, did, pid, qty, value), and:
C is the key: C → CSJDPQV
Project purchases each part using single contract: JP → C
Dept purchases at most one part from a supplier: SD → P
JP → C, C → CSJDPQV imply JP → CSJDPQV (transitivity)
SD → P implies SDJ → JP (augmentation)
SDJ → JP, JP → CSJDPQV imply SDJ → CSJDPQV (transitivity)
Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!)
Typically, we just want to check if a given FD (X → Y) is in the closure of a set of FDs F. An
efficient check:
Compute attribute closure of X (denoted X+) wrt F:
Set of all attributes A such that X → A is in F+
There is a linear time algorithm to compute this.
Check if Y is in X+
Does F = {A → B, B → C, C D → E} imply A → E?
ie: is A → E in the closure F+? Equivalently, is E in A+?
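The attribute-closure check can be sketched in a few lines of Python. This is a simple fixpoint loop rather than the linear-time algorithm mentioned above; the function names attribute_closure and implies are hypothetical, and each attribute is assumed to be a single letter so that a string like "CD" stands for the set {C, D}.

```python
def attribute_closure(attrs, fds):
    """Return X+ for the attribute string attrs under fds, a list of (lhs, rhs)."""
    closure = set(attrs)
    changed = True
    while changed:                     # repeat until no FD adds anything new
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already derivable, so is the right side.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """Check whether lhs -> rhs is in F+ by testing rhs against lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(attribute_closure("A", F)))   # ['A', 'B', 'C']
print(implies(F, "A", "E"))                # False

contracts = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]
print(implies(contracts, "SDJ", "CSJDPQV"))  # True
```

A+ never picks up D, so C D → E cannot fire and A → E is not implied by F; for Contracts, SDJ+ grows to every attribute, matching the derivation by augmentation and transitivity.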
Normal Forms
Returning to the issue of schema refinement, the first question to ask is whether any refinement is
needed!
If a relation is in a certain normal form (BCNF, 3NF etc.), it is known that certain kinds of problems
are avoided/minimized. This can be used to help us decide whether decomposing the relation will
help.
Role of FDs in detecting redundancy:
Consider a relation R with 3 attributes, ABC.
No FDs hold: There is no redundancy here.
Given A → B: Several tuples could have the same A value, and if so, they’ll all have the
same B value!
Example Decomposition
Decompositions should be used only when needed.
SNLRWH has FDs S → SNLRWH and R → W
Second FD causes violation of 3NF; W values repeatedly associated with R values. Easiest way
to fix this is to create a relation RW to store these associations, and remove W from the main
schema:
ie: we decompose SNLRWH into SNLRH and RW
The information to be stored consists of SNLRWH tuples. If we just store the projections of these
tuples onto SNLRH and RW, are there any potential problems that we should be aware of?
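One way to probe the question is to decompose a small instance and join the pieces back together. The following minimal Python sketch does this on made-up SNLRWH tuples that satisfy R → W. It checks a single instance only, so it illustrates the question rather than answering it in general.

```python
# Hypothetical (S, N, L, R, W, H) tuples; every tuple with the same R has the same W.
snlrwh = {
    (1, "Ann", 48, 8, 10, 40),
    (2, "Bob", 22, 8, 10, 30),
    (3, "Cy", 35, 5, 7, 30),
}

# Projections onto the two schemas of the decomposition.
snlrh = {(s, n, l, r, h) for (s, n, l, r, w, h) in snlrwh}
rw = {(r, w) for (s, n, l, r, w, h) in snlrwh}

# Natural join of the projections on the shared attribute R.
rejoined = {
    (s, n, l, r, w, h)
    for (s, n, l, r, h) in snlrh
    for (r2, w) in rw
    if r == r2
}

# Deleting every rating-5 employee no longer loses the wage for rating 5.
snlrh_without_5 = {t for t in snlrh if t[3] != 5}
wage_for_5_kept = (5, 7) in rw

print(rejoined == snlrwh, len(rw), len(snlrh_without_5), wage_for_5_kept)
```

On this instance the join returns exactly the original tuples, and the rating-to-wage association is stored once per rating instead of once per employee.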