Data Base Management System
Unit -3
Relational Model
Main idea:
Table: relation
Column header: attribute
Row: tuple
Relational schema: name(attributes)
Example: employee(ssno,name,salary)
Attributes:
Each attribute has a domain – domain constraint
Each attribute is atomic: we cannot refer to or directly see
a subpart of the value.
Relation Example
Account Customer
AccountId CustomerId Balance Id Name Addr
150 20 11,000 20 Tom Irvine
160 23 2,300 23 Jane LA
180 23 32,000 32 Jack Riverside
• Database schema consists of
– a set of relation schema
– Account(AccountId, CustomerId, Balance)
– Customer(Id, Name, Addr)
– a set of constraints over the relation schema
– AccountId and CustomerId must be integers
– Name and Addr must be strings of characters
– CustomerId in Account must be one of the Ids in Customer
– etc.
NULL value
Customer(Id, Name, Addr)
Id Name Addr
20 Tom Irvine
23 Jane LA
32 Jack NULL
Attributes can take a special value: NULL
NULL means the value is either not known (we don’t know Jack’s address) or not applicable.
Domain Constraints
Every attribute has a type:
integer, float, date, boolean, string, etc.
An attribute can have a domain. E.g.:
Id > 0
Salary > 0
age < 100
City in {Irvine, LA, Riverside}
An insertion can violate a domain constraint.
The DBMS checks each insertion against the domain constraints and rejects it if it violates one.
Id (Integer) Name (String) City (String)
20 Tom Irvine
23 Jane San Diego (violation: City is not in the domain)
-2 Jack Riverside (violation: Id > 0 fails)
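The check a DBMS performs on insertion can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `DOMAINS` and `violations` are hypothetical, and the rules simply mirror the slide's examples (Id > 0, City in {Irvine, LA, Riverside}).

```python
# Hypothetical domain rules, one predicate per attribute (mirrors the slide).
DOMAINS = {
    "Id": lambda v: isinstance(v, int) and v > 0,
    "Name": lambda v: isinstance(v, str),
    "City": lambda v: v in {"Irvine", "LA", "Riverside"},
}

def violations(row):
    """Return the attributes of `row` whose value falls outside its domain."""
    return [attr for attr, ok in DOMAINS.items() if not ok(row[attr])]

rows = [
    {"Id": 20, "Name": "Tom", "City": "Irvine"},
    {"Id": 23, "Name": "Jane", "City": "San Diego"},
    {"Id": -2, "Name": "Jack", "City": "Riverside"},
]

# A DBMS would reject the rows that report at least one violation.
accepted = [r for r in rows if not violations(r)]
rejected = {r["Name"]: violations(r) for r in rows if violations(r)}
```

Only Tom's row is accepted; Jane's row fails on City and Jack's row fails on Id.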
Key Constraints
Superkey: A Super Key is a set of one or more attributes (columns)
that can uniquely identify a row (tuple) in a table. No two rows in
the table can have the same values for a Super Key. Every
Candidate Key is a Super Key, but not every Super Key is a
Candidate Key.
Any superset of a superkey is also a superkey
There can be multiple superkeys
Log(LogId, AccountId, Xact#, Time, Amount) Illegal
LogID AccountID Xact# Time Amount
1001 111 4 1/12/02 $100
1001 122 4 12/28/01 $20
1003 333 6 9/1/00 $60
Example of Super Key
Table: Employee
Possible Super Keys:
1. {Emp_ID} (unique by itself)
2. {Emp_ID, Name} (contains an extra attribute)
3. {Email} (each email is unique)
4. {Phone} (each phone number is unique)
5. {Emp_ID, Email, Phone, Dept_ID} (still unique but redundant)
Minimal Super Keys like {Emp_ID}, {Email}, {Phone} are
Candidate Keys.
Redundant Super Keys contain extra attributes and are not
minimal.
Emp_ID Name Email Phone Dept_ID
101 Alice [email protected] 9876543210 HR
102 Bob [email protected] 9876543211 IT
103 Charlie [email protected] 9876543212 HR
Keys
Key:
Minimal superkey (no proper subset is a superkey)
If more than one key: choose one as a primary key
Example:
Key 1: LogID (primary key)
Key 2: AccountId, Xact#
Superkeys: all supersets of the keys
Log(LogId, AccountId, Xact#, Time, Amount)
LogID AccountID Xact# Time Amount
1001 111 4 1/12/02 $100 OK
1002 122 4 12/28/01 $20
1003 333 6 9/1/00 $60
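Whether a given table instance is consistent with a proposed superkey can be checked mechanically. The sketch below uses a hypothetical helper `is_superkey`. Note the limitation: keys are declared on the schema, so an instance can only refute a key, never prove it.

```python
ATTRS = ("LogID", "AccountID", "Xact#", "Time", "Amount")

# The legal instance from the slide (amounts stored as plain numbers).
LOG_OK = [
    (1001, 111, 4, "1/12/02", 100),
    (1002, 122, 4, "12/28/01", 20),
    (1003, 333, 6, "9/1/00", 60),
]
# The illegal instance: LogID 1001 appears twice.
LOG_BAD = [
    (1001, 111, 4, "1/12/02", 100),
    (1001, 122, 4, "12/28/01", 20),
    (1003, 333, 6, "9/1/00", 60),
]

def is_superkey(rows, attrs, subset):
    """True if no two rows agree on every attribute in `subset`."""
    idx = [attrs.index(a) for a in subset]
    seen = {tuple(r[i] for i in idx) for r in rows}
    return len(seen) == len(rows)

ok_logid = is_superkey(LOG_OK, ATTRS, ("LogID",))
ok_pair = is_superkey(LOG_OK, ATTRS, ("AccountID", "Xact#"))
ok_xact = is_superkey(LOG_OK, ATTRS, ("Xact#",))
bad_logid = is_superkey(LOG_BAD, ATTRS, ("LogID",))
```

Here `ok_logid` and `ok_pair` are true, while `ok_xact` and `bad_logid` are false: Xact# alone repeats, and the illegal instance repeats LogID.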
Integrity Rules
There are two Integrity Rules that every relation should follow :
1. Entity Integrity (Rule 1)
2. Referential Integrity (Rule 2)
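Entity Integrity states that:
If attribute A of a relation R is part of the primary key of R, then A
cannot be null. The primary key value as a whole must also be unique across
tuples (an individual attribute of a composite key may repeat).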
Referential Integrity Constraints
Given two relations R and S, R has a primary key X (a set of attributes)
A set of attributes Y is a foreign key of S if:
Attributes in Y have same domains as attributes X
For every tuple s in S, there exists a tuple r in R: s[Y] = r[X].
A referential integrity constraint from attributes Y of S to R means that Y is
a foreign key that refers to the primary key of R.
Each foreign key value must either equal the primary key value of some tuple in R or be entirely null.
[Figure: tuple s in S, through its foreign key Y, refers to tuple r in R with primary key X.]
Examples of Referential Integrity
Account Customer
AccountId CustomerId Balance Id Name Addr
150 20 11,000 20 Tom Irvine
160 23 2,300 23 Jane LA
180 23 32,000 32 Jack Riverside
Account.customerId to Customer.Id
Student Dept
Id Name Dept Name chair
1111 Mike ICS ICS Tom
2222 Harry CE CE Jane
3333 Ford ICS MATH Jack
Student.dept to Dept.name: every value of Student.dept must also be a
value of Dept.name.
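A referential integrity check amounts to looking for foreign key values that match no primary key. A minimal sketch, assuming rows are Python dicts, `None` stands for NULL, and `dangling_foreign_keys` is a hypothetical helper name:

```python
def dangling_foreign_keys(child_rows, fk, parent_rows, pk):
    """Return child rows whose foreign key is non-NULL and matches no parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [r for r in child_rows if r[fk] is not None and r[fk] not in parent_keys]

student = [
    {"Id": 1111, "Name": "Mike", "Dept": "ICS"},
    {"Id": 2222, "Name": "Harry", "Dept": "CE"},
    {"Id": 3333, "Name": "Ford", "Dept": "ICS"},
]
dept = [
    {"Name": "ICS", "Chair": "Tom"},
    {"Name": "CE", "Chair": "Jane"},
    {"Name": "MATH", "Chair": "Jack"},
]

# The slide's instance satisfies Student.Dept -> Dept.Name.
clean = dangling_foreign_keys(student, "Dept", dept, "Name")

# A made-up extra row (not in the slides) that refers to a missing department.
extra = student + [{"Id": 4444, "Name": "Ann", "Dept": "EE"}]
broken = dangling_foreign_keys(extra, "Dept", dept, "Name")
```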
Relational Algebra
Relational Algebra is :
1. The formal description of how a relational database
operates
2. An interface to the data stored in the database itself.
3. The mathematics which underpin SQL operations
The DBMS must take whatever SQL statements the
user types in and translate them into relational algebra
operations before applying them to the database.
Operators - Retrieval
There are two groups of operations:
1. Mathematical set theory based relations:
UNION, INTERSECTION, DIFFERENCE, and
CARTESIAN PRODUCT.
2. Special database oriented operations:
SELECT , PROJECT and JOIN.
Symbolic Notation
SELECT σ (sigma)
PROJECT π (pi)
PRODUCT × (times)
JOIN ⋈ (bow-tie)
UNION ∪ (cup)
INTERSECTION ∩ (cap)
DIFFERENCE − (minus)
RENAME ρ (rho)
SET Operations - requirements
For set operations to function correctly the relations
R and S must be union compatible. Two relations
are union compatible if
They have the same number of attributes
The domain of each attribute in column order is
the same in both R and S.
Set Operations - semantics
Consider two relations R and S.
UNION of R and S
the union of two relations is a relation that includes all
the tuples that are either in R or in S or in both R and S.
Duplicate tuples are eliminated.
INTERSECTION of R and S
the intersection of R and S is a relation that includes
all tuples that are both in R and S.
DIFFERENCE of R and S
the difference of R and S is the relation that contains
all the tuples that are in R but that are not in S.
Union (∪), Intersection (∩), Difference (−)
Set operators. Relations must have the same
schema.
R(name, dept)
Name Dept
Jack Physics
Tom ICS
S(name, dept)
Name Dept
Jack Physics
Mary Math
R ∪ S
Name Dept
Jack Physics
Tom ICS
Mary Math
R ∩ S
Name Dept
Jack Physics
R − S
Name Dept
Tom ICS
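Because relations are sets of tuples, the three set operators map directly onto Python's built-in `set` type. A small sketch using the R and S instances above:

```python
# Each relation is a set of (name, dept) tuples; both share the same schema.
R = {("Jack", "Physics"), ("Tom", "ICS")}
S = {("Jack", "Physics"), ("Mary", "Math")}

union = R | S          # tuples in R, in S, or in both (duplicates collapse)
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R that are not in S
```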
Relational SELECT
SELECT is used to obtain a subset of the tuples of a
relation that satisfy a select condition.
For example, find all employees born after 1st Jan 1950:
SELECT dob > '01/JAN/1950' (employee)
or
σ dob > '01/JAN/1950' (employee)
Conditions can be combined together using ∧ (AND) and ∨
(OR). For example, all employees in department 1 called
'Smith':
σ depno = 1 ∧ surname = 'Smith' (employee)
Selection
σ_c(R): return tuples in R that satisfy condition c.
Emp (name, dept, salary)
Name Dept Salary
Jane ICS 30K
Jack Physics 30K
Tom ICS 75K
Joe Math 40K
Jack Math 50K
σ_{salary>35K}(Emp)
Name Dept Salary
Tom ICS 75K
Joe Math 40K
Jack Math 50K
σ_{dept=ICS and salary<40K}(Emp)
Name Dept Salary
Jane ICS 30K
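Selection is a row filter. The sketch below assumes tuples of the form (name, dept, salary in thousands) and a hypothetical helper `select`:

```python
EMP = [
    ("Jane", "ICS", 30),
    ("Jack", "Physics", 30),
    ("Tom", "ICS", 75),
    ("Joe", "Math", 40),
    ("Jack", "Math", 50),
]

def select(relation, predicate):
    """Keep the tuples for which `predicate` holds; the schema is unchanged."""
    return [t for t in relation if predicate(t)]

high_paid = select(EMP, lambda t: t[2] > 35)
ics_low = select(EMP, lambda t: t[1] == "ICS" and t[2] < 40)
```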
Relational PROJECT
The PROJECT operation is used to select a subset of the attributes of a
relation by specifying the names of the required attributes.
For example, to get a list of all employees with their salary
PROJECT ename, salary (employee)
OR
πename, salary(employee)
Projection
π_{A1,…,Ak}(R): pick columns of attributes A1,…,Ak of R.
Emp (name, dept, salary)
Name Dept Salary
Jane ICS 30K
Jack Physics 30K
Tom ICS 75K
Joe Math 40K
Jack Math 50K
π_{name,dept}(Emp)
Name Dept
Jane ICS
Jack Physics
Tom ICS
Joe Math
Jack Math
π_{name}(Emp)
Name
Jane
Jack
Tom
Joe
Duplicates (“Jack”) eliminated.
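Projection keeps the requested columns and then removes duplicate tuples. A minimal sketch, with `project` as a hypothetical helper that preserves first-seen order:

```python
ATTRS = ("name", "dept", "salary")
EMP = [
    ("Jane", "ICS", 30),
    ("Jack", "Physics", 30),
    ("Tom", "ICS", 75),
    ("Joe", "Math", 40),
    ("Jack", "Math", 50),
]

def project(relation, attrs, wanted):
    """Keep only the `wanted` columns and drop duplicate result tuples."""
    idx = [attrs.index(a) for a in wanted]
    out = []
    for t in relation:
        p = tuple(t[i] for i in idx)
        if p not in out:  # set semantics: each tuple appears once
            out.append(p)
    return out

name_dept = project(EMP, ATTRS, ["name", "dept"])
names = project(EMP, ATTRS, ["name"])
```

`name_dept` keeps five tuples, while `names` keeps four because the second "Jack" collapses into the first.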
CARTESIAN PRODUCT
The Cartesian Product is also an operator which
works on two sets. It is sometimes called the
CROSS PRODUCT or CROSS JOIN.
It combines the tuples of one relation with all the
tuples of the other relation.
Cartesian Product:
R × S: pair each tuple r in R with each tuple s in S.
Emp (name, dept)
Name Dept
Jack Physics
Tom ICS
Contact(name, addr)
Name Addr
Jack Irvine
Tom LA
Mary Riverside
Emp × Contact
E.name Dept C.Name Addr
Jack Physics Jack Irvine
Jack Physics Tom LA
Jack Physics Mary Riverside
Tom ICS Jack Irvine
Tom ICS Tom LA
Tom ICS Mary Riverside
JOIN Example
JOIN is used to combine related tuples from two
relations R and S.
In its simplest form the JOIN operator is just the
cross product of the two relations and is represented
as (R ⋈ S).
JOIN allows you to evaluate a join condition between
the attributes of the relations on which the join is
undertaken.
The notation used is R ⋈_C S, where the join condition C is written as a subscript.
Join
R ⋈_C S = σ_C(R × S)
• Join condition C is of the form:
<cond_1> AND <cond_2> AND … AND <cond_k>
Each cond_i is of the form A op B, where:
– A is an attribute of R, B is an attribute of S
– op is a comparison operator: =, <, >, ≤, ≥, or ≠.
• Different types:
– Theta-join
– Equi-join
– Natural join
Theta-Join
R ⋈_{R.A>S.C} S
R(A,B)
A B
3 4
5 7
S(C,D)
C D
2 7
6 8
R × S
R.A R.B S.C S.D
3 4 2 7
3 4 6 8
5 7 2 7
5 7 6 8
Result
R.A R.B S.C S.D
3 4 2 7
5 7 2 7
Theta-Join
R ⋈_{R.A>S.C, R.B≠S.D} S
R(A,B)
A B
3 4
5 7
S(C,D)
C D
2 7
6 8
R × S
R.A R.B S.C S.D
3 4 2 7
3 4 6 8
5 7 2 7
5 7 6 8
Result
R.A R.B S.C S.D
3 4 2 7
Equi-Join
Special kind of theta-join: C only uses the equality operator.
R(A,B)
A B
3 4
5 7
S(C,D)
C D
2 7
6 8
R ⋈_{R.B=S.D} S
R × S
R.A R.B S.C S.D
3 4 2 7
3 4 6 8
5 7 2 7
5 7 6 8
Result
R.A R.B S.C S.D
5 7 2 7
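The definition "join = selection over the Cartesian product" translates almost literally into code. A sketch using the R(A,B) and S(C,D) instances from the theta-join and equi-join slides; `theta_join` is a hypothetical helper name:

```python
from itertools import product

R = [(3, 4), (5, 7)]   # R(A, B)
S = [(2, 7), (6, 8)]   # S(C, D)

def theta_join(r, s, cond):
    """Pair every tuple of r with every tuple of s, keep pairs satisfying cond."""
    return [a + b for a, b in product(r, s) if cond(a, b)]

cross = theta_join(R, S, lambda a, b: True)           # plain Cartesian product
theta = theta_join(R, S, lambda a, b: a[0] > b[0])    # R.A > S.C
equi = theta_join(R, S, lambda a, b: a[1] == b[1])    # R.B = S.D
```

A real DBMS avoids materializing the full product; this sketch only mirrors the algebraic definition.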
Natural-Join
Relations R and S. Let L be the union of their attributes.
Let A1,…,Ak be their common attributes.
R ⋈ S = π_L(R ⋈_{R.A1=S.A1,…,R.Ak=S.Ak} S)
Natural-Join
Emp (name, dept)
Name Dept
Jack Physics
Tom ICS
Contact(name, addr)
Name Addr
Jack Irvine
Tom LA
Mary Riverside
Emp ⋈ Contact: all employee names, depts, and addresses.
Emp × Contact
Emp.name Emp.Dept Contact.name Contact.addr
Jack Physics Jack Irvine
Jack Physics Tom LA
Jack Physics Mary Riverside
Tom ICS Jack Irvine
Tom ICS Tom LA
Tom ICS Mary Riverside
Result
Name Dept Addr
Jack Physics Irvine
Tom ICS LA
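A natural join matches tuples on all attributes the two relations share and merges those attributes into one column. A minimal sketch, assuming rows are dicts keyed by attribute name and `natural_join` is a hypothetical helper:

```python
def natural_join(r, s):
    """Join rows that agree on every common attribute; common columns appear once."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out

emp = [{"name": "Jack", "dept": "Physics"}, {"name": "Tom", "dept": "ICS"}]
contact = [
    {"name": "Jack", "addr": "Irvine"},
    {"name": "Tom", "addr": "LA"},
    {"name": "Mary", "addr": "Riverside"},
]

joined = natural_join(emp, contact)
```

Mary has no matching Emp tuple, so she does not appear in `joined`.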
Outer Joins
Motivation: “join” can lose information
E.g.: natural join of R and S loses info about Tom, Mike,
and Mary, since they do not join with other tuples.
Called “dangling tuples”.
R
Name Dept
Jack Physics
Tom ICS
S
Name Addr
Jack Irvine
Mike LA
Mary Riverside
• Outer join: natural join, but use NULL values to fill in dangling tuples.
• Three types: “left”, “right”, or “full”
Left Outer Join
R
Name Dept
Jack Physics
Tom ICS
S
Name Addr
Jack Irvine
Mike LA
Mary Riverside
Left outer join R ⟕ S
Name Dept Addr
Jack Physics Irvine
Tom ICS NULL
Pad null value for left dangling tuples.
Right Outer Join
R
Name Dept
Jack Physics
Tom ICS
S
Name Addr
Jack Irvine
Mike LA
Mary Riverside
Right outer join R ⟖ S
Name Dept Addr
Jack Physics Irvine
Mike NULL LA
Mary NULL Riverside
Pad null value for right dangling tuples.
Full Outer Join
R
Name Dept
Jack Physics
Tom ICS
S
Name Addr
Jack Irvine
Mike LA
Mary Riverside
Full outer join R ⟗ S
Name Dept Addr
Jack Physics Irvine
Tom ICS NULL
Mike NULL LA
Mary NULL Riverside
Pad null values for both left and right dangling tuples.
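The three outer joins differ only in which dangling tuples they keep. The sketch below is an illustration under assumptions: rows are dicts, the join is on a single shared attribute `key`, `None` plays the role of NULL, and `outer_join` is a hypothetical helper.

```python
R = [{"name": "Jack", "dept": "Physics"}, {"name": "Tom", "dept": "ICS"}]
S = [
    {"name": "Jack", "addr": "Irvine"},
    {"name": "Mike", "addr": "LA"},
    {"name": "Mary", "addr": "Riverside"},
]

def outer_join(r, s, key, how="full"):
    """Natural join on `key`, then pad dangling tuples with None (NULL)."""
    r_only = {k for row in r for k in row} - {key}
    s_only = {k for row in s for k in row} - {key}
    out = [{**a, **b} for a in r for b in s if a[key] == b[key]]
    matched = {row[key] for row in out}
    if how in ("left", "full"):    # keep left dangling tuples
        out += [{**a, **dict.fromkeys(s_only)} for a in r if a[key] not in matched]
    if how in ("right", "full"):   # keep right dangling tuples
        out += [{**dict.fromkeys(r_only), **b} for b in s if b[key] not in matched]
    return out

left = outer_join(R, S, "name", "left")
right = outer_join(R, S, "name", "right")
full = outer_join(R, S, "name", "full")
```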
Joins Revised
Result of applying these joins in a query:
INNER JOIN: Select only those rows that have values in common in the
columns specified in the ON clause.
LEFT, RIGHT, or FULL OUTER JOIN: Select all rows from the table on the left (or
right, or both) regardless of whether the other table has values in common
and (usually) enter NULL where data is missing.
Combining Different Operations
Construct general expressions using basic operations.
Schema of each operation:
∪, ∩, −: same as the schema of the two relations
Selection σ: same as the relation’s schema
Projection π: attributes in the projection
Cartesian product ×: attributes in two relations, use prefix
to avoid confusion
Theta Join ⋈_C: same as ×
Natural Join ⋈: union of relations’ attributes, merge
common attributes
Renaming ρ: new renamed attributes
Example 1
customer(ssn, name, city)
account(custssn, balance)
“List account balances of Tom.”
π_balance(σ_{custssn=ssn}(account × σ_{name=tom}(customer)))
Tree representation:
π_balance
  σ_{custssn=ssn}
    ×
      account
      σ_{name=tom}
        customer
Example 1(cont)
customer(ssn, name, city)
account(custssn, balance)
“List account balances of Tom.”
π_balance(account ⋈_{ssn=custssn} σ_{name=tom}(customer))
Tree representation:
π_balance
  ⋈_{ssn=custssn}
    account
    σ_{name=tom}
      customer
Comparing RA and SQL
Relational algebra:
is closed (the result of every expression is a relation)
has a rigorous foundation
has simple semantics
is used for reasoning, query optimisation, etc.
SQL:
is a superset of relational algebra
has convenient formatting features, etc.
provides aggregate functions
has complicated semantics
is an end-user language.
Functional Dependencies and Normalization
Schema Normalization
Decompose relational schemes to
remove redundancy
remove anomalies
Result of normalization:
Semantically-equivalent relational scheme
Represent the same information as the original
Be able to reconstruct the original from
decomposed relations.
Functional Dependencies
Motivation: avoid redundancy in database design.
Relation R(A1,...,An,B1,...,Bm,C1,...,Cl)
Definition: A1,...,An functionally determine
B1,...,Bm,i.e.,
(A1,...,An →B1,...,Bm)
iff for any two tuples r1 and r2 in R,
r1(A1,...,An ) = r2(A1,...,An )
implies r1(B1,...,Bm) = r2(B1,...,Bm)
By definition: a superkey → all attributes of the
relation.
Example
Take(StudentID, CID, Semester, Grade)
FD: (StudentId,Cid,semester) → Grade
StudentId Cid Semester Grade
1111 ICS184 Winter 02 A
1111 ICS184 Winter 02 B Illegal
2222 ICS143 Fall 01 A-
What if FD: (StudentId, Cid) → Semester?
StudentId Cid Semester Grade
1111 ICS184 Winter 02 A
1111 ICS184 Spring 02 A Illegal
2222 ICS143 Fall 01 A-
“Each student can take a course only once.”
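Checking whether a table instance satisfies an FD follows the definition directly: tuples that agree on the left side must agree on the right side. A sketch with a hypothetical helper `satisfies_fd`, using the Take rows above:

```python
def satisfies_fd(rows, lhs, rhs):
    """True if any two rows that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:  # same lhs seen before with another rhs
            return False
    return True

def take(sid, cid, sem, grade):
    return {"StudentId": sid, "Cid": cid, "Semester": sem, "Grade": grade}

legal = [take(1111, "ICS184", "Winter 02", "A"), take(2222, "ICS143", "Fall 01", "A-")]
two_grades = legal + [take(1111, "ICS184", "Winter 02", "B")]
two_terms = legal + [take(1111, "ICS184", "Spring 02", "A")]

key = ["StudentId", "Cid", "Semester"]
fd1_legal = satisfies_fd(legal, key, ["Grade"])
fd1_bad = satisfies_fd(two_grades, key, ["Grade"])
fd2_bad = satisfies_fd(two_terms, ["StudentId", "Cid"], ["Semester"])
```

As with keys, an instance can only show that an FD is violated; an FD is a statement about every legal instance.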
FD Sets
A set of FDs on a relation: e.g., R(A,B,C), {A→B,
B→C, A→C, AB→A}
Some dependencies can be derived
e.g., A→C can be derived from {A→B, B→C}.
Some dependencies are trivial
e.g., AB→A is “trivial.”
Trivial Dependencies
Those that are true for every relation
A1 A2…An → B1 B2…Bm is trivial if B’s are a subset of the
A’s.
Example: XY → X (here X is a subset of XY)
Called nontrivial if at least one of the B’s is not among the A’s, and
completely nontrivial if none of the B’s is one of the A’s.
Example: AB→C (no attribute on the right side of the FD also appears
on the left side)
Closure of FD Set
Definition: Let F be a set of FDs of a relation R.
We use F+ to denote the set of all FDs that must
hold over R, i.e.:
F+ = { X → Y | F logically implies X → Y}
F+ is called the closure of F.
Example: F = {A→B, B→C}, then A→C is in F+.
Armstrong’s Axioms: Inferring All FDs
Given a set of FDs F over a relation R, how to compute F+?
• Reflexivity:
– If Y is a subset of X, then X →Y.
– Example: AB→A, ABC→AB, etc.
• Augmentation:
– If X→Y, then XZ→YZ.
– Example: If A→B, then AC→BC.
• Transitivity:
– If X→Y, and Y→Z, then X→Z.
– Example: If AB→C, and C→D, then AB→D.
More Rules Derived from Armstrong’s Axioms
Union Rule (or additivity):
If X→Y, X→Z, then X→YZ
Projectivity
If X→YZ, then X→Y and X→Z
Pseudo-Transitivity Rule:
If X→Y, WY→Z, then WX→Z
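Enumerating all of F+ with the axioms is expensive. A common substitute, not presented in these slides and named here plainly as the attribute-closure algorithm, decides whether F implies X→Y by computing every attribute X determines. The sketch assumes single-letter attribute names so that a string such as "WY" denotes the set {W, Y}; `closure` and `implies` are hypothetical helper names.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever the whole lhs is already determined.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+ (rhs lies inside the closure of lhs)."""
    return set(rhs) <= closure(lhs, fds)

F = [("A", "B"), ("B", "C")]
a_plus = closure("A", F)
transitive = implies(F, "A", "C")
reverse = implies(F, "C", "A")
pseudo = implies([("X", "Y"), ("WY", "Z")], "WX", "Z")
```

The last call reproduces the pseudo-transitivity rule: from X→Y and WY→Z, WX→Z follows.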
The Normalization Process
In relational databases the term normalization refers to a reversible step-
by-step process in which a given set of relations is decomposed into a set
of smaller relations that have a progressively simpler and more regular
structure.
The objectives of the normalization process are:
To make it feasible to represent any relation in the
database.
applies to First Normal Form
To free relations from undesirable insertion, update and
deletion anomalies.
applies to all normal forms
The Normalization Process
The entire normalization process is based
upon
the analysis of relations
their schemes
their primary keys
their functional dependencies.
Normalization
[Figure: nested levels of normalization, from outermost to innermost:
Unnormalized relations
First normal form: functional dependency of nonkey attributes on the primary key; atomic values only
Second normal form: full functional dependency of nonkey attributes on the primary key
Third normal form: no transitive dependency between nonkey attributes
Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency]
Normal Forms
1st Normal Form No repeating data groups
2nd Normal Form No partial key dependency
3rd Normal Form No transitive dependency
Boyce-Codd Normal Form Every determinant is a candidate key
Unnormalized Relations
First step in normalization is to convert the data into a
two-dimensional table
A relation is said to be unnormalized if it does not contain
atomic values.
Eg of Unnormalized Relation
Columns are separated by " | "; multiple values within one cell are separated by ";".
Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St., New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St., Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane, Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook, Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road, Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline | Fever
First Normal Form
To move to First Normal Form a relation must
contain only atomic values at each row and
column.
No repeating groups
Relation in 1NF contains only atomic
values.
First Normal Form
Three formal definitions of First Normal Form
A relation r is said to be in First Normal Form (1NF) if and
only if every entry of the relation (each cell) has at most a
single value.
A relation is in first normal form (1NF) if and only if all
underlying simple domains contain atomic values only.
A relation is in 1NF if and only if all of its attributes are
based upon a simple domain.
These three definitions are equivalent.
If all relations of a database are in 1NF, we can say that
the database is in 1NF.
Eg of First Normal Form
The normalized representation of the PROJECT table
PROJECT
Proj-ID Proj-Name Proj-Mgr-ID Emp-ID Emp-Name Emp-Dpt Emp-Hrly-Rate Total-Hrs
100 E-commerce 789487453 123423479 Heydary MIS 65 10
100 E-commerce 789487453 980808980 Jones TechSupport 45 6
100 E-commerce 789487453 234809000 Alexander TechSupport 35 6
100 E-commerce 789487453 542298973 Johnson TechDoc 30 12
110 Distance-Ed 820972445 432329700 Mantle MIS 50 5
110 Distance-Ed 820972445 689231199 Richardson TechSupport 35 12
110 Distance-Ed 820972445 712093093 Howard TechDoc 30 8
120 Cyber 980212343 834920043 Lopez Engineering 80 4
120 Cyber 980212343 380802233 Harrison TechSupport 35 11
120 Cyber 980212343 553208932 Olivier TechDoc 30 12
120 Cyber 980212343 123423479 Heydary MIS 65 07
130 Nitts 550227043 340783453 Shaw MIS 65 07
First Normal Form
This normalized PROJECT table is not a relation
because it does not have a primary key.
The attribute Proj-ID no longer identifies uniquely
any row.
To transform this table into a relation a primary key
needs to be defined.
A suitable PK for this table is the composite key
(Proj-ID, Emp-ID)
No other combination of the attributes of the table
will work as a PK.
Partial Dependencies
Identifying the partial dependencies in the PROJECT-
EMPLOYEE relation.
The PK of this relation is formed by the attributes Proj-ID
and Emp-ID.
This implies that {Proj-ID, Emp-ID} uniquely identifies a
tuple in the relation.
They functionally determine any individual attribute or
any combination of attributes of the relation.
However, we only need attribute Emp-ID to functionally
determine the following attributes:
Emp-Name, Emp-Dpt, Emp-Hrly-Rate.
Second Normal Form
And we need only Proj-Id attribute to functionally determine
proj_name and Proj_Mgr_Id.
So, we decompose the relation into following two relations:
PROJECT
Proj-ID Proj-Name Proj-Mgr-ID
100 E-commerce 789487453
110 Distance-Ed 820972445
120 Cyber 980212343
130 Nitts 550227043
Second Normal Form
EMPLOYEE
Emp-ID Emp-Name Emp-Dpt Emp-Hrly-Rate
123423479 Heydary MIS 65
980808980 Jones TechSupport 45
234809000 Alexander TechSupport 35
542298973 Johnson TechDoc 30
432329700 Mantle MIS 50
689231199 Richardson TechSupport 35
712093093 Howard TechDoc 30
834920043 Lopez Engineering 80
380802233 Harrison TechSupport 35
553208932 Olivier TechDoc 30
340783453 Shaw MIS 65
There are no partial dependencies in either table
because the key of each table has only a single
attribute.
For example:
Proj-ID → Proj-Name, Proj-Mgr-ID
Emp-ID → Emp-Name, Emp-Dpt, Emp-Hrly-Rate
To relate these two relations, we create a third table
(relationship table) that consists of the primary keys of
both relations as foreign keys, plus the attribute
‘Total-Hrs-Worked’, which is fully dependent on the key
{Proj-Id, Emp-Id} of that table.
Second Normal Form
A relation is said to be in Second Normal Form if it is in 1NF and
every non-key attribute is fully functionally dependent on
the primary key.
Or: no nonprime attribute is partially dependent on any key.
Now, the example relation scheme is in 2NF with following
relations:
Project (Proj-Id, Proj-Name, Proj-Mgr-Id)
Employee (Emp-Id, Emp-Name, Emp_dept, Emp-Hrly-Rate )
Proj_Emp (Proj-id, Emp-Id, Total-Hrs-Worked)
Data Anomalies in 2NF Relations
Insertion anomalies occur in the EMPLOYEE
relation.
Consider a situation where we would like to set
in advance the rate to be charged by the
employees of a new department.
We cannot insert this information until there is an
employee assigned to that department.
Notice that the rate that a department charges
is independent of whether or not it has
employees.
Data Anomalies in 2NF Relations
The EMPLOYEE relation is also susceptible to
deletion anomalies.
This type of anomaly occurs whenever we delete
the tuple of an employee who happens to be the
only employee left in a department.
In this case, we will also lose the information
about the rate that the department charges.
Data Anomalies in 2NF Relations
Update anomalies will also occur in the EMPLOYEE
relation because there may be several employees from
the same department working on different projects.
If the department rate changes, we need to make
sure that the corresponding rate is changed for all
employees that work for that department.
Otherwise the database may end up in an
inconsistent state.
Transitive Dependencies
A transitive dependency is a functional dependency which holds by virtue of
transitivity. A transitive dependency can occur only in a relation that has three
or more attributes. Let A, B, and C designate three distinct attributes and
let the following conditions hold:
1. A→B (where A is the key of the relation)
2. B→C
Then the functional dependency A → C (which follows from 1 and 2 by the
axiom of transitivity) is a transitive dependency.
For eg: If in a relation Book is the key and
{Book} → {Author}
{Author} → {Nationality}
Therefore {Book} → {Nationality} is a transitive dependency.
Transitive dependency occurs when a non-key attribute determines another
non-key attribute.
Transitive Dependencies
Assume the following functional dependencies of
attributes A, B and C of relation r(R):
A → B → C
Third Normal Form
A relation is in 3NF iff it is in 2NF and every non key attribute is non
transitively dependent on the primary key.
A relation r(R) is in Third Normal Form (3NF) if and only if the following
conditions are satisfied simultaneously:
r(R) is already in 2NF.
No nonprime attribute is transitively dependent on the key.
The objective of transforming relations into 3NF is to remove all transitive
dependencies.
Given a relation R with FDs F, test if R is in 3NF.
Compute all the candidate keys of R
For each X→Y in F, check if it violates 3NF
If X is not a superkey, and Y is not part of a candidate key, then
X→Y violates 3NF.
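The test above can be sketched in Python. This is an illustration under assumptions: candidate keys are found by brute force over attribute subsets (fine for small schemas only), superkeys are recognized through attribute closure, and the helper names are hypothetical. The FDs used for EMPLOYEE are the ones stated in these slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            minimal = not any(k <= set(combo) for k in keys)
            if minimal and closure(combo, fds) == set(schema):
                keys.append(set(combo))
    return keys

def violations_3nf(schema, fds):
    """Pairs (X, A) where X -> A breaks 3NF: X not a superkey, A not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(schema):
            continue  # X is a superkey, so this FD is fine
        bad += [(lhs, a) for a in rhs if a not in lhs and a not in prime]
    return bad

EMPLOYEE = {"Emp-ID", "Emp-Name", "Emp-Dpt", "Emp-Hrly-Rate"}
FDS = [(("Emp-ID",), ("Emp-Name", "Emp-Dpt")), (("Emp-Dpt",), ("Emp-Hrly-Rate",))]

keys = candidate_keys(EMPLOYEE, FDS)
bad = violations_3nf(EMPLOYEE, FDS)
```

The only violation reported is Emp-Dpt → Emp-Hrly-Rate, since Emp-Dpt is not a superkey and Emp-Hrly-Rate is nonprime.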
Conversion to Third Normal Form
[Figure: the relation (A*, B, C), with A → B and B → C, is converted to two relations, (A*, B) and (B*, C).]
* indicates the key or the determinant of the relation.
Third Normal Form
Using the general procedure, we will transform our 2NF
relation example to a 3NF relation.
The relation EMPLOYEE is not in 3NF because there is a
transitive dependency of a nonprime attribute on the primary
key of the relation.
In this case, the nonprime attribute Emp-Hrly-Rate is
transitively dependent on the key through the functional
dependency Emp-Dpt → Emp-Hrly-Rate.
To transform this relation into a 3NF relation:
it is necessary to remove any transitive dependency of a
nonprime attribute on the key.
It is necessary to create two new relations.
Third Normal Form
The scheme of the first relation, which we have named EMPLOYEE, is:
EMPLOYEE (Emp-ID, Emp-Name, Emp-Dpt)
The scheme of the second relation, which we have named CHARGES, is:
CHARGES (Emp-Dpt, Emp-Hrly-Rate)
Data Anomalies in Third Normal Form
The Third Normal Form helped us to get rid of the data
anomalies caused either by
transitive dependencies on the PK or
by dependencies of a nonprime attribute on another
nonprime attribute.
However, relations in 3NF are still susceptible to data
anomalies, particularly when
the relations have two overlapping candidate keys or
when a nonprime attribute functionally determines a
prime attribute.
Boyce-Codd Normal Form (BCNF)
• A relation is in BCNF iff every determinant is a candidate key.
OR
• In other words, a relational schema R is in Boyce–Codd normal
form if and only if for every one of its dependencies X → Y, at least
one of the following conditions hold:
• X → Y is a trivial functional dependency (Y ⊆ X)
• X is a superkey for schema R
• The definition of 3NF does not deal with a relation that:
• has multiple candidate keys, where
• those candidate keys are composite, and
• the candidate keys overlap (i.e., have at least one common
attribute)
Example of BCNF
Relation SSP(sid, sname, part_id, qty)
Candidate keys are (sid, part_id)
and (sname, part_id).
With following FDs:
1. { sid, part_id } → qty
2. { sname, part_id } → qty
3. sid → sname
4. sname → sid
The relation is in 3NF:
For sid → sname, … sname is in a candidate key.
For sname → sid, … sid is in a candidate key.
However, this leads to redundancy and loss of information
Example of BCNF
If we decompose the schema into
R1 = ( sid, sname ), R2 = ( sid, part_id, qty )
These are in BCNF.
The decomposition is dependency preserving.
{ sname, part_id } → qty can be deduced from
(1) sname → sid (given)
(2) { sname, part_id } → { sid, part_id } (augmentation on (1))
(3) { sid, part_id } → qty (given)
and finally transitivity on (2) and (3).
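The BCNF condition (every nontrivial FD has a superkey on its left side) can be checked with attribute closure. A sketch with hypothetical helper names; the FDs listed for R1 and R2 are projected by hand from the four FDs of SSP, which is an assumption of this example rather than something the code derives.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial FDs whose left side is not a superkey of `schema`."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(schema)
    ]

SSP = {"sid", "sname", "part_id", "qty"}
SSP_FDS = [
    (("sid", "part_id"), ("qty",)),
    (("sname", "part_id"), ("qty",)),
    (("sid",), ("sname",)),
    (("sname",), ("sid",)),
]

ssp_bad = bcnf_violations(SSP, SSP_FDS)
r1_bad = bcnf_violations({"sid", "sname"}, [(("sid",), ("sname",)), (("sname",), ("sid",))])
r2_bad = bcnf_violations({"sid", "part_id", "qty"}, [(("sid", "part_id"), ("qty",))])
```

SSP fails on sid → sname and sname → sid, while R1 and R2 report no violations.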
3NF vs BCNF
Only in rare cases does a 3NF table not meet the
requirements of BCNF. A 3NF table which does not have
multiple overlapping candidate keys is guaranteed to be in
BCNF. Depending on what its functional dependencies are, a
3NF table with two or more overlapping candidate keys may
or may not be in BCNF.
If a relation schema is not in BCNF
it is possible to obtain a lossless-join decomposition into a
collection of BCNF relation schemas.
Dependency-preserving is not guaranteed.
3NF
There is always a dependency-preserving, lossless-join
decomposition into a collection of 3NF relation schemas.
Properties of a good Decomposition
A decomposition of a relation R into sub-relations R1, R2,…….,
Rn should possess following properties:
The decomposition should be
• Attribute Preserving ( All the attributes in the given relation
must occur in any of the sub – relations)
• Dependency Preserving ( All the FDs in the given relation
must be preserved in the decomposed relations)
• Lossless join ( The natural join of decomposed relations should
produce the same original relation back, without any spurious
tuples).
• No redundancy ( The redundancy should be minimized in the
decomposed relations).
Lossless Join Decomposition
The relation schemas { R1, R2, …, Rn } is a lossless-join decomposition of R
if:
for all possible relations r on schema R,
r = π_R1(r) ⋈ π_R2(r) ⋈ … ⋈ π_Rn(r)
Example:
Student = ( sid, sname, major)
F = { sid → sname, sid → major}
{ sid, sname } + { sid, major } is a lossless join decomposition
the intersection = {sid} is a key in both schemas
{sid, major} + { sname, major } is not a lossless join decomposition
the intersection = {major} is not a key in either
{sid, major} or { sname, major }
Another Example
R = { A, B, C, D }
F = { A → B, C → D }.
Key is {AC}.
Decomposition: { (A, B), (C, D), (A, C) }, where (A, C) is introduced virtually
Consider it a two step decomposition:
1. Decompose R into R1 = (A, B), R2 = (A, C, D)
2. Decompose R2 into R3 = (C, D), R4 = (A, C)
This is a lossless join decomposition.
If R is decomposed into (A, B), (C, D)
This is a lossy-join decomposition.
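For a split into two relations, the standard test is that the common attributes functionally determine all of R1 or all of R2. The sketch below applies that binary test to the examples above; it assumes only functional dependencies are given, and `closure` and `lossless_binary` are hypothetical helper names.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from `fds`."""
    determined = closure(set(r1) & set(r2), fds)
    return set(r1) <= determined or set(r2) <= determined

STUDENT_FDS = [(("sid",), ("sname",)), (("sid",), ("major",))]
good_student = lossless_binary(("sid", "sname"), ("sid", "major"), STUDENT_FDS)
bad_student = lossless_binary(("sid", "major"), ("sname", "major"), STUDENT_FDS)

ABCD_FDS = [("A", "B"), ("C", "D")]   # single-letter attributes as strings
step1 = lossless_binary("AB", "ACD", ABCD_FDS)   # R into (A,B), (A,C,D)
step2 = lossless_binary("CD", "AC", ABCD_FDS)    # (A,C,D) into (C,D), (A,C)
direct = lossless_binary("AB", "CD", ABCD_FDS)   # no common attribute
```

Both steps of the two-step decomposition pass, which is why the three-way decomposition is lossless, while splitting directly into (A, B) and (C, D) fails.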
Fourth Normal Form
A relation R is in 4NF if and only if it satisfies the following
conditions:
R is already in 3NF or in BCNF.
R contains no multivalued dependencies.
MVDs occur when two or more independent multi valued facts
about the same attribute occur within the same relation.
This means that if in a relation R, having A, B and C attributes,
B and C are multi valued represented as A→→B and A→→C,
then MVD exists only if B and C are independent of each other.
Example: 4NF
Fifth Normal Form
A relation R is in 5NF (also called Projection-Join Normal form
or PJNF) iff every join dependency in the relation R is implied
by the candidate keys of the relation R.
A relation decomposed into two relations must have lossless
join property, which ensures that no spurious tuples are
generated when relations are reunited using a natural join.
Some relations need to be decomposed into more
than two relations. Such cases are handled by join
dependencies and 5NF.
5NF implies that relations decomposed in earlier
normal forms can be recombined via natural joins to recreate the
original relation.
Fifth Normal Form
Consider the different case where, if an agent is an agent for a company and that
company makes a product, then he always sells that product for the company.
Under these circumstances, the 'agent company product' table is as shown
below . This relation contains following dependencies.
Agent →→ Company
Agent →→ Product_Name
Company→→Product_Name
agent_company_product_table
Fifth Normal Form
The table is necessary in order to show all the information required.
Suneet, for example, sells ABC's Nuts and Screws, but not ABC's Bolts. Raj is
not an agent for CDE and does not sell ABC's Nuts or Screws. The table is
in 4NF because it contains no multi-valued dependency. It does,
however, contain an element of redundancy in that it records the fact
that Suneet is an agent for ABC twice. Suppose that the table is
decomposed into its two projections, P1 and P2.
The redundancy has been eliminated, but the information about which
companies make which products and which of these products they
supply to which agents has been lost. The natural join of these two
projections will result in some spurious tuples (additional tuples which were
not present in the original relation).
Fifth Normal Form
This table can be decomposed into its three projections without loss of
information as demonstrated below .
If we take the natural join of these relations then we get the original
relation back. So this is the correct decomposition.
decompose into three projection