Chapter 15
Basics of Functional Dependencies and Normalization
Chapter 15 Outline
Problems in Bad DB Design
Functional Dependencies
Normal Forms Based on Primary Keys
General Definitions of Second and Third Normal Forms
Boyce-Codd Normal Form
Problems in Bad DB Design
Student#  Studentname  Course#  CourseName
100       Ali          CS100    C++
100       Ali          CS101    Java
200       Ahmad        CS200    OS
Redundant data
- More space, slower system
- More complex updates
Update anomalies:
- Insertion, deletion, and modification anomalies
Attributes whose values are mostly Null
Ambiguous meaning of Null
- Exists but unknown at present (e.g. Address)
- Not applicable (e.g. student average)
- Applicable but not assigned yet (e.g. student mark)
Update Anomalies
Insert anomalies
You cannot insert a new course unless at least one student is
enrolled in it.
Update anomalies
To change a Studentname, you have to update every row for that student.
Delete anomalies
If a course has only one student, deleting the student also deletes the
course.
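The deletion and update anomalies can be reproduced on the three sample rows. This is a minimal sketch: the helper names (courses, delete_student, rename_student) and the new name "Aly" are invented for illustration.

```python
# Each row mixes student facts and course facts in one table.
enrollment = [
    (100, "Ali", "CS100", "C++"),
    (100, "Ali", "CS101", "Java"),
    (200, "Ahmad", "CS200", "OS"),
]

def courses(rows):
    """Return the set of courses the table still knows about."""
    return {(course_no, course_name) for _, _, course_no, course_name in rows}

def delete_student(rows, student_no):
    """Delete every row that belongs to one student."""
    return [row for row in rows if row[0] != student_no]

def rename_student(rows, student_no, new_name):
    """Rename a student; return the new rows and the number of rows touched."""
    updated = [(s, new_name if s == student_no else n, c, cn)
               for s, n, c, cn in rows]
    touched = sum(1 for row in rows if row[0] == student_no)
    return updated, touched

after_delete = delete_student(enrollment, 200)
_, rows_touched = rename_student(enrollment, 100, "Aly")

print(("CS200", "OS") in courses(after_delete))  # False: the course vanished with its only student
print(rows_touched)                              # 2: one logical change, two physical updates
```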
Functional Dependency
A functional dependency (FD) is a constraint
between two sets of attributes in a relation schema
• If X and Y are two sets of attributes in the same
relation schema R, then X → Y means that X
functionally determines Y.
• FD is a property of the meaning or semantics of
the attributes
• The FD specifies a restriction on the possible
tuples that can form a relation instance r of R
The FD Constraint - Informally
Functional dependency X → Y holds if and only if whenever two
tuples agree on their X-value, they must necessarily agree on
their Y-value
Example: A relation DEPARTMENT (DNO, DNAME, DLOC)
can have the following FDs
FD1: DNO → DLOC
FD2: DNO → DNAME
The FD Constraint - Formally
The FD constraint is that for any two tuples t1 and
t2 in the relation instance r(R) that have:
t1[X] = t2[X]
we must also have t1[Y] = t2[Y]
This means that the values of the Y component of a tuple depend on, or
are determined by, the X component
• The values of the X component of a tuple uniquely (or functionally)
determine the values of the Y component
If X → Y holds, then Y is functionally dependent on X
- X is termed the left-hand-side (LHS) of the FD or determinant
- Y is termed the right-hand-side (RHS) of the FD
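The formal condition t1[X] = t2[X] ⇒ t1[Y] = t2[Y] can be tested directly on a relation instance. The sketch below assumes tuples are stored as Python dicts. The DEPARTMENT values are invented for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by this relation instance.

    rows: list of dicts (one per tuple); lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for t in rows:
        x_value = tuple(t[a] for a in lhs)
        y_value = tuple(t[a] for a in rhs)
        if seen.setdefault(x_value, y_value) != y_value:
            return False  # two tuples agree on X but differ on Y
    return True

# Hypothetical DEPARTMENT instance.
department = [
    {"DNO": 1, "DNAME": "Research", "DLOC": "Amman"},
    {"DNO": 2, "DNAME": "Sales", "DLOC": "Amman"},
    {"DNO": 1, "DNAME": "Research", "DLOC": "Amman"},
]

print(fd_holds(department, ["DNO"], ["DNAME"]))  # True
print(fd_holds(department, ["DLOC"], ["DNO"]))   # False
```

Note that an instance can only refute an FD. Because an FD is a property of the schema's semantics, a True result shows that this particular instance does not violate it, not that the FD holds in general.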
Inference Rules
An inference rule in logic is a procedure
that combines known facts to produce
("infer") new facts
Example: If A is true, and A implies B,
then B is true
There are 6 inference rules: IR1 - IR6
IR1-IR3 are referred to as Armstrong’s
Inference Rules
IR1: Reflexive Rule
If Y ⊆ X then X → Y
A set of attributes always determines itself
or any of its subsets
Example:
Since {ESSN} ⊆ {ESSN, Dependent_Name},
{ESSN, Dependent_Name} → ESSN holds
IR2: Augmentation Rule
If X → Y then XZ → YZ
Adding the same set of attributes to both
the LHS and RHS of an FD results in another
valid FD
Example:
If SSN → Ename then
{SSN, Address} → {Ename, Address}
IR3: Transitive Rule
If X → Y and Y → Z then X → Z
FDs are transitive
Example:
If SSN → Dno and Dno → Dlocation
then SSN → Dlocation
Armstrong's Inference Rules
The rules IR1-IR3 are sound and complete.
• Sound: Every FD derived from F with these rules is logically implied by F
• Complete: Every FD logically implied by F can be derived with these rules
IR4: Decomposition Rule
If X → YZ then X → Y and X → Z
We can remove attributes from the RHS of
a dependency, and decompose the FD
Example:
If SSN → {Ename, Dno} then
SSN → Ename and SSN → Dno
IR5: Additive (Union) Rule
If X → Y and X → Z then X → YZ
We can union attributes from the RHS of
dependencies, and combine a set of FDs
into a single FD (reverse of IR4)
Example:
If SSN → Ename and SSN → Dno then
SSN → {Ename, Dno}
IR6: Pseudotransitive Rule
If X → Y and WY → Z then WX → Z
Represents a variant of IR3
Example:
If SSN → MgrSSN and
{MgrSSN, Dependent_Name} → Relationship then
{SSN, Dependent_Name} → Relationship
Closure of a Set of FDs (F+)
Definition: Given a set F of functional dependencies
on R, the closure of F, denoted by F+, is the set of all
functional dependencies that can be inferred from F using the
inference rules given previously.
To compute F+
Let F+ = F
Apply the inference rules repeatedly until no more
changes occur in F+
Example (1)
Let R(A, B, C, D) be a relation schema and
F = {A → B, A → C, BC → D} be a set of FDs that hold on R.
Find F+
A → B and A → C, then A → BC (IR5)
A → BC and BC → D, then A → D (IR3)
A → B and A → D, then A → BD (IR5)
A → C and A → D, then A → CD (IR5)
A → B and A → C and A → D, then A → BCD (IR5)
Example (2)
Given R(A, B, C, G, H, I) and
F = {A → B, A → C, CG → H, CG → I, B → H}.
We list some members of F+ below
A → B and B → H, therefore A → H (using IR3)
CG → H and CG → I, therefore CG → HI (using IR5)
A → C, then AG → CG (using IR2, by adding G)
AG → CG and CG → I, therefore AG → I (using IR3)
(or, using IR6: A → C and CG → I, therefore AG → I)
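The derivations in Example (2) can be replayed mechanically by writing each inference rule as a small function. This is a sketch under the assumption that an FD is represented as a pair of frozensets. The function names are illustrative, not a standard API.

```python
def fd(lhs, rhs):
    """Build an FD from strings of one-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)

def augment(f, z):
    """IR2: from X -> Y infer XZ -> YZ."""
    x, y = f
    return x | frozenset(z), y | frozenset(z)

def transitive(f, g):
    """IR3: from X -> Y and Y -> Z infer X -> Z."""
    (x, y), (y2, z) = f, g
    if y != y2:
        raise ValueError("RHS of the first FD must equal LHS of the second")
    return x, z

def union(f, g):
    """IR5: from X -> Y and X -> Z infer X -> YZ."""
    (x, y), (x2, z) = f, g
    if x != x2:
        raise ValueError("both FDs must have the same LHS")
    return x, y | z

def pseudo_transitive(f, g):
    """IR6: from X -> Y and WY -> Z infer WX -> Z."""
    (x, y), (wy, z) = f, g
    if not y <= wy:
        raise ValueError("RHS of the first FD must be part of the second LHS")
    return (wy - y) | x, z

a_h = transitive(fd("A", "B"), fd("B", "H"))            # A -> H
cg_hi = union(fd("CG", "H"), fd("CG", "I"))             # CG -> HI
ag_cg = augment(fd("A", "C"), "G")                      # AG -> CG
ag_i = transitive(ag_cg, fd("CG", "I"))                 # AG -> I
ag_i_alt = pseudo_transitive(fd("A", "C"), fd("CG", "I"))

print(a_h == fd("A", "H"), cg_hi == fd("CG", "HI"), ag_i == ag_i_alt)
```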
Closure of Attribute Set
Given a relation schema R and a set of FDs
that hold on R, let α be a set of attributes
in R. Then
α+ = α plus all attributes that are functionally
determined, directly or indirectly, by α
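The usual way to compute α+ is a fixed-point loop: start from α and keep adding the right-hand side of any FD whose left-hand side is already contained in the result. A minimal sketch, assuming one-letter attribute names and FDs given as (lhs, rhs) string pairs:

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    attrs: iterable of attribute names.
    fds: list of (lhs, rhs) pairs, each an iterable of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its LHS is covered and it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("A", "B"), ("B", "C")]  # R(A, B, C) with A -> B and B -> C
print(sorted(closure("A", F)))  # ['A', 'B', 'C']
print(sorted(closure("B", F)))  # ['B', 'C']
```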
Example (1)
Given R(A, B, C) with functional dependencies
F = {A → B, B → C}. Calculate A+
Initially, A+ = {A}.
Then use the given FDs:
From A → B we get A+ = {A, B}.
From B → C we get A+ = {A, B, C}.
Therefore,
A+ = {A, B, C}, which is all attributes of R,
so A is a candidate key.
Example (2)
Given R(A, B, C, D, E, F) with a set of FDs
F = {A → BC, E → CF, B → E, CD → EF}
Find the candidate key for R.
A+ = {A, B, C, E, F} (by using the algorithm)
B+ = {B, C, E, F}
……
AB+ = {A, B, C, E, F}
AD+ = {A, B, C, D, E, F}, so AD is a candidate key
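The search for a candidate key can be automated by trying attribute sets from smallest to largest and keeping those whose closure covers R. The sketch below is a brute-force approach that is only practical for small schemas. The function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Return all candidate keys, found smallest first."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= schema:
                keys.append(candidate)
    return keys

F = [("A", "BC"), ("E", "CF"), ("B", "E"), ("CD", "EF")]
keys = candidate_keys("ABCDEF", F)
print(["".join(sorted(k)) for k in keys])  # ['AD']
```

A and D never appear on the right-hand side of any FD, so every key must contain both, which is why AD is the only candidate key here.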
Normalization
Normalization is a method for organizing data elements in a
database into tables to minimize duplication
Why Normalization?
Reduce Redundant data
Remove Inconsistent data
Reduce anomalies
Increase data integrity
Simplify data maintenance
Take less disk space
Goal of Normalization
In each table all non-key attributes should be
dependent on the primary key
Normalization
Normal forms (in order of increasing strength):
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
First Normal Form (1NF)
A relation schema is in 1NF if:
domains of attributes include only atomic (simple,
indivisible) values
and the value of an attribute is a single value from the
domain of that attribute
Example of un-normalized relation
Let R(SSN, Name(F-name, L-name), {telephone})
Note: R has a composite attribute (Name) and a
multivalued attribute (telephone), so R is not in 1NF (i.e. it is an
unnormalized relation)
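One way to bring such a relation into 1NF is to flatten the composite attribute into simple attributes and move the multivalued attribute into its own relation keyed by SSN. The sketch below assumes records are nested Python dicts. The sample values and the function name to_1nf are invented for illustration.

```python
def to_1nf(records):
    """Split nested records into two flat tables with atomic values only."""
    person_rows, phone_rows = [], []
    for rec in records:
        # The composite attribute Name becomes two simple attributes.
        person_rows.append({
            "SSN": rec["SSN"],
            "F_name": rec["Name"]["F_name"],
            "L_name": rec["Name"]["L_name"],
        })
        # The multivalued attribute telephone becomes one row per value.
        for phone in rec["telephone"]:
            phone_rows.append({"SSN": rec["SSN"], "telephone": phone})
    return person_rows, phone_rows

# Hypothetical unnormalized data.
unnormalized = [
    {"SSN": "111",
     "Name": {"F_name": "Ali", "L_name": "Saleh"},
     "telephone": ["555-0100", "555-0101"]},
]

persons, phones = to_1nf(unnormalized)
print(persons)
print(phones)
```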
BCNF
Rule: Given a relation schema R and a set F of
FDs of the form α → β that hold on R,
R is in BCNF if for all FDs in F, one
of the following conditions is satisfied:
1) β ⊆ α (the FD is trivial) or
2) α is a superkey
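The two BCNF conditions translate into a short check: for each FD, test whether it is trivial and, if not, whether the closure of α covers the whole schema. The sketch below checks only the FDs listed in F, as the rule above does. The sample schema R(A, B, C) is a hypothetical illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the FDs in fds that break both BCNF conditions."""
    schema = set(schema)
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)            # condition 1
        superkey = closure(lhs, fds) >= schema    # condition 2
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations

# Hypothetical schema R(A, B, C) with A -> B and B -> C.
F = [("A", "B"), ("B", "C")]
print(bcnf_violations("ABC", F))  # [('B', 'C')]: B is not a superkey
```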
BCNF Example
Lending(Branch-name, Branch-city, Branch-assets, Loan-no, Amount, Customer)
FD1: Branch-name → {Branch-city, Branch-assets}
FD2: Loan-no → {Amount, Branch-name}
FD1: α is not a superkey and β ⊈ α
Then Lending must be decomposed into:
R1, which includes α and β
R2, which includes R – β
R1(Branch-name, Branch-city, Branch-assets)
R2(Branch-name, Loan-no, Amount, Customer)
BCNF Example Cont.
Repeat the procedure for R1 and R2:
R1(Branch-name, Branch-city, Branch-assets)
R1 has only one FD, and its α is a superkey. So, R1 is in BCNF
R2(Branch-name, Loan-no, Amount, Customer)
R2 has one FD that does not satisfy the conditions. So
decompose R2 into R21 and R22
R21(Loan-no, Amount, Branch-name), which satisfies the superkey condition
BCNF Example Cont.
R22(Loan-no, Customer) (R2 – β)
Rule: Any attribute that is not determined by an FD must
be part of a key.
Lending will be as follows:
Lending (R)
  R1
  R2
    R21
    R22
Only R1, R21 and R22 will be in the DB
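The repeated "split on a violating FD" procedure can be sketched as a recursive function. The two FDs used here, Branch-name → {Branch-city, Branch-assets} and Loan-no → {Amount, Branch-name}, are an assumption that is consistent with the decomposition R1, R21, R22. The sketch only tests the left-hand sides of the listed FDs inside each fragment, which is enough for this example but is not a complete BCNF test in general.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_decompose(schema, fds):
    """Split schema until no listed FD violates BCNF inside a fragment."""
    schema = frozenset(schema)
    for lhs, _ in fds:
        if not lhs <= schema:
            continue                                # FD does not apply here
        determined = closure(lhs, fds) & schema     # what alpha fixes in this fragment
        beta = determined - lhs
        if beta and determined != schema:           # nontrivial and alpha not a superkey
            r1 = lhs | beta                         # alpha and beta
            r2 = schema - beta                      # R minus beta
            return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [schema]

LENDING = {"Branch-name", "Branch-city", "Branch-assets",
           "Loan-no", "Amount", "Customer"}
FDS = [  # assumed FDs, see the paragraph above
    (frozenset({"Branch-name"}), frozenset({"Branch-city", "Branch-assets"})),
    (frozenset({"Loan-no"}), frozenset({"Amount", "Branch-name"})),
]

fragments = bcnf_decompose(LENDING, FDS)
for fragment in fragments:
    print(sorted(fragment))
```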
3NF
Rule: Given a relation schema R and a set F of FDs of the
form α → β that hold on R, R is in 3NF if for all
FDs in F, one of the following conditions is satisfied:
1) β ⊆ α (the FD is trivial) or
2) α is a superkey or
3) Each attribute in β is prime
Prime attribute: An attribute that is a member of any
candidate key
Nonprime attribute: An attribute that is not a member of
any candidate key
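The 3NF rule adds one step to the BCNF check: the prime attributes must be known, which requires the candidate keys. A minimal sketch, assuming one-letter attribute names and a brute-force key search. Condition 3 is applied to the attributes of β that are not already in α, and both sample schemas are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute-force search for all candidate keys, smallest first."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if not any(k <= candidate for k in keys) and closure(candidate, fds) >= schema:
                keys.append(candidate)
    return keys

def is_3nf(schema, fds):
    """Check the three 3NF conditions for every FD in fds."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:                      # 1) trivial
            continue
        if closure(lhs, fds) >= schema:     # 2) alpha is a superkey
            continue
        if (rhs - lhs) <= prime:            # 3) beta is prime
            continue
        return False
    return True

print(is_3nf("ABC", [("AB", "C"), ("C", "B")]))  # True: B is prime (keys AB, AC)
print(is_3nf("ABC", [("A", "B"), ("B", "C")]))   # False: C is nonprime, B is not a superkey
```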
3NF Example
R(Branch-name, Customer-name, Banker-name, Office-no)
FD2: Banker-name → {Branch-name, Office-no}, where α = Banker-name, β1 = Branch-name, β2 = Office-no
α is not a superkey
β ⊈ α
β1 is prime but β2 is not
Then, R is not in 3NF
R must be decomposed into:
R1, which includes α and all nonprime attributes of β
R2, which includes R – all nonprime attributes of β
R1(Banker-name, Office-no)
R2(Branch-name, Customer-name, Banker-name)
3NF Example
R1(Banker-name, Office-no)
R2(Branch-name, Customer-name, Banker-name)
R1: α is a superkey. So, R1 is in 3NF
R2: β is a prime attribute. So, R2 is in 3NF
2NF
Rule: Given a relation schema R and a set F of FDs of the
form α → β that hold on R, R is in 2NF if for all
FDs in F, one of the following conditions is satisfied:
1) β ⊆ α (the FD is trivial) or
2) α is a superkey or
3) Each attribute in β is prime or
4) α is not a proper subset of a key
2NF Example
R(Branch-name, Customer-name, Banker-name, Office-no)
FD1: α is a superkey
FD2: α is not a proper subset of a key
So, R is in 2NF
Example
R(A, B, C, D, E, F)
[Figure: dependency diagram for R labeling a full dependency, a partial dependency, and a transitive dependency]
Normalization Steps
If a relation has repeating groups or multivalued attributes,
remove the repeating groups and split each
multivalued attribute into a new relation to reach 1NF
Remove partial dependencies to reach 2NF
Remove transitive dependencies to reach 3NF
When a relation schema satisfies 3NF:
Partial dependencies are removed
Transitive dependencies are removed
All attributes are dependent on the P.K
Tables are small and well-formed
Example (When R Should Stay in 3NF Rather Than BCNF)
Let R(A, B, C, D, E) be a relation schema and F = {A → B,
AC → DE, D → C} be a set of functional dependencies that
hold on R. Check whether R is in BCNF.
Solution: R(A, B, C, D, E)
FD1: A → B
FD2: AC → DE
FD3: D → C
FD1: α is not a superkey. So, decompose R into R1 and R2
R1(A, B)
R2(A, C, D, E)
Example
R2(A, C, D, E)
FD1 (AC → DE): α is a superkey
FD2 (D → C): α is not a superkey. But if we decompose R2 according to FD2, we lose an FD:
R21(D, C) and R22(A, D, E): FD1 is lost
So, we return to the previous
normal form, which is 3NF
Then, R1 is in BCNF and
R2 is in 3NF because β is
prime
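Whether a decomposition loses an FD can be tested without computing the projected FD sets: grow the left-hand side using only what each fragment can enforce on its own, and check whether the right-hand side is reached. The sketch below applies this standard test to R21(D, C) and R22(A, D, E) with the FDs AC → DE and D → C. The function name preserved is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserved(fd, fragments, fds):
    """True if fd can still be enforced using the fragments alone."""
    lhs, rhs = set(fd[0]), set(fd[1])
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for fragment in fragments:
            fragment = set(fragment)
            # Attributes this fragment can derive from what is known so far.
            gained = closure(result & fragment, fds) & fragment
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

F = [("AC", "DE"), ("D", "C")]
fragments = ["DC", "ADE"]  # R21(D, C) and R22(A, D, E)

print(preserved(("AC", "DE"), fragments, F))  # False: AC -> DE is lost
print(preserved(("D", "C"), fragments, F))    # True
```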
Example
R(SSN, Pno, Hours, Ename, Pname, Plocation)
1NF: R is in 1NF because there is no repeating group, no composite attribute, and no
multivalued attribute.
2NF:
FD1: α (SSN, Pno) is a superkey
FD2: α (SSN) is not a superkey
β (Ename) is not a prime attribute
α (SSN) is part of a key. So, R is not in 2NF. Then decompose R into:
R1 = (α, β) = (SSN, Ename)
R2 = (R – β) = (SSN, Pno, Hours, Pname, Plocation)
Example
R1 = (α, β) = (SSN, Ename): R1 is in 2NF
R2 = (R – β) = (SSN, Pno, Hours, Pname, Plocation)
R2:
FD1: α (SSN, Pno) is a superkey
FD2: α (Pno) is not a superkey
β (Pname, Plocation) is not prime
α (Pno) is part of a key. So, R2 is not in 2NF. Then decompose R2 into:
R21 = (R2 – β) = (SSN, Pno, Hours)
R22 = (α, β) = (Pno, Pname, Plocation)
R21 and R22 are in 2NF and also in 3NF and BCNF
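The partial dependencies removed in this example can be listed by computing the closure of every proper subset of the key. The sketch below assumes the FDs {SSN, Pno} → Hours, SSN → Ename and Pno → {Pname, Plocation}, which is how the example reads, and assumes each relation has a single candidate key, so that every attribute outside the key is nonprime.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, schema, fds):
    """List (part_of_key, nonprime_attributes_it_determines) pairs."""
    key = frozenset(key)
    nonprime = set(schema) - key  # valid only under the single-key assumption
    found = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            part = frozenset(part)
            dependents = (closure(part, fds) - part) & nonprime
            if dependents:
                found.append((set(part), dependents))
    return found

FDS = [  # assumed FDs, see the paragraph above
    (frozenset({"SSN", "Pno"}), frozenset({"Hours"})),
    (frozenset({"SSN"}), frozenset({"Ename"})),
    (frozenset({"Pno"}), frozenset({"Pname", "Plocation"})),
]
R = {"SSN", "Pno", "Hours", "Ename", "Pname", "Plocation"}

print(partial_dependencies({"SSN", "Pno"}, R, FDS))                        # two partial dependencies
print(partial_dependencies({"SSN", "Pno"}, {"SSN", "Pno", "Hours"}, FDS))  # []
```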