DBMS (CO202)
MC2
Ms. Disha Dua
Normalization
● Normalization process (as first proposed by
Codd) takes a relation schema through a
series of tests to certify whether it satisfies
a certain normal form
● The process proceeds in a top-down fashion
by evaluating each relation against the
criteria for normal forms & decomposing
relations as necessary; it can thus be
considered relational design by analysis
● Codd proposed three normal forms: first,
second, and third normal form
● A stronger definition of 3NF—called Boyce-
Codd normal form (BCNF)—was proposed
later by Boyce and Codd.
● All normal forms are based on a single
analytical tool: the functional dependencies
among the attributes of a relation
● Later, fourth normal form & fifth normal
form (4NF & 5NF) were proposed, based on
multivalued dependencies & join
dependencies, respectively
● Normalization of data: a process of
analyzing the given relation schemas based
on their FDs & primary keys, to:
(1) minimize redundancy
(2) minimize insertion, deletion & update
anomalies
● It can be considered as a “filtering” or
“purification” process to make the design
have successively better quality
Functional Dependencies
● A functional dependency is a constraint
between two sets of attributes from the
database
● X → Y is read as “X determines Y” or
“Y is dependent on X”
Formal Definition:
A functional dependency, denoted by X → Y,
between two sets of attributes X and Y that
are subsets of a relation R specifies a
constraint on the possible tuples that can form
a relation state r of R. The constraint is that,
for any two tuples t1 and t2 in r that have
t1[X] = t2[X], they must also have t1[Y] = t2[Y].
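The definition above can be checked mechanically on a single relation state. The following is a minimal sketch: the function name `fd_holds` and the sample rows are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def fd_holds(rows: Iterable[Row], lhs: List[str], rhs: List[str]) -> bool:
    """Return True if the FD lhs -> rhs holds in this relation state."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical relation state, used only for illustration.
r = [
    {"ID": 1, "Name": "Asha", "Age": 20},
    {"ID": 2, "Name": "Ravi", "Age": 21},
    {"ID": 3, "Name": "Asha", "Age": 22},
]

print(fd_holds(r, ["ID"], ["Name", "Age"]))  # True
print(fd_holds(r, ["Name"], ["Age"]))        # False
```

Note that a check like this can only refute an FD. An FD is a property of the schema, so it must hold in every legal state, not just the one examined.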
ARMSTRONG’S AXIOMS (Properties
of FDs)
1. Reflexivity
If Y ⊆ X, then X → Y (X determines Y)
2. Augmentation
If X → Y, then ZX → ZY
3. Transitivity
If X → Y and Y → Z, then X → Z
4. Union
If X → Y and X → Z, then X → YZ
5. Decomposition (converse of Union Property)
If X → YZ, then X → Y and X → Z
Note. If XY → Z, then it doesn’t mean that
X → Z & Y → Z
6. Pseudo-transitivity
If X → Y, and WY → Z, then WX → Z
7. Composition
If X → Y and Z → W, then XZ → YW
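Strictly speaking, only rules 1 to 3 (reflexivity, augmentation, transitivity) are Armstrong's axioms. The later rules can be derived from them. As a worked illustration, the derivations of Union and Pseudo-transitivity might be written as follows (this assumes the `amsmath` package for `align*`):

```latex
\begin{align*}
&\text{Union: given } X \to Y \text{ and } X \to Z \\
&\quad X \to XY   && \text{(augment } X \to Y \text{ with } X\text{)} \\
&\quad XY \to YZ  && \text{(augment } X \to Z \text{ with } Y\text{)} \\
&\quad X \to YZ   && \text{(transitivity)} \\
&\text{Pseudo-transitivity: given } X \to Y \text{ and } WY \to Z \\
&\quad WX \to WY  && \text{(augment } X \to Y \text{ with } W\text{)} \\
&\quad WX \to Z   && \text{(transitivity with } WY \to Z\text{)}
\end{align*}
```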
Functional Dependencies - Types
TRIVIAL FDs
When X → Y, and Y is a subset of X, then this is a Trivial FD.
Trivial FDs are ALWAYS True/Valid.
Ex. [ID, Name] → [Name]
NON-TRIVIAL FDs
When X → Y, and Y is not a subset of X, then this is a Non-trivial FD.
Ex. [ID, Name] → [Age]
Here, [Age] ⊄ [ID, Name]
Attribute Closure/Closure on
Attribute Set
Attribute closure of an attribute set ‘A’ can be
defined as the set of attributes which can be
functionally determined from it, directly or
indirectly, through FDs.
Denoted by A+
Ex. R(ABC)
FDs: (A → B, B → C)
Then, A+ = A
= AB ( Using A → B )
= ABC ( Using B → C )
Ex. R(ABCDEF), find D+ and DE+
FDs: (A → B, C → DE, AC → F, D → AF, E →
CF)
D+ = D
= ADF
= ABDF
DE+ = DE
= ADEF
= ACDEF
= ABCDEF
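The closure computation in the examples above can be sketched as a simple fixed-point loop. The function name `closure` and the representation of an FD as a pair of attribute strings are assumptions made for this illustration.

```python
from typing import Iterable, List, Set, Tuple

# An FD is written as (lhs, rhs); each side is a string of one-letter attributes.
FD = Tuple[str, str]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return the attribute closure of attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its whole left-hand side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [("A", "B"), ("C", "DE"), ("AC", "F"), ("D", "AF"), ("E", "CF")]
print("".join(sorted(closure("D", fds))))   # ABDF
print("".join(sorted(closure("DE", fds))))  # ABCDEF
```

The loop stops when a full pass over the FDs adds nothing new, which is why D+ stays at ABDF: AC → F can never fire because C is not reachable from D.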
Determining Keys using FDs
Recall that (as discussed in class) :
A candidate key may be defined as-
A minimal set of attribute(s) that can identify
each tuple uniquely in the given relation is
called a candidate key.
OR
Determining Keys using FDs
Hence, for any given relation,
● It is possible to have multiple candidate
keys.
● There exists no general formula to find the
total number of candidate keys of a given
relation.
Step 1 for determining keys using FDs:
● Determine all essential attributes of the
given relation.
● Essential attributes are those attributes
which are not present on RHS of any
functional dependency.
● Essential attributes are always a part of
every candidate key.
Let R(A, B, C, D, E, F) be a relation schema with
the following functional dependencies-
A→B
C→D
D→E
Here, the attributes which are not present on
RHS of any functional dependency are A, C and
F. So, essential attributes are- A, C and F.
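Step 1 can be sketched in a few lines. The helper name `essential_attributes` is hypothetical, and FDs are again assumed to be pairs of attribute strings.

```python
def essential_attributes(attributes, fds):
    """Attributes that never appear on the RHS of any FD."""
    rhs_attrs = set()
    for _lhs, rhs in fds:
        rhs_attrs |= set(rhs)
    return set(attributes) - rhs_attrs


fds = [("A", "B"), ("C", "D"), ("D", "E")]
print(sorted(essential_attributes("ABCDEF", fds)))  # ['A', 'C', 'F']
```

F is essential here even though it appears in no FD at all: nothing can derive it, so every key must contain it.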
Step 2 for determining keys using FDs:
The remaining attributes of the relation are non-essential
attributes, since each of them appears on the RHS of some
functional dependency. This creates two possibilities:
Case 1: If all essential attributes together can determine all
remaining non-essential attributes, the combination of essential
attributes is the candidate key.
Case 2: If all essential attributes together cannot determine all
remaining non-essential attributes, the set of essential attributes
& some non-essential attributes will be the candidate key(s).
Example: Let R = (A, B, C, D, E, F) be a relation scheme with the
following dependencies-
C→F
E→A
EC → D
A→B
Which of the following is a key for R: CD, EC, AE, AC?
Also determine the total number of candidate keys and super keys.
Step 1:
● Determine all essential attributes of the given relation.
● Essential attributes of the relation are- C and E.
● So, attributes C and E will definitely be a part of every
candidate key.
Step 2: Check if the essential attributes together can determine all
remaining non-essential attributes → find the closure of CE.
{ CE }+
={C,E}
={C,E,F} ( Using C → F )
={A,C,E,F} ( Using E → A )
={A,C,D,E,F} ( Using EC → D )
={A,B,C,D,E,F} ( Using A → B )
Hence CE can determine all the attributes of the given relation. So,
CE is the only possible candidate key of the relation.
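The two steps can be combined into a small brute-force search, which also answers the remaining parts of the question (which of the four options is a key, and how many super keys exist). This is a sketch for small schemas only; the function names are hypothetical and the search is exponential in the number of non-essential attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    all_attrs = set(attributes)
    rhs_attrs = set().union(*(set(rhs) for _lhs, rhs in fds))
    essential = all_attrs - rhs_attrs           # Step 1
    optional = sorted(all_attrs - essential)
    keys = []
    # Step 2: try the essential attributes alone, then add non-essential ones.
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            candidate = essential | set(extra)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


def count_super_keys(attributes, fds):
    attrs = sorted(set(attributes))
    return sum(
        1
        for size in range(1, len(attrs) + 1)
        for subset in combinations(attrs, size)
        if closure(subset, fds) == set(attrs)
    )


attributes = "ABCDEF"
fds = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]

print(["".join(sorted(k)) for k in candidate_keys(attributes, fds)])  # ['CE']
print(count_super_keys(attributes, fds))                              # 16
for option in ["CD", "EC", "AE", "AC"]:
    print(option, closure(option, fds) == set(attributes))
```

Of the four options only EC has a closure covering all attributes. The super key count is 16 because every super key must contain C and E, and each of the other four attributes may be included or not (2^4 combinations).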
Normal Forms
Formal Definition:
The normal form of a relation refers to the
highest normal form condition that it meets,
and hence indicates the degree to which it has
been normalized.
Normal forms, when considered in isolation
from other factors, do not guarantee a good
database design. It is generally not sufficient
to check separately that each relation schema
in the database is, say, in BCNF or 3NF.
Normal Forms (to be explained in detail later)
The process of normalization through
decomposition must also confirm the
existence of 2 additional properties that the
relational schemas should possess
■ Non-additive join/lossless join property -
guarantees that the spurious tuple generation
problem does not occur with respect to the
relation schemas created after decomposition
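■ Dependency preservation property -
ensures that each functional dependency is
represented in some individual relation
resulting after decomposition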
First Normal Form (1NF)
First normal form (1NF) is now considered to
be part of the formal definition of a relation in
the basic (flat) relational model.
It states that the domain of an attribute must
include only atomic (simple, indivisible) values
and that the value of any attribute in a tuple
must be a single value from the domain of that
attribute. The only attribute values permitted
by 1NF are single atomic (or indivisible)
values.
Second Normal Form (2NF)
Second normal form (2NF) is based on the
concept of full functional dependency. A
functional dependency X → Y is a full
functional dependency if removal of any
attribute A from X means that the dependency
does not hold anymore; that is, for any
attribute A ∈ X, (X − {A}) does not functionally
determine Y.
Formal Definition:
A relation schema R is in 2NF if every non-prime
attribute A in R is fully functionally dependent
on the primary key of R.
The test for 2NF involves testing for functional
dependencies whose left-hand side attributes
are part of the primary key. If the primary key
contains a single attribute, the test need not
be applied at all.
<StudentProject>
StudentID  ProjectID  StudentName  ProjectName
S89        P09        Olivia       Geo Location
S76        P07        Jacob        Cluster Exploration
S56        P03        Ava          IoT Devices
S92        P05        Alexandra    Cloud Deployment
The primary key attributes are StudentID and ProjectID.
A non-prime attribute is partially dependent if it is functionally
dependent on only a part of a candidate key.
StudentName can be determined by StudentID alone, which is a
partial dependency.
ProjectName can be determined by ProjectID alone, which is also
a partial dependency.
Therefore, the <StudentProject> relation violates 2NF
and is considered a bad database design.
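The 2NF test can be sketched by computing the closure of every proper subset of the key and looking for non-prime attributes in it. The function names below are hypothetical, and the sketch assumes that (StudentID, ProjectID) is the only candidate key and that the only FDs are the two named in the text.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """List (part_of_key, attribute) pairs that violate 2NF.

    Assumes `key` is the only candidate key of the relation.
    """
    key = set(key)
    non_prime = set(attributes) - key
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & non_prime):
                found.append((part, attr))
    return found


attributes = ["StudentID", "ProjectID", "StudentName", "ProjectName"]
fds = [
    (("StudentID",), ("StudentName",)),
    (("ProjectID",), ("ProjectName",)),
]

for part, attr in partial_dependencies(["StudentID", "ProjectID"], attributes, fds):
    print(part, "->", attr)
```

An empty result would mean the relation passes the 2NF test under these assumptions. With a single-attribute key the inner loop never runs, which matches the remark that the test need not be applied in that case.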
Table converted to 2NF
To remove the partial dependencies and the 2NF violation,
decompose the above table −
<StudentInfo>
StudentID  ProjectID  StudentName
S89        P09        Olivia
S76        P07        Jacob
S56        P03        Ava
S92        P05        Alexandra
<ProjectInfo>
ProjectID ProjectName
P09 Geo Location
P07 Cluster Exploration
P03 IoT Devices
P05 Cloud Deployment
Third Normal Form (3NF)
A relation is in third normal form if it is in second normal
form and no non-prime attribute is transitively dependent on
the key.
A relation is in 3NF if at least one of the following conditions
holds for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some
candidate key).
In other words,
a relation that is in First and Second Normal Form and in
which no non-prime attribute is transitively dependent on the
primary key is in Third Normal Form.
Example: Student (rollno, game, feestructure)
Rollno  Game        Feestructure
1       Basketball  500
2       Basketball  500
3       Basketball  500
4       Cricket     600
5       Cricket     600
6       Cricket     600
FDs: {rollno → game, rollno → feestructure, game → feestructure}
PK: rollno
Is Student (rollno, game, feestructure) in each normal form?
1NF? Yes, since there are no multivalued attributes.
2NF? Yes, since all non-key attributes are fully functionally
dependent on the primary key (rollno).
3NF? No, because there is a transitive dependency, i.e.
rollno → game and game → feestructure.
Why do we need this table to be in 3NF?
The Student table suffers from all three anomalies −
● Insertion anomaly − A new game can't be inserted into the
table unless we get a student to play that game.
● Deletion anomaly − If the only student playing a game is
deleted from the table, we also lose the complete information
regarding that game (e.g. deleting a rollno 7 who alone plays
tennis).
● Updation anomaly − To change the fee for a game, every row
for that game has to be updated.
Decomposition for 3NF
Dividing Student (rollno, game, feestructure) into smaller tables:
If X → Y is a transitive dependency, divide R into R1(X+) and
R2(R − Y+).
game → feestructure is a transitive dependency [since game is
not a key and feestructure is not a key attribute].
R1 = game+ = (game, feestructure)
R2 = (Student − feestructure+) = (rollno, game)
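The rule R1(X+), R2(R − Y+) can be sketched as follows for the Student example. The function names `decompose` and `project` are hypothetical. The sketch handles a single transitive dependency, as on the slide, and is not a general 3NF synthesis algorithm.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def decompose(attributes, fds, x, y):
    """Split R on the transitive dependency x -> y into R1(X+) and R2(R - Y+)."""
    x_closure = closure(x, fds)
    y_closure = closure(y, fds)
    r1 = [a for a in attributes if a in x_closure]
    r2 = [a for a in attributes if a not in y_closure]
    return r1, r2


def project(rows, header, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    idx = [header.index(a) for a in attrs]
    out = []
    for row in rows:
        t = tuple(row[i] for i in idx)
        if t not in out:
            out.append(t)
    return out


header = ["rollno", "game", "feestructure"]
fds = [
    (["rollno"], ["game"]),
    (["rollno"], ["feestructure"]),
    (["game"], ["feestructure"]),
]
rows = [
    (1, "Basketball", 500), (2, "Basketball", 500), (3, "Basketball", 500),
    (4, "Cricket", 600), (5, "Cricket", 600), (6, "Cricket", 600),
]

r1, r2 = decompose(header, fds, ["game"], ["feestructure"])
print(r1)                         # ['game', 'feestructure']
print(r2)                         # ['rollno', 'game']
print(project(rows, header, r1))  # [('Basketball', 500), ('Cricket', 600)]
```

After the split, each fee is stored once per game instead of once per student, which is what removes the update anomaly.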
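R1 (game, feestructure)
Game        Feestructure
Basketball  500
Cricket     600
R2 (rollno, game)
Rollno  Game
1       Basketball
2       Basketball
3       Basketball
4       Cricket
5       Cricket
6       Cricket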
Boyce Codd Normal Form (BCNF)
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy
the following two conditions:
1. It should be in the Third Normal Form.
2. And, for every non-trivial dependency A → B, A should be a super key.
Hence for every FD, we have to check the LHS. If each one is a
super key, only then is our table in BCNF.
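The LHS check described above might be sketched like this, reusing the Student (rollno, game, feestructure) example from the 3NF slides. The function name `bcnf_violations` is hypothetical, and the FDs passed for the decomposed relations are written out by hand rather than computed.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose LHS is not a super key."""
    all_attrs = set(attributes)
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FDs never violate BCNF
        if closure(lhs, fds) != all_attrs:
            violations.append((lhs, rhs))
    return violations


student = ["rollno", "game", "feestructure"]
student_fds = [
    (("rollno",), ("game",)),
    (("rollno",), ("feestructure",)),
    (("game",), ("feestructure",)),
]
print(bcnf_violations(student, student_fds))
# [(('game',), ('feestructure',))]

# The two relations obtained by decomposing on game -> feestructure.
print(bcnf_violations(["game", "feestructure"], [(("game",), ("feestructure",))]))  # []
print(bcnf_violations(["rollno", "game"], [(("rollno",), ("game",))]))              # []
```

A LHS is a super key exactly when its closure is the full attribute set, so the check reduces to one closure computation per FD.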
Note: Please study examples of all normal forms from 1NF up to
BCNF.
Fourth Normal Form (4NF)
A relation will be in 4NF if:
1. It is in Boyce Codd normal form, and
2. It has no multi-valued dependency
If, for a single value of A, multiple values of B exist independently
of the other attributes, then A multi-determines B; this is a
multi-valued dependency.
A multivalued dependency is represented by: A →→ B
Multi valued dependency
A table is said to have multi-valued dependency, if the following
conditions are true,
1. For a dependency A → B, if for a single value of A, multiple values
of B exist, then the table may have a multi-valued dependency.
2. Also, a table should have at least 3 columns for it to have a
multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued
dependency between A and B, then B and C should be
independent of each other.
Multi valued dependency
Example: College enrolment table with columns s_id, course and
hobby
Note that
• Student with s_id = 1 has opted for two courses, Science and
Maths, and has two hobbies, Cricket and Hockey.
• The problem: the two records for student with s_id 1 will give rise
to two more records, because for one student two hobbies exist,
so each hobby has to be recorded along with each course.
There is no relationship between the columns course and hobby.
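They are independent of each other. So there is a multi-valued
dependency, which leads to unnecessary repetition of data and
might give rise to anomalies.
Getting rid of MVD
Decompose the table into two tables, one with columns (s_id, course)
and one with columns (s_id, hobby).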
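A common way to remove such an MVD is to split the table into one relation per independent fact, here (s_id, course) and (s_id, hobby). The sketch below uses only the s_id = 1 facts mentioned in the example. The table names `course_opted` and `hobbies` are hypothetical.

```python
# Every course of student 1 is paired with every hobby of student 1.
enrolment = {
    (1, "Science", "Cricket"),
    (1, "Science", "Hockey"),
    (1, "Maths", "Cricket"),
    (1, "Maths", "Hockey"),
}

# Decompose into two relations, one per independent multi-valued fact.
course_opted = {(s_id, course) for s_id, course, _hobby in enrolment}
hobbies = {(s_id, hobby) for s_id, _course, hobby in enrolment}

# Natural join on s_id reconstructs the original table.
rejoined = {
    (s_id, course, hobby)
    for s_id, course in course_opted
    for other_id, hobby in hobbies
    if s_id == other_id
}

print(len(enrolment), len(course_opted), len(hobbies))  # 4 2 2
print(rejoined == enrolment)                            # True
```

The four combined rows shrink to two rows in each table, and no information is lost because the join gives back exactly the original rows.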
Fifth Normal Form (5NF)
A relation is in 5NF if:
● It is in 4NF, and
● It cannot be decomposed further without losing information, i.e.,
every Join Dependency must be implied by the candidate keys.
5NF is satisfied when all the tables are broken into as many tables
as possible in order to avoid redundancy.
5NF is also known as Project-join normal form (PJ/NF).
Relation: TEACHERS
This table is in 4th normal form as there are no multivalued
dependencies here.
Now, try to split the relation into 2 sub relations, say R1 and R2.
Relation 1 (R1):
Relation 2 (R2):
Now, note that the redundancy in the tables has been eliminated.
But, relevant information like English being taught by Anju to class
10th has been lost.
Try creating a natural join over the column T_Name for relations R1
and R2. Let this relation be R4. The data in resultant relation (R4) is:
Join Dependency
Join dependency states that a relation once decomposed into two or
more sub relations, must be capable of being joined back to the
original relation. This can only be achieved when we maintain
proper combinations of sub relations which when joined form the
original relation in a lossless manner.
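Whether a particular decomposition joins back losslessly can be tested on sample data by projecting and re-joining. The sketch below uses hypothetical rows and column names (`t_name`, `subject`, `class`); they are not the contents of the TEACHERS table from the slides.

```python
def project(rows, attrs):
    """Project dict-rows onto attrs, dropping duplicates."""
    out = []
    for row in rows:
        p = {a: row[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(left, right):
    """Join two lists of dict-rows on all attributes they share."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out


def joins_back(rows, parts):
    """True if joining the projections in `parts` reproduces `rows` exactly."""
    joined = project(rows, parts[0])
    for attrs in parts[1:]:
        joined = natural_join(joined, project(rows, attrs))

    def as_set(rs):
        return {frozenset(r.items()) for r in rs}

    return as_set(joined) == as_set(rows)


# Hypothetical data, for illustration only.
teachers = [
    {"t_name": "Anju", "subject": "English", "class": "10th"},
    {"t_name": "Anju", "subject": "Maths", "class": "9th"},
    {"t_name": "Ravi", "subject": "English", "class": "9th"},
]
parts = [["t_name", "subject"], ["t_name", "class"]]

print(joins_back(teachers, parts))                      # False
print(joins_back([teachers[0], teachers[2]], parts))    # True
```

In the first case the join produces spurious tuples such as (Anju, English, 9th), so the decomposition is lossy for that data. In the second case each teacher has a single row, so joining on `t_name` reproduces the original rows.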