NORMALIZATION
• Normalization is the process of decomposing ‘unsatisfactory’ relations into
smaller relations.
• It helps eliminate redundancy, organize data
efficiently and reduce potential anomalies during data operations
(insertion, update and deletion).
The main normal forms are:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
First Normal Form (1NF)
• The first normal form states that the domain of an attribute must
include only atomic (simple, indivisible) values, and that the value of
any attribute in a tuple must be a single value.
• 1NF also disallows composite attributes that are themselves
multivalued. These are called nested relations because each tuple
can contain a relation within it.
Example 01
Department table
Dno  Dname      ManagerEno  Dloc
1    HQ         100         Colombo
2    Marketing  200         Colombo, Kandy
3    Research   300         Galle, Gampaha, N’eliya
This table is not in First Normal Form because Dloc is a multivalued attribute.
Therefore, the table must be broken into two tables.
Department (Dno, Dname, ManagerEno)
Department_Location (Dno, Dloc)
Now both tables are in First Normal Form, since every attribute holds a single
value rather than multiple values.
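The split described above can be sketched in a few lines of Python. This is only an illustration: the list-of-tuples representation and the helper name `to_1nf` are assumptions made for the sketch, not part of the original example.

```python
# Un-normalized Department rows: the last field (Dloc) holds several values.
department_unf = [
    (1, "HQ", 100, ["Colombo"]),
    (2, "Marketing", 200, ["Colombo", "Kandy"]),
    (3, "Research", 300, ["Galle", "Gampaha", "N'eliya"]),
]


def to_1nf(rows):
    """Split the rows into Department and Department_Location tuples."""
    # Department keeps only the single-valued attributes.
    department = [(dno, dname, mgr) for dno, dname, mgr, _ in rows]
    # Department_Location gets one (Dno, Dloc) row per location.
    department_location = [
        (dno, dloc) for dno, _, _, dlocs in rows for dloc in dlocs
    ]
    return department, department_location


department, department_location = to_1nf(department_unf)
for row in department_location:
    print(row)
```

Every value in both results is atomic, and Dno links each location back to its department.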
Example 02
Emp-Project { Eno, Ename, Address { Pno, hours} }
• This relation is an example of a nested relation. Such relations are
said to be un-normalized. In order to represent the information in
a relational model, normalization must be carried out. This is
done by removing the repeating groups.
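As a rough sketch of removing a repeating group, the snippet below flattens a nested employee-project structure into two flat relations. The sample values, the `Projects` field name, the output relation names and the reading of Address as a simple attribute alongside the nested (Pno, hours) group are all assumptions made for illustration.

```python
# Hypothetical sample data: each employee tuple nests a relation of (Pno, hours).
emp_project_nested = [
    {"Eno": 1, "Ename": "Gayan", "Address": "Colombo",
     "Projects": [{"Pno": 10, "hours": 12}, {"Pno": 20, "hours": 8}]},
    {"Eno": 2, "Ename": "Kasun", "Address": "Kandy",
     "Projects": [{"Pno": 10, "hours": 5}]},
]


def remove_repeating_group(nested):
    """Return flat (employee, works_on) relations from the nested tuples."""
    # One row per employee, holding only the simple attributes.
    employee = [(e["Eno"], e["Ename"], e["Address"]) for e in nested]
    # One row per (employee, project) pair; Eno is carried over as a link.
    works_on = [
        (e["Eno"], p["Pno"], p["hours"]) for e in nested for p in e["Projects"]
    ]
    return employee, works_on


employee, works_on = remove_repeating_group(emp_project_nested)
print(employee)
print(works_on)
```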
Second Normal Form (2NF)
• Full Functional Dependency
• ‘B’ is fully functionally dependent on ‘A’ if it is functionally
dependent on ‘A’ and not functionally dependent on any proper subset of ‘A’.
• 2NF is based on the concept of full functional dependency.
• A relational schema ‘R’ is in 2NF if every non-key attribute A in
‘R’ is fully functionally dependent on the primary key of ‘R’.
Example:
Student (Sno, Sname, Marks)
Key attribute: Sno
Non-key attributes: Sname, Marks
Example:
Items (Invoice_No, Item_No, Item_Name, Invoice_Date, Order_Qty)
Suppose the PK is {Invoice_No, Item_No} and:
• Invoice_No can be used to find Invoice_Date
• Item_No can be used to find Item_Name
• Invoice_No and Item_No can be used to find Order_Qty.
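The three statements above are functional dependencies, and a short script can check which of them are partial. The following is a minimal sketch; the helper names `closure` and `partial_dependencies` are illustrative, not from the slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = closure(subset, fds) - set(key)
            if dependents:
                found[subset] = dependents
    return found


items_fds = [
    (frozenset({"Invoice_No"}), frozenset({"Invoice_Date"})),
    (frozenset({"Item_No"}), frozenset({"Item_Name"})),
    (frozenset({"Invoice_No", "Item_No"}), frozenset({"Order_Qty"})),
]
items_key = frozenset({"Invoice_No", "Item_No"})

partials = partial_dependencies(items_key, items_fds)
for subset, dependents in partials.items():
    print(subset, "->", sorted(dependents))
```

Invoice_Date and Item_Name each depend on only part of the key, so Items is not in 2NF. Only Order_Qty is fully functionally dependent on {Invoice_No, Item_No}.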
Third Normal Form (3NF)
Transitive Dependency
If X, Y and Z are attributes such that X -> Y and Y -> Z, then Z is
transitively dependent on X (X -> Z).
Condition A:
A relation is in 3NF if and only if it is in 2NF and no non-key attribute is
transitively dependent on the primary key.
Condition B:
For every nontrivial functional dependency X -> A that holds in a relation R,
at least one of the following conditions must be satisfied:
X is a super key of R
OR
A is a prime attribute of R, that is, a member of some candidate key (when X is not a super key)
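Condition B translates fairly directly into a check over a list of functional dependencies. The sketch below is one possible way to write it and only inspects the dependencies it is given. The relation Emp (Eno, Ename, Dno, Dname), its dependencies and the function names are hypothetical, chosen for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, fds, candidate_keys):
    """List (X, A) pairs where X is not a super key and A is not prime."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        is_super_key = closure(lhs, fds) == set(attributes)
        for attr in rhs:
            if attr in lhs:
                continue  # trivial part of the dependency
            if not is_super_key and attr not in prime:
                violations.append((tuple(lhs), attr))
    return violations


# Hypothetical relation: Emp (Eno, Ename, Dno, Dname) with key Eno.
emp_attrs = ("Eno", "Ename", "Dno", "Dname")
emp_fds = [(("Eno",), ("Ename", "Dno")), (("Dno",), ("Dname",))]
emp_keys = [("Eno",)]

violations = third_nf_violations(emp_attrs, emp_fds, emp_keys)
print(violations)
```

Here Dno -> Dname is reported: Dno is not a super key and Dname is not prime, so Dname depends on Eno only transitively.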
Super Key
• A super key is a set of one or more attributes (columns) that can
uniquely identify a row in a table. Super keys are often confused with
candidate keys.
• Candidate keys are selected from the set of super keys. The only
requirement when selecting a candidate key is that it must not contain any
redundant attribute, which is why candidate keys are also called minimal
super keys.
Let’s take an example to understand this:
NIC         Employee_Number  Emp_Name
883200837V  226              Gayan
893245679V  227              Kasun
915478956V  228              Waruna
Super keys: The above table has the following super keys. Each of these sets can
uniquely identify a row of the employee table.
{NIC}
{Employee_Number}
{NIC, Employee_Number}
{NIC, Emp_Name}
{Employee_Number, Emp_Name}
{NIC, Employee_Number, Emp_Name}
Candidate keys: A candidate key is a minimal super key with no redundant attributes. The following two
super keys are chosen from the sets above because they contain no redundant attributes.
{NIC}
{Employee_Number}
Only these two sets are candidate keys, as all other sets contain redundant attributes that are not
necessary for unique identification.
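The same lists can be derived mechanically. The sketch below assumes that NIC and Employee_Number each determine every other attribute. Those two dependencies are implied by the example but are written out here as an assumption, and the function names are illustrative.

```python
from itertools import combinations

attributes = ["NIC", "Employee_Number", "Emp_Name"]
# Assumed dependencies: each identifier determines the rest of the row.
fds = [
    ({"NIC"}, {"Employee_Number", "Emp_Name"}),
    ({"Employee_Number"}, {"NIC", "Emp_Name"}),
]


def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def super_keys(attributes, fds):
    """Every attribute subset whose closure covers the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            if closure(subset, fds) == set(attributes):
                keys.append(frozenset(subset))
    return keys


def candidate_keys(keys):
    """Super keys that have no smaller super key inside them."""
    return [k for k in keys if not any(other < k for other in keys)]


all_super_keys = super_keys(attributes, fds)
minimal_keys = candidate_keys(all_super_keys)
print(len(all_super_keys), "super keys")
print([sorted(k) for k in minimal_keys])
```

Working from dependencies rather than from the three sample rows matters: Emp_Name happens to be unique in the sample, but nothing guarantees that two employees never share a name.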
Example 1
Consider the relation Supplier = {Sno, Pno, Sname, City, Status,
Pname, Qty} and the functional dependencies:
{Sno, Pno} -> {Qty}
{Sno} -> {Sname, City}
{Pno} -> {Pname}
{City} -> {Status}
Assume that the primary key of Supplier is {Sno, Pno}.
Decompose Supplier into 3NF relations.
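One way to approach this exercise is to give each determinant its own relation. The sketch below does that under the assumption that the four listed dependencies are already minimal (no redundant dependency or attribute), which appears to hold here. The function name `synthesize_3nf` is illustrative, and the output is a suggested answer rather than the one given in the original slides.

```python
def synthesize_3nf(fds, key):
    """Group dependencies by determinant; each group becomes one relation."""
    relations = {}
    for lhs, rhs in fds:
        # Start each relation with its determinant, then add what it determines.
        relations.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    schemas = list(relations.values())
    # Keep the decomposition lossless: some relation must contain the key.
    if not any(set(key) <= schema for schema in schemas):
        schemas.append(set(key))
    return schemas


supplier_fds = [
    (("Sno", "Pno"), ("Qty",)),
    (("Sno",), ("Sname", "City")),
    (("Pno",), ("Pname",)),
    (("City",), ("Status",)),
]

schemas = synthesize_3nf(supplier_fds, ("Sno", "Pno"))
for schema in schemas:
    print(sorted(schema))
```

Status ends up with City rather than with Sno, which removes the transitive dependency Sno -> City -> Status.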
Example 2:
Consider the relation R = {A, B, C, D, E, F, G, H, I} and the
functional dependencies:
{A, B} -> {E, F, H}
{A} -> {D}
{B} -> {C, G}
{G} -> {I}
Assume that the primary key of R is {A, B}. Decompose R into
2NF, then into 3NF relations.
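The 2NF step of this exercise can be sketched by moving everything that a single key attribute determines into its own relation. The code assumes a two-attribute key, so the proper subsets of the key are just the individual attributes. The function name and the use of single-letter strings for attributes are choices made for this illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def decompose_2nf(attributes, key, fds):
    """Split off the attributes that depend on only one attribute of the key."""
    key = set(key)
    moved = set()
    partial = []
    for part in sorted(key):
        dependents = closure({part}, fds) - key
        if dependents:
            partial.append({part} | dependents)
            moved |= dependents
    # What remains depends on the whole key.
    return [set(attributes) - moved] + partial


r_fds = [("AB", "EFH"), ("A", "D"), ("B", "CG"), ("G", "I")]
relations_2nf = decompose_2nf("ABCDEFGHI", "AB", r_fds)
for relation in relations_2nf:
    print("".join(sorted(relation)))
```

The 2NF result is (A, B, E, F, H), (A, D) and (B, C, G, I). In the last relation I still depends on B only through G, so the 3NF step splits it into (B, C, G) and (G, I).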
Example 3:
Consider the relation R = {A, B, C, D, E, F, G, H, I} and the
functional dependencies:
{A, B} -> {C}
{A} -> {D, E}
{B} -> {F}
{F} -> {G, H}
{D} -> {H, I}
Assume that the primary key of R is {A, B}. Decompose R
into 2NF, then into 3NF relations.
Boyce-Codd Normal Form (BCNF)
A relational schema R is in BCNF if, whenever a nontrivial functional dependency
X -> A holds in R, X is a super key of R.
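The BCNF condition can be checked the same way as 3NF, without the exception for prime attributes. The sketch below uses a hypothetical relation R (A, B, C, D) with the assumed dependencies {A, B} -> {C, D} and {C} -> {B}. These dependencies are invented for illustration and are not taken from the examples in this section.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """List nontrivial dependencies whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attributes)
    ]


# Hypothetical dependencies, with attributes written as single letters.
r_fds = [("AB", "CD"), ("C", "B")]
violations = bcnf_violations("ABCD", r_fds)
print(violations)
```

C -> B is reported because C alone does not determine A or D. B is a prime attribute (it belongs to the candidate key {A, B}), so the same dependency satisfies Condition B of 3NF, which is how a relation can be in 3NF without being in BCNF.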
Example 1:
The following relation is in 3NF but not in BCNF.
(A, B, C, D)
After decomposing the relation to meet BCNF:
(A, B, C, D)
(C, B)
Example 2:
Decompose the following relation to meet BCNF.
E# Specialty Manager
1) Emp (E#, Specialty, Manager)
2) Manager (Manager, Specialty)
Example 5
(A, B, C, D, E, F, G)
Is this relation normalized? Decompose the table into suitable normal forms.
2NF
(A, B, C, D)  (A, E, F, G)
3NF
(A, B, C, D)  (A, E)  (E, F, G)
BCNF
(A, B, C, D)  (A, E)  (E, F, G)
(D, B)