Normalization
WITH EXAMPLES
Purpose of normalization
The purpose of normalization is to identify the best grouping of attributes, which ultimately forms the relations.
Some of the characteristics of the relations formed are:
Support of the data requirements with a minimal number of attributes
Attributes in a relation have a close logical relationship (functional dependency)
Minimal redundancy, with each attribute represented only once (except attributes that form foreign keys), which helps by:
increasing the performance of updates
reducing storage consumption
avoiding update anomalies (insertion, modification and deletion)
Why Normalization?
The database design must be efficient (performance-wise) and free of insertion, deletion and update anomalies.
Three anomalies to avoid:
Insertion Anomaly – we need to store a value for an attribute but cannot, because the value for another attribute is unknown
Deletion Anomaly – deleting rows may cause a loss of important information about the entity
Modification/Update Anomaly – occurs when a change to a single attribute in one record requires changes in multiple records
Example
Anomalies in this table:
Insertion
Deletion
Modification
Insertion – we cannot enter a new employee if the new employee has not been assigned to any course (NULL values are not allowed).
Chapter 6 Normalization
Deletion – if we remove employee 140, we lose information about the existence of a Tax Acc class.
Modification/update – giving a salary increase to employee 100 forces us to update multiple records.
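The three anomalies can be sketched in Python on a small, entirely hypothetical version of the employee table (the original table is not reproduced here). Only employees 100 and 140 and the Tax Acc course come from the text; the names, salaries and other course names are made up, and `raise_salary` and `delete_employee` are illustrative helpers, not part of any library.

```python
# Hypothetical unnormalized rows: employee details are repeated for every
# course the employee attends. Values other than employees 100/140 and the
# "Tax Acc" course are invented for illustration.
employee_courses = [
    {"emp_id": 100, "name": "Ann", "salary": 50000, "course": "SPSS"},
    {"emp_id": 100, "name": "Ann", "salary": 50000, "course": "Surveys"},
    {"emp_id": 140, "name": "Ben", "salary": 42000, "course": "Tax Acc"},
]

def raise_salary(rows, emp_id, new_salary):
    """Update anomaly: every row for the employee must be changed."""
    touched = 0
    for row in rows:
        if row["emp_id"] == emp_id:
            row["salary"] = new_salary
            touched += 1
    return touched

def delete_employee(rows, emp_id):
    """Deletion anomaly: removing an employee may remove the only row of a course."""
    return [row for row in rows if row["emp_id"] != emp_id]

print(raise_salary(employee_courses, 100, 55000))  # 2 rows had to change
remaining = delete_employee(employee_courses, 140)
print(any(r["course"] == "Tax Acc" for r in remaining))  # False: the course is gone
```

The insertion anomaly is the mirror image: a new employee with no course has no value for the `course` column, so the row cannot be stored if that column may not be NULL.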
Functional dependency (FD)
A functional dependency (FD) describes the relationship between attributes in a relation.
For example, if EmployeeCode and FirstName are attributes of the Employee relation, we can say that FirstName is functionally dependent on EmployeeCode. This means each EmployeeCode is associated with exactly one value of FirstName.
We denote this as:
EmployeeCode -> FirstName
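The left-hand side of the arrow is called the determinant. The relationship from left to right is 1:1 in the sense that each value of the left-hand side is associated with exactly one value of the right-hand side; the reverse does not have to hold (two employees can share a FirstName).
If the right-hand side depends on the whole left-hand side, and not on any proper subset of it, the dependency is called a full functional dependency.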
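A minimal sketch of how an FD can be tested against sample data, assuming rows are Python dicts; `fd_holds` and the sample names are hypothetical. Sample data can only show that a dependency is violated: whether an FD holds is ultimately a business rule about every possible state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is not violated by the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True

employees = [
    {"EmployeeCode": "E1", "FirstName": "Nimal"},
    {"EmployeeCode": "E2", "FirstName": "Nimal"},
    {"EmployeeCode": "E3", "FirstName": "Kamala"},
]
print(fd_holds(employees, ["EmployeeCode"], ["FirstName"]))  # True
print(fd_holds(employees, ["FirstName"], ["EmployeeCode"]))  # False
```

The second call fails because two employees share the FirstName "Nimal", which illustrates that the 1:1 relationship runs only from left to right.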
Types of functional dependency
Full functional dependency
Partial dependency
If the left-hand side is composite (two or more attributes) and the right-hand side can be determined by part of the left-hand side, the dependency is a partial dependency.
For example:
ProductCode, ReceiptNumber -> ProductName
ProductCode -> ProductName
ProductName is partially dependent on the composite ProductCode, ReceiptNumber, because ProductCode alone determines it.
Transitive dependency – see later
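The definition of a partial dependency translates directly into a check over proper subsets of the left-hand side. This is a sketch on hypothetical sales rows; `fd_holds` and `is_partial` are illustrative helpers, and as before the data can only refute a dependency, not prove it.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_partial(rows, lhs, rhs):
    """lhs -> rhs is partial if a proper, non-empty subset of lhs also determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, list(subset), rhs):
                return True
    return False

sales = [
    {"ProductCode": "P1", "ReceiptNumber": 1, "ProductName": "Pen"},
    {"ProductCode": "P1", "ReceiptNumber": 2, "ProductName": "Pen"},
    {"ProductCode": "P2", "ReceiptNumber": 2, "ProductName": "Ink"},
]
print(is_partial(sales, ["ProductCode", "ReceiptNumber"], ["ProductName"]))  # True
```

A single-attribute left-hand side has no proper non-empty subset, so it can never give a partial dependency in this sketch.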
First normal form
To make sure that a relation is in 1NF, we need to make sure that:
There are no multiple values in the intersection of any row and column (each value is atomic)
There are no repeating groups of attributes (like Course1, Course2, Course3... columns)
The order of attributes and tuples is insignificant
There are no duplicate tuples in the relation.
For example:
Since the relation has no multiple values in intersections and no repeating groups, it is now a 1NF relation.
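The 1NF step of replacing a multi-valued intersection with additional tuples can be sketched as follows. The student rows are hypothetical (the original example table is not reproduced here), and `to_1nf` is an illustrative helper.

```python
def to_1nf(rows, multi_valued):
    """Replace a multi-valued attribute with one tuple per value.

    Note: in this sketch a row whose list is empty produces no tuples.
    """
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            new_row = dict(row)
            new_row[multi_valued] = value  # one atomic value per intersection
            flat.append(new_row)
    return flat

unnormalized = [
    {"StudentCode": "S1", "Name": "Amal", "Courses": ["DBMS", "Java"]},
    {"StudentCode": "S2", "Name": "Sita", "Courses": ["DBMS"]},
]
for row in to_1nf(unnormalized, "Courses"):
    print(row)
```

The same idea removes repeating groups such as Course1, Course2, Course3: each non-empty column becomes its own tuple.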
Second normal form
To make sure that a relation is in 2NF, we need to make sure that:
The relation is in 1NF
Every non-primary-key attribute is fully functionally dependent on the primary key (there is no partial dependency between the primary key and a non-primary-key attribute)
Thus, the main task in this form is to remove partial dependencies.
Step 1: identify the functional dependencies
StudentCode, Course -> DateRegistered
StudentCode -> Name, Town, Province
Town -> Province
Step 2: identify the candidate key
StudentCode + Course
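Step 2 can be checked with the standard attribute-closure algorithm: a set of attributes is a superkey if its closure contains every attribute, and a candidate key if no proper subset is also a superkey. A sketch, assuming the relation's attributes are the six named in Step 3 and the FDs are exactly those from Step 1; `closure`, `ALL` and `FDS` are illustrative names.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already reachable, add what it determines.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

ALL = {"StudentCode", "Course", "Name", "Town", "Province", "DateRegistered"}
FDS = [
    ({"StudentCode", "Course"}, {"DateRegistered"}),
    ({"StudentCode"}, {"Name", "Town", "Province"}),
    ({"Town"}, {"Province"}),
]

print(closure({"StudentCode", "Course"}, FDS) == ALL)  # True: a superkey
print(closure({"StudentCode"}, FDS) == ALL)            # False
print(closure({"Course"}, FDS) == ALL)                 # False
```

Because neither StudentCode nor Course alone reaches every attribute, StudentCode + Course is minimal and therefore a candidate key.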
Step 3: identify the type of each functional dependency
Full FD: StudentCode, Course -> DateRegistered
Partial dependency: StudentCode -> Name, Town, Province
Transitive dependency: StudentCode -> Town, Town -> Province (removed in 3NF)
Step 4: remove the partial dependency from the original table (taking StudentCode + Course as the primary key) by moving Name, Town and Province into a separate relation keyed on StudentCode.
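Step 4 amounts to two relational projections. This sketch uses hypothetical registration rows; the relation names `student` and `registration` and the `project` helper are illustrative assumptions, since the resulting tables are not reproduced here.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples (order kept)."""
    seen, result = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(dict(zip(attrs, t)))
    return result

# Hypothetical 1NF rows: S1's details are repeated for each course.
registrations = [
    {"StudentCode": "S1", "Name": "Amal", "Town": "Kandy", "Province": "Central",
     "Course": "DBMS", "DateRegistered": "2024-01-10"},
    {"StudentCode": "S1", "Name": "Amal", "Town": "Kandy", "Province": "Central",
     "Course": "Java", "DateRegistered": "2024-02-01"},
    {"StudentCode": "S2", "Name": "Sita", "Town": "Galle", "Province": "Southern",
     "Course": "DBMS", "DateRegistered": "2024-01-12"},
]

# Attributes that depend on StudentCode alone move to their own relation.
student = project(registrations, ["StudentCode", "Name", "Town", "Province"])
registration = project(registrations, ["StudentCode", "Course", "DateRegistered"])
print(len(student), len(registration))  # 2 3
```

S1's name and town are now stored once instead of once per course.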
Third normal form
To make sure that a relation is in 3NF, we need to make sure that:
The relation is in 1NF and 2NF
No non-primary-key attribute is transitively dependent on the primary key
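Transitive dependency:
A -> B
B -> C
Thus, A -> C (C is transitively dependent on A via B, provided that A is not functionally dependent on B or C)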
In this form, the main task is to remove any transitive dependencies that exist.
Transitive dependency in the student example:
StudentCode -> Town
Town -> Province
Thus, StudentCode -> Province (transitively, via Town)
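Removing the transitive dependency means splitting off Town -> Province. The sketch below uses hypothetical rows; `student_3nf`, `town` and `project` are illustrative names, not tables from the original slides.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples (order kept)."""
    seen, result = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(dict(zip(attrs, t)))
    return result

# Hypothetical 2NF Student relation: Province is repeated for every student in a town.
student = [
    {"StudentCode": "S1", "Name": "Amal", "Town": "Kandy", "Province": "Central"},
    {"StudentCode": "S2", "Name": "Nuwan", "Town": "Kandy", "Province": "Central"},
    {"StudentCode": "S3", "Name": "Sita", "Town": "Galle", "Province": "Southern"},
]

# Split on the transitive dependency Town -> Province.
student_3nf = project(student, ["StudentCode", "Name", "Town"])
town = project(student, ["Town", "Province"])  # Province now stored once per town
print(town)
# [{'Town': 'Kandy', 'Province': 'Central'}, {'Town': 'Galle', 'Province': 'Southern'}]

# Joining back on Town recovers the original rows, so nothing is lost.
province_of = {t["Town"]: t["Province"] for t in town}
rejoined = [dict(s, Province=province_of[s["Town"]]) for s in student_3nf]
print(rejoined == student)  # True
```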
Boyce-Codd normal form
(BCNF/3.5NF)
To make sure that a relation is in BCNF, we need to make sure that:
The relation is in 1NF, 2NF and 3NF
Every determinant in the identified functional dependencies is a candidate key. The formal definition is: a relation is in BCNF if and only if every determinant is a candidate key.
What does this mean exactly?
For example:
Assume that the business rules related to this relation are as follows:
Course has one or more subjects.
Course is managed by one or more lecturers.
Subject is taught by one or more lecturers.
Lecturer teaches only one subject.
Let's list all the functional dependencies:
Course, Subject -> Lecturer
Course, Lecturer -> Subject
Lecturer -> Subject
If we take Course + Subject as the primary key of this table, there is no violation of 1NF, 2NF or 3NF.
Course + Lecturer is also a candidate key, as we can identify tuples uniquely using it.
However, Lecturer is a determinant (Lecturer -> Subject) that cannot be made the primary key, because its values are duplicated across tuples. We therefore have a determinant that is not a candidate key, so the relation violates BCNF.
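The BCNF test can be automated with attribute closure: a nontrivial FD violates BCNF when its determinant's closure does not cover every attribute (that is, the determinant is not a superkey, which is how the candidate-key rule is usually checked in practice). A sketch using the three FDs listed above; `closure` and `bcnf_violations` are illustrative helpers.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Return the nontrivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attrs)
    ]

ATTRS = {"Course", "Subject", "Lecturer"}
FDS = [
    ({"Course", "Subject"}, {"Lecturer"}),
    ({"Course", "Lecturer"}, {"Subject"}),
    ({"Lecturer"}, {"Subject"}),
]

for lhs, rhs in bcnf_violations(ATTRS, FDS):
    print(sorted(lhs), "->", sorted(rhs))
# ['Lecturer'] -> ['Subject']
```

Both composite determinants reach all three attributes, so only Lecturer -> Subject is reported.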
BCNF
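To make the table a BCNF table, we need to decompose it on the violating dependency Lecturer -> Subject:
one relation holding Lecturer and Subject (key: Lecturer)
one relation holding Course and Lecturer (key: Course + Lecturer)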
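A sketch of this decomposition on hypothetical rows that obey the four business rules above (every course, subject and lecturer value is made up). The naive `natural_join` and `as_set` helpers are illustrative; they confirm that joining the two relations back on Lecturer yields exactly the original tuples.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs, dropping duplicate tuples."""
    seen, result = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(dict(zip(attrs, t)))
    return result

def natural_join(left, right):
    """Naive nested-loop natural join on the attributes both relations share."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in set(l) & set(r))
    ]

def as_set(rows):
    """Compare relations as sets of tuples, ignoring row and column order."""
    return {frozenset(row.items()) for row in rows}

# Hypothetical data: each lecturer teaches one subject; Silva appears twice.
teaching = [
    {"Course": "BSc", "Subject": "DB", "Lecturer": "Perera"},
    {"Course": "BSc", "Subject": "Java", "Lecturer": "Silva"},
    {"Course": "HND", "Subject": "DB", "Lecturer": "Fernando"},
    {"Course": "HND", "Subject": "Java", "Lecturer": "Silva"},
]

lecturer_subject = project(teaching, ["Lecturer", "Subject"])  # key: Lecturer
course_lecturer = project(teaching, ["Course", "Lecturer"])    # key: Course + Lecturer

rejoined = natural_join(course_lecturer, lecturer_subject)
print(len(lecturer_subject), len(course_lecturer))  # 3 4
print(as_set(rejoined) == as_set(teaching))         # True: no tuples lost or invented
```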
Fourth Normal Form (4NF)
This normal form handles multi-valued dependencies, which can arise when a relation is converted to 1NF.
When we see repeating groups or multiple values in an intersection, we add additional tuples to remove the multiple values. That is what we do in 1NF.
When there are two independent multi-valued attributes in a relation, each value of one attribute has to be repeated with every value of the other attribute. This situation is referred to as a multi-valued dependency.
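The "every value with every other value" effect is a Cartesian product. A sketch for a single hypothetical customer with two telephones and two addresses (all values made up):

```python
from itertools import product

# Hypothetical customer with two independent multi-valued attributes.
telephones = ["011-111", "011-222"]
addresses = ["Colombo", "Kandy"]

# In 1NF, every telephone must be paired with every address.
rows = [("C1", tel, addr) for tel, addr in product(telephones, addresses)]
for row in rows:
    print(row)
print(len(rows))  # 4 = 2 telephones x 2 addresses

# Adding one telephone forces one new row per existing address.
new_rows = [("C1", "011-333", addr) for addr in addresses]
print(len(new_rows))  # 2
```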
To make sure that a relation is in 4NF, we need to make sure that:
The relation is in Boyce-Codd normal form
It does not contain nontrivial multi-valued dependencies (any multi-valued dependency must be trivial)
What is a trivial dependency?
A -> B is a trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A -> A and B -> B.
For example, consider a table with two columns, Student_id and Student_Name. The following dependencies are trivial:
Student_id -> Student_id
Student_id, Student_Name -> Student_id
Student_Name -> Student_Name
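The triviality rule for FDs is a subset test. For multi-valued dependencies, the standard textbook rule adds a second case: A ->> B is also trivial when A and B together cover every attribute of the relation. A sketch of both; the function names are illustrative.

```python
def is_trivial_fd(lhs, rhs):
    """An FD lhs -> rhs is trivial when rhs is a subset of lhs."""
    return set(rhs) <= set(lhs)

def is_trivial_mvd(lhs, rhs, all_attrs):
    """An MVD lhs ->> rhs is trivial when rhs is a subset of lhs,
    or when lhs and rhs together cover every attribute of the relation."""
    return set(rhs) <= set(lhs) or set(lhs) | set(rhs) == set(all_attrs)

print(is_trivial_fd({"Student_id"}, {"Student_id"}))                  # True
print(is_trivial_fd({"Student_id", "Student_Name"}, {"Student_id"}))  # True
print(is_trivial_fd({"Student_id"}, {"Student_Name"}))                # False

# CustomerCode ->> Telephone is nontrivial in a three-column table...
print(is_trivial_mvd({"CustomerCode"}, {"Telephone"},
                     {"CustomerCode", "Telephone", "Address"}))       # False
# ...but trivial once the table holds only CustomerCode and Telephone.
print(is_trivial_mvd({"CustomerCode"}, {"Telephone"},
                     {"CustomerCode", "Telephone"}))                  # True
```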
For example, if we apply 1NF to this relation:
See the CustomerContacts table.
CustomerCode determines multiple Telephone values (CustomerCode ->> Telephone), and
CustomerCode determines multiple Address values (CustomerCode ->> Address).
These are nontrivial multi-valued dependencies, so the table needs to be decomposed to remove them:
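A sketch of the 4NF decomposition, assuming CustomerContacts holds exactly CustomerCode, Telephone and Address and using made-up values. The relation names `customer_telephone` and `customer_address` are hypothetical; the join check shows that the original rows can be rebuilt, so the split loses nothing.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs, dropping duplicate tuples."""
    seen, result = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(dict(zip(attrs, t)))
    return result

def natural_join(left, right):
    """Naive nested-loop natural join on the attributes both relations share."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in set(l) & set(r))
    ]

def as_set(rows):
    """Compare relations as sets of tuples, ignoring row and column order."""
    return {frozenset(row.items()) for row in rows}

# Hypothetical 1NF CustomerContacts: every telephone paired with every address.
contacts = [
    {"CustomerCode": "C1", "Telephone": "011-111", "Address": "Colombo"},
    {"CustomerCode": "C1", "Telephone": "011-111", "Address": "Kandy"},
    {"CustomerCode": "C1", "Telephone": "011-222", "Address": "Colombo"},
    {"CustomerCode": "C1", "Telephone": "011-222", "Address": "Kandy"},
]

# One relation per independent multi-valued attribute.
customer_telephone = project(contacts, ["CustomerCode", "Telephone"])
customer_address = project(contacts, ["CustomerCode", "Address"])
print(len(customer_telephone), len(customer_address))  # 2 2

rejoined = natural_join(customer_telephone, customer_address)
print(as_set(rejoined) == as_set(contacts))  # True
```

In each new relation the remaining multi-valued dependency covers every attribute, so it is trivial and the relations satisfy 4NF.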