DATABASE SYSTEMS 6th Edition
By Thomas Connolly & Carolyn Begg
Video Link: https://youtu.be/UX6aqgfEiSw
Chapter 14 - Objectives
What is Normalization
Purpose of normalization.
Characteristics of a suitable set of relations.
Data Redundancy
Data Redundancy and Update Anomalies
Insertion Anomaly
Deletion Anomaly
Modification Anomaly
Normalization
Normalization is a database design technique that organizes tables in a
manner that reduces data redundancy and undesirable dependencies among data.
It is a technique for producing a set of relations with desirable properties, given the
data requirements.
The main objective in developing a logical data model for relational database systems
is to create an accurate representation of the data, its relationships, and
constraints.
To achieve this objective, we must identify a suitable set of relations.
Characteristics of a Suitable Set of Relations
1. The minimal number of attributes necessary to support the data requirements
of the enterprise.
2. Attributes with close logical relationship (described as functional
dependency) are found in the same relation.
3. Minimal redundancy, with each attribute represented only once, with the
important exception of attributes that form all or part of foreign keys, which
are essential for the joining of related relations.
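The idea of a functional dependency in point 2 can be made concrete with a small check. The following is a minimal sketch, not a definitive implementation: the helper name `holds` is hypothetical, the attribute names are borrowed from the StaffBranch relation used later in this chapter, and the staff numbers and the second address are placeholder values. It tests whether a dependency is violated in one particular set of rows; it cannot prove that the dependency holds for every possible state of the relation.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True


# Placeholder rows (assumed for illustration only)
rows = [
    {"staffNo": "S1", "branchNo": "B003", "bAddress": "163 Main St, Glasgow"},
    {"staffNo": "S2", "branchNo": "B003", "bAddress": "163 Main St, Glasgow"},
    {"staffNo": "S3", "branchNo": "B007", "bAddress": "1 Example Rd, Aberdeen"},
]

fd_branch_address = holds(rows, ["branchNo"], ["bAddress"])  # branchNo -> bAddress
fd_branch_staff = holds(rows, ["branchNo"], ["staffNo"])     # branchNo -> staffNo
print(fd_branch_address, fd_branch_staff)
```

Because branchNo determines bAddress but not staffNo in these rows, the branch address belongs with branchNo in its own relation rather than alongside every staff member.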
Benefits
The benefits of using a database that has a suitable set of relations are:
The database is easier for users to access.
The data is easier to maintain.
The database takes up minimal storage space on the computer.
Normalization
The four most commonly used normal forms are first (1NF), second (2NF), and third (3NF)
normal forms, and Boyce–Codd normal form (BCNF).
All four are based on functional dependencies among the attributes of a relation.
The inventor of the relational model, Edgar Codd, proposed the theory of normalization
with the introduction of First Normal Form, and he continued to extend the theory with
Second and Third Normal Form. Later, he joined Raymond F. Boyce to develop the
theory of Boyce–Codd Normal Form.
A relation can be normalized to a specific form to prevent the possible occurrence of data
redundancy and update anomalies.
Data Redundancy
A major aim of relational database design is to group attributes into relations so as to
minimize data redundancy and reduce the file storage space required by the base relations.
Problems associated with data redundancy are illustrated by comparing the
following Staff and Branch relations with the StaffBranch relation.
Staff (staffNo, sName, position, salary, branchNo)
Branch (branchNo, bAddress)
StaffBranch (staffNo, sName, position, salary, branchNo, bAddress)
The StaffBranch relation has redundant data: the details of a branch are repeated for every
member of staff located at that branch.
In contrast, the branch information appears only once for each branch in the Branch
relation, and only branchNo is repeated in the Staff relation, to represent where each
member of staff works.
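The redundancy can be counted directly. The sketch below is illustrative only: the rows are reduced to three attributes, and apart from SA9, B003, B005, B007, and the B003 address, the staff numbers and addresses are placeholder values rather than data taken from the source.

```python
from collections import Counter

# Simplified StaffBranch rows: (staffNo, branchNo, bAddress). Placeholder data.
staff_branch = [
    ("SL21", "B005", "22 Deer Rd, London"),
    ("SG37", "B003", "163 Main St, Glasgow"),
    ("SG14", "B003", "163 Main St, Glasgow"),
    ("SA9", "B007", "16 Argyll St, Aberdeen"),
    ("SG5", "B003", "163 Main St, Glasgow"),
    ("SL41", "B005", "22 Deer Rd, London"),
]

# In StaffBranch, each address is stored once per member of staff.
address_copies = Counter(branch_no for _, branch_no, _ in staff_branch)

# Decomposition: Staff keeps only branchNo, Branch stores each address once.
staff = [(staff_no, branch_no) for staff_no, branch_no, _ in staff_branch]
branch = {branch_no: address for _, branch_no, address in staff_branch}

print(address_copies["B003"], len(branch))
```

In this sample the B003 address is stored three times in StaffBranch but only once in Branch, while Staff still records where each person works through branchNo.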
Update Anomalies
Relations that contain redundant information may suffer from
update anomalies.
Types of update anomalies include:
Insertion
Deletion
Modification.
Insertion Anomalies
First case: a new member of staff joins branch B005.
We insert a new row into the StaffBranch table.
Suppose the branch address is typed incorrectly as 163 Main St, Glasgow, so it no longer matches the other B005 rows.
The database is now inconsistent!
Second case: a new branch is established with no members of staff.
The new branch is B008, 57 Princes St, Edinburgh.
There are no staff members, so staffNo must be NULL.
But staffNo is the primary key of the StaffBranch table, so it cannot be NULL!
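Both insertion problems can be reproduced with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table is cut down to three columns, the existing B005 row (staff number SL21 and its address) and the new staff number SX99 are placeholder values, and `NOT NULL` is written out explicitly because SQLite otherwise tolerates NULL in a non-INTEGER primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StaffBranch ("
    " staffNo TEXT PRIMARY KEY NOT NULL,"
    " branchNo TEXT NOT NULL,"
    " bAddress TEXT NOT NULL)"
)
# Existing row for branch B005 (placeholder values).
conn.execute("INSERT INTO StaffBranch VALUES ('SL21', 'B005', '22 Deer Rd, London')")

# First case: new staff member at B005, with the address mistyped.
conn.execute("INSERT INTO StaffBranch VALUES ('SX99', 'B005', '163 Main St, Glasgow')")
b005_addresses = conn.execute(
    "SELECT COUNT(DISTINCT bAddress) FROM StaffBranch WHERE branchNo = 'B005'"
).fetchone()[0]

# Second case: new branch B008 with no staff, so staffNo would have to be NULL.
try:
    conn.execute(
        "INSERT INTO StaffBranch VALUES (NULL, 'B008', '57 Princes St, Edinburgh')"
    )
    branch_without_staff_inserted = True
except sqlite3.IntegrityError:
    # The NOT NULL primary key rejects the row.
    branch_without_staff_inserted = False

print(b005_addresses, branch_without_staff_inserted)
```

Nothing in the single-table design stops B005 from ending up with two different addresses, and the branch with no staff cannot be recorded at all.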
Deletion Anomaly
Mary Howe, staffNo SA9, leaves the company.
We delete the appropriate row of StaffBranch.
This also deletes the details of branch B007, where Mary Howe works.
But no one else works at branch B007, so we no longer know the address of this branch!
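The loss of branch information can be sketched with `sqlite3`, comparing the single StaffBranch table against separate Staff and Branch tables. The tables are simplified, and the B007 address, the second staff row, and the helper name `address_of` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StaffBranch (
        staffNo TEXT PRIMARY KEY NOT NULL, sName TEXT,
        branchNo TEXT, bAddress TEXT);
    INSERT INTO StaffBranch VALUES ('SA9', 'Mary Howe', 'B007', '16 Argyll St, Aberdeen');
    INSERT INTO StaffBranch VALUES ('SG37', 'Placeholder Name', 'B003', '163 Main St, Glasgow');

    CREATE TABLE Branch (branchNo TEXT PRIMARY KEY NOT NULL, bAddress TEXT);
    CREATE TABLE Staff (
        staffNo TEXT PRIMARY KEY NOT NULL, sName TEXT,
        branchNo TEXT REFERENCES Branch(branchNo));
    INSERT INTO Branch VALUES ('B007', '16 Argyll St, Aberdeen');
    INSERT INTO Staff VALUES ('SA9', 'Mary Howe', 'B007');
""")


def address_of(table, branch_no):
    """Return the stored address for branch_no in table, or None if absent."""
    row = conn.execute(
        f"SELECT bAddress FROM {table} WHERE branchNo = ?", (branch_no,)
    ).fetchone()
    return row[0] if row else None


# Single table: deleting Mary Howe's row removes the only copy of B007's address.
conn.execute("DELETE FROM StaffBranch WHERE staffNo = 'SA9'")
address_in_staff_branch = address_of("StaffBranch", "B007")

# Decomposed tables: deleting from Staff leaves Branch untouched.
conn.execute("DELETE FROM Staff WHERE staffNo = 'SA9'")
address_in_branch = address_of("Branch", "B007")

print(address_in_staff_branch, address_in_branch)
```

After the deletion, the address of B007 can no longer be found in StaffBranch, whereas the Branch table still holds it.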
Modification Anomaly
Branch B003 has transferred to a new location.
The new address is 145 Main St, Glasgow.
We must change all three B003 rows of the StaffBranch relation; if any row is missed, the database becomes inconsistent.
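The cost of the address change can be measured by counting the rows each design has to update. A minimal sketch using `sqlite3`, with simplified tables; the staff numbers other than SA9 and the B007 address are placeholder values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StaffBranch (
        staffNo TEXT PRIMARY KEY NOT NULL, branchNo TEXT, bAddress TEXT);
    INSERT INTO StaffBranch VALUES ('SG37', 'B003', '163 Main St, Glasgow');
    INSERT INTO StaffBranch VALUES ('SG14', 'B003', '163 Main St, Glasgow');
    INSERT INTO StaffBranch VALUES ('SG5',  'B003', '163 Main St, Glasgow');
    INSERT INTO StaffBranch VALUES ('SA9',  'B007', '16 Argyll St, Aberdeen');

    CREATE TABLE Branch (branchNo TEXT PRIMARY KEY NOT NULL, bAddress TEXT);
    INSERT INTO Branch VALUES ('B003', '163 Main St, Glasgow');
    INSERT INTO Branch VALUES ('B007', '16 Argyll St, Aberdeen');
""")

new_address = "145 Main St, Glasgow"

# Single table: every staff row for B003 carries its own copy of the address.
rows_changed_single = conn.execute(
    "UPDATE StaffBranch SET bAddress = ? WHERE branchNo = ?", (new_address, "B003")
).rowcount

# Decomposed design: the address lives in exactly one Branch row.
rows_changed_decomposed = conn.execute(
    "UPDATE Branch SET bAddress = ? WHERE branchNo = ?", (new_address, "B003")
).rowcount

print(rows_changed_single, rows_changed_decomposed)
```

With the address stored once, there is no way for some copies to be updated while others are left behind.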
We can avoid these anomalies by decomposing the original relation into Staff
and Branch.
There are two important properties associated with the decomposition of a larger
relation into smaller ones: the lossless-join property and the dependency-preservation property.
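The lossless-join property can be checked on sample data by projecting a relation onto the smaller schemas and joining the pieces back together. The sketch below is illustrative: the helper names `project`, `natural_join`, and `same_relation` are hypothetical, the rows are simplified, and the position values and B007 address are placeholders. It contrasts the decomposition into Staff and Branch with a deliberately poor split on position. Dependency preservation is not checked here.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate tuples removed."""
    result = []
    for row in rows:
        piece = {a: row[a] for a in attrs}
        if piece not in result:
            result.append(piece)
    return result


def natural_join(left, right):
    """Join rows that agree on every attribute the two sides share."""
    joined = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                joined.append({**l_row, **r_row})
    return joined


def same_relation(a, b):
    """Compare two relations as sets of tuples, ignoring row order."""
    def canon(rows):
        return {tuple(sorted(r.items())) for r in rows}
    return canon(a) == canon(b)


# Simplified StaffBranch rows (placeholder data).
staff_branch = [
    {"staffNo": "SG37", "position": "Assistant", "branchNo": "B003",
     "bAddress": "163 Main St, Glasgow"},
    {"staffNo": "SG14", "position": "Supervisor", "branchNo": "B003",
     "bAddress": "163 Main St, Glasgow"},
    {"staffNo": "SA9", "position": "Assistant", "branchNo": "B007",
     "bAddress": "16 Argyll St, Aberdeen"},
]

# Decomposition into Staff and Branch, sharing branchNo.
staff = project(staff_branch, ["staffNo", "position", "branchNo"])
branch = project(staff_branch, ["branchNo", "bAddress"])
lossless = same_relation(natural_join(staff, branch), staff_branch)

# A poor decomposition sharing only position, which is not a key of either part.
left = project(staff_branch, ["staffNo", "position"])
right = project(staff_branch, ["position", "branchNo", "bAddress"])
rejoined = natural_join(left, right)
lossy_matches_original = same_relation(rejoined, staff_branch)

print(lossless, lossy_matches_original, len(rejoined))
```

Joining on branchNo reproduces exactly the original rows, while joining on position produces spurious rows that pair staff with branches they do not work at, so the original information cannot be recovered.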