BTM 382 Database Management
Chapter 6:
Normalization of Database Tables
Chitu Okoli
Associate Professor in Business Technology Management
John Molson School of Business, Concordia University, Montréal
Structure of BTM 382 Database Management
Week 1: Introduction and overview
ch1: Introduction
Weeks 2-6: Database design
ch3: Relational model
ch4: ER modeling
ch6: Normalization
ERD modeling exercise
ch5: Advanced data modeling
Week 7: Midterm exam
Weeks 8-10: Database programming
ch7: Intro to SQL
ch8: Advanced SQL
SQL exercises
Weeks 11-13: Database management
ch2,12,14: Data models
ch13: Business intelligence and data warehousing
ch9,15,16: Selected managerial topics
Review of Chapter 6:
Normalization of Database Tables
What are dependencies between attributes in a table,
and how is tracing of dependencies used to
normalize tables?
What are the normal forms in a relational database?
Why and when would you consider denormalizing
tables in a relational database?
Problems with unnormalized tables
Needless redundancy, hence insert, update and
delete anomalies (inconsistencies)
Data updates are less efficient because tables are
larger
Indexing is less effective
Views (virtual tables) are more cumbersome
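To make the update and delete anomalies concrete, here is a minimal sketch in plain Python. It assumes a hypothetical unnormalized table in which each project's name is repeated on every employee row. The helper names (`rename_project_first_row_only`, `delete_employee`, `names_for`) and all data values are invented for illustration.

```python
# Hypothetical unnormalized table: the project name is repeated on every employee row.
rows = [
    {"proj_id": 1, "proj_name": "Evergreen", "emp_id": 101},
    {"proj_id": 1, "proj_name": "Evergreen", "emp_id": 102},
    {"proj_id": 2, "proj_name": "Rolling Tide", "emp_id": 103},
]

def rename_project_first_row_only(table, proj_id, new_name):
    """Simulate a careless update that changes only the first matching row."""
    for row in table:
        if row["proj_id"] == proj_id:
            row["proj_name"] = new_name
            break  # stops early: the other copies keep the old name

def names_for(table, proj_id):
    """Collect every distinct name stored for one project."""
    return {row["proj_name"] for row in table if row["proj_id"] == proj_id}

def delete_employee(table, emp_id):
    """Return the table without the given employee's rows."""
    return [row for row in table if row["emp_id"] != emp_id]

# Update anomaly: project 1 now has two conflicting names.
rename_project_first_row_only(rows, 1, "Evergreen II")
print(names_for(rows, 1))

# Delete anomaly: removing the only employee on project 2 erases the project itself.
after_delete = delete_employee(rows, 103)
print(names_for(after_delete, 2))  # set()
```

In a normalized design the project name would be stored once in a PROJECT table, so neither problem could arise.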
Understanding dependencies to be able to properly normalize tables
Functional dependency
Functional dependency: A→B or (A,B)→(C,D)
B is functionally dependent on A means that if you know A, then you definitely know the correct value for B
E.g. Project.ID→Project.Name
Also called determination: A determines B
Full functional dependency: (A,B)→C where A↛C and B↛C
When all the attributes in a key are required for the determination (none is optional)
E.g. (Project.ID, Project.Manager)→Project.Name
Project.Manager is optional, so this is not a full functional dependency
E.g. (Project.Manager, Project.StartDate)→Project.Name
This is a full functional dependency, assuming a manager can launch no more than one project on a given date
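The two checks above can be sketched in Python against sample rows. This is only an illustration under stated assumptions: the attribute names and values below are hypothetical, and sample data can only refute a dependency, never prove it (the real dependencies come from business rules, such as "a manager launches no more than one project on a given date"). `holds` tests A→B on the rows; `is_full` additionally checks that no smaller part of the determinant is enough.

```python
from itertools import combinations

# Hypothetical sample rows for a PROJECT table (values invented for illustration).
projects = [
    {"id": 1, "manager": "Ana", "start": "2024-01-08", "name": "Evergreen"},
    {"id": 2, "manager": "Ana", "start": "2024-03-04", "name": "Rolling Tide"},
    {"id": 3, "manager": "Ben", "start": "2024-01-08", "name": "Starflight"},
]

def holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but disagree on rhs (lhs -> rhs is not refuted)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_full(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs also determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True

print(is_full(projects, ("id", "manager"), ("name",)))     # False: id alone is enough
print(is_full(projects, ("manager", "start"), ("name",)))  # True in this sample
```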
Repeating group = multivalued attribute
Attribute whose values contain multiple values (a list
or array of values), instead of a single value
Illegal in the relational model; troublesome for normalization if you don't catch it
Two possible solutions
(e.g. Project.ID→Project.Location):
1. Create multiple attributes for each possible value (e.g.
Project.Location1, Project.Location2, Project.Location3)
2. Create a new entity to store multiple possible values (e.g.
Location)
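Both solutions can be sketched on a single hypothetical project record whose `locations` field is a repeating group. The field names, the `MAX_LOCATIONS` bound and the function names are assumptions made for illustration only.

```python
# Hypothetical repeating group: one project row holding a list of locations.
project = {"proj_id": 7, "name": "Evergreen", "locations": ["Montreal", "Toronto"]}

MAX_LOCATIONS = 3  # assumption for solution 1: a fixed upper bound on locations

def option1_columns(p, max_locs=MAX_LOCATIONS):
    """Solution 1: spread the list over location1..locationN, padding with None."""
    if len(p["locations"]) > max_locs:
        raise ValueError("more locations than available columns")
    padded = p["locations"] + [None] * (max_locs - len(p["locations"]))
    row = {"proj_id": p["proj_id"], "name": p["name"]}
    for i, loc in enumerate(padded, start=1):
        row[f"location{i}"] = loc
    return row

def option2_new_entity(p):
    """Solution 2: one PROJECT row plus one LOCATION row per value, linked by proj_id."""
    project_row = {"proj_id": p["proj_id"], "name": p["name"]}
    location_rows = [{"proj_id": p["proj_id"], "location": loc} for loc in p["locations"]]
    return project_row, location_rows

print(option1_columns(project))
print(option2_new_entity(project))
```

Solution 1 caps the number of values and leaves empty (null) columns; solution 2 handles any number of locations, which is why it is usually preferred.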
Multivalued dependency
Functional dependency: A→B
Multivalued dependency: A↠B | C
A determines B and A determines C, but B and C have
nothing to do with each other
E.g. Project.ID↠Project.EmployeeID | Project.Location
Since a project might have multiple locations and multiple
employees work on a project, the EmployeeID and Location in
the same row might have nothing to do with each other
Usually indicates that one or more multivalued attributes
were not handled properly
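One way to see a multivalued dependency in stored rows is this: within each project, the (employee, location) pairs must be every employee combined with every location, because the two lists are independent. The sketch below checks that condition on hypothetical (project, employee, location) tuples; `mvd_holds` and the data are illustrative assumptions, not a standard API.

```python
from collections import defaultdict
from itertools import product

def mvd_holds(rows):
    """Check ProjID ->> EmpID | Location on (proj, emp, loc) tuples: within each
    project, the stored pairs must be exactly every employee times every location."""
    groups = defaultdict(set)
    for proj, emp, loc in rows:
        groups[proj].add((emp, loc))
    for pairs in groups.values():
        emps = {e for e, _ in pairs}
        locs = {l for _, l in pairs}
        if pairs != set(product(emps, locs)):
            return False
    return True

# Hypothetical data: 3 employees x 2 locations forces 6 rows for a single project.
full = [("P1", e, l) for e, l in product(["E1", "E2", "E3"], ["Montreal", "Toronto"])]
print(len(full), mvd_holds(full))  # 6 True

# Dropping one row breaks the pattern and implies a false link between employee and location.
print(mvd_holds(full[:-1]))        # False
```

The row count grows as (number of employees) × (number of locations), which is the redundancy the slide warns about.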
Partial and transitive dependencies
Partial dependency: (A,B)→(C,D) and B→C
(A,B) is a candidate key (e.g. primary key)
C doesn't need both A and B to determine it; it only needs B
E.g. (Project.ID, Project.ManagerID)→Project.Name
and Project.ID→Project.Name
Transitive dependency: A→(B,C) and B→C
A is a candidate key
Technically speaking, a transitive dependency requires that B and C not be part of
any candidate key. However, if you expand the meaning to include cases where they
are part of the key, then you will automatically avoid problems with BCNF
A determines C, but so does B, even though B is not a candidate
key
E.g. Project.ID→(Project.Client, Project.Location)
and Project.Client→Project.Location
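The distinction can be sketched as a small classifier over a list of dependencies and a chosen primary key. This is a hypothetical ASSIGNMENT example (attribute names invented for illustration), and it follows the slide's expanded meaning: any determinant that does not contain the whole key is labelled "transitive", even if it overlaps part of the key.

```python
def fd(lhs, rhs):
    """Build a functional dependency lhs -> rhs as a pair of frozensets."""
    return frozenset(lhs), frozenset(rhs)

def classify(key, fds):
    """Label each dependency relative to the primary key `key`."""
    labels = []
    for lhs, rhs in fds:
        if key <= lhs:
            labels.append("key")         # the whole key is the determinant
        elif lhs < key:
            labels.append("partial")     # a proper part of the key is enough
        else:
            labels.append("transitive")  # a non-key determinant (slide's expanded meaning)
    return labels

# Hypothetical ASSIGNMENT(ProjID, EmpID, ProjName, JobClass, ChgHour, Hours).
KEY = frozenset({"ProjID", "EmpID"})
FDS = [
    fd({"ProjID", "EmpID"}, {"Hours"}),
    fd({"ProjID"}, {"ProjName"}),
    fd({"EmpID"}, {"JobClass"}),
    fd({"JobClass"}, {"ChgHour"}),
]
print(classify(KEY, FDS))  # ['key', 'partial', 'partial', 'transitive']
```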
The normal forms
Normalization of relations: https://youtu.be/NwcVv1cxflk
(note at 0:34 and 1:50)
Summary of attaining normal forms
1NF: Primary key identified and no multivalued attributes
Legitimate primary key selected (unique identifying key)
Only one value per table cell; no lists/arrays (multivalued attributes) in any table cell
If you split multivalued attributes off to separate tables, then you avoid 4NF violations
2NF: 1NF minus partial dependencies
All candidate key dependencies are fully functional
(A,B)→C where A↛C and B↛C
3NF/BCNF: 2NF minus transitive dependencies
Only a candidate key determines any attribute
If A→(B,C), then B↛C
There is a technical distinction between 3NF and BCNF, but if you keep this rule, then you
take care of both 3NF and BCNF
4NF: BCNF minus multivalued dependencies
Each row strictly describes just one entity
If you split multivalued attributes into separate tables to attain 1NF, then you also avoid
4NF violations
DKNF, 5NF, 6NF
Relatively rare and often not worth the trouble of normalizing, even if applicable
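The 3NF/BCNF rule "only a candidate key determines any attribute" can be checked mechanically with an attribute closure: a non-trivial dependency X→Y violates BCNF when the closure of X does not reach every attribute of the table (X is not a superkey). The sketch below applies this to the slide's Project.ID / Client / Location example; the function names and the frozenset representation are assumptions for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by attrs, applying the FDs until nothing new is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant does not determine every attribute (not a superkey)."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]

# Slide example: Project.ID -> (Client, Location) and Client -> Location.
project_attrs = {"ID", "Client", "Location"}
project_fds = [
    (frozenset({"ID"}), frozenset({"Client", "Location"})),
    (frozenset({"Client"}), frozenset({"Location"})),
]
print(bcnf_violations(project_attrs, project_fds))  # only Client -> Location is reported

# After splitting on the offending determinant, both tables pass.
print(bcnf_violations({"ID", "Client"}, [(frozenset({"ID"}), frozenset({"Client"}))]))
print(bcnf_violations({"Client", "Location"}, [(frozenset({"Client"}), frozenset({"Location"}))]))
```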
Dependency diagram:
Basic tool for normalization
Depicts all dependencies found in a given table structure
Gives a bird's-eye view of all relationships among a table's attributes
Makes it less likely that you will overlook an important
dependency
3NF vs BCNF
BCNF is only an issue because of poor selection of the primary key in the 1NF step
Regardless, dealing with all dependencies resolves the table into BCNF
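"Dealing with all dependencies" is often done by reading the dependency diagram and giving each determinant its own table, with the determinant as primary key. The sketch below is a simplified classroom version of that idea, not the general BCNF decomposition algorithm. It assumes each dependency is recorded with its most direct determinant (e.g. EmpID→JobClass rather than EmpID→(JobClass, ChgHour)); the ASSIGNMENT attributes are hypothetical.

```python
from collections import defaultdict

def split_by_determinant(fds):
    """Map each determinant (as a sorted tuple, the new table's PK) to the
    sorted non-key attributes it determines."""
    tables = defaultdict(set)
    for lhs, rhs in fds:
        tables[tuple(sorted(lhs))] |= set(rhs) - set(lhs)
    return {pk: sorted(cols) for pk, cols in tables.items()}

# Hypothetical ASSIGNMENT dependencies, each recorded with its most direct determinant.
assignment_fds = [
    ({"ProjID", "EmpID"}, {"Hours"}),
    ({"ProjID"}, {"ProjName"}),
    ({"EmpID"}, {"JobClass"}),
    ({"JobClass"}, {"ChgHour"}),
]

for pk, cols in split_by_determinant(assignment_fds).items():
    print("PK", pk, "->", cols)
```

The composite-key table keeps ProjID and EmpID as foreign keys to the PROJECT and EMPLOYEE tables, and EMPLOYEE keeps JobClass as a foreign key to the JOB table, so the original rows can still be reassembled with joins.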
Fixing the 4NF problem
The only reason a table might be in 3NF/BCNF but not in 4NF is that two originally multivalued attributes existed at the 1NF stage
Multivalued attributes should always be placed in separate tables, or be split into multiple attributes
If you do this in the first step to resolve 1NF, you will never have problems with 4NF
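Placing each multivalued attribute in its own table can be sketched with Python sets standing in for tables. The data is hypothetical. The point is that the two smaller tables lose no information: a natural join on the project ID rebuilds exactly the original rows, with fewer stored rows.

```python
from itertools import product

# Hypothetical table in BCNF but not 4NF: two independent multivalued facts per project.
combined = {("P1", e, l) for e, l in product(["E1", "E2", "E3"], ["Montreal", "Toronto", "Ottawa"])}

# 4NF fix: one table per multivalued attribute.
proj_emp = {(p, e) for p, e, _ in combined}
proj_loc = {(p, l) for p, _, l in combined}

# A natural join on the project ID rebuilds exactly the original rows (lossless).
rejoined = {(p1, e, l) for p1, e in proj_emp for p2, l in proj_loc if p1 == p2}
print(rejoined == combined)                                         # True
print(len(proj_emp) + len(proj_loc), "rows instead of", len(combined))  # 6 rows instead of 9
```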
Denormalization
Although normalization is important, processing speed
and efficiency are also important in database design
Summary of Chapter 6:
Normalization of Database Tables
Correctly identifying dependencies from the very
beginning is critical to properly normalize tables.
The most important normal forms are 1NF, 2NF, 3NF,
BCNF and 4NF.
Although normalization to 4NF is usually important, a
designer might sometimes want to denormalize a
table to a lower normal form.
Sources
Most of the slides are adapted from
Database Systems: Design, Implementation and Management
by Carlos Coronel and Steven Morris, 11th edition (2015), published by Cengage Learning. ISBN-13: 978-1-285-19614-5
Other sources are noted on the slides themselves