Normalization
A technique for identifying
table structures that have
potential maintenance
problems
Normalization
Normalization removes duplication and
minimizes redundant chunks of data.
Normalization is a set of formal conditions
that assure that a database is maintainable.
The results of a well-executed normalization
process are the same as those of a
well-planned E-R model.
PROCESS OF DATA
NORMALIZATION
ELIMINATE REPEATING GROUPS
Make a separate table for each set of related
attributes and give each table a primary key.
ELIMINATE REDUNDANT DATA
If an attribute depends on only part of a
composite (multi-attribute) key, remove it to a separate table.
ELIMINATE COLUMNS NOT DEPENDENT ON KEY
If attributes do not contribute to a description of
the key, remove them to a separate table.
Database Programming and Design
PROCESS OF DATA
NORMALIZATION
ISOLATE INDEPENDENT MULTIPLE RELATIONSHIPS
No table may contain two or more 1:n or n:m
relationships that are not directly related.
ISOLATE SEMANTICALLY RELATED MULTIPLE
RELATIONSHIPS
There may be practical constraints on information
that justify separating logically related many-to-many relationships.
Anomalies
An anomaly is essentially an
erroneous change to data, more
specifically to a single record.
Anomalies
A table anomaly is a structure for which
a normal database operation cannot
be executed without information loss
or full search of the data table
Insertion Anomaly
Deletion Anomaly
Update or Modification Anomaly
Anomalies
Insert anomaly: Caused when a record is
added to a detail table, with no related
record existing in a master table.
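The insert anomaly above can be sketched with Python's standard sqlite3 module. The author and book tables and their columns are illustrative assumptions, not taken from the slides. With foreign key enforcement switched on, the detail record is rejected until its master record exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default
conn.execute("CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE book ("
    " book_id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES author(author_id),"
    " title TEXT NOT NULL)"
)

def try_insert_book(book_id, author_id, title):
    """Return True if the detail record was accepted, False if it was rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO book VALUES (?, ?, ?)", (book_id, author_id, title))
        return True
    except sqlite3.IntegrityError:
        return False

# No author 99 exists yet, so this detail record would be an orphan.
orphan_accepted = try_insert_book(1, 99, "Hypothetical Title")

# Once the master record exists, the same insert succeeds.
with conn:
    conn.execute("INSERT INTO author VALUES (99, 'Hypothetical Author')")
child_accepted = try_insert_book(1, 99, "Hypothetical Title")

print(orphan_accepted, child_accepted)
```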
Anomalies
Delete anomaly: Caused when a record
is deleted from a master table, without
first deleting all child records in a
detail table. The exception is a cascade
deletion, occurring when deletion of a
master record automatically deletes all
child records in all related detail tables,
before deleting the parent record in the
master table.
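A minimal sketch of a cascade deletion, again using sqlite3 with hypothetical author and book tables. The ON DELETE CASCADE clause asks the database to remove the child records when the master record is deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for the cascade to fire in SQLite
conn.executescript("""
CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (
    book_id   INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL
        REFERENCES author(author_id) ON DELETE CASCADE,
    title     TEXT
);
INSERT INTO author VALUES (1, 'Author A');
INSERT INTO author VALUES (2, 'Author B');
INSERT INTO book VALUES (10, 1, 'Title X');
INSERT INTO book VALUES (11, 1, 'Title Y');
INSERT INTO book VALUES (12, 2, 'Title Z');
""")

# Deleting the master record also deletes its two child records.
with conn:
    conn.execute("DELETE FROM author WHERE author_id = 1")

remaining_books = conn.execute("SELECT book_id FROM book ORDER BY book_id").fetchall()
print(remaining_books)
```

Without the cascade clause, the same DELETE would fail with an integrity error rather than leave orphaned book records.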
Anomalies
Update anomaly: This anomaly is
similar to deletion in that both
master and detail records must be
updated to avoid orphaned detail
records. When cascading, ensure
that any primary key updates are
propagated to related child table
foreign keys.
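Propagation of a primary key update can be sketched the same way. The publisher and edition tables and the key values here are assumptions for illustration only. ON UPDATE CASCADE rewrites the child foreign keys when the parent key changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE publisher (publisher_code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE edition (
    isbn           TEXT PRIMARY KEY,
    publisher_code TEXT NOT NULL
        REFERENCES publisher(publisher_code) ON UPDATE CASCADE
);
INSERT INTO publisher VALUES ('P1', 'Hypothetical Press');
INSERT INTO edition VALUES ('isbn-a', 'P1');
INSERT INTO edition VALUES ('isbn-b', 'P1');
""")

# Changing the master key is propagated to every child foreign key.
with conn:
    conn.execute("UPDATE publisher SET publisher_code = 'P2' WHERE publisher_code = 'P1'")

child_keys = [row[0] for row in conn.execute(
    "SELECT publisher_code FROM edition ORDER BY isbn")]
print(child_keys)
```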
Normal Forms
Relational theory defines a number of
structure conditions called Normal
Forms that assure that certain data
anomalies do not occur in a
database.
Normal Forms
1st Normal Form (1NF): Eliminate
repeating groups such that all records
in all tables can be identified uniquely
by a primary key in each table. In
other words, all fields other than the
primary key must depend on the
primary key.
Normal Forms
2nd Normal Form (2NF): All non-key
values must be fully functionally
dependent on the
primary key. No partial dependencies
are allowed. A partial dependency
exists when a field is
fully dependent on a part of a
composite primary key.
Normal Forms
3rd Normal Form (3NF): Eliminate
transitive dependencies, meaning that
a field is indirectly
determined by the primary key. This is
because the field is functionally
dependent on another
field, whereas the other field is
dependent on the primary key.
Normal Forms
Boyce-Codd Normal Form (BCNF)
Every determinant in a table is a
candidate key. If there is
only one candidate key, 3NF and BCNF
are one and the same.
Normal Forms
4th Normal Form (4NF): Eliminate
multiple sets of multivalued
dependencies.
Normal Forms
5th Normal Form (5NF): Eliminate
cyclic dependencies. 5NF is also
known as Project-Join
Normal Form (PJNF).
Normal Forms
Domain Key Normal Form (DKNF)
DKNF is the ultimate application of
normalization and is
more a measurement of conceptual
state, as opposed to a transformation
process in itself.
Normal Forms
Normal form   Condition
1NF           Keys; no repeating groups
2NF           No partial dependencies
3NF           No transitive dependencies
BCNF          Determinants are candidate keys
4NF           No multivalued dependencies
1st Normal Form (1NF)
Eliminates repeating groups.
Defines primary keys.
All records must be identified uniquely with a primary key. A
primary key is unique and thus
no duplicate values are allowed.
All fields other than the primary key must depend on the
primary key, either directly or
indirectly.
All fields must contain a single value.
All values in each field must be of the same datatype.
Create a new table to move the repeating groups from the
original table.
A table in 0th Normal Form.
Detail of the AUTHORSBOOKS table shown in Figure 4-7 demonstrates that leaving a table with no Normal Forms applied at all is completely silly.
Comma-delimited lists are another common method of displaying 0th Normal Form data, including repeated groups.
Applying 1NF removes repeating fields by creating a new table, where the original and new tables are linked in a master-detail, one-to-many relationship.
Primary keys are created on both tables, where the detail table has a composite primary key. The composite primary key contains the master table primary key field as the prefix field of its primary key. Therefore, the prefix field AUTHOR on the BOOK table is the foreign key pointing back to the master table AUTHOR.
The data in the altered AUTHOR table and the new BOOK table, previously the AUTHORSBOOKS table.
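As a rough sketch of the 1NF step described above, the following Python splits a comma-delimited repeating group into a master AUTHOR table and a detail BOOK table. The sample rows and field names are hypothetical; the figures' actual data is not reproduced here.

```python
# Hypothetical 0NF rows: one author per row, books packed into a comma-delimited field.
authorsbooks = [
    {"author": "Author A", "books": "Title 1, Title 2, Title 3"},
    {"author": "Author B", "books": "Title 4"},
]

def to_first_normal_form(rows):
    """Split the repeating group into a master AUTHOR table and a detail BOOK table."""
    author_table = []
    book_table = []
    for row in rows:
        author_table.append({"author": row["author"]})
        for title in row["books"].split(","):
            # Detail table composite primary key: (author, title).
            # The author prefix is also the foreign key back to AUTHOR.
            book_table.append({"author": row["author"], "title": title.strip()})
    return author_table, book_table

author_table, book_table = to_first_normal_form(authorsbooks)

for book in book_table:
    print(book["author"], "|", book["title"])
```

Each field now holds a single value, and every BOOK row is identified by the combination of author and title.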
2nd Normal Form (2NF)
The table must be in 1NF.
All non-key values must be fully functionally dependent on the primary
key. In other words,
non-key fields not completely and individually dependent on the primary
key are not allowed.
Partial dependencies must be removed. A partial dependency is a
special type of functional dependency that exists when a field is fully
dependent on a part of a composite primary key.
Stating the previous two points in a different way, remove fields that are
independent of the primary key.
Create a new table to separate the partially dependent part of the
primary key and its dependent
fields.
The BOOK table is in 1NF after separation of the repeating group of books from the authors. The publisher and subject information are relatively static compared with books.
The initial stage of applying 2NF removes static publisher and subject information from the more dynamic BOOK transaction table.
Many-to-one relationships are established between dynamic and static tables, namely BOOK to PUBLISHER and BOOK to SUBJECT.
Primary keys are created on both the PUBLISHER and SUBJECT tables to uniquely identify individual publishers and subjects within their two respective tables. Identifying relationships (BOOK related to PUBLISHER and BOOK related to SUBJECT) cause the publisher and subject primary key values to be included in the composite primary key of the BOOK table.
The relationships between dynamic and static tables can then be changed from identifying to non-identifying.
The altered BOOK table is shown with the new PUBLISHER and SUBJECT tables. Publisher and subject field information previously duplicated on the BOOK table (as shown in Figure 4-15) is now separated into the two new PUBLISHER and SUBJECT tables, with duplicate publishers and subjects removed from the new tables.
3rd Normal Form (3NF)
The table must be in 2NF.
Eliminate transitive dependencies.
A transitive dependency is where a field is
indirectly determined
by the primary key because that field is
functionally dependent on a second field,
where that
second field is dependent on the primary key.
Create a new table to contain any
separated fields.
Figure 4-23 shows one of the easiest interpretations of 3NF where
a many-to-many relationship presents
the possibility that more than one record will be returned using a
query joining both tables.
Figure 4-24 shows employees and tasks from the 2NF version on the left of the diagram in Figure 4-23.
Employees perform tasks in their daily routines, doing their jobs. If you were searching for the
employee
Columbia, three tasks would always be returned. Similarly, if searching for the third task shown in
Figure 4-24, two employees would always be returned. A problem would arise with this situation when
searching for an attribute specific to a particular assignment where an assignment is a single task
assigned to a single employee.
The transformation in Figure 4-25 could be conceived as being two 2NF transformations because a
many-to-one relationship is creating a more static table by creating the FOREIGN_EXCHANGE table.
A transitive dependency occurs where one field depends on another, which in turn depends on a
third field, the third field typically being the primary key. A state of transitive dependency can also be
interpreted as a field not being entirely dependent on the primary key.
There is usually a good reason for including calculated fields, usually performance denormalization.
(Denormalization is explained as a concept in a later chapter.) In a data warehouse, calculated fields
are sometimes stored in materialized views. Data warehouse database modeling is also covered in a
later chapter.
Beyond 3rd Normal Form
(3NF)
Boyce-Codd Normal Form (BCNF): Every determinant in a table is a candidate key. If
there is only one candidate key, then 3NF and BCNF are one and the same.
4th Normal Form (4NF): Eliminate multiple sets of multi-valued dependencies.
5th Normal Form (5NF): Eliminate cyclic dependencies. 5NF is also known as
Project-Join Normal Form (PJNF).
Domain Key Normal Form (DKNF): DKNF is the ultimate application of normalization and is
more of a measurement of conceptual state as opposed to a transformation process in itself.
Figure 4-30 shows removal of two fields that are often NULL from a table called EDITION,
creating the new table called RANK. The result is a zero-or-one-to-one relationship between the RANK
and EDITION tables. This implies that if a RANK record exists, then a corresponding EDITION record must
exist as well. In the opposite case, however, an EDITION record can exist where a RANK record does not
have to exist.
Figure 4-31 shows a data picture of the normalized structure shown
at the lower-right of the diagram in Figure 4-30. What has effectively happened is that potentially
NULL valued fields are moved into a new table, creating a zero-or-one-to-one relationship.
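The optional relationship described above can be sketched with sqlite3. The column names are assumptions, and the RANK table is named edition_rank here simply to keep the example free of any naming clash. A LEFT JOIN shows that an EDITION record can exist without a RANK record, while the foreign key prevents the reverse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE edition (isbn TEXT PRIMARY KEY, title TEXT NOT NULL);
-- The primary key of the new table is also its foreign key,
-- so each edition has at most one rank row.
CREATE TABLE edition_rank (
    isbn       TEXT PRIMARY KEY REFERENCES edition(isbn),
    sales_rank INTEGER NOT NULL
);
INSERT INTO edition VALUES ('isbn-1', 'Ranked Edition');
INSERT INTO edition VALUES ('isbn-2', 'Unranked Edition');
INSERT INTO edition_rank VALUES ('isbn-1', 42);
""")

rows = conn.execute(
    "SELECT e.isbn, r.sales_rank"
    " FROM edition e LEFT JOIN edition_rank r ON r.isbn = e.isbn"
    " ORDER BY e.isbn"
).fetchall()
print(rows)

# A rank row without a matching edition is rejected.
try:
    with conn:
        conn.execute("INSERT INTO edition_rank VALUES ('isbn-9', 1)")
    rank_without_edition = True
except sqlite3.IntegrityError:
    rank_without_edition = False
```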
Beyond 3rd Normal Form
(3NF)
Boyce-Codd Normal Form (BCNF)
BCNF does the following.
A table must be in 3NF.
Every determinant in the table must be a candidate key. (If a table has
only one candidate key, 3NF and BCNF are one and the same.)
A candidate key has potential for being a table's primary key. A table is not
allowed more than one
primary key because referential integrity requires it as such. It would be
impossible to check foreign keys
against more than one primary key. Referential integrity would be
automatically invalid, unenforceable,
and, thus, there would be no relational database model.
Beyond 3rd Normal Form
(3NF)
4th Normal Form (4NF)
4NF does the following.
A table must be in 3NF (or in BCNF, which implies 3NF).
Multi-valued dependencies must be transformed into functional
dependencies. This implies
that one value and not multiple values are dependent on a primary key.
Eliminate multiple sets of multiple valued or multi-valued dependencies,
sometimes described
as non-trivial multi-valued dependencies.
A multiple valued set is a field containing a comma-delimited list or a collection
of some kind. A collection
could be an array of values of the same type. Those multiple values are
dependent as a whole on the
primary key (the whole meaning the entire collection in each record).
Beyond 3rd Normal Form
(3NF)
5th Normal Form (5NF)
5NF does the following.
A table must be in 4NF.
Cyclic dependencies must be eliminated. A cyclic dependency is simply
something that depends
on one thing, where that one thing is either directly or indirectly dependent
on itself.
Premier Products Order Form
(Company Order History)
Order # 12003
Date Oct 1, 1997
Oklahoma Retail Company
1111 Asp
Norman
    Description    Code   Qty   Price   Amount
1.  Footballs      21     6     25.00   150
2.  Sweat Shirts   44     20    15.00   300
3.  Shorts         37     10    12.00   120
    Total                               570
0 Normal Form
Remove titles and derived quantities
Schema notation
HISTORY(CustName, CustAddr, CustCity
{OrderNum, OrderDate {ProdDescr,
ProdCode, QtyOrdered, OrderPrice}})
1st Normal Form
Add keys
Remove repeating groups
1st Normal Form
Add Keys for embedded entities
Remove Repeating Groups
HISTORY(CustID, CustName, CustAddr,
CustCity {OrderNum, OrderDate {ProdDescr,
ProdCode, QtyOrdered, OrderPrice}})
1st Normal Form
Add Keys for embedded entities
Remove Repeating Groups
Create a table for each embedded entity,
from the outside for nested groups
Insert foreign keys and junction tables
CUSTOMER(CustID, CustName, CustAddr,
CustCity)
ORDER(OrderNum, CustID, OrderDate {ProdDescr,
ProdCode, QtyOrdered, OrderPrice})
1st Normal Form
CUSTOMER(CustID, CustName,
CustAddr, CustCity)
ORDER(OrderNum, CustID, OrderDate)
PRODUCT(ProdDescr, ProdCode)
ORDER-PRODUCT(OrderNum,
ProdCode, QtyOrdered, OrderPrice)
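The decomposition above can be sketched in Python using the values from the order form. The CustID value and the ISO date format are assumptions (the form shows neither). The nested HISTORY structure is flattened from the outside in, and the derived Amount and Total values are recomputed rather than stored.

```python
# Nested (0NF) form of the order history; CustID = 1 is a hypothetical added key.
history = {
    "CustID": 1,
    "CustName": "Oklahoma Retail Company",
    "CustAddr": "1111 Asp",
    "CustCity": "Norman",
    "orders": [
        {
            "OrderNum": 12003,
            "OrderDate": "1997-10-01",
            "lines": [
                {"ProdDescr": "Footballs", "ProdCode": 21, "QtyOrdered": 6, "OrderPrice": 25.00},
                {"ProdDescr": "Sweat Shirts", "ProdCode": 44, "QtyOrdered": 20, "OrderPrice": 15.00},
                {"ProdDescr": "Shorts", "ProdCode": 37, "QtyOrdered": 10, "OrderPrice": 12.00},
            ],
        }
    ],
}

def flatten(hist):
    """Create one table per embedded entity, working from the outside in."""
    customer = [(hist["CustID"], hist["CustName"], hist["CustAddr"], hist["CustCity"])]
    order, product, order_product = [], {}, []
    for o in hist["orders"]:
        order.append((o["OrderNum"], hist["CustID"], o["OrderDate"]))  # CustID is the foreign key
        for line in o["lines"]:
            product[line["ProdCode"]] = line["ProdDescr"]
            # Junction table keyed by (OrderNum, ProdCode).
            order_product.append(
                (o["OrderNum"], line["ProdCode"], line["QtyOrdered"], line["OrderPrice"]))
    return customer, order, product, order_product

customer, order, product, order_product = flatten(history)

# Derived quantities are computed on demand instead of being stored.
amounts = [qty * price for _, _, qty, price in order_product]
total = sum(amounts)
print(amounts, total)
```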
1st Normal Form
CUSTOMER
ORDER
PRODUCT
1NF
(Keys, No Repeating Groups)
Table contains multi-valued attributes.
[Diagram: TABLE {ATTRIBUTES} is split into TABLE and ATTR-TABLE]
2nd Normal Form
No partial dependencies
(an attribute has a partial
dependency if it depends
on part of a concatenated
key)
2nd Normal Form
ROSTER(StuID, ZAPNum, StudentName,
CourseTitle, CourseGrade)
Remove partial dependencies
STUDENT(StuID, StudentName)
SECTION(ZAPNum, CourseTitle)
STUDENT-SECTION(StuID, ZAPNum,
CourseGrade)
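A rough way to see the partial dependencies in ROSTER is to test candidate functional dependencies against sample rows. The rows below are invented for illustration, and a dependency that holds in a sample is only evidence, not proof, that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(rows, attrs):
    """Project the rows onto attrs, removing duplicate tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})

# Hypothetical ROSTER data; the composite key is (StuID, ZAPNum).
roster = [
    {"StuID": 1, "ZAPNum": 100, "StudentName": "Ann", "CourseTitle": "Databases", "CourseGrade": "A"},
    {"StuID": 1, "ZAPNum": 200, "StudentName": "Ann", "CourseTitle": "Networks", "CourseGrade": "B"},
    {"StuID": 2, "ZAPNum": 100, "StudentName": "Bob", "CourseTitle": "Databases", "CourseGrade": "C"},
]

# Partial dependencies: each attribute depends on only one part of the key.
name_on_student = fd_holds(roster, ["StuID"], ["StudentName"])
title_on_section = fd_holds(roster, ["ZAPNum"], ["CourseTitle"])

# CourseGrade needs the whole key, so it stays with the composite key.
grade_needs_full_key = (
    fd_holds(roster, ["StuID", "ZAPNum"], ["CourseGrade"])
    and not fd_holds(roster, ["StuID"], ["CourseGrade"])
    and not fd_holds(roster, ["ZAPNum"], ["CourseGrade"])
)

student = project(roster, ["StuID", "StudentName"])
section = project(roster, ["ZAPNum", "CourseTitle"])
student_section = project(roster, ["StuID", "ZAPNum", "CourseGrade"])
print(student, section, student_section, sep="\n")
```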
2nd Normal Form
ROSTER
STUDENT
SECTION
STUDENT-SECTION
2NF
No partial dependencies
Table has data from several connected tables.
[Diagram: one TABLE is split into separate connected TABLEs]
3rd Normal Form
No transitive dependencies
(a transitive dependency is
an attribute that depends
on other non-key
attributes)
3rd Normal Form
Note: a transitive dependency arises
when attributes from a second entity
appear in a given table.
SECTION(ZAPNum, RoomNum, Day,
Time, CourseTitle, HoursCredit)
3rd Normal Form
SECTION(ZAPNum, RoomNum, Day,
Time, CourseID, CourseTitle,
HoursCredit)
SECTION(ZAPNum, RoomNum, Day,
Time, CourseID)
COURSE(CourseID, CourseTitle,
HoursCredit)
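The benefit of the SECTION and COURSE split can be sketched with sqlite3. The sample values are invented for illustration. Because the course title is stored once in COURSE, a single UPDATE is seen by every section of that course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE course (
    CourseID    TEXT PRIMARY KEY,
    CourseTitle TEXT NOT NULL,
    HoursCredit INTEGER NOT NULL
);
CREATE TABLE section (
    ZAPNum   INTEGER PRIMARY KEY,
    RoomNum  TEXT,
    "Day"    TEXT,
    "Time"   TEXT,
    CourseID TEXT NOT NULL REFERENCES course(CourseID)
);
INSERT INTO course VALUES ('C1', 'Databases', 3);
INSERT INTO section VALUES (100, 'R1', 'Mon', '09:00', 'C1');
INSERT INTO section VALUES (101, 'R2', 'Tue', '13:00', 'C1');
""")

# One UPDATE in COURSE; no section rows need to be touched.
with conn:
    conn.execute("UPDATE course SET CourseTitle = 'Database Design' WHERE CourseID = 'C1'")

titles = [row[0] for row in conn.execute(
    "SELECT c.CourseTitle"
    " FROM section s JOIN course c ON c.CourseID = s.CourseID"
    " ORDER BY s.ZAPNum"
)]
print(titles)
```

In the unnormalized SECTION table, the same change would have to be repeated in every row for the course, which is where update anomalies creep in.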
3NF
No transitive dependencies
Table contains data from an embedded entity with
non-key attributes.
[Diagram: TABLE containing an embedded SUB-TABLE is split into TABLE and SUB-TABLE]
BCNF is the same, but the embedded table may
involve key attributes.
Boyce Codd Normal Form
Every determinant is a
candidate key
BCNF
BCNF dependencies are like 3NF
dependencies but they involve some
key attributes
Note: BCNF often arises when a 1:m
relationship is modeled as an m:n
relationship
BCNF
SALESMAN-CUST(SalesID, CustID,
Commission)
SALESMAN(SalesID, Commission)
CUSTOMER(CustID, SalesID)
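The rule "every determinant is a candidate key" can be checked mechanically with an attribute closure. This is a simplified sketch: the two functional dependencies are assumptions read into the example (SalesID determines Commission, and CustID determines SalesID), and only the listed dependencies are tested against each table.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under functional dependencies given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return the non-trivial listed FDs whose determinant is not a superkey of the relation."""
    relation = set(relation)
    violations = []
    for lhs, rhs in fds:
        applies = lhs <= relation and rhs <= relation
        trivial = rhs <= lhs
        if applies and not trivial and not relation <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations

# Assumed dependencies for the salesman example.
fds = [({"SalesID"}, {"Commission"}), ({"CustID"}, {"SalesID"})]

before = bcnf_violations({"SalesID", "CustID", "Commission"}, fds)
after_salesman = bcnf_violations({"SalesID", "Commission"}, fds)
after_customer = bcnf_violations({"CustID", "SalesID"}, fds)
print(before, after_salesman, after_customer)
```

In SALESMAN-CUST, SalesID determines Commission but does not determine the whole row, so it is a determinant that is not a candidate key. After the split, each determinant is a key of its own table.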
4th Normal Form
No multi-valued
dependencies
4th Normal Form
Note: 4th Normal Form violations
occur when a triple (or higher)
concatenated key represents a pair
of double keys
4th Normal Form
Multivalued dependencies
Instructor   Book             Class
Price        Intro Comp       MIS 2003
Parker       Intro Comp       MIS 2003
Kemp         Data in Action   MIS 4533
Kemp         ORACLE Tricks    MIS 4533
Warner       Data in Action   MIS 4533
Warner       ORACLE Tricks    MIS 4533
4th Normal Form
INSTR-BOOK-COURSE(InstrID, Book,
CourseID)
COURSE-BOOK(CourseID, Book)
COURSE-INSTR(CourseID, InstrID)
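A short sketch of why this split is safe: projecting the three-column table onto COURSE-BOOK and COURSE-INSTR and joining the projections on the course gives back exactly the original rows. The rows follow the instructor, book, and class listing above as best it can be read from the slide.

```python
# (instructor, book, course) rows as read from the slide's listing.
instr_book_course = {
    ("Price", "Intro Comp", "MIS 2003"),
    ("Parker", "Intro Comp", "MIS 2003"),
    ("Kemp", "Data in Action", "MIS 4533"),
    ("Kemp", "ORACLE Tricks", "MIS 4533"),
    ("Warner", "Data in Action", "MIS 4533"),
    ("Warner", "ORACLE Tricks", "MIS 4533"),
}

# Two independent multivalued facts about a course: its books and its instructors.
course_book = {(course, book) for _, book, course in instr_book_course}
course_instr = {(course, instr) for instr, _, course in instr_book_course}

# Natural join of the two projections on the course.
rejoined = {
    (instr, book, course)
    for course, book in course_book
    for other_course, instr in course_instr
    if course == other_course
}

lossless = rejoined == instr_book_course
print(len(course_book), len(course_instr), lossless)
```

The six original rows are replaced by three book rows and four instructor rows, and adding a book to a course no longer requires one new row per instructor.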
4NF
(No multivalued dependencies)
Independent repeating groups have been treated as a
complex relationship.
[Diagram: one TABLE holding a complex relationship is split into separate TABLEs, one per independent repeating group]