Database Normalisation
A Level Computer Science
Starter
Define the following on your screens:
Primary Key
Composite Primary Key
Entity
Attribute
Relationship
Database Structure
Last lesson, we learned a few basic ideas for structuring a relational database:
Each entity should have its own table
Foreign keys create one-to-many relationships
A many-to-many relationship needs a link table
Database Normalisation
While these basic ideas work well for smaller structures, larger databases need a formal set of rules.
Normalisation is the formal process for creating the most efficient database structure.
Normalised databases will:
Minimise data repetition
Eliminate data redundancy
Eliminate update anomalies
Normalisation
There are three stages to normalisation:
First normal form
Second normal form
Third normal form
For the exam, you need to understand what each stage does and why it is needed, but you are unlikely to need to fully normalise a database from scratch.
Zero Normal Form
A flat-file database:
All elements in a single table
Similar to a spreadsheet
Lots of data repetition and redundancy
Loans: Loan_ID, Date, Payment_type, Customer_ID, First_Name, Surname, Address, Book1_ID, Book1_title, Book1_author, Due_date1, Book2_ID, Book2_title, Book2_author, Due_date2, Book3_ID, Book3_title, Book3_author, Due_date3
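To see the repetition and redundancy concretely, the flat Loans table can be loaded into an in-memory SQLite database. This is a minimal sketch using Python's built-in `sqlite3` module; the customer, books and dates are hypothetical sample data, and `insert_loan` is a hypothetical helper written for this example.

```python
import sqlite3

# Zero-normal-form layout from the slide: loan and customer fields, then the
# book fields repeated once per book (Book1_..., Book2_..., Book3_...).
columns = ["Loan_ID", "Date", "Payment_type", "Customer_ID",
           "First_Name", "Surname", "Address"]
for n in range(1, 4):
    columns += [f"Book{n}_ID", f"Book{n}_title", f"Book{n}_author", f"Due_date{n}"]

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE Loans ({', '.join(columns)})")  # SQLite allows untyped columns


def insert_loan(loan):
    """Hypothetical helper: insert a dict of column -> value; missing columns become NULL."""
    names = list(loan)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(f"INSERT INTO Loans ({', '.join(names)}) VALUES ({placeholders})",
                 [loan[name] for name in names])


# Hypothetical sample data: one customer, two loans.
customer = {"Customer_ID": 7, "First_Name": "Ada", "Surname": "Lovelace",
            "Address": "12 Byron Street"}
insert_loan({"Loan_ID": 1, "Date": "2024-02-15", "Payment_type": "Card", **customer,
             "Book1_ID": 101, "Book1_title": "Dune", "Book1_author": "Frank Herbert",
             "Due_date1": "2024-03-01",
             "Book2_ID": 205, "Book2_title": "Emma", "Book2_author": "Jane Austen",
             "Due_date2": "2024-03-01"})
insert_loan({"Loan_ID": 2, "Date": "2024-03-18", "Payment_type": "Cash", **customer,
             "Book1_ID": 101, "Book1_title": "Dune", "Book1_author": "Frank Herbert",
             "Due_date1": "2024-04-01"})

# The customer's address is stored again for every loan.
address_copies = conn.execute(
    "SELECT COUNT(Address) FROM Loans WHERE Customer_ID = 7").fetchone()[0]
# Unused book slots are stored as NULLs, and a fourth book has no column at all.
empty_slots = sum(
    conn.execute(f"SELECT COUNT(*) FROM Loans WHERE Book{n}_ID IS NULL").fetchone()[0]
    for n in range(1, 4))
print(address_copies, empty_slots)  # 2 3
```

The fixed three-book layout both wastes space (empty slots) and limits each loan to three books.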
First Normal Form
For a database to be in First Normal Form, there must be no groups of repeated fields.
This database repeats the book data three times: the Loans table holds Book1_ID, Book1_title, Book1_author and Due_date1, then the same four fields again for Book2 and Book3.
First Normal Form
The repeated data is taken out and put into its own table with its own primary key:
Loans: Loan_ID, Date, Payment_type, Customer_ID, First_Name, Surname, Address
Book: Book_ID, Book_title, Book_author, Due_date
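The step above can be sketched as a function that turns one flat row into one Loans row plus one Book row per filled book slot. This is a minimal sketch; `split_repeating_group`, `LOAN_FIELDS` and the sample row are hypothetical names and data chosen for illustration.

```python
# Field names taken from the slide; the flat row below is hypothetical sample data.
LOAN_FIELDS = ["Loan_ID", "Date", "Payment_type", "Customer_ID",
               "First_Name", "Surname", "Address"]


def split_repeating_group(flat_row, groups=3):
    """Return (loan_row, book_rows): one Loans row plus one Book row per filled slot."""
    loan_row = {field: flat_row[field] for field in LOAN_FIELDS}
    book_rows = []
    for n in range(1, groups + 1):
        if flat_row.get(f"Book{n}_ID") is None:
            continue  # empty slot in the fixed three-book layout
        book_rows.append({
            "Book_ID": flat_row[f"Book{n}_ID"],
            "Book_title": flat_row[f"Book{n}_title"],
            "Book_author": flat_row[f"Book{n}_author"],
            "Due_date": flat_row[f"Due_date{n}"],
        })
    return loan_row, book_rows


flat_row = {"Loan_ID": 1, "Date": "2024-02-15", "Payment_type": "Card",
            "Customer_ID": 7, "First_Name": "Ada", "Surname": "Lovelace",
            "Address": "12 Byron Street",
            "Book1_ID": 101, "Book1_title": "Dune", "Book1_author": "Frank Herbert",
            "Due_date1": "2024-03-01",
            "Book2_ID": 205, "Book2_title": "Emma", "Book2_author": "Jane Austen",
            "Due_date2": "2024-03-01"}
loan_row, book_rows = split_repeating_group(flat_row)
print(len(book_rows))  # 2
```

Notice that the Book rows produced here carry no Loan_ID, so nothing records which loan each book belongs to.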
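First Normal Form
We put a composite (compound) primary key in the Book table: Book_ID together with Loan_ID. This records which loan each book row belongs to, and lets the same book appear on several loans.
Loans: Loan_ID, Date, Payment_type, Customer_ID, First_Name, Surname, Address
Book: Book_ID, Loan_ID, Book_title, Book_author, Due_Date
The tables are now in First Normal Form.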
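A composite primary key means the *combination* of values must be unique, not each field on its own. A minimal sketch using Python's built-in `sqlite3` module, with hypothetical sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Book (
    Book_ID     INTEGER,
    Loan_ID     INTEGER,
    Book_title  TEXT,
    Book_author TEXT,
    Due_Date    TEXT,
    PRIMARY KEY (Book_ID, Loan_ID))""")

# The same book on two different loans: allowed, because the pair of values differs.
conn.execute("INSERT INTO Book VALUES (101, 1, 'Dune', 'Frank Herbert', '2024-03-01')")
conn.execute("INSERT INTO Book VALUES (101, 2, 'Dune', 'Frank Herbert', '2024-04-01')")

# The same book twice on the same loan: the composite key rejects it.
try:
    conn.execute("INSERT INTO Book VALUES (101, 1, 'Dune', 'Frank Herbert', '2024-05-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Book").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```

The title and author of book 101 are still stored once per loan, which is the problem Second Normal Form deals with.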
Second Normal Form
For a table to be in Second Normal Form, it must be in First Normal Form and have no partial key dependencies. This means that no field in a table with a composite key may depend on only part of that key.
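Loans: Loan_ID, Date, Payment_type, Customer_ID, First_Name, Surname, Address
Book: Book_ID, Loan_ID, Book_title, Book_author, Due_Date
Book_title and Book_author are dependent on Book_ID only, so this is not in Second Normal Form. (Due_Date depends on the whole key, because the same book can have a different due date on each loan.)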
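One way to spot a partial dependency is to ask, for each non-key field, whether one part of the key on its own already fixes its value. This minimal sketch checks that against hypothetical sample rows; `depends_on` is a hypothetical helper. Bear in mind that sample data can only disprove a dependency, not prove one: in a real design, dependencies come from what the data means.

```python
def depends_on(rows, determinant, attribute):
    """Return True if, in these rows, each combination of determinant values
    appears with only one value of attribute."""
    seen = {}
    for row in rows:
        key = tuple(row[field] for field in determinant)
        if seen.setdefault(key, row[attribute]) != row[attribute]:
            return False
    return True


# Hypothetical rows in the first-normal-form Book table (key: Book_ID + Loan_ID).
book_rows = [
    {"Book_ID": 101, "Loan_ID": 1, "Book_title": "Dune",
     "Book_author": "Frank Herbert", "Due_Date": "2024-03-01"},
    {"Book_ID": 101, "Loan_ID": 2, "Book_title": "Dune",
     "Book_author": "Frank Herbert", "Due_Date": "2024-04-01"},
    {"Book_ID": 205, "Loan_ID": 2, "Book_title": "Emma",
     "Book_author": "Jane Austen", "Due_Date": "2024-04-15"},
]
non_key = ["Book_title", "Book_author", "Due_Date"]

# A field is partially dependent if either part of the key alone determines it.
partial = [attribute for attribute in non_key
           if depends_on(book_rows, ["Book_ID"], attribute)
           or depends_on(book_rows, ["Loan_ID"], attribute)]
print(partial)  # ['Book_title', 'Book_author']
```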
Second Normal Form
The attributes dependent on only part of the key are separated into their own table:
Loans: Loan_ID, Date, Payment_type, Customer_ID, First_Name, Surname, Address
Book_Loan: Book_ID, Loan_ID, Due_Date
Book: Book_ID, Book_title, Book_author
The tables are now in Second Normal Form.
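With Book_title moved into the Book table, a correction is made in exactly one row and every loan sees it. A minimal sketch of just the two book tables in SQLite, using Python's built-in `sqlite3` module; the Loans table is left out for brevity and the data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Book (
    Book_ID     INTEGER PRIMARY KEY,
    Book_title  TEXT,
    Book_author TEXT
);
CREATE TABLE Book_Loan (
    Book_ID  INTEGER REFERENCES Book(Book_ID),
    Loan_ID  INTEGER,
    Due_Date TEXT,
    PRIMARY KEY (Book_ID, Loan_ID)
);
INSERT INTO Book VALUES (101, 'Dune', 'Frank Herbert');
INSERT INTO Book_Loan VALUES (101, 1, '2024-03-01'), (101, 2, '2024-04-01');
""")

# The title is stored once, so correcting it touches exactly one row...
changed = conn.execute(
    "UPDATE Book SET Book_title = 'Dune (Revised)' WHERE Book_ID = 101").rowcount
# ...and every loan of that book sees the new value.
titles = conn.execute("""
    SELECT DISTINCT Book.Book_title
    FROM Book_Loan JOIN Book ON Book.Book_ID = Book_Loan.Book_ID
""").fetchall()
print(changed, titles)  # 1 [('Dune (Revised)',)]
```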
Third Normal Form
For a database to be in Third Normal Form, it must be in Second Normal Form and there must be no non-key dependencies: no attribute may depend on a field other than the table’s key.
Loans: Loan_ID, Date, Payment_type, Customer_ID, First_Name, Surname, Address
Book_Loan: Book_ID, Loan_ID, Due_Date
Book: Book_ID, Book_title, Book_author
First_Name, Surname and Address are all dependent on Customer_ID, which is not the table’s key. This is not yet in Third Normal Form.
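The non-key dependency on Customer_ID causes an update anomaly: the customer's details are copied onto every loan, so an update that misses one row leaves the data inconsistent. A minimal sketch using Python's built-in `sqlite3` module with hypothetical sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Loans table as it stands after Second Normal Form.
conn.execute("""CREATE TABLE Loans (
    Loan_ID      INTEGER PRIMARY KEY,
    Date         TEXT,
    Payment_type TEXT,
    Customer_ID  INTEGER,
    First_Name   TEXT,
    Surname      TEXT,
    Address      TEXT)""")
# Hypothetical sample data: the same customer on two loans.
conn.executemany("INSERT INTO Loans VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "2024-02-15", "Card", 7, "Ada", "Lovelace", "12 Byron Street"),
    (2, "2024-03-18", "Cash", 7, "Ada", "Lovelace", "12 Byron Street"),
])

# The customer moves house, but only the row for the latest loan is updated.
conn.execute("UPDATE Loans SET Address = '9 Babbage Road' WHERE Loan_ID = 2")

addresses = sorted(address for (address,) in conn.execute(
    "SELECT DISTINCT Address FROM Loans WHERE Customer_ID = 7"))
print(addresses)  # ['12 Byron Street', '9 Babbage Road']
```

Customer 7 now has two different addresses, and the database cannot tell which one is correct.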
Third Normal Form
Put the non-key dependencies into their own table:
Customers: Customer_ID, First_Name, Surname, Address
Loans: Loan_ID, Date, Payment_type, Customer_ID
Book_Loan: Book_ID, Loan_ID, Due_Date
Book: Book_ID, Book_title, Book_author
Customer_ID stays in Loans as a foreign key, linking each loan to one customer.
This database is now in Third Normal Form.
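The finished design can be written as four SQLite tables, with foreign keys linking them and joins rebuilding the combined view when it is needed. A minimal sketch using Python's built-in `sqlite3` module; column types and the sample data are assumptions, since the slides only give field names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on

# The four third-normal-form tables from the slide.
conn.executescript("""
CREATE TABLE Customers (
    Customer_ID INTEGER PRIMARY KEY,
    First_Name  TEXT,
    Surname     TEXT,
    Address     TEXT
);
CREATE TABLE Loans (
    Loan_ID      INTEGER PRIMARY KEY,
    Date         TEXT,
    Payment_type TEXT,
    Customer_ID  INTEGER REFERENCES Customers(Customer_ID)
);
CREATE TABLE Book (
    Book_ID     INTEGER PRIMARY KEY,
    Book_title  TEXT,
    Book_author TEXT
);
CREATE TABLE Book_Loan (
    Book_ID  INTEGER REFERENCES Book(Book_ID),
    Loan_ID  INTEGER REFERENCES Loans(Loan_ID),
    Due_Date TEXT,
    PRIMARY KEY (Book_ID, Loan_ID)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO Customers VALUES (7, 'Ada', 'Lovelace', '12 Byron Street')")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?)",
                 [(101, "Dune", "Frank Herbert"), (205, "Emma", "Jane Austen")])
conn.executemany("INSERT INTO Loans VALUES (?, ?, ?, ?)",
                 [(1, "2024-02-15", "Card", 7), (2, "2024-03-18", "Cash", 7)])
conn.executemany("INSERT INTO Book_Loan VALUES (?, ?, ?)",
                 [(101, 1, "2024-03-01"), (101, 2, "2024-04-01"), (205, 2, "2024-04-15")])

# Joins rebuild the flat view without storing anything twice.
loan_report = conn.execute("""
    SELECT c.First_Name, c.Surname, b.Book_title, bl.Due_Date
    FROM Book_Loan AS bl
    JOIN Loans     AS l ON l.Loan_ID = bl.Loan_ID
    JOIN Customers AS c ON c.Customer_ID = l.Customer_ID
    JOIN Book      AS b ON b.Book_ID = bl.Book_ID
    ORDER BY bl.Due_Date
""").fetchall()

# A loan for a customer who does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Loans VALUES (3, '2024-05-01', 'Card', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```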
How to remember
Each attribute is dependent on the key, the whole key,
and nothing but the key!
So, to get to third normal form, your non-repeating
fields (first normal form) need to be dependent on the
whole of the key (second normal form), and nothing
other than the key (third normal form).
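The mnemonic can be turned into a checklist. This is a minimal sketch, assuming each table has a single candidate key (its primary key) and that you list its dependencies yourself as (determinant fields, dependent field) pairs; `highest_normal_form` is a hypothetical helper, not a standard function, and the examples use tables from the slides above.

```python
def highest_normal_form(key, dependencies, has_repeating_groups=False):
    """Return 0-3 for one table.

    key: the fields of the (single) primary key.
    dependencies: (determinant_fields, dependent_field) pairs.
    """
    key = frozenset(key)
    if has_repeating_groups:
        return 0  # fails 1NF: groups of repeated fields
    # Only dependencies of non-key fields matter for 2NF and 3NF.
    non_key_deps = [(frozenset(det), dep) for det, dep in dependencies if dep not in key]
    if any(det < key for det, _ in non_key_deps):
        return 1  # depends on part of the key: fails 2NF
    if any(not key <= det for det, _ in non_key_deps):
        return 2  # depends on something other than the key: fails 3NF
    return 3  # the key, the whole key, and nothing but the key


# Book table after First Normal Form (key: Book_ID + Loan_ID).
book_1nf = highest_normal_form(
    {"Book_ID", "Loan_ID"},
    [({"Book_ID"}, "Book_title"), ({"Book_ID"}, "Book_author"),
     ({"Book_ID", "Loan_ID"}, "Due_Date")])

# Loans table after Second Normal Form (key: Loan_ID).
loans_2nf = highest_normal_form(
    {"Loan_ID"},
    [({"Loan_ID"}, "Date"), ({"Loan_ID"}, "Payment_type"), ({"Loan_ID"}, "Customer_ID"),
     ({"Customer_ID"}, "First_Name"), ({"Customer_ID"}, "Surname"),
     ({"Customer_ID"}, "Address")])

# Customers table after Third Normal Form (key: Customer_ID).
customers_3nf = highest_normal_form(
    {"Customer_ID"},
    [({"Customer_ID"}, "First_Name"), ({"Customer_ID"}, "Surname"),
     ({"Customer_ID"}, "Address")])

print(book_1nf, loans_2nf, customers_3nf)  # 1 2 3
```

The order of the checks mirrors the rhyme: first "the key" (no repeated groups), then "the whole key" (no partial dependencies), then "nothing but the key" (no non-key dependencies).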
Summary
First Normal Form: No groups of repeated attributes
Second Normal Form: No partial key dependencies
Third Normal Form: No non-key dependencies
Advantages of normalisation:
Minimises data repetition
Eliminates data redundancy and inconsistency
Eliminates update anomalies
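Normalisation also removes deletion anomalies. In the First Normal Form Book table, deleting the only loan of a book also deletes everything known about that book; in the normalised design the book survives. A minimal sketch using Python's built-in `sqlite3` module; the table name `Book_1NF`, the helper `remaining_titles` and the data are hypothetical.

```python
import sqlite3


def remaining_titles(conn, table):
    """Hypothetical helper: sorted distinct titles left in a table (trusted table name)."""
    return sorted(t for (t,) in conn.execute(f"SELECT DISTINCT Book_title FROM {table}"))


conn = sqlite3.connect(":memory:")
conn.executescript("""
-- First normal form: book details stored alongside each loan (composite key).
CREATE TABLE Book_1NF (Book_ID INTEGER, Loan_ID INTEGER, Book_title TEXT,
                       Book_author TEXT, Due_Date TEXT, PRIMARY KEY (Book_ID, Loan_ID));
-- Normalised: book details in their own table, loans in a link table.
CREATE TABLE Book (Book_ID INTEGER PRIMARY KEY, Book_title TEXT, Book_author TEXT);
CREATE TABLE Book_Loan (Book_ID INTEGER, Loan_ID INTEGER, Due_Date TEXT,
                        PRIMARY KEY (Book_ID, Loan_ID));
INSERT INTO Book_1NF VALUES (101, 1, 'Dune', 'Frank Herbert', '2024-03-01'),
                            (205, 2, 'Emma', 'Jane Austen', '2024-04-15');
INSERT INTO Book VALUES (101, 'Dune', 'Frank Herbert'), (205, 'Emma', 'Jane Austen');
INSERT INTO Book_Loan VALUES (101, 1, '2024-03-01'), (205, 2, '2024-04-15');
""")

# Loan 2 is cancelled, so its loan records are deleted in both designs.
conn.execute("DELETE FROM Book_1NF WHERE Loan_ID = 2")
conn.execute("DELETE FROM Book_Loan WHERE Loan_ID = 2")

print(remaining_titles(conn, "Book_1NF"))  # ['Dune']: Emma's details are lost
print(remaining_titles(conn, "Book"))      # ['Dune', 'Emma']
```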
Exam Question
State two reasons why database designs are usually normalised.
Reason 1:……………………………………………………….…………
………………………………………………………………………………
………………………………………………………………………………
Reason 2:………………………………………………………………….
………………………………………………………………………………
………………………………………………………………………………
(2)
Mark Scheme
All marks AO1 (understanding)
*Minimise data duplication // no unnecessary repeated data; A. reduce for minimise R. eliminate
*Eliminate data redundancy; A. reduce/minimise for eliminate
Eliminate data inconsistency // improve consistency // avoid inconsistency problems;
Eliminate update anomalies; A. example in context A. updates only need to be made in one place
Eliminate insertion anomalies; A. example in context
Eliminate deletion anomalies; A. example in context
NE. easier to update/insert/delete without concrete example or good explanation
NE. fewer errors when updating / inserting / deleting without concrete example or good explanation
NE. saving space / memory
NE. easier / faster to query
Note: Only award one of the two marks marked *, i.e. a response cannot get two marks for discussing only duplication and redundancy.
Exam Question 2
State two properties that the relations in a fully normalised database must have.
Property 1:…………………………………………………………………………
……………………………………………………………………………………………
…………………………………………………………………………………
Property 2:…………………………………………………………………………
……………………………………………………………………………………………
…………………………………………………………………………………
(2)
Mark Scheme 2
Data is atomic // no repeating groups (of attributes);
R. No repeated columns / attributes / data / values
No partial (key) dependencies // No (non-key) attribute depends on part of the
primary key but not the whole of it // all non-prime attributes are (functionally)
dependent on the whole of every candidate key // (non-key) attributes depend on
the whole key;
No non-key dependencies // No transitive dependencies // (non-key) attributes
depend on nothing but the key;
Every (non-key) attribute is dependent upon the key;
Every determinant is a candidate key;
A “field” for “attribute”
A “part” for “partial”
Task
Complete the exam question on the VLE