NORMALIZATION
• Different anomalies in designing a database
• The idea of normalization, Functional dependency
• Armstrong’s Axioms (proofs not required)
• Closures and their computation
• Equivalence of Functional Dependencies (FD)
• Minimal Cover
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce Codd Normal Form (BCNF)
• Lossless join and dependency preserving
Different anomalies in the
design of a database
⚫ If a table is not properly normalized
and has data redundancy, then it will not
only eat up extra memory space but will
also make it difficult to handle and
update the database without facing
data loss.
⚫ Insertion, Updation and Deletion
Anomalies are very frequent if database is
not normalized.
Student table
rollno  name   branch  hod    office_tel
401     Arun   CSE     Mr. X  53337
402     Aiman  CSE     Mr. X  53337
403     Simon  CSE     Mr. X  53337
404     Dkon   CSE     Mr. X  53337
⚫ As we can see, data for the fields
branch, hod and office_tel is repeated for
the students who are in the same
branch in the college, this is Data
Redundancy.
Insertion Anomaly
⚫ Suppose for a new admission, until and
unless a student opts for a branch, data
of the student cannot be inserted, or
else we will have to set the branch
information as NULL.
⚫ Also, if we have to insert data of 100
students of same branch, then the branch
information will be repeated for all those
100 students.
⚫ These scenarios are nothing but
Insertion anomalies.
Updation Anomaly
⚫ What if Mr. X leaves the college? or Mr. X
is no longer the HOD of
computer science department?
⚫ In that case all the student records will
have to be updated, and if by mistake
we miss any record, it will lead to data
inconsistency.
⚫ This is Updation anomaly.
Deletion Anomaly
⚫ In our Student table, two different
kinds of information are kept together:
Student information and Branch
information.
⚫ Hence, at the end of the academic
year, if student records are deleted, we
will also lose the branch information.
⚫ This is Deletion anomaly.
What is Normalization?
⚫ Normalization is a database design
technique that reduces data
redundancy and eliminates undesirable
characteristics like Insertion, Update and
Deletion Anomalies.
⚫ Normalization rules divide larger
tables into smaller tables and link them
using relationships.
⚫ The purpose of Normalisation in SQL
is to eliminate redundant (repetitive) data
and ensure data is stored logically.
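A minimal sketch of this idea in Python with the built-in sqlite3 module: the Student table from the earlier slide is split into a branch table and a student table. The table names, column types, and the replacement HOD "Mr. Y" are assumptions made for illustration only.

```python
import sqlite3

# In-memory database; the schema is an illustrative decomposition of the
# single Student table into two linked tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (
        branch     TEXT PRIMARY KEY,
        hod        TEXT,
        office_tel TEXT
    );
    CREATE TABLE student (
        rollno INTEGER PRIMARY KEY,
        name   TEXT,
        branch TEXT REFERENCES branch(branch)
    );
    INSERT INTO branch VALUES ('CSE', 'Mr. X', '53337');
    INSERT INTO student VALUES
        (401, 'Arun', 'CSE'), (402, 'Aiman', 'CSE'),
        (403, 'Simon', 'CSE'), (404, 'Dkon', 'CSE');
""")

# Update: changing the HOD now touches exactly one row ('Mr. Y' is hypothetical).
cur = conn.execute("UPDATE branch SET hod = 'Mr. Y' WHERE branch = 'CSE'")
rows_updated = cur.rowcount

# Deletion: removing every student record keeps the branch information.
conn.execute("DELETE FROM student")
branches_left = conn.execute(
    "SELECT branch, hod, office_tel FROM branch"
).fetchall()

print(rows_updated)
print(branches_left)
```

Branch details are stored once, so the update touches a single row and deleting students no longer removes the branch.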
Functional dependencies in
DBMS
⚫ A functional dependency is a
constraint between two sets of
attributes in which the value of one set
uniquely determines the value of the
other set.
⚫ It is denoted as X → Y, where X is a
set of attributes that is capable of
determining the value of Y.
⚫ X is called Determinant,
⚫ Y is called the Dependent.
Functional dependencies:
roll_no  name  dept_name  dept_building
42       abc   CO         A4
43       pqr   IT         A3
44       xyz   CO         A4
Valid functional
dependencies:
⚫ roll_no → { name, dept_name, dept_building }
◦ Here, roll_no can determine values of fields
name, dept_name and dept_building, hence
a valid Functional dependency
⚫ roll_no → dept_name ,
◦ Since, roll_no can determine whole set of
{name, dept_name, dept_building}, it can
determine its subset dept_name also.
⚫ dept_name → dept_building
◦ dept_name can identify the
dept_building accurately, since in the table
above each dept_name appears with exactly
one dept_building (CO with A4, IT with A3).
Invalid functional
dependencies:
⚫ name → dept_name
◦ Students with the same name can
have different dept_name, hence this is
not a valid functional dependency.
⚫ dept_building → dept_name
◦ There can be multiple departments in
the same building. For example, if
departments ME and EC are both in
building B2, then knowing B2 does not
tell us which department is meant,
◦ hence dept_building → dept_name is an
invalid functional dependency.
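The valid and invalid cases above can be checked against sample rows with a small Python sketch. The helper name fd_holds and the extra row with roll_no 45 are hypothetical, added only to show a violation. A sample table can only disprove a dependency; it cannot prove that the dependency holds for every possible table.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
]

# Hypothetical extra student who shares a name with roll_no 42.
with_namesake = students + [
    {"roll_no": 45, "name": "abc", "dept_name": "IT", "dept_building": "A3"},
]

print(fd_holds(students, ["roll_no"], ["name", "dept_name", "dept_building"]))
print(fd_holds(students, ["dept_name"], ["dept_building"]))
print(fd_holds(with_namesake, ["name"], ["dept_name"]))
```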
Armstrong’s
axioms/properties of
functional dependencies:
⚫ Reflexivity: If Y is a subset of X, then X→Y
holds by reflexivity rule
For example, {roll_no, name} → name is
valid.
⚫ Augmentation: If X → Y is a valid
dependency, then XZ → YZ is also valid by
the augmentation rule.
For example, if {roll_no, name} → dept_building
is valid, then {roll_no, name, dept_name} →
{dept_building, dept_name} is also valid.
⚫ Transitivity: If X → Y and Y → Z are both
valid dependencies, then X → Z is also valid by
the Transitivity rule.
For example, if roll_no → dept_name and
dept_name → dept_building are valid, then
roll_no → dept_building is also valid.
Types of Functional
dependencies
⚫ Fully functional dependency
⚫ Partial functional dependency
⚫ Trivial functional dependency
⚫ Non-trivial functional dependency
⚫ Multivalued functional dependency
Fully functional dependency
Partial Functional
Dependency
Trivial functional
dependency
⚫ A → B is a trivial functional dependency if
B is a subset of A.
◦ { Emp_id, Emp_name } → Emp_id is a
trivial functional dependency, as Emp_id
is a subset of { Emp_id, Emp_name }.
Non Trivial Functional
Dependency
⚫ A non-trivial functional dependency is a
dependency where the dependent attribute is not a
subset of the determinant.
⚫ In other words, if X →Y is a functional dependency,
then it is
non-trivial if Y is not a subset of X.
Example:
⚫ Consider a Student table with the following
attributes:
Student ( StudentID, Name, Course,
Department )
⚫ A non-trivial functional dependency can be:
StudentID → Name, Course, Department
⚫ Here, StudentID uniquely determines Name,
Course and Department, none of which is a
subset of { StudentID }.
Multivalued Dependency
⚫ A Multivalued Dependency
(MVD) occurs when one
attribute in a table determines
a set of values of another
attribute, and that set is
independent of the other
attributes. It is represented as:
⚫ X →→ Y ( X multi-determines
Y ), meaning for each value of X,
Multivalued Dependency
⚫ Consider a table Student ( StudentID,
Course, Hobby )
⚫ A student can enroll in multiple courses.
⚫ A student can also have multiple hobbies.
⚫ A student’s course selection is independent
of their hobbies.
⚫ Thus, there exist two multivalued
dependencies:
StudentID →→ Course
StudentID →→ Hobby
⚫ This means that for each StudentID, there
can be multiple Courses and multiple Hobbies.
Transitive Dependency in
DBMS
⚫ {Company} → {CEO } If we
know the Company, we know its
CEO's name
⚫ {CEO } → {Age} If we know the CEO,
we know the Age
⚫ Therefore, according to the rule of
transitive dependency:
⚫ {Company} → {Age} should hold. That
makes sense, because if we know the
company name, we can know its CEO's age.
Closure of an Attribute
The set of attributes that can
be functionally
determined from it.
The closure of an attribute
X is denoted by X+.
Find the closure of A, B, C, D given R(A, B, C, D),
FD : {A→B, B→D, C→B}
A+ = ?   A→B→D, so A+ = ABD
C+ = ?   C→B→D, so C+ = CBD
B+ = ?   B→D, so B+ = BD
D+ = ?   D determines nothing else, so D+ = D
Consider a relation R ( A, B, C, D, E, F, G )
with the functional dependencies
A → BC    BC → DE    D → F    CF → G
Find A+.
A+ = { A }
= { A, B, C } ( Using A → BC )
= { A, B, C, D, E } ( Using BC → DE )
= { A, B, C, D, E, F } ( Using D → F )
= { A, B, C, D, E, F, G } ( Using CF → G )
Consider a relation R ( A, B, C, D, E, F, G )
with the functional dependencies
A → BC    BC → DE    D → F    CF → G
Find D+.
D+ = { D }
= { D, F } ( Using D → F )
Consider a relation R ( A, B, C, D, E, F, G )
with the functional dependencies
A → BC    BC → DE    D → F    CF → G
Find { B, C }+.
{ B, C }+ = { B, C }
= { B, C, D, E } ( Using BC → DE )
= { B, C, D, E, F } ( Using D → F )
= { B, C, D, E, F, G } ( Using CF → G )
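The three worked closures above follow the same loop: keep applying any dependency whose left side is already in the result until nothing new is added. This is a minimal Python sketch of that loop. The function name closure and the use of one-letter strings for attributes are assumptions made for brevity.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs only if it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# A -> BC, BC -> DE, D -> F, CF -> G
fds = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]

for start in ("A", "D", "BC"):
    print(start, "+ =", "".join(sorted(closure(start, fds))))
```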
Given relational schema R( P, Q, R, S, T, U, V )
having a set of functional dependencies
FD = { P→Q, QR→ST, PTV→V }.
Determine the closures (QR)+ and (PR)+.
{ QR }+ = QR
= QRST ( Using QR→ST )
{ PR }+ = PR
= PRQ ( Using P→Q )
= PRQST ( Using QR→ST )
Given relational schema R( P, Q, R, S, T )
and the set of functional dependencies
FD = { P→QR, RS→T, Q→S, T→P }
Determine the closure (T)+.
T+ = T → P → QR → S
= TPQRS
Different kinds of KEYS
Candidate Key
⚫ A minimal set of attributes
that can uniquely identify a
tuple in a relation.
⚫ Example: In a Student
table, (StudentID) or
(Email) can be candidate
keys.
Primary Key
⚫ A candidate key that is
selected as the main key
to uniquely identify tuples.
⚫ Example: In an Employee
table, EmployeeID is the
primary key.
Super Key
⚫ A set of one or more
attributes that can
uniquely identify a tuple.
⚫ Example: (StudentID,
Name) is a super key in
a Student table.
Alternate Key
⚫ A candidate key that is not
selected as the primary key.
⚫ Example: In a Customer
table, if CustomerID is the
primary key, then Email can
be an alternate key.
Foreign Key
⚫ An attribute in one table
that refers to the
primary key of another
table.
⚫ Example: In an Orders table,
CustomerID (which refers
to the Customer table) is a
foreign key.
Customers Table (Primary Key: CustomerID)
CustomerID  Name      Email
1           Devendra  [email protected]
2           Antony    [email protected]
3           Beerbal   [email protected]
Orders Table (Foreign Key: CustomerID)
OrderID  OrderDate   CustomerID
101      2024-03-10  1
102      2024-03-11  2
103      2024-03-12  1
104      2024-03-13  3
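As a rough sketch of how a foreign key is enforced, the two tables above can be created with Python's built-in sqlite3 module. SQLite checks foreign keys only after PRAGMA foreign_keys = ON. The Email column is left out because its values are not readable in the table, and order 105 for customer 99 is a hypothetical row used only to trigger the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        OrderDate  TEXT,
        CustomerID INTEGER REFERENCES Customers(CustomerID)
    );
    INSERT INTO Customers VALUES (1, 'Devendra'), (2, 'Antony'), (3, 'Beerbal');
    INSERT INTO Orders VALUES
        (101, '2024-03-10', 1), (102, '2024-03-11', 2),
        (103, '2024-03-12', 1), (104, '2024-03-13', 3);
""")

# Hypothetical order for a customer that does not exist.
try:
    conn.execute("INSERT INTO Orders VALUES (105, '2024-03-14', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(rejected, order_count)
```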
Partial Key
⚫ A key used in weak
entities to uniquely
identify them using a
combination of attributes
and a related strong entity.
⚫ Example: A dependent entity
may have (DependentID,
Partial Key
Customers Table (Primary Key: CustomerID)
CustomerID  Name      Email
1           Devendra  [email protected]
2           Antony    [email protected]
3           Beerbal   [email protected]
Orders Table (Foreign Key: CustomerID)
OrderID  OrderDate   CustomerID
101      2024-03-10  1
102      2024-03-11  2
103      2024-03-12  1
104      2024-03-13  3
OrderItems Table (Weak Entity with Partial Key ItemID)
OrderID  ItemID  ProductName  Quantity
101      10      Laptop       10
101      2       Mouse        24
102      10      Laptop       19
103      3       Printer      01
104      10      Laptop       01
104      2       Mouse        01
104      3       Printer      01
Surrogate Key
⚫ An artificially created key that
has no business meaning
but uniquely identifies a
record.
⚫ Example: A system-generated
unique ID like a UUID
(Universally Unique Identifier).
Composite Key
⚫ A key that consists of two or
more attributes to uniquely
identify a tuple.
⚫ Example: (OrderID,
ItemID) together form a
composite key in an
OrderItems table.
OrderItems Table
OrderID  ItemID  ProductName  Quantity
101      10      Laptop       10
101      2       Mouse        24
102      10      Laptop       19
103      3       Printer      01
104      10      Laptop       01
104      2       Mouse        01
104      3       Printer      01
Unique Key
⚫ An attribute that must have
unique values in a table
but allows NULL values.
⚫ Example: In a User table,
Email can be a unique key.
Secondary Key
⚫ An attribute or set of
attributes used for
indexing and searching,
but not necessarily unique.
⚫ Example: A Student table may
have 'Department' as a
secondary key for
Consider the following Student schema-
Student ( roll, name, sex, age, address, class, section )
Given below are the examples of candidate keys-
( class, section, roll )
( name, address )
roll, name, sex, age, address, class, section
roll  name   sex  age  address  class  section
1     Ram    M    18   BOM      CS     A
2     Maria  F    18   BLR      AI     A
1     Aysha  F    18   UDP      ME     C
2     Gita   F    21   MLR      CE     A
4     John   M    21   BLR      ME     A
5     Abdul  M    20   BOM      CE     B
3     Meena  F    18   DEL      CS     A
1     Meera  F    19   MLR      CE     A
2     Ram    M    21   UDP      ME     A
1     Guru   M    22   BLR      AI     A
1     John   M    20   BOM      CS     B
roll, name, sex, age, address,
class, section
( class, section, roll )
class section roll name sex age address
AI A 1 Guru M 22 BLR
AI A 2 Maria F 18 BLR
CE A 1 Meera F 19 MLR
CE A 2 Gita F 21 MLR
CE B 5 Abdul M 20 BOM
CS A 1 Ram M 18 BOM
CS A 3 Meena F 18 DEL
CS B 1 John M 20 BOM
ME A 2 Ram M 21 UDP
ME A 4 John M 21 BLR
ME C 1 Aysha F 18 UDP
roll, name, sex, age, address,
class, section
( name, address )
name address roll sex age class section
Abdul BOM 5 M 20 CE B
Aysha UDP 1 F 18 ME C
Gita MLR 2 F 21 CE A
Guru BLR 1 M 22 AI A
John BLR 4 M 21 ME A
John BOM 1 M 20 CS B
Maria BLR 2 F 18 AI A
Meena DEL 3 F 18 CS A
Meera MLR 1 F 19 CE A
Ram BOM 1 M 18 CS A
Ram UDP 2 M 21 ME A
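The two candidate keys can be checked against the eleven sample rows with a short Python sketch. The helper names is_unique and is_minimal_unique are hypothetical. Uniqueness in a sample only shows that the rows are consistent with a key; it does not prove the key for all possible data.

```python
from itertools import combinations

COLUMNS = ("roll", "name", "sex", "age", "address", "class", "section")
DATA = [
    (1, "Ram", "M", 18, "BOM", "CS", "A"),
    (2, "Maria", "F", 18, "BLR", "AI", "A"),
    (1, "Aysha", "F", 18, "UDP", "ME", "C"),
    (2, "Gita", "F", 21, "MLR", "CE", "A"),
    (4, "John", "M", 21, "BLR", "ME", "A"),
    (5, "Abdul", "M", 20, "BOM", "CE", "B"),
    (3, "Meena", "F", 18, "DEL", "CS", "A"),
    (1, "Meera", "F", 19, "MLR", "CE", "A"),
    (2, "Ram", "M", 21, "UDP", "ME", "A"),
    (1, "Guru", "M", 22, "BLR", "AI", "A"),
    (1, "John", "M", 20, "BOM", "CS", "B"),
]
rows = [dict(zip(COLUMNS, r)) for r in DATA]


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def is_minimal_unique(rows, attrs):
    """True if attrs is unique and no proper subset of attrs is unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


print(is_minimal_unique(rows, ("class", "section", "roll")))
print(is_minimal_unique(rows, ("name", "address")))
print(is_unique(rows, ("roll",)))
```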
Let R = (A, B, C, D, E, F) be a relation
with the following dependencies
C → F
E → A
EC → D
A → B
Determine all essential
attributes of the given relation.
Essential attributes of the relation are C and E
C → F
E → A
EC → D
A → B
So, we have { CE }+
= { C, E }
= { C, E, F } ( Using C → F )
= { A, C, E, F } ( Using E → A )
= { A, C, D, E, F } ( Using EC → D )
= { A, B, C, D, E, F } ( Using A → B )
We conclude that CE can determine all the attributes
of the given relation.
So, CE is the only possible candidate key of the relation.
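The result can be cross-checked by brute force: try attribute sets from smallest to largest and keep those whose closure covers the whole relation and which contain no smaller key. This is only a sketch for small schemas. The function names and the one-letter string encoding of attributes are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """List all candidate keys of schema (exponential, small schemas only)."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return ["".join(sorted(key)) for key in keys]


# C -> F, E -> A, EC -> D, A -> B
fds = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
print(candidate_keys("ABCDEF", fds))
```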
Number of candidate keys
CH → G
A → BC
B → CFH
E → A
F → EG
Number of candidate
keys
Let R = (A, B, C, D, E) be a relation
schema with the following
dependencies
AB → C
C → D
B → E
Determine the total
Closures of a set of functional
dependencies
The closure of a set of FDs is the set
of all possible FDs that can be
derived from a given set of FDs.
It is also referred to as a complete
set of FDs.
If F is used to denote the set of
FDs for relation R, then the closure
of the set of FDs implied by F is
denoted by F+.
Find closure of a set of FDs
implied by F.
A → B
A → C
CG → H
CG → I
B → H
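Listing every member of F+ by hand is tedious. A common shortcut is to test whether a particular dependency X → Y belongs to F+ by checking that Y lies inside the attribute closure of X. This Python sketch does that; the function names and the example queries are assumptions chosen for illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds, i.e. it is in F+."""
    return set(rhs) <= closure(lhs, fds)


# A -> B, A -> C, CG -> H, CG -> I, B -> H
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(implies(F, "A", "H"))    # via A -> B and B -> H
print(implies(F, "CG", "HI"))  # union of CG -> H and CG -> I
print(implies(F, "AG", "I"))   # A -> C gives CG, then CG -> I
print(implies(F, "A", "I"))    # G is not reachable from A alone
```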
SSN+
a) ssn+
result = { ssn }
repeat
{
    pno → (pname, ploc)    (U → V): is pno ⊆ result? No, so result is unchanged
}
result = { ssn }
repeat
{
    ssn → ename            (U → V): is ssn ⊆ result? Yes
    then result = result ∪ { ename }
}
result = (ssn, ename)