Database Management System
Unit IV
Data Normalization
Anomalies in relational database design
Anomalies are problems or inconsistencies that arise when operations are
performed on a table. They can have several causes: for example, they occur
when the same data is stored several times unnecessarily (i.e. redundant data
is present), or when all the data is stored in a single table. Normalization is
used to overcome anomalies. The different types of anomalies are the
insertion, deletion and updation anomalies.
Input
The same input is used for all three anomalies.
Student
ID Name Age Branch Branch_Code Hod_name
1 A 17 Civil 101 Aman
2 B 18 Civil 101 Aman
3 C 19 Civil 101 Aman
4 D 20 CS 102 Monu
5 E 21 CS 102 Monu
6 F 22 Electrical 103 Rakesh
With the help of this table, we are going to show the working of different anomalies.
Insertion Anomaly
When certain data or attributes cannot be inserted into the database without
the presence of other data, it is called an insertion anomaly.
For example, take a new branch named Petroleum. The data regarding
Petroleum cannot be stored in the table unless we also insert a student who is
in Petroleum. In practice, the existence of a branch does not depend on the
existence of a student, i.e. we should be able to store the data of a
branch whether or not it has any students, but the insertion anomaly
prevents this.
Prof. Akshata 1
Pawar
Code
INSERT INTO Student VALUES (7, 'G', 16, 'Petroleum', 104, 'Naman'); -- values get inserted
SELECT * FROM Student; -- data selected
Output
ID Name Age Branch Branch_Code Hod_name
1 A 17 Civil 101 Aman
2 B 18 Civil 101 Aman
3 C 19 Civil 101 Aman
4 D 20 CS 102 Monu
5 E 21 CS 102 Monu
6 F 22 Electrical 103 Rakesh
7 G 16 Petroleum 104 Naman
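The insert above succeeds only because a student row accompanies the branch data. The following is a minimal sketch using Python's built-in sqlite3 module of what happens when the branch is inserted on its own. The primary key and NOT NULL constraints are assumptions made for illustration; the notes do not state the table's constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the constraints below are illustrative, not from the notes.
conn.execute(
    """CREATE TABLE Student (
           ID INTEGER PRIMARY KEY,
           Name TEXT NOT NULL,
           Age INTEGER NOT NULL,
           Branch TEXT,
           Branch_Code INTEGER,
           Hod_name TEXT)"""
)


def try_insert_branch_only(connection):
    """Try to record the Petroleum branch without any student data."""
    try:
        connection.execute(
            "INSERT INTO Student (Name, Age, Branch, Branch_Code, Hod_name) "
            "VALUES (NULL, NULL, 'Petroleum', 104, 'Naman')"
        )
        return True
    except sqlite3.IntegrityError:
        # The student columns are mandatory, so the branch cannot stand alone.
        return False


branch_only_ok = try_insert_branch_only(conn)

# With a student attached, the same branch data is accepted.
conn.execute("INSERT INTO Student VALUES (7, 'G', 16, 'Petroleum', 104, 'Naman')")
row_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(branch_only_ok, row_count)
```

Under these assumed constraints the branch-only insert is rejected, which is the insertion anomaly in practice.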
Deletion anomaly
If deleting some data from the database also deletes other information that
is still required, it is called a deletion anomaly.
For example, suppose the only student of the Electrical branch is leaving, so
we have to delete the data of that student. The problem is that deleting the
student data also deletes the branch data, because that student's row is the
only place where the Electrical branch data is stored.
Code
DELETE FROM Student WHERE Branch = 'Electrical'; -- data gets deleted
SELECT * FROM Student; -- data selected
Output
ID Name Age Branch Branch_Code Hod_name
1 A 17 Civil 101 Aman
2 B 18 Civil 101 Aman
3 C 19 Civil 101 Aman
4 D 20 CS 102 Monu
5 E 21 CS 102 Monu
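The loss of branch information can be checked directly. This is a small sketch with Python's sqlite3 module, using the Student rows from the notes; the helper name `known_branches` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (ID INTEGER, Name TEXT, Age INTEGER, "
    "Branch TEXT, Branch_Code INTEGER, Hod_name TEXT)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A", 17, "Civil", 101, "Aman"),
        (2, "B", 18, "Civil", 101, "Aman"),
        (3, "C", 19, "Civil", 101, "Aman"),
        (4, "D", 20, "CS", 102, "Monu"),
        (5, "E", 21, "CS", 102, "Monu"),
        (6, "F", 22, "Electrical", 103, "Rakesh"),
    ],
)


def known_branches(connection):
    """Return the branch names that can still be found in the table."""
    rows = connection.execute("SELECT DISTINCT Branch FROM Student")
    return {name for (name,) in rows}


before = known_branches(conn)
conn.execute("DELETE FROM Student WHERE ID = 6")  # student F leaves
after = known_branches(conn)
print(before - after)  # branches that disappeared with the student
```

Deleting one student removes every trace of the Electrical branch, its code and its HOD, even though nothing about the branch itself changed.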
Updation/modification anomaly
If updating a single piece of data requires updating every other copy of it
as well, it is called an updation (modification) anomaly.
For example, suppose we need to change the HOD name for the Civil branch.
As per the requirement, only a single piece of data is to be changed, but we
have to change it in every row where it appears, otherwise the table becomes
inconsistent.
Algorithm
Step 1 − Use UPDATE to make changes in the table
Step 2 − Provide the changes that are to be made
Step 3 − Provide the condition that selects the rows to be changed
Step 4 − Use SELECT to check the output
Code
UPDATE Student -- table selected to perform the task
SET Hod_name = 'Rahul' -- changes to be made
WHERE Branch = 'Civil'; -- condition given
SELECT * FROM Student; -- data selected
Output
ID Name Age Branch Branch_Code Hod_name
1 A 17 Civil 101 Rahul
2 B 18 Civil 101 Rahul
3 C 19 Civil 101 Rahul
4 D 20 CS 102 Monu
5 E 21 CS 102 Monu
6 F 22 Electrical 103 Rakesh
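The risk behind the updation anomaly is a partial update. The sketch below, again using sqlite3 and the Student rows from the notes, changes only one Civil row and then looks for branches that list more than one HOD. The helper name `inconsistent_branches` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (ID INTEGER, Name TEXT, Age INTEGER, "
    "Branch TEXT, Branch_Code INTEGER, Hod_name TEXT)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A", 17, "Civil", 101, "Aman"),
        (2, "B", 18, "Civil", 101, "Aman"),
        (3, "C", 19, "Civil", 101, "Aman"),
        (4, "D", 20, "CS", 102, "Monu"),
        (5, "E", 21, "CS", 102, "Monu"),
        (6, "F", 22, "Electrical", 103, "Rakesh"),
    ],
)


def inconsistent_branches(connection):
    """Branches whose rows disagree about the HOD name."""
    rows = connection.execute(
        "SELECT Branch FROM Student GROUP BY Branch "
        "HAVING COUNT(DISTINCT Hod_name) > 1"
    )
    return [name for (name,) in rows]


# Partial update: only one of the three Civil rows is changed.
conn.execute("UPDATE Student SET Hod_name = 'Rahul' WHERE ID = 1")
after_partial = inconsistent_branches(conn)

# Complete update: every copy of the value is changed.
conn.execute("UPDATE Student SET Hod_name = 'Rahul' WHERE Branch = 'Civil'")
after_full = inconsistent_branches(conn)
print(after_partial, after_full)
```

The table is consistent only after every copy of the HOD name has been rewritten, which is why redundant storage makes updates fragile.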
Decomposition:
The decomposition of a relation scheme R = {A1, A2, A3, …, An}
is its replacement by a set of relation
schemes {R1, R2, R3, …, Rm} such that Ri ⊆ R for 1 ≤ i ≤ m and
R1 ∪ R2 ∪ … ∪ Rm = R.
A relation scheme R can be decomposed into a collection
of relation schemas {R1,R2,R3….RM } to eliminate some of the
anomalies contained in the original relation R.
Goals of decomposition:
Eliminate redundancy by decomposing a relation into
several relations in a higher normal form.
It is important to check that a decomposition does not lead to a bad
design.
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then
the decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result
in the same relation as it was decomposed.
o A decomposition is said to be lossless if the natural join of all
the decomposed relations gives back the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
The above relation is decomposed into two relations EMPLOYEE and
DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then
the resultant relation will look like:
Employee ⋈ Department
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
Hence, the decomposition is Lossless join decomposition.
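The join check above can be reproduced programmatically. The following sketch implements a natural join over lists of dictionaries and compares the result with the original relation. The second decomposition (sharing only EMP_CITY) is not in the notes; it is a hypothetical contrast case showing what a lossy split looks like.

```python
def natural_join(left, right):
    """Natural join of two relations given as lists of dicts."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if all(l_row[c] == r_row[c] for c in common)
    ]


def project(rows, columns):
    """Keep only the named columns of every row."""
    return [{c: row[c] for c in columns} for row in rows]


def as_set(rows):
    """Order-independent, duplicate-free view of a relation."""
    return {tuple(sorted(row.items())) for row in rows}


COLUMNS = ("EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY", "DEPT_ID", "DEPT_NAME")
employee_department = [
    dict(zip(COLUMNS, values))
    for values in [
        (22, "Denim", 28, "Mumbai", 827, "Sales"),
        (33, "Alina", 25, "Delhi", 438, "Marketing"),
        (46, "Stephan", 30, "Bangalore", 869, "Finance"),
        (52, "Katherine", 36, "Mumbai", 575, "Production"),
        (60, "Jack", 40, "Noida", 678, "Testing"),
    ]
]

# Decomposition from the notes: the two relations share EMP_ID.
employee = project(employee_department, ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"])
department = project(employee_department, ["DEPT_ID", "EMP_ID", "DEPT_NAME"])
is_lossless = as_set(natural_join(employee, department)) == as_set(employee_department)

# Hypothetical contrast: sharing only EMP_CITY, which is not unique.
dept_by_city = project(employee_department, ["EMP_CITY", "DEPT_ID", "DEPT_NAME"])
rejoined = natural_join(employee, dept_by_city)
has_spurious_rows = len(as_set(rejoined)) > len(as_set(employee_department))
print(is_lossless, has_spurious_rows)
```

Joining on EMP_ID reproduces the five original rows, while joining on EMP_CITY pairs the two Mumbai employees with each other's departments and so produces extra rows that were never in the original relation.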
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at
least one decomposed table.
o If a relation R is decomposed into relation R1 and R2, then the
dependencies of R either must be a part of R1 or R2 or must be derivable
from the combination of functional dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with functional
dependency set {A → BC}. The relation R is decomposed into R1(ABC) and
R2(AD), which is dependency preserving because the FD A → BC is a part of
relation R1(ABC).
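The example above can be checked with a few lines of code. This sketch tests only the simplest case, where every attribute of an FD lies inside one decomposed relation. That is a sufficient condition, not a complete test: an FD that fails it may still be derivable from the FDs of R1 and R2 combined. Attribute sets are written as strings of single-letter names, and the function name is hypothetical.

```python
def directly_preserved(fds, schemas):
    """Map each FD to True if all its attributes fit in one sub-schema.

    This is a sufficient test only; an FD reported as False may still be
    preserved through the combined FDs of the decomposed relations.
    """
    result = {}
    for lhs, rhs in fds:
        attrs = set(lhs) | set(rhs)
        result[(lhs, rhs)] = any(attrs <= set(schema) for schema in schemas)
    return result


fds = [("A", "BC")]        # A -> BC
schemas = ["ABC", "AD"]    # R1(ABC), R2(AD)

preserved = directly_preserved(fds, schemas)
all_preserved = all(preserved.values())
print(preserved, all_preserved)
```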
Properties of Decomposition
Decomposition must have the following properties:
1. Decomposition Must be Lossless
2. Dependency Preservation
3. Lack of Data Redundancy
1. Decomposition Must be Lossless
Decomposition must always be lossless, which means that information
must never get lost from a decomposed relation. This guarantees that
joining the decomposed relations gives back the same relation
that was originally decomposed.
2. Dependency Preservation
Dependency is a crucial constraint on a database, and every dependency
must be satisfied by at least one decomposed table. If {P → Q} holds, then
the two attribute sets are functionally dependent, and checking the
dependency is easiest when both sets are placed in the very same relation.
A decomposition has this property only when the functional dependencies
are maintained. In addition, this property allows us to check updates
without having to compute the natural join of the decomposed relations.
3. Lack of Data Redundancy
Data redundancy is also commonly termed repetition of data/information.
According to this property, a decomposition must not suffer from data
redundancy. A careless decomposition may cause issues with the overall data
in the database. Performing normalization lets us achieve
the property of lack of data redundancy.
Functional dependencies:
As the name suggests, a functional dependency in DBMS
refers to a relationship between attributes of a table that
depend on each other. E. F. Codd introduced it, and it helps in
avoiding data redundancy and in recognising bad designs.
If X is a relation that has attributes P and Q, then a functional
dependency between them is represented by -> (arrow sign).
Thus, the following represents the functional dependency
between the attributes using an arrow sign:
P -> Q
In this case, the left side of the arrow is the determinant and the right
side is the dependent. Typically P is a key attribute, while Q is
a non-key attribute of the same table.
The notation shows that the attribute Q is functionally dependent on
the attribute P. In simpler words, if the column P attribute of a table
uniquely identifies the column Q attribute of the very same table, then the
functional dependency of column Q on column P is symbolised as P → Q.
Functional dependencies (FDs) are used to specify formal measures
of the "goodness" of relational designs
FDs and keys are used to define normal forms for relations
FDs are constraints that are derived from the meaning and
interrelationships of the data attributes
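Whether a functional dependency is violated by the rows currently in a table can be tested mechanically. The sketch below checks the Student table from the anomaly examples; the function name `fd_holds` is hypothetical. A table instance can only refute an FD: passing the check does not prove that the FD is a constraint of the design, since FDs come from the meaning of the data.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on `lhs` always agree on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first right-hand value seen for this left-hand value.
        if seen.setdefault(key, value) != value:
            return False
    return True


COLUMNS = ("ID", "Name", "Age", "Branch", "Branch_Code", "Hod_name")
STUDENT = [
    dict(zip(COLUMNS, values))
    for values in [
        (1, "A", 17, "Civil", 101, "Aman"),
        (2, "B", 18, "Civil", 101, "Aman"),
        (3, "C", 19, "Civil", 101, "Aman"),
        (4, "D", 20, "CS", 102, "Monu"),
        (5, "E", 21, "CS", 102, "Monu"),
        (6, "F", 22, "Electrical", 103, "Rakesh"),
    ]
]

branch_determines_hod = fd_holds(STUDENT, ["Branch"], ["Hod_name"])
branch_determines_name = fd_holds(STUDENT, ["Branch"], ["Name"])
print(branch_determines_hod, branch_determines_name)
```

Here Branch → Hod_name is consistent with the data, while Branch → Name is refuted because the Civil branch has students with different names.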
Inference Rules (IR):
Armstrong's Axioms for Functional Dependencies
o Armstrong's axioms are the basic inference rules.
o Armstrong's axioms are used to infer functional dependencies
on a relational database.
o An inference rule is a type of assertion. It can be applied to a set of
FDs (functional dependencies) to derive other FDs.
o Using the inference rules, we can derive additional functional
dependencies from the initial set.
There are six inference rules for functional dependencies. The first three
are Armstrong's axioms; the remaining three can be derived from them:
1. Reflexive Rule (IR1)
In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
Since Y is a subset of X, X → Y holds.
2. Augmentation Rule (IR2)
In augmentation, if X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
Example:
For R(ABCD), if A → B then AC → BC
3. Transitive Rule (IR3)
In the transitive rule, if X determines Y and Y determines Z, then X must
also determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule (IR4)
The union rule says that if X determines Y and X determines Z, then X must
also determine Y and Z.
If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X, where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5)
The decomposition rule is also known as the project rule. It is the reverse
of the union rule.
This rule says that if X determines Y and Z, then X determines Y and X
determines Z separately.
If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
4. YZ → Z (using IR1)
5. X → Z (using IR3 on 1 and 4)
6. Pseudo Transitive Rule (IR6)
In the pseudo transitive rule, if X determines Y and YZ determines W, then XZ
determines W.
If X → Y and YZ → W then XZ → W
Proof:
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
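In practice, the inference rules are usually applied through the attribute closure: the set of all attributes determined by a given set under a given list of FDs. The notes do not describe this algorithm, so the sketch below is a supplementary illustration using the standard closure computation, with attribute sets written as strings of single-letter names. An FD X → Y follows from the given FDs exactly when Y is contained in the closure of X.

```python
def closure(attributes, fds):
    """Attribute closure of `attributes` under `fds`, a list of (lhs, rhs)."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from `fds`."""
    return set(rhs) <= closure(lhs, fds)


# Union rule: X -> Y and X -> Z give X -> YZ.
union_ok = implies([("X", "Y"), ("X", "Z")], "X", "YZ")
# Decomposition rule: X -> YZ gives X -> Y.
decomposition_ok = implies([("X", "YZ")], "X", "Y")
# Pseudo transitive rule: X -> Y and YZ -> W give XZ -> W.
pseudo_transitive_ok = implies([("X", "Y"), ("YZ", "W")], "XZ", "W")
# A dependency that does not follow: X -> Y does not give Y -> X.
reverse_ok = implies([("X", "Y")], "Y", "X")
print(union_ok, decomposition_ok, pseudo_transitive_ok, reverse_ok)
```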
Types of Functional Dependency
Trivial − If a functional dependency (FD) X→ Y holds, where Y is a subset
of X, then it is called a trivial FD. Trivial FDs always hold.
Prof. Kavita 10
Landage
Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is
called a nontrivial FD.
Completely non-trivial − If an FD X → Y holds, where X ∩ Y = Φ,
it is said to be a completely non-trivial FD.
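The three definitions translate directly into set operations. In this small sketch the function name `classify_fd` is hypothetical and attribute sets are written as strings of single-letter names. Since every completely non-trivial FD is also non-trivial, the function returns the most specific label.

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs by comparing the two attribute sets."""
    left, right = set(lhs), set(rhs)
    if right <= left:
        return "trivial"                 # Y is a subset of X
    if left & right:
        return "non-trivial"             # Y not a subset of X, but they overlap
    return "completely non-trivial"      # X and Y share no attribute


labels = [
    classify_fd("AB", "A"),   # AB -> A
    classify_fd("AB", "BC"),  # AB -> BC
    classify_fd("A", "B"),    # A -> B
]
print(labels)
```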
What is Transitive Dependency in DBMS?
Whenever a functional dependency (FD) arises from an indirect
relationship, it is known as a transitive dependency. Thus, if A -> B
and B -> C are true, then A -> C is a transitive dependency.
To achieve 3NF, one must eliminate transitive dependencies.
Normalization:
Normalization is the process of decomposing unsatisfactory ("bad") relations
by breaking up their attributes into smaller relations.
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or
set of relations. It is also used to eliminate undesirable
characteristics like insertion, update, and deletion anomalies.
Normalization divides a larger table into smaller tables and links them
using relationships.
Normal forms are used to reduce redundancy in database tables.
Why do we need Normalization?
The main reason for normalizing relations is to remove these
anomalies. Failure to eliminate anomalies leads to data redundancy and
can cause data integrity and other problems as the database grows.
Normalization consists of a series of guidelines that help you
create a good database structure.
First Normal Form
First Normal Form is defined in the definition of relations (tables)
itself. This rule defines that all the attributes in a relation must have
atomic domains. The values in an atomic domain are indivisible units.
We re-arrange the relation (table) as below, to convert it to First Normal Form.
Each attribute must contain only a single value from its pre-defined domain.
Second Normal Form
Before we learn about the second normal form, we need to understand the
following −
Prime attribute − An attribute that is a part of a candidate key is
known as a prime attribute.
Non-prime attribute − An attribute that is not a part of any candidate key
is said to be a non-prime attribute.
If we follow second normal form, then every non-prime attribute
should be fully functionally dependent on the candidate key. That is, if
X → A holds for a candidate key X and a non-prime attribute A, then there
should not be any proper subset Y of X for which Y → A also holds.
We see here in Student_Project relation that the prime key
attributes are Stu_ID and Proj_ID. According to the rule, non-key
attributes, i.e. Stu_Name and Proj_Name must be dependent upon both
and not on any of the prime key attribute individually. But we find that
Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently. This is called partial dependency, which is not
allowed in Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no
partial dependency.
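Partial dependencies can be found mechanically by computing the closure of every proper subset of the key. The sketch below assumes that Student_Project has exactly the attributes and the two FDs described in the text, and that {Stu_ID, Proj_ID} is its only candidate key. The function names are hypothetical.

```python
from itertools import combinations


def closure(attributes, fds):
    """Attribute closure of `attributes` under `fds`, a list of (lhs, rhs)."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Pairs (key subset, non-prime attribute) that violate 2NF.

    Assumes `key` is the only candidate key of the relation.
    """
    non_prime = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_prime):
                found.append((subset, attr))
    return found


# Assumed from the text: Stu_ID -> Stu_Name and Proj_ID -> Proj_Name.
ATTRIBUTES = ("Stu_ID", "Proj_ID", "Stu_Name", "Proj_Name")
FDS = [(("Stu_ID",), ("Stu_Name",)), (("Proj_ID",), ("Proj_Name",))]

violations = partial_dependencies(("Stu_ID", "Proj_ID"), ATTRIBUTES, FDS)
print(violations)
```

Each reported pair names a part of the key that alone determines a non-prime attribute, which is exactly what Second Normal Form forbids.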
Third Normal Form
For a relation to be in Third Normal Form, it must be in Second
Normal Form and the following must hold −
No non-prime attribute is transitively dependent on a prime key attribute.
For any non-trivial functional dependency X → A, either X is
a superkey or A is a prime attribute.
We find that in the Student_detail relation, Stu_ID is the key
and the only prime key attribute. We find that City can be identified by
Stu_ID as well as by Zip itself. Zip is not a superkey, and City is not a
prime attribute. Additionally, Stu_ID → Zip → City, so there exists a
transitive dependency. To bring this relation into third normal form, we
break it into two relations, Student_Detail (Stu_ID, Stu_Name, Zip) and
ZipCodes (Zip, City).
Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF) is a stricter extension of Third Normal Form.
BCNF states that –
For any non-trivial functional dependency, X → A, X must be a super-key.
In the decomposed relations, Stu_ID is the super-key in the relation
Student_Detail and Zip is the super-key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
which confirms that both relations are in BCNF.
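The BCNF condition can be checked with the attribute closure: the left side of an FD is a super-key exactly when its closure contains every attribute of the relation. The sketch below assumes the FDs listed above, and assumes that the relation before decomposition had the attributes Stu_ID, Stu_Name, Zip and City. It examines only the FDs passed in for each relation. The function names are hypothetical.

```python
def closure(attributes, fds):
    """Attribute closure of `attributes` under `fds`, a list of (lhs, rhs)."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs in `fds` whose left side is not a super-key."""
    all_attrs = set(attributes)
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FDs never violate BCNF
        if not all_attrs <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


STU_FD = (("Stu_ID",), ("Stu_Name", "Zip"))
ZIP_FD = (("Zip",), ("City",))

# Assumed relation before decomposition.
before = bcnf_violations(("Stu_ID", "Stu_Name", "Zip", "City"), [STU_FD, ZIP_FD])
# The two relations after decomposition, each with its own FD.
student_detail = bcnf_violations(("Stu_ID", "Stu_Name", "Zip"), [STU_FD])
zip_codes = bcnf_violations(("Zip", "City"), [ZIP_FD])
print(before, student_detail, zip_codes)
```

Before decomposition, Zip → City is reported because Zip does not determine the whole relation; after decomposition neither relation has a violating FD.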