UNIT-3 DBMS Normalization and FD

Uploaded by

Mohit Singh

UNIT-3

Database Management Systems (KCA 204)

Functional Dependency
A functional dependency is a constraint between two sets of attributes in a relation, where
the value of one set uniquely determines the value of the other.
It is denoted X → Y, where X is a set of attributes whose values determine the values
of Y. The attribute set on the left side of the arrow, X, is called the determinant, while the
set on the right side, Y, is called the dependent.

X → Y

Y is functionally dependent on X (denoted X → Y) if each value of X in R is associated with
exactly one value of Y in R.

Determinant: the attribute or set of attributes on the left-hand side of the arrow.

Dependent: the attribute or set of attributes on the right-hand side of the arrow.

Let R be a relation schema with α ⊆ R and β ⊆ R. The functional dependency α → β holds on R if
and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the
attributes α, they also agree on the attributes β. That is, t1[α] = t2[α] ⇒ t1[β] = t2[β].

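Example: Consider r(A, B) with the following instance of r.

A    B
1    4
1    5
3    7

On this instance, A → B does NOT hold, because the two tuples with A = 1 have different B
values (4 and 5). B → A does hold, because every B value appears in only one tuple.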
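The check described above can be sketched in a few lines of Python. This is only an illustration: the function name fd_holds and the list-of-dicts representation of the instance are assumptions made for this sketch, not part of the original notes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on this instance.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names. The FD fails as soon as two tuples agree on lhs
    but differ on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, val) != val:
            return False
    return True


# The instance r(A, B) from the example above
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(fd_holds(r, ["A"], ["B"]))  # False
print(fd_holds(r, ["B"], ["A"]))  # True
```

Note that a test like this can only refute a dependency on one instance. Whether an FD holds on the schema is a statement about every legal instance.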

As an example of functional dependency:

in the STUDENT relation, the functional dependencies

STUD_NO → STUD_NAME and STUD_NO → STUD_PHONE hold,
but
STUD_NAME → STUD_ADDR does not hold.

Types of functional dependency:

• Trivial functional dependency
• Non-trivial functional dependency
• Multivalued dependency
• Transitive functional dependency

1. Trivial functional dependency

A → B is a trivial functional dependency if B is a subset of A.

Dependencies such as A → A and B → B are also trivial.

Example:

Consider a table with two columns, Employee_Id and Employee_Name.

{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.

Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies.

2. Non-trivial functional dependency

A → B is a non-trivial functional dependency if B is not a subset of A.

When the intersection of A and B is empty, A → B is called completely non-trivial.

Example: ID → Name
3. Multivalued dependency

A multivalued dependency occurs when two or more independent multivalued facts about the
same attribute occur within the same table. It is denoted X →→ Y. For example, consider a
bike manufacturing company that produces its models in two colors (Red and Black):

Bike Model    Manufacture Year    Color
M1001         2007                Black
M1001         2007                Red
M1002         2008                Black
M1002         2008                Red
M2222         2009                Black

o Here the columns Manufacture Year and Color are independent of each other and both
depend on Bike Model. In this case these two columns are said to be multivalued
dependent on Bike Model.
o Bike Model →→ Manufacture Year
o Bike Model →→ Color
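A minimal sketch of how a multivalued dependency could be tested on an instance, using the standard definition (for any two tuples that agree on X, the tuple that takes its Y values from the first and its remaining values from the second must also be present). The function name mvd_holds, the lowercase column names and the list-of-dicts layout are assumptions for illustration, and the instance is assumed to be non-empty.

```python
def mvd_holds(rows, x, y):
    """Check X ->-> Y on a non-empty instance given as a list of dicts."""
    attrs = list(rows[0].keys())
    z = [a for a in attrs if a not in x and a not in y]  # remaining attributes
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Required tuple: X and Y values from t1, the rest from t2
                t3 = {**{a: t2[a] for a in z}, **{a: t1[a] for a in x + y}}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


bikes = [
    {"model": "M1001", "year": 2007, "color": "Black"},
    {"model": "M1001", "year": 2007, "color": "Red"},
    {"model": "M1002", "year": 2008, "color": "Black"},
    {"model": "M1002", "year": 2008, "color": "Red"},
    {"model": "M2222", "year": 2009, "color": "Black"},
]
print(mvd_holds(bikes, ["model"], ["year"]))   # True
print(mvd_holds(bikes, ["model"], ["color"]))  # True
```

If a row such as (M1001, 2008, Black) were added without the matching (M1001, 2008, Red) row, the check for model →→ color would fail, because year and color would no longer vary independently for that model.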

4. Transitive dependency

In a transitive functional dependency, the dependent is indirectly dependent on the determinant,
i.e. if a → b and b → c, then according to the axiom of transitivity, a → c. This is a transitive
functional dependency.

For example,

enrolno    Name    dept    buildingno
42         Abc     CO      4
43         Pqr     EC      2
44         Xyz     IT      1
45         Abc     EC      2

Here, enrolno → dept and dept → buildingno.

Hence, according to transitivity,
enrolno → buildingno is a valid functional dependency. This is an indirect functional
dependency, hence it is called a transitive functional dependency.

Armstrong's Axioms (Inference Rules)

• Armstrong's axioms are a set of rules that provide a simple technique for reasoning about
functional dependencies. They were developed by William W. Armstrong in 1974 and are used to
infer all the functional dependencies on a relational database.
• Armstrong's axioms can be applied repeatedly to infer all the functional
dependencies (FDs) implied by a set F of FDs. X, Y, and Z denote sets of attributes
over a relation schema R:

Primary Rules
• Reflexivity rule. If X is a set of attributes and Y ⊆ X, then X → Y holds.
• Augmentation rule. If X → Y holds and Z is a set of attributes, then XZ → YZ holds.
• Transitivity rule. If X → Y holds and Y → Z holds, then X → Z holds.

Secondary Rules
• Union rule. If X → Y holds and X → Z holds, then X → YZ holds.
• Decomposition rule. If X → YZ holds, then X → Y holds and X → Z holds.
• Pseudo-transitivity rule. If X → Y holds and WY → Z holds, then XW → Z holds.
• Composition rule. If X → Y holds and A → B holds, then XA → YB holds.

Closure of a set of functional dependencies

Given a set F of functional dependencies, there are certain other functional
dependencies that are logically implied by F. E.g., if A → B and B → C, then we can infer
that A → C.
The set of all functional dependencies logically implied by F is the closure of F.
We denote the closure of F by F+.

Example: R = (A, B, C, G, H, I), F = { A → B, A → C, CG → H, CG → I, B → H }.
Find some members of F+.

Solution: A → H (by transitivity from A → B and B → H)

AG → I (since A → C and CG → I hold, AG → I follows by the pseudo-transitivity rule)

CG → HI (since CG → H and CG → I hold, CG → HI follows by the union rule)

AG → H (since A → C and CG → H hold, AG → H follows by the pseudo-transitivity rule)

Therefore F+ includes, among others, A → H, CG → HI, AG → I and AG → H.
Procedure for computing F+
Closure of an attribute set
The set of all attributes that can be functionally determined from an attribute set is
called the closure of that attribute set.
The closure of attribute set {X} is denoted {X}+.
Steps to find the closure of an attribute set
The following steps are followed to find the closure of an attribute set:

Step-01:
Add the attributes contained in the attribute set for which the closure is being calculated
to the result set.

Step-02:
Repeatedly add to the result set the attributes that can be functionally determined from
the attributes already contained in the result set, until nothing more can be added.

Algorithm to compute α+, the closure of α under F

result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
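The pseudocode above translates almost line for line into Python. The sketch below is one possible rendering; the function name closure and the encoding of each FD as a pair of strings of single-letter attribute names are assumptions made for illustration. It is run on the earlier example R = (A, B, C, G, H, I) with F = { A → B, A → C, CG → H, CG → I, B → H }.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds as a set.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    attribute names, e.g. ("CG", "H") for CG -> H.
    """
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:            # "for each beta -> gamma in F"
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # "result := result U gamma"
                changed = True
    return result


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", F))))  # ABCGHI

# X -> Y is in F+ exactly when Y is contained in the closure of X,
# so AG -> I can be confirmed without enumerating all of F+.
print("I" in closure("AG", F))  # True
```

Testing membership in F+ through an attribute closure is usually far cheaper than generating F+ itself, which can grow exponentially with the number of attributes.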

Question: Consider a relation R(A, B, C, D, E, F, G) with the functional dependencies

A → BC
BC → DE
D → F
CF → G

Find the closures of A, D and BC.

Closure of attribute A:

A+ = { A }
   = { A, B, C } (using A → BC)
   = { A, B, C, D, E } (using BC → DE)
   = { A, B, C, D, E, F } (using D → F)
   = { A, B, C, D, E, F, G } (using CF → G)

Thus,

A+ = { A, B, C, D, E, F, G }

Closure of attribute D:

D+ = { D }
   = { D, F } (using D → F)

Thus,

D+ = { D, F }

Closure of attribute set {B, C}:

{ B, C }+ = { B, C }
          = { B, C, D, E } (using BC → DE)
          = { B, C, D, E, F } (using D → F)
          = { B, C, D, E, F, G } (using CF → G)

Thus,
{ B, C }+ = { B, C, D, E, F, G }

Question: R = (A, B, C, G, H, I),
F = { A → B, A → C, CG → H, CG → I, B → H }.
Find (AG)+.

Solution:
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)

(AG)+ = { A, B, C, G, H, I }

Question: Given a relational schema R(P, Q, R, S, T, U, V) and a set of functional
dependencies FD = { P → Q, QR → ST, PTV → V }, determine the closures (QR)+ and (PR)+.

a) (QR)+ = QR
Using QR → ST, (QR)+ = QRST.

b) (PR)+ = PR
Since P → Q and P is a subset of PR, (PR)+ = PRQ.
Using QR → ST, (PR)+ = PRQST.

Therefore (PR)+ = PRQST (answer).

More Questions on Attribute closure:-


Equivalence of functional dependencies
Given two sets of functional dependencies over the same relation, we have to find out
whether one set is covered by the other or whether both sets are equivalent. The comparison
is made on what each set implies (its closure), not on the FDs literally listed.

Suppose F and G are two sets of functional dependencies for a relational schema R. Then the
following four cases are possible:
F ⊆ G (every FD in F is implied by G, i.e. G covers F)
G ⊆ F (every FD in G is implied by F, i.e. F covers G)
F = G (F ⊆ G and G ⊆ F) (F is equivalent to G)
F ≠ G (F is not equivalent to G)
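The four cases can be decided with attribute closures: G covers F when, for every X → Y in F, Y is contained in the closure of X computed under G. The sketch below assumes FDs are encoded as pairs of strings of single-letter attributes; the names covers and equivalent and the three sample sets are illustrative assumptions, not taken from the notes.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if every FD in g is implied by f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """True if f and g imply exactly the same FDs."""
    return covers(f, g) and covers(g, f)


# Hypothetical sets chosen only to exercise the cases
F1 = [("A", "B"), ("B", "C")]
G1 = [("A", "B"), ("B", "C"), ("A", "C")]
H1 = [("A", "B")]

print(equivalent(F1, G1))  # True: A -> C already follows from F1
print(covers(F1, H1))      # True
print(covers(H1, F1))      # False: B -> C is not implied by H1
```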
Canonical Cover OR Irreducible set of functional dependency
A canonical cover Fc for F is a set of dependencies such that F logically implies all
dependencies in Fc , and Fc logically implies all dependencies in F. Furthermore, Fc must
have the following properties:

• No functional dependency in Fc contains an extraneous attribute.


• Each left side of a functional dependency in Fc is unique. That is, there are no two
dependencies α1 → β1 and α2 → β2 in Fc such that α1 = α2.

Algorithm to compute a canonical cover of a set F:

Fc = F
repeat
    Use the union rule to replace any dependencies in Fc of the form
        α1 → β1 and α1 → β2 with α1 → β1 β2.
    Find a functional dependency α → β in Fc with an extraneous attribute
        either in α or in β.
    /* Note: the test for extraneous attributes is done using Fc, not F */
    If an extraneous attribute is found, delete it from α → β in Fc.
until (Fc does not change)

Extraneous attributes: An attribute of a functional dependency is said to be extraneous if
we can remove it without changing the closure of the set of functional dependencies.
For example, suppose we have the functional dependencies AB → C and A → C in F. Then
B is extraneous in AB → C.
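One way to mechanize this is sketched below. It is not the repeat/until loop shown above but a common equivalent variant: split every right side into single attributes, drop extraneous left-side attributes, drop FDs implied by the rest, then merge by left side with the union rule. The function names and the string encoding of FDs are assumptions for illustration. Because a canonical cover is not unique in general, a different processing order could return a different but equivalent result.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    # Step 1: split right-hand sides (decomposition rule)
    work = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), a)
            if fd not in work:
                work.append(fd)
    # Step 2: drop extraneous attributes from left-hand sides
    for i, (lhs, a) in enumerate(work):
        for b in sorted(lhs):
            smaller = work[i][0] - {b}
            if smaller and a in closure(smaller, work):
                work[i] = (smaller, a)
    # Step 3: drop FDs that the remaining FDs already imply
    for fd in list(work):
        rest = [g for g in work if g != fd]
        if fd[1] in closure(fd[0], rest):
            work = rest
    # Step 4: merge FDs that share a left side (union rule)
    merged = {}
    for lhs, a in work:
        merged.setdefault(lhs, set()).add(a)
    return {"".join(sorted(l)): "".join(sorted(r)) for l, r in merged.items()}


# B is extraneous in AB -> C when A -> C is also present
print(canonical_cover([("AB", "C"), ("A", "C")]))  # {'A': 'C'}
```

The returned dictionary maps each left side to its merged right side, so {'A': 'C'} stands for the single dependency A → C.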

Example 1: Given a relational schema R(A, B, C, D) and a set of functional dependencies
FD = { B → A, AD → BC, C → ABD }, find the canonical cover.

Solution: Given FD = { B → A, AD → BC, C → ABD }, first decompose the FDs using the
decomposition rule (derived from Armstrong's axioms).

1. B → A
2. AD → B ( using decomposition inference rule on AD → BC)
3. AD → C ( using decomposition inference rule on AD → BC)
4. C → A ( using decomposition inference rule on C → ABD)
5. C → B ( using decomposition inference rule on C → ABD)
6. C → D ( using decomposition inference rule on C → ABD)

Now set of FD = { B → A, AD → B, AD → C, C → A, C → B, C → D }

The next step is to find the closure of the left side of each FD twice: once using the full
set of FDs and once with that FD excluded. If the closure is the same in both cases, that FD
is redundant and we remove it from the set; if the closures differ, we keep it.

Calculating the closures for FD = { B → A, AD → B, AD → C, C → A, C → B, C → D }

1a. Closure B+ = BA using FD = { B → A, AD → B, AD → C, C → A, C → B, C → D }

1b. Closure B+ = B using FD = { AD → B, AD → C, C → A, C → B, C → D }

From 1 a and 1 b, we found that both the Closure( by including B → A and excluding B → A )
are not equivalent, hence FD B → A is important and cannot be removed from the set of FD.

2 a. Closure AD+ = ADBC using FD = { B →A, AD → B, AD → C, C → A, C → B, C → D }

2 b. Closure AD+ = ADCB using FD = { B → A, AD → C, C → A, C → B, C → D }

From 2 a and 2 b, we found that both the Closure (by including AD → B and excluding AD →
B) are equivalent, hence FD AD → B is not important and can be removed from the set of
FD.

Hence resultant FD = { B → A, AD → C, C → A, C → B, C → D }

3 a. Closure AD+ = ADCB using FD = { B →A, AD → C, C → A, C → B, C → D }

3 b. Closure AD+ = AD using FD = { B → A, C → A, C → B, C → D }

From 3 a and 3 b, we found that both the Closure (by including AD → C and excluding AD →
C ) are not equivalent, hence FD AD → C is important and cannot be removed from the set
of FD.

Hence resultant FD = { B → A, AD → C, C → A, C → B, C → D }

4 a. Closure C+ = CABD using FD = { B →A, AD → C, C → A, C → B, C → D }

4 b. Closure C+ = CBDA using FD = { B → A, AD → C, C → B, C → D }

From 4 a and 4 b, we found that both the Closure (by including C → A and excluding C → A)
are equivalent, hence FD C → A is not important and can be removed from the set of FD.

Hence resultant FD = { B → A, AD → C, C → B, C → D }

5 a. Closure C+ = CBDA using FD = { B →A, AD → C, C → B, C → D }

5 b. Closure C+ = CD using FD = { B → A, AD → C, C → D }

From 5 a and 5 b, we found that both the Closure (by including C → B and excluding C → B)
are not equivalent, hence FD C → B is important and cannot be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → B, C → D }

6 a. Closure C+ = CDBA using FD = { B →A, AD → C, C → B, C → D }

6 b. Closure C+ = CBA using FD = { B → A, AD → C, C → B }

From 6 a and 6 b, we found that both the Closure( by including C → D and excluding C → D)
are not equivalent, hence FD C → D is important and cannot be removed from the set of FD.

Hence resultant FD = { B → A, AD → C, C → B, C → D }

o With FD = { B → A, AD → C, C → B, C → D } as the resultant set, we now check for
redundant attributes. Since the left side of AD → C has two attributes, let's
check whether both are needed or only one.

Closure AD+ = ADCB using FD = { B → A, AD → C, C → B, C → D }

Closure A+ = A using FD = { B → A, AD → C, C → B, C → D }

Closure D+ = D using FD = { B → A, AD → C, C → B, C → D }

Since neither A+ nor D+ contains C, neither attribute alone can replace AD. Hence in FD AD → C,
both A and D are needed and cannot be removed.

Hence the resultant FD = { B → A, AD → C, C → B, C → D }, which we can rewrite as

FD = { B → A, AD → C, C → BD }, the canonical cover of FD = { B → A, AD → BC, C → ABD }.

Example 2: Given a relational schema R(W, X, Y, Z) and a set of functional dependencies
FD = { W → X, Y → X, Z → WXY, WY → Z }, find the canonical cover.

Solution: Given FD = { W → X, Y → X, Z → WXY, WY → Z }, first decompose the FDs using the
decomposition rule (derived from Armstrong's axioms).

1. W → X
2. Y → X
3. Z → W ( using decomposition inference rule on Z → WXY )
4. Z → X ( using decomposition inference rule on Z → WXY )
5. Z → Y ( using decomposition inference rule on Z → WXY )
6. WY → Z

Now the set of FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }

The next step is to find the closure of the left side of each FD twice: once using the full
set of FDs and once with that FD excluded. If the closure is the same in both cases, that FD
is redundant and we remove it from the set; if the closures differ, we keep it.

Calculating the closures for FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }

1 a. Closure W+ = WX using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }

1 b. Closure W+ = W using FD = { Y → X, Z → W, Z → X, Z → Y, WY → Z }

From 1 a and 1 b, we found that both the Closure (by including W → X and excluding W →
X ) are not equivalent, hence FD W → X is important and cannot be removed from the set of
FD.

Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }

2 a. Closure Y+ = YX using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }

2 b. Closure Y+ = Y using FD = { W → X, Z → W, Z → X, Z → Y, WY → Z }

From 2 a and 2 b we found that both the Closure (by including Y → X and excluding Y → X )
are not equivalent, hence FD Y → X is important and cannot be removed from the set of FD.

Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }

3 a. Closure Z+ = ZWXY using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }

3 b. Closure Z+ = ZXY using FD = { W → X, Y → X, Z → X, Z → Y, WY → Z }

From 3 a and 3 b, we found that both the Closure (by including Z → W and excluding Z →
W ) are not equivalent, hence FD Z → W is important and cannot be removed from the set
of FD.

Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }

4 a. Closure Z+ = ZXWY using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }

4 b. Closure Z+ = ZWYX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }

From 4 a and 4 b, we found that both the Closure (by including Z → X and excluding Z → X )
are equivalent, hence FD Z → X is not important and can be removed from the set of FD.

Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }

5 a. Closure Z+ = ZYWX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }

5 b. Closure Z+ = ZWX using FD = { W → X, Y → X, Z → W, WY → Z }


From 5 a and 5 b, we found that both the closures (by including Z → Y and excluding Z → Y)
are not equivalent, hence FD Z → Y is important and cannot be removed from the set of FD.

Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }

6 a. Closure WY+ = WYZX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }

6 b. Closure WY+ = WYX using FD = { W → X, Y → X, Z → W, Z → Y }

From 6 a and 6 b, we found that both the Closure (by including WY → Z and excluding WY →
Z) are not equivalent, hence FD WY → Z is important and cannot be removed from the set of
FD.

Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }

With FD = { W → X, Y → X, Z → W, Z → Y, WY → Z } as the resultant set, we now check for
redundant attributes. Since the left side of WY → Z has two attributes,
let's check whether both are needed or only one.

Closure WY+ = WYZX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }

Closure W+ = WX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }

Closure Y+ = YX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }

Since neither W+ nor Y+ contains Z, neither attribute alone can replace WY. Hence in FD WY → Z,
both W and Y are needed and cannot be removed.

Hence the resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }, which we can rewrite as

FD = { W → X, Y → X, Z → WY, WY → Z }, the canonical cover of
FD = { W → X, Y → X, Z → WXY, WY → Z }.

Normalization
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize the redundancy in a relation or set of relations.
It is also used to eliminate undesirable characteristics such as insertion, update and
deletion anomalies.
• Normalization divides a larger table into smaller tables and links them using
relationships.
• The normal forms are used to reduce redundancy in the database tables.

If a database design is not perfect, it may contain anomalies, which are like a bad dream for
any database administrator. Managing a database with anomalies is next to impossible.

• Update anomalies − If data items are scattered and are not linked to each other
properly, it can lead to strange situations. For example, when we try to
update one data item whose copies are scattered over several places, a few instances
get updated properly while a few others are left with old values. Such instances
leave the database in an inconsistent state.
• Deletion anomalies − We try to delete a record, but parts of it are left undeleted
because, without our being aware of it, the data is also saved somewhere else.
• Insert anomalies − An insert anomaly occurs when certain attributes cannot
be inserted into the database without the presence of other attributes.

Anomalies in DBMS

• Anomalies are problems that can occur in poorly planned, un-normalised databases
where all the data is stored in one table.

Student Table:

StudRegistration    CourseID    StudName        Address        Course
205                 6204        James           Los Angeles    Economics
205                 6247        James           Los Angeles    Economics
224                 6247        Trent Bolt      New York       Mathematics
230                 6204        Ritchie Rich    Egypt          Computer
230                 6208        Ritchie Rich    Egypt          Accounts

• There are two students in the above table, 'James' and 'Ritchie Rich', whose records
are repeated whenever a new CourseID is entered. Hence the StudRegistration,
StudName and Address attributes are repeated.

1. Insert Anomaly: An insert anomaly occurs in a relational database when some
attributes or data items cannot be inserted into the database without the existence of
other attributes. For example, in the Student table, if we want to insert a new
CourseID, we need to wait until a student enrolls in that course. In this way, it is
difficult to insert a new record in the table. Hence, it is called an insertion anomaly.
2. Update Anomaly: This anomaly occurs when duplicated data is updated in only
one place and not in all instances, which leaves the table in an inconsistent
state. For example, the student 'James' appears in more than one row of the Student table.
If we want to update his details, we need to make the same update in every row where they
are stored; otherwise the data becomes inconsistent, with some rows showing the
updated values and others not.
3. Delete Anomaly: This anomaly occurs in a database table when some records are
lost or deleted from the table as a side effect of deleting other records.

To solve these anomalies and remove data redundancy we use normalization in DBMS, so that the
database remains consistent and data integrity improves.

Purpose of Normalization

Normalization is the process of efficiently organizing data in a database. The goals
of the normalization process are:

1) Eliminating redundant data
2) Ensuring data dependencies make sense
3) Removing insert, update and delete anomalies

Objective of Normalization
1. It is used to remove duplicate data and database anomalies from the relational table.
2. Normalization helps to reduce redundancy and complexity by examining new data types
used in the table.
3. It is helpful to divide a large database table into smaller tables and link them using
relationships.
4. It avoids duplicate data and repeating groups in a table.
5. It reduces the chances for anomalies to occur in a database.

ADVANTAGES OF NORMALIZATION
The following are the advantages of normalization.

• More efficient data structure.
• Avoids redundant fields or columns.
• More flexible data structure, i.e. we should be able to add new rows and data values easily.
• Better understanding of data.
• Ensures that distinct tables exist when necessary.
• Easier to maintain data structure, i.e. it is easy to perform operations, and complex queries
can be easily handled.
• Minimizes data duplication.
• Close modeling of real-world entities, processes and their relationships.
• Saves file space.
• Eliminates data integrity problems.
• Removes anomalies such as insert, update and delete anomalies.

DISADVANTAGES OF NORMALIZATION
The following are disadvantages of normalization.
• You cannot start building the database before you know what the user needs.
• Normalizing relations to higher normal forms, i.e. 4NF and 5NF, degrades
performance.
• Normalizing relations of higher degree is a very time-consuming and difficult process.
• Careless decomposition may lead to a bad database design, which may lead to serious
problems.
• Spreading data out into more tables means that more tables have to be joined.
• SQL queries become more complicated, involving multiple tables and joins.

Types of Normal Forms

The database community has developed a series of guidelines for ensuring that databases
are normalized. These are referred to as normal forms and are numbered from one through
five.

• First Normal Form (1NF) - atomic values
• Second Normal Form (2NF) - no partial dependency
• Third Normal Form (3NF) - no transitive dependency
• BCNF (Boyce-Codd Normal Form) - for every non-trivial FD A → B, the left side A is a super key
• Fourth Normal Form (4NF) - no multivalued dependency
• Fifth Normal Form (5NF) - no join dependency

Un-Normalized Form (UNF)

If a table contains non-atomic values in its rows, it is said to be in UNF. An atomic value is
something that cannot be further decomposed. A non-atomic value, as the name suggests,
can be further decomposed and simplified. Consider the following table:

Emp-Id    Emp-Name    Month    Sales    Bank-Id    Bank-Name
E01       AA          Jan      1000     B01        SBI
                      Feb      1200
                      Mar      850
E02       BB          Jan      2200     B02        UTI
                      Feb      2500
E03       CC          Jan      1700     B01        SBI
                      Feb      1800
                      Mar      1850
                      Apr      1725

In the sample table above, there are multiple occurrences of rows under each key Emp-Id.
Although considered to be the primary key, Emp-Id cannot uniquely identify
any single row. Further, each primary key value points to a variable-length record (3 rows for
E01, 2 for E02 and 4 for E03).

First Normal Form (1NF)

• A relation is in 1NF if every attribute contains only atomic values.
• An attribute of a table cannot hold multiple values. It must hold only a
single value.
• First normal form disallows multi-valued attributes, composite attributes, and their
combinations.
• A relation is in 1NF if it does not have any repeating groups.

A relation is said to be in 1NF if it contains atomic values and each row can provide a unique
combination of values. The above table in UNF can be processed to create the following
table in 1NF.

Emp-Id    Emp-Name    Month    Sales    Bank-Id    Bank-Name
E01       AA          Jan      1000     B01        SBI
E01       AA          Feb      1200     B01        SBI
E01       AA          Mar      850      B01        SBI
E02       BB          Jan      2200     B02        UTI
E02       BB          Feb      2500     B02        UTI
E03       CC          Jan      1700     B01        SBI
E03       CC          Feb      1800     B01        SBI
E03       CC          Mar      1850     B01        SBI
E03       CC          Apr      1725     B01        SBI
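The UNF to 1NF step amounts to expanding each repeating group into one row per value. A minimal Python sketch, assuming the un-normalized records are held as dicts with a nested list of (month, sales) pairs; this layout, the field names and the function name to_1nf are illustrative choices rather than anything prescribed by the notes:

```python
unf = [
    {"emp_id": "E01", "emp_name": "AA", "bank_id": "B01", "bank_name": "SBI",
     "sales": [("Jan", 1000), ("Feb", 1200), ("Mar", 850)]},
    {"emp_id": "E02", "emp_name": "BB", "bank_id": "B02", "bank_name": "UTI",
     "sales": [("Jan", 2200), ("Feb", 2500)]},
    {"emp_id": "E03", "emp_name": "CC", "bank_id": "B01", "bank_name": "SBI",
     "sales": [("Jan", 1700), ("Feb", 1800), ("Mar", 1850), ("Apr", 1725)]},
]


def to_1nf(records):
    """Emit one flat row per (employee, month) pair."""
    rows = []
    for rec in records:
        for month, sales in rec["sales"]:
            rows.append((rec["emp_id"], rec["emp_name"], month, sales,
                         rec["bank_id"], rec["bank_name"]))
    return rows


rows = to_1nf(unf)
print(len(rows))  # 9
# Emp-Id together with Month identifies each flattened row
print(len({(r[0], r[2]) for r in rows}) == len(rows))  # True
```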

Partial dependency occurs when a non-prime attribute is functionally dependent on part of a
candidate key.

For example:

R(A, B, C, D)

Candidate key: {A, B}

Prime attributes: A, B

Non-prime attributes: C, D

Functional dependency A → C (partial dependency)

Fully Functional Dependency

In a relation R, attribute B is fully functionally dependent on A if A determines B and no
proper subset of A determines B.

An attribute is fully functionally dependent on a set of attributes if it is functionally
dependent on that set and not on any of its proper subsets.

Student_Result
{
StuID          Student ID
CourseID       Course ID of the course
CourseTitle    Detailed description of the course
ProfName       Name of the professor who teaches this course
RoomNo         Room number where the course is taught
Score          Score attained by the student in this course
CGPA           CGPA
}

Primary key of the relation

StuID + CourseID

Dependencies
CourseID → CourseTitle (partial dependency)
CourseID → ProfName (partial dependency)
ProfName → RoomNo
StuID + CourseID → Score (fully functional dependency)
Score → CGPA

In the example relation, Score is fully functionally dependent on StuID + CourseID.

Second normal form (2NF)

A table is said to be in 2NF if both of the following conditions hold:

• The table is in 1NF (first normal form).
• No non-prime attribute is dependent on a proper subset of any candidate key of the
table.

A relation schema R is in 2NF if every non-prime attribute A in R is fully functionally
dependent on the primary key of R.

A relation is in 2NF if it is in 1NF and has no partial dependency.

• Prime attribute − An attribute that is part of some candidate key is known as a
prime attribute.
• Non-prime attribute − An attribute that is not part of any candidate key is said to
be a non-prime attribute.

We see here in the Student_Project relation that the prime key attributes are Stu_ID and
Proj_ID.
According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must be dependent
on both together and not on either prime key attribute individually. But we find that
Stu_Name can be identified by Stu_ID alone and Proj_Name can be identified by Proj_ID
alone. This is called partial dependency, which is not allowed in Second Normal
Form.

We broke the relation in two, as depicted in the above picture, so that there exists no partial
dependency.

Let us explain. Emp-Id, Emp-Name, Month and Bank-Id together form the composite primary key of
the above relation. Emp-Name, Month, Sales and Bank-Name all depend upon Emp-Id. But the
attribute Bank-Name depends on Bank-Id alone, which is not by itself the primary key of the
table. So the table is in 1NF, but not in 2NF. If this dependency is moved into another
related relation, the table comes to 2NF.

Emp-Id    Emp-Name    Month    Sales    Bank-Id
E01       AA          JAN      1000     B01
E01       AA          FEB      1200     B01
E01       AA          MAR      850      B01
E02       BB          JAN      2200     B02
E02       BB          FEB      2500     B02
E03       CC          JAN      1700     B01
E03       CC          FEB      1800     B01
E03       CC          MAR      1850     B01
E03       CC          APR      1725     B01

Bank-Id    Bank-Name
B01        SBI
B02        UTI

Example: Suppose a college wants to store the data of teachers and the subjects they teach.
Since a teacher can teach more than one subject, the table can have multiple rows for the
same teacher. They create a table that looks like this:

teacher_id    subject      teacher_age
111           Maths        38
111           Physics      38
222           Biology      38
333           Physics      40
333           Chemistry    40

Candidate key: {teacher_id, subject}

Non-prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF
because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a
proper subset of the candidate key. This violates the rule for 2NF, which says "no non-prime
attribute is dependent on the proper subset of any candidate key of the table".

To make the table comply with 2NF we can break it into two tables like this:

teacher_details table:

teacher_id    teacher_age
111           38
222           38
333           40

teacher_subject table:

teacher_id    subject
111           Maths
111           Physics
222           Biology
333           Physics
333           Chemistry

Now the tables comply with second normal form (2NF).
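The decomposition above is a pair of projections, and joining them back on teacher_id returns the original rows. A small Python sketch of that round trip, with the rows held as plain tuples (a representation chosen here only for illustration):

```python
# Original table: (teacher_id, subject, teacher_age)
rows = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]

# Project onto the two smaller schemas; the sets remove duplicate rows
teacher_details = sorted({(t, age) for t, _, age in rows})
teacher_subject = sorted({(t, subj) for t, subj, _ in rows})

print(teacher_details)  # [(111, 38), (222, 38), (333, 40)]

# Natural join on teacher_id rebuilds the original table
ages = dict(teacher_details)
rejoined = sorted((t, subj, ages[t]) for t, subj in teacher_subject)
print(rejoined == sorted(rows))  # True
```

The join is lossless here because teacher_id, the shared attribute, is a key of teacher_details.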

Let us see an example:

Example
StudentProject table

StudentID    ProjectNo    StudentName    ProjectName
S01          199          Katie          Geo Location
S02          120          Ollie          Cluster Exploration

The prime key attributes are StudentID and ProjectNo.

As stated, a partial dependency exists when a non-prime attribute, here StudentName or
ProjectName, is functionally dependent on only part of a candidate key.

StudentName can be determined by StudentID alone, which is a partial dependency:

StudentID → StudentName

ProjectName can be determined by ProjectNo alone, which is also a partial dependency:

ProjectNo → ProjectName

Therefore, the StudentProject relation violates 2NF and is considered a
bad database design.
To remove the partial dependencies and the violation of 2NF, decompose the table:

StudentInfo

StudentID    StudentName
S01          Katie
S02          Ollie

ProjectInfo

ProjectNo    ProjectName
199          Geo Location
120          Cluster Exploration

StudentProject

StudentID    ProjectNo
S01          199
S02          120

Now the relations are in the second normal form.


How to find a candidate key

Question: Let R = (A, B, C, D, E, F) be a relation scheme with the following dependencies:

C → F
E → A
EC → D
A → B

Determine the total number of candidate keys.

Solution:
C and E do not appear on the right side of any dependency, so no other attributes can
determine them and every candidate key must contain both.
To check whether CE alone is enough, we find the closure of CE.
So, we have
{ C, E }+
= { C, E }
= { C, E, F } (using C → F)
= { A, C, E, F } (using E → A)
= { A, C, D, E, F } (using EC → D)
= { A, B, C, D, E, F } (using A → B)

We conclude that CE can determine all the attributes of the given relation.
So, CE is the only possible candidate key of the relation, and the total number of candidate keys is 1.
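For small schemas the same answer can be found by brute force: try attribute subsets in order of increasing size and keep those whose closure is the whole relation and that contain no smaller key. The sketch below is exponential in the number of attributes and is meant only as an illustration; the function names and the string encoding of FDs are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the full schema."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == set(attributes):
                keys.append(cand)
    return keys


F = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
keys = candidate_keys("ABCDEF", F)
print(len(keys))                           # 1
print(["".join(sorted(k)) for k in keys])  # ['CE']
```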
Question on Second Normal Form (2NF):

1. Given a relation R( A, B, C, D, E) and Functional Dependency set

FD = { A → B, B → E, C → D}, determine whether the given R is in 2NF? If not convert it into


2 NF.

Solution: Let us construct an arrow diagram on R using FD to calculate the candidate key.

Let us calculate the closure of AC

AC + = ACBED

Since the closure of AC contains all the attributes of R, hence AC is Candidate Key

Hence there will be only one candidate key AC

Definition of 2NF: No non-prime attribute should be partially dependent on Candidate Key

Since R has 5 attributes (A, B, C, D, E) and the candidate key is AC, the prime attributes (those that are part of a candidate key) are A and C, while the non-prime attributes are B, D and E.

a. FD: A → B violates 2NF, as the non-prime attribute B depends on A, which is only a part of the candidate key AC (a partial dependency).
b. FD: B → E does not by itself violate 2NF, as the non-prime attribute E depends on the non-prime attribute B, which is not a part of the candidate key.
c. FD: C → D violates 2NF, as the non-prime attribute D depends on C, which is only a part of the candidate key AC.

Hence, because of the FDs A → B and C → D, the table R( A, B, C, D, E) is not in 2NF.

Convert the table R(A, B, C, D, E) in 2NF:

Since due to FD: A →B and C → D our table was not in 2NF, let's decompose the table
R1(A, B, E) ( from FD: A → B and B → E)

R2( C, D) (Now in table R2 FD: C → D is Fully Functional dependent, hence R2 is in 2NF)

And create one table for candidate key AC

R3 ( A, C)

Finally, the decomposed tables which are in 2NF:

a. R1( A, B, E)
b. R2( C, D)
c. R3( A, C)
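The 2NF check used in this solution can be sketched in Python as follows. It is only an illustrative sketch: the function names are assumptions, and the candidate keys are passed in rather than computed. For every proper part of a candidate key it lists the non-prime attributes that the part determines. Note that E shows up under A as well, because A → B and B → E together make E depend on a part of the key, which is why E is placed in R1( A, B, E).

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(keys, fds):
    """Map each proper part of a candidate key to the non-prime attributes it determines."""
    prime = set().union(*keys)
    found = {}
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                dependents = closure(part, fds) - prime
                if dependents:
                    found[frozenset(part)] = dependents
    return found

fds = [(set("A"), set("B")), (set("B"), set("E")), (set("C"), set("D"))]
violations = partial_dependencies([set("AC")], fds)
for part, dependents in violations.items():
    print(sorted(part), "->", sorted(dependents))
```

An empty result would mean that no non-prime attribute depends on a proper part of any candidate key, i.e. the relation is in 2NF.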

Q2. Given a relation R( P, Q, R, S, T) and Functional Dependency set FD = { PQ → R, S → T }, determine whether the given R is in 2NF. If not, convert it into 2NF.

Solution: Let us construct an arrow diagram on R using FD to calculate the candidate key.

Let us calculate the closure of PQS

PQS + = PQSRT

Since the closure of PQS contains all the attributes of R, hence PQS is Candidate Key

From the definition of Candidate Key (Candidate Key is a Super Key whose no proper
subset is a Super key)

Hence there will be only one candidate key PQS


Definition of 2NF: No non-prime attribute should be partially dependent on Candidate Key.

Since R has 5 attributes (P, Q, R, S, T) and the candidate key is PQS, the prime attributes (those that are part of the candidate key) are P, Q and S, while the non-prime attributes are R and T.

a) FD: PQ → R violates 2NF, as the non-prime attribute R depends on PQ, which is only a part of the candidate key PQS.

b) FD: S → T violates 2NF, as the non-prime attribute T depends on S, which is only a part of the candidate key PQS.

Hence, because of the FDs PQ → R and S → T, the table R( P, Q, R, S, T) is not in 2NF.

Convert the table R( P, Q, R, S, T) in 2NF:

Since due to FD: PQ → R and S → T, our table was not in 2NF, let's decompose the table

R1(P, Q, R) (Now in table R1 FD: PQ → R is Full F D, hence R1 is in 2NF)

R2( S, T) (Now in table R2 FD: S → T is Full F D, hence R2 is in 2NF)

And create one table for the key, since the key is PQS.

R3(P, Q, S)

Finally, the decomposed tables which is in 2NF are:

a) R1( P, Q, R)

b) R2(S, T)

c) R3(P, Q, S)

Third Normal Form (3NF)

A relation is said to be in 3NF if it is already in 2NF and no non-prime attribute is transitively dependent on a candidate key.

A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency X→Y holds in R, either (a) X is a super key of R, or (b) Y is a prime attribute of R.

Note:-
If a non-prime attribute determines another non-prime attribute, then the relation is not in 3NF.
Example: Consider the following employee table:

Empno Ename Sal Dept_name Dept_city


101 Ashish 4000 Computer Lucknow
102 Neeta 7000 Administration Lucknow
103 Vikas 4500 Sales Delhi
104 Vinod 5000 Sales Meerut
105 Anuj 3000 Accounts Delhi
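In this relation, Empno is the primary key. Empno is the determinant of Dept_name, and Dept_name, a non-prime attribute, is in turn the determinant of Dept_city. So a transitive FD exists. (For Dept_name → Dept_city to hold, each department must be located in exactly one city, so the two Sales rows above would have to agree on the city.)

Empno → Dept_name

Dept_name → Dept_city (transitive dependency), so the relation is not in 3NF.

To bring it into 3NF, we decompose the relation and remove the transitive dependency.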

Emp table

Empno Ename Sal Dept_name


101 Ashish 4000 Computer
102 Neeta 7000 Administration
103 Vikas 4500 Sales
104 Vinod 5000 Sales
105 Anuj 3000 Accounts

Department table

Dept_name Dept_city
Computer Lucknow
Administration Lucknow
Sales Delhi
Sales Meerut
Accounts Delhi

Decomposition of Relation tables into 3NF

Question Suppose a relational schema R (A B C D E) and set of functional dependencies

F: { A ->B , B-> E, C ->D } Check out that relation is in 3NF or not? If not decompose it in
3NF.

Solution1:

Firstly find the candidate key in the relation: (AC)+ = ABCDE AC is the candidate key, because
closure of AC has all the attributes of R.

Prime attributes: AC

Non prime attributes: BDE

A relation is said to be 3NF, if it holds at least one of the following for every non trivial
functional dependency X-> Y:

a)X is super key.

b)Y is prime attribute

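A → B -- A is not a super key and B is not a prime attribute.

B → E -- B is not a super key and E is not a prime attribute.

C → D -- C is not a super key and D is not a prime attribute.

So, the relation is not in 3NF, as none of these FDs satisfies either condition. Creating one relation for each FD and one for the candidate key gives the 3NF decomposition R1(A, B), R2(B, E), R3(C, D) and R4(A, C).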
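The two-condition 3NF test applied above can be sketched in Python. This is a minimal sketch under stated assumptions: `third_nf_violations` is a hypothetical helper name, the candidate keys are supplied by the caller, and only the FDs listed in F are examined.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(schema, fds, keys):
    """Return the FDs X -> Y where X is not a super key and Y is not entirely prime."""
    prime = set().union(*keys)
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        is_superkey = closure(lhs, fds) >= schema
        rhs_is_prime = (rhs - lhs) <= prime
        if not is_superkey and not rhs_is_prime:
            bad.append((lhs, rhs))
    return bad

schema = set("ABCDE")
fds = [(set("A"), set("B")), (set("B"), set("E")), (set("C"), set("D"))]
bad = third_nf_violations(schema, fds, keys=[set("AC")])
print(len(bad), "violating FDs")
```

A relation in which every attribute is prime, such as R(A, B, C, D) with keys AB, BC, CD and AD, yields an empty list from the same function.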
Question 2: Suppose a relational schema R (A B C D E F G H I J) and set of functional dependencies

F: { AB → C, AD → GH, BD → EF, A → I, H → J }

(J is included in the schema because the dependency H → J refers to it.)

Check whether the relation is in 3NF. If not, decompose it into 3NF.

Solution 2: Firstly find the candidate key of the relation: (ABD)+ = A B C D E F G H I J

ABD is the candidate key, because the closure of ABD has all the attributes of R and no proper subset of ABD does.

Prime attributes: A, B, D

Non prime attributes: C E F G H I J

A relation is in 3NF if at least one of the following holds for every nontrivial functional dependency X → Y:

a) X is a super key.

b) Y is a prime attribute.

AB → C -- AB is not a super key and C is not a prime attribute.

AD → GH -- AD is not a super key and G, H are not prime attributes.

BD → EF -- BD is not a super key and E, F are not prime attributes.

A → I -- A is not a super key and I is not a prime attribute.

H → J -- H is not a super key and J is not a prime attribute.

So, the relation is not in 3NF, as none of these FDs satisfies either condition.

Therefore R needs to be divided into one relation per functional dependency, plus one relation for the candidate key:

R1(A B C) (AB → C), R2(A D G H) (AD → GH), R3(B D E F) (BD → EF), R4(A I) (A → I), R5(H J) (H → J), R6(A B D) (candidate key)
BCNF(Boyce – Codd Normal Form)
A relation schema R is in BCNF with respect to a set F of functional dependencies if, for
all functional dependencies in F + of the form α → β, where α ⊆ R and β ⊆ R, at least
one of the following holds:

• α → β is a trivial functional dependency (that is, β ⊆ α).


• α is a superkey for schema R.

A relation is in BCNF if and only if every determinant is a candidate key (more precisely, a super key). Such a relation is also in 3NF.

Boyce–Codd normal form is a normal form used in database normalization. It is a slightly stronger version of the third normal form. BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomalies not dealt with by 3NF as originally defined.

o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every nontrivial functional dependency X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.

Let's assume there is a company where employees work in more than one
department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549


In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a key, yet each is the left side of a functional dependency.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Candidate keys:

For the first table: EMP_ID


For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, the design is in BCNF because the left side of each functional dependency is a key of the table in which it holds.

Comparison BCNF vs 3NF

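1) A 3NF design that is both lossless and dependency preserving can always be obtained.

A BCNF design can always be made lossless, but dependency preservation may not be achievable in BCNF.

2) BCNF is stricter than 3NF.

3) Every BCNF relation is in 3NF, but a 3NF relation is not necessarily in BCNF.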

BCNF Decomposition Algorithm

result = {R};
done = false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
        let α → β be a nontrivial functional dependency that holds
        on Ri such that α → Ri is not in F+, and α ∩ β = ø;
        result = (result - Ri) U (Ri - β) U (α, β);
    end
    else done = true;

Figure 8.11 BCNF decomposition algorithm.
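The algorithm of Figure 8.11 can be sketched in Python as shown below. This is a simplified illustration under stated assumptions: the helper names are hypothetical, and instead of computing F+ explicitly the sketch tests every proper subset α of Ri by attribute closure, taking β to be everything in Ri that α determines outside α itself. That search is exponential, so it suits only small classroom schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(ri, fds):
    """Return (alpha, beta) for a nontrivial FD on ri whose alpha is not a superkey of ri."""
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            alpha = set(combo)
            reach = closure(alpha, fds) & ri  # attributes of ri determined by alpha
            beta = reach - alpha              # guarantees alpha and beta are disjoint
            if beta and reach != ri:
                return alpha, beta
    return None

def bcnf_decompose(schema, fds):
    """Repeatedly split a violating schema Ri into (Ri - beta) and (alpha U beta)."""
    result = [set(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result.append(ri - beta)
                result.append(alpha | beta)
                done = False
                break  # restart the scan because result was modified
    return result

fds = [(set("A"), set("BC")), (set("C"), set("DE"))]
decomposed = bcnf_decompose(set("ABCDE"), fds)
print([sorted(r) for r in decomposed])
```

The order in which violations are found can change which BCNF decomposition is produced, so other valid answers may exist for the same input.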

Example:-

class (course id, title, dept name, credits, sec id, semester, year, building,
room number, capacity, time slot id)

The set of functional dependencies that we require to hold on class are:

course id → title, dept name, credits


building, room number → capacity
course id, sec id, semester, year→ building, room number, time slot id

A candidate key for this schema is {course id, sec id, semester, year }
We can apply the algorithm of Figure 8.11 to the class example as follows:

• The functional dependency:


course id → title, dept name, credits

holds, but course id is not a superkey. Thus, class is not in BCNF. We replace class by:
course(course id, title, dept name, credits)
class-1 (course id, sec id, semester, year, building, room number,
capacity, time slot id)
The only nontrivial functional dependencies that hold on course include course id on
the left side of the arrow. Since course id is a key for course, the relation course is in
BCNF.
• A candidate key for class-1 is course id, sec id, semester, year . The functional
dependency:
{building, room number → capacity}

holds on class-1, but building, room number is not a superkey for class-1. We replace
class-1 by:
classroom (building, room number, capacity)
section (course id, sec id, semester, year,
building, room number, time slot id)

classroom and section are in BCNF.

Thus, the decomposition of class results in the three relation schemas course, classroom, and section, each of which is in BCNF.

Question 2 : R=ABCDE,

F = {A -> BC, C -> DE}

Solution First, let’s compute the attribute closure: A+ = ABCDE B+ = B C+ = CDE D+ = D E+ = E

Candidate key=A

With the key known, we can check each FD and decompose where needed.

Looking at the 1st FD in F, A -> BC: A is a key for R, so this FD does not violate BCNF.

Looking at the 2nd FD in F, C -> DE: C is not a key for R, so this FD violates BCNF.

We decompose R into R1(CDE) and R2(ABC), i.e. we create two sub schemas, where ABCDE - DE = ABC.

In (ABC), A is still the key, so the first FD is still not in violation. In (CDE), C is the key, so C -> DE is also not in violation. This decomposition is in BCNF.

Example 3:

R = (ABCD)

F = {AB -> C, B -> D; C -> A}

First, let’s compute the attribute closure: A+ = A, B+ = BD ,C+ = AC, D+ = D ,AB+ = ABCD ,BC+
= ABCD

So our candidate keys are AB and BC. Now we start the algorithm:

AB -> C: Does this FD violate BCNF, i.e. is AB not a key for R? AB is a key, so there is no violation.

B -> D: Does this violate BCNF? Yes, because B is not a key. So we create 2 relations: (BD) and (ABC).

C -> A: Is C a key of (ABC)? No, so we need to decompose (ABC) by creating a new relation:

(BD)(CA)(BC) {because ABC-A=BC}

Our final decomposition is: R1(BD), R2(CA), R3(BC). Note that AB -> C can no longer be checked within a single relation, so this decomposition is not dependency preserving.


Q5: Relation: R=CSJDPQV

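FDs: C → CSJDPQV, SD → P, JP → C, J → S

Solution: Candidate keys: C, JP and JD, because C+ = CSJDPQV, JP+ = CSJDPQV and JD+ = CSJDPQV (J → S gives S, then SD → P gives P, then JP → C gives C).

JP → C is OK, since JP is a superkey.

SD → P is a violating FD, since SD+ = SDP and SD is not a superkey.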

Decompose into R1=CSJDQV and R2=SDP

J →S is still a violation in R1

Decompose R1: CJDQV and JS

Final set: CJDQV, JS, SDP


Question6 : R =(A, B, C, D). F = {C→D, C→A, B→C}.

Decompose R into a set of BCNF relations

Solution: The only candidate key is B (B+ = ABCD). C→D and C→A both cause violations of BCNF, because C is not a superkey. Take C→D: decompose R to R1 = {C, D}, R2 = {A, B, C} [ABCD - D = ABC]

R2 still violates BCNF because of C→A.

Decompose R2 to R11 = {C, A}, R12 = {B, C}. [ABC - A = BC]

Final decomposition: R1 = {C, D}, R11 = {C, A}, R12 = {B, C}.

Ques7 : R = (A, B, C, D) F = {AB→C, AB→D, C→A, D→B}

1.Is R in 3NF, why? If it is not, decompose it into 3NF

2. Is R in BCNF, why? If it is not, decompose it into BCNF

Solution 1. Find all the candidate keys: AB, BC, CD, AD. Check all FDs in F for the 3NF condition.

Prime attributes: A, B, C, D

Every attribute is prime, so the right side of every FD is a prime attribute. All the FDs satisfy the 3NF condition, so R is in 3NF.

Ans 2. No, R is not in BCNF, because for C → A, C is not a superkey. Similarly for D → B.

Final decomposition:

R1 = {C, D}, R2 = {A, C}, R3 = {B, D}

Multi-Valued Dependency
MVD is the dependency where one attribute value is potentially a ‘multi-valued fact’ about another.
Multi-valued dependency occurs when two or more independent multi-valued facts about the same attribute occur within the same table.
It means that if in a relation R having A, B and C as attributes, B and C are multi-valued facts about A, which is represented as A →→ B and A →→ C, then the multi-valued dependency exists only if B and C are independent of each other.
It is denoted X →→ Y. For example: consider a bike manufacturing company which provides two colours (RED and BLACK).
Bike Table:

Bike Model Manufacture year Color


M1001 2007 Black
M1001 2007 Red
M1002 2008 Black
M1002 2008 Red
M2222 2009 Black
o Here the columns manufacture year and color are independent of each other, and both depend on bike model. In this case these two columns are said to be multi-valued dependent on bike model.
o Bike model →→ manufacture year (A →→ B)
o Bike model →→ color (A →→ C)

MVD can be defined as follows: Let R be a relational scheme and let X and Y be subsets of the attributes of R.

A multi-valued dependency is denoted by X→→Y and is read as “there is a multi-valued dependency of Y on X” or “X multi-determines Y”.

A table is said to have a multi-valued dependency if the following conditions are true:

o For a dependency A →→ B, if for a single value of A multiple values of B exist, then the table may have a multi-valued dependency.
o Also, a table should have at least 3 columns for it to have a multi-valued dependency.
o And, for a relation R(A,B,C), if there is a multi-valued dependency between A and B, then B and C should be independent of each other.

Fourth Normal Form

A relation is said to be in 4NF if it is in BCNF and it has no nontrivial multi-valued dependency other than those whose left side is a superkey. For a dependency A →→ B, if for a single value of A multiple values of B exist, then the relation has a multi-valued dependency.

Properties – A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any nontrivial multi-valued dependency whose left side is not a superkey.
A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds:

o α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R); otherwise it is nontrivial.
o α is a superkey for schema R.

If a relation is in 4NF, it is in BCNF.

Example
Vendor Table

Vendor Code Item Code Project No.


V1 I1 P1
V1 I2 P1
V1 I1 P3
V1 I2 P3
V2 I2 P1
V2 I3 P1
V3 I1 P2
V3 I1 P3

Informally, a relation is in 4NF if it has no more than one independent multi-valued dependency, or one independent multi-valued dependency together with a functional dependency.
The table can be expressed as the two 4NF relations given below. The fact that vendors are capable of supplying certain items and the fact that they are assigned to supply some projects are specified independently in the 4NF relations.

Vendor-Supply

Vendor Code Item Code
V1 I1
V1 I2
V2 I2
V2 I3
V3 I1

Vendor-Project

Vendor Code Project No.
V1 P1
V1 P3
V2 P1
V3 P2
V3 P3
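Whether a multi-valued dependency holds on a given table instance can be checked mechanically. The sketch below is an illustration under stated assumptions (the function name `mvd_holds` and the row layout are introduced here): it groups the rows by the left side X and verifies that, within each group, every Y value occurs together with every value of the remaining attributes.

```python
def mvd_holds(attrs, rows, x, y):
    """Check X ->> Y on an instance: per X value, the (Y, Z) pairs must form a full cross product."""
    z = [a for a in attrs if a not in x and a not in y]

    def pick(row, names):
        return tuple(row[attrs.index(a)] for a in names)

    groups = {}
    for row in rows:
        groups.setdefault(pick(row, x), set()).add((pick(row, y), pick(row, z)))

    for pairs in groups.values():
        y_values = {p[0] for p in pairs}
        z_values = {p[1] for p in pairs}
        if pairs != {(yv, zv) for yv in y_values for zv in z_values}:
            return False
    return True

attrs = ["Vendor", "Item", "Project"]
vendor_rows = [
    ("V1", "I1", "P1"), ("V1", "I2", "P1"), ("V1", "I1", "P3"), ("V1", "I2", "P3"),
    ("V2", "I2", "P1"), ("V2", "I3", "P1"),
    ("V3", "I1", "P2"), ("V3", "I1", "P3"),
]
print(mvd_holds(attrs, vendor_rows, ["Vendor"], ["Item"]))
print(mvd_holds(attrs, vendor_rows, ["Item"], ["Vendor"]))
```

A check on one instance can only refute an MVD; whether the dependency holds for all legal instances is a design decision about the data.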
Example:

Faculty table

Faculty Subject Committee


John DBMS Placement
John Networking Placement
John MIS Placement
John DBMS Scholarship
John Networking Scholarship
John MIS Scholarship

Consider again the same Faculty relation with MVDs. Clearly a faculty member has multiple courses (subjects) to teach and heads several committees, and the two facts are independent. This relation is in BCNF, since all three attributes together constitute its key, yet it stores redundant data and requires decomposition. The rule for decomposition is to decompose the offending table into two, with the multi-determinant attribute or attributes as part of the key of both. In this case, to put the relation in 4NF, two separate relations are formed as follows:

FACULTY_COURSE (FACULTY, COURSE)

FACULTY_COMMITTEE (FACULTY, COMMITTEE)

The table is not in 4NF because of the multi-valued dependencies Faculty →→ Subject and Faculty →→ Committee, so we divide it into two relations:

Faculty subject table

Faculty Subject
John DBMS
John Networking
John MIS

Faculty Committee table

Faculty Committee
John Placement
John Scholarship

Example 2:
STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing
74 Cricket

59 Hockey
Question: Consider R(A,B,C,D,E,F,G) with the set of FDs and MVDs given by

{A →→ B, B →→ G, B →→ EG, CD →→ E}

Decompose R into 4NF. Show that the decomposition is not dependency preserving.

Solution

o Using A →→ B, we decompose R into two schemas:
(A,B) and (A,C,D,E,F,G)
(A,B) is in 4NF but (A,C,D,E,F,G) is not in 4NF.
o Applying the MVD CD →→ E, we decompose (A,C,D,E,F,G) into two schemas:
(C,D,E) and (A,C,D,F,G)
(C,D,E) is in 4NF but (A,C,D,F,G) is not in 4NF.
o As A →→ B and B →→ G, we have A →→ G by the transitivity rule.
Using the MVD A →→ G, we decompose (A,C,D,F,G) into two schemas:
(A,G) and (A,C,D,F)
(A,G) is in 4NF, and none of the given MVDs applies nontrivially to (A,C,D,F), so it is also taken to be in 4NF.
o Thus the 4NF schemas are:
(A,B) {A →→ B}
(C,D,E) {CD →→ E}
(A,G) {A →→ B and B →→ G, then A →→ G}
(A,C,D,F)
This decomposition is not dependency preserving, as it fails to preserve the dependency B →→ EG.

Decomposition

The process of breaking up or dividing a single relation into two or more sub relations is
called as decomposition of a relation.

1. Lossless Join Decomposition-

Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.
o This decomposition is called lossless join decomposition when the join of the sub relations results in the same relation R that was decomposed.
o For lossless join decomposition, we always have-

R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R

where ⋈ is the natural join operator.


Consider the following relation R( A , B , C )-

A B C

1 2 1

2 5 3

3 3 3

R( A , B , C )
Consider this relation is decomposed into two sub relations R1( A , B ) and R2( B , C )-

The two sub relations are-

R1 ( A , B )

A B

1 2

2 5

3 3

R2 ( B , C )

B C

2 1

5 3

3 3

Now, let us check whether this decomposition is lossless or not.


For lossless decomposition, we must have-
R1 ⋈ R 2 = R

Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 , we get-

A B C

1 2 1

2 5 3

3 3 3

This relation is same as the original relation R.

Thus, we conclude that the above decomposition is lossless join decomposition.

NOTE-

o Lossless join decomposition is also known as non-additive join decomposition.
o This is because the resultant relation after joining the sub relations is the same as the decomposed relation.
o No extraneous tuples appear after joining the sub-relations.

2. Lossy Join Decomposition-


Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.

o This decomposition is called lossy join decomposition when the join of the sub relations does not result in the same relation R that was decomposed.
o The natural join of the sub relations contains some extraneous tuples.
o For lossy join decomposition, we always have-

R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R

where ⋈ is the natural join operator.

Example-
Consider the following relation R( A , B , C )-

A B C

1 2 1

2 5 3

3 3 3

Consider this relation is decomposed into two sub relations as R1( A , C ) and R2( B , C )-

The two sub relations are-

R1 ( A , C )

A C

1 1

2 3

3 3
R2 ( B , C )

B C

2 1

5 3

3 3

Now, let us check whether this decomposition is lossy or not.

For lossy decomposition, we must have-

R1 ⋈ R 2 ⊃ R

Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 we get-

A B C

1 2 1

2 5 3

2 3 3

3 5 3

3 3 3

This relation is not same as the original relation R and contains some extraneous tuples.
Clearly, R1 ⋈ R2 ⊃ R.
Thus, we conclude that the above decomposition is lossy join decomposition.
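Both instance-level checks above can be reproduced with a small Python sketch. The representation is an assumption made for illustration: a relation is a set of tuples, each tuple being a frozenset of (attribute, value) pairs, and `relation`, `project` and `natural_join` are hypothetical helper names.

```python
def relation(attrs, rows):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}

def project(rel, attrs):
    """Project every tuple onto attrs; duplicates collapse because rel is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r1, r2):
    """Combine tuples that agree on all attributes the two relations share."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

R = relation("ABC", [(1, 2, 1), (2, 5, 3), (3, 3, 3)])

# Decomposition into R1(A, B) and R2(B, C): the join gives back exactly R.
lossless = natural_join(project(R, {"A", "B"}), project(R, {"B", "C"}))

# Decomposition into R1(A, C) and R2(B, C): the join adds extraneous tuples.
lossy = natural_join(project(R, {"A", "C"}), project(R, {"B", "C"}))

print(lossless == R, len(lossy))
```

The second join is a proper superset of R because the value C = 3 is shared by two tuples, so their A and B values get mixed.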

If we decompose a relation R into relations R1 and R2,

o Decomposition is lossy if R1 ⋈ R2 ⊃ R
o Decomposition is lossless if R1 ⋈ R2 = R
To check for lossless join decomposition using the FD set, the following conditions must hold:
1. The union of the attributes of R1 and R2 must be equal to the attributes of R. Each attribute of R must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. The intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. The common attributes must be a key for at least one relation (R1 or R2).
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC) and
R2(AD) which is a lossless join decomposition as:

1. First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2. Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) ≠ Φ
3. Third condition holds true as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because A->BC
is given.
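The three conditions can be written as a short Python sketch for a decomposition into exactly two relations. The function name `is_lossless_binary` is an assumption introduced here, and FDs are given as pairs of attribute sets.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r, r1, r2, fds):
    """Apply the three conditions to a decomposition of r into r1 and r2."""
    if r1 | r2 != r:          # condition 1: no attribute may be lost
        return False
    common = r1 & r2
    if not common:            # condition 2: the intersection must not be empty
        return False
    reach = closure(common, fds)
    # condition 3: the common attributes determine all of r1 or all of r2
    return r1 <= reach or r2 <= reach

print(is_lossless_binary(set("ABCD"), set("ABC"), set("AD"), [(set("A"), set("BC"))]))
```

For more than two sub relations the same test can be applied pairwise, combining two relations at a time.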

Dependency Preserving Decomposition


If we decompose a relation R into relations R1 and R2, all dependencies of R must either be a part of R1 or R2, or must be derivable from the combination of the FDs of R1 and R2.
For example, a relation R (A, B, C, D) with FD set {A->BC} is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A->BC is a part of R1(ABC).
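Dependency preservation can be tested without listing the FDs of each sub relation explicitly. The sketch below is one common way to do it, stated here as an assumption rather than as this document's method: for each FD X → Y, start from X and repeatedly add whatever each sub relation can derive from the attributes collected so far. The FD is preserved if Y is eventually reached. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(fd, parts, fds):
    """Check whether one FD can be enforced using only the FDs local to each part."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in parts:
            # What this part alone can derive from the attributes gathered so far.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

def preserves_dependencies(parts, fds):
    return all(is_preserved(fd, parts, fds) for fd in fds)

fds = [(set("A"), set("BC"))]
print(preserves_dependencies([set("ABC"), set("AD")], fds))
```

Applied to R(ABCD) with {AB -> C, B -> D, C -> A} decomposed into (BD), (CA) and (BC), the same function reports that AB -> C is not preserved.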
Question: Consider a schema R(A,B,C) and functional dependencies A->B. Prove that the
decomposition of R into R1(A,B) and R2(A,C) is lossless join decomposition.
Solution:
1. Att(R1) U Att(R2) = Att(R)
(A,B) U (A,C)=(A,B,C) True
2. Att(R1) ∩ Att(R2) ≠ Φ
(A,B) ∩ (A,C)=A≠ Φ true

3. The decomposition is a lossless join decomposition if either of the following holds:

Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
Att(R1) ∩ Att(R2) = (A,B) ∩ (A,C) = A, which is the common attribute.
A->B is the FD in F. By the augmentation rule (adding A to both sides) we have
A->AB, and AB is exactly Att(R1). Thus
Att(R1) ∩ Att(R2) -> Att(R1) is satisfied. Hence the above decomposition is a lossless join decomposition.

Question 2: Consider a relation schema R ( A , B , C , D ) with the functional dependencies A → B and C → D. Determine whether the decomposition of R into R1 ( A , B ) and R2 ( C , D ) is lossless or lossy.

Solution:-
1. R1 ( A , B ) ∪ R2 ( C , D )

=R(A,B,C,D)

Clearly, union of the sub relations contain all the attributes of relation R.

Thus, condition-01 satisfies.

2. R1 ( A , B ) ∩ R2 ( C , D ) = Φ

Clearly, the intersection of the sub relations is empty.
So, condition-02 fails.
Thus, we conclude that the decomposition is lossy.

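Question 3. Consider a relation schema R ( A , B , C , D ) with the following functional dependencies-
A→B
B→C
C→D
D→B
Determine whether the decomposition of R into R1 ( A , B ) , R2 ( B , C ) and R3 ( B , D ) is lossless or lossy.
Solution:
The conditions apply to two sub relations at a time, so we first combine R1 ( A , B ) and R2 ( B , C ). Their common attribute is B, and B → C makes B a key of R2, so joining them is lossless and gives R‘ ( A , B , C ). We then check R‘ ( A , B , C ) against R3 ( B , D ).

1. According to condition-01, union of both the sub relations must contain all the attributes of relation R.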

R‘ ( A , B , C ) ∪ R3 ( B , D )

=R(A,B,C,D)

Clearly, union of the sub relations contain all the attributes of relation R.

Thus, condition-01 satisfies.

2. According to condition-02, intersection of both the sub relations must not be null.

So, we have-

R‘ ( A , B , C ) ∩ R3 ( B , D )

=B

Clearly, intersection of the sub relations is not null.

Thus, condition-02 satisfies.

3. According to condition-03, intersection of both the sub relations must be the super key of
one of the two sub relations or both.

So, we have-

R‘ ( A , B , C ) ∩ R3 ( B , D )

=B

Now, the closure of attribute B is-

B+ = { B , C , D }

Now, we see- Attribute ‘B’ can determine all the attributes of sub relation R 3.

Thus, it is a super key of the sub relation R3. So, condition-03 satisfies.
Thus, we conclude that the decomposition is lossless.
Join Dependency:-
Join dependency is a further generalization of multi-valued dependency. If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists, where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relation R (A, B, C, D).
Definition. A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation
schema R, specifies a constraint on the states r of R. The constraint states that every legal
state r of R should have a nonadditive join decomposition into R1, R2, ..., Rn. Hence, for
every such r we have

∗ (πR1(r), πR2(r), ..., πRn(r)) = r

A join dependency JD(R1, R2, ..., Rn), specified on relation schema R,is a trivial JD if one of
the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R. Such a dependency is called
trivial because it has the lossless join property for any relation state r of R and thus does not
specify any constraint on R.

o Alternatively, R1 and R2 are a lossless decomposition of R.
o A JD ⋈{R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a lossless-join decomposition of R.
o *((A, B, C), (C, D)) will be a JD of R if the join of the two projections is equal to the relation R.
o Here, *(R1, R2, R3) is used to indicate that relations R1, R2, R3 and so on are a JD of R.

Fifth Normal Form (5NF):-


A relation is in 5NF if it is in 4NF, does not contain any nontrivial join dependency that is not implied by its candidate keys, and cannot be further decomposed without loss. 5NF is reached when the tables have been broken into as many tables as possible in order to avoid redundancy, while still being able to rejoin them without loss. 5NF is also known as Project-join normal form (PJNF).

Definition. A relation schema R is in fifth normal form (5NF) (or project-join normal form
(PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every
nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a
superkey of R.

Vendor Table

Vendor Code Item Code Project No.


V1 I1 P1
V1 I2 P1
V1 I1 P3
V1 I2 P3
V2 I2 P1
V2 I3 P1
V3 I1 P2
V3 I1 P3

Vendor-Supply

Vendor Code Item Code
V1 I1
V1 I2
V2 I2
V2 I3
V3 I1

Vendor-Project

Vendor Code Project No.
V1 P1
V1 P3
V2 P1
V3 P2
V3 P3

These relations still have a problem. While defining the 4NF we mentioned that all the attributes depend upon each other. While creating the two tables in the 4NF, although we have preserved the dependencies between Vendor Code and Item Code in the first table and Vendor Code and Project No. in the second table, we have lost the relationship between Item Code and Project No. If there were a primary key then this loss of dependency would not have occurred. In order to revive this relationship we must add a new table like the following, which is the projection of the Vendor table on Project No. and Item Code. Please note that during the entire process of normalization, this is the only step where a new table is created by joining two attributes, rather than splitting them into separate tables.

Project No. Item Code
P1 I1
P1 I2
P1 I3
P2 I1
P3 I1
P3 I2

Example 2:

Company Product Supplier


Godrej Soap Mr. X
Godrej Shampoo Mr. X
Godrej Shampoo Mr. Y
Godrej Shampoo Mr. Z
H.Lever Soap Mr. X
H.Lever Soap Mr. Y
H.Lever Shampoo Mr. Y
The relation is not in 5NF, so we decompose it into 3 tables:-
Company Product table

Company Product
Godrej Soap
Godrej Shampoo
H.Lever Soap
H.Lever Shampoo

Company Supplier table

Company Supplier
Godrej Mr. X
Godrej Mr. Y
Godrej Mr. Z
H.Lever Mr. X
H.Lever Mr. Y

Product Supplier Table

Product Supplier
Soap Mr. X
Soap Mr. Y
Shampoo Mr. X
Shampoo Mr. Y
Shampoo Mr. Z

We take natural join of Company Product⋈ Product Supplier

Company Product Supplier


Godrej Soap Mr. X
Godrej Soap Mr. Y
Godrej Shampoo Mr. X
Godrej Shampoo Mr. Y
Godrej Shampoo Mr. Z
H.Lever Soap Mr. X
H.Lever Soap Mr. Y
H.Lever Shampoo Mr. X
H.Lever Shampoo Mr. Y
H.Lever Shampoo Mr. Z

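When we take the natural join of all three tables, Company Product ⋈ Product Supplier ⋈ Company Supplier, we get

Company Product Supplier

Godrej Soap Mr. X
Godrej Soap Mr. Y (spurious tuple)
Godrej Shampoo Mr. X
Godrej Shampoo Mr. Y
Godrej Shampoo Mr. Z
H.Lever Soap Mr. X
H.Lever Soap Mr. Y
H.Lever Shampoo Mr. X (spurious tuple)
H.Lever Shampoo Mr. Y

The tuple (H.Lever, Shampoo, Mr. Z) produced by the two-table join is eliminated, because H.Lever is not paired with Mr. Z in the Company Supplier table.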

Example 3:

Consider once again the SUPPLY all-key relation in Figure . Suppose that the following
additional constraint always holds: Whenever a supplier s supplies part p, and a project j
uses part p, and the supplier s supplies at least one part to project j, then supplier s will also
be supplying part p to project j. This constraint can be restated in other ways and specifies a
join dependency JD(R1, R2, R3) among the three projections R1(Sname, Part_name),
R2(Sname, Proj_name), and R3(Part_name, Proj_name) of SUPPLY.
The SUPPLY relation with the join dependency is decomposed into three relations R1, R2, and R3 that are each in 5NF. Notice that applying a natural join to any two of these relations produces spurious tuples, but applying a natural join to all three together does not.

When the two tables R1 and R3 are joined we get (R1⋈R3):

Sname Partname Proj Name

Smith Bolt Proj X
Smith Bolt Proj Y
Smith Nut Proj Y
Smith Nut Proj Z (spurious tuple)
Adamsky Bolt Proj X
Adamsky Bolt Proj Y
Walton Nut Proj Y (spurious tuple)
Walton Nut Proj Z
Adamsky Nail Proj X

If R1⋈R2⋈R3 is taken, then we get the table:

Sname Partname Proj Name


Smith Bolt Proj X
Smith Bolt Proj Y
Smith Nut Proj Y
Adamsky Bolt Proj X
Adamsky Bolt Proj Y
Walton Nut Proj Z
Adamsky Nail Proj X
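The behaviour described in this example can be checked with a short Python sketch. The helper names and the tuple representation are assumptions made for illustration, and the SUPPLY rows are the seven tuples of the final table above.

```python
def relation(attrs, rows):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}

def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r1, r2):
    """Combine tuples that agree on all attributes the two relations share."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

supply = relation(
    ["Sname", "Part_name", "Proj_name"],
    [
        ("Smith", "Bolt", "Proj X"), ("Smith", "Bolt", "Proj Y"),
        ("Smith", "Nut", "Proj Y"), ("Adamsky", "Bolt", "Proj X"),
        ("Adamsky", "Bolt", "Proj Y"), ("Walton", "Nut", "Proj Z"),
        ("Adamsky", "Nail", "Proj X"),
    ],
)
r1 = project(supply, {"Sname", "Part_name"})
r2 = project(supply, {"Sname", "Proj_name"})
r3 = project(supply, {"Part_name", "Proj_name"})

two_way = natural_join(r1, r3)                      # contains spurious tuples
three_way = natural_join(natural_join(r1, r2), r3)  # equals the original relation

print(len(two_way) - len(supply), "spurious tuples in R1 join R3")
print(three_way == supply)
```

The join dependency JD(R1, R2, R3) is satisfied by an instance exactly when the three-way join returns the original tuples and nothing more.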

Inclusion Dependency

o Multivalued dependencies and join dependencies can be used to guide database design, although they are both less common than functional dependencies.

o Inclusion dependencies are quite common. They typically have little influence on the design of the database.

o An inclusion dependency is a statement that some columns of a relation are contained in other columns.

o The typical example of an inclusion dependency is a foreign key: the foreign key column(s) of the referring relation are contained in the primary key column(s) of the referenced relation.

o Suppose we have two relations R and S which were obtained by translating two entity sets such that every R entity is also an S entity.

o An inclusion dependency holds if projecting R on its key attributes yields a relation that is contained in the relation obtained by projecting S on its key attributes.

o We should not split groups of attributes that participate in an inclusion dependency.

o In practice, most inclusion dependencies are key-based, that is, they involve only keys.
Inclusion dependencies were defined in order to formalize two types of interrelational constraints:

o The foreign key (or referential integrity) constraint cannot be specified as a functional or multivalued dependency because it relates attributes across relations.
o The constraint between two relations that represent a class/subclass relationship.

Definition. An inclusion dependency R.X < S.Y between two sets of attributes—X of relation
schema R, and Y of relation schema S—specifies the constraint that, at any specific time
when r is a relation state of R and s a relation state of S, we must have

πX(r(R)) ⊆ πY(s(S))

The ⊆ (subset) relationship does not necessarily have to be a proper subset. Obviously, the
sets of attributes on which the inclusion dependency is specified—X of R and Y of S—must
have the same number of attributes.

Example: DEP.D_MGR_SSN < EMP.SSN, i.e., the Social Security Numbers of department managers are always a subset of the Social Security Numbers of employees.
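The definition πX(r(R)) ⊆ πY(s(S)) translates directly into a subset test. The sketch below is illustrative only: the function name `inclusion_holds` and the sample rows (names and SSN values) are hypothetical data made up for the example, not taken from any real schema.

```python
def inclusion_holds(r, x, s, y):
    """Check R.X < S.Y: the projection of r on x must be contained in that of s on y."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of attributes")
    left = {tuple(row[a] for a in x) for row in r}
    right = {tuple(row[a] for a in y) for row in s}
    return left <= right

# Hypothetical sample data, for illustration only.
emp = [
    {"SSN": "111", "Ename": "Ashish"},
    {"SSN": "222", "Ename": "Neeta"},
    {"SSN": "333", "Ename": "Vikas"},
]
dep = [
    {"Dname": "Computer", "D_MGR_SSN": "111"},
    {"Dname": "Sales", "D_MGR_SSN": "333"},
]

print(inclusion_holds(dep, ["D_MGR_SSN"], emp, ["SSN"]))
```

In SQL terms this is the condition a foreign key from DEP.D_MGR_SSN to EMP.SSN enforces on every insert and update.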

As with other types of dependencies, there are inclusion dependency inference rules (IDIRs).
The following are three examples:

IDIR1 (reflexivity): R.X < R.X.

IDIR2 (attribute correspondence): If R.X < S.Y, where X = {A1, A2, ..., An} and Y = {B1, B2, ..., Bn} and Ai corresponds to Bi, then R.Ai < S.Bi for 1 ≤ i ≤ n.
IDIR3 (transitivity): If R.X < S.Y and S.Y < T.Z, then R.X < T.Z.

Alternative approaches for database design

 Dangling tuples — Tuples that “disappear” in computing a join.

– Let r1(R1), r2(R2), ..., rn(Rn) be a set of relations.

– A tuple t of relation ri is a dangling tuple if t is not in the relation:

ΠRi (r1 ⋈ r2 ⋈... ⋈ rn)

• The relation r1 ⋈ r2 ⋈... ⋈ rn is called a universal relation since it involves all the
attributes in the “universe” defined by

R1 ∪ R2 ∪ ... ∪ Rn.

• If dangling tuples are allowed in the database, instead of decomposing a universal
relation, we may prefer to synthesize a collection of normal form schemas from a given set
of attributes.
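
To make the notion of a dangling tuple concrete, the following Python sketch computes the natural join of a set of relations and reports the tuples of one relation that do not appear in the projection of that join. It is only an illustration: relations are assumed to be lists of dictionaries, and the function names and sample data are hypothetical.

```python
from functools import reduce


def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            # Tuples combine only if they agree on every shared attribute.
            if all(t1[a] == t2[a] for a in common):
                joined.append({**t1, **t2})
    return joined


def dangling_tuples(relations, i):
    """Tuples of relations[i] that are not in the projection of the full join on Ri."""
    if not relations[i]:
        return []
    attrs = sorted(relations[i][0].keys())
    full = reduce(natural_join, relations)
    surviving = {tuple(t[a] for a in attrs) for t in full}
    return [t for t in relations[i] if tuple(t[a] for a in attrs) not in surviving]


# Hypothetical sample relations.
emp = [{"ENAME": "Asha", "D_NO": 1}, {"ENAME": "Ravi", "D_NO": 3}]
dept = [{"D_NO": 1, "DNAME": "HR"}, {"D_NO": 2, "DNAME": "IT"}]

print(dangling_tuples([emp, dept], 0))  # [{'ENAME': 'Ravi', 'D_NO': 3}]
print(dangling_tuples([emp, dept], 1))  # [{'D_NO': 2, 'DNAME': 'IT'}]
```

Here Ravi's tuple "disappears" because no department has D_NO = 3, and the IT department disappears because no employee works in it.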

There are two classical approaches to database design:


• The top-down design method starts from the general and moves to the specific. In other
words, you start with a general idea of what is needed for the system and then work your
way down to the more specific details of how the system will interact. This process involves
the identification of different entity types and the definition of each entity’s attributes.
• The bottom-up approach begins with the specific details and moves up to the general.
This is done by first identifying the data elements (items) and then grouping them together
in data sets. In other words, this method first identifies the attributes, and then groups them
to form entities.
The choice between the two general approaches (top-down and bottom-up) to database design
can be heavily influenced by factors such as the scope and size of the system, the
organization's management style, and the organization's structure.

Question to identify Normal Form

Question: Given a relation R(A, B, C) and the functional dependency set FD = { A → B, B → C,
C → A }, determine the highest normal form of R.

Solution: Let us construct an arrow diagram on R using FD to calculate the candidate key.

From the arrow diagram on R, we can see that every attribute appears on the right-hand side
of some FD, so no attribute is forced to be part of every key. We therefore check each
attribute (i.e., A, B, and C) individually as a possible candidate key.

Let us calculate the closure of A

A+ = ABC (A → B gives B, then B → C gives C). Since the closure of A contains all the
attributes of R, A is a candidate key.

Let us calculate the closure of B

B+ = BCA (B → C gives C, then C → A gives A). Since the closure of B contains all the
attributes of R, B is a candidate key.

Let us calculate the closure of C

C+ = CAB (C → A gives A, then A → B gives B). Since the closure of C contains all the
attributes of R, C is a candidate key.

Hence the three candidate keys are A, B, and C.
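
The closure computations above can be reproduced with a short Python sketch. It assumes FDs are stored as (left side, right side) pairs of sets; `closure` and `candidate_keys` are illustrative names, and the key search is a simple brute-force enumeration that is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


schema = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"A"})]

print(sorted(closure({"A"}, fds)))                       # ['A', 'B', 'C']
print([sorted(k) for k in candidate_keys(schema, fds)])  # [['A'], ['B'], ['C']]
```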

Since R has 3 attributes (A, B, and C) and the candidate keys are A, B, and C, the prime
attributes (those that are part of some candidate key) are A, B, and C, and there is no
non-prime attribute.

The given FDs are { A → B, B → C, C → A } and the candidate keys (each also a super key)
are A, B, and C.

a. FD: A → B satisfies the definition of BCNF, as A is a super key; we check the other
FDs for BCNF.

b. FD: B → C satisfies the definition of BCNF, as B is a super key; we check the other
FD for BCNF.

c. FD: C → A satisfies the definition of BCNF, as C is a super key.

Since there are only three FDs and all of them { A → B, B → C, C → A } satisfy BCNF, the
highest normal form is BCNF.

Therefore R(A, B, C) is in BCNF.
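
The BCNF test used in steps a to c (the left side of every non-trivial FD must be a super key) can be sketched in Python as follows. This is an illustrative sketch, not a complete normalization tool; the function names are assumptions, and FDs are again represented as pairs of sets.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left side is not a super key of the schema."""
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        is_superkey = closure(lhs, fds) == set(schema)
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


schema = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"A"})]

print(bcnf_violations(schema, fds))  # [] means R(A, B, C) is in BCNF
```

An empty list means no FD violates BCNF, which agrees with the conclusion reached by hand.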

Question: Given a relation R(P, Q, R, S, T, U, V, W, X) and the functional dependency set
FD = { PQ → R, QS → TU, PS → VW, P → X }, determine the highest normal form of R.

Solution:

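Let us calculate the closure of PQS

PQS+ = P Q R S T U V W X (from the closure method we studied earlier)

Since the closure of PQS contains all the attributes of R, PQS is a super key. Moreover,
P, Q, and S do not appear on the right-hand side of any FD, so every key must contain all
three of them. Hence PQS is the only candidate key.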

Since R has 9 attributes (P, Q, R, S, T, U, V, W, X) and the candidate key is PQS, the
prime attributes (part of the candidate key) are P, Q, and S, while the non-prime
attributes are R, T, U, V, W, and X.

The given FDs are { PQ → R, QS → TU, PS → VW, P → X } and the candidate key is PQS.

a. FD: PQ → R does not satisfy the definition of BCNF, as PQ is not a super key; hence the
table is not in BCNF (a single violating dependency is enough to rule out a normal
form). Now we check the same FD for 3NF.

b. FD: PQ → R does not satisfy the definition of 3NF either, as PQ is not a super key and
R is not a prime attribute; hence the table is not in 3NF. Now we check the same FD
for 2NF.

c. FD: PQ → R does not satisfy the definition of 2NF either, as PQ is a proper subset of
the candidate key PQS and R is a non-prime attribute, so R depends on only part of
the key (partial dependency); hence the table is not in 2NF.

Hence from the above three statements, we can say that table R ( P, Q, R, S, T, U, V, W, X)
is in 1NF only.
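
The partial dependencies that rule out 2NF can also be listed programmatically. The sketch below, written in Python under the same assumptions as before (FDs as pairs of sets, illustrative function names), takes the candidate key PQS found above and reports every proper subset of it that determines a non-prime attribute.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(fds, keys):
    """Pairs (part, dependents): a proper subset of a key and the non-prime
    attributes it determines. Any such pair violates 2NF."""
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for combo in combinations(sorted(key), size):
                part = set(combo)
                dependents = closure(part, fds) - prime
                if dependents:
                    found.append((part, dependents))
    return found


fds = [
    ({"P", "Q"}, {"R"}),
    ({"Q", "S"}, {"T", "U"}),
    ({"P", "S"}, {"V", "W"}),
    ({"P"}, {"X"}),
]
keys = [{"P", "Q", "S"}]  # candidate key derived in the solution above

for part, dependents in partial_dependencies(fds, keys):
    print("".join(sorted(part)), "->", "".join(sorted(dependents)))
# P -> X
# PQ -> RX
# PS -> VWX
# QS -> TU
```

Every given FD shows up as a partial dependency, so the relation fails 2NF and is in 1NF only, as concluded above.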
