UNIT-3 DBMS Normalization and FD
Functional Dependency
A functional dependency is a constraint between two sets of attributes in which the value
of one set uniquely determines the value of the other set.
It is denoted as X → Y, where X is a set of attributes that is capable of determining the value
of Y. The attribute set on the left side of the arrow, X is called Determinant, while on the
right side, Y is called the Dependent.
X → Y means that Y is functionally dependent on X.
Determinant: the attribute or set of attributes on the left-hand side of the arrow.
Dependent: the attribute or set of attributes on the right-hand side of the arrow.
A B
1 4
1 5
3 7
On this instance, A → B does NOT hold, but B → A does hold.
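The check on the instance above can be sketched in Python. This is a minimal sketch, assuming rows are stored as dictionaries; the function name fd_holds is hypothetical and not part of the notes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on this instance (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True

rows = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(fd_holds(rows, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(fd_holds(rows, ["B"], ["A"]))  # True: every B value appears with one A value
```

Note that an instance can only refute a functional dependency; whether it holds for the schema in general is a design decision.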
Example:
Example: ID → Name,
3. Multi-valued dependency: It occurs when two or more independent multi-valued facts
about the same attribute occur within the same table. It is denoted
X →→ Y. For example, consider a bike manufacturing company which provides two colors
(RED and BLACK)
4. Transitive Dependency
In a transitive functional dependency, the dependent is indirectly dependent on the determinant,
i.e. if a → b and b → c, then by the axiom of transitivity, a → c. This is a transitive
functional dependency.
For example,
42 Abc CO 4
43 Pqr EC 2
44 Xyz IT 1
45 Abc EC 2
Primary Rules
Reflexivity rule. If X is a set of attributes and Y ⊆ X, then X→Y holds.
Augmentation rule. If X→ Y holds and Z is a set of attributes, then XZ→ YZ holds.
Transitivity rule. If X→Y holds and Y→Z holds, then X→Z holds.
Secondary Rules
Union rule. If X→Y holds and X→Z holds, then X→YZ holds.
Decomposition rule. If X→Y Z holds, then X→Y holds and X→Z holds.
Pseudo transitivity rule. If X → Y holds and WY→Z holds, then X W → Z holds.
Composition rule. If X→Y holds and A→B holds, then XA→YB holds.
Example: R = (A, B, C, G, H, I), F = { A → B, A → C, CG → H, CG → I, B → H }
Some members of F+:
A → H (since A → B and B → H hold, A → H follows by the transitivity rule)
AG → I (since A → C and CG → I hold, AG → I follows by the pseudo-transitivity rule)
CG → HI (since CG → H and CG → I hold, CG → HI follows by the union rule)
AG → H (since A → C and CG → H hold, AG → H follows by the pseudo-transitivity rule)
Therefore F+ includes A → H, CG → HI, AG → I and AG → H, among other dependencies.
Procedure for computing F+
Closure set of attribute set
The set of all those attributes which can be functionally determined from an attribute set is
called as a closure of that attribute set.
Closure of attribute set {X} is denoted as {X}+.
Steps to Find Closure of an Attribute Set-
Following steps are followed to find the closure of an attribute set-
Step-01:
Add the attributes contained in the attribute set for which closure is being calculated to the
result set.
Step-02:
Recursively add the attributes to the result set which can be functionally determined from
the attributes already contained in the result set.
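result := α;
while (changes to result) do
  for each functional dependency β → γ in F do
  begin
    if β ⊆ result then result := result ∪ γ;
  end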
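The two steps above can be sketched in Python as follows. This is a minimal sketch, assuming single-letter attribute names so that a string such as "BC" stands for the set {B, C}; the function name closure is hypothetical. The dependency set is the one used in the worked example that follows (A → BC is taken from the first derivation step of A+).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)            # Step-01: start with the given attributes
    changed = True
    while changed:                 # Step-02: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]
print(sorted(closure("A", F)))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(sorted(closure("D", F)))   # ['D', 'F']
print(sorted(closure("BC", F)))  # ['B', 'C', 'D', 'E', 'F', 'G']
```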
BC → DE
D→F
CF → G
Closure of attribute A-
A+ = { A }
= { A B C } ( Using A → BC )
= { A B C D E } ( Using BC → DE )
= { A B C D E F } ( Using D → F )
= { A B C D E F G } ( Using CF → G )
Thus,
A+ = { A , B , C , D , E , F , G }
Closure of attribute D-
D+ = { D }
= { D F } ( Using D → F )
Thus,
D+ = { D F }
{ B C }+= { B , C }
= { B C D E } ( Using BC → DE )
= { B C D E F } ( Using D → F )
= { B C D E F G } ( Using CF → G )
Thus,
{ B , C }+ = { B , C , D , E , F , G }
Question :- R = (A, B, C, G, H, I)
F = {A → B ,A → C ,CG → H , CG → I , B → H}
Find (AG)+
Solution : 1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
(AG)+={ ABCGHI}
a) QR+ = QR
Hence QR+ = QRST(QR->ST)
b) PR + = PR
P→Q, P is a subset of PR, Hence PR+ = PRQ
Suppose F and G are two sets of functional dependencies for a relational schema R. Then the
following four cases are possible:
F ⊆ G (every FD in F can be derived from G, i.e. G covers F)
G ⊆ F (every FD in G can be derived from F, i.e. F covers G)
F = G (F ⊆ G and G ⊆ F) (F is equivalent to G)
F ≠ G (F is not equivalent to G)
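These cases can be tested with attribute closures: G covers F when, for every X → Y in F, Y is contained in the closure of X under G. The following is a minimal sketch, assuming single-letter attribute names; the function names and the sample sets F1, G1, H1 are hypothetical illustrations, not taken from the notes.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """True if every FD in g can be derived from f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    """True if f and g derive exactly the same dependencies."""
    return covers(f, g) and covers(g, f)

# Hypothetical sample sets.
F1 = [("A", "B"), ("B", "C")]
G1 = [("A", "B"), ("B", "C"), ("A", "C")]   # A -> C is implied by F1
H1 = [("A", "B")]
print(equivalent(F1, G1))                 # True
print(covers(F1, H1), covers(H1, F1))     # True False
```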
Canonical Cover OR Irreducible set of functional dependency
A canonical cover Fc for F is a set of dependencies such that F logically implies all
dependencies in Fc , and Fc logically implies all dependencies in F. Furthermore, Fc must
have the following properties:
1. B → A
2. AD → B ( using decomposition inference rule on AD → BC)
3. AD → C ( using decomposition inference rule on AD → BC)
4. C → A ( using decomposition inference rule on C → ABD)
5. C → B ( using decomposition inference rule on C → ABD)
6. C → D ( using decomposition inference rule on C → ABD)
Now set of FD = { B → A, AD → B, AD → C, C → A, C → B, C → D }
The next step is to find the closure of the left side of each FD twice: once with that FD
included in the set and once with it excluded. If the two closures are the same, the FD is
redundant and we remove it from the set; if the two closures differ, we keep the FD.
From 1 a and 1 b, we found that both the Closure( by including B → A and excluding B → A )
are not equivalent, hence FD B → A is important and cannot be removed from the set of FD.
From 2 a and 2 b, we found that both the Closure (by including AD → B and excluding AD →
B) are equivalent, hence FD AD → B is not important and can be removed from the set of
FD.
Hence resultant FD = { B → A, AD → C, C → A, C → B, C → D }
From 3 a and 3 b, we found that both the Closure (by including AD → C and excluding AD →
C ) are not equivalent, hence FD AD → C is important and cannot be removed from the set
of FD.
Hence resultant FD = { B → A, AD → C, C → A, C → B, C → D }
From 4 a and 4 b, we found that both the Closure (by including C → A and excluding C → A)
are equivalent, hence FD C → A is not important and can be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → B, C → D }
5 b. Closure C+ = CD using FD = { B → A, AD → C, C → D }
From 5 a and 5 b, we found that both the Closure (by including C → B and excluding C → B)
are not equivalent, hence FD C → B is important and cannot be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → B, C → D }
From 6 a and 6 b, we found that both the Closure( by including C → D and excluding C → D)
are not equivalent, hence FD C → D is important and cannot be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → B, C → D }
Since the closures AD+, A+ and D+ that we found are not all equal, neither A nor D is
extraneous in the FD AD → C, so both attributes must be kept.
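The whole procedure can be sketched in Python. This is a minimal sketch, assuming single-letter attribute names; the function names are hypothetical. It trims left-hand sides before dropping redundant dependencies, which is the usual textbook order; the worked example checks redundancy first and arrives at the same result for this input.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: decomposition rule, one attribute on every right-hand side.
    work = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Step 2: drop left-hand attributes that are not needed.
    for i, (lhs, a) in enumerate(work):
        for x in sorted(lhs):
            smaller = work[i][0] - {x}
            if smaller and a in closure(smaller, work):
                work[i] = (smaller, a)
    # Step 3: drop every FD that the remaining FDs already imply.
    i = 0
    while i < len(work):
        lhs, a = work[i]
        rest = work[:i] + work[i + 1:]
        if a in closure(lhs, rest):
            work = rest          # redundant: closure is the same without it
        else:
            i += 1               # important: keep it
    return work

def show(fds):
    return ["".join(sorted(lhs)) + "->" + a for lhs, a in fds]

F = [("B", "A"), ("AD", "BC"), ("C", "ABD")]
print(show(minimal_cover(F)))  # ['B->A', 'AD->C', 'C->B', 'C->D']
```

A canonical cover is not unique in general; a different processing order can give a different but equivalent result.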
1. W → X
2. Y → X
3. Z → W ( using decomposition inference rule on Z → WXY )
4. Z → X ( using decomposition inference rule on Z → WXY )
5. Z → Y ( using decomposition inference rule on Z → WXY )
6. WY → Z
Now set of FD = { W → X, Y → X, WY → Z, Z → W, Z → X, Z → Y }
The next step is to find the closure of the left side of each FD twice: once with that FD
included in the set and once with it excluded. If the two closures are the same, the FD is
redundant and we remove it from the set; if the two closures differ, we keep the FD.
1 a. Closure W+ = WX using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
1 b. Closure W+ = W using FD = { Y → X, Z → W, Z → X, Z → Y, WY → Z }
From 1 a and 1 b, we found that both the Closure (by including W → X and excluding W →
X ) are not equivalent, hence FD W → X is important and cannot be removed from the set of
FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
2 a. Closure Y+ = YX using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
2 b. Closure Y+ = Y using FD = { W → X, Z → W, Z → X, Z → Y, WY → Z }
From 2 a and 2 b we found that both the Closure (by including Y → X and excluding Y → X )
are not equivalent, hence FD Y → X is important and cannot be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
From 3 a and 3 b, we found that both the Closure (by including Z → W and excluding Z →
W ) are not equivalent, hence FD Z → W is important and cannot be removed from the set
of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
From 4 a and 4 b, we found that both the Closure (by including Z → X and excluding Z → X )
are equivalent, hence FD Z → X is not important and can be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
From 6 a and 6 b, we found that both the Closure (by including WY → Z and excluding WY →
Z) are not equivalent, hence FD WY → Z is important and cannot be removed from the set of
FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Closure W+ = WX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Closure Y+ = YX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Since the closures WY+, W+ and Y+ that we found are not all equal, neither W nor Y is
extraneous in the FD WY → Z, so both attributes must be kept.
Normalization
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations.
It is also used to eliminate the undesirable characteristics like Insertion, Update and
Deletion Anomalies.
Normalization divides the larger table into the smaller table and links them using
relationship.
The normal form is used to reduce redundancy from the database table.
If a database design is not perfect, it may contain anomalies, which are like a bad dream for
any database administrator. Managing a database with anomalies is next to impossible.
Update anomalies − If data items are scattered and are not linked to each other
properly, then it could lead to strange situations. For example, when we try to
update one data item having its copies scattered over several places, a few instances
get updated properly while a few others are left with old values. Such instances
leave the database in an inconsistent state.
Deletion anomalies − We try to delete a record, but parts of it are left undeleted
because, without our being aware of it, the data is also saved somewhere else.
Insert anomalies − An Insert Anomaly occurs when certain attributes cannot
be inserted into the database without the presence of other attributes.
Anomalies in DBMS
Anomalies are problems that can occur in poorly planned, un-normalised databases
where all the data is stored in one table
Student Table:
1. Insert Anomaly: An insert anomaly occurs in a relational database when some
attributes or data items cannot be inserted into the database without the existence of
other attributes. For example, in the Student table, a new courseID cannot be
inserted until some student has enrolled in that course. This makes it
difficult to insert new records in the table, hence the name insertion anomaly.
2. Update Anomalies: An update anomaly occurs when duplicated data is updated in
one place but not in all instances, which leaves the table in an inconsistent
state. For example, suppose there is a student 'James' in the Student table.
If we update his course in the Student table, we must make the same update in the
course table; otherwise some rows show the updated value while others still show
the old one.
3. Delete Anomalies: A delete anomaly occurs when some records are lost from a
database table as a side effect of deleting other records.
To solve these anomalies and reduce data redundancy, we use normalization in DBMS so
that the database remains consistent and data integrity improves.
Purpose of Normalization
Normalization is the process of efficiently organizing data in a database. There are two goals
of the normalization process:
Objective of Normalization
1. It is used to remove the duplicate data and database anomalies from the relational table.
2. Normalization helps to reduce redundancy and complexity by examining new data types
used in the table.
3. It is helpful to divide the large database table into smaller tables and link them using
relationship.
4. It avoids duplicate data and repeating groups in a table.
5. It reduces the chances for anomalies to occur in a database.
ADVANTAGES OF NORMALIZATION
The following are the advantages of the normalization.
DISADVANTAGES OF NORMALIZATION
The following are disadvantages of normalization.
You cannot start building the database before you know what the user needs.
Normalizing relations to higher normal forms, i.e. 4NF and 5NF, degrades
performance.
Normalizing relations of higher degree is a difficult and very time-consuming process.
Careless decomposition may lead to a bad database design, which may in turn lead to
serious problems.
More tables to join: because the data is spread across more tables, more joins are
needed.
More complicated SQL queries involving multiple tables and joins.
A relation is said to be in 1NF if it contains atomic values and each row can provide a unique
combination of values. The above table in UNF can be processed to create the following
table in 1NF.
For example
R(A,B,C,D)
Prime Attribute-A,B
A B C D
Student_Result
{
StuID        Student ID
CourseID     Course ID of the course
CourseTitle  Detailed description of the course
ProfName     Name of the professor who teaches this course
RoomNo       Room number where the course is taught
Score        Score attained by the student in this course
CGPA         CGPA
}
Key: StuID + CourseID
Dependencies
CourseID → CourseTitle (Partial dependency)
CourseID → ProfName (Partial dependency)
ProfName → RoomNo
StuID + CourseID → Score (Fully functional dependency)
Score → CGPA
We see here in Student_Project relation that the prime key attributes are Stu_ID and
Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent
upon both and not on any of the prime key attribute individually. But we find that
Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID
independently. This is called partial dependency, which is not allowed in Second Normal
Form.
We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
Let us explain. Emp-Id, empname, month, bankid is the composite primary key of the above
relation. Emp-Name, Month, Sales and Bank-Name all depend upon Emp-Id. But the
attribute Bank-Name depends on Bank-Id, which is not the primary key of the table. So the
table is in 1NF, but not in 2NF. If this portion is moved into a separate related relation,
the table comes into 2NF.
Emp-Id Emp-Name Month Sales Bank-Id
E01 AA JAN 1000 B01
E01 AA FEB 1200 B01
E01 AA MAR 850 B01
E02 BB JAN 2200 B02
E02 BB FEB 2500 B02
E03 CC JAN 1700 B01
E03 CC FEB 1800 B01
E03 CC MAR 1850 B01
E03 CC APR 1726 B01
Bank-Id Bank-Name
B01 SBI
B02 UTI
Example: Suppose a college wants to store the data of teachers and the subjects they teach.
Since a teacher can teach more than one subject, the table can have multiple rows for the
same teacher. They create a table that looks like this:
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF
because the non-prime attribute teacher_age depends on teacher_id alone, which is a
proper subset of the candidate key. This violates the rule for 2NF, which says “no non-prime
attribute is dependent on the proper subset of any candidate key of the table”.
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
The StudentName can be determined by StudentID alone, which is a partial
dependency.
The ProjectName can be determined by ProjectID alone, which is also a partial
dependency.
Therefore, the StudentProject relation violates the 2NF in Normalization and is considered a
bad database design.
To remove Partial Dependency and violation on 2NF, decompose the tables:
StudentInfo
ProjectInfo
ProjectNo ProjectName
We conclude that CE can determine all the attributes of the given relation.
So, CE is the only possible candidate key of the relation.
Question on Second Normal Form (2NF):
Solution: Let us construct an arrow diagram on R using FD to calculate the candidate key.
AC + = ACBED
Since the closure of AC contains all the attributes of R, hence AC is Candidate Key
Since R has 5 attributes (A, B, C, D, E) and the candidate key is AC, the prime attributes
(those that are part of a candidate key) are A and C, while the non-prime attributes are B, D and E.
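The search for candidate keys can be sketched in Python by computing the closure of attribute subsets, smallest first. This is a minimal sketch, assuming single-letter attribute names and the dependency set { A → B, B → E, C → D } used in this question; the function names are hypothetical. The search is exponential in the number of attributes, so it only suits small classroom examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    names = sorted(attrs)
    keys = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue                      # contains a smaller key: not minimal
            if closure(subset, fds) >= set(names):
                keys.append(subset)
    return keys

F = [("A", "B"), ("B", "E"), ("C", "D")]
keys = candidate_keys("ABCDE", F)
prime = set().union(*keys)
print([sorted(k) for k in keys])          # [['A', 'C']]
print(sorted(set("ABCDE") - prime))       # ['B', 'D', 'E'] are non-prime
```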
Since the FDs A → B and C → D keep our table out of 2NF, let's decompose the table
R1(A, B, E) ( from FD: A → B and B → E)
R3 ( A, C)
a. R1( A, B, E)
b. R2( C, D)
c. R3( A, C)
Solution: Let us construct an arrow diagram on R using FD to calculate the candidate key.
PQS + = PQSRT
Since the closure of PQS contains all the attributes of R, hence PQS is Candidate Key
From the definition of Candidate Key (Candidate Key is a Super Key whose no proper
subset is a Super key)
Since R has 5 attributes (P, Q, R, S, T) and the candidate key is PQS, the prime attributes
(those that are part of a candidate key) are P, Q and S, while the non-prime attributes are R and T.
a) FD: PQ → R does not satisfy the definition of 2NF, as the non-prime attribute R is partially
dependent on the candidate key PQS (it depends on PQ alone).
b) FD: S → T does not satisfy the definition of 2NF, as the non-prime attribute T is partially
dependent on the candidate key PQS (it depends on S alone).
Since the FDs PQ → R and S → T keep our table out of 2NF, let's decompose the table
and create one more table for the key, since the key is PQS.
R3(P, Q, S)
a) R1( P, Q, R)
b) R2(S, T)
c) R3(P, Q, S)
A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional
dependency X→Y holds in R, either (a) X is a super key of R, or (b) Y is a prime attribute of R.
Note:-
If a non-prime attribute determines another non-prime attribute, the relation is not in 3NF.
Example: Consider the following employee table:
Empno->Dept_name
Emp table
Department table
Dept_name Dept_city
Computer Lucknow
Administration Lucknow
Sales Delhi
Sales Meerut
Accounts Delhi
F: { A → B, B → E, C → D }. Check whether the relation is in 3NF. If not, decompose it
into 3NF.
Solution1:
First find the candidate key of the relation: (AC)+ = ABCDE. AC is the candidate key, because
the closure of AC has all the attributes of R.
Prime attributes: A, C
A relation is said to be 3NF, if it holds at least one of the following for every non trivial
functional dependency X-> Y:
So, the relation is not in 3NF as it is not following the rules of 3NF.
Question 2: Suppose a relational schema R (A B C D E F G H I) and set of functional
dependencies
ABD is the candidate key, because closure of ABD has all the attributes of R.
Prime attributes: A, B, D
A relation is said to be 3NF, if it holds at least one of the following for every non trivial
functional dependency X-> Y:
So, the relation is not in 3NF as it is not following the rules of 3NF.
R12(A I)
BCNF(Boyce – Codd Normal Form)
A relation schema R is in BCNF with respect to a set F of functional dependencies if, for
all functional dependencies in F + of the form α → β, where α ⊆ R and β ⊆ R, at least
one of the following holds:
A relation is in BCNF if and only if every determinant is a candidate key;
it must also be in 3NF.
Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Candidate keys:
Now, this is in BCNF because the left side of both functional dependencies is a key.
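result := {R};
done := false;
compute F+;
while (not done) do
  if (there is a schema Ri in result that is not in BCNF)
  then begin
    let α → β be a nontrivial functional dependency that holds
    on Ri such that α → Ri is not in F+, and α ∩ β = ∅;
    result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
  end
  else done := true;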
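The decomposition loop above can be sketched in Python. This is a minimal sketch, assuming single-letter attribute names; the function names are hypothetical. Instead of computing F+ explicitly, it tests each subset α of a schema Ri by taking the closure of α under F and restricting it to Ri, which is exponential and only practical for small schemas. The sample input R = ABCDE with F = { A → BC, C → DE } is the one discussed in Question 2 of these notes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(schema, fds):
    """Return (alpha, beta) for a BCNF violation on schema, or None."""
    names = sorted(schema)
    for size in range(1, len(names)):
        for combo in combinations(names, size):
            alpha = frozenset(combo)
            reach = closure(alpha, fds) & schema
            beta = reach - alpha
            # Nontrivial dependency whose left side is not a superkey of schema.
            if beta and reach != schema:
                return alpha, beta
    return None

def bcnf_decompose(attrs, fds):
    result = [frozenset(attrs)]
    done = False
    while not done:
        done = True
        for ri in result:
            hit = find_violation(ri, fds)
            if hit:
                alpha, beta = hit
                result.remove(ri)
                result += [ri - beta, alpha | beta]   # (Ri - beta) and (alpha, beta)
                done = False
                break
    return result

parts = bcnf_decompose("ABCDE", [("A", "BC"), ("C", "DE")])
print(sorted("".join(sorted(p)) for p in parts))  # ['ABC', 'CDE']
```

The result depends on which violating dependency is picked first, so other valid BCNF decompositions may exist for the same input.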
Example:-
class (course_id, title, dept_name, credits, sec_id, semester, year, building,
room_number, capacity, time_slot_id)
A candidate key for this schema is {course_id, sec_id, semester, year}.
We can apply the algorithm of Figure 8.11 to the class example as follows:
• The functional dependency:
course_id → title, dept_name, credits
holds, but course_id is not a superkey. Thus, class is not in BCNF. We replace class by:
course (course_id, title, dept_name, credits)
class-1 (course_id, sec_id, semester, year, building, room_number,
capacity, time_slot_id)
The only nontrivial functional dependencies that hold on course include course_id on
the left side of the arrow. Since course_id is a key for course, the relation course is in
BCNF.
• A candidate key for class-1 is {course_id, sec_id, semester, year}. The functional
dependency:
building, room_number → capacity
holds on class-1, but {building, room_number} is not a superkey for class-1. We replace
class-1 by:
classroom (building, room_number, capacity)
section (course_id, sec_id, semester, year,
building, room_number, time_slot_id)
Thus, the decomposition of class results in the three relation schemas course, classroom,
and section, each of which is in BCNF.
Question 2 : R=ABCDE,
Candidate key=A
Looking at the 1st FD in F, A → BC: A is a key for R, so this FD does not violate BCNF.
Looking at the 2nd FD, C → DE: C is not a key for R, so this FD violates BCNF and R is
decomposed into (ABC) and (CDE).
In (ABC), A is still the key, so the first FD is still not in violation. In (CDE), C is the key, so
C → DE is also not in violation. This decomposition is in BCNF.
Example 3:
R = (ABCD)
First, let’s compute the attribute closure: A+ = A, B+ = BD ,C+ = AC, D+ = D ,AB+ = ABCD ,BC+
= ABCD
So our candidate keys are AB and BC. Now we start the algorithm:
AB -> C Does the first violate BCNF, i.e. is AB a key for R? The answer is yes, so no violation.
B -> D Does this violate BCNF? Yes, because B is not a key. So we create 2 relations: (BD)
(ABC)
SD →P is a violating FD
J →S is still a violation in R1
Solution (1) C → D and C → A both cause violations of BCNF. Take C → D: decompose R into
R1 = {C, D}, R2 = {A, B, C} [ABCD − D = ABC]
Final decomposition: R2 = {C, D}, R11 = {B, C}, R12 = {C, A}.
Solution 1. Find all the Candidate Keys: AB, BC, CD, AD Check all FDs in F for 3NF condition
Prime Attribute:-A,B,C,D
Final Decomposition
Multi-Valued Dependency
MVD is the dependency where one attribute value is potentially a ‘multi-valued fact’ about
another.
Multi-valued dependency occurs when two or more independent multi-valued facts about
the same attribute occur within same table.
It means that if, in a relation R having A, B and C as attributes, B and C are multi-valued facts
about A, which is represented as A →→ B and A →→ C, then the multi-valued dependency exists
only if B and C are independent of each other.
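The independence requirement can be checked on a table instance: within each group of rows that share the same X value, every Y value must appear together with every value of the remaining attributes. The following is a minimal sketch; the function name mvd_holds and the bike rows are hypothetical illustrations, since the bike table itself is not reproduced in these notes.

```python
from itertools import product

def mvd_holds(rows, x, y):
    """Check X ->-> Y on an instance given as a non-empty list of dicts."""
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        ys.add(yv)
        zs.add(zv)
        pairs.add((yv, zv))
    # Within each X-group, every Y-value must occur with every Z-value.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())

# Hypothetical rows: each model is offered in both colours in each year.
bikes = [
    {"model": "M1", "year": 2007, "color": "RED"},
    {"model": "M1", "year": 2007, "color": "BLACK"},
    {"model": "M1", "year": 2008, "color": "RED"},
    {"model": "M1", "year": 2008, "color": "BLACK"},
]
print(mvd_holds(bikes, ["model"], ["color"]))       # True
print(mvd_holds(bikes[:-1], ["model"], ["color"]))  # False: (2008, BLACK) is missing
```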
It is denoted X →→ Y. For example, consider a bike manufacturing company which provides
two colours (RED and BLACK)
Bike Table:
MVD can be defined as follow :Let R be a relational scheme and let X and Y be the subsets of
attributes of R.
A table is said to have multi-valued dependency, if the following conditions are true,
Properties – A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. the table should not have any Multi-valued Dependency.
A relation schema R is in 4NF with respect to a set D of functional and multivalued
dependencies if for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R
and β ⊆ R, at least one of the following hold:
Example
Vendor Table
A relation is in 4NF if it has no more than one independent multi valued dependency or one
independent multi valued dependency with a functional dependency.
The table can be expressed as the two 4NF relations given below. The fact that
vendors are capable of supplying certain items and that they are assigned to supply for
some projects is independently specified in the 4NF relations.
Vendor-Supply
Vendor Code  Item Code
V1 I1
V1 I2
V2 I2
V2 I3
V3 I1
Vendor-Project
Vendor Code  Project No.
V1 P1
V1 P3
V2 P1
V3 P2
Example:
Faculty table
Consider again the same Faculty relation with MVD. Clearly a faculty member has multiple courses to
teach and heads several committees. This relation is in BCNF, since all three
attributes concatenated together constitute its key, yet it is clearly wrong and requires
decomposition. The rule for decomposition is to decompose the offending table into two,
with the multi-determinant attribute or attributes as part of the key of both. In this case, to
put the relation in 4NF, two separate relations are formed as follows:
Faculty Subject
John DBMS
John Networking
John MIS
Faculty Committee
John Placement
John Scholarship
Example 2:
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Consider R(A,B,C,D,E,F,G) with the set of FD's and MVD's given by
Solution
Decomposition
The process of breaking up or dividing a single relation into two or more sub relations is
called as decomposition of a relation.
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R
A B C
1 2 1
2 5 3
3 3 3
R( A , B , C )
Consider this relation is decomposed into two sub relations R1( A , B ) and R2( B , C )-
R1 ( A , B )
A B
1 2
2 5
3 3
R2 ( B , C )
B C
2 1
5 3
3 3
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 , we get-
A B C
1 2 1
2 5 3
3 3 3
NOTE-
This decomposition is called lossy join decomposition when the join of the sub relations
does not result in the same relation R that was decomposed.
The natural join of the sub relations is always found to have some extraneous tuples.
For lossy join decomposition, we always have-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R
Example-
Consider the following relation R( A , B , C )-
A B C
1 2 1
2 5 3
3 3 3
Consider this relation is decomposed into two sub relations as R1( A , C ) and R2( B , C )-
R1 ( A , C )
A C
1 1
2 3
3 3
R2 ( B , C )
B C
2 1
5 3
3 3
R1 ⋈ R 2 ⊃ R
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 we get-
A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3
This relation is not same as the original relation R and contains some extraneous tuples.
Clearly, R1 ⋈ R2 ⊃ R.
Thus, we conclude that the above decomposition is lossy join decomposition.
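Both decompositions of R( A , B , C ) can be replayed in Python by projecting the relation and joining the projections back. This is a minimal sketch, assuming non-empty relations stored as lists of dictionaries; the function names are hypothetical.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in sorted(seen)]

def natural_join(r1, r2):
    """Natural join of two non-empty relations on their common attributes."""
    common = [a for a in r1[0] if a in r2[0]]
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in common)]

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    as_set = lambda rel: {tuple(sorted(t.items())) for t in rel}
    return as_set(joined) == as_set(rows)

R = [{"A": 1, "B": 2, "C": 1}, {"A": 2, "B": 5, "C": 3}, {"A": 3, "B": 3, "C": 3}]
print(is_lossless(R, ["A", "B"], ["B", "C"]))  # True: the join returns the 3 original tuples
print(is_lossless(R, ["A", "C"], ["B", "C"]))  # False: the join returns 5 tuples
```

This only tests one instance; a decomposition is lossless for the schema when the test succeeds for every legal instance.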
1. First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2. Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) ≠ Φ
3. Third condition holds true as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because A->BC
is given.
Solution:-
1. R1 ( A , B ) ∪ R2 ( C , D )
=R(A,B,C,D)
Clearly, union of the sub relations contain all the attributes of relation R.
2. R1 ( A , B ) ∩ R2 ( C , D )
=Φ
Clearly, intersection of the sub relations is null.
So, condition-02 fails.
Thus, we conclude that the decomposition is lossy.
R‘ ( A , B , C ) ∪ R3 ( B , D )
=R(A,B,C,D)
Clearly, union of the sub relations contain all the attributes of relation R.
2. According to condition-02, intersection of both the sub relations must not be null.
So, we have-
R‘ ( A , B , C ) ∩ R3 ( B , D )
=B
3. According to condition-03, intersection of both the sub relations must be the super key of
one of the two sub relations or both.
So, we have-
R‘ ( A , B , C ) ∩ R3 ( B , D )
=B
B+ = { B , C , D }
Now, we see- Attribute ‘B’ can determine all the attributes of sub relation R 3.
Thus, it is a super key of the sub relation R3. So, condition-03 satisfies.
Thus, we conclude that the decomposition is lossless.
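The three conditions can be checked mechanically for a decomposition into two sub relations. This is a minimal sketch, assuming single-letter attribute names; the function names are hypothetical. The sample inputs are the decompositions R1(ABC), R2(AD) with A → BC and R1(AB), R2(CD) discussed above; using the same dependency for the second call is an assumption made only for illustration, since its intersection is empty regardless.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(attrs, r1, r2, fds):
    r1, r2 = set(r1), set(r2)
    if r1 | r2 != set(attrs):      # condition-01: union must give back R
        return False
    common = r1 & r2
    if not common:                 # condition-02: intersection must not be null
        return False
    reach = closure(common, fds)   # condition-03: intersection is a super key
    return r1 <= reach or r2 <= reach   # of at least one sub relation

print(lossless_binary("ABCD", "ABC", "AD", [("A", "BC")]))  # True
print(lossless_binary("ABCD", "AB", "CD", [("A", "BC")]))   # False
```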
Join Dependency:-
Join dependency is a further generalization of multi-valued dependency. If the join of R1
and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists,
where R1(A, B, C) and R2(C, D) are the decompositions of a given relation R(A,
B, C, D).
Definition. A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation
schema R, specifies a constraint on the states r of R. The constraint states that every legal
state r of R should have a nonadditive join decomposition into R1, R2, ..., Rn. Hence, for
every such r we have
∗(πR1(r), πR2(r), ..., πRn(r)) = r
A join dependency JD(R1, R2, ..., Rn), specified on relation schema R,is a trivial JD if one of
the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R. Such a dependency is called
trivial because it has the lossless join property for any relation state r of R and thus does not
specify any constraint on R.
Definition. A relation schema R is in fifth normal form (5NF) (or project-join normal form
(PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every
nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a
superkey of R.
Vendor Table
Vendor-Supply
Vendor Code  Item Code
V1 I1
V1 I2
V2 I2
V2 I3
V3 I1
Vendor-Project
Vendor Code  Project No.
V1 P1
V1 P3
V2 P1
V3 P2
These relations still have a problem. While defining the 4NF we mentioned that all the
attributes depend upon each other. While creating the two tables in the 4NF, although we
have preserved the dependencies between Vendor Code and Item Code in the first table and
Vendor Code and Project No. in the second table, we have lost the relationship between Item
Code and Project No. If there were a primary key then this loss of dependency would not
have occurred. In order to revive this relationship we must add a new table like the
following. Please note that during the entire process of normalization, this is the only step
where a new table is created by joining two attributes, rather than splitting them into
separate tables.
Project No.  Item Code
P1 I1
P1 I2
P2 I1
P3 I1
P3 I3
Example 2:
Company Product
Godrej Soap
Godrej Shampoo
H.Lever Soap
H.Lever Shampoo
Company Supplier
Godrej Mr. X
Godrej Mr. Y
Godrej Mr. Z
H.Lever Mr. X
H.Lever Mr. Y
Product Supplier
Soap Mr. X
Soap Mr. Y
Shampoo Mr. X
Shampoo Mr. Y
Shampoo Mr. Z
Supplier we get
Example 3:
Consider once again the SUPPLY all-key relation in Figure . Suppose that the following
additional constraint always holds: Whenever a supplier s supplies part p, and a project j
uses part p, and the supplier s supplies at least one part to project j, then supplier s will also
be supplying part p to project j. This constraint can be restated in other ways and specifies a
join dependency JD(R1, R2, R3) among the three projections R1(Sname, Part_name),
R2(Sname, Proj_name), and R3(Part_name, Proj_name) of SUPPLY
SUPPLY relation with the join dependency is decomposed into three relations R1, R2, and R3
that are each in 5NF. Notice that applying a natural join to any two of these relations
produces spurious tuples, but applying a natural join to all three together does not.
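The effect of joining two versus three projections can be replayed on the Company, Product and Supplier tables of Example 2. This is a minimal sketch; the function name natural_join and the variable names are hypothetical. The original three-column table is not reproduced in these notes, so the code only shows that the third projection filters out a tuple that the two-way join introduces.

```python
def natural_join(r1, r2):
    """Natural join of two non-empty relations stored as lists of dicts."""
    common = [a for a in r1[0] if a in r2[0]]
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in common)]

company_product = [{"Company": c, "Product": p} for c, p in [
    ("Godrej", "Soap"), ("Godrej", "Shampoo"),
    ("H.Lever", "Soap"), ("H.Lever", "Shampoo")]]
company_supplier = [{"Company": c, "Supplier": s} for c, s in [
    ("Godrej", "Mr. X"), ("Godrej", "Mr. Y"), ("Godrej", "Mr. Z"),
    ("H.Lever", "Mr. X"), ("H.Lever", "Mr. Y")]]
product_supplier = [{"Product": p, "Supplier": s} for p, s in [
    ("Soap", "Mr. X"), ("Soap", "Mr. Y"),
    ("Shampoo", "Mr. X"), ("Shampoo", "Mr. Y"), ("Shampoo", "Mr. Z")]]

two = natural_join(company_product, company_supplier)
three = natural_join(two, product_supplier)
extra = [t for t in two if t not in three]
print(len(two), len(three))  # 10 9
print(extra)  # [{'Company': 'Godrej', 'Product': 'Soap', 'Supplier': 'Mr. Z'}]
```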
Inclusion Dependency
o Multivalued dependency and join dependency can be used to guide database design
although they both are less common than functional dependencies.
o Inclusion dependencies are quite common. They typically show little influence on
designing of the database.
o A typical example of an inclusion dependency is a foreign key: the values of the referring
column(s) in one relation are contained in the primary key column(s) of the referenced relation.
o Suppose we have two relations R and S which were obtained by translating two entity
sets such that every R entity is also an S entity.
o In practice, most inclusion dependencies are key-based that is involved only keys.
Inclusion dependencies were defined in order to formalize two types of interrelational
constraints:
The foreign key (or referential integrity) constraint cannot be specified as a functional or
multivalued dependency because it relates attributes across relations.
The constraint between two relations that represent a class/subclass relationship.
Definition. An inclusion dependency R.X < S.Y between two sets of attributes—X of relation
schema R, and Y of relation schema S—specifies the constraint that, at any specific time
when r is a relation state of R and s a relation state of S, we must have
πX(r(R)) ⊆ πY(s(S))
The ⊆ (subset) relationship does not necessarily have to be a proper subset. Obviously, the
sets of attributes on which the inclusion dependency is specified—X of R and Y of S—must
have the same number of attributes.
Example: DEP.D_MGR_SSN < EMP.SSN, i.e. the set of Social Security Numbers of department
managers is always a subset of the Social Security Numbers of employees.
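On a database state, the definition amounts to comparing two projections. This is a minimal sketch; the function name inclusion_holds and the sample rows (including the NAME and DNAME columns) are hypothetical illustrations.

```python
def inclusion_holds(r, x, s, y):
    """Check R.X < S.Y: the projection of r on X is a subset of the projection of s on Y."""
    left = {tuple(row[a] for a in x) for row in r}
    right = {tuple(row[a] for a in y) for row in s}
    return left <= right

# Hypothetical relation states.
EMP = [{"SSN": 101, "NAME": "Asha"}, {"SSN": 102, "NAME": "Ravi"}]
DEP = [{"DNAME": "Sales", "D_MGR_SSN": 101}]
print(inclusion_holds(DEP, ["D_MGR_SSN"], EMP, ["SSN"]))  # True

DEP.append({"DNAME": "Accounts", "D_MGR_SSN": 999})       # manager who is not an employee
print(inclusion_holds(DEP, ["D_MGR_SSN"], EMP, ["SSN"]))  # False
```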
As with other types of dependencies, there are inclusion dependency inference rules (IDIRs).
The following are three examples:
IDIR1 (reflexivity): R.X < R.X.
IDIR2 (attribute correspondence): If R.X < S.Y, where X = {A1, A2, ..., An} and Y =
{B1, B2, ..., Bn} and Ai corresponds to Bi, then R.Ai < S.Bi for 1 ≤ i ≤ n.
IDIR3 (transitivity): If R.X < S.Y and S.Y < T.Z, then R.X < T.Z.
• The relation r1 ⋈ r2 ⋈... ⋈ rn is called a universal relation since it involves all the
attributes in the “universe” defined by
R1 ∪ R2 ∪ ... ∪ Rn.
Solution: Let us construct an arrow diagram on R using FD to calculate the candidate key.
From the above arrow diagram on R, we can see that every attribute is determined by some
other attribute in the given FDs, hence we check each attribute (i.e., A, B, and C) as a
possible candidate key.
A + = ABC . Since closure A contains all the attributes of R, hence A is the Candidate key.
B + = BAC . Since closure B contains all the attributes of R, hence B is the Candidate key.
C + = CAB . Since closure C contains all the attributes of R, hence C is the Candidate key.
Since R has 3 attributes (A, B and C) and the candidate keys are A, B and C, the prime
attributes (part of a candidate key) are A, B and C, and there is no non-prime attribute.
a. FD: A → B satisfies the definition of BCNF, as A is a super key; we check the other FDs for
BCNF.
b. FD: B → C satisfies the definition of BCNF, as B is a super key; we check the other FDs for
BCNF.
c. FD: C → A satisfies the definition of BCNF, as C is a super key.
Since there were only three FDs and all of them, { A → B, B → C and C → A }, satisfy BCNF,
the highest normal form is BCNF.
Solution:
Since the closure of PQS contains all the attributes of R, hence PQS is Candidate Key
Given FDs are { PQ → R, QS → TU, PS → VW, P → X } and the super key / candidate key is
PQS.
a. FD: PQ → R does not satisfy the definition of BCNF, as PQ is not a super key, hence the
table is not in BCNF (one failing dependency is enough). Now we check the
same FD for 3NF.
b. FD: PQ → R does not satisfy the definition of 3NF either, as PQ is not a super key and R is
not a prime attribute, hence the table is not in 3NF (one failing dependency is
enough). Now we check the same FD for 2NF.
c. FD: PQ → R does not satisfy the definition of 2NF either, as R is a non-prime attribute
that depends on PQ, which is only part of the key PQS (partial dependency),
hence the table is not in 2NF.
Hence from the above three statements, we can say that table R ( P, Q, R, S, T, U, V, W, X)
is in 1NF only.
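The reasoning used in the last two solutions can be sketched in Python. This is a minimal sketch, assuming single-letter attribute names and a relation that is already in 1NF; all function names are hypothetical. BCNF and 3NF are tested on the given dependencies, while 2NF is tested by taking the closure of every proper subset of every candidate key, so the search is exponential and suits only small examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    names, keys = sorted(attrs), []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            subset = set(combo)
            if not any(k <= subset for k in keys) and closure(subset, fds) >= set(names):
                keys.append(subset)
    return keys

def in_2nf(attrs, fds, keys, prime):
    nonprime = set(attrs) - prime
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & nonprime:   # part of a key determines a non-prime attribute
                    return False
    return True

def highest_normal_form(attrs, fds):
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    bcnf = nf3 = True
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attrs):
            continue                          # left side is a super key
        for a in set(rhs) - set(lhs):         # nontrivial part of the FD
            bcnf = False
            if a not in prime:
                nf3 = False
    if bcnf:
        return "BCNF"
    if nf3:
        return "3NF"
    return "2NF" if in_2nf(attrs, fds, keys, prime) else "1NF"

print(highest_normal_form("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))  # BCNF
print(highest_normal_form(
    "PQRSTUVWX", [("PQ", "R"), ("QS", "TU"), ("PS", "VW"), ("P", "X")]))  # 1NF
```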