Chapter Five
Functional Dependency and Normalization
A functional dependency is a constraint between two sets of attributes from the database.
Suppose that our relational database schema has n attributes A1, A2, ..., An; let us think of the
whole database as being described by a single universal relation schema R = {A1, A2, ..., An}.
We use this concept only in developing the formal theory of data dependencies. A functional
dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R
specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is
that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
This means that the values of the Y component of a tuple in r depend on, or are determined by,
the values of the X component; alternatively, the values of the X component of a tuple uniquely
(or functionally) determine the values of the Y component. We also say that there is a functional
dependency from X to Y, or that Y is functionally dependent on X. The abbreviation for
functional dependency is FD or f.d. The set of attributes X is called the left-hand side of the FD,
and Y is called the right-hand side. Thus, X functionally determines Y in a relation schema R if,
and only if, whenever two tuples of r(R) agree on their X-value, they must necessarily agree on
their Y-value.
Note the following:
A functional dependency is a property of the semantics or meaning of the attributes. The
database designers will use their understanding of the semantics of the attributes of R, that is,
how they relate to one another, to specify the functional dependencies that should hold on all
relation states (extensions) r of R. The main use of functional dependencies is to describe further
a relation schema R by specifying constraints on its attributes that must hold at all times.
Consider the relation schema EMP_PROJ. From the semantics of the attributes, we know that the
following functional dependencies should hold:
a. SSN → ENAME
b. PNUMBER → {PNAME, PLOCATION}
c. {SSN, PNUMBER} → HOURS
These functional dependencies specify that (a) the value of an employee's social security number
(SSN) uniquely determines the employee name (ENAME), (b) the value of a project's number
(PNUMBER) uniquely determines the project name (PNAME) and location (PLOCATION), and
(c) a combination of SSN and PNUMBER values uniquely determines the number of hours the
employee currently works on the project per week (HOURS). Alternatively, we say that ENAME
is functionally determined by (or functionally dependent on) SSN, or "given a value of SSN, we
know the value of ENAME," and so on. A functional dependency is a property of the relation
schema R, not of a particular legal relation state r of R. Hence, an FD cannot be inferred
automatically from a given relation extension r but must be defined explicitly by someone who
knows the semantics of the attributes of R. We denote by F the set of functional dependencies
that are specified on relation schema R.
The set of all functional dependencies that includes F as well as all dependencies that can be
inferred from F is called the closure of F; it is denoted by F+. For example, suppose
F = {SSN → {Ename, Bdate, Address, Dnumber},
Dnumber → {Dname, DmgrSSN}}
- Some additional dependencies that we can infer from F are
o SSN → {Dname, DmgrSSN}
o SSN → SSN
o Dnumber → Dname
The most important type of constraint used to improve designs systematically is a unique-value
constraint called a functional dependency.
Title      Year  Length  Film type  Studio name  Star name
Star Wars  1997  124     Color      Fox          Carrie Fisher
Star Wars  1997  124     Color      Fox          Mark Hamill
Star Wars  1997  124     Color      Fox          Harrison Ford
For example, from the above relation we can derive the following functional dependency:
title, year → length, filmtype, studioname
This FD says that if two tuples have the same values in their title and year components, then these
two tuples must also have the same values in their length, filmtype, and studioname components.
title, year → starname doesn't hold, so it is not a functional dependency: given a movie,
it is entirely possible that more than one star for the movie is listed in our database.
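The check described above can be sketched in Python. This is only an illustration: the function name fd_holds and the representation of tuples as dictionaries are assumptions made here, not part of the text. Such a test can show that an instance violates an FD, but an FD that happens to hold in one instance is not thereby established for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


# The movie relation from the table above (attribute names are illustrative).
movies = [
    {"title": "Star Wars", "year": 1997, "length": 124, "filmtype": "Color",
     "studioname": "Fox", "starname": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1997, "length": 124, "filmtype": "Color",
     "studioname": "Fox", "starname": "Mark Hamill"},
    {"title": "Star Wars", "year": 1997, "length": 124, "filmtype": "Color",
     "studioname": "Fox", "starname": "Harrison Ford"},
]

print(fd_holds(movies, ["title", "year"], ["length", "filmtype", "studioname"]))  # True
print(fd_holds(movies, ["title", "year"], ["starname"]))  # False
```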
Keys of Relations
We say a set of one or more attributes {A1, A2, ..., An} is a key for a relation R if:
1. Those attributes functionally determine all other attributes of the relation (i.e., because
relations are sets, it is impossible for two distinct tuples of R to agree on all of A1, A2, ..., An).
2. No proper subset of {A1, A2, ..., An} functionally determines all other attributes of R, i.e.,
a key must be minimal.
An FD is an assertion about the schema of a relation, not about a particular instance.
E.g., title → film_type doesn't hold for the above relation schema, even though in this particular
instance of the relation movies it happens that any two tuples agreeing on title also agree on
film_type.
Functional Dependency analysis
Functional dependencies can be used to determine the keys of a relation. If a particular attribute or
set of attributes determines all other attributes of a relation, and no proper subset of it does, we
say that the attribute or set of attributes is a key.
Closure of Attributes
Given a relation, a set of functional dependencies, and a set of attributes A, the closure of A is
the set of all attributes B such that A → B. It is denoted by A+ or {A1, A2, ..., An}+.
- It is the entire set of attributes determined by the attributes A1, ..., An.
- To find the closure {A1, A2, ..., An}+:
- Start with the attribute set {A1, ..., An}.
- If the left-hand side of some functional dependency is contained in the current set, insert its
right-hand side attributes, e.g. if A → C D holds and A is in the set, insert C and D into the closure set.
- Repeat until no change.
Closure example
Given the relation with the following schema
Student(SSN,Sname,Address,Hscode,Hsname,Hscity,GPA,Priority) and if the following
functional dependencies hold
SSN → Sname, Address, GPA
GPA → Priority
Hscode → Hsname, Hscity
{SSN, Hscode}+ = {SSN, Hscode, Sname, Address, GPA, Priority, Hsname, Hscity}
If we find {SSN}+ only, it doesn't contain all the attributes, and hence SSN alone can't be a key. But
together SSN and Hscode can be a key because they determine all the attributes. In general, compute A+;
if it equals the set of all attributes and no proper subset of A has this property, then A is a key.
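The closure procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name closure and the representation of each FD as a (left-hand side, right-hand side) pair of sets are choices made here, not part of the text.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until no change
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already in the set, add the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


student_attrs = {"SSN", "Sname", "Address", "Hscode", "Hsname", "Hscity", "GPA", "Priority"}
student_fds = [
    ({"SSN"}, {"Sname", "Address", "GPA"}),
    ({"GPA"}, {"Priority"}),
    ({"Hscode"}, {"Hsname", "Hscity"}),
]

print(closure({"SSN", "Hscode"}, student_fds) == student_attrs)  # True
print(sorted(closure({"SSN"}, student_fds)))
# ['Address', 'GPA', 'Priority', 'SSN', 'Sname']
```

The second call shows why SSN alone is not a key: its closure misses the high-school attributes.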
Whether an attribute appears only on the left-hand side, only on the right-hand side, or on both
sides (the "middle") of the functional dependencies helps determine whether that attribute can be
part of a key, i.e.
- An attribute that appears only on the left-hand side (LHS) of the functional dependencies must be
part of every key
- An attribute that appears only on the right-hand side (RHS) of the functional dependencies cannot
be a key or part of a key
- An attribute that appears on both sides (middle) may or may not be a key or
part of a key
Examples
1. If a relation R(A,B,C) and a set of functional dependencies F = {A → B, B → C} is given, then we
have left, middle, and right hand side attributes as follows
L   M   R
A   B   C
We can find A+ = {A,B,C}; therefore, A is a key for this relation
2. If a relation R(A,B,C,D) and a set of functional dependencies F = {AB → C, C → B, C → D} is given,
then to determine the key of this relation we can put the attributes into the left, middle, and
right form and find the closure of each. If the closure of a particular attribute or set of
attributes contains all attributes, and no proper subset does, then it is a key.
L   M     R
A   B,C   D
Find closures
A+ = {A}   AB+ = {A,B,C,D}   AC+ = {A,C,B,D}
Therefore, A alone is not a key, and the keys for the given relation are {A,B} and {A,C}
3. If a relation R(A,B,C) and a set of functional dependencies F = {A → B, B → C, C → A} is given,
then to determine the key of this relation we can put the attributes into the left, middle, and
right form and find the closure of each. If the closure of a particular attribute or set of
attributes contains all attributes, and no proper subset does, then it is a key.
L   M       R
    A,B,C
Find closures
A+ = {A,B,C}   B+ = {B,C,A}   C+ = {C,A,B}
Therefore, each of A, B, and C alone is a key for the given relation
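The key search used in these examples can be automated by trying attribute sets in order of increasing size and keeping those whose closure is the whole relation. The Python sketch below is a brute-force illustration (exponential in the number of attributes, so suitable only for small schemas); the names closure and candidate_keys are assumptions made here.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is every attribute."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            # Skip supersets of a key already found: they are not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys


ex1 = candidate_keys("ABC", [({"A"}, {"B"}), ({"B"}, {"C"})])
ex2 = candidate_keys("ABCD", [({"A", "B"}, {"C"}), ({"C"}, {"B"}), ({"C"}, {"D"})])
ex3 = candidate_keys("ABC", [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"A"})])

print([sorted(k) for k in ex1])  # [['A']]
print([sorted(k) for k in ex2])  # [['A', 'B'], ['A', 'C']]
print([sorted(k) for k in ex3])  # [['A'], ['B'], ['C']]
```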
Minimal covers
A minimal cover is used to eliminate redundancy from a set of functional dependencies, both
extraneous attributes on left-hand sides and whole FDs that are implied by the others. It is a
way of revising FDs so that they are easier to work with, which is especially useful in very
complicated databases. Its aim is to find a minimal set of dependencies
in a standard or canonical form with no redundancies.
Steps to find minimal covers
1. Make every right-hand side a singleton (a single attribute)
2. Remove extraneous (unneeded) left-hand side attributes
3. Remove redundant functional dependencies
Example: if a relation R(ABCDE) and a set of functional dependencies F = {A → D, BC → AD,
C → B, E → A, E → D} is given, then find the minimal cover of F.
- Singleton RHS
F = {A → D, BC → A, BC → D, C → B, E → A, E → D}
- Find and remove any extraneous attributes from LHS
First let's try BC → A. To eliminate either B or C from the LHS we should find B+ and
C+. If, for example, B+ contains C, then we can eliminate C from the LHS, otherwise
we can't; the same is true for eliminating B from the LHS.
Now find B+ = {B}, so we can't eliminate C from the LHS; then find C+ = {C,B,A,D}
- Since the closure of C (C+) contains B, we can eliminate B
Therefore, the FD BC → A becomes C → A; make this change immediately in
the set of functional dependencies
Next, consider the FD BC → D; find B+ = {B}, so we can't eliminate C, and
C+ = {C,A,B,D}
- Since the closure of C (C+) contains B, we can eliminate B. Therefore, the FD BC → D
becomes C → D
- The given set of FDs after the extraneous attributes are removed becomes
F = {A → D, C → A, C → D, C → B, E → A, E → D}
- Now find whether there are any redundant FDs. To do this, check each functional dependency in
turn, computing the closure of its LHS using only the other FDs that remain.
- Check first A → D: find A+ without using A → D; if A+ contains
D, then A → D is redundant and we can remove it
Find A+ = {A}, which doesn't include D, so keep this FD
- Check C → A, find C+ = {C,D,B}, which doesn't include A, so keep this FD
- Check C → D, find C+ = {C,A,D,B}; since C+ contains D, we can eliminate the FD C → D
- Check C → B, find C+ = {C,A,D}, which doesn't include B, so keep this FD
- Check E → A, find E+ = {E,D}, which doesn't include A, so keep this FD
- Check E → D, find E+ = {E,A,D}; since E+ contains D, we can eliminate the FD E → D
Therefore, the minimal cover for F = {A → D, BC → AD, C → B, E → A, E → D} becomes
F = {A → D, C → A, C → B, E → A}
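The three steps can be sketched in Python as below. This is an illustrative sketch, not a definitive implementation: the names closure and minimal_cover and the pair-of-sets representation of FDs are assumptions made here. A set of FDs can have more than one minimal cover; this sketch processes the FDs in the order given, which reproduces the result of the worked example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split every FD so each right-hand side is a single attribute.
    cover = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset([a]))
            if fd not in cover:
                cover.append(fd)

    # Step 2: drop an LHS attribute when the remaining LHS still determines the RHS.
    for i in range(len(cover)):
        lhs, rhs = cover[i]
        for b in sorted(lhs):
            reduced = cover[i][0] - {b}
            if reduced and rhs <= closure(reduced, cover):
                cover[i] = (reduced, rhs)  # apply the change immediately
    cover = list(dict.fromkeys(cover))  # remove duplicates, keep order

    # Step 3: drop an FD when the remaining FDs still imply it.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


F = [({"A"}, {"D"}), ({"B", "C"}, {"A", "D"}), ({"C"}, {"B"}),
     ({"E"}, {"A"}), ({"E"}, {"D"})]
result = minimal_cover(F)
print(["".join(sorted(l)) + " -> " + "".join(sorted(r)) for l, r in result])
# ['A -> D', 'C -> A', 'C -> B', 'E -> A']
```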
Normalization and Normal Forms
Many problems arising from redundancy can be addressed by replacing a relation with a collection
of smaller relations. The normalization process takes a relation schema through a series of tests to
certify whether it satisfies a certain normal form.
Normalization is a process of analyzing a given relation schema based on its FDs and primary
keys to achieve the desirable properties of minimizing redundancy and minimizing the insertion,
deletion and update anomalies.
The normal form of a relation refers to the highest normal form condition that it meets.
First Normal Form (1NF)
It is now considered to be part of the formal definition of a relation in the basic relational model.
It states that the domain of an attribute must include only atomic (simple, indivisible) values and
the value of any attribute in a tuple must be a single value from the domain of that attribute.
- It was defined to disallow multivalued and composite attributes
Example
Consider a relational schema Department(Dno, Dname, Dmgrssn, Dlocation). The attribute
Dlocation may have more than one value; therefore, this relation is not in 1NF.
Department
Dno  Dname     Dmgrssn  Dlocation
5    Research  3344555  Bellaire, Stanford, Houston
4    Admin     9976543  Stanford
1    HQ        8886655  Houston
Therefore, it should be normalized to 1NF. There are different techniques to do this:
1. Expand the key so that there is a separate tuple in the original Department relation
for each location of a department, with the primary key {Dno, Dlocation}, but
this introduces redundancy.
Department
Dno  Dname     Dmgrssn  Dlocation
5    Research  3344555  Bellaire
5    Research  3344555  Stanford
5    Research  3344555  Houston
4    Admin     9976543  Stanford
1    HQ        8886655  Houston
Now this is in 1NF, but with redundancy
2. Remove the attribute Dlocation that violates 1NF and place it in a separate relation
Dept_location along with the primary key Dno of Department; the primary key for this new relation
is the combination {Dno, Dlocation}
Department
Dno  Dname     Dmgrssn
5    Research  3344555
4    Admin     9976543
1    HQ        8886655
Dept_location
Dno  Dlocation
5    Bellaire
5    Stanford
5    Houston
4    Stanford
1    Houston
Now this is in 1NF, and this method is generally the best way to normalize a relation to 1NF,
because it introduces neither redundancy nor null values
3. If a maximum number of values is known for the attribute, for example if each department
has at most three locations, replace the Dlocation attribute with three atomic attributes
Dlocation1, Dlocation2, Dlocation3
Department
Dno  Dname     Dmgrssn  Dlocation1  Dlocation2  Dlocation3
5    Research  3344555  Bellaire    Stanford    Houston
4    Admin     9976543  Stanford    NULL        NULL
1    HQ        8886655  Houston     NULL        NULL
This is now also in 1NF, but it introduces null values
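The second technique (moving the multivalued attribute into its own relation) can be sketched in Python. This is only an illustration: the function name split_multivalued and the use of a Python list to hold the non-atomic Dlocation values are assumptions made here.

```python
def split_multivalued(rows, key, multi_attr):
    """Split rows into a main relation without multi_attr and a detail
    relation holding one (key, value) tuple per value of multi_attr."""
    main_rows = []
    detail_rows = []
    for row in rows:
        main_rows.append({a: v for a, v in row.items() if a != multi_attr})
        for value in row[multi_attr]:
            detail_rows.append({key: row[key], multi_attr: value})
    return main_rows, detail_rows


# The non-1NF Department relation from the example.
department = [
    {"Dno": 5, "Dname": "Research", "Dmgrssn": "3344555",
     "Dlocation": ["Bellaire", "Stanford", "Houston"]},
    {"Dno": 4, "Dname": "Admin", "Dmgrssn": "9976543", "Dlocation": ["Stanford"]},
    {"Dno": 1, "Dname": "HQ", "Dmgrssn": "8886655", "Dlocation": ["Houston"]},
]

dept, dept_location = split_multivalued(department, "Dno", "Dlocation")
for row in dept_location:
    print(row["Dno"], row["Dlocation"])
```

Each department now appears once in dept, and dept_location has one atomic value per tuple with {Dno, Dlocation} as its key.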
Second Normal Form (2NF)
It is based on full functional dependency. A functional dependency X → Y is a full FD if
removal of any attribute A from X means that the dependency doesn't hold any more.
For example, consider the following FD:
{SSN, Pnumber} → Hours is a full dependency (neither SSN → Hours nor Pnumber → Hours
holds)
A functional dependency X → Y is a partial dependency if some attribute A ∈ X can be
removed from X and the dependency still holds.
For example, {SSN, Pnumber} → Ename is a partial dependency because
SSN → Ename holds.
A relational schema R is in second normal form if every nonprime attribute A in R is fully
functionally dependent on the primary key of R. For example, the schema Emp-Proj shown below,
with primary key {SSN, Pnumber}, is not in second normal form.
Emp-Proj(SSN, Pnumber, Hours, Ename, Pname, Plocation)
FD1: {SSN, Pnumber} → Hours
FD2: SSN → Ename
FD3: Pnumber → {Pname, Plocation}
Ename depends on SSN alone, and Pname and Plocation depend on Pnumber alone, so FD2 and FD3 make
these attributes partially dependent on the key. Emp-Proj can be normalized into a number of 2NF
relations in which nonprime attributes are associated only with the part of the primary key on
which they are fully functionally dependent.
Emp1(SSN, Pnumber, Hours)
Emp2(SSN, Ename)
Emp3(Pnumber, Pname, Plocation)
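The 2NF test can be sketched in Python by computing the closure of every proper subset of the primary key and reporting any nonprime attribute it determines. The sketch assumes the primary key is the only candidate key, so every attribute outside it is nonprime; the names closure and partial_dependencies are assumptions made here.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(attrs, key, fds):
    """Return (part_of_key, attribute) pairs where a nonprime attribute
    is determined by a proper subset of the primary key."""
    key = set(key)
    nonprime = set(attrs) - key  # assumes the primary key is the only key
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for combo in combinations(sorted(key), size):
            for a in sorted(closure(combo, fds) & nonprime):
                found.append((set(combo), a))
    return found


emp_proj_attrs = {"SSN", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
emp_proj_key = {"SSN", "Pnumber"}
emp_proj_fds = [
    ({"SSN", "Pnumber"}, {"Hours"}),
    ({"SSN"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

for part, attr in partial_dependencies(emp_proj_attrs, emp_proj_key, emp_proj_fds):
    print(sorted(part), "->", attr)
```

An empty result means no partial dependency was found, as is the case for Emp1(SSN, Pnumber, Hours) with its single FD.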
Third Normal Form (3NF)
It is based on the concept of transitive dependency. A functional dependency X → Y in a
relation schema R is a transitive dependency if there is a set of attributes Z that is neither
a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.
A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is
transitively dependent on the primary key.
Emp-Dept(SSN, Ename, Bdate, Address, Dnumber, Dname, DmgrSSN)
This is in 2NF since no partial dependency on a key exists, but it is not in 3NF because of the
transitive dependency of DmgrSSN and Dname on SSN via Dnumber.
SSN → DmgrSSN is transitive because both
SSN → Dnumber and Dnumber → DmgrSSN hold and Dnumber is neither a key nor a subset of
the key.
Emp-Dept can be decomposed into the following two 3NF relations:
Emp_Dept1(SSN, Ename, Bdate, Address, Dnumber)
Emp_Dept2(Dnumber, Dname, DmgrSSN)
Generally, we want to design our relation schemas so that they have neither partial nor transitive
dependencies, because these cause update anomalies.
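The transitive-dependency test from this section can be sketched in Python. The sketch is a simplification under stated assumptions: the primary key is taken to be the only candidate key, only the FDs listed explicitly are examined as the intermediate step Z → Y, and the names closure and transitive_dependencies are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(attrs, key, fds):
    """Return (Z, attribute) pairs where a nonprime attribute depends on
    the primary key through Z, a set that is neither a superkey nor part of the key."""
    attrs, key = set(attrs), set(key)
    nonprime = attrs - key  # assumes the primary key is the only key
    found = []
    for z, rhs in fds:
        if closure(z, fds) == attrs or z <= key:
            continue  # Z is a superkey or a subset of the key: not transitive
        for a in sorted((rhs - z) & nonprime):
            found.append((set(z), a))
    return found


emp_dept_attrs = {"SSN", "Ename", "Bdate", "Address", "Dnumber", "Dname", "DmgrSSN"}
emp_dept_fds = [
    ({"SSN"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "DmgrSSN"}),
]

for z, attr in transitive_dependencies(emp_dept_attrs, {"SSN"}, emp_dept_fds):
    print("SSN ->", sorted(z), "->", attr)
```

Running the same function on Emp_Dept1 and Emp_Dept2 with their own keys and FDs yields an empty list, which is consistent with both being in 3NF.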
Boyce-Codd Normal Form (BCNF)
- Every relation in BCNF is also in 3NF
- Boyce-Codd normal form states that a relation schema R is in BCNF if, whenever a nontrivial
functional dependency X → A holds in R, X is a superkey of R.
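The BCNF condition can be checked with attribute closures: a left-hand side X is a superkey exactly when X+ contains every attribute. The Python sketch below is illustrative; the names closure and bcnf_violations are assumptions made here, and the check covers the relation on which the FDs are stated, not relations obtained by decomposition (those would need the FDs projected onto them first). The example reuses R(A,B,C,D) with F = {AB → C, C → B, C → D} from the key-finding examples.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the nontrivial FDs whose left-hand side is not a superkey."""
    attrs = set(attrs)
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        if not trivial and closure(lhs, fds) != attrs:
            violations.append((lhs, rhs))
    return violations


r_attrs = {"A", "B", "C", "D"}
r_fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"}), ({"C"}, {"D"})]

for lhs, rhs in bcnf_violations(r_attrs, r_fds):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# C -> B
# C -> D
```

AB → C is not reported because AB+ = {A,B,C,D}, so AB is a superkey; C+ = {C,B,D} lacks A, so both FDs with C on the left violate BCNF.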