Dbms Unit 3

Uploaded by

benitaroy
lOMoARcPSD|45799663

DBMS UNIT-3

Value Added Course (Anna University)

Downloaded by Benita Roy ([email protected])
UNIT-3
SYLLABUS:
Database Design Theory: Functional Dependencies, Normal Forms Based on Primary Keys,
Second and Third Normal Forms, Boyce-Codd Normal Form, Multivalued Dependencies and
Fourth Normal Form, Join Dependencies and Fifth Normal Form.

Database Design Theory:


Database Normalization is a technique of organizing the data in the database. Normalization
is a systematic approach of decomposing tables to eliminate data redundancy and undesirable
characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that
puts data into tabular form by removing duplicated data from the relation tables. This module
discusses the basic and higher normal forms.
Introduction to DB design
Each relation schema consists of a number of attributes, and the relational database schema
consists of a number of relation schemas. So far, we have assumed that attributes are grouped
to form a relation schema by using the common sense of the database designer or by mapping
a database schema design from a conceptual data model such as the ER or Enhanced-ER
(EER) data model. These models make the designer identify entity types and relationship
types and their respective attributes, which leads to a natural and logical grouping of the
attributes into relations.
Database Design
There are two levels at which we can discuss the goodness of relation schemas:
1. The logical (or conceptual) level: how users interpret the relation schemas and the
meaning of their attributes.
2. The implementation (or physical storage) level: how the tuples in a base relation are
stored and updated. This level applies only to schemas of base relations.
An Example
• STUDENT relation with attributes: studName, rollNo, gender, studDept
• DEPARTMENT relation with attributes: deptName, officePhone, hod
• Several students belong to a department
• studDept gives the name of the student's department
Correct schema:
STUDENT(studName, rollNo, gender, studDept)
DEPARTMENT(deptName, officePhone, hod)

Incorrect schema:
STUDDEPT(studName, rollNo, gender, deptName, officePhone, hod)

Problems with bad schema

• Redundant storage of data:
- Office phone and HOD information is stored redundantly, once with each student that
belongs to the department
- wastage of disk space
• A program that updates the office phone of a department
- must change it in several places
- takes more running time
- is error-prone
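
The update problem can be illustrated with a small sketch. The rows, names and phone numbers below are invented for illustration, and the two helper functions are hypothetical; the point is only that the combined schema forces one logical change to touch several rows.

```python
# Hypothetical rows for the combined (bad) schema; all values are invented.
studdept = [
    {"rollNo": 1, "studName": "Asha", "deptName": "CSE", "officePhone": "1111", "hod": "Rao"},
    {"rollNo": 2, "studName": "Bala", "deptName": "CSE", "officePhone": "1111", "hod": "Rao"},
    {"rollNo": 3, "studName": "Chitra", "deptName": "ECE", "officePhone": "2222", "hod": "Iyer"},
]

# Separate DEPARTMENT data, keyed by deptName (the good schema).
department = {
    "CSE": {"officePhone": "1111", "hod": "Rao"},
    "ECE": {"officePhone": "2222", "hod": "Iyer"},
}

def update_phone_bad(rows, dept, phone):
    """Change the phone in every student row of the department; return rows touched."""
    touched = 0
    for row in rows:
        if row["deptName"] == dept:
            row["officePhone"] = phone
            touched += 1
    return touched

def update_phone_good(depts, dept, phone):
    """Change the phone in the single department entry; return rows touched."""
    depts[dept]["officePhone"] = phone
    return 1

bad_touched = update_phone_bad(studdept, "CSE", "9999")
good_touched = update_phone_good(department, "CSE", "9999")
print(bad_touched, good_touched)
```

If the loop in the bad design skipped even one row, the same department would end up with two different phone numbers, which is the inconsistency the separate DEPARTMENT relation avoids.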

2.2 Functional Dependencies

• Functional dependencies are a formal tool for the analysis of relational schemas that
enables us to detect and describe some of the problems in precise terms.
Definition of Functional Dependency

• A functional dependency is a constraint between two sets of attributes from the
database.
• Given a relation R, a set of attributes X in R is said to functionally determine another
attribute Y, also in R (written X→Y), if and only if each X value is associated with at
most one Y value; that is, any two tuples that agree on X must also agree on Y.
• X is the determinant set and Y is the dependent attribute. Thus, given a tuple and the
values of the attributes in X, one can determine the corresponding value of the Y
attribute.
• The abbreviation for functional dependency is FD or f.d. The set of attributes X is
called the left-hand side of the FD, and Y is called the right-hand side.
• A functional dependency is a property of the semantics or meaning of the attributes.
The database designers will use their understanding of the semantics of the attributes
of R to specify the functional dependencies that should hold on all relation states
(extensions) r of R.
Consider the relation schema EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation).

From the semantics of the attributes and the relation, we know that the following functional
dependencies should hold:
a) Ssn → Ename
b) Pnumber → {Pname, Plocation}
c) {Ssn, Pnumber} → Hours
• These functional dependencies specify that
a) The value of an employee's Social Security number (Ssn) uniquely
determines the employee name (Ename),
b) The value of a project's number (Pnumber) uniquely determines the project
name (Pname) and location (Plocation), and
c) A combination of Ssn and Pnumber values uniquely determines the
number of hours the employee currently works on the project per week
(Hours).
• Alternatively, we say that Ename is functionally determined by (or functionally
dependent on) Ssn, or given a value of Ssn, we know the value of Ename, and so on.
• Relation extensions r(R) that satisfy the functional dependency constraints are called
legal relation states (or legal extensions) of R.
• A functional dependency is a property of the relation schema R, not of a particular
legal relation state r of R.
• Therefore, a functional dependency cannot be inferred automatically from a given
relation extension r but must be defined explicitly by someone who knows the
semantics of the attributes of R.
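
A minimal sketch of what it means for a relation state to satisfy an FD is given below. The function name fd_holds and the sample tuples are assumptions made for illustration. In line with the last point above, such a check can only show that a given state violates an FD; a state that passes does not prove the FD holds for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical EMP_PROJ state; every value here is invented.
emp_proj = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 10, "Ename": "Smith", "Pname": "X", "Plocation": "Chennai"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 5, "Ename": "Smith", "Pname": "Y", "Plocation": "Madurai"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 20, "Ename": "Wong", "Pname": "X", "Plocation": "Chennai"},
]

ssn_ename = fd_holds(emp_proj, ["Ssn"], ["Ename"])
pnumber_proj = fd_holds(emp_proj, ["Pnumber"], ["Pname", "Plocation"])
key_hours = fd_holds(emp_proj, ["Ssn", "Pnumber"], ["Hours"])
ssn_hours = fd_holds(emp_proj, ["Ssn"], ["Hours"])  # Ssn 111 appears with 10 and 5
print(ssn_ename, pnumber_proj, key_hours, ssn_hours)
```

The three FDs listed above hold in this sample state, while Ssn → Hours does not, because the same employee works different hours on different projects.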


EXAMPLE:

Diagrammatic notation for displaying Functional Dependencies

• Each functional dependency is displayed as a horizontal line.
• The left-hand-side attributes of the FD are connected by vertical lines to the line
representing the FD.
• The right-hand-side attributes are connected by lines with arrows pointing toward
the attributes.

Fig: diagrammatic notation for displaying FDs

INFERENCE RULES:

EXAMPLE: Given the FDs for relation R{A,B,C,D,E,F}, find the closure of the FD set by applying
Armstrong's axioms.
A->B, A->C, CD->E, CD->F, B->E.
Solution:


EXAMPLE: Compute the closure of the following set F of functional dependencies for
relational schema R = (A,B,C,D,E): A->BC, CD->E, B->D, E->A.


EXAMPLE: State Armstrong's axioms and use them to find the closure of the following FD
set.
A->B, AB->C, D->AC, D->E
Solution:

EXAMPLE: R = {A,B,C,D,E,F} and the FDs are A->BC, E->CF, B->E, CD->EF. Compute
the closure {A,B}+.
Solution:

EXAMPLE: Consider schema EMPLOYEE(E-ID, E-NAME, E-CITY, E-STATE) and
FD = {E-ID->E-NAME, E-ID->E-CITY, E-ID->E-STATE, E-CITY->E-STATE}
1) Find the attribute closure (E-ID)+
2) Find (E-NAME)+
Solution:

KEYS AND FUNCTIONAL DEPENDENCIES:


PROPERTIES OF FUNCTIONAL DEPENDENCIES:
1) Reflexive: If Y⊆X then X → Y is a Reflexive Functional Dependency.
Ex: AB→A , A⊆AB holds. Therefore AB→A is a Reflexive Functional Dependency.
2) Augmentation: If X→Y is a functional dependency then by augmentation, XZ→YZ is
also a functional dependency.
3) Transitivity: If X→Y and Y→Z are two functional dependencies then by transitivity,
X→Z is also a functional dependency.
4) Union: If X→Y and X→Z are two functional dependencies then, X→YZ is also a
functional dependency.
5) Decomposition: If X→YZ is a functional dependency then X→Y and X→Z are also
functional dependencies.
CLOSURE SET OF A FUNCTIONAL DEPENDENCY (F+)
It is the set of all functional dependencies that can be inferred from the given set F of
dependencies. It is denoted by F+.
Attribute Closure (X+): It is the set of all attributes that are functionally determined by X
under F. It is denoted by X+, where X is any set of attributes.
Example:
R(A,B,C) F:{A→B , B→C}
A+={A,B,C} B+={B,C} C+={C}
AB+={A,B,C} AC+={A,C,B} BC+={B,C} ABC+={A,B,C}
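
The attribute closures above can be computed mechanically. The following is a minimal sketch, assuming attributes are single letters and each FD is written as a (left side, right side) pair of strings; the function name closure is an illustrative choice.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# R(A,B,C) with F: {A→B, B→C}, as in the example above.
F = [("A", "B"), ("B", "C")]
a_plus = closure("A", F)
b_plus = closure("B", F)
c_plus = closure("C", F)

# The earlier exercise: A->BC, E->CF, B->E, CD->EF, closure of {A,B}.
F2 = [("A", "BC"), ("E", "CF"), ("B", "E"), ("CD", "EF")]
ab_plus = closure("AB", F2)
print(sorted(a_plus), sorted(b_plus), sorted(c_plus), sorted(ab_plus))
```

For the second FD set the loop adds C, then E, then F, and never adds D, so under these assumptions {A,B}+ = {A,B,C,E,F}.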


Identifying keys in the given relation based on Functional Dependencies associated with
it
X+ is the set of attributes that can be determined using the given set X of attributes.
• If X+ contains all the attributes of a relation, then X is called a "super key" of that relation.
• If X is a super key and no proper subset of X is a super key (that is, X is minimal), then X is
called a "candidate key" of that relation.
If no single attribute has a closure that contains all the attributes, start from the independent
attributes of the relation, i.e., the attributes that do not appear on the R.H.S. of any dependency.
These attributes must be part of every key.
If the closure of the independent attributes contains all the attributes, then that set is a
candidate key. If it does not, try to find a key by adding dependent attributes one at a time.
If no key is found this way, add groups of dependent attributes until a key of the relation is found.
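
For small relations the search for keys can also be done by brute force. The sketch below is one possible way, not the procedure described above: it tries attribute sets in increasing size and keeps those whose closure is the whole relation. Single-letter attributes and the function names are assumptions; the FD set is the one from the earlier exercise on R = (A,B,C,D,E).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is every attribute."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            cand = set(combo)
            # A set containing a smaller key is a super key but not minimal.
            if any(k <= cand for k in keys):
                continue
            if closure(cand, fds) == all_attrs:
                keys.append(cand)
    return keys

F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
keys = candidate_keys("ABCDE", F)
key_names = sorted("".join(sorted(k)) for k in keys)
print(key_names)
```

This enumeration is exponential in the number of attributes, so it is only practical for textbook-sized schemas.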
PROPERTIES OF RELATIONAL DECOMPOSITION:


LOSS-LESS JOIN:


EXAMPLE: Consider the relation R(A, B, C, D) with the FD A->BC, decomposed into
R1(A, B, C) and R2(A, D). Check whether the decomposition is a lossless join
decomposition.
Solution:
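
One way to check this example is the standard test for a decomposition into two relations: the decomposition is lossless if the common attributes functionally determine all of R1 or all of R2. The sketch below assumes single-letter attributes; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2)+ must cover R1 or R2."""
    common = set(r1) & set(r2)
    common_closure = closure(common, fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

F = [("A", "BC")]
lossless = is_lossless_binary("ABC", "AD", F)  # common attribute A, A+ = {A,B,C}
lossy = is_lossless_binary("AB", "BCD", F)     # common attribute B, B+ = {B}
print(lossless, lossy)
```

Here R1 ∩ R2 = {A} and A+ = {A, B, C}, which covers R1, so the decomposition into R1(A, B, C) and R2(A, D) passes the test. The second call is an invented contrast case that fails it.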

Normal forms based on Primary Keys


NORMAL FORMS:

We assume that
• a set of functional dependencies is given for each relation, and
• each relation has a designated primary key.
• This information, combined with the tests (conditions) for normal forms, drives the
normalization process for relational schema design.
• The general definitions of the first three normal forms take into account all candidate keys
of a relation rather than only the primary key.
Normalization of Relations
• The normalization process, as first proposed by Codd (1972a), takes a relation schema
through a series of tests to certify whether it satisfies a certain normal form.
• Initially, Codd proposed three normal forms, which he called first, second, and third
normal form.
• All these normal forms are based on a single analytical tool: the functional
dependencies among the attributes of a relation.
• A fourth normal form (4NF) and a fifth normal form (5NF) were proposed, based on
the concepts of multivalued dependencies and join dependencies, respectively.
• Normalization of data can be considered a process of analyzing the given relation
schemas based on their FDs and primary keys to achieve the desirable properties of
1) minimizing redundancy and
2) minimizing the insertion, deletion, and update anomalies.
• It can be considered a "filtering" or "purification" process that makes the design have
successively better quality.
• Unsatisfactory relation schemas that do not meet certain conditions (the normal form
tests) are decomposed into smaller relation schemas that meet the tests and hence
possess the desirable properties.
• Thus, the normalization procedure provides database designers with the following:
- A formal framework for analyzing relation schemas based on their keys
and on the functional dependencies among their attributes.
- A series of normal form tests that can be carried out on individual relation
schemas so that the relational database can be normalized to any desired
degree.
• Definition: The normal form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to which it has been
normalized.
Practical Use of Normal Forms
• Normalization is carried out in practice so that the resulting designs are of high
quality and meet the desirable properties.
• Database design as practiced in industry today pays particular attention to
normalization only up to 3NF, BCNF, or at most 4NF.
• The database designers need not normalize to the highest possible normal form.
• Relations may be left in a lower normalization status, such as 2NF, for performance
reasons.
• Definition: Denormalization is the process of storing the join of higher normal form
relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
• Superkey: a set of attributes S of R that specifies a uniqueness constraint: no two distinct
tuples in any state r of R can have the same value for S.
• Key: a key K is a superkey with the additional property that removal of any attribute from K
will cause K not to be a superkey any more.
Example:
• The attribute set {Ssn} is a key because no two employee tuples can have the same
value for Ssn.
• Any set of attributes that includes Ssn, for example {Ssn, Name, Address}, is a superkey.
• If a relation schema has more than one key, each is called a candidate key.
• One of the candidate keys is arbitrarily designated to be the primary key, and the
others are called secondary keys.
• In a practical relational database, each relation schema must have a primary key.
• If no candidate key is known for a relation, the entire relation can be treated as a
default superkey.
• For example, {Ssn} is the only candidate key for EMPLOYEE, so it is also the
primary key.
• Definition: An attribute of relation schema R is called a prime attribute of R if it is a
member of some candidate key of R. An attribute is called nonprime if it is not a
prime attribute, that is, if it is not a member of any candidate key.

• In the WORKS_ON relation, both Ssn and Pnumber are prime attributes, whereas the other
attributes are nonprime.
First Normal Form
• Defined to disallow multivalued attributes, composite attributes, and their
combinations.
• It states that the domain of an attribute must include only atomic (simple, indivisible)
values and that the value of any attribute in a tuple must be a single value from the
domain of that attribute.
• 1NF disallows relations within relations or relations as attribute values within tuples.
• The only attribute values permitted by 1NF are single atomic (or indivisible) values.
Consider the DEPARTMENT relation schema shown in the figure below.

• The primary key is Dnumber.
• We assume that each department can have a number of locations.
• The DEPARTMENT schema and a sample relation state are shown in the figure below.

• As we can see, this is not in 1NF because Dlocations is not an atomic attribute, as
illustrated by the first tuple in the figure.
• There are two ways we can look at the Dlocations attribute:
- The domain of Dlocations contains atomic values, but some tuples can have a
set of these values. In this case, Dlocations is not functionally dependent on
the primary key Dnumber.
- The domain of Dlocations contains sets of values and hence is nonatomic. In
this case, Dnumber → Dlocations, because each set is considered a single member of
the attribute domain.
• In either case, the DEPARTMENT relation is not in 1NF.
There are three main techniques to achieve first normal form for such a relation:
1) Remove the attribute Dlocations that violates 1NF and place it in a separate relation
DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT. The
primary key of this relation is the combination {Dnumber, Dlocation}. A distinct
tuple in DEPT_LOCATIONS exists for each location of a department. This
decomposes the non-1NF relation into two 1NF relations.
2) Expand the key so that there will be a separate tuple in the original DEPARTMENT
relation for each location of a DEPARTMENT. In this case, the primary key becomes
the combination {Dnumber, Dlocation}. This solution has the disadvantage of
introducing redundancy in the relation.
3) If a maximum number of values is known for the attribute (for example, if it is known
that at most three locations can exist for a department), replace the Dlocations attribute
by three atomic attributes: Dlocation1, Dlocation2, and Dlocation3. This solution
has the disadvantage of introducing NULL values if most departments have fewer
than three locations. Querying on this attribute also becomes more difficult;
for example, consider how you would write the query "List the departments that have
Bellaire as one of their locations" in this design.
• Of the three solutions, the first is generally considered best because it does not suffer
from redundancy and it is completely general, having no limit placed on a maximum
number of values.
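
The first technique can be sketched as follows. The Dname attribute, the department names and the city names are assumptions added for illustration, since the original figure is not reproduced here; only Dnumber, Dlocations, Dlocation and DEPT_LOCATIONS come from the text.

```python
# Non-1NF state: Dlocations holds a set of values per tuple.
# Dname and all sample values are invented for illustration.
department = [
    {"Dnumber": 5, "Dname": "Research", "Dlocations": {"Chennai", "Madurai", "Salem"}},
    {"Dnumber": 4, "Dname": "Administration", "Dlocations": {"Chennai"}},
]

def to_1nf(rows):
    """Split into DEPARTMENT (without Dlocations) and DEPT_LOCATIONS."""
    dept = [{k: v for k, v in row.items() if k != "Dlocations"} for row in rows]
    dept_locations = [
        {"Dnumber": row["Dnumber"], "Dlocation": loc}
        for row in rows
        for loc in sorted(row["Dlocations"])
    ]
    return dept, dept_locations

dept, dept_locations = to_1nf(department)

# "Departments that have a given location" is a plain selection in this design.
in_madurai = sorted(r["Dnumber"] for r in dept_locations if r["Dlocation"] == "Madurai")
print(len(dept_locations), in_madurai)
```

Each (Dnumber, Dlocation) pair appears once in DEPT_LOCATIONS, and there is no limit on the number of locations per department.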
• First normal form also disallows multivalued attributes that are themselves composite.
• These are called nested relations because each tuple can have a relation within it.

• The figure above shows how the EMP_PROJ relation could appear if nesting is allowed.
Each tuple represents an employee entity, and a relation PROJS(Pnumber, Hours)
within each tuple represents the employee's projects and the hours per week that the
employee works on each project.
• The schema of this EMP_PROJ relation can be represented as follows:
EMP_PROJ(Ssn, Ename, {PROJS(Pnumber, Hours)})
• Ssn is the primary key of the EMP_PROJ relation and Pnumber is the partial key of
the nested relation; that is, within each tuple, the nested relation must have unique
values of Pnumber.
• To normalize this into 1NF, we remove the nested relation attributes into a new
relation and propagate the primary key into it; the primary key of the new relation will
combine the partial key with the primary key of the original relation.
• Decomposition and primary key propagation yield the schemas EMP_PROJ1 and
EMP_PROJ2.
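
The unnesting step can be sketched as below. The sample tuples are invented, and the attribute lists assumed for the results, EMP_PROJ1(Ssn, Ename) and EMP_PROJ2(Ssn, Pnumber, Hours), follow from the description above rather than from a reproduced figure.

```python
# Nested EMP_PROJ state; every value is invented for illustration.
emp_proj = [
    {"Ssn": "111", "Ename": "Smith",
     "PROJS": [{"Pnumber": 1, "Hours": 32.5}, {"Pnumber": 2, "Hours": 7.5}]},
    {"Ssn": "222", "Ename": "Wong",
     "PROJS": [{"Pnumber": 2, "Hours": 10.0}]},
]

def unnest(rows):
    """Move the nested PROJS relation out and propagate Ssn into it."""
    emp_proj1 = [{"Ssn": r["Ssn"], "Ename": r["Ename"]} for r in rows]
    emp_proj2 = [{"Ssn": r["Ssn"], **p} for r in rows for p in r["PROJS"]]
    return emp_proj1, emp_proj2

emp_proj1, emp_proj2 = unnest(emp_proj)

# The key of EMP_PROJ2 combines the propagated Ssn with the partial key Pnumber.
keys2 = [(r["Ssn"], r["Pnumber"]) for r in emp_proj2]
print(len(emp_proj1), len(emp_proj2))
```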

EXAMPLE:


Second Normal Form


EXAMPLE:
Study the relation given below and state what level of normalization can be achieved and
normalize it up to that level.

Solution:


Third Normal Form


EXAMPLE:
A software contract and consultancy firm maintains details of all the various projects in
which its employees are currently involved. These details comprise:
• Employee number
• Employee name
• Date of birth
• Department code
• Department name
• Project code
• Project description
• Project supervisor
Assume the following:
• Each employee number is unique.
• Each department has a single department code.
• Each project has a single code and supervisor.
• Each employee may work on one or more projects.
• Employee names need not necessarily be unique.
• Project code, project supervisor and project description are repeating fields.
Normalize this data to third normal form.
Solution:


EXAMPLE:
What is normalization? Normalize the STUDENT relation given below up to 3NF.


EXAMPLE:
What is the need for normalization? Consider the relation: Emp-proj = {ssn, Pnumber,
Hours, Ename, Pname, Plocation}

Solution:


EXAMPLE:
Normalize the relation below up to 3NF.


EXAMPLE:
Consider the relation R(ABC) with the following FDs: A->B, B->C and C->A. What is the
normal form of R?
Solution:
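
One way to approach this example is to test the standard BCNF condition, which is stated here as an assumption because this section does not spell it out: every nontrivial FD X->Y must have a superkey X on its left side. The sketch below assumes single-letter attributes and uses illustrative function names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose left side is not a superkey."""
    all_attrs = set(attributes)
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and closure(lhs, fds) != all_attrs:
            bad.append((lhs, rhs))
    return bad

F = [("A", "B"), ("B", "C"), ("C", "A")]
violations = bcnf_violations("ABC", F)

# Invented contrast case: without C->A, B is no longer a key.
F_chain = [("A", "B"), ("B", "C")]
chain_violations = bcnf_violations("ABC", F_chain)
print(violations, chain_violations)
```

With A->B, B->C and C->A, each of A, B and C has closure {A, B, C}, so each is a candidate key and no FD violates the condition. Under this definition R is in BCNF, and therefore also in 3NF and 2NF.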


EXAMPLE:
A college maintains details of its lecturers' subject-area skills. These details comprise:
• Lecturer number
• Lecturer name
• Lecturer grade
• Department code
• Department name
• Subject code
• Subject name
• Subject level
Assume that each lecturer may teach many subjects but may not belong to more than one
department.
Subject code, Subject name, Subject level are repeating fields.
Normalize this data to third normal form.
Solution:


EXAMPLE: Prove that any relational schema with two attributes is in BCNF.

EXAMPLE:
Prove the statement “Every relation which is in BCNF is in 3NF but the converse is not
true.”
