r23 Dbms Unit 2 - Database Design

The document discusses database normalization, a process aimed at minimizing redundancy and dependency in database design to enhance efficiency and data integrity. It outlines key concepts, properties, and types of functional dependencies, along with inference rules that help derive functional dependencies for effective normalization. Additionally, it explains various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF, DKNF) necessary for structuring a database schema properly.

Uploaded by

priyam3783

23IT204T – DATABASE MANAGEMENT SYSTEM

UNIT 2: DATABASE DESIGN

NORMALIZATION:

Normalization is a process in database design that aims to minimize redundancy and
undesirable dependencies by organizing the attributes (fields) and tables of a database
according to certain rules. The goal is a database structure that is efficient, with data
stored in a way that reduces the likelihood of update, insertion, and deletion anomalies.

Normalization typically involves breaking down a larger table into smaller tables and
defining relationships between them. The result is a schema with less redundant data, in
which the stored data is easier to keep consistent.

Key Concepts of Normalization:

1. Redundancy: The same data repeated in multiple places, which can cause inconsistencies.
2. Anomalies:
o Update Anomaly: Data is updated in one place but not in the other places that hold a copy.
o Insertion Anomaly: A fact cannot be added because other, unrelated information is missing.
o Deletion Anomaly: Deleting a record unintentionally removes other related data.

Normalization reduces these anomalies by organizing the data into multiple tables.

PROPERTIES OF NORMALIZATION:

The properties of normalization refer to the key characteristics and benefits of applying
normalization techniques in relational database design. These properties ensure that the data
structure is efficient, minimizes redundancy, and maintains consistency. Below are the key
properties:

1. Minimization of Data Redundancy

• Redundancy refers to the unnecessary repetition of data within the database.
• Normalization aims to eliminate data redundancy by dividing large tables into
smaller, more manageable ones.
• In a normalized database, each fact is ideally stored only once, reducing the chances
of errors and inconsistencies.

Example:

• Instead of storing the same department name for every employee, normalization stores
the department in a separate table and links employees to it via a foreign key.

2. Elimination of Insertion Anomalies

• An insertion anomaly occurs when a new fact cannot be inserted without also supplying
unrelated data, so that the row would otherwise hold null or incomplete information. For
example, a department with no employees cannot be recorded in a combined
employee-department table.
• After normalization, insertion anomalies are minimized because each table contains
only relevant, non-redundant data. For example, when adding a new employee, their
data is inserted without needing to repeat department details if the department already
exists.

3. Elimination of Update Anomalies

• An update anomaly arises when changing a single data value requires changing multiple
rows to maintain consistency.
• In a normalized database, each update is made in one place. For example, a department
name is changed once in the department table rather than in every row of the employee
table.
• This minimizes the risk of inconsistent data when updates occur.

4. Elimination of Deletion Anomalies

• A deletion anomaly occurs when deleting a record leads to unintended loss of other
data. For example, if an employee is the last person in a department and their record is
deleted, the department's information might also be lost.
• Normalization reduces deletion anomalies by separating data into appropriate tables.
If an employee record is deleted, the department's record remains intact.

5. Data Integrity

• Data integrity refers to the accuracy and consistency of the data within the database.
• Normalization helps maintain referential integrity by establishing relationships
between tables through primary keys and foreign keys, so that the database remains
consistent and accurate.
• Domain integrity, meaning that data values conform to defined data types and
constraints, is declared on the columns themselves; a normalized schema makes these
constraints easier to state because each attribute is stored in one place.

6. Efficient Use of Storage

• By removing redundancy, normalization generally reduces the storage the database needs.
• Normalization can lead to a more compact representation of data because it avoids
repeating the same values across multiple rows.

7. Improved Data Organization

• The process of normalization organizes data into logical groupings and establishes
clear relationships between entities, making the data easier to query and manage.
• This leads to better maintainability and understanding of the database schema,
especially as it grows over time.

FUNCTIONAL DEPENDENCY:

A functional dependency (FD) is a relationship between two sets of attributes in a relational
database. It describes how one attribute (or a set of attributes) determines another attribute
(or a set of attributes) within a table.

In simpler terms, X → Y (read as "X functionally determines Y") means that each value of X
is associated with exactly one value of Y: any two rows that agree on X must also agree on Y.

• X is called the determinant.
• Y is called the dependent.

Consider a relation Employee (Emp_ID, Emp_Name, Emp_Salary, Dept_ID). Here, we
can express the following functional dependencies:

• Emp_ID → Emp_Name: Emp_ID uniquely determines Emp_Name.
• Emp_ID → Emp_Salary: Emp_ID uniquely determines Emp_Salary.
• Emp_ID → Dept_ID: Emp_ID uniquely determines Dept_ID.
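
The definition can be tested against sample data. The following is a minimal Python sketch, not a definitive implementation: the function name fd_holds and the three sample rows are assumptions made for illustration. It checks whether any two rows that agree on X also agree on Y.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for Employee (Emp_ID, Emp_Name, Emp_Salary, Dept_ID)
employees = [
    {"Emp_ID": 1, "Emp_Name": "John", "Emp_Salary": 50000, "Dept_ID": "D01"},
    {"Emp_ID": 2, "Emp_Name": "Alice", "Emp_Salary": 60000, "Dept_ID": "D02"},
    {"Emp_ID": 3, "Emp_Name": "John", "Emp_Salary": 55000, "Dept_ID": "D01"},
]

emp_id_determines_name = fd_holds(employees, ["Emp_ID"], ["Emp_Name"])
name_determines_salary = fd_holds(employees, ["Emp_Name"], ["Emp_Salary"])
```

Here Emp_ID → Emp_Name holds on the sample, while Emp_Name → Emp_Salary does not, because two employees named John have different salaries. A sample can only refute a dependency: an FD is a rule about every valid state of the table, so it comes from the meaning of the data rather than from one set of rows.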

TYPES OF FUNCTIONAL DEPENDENCY:

1. Trivial Functional Dependency:
o A trivial functional dependency occurs when an attribute (or set of
attributes) determines itself or a subset of itself.
o Example:
▪ A → A (Any attribute determines itself)
▪ {A, B} → A (A combination of A and B determines A)

Trivial functional dependencies do not add meaningful information and are typically
ignored when performing normalization.

2. Non-Trivial Functional Dependency:
o A non-trivial functional dependency occurs when X → Y and Y is not a
subset of X.
o This is the most common type of functional dependency used in database
design.
o Example:
▪ Emp_ID → Emp_Name (Emp_ID determines Emp_Name, but
Emp_Name is not part of Emp_ID).

3. Transitive Functional Dependency:
o A transitive functional dependency occurs when X → Y and Y → Z hold,
implying that X → Z, while Y does not determine X.
o In other words, if X determines Y, and Y determines Z, then X transitively
determines Z.
o Example:
▪ Emp_ID → Dept_ID and Dept_ID → Dept_Name. Therefore, by
transitivity, Emp_ID → Dept_Name.

4. Composite (or Compound) Functional Dependency:
o A composite functional dependency occurs when the determinant is a combination of
attributes (a set of attributes) rather than a single attribute.
o Example:
▪ {Emp_ID, Dept_ID} → Emp_Name (The combination of Emp_ID
and Dept_ID determines Emp_Name).

5. Partial Functional Dependency:
o A partial functional dependency occurs when a non-prime attribute (non-key
attribute) is functionally dependent on only part of a composite (multi-attribute)
primary key.
o Partial dependencies should be removed during the normalization process to
ensure the table is in Second Normal Form (2NF).
o Example:
▪ In a table with a composite primary key {Emp_ID, Dept_ID}, if
Emp_ID → Emp_Name, this is a partial dependency because
Emp_Name depends only on Emp_ID, not the entire composite key.

6. Full Functional Dependency:
o A full functional dependency occurs when an attribute is functionally
dependent on the entire primary key (and not just part of it).
o This is the opposite of partial dependency and is a key aspect of Second
Normal Form (2NF).
o Example:
▪ In a table with a composite primary key {Emp_ID, Dept_ID}, if
{Emp_ID, Dept_ID} → Salary, then Salary is fully dependent on the
entire primary key.

7. Multivalued Dependency (MVD):
o A multivalued dependency, written X →→ Y, occurs when each value of X is associated
with a set of values of Y, and that set is independent of the other attributes in the table.
o It is a generalization of a functional dependency rather than a kind of functional
dependency, and it is handled in Fourth Normal Form (4NF).
o Example:
▪ If a Student_ID is associated with a set of Courses for the student, independently
of the student's other attributes, then Student_ID →→ Course is a multivalued
dependency.

8. Join Dependency:
o A join dependency holds when a table can always be reconstructed by joining a given
set of its projections (subsets of its columns) without losing information or gaining
spurious rows. It is the basis of Fifth Normal Form (5NF).
o Example:
▪ If a relation can be decomposed into multiple tables and then
recombined without losing any information, it is governed by a join
dependency.

Summary:

• Functional dependency defines the relationship between attributes in a table.
• Trivial Functional Dependency: An attribute set determines itself or a subset of itself.
• Non-Trivial Functional Dependency: The dependent is not a subset of the determinant.
• Transitive Dependency: A chain of dependencies (X → Y → Z).
• Composite Dependency: A combination of attributes determines another attribute.
• Partial Dependency: A part of the composite key determines a non-key attribute.
• Full Dependency: The entire primary key determines a non-key attribute.
• Multivalued Dependency: One attribute determines a set of values (written X →→ Y).
• Join Dependency: A table can be reconstructed from its projections.

INFERENCE RULES FOR FUNCTIONAL DEPENDENCIES (FDS)

Inference rules are a set of rules used to derive all the possible functional dependencies that
can be inferred from a given set of functional dependencies in a relational database. These
rules allow us to determine all the implied dependencies that hold true, helping in the process
of normalization and simplifying the design of relational schemas.

Here are the primary inference rules:

1. Reflexivity Rule (F1):

If Y is a subset of X, then X → Y holds.

• Formally: If Y ⊆ X, then X → Y.
• Explanation: If a set of attributes X includes another set of attributes Y, then X
trivially determines Y, because Y is contained within X.
• Example: {Emp_ID, Emp_Name} → Emp_ID holds because Emp_ID is a
subset of {Emp_ID, Emp_Name}.

2. Augmentation Rule (F2):

If X → Y, then XZ → YZ holds for any set of attributes Z.

• Formally: If X → Y, then for any set Z, XZ → YZ (where XZ denotes X ∪ Z).
• Explanation: Augmentation adds the same attributes to both sides of the
dependency. If X determines Y, then adding an extra set of attributes Z to both sides
preserves the dependency.
• Example: If Emp_ID → Emp_Name, then {Emp_ID, Dept_ID} → {Emp_Name,
Dept_ID}.

3. Transitivity Rule (F3):

If X → Y and Y → Z, then X → Z.

• Formally: If X → Y and Y → Z, then X → Z.
• Explanation: If X determines Y, and Y determines Z, then X must also determine Z.
This is a transitive relationship.
• Example: If Emp_ID → Dept_ID and Dept_ID → Dept_Name, then Emp_ID →
Dept_Name.

4. Union Rule (F4):

If X → Y and X → Z, then X → YZ.

• Formally: If X → Y and X → Z, then X → YZ.
• Explanation: If X determines Y and X determines Z, then X determines both Y and
Z together.
• Example: If Emp_ID → Emp_Name and Emp_ID → Emp_Salary, then Emp_ID
→ {Emp_Name, Emp_Salary}.

5. Decomposition Rule (F5):

If X → YZ, then X → Y and X → Z.

• Formally: If X → YZ, then X → Y and X → Z.
• Explanation: If X determines both Y and Z together, then X must also determine Y
and Z individually.
• Example: If Emp_ID → {Emp_Name, Emp_Salary}, then Emp_ID →
Emp_Name and Emp_ID → Emp_Salary.

6. Pseudo-Transitivity Rule (F6):

If X → Y and Y ∪ Z → W, then X ∪ Z → W.

• Formally: If X → Y and Y ∪ Z → W, then X ∪ Z → W.
• Explanation: This rule is a combination of transitivity and augmentation. It says that
if X determines Y, and Y ∪ Z determines W, then X ∪ Z must determine W.
• Example: If Emp_ID → Dept_ID and {Dept_ID, Emp_Salary} → Emp_Name,
then {Emp_ID, Emp_Salary} → Emp_Name.
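
The way this rule follows from augmentation and transitivity can be written out step by step. The following is a short derivation sketch using only rules F2 and F3 as stated above:

```latex
\begin{align*}
X &\to Y     && \text{(given)} \\
XZ &\to YZ   && \text{(augmentation, F2: add } Z \text{ to both sides)} \\
YZ &\to W    && \text{(given)} \\
XZ &\to W    && \text{(transitivity, F3, on the two previous lines)}
\end{align*}
```

In the example, X = Emp_ID, Y = Dept_ID, Z = Emp_Salary, and W = Emp_Name.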

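7. Armstrong's Axioms:

The first three rules (reflexivity, augmentation, and transitivity) are known as Armstrong's
Axioms, named after the computer scientist William W. Armstrong, who introduced them as a
formal system for deriving functional dependencies. They are sound and complete: every FD
derived with them holds, and every FD implied by a given set can be derived with them.

The union, decomposition, and pseudo-transitivity rules are additional rules that can be
derived from the three axioms. The full set of inference rules is therefore:

1. Reflexivity (axiom).
2. Augmentation (axiom).
3. Transitivity (axiom).
4. Union (derived).
5. Decomposition (derived).
6. Pseudo-Transitivity (derived).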

Usage of Inference Rules:

• These rules are applied to derive all the functional dependencies implied by a given
set of FDs.
• They are particularly useful in database normalization, where you need to infer
additional FDs that might not have been explicitly listed in the original set but are
necessary for achieving higher normal forms.

Example (Using Inference Rules):

Consider the following functional dependencies:

• A → B
• B → C

Using the Transitivity Rule (F3), we can infer that:

• A → C

Similarly, consider the FDs:

• Emp_ID → Emp_Name
• Emp_ID → Emp_Salary

Using the Union Rule (F4), we can infer:

• Emp_ID → {Emp_Name, Emp_Salary}

These inference rules are powerful tools in relational database theory for deriving
functional dependencies and checking the correctness of a database design.
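
In practice, the rules are usually applied through the attribute closure: the set of all attributes determined by a given set X. An FD X → Y is implied exactly when Y lies inside the closure of X. The following is a minimal Python sketch of the standard closure computation; the function name closure and the representation of an FD as a pair of sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already covered, its dependents are implied too
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# A -> B and B -> C, as in the example above
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
a_closure = closure({"A"}, fds)
a_implies_c = {"C"} <= a_closure

# Emp_ID -> Emp_Name and Emp_ID -> Emp_Salary
emp_fds = [({"Emp_ID"}, {"Emp_Name"}), ({"Emp_ID"}, {"Emp_Salary"})]
union_holds = {"Emp_Name", "Emp_Salary"} <= closure({"Emp_ID"}, emp_fds)
```

The closure of {A} is {A, B, C}, which confirms A → C, and the closure of {Emp_ID} contains both Emp_Name and Emp_Salary, which confirms the result of the union rule.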

VARIOUS NORMAL FORMS:

Normalization is the process of organizing data in a database to reduce redundancy and avoid
undesirable characteristics like insertion, update, and deletion anomalies. The concept of
normal forms (NF) helps in ensuring that the database schema is structured properly.

1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd Normal Form)
4NF (Fourth Normal Form)
5NF (Fifth Normal Form)
DKNF (Domain-Key Normal Form)

1. First Normal Form (1NF)

A relation is in First Normal Form (1NF) if:

• All attributes contain only atomic (indivisible) values.
• Each attribute contains values of a single type.
• Each record (row) is unique.

Example: Consider a Student table:

Student_ID   Name    Subjects
101          John    Math, Science
102          Alice   English, History

This table violates 1NF because the Subjects column contains multiple values (e.g., Math,
Science for John). To bring it into 1NF, we split the multi-valued column so that each row
holds one subject:

Student (1NF):

Student_ID   Name    Subject
101          John    Math
101          John    Science
102          Alice   English
102          Alice   History

Now, each attribute has atomic values, and the table satisfies 1NF. Each row is identified
by the combination {Student_ID, Subject}.
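
The split into atomic values can be sketched in Python. This is an illustrative sketch only; the function name to_1nf and the assumption that subjects are stored as a comma-separated string are not from the notes.

```python
# The unnormalized Student table, with Subjects held as a comma-separated string
students = [
    {"Student_ID": 101, "Name": "John", "Subjects": "Math, Science"},
    {"Student_ID": 102, "Name": "Alice", "Subjects": "English, History"},
]


def to_1nf(rows):
    """Emit one row per (student, subject) so that every value is atomic."""
    flat = []
    for row in rows:
        for subject in row["Subjects"].split(","):
            flat.append({
                "Student_ID": row["Student_ID"],
                "Name": row["Name"],
                "Subject": subject.strip(),
            })
    return flat


students_1nf = to_1nf(students)
```

The result has four rows, one per student and subject, matching the Student (1NF) table.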

2. Second Normal Form (2NF)

A relation is in Second Normal Form (2NF) if:

• It is in 1NF.
• There is no partial dependency (i.e., no non-prime attribute depends on a proper subset
of a candidate key).

Partial Dependency: A non-prime attribute depends on only part of a composite key.

Example:

Consider a Course Enrollment table:

Student_ID   Course_ID   Student_Name   Instructor
101          CSE101      John           Dr. Smith
101          CSE102      John           Dr. Johnson
102          CSE101      Alice          Dr. Smith

The composite primary key is {Student_ID, Course_ID}, but Student_Name depends only
on Student_ID, not on the entire composite key. Therefore, this violates 2NF because there
is a partial dependency.

To bring the table into 2NF, we split it into two tables:

Student Table:

Student_ID   Student_Name
101          John
102          Alice

Enrollment Table:

Student_ID   Course_ID   Instructor
101          CSE101      Dr. Smith
101          CSE102      Dr. Johnson
102          CSE101      Dr. Smith

Now the Student table is in 2NF. The Enrollment table is in 2NF provided that Instructor
depends on the whole key {Student_ID, Course_ID}. If each course has exactly one instructor
(Course_ID → Instructor, as the sample rows suggest), that is a further partial dependency,
and Instructor should move to a separate Course table keyed by Course_ID.
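
Partial dependencies can be found mechanically from a key and a list of FDs. The following is a minimal Python sketch, assuming that {Student_ID, Course_ID} is the only candidate key and that the two FDs listed are the ones that hold for the Course Enrollment table. The function names are illustrative and not taken from any library.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Pairs (part_of_key, attribute) where a proper subset of the key
    determines a non-prime attribute. Assumes key is the only candidate key."""
    non_prime = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & non_prime):
                found.append((part, attr))
    return found


attributes = {"Student_ID", "Course_ID", "Student_Name", "Instructor"}
key = {"Student_ID", "Course_ID"}
# Assumed FDs for the example table
fds = [
    ({"Student_ID"}, {"Student_Name"}),
    ({"Student_ID", "Course_ID"}, {"Instructor"}),
]
violations = partial_dependencies(key, attributes, fds)
```

With these assumed FDs, the only violation reported is Student_ID → Student_Name. Adding Course_ID → Instructor to the list would report a second one.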

3. Third Normal Form (3NF)

A relation is in Third Normal Form (3NF) if:

• It is in 2NF.
• There is no transitive dependency (i.e., non-prime attributes should not depend on
other non-prime attributes).

Example:

Consider the following Employee table:

Emp_ID   Emp_Name   Dept_ID   Dept_Name
101      John       D01       Sales
102      Alice      D02       Marketing
103      Bob        D01       Sales

Here, Dept_Name depends on Dept_ID, and Dept_ID depends on Emp_ID. This is a
transitive dependency: Emp_ID → Dept_ID → Dept_Name.

To bring the table into 3NF, we remove the transitive dependency by splitting the table:

Employee Table:

Emp_ID   Emp_Name   Dept_ID
101      John       D01
102      Alice      D02
103      Bob        D01

Department Table:

Dept_ID   Dept_Name
D01       Sales
D02       Marketing

Now, both tables are in 3NF.
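
A quick way to confirm that this split loses nothing is to project the original rows onto the two new tables and join them back on Dept_ID. The following is a small Python sketch using sets of tuples; the variable names are illustrative.

```python
# Original Employee table: (Emp_ID, Emp_Name, Dept_ID, Dept_Name)
employee = [
    (101, "John", "D01", "Sales"),
    (102, "Alice", "D02", "Marketing"),
    (103, "Bob", "D01", "Sales"),
]

# Projections that form the 3NF decomposition
employee_3nf = {(emp_id, name, dept_id) for emp_id, name, dept_id, _ in employee}
department = {(dept_id, dept_name) for _, _, dept_id, dept_name in employee}

# Natural join of the two projections on Dept_ID
rejoined = {
    (emp_id, name, dept_id, dept_name)
    for emp_id, name, dept_id in employee_3nf
    for other_dept_id, dept_name in department
    if dept_id == other_dept_id
}

is_lossless = rejoined == set(employee)
```

The join returns exactly the three original rows, and the name "Sales" is now stored once in the Department table instead of once per employee.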

4. Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd Normal Form (BCNF) if:

• It is in 3NF.
• Every determinant is a candidate key; more precisely, for every non-trivial functional
dependency X → Y, X is a superkey.

Example:

Consider the following Course table:

Course_ID   Instructor    Instructor_Room
CSE101      Dr. Smith     R101
CSE102      Dr. Johnson   R102

In this table, Course_ID is the key, and Instructor → Instructor_Room (i.e., the instructor
determines the room). Instructor is a determinant but not a candidate key because, assuming
an instructor may teach several courses, Instructor does not determine Course_ID. Hence,
this violates BCNF. (Because Instructor_Room is a non-prime attribute, the same dependency
is also a transitive dependency, so the table fails 3NF as well.)

To bring the table into BCNF, we split it into two tables:

Instructor Table:

Instructor    Instructor_Room
Dr. Smith     R101
Dr. Johnson   R102

Course Table:

Course_ID   Instructor
CSE101      Dr. Smith
CSE102      Dr. Johnson

Now, both tables are in BCNF.
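
The BCNF condition can be checked with attribute closures: a non-trivial FD violates BCNF when the closure of its determinant does not cover the whole relation. The following is a minimal Python sketch, assuming the FDs Course_ID → Instructor and Instructor → Instructor_Room for the Course example. The function names are illustrative, not from a library.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant is not a superkey of the relation."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != all_attrs
    ]


# Assumed FDs for Course (Course_ID, Instructor, Instructor_Room)
course_fds = [
    ({"Course_ID"}, {"Instructor"}),
    ({"Instructor"}, {"Instructor_Room"}),
]
course_violations = bcnf_violations(
    {"Course_ID", "Instructor", "Instructor_Room"}, course_fds
)

# After the split, each table is checked against the FDs that apply to it
instructor_violations = bcnf_violations(
    {"Instructor", "Instructor_Room"}, [({"Instructor"}, {"Instructor_Room"})]
)
course_table_violations = bcnf_violations(
    {"Course_ID", "Instructor"}, [({"Course_ID"}, {"Instructor"})]
)
```

The original table reports Instructor → Instructor_Room as a violation, and both tables of the decomposition report none. Checking only the listed FDs is enough for a relation taken as a whole; a table produced by decomposition should be checked against the FDs that project onto it, as done here.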

5. Fourth Normal Form (4NF)

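A relation is in Fourth Normal Form (4NF) if:

• It is in BCNF.
• It has no non-trivial multivalued dependency X →→ Y unless X is a superkey.

Example:

Consider the Student table:

Student_ID   Course   Hobby
101          CSE101   Basketball
101          CSE102   Chess
101          CSE103   Reading

Here, a student's courses and hobbies are independent of each other: Student_ID →→ Course
and Student_ID →→ Hobby. Student_ID is not a superkey, so these multivalued dependencies
violate 4NF. Keeping two independent facts in one table also means that, strictly, every
course would have to be paired with every hobby to represent them faithfully.

To bring this into 4NF, we split it into two tables:

Student_Course Table:

Student_ID   Course
101          CSE101
101          CSE102
101          CSE103

Student_Hobby Table:

Student_ID   Hobby
101          Basketball
101          Chess
101          Reading

Now, the database is in 4NF.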
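
The redundancy behind a multivalued dependency shows up when the two projections are joined back together. The following Python sketch is illustrative only and uses the three sample rows above: because courses and hobbies are independent, the join pairs every course with every hobby.

```python
# Sample rows of Student (Student_ID, Course, Hobby)
student = {
    (101, "CSE101", "Basketball"),
    (101, "CSE102", "Chess"),
    (101, "CSE103", "Reading"),
}

# The two 4NF projections
student_course = {(sid, course) for sid, course, _ in student}
student_hobby = {(sid, hobby) for sid, _, hobby in student}

# Natural join on Student_ID: every course is paired with every hobby
rejoined = {
    (sid, course, hobby)
    for sid, course in student_course
    for other_sid, hobby in student_hobby
    if sid == other_sid
}

combined_rows = len(rejoined)
separate_rows = len(student_course) + len(student_hobby)
```

The join has nine rows (3 courses × 3 hobbies), which is what a single table would need in order to record both independent facts without implying a link between a particular course and a particular hobby. The two 4NF tables hold the same information in six rows.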

6. Fifth Normal Form (5NF)

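A relation is in Fifth Normal Form (5NF) if:

• It is in 4NF.
• Every non-trivial join dependency is implied by the candidate keys, so the relation
cannot be split further into smaller tables without losing information.

Example:

Consider the table for Projects:

Project_ID   Employee_ID   Skill
P1           E1            Java
P1           E2            Python
P2           E1            C++

Suppose the table is decomposed into the following projections:

Project_Employee Table:

Project_ID   Employee_ID
P1           E1
P1           E2
P2           E1

Project_Skill Table:

Project_ID   Skill
P1           Java
P1           Python
P2           C++

Joining only these two tables on Project_ID is not lossless for this data: it produces the
spurious rows (P1, E1, Python) and (P1, E2, Java). The original table is recovered only when
a third projection, Employee_Skill (Employee_ID, Skill), is included in the join, and only if
the business rules guarantee that this three-way join dependency always holds. When it does,
the three projections are in 5NF.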
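
Whether a decomposition is lossless can be checked by joining the projections and comparing the result with the original rows. The following is a small Python sketch for the Projects table; the third projection, employee_skill, is an assumption added for illustration and is not one of the tables listed above.

```python
# Sample rows of Projects (Project_ID, Employee_ID, Skill)
project = {
    ("P1", "E1", "Java"),
    ("P1", "E2", "Python"),
    ("P2", "E1", "C++"),
}

project_employee = {(p, e) for p, e, _ in project}
project_skill = {(p, s) for p, _, s in project}
employee_skill = {(e, s) for _, e, s in project}  # assumed third projection

# Join of Project_Employee and Project_Skill on Project_ID
two_way = {
    (p, e, s)
    for p, e in project_employee
    for other_p, s in project_skill
    if p == other_p
}
spurious = two_way - project

# Further joining with Employee_Skill keeps only valid (employee, skill) pairs
three_way = {(p, e, s) for p, e, s in two_way if (e, s) in employee_skill}
```

For this data the two-table join yields five rows, two of them spurious: (P1, E1, Python) and (P1, E2, Java). The three-way join returns exactly the original three rows. A single instance only illustrates the idea; a join dependency is a rule that must hold for every valid state of the table.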

Domain-Key Normal Form (DKNF)

Domain-Key Normal Form (DKNF) is the highest level of normalization in relational
database design. A relation is in DKNF if every constraint on the relation is a logical
consequence of its domain constraints and key constraints. A relation in DKNF has no
insertion, deletion, or update anomalies.

To understand DKNF, we need to break down the concept:

1. Domain: Refers to the set of valid values that an attribute can take. For example, the
domain of the attribute age could be the set of positive integers.
2. Key: Refers to a set of attributes that uniquely identifies a record in a relation.

In DKNF, a relation is normalized such that:

• No Other Constraints: The only constraints are domain constraints and key constraints.
No functional dependency or multivalued dependency exists that is not already a
consequence of the keys, so none can give rise to insertion, deletion, or update anomalies.
• No Hidden Dependencies: There are no hidden dependencies between attributes
other than those defined by the keys. All constraints (including integrity
constraints and functional dependencies) are enforced simply by enforcing the domains
and the keys.

Thus, the key idea of DKNF is that a table is free from all types of modification anomalies
and all constraints that define relationships between attributes follow from either
the domains or the keys.

Key Points of DKNF:

1. No Functional Dependencies Other Than Those Implied by Keys: A relation in DKNF does
not have any functional dependencies other than those that follow from its keys. Any
other dependency implies that the relation is not in DKNF.
2. No Multivalued Dependencies: A relation in DKNF must also not have any non-trivial
multivalued dependencies beyond those implied by its keys, as these lead to redundancy
in the database.
3. Constraints Are Explicit: All constraints (such as functional dependencies) are
enforced by either the domains of the attributes or the keys. The structure of the table
should be such that no additional business logic is needed to enforce them.

Example of Domain-Key Normal Form (DKNF)

Consider a Student table:

Student_ID   Name    Course_ID   Instructor
101          John    CSE101      Dr. Smith
102          Alice   CSE102      Dr. Johnson
103          Bob     CSE103      Dr. Smith

In this table:

• Student_ID is the primary key, and it uniquely identifies each row.
• The domains of Name, Course_ID, and Instructor are well-defined (i.e., Name
should be a string, Course_ID a valid course identifier, and Instructor a valid
instructor name).
• There are assumed to be no functional dependencies other than those implied by the key;
in particular, Course_ID is assumed not to determine Instructor.
• There are no multivalued dependencies between attributes.

Under these assumptions the table satisfies DKNF, because all constraints are either
domain-based or key-based, and there are no modification anomalies. The constraints (e.g.,
valid Instructor, valid Course_ID) are defined by the domains, and Student_ID is the key. If
each course had a fixed instructor, Course_ID → Instructor would be a constraint that follows
from neither the domains nor the key, and the table would not be in DKNF.

Summary of Normal Forms:

1. 1NF: Eliminate repeating groups; make all attributes atomic.
2. 2NF: Eliminate partial dependencies (dependencies on part of a composite key).
3. 3NF: Eliminate transitive dependencies (non-key attributes depending on other
non-key attributes).
4. BCNF: Every determinant is a candidate key.
5. 4NF: Eliminate non-trivial multivalued dependencies whose determinant is not a superkey.
6. 5NF: Eliminate join dependencies that are not implied by the candidate keys.
7. DKNF: Every constraint follows from the domain and key constraints.

Each higher normal form aims to remove different kinds of anomalies and redundancy,
improving the efficiency and integrity of the database structure.

FINDING CANDIDATE KEY:

A candidate key is a minimal set of attributes that can uniquely identify a record in a
relation. There may be more than one candidate key in a relation, and each candidate key can
determine all attributes in the relation.

Steps to find the candidate key:

1. Find the closure of each possible set of attributes: Start with each attribute (or set
of attributes) and compute its closure.
2. Check if the closure contains all attributes: If the closure of a set of attributes
contains all attributes in the relation, then this set is a super key.
3. Find the minimal set: Among the super keys, the candidate keys are the minimal
ones - the smallest sets that still determine all attributes in the relation.
4. Repeat: If a set of attributes is a candidate key, check if removing any attribute from
it still allows it to determine all attributes. If it does, then it's not minimal.
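
The steps above can be sketched as a brute-force search in Python. This is a minimal sketch under stated assumptions, not the worked example from the class notes: the relation R(A, B, C, D), its FDs, and the function names are hypothetical. Subsets are tried from smallest to largest, so any super key that does not contain an already-found key is minimal.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Hypothetical relation R(A, B, C, D) with A -> B, B -> C, C -> A
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"A"})]
keys = candidate_keys({"A", "B", "C", "D"}, fds)
```

For this hypothetical relation, D appears on no right-hand side, so it must be part of every key; the search finds the candidate keys {A, D}, {B, D}, and {C, D}. The search is exponential in the number of attributes, which is acceptable for classroom-sized schemas.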

EXAMPLE 2: FIND THE CANDIDATE KEY

E-R MODEL, NORMALIZATION PROBLEMS – REFER CLASS NOTES
