Module 4
Database Design
Relational DB Design:
It's the process of designing a database that stores data in the
form of tables (relations).
The goal is to design it in a way that's:
Efficient
Easy to maintain
Easy to retrieve data from
Design Stages:
When designing a relational database, there are 4 main stages:
Define Relations – Decide what tables you need.
Define Primary Keys (PK) – Choose the main unique
identifier for each table (e.g., ID).
Define Relationships (Rship) – Set up how tables are
connected (e.g., customer → orders).
Normalization – Improve table structure by removing
duplication and organizing data.
Two Major Design Approaches
There are 2 ways you can design a database:
1. Top-Down Design
a) Develop a Conceptual Model using something
like an ER diagram (Entity-Relationship model).
b) Map the model to tables (convert entities to
tables).
c) Normalize the tables (to improve structure and
remove redundancy).
2. Bottom-Up Design
a) Design by Decomposition – Break one big
table into smaller meaningful ones.
b) Use Normalization to improve these tables
Measuring Quality of DB Design (2 Levels)
There are 2 ways to look at the quality of the design:
1. Logical Level Design
Focus: Database structure
Involves deciding what tables, fields, and relationships
you need.
Example: Designing tables for “Customers”, “Orders”, etc.
2. Physical Level Design
Focus: How data is stored and accessed
Involves storage formats, indexes, and access paths.
Features of a Good Relational Database Design
A good design should:
Be easy to modify and maintain
Make it easy to retrieve data
Allow developers to easily build apps that use it
Informal design guidelines for relation schemas
The four informal measures of quality for a relation schema are:
Semantics of the attributes
Reducing the redundant values in tuples
Reducing the null values in tuples
Disallowing the possibility of generating spurious
tuples
Semantics of the attributes
Semantics refers to the meaning of attributes in a
relation. It specifies how to interpret attribute
values in a tuple and how they relate to each other.
Guideline 1:
Design a relation schema so that it is easy to explain
its meaning.
Do not combine attributes from multiple entity types
and relationship types into a single relation.
If a relation includes attributes from multiple
entities or relationships, it can lead to semantic
ambiguity, making it hard to explain or understand
the meaning of the relation.
Examples:
Emp_dept Relation
Ename SSN DOB Addr Dno Dname Mgrssn
This mixes employee-related attributes (Ename, SSN,
DOB, etc.) with department-related attributes (Dno,
Dname, Mgrssn).
Emp_proj Relation
SSN Pno Hrs Ename Pname Ploc
This mixes employee (SSN, Ename), project (Pno,
Pname, Ploc), and relationship (Hrs) attributes.
Reducing the redundant values in tuples:
Good schema design minimizes storage space and
avoids redundancy.
Storing the same information repeatedly across
tuples (rows) wastes space and can cause anomalies.
This happens when attributes from multiple
entities are grouped into a single relation.
Problems Due to Redundancy and Anomalies:
Insert Anomaly
To insert a new employee tuple into Emp_dept,
you must include either:
Department details (even if the employee doesn't
work in a department yet), or
NULL for unrelated info (which is bad practice).
Example Table: Emp_dept
Ename SSN Addr Dno Dname MgrSSN
A 11 X 1 HR 22
B 22 Y 2 Fin 11
C 33 Z 3 Acc 22
D 44 W NULL NULL NULL
NULL NULL NULL 4 Testing 33
Problems Illustrated:
Row 4 (D): An employee without department info → must use
NULLs for department fields.
Row 5: Trying to insert a department without employees → must
use NULLs for employee fields.
This leads to:
Wasted space due to NULLs.
Insert anomaly: Can't add departments without employees unless
using NULLs (bad design).
SSN as a primary key cannot be NULL, so a department without employees (Row 5) cannot be inserted at all.
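The insert anomaly can be reproduced with a small sketch using Python's built-in sqlite3 module. The column types and the explicit NOT NULL on SSN are assumptions made for this illustration; the rows come from the Emp_dept example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SSN is declared TEXT NOT NULL so that SQLite rejects a NULL key
# (types are an assumption for this sketch).
conn.execute("""
    CREATE TABLE Emp_dept (
        Ename  TEXT,
        SSN    TEXT NOT NULL PRIMARY KEY,
        Addr   TEXT,
        Dno    INTEGER,
        Dname  TEXT,
        MgrSSN TEXT
    )
""")

# Employee D has no department yet: the department columns must be NULL.
conn.execute("INSERT INTO Emp_dept VALUES ('D', '44', 'W', NULL, NULL, NULL)")

# Department 4 has no employees yet: the key column SSN would have to be NULL.
try:
    conn.execute(
        "INSERT INTO Emp_dept VALUES (NULL, NULL, NULL, 4, 'Testing', '33')"
    )
    dept_inserted = True
except sqlite3.IntegrityError as exc:
    dept_inserted = False
    print("rejected:", exc)
```

The employee row is accepted only by padding it with NULLs, while the department row is rejected outright because its key would be NULL.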
Deletion Anomaly:
A deletion anomaly occurs when deleting a tuple (row) that
contains information about one entity also causes loss of
important information about another, unrelated entity.
Example: Emp_dept
Ename SSN Addr Dno Dname MgrSSN
A 11 X 1 HR 22
B 22 Y 2 Fin 11
C 33 Z 3 Acc 22
Now, if we run:
DELETE FROM Emp_dept WHERE SSN = 11;
We're deleting Employee A, but:
Employee A is the only one working in department 1 (HR).
So, deleting that row removes all info about Dept 1, including:
Dname = HR
Dno = 1
MgrSSN = 22
Good Design: Normalized Tables
Instead of storing both employee and department
data in one table, split them:
Table 1: Emp
Ename SSN Addr Dno
A 11 X 1
B 22 Y 2
C 33 Z 3
Table 2: Dept
Dno Dname MgrSSN
1 HR 22
2 Fin 11
3 Acc 22
Now, if you delete an employee, department info
stays safe.
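A minimal sketch of the deletion anomaly and its fix, using Python's sqlite3 module. The column types are assumptions; the rows follow the Emp_dept example, with employee C placed in department 3 (Acc).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single-table design: employee and department data mixed together.
    CREATE TABLE Emp_dept (Ename TEXT, SSN INTEGER PRIMARY KEY, Addr TEXT,
                           Dno INTEGER, Dname TEXT, MgrSSN INTEGER);
    INSERT INTO Emp_dept VALUES
        ('A', 11, 'X', 1, 'HR', 22),
        ('B', 22, 'Y', 2, 'Fin', 11),
        ('C', 33, 'Z', 3, 'Acc', 22);

    -- Normalized design: one table per entity.
    CREATE TABLE Dept (Dno INTEGER PRIMARY KEY, Dname TEXT, MgrSSN INTEGER);
    CREATE TABLE Emp (Ename TEXT, SSN INTEGER PRIMARY KEY, Addr TEXT,
                      Dno INTEGER REFERENCES Dept(Dno));
    INSERT INTO Dept VALUES (1, 'HR', 22), (2, 'Fin', 11), (3, 'Acc', 22);
    INSERT INTO Emp VALUES ('A', 11, 'X', 1), ('B', 22, 'Y', 2), ('C', 33, 'Z', 3);
""")

# Delete employee A in both designs.
conn.execute("DELETE FROM Emp_dept WHERE SSN = 11")
conn.execute("DELETE FROM Emp WHERE SSN = 11")

# Count how many rows still describe the HR department.
hr_in_flat = conn.execute(
    "SELECT COUNT(*) FROM Emp_dept WHERE Dname = 'HR'").fetchone()[0]
hr_in_dept = conn.execute(
    "SELECT COUNT(*) FROM Dept WHERE Dname = 'HR'").fetchone()[0]
print(hr_in_flat, hr_in_dept)
```

In the single-table design the HR department disappears together with employee A; in the split design the Dept row is untouched.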
Modification Anomalies
A modification anomaly occurs when:
A change in one piece of data requires changes in
multiple rows.
If these changes aren't made everywhere consistently, the
database becomes inconsistent.
Example: EMP_DEPT
If department data like the department name (Dname) or
manager SSN (MgrSSN) is repeated in multiple rows (i.e.,
for each employee), you face problems during updates.
Scenario Given:
Suppose we want to change department name "Acc" to
"Accounts".
If multiple rows contain "Acc" as department name, we
must update all of them.
If even one row is missed, it leads to inconsistency.
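The scenario above can be sketched with Python's sqlite3 module. The second employee in department 3 (E, SSN 55) is a hypothetical row added only so that the department name is stored twice; the column types are likewise assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP_DEPT (Ename TEXT, SSN INTEGER PRIMARY KEY,
                           Dno INTEGER, Dname TEXT);
    -- Two employees in department 3, so 'Acc' is stored twice.
    -- Employee E is hypothetical.
    INSERT INTO EMP_DEPT VALUES ('C', 33, 3, 'Acc'), ('E', 55, 3, 'Acc');
""")

# A careless update that renames the department in only one row.
conn.execute("UPDATE EMP_DEPT SET Dname = 'Accounts' WHERE SSN = 33")

# Department 3 now carries two different names.
names = [row[0] for row in conn.execute(
    "SELECT DISTINCT Dname FROM EMP_DEPT WHERE Dno = 3 ORDER BY Dname")]
print(names)
```

A correct update would have to filter on Dno = 3 so that every copy changes; with a separate Dept table there would be only one copy to change.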
Department columns of EMP_DEPT after a partial update:
DNO DNAME MGRSSN
1 HR 22
2 Fin 11
3 Accounts 22
3 Acc 22
Problem: "Acc" and "Accounts" refer to the same department, but are
now inconsistent.
A partial update causes inconsistency, so the change has to be made in every row.
Guideline 2:
Design the base relation schema so that update anomalies are
not present.
This means:
Separate entity data into distinct tables.
Avoid duplication of the same data in multiple rows.
If update anomalies are unavoidable:
◦ Document them.
◦ Ensure that any application updating the database does so correctly.
Reducing the null values in tuples:
Design tables so that most attributes have values in
most rows; avoid columns that are NULL in the
majority of tuples.
Why it matters:
Wastes storage
Complex or unpredictable queries
NULL meaning is ambiguous
Guideline 3:
Avoid placing attributes in a base relation whose
values are mostly null.
Problematic Table:
EmpID Name Email PassportNumber LicenseNumber
1 Alice [email protected] P1234 (NULL)
2 Bob [email protected] (NULL) L5678
3 Charlie charlie@x.com (NULL) (NULL)
Passport Number and License Number are mostly NULL.
Redesign idea: Move them to separate tables.
Better Design:
EMPLOYEE(EmpID, Name, Email)
EMP_PASSPORT(EmpID, Passport Number)
EMP_LICENSE(EmpID, License Number)
No NULLs in the main table, and optional data is
stored only when available
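The redesign can be sketched with Python's sqlite3 module. The Email column is left out for brevity, and the column types and constraints are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EmpID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- Optional data lives in its own table; a row exists only when known.
    CREATE TABLE EMP_PASSPORT (
        EmpID INTEGER PRIMARY KEY REFERENCES EMPLOYEE(EmpID),
        PassportNumber TEXT NOT NULL);
    CREATE TABLE EMP_LICENSE (
        EmpID INTEGER PRIMARY KEY REFERENCES EMPLOYEE(EmpID),
        LicenseNumber TEXT NOT NULL);

    INSERT INTO EMPLOYEE VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
    INSERT INTO EMP_PASSPORT VALUES (1, 'P1234');
    INSERT INTO EMP_LICENSE VALUES (2, 'L5678');
""")

# NULLs appear only in query results, produced by the LEFT JOINs on demand.
rows = conn.execute("""
    SELECT e.EmpID, e.Name, p.PassportNumber, l.LicenseNumber
    FROM EMPLOYEE e
    LEFT JOIN EMP_PASSPORT p ON p.EmpID = e.EmpID
    LEFT JOIN EMP_LICENSE  l ON l.EmpID = e.EmpID
    ORDER BY e.EmpID
""").fetchall()
for row in rows:
    print(row)
```

No stored row contains a NULL, yet the original wide view can still be rebuilt with outer joins when it is needed.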
Generating spurious tuples:
Decompose tables so that rejoining them by key
attributes guarantees no spurious rows.
A natural join on non-key attributes may produce false
tuples.
Use only primary-key ↔ foreign-key joins to ensure a
lossless join.
Spurious Tuples Example
Original Table: EMP_PROJ
Ssn Pno Hours Ename Pname Ploc
101 1 20 Alice ISRO Bng
102 2 15 Bob IISc Bng
Decomposed:
EMP_LOCS(Ename, Ploc)
EMP_PROJ1(Ssn, Pno, Hours, Pname, Ploc)
Bad Join on Ploc:
SELECT *
FROM EMP_LOCS NATURAL JOIN EMP_PROJ1;
Produces:
Ename Ploc Ssn Pno Hours Pname
Alice Bng 101 1 20 ISRO
Alice Bng 102 2 15 IISc
Bob Bng 101 1 20 ISRO
Bob Bng 102 2 15 IISc
Alice is wrongly associated with the IISc project, and
Bob with ISRO—these are spurious tuples
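The bad join can be checked with a short sketch using Python's sqlite3 module. The column types are assumptions; the data is the two-row EMP_PROJ table from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP_PROJ (Ssn INTEGER, Pno INTEGER, Hours INTEGER,
                           Ename TEXT, Pname TEXT, Ploc TEXT);
    INSERT INTO EMP_PROJ VALUES
        (101, 1, 20, 'Alice', 'ISRO', 'Bng'),
        (102, 2, 15, 'Bob',   'IISc', 'Bng');

    -- The bad decomposition: the only shared attribute is Ploc, not a key.
    CREATE TABLE EMP_LOCS  AS SELECT DISTINCT Ename, Ploc FROM EMP_PROJ;
    CREATE TABLE EMP_PROJ1 AS
        SELECT DISTINCT Ssn, Pno, Hours, Pname, Ploc FROM EMP_PROJ;
""")

cols = "Ssn, Pno, Hours, Ename, Pname, Ploc"
original = set(conn.execute(f"SELECT {cols} FROM EMP_PROJ"))
joined = set(conn.execute(
    f"SELECT {cols} FROM EMP_LOCS NATURAL JOIN EMP_PROJ1"))

# Rows that the join produced but the original table never contained.
spurious = joined - original
print(len(joined), sorted(spurious))
```

The join returns four rows instead of two, and the two extra rows are exactly the wrong Alice/IISc and Bob/ISRO pairings.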
Fix – Use Lossless Decomposition
Normalized Tables:
EMPLOYEE(Ssn, Ename)
PROJECT(Pno, Pname, Ploc)
WORKS_ON(Ssn, Pno, Hours)
Rejoin properly:
EMPLOYEE ⋈ WORKS_ON USING(Ssn) ⋈ PROJECT USING(Pno)
Join on primary-key ↔ foreign-key ensures no
spurious tuples and lossless reconstruction.
Ename Ploc Ssn Pno Hours Pname
Alice Bng 101 1 20 ISRO
Bob Bng 102 2 15 IISc
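A sketch of the lossless reconstruction with Python's sqlite3 module. Column types and key declarations are assumptions; the data is the same two facts as in EMP_PROJ (Alice works 20 hours on ISRO, Bob works 15 hours on IISc).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn INTEGER PRIMARY KEY, Ename TEXT);
    CREATE TABLE PROJECT  (Pno INTEGER PRIMARY KEY, Pname TEXT, Ploc TEXT);
    CREATE TABLE WORKS_ON (
        Ssn INTEGER REFERENCES EMPLOYEE(Ssn),
        Pno INTEGER REFERENCES PROJECT(Pno),
        Hours INTEGER,
        PRIMARY KEY (Ssn, Pno));

    INSERT INTO EMPLOYEE VALUES (101, 'Alice'), (102, 'Bob');
    INSERT INTO PROJECT  VALUES (1, 'ISRO', 'Bng'), (2, 'IISc', 'Bng');
    INSERT INTO WORKS_ON VALUES (101, 1, 20), (102, 2, 15);
""")

# Every join condition pairs a foreign key with the primary key it references.
rows = conn.execute("""
    SELECT Ssn, Pno, Hours, Ename, Pname, Ploc
    FROM EMPLOYEE
    JOIN WORKS_ON USING (Ssn)
    JOIN PROJECT  USING (Pno)
    ORDER BY Ssn
""").fetchall()
for row in rows:
    print(row)
```

The join returns exactly the two original EMP_PROJ rows, with no extra Alice/IISc or Bob/ISRO combinations.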
Summary:
Guideline     What It Prevents                        How to Fix It
Guideline 3   Storage waste, NULL-related confusion   Move rarely-used attributes into separate tables
Guideline 4   False data from bad JOINs               Decompose only on PK–FK; enforce lossless join
Functional Dependency:
A functional dependency, written as X → Y, means:
Whenever two rows have the same values for
attributes in set X, they must also have the same
values for attributes in set Y.
Here X and Y are subsets of the attributes of a relation schema R:
X is called the determinant (LHS).
Y is called the dependent (RHS).
For instance, in a student table:
StudentID Name Semester
1234 Alice 4
1235 Bob 6
We see StudentID → Name, Semester, because each
StudentID corresponds to one unique Name & Semester
Why FDs Are Important
Normalization: Identify which attributes belong
together logically.
Avoid anomalies: If dependencies are improperly
placed, updating or deleting data may lead to
inconsistencies.
Schema structure: Helps decide how to split data into
well-structured tables.
Definition
For relation schema R(A₁,…,Aₙ), subsets X, Y ⊆ R,
we say X → Y if, in every legal instance of R,
whenever two tuples t₁ and t₂ agree on all attributes in
X, they must also agree on all attributes in Y
Key points:
LHS (X) = determinant; RHS (Y) = dependent.
Example with Table
Consider this table R(A, B, C)
A B C
1 2 3
4 2 3
5 3 3
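A → B: holds (1→2, 4→2, 5→3).
B → C: holds (2→3, 3→3).
BC → A: does not hold (2 3→1 and 2 3→4; 3 3→5).
AC → B: holds (1 3→2, 4 3→2, 5 3→3).
BC → A fails because the same BC value (2, 3) appears with two different A values (1 and 4).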
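The checks on R(A, B, C) can be automated with a small Python sketch. The function name fd_holds is an illustrative choice. Note that a single table instance can only refute an FD; a passing check shows the FD is not violated by these rows, not that it holds for every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key;
        # a later, different value means the FD is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True


R = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 2, "C": 3},
    {"A": 5, "B": 3, "C": 3},
]

print("A -> B :", fd_holds(R, ["A"], ["B"]))
print("B -> C :", fd_holds(R, ["B"], ["C"]))
print("BC -> A:", fd_holds(R, ["B", "C"], ["A"]))
print("AC -> B:", fd_holds(R, ["A", "C"], ["B"]))
```

Only BC → A is reported as violated, because the rows with A = 1 and A = 4 share the same (B, C) value.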
Application of FD:
Functional dependencies have four main applications:
To find additional FDs
Using known FDs, we can find new FDs using rules (like
Armstrong’s Axioms).
Example: If we know:
A→B
B→C
Then we can say: A → C (by transitivity rule)
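One common way to derive new FDs mechanically is to compute the closure of an attribute set: everything that the set determines under the known FDs. The sketch below is a minimal version; the function name closure and the representation of an FD as a (lhs, rhs) pair of sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

a_closure = closure({"A"}, fds)
print(sorted(a_closure))

# A -> C follows from the known FDs because C is in the closure of {A}.
print("C" in a_closure)
```

Since C appears in the closure of {A}, the FD A → C is implied, which matches the transitivity argument.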
To identify the key
Functional Dependencies help us find:
o Super Key (SK): Any set of attributes that uniquely identifies a tuple; it may contain extra attributes.
o Candidate Key (CK): A minimal super key, with no extra attributes.
o Primary Key (PK): The candidate key chosen as the main unique identifier.
Example: In R(A, B, C), if A → B and A → C, then A is a key for the table (it
uniquely determines all other attributes).
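Key finding can be sketched as a brute-force search: a set of attributes is a super key if its closure is the whole schema, and a candidate key if no smaller key is contained in it. The helper names and the schema R(A, B, C) are assumptions for this illustration, and the search is exponential, so it only suits small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the full schema."""
    attributes = set(attributes)
    keys = []
    # Try small sets first so that any superset of a found key is skipped.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a super key, but not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


fds = [({"A"}, {"B"}), ({"A"}, {"C"})]
keys = candidate_keys({"A", "B", "C"}, fds)
print(keys)
```

For this schema the only candidate key is {A}; sets such as {A, B} are super keys but are skipped because they contain {A}.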
To find equivalent FDs
Sometimes, two sets of FDs mean the same thing, even if they look
different.
These are called equivalent sets of FDs.
Example: FD set 1:
A → B
A → C
FD set 2:
A → BC
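Equivalence can be tested with attribute closures: two sets are equivalent when every FD of each set can be derived from the other. The sketch below assumes FDs are written as (lhs, rhs) pairs of sets; the function names are illustrative.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if every FD in g can be derived from the FDs in f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """Two FD sets are equivalent when each one covers the other."""
    return covers(f, g) and covers(g, f)


set1 = [({"A"}, {"B"}), ({"A"}, {"C"})]
set2 = [({"A"}, {"B", "C"})]

print(equivalent(set1, set2))
# Dropping A -> C loses information, so the sets are no longer equivalent.
print(equivalent([({"A"}, {"B"})], set2))
```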
To find minimal FDs
We try to simplify FDs — remove unnecessary attributes
and write the smallest set of FDs that still describe the
same data.
This process is called Finding the Minimal Cover or
Canonical Cover.
Example: From A → BC, we can split to:
A → B
A → C
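Each FD now has a single attribute on the right-hand side, and neither FD can be derived from the other, so this set is minimal.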
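A minimal cover is usually computed in three steps: split right-hand sides into single attributes, drop extraneous left-hand attributes, then drop FDs that follow from the rest. The sketch below follows that textbook procedure; the function names and the second, larger FD set are assumptions added for illustration. A minimal cover is not necessarily unique, and the result can depend on processing order.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split every right-hand side into single attributes.
    cover = [(frozenset(lhs), frozenset([a]))
             for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: remove left-hand attributes that are not needed.
    for i in range(len(cover)):
        lhs, rhs = cover[i]
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and rhs <= closure(smaller, cover):
                lhs = smaller
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # drop duplicates, keep order
    # Step 3: remove FDs that the remaining FDs already imply.
    for fd in list(cover):
        rest = [other for other in cover if other != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


simple = minimal_cover([({"A"}, {"B", "C"})])
print([(sorted(l), sorted(r)) for l, r in simple])

# Hypothetical larger set: A -> BC, B -> C, AB -> C.
larger = minimal_cover([({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A", "B"}, {"C"})])
print([(sorted(l), sorted(r)) for l, r in larger])
```

For A → BC the result is A → B and A → C. In the larger set, AB → C shrinks to B → C and A → C becomes redundant, leaving A → B and B → C.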
Classification of FD:
1. Trivial Functional Dependency:
Occurs when the RHS (dependent) is a subset of the
LHS (determinant).
These always hold but aren’t informative for design.
Notation: X→Y is trivial if Y⊆X.
Example:
RollNo Name
1 Alice
2 Bob
{RollNo, Name} → Name is trivial because Name is
already in the determinant set
2. Non‑trivial Functional Dependency:
Here, RHS is not a subset of LHS.
These are meaningful constraints like primary keys
determining other attributes.
Example: RollNo→Name
RollNo Name Age
1 A 17
2 B 18
RollNo→Name and RollNo→Age are non‑trivial.
3. Partial Functional Dependency:
Occurs in relations with composite keys. A non‑key
attribute depends on only part of the composite key
—not the whole.
Leads to 2NF violation
Example:
StudentID CourseID StudentName Grade
101 C1 Alice A
101 C2 Alice B
Composite key = {StudentID, CourseID}, and {StudentID, CourseID} → Grade.
StudentName is functionally dependent on
StudentID alone → partial dependency.
4. Full (Complete) Functional Dependency
A non‑key attribute depends on the entire composite
key, and not just a part.
No proper subset of the key determines the
dependent attribute.
Required to satisfy 2NF.
Example:
StudentID CourseID Grade
101 C1 A
101 C2 B
{StudentID, CourseID} → Grade is a full FD (neither
StudentID nor CourseID alone suffices).
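The difference between partial and full dependencies can be checked from the declared FDs. The sketch below assumes the two FDs from the examples above; the function names are illustrative. It works from declared FDs rather than sample rows, because a two-row table can make an FD appear to hold by accident.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def dependency_kind(key, attr, fds):
    """Classify how attr depends on a composite key: partial, full, or none."""
    key = set(key)
    # A proper, non-empty subset of the key that determines attr => partial.
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            if attr in closure(subset, fds):
                return "partial"
    return "full" if attr in closure(key, fds) else "none"


key = {"StudentID", "CourseID"}
fds = [
    ({"StudentID", "CourseID"}, {"Grade"}),
    ({"StudentID"}, {"StudentName"}),
]

print("StudentName:", dependency_kind(key, "StudentName", fds))
print("Grade:", dependency_kind(key, "Grade", fds))
```

StudentName is reported as partial, which is the 2NF violation, while Grade depends on the full key.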
5. Transitive Functional Dependency
Occurs when:
X→Y and Y→Z ⇒ X→Z
The dependency of Z on X is indirect, through Y. It violates 3NF when Y is not a
super key and Z is a non‑prime attribute.
Example:
EmpID DeptID DeptName
E1 D1 HR
E2 D2 IT
EmpID→DeptID
DeptID→DeptName
Therefore, transitively: EmpID→DeptName
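A transitive dependency can be detected from the declared FDs. The sketch below assumes that EmpID is the only candidate key of the relation, so every attribute outside the key is non-prime; the function names are illustrative.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def transitive_dependencies(attributes, key, fds):
    """Return (Y, Z) pairs where key -> Y -> Z and Y is not a super key.

    Assumes key is the only candidate key, so attributes outside it
    are non-prime.
    """
    attributes, key = set(attributes), set(key)
    found = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        is_superkey = closure(lhs, fds) == attributes
        # Skip FDs whose left side is a super key or is not reachable from the key.
        if is_superkey or not lhs <= closure(key, fds):
            continue
        for z in sorted(set(rhs) - lhs - key):
            found.append((frozenset(lhs), z))
    return found


attributes = {"EmpID", "DeptID", "DeptName"}
fds = [({"EmpID"}, {"DeptID"}), ({"DeptID"}, {"DeptName"})]

violations = transitive_dependencies(attributes, {"EmpID"}, fds)
for via, attr in violations:
    print(f"EmpID -> {sorted(via)} -> {attr}")
```

The reported chain EmpID → DeptID → DeptName suggests moving DeptName into a separate table keyed by DeptID.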