UNIT-3
Relational Database Design
Normalization
Normalization is a process used in database design to
systematically organize data and eliminate redundancy and
inconsistencies. It involves breaking down large tables into smaller,
more focused tables and defining relationships between them. This
process helps to improve data integrity, reduce data duplication,
and make the database more efficient and maintainable.
Why is Normalization Important?
Normalization addresses several key issues in database design:
Data Redundancy: Reduces duplication of data, saving
storage space and preventing inconsistencies.
Anomalies: Minimizes insertion, deletion, and update
anomalies, which can lead to data integrity problems.
Data Integrity: Ensures data accuracy and consistency by
enforcing data dependencies.
Query Performance: Can improve query performance by
reducing the amount of data that needs to be processed.
Data Flexibility: Makes the database more adaptable to
changes in data requirements.
Advantages of Normalization:-
Reduced data redundancy: Normalization helps to
eliminate duplicate data in tables, reducing the amount of
storage space needed and improving database efficiency.
Improved data consistency: Normalization ensures that
data is stored in a consistent and organized manner,
reducing the risk of data inconsistencies and errors.
Simplified database design: Normalization provides
guidelines for organizing tables and data relationships,
making it easier to design and maintain a database.
Improved query performance: Normalized tables are
smaller and hold less duplicated data, so lookups and updates
on a single table often run faster, although queries that span
several tables need joins.
Easier database maintenance: Normalization reduces the
complexity of a database by breaking it down into smaller,
more manageable tables, making it easier to add, modify,
and delete data.
Pitfalls in Database Design
Database design is a critical aspect of any software application. A
well-designed database can significantly improve the performance,
scalability, and maintainability of the application. However, poor
database design can lead to numerous problems, including
performance issues, data inconsistencies, and security
vulnerabilities.
Here are some common pitfalls in database design, along with
examples:
1. Poor Normalization
Under-Normalization: This occurs when a database is not
normalized enough. It can lead to data redundancy and to
update, insertion, and deletion anomalies.
o Example: A single table storing information about
products, orders, and customers. If the only row for a
product is deleted, the order and customer information
stored in that row is lost with it (a deletion anomaly).
Over-Normalization: This occurs when a database is
excessively normalized. It can lead to overly complex joins
and decreased query performance.
o Example: Breaking down a table into too many smaller
tables, resulting in a large number of joins to retrieve
simple information.
2. Inefficient Data Types
Choosing incorrect data types: Using data types that are
too large or too small can waste storage space and hinder
performance.
o Example: Using a VARCHAR(255) to store a two-digit
integer.
3. Lack of Indexing
Missing indexes: Without proper indexing, the database
system may need to scan entire tables to find specific data,
leading to slow query performance.
o Example: Not creating an index on a frequently
searched column, such as a customer's last name.
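The effect of a missing index can be observed directly. The sketch below uses Python's built-in sqlite3 module; the customers table, its rows, and the index name idx_customers_last_name are assumptions made for illustration. It asks SQLite for its query plan before and after the index is created.

```python
import sqlite3

# Hypothetical schema: a customers table that is searched by last_name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO customers (last_name) VALUES (?)",
    [("Smith",), ("Jones",), ("Patel",)],
)

def query_plan(connection):
    """Return SQLite's plan text for a lookup by last_name."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE last_name = ?",
        ("Patel",),
    ).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return " | ".join(row[-1] for row in rows)

plan_before = query_plan(conn)  # describes a scan of the whole table
conn.execute("CREATE INDEX idx_customers_last_name ON customers (last_name)")
plan_after = query_plan(conn)   # mentions the new index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, and other database systems expose their plans through their own commands.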
4. Poorly Designed Queries
Inefficient queries: Complex queries that involve full table
scans or unnecessary joins can degrade performance.
o Example: Using SELECT * to retrieve all columns
when only a few are needed.
5. Lack of Security Measures
Weak access controls: Insufficient access controls can
expose sensitive data to unauthorized users.
o Example: Not implementing strong password policies
or role-based access control.
6. Ignoring Data Quality
Poor data quality: Inconsistent or inaccurate data can lead
to incorrect decisions and system failures.
o Example: Allowing duplicate entries or invalid data to
be entered into the database.
7. Insufficient Backup and Recovery Procedures
Lack of backup and recovery: A lack of robust backup and
recovery procedures can lead to data loss in case of
hardware failures or cyberattacks.
o Example: Not regularly backing up the database or not
having a disaster recovery plan.
By understanding and avoiding these common pitfalls, you can
design databases that are efficient, reliable, and secure.
Relational Operations in DBMS
Relational algebra is a formal system for manipulating relational
databases. It provides a set of operators that can be used to query
and manipulate data stored in relational databases. These
operations are the foundation of SQL, the most widely used
database query language.
Types of Relational Operations :-
Selection (σ):
1. Selects tuples from a relation based on a given
condition.
2. Syntax: σ_condition(relation)
3. Example: To select all students with a
GPA greater than 3.5 from a "Students"
relation:
σ GPA>3.5(Students)
Projection (π):
1. Selects specific attributes from a relation.
2. Syntax: π_attribute_list(relation)
3. Example: To select only the "StudentID"
and "Name" attributes from the "Students"
relation:
π StudentID, Name(Students)
Union (∪):
1. Combines two relations with the same schema,
eliminating duplicates.
2. Syntax: relation1 ∪ relation2
3. Example: To combine two relations,
"Freshmen" and "Sophomores," into a
single relation:
Freshmen ∪ Sophomores
Intersection (∩):
1. Finds the tuples that are common to two relations with
the same schema.
2. Syntax: relation1 ∩ relation2
3. Example: To find students who are both in
the "Honors" and "MathClub" relations:
Honors ∩ MathClub
Set Difference (-):
1. Removes tuples from one relation that are also in
another relation.
2. Syntax: relation1 - relation2
3. Example: To find students who are in the
"Freshmen" relation but not in the "Honors"
relation:
Freshmen - Honors
Cartesian Product (×):
1. Combines every tuple from one relation with every
tuple from another relation.
2. Syntax: relation1 × relation2
3. Example: To combine every student with
every course:
Students × Courses
Join (⋈):
1. Combines rows from two or more relations, based on a
related column between them.
2. Types of Joins:
1. Natural Join: Joins relations based on common
attributes.
2. Equijoin: Joins relations based on an equality
condition.
3. Theta Join: Joins relations based on a general
comparison operator (e.g., <, >, ≠).
4. Outer Join: Preserves tuples from one relation
even if they don't have matching tuples in the
other relation. (Left Outer Join, Right Outer Join,
Full Outer Join)
3. Example: To join the "Students" and
"Enrollments" relations based on the
"StudentID" attribute:
Students ⋈ Enrollments
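The operators above can be tried out on small in-memory relations. The following is a minimal Python sketch, not a database engine: a relation is modelled as a set of rows, the helper names (rel, select, project, natural_join) and the sample data are assumptions, and union, intersection, and set difference are simply Python's built-in set operators.

```python
from itertools import product

def rel(*rows):
    """Build a relation: a set of rows, each row a frozenset of (attr, value)."""
    return {frozenset(r.items()) for r in rows}

def select(r, pred):
    """Selection: keep the rows for which the condition holds."""
    return {t for t in r if pred(dict(t))}

def project(r, attrs):
    """Projection: keep only the listed attributes (duplicates collapse)."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in r}

def natural_join(r, s):
    """Natural join: pair rows that agree on every shared attribute."""
    out = set()
    for t, u in product(r, s):
        dt, du = dict(t), dict(u)
        if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
            out.add(frozenset({**dt, **du}.items()))
    return out

# Made-up sample data.
students = rel(
    {"StudentID": 1, "Name": "Asha", "GPA": 3.8},
    {"StudentID": 2, "Name": "Ben", "GPA": 3.2},
)
enrollments = rel(
    {"StudentID": 1, "Course": "DBMS"},
    {"StudentID": 1, "Course": "OS"},
    {"StudentID": 2, "Course": "OS"},
)
courses = rel({"Course": "DBMS"}, {"Course": "OS"})

high_gpa = select(students, lambda t: t["GPA"] > 3.5)
honors = project(high_gpa, ["StudentID"])
everyone = project(students, ["StudentID"])

non_honors = everyone - honors                    # set difference
both = everyone & honors                          # intersection
combined = honors | non_honors                    # union
joined = natural_join(students, enrollments)      # join on StudentID
cross = natural_join(students, courses)           # no shared attribute

print(len(joined), len(cross))
```

When two relations share no attribute, the natural join degenerates into the Cartesian product, which is why cross pairs every student with every course.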
Denormalization in DBMS
Denormalization is a database optimization technique that involves
adding redundant data to one or more tables in a normalized
database. This is done to improve read performance by reducing
the number of joins required to retrieve data. While normalization
is a technique to eliminate redundancy and improve data integrity,
denormalization intentionally introduces redundancy to enhance
performance.
Why Denormalize?
Performance Improvement:
o Reduces Join Operations: By storing redundant data,
denormalization can reduce the number of joins
required to retrieve information, leading to faster query
execution.
o Improves Read Performance: In read-heavy
workloads, denormalization can significantly boost
performance.
When to Denormalize:
Read-Heavy Workloads: When the database is primarily
used for read operations, denormalization can be beneficial.
Complex Queries: If complex queries involving multiple
joins are frequently executed, denormalization can simplify
the query and improve performance.
Real-Time Systems: In real-time systems where low latency
is critical, denormalization can help reduce response times.
Risks of Denormalization:
Data Inconsistencies: Introducing redundancy can increase
the risk of data inconsistencies, as changes made to one
copy of the data may not be reflected in other copies.
Increased Storage Requirements: Denormalization can
lead to increased storage requirements due to data
duplication.
Maintenance Overhead: Maintaining data consistency
across multiple tables can be more complex.
Denormalization Techniques:
1. Combining Tables: Merging multiple tables into a single
table to reduce the number of joins.
2. Adding Redundant Columns: Adding columns to a table
that store data from related tables.
3. Creating Summary Tables: Creating tables that store pre-
calculated aggregates to avoid complex calculations at query
time.
Example:
Consider a database with two tables: Customers and Orders.
Normalized Tables:
SQL
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(50),
    Address VARCHAR(100)
);
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10,2),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
Denormalized Table:
SQL
CREATE TABLE OrdersWithCustomerInfo (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10,2),
    CustomerName VARCHAR(50),
    Address VARCHAR(100),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
In the denormalized table, CustomerName and Address are added
to the OrdersWithCustomerInfo table, reducing the need to join the
Customers table when retrieving order information.
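The trade-off can be seen by running both designs side by side. The sketch below uses Python's sqlite3 module with the Customers and Orders tables from the example; the sample rows are made up, and the denormalized table is built with CREATE TABLE ... AS, a shortcut that copies data but not constraints. It first shows that the denormalized table answers the same question without a join, then shows the copy going stale after an update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID   INT PRIMARY KEY,
    CustomerName VARCHAR(50),
    Address      VARCHAR(100)
);
CREATE TABLE Orders (
    OrderID     INT PRIMARY KEY,
    CustomerID  INT,
    OrderDate   DATE,
    TotalAmount DECIMAL(10,2),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
-- Sample rows are made up for illustration.
INSERT INTO Customers VALUES (1, 'Asha', '12 Hill Road');
INSERT INTO Orders VALUES (100, 1, '2024-01-15', 250.00);

-- Denormalized copy built from the join.
CREATE TABLE OrdersWithCustomerInfo AS
SELECT o.OrderID, o.CustomerID, o.OrderDate, o.TotalAmount,
       c.CustomerName, c.Address
FROM Orders o JOIN Customers c ON c.CustomerID = o.CustomerID;
""")

joined_sql = """
SELECT o.OrderID, c.CustomerName, c.Address
FROM Orders o JOIN Customers c ON c.CustomerID = o.CustomerID
"""
flat_sql = "SELECT OrderID, CustomerName, Address FROM OrdersWithCustomerInfo"

# Right after the copy is built, both reads agree.
same_before = conn.execute(joined_sql).fetchall() == conn.execute(flat_sql).fetchall()

# The customer moves; only the normalized table is updated.
conn.execute("UPDATE Customers SET Address = '7 Lake View' WHERE CustomerID = 1")
same_after = conn.execute(joined_sql).fetchall() == conn.execute(flat_sql).fetchall()

print(same_before, same_after)
```

Once the two reads disagree, the application (or a trigger) has to update every redundant copy, which is the maintenance overhead listed among the risks.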
Issues in Physical Database Design
Physical database design involves determining the physical
storage structures and access paths for a database. While it's a
crucial step in database development, several issues can arise if
not addressed properly.
Here are some common issues in physical database design:
1. Storage Space Management:
Overallocation: Allocating excessive storage space can
lead to wasted resources and increased costs.
Underallocation: Insufficient storage can result in
performance degradation and data loss.
Inefficient Storage Structures: Poorly chosen storage
structures can impact performance and storage utilization.
2. Performance Tuning:
Slow Query Performance: Inefficient indexing, query
optimization, and hardware configuration can lead to slow
query execution.
I/O Bottlenecks: Excessive I/O operations can degrade
performance, especially in large databases.
Concurrency Control Issues: Poorly designed concurrency
control mechanisms can lead to data inconsistencies and
deadlocks.
3. Security and Privacy:
Data Breaches: Inadequate security measures can expose
sensitive data to unauthorized access.
Privacy Violations: Failure to comply with data privacy
regulations can result in legal and reputational damage.
Unauthorized Access: Weak access controls can allow
unauthorized users to access and modify data.
4. Data Integrity and Consistency:
Data Loss: Hardware failures, software bugs, and human
error can lead to data loss.
Data Corruption: Incorrect data entry, updates, or deletions
can corrupt the database.
Data Inconsistencies: Inconsistent data can lead to
incorrect decisions and system failures.
5. Scalability:
Performance Degradation: As the database grows,
performance may deteriorate if it's not designed to scale.
Storage Limitations: The database may exceed storage
capacity, requiring additional hardware or storage solutions.
Concurrency Issues: Increased concurrency can lead to
performance bottlenecks and data inconsistencies.
Joins in Relational Databases
A join is an operation in a DBMS (Database Management
System) that combines the rows of two or more tables based on
related columns between them. The main purpose of a join is to
retrieve data from multiple tables; in other words, a join is used
to perform multi-table queries. It is denoted by ⨝.
Types of Join :
There are several types of joins in SQL: inner joins, outer joins,
and self joins.
1. Inner Join
An inner join combines two or more tables based on related
columns and returns only the rows that have matching values in
all of the tables. Inner joins are commonly divided into theta
(conditional) joins and equijoins.
Syntax:
SELECT column_name(s) FROM table1 INNER JOIN table2
ON table1.column_name = table2.column_name;
Example: Suppose there are two tables Table A and Table B
Table A
Number Square
2 4
3 9
Table B
Number Cube
2 8
3 27
A ⨝B
Output
Number Square Cube
2 4 8
3 9 27
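The inner join of Table A and Table B can be reproduced with a short script. This is a sketch using Python's sqlite3 module; the table names TableA and TableB follow the SQL queries used in these notes, and the rows are the ones from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Number INT, Square INT);
CREATE TABLE TableB (Number INT, Cube INT);
INSERT INTO TableA VALUES (2, 4), (3, 9);
INSERT INTO TableB VALUES (2, 8), (3, 27);
""")

# Only rows whose Number appears in both tables survive the inner join.
inner_rows = conn.execute("""
    SELECT TableA.Number, TableA.Square, TableB.Cube
    FROM TableA INNER JOIN TableB
    ON TableA.Number = TableB.Number
    ORDER BY TableA.Number
""").fetchall()

for row in inner_rows:
    print(row)
```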
2. Outer Join
An outer join is a type of join that retrieves matching as well as
non-matching records from related tables. There are three types of
outer join:
Left outer join
Right outer join
Full outer join
(a) Left Outer Join
It is also called left join. This type of outer join retrieves all
records from the left table and retrieves matching records from
the right table.
Example: Suppose there are two tables Table A and Table B
Table A
Number Square
2 4
3 9
4 16
Table B
Number Cube
2 8
3 27
5 125
A ⟕B
Output
Number Square Cube
2 4 8
3 9 27
4 16 NULL
SQL Query-
SELECT * FROM TableA LEFT OUTER JOIN TableB
ON TableA.Number = TableB.Number;
Explanation: A left outer join keeps all the rows from the left
table (here Table A). Table B has no row for number 4, so there is
no Cube value for it and that column is marked as NULL.
(b) Right Outer Join
It is also called a right join. This type of outer join retrieves all
records from the right table and retrieves matching records from
the left table. For a record that has no match in the left table,
the left table's columns are marked as NULL in the result set.
Example: Table A and Table B are the same as in the left outer
join
A ⟖B
Output:
Number Square Cube
2 4 8
3 9 27
5 NULL 125
SQL Query
SELECT * FROM TableA RIGHT OUTER JOIN
TableB ON TableA.Number= TableB.Number;
Explanation: A right outer join keeps all the rows from the right
table (here Table B). Table A has no row for number 5, so there is
no Square value for it and that column is marked as NULL.
(c) Full Outer Join
FULL JOIN creates the result set by combining the results of both
LEFT JOIN and RIGHT JOIN. The result set will contain all the
rows from both tables. For the rows for which there is no
matching, the result set will contain NULL values.
Example: Table A and Table B are the same as in the left outer
join
A ⟗B
Output:
Number Square Cube
2 4 8
3 9 27
4 16 NULL
5 NULL 125
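The three outer joins can be checked against the outputs above with Python's sqlite3 module. Older SQLite releases do not support RIGHT or FULL OUTER JOIN, so this sketch deliberately writes the right join as a left join with the tables swapped and emulates the full join with UNION ALL; on a system that supports them, the RIGHT OUTER JOIN and FULL OUTER JOIN keywords can be used directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Number INT, Square INT);
CREATE TABLE TableB (Number INT, Cube INT);
INSERT INTO TableA VALUES (2, 4), (3, 9), (4, 16);
INSERT INTO TableB VALUES (2, 8), (3, 27), (5, 125);
""")

# Left outer join: every row of TableA, NULL (None) where TableB has no match.
left_rows = conn.execute("""
    SELECT a.Number, a.Square, b.Cube
    FROM TableA a LEFT OUTER JOIN TableB b ON a.Number = b.Number
    ORDER BY a.Number
""").fetchall()

# Right outer join, written as a left join with the tables swapped.
right_rows = conn.execute("""
    SELECT b.Number, a.Square, b.Cube
    FROM TableB b LEFT OUTER JOIN TableA a ON a.Number = b.Number
    ORDER BY b.Number
""").fetchall()

# Full outer join: the left join plus the TableB rows that found no partner.
full_rows = conn.execute("""
    SELECT a.Number, a.Square, b.Cube
    FROM TableA a LEFT OUTER JOIN TableB b ON a.Number = b.Number
    UNION ALL
    SELECT b.Number, a.Square, b.Cube
    FROM TableB b LEFT OUTER JOIN TableA a ON a.Number = b.Number
    WHERE a.Number IS NULL
    ORDER BY 1
""").fetchall()

print(left_rows)
print(right_rows)
print(full_rows)
```

Python's sqlite3 module reports SQL NULL as None, so the NULL cells of the tables above appear as None in the printed rows.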
3. Self Join
A SQL SELF JOIN is a type of join operation where a table
is joined with itself. It lets you combine data from a single
table by treating a second alias of the table as a virtual copy
and relating rows of the original to rows of the copy. Self joins
are used to compare or combine rows within the same table.
Syntax
Following is the basic syntax of SQL Self Join −
SELECT column_name(s)
FROM table1 a, table1 b
WHERE a.common_field = b.common_field;
In the accompanying figure, a table records the color assigned to
each employee and the color each employee picked. The table is
joined to itself using a self join over the color columns to
match employees with their Secret Santa.
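A self join of this kind might look as follows. The sketch uses Python's sqlite3 module; the table name secret_santa, its column names, and the three rows are hypothetical stand-ins, since the original figure's data is not available here. Two aliases of the same table play the roles of giver and receiver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical data: each employee is assigned one color and picks another.
CREATE TABLE secret_santa (
    name           TEXT PRIMARY KEY,
    color_assigned TEXT,
    color_picked   TEXT
);
INSERT INTO secret_santa VALUES
    ('Asha', 'Red',   'Blue'),
    ('Ben',  'Blue',  'Green'),
    ('Chen', 'Green', 'Red');
""")

# The same table appears twice under different aliases.
# A giver is matched to the employee whose assigned color the giver picked.
pairs = conn.execute("""
    SELECT giver.name, receiver.name
    FROM secret_santa giver
    JOIN secret_santa receiver
      ON giver.color_picked = receiver.color_assigned
    ORDER BY giver.name
""").fetchall()

for giver, receiver in pairs:
    print(giver, "gives to", receiver)
```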
Join Dependency
A join dependency on a relation schema R specifies a constraint
on the states r of R: every legal state r of R should have a
lossless join decomposition into R1, R2, ..., Rn. In a database
management system, join dependency is a generalization of the
idea of multivalued dependency.
Types of Join Dependency :-
There are two types of Join Dependencies:
Lossless Join Dependency: Whenever the decomposed tables
are joined, no information is lost and nothing is added; the
result contains exactly the content of the original table.
Lossy Join Dependency: In this case the join does not
reproduce the original table: it produces spurious tuples that
were never in the original table, so the original content can
no longer be recovered.
Example of Join Dependency :-
Suppose we have the following table R:
E_Name Company Product
Rohan Comp1 Jeans
Harpreet Comp2 Jacket
Anant Comp3 TShirt
We can break, or decompose, the above table into three
tables. If R can be rebuilt from them exactly, R has a join
dependency and may not be in 5NF.
The three decomposed tables would be:
1. R1: The table with columns E_Name and Company.
E_Name Company
Rohan Comp1
Harpreet Comp2
Anant Comp3
2. R2: The table with columns E_Name and Product.
E_Name Product
Rohan Jeans
Harpreet Jacket
Anant TShirt
3. R3: The table with columns Company and Product.
Company Product
Comp1 Jeans
Comp2 Jacket
Comp3 TShirt
Let's try to figure out whether or not R has a join dependency.
Step 1- First, the natural join of R1 and R2:
E_Name Company Product
Rohan Comp1 Jeans
Harpreet Comp2 Jacket
Anant Comp3 TShirt
Step 2- Next, let's perform the natural join of the above table
with R3:
E_Name Company Product
Rohan Comp1 Jeans
Harpreet Comp2 Jacket
Anant Comp3 TShirt
In the above example, we get the same table R after performing
the natural joins at both steps. Therefore, our join dependency
comes out to be: {(E_Name, Company), (E_Name, Product),
(Company, Product)}
Because R satisfies this join dependency, R is not in 5NF unless
the dependency is implied by the candidate keys of R. That is, the
natural join of the three relations above is equal to our initial
relation R.
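The two join steps can be repeated mechanically. The following sketch uses Python's sqlite3 module: it builds R, derives R1, R2, and R3 as projections, joins them back, and compares the result with R. The join conditions are written out explicitly instead of relying on NATURAL JOIN, which is a choice made here for clarity.

```python
import sqlite3

R = [
    ("Rohan", "Comp1", "Jeans"),
    ("Harpreet", "Comp2", "Jacket"),
    ("Anant", "Comp3", "TShirt"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (E_Name TEXT, Company TEXT, Product TEXT)")
conn.executemany("INSERT INTO R VALUES (?, ?, ?)", R)

# The three projections of R used in the example.
conn.executescript("""
CREATE TABLE R1 AS SELECT DISTINCT E_Name, Company FROM R;
CREATE TABLE R2 AS SELECT DISTINCT E_Name, Product FROM R;
CREATE TABLE R3 AS SELECT DISTINCT Company, Product FROM R;
""")

# Step 1 joins R1 and R2 on E_Name; step 2 joins the result with R3
# on both Company and Product.
rejoined = conn.execute("""
    SELECT R1.E_Name, R1.Company, R2.Product
    FROM R1
    JOIN R2 ON R2.E_Name = R1.E_Name
    JOIN R3 ON R3.Company = R1.Company AND R3.Product = R2.Product
""").fetchall()

# The decomposition is lossless for this state of R if nothing was
# added or dropped by the round trip.
is_lossless = sorted(rejoined) == sorted(R)
print(is_lossless)
```

A check like this only covers the rows currently stored; a join dependency is a statement about every legal state of R.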
Multi-valued dependency
o Multivalued dependency occurs when two attributes in a
table are independent of each other but both depend on a
third attribute.
o A multivalued dependency involves at least two attributes
that depend on a third attribute, which is why it always
requires at least three attributes.
Example: Suppose there is a bike manufacturer company which
produces two colors(white and black) of each model every year.
BIKE_MODEL MANUF_YEAR COLOR
M2001 2008 White
M2001 2008 Black
M3001 2013 White
M3001 2013 Black
M4006 2017 White
M4006 2017 Black
Here the columns COLOR and MANUF_YEAR depend on
BIKE_MODEL and are independent of each other.
In this case, these two columns are said to be multivalued
dependent on BIKE_MODEL. These dependencies are written
as follows:
BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR
This can be read as "BIKE_MODEL multidetermines
MANUF_YEAR" and "BIKE_MODEL multidetermines COLOR".
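Whether a multivalued dependency holds in a given table can be tested with the usual "swap" rule: for any two rows that agree on X, the row that takes its Y values from the first and its remaining values from the second must also be present. The function name mvd_holds is an assumption, and the sketch uses only the M3001 and M4006 rows of the bike table.

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on a list of dict rows; Z is every other attribute."""
    attrs = list(rows[0].keys())
    z = [a for a in attrs if a not in x and a not in y]
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Mix the rows: X and Y values from t1, Z values from t2.
                mixed = {a: t1[a] for a in x + y}
                mixed.update({a: t2[a] for a in z})
                if tuple(mixed[a] for a in attrs) not in present:
                    return False
    return True

bikes = [
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "White"},
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "Black"},
    {"BIKE_MODEL": "M4006", "MANUF_YEAR": 2017, "COLOR": "White"},
    {"BIKE_MODEL": "M4006", "MANUF_YEAR": 2017, "COLOR": "Black"},
]

model_to_color = mvd_holds(bikes, ["BIKE_MODEL"], ["COLOR"])
model_to_year = mvd_holds(bikes, ["BIKE_MODEL"], ["MANUF_YEAR"])
color_to_model = mvd_holds(bikes, ["COLOR"], ["BIKE_MODEL"])

print(model_to_color, model_to_year, color_to_model)
```

The third check fails because, for the color White, the row (M3001, 2017, White) would have to exist and it does not.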
Concepts of Normalization - 1NF, 2NF, 3NF,4NF
and 5NF with example.
What is Normalization?
o Normalization is the process of organizing the data in the
database.
o Normalization is used to minimize the redundancy from a
relation or set of relations. It is also used to eliminate
undesirable characteristics like Insertion, Update, and
Deletion Anomalies.
o Normalization divides a larger table into smaller tables and
links them using relationships.
o Normal forms are used to reduce redundancy in database
tables.
Advantages of Normalization :-
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
The First Normal Form – 1NF
For a table to be in the first normal form, it must meet the following
criteria:
a single cell must not hold more than one value (atomicity)
there must be a primary key for identification
no duplicated rows or columns
each column must have only one value for each row in the
table
Examples of 1NF :-
Imagine we're building a restaurant management application. That
application needs to store data about the company's employees
and it starts out by creating the following table of employees:
employee_id name job_code job state_code home_state
E001 Alice J01 Chef 26 Michigan
E001 Alice J02 Waiter 26 Michigan
E002 Bob J02 Waiter 56 Wyoming
E002 Bob J03 Bartender 56 Wyoming
E003 Alice J01 Chef 56 Wyoming
All the entries are atomic and there is a composite primary key
(employee_id, job_code) so the table is in the first normal form
(1NF).
But if you know only someone's employee_id, you can already
determine their name, home_state, and state_code (because they
should be the same person). This means name, home_state,
and state_code depend on employee_id alone, which is only part
of the composite primary key. So, the table is not in 2NF. We
should move them to a separate table to make it 2NF.
The Second Normal Form – 2NF
1NF only eliminates repeating groups, not redundancy. That's
why there is 2NF.
A table is said to be in 2NF if it meets the following criteria:
it's already in 1NF
it has no partial dependency. That is, all non-key attributes are
fully dependent on the whole primary key.
Example of Second Normal Form (2NF):-
employee_roles Table
employee_id job_code
E001 J01
E001 J02
E002 J02
E002 J03
E003 J01
employees Table
employee_id name state_code home_state
E001 Alice 26 Michigan
E002 Bob 56 Wyoming
E003 Alice 56 Wyoming
jobs table
job_code job
J01 Chef
J02 Waiter
J03 Bartender
home_state still depends on state_code, which is not a key of the
employees table. So, if you know the state_code, you can find the
home_state value.
To take this a step further, we should move them to a separate
table to make it 3NF.
The Third Normal Form – 3NF
When a table is in 2NF, partial dependencies have been removed,
but transitive dependencies may remain. A transitive dependency
means a non-prime attribute (an attribute that is not part of
any candidate key) depends on another non-prime attribute.
This is what the third normal form (3NF) eliminates.
So, for a table to be in 3NF, it must:
be in 2NF
have no transitive dependency.
Example of Third Normal Form (3NF):-
employee_roles Table
employee_id job_code
E001 J01
E001 J02
E002 J02
E002 J03
E003 J01
employees Table
employee_id name state_code
E001 Alice 26
E002 Bob 56
E003 Alice 56
jobs Table
job_code job
J01 Chef
J02 Waiter
J03 Bartender
states Table
state_code home_state
26 Michigan
56 Wyoming
Now our database is in 3NF.
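The 3NF design can be written down as a schema and checked by joining the four tables back together. This is a sketch using Python's sqlite3 module; the column types and key declarations are assumptions, while the table names, column names, and rows come from the example. The join should return the five rows of the original single-table design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (state_code INTEGER PRIMARY KEY, home_state TEXT);
CREATE TABLE jobs (job_code TEXT PRIMARY KEY, job TEXT);
CREATE TABLE employees (
    employee_id TEXT PRIMARY KEY,
    name        TEXT,
    state_code  INTEGER REFERENCES states(state_code)
);
CREATE TABLE employee_roles (
    employee_id TEXT REFERENCES employees(employee_id),
    job_code    TEXT REFERENCES jobs(job_code),
    PRIMARY KEY (employee_id, job_code)
);
INSERT INTO states VALUES (26, 'Michigan'), (56, 'Wyoming');
INSERT INTO jobs VALUES ('J01', 'Chef'), ('J02', 'Waiter'), ('J03', 'Bartender');
INSERT INTO employees VALUES
    ('E001', 'Alice', 26), ('E002', 'Bob', 56), ('E003', 'Alice', 56);
INSERT INTO employee_roles VALUES
    ('E001', 'J01'), ('E001', 'J02'), ('E002', 'J02'),
    ('E002', 'J03'), ('E003', 'J01');
""")

# Rebuild the original wide table from the normalized tables.
rebuilt = conn.execute("""
    SELECT e.employee_id, e.name, r.job_code, j.job, e.state_code, s.home_state
    FROM employee_roles r
    JOIN employees e ON e.employee_id = r.employee_id
    JOIN jobs j      ON j.job_code = r.job_code
    JOIN states s    ON s.state_code = e.state_code
    ORDER BY e.employee_id, r.job_code
""").fetchall()

for row in rebuilt:
    print(row)
```

Each fact now lives in one place: renaming a job or correcting a state name is a single-row update, and the join still reproduces the full picture.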
Fourth normal form (4NF)
o A relation is in 4NF if it is in Boyce-Codd normal form
and has no multi-valued dependency.
o For a dependency A →→ B, if a single value of A is associated
with multiple values of B, independently of the other attributes,
then the dependency is a multi-valued dependency.
Example:-
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY
are two independent attributes. Hence, there is no relationship
between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and
Singing. So there is a multi-valued dependency on STU_ID,
which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into
two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
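The decomposition is just a pair of projections, which can be sketched in plain Python (the variable names are illustrative). The script also joins the two projections back on STU_ID to show what the combined table looks like when courses and hobbies are treated as independent.

```python
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Project STUDENT onto (STU_ID, COURSE) and (STU_ID, HOBBY).
student_course = sorted({(stu_id, course) for stu_id, course, _ in student})
student_hobby = sorted({(stu_id, hobby) for stu_id, _, hobby in student})

# Join the projections back on STU_ID.
rejoined = sorted(
    (stu_id, course, hobby)
    for stu_id, course in student_course
    for other_id, hobby in student_hobby
    if stu_id == other_id
)

print(student_course)
print(student_hobby)
print(len(student), len(rejoined))
```

The rejoined table pairs every course of student 21 with every hobby of student 21, so it has seven rows where the STUDENT table above lists five. Those extra combinations are what independence of COURSE and HOBBY implies, and storing them in one table is the repetition that 4NF avoids.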
Fifth Normal Form/Project-Join Normal Form (5NF)
The fifth normal form (5NF) is also called the Project-Join Normal
Form (PJNF). A relation is in 5NF if it is in 4NF and does not
contain any join dependency that could result in data loss during
the join operation.
Representing the pinnacle of normalization, 5NF involves
decomposing a table into smaller constituent tables.
A relation R is in Fifth Normal Form if and only if every join
dependency in R is implied by the candidate keys of R. A relation
decomposed into smaller relations must have the lossless join
property, which ensures that no spurious or extra tuples are
generated when the relations are reunited through a natural join.
Example – Consider a relation ACP (Agent, Company, Product)
with the rule “if a company makes a product and an agent is an
agent for that company, then he always sells that product for the
company”. Under these circumstances, the ACP table is shown as:
Table ACP
Agent Company Product
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is decomposed into 3 relations, R1, R2, and R3,
shown below, and then rebuilt with natural joins:
Table R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR
Table R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut
Table R3
Company Product
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
The result of the natural join of R1 and R3 over 'Company' (call
it R13), followed by the natural join of R13 and R2 over 'Agent'
and 'Product', will be Table ACP.
Hence, in this example, all the redundancies are eliminated, and
the decomposition of ACP is a lossless join decomposition.
Therefore, the decomposed relations are in 5NF, as they do not
violate the property of lossless join.
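The join order described above can be traced in a few lines of Python, treating each relation as a set of tuples (a sketch; the variable names are illustrative). It shows that joining only two of the projections produces a spurious tuple, and that the third projection removes it again.

```python
acp = {
    ("A1", "PQR", "Nut"),
    ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
}

# The three projections of ACP.
r1 = {(agent, company) for agent, company, product in acp}
r2 = {(agent, product) for agent, company, product in acp}
r3 = {(company, product) for agent, company, product in acp}

# Natural join of R1 and R3 over Company.
r13 = {
    (agent, company, product)
    for agent, company in r1
    for other_company, product in r3
    if company == other_company
}

# Natural join of R13 and R2 over Agent and Product.
rejoined = {row for row in r13 if (row[0], row[2]) in r2}

spurious = r13 - acp
print(sorted(spurious))
print(rejoined == acp)
```

The intermediate result R13 contains (A2, PQR, Bolt), which is not in ACP; only the join with R2 filters it out. That is why ACP needs all three projections and cannot be split losslessly into just R1 and R3.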
Functional Dependency in DBMS
What is Functional Dependency?
A functional dependency occurs when one attribute uniquely
determines another attribute within a relation. It is a constraint
that describes how attributes in a table relate to each other. If
attribute A functionally determines attribute B, we write this as
A → B.
Types of Functional Dependencies in DBMS
1) Trivial functional dependency
2) Non-Trivial functional dependency
3) Multivalued functional dependency
4) Transitive functional dependency
1. Trivial Functional Dependency
In a trivial functional dependency, the dependent is always a
subset of the determinant, i.e. if X → Y and Y is a subset of X,
then it is called a trivial functional dependency.
Example:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, {roll_no, name} → name is a trivial functional dependency,
since the dependent name is a subset of determinant
set {roll_no, name}. Similarly, roll_no → roll_no is also an
example of trivial functional dependency.
2. Non-trivial Functional Dependency
In a non-trivial functional dependency, the dependent is not a
subset of the determinant, i.e. if X → Y and Y is not a
subset of X, then it is called a non-trivial functional dependency.
Example:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, roll_no → name is a non-trivial functional dependency,
since the dependent name is not a subset
of determinant roll_no. Similarly, {roll_no, name} → age is also
a non-trivial functional dependency, since age is not a subset of
{roll_no, name}
3. Multivalued Functional Dependency
In a multivalued functional dependency, the attributes of the
dependent set are not dependent on each other, i.e. if a → {b,
c} and there exists no functional dependency between b and c,
then it is called a multivalued functional dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19
Here, roll_no → {name, age} is a multivalued functional
dependency, since the dependents name and age are not
dependent on each other (i.e. neither name → age nor
age → name holds).
4. Transitive Functional Dependency
In a transitive functional dependency, the dependent is indirectly
dependent on the determinant, i.e. if a → b and b → c, then
according to the axiom of transitivity, a → c. This is a transitive
functional dependency.
For example,
enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
Here, enrol_no → dept and dept → building_no. Hence,
according to the axiom of transitivity, enrol_no → building_no is
a valid functional dependency. This is an indirect functional
dependency, hence called Transitive functional dependency.
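A functional dependency can be tested against a table by checking that no two rows agree on the determinant but differ on the dependent. The sketch below applies such a check (the helper name fd_holds is an assumption) to the enrolment table above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and
        # returns the stored value on every later occurrence.
        if seen.setdefault(key, value) != value:
            return False
    return True

students = [
    {"enrol_no": 42, "name": "abc", "dept": "CO", "building_no": 4},
    {"enrol_no": 43, "name": "pqr", "dept": "EC", "building_no": 2},
    {"enrol_no": 44, "name": "xyz", "dept": "IT", "building_no": 1},
    {"enrol_no": 45, "name": "abc", "dept": "EC", "building_no": 2},
]

enrol_to_dept = fd_holds(students, ["enrol_no"], ["dept"])
dept_to_building = fd_holds(students, ["dept"], ["building_no"])
enrol_to_building = fd_holds(students, ["enrol_no"], ["building_no"])
name_to_dept = fd_holds(students, ["name"], ["dept"])

print(enrol_to_dept, dept_to_building, enrol_to_building, name_to_dept)
```

The last check fails because the name abc appears with both CO and EC. A sample table can only disprove a dependency: passing the check shows that the current rows are consistent with it, not that it is a rule of the schema.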