Unit 1 Study Material
Database
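What is a Database?
A database is an organized collection of related data, stored so that it can be easily accessed, managed, and updated. A relational database is made up of the following components: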
1. Tables: The core structure of a database, where data is stored in rows and columns. Each
table represents an entity (e.g., customers, orders).
2. Fields (Columns): Attributes or properties of the entity. For example, in a customer
table, fields might include CustomerID, Name, Email, and PhoneNumber.
3. Records (Rows): Individual entries in a table. Each record is a unique instance of the
entity represented by the table.
4. Primary Key: A unique identifier for each record in a table. The primary key ensures
that each record can be uniquely identified.
5. Foreign Key: A field in one table that uniquely identifies a row in another table. It is
used to establish a relationship between two tables.
Real-world Example:
In a library database, the Books table might have a record for each book in the library, the Members
table a record for each library member, and the Loans table a record for each book loan transaction.
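The library example can be sketched with Python's built-in sqlite3 module and an in-memory database. This is a minimal illustration under assumptions: the column names (BookID, Title, MemberID, LoanID, LoanDate) and the sample rows are hypothetical, because the original table layout is not available. It shows the primary keys, the two foreign keys in Loans, and how a foreign key rejects a loan for a book that does not exist.

```python
import sqlite3

# In-memory database for illustration; all column names and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE Books (
    BookID INTEGER PRIMARY KEY,    -- primary key: identifies each book
    Title  TEXT NOT NULL
);
CREATE TABLE Members (
    MemberID INTEGER PRIMARY KEY,  -- primary key: identifies each member
    Name     TEXT NOT NULL
);
CREATE TABLE Loans (
    LoanID   INTEGER PRIMARY KEY,                            -- one row per loan
    BookID   INTEGER NOT NULL REFERENCES Books(BookID),      -- foreign key
    MemberID INTEGER NOT NULL REFERENCES Members(MemberID),  -- foreign key
    LoanDate TEXT NOT NULL
);
INSERT INTO Books   VALUES (1, 'Intro to Databases');
INSERT INTO Members VALUES (1, 'Asha');
INSERT INTO Loans   VALUES (1, 1, 1, '2024-01-15');
""")

# Which member borrowed which book? Follow the foreign keys with joins.
loans = conn.execute("""
    SELECT m.Name, b.Title
    FROM Loans AS l
    JOIN Books   AS b ON b.BookID   = l.BookID
    JOIN Members AS m ON m.MemberID = l.MemberID
""").fetchall()
print(loans)  # [('Asha', 'Intro to Databases')]

# A loan for a book that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Loans VALUES (2, 99, 1, '2024-01-20')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```

Other database systems enforce foreign keys by default; the PRAGMA line is specific to SQLite.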
Relationships
Types of Relationships:
1. One-to-One (1:1): Each record in Table A is related to one and only one record in Table
B. This type of relationship is less common and usually implemented when two entities
have a one-to-one correspondence.
o Example: In a user profile system, each user might have one unique profile. Thus,
the Users table and the Profiles table might have a one-to-one relationship.
2. One-to-Many (1:N): A single record in Table A can be associated with multiple records in Table B. This is
the most common type of relationship.
3. Many-to-Many (M:N): Multiple records in Table A can be related to multiple records in Table B. This type of
relationship usually requires a junction table to manage the associations.
Relationships in Databases
One-to-One (1:1) Relationship
A one-to-one relationship exists when a single record in one table is related to a single record in
another table. This type of relationship is used when each record in the first table has one and
only one corresponding record in the second table, and vice versa. One-to-one relationships are
less common compared to other types but can be useful for organizing data into more
manageable pieces.
Example: Consider a scenario in a user management system where each user has a unique
profile.
Users Table:
o Fields: UserID (Primary Key), Username, Password, Email
Profiles Table:
o Fields: ProfileID (Primary Key), UserID (Foreign Key), FirstName, LastName,
DateOfBirth, Address
Here, each user in the Users table has a corresponding profile in the Profiles table. The
UserID in the Profiles table is a foreign key that references the Users table; declaring it UNIQUE
(or making it the primary key of Profiles instead of ProfileID) ensures that each profile
corresponds to exactly one user.
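A minimal sketch of the Users/Profiles example in SQLite, assuming the UNIQUE-foreign-key approach (the sample rows are hypothetical). The UNIQUE constraint on Profiles.UserID is what turns an ordinary foreign key into a one-to-one link: a second profile for the same user is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign-key checks in SQLite
conn.executescript("""
CREATE TABLE Users (
    UserID   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL,
    Password TEXT NOT NULL,
    Email    TEXT
);
CREATE TABLE Profiles (
    ProfileID   INTEGER PRIMARY KEY,
    UserID      INTEGER NOT NULL UNIQUE REFERENCES Users(UserID),  -- UNIQUE makes it 1:1
    FirstName   TEXT,
    LastName    TEXT,
    DateOfBirth TEXT,
    Address     TEXT
);
INSERT INTO Users VALUES (1, 'rahul', 'secret', 'rahul@example.com');
INSERT INTO Profiles (ProfileID, UserID, FirstName) VALUES (10, 1, 'Rahul');
""")

# A second profile for the same user violates the UNIQUE constraint on UserID.
try:
    conn.execute("INSERT INTO Profiles (ProfileID, UserID, FirstName) VALUES (11, 1, 'Copy')")
    second_profile_rejected = False
except sqlite3.IntegrityError:
    second_profile_rejected = True

print(second_profile_rejected)  # True
```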
One-to-Many (1:N) Relationship
A one-to-many relationship exists when a single record in one table is related to multiple records
in another table. This is the most common type of relationship used in databases.
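Example: In a school database, each teacher can teach many students, while each student is linked to one teacher.
Teachers Table: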
o Fields: TeacherID (Primary Key), Name, Subject
Students Table:
o Fields: StudentID (Primary Key), Name, TeacherID (Foreign Key)
Here, each teacher in the Teachers table can have multiple students in the Students table. The
TeacherID in the Students table acts as a foreign key linking students to their respective
teacher.
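The Teachers/Students tables can be sketched in SQLite as follows (teacher names, subjects and students are hypothetical sample data). The foreign key sits on the "many" side, and a join grouped by teacher counts how many students each teacher has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Teachers (
    TeacherID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    Subject   TEXT
);
CREATE TABLE Students (
    StudentID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    TeacherID INTEGER REFERENCES Teachers(TeacherID)  -- the "many" side holds the foreign key
);
INSERT INTO Teachers VALUES (1, 'Meena', 'Mathematics'), (2, 'Kumar', 'Physics');
INSERT INTO Students VALUES (101, 'Rahul', 1), (102, 'Harsh', 1), (103, 'Aishu', 2);
""")

# One teacher row matches many student rows through Students.TeacherID.
per_teacher = conn.execute("""
    SELECT t.Name, COUNT(s.StudentID)
    FROM Teachers AS t
    LEFT JOIN Students AS s ON s.TeacherID = t.TeacherID
    GROUP BY t.TeacherID, t.Name
    ORDER BY t.TeacherID
""").fetchall()
print(per_teacher)  # [('Meena', 2), ('Kumar', 1)]
```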
Many-to-Many (M:N) Relationship
A many-to-many relationship exists when multiple records in one table are related to multiple
records in another table. This relationship is typically implemented using a junction table.
Example: In a course enrollment system, students can enroll in multiple courses, and each
course can have multiple students.
Students Table:
o Fields: StudentID (Primary Key), Name
Courses Table:
o Fields: CourseID (Primary Key), CourseName
Enrollments Table (Junction Table):
o Fields: EnrollmentID (Primary Key), StudentID (Foreign Key), CourseID
(Foreign Key)
The Enrollments table links students to courses, allowing each student to enroll in multiple
courses and each course to have multiple students.
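A minimal SQLite sketch of the enrollment example, with hypothetical sample rows. The UNIQUE (StudentID, CourseID) constraint is an added assumption that stops the same student from being enrolled in the same course twice. Querying the junction table from either side answers both "which courses does this student take?" and "which students are in this course?".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Courses  (CourseID  INTEGER PRIMARY KEY, CourseName TEXT NOT NULL);
CREATE TABLE Enrollments (                          -- junction table
    EnrollmentID INTEGER PRIMARY KEY,
    StudentID    INTEGER NOT NULL REFERENCES Students(StudentID),
    CourseID     INTEGER NOT NULL REFERENCES Courses(CourseID),
    UNIQUE (StudentID, CourseID)                    -- assumption: no duplicate enrollments
);
INSERT INTO Students VALUES (1, 'Rahul'), (2, 'Harsh');
INSERT INTO Courses  VALUES (10, 'Database Systems'), (20, 'Java');
INSERT INTO Enrollments (StudentID, CourseID) VALUES (1, 10), (1, 20), (2, 10);
""")

def courses_of(student_id):
    """All courses a student is enrolled in (one side of the M:N relationship)."""
    rows = conn.execute("""
        SELECT c.CourseName FROM Enrollments AS e
        JOIN Courses AS c ON c.CourseID = e.CourseID
        WHERE e.StudentID = ? ORDER BY c.CourseName""", (student_id,))
    return [name for (name,) in rows]

def students_in(course_id):
    """All students enrolled in a course (the other side)."""
    rows = conn.execute("""
        SELECT s.Name FROM Enrollments AS e
        JOIN Students AS s ON s.StudentID = e.StudentID
        WHERE e.CourseID = ? ORDER BY s.Name""", (course_id,))
    return [name for (name,) in rows]

print(courses_of(1))    # ['Database Systems', 'Java']
print(students_in(10))  # ['Harsh', 'Rahul']
```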
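What is an Entity?
An entity is a real-world object or concept, such as a student, a course, or a book, about which data is stored in a database.
Database Management System (DBMS)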
A Database Management System (DBMS) is software that allows users to define, create,
maintain, and control access to the database. It provides a systematic and organized way to store,
manage, and retrieve information.
The relational data model is a way to structure and query data using tables (relations). Each table
consists of rows (tuples) and columns (attributes), and each table has a unique key to identify its
rows.
Key Concepts:
1. Tables (Relations): Data is stored in tables, where each table represents an entity.
2. Rows (Tuples): Each row in a table represents a unique record.
3. Columns (Attributes): Columns represent the properties or characteristics of the entity.
4. Primary Key: A unique identifier for each row in a table.
5. Foreign Key: A column that creates a relationship between two tables.
Integrity Rules
Integrity rules are constraints applied to ensure the accuracy and consistency of data within a
relational database.
1. Entity Integrity: Ensures that each table has a primary key and that the key is unique and
not null.
2. Referential Integrity: Ensures that a foreign key value always points to an existing,
valid record in another table.
3. Domain Integrity: Ensures that all entries in a column are of the same data type and
conform to a defined range of values.
Example:
Entity Integrity: In the Students table, StudentID must be unique and not null.
Referential Integrity: In the Enrollments table, StudentID must match a valid
StudentID in the Students table.
Domain Integrity: The Age column in the Students table must contain only integer
values within a specific range.
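The three integrity rules can be expressed as table constraints. A minimal SQLite sketch, assuming an Age range of 16 to 60 (the text only says "a specific range") and hypothetical sample rows: the primary key with NOT NULL enforces entity integrity, REFERENCES enforces referential integrity, and CHECK enforces domain integrity. The helper is_rejected is a hypothetical name used only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for referential integrity in SQLite
conn.executescript("""
CREATE TABLE Students (
    StudentID INT NOT NULL PRIMARY KEY,               -- entity integrity: unique, not null
    Name      TEXT NOT NULL,
    Age       INTEGER CHECK (Age BETWEEN 16 AND 60)   -- domain integrity (assumed range)
);
CREATE TABLE Enrollments (
    EnrollmentID INTEGER PRIMARY KEY,
    StudentID    INT NOT NULL REFERENCES Students(StudentID),  -- referential integrity
    CourseID     INT NOT NULL
);
INSERT INTO Students VALUES (1, 'Rahul', 22);
""")

def is_rejected(sql):
    """Return True if the statement violates one of the integrity constraints."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

duplicate_key = is_rejected("INSERT INTO Students VALUES (1, 'Harsh', 18)")    # same StudentID
null_key      = is_rejected("INSERT INTO Students VALUES (NULL, 'Mano', 23)")  # NULL StudentID
orphan        = is_rejected("INSERT INTO Enrollments (StudentID, CourseID) VALUES (99, 10)")
out_of_range  = is_rejected("INSERT INTO Students VALUES (2, 'Vibin', 150)")   # Age outside range
valid_row     = is_rejected("INSERT INTO Students VALUES (3, 'Aishu', 21)")    # satisfies all rules

print(duplicate_key, null_key, orphan, out_of_range, valid_row)  # True True True True False
```

In SQLite, a column declared exactly as INTEGER PRIMARY KEY silently fills in a value for NULL, so this sketch uses INT NOT NULL PRIMARY KEY to make the NULL check visible.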
Theoretical relational languages are abstract languages used to define and manipulate data in a
relational model. They form the basis for practical query languages like SQL.
Example:
Relational Algebra: To find names of students enrolled in the 'Database Systems' course:
o π Name (σ CourseName='Database Systems' (Students ⨝ Enrollments ⨝ Courses))
Relational Calculus: To find names of students older than 20:
o {S.Name | Students(S) AND S.Age > 20}
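To make the operators concrete, here is a small sketch that models relations as Python lists of dicts and implements σ (select), π (project) and ⨝ (natural join) directly; the table contents are hypothetical. The relational calculus query maps naturally onto a set comprehension, which, like the calculus, describes what to return rather than how to compute it.

```python
# Relations are modelled as lists of dicts (attribute name -> value).
# All table contents below are hypothetical sample data.

def select(relation, predicate):
    """σ: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, *attrs):
    """π: keep only the named attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result

def natural_join(r, s):
    """⨝: combine tuples that agree on every attribute the two relations share."""
    result = []
    for t1 in r:
        for t2 in s:
            common = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})
    return result

students = [
    {"StudentID": 1, "Name": "Rahul", "Age": 22},
    {"StudentID": 2, "Name": "Harsh", "Age": 18},
    {"StudentID": 3, "Name": "Aishu", "Age": 21},
]
courses = [
    {"CourseID": 10, "CourseName": "Database Systems"},
    {"CourseID": 20, "CourseName": "Java"},
]
enrollments = [
    {"EnrollmentID": 1, "StudentID": 1, "CourseID": 10},
    {"EnrollmentID": 2, "StudentID": 2, "CourseID": 10},
    {"EnrollmentID": 3, "StudentID": 3, "CourseID": 20},
]

# Relational algebra: π Name (σ CourseName='Database Systems' (Students ⨝ Enrollments ⨝ Courses))
joined = natural_join(natural_join(students, enrollments), courses)
db_students = project(select(joined, lambda t: t["CourseName"] == "Database Systems"), "Name")

# Relational calculus: {S.Name | Students(S) AND S.Age > 20}
older_than_20 = {s["Name"] for s in students if s["Age"] > 20}

print(db_students)  # [{'Name': 'Rahul'}, {'Name': 'Harsh'}]
print(older_than_20)
```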
Database design is the process of creating a database structure that effectively stores and
manages information. Data modeling is a crucial part of this process, where real-world entities
and their relationships are represented in a structured format.
An ER model is a high-level data model that captures the entities (things or objects) within the
scope of an information system, and the relationships among them.
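Relational Model
In the relational model, each entity from the ER model becomes a table (relation), and its attributes become the table's columns. For example, a Student entity with the attributes Student_ID, Name, and Age maps to the following table:
SQL
CREATE TABLE Students (
    Student_ID INT PRIMARY KEY,
    Name VARCHAR(50),
    Age INT
);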
Before diving into normalization, it's essential to grasp the concept of dependency in relational
databases.
Functional Dependency:
A relationship between two sets of attributes X and Y in which each value of X determines exactly one value of Y.
Notation: X -> Y (X determines Y)
Example: In a table with attributes (StudentID, Name, CourseID, CourseName), StudentID
determines Name (StudentID -> Name).
Types of Dependencies:
Full Functional Dependency: A non-key attribute depends on the whole primary key and not on any proper
subset of it.
Partial Dependency: A non-key attribute is dependent on only part of the primary key.
Transitive Dependency: A non-key attribute is dependent on another non-key attribute.
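A functional dependency X -> Y can be tested against a table instance: it is violated as soon as two rows agree on X but differ on Y. The sketch below uses hypothetical rows for the (StudentID, Name, CourseID, CourseName) table. Keep in mind that a sample of data can only disprove a dependency; whether it truly holds comes from the meaning of the data. The function name fd_holds is hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in a given table instance.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    The dependency holds when any two rows that agree on lhs also agree on rhs.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:  # same X value, different Y value
            return False
    return True

# Hypothetical sample rows for the (StudentID, Name, CourseID, CourseName) table.
rows = [
    {"StudentID": 1, "Name": "Rahul", "CourseID": "C1", "CourseName": "DBMS"},
    {"StudentID": 1, "Name": "Rahul", "CourseID": "C2", "CourseName": "Java"},
    {"StudentID": 2, "Name": "Harsh", "CourseID": "C1", "CourseName": "DBMS"},
]

print(fd_holds(rows, ("StudentID",), ("Name",)))       # True
print(fd_holds(rows, ("StudentID",), ("CourseID",)))   # False
print(fd_holds(rows, ("CourseID",), ("CourseName",)))  # True
# With primary key (StudentID, CourseID), Name already depends on StudentID alone:
# that is a partial dependency.
print(fd_holds(rows, ("StudentID", "CourseID"), ("Name",)))  # True
```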
Now let's understand the types of Normal forms with the help of examples.
First Normal Form (1NF): A table is in First Normal Form if it is atomic, that is, every cell holds exactly one value.
Here, atomicity means that a single cell cannot hold multiple values; it must hold only a single-valued
attribute.
First Normal Form disallows multi-valued attributes, composite attributes, and combinations of the two.
Now you will understand the First Normal Form with the help of an example.
Below is a students’ record table that has information about student roll number, student name,
student course, and age of the student.
Roll No | Name | Course | Age
1 | Rahul | c/c++ | 22
2 | Harsh | Java | 18
3 | Mano | c/c++ | 23
4 | Vibin | c/c++ | 22
5 | Aishu | Java | 21
In the students' record table, you can see that some cells of the course column hold two values (c/c++). Thus,
the table does not follow the First Normal Form. If you apply the First Normal Form to the above table, you get
the following table as a result.
Roll No | Name | Course | Age
1 | Rahul | C | 22
1 | Rahul | C++ | 22
2 | Harsh | Java | 18
3 | Mano | C | 23
3 | Mano | C++ | 23
4 | Vibin | C | 22
4 | Vibin | C++ | 22
5 | Aishu | Java | 21
By applying the First Normal Form, you achieve atomicity: every cell now holds a single value. Note that Roll No alone no longer identifies a row; the combination of Roll No and Course does.
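The conversion to First Normal Form above is a mechanical split of each multi-valued cell. A small sketch using the rows of the students' record table, assuming "/" is the separator between values; the function name to_first_normal_form is hypothetical.

```python
# Rows of the unnormalized students' record table: (roll_no, name, course, age).
unnormalized = [
    (1, "Rahul", "c/c++", 22),
    (2, "Harsh", "Java", 18),
    (3, "Mano", "c/c++", 23),
    (4, "Vibin", "c/c++", 22),
    (5, "Aishu", "Java", 21),
]

def to_first_normal_form(rows, sep="/"):
    """Emit one row per value of the multi-valued course cell."""
    result = []
    for roll_no, name, courses, age in rows:
        for course in courses.split(sep):
            # capitalize() turns "c" into "C" and "c++" into "C++"; "Java" is unchanged
            result.append((roll_no, name, course.strip().capitalize(), age))
    return result

for row in to_first_normal_form(unnormalized):
    print(row)
```

Each student keeps the same roll number on every row produced from their original record, which is why (Roll No, Course) becomes the identifying pair.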
Before proceeding with the Second Normal Form, get familiar with Candidate Key and Super Key.
Candidate Key
A candidate key is a minimal set of one or more columns that can identify a record uniquely in a table, and you
can use any candidate key as the Primary Key.
1 22 Raju Coimbatore 11
2 23 Rakesh Salem 11
3 24 Leela Erode 22
4 25 Sudhan Madurai 33
5 26 Sam Coimbatore 11
Super Key
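A super key is a set of one or more columns that can identify a record uniquely in a table. Every candidate key
(and therefore the Primary Key) is a minimal super key; adding any extra column to a candidate key still gives a super key.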
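Super keys and candidate keys can be found from the functional dependencies using the attribute closure: a set of columns is a super key if its closure contains every column, and a candidate key if, in addition, no smaller subset of it is a key. The sketch below assumes a hypothetical table Student(StudentID, RollNo, Name, City) in which StudentID and RollNo each identify a student, because the column names of the table above are not given.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+: all attributes determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_super_key(attrs, all_attrs, fds):
    """A super key determines every attribute of the table."""
    return closure(attrs, fds) >= all_attrs

def candidate_keys(all_attrs, fds):
    """Minimal super keys, searched from the smallest attribute sets upwards."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key: a super key, but not minimal
            if is_super_key(attrs, all_attrs, fds):
                keys.append(attrs)
    return keys

# Hypothetical table Student(StudentID, RollNo, Name, City).
columns = {"StudentID", "RollNo", "Name", "City"}
fds = [({"StudentID"}, {"RollNo", "Name", "City"}),
       ({"RollNo"}, {"StudentID"})]

print(candidate_keys(columns, fds))                       # [{'RollNo'}, {'StudentID'}]
print(is_super_key({"StudentID", "Name"}, columns, fds))  # True
print(is_super_key({"Name", "City"}, columns, fds))       # False
```

The brute-force search tries every subset, which is fine for small teaching examples but grows exponentially with the number of columns.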
Second Normal Form (2NF): The first condition for the table to be in Second Normal Form is that
the table has to be in First Normal Form. Second, the table should not have any partial dependency. A partial
dependency exists when a proper subset of a candidate key determines a non-prime attribute. Now
understand the Second Normal Form with the help of an example. Consider the table Location:
cust_id | storeid | store_location
1 | D1 | Coimbatore
2 | D3 | Salem
3 | T1 | Erode
4 | F2 | Madurai
5 | H3 | Trichy
The Location table possesses a composite primary key cust_id, storeid. The non-key attribute is
store_location. In this case, store_location only depends on storeid, which is a part of the primary key.
Hence, this table does not fulfill the second normal form.
To bring the table to Second Normal Form, you need to split the table into two parts. This will give you
the below tables:
1 D1 1 Coimbatore
2 D3 2 Salem
3 T1 3 Erode
4 F2 4 Madurai
5 H3 5 Trichy
As you have removed the partial functional dependency from the Location table, the column
store_location now depends entirely on the primary key of its table, storeid. Now that you understand the
1st and 2nd Normal Forms, you will look at the next part of this Normalization in SQL tutorial.
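The split into Second Normal Form can be sketched as two projections of the Location rows. Row 6 below is a hypothetical extra customer of store D1, added only so that the duplicated store location becomes visible. After the split, each store's location is stored once, and joining the two tables back on storeid reproduces every original row, so no information is lost.

```python
# Rows of the Location table: (cust_id, storeid, store_location).
# Row 6 is a hypothetical extra customer of store D1, added to make the redundancy visible.
location = [
    (1, "D1", "Coimbatore"),
    (2, "D3", "Salem"),
    (3, "T1", "Erode"),
    (4, "F2", "Madurai"),
    (5, "H3", "Trichy"),
    (6, "D1", "Coimbatore"),
]

# Split along the partial dependency storeid -> store_location.
customer_store = sorted({(cust, sid) for cust, sid, _ in location})
stores = sorted({(sid, loc) for _, sid, loc in location})  # each store's location stored once

# Joining the two tables back on storeid recovers every original row (lossless split).
location_of = dict(stores)
rejoined = sorted((cust, sid, location_of[sid]) for cust, sid in customer_store)

print(len(stores))                   # 5
print(rejoined == sorted(location))  # True
```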
Third Normal Form (3NF): The first condition for the table to be in Third Normal Form is that the table
should be in the Second Normal Form.
The second condition is that there should be no transitive dependency for non-prime attributes,
which indicates that non-prime attributes (which are not a part of the candidate key) should not
depend on other non-prime attributes in a table. Therefore, a transitive dependency is a functional
dependency in which A → C (A determines C) indirectly, because of A → B and B → C (where it is not
the case that B → A).
Third Normal Form reduces data duplication and helps to maintain data
integrity.
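Below is a student table that has student id, student name, subject id, subject name, and address of the
student as its columns. In this table, stu_id determines subid, and subid determines the subject name (sub),
so sub depends on stu_id only transitively. To bring the table into Third Normal Form, split it into two
tables: the first with stu_id, name, subid, and address, and the second with subid and sub.
As you can see in both tables, all the non-key attributes are now fully functionally dependent only on
the primary key. In the first table, the columns name, subid, and address depend only on stu_id. In the
second table, sub depends only on subid.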
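The two Third Normal Form tables can be sketched in SQLite with hypothetical sample rows. Because each subject name now lives in exactly one row of Subject, renaming a subject is a single-row update, and a join on subid rebuilds the original wide table whenever it is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Subject (
    subid TEXT PRIMARY KEY,
    sub   TEXT NOT NULL                 -- depends only on subid
);
CREATE TABLE Student (
    stu_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    subid   TEXT REFERENCES Subject(subid),
    address TEXT                        -- name, subid and address depend only on stu_id
);
INSERT INTO Subject VALUES ('S1', 'DBMS'), ('S2', 'Java');
INSERT INTO Student VALUES (1, 'Rahul', 'S1', 'Coimbatore'),
                           (2, 'Harsh', 'S1', 'Salem'),
                           (3, 'Aishu', 'S2', 'Erode');
""")

# A subject name is stored once, so renaming it touches a single row.
updated = conn.execute(
    "UPDATE Subject SET sub = 'Database Management Systems' WHERE subid = 'S1'").rowcount

# Joining on subid rebuilds the original wide table.
rows = conn.execute("""
    SELECT s.stu_id, s.name, s.subid, j.sub, s.address
    FROM Student AS s JOIN Subject AS j ON j.subid = s.subid
    ORDER BY s.stu_id
""").fetchall()

print(updated)  # 1
for row in rows:
    print(row)
```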
Boyce-Codd Normal Form (BCNF): Boyce-Codd Normal Form is also known as 3.5 NF. It is a stricter version of 3NF
and was developed by Raymond F. Boyce and Edgar F. Codd to tackle certain types of anomalies that are not
resolved by 3NF.
The first condition for the table to be in Boyce-Codd Normal Form is that the table should be in the third
normal form. Secondly, for every non-trivial functional dependency in the table, the Left-Hand Side (LHS)
must be a super key of that table.
For example:
You have a functional dependency X → Y. For this functional dependency, X has to be a super key of the
given table.
Consider a table with the columns student_id, subject, and professor, in which student_id and subject together
form the primary key because, using student_id and subject, you can determine all the table columns.
Another important point to note here is that one professor teaches only one subject, but one
subject may have two professors.
This shows there is a dependency between subject and professor: subject depends on the
professor's name (professor → subject).
The table is in 1st Normal form as all the column names are unique, all values are atomic, and all the
values stored in a particular column are of the same domain.
The table also satisfies the 2nd Normal Form, as there is no Partial Dependency.
And, there is no Transitive Dependency; hence, the table also satisfies the 3rd Normal Form.
This table follows all the Normal forms except the Boyce Codd Normal Form.
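As you can see, stuid and subject form the primary key, which means the subject attribute is a prime
attribute.
The table does not satisfy BCNF because of the dependency professor → subject: its left-hand side, professor,
is a non-prime attribute and not a super key. (The table still satisfies 3NF, because subject, the right-hand
side, is a prime attribute.)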
To transform the table into BCNF, divide it into two parts. The first table holds stuid, which already
exists, and a newly created column profid.
The second table has the columns profid, subject, and professor, which satisfies BCNF.
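The BCNF test can be automated: for every non-trivial dependency X → Y, compute the closure of X and check that it covers all columns of the table. The sketch below applies this to the student/subject/professor example and to the two decomposed tables. It assumes that profid and professor identify each other one-to-one, which the text implies but does not state, and it lists the dependencies of each decomposed table by hand rather than deriving them.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs X -> Y whose left-hand side X is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs                          # skip trivial FDs
            and not closure(lhs, fds) >= attributes]   # X is not a super key

# Original table: (student_id, subject) -> professor and professor -> subject.
table = {"student_id", "subject", "professor"}
table_fds = [({"student_id", "subject"}, {"professor"}),
             ({"professor"}, {"subject"})]

# Decomposed tables; assumes profid and professor identify each other one-to-one.
prof = {"profid", "professor", "subject"}
prof_fds = [({"profid"}, {"professor", "subject"}),
            ({"professor"}, {"profid"})]
student_prof = {"student_id", "profid"}

print(bcnf_violations(table, table_fds))  # [({'professor'}, {'subject'})]
print(bcnf_violations(prof, prof_fds))    # []
print(bcnf_violations(student_prof, []))  # []
```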
Denormalization:
Denormalization is a database optimization technique in which we add redundant data to one or more
tables. This can help us avoid costly joins in a relational database. Note that denormalization does not
mean ‘reversing normalization’ or ‘not to normalize’. It is an optimization technique that is applied
after normalization.
Basically, the process of taking a normalized schema and making it non-normalized is called
denormalization, and designers use it to tune the performance of systems to support time-critical
operations.
For example, in a normalized database, we might have a Courses table and a Teachers table. Each
entry in Courses would store the teacherID for a Course but not the teacherName. When we need to
retrieve a list of all Courses with the Teacher’s name, we would do a join between these two tables.
In some ways, this is great; if a teacher changes his or her name, we only have to update the name in
one place.
The drawback is that if tables are large, we may spend an unnecessarily long time doing joins on
tables.
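The Courses/Teachers trade-off can be sketched in SQLite with hypothetical rows. The normalized design needs a join to list courses with teacher names; the denormalized table (a hypothetical CoursesDenorm) copies teacherName into every course row and answers with a single-table query. The cost appears on update: renaming the teacher changes one row in the normalized design but every copied row in the denormalized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: the teacher's name is stored once, in Teachers.
CREATE TABLE Teachers (teacherID INTEGER PRIMARY KEY, teacherName TEXT NOT NULL);
CREATE TABLE Courses  (courseID INTEGER PRIMARY KEY, title TEXT NOT NULL,
                       teacherID INTEGER REFERENCES Teachers(teacherID));

-- Denormalized copy: teacherName is duplicated into every course row.
CREATE TABLE CoursesDenorm (courseID INTEGER PRIMARY KEY, title TEXT NOT NULL,
                            teacherID INTEGER, teacherName TEXT NOT NULL);

INSERT INTO Teachers VALUES (1, 'Meena');
INSERT INTO Courses  VALUES (10, 'DBMS', 1), (20, 'Java', 1);
INSERT INTO CoursesDenorm VALUES (10, 'DBMS', 1, 'Meena'), (20, 'Java', 1, 'Meena');
""")

# Normalized read: needs a join.
normalized = conn.execute("""
    SELECT c.title, t.teacherName FROM Courses AS c
    JOIN Teachers AS t ON t.teacherID = c.teacherID
    ORDER BY c.courseID
""").fetchall()

# Denormalized read: a single-table query, no join.
denormalized = conn.execute(
    "SELECT title, teacherName FROM CoursesDenorm ORDER BY courseID").fetchall()

# Renaming the teacher: one row when normalized, every copy when denormalized.
n_rows = conn.execute(
    "UPDATE Teachers SET teacherName = 'Meena R' WHERE teacherID = 1").rowcount
d_rows = conn.execute(
    "UPDATE CoursesDenorm SET teacherName = 'Meena R' WHERE teacherID = 1").rowcount

print(normalized == denormalized)  # True
print(n_rows, d_rows)              # 1 2
```

If the second UPDATE were forgotten, the two designs would disagree about the teacher's name, which is the inconsistency risk listed under the cons below.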
Pros of Denormalization:
1. Retrieving data is faster since we do fewer joins.
2. Queries to retrieve data can be simpler (and therefore less likely to have bugs),
since we need to look at fewer tables.
Cons of Denormalization:
1. Updates and inserts are more expensive.
2. Denormalization can make update and insert code harder to write.
3. Data may be inconsistent.
4. Data redundancy necessitates more storage.
In a system that demands scalability, like that of any major tech company, we almost always use
elements of both normalized and denormalized databases.
Advantages of Denormalization:
Improved Query Performance: Denormalization can improve query performance by reducing the
number of joins required to retrieve data.
Reduced Complexity: By combining related data into fewer tables, denormalization can simplify the
database schema and make it easier to manage.
Easier Maintenance and Updates: With fewer tables, the schema can be easier to maintain, although every
copy of a redundant value must still be updated together.
Improved Read Performance: Denormalization can improve read performance by making it easier to
access data.
Better Scalability: Denormalization can improve the scalability of a database system by reducing the
number of tables and improving the overall performance.
Disadvantages of Denormalization:
Reduced Data Integrity: By adding redundant data, denormalization can reduce data integrity and
increase the risk of inconsistencies.
Increased Complexity: While denormalization can simplify the database schema in some cases, it can
also increase complexity by introducing redundant data.
Increased Storage Requirements: By adding redundant data, denormalization can increase storage
requirements and increase the cost of maintaining the database.
Increased Update and Maintenance Complexity: Denormalization can increase the complexity of
updating and maintaining the database by introducing redundant data.
Limited Flexibility: Denormalization can reduce the flexibility of a database system by introducing
redundant data and making it harder to modify the schema.
Normalization vs. Denormalization:
1. In normalization, data are stored in a set schema to maintain non-redundancy and consistency. In
denormalization, data are combined to execute queries quickly.