Database Models & Architecture Guide
A data model describes how a database organizes data for storage, retrieval, and management. It defines how data is connected, stored, and manipulated. There are different types of data models used in databases, with the network and relational models being two of the most common.
1. Network Data Model: The network data model represents data in a graph-like structure, where data entities (called records) are represented as nodes, and the relationships between them are represented as edges or links. This model supports complex relationships such as many-to-many, and each record can have multiple parent and child records. The relationships between data elements are maintained through pointers.
Network Data Model Characteristics:
• Entities (Nodes): Represent records or objects in the database.
• Relationships (Edges): Represent the connections between the entities.
• Pointers: Data records point to other related records.
Example: Consider an employee database with the following entities:
• Employee (with attributes like EmployeeID, Name, etc.)
• Department (with attributes like DepartmentID, Name, etc.)
• Project (with attributes like ProjectID, ProjectName, etc.)
In a network model:
• An employee can work in multiple departments and on multiple projects.
• A department can have many employees.
• A project can involve multiple employees.
The relationships are established by pointers. For example, an employee record would contain pointers to the department and project records the employee is associated with.
This kind of model uses sets to define relationships, where each set defines a relationship between a parent (owner) record and multiple child (member) records.
Illustrative Diagram:
Employee <--> Department
|
|
Employee <--> Project
In the network model, each entity can have multiple relationships, forming a graph-like structure of interconnected data.
2. Relational Data Model:The relational data model organizes data into tables (relations). Each table consists of rows
(tuples) and columns (attributes). The relational model is based on set theory and treats the data as a collection of
relations. It emphasizes the use of keys, particularly primary keys (which uniquely identify each record in a table) and
foreign keys (which represent relationships between tables).
Relational Data Model Characteristics:
• Tables: Data is organized into tables, where each table contains rows and columns.
• Primary Key: A unique identifier for each record in a table.
• Foreign Key: A field in a table that refers to the primary key in another table, establishing relationships.
• No Explicit Links: Unlike the network model, there are no pointers or links between records; instead,
relationships are maintained using keys.
Example:
Consider a university database with the following tables:
1. Students:Columns: StudentID (Primary Key), Name, Age, Major
2. Courses:Columns: CourseID (Primary Key), CourseName, Credits
3. Enrollments:Columns: EnrollmentID (Primary Key), StudentID (Foreign Key), CourseID (Foreign Key), Grade
In this example:
• A student can enroll in multiple courses.
• A course can have multiple students enrolled.
• The Enrollments table acts as a junction table to represent the many-to-many relationship between students and courses.
Illustrative Tables:
Students Table:
StudentID Name Age Major
Courses Table:
CourseID CourseName Credits
101 Database 3
102 Algorithms 3
Enrollments Table:
EnrollmentID StudentID CourseID Grade
1 1 101 A
2 2 101 B
3 1 102 A
In the relational model, foreign keys link the Students table to the Enrollments table (StudentID) and the Courses table
to the Enrollments table (CourseID). These keys establish the relationships between tables.
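As a concrete sketch (data types are assumptions; table and column names follow the example above), the same schema could be declared in SQL roughly as follows:
sql
CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    Name      VARCHAR(100),
    Age       INT,
    Major     VARCHAR(100)
);

CREATE TABLE Courses (
    CourseID   INT PRIMARY KEY,
    CourseName VARCHAR(100),
    Credits    INT
);

-- Junction table: the two foreign keys implement the many-to-many relationship.
CREATE TABLE Enrollments (
    EnrollmentID INT PRIMARY KEY,
    StudentID    INT,
    CourseID     INT,
    Grade        CHAR(2),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    FOREIGN KEY (CourseID)  REFERENCES Courses(CourseID)
);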
Key Differences:
• Network Model: More flexible and complex relationships (many-to-many), uses pointers and links between
records.
• Relational Model: Easier to understand, use of tables and keys to define relationships (often simpler and more
standardized).
Q-The three-schema architecture of a Database Management System (DBMS) is a framework that separates the
database system into three levels of abstraction to improve data independence, ensure efficient data management, and
allow users to interact with data in a way that meets their needs. The three levels of this architecture are:
1. Internal Schema (Physical Level)
2. Conceptual Schema (Logical Level)
3. External Schema (View Level)
Each of these schemas serves a specific purpose in defining how the data is stored, how it is structured logically, and
how it is viewed by different users.
1. Internal Schema (Physical Level): The internal schema defines the physical storage structure of the database on the storage medium (e.g., hard drive, SSD). This schema focuses on how the data is stored, indexed, and retrieved efficiently. It deals with the organization of data files, how the data is stored on disk, and how access is optimized.
• It describes the data access paths (indexes, hashing techniques, etc.).
• It deals with performance optimization, including how the data is physically represented and how it is compressed or partitioned.
2. Conceptual Schema (Logical Level): The conceptual schema defines the logical structure of the entire database, independent of how the data is stored physically. It describes the data, relationships, constraints, and operations in the database in a way that is meaningful to the database user, without any concern for physical storage details.
• It includes the entities, attributes, and the relationships between them.
• This level defines all the data elements available in the database, their interrelationships, and the constraints or rules for the data.
3. External Schema (View Level): The external schema represents the way in which individual users or user groups view the data. It defines views of the data that are tailored to the specific needs of different users. A single database may have multiple external schemas, providing different perspectives or subsets of the data.
• It allows users to access only the data they need, without worrying about the underlying structure.
• It supports data independence by ensuring that changes at the internal or conceptual level do not directly impact the external views.
Three-Schema Architecture Diagram: Here's a conceptual representation of the three-schema architecture of a DBMS:
+---------------------+      +---------------------+
| External Schema 1   |      | External Schema 2   |   <- User views
+---------------------+      +---------------------+
            \                      /
             v                    v
          +---------------------+
          |  Conceptual Schema  |   <- Logical level
          +---------------------+
                     |
                     v
          +---------------------+
          |   Internal Schema   |   <- Physical storage
          +---------------------+
Explanation of the Diagram:
1. External Schemas:These are the views that represent how the data is presented to individual users or user
groups. For example, in a university database, one user might have a view of student data, while another might
have a view of course data. These views hide the complexity of the underlying database structure.
2. Conceptual Schema: This is the logical view of the database, which represents all data and the relationships among data elements in an abstract manner. It is independent of how the data is physically stored. It serves as the middle layer that connects the external schemas (user views) with the internal schema (storage).
3. Internal Schema:This is the physical layer, which defines how the data is stored in the system. It specifies
storage formats, indexing methods, and other details related to performance and optimization.
Data Independence: One of the main goals of the three-schema architecture is to provide data independence, which is the ability to change the schema at one level without affecting the schema at other levels.
• Physical Data Independence: Changes to the internal schema (physical storage) should not affect the conceptual schema.
• Logical Data Independence: Changes to the conceptual schema (logical structure) should not affect the external schemas (user views).
Example of Three-Schema Architecture:
Consider a university database:
• External Schema (User Views): A professor might have a view showing student names and grades, while a student might have a view showing their own courses and grades.
• Conceptual Schema (Logical Level): Contains the overall logical structure, such as tables for students, courses, and enrollments, and the relationships between them.
• Internal Schema (Physical Level): Describes how data is physically stored in files, the indexing methods used (e.g., a hash index for faster search), and how the data is laid out on disk.
Q- Strong Entity:A strong entity is an entity that can exist independently and has a unique identifier called a primary
key. This primary key uniquely identifies each instance of the entity. In other words, a strong entity does not rely on any
other entity to be uniquely identified.
Characteristics:
• Independent existence: A strong entity does not depend on any other entity for its identification.
• Primary key: It has a primary key that uniquely identifies each record or instance of the entity.
• No reliance on other entities: It can be represented without referencing other entities.
Example:
Consider a Student entity in a university database:
• Student Entity: The Student entity could have attributes like StudentID (primary key), Name, Age, and Department. The StudentID uniquely identifies each student and does not depend on any other entity, so Student is a strong entity.
Diagram:
+------------------+
| Student |
+------------------+
| StudentID (PK) |
| Name |
| Age |
| Department |
+------------------+
Q-Weak Entity:A weak entity is an entity that cannot be uniquely identified by its own attributes alone. It relies on a
strong entity (also called an owner entity) to help identify it. A weak entity usually does not have a sufficient primary
key by itself, so it is identified by a composite key, which includes the key of the strong entity along with its own partial
key.
Characteristics:
• Dependent existence: A weak entity depends on a strong entity for its identification.
• Partial key: It has a partial key (an attribute that can uniquely identify instances of the weak entity only when combined with the key of the strong entity).
• Identifying relationship: A weak entity is connected to the strong entity via an identifying relationship, often represented by a double diamond in ER diagrams.
Example: Dependent Entity: A Dependent entity could have attributes like DependentName, Relationship, and EmployeeID.
• The DependentName alone is not unique enough to identify the dependent, so we combine it with the EmployeeID (from the Employee entity, which is the strong entity) to uniquely identify the dependent.
• The Dependent entity is a weak entity because it depends on the Employee entity for its identification.
Diagram:
+-------------------+ +--------------------+
| Employee | | Dependent |
+-------------------+ +--------------------+
| EmployeeID (PK) | | DependentName |
| Name | | Relationship |
+-------------------+ | EmployeeID (FK) |
+--------------------+
In this example:
• Dependent depends on Employee for its identification. The combination of DependentName and EmployeeID
forms the composite key for Dependent.
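A minimal SQL sketch of this weak-entity pattern (data types are assumptions; the composite primary key combines the owner's key with the partial key):
sql
CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    Name       VARCHAR(100)
);

-- Dependent is a weak entity: its primary key combines the owner's key
-- (EmployeeID) with its own partial key (DependentName).
CREATE TABLE Dependent (
    EmployeeID    INT,
    DependentName VARCHAR(100),
    Relationship  VARCHAR(50),
    PRIMARY KEY (EmployeeID, DependentName),
    FOREIGN KEY (EmployeeID) REFERENCES Employee(EmployeeID)
);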
Q- Derived Attribute:A derived attribute is an attribute whose value is derived from other attributes in the database.
Instead of being stored directly in the database, it is calculated based on existing data. Derived attributes are usually
marked with a dashed oval in ER diagrams.
Characteristics:
• Not stored: Derived attributes do not need to be physically stored in the database, as they can be calculated when required.
• Calculated from other attributes: They are typically derived from other stored attributes (either directly or through some formula).
• Dynamic: Their value can change when the values of the attributes from which they are derived change.
Example: An Employee entity has attributes like EmployeeID, BirthDate, HireDate, and Salary.
• A derived attribute could be Age, calculated from the BirthDate and the current date (similarly, YearsOfService could be derived from the HireDate).
• The Age attribute is derived from stored attributes but is not itself stored explicitly in the database.
Diagram:
+------------------+
| Employee |
+------------------+
| EmployeeID (PK) |
| Name |
| HireDate |
| Salary |
+------------------+
|
v
+------------------+
| Derived: Age | <- This is a derived attribute (not stored)
+------------------+
Here:
• The Age attribute is not stored directly but is calculated from the HireDate or BirthDate when needed.
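As an illustration, a derived attribute is often exposed through a view rather than stored. A hedged sketch using MySQL-style date functions (the view name is hypothetical):
sql
-- Age is computed on the fly from BirthDate; nothing extra is stored.
CREATE VIEW EmployeeWithAge AS
SELECT
    EmployeeID,
    Name,
    BirthDate,
    TIMESTAMPDIFF(YEAR, BirthDate, CURDATE()) AS Age
FROM Employee;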
Q-Mapping :In the context of databases, mapping refers to the process of establishing relationships or correspondences
between two different models or representations. It plays a critical role in translating between different levels of
abstraction, such as between Entity-Relationship (ER) models and relational models or between different schemas in a
database.
Types of Mapping:
1. ER to Relational Mapping (ER Model to Relational Model): This involves transforming an Entity-Relationship (ER) diagram (a conceptual representation of a database) into a relational schema (which is used in relational database management systems).
Mapping Process:
• Entities in the ER diagram become tables in the relational schema.
• Attributes of entities become columns in the corresponding tables.
• Relationships in the ER model are represented by foreign keys or additional tables in the relational schema.
Example:
• An Employee entity with attributes such as EmployeeID, Name, and Salary would map to a table in the relational model with columns for each of these attributes.
• A WorksIn relationship between the Employee and Department entities could translate to a foreign key in the Employee table, linking the Employee to the Department table.
2. Schema Mapping (Involving Views and Schemas): Mapping can also refer to the relationship between different schemas in a database. For instance, mapping is used when we have different levels of schemas (as in the three-schema architecture of a DBMS) and need to establish relationships between the external, conceptual, and internal schemas.
Example:
• An external schema (user view) might map to the conceptual schema (logical representation) via views that specify what parts of the database are visible to specific users.
• The conceptual schema would then map to the internal schema (physical storage), indicating how the data is stored on disk.
3. Data Mapping (in ETL Processes): In the context of data migration or ETL (Extract, Transform, Load) processes, mapping refers to defining how data from one database or system (source) will be translated into another database or system (target). This could involve transforming data types, renaming columns, or combining data from different sources into a unified format.
4. Relational Mapping (One-to-One, One-to-Many, Many-to-Many): Mapping can also refer to how relationships between entities are translated into the relational database design, for example:
• One-to-One Mapping: One record in a table is related to one record in another table.
• One-to-Many Mapping: One record in a table is related to multiple records in another table (often implemented using a foreign key).
• Many-to-Many Mapping: Multiple records in one table are related to multiple records in another table, which usually requires a junction table to represent the relationship.
Relational Mapping:
• Employee table:*Columns: EmployeeID (PK), Name, Salary.
• Department table:Columns: DepartmentID (PK), DepartmentName.
• WorksIn table (for many-to-many relationship):*Columns: EmployeeID (FK), DepartmentID (FK),
representing the association between employees and departments.
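As an illustration (table and column names taken from the bullets above), the many-to-many mapping is typically queried by joining through the junction table:
sql
-- List each employee together with every department they work in,
-- resolving the many-to-many relationship through the WorksIn junction table.
SELECT e.Name, d.DepartmentName
FROM Employee   AS e
JOIN WorksIn    AS w ON w.EmployeeID   = e.EmployeeID
JOIN Department AS d ON d.DepartmentID = w.DepartmentID
ORDER BY e.Name;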
Importance:
• Data Integrity: Mapping ensures that relationships between data entities are correctly represented, helping maintain data integrity.
• Data Independence: In a multi-layered architecture, mapping helps achieve data independence, such as logical data independence and physical data independence.
• Data Migration: Mapping allows for the smooth transfer of data between systems, ensuring that it is accurately transformed into a compatible format for the target system.
Q-A stored procedure is a precompiled collection of one or more SQL statements that can be executed on demand. Stored procedures are stored in the database and can be called by an application or by a database user to perform a specific task or set of tasks.
• Encapsulation: A stored procedure encapsulates a series of SQL statements and logic (like loops, conditionals, etc.) into a single callable unit.
• Reusability: Once created, stored procedures can be called multiple times by various applications or users, making them reusable.
• Performance: Since stored procedures are precompiled, their execution is faster compared to executing multiple SQL statements individually.
• Modularity: They allow you to break down complex tasks into manageable, modular chunks.
• Transaction Control: Stored procedures can manage transactions by using commands like COMMIT or ROLLBACK to control the database state.
Benefits:
• Efficiency: Reduces network traffic, since only the call to the procedure is sent rather than multiple SQL statements.
• Security: Users can execute a stored procedure without having direct access to the underlying tables, providing an extra layer of security.
• Consistency: Stored procedures enforce consistent business logic across various applications.
Example of a Stored Procedure:
Let's say you want to create a stored procedure that updates the salary of an employee:
sql
CREATE PROCEDURE UpdateSalary
@EmployeeID INT,
@NewSalary DECIMAL
AS
BEGIN
UPDATE Employee
SET Salary = @NewSalary
WHERE EmployeeID = @EmployeeID;
END;
To execute the procedure:
sql
EXEC UpdateSalary @EmployeeID = 101, @NewSalary = 70000;
In this example:
• The stored procedure UpdateSalary takes two parameters: @EmployeeID and @NewSalary.
• It updates the Salary of the employee whose EmployeeID matches the given @EmployeeID.
Q-Trigger:A trigger is a special type of stored procedure that is automatically invoked (triggered) by the DBMS when
a specific event occurs on a table or view. Unlike a stored procedure, which needs to be explicitly called by a user or
application, a trigger is automatically executed in response to certain actions, such as an INSERT, UPDATE, or
DELETE on a specified table.
Key Features of Triggers:
• Event-Driven: Triggers are automatically executed in response to specific database events (e.g., data modifications).
• Data Integrity: Triggers can be used to enforce rules and maintain data integrity, such as preventing invalid updates or ensuring that changes in one table are reflected in another.
• Automation: They automate tasks such as auditing changes, maintaining logs, or updating related tables without needing manual intervention.
• Timing: Triggers can be set to fire BEFORE or AFTER an event occurs (e.g., before an insert or after a delete).
Types of Triggers:
• BEFORE Trigger: Executed before the operation (e.g., before an INSERT or UPDATE).
• AFTER Trigger: Executed after the operation (e.g., after an INSERT, UPDATE, or DELETE).
• INSTEAD OF Trigger: Replaces the action that would be performed. For example, it can replace an INSERT operation with a custom operation.
Benefits:
• Automated Workflow: Triggers can handle tasks like updating related tables, logging changes, and maintaining data integrity automatically.
• Enforcing Business Rules: Triggers help enforce business rules and constraints that are difficult to express using standard database constraints.
• Auditing: They can be used to track changes made to the database for auditing purposes.
Example of a Trigger:
Let's create a trigger that automatically sets the LastUpdated column of a table whenever a record is modified. (MySQL-style syntax; because a row trigger cannot issue an UPDATE against its own table, the timestamp is set on the row before the update is applied.)
sql
CREATE TRIGGER UpdateEmployeeTimestamp
BEFORE UPDATE ON Employee
FOR EACH ROW
SET NEW.LastUpdated = CURRENT_TIMESTAMP;
In this example:
• The UpdateEmployeeTimestamp trigger fires for each UPDATE on the Employee table.
• It sets the LastUpdated column of the affected row to the current timestamp each time an employee's data is updated.
Q-Normalization is a process in database design that involves organizing the attributes (columns) and relations (tables)
in a database to reduce redundancy and dependency. The goal of normalization is to minimize data duplication and
ensure that the database structure is logically consistent and efficient. The process involves decomposing a large table
into smaller, more manageable ones while maintaining the relationships between the data.
Why Normalization?
• Reduce Data Redundancy: By eliminating repeating groups and duplicate data, normalization ensures that data is stored in a compact and efficient way.
• Improve Data Integrity: Normalization helps enforce data integrity by ensuring that the data follows logical rules (such as consistency and accuracy).
• Minimize Update Anomalies: In a non-normalized database, there can be problems like insert, update, and delete anomalies. Normalization helps prevent these.
Types of Normal Forms:Normalization is typically carried out in multiple stages, or "normal forms," each of which
builds on the previous one. The most commonly used normal forms are the First Normal Form (1NF), Second
Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd Normal Form (BCNF).
1. First Normal Form (1NF): A relation (table) is in First Normal Form (1NF) if:
• All attributes (columns) contain atomic (indivisible) values.
• Each record (row) is unique and identified by a primary key.
• There are no repeating groups or arrays in any column.
Example (Table before 1NF):
StudentID Name Subjects
1 John Math, Science, English
2 Alice History, Math, Chemistry
In this case, the Subjects column contains multiple values for each student, which violates 1NF because it is not atomic.
After converting to 1NF:
StudentID Name Subject
1 John Math
1 John Science
1 John English
2 Alice History
2 Alice Math
2 Alice Chemistry
Now, the Subject column contains atomic values, and each row is unique for a given student and subject.
2. Second Normal Form (2NF): A relation is in Second Normal Form (2NF) if:
• It is in 1NF.
• It has no partial dependencies, i.e., all non-key attributes are fully functionally dependent on the entire primary key.
Partial Dependency:A partial dependency occurs when an attribute depends on only part of a composite primary key (a
primary key that consists of more than one attribute).
Example (Table in 1NF but not in 2NF):
StudentID CourseID Instructor
1 C101 Mr. A
1 C102 Mr. B
2 C101 Mr. A
In this case, the primary key is the combination of StudentID and CourseID. However, the attribute Instructor depends only on CourseID and not on the full primary key. This is a partial dependency.
After converting to 2NF: We break the table into two relations:
1. StudentCourses (stores student-course associations):
StudentID CourseID
1 C101
1 C102
2 C101
2. Courses (stores course details):
CourseID Instructor
C101 Mr. A
C102 Mr. B
Now, Instructor depends only on CourseID, and the data is in 2NF because we removed the partial dependency.
3. Third Normal Form (3NF): A relation is in Third Normal Form (3NF) if:
• It is in 2NF.
• It has no transitive dependencies, i.e., non-key attributes do not depend on other non-key attributes.
Transitive Dependency: A transitive dependency occurs when one non-key attribute depends on another non-key attribute.
Example (Table in 2NF but not in 3NF):
StudentID CourseID Instructor InstructorPhone
1 C101 Mr. A 12345
1 C102 Mr. B 67890
2 C101 Mr. A 12345
In this table, InstructorPhone depends on Instructor, and Instructor depends on CourseID, creating a transitive dependency between InstructorPhone and CourseID.
After converting to 3NF:
We split the table into three relations:
1. StudentCourses (stores student-course associations):
StudentID CourseID
1 C101
1 C102
2 C101
2. Courses (stores the instructor for each course):
CourseID Instructor
C101 Mr. A
C102 Mr. B
3. Instructors (stores instructor contact details):
Instructor InstructorPhone
Mr. A 12345
Mr. B 67890
Now, InstructorPhone depends only on Instructor, not on CourseID, and the data is in 3NF.
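A possible SQL rendering of this 3NF decomposition (data types are assumptions; table and column names follow the example above):
sql
-- Each fact is stored exactly once; foreign keys preserve the relationships.
CREATE TABLE Instructors (
    Instructor      VARCHAR(100) PRIMARY KEY,
    InstructorPhone VARCHAR(20)
);

CREATE TABLE Courses (
    CourseID   VARCHAR(10) PRIMARY KEY,
    Instructor VARCHAR(100),
    FOREIGN KEY (Instructor) REFERENCES Instructors(Instructor)
);

CREATE TABLE StudentCourses (
    StudentID INT,
    CourseID  VARCHAR(10),
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);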
Boyce-Codd Normal Form (BCNF): A relation is in Boyce-Codd Normal Form (BCNF) if:
• It is in 3NF.
• Every determinant is a candidate key.
A determinant is an attribute or a set of attributes that determines another attribute (i.e., if we know the determinant, we can uniquely determine the value of the other attribute).
Example (Table in 3NF but not in BCNF):
StudentID CourseID Instructor Room
1 C101 Mr. A R1
2 C101 Mr. A R1
3 C102 Mr. B R2
In this case, Instructor determines Room, but Instructor is not a candidate key (the primary key is StudentID and CourseID). This violates BCNF because a non-candidate key (Instructor) determines a non-prime attribute (Room).
After converting to BCNF:
We break the table into two relations:
1. StudentCourses (stores student-course associations):
StudentID CourseID
1 C101
2 C101
3 C102
2. CourseDetails (stores the instructor and room for each course):
CourseID Instructor Room
C101 Mr. A R1
C102 Mr. B R2
Now, the relation satisfies BCNF because every determinant is a candidate key.
Q-Cost-based optimization is a technique used in a Database Management System (DBMS) to determine the most
efficient way to execute a given query. The goal is to minimize the execution cost, which can involve factors such as
I/O operations, CPU usage, memory usage, and network communication. This approach relies on a cost model, which
evaluates the possible execution plans for a query and selects the one with the lowest cost.
How Cost-Based Optimization Works:
1. Query Parsing: When a query is submitted, the DBMS first parses and transforms the query into a logical query plan (which describes the operations required, like joins, selections, and projections).
2. Generate Physical Plans: The DBMS generates multiple physical plans (or execution plans), each representing a different way to execute the logical query. For example, a join might be implemented using nested loops, hash joins, or merge joins.
3. Cost Estimation: The DBMS estimates the cost of each physical plan using the cost model. This involves considering factors like the number of disk I/O operations, CPU cycles, and the size of intermediate results.
4. Plan Selection: The optimizer selects the execution plan with the lowest estimated cost, which is expected to execute the query most efficiently.
Example:
Consider a query that involves joining two tables:
sql
SELECT * FROM Employee JOIN Department ON Employee.DepartmentID = Department.DepartmentID;
The cost-based optimizer will consider different join algorithms (nested loop join, hash join, etc.) and select the one with the least estimated cost based on factors such as:
• Table sizes (how many rows and columns).
• Index availability.
• Available memory and CPU.
Advantages:
• It tries to select the best query execution strategy based on various factors, leading to better performance.
• It is flexible and adapts to different query patterns and database configurations.
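Most DBMSs expose the plan the cost-based optimizer chose. As an illustrative sketch (PostgreSQL/MySQL-style EXPLAIN; the exact output format varies by system):
sql
-- Show the execution plan selected for the join, including the chosen
-- join algorithm and the estimated row counts and costs.
EXPLAIN
SELECT *
FROM Employee
JOIN Department ON Employee.DepartmentID = Department.DepartmentID;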
Q-:Two-Phase Locking (2PL) is a concurrency control protocol used in DBMS to ensure serializability of transactions
(i.e., transactions are executed in a way that the result is the same as if they were executed serially, one after the other).
2PL ensures that there is no interference between concurrent transactions, avoiding anomalies like lost updates,
temporary inconsistency, and uncommitted data.
How Two-Phase Locking Works: The protocol is based on two phases:
• Growing Phase: A transaction can acquire locks on data items, but it cannot release any locks during this phase. This phase is about acquiring the locks needed so that no other transaction can modify the data during the transaction's execution.
• Shrinking Phase: After the growing phase, the transaction enters the shrinking phase, where it can only release locks and cannot acquire any new ones. Once a transaction releases a lock, it is in the shrinking phase, and no further lock acquisitions are allowed.
• Locks: The transaction acquires locks (shared or exclusive) on the data items it needs to read or modify.
• Serializability: The two-phase structure ensures that the transaction's execution is serializable, because no transaction can acquire a new lock after it has begun releasing locks.
• Lock Types: There are typically two types of locks:
o Shared Lock (S): Allows other transactions to read the data but prevents them from modifying it.
o Exclusive Lock (X): Prevents other transactions from reading or modifying the data.
Example: Consider two transactions, T1 and T2, attempting to access the same data:
• T1: UPDATE Account SET balance = balance - 100 WHERE account_id = 123;
• T2: UPDATE Account SET balance = balance + 100 WHERE account_id = 123;
If T1 acquires an exclusive lock on the row with account_id = 123, then T2 will be blocked until T1 releases the lock, ensuring that T1 and T2 do not modify the row simultaneously and that data integrity is maintained.
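A rough SQL sketch of the two phases (MySQL/PostgreSQL-style syntax; SELECT ... FOR UPDATE acquires an exclusive row lock during the growing phase, and COMMIT releases all locks at once in the shrinking phase):
sql
-- T1, growing phase: acquire an exclusive lock on the row before modifying it.
START TRANSACTION;
SELECT balance FROM Account WHERE account_id = 123 FOR UPDATE;
UPDATE Account SET balance = balance - 100 WHERE account_id = 123;
-- T1, shrinking phase: COMMIT releases every lock held by the transaction.
COMMIT;
-- A concurrent T2 issuing the same FOR UPDATE on account_id = 123
-- blocks until T1 commits, so the two updates cannot interleave.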
Advantages:
• Guarantees Serializability: By ensuring that every transaction acquires all its locks before releasing any, 2PL guarantees that concurrent transactions produce a serializable schedule.
Disadvantages:
• Deadlock: Although 2PL ensures serializability, basic 2PL does not prevent deadlocks, where two or more transactions wait indefinitely for each other to release locks.
• Performance Overhead: The locking mechanism can lead to contention and blocking, which can degrade performance in highly concurrent systems.
Q-4NF (Fourth Normal Form): 4NF is concerned with multi-valued dependencies, which are a type of dependency that can occur when multiple independent attributes depend on a primary key.
A relation (table) is in Fourth Normal Form (4NF) if:
• It is in Boyce-Codd Normal Form (BCNF).
• It does not have any multi-valued dependencies.
Multi-Valued Dependency: A multi-valued dependency occurs when one attribute determines two or more independent sets of attributes, such that all possible combinations of those attributes must be stored. This can cause redundancy in the table.
Example: Consider a table that records information about students, their subjects, and their hobbies:
StudentID Subject Hobby
1 Math Reading
1 Science Reading
1 Math Swimming
1 Science Swimming
2 History Painting
2 Geography Painting
Here, we have two independent sets of attributes (Subjects and Hobbies) that depend on the same key (StudentID). This
is a multi-valued dependency because for each StudentID, we can have multiple subjects and multiple hobbies, and
these two sets are independent of each other.
After Converting to 4NF:To eliminate the multi-valued dependency, we break the table into two tables:
1. StudentSubjects:
StudentID Subject
1 Math
1 Science
2 History
2 Geography
2. StudentHobbies:
StudentID Hobby
1 Reading
1 Swimming
2 Painting
Now, each table only has one independent set of values, ensuring there is no multi-valued dependency.
5NF (Fifth Normal Form): 5NF, also called Project-Join Normal Form (PJNF), is concerned with join dependencies and is aimed at eliminating redundancy that results from data being split across multiple relations (tables).
Definition: A relation is in Fifth Normal Form (5NF) if:
• It is in 4NF.
• It does not have any join dependency that can be decomposed into smaller tables without losing information.
Join Dependency: A join dependency occurs when a relation can be split into multiple smaller relations such that the original table can be reconstructed by joining those smaller relations together. If such a decomposition removes redundancy without losing information, the original table is not in 5NF.
Example: Consider a table that stores information about employees, the projects they work on, and the locations of those projects:
EmployeeID Project Location
1 ProjectA NY
1 ProjectB LA
2 ProjectA NY
2 ProjectC SF
In this table:
• An employee can work on multiple projects.
• A project can be located in multiple places.
• Storing every combination of employee, project, and location in one table results in redundancy.
This situation leads to a join dependency because you can decompose this table into three smaller relations and join
them back to reconstruct the original table. The three relations are:
1. EmployeeProject:
EmployeeID Project
1 ProjectA
1 ProjectB
2 ProjectA
2 ProjectC
2. ProjectLocation:
Project Location
ProjectA NY
ProjectB LA
ProjectC SF
3. EmployeeLocation:
EmployeeID Location
1 NY
1 LA
2 NY
2 SF
After decomposing the table, we can use JOIN operations to reconstruct the original table, ensuring that we don't have
redundant data stored.
Q-Optimization in the context of databases refers to the process of improving the efficiency of database operations,
such as query processing and data retrieval. The goal is to reduce the time and resources (such as CPU usage, memory,
and disk I/O) required to execute database operations, ensuring that the database system performs well under various
workloads.
Optimization is typically applied in two primary areas in a DBMS:
1. Query Optimization: Query optimization involves improving the performance of SQL queries by finding the most efficient way to execute them. Since a query can often be executed in multiple ways (using different algorithms or access paths), the optimizer chooses the plan that minimizes resource consumption.
Query Optimization Techniques:
• Logical Optimization: Involves reordering or transforming the query at a logical level (e.g., rearranging joins, pushing down selections or projections).
• Physical Optimization: Focuses on selecting the most efficient physical execution plan (e.g., choosing between different types of joins, such as hash join, nested loop join, etc.).
Example:
For the query:
sql
SELECT * FROM Employees WHERE Age > 30 AND Department = 'HR';
A query optimizer might:
• Reorder the conditions to first filter by the Department (if it has an index).
• Choose a suitable index on Age or Department to speed up the retrieval of data.
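For example, an index supporting this query could be created as follows (the index name is hypothetical); the optimizer can then use the index instead of scanning the whole table:
sql
-- A composite index on the filtered columns lets the optimizer locate
-- matching rows directly rather than scanning all of Employees.
CREATE INDEX idx_employees_dept_age ON Employees (Department, Age);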
2. Database Optimization: Database optimization focuses on improving the database schema, indexing strategies, storage mechanisms, and overall configuration to ensure efficient data retrieval and storage management.
Techniques for Database Optimization:
• Indexing: Creating indexes on frequently queried columns can drastically reduce query execution time by allowing faster lookups.
• Normalization/Denormalization: Normalizing data minimizes redundancy, while denormalizing data (in specific cases) can reduce the number of joins required, improving performance.
• Partitioning: Dividing large tables into smaller, more manageable partitions can reduce query response time.
• Caching: Frequently accessed data can be cached to avoid expensive disk I/O operations.
Q-Concurrency control in a Database Management System (DBMS) refers to the techniques and mechanisms used to ensure that multiple transactions can be executed simultaneously without conflicting with each other, thus preserving the ACID properties (Atomicity, Consistency, Isolation, Durability) of transactions. The goal is to ensure that the database remains in a consistent state even when multiple transactions are executed at the same time.
• Transaction: A unit of work in the DBMS, which could be a query or a set of queries.
• Isolation: Ensures that the execution of one transaction is isolated from others. Intermediate results of a transaction should not be visible to other transactions until the transaction is committed.
• Conflicts: Occur when multiple transactions access the same data concurrently and at least one of the transactions modifies the data.
Concurrency Control Techniques:
1. Lock-Based Protocols: Locks are used to prevent multiple transactions from accessing the same data item concurrently in a conflicting manner. Types of locks:
o Shared Lock (S-lock): A transaction can only read the data item but cannot modify it. Other transactions can acquire a shared lock on the same data item.
o Exclusive Lock (X-lock): A transaction has full control over a data item, meaning no other transaction can read or write the item until the exclusive lock is released.
o Two-Phase Locking (2PL): This protocol ensures serializability by dividing the transaction into two phases: a Growing Phase, in which the transaction can acquire locks but cannot release any, and a Shrinking Phase, in which the transaction can release locks but cannot acquire new ones.
2. Timestamp-Based Protocols: Each transaction is given a unique timestamp when it starts. The DBMS uses these timestamps to decide the order of transaction execution, ensuring that transactions are executed in the correct order. Older transactions are given priority over newer ones to avoid conflicts. Example: if two transactions access the same data item, the one with the earlier timestamp is allowed to proceed, while the later one may be aborted or delayed.
3. Optimistic Concurrency Control: Transactions execute without locking the data and only check for conflicts at the end, during a validation phase. If no conflicts are detected, the transaction is committed; otherwise, it is rolled back. This technique is useful in environments with low contention for data.
4. Multiversion Concurrency Control (MVCC): Instead of locking data, this technique maintains multiple versions of the same data item. Each transaction operates on a version of the data, and the DBMS ensures that the correct version is used for reading and writing. MVCC reduces the likelihood of conflicts, especially in systems with heavy read operations.
Importance:
• Ensures data consistency: By preventing concurrent access conflicts, it guarantees that the database maintains its integrity.
• Improves system performance: Allows multiple transactions to run in parallel, improving throughput and system efficiency.
• Prevents anomalies: Reduces issues such as lost updates, temporary inconsistency, uncommitted data, and deadlocks.
Recovery management refers to the set of mechanisms and techniques used to ensure that a DBMS can recover from
various types of failures, such as system crashes, hardware failures, or transaction errors, and bring the system back to a
consistent state. Recovery management ensures that data integrity is maintained and that transactions are either fully
completed (committed) or fully undone (rolled back) in case of a failure.
1. Transaction Logs: A transaction log records all changes made to the database. It helps in redoing or undoing operations during recovery. The log contains entries for operations like INSERT, UPDATE, and DELETE, along with transaction commit/rollback information.
2. Types of Failures:
o Transaction Failure: Occurs when a transaction cannot complete successfully due to an error or conflict.
o System Failure: The DBMS or the operating system crashes, leading to the loss of in-memory data.
o Media Failure: Occurs when the storage medium (e.g., disk or database file) fails, leading to potential data corruption or loss.
3. Recovery Techniques:
o Log-Based Recovery: Changes are written to the transaction log before being applied to the database. The log can be used during recovery to redo or undo changes. Write-Ahead Logging (WAL) ensures that the log is updated before any changes are written to the database.
o Checkpointing: The process of periodically saving the current state of the database and the log to disk. It helps reduce the time required for recovery, as the DBMS can start recovery from the last checkpoint rather than from the beginning of the log.
o Undo/Redo: Undo operations roll back the changes made by incomplete or aborted transactions; redo operations reapply the changes made by committed transactions after a crash or system failure.
o ARIES (Algorithm for Recovery and Isolation Exploiting Semantics): ARIES is an advanced recovery algorithm that uses a combination of analysis, redo, and undo phases to recover the database. It tracks transactions using a log and ensures that both committed and uncommitted transactions are handled correctly during recovery.
4. Backup and Restore: Backups are critical for recovery management. Regular backups of the database ensure that in case of catastrophic failures (e.g., media failure), the system can restore data from a recent backup. Backups can be full, incremental, or differential.
Importance of Recovery Management:
• Ensures durability: Guarantees that once a transaction is committed, its effects are permanent and recoverable, even after a failure.
• Data consistency: Recovery ensures that the database is consistent after a failure, preventing issues such as partially applied transactions.
• Fault tolerance: It allows the system to continue functioning even in the event of failures, minimizing downtime.
Q-Types of Failures in DBMS: In a Database Management System (DBMS), failures can occur for various reasons, such as hardware malfunctions, software bugs, or network issues. The system needs mechanisms for recovering from these failures to maintain consistency, integrity, and availability. The main types of failures in a DBMS are:
1. Transaction Failures: These occur when a transaction cannot complete successfully. A transaction may fail for several reasons, including:
• System crash: The DBMS or the operating system crashes unexpectedly during the transaction.
• Logical errors: The transaction may violate integrity constraints (e.g., trying to insert a record with a duplicate primary key).
• Application errors: Errors due to bugs in the application code, such as a division by zero or incorrect calculations.
2. System Failures: System failures refer to crashes that impact the entire DBMS. These can include:
• Hardware failure: A disk crash, power failure, or network failure causing the system to stop functioning.
• Operating system crashes: When the OS crashes, the DBMS might also stop functioning temporarily.
System failures often lead to issues such as database corruption or loss of unsaved (in-memory) data.
3. Media Failures: Media failures refer to physical damage or failure of storage devices (e.g., hard disk failure or memory corruption), causing data to become inaccessible. Examples include:
• Disk failure: The storage medium (disk or solid-state drive) crashes or becomes physically damaged, leading to data loss or inaccessibility.
• File corruption: Data files on the disk might get corrupted, making the data unusable.
4. Human Errors: These failures occur due to mistakes made by users, administrators, or developers. Examples include:
• Accidental data deletion: A user accidentally deletes important records or data.
• Data modification errors: A user incorrectly modifies data, leading to inconsistencies.
5. Concurrency Failures: These occur when multiple transactions execute concurrently and their interactions lead to inconsistent or incorrect results. This can happen due to:
• Deadlocks: Two or more transactions are blocked, each waiting for the other to release resources, leading to a standstill.
• Lost updates: Multiple transactions update the same data concurrently without proper synchronization, causing one update to be lost.
Q-Recovery Techniques in DBMS: To handle failures and maintain consistency, DBMSs use recovery techniques that allow the system to recover from various types of failures while ensuring that data remains consistent and transactions are durable (i.e., they either complete fully or leave no partial effects). The main recovery techniques used in a DBMS are:
1. Log-Based Recovery: Log-based recovery uses a transaction log to track all changes made to the database during a transaction. The log contains records of all operations (e.g., insert, update, delete), along with the old and new values of the affected data.
Key concepts:
• Write-Ahead Logging (WAL): In WAL, before any changes are written to the database, the details of the changes are first written to the log. This ensures that in case of a crash, the changes can be either rolled back or redone.
• Transaction Commit: When a transaction is successfully completed, a commit record is written to the log to mark the transaction as durable.
Example: If a system crashes in the middle of a transaction, the log can be used to:
• Redo the operations of transactions that were committed but whose changes were not yet written to the database.
• Undo the operations of transactions that were in progress but not committed.
2. Checkpointing: Checkpointing is a technique in which the DBMS periodically saves the current state of the database to stable storage, allowing the recovery process to start from a known point. A checkpoint reduces the amount of work needed during recovery after a failure.
• How Checkpointing Works: During a checkpoint, all modified data pages are written to disk, and a checkpoint record is written to the log. After a system crash, recovery can begin from the most recent checkpoint, reducing the number of transactions to redo or undo.
3. Shadow Paging: Shadow paging is a recovery technique in which changes are made to a copy of the database pages rather than directly to the original pages. At the end of the transaction, the system switches the active database to the new version, making the old version obsolete.
• How it works: Shadow pages store the original data, and current pages store the new data. If a failure occurs, the database can revert to the original shadow pages, ensuring that no partial changes are left in the database.
Advantages:
• Provides atomicity and durability for transactions.
• Simple to implement, as there is no need for undo or redo operations.
4. ARIES (Algorithm for Recovery and Isolation Exploiting Semantics): ARIES is an advanced recovery algorithm used in DBMSs to handle transaction recovery. ARIES uses a combination of techniques such as logging, checkpointing, and a redo-undo mechanism.
• How ARIES Works:
o Analysis Phase: Scans the log from the last checkpoint to determine the state of the database at the time of the crash (which transactions were active and which pages were dirty).
o Redo Phase: Re-applies the changes from the log to bring the database to the state it would have been in if no failure had occurred.
o Undo Phase: Rolls back the changes made by transactions that were active at the time of the crash and had not yet committed.
5. Backups: Backups are an essential part of a DBMS's recovery strategy. Regular backups are made of the entire database or important portions of it (e.g., individual tables or logs). In case of a catastrophic failure (e.g., media failure), backups can be restored to recover lost data.
Types of Backups:
• Full Backup: A complete copy of the entire database.
• Incremental Backup: Only the changes (updates, inserts, deletes) since the last backup are saved.
• Differential Backup: All changes since the last full backup are saved.
Recovery Using Backups: In the event of a failure, a system can restore from the most recent backup and apply the
transaction logs to bring the database to its most recent state.
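As a rough illustration (SQL Server-style commands; the database name and file paths are hypothetical), a full backup followed by a restore and log replay might look like this:
sql
-- Take a full backup of the database to disk.
BACKUP DATABASE UniversityDB TO DISK = 'D:\backups\UniversityDB_full.bak';

-- Later, after a media failure: restore the full backup without recovering yet,
-- then apply the transaction log backup to roll forward to the latest state.
RESTORE DATABASE UniversityDB FROM DISK = 'D:\backups\UniversityDB_full.bak' WITH NORECOVERY;
RESTORE LOG UniversityDB FROM DISK = 'D:\backups\UniversityDB_log.trn' WITH RECOVERY;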
6. Transaction Rollback/Undo and Rollforward/Redo:
• Rollback/Undo: If a transaction is incomplete (due to a failure), its effects are undone using the transaction log. This ensures that only committed transactions remain in the database.
• Rollforward/Redo: After recovering from a failure, the DBMS may need to reapply (redo) changes from the log that were committed but not yet reflected in the database at the time of failure.
Q-Timestamp-based concurrency control is a protocol used in database management systems to manage the execution of concurrent transactions in a way that ensures consistency while preventing conflicts. It assigns a unique timestamp to each transaction when it starts. These timestamps are then used to determine the serializability of transactions, which means ensuring that the result of executing multiple transactions concurrently is the same as if they had been executed serially (one after the other). The main goal of timestamp-based concurrency control is to maintain the serializability of transactions while allowing them to run concurrently.
How Timestamp-Based Concurrency Control Works:Each transaction is given a timestamp when it starts. The
timestamp is a unique identifier that reflects the transaction's start time. The DBMS uses these timestamps to order the
execution of conflicting operations (e.g., read and write on the same data item).
Key Concepts:
• Transaction Timestamp: A unique number assigned to each transaction, indicating the order in which it started.
• Read and Write Rules: The system uses the timestamps to decide whether a transaction's read or write operation is allowed, based on the timestamps of the other transactions that have accessed the data item.
There are two main timestamp-ordering rules used to ensure consistency:
• Read Rule: A transaction T may read a data item X only if no transaction with a later timestamp has already written X (i.e., T's timestamp is not earlier than the write timestamp of X). Otherwise, T is rolled back.
• Write Rule: A transaction T may write a data item X only if no transaction with a later timestamp has already read or written X (i.e., T's timestamp is not earlier than the read timestamp or the write timestamp of X). Otherwise, T is rolled back.
Basic Idea of Timestamp Ordering:
1. If a transaction T1 reads a data item X, no transaction with a later timestamp may already have written X.
2. If a transaction T1 writes a data item X, no transaction with a later timestamp may already have read or written X.
If these conditions are violated, the system can take appropriate actions, such as cancelling or rolling back one of the
conflicting transactions.
Steps in Timestamp-Based Concurrency Control:
1. Transaction Timestamp Assignment: When a transaction begins, the system assigns a unique timestamp to it (usually based on the system's clock or a counter).
2. Transaction Execution: Each transaction performs its operations (read, write) according to the rules defined by the timestamp protocol.
3. Conflict Resolution: If a transaction violates the read or write rule, it is aborted and rolled back. The transaction can then be restarted with a new timestamp.
Example: Let's consider the following scenario with two transactions (T1 and T2) and a data item X.
Assigning Timestamps:
• T2 starts first and is assigned a timestamp of 1.
• T1 starts second and is assigned a timestamp of 2.
Operations:
• T1 reads X (operation 1).
• T2 then attempts to write X (operation 2).
Now, the system checks whether this is allowed based on the timestamps.
1. Timestamp-Based Decision: T1 reads X. Since no transaction with a later timestamp has written X, the read is allowed, and the read timestamp of X becomes 2. T2 then attempts to write X. The system checks whether any transaction with a later timestamp has already read or written X. T1 (timestamp 2) has already read X, and T2's timestamp (1) is earlier, so allowing the write would make T1's read inconsistent with any serial order of the two transactions. Action Taken: Abort T2, because its write violates the timestamp-ordering rule; the system ensures that T2 cannot overwrite data that a later transaction has already read.
2. Re-Execution: T2 is restarted with a new timestamp, ensuring that the transactions are executed in an order consistent with their timestamps.
Timestamp-Based Concurrency Control Example Table:
Transaction Timestamp Operation Outcome
T1 2 Read(X) Allowed (read timestamp of X set to 2)
T2 1 Write(X) Rejected; T2 aborted and restarted
• Explanation: In this case, T1 (the transaction with the later timestamp) read X before T2's write, so T2 cannot write to X. Therefore, T2 is aborted to avoid an inconsistency.
Advantages of Timestamp-Based Concurrency Control:
• Prevents Deadlocks: Unlike lock-based protocols, timestamp-based control does not require transactions to acquire and release locks, which eliminates the possibility of deadlocks.
• Simple and Intuitive: The protocol uses timestamps to establish a clear order of operations, making it easier to understand and implement.
Disadvantages of Timestamp-Based Concurrency Control:
• Abort and Restart: If many transactions conflict, it may lead to frequent aborts and restarts, which can affect performance.
• Concurrency Limitations: While the protocol ensures serializability, it may not be as efficient as other concurrency control methods (e.g., optimistic or multiversion concurrency control) in certain workloads.
Q-Deadlock In the context of a Database Management System (DBMS) or operating systems, a deadlock refers to a
situation in which two or more transactions (or processes) are stuck in a state where each is waiting for the other to
release resources, causing a circular dependency. This situation leads to the transactions or processes being unable to
proceed further because they are all waiting on one another, resulting in a standstill or lockup.
Deadlock in DBMS:In DBMS, deadlock typically occurs when two or more transactions hold locks on resources (like
data items) and simultaneously request locks on resources held by the other transactions. This circular waiting leads to a
deadlock where neither transaction can proceed, and no transaction can release the locks.
Characteristics:
• Mutual Exclusion: At least one resource must be held in a non-shareable mode (i.e., only one transaction can use the resource at a time).
• Hold and Wait: A transaction is holding at least one resource and is waiting to acquire additional resources held by other transactions.
• No Preemption: Resources cannot be forcibly taken from a transaction holding them; they must be released voluntarily.
• Circular Wait: A set of transactions are waiting for each other in a circular chain, where each transaction is waiting for a resource held by the next transaction in the chain.
Example of a Deadlock:
Consider two transactions, T1 and T2, and two resources, R1 and R2:
• T1 holds a lock on resource R1 and is requesting a lock on resource R2.
• T2 holds a lock on resource R2 and is requesting a lock on resource R1.
Now, T1 is waiting for T2 to release R2, and T2 is waiting for T1 to release R1. Neither can proceed, and the system is
deadlocked.
Transaction Holds Requests
T1 R1 R2
T2 R2 R1
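A sketch of how such a deadlock can arise in SQL, assuming a hypothetical Accounts table and two interleaved sessions:
sql
-- Session 1 (T1):
START TRANSACTION;
UPDATE Accounts SET balance = balance - 50 WHERE account_id = 1;  -- T1 locks row 1 (R1)

-- Session 2 (T2):
START TRANSACTION;
UPDATE Accounts SET balance = balance - 50 WHERE account_id = 2;  -- T2 locks row 2 (R2)

-- Session 1 (T1):
UPDATE Accounts SET balance = balance + 50 WHERE account_id = 2;  -- blocks, waiting for T2's lock on R2

-- Session 2 (T2):
UPDATE Accounts SET balance = balance + 50 WHERE account_id = 1;  -- blocks, waiting for T1's lock on R1
-- Circular wait: T1 waits for T2 and T2 waits for T1. The DBMS detects the
-- deadlock and aborts one of the transactions so the other can commit.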
Deadlock Detection and Prevention:To handle deadlocks, DBMSs implement strategies such as deadlock prevention,
deadlock avoidance, and deadlock detection.
1. Deadlock Prevention: The goal is to prevent deadlock from occurring by ensuring that at least one of the deadlock conditions cannot hold. For example:
o Mutual Exclusion: This condition cannot always be avoided, because some resources cannot be shared (e.g., a database record being updated).
o Hold and Wait: This can be prevented by requiring that a transaction request all the resources it needs at once, before starting.
o No Preemption: This can be prevented by allowing the DBMS to preempt resources from a transaction if needed.
o Circular Wait: This can be prevented by imposing an ordering on resources, so that each transaction requests resources in a predefined order.
2. Deadlock Avoidance: Deadlock avoidance tries to ensure that the system never enters a state in which a deadlock could occur.
o The system examines resource requests before granting them, using information about which transactions hold or are waiting for which resources (the Wait-For Graph used for detection is described below).
o Banker's Algorithm: For resource allocation, this algorithm checks the system's state to determine whether granting a resource could lead to a potential deadlock, and only grants the resource if it would not.
3. Deadlock Detection: Deadlock detection involves the system periodically checking for deadlock situations and resolving them when they occur.
o A Wait-For Graph can be used to detect cycles that represent deadlocks. When a cycle is detected, one or more transactions in the cycle are aborted to break the deadlock.
4. Deadlock Recovery: Once a deadlock is detected, recovery is needed. Recovery may involve:
o Killing one or more transactions to break the cycle and allow the others to proceed.
o Rolling back a transaction to a safe state where it no longer holds the contested resources.
Deadlock Example (Graph Representation): In this case, we represent the wait-for relationship as a graph:
• T1 is waiting for R2 and holds R1.
• T2 is waiting for R1 and holds R2.
This forms a cycle in the Wait-For Graph (an arrow means "waits for"):
T1 -----> T2
 ^         |
 +---------+
The cycle indicates a deadlock, as T1 is waiting for T2, and T2 is waiting for T1. The DBMS would then need to take
action, such as aborting one of the transactions.
Q-The ACID properties are a set of four key properties that guarantee the reliability and integrity of a database, even
in cases of system failures, power outages, or other unforeseen events. These properties ensure that the database
behaves predictably and consistently during transactions.
The ACID acronym stands for:
1. A - Atomicity
2. C - Consistency
3. I - Isolation
4. D - Durability
Each of these properties plays a critical role in ensuring that transactions in the DBMS are processed in a safe and
secure manner.
1. Atomicity: Atomicity ensures that a transaction is treated as a single, indivisible unit of work. A transaction is either fully completed (committed) or fully rolled back (aborted); it is never left in a partial state.
• Key Point: If a transaction involves multiple operations (e.g., updating multiple records), either all of the operations complete successfully, or none of them do.
• Example: Suppose you are transferring money from Account A to Account B. If money is deducted from Account A but the addition to Account B fails, the transaction must be rolled back so that neither account is affected.
• Real-World Example: When you perform a bank transfer, the amount is deducted from one account and added to another. If any part of the transaction fails (e.g., due to a system failure), the entire transaction is aborted and no partial changes are made.
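A minimal sketch of atomicity in code, using Python's built-in sqlite3 module with a made-up accounts table: the transfer either commits both updates or rolls both back.
import sqlite3

# Hypothetical in-memory database with two example accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Both updates become visible together, or neither does."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.commit()        # make both changes permanent as one unit
    except Exception:
        conn.rollback()      # undo the partial work, leaving both accounts untouched
        raise

transfer(conn, "A", "B", 30)
print(conn.execute("SELECT * FROM accounts ORDER BY id").fetchall())
# [('A', 70), ('B', 80)]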
2. Consistency: Consistency ensures that a transaction takes the database from one valid state to another, maintaining all database rules, constraints, and triggers. The database must be in a consistent state both before and after the transaction.
• Key Point: A transaction may only bring the database into a valid state, meaning it must respect the integrity constraints, such as foreign keys, checks, and uniqueness rules.
• Example: If a transaction violates a constraint (e.g., it tries to insert a record that violates a unique constraint or a foreign key constraint), the transaction is aborted.
• Real-World Example: In a school database, if a transaction tries to enroll a student in a non-existent course, the database rejects the transaction, preserving the consistency of the data.
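A small sketch of how constraints protect consistency, again using sqlite3 with hypothetical courses and enrollments tables; the insert that references a non-existent course is rejected and the database stays valid.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE courses (course_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE enrollments (
                  student_id INTEGER NOT NULL,
                  course_id  INTEGER NOT NULL REFERENCES courses(course_id))""")
conn.execute("INSERT INTO courses VALUES (101, 'Databases')")
conn.commit()

try:
    # Enrolling a student in a course that does not exist violates the foreign key,
    # so the DBMS rejects the statement and the data remains consistent.
    conn.execute("INSERT INTO enrollments VALUES (1, 999)")
    conn.commit()
except sqlite3.IntegrityError as err:
    conn.rollback()
    print("rejected:", err)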
3. Isolation: Isolation ensures that the execution of one transaction is isolated from others: the intermediate states of a transaction are not visible to other transactions. Even when multiple transactions execute concurrently, they must not interfere with each other's operations.
• Key Point: Transactions should behave as though they were executed sequentially, even if they are running in parallel. This avoids anomalies such as dirty reads, non-repeatable reads, and phantom reads.
• Example: If two transactions simultaneously try to update the same record, isolation ensures that one transaction finishes its update before the other can access the record, preventing conflicting updates.
• Real-World Example: Imagine two users trying to withdraw money from the same bank account at the same time. The system must ensure that the balance is updated correctly for both withdrawals and that one user's transaction cannot interfere with the other's.
Isolation Levels: Different isolation levels control the degree of visibility between concurrent transactions:
• Read Uncommitted: A transaction can see uncommitted changes made by other transactions (lowest isolation).
• Read Committed: A transaction can only see changes that other transactions have committed.
• Repeatable Read: Data a transaction has already read will not change if it is read again within the same transaction (phantom rows may still appear).
• Serializable: Transactions are executed in such a way that the result is as if they had run serially, one after another (highest isolation).
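The following toy Python sketch (not a real DBMS, just two dictionaries standing in for committed data and one transaction's uncommitted writes, with invented names and values) illustrates the difference between the Read Uncommitted and Read Committed levels:
# Toy illustration only: "committed" holds durable data, while "uncommitted_writes"
# holds changes made by a transaction T1 that has not committed yet.
committed = {"balance": 100}
uncommitted_writes = {"balance": 40}     # T1 wrote 40 but has not committed

def read(item, level):
    """Return the value another transaction would see at the given isolation level."""
    if level == "READ UNCOMMITTED" and item in uncommitted_writes:
        return uncommitted_writes[item]  # dirty read: sees T1's in-flight change
    return committed[item]               # READ COMMITTED and above: committed data only

print(read("balance", "READ UNCOMMITTED"))   # 40  (wrong if T1 later rolls back)
print(read("balance", "READ COMMITTED"))     # 100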
4. Durability: Durability ensures that once a transaction has been committed, its changes are permanent, even in the event of a system crash, power failure, or other unforeseen issue. The changes are written to non-volatile storage (such as a hard disk) and cannot be lost.
• Key Point: Once a transaction has successfully completed and committed, its effects are permanent regardless of any system failure that occurs afterward.
• Example: If a transaction successfully updates a record and commits, the change must survive even if the system crashes immediately afterward, and it must be recoverable once the system is restored.
• Real-World Example: When you place an online order and the confirmation page is displayed, your order is recorded in the system even if it crashes immediately afterward, and you will still receive it. Durability guarantees this.
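A minimal sketch of one common mechanism behind durability, assuming a write-ahead-log style design: the change is appended to a log file and forced to stable storage with fsync before the commit is acknowledged. The file name and record format are invented for this example.
import json
import os

def commit_change(log_path, change):
    """Append the change to a write-ahead log and force it to disk before acknowledging."""
    with open(log_path, "a") as log:
        log.write(json.dumps(change) + "\n")
        log.flush()
        os.fsync(log.fileno())   # the record now survives a crash or power failure
    return "committed"

print(commit_change("wal.log", {"txn": 42, "set": {"order_id": 7, "status": "placed"}}))
# On restart after a crash, the DBMS replays the log to restore every committed change.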
Q-Serializability is a fundamental concept in database concurrency control. It ensures that the concurrent execution of transactions produces a database state that is equivalent to some serial execution of those transactions. In other words, serializability guarantees that the outcome of concurrent transactions is the same as if the transactions had been executed one by one, in some sequential order, without any interference or inconsistency.
Serializability ensures that transactions maintain the consistency of the database even when they run concurrently. It prevents problems such as lost updates, temporary inconsistencies, and reads of uncommitted data from affecting the final state of the database.
Types of Serializability
1. Conflict Serializability: Conflict serializability is the stricter of the two forms. It ensures that the outcome of executing a set of transactions concurrently is equivalent to some serial execution of those transactions, where equivalence is judged by the ordering of conflicting operations. Two operations conflict if:
• They belong to different transactions.
• They operate on the same data item.
• At least one of the two operations is a write.
Conflict Serializability Example:
Let’s consider two transactions:
• T1: Read(A), Write(B)
• T2: Write(A), Read(B)
A conflict occurs between Write(A) by T2 and Read(A) by T1 because they access the same data item A and one of the operations is a write. To decide whether a schedule is conflict serializable, a conflict graph (also called a precedence graph) is built as follows:
• Each transaction is represented as a node.
• For every pair of conflicting operations between two transactions, a directed edge is drawn from the transaction whose operation occurs first in the schedule to the transaction whose operation occurs later.
• If the resulting graph has no cycle, the schedule is conflict serializable; a valid serial order can be read off by topologically sorting the graph.
Example: Consider the schedule below (time runs top to bottom):
T1: Read(A)
T2: Write(A)
T1: Write(B)
T2: Read(B)
• Conflicting operations: Read(A) by T1 precedes Write(A) by T2, giving the edge T1 → T2; Write(B) by T1 precedes Read(B) by T2, also giving T1 → T2.
• Conflict Serializability: The precedence graph contains only the edge T1 → T2 and therefore has no cycle, so the schedule is conflict serializable and equivalent to the serial schedule T1 followed by T2.
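The construction above can be sketched in a few lines of Python: build the precedence graph from the conflicting operations of the schedule and check it for cycles with a simple remove-the-sources loop. Encoding the schedule as (transaction, action, item) triples is an assumption of the example.
# The schedule from the example, as a list of (transaction, action, data_item) steps.
schedule = [
    ("T1", "R", "A"),   # T1: Read(A)
    ("T2", "W", "A"),   # T2: Write(A)
    ("T1", "W", "B"),   # T1: Write(B)
    ("T2", "R", "B"),   # T2: Read(B)
]

def precedence_edges(schedule):
    """Edge Ti -> Tj for every pair of conflicting operations where Ti's comes first."""
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def is_conflict_serializable(schedule):
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    # Repeatedly remove nodes with no incoming edge; anything left over means a cycle.
    while nodes:
        free = {n for n in nodes if not any(v == n for _, v in edges)}
        if not free:
            return False                      # cycle -> not conflict serializable
        nodes -= free
        edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return True

print(precedence_edges(schedule))             # {('T1', 'T2')}
print(is_conflict_serializable(schedule))     # True: equivalent to the serial order T1, T2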
2. View Serializability: View serializability is a more relaxed form of serializability. It allows more flexibility in how the operations of different transactions are interleaved, but it still guarantees that the final state of the database is the same as in some serial execution. In other words, view serializability focuses on the view each transaction has of the data and ensures that this view is consistent with a serial execution.
Conditions for View Serializability: A schedule is view serializable if it is view equivalent to some serial schedule, i.e., for every data item the following three conditions hold with respect to that serial schedule:
• Initial Reads Condition: The transactions that read the initial value of the data item are the same in both schedules.
• Read-from Condition: If a transaction reads a value written by another transaction in one schedule, it reads the value written by that same transaction in the other.
• Final Writes Condition: The transaction that performs the final write on the data item is the same in both schedules.
View Serializability Example: Let’s consider two transactions:
• T1: Read(A), Write(A)
• T2: Write(A), Read(A)
If the view of the data seen by T1 and T2 in an interleaved schedule (i.e., which values they read and which transaction performs the final write) matches what would occur in some serial execution of the transactions, then the schedule is view serializable, even if it is not conflict serializable.
Example: If T1 first reads the initial value X of A and T2 later writes Y to A, then the final value of A in the schedule is Y; this matches the serial execution in which T1 runs first (reading X) and T2 runs second (writing Y).
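The comparison of views can also be sketched in Python. The snippet below computes the initial reads, reads-from pairs, and final writer of a schedule and checks whether they match some serial order. It uses the standard blind-write illustration and adds a third transaction, T3, which is not part of the example above, purely to obtain a schedule that is view serializable without being conflict serializable.
from itertools import permutations

def view(schedule):
    """The 'view' of a schedule: initial reads, reads-from pairs, and final writer per item."""
    initial_reads, reads_from, last_writer = set(), set(), {}
    for txn, action, item in schedule:
        if action == "R":
            if item in last_writer:
                reads_from.add((txn, item, last_writer[item]))
            else:
                initial_reads.add((txn, item))
        else:                                  # "W"
            last_writer[item] = txn
    return initial_reads, reads_from, dict(last_writer)

# Per-transaction operation lists; T3 is invented for this illustration.
transactions = {
    "T1": [("T1", "R", "A"), ("T1", "W", "A")],
    "T2": [("T2", "W", "A")],                  # blind write
    "T3": [("T3", "W", "A")],                  # blind write
}
# Interleaved schedule: T2's write slips in between T1's read and write.
schedule = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]

# A schedule is view serializable if its view equals the view of some serial order.
serial_views = [view(sum((transactions[t] for t in order), []))
                for order in permutations(transactions)]
print(view(schedule) in serial_views)          # True -> view serializable (matches T1, T2, T3)
This same schedule fails the conflict test from the earlier sketch, because its precedence graph contains both T1 → T2 (T1's read before T2's write on A) and T2 → T1 (T2's write before T1's write on A), i.e., a cycle; that is exactly the sense in which view serializability is the weaker of the two conditions.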