Unit 1
1.2. Data and Information
A Database Management System (DBMS) is software that facilitates the creation, manipulation, and
management of databases. It provides an interface for users to interact with data efficiently and securely.
DBMS handles tasks like data storage, retrieval, and updating, while ensuring data integrity and
consistency. Common types include relational, hierarchical, and object-oriented databases. Examples of
popular DBMSs include MySQL, PostgreSQL, and Oracle.
1.2.1. Data
Data in a DBMS is the raw, unprocessed facts and figures stored in the database. It represents the
basic units of information that are input into the system and typically organized in a structured
format.
1.2.2. Information
Information in a DBMS is the result of processing, organizing, or analyzing data to provide context and
meaning. It is derived from querying and manipulating data to support decision-making, reporting, or
analysis.
1.3. Database
A database is a systematic collection of data that is organized to allow for efficient retrieval, management,
and manipulation. It provides a way to store, organize, and manage data so that it can be easily accessed
and utilized. Here's a comprehensive overview of databases, including their key components and types.
Data within the most common types of databases in operation today is typically modeled in rows and
columns in a series of tables to make processing and data querying efficient. The data can then be easily
accessed, managed, modified, updated, controlled, and organized. Most databases use structured query
language (SQL) for writing and querying data.
A Database Management System (DBMS) is a software system that is designed to manage and organize
data in a structured manner. It allows users to create, modify, and query a database, as well as manage the
security and access controls for that database. A DBMS provides an environment to store and retrieve
data in a convenient and efficient manner.
The primary objectives of a Database Management System (DBMS) are to provide an efficient, reliable,
and secure means of storing, managing, and retrieving data.
The main objectives of a database management system are:
1. Data Organization and Management
Objective: To systematically organize and manage data so that it is easily accessible and efficiently
maintained.
Structure: Provides a structured way to store data in tables, records, and fields, adhering to a
defined schema.
Data Integrity: Ensures data accuracy and consistency through constraints, relationships, and
rules.
2. Efficient Data Access
Objective: To enable quick and efficient access to data through various querying mechanisms.
Query Languages: Supports languages like SQL for creating complex queries to retrieve and
manipulate data.
Indexing: Uses indexing techniques to speed up data retrieval and improve query performance.
3. Data Security
Access Control: Implements user authentication and authorization mechanisms to control who
can access and modify data.
Encryption: Provides encryption for data at rest and in transit to protect sensitive information.
4. Data Integrity
Objective: To maintain the accuracy and consistency of data throughout its lifecycle.
Constraints: Enforces rules such as primary keys, foreign keys, and unique constraints to ensure
data validity.
Validation: Ensures that data entered into the database meets specified criteria and constraints.
5. Transaction Management
Objective: To handle transactions in a way that ensures data consistency and reliability.
ACID Properties: Ensures transactions are Atomic, Consistent, Isolated, and Durable to
maintain data integrity even in cases of system failure or concurrent access.
Concurrency Control: Manages simultaneous data access and modifications by multiple users to
prevent conflicts and ensure consistency.
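The following minimal sketch, in PostgreSQL-flavored SQL with an assumed Accounts table, shows how a transaction groups two updates into one atomic unit:

```sql
-- Move 100 between accounts as a single atomic unit of work.
BEGIN;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

-- COMMIT makes both updates permanent together; if anything had failed,
-- ROLLBACK would have undone them both (Atomicity and Durability).
COMMIT;
```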
6. Backup and Recovery
Objective: To safeguard data against loss and corruption, and to enable recovery in case of failures.
Backup Procedures: Provides mechanisms for regular data backups to prevent data loss.
Recovery Mechanisms: Ensures that data can be restored to a consistent state after a failure or
corruption.
7. Data Redundancy Control
Objective: To minimize data duplication and ensure that all data is consistent across the database.
Normalization: Uses normalization techniques to reduce data redundancy and improve data
integrity.
Centralized Management: Centralizes data management to avoid duplication and maintain
consistency.
8. Scalability and Performance
Objective: To ensure that the database can handle increasing amounts of data and user load efficiently.
Scalability: Supports horizontal and vertical scaling to accommodate growth in data volume and
user activity.
Performance Optimization: Implements techniques such as caching, indexing, and query
optimization to enhance performance.
9. Data Abstraction and Independence
Objective: To provide a layer of abstraction between the physical data storage and the user applications.
Data Abstraction: Allows users to interact with the data without needing to understand the
underlying physical storage details.
Data Independence: Ensures that changes to the database schema or storage do not impact
application programs or user queries.
10. User and Application Support
Objective: To provide tools and features that support users and developers in managing and interacting
with the database.
User Interfaces: Offers graphical user interfaces (GUIs) and command-line tools for interacting
with the database.
Development Tools: Provides tools and libraries for application developers to integrate and
interact with the database.
Database Management Systems (DBMS) offer numerous advantages that significantly enhance the
management, accessibility, and integrity of data. Here are some key advantages of using a DBMS:
1. Efficient Data Organization
Structured Data Storage: Organizes data into tables, rows, and columns, making it easier to
manage and retrieve.
Centralized Management: Provides a single point of access to data, simplifying administration
and maintenance.
3. Data Security
Access Controls: Implements user authentication and authorization to control who can access or
modify data.
Encryption: Provides encryption options for data at rest and in transit to enhance security.
4. Efficient Data Retrieval
Query Languages: Supports powerful query languages (e.g., SQL) for retrieving and
manipulating data.
Indexing: Uses indexing to speed up data retrieval operations and improve query performance.
6. Backup and Recovery
Objective: Protect data from loss and ensure recovery in case of failure.
Regular Backups: Provides tools for performing regular backups of data to prevent loss.
Recovery Mechanisms: Includes mechanisms for restoring data from backups and recovering
from system failures.
7. Transaction Management
Objective: Ensure the integrity and consistency of data during concurrent operations.
ACID Properties: Guarantees that transactions are Atomic, Consistent, Isolated, and Durable,
ensuring reliable processing of transactions.
Concurrency Control: Manages simultaneous access to data by multiple users to prevent
conflicts and maintain consistency.
8. Scalability and Performance
Objective: Handle growing amounts of data and increasing user load efficiently.
Scalability: Supports horizontal and vertical scaling to accommodate growth in data and user
activity.
Performance Optimization: Implements techniques like caching, indexing, and query
optimization to enhance performance.
9. Data Abstraction and Independence
Objective: Provide a layer of abstraction between the data and the applications that use it.
Data Abstraction: Allows users to interact with the data without needing to understand the
underlying storage details.
Data Independence: Ensures that changes to the database schema do not affect application
programs or queries.
10. Data Sharing and Collaboration
Multi-User Access: Supports concurrent access to data by multiple users and applications,
enhancing collaboration.
Centralized Access: Provides a centralized platform for accessing and sharing data across
different applications and users.
11. Reduced Administrative Overhead
Automated Tasks: Automates routine data management tasks, such as backups and maintenance,
reducing administrative overhead.
1.4.3 Components of DBMS
A database holds a large amount of important data that must be accessed securely and swiftly.
Therefore, choosing the right architecture is crucial for effective data management. A well-chosen
DBMS architecture lets users complete their queries more quickly when connecting to the database.
The choice of database architecture is influenced by a number of factors, including the database's
size, the number of users, and the relationships among users. We typically utilize two different
kinds of database models: logical models and physical models.
Types of DBMS Architecture
There are several types of DBMS Architecture that we use according to the usage requirements. Types
of DBMS Architecture are discussed here.
1-Tier Architecture
2-Tier Architecture
3-Tier Architecture
In 1-Tier Architecture, the client, server, and database all reside on the same machine, and the
user works with the database directly.
1. Basic Architecture: Because it needs only one machine to run, 1-Tier Architecture is the
easiest to set up and manage.
2. Cost-Effective: 1-Tier Architecture is inexpensive to implement because it doesn't
require any additional hardware.
3. Simple to Use: Because 1-Tier Architecture is so simple to implement, it is mainly used
for modest projects.
A simple client-server design is comparable to the two-tier architecture. The client-side program
establishes direct communication with the server-side database. For this kind of interaction, APIs
like JDBC and ODBC are utilized. Query processing and transaction management functions are
handled by the server side. User interfaces and application programs are executed on the client
side. In order to communicate with the DBMS, the client-side application connects to the server.
This kind has the benefit of being simpler to learn and maintain, and it works well with current
systems. However, when there are a lot of users, this approach performs poorly.
1. Easy to Access: 2-Tier Architecture provides easy access to the database, which makes
data retrieval fast.
2. Scalable: We can scale the database easily by adding clients or upgrading hardware.
3. Low Cost: 2-Tier Architecture is cheaper than 3-Tier Architecture and Multi-Tier
Architecture.
4. Simple: 2-Tier Architecture is easy to understand because it has only two
components.
In a 3-Tier Architecture, there is an additional layer between the client and the server. The
client does not communicate directly with the server. Instead, it communicates with an
application server, which in turn communicates with the database system, where query
processing and transaction management take place. This intermediate layer exchanges partially
processed data between the server and the client. This kind of architecture is used for large
web applications.
ER Model Steps
1. Identify Entities and Relationships: Determine the objects and their interactions in the
system.
2. Define Attributes: Specify the properties for each entity and relationship.
3. Determine Cardinality: Establish how entities relate to each other in terms of quantity.
An Entity-Relationship (ER) diagram is a key tool in database design, used to visually represent the
structure and relationships within a database. Here's a detailed look at the primary components of an ER
diagram:
1. Entities
2. Attributes
3. Relationships
4. Cardinality
5. Participation Constraints
6. Weak Entities
7. Generalization and Specialization
8. Aggregation
1. Entities
Definition: Objects or concepts that can have data stored about them. Entities are typically things
or concepts that are distinct and can be identified individually.
Representation:
o Symbol: Rectangle
2. Attributes
Definition: Properties or characteristics of an entity that provide more details about it.
Types:
o Simple Attribute: An attribute that cannot be divided further (e.g., ID, Name).
o Composite Attribute: An attribute that can be divided into smaller sub-parts (e.g.,
Address can be split into Street, City, ZipCode).
o Derived Attribute: An attribute whose value can be derived from other attributes (e.g.,
Age can be derived from DateOfBirth).
o Multi-Valued Attribute: An attribute that can hold multiple values (e.g., PhoneNumbers
for a person).
Representation:
o Symbol: Oval
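As a hedged sketch of how these attribute types commonly map onto a relational schema (the Person and PersonPhone tables and the PostgreSQL-style AGE() date arithmetic are illustrative assumptions): a composite attribute is split into its parts, a multi-valued attribute becomes a separate table, and a derived attribute is computed rather than stored.

```sql
-- Composite attribute Address stored as its parts; Age is derived, so not stored.
CREATE TABLE Person (
    PersonID    INTEGER PRIMARY KEY,        -- key attribute
    Name        VARCHAR(100) NOT NULL,      -- simple attribute
    Street      VARCHAR(100),               -- parts of the composite Address
    City        VARCHAR(50),
    ZipCode     VARCHAR(10),
    DateOfBirth DATE
);

-- Multi-valued attribute PhoneNumbers becomes its own table.
CREATE TABLE PersonPhone (
    PersonID    INTEGER REFERENCES Person(PersonID),
    PhoneNumber VARCHAR(20),
    PRIMARY KEY (PersonID, PhoneNumber)
);

-- Derived attribute Age computed on demand (PostgreSQL-style date arithmetic).
CREATE VIEW PersonWithAge AS
SELECT PersonID, Name,
       EXTRACT(YEAR FROM AGE(CURRENT_DATE, DateOfBirth)) AS Age
FROM Person;
```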
3. Relationships
Definition: Associations between entities that describe how the entities interact with each other.
Types:
o One-to-One (1:1): An instance of entity A is associated with at most one instance of
entity B, and vice versa.
o One-to-Many (1:N): An instance of entity A can be associated with multiple instances
of entity B, but each instance of B is associated with only one instance of A.
o Many-to-Many (M:N): Instances of entity A can be associated with multiple instances
of entity B, and vice versa.
Representation:
o Symbol: Diamond
o Connecting Lines: Lines that connect the diamond to the involved entities
4. Cardinality
Definition: Specifies the number of instances of one entity that can or must be associated with
each instance of another entity.
Representation:
o Notation: Written on the lines connecting entities, for example as 1:1, 1:N, or M:N.
5. Participation Constraints
Definition: Indicates whether all or only some entity instances participate in a relationship.
Types:
o Total Participation: Every instance of the entity participates in the relationship
(represented with a double line).
o Partial Participation: Only some instances of the entity participate in the relationship
(represented with a single line).
6. Weak Entities
Definition: Entities that cannot be uniquely identified by their own attributes alone and require a
relationship with another (strong) entity to be uniquely identified.
Representation:
o Symbol: Double rectangle
7. Generalization and Specialization
Definition:
o Generalization: The process of combining two or more lower-level entities into a single
higher-level entity based on their shared attributes.
o Specialization: The process of defining a set of subclasses from a general entity based on
certain attributes.
Representation:
o Symbol for Generalization: A triangle with a line connecting it to the generalized entity,
and lines to the specialized entities.
8. Aggregation
Definition: An abstraction in which a relationship set, together with its participating entities,
is treated as a higher-level entity so that it can take part in further relationships.
Representation:
o Symbol: A rectangle drawn around the aggregated relationship and its entities
In the Entity-Relationship (ER) model, the degree of a relationship specifies the number of
participating entities in that relationship. Here are the common degrees of relationships:
1. Unary (Recursive) Relationship:
o Definition: A relationship in which a single entity set participates with itself.
2. Binary Relationship:
o Definition: A relationship between two distinct entity sets.
3. Ternary Relationship:
o Definition: A relationship among three distinct entity sets.
4. N-ary Relationship:
o Definition: A relationship among n distinct entity sets.
The ER (Entity-Relationship) model can be classified into different types based on its
complexity and the nature of the relationships it represents. Here’s a classification of the ER
model:
1. Basic ER Model
Description: The foundational model that includes entities, attributes, and relationships.
It covers basic components and their connections but does not include advanced
concepts.
Components: Entities, attributes, and relationships.
4. Hierarchical ER Model
Description: Organizes entities into a tree-like structure of parent-child relationships.
5. Network ER Model
Description: Represents data using a graph structure where entities and relationships
form a network of interconnected nodes.
6. Temporal ER Model
Description: Designed to handle time-varying data, where the historical aspects of data
and changes over time are considered.
7. Multivalued ER Model
Description: Extends the basic ER model to handle attributes that can have multiple
values for a single entity.
Components: Multivalued attributes, Relationships involving multivalued attributes.
1. Inheritance
Inheritance is a mechanism where a new class or entity inherits properties and behaviors from an
existing class or entity. In databases, this is typically represented as a parent-child relationship
between entities. The child entity inherits attributes and relationships from the parent entity.
Example: Suppose you have a Person entity with attributes like PersonID, Name, and Address. You
might have a child entity Employee that inherits these attributes from Person and adds additional
attributes like EmployeeID, Position, and Salary.
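A minimal sketch of one common way (table-per-subclass) to realize this inheritance in SQL; the schema follows the Person/Employee example above:

```sql
-- Parent entity holds the common attributes.
CREATE TABLE Person (
    PersonID INTEGER PRIMARY KEY,
    Name     VARCHAR(100),
    Address  VARCHAR(200)
);

-- Child entity "inherits" by referencing the parent's key
-- and adding its own attributes.
CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY,
    PersonID   INTEGER NOT NULL UNIQUE REFERENCES Person(PersonID),
    Position   VARCHAR(50),
    Salary     NUMERIC(10, 2)
);
```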
2. Specialization
Specialization is the process of defining a more specific entity from a general entity. This is
essentially the creation of sub-entities from a more generalized parent entity, where the sub-
entities (or specialized entities) inherit common attributes from the parent.
Example: In a database, a Vehicle entity might be specialized into Car, Truck, and Motorcycle.
Each specialized entity will inherit common attributes from Vehicle like VehicleID, Make, and
Model, but will also have specific attributes related to their type.
3. Association
Association refers to the relationships between entities. In a database, associations define how
entities interact with each other and how they are related. This could be one-to-one, one-to-
many, or many-to-many relationships.
Example: If you have a Customer entity and an Order entity, the relationship between these two
entities can be modeled as an association where one customer can place many orders (one-to-
many relationship).
In a DBMS, combining ISA relationships helps in modeling complex real-world scenarios. For
instance, an Employee might be specialized into FullTimeEmployee and PartTimeEmployee
entities through specialization and inheritance.
1. Generalization (Opposite of Specialization): You start with specific entities (like
FullTimeEmployee and PartTimeEmployee) and generalize them into a common parent entity
(Employee).
2. Hierarchy: A hierarchical model is formed where Employee is the parent entity, and
FullTimeEmployee and PartTimeEmployee are child entities.
3. Association: The Employee entity might be associated with other entities like Department or
Project, indicating how employees are linked to various departments or projects.
1.8. Constraints
Constraints are rules or conditions imposed on the data in a database table that restrict the type of data
that can be inserted, updated, or deleted, ensuring the data adheres to certain standards and relationships.
Constraints help enforce data integrity and consistency by validating data according to predefined rules.
1. Data Integrity: Constraints ensure that the data remains accurate and reliable. They
enforce rules on the data to prevent entry of incorrect or inconsistent information.
2. Consistency: Constraints help maintain consistent data across the database by enforcing
rules and relationships between tables.
3. Enforcement: Constraints are enforced by the DBMS, which checks the data against
these rules during operations like insertion, updating, or deletion.
4. Error Prevention: By defining constraints, a DBMS prevents the entry of invalid data,
thereby reducing the risk of errors and inconsistencies.
1. Primary Key Constraint: Ensures that each row in a table has a unique identifier and
that no two rows have the same value for this identifier. It also enforces that the primary
key column(s) cannot contain NULL values.
2. Foreign Key Constraint: Ensures referential integrity by requiring that a value in one
table (the foreign key) must match a value in another table (the primary key or unique
key), maintaining valid relationships between tables.
3. Unique Constraint: Ensures that all values in a column or a combination of columns are
unique, meaning no two rows can have the same values for the specified columns. It
allows NULL values unless explicitly specified otherwise.
4. Not Null Constraint: Ensures that a column cannot contain NULL values, making it
mandatory for every row to have a valid value for that column.
5. Check Constraint: Ensures that values in a column satisfy a specified condition or
expression before they can be inserted or updated.
6. Default Constraint: Provides a default value for a column when no value is specified
during data insertion, ensuring that a column always has a valid value.
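The sketch below shows how these constraint types might be declared in standard SQL; the Employees/Departments schema is illustrative:

```sql
CREATE TABLE Departments (
    DepartmentID INTEGER PRIMARY KEY                  -- primary key constraint
);

CREATE TABLE Employees (
    EmployeeID   INTEGER PRIMARY KEY,                 -- unique, non-null identifier
    Email        VARCHAR(100) UNIQUE,                 -- unique constraint (NULLs allowed)
    Name         VARCHAR(100) NOT NULL,               -- not null constraint
    Salary       NUMERIC(10, 2) CHECK (Salary > 0),   -- check constraint
    HireDate     DATE DEFAULT CURRENT_DATE,           -- default constraint
    DepartmentID INTEGER REFERENCES Departments(DepartmentID)  -- foreign key constraint
);
```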
Accuracy: Ensures that data entered into the database is correct and adheres to predefined rules.
Reliability: Maintains the reliability of the database by enforcing consistent data entry and
relationships.
Business Rules Enforcement: Implements business logic directly in the database schema,
ensuring that business rules are followed.
1.9. Aggregation
Aggregation represents a loosely coupled "whole-part" relationship in which the parts can exist
independently of the whole.
1.9.1. Characteristics of Aggregation:
Loose Coupling: The lifecycle of the part (or component) is independent of the lifecycle of the
whole (or container). If the whole is deleted, the part can still exist.
Hierarchical Relationship: Aggregation implies a "has-a" relationship between the whole and its
parts.
1.9.2. Example:
Consider a Department and its Employees: a department is made up of employees, but if the
department is dissolved, the employees continue to exist and can belong to other departments.
1.10. Composition
Composition is a stronger form of aggregation and represents a more tightly coupled "whole-
part" relationship where the part cannot exist independently of the whole. The lifecycle of the
part is tied to the lifecycle of the whole.
1.10.1. Characteristics of Composition:
Strong Coupling: The part cannot exist without the whole. If the whole is deleted, the parts are
also deleted.
Ownership: The whole entity owns its parts, meaning the parts are created and destroyed with
the whole.
Exclusive Relationship: A part can only be associated with one whole at a time.
1.10.2. Example:
Consider a House and Room in a real estate database. A House is composed of multiple Rooms. If
the house is demolished, the rooms are no longer relevant and are also destroyed. Here, Room is a
part of House and cannot exist without it.
Aggregation is used when you want to represent a relationship where the parts can exist
independently of the whole. It is suitable for modeling scenarios where parts can be
shared among different wholes or where parts are not exclusively tied to one whole.
Composition is used when you need to represent a relationship where parts are
dependent on the whole and cannot exist separately. It is ideal for scenarios where the
whole and parts have a tightly coupled lifecycle and ownership.
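As a sketch, the two lifecycles can be approximated in SQL with foreign-key actions; the Department/Employee and House/Room tables mirror the examples above:

```sql
-- Aggregation: parts outlive the whole.
-- Deleting a Department leaves its Employees in place (FK becomes NULL).
CREATE TABLE Department (
    DepartmentID INTEGER PRIMARY KEY
);
CREATE TABLE Employee (
    EmployeeID   INTEGER PRIMARY KEY,
    DepartmentID INTEGER REFERENCES Department(DepartmentID)
                 ON DELETE SET NULL
);

-- Composition: parts are destroyed with the whole.
-- Deleting a House also deletes its Rooms.
CREATE TABLE House (
    HouseID INTEGER PRIMARY KEY
);
CREATE TABLE Room (
    RoomID  INTEGER PRIMARY KEY,
    HouseID INTEGER NOT NULL REFERENCES House(HouseID)
            ON DELETE CASCADE
);
```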
Database Management Systems (DBMS) provide several advantages over traditional file-based
data management systems. Here are some of the key benefits:
1. Data Integrity
Consistency: DBMS enforces constraints (e.g., primary keys, foreign keys) that ensure data
accuracy and consistency across the database.
Validation: Rules and constraints help prevent the entry of invalid data, reducing errors and
inconsistencies.
2. Data Security
Access Control: DBMS provides mechanisms for defining user roles and permissions, ensuring
that only authorized users can access or modify data.
Encryption: Many DBMSs support data encryption to protect sensitive information from
unauthorized access.
3. Reduced Data Redundancy
Normalization: DBMS uses normalization techniques to eliminate redundancy and ensure that
data is stored efficiently, reducing duplicate data entries.
4. Data Consistency
Transaction Management: DBMS supports transactions that ensure data is updated consistently.
Transactions follow the ACID (Atomicity, Consistency, Isolation, Durability) properties to
maintain data integrity even in the case of system failures.
Concurrency Control: DBMS handles multiple users accessing and modifying data
simultaneously, ensuring that transactions do not interfere with each other.
5. Improved Performance
Indexing: DBMS uses indexing techniques to speed up data retrieval operations, making queries
more efficient.
Query Optimization: Advanced query optimization techniques are employed to improve the
performance of data retrieval and manipulation.
6. Data Independence
Logical and Physical Data Independence: DBMS abstracts the physical storage details from the
users and applications, allowing changes to the database schema without affecting the application
programs that interact with the database.
7. Backup and Recovery
Automatic Backups: DBMS provides tools and mechanisms for automatic data backups and
recovery, ensuring that data can be restored in case of hardware failures, data corruption, or other
issues.
Recovery Mechanisms: DBMS includes recovery features to restore the database to a consistent
state after a failure or crash.
8. Data Sharing
Multi-user Environment: DBMS allows multiple users to access and share data concurrently,
supporting collaborative work and ensuring data consistency across the organization.
Data Integration: DBMS facilitates integration of data from various sources, providing a unified
view of the data.
9. Flexibility and Scalability
Data Modeling: DBMS allows for sophisticated data modeling, enabling the creation of complex
relationships and structures to represent real-world entities and their interactions.
Scalability: Modern DBMSs are designed to handle large volumes of data and can be scaled up
to meet increasing demands.
10. Reduced Costs
Centralized Management: Centralized data management reduces the cost and effort associated
with maintaining multiple data copies and ensures that updates are made in a single place.
Automation: DBMS automates many data management tasks, such as backups and indexing,
reducing manual effort and operational costs.
11. Improved Decision-Making
Reporting and Analysis: DBMS provides powerful tools for data reporting, analysis, and
visualization, enabling better decision-making based on accurate and up-to-date information.
Data Mining: Advanced DBMSs support data mining techniques to extract valuable insights
from large datasets.
12. Data Abstraction
Data Models: DBMS uses various data models (e.g., hierarchical, network, relational, object-
oriented) to provide an abstract view of the data, simplifying interaction and manipulation.
Unit 2
2.1 Relational Model
The relational model is a framework for managing and structuring data in a database. It was introduced by
E.F. Codd in 1970 and remains a foundational concept in database management systems (DBMS) today.
To query and manipulate data, the relational model uses formal languages:
Relational Algebra: A procedural query language that uses operations like selection, projection,
union, difference, and Cartesian product to retrieve and manipulate data.
Relational Calculus: A declarative query language that specifies what data to retrieve rather than
how to retrieve it. It includes tuple relational calculus and domain relational calculus.
Codd’s rules are a set of twelve principles proposed by Edgar F. Codd, the inventor of the relational
database model, to define what is required for a database system to be considered truly relational. These
rules are intended to ensure that a database management system (DBMS) adheres to the relational model's
principles and provides a consistent, flexible, and efficient data management environment.
1. **Information Rule**: All information in a relational database should be represented explicitly at the
logical level and in exactly one way — by storing it in tables. This means that data should be stored in
tables and all data should be accessible through these tables.
2. **Guaranteed Access Rule**: Every data element (atomic value) should be logically accessible by
using a combination of table name, primary key, and column name. This implies that you should be able
to retrieve any data value through a simple query based on its table and column.
3. **Systematic Treatment of Null Values**: The DBMS must support null values in a systematic way
for representing missing or inapplicable information, independent of data type.
4. **Dynamic On-Line Catalog Based on the Relational Model**: The database's catalog (metadata or
schema) should itself be stored in the database and accessible using the same query language as user data.
This means that the schema should be represented as tables that can be queried.
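For instance, in systems following this rule the catalog is queried like any other table; a PostgreSQL-flavored sketch (information_schema is part of the SQL standard, and 'public' is PostgreSQL's default schema):

```sql
-- List user tables by querying the catalog itself.
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public';
```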
5. **Comprehensive Data Sublanguage Rule**: The system must support a comprehensive language
that includes data definition, data manipulation, and transaction management operations. This language
should be capable of handling all aspects of database interactions.
6. **View Updating Rule**: Any view that is theoretically updatable must be updatable by the system.
In other words, if a view (a virtual table) is created from one or more base tables, it should be possible to
perform insert, update, and delete operations through this view if the view's structure permits it.
7. **High-Level Insert, Update, and Delete**: The system should support set-based operations for data
manipulation. Instead of operating on individual rows, the system should allow for batch operations on
sets of rows.
8. **Physical Data Independence**: Changes to the physical storage of data (e.g., changing the
hardware or storage structure) should not require changes to the logical schema (the way data is
represented and accessed). This means that the database's physical implementation details should be
abstracted away from the user.
9. **Logical Data Independence**: Changes to the logical schema (e.g., adding or removing tables or
columns) should not require changes to the application programs that use the data. This implies that
applications should be insulated from changes in the logical structure of the database.
10. **Integrity Independence**: Integrity constraints (rules that ensure data correctness) must be
specified separately from application programs and stored in the database catalog. This ensures that
constraints are maintained consistently across all applications.
11. **Distribution Independence**: The database should be able to operate and be accessed regardless
of whether the data is distributed across multiple locations or stored in a single location. This means the
system should handle data distribution transparently.
12. **Non-Subversion Rule**: If the system provides a low-level or non-relational interface (such as
file system access), it should not bypass or subvert the integrity and security rules enforced by the
relational model. In other words, access methods that bypass the relational model should still adhere to
the system’s rules and constraints.
These rules set a high standard for relational database systems and help ensure that they provide a robust,
flexible, and consistent environment for managing data. While not all modern relational database systems
meet every rule perfectly, these principles continue to influence database design and management
practices.
1. Tables (Relations)
Definition: A table, or relation, is a collection of tuples (rows) that share the same attributes
(columns).
Structure: Each table has a unique name and consists of rows and columns. Each column has a
specific data type, and each row represents a single record.
2. Attributes (Columns)
Definition: Attributes define the properties or fields of a table. Each attribute has a specific data
type and constraints.
Example: In a table for "Employees", attributes might include EmployeeID, FirstName,
LastName, and Department.
3. Tuples (Rows)
Definition: A tuple is a single record in a table. It consists of values for each attribute in the table.
Example: A row in the "Employees" table might have values like (1, 'John', 'Doe', 'Marketing').
4. Primary Key
Definition: A primary key is an attribute or a set of attributes that uniquely identifies each tuple
in a table. It ensures that no two rows have the same key value.
Example: In the "Employees" table, EmployeeID could be the primary key.
5. Foreign Key
Definition: A foreign key is an attribute in one table that refers to the primary key of another
table. It establishes a relationship between the two tables.
Example: If there's a "Departments" table with DepartmentID as the primary key, then the
"Employees" table might include a DepartmentID foreign key to link each employee to their
department.
6. Relationships
Types:
o One-to-One: A single record in one table is related to a single record in another table.
o One-to-Many: A single record in one table is related to multiple records in another table.
o Many-to-Many: Multiple records in one table are related to multiple records in another
table, typically managed using a junction table.
7. Normalization
Definition: Normalization is the process of organizing data to minimize redundancy and improve
data integrity. It involves dividing tables into related tables and defining relationships.
Forms:
o 1NF (First Normal Form): Ensures that each column contains atomic (indivisible)
values.
o 2NF (Second Normal Form): Achieved when the table is in 1NF and all non-key
attributes are fully functionally dependent on the primary key.
o 3NF (Third Normal Form): Achieved when the table is in 2NF and all attributes are
only dependent on the primary key, not on other non-key attributes.
8. Integrity Constraints
Definition: Integrity constraints are rules applied to the data to ensure accuracy and consistency.
Types:
o Entity Integrity: Ensures that each table has a primary key and that the primary key
values are unique and not null.
o Referential Integrity: Ensures that foreign key values in one table match primary key
values in another table or are null.
9. SQL (Structured Query Language)
Definition: SQL is the standard language used to interact with relational databases. It is used for
querying, updating, and managing data.
Operations: Includes commands such as SELECT, INSERT, UPDATE, DELETE, and CREATE
TABLE.
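A brief sketch of these commands against the illustrative Employees table used earlier:

```sql
-- Create the table, then exercise each command in turn.
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,
    FirstName  VARCHAR(50),
    LastName   VARCHAR(50),
    Department VARCHAR(50)
);

INSERT INTO Employees VALUES (1, 'John', 'Doe', 'Marketing');

SELECT FirstName, LastName FROM Employees WHERE Department = 'Marketing';

UPDATE Employees SET Department = 'Sales' WHERE EmployeeID = 1;

DELETE FROM Employees WHERE EmployeeID = 1;
```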
The relational model's power lies in its simplicity and the use of mathematical concepts, particularly set
theory, to model data and relationships. This structure allows for flexible querying and data manipulation,
which has made it the dominant paradigm in database management.
2.4 Key
In the relational data model, keys and integrity constraints are fundamental to maintaining the
accuracy, consistency, and reliability of the data stored in a relational database. Here's a detailed
overview of both concepts:
Keys are attributes or sets of attributes that are used to uniquely identify tuples (rows) within a
table or to establish relationships between tables. There are several types of keys:
Primary Key
Definition: A primary key is an attribute or set of attributes that uniquely identifies each
tuple in a table. Its values must be unique and cannot be null.
Example: EmployeeID in an Employees table.
Candidate Key
Definition: A candidate key is a minimal set of attributes that can uniquely identify each
tuple in a table. One of the candidate keys is chosen as the primary key.
Example: In an Employees table, EmployeeID, Email, and PhoneNumber could each be
candidate keys.
Alternate Key
Definition: An alternate key is any candidate key that is not chosen as the primary key. It
still provides a unique identification for rows in the table.
Example: If EmployeeID is the primary key in the Employees table, then Email and
PhoneNumber could be alternate keys.
Composite Key
Definition: A composite key is a primary key that consists of two or more attributes. It is
used when a single attribute alone cannot uniquely identify a row.
Example: In a table CourseEnrollments, a composite key might be a combination of
StudentID and CourseID to uniquely identify each enrollment.
Foreign Key
Definition: A foreign key is an attribute or set of attributes in one table that refers to the
primary key of another table. It is used to establish and enforce relationships between
tables.
Properties:
o Referential Integrity: Foreign key values must match primary key values in the
referenced table or be null.
Example: In a table Orders, CustomerID could be a foreign key that references the
CustomerID in the Customers table.
Integrity Constraints
Entity Integrity
Definition: Entity integrity ensures that each table has a primary key and that the primary
key values are unique and not null.
Rules:
o Primary Key Uniqueness: Each row in the table must have a unique primary key
value.
o Primary Key Non-nullability: Primary key columns cannot contain null values.
Purpose: Ensures that every record in the table can be uniquely identified.
Referential Integrity
Definition: Referential integrity ensures that a foreign key value in one table matches a
primary key value in the referenced table or is null.
Rules:
o Valid References: Foreign key values must exist in the referenced table or be
null.
o Actions on Update/Delete:
ON DELETE CASCADE: Automatically deletes child rows when a
parent row is deleted.
ON UPDATE CASCADE: Automatically updates child rows when a
parent key value is updated.
ON DELETE SET NULL: Sets the foreign key value to null in the child
table when the parent row is deleted.
ON DELETE RESTRICT: Prevents the deletion of a parent row if there
are related child rows.
Purpose: Maintains consistent and valid relationships between tables.
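A sketch of how these actions are attached to a foreign key, using the Orders/Customers example from above:

```sql
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY
);

CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customers(CustomerID)
               ON DELETE RESTRICT   -- block deleting a customer who still has orders
               ON UPDATE CASCADE    -- propagate key changes to child rows
);
```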
Domain Integrity
Definition: Domain integrity ensures that the values in a column are of the correct data
type and adhere to constraints defined for that column.
Rules:
o Data Type: The data type of column values must match the defined data type.
o Value Constraints: Constraints such as range, length, and format must be
enforced.
Purpose: Ensures that column values are within acceptable limits and conform to
specified formats.
User-Defined Integrity
Definition: User-defined integrity encompasses rules and constraints defined by the user
to enforce specific business rules and requirements not covered by the standard integrity
constraints.
Rules:
o Custom Constraints: Rules that reflect specific business logic or requirements.
Purpose: Allows the implementation of application-specific rules that maintain data
quality and consistency according to business needs.
2.5 Relational Algebra
1. Selection (σ)
The selection operation filters rows from a relation based on a specified condition. It is
analogous to the WHERE clause in SQL.
Notation: σ_condition(R)
Example: To find employees with a salary greater than $50,000 from the Employees relation:
σ_salary>50000(Employees)
2. Projection (π)
The projection operation selects specific columns from a relation, discarding the others. It is
similar to the SELECT clause in SQL.
Notation: π_columns(R)
Example: To retrieve the name and salary columns from the Employees relation:
π_name,salary(Employees)
3. Union (∪)
The union operation combines the tuples from two relations, eliminating duplicate tuples. The
relations must have the same schema (i.e., the same number and types of columns).
Example: To combine employees from two departments into one list: DeptA ∪ DeptB
4. Intersection (∩)
The intersection operation returns tuples that are present in both relations. The relations must
have the same schema.
Example: To find employees who are present in both DeptA and DeptB: DeptA ∩ DeptB
5. Difference (−)
The difference operation returns tuples that are in the first relation but not in the second. The
relations must have the same schema.
Example: To find employees in DeptA who are not in DeptB: DeptA − DeptB
6. Cartesian Product (×)
The Cartesian product operation returns all possible pairs of tuples from two relations. This
operation combines each tuple in the first relation with every tuple in the second relation.
7. Join (⨝)
The join operation combines tuples from two relations based on a common attribute. There are
several types of joins, including inner join, natural join, and equi join.
Theta Join: Combines tuples from two relations that satisfy an arbitrary condition θ.
Equi Join: A special case of theta join where the condition is equality.
8. Rename (ρ)
The rename operation changes the name of a relation or its attributes. It is used to give a new
name to a relation or to resolve attribute name conflicts.
Notation: ρ_new_name(R) or ρ_new_name(attributes)(R)
Additional Operations
Some additional operations that can be derived from these basic operations include:
Division (÷): Useful for queries that require finding tuples in one relation that match all
tuples in another relation.
Example: To find employees who have worked on all projects listed in the Projects
relation.
Understanding these operations is crucial for working with relational databases and forms the
basis for more advanced query optimization and data manipulation techniques.
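For reference, a hedged sketch of how each algebra operation corresponds to SQL (Employees, DeptA, DeptB, and Departments are the illustrative relations used above; INTERSECT and EXCEPT support varies by DBMS):

```sql
-- Selection: σ_salary>50000(Employees)
SELECT * FROM Employees WHERE salary > 50000;

-- Projection: π_name,salary(Employees)
SELECT name, salary FROM Employees;

-- Union, intersection, and difference (set operators)
SELECT * FROM DeptA UNION SELECT * FROM DeptB;
SELECT * FROM DeptA INTERSECT SELECT * FROM DeptB;
SELECT * FROM DeptA EXCEPT SELECT * FROM DeptB;

-- Cartesian product: Employees × Departments
SELECT * FROM Employees CROSS JOIN Departments;

-- Equi join on a common attribute
SELECT * FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID;
```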
Advantages of Relational Operations
1. Expressiveness:
o Powerful Queries: Relational operations, such as selection, projection, join, and union,
enable expressive and complex queries. SQL provides a rich set of operations to retrieve
and manipulate data in sophisticated ways.
2. Declarative Nature:
o Declarative Queries: In relational algebra and SQL, users specify what data they want
without detailing how to obtain it. This declarative approach simplifies query writing and
focuses on the desired result rather than the procedural steps.
3. Data Integration:
o Join Operations: Operations like joins allow for combining data from multiple tables
based on related columns, enabling users to integrate and analyze data from different
sources within a single query.
4. Data Integrity:
o Constraints and Rules: Relational operations often include mechanisms to enforce data
integrity and consistency, such as primary key constraints, foreign key constraints, and
domain constraints.
5. Normalization Support:
o Decomposition and Recombination: Operations such as projection and join make it
possible to decompose tables into normalized form and reassemble them in queries.
6. Flexibility:
o Dynamic Queries: Users can write dynamic and flexible queries that adapt to changing
requirements, making relational databases versatile for various applications and reporting
needs.
7. Optimization:
o Automatic Query Optimization: Because queries are declarative, the DBMS can choose
efficient execution plans on the user's behalf.
Limitations of Relational Operations
1. Performance Overheads:
o Join Costs: Complex operations such as joins over large tables can be computationally
expensive, particularly without suitable indexes.
2. Scalability Constraints:
o Distributed Scaling: Relational systems can be harder to scale horizontally across many
machines than some non-relational alternatives.
3. Schema Rigidity:
o Schema Changes: Changes to the schema, such as adding or modifying columns, can be
complex and disruptive, requiring careful planning and potentially affecting existing
operations and queries.
4. Complexity in Query Writing:
o Complex Joins: Writing queries that involve complex joins or nested subqueries can be
challenging and error-prone, especially for users who are not familiar with advanced SQL
techniques.
5. Handling Unstructured Data:
o Limited Support: Relational operations are optimized for structured data. Handling
unstructured or semi-structured data, such as text or multimedia content, often requires
additional processing or adaptations.
2.8 Tuple Relational Calculus (TRC)
Concept: Tuple Relational Calculus allows users to describe what data to retrieve based
on a set of conditions applied to tuples (rows) in a relation (table). The result is a set of
tuples that satisfy the specified conditions.
Syntax Example: To find all employees in the "Employees" table who work in
department 10: { t | t ∈ Employees ∧ t.DepartmentID = 10 }
2.9 Domain Relational Calculus (DRC)
1. Domain Variables:
Definition: Domain variables are used to represent the values of attributes in the database. Each
domain variable corresponds to a particular attribute or set of attributes.
Example: If you have a relation Employee with attributes EmployeeID, Name, and Salary, you
might use domain variables like EID, N, and S to represent values for these attributes.
2. Query Form:
Syntax: A DRC query is expressed in the form of a logical predicate. It specifies the conditions
that must be satisfied by the domain variables.
Form: {D | P(D)} where D is a domain variable and P(D) is a predicate (condition) that the values
of D must satisfy.
3. Predicate (Condition):
Definition: The predicate is a logical expression that specifies the conditions that the values of
the domain variables must meet.
Example: If you want to retrieve the names of employees who earn more than $50,000, the
predicate could be Salary > 50000.
Problem: Retrieve the names of employees with a salary greater than $50,000.
Query: { <N> | ∃ EID ∃ Salary ( Employee(EID, N, Salary) ∧ Salary > 50000 ) }
where:
Employee(EID, N, Salary) represents a tuple in the Employee relation where EID is the employee ID,
N is the employee's name, and Salary is the salary.
Salary > 50000 is the condition that the Salary must meet.
Declarative Nature: DRC allows users to specify what data they want without needing to specify
how to retrieve it, focusing on the conditions rather than the procedure.
Formal Foundation: It provides a formal foundation for querying relational databases and helps
in understanding the theoretical aspects of query languages.
Domain Relational Calculus: Works with domain variables representing individual attribute
values.
Tuple Relational Calculus: Works with tuple variables representing entire rows or records.
SQL: The practical query language commonly used with relational databases. SQL is largely
declarative, though less purely so than DRC, and includes procedural extensions.
2.10 QBE
Query By Example (QBE) is a database query language used for querying relational databases.
It provides a user-friendly, graphical way to formulate queries by specifying example data rather
than writing complex code. Here’s a detailed overview of QBE, including its features,
advantages, and limitations:
Concept:
QBE allows users to construct queries by filling out a template or form that represents the
structure of the database. Users specify example values in the form to indicate what they
want to retrieve.
How It Works:
Users interact with a graphical interface where they specify example records or
conditions in a template. The system then translates these examples into a formal query to
retrieve the matching records from the database.
1. Template Presentation:
o Users are presented with a form or template that mirrors the structure of the
database tables. Fields in the form correspond to columns in the tables.
2. Criteria Specification:
o Users provide example values or conditions in the fields to specify what data they
are interested in. For instance, if a user wants to find all employees in a specific
department, they enter the department number in the relevant field.
3. Query Execution:
o The QBE system translates the example-based criteria into a query language such
as SQL. It then executes the query against the database and returns the results.
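An illustrative sketch (grid layouts vary by implementation) of a QBE template and the SQL a system might generate from it:

```sql
-- QBE template (illustrative):
--   Employees | EmployeeID | Name | Department
--             |            | P.   | 10
-- "P." marks a column to print; "10" is the example condition.
-- A system might translate this template into:
SELECT Name
FROM Employees
WHERE Department = 10;
```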
2.10.3 Advantages of QBE:
1. User-Friendly:
o Intuitive Interface: Queries are built by example through a visual form, making QBE
accessible to users without programming experience.
2. Reduced Complexity:
o No Need for Syntax Knowledge: Users do not need to know the syntax of query
languages like SQL. Instead, they interact with a straightforward form or
template.
3. Rapid Query Construction:
o Rapid Prototyping: Users can quickly construct and modify queries by filling
out and adjusting the example fields in the form.
4. Error Reduction:
o Minimized Syntax Errors: Since users are interacting with a graphical interface
rather than writing code, there is less risk of syntax errors in the queries.
5. Visual Feedback:
o Immediate Results: Users can see how their criteria affect the results, providing
immediate feedback and making it easier to refine queries.
2.10.4 Limitations of QBE:
1. Limited Expressiveness:
o Complex Queries: QBE may struggle with more complex queries that require
advanced operations, such as complex joins, aggregations, or nested subqueries.
2. Less Flexibility:
o Template Constraints: The predefined forms and templates may limit the
flexibility of query construction, making it harder to perform unconventional
queries.
3. Vendor Dependency:
o Implementation Variability: Different database systems may have variations in
their QBE implementations, leading to inconsistencies and potentially reducing
portability.
4. Performance Concerns:
o Query Optimization: QBE interfaces may not always generate the most efficient
queries, and the performance of queries generated through QBE can vary
depending on the system’s optimization capabilities.
Unit 3
3.1 Structure of Relational Database.
A relational database is organized in a way that facilitates the management and retrieval of data
through a structured approach. The key components and structures of a relational database
include:
1. Tables
Definition: Tables are the fundamental building blocks of a relational database. They store data
in rows and columns.
Structure: Each table has a name and is composed of rows (records) and columns (fields or
attributes).
Example: A table named Employees might have columns like EmployeeID, FirstName, LastName,
and HireDate.
2. Rows
Definition: Rows, also known as records or tuples, represent individual entries in a table.
Structure: Each row contains data corresponding to each column in the table.
3. Columns
Definition: Columns define the attributes or fields of the data stored in a table.
Structure: Each column has a name and a data type (e.g., INTEGER, VARCHAR, DATE).
4. Primary Key
Definition: A primary key is a column or a set of columns that uniquely identifies each row in a
table.
Characteristics: The values in the primary key column(s) must be unique and not null.
5. Foreign Key
Definition: A foreign key is a column or a set of columns in one table that refers to the primary
key of another table.
Purpose: It establishes a relationship between two tables and ensures referential integrity.
Example: An Orders table might have a CustomerID column that references the CustomerID
column in a Customers table.
6. Indexes
Definition: Indexes are used to improve the speed of data retrieval operations on a table.
Purpose: They create a data structure that allows for faster searches, sorting, and querying.
Types: Common types include primary key indexes and secondary indexes.
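A minimal sketch of creating a secondary index (names illustrative):

```sql
-- Speeds up lookups and sorts on LastName, at the cost of extra storage
-- and slightly slower inserts and updates.
CREATE INDEX idx_employees_lastname ON Employees (LastName);
```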
7. Views
Definition: Views are virtual tables created by querying one or more tables.
Purpose: They provide a way to present data in a specific format or subset without altering the
actual tables.
Example: A view might combine data from the Employees and Departments tables to show
employees and their department names.
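A sketch of such a view; the join columns DepartmentID and DepartmentName are assumptions:

```sql
CREATE VIEW EmployeeDepartments AS
SELECT e.EmployeeID, e.FirstName, e.LastName, d.DepartmentName
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID;
```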
8. Relationships
Types:
o One-to-One: A single record in one table is related to a single record in another table.
o One-to-Many: A single record in one table is related to multiple records in another table.
o Many-to-Many: Multiple records in one table are related to multiple records in another
table, typically managed using a junction table.
3.2 Relational Database Design
Relational database design is a critical aspect of organizing and managing data in a relational
database management system (RDBMS). The primary goal is to structure data in a way that
minimizes redundancy and maintains data integrity, making it easy to access, modify, and
manage. Here's a comprehensive introduction to the key concepts and principles involved in
relational database design:
1. Basic Concepts
Relational Model: Proposed by Edgar F. Codd in 1970, the relational model represents
data as tables (or relations). Each table consists of rows (tuples) and columns (attributes).
Tables can be related to each other through common attributes.
Tables (Relations): The core structure of a relational database. Each table represents an
entity (e.g., customers, orders) and contains rows (records) and columns (fields).
Rows (Tuples): Individual records within a table. Each row represents a unique instance
of the entity described by the table.
2. Keys
Primary Key: A unique identifier for each row in a table. No two rows can have the
same primary key value, ensuring each record is unique. Example: customer_id in a
Customers table.
Foreign Key: A field (or a collection of fields) in one table that uniquely identifies a row
of another table. It establishes a relationship between two tables. Example: customer_id in
an Orders table that refers to the customer_id in the Customers table.
Composite Key: A primary key that consists of two or more columns. This is used when
a single column is not sufficient to uniquely identify a record.
3. Normalization
Normalization is the process of organizing a database to reduce redundancy and improve data
integrity. It involves dividing large tables into smaller, related tables and defining relationships
between them. The process typically involves several normal forms:
First Normal Form (1NF): Ensures that each column contains atomic (indivisible)
values, and each record is unique. This means no repeating groups or arrays in a single
column.
Second Normal Form (2NF): Builds on 1NF by ensuring that all non-key attributes are
fully functionally dependent on the entire primary key. It removes partial dependencies.
Third Normal Form (3NF): Ensures that all attributes are dependent only on the
primary key and not on other non-key attributes. It eliminates transitive dependencies.
Boyce-Codd Normal Form (BCNF): A stronger version of 3NF that handles certain
types of anomaly that 3NF does not cover.
4. Relationships
One-to-One (1:1): A record in Table A is related to at most one record in Table B, and vice versa.
One-to-Many (1:N): A single record in Table A is related to multiple records in Table B, while
each record in Table B relates to only one record in Table A. Example: A customer and their orders.
Many-to-Many (M:N): Multiple records in Table A are related to multiple records in Table B.
This requires a junction table to manage the relationship. Example: Students and courses, where
each student can enroll in many courses and each course can have many students.
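A sketch of the junction table for the students-and-courses example (names illustrative):

```sql
CREATE TABLE Students (
    student_id INTEGER PRIMARY KEY
);

CREATE TABLE Courses (
    course_id INTEGER PRIMARY KEY
);

-- Junction table: one row per enrollment; the composite primary key
-- prevents a student from enrolling in the same course twice.
CREATE TABLE Enrollments (
    student_id INTEGER REFERENCES Students(student_id),
    course_id  INTEGER REFERENCES Courses(course_id),
    PRIMARY KEY (student_id, course_id)
);
```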
5. Indexes
Indexes improve the speed of data retrieval operations on a database at the cost of additional
storage space and potential decrease in performance for data modification operations. They are
used to quickly locate rows based on the values of specific columns.
6. Data Integrity
Data integrity ensures that the data is accurate and consistent. It is enforced through:
Entity Integrity: Ensures that each table has a primary key and that it is unique and not
null.
Referential Integrity: Ensures that foreign keys in a table match primary keys in another
table, maintaining consistent relationships.
Domain Integrity: Ensures that all values in a column adhere to a defined domain (e.g.,
data types, ranges).
The objectives of a relational database are centered around managing and organizing data
efficiently while ensuring its integrity and usability. Here are the key objectives:
1. Data Integrity: Ensure that data is accurate and consistent. Relational databases use
constraints like primary keys, foreign keys, and unique constraints to maintain data
accuracy and prevent anomalies.
2. Data Independence: Provide a clear separation between the data and the applications
that use it. This allows changes to the database schema without affecting the applications
that access the data.
3. Data Consistency: Ensure that data remains consistent across the database by enforcing
rules and relationships. This helps in maintaining uniformity and reducing redundancy.
4. Efficient Data Retrieval: Optimize the storage and retrieval of data. Relational databases
use indexing, query optimization, and efficient data retrieval methods to handle large
volumes of data and complex queries effectively.
5. Ease of Data Manipulation: Allow for easy insertion, updating, and deletion of data.
SQL (Structured Query Language) is commonly used for these operations, providing a
standardized way to interact with the data.
6. Scalability: Handle growing amounts of data and increasing numbers of users without
significant performance degradation. This involves both horizontal (scaling out) and
vertical (scaling up) scaling techniques.
7. Data Security: Protect data from unauthorized access and breaches. Relational databases
provide various security features such as user authentication, access control, and
encryption.
8. Transaction Management: Ensure that database operations are processed reliably and
adhere to the principles of ACID (Atomicity, Consistency, Isolation, Durability). This
guarantees that transactions are completed successfully or rolled back entirely in case of
failure.
11. Data Redundancy Reduction: Minimize data duplication by normalizing the database
schema, which helps in organizing data into related tables to avoid redundant data
storage.
1. Database Management Systems (DBMS): The core software for creating, storing, and
managing databases.
o PostgreSQL: An open-source DBMS known for its advanced features and compliance
with SQL standards.
o Oracle Database: A commercial DBMS known for its robust features and scalability.
o Microsoft SQL Server: A commercial DBMS developed by Microsoft, known for its
integration with other Microsoft products.
2. Database Design Tools: Software for designing and managing database schemas.
o MySQL Workbench: A visual tool for MySQL that includes data modeling, SQL
development, and server administration.
o Oracle SQL Developer: A free tool for database development and management with
Oracle databases.
o Microsoft SQL Server Management Studio (SSMS): A tool for managing SQL Server
databases.
3. Query Tools: Tools or interfaces for executing SQL queries and managing database
interactions.
o DataGrip: A database IDE by JetBrains that supports multiple databases and provides a
powerful SQL editor.
4. Data Migration and Integration Tools: Tools for moving data between databases or
integrating databases with other systems.
o Apache NiFi: An open-source data integration tool that supports data flow management.
5. Backup and Recovery Tools: Tools for backing up and restoring database contents.
o Backup tools provided by the DBMS: Most DBMSs come with built-in backup and
recovery features.
o Third-Party Backup Solutions: Tools like Rubrik or Veeam can be used for more
comprehensive backup solutions.
In relational databases, redundancy and data anomalies can significantly impact data integrity
and efficiency. Understanding these issues helps in designing robust databases. Here’s a detailed
look at redundancy and data anomalies in relational databases:
Redundancy
Redundancy refers to the unnecessary repetition of data within a database. While some level of
redundancy can be intentional and useful (e.g., for optimization), excessive redundancy can lead
to several problems:
Increased Storage Costs: Storing redundant data consumes additional disk space, which
could otherwise be used more efficiently.
Performance Degradation: Redundant data can slow down query performance, as the
system may need to sift through more data than necessary.
Examples of Redundancy
Storing the same information in multiple tables: For example, if a customer’s address
is stored in both the Customers and Orders tables, any updates to the address must be made
in both places.
Duplicated records: If a table contains multiple rows with the same information, it can
lead to inefficient use of resources.
Data Anomalies
Data anomalies are irregularities or inconsistencies in a database that arise due to poor design or
improper handling of data. There are several types of anomalies:
1. Update Anomaly
Occurs when changes to data in one place must be applied in multiple locations. Failure
to update all instances can lead to inconsistencies.
o Example: If a customer’s address is stored in multiple tables, updating the address in one
table but not the others results in differing address information across the database.
2. Insert Anomaly
Occurs when the structure of the database prevents certain types of data from being
entered unless other related data is also provided.
o Example: If customer details can only be stored together with an order, a new customer
cannot be added until they place an order.
3. Delete Anomaly
Happens when deleting a record results in the unintended loss of other related data.
o Example: If deleting a customer’s record from a table that also contains order
information results in the loss of all associated orders, even if the order data is still
needed for historical purposes.
Normalization is a key technique used to address redundancy and data anomalies. It involves
decomposing tables to eliminate redundant data and ensure that each piece of information is
stored only once. The process typically includes several normal forms:
1. First Normal Form (1NF): Ensures that each column contains only atomic (indivisible)
values and each record is unique. It eliminates repeating groups and arrays.
2. Second Normal Form (2NF): Builds on 1NF by ensuring that all non-key attributes are
fully functionally dependent on the entire primary key. It removes partial dependencies.
3. Third Normal Form (3NF): Ensures that all attributes are dependent only on the
primary key and not on other non-key attributes. It eliminates transitive dependencies.
4. Boyce-Codd Normal Form (BCNF): Addresses certain types of anomalies not handled
by 3NF by ensuring that every determinant is a candidate key.
5. Fourth Normal Form (4NF) and Fifth Normal Form (5NF): Deal with multi-valued
dependencies and join dependencies, respectively, further refining the database design.
Example Scenario
Consider a database for a retail store with a table that combines customer and order information:
Redundancy: The address for Alice Smith is repeated for each of her orders.
Update Anomaly: If Alice Smith’s address changes, it must be updated in multiple rows.
Insert Anomaly: To insert a new customer, you might need to also insert an order, which
is not always appropriate.
Delete Anomaly: Deleting Alice Smith’s record would also remove all her orders, even if
you want to retain historical order data.
Normalization would involve splitting this table into separate tables for customers and
orders:
1. Customers Table (CustomerID, Name, Address): each customer's address is stored exactly once.
2. Orders Table (OrderID, CustomerID, Product, Quantity):
OrderID CustomerID Product Quantity
102 1 Widget B 10
103 2 Widget C 3
By separating the tables, you minimize redundancy, making updates easier and more reliable,
and avoiding anomalies in data management.
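The decomposition can be expressed directly in SQL. The following is a minimal sketch (table and column names are illustrative, inferred from the scenario above rather than taken from it): the address lives only in customers, and orders refers back to it through a foreign key.

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100),
    address     VARCHAR(255)  -- stored once per customer, so an address change is a single-row update
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    product     VARCHAR(100),
    quantity    INT,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)  -- links each order to exactly one customer
);

With this split, deleting an order no longer touches customer data, and a customer can be inserted before any order exists.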
Formally, for a relation R and attributes X and Y in R:
X functionally determines Y (denoted X → Y) if and only if, for any two tuples in R that agree
on the X attributes, they must also agree on the Y attributes.
In other words, if two rows have the same value for X, they must have the same value for Y.
Single Attribute FD: In a table where EmployeeID is unique for each employee, we can
say EmployeeID → EmployeeName. This means that for a given EmployeeID, there is a unique
EmployeeName.
Composite FD: In a table with CourseID and StudentID, if each combination of CourseID and
StudentID uniquely determines a Grade, we can write this as (CourseID, StudentID) → Grade.
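In SQL, such a dependency is typically enforced with a key. Here is a minimal sketch (the table and its column types are illustrative assumptions, not from the text above), where the composite primary key guarantees exactly one Grade per (CourseID, StudentID) pair:

CREATE TABLE grades (
    CourseID  VARCHAR(10),
    StudentID INT,
    Grade     CHAR(2),
    PRIMARY KEY (CourseID, StudentID)  -- enforces (CourseID, StudentID) → Grade
);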
5. Armstrong's Axioms
Reflexivity: If Y is a subset of X, then X → Y.
Augmentation: If X → Y, then XZ → YZ for any set of attributes Z.
Transitivity: If X → Y and Y → Z, then X → Z.
These axioms are used to infer all possible functional dependencies that hold in a given relation.
The closure of a set of functional dependencies, denoted F⁺, is the set of all functional
dependencies that can be inferred from the original set F using Armstrong's axioms.
Understanding functional dependencies can also aid in query optimization by allowing the
database management system to make better decisions about indexing and query execution plans
based on the expected relationships between data attributes.
A functional dependency, denoted X → Y, between two sets of attributes X and Y in a relational
schema means that if two rows have the same value for X, they must also have the same value
for Y. In other words, X functionally determines Y.
Components
1. Left-hand Side (LHS): The set of attributes X on which the dependency is based.
2. Right-hand Side (RHS): The set of attributes Y that are determined by X.
3.7 Normalization
Normalization is a process in database design used to organize data to reduce redundancy and
improve data integrity. The main goal of normalization is to ensure that the database is free from
undesirable characteristics like insertion, update, and deletion anomalies. This is achieved by
dividing a database into two or more tables and defining relationships between them.
Normalization Process
Normalization involves decomposing a database schema into a set of tables and establishing
constraints to maintain data integrity. It typically involves several stages or normal forms, each
addressing specific types of redundancy and anomalies.
First Normal Form (1NF)
Objective: Ensure that the table is a relation, meaning it has a clear structure with atomic values.
Criteria:
Each column must contain only atomic (indivisible) values, each record must be unique, and
there must be no repeating groups or arrays.
Example:
A column holding several phone numbers in a single field violates 1NF; each phone number
should be stored in its own row (or in a separate table).
Second Normal Form (2NF)
Objective: Ensure that all non-key attributes are fully functionally dependent on the entire
primary key.
Criteria:
There should be no partial dependency of any column on the primary key. Each non-key attribute
must be fully dependent on the entire primary key.
Example:
Suppose a table stores StudentID, CourseID, and Instructor, where Instructor depends only on
CourseID (a partial dependency). To achieve 2NF, split it into:
1. StudentCourse Table:
StudentID CourseID
101 CS101
101 CS102
2. Instructor Table (CourseID, Instructor): one row per course, so the instructor is recorded once
regardless of how many students take the course.
Third Normal Form (3NF)
Objective: Ensure that non-key attributes are not dependent on other non-key attributes
(transitive dependency).
Criteria:
There should be no transitive dependencies; all non-key attributes must be dependent only on the
primary key.
Example:
Suppose the table also stores InstructorPhone. Here, InstructorPhone depends on Instructor,
not directly on the primary key (StudentID, CourseID), which is a transitive dependency.
To achieve 3NF, separate into:
1. StudentCourse Table:
StudentID CourseID
101 CS101
101 CS102
2. Instructor Table:
CourseID Instructor
3. InstructorPhone Table:
Instructor InstructorPhone
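As a sketch in SQL (column types are illustrative assumptions), the 3NF decomposition above becomes three tables, each keyed by the attribute that actually determines its contents:

CREATE TABLE student_course (
    StudentID INT,
    CourseID  VARCHAR(10),
    PRIMARY KEY (StudentID, CourseID)
);

CREATE TABLE course_instructor (
    CourseID   VARCHAR(10) PRIMARY KEY,  -- CourseID → Instructor
    Instructor VARCHAR(100)
);

CREATE TABLE instructor_phone (
    Instructor      VARCHAR(100) PRIMARY KEY,  -- Instructor → InstructorPhone
    InstructorPhone VARCHAR(20)
);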
Boyce-Codd Normal Form (BCNF)
Objective: Handle anomalies not addressed by 3NF by ensuring that every determinant is a
candidate key.
Criteria:
For every non-trivial functional dependency X → Y that holds in the table, X must be a
candidate key (or superkey).
Example:
If a table of (StudentID, CourseID, Instructor) has the dependency Instructor → CourseID but
Instructor is not a candidate key, the table can satisfy 3NF yet violate BCNF; decomposing it so
that Instructor becomes the key of its own table restores BCNF.
1. ACID Properties
o Atomicity: Ensures that all operations within a transaction are completed successfully or
none are applied. If any operation fails, the transaction is aborted, and all changes are
rolled back to maintain data integrity.
o Consistency: Ensures that a transaction brings the database from one consistent state to
another consistent state. This means that all data integrity constraints must be satisfied
after the transaction.
o Isolation: Ensures that the operations of one transaction are isolated from those of other
concurrent transactions. This means the intermediate state of a transaction is invisible to
other transactions.
o Durability: Ensures that once a transaction has been committed, its changes are
permanent and will survive any subsequent system failures.
2. Transaction States
o Active: The transaction is currently executing its operations.
o Partially Committed: The transaction has executed its final operation but has not yet
been committed.
o Committed: All operations of the transaction have been completed successfully, and
changes are permanently applied to the database.
o Failed: The transaction cannot proceed due to some error and must be rolled back.
o Aborted: The transaction has been rolled back, and all changes have been undone.
3. Transaction Control Commands
o COMMIT: Saves all changes made during the transaction to the database, making them
permanent.
o ROLLBACK: Undoes all changes made during the transaction, reverting the database to
its state before the transaction began.
o SAVEPOINT: Sets a point within a transaction to which you can later roll back without
affecting the entire transaction.
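A minimal sketch of these commands working together (table and savepoint names are illustrative; exact SAVEPOINT syntax varies slightly by DBMS):

BEGIN;
INSERT INTO orders (order_id, customer_id) VALUES (201, 1);
SAVEPOINT order_added;                        -- mark a point we can return to
INSERT INTO order_items (order_id, product) VALUES (201, 'Widget A');
ROLLBACK TO SAVEPOINT order_added;            -- undo only the item insert, keep the order
COMMIT;                                       -- the order row is made permanent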
4. Concurrency Control
o Locking: Prevents multiple transactions from accessing the same data concurrently in
conflicting ways. Locks can be shared (read) or exclusive (write).
o Deadlock: A situation where two or more transactions are waiting for each other to
release locks, causing all of them to remain blocked. Deadlock detection and resolution
mechanisms are used to handle such scenarios.
o Isolation Levels: Define the degree to which the operations in one transaction are
isolated from those in other concurrent transactions. Common isolation levels include:
Read Uncommitted: Permits dirty reads; a transaction may see uncommitted
changes made by others.
Read Committed: Ensures that only committed data is read, avoiding dirty reads
but allowing non-repeatable reads.
Repeatable Read: Guarantees that a row read twice within a transaction returns
the same values, though phantom rows may still appear.
Serializable: The strictest level; concurrent transactions behave as if executed
one after another. (A session can usually choose its level explicitly, as sketched below.)
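For instance, the SQL-standard statement for choosing a session's isolation level (supported with minor syntax variations by most DBMSs) is:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;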
5. Logging and Recovery
o Write-Ahead Logging (WAL): Logs changes before they are applied to the database.
This ensures that if a system failure occurs, the changes can be recovered using the log.
o Checkpointing: Periodically saves the state of the database to reduce recovery time. A
checkpoint is a point in time where all changes up to that point are written to disk.
Consider a banking application where a transaction transfers money from one account to another.
This operation involves two steps:
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;  -- debit the source account
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;  -- credit the destination account
COMMIT;
If an error occurs after the first update but before the commit, a rollback will be issued to undo
the changes:
ROLLBACK;
1. Confidentiality
o Objective: Prevent unauthorized disclosure of sensitive data.
o Techniques:
Access Control: Restricts who can view or modify data. Includes role-based
access control (RBAC), discretionary access control (DAC), and mandatory
access control (MAC).
2. Integrity
o Objective: Protect data from unauthorized or accidental modification.
o Techniques:
Data Validation: Ensures that only valid data is entered into the database.
Includes checks like data type validation, range checks, and format checks.
Constraints: Define rules that data must adhere to, such as primary keys, foreign
keys, unique constraints, and check constraints.
Audit Trails: Track changes to data and database operations to detect and
analyze unauthorized changes or anomalies.
3. Availability
o Objective: Ensure that data is accessible when needed, even in the face of failures or
attacks.
o Techniques:
Backup and Recovery: Regularly back up data and have recovery procedures in
place to restore data in case of loss or corruption.
4. Authentication
o Objective: Verify the identity of users before granting access to the database.
o Techniques:
Passwords, multi-factor authentication (MFA), and integration with external
identity providers such as LDAP or Kerberos.
5. Authorization
o Objective: Control what authenticated users are allowed to do within the database.
o Techniques:
Privileges and Roles: Assign specific permissions to users or roles, such as read,
write, update, or delete access.
6. Auditing
o Objective: Monitor and record database activities to detect and respond to potential
security incidents.
o Techniques:
Audit Logs: Maintain records of user actions, data changes, and system access.
Logs should be securely stored and regularly reviewed.
7. Data Masking
o Objective: Hide sensitive values from users and environments that do not need to see
them, while keeping the data usable.
o Techniques:
Static Data Masking: Replaces sensitive data with anonymized data in non-
production environments.
Dynamic Data Masking: Hides sensitive data in real-time for users who do not
have the necessary permissions.
8. Vulnerability Management
o Objective: Identify and remediate security weaknesses in the DBMS and its environment.
o Techniques:
Regular patching, security updates, and periodic vulnerability scans of the
database and the underlying infrastructure.
Best Practices for Database Security
1. Least Privilege Principle: Grant users the minimum level of access necessary to perform their
job functions (see the sketch after this list).
2. Regular Backups: Schedule frequent backups and test recovery procedures to ensure data can be
restored in case of failure or corruption.
3. Encryption: Encrypt sensitive data both at rest and in transit to protect it from unauthorized
access.
4. Secure Configuration: Follow security best practices for configuring the DBMS, including
disabling unused features and changing default settings.
5. Monitoring and Auditing: Implement continuous monitoring and auditing to detect and respond
to suspicious activities or security breaches.
6. User Training: Educate users about security best practices, including password management and
recognizing phishing attempts.
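As a sketch of the least-privilege principle in SQL (the role and table names are hypothetical, and role syntax varies by DBMS):

-- Create a role that can only read the employees table.
CREATE ROLE reporting_reader;
GRANT SELECT ON employees TO reporting_reader;
-- Remove any broader rights the role should not have.
REVOKE INSERT, UPDATE, DELETE ON employees FROM reporting_reader;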
Unit 4
4.1 SQL
SQL (Structured Query Language) is a standardized language used to manage and manipulate
relational databases. It allows users to perform various operations such as querying data,
updating records, inserting new data, and deleting existing data.
SQL (Structured Query Language)
Components:
o DDL (Data Definition Language): Defines database structures (e.g., CREATE, ALTER,
DROP).
o DML (Data Manipulation Language): Manages data (e.g., SELECT, INSERT, UPDATE,
DELETE).
o DCL (Data Control Language): Controls access to data (e.g., GRANT, REVOKE).
Operations:
Joins: Combines data from multiple tables (e.g., INNER JOIN, LEFT JOIN).
4.2 Common SQL Commands
SQL (Structured Query Language) is used to manage and manipulate relational databases. Here's a quick
overview of some common SQL commands, categorized by their purpose:
CREATE TABLE: Creates a new table with named, typed columns.
CREATE TABLE table_name (
column1 datatype,
column2 datatype
);
JOIN: Combines rows from two or more tables based on a related column.
SELECT * FROM table1 JOIN table2 ON table1.id = table2.foreign_id;
4.3 Data Types
In SQL, data types are crucial for defining the kind of data that can be stored in a column of a
table. Here's an overview of common SQL data types:
1. Numeric Types: e.g., INT, DECIMAL, FLOAT.
2. String Types: e.g., CHAR, VARCHAR, TEXT.
3. Date and Time Types: e.g., DATE, TIME, DATETIME, TIMESTAMP.
4. Binary Types: e.g., BINARY, VARBINARY, BLOB.
5. Boolean Type: BOOLEAN, storing TRUE or FALSE values.
6. Special Types
ENUM: A string object with a value chosen from a list of permitted values.
SET: A string object that can have zero or more values, each of which must be chosen from a list.
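A short sketch pulling these together (the products table is illustrative; ENUM and SET are MySQL-specific):

CREATE TABLE products (
    product_id  INT,                 -- numeric
    name        VARCHAR(100),        -- string
    released_on DATE,                -- date and time
    photo       BLOB,                -- binary
    in_stock    BOOLEAN,             -- boolean
    size        ENUM('S', 'M', 'L')  -- special (MySQL)
);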
4.4 DDL
DDL, or Data Definition Language, is a subset of SQL (Structured Query Language) used to
define and manage database structures. Here are some key DDL commands:
CREATE: Used to create new database objects, such as tables, indexes, and views.
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
hire_date DATE
);
ALTER: Used to modify existing database objects. This can include adding or dropping
columns, changing data types, etc.
ALTER TABLE employees ADD COLUMN salary DECIMAL(10, 2);
DROP: Used to delete database objects. Be careful, as this action is usually irreversible.
DROP TABLE employees;
TRUNCATE: Used to remove all records from a table, but it keeps the table structure for
future use.
TRUNCATE TABLE employees;
4.5 Selection
In SQL, the SELECT statement is used to query and retrieve data from a database. Here are
some key components and examples of how to use SELECT:
Basic Syntax
SELECT column1, column2 FROM table_name WHERE condition;
Examples
SELECT * FROM employees
LIMIT 10;
Grouping Results
GROUP BY: Groups rows that have the same values in specified columns.
Using JOINs
Joins in SELECT queries are covered in detail in section 4.7 below.
4.6 Projection
In SQL, projection refers to the operation of selecting specific columns from a table in a query.
This allows you to focus on just the data you need, rather than retrieving entire rows.
SELECT Statement: The main way to perform projection is through the SELECT statement.
Syntax:
SELECT column1, column2 FROM table_name;
Example: If you have a table named employees, and you only want to retrieve the name and
hire_date, you would write:
SELECT name, hire_date FROM employees;
Selecting All Columns: To retrieve all columns from a table, you can use *:
SELECT * FROM employees;
Projection with Conditions: You can combine projection with WHERE to filter results:
SELECT name, hire_date FROM employees WHERE hire_date > '2023-01-01';
4.7 Joins and Set Operations
In SQL, JOIN operations and SET operations allow you to combine and manipulate data
from multiple tables or result sets. Here's a breakdown of both:
JOIN Operations
1. INNER JOIN
Returns only the rows that have matching values in both tables.
2. LEFT JOIN
Returns all records from the left table, and matched records from the right table. If there's no
match, NULLs are returned for the right table.
3. RIGHT JOIN
Returns all records from the right table, and matched records from the left table. If there's no
match, NULLs are returned for the left table.
4. FULL JOIN
Returns records when there is a match in either left or right table records. If there's no match,
NULLs are returned for missing matches.
SELECT a.column1, b.column2 FROM tableA a FULL JOIN tableB b ON a.common_column =
b.common_column;
5. CROSS JOIN
Returns the Cartesian product of the two tables, meaning every row from the first table is
combined with every row from the second table.
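To make the difference between join types concrete, here is a hedged sketch using the employees and departments tables from the example further below:

SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;  -- only employees that have a department

SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;   -- all employees; department_name is NULL when there is no match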
SET Operations
1. UNION
Combines the result sets of two or more SELECT queries and removes duplicate rows.
2. UNION ALL
Similar to UNION, but includes duplicate rows.
3. INTERSECT
Returns only the rows that are common to both SELECT queries.
4. EXCEPT (also called MINUS in some DBMSs)
Returns rows from the first SELECT query that are not present in the second SELECT query.
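A brief sketch (table names are illustrative; EXCEPT support varies by DBMS, and Oracle spells it MINUS):

SELECT customer_id FROM customers
EXCEPT
SELECT customer_id FROM orders;  -- customers who have never placed an order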
Suppose you have two tables, employees and departments, and you want to find employees
along with their department names:
SELECT e.name, d.department_name FROM employees e INNER JOIN departments d ON
e.department_id = d.id;
Using UNION:
SELECT name FROM employees
UNION
SELECT department_name FROM departments;
4.8 Aggregate Functions
Aggregate functions in SQL are used to perform calculations on a set of values and return a
single value. They are often used with the GROUP BY clause to group rows that have the same
values in specified columns. Here are some commonly used aggregate functions:
1. COUNT()
Counts the number of rows in a specified column or the total number of rows if no column is
specified.
SELECT COUNT(*) FROM employees; -- Total number of employees
2. SUM()
Adds up the values in a numeric column.
SELECT SUM(salary) FROM employees; -- Total salary of all employees
3. AVG()
Calculates the average of the values in a numeric column.
SELECT AVG(salary) FROM employees; -- Average salary of all employees
4. MIN()
Returns the smallest value in a column.
SELECT MIN(salary) FROM employees; -- Lowest salary
5. MAX()
Returns the largest value in a column.
SELECT MAX(salary) FROM employees; -- Highest salary
6. GROUP BY
Used to group rows that have the same values in specified columns. Aggregate functions can
then be applied to these groups.
SELECT department_id, COUNT(*) FROM employees GROUP BY department_id;
7. HAVING
Used to filter records after aggregation. It's similar to the WHERE clause but operates on
aggregated data.
SELECT department_id, COUNT(*) FROM employees GROUP BY department_id HAVING COUNT(*) > 5;
Example
SELECT department_id,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;
4.9 DML
DML, or Data Manipulation Language, is a subset of SQL used for managing data within existing
database structures. Here are the primary DML commands:
INSERT: Adds new records to a table.
INSERT INTO employees (id, name, hire_date, salary) VALUES (1, 'John Doe', '2023-01-15',
60000);
UPDATE: Modifies existing records in a table.
UPDATE employees SET salary = 65000 WHERE id = 1;
DELETE: Removes existing records from a table.
DELETE FROM employees WHERE id = 1;
4.10 Modification
In SQL, modifications to the data within a database can be accomplished using several
statements, primarily INSERT, UPDATE, and DELETE. Here’s a breakdown of each:
1. INSERT
Adds new rows to a table.
Syntax:
INSERT INTO table_name (column1, column2) VALUES (value1, value2);
Example:
INSERT INTO employees (name, hire_date) VALUES ('Jane Doe', '2024-03-01');
2. UPDATE
Changes values in existing rows.
Syntax:
UPDATE table_name SET column1 = value1 WHERE condition;
Example:
UPDATE employees SET salary = 70000 WHERE name = 'Jane Doe';
3. DELETE
Removes rows from a table.
Syntax:
DELETE FROM table_name WHERE condition;
Example:
DELETE FROM employees WHERE name = 'Jane Doe';
Important Considerations
WHERE Clause: Always use a WHERE clause with UPDATE and DELETE to avoid modifying or
deleting all records.
Transactions: Consider using transactions (BEGIN, COMMIT, ROLLBACK) for bulk updates or
inserts to ensure data integrity.
Example of a Transaction
BEGIN;
UPDATE employees SET salary = salary * 1.05 WHERE department_id = 3; -- bulk update inside the transaction
COMMIT;
4.11 Truncation
In SQL, truncation generally refers to the process of removing all rows from a table without
logging individual row deletions. It is primarily done using the TRUNCATE statement.
TRUNCATE Statement:
Syntax:
TRUNCATE TABLE table_name;
Speed: TRUNCATE is faster than DELETE because it does not log individual row deletions.
No WHERE Clause: You cannot use a WHERE clause with TRUNCATE. It removes all rows
unconditionally.
Resetting Identity: If the table has an identity column, TRUNCATE resets the identity seed back
to the starting value.
Usage:
Use TRUNCATE when you want to quickly remove all records from a table and do not need to
maintain a transaction log of individual deletions.
Example:
TRUNCATE TABLE employees;
While TRUNCATE is generally a DDL command, it can be rolled back if used within a transaction
in databases that support this feature.
Example:
BEGIN;
TRUNCATE TABLE employees;
ROLLBACK; -- restores the rows in DBMSs (such as PostgreSQL) where TRUNCATE is transactional
Permissions: You typically need higher privileges to execute a TRUNCATE command compared
to a DELETE command.
4.12 Constraints
In SQL, constraints are rules applied to columns in a table to enforce data integrity and ensure
the accuracy and reliability of the data. Here are the most commonly used types of constraints:
1. NOT NULL
Ensures that a column cannot contain NULL values.
CREATE TABLE employees (
id INT NOT NULL,
name VARCHAR(100) NOT NULL
);
2. UNIQUE
Ensures that all values in a column are unique across the table.
CREATE TABLE employees (
id INT UNIQUE,
name VARCHAR(100)
);
3. PRIMARY KEY
A combination of NOT NULL and UNIQUE. It uniquely identifies each row in a table. A table can
have only one primary key.
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100)
);
4. FOREIGN KEY
Establishes a relationship between two tables. It ensures that the value in a column (or a set of
columns) matches a value in another table's primary key or unique column.
CREATE TABLE departments (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
department_id INT,
FOREIGN KEY (department_id) REFERENCES departments(id)
);
5. CHECK
Ensures that all values in a column satisfy a specified condition.
CREATE TABLE employees (
id INT PRIMARY KEY,
salary DECIMAL(10, 2) CHECK (salary > 0)
);
6. DEFAULT
Specifies a default value for a column when no value is provided during record insertion.
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
status VARCHAR(20) DEFAULT 'active'
);
7. INDEX
While not a constraint in the traditional sense, indexes improve the speed of data retrieval
operations on a database table.
CREATE INDEX idx_employees_name ON employees (name);
Here's how you might use several constraints when creating a table:
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE,
salary DECIMAL(10, 2) CHECK (salary > 0),
department_id INT,
FOREIGN KEY (department_id) REFERENCES departments(id)
);
4.13 Subquery
A subquery in SQL is a query nested within another query. It can be used in various clauses like
SELECT, WHERE, and FROM. Subqueries are helpful for breaking complex queries into simpler
parts, allowing you to perform operations that require results from multiple steps.
Types of Subqueries
Subqueries are commonly classified as single-row, multiple-row, or correlated, as illustrated below.
Basic Examples
1. Single-Row Subquery
Used in a WHERE clause to compare a value against a single value returned by the subquery.
SELECT first_name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);
2. Multiple-Row Subquery
Used with IN, ANY, or ALL to compare against multiple values returned by the subquery.
SELECT first_name FROM employees WHERE department_id IN (SELECT id FROM departments);
3. Correlated Subquery
References columns from the outer query, making it dependent on the outer query’s row.
sql
Copy code
SELECT e1.first_name, e1.salary FROM employees e1 WHERE e1.salary > (SELECT AVG(e2.salary)
FROM employees e2 WHERE e2.department_id = e1.department_id);
You can also use a subquery in the FROM clause, treating the result as a temporary table.
SELECT dept_id, avg_salary
FROM (SELECT department_id AS dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id) AS dept_averages;
Considerations
Performance: Subqueries can be less efficient than joins, especially if they are executed
multiple times (as in correlated subqueries).
Readability: While subqueries can simplify complex queries, overusing them can lead to
confusion. Balancing readability and performance is key.
NULL Handling: Be cautious when using subqueries with NULL values, as they can affect results,
especially with comparisons.
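One common pitfall worth sketching (the managers table here is hypothetical): if the subquery's result contains a NULL, a NOT IN comparison matches no rows at all.

SELECT name FROM employees
WHERE department_id NOT IN (SELECT department_id FROM managers);
-- If any managers.department_id is NULL, this returns zero rows.
-- Filtering NULLs out restores the intended behavior:
SELECT name FROM employees
WHERE department_id NOT IN (SELECT department_id FROM managers WHERE department_id IS NOT NULL);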