
Unit 1

1.1 Introduction to DBMS

A Database Management System (DBMS) is software that facilitates the creation, manipulation, and
management of databases. It provides an interface for users to interact with data efficiently and securely.
DBMS handles tasks like data storage, retrieval, and updating, while ensuring data integrity and
consistency. Common types include relational, hierarchical, and object-oriented databases. Examples of
popular DBMSs include MySQL, PostgreSQL, and Oracle.

1.2 Data and Information

1.2.1. Data

Data in a DBMS is the raw, unprocessed facts and figures stored in the database. It represents the
basic units of information that are input into the system and typically organized in a structured
format.

1.2.2. Information

Information in a DBMS is the result of processing, organizing, or analyzing data to provide context and
meaning. It is derived from querying and manipulating data to support decision-making, reporting, or
analysis.

1.3. Database

A database is a systematic collection of data that is organized to allow for efficient retrieval, management,
and manipulation. It provides a way to store, organize, and manage data so that it can be easily accessed
and utilized. Here's a comprehensive overview of databases, including their key components and types.

Data within the most common types of databases in operation today is typically modeled in rows and
columns in a series of tables to make processing and data querying efficient. The data can then be easily
accessed, managed, modified, updated, controlled, and organized. Most databases use structured query
language (SQL) for writing and querying data.

1.4. Database Management System

A Database Management System (DBMS) is a software system that is designed to manage and organize
data in a structured manner. It allows users to create, modify, and query a database, as well as manage the
security and access controls for that database. DBMS provides an environment to store and retrieve the
data in convenient and efficient manner.

1.4.1 Objectives of DBMS

The primary objectives of a Database Management System (DBMS) are to provide an efficient, reliable,
and secure means of storing, managing, and retrieving data.
The main objectives of a database management system are:

1. Data Organization and Management


2. Efficient Data Retrieval and Querying
3. Data Security and Privacy
4. Data Integrity and Accuracy
5. Transaction Management
6. Backup and Recovery
7. Data Redundancy and Consistency
8. Scalability and Performance
9. Data Abstraction and Independence
10. User and Developer Support

1. Data Organization and Management

Objective: To systematically organize and manage data so that it is easily accessible and efficiently
maintained.

 Structure: Provides a structured way to store data in tables, records, and fields, adhering to a
defined schema.
 Data Integrity: Ensures data accuracy and consistency through constraints, relationships, and
rules.

2. Efficient Data Retrieval and Querying

Objective: To enable quick and efficient access to data through various querying mechanisms.

 Query Languages: Supports languages like SQL for creating complex queries to retrieve and
manipulate data.
 Indexing: Uses indexing techniques to speed up data retrieval and improve query performance.
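A hedged example of creating an index to speed up retrieval (standard SQL; the table and index names are hypothetical):

-- Queries filtering or sorting on Salary can now use the index instead of a full scan.
CREATE INDEX idx_employees_salary ON Employees (Salary);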

3. Data Security and Privacy

Objective: To protect data from unauthorized access and ensure privacy.

 Access Control: Implements user authentication and authorization mechanisms to control who
can access and modify data.
 Encryption: Provides encryption for data at rest and in transit to protect sensitive information.

4. Data Integrity and Accuracy

Objective: To maintain the accuracy and consistency of data throughout its lifecycle.

 Constraints: Enforces rules such as primary keys, foreign keys, and unique constraints to ensure
data validity.
 Validation: Ensures that data entered into the database meets specified criteria and constraints.

5. Transaction Management

Objective: To handle transactions in a way that ensures data consistency and reliability.

 ACID Properties: Ensures transactions are Atomic, Consistent, Isolated, and Durable to
maintain data integrity even in cases of system failure or concurrent access.
 Concurrency Control: Manages simultaneous data access and modifications by multiple users to
prevent conflicts and ensure consistency.
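A minimal sketch of how a transaction groups operations so they succeed or fail together (standard SQL; the accounts table and values are hypothetical):

-- Transfer 100 between two accounts: both updates commit together.
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;  -- a failure before COMMIT rolls both updates back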

6. Backup and Recovery

Objective: To safeguard data against loss and corruption, and to enable recovery in case of failures.

 Backup Procedures: Provides mechanisms for regular data backups to prevent data loss.
 Recovery Mechanisms: Ensures that data can be restored to a consistent state after a failure or
corruption.

7. Data Redundancy and Consistency

Objective: To minimize data duplication and ensure that all data is consistent across the database.

 Normalization: Uses normalization techniques to reduce data redundancy and improve data
integrity.
 Centralized Management: Centralizes data management to avoid duplication and maintain
consistency.

8. Scalability and Performance

Objective: To ensure that the database can handle increasing amounts of data and user load efficiently.

 Scalability: Supports horizontal and vertical scaling to accommodate growth in data volume and
user activity.
 Performance Optimization: Implements techniques such as caching, indexing, and query
optimization to enhance performance.

9. Data Abstraction and Independence

Objective: To provide a layer of abstraction between the physical data storage and the user applications.

 Data Abstraction: Allows users to interact with the data without needing to understand the
underlying physical storage details.
 Data Independence: Ensures that changes to the database schema or storage do not impact
application programs or user queries.

10. User and Developer Support

Objective: To provide tools and features that support users and developers in managing and interacting
with the database.

 User Interfaces: Offers graphical user interfaces (GUIs) and command-line tools for interacting
with the database.
 Development Tools: Provides tools and libraries for application developers to integrate and
interact with the database.

1.4.2 Advantages of DBMS

Database Management Systems (DBMS) offer numerous advantages that significantly enhance the
management, accessibility, and integrity of data. Here are some key advantages of using a DBMS:

1. Improved Data Management


2. Data Integrity and Accuracy
3. Data Security
4. Efficient Data Retrieval and Querying
5. Data Redundancy and Consistency
6. Backup and Recovery
7. Transaction Management
8. Data Abstraction and Independence
9. Improved Data Sharing and Collaboration
10. Scalability and Performance

1. Improved Data Management

Objective: Streamline the organization, storage, and retrieval of data.

 Structured Data Storage: Organizes data into tables, rows, and columns, making it easier to
manage and retrieve.
 Centralized Management: Provides a single point of access to data, simplifying administration
and maintenance.

2. Data Integrity and Accuracy

Objective: Ensure that data remains accurate, consistent, and reliable.


 Constraints and Rules: Enforces data integrity constraints (e.g., primary keys, foreign keys) to
maintain accuracy and prevent anomalies.
 Validation: Ensures that data entered into the database adheres to defined formats and rules.

3. Data Security

Objective: Protect data from unauthorized access and breaches.

 Access Controls: Implements user authentication and authorization to control who can access or
modify data.
 Encryption: Provides encryption options for data at rest and in transit to enhance security.

4. Efficient Data Retrieval and Querying

Objective: Facilitate quick and effective data access and manipulation.

 Query Languages: Supports powerful query languages (e.g., SQL) for retrieving and
manipulating data.
 Indexing: Uses indexing to speed up data retrieval operations and improve query performance.

5. Data Redundancy and Consistency

Objective: Minimize duplication and ensure consistency across the database.

 Normalization: Applies normalization techniques to reduce data redundancy and avoid inconsistencies.
 Centralized Storage: Stores data in a central location, reducing the need for multiple copies and
updates.

6. Backup and Recovery

Objective: Protect data from loss and ensure recovery in case of failure.

 Regular Backups: Provides tools for performing regular backups of data to prevent loss.
 Recovery Mechanisms: Includes mechanisms for restoring data from backups and recovering
from system failures.

7. Transaction Management

Objective: Ensure the integrity and consistency of data during concurrent operations.

 ACID Properties: Guarantees that transactions are Atomic, Consistent, Isolated, and Durable,
ensuring reliable processing of transactions.
 Concurrency Control: Manages simultaneous access to data by multiple users to prevent
conflicts and maintain consistency.

8. Scalability and Performance

Objective: Handle growing amounts of data and increasing user load efficiently.

 Scalability: Supports horizontal and vertical scaling to accommodate growth in data and user
activity.
 Performance Optimization: Implements techniques like caching, indexing, and query
optimization to enhance performance.

9. Data Abstraction and Independence

Objective: Provide a layer of abstraction between the data and the applications that use it.

 Data Abstraction: Allows users to interact with the data without needing to understand the
underlying storage details.
 Data Independence: Ensures that changes to the database schema do not affect application
programs or queries.

10. Improved Data Sharing and Collaboration

Objective: Facilitate sharing and collaboration among users and applications.

 Multi-User Access: Supports concurrent access to data by multiple users and applications,
enhancing collaboration.
 Centralized Access: Provides a centralized platform for accessing and sharing data across
different applications and users.

 Automated Tasks: Automates routine data management tasks, such as backups and maintenance,
reducing administrative overhead.
1.4.3 Components of DBMS

The main components of a DBMS environment are generally described as hardware, software, data, procedures, a database access language, and users.

1.5. Architecture of DBMS

A database holds a large amount of important data that must be accessible securely and swiftly, so choosing the right architecture is crucial for effective data management. A well-chosen DBMS architecture lets users complete their queries more quickly when connecting to the database. The choice of architecture is influenced by a number of variables, including the database's size, user count, and user relationships. We typically utilize two different kinds of database models: logical models and physical models.
Types of DBMS Architecture
There are several types of DBMS architecture, used according to the requirements of the system. The main types are:

 1-Tier Architecture

 2-Tier Architecture

 3-Tier Architecture

1.5.1. 1-Tier Architecture:


In a 1-tier architecture, the client, server, and database are all located on the same system, and the user interacts with the DBMS directly on that machine. For instance, to learn SQL we might set up a database and SQL server on a local PC, which allows us to perform operations and communicate with the relational database directly. Industry systems, however, generally choose 2-tier or 3-tier architectures instead of this one.

Advantages of 1-Tier Architecture:

1. Basic Architecture: Because it needs only one machine to run, 1-tier architecture is the easiest to set up and manage.

2. Cost-Effective: It is inexpensive to implement because it requires no additional hardware.

3. Simple to Use: Because it is so simple to implement, it is used mainly for modest projects.

1.5.2. 2-Tier Architecture:

The 2-tier architecture is comparable to a basic client-server design. The client-side program communicates directly with the server-side database, using APIs such as JDBC and ODBC for this interaction. The server side handles query processing and transaction management, while user interfaces and application programs run on the client side; to communicate with the DBMS, the client-side application connects to the server. This kind of architecture has the benefit of being simple to learn and maintain, and it works well with existing systems. However, it performs poorly when there are many users.

Advantages of 2-Tier Architecture:

1. Easy to Access: 2-tier architecture provides direct access to the database, enabling fast retrieval.

2. Scalable: The database can be scaled easily by adding clients or upgrading hardware.

3. Low Cost: 2-tier architecture is cheaper than 3-tier and multi-tier architectures.

4. Easy Deployment: 2-tier architecture is easier to deploy than 3-tier architecture.

5. Simple: With only two components, 2-tier architecture is simple and easy to understand.

1.5.3. 3-Tier Architecture:

In a 3-tier architecture, there is an additional layer between the client and the server, and the two never communicate directly. Instead, the client communicates with an application server, which in turn connects to the database system, where query processing and transaction management take place. Through this intermediate layer, the server and the client exchange partially processed data. This kind of architecture is used for massive web applications.

Advantages of 3-Tier Architecture:

1. Improved Scalability: The distributed deployment of application servers improves scalability, and clients no longer need to establish separate connections to the database server.

2. Data Integrity: Data integrity is preserved; with a middle layer positioned between the client and the server, data corruption can be prevented or eliminated.

3. Enhanced Security: By preventing direct client-server contact, this approach limits the amount of data that can be accessed illegally.

1.6. ER (Entity-Relationship) Model

An Entity-Relationship (ER) Model is a conceptual framework used in database design to represent the structure of a database. It helps in organizing and defining the data elements and their relationships, making it easier to design a database that accurately reflects the needs of an application or system.

ER Model Steps

1. Identify Entities and Relationships: Determine the objects and their interactions in the
system.

2. Define Attributes: Specify the properties for each entity and relationship.

3. Determine Cardinality: Establish how entities relate to each other in terms of quantity.

4. Draw ER Diagram: Create a graphical representation to visualize the database structure.

5. Convert ER Model to Relational Schema: Translate the ER diagram into a relational schema for implementation in a database management system (DBMS), as in the sketch below.
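For example, a simple ER model in which a Customer entity places many Orders (a hypothetical 1:N relationship) might translate into the following relational schema:

-- Each entity becomes a table; the 1:N "places" relationship becomes a foreign key.
CREATE TABLE Customer (
    CustomerID INT PRIMARY KEY,
    Name       VARCHAR(50)
);

CREATE TABLE Orders (
    OrderID    INT PRIMARY KEY,
    OrderDate  DATE,
    CustomerID INT REFERENCES Customer(CustomerID)  -- links each order to its customer
);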

1.6.1. Components of ER Diagram

An Entity-Relationship (ER) diagram is a key tool in database design, used to visually represent the
structure and relationships within a database. Here's a detailed look at the primary components of an ER
diagram:

1. Entities

2. Attributes

3. Relationships

4. Cardinality

5. Participation Constraints

6. Weak Entities

7. Generalization and Specialization

8. Aggregation
1. Entities

 Definition: Objects or concepts that can have data stored about them. Entities are typically things
or concepts that are distinct and can be identified individually.

 Representation:

o Symbol: Rectangle

o Example: Customer, Order, Product

2. Attributes

 Definition: Properties or characteristics of an entity that provide more details about it.

 Types:

o Simple Attribute: An attribute that cannot be divided further (e.g., ID, Name).

o Composite Attribute: An attribute that can be divided into smaller sub-parts (e.g.,
Address can be split into Street, City, ZipCode).

o Derived Attribute: An attribute whose value can be derived from other attributes (e.g.,
Age can be derived from DateOfBirth).

o Multi-Valued Attribute: An attribute that can hold multiple values (e.g., PhoneNumbers
for a person).

 Representation:

o Symbol: Oval

o Connection: Lines connecting attributes to their respective entities

3. Relationships

 Definition: Associations between entities that describe how the entities interact with each other.

 Types:

o One-to-One (1:1): An instance of entity A is associated with at most one instance of entity B, and vice versa.

o One-to-Many (1:N): An instance of entity A can be associated with multiple instances of entity B, but each instance of B is associated with only one instance of A.

o Many-to-Many (M:N): Instances of entity A can be associated with multiple instances of entity B, and vice versa.

 Representation:

o Symbol: Diamond

o Connecting Lines: Lines that connect the diamond to the involved entities

4. Cardinality

 Definition: Specifies the number of instances of one entity that can or must be associated with
each instance of another entity.

 Representation:

o Symbols: Numerical values or symbols such as 1, N, or M placed near the relationship lines to indicate the cardinality (e.g., 1, N, 0..1).

5. Participation Constraints

 Definition: Indicates whether all or only some entity instances participate in a relationship.

 Types:

o Total Participation: Every instance of the entity is involved in the relationship (represented with a double line).

o Partial Participation: Only some instances of the entity participate in the relationship
(represented with a single line).

6. Weak Entities

 Definition: Entities that cannot be uniquely identified by their own attributes alone and require a
relationship with another (strong) entity to be uniquely identified.

 Representation:

o Symbol: Double rectangle


o Relationship: Usually connected with a double diamond to indicate the dependency on
the strong entity.

7. Generalization and Specialization

 Definition:

o Generalization: The process of extracting common characteristics from multiple entities to create a generalized entity.

o Specialization: The process of defining a set of subclasses from a general entity based on
certain attributes.

 Representation:

o Symbol for Generalization: A triangle with a line connecting it to the generalized entity,
and lines to the specialized entities.

o Symbol for Specialization: Similar to generalization, often depicted as a hierarchical structure.

8. Aggregation

 Definition: A higher-level abstraction used to simplify complex ER diagrams by grouping entities and relationships into a single higher-level entity.

 Representation:

o Symbol: A dashed ellipse or a rectangle around a group of entities and relationships.

1.6.2. Relationship Degree in ER model

In the Entity-Relationship (ER) model, the degree of a relationship specifies the number of
participating entities in that relationship. Here are the common degrees of relationships:

1. Unary (or Recursive) Relationship:

o Definition: A relationship between instances of the same entity set.

o Example: An employee supervising another employee.

2. Binary Relationship:
o Definition: A relationship between two distinct entity sets.

o Example: A Student enrolling in a Course.

3. Ternary Relationship:

o Definition: A relationship involving three distinct entity sets.

o Example: A Supplier supplying a Product to a Customer.

4. N-ary Relationship:

o Definition: A generalized form where a relationship involves more than three entity sets.

o Example: A Project being handled by multiple Employees in various Departments over different TimePeriods.

1.6.3. Classification of ER Model

The ER (Entity-Relationship) model can be classified into different types based on its
complexity and the nature of the relationships it represents. Here’s a classification of the ER
model:

1. Basic ER Model

 Description: The foundational model that includes entities, attributes, and relationships.
It covers basic components and their connections but does not include advanced
concepts.

 Components: Entities, Attributes, Relationships, Cardinality, Participation Constraints.

2. Extended ER Model (EER Model)

 Description: An enhancement of the basic ER model that includes additional features to handle more complex data structures.

 Components:

o Subclasses and Superclasses: For representing generalization and specialization.

o Inheritance: Allows entities to inherit attributes and relationships from other entities.
o Aggregation: To represent a higher-level abstraction grouping entities and
relationships.

3. Object-Oriented ER Model (O-O ER Model)

 Description: Integrates object-oriented concepts with the ER model, focusing on objects rather than just entities.

 Components:

o Classes: Represent entities.

o Methods: Represent operations or functions related to classes.

o Inheritance: Represents class hierarchies.

o Encapsulation: Data hiding within objects.

4. Hierarchical ER Model

 Description: Represents data in a tree-like structure where entities are organized in a hierarchy.

 Components: Parent and child relationships, hierarchical relationships.

5. Network ER Model

 Description: Represents data using a graph structure where entities and relationships
form a network of interconnected nodes.

 Components: Nodes (entities), Edges (relationships), Network constraints.

6. Temporal ER Model

 Description: Designed to handle time-varying data, where the historical aspects of data
and changes over time are considered.

 Components: Temporal entities, Temporal relationships, Time attributes.

7. Multivalued ER Model

 Description: Extends the basic ER model to handle attributes that can have multiple
values for a single entity.
 Components: Multivalued attributes, Relationships involving multivalued attributes.

1.7. ISA Relationship

In the context of Database Management Systems (DBMS), ISA ("is-a") relationships model hierarchical relationships and structural organization in a database through inheritance, specialization, and association, concepts borrowed from object-oriented programming. Here's a breakdown of these concepts and how they are applied:

1. Inheritance

Inheritance is a mechanism where a new class or entity inherits properties and behaviors from an
existing class or entity. In databases, this is typically represented as a parent-child relationship
between entities. The child entity inherits attributes and relationships from the parent entity.

 Example: Suppose you have a Person entity with attributes like PersonID, Name, and Address. You
might have a child entity Employee that inherits these attributes from Person and adds additional
attributes like EmployeeID, Position, and Salary.

2. Specialization

Specialization is the process of defining a more specific entity from a general entity. This is
essentially the creation of sub-entities from a more generalized parent entity, where the sub-
entities (or specialized entities) inherit common attributes from the parent.

 Example: In a database, a Vehicle entity might be specialized into Car, Truck, and Motorcycle.
Each specialized entity will inherit common attributes from Vehicle like VehicleID, Make, and
Model, but will also have specific attributes related to their type.
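One common way to implement such specialization in tables (a sketch; names are illustrative) is a parent table for the general entity plus child tables that share its key:

-- Generalized entity
CREATE TABLE Vehicle (
    VehicleID INT PRIMARY KEY,
    Make      VARCHAR(50),
    Model     VARCHAR(50)
);

-- Specialized entity: shares the parent key and adds type-specific attributes.
CREATE TABLE Car (
    VehicleID INT PRIMARY KEY REFERENCES Vehicle(VehicleID),
    NumDoors  INT
);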

3. Association

Association refers to the relationships between entities. In a database, associations define how
entities interact with each other and how they are related. This could be one-to-one, one-to-
many, or many-to-many relationships.

 Example: If you have a Customer entity and an Order entity, the relationship between these two
entities can be modeled as an association where one customer can place many orders (one-to-
many relationship).

Combining ISA Relationships

In a DBMS, combining ISA relationships helps in modeling complex real-world scenarios. For instance, an Employee might be specialized into FullTimeEmployee and PartTimeEmployee through specialization and inheritance.
1. Generalization (Opposite of Specialization): You start with specific entities (like
FullTimeEmployee and PartTimeEmployee) and generalize them into a common parent entity
(Employee).

2. Hierarchy: A hierarchical model is formed where Employee is the parent entity, and
FullTimeEmployee and PartTimeEmployee are child entities.

3. Association: The Employee entity might be associated with other entities like Department or
Project, indicating how employees are linked to various departments or projects.

1.8. Constraints

Constraints are rules or conditions imposed on the data in a database table that restrict the type of data
that can be inserted, updated, or deleted, ensuring the data adheres to certain standards and relationships.
Constraints help enforce data integrity and consistency by validating data according to predefined rules.

1.8.1.Key Characteristics of Constraints

1. Data Integrity: Constraints ensure that the data remains accurate and reliable. They
enforce rules on the data to prevent entry of incorrect or inconsistent information.

2. Consistency: Constraints help maintain consistent data across the database by enforcing
rules and relationships between tables.

3. Enforcement: Constraints are enforced by the DBMS, which checks the data against
these rules during operations like insertion, updating, or deletion.

4. Error Prevention: By defining constraints, a DBMS prevents the entry of invalid data,
thereby reducing the risk of errors and inconsistencies.

1.8.2 Types of Constraints

1. Primary Key Constraint: Ensures that each row in a table has a unique identifier and
that no two rows have the same value for this identifier. It also enforces that the primary
key column(s) cannot contain NULL values.

2. Foreign Key Constraint: Ensures referential integrity by requiring that a value in one
table (the foreign key) must match a value in another table (the primary key or unique
key), maintaining valid relationships between tables.

3. Unique Constraint: Ensures that all values in a column or a combination of columns are
unique, meaning no two rows can have the same values for the specified columns. It
allows NULL values unless explicitly specified otherwise.
4. Not Null Constraint: Ensures that a column cannot contain NULL values, making it
mandatory for every row to have a valid value for that column.

5. Check Constraint: Enforces domain integrity by specifying a condition or expression that values in a column must satisfy. It restricts the range of valid values for the column.

6. Default Constraint: Provides a default value for a column when no value is specified
during data insertion, ensuring that a column always has a valid value.
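A single hedged sketch showing all six constraint types on one table (the schema, and the referenced Departments table, are illustrative):

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,                     -- primary key constraint
    Email      VARCHAR(100) UNIQUE,                 -- unique constraint
    Name       VARCHAR(50) NOT NULL,                -- not null constraint
    Salary     DECIMAL(10,2) CHECK (Salary > 0),    -- check constraint
    Status     VARCHAR(10) DEFAULT 'active',        -- default constraint
    DeptID     INT REFERENCES Departments(DeptID)   -- foreign key constraint
);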

1.8.3. Purpose and Benefits

 Accuracy: Ensures that data entered into the database is correct and adheres to predefined rules.

 Reliability: Maintains the reliability of the database by enforcing consistent data entry and
relationships.

 Data Quality: Improves data quality by preventing invalid or duplicate entries.

 Business Rules Enforcement: Implements business logic directly in the database schema,
ensuring that business rules are followed.

1.9. Aggregation

Aggregation is a special type of association that represents a "whole-part" relationship between entities where the part can exist independently of the whole. It is a way to model relationships where an entity can be a part of another entity but is not strictly dependent on it for its existence.

1.9.1 Characteristics of Aggregation:

 Loose Coupling: The lifecycle of the part (or component) is independent of the lifecycle of the
whole (or container). If the whole is deleted, the part can still exist.

 Hierarchical Relationship: Aggregation implies a "has-a" relationship between the whole and its
parts.

 Reusability: The same part can be associated with multiple wholes.

Example:

Consider a Department and an Employee in an organization. A Department might have multiple Employees. If a department is deleted, the employees may still exist and might be reassigned to another department. Here, Employee is a part of Department, but it does not depend on it for its existence.

ER Diagram Representation: In an Entity-Relationship (ER) diagram, aggregation is typically represented with a diamond shape connecting the whole and part entities.
1.10. Composition

Composition is a stronger form of aggregation and represents a more tightly coupled "whole-
part" relationship where the part cannot exist independently of the whole. The lifecycle of the
part is tied to the lifecycle of the whole.

1.10.1.Characteristics of Composition:

 Strong Coupling: The part cannot exist without the whole. If the whole is deleted, the parts are
also deleted.

 Ownership: The whole entity owns its parts, meaning the parts are created and destroyed with
the whole.

 Exclusive Relationship: A part can only be associated with one whole at a time.

1.10.2.Example:

Consider a House and Room in a real estate database. A House is composed of multiple Rooms. If
the house is demolished, the rooms are no longer relevant and are also destroyed. Here, Room is a
part of House and cannot exist without it.

ER Diagram Representation: In an ER diagram, composition is usually represented by a solid diamond shape connecting the whole and part entities, indicating a stronger relationship compared to aggregation.
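In a relational schema, this tightly coupled lifecycle is commonly modeled with a foreign key that cascades deletes (a sketch; names are illustrative):

CREATE TABLE House (
    HouseID INT PRIMARY KEY,
    Address VARCHAR(100)
);

CREATE TABLE Room (
    RoomID  INT PRIMARY KEY,
    HouseID INT NOT NULL,
    FOREIGN KEY (HouseID) REFERENCES House(HouseID)
        ON DELETE CASCADE   -- demolishing the house removes its rooms
);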

Comparison and Use Cases

 Aggregation is used when you want to represent a relationship where the parts can exist
independently of the whole. It is suitable for modeling scenarios where parts can be
shared among different wholes or where parts are not exclusively tied to one whole.

 Composition is used when you need to represent a relationship where parts are
dependent on the whole and cannot exist separately. It is ideal for scenarios where the
whole and parts have a tightly coupled lifecycle and ownership.

1.11. Advantages of DBMS

Database Management Systems (DBMS) provide several advantages over traditional file-based
data management systems. Here are some of the key benefits:

1. Data Integrity and Accuracy

 Consistency: DBMS enforces constraints (e.g., primary keys, foreign keys) that ensure data
accuracy and consistency across the database.
 Validation: Rules and constraints help prevent the entry of invalid data, reducing errors and
inconsistencies.

2. Data Security

 Access Control: DBMS provides mechanisms for defining user roles and permissions, ensuring
that only authorized users can access or modify data.

 Encryption: Many DBMSs support data encryption to protect sensitive information from
unauthorized access.

3. Data Redundancy Reduction

 Normalization: DBMS uses normalization techniques to eliminate redundancy and ensure that
data is stored efficiently, reducing duplicate data entries.

 Centralized Data Management: Data is stored in a centralized repository, minimizing data duplication and ensuring consistency.

4. Data Consistency

 Transaction Management: DBMS supports transactions that ensure data is updated consistently.
Transactions follow the ACID (Atomicity, Consistency, Isolation, Durability) properties to
maintain data integrity even in the case of system failures.

 Concurrency Control: DBMS handles multiple users accessing and modifying data
simultaneously, ensuring that transactions do not interfere with each other.

5. Efficient Data Access and Retrieval

 Indexing: DBMS uses indexing techniques to speed up data retrieval operations, making queries
more efficient.

 Query Optimization: Advanced query optimization techniques are employed to improve the
performance of data retrieval and manipulation.

6. Data Independence

 Logical and Physical Data Independence: DBMS abstracts the physical storage details from the
users and applications, allowing changes to the database schema without affecting the application
programs that interact with the database.

7. Data Recovery and Backup

 Automatic Backups: DBMS provides tools and mechanisms for automatic data backups and
recovery, ensuring that data can be restored in case of hardware failures, data corruption, or other
issues.
 Recovery Mechanisms: DBMS includes recovery features to restore the database to a consistent
state after a failure or crash.

8. Data Sharing

 Multi-user Environment: DBMS allows multiple users to access and share data concurrently,
supporting collaborative work and ensuring data consistency across the organization.

 Data Integration: DBMS facilitates integration of data from various sources, providing a unified
view of the data.

9. Efficient Data Management

 Data Modeling: DBMS allows for sophisticated data modeling, enabling the creation of complex
relationships and structures to represent real-world entities and their interactions.

 Scalability: Modern DBMSs are designed to handle large volumes of data and can be scaled up
to meet increasing demands.

10. Reduced Data Entry and Maintenance Costs

 Centralized Management: Centralized data management reduces the cost and effort associated
with maintaining multiple data copies and ensures that updates are made in a single place.

 Automation: DBMS automates many data management tasks, such as backups and indexing,
reducing manual effort and operational costs.

11. Improved Decision-Making

 Reporting and Analysis: DBMS provides powerful tools for data reporting, analysis, and
visualization, enabling better decision-making based on accurate and up-to-date information.

 Data Mining: Advanced DBMSs support data mining techniques to extract valuable insights
from large datasets.

12. Data Abstraction

 Data Models: DBMS uses various data models (e.g., hierarchical, network, relational, object-
oriented) to provide an abstract view of the data, simplifying interaction and manipulation.
Unit 2
2.1 Relational Model

The relational model is a framework for managing and structuring data in a database. It was introduced by
E.F. Codd in 1970 and remains a foundational concept in database management systems (DBMS) today.

2.1.1 Relational Algebra and Calculus:

To query and manipulate data, the relational model uses formal languages:

 Relational Algebra: A procedural query language that uses operations like selection, projection,
union, difference, and Cartesian product to retrieve and manipulate data.
 Relational Calculus: A declarative query language that specifies what data to retrieve rather than
how to retrieve it. It includes tuple relational calculus and domain relational calculus.

2.2 Codd's Rules

Codd's rules are a set of thirteen principles (a foundation rule, Rule 0, plus the twelve rules below) proposed by Edgar F. Codd, the inventor of the relational database model, to define what is required for a database system to be considered truly relational. These rules are intended to ensure that a database management system (DBMS) adheres to the relational model's principles and provides a consistent, flexible, and efficient data management environment.

1. Information Rule: All information in a relational database should be represented explicitly at the logical level and in exactly one way: by storing it in tables. This means that data should be stored in tables and all data should be accessible through these tables.

2. Guaranteed Access Rule: Every data element (atomic value) should be logically accessible by using a combination of table name, primary key, and column name. This implies that you should be able to retrieve any data value through a simple query based on its table and column.

3. Systematic Treatment of Null Values: Null values (representing missing or inapplicable information) must be treated uniformly across the database. Nulls should be distinguishable from empty strings or numeric zeros, and their presence should not interfere with querying and data integrity.

4. Dynamic On-Line Catalog Based on the Relational Model: The database's catalog (metadata or schema) should itself be stored in the database and accessible using the same query language as user data. This means that the schema should be represented as tables that can be queried.
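For instance, in systems that expose the catalog through the standard information_schema views (support varies by DBMS; the Employees table is hypothetical), the schema itself can be queried with the same SQL used for user data:

-- List the columns of a user table by querying the catalog.
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_name = 'Employees';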

5. Comprehensive Data Sublanguage Rule: The system must support a comprehensive language that includes data definition, data manipulation, and transaction management operations. This language should be capable of handling all aspects of database interactions.

6. View Updating Rule: Any view that is theoretically updatable must be updatable by the system. In other words, if a view (a virtual table) is created from one or more base tables, it should be possible to perform insert, update, and delete operations through this view if the view's structure permits it.
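As a minimal sketch (table, view, and column names are illustrative), a simple single-table view is theoretically updatable, so the rule requires that changes made through it reach the base table:

CREATE VIEW HighEarners AS
SELECT EmployeeID, Name, Salary
FROM Employees
WHERE Salary > 50000;

-- The update passes through the view to the underlying Employees table.
UPDATE HighEarners SET Salary = 60000 WHERE EmployeeID = 7;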
7. High-Level Insert, Update, and Delete: The system should support set-based operations for data manipulation. Instead of operating on individual rows, the system should allow for batch operations on sets of rows.

8. Physical Data Independence: Changes to the physical storage of data (e.g., changing the hardware or storage structure) should not require changes to the logical schema (the way data is represented and accessed). This means that the database's physical implementation details should be abstracted away from the user.

9. Logical Data Independence: Changes to the logical schema (e.g., adding or removing tables or columns) should not require changes to the application programs that use the data. This implies that applications should be insulated from changes in the logical structure of the database.

10. Integrity Independence: Integrity constraints (rules that ensure data correctness) must be specified separately from application programs and stored in the database catalog. This ensures that constraints are maintained consistently across all applications.

11. Distribution Independence: The database should be able to operate and be accessed regardless of whether the data is distributed across multiple locations or stored in a single location. This means the system should handle data distribution transparently.

12. Non-Subversion Rule: If the system provides a low-level or non-relational interface (such as file system access), it should not bypass or subvert the integrity and security rules enforced by the relational model. In other words, access methods that bypass the relational model should still adhere to the system's rules and constraints.

These rules set a high standard for relational database systems and help ensure that they provide a robust,
flexible, and consistent environment for managing data. While not all modern relational database systems
meet every rule perfectly, these principles continue to influence database design and management
practices.

2.3 Relational Data Model


The Relational Data Model is a foundational framework in database management systems (DBMS) for
organizing and manipulating data. Introduced by Edgar F. Codd in 1970, this model represents data in a
structured way using relations (often visualized as tables). Here are the core concepts of the Relational
Data Model:

1. Tables (Relations)

 Definition: A table, or relation, is a collection of tuples (rows) that share the same attributes
(columns).
 Structure: Each table has a unique name and consists of rows and columns. Each column has a
specific data type, and each row represents a single record.

2. Attributes (Columns)

 Definition: Attributes define the properties or fields of a table. Each attribute has a specific data
type and constraints.
 Example: In a table for "Employees", attributes might include EmployeeID, FirstName,
LastName, and Department.

3. Tuples (Rows)

 Definition: A tuple is a single record in a table. It consists of values for each attribute in the table.
 Example: A row in the "Employees" table might have values like (1, 'John', 'Doe', 'Marketing').

4. Primary Key

 Definition: A primary key is an attribute or a set of attributes that uniquely identifies each tuple
in a table. It ensures that no two rows have the same key value.
 Example: In the "Employees" table, EmployeeID could be the primary key.

5. Foreign Key

 Definition: A foreign key is an attribute in one table that refers to the primary key of another
table. It establishes a relationship between the two tables.
 Example: If there's a "Departments" table with DepartmentID as the primary key, then the
"Employees" table might include a DepartmentID foreign key to link each employee to their
department.

6. Relationships

 Types:
o One-to-One: A single record in one table is related to a single record in another table.

o One-to-Many: A single record in one table is related to multiple records in another table.
o Many-to-Many: Multiple records in one table are related to multiple records in another
table, typically managed using a junction table.

7. Normalization

 Definition: Normalization is the process of organizing data to minimize redundancy and improve
data integrity. It involves dividing tables into related tables and defining relationships.
 Forms:
o 1NF (First Normal Form): Ensures that each column contains atomic (indivisible)
values.
o 2NF (Second Normal Form): Achieved when the table is in 1NF and all non-key
attributes are fully functionally dependent on the primary key.
o 3NF (Third Normal Form): Achieved when the table is in 2NF and all attributes are
only dependent on the primary key, not on other non-key attributes.
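For example (a hypothetical schema), a table Employees(EmployeeID, Name, DeptID, DeptName) is not in 3NF because DeptName depends on DeptID, a non-key attribute; normalization splits it into two related tables:

CREATE TABLE Departments (
    DeptID   INT PRIMARY KEY,
    DeptName VARCHAR(50)              -- stored once per department
);

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    Name       VARCHAR(50),
    DeptID     INT REFERENCES Departments(DeptID)  -- DeptName no longer repeated per employee
);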

8. Integrity Constraints

 Definition: Integrity constraints are rules applied to the data to ensure accuracy and consistency.
 Types:
o Entity Integrity: Ensures that each table has a primary key and that the primary key
values are unique and not null.
o Referential Integrity: Ensures that foreign key values in one table match primary key
values in another table or are null.

9. SQL (Structured Query Language)

 Definition: SQL is the standard language used to interact with relational databases. It is used for
querying, updating, and managing data.
 Operations: Includes commands such as SELECT, INSERT, UPDATE, DELETE, and CREATE
TABLE.
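Illustrative uses of these commands (table and column names are hypothetical):

CREATE TABLE Employees (EmployeeID INT PRIMARY KEY, Name VARCHAR(50));
INSERT INTO Employees (EmployeeID, Name) VALUES (1, 'John');
SELECT Name FROM Employees WHERE EmployeeID = 1;
UPDATE Employees SET Name = 'Jane' WHERE EmployeeID = 1;
DELETE FROM Employees WHERE EmployeeID = 1;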

The relational model's power lies in its simplicity and the use of mathematical concepts, particularly set
theory, to model data and relationships. This structure allows for flexible querying and data manipulation,
which has made it the dominant paradigm in database management.

2.4 Keys
In the relational data model, keys and integrity constraints are fundamental to maintaining the
accuracy, consistency, and reliability of the data stored in a relational database. Here's a detailed
overview of both concepts:
Keys are attributes or sets of attributes that are used to uniquely identify tuples (rows) within a
table or to establish relationships between tables. There are several types of keys:

Primary Key

 Definition: A primary key is an attribute or a combination of attributes that uniquely identifies each row in a table.
 Properties:
o Uniqueness: Each primary key value must be unique across all rows in the table.
o Non-nullability: Primary key attributes cannot contain null values.
 Example: In a table Students, the StudentID might be a primary key because each
student has a unique identifier.

Candidate Key

 Definition: A candidate key is an attribute or a set of attributes that could serve as a unique identifier for a table. Each candidate key is capable of uniquely identifying a row.
 Properties:
o Uniqueness: Each candidate key value must be unique.
o Minimality: No subset of a candidate key should be able to uniquely identify
rows.
 Example: In a table Employees, both EmployeeID and Email could be candidate keys if
each is unique.

Alternate Key

 Definition: An alternate key is any candidate key that is not chosen as the primary key. It
still provides a unique identification for rows in the table.
 Example: If EmployeeID is the primary key in the Employees table, then Email and
PhoneNumber could be alternate keys.

Composite Key

 Definition: A composite key is a primary key that consists of two or more attributes. It is
used when a single attribute alone cannot uniquely identify a row.
 Example: In a table CourseEnrollments, a composite key might be a combination of
StudentID and CourseID to uniquely identify each enrollment.

Foreign Key
 Definition: A foreign key is an attribute or set of attributes in one table that refers to the
primary key of another table. It is used to establish and enforce relationships between
tables.
 Properties:
o Referential Integrity: Foreign key values must match primary key values in the
referenced table or be null.
 Example: In a table Orders, CustomerID could be a foreign key that references the
CustomerID in the Customers table.
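A hedged sketch pulling these key types together (schema names are illustrative):

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,          -- primary key
    Email      VARCHAR(100) UNIQUE       -- candidate key not chosen as primary (alternate key)
);

CREATE TABLE Orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT REFERENCES Customers(CustomerID)  -- foreign key
);

-- Composite key: StudentID and CourseID together identify each enrollment.
CREATE TABLE CourseEnrollments (
    StudentID INT,
    CourseID  INT,
    PRIMARY KEY (StudentID, CourseID)
);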

2.5 Integrity Constraints


Integrity constraints are rules applied to the data in a database to ensure its accuracy and
consistency. Key integrity constraints help enforce the correct use of keys and maintain
relationships between tables.

Entity Integrity

 Definition: Entity integrity ensures that each table has a primary key and that the primary
key values are unique and not null.
 Rules:
o Primary Key Uniqueness: Each row in the table must have a unique primary key
value.
o Primary Key Non-nullability: Primary key columns cannot contain null values.
 Purpose: Ensures that every record in the table can be uniquely identified.

Referential Integrity

 Definition: Referential integrity ensures that a foreign key value in one table matches a
primary key value in the referenced table or is null.
 Rules:
o Valid References: Foreign key values must exist in the referenced table or be
null.
o Actions on Update/Delete:
 ON DELETE CASCADE: Automatically deletes child rows when a
parent row is deleted.
 ON UPDATE CASCADE: Automatically updates child rows when a
parent key value is updated.
 ON DELETE SET NULL: Sets the foreign key value to null in the child
table when the parent row is deleted.
 ON DELETE RESTRICT: Prevents the deletion of a parent row if there
are related child rows.
 Purpose: Maintains consistent and valid relationships between tables.
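A minimal sketch of these referential actions, reusing the Customers/Orders example (only one action per clause applies at a time):

CREATE TABLE Orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
        ON DELETE CASCADE    -- SET NULL or RESTRICT could be used here instead
        ON UPDATE CASCADE
);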

Domain Integrity

 Definition: Domain integrity ensures that the values in a column are of the correct data
type and adhere to constraints defined for that column.
 Rules:
o Data Type: The data type of column values must match the defined data type.
o Value Constraints: Constraints such as range, length, and format must be
enforced.
 Purpose: Ensures that column values are within acceptable limits and conform to
specified formats.

User-Defined Integrity

 Definition: User-defined integrity encompasses rules and constraints defined by the user
to enforce specific business rules and requirements not covered by the standard integrity
constraints.
 Rules:
o Custom Constraints: Rules that reflect specific business logic or requirements.
 Purpose: Allows the implementation of application-specific rules that maintain data
quality and consistency according to business needs.

2.6 Relational Algebra Operations


Relational algebra is a procedural query language used to query relational databases. It consists
of a set of operations that take one or more relations (tables) as input and produce a new relation
as output. These operations form the theoretical foundation for relational query languages like
SQL. Here's a summary of the core relational algebra operations:

1. Selection (σ)

The selection operation filters rows from a relation based on a specified condition. It is
analogous to the WHERE clause in SQL.
Notation: σ_condition(R)

Example: To find employees with a salary greater than $50,000 from the Employees relation:
σ_{salary > 50000}(Employees)

2. Projection (π)

The projection operation selects specific columns from a relation, discarding the others. It is
similar to the SELECT clause in SQL.

Notation: π_columns(R)

Example: To retrieve the name and salary columns from the Employees relation:
π_{name, salary}(Employees)
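For reference, the SQL analogues of these two operations (assuming the same Employees relation):

SELECT * FROM Employees WHERE salary > 50000;   -- selection (σ)
SELECT name, salary FROM Employees;             -- projection (π)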

3. Union (∪)

The union operation combines the tuples from two relations, eliminating duplicate tuples. The
relations must have the same schema (i.e., the same number and types of columns).

Notation: R ∪ S

Example: To combine employees from two departments into one list: DeptA ∪ DeptB

4. Intersection (∩)

The intersection operation returns tuples that are present in both relations. The relations must
have the same schema.

Notation: R ∩ S

Example: To find employees who are present in both DeptA and DeptB: DeptA ∩ DeptB

5. Difference (−)

The difference operation returns tuples that are in the first relation but not in the second. The
relations must have the same schema.

Notation: R − S

Example: To find employees who are in DeptA but not in DeptB: DeptA − DeptB

6. Cartesian Product (×)

The Cartesian product operation returns all possible pairs of tuples from two relations. This
operation combines each tuple in the first relation with every tuple in the second relation.

Notation: R × S

Example: To combine every employee with every department: Employees × Departments

7. Join (⨝)

The join operation combines tuples from two relations based on a common attribute. There are
several types of joins, including inner join, natural join, and equi join.

 Natural Join: Joins relations based on all common attributes.

Notation: R ⋈ S

Example: To join Employees and Departments on a common dept_id: Employees ⋈ Departments

 Theta Join: Joins relations based on a condition other than equality.

Notation: R ⋈_condition S

Example: To join Employees and Projects where Employees.salary > Projects.budget: Employees ⋈_{salary > budget} Projects

 Equi Join: A special case of theta join where the condition is equality.

Notation: R ⋈_{attribute1 = attribute2} S
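In SQL, an equi join on dept_id (hypothetical schemas) can be written as:

SELECT e.name, d.dept_name
FROM Employees e
JOIN Departments d ON e.dept_id = d.dept_id;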

8. Rename (ρ)

The rename operation changes the name of a relation or its attributes. It is used to give a new
name to a relation or to resolve attribute name conflicts.
Notation: ρ_new_name(R), or ρ_new_name(attributes)(R) to rename attributes

Example: To rename the Employees relation to Staff: ρ_Staff(Employees)

Additional Operations

Some additional operations that can be derived from these basic operations include:

 Division (÷): Useful for queries that require finding tuples in one relation that match all
tuples in another relation.

Notation: R ÷ S

Example: To find employees who have worked on all projects listed in the Projects
relation.

Understanding these operations is crucial for working with relational databases and forms the
basis for more advanced query optimization and data manipulation techniques.

2.7 Advantages and limitations


Relational operations are fundamental to the relational model and involve various operations to
query and manipulate data stored in relational databases. While these operations provide a
powerful way to interact with data, they also come with certain advantages and limitations.
Here’s an overview:

2.7.1 Advantages of Relational Operations:

1. Expressiveness:

o Powerful Queries: Relational operations, such as selection, projection, join, and union,
enable expressive and complex queries. SQL provides a rich set of operations to retrieve
and manipulate data in sophisticated ways.

2. Declarative Nature:

o Declarative Queries: In relational algebra and SQL, users specify what data they want
without detailing how to obtain it. This declarative approach simplifies query writing and
focuses on the desired result rather than the procedural steps.

3. Data Integration:
o Join Operations: Operations like joins allow for combining data from multiple tables
based on related columns, enabling users to integrate and analyze data from different
sources within a single query.

4. Data Integrity:

o Constraints and Rules: Relational operations often include mechanisms to enforce data
integrity and consistency, such as primary key constraints, foreign key constraints, and
domain constraints.

5. Normalization Support:

o Normalization: Relational operations support the normalization process, which helps organize data into tables that reduce redundancy and improve data integrity.

6. Flexibility:

o Dynamic Queries: Users can write dynamic and flexible queries that adapt to changing
requirements, making relational databases versatile for various applications and reporting
needs.

7. Optimization:

o Query Optimization: Relational databases include optimization techniques to enhance query performance, such as indexing, query rewriting, and efficient execution plans.

2.7.2 Limitations of Relational Operations:

1. Performance Overheads:

o Complex Queries: Operations involving multiple joins, subqueries, or aggregations can lead to performance issues, especially with large datasets or complex queries, potentially impacting query response times.

2. Scalability Constraints:

o Horizontal Scaling: Relational operations can struggle with horizontal scaling (distributing data across multiple servers) due to the need to maintain ACID properties and ensure consistency across distributed systems.

3. Schema Rigidity:

o Schema Changes: Changes to the schema, such as adding or modifying columns, can be
complex and disruptive, requiring careful planning and potentially affecting existing
operations and queries.
4. Complexity in Query Writing:

o Complex Joins: Writing queries that involve complex joins or nested subqueries can be
challenging and error-prone, especially for users who are not familiar with advanced SQL
techniques.

5. Unstructured Data Handling:

o Limited Support: Relational operations are optimized for structured data. Handling
unstructured or semi-structured data, such as text or multimedia content, often requires
additional processing or adaptations.

6. Concurrency and Locking Issues:

o Concurrency Control: Managing concurrent operations and maintaining data consistency can introduce overhead. Locking mechanisms to ensure isolation can lead to contention and reduced performance in high-transaction environments.

7. Overhead for Transactions:

o Transaction Management: Ensuring ACID properties for transactions involves overhead related to logging, recovery, and maintaining consistency, which can impact performance in systems with high transaction rates.

2.8 Relational Calculus

2.8.1 Tuple Relational Calculus (TRC)

 Concept: Tuple Relational Calculus allows users to describe what data to retrieve based
on a set of conditions applied to tuples (rows) in a relation (table). The result is a set of
tuples that satisfy the specified conditions.

 Syntax Example: To find all employees in the "Employees" table who work in
department 10:

{ t | t ∈ Employees and t.DeptNo = 10 }
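
For comparison, assuming the Employees table has a DeptNo column as above, the equivalent SQL query would be:

SELECT * FROM Employees WHERE DeptNo = 10;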


2.8.2 Domain Relational Calculus (DRC)

 Concept: Domain Relational Calculus uses domain variables to express queries. It
focuses on retrieving data based on conditions applied to the values of attributes.

 Syntax Example: To find the names of employees working in department 10:


 { (eName) | ∃ eNo, dNo such that (eNo, eName, dNo) ∈ Employees and dNo = 10 }

 Here, eNo, eName, and dNo are domain variables representing attribute values.

2.9 Domain Relational Calculus (DRC)


Domain Relational Calculus (DRC) is a formal language used to specify queries in relational
databases. It is one of the two main types of relational calculus, the other being Tuple Relational
Calculus (TRC). DRC is used to express queries based on specifying the conditions that the
domain (or value) of attributes must satisfy.

Key Concepts of Domain Relational Calculus

1. Domain Variables:

 Definition: Domain variables are used to represent the values of attributes in the database. Each
domain variable corresponds to a particular attribute or set of attributes.

 Example: If you have a relation Employee with attributes EmployeeID, Name, and Salary, you
might use domain variables like EID, N, and S to represent values for these attributes.

2. Domain Relational Calculus Expression:

 Syntax: A DRC query is expressed in the form of a logical predicate. It specifies the conditions
that must be satisfied by the domain variables.
 Form: {D | P(D)} where D is a domain variable and P(D) is a predicate (condition) that the values
of D must satisfy.

3. Predicate (Condition):

 Definition: The predicate is a logical expression that specifies the conditions that the values of
the domain variables must meet.

 Example: If you want to retrieve the names of employees who earn more than $50,000, the
predicate could be Salary > 50000.

4. DRC Query Example:

 Problem: Retrieve the names of employees with a salary greater than $50,000.

 Table Schema: Employee(EmployeeID, Name, Salary)

 DRC Query: {N | ∃EID ∃S (Employee(EID, N, S) ∧ S > 50000)}

5. Explanation of the Example:

 N is a domain variable representing the Name attribute.

 ∃EID and ∃S signify that there exist values for the EmployeeID and Salary attributes.

 Employee(EID, N, S) represents a tuple in the Employee relation where EID is the employee ID,
N is the employee's name, and S is the salary.

 S > 50000 is the condition that the salary must meet.
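
For comparison, the same request written in SQL (using the schema above) would be:

SELECT Name FROM Employee WHERE Salary > 50000;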

6. Advantages of Domain Relational Calculus:

 Declarative Nature: DRC allows users to specify what data they want without needing to specify
how to retrieve it, focusing on the conditions rather than the procedure.

 Formal Foundation: It provides a formal foundation for querying relational databases and helps
in understanding the theoretical aspects of query languages.

7. Comparison with Tuple Relational Calculus (TRC):

 Domain Relational Calculus: Works with domain variables representing individual attribute
values.

 Tuple Relational Calculus: Works with tuple variables representing entire rows or records.

8. Domain Relational Calculus vs. SQL:


 DRC: Expresses queries in terms of logical predicates and domain variables.

 SQL: A practical query language that is commonly used in relational databases. SQL is
itself largely declarative, but its style is more operational than the purely logic-based
predicates of DRC.

2.10 QBE
Query By Example (QBE) is a database query language used for querying relational databases.
It provides a user-friendly, graphical way to formulate queries by specifying example data rather
than writing complex code. Here’s a detailed overview of QBE, including its features,
advantages, and limitations:

2.10.1 Overview of QBE

Concept:

 QBE allows users to construct queries by filling out a template or form that represents the
structure of the database. Users specify example values in the form to indicate what they
want to retrieve.

How It Works:

 Users interact with a graphical interface where they specify example records or
conditions in a template. The system then translates these examples into a formal query to
retrieve the matching records from the database.

2.10.2 Components of QBE:

1. Tables and Fields:

o Users are presented with a form or template that mirrors the structure of the
database tables. Fields in the form correspond to columns in the tables.

2. Criteria Specification:

o Users provide example values or conditions in the fields to specify what data they
are interested in. For instance, if a user wants to find all employees in a specific
department, they enter the department number in the relevant field.

3. Query Execution:

o The QBE system translates the example-based criteria into a query language such
as SQL. It then executes the query against the database and returns the results.
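
As a rough sketch of the classic QBE notation (the table, columns, and values here are hypothetical, and real implementations differ in appearance), a user asking for the names of employees in department 10 would fill in a skeleton table, where P. marks the column to print:

EMPLOYEE | EmpNo | Name | DeptNo
---------+-------+------+-------
         |       | P.   | 10

The system would then translate this into an equivalent SQL query such as SELECT Name FROM EMPLOYEE WHERE DeptNo = 10;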
2.10.3 Advantages of QBE:

1. User-Friendly:

o Intuitive Interface: QBE provides a graphical, form-based interface that is easier
for non-technical users to understand compared to writing SQL queries manually.

2. Reduced Complexity:

o No Need for Syntax Knowledge: Users do not need to know the syntax of query
languages like SQL. Instead, they interact with a straightforward form or
template.

3. Quick Query Construction:

o Rapid Prototyping: Users can quickly construct and modify queries by filling
out and adjusting the example fields in the form.

4. Error Reduction:

o Minimized Syntax Errors: Since users are interacting with a graphical interface
rather than writing code, there is less risk of syntax errors in the queries.

5. Visual Feedback:

o Immediate Results: Users can see how their criteria affect the results, providing
immediate feedback and making it easier to refine queries.

2.10.4 Limitations of QBE:

1. Limited Expressiveness:

o Complex Queries: QBE may struggle with more complex queries that require
advanced operations, such as complex joins, aggregations, or nested subqueries.

2. Less Flexibility:

o Template Constraints: The predefined forms and templates may limit the
flexibility of query construction, making it harder to perform unconventional
queries.

3. Vendor Dependency:
o Implementation Variability: Different database systems may have variations in
their QBE implementations, leading to inconsistencies and potentially reducing
portability.

4. Performance Concerns:

o Query Optimization: QBE interfaces may not always generate the most efficient
queries, and the performance of queries generated through QBE can vary
depending on the system’s optimization capabilities.

Unit 3
3.1 Structure of a Relational Database

A relational database is organized in a way that facilitates the management and retrieval of data
through a structured approach. The key components and structures of a relational database
include:

1. Tables

 Definition: Tables are the fundamental building blocks of a relational database. They store data
in rows and columns.

 Structure: Each table has a name and is composed of rows (records) and columns (fields or
attributes).

 Example: A table named Employees might have columns like EmployeeID, FirstName, LastName,
and HireDate.

2. Rows

 Definition: Rows, also known as records or tuples, represent individual entries in a table.

 Structure: Each row contains data corresponding to each column in the table.

3. Columns

 Definition: Columns define the attributes or fields of the data stored in a table.

 Structure: Each column has a name and a data type (e.g., INTEGER, VARCHAR, DATE).
4. Primary Key

 Definition: A primary key is a column or a set of columns that uniquely identifies each row in a
table.

 Characteristics: The values in the primary key column(s) must be unique and not null.

 Example: EmployeeID in the Employees table might be the primary key.

5. Foreign Key

 Definition: A foreign key is a column or a set of columns in one table that refers to the primary
key of another table.

 Purpose: It establishes a relationship between two tables and ensures referential integrity.

 Example: An Orders table might have a CustomerID column that references the CustomerID
column in a Customers table.
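
A minimal sketch of how such keys might be declared in SQL (the Customers and Orders tables here are hypothetical):

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);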

6. Indexes

 Definition: Indexes are used to improve the speed of data retrieval operations on a table.

 Purpose: They create a data structure that allows for faster searches, sorting, and querying.

 Types: Common types include primary key indexes and secondary indexes.
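
For example, a secondary index on a frequently searched column might be created as follows (hypothetical table and column names):

CREATE INDEX idx_lastname ON Employees(LastName);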

7. Views

 Definition: Views are virtual tables created by querying one or more tables.

 Purpose: They provide a way to present data in a specific format or subset without altering the
actual tables.

 Example: A view might combine data from the Employees and Departments tables to show
employees and their department names.
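
A sketch of how such a view might be defined (assuming a DepartmentID column links the two tables):

CREATE VIEW EmployeeDepartments AS
SELECT e.EmployeeID, e.FirstName, d.DepartmentName
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID;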

8. Relationships

 Definition: Relationships describe how tables are connected to each other.

 Types:

o One-to-One: A single row in Table A is related to a single row in Table B.

o One-to-Many: A single row in Table A is related to multiple rows in Table B.


o Many-to-Many: Multiple rows in Table A are related to multiple rows in Table B,
usually managed through a junction table.

3.2 Introduction to Relational Database Design

Relational database design is a critical aspect of organizing and managing data in a relational
database management system (RDBMS). The primary goal is to structure data in a way that
minimizes redundancy and maintains data integrity, making it easy to access, modify, and
manage. Here's a comprehensive introduction to the key concepts and principles involved in
relational database design:

1. Basic Concepts

 Relational Model: Proposed by Edgar F. Codd in 1970, the relational model represents
data as tables (or relations). Each table consists of rows (tuples) and columns (attributes).
Tables can be related to each other through common attributes.

 Tables (Relations): The core structure of a relational database. Each table represents an
entity (e.g., customers, orders) and contains rows (records) and columns (fields).

 Rows (Tuples): Individual records within a table. Each row represents a unique instance
of the entity described by the table.

 Columns (Attributes): Define the properties or characteristics of the entities represented
by the table. Each column has a data type and a name.

2. Keys

 Primary Key: A unique identifier for each row in a table. No two rows can have the
same primary key value, ensuring each record is unique. Example: customer_id in a
Customers table.

 Foreign Key: A field (or a collection of fields) in one table that uniquely identifies a row
of another table. It establishes a relationship between two tables. Example: customer_id in
an Orders table that refers to the customer_id in the Customers table.

 Composite Key: A primary key that consists of two or more columns. This is used when
a single column is not sufficient to uniquely identify a record.
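
A minimal sketch of a composite primary key declaration (the Enrollments table here is hypothetical):

CREATE TABLE Enrollments (
    StudentID INT,
    CourseID INT,
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID)
);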

3. Normalization

Normalization is the process of organizing a database to reduce redundancy and improve data
integrity. It involves dividing large tables into smaller, related tables and defining relationships
between them. The process typically involves several normal forms:
 First Normal Form (1NF): Ensures that each column contains atomic (indivisible)
values, and each record is unique. This means no repeating groups or arrays in a single
column.

 Second Normal Form (2NF): Builds on 1NF by ensuring that all non-key attributes are
fully functionally dependent on the entire primary key. It removes partial dependencies.

 Third Normal Form (3NF): Ensures that all attributes are dependent only on the
primary key and not on other non-key attributes. It eliminates transitive dependencies.

 Boyce-Codd Normal Form (BCNF): A stronger version of 3NF that handles certain
types of anomaly that 3NF does not cover.

4. Relationships

 One-to-One (1:1): A single record in Table A is related to a single record in Table B.
Example: Each person has one passport.

 One-to-Many (1:N): A single record in Table A is related to multiple records in Table B.
Example: A single customer can place many orders.

 Many-to-Many (M:N): Multiple records in Table A are related to multiple records in Table B.
This requires a junction table to manage the relationship. Example: Students and courses, where
each student can enroll in many courses and each course can have many students.

5. Indexes

Indexes improve the speed of data retrieval operations on a database at the cost of additional
storage space and potential decrease in performance for data modification operations. They are
used to quickly locate rows based on the values of specific columns.

6. Data Integrity

Data integrity ensures that the data is accurate and consistent. It is enforced through:

 Entity Integrity: Ensures that each table has a primary key and that it is unique and not
null.

 Referential Integrity: Ensures that foreign keys in a table match primary keys in another
table, maintaining consistent relationships.
 Domain Integrity: Ensures that all values in a column adhere to a defined domain (e.g.,
data types, ranges).

3.3 Objectives of a Relational Database

The objectives of a relational database are centered around managing and organizing data
efficiently while ensuring its integrity and usability. Here are the key objectives:

1. Data Integrity: Ensure that data is accurate and consistent. Relational databases use
constraints like primary keys, foreign keys, and unique constraints to maintain data
accuracy and prevent anomalies.

2. Data Independence: Provide a clear separation between the data and the applications
that use it. This allows changes to the database schema without affecting the applications
that access the data.

3. Data Consistency: Ensure that data remains consistent across the database by enforcing
rules and relationships. This helps in maintaining uniformity and reducing redundancy.

4. Efficient Data Retrieval: Optimize the storage and retrieval of data. Relational databases
use indexing, query optimization, and efficient data retrieval methods to handle large
volumes of data and complex queries effectively.

5. Ease of Data Manipulation: Allow for easy insertion, updating, and deletion of data.
SQL (Structured Query Language) is commonly used for these operations, providing a
standardized way to interact with the data.

6. Scalability: Handle growing amounts of data and increasing numbers of users without
significant performance degradation. This involves both horizontal (scaling out) and
vertical (scaling up) scaling techniques.

7. Data Security: Protect data from unauthorized access and breaches. Relational databases
provide various security features such as user authentication, access control, and
encryption.

8. Transaction Management: Ensure that database operations are processed reliably and
adhere to the principles of ACID (Atomicity, Consistency, Isolation, Durability). This
guarantees that transactions are completed successfully or rolled back entirely in case of
failure.

9. Data Relationships: Manage relationships between different data entities effectively.
Relational databases use tables and relationships (one-to-one, one-to-many, many-to-many)
to model real-world relationships between data elements.
10. Backup and Recovery: Provide mechanisms for backing up data and recovering it in
case of failures or disasters. This ensures data durability and availability.

11. Data Redundancy Reduction: Minimize data duplication by normalizing the database
schema, which helps in organizing data into related tables to avoid redundant data
storage.

3.4 Tools for Relational Databases

1. Database Management Systems (DBMS): Software that manages relational databases.
Some popular DBMS include:

o MySQL: An open-source DBMS widely used for web applications.

o PostgreSQL: An open-source DBMS known for its advanced features and compliance
with SQL standards.

o Oracle Database: A commercial DBMS known for its robust features and scalability.

o Microsoft SQL Server: A commercial DBMS developed by Microsoft, known for its
integration with other Microsoft products.

o SQLite: A lightweight, file-based DBMS used for embedded applications.

2. Database Design Tools: Software for designing and managing database schemas.

o MySQL Workbench: A visual tool for MySQL that includes data modeling, SQL
development, and server administration.

o pgAdmin: A graphical tool for managing PostgreSQL databases.

o Oracle SQL Developer: A free tool for database development and management with
Oracle databases.

o Microsoft SQL Server Management Studio (SSMS): A tool for managing SQL Server
databases.

3. Query Tools: Tools or interfaces for executing SQL queries and managing database
interactions.

o DBeaver: A universal database tool that supports various DBMS.

o DataGrip: A database IDE by JetBrains that supports multiple databases and provides a
powerful SQL editor.
4. Data Migration and Integration Tools: Tools for moving data between databases or
integrating databases with other systems.

o Talend: A data integration tool with support for various databases.

o Apache NiFi: An open-source data integration tool that supports data flow management.

5. Backup and Recovery Tools: Tools to back up and restore databases.

o Backup tools provided by the DBMS: Most DBMS come with built-in backup and
recovery features.

o Third-Party Backup Solutions: Tools like Rubrik or Veeam can be used for more
comprehensive backup solutions.

3.5 Redundancy and Data Anomalies in Relational Databases

In relational databases, redundancy and data anomalies can significantly impact data integrity
and efficiency. Understanding these issues helps in designing robust databases. Here’s a detailed
look at redundancy and data anomalies in relational databases:

Redundancy

Redundancy refers to the unnecessary repetition of data within a database. While some level of
redundancy can be intentional and useful (e.g., for optimization), excessive redundancy can lead
to several problems:

 Increased Storage Costs: Storing redundant data consumes additional disk space, which
could otherwise be used more efficiently.

 Data Integrity Issues: When data is repeated, keeping it synchronized becomes
challenging. Changes made in one place may not be reflected elsewhere, leading to
inconsistencies.

 Performance Degradation: Redundant data can slow down query performance, as the
system may need to sift through more data than necessary.

Examples of Redundancy

 Storing the same information in multiple tables: For example, if a customer’s address
is stored in both the Customers and Orders tables, any updates to the address must be made
in both places.

 Duplicated records: If a table contains multiple rows with the same information, it can
lead to inefficient use of resources.
Data Anomalies

Data anomalies are irregularities or inconsistencies in a database that arise due to poor design or
improper handling of data. There are several types of anomalies:

1. Update Anomaly

Occurs when changes to data in one place must be applied in multiple locations. Failure
to update all instances can lead to inconsistencies.

o Example: If a customer’s address is stored in multiple tables, updating the address in one
table but not the others results in differing address information across the database.

2. Insert Anomaly

Occurs when the structure of the database prevents certain types of data from being
entered unless other related data is also provided.

o Example: In a table that combines customer and order information, it might be
impossible to insert a new customer without immediately creating an order for them,
which is often not desirable.

3. Delete Anomaly

Happens when deleting a record results in the unintended loss of other related data.

o Example: If deleting a customer’s record from a table that also contains order
information results in the loss of all associated orders, even if the order data is still
needed for historical purposes.

Addressing Redundancy and Data Anomalies

Normalization is a key technique used to address redundancy and data anomalies. It involves
decomposing tables to eliminate redundant data and ensure that each piece of information is
stored only once. The process typically includes several normal forms:

1. First Normal Form (1NF): Ensures that each column contains only atomic (indivisible)
values and each record is unique. It eliminates repeating groups and arrays.

2. Second Normal Form (2NF): Builds on 1NF by ensuring that all non-key attributes are
fully functionally dependent on the entire primary key. It removes partial dependencies.

3. Third Normal Form (3NF): Ensures that all attributes are dependent only on the
primary key and not on other non-key attributes. It eliminates transitive dependencies.
4. Boyce-Codd Normal Form (BCNF): Addresses certain types of anomalies not handled
by 3NF by ensuring that every determinant is a candidate key.

5. Fourth Normal Form (4NF) and Fifth Normal Form (5NF): Deal with multi-valued
dependencies and join dependencies, respectively, further refining the database design.

Example Scenario

Consider a database for a retail store with a table that combines customer and order information:

CustomerID CustomerName Address OrderID ProductName Quantity

1 Alice Smith 123 Elm St 101 Widget A 5

1 Alice Smith 123 Elm St 102 Widget B 10

2 Bob Johnson 456 Oak St 103 Widget C 3

 Redundancy: The address for Alice Smith is repeated for each of her orders.

 Update Anomaly: If Alice Smith’s address changes, it must be updated in multiple rows.

 Insert Anomaly: To insert a new customer, you might need to also insert an order, which
is not always appropriate.

 Delete Anomaly: Deleting Alice Smith’s record would also remove all her orders, even if
you want to retain historical order data.

 Normalization would involve splitting this table into separate tables for customers and
orders:

1. Customers Table:

CustomerID CustomerName Address


1 Alice Smith 123 Elm St

2 Bob Johnson 456 Oak St

2. Orders Table:

OrderID CustomerID ProductName Quantity


101 1 Widget A 5

102 1 Widget B 10

103 2 Widget C 3
By separating the tables, you minimize redundancy, making updates easier and more reliable,
and avoiding anomalies in data management.
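
In SQL, the decomposed design might be declared as follows (a sketch; the column sizes are illustrative):

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    Address VARCHAR(200)
);

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    ProductName VARCHAR(100),
    Quantity INT,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);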

3.6 Functional Dependency in Relational Databases

Functional dependency is a key concept in relational database theory, important for
understanding how data is organized and ensuring data integrity. It describes a relationship
between attributes in a relational database and is fundamental for database normalization. Here’s
a detailed overview:

1. Definition of Functional Dependency

A functional dependency (FD) is a constraint between two sets of attributes in a relational
database. Specifically, if you have a relation (or table) with attributes, a functional dependency
specifies that one set of attributes (the determinant, written on the left-hand side) uniquely
determines another set of attributes (the dependent, written on the right-hand side).

Formally, for a relation R and attribute sets X and Y in R:

 X functionally determines Y (denoted X → Y) if and only if, for any two tuples in R that
agree on the X attributes, they must also agree on the Y attributes.

In other words, if two rows have the same value for X, they must have the same value for Y.

2. Examples of Functional Dependencies

 Single Attribute FD: In a table where EmployeeID is unique for each employee, we can
say EmployeeID → EmployeeName. This means that for a given EmployeeID, there is a unique
EmployeeName.

 Composite FD: In a table with CourseID and StudentID, if each combination of CourseID and
StudentID uniquely determines a Grade, we can write this as (CourseID, StudentID) → Grade.

3. Types of Functional Dependencies

 Trivial Functional Dependency: A functional dependency X → Y is trivial if Y ⊆ X. For
example, EmployeeID → EmployeeID is trivial because EmployeeID is part of itself.

 Non-Trivial Functional Dependency: A functional dependency X → Y is non-trivial if Y is
not a subset of X. For instance, EmployeeID → EmployeeName is non-trivial because
EmployeeName is not a part of EmployeeID.

 Completely Non-Trivial Functional Dependency: A dependency X → Y in which X and Y
are disjoint (they share no attributes), such as (CourseID, StudentID) → Grade.

4. Importance in Database Design

 Normalization: Functional dependencies are crucial for normalization, a process that
organizes the database to reduce redundancy and improve data integrity. The
normalization process involves decomposing tables based on functional dependencies to
achieve different normal forms (1NF, 2NF, 3NF, BCNF, etc.).

 Ensuring Data Integrity: By understanding and applying functional dependencies, you
can ensure that the data in the database remains consistent and accurate, avoiding
anomalies like update, insertion, and deletion anomalies.

5. Armstrong's Axioms

To reason about functional dependencies, Armstrong's axioms are used:

 Reflexivity: If Y ⊆ X, then X → Y.

 Augmentation: If X → Y, then XZ → YZ for any attribute set Z.

 Transitivity: If X → Y and Y → Z, then X → Z.

These axioms are used to infer all possible functional dependencies that hold in a given relation.

6. Closure of a Set of Functional Dependencies

The closure of a set of functional dependencies, often denoted F⁺, is the set of all functional
dependencies that can be inferred from the original set F using Armstrong's axioms.
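
A small worked example (the relation and dependencies are hypothetical): given R(A, B, C) with F = {A → B, B → C}, the attribute closure of A is A⁺ = {A, B, C}, since A → B yields B and transitivity applied to B → C then yields C. Consequently A → C belongs to F⁺ even though it is not listed in F.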

7. Role in Query Optimization

Understanding functional dependencies can also aid in query optimization by allowing the
database management system to make better decisions about indexing and query execution plans
based on the expected relationships between data attributes.


3.7 Normalization
Normalization is a process in database design used to organize data to reduce redundancy and
improve data integrity. The main goal of normalization is to ensure that the database is free from
undesirable characteristics like insertion, update, and deletion anomalies. This is achieved by
dividing a database into two or more tables and defining relationships between them.

Normalization Process

Normalization involves decomposing a database schema into a set of tables and establishing
constraints to maintain data integrity. It typically involves several stages or normal forms, each
addressing specific types of redundancy and anomalies.

3.8 First Normal Form (1NF)

Objective: Ensure that the table is a relation, meaning it has a clear structure with atomic values.

Criteria:

 Each column must contain only atomic (indivisible) values.


 Each column must contain values of a single type.

 Each column must have a unique name.

 The order in which data is stored does not matter.

Example:

A table with multiple values in a single cell:

StudentID Name Courses

101 Alice Math, Science

102 Bob History, Art

To achieve 1NF, split it into:

StudentID Name Course

101 Alice Math

101 Alice Science

102 Bob History

102 Bob Art

3.9 Second Normal Form (2NF)

Objective: Ensure that all non-key attributes are fully functionally dependent on the entire
primary key.

Criteria:

 The table must be in 1NF.

 There should be no partial dependency of any column on the primary key. Each non-key attribute
must be fully dependent on the entire primary key.

Example:

Given a table in 1NF:

StudentID CourseID Instructor InstructorPhone

101 CS101 Dr. Smith 123-456-7890


101 CS102 Dr. Jones 987-654-3210

The primary key is a composite key: (StudentID, CourseID).

To achieve 2NF, separate into:

1. StudentCourse Table:

StudentID CourseID

101 CS101

101 CS102

2. Instructor Table:

CourseID Instructor InstructorPhone

CS101 Dr. Smith 123-456-7890

CS102 Dr. Jones 987-654-3210

3.10 Third Normal Form (3NF)

Objective: Ensure that non-key attributes are not dependent on other non-key attributes
(transitive dependency).

Criteria:

 The table must be in 2NF.

 There should be no transitive dependencies; all non-key attributes must be dependent only on the
primary key.

Example:

Given a table in 2NF:

StudentID CourseID Instructor InstructorPhone

101 CS101 Dr. Smith 123-456-7890

101 CS102 Dr. Jones 987-654-3210

Here, InstructorPhone is dependent on Instructor, not directly on the primary key (StudentID, CourseID).
To achieve 3NF, separate into:

1. StudentCourse Table:

StudentID CourseID

101 CS101

101 CS102

2. Instructor Table:

CourseID Instructor

CS101 Dr. Smith

CS102 Dr. Jones

3. InstructorPhone Table:

Instructor InstructorPhone

Dr. Smith 123-456-7890

Dr. Jones 987-654-3210

3.11 Boyce-Codd Normal Form (BCNF)

Objective: Handle anomalies not addressed by 3NF by ensuring that every determinant is a
candidate key.

Criteria:

 The table must be in 3NF.

 Every determinant must be a candidate key.

Example:

If a table violates BCNF because a non-key attribute is a determinant, it should be further
decomposed until BCNF is achieved.
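
As a sketch of a classic case (the tables here are hypothetical, not the ones above): suppose Enroll(StudentID, Course, Instructor), where (StudentID, Course) is the key but each instructor teaches exactly one course, so Instructor → Course holds. Instructor is then a determinant that is not a candidate key, which violates BCNF. Decomposing into Teaches(Instructor, Course) and Takes(StudentID, Instructor) removes the violation.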

3.12 Transaction Processing

Transaction processing in a Relational Database Management System (RDBMS) refers to the
management of database operations as a series of actions that must be executed in a reliable and
consistent manner. A transaction is a sequence of one or more SQL operations (like INSERT,
UPDATE, DELETE) that are executed as a single unit of work. Transactions ensure that the
database remains in a consistent state, even in the presence of failures, errors, or concurrent
operations.

Key Concepts in Transaction Processing

1. ACID Properties

o Atomicity: Ensures that all operations within a transaction are completed successfully or
none are applied. If any operation fails, the transaction is aborted, and all changes are
rolled back to maintain data integrity.

o Consistency: Ensures that a transaction brings the database from one consistent state to
another consistent state. This means that all data integrity constraints must be satisfied
after the transaction.

o Isolation: Ensures that the operations of one transaction are isolated from those of other
concurrent transactions. This means the intermediate state of a transaction is invisible to
other transactions.

o Durability: Ensures that once a transaction has been committed, its changes are
permanent and will survive any subsequent system failures.

2. Transaction States

o Active: The transaction is currently being executed.

o Partially Committed: The transaction has executed its final operation but has not yet
been committed.

o Committed: All operations of the transaction have been completed successfully, and
changes are permanently applied to the database.

o Failed: The transaction cannot proceed due to some error and must be rolled back.

o Aborted: The transaction has been rolled back, and all changes have been undone.

3. Transaction Control Commands

o BEGIN TRANSACTION: Starts a new transaction.

o COMMIT: Saves all changes made during the transaction to the database, making them
permanent.

o ROLLBACK: Undoes all changes made during the transaction, reverting the database to
its state before the transaction began.
o SAVEPOINT: Sets a point within a transaction to which you can later roll back without
affecting the entire transaction.
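
A brief sketch of how SAVEPOINT might be used (the account numbers are hypothetical, and the exact rollback syntax varies slightly between systems):

BEGIN TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 123;
SAVEPOINT after_debit;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 456;
ROLLBACK TO SAVEPOINT after_debit; -- undoes only the credit, keeping the debit
COMMIT;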

4. Concurrency Control

o Locking: Prevents multiple transactions from accessing the same data concurrently in
conflicting ways. Locks can be shared (read) or exclusive (write).

o Deadlock: A situation where two or more transactions are waiting for each other to
release locks, causing all of them to remain blocked. Deadlock detection and resolution
mechanisms are used to handle such scenarios.

o Isolation Levels: Define the degree to which the operations in one transaction are
isolated from those in other concurrent transactions. Common isolation levels include:

 Read Uncommitted: Allows reading uncommitted changes from other
transactions, leading to possible dirty reads.

 Read Committed: Ensures that only committed data is read, avoiding dirty reads
but allowing non-repeatable reads.

 Repeatable Read: Ensures that if a transaction reads a value, subsequent reads
of the same value within the transaction will see the same data, avoiding
non-repeatable reads but still allowing phantom reads.

 Serializable: Ensures complete isolation by making transactions appear as if they
were executed sequentially, preventing dirty reads, non-repeatable reads, and
phantom reads.
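
Most systems let you choose the isolation level per transaction. For example, in standard SQL (the exact syntax and supported levels vary by DBMS):

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;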

5. Logging and Recovery

o Write-Ahead Logging (WAL): Logs changes before they are applied to the database.
This ensures that if a system failure occurs, the changes can be recovered using the log.

o Checkpointing: Periodically saves the state of the database to reduce recovery time. A
checkpoint is a point in time where all changes up to that point are written to disk.

3.12.1 Example of Transaction Processing

Consider a banking application where a transaction transfers money from one account to another.
This operation involves two steps:

1. Debit the amount from the source account.

2. Credit the amount to the destination account.


If any step fails (e.g., due to a system crash or an error), the entire transaction should be rolled
back to ensure that the money is not lost or incorrectly credited. This ensures the database
remains consistent, adhering to the ACID properties.


BEGIN TRANSACTION;

-- Deduct money from the source account

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 123;

-- Add money to the destination account

UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 456;

-- Commit the transaction if both operations are successful

COMMIT;

If an error occurs after the first update but before the commit, a rollback will be issued to undo
the changes:


ROLLBACK;

3.13 DATABASE SECURITY

Database security in a Database Management System (DBMS) is a critical aspect of
safeguarding data from unauthorized access, breaches, and other malicious activities. Effective
database security encompasses various practices and technologies designed to protect the
confidentiality, integrity, and availability of data. Here’s a detailed overview of the key concepts
and practices involved in database security:

Key Concepts in Database Security

1. Confidentiality

o Objective: Ensure that sensitive data is accessible only to authorized users.

o Techniques:

 Access Control: Restricts who can view or modify data. Includes role-based
access control (RBAC), discretionary access control (DAC), and mandatory
access control (MAC).

 Encryption: Protects data by converting it into a format that is unreadable
without the proper decryption key. Encryption can be applied to data at rest
(stored data) and data in transit (data being transmitted over networks).

2. Integrity

o Objective: Maintain the accuracy and consistency of data.

o Techniques:

 Data Validation: Ensures that only valid data is entered into the database.
Includes checks like data type validation, range checks, and format checks.

 Constraints: Define rules that data must adhere to, such as primary keys, foreign
keys, unique constraints, and check constraints.

 Audit Trails: Track changes to data and database operations to detect and
analyze unauthorized changes or anomalies.

3. Availability

o Objective: Ensure that data is accessible when needed, even in the face of failures or
attacks.

o Techniques:

 Backup and Recovery: Regularly back up data and have recovery procedures in
place to restore data in case of loss or corruption.

 Replication: Create copies of the database on different servers to provide
redundancy and improve availability.
 Failover Systems: Automatically switch to a backup system in case the primary
system fails.

4. Authentication

o Objective: Verify the identity of users accessing the database.

o Techniques:

 Username and Password: Basic form of authentication where users must
provide credentials to gain access.

 Multi-Factor Authentication (MFA): Enhances security by requiring multiple
forms of verification, such as a password and a fingerprint or a one-time code
sent to a mobile device.

5. Authorization

o Objective: Control what authenticated users are allowed to do within the database.

o Techniques:

 Privileges and Roles: Assign specific permissions to users or roles, such as read,
write, update, or delete access.

 Granular Permissions: Define permissions at various levels, including table,
column, and row levels.
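
A minimal sketch of role-based, granular permissions in SQL (the role and table names are hypothetical, and role support varies by DBMS):

CREATE ROLE reporting_user;
GRANT SELECT ON Employees TO reporting_user;
GRANT SELECT, UPDATE ON Departments TO reporting_user;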

6. Auditing

o Objective: Monitor and record database activities to detect and respond to potential
security incidents.

o Techniques:

 Audit Logs: Maintain records of user actions, data changes, and system access.
Logs should be securely stored and regularly reviewed.

 Alerts and Notifications: Set up alerts for suspicious activities or policy
violations.

7. Data Masking

o Objective: Protect sensitive data by obscuring its actual values.

o Techniques:
 Static Data Masking: Replaces sensitive data with anonymized data in non-
production environments.

 Dynamic Data Masking: Hides sensitive data in real-time for users who do not
have the necessary permissions.

8. Vulnerability Management

o Objective: Identify and mitigate security weaknesses in the database system.

o Techniques:

 Regular Patching: Apply updates and patches to address security vulnerabilities
in the DBMS software.

 Security Scanning: Use tools to scan for vulnerabilities and configuration
issues.

Best Practices for Database Security

1. Least Privilege Principle: Grant users the minimum level of access necessary to perform their
job functions.

2. Regular Backups: Schedule frequent backups and test recovery procedures to ensure data can be
restored in case of failure or corruption.

3. Encryption: Encrypt sensitive data both at rest and in transit to protect it from unauthorized
access.

4. Secure Configuration: Follow security best practices for configuring the DBMS, including
disabling unused features and changing default settings.

5. Monitoring and Auditing: Implement continuous monitoring and auditing to detect and respond
to suspicious activities or security breaches.

6. User Training: Educate users about security best practices, including password management and
recognizing phishing attempts.
Unit 4

4.1 SQL
SQL (Structured Query Language) is a standardized language used to manage and manipulate
relational databases. It allows users to perform various operations such as querying data,
updating records, inserting new data, and deleting existing data.
SQL (Structured Query Language)

 Purpose: Standard language for managing and manipulating relational databases.

 Components:

o DDL (Data Definition Language): Defines database structures (e.g., CREATE, ALTER,
DROP).

o DML (Data Manipulation Language): Manages data (e.g., SELECT, INSERT, UPDATE,
DELETE).

o DCL (Data Control Language): Controls access to data (e.g., GRANT, REVOKE).

o TCL (Transaction Control Language): Manages transactions (e.g., COMMIT, ROLLBACK).

 Operations:

o Querying data with SELECT statements.

o Modifying data with INSERT, UPDATE, and DELETE.

o Defining data structures and constraints.

o Managing user permissions and security.

 Joins: Combines data from multiple tables (e.g., INNER JOIN, LEFT JOIN).

 Views: Virtual tables for simplifying complex queries.

 Indexes: Improves data retrieval speed.

4.2 Commands in SQL

SQL (Structured Query Language) is used to manage and manipulate relational databases. Here’s a quick
overview of some common SQL commands, categorized by their purpose:

Data Querying Commands

SELECT: Retrieves data from one or more tables.


SELECT column1, column2 FROM table_name;

Data Manipulation Commands

INSERT: Adds new rows to a table.

INSERT INTO table_name (column1, column2) VALUES (value1, value2);

UPDATE: Modifies existing data in a table.

UPDATE table_name SET column1 = value1 WHERE condition;

DELETE: Removes rows from a table.

DELETE FROM table_name WHERE condition;


Data Definition Commands

CREATE TABLE: Creates a new table.

CREATE TABLE table_name (

column1 datatype,

column2 datatype

);

ALTER TABLE: Modifies an existing table structure.

ALTER TABLE table_name ADD column_name datatype;

DROP TABLE: Deletes a table and its data.


DROP TABLE table_name;

Data Control Commands

GRANT: Gives users access privileges to the database.

GRANT SELECT ON table_name TO user_name;

REVOKE: Removes access privileges from users.

REVOKE SELECT ON table_name FROM user_name;

Transaction Control Commands

COMMIT: Saves all changes made during the current transaction.

ROLLBACK: Undoes changes made during the current transaction.

Other Useful Commands

JOIN: Combines rows from two or more tables based on a related column.
SELECT * FROM table1 JOIN table2 ON table1.id = table2.foreign_id;

WHERE: Filters records based on specified conditions.

SELECT * FROM table_name WHERE condition;

ORDER BY: Sorts the result set in ascending or descending order.

SELECT * FROM table_name ORDER BY column_name ASC|DESC;

Aggregate Functions

COUNT(), SUM(), AVG(), MIN(), MAX(): Perform calculations on a set of values.


4.3 Data Types

In SQL, data types are crucial for defining the kind of data that can be stored in a column of a
table. Here’s an overview of common SQL data types:

1. Numeric Types

 INT: Integer, typically 4 bytes.

 SMALLINT: Small integer, 2 bytes.

 TINYINT: Very small integer, 1 byte.

 BIGINT: Large integer, 8 bytes.

 FLOAT: Floating-point number.

 DOUBLE: Double-precision floating-point number.

 DECIMAL(p, s): Fixed-point number, where p is precision and s is scale.

2. String Types

 CHAR(n): Fixed-length string of length n.

 VARCHAR(n): Variable-length string, up to n characters.

 TEXT: Variable-length string, for large amounts of text.

 NCHAR(n): Fixed-length Unicode string.

 NVARCHAR(n): Variable-length Unicode string.

3. Date and Time Types

 DATE: Date value (year, month, day).

 TIME: Time value (hour, minute, second).

 DATETIME: Date and time value.

 TIMESTAMP: Date and time, often used for tracking changes.


 INTERVAL: Represents a span of time.

4. Binary Types

 BINARY(n): Fixed-length binary data.

 VARBINARY(n): Variable-length binary data.

 BLOB: Binary Large Object, for storing large binary data.

5. Boolean Type

 BOOLEAN: Represents a true/false value.

6. Special Types

 ENUM: A string object with a value chosen from a list of permitted values.

 SET: A string object that can have zero or more values, each of which must be chosen from a list.
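
Putting several of these types together, a table definition might look like this (a sketch; exact type support varies by DBMS):

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10, 2),
    in_stock BOOLEAN,
    added_on TIMESTAMP
);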

4.4 DDL
DDL, or Data Definition Language, is a subset of SQL (Structured Query Language) used to
define and manage database structures. Here are some key DDL commands:

CREATE: Used to create new database objects, such as tables, indexes, and views.

CREATE TABLE employees (

id INT PRIMARY KEY,

name VARCHAR(100),

hire_date DATE

);

ALTER: Used to modify existing database objects. This can include adding or dropping
columns, changing data types, etc.

ALTER TABLE employees ADD COLUMN salary DECIMAL(10, 2);

DROP: Used to delete database objects. Be careful, as this action is usually irreversible.
DROP TABLE employees;

TRUNCATE: Used to remove all records from a table, but it keeps the table structure for
future use.

TRUNCATE TABLE employees;

RENAME: Used to change the name of a database object.

RENAME TABLE employees TO staff;

COMMENT: Adds comments to the database schema for documentation purposes.

COMMENT ON COLUMN employees.salary IS 'Annual salary of the employee';

4.5 Selection

In SQL, the SELECT statement is used to query and retrieve data from a database. Here are
some key components and examples of how to use SELECT:

Basic Syntax


SELECT column1, column2, ... FROM table_name;

Examples

Selecting All Columns

SELECT * FROM employees;

Selecting Specific Columns


SELECT first_name, last_name FROM employees;

Using WHERE Clause

SELECT * FROM employees WHERE department = 'Sales';

Using ORDER BY Clause

SELECT first_name, last_name FROM employees ORDER BY last_name ASC;

Using LIMIT Clause

SELECT * FROM employees

LIMIT 10;

Using Aggregate Functions

SELECT COUNT(*) AS total_employees FROM employees;

Grouping Results

SELECT department, COUNT(*) AS employee_count FROM employees GROUP BY department;

Using JOINs

SELECT e.first_name, e.last_name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.id;
Common Clauses

WHERE: Filters records based on specified conditions.

ORDER BY: Sorts the result set.

GROUP BY: Groups rows that have the same values in specified columns.

HAVING: Filters groups based on conditions (used with GROUP BY).

LIMIT: Restricts the number of rows returned.

4.6 Projection

In SQL, projection refers to the operation of selecting specific columns from a table in a query.
This allows you to focus on just the data you need, rather than retrieving entire rows.

Key Points about Projection in SQL:

SELECT Statement: The main way to perform projection is through the SELECT statement.

Syntax:

SELECT column1, column2, ... FROM table_name;

Example: If you have a table named employees, and you only want to retrieve the name and
hire_date, you would write:

SELECT name, hire_date FROM employees;

Selecting All Columns: To retrieve all columns from a table, you can use *:
SELECT * FROM employees;

Aliases: You can use aliases to rename columns in the output:

SELECT name AS employee_name, hire_date AS date_hired FROM employees;

Projection with Conditions: You can combine projection with WHERE to filter results:

SELECT name, salary FROM employees WHERE salary > 50000;

Avoiding Duplicates: To remove duplicate results, use the DISTINCT keyword:

SELECT DISTINCT department FROM employees;

4.7 Join and Set

In SQL, JOIN operations and SET operations allow you to combine and manipulate data
from multiple tables or result sets. Here’s a breakdown of both:

JOIN Operations

1. INNER JOIN

Returns records that have matching values in both tables.

SELECT a.column1, b.column2
FROM tableA a
INNER JOIN tableB b ON a.common_column = b.common_column;

2. LEFT JOIN (or LEFT OUTER JOIN)


Returns all records from the left table, and matched records from the right table. If there's no
match, NULLs are returned for the right table.

SELECT a.column1, b.column2
FROM tableA a
LEFT JOIN tableB b ON a.common_column = b.common_column;

3. RIGHT JOIN (or RIGHT OUTER JOIN)

Returns all records from the right table, and matched records from the left table. If there's no
match, NULLs are returned for the left table.

SELECT a.column1, b.column2
FROM tableA a
RIGHT JOIN tableB b ON a.common_column = b.common_column;

4. FULL JOIN (or FULL OUTER JOIN)

Returns records when there is a match in either left or right table records. If there’s no match,
NULLs are returned for missing matches.
SELECT a.column1, b.column2
FROM tableA a
FULL JOIN tableB b ON a.common_column = b.common_column;

5. CROSS JOIN

Returns the Cartesian product of the two tables, meaning every row from the first table is
combined with every row from the second table.

SELECT a.column1, b.column2
FROM tableA a
CROSS JOIN tableB b;

SET Operations

1. UNION

Combines the result sets of two or more SELECT queries and removes duplicate rows.

SELECT column1 FROM tableA
UNION
SELECT column1 FROM tableB;

2. UNION ALL
Similar to UNION, but includes duplicate rows.

SELECT column1 FROM tableA
UNION ALL
SELECT column1 FROM tableB;

3. INTERSECT

Returns only the rows that are common to both SELECT queries.

SELECT column1 FROM tableA
INTERSECT
SELECT column1 FROM tableB;

4. EXCEPT (or MINUS in some databases)

Returns rows from the first SELECT query that are not present in the second SELECT query.

SELECT column1 FROM tableA
EXCEPT
SELECT column1 FROM tableB;


Example Usage

Suppose you have two tables, employees and departments, and you want to find employees
along with their department names:

Using INNER JOIN:

SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;

Using UNION:

SELECT name FROM employees
UNION
SELECT name FROM contractors;

4.8 Aggregate Function

Aggregate functions in SQL are used to perform calculations on a set of values and return a
single value. They are often used with the GROUP BY clause to group rows that have the same
values in specified columns. Here are some commonly used aggregate functions:

1. COUNT()

Counts the number of rows in a specified column or the total number of rows if no column is
specified.


SELECT COUNT(*) FROM employees; -- Count all employees
SELECT COUNT(department_id) FROM employees; -- Count employees with a department

2. SUM()

Calculates the total sum of a numeric column.

SELECT SUM(salary) FROM employees; -- Total salary of all employees

3. AVG()

Calculates the average value of a numeric column.


SELECT AVG(salary) FROM employees; -- Average salary

4. MIN()

Returns the minimum value in a set.


SELECT MIN(salary) FROM employees; -- Minimum salary

5. MAX()

Returns the maximum value in a set.


SELECT MAX(salary) FROM employees; -- Maximum salary

6. GROUP BY

Used to group rows that have the same values in specified columns. Aggregate functions can
then be applied to these groups.


SELECT department_id, COUNT(*) FROM employees GROUP BY department_id; -- Count employees in each department
7. HAVING

Used to filter records after aggregation. It's similar to the WHERE clause but operates on
aggregated data.


SELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 10; -- Only show departments with more than 10 employees

Example

Here’s a complete example that combines some of these functions:


SELECT department_id,
       COUNT(*) AS employee_count,
       AVG(salary) AS average_salary
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 50000; -- Show departments with an average salary over 50,000

4.9 DML

DML, or Data Manipulation Language, is a subset of SQL used for managing data within existing
database structures. Here are the primary DML commands:

SELECT: Retrieves data from one or more tables.



SELECT * FROM employees;

INSERT: Adds new records to a table.


INSERT INTO employees (id, name, hire_date, salary) VALUES (1, 'John Doe', '2023-01-15',
60000);
UPDATE: Modifies existing records in a table.


UPDATE employees SET salary = 65000 WHERE id = 1;

DELETE: Removes records from a table.

DELETE FROM employees WHERE id = 1;

4.10 Modification

In SQL, modifications to the data within a database can be accomplished using several
statements, primarily INSERT, UPDATE, and DELETE. Here’s a breakdown of each:

1. INSERT

The INSERT statement is used to add new rows to a table.

Syntax:


INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);

Example:


INSERT INTO employees (first_name, last_name, department_id)
VALUES ('John', 'Doe', 3);

2. UPDATE

The UPDATE statement modifies existing records in a table.

Syntax:


UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

Example:


UPDATE employees
SET department_id = 4
WHERE last_name = 'Doe';

3. DELETE

The DELETE statement removes records from a table.

Syntax:


DELETE FROM table_name
WHERE condition;

Example:


DELETE FROM employees
WHERE last_name = 'Doe';

Important Considerations

WHERE Clause: Always use a WHERE clause with UPDATE and DELETE to avoid modifying or
deleting all records.

Transactions: Consider using transactions (BEGIN, COMMIT, ROLLBACK) for bulk updates or
inserts to ensure data integrity.

Example of a Transaction


BEGIN;

UPDATE employees
SET salary = salary * 1.1
WHERE department_id = 2;

COMMIT;

4.11 Truncation
In SQL, truncation generally refers to the process of removing all rows from a table without
logging individual row deletions. It is primarily done using the TRUNCATE statement.

Key Points about Truncation in SQL:

TRUNCATE Statement:

Syntax:


TRUNCATE TABLE table_name;


Differences from DELETE:

Speed: TRUNCATE is faster than DELETE because it does not log individual row deletions.

No WHERE Clause: You cannot use a WHERE clause with TRUNCATE. It removes all rows
unconditionally.

Resetting Identity: If the table has an identity column, TRUNCATE resets the identity seed back
to the starting value.

Usage:

Use TRUNCATE when you want to quickly remove all records from a table and do not need to
maintain a transaction log of individual deletions.

Example:


TRUNCATE TABLE employees;


Transactional Nature:

While TRUNCATE is generally a DDL command, it can be rolled back if used within a transaction
in databases that support this feature.

Example:


BEGIN TRANSACTION;
TRUNCATE TABLE employees;
ROLLBACK; -- This would restore the data

Permissions: You typically need higher privileges to execute a TRUNCATE command compared
to a DELETE command.

4.12 Constraints

In SQL, constraints are rules applied to columns in a table to enforce data integrity and ensure
the accuracy and reliability of the data. Here are the most commonly used types of constraints:
1. NOT NULL

Ensures that a column cannot contain NULL values.


CREATE TABLE employees (

id INT NOT NULL,

name VARCHAR(100) NOT NULL

);

2. UNIQUE

Ensures that all values in a column are unique across the table.


CREATE TABLE employees (

id INT UNIQUE,

email VARCHAR(100) UNIQUE

);

3. PRIMARY KEY

A combination of NOT NULL and UNIQUE. It uniquely identifies each row in a table. A table can
have only one primary key.


CREATE TABLE employees (


id INT PRIMARY KEY,

name VARCHAR(100)

);

4. FOREIGN KEY

Establishes a relationship between two tables. It ensures that the value in a column (or a set of
columns) matches a value in another table's primary key or unique column.


CREATE TABLE departments (

id INT PRIMARY KEY,

name VARCHAR(100)

);

CREATE TABLE employees (

id INT PRIMARY KEY,

name VARCHAR(100),

department_id INT,

FOREIGN KEY (department_id) REFERENCES departments(id)

);

5. CHECK

Ensures that all values in a column satisfy a specific condition.

CREATE TABLE employees (

id INT PRIMARY KEY,

age INT CHECK (age >= 18)

);

6. DEFAULT

Specifies a default value for a column when no value is provided during record insertion.


CREATE TABLE employees (

id INT PRIMARY KEY,

name VARCHAR(100),

hire_date DATE DEFAULT CURRENT_DATE

);

7. INDEX

While not a constraint in the traditional sense, indexes improve the speed of data retrieval
operations on a database table.


CREATE INDEX idx_employee_name ON employees(name);

Example of Using Multiple Constraints

Here's how you might use several constraints when creating a table:


CREATE TABLE employees (

id INT PRIMARY KEY,

name VARCHAR(100) NOT NULL,

email VARCHAR(100) UNIQUE,

age INT CHECK (age >= 18),

department_id INT,

FOREIGN KEY (department_id) REFERENCES departments(id)

);

4.13 Subquery

A subquery in SQL is a query nested within another query. It can be used in various clauses like
SELECT, WHERE, and FROM. Subqueries are helpful for breaking complex queries into simpler
parts, allowing you to perform operations that require results from multiple steps.

Types of Subqueries

Single-Row Subquery: Returns a single row.

Multiple-Row Subquery: Returns multiple rows.

Correlated Subquery: References columns from the outer query.

Basic Examples

1. Single-Row Subquery

Used in a WHERE clause to compare a value against a single value returned by the subquery.


SELECT first_name, last_name FROM employees
WHERE department_id = (SELECT id FROM departments WHERE name = 'Sales');

2. Multiple-Row Subquery

Used with IN, ANY, or ALL to compare against multiple values returned by the subquery.


SELECT first_name, last_name FROM employees
WHERE department_id IN (SELECT id FROM departments WHERE location = 'New York');

3. Correlated Subquery

References columns from the outer query, making it dependent on the outer query’s row.


SELECT e1.first_name, e1.salary
FROM employees e1
WHERE e1.salary > (SELECT AVG(e2.salary) FROM employees e2
                   WHERE e2.department_id = e1.department_id);

Subquery in the FROM Clause

You can also use a subquery in the FROM clause, treating the result as a temporary table.


SELECT dept.department_id, avg_salaries.avg_salary
FROM (SELECT department_id, AVG(salary) AS avg_salary
      FROM employees
      GROUP BY department_id) AS avg_salaries
JOIN departments dept ON avg_salaries.department_id = dept.id;
Important Considerations

Performance: Subqueries can be less efficient than joins, especially if they are executed
multiple times (as in correlated subqueries).

Readability: While subqueries can simplify complex queries, overusing them can lead to
confusion. Balancing readability and performance is key.

NULL Handling: Be cautious when using subqueries with NULL values, as they can affect results,
especially with comparisons.
