Dbms

The document provides an overview of Entity-Relationship (ER) models and relational database design, explaining key concepts such as entities, attributes, relationships, and the importance of ER modeling for database clarity and efficiency. It also discusses advanced ER diagram concepts like generalization, specialization, and aggregation, as well as different database models including hierarchical, network, and relational models. Additionally, it covers relational database design principles, integrity constraints, normalization processes, and transaction processing with a focus on maintaining data accuracy and reliability.

1. ER Models and Relational Database Design


Introduction

An Entity-Relationship (ER) Model is a way to visually and logically plan a database before building it. It
helps us identify what data we need to store, how different pieces of data are related, and how to organize
everything efficiently. Once we have an ER model, we can convert it into a relational database design, which
means creating tables and relationships in a way that computers can manage and users can easily query.

What is an ER Model?

An ER model uses diagrams (called ER diagrams) to represent:

 Entities: Objects or things we want to store data about (e.g., Student, Teacher, Course).

 Attributes: Details about each entity (e.g., Student Name, Age).

 Relationships: How entities are connected (e.g., a Student enrolls in a Course).

Why Use ER Models?

 Clarity: ER diagrams make it easy to see how data is organized and related.

 Planning: Helps avoid mistakes and missing data before building the actual database.

 Communication: Makes it easier to explain the database design to others.

Main Components of ER Models

1. Entities

Entities are the main objects you want to store information about. Each entity becomes a table in the database.
Examples:

 Student

 Teacher

 Course

2. Attributes

Attributes are the details or properties of each entity.

Examples:

 Student: StudentID, Name, Age

 Course: CourseID, CourseName

3. Relationships

Relationships show how entities are connected.

Examples:

 A Student enrolls in a Course.

 A Teacher teaches a Course.

4. Keys

 Primary Key: Uniquely identifies each record in a table (e.g., StudentID).

 Foreign Key: Connects one table to another (e.g., StudentID in the Enrollment table).

Example: ER Diagram for a School Database

Suppose we want to keep track of students and the courses they take.

Entities and Attributes

 Student: StudentID (PK), Name, Age

 Course: CourseID (PK), CourseName


Relationship

 Enrollment: A student can enroll in many courses, and each course can have many students (many-to-
many relationship).

ER Diagram (Text Representation)


[Student]---<enrolls in>---[Course]

Or, more visually:

+-----------+           +------------+
| Student   |           | Course     |
+-----------+           +------------+
| StudentID |<--------->| CourseID   |
| Name      |           | CourseName |
| Age       |           +------------+
+-----------+

To represent many-to-many relationships in a relational database, we use a third table called a junction table
(e.g., Enrollment).

Enrollment Table:

StudentID   CourseID
---------   --------
1           101
1           102
2           101

Steps to Convert ER Model to Relational Database Design

1. Identify Entities: Each entity becomes a table.

2. List Attributes: Each attribute becomes a column in the table.

3. Define Keys: Choose a primary key for each table.

4. Map Relationships:

o One-to-One: Add a foreign key with a uniqueness constraint to either table.


o One-to-Many: Add a foreign key in the "many" table.

o Many-to-Many: Create a new table with foreign keys from both entities.
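The mapping steps above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names follow the school example, while the sample rows (including the student Arjun and the course names) are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Steps 1-3: one table per entity, one column per attribute, one primary key each
CREATE TABLE Student (
    StudentID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    Age       INTEGER
);
CREATE TABLE Course (
    CourseID   INTEGER PRIMARY KEY,
    CourseName TEXT NOT NULL
);
-- Step 4: the many-to-many relationship becomes a junction table
CREATE TABLE Enrollment (
    StudentID INTEGER NOT NULL REFERENCES Student(StudentID),
    CourseID  INTEGER NOT NULL REFERENCES Course(CourseID),
    PRIMARY KEY (StudentID, CourseID)
);
INSERT INTO Student VALUES (1, 'Riya', 20), (2, 'Arjun', 21);  -- sample rows (assumed)
INSERT INTO Course VALUES (101, 'Math'), (102, 'Science');
INSERT INTO Enrollment VALUES (1, 101), (1, 102), (2, 101);
""")

# Count the courses taken by each student.
courses_per_student = conn.execute(
    "SELECT StudentID, COUNT(*) FROM Enrollment GROUP BY StudentID ORDER BY StudentID"
).fetchall()

# An enrollment for a student that does not exist should be refused.
try:
    conn.execute("INSERT INTO Enrollment VALUES (99, 101)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The composite primary key on Enrollment is a design choice that also prevents the same student from being enrolled in the same course twice.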

Why is ER Modeling Important for Relational Database Design?

 Clarity: Makes it easier to see what data is needed and how it’s connected.

 Efficiency: Helps avoid storing the same data in multiple places.

 Data Integrity: Ensures data is accurate and relationships are maintained.

 Flexibility: Easy to update the design if requirements change.

Real-Life Example

Imagine you are designing a library database. You need to keep track of books, members, and which books are
borrowed.

 Entities: Book, Member

 Attributes: BookID, Title, Author; MemberID, Name, Address

 Relationship: Member borrows Book

ER Diagram:

[Member]---<borrows>---[Book]

Relational Tables:

MemberID   Name   Address
--------   ----   --------
1          Riya   Guwahati

BookID   Title         Author
------   -----------   ------
101      DBMS Basics   Sharma

MemberID   BookID
--------   ------
1          101

Advantages of Using ER Models in Relational Design

1. Visualization: You can see the entire structure at a glance.

2. Communication: Makes it easier to explain the design to others.

3. Consistency: Helps ensure all data is connected correctly.

4. Error Reduction: Spot mistakes before building the database.

Summary Table

Step                What to Do
-----------------   --------------------------------------------------------------
Identify Entities   Find main objects (Student, Course)
List Attributes     Find details for each entity
Define Keys         Choose unique identifiers
Map Relationships   Connect entities (one-to-one, one-to-many, many-to-many)
Convert to Tables   Make tables for each entity and each many-to-many relationship


Conclusion

ER modeling is the foundation of good relational database design. By carefully planning entities, attributes, and
relationships, you can create a database that is easy to use, efficient, and less prone to errors. Once the ER
diagram is ready, converting it into tables and keys follows the steps listed above, which keeps your data
well-organized and reliable.

2. ER Diagrams: Generalisation, Specialisation, Aggregation

Introduction

ER diagrams are powerful tools for designing databases, but sometimes real-world situations are more complex.
To handle these, we use advanced concepts like Generalisation, Specialisation, and Aggregation. These help
us simplify, organize, or add detail to our database designs.

Generalisation

Definition

Generalisation is the process of combining similar entities into a single, more general entity. It’s a bottom-up
approach, where you look for common features among entities and create a super-entity.

Example

Suppose you have two entities: Car and Bike. Both have common attributes like "Color", "EngineNo", and
"Owner". Instead of repeating these attributes in both entities, you can generalize them into a single entity called
Vehicle.

Diagram:

  Car     Bike
    \     /
   [Vehicle]

Why Use Generalisation?


 Reduces repetition of common attributes.

 Makes the database design simpler and easier to maintain.

Specialisation

Definition

Specialisation is the opposite of generalisation. It’s a top-down approach, where you start with a general entity
and split it into more specific sub-entities based on unique features.

Example

Suppose you have an entity called Employee. Some employees are Teachers, others are Clerks. Teachers have
"Subject" as an attribute, while Clerks have "Department".

Diagram:

      [Employee]
       /      \
[Teacher]    [Clerk]

Why Use Specialisation?

 Allows you to store specific attributes for different types of entities.

 Makes queries and data management more efficient.
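One common way to map a specialisation (or generalisation) hierarchy to tables is to keep the shared attributes in the general table and give each sub-entity its own table that reuses the same key. The sketch below assumes this mapping; the EmpID and Name columns and the sample rows are hypothetical and are not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- General entity: attributes shared by every employee
CREATE TABLE Employee (
    EmpID INTEGER PRIMARY KEY,
    Name  TEXT NOT NULL
);
-- Sub-entities: each reuses EmpID and adds its own attribute
CREATE TABLE Teacher (
    EmpID   INTEGER PRIMARY KEY REFERENCES Employee(EmpID),
    Subject TEXT
);
CREATE TABLE Clerk (
    EmpID      INTEGER PRIMARY KEY REFERENCES Employee(EmpID),
    Department TEXT
);
INSERT INTO Employee VALUES (1, 'Asha'), (2, 'Ravi');  -- sample rows (assumed)
INSERT INTO Teacher VALUES (1, 'Physics');
INSERT INTO Clerk VALUES (2, 'Accounts');
""")

# A teacher's full record is the join of the general and the specific table.
teachers = conn.execute("""
    SELECT e.Name, t.Subject
    FROM Employee e
    JOIN Teacher t ON t.EmpID = e.EmpID
""").fetchall()
```

Other mappings exist, such as a single table with a type column; which one fits best depends on how different the sub-entities are.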

Aggregation

Definition

Aggregation is used when a relationship itself needs to be treated as an entity. It helps when you want to relate
an existing relationship to another entity.

Example
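Suppose you have Employee and Project entities, and a relationship "Works_On". Now, you want to track
which Department is responsible for each "Works_On" relationship. Aggregation treats the whole "Works_On"
relationship (called Assignment in the diagram) as a single entity that can then be related to Department.

Diagram:

[Employee]---(works on)---[Project]
 \________________________/
   [Assignment] (Aggregation)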

Why Use Aggregation?

 Allows modeling of complex relationships.

 Useful when relationships have their own properties.

Real-Life Example

Imagine a university:

 Generalisation: Both "Undergraduate" and "Postgraduate" students can be generalized as "Student".

 Specialisation: "Student" can be specialized into "Hosteller" and "Day Scholar".

 Aggregation: "Student participates in Event" can be aggregated if you want to relate "Event
Participation" to "Sponsorship".

Diagram Summary
Generalisation:
  Car     Bike
    \     /
    Vehicle

Specialisation:
     Employee
     /     \
 Teacher   Clerk

Aggregation:
[Employee]---(works on)---[Project]
 \________________________/
        [Assignment]

Conclusion

Generalisation, specialisation, and aggregation are advanced ER diagram concepts that help you design flexible,
organized, and detailed databases. They make it easier to handle real-world complexity and ensure your database
can grow as requirements change.

3. Database Models: Network Model, Hierarchical Model, Relational Model

Introduction

A database model is a way to organize and structure data in a database. The three main types are Hierarchical,
Network, and Relational models. Each has its own way of storing data and connecting information.

Hierarchical Model

Structure

 Data is organized in a tree-like structure.

 Each parent can have multiple children, but each child has only one parent.

Example

Think of a company:

 CEO (parent)

o Manager (child of CEO)

 Employee (child of Manager)

Diagram:
CEO
|
Manager
|
Employee

Pros and Cons

 Pros: Simple, fast for one-to-many relationships.

 Cons: Not flexible for many-to-many relationships.

Network Model

Structure

 Data is organized as a graph.

 Each record can have multiple parents and children (many-to-many relationships).

Example

Students and Courses:

 Each student can enroll in many courses.

 Each course can have many students.

Diagram:

Student <---> Course

Pros and Cons

 Pros: Handles complex relationships.

 Cons: Harder to design and manage.

Relational Model
Structure

 Data is stored in tables (relations) with rows and columns.

 Tables are connected using keys.

Example

StudentID   Name
---------   ----
1           Riya

CourseID   CourseName
--------   ----------
101        Math

StudentID   CourseID
---------   --------
1           101

Pros and Cons

 Pros: Flexible, easy to use, supports complex queries.

 Cons: Can be slower for very large or complex data.

Comparison Table

Model          Structure   Relationship Type   Example Use
------------   ---------   -----------------   ------------------
Hierarchical   Tree        One-to-many         Organization chart
Network        Graph       Many-to-many        Student-Course
Relational     Table       Any (via keys)      Modern databases

Conclusion
Understanding these models helps you choose the right structure for your data. Today, the relational model is
the most popular because it is flexible, powerful, and easy to use.

4. Relational Database Design: Underlying Concepts, Structure, Study of Relational Language
(Relational Algebra, SQL, QBE)

Introduction

Relational database design is about organizing data into tables and defining rules for storing, updating, and
retrieving data efficiently and accurately. It uses relational languages like SQL to interact with data.

Underlying Concepts

 Table (Relation): Like a spreadsheet, stores data in rows and columns.

 Row (Tuple): A single record (e.g., one student).

 Column (Attribute): A property of the record (e.g., name, age).

 Primary Key: Uniquely identifies each row.

 Foreign Key: Connects tables.

Structure

Tables are connected by keys. Data is organized to:

 Reduce repetition (redundancy).

 Ensure accuracy (integrity).

 Make updates easy.

Example Structure:

StudentID   Name   Age
---------   ----   ---
1           Riya   20

CourseID   CourseName
--------   ----------
101        Math

StudentID   CourseID
---------   --------
1           101

Relational Languages

1. Relational Algebra

 A set of mathematical operations to get data from tables.

 Operations: Select, Project, Join, Union, Intersection, Difference.

Example:

 Select all students older than 18: σ(Age > 18)(Students).
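To make the Select and Project operations concrete, here is a minimal sketch that treats a table as a list of Python dictionaries. The function names select and project and the second sample row are assumptions made for the illustration; a real DBMS implements these operators very differently.

```python
def select(rows, predicate):
    """Select: keep only the rows that satisfy the condition."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Project: keep only the named columns and drop duplicate results."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


students = [
    {"StudentID": 1, "Name": "Riya", "Age": 20},
    {"StudentID": 2, "Name": "Arjun", "Age": 17},  # sample row (assumed)
]

# Names of all students older than 18: project Name after selecting Age > 18.
adults = project(select(students, lambda r: r["Age"] > 18), ["Name"])
```

Project removes duplicates here because relational algebra works on sets of rows.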

2. SQL (Structured Query Language)

 The most common language to interact with relational databases.

 Commands:

o SELECT (retrieve data)

o INSERT (add data)

o UPDATE (change data)

o DELETE (remove data)

Example:

SELECT Name FROM Students WHERE Age > 18;

3. QBE (Query By Example)


 A visual way to ask questions to the database by filling out a form.

 Common in user-friendly database systems like Microsoft Access.

Example: Using SQL

Suppose you want to find all students in the "Math" course.

SELECT Students.Name
FROM Students
JOIN Enrollment ON Students.StudentID = Enrollment.StudentID
JOIN Courses ON Enrollment.CourseID = Courses.CourseID
WHERE Courses.CourseName = 'Math';
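The query above can be tried out with Python's built-in sqlite3 module. The table definitions and sample rows below are assumptions added only so that the query has something to run against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
CREATE TABLE Courses (CourseID INTEGER PRIMARY KEY, CourseName TEXT);
CREATE TABLE Enrollment (StudentID INTEGER, CourseID INTEGER);
INSERT INTO Students VALUES (1, 'Riya', 20), (2, 'Arjun', 19);  -- sample rows (assumed)
INSERT INTO Courses VALUES (101, 'Math'), (102, 'Science');
INSERT INTO Enrollment VALUES (1, 101), (2, 102);
""")

# Same query as above; the course name is passed as a parameter instead of
# being pasted into the SQL text.
math_students = conn.execute("""
    SELECT Students.Name
    FROM Students
    JOIN Enrollment ON Students.StudentID = Enrollment.StudentID
    JOIN Courses ON Enrollment.CourseID = Courses.CourseID
    WHERE Courses.CourseName = ?
""", ("Math",)).fetchall()
```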

Conclusion

Relational database design and languages like SQL make it easy to store, retrieve, and manage data.
Understanding these basics is essential for modern database systems.

5. Integrity Constraints (Domain Constraints, Referential, Assertions, Triggers, Functional Dependencies)

Introduction

Integrity constraints are rules that ensure the accuracy and reliability of data in a database. They prevent
mistakes and keep data consistent.

Domain Constraints

 Restrict the type of data in a column.

 Example: Age must be between 0 and 120.


Diagram:

| Age | (Allowed: 0–120)

Referential Integrity

 Ensures foreign keys match primary keys in another table.

 Example: Every CourseID in Enrollment must exist in Courses.

Diagram:

Enrollment.CourseID --> Courses.CourseID

Assertions

 Conditions that must always be true.

 Example: No student can enroll in more than 5 courses.

Triggers

 Actions that happen automatically when certain events occur.

 Example: When a student is added, send a welcome email.
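The domain, referential, and assertion examples above can be sketched in SQLite through Python's sqlite3 module. SQLite has no CREATE ASSERTION statement, so the "at most 5 courses" rule is emulated here with a trigger; the trigger name, the helper function rejected, and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
conn.executescript("""
CREATE TABLE Students (
    StudentID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    Age       INTEGER CHECK (Age BETWEEN 0 AND 120)  -- domain constraint
);
CREATE TABLE Courses (CourseID INTEGER PRIMARY KEY, CourseName TEXT);
CREATE TABLE Enrollment (
    StudentID INTEGER REFERENCES Students(StudentID),  -- referential integrity
    CourseID  INTEGER REFERENCES Courses(CourseID)
);
-- Trigger standing in for the assertion "no student enrolls in more than 5 courses"
CREATE TRIGGER max_five_courses
BEFORE INSERT ON Enrollment
WHEN (SELECT COUNT(*) FROM Enrollment WHERE StudentID = NEW.StudentID) >= 5
BEGIN
    SELECT RAISE(ABORT, 'a student may enroll in at most 5 courses');
END;
INSERT INTO Students VALUES (1, 'Riya', 20);
""")
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?)",
    [(100 + i, f"Course {i}") for i in range(1, 7)],  # courses 101..106 (assumed)
)


def rejected(sql):
    """Return True if the database refuses to run the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.Error:
        return True


bad_age = rejected("INSERT INTO Students VALUES (2, 'Arjun', 150)")  # outside 0-120
bad_course = rejected("INSERT INTO Enrollment VALUES (1, 999)")      # no such course

conn.executemany(
    "INSERT INTO Enrollment VALUES (1, ?)", [(100 + i,) for i in range(1, 6)]
)
sixth_course = rejected("INSERT INTO Enrollment VALUES (1, 106)")    # would be a sixth
```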

Functional Dependencies

 Relationship between columns; knowing one value gives another.

 Example: StudentID determines Name.
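A functional dependency can be checked against sample data with a few lines of Python. This is a rough sketch: the function name holds and the sample rows are assumptions, and passing the check on sample data does not prove the dependency holds in general.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same left-hand value must always come with the same right-hand value.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"StudentID": 1, "Name": "Riya", "CourseID": 101},
    {"StudentID": 1, "Name": "Riya", "CourseID": 102},
    {"StudentID": 2, "Name": "Arjun", "CourseID": 101},
]

id_gives_name = holds(rows, ["StudentID"], ["Name"])        # StudentID -> Name
id_gives_course = holds(rows, ["StudentID"], ["CourseID"])  # StudentID -> CourseID
```

Here StudentID determines Name, but it does not determine CourseID, because student 1 appears with two different courses.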

Why Are Constraints Important?


 Data Accuracy: Prevents wrong data.

 Consistency: Keeps data reliable.

 Protection: Blocks changes that would break the rules.

Conclusion

Integrity constraints are the backbone of a reliable database. They help maintain trust in your data by preventing
errors and enforcing rules.

6. Normalisation (Using FDs, Multivalued Dependencies, Join Dependencies), Domain-Key Normal Form

Introduction

Normalization is the process of organizing tables to reduce data repetition and improve data integrity. It uses
rules called normal forms.

Steps of Normalisation

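1. First Normal Form (1NF)

 Each field has a single value (no repeating groups).

2. Second Normal Form (2NF)

 The table is in 1NF and has no partial dependency: every non-key column depends on the whole primary key,
not just part of it.

3. Third Normal Form (3NF)

 The table is in 2NF and has no transitive dependency: non-key columns do not depend on other non-key
columns.

4. Boyce-Codd Normal Form (BCNF)

 Stricter version of 3NF: the left-hand side of every non-trivial functional dependency is a superkey.

5. Domain-Key Normal Form (DKNF)

 Every constraint on the table follows from its domain constraints and key constraints.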

Functional Dependencies (FDs)

 Used to find relationships between columns for normalization.

 Example: StudentID → Name

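Multivalued Dependencies

 When one attribute determines a set of values for another, independently of the remaining attributes.
Removing such dependencies leads to Fourth Normal Form (4NF).

Join Dependencies

 When a table can be split into smaller tables and joined back without losing information. Fifth Normal
Form (5NF) is based on join dependencies.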

Example: Normalisation

Suppose you have a table:

Student   Courses
-------   ---------
Riya      Math, Eng

After normalization:

Student   Course
-------   ------
Riya      Math
Riya      Eng
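The step shown in the example (moving to First Normal Form) can be sketched in Python. The function name to_1nf is hypothetical, and the sketch assumes the course names are separated by commas.

```python
def to_1nf(rows):
    """Split the multi-valued Courses field so that each row holds one course."""
    result = []
    for row in rows:
        for course in row["Courses"].split(","):
            result.append({"Student": row["Student"], "Course": course.strip()})
    return result


unnormalized = [{"Student": "Riya", "Courses": "Math, Eng"}]
normalized = to_1nf(unnormalized)
```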
Conclusion

Normalization makes your database efficient, easy to manage, and free from unnecessary duplication.

7. Transaction Processing: Concept, State, ACID Properties, Serializability, Recoverability,
Testing for Serializability

Introduction

A transaction is a sequence of database operations that must be completed as a single unit. Transaction
processing ensures data remains accurate and reliable even when many users access the database at once.

Transaction Concept

 Transaction: A group of operations (like transferring money) that should be completed together.

 Example: Transferring ₹100 from account A to B involves deducting from A and adding to B.

Transaction States

 Active: Transaction is being executed.

 Partially Committed: The last operation has been executed, but the changes are not yet permanent.

 Committed: All operations successful.

 Failed: Something went wrong.

 Aborted: Transaction is rolled back.

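ACID Properties

 Atomicity: All of a transaction's operations take effect, or none of them do.

 Consistency: Each transaction takes the database from one valid state to another.

 Isolation: Concurrent transactions do not see each other's unfinished changes.

 Durability: Once committed, changes survive later failures.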
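Atomicity can be demonstrated with the money transfer example, using Python's sqlite3 module. The Account table, the starting balances, and the helper names transfer and balances are assumptions made for the illustration. The credit is done before the debit on purpose, so that a failed debit shows the earlier credit being undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Name TEXT PRIMARY KEY, Balance INTEGER CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one unit; return True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT Name, Balance FROM Account"))


ok = transfer(conn, "A", "B", 100)       # succeeds
failed = transfer(conn, "A", "B", 1000)  # debit would make A negative, so it is undone
final = balances(conn)
```

After the failed transfer, B keeps none of the 1000 that was briefly credited: both updates are rolled back together.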

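Serializability

 A schedule of concurrent transactions is serializable if it produces the same result as running those
transactions one after another in some order.

Recoverability

 If a transaction fails, the system can return to a correct state. For this to work, a transaction should
commit only after every transaction whose data it has read has committed.

Testing for Serializability

 Build a precedence graph: one node per transaction, with an edge from Ti to Tj when an operation of Ti
conflicts with a later operation of Tj. The schedule is conflict-serializable if and only if the graph has no
cycle.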
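The precedence graph test can be sketched in Python. The schedule format, a list of (transaction, operation, item) tuples with "R" for read and "W" for write, and all function names are assumptions chosen for this example. Two operations conflict when they come from different transactions, touch the same item, and at least one of them is a write.

```python
def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj): Ti has an operation that conflicts
    with a later operation of Tj."""
    edges = set()
    for i, (t1, op1, item1) in enumerate(schedule):
        for t2, op2, item2 in schedule[i + 1:]:
            if t1 != t2 and item1 == item2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        if any(visit(nxt) for nxt in graph[node]):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


# T1 finishes its work on A before T2 touches it: only the edge T1 -> T2.
s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
# T2 writes A between T1's read and write: edges T1 -> T2 and T2 -> T1.
s2 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]

s1_ok = is_conflict_serializable(s1)
s2_ok = is_conflict_serializable(s2)
```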

Conclusion

Transaction processing ensures that databases remain correct, even with many users and operations happening at
the same time.

8. Concurrency Control: Lock-Based, Timestamp-Based, Validation-Based Protocols, Multi-Version Schemes,
Deadlock Handling

Introduction

Concurrency control manages how multiple users access the database at the same time, preventing errors and
conflicts.

Lock-Based Protocols

 Use locks to control access to data.


 Shared Lock: Many can read.

 Exclusive Lock: Only one can write.

Timestamp-Based Protocols

 Each transaction gets a timestamp.

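 Conflicting operations are executed in timestamp order; a transaction that arrives too late at a data item is
rolled back and restarted.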

Validation-Based Protocols

 Transactions execute without restrictions, but are checked before committing.

Multi-Version Schemes

 Keep multiple versions of data for different transactions.

Deadlock Handling

 Deadlock: Two or more transactions wait for each other forever.

 Prevention: Avoid deadlocks by careful resource allocation.

 Detection: Find and break deadlocks when they happen.

 Recovery: Roll back one transaction to solve deadlock.
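Deadlock detection is often described in terms of a wait-for graph, where a cycle means a deadlock. The sketch below makes a simplifying assumption: each transaction waits for at most one other transaction, so the graph fits in a dictionary. The function name find_deadlock is hypothetical.

```python
def find_deadlock(waits_for):
    """waits_for maps a transaction to the transaction it is waiting on.
    Return the transactions that form a cycle, or None if there is no deadlock."""
    for start in waits_for:
        path = []
        node = start
        # Follow the chain of waits until it ends or comes back on itself.
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            return path[path.index(node):]
    return None


deadlocked = find_deadlock({"T1": "T2", "T2": "T1"})  # T1 and T2 wait on each other
no_deadlock = find_deadlock({"T1": "T2", "T2": "T3"})  # T3 is not waiting on anyone
```

Once a cycle is found, recovery picks one transaction from it as the victim and rolls it back.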

Conclusion

Concurrency control ensures that many users can safely use the database at the same time without causing
problems.

9. Recovery System: Log-Based Recovery, Checkpoints, Shadow Paging, Buffer Management, Recovery from
Loss, Logging, Rollback, Restart Recovery, Fuzzy Checkpointing

Introduction

A recovery system restores the database to a correct state after a failure (like a crash or power cut).

Log-Based Recovery

 Deferred: Changes are saved only after transaction commits.

 Immediate: Changes are saved as soon as they happen, but can be undone.

Checkpoints

 Save the state of the database at certain points for easier recovery.

Shadow Paging

 Use copies of data pages to keep the original safe until changes are committed.

Buffer Management

 Controls how data is moved between memory and disk.

Recovery from Loss of Non-Volatile Storage

 Use backups and logs to restore data after hardware failure.


Logging

 Keep a record of changes so they can be undone or redone if needed.

Transaction Rollback

 Undo all changes made by a failed transaction.
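Logging and rollback work together: if the old value is recorded before each change, a failed transaction can be undone by replaying the log backwards. The TinyDB class below is a made-up, in-memory illustration of that idea, not how a real DBMS stores its log.

```python
class TinyDB:
    """Key-value store that keeps an undo log of old values."""

    def __init__(self, data):
        self.data = dict(data)
        self.log = []  # (transaction, key, old value) records

    def write(self, txn, key, value):
        # Record the old value BEFORE changing the data.
        # Simplification: None stands for "key did not exist".
        self.log.append((txn, key, self.data.get(key)))
        self.data[key] = value

    def rollback(self, txn):
        # Undo the transaction's changes, most recent first.
        for logged_txn, key, old in reversed(self.log):
            if logged_txn == txn:
                if old is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old
        self.log = [record for record in self.log if record[0] != txn]


db = TinyDB({"A": 500, "B": 200})
db.write("T1", "A", 400)
db.write("T1", "B", 300)
db.rollback("T1")  # T1 failed, so both changes are undone
```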

Restart Recovery

 Restore the database to a consistent state after a crash.

Fuzzy Checkpointing

 Take checkpoints without stopping the system, allowing transactions to continue.

Conclusion

A good recovery system protects your data from crashes and failures, so that committed information is not
lost.

1. QUERY PROCESSING
Introduction

Query processing is the series of steps a database management system (DBMS) follows to take a user’s query
(usually written in SQL), understand it, find the best way to answer it, and then actually get the results from the
database. This process is like translating a question in English into a step-by-step plan that the computer can
follow, making sure the answer is found quickly and efficiently.

Steps in Query Processing


1. Parsing and Translation

 Parsing is the first step. The DBMS breaks the SQL query into pieces called tokens (like SELECT,
FROM, WHERE) and checks if the query follows the correct syntax (grammar rules).

 Translation means converting the SQL query into an internal form, often using relational algebra (a
kind of mathematical language the DBMS understands).

 The DBMS also checks if the tables and columns mentioned actually exist and if the operations make
sense (semantic analysis).

Example:

SELECT emp_name FROM employee WHERE salary > 10000;

 The DBMS checks if the employee table and emp_name and salary columns exist.

2. Query Optimization

 There are often many ways to answer a query. The DBMS generates different possible “plans” for how to
get the answer.

 Query optimization is the process of choosing the best (fastest and least costly) plan.

 The optimizer considers things like:

o Which indexes exist?

o How many rows are in each table?

o Which order should tables be joined?

o How much CPU and disk access will each plan need?

3. Query Evaluation

 The chosen plan is executed.

 The DBMS uses physical operators (like scanning a table, using an index, joining tables) to actually
fetch the data.

 The results are returned to the user.

Compile-Time vs. Run-Time


 Compile-time: Parsing, translation, and optimization happen before the query is run.

 Run-time: The actual execution of the plan and returning of results.

Example Flow

1. User Query:
SELECT name FROM students WHERE marks > 80;

2. Parsing:
DBMS checks syntax and existence of table/column.

3. Translation:
SQL is converted to relational algebra, e.g.,
π(name)(σ(marks > 80)(students))

4. Optimization:
DBMS considers using an index on marks, or scanning the table.

5. Evaluation:
The best plan is run, and results are shown.

Query Trees and Execution Plans

 The DBMS represents the query as a query tree (a diagram showing the order of operations).

 Each node in the tree is an operation (like SELECT or JOIN).

 The optimizer may rearrange the tree for better performance.

Diagram:

PROJECT (name)
|
SELECT (marks > 80)
|
students

Importance of Query Processing

 Makes sure queries run fast, even with huge databases.


 Ensures correct results.

 Allows users to write simple SQL, while the DBMS handles the complex details.

Conclusion

Query processing is a key part of any DBMS. It turns user questions into efficient actions, making sure the
database is both powerful and easy to use.

2. STORAGE AND FILE STRUCTURE


Introduction

A database is stored on disk as files. The way data is stored and organized on disk affects how fast and
efficiently it can be accessed. Understanding storage and file structure is important for designing and using
databases.

Storage Types

1. Primary Storage

 Fast memory (RAM) used for current operations.

 Volatile (data is lost when power is off).

2. Secondary Storage

 Hard disks, SSDs, magnetic tapes.

 Non-volatile (data stays even when power is off).

 Used for storing the actual database files.

File Structure

 Database files are divided into blocks (fixed-size chunks, e.g., 4KB each).
 Each block can store several records (rows from a table).

 The way records are arranged in blocks affects how quickly they can be read or written.

Types of File Organization

1. Heap (Unordered) Files:

o Records are placed wherever there is space.

o Fast for inserting new records.

o Slow for searching.

2. Sequential Files:

o Records are stored in order (e.g., sorted by StudentID).

o Fast for reading in order, slow for random inserts.

3. Hashed Files:

o Records are placed based on a hash function (e.g., hash(StudentID)).

o Fast for searching by key, but not for range queries.

Blocking Factor

 The blocking factor is the number of records that fit in one block.

 Example: If a block is 4KB (4096 bytes) and each record is 100 bytes, then blocking factor =
floor(4096/100) = 40 records per block.
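The blocking factor calculation can be written as a small Python helper. The function names and the 10,000-record file are assumptions for the example; records are assumed to be fixed-size and not split across blocks.

```python
import math


def blocking_factor(block_size, record_size):
    """Whole records that fit in one block (the remainder is wasted space)."""
    return block_size // record_size


def blocks_needed(num_records, block_size, record_size):
    """Blocks required to store num_records records."""
    return math.ceil(num_records / blocking_factor(block_size, record_size))


bf = blocking_factor(4096, 100)           # 40 records per block, 96 bytes unused
blocks = blocks_needed(10_000, 4096, 100)  # 10,000 records / 40 per block
```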

Access Paths

 Access path refers to the method used to find data in files.

 Indexing and hashing are common access paths.


Indexing

 An index is like a book’s table of contents.

 It helps the DBMS find data quickly without scanning the whole file.

 Types: Single-level, multi-level, clustered, non-clustered.

Hashing

 Uses a hash function to decide where to store or find a record.

 Very fast for equality searches (e.g., find student with ID=123).

External Sorting

 Sorting data that does not fit in memory.

 Uses techniques like merge sort with temporary files on disk.

Conclusion

Understanding storage and file structure helps in designing databases that are fast, reliable, and efficient.

3. FILE ORGANIZATION: DISK STORAGE SYSTEM, BLOCKING FACTOR, ACCESS PATH: SEARCHING, INDEXING
AND HASHING TECHNIQUES, EXTERNAL SORTING

Disk Storage System

 Data is stored on disks in blocks.

 Each block has a fixed size (e.g., 4KB).

 Disks have tracks and sectors, and data is read/written in blocks.


Blocking Factor (Recap)

 Number of records per block.

 Higher blocking factor = fewer disk reads = faster access.

Access Path: Searching

 Linear Search: Reads every record. Slow for large files.

 Binary Search: Fast, but only works on sorted files.
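The two search methods can be compared in Python. This sketch searches an in-memory list of keys, which is an assumption made to keep it short; on disk the same idea applies to blocks of a sorted file. The binary search uses the standard-library bisect module.

```python
from bisect import bisect_left


def linear_search(keys, target):
    """Check every key in turn; works on any file, sorted or not."""
    for position, key in enumerate(keys):
        if key == target:
            return position
    return -1


def binary_search(sorted_keys, target):
    """Halve the search range at each step; requires sorted keys."""
    position = bisect_left(sorted_keys, target)
    if position < len(sorted_keys) and sorted_keys[position] == target:
        return position
    return -1


keys = list(range(0, 2000, 2))  # 1000 sorted even keys (sample data)
found_linear = linear_search(keys, 1234)
found_binary = binary_search(keys, 1234)
missing = binary_search(keys, 1235)
```

For 1000 keys, the linear search may examine all 1000, while the binary search needs about 10 steps.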

Indexing Techniques

 Single-Level Index: One index file points to data blocks.

 Multi-Level Index: Index on the index file for faster access.

 Clustered Index: Data is stored in the same order as the index.

 Non-Clustered Index: Index order is different from data order.

Diagram:

[Index File] --> [Data Block]

Hashing Techniques

 Static Hashing: Fixed number of buckets.

 Dynamic Hashing: Buckets can grow or shrink as data changes.
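Static hashing can be sketched as a fixed list of buckets. The class name StaticHashFile, the choice of key % number_of_buckets as the hash function, and the sample records are assumptions for the illustration; integer keys are assumed.

```python
class StaticHashFile:
    """Fixed number of buckets; each bucket holds (key, record) pairs."""

    def __init__(self, num_buckets=4):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Hash function: remainder of the integer key by the bucket count.
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, record):
        self._bucket(key).append((key, record))

    def find(self, key):
        # Only one bucket is scanned, not the whole file.
        for stored_key, record in self._bucket(key):
            if stored_key == key:
                return record
        return None


students = StaticHashFile(num_buckets=4)
students.insert(123, "Riya")
students.insert(127, "Arjun")  # 123 and 127 both hash to bucket 3
riya = students.find(123)
arjun = students.find(127)
nobody = students.find(5)
```

Because the number of buckets never changes, buckets grow longer as the file grows, which is the problem dynamic hashing addresses.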

External Sorting

 Used when data is too big for memory.


 Merge Sort: Splits data into sorted runs, then merges them.
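External merge sort can be sketched with temporary files standing in for disk runs. For brevity the input is passed in as a Python list, which is an assumption; in a real system it would be read from disk one chunk at a time. The function names and run_size parameter are hypothetical.

```python
import heapq
import tempfile


def read_run(run):
    """Yield the integers stored one per line in a run file."""
    for line in run:
        yield int(line)


def external_sort(values, run_size):
    """Sort values using sorted runs of at most run_size items kept in temp files."""
    runs = []
    # Phase 1: sort each chunk that "fits in memory" and write it out as a run.
    for start in range(0, len(values), run_size):
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{v}\n" for v in sorted(values[start:start + run_size]))
        run.seek(0)
        runs.append(run)
    # Phase 2: merge the runs, reading only one item from each run at a time.
    merged = list(heapq.merge(*(read_run(run) for run in runs)))
    for run in runs:
        run.close()
    return merged


result = external_sort([9, 3, 7, 1, 8, 2, 6], run_size=3)
```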

Conclusion

Choosing the right file organization and access path can make a huge difference in database performance.

4. TRANSFORMATION OF RELATIONAL EXPRESSIONS, BREAKING OF QUERIES INTO SUBQUERIES TO OPTIMISE
EXECUTION PLAN, SELECT, PROJECT AND JOIN OPERATIONS, SET OPERATIONS, AGGREGATION, COST-BASED
QUERY OPTIMISATION, MEASUREMENT OF COST OF A QUERY, EVALUATION OF EXPRESSIONS

Transformation of Relational Expressions

 SQL queries are translated into relational algebra expressions.

 These can be rearranged and simplified for better performance.

Breaking Queries into Subqueries

 Complex queries can be split into smaller subqueries.

 Each subquery can be optimized separately.

 Helps the DBMS find the best execution plan.

Select, Project, and Join Operations

 Select (σ): Picks rows that meet a condition.


 Project (π): Picks specific columns.

 Join (⨝): Combines rows from two tables based on a condition.

Example:

SELECT name FROM students WHERE marks > 80;

 Selects rows where marks > 80, then projects the name column.

Set Operations

 Union: Combines results from two queries (removes duplicates).

 Intersection: Finds common rows.

 Difference: Finds rows in one result but not the other.

Aggregation

 SUM, AVG, MIN, MAX, COUNT are aggregation functions.

 Used to calculate totals, averages, etc.

Cost-Based Query Optimisation

 The optimizer estimates the "cost" (time, CPU, disk access) of each possible plan.

 Chooses the plan with the lowest cost.

Measurement of Cost

 Factors:

o Number of disk reads/writes

o CPU time

o Number of rows processed


 The DBMS uses statistics about the data to estimate costs.

Evaluation of Expressions

 The DBMS chooses the best order and method to evaluate each part of the query.

 Uses indexes, sorts, and joins as needed.

Conclusion

Transforming and optimizing queries ensures that the database answers questions quickly, even with large
amounts of data.

5. HEURISTIC QUERY OPTIMISATIONS: QUERY TREE, QUERY GRAPH, REPRESENTATIONS OF QUERIES IN QUERY
TREE, STEPS FOR HEURISTIC QUERY OPTIMISATION, SEMANTIC QUERY OPTIMISATION

Heuristic Query Optimisation

 Uses rules of thumb (heuristics) to rearrange queries for better performance.

 Faster than cost-based optimization, but may not always find the absolute best plan.

Query Tree

 A tree diagram showing the sequence of operations for a query.

 Each node is an operation (like SELECT or JOIN).

Example:
JOIN
/ \
SELECT SELECT
| |
Table1 Table2

Query Graph

 Shows tables as nodes and joins as edges.

 Useful for visualizing complex queries.

Representations of Queries

 Query Tree: Shows the order of operations.

 Query Graph: Shows relationships among tables.

Steps for Heuristic Query Optimisation

1. Perform Selection Early: Filter rows as soon as possible.

2. Perform Projection Early: Remove unnecessary columns quickly.

3. Combine Selections and Projections: Simplify the tree.

4. Reorder Joins: Perform the most restrictive joins first, so intermediate results stay small.

5. Eliminate Redundant Operations: Remove unnecessary steps.
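The first heuristic (perform selection early) can be illustrated by running the same query in two ways and comparing the work done. The data, the simple nested-loop join, and all names below are assumptions for the sketch.

```python
def nested_loop_join(left, right, key):
    """Compare every left row with every right row."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# 50 students with marks 51..100, each enrolled in one course (sample data).
students = [{"StudentID": i, "Marks": 50 + i} for i in range(1, 51)]
enrollments = [{"StudentID": i, "CourseID": 101} for i in range(1, 51)]

# Plan A: join everything first, then filter.
joined = nested_loop_join(students, enrollments, "StudentID")
plan_a = [row for row in joined if row["Marks"] > 80]
comparisons_a = len(students) * len(enrollments)

# Plan B: filter first (selection pushed down), then join the smaller input.
filtered = [s for s in students if s["Marks"] > 80]
plan_b = nested_loop_join(filtered, enrollments, "StudentID")
comparisons_b = len(filtered) * len(enrollments)
```

Both plans return the same rows, but Plan B compares 1,000 row pairs instead of 2,500 because only 20 students pass the filter.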

Semantic Query Optimisation

 Uses knowledge about the data (like constraints) to simplify queries.

 Example: If you know all students have marks > 0, you can remove "WHERE marks > 0" from the query.
Conclusion

Heuristic and semantic optimizations help the DBMS answer queries faster by simplifying and rearranging
operations based on rules and data knowledge.

1. QUERY PROCESSING
Introduction

Query processing is the backbone of every Database Management System (DBMS). When a user writes a query
(usually in SQL), the DBMS must interpret it, figure out the best way to execute it, and then return the result.
This process is called query processing. Efficient query processing is crucial for fast and correct data retrieval,
especially when databases are large and queries are complex.

Steps in Query Processing

1. Parsing and Translation

 Parsing: The DBMS checks the query’s syntax (grammar) and breaks it into tokens (keywords, table
names, column names, etc.).

 Semantic Analysis: It checks if the tables and columns exist and if the operations make sense.

 Translation: The query is converted into an internal form, usually relational algebra (a mathematical
way to describe database operations).

Example:

SELECT name FROM students WHERE marks > 80;

 The DBMS checks if the students table and name and marks columns exist.

 It translates the query into relational algebra:


π(name)(σ(marks > 80)(students))

2. Query Optimization

 There are many ways to answer a query. The DBMS creates different “plans” for how to get the answer.
 Query optimization is the process of finding the fastest, least costly plan.

 The optimizer considers:

o Indexes available

o Table sizes

o Order of joins

o Disk and CPU cost

Example:
If you join tables A, B, and C, the optimizer tries different join orders and chooses the best.

3. Query Evaluation

 The chosen plan is executed.

 The DBMS uses physical operators (like table scans, index scans, joins) to fetch data.

 The result is returned to the user.

Query Trees and Execution Plans

 Internally, queries are represented as query trees.

 Each node is an operation (SELECT, JOIN, etc.).

 The optimizer may rearrange the tree for better performance.

Diagram:

PROJECT (name)
|
SELECT (marks > 80)
|
STUDENTS

Importance of Query Processing


 Ensures fast response times

 Reduces resource usage

 Supports many users at once

 Scales with large data

Techniques Used

 Index usage: To quickly find rows.

 Join algorithms: Nested loop, hash join, sort-merge join.

 Pushing selections/projections early: To reduce data size quickly.

Example

Suppose you have:

Students:

StudentID   Name    Marks
----------  ------  -----
1           Riya    85
2           Arjun   75

Courses:

CourseID    CourseName    StudentID
----------  ------------  -----------
101         Math          1
102         Science       2

Query:

SELECT s.Name, c.CourseName
FROM Students s, Courses c
WHERE s.StudentID = c.StudentID AND s.Marks > 80;

 Parse, check syntax, and translate.

 Optimize: Use join on StudentID, apply marks filter early.


 Evaluate: Use hash join or index join.

 Return: Riya, Math
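The example can be tried end to end with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for "the DBMS"; the column types are assumptions, since the notes do not give a schema.

```python
import sqlite3

# In-memory database holding the two example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT, Marks INTEGER);
    CREATE TABLE Courses  (CourseID INTEGER PRIMARY KEY, CourseName TEXT, StudentID INTEGER);
    INSERT INTO Students VALUES (1, 'Riya', 85), (2, 'Arjun', 75);
    INSERT INTO Courses  VALUES (101, 'Math', 1), (102, 'Science', 2);
""")

query = """
    SELECT s.Name, c.CourseName
    FROM Students s, Courses c
    WHERE s.StudentID = c.StudentID AND s.Marks > 80
"""
rows = conn.execute(query).fetchall()
print(rows)

# Ask SQLite which plan it chose; the wording of the plan differs between versions.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```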

Summary Table

Step          Description
------------  ---------------------------------
Parsing       Check syntax and semantics
Translation   Convert SQL to relational algebra
Optimization  Find best execution plan
Evaluation    Execute plan and return result

Conclusion

Query processing turns user questions into efficient actions, ensuring that databases remain powerful and easy to
use, even as they grow in size and complexity.

2. STORAGE AND FILE STRUCTURE


Introduction

A database is stored on disk as files. The way data is stored and organized on disk affects how quickly and
efficiently it can be accessed. Understanding storage and file structure is crucial for designing high-performing
databases.

Types of Storage

1. Primary Storage

 RAM (main memory)

 Fast, but volatile (data lost when power is off)


 Used for processing queries and caching data

2. Secondary Storage

 Hard disks, SSDs, tapes

 Non-volatile (data persists)

 Used for storing database files

File Structure

 Database files are divided into blocks (fixed-size chunks, e.g., 4KB).

 Each block stores several records (rows).

 The arrangement of records in blocks affects read/write speed.

File Organization Methods

1. Heap (Unordered) Files

 Records are stored wherever there is space.

 Fast for inserts, slow for searches.

2. Sequential Files

 Records are stored in order (e.g., by StudentID).

 Fast for reading in order, slow for random inserts.

3. Hashed Files

 Records are placed based on a hash function.

 Fast for searches by key, not for range queries.

Blocking Factor
 Number of records per block.

 Example: If a block is 4 KB (4096 bytes) and each record is 100 bytes, blocking factor = ⌊4096/100⌋ = 40 records
per block, leaving 96 bytes of each block unused.
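The arithmetic can be checked with a short Python sketch. It assumes fixed-length records that are not split across blocks; the function names and the file size of 10,000 records are made up for illustration.

```python
import math

def blocking_factor(block_size, record_size):
    """Whole records that fit in one block (records are not split across blocks)."""
    return block_size // record_size

def blocks_needed(num_records, block_size, record_size):
    """Blocks required to hold the file at that blocking factor."""
    return math.ceil(num_records / blocking_factor(block_size, record_size))

bfr = blocking_factor(4096, 100)
unused_bytes = 4096 - bfr * 100
file_blocks = blocks_needed(10_000, 4096, 100)   # hypothetical file of 10,000 records
print(bfr, unused_bytes, file_blocks)
```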

Access Paths

 Methods used to find data in files.

 Indexing and hashing are common access paths.

Indexing

 An index works like the index at the back of a book: it maps a search key to the location of the matching records.

 It helps the DBMS find data quickly without scanning the whole file.

 Types: Single-level, multi-level, clustered, non-clustered.

Hashing

 Uses a hash function to decide where to store or find a record.

 Very fast for equality searches (e.g., find student with ID=123).
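A minimal sketch of a hashed file, assuming a fixed number of buckets and a simple modulo hash function. The bucket count, the function names, and the sample records are illustrative assumptions only.

```python
NUM_BUCKETS = 4   # fixed bucket count chosen for this sketch

def bucket_of(key):
    """Hash function: map a key to a bucket number."""
    return key % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(record):
    """Store the record in the bucket chosen by the hash function."""
    buckets[bucket_of(record["StudentID"])].append(record)

def find(student_id):
    """Equality search: only one bucket is examined."""
    for record in buckets[bucket_of(student_id)]:
        if record["StudentID"] == student_id:
            return record
    return None

for sid, name in [(123, "Riya"), (124, "Arjun"), (127, "Meera")]:
    insert({"StudentID": sid, "Name": name})

print(find(123))
```

Keys that are close in value can land in different buckets, which is why a hashed file does not help with range queries.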

External Sorting

 Sorting data that does not fit in memory.

 Uses techniques like merge sort with temporary files on disk.

Example Diagram
+--------+--------+--------+
| Block1 | Block2 | Block3 |
+--------+--------+--------+
| rec1 | rec5 | rec9 |
| rec2 | rec6 | rec10 |
| rec3 | rec7 | rec11 |
| rec4 | rec8 | rec12 |
+--------+--------+--------+

Conclusion

Efficient storage and file structure design is essential for fast, reliable, and scalable databases.

3. FILE ORGANIZATION: DISK STORAGE
SYSTEM, BLOCKING FACTOR, ACCESS PATH:
SEARCHING, INDEXING AND HASHING
TECHNIQUES, EXTERNAL SORTING
Disk Storage System

 Data is stored on disks in blocks.

 Each block has a fixed size (e.g., 4KB).

 Disks have tracks and sectors, and data is read/written in blocks.

Blocking Factor (Recap)

 Number of records per block.

 A higher blocking factor means the same file fits in fewer blocks, so scanning it needs fewer disk reads.

Access Path: Searching

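 Linear Search: Scans the file block by block until the record is found. On average half the blocks are read, so it is slow for large files.

 Binary Search: Needs only about log2(b) block reads for a file of b blocks, but works only on files sorted by the search key.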
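The difference can be estimated by counting block reads. The sketch below uses the usual textbook approximations; the function names and the file size of 250 blocks are assumptions made for illustration.

```python
def linear_search_blocks(num_blocks):
    """Average block reads to find one existing record by scanning from the start."""
    return num_blocks / 2

def binary_search_blocks(num_blocks):
    """Worst-case block reads on a sorted file: floor(log2(b)) + 1."""
    return num_blocks.bit_length()

b = 250   # hypothetical file size in blocks
print(linear_search_blocks(b), binary_search_blocks(b))
```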


Indexing Techniques

 Single-Level Index: One index file points to data blocks.

 Multi-Level Index: Index on the index file for faster access.

 Clustered Index: Data is stored in the same order as the index.

 Non-Clustered Index: Index order is different from data order.

Diagram:

[Index File] --> [Data Block]
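A single-level index can be sketched as a sorted list with one entry per data block. This is a simplified illustration, assuming a sparse index over a file sorted by key; the block contents and the lookup function are hypothetical.

```python
from bisect import bisect_right

# Data file sorted by key, split into blocks (keys only, for brevity).
data_blocks = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# Sparse single-level index: one entry per block, holding that block's first key.
index_keys = [block[0] for block in data_blocks]

def lookup(key):
    """Return (block_number, found) by searching the index, then one block."""
    pos = bisect_right(index_keys, key) - 1   # last index entry <= key
    if pos < 0:
        return None, False                    # key is smaller than every indexed key
    return pos, key in data_blocks[pos]

print(lookup(7))
```

Only the small index and a single data block are examined, instead of the whole file.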

Hashing Techniques

 Static Hashing: Fixed number of buckets.

 Dynamic Hashing: Buckets can grow or shrink as data changes.

External Sorting

 Used when data is too big for memory.

 Merge Sort: Splits data into sorted runs, then merges them.
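The two passes of external merge sort can be simulated in a few lines of Python. In this sketch the sorted runs are ordinary lists standing in for temporary files on disk, and memory_limit is a made-up parameter for how many records fit in memory at once.

```python
import heapq

def external_sort(records, memory_limit):
    """Simulate external merge sort with runs of at most memory_limit records."""
    # Pass 1: cut the input into chunks that fit in memory and sort each one.
    runs = []
    for start in range(0, len(records), memory_limit):
        runs.append(sorted(records[start:start + memory_limit]))
    # Pass 2: k-way merge; heapq.merge pulls from the runs one item at a time.
    return list(heapq.merge(*runs)), runs

data = [42, 7, 19, 3, 88, 1, 56, 23, 11]
sorted_records, runs = external_sort(data, memory_limit=3)
print(runs)
print(sorted_records)
```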

Conclusion

Choosing the right file organization and access path can make a huge difference in database performance.

4. TRANSFORMATION OF RELATIONAL
EXPRESSIONS, BREAKING OF QUERIES INTO
SUBQUERIES TO OPTIMISE EXECUTION PLAN,
SELECT, PROJECT AND JOIN OPERATIONS, SET
OPERATIONS, AGGREGATION, COST-BASED
QUERY OPTIMISATION, MEASUREMENT OF
COST OF A QUERY, EVALUATION OF
EXPRESSIONS
Transformation of Relational Expressions

 SQL queries are translated into relational algebra expressions.

 These can be rearranged and simplified for better performance.

Breaking Queries into Subqueries

 Complex queries can be split into smaller subqueries.

 Each subquery can be optimized separately.

 Helps the DBMS find the best execution plan.

Select, Project, and Join Operations

 Select (σ): Picks rows that meet a condition.

 Project (π): Picks specific columns.

 Join (⨝): Combines rows from two tables based on a condition.

Example:

SELECT name FROM students WHERE marks > 80;

 Selects rows where marks > 80, then projects the name column.

Set Operations
 Union: Combines results from two queries (removes duplicates).

 Intersection: Finds common rows.

 Difference: Finds rows in one result but not the other.
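Python's built-in sets behave the same way and make a quick illustration. The two sets of student names below are invented for this sketch; in a database both inputs must have the same columns for these operations to apply.

```python
# Hypothetical results of two queries, each returning student names.
math_students = {"Riya", "Arjun"}
science_students = {"Arjun", "Meera"}

union = math_students | science_students          # in either result, no duplicates
intersection = math_students & science_students   # in both results
difference = math_students - science_students     # in the first but not the second

print(union, intersection, difference)
```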

Aggregation

 SUM, AVG, MIN, MAX, COUNT are aggregation functions.

 Used to calculate totals, averages, etc.
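A small runnable illustration using Python's built-in sqlite3 module as a stand-in DBMS. The table layout and the two sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [("Riya", 85), ("Arjun", 75)])

# All five aggregation functions applied to the marks column in one query.
total, average, lowest, highest, count = conn.execute(
    "SELECT SUM(marks), AVG(marks), MIN(marks), MAX(marks), COUNT(*) FROM students"
).fetchone()

print(total, average, lowest, highest, count)
```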

Cost-Based Query Optimisation

 The optimizer estimates the "cost" (time, CPU, disk access) of each possible plan.

 Chooses the plan with the lowest cost.

Measurement of Cost

 Factors:

o Number of disk reads/writes

o CPU time

o Number of rows processed

 The DBMS uses statistics about the data to estimate costs.
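A simplified sketch of how statistics might feed a cost estimate. The cost formulas are common textbook approximations counted in block reads, not the formulas of any particular DBMS, and every name and number here is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    num_rows: int          # rows in the table
    num_blocks: int        # blocks occupied by the table
    distinct_values: int   # distinct values in the filtered column
    index_height: int      # levels in the index on that column

def estimated_matches(stats):
    """Rows expected to match `column = value`, assuming uniform distribution."""
    return stats.num_rows / stats.distinct_values

def full_scan_cost(stats):
    """Block reads for scanning the whole file."""
    return stats.num_blocks

def index_scan_cost(stats):
    """Block reads via a non-clustered index: walk the index, then fetch each match."""
    return stats.index_height + estimated_matches(stats)

def cheapest_plan(stats):
    """Return the name of the lower-cost plan together with both estimates."""
    costs = {"full scan": full_scan_cost(stats), "index scan": index_scan_cost(stats)}
    return min(costs, key=costs.get), costs

selective = TableStats(num_rows=10_000, num_blocks=250, distinct_values=10_000, index_height=3)
unselective = TableStats(num_rows=10_000, num_blocks=250, distinct_values=2, index_height=3)

print(cheapest_plan(selective))
print(cheapest_plan(unselective))
```

With these numbers the index wins when the condition matches about one row, and the full scan wins when it matches half the table.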

Evaluation of Expressions

 The DBMS chooses the best order and method to evaluate each part of the query.

 Uses indexes, sorts, and joins as needed.

Conclusion
Transforming and optimizing queries ensures that the database answers questions quickly, even with large
amounts of data.

5. HEURISTIC QUERY OPTIMISATIONS: QUERY
TREE, QUERY GRAPH, REPRESENTATIONS OF
QUERIES IN QUERY TREE, STEPS FOR
HEURISTIC QUERY OPTIMISATION, SEMANTIC
QUERY OPTIMISATION
Heuristic Query Optimisation

 Uses rules of thumb (heuristics) to rearrange queries for better performance.

 Cheaper to apply than cost-based optimization because it does not estimate and compare plan costs, but it may not find the best plan.

Query Tree

 A tree diagram showing the sequence of operations for a query.

 Each internal node is an operation (like SELECT or JOIN); the leaves are the tables.

Example:

JOIN
/ \
SELECT SELECT
| |
Table1 Table2
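A tree like this can be modelled as nested objects that are evaluated from the leaves upward. The classes below are a hypothetical sketch, assuming in-memory rows stored as dictionaries; Table1 and Table2 are filled with made-up student and course rows.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Table:
    rows: list
    def evaluate(self):
        return self.rows

@dataclass
class Select:
    predicate: Callable
    child: object
    def evaluate(self):
        # Evaluate the subtree first, then filter its rows.
        return [r for r in self.child.evaluate() if self.predicate(r)]

@dataclass
class Join:
    key: str
    left: object
    right: object
    def evaluate(self):
        # Evaluate both subtrees, then pair up rows with equal key values.
        right_rows = self.right.evaluate()
        return [{**l, **r} for l in self.left.evaluate()
                for r in right_rows if l[self.key] == r[self.key]]

students = [{"StudentID": 1, "Name": "Riya", "Marks": 85},
            {"StudentID": 2, "Name": "Arjun", "Marks": 75}]
courses = [{"CourseID": 101, "CourseName": "Math", "StudentID": 1},
           {"CourseID": 102, "CourseName": "Science", "StudentID": 2}]

tree = Join("StudentID",
            Select(lambda r: r["Marks"] > 80, Table(students)),
            Select(lambda r: r["CourseID"] > 100, Table(courses)))
result = tree.evaluate()
print(result)
```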

Query Graph

 Shows tables as nodes and joins as edges.

 Useful for visualizing complex queries.


Representations of Queries

 Query Tree: Shows the order of operations.

 Query Graph: Shows relationships among tables.

Steps for Heuristic Query Optimisation

1. Perform Selection Early: Filter rows as soon as possible.

2. Perform Projection Early: Remove unnecessary columns quickly.

3. Combine Selections and Projections: Simplify the tree.

4. Reorder Joins: Perform the most restrictive joins first so that intermediate results stay small.

5. Eliminate Redundant Operations: Remove unnecessary steps.
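The effect of step 1 can be seen by running the same query in two ways. This is an illustrative sketch with generated data; the table sizes, the join helper, and the marks condition are assumptions made for the example.

```python
# Hypothetical data: 1,000 students, one course enrolment each.
students = [{"StudentID": i, "Marks": i % 100} for i in range(1000)]
courses = [{"CourseID": 5000 + i, "StudentID": i} for i in range(1000)]

def join(left, right):
    """Hash join on StudentID."""
    by_id = {}
    for r in right:
        by_id.setdefault(r["StudentID"], []).append(r)
    return [{**l, **r} for l in left for r in by_id.get(l["StudentID"], [])]

# Plan 1: join everything, then filter.
joined_all = join(students, courses)
late_filter = [row for row in joined_all if row["Marks"] > 80]

# Plan 2: filter first (selection moved below the join), then join.
top_students = [s for s in students if s["Marks"] > 80]
early_filter = join(top_students, courses)

print(len(joined_all), len(top_students), late_filter == early_filter)
```

Both plans return the same rows, but the second one feeds far fewer rows into the join.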

Semantic Query Optimisation

 Uses knowledge about the data (like constraints) to simplify queries.

 Example: If a constraint guarantees that every student's marks are greater than 0 (and never NULL), the condition "WHERE marks > 0" is always true and can be removed from the query.

Conclusion

Heuristic and semantic optimizations help the DBMS answer queries faster by simplifying and rearranging
operations based on rules and data knowledge.

END OF FILE
