DDS Unit - 2

Uploaded by

Gamer Bhagvan


UNIT – II – DISTRIBUTED DATABASE – KCA045

UNIT - II
QUERIES AND OPTIMIZATION
Global Queries to Fragment Queries-Equivalence Transformations for
Queries-Distributed Grouping and Aggregate Function Evaluation-
Parametric Queries-Optimization of Access Strategies-Framework for Query
Optimization-Join Queries- General Queries-Introduction to Distributed
Transactions.

Global Queries to Fragment Queries

When a query is placed, it is first scanned, parsed and validated. An internal
representation of the query, such as a query tree or a query graph, is then created. Next,
alternative execution strategies are devised for retrieving results from the database tables.
The process of choosing the most appropriate execution strategy for query processing is
called query optimization.

Query Optimization Issues in DDBMS

In a DDBMS, query optimization is a crucial task. The complexity is high because the number of
alternative strategies may increase exponentially due to the following factors −

• The presence of a number of fragments.

• Distribution of the fragments or tables across various sites.

• The speed of communication links.

• Disparity in local processing capabilities.

Hence, in a distributed system, the target is often to find a good execution strategy for query
processing rather than the best one. The time to execute a query is the sum of the following −

• Time to communicate queries to databases.

• Time to execute local query fragments.

• Time to assemble data from different sites.

• Time to display results to the application.

Query Processing
Query processing is the set of all activities from query placement to displaying the
results of the query. The steps are shown in the following diagram −
Figure 2.1: Steps in query processing

Global Query Optimization

Input: Fragment query

• Find the best (not necessarily optimal) global schedule

➡ Minimize a cost function

➡ Distributed join processing

✦ Bushy vs. linear trees

✦ Which relation to ship where?

✦ Ship-whole vs ship-as-needed

➡ Decide on the use of semijoins

✦ Semijoin saves on communication at the expense of more local
processing.

➡ Join methods

✦ nested loop vs ordered joins (merge join or hash join)

Cost-Based Optimization

• Solution space

➡ The set of equivalent algebra expressions (query trees).

• Cost function (in terms of time)

➡ I/O cost + CPU cost + communication cost

➡ These might have different weights in different distributed environments (LAN vs WAN).

➡ Can also maximize throughput

• Search algorithm

➡ How do we move inside the solution space?

➡ Exhaustive search, heuristic algorithms (iterative improvement, simulated annealing, genetic,…)

Query Optimization Process

Figure 2.2 Query Optimization Process

Search Space

• Search space characterized by alternative execution plans

• Focus on join trees

• For N relations, there are O(N!) equivalent join trees that can be obtained by applying
commutativity and associativity rules

SELECT ENAME, RESP
FROM   EMP, ASG, PROJ
WHERE  EMP.ENO = ASG.ENO
AND    ASG.PNO = PROJ.PNO
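To get a feel for how the search space grows, the sketch below enumerates the linear (left-deep) join orders for the three relations in the query above and filters out the orders that would need a Cartesian product. The function names and the `JOIN_EDGES` set are illustrative assumptions, not part of the notes, and bushy trees are not enumerated.

```python
from itertools import permutations

# Join graph of the example query: an edge means a join predicate exists.
# EMP.ENO = ASG.ENO and ASG.PNO = PROJ.PNO
JOIN_EDGES = {frozenset({"EMP", "ASG"}), frozenset({"ASG", "PROJ"})}


def linear_join_orders(relations):
    """Return every left-deep join order (one permutation per order)."""
    return list(permutations(relations))


def avoids_cartesian_product(order):
    """True if each relation joins with a relation already in the plan."""
    joined = {order[0]}
    for rel in order[1:]:
        if not any(frozenset({rel, prev}) in JOIN_EDGES for prev in joined):
            return False
        joined.add(rel)
    return True


orders = linear_join_orders(["EMP", "ASG", "PROJ"])
useful = [o for o in orders if avoids_cartesian_product(o)]

print(len(orders))  # 3! = 6 permutations
for o in useful:
    print(" JOIN ".join(o))
```

Even for three relations, two of the six orders (those that start by combining EMP and PROJ) have no join predicate for their first step, which is one reason optimizers prune the search space with heuristics instead of costing every tree.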

Cost Functions

• Total Time (or Total Cost)

➡ Reduce each cost (in terms of time) component individually

➡ Do as little of each cost component as possible

➡ Optimizes the utilization of the resources, which increases system throughput

• Response Time

➡ Do as many things as possible in parallel

➡ May increase total time because of increased total activity

• Summation of all cost factors

• Total cost = CPU cost + I/O cost + communication cost

• CPU cost = unit instruction cost * number of instructions

• I/O cost = unit disk I/O cost * number of disk I/Os

• Communication cost = message initiation + transmission
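The cost formulas above can be sketched as a small calculator. The unit costs in `CostParams` are made-up values for illustration, and charging the initiation cost per message and the transmission cost per byte is an assumption about how the two communication terms are applied. The last two helpers contrast total time with response time under the simplifying assumption that all sites work fully in parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    """Unit costs in seconds; the defaults are illustrative, not measured."""
    cpu_per_instruction: float = 1e-8
    io_per_disk_access: float = 1e-2
    msg_initiation: float = 1e-1        # fixed cost to start one message
    transmission_per_byte: float = 1e-6


def total_cost(instructions, disk_ios, messages, bytes_sent, p=CostParams()):
    """Total cost = CPU cost + I/O cost + communication cost."""
    cpu = p.cpu_per_instruction * instructions
    io = p.io_per_disk_access * disk_ios
    comm = p.msg_initiation * messages + p.transmission_per_byte * bytes_sent
    return cpu + io + comm


def total_time(site_times):
    """Total time: every unit of work counts, wherever it runs."""
    return sum(site_times)


def response_time(site_times):
    """Response time if all sites run in parallel: the slowest site decides."""
    return max(site_times)


cost = total_cost(instructions=1_000_000, disk_ios=100,
                  messages=2, bytes_sent=1_000_000)
print(round(cost, 2))
print(total_time([2.0, 3.0]), response_time([2.0, 3.0]))
```

Changing the weights in `CostParams` is one way to model the LAN versus WAN difference mentioned earlier: on a slow network the communication terms dominate, while on a fast one I/O and CPU costs matter more.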

2-Step – Problem Definition

• Given

➡ A set of sites S = {s1, s2, …, sn} with the load of each site

➡ A query Q = {q1, q2, q3, q4} such that each subquery qi is the maximum
processing unit that accesses one relation and communicates with its
neighboring subqueries

➡ For each qi in Q, a feasible allocation set of sites Sq = {s1, s2, …, sk} where each
site stores a copy of the relation in qi

• The objective is to find an optimal allocation of Q to S such that

➡ The load imbalance of S is minimized

➡ The total communication cost is minimized

2-Step Algorithm

• For each q in Q compute load(Sq)

• While Q is not empty do

➡ Select the subquery a with the least allocation flexibility

➡ Select the best site b for a (with the least load and the best benefit)

➡ Remove a from Q and recompute loads if needed
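The greedy loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm as specified in the notes: "least allocation flexibility" is read as "fewest feasible sites", "benefit" is read as the number of neighboring subqueries already placed at a site, and each placed subquery is assumed to add one unit of load. The input data are hypothetical and unrelated to the worked example that follows.

```python
def allocate(feasible, load, neighbors=None):
    """Greedy allocation of subqueries to sites.

    feasible:  {subquery: [sites holding a copy of its relation]}
    load:      {site: current load}
    neighbors: {subquery: [subqueries it exchanges data with]} (optional)
    """
    load = dict(load)                   # work on a copy of the caller's dict
    neighbors = neighbors or {}
    pending = set(feasible)
    placement = {}

    while pending:
        # Least allocation flexibility = fewest candidate sites (ties by name).
        a = min(pending, key=lambda q: (len(feasible[q]), q))

        def score(site):
            # Benefit: neighbours already placed at this site need no transfer.
            benefit = sum(1 for n in neighbors.get(a, [])
                          if placement.get(n) == site)
            return (load[site], -benefit, site)

        b = min(feasible[a], key=score)  # least load first, then best benefit
        placement[a] = b
        load[b] += 1                     # assumption: one unit of load each
        pending.remove(a)

    return placement, load


placement, final_load = allocate(
    feasible={"q1": ["s1", "s2"], "q2": ["s2"], "q3": ["s1", "s2", "s3"]},
    load={"s1": 1, "s2": 0, "s3": 2},
    neighbors={"q1": ["q2"], "q2": ["q1", "q3"], "q3": ["q2"]},
)
print(placement)
print(final_load)
```

Placing the least flexible subquery first keeps the constrained choices from being blocked by earlier, freer ones. A fuller version would weigh load against communication benefit instead of using benefit only as a tie-breaker.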

2-Step Algorithm Example

• Let Q = {q1, q2, q3, q4} where q1 is associated with R1, q2 is associated with R2 joined
with the result of q1, etc.

• Iteration 1: select q4, allocate to s1, set load(s1) = 2

• Iteration 2: select q2, allocate to s2, set load(s2) = 3

• Iteration 3: select q3, allocate to s1, set load(s1) = 3

• Iteration 4: select q1, allocate to s3 or s4

Relational Algebra:
• Relational algebra defines the operations that can be applied to relations (tables) to
manipulate their data.
• It consists of unary operations (involving a single table) and binary operations
(involving two tables).
• Join and semi-join are binary operations in relational algebra.
Join
• Join is a binary operation in relational algebra.
• It combines records from two tables in a database; joins over more tables are built by
chaining binary joins.
• A join combines fields from two tables by using values common to each.
Semi-Join
• A join where the result contains only the columns from one of the joined tables.
• Useful in distributed databases, because less data has to be sent over the network.
• Can dramatically speed up certain classes of queries.
What is a “Semi-Join”?
Semi-join strategies are a technique for query processing in distributed database systems.
They are used to reduce communication cost.
A semi-join between two tables returns the rows from the first table for which one or more
matches are found in the second table.
The difference between a semi-join and a conventional join is that each row in the first table
is returned at most once. Even if the second table contains two matches for a row in the first
table, only one copy of that row is returned.
In SQL, semi-joins are written using EXISTS or IN.
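The definition above, and the reason semi-joins reduce communication, can be illustrated with a small sketch. It assumes `dept` is stored at one site and `emp` at another; only the distinct join-attribute values of `emp` need to be shipped to the site holding `dept`. The `semijoin` function and the sample rows are hypothetical.

```python
def semijoin(left_rows, right_rows, key):
    """Rows of left_rows that have at least one match in right_rows on `key`."""
    # Projection of the right table on the join attribute: this set is all
    # that would have to travel over the network in a distributed setting.
    right_keys = {row[key] for row in right_rows}
    return [row for row in left_rows if row[key] in right_keys]


# Assumed placement: dept at site 1, emp at site 2.
dept = [
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 20, "dname": "RESEARCH"},
    {"deptno": 30, "dname": "SALES"},
]
emp = [
    {"empno": 1, "deptno": 10},
    {"empno": 2, "deptno": 10},
    {"empno": 3, "deptno": 30},
]

shipped_keys = {row["deptno"] for row in emp}   # sent from site 2 to site 1
result = semijoin(dept, emp, "deptno")

print(len(shipped_keys), "key values shipped instead of", len(emp), "emp rows")
for row in result:
    print(row["deptno"], row["dname"])
```

Department 10 has two employees but appears once in the result, and only columns of `dept` are returned, matching both properties of the semi-join described above.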

A Simple Semi-Join Example
“Give a list of departments with at least one employee.”

Query written with a conventional join:
SELECT D.deptno, D.dname
FROM dept D, emp E
WHERE E.deptno = D.deptno
ORDER BY D.deptno;
◦ A department with N employees will appear in the list N times.
◦ We could use the DISTINCT keyword to make each department appear only once.

Query written with a semi-join:
SELECT D.deptno, D.dname
FROM dept D
WHERE EXISTS (SELECT 1 FROM emp E WHERE E.deptno = D.deptno)
ORDER BY D.deptno;
◦ No department appears more than once.
◦ Oracle stops processing each department as soon as the first employee in that
department is found.
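The two queries above can be tried side by side. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for Oracle; the table definitions and sample rows are assumptions made for the demonstration, and SQLite's execution strategy may differ from Oracle's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'), (30, 'SALES');
    INSERT INTO emp  VALUES (1, 'A', 10), (2, 'B', 10), (3, 'C', 30);
""")

# Conventional join: one output row per matching employee.
join_rows = conn.execute(
    "SELECT D.deptno, D.dname FROM dept D, emp E "
    "WHERE E.deptno = D.deptno ORDER BY D.deptno"
).fetchall()

# Semi-join written with EXISTS: each department appears at most once.
semi_rows = conn.execute(
    "SELECT D.deptno, D.dname FROM dept D "
    "WHERE EXISTS (SELECT 1 FROM emp E WHERE E.deptno = D.deptno) "
    "ORDER BY D.deptno"
).fetchall()

print(join_rows)
print(semi_rows)
conn.close()
```

With this sample data the join returns ACCOUNTING twice because it has two employees, while the EXISTS form returns it once. RESEARCH has no employees and appears in neither result.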
