
Distributed Cost Model

This document discusses distributed query optimization. It begins with basic concepts like centralized versus distributed query optimization and search space reduction techniques. It then covers distributed cost models that optimize for total time or response time. Next it discusses using database statistics and selectivity factors to estimate intermediate result sizes for operations. It describes considerations for join ordering when relations are fragmented across multiple sites. Finally, it discusses using semijoins to efficiently implement joins by reducing relation sizes before transferring them between sites.

Uploaded by

nenz187


Distributed Database Systems

Fall 2012

Distributed Query Optimization

SL05

Basic Concepts

Distributed Cost Model

Database Statistics

Joins and Semijoins

Query Optimization Algorithms

DDBS12, SL05

1/52

M. Bohlen

Basic Concepts/1

- Query optimization: the process of producing an optimal (or close to optimal) query execution plan, which represents an execution strategy
  - The main task in query optimization is to consider different orderings of the operations
- Centralized query optimization:
  - Find (the best) query execution plan in the space of equivalent query trees
  - Minimize an objective cost function
  - Gather statistics about relations
- Distributed query optimization brings additional issues
  - Linear query trees are not necessarily a good choice
  - Bushy query trees are not necessarily a bad choice
  - What and where to ship the relations
  - How to ship relations (ship as a whole, ship as needed)
  - When to use semi-joins instead of joins

Basic Concepts/2

- Search space: the set of alternative query execution plans (query trees)
  - Typically very large
  - The main issue is to optimize joins
  - For N relations, there are $O(N!)$ equivalent join trees that can be obtained by applying commutativity and associativity rules
- Example: 3 equivalent query trees (join trees) of the joins in the following query

      SELECT ENAME, RESP
      FROM   EMP, ASG, PROJ
      WHERE  EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO

  [Figure: three equivalent join trees for this query]
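To make the $O(N!)$ growth of the search space concrete, the following minimal Python sketch enumerates the left-deep join orders of the three relations in the example query. The helper name `linear_join_orders` is illustrative only, and the sketch does not filter out orders that would require a Cartesian product.

```python
from itertools import permutations
from math import factorial


def linear_join_orders(relations):
    """Return every left-deep (linear) join order as a nested string."""
    orders = []
    for perm in permutations(relations):
        expr = perm[0]
        for rel in perm[1:]:
            expr = f"({expr} JOIN {rel})"
        orders.append(expr)
    return orders


orders = linear_join_orders(["EMP", "ASG", "PROJ"])
for order in orders:
    print(order)
# The number of permutations grows as N!
print(len(orders), "orders for 3 relations;", factorial(10), "for 10 relations")
```

Even this restricted family of linear trees becomes impractical to enumerate beyond a handful of relations, which motivates the search space reductions discussed next.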

Basic Concepts/3

- Reduction of the search space
  - Restrict by means of heuristics
    - Perform unary operations before binary operations, etc.
  - Restrict the shape of the join tree
    - Consider the type of trees (linear trees vs. bushy trees)

[Figure: Linear Join Tree and Bushy Join Tree]

Basic Concepts/4

- There are two main strategies to scan the search space
  - Deterministic
  - Randomized
- Deterministic scan of the search space
  - Start from base relations and build plans by adding one relation at each step
  - Breadth-first strategy (BFS): build all possible plans before choosing the best plan (dynamic programming approach)
  - Depth-first strategy (DFS): build only one plan (greedy approach)
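As an illustration of the breadth-first (dynamic programming) strategy, the following Python sketch keeps the cheapest left-deep plan for every subset of relations and extends it by one relation at each step. The cost model (sum of estimated intermediate result sizes) and all cardinalities and selectivities are assumptions made for this example; they are not taken from the slides.

```python
from itertools import combinations


def best_left_deep_plan(card, sel):
    """Dynamic programming over subsets of relations.

    card: {relation: cardinality}
    sel:  {frozenset({a, b}): join selectivity}; missing pairs are
          treated as Cartesian products (selectivity 1.0).
    Returns (cost, plan), where cost is the sum of intermediate sizes.
    """
    rels = sorted(card)

    def result_size(subset):
        size = 1.0
        for r in subset:
            size *= card[r]
        for a, b in combinations(sorted(subset), 2):
            size *= sel.get(frozenset((a, b)), 1.0)
        return size

    # Plans for single relations cost nothing in this toy model.
    best = {frozenset([r]): (0.0, r) for r in rels}
    for k in range(2, len(rels) + 1):
        for combo in combinations(rels, k):
            subset = frozenset(combo)
            candidates = []
            for r in combo:  # r is the relation added last
                cost, plan = best[subset - {r}]
                candidates.append((cost + result_size(subset),
                                   f"({plan} JOIN {r})"))
            best[subset] = min(candidates)
    return best[frozenset(rels)]


# Hypothetical statistics: PROJ is already reduced to one tuple by a selection.
card = {"EMP": 400, "ASG": 1000, "PROJ": 1}
sel = {frozenset(("EMP", "ASG")): 1 / 400, frozenset(("ASG", "PROJ")): 0.01}
cost, plan = best_left_deep_plan(card, sel)
print(plan, cost)
```

A greedy (depth-first) variant would instead commit to the cheapest extension at each step and never revisit it, which is faster but can miss the plan found here.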

Basic Concepts/5

- Randomized scan of the search space
  - Search for optimal solutions around a particular starting point
  - e.g., iterative improvement or simulated annealing techniques
  - Trades optimization time for execution time
    - Does not guarantee that the best solution is obtained, but avoids the high cost of optimization
  - The strategy is better when more than 5-6 relations are involved

Distributed Cost Model/1

- Two different types of cost functions can be used
  - Reduce total time
    - Reduce each cost component (in terms of time) individually, i.e., do as little for each cost component as possible
    - Optimize the utilization of the resources (i.e., increase system throughput)
  - Reduce response time
    - Do as many things in parallel as possible
    - May increase total time because of increased total activity

Distributed Cost Model/2

- Total time: sum of the time of all individual components
  - Local processing time: CPU time + I/O time
  - Communication time: fixed time to initiate a message + time to transmit the data

  $\text{Total time} = T_{CPU} \cdot \#\text{instructions} + T_{I/O} \cdot \#\text{I/Os} + T_{MSG} \cdot \#\text{messages} + T_{TR} \cdot \#\text{bytes}$

- The individual components of the total cost have different weights:
  - Wide area network
    - Message initiation and transmission costs are high
    - Local processing cost is low (fast mainframes or minicomputers)
    - Ratio of communication to I/O costs is 20:1
  - Local area networks
    - Communication and local processing costs are more or less equal
    - Ratio of communication to I/O costs is 1:1.6 (10MB/s network)

Distributed Cost Model/3

- Response time: elapsed time between the initiation and the completion of a query

  $\text{Response time} = T_{CPU} \cdot \#\text{seq instructions} + T_{I/O} \cdot \#\text{seq I/Os} + T_{MSG} \cdot \#\text{seq messages} + T_{TR} \cdot \#\text{seq bytes}$

- where #seq x (x in instructions, I/O, messages, bytes) is the maximum number of x which must be done sequentially.
- Any processing and communication done in parallel is ignored

Distributed Cost Model/4

- Example: Query at site 3 with data from sites 1 and 2.
  [Figure: sites 1 and 2 ship x and y units of data, respectively, to site 3]
  - Assume that only the communication cost is considered
  - $\text{Total time} = T_{MSG} \cdot 2 + T_{TR} \cdot (x + y)$
  - $\text{Response time} = \max\{T_{MSG} + T_{TR} \cdot x,\; T_{MSG} + T_{TR} \cdot y\}$
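The difference between the two cost functions in this example can be sketched in a few lines of Python. The function names and the concrete numbers (message cost 10, transfer cost 1 per unit, x = 100, y = 400) are assumptions chosen for illustration; only communication cost is modelled, as in the example.

```python
def total_time(t_msg, t_tr, transfers):
    """Sum the cost of every transfer: one message plus its data."""
    return sum(t_msg + t_tr * size for size in transfers)


def response_time(t_msg, t_tr, transfers):
    """Transfers run in parallel, so only the slowest one counts."""
    return max(t_msg + t_tr * size for size in transfers)


# Hypothetical values: site 1 ships x = 100 units, site 2 ships y = 400 units.
x, y = 100, 400
print("total time:   ", total_time(10, 1, [x, y]))
print("response time:", response_time(10, 1, [x, y]))
```

With these numbers the total time is the sum of both transfers, while the response time is determined by the larger transfer alone.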

Database Statistics/1

- The primary cost factor is the size of intermediate relations
  - that are produced during the execution and
  - must be transmitted over the network, if a subsequent operation is located on a different site
- It is costly to compute the size of the intermediate relations precisely.
- Instead, global statistics of relations and fragments are computed and used to provide approximations

Database Statistics/2

- Let $R(A_1, A_2, \ldots, A_k)$ be a relation fragmented into $R_1, R_2, \ldots, R_r$.
- Relation statistics
  - min and max values of each attribute: $\min\{A_i\}$, $\max\{A_i\}$
  - length of each attribute: $length(A_i)$
  - number of distinct values in each domain: $card(dom(A_i))$
- Fragment statistics
  - cardinality of the fragment: $card(R_j)$
  - cardinality of each attribute of each fragment: $card(\Pi_{A_i}(R_j))$

Database Statistics/3

- Selectivity factor of an operation: the proportion of tuples of an operand relation that participate in the result of that operation
- Assumption: independent attributes and uniform distribution of attribute values
- Selectivity factor of selection

  $SF_\sigma(A = value) = \frac{1}{card(\Pi_A(R))}$

  $SF_\sigma(A > value) = \frac{\max(A) - value}{\max(A) - \min(A)}$

  $SF_\sigma(A < value) = \frac{value - \min(A)}{\max(A) - \min(A)}$

Database Statistics/4

- Properties of the selectivity factor of the selection

  $SF_\sigma(p(A_i) \wedge p(A_j)) = SF_\sigma(p(A_i)) \cdot SF_\sigma(p(A_j))$

  $SF_\sigma(p(A_i) \vee p(A_j)) = SF_\sigma(p(A_i)) + SF_\sigma(p(A_j)) - SF_\sigma(p(A_i)) \cdot SF_\sigma(p(A_j))$

  $SF_\sigma(A \in \{values\}) = SF_\sigma(A = value) \cdot card(\{values\})$
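The selectivity formulas for selections translate directly into code. The following Python sketch is a minimal rendering under the stated assumptions (independent attributes, uniformly distributed values); the function names and the sample statistics are illustrative and not part of the lecture.

```python
def sf_eq(distinct_values):
    """SF(A = value) = 1 / card(Pi_A(R))."""
    return 1.0 / distinct_values


def sf_gt(value, min_a, max_a):
    """SF(A > value) = (max(A) - value) / (max(A) - min(A))."""
    return (max_a - value) / (max_a - min_a)


def sf_lt(value, min_a, max_a):
    """SF(A < value) = (value - min(A)) / (max(A) - min(A))."""
    return (value - min_a) / (max_a - min_a)


def sf_and(sf_i, sf_j):
    """Conjunction of predicates on independent attributes."""
    return sf_i * sf_j


def sf_or(sf_i, sf_j):
    """Disjunction: inclusion-exclusion on independent attributes."""
    return sf_i + sf_j - sf_i * sf_j


def sf_in(distinct_values, n_values):
    """SF(A in {values}) = SF(A = value) * card({values})."""
    return sf_eq(distinct_values) * n_values


def card_selection(sf, card_r):
    """card(sigma_P(R)) = SF(P) * card(R)."""
    return sf * card_r


# Hypothetical attribute with values in [0, 100] and a relation of 1000 tuples.
print(card_selection(sf_gt(75, 0, 100), 1000))
```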

Database Statistics/5

- Cardinality of intermediate results
- Selection

  $card(\sigma_P(R)) = SF_\sigma(P) \cdot card(R)$

- Projection
  - More difficult: correlations between projected attributes are unknown
  - Simple if the projected attribute is a key

  $card(\Pi_A(R)) = card(R)$

- Cartesian Product

  $card(R \times S) = card(R) \cdot card(S)$

- Union
  - upper bound: $card(R \cup S) \le card(R) + card(S)$
  - lower bound: $card(R \cup S) \ge \max\{card(R), card(S)\}$
- Set Difference
  - upper bound: $card(R - S) = card(R)$
  - lower bound: 0

Database Statistics/6

- Selectivity factor for joins

  $SF_{\bowtie} = \frac{card(R \bowtie S)}{card(R) \cdot card(S)}$

- Cardinality of joins
  - Upper bound: cardinality of Cartesian Product

    $card(R \bowtie S) \le card(R) \cdot card(S)$

  - General case (if SF is given):

    $card(R \bowtie S) = SF_{\bowtie} \cdot card(R) \cdot card(S)$

  - Special case: R.A is a key of R and S.A is a foreign key of S; each S-tuple matches with at most one tuple of R

    $card(R \bowtie_{R.A = S.A} S) = card(S)$

Database Statistics/7

- Selectivity factor for semijoins: fraction of R-tuples that join with S-tuples
  - An approximation is the selectivity of A in S

    $SF_{\ltimes}(R \ltimes_A S) = SF_{\ltimes}(S.A) = \frac{card(\Pi_A(S))}{card(dom[A])}$

- Cardinality of semijoin (general case):

  $card(R \ltimes_A S) = SF_{\ltimes}(S.A) \cdot card(R)$

- Example: R.A is a foreign key referencing S (S.A is a primary key)
  - Then SF = 1 and the result size corresponds to the size of R
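A short Python sketch of the join and semijoin cardinality estimates may help when working through examples. The function names and sample numbers are assumptions for illustration; the formulas follow the slides.

```python
def card_join(sf_join, card_r, card_s):
    """General case: card(R join S) = SF_join * card(R) * card(S)."""
    return sf_join * card_r * card_s


def card_key_fk_join(card_s):
    """Special case: R.A is a key of R and S.A a foreign key of S."""
    return card_s


def sf_semijoin(distinct_a_in_s, domain_size):
    """SF(S.A) = card(Pi_A(S)) / card(dom[A])."""
    return distinct_a_in_s / domain_size


def card_semijoin(card_r, distinct_a_in_s, domain_size):
    """card(R semijoin_A S) = SF(S.A) * card(R)."""
    return sf_semijoin(distinct_a_in_s, domain_size) * card_r


# Hypothetical statistics: S holds 50 of the 200 possible A-values.
print(card_semijoin(1000, 50, 200))
# If every A-value of the domain occurs in S, SF = 1 and R is not reduced.
print(card_semijoin(1000, 200, 200))
```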

Join Ordering in Fragment Queries/1

- Join ordering is an important aspect in a centralized DBMS, and it is even more important in a DDBMS, since joins between fragments that are stored at different sites may increase the communication time.
- Two approaches exist:
  - Optimize the ordering of joins directly
    - INGRES and distributed INGRES
    - System R and System R*
  - Replace joins by combinations of semijoins in order to minimize the communication costs
    - Hill Climbing and SDD-1

Join Ordering in Fragment Queries/2

- Direct join ordering of two relations/fragments located at different sites
  - Move the smaller relation to the other site
  - We have to estimate the size of R and S

Join Ordering in Fragment Queries/3

Direct join ordering of queries involving more than two relations is

substantially more complex

Example: Consider the following query and the respective join

graph, where we make also assumptions about the locations of the
three relations/fragments
PROJ ZPNO ASG ZENO EMP

DDBS12, SL05

20/52

M. Bohlen

Join Ordering in Fragment Queries/4

- Example (contd.): The query can be evaluated in at least 5 different ways.
  - Plan 1:
    - EMP $\to$ Site 2
    - Site 2: EMP' = EMP $\bowtie$ ASG
    - EMP' $\to$ Site 3
    - Site 3: EMP' $\bowtie$ PROJ
  - Plan 2:
    - ASG $\to$ Site 1
    - Site 1: EMP' = EMP $\bowtie$ ASG
    - EMP' $\to$ Site 3
    - Site 3: EMP' $\bowtie$ PROJ
  - Plan 3:
    - ASG $\to$ Site 3
    - Site 3: ASG' = ASG $\bowtie$ PROJ
    - ASG' $\to$ Site 1
    - Site 1: ASG' $\bowtie$ EMP
  - Plan 4:
    - PROJ $\to$ Site 2
    - Site 2: PROJ' = PROJ $\bowtie$ ASG
    - PROJ' $\to$ Site 1
    - Site 1: PROJ' $\bowtie$ EMP
  - Plan 5:
    - EMP $\to$ Site 2
    - PROJ $\to$ Site 2
    - Site 2: EMP $\bowtie$ PROJ $\bowtie$ ASG

Join Ordering in Fragment Queries/5

- To select a plan, a lot of information is needed, including
  - size(EMP), size(ASG), size(PROJ)
  - size(EMP $\bowtie$ ASG), size(ASG $\bowtie$ PROJ)
  - Possibilities of parallel execution if response time is used

Semijoin Based Algorithms/1

- Semijoins can be used to efficiently implement joins
  - The semijoin acts as a size reducer (similar to a selection), so that smaller relations need to be transferred
- Consider two relations: R located at site 1 and S located at site 2
  - Solution with semijoins: Replace one or both operand relations/fragments by a semijoin, using the following rules:

    $R \bowtie_A S \iff (R \ltimes_A S) \bowtie_A S$

    $\phantom{R \bowtie_A S} \iff R \bowtie_A (S \ltimes_A R)$

    $\phantom{R \bowtie_A S} \iff (R \ltimes_A S) \bowtie_A (S \ltimes_A R)$

- The semijoin is beneficial if the cost to produce and send it to the other site is less than the cost of sending the whole operand relation and doing the actual join.

Semijoin Based Algorithms/2

- Cost analysis $R \bowtie_A S$ vs. $(R \ltimes_A S) \bowtie S$, assuming that size(R) < size(S)
  - Perform the join $R \bowtie S$:
    - R $\to$ Site 2
    - Site 2 computes $R \bowtie S$
  - Perform the semijoins $(R \ltimes S) \bowtie S$:
    - $S' = \Pi_A(S)$
    - S' $\to$ Site 1
    - Site 1 computes $R' = R \ltimes S'$
    - R' $\to$ Site 2
    - Site 2 computes $R' \bowtie S$
- Semijoin is better if: $size(\Pi_A(S)) + size(R \ltimes S) < size(R)$
- The semijoin approach is better if the semijoin acts as a sufficient reducer (i.e., a few tuples of R participate in the join)
- The join approach is better if almost all tuples of R participate in the join
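The decision rule from this cost analysis can be written as a small Python check. This is a sketch that counts only the data shipped between the two sites, as the slide does; the function names and the sample sizes are assumptions.

```python
def ship_whole_cost(size_r):
    """Join plan: ship R to site 2."""
    return size_r


def semijoin_cost(size_proj_a_s, size_r_reduced):
    """Semijoin plan: ship Pi_A(S) to site 1, then R' = R semijoin S to site 2."""
    return size_proj_a_s + size_r_reduced


def semijoin_is_better(size_r, size_proj_a_s, size_r_reduced):
    """True if size(Pi_A(S)) + size(R semijoin S) < size(R)."""
    return semijoin_cost(size_proj_a_s, size_r_reduced) < ship_whole_cost(size_r)


# Hypothetical sizes: R has 1000 units, Pi_A(S) has 100 units.
print(semijoin_is_better(1000, 100, 200))  # strong reducer
print(semijoin_is_better(1000, 100, 950))  # almost all R-tuples participate
```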

INGRES Algorithm/1

- INGRES uses a dynamic query optimization algorithm that recursively breaks a query into smaller pieces. It is based on the following ideas:
  - An n-relation query q is decomposed into n subqueries $q_1 \to q_2 \to \cdots \to q_n$
    - Each $q_i$ is a mono-relation (mono-variable) query
    - The output of $q_i$ is consumed by $q_{i+1}$
  - For the decomposition two basic techniques are used: detachment and substitution
  - There is a processor that can efficiently process mono-relation queries
    - Optimizes each query independently for the access to a single relation

INGRES Algorithm/2

- Detachment: Break a query q into $q' \to q''$, based on a common relation that is the result of q', i.e.
  - The query

        q:   SELECT R2.A2, ..., Rn.An
             FROM   R1, R2, ..., Rn
             WHERE  P1(R1.A1')
             AND    P2(R1.A1, ..., Rn.An)

    is decomposed by detachment of the common relation R1 into

        q':  SELECT R1.A1
             INTO   R1'
             FROM   R1
             WHERE  P1(R1.A1')

        q'': SELECT R2.A2, ..., Rn.An
             FROM   R1', R2, ..., Rn
             WHERE  P2(R1'.A1, ..., Rn.An)

- Detachment reduces the size of the relation on which the query q'' is defined.

INGRES Algorithm/3

- Example: Consider query q1: Names of employees working on the CAD/CAM project

      q1:  SELECT EMP.ENAME
           FROM   EMP, ASG, PROJ
           WHERE  EMP.ENO = ASG.ENO
           AND    ASG.PNO = PROJ.PNO
           AND    PROJ.PNAME = "CAD/CAM"

- Decompose q1 into $q_{11} \to q'$:

      q11: SELECT PROJ.PNO
           INTO   JVAR
           FROM   PROJ
           WHERE  PROJ.PNAME = "CAD/CAM"

      q':  SELECT EMP.ENAME
           FROM   EMP, ASG, JVAR
           WHERE  EMP.ENO = ASG.ENO
           AND    ASG.PNO = JVAR.PNO

INGRES Algorithm/4

- Example (contd.): The successive detachments may transform q' into $q_{12} \to q_{13}$:

      q':  SELECT EMP.ENAME
           FROM   EMP, ASG, JVAR
           WHERE  EMP.ENO = ASG.ENO
           AND    ASG.PNO = JVAR.PNO

      q12: SELECT ASG.ENO
           INTO   GVAR
           FROM   ASG, JVAR
           WHERE  ASG.PNO = JVAR.PNO

      q13: SELECT EMP.ENAME
           FROM   EMP, GVAR
           WHERE  EMP.ENO = GVAR.ENO

- q1 is now decomposed by detachment into $q_{11} \to q_{12} \to q_{13}$
  - q11 is a mono-relation query
  - q12 and q13 are multi-relation queries, which cannot be further detached; also called irreducible

INGRES Algorithm/5

- Tuple substitution converts an irreducible query q into mono-relation queries.
  - Choose a relation R1 in q for tuple substitution
  - For each tuple in R1, replace the R1-attributes referred to in q by their actual values, thereby generating a set of subqueries q' with n - 1 relations, i.e.,

    $q(R_1, R_2, \ldots, R_n)$ is replaced by $\{q'(t_{1i}, R_2, \ldots, R_n),\; t_{1i} \in R_1\}$

- Example (contd.): Assume GVAR consists only of the tuples {E1, E2}. Then q13 is rewritten with tuple substitution in the following way

      q13:  SELECT EMP.ENAME
            FROM   EMP, GVAR
            WHERE  EMP.ENO = GVAR.ENO

      q131: SELECT EMP.ENAME
            FROM   EMP
            WHERE  EMP.ENO = "E1"

INGRES Algorithm/6

- Example (contd.):

      q132: SELECT EMP.ENAME
            FROM   EMP
            WHERE  EMP.ENO = "E2"

- q131 and q132 are mono-relation queries
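Tuple substitution for q13 can be mimicked on in-memory data. In the following Python sketch the EMP tuples are hypothetical sample data, and the function names are illustrative; each call to `mono_relation_query` plays the role of one generated subquery (q131, q132, ...).

```python
# Hypothetical EMP tuples, for illustration only.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe"},
    {"ENO": "E2", "ENAME": "M. Smith"},
    {"ENO": "E3", "ENAME": "A. Lee"},
]


def mono_relation_query(emp, eno):
    """One substituted subquery: SELECT ENAME FROM EMP WHERE ENO = eno."""
    return [t["ENAME"] for t in emp if t["ENO"] == eno]


def tuple_substitution(emp, gvar):
    """Evaluate q13 by running one mono-relation query per GVAR tuple."""
    result = []
    for eno in gvar:
        result.extend(mono_relation_query(emp, eno))
    return result


print(tuple_substitution(EMP, ["E1", "E2"]))
```

The number of generated subqueries equals the cardinality of the substituted relation, which is why detachment is applied first to keep that relation small.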

Distributed INGRES Algorithm

- The distributed INGRES query optimization algorithm is very similar to the centralized INGRES algorithm.
  - In addition to the centralized INGRES, the distributed one should break up each query $q_i$ into sub-queries that operate on fragments; only horizontal fragmentation is handled.
  - Optimization with respect to a combination of communication cost and response time

System R Algorithm/1

- The System R (centralized) query optimization algorithm
  - Performs static query optimization based on exhaustive search of the solution space and a cost function (IO cost + CPU cost)
    - Input: relational algebra tree
    - Output: optimal relational algebra tree
    - Dynamic programming technique is applied to reduce the number of alternative plans
  - The optimization algorithm consists of two steps
    1. Predict the best access method to each individual relation (mono-relation query)
       - Consider using index, file scan, etc.
    2. For each relation R, estimate the best join ordering
       - R is first accessed using its best single-relation access method
       - Efficient access to inner relation is crucial
  - Considers two different join strategies
    - (Indexed-) nested loop join
    - Sort-merge join

System R Algorithm/2

- Example: Consider query q1: Names of employees working on the CAD/CAM project

  $PROJ \bowtie_{PNO} ASG \bowtie_{ENO} EMP$

  - Join graph

    [Figure: join graph of PROJ, ASG, and EMP]

  - Indexes
    - EMP has an index on ENO
    - ASG has an index on PNO
    - PROJ has an index on PNO and an index on PNAME

System R Algorithm/3

- Example (contd.): Step 1: Select the best single-relation access paths
  - EMP: sequential scan (because there is no selection on EMP)
  - ASG: sequential scan (because there is no selection on ASG)
  - PROJ: index on PNAME (because there is a selection on PROJ based on PNAME)

System R Algorithm/4

- Example (contd.): Step 2: Select the best join ordering for each relation
  - (EMP $\times$ PROJ) and (PROJ $\times$ EMP) are pruned because they are Cartesian products (CPs)
  - (ASG $\bowtie$ PROJ) is pruned because (we assume) it has higher cost than (PROJ $\bowtie$ ASG); similar for (ASG $\bowtie$ EMP)
  - Best total join order: ((PROJ $\bowtie$ ASG) $\bowtie$ EMP), since it makes the best use of the indexes
    - Select PROJ using index on PNAME
    - Join with ASG using index on PNO
    - Join with EMP using index on ENO

Distributed System R Algorithm/1

- The System R* query optimization algorithm is an extension of the System R query optimization algorithm with the following main characteristics:
  - Only whole relations can be distributed, i.e., fragmentation and replication are not considered
  - Query compilation is a distributed task, coordinated by a master site, where the query is initiated
  - Master site makes all inter-site decisions, e.g., selection of the execution sites, join ordering, method of data transfer, ...
  - The local sites do the intra-site (local) optimizations, e.g., local joins, access paths
- Join ordering and data transfer between different sites are the most critical issues to be considered by the master site

Distributed System R Algorithm/2

- Two methods for inter-site data transfer
  - Ship whole: The entire relation is shipped to the join site and stored in a temporary relation
    - Larger data transfer
    - Smaller number of messages
    - Better if relations are small
  - Fetch as needed: The outer relation is sequentially scanned, and for each tuple the join value is sent to the site of the inner relation and the matching inner tuples are sent back (i.e., semijoin)
    - Number of messages = O(cardinality of outer relation)
    - Data transfer per message is minimal
    - Better if relations are large and the selectivity is good

Distributed System R Algorithm/3

- Four main join strategies for $R \bowtie S$:
  - R is outer relation
  - S is inner relation
- Notation:
  - LT denotes local processing time
  - CT denotes communication time
  - s denotes the average number of S-tuples that match an R-tuple
- Strategy 1: Ship the entire outer relation to the site of the inner relation, i.e.,
  - Retrieve outer tuples
  - Send them to the inner relation site
  - Join them as they arrive

  $\text{Total cost} = LT(\text{retrieve } card(R) \text{ tuples from } R) + CT(size(R)) + LT(\text{retrieve } s \text{ tuples from } S) \cdot card(R)$

Distributed System R Algorithm/4

- Strategy 2: Ship the entire inner relation to the site of the outer relation. We cannot join the tuples as they arrive; they need to be stored.
  - The inner relation S needs to be stored in a temporary relation T

  $\text{Total cost} = LT(\text{retrieve } card(S) \text{ tuples from } S) + CT(size(S)) + LT(\text{store } card(S) \text{ tuples in } T) + LT(\text{retrieve } card(R) \text{ tuples from } R) + LT(\text{retrieve } s \text{ tuples from } T) \cdot card(R)$

Distributed System R Algorithm/5

- Strategy 3: Fetch tuples of the inner relation as needed for each tuple of the outer relation.
  - For each R-tuple, the join attribute A is sent to the site of S
  - The s matching S-tuples are retrieved and sent to the site of R

  $\text{Total cost} = LT(\text{retrieve } card(R) \text{ tuples from } R) + CT(length(A)) \cdot card(R) + LT(\text{retrieve } s \text{ tuples from } S) \cdot card(R) + CT(s \cdot length(S)) \cdot card(R)$

Distributed System R Algorithm/6

- Strategy 4: Move both relations to a third site and compute the join there.
  - The inner relation S is first moved to a third site and stored in a temporary relation T.
  - Then the outer relation is moved to the third site and its tuples are joined as they arrive.

  $\text{Total cost} = LT(\text{retrieve } card(S) \text{ tuples from } S) + CT(size(S)) + LT(\text{store } card(S) \text{ tuples in } T) + LT(\text{retrieve } card(R) \text{ tuples from } R) + CT(size(R)) + LT(\text{retrieve } s \text{ tuples from } T) \cdot card(R)$
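The four strategy cost formulas can be compared side by side with a small Python sketch. The slides leave LT and CT abstract; here they are modelled, as an assumption, by a linear local cost per tuple and a per-message overhead plus a per-byte cost. The class name, field names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JoinStats:
    card_r: int    # tuples in outer relation R
    size_r: int    # bytes of R
    card_s: int    # tuples in inner relation S
    size_s: int    # bytes of S
    s: int         # average number of S-tuples matching an R-tuple
    len_a: int     # bytes of join attribute A
    len_s: int     # bytes of one S-tuple
    lt_tuple: int  # assumed local cost per tuple
    ct_msg: int    # assumed fixed cost per message
    ct_byte: int   # assumed cost per transmitted byte

    def lt(self, n_tuples):
        return self.lt_tuple * n_tuples

    def ct(self, n_bytes):
        return self.ct_msg + self.ct_byte * n_bytes


def strategy1(p):  # ship outer relation to the site of the inner relation
    return p.lt(p.card_r) + p.ct(p.size_r) + p.lt(p.s) * p.card_r


def strategy2(p):  # ship inner relation, store it in T, then join
    return (p.lt(p.card_s) + p.ct(p.size_s) + p.lt(p.card_s)
            + p.lt(p.card_r) + p.lt(p.s) * p.card_r)


def strategy3(p):  # fetch inner tuples as needed
    return (p.lt(p.card_r) + p.ct(p.len_a) * p.card_r
            + p.lt(p.s) * p.card_r + p.ct(p.s * p.len_s) * p.card_r)


def strategy4(p):  # move both relations to a third site
    return (p.lt(p.card_s) + p.ct(p.size_s) + p.lt(p.card_s)
            + p.lt(p.card_r) + p.ct(p.size_r) + p.lt(p.s) * p.card_r)


stats = JoinStats(card_r=100, size_r=1000, card_s=1000, size_s=20000,
                  s=2, len_a=4, len_s=20, lt_tuple=1, ct_msg=10, ct_byte=1)
costs = {f.__name__: f(stats)
         for f in (strategy1, strategy2, strategy3, strategy4)}
print(costs, "->", min(costs, key=costs.get))
```

With a small outer relation and a large inner relation, shipping the outer relation wins in this toy setting; the per-message overhead is what penalizes fetch-as-needed.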

Hill-Climbing Algorithm/1

- Hill-Climbing query optimization algorithm
  - Refinements of an initial feasible solution are recursively computed until no more cost improvements can be made
  - Semijoins, data replication, and fragmentation are not used
  - Devised for wide area point-to-point networks
  - The first distributed query processing algorithm

Hill-Climbing Algorithm/2

- The hill-climbing algorithm proceeds as follows
  1. Select initial feasible execution strategy ES0
     - i.e., a global execution schedule that includes all intersite communication
     - Determine the candidate result sites, where a relation referenced in the query exists
     - Compute the cost of transferring all the other referenced relations to each candidate site
     - ES0 = candidate site with minimum cost
  2. Split ES0 into two strategies: ES1 followed by ES2
     - ES1: send one of the relations involved in the join to the other relation's site
     - ES2: send the join result to the final result site
  3. Replace ES0 with the split schedule which gives
     cost(ES1) + cost(local join) + cost(ES2) < cost(ES0)
  4. Recursively apply steps 2 and 3 on ES1 and ES2 until no more benefit can be gained
  5. Check for redundant transmissions in the final plan and eliminate them

Hill-Climbing Algorithm/3

- Example: What are the salaries of engineers who work on the CAD/CAM project?

  $\Pi_{SAL}(PAY \bowtie_{TITLE} EMP \bowtie_{ENO} (ASG \bowtie_{PNO} (\sigma_{PNAME = \text{"CAD/CAM"}}(PROJ))))$

  - Schemas: EMP(ENO, ENAME, TITLE), ASG(ENO, PNO, RESP, DUR), PROJ(PNO, PNAME, BUDGET, LOC), PAY(TITLE, SAL)
  - Statistics

        Relation   Size   Site
        EMP           8      1
        PAY           4      2
        PROJ          1      3
        ASG          10      4

  - Assumptions:
    - Size of relations is defined as their cardinality
    - Minimize total cost
    - Transmission cost between two sites is 1
    - Ignore local processing cost
    - size(EMP $\bowtie$ PAY) = 8, size(PROJ $\bowtie$ ASG) = 2, size(ASG $\bowtie$ EMP) = 10

Hill-Climbing Algorithm/4

- Example (contd.): Determine initial feasible execution strategy
  - Alternative 1: Resulting site is site 1
    Total cost = cost(PAY $\to$ Site 1) + cost(ASG $\to$ Site 1) + cost(PROJ $\to$ Site 1) = 4 + 10 + 1 = 15
  - Alternative 2: Resulting site is site 2
    Total cost = 8 + 10 + 1 = 19
  - Alternative 3: Resulting site is site 3
    Total cost = 8 + 4 + 10 = 22
  - Alternative 4: Resulting site is site 4
    Total cost = 8 + 4 + 1 = 13
  - Therefore ES0 = EMP $\to$ Site 4; PAY $\to$ Site 4; PROJ $\to$ Site 4
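The selection of the initial feasible strategy ES0 can be reproduced with a few lines of Python. This sketch uses the statistics of the example (size and site per relation, transmission cost 1 per unit, local processing ignored); the function name `initial_strategy` is illustrative.

```python
# relation -> (size, site), taken from the example statistics
RELATIONS = {"EMP": (8, 1), "PAY": (4, 2), "PROJ": (1, 3), "ASG": (10, 4)}


def initial_strategy(relations):
    """Return (best_site, costs): the cost of shipping all other
    relations to each candidate result site, and the cheapest site."""
    costs = {}
    for site in sorted({site for _, site in relations.values()}):
        costs[site] = sum(size for size, home in relations.values()
                          if home != site)
    best_site = min(costs, key=costs.get)
    return best_site, costs


best_site, costs = initial_strategy(RELATIONS)
print(costs)
print("ES0: ship everything to site", best_site)
```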

Hill-Climbing Algorithm/5

- Example (contd.): Candidate split
  - Alternative 1: ES1, ES2, ES3
    - ES1: EMP $\to$ Site 2
    - ES2: (EMP $\bowtie$ PAY) $\to$ Site 4
    - ES3: PROJ $\to$ Site 4
    Total cost = cost(EMP $\to$ Site 2) + cost((EMP $\bowtie$ PAY) $\to$ Site 4) + cost(PROJ $\to$ Site 4) = 8 + 8 + 1 = 17
  - Alternative 2: ES1, ES2, ES3
    - ES1: PAY $\to$ Site 1
    - ES2: (PAY $\bowtie$ EMP) $\to$ Site 4
    - ES3: PROJ $\to$ Site 4
    Total cost = cost(PAY $\to$ Site 1) + cost((PAY $\bowtie$ EMP) $\to$ Site 4) + cost(PROJ $\to$ Site 4) = 4 + 8 + 1 = 13
- Neither alternative is better than ES0, so keep ES0 (or take alternative 2, which has the same cost)

Hill-Climbing Algorithm/6

- Problems
  - Greedy algorithm determines an initial feasible solution and iteratively improves it
  - If there are local minima, it may not find the global minimum
  - An optimal schedule with a high initial cost would not be found, since it won't be chosen as the initial feasible solution
- Example: A better schedule is
  - PROJ $\to$ Site 4
  - ASG' = (PROJ $\bowtie$ ASG) $\to$ Site 1
  - (ASG' $\bowtie$ EMP) $\to$ Site 2
  - Total cost = 1 + 2 + 2 = 5

SDD-1

- The SDD-1 algorithm extends the hill climbing algorithm with semijoins and has the following properties:
  - Considers semijoins
    - $cost(R \ltimes_A S) = C_{MSG} + size(\Pi_A(S)) \cdot C_{TR}$
    - $benefit(R \ltimes_A S) = (1 - SF_{\ltimes}(S.A)) \cdot size(R) \cdot C_{TR}$
  - Does not consider replication and fragmentation
  - Cost of transferring the result to the user site from the final result site is not considered
  - Can minimize either total time or response time
- The SDD-1 algorithm works with and updates a database profile:

      R     size(R)
      R1    1500
      R2    3000
      R3    2000

      A      SF_semijoin   size(Pi_A)
      R1.A   0.3           36
      R2.A   0.8           320
      R2.B   1.0           400
      R3.B   0.4           80

SDD-1 Algorithm

- Step 1: Include all local processing in the execution strategy ES.
- Step 2: Update database profile with effects of local processing.
- Step 3: Determine the beneficial semijoins, i.e., those with $cost(\ltimes_i) < benefit(\ltimes_i)$.
- Step 4: Remove the most beneficial semijoin and append it to ES.
- Step 5: Update the database profile.
- Step 6: Update the set of beneficial semijoins; possibly include new ones.
- Step 7: If there are beneficial semijoins go back to Step 4.
- Step 8: Find the site where the largest amount of data resides and select it as the result site.
- Step 9: For each $R_i$ at the result site, remove semijoins of the form $R_i \ltimes R_j$ where the total cost of ES without this semijoin is smaller than the cost with it.
- Step 10: Permute the order of semijoins if doing so would improve the total cost of ES.
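Steps 3 and 4 can be tried out on the database profile shown for SDD-1. The following Python sketch makes several assumptions that are not stated on the slides: the query joins R1 and R2 on A and R2 and R3 on B, $C_{MSG} = 0$, $C_{TR} = 1$, and "most beneficial" is taken to mean the largest difference between benefit and cost. It performs a single selection step and does not model the profile updates of Steps 5 and 6.

```python
# Database profile from the SDD-1 slide.
SIZE = {"R1": 1500, "R2": 3000, "R3": 2000}
SF = {"R1.A": 0.3, "R2.A": 0.8, "R2.B": 1.0, "R3.B": 0.4}
PROJ_SIZE = {"R1.A": 36, "R2.A": 320, "R2.B": 400, "R3.B": 80}

# Assumptions (not on the slide): join graph and cost constants.
C_MSG, C_TR = 0, 1
# Each candidate (r, s, a) stands for the semijoin r semijoin_a s.
CANDIDATES = [("R2", "R1", "A"), ("R1", "R2", "A"),
              ("R2", "R3", "B"), ("R3", "R2", "B")]


def cost(r, s, a):
    """cost(r semijoin_a s) = C_MSG + size(Pi_a(s)) * C_TR."""
    return C_MSG + PROJ_SIZE[f"{s}.{a}"] * C_TR


def benefit(r, s, a):
    """benefit(r semijoin_a s) = (1 - SF(s.a)) * size(r) * C_TR."""
    return (1 - SF[f"{s}.{a}"]) * SIZE[r] * C_TR


def beneficial(candidates):
    """Step 3: keep semijoins whose cost is below their benefit."""
    return [c for c in candidates if cost(*c) < benefit(*c)]


def most_beneficial(candidates):
    """Step 4 (assumed ranking): largest benefit minus cost."""
    return max(beneficial(candidates), key=lambda c: benefit(*c) - cost(*c))


for c in CANDIDATES:
    print(c, "cost:", cost(*c), "benefit:", round(benefit(*c), 1))
print("first semijoin appended to ES:", most_beneficial(CANDIDATES))
```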

Conclusion

- Distributed query optimization is more complex than centralized query processing, since
  - bushy query trees are not necessarily a bad choice
  - one needs to decide what, where, and how to ship the relations between the sites
- Query optimization searches for the optimal query plan (tree)
- For N relations, there are $O(N!)$ equivalent join trees. To cope with the complexity, heuristics and/or restricted types of trees are considered.
- There are two main strategies in query optimization: randomized and deterministic.
- Semi-joins can be used to implement a join. The semi-joins require more operations to perform, but the amount of data transferred is reduced.
- INGRES, System R and Hill Climbing are distributed query optimization algorithms.

Course Project

- Hand in of project: December 23, 2012
- Report
  - problem definition
  - running example
  - description of solution
  - evaluation
  - strengths, weaknesses, limitations
- Report (5 pages) and implementation (source code, data, steps to install and run) as zip/tar file
- Send by email to [email protected] and [email protected]

Course Exam

- Exam date: 16.01.2013
- Exam time: 12:15 - 12:45
- Exam location: BIN 2.E.13
- Exam form and procedure
  - oral, 20 minutes
  - 10 minutes about project (demo, code, algorithm)
  - 10 minutes about a topic of the course
- During exam: present solutions on examples
- Prepare suitable examples beforehand

Unit I Content Beyond Syllabus - I Introduction To Data Mining and Data Warehousing What Are Data Mining and Knowledge Discovery?
12 pages
Enhanced Data Models For Advanced Applications
91% (11)
Enhanced Data Models For Advanced Applications
15 pages
Experiment #2: Continuous-Time Signal Representation I. Objectives
No ratings yet
Experiment #2: Continuous-Time Signal Representation I. Objectives
14 pages
OM-Chapter 5
No ratings yet
OM-Chapter 5
38 pages
HR Organizer 2023
No ratings yet
HR Organizer 2023
112 pages
Database Management Systems: ©silberschatz, Korth and Sudarshan 1.1 Database System Concepts
No ratings yet
Database Management Systems: ©silberschatz, Korth and Sudarshan 1.1 Database System Concepts
33 pages
USN Name Seminar Topics Scheduled Date & Time Rescheduled Date & Time
No ratings yet
USN Name Seminar Topics Scheduled Date & Time Rescheduled Date & Time
2 pages
Disk Attachment: Host Attached Storage Network Attached Storage
No ratings yet
Disk Attachment: Host Attached Storage Network Attached Storage
23 pages
Diabetes Detection Using Deep Learning Algorithms: ICT Express November 2018
No ratings yet
Diabetes Detection Using Deep Learning Algorithms: ICT Express November 2018
5 pages
XML and Web Services Question Bank
No ratings yet
XML and Web Services Question Bank
28 pages
Distributed Transactions Management
100% (3)
Distributed Transactions Management
28 pages
Introduction To Spectral Theory
No ratings yet
Introduction To Spectral Theory
3 pages
Unit 2 - Selection Sort
No ratings yet
Unit 2 - Selection Sort
10 pages
Harshit DAA 3.2
No ratings yet
Harshit DAA 3.2
5 pages
Assignment 1 (
No ratings yet
Assignment 1 (
2 pages
Questions: 5
No ratings yet
Questions: 5
5 pages
S-19 - Random Variables and Bivariate Continuous Distributions
No ratings yet
S-19 - Random Variables and Bivariate Continuous Distributions
21 pages
Distributed File Systems
No ratings yet
Distributed File Systems
50 pages
Ju DG-Recon Depth-Guided Neural 3D Scene Reconstruction ICCV 2023 Paper
No ratings yet
Ju DG-Recon Depth-Guided Neural 3D Scene Reconstruction ICCV 2023 Paper
11 pages
Forests 14 02440
No ratings yet
Forests 14 02440
18 pages
CAT1 MCQs
No ratings yet
CAT1 MCQs
11 pages
Datsci Handbook
No ratings yet
Datsci Handbook
93 pages