
3 Distribution Design

Uploaded by

dynamogaming8055

Outline

 Introduction
 Background
 Distributed Database Design
 Fragmentation
 Data distribution
 Database Integration
 Semantic Data Control
 Distributed Query Processing
 Multidatabase Query Processing
 Distributed Transaction Management
 Data Replication
 Parallel Database Systems
 Distributed Object DBMS
 Peer-to-Peer Data Management
 Web Data Management
 Current Issues
Design Problem

 In the general setting :

Making decisions about the placement of data and programs
across the sites of a computer network as well as possibly
designing the network itself.

 In Distributed DBMS, the placement of applications

entails
 placement of the distributed DBMS software; and

 placement of the applications that run on the database

Dimensions of the Problem
Level of sharing: Three possibilities

• No sharing: Each application and its data execute at one site, and there
is no communication with any other program or access to any data file
at other sites. This characterizes the very early days of networking and
is probably not very common today.

• Data sharing: All the programs are replicated at all the sites,
but data files are not. Accordingly, user requests are handled at the
site where they originate and the necessary data files are moved around
the network.

• Data-plus-program sharing: both data and programs may be shared,

meaning that a program at a given site can request a service from
another program at a second site, which, in turn, may have to access a
data file located at a third site.
Dimensions of the Problem
Access pattern behavior

 It is possible to identify two alternatives.

 The access patterns of user requests may be static, so that they do

not change over time, or
 dynamic.

 It is easier to plan for and manage the static environments than would
be the case for dynamic distributed systems.

 Unfortunately, it is difficult to find many real-life distributed

applications that would be classified as static.
Dimensions of the Problem

Level of knowledge

 The third dimension of classification is the level of knowledge about the

access pattern behavior.

 One possibility is that the designers do not have any information about
how users will access the database.

 This is a theoretical possibility, but it is very difficult, if not impossible,

to design a distributed DBMS that can effectively cope with this
situation.

 The more practical alternatives are that the designers have complete
information, where the access patterns can reasonably be predicted and
do not deviate significantly from these predictions, or partial
information, where there are deviations from the predictions.
Distribution Design

 Top-down
 mostly in designing systems from scratch

 mostly in homogeneous systems

 Bottom-up
 when the databases already exist at a number of sites
Top-Down Design
Requirements analysis

 Requirements analysis that defines the environment of the

system and “elicits both the data and processing needs of all
potential database users”.

 The requirements study also specifies where the final system is

expected to stand with respect to the objectives of a
distributed DBMS.

 These objectives are defined with respect to performance,

reliability and availability, economics, and expandability
(flexibility).
Top-Down Design

View design and Conceptual design

• The requirements document is input to two parallel

activities:
 view design and
 conceptual design.

• The view design activity deals with defining the

interfaces for end users.

• Conceptual design is the process by which the
enterprise is examined to determine entity types and
relationships among these entities.
Top-Down Design
Entity analysis and Functional analysis

• One can possibly divide this process into two related

activity groups:
 entity analysis and
 functional analysis.
• Entity analysis is concerned with determining the
entities, their attributes, and the relationships among
them.
• Functional analysis is concerned with determining the
fundamental functions with which the modeled
enterprise is involved.
• The results of these two steps need to be cross-
referenced to get a better understanding of which
functions deal with which entities.
Top-Down Design
Statistical information

• In conceptual design and view design activities the

user needs to specify the data entities and must
determine the applications that will run on the
database as well as statistical information about
these applications.

• Statistical information includes the specification of

the frequency of user applications, the volume of
various information, and the like.
Top-Down Design
Distribution design

• The global conceptual schema (GCS) and access pattern

information collected as a result of view design are inputs to the
distribution design step.

• The objective at this stage, which is the focus of this chapter, is to

design the local conceptual schemas (LCSs) by distributing the
entities over the sites of the distributed system.

• It is possible to treat each entity as a unit of distribution.

Top-Down Design
Distribution design
• There is a relationship between the conceptual design and the
view design.

• In one sense, the conceptual design can be interpreted as

being an integration of user views.

• Even though this view integration activity is very important,

the conceptual model should support not only the existing
applications, but also future applications.

• View integration should be used to ensure that entity and

relationship requirements for all the views are covered in the
conceptual schema.
Top-Down Design
Distribution design

• Rather than distributing relations, it is quite common to divide

them into sub-relations, called fragments, which are then
distributed.
• Thus, the distribution design activity consists of two steps:
 fragmentation and
 allocation.
• The last step in the design process is the physical design,
which maps the local conceptual schemas to the physical
storage devices available at the corresponding sites.
• The inputs to this process are the local conceptual schema and
the access pattern information about the fragments in them.
Distribution Design Issues

 Why fragment at all?

 How to fragment?

 How much to fragment?

 How to test correctness?

 How to allocate?

 Information requirements?
Fragmentation

 Can't we just distribute relations?

 What is a reasonable unit of distribution?
 relation
 views are subsets of relations ⇒ locality
 extra communication

 fragments of relations (sub-relations)
 concurrent execution of a number of transactions that access different portions of a relation
 views that cannot be defined on a single fragment will require extra processing
 semantic data control (especially integrity enforcement) more difficult
Example
Relation Schemes
EMP
ENO ENAME TITLE SAL PNO RESP DUR

PROJ
PNO PNAME BUDGET

EMP(ENO, ENAME, TITLE, SAL, PNO, RESP, DUR)

PROJ (PNO, PNAME, BUDGET)

 Underlined attributes are relation keys (tuple identifiers).

 Tabular form
Example
Relation Instances
Example
Normalized Relations

Figure 3.3
Example
Transparent Access
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE

[Figure: five sites (Tokyo, Paris, Boston, Montreal, New York) connected by a communication network. Data stored at each site:]
Paris: Paris projects, Paris employees, Paris assignments, Boston employees
Boston: Boston projects, Boston employees, Boston assignments
Montreal: Montreal projects, Paris projects, New York projects with budget > 200000, Montreal employees, Montreal assignments
New York: Boston projects, New York employees, New York projects, New York assignments
Fragmentation Alternatives – Horizontal

PROJ1 : projects with budgets less than $200,000
PROJ2 : projects with budgets greater than or equal to $200,000

PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York

PROJ2
PNO  PNAME              BUDGET  LOC
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston
Example
Fragmentation Alternatives – Vertical
PROJ1: information about project budgets
PROJ2: information about project names and locations

PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1
PNO  BUDGET
P1   150000
P2   135000
P3   250000
P4   310000
P5   500000

PROJ2
PNO  PNAME              LOC
P1   Instrumentation    Montreal
P2   Database Develop.  New York
P3   CAD/CAM            New York
P4   Maintenance        Paris
P5   CAD/CAM            Boston
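The vertical split above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are modelled as plain tuples, and the helper names `project` and `join_on_key` are hypothetical, not part of any DBMS API. It shows that projecting PROJ onto (PNO, BUDGET) and (PNO, PNAME, LOC) and joining the two fragments on the replicated key PNO gives back the original relation.

```python
# PROJ instance from the slides: (PNO, PNAME, BUDGET, LOC)
COLUMNS = ("PNO", "PNAME", "BUDGET", "LOC")
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]


def project(rows, columns, keep):
    """Relational projection: keep only the named columns (result is a set)."""
    idx = [columns.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def join_on_key(left, right):
    """Join two fragments whose first column is the shared key."""
    right_by_key = {r[0]: r[1:] for r in right}
    return {l + right_by_key[l[0]] for l in left if l[0] in right_by_key}


# Vertical fragments; the key PNO is replicated in both.
PROJ1 = project(PROJ, COLUMNS, ("PNO", "BUDGET"))
PROJ2 = project(PROJ, COLUMNS, ("PNO", "PNAME", "LOC"))

# The join yields (PNO, BUDGET, PNAME, LOC); restore the original column order.
rebuilt = {
    (pno, pname, budget, loc)
    for pno, budget, pname, loc in join_on_key(PROJ1, PROJ2)
}
print(rebuilt == set(PROJ))
```

Replicating the key in every fragment is what makes the join-based reconstruction possible.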
Degree of Fragmentation

There is a finite number of alternatives, ranging from individual tuples or attributes at one extreme to whole relations at the other.

The problem is finding the suitable level of partitioning within this range.
Correctness of Fragmentation

 Completeness
 Decomposition of relation R into fragments R1, R2, ..., Rn is
complete if and only if each data item in R can also be
found in some Ri
 Reconstruction
 If relation R is decomposed into fragments R1, R2, ..., Rn,
then there should exist some relational operator ∇ such
that
$R = \nabla_{1 \le i \le n} R_i$
 Disjointness
 If relation R is decomposed into fragments R1, R2, ..., Rn,
and data item di is in Rj, then di should not be in any other
fragment Rk (k ≠ j ).
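The three rules can be checked mechanically for a horizontal fragmentation. The sketch below is a minimal illustration, assuming relations are lists of tuples, that the reconstruction operator ∇ is set union (as it is for horizontal fragments), and that the function names are hypothetical. It uses the PROJ instance and the budget-based split from the earlier example.

```python
# PROJ instance from the slides: (PNO, PNAME, BUDGET, LOC)
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]

# Horizontal fragments: budget below / at or above 200000.
PROJ1 = [t for t in PROJ if t[2] < 200000]
PROJ2 = [t for t in PROJ if t[2] >= 200000]
FRAGMENTS = [PROJ1, PROJ2]


def is_complete(relation, fragments):
    """Every tuple of the relation appears in at least one fragment."""
    return all(any(t in f for f in fragments) for t in relation)


def reconstructs(relation, fragments):
    """The union of the fragments equals the original relation."""
    return set().union(*map(set, fragments)) == set(relation)


def is_disjoint(fragments):
    """No tuple appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        for t in set(fragment):
            if t in seen:
                return False
            seen.add(t)
    return True


print(is_complete(PROJ, FRAGMENTS), reconstructs(PROJ, FRAGMENTS), is_disjoint(FRAGMENTS))
```

For vertical fragments the same checks apply to attributes rather than tuples, with join in place of union.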
Allocation Alternatives

 Non-replicated
 partitioned : each fragment resides at only one site
 Replicated
 fully replicated : each fragment at each site
 partially replicated : each fragment at some of the sites
 Rule of thumb:
If (read-only queries / update queries) ≥ 1, replication is advantageous;
otherwise replication may cause problems.
Comparison of Replication Alternatives
                       Full replication      Partial replication   Partitioning
QUERY PROCESSING       Easy                  Same difficulty       Same difficulty
DIRECTORY MANAGEMENT   Easy or nonexistent   Same difficulty       Same difficulty
CONCURRENCY CONTROL    Moderate              Difficult             Easy
RELIABILITY            Very high             High                  Low
REALITY                Possible application  Realistic             Possible application
Information Requirements

 Four categories:
 Database information
 Application information
 Communication network information
 Computer system information
Fragmentation

 Horizontal Fragmentation (HF)

 Primary Horizontal Fragmentation (PHF)
 Derived Horizontal Fragmentation (DHF)

 Vertical Fragmentation (VF)

 Hybrid Fragmentation (HF)
PHF – Information Requirements
 Database Information
 relationship

SKILL(TITLE, SAL)
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET, LOC)
ASG(ENO, PNO, RESP, DUR)

Link L1: owner SKILL, member EMP

 cardinality of each relation: card(R)

PHF - Information Requirements
 Application Information
 simple predicates : Given R[A1, A2, …, An], a simple predicate pj is

$p_j : A_i \;\theta\; \mathit{Value}$
where $\theta \in \{=, <, \le, >, \ge, \ne\}$, $\mathit{Value} \in D_i$, and $D_i$ is the domain of $A_i$.
For relation R we define $Pr = \{p_1, p_2, \ldots, p_m\}$
Example :
PNAME = "Maintenance"
BUDGET ≤ 200000
 minterm predicates : Given R and Pr = {p1, p2, …,pm}
define $M = \{m_1, m_2, \ldots, m_z\}$ as

$M = \{\, m_i \mid m_i = \bigwedge_{p_j \in Pr} p_j^* \,\}, \quad 1 \le j \le m, \; 1 \le i \le z$

where $p_j^* = p_j$ or $p_j^* = \neg(p_j)$.
PHF – Information Requirements

Example

m1: PNAME="Maintenance" ∧ BUDGET≤200000

m2: NOT(PNAME="Maintenance") ∧ BUDGET≤200000

m3: PNAME="Maintenance" ∧ NOT(BUDGET≤200000)

m4: NOT(PNAME="Maintenance") ∧ NOT(BUDGET≤200000)
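Enumerating minterms is purely mechanical: each of the m simple predicates appears either in natural or in negated form, giving 2^m conjunctions. The following is a minimal sketch, assuming predicates are represented as plain strings; the helper names `minterms` and `render` are hypothetical. The sign tuples are reversed so that the first predicate varies fastest, which reproduces the m1 to m4 ordering of the example.

```python
from itertools import product


def minterms(simple_predicates):
    """Return all 2^m minterms as lists of (predicate, is_natural_form) pairs."""
    m = len(simple_predicates)
    result = []
    for signs in product((True, False), repeat=m):
        # Reverse so that the first predicate toggles fastest (slide ordering).
        signs = signs[::-1]
        result.append(list(zip(simple_predicates, signs)))
    return result


def render(minterm):
    """Pretty-print a minterm as a conjunction of natural or negated predicates."""
    return " AND ".join(p if natural else f"NOT({p})" for p, natural in minterm)


Pr = ['PNAME = "Maintenance"', "BUDGET <= 200000"]
rendered = [render(m) for m in minterms(Pr)]
for i, text in enumerate(rendered, start=1):
    print(f"m{i}: {text}")
```

In practice many of the 2^m minterms are contradictory and are eliminated before fragments are defined.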

Example
Consider relation PAY of Figure 3.3. The following are some of the
possible simple predicates that can be defined on PAY.
PHF – Information Requirements
 Application Information
 minterm selectivities: sel(mi)
 The number of tuples of the relation that would
be accessed by a user query which is specified
according to a given minterm predicate mi.
 access frequencies: acc(qi)
 The frequency with which a user application qi
accesses data.
 Access frequency for a minterm predicate can
also be defined.
Primary Horizontal Fragmentation

Definition :
$R_j = \sigma_{F_j}(R), \quad 1 \le j \le w$
where Fj is a selection formula, which is (preferably) a minterm
predicate.
Therefore,
A horizontal fragment Ri of relation R consists of all the tuples of R
which satisfy a minterm predicate mi.


Given a set of minterm predicates M, there are as many horizontal
fragments of relation R as there are minterm predicates.
Set of horizontal fragments also referred to as minterm fragments.
PHF – Algorithm

Given: A relation R, the set of simple predicates Pr

Output: The set of fragments of R = {R1, R2,…,Rw} which
obey the fragmentation rules.

Preliminaries :
 Pr should be complete
 Pr should be minimal
Example

We assume that the non-negativity of the BUDGET values is a feature of the relation that is enforced by an
integrity constraint. Otherwise, a simple predicate of the form 0 ≤ BUDGET also needs to be included in Pr.

Example 3.7 demonstrates one of the problems of horizontal partitioning. If the
domains of the attributes participating in the selection formulas are continuous
and infinite, as in Example 3.7, it is quite difficult to define the set of formulas F
= {F1, F2, ..., Fn} that would fragment the relation properly. One possible
course of action is to define ranges as we have done in Example 3.7. However,
there is always the problem of handling the two endpoints. For example, if a new
tuple with a BUDGET value of, say, $600,000 were to be inserted into PROJ, one
would have to review the fragmentation to decide if the new tuple is to go
into PROJ2 or if the fragments need to be revised and a new fragment needs to be
defined as
Example
Completeness of Simple Predicates
 A set of simple predicates Pr is said to be complete if and only if the
accesses to the tuples of any minterm fragment defined on Pr are such
that any two tuples of the same minterm fragment have the same probability
of being accessed by any application.

 Example :
 Assume PROJ[PNO,PNAME,BUDGET,LOC] has two applications defined on it.
 Find the budgets of projects at each location. (1)
 Find projects with budgets less than $200000. (2)
Completeness of Simple Predicates

According to (1),
Pr={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”}

which is not complete with respect to (2).

Modify
Pr ={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”, BUDGET≤200000,BUDGET>200000}

which is complete.
Minimality of Simple Predicates

 If a predicate influences how fragmentation is

performed, (i.e., causes a fragment f to be further
fragmented into, say, fi and fj) then there should be at
least one application that accesses fi and fj differently.
 In other words, the simple predicate should be relevant
in determining a fragmentation.
 If all the predicates of a set Pr are relevant, then Pr is
minimal.
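A simple predicate pi is relevant if and only if, for the two minterms mi and mj that differ only in pi (natural form in mi, negated in mj) and their fragments fi and fj,
$\frac{acc(m_i)}{card(f_i)} \ne \frac{acc(m_j)}{card(f_j)}$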
Minimality of Simple Predicates

Example :
Pr ={LOC=“Montreal”,LOC=“New York”, LOC=“Paris”,
BUDGET≤200000,BUDGET>200000}

is minimal (in addition to being complete). However, if we add

PNAME = “Instrumentation”

then Pr is not minimal.

Exercises-Example
Exercises-Solution
Exercises-Example
Exercises-Solution
Exercises-Solution
Exercises-Example
Exercises-Solution
COM_MIN Algorithm
Given: a relation R and a set of simple predicates Pr
Output: a complete and minimal set of simple predicates Pr'
for Pr

Rule 1: a relation or fragment is partitioned into at least two

parts which are accessed differently by at least one
application.
COM_MIN Algorithm
 Initialization :
 find a pi ∈ Pr such that pi partitions R according to Rule 1
 set Pr' = pi ; Pr ← Pr – {pi} ; F ← {fi}
 Iteratively add predicates to Pr' until it is complete
 find a pj ∈ Pr such that pj partitions some fk defined
according to minterm predicate over Pr' according to Rule 1
 set Pr' = Pr' ∪ {pj}; Pr ← Pr – {pj}; F ← F ∪ {fj}
 if ∃ pk ∈ Pr' which is nonrelevant then
Pr' ← Pr' – {pk}
F ← F – {fk}
COM_MIN Algorithm (detail)
PHORIZONTAL Algorithm
Makes use of COM_MIN to perform fragmentation.
Input: a relation R and a set of simple predicates Pr
Output: a set of minterm predicates M according to which
relation R is to be fragmented

 Pr' ← COM_MIN (R,Pr)

 determine the set M of minterm predicates
 determine the set I of implications among pi ∈ Pr
 eliminate the contradictory minterms from M
PHORIZONTAL Algorithm (detail)
PHF – Example
 Two candidate relations : PAY and PROJ.
 Fragmentation of relation PAY
 Application: Check the salary info and determine raise.
 Employee records kept at two sites  application run at
two sites
 Simple predicates
p1 : SAL ≤ 30000
p2 : SAL > 30000
Pr = {p1,p2}, which is complete and minimal, so Pr' = Pr
 Minterm predicates
m1 : (SAL ≤ 30000)
m2 : NOT(SAL ≤ 30000) = (SAL > 30000)
PHF – Example

PAY1
TITLE        SAL
Mech. Eng.   27000
Programmer   24000

PAY2
TITLE        SAL
Elect. Eng.  40000
Syst. Anal.  34000
PHF – Example
 Fragmentation of relation PROJ
 Applications:
 Find the name and budget of projects given their location.
 Issued at three sites
 Access project information according to budget
 one site accesses ≤200000 other accesses
>200000
 Simple predicates
 For application (1)
p1 : LOC = “Montreal”
p2 : LOC = “New York”
p3 : LOC = “Paris”
 For application (2)
p4 : BUDGET ≤ 200000
p5 : BUDGET > 200000
 Pr = Pr' = {p1,p2,p3,p4,p5}
PHF – Example

 Fragmentation of relation PROJ continued

 Minterm fragments left after elimination
m1 : (LOC = “Montreal”) ∧ (BUDGET ≤ 200000)
m2 : (LOC = “Montreal”) ∧ (BUDGET > 200000)
m3 : (LOC = “New York”) ∧ (BUDGET ≤ 200000)
m4 : (LOC = “New York”) ∧ (BUDGET > 200000)
m5 : (LOC = “Paris”) ∧ (BUDGET ≤ 200000)
m6 : (LOC = “Paris”) ∧ (BUDGET > 200000)
PHF – Example

PROJ1
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal

PROJ3
PNO  PNAME              BUDGET  LOC
P2   Database Develop.  135000  New York

PROJ4
PNO  PNAME              BUDGET  LOC
P3   CAD/CAM            250000  New York

PROJ6
PNO  PNAME              BUDGET  LOC
P4   Maintenance        310000  Paris
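The PROJ example can be replayed as a short Python sketch. The assumptions are stated loudly: the PROJ instance is restricted to the four tuples P1 to P4 that appear in the fragments, relations are lists of tuples, and naming each fragment after the index of its minterm (PROJ1 for m1, PROJ3 for m3, and so on) is a convention of this sketch. Minterm fragments that select no tuple are simply dropped.

```python
# PROJ tuples used in the PHF example: (PNO, PNAME, BUDGET, LOC)
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
]

LOCATIONS = ("Montreal", "New York", "Paris")

# The six minterms left after eliminating contradictory ones (m1 .. m6).
MINTERMS = []
for loc in LOCATIONS:
    MINTERMS.append(
        (f'LOC = "{loc}" AND BUDGET <= 200000',
         lambda t, loc=loc: t[3] == loc and t[2] <= 200000)
    )
    MINTERMS.append(
        (f'LOC = "{loc}" AND BUDGET > 200000',
         lambda t, loc=loc: t[3] == loc and t[2] > 200000)
    )

# Apply each minterm as a selection; keep only non-empty fragments.
fragments = {}
for i, (label, predicate) in enumerate(MINTERMS, start=1):
    rows = [t for t in PROJ if predicate(t)]
    if rows:
        fragments[f"PROJ{i}"] = rows

for name, rows in fragments.items():
    print(name, [t[0] for t in rows])
```

Because the minterms are mutually exclusive, every tuple lands in exactly one fragment.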
PHF – Correctness
 Completeness
 Since Pr' is complete and minimal, the selection predicates are
complete

 Reconstruction
 If relation R is fragmented into FR = {R1,R2,…,Rr}

$R = \bigcup_{\forall R_i \in F_R} R_i$
 Disjointness
 Minterm predicates that form the basis of fragmentation should be
mutually exclusive.
Derived Horizontal Fragmentation

 Defined on a member relation of a link according to a selection operation

specified on its owner.
 Each link is an equijoin.
 Equijoin can be implemented by means of semijoins.

SKILL(TITLE, SAL)
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET, LOC)
ASG(ENO, PNO, RESP, DUR)

Link L1: owner SKILL, member EMP
Link L2: owner EMP, member ASG
Link L3: owner PROJ, member ASG
DHF – Definition
Given a link L where owner(L)=S and member(L)=R, the derived horizontal
fragments of R are defined as

$R_i = R \ltimes S_i, \quad 1 \le i \le w$
where w is the maximum number of fragments that will be defined on R
and

$S_i = \sigma_{F_i}(S)$
where Fi is the formula according to which the primary horizontal
fragment Si is defined.
DHF – Example
Given link L1 where owner(L1)=PAY and member(L1)=EMP
EMP1 = EMP ⋉ PAY1

EMP2 = EMP ⋉ PAY2

where
$PAY_1 = \sigma_{SAL \le 30000}(PAY)$
$PAY_2 = \sigma_{SAL > 30000}(PAY)$

EMP1
ENO  ENAME      TITLE
E3   A. Lee     Mech. Eng.
E4   J. Miller  Programmer
E7   R. Davis   Mech. Eng.

EMP2
ENO  ENAME      TITLE
E1   J. Doe     Elect. Eng.
E2   M. Smith   Syst. Anal.
E5   B. Casey   Syst. Anal.
E6   L. Chu     Elect. Eng.
E8   J. Jones   Syst. Anal.
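The derivation of EMP1 and EMP2 is a semijoin of EMP with each PAY fragment on TITLE. A minimal sketch follows, assuming relations are lists of tuples; the PAY and EMP rows are taken from the fragment tables in these slides, and the helper name `semijoin` is hypothetical.

```python
# PAY(TITLE, SAL) and EMP(ENO, ENAME, TITLE), as shown in the slides' fragments.
PAY = [
    ("Elect. Eng.", 40000),
    ("Syst. Anal.", 34000),
    ("Mech. Eng.", 27000),
    ("Programmer", 24000),
]
EMP = [
    ("E1", "J. Doe", "Elect. Eng."),
    ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."),
    ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."),
    ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."),
    ("E8", "J. Jones", "Syst. Anal."),
]


def semijoin(member, owner_fragment, member_col, owner_col):
    """Keep the member tuples whose join attribute matches some owner tuple."""
    keys = {t[owner_col] for t in owner_fragment}
    return [t for t in member if t[member_col] in keys]


# Primary horizontal fragments of the owner relation PAY.
PAY1 = [t for t in PAY if t[1] <= 30000]
PAY2 = [t for t in PAY if t[1] > 30000]

# Derived horizontal fragments of the member relation EMP (join attribute TITLE).
EMP1 = semijoin(EMP, PAY1, member_col=2, owner_col=0)
EMP2 = semijoin(EMP, PAY2, member_col=2, owner_col=0)

print([e[0] for e in EMP1])
print([e[0] for e in EMP2])
```

Since every EMP title appears in exactly one PAY fragment, the derived fragments are complete and disjoint here.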
DHF – Correctness
 Completeness
 Referential integrity
 Let R be the member relation of a link whose owner is
relation S which is fragmented as FS = {S1, S2, ..., Sn}.
Furthermore, let A be the join attribute between R and S.
Then, for each tuple t of R, there should be a tuple t' of S
such that
t[A] = t' [A]
 Reconstruction
 Same as primary horizontal fragmentation.
 Disjointness
 Simple join graphs between the owner and the member
fragments.
Example
Let us continue with the distribution design of the database we started in Example
3.11. We already decided on the fragmentation of relation EMP according to the
fragmentation of PAY (Example 3.12
Example
Let us now consider ASG. Assume that there are the following two
applications:
Example
Example
Example
Exercises-Example

Given relation PAY as in Figure 3.3, let p1: SAL < 30000 and
p2: SAL ≥ 30000 be two simple predicates. Perform a
horizontal fragmentation of PAY with respect to these
predicates to obtain PAY1, and PAY2. Using the fragmentation
of PAY, perform further derived horizontal fragmentation for
EMP. Show completeness, reconstruction, and disjointness of
the fragmentation of EMP.
Exercises-Solution
Exercises-Solution
Vertical Fragmentation
 Has been studied within the centralized context
 design methodology
 physical clustering
 More difficult than horizontal, because more
alternatives exist.
Two approaches :
 grouping
 attributes to fragments
 splitting
 relation to fragments
Vertical Fragmentation
 Overlapping fragments
 grouping
 Non-overlapping fragments
 splitting
We do not consider the replicated key attributes to
be overlapping.
Advantage:
Easier to enforce functional dependencies
(for integrity checking etc.)
VF – Information Requirements

 Application Information
 Attribute affinities
 a measure that indicates how closely related the attributes are
 This is obtained from more primitive usage data
 Attribute usage values
 Given a set of queries Q = {q1, q2,…, qq} that will run on the relation
R[A1, A2,…, An],

$use(q_i, A_j) = \begin{cases} 1 & \text{if attribute } A_j \text{ is referenced by query } q_i \\ 0 & \text{otherwise} \end{cases}$

use(qi,•) can be defined accordingly

VF – Definition of use(qi,Aj)

Consider the following 4 queries for relation PROJ

q1: SELECT BUDGET
    FROM PROJ
    WHERE PNO=Value
q2: SELECT PNAME,BUDGET
    FROM PROJ
q3: SELECT PNAME
    FROM PROJ
    WHERE LOC=Value
q4: SELECT SUM(BUDGET)
    FROM PROJ
    WHERE LOC=Value
Let A1= PNO, A2= PNAME, A3= BUDGET, A4= LOC

A1 A2 A3 A4
q1 1 0 1 0
q2 0 1 1 0
q3 0 1 0 1
q4 0 0 1 1
VF – Affinity Measure aff(Ai,Aj)
The attribute affinity measure between two attributes Ai and Aj of a
relation R[A1, A2, …, An] with respect to the set of applications Q = (q1, q2,
…, qq) is defined as follows :

$aff(A_i, A_j) = \sum_{\text{all queries that access } A_i \text{ and } A_j} (\text{query access})$

$\text{query access} = \sum_{\text{all sites}} \text{access frequency of a query} \times \frac{\text{access}}{\text{execution}}$
VF – Calculation of aff(Ai, Aj)

Assume each query in the previous example accesses the attributes once during each execution.
Also assume the access frequencies

     S1  S2  S3
q1   15  20  10
q2    5   0   0
q3   25  25  25
q4    3   0   0

Then
aff(A1, A3) = 15*1 + 20*1 + 10*1 = 45
and the attribute affinity matrix AA is

     A1  A2  A3  A4
A1   45   0  45   0
A2    0  80   5  75
A3   45   5  53   3
A4    0  75   3  78
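The affinity matrix can be recomputed from the attribute usage matrix and the per-site access frequencies. The sketch below assumes, as the example does, one attribute access per query execution; the function name `affinity_matrix` and its `refs_per_execution` parameter are hypothetical.

```python
# use(qi, Aj): rows are q1..q4, columns are A1=PNO, A2=PNAME, A3=BUDGET, A4=LOC.
USE = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]

# Access frequencies of q1..q4 at sites S1, S2, S3.
ACC = [
    [15, 20, 10],
    [5, 0, 0],
    [25, 25, 25],
    [3, 0, 0],
]


def affinity_matrix(use, acc, refs_per_execution=1):
    """aff(Ai, Aj) = sum, over queries using both Ai and Aj, of their total accesses."""
    n_attrs = len(use[0])
    # Total accesses of each query, summed over all sites.
    total = [sum(site_freqs) * refs_per_execution for site_freqs in acc]
    return [
        [
            sum(total[q] for q in range(len(use)) if use[q][i] and use[q][j])
            for j in range(n_attrs)
        ]
        for i in range(n_attrs)
    ]


AA = affinity_matrix(USE, ACC)
for row in AA:
    print(row)
```

The result is symmetric, and its diagonal holds the total accesses to each attribute.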
VF – Clustering Algorithm

 Take the attribute affinity matrix AA and reorganize the

attribute orders to form clusters where the attributes in
each cluster demonstrate high affinity to one another.
 Bond Energy Algorithm (BEA) has been used for
clustering of entities. BEA finds an ordering of entities
(in our case attributes) such that the global affinity
measure is maximized.

$AM = \sum_i \sum_j (\text{affinity of } A_i \text{ and } A_j \text{ with their neighbors})$
Bond Energy Algorithm
Input: The AA matrix
Output: The clustered affinity matrix CA which is a perturbation of AA
 Initialization: Place and fix one of the columns of AA in CA.
 Iteration: Place the remaining n-i columns in the remaining i+1
positions in the CA matrix. For each column, choose the placement
that makes the most contribution to the global affinity measure.
 Row order: Order the rows according to the column ordering.
Bond Energy Algorithm (detail)
Bond Energy Algorithm

“Best” placement? Define contribution of a placement:

$cont(A_i, A_k, A_j) = 2\,bond(A_i, A_k) + 2\,bond(A_k, A_j) - 2\,bond(A_i, A_j)$

where
$bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x)\, aff(A_z, A_y)$
BEA – Example

Note that the diagonal values are not computed since

they are meaningless.

Attribute Affinity Matrix

BEA – Example
Let us consider the AA matrix i.e. Attribute Affinity Matrix and study the contribution of
moving attribute A4 between attributes A1 and A2 given by the formula
BEA – Example
Consider the following AA matrix and the corresponding CA matrix where A1 and
A2 have been placed. Place A3:

Ordering (0-3-1) :
cont(A0,A3,A1) = 2bond(A0 , A3)+2bond(A3 , A1)–2bond(A0 , A1)
= 2* 0 + 2* 4410 – 2*0 = 8820
Ordering (1-3-2) :
cont(A1,A3,A2) = 2bond(A1 , A3)+2bond(A3 , A2)–2bond(A1,A2)
= 2* 4410 + 2* 890 – 2*225 = 10150
Ordering (2-3-4) :
cont (A2,A3,A4) = 1780
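The contributions above can be reproduced with a short sketch of the bond energy computation. Assumptions: the AA matrix is the one computed for the PROJ example, attributes are 0-indexed (A1 is index 0), a boundary position or a not-yet-placed attribute is represented by `None` and contributes a bond of 0, and the first two columns are fixed as A1, A2. The function names are hypothetical.

```python
# Attribute affinity matrix AA of the PROJ example (A1..A4).
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    """bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay); 0 at a boundary."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, left, new, right):
    """Net contribution of placing column `new` between `left` and `right`."""
    return 2 * bond(aa, left, new) + 2 * bond(aa, new, right) - 2 * bond(aa, left, right)


def bea_order(aa):
    """Greedy column ordering: fix A1, A2, then insert each column at its best slot."""
    order = [0, 1]
    for new in range(2, len(aa)):
        padded = [None] + order + [None]
        best_pos = max(
            range(len(order) + 1),
            key=lambda p: cont(aa, padded[p], new, padded[p + 1]),
        )
        order.insert(best_pos, new)
    return order


print(cont(AA, None, 2, 0), cont(AA, 0, 2, 1), cont(AA, 1, 2, None))
print([f"A{i + 1}" for i in bea_order(AA)])
```

Rows are then permuted in the same order as the columns to obtain the clustered affinity matrix.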
BEA – Example

 Therefore, the CA matrix has the form

A1  A3  A2
45  45   0
 0   5  80
45  53   5
 0   3  75

 When A4 is placed, the final form of the CA matrix (after row organization) is

     A1  A3  A2  A4
A1   45  45   0   0
A3   45  53   5   3
A2    0   5  80  75
A4    0   3  75  78
BEA – Example
BEA – Example

Note: Although, as noted in the book, it doesn’t make sense to compute the
diagonal values in the AA matrix, we show them here since they are used in the
following calculations.
Now we start applying the BEA algorithm. We first fix the first two columns and
the Clustered Affinity (CA) Matrix looks like the following:
BEA – Example
Next we consider placing A3 – there are three places where it can be placed:
(a) to the left of A1, which has a contribution of 16950;
(b) in between A1 and A2, which has a contribution of 22050, and
(c) to the right of A2, which has a contribution of 21450.
Thus, the best ordering is A1, A3, A2, resulting in the following CA matrix:
VF – Algorithm
How can you divide a set of clustered attributes {A1, A2, …, An} into
two (or more) sets {A1, A2, …, Ai} and {Ai+1, …, An} such that there are no
(or minimal) applications that access both (or more than one) of the
sets?

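[Figure: the clustered affinity matrix with attributes A1 … Am on both axes. A split point on the diagonal after Ai separates the top-left block TA (attributes A1 … Ai) from the bottom-right block BA (attributes Ai+1 … Am).]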
VF – Algorithm
Define
TQ = set of applications that access only TA
BQ = set of applications that access only BA
OQ = set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications that access only TA
CBQ = total number of accesses to attributes by applications that access only BA
COQ = total number of accesses to attributes by applications that access both TA and BA
Then find the point along the diagonal that maximizes
$CTQ \times CBQ - COQ^2$
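The search for the split point can be sketched for the PROJ example. Assumptions of this sketch: the clustered attribute order is A1, A3, A2, A4, each query accesses its attributes once per execution (so the access totals are the per-query sums 45, 5, 75, 3 over all sites), and the function names are hypothetical. Attributes are 0-indexed.

```python
# use(qi, Aj) for q1..q4 over A1..A4, and total accesses of each query.
USE = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
ACC_TOTAL = [45, 5, 75, 3]

# Clustered attribute order A1, A3, A2, A4.
ORDER = [0, 2, 1, 3]


def split_score(use, acc_total, order, point):
    """CTQ * CBQ - COQ^2 for TA = order[:point], BA = order[point:]."""
    top, bottom = set(order[:point]), set(order[point:])
    ctq = cbq = coq = 0
    for q, row in enumerate(use):
        attrs = {j for j, used in enumerate(row) if used}
        if attrs <= top:        # query touches only TA
            ctq += acc_total[q]
        elif attrs <= bottom:   # query touches only BA
            cbq += acc_total[q]
        else:                   # query touches both TA and BA
            coq += acc_total[q]
    return ctq * cbq - coq ** 2


def best_split(use, acc_total, order):
    """Try every split point along the diagonal and keep the best one."""
    return max(range(1, len(order)), key=lambda p: split_score(use, acc_total, order, p))


point = best_split(USE, ACC_TOTAL, ORDER)
print(point, split_score(USE, ACC_TOTAL, ORDER, point))
print([f"A{i + 1}" for i in ORDER[:point]], [f"A{i + 1}" for i in ORDER[point:]])
```

Under these assumptions the best split separates {A1, A3} from {A2, A4}; the key A1 would then be replicated in both fragments.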
VF – Algorithm
Two problems :
Cluster forming in the middle of the CA matrix
 Shift a row up and a column left and apply the algorithm to find the “best”
partitioning point
 Do this for all possible shifts
 Cost $O(m^2)$
More than two clusters
 m-way partitioning
 try 1, 2, …, m–1 split points along diagonal and try to find the best point for
each of these
 Cost $O(2^m)$
VF – Correctness
A relation R, defined over attribute set A and key K, generates the vertical
partitioning FR = {R1, R2, …, Rr}.
 Completeness
 The following should be true for A:

$A = \bigcup A_{R_i}$
 Reconstruction
 Reconstruction can be achieved by

$R = \bowtie_K R_i, \quad \forall R_i \in F_R$

 Disjointness
 TID's are not considered to be overlapping since they are maintained by the system
 Duplicated keys are not considered to be overlapping
Hybrid Fragmentation

Uses a combination of horizontal and vertical fragmentation to generate the

fragments we need.
Two approaches
1. Generate a set of horizontal fragments and then vertically fragment one or
more of these horizontal fragments.
2. Generate a set of vertical fragments and then horizontally fragment one or
more of these vertical fragments.
Either way, the final fragments produced are the same.
This fragmentation approach provides for the most flexibility for the
designers but at the same time it is the most expensive approach with
respect to reconstruction of the original table.
Example

The nonfragmented version of the EMP table.

Example
Let’s assume that employee salary information needs to be maintained in a
separate fragment from the nonsalary information.
• A vertical fragmentation plan will generate the EMP_SAL and EMP_NON_SAL
vertical fragments.
• The nonsalary information needs to be fragmented into horizontal fragments,
where each fragment contains only the rows that match the city where the
employees work.
• We can achieve this by applying horizontal fragmentation to the
EMP_NON_SAL fragment of the EMP table.
The following three SQL statements show how this is achieved.

CREATE TABLE NON_SAL_MPLS_EMPS AS
SELECT *
FROM EMP_NON_SAL
WHERE Loc = 'Minneapolis';

CREATE TABLE NON_SAL_LA_EMPS AS
SELECT *
FROM EMP_NON_SAL
WHERE Loc = 'LA';
Example
CREATE TABLE NON_SAL_NY_EMPS AS
SELECT *
FROM EMP_NON_SAL
WHERE Loc = 'New York';
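The hybrid plan can be tried end to end with Python's built-in sqlite3 module. This is a sketch under loudly stated assumptions: the EMP table of the example is not reproduced in these slides, so the column names (EmpID, Name, Loc, Sal) and the sample rows below are invented for illustration only. The sketch builds the vertical fragments, splits EMP_NON_SAL by city, and checks that a union followed by a join on EmpID rebuilds EMP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical EMP schema and rows (not from the slides).
cur.execute("CREATE TABLE EMP (EmpID INTEGER PRIMARY KEY, Name TEXT, Loc TEXT, Sal INTEGER)")
cur.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Minneapolis", 50000),
        (2, "Bob", "LA", 60000),
        (3, "Cho", "New York", 70000),
        (4, "Dee", "LA", 55000),
    ],
)

# Step 1: vertical fragmentation, with the key replicated in both fragments.
cur.execute("CREATE TABLE EMP_SAL AS SELECT EmpID, Sal FROM EMP")
cur.execute("CREATE TABLE EMP_NON_SAL AS SELECT EmpID, Name, Loc FROM EMP")

# Step 2: horizontal fragmentation of EMP_NON_SAL by city.
# Table names and city literals are fixed constants, so string formatting is safe here.
CITY_TABLES = [
    ("NON_SAL_MPLS_EMPS", "Minneapolis"),
    ("NON_SAL_LA_EMPS", "LA"),
    ("NON_SAL_NY_EMPS", "New York"),
]
for table, city in CITY_TABLES:
    cur.execute(f"CREATE TABLE {table} AS SELECT * FROM EMP_NON_SAL WHERE Loc = '{city}'")

# Reconstruction: union the horizontal fragments, then join with EMP_SAL on the key.
rebuilt = cur.execute(
    """
    SELECT n.EmpID, n.Name, n.Loc, s.Sal
    FROM (SELECT * FROM NON_SAL_MPLS_EMPS
          UNION ALL SELECT * FROM NON_SAL_LA_EMPS
          UNION ALL SELECT * FROM NON_SAL_NY_EMPS) AS n
    JOIN EMP_SAL AS s ON s.EmpID = n.EmpID
    ORDER BY n.EmpID
    """
).fetchall()
original = cur.execute("SELECT EmpID, Name, Loc, Sal FROM EMP ORDER BY EmpID").fetchall()
print(rebuilt == original)
```

The reconstruction needs both a union and a join, which illustrates why hybrid fragmentation is the most expensive option to reassemble.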
Fragment Allocation
 Problem Statement
Given
F = {F1, F2, …, Fn} fragments
S ={S1, S2, …, Sm} network sites
Q = {q1, q2,…, qq} applications
Find the "optimal" distribution of F to S.
 Optimality
 Minimal cost
 Communication + storage + processing (read & update)
 Cost in terms of time (usually)
 Performance
Response time and/or throughput
 Constraints
 Per site constraints (storage & processing)
Information Requirements
 Database information
 selectivity of fragments
 size of a fragment
 Application information
 access types and numbers
 access localities
 Computer system information
 unit cost of storing data at a site
 unit cost of processing at a site
 Communication network information
 bandwidth
 latency
 communication overhead
Allocation

File Allocation (FAP) vs Database Allocation (DAP):

 Fragments are not individual files
 relationships have to be maintained

 Access to databases is more complicated

 remote file access model not applicable
 relationship between allocation and query processing

 Cost of integrity enforcement should be considered

 Cost of concurrency control should be considered
Allocation – Information Requirements
 Database Information
 selectivity of fragments
 size of a fragment
 Application Information
 number of read accesses of a query to a fragment
 number of update accesses of query to a fragment
 A matrix indicating which queries update which fragments
 A similar matrix for retrievals
 originating site of each query
 Site Information
 unit cost of storing data at a site
 unit cost of processing at a site
 Network Information
 communication cost/frame between two sites
 frame size
Allocation Model
General Form
min(Total Cost)
subject to
response time constraint
storage constraint
processing constraint

Decision Variable

$x_{ij} = \begin{cases} 1 & \text{if fragment } F_i \text{ is stored at site } S_j \\ 0 & \text{otherwise} \end{cases}$
Allocation Model

 Total Cost

$\sum_{\text{all queries}} \text{query processing cost} + \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{cost of storing a fragment at a site}$

 Storage Cost (of fragment Fj at Sk)

(unit storage cost at Sk) × (size of Fj) × xjk

 Query Processing Cost (for one query)

processing component + transmission component
Allocation Model

 Query Processing Cost

Processing component
access cost + integrity enforcement cost + concurrency control cost

 Access cost
$\sum_{\text{all sites}} \sum_{\text{all fragments}} (\text{no. of update accesses} + \text{no. of read accesses}) \times x_{ij} \times \text{local processing cost at a site}$

 Integrity enforcement and concurrency control costs

 Can be similarly calculated
Allocation Model

 Query Processing Cost

Transmission component
cost of processing updates + cost of processing retrievals

 Cost of updates
$\sum_{\text{all sites}} \sum_{\text{all fragments}} \text{update message cost} + \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{acknowledgment cost}$

 Retrieval Cost
$\sum_{\text{all fragments}} \min_{\text{all sites}} (\text{cost of retrieval command} + \text{cost of sending back the result})$
Allocation Model

 Constraints
 Response Time
execution time of query ≤ max. allowable response time for that query

 Storage Constraint (for a site)

$$\sum_{\text{all fragments}} \text{storage requirement of a fragment at that site} \le \text{storage capacity at that site}$$

 Processing constraint (for a site)

$$\sum_{\text{all queries}} \text{processing load of a query at that site} \le \text{processing capacity of that site}$$
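The storage and processing constraints can be checked per site as in the sketch below. The inputs are hypothetical; `load[q][k]` is assumed to be the processing load that query q places on site k.

```python
def storage_ok(fragment_size, x, storage_capacity):
    """True if, at every site, the stored fragments fit within the site's capacity."""
    return all(
        sum(fragment_size[j] * x[j][k] for j in range(len(x))) <= storage_capacity[k]
        for k in range(len(storage_capacity))
    )


def processing_ok(load, processing_capacity):
    """True if, at every site, the summed query load stays within the site's capacity."""
    return all(
        sum(load[q][k] for q in range(len(load))) <= processing_capacity[k]
        for k in range(len(processing_capacity))
    )


# Hypothetical inputs.
fragment_size = [100, 50, 200]
x = [[1, 0], [1, 1], [0, 1]]

fits = storage_ok(fragment_size, x, storage_capacity=[150, 250])      # True
too_small = storage_ok(fragment_size, x, storage_capacity=[100, 250]) # False
load_fits = processing_ok([[3, 1], [2, 4]], processing_capacity=[5, 5])  # True
```

An allocation is feasible only if it passes both checks for every site and also meets the response time bound of every query.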
Allocation Model

 Solution Methods
 FAP (file allocation problem) is NP-complete
 DAP (database allocation problem) is also NP-complete

 Heuristics based on
 single commodity warehouse location (for FAP)
 knapsack problem
 branch and bound techniques
 network flow
Allocation Model

 Attempts to reduce the solution space

 assume all candidate partitionings known; select the “best” partitioning

 ignore replication at first

 sliding window on fragments
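To see why the solution space has to be reduced, the sketch below enumerates every non-replicated allocation of a tiny instance (the "ignore replication at first" simplification). It is an exhaustive search, not one of the heuristics named above, and it tries n^m assignments for m fragments and n sites. The `access_cost[j][k]` table is a hypothetical precomputed cost of serving all queries on fragment j when it is stored at site k.

```python
from itertools import product


def best_nonreplicated_allocation(fragment_size, unit_storage_cost,
                                  storage_capacity, access_cost):
    """Try every assignment of each fragment to exactly one site.

    Returns (cost, assignment) for the cheapest feasible assignment, where
    assignment[j] is the site index of fragment j, or None if none is feasible.
    """
    n_frag, n_site = len(fragment_size), len(storage_capacity)
    best = None
    for assignment in product(range(n_site), repeat=n_frag):
        used = [0] * n_site
        for j, k in enumerate(assignment):
            used[k] += fragment_size[j]
        if any(used[k] > storage_capacity[k] for k in range(n_site)):
            continue  # violates the storage constraint
        cost = sum(
            unit_storage_cost[k] * fragment_size[j] + access_cost[j][k]
            for j, k in enumerate(assignment)
        )
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical instance: 3 fragments, 2 sites, so 2**3 = 8 candidate assignments.
best = best_nonreplicated_allocation(
    fragment_size=[100, 50, 200],
    unit_storage_cost=[2.0, 1.0],
    storage_capacity=[200, 250],
    access_cost=[[10, 400], [20, 30], [500, 5]],
)
# best == (495.0, (0, 1, 1)): F1 at S1, F2 and F3 at S2
```

With replication allowed, each fragment can be stored at any non-empty subset of sites, so the count grows to (2^n - 1)^m, which is why practical methods rely on heuristics.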
