Complexity in Database Allocation Design
Must take the relationships between fragments into account
Cost of integrity enforcement
Constraints on response time, storage, and processing capability
Information Needed for Allocation
Database information: tuple size and cardinality of each fragment
Application information: number of update and retrieval accesses a query makes to a fragment
Site information: storage and processing capabilities, and the cost of processing a unit of work
Network information: communication cost to transfer a block of data between sites i and j
No model developed to date can handle all of these constraints.
Current models make simplifying assumptions and work only in specific situations.
Formulation of DAP
The database allocation problem (DAP) can be formulated as an optimization problem:
Minimize Total Cost
subject to
response-time constraint
storage constraint
processing-load constraint
DAP is NP-complete, so several heuristics have been proposed.
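To make the size of the search space concrete, here is a minimal Python sketch that solves a tiny, non-replicated instance by exhaustive search. Every name and number in it is hypothetical, and the toy cost counts only storage cost under a storage-capacity constraint. The point is that the loop visits |S|^|F| candidate allocations, which is why exact search does not scale and heuristics are used instead.

```python
from itertools import product


def brute_force_dap(fragments, sites, cost, feasible):
    """Exhaustively search non-replicated allocations (one site per fragment).

    cost(assignment) -> total cost; feasible(assignment) -> bool.
    Visits len(sites) ** len(fragments) candidate assignments.
    """
    best, best_cost = None, float("inf")
    for choice in product(sites, repeat=len(fragments)):
        assignment = dict(zip(fragments, choice))
        if not feasible(assignment):
            continue                       # violates a constraint
        c = cost(assignment)
        if c < best_cost:
            best, best_cost = assignment, c
    return best, best_cost


# Hypothetical toy instance: fragment sizes, site capacities, unit storage costs.
sizes = {"F1": 4, "F2": 3, "F3": 2}
capacity = {"S1": 5, "S2": 6}
unit_cost = {"S1": 1.0, "S2": 2.0}


def cost(a):
    return sum(unit_cost[s] * sizes[f] for f, s in a.items())


def feasible(a):
    used = {s: 0 for s in capacity}
    for f, s in a.items():
        used[s] += sizes[f]
    return all(used[s] <= capacity[s] for s in capacity)


best, c = brute_force_dap(list(sizes), list(capacity), cost, feasible)
```

Here the cheaper site S1 cannot hold everything, so the search settles on F2 and F3 at S1 and F1 at S2. Adding one more fragment doubles the number of candidates with two sites.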
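Constraints
Q: set of all queries
S: set of all sites
F: set of all fragments
STC_jk: storage cost of fragment F_j at site S_k
Execution-time constraint:
$$\text{execution time of } q_i \le \text{maximum response time of } q_i, \quad \forall q_i \in Q$$
Storage constraint:
$$\sum_{F_j \in F} STC_{jk} \le \text{storage capacity at site } S_k, \quad \forall S_k \in S$$
Processing constraint:
$$\sum_{q_i \in Q} \text{processing load of } q_i \text{ at site } S_k \le \text{processing capacity of } S_k, \quad \forall S_k \in S$$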
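The three constraints can be checked mechanically for a candidate allocation. The sketch below is a minimal Python version that assumes the per-query execution times and per-site processing loads are already known (in a real model they depend on the allocation itself); the function and parameter names are hypothetical.

```python
def satisfies_constraints(x, stc, storage_cap, load, proc_cap, exec_time, max_resp):
    """Return True if allocation x meets the three DAP constraints.

    x[j][k]        : 1 if fragment F_j is stored at site S_k, else 0
    stc[j][k]      : storage cost STC_jk of F_j at S_k
    storage_cap[k] : storage capacity of site S_k
    load[i][k]     : processing load of query q_i at S_k (assumed given)
    proc_cap[k]    : processing capacity of S_k
    exec_time[i], max_resp[i] : execution time and maximum response time of q_i
    """
    n_frags, n_sites, n_queries = len(x), len(storage_cap), len(load)
    # Execution-time constraint: every query meets its response-time bound.
    if any(t > m for t, m in zip(exec_time, max_resp)):
        return False
    for k in range(n_sites):
        # Storage constraint: sum of STC_jk over fragments placed at S_k.
        # (Multiplying by x[j][k] is a no-op if stc already includes x_jk.)
        if sum(stc[j][k] * x[j][k] for j in range(n_frags)) > storage_cap[k]:
            return False
        # Processing constraint: total query load at S_k.
        if sum(load[i][k] for i in range(n_queries)) > proc_cap[k]:
            return False
    return True


x = [[1, 0], [0, 1]]        # F_0 at S_0, F_1 at S_1
stc = [[3, 3], [2, 2]]
load = [[1, 0], [0, 1]]
ok = satisfies_constraints(x, stc, [3, 2], load, [1, 1], [0.5, 0.8], [1.0, 1.0])
too_small = satisfies_constraints(x, stc, [3, 1], load, [1, 1], [0.5, 0.8], [1.0, 1.0])
```

Shrinking the storage capacity of S_1 from 2 to 1 makes the same allocation infeasible.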
Cost Computation
Decision variable x_jk, defined as
$$x_{jk} = \begin{cases} 1 & \text{if fragment } F_j \text{ is stored at site } S_k \\ 0 & \text{otherwise} \end{cases}$$
Total Cost = query processing cost + storage cost
Solve the constrained optimization problem for the x_jk.
Cost Model
Total cost = query processing cost + storage cost
$$TOC = \sum_{q_i \in Q} QPC_i + \sum_{S_k \in S} \sum_{F_j \in F} STC_{jk}$$
$$STC_{jk} = USC_k \cdot size(F_j) \cdot x_{jk}$$
where USC_k is the unit cost of storing data at S_k, and
$$size(F_j) = card(F_j) \cdot length(F_j)$$
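The cost model above translates directly into code. This is a minimal Python sketch with hypothetical names; it takes the per-query processing costs QPC_i as given numbers rather than computing them, and uses integer costs so the arithmetic is exact.

```python
def fragment_size(card, length):
    # size(F_j) = card(F_j) * length(F_j)
    return card * length


def storage_cost(x, usc, sizes):
    # Sum over S_k and F_j of STC_jk = USC_k * size(F_j) * x_jk
    return sum(usc[k] * sizes[j] * x[j][k]
               for j in range(len(sizes))
               for k in range(len(usc)))


def total_cost(qpc, x, usc, sizes):
    # TOC = sum of QPC_i over all queries + total storage cost
    return sum(qpc) + storage_cost(x, usc, sizes)


sizes = [fragment_size(100, 10), fragment_size(50, 4)]   # [1000, 200]
usc = [1, 3]                 # hypothetical unit storage costs of S_0, S_1
x = [[1, 1], [0, 1]]         # F_0 replicated at both sites, F_1 only at S_1
toc = total_cost([10, 20], x, usc, sizes)
```

Each replica is paid for separately, so replicating F_0 at the more expensive site S_1 adds 3000 to the storage term.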
Query Processing Cost
$$QPC_i = PC_i + TC_i$$
(computation cost of q_i + transfer cost of q_i)
$$PC_i = AC_i + IE_i + CC_i$$
(access cost of q_i + integrity enforcement cost of q_i + concurrency control cost of q_i)
LPC_k: cost of processing one unit of work at site S_k
RR_ij: number of read accesses query q_i makes to fragment F_j
UR_ij: number of update accesses query q_i makes to fragment F_j
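Access cost of query q_i:
$$AC_i = \sum_{S_k \in S} \sum_{F_j \in F} \left( u_{ij} \cdot UR_{ij} + r_{ij} \cdot RR_{ij} \right) \cdot x_{jk} \cdot LPC_k$$
$$u_{ij} = \begin{cases} 1 & \text{if query } q_i \text{ updates } F_j \\ 0 & \text{otherwise} \end{cases} \qquad r_{ij} = \begin{cases} 1 & \text{if query } q_i \text{ retrieves from } F_j \\ 0 & \text{otherwise} \end{cases}$$
Assume that the cost of an update is the same as the cost of a retrieval.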
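A minimal Python sketch of the access-cost formula, with hypothetical names and a one-query, two-fragment, two-site example. As in the formula, every site holding a copy of F_j (x_jk = 1) contributes its local processing cost.

```python
def access_cost(i, u, r, UR, RR, x, lpc):
    """AC_i = sum over S_k, F_j of (u_ij*UR_ij + r_ij*RR_ij) * x_jk * LPC_k."""
    n_frags, n_sites = len(x), len(lpc)
    return sum((u[i][j] * UR[i][j] + r[i][j] * RR[i][j]) * x[j][k] * lpc[k]
               for k in range(n_sites)
               for j in range(n_frags))


u = [[1, 0]]            # q_0 updates F_0 only
r = [[1, 1]]            # q_0 reads F_0 and F_1
UR = [[2, 0]]           # update accesses of q_0 per fragment
RR = [[3, 5]]           # read accesses of q_0 per fragment
x = [[1, 1], [0, 1]]    # F_0 at S_0 and S_1, F_1 at S_1
lpc = [1, 2]            # hypothetical LPC_k of S_0 and S_1
ac = access_cost(0, u, r, UR, RR, x, lpc)
```

F_0 contributes (2 + 3) * (1 + 2) = 15 and F_1 contributes 5 * 2 = 10, for AC_0 = 25.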
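Transmission Cost Model
$$TC_i = TCU_i + TCR_i$$
Update cost: updates must be performed on all replicas, but no large results are sent back.
$$TCU_i = \sum_{S_k \in S} \sum_{F_j \in F} u_{ij} \cdot x_{jk} \cdot g_{o(i),k} + \sum_{S_k \in S} \sum_{F_j \in F} u_{ij} \cdot x_{jk} \cdot g_{k,o(i)}$$
The first term is the cost of the update messages sent to all replicas involved in q_i; the second is the cost of the confirmation messages sent back to o(i), the site where q_i originates.
g_ij: communication cost per message between S_i and S_j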
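The update-transmission term can be sketched in Python as follows. The names are hypothetical, the originating site o(i) is passed in as a list, and the example assumes a message from a site to itself costs nothing (g_kk = 0).

```python
def update_transmission_cost(i, u, x, g, origin):
    """TCU_i: update messages from o(i) to every replica of each fragment q_i
    updates, plus one confirmation message from each replica back to o(i)."""
    o = origin[i]                      # o(i): site where q_i originates
    n_frags, n_sites = len(x), len(g)
    send = sum(u[i][j] * x[j][k] * g[o][k]
               for k in range(n_sites) for j in range(n_frags))
    confirm = sum(u[i][j] * x[j][k] * g[k][o]
                  for k in range(n_sites) for j in range(n_frags))
    return send + confirm


g = [[0, 4, 6],
     [4, 0, 5],
     [6, 5, 0]]              # hypothetical per-message costs; local messages free
origin = [0]                 # q_0 originates at S_0
u = [[1, 0]]                 # q_0 updates F_0 only
x = [[1, 1, 1], [0, 0, 1]]   # F_0 fully replicated, F_1 only at S_2
tcu = update_transmission_cost(0, u, x, g, origin)
```

Because F_0 is fully replicated, q_0 pays for a round trip to every site: 10 for the updates and 10 for the confirmations.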
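Retrieval Cost Model
$$TCR_i = \sum_{F_j \in F} \min_{S_k \in S,\; x_{jk} = 1} \left( r_{ij} \cdot x_{jk} \cdot g_{o(i),k} + r_{ij} \cdot x_{jk} \cdot \frac{sel_i(F_j) \cdot length(F_j)}{fsize} \cdot g_{k,o(i)} \right)$$
The first term is the cost of sending the query; the second is the cost of sending the results back. For each fragment, pick the least-cost site among all sites that hold a replica.
g_ij: communication cost per message between S_i and S_j
fsize: number of bytes in a message
length(F_j): length of a tuple of F_j, in bytes
sel_i(F_j): selectivity of q_i on F_j (the number of tuples of F_j accessed by q_i)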
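A minimal Python sketch of the retrieval term, again with hypothetical names. For each fragment the query reads, it tries only the sites that hold a copy and keeps the cheapest round trip; it assumes every fragment has at least one copy.

```python
def retrieval_transmission_cost(i, r, x, g, origin, sel, length, fsize):
    """TCR_i: for every fragment q_i reads, send the query to the cheapest
    replica and ship sel_i(F_j) * length(F_j) / fsize result messages back."""
    o = origin[i]
    total = 0.0
    for j in range(len(x)):
        if not r[i][j]:
            continue                   # r_ij = 0: q_i does not read F_j
        msgs = sel[i][j] * length[j] / fsize
        # Only sites holding a replica (x_jk = 1) are candidates; this
        # raises ValueError if F_j has no copy at all.
        total += min(g[o][k] + msgs * g[k][o]
                     for k in range(len(g)) if x[j][k])
    return total


g = [[0, 4, 6],
     [4, 0, 5],
     [6, 5, 0]]
origin = [0]
r = [[1, 1]]                 # q_0 reads F_0 and F_1
x = [[0, 1, 1], [0, 0, 1]]   # F_0 at S_1 and S_2, F_1 at S_2
sel = [[4, 10]]              # tuples of each fragment accessed by q_0
length = [100, 40]           # bytes per tuple
tcr = retrieval_transmission_cost(0, r, x, g, origin, sel, length, fsize=100)
```

For F_0 the copy at S_1 (cost 20) beats the one at S_2 (cost 30); F_1 has only one copy, so its cost is 30.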
Heuristic Approaches
Allocation of Horizontal Fragments
Allocation of Vertical Fragments
(Material not in the textbook)
Allocation: Notations
i: fragment index
j: site index
k: application index
f_kj: frequency of application k at site j
r_ki: number of retrieval references of application k to fragment i
u_ki: number of update references of application k to fragment i
n_ki = r_ki + u_ki: number of accesses of application k to fragment i
[Figure: application k, running at site j with frequency f_kj, makes r_ki retrieval references and u_ki update references to fragment i]
Allocation of Horizontal Fragments (1)
No replication: Best Fit Strategy
The number of local references of R_i (fragment i) at site j is
$$B_{ij} = \sum_k f_{kj} \cdot n_{ki}$$
R_i is allocated at the site j* for which B_ij* is maximum.
Advantage: a fragment is allocated to the site that needs it most.
Disadvantage: it disregards the mutual effect of placing a fragment at a given site when a related fragment is also at that site.
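The best fit rule is a per-fragment argmax, sketched below in Python. The function name and list layout are hypothetical; ties are broken in favor of the lowest site index, which is a choice of this sketch rather than part of the strategy.

```python
def best_fit(f, n):
    """Place each fragment i at the site j maximizing B_ij = sum_k f_kj * n_ki.

    f[k][j]: frequency of application k at site j
    n[k][i]: number of accesses n_ki of application k to fragment i
    Returns the chosen site index for each fragment.
    """
    n_apps, n_sites, n_frags = len(f), len(f[0]), len(n[0])
    placement = []
    for i in range(n_frags):
        B = [sum(f[k][j] * n[k][i] for k in range(n_apps))
             for j in range(n_sites)]
        # max() returns the first maximal index, so ties go to the lowest j.
        placement.append(max(range(n_sites), key=lambda j: B[j]))
    return placement


f = [[10, 0, 2],    # application 0 runs mostly at site 0
     [0, 5, 5]]     # application 1 runs at sites 1 and 2
n = [[3, 0],        # accesses of application 0 to fragments 0 and 1
     [1, 4]]        # accesses of application 1 to fragments 0 and 1
placement = best_fit(f, n)
```

Fragment 1 gets B = 20 at both sites 1 and 2, so the tie-break decides; each fragment is placed independently, which is exactly the disadvantage noted above.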
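Allocation of Horizontal Fragments (2)
All beneficial sites approach (replication)
$$B_{ij} = \sum_k f_{kj} \cdot r_{ki} - C \cdot \sum_k \sum_{j' \ne j} f_{kj'} \cdot u_{ki}$$
The first term counts the retrieval references that become local; the second is the cost of update references from other sites, weighted by a constant C (the cost of an update access relative to a retrieval access).
R_i is allocated at all sites j* such that B_ij* > 0.
When all B_ij are negative, a single copy of R_i is placed at the site j* for which B_ij* is maximum.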
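A minimal Python sketch of the all beneficial sites rule for one fragment, with hypothetical names. It computes B_ij for every site, keeps the sites with positive benefit, and falls back to the single best site when none qualifies.

```python
def all_beneficial_sites(f, r, u, C, i):
    """Return (chosen sites, B) for fragment i.

    f[k][j]: frequency of application k at site j
    r[k][i], u[k][i]: retrieval / update references of application k to fragment i
    C: weight of an update access relative to a retrieval access
    """
    n_apps, n_sites = len(f), len(f[0])
    B = []
    for j in range(n_sites):
        local_reads = sum(f[k][j] * r[k][i] for k in range(n_apps))
        remote_updates = sum(f[k][jp] * u[k][i]
                             for k in range(n_apps)
                             for jp in range(n_sites) if jp != j)
        B.append(local_reads - C * remote_updates)
    chosen = [j for j in range(n_sites) if B[j] > 0]
    if not chosen:
        # No site has a positive benefit: keep a single copy at the best site.
        chosen = [max(range(n_sites), key=lambda j: B[j])]
    return chosen, B


f = [[10, 0, 2], [0, 5, 5]]
r = [[3], [1]]      # retrieval references to fragment 0
u = [[0], [1]]      # update references to fragment 0 (application 1 only)
chosen, B = all_beneficial_sites(f, r, u, C=2, i=0)
```

With C = 2 the fragment is replicated at sites 0 and 2. Raising C to 10 makes every B_ij negative, so only one copy is kept.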
Allocation of Horizontal Fragments (3)
Another replication approach:
d_i: the degree of redundancy of R_i
F_i: the reliability and availability benefit of having R_i fully replicated
β(d_i): the reliability and availability benefit when the fragment has d_i copies
$$\beta(1) = 0, \quad \beta(2) = \frac{F_i}{2}, \quad \beta(3) = \frac{3 F_i}{4}, \quad \ldots, \quad \beta(d_i) = \left(1 - 2^{1 - d_i}\right) F_i$$
The benefit of introducing a new copy of R_i at site j:
$$B_{ij} = \sum_k f_{kj} \cdot r_{ki} - C \cdot \sum_k \sum_{j' \ne j} f_{kj'} \cdot u_{ki} + \beta(d_i)$$
The first two terms are the same as in the all beneficial sites approach; the β(d_i) term also takes into account the benefit of replication.
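A minimal Python sketch of β(d_i) and of the extended benefit. The names are hypothetical, and this sketch leaves it to the caller to decide which value of d_i to pass (for example, the number of copies including the new one), since that choice is an assumption here.

```python
def beta(d, F):
    # beta(d_i) = (1 - 2**(1 - d_i)) * F_i
    return (1 - 2 ** (1 - d)) * F


def benefit_with_redundancy(f, r, u, C, i, j, d, F):
    """B_ij of the all beneficial sites approach plus the redundancy term beta(d_i)."""
    n_apps, n_sites = len(f), len(f[0])
    local_reads = sum(f[k][j] * r[k][i] for k in range(n_apps))
    remote_updates = sum(f[k][jp] * u[k][i]
                         for k in range(n_apps)
                         for jp in range(n_sites) if jp != j)
    return local_reads - C * remote_updates + beta(d, F)


f = [[10, 0, 2], [0, 5, 5]]
r = [[3], [1]]
u = [[0], [1]]
b = benefit_with_redundancy(f, r, u, C=2, i=0, j=2, d=2, F=8)
```

With F_i = 8, β gives 0, 4, 6 for one, two, and three copies, so each extra copy adds half as much reliability benefit as the previous one.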
Allocation of Horizontal Fragments (4)
All beneficial sites approach:
1. Determine the set of all sites where the benefit of allocating one copy of the fragment is higher than the cost.
2. Allocate a copy of the fragment to each site in the set.
Alternatively:
1. Determine the solution of the non-replicated problem.
2. Progressively introduce replicated copies, starting from the most beneficial; the process terminates when no additional replication is beneficial.
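The alternative procedure is a greedy loop, sketched below in Python. The starting site would come from the non-replicated solution (for example, the best fit rule), and the benefit function is a hypothetical callable; the toy one used here is invented purely to show how the loop stops.

```python
def progressive_replication(n_sites, initial_site, benefit):
    """Start from the non-replicated solution and add copies greedily.

    benefit(j, copies) returns the benefit of adding a copy at site j,
    given the list of sites that already hold one.
    """
    copies = [initial_site]
    while True:
        candidates = [j for j in range(n_sites) if j not in copies]
        if not candidates:
            break
        best = max(candidates, key=lambda j: benefit(j, copies))
        if benefit(best, copies) <= 0:
            break              # no additional replication is beneficial
        copies.append(best)
    return copies


base = [10, -5, 1, 7]          # hypothetical per-site benefits


def toy_benefit(j, copies):
    # Hypothetical: each existing copy makes another one less attractive.
    return base[j] - 3 * len(copies)


copies = progressive_replication(4, 0, toy_benefit)
```

Unlike the one-shot approach, the benefit is re-evaluated after every added copy, so the decision for one site can depend on the copies already placed.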
What about heuristics for allocating vertical fragments?
SUMMARY
The design of a distributed DB consists of four phases:
Phase 1: Global schema design (same as in centralized DB design)
Phase 2: Fragmentation
Horizontal fragmentation
Primary: determine a complete and minimal set of predicates
Derived: use semijoins
Vertical fragmentation
Identify fragments such that many applications can be executed using just one fragment.
Phase 3: Allocation
The primary goal is to minimize the number of remote accesses.
Phase 4: Physical schema design (same as in centralized DB design)