Principles of Distributed Database Systems
M. Tamer Özsu
Patrick Valduriez

© 2020, M.T. Özsu & P. Valduriez 1


Outline
■ Introduction
■ Distributed and Parallel Database Design
■ Distributed Data Control
■ Distributed Query Processing
■ Distributed Transaction Processing
■ Data Replication
■ Database Integration – Multidatabase Systems
■ Parallel Database Systems
■ Peer-to-Peer Data Management
■ Big Data Processing
■ NoSQL, NewSQL and Polystores
■ Web Data Management



Outline
■ Introduction
❑ What is a distributed DBMS
❑ History
❑ Distributed DBMS promises
❑ Design issues
❑ Distributed DBMS architecture



Distributed Computing

■ A number of autonomous processing elements (not
necessarily homogeneous) that are interconnected by a
computer network and that cooperate in performing their
assigned tasks.
■ What is being distributed?
❑ Processing logic
❑ Function
❑ Data
❑ Control



Current Distribution – Geographically Distributed Data Centers



What is a Distributed Database System?

A distributed database (DDB) is a collection of multiple, logically
interrelated databases distributed over a computer network.

A distributed database management system (Distributed
DBMS) is the software that manages the DDB and provides
an access mechanism that makes this distribution
transparent to the users.



What is not a DDBS?

■ A timesharing computer system
■ A loosely or tightly coupled multiprocessor system
■ A database system which resides at one of the nodes of
a network of computers - this is a centralized database
on a network node



Distributed DBMS Environment



Implicit Assumptions

■ Data stored at a number of sites → each site logically
consists of a single processor
■ Processors at different sites are interconnected by a
computer network → not a multiprocessor system
❑ Parallel database systems
■ Distributed database is a database, not a collection of
files → data logically related as exhibited in the users’
access patterns
❑ Relational data model
■ Distributed DBMS is a full-fledged DBMS
❑ Not remote file system, not a TP system



Important Point

Logically integrated
but
Physically distributed





History – File Systems



History – Database Management



History – Early Distribution
Peer-to-Peer (P2P)



History – Client/Server



History – Data Integration



History – Cloud Computing

On-demand, reliable services provided over the Internet in
a cost-efficient manner
■ Cost savings: no need to maintain dedicated compute
power
■ Elasticity: better adaptivity to changing workload



Data Delivery Alternatives

■ Delivery modes
❑ Pull-only
❑ Push-only
❑ Hybrid
■ Frequency
❑ Periodic
❑ Conditional
❑ Ad-hoc or irregular
■ Communication Methods
❑ Unicast
❑ One-to-many
■ Note: not all combinations make sense
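
The pull and push delivery modes listed above can be contrasted with a minimal sketch. The `Source` class and its `get`, `subscribe`, and `put` methods are hypothetical names chosen for illustration and do not come from any particular system. Pull is shown as an ad-hoc unicast request; push is shown as a conditional (on update) one-to-many notification.

```python
from typing import Callable, Dict, List

class Source:
    """Hypothetical data source that supports both delivery modes."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}
        self._subscribers: List[Callable[[str, int], None]] = []

    def get(self, key: str) -> int:
        # Pull: the client initiates the transfer by asking for the item.
        return self._data[key]

    def subscribe(self, callback: Callable[[str, int], None]) -> None:
        # Push: the client registers once and then waits for the source.
        self._subscribers.append(callback)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value
        # Conditional, one-to-many push: every subscriber is notified
        # whenever an update occurs.
        for callback in self._subscribers:
            callback(key, value)

source = Source()
received = []
source.subscribe(lambda k, v: received.append((k, v)))
source.put("stock", 10)          # delivered to the subscriber by push
pulled = source.get("stock")     # fetched by the client by pull
```

A hybrid mode would combine the two, for example pushing a notification and letting the client pull the full item only if it is interested.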





Distributed DBMS Promises

❶ Transparent management of distributed, fragmented, and
replicated data
❷ Improved reliability/availability through distributed
transactions
❸ Improved performance
❹ Easier and more economical system expansion



Transparency

■ Transparency is the separation of the higher-level
semantics of a system from the lower-level
implementation issues.
■ The fundamental issue is to provide data independence
in the distributed environment
❑ Network (distribution) transparency
❑ Replication transparency
❑ Fragmentation transparency
■ horizontal fragmentation: selection
■ vertical fragmentation: projection
■ hybrid
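
As a rough illustration of the two fragmentation types above, the sketch below treats a relation as a list of dictionaries. The `EMP` rows and the helper names (`horizontal_fragment`, `vertical_fragment`, `join_on_key`) are made up for this example. Horizontal fragments are produced by a selection and recombined by union; vertical fragments are produced by a projection that keeps the key and are recombined by a join on that key.

```python
# Hypothetical sample relation; values are invented for illustration.
EMP = [
    {"ENO": "E1", "ENAME": "A", "TITLE": "Engineer"},
    {"ENO": "E2", "ENAME": "B", "TITLE": "Analyst"},
    {"ENO": "E3", "ENAME": "C", "TITLE": "Engineer"},
]

def horizontal_fragment(relation, predicate):
    # Selection: keep whole rows that satisfy the predicate.
    return [dict(row) for row in relation if predicate(row)]

def vertical_fragment(relation, attributes, key="ENO"):
    # Projection: keep some columns, always including the key so the
    # original relation can be rebuilt later.
    columns = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in columns} for row in relation]

def join_on_key(left, right, key="ENO"):
    # Rebuild a vertically fragmented relation by joining on the key.
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

engineers = horizontal_fragment(EMP, lambda r: r["TITLE"] == "Engineer")
others = horizontal_fragment(EMP, lambda r: r["TITLE"] != "Engineer")
names = vertical_fragment(EMP, ["ENAME"])
titles = vertical_fragment(EMP, ["TITLE"])

rebuilt_h = sorted(engineers + others, key=lambda r: r["ENO"])  # union
rebuilt_v = join_on_key(names, titles)                          # join
```

Fragmentation transparency means the user queries `EMP` as a whole and the system performs the equivalent of these reconstruction steps.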



Example



Transparent Access

SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE

[Figure: sites in Tokyo, Boston, Paris, Montreal, and New York connected by a communication network. Each site stores fragments of the projects, employees, and assignments data (for example, Paris projects, Boston employees, Montreal assignments, New York projects with budget > 200000).]



Distributed Database - User View

Distributed
Database



Distributed DBMS - Reality
[Figure: several sites, each running DBMS software that serves local user queries and user applications; the sites are linked through a communication subsystem.]



Types of Transparency

■ Data independence
■ Network transparency (or distribution transparency)
❑ Location transparency
❑ Naming transparency
■ Fragmentation transparency
■ Replication transparency



Reliability Through Transactions

■ Replicated components and data should make distributed
DBMS more reliable.
■ Distributed transactions provide
❑ Concurrency transparency
❑ Failure atomicity
■ Distributed transaction support requires implementation of
❑ Distributed concurrency control protocols
❑ Commit protocols
■ Data replication
❑ Great for read-intensive workloads, problematic for updates
❑ Replication protocols
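
The slide mentions commit protocols without naming one. A widely used example is two-phase commit (2PC), sketched below under strong simplifying assumptions: no logging, no timeouts, and no site or network failures. The `Participant` class and `two_phase_commit` function are hypothetical names for this illustration.

```python
class Participant:
    """Hypothetical site taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "INITIAL"

    def prepare(self) -> bool:
        # Phase 1: vote. A "yes" vote is a promise to commit if asked.
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self) -> None:
        self.state = "COMMITTED"

    def abort(self) -> None:
        self.state = "ABORTED"

def two_phase_commit(participants) -> str:
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if all voted yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        p.abort()
    return "ABORT"

ok_sites = [Participant("Paris"), Participant("Boston")]
ok_result = two_phase_commit(ok_sites)

bad_sites = [Participant("Paris"), Participant("Boston", can_commit=False)]
bad_result = two_phase_commit(bad_sites)
```

The all-or-nothing outcome across sites is what the slide calls failure atomicity.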



Potentially Improved Performance

■ Proximity of data to its points of use
❑ Requires some support for fragmentation and replication
■ Parallelism in execution
❑ Inter-query parallelism
❑ Intra-query parallelism
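
The two forms of parallelism above differ in what is run concurrently. The sketch below uses Python threads only to show the structure; the fragment contents and function names are invented, and threads here do not demonstrate an actual speedup. Intra-query parallelism splits one query across fragments; inter-query parallelism runs several independent queries at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of one relation, each held at a different site.
fragments = [[5, 12, 7], [20, 3], [15, 1, 9]]

def count_over(fragment, threshold):
    # Local work at one site: count values above the threshold.
    return sum(1 for v in fragment if v > threshold)

def run_query(threshold):
    # A whole query evaluated sequentially over all fragments.
    return sum(count_over(f, threshold) for f in fragments)

def intra_query(threshold):
    # One query, executed on all fragments in parallel, then combined.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda f: count_over(f, threshold), fragments)
        return sum(partials)

def inter_query(thresholds):
    # Several independent queries executed concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_query, thresholds))

single = intra_query(10)
batch = inter_query([10, 0])
```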



Scalability

■ Issue is database scaling and workload scaling
■ Adding processing and storage power
❑ Scale-out: add more servers
❑ Scale-up: increase the capacity of one server → has limits





Distributed DBMS Issues

■ Distributed database design
❑ How to distribute the database
❑ Replicated & non-replicated database distribution
❑ A related problem in directory management
■ Distributed query processing
❑ Convert user transactions to data manipulation instructions
❑ Optimization problem
■ min{cost = data transmission + local processing}
❑ General formulation is NP-hard
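
The cost expression above can be made concrete with a toy comparison of two execution strategies for a join between relations stored at different sites. All numbers, the `Plan` class, and the `COST_PER_KB` constant are assumptions made up for this sketch; a real optimizer would estimate them from statistics and would search a far larger plan space.

```python
from dataclasses import dataclass

COST_PER_KB = 2  # assumed cost units to transmit one KB between sites

@dataclass(frozen=True)
class Plan:
    name: str
    kb_shipped: int        # data moved over the network
    local_processing: int  # cost units spent at the executing site

def total_cost(plan: Plan) -> int:
    # cost = data transmission + local processing
    return plan.kb_shipped * COST_PER_KB + plan.local_processing

plans = [
    Plan("ship EMP to the ASG site", kb_shipped=400, local_processing=50),
    Plan("ship ASG to the EMP site", kb_shipped=100, local_processing=80),
]

best = min(plans, key=total_cost)
```

Here the second plan wins even though it does more local work, because it ships less data.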



Distributed DBMS Issues

■ Distributed concurrency control
❑ Synchronization of concurrent accesses
❑ Consistency and isolation of transactions' effects
❑ Deadlock management
■ Reliability
❑ How to make the system resilient to failures
❑ Atomicity and durability
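
One common approach to the deadlock management item above is detection on a wait-for graph, where an edge T1 → T2 means T1 waits for a lock held by T2 and a cycle means deadlock. The sketch below is an assumption-laden illustration with hypothetical function names; it shows why the distributed case is harder: neither site sees a cycle locally, but the merged graph has one.

```python
def merge_graphs(*local_graphs):
    # Build a global wait-for graph as the union of the local ones.
    merged = {}
    for graph in local_graphs:
        for txn, waits_for in graph.items():
            merged.setdefault(txn, set()).update(waits_for)
    return merged

def has_deadlock(wait_for):
    # Depth-first search; reaching a node that is still on the current
    # path (GRAY) means the graph contains a cycle.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(txn):
        color[txn] = GRAY
        for other in wait_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GRAY:
                return True
            if state == WHITE and visit(other):
                return True
        color[txn] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in wait_for)

site_a = {"T1": {"T2"}}   # at site A, T1 waits for T2
site_b = {"T2": {"T1"}}   # at site B, T2 waits for T1

local_a = has_deadlock(site_a)
local_b = has_deadlock(site_b)
global_deadlock = has_deadlock(merge_graphs(site_a, site_b))
```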



Distributed DBMS Issues

■ Replication
❑ Mutual consistency
❑ Freshness of copies
❑ Eager vs lazy
❑ Centralized vs distributed
■ Parallel DBMS
❑ Objectives: high scalability and performance
❑ Not geo-distributed
❑ Cluster computing
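
The eager vs lazy distinction in the replication list above can be sketched as follows. The `ReplicatedItem` class is hypothetical, assumes a single master copy (index 0), and ignores transactions and failures. Eager propagation updates all copies as part of the write; lazy propagation updates the master first and refreshes the other copies later, so reads from them may be stale in between.

```python
class ReplicatedItem:
    """Hypothetical data item with one master copy and several replicas."""

    def __init__(self, n_copies: int, eager: bool) -> None:
        self.copies = [None] * n_copies   # copies[0] is the master
        self.eager = eager
        self.pending = []                 # updates not yet propagated

    def write(self, value) -> None:
        self.copies[0] = value
        if self.eager:
            # Eager: every copy is updated before the write completes.
            for i in range(1, len(self.copies)):
                self.copies[i] = value
        else:
            # Lazy: remember the update and propagate it later.
            self.pending.append(value)

    def refresh(self) -> None:
        # Apply deferred updates to the non-master copies, in order.
        for value in self.pending:
            for i in range(1, len(self.copies)):
                self.copies[i] = value
        self.pending.clear()

eager_item = ReplicatedItem(3, eager=True)
eager_item.write("v1")
eager_copies = list(eager_item.copies)

lazy_item = ReplicatedItem(3, eager=False)
lazy_item.write("v1")
stale_copies = list(lazy_item.copies)   # replicas not yet refreshed
lazy_item.refresh()
fresh_copies = list(lazy_item.copies)
```

The stale window in the lazy case is what the "freshness of copies" item refers to; eager propagation avoids it at the cost of slower writes.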



Related Issues

■ Alternative distribution approaches
❑ Modern P2P
❑ World Wide Web (WWW or Web)
■ Big data processing
❑ 4V: volume, variety, velocity, veracity
❑ MapReduce & Spark
❑ Stream data
❑ Graph analytics
❑ NoSQL
❑ NewSQL
❑ Polystores
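
To give a feel for the MapReduce model named above, here is a single-process word count split into map, shuffle, and reduce steps. It is a sketch of the programming model only: the function names are chosen for this example, and a real framework would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (key, value) pair for every word.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine the values of each key into one result.
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data", "Big systems"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
```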





DBMS Implementation Alternatives



Dimensions of the Problem

■ Distribution
❑ Whether the components of the system are located on the same machine or not
■ Heterogeneity
❑ Various levels (hardware, communications, operating system)
❑ DBMS heterogeneity is the important one
■ data model, query language, transaction management algorithms
■ Autonomy
❑ Not well understood and most troublesome
❑ Various versions
■ Design autonomy: Ability of a component DBMS to decide on issues related to
its own design.
■ Communication autonomy: Ability of a component DBMS to decide whether and
how to communicate with other DBMSs.
■ Execution autonomy: Ability of a component DBMS to execute local operations
in any manner it wants to.



Client/Server Architecture



Advantages of Client-Server Architectures
■ More efficient division of labor
■ Horizontal and vertical scaling of resources
■ Better price/performance on client machines
■ Ability to use familiar tools on client machines
■ Client access to remote data (via standards)
■ Full DBMS functionality provided to client workstations
■ Overall better system price/performance



Database Server



Distributed Database Servers



Peer-to-Peer Component Architecture



MDBS Components & Execution



Mediator/Wrapper Architecture



Cloud Computing

On-demand, reliable services provided over the Internet in
a cost-efficient manner
■ IaaS – Infrastructure-as-a-Service
■ PaaS – Platform-as-a-Service
■ SaaS – Software-as-a-Service
■ DaaS – Database-as-a-Service



Simplified Cloud Architecture
