
Lecture #21: Introduction to Distributed Databases

15-445/645 Database Systems (Spring 2024)


https://15445.courses.cs.cmu.edu/spring2024/
Carnegie Mellon University
Jignesh Patel

1 Distributed DBMSs
A distributed DBMS divides a single logical database across multiple physical resources. The application
is (usually) unaware that data is split across separated hardware. The system relies on the techniques and
algorithms from single-node DBMSs to support transaction processing and query execution in a distributed
environment. An important goal in designing a distributed DBMS is fault tolerance (i.e., avoiding a single
node failure taking down the entire system).
The differences between parallel and distributed DBMSs are:
Parallel Database:
• Nodes are physically close to each other.
• Nodes are connected via high-speed LAN (fast, reliable communication fabric).
• The communication cost between nodes is assumed to be small. As such, one does not need to worry
about nodes crashing or packets getting dropped when designing internal protocols.
Distributed Database:
• Nodes can be far from each other.
• Nodes are potentially connected via a public network, which can be slow and unreliable.
• The communication cost and connection problems cannot be ignored (i.e., nodes can crash, and
packets can get dropped).

2 System Architectures
A DBMS’s system architecture specifies what shared resources are directly accessible to CPUs. It affects
how CPUs coordinate with each other and where they retrieve and store objects in the database.
A single-node DBMS uses what is called a shared everything architecture. This single node executes workers
on local CPU(s) with its own local memory address space and disk.

Shared Memory
An alternative to shared everything architecture in distributed systems is shared memory. CPUs have
access to common memory address space via a fast interconnect. CPUs also share the same disk.
In practice, most DBMSs do not use this architecture, as it is provided at the OS / kernel level. It also causes
problems, since each process’s scope of memory is the same memory address space, which can be modified
by multiple processes.
Each processor has a global view of all the in-memory data structures. Each DBMS instance on a processor
has to “know” about the other instances.

Figure 1: Database System Architectures – Four system architecture approaches, ranging from sharing
everything (used by non-distributed systems) to sharing memory, disk, or nothing.

Shared Disk
In a shared disk architecture, all CPUs can read and write to a single logical disk directly via an interconnect,
but each have their own private memories. The local storage on each compute node can act as caches. This
approach is more common in cloud-based DBMSs.
The DBMS’s execution layer can scale independently from the storage layer. Adding new storage nodes or
execution nodes does not affect the layout or location of data in the other layer.
Nodes must send messages to each other to learn about other nodes' current state. That is, since memory
is local to each node, if data is modified, the change must be communicated to the other CPUs in case that
piece of data resides in their main memory.
Nodes have their own buffer pool and are considered stateless. A node crash does not affect the state of
the database since that is stored separately on the shared disk. The storage layer persists the state in the
case of crashes.

Shared Nothing
In a shared nothing environment, each node has its own CPU, memory, and disk. Nodes only communicate
with each other via network. Before the rise of cloud storage platforms, the shared nothing architecture
used to be considered the correct way to build distributed DBMSs.
It is more difficult to increase capacity in this architecture because the DBMS has to physically move data
to new nodes. It is also difficult to ensure consistency across all nodes in the DBMS, since the nodes must
coordinate with each other on the state of transactions. The advantage, however, is that shared nothing
DBMSs can potentially achieve better performance and are more efficient than other types of distributed
DBMS architectures.


3 Design Issues
Distributed DBMSs aim to maintain data transparency, meaning that users should not be required to know
where data is physically located, or how tables are partitioned or replicated. The details of how data is
stored are hidden from the application. In other words, a SQL query that works on a single-node
DBMS should work the same on a distributed DBMS.
The key design questions that distributed database systems must address are the following:
• How does the application find data?
• How should queries be executed on distributed data? Should the query be pushed to where the
data is located? Or should the data be pooled into a common location to execute the query?
• How does the DBMS ensure correctness?
Another design decision to make involves deciding how the nodes will interact in their clusters. Two
options are homogeneous and heterogeneous nodes, which are both used in modern-day systems.
Homogeneous Nodes: Every node in the cluster can perform the same set of tasks (albeit on potentially
different partitions of data), lending itself well to a shared nothing architecture. This makes provisioning
and failover “easier”. Failed tasks are assigned to available nodes.
Heterogeneous Nodes: Nodes are assigned specific tasks, so communication must happen between nodes
to carry out a given task. This allows a single physical node to host multiple “virtual” node types for
dedicated tasks that can scale independently of one another. An example is MongoDB, which has
router nodes routing queries to shards and config server nodes storing the mapping from keys to shards.

4 Partitioning Schemes
A distributed system must partition the database across multiple resources, including disks, nodes, and
processors. This process is sometimes called sharding in NoSQL systems. When the DBMS receives a query, it
first analyzes the data that the query plan needs to access. The DBMS may potentially send fragments of
the query plan to different nodes, then combine the results to produce a single answer.
The goal of a partitioning scheme is to maximize single-node transactions, or transactions that only access
data contained on one partition. This means the DBMS does not need to coordinate the behavior of concurrent
transactions running on other nodes. On the other hand, a distributed transaction accesses data at one or
more partitions. This requires expensive, difficult coordination, discussed in the section below.
For logically partitioned nodes, particular nodes are in charge of accessing specific tuples from a shared
disk. For physically partitioned nodes, each shared nothing node reads and updates tuples it contains on its
own local disk.

Implementation
The simplest way to partition tables is naive data partitioning. Each node stores one table, assuming enough
storage space for a given node. This is easy to implement because a query is just routed to a specific
partition. This approach does not scale well: one partition's resources can be exhausted if that one table is
queried often, while the other nodes sit idle. See Figure 2 for an example.
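
As a rough illustration, here is a minimal Python sketch of the idea (the table names and node identifiers are hypothetical):

# Naive table partitioning: each table lives entirely on one node.
# This routing table is a hypothetical example.
TABLE_TO_NODE = {
    "users": "node1",
    "orders": "node2",
}

def route_query(table_name: str) -> str:
    """Return the node that stores the given table."""
    return TABLE_TO_NODE[table_name]

# Every query on "orders" hits node2; a frequently queried table can
# exhaust that one node while the other nodes sit idle.
print(route_query("orders"))  # node2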
Another way of partitioning is vertical partitioning, which splits a table’s attributes into separate partitions.
Each partition must also store tuple information for reconstructing the original record.
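
A minimal sketch of vertical partitioning, assuming a hypothetical table with a tuple id and a split between frequently and rarely accessed columns:

# Vertical partitioning: split a table's attributes across partitions.
# Each partition keeps the tuple id so the original record can be rebuilt.
row = {"id": 42, "name": "Alice", "bio": "long text...", "avatar": b"\x89PNG..."}

hot_partition = {"id": row["id"], "name": row["name"]}                           # small, frequently read
cold_partition = {"id": row["id"], "bio": row["bio"], "avatar": row["avatar"]}   # large, rarely read

# Reconstructing the original record joins the partitions on the tuple id.
assert hot_partition["id"] == cold_partition["id"]
reconstructed = {**cold_partition, **hot_partition}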
More commonly, horizontal partitioning is used, which splits a table's tuples into disjoint subsets. The DBMS
chooses column(s) that divide the database equally in terms of size, load, or usage, called the partitioning key(s).


Figure 2: Naive Table Partitioning – Given two tables, place all the tuples in table
one into one partition and the tuples in table two into the other.

The DBMS can partition a database physically (shared nothing) or logically (shared disk) based on hashing,
data ranges, or predicates. See Figure 3 for an example. The problem with hash partitioning is that when a
node is added or removed, a lot of data has to be shuffled around. The solution to this is Consistent Hashing.
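
The following Python sketch (hypothetical key set and node counts) illustrates why: with naive hash partitioning, changing the number of nodes reassigns most keys.

import hashlib

def node_for(key: str, num_nodes: int) -> int:
    """Naive hash partitioning: map a partitioning key to hash(key) mod num_nodes."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % num_nodes

keys = [f"user{i}" for i in range(10_000)]

# Going from 4 nodes to 5 reassigns roughly 4/5 of all keys, so most of
# the data would have to be shuffled between nodes.
moved = sum(node_for(k, 4) != node_for(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys change nodes")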

Figure 3: Horizontal Table Partitioning – Use hash partitioning to decide where to send the data. When the
DBMS receives a query, it will use the table's partitioning key(s) to find out where the data is.

Consistent Hashing assigns every node to a location on some logical ring. Then the hash of every parti-
tion key maps to a location on the ring. The node that is closest to the key in the clockwise direction is
responsible for that key. See Figure 4 for an example. When a node is added or removed, keys are only
moved between nodes adjacent to the new/removed node, so only a 1/n fraction of the keys is moved.
A replication factor of k means that each key is replicated at the k closest nodes in the clockwise direction.
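
A minimal sketch of a consistent hashing ring in Python, assuming hypothetical node names and a single position per node (real systems typically add virtual nodes to balance load):

import bisect
import hashlib

def ring_position(value: str) -> int:
    """Place a value on the logical ring (a 128-bit hash space)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs represent the ring.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str):
        # Walk clockwise from the key's position to the first node.
        idx = bisect.bisect_right(self._positions, ring_position(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["P1", "P2", "P3"])
print(ring.node_for("key1"))  # whichever node sits clockwise from hash("key1")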
Logical Partitioning: A node is responsible for a set of keys, but it doesn’t actually store those keys. This
is commonly used in a shared disk architecture.
Physical Partitioning: A node is responsible for a set of keys, and it physically stores those keys. This is
commonly used in a shared nothing architecture.

5 Distributed Concurrency Control


A distributed transaction accesses data at one or more partitions, which requires expensive coordination.

Centralized coordinator
The centralized coordinator acts as a global “traffic cop” that coordinates all the behavior. See Figure 5 for
a diagram.


Figure 4: Consistent Hashing – All nodes are responsible for some portion of hash
ring. Here, node P1 is responsible for storing key1 and node P3 is responsible for storing key2.

Middleware
Centralized coordinators can be used as middleware, which accepts query requests and routes queries to
the correct partitions.

Decentralized coordinator
In a decentralized approach, nodes organize themselves. The client directly sends queries to one of the
partitions. This home partition will send results back to the client. The home partition is in charge of
communicating with other partitions and committing accordingly.
Centralized approaches can become a bottleneck when multiple clients try to acquire locks on the same
partitions. However, a centralized coordinator can be better for distributed 2PL because it has a central view
of the locks and can handle deadlocks more quickly. This is non-trivial with decentralized approaches.


Figure 5: Centralized Coordinator – The client communicates with the coordinator to acquire locks on the
partitions that the client wants to access. Once it receives an acknowledgement from the coordinator, the
client sends its queries to those partitions. Once all queries for a given transaction are done, the client sends
a commit request to the coordinator. The coordinator then communicates with the partitions involved in the
transaction to determine whether the transaction is allowed to commit.
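
A rough Python sketch of this flow, with hypothetical Coordinator and Partition interfaces (a real system would use actual lock management and a full commit protocol such as two-phase commit):

# Hypothetical interfaces; the method names here are illustrative, not a real system's API.
class Partition:
    def execute(self, query: str) -> str:
        return f"result of {query!r}"

    def can_commit(self) -> bool:
        return True  # vote yes if this partition is able to commit

class Coordinator:
    def __init__(self, partitions):
        self.partitions = partitions

    def acquire_locks(self, partition_ids) -> bool:
        return True  # lock management elided in this sketch

    def commit(self, partition_ids) -> bool:
        # The transaction commits only if every involved partition agrees.
        return all(self.partitions[p].can_commit() for p in partition_ids)

# Client-side flow: lock -> run queries on the partitions -> ask the coordinator to commit.
partitions = {1: Partition(), 2: Partition()}
coordinator = Coordinator(partitions)
if coordinator.acquire_locks([1, 2]):
    results = [partitions[p].execute("SELECT ...") for p in (1, 2)]
    committed = coordinator.commit([1, 2])
    print(results, committed)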

