Unit 4 - DSRM

Distributed database systems commonly use the Two-Phase Commit (2PC) protocol for transaction commit, while recovery is facilitated by logging and checkpoints. 2PC has a prepare phase, in which participants vote to commit or abort, and a commit phase, in which the coordinator's decision is carried out. Logging records changes in a log file before they are applied to the database. Checkpoints periodically save the system state to stable storage to provide a starting point for recovery.

Uploaded by

kashish.sharma.batch2021


Unit 4

Failures and Their Classification:

Definition: A failure in a distributed database system refers to any event that disrupts the normal
operation of the system, resulting in the loss of data consistency, availability, or reliability.

Types of Failures:

- Hardware Failures: Failures in physical components such as servers, disks, or network devices.

- Software Failures: Errors or bugs in the software components of the database system, such as the
database management system (DBMS) or applications.

- Network Failures: Communication failures or network outages that prevent data transmission
between distributed nodes.

- Site Failures: Failures that affect an entire site or data center, resulting in the loss of access to all
resources hosted at that location.

- Media Failures: Physical damage or corruption to storage media, such as disks or tapes, leading to
data loss or corruption.

Classification:

- Transient Failures: Temporary failures that can be recovered from quickly, such as a network
glitch or a brief power outage.

- Permanent Failures: Irreversible failures that require more extensive recovery procedures, such
as hardware failures or data corruption.
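
The distinction matters in practice because a transient failure is usually worth retrying, while a permanent one is not. The following is a minimal sketch of that idea. The exception classes `TransientFailure` and `PermanentFailure` and the helpers `call_with_retry` and `make_flaky` are hypothetical names introduced only for illustration; a real system would also wait between attempts.

```python
class TransientFailure(Exception):
    """Hypothetical: a temporary fault such as a network glitch."""


class PermanentFailure(Exception):
    """Hypothetical: an irreversible fault such as a failed disk."""


def call_with_retry(operation, max_attempts=3):
    """Retry on transient failures; permanent failures propagate at once."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientFailure:
            if attempt == max_attempts:
                raise  # still failing: hand over to heavier recovery


def make_flaky(failures_before_success):
    """Build an operation that fails transiently a fixed number of times."""
    state = {"remaining": failures_before_success}

    def operation():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientFailure("network glitch")
        return "ok"

    return operation
```

With the default of three attempts, `call_with_retry(make_flaky(2))` succeeds on the third try, whereas an operation that raises `PermanentFailure` is never retried.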

__________________________________________________________________________________

Checkpoints and Recovery:

1. Checkpoints:

- Definition: Checkpoints are predefined moments in time when the state of a distributed database
system is saved to stable storage, allowing recovery to a consistent state after a failure.

- Purpose: Checkpoints help reduce the amount of work needed during recovery by providing a
consistent starting point.

- Types:

- Periodic Checkpoints: Scheduled at regular intervals to save the current state of the system.

- Forced Checkpoints: Triggered manually or automatically in response to specific events, such as
transaction commits or a planned system shutdown.

2. Recovery:

- Definition: Recovery in a distributed database system involves restoring the system to a
consistent state after a failure occurs.

- Phases:

- Analysis: Identifying the transactions that were in progress at the time of failure and
determining the necessary actions for recovery.

- Undo: Reverting the effects of incomplete transactions by rolling them back to their pre-failure
state.

- Redo: Reapplying the effects of committed transactions that were lost due to the failure.

- Techniques:

- Backward Recovery: Reverting to a previous consistent state and replaying transactions from
that point forward.

- Forward Recovery: Applying recovery actions directly to the current state of the system without
reverting to a previous state.
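
The three phases can be sketched over a simple in-memory log. This is only an illustration under stated assumptions: the log is a hypothetical list of tuples (`("BEGIN", txn)`, `("WRITE", txn, key, old, new)`, `("COMMIT", txn)`), and the sketch runs redo before undo, which is the order used by ARIES-style recovery and differs from the order in which the phases are listed above.

```python
def recover(log, db):
    """Bring `db` to a consistent state using a log of tuples (hypothetical format)."""
    # Analysis: transactions with a COMMIT record are winners; all others are losers.
    committed = {record[1] for record in log if record[0] == "COMMIT"}

    # Redo: reapply every logged write in log order, so no committed change is lost.
    for record in log:
        if record[0] == "WRITE":
            _, txn, key, old, new = record
            db[key] = new

    # Undo: walk the log backwards and restore old values written by losers.
    for record in reversed(log):
        if record[0] == "WRITE" and record[1] not in committed:
            _, txn, key, old, new = record
            db[key] = old
    return db


log = [
    ("BEGIN", "T1"),
    ("WRITE", "T1", "x", 0, 5),
    ("BEGIN", "T2"),
    ("WRITE", "T2", "y", 0, 7),
    ("COMMIT", "T1"),
]
# State found after the crash: T1's write was lost, T2's uncommitted write reached disk.
recovered = recover(log, {"x": 0, "y": 7})
```

Here `T1` committed, so its write to `x` is kept, while `T2` never committed, so its write to `y` is rolled back.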

3. Recovery Protocols:

- Two-Phase Commit (2PC): Ensures atomicity and durability of distributed transactions by
coordinating commit or rollback decisions among participating nodes.

- Three-Phase Commit (3PC): Extends 2PC with an additional pre-commit phase between voting and
commit, so that participants are less likely to block if the coordinator fails.

Process Resilience

Definition: Process resilience refers to the ability of a system or application to continue functioning
despite failures or disruptions.

Fault Tolerance:

- Redundancy: Introducing duplicate processes or components to ensure continued operation if
one fails.

- Failure Detection: Detecting failures quickly to initiate recovery processes.

- Recovery Mechanisms: Implementing strategies such as checkpointing and rollback to recover
from failures.
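
Failure detection is often built on heartbeats: each process reports in periodically, and a process that stays silent for too long is suspected of having failed. A minimal sketch follows, assuming a hypothetical `HeartbeatDetector` class; time is passed in explicitly so that the behaviour is deterministic.

```python
class HeartbeatDetector:
    """Hypothetical detector: suspects nodes whose last heartbeat is too old."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of the most recent heartbeat

    def heartbeat(self, node, now):
        """Record that `node` reported in at time `now`."""
        self.last_seen[node] = now

    def suspected(self, now):
        """Return the nodes that have been silent for longer than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


detector = HeartbeatDetector(timeout=3)
detector.heartbeat("A", now=0)
detector.heartbeat("B", now=0)
detector.heartbeat("A", now=4)   # A keeps reporting; B has gone silent
suspects = detector.suspected(now=5)
```

A suspected node is not necessarily dead; a slow network produces the same silence, which is why the timeout is a trade-off between fast detection and false alarms.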

Techniques:

- Replication: Running multiple instances of a process on different nodes to tolerate failures.

- Isolation: Isolating individual processes to prevent failures from propagating to other
components.

- Graceful Degradation: Prioritizing essential functions to maintain basic functionality during failure
conditions.

Challenges:

- Overhead: Replication and recovery mechanisms can introduce overhead in terms of resources
and performance.

- Consistency: Ensuring consistency across replicated processes while maintaining performance.

- Complexity: Designing and managing resilient systems can be complex and require careful
planning.

__________________________________________________________________________________

Reliable Client-Server Communication:

Definition: Reliable client-server communication ensures that data is transmitted accurately and in
the correct order between clients and servers, even in the presence of failures or network issues.

Techniques:

- Acknowledgments: Using acknowledgments to confirm successful receipt of data and
retransmitting if necessary.

- Sequence Numbers: Assigning sequence numbers to data packets to ensure correct ordering.

- Timeouts and Retransmissions: Setting timeouts to detect lost packets and retransmitting them if
no acknowledgment is received.
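
The three techniques work together, which a small stop-and-wait simulation can show. This is a sketch under assumptions, not a real network stack: `Receiver` and `send_all` are hypothetical names, and losses are simulated by listing the `(sequence number, attempt)` pairs whose packet or acknowledgment is dropped.

```python
class Receiver:
    """Hypothetical receiver: delivers each sequence number exactly once."""

    def __init__(self):
        self.expected = 0
        self.delivered = []

    def receive(self, seq, payload):
        if seq == self.expected:          # new packet: deliver it
            self.delivered.append(payload)
            self.expected += 1
        return seq                        # acknowledge, including duplicates


def send_all(messages, receiver, lost_packets=(), lost_acks=(), max_attempts=5):
    """Send messages one at a time, retransmitting until each is acknowledged."""
    transmissions = 0
    for seq, payload in enumerate(messages):
        for attempt in range(max_attempts):
            transmissions += 1
            if (seq, attempt) in lost_packets:
                continue                  # packet lost: timeout, then retransmit
            ack = receiver.receive(seq, payload)
            if (seq, attempt) in lost_acks:
                continue                  # ack lost: timeout, then retransmit
            if ack == seq:
                break                     # acknowledged: move to the next message
        else:
            raise TimeoutError(f"no acknowledgment for packet {seq}")
    return transmissions


receiver = Receiver()
sent = send_all(["a", "b", "c"], receiver,
                lost_packets={(0, 0)}, lost_acks={(1, 0)})
```

When the acknowledgment for packet 1 is lost, the sender retransmits a packet the receiver already has; the sequence number is what lets the receiver discard the duplicate instead of delivering `"b"` twice.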

Protocols:

- TCP (Transmission Control Protocol): Provides reliable, connection-oriented communication with
mechanisms such as acknowledgment, retransmission, and flow control.

- HTTP (Hypertext Transfer Protocol): Traditionally runs on top of TCP and relies on it for reliable
transfer of web data between clients and servers.

- RPC (Remote Procedure Call): Abstracts communication between distributed systems as procedure
calls over the network; its reliability depends on the underlying transport and on the call semantics
chosen (for example, at-least-once or at-most-once).

Challenges:

- Performance: Ensuring reliability without sacrificing performance can be challenging.

- Overhead: Adding reliability mechanisms can increase network overhead and latency.

- Scalability: Maintaining reliability in large-scale distributed systems with many clients and servers
can be complex.

_____________________________________________________________________

Reliable Group Communication:

Definition: Reliable group communication ensures that messages are delivered to all members of a
group in a consistent and ordered manner, even in the presence of failures or network partitions.

Techniques:

- Total Order: Ensuring that messages are delivered to all group members in the same order.

- View Synchronization: Keeping group members synchronized to detect failures and maintain
consistency.

- Membership Management: Handling dynamic changes in group membership due to joins, leaves,
or failures.
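
One common way to obtain a total order is a fixed sequencer: a single process stamps every message with a global sequence number, and each member delivers messages strictly in stamp order, holding back any that arrive early. The sketch below assumes hypothetical `Sequencer` and `Member` classes and ignores sequencer failure.

```python
class Sequencer:
    """Hypothetical sequencer: assigns consecutive global sequence numbers."""

    def __init__(self):
        self.next_seq = 0

    def stamp(self, message):
        seq = self.next_seq
        self.next_seq += 1
        return seq, message


class Member:
    """Hypothetical group member: delivers messages in sequence-number order."""

    def __init__(self):
        self.next_expected = 0
        self.pending = {}      # messages that arrived ahead of their turn
        self.delivered = []

    def receive(self, seq, message):
        self.pending[seq] = message
        # Deliver as long as the next expected message is available.
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1


sequencer = Sequencer()
stamped = [sequencer.stamp(m) for m in ["m1", "m2", "m3"]]

first, second = Member(), Member()
for seq, message in stamped:             # arrives in order
    first.receive(seq, message)
for seq, message in reversed(stamped):   # arrives in reverse order
    second.receive(seq, message)
```

Both members end up with the same delivery order even though the network delivered the messages to them in different orders.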

Protocols:

- IP Multicast: Allows for one-to-many communication by sending packets to a group of destination
hosts. Delivery is best-effort, so reliability has to be added by a protocol layered on top.

- Paxos: A consensus protocol used to ensure agreement among a group of nodes in a distributed
system.

- Virtual Synchrony: Maintains a consistent view of the group by synchronizing membership
changes and message delivery.

Challenges:

- Scalability: Ensuring reliable group communication in large-scale distributed systems with many
members.

- Fault Tolerance: Handling failures and network partitions while maintaining consistency.

- Complexity: Designing and implementing reliable group communication protocols can be complex
and require careful consideration of various factors.

Mechanism for Commit and Recovery in Distributed Database Systems

Ans: In distributed database systems, the Two-Phase Commit (2PC) protocol is commonly used for
commit, and recovery is often facilitated by techniques such as logging and checkpoints.

Two-Phase Commit Protocol:

1. Prepare Phase:

- The coordinator (typically the transaction manager) sends a prepare request to all participants
(resource managers) involved in the transaction.

- Each participant responds with either a "yes" (vote to commit) or "no" (vote to abort).

- If any participant votes "no" (indicating it cannot commit the transaction) or fails to reply before
the coordinator's timeout, the coordinator decides to abort.

2. Commit Phase:

- If all participants vote "yes" in the prepare phase, the coordinator sends a commit request to all
participants.

- Upon receiving the commit request, each participant performs the commit operation, making the
transaction's changes permanent.

- After successfully committing, the participant sends an acknowledgment to the coordinator.

3. Abort (the alternative outcome of the second phase):

- If any participant votes "no" in the prepare phase or if the coordinator times out waiting for
responses, the coordinator sends an abort request to all participants.

- Upon receiving the abort request, each participant rolls back the transaction, undoing any
changes made by the transaction.

- After successfully aborting, the participant sends an acknowledgment to the coordinator.
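
The message flow described above can be sketched with in-process objects standing in for remote nodes. This is a simplified illustration: `Participant` and `two_phase_commit` are hypothetical names, and the sketch omits the logging, timeouts, and failure handling that a real coordinator needs.

```python
class Participant:
    """Hypothetical resource manager taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        """Vote: True means "yes" (ready to commit), False means "no"."""
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"
        return True   # acknowledgment to the coordinator

    def abort(self):
        self.state = "ABORTED"
        return True   # acknowledgment to the coordinator


def two_phase_commit(participants):
    """Coordinator logic: collect all votes, then broadcast one decision."""
    # Phase 1 (prepare): ask every participant; a list is used instead of a
    # generator so that all() does not skip anyone by short-circuiting.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if every vote was "yes", otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        p.abort()
    return "ABORT"


happy = [Participant("A"), Participant("B")]
unhappy = [Participant("A"), Participant("B", can_commit=False)]
happy_decision = two_phase_commit(happy)
unhappy_decision = two_phase_commit(unhappy)
```

A single "no" vote is enough to abort the transaction at every participant, which is what gives the protocol its all-or-nothing behaviour.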

Recovery Mechanisms: Logging and Checkpoints

1. Logging:

- Logging involves recording all changes made by transactions to a log file before they are applied
to the database.

- During recovery, the log is replayed to redo committed transactions and undo incomplete or aborted
transactions, bringing the system to a consistent state.

- Write-Ahead Logging (WAL) is a common logging protocol in which the log record describing a
change must reach stable storage before the changed data is written to the database, so that
committed changes survive a crash.
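
The write-ahead rule can be illustrated with a tiny key-value store that appends each change to a log file, forces it to disk, and only then updates its in-memory data. This is a sketch under assumptions: `WalStore` is a hypothetical class, the log format (one JSON object per line) is chosen for illustration, and transactions are reduced to single writes.

```python
import json
import os
import tempfile


class WalStore:
    """Hypothetical key-value store that logs every change before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}

    def put(self, key, value):
        # 1. Append the change to the log and force it to stable storage.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only now apply the change to the database itself.
        self.data[key] = value

    @classmethod
    def recover(cls, log_path):
        """Rebuild the store after a crash by replaying the log in order."""
        store = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    store.data[record["key"]] = record["value"]
        return store
```

Because the log entry is durable before the data changes, a crash between the two steps loses nothing: `WalStore.recover(path)` replays the log and arrives at the same state.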

2. Checkpoints:

- Checkpoints involve periodically saving the system state to stable storage.

- During recovery, the system can roll back to the last checkpoint and replay the log from that point
to recover transactions committed after the checkpoint.

- Checkpoints help reduce the time and resources required for recovery by providing a consistent
starting point.
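
How a checkpoint shortens recovery can be sketched as follows. The assumptions are simplifying ones: the log is a hypothetical list of `(key, value)` writes from committed transactions only, and `take_checkpoint` and `recover_from_checkpoint` are illustrative names.

```python
def take_checkpoint(db, log):
    """Snapshot the current state together with the log position it covers."""
    return {"state": dict(db), "log_position": len(log)}


def recover_from_checkpoint(checkpoint, log):
    """Restore the snapshot, then replay only the log entries written after it."""
    db = dict(checkpoint["state"])
    replayed = 0
    for key, value in log[checkpoint["log_position"]:]:
        db[key] = value
        replayed += 1
    return db, replayed


log = [("x", 1), ("y", 2)]
db = {"x": 1, "y": 2}
checkpoint = take_checkpoint(db, log)

# Two more committed writes happen after the checkpoint, then the system crashes.
log.append(("x", 9))
log.append(("z", 4))

recovered, replayed = recover_from_checkpoint(checkpoint, log)
```

Only the two entries written after the checkpoint are replayed; without the checkpoint, recovery would have to start from the beginning of the log.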
