Unit 4 - DSRM

Distributed database systems use the Two-Phase Commit (2PC) protocol for transaction commit, and recovery is facilitated by logging and checkpoints. The 2PC protocol involves a prepare phase, in which participants vote to commit or abort, and a commit phase. Logging records changes to a log file before they are applied to the database. Checkpoints periodically save the system state to stable storage to provide a recovery starting point.

Unit 4

Failures and Their Classification:

Definition: A failure in a distributed database system refers to any event that disrupts the normal
operation of the system, resulting in the loss of data consistency, availability, or reliability.

Types of Failures:

- Hardware Failures: Failures in physical components such as servers, disks, or network devices.

- Software Failures: Errors or bugs in the software components of the database system, such as the
database management system (DBMS) or applications.

- Network Failures: Communication failures or network outages that prevent data transmission
between distributed nodes.

- Site Failures: Failures that affect an entire site or data center, resulting in the loss of access to all
resources hosted at that location.

- Media Failures: Physical damage or corruption to storage media, such as disks or tapes, leading to
data loss or corruption.

Classification:

- Transient Failures: Temporary failures that can be recovered from quickly, such as a network
glitch or a brief power outage.

- Permanent Failures: Irreversible failures that require more extensive recovery procedures, such
as hardware failures or data corruption.

__________________________________________________________________________________

Checkpoints and Recovery:

1. Checkpoints:

- Definition: Checkpoints are predefined moments in time when the state of a distributed database
system is saved to stable storage, allowing recovery to a consistent state after a failure.

- Purpose: Checkpoints help reduce the amount of work needed during recovery by providing a
consistent starting point.

- Types:

- Periodic Checkpoints: Scheduled at regular intervals to save the current state of the system.

- Forced Checkpoints: Triggered manually or automatically in response to specific events, such as
a transaction commit or an orderly system shutdown.

2. Recovery:

- Definition: Recovery in a distributed database system involves restoring the system to a
consistent state after a failure occurs.

- Phases:

- Analysis: Identifying the transactions that were in progress at the time of failure and
determining the necessary actions for recovery.

- Undo: Reverting the effects of incomplete transactions by rolling them back to their pre-failure
state.

- Redo: Reapplying the effects of committed transactions that were lost due to the failure.

- Techniques:

- Backward Recovery: Reverting to a previous consistent state and replaying transactions from
that point forward.

- Forward Recovery: Applying recovery actions directly to the current state of the system without
reverting to a previous state.
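
As a rough illustration of the three phases above, the following Python sketch recovers a small key-value store from a log. The record layout (`("update", txn, key, old, new)` and `("commit", txn)`) and the function name `recover` are assumptions made for this example; they are not taken from any particular DBMS.

```python
def recover(db, log):
    """Return a recovered copy of `db` (hypothetical helper for this sketch)."""
    state = dict(db)

    # Analysis: a transaction counts as committed only if its commit record is in the log.
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo: scan the log backwards and restore old values written by incomplete transactions.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old, _ = rec
            if old is None:
                state.pop(key, None)  # the key did not exist before the transaction
            else:
                state[key] = old

    # Redo: scan the log forwards and reapply new values written by committed transactions.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, _, new = rec
            state[key] = new
    return state


log = [
    ("update", "T1", "x", 0, 1),
    ("commit", "T1"),
    ("update", "T2", "y", 10, 20),  # T2 never committed
]
# State found on disk after the crash: T1's write was lost, T2's write got through.
crashed_db = {"x": 0, "y": 20}
recovered = recover(crashed_db, log)
```

In this example T1's lost update is redone and T2's partial update is undone, so the store ends in a state that reflects committed work only.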

3. Recovery Protocols:

- Two-Phase Commit (2PC): Ensures atomicity of distributed transactions by coordinating commit or
rollback decisions among participating nodes. Durability additionally depends on each node logging
its decisions to stable storage.

- Three-Phase Commit (3PC): Extends 2PC with a pre-commit phase between voting and the final
commit, so that participants are less likely to block when the coordinator fails.

Process Resilience

Definition: Process resilience refers to the ability of a system or application to continue functioning
despite failures or disruptions.

Fault Tolerance:

- Redundancy: Introducing duplicate processes or components to ensure continued operation if
one fails.

- Failure Detection: Detecting failures quickly to initiate recovery processes.

- Recovery Mechanisms: Implementing strategies such as checkpointing and rollback to recover
from failures.
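
Failure detection is often built on heartbeats. The sketch below is one minimal way to do it, assuming each node periodically reports in and a node is suspected once it has been silent for longer than a timeout. The class name `HeartbeatDetector` and the explicit `now` arguments are choices made for this example so that it runs deterministically.

```python
class HeartbeatDetector:
    """Suspects a node when no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of its most recent heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def suspected(self, now):
        # A suspicion is not proof of failure: the node may only be slow or partitioned.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


detector = HeartbeatDetector(timeout=3.0)
detector.heartbeat("A", now=0.0)
detector.heartbeat("B", now=0.0)
detector.heartbeat("A", now=4.0)        # A keeps reporting, B goes silent
suspects = detector.suspected(now=5.0)  # B has been silent for 5.0 s
```

A shorter timeout detects failures sooner but raises the risk of suspecting a healthy node that is merely slow.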

Techniques:

- Replication: Running multiple instances of a process on different nodes to tolerate failures.

- Isolation: Isolating individual processes to prevent failures from propagating to other
components.

- Graceful Degradation: Prioritizing essential functions to maintain basic functionality during failure
conditions.

Challenges:

- Overhead: Replication and recovery mechanisms can introduce overhead in terms of resources
and performance.

- Consistency: Ensuring consistency across replicated processes while maintaining performance.

- Complexity: Designing and managing resilient systems can be complex and require careful
planning.

__________________________________________________________________________________

Reliable Client-Server Communication:

Definition: Reliable client-server communication ensures that data is transmitted accurately and in
the correct order between clients and servers, even in the presence of failures or network issues.

Techniques:

- Acknowledgments: Using acknowledgments to confirm successful receipt of data and
retransmitting if necessary.

- Sequence Numbers: Assigning sequence numbers to data packets to ensure correct ordering.

- Timeouts and Retransmissions: Setting timeouts to detect lost packets and retransmitting them if
no acknowledgment is received.
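
The three techniques above can be combined in a stop-and-wait sender. The following Python sketch simulates it over a lossy channel. The `losses` dictionary, which maps a transmission attempt to `"data"` or `"ack"` to say what the channel drops, is an assumption used to keep the simulation deterministic, and a timeout is modeled simply as "no acknowledgment arrived for this attempt".

```python
def send_reliably(messages, losses, max_retries=5):
    """Simulated stop-and-wait transfer; returns (delivered payloads, attempts used)."""
    delivered = []    # what the receiver hands to the application, in order
    expected_seq = 0  # receiver side: next sequence number it will accept
    attempt = 0       # global counter of transmissions
    for seq, payload in enumerate(messages):
        acked = False
        for _ in range(max_retries):
            fate = losses.get(attempt)  # None, "data", or "ack"
            attempt += 1
            if fate == "data":
                continue  # packet lost: the sender's timer expires and it retransmits
            # The packet reached the receiver.
            if seq == expected_seq:
                delivered.append(payload)
                expected_seq += 1
            # Otherwise it is a duplicate: discard it, but still acknowledge it.
            if fate == "ack":
                continue  # acknowledgment lost: the sender times out and retransmits
            acked = True
            break
        if not acked:
            raise TimeoutError(f"no acknowledgment for sequence number {seq}")
    return delivered, attempt


# Attempt 0 loses the data packet; attempt 2 loses the acknowledgment.
delivered, attempts = send_reliably(["a", "b", "c"], {0: "data", 2: "ack"})
```

The lost acknowledgment forces a retransmission of "b", and the sequence number is what lets the receiver recognize the duplicate instead of delivering it twice.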

Protocols:

- TCP (Transmission Control Protocol): Provides reliable, connection-oriented communication with
mechanisms such as acknowledgment, retransmission, and flow control.

- HTTP (Hypertext Transfer Protocol): Traditionally built on top of TCP, it relies on TCP's reliable
delivery to transfer web data between clients and servers.

- RPC (Remote Procedure Call): Abstracts procedure calls over the network. Its reliability depends on
the underlying transport and on the call semantics chosen, such as at-least-once or at-most-once.

Challenges:

- Performance: Ensuring reliability without sacrificing performance can be challenging.

- Overhead: Adding reliability mechanisms can increase network overhead and latency.

- Scalability: Maintaining reliability in large-scale distributed systems with many clients and servers
can be complex.

_____________________________________________________________________

Reliable Group Communication:


Definition: Reliable group communication ensures that messages are delivered to all members of a
group in a consistent and ordered manner, even in the presence of failures or network partitions.

Techniques:

- Total Order: Ensuring that messages are delivered to all group members in the same order.

- View Synchronization: Keeping group members synchronized to detect failures and maintain
consistency.

- Membership Management: Handling dynamic changes in group membership due to joins, leaves,
or failures.
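
One common way to obtain total order is a central sequencer that stamps every message with a global sequence number. The sketch below assumes this approach. The class names `Sequencer` and `Member` are hypothetical, and a failure of the sequencer itself is not handled.

```python
class Sequencer:
    """Assigns consecutive global sequence numbers to messages."""

    def __init__(self):
        self.next_seq = 0

    def stamp(self, msg):
        stamped = (self.next_seq, msg)
        self.next_seq += 1
        return stamped


class Member:
    """Group member that delivers messages strictly in sequence-number order."""

    def __init__(self):
        self.next_expected = 0
        self.pending = {}    # messages that arrived early, keyed by sequence number
        self.delivered = []

    def receive(self, stamped):
        seq, msg = stamped
        self.pending[seq] = msg
        # Deliver as long as the next expected message is available.
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1


sequencer = Sequencer()
stamped = [sequencer.stamp(m) for m in ["m1", "m2", "m3"]]

p, q = Member(), Member()
for s in stamped:            # p receives the messages in sending order
    p.receive(s)
for s in reversed(stamped):  # q receives them in reverse order from the network
    q.receive(s)
```

Both members end up with the same delivery sequence even though the network handed them the messages in different orders.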

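Protocols:

- IP Multicast: Allows for one-to-many communication by sending packets to a group of destination
hosts. On its own it offers only best-effort delivery, so reliable multicast protocols add
acknowledgments or retransmissions on top of it.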

- Paxos: A consensus protocol used to ensure agreement among a group of nodes in a distributed
system.

- Virtual Synchrony: Maintains a consistent view of the group by synchronizing membership
changes and message delivery.

Challenges:

- Scalability: Ensuring reliable group communication in large-scale distributed systems with many
members.

- Fault Tolerance: Handling failures and network partitions while maintaining consistency.

- Complexity: Designing and implementing reliable group communication protocols can be complex
and require careful consideration of various factors.

Mechanisms for Commit and Recovery in Distributed Database Systems

Ans: In distributed database systems, the Two-Phase Commit (2PC) protocol is commonly used for
commit, and recovery is often facilitated by techniques such as logging and checkpoints.

Two-Phase Commit Protocol:

1. Prepare Phase:

- The coordinator (typically the transaction manager) sends a prepare request to all participants
(resource managers) involved in the transaction.

- Each participant responds with either a "yes" (vote to commit) or "no" (vote to abort).

- If any participant votes "no" (indicating it cannot commit the transaction), the coordinator
proceeds to the abort phase.

2. Commit Phase:

- If all participants vote "yes" in the prepare phase, the coordinator sends a commit request to all
participants.

- Upon receiving the commit request, each participant performs the commit operation, making the
transaction's changes permanent.

- After successfully committing, the participant acknowledges the coordinator.

3. Abort (the alternative outcome of the second phase):

- If any participant votes "no" in the prepare phase or if the coordinator times out waiting for
responses, the coordinator sends an abort request to all participants.

- Upon receiving the abort request, each participant rolls back the transaction, undoing any
changes made by the transaction.

- After successfully aborting, the participant acknowledges the coordinator.
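
The message flow above can be sketched in a few lines of Python. This is a simplified, in-process model: the `Participant` class and `two_phase_commit` function are hypothetical names, network calls are plain method calls, and a participant that raises an exception stands in for one that fails or times out. Logging of votes and decisions, which a real implementation needs to survive crashes, is omitted.

```python
class Participant:
    """In-process stand-in for a resource manager."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Vote "yes" only if this participant is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): collect a vote from every participant.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except Exception:
            votes.append(False)  # a failure or timeout counts as a "no" vote

    # Phase 2: commit only if every vote was "yes", otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "commit"
    for p in participants:
        p.abort()
    return "abort"


ok_group = [Participant("A"), Participant("B")]
ok_result = two_phase_commit(ok_group)

bad_group = [Participant("A"), Participant("B", can_commit=False)]
bad_result = two_phase_commit(bad_group)
```

A single "no" vote is enough to abort the whole transaction, which is how the protocol keeps the outcome atomic across all participants.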

Recovery Mechanisms: Logging and Checkpoints

1. Logging:

- Logging involves recording all changes made by transactions to a log file before they are applied
to the database.

- During recovery, the log is replayed to redo committed transactions and undo incomplete or aborted
transactions, bringing the system to a consistent state.

- Write-Ahead Logging (WAL) is a common logging protocol where changes are written to the log
before being applied to the database to ensure durability.

2. Checkpoints:

- Checkpoints involve periodically saving the system state to stable storage.

- During recovery, the system can restore the state saved at the last checkpoint and replay the log
from that point to recover transactions committed after the checkpoint.

- Checkpoints help reduce the time and resources required for recovery by providing a consistent
starting point.
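
To show how logging and checkpoints work together, here is a minimal Python sketch of a store that logs every write before applying it and recovers from the last checkpoint plus the log tail. The `Store` class is hypothetical, in-memory lists stand in for stable storage, and every write is assumed to be its own committed transaction, so only redo is needed.

```python
import copy


class Store:
    def __init__(self):
        self.data = {}
        self.log = []               # stand-in for a log file on stable storage
        self.checkpoint = ({}, 0)   # (snapshot of data, log position at snapshot time)

    def write(self, key, value):
        self.log.append((key, value))  # write-ahead: record the change first
        self.data[key] = value         # then apply it to the database

    def take_checkpoint(self):
        self.checkpoint = (copy.deepcopy(self.data), len(self.log))

    def recover(self):
        """Restore the last checkpoint, replay the log tail, return records replayed."""
        snapshot, position = self.checkpoint
        self.data = copy.deepcopy(snapshot)
        for key, value in self.log[position:]:
            self.data[key] = value
        return len(self.log) - position


store = Store()
store.write("x", 1)
store.write("y", 2)
store.take_checkpoint()
store.write("x", 3)
store.data = {}             # simulate a crash that loses the in-memory state
replayed = store.recover()
```

Only the single record written after the checkpoint is replayed here; without the checkpoint, recovery would have to replay the whole log.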
