Unit 4 Part 2

Uploaded by menakababu


CS 3551 DISTRIBUTED COMPUTING
Checkpoint example (figure): an ATM withdrawal with A/C balance = 20000 proceeds through PIN entry, amount = 10000, update balance = 10000, and cash dispense.
Checkpoint in Distributed System
What is Domino Effect?
● To see why rollback propagation occurs, consider the situation
where the sender of a message m rolls back to a state that
precedes the sending of m.
● The receiver of m must also roll back to a state that precedes m’s
receipt; otherwise, the states of the two processes would be
inconsistent because they would show that message m was
received without being sent, which is impossible in any correct
failure-free execution.
● This phenomenon of cascaded rollback is called the domino
effect.
● In some situations, rollback propagation may extend back to the initial state of the computation, losing all the work performed before the failure.
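The cascade described above can be made concrete with a small simulation. This is a minimal sketch under assumptions of our own, not an algorithm from the source: each process's events are numbered by a local position, a checkpoint at position c records every event with position below c, position 0 is the initial state, and the function name `recovery_line` and all message positions are hypothetical. Starting from the failed process's latest checkpoint, it repeatedly rolls back the receiver of any orphan message to its latest checkpoint preceding the receive.

```python
from bisect import bisect_right


def recovery_line(checkpoints, current, messages, failed):
    """Roll processes back until no orphan message remains (hypothetical model).

    checkpoints: process -> sorted checkpoint positions, always starting with 0
    current:     process -> number of events executed before the failure
    messages:    list of (sender, send_pos, receiver, recv_pos)
    failed:      the crashed process, which restarts from its latest checkpoint
    A state at position c records every event whose position is < c.
    """
    line = dict(current)
    line[failed] = checkpoints[failed][-1]
    changed = True
    while changed:
        changed = False
        for sender, send_pos, receiver, recv_pos in messages:
            recv_recorded = recv_pos < line[receiver]
            send_recorded = send_pos < line[sender]
            if recv_recorded and not send_recorded:  # orphan: received but never sent
                cps = checkpoints[receiver]
                # latest checkpoint of the receiver at or before the receive event
                line[receiver] = cps[bisect_right(cps, recv_pos) - 1]
                changed = True
    return line


checkpoints = {"P0": [0, 10, 20], "P1": [0, 15, 25]}
current = {"P0": 30, "P1": 30}
# Messages that zigzag between the two processes across their checkpoint intervals
zigzag = [("P1", 26, "P0", 22), ("P0", 21, "P1", 17), ("P1", 16, "P0", 12),
          ("P0", 11, "P1", 5), ("P1", 2, "P0", 3)]
print(recovery_line(checkpoints, current, zigzag, failed="P1"))      # {'P0': 0, 'P1': 0}
print(recovery_line(checkpoints, current, zigzag[:1], failed="P1"))  # {'P0': 20, 'P1': 25}
```

With independently taken checkpoints, the zigzag pattern drags both processes back to the initial state, which is the domino effect; with only the first message, P0 merely rolls back to its checkpoint at position 20.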
Domino effect (continued)
● Independent or uncoordinated checkpointing: if each participating process takes its checkpoints independently, then the system is susceptible to the domino effect.
How to avoid domino effect?
● Coordinated checkpointing:
○ Processes coordinate their checkpoints to form a system-wide consistent state.
○ In case of a process failure, the system state can be restored to such a consistent set of checkpoints, preventing rollback propagation.
● Communication-induced checkpointing:
○ It forces each process to take checkpoints based on information piggybacked on the application messages it receives from other processes.
○ Checkpoints are taken such that a system-wide consistent state always exists on stable storage, thereby avoiding the domino effect.
● Log-based rollback recovery:
○ It combines checkpointing with logging of non-deterministic events.
○ Log-based rollback recovery relies on the piecewise deterministic (PWD) assumption, which postulates that all non-deterministic events that a process executes can be identified and that the information necessary to replay each event during recovery can be logged in the event’s determinant.
○ By logging and replaying the non-deterministic events in their exact original order, a process can deterministically recreate its pre-failure state even if this state has not been checkpointed.
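The replay step of log-based recovery can be illustrated with a toy process. This is a minimal sketch assuming a single process whose only non-deterministic events are integer inputs; the class `PWDProcess`, its methods, and the in-memory `log` standing in for stable storage are all hypothetical. Because the code between events is deterministic (the PWD assumption), replaying the logged determinants in order from the last checkpoint reproduces the pre-failure state, including the part that was never checkpointed.

```python
import copy
import random


class PWDProcess:
    """Hypothetical process satisfying the piecewise deterministic (PWD) assumption."""

    def __init__(self):
        self.state = {"total": 0, "events": 0}
        self.checkpoint = copy.deepcopy(self.state)
        self.log = []  # determinants since the last checkpoint (stand-in for stable storage)

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()  # earlier determinants are no longer needed for replay

    def _apply(self, value):
        # Deterministic code executed after each non-deterministic event
        self.state["total"] = self.state["total"] * 2 + value
        self.state["events"] += 1

    def on_event(self, value):
        # Log the determinant (here, the event's value) before acting on it
        self.log.append(value)
        self._apply(value)

    def recover(self):
        # Restore the checkpoint, then replay determinants in their original order
        self.state = copy.deepcopy(self.checkpoint)
        for value in self.log:
            self._apply(value)


rng = random.Random(7)
p = PWDProcess()
p.on_event(rng.randint(0, 9))
p.take_checkpoint()
for _ in range(3):
    p.on_event(rng.randint(0, 9))  # events after the checkpoint exist only in the log
pre_failure = copy.deepcopy(p.state)
p.state = None  # crash: the volatile state is lost
p.recover()
assert p.state == pre_failure
```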
Key Points
● Rollback recovery treats a distributed system application as a
collection of processes that communicate over a network.
● It achieves fault tolerance by periodically saving the state of a
process during the failure-free execution, enabling it to restart
from a saved state upon a failure to reduce the amount of lost
work.
● The saved state is called a checkpoint, and the procedure of
restarting from a previously checkpointed state is called rollback
recovery.
● A checkpoint can be saved on either stable storage or volatile storage, depending on the failure scenarios to be tolerated.
● Challenges for recovery:
○ Messages exchanged during failure-free execution create dependencies among processes. On a failure of one or more processes, these dependencies may force some of the processes that did not fail to roll back, creating what is commonly called rollback propagation.
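The idea that periodic checkpoints reduce lost work can be seen in a toy single-process run. This sketch assumes a process whose state is a running sum, a checkpoint every `interval` steps, and one crash just before step `fail_at`; the function `run_with_checkpoints` and its parameters are hypothetical. The work lost to the failure is the number of steps re-executed after rolling back to the most recent checkpoint.

```python
def run_with_checkpoints(total_steps, interval, fail_at):
    """Simulate one process that checkpoints every `interval` steps and crashes once,
    just before executing step `fail_at`. Returns (final_state, steps_redone)."""
    state = 0              # process state: running sum of completed step numbers
    checkpoint = (0, 0)    # (next_step, state) kept on stable storage; initial state
    step = 0
    redone = 0
    failed = False
    while step < total_steps:
        if step == fail_at and not failed:
            failed = True
            next_step, state = checkpoint  # rollback recovery from the saved state
            redone = step - next_step      # work lost and executed again
            step = next_step
            continue
        state += step
        step += 1
        if step % interval == 0:
            checkpoint = (step, state)     # periodic checkpoint
    return state, redone


print(run_with_checkpoints(100, interval=10, fail_at=57))    # (4950, 7)
print(run_with_checkpoints(100, interval=1000, fail_at=57))  # (4950, 57)
```

Both runs reach the same final state, but the second (effectively no checkpoints beyond the initial state) re-executes all 57 steps. The sketch ignores the cost of writing each checkpoint, which is what pushes real systems away from very short intervals.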
Background and Definitions
1. System Model
2. Local Checkpoint
3. Consistent system states
4. Interactions with the outside world
5. Different types of messages
1. System Model

● A distributed system consists of a fixed number of processes, P1, P2, …, PN, which communicate only through messages.
● Processes cooperate to execute a distributed application and interact
with the outside world by receiving and sending input and output
messages, respectively.
● Some protocols assume that the communication subsystem delivers
messages reliably, in first-in-first-out (FIFO) order, while other
protocols assume that the communication subsystem can lose,
duplicate, or reorder messages.
● A system recovers correctly if its internal state is consistent with the observable behavior of the system before the failure.
2. Local Checkpoint (at each process level)
1. A local checkpoint is a snapshot of the state of the process
at a given instance and the event of recording the state of a
process is called local checkpointing.
2. The contents of a checkpoint depend upon the application
context and the checkpointing method being used.
3. Depending upon the checkpointing method used, a process may keep several local checkpoints or just a single checkpoint at any time.
4. We assume that a process stores all local checkpoints on stable storage so that they are available even if the process crashes.
5. We also assume that a process is able to roll back to any of its existing local checkpoints and thus restore to and restart from the corresponding state.
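A local checkpoint store with the properties listed above (several checkpoints per process, kept on stable storage, rollback to any of them) might look like the following. This is a minimal sketch, assuming JSON-serializable process state and a local directory standing in for stable storage; the class `LocalCheckpointStore` and its `keep` parameter are hypothetical. Each checkpoint is written to a temporary file, forced to disk with `os.fsync`, and then renamed with `os.replace`, so a crash during the write never leaves a half-written checkpoint under a valid name; a production store would also sync the containing directory.

```python
import json
import os
import tempfile


class LocalCheckpointStore:
    """Hypothetical per-process checkpoint store; a directory stands in for stable storage."""

    def __init__(self, directory, keep=3):
        self.directory = directory
        self.keep = keep  # checkpoints retained; keep=1 gives a single-checkpoint scheme
        os.makedirs(directory, exist_ok=True)

    def _path(self, seq):
        return os.path.join(self.directory, f"ckpt-{seq:06d}.json")

    def sequence_numbers(self):
        names = os.listdir(self.directory)
        return sorted(int(n[5:-5]) for n in names
                      if n.startswith("ckpt-") and n.endswith(".json"))

    def save(self, state):
        seqs = self.sequence_numbers()
        seq = seqs[-1] + 1 if seqs else 0
        # Write to a temporary file and force it to disk before giving it its final name
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self._path(seq))  # atomic rename on the same file system
        for old in seqs[: max(0, len(seqs) + 1 - self.keep)]:
            os.remove(self._path(old))  # discard the oldest checkpoints beyond `keep`
        return seq

    def restore(self, seq=None):
        # Roll back to checkpoint `seq`, or to the latest checkpoint if `seq` is None
        if seq is None:
            seq = self.sequence_numbers()[-1]
        with open(self._path(seq)) as f:
            return json.load(f)


with tempfile.TemporaryDirectory() as d:
    store = LocalCheckpointStore(d, keep=2)
    for x in (1, 2, 3):
        store.save({"x": x})
    print(store.sequence_numbers())  # [1, 2]
    print(store.restore(1))          # {'x': 2}
```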
3. Consistent vs Inconsistent System States
4. Interactions with the Outside World (Outside World Process, OWP)
● For example, a printer cannot roll back the effects of printing a character, and an automatic teller machine cannot recover the money that it dispensed to a customer.
● A distributed application often interacts with the outside world to
receive input data or deliver the outcome of a computation. If a
failure occurs, the outside world cannot be expected to roll back.
● It is therefore necessary that the outside world see a consistent behavior of the system despite failures.
● Output commit: before sending output to the OWP, the system must ensure that the state from which the output is sent will be recovered despite any future failure.
● Input messages:
○ Input messages received from the OWP may not be reproducible during recovery, because it may not be possible for the outside world to regenerate them.
○ Thus, recovery protocols must arrange to save these input messages so that they can be retrieved when needed for execution replay after a failure.
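The output-commit and input-logging rules can be illustrated with the ATM withdrawal from the opening example. This is a minimal sketch, assuming a single ATM process, in-memory lists standing in for volatile memory and stable storage, and no checkpoints beyond the opening balance; the class `ATMProcess` and its `output_commit` flag are hypothetical. With output commit, the logged input is flushed to stable storage before the cash is dispensed, so recovery rebuilds a balance that agrees with what the customer already received; without it, a crash rolls the balance back to 20000 even though 10000 in cash has already left the machine.

```python
class ATMProcess:
    """Hypothetical ATM process; plain lists stand in for volatile and stable storage."""

    def __init__(self, balance, output_commit=True):
        self.opening_balance = balance  # checkpointed state on stable storage
        self.stable_log = []            # input messages flushed to stable storage
        self.volatile_log = []          # input messages logged only in memory
        self.balance = balance          # volatile process state
        self.cash_dispensed = 0         # output seen by the outside world; cannot be undone
        self.output_commit = output_commit

    def withdraw(self, amount):
        # Input from the OWP: the customer cannot regenerate it, so it is logged
        self.volatile_log.append(amount)
        self.balance -= amount  # update balance
        if self.output_commit:
            # Output commit: make the state behind the output recoverable first
            self.stable_log.extend(self.volatile_log)
            self.volatile_log.clear()
        self.cash_dispensed += amount  # cash dispense: an irreversible output

    def crash_and_recover(self):
        self.volatile_log.clear()  # memory contents are lost in the crash
        self.balance = self.opening_balance
        for amount in self.stable_log:
            self.balance -= amount  # replay logged inputs in their original order

    def consistent_with_outside_world(self):
        return self.balance + self.cash_dispensed == self.opening_balance


for commit in (True, False):
    atm = ATMProcess(20000, output_commit=commit)
    atm.withdraw(10000)
    atm.crash_and_recover()
    print(commit, atm.balance, atm.consistent_with_outside_world())
# True 10000 True
# False 20000 False
```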
Types of Messages
Key Points
1. In-transit messages (m1, m2)
a. Messages that have been sent but not yet received.
b. When in-transit messages are part of a global system state, these messages do not cause any inconsistency.
c. For reliable communication channels, a consistent state must include in-transit messages, because they will always be delivered to their destinations in any legal execution of the system.
d. On the other hand, if a system model assumes lossy communication channels, then in-transit messages can be omitted from the system state.
2. Lost messages (m1)
a. Messages whose send is not undone but whose receive is undone due to rollback are called lost messages.
b. This type of message occurs when the process rolls back to a checkpoint prior to reception of the message while the sender does not roll back beyond the send operation of the message.
Key Points (continued)
3. Delayed messages (m2, m5)
a. Messages whose receive is not recorded because the receiving process was either down or the message arrived after the rollback of the receiving process.
4. Orphan messages
a. Messages with the receive recorded but the send not recorded are called orphan messages.
b. For example, a rollback might have undone the send of such messages, leaving the receive event intact at the receiving process.
c. Orphan messages do not arise if processes roll back to a consistent global state.
5. Duplicate messages (m4, m5)
a. Duplicate messages arise due to message logging and replaying during process recovery.
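The message types above can be checked mechanically against a chosen set of checkpoints (a recovery line). This is a minimal sketch under a simple position model that is our own assumption: each event has a local position, a state at position c records every event with position below c, and a receive position of `None` means the message had not been delivered before the failure. The functions `classify` and `is_consistent` are hypothetical. Delayed and duplicate messages depend on timing and on message logging, so the sketch does not try to identify them.

```python
def classify(message, line):
    """Classify one message against a recovery line (hypothetical position model).

    message: (sender, send_pos, receiver, recv_pos); recv_pos is None if never received
    line:    process -> checkpoint position; a state at c records events with position < c
    """
    sender, send_pos, receiver, recv_pos = message
    send_recorded = send_pos < line[sender]
    recv_recorded = recv_pos is not None and recv_pos < line[receiver]
    if recv_recorded and not send_recorded:
        return "orphan"      # received in the recovered state but never sent in it
    if send_recorded and recv_recorded:
        return "normal"
    if send_recorded and recv_pos is None:
        return "in-transit"  # sent but not yet received
    if send_recorded:
        return "lost"        # the send survives, but the receive was rolled back
    return "undone"          # neither event is part of the recovered state


def is_consistent(messages, line):
    # A global state is consistent if it contains no orphan messages
    return all(classify(m, line) != "orphan" for m in messages)


line = {"P0": 10, "P1": 10}
msgs = [("P0", 3, "P1", 5), ("P0", 4, "P1", None), ("P0", 6, "P1", 12),
        ("P1", 11, "P0", 7), ("P1", 12, "P0", 13)]
print([classify(m, line) for m in msgs])
# ['normal', 'in-transit', 'lost', 'orphan', 'undone']
print(is_consistent(msgs, line), is_consistent(msgs[:3], line))  # False True
```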
Issues in Failure Recovery
