Chapter 18 - Recovery and fault tolerance. This chapter discusses recovery and fault tolerance techniques used in a distributed operating system. Resiliency, which is a technique for minimizing the impact of a fault, is also discussed.
PROPRIETARY MATERIAL. © 2007 The McGraw-Hill Companies, Inc. All rights reserved. No part of this PowerPoint slide may be displayed, reproduced or distributed
Trang 2Introduction
– It may damage state of some data or processes, leading to
* Malfunctioning of a server
* Non-availability of resources and services
* Disruption of system operation
– The consequences of a fault can be avoided using three approaches
* Recovery
Some data or processes are rolled back to previous states
* Fault tolerance
Provides uninterrupted operation of a system despite faults
* Resiliency
Reduces cost of re-execution following a fault
– Recovery is the generic term for all three approaches
Trang 3Faults and failures
• A fault damages the state of a system
– We say it causes an error in state of a system
– It leads to unexpected behavior, which we call a failure
– Recovery restores the system to an error-free state
Trang 4Recovery after a fault
• System operation is initiated in state S0 at time 0
• A fault occurs at time t1; a failure is detected at time t i when state is S i’
• Recovery puts the system into a new state S new , which is errorfree
Trang 5Classes of faults
– System fault is a system crash caused by a power outage or
component fault
* Amnesia fault
The system loses its state completely
* Fail-stop fault
The system stops operating when a fault occurs
This property permits an error in system state to be corrected
* Byzantine fault
A process suffering this fault behaves maliciously
* Storage fault
Bad block on a storage medium
Trang 6Recovery techniques
– Data recovery
* State of data or file is restored to that in its latest back-up
– Process recovery
* A checkpoint is a recording of the state of a system
* At a fault, a process is rolled back to the state in some checkpoint
– Fault tolerance
* The error in state is corrected without interruption of system operation
– Resiliency
* Some results produced by a computation before a fault occurred are used during recovery
Trang 7Recovery techniques
– Backward recovery
* The state of an entity or application is reset to a prior state
Simple to implement
However, state recording should be both feasible and practical
– Forward recovery
* The erroneous state of an entity or application is repaired
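Backward recovery can be sketched in a few lines. This is a minimal illustration, assuming the whole state of a process fits in one Python dictionary; the `Process` class and its `checkpoint` and `rollback` methods are hypothetical names, not part of the chapter.

```python
import copy


class Process:
    """Toy process whose entire state is one dictionary (an assumption)."""

    def __init__(self, state):
        self.state = state
        self.checkpoints = []  # recorded states, oldest first

    def checkpoint(self):
        # Deep-copy so later updates cannot alter the recorded state.
        self.checkpoints.append(copy.deepcopy(self.state))
        return len(self.checkpoints) - 1  # index of the new checkpoint

    def rollback(self, index=-1):
        # Backward recovery: reset the state to a previously recorded state.
        self.state = copy.deepcopy(self.checkpoints[index])


p = Process({"balance": 100})
p.checkpoint()
p.state["balance"] = -999  # a fault damages the state
p.rollback()               # the state is reset to the latest checkpoint
print(p.state)
```

The deep copy is what makes state recording expensive for large states, which is why the slide asks that recording be both feasible and practical.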
Trang 8Byzantine faults and agreement protocols
tolerance technique is used to avoid their consequences
– The Byzantine agreement problem: a group of processes must agree on a value
* Initiator broadcasts the value to all other processes
* On receiving the value, each process broadcasts it to all others
A non-faulty process broadcasts the received value
A faulty process may broadcast a different value
* All non-faulty processes must conclude the correct value
* Lamport’s protocol involves m+1 rounds of messages, where the number of faulty processes does not exceed m
* A group of three processes that contains one faulty process cannot reach agreement
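The message exchange described above can be simulated as follows. This is a sketch of the recursive oral-messages scheme OM(m), which is assumed here to be the protocol the slide calls Lamport's protocol. The fault model (a faulty process flips the bit it sends to odd-numbered receivers), the `DEFAULT` tie-break value, and all function names are assumptions made for illustration.

```python
from collections import Counter

DEFAULT = 0  # value chosen when there is no majority (an assumption)


def majority(values):
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return DEFAULT
    return counts[0][0]


def send(sender, receiver, value, faulty):
    # Assumed fault model: a faulty sender flips the bit for odd receivers.
    if sender in faulty:
        return value ^ (receiver % 2)
    return value


def om(m, initiator, others, value, faulty):
    """Return {process: value it concludes} after m + 1 rounds of messages."""
    received = {p: send(initiator, p, value, faulty) for p in others}
    if m == 0:
        return received
    relayed = {p: {} for p in others}  # relayed[q][p]: what q concludes p got
    for p in others:
        rest = [q for q in others if q != p]
        # p rebroadcasts the value it received, using m - 1 further rounds.
        for q, v in om(m - 1, p, rest, received[p], faulty).items():
            relayed[q][p] = v
    return {q: majority([received[q]] + list(relayed[q].values()))
            for q in others}


# Four processes, one faulty (m = 1): non-faulty processes 1 and 2 agree.
four = om(1, 0, [1, 2, 3], 1, faulty={3})
print(four[1], four[2])

# Three processes, one faulty: process 1 does not conclude the initiator's 1.
three = om(1, 0, [1, 2], 1, faulty={2})
print(three[1])
```

In the three-process run, the non-faulty process 1 sees one vote for each value and cannot tell whether the initiator or process 2 is at fault.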
Trang 9Recovery schemes
– Checkpointing algorithm
* Decides when and how to create checkpoints of processes
* Cij denotes the j’th checkpoint of process Pi
* State(Cij) denotes the state of process Pi in checkpoint Cij
– Recovery algorithm
* When a failure is noticed, it decides which checkpoint a process should be rolled back to
Trang 10Checkpoints and recovery
• At time t f , failure occurs in node N 3 , where process P3 is in operation
• Rolling back P3 to C32 would be inconsistent because sending of
message m3 by P 3 would be nullified, whereas P2 has received m3
Trang 11Orphan messages
– A message mk sent by process Pi to Pj is an orphan in the new
state of a system during recovery if the new state
* Records m k as received by P j
* Does not record m k as sent by P i
• If rolling back of Pi makes mk an orphan, then Pj must be
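The orphan test can be written directly from the two conditions above. This is a minimal sketch, assuming each process state records the identifiers of the messages it has sent and received; the `find_orphans` function and the dictionary layout are hypothetical.

```python
def find_orphans(messages, state):
    """Return ids of messages recorded as received but not as sent.

    messages: message id -> (sender, receiver)
    state:    process -> {"sent": set of ids, "received": set of ids}
    """
    return sorted(
        mid
        for mid, (sender, receiver) in messages.items()
        if mid in state[receiver]["received"]
        and mid not in state[sender]["sent"]
    )


messages = {"m3": ("P3", "P2")}

# P3 is rolled back to a checkpoint taken before it sent m3;
# P2 keeps a state in which m3 has been received.
after_rollback = {
    "P2": {"sent": set(), "received": {"m3"}},
    "P3": {"sent": set(), "received": set()},
}
print(find_orphans(messages, after_rollback))

# Rolling P2 back to a state before the receipt of m3 removes the orphan.
after_rollback["P2"]["received"].discard("m3")
print(find_orphans(messages, after_rollback))
```

A recovery algorithm could repeat this check after each rollback until no orphan remains, rolling back the receiver of every orphan it finds.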
Trang 12Fault tolerance techniques
– Either a fault cannot cause an error
– Or the error can be removed easily
* It is typically achieved through data recovery
– Its two aspects are
* Fault tolerance for replicated data
Data should remain available; processes should see correct values
* Fault tolerance for distributed data
Different parts of data should not become mutually inconsistent
Trang 13Logs, forward recovery and backward recovery
are used for fault tolerance and recovery
– Do log
* Records those actions that should be performed to ensure correctness of state of an entity or system
– Undo log
* Records actions that should be undone to remove an error in state if a fault occurs
– Write ahead logging
* A log record is written before an action is taken
* Ensures that the log contains correct entries when faults occur
– Operation log contains a list of actions
– Value log contains a list of values of data
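An undo log kept as a value log, combined with write-ahead logging, can be sketched as follows. The `LoggedStore` class is hypothetical, and the log is held in memory only for brevity; a real log would have to be kept in stable storage to survive a fault.

```python
_MISSING = object()  # marks "key did not exist before the write"


class LoggedStore:
    """Toy key-value store with an undo log of old values (a value log)."""

    def __init__(self):
        self.data = {}
        self.undo_log = []  # entries: (key, old value)

    def write(self, key, value):
        # Write-ahead logging: the log record is written first ...
        self.undo_log.append((key, self.data.get(key, _MISSING)))
        # ... and only then is the action performed.
        self.data[key] = value

    def undo_all(self):
        # Backward recovery: undo actions in reverse order.
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old


store = LoggedStore()
store.write("x", 1)
store.write("x", 2)
store.write("y", 5)
store.undo_all()
print(store.data)
```

An operation log would record the actions themselves (for example "add 1 to x") instead of old values, and undoing would then require an inverse for each action.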
Trang 14Reliable data replication
should be accessed by a process
– Qr: quorum for reading, Qw: quorum for writing
– How to ensure that D can be updated by only one process but can be read by many, and that a process would always read the latest value of D?
* 2 × Qw > n
* Qr + Qw > n
* Use time-stamps to identify the latest value
– How to tolerate up to k failures?
* Qr = k + 1
* Qw = n - k
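The quorum rules can be checked with a short calculation. This is a sketch under the assumptions that each copy stores a (time-stamp, value) pair and that a write reaches exactly Qw copies; the function names are hypothetical.

```python
from itertools import combinations


def quorums(n, k):
    """Quorum sizes (Qr, Qw) that tolerate up to k failed copies out of n."""
    return k + 1, n - k


def satisfies_rules(n, qr, qw):
    # 2 * Qw > n: two writers cannot both assemble a write quorum.
    # Qr + Qw > n: every read quorum overlaps the latest write quorum.
    return 2 * qw > n and qr + qw > n


def read_latest(replies):
    """replies: (time-stamp, value) pairs from the copies in a read quorum."""
    return max(replies, key=lambda reply: reply[0])[1]


n, k = 5, 1
qr, qw = quorums(n, k)
print(qr, qw, satisfies_rules(n, qr, qw))

# Exhaustive check: every read quorum shares a copy with every write quorum.
copies = range(n)
overlap = all(set(r) & set(w)
              for r in combinations(copies, qr)
              for w in combinations(copies, qw))
print(overlap)

# A write with time-stamp 2 reached copies 0..3; copy 4 holds the old value.
store = [(2, "new")] * qw + [(1, "old")] * (n - qw)
print(read_latest([store[3], store[4]]))
```

Note that Qr = k + 1 and Qw = n - k always give Qr + Qw = n + 1, but 2 × Qw > n holds only when n > 2k; for example, n = 4 and k = 2 fails that rule.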
Trang 15Reliable distributed data
link or node faults would cause data inconsistency
– A distributed transaction is used to avoid it
– The two phase commit protocol (also called the 2PC protocol) is used to tolerate link or node faults
* It ensures one of the following conditions
All sites containing parts of distributed data can perform updates reliably
None of the parts of the distributed data is updated
Trang 16Two phase commit protocol
– Phase I
1 Actions of transaction coordinator:
a Write a ‘Prepare T i’ record in the log
b Set a time-out and send a ‘Prepare T i’ message to all nodes participating in the transaction
2 Actions of a participating node:
a If it is ready to commit, write updates in stable storage and a
‘Prepared T i ’ record in the log, and send a ‘Prepared T i’ message to coordinator
b Otherwise, write an ‘Abandoned T i’ record in the log and send
an ‘Abandoned T i’ message to coordinator
Trang 17Two phase commit protocol
– Phase II
1 Actions of transaction coordinator:
a If it receives a ‘Prepared’ reply from all nodes before time-out
occurs, write a ‘Commit T i’ record in the log and send ‘Commit
T i’ messages to all nodes
b Otherwise, write an ‘Abort T i’ record in the log and send ‘Abort
T i’ messages to all nodes
c Wait for an acknowledgment from each node and write a
‘Complete T i’ message in the log
2 Actions of a participating node: Depending on the coordinator’s message,
a Write a ‘Commit T i’ record in the log and perform commit processing
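The two phases can be simulated in a single program. This is a sketch, not a networked implementation: a missing reply (`None`) stands in for the coordinator's time-out, the `Participant` class with its `ready` and `responds` switches is hypothetical, and the participant's handling of an ‘Abort’ message is an assumption because the slide lists only the commit case.

```python
class Participant:
    """Toy participating node; `ready` and `responds` are illustrative."""

    def __init__(self, name, ready=True, responds=True):
        self.name = name
        self.ready = ready
        self.responds = responds
        self.log = []

    def prepare(self, t):
        if not self.responds:
            return None  # no reply: the coordinator's time-out expires
        if self.ready:
            # Updates would be written to stable storage here.
            self.log.append(f"Prepared {t}")
            return "Prepared"
        self.log.append(f"Abandoned {t}")
        return "Abandoned"

    def decide(self, t, decision):
        if not self.responds:
            return None
        # Assumed: an 'Abort' is logged the same way as a 'Commit'.
        self.log.append(f"{decision} {t}")
        return "ack"


def two_phase_commit(t, participants):
    log = [f"Prepare {t}"]                          # Phase I, step 1a
    replies = [p.prepare(t) for p in participants]  # Phase I, step 1b
    all_prepared = all(r == "Prepared" for r in replies)
    decision = "Commit" if all_prepared else "Abort"
    log.append(f"{decision} {t}")                   # Phase II, step 1a or 1b
    acks = [p.decide(t, decision) for p in participants]
    if all(a == "ack" for a in acks):
        log.append(f"Complete {t}")                 # Phase II, step 1c
    return decision, log


print(two_phase_commit("T1", [Participant("N1"), Participant("N2")]))
print(two_phase_commit("T2", [Participant("N1"),
                              Participant("N2", ready=False)]))
```

Either every participant logs ‘Commit’ or none does, which is the all-or-nothing condition the protocol is meant to ensure.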
Trang 18Resiliency
– Parts of the computation are executed in different nodes
* These parts communicate through interprocess messages
– Faults can arise during its operation
* A node containing a part of the computation or its data fails
* A link fails
– However, failures are partial
* A node or link failure may affect only some part of the computation
* Resiliency techniques exploit this fact to reduce the recovery cost
Some of the results computed before the failure are re-used
Only some parts of the computation are re-executed
Trang 19Nested transaction
Ti
– It commits only if its parent transaction Ti commits
– It is implemented as follows
* When Tik completes, it is said to have reached a tentative commit
* When Ti wishes to commit, it checks whether all nested transactions have reached a tentative commit and can participate in commit processing
It is implemented using a 2PC protocol
Trang 20Nested transaction
– If a nested transaction Tik does not respond to a ‘Prepare’
message, the coordinator can retry Tik in the same node or in
some other node
* If T ik had reached tentative commit and its node had failed when
‘Prepare’ message was sent
* If the failed node recovers and the coordinator retries T ik in it
* The results of T ik, computed before the failure, can be used
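The reuse of earlier results on a retry can be sketched as follows. This illustration assumes that a tentative commit writes the result of Tik to stable storage that survives the node failure; the `run_nested` function, the `stable_store` dictionary and the `stats` counters are hypothetical.

```python
def run_nested(name, compute, stable_store, stats):
    """Run nested transaction `name`, reusing a surviving tentative commit."""
    if name in stable_store:
        # The node failed after the tentative commit: reuse the result.
        stats["reused"] += 1
        return stable_store[name]
    stats["executed"] += 1
    result = compute()
    stable_store[name] = result  # tentative commit
    return result


stable_store = {}  # stands in for the node's stable storage
stats = {"executed": 0, "reused": 0}

first = run_nested("Tik", lambda: sum(range(1000)), stable_store, stats)
# The node fails before answering 'Prepare' and later recovers;
# the coordinator retries Tik in the same node.
retry = run_nested("Tik", lambda: sum(range(1000)), stable_store, stats)
print(first == retry, stats)
```

If the coordinator retried Tik in some other node, the store of that node would be empty and the computation would be executed again.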