Lecture Operating systems: A concept-based approach (2/e): Chapter 18 - Dhananjay M. Dhamdhere

Chapter 18 - Recovery and fault tolerance. This chapter discusses recovery and fault tolerance techniques used in a distributed operating system. Resiliency, which is a technique for minimizing the impact of a fault, is also discussed.


Introduction

– A fault may damage the state of some data or processes, leading to

* Malfunctioning of a server

* Non-availability of resources and services

* Disruption of system operation

– The consequences of a fault can be avoided using three approaches

* Recovery

▪ Some data or processes are rolled back to previous states

* Fault tolerance

▪ Provides uninterrupted operation of a system despite faults

* Resiliency

▪ Reduces the cost of re-execution following a fault

– Recovery is the generic term for all three approaches

Faults and failures

• A fault damages the state of a system

– We say that it causes an error in the state of the system

– It leads to unexpected behavior, which we call a failure

– Recovery restores the system to an error-free state

Recovery after a fault

• System operation is initiated in state S0 at time 0

• A fault occurs at time t1; a failure is detected at time ti, when the state is Si’

• Recovery puts the system into a new state Snew, which is error-free

Classes of faults

– A system fault is a system crash caused by a power outage or component fault

* Amnesia fault

▪ The system loses its state completely

* Fail-stop fault

▪ The system stops operating when a fault occurs

▪ This property permits an error in system state to be corrected

* Byzantine fault

▪ A process suffering this fault behaves maliciously

* Storage fault

▪ Bad block on a storage medium

Recovery techniques

– Data recovery

* The state of the data or file is restored to that recorded in its latest backup

– Process recovery

* A checkpoint is a recording of the state of a system

* At a fault, a process is rolled back to the state in some checkpoint

– Fault tolerance

* The error in state is corrected without interruption of system operation

– Resiliency

* Some results produced by a computation before a fault occurred are used during recovery

Recovery techniques

– Backward recovery

* The state of an entity or application is reset to a prior state

▪ Simple to implement

▪ However, state recording should be both feasible and practical

– Forward recovery

* The erroneous state of an entity or application is repaired

Byzantine faults and agreement protocols

– A fault tolerance technique is used to avoid the consequences of Byzantine faults

– The Byzantine agreement problem: a group of processes must agree on a value

* The initiator broadcasts the value to all other processes

* On receiving the value, each process broadcasts it to all others

▪ A non-faulty process broadcasts the received value

▪ A faulty process may broadcast a different value

* All non-faulty processes must conclude the correct value

* Lamport’s protocol involves m+1 rounds of messages, where the number of faulty processes does not exceed m

process cannot reach agreement
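The broadcast-and-rebroadcast scheme above can be sketched as a small recursive simulation. This is only an illustration in the spirit of Lamport's oral-messages algorithm, not the chapter's own code: the names `agree`, `send`, `majority` and `DEFAULT`, and the way a faulty process lies, are assumptions made for the example. The algorithm is known to need more than 3m processes to tolerate m faulty ones.

```python
from collections import Counter

DEFAULT = 0  # value adopted when there is no clear majority (an assumption)


def majority(values):
    """Return the most common value, or DEFAULT on a tie."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return DEFAULT
    return ranked[0][0]


def send(sender, receiver, value, faulty):
    """A non-faulty sender forwards the value; a faulty one may lie."""
    if sender in faulty:
        return receiver % 2  # hypothetical lie that differs per receiver
    return value


def agree(m, initiator, others, value, faulty):
    """Run m+1 rounds of messages; return {process: value it concludes}."""
    # Round 1: the initiator broadcasts its value to all other processes.
    received = {p: send(initiator, p, value, faulty) for p in others}
    if m == 0:
        return received
    # Each process rebroadcasts what it received, using m further rounds.
    relayed = {p: [] for p in others}
    for q in others:
        rest = [p for p in others if p != q]
        for p, v in agree(m - 1, q, rest, received[q], faulty).items():
            relayed[p].append(v)
    # Each process concludes the majority of the values it has seen.
    return {p: majority([received[p]] + relayed[p]) for p in others}


# Four processes, at most m = 1 faulty: process 0 initiates with value 1.
faulty_peer = agree(1, 0, [1, 2, 3], 1, faulty={3})
faulty_initiator = agree(1, 0, [1, 2, 3], 1, faulty={0})
```

In the first run the non-faulty processes 1 and 2 conclude the initiator's value despite the lies of process 3; in the second run the initiator itself is faulty, yet all other processes still conclude the same value.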

Recovery schemes

– Checkpointing algorithm

* Decides when and how to create checkpoints of processes

* Cij denotes the jth checkpoint of process Pi

* State(Cij) denotes the state of process Pi in checkpoint Cij

– Recovery algorithm

* When a failure is noticed, it decides which checkpoint a process should be rolled back to
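As a minimal sketch of checkpointing and rollback for a single process, the following assumes that a process state is a plain Python dict and that checkpoints are kept in memory. The class `Process` and its methods are hypothetical names introduced for illustration; a real system would record checkpoints on stable storage.

```python
import copy


class Process:
    """A process Pi whose state is a plain dict (an assumption for illustration)."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.checkpoints = []  # checkpoints[j - 1] holds State(Cij)

    def take_checkpoint(self):
        """Record a deep copy of the current state; return its index j."""
        self.checkpoints.append(copy.deepcopy(self.state))
        return len(self.checkpoints)

    def roll_back(self, j):
        """Reset the state to State(Cij) and discard later checkpoints."""
        self.state = copy.deepcopy(self.checkpoints[j - 1])
        del self.checkpoints[j:]


p3 = Process("P3")
p3.state["balance"] = 100
c31 = p3.take_checkpoint()
p3.state["balance"] = 250
c32 = p3.take_checkpoint()
p3.state["balance"] = -1   # an error creeps into the state
p3.roll_back(c32)          # backward recovery to the latest checkpoint
```

The deep copy matters: without it, later updates to the live state would silently change the recorded checkpoint as well.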

Checkpoints and recovery

• At time tf, a failure occurs in node N3, where process P3 is in operation

• Rolling back P3 to C32 would leave the system in an inconsistent state because the sending of message m3 by P3 would be nullified, whereas P2 would still have received m3

Orphan messages

– A message mk sent by process Pi to Pj is an orphan in the new state of a system during recovery if the new state

* Records mk as received by Pj

* Does not record mk as sent by Pi

• If rolling back of Pi makes mk an orphan, then Pj must also be rolled back
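The orphan test can be sketched directly from the two conditions above. In this illustration each recorded state simply lists the names of the messages it records as sent and as received. The names `Recorded`, `find_orphans` and `recover`, and the sample history, are assumptions; the history is not taken from the chapter's figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recorded:
    """A recorded process state: names of messages sent and received."""
    sent: frozenset
    received: frozenset


def find_orphans(state, sender_of):
    """Return {message: receiver} for every orphan message in `state`."""
    orphans = {}
    for receiver, recorded in state.items():
        for msg in recorded.received:
            if msg not in state[sender_of[msg]].sent:
                orphans[msg] = receiver
    return orphans


def recover(history, chosen, sender_of):
    """Roll receivers of orphan messages back until no orphan remains.

    history[p] lists the recorded states of p, oldest first, and chosen[p]
    is the index p currently uses. Index 0 is assumed to be the initial
    state, which records no messages, so the loop always terminates.
    """
    chosen = dict(chosen)
    while True:
        state = {p: history[p][i] for p, i in chosen.items()}
        orphans = find_orphans(state, sender_of)
        if not orphans:
            return chosen
        for receiver in set(orphans.values()):
            chosen[receiver] -= 1


EMPTY = Recorded(frozenset(), frozenset())
sender_of = {"m3": "P3"}
history = {
    "P2": [EMPTY, Recorded(frozenset(), frozenset({"m3"}))],
    "P3": [EMPTY, Recorded(frozenset({"m3"}), frozenset())],
}
# P3 fails and is rolled back to a state that precedes the sending of m3.
after_failure = {"P2": 1, "P3": 0}
new_state = recover(history, after_failure, sender_of)
```

Here rolling P3 back makes m3 an orphan, so `recover` also rolls P2 back to a state in which m3 has not been received.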

Fault tolerance techniques

– Either a fault cannot cause an error

– Or the error can be removed easily

* It is typically achieved through data recovery

– Its two aspects are

* Fault tolerance for replicated data

▪ Data should remain available; processes should see correct values

* Fault tolerance for distributed data

▪ Different parts of the data should not become mutually inconsistent

Logs, forward recovery and backward recovery

• Logs are used for fault tolerance and recovery

– Do log

* Records those actions that should be performed to ensure correctness of state of an entity or system

– Undo log

* Records actions that should be undone to remove an error in state if a fault occurs

– Write ahead logging

* A log record is written before an action is taken

* Ensures that the log contains correct entries when faults occur

– Operation log contains a list of actions

– Value log contains a list of values of data
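A value-based undo log with write-ahead logging might look like the sketch below. It keeps both the data and the log in memory purely for illustration, whereas a real system would force each log record to stable storage before touching the data. The class `LoggedStore` and its method names are hypothetical.

```python
class LoggedStore:
    """Key-value data with a value-based undo log (in memory, for illustration)."""

    def __init__(self):
        self.data = {}
        self.undo_log = []  # entries: (key, had_value, old_value)

    def write(self, key, value):
        # Write-ahead rule: record the old value BEFORE changing the data.
        self.undo_log.append((key, key in self.data, self.data.get(key)))
        self.data[key] = value

    def commit(self):
        """The updates are final, so the undo records are no longer needed."""
        self.undo_log.clear()

    def undo(self):
        """Backward recovery: restore old values, newest record first."""
        while self.undo_log:
            key, had_value, old_value = self.undo_log.pop()
            if had_value:
                self.data[key] = old_value
            else:
                self.data.pop(key, None)


store = LoggedStore()
store.write("x", 1)
store.commit()
store.write("x", 2)   # uncommitted update
store.write("y", 5)   # uncommitted update
store.undo()          # a fault occurs: remove the uncommitted updates
```

Undoing in reverse order is what makes repeated writes to the same key safe: the oldest logged value is restored last.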

Reliable data replication

– A quorum is the number of copies of replicated data D that should be accessed by a process

– Qr: quorum for reading, Qw: quorum for writing

– How to ensure that D can be updated by only one process but can be read by many, and that a process always reads the latest value of D?

* 2 × Qw > n

* Qr + Qw > n

* Use time-stamps to identify the latest value

– How to tolerate up to k failures?

* Qr = k + 1

* Qw = n - k
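The quorum conditions are easy to check mechanically. The sketch below assumes n is the number of copies of D and that each copy answers a read with a (time-stamp, value) pair. The function names are hypothetical.

```python
def quorums_for(n, k):
    """Quorum sizes for tolerating k failures: Qr = k + 1, Qw = n - k."""
    return k + 1, n - k


def is_valid(n, qr, qw):
    """Check 2 x Qw > n (one writer at a time) and Qr + Qw > n (reads see the latest write)."""
    return 2 * qw > n and qr + qw > n


def read_latest(replies, qr):
    """Pick the value with the largest time-stamp among at least Qr replies.

    replies is a list of (timestamp, value) pairs, one per copy that answered.
    """
    if len(replies) < qr:
        raise RuntimeError("read quorum not reached")
    return max(replies, key=lambda reply: reply[0])[1]


qr, qw = quorums_for(5, 2)                        # five copies, two failures
ok_five = is_valid(5, qr, qw)
ok_four = is_valid(4, *quorums_for(4, 2))         # four copies, two failures
value = read_latest([(7, "new"), (3, "old"), (3, "old")], qr)
```

With Qr = k + 1 and Qw = n - k, the sum Qr + Qw is always n + 1, so the second condition holds automatically. The first condition, 2 × Qw > n, holds only when n > 2k, which is why four copies cannot tolerate two failures.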

Reliable distributed data

link or node faults would cause data inconsistency

– A distributed transaction is used to avoid it

– The two-phase commit protocol (also called the 2PC protocol) is used to tolerate link or node faults

* It ensures one of the following conditions

▪ All sites containing parts of the distributed data can perform their updates reliably

▪ None of the parts of the distributed data is updated

Two phase commit protocol

– Phase I

1. Actions of transaction coordinator:

a. Write a ‘Prepare Ti’ record in the log

b. Set a time-out and send a ‘Prepare Ti’ message to all nodes participating in the transaction

2. Actions of a participating node:

a. If it is ready to commit, write updates in stable storage and a ‘Prepared Ti’ record in the log, and send a ‘Prepared Ti’ message to the coordinator

b. Otherwise, write an ‘Abandoned Ti’ record in the log and send an ‘Abandoned Ti’ message to the coordinator

Two phase commit protocol

– Phase II

1. Actions of transaction coordinator:

a. If it receives a ‘Prepared’ reply from all nodes before the time-out occurs, write a ‘Commit Ti’ record in the log and send ‘Commit Ti’ messages to all nodes

b. Otherwise, write an ‘Abort Ti’ record in the log and send ‘Abort Ti’ messages to all nodes

c. Wait for an acknowledgment from each node and write a ‘Complete Ti’ record in the log

2. Actions of a participating node: Depending on the coordinator’s message,

a. Write a ‘Commit Ti’ record in the log and perform commit processing
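The two phases can be sketched as a single-threaded simulation. Everything here is illustrative: the class `Node`, its `ready` and `reachable` flags, and the function `two_phase_commit` are hypothetical names, and a time-out is modelled simply as a missing reply. Stable storage and commit processing are reduced to log entries, and the participant's handling of an ‘Abort’ message is an assumption, since the slide lists only the commit case.

```python
class Node:
    """A participating node; `ready` and `reachable` are illustrative knobs."""

    def __init__(self, name, ready=True, reachable=True):
        self.name, self.ready, self.reachable = name, ready, reachable
        self.log = []

    def prepare(self, t):
        """Phase I, step 2: reply 'Prepared' or 'Abandoned', or not at all."""
        if not self.reachable:
            return None  # models a time-out: no reply reaches the coordinator
        reply = "Prepared" if self.ready else "Abandoned"
        self.log.append(f"{reply} {t}")
        return reply

    def decide(self, t, decision):
        """Phase II, step 2: log the decision; return True as acknowledgment."""
        if not self.reachable:
            return False
        self.log.append(f"{decision} {t}")
        return True


def two_phase_commit(t, nodes):
    """Run both phases for transaction t; return the coordinator's log."""
    log = [f"Prepare {t}"]                        # Phase I, step 1a
    replies = [n.prepare(t) for n in nodes]       # Phase I, steps 1b and 2
    all_prepared = all(r == "Prepared" for r in replies)
    decision = "Commit" if all_prepared else "Abort"
    log.append(f"{decision} {t}")                 # Phase II, step 1a or 1b
    acks = [n.decide(t, decision) for n in nodes]
    if all(acks):
        log.append(f"Complete {t}")               # Phase II, step 1c
    return log


committed = two_phase_commit("T1", [Node("A"), Node("B")])
aborted = two_phase_commit("T2", [Node("A"), Node("B", ready=False)])
timed_out = two_phase_commit("T3", [Node("A"), Node("B", reachable=False)])
```

In the third run the coordinator decides to abort because one reply never arrives, and it does not write the ‘Complete’ record because the unreachable node has not acknowledged the decision.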

Resiliency

– Parts of the computation are executed in different nodes

* These parts communicate through interprocess messages

– Faults can arise during its operation

* A node containing a part of the computation or its data fails

* A link fails

– However, failures are partial

* A node or link failure may affect only some part of the computation

* Resiliency techniques exploit this fact to reduce the recovery cost

▪ Some of the results computed before the failure are re-used

▪ Only some parts of the computation are re-executed

Nested transaction

• A nested transaction Tik is a part of a transaction Ti

– It commits only if its parent transaction Ti commits

– It is implemented as follows

* When Tik completes, it is said to have reached a tentative commit

* When Ti wishes to commit, it checks whether all nested transactions have reached a tentative commit and can participate in commit processing

▪ It is implemented using a 2PC protocol

Nested transaction

– If a nested transaction Tik does not respond to a ‘Prepare’ message, the coordinator can retry Tik in the same node or in some other node

* Suppose Tik had reached a tentative commit and its node had failed when the ‘Prepare’ message was sent

* If the failed node recovers and the coordinator retries Tik in it, the results of Tik computed before the failure can be reused
