Critical systems development


©Ian Sommerville 2004, Software Engineering, 7th edition, Chapter 20


Objectives

avoidance contribute to the development of dependable systems

software processes

fault avoidance

their use of diversity and redundancy

Trang 2

Software dependability

software to be dependable. However, for non-critical applications, they may be willing to accept some system failures

dependability requirements and special software engineering techniques may be used to achieve this

Dependability achievement

• The system is developed in such a way that human error is avoided and thus system faults are minimised.

• The development process is organised so that faults in the system are detected and repaired before delivery to the customer.

• Verification and validation techniques are used to discover and remove faults in a system before it is deployed.

complexity and this can increase the chances of error

& V is a more effective route to software dependability


Diversity and redundancy examples

(e.g. in e-commerce systems), companies normally keep backup servers and switch to these automatically if failure occurs

external attacks, different servers may be implemented using different operating systems (e.g. Windows and Linux)

Fault-free software

allow for the production of fault-free software, at least for relatively small systems

conforms to its specification. It does NOT mean software which will always perform correctly as there may be specification errors

high. It is only cost-effective in exceptional situations. It is often cheaper to accept software faults and pay for their consequences than to expend resources on developing fault-free software

Fault-free software development


Fault removal costs

[Figure: fault removal cost plotted against the number of residual errors; the only surviving axis tick is "Few"]

Dependable processes

faults, it is important to have a well-defined, repeatable software process

does not depend entirely on individual skills; rather, it can be enacted by different people

activities should include significant effort devoted to verification and validation

Dependable process characteristics

Documentable: The process should have a defined process model that sets out the activities in the process and the documentation that is to be produced during these activities.

Standardised: A comprehensive set of software development standards that define how the software is to be produced and documented should be available.

Auditable: The process should be understandable by people apart from process participants who can check that process standards are being followed and make suggestions for process improvement.

Diverse: The process should include redundant and diverse verification and validation activities.

Robust: The process should be able to recover from failures of individual process activities.


Chapter 29, is also essential

Dependable programming

that contribute to fault avoidance and fault tolerance

constructs

Information protection

the program which need to access it. This involves the creation of objects or abstract data types that maintain state and that provide operations on that state

• the probability of accidental corruption of information is reduced;

• the information is surrounded by ‘firewalls’ so that problems are less likely to spread to other parts of the program;

• as all information is localised, you are less likely to make errors and reviewers are more likely to find errors.


A queue specification in Java

interface Queue {
    public void put(Object o);
    public void remove(Object o);
    public int size();
} // Queue
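As a minimal sketch of the information-protection idea, assuming the Queue interface above, the class below keeps its storage private so that other code can change the queue only through put, remove and size. BoundedQueue is a hypothetical name that does not appear in the slides, and reading remove(Object o) as "remove the first element equal to o" is an assumption.

```java
// The interface is repeated here so that the sketch compiles on its own.
interface Queue {
    void put(Object o);
    void remove(Object o);
    int size();
}

// Hypothetical implementation: all state is private, so it can only be
// changed through the operations declared in the interface.
class BoundedQueue implements Queue {
    private final Object[] items;   // hidden representation
    private int count = 0;          // number of stored elements

    BoundedQueue(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        items = new Object[capacity];
    }

    // Adds o at the tail; rejects the call instead of overflowing the array.
    public void put(Object o) {
        if (count == items.length) {
            throw new IllegalStateException("queue is full");
        }
        items[count++] = o;
    }

    // Assumption: removes the first element equal to o, if there is one.
    public void remove(Object o) {
        for (int i = 0; i < count; i++) {
            if (java.util.Objects.equals(items[i], o)) {
                // Shift the later elements down to close the gap.
                System.arraycopy(items, i + 1, items, i, count - i - 1);
                items[--count] = null;
                return;
            }
        }
    }

    public int size() {
        return count;
    }
}
```

Because the array and the counter are private, a mistake elsewhere in the program cannot corrupt the queue directly, and a reviewer only has to read this one class to check how the state changes.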

Signal declaration in Java

class Signal {
    static public final int red = 1;
    static public final int amber = 2;
    static public final int green = 3;
    public int sigState;
}

Safe programming

consequence of programmers making mistakes

track of the relationships between program variables

error-prone than others so avoiding their use reduces programmer mistakes


Structured programming

development that makes programs easier to understand and that avoids programmer errors

control statements

thought and discussion about programming

• Run-time allocation can cause memory overflow.

Default input processing

• An input action that occurs irrespective of the input.


Exception handling

unexpected event such as a power failure

events to be handled without the need for continual status checking to detect exceptions

exceptions needs many additional statements to be added to the program. This adds a significant overhead and is potentially error-prone


A temperature controller

technique and not just as a way of recovering from faults

keeps the freezer temperature within a specified range

class FreezerController {
    Sensor tempSensor = new Sensor();
    Dial tempDial = new Dial();
    float freezerTemp = tempSensor.readVal();
    final float dangerTemp = (float) -18.0;
    final long coolingTime = (long) 200000.0;

    public void run() throws InterruptedException {
        try {
            Pump.switchIt(Pump.on);
            // ...
        } catch (InterruptedException e) {
            System.out.println("Thread exception");
            throw new InterruptedException();
        }
    } // run
} // FreezerController
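The use of exceptions as a normal programming technique can be sketched in a self-contained form. This is not the controller from the slides: SimulatedSensor, FreezerControllerSketch and FreezerTooHotException are hypothetical names, the pump and alarm are reduced to boolean flags, and the loop ends when the simulated readings run out so that the example terminates. Only the -18.0 danger threshold is taken from the listing above.

```java
// Hypothetical exception: raised when the freezer is dangerously warm.
class FreezerTooHotException extends Exception {
    FreezerTooHotException(float temp) {
        super("Freezer temperature too high: " + temp);
    }
}

// Stand-in for the hardware sensor: replays a fixed list of readings.
class SimulatedSensor {
    private final float[] readings;
    private int next = 0;

    SimulatedSensor(float[] readings) { this.readings = readings; }

    boolean hasNext() { return next < readings.length; }

    float readVal() { return readings[next++]; }
}

class FreezerControllerSketch {
    static final float DANGER_TEMP = -18.0f;  // same threshold as the slide
    final float setting;                      // target temperature (the dial)
    boolean pumpOn = false;                   // stand-in for the pump
    boolean alarmActive = false;              // stand-in for the alarm

    FreezerControllerSketch(float setting) { this.setting = setting; }

    // One control step: switch the pump, then check the danger threshold.
    void step(float freezerTemp) throws FreezerTooHotException {
        pumpOn = freezerTemp > setting;
        if (freezerTemp > DANGER_TEMP) {
            throw new FreezerTooHotException(freezerTemp);
        }
    }

    // The normal path contains no status checks; the abnormal condition
    // is handled in one place, in the catch clause.
    void run(SimulatedSensor sensor) {
        try {
            while (sensor.hasNext()) {
                step(sensor.readVal());
            }
        } catch (FreezerTooHotException e) {
            alarmActive = true;
        }
    }
}
```

With a dial setting of -22 and readings of -25, -21 and -15, the third reading exceeds the danger threshold, so control transfers to the catch clause and the alarm flag is set without any explicit status test in the loop.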


Fault tolerance

fault tolerant

availability requirements or where system failure costs are very high

in operation in spite of software failure

specification, it must also be fault tolerant as there may be specification errors or the validation may be incorrect

Fault tolerance actions

Fault detection and damage assessment

that a fault (an erroneous system state) has occurred or will occur

that must hold for all legal states and checking the state against these constraints


Insulin pump state constraints

// The dose of insulin to be delivered must always be greater
// than zero and less than some defined maximum single dose
insulin_dose >= 0 & insulin_dose <= insulin_reservoir_contents
// The total amount of insulin delivered in a day must be less
// than or equal to a defined daily maximum dose
cumulative_dose <= maximum_daily_dose
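The constraints above can be turned into a check that runs before a state change is committed. The sketch below is an illustration under assumptions: InsulinPumpState and its methods are hypothetical names, doses are whole units held in an int, and the daily limit is tested against the cumulative dose that would result from the delivery.

```java
class InsulinPumpState {
    private int reservoirContents;       // units of insulin left
    private int cumulativeDose = 0;      // units delivered so far today
    private final int maximumDailyDose;  // defined daily maximum

    InsulinPumpState(int reservoirContents, int maximumDailyDose) {
        this.reservoirContents = reservoirContents;
        this.maximumDailyDose = maximumDailyDose;
    }

    // Evaluates the two state constraints for a candidate state.
    static boolean satisfiesConstraints(int insulinDose, int reservoirContents,
                                        int cumulativeDose, int maximumDailyDose) {
        return insulinDose >= 0
            && insulinDose <= reservoirContents
            && cumulativeDose <= maximumDailyDose;
    }

    // Commits the dose only if the resulting state is legal; otherwise the
    // state is left exactly as it was and false is returned.
    boolean deliver(int insulinDose) {
        int newCumulative = cumulativeDose + insulinDose;
        if (!satisfiesConstraints(insulinDose, reservoirContents,
                                  newCumulative, maximumDailyDose)) {
            return false;
        }
        reservoirContents -= insulinDose;
        cumulativeDose = newCumulative;
        return true;
    }

    int cumulativeDose() { return cumulativeDose; }

    int reservoirContents() { return reservoirContents; }
}
```

For example, with a reservoir of 100 units and a daily maximum of 25, delivering 10 units succeeds, but a further request for 20 units is refused because the cumulative dose would reach 30.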

Fault detection

before the state change is committed. If an erroneous state is detected, the change is not made

the system state has been changed. This is used when an incorrect sequence of correct actions leads to an erroneous state or when preventative fault detection involves too much overhead

extending the type system by including additional constraints as part of the type definition

defining basic operations within a class definition
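One way to include a constraint as part of a type definition is to wrap the value in a class whose operations all check the constraint before committing a change. The sketch below is hypothetical: the class name PositiveEvenInteger, the exception ConstraintViolation and the choice of constraint are illustrative assumptions, not code recovered from the slides.

```java
// Hypothetical checked exception for a violated type constraint.
class ConstraintViolation extends Exception {
    ConstraintViolation(String message) { super(message); }
}

// A value of this type is always a positive even integer: every operation
// that could change the value checks the constraint before committing.
class PositiveEvenInteger {
    private int value;

    PositiveEvenInteger(int initial) throws ConstraintViolation {
        check(initial);
        value = initial;
    }

    private static void check(int candidate) throws ConstraintViolation {
        if (candidate <= 0 || candidate % 2 != 0) {
            throw new ConstraintViolation(
                candidate + " is not a positive even integer");
        }
    }

    // Assignment: the old value is kept if the new one is illegal.
    void assign(int candidate) throws ConstraintViolation {
        check(candidate);
        value = candidate;
    }

    int toInteger() { return value; }
}
```

This is preventative fault detection: an illegal value is rejected by the type itself, so the object can never be observed in an erroneous state.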

Type system extension


corruption caused by a system failure

the state space have been affected by the failure

can be applied to the state elements to assess if their value is within an allowed range


Robust array 1

class RobustArray {
    // Checks that all the objects in an array of objects
    // conform to some defined constraint
    boolean hasBeenDamaged = false;
    for (int i = 0; i < this.theRobustArray.length; i++)
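The listing above is cut off in this copy, so the following is a self-contained sketch of the same damage-assessment idea rather than a reconstruction of the original. Checkable, ArrayDamagedException, RobustArraySketch and RangedReading are hypothetical names: each element validates itself, the array records which elements failed, and an exception is raised once the whole array has been scanned.

```java
// Hypothetical interface: each element knows how to validate itself.
interface Checkable {
    boolean check();
}

class ArrayDamagedException extends Exception {
    ArrayDamagedException(String message) { super(message); }
}

class RobustArraySketch {
    private final Checkable[] elements;
    private final boolean[] damaged;  // damaged[i] is true if elements[i] failed

    RobustArraySketch(Checkable[] elements) {
        this.elements = elements;
        this.damaged = new boolean[elements.length];
    }

    // Checks every element, records which ones are damaged, and raises an
    // exception after the full scan if any check failed.
    void assessDamage() throws ArrayDamagedException {
        boolean hasBeenDamaged = false;
        for (int i = 0; i < elements.length; i++) {
            damaged[i] = !elements[i].check();
            if (damaged[i]) {
                hasBeenDamaged = true;
            }
        }
        if (hasBeenDamaged) {
            throw new ArrayDamagedException("one or more elements failed their check");
        }
    }

    boolean isDamaged(int i) { return damaged[i]; }
}

// Example element: a reading that must lie in an allowed range.
class RangedReading implements Checkable {
    final int value;

    RangedReading(int value) { this.value = value; }

    public boolean check() { return value >= 0 && value <= 100; }
}
```

Scanning the whole array before raising the exception means the caller learns which elements are affected, which is the information a later recovery step needs.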

assessment in data transmission

the integrity of data structures

non-terminating processes. If no response after a certain time, a problem is assumed

Damage assessment techniques


• Apply repairs to a corrupted system state.

• Restore the system state to a known safe state.

- domain knowledge is required to compute possible state corrections

safe state are maintained and this replaces the corrupted system state

Fault recovery and repair

• Error coding techniques which add redundancy to coded data can be used for repairing data corrupted during transmission.

• When redundant pointers are included in data structures (e.g. two-way lists), a corrupted list or filestore may be rebuilt if a sufficient number of pointers are uncorrupted.

• Often used for database and file system repair.

Forward recovery

of backward recovery. Changes are not applied until computation is complete. If an error occurs, the system is left in the state preceding the transaction

'roll-back' to a correct state

Backward recovery


Safe sort procedure

assesses if the sort has been correctly executed

It maintains a copy of its input so that if an error occurs, the input is not corrupted

Possible in this case as the condition for a ‘valid’ sort is known. However, in many cases it is difficult to write validity checks

Safe sort 1

class SafeSort {
    static void sort(int[] intarray, int order) throws SortError {
        int[] copy = new int[intarray.length];
        // copy the input array so that it can be restored if the sort fails
        for (int i = 0; i < intarray.length; i++)
            copy[i] = intarray[i];
        try {
            Sort.bubblesort(intarray, intarray.length, order);

Safe sort 2

            // check that the result is ordered as requested; the braces stop
            // the else from binding to the inner if
            if (order == Sort.ascending) {
                for (int i = 0; i <= intarray.length - 2; i++)
                    if (intarray[i] > intarray[i + 1])
                        throw new SortError();
            } else {
                for (int i = 0; i <= intarray.length - 2; i++)
                    if (intarray[i + 1] > intarray[i])
                        throw new SortError();
            }
        } // try block
        catch (SortError e) {
            // restore the original input before reporting the failure
            for (int i = 0; i < intarray.length; i++)
                intarray[i] = copy[i];
            throw new SortError("Array not sorted");
        } // catch
    } // sort
} // SafeSort


Fault tolerant architecture

involve interactions between the hardware and the software

that checks and the associated code are incorrect

a specific architecture designed to support fault tolerance may be required

failure

Hardware fault tolerance

receive the same input and whose outputs are compared

If one output is different, it is ignored and component failure is assumed

failures rather than design faults and a low probability of simultaneous component failure

Hardware reliability with TMR

[Figure: three replicated units A1, A2 and A3 feeding an output comparator]


Output selection

hardware unit

different from the others, it rejects it

Essentially, the selection of the actual output depends on the majority vote

fault management unit that can either try to repair the faulty unit or take it out of service

Fault tolerant software architectures

based on two fundamental assumptions

• The hardware components do not include common design faults;

• Components fail randomly and there is a low probability of simultaneous component failure.

• It isn’t possible simply to replicate the same component as they would have common design faults;

• Simultaneous component failure is therefore virtually inevitable.

Design diversity

implemented in different ways. They therefore ought to have different failure modes

and function oriented)

• Implementation in different programming languages;

• Use of different tools and development environments;

• Use of different algorithms in the implementation.


Software analogies to TMR

• The same specification is implemented in a number of different versions by different teams. All versions compute simultaneously and the majority output is selected using a voting system.

• This is the most commonly used approach, e.g. in many models of the Airbus commercial aircraft.

• A number of explicitly different versions of the same specification are written and executed in sequence.

• An acceptance test is used to select the output to be transmitted.

[Figure: N-version programming. Surviving labels: Input, N-versions, Fault manager, Agreed result]

Output comparison

comparator is a simple piece of software that uses a voting mechanism to select the output

requirement that the results from the different versions are all produced within a certain time frame
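The voting mechanism used by the output comparator can be sketched as a strict-majority vote over the outputs of the versions. MajorityVoter is a hypothetical name, the outputs are assumed to be plain int values that can be compared for exact equality, and the time-frame requirement mentioned above is not modelled.

```java
class MajorityVoter {
    // Returns the value produced by a strict majority of the versions, or
    // throws if no single value has a majority.
    static int vote(int[] outputs) {
        for (int i = 0; i < outputs.length; i++) {
            int agree = 0;
            for (int j = 0; j < outputs.length; j++) {
                if (outputs[j] == outputs[i]) {
                    agree++;
                }
            }
            if (2 * agree > outputs.length) {
                return outputs[i];
            }
        }
        throw new IllegalStateException("no majority among version outputs");
    }
}
```

With three versions, a single disagreeing output is outvoted; if all three disagree, the voter cannot select a result and the failure has to be passed on, for example to a fault manager.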


N-version programming

and implemented by different teams. It is assumed that there is a low probability that they will make the same mistakes. The algorithms used should be, but may not be, different

commonly misinterpret specifications in the same way and choose the same algorithms in their systems

Recovery blocks

[Figure: recovery blocks. Algorithm 1 is tried first and its result is passed to the acceptance test. If the test succeeds, execution continues. If it fails, the block retries with algorithm 2 and then algorithm 3, re-testing each result. An exception is signalled if all algorithms fail.]

Recovery blocks

for each version so they reduce the probability of common errors

is difficult as it must be independent of the computation used

real-time systems because of the sequential operation of the redundant versions
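A recovery block can be sketched as a loop that runs the alternatives in sequence and returns the first result that passes the acceptance test. RecoveryBlock and its run method are hypothetical names; the sketch assumes the computation maps an int array to an int array, gives each alternative a fresh copy of the input, and treats an unchecked exception from an alternative as a failed attempt.

```java
class RecoveryBlock {
    // Tries each alternative in turn on a fresh copy of the input and
    // returns the first result that passes the acceptance test.
    static int[] run(
            int[] input,
            java.util.List<java.util.function.UnaryOperator<int[]>> alternatives,
            java.util.function.Predicate<int[]> acceptanceTest) {
        for (java.util.function.UnaryOperator<int[]> alternative : alternatives) {
            try {
                int[] candidate = alternative.apply(input.clone());
                if (acceptanceTest.test(candidate)) {
                    return candidate;   // test succeeded: continue execution
                }
            } catch (RuntimeException e) {
                // A crashing alternative counts as a failed attempt.
            }
        }
        // Signal an exception if all algorithms fail.
        throw new IllegalStateException("all alternatives failed the acceptance test");
    }
}
```

The sequential loop makes the timing cost visible: in the worst case every alternative runs before a result is produced or the exception is signalled.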


Problems with design diversity

tackle problems in the same way

• Different teams make the same mistakes. Some parts of an implementation are more difficult than others so all teams tend to make mistakes in the same place;

susceptible to specification errors. If the specification is incorrect, the system could fail

specifications are usually more complex than hardware specifications and harder to validate

developing separate software specifications from the same user specification

fault avoidance, fault detection and fault tolerance

the development of dependable systems

important if faults in a system are to be minimised

error-prone - their use should be avoided wherever possible

Key points

Title	Critical systems development
Author	Ian Sommerville
Institution	University of Stirling
Subject	Software Engineering
Type	Thesis
Year of publication	2004
City	Stirling
