Handbook of Reliability, Availability, Maintainability and Safety in Engineering Design - Part 57 ppt



Fig 5.3 Cause-consequence diagram (elements shown: initiating event, fault tree, time delay, Yes/No condition branches, consequences)

man et al 1994), and programmable user modelling applications (Blandford et al 1999) have emerged to reconcile deficiencies in the tree-based analysis techniques. Furthermore, although these techniques are well suited to designing for safety in process engineering designs, their use in designing for systems control is complicated by the large number of ways in which computational control can address, or even contribute to, hazardous system states. This problem is addressed by a relatively new forward analysis technique called deviation analysis (Leveson 1995).

Deviation analysis (DA) is based on the underlying assumption that many accidents or incidents are the result of deviations in system variables, where a deviation is the difference between the actual and correct values appropriate for system control. The method originates from the forward analysis technique of software deviation analysis (SDA), in which hazardous behaviour in system control software is analysed. DA is an extension of the technique to system control hardware. Deviation analysis determines whether hazardous system behaviour can result from a class of input deviations across the broad range of process characteristics such as capacity, input, throughput, output and quality. It is a means of determining a system component’s robustness (or, in safety terminology, its survivability), or how it will behave in an imperfect environment.

Hazard and operability studies (HAZOP) were first introduced by engineers from ICI Chemicals in the UK in the 1970s. The method entails the investigation of deviations from the design intent for a process engineering installation by a design team with expertise in different areas such as engineering, operations, maintenance, safety and chemistry. The team is guided in a structured process, using a set of guidewords to examine deviations from normal process conditions at various key points (nodes) throughout the process. The guidewords are applied to the relevant process parameters (for example, flow, temperature, pressure and composition) in order to identify the causes and consequences of deviations. Typical terms used in a HAZOP are the following (Kletz 1999):

• Node: a specific location in the process at which (the deviations of) the process intention are evaluated.

• Intention: a description of how the process is expected to behave at the node. This is described qualitatively as an activity (e.g. feed, reaction, sedimentation) and/or quantitatively in the process parameters, like temperature, flow rate, pressure, composition, etc.

• Deviation: a way in which the process conditions may depart from their intention.

• Parameter: the relevant parameter for the condition(s) of the process, e.g. pressure, temperature, composition, etc.

• Guideword: a short word to describe a deviation of the intention. The most commonly used guidewords are NO, MORE, LESS, AS WELL AS, PART OF, OTHER THAN and REVERSE. In addition, guidewords like TOO EARLY, TOO LATE, INSTEAD OF, etc. are used, the latter mainly for batch-like processes. The guidewords are applied, in turn, to all parameters, in order to identify unexpected and yet credible deviations from the intention.

• Cause: the reason(s) why the deviation could occur. Many causes could be identified for one deviation.

• Consequence: the results of the deviation, in case it occurs. Consequences may comprise both process hazards and operability problems, like plant shutdown or quality decrease of the product. Many consequences can follow from one cause and, in turn, one consequence can have several causes.

• Safeguard: facilities that help to reduce the occurrence frequency of the deviation or to mitigate its consequences. There are five types of safeguards:

a) Facilities that identify the deviation. These comprise, among others, alarm instrumentation and human operator detection.

b) Facilities that compensate for the deviation, e.g. an automatic control system that reduces the feed to a vessel in case of overfilling (increase of level). These are usually an integrated part of the process control.

c) Facilities that prevent the deviation from occurring.

d) Facilities that prevent the deviation from escalating (e.g. trips). These facilities are often interlocked with several units in the process, and controlled by logical computers.

e) Facilities that relieve the process from the hazardous deviation. These comprise, for instance, pressure safety valves (PSV) and vent systems.

• Recommendation: activities identified during a HAZOP study for follow-up. These may comprise technical improvements in the design, modifications in the status of drawings and process descriptions, procedural measures to be developed or further in-depth studies to be carried out.
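The way guidewords are applied, in turn, to all parameters can be illustrated with a minimal sketch. The guidewords and parameters are those named in the text; the function name and the 'guideword parameter' phrasing of each deviation are assumptions made for illustration. The output is only a candidate list, which the HAZOP team would still have to screen for credibility at each node.

```python
from itertools import product

# Guidewords and parameters as listed in the text above.
GUIDEWORDS = ["NO", "MORE", "LESS", "AS WELL AS", "PART OF", "OTHER THAN", "REVERSE"]
PARAMETERS = ["flow", "temperature", "pressure", "composition"]


def candidate_deviations(guidewords, parameters):
    """Pair every guideword with every parameter to form candidate deviations.

    Hypothetical helper: it only enumerates combinations, parameter by
    parameter; judging which ones are credible remains the team's task.
    """
    return [f"{g} {p}" for p, g in product(parameters, guidewords)]


deviations = candidate_deviations(GUIDEWORDS, PARAMETERS)
print(len(deviations))
print(deviations[:3])
```

Not every combination is meaningful for every node, which is why the enumeration is a starting point for discussion rather than a result.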


5.2.1.1 Fault-Tree Analysis for Safety in Engineering Design

The concept of fault-tree analysis (FTA) was originated by Bell Telephone Laboratories in the 1960s as a technique to perform a safety evaluation of the Minuteman Intercontinental Ballistic Missile Launch Control System. A fault tree is a logical diagram that shows the relation between system failure, i.e. a specific undesirable event in the system, and failures of the components of the system. It is a technique based on deductive logic. An undesirable event is first defined, and causal relationships of the failures leading to that event are then identified. Fault trees can be used in qualitative or quantitative risk analysis. The difference between the two is that the qualitative fault tree is linguistic in structure and does not require use of the same rigorous logic as does the formal quantitative fault tree (cf. Fig 5.4).

FTA is a deductive technique that focuses on a particular accident or failure, and provides a method for determining causes of that event. Fault-tree diagrams use logical operators, principally the OR and AND gates. The terminology is derived from electrical circuits, the term ‘gate’ referring to the control of a signal or electrical current. The term OR denotes a choice between two or more signals, any one of which can ‘open’ the gate. The AND term refers to the requirement that all signals are necessary before there is an output from the gate. Figure 5.4 shows the logic and event symbols used in FTA.
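The behaviour of the two gates can be sketched in a few lines. This is an illustration only: the function names and the two pump failure states are hypothetical and do not come from the handbook.

```python
def or_gate(*inputs: bool) -> bool:
    """OR gate: the output event occurs if any input event occurs."""
    return any(inputs)


def and_gate(*inputs: bool) -> bool:
    """AND gate: the output event occurs only when all input events occur."""
    return all(inputs)


# Hypothetical component states (True = the failure event has occurred).
pump_a_failed = True
pump_b_failed = False

loss_of_one_pump = or_gate(pump_a_failed, pump_b_failed)
loss_of_both_pumps = and_gate(pump_a_failed, pump_b_failed)
print(loss_of_one_pump, loss_of_both_pumps)
```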

Fault-tree analysis for safety in engineering design is conducted in several steps, from defining the problem to constructing the fault tree, analysing the fault tree, and documenting the results, specifically:

Fig 5.4 Logic and event symbols used in FTA:

• OR gate: the output event occurs if any of the input events occur.

• AND gate: the output event occurs only when all input events occur.

• Intermediate event: a fault that results from the interactions of other fault events.

• Basic event: a component failure that requires no further development.

• Undeveloped event: a fault that is not examined further because information is unavailable.

• Transfer IN/OUT symbols: IN indicates the tree is developed further at a corresponding OUT symbol.


Step 1 Defining the Problem

The engineering design team selects:

• the top event,

• the boundary conditions,

• system physical bounds,

• the level of systems resolution,

• initial conditions,

• events that are not allowed,

• existing conditions,

• conditional assumptions.

Defining the top event is one of the most important aspects of the first step. The top event is the accident (or undesired event) that is the subject of the FTA. The top event is often identified through other hazard analysis studies (such as HAZID). Top events should be precisely defined for the system or plant being evaluated, because analysing broadly scoped or poorly defined top events can often lead to an inefficient analysis.

For example, a top event of ‘gas leaks in the plant’ is too general. Instead, an appropriate top event would be ‘gas leak in the HC piping of the acid separation plant precipitation tank B’.

The physical system boundaries encompass the system’s equipment, the equipment’s interfaces with other processes, and the utility/support systems that are to be included in the FTA. The design team should also specify the level of systems resolution for the fault-tree events. For example, a motor-operated valve can be included as a single item of equipment (i.e. component) or it can be described as several hardware items (i.e. parts, e.g. the valve body, valve internals, and motor operator). The systems resolution of the FT should be limited to the detail needed to satisfy the analysis objective, and should parallel the resolution of the available information.

The initial equipment configuration or initial operating conditions describe the system in its normal, unfailed state. Events that are not allowed are, for the purposes of the FTA, events that are considered to be unlikely or that are deliberately excluded from the analysis for a specific reason. For example, wiring failures may be excluded from the analysis of an instrument system, or cabling may be excluded from the analysis of power generating units. Existing conditions within which the system functions are estimates (and assumptions) of the possible operational conditions that may arise within the system and its equipment, either as a result of the system’s inherent complexity, or as a result of the complex integration of various systems.

Step 2 Constructing the Fault Tree

The FTA begins at the top event and proceeds, level by level, until all fault events have been traced to their basic contributing causes (i.e. basic events). At each level, the immediate, necessary and sufficient causes are defined that would result in the intermediate or top event under consideration. The analysis continues at each level until basic causes or the analysis boundary conditions are reached.

Fig 5.5 Safety control of cooling water system

Returning to the simple fault tree of a cooling water system depicted in Fig 3.19 of Sect 3.2.2.6, dealing with fault-tree analysis in reliability assessment, assume that the systems design included provision for a back-up surge tank with an appropriate control alarm in the event that the tank overflowed, indicating problems with the cooling water feed. These problems would typically be:

• Excess inflow

• Low surge outflow

• Control alarm failure

• Operator error

Figure 5.5 shows an example of the cooling water surge tank fault tree with two levels below the top event.

Step 3 Analysing the Fault Tree

The analysis ‘solves’ the fault tree by identifying combinations of failures that can lead to accidents. These are called minimal cut sets (MCS). The minimal cut sets for the example shown in Fig 5.5 would be:


• ‘No surge control’ and ‘No alarm control’

• ‘Excess inflow’ and ‘Alarm failure’

• ‘Excess inflow’ and ‘Operator error’

• ‘Low surge outflow’ and ‘Alarm failure’

• ‘Low surge outflow’ and ‘Operator error’.
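The basic-event cut sets in this list can be reproduced with a short top-down expansion of the gates. The sketch below is illustrative: the gate structure is inferred from the description of Fig 5.5 in the text (an AND gate at the top event over two OR gates) and is therefore an assumption, as are the names TREE, cut_sets and minimal. The first item in the list above, pairing the two intermediate events, is the top-level combination that expands into the four basic-event pairs the sketch produces.

```python
from itertools import product

# Gate structure inferred from the text's description of Fig 5.5 (an assumption).
TREE = {
    "Tank overflow": ("AND", ["No surge control", "No alarm control"]),
    "No surge control": ("OR", ["Excess inflow", "Low surge outflow"]),
    "No alarm control": ("OR", ["Alarm failure", "Operator error"]),
}


def cut_sets(event):
    """Return the cut sets of `event` as a set of frozensets of basic events."""
    if event not in TREE:            # basic event: it is its own single cut set
        return {frozenset([event])}
    gate, inputs = TREE[event]
    child_sets = [cut_sets(i) for i in inputs]
    if gate == "OR":                 # any input suffices: pool the inputs' cut sets
        return set().union(*child_sets)
    # AND: one cut set from every input must occur together
    return {frozenset().union(*combo) for combo in product(*child_sets)}


def minimal(sets):
    """Drop any cut set that strictly contains another cut set."""
    return {s for s in sets if not any(t < s for t in sets)}


mcs = minimal(cut_sets("Tank overflow"))
for names in sorted(sorted(s) for s in mcs):
    print(" and ".join(names))
```

For a tree this small the minimisation step removes nothing, but it matters as soon as the same basic event appears under more than one gate.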

If the failure modes of each of the control valves (CV1 and CV2) are considered (i.e. failed closed and failed open), then further low-level cut sets can be defined, and the fault tree needs to be modified (additional rectangular boxes above each CV circular box) to include the failed states:

• ‘CV1 fails open’ and ‘Alarm failure’

• ‘CV1 fails closed’ and ‘Alarm failure’

• ‘CV2 fails open’ and ‘Alarm failure’

• ‘CV2 fails closed’ and ‘Alarm failure’.

Failure probabilities can now be assigned. The probabilities that are allocated to the events can be combined to estimate the probability of the top event. The probability of two independent events, the one with probability p1 and the other with probability p2, occurring together is:

P(AND) = p1 × p2

and q1 and q2 are the complements of p1 and p2 respectively:

q1 = 1 − p1

q2 = 1 − p2

Then q1 is ‘NOT p1’ and q2 is ‘NOT p2’.

The probability of event 1 not occurring is thus q1, and the probability of event 2 not occurring is q2. Thus, for event 1 OR event 2 to occur, the probability is the complement of the probability that neither occurs (that is, the probability that at least one of the two occurs), and is given by the following expression:

P(OR) = 1 − (q1 × q2)

The concept of this expression can be clarified by the following example. In Fig 5.5, the probabilities of the equipment failures in the circles are derived from expert judgement, and the activities in the rectangular boxes are calculated from frequencies further down the tree.

The probability for no surge control is calculated as:

P(OR) valves = 1 − [(1 − 0.025) × (1 − 0.025)]

= 0.050

The probability for no alarm control is calculated as:

P(OR) alarm = 1 − [(1 − 0.025) × (1 − 0.052)]

= 0.075


The probability for the top event shown in the figure (tank overflow) is:

P(AND) tank = 0.050 × 0.075

= 0.00375
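The arithmetic of this example can be checked with a minimal sketch, assuming statistically independent input events. The function names p_and and p_or are illustrative; the input probabilities 0.025, 0.025 and 0.052 are those used in the calculations above.

```python
from math import prod


def p_and(*ps: float) -> float:
    """AND gate: all independent input events occur together."""
    return prod(ps)


def p_or(*ps: float) -> float:
    """OR gate: complement of the probability that no input event occurs."""
    return 1 - prod(1 - p for p in ps)


p_surge = p_or(0.025, 0.025)      # 'no surge control'
p_alarm = p_or(0.025, 0.052)      # 'no alarm control'
p_top = p_and(p_surge, p_alarm)   # tank overflow

print(f"{p_surge:.6f} {p_alarm:.6f} {p_top:.6f}")
```

Carried through without intermediate rounding, the values are 0.049375, 0.0757 and about 0.003738, so the figures of 0.050, 0.075 and 0.00375 quoted above are rounded approximations.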

Although the example is hypothetical, it closely resembles a real-world scenario, in which it is interesting to note that the safety alarm control system’s reliability is lower than that of the surge system it is meant to control! This is due to operator error, where operator judgement is jeopardised by failure in the operator control panel (OCP), which is often the case in many processes. The failure of an item of equipment will result in its replacement, which reduces the failure frequency and then changes the risk probabilities all the way up the tree.

The use of computer models is necessary to keep the fault-tree analysis up to date. It is common in large process plants, however, for the maintenance group not to communicate these improvements to the reliability engineers, who continue to use outdated high-risk numbers. Similarly, experiences of ineffective operation will usually initiate improved training, so that operator errors are less frequent and the reliability of the whole system is improved.

Step 4 Documenting the Results

The analysis should provide a description of the system, a discussion of the problem definition, a list of assumptions, the fault-tree model(s) that were developed, lists of minimal cut sets, and an evaluation of the significance of the MCSs and any recommendations that arise from the FTA.

Probability evaluation of fault trees is considered in most technical papers and books about safety and hazard analysis. However, some approximation discrepancies are evident, especially in the basic theory of assigning probabilities to the fault-tree gates, specifically the OR gate.

The probability expression for the statistically independent input events for the OR gate has been given as (Dhillon 1983):

P(OR) = P(a + b + c + etc.)    (5.3)

P(OR) = P(a) + P(b) + P(c) + etc.

where a, b, c, etc. = input events.

In the example of Fig 5.5, this is equivalent to:

P(OR) = p3 + p4 or p5 + p6

= 0.050 or 0.077

Considering the complements, defined in the same way as q1 and q2 for p1 and p2, results in:

P(OR) = 1 − (q3 × q4) or 1 − (q5 × q6)

= 0.049375 or 0.0757
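The size of the discrepancy between the two OR-gate expressions can be explored with a small sketch. The function names are illustrative, and the third pair of probabilities (0.3 and 0.4) is a hypothetical case added only to show how the gap grows when the input probabilities are not small. For two inputs the gap equals the product of the two probabilities.

```python
from math import prod


def p_or_sum(*ps: float) -> float:
    """Summation form of the OR gate, as in expression (5.3)."""
    return sum(ps)


def p_or_complement(*ps: float) -> float:
    """Complement form of the OR gate for independent input events."""
    return 1 - prod(1 - p for p in ps)


# The first two pairs are from the Fig 5.5 example; the third is hypothetical.
for pair in [(0.025, 0.025), (0.025, 0.052), (0.3, 0.4)]:
    s = p_or_sum(*pair)
    c = p_or_complement(*pair)
    print(f"{pair}: sum={s:.6f} complement={c:.6f} gap={s - c:.6f}")
```

With the small probabilities of the example the two forms agree to within about 0.001, which is why the summation is often acceptable as an approximation for rare events.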


5.2.1.2 Root Cause Analysis for Safety in Engineering Design

Root cause analysis is predominantly a technique for determining the origin of causes of failure in engineered installations after completion of their design. However, the approach can also be used to identify potential root causes of failure, particularly failures with critical safety consequences, during the engineering design process before systems manufacture, installation and/or construction. It is imperative that design engineers consider how their designs operate in the field and, more importantly, how they fail, in order to achieve integrity in engineering design. This will ultimately result in engineering designs that satisfy both functional and integrity requirements, using sound engineering judgement rather than ‘crystal ball’ prediction techniques.

Although there is a wealth of knowledge and data concerning systems performance of existing engineered installations, in general this is not utilised to the extent that information may be obtained for use in new designs, especially in complex integrations of designs. To this end, more formal and systematic methods should be introduced during the engineering design process.

Although specific methods and tools are available to facilitate designing for reliability, for example, their use is often limited to reliability engineers, with the design engineers of other disciplines frequently adopting an intuitive approach to considering reliability in design. As the design process becomes increasingly sophisticated, with higher-level design tasks of complex integrations of similarly complex systems, it has become essential that design engineers formally investigate the integrity of these designs, particularly at each interface of the integrated systems.

Examining and understanding the root cause of failure of a design’s functional operation can aid in designing for safety and designing-out unreliability. In selecting equipment from an existing design to meet a new requirement within a different systems integration, it is important that design engineers look beyond the standard reliability metric of the existing design, and review in particular the root causes of failure and significant factors affecting the equipment’s reliability and safety. In the past, there has been an over-reliance on the use of prediction methods. For example, the original reliability prediction handbook of the US Department of Defense (DoD), MIL-HDBK-217, contained failure rate models for the various part types used in electronic systems, and concentrated mainly on the use of prediction methods that did not provide engineers with any knowledge of what might fail in service (MIL-HDBK-217F 1998).

A methodology aimed at integrating reliability enhancement practices into the engineering design process has been developed as part of a UK government and aerospace industry initiative. The resulting Reliability Enhancement Methodology and Modelling (REMM) project was funded in part by the UK’s Department of Trade and Industry through the Civil Aviation Research and Development program and by the industrial partners involved (Marshall et al 1998). The main objectives of the project are to develop a methodology that supports reliability enhancement in engineering design and to develop a model that facilitates reliability assessment throughout a system’s life cycle. REMM is primarily used within the aerospace environment, but the methodology and model developed are equally applicable to other high-reliability system designs, such as in process, chemical and mechanical engineering design projects. A number of simple practical analyses for use by design engineers, during the early stages of systems realisation, have been developed as part of the REMM methodology. These analyses are aimed at improving high-level decision-making using simple graphical representations of reliability data, such as analyses of root causes, trends, and manufacturing data.

These graphical representation analyses include:

• Root cause analysis and classification of events into high-level failure categories, providing the means to determine those factors that have most effect on the system’s service reliability and, hence, which elements should be tackled as a priority.

• Root cause and trend data across specific criteria such as equipment type, periods of time (e.g. particular manufacturing time-line points), application or use, providing further understanding of the nature of the failure that may be characteristic of the environment in which it is operating.

• Manufacturing data analysis, providing valuable insight into the factors that affect service reliability. Correlation between manufacturing methods and service requirements can often illuminate small changes in design and manufacturing process that result in significant effects on service reliability.
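The first of these analyses, classifying events into high-level failure categories and ranking them, can be illustrated with a minimal sketch. All records, equipment types and category names below are hypothetical and serve only to show the tallying step; they are not REMM data.

```python
from collections import Counter

# Hypothetical failure records: (equipment type, root cause category).
records = [
    ("pump", "design"),
    ("pump", "manufacturing"),
    ("valve", "design"),
    ("valve", "maintenance"),
    ("pump", "design"),
    ("sensor", "manufacturing"),
]

# Tally events per root cause category and rank them, most frequent first.
by_cause = Counter(cause for _, cause in records)
ranked = by_cause.most_common()

# The same tally per equipment type supports the second analysis in the list.
by_equipment = Counter(equipment for equipment, _ in records)

for cause, count in ranked:
    print(f"{cause}: {count}")
```

Plotting the ranked counts as a bar chart would give the kind of simple graphical representation described above.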

Root cause analysis also utilises the deductive logic tree approach, similar to fault-tree analysis (FTA), in establishing the root causes of functional failure or of a system state. Such an approach to problem solving is particularly useful for determining safety in engineering designs.

The approach of establishing the root causes of functional failure in systems design is intended to achieve the following:

• Organise and control design integrity problem identification.

• Provide a visual checklist to ensure all pertinent areas are covered.

• Allow for a standardised approach to safety problem identification.

• Serve as a documented guide for design integrity problem reviews.

The most common root cause analysis methods cover topics from events and causal factor analysis to change analysis, barrier analysis, management oversight and risk assessment, human performance evaluation, standard problem solving and basic decision-making.

These methods are considered in the common root cause analysis approach developed by the Office of Nuclear Energy, US Department of Energy, in their DOE guideline ‘Root cause analysis: guidance document’ (DOE-NE-STD-1004-92 1992).


Common Root Cause Analysis Methods

• Events and causal factor analysis identifies the time sequence of a series of tasks and/or actions and the surrounding conditions that can lead to a failure occurrence. The results are displayed in an events and causal factor chart that gives a picture of the relationships of the events and causal factors.

• Change analysis is used when the problem is obscure. It is a systematic process that is generally used for a single failure occurrence and focuses on elements that change.

• Barrier analysis is a systematic process that can be used to identify physical and procedural barriers or controls that should prevent the occurrence of failure.

• Management oversight and risk tree (MORT) analysis is used to identify inadequacies in barriers/controls, specific barrier and support functions, as well as management functions. It identifies specific factors relating to a possible failure occurrence, as well as the factors that permit them to exist.

• Human performance evaluation identifies those factors that influence task performance. The focus of this analysis method is on operability, work environment, and management factors, as well as man-machine interface studies to improve performance.

• Problem solving and decision-making provides a systematic framework for gathering, organising and evaluating information, and applies to all phases of a possible failure occurrence investigation (Kepner et al 1981).

By organising problem analysis results in an orderly manner as the design progresses, the time spent to find the root causes of possible problems is minimised. The method consists of using factor trees to guide the course of the analysis. Factor trees diagrammatically present the major areas to be considered in the various stages of an engineering design project, such as:

• Systems and equipment design.

• Manufacturing and installation.

• Process start-up and ramp-up.

• Operations and maintenance.

To conduct a root cause analysis specifically in the systems and equipment design stage, a series of charts can be developed representing those functional areas to be investigated, and the various factors to be considered when investigating the functional areas for causes of potential failure problems. These root cause factors for the systems and equipment design area include the following:

• Origin of design criteria.

• Utility inputs prior to design.

• Equipment specifications.

• Constraints on the design.

• Actual design solution and test.
