Handbook of Reliability, Availability, Maintainability and Safety in Engineering Design - Part 11



Table 3.7 Failure mode effect severity classifications

Item no | Classification | Description | Category
1 | Catastrophic | The occurrence of failure may result in death or equipment loss | A
2 | Critical | The occurrence of failure may result in severe injury or major system damage leading to loss | B
3 | Marginal | The occurrence of failure may result in minor injury or minor system damage leading to loss | C
4 | Minor | The failure is not serious enough to lead to injury or system damage, but it will result in repair or in unscheduled maintenance | D

Table 3.8 Qualitative failure probability levels

Item | Probability level | Term | Description
1 | I | Frequent | High probability of occurrence during the item operational period
2 | II | Reasonably probable | Moderate probability of occurrence during the item operational period
3 | III | Occasional | Occasional probability of occurrence during the item operational period
4 | IV | Remote | Unlikely occurrence during the item operational period
5 | V | Extremely unlikely | Zero chance of occurrence during the item operational period

Fig 3.17 Criticality matrix (Dhillon 1999)


Table 3.9 Failure effect probability guideline values

Item no | Failure effect description | Probability value of F
3 | Probable loss | 0.10 < F < 1.00
4 | Possible loss | 0 < F < 0.10

The failure mode criticality number is given by

$$k_{fm} = F\,\theta\,\lambda\,T \qquad (3.20)$$

where:

$k_{fm}$ = the failure mode criticality number

$\theta$ = the failure mode ratio, or the probability that a component will fail in the particular failure mode of interest. More specifically, it is the fraction of the component failure rate that can be allocated to the failure mode under consideration. When all failure modes of a component are specified, the sum of the allocations equals unity

$F$ = the conditional probability that the failure effect results in the indicated severity classification or category, given that the failure mode occurs. The values of $F$ are based on an analyst's judgment and are quantified according to Table 3.9

$T$ = the operational time, expressed in hours or cycles

$\lambda$ = the component failure rate

The item criticality number $K_i$ is calculated separately for each severity class. Thus, the total of the criticality numbers of all the failure modes of a component in the severity class of interest is given by summing Eq (3.20) over those failure modes:

$$K_i = \sum_{j=1}^{n} (k_{fm})_j = \sum_{j=1}^{n} (F\,\theta\,\lambda\,T)_j \qquad (3.21)$$

where $n$ is the number of item failure modes that fall under the severity classification under consideration.

When a component's failure mode results in multiple severity class effects, each with its own occurrence probability, only the most important is used in the calculation of the criticality number $K_i$ (Agarwala 1990). This can lead to erroneously low $K_i$ values for the less critical severity categories. To rectify this error, it is recommended to compute $F$ values for all severity categories associated with a failure mode, and ultimately to include only the contributions of $K_i$ for category B, C and D failures (Bowles et al 1994).
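Equations (3.20) and (3.21) reduce to a multiply-and-group calculation. The sketch below assumes a single component with four hypothetical failure modes; the mode names, their severity categories and all numeric values of $F$, $\theta$, $\lambda$ and $T$ are illustrative only, not handbook data. The failure mode ratios are chosen to sum to unity, as the definition of $\theta$ requires.

```python
from collections import defaultdict

# Hypothetical failure modes for one component (illustrative values only).
# Fields: name, severity category, F, theta (failure mode ratio), lambda (per hour)
FAILURE_MODES = [
    ("seal leak",       "C", 0.50, 0.30, 2.0e-5),
    ("gland wear",      "C", 1.00, 0.10, 2.0e-5),
    ("bearing seizure", "B", 0.20, 0.35, 2.0e-5),
    ("shaft fracture",  "A", 0.10, 0.25, 2.0e-5),
]


def failure_mode_criticality(F, theta, lam, T):
    """Eq. (3.20): k_fm = F * theta * lambda * T."""
    return F * theta * lam * T


def criticality_by_category(modes, T):
    """Eq. (3.21): sum k_fm over the failure modes in each severity category."""
    total_theta = sum(theta for _, _, _, theta, _ in modes)
    if abs(total_theta - 1.0) > 1e-9:
        raise ValueError("failure mode ratios should sum to unity")
    K = defaultdict(float)
    for _name, category, F, theta, lam in modes:
        K[category] += failure_mode_criticality(F, theta, lam, T)
    return dict(K)


K_i = criticality_by_category(FAILURE_MODES, T=1000.0)  # T in operating hours
for category in sorted(K_i):
    print(category, f"{K_i[category]:.1e}")
```

The seal-leak and gland-wear modes both fall into category C, so their contributions are summed into a single $K_i$ for that category, which is what Eq. (3.21) prescribes.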

c) FMECA Data Sources and Users

Design-related information required for the FMECA includes system schematics, functional block diagrams, equipment detail drawings, piping and instrumentation diagrams (P&IDs), design descriptions, relevant specifications, reliability data, available field service data, effects of operational and environmental stress, configuration management data, operating specifications and limits, and interface specifications. Usually, an FMECA satisfies the needs of many groups during the engineering design process, including not only the different engineering disciplines but also quality assurance, reliability and maintainability specialists, systems engineering, logistics support, system safety, various regulatory agencies and manufacturing contractors. Some specific FMECA-related factors and their corresponding data retrieval sources are given as follows (Bowles et al 1994).

FMECA-related factors and their corresponding data sources:

• Failure modes, causes and rates (manufacturer’s database, field experience).

• Failure effects (design engineer, reliability engineer, safety engineer).

• Item identification numbers (parts list).

• Failure detection method (design engineer, maintenance engineer).

• Function (client requirements, design engineer).

• Failure probability/severity classification (safety engineer).

• Item nomenclature/functional specifications (parts list, design engineer).

• Mission phase/operational mode (design engineer).

The FMEA worksheet (Moss et al 1996) is tabular in format, to provide a systematic approach to the analysis. The column headings of a standard FMEA worksheet generally are:

• Item identity/description: a unique identification code and description of each item.

• Function: a brief description of the function performed by the item.

• Failure mode: each item failure mode is listed separately, as there may be several for an item.

• Possible causes: the likely causes of each postulated failure mode.

• Failure detection method: features of the design through which failure can be recognised.

• Failure effect (local level): the effect of the failure on the item's function.

• Compensating provisions: provisions that could mitigate the effect of the failure.

• Remarks: comments on the effect of failure, including any potential design changes.

FMEA extension into FMECA worksheet. If the analysis is extended to quantify the severity and probability of failure (or failure rate) of the equipment, as defined in a failure modes, effects and criticality analysis (FMECA), further columns are added to the FMEA worksheet, such as:

• Failure consequence (system level): the consequences of the failure mode on system operation.

• Severity: the level of severity of the consequence of each failure mode, classified as:
Level 1: minor, with no consequence on functional performance
Level 2: major, with degradation of system functional performance
Level 3: critical, with a severe reduction in the performance of system function, resulting in a change in the system operational state
Level 4: catastrophic, with complete loss of system function

• Loss frequency: the expected frequency of loss resulting from each failure mode, either as a failure rate or as a failure probability. The latter is usually estimated for the operating time interval as a proportion of the overall system failure rate or failure probability (FP). The levels generally employed for processes are:
i) Very low probability: < 0.01 FP
ii) Low probability: 0.01–0.1 FP
iii) Medium probability: 0.1–0.2 FP
iv) High probability: > 0.2 FP

• Component failure rate $\lambda_p$: the overall failure rate of the component in its operational mode and environment. Where appropriate, application and environmental factors may be applied to adjust for the difference between the conditions associated with the generic failure rate data and the operating stresses under which the item is to be used.

• Failure mode proportion $\alpha$: the fraction of the overall failure rate related to the failure mode under consideration.

• Probability of failure consequence $\beta$: the conditional probability that a failure consequence occurs.

• Operational failure rate $\lambda_o$: the product of $\lambda_p$, $\alpha$ and $\beta$.

• Data source: the source of the failure rate (or failure probability) data.
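The worksheet columns $\lambda_p$, $\alpha$, $\beta$ and $\lambda_o$, together with the loss frequency levels, can be sketched as two small functions. The listed loss frequency ranges share their end points, so this sketch assumes half-open intervals (0.1 FP is treated as medium). The component and system failure rates used below are hypothetical.

```python
def operational_failure_rate(lambda_p, alpha, beta):
    """lambda_o = lambda_p * alpha * beta (worksheet column definition)."""
    return lambda_p * alpha * beta


def loss_frequency_level(fraction_of_fp):
    """Classify a failure mode's share of the overall system failure
    probability (FP). Boundary handling is an assumption of this sketch:
    [0, 0.01) very low, [0.01, 0.1) low, [0.1, 0.2] medium, > 0.2 high."""
    if fraction_of_fp < 0.01:
        return "very low"
    if fraction_of_fp < 0.1:
        return "low"
    if fraction_of_fp <= 0.2:
        return "medium"
    return "high"


# Hypothetical worksheet entry (illustrative numbers only).
lambda_o = operational_failure_rate(lambda_p=5.0e-5, alpha=0.3, beta=0.5)
system_failure_rate = 1.0e-4  # hypothetical overall system failure rate, per hour
level = loss_frequency_level(lambda_o / system_failure_rate)
print(f"lambda_o = {lambda_o:.2e} per hour, loss frequency: {level}")
```

Expressing the loss frequency as a ratio of rates follows the text's option of estimating it as a proportion of the overall system failure rate; a probability over the operating interval could be substituted without changing the classification function.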

For FMECAs, a criticality matrix is constructed that relates loss frequency to severity for each failure mode. Failure mode identification numbers are entered in the appropriate cell of the matrix according to their loss frequency and severity, to identify each critical item failure mode. Thus:

Criticality = Severity × Loss frequency,

or: Criticality = Severity × Operational failure rate.
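A criticality matrix is essentially a grid keyed by (severity, loss frequency), with failure mode IDs entered in the cells. The sketch below builds such a grid and ranks the modes by Criticality = Severity × Loss frequency. Converting the qualitative loss frequency levels into ordinal ranks 1 to 4 is an assumption made here so that the product can be formed; the failure mode IDs and their classifications are hypothetical.

```python
from collections import defaultdict

# Ordinal ranks for the loss frequency levels (an assumption for this sketch).
FREQUENCY_RANK = {"very low": 1, "low": 2, "medium": 3, "high": 4}


def build_matrix(modes):
    """Enter each failure mode ID in its (severity level, loss frequency) cell."""
    matrix = defaultdict(list)
    for mode_id, severity, frequency in modes:
        matrix[(severity, frequency)].append(mode_id)
    return matrix


def rank_by_criticality(modes):
    """Criticality = Severity x Loss frequency rank, highest first (stable)."""
    scored = [(severity * FREQUENCY_RANK[frequency], mode_id)
              for mode_id, severity, frequency in modes]
    return [mode_id for _, mode_id in sorted(scored, key=lambda item: -item[0])]


# Hypothetical failure modes: (ID, severity level 1-4, loss frequency level).
MODES = [("FM-01", 4, "very low"), ("FM-02", 2, "high"),
         ("FM-03", 3, "medium"), ("FM-04", 2, "high")]

matrix = build_matrix(MODES)
for severity in range(4, 0, -1):  # print the most severe row first
    row = [",".join(matrix[(severity, f)]) or "-" for f in FREQUENCY_RANK]
    print(f"Level {severity}:", row)
ranking = rank_by_criticality(MODES)
print("Ranking:", ranking)
```

If numerical operational failure rates are available, the frequency rank can be replaced by $\lambda_o$ to obtain the second form, Criticality = Severity × Operational failure rate.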

3.2.2.6 Fault-Tree Analysis in Reliability Assessment

There are two approaches that can be used to analyse the causal relationships between equipment and system failures (Moss et al 1996). These are inductive or forward analysis, and deductive or backward analysis. FMEA is an example of inductive analysis. As previously considered, it starts with a set of equipment failure conditions and proceeds forwards, identifying the possible consequences; this is a ‘what happens if?’ approach.

Fault-tree analysis is a deductive ‘what can cause this?’ approach, and is used to identify the causal relationships leading to a specific system failure mode, the ‘top event’. The fault tree is developed from this top, undesired event, in branches showing the different event paths. Equipment failure events represented in the tree are progressively redefined in terms of lower-resolution events until the basic events are reached, for which substantial failure data must be available. The events are combined logically by use of gate symbols, as shown in Fig 3.18, which illustrates the structure of a typical fault tree.

In this case, the basic event combinations that could result in total loss of output from a simple cooling water system are developed. Using this failure logic diagram, the probability of the top event or the top event frequency can then be calculated by providing information on the basic event probabilities. The top event and the system boundary must be chosen with care, so that the analysis is neither too broad nor too narrow to produce the results required. The specification of the system boundary is particularly important to the success of the analysis.

Many cooling water systems have external power supplies and other services, such as a water supply. It would not be practical to trace all possible causes of failure of these services back through the distribution and generation systems, nor would this extra detail provide any useful information concerning the system being assessed. The location of the external boundary will be partially decided by the aspect of system performance that is of interest; however, it is also important to define the external boundary in the time domain. Process start-up or shutdown conditions can generate different hazards from steady-state operation, and it may be necessary to trace any possible faults that could occur under these conditions.

Fig 3.18 Simple fault tree of cooling water system (top event 'Total loss of output': OR gate over filter failure, valve failure and pump failure; pump failure: OR gate over failure of power supply and failure of both pumps; failure of both pumps: AND gate over failure of pump A and failure of pump B)

In Fig 3.18, the basic events are combined as follows: failure of both pump A and pump B, or failure of the power supply, results in overall pump failure; pump failure, filter failure or valve failure results in total loss of output of the cooling water system. This approach is clearly depicted in the structure of the fault tree of Fig 3.18, in that the basic events are combined in an event hierarchy, from the lower component/sub-assembly levels to the higher assembly/system levels of the cooling water system's systems breakdown structure (SBS).
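Given basic event probabilities, the top event of the Fig 3.18 tree can be evaluated gate by gate. The sketch below assumes that the basic events are statistically independent and that each appears only once in the tree, which is what makes simple AND/OR multiplication exact here; with repeated events the calculation is normally done through minimal cut sets instead. The event names and probability values are hypothetical.

```python
from math import prod


def or_gate(*probs):
    """P(at least one input occurs) for independent input events."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(*probs):
    """P(all inputs occur) for independent input events."""
    return prod(probs)


def total_loss_of_output(q):
    """Top event of the Fig 3.18 tree, evaluated gate by gate."""
    both_pumps = and_gate(q["pump_a"], q["pump_b"])
    pump_failure = or_gate(both_pumps, q["power_supply"])
    return or_gate(pump_failure, q["filter"], q["valve"])


# Hypothetical basic-event probabilities (illustrative values only).
q = {"pump_a": 0.05, "pump_b": 0.05, "power_supply": 0.01,
     "filter": 0.02, "valve": 0.005}
print(f"P(total loss of output) = {total_loss_of_output(q):.4f}")
```

The redundant pumps contribute only $0.05 \times 0.05 = 0.0025$ through the AND gate, so the single-point failures (filter, valve, power supply) dominate the result.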

a) Fault-Tree Analysis Steps

The detailed steps required to perform a fault-tree analysis within the reliability assessment procedure for equipment design can be summarised as follows (Andrews et al 1993):

• Step 1: System configuration understanding.

• Step 2: Identification of system failure states.

• Step 3: Logic model generation.

• Step 4: Qualitative evaluation of the logic model.

• Step 5: Equipment failure analysis.

• Step 6: Quantitative evaluation of the logic model.

• Step 7: Uncertainty analysis.

• Step 8: Sensitivity/importance analysis.

Many of these steps are the same whatever system and/or equipment is being analysed, though some aspects require special attention, particularly system interfaces when mechanical and electrical equipment are involved. Once the first three steps have been conducted, a qualitative evaluation of the fault-tree logic model (Step 4) is necessary to review whether the system configuration and system failure states are correctly understood. The minimal cut sets (combinations of equipment failures that provide the necessary and sufficient conditions for system failure) are then produced.
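The minimal cut sets of the Fig 3.18 tree can be found by expanding each gate into sets of basic events (an OR gate pools the cut sets of its inputs, an AND gate combines one cut set from each input) and then discarding any set that contains another. The sketch below is a simple recursive version of that idea, not a production algorithm; the gate and event names are hypothetical labels for the tree as described in the text.

```python
# Hypothetical encoding of the Fig 3.18 tree: gate name -> (gate type, inputs).
# Any name that is not a key is treated as a basic event.
TREE = {
    "TOTAL_LOSS": ("OR", ["PUMP_FAILURE", "filter", "valve"]),
    "PUMP_FAILURE": ("OR", ["BOTH_PUMPS", "power_supply"]),
    "BOTH_PUMPS": ("AND", ["pump_a", "pump_b"]),
}


def cut_sets(event, tree):
    """Expand an event into a list of cut sets (frozensets of basic events)."""
    if event not in tree:  # basic event
        return [frozenset([event])]
    gate, inputs = tree[event]
    child_sets = [cut_sets(child, tree) for child in inputs]
    if gate == "OR":  # any input's cut set fails the gate
        return [cs for sets in child_sets for cs in sets]
    combined = [frozenset()]  # AND: need one cut set from every input
    for sets in child_sets:
        combined = [acc | cs for acc in combined for cs in sets]
    return combined


def minimal(sets):
    """Remove duplicates and any cut set that is a superset of another."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]


mcs = sorted(minimal(cut_sets("TOTAL_LOSS", TREE)), key=lambda s: (len(s), sorted(s)))
for s in mcs:
    print(sorted(s))
```

For this tree the result is three single-event cut sets (filter, power supply, valve) and one two-event cut set (pump A with pump B), reflecting the redundancy of the two pumps.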

To progress even further with reliability assessment using fault-tree analysis, the probability of equipment failure, $q(t)$, may be determined together with equipment maintainability in the form of a repair rate:

$$q(t) = \frac{\lambda}{\lambda + \nu}\left(1 - e^{-(\lambda + \nu)t}\right) \qquad (3.22)$$

Equation (3.22) is for revealed failures, where $\lambda$ is the failure rate and $\nu$ the repair rate. Equation (3.23) is for unrevealed failures, where $q$ is the average unavailability, $\tau$ is the mean time to repair, and $\theta$ is the test interval.

For safety systems that are normally inactive, failures are revealed only during testing or actual use, which means that the unrevealed failure model is appropriate for these systems. However, the underlying assumption in both of these models is that the failure and repair rates are constant, giving a negative exponential distribution for the probability of failure (repair) prior to time $t$. Constant failure rates are associated with random failure events, as indicated by the useful life period of the hazard rate curve, considered in detail in Section 3.2.3.
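The two constant-rate models can be compared numerically. Eq. (3.22) is used as given; for the unrevealed case, this sketch assumes the widely used approximation $q \approx \lambda(\theta/2 + \tau)$ for a periodically tested component as a stand-in for Eq. (3.23), valid only when $\lambda\theta$ is small. All rates, the repair time and the test interval are hypothetical.

```python
import math


def revealed_unavailability(lam, nu, t):
    """Eq. (3.22): q(t) = lam / (lam + nu) * (1 - exp(-(lam + nu) * t))."""
    return lam / (lam + nu) * (1.0 - math.exp(-(lam + nu) * t))


def unrevealed_mean_unavailability(lam, tau, theta):
    """Assumed approximation q = lam * (theta / 2 + tau) for a periodically
    tested component; reasonable only when lam * theta is small."""
    return lam * (theta / 2.0 + tau)


lam = 1.0e-4  # failures per hour (hypothetical)
nu = 0.1      # repairs per hour, i.e. a 10 h mean repair time (hypothetical)
print(f"revealed, t = 1000 h:  q = {revealed_unavailability(lam, nu, 1000.0):.3e}")
print(f"steady state lam/(lam+nu) = {lam / (lam + nu):.3e}")
print(f"unrevealed, monthly test: q = {unrevealed_mean_unavailability(lam, 10.0, 720.0):.3e}")
```

With these assumed numbers the periodically tested component is unavailable for roughly 3.7% of the time, against about 0.1% for the revealed case, because a hidden failure waits on average half a test interval before it is found.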

However, mechanical equipment subject to wear, corrosion, fatigue, etc. may in many cases not conform to this assumption (Andrews et al 1993). When either the failure or the repair rates are not constant, and the probability density functions for the times to failure, $f(t)$, and to repair, $g(t)$, are available, they can be combined to give the unconditional failure intensity $w(t)$ and the unconditional repair intensity $\nu(t)$ by solving the following simultaneous integral equations:

$$w(t) = f(t) + \int_{0}^{t} f(t-u)\,\nu(u)\,du \qquad (3.24)$$

$$\nu(t) = \int_{0}^{t} g(t-u)\,w(u)\,du \qquad (3.25)$$

Having solved these equations, the equipment failure probability is then given by

$$q(t) = \int_{0}^{t} \left[w(u) - \nu(u)\right] du \qquad (3.26)$$
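Equations (3.24) to (3.26) can be solved numerically on a time grid. The sketch below uses the trapezoidal rule; at each step the two unknown end-point terms couple $w(t_k)$ and $\nu(t_k)$, so a 2 × 2 linear system is solved. It is an illustrative O(n²) scheme, not a method prescribed by the text. As a check it uses exponential densities with hypothetical rates, for which the result should agree with the closed form of Eq. (3.22).

```python
import math


def solve_intensities(f, g, t_end, n):
    """Solve Eqs. (3.24)-(3.25) on a uniform grid using the trapezoidal rule.

    f, g: densities of the times to failure and to repair (callables).
    Returns the grid t and the intensities w(t) and nu(t) at each grid point.
    """
    h = t_end / n
    t = [k * h for k in range(n + 1)]
    fv = [f(x) for x in t]  # f at every lag t_k - t_j = t_(k-j)
    gv = [g(x) for x in t]
    w, nu = [fv[0]], [0.0]  # at t = 0 both integrals vanish
    c1, c2 = 0.5 * h * fv[0], 0.5 * h * gv[0]  # weights of the unknown j = k terms
    for k in range(1, n + 1):
        # Trapezoidal sums over the already-known points j = 0..k-1.
        a = fv[k] + h * (0.5 * fv[k] * nu[0]
                         + sum(fv[k - j] * nu[j] for j in range(1, k)))
        b = h * (0.5 * gv[k] * w[0]
                 + sum(gv[k - j] * w[j] for j in range(1, k)))
        # w_k = a + c1*nu_k and nu_k = b + c2*w_k: solve this 2x2 system.
        w_k = (a + c1 * b) / (1.0 - c1 * c2)
        w.append(w_k)
        nu.append(b + c2 * w_k)
    return t, w, nu


def failure_probability(t, w, nu):
    """Eq. (3.26): q(t) = integral of [w(u) - nu(u)] du (cumulative trapezoid)."""
    q = [0.0]
    for k in range(1, len(t)):
        h = t[k] - t[k - 1]
        q.append(q[-1] + 0.5 * h * ((w[k] - nu[k]) + (w[k - 1] - nu[k - 1])))
    return q


# Check against the constant-rate closed form, Eq. (3.22), with hypothetical rates.
lam, rep = 0.01, 0.1  # failure and repair rates per hour
t, w, nu = solve_intensities(lambda x: lam * math.exp(-lam * x),
                             lambda x: rep * math.exp(-rep * x), 50.0, 1000)
q = failure_probability(t, w, nu)
exact = lam / (lam + rep) * (1.0 - math.exp(-(lam + rep) * t[-1]))
print(f"numerical q(50) = {q[-1]:.6f}, Eq. (3.22) gives {exact:.6f}")
```

Only the two density functions need to change to handle, for example, Weibull times to failure or lognormal repair times, which is where a numerical route becomes necessary.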

For the case of constant failure and repair rates, the probability density functions for the times to failure and repair are given as

$$f(t) = \lambda e^{-\lambda t} \qquad (3.27)$$

$$g(t) = \nu e^{-\nu t} \qquad (3.28)$$

Equations (3.24) and (3.25) can then be solved by Laplace transforms, and substituting the solution obtained into Eq (3.26) yields Eq (3.22). For more complex distributions of failure and repair times, numerical solutions may be required.

With the equipment failure data produced at Step 5, fault-tree quantification (Step 6) gives the system failure probability, the system failure rate and the expected number of system failures. Where failure and repair distributions have been specified for the analysis, confidence intervals can be determined at Step 7. Step 8 produces the importance rankings for the basic events, identifying the equipment that makes the most significant contribution to system failure. Fault trees in reliability assessments of integrated engineering systems are significantly more complex than that illustrated in Fig 3.18. With complex engineering designs, fault-tree methodology includes the concepts of availability and maintainability; this is considered in greater detail in Chapter 4, Availability and Maintainability in Engineering Design.
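The text does not name a particular importance measure for Step 8. As an assumed choice, the sketch below ranks the basic events of the Fig 3.18 tree by Birnbaum importance, $I_B(i) = P(\text{top} \mid q_i = 1) - P(\text{top} \mid q_i = 0)$, reusing independent-event gate algebra. The basic-event probabilities are hypothetical.

```python
def top_event(q):
    """Top-event probability for the Fig 3.18 tree, assuming independent basic
    events that each appear once (so gate-by-gate evaluation is exact)."""
    both_pumps = q["pump_a"] * q["pump_b"]
    pump_failure = 1.0 - (1.0 - both_pumps) * (1.0 - q["power_supply"])
    return 1.0 - (1.0 - pump_failure) * (1.0 - q["filter"]) * (1.0 - q["valve"])


def birnbaum_importance(q):
    """I_B(i) = P(top | event i certain) - P(top | event i impossible)."""
    return {name: top_event({**q, name: 1.0}) - top_event({**q, name: 0.0})
            for name in q}


# Hypothetical basic-event probabilities (illustrative values only).
q = {"pump_a": 0.05, "pump_b": 0.05, "power_supply": 0.01,
     "filter": 0.02, "valve": 0.005}
importance = birnbaum_importance(q)
ranking = sorted(importance.items(), key=lambda item: item[1], reverse=True)
for name, value in ranking:
    print(f"{name:13s} {value:.4f}")
```

The single-point failures rank far above either pump, because a pump failure only fails the system if the other pump has also failed. Other measures (for example Fussell-Vesely) weight the events by their actual contribution and can order them differently.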

b) Fault-Tree Analysis and Safety and Risk Assessment

The main use of fault trees in designing for reliability is in safety and risk studies. Fault trees provide a useful representation of the different failure paths, and this can lead to safety and risk assessments of systems and processes even without considering failure and repair data, although this does cause some difficulties (Moss et al 1996).

In many cases, fault trees and failure mode and effects analysis (FMEA) are employed in combination: the FMEA to define the effects and consequences of specific equipment failures, and the fault tree (or several fault trees) to identify and quantify the paths that lead to equipment failure and to high safety risks.

3.2.3 Theoretical Overview of Reliability Evaluation in Detail Design

Reliability evaluation determines the reliability and criticality values for each individual item of equipment at the lower systems levels of the systems breakdown structure. It determines the failure rates and failure rate patterns of components, not only for functional failures that occur at random intervals but also for wear-out failures.

Reliability evaluation is considered in the detail design phase of the engineering design process, to the extent of determining the frequencies with which failures occur over a specified period of time, based on component failure rates.

The most applicable methodology for reliability evaluation in the detail design phase includes basic concepts of mathematical modelling such as:

• The hazard rate function.
(To represent the failure rate pattern of a component by evaluating the ratio between its failure probability density function and its reliability function.)

• The exponential failure distribution.
(To define the probability of failure and the reliability function of a component when it is subject only to functional failures that occur at random intervals.)

• The Weibull failure distribution.
(To determine component criticality for wear-out failures, rather than random failures.)

• Two-state device reliability networks.
(A component is said to have two states if it either operates or fails.)

• Three-state device reliability networks.
(A three-state component derates with one operational and two failure states.)

3.2.3.1 The Hazard Rate Function

The hazard rate function represents the failure rate pattern as the ratio between a particular probability density function (p.d.f.) and its reliability function, which is the complement of its cumulative distribution function (c.d.f.).

For continuous random variables, the cumulative distribution function is defined by

$$F(t) = \int_{-\infty}^{t} f(x)\,dx$$

where:

$f(x)$ = the probability density function of the distribution of values $x$ over the interval $-\infty$ to $t$.

In the case where $t \to \infty$, the cumulative distribution function is unity:

$$F(\infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1$$

The probability density function is obtained as the derivative of the cumulative distribution function, as follows:

$$\frac{dF(t)}{dt} = \frac{d}{dt}\left[\int_{-\infty}^{t} f(x)\,dx\right] = f(t)$$

The reliability function $R(t)$ over a period of time $t$ is the difference between the cumulative distribution function as $t \to \infty$ and the cumulative distribution function at time $t$; alternatively, it is the cumulative distribution function of failure over the period $t$ subtracted from unity:

$$R(t) = F(\infty) - F(t) = 1 - F(t)$$

The hazard rate function is then defined as

$$\lambda(t) = \frac{f(t)}{R(t)}$$

or

$$\lambda(t) = \frac{f(t)}{1 - F(t)}.$$


Thus, the hazard rate function can be used to represent the hazard rate curve of several different probability density functions, particularly the exponential or Poisson function, in which $\lambda(t)$ is constant, and the Weibull function, in which $\lambda(t)$ is either decreasing or increasing.
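The definition $\lambda(t) = f(t)/(1 - F(t))$ can be applied directly to any p.d.f./c.d.f. pair. The sketch below does so for an exponential distribution and for two-parameter Weibull distributions with shape $\beta < 1$ and $\beta > 1$ (shape $\beta$ and scale $\eta$ are the usual Weibull parameters); all parameter values are hypothetical. The exponential case returns a constant, while the two Weibull cases decrease and increase with time, respectively.

```python
import math


def hazard_rate(pdf, cdf, t):
    """lambda(t) = f(t) / (1 - F(t))."""
    return pdf(t) / (1.0 - cdf(t))


def exponential(lam):
    def pdf(t):
        return lam * math.exp(-lam * t)

    def cdf(t):
        return 1.0 - math.exp(-lam * t)

    return pdf, cdf


def weibull(beta, eta):
    """Two-parameter Weibull: shape beta, scale eta, for t > 0."""
    def pdf(t):
        return (beta / eta) * (t / eta) ** (beta - 1.0) * math.exp(-((t / eta) ** beta))

    def cdf(t):
        return 1.0 - math.exp(-((t / eta) ** beta))

    return pdf, cdf


CASES = {
    "exponential, lambda = 0.002": exponential(0.002),
    "Weibull, beta = 0.5": weibull(0.5, 1000.0),
    "Weibull, beta = 3.0": weibull(3.0, 1000.0),
}
for label, (pdf, cdf) in CASES.items():
    rates = [hazard_rate(pdf, cdf, t) for t in (100.0, 500.0, 900.0)]
    print(label, [f"{r:.2e}" for r in rates])
```

For the Weibull case the ratio simplifies analytically to $(\beta/\eta)(t/\eta)^{\beta-1}$, which is a convenient check on the numerical values.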

a) Review of the Hazard Rate Curve

A hazard rate curve is shown in Fig 3.19. This curve is used to represent the failure rate pattern of equipment (i.e., assemblies and, predominantly, components; EPRI 1974). Failure rate representation of electronic components is a prime example, in which case only the middle portion of the curve (the useful life period, or constant failure rate region) is considered.

As can be seen in Fig 3.19, the hazard rate curve may be divided into three distinct regions (i.e., decreasing, constant and increasing hazard rate). The decreasing hazard rate region of the curve is designated the ‘burn-in period’ or ‘infant mortality period’. The ‘burn-in period’ failures, known as ‘early failures’, are the result of design, manufacturing or construction defects in new equipment. As the ‘burn-in period’ progresses, equipment failures decrease until the beginning of the constant failure rate region, which is the middle portion of the curve and is designated the ‘useful life period’ of equipment. Failures occurring during the ‘useful life period’ are known as ‘random failures’ because they occur unpredictably. This period starts at the end of the ‘burn-in period’ and finishes at the beginning of the ‘wear-out phase’.
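One common way to sketch a curve like that of Fig 3.19 numerically, assumed here rather than taken from the text, is to add three hazard terms: a decreasing Weibull hazard (shape below 1) for early failures, a constant rate for random failures, and an increasing Weibull hazard (shape above 1) for wear-out. The parameter names `early`, `random_rate` and `wear` and all their values are hypothetical.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def bathtub_hazard(t, early=(0.5, 5.0e4), random_rate=1.0e-4, wear=(4.0, 2.0e4)):
    """Hypothetical bathtub curve: a decreasing Weibull term (burn-in), a
    constant term (useful life) and an increasing Weibull term (wear-out)."""
    return weibull_hazard(t, *early) + random_rate + weibull_hazard(t, *wear)


for t in (10.0, 100.0, 1000.0, 5000.0, 15000.0, 25000.0):
    print(f"t = {t:>7.0f} h   lambda(t) = {bathtub_hazard(t):.2e} per hour")
```

With these assumed parameters the hazard falls during the first few thousand hours, stays close to the constant random-failure rate around the middle of the range, and then rises as the wear-out term takes over.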

Fig 3.19 Failure hazard curve (life characteristic curve or risk profile)
