to the point of impracticality where, for example, consideration is given only to single modes of failure, or only to random failure occurrences, or to maintenance that results in complete renewal and ‘as new’ conditions. In reality, the situation is much more complicated, with interacting multiple failure modes, variable failure rates, and maintenance-induced failures that influence the rates of deterioration and subsequent failure (Woodhouse 1999).
It is somewhat unrealistic to assume a specific failure rate of equipment within a complex integration of systems with complex failure processes. At best, the intrinsic failure characteristics of components of equipment are determined from quantitative probability distributions of failure data obtained in a somewhat clinical environment under certain operating conditions. The true failure process, however, is subject to many other factors, including premature or delayed preventive maintenance activities conducted during shutdowns of process plant.
It is generally accepted that shutdowns affect the failure characteristics of equipment as a whole, although it is debatable whether the end result is positive or negative from a residual life point of view, where residual life is defined as the remaining life expectancy of a component, given its survival to a specific age. This is a concept of obvious interest, and one of the most important notions in process reliability and equipment aging studies for safety criticality analysis.
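The residual life defined above is commonly formalised as the mean residual life function. As a sketch, with T denoting the time to failure and S(t) = P(T > t) the survival function (this notation is assumed here and is not taken from the source):

```latex
m(t) \;=\; \mathbb{E}\,[\,T - t \mid T > t\,]
     \;=\; \frac{1}{S(t)} \int_{t}^{\infty} S(u)\,\mathrm{d}u ,
     \qquad S(t) > 0 .
```

Whether a shutdown has a positive or negative effect from a residual life point of view then corresponds to whether m(t) after the shutdown is larger or smaller than it would otherwise have been.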
Safety criticality analysis is thus always faced with combinations of interacting failure modes and variable failure rates, where the cumulative effects are much more important than estimates of specific probabilities of failure. Qualitative estimates of how long equipment might last in certain engineering processes, based on operating conditions and failure characteristics, are much more easily made than quantitative estimates of the chances of failure of individual equipment. These cumulative effects are represented in equipment survival curves where a best-fit curve is matched to specific survival data, and a pattern of risks calculated that would be necessary for these effects to be realised. In analysing survival data, there is often the need to determine not only the survival time distribution but also the residual survival time (or residual life) distribution. A typical equipment survival curve and hazard curve are illustrated in Fig. 5.41a and 5.41b (Smith et al. 2000).
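A survival curve of the Kaplan–Meier type named in the caption of Fig. 5.41 can be computed directly from failure and censoring records. The following is a minimal sketch of the standard product-limit estimator; the function name `kaplan_meier` and the running-time data are hypothetical and are not taken from the source.

```python
from collections import Counter


def kaplan_meier(times, observed):
    """Product-limit estimate of the survival function.

    times    -- time in service for each item (any consistent unit)
    observed -- True if the item failed at that time, False if it was
                censored (e.g. still running, or removed at a shutdown)
    Returns a list of (time, S(time)) at each distinct failure time.
    """
    failures = Counter(t for t, failed in zip(times, observed) if failed)
    leaving = Counter(times)  # items leaving the risk set (failed or censored)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = failures.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical data for eight items of rotating equipment (months in service).
times = [6, 12, 12, 18, 24, 24, 30, 36]
observed = [True, True, False, True, True, False, True, False]

for t, s in kaplan_meier(times, observed):
    print(f"t = {t:>2} months   S(t) = {s:.3f}")
```

Censored items reduce the number at risk without lowering the curve, which is why the estimator suits shutdown data in which many components are renewed before they fail.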
5.2 Theoretical Overview of Safety and Risk in Engineering Design

Fig. 5.41 a Kaplan–Meier survival curve for rotating equipment, b estimated hazard curve for rotating equipment

Typical impact, risk exposure, lost performance, and direct cost patterns based on shutdown maintenance intervals for rotating equipment, as well as risk-based maintenance patterns based on shutdown maintenance intervals for rotating equipment, are illustrated in Fig. 5.42a and 5.42b (APT Maintenance 1999).
b) Risk-Based Maintenance
Risk-based maintenance is fundamentally an evaluation of maintenance tasks, particularly scheduled preventive maintenance activities in shutdown programs. It considers the impact of bringing forward, or delaying, activities that are directed at preventing cost risks to coincide with essential activities that address safety risks. If the extent of these risks were known, and what they cost, the optimum amount of risk to take, and planned costs to incur, could be calculated. Similarly, better decisions could be made if the value of the benefits of improved performance, longer life and greater reliability were known. These risks and benefits are, however, difficult to quantify, and many of the factors are indeterminable. Cost/risk optimisation in this context can thus be defined as the minimal total impact, and represents a trade-off between the conflicting interests of the need to reduce costs at the same time as the need to reduce the risks of failure. Both are measured in terms of cost, the former being the planned downtime cost plus the cost of preventive maintenance in an attempt to increase performance and reliability, and the latter being the cost of losses due to forced shutdowns plus the cost of repair and consequential damage.

Fig. 5.42 a Risk exposure pattern for rotating equipment, b risk-based maintenance patterns for rotating equipment
The total impact is the sum of the planned costs and failure costs. When this sum is at a minimum, an optimal combination of the costs incurred and the failure risks is reached, as illustrated in Fig. 5.43.
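The minimum of the total impact can be illustrated numerically. The sketch below assumes, purely for illustration, that failures between shutdowns follow a power-law (Weibull-type) cumulative hazard with minimal repair, so that the expected number of failures in an interval T is (T/eta)^beta; the source does not prescribe this model, and all cost figures and parameter values are hypothetical.

```python
def total_impact(interval, planned_cost, failure_cost, eta, beta):
    """Return (total, planned, risk) cost per month for a shutdown interval in months.

    planned_cost -- planned downtime plus preventive maintenance cost per shutdown
    failure_cost -- cost of losses plus repair and consequential damage per failure
    eta, beta    -- scale and shape of the assumed power-law cumulative hazard
    """
    planned = planned_cost / interval
    expected_failures = (interval / eta) ** beta  # assumed failure model
    risk = failure_cost * expected_failures / interval
    return planned + risk, planned, risk


def closed_form_optimum(planned_cost, failure_cost, eta, beta):
    """Interval minimising the total impact under the assumed model (needs beta > 1)."""
    return eta * (planned_cost / (failure_cost * (beta - 1))) ** (1 / beta)


# Hypothetical inputs: costs in dollars, times in months.
PLANNED, FAILURE, ETA, BETA = 20_000, 150_000, 48.0, 2.5

best = min(range(1, 61), key=lambda t: total_impact(t, PLANNED, FAILURE, ETA, BETA)[0])
print("best whole-month interval:", best)
print("closed-form optimum:", round(closed_form_optimum(PLANNED, FAILURE, ETA, BETA), 2))
```

Shorter intervals raise the planned cost per month and longer intervals raise the failure risk cost per month, which gives the total the single minimum described above.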
Cost/risk trade-off decisions determine optimal preventive maintenance intervals for plant shutdown strategies that consider component renewal or replacement criteria, spares requirements planning, etc. Planned downtime costs plus the costs of preventive maintenance are traded off against the risk consequences of premature or deferred component renewals or replacements, measured as the cost of losses plus the cost of repair. In each of these areas, cost/risk evaluation techniques are applied to assist in the application of a safety-critical maintenance approach.

Fig. 5.43 Typical cost optimisation curve
Component renewal/replacement criteria are directly determined by failure modes and effects criticality analysis (FMECA), whereby appropriate maintenance tasks are matched to failure modes. In applying FMECA, the criticality analysis establishes a priority rating of components according to the consequences and measures of their various failure modes, which helps to prioritise the preventive maintenance activities for scheduled shutdowns. An example of an FMECA for process criticality of a control valve, based on failure consequences (downtime) and failure rate (1/MTBF), is given in Table 5.16.
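A priority rating based on failure consequences (downtime) and failure rate (1/MTBF) can be sketched as follows. The product of the two is used here as an assumed criticality index; the source does not state the exact formula behind Table 5.16, and the downtime and MTBF values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    description: str
    downtime_h: float    # failure consequence: downtime per failure (hours)
    mtbf_months: float   # mean time between failures (months)

    @property
    def failure_rate(self) -> float:
        """Failures per month, taken as 1/MTBF."""
        return 1.0 / self.mtbf_months

    @property
    def criticality(self) -> float:
        """Assumed index: expected downtime hours per month."""
        return self.downtime_h * self.failure_rate


# Hypothetical values for illustration only.
modes = [
    FailureMode("Control valve", "Fails to open", 8.0, 12.0),
    FailureMode("Control valve", "Fails to seal/close", 6.0, 6.0),
    FailureMode("Instrument loop (press 2)", "Fails to detect low pressure condition", 1.5, 3.0),
]

ranked = sorted(modes, key=lambda m: m.criticality, reverse=True)
for m in ranked:
    print(f"{m.criticality:5.2f} h/month  {m.component}: {m.description}")
```

A ranking of this kind only orders the failure modes; the thresholds that separate high, medium and low criticality remain a matter of judgement.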
Reliability, availability, maintainability and safety (RAMS) studies establish the most effective combination of the different types of maintenance (i.e. a maintenance strategy) for operational systems and equipment. The deliverable results are operations and maintenance procedures and work instructions in which the different types of maintenance are effectively combined for specific equipment.
Failure modes and effects criticality analysis (FMECA), as given in Table 5.16, is one of the most commonly used techniques for prioritising failures in equipment. The analysis at systems level involves identifying potential equipment failure modes and assessing the consequences of these for the system’s performance.
Table 5.17 shows the designation of maintenance activities, the appropriate maintenance trade, and the recommended maintenance frequency for each failure mode, based on MTBF. It is evident that some activities need to be delayed to coincide with others.
Different types and levels of maintenance effort are applied, depending upon the process or functional criticality (Woodhouse 1999):
• Quantitative risk and performance analysis (such as RAM and FMECA) is warranted for about 5–10% of the most critical failure modes. This is where cost/risk optimisation is applicable for significant costs or risks that are sensitive to high-impact strategies.
to open’)
Control valve | Fails to seal/close | TLF | Production | Valve stem cylinders seized due to chemical deposition or corrosion | critical
Instrument loop (press 1) | Fails to provide accurate pressure indication | TLF | Maint | Restricted sensing port due to blockage of chemical or physical accumulation | critical
Instrument loop (press 2) | Fails to detect low pressure condition | TLF | Maint | Low pressure switch fails due to corrosion or mechanical damage | critical
Instrument loop (press 2) | Fails to detect low pressure condition | TLF | Maint | Pressure switch relay or cabling failure | critical
Instrument loop (press 2) | Fails to provide output signal for alarm | TLF | Maint | PLC alarm function or indicator fails | critical
Table 5.17 FMECA with preventive maintenance activities

Component | Failure description | Failure causes | D/T (h) (plus damage) | MTTR (h) (repair time) and damage | MTBF (months) | Maintenance activity | Maintenance trade | Maintenance frequency
Control valve | Fails to open | Solenoid valve fails, failed cylinder actuator or air receiver failure | | | | Replace components and test PLC interface | Instr tech | 12 monthly
Control valve | Fails to open | No PLC output due to modules electronic fault or cabling | | | | valve service as above | Instr tech | 12 monthly
Control valve | Fails to seal/close | Valve disk damaged due to corrosion wear (same causes as ‘fails to open’) | | | | valve and check valve stem, seat and disk or diaphragm for deterioration or corrosion and replace with overhauled valve if required | Fitter | 6 monthly
Control valve | Fails to seal/close | Valve stem cylinders seized due to chemical deposition or corrosion | | | | valve condition assessment and replace components | Instr tech | 6 monthly
gauge if required
Instrument loop (press 2) | Fails to detect low pressure condition | Low pressure switch fails due to corrosion or mechanical damage | | | | operation of pressure switch and wiring. Test alarm’s operation | Instr tech | 3 monthly
Instrument loop (press 2) | Fails to detect low pressure condition | Pressure switch relay or cabling failure | | | | operation verification | Instr tech | 3 monthly
Instrument loop (press 2) | Fails to provide output signal for alarm | PLC alarm function or indicator fails | | | | operation verification | Instr tech | 3 monthly
• Rule-based analysis methods (such as RCM and RBI) are more appropriate for about 40–60% of the critical failure modes, particularly if supplemented with economic analysis of the resulting impact strategies. This is where cost/risk optimisation is applicable for the costs or risks for setting preventive maintenance intervals.
• Review of existing maintenance (excluding simple FMEA studies) provides a simple check at the lower levels of criticality to verify that there is a valid reason for the maintenance activity, and that the cost is reasonable compared to the consequences.
c) Safety Criticality Analysis and Risk-Based Maintenance
Safety criticality analysis was previously considered as the assessment of failure risks. In this context, safety criticality analysis is applied to determine the essential maintenance intervals, and the impact of premature or delayed preventive maintenance activities where failure risks are considered to be safety critical. A safety/risk scale is applied, based on a specific cost benchmark (usually computed as the cost of output per time interval) related to the cost of losses and the likelihood of failure.
A safety criticality model to determine the optimal maintenance interval, and the impact of premature or delayed preventive maintenance activities considers the following:
• A quantified description of the degradation process, using estimates wherever
data are not available, as well as identification of failure modes and related causes
• Cost calculations for material and maintenance labour costs for each failure
mode, including possible consequential damage
• Cost/risk calculations for alternative preventive maintenance intervals based on a specific cost benchmark related to the cost of losses and the likelihood of failure
• Cost criticality rating of failure modes, and sensitivity testing to the limits of the
likelihood of failure under uncertainty of unavailable or censored data
• Identification of key decision drivers (which assumptions have the greatest effect
upon the optimal decision), for review of the preventive maintenance program
In many cases, there are several interacting failure modes, causes and effects, all in the same evaluation.
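The sensitivity testing and identification of key decision drivers listed above can be sketched by moving each uncertain estimate to its limits and recording how far the optimal interval shifts. The power-law failure model, the parameter names and all values below are assumptions made for illustration and do not come from the source.

```python
def cost_rate(interval, p):
    """Planned plus risk cost per month for a preventive maintenance interval (months)."""
    expected_failures = (interval / p["eta"]) ** p["beta"]  # assumed failure model
    return (p["pm_cost"] + p["loss_cost"] * expected_failures) / interval


def optimal_interval(p, candidates=range(1, 61)):
    """Candidate interval with the lowest cost rate."""
    return min(candidates, key=lambda t: cost_rate(t, p))


def decision_drivers(base, limits):
    """Rank uncertain inputs by how far their limits move the optimal interval."""
    shifts = {}
    for name, (low, high) in limits.items():
        t_low = optimal_interval({**base, name: low})
        t_high = optimal_interval({**base, name: high})
        shifts[name] = (t_low, t_high)
    return sorted(shifts.items(), key=lambda kv: abs(kv[1][1] - kv[1][0]), reverse=True)


# Hypothetical best estimates and uncertainty limits.
base = {"pm_cost": 20_000, "loss_cost": 150_000, "eta": 48.0, "beta": 2.5}
limits = {
    "loss_cost": (75_000, 300_000),  # cost of losses per failure
    "eta": (36.0, 60.0),             # characteristic life (months)
    "beta": (2.0, 3.0),              # rate of deterioration
}

print("base optimum:", optimal_interval(base), "months")
for name, (t_low, t_high) in decision_drivers(base, limits):
    print(f"{name:>9}: optimum ranges from {t_low} to {t_high} months")
```

Inputs whose limits barely move the optimum need no further data collection, whereas those at the top of the list are the assumptions worth reviewing first.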
The preventive maintenance program or, in the case of continuous processes, the shutdown strategy thus becomes a compromise of scheduled times and costs. Some activities will be performed ahead of their ideal timing, whilst others will be delayed to share the downtime opportunity determined by safety-critical shuts.
The risks and performance impact of delayed activities, and the additional costs of deliberate over-maintenance in others, both contribute to the costs for a particular shutdown program. The degree of advantage, on the other hand, is controlled

assessment scales, and the use of computer automated computation. Table 5.18 shows the application of cost criticality analysis to the FMECA for process criticality of the control valve given in Table 5.17. It indicates the cost criticality rating of each failure mode related to the cost of losses and the cost risk based on estimates of the likelihood of failure. Table 5.19 shows a comparison between the process criticality rating and the cost criticality rating of each failure mode of the control valve. In this case, the ratings correspond closely with one another.
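The cost columns of Table 5.18 suggest a simple calculation: a total cost per failure made up of repair and production loss, weighted by the likelihood of failure to give a cost risk. The sketch below follows that reading; the weighting by a single likelihood value is an assumption, and every figure is hypothetical because the numeric entries of the table are not available here.

```python
def cost_criticality(material_labour, production_loss, likelihood):
    """Return (total cost per failure, cost risk).

    material_labour -- material and labour cost per failure, incl. damage ($)
    production_loss -- economic cost of lost production per failure ($)
    likelihood      -- estimated likelihood of failure over the period considered
    """
    total = material_labour + production_loss
    return total, total * likelihood


# Hypothetical estimates per failure mode: (material and labour, production loss, likelihood).
estimates = {
    "Control valve: fails to open": (5_000, 40_000, 0.10),
    "Control valve: fails to seal/close": (8_000, 40_000, 0.20),
    "Instrument loop (press 2): fails to detect low pressure": (1_000, 10_000, 0.30),
}

ranking = sorted(
    ((name, *cost_criticality(*values)) for name, values in estimates.items()),
    key=lambda row: row[2],
    reverse=True,
)
for name, total, risk in ranking:
    print(f"total ${total:>7,.0f}   cost risk ${risk:>7,.0f}   {name}")
```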
The maintenance frequencies of the preventive maintenance activities that were typically based on the mean time between failures (MTBF) are, however, not relative to either the process criticality rating or the cost criticality rating. The maintenance frequencies thus require review to determine the optimal maintenance intervals whereby the impact of premature or delayed preventive maintenance activities is considered.
This example of a relatively important item of equipment, such as a process control valve, is typical of many such items of equipment in process plant where RAM, FMECA or RCM analysis does not provide sufficient information for decisive decision-making, as the equipment’s failure modes are not high risk but rather medium risk. Where the criticality ratings are not significant (i.e. show no evidence of high criticality), as in this case of the control valve, maintenance optimisation becomes difficult, necessitating a review of the risk analysis and decision criteria according to qualitative estimates.
d) Risk Analysis and Decision Criteria
In typical process plant shutdown programs, decisions concerning the extent and timing of component renewal/replacement activities are generally determined by the dominant failure modes that, in effect, relate to less than a third of the program’s total preventive maintenance activities. Criticality ranking or prioritising of equipment according to the consequences of failure modes is essential for a risk-based maintenance approach, though comparative studies have shown that qualitative risk ranking is, in many cases, just as effective in identifying the key shutdown drivers, often at a fraction of the cost. Typically, these risks can be ranked by designating
Table 5.18 FMECA for cost criticality

Component | Failure description | Failure mode | Failure causes | Defect. MATL & LAB ($)/failure (incl. damage) | Econ. $/failure (prod. loss) | Total $/failure (prod and repair) | Risk | Cost criticality rating
Control valve | Fails to open | TLF | Solenoid valve fails, failed cylinder actuator or air receiver failure | | | | |
Control valve | Fails to open | TLF | No PLC output due to modules electronic fault or cabling | | | | |
Control valve | Fails to seal/close | TLF | Valve disk damaged due to corrosion wear (same causes as ‘fails to open’) | | | | |
Control valve | Fails to seal/close | TLF | Valve stem cylinders seized due to chemical deposition or corrosion | | | | |
Instrument loop (press 1) | Fails to provide accurate pressure indication | TLF | Restricted sensing port due to blockage of chemical or physical accumulation | | | | |