e) Critical Risk Theory in Designing for Safety
In applying critical risk theory to a series process engineering system, the following modelling approach is taken:
Assume the system consists of k independent components, each with expected useful life lengths of z_1, z_2, z_3, ..., z_k, all of which must function for the system to be able to function, and where the useful life length of the system is Z.
Denoting the survival function of the useful life expectancy of Z by F, and of z_i by F_i (i = 1, 2, 3, ..., k), then

$F_i(z_i) = P_{00}(0, z_i)$

Then:

$F(Z) = \prod_{i=1}^{k} F_i(Z)$
The hazard rate represented by the intensity function can now be formulated as

$h(Z) = \sum_{i=1}^{k} h_i(Z)$
The probability of failure resulting from critical risk is expressed as (Eq 5.54):

$P_{0i}(0,Z) = \int_0^{\infty} h_i(z)\, F(z)\, dz$
Using the expression for the hazard rate h_i(z) of the useful life expectancy of Z_i, the survival function of the useful life expectancy of the series process engineering system is then expressed as

$F(Z) = \prod_{i=1}^{k} \exp\left[-\int_0^{Z} \frac{f(z\,|\,C=i)}{F(z)}\, dz\right]$
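As a numerical illustration of these series-system relationships, the following minimal Python sketch (the number of components and their constant hazard rates are assumed, illustrative values only) computes each component survival function, the system survival function as their product, and the system hazard rate as the sum of the component hazard rates.

```python
import numpy as np

# Assumed constant hazard rates (per year) for k = 3 components of a series
# process engineering system (illustrative values only).
component_hazard_rates = np.array([0.02, 0.05, 0.01])

def component_survival(z, rate):
    """Survival function F_i(z) = P_00(0, z) for a constant hazard rate."""
    return np.exp(-rate * z)

def system_survival(Z):
    """F(Z): product of the component survival functions (series system)."""
    return float(np.prod([component_survival(Z, r) for r in component_hazard_rates]))

def system_hazard_rate():
    """h(Z): sum of the component hazard rates (constant in this illustration)."""
    return float(component_hazard_rates.sum())

if __name__ == "__main__":
    for Z in (1.0, 5.0, 10.0):  # useful life (years)
        print(f"Z = {Z:5.1f}   F(Z) = {system_survival(Z):.4f}   h(Z) = {system_hazard_rate():.3f}")
```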
f) The Concept of Delayed Fatalities
In assessing the safety of a complex process, critical risk may be considered as resulting in fatalities due to an accident. These fatalities can be classified as immediate or as delayed. It is the delayed fatalities that are of primary interest in high-risk engineered installations such as nuclear reactors (NUREG 75/014 1975; NUREG/CR-0400 1978). Critical risk analysis applies equally well to delayed fatalities as to immediate fatalities. To model the impact of delayed fatalities in the assessment of safety in engineering design, consider the effect of a new constant risk, with intensity h(y), which is delayed for time d. The model parameters include the following expressions (Thompson 1988):
The intensity function for the new risk is:

$h_{new}(y) = 0, \quad y \le d$
$h_{new}(y) = \lambda, \quad y > d$
The probability that the new risk is the critical risk (resulting in fatality) is (from Eq 5.51)

$\pi_i = P_{0i}(0,\infty)$

$P_{0i}(0,\infty) = \int_0^{\infty} h_i(y)\, F(y)\, dy$

For the new delayed risk, this becomes

$P_d(0,\infty) = \int_0^{\infty} h_{new}(y)\, e^{-\int_0^{y} h_{new}(u)\, du}\, F(y)\, dy = \lambda \int_d^{\infty} F(y)\, dy + o(\lambda)$
The expected useful life with the new risk delayed is expressed as (from Eq 5.49)

$\mu_{new} = \int_0^{d} F(y)\, dy + \int_d^{\infty} e^{-\lambda(y-d)}\, F(y)\, dy = \mu - \int_d^{\infty} \left(1 - e^{-\lambda(y-d)}\right) F(y)\, dy$

$\mu_{new} = \mu - \lambda \int_d^{\infty} (y - d)\, F(y)\, dy + o(\lambda)$
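A minimal numerical sketch of these delayed-risk expressions, assuming (purely for illustration) an exponential baseline survival function F(y) = exp(−y/μ) and assumed values of μ, λ and d; it evaluates the first-order estimate of the probability that the delayed risk is the critical risk, and the expected useful life with the new risk delayed.

```python
import numpy as np
from scipy.integrate import quad

mu = 20.0    # assumed baseline expected useful life (years), illustration only
lam = 1e-3   # intensity of the new constant risk once the delay has elapsed
d = 5.0      # delay (years) before the new risk becomes active

def F(y):
    """Assumed baseline survival function (exponential, for illustration)."""
    return np.exp(-y / mu)

# First-order estimate of the probability that the delayed risk is critical:
# P_d(0, inf) ~ lambda * integral_d^inf F(y) dy
P_d = lam * quad(F, d, np.inf)[0]

# Expected useful life with the new risk delayed:
# mu_new = integral_0^d F(y) dy + integral_d^inf exp(-lambda*(y - d)) * F(y) dy
mu_new = quad(F, 0.0, d)[0] + quad(lambda y: np.exp(-lam * (y - d)) * F(y), d, np.inf)[0]

print(f"P_d    ~ {P_d:.5f}")
print(f"mu_new = {mu_new:.3f} years (baseline mu = {mu:.1f})")
```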
The US Nuclear Regulatory Commission’s Reactor Safety Study (NUREG 75/014 1975) also presents nuclear risk in comparison with the critical risk of other types of accidents. For example, the annual chance of fatality from a vehicle accident in the USA is given as 1 in 4,000, whereas for nuclear reactor accidents the value is 1 in 5 billion.
5.2.3.2 Fault-Tree Analysis for Safety Systems Design
For potentially hazardous process engineering systems, it is required statutory practice to conduct a quantitative assessment of the safety features at the engineering design stage. The design is assessed by predicting the probability that the safety systems might fail to perform their intended task of either preventing or reducing the consequences of hazardous events. This type of assessment is best carried out in the preliminary design phase, when the system has sufficient detail for a meaningful analysis and when it can still be easily modified. Several methods have been developed for predicting the likelihood that systems will fail, and for making assessments on avoiding such failure or mitigating its consequences. Such methods include Markov analysis, fault-tree analysis, root cause and common cause analysis, cause-consequence analysis, and simulation. Fault-tree analysis (FTA) is the most frequently used in the assessment of safety protection systems for systems design.
a) Assessment of Safety Protection Systems
The criterion used to determine the adequacy of the safety system is usually a comparison with specific target values related to the system's probability to function on demand. The initial preliminary design specification is to predict its likelihood of failure to perform according to the design intent. The predicted performance is then compared to that which is considered acceptable. If system performance is not acceptable, then deficiencies in the design are removed through redesign, and the assessment repeated. With all the various options for establishing the design criteria of system configuration, level of redundancy and/or diversity, reliability, availability and maintainability, there is little chance that this approach will ensure that the design reaches its final detail phase with all options adequately assessed. For safety systems with a consequence of failure seen as catastrophic, it is important to optimise performance with consideration of all the required design criteria, and not just adequate performance at the best cost. The target values should be used as a minimum acceptance level, and the design should be optimised with respect to performance within the constraints of the design criteria. These analysis methods are well developed and can be incorporated into a computerised automatic design assessment cycle that can be terminated when optimal system performance is achieved within the set constraints.
Safety systems are designed to operate only when certain conditions occur, and function to prevent these conditions from developing into hazardous events with catastrophic consequences. As such, there are specific features common to all safety protection systems; for example, all safety systems have sensing devices that repeatedly monitor the process for the occurrence of an initiating event. These sensors usually measure some or other process variable, and transmit the state of the variable to a controller, such as a programmable logic controller (PLC) or distributed control system (DCS). The controller determines whether the state of the process variable is acceptable by comparing the input signal to a set point. When the variable exceeds the alarm limit of the set point, the necessary protective action is activated. This protective action may either prevent a hazardous event from occurring, or reduce its consequence.
There are several design options with respect to the structure and operation of a safety system where, from a design assessment point of view, the level of redundancy and level of diversity are perhaps the more important. The safety system must be designed to have a high likelihood of operability on demand. Thus, single component failures should not be able to prevent the system from functioning. One means of achieving this is by incorporating redundancy or diversity into the system's configuration. Redundancy duplicates items of equipment (assemblies, sub-assemblies and/or components) in a system, while diversity includes totally different equipment to achieve the same function. However, increased levels of redundancy and diversity can also increase the number of system failures. To counteract this problem, partial redundancy is opted for, e.g. k out of n sensors indicate a failed condition. It is specifically as a result of the assessment of safety in engineering design during the preliminary design phase that decisions are made where to incorporate redundancy or diversity, and whether full or partial redundancy is appropriate.
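To illustrate the k-out-of-n idea referred to above, the short Python sketch below (the per-sensor probability of failure on demand is an assumed, illustrative figure, and the sensors are taken to be identical and independent) compares the probability that the sensing function fails on demand for a few voting arrangements.

```python
from math import comb

def koon_failure_probability(k, n, q):
    """Probability that a k-out-of-n arrangement of identical, independent items
    fails on demand, where q is the per-item failure probability: the arrangement
    fails when fewer than k items work, i.e. more than n - k items have failed."""
    return sum(comb(n, j) * q**j * (1 - q)**(n - j) for j in range(n - k + 1, n + 1))

q = 0.01  # assumed per-sensor probability of failure on demand (illustrative)
for k, n in [(1, 1), (1, 2), (2, 3), (2, 4)]:
    print(f"{k}oo{n}: probability of failure on demand = {koon_failure_probability(k, n, q):.2e}")
```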
b) Design Optimisation in Designing for Safety
The objective of design optimisation in designing for safety is to minimise system unreliability (i.e. probability of component failure) and system unavailability (i.e. probability of system failure on demand), by manipulating the design variables such that design criteria constraints are not violated. However, the nature of the design variables, as well as the design criteria constraints, engenders a complexity problem in design optimisation.
Commonly with mathematical optimisation, an objective function defines how the characteristics that are to be optimised relate to the variables. In the case where an objective function cannot be explicitly defined, some form of the function must be assumed, and the region defined over which the approximate function can be considered acceptable. Design criteria constraints fall into two categories: those that can be determined from an objective function relating to the design variables, which can be assessed mathematically, and those that cannot be easily expressed as a function and can be assessed only through analysis. In the former case, a computational method is used to solve the design optimisation problem of a safety system. The method is in the form of an iterative scheme that produces a sequence of system designs, gradually improving the safety system performance. When the design can no longer be improved due to restrictions of the design criteria constraints, the optimisation procedure terminates (Andrews 1994).
Assessment of the preliminary design of a safety system might require improvements to system performance. This could imply developing a means of expressing system performance as a function of the design variables
$Q_{system} = f(V_1, V_2, V_3, \ldots, V_n)$ (5.60)

where V_1, V_2, V_3, ..., V_n are the design variables, typically including:
• the number of high-pressure valves,
• the number of pressure transmitters,
• the level of redundancy of valves,
• the number of transmitters to trip.
It is computationally difficult to develop a function Q that can consider all design options. However, with the use of a Taylor series expansion, the following expression is obtained
$f(x + \Delta x) = f(x) + g^{T}\Delta x + \tfrac{1}{2}\,\Delta x^{T} G\, \Delta x$ (5.61)

where:
Δx = the change in the design vector
g = the gradient vector
G = the Hessian matrix
The gradient g(x) is the first-order partial derivatives of f(x):

$g(x) = \left[\dfrac{\partial f(x)}{\partial x_1},\; \dfrac{\partial f(x)}{\partial x_2},\; \ldots,\; \dfrac{\partial f(x)}{\partial x_n}\right]$ (5.62)
The Hessian matrix G(x) is a square symmetric matrix of second derivatives, given as

$G(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1 \partial x_1} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_n} \end{bmatrix}$ (5.63)
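As a concrete illustration of evaluating the local model of Eq 5.61, the numpy sketch below uses assumed, illustrative values for the current objective value, the gradient vector g and the Hessian matrix G (they are not taken from any particular design) to predict the objective for a candidate change Δx in the design vector.

```python
import numpy as np

f_x = 3.2e-3                                   # assumed current value of the objective (e.g. Q_system)
g = np.array([-4.0e-4, 1.5e-4, -2.0e-5])       # assumed gradient vector
G = np.array([[2.0e-5, 0.0,    0.0],
              [0.0,    1.0e-5, 0.0],
              [0.0,    0.0,    5.0e-6]])       # assumed (symmetric) Hessian matrix

def taylor_estimate(delta_x):
    """f(x + dx) ~ f(x) + g^T dx + 0.5 * dx^T G dx  (Eq 5.61)."""
    dx = np.asarray(delta_x, dtype=float)
    return f_x + g @ dx + 0.5 * dx @ G @ dx

# Candidate change in the design vector, e.g. add one valve, remove one transmitter.
print(f"predicted objective: {taylor_estimate([1, -1, 0]):.4e}")
```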
Truncating (Eq 5.61) after the linear term in Δx means that the function f(x + Δx) can be evaluated provided that the gradient vector can be obtained, that is, ∂f/∂x for each design parameter. Since integer design variables are being dealt with, ∂f/∂x cannot be strictly formulated but, if consideration is taken of the fact that a smooth curve has been used to link all discrete points to give the marginal distribution of f as a function of x_i, then ∂f/∂x_i can be obtained. Partial derivatives can be used to determine how values of f are improved by updating each x_i by Δx_i. A fault tree can be developed to obtain f(x + Δx) for each x_i, provided x_i + Δx_i is integer; finite differences can then be used to estimate ∂f/∂x_i. This would require a large number of fault trees to be produced and analysed, which would usually result in this option not being pursued from a practical viewpoint.
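A sketch of the finite-difference idea just described: each integer design variable is perturbed by one unit and the performance measure re-evaluated, the difference giving an estimate of ∂f/∂x_i. In practice each re-evaluation would come from analysing a fault tree for the modified design; here a simple assumed analytic stand-in is used so that the example is self-contained.

```python
def system_unavailability(x):
    """Stand-in for the performance measure that would be obtained from
    fault-tree analysis of a candidate design x = [n_valves, n_transmitters]
    (assumed analytic form, for illustration only)."""
    n_valves, n_transmitters = x
    q_valve, q_tx = 1.0e-2, 5.0e-3   # assumed component failure probabilities
    return q_valve ** max(n_valves, 1) + q_tx ** max(n_transmitters, 1)

def finite_difference_gradient(f, x, step=1):
    """Estimate df/dx_i for integer design variables by a unit perturbation."""
    gradient = []
    for i in range(len(x)):
        x_perturbed = list(x)
        x_perturbed[i] += step
        gradient.append((f(x_perturbed) - f(x)) / step)
    return gradient

x0 = [1, 2]   # current design: 1 HIPS valve, 2 pressure transmitters (illustrative)
print(finite_difference_gradient(system_unavailability, x0))
```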
Since truncating the Taylor series of (Eq 5.61) at a finite number of terms provides only an approximation of f(x + Δx), the solution space over which this approximation is acceptable also needs to be defined. This is accomplished by setting up a solution space in the neighbourhood of the design's specific target variable. This procedure results in an iterative scheme, with the optimal solution being approached by sequential optimisation.
c) Assessment of Safety Systems with FTA
Where design criteria constraints can be assessed only through analysis, fault-tree analysis (FTA) is applied. In the assessment of the performance of a safety system, a fault tree is constructed and analysed for two basic system failure modes: failure to work on demand, and spurious system trips. Fault trees are analysed in the design optimisation problem to obtain numerical estimates of the partial derivatives of system performance with respect to each design variable. This information is required to produce the objective function coefficients. However, the requirement to draw fault trees for several potential system designs, representing the causes of the two system failure modes, would make the optimisation method impractical. Manual development of a new tree for each assessment would be too time-consuming. One approach in resolving this difficulty is to utilise computer automated fault-tree synthesis programs; at present, these have not been adequately developed to accomplish such a task. An alternative approach has been developed to construct a fault tree for systems design, using house events (Andrews et al 1986).
House events can be included in the structure of fault trees, and either occur with certainty (event set to TRUE) or do not occur with certainty (event set to FALSE). Their inclusion in a fault-tree model has the effect of turning branches of the tree on or off. Thus, a single fault tree can be constructed that, by defining the status of house events, could represent the causes of system failure on demand for any of several potential designs. An example of a sub-system of a fault tree that develops causes of dormant failure of a high-pressure protection system, alternately termed a high-integrity protection system (HIPS), is illustrated in Fig 5.22. In this example, the function of the HIPS sub-system is to prevent a high-pressure surge passing through the process, thereby protecting the process equipment from exceeding its individual pressure ratings. The HIPS utilises transmitters that determine when pipeline pressure exceeds the allowed limit. The transmitters relay a signal to a controller that activates HIPS valves to close down the pipeline. The design variables for optimisation of the HIPS sub-system include six house events (refer to Fig 5.22) that can be summarised in the following criteria:
• what type of valve should be fitted,
• whether high-pressure valve type 1 should be fitted, or not,
• whether high-pressure valve type 2 should be fitted, or not.
The house events in the fault tree represent the following conditions:
H1 – HIPS valve 1 fitted
NH1 – HIPS valve 1 not fitted
H2 – HIPS valve 2 fitted
NH2 – HIPS valve 2 not fitted
V1 – Valve type 1 selected
V2 – Valve type 2 selected
Fig 5.22 Fault tree of dormant failure of a high-integrity protection system (HIPS; Andrews 1994)

Considering first the bottom left-hand branch in Fig 5.22 that represents ‘HIPS valve 1 fails stuck’, this event will depend on which type of valve has been selected in the design. If type 1 has been fitted, then V1 is set to TRUE. If type 2 is fitted, then V2 is set to TRUE. This provides the correct causes of the event being developed as a function of which valve is fitted. One of either V1 or V2 must be set. Furthermore, if no HIPS option is included in the system design, then house events NH1 and NH2 will both be set (i.e. TRUE). Once these events are set, the output event from the OR gates into which they feed will also be true. At the next level up in the tree structure, both inputs to the AND gate will have occurred and, therefore, the HIPS system will not provide protection. Where HIPS valves are fitted, the appropriate house events NH1 or NH2 will be set to FALSE, requiring a component failure event to render the HIPS sub-system inactive.
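A minimal sketch of the house-event mechanism described above. The gate structure and event names are illustrative rather than a reproduction of Fig 5.22, but the logic shows how setting the house events H1/NH1, H2/NH2 and V1/V2 to TRUE or FALSE turns branches of a single fault tree on or off so that the one tree can represent different design options.

```python
def or_gate(*inputs):
    return any(inputs)

def and_gate(*inputs):
    return all(inputs)

def hips_dormant_failure(house, basic):
    """Top event: HIPS fails to protect on demand (illustrative structure).
    'house' holds the house-event settings; 'basic' the component failure events."""
    def valve_fails_stuck(valve_id):
        # The causes developed depend on which valve type the design selects.
        return or_gate(and_gate(house["V1"], basic[f"type1_stuck_{valve_id}"]),
                       and_gate(house["V2"], basic[f"type2_stuck_{valve_id}"]))

    valve_1_branch = or_gate(house["NH1"], and_gate(house["H1"], valve_fails_stuck(1)))
    valve_2_branch = or_gate(house["NH2"], and_gate(house["H2"], valve_fails_stuck(2)))
    return and_gate(valve_1_branch, valve_2_branch)

# Design option: both HIPS valves fitted, valve type 1 selected.
house = {"H1": True, "NH1": False, "H2": True, "NH2": False, "V1": True, "V2": False}
basic = {"type1_stuck_1": True, "type1_stuck_2": False,
         "type2_stuck_1": False, "type2_stuck_2": False}
print(hips_dormant_failure(house, basic))   # False: HIPS valve 2 still provides protection
```

Setting NH1 and NH2 both to TRUE (no HIPS fitted) makes the top event TRUE regardless of the component failure events, mirroring the behaviour described above.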
By using house events in this manner, all design options can be represented in a single fault tree. Another fault tree can be constructed using the same technique to represent causes of spurious system failure for each potential design. The fault trees are then analysed to obtain numerical estimates of the partial derivatives of system performance with respect to each design variable. This information is required to produce the objective function coefficients in the design optimisation problem. The objective function is then derived by truncating the Taylor series at the linear term of the gradient vector, g, and ignoring the quadratic term of the Hessian matrix. This truncation means that a finite number of terms provide an approximation of the objective function, with a valid representation of Q_system only within the neighbourhood of the target design variables. Additional constraints are therefore included to restrict the solution space in the neighbourhood of the design's specific target variables. The objective function is then evaluated in the restricted design space, and the optimal design selected.
5.2.3.3 Common Cause Failures in Root Cause Analysis
The concept of multiple failures arising from a common cause was first studied on a formal basis during the application of root cause analysis in the nuclear power industry. In order to obtain sufficiently high levels of reliability and safety in critical risk control circuits, redundancy was introduced. In applying redundancy, several items can be used in parallel, with only one required to be in working order. Although the approach increases system reliability, it leads to large increases in false alarms, measured in what is termed the false alarm rate (FAR). This is overcome, however, by utilising a concept termed voting redundancy; in its simplest arrangement, this is two out of three, where the circuit function is retained if two or three items are in working order. This not only improves reliability and safety but also reduces the FAR. Voting redundancy has the added advantage that a system can tolerate the failure of some items in a redundant set, allowing failed items to be taken out of service for repair or replacement (electronic control components such as sensors, circuit boards, etc. are usually replaced).
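The failure-on-demand side of such comparisons follows the same k-out-of-n calculation sketched earlier; the short sketch below looks instead at the false alarm side, comparing the probability of a spurious trip for a single channel, a one-out-of-two arrangement and a two-out-of-three voting arrangement, assuming identical, independent channels and an illustrative per-channel spurious operation probability.

```python
from math import comb

def voted_probability(k, n, p):
    """Probability that at least k of n identical, independent channels give
    a (spurious) output, when each does so with probability p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p_spurious = 0.05   # assumed per-channel spurious alarm probability per interval
for label, k, n in [("1oo1", 1, 1), ("1oo2", 1, 2), ("2oo3", 2, 3)]:
    print(f"{label}: spurious trip probability = {voted_probability(k, n, p_spurious):.4f}")
```

With these assumed figures the two-out-of-three arrangement trips spuriously less often than either the single channel or the one-out-of-two arrangement, which is the FAR benefit referred to above.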
a) Defining CMF and CCF
It has become evident from practical experience in the process industry that, in many cases, the levels of reliability and safety that are actually being obtained have fallen short of the predicted design values. This is due largely to common root causes leading to the concurrent failure of several items. The concept of common mode failures (CMF) was developed from studies into this problem. It was subsequently recognised that multiple failures could arise from common weaknesses, where a particular item (assembly and/or component) was used in various locations on a plant. Furthermore, the term common cause failure (CCF) was applied to the root causes of these failure modes, not only manifested at the parts and component level but also including the effects of the working environment on the item, e.g. the effects from the assembly, sub-system and system levels, as well as the process and environmental conditions. Consequently, the term dependent failure was used to include both CMF and CCF, although CMF is, in effect, a subset of CCF. Many terms relating to the integrity of systems and components, especially reliability and maintainability, were formally defined and included in a range of military standards. However, it took some time before CMF and CCF were formally defined in the nuclear energy industry.
The UK Atomic Energy Authority (AEA) has defined CMF as follows (Edwards
et al 1979):
“A common-mode failure (CMF) is the result of an event which, because of dependencies,
causes a coincidence of failure states of components in two or more separate channels of
a redundancy system, leading to the defined system failing to perform its intended function”.
The UK Atomic Energy Authority has also defined CCF as follows (Watson 1981):
“A common-cause failure is the inability of multiple first in line items to perform as required
in a defined critical time period, due to a single underlying defect or physical phenomena, such that the end effect is judged to be a loss of one or more systems”.
CCF can arise from both engineering and operational causes:
• Engineering causes can be related to the engineering design as well as the manufacturing, installation and construction stages. Of these, engineering design covers the execution of the design requirement and functional deficiencies, while the manufacturing, installation and construction stages cover the activities of fabrication and inspection, packaging, handling and transportation, and installation and/or construction. Plant commissioning is often also included in the engineering causes.
• Operational causes can be separated into procedural causes and environmental effects. The procedural causes cover all aspects of maintenance and operation of the equipment, while environmental causes are quite diverse in that they include not only conditions within the process (influenced partly by the process parameters and the materials handled in the process) but also external environmental conditions such as climatic conditions, and extreme events such as fires, floods, earthquakes, etc.
Typical examples of actual causes of CCF are (Andrews et al 1993):
• Identical manufacturing defects in similar components.
• Maintenance errors made by the same maintenance crews.
• Operational errors made by the same operating crews.
• Components in the same location subject to the same stresses.
Since the earliest applications of CCF, two methods have been extensively used to allow for such events. These are the cut-off probability method and the beta factor method.
Table 5.12 Upper levels of systems unreliability due to CCF

Systems configuration          Minimum failure probability
Single instrument              10^-2
Redundant system               10^-3
Partially diverse system       10^-4
Fully diverse system           10^-5
Two diverse systems            10^-6
The cut-off probability method proposes limiting values of failure probability to account for the effect of CCF.
The basis of this is the assumption that, because of CCF, system reliability can never exceed an upper limit determined by the configuration of the system. These upper levels of systems unreliability were generically given as shown in Table 5.12 (Bourne et al 1981).
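A trivial sketch of how the cut-off probability method can be applied in practice: a predicted system failure probability is never claimed to be better (lower) than the limiting value for the chosen configuration, the limits being those generically given in Table 5.12.

```python
# Limiting (minimum) failure probabilities per configuration, from Table 5.12.
CCF_CUTOFF = {
    "single instrument":        1e-2,
    "redundant system":         1e-3,
    "partially diverse system": 1e-4,
    "fully diverse system":     1e-5,
    "two diverse systems":      1e-6,
}

def apply_ccf_cutoff(predicted_failure_probability, configuration):
    """Floor the predicted failure probability at the CCF cut-off value."""
    return max(predicted_failure_probability, CCF_CUTOFF[configuration])

# An independent-failure model might predict 2e-6 for a redundant system,
# but the cut-off method limits the claim to 1e-3.
print(apply_ccf_cutoff(2e-6, "redundant system"))
```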
The beta factor method assumes that a proportion, β, of the total failure rate of a component arises from CCF. It follows, therefore, that the proportion (1 − β) arises from independent failures. This can be expressed as

$\lambda_t = \lambda_i + \lambda_{ccf}$

where:
λ_t = the total failure rate
λ_i = the independent failure rate
λ_ccf = the common cause failure rate

From this equation it follows that

$\lambda_{ccf} = \beta \cdot \lambda_t$

and:

$\lambda_i = (1 - \beta) \cdot \lambda_t$
The results from the beta factor method must, however, be considered with some pessimism, because they need to be modified for higher levels of redundancy than is needed for the simple one-out-of-two case. Although in theory CCF can occur, it does not follow that it will. The probability of failure of all three items of a two-out-of-three redundancy system due to CCF is likely to be lower than the probability of two failing (Andrews et al 1993).
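A minimal sketch of the beta factor split, using assumed, illustrative values of the total failure rate and β: the total rate is divided into independent and common cause parts, and the effect on a simple one-out-of-two redundant pair is indicated by treating the CCF contribution as failing both channels together.

```python
lambda_t = 1.0e-5   # assumed total failure rate per hour (illustrative)
beta = 0.1          # assumed fraction of the total failure rate arising from CCF

lambda_ccf = beta * lambda_t           # common cause failure rate
lambda_i = (1.0 - beta) * lambda_t     # independent failure rate

t = 8760.0          # exposure time: one year of operation, in hours

# Approximate failure probabilities over time t (valid while rate * t is small).
q_independent = lambda_i * t
q_ccf = lambda_ccf * t

# One-out-of-two redundant pair: both channels fail independently, or a single
# common cause event takes out both channels together.
q_1oo2_with_ccf = q_independent**2 + q_ccf
q_1oo2_no_ccf = (lambda_t * t)**2

print(f"1oo2 failure probability with beta factor CCF: {q_1oo2_with_ccf:.2e}")
print(f"1oo2 failure probability ignoring CCF:         {q_1oo2_no_ccf:.2e}")
```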
The cut-off method is thus extensively used where there are no relevant field data, or where any available database is inadequate, and serves as a suitable guide in the preliminary design phase for determining the limiting values of failure probability to account for the effect of CCF. It is also quite usual in such circumstances to use the beta factor method, but this requires engineering judgment on the appropriate values for beta; in itself, this is probably no more accurate than using the cut-off method. A combination of both methods in the assessment of reliability and safety due to CCF in engineering design is best suited for application by expert judgment.