A. Birolini, Reliability Engineering, DOI: 10.1007/978-3-642-39535-2_6, © Springer-Verlag Berlin Heidelberg 2014
Investigation of the time behavior of repairable systems spans a very large class of stochastic processes, from simple Poisson processes through Markov and semi-Markov processes up to sophisticated regenerative processes with only one or just a few regeneration states. Nonregenerative processes are rarely considered because of mathematical difficulties. Important for the choice of the class of processes to be used are the distribution functions for the failure-free and repair times involved. If failure and repair rates of all elements in the system are constant (time independent) during the stay time in every state (not necessarily at a state change, e.g. because of load sharing), the process involved is a time-homogeneous Markov process with a finite number of states, for which the stay time in every state is exponentially distributed.
The same holds if Erlang distributions occur (supplementary states, Section 6.3.3).
The possibility to transform a given stochastic process into a Markov process by introducing supplementary variables is not considered here. Generalization of the distribution functions for repair times leads to semi-regenerative processes, i.e., to processes with an embedded semi-Markov process. This holds, in particular, if the system has only one repair crew, since each termination of a repair is a regeneration point (because of the constant failure rates). Arbitrary distributions of repair and failure-free times lead in general to nonregenerative stochastic processes.
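The supplementary-states idea for Erlang distributions can be sketched numerically. The example below is only an illustration and is not taken from the text: it assumes a one-item structure with constant failure rate `lam` and an Erlang-2 repair time with mean 1/`mu`, split into two exponential stages of rate 2·`mu` each. The helper names `solve` and `steady_state` and the rate values are hypothetical. The resulting 3-state process is a time-homogeneous Markov process, and its steady-state probabilities follow from the balance equations.

```python
from fractions import Fraction

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination, using exact Fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # pick a row with a nonzero pivot and move it into place
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

def steady_state(rho):
    """Steady-state probabilities from rho_j P_j = sum_{i != j} P_i rho_ij,
    with the last balance equation replaced by sum_j P_j = 1."""
    n = len(rho)
    a = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                a[j][i] += rho[i][j]   # inflow into Z_j from Z_i
                a[j][j] -= rho[j][i]   # outflow from Z_j (accumulates -rho_j)
    a[n - 1] = [Fraction(1)] * n       # normalization replaces one equation
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    return solve(a, b)

# Illustrative (assumed) rates in 1/h: MTTF = 1000 h, MTTR = 10 h
lam, mu = Fraction(1, 1000), Fraction(1, 10)
# Z0: item up; Z1, Z2: first and second repair stage (supplementary states)
rho = [[0,      lam, 0],
       [0,      0,   2 * mu],
       [2 * mu, 0,   0]]
P = steady_state(rho)
PA_S = P[0]                            # only Z0 is an up state
print("PA_S =", PA_S, "=", float(PA_S))
```

In this sketch the steady-state availability equals `mu / (lam + mu)`, i.e. MTTF/(MTTF + MTTR). It depends on the repair time only through its mean, while the two repair stages share the remaining probability equally.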
Table 6.1 shows the processes used in reliability investigations of repairable systems, with their possibilities and limits. Appendix A7 introduces these processes with particular emphasis on reliability applications. All equations necessary for the reliability and availability calculation of systems described by time-homogeneous Markov processes and semi-Markov processes are summarized in Table 6.2.
Besides the assumption about the involved distribution functions for failure-free and repair times, reliability and availability calculation is largely influenced by the maintenance strategy, logistic support, type of redundancy, and dependence between elements. Existence of a reliability block diagram is assumed in Sections 6.2-6.7, not necessarily in Sections 6.8-6.10. Results are expressed as functions of time by solving appropriate systems of differential (or integral) equations, or given by the mean time to failure or the steady-state point availability at system level ($MTTF_{Si}$ or $PA_S$) by solving appropriate systems of algebraic equations. If the system has no redundancy, the reliability function is the same as in the nonrepairable case. In the presence of redundancy, it is generally assumed that redundant elements will be repaired on line, i.e. without operational interruption at system level. Reliability investigations thus aim to find the occurrence of the first system down, whereas the point availability is the probability to find the system in an up state at a time t, independently of whether down states at system level have occurred before t.
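The difference between reliability and point availability can be illustrated on the simplest case. The sketch below assumes a one-item structure (no redundancy) that is new at t = 0, with constant failure rate `lam` and constant repair rate `mu`. For this case the standard closed forms are R(t) = exp(-lam·t) and PA(t) = mu/(lam+mu) + lam/(lam+mu)·exp(-(lam+mu)·t). The function names and numerical values are illustrative assumptions, not taken from the text.

```python
import math

def reliability(t, lam):
    """Probability of no failure in (0, t]; repair plays no role here."""
    return math.exp(-lam * t)

def point_availability(t, lam, mu):
    """Probability of being up at time t, regardless of earlier failures."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

lam, mu = 1e-3, 1e-1   # assumed rates in 1/h (MTTF = 1000 h, MTTR = 10 h)
for t in (0.0, 10.0, 100.0, 1000.0, 10000.0):
    print(f"t = {t:7.0f} h   R = {reliability(t, lam):.4f}   "
          f"PA = {point_availability(t, lam, mu):.4f}")
```

Under these assumptions R(t) decays to zero, while PA(t) settles at mu/(lam+mu), the steady-state value.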
In order to unify models and simplify calculations, the following assumptions are made for analyses in Sections 6.2-6.7 (partly also in Sections 6.8-6.10).
1. Continuous operation: Each element of the system is in operating or reserve state, when not under repair or waiting for repair. (6.1)
2. No further failures at system down (no FF): At system down the system is repaired (restored) according to a given maintenance strategy to an up state at system level from which operation is continued; failures during a repair at system down are not considered. (6.2)
3. Only one repair crew: At system level only one repair crew is available; repair is performed according to a stated strategy (e.g. first-in/first-out). (6.3)
Table 6.1 Basic stochastic processes used in reliability & availability analysis of repairable systems

| Stochastic process | Can be used in modeling | Background | Difficulty |
|---|---|---|---|
| Renewal process | One-item structures with arbitrary failure rates, negligible repair times, new after repair | Renewal theory | Medium |
| Alternating renewal process (SMP with 2 states) | One-item repairable structures with arbitrary failure and repair rates, new after repair | Renewal theory | Medium |
| Markov process (MP) (finite state space, time-homogeneous, regenerative at every time point t) | Systems of arbitrary structure whose elements have constant failure and repair rates during the stay time in every state (not necessarily at a state change, e.g. because of load sharing) * | Differential equations or integral equations | Low |
| Semi-Markov process with > 2 states (SMP) (regenerative at state change) | Some (few) systems with only one repair crew, whose elements have constant failure and arbitrary repair rates * | Integral equations | Medium |
| Semi-regenerative process (process with an embedded SMP with ≥ 2 states) | Systems of arbitrary structure with only one repair crew, whose elements have constant failure and arbitrary repair rates * | Integral equations | High |
| Regenerative process with just one regeneration state | Systems of arbitrary structure whose elements have constant failure and arbitrary repair rates * (in some cases const. failure rate only in a reserve state) | Integral equations | High to very high |
| Nonregenerative process | Systems whose elements have arbitrary failure and repair rates | Partial differential equations | High to very high |

Notes: repaired elements are new after repair (yielding a system new, with respect to a specific state, for constant failure rates of all elements and only one repair crew); * constant failure / repair rates can be extended to Erlang distributions (Fig. 6.6)
4. Redundancy: Failure detection & switch are ideal, and redundant elements are repaired on line, i.e. without interruption of operation at system level. (6.4)
5. States: Each element in the reliability block diagram has only two states (good or failed), and is as-good-as-new after each repair (restoration). (6.5)
6. Independence: Failure-free (failure-free operating) and repair (restoration) times of each element are stochastically independent, > 0, and continuous random variables with finite mean (MTTF, MTTR) and variance. (6.6)
7. Support: Preventive maintenance is neglected and logistic support is ideal (repair time = restoration time = down time). (6.7)
The above assumptions hold for Sections 6.2-6.7, and apply in many practical applications. However, assumption (6.5) must be critically verified for the aspect as-good-as-new, when repaired elements contain parts with time-dependent failure rates which have not been replaced by new ones at repair; with (6.3), it applies at system level only if at each repair all non-replaced parts have constant failure rates.
At system level, reliability figures have indices Si (e.g. $MTTF_{Si}$), where S stands for system and i is the state entered at t = 0, see Table 6.2 (system refers in this book, as often in practical applications, to the highest integration level of the item considered; t = 0 is the beginning of observations, x = 0 for interarrival times). Assuming irreducible embedded Markov chains, asymptotic & steady-state are used for stationary.
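Table 6.2 Relationships for the reliability, point availability & interval reliability of systems described by time-homogeneous Markov processes or semi-Markov processes (Appendices A7.5 & A7.6)

Semi-Markov processes (SMP)

Reliability:
\[ R_{Si}(t) = 1 - Q_i(t) + \sum_{Z_j \in U,\ j \ne i} \int_0^t q_{ij}(x)\, R_{Sj}(t-x)\, dx, \quad Z_i \in U,\ t > 0,\ R_{Si}(0) = 1 \]
\[ MTTF_{Si} = T_i + \sum_{Z_j \in U,\ j \ne i} \mathcal{P}_{ij}\, MTTF_{Sj}, \quad Z_i \in U \]
with
\[ Q_{ij}(x) = \Pr\{\tau_{ij} \le x \,\cap\, \tau_{ik} > \tau_{ij},\ k \ne j\} = \mathcal{P}_{ij}\, F_{ij}(x), \quad q_{ij}(x) = \frac{dQ_{ij}(x)}{dx}, \]
\[ \mathcal{P}_{ij} = \Pr\{\tau_{ik} > \tau_{ij},\ k \ne j\} = Q_{ij}(\infty), \quad F_{ij}(x) = \Pr\{\tau_{ij} \le x \mid \tau_{ik} > \tau_{ij},\ k \ne j\}, \]
\[ \mathcal{P}_{ii} \equiv 0, \quad i, j \in \{0, \ldots, m\}, \quad x > 0, \quad Q_{ij}(x) = F_{ij}(x) = 0 \ \text{for}\ x \le 0 \]

Point availability:
\[ PA_{Si}(t) = \sum_{Z_j \in U} P_{ij}(t), \quad i = 0, \ldots, m,\ t > 0,\ PA_{Si}(0) = 1 \ \text{for}\ Z_i \in U \]
\[ PA_S = \lim_{t \to \infty} PA_{Si}(t) = \sum_{Z_j \in U} P_j \quad (\text{see}\ IR_S(\theta)\ \text{for}\ P_j) \]
with
\[ P_{ij}(t) = \delta_{ij}\,(1 - Q_i(t)) + \sum_{k=0,\ k \ne i}^{m} \int_0^t q_{ik}(x)\, P_{kj}(t-x)\, dx, \quad Q_i(x) = \sum_{j=0,\ j \ne i}^{m} Q_{ij}(x), \]
\[ i, j \in \{0, \ldots, m\}, \quad t > 0, \quad P_{ij}(0) = \delta_{ij}, \quad \delta_{ii} = 1, \quad \delta_{ij} = 0 \ \text{for}\ j \ne i, \quad T_i = \int_0^\infty (1 - Q_i(x))\, dx \]

Interval reliability:
Problem-oriented calculation; for constant failure rates, the following approximation can often be used in steady-state:
\[ IR_S(\theta) \approx \sum_{Z_j \in U} P_j\, R_{Sj}(\theta), \quad \theta > 0 \]
with
\[ P_j = \lim_{t \to \infty} P_{ij}(t) = \lim_{t \to \infty} P_j(t) = \frac{T_j}{T_{jj}} = \frac{\mathcal{P}_j T_j}{\sum_{k=0}^{m} \mathcal{P}_k T_k}, \quad T_j = \int_0^\infty (1 - Q_j(x))\, dx, \]
and \(\mathcal{P}_j\) from
\[ \mathcal{P}_j = \sum_{i=0}^{m} \mathcal{P}_i\, \mathcal{P}_{ij}, \quad i, j \in \{0, \ldots, m\}, \quad \mathcal{P}_{ii} \equiv 0, \quad \mathcal{P}_j > 0, \quad \sum_j \mathcal{P}_j = 1 \]
(one equation for \(\mathcal{P}_j\), arbitrarily chosen, must be replaced by \(\sum_j \mathcal{P}_j = 1\)); \(\mathcal{P}_j\) = steady-state probability of the irreducible embedded Markov chain.

Time-homogeneous Markov processes (method of integral equations)

Reliability:
\[ R_{Si}(t) = e^{-\rho_i t} + \sum_{Z_j \in U,\ j \ne i} \int_0^t \rho_{ij}\, e^{-\rho_i x}\, R_{Sj}(t-x)\, dx, \quad Z_i \in U,\ t > 0,\ R_{Si}(0) = 1 \]
\[ MTTF_{Si} = \frac{1}{\rho_i} + \sum_{Z_j \in U,\ j \ne i} \frac{\rho_{ij}}{\rho_i}\, MTTF_{Sj}, \quad Z_i \in U \]
with \(\rho_{ij}\) = transition rate (see definitions below) and \(\rho_i = \sum_{j=0,\ j \ne i}^{m} \rho_{ij}\).

Point availability:
\[ PA_{Si}(t) = \sum_{Z_j \in U} P_{ij}(t), \quad i = 0, \ldots, m,\ t > 0,\ PA_{Si}(0) = 1 \ \text{for}\ Z_i \in U \]
\[ PA_S = \lim_{t \to \infty} PA_{Si}(t) = \sum_{Z_j \in U} P_j \quad (\text{see}\ IR_S(\theta)\ \text{for}\ P_j) \]
with
\[ P_{ij}(t) = \delta_{ij}\, e^{-\rho_i t} + \sum_{k=0,\ k \ne i}^{m} \int_0^t \rho_{ik}\, e^{-\rho_i x}\, P_{kj}(t-x)\, dx, \]
\[ i, j \in \{0, \ldots, m\}, \quad t > 0, \quad P_{ij}(0) = \delta_{ij}, \quad \delta_{ii} = 1, \quad \delta_{ij} = 0 \ \text{for}\ j \ne i \]

Interval reliability:
\[ IR_{Si}(t, t+\theta) = \sum_{Z_j \in U} P_{ij}(t)\, R_{Sj}(\theta), \quad i = 0, \ldots, m,\ t, \theta > 0 \]
\[ IR_S(\theta) = \lim_{t \to \infty} IR_{Si}(t, t+\theta) = \sum_{Z_j \in U} P_j\, R_{Sj}(\theta), \quad \theta > 0 \quad (\text{see below for}\ P_j) \]

Time-homogeneous Markov processes (method of differential equations)

Reliability:
\[ R_{Si}(t) = \sum_{Z_j \in U} P'_{ij}(t), \quad Z_i \in U,\ t > 0,\ R_{Si}(0) = 1 \]
\[ MTTF_{Si} = \frac{1}{\rho_i} + \sum_{Z_j \in U,\ j \ne i} \frac{\rho_{ij}}{\rho_i}\, MTTF_{Sj}, \quad Z_i \in U \]
with \(P'_{ij}(t) \equiv P'_j(t)\) obtained from
\[ \dot{P}'_j(t) = -\rho'_j\, P'_j(t) + \sum_{i=0,\ i \ne j}^{m} P'_i(t)\, \rho'_{ij}, \quad j = 0, \ldots, m,\ t > 0,\ Z_i \in U, \]
\[ \rho'_{ij} = \rho_{ij} \ \text{for}\ Z_i \in U, \quad \rho'_{ij} = 0 \ \text{for}\ Z_i \in \bar{U}, \quad \rho'_j = \sum_{i=0,\ i \ne j}^{m} \rho'_{ji}, \quad P'_i(0) = 1, \quad P'_j(0) = 0 \ \text{for}\ j \ne i \]

Point availability:
\[ PA_{Si}(t) = \sum_{Z_j \in U} P_{ij}(t), \quad i = 0, \ldots, m,\ t > 0,\ PA_{Si}(0) = 1 \ \text{for}\ Z_i \in U \]
\[ PA_S = \lim_{t \to \infty} PA_{Si}(t) = \sum_{Z_j \in U} P_j \quad (\text{see}\ IR_S(\theta)\ \text{for}\ P_j) \]
with \(P_{ij}(t) \equiv P_j(t)\) obtained from
\[ \dot{P}_j(t) = -\rho_j\, P_j(t) + \sum_{i=0,\ i \ne j}^{m} P_i(t)\, \rho_{ij}, \quad j = 0, \ldots, m,\ t > 0, \]
\[ \rho_j = \sum_{i=0,\ i \ne j}^{m} \rho_{ji}, \quad P_i(0) = 1, \quad P_j(0) = 0 \ \text{for}\ j \ne i, \quad i = 0, \ldots, m \]

Interval reliability:
\[ IR_{Si}(t, t+\theta) = \sum_{Z_j \in U} P_{ij}(t)\, R_{Sj}(\theta), \quad i = 0, \ldots, m,\ t, \theta > 0 \]
\[ IR_S(\theta) = \lim_{t \to \infty} IR_{Si}(t, t+\theta) = \sum_{Z_j \in U} P_j\, R_{Sj}(\theta), \quad \theta > 0 \]
with \(P_j = \lim_{t \to \infty} P_{ij}(t) = \lim_{t \to \infty} P_j(t)\) and \(P_j\) from
\[ \rho_j\, P_j = \sum_{i=0,\ i \ne j}^{m} P_i\, \rho_{ij}, \quad i, j \in \{0, \ldots, m\}, \quad P_j > 0 \ \text{(irreducible embedded Markov chain)}, \quad P_0 + \ldots + P_m = 1 \]
(one equation for \(P_j\), arbitrarily chosen, must be dropped and replaced by \(P_0 + \ldots + P_m = 1\)).

Definitions and notes:
\(R_{Si}(t) = \Pr\{\text{system up in } (0, t] \mid Z_i \text{ is entered at } t = 0\}\) *, \(Z_i \in U\); S stands for system; \(U\) = set of the up states, \(\bar{U}\) = set of the down states, \(U \cup \bar{U} = \{Z_0, \ldots, Z_m\}\).
\(MTTF_{Si} = \mathrm{E}[\text{system failure-free time} \mid Z_i \text{ is entered at } t = 0]\) * \(= \int_0^\infty R_{Si}(t)\, dt = \tilde{R}_{Si}(0)\), \(Z_i \in U\); \(\tilde{R}_{Si}(s) = \int_0^\infty R_{Si}(t)\, e^{-st}\, dt\) = Laplace transform of \(R_{Si}(t)\).
\(PA_{Si}(t) = \Pr\{\text{system up at } t \mid Z_i \text{ is entered at } t = 0\}\) *, \(i = 0, \ldots, m\); \(PA_S = AA_S = \Pr\{\text{system up at } t \text{ in steady-state or for } t \to \infty\}\); \(AA_S\) = average availability in steady-state or for \(t \to \infty\).
\(IR_{Si}(t, t+\theta) = \Pr\{\text{system up in } [t, t+\theta] \mid Z_i \text{ is entered at } t = 0\}\) *, \(Z_i \in U\) (in general); \(IR_S(\theta) = \Pr\{\text{system up in } [t, t+\theta] \text{ in steady-state or for } t \to \infty\}\).
\(T_i\) = mean stay (sojourn) time in \(Z_i\) (\(= 1/\rho_i\) for Markov processes, \(= \int_0^\infty (1 - Q_i(x))\, dx\) for SMP); \(T_{ii}\) = mean recurrence time of \(Z_i\) (\(= T_i / P_i\)).
\(MUT_S = PA_S / f_{udS}\) = system mean up time **; \(MDT_S = (1 - PA_S) / f_{duS}\) = system mean down time; \(f_{udS} = f_{duS} = 1 / (MUT_S + MDT_S) = \sum_{Z_i \in U,\ Z_j \in \bar{U}} P_i\, \rho_{ij}\) for Markov processes.
\(P_{ij}(t) = \Pr\{\text{system in state } Z_j \text{ at } t \mid Z_i \text{ is entered at } t = 0\}\) *; \(P_j(t) = \Pr\{\text{system in state } Z_j \text{ at } t\}\); \(P_j = \lim_{t \to \infty} P_j(t) = \lim_{t \to \infty} P_{ij}(t) = T_j / T_{jj} > 0\) in steady-state, \(i\) arbitrary.
\(\rho_{ij} = \lim_{\delta t \downarrow 0} \frac{1}{\delta t} \Pr\{\text{transition from } Z_i \text{ to } Z_j \text{ in } (t, t + \delta t] \mid \text{system in } Z_i \text{ at } t\}\), \(\rho_i = \sum_{j \ne i} \rho_{ij}\) (holds for Markov processes only).
* For Markov processes (MP), "\(Z_i\) is entered at \(t = 0\)" can be replaced by "system in \(Z_i\) at \(t = 0\)".
** \(MUT_S\) is the mean time between a transition \(\bar{U} \to U\) and the successive transition \(U \to \bar{U}\) in steady-state or for \(t \to \infty\); considering \(f_{udS} = f_{duS}\), one recognizes that in steady-state, or for \(t \to \infty\), a system behaves like a one-item structure with \(MTTF = MUT_S\) and \(MTTR = MDT_S\); for practical applications, \(MUT_S \approx MTTF_{S0}\).
A repaired element is as-good-as-new, yielding for MP a system as-good-as-new with respect to the state considered; \(t = 0\) is the beginning of the observation, \(x = 0\) is used for interarrival times; repair is used for restoration.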
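The algebraic relations of Table 6.2 for time-homogeneous Markov processes can be applied mechanically once the transition rates are known. The sketch below is only an illustration under stated assumptions: it uses a 1-out-of-2 active redundancy with two identical elements, constant failure rate `lam`, constant repair rate `mu`, and one repair crew. The state numbering, the rate values, and the helper names `solve`, `mttf` and `steady_state` are hypothetical. It solves the equations for MTTF_Si and for the steady-state probabilities P_j exactly.

```python
from fractions import Fraction

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination, using exact Fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

def mttf(rho, up):
    """MTTF_Si for Z_i in U, from
    rho_i MTTF_Si - sum_{Z_j in U, j != i} rho_ij MTTF_Sj = 1
    (rho must have a zero diagonal, so that sum(rho[i]) equals rho_i)."""
    a = [[sum(rho[i]) if i == j else -rho[i][j] for j in up] for i in up]
    return dict(zip(up, solve(a, [1] * len(up))))

def steady_state(rho):
    """P_j from rho_j P_j = sum_{i != j} P_i rho_ij, with the last
    equation replaced by P_0 + ... + P_m = 1."""
    n = len(rho)
    a = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                a[j][i] += rho[i][j]   # inflow into Z_j from Z_i
                a[j][j] -= rho[j][i]   # outflow from Z_j (accumulates -rho_j)
    a[n - 1] = [Fraction(1)] * n
    return solve(a, [0] * (n - 1) + [1])

# Illustrative (assumed) rates in 1/h
lam, mu = Fraction(1, 1000), Fraction(1, 10)
# Z0: both elements up; Z1: one element under repair; Z2: both failed (system down)
rho = [[0,  2 * lam, 0],
       [mu, 0,       lam],
       [0,  mu,      0]]
up = [0, 1]                            # U = {Z0, Z1}

MTTF = mttf(rho, up)
P = steady_state(rho)
PA_S = sum(P[j] for j in up)
print("MTTF_S0 =", MTTF[0], "h   MTTF_S1 =", MTTF[1], "h")
print("PA_S    =", PA_S, "=", float(PA_S))
```

For this assumed model the results agree with the closed forms MTTF_S0 = (3·lam + mu)/(2·lam²) and PA_S = mu·(2·lam + mu)/(2·lam² + 2·lam·mu + mu²), which gives a quick check on the rate matrix. Exact rational arithmetic is used here only to make such checks easy; floating-point rates would work as well.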
Section 6.2 considers the one-item repairable structure under general assumptions, allowing a careful investigation of the asymptotic and stationary behavior.
For basic reliability structures encountered in practical applications (series, parallel, and series-parallel), investigations in Sections 6.3-6.6 begin by assuming constant failure and repair rates for every element in the reliability block diagram.
Distributions of repair times, and as far as possible of failure-free times, are then generalized step by step up to the case in which the process involved remains regenerative with a minimum number of regeneration states. This also serves to show the capabilities and limits of the models involved. For large series-parallel structures, approximate expressions are carefully developed in Section 6.7. Procedures for investigating repairable systems with complex structure (for which a reliability block diagram often does not exist) are given in Section 6.8 on the basis of practical examples, including imperfect switching, incomplete coverage, more than 2 states, phased-mission systems, common cause failures, and fault tolerant reconfigurable systems with reward & frequency/duration aspects. It is shown that tools developed in Appendix A7 (Tab. 6.2) can be used to solve many of the problems occurring in practical applications, on a case-by-case basis working with the diagram of transition rates or a time schedule. Alternative investigation methods, as well as computer-aided analysis, are discussed in Section 6.9, where a Monte Carlo approach useful for rare events is also given. Human reliability is considered in Section 6.10.
From the results of Sections 6.2 - 6.10, the following conclusions can be drawn:
1. As long as $MTTR_i \ll MTTF_i$ holds for each element $E_i$ in the reliability block diagram, the shape of the distribution function of the repair time has small influence on $MTTF_S$ and $PA_S = AA_S$ (Examples 6.8, 6.9, 6.10).
2. As a consequence of Point 1, it is preferable to start investigations by assuming Markov models (constant failure & repair rates for all elements, Table 6.2); in a second step, more appropriate distribution functions can be considered (p. 277).
3. The assumption (6.2) of no further failure at system down has no influence on the reliability function; it allows a reduction of the state space and simplifies availability & interval reliability calculations (yielding good approximations).
4. Already for moderately large systems, use of Markov models can become time-consuming (up to e·n! states for a reliability block diagram with n elements); approximate expressions are thus important, and the macro-structures introduced in Section 6.7 (Table 6.10) adhere well to many practical applications.
5. For large systems or complex structures, the following possibilities are available:
• working directly with the diagram of transition rates (Section 6.8),
• calculation of the mean time to failure and steady-state availability at system level only (Table 6.2, Eqs. (A7.126), (A7.173), (A7.131), (A7.178)),
• use of approximate expressions (Sections 6.7 & 6.9.7, Tables 6.9 & 6.10),
• use of alternative methods or of Monte Carlo simulation (Section 6.9).
6. Human reliability has to be evaluated on a case-by-case basis, keeping in mind the aim, as far as possible, to bypass or strongly support dangerous human decisions.
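The state-space growth mentioned in Point 4 can be made tangible with a small count. The sketch below rests on one possible bookkeeping, which is an assumption made here and not a statement from the text: with one repair crew and a first-in/first-out strategy, a state is identified by the ordered sequence of currently failed elements. Under that assumption the number of states is the sum over k = 0, ..., n of n!/(n-k)!, which stays just below e·n!. The function name `ordered_failure_states` is hypothetical.

```python
import math

def ordered_failure_states(n):
    """Count ordered sequences of k distinct failed elements, k = 0..n."""
    return sum(math.perm(n, k) for k in range(n + 1))

for n in (1, 2, 3, 5, 8, 10):
    count = ordered_failure_states(n)
    bound = math.e * math.factorial(n)
    print(f"n = {n:2d}   states = {count:9d}   e*n! = {bound:12.1f}")
```

Already for n = 10 this count exceeds 9.8 million, which illustrates why approximate expressions and macro-structures become attractive for larger systems.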
Figure 6.1 Reliability block diagram for a one-item structure (single element E)