Effect of Artificial-Intelligence Planning-Procedures on System Reliability
Ing-Ray Chen, Member IEEE
University of Mississippi, University
Farokh B Bastani, Member IEEE
University of Houston, Houston
Key Words - Real-time process-control system, Artificial intelligence, System reliability, Search heuristics
Reader Aids -
Purpose: Analyze a reliability model
Special math needed for explanations: Probability
Special math needed to use results: None
Results useful to: Computer designers and reliability analysts
Abstract - For an embedded real-time process-control system incorporating artificial-intelligence programs, the system reliability is determined by both the software-driven response computation time and the hardware-driven response execution time. A general model, based on the probability that the system can accomplish its mission under a time constraint without incurring failure, is proposed to estimate the software/hardware reliability of such a system. The factors that influence the proposed reliability measure are identified, and the effects of mission time, heuristics, and real-time constraints on the system reliability with artificial-intelligence planning procedures are illustrated. An optimal search procedure does not always yield a higher reliability than a non-optimal search procedure. Hence, design parameters and conditions under which one search procedure is preferred over another, in terms of improved software/hardware reliability, are identified.
1 INTRODUCTION
In embedded computer systems, the computer is part of a larger system such as an automated manufacturing system, a robot, or a defense system. The computer usually provides control functions and must operate in real time to cope with deadlines. Typically, it executes an infinite loop in which it first reads the sensor values, then spends time $T_p$ to compute or plan a response, and time $T_e$ to execute the response. The reliability of an embedded computer system during time $t$ can be determined by viewing it as a series connection of statistically independent hardware and software components, so that:

$$R_{\text{system}}(t) = R_{\text{hardware}}(t)\,R_{\text{software}}(t) \qquad (1)$$

However, a more appropriate reliability measure is the probability that the system accomplishes its mission:

$$\Pr\{\text{mission accomplished}\} = \int_0^\infty\!\!\int_0^\infty \Pr\{\text{mission accomplished} \mid T_p = t_p,\ T_e = t_e\}\,dF(t_p, t_e) \qquad (2)$$

Notation
$F$ joint Cdf of the time to compute and execute the response; it depends on both the hardware and software parts of the system

The integrand in (2) can be expressed as:

Hardware failures are due to many things, such as wear and tear on components. Software failures are due to residual faults in the program, the use of suboptimal algorithms such as heuristics, and failure to meet real-time constraints.

We make the following general assumptions about the reliability of an embedded computer system.
1. Mission time. The mission time is $T_p + T_e$. The longer it is, the more likely that the software or the hardware will fail. $T_p$ and $T_e$ are inversely correlated. Generally, the more time spent in planning (computing a response), the more likely that the strategy is optimal and therefore quick to execute, and vice versa.
2. Hardware-component reliability. Hardware components have a variety of failure modes and mechanisms. At least some of these can be affected by the software, e.g., by imposing excessive "stress" on some hardware components. When software affects the reliability of hardware components, software and hardware failures are only conditionally statistically independent.
3. Hardware-system reliability. The software can affect the form of $R_{\text{hardware}}(t)$. For example, consider a 2-component hardware system. One plan might require both components to be active in order to react to the sensor inputs, while another plan might require only one component to be active. When software affects the reliability of hardware in this way, software and hardware failures are only conditionally statistically independent.
4. Residual software faults. If the software is not modified, its failure rate due to faults that remain undetected is constant.
5. Intrinsic software faults. Such faults, if any, are due to fundamental limitations of the algorithm used in the software. For example, the use of heuristics can result in occasional failures even though the algorithm is devoid of any residual faults. This is modeled by Pr{algorithm works correctly during planning time}.
0018-9529/91/0800-0364$01.00 ©1991 IEEE
CHEN/BASTANI: EFFECT OF ARTIFICIAL-INTELLIGENCE PLANNING-PROCEDURES ON SYSTEM RELIABILITY 365
6. Failures due to real-time constraints. In embedded computer systems, the environment can change dynamically. If the planning time is too long, the environment might have changed too much by the time the response is executed for the response to have any effect. These failures are often characterized by the inability of the system to meet real-time constraints. This is modeled by Pr{plan is correct for sensor readings during the planning & execution times | plan is based on observations at time 0}.
All 6 elements can be combined to yield a general expression, (4), for Pr{mission succeeds}.

In this paper we focus on the application of some artificial-intelligence search strategies to fault-tolerant process-control systems. We address the design tradeoff between the optimality of search strategies and the satisfaction of real-time constraints.

Section 2 discusses the system reliability for two search techniques: generate and test, and heuristic pruning [8]. Example 1 illustrates the effect of mission time on the system reliability, while example 2 considers failures due to the use of heuristics. Section 3 illustrates the tradeoff between the response computation time and the response execution time for some search procedures in real-time situations. It also identifies conditions under which a non-optimal search strategy can provide better system reliability than an optimal strategy.

2 SEARCH HEURISTICS

Notation
$t_p$ planning time
$t_e$ execution time
$R_i$ reliability of component $i$ during time $t_p + t_e$
$\lambda_s$ constant software-failure rate due to residual faults
$\lambda_h$ constant hardware-failure rate (used only in examples)
$R_{h,\text{gen}}(\text{times};\text{parameters})$ general hardware-reliability function
$R_h(t;\alpha)$ hardware-reliability function with parameter $\alpha$, when hardware & software failures are statistically independent
$P_c(t)$ Pr{heuristics used in the planning method do work during time $t$}
$P_v(t_1, t_2)$ Pr{plan is valid during time $t_1 + t_2$ | plan is based on observations at time 0}
$f_e(t)$, $F_e(t)$ pdf, Cdf of $t_e$
gauf(·) standard normal (Gaussian) Cdf
Other, standard notation is given in "Information for Readers & Authors" at the rear of each issue.

Detailed Assumptions
1. Hardware-system failure is statistically independent of software. This allows us to replace $R_{h,\text{gen}}$ by $R_h$, the effective hardware reliability.
2. In all the examples, $R_h = \exp(-\lambda_h t)$, for simplicity and tractability.
3. The software has been debugged completely, i.e., $\lambda_s = 0$.

Then (4) becomes:
$$\Pr\{\text{mission succeeds}\} = \int_0^\infty\!\!\int_0^\infty R_h(t_p + t_e;\alpha)\,P_c(t_p)\,P_v(t_p, t_e)\,dF(t_p, t_e) \qquad (5)$$
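Equation (5) is an expectation over the joint Cdf $F$, so it can be estimated by sampling when no closed form is available. Below is a minimal Monte Carlo sketch that is not part of the original analysis. The sampler and the functions passed in for $R_h$, $P_c$ and $P_v$ are hypothetical placeholders that an analyst would supply. The exponential planning and execution times and the deadline used in the example are illustrative assumptions only.

```python
import math
import random
from typing import Callable, Tuple

# Hypothetical sampler type: draws one (t_p, t_e) pair from the joint Cdf F.
Sampler = Callable[[random.Random], Tuple[float, float]]

def mission_success_mc(
    sample: Sampler,
    r_h: Callable[[float], float],            # R_h(t; alpha) with alpha held fixed
    p_c: Callable[[float], float],            # P_c(t_p)
    p_v: Callable[[float, float], float],     # P_v(t_p, t_e)
    n: int = 100_000,
    seed: int = 0,
) -> float:
    """Plain Monte Carlo estimate of (5): mean of R_h(Tp+Te) * Pc(Tp) * Pv(Tp, Te), (Tp, Te) ~ F."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        tp, te = sample(rng)
        total += r_h(tp + te) * p_c(tp) * p_v(tp, te)
    return total / n

# Illustrative (assumed) inputs: independent exponential times with means 2 and 3,
# no heuristic failures, and a hypothetical deadline t_R = 8 enforced through P_v.
lam_h, t_r = 0.1, 8.0
est = mission_success_mc(
    sample=lambda rng: (rng.expovariate(1 / 2.0), rng.expovariate(1 / 3.0)),
    r_h=lambda t: math.exp(-lam_h * t),
    p_c=lambda tp: 1.0,
    p_v=lambda tp, te: 1.0 if tp + te <= t_r else 0.0,
)
print(round(est, 3))
```

Averaging the integrand over samples of $(T_p, T_e)$ replaces the double integral with a sampling error that shrinks roughly as $1/\sqrt{n}$.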
2.1 Effect of Mission Time

Assumptions
1. $P_c(t_p) = 1$, i.e., there are no software errors due to the use of heuristics or other approximate algorithms.
2. $P_v(t_p, t_e) = 1$, i.e., there are no real-time constraints.
3. $f_e(t_e) = \delta(t_e)$, i.e., the response execution time is zero. Here $\delta(t)$ is the standard impulse function: it has unit area concentrated at $t = 0$.

Then (5) becomes:
$$\Pr\{\text{mission succeeds}\} = \int_0^\infty R_h(t_p;\alpha)\,f_p(t_p)\,dt_p \qquad (6)$$
where $f_p$ is the pdf of $T_p$.

Example 1
Consider a Generate & Test search procedure in which each step takes $T$ time units to process and has a probability $p$ of passing the test. This implies that:
$$\Pr\{T_p = iT\} = (1-p)^{i-1}\,p, \qquad i = 1, 2, \ldots \qquad (7)$$
Then,
$$\Pr\{\text{search procedure terminates and the system is alive}\} = \sum_{i=1}^{\infty} R_h(iT;\alpha)\,(1-p)^{i-1}\,p \qquad (8)$$

Example 1'
Same as example 1, except: to obtain a closed-form solution, make the additional simplistic assumption of a constant hardware-failure rate. Then (8) becomes a geometric series:
$$\Pr\{\text{search procedure terminates and the system is alive}\} = \sum_{i=1}^{\infty} e^{-\lambda_h iT}(1-p)^{i-1}\,p = \frac{p\,e^{-\lambda_h T}}{1 - (1-p)\,e^{-\lambda_h T}} \qquad (9)$$
Thus, for this simplistic case, the reliability of the Generate & Test search procedure is good when the hardware failure rate is small and poor when the hardware failure rate is large. This is a reasonable result.
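Example 1' can be checked numerically. The sketch below sums (8) term by term with $R_h(t) = \exp(-\lambda_h t)$ and compares the result with the geometric-series closed form $p\,e^{-\lambda_h T}/[1 - (1-p)e^{-\lambda_h T}]$. The function names are our own, and the parameter values ($p = 0.05$, $T = 1$) are illustrative assumptions.

```python
import math

def gt_reliability_series(p: float, T: float, lam_h: float, n_terms: int = 10_000) -> float:
    """Truncated sum of (8) with R_h(t) = exp(-lam_h * t) and Pr{T_p = iT} = (1 - p)**(i - 1) * p."""
    return sum(math.exp(-lam_h * i * T) * (1 - p) ** (i - 1) * p for i in range(1, n_terms + 1))

def gt_reliability_closed(p: float, T: float, lam_h: float) -> float:
    """Closed form of example 1': geometric series with ratio (1 - p) * exp(-lam_h * T)."""
    q = math.exp(-lam_h * T)
    return p * q / (1 - (1 - p) * q)

# Illustrative parameters: the two evaluations agree, and reliability falls as lam_h grows.
for lam_h in (1e-4, 1e-2, 1e-1):
    print(lam_h, gt_reliability_series(0.05, 1.0, lam_h), gt_reliability_closed(0.05, 1.0, lam_h))
```

The expected number of generate-and-test steps is $1/p$, so the mean planning time is $T/p$. Lowering $p$ or raising $\lambda_h$ both push the reliability down.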
IEEE TRANSACTIONS ON RELIABILITY, VOL 40, NO 3 , 1991 AUGUST
2.2 Effect of Heuristics

Assumptions
1. $P_v(t_p, t_e) = 1$, i.e., there are no real-time constraints.
2. $f_e(t_e) = \delta(t_e)$, i.e., the response execution time is zero.

Then (5) becomes:
$$\Pr\{\text{mission succeeds}\} = \int_0^\infty R_h(t_p;\alpha)\,P_c(t_p)\,f_p(t_p)\,dt_p \qquad (10)$$

Example 2
Consider the Branch & Bound method [8], which uses heuristics to limit the search for an optimal solution. The Generate & Test technique discussed in section 2.1 is not suitable where optimal or reasonably optimal solutions are required, since all solutions must be inspected.
Notation
$b$ branching factor of the resulting tree
$D$ maximum depth of the tree
$r$ Pr{heuristic does result in a correct answer}, viz, the reliability of the heuristic
$d$ depth of the search tree
$h(i)$ Pr{$i$ nodes are examined}
$T$ time duration to analyze each node
$U(a,b)$ uniform Cdf over $(a, b)$
Assumption
1. $d/D = \Pr\{\text{correct answer is found} \mid d\}$.

Now, if the depth of the search tree is limited to $d$, then between $2b^{d/2}$ and $b^d$ nodes have to be examined [8]. The results are:
$$T_p \sim U(2b^{d/2}T,\ b^dT), \qquad P_c(iT) = r\,d/D$$
Example 2'
Same as example 2, except: to obtain a closed-form solution, make the additional simplistic assumption of a constant hardware-failure rate. Then:
$$\Pr\{\text{system completes the search successfully}\} = \sum_i h(i)\,e^{-\lambda_h iT}\,\frac{r\,d}{D} \qquad (11)$$
When $h$ is the uniform distribution over $(2b^{d/2}, b^d)$, (11) becomes:
$$R(r,d) = \frac{r\,d}{D}\cdot\frac{\exp(-\lambda_h 2b^{d/2}T) - \exp[-\lambda_h(b^d + 1)T]}{(b^d - 2b^{d/2} + 1)\,(1 - \exp(-\lambda_h T))} \qquad (12)$$
The optimal value of $d$ can be obtained by solving:
$$\frac{\partial R(r,d)}{\partial d} = 0 \qquad (13)$$
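Here is a sketch of (12) under the stated assumptions, with a term-by-term evaluation of (11) as a cross-check. The functions are hypothetical helpers. The closed form treats $2b^{d/2}$ as a real bound, so the direct sum is compared only for even $d$, where both node bounds are integers. The best depth is found by scanning integer $d$ rather than by solving for a stationary point. All parameter values are illustrative assumptions.

```python
import math

def reliability_closed(r: float, d: int, D: int, b: int, T: float, lam_h: float) -> float:
    """Eq (12): h(i) uniform over 2*b**(d/2) .. b**d nodes, R_h(t) = exp(-lam_h * t)."""
    lo, hi = 2 * b ** (d / 2), float(b ** d)
    if lo > hi:
        raise ValueError("the node-count range needs 2*b**(d/2) <= b**d")
    num = math.exp(-lam_h * lo * T) - math.exp(-lam_h * (hi + 1) * T)
    den = (hi - lo + 1) * -math.expm1(-lam_h * T)   # expm1 keeps 1 - e^{-x} accurate for small x
    return (r * d / D) * num / den

def reliability_direct(r: float, d: int, D: int, b: int, T: float, lam_h: float) -> float:
    """Eq (11) summed term by term; only for even d, where the node bounds are integers."""
    if d % 2:
        raise ValueError("direct sum needs even d")
    lo, hi = 2 * b ** (d // 2), b ** d
    avg = sum(math.exp(-lam_h * i * T) for i in range(lo, hi + 1)) / (hi - lo + 1)
    return (r * d / D) * avg

def best_depth(r: float, D: int, b: int, T: float, lam_h: float) -> int:
    """Integer d in [1, D] maximizing (12), restricted to depths where the range is valid."""
    feasible = [d for d in range(1, D + 1) if 2 * b ** (d / 2) <= b ** d]
    return max(feasible, key=lambda d: reliability_closed(r, d, D, b, T, lam_h))

# Illustrative parameters only.
print(best_depth(r=0.9, D=8, b=4, T=1e-3, lam_h=1e-2))
```

Deeper searches raise the factor $d/D$ but examine exponentially more nodes, and the scan locates where those two effects balance.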
Example 2''
Same as example 2, except: examine each node up to depth $d$ without using any heuristics. Thus, $r = 1$, and the system reliability is:

The optimal value of $d$ is obtained by solving:

The optimal value of $d$ (for this example 2'') is independent of the maximum depth of the tree, $D$ ($1 \le d \le D$). Compare this with the heuristic search (example 2'): there is a value of $r$ below which a complete search results in a more reliable system.
3 REAL-TIME CONSTRAINTS
In section 2 we assumed that the system state does not change while the response is being computed. This is sometimes true, as in playing chess, but it is not true for real-time systems. Normally, real-time process-control systems have a stringent real-time constraint that must be satisfied. When a real-time situation arises, there is a response computation period in which an optimal or a near-optimal strategy must be formulated. This period is followed by a response execution time, which activates the underlying hardware mechanisms and carries out the strategy. Very frequently, if the system spends too much time in the response computation period formulating an optimal strategy, there is a high risk that the response is not completed within the real-time constraint, since not enough time is left for response execution. On the other hand, if a non-optimal strategy is selected in order to meet the real-time constraint, the reliability of the resulting strategy might not be acceptable. This is due to the poor hardware reliability associated with the strategy selection: a non-optimal strategy generally takes longer to execute, so the hardware must operate longer. Thus, there is a tradeoff between response computation time and response execution time in a real-time situation.
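Assumptions
1. $P_c(t_p) = 1$, i.e., there are no planning errors.
2. $P_v$ enforces the real-time constraint $t_R$:
$$P_v(t_p, t_e) = \begin{cases} 1, & \text{if } t_p + t_e \le t_R \\ 0, & \text{otherwise} \end{cases} \qquad (16)$$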
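Then:
$$\Pr\{\text{mission succeeds} \mid \text{planning method}\} = \int_0^\infty \eta_{T_p=t_p}\,f_p(t_p \mid \text{planning method})\,dt_p \qquad (17)$$
$$\eta_{T_p=t_p} \equiv \int_0^{t_R - t_p} R_h(t_p + t_e;\alpha)\,f_e(t_e \mid \text{planning method})\,dt_e \qquad (18)$$
When $t_p > t_R$, the real-time constraint is already violated and $\eta_{T_p=t_p} = 0$.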
This section illustrates this tradeoff by investigating some artificial-intelligence search procedures. Specifically, we compare A*, which is known to be optimal [3], with some other search heuristics. We show that these can provide better system reliability (hardware & software) under certain conditions. Our intention is not to find heuristics that would lead to better real-time performance of search procedures [4]. Rather, we are interested in identifying the conditions under which one search strategy provides better system reliability than the others. We restrict our analysis to a very simple problem space so as to obtain tractable results.
3.1 Unique Solution Path

Notation
$h^*(n)$ minimum path cost from a node $n$ to a goal node
$h(n)$ an estimate of $h^*(n)$
$\epsilon$ heuristic error, a non-negative number (usually small & positive)
$T$ time duration to analyze each node
It is well known that, in the context of a graph problem, A* (or its variants such as IDA* [2]) is optimal when its admissibility condition is satisfied. That is, A* is guaranteed to find a minimal-cost (optimal) path to a goal when $h(n) \le h^*(n)$ for all nodes $n$. However, in real situations it is not practical to rely on the statement "a heuristic always satisfies the inequality condition", other than for trivial cases such as $h(n) = 0$.
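The sketch below is a minimal textbook A* over an explicit graph, included to make the admissibility condition concrete. It is a generic sketch with names of our own choosing, not the procedure analyzed in [6] or an implementation from this paper. On the hypothetical graph, the admissible heuristic $h(n) = 0$ yields the minimal-cost path. A heuristic that overestimates $h^*(n)$ at one node yields a costlier path.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

def a_star(
    start: Hashable,
    is_goal: Callable[[Hashable], bool],
    successors: Callable[[Hashable], Iterable[Tuple[Hashable, float]]],
    h: Callable[[Hashable], float],
) -> Tuple[Optional[float], int]:
    """Textbook A*: returns (cost of the path found, number of node expansions)."""
    tie = count()                     # tie-breaker so heapq never has to compare nodes
    frontier: List[Tuple[float, int, float, Hashable]] = [(h(start), next(tie), 0.0, start)]
    best_g: Dict[Hashable, float] = {start: 0.0}
    expanded = 0
    while frontier:
        _, _, g, node = heapq.heappop(frontier)
        if g > best_g[node]:
            continue                  # stale entry: a cheaper path to this node was found later
        expanded += 1
        if is_goal(node):             # goal test on expansion, as A*'s optimality argument requires
            return g, expanded
        for child, cost in successors(node):
            g_child = g + cost
            if g_child < best_g.get(child, float("inf")):
                best_g[child] = g_child
                heapq.heappush(frontier, (g_child + h(child), next(tie), g_child, child))
    return None, expanded

# Hypothetical graph: the cheap first arc S->A leads to an expensive path.
graph = {"S": [("A", 1.0), ("B", 4.0)], "A": [("G", 10.0)], "B": [("G", 1.0)], "G": []}
succ = lambda n: graph[n]
goal = lambda n: n == "G"
print(a_star("S", goal, succ, lambda n: 0.0))               # (5.0, 4): admissible h finds S-B-G
overestimate = {"S": 0.0, "A": 0.0, "B": 100.0, "G": 0.0}   # h(B) = 100 > h*(B) = 1
print(a_star("S", goal, succ, overestimate.__getitem__))    # (11.0, 3): suboptimal S-A-G
```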
Pohl [6] has analyzed the worst case for A* when:
$$h^*(n) - \epsilon \le h(n) \le h^*(n) + \epsilon$$
Pohl considered the problem space of infinite binary trees with unit cost on all the arcs of the search graph. He concluded that, in the worst case, $k2^\epsilon + 1$ nodes have to be visited before the unique goal node, which resides at level $k$ of the tree, can be located. Other search strategies (e.g., pure Branch & Bound) require $2^{k+1} - 1$ nodes in the worst case. Comparing these two results gives a worst-case tradeoff analysis for the two search strategies, as follows.
In the infinite binary-tree problem space, when A* is used to search for the solution path, $t_p = (k2^\epsilon + 1)T$ in the worst case. The $f_e(t_e \mid \text{planning method})$ can be any probability function obtained from a large representative sample of problem instances pertaining to a system.
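Example 3
Let
$$F_e(t_e \mid \text{planning method}) = \text{gauf}[(t_e - \mu)/\sigma] \qquad (19)$$
This models a system in which the time to execute a path has a normal (Gaussian) distribution with mean $\mu$ and standard deviation $\sigma$. Hence, with $R_h = \exp(-\lambda_h t)$, (18) becomes:
$$\eta_{T_p=t_p} = \exp\!\left(-\lambda_h t_p + \frac{\lambda_h^2\sigma^2 - 2\mu\lambda_h}{2}\right)\left[\text{gauf}\!\left(\frac{t_R - t_p + \lambda_h\sigma^2 - \mu}{\sigma}\right) - \text{gauf}\!\left(\frac{\lambda_h\sigma^2 - \mu}{\sigma}\right)\right] \qquad (20)$$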
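The closed form (20) can be cross-checked against direct numerical integration of (18). The sketch below assumes $R_h(t) = \exp(-\lambda_h t)$ and a normal execution time, and it writes gauf with `math.erf`. It evaluates the integral with composite Simpson's rule. The parameter values are illustrative assumptions.

```python
import math

def gauf(x: float) -> float:
    """Standard normal Cdf, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def eta_closed(tp: float, t_r: float, lam_h: float, mu: float, sigma: float) -> float:
    """Closed form (20) for R_h(t) = exp(-lam_h t) and a Normal(mu, sigma) execution time."""
    if tp > t_r:
        return 0.0                     # deadline already passed during planning
    pref = math.exp(-lam_h * tp + (lam_h**2 * sigma**2 - 2 * mu * lam_h) / 2)
    return pref * (gauf((t_r - tp + lam_h * sigma**2 - mu) / sigma)
                   - gauf((lam_h * sigma**2 - mu) / sigma))

def eta_numeric(tp: float, t_r: float, lam_h: float, mu: float, sigma: float, n: int = 2000) -> float:
    """Composite Simpson's rule applied directly to the integral in (18); n must be even."""
    if tp >= t_r:
        return 0.0
    a, b = 0.0, t_r - tp
    step = (b - a) / n
    def g(te: float) -> float:
        pdf = math.exp(-0.5 * ((te - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        return math.exp(-lam_h * (tp + te)) * pdf
    odd = sum(g(a + i * step) for i in range(1, n, 2))
    even = sum(g(a + i * step) for i in range(2, n, 2))
    return (g(a) + g(b) + 4 * odd + 2 * even) * step / 3

# Illustrative parameters only.
args = dict(tp=2.0, t_r=10.0, lam_h=0.05, mu=4.0, sigma=1.0)
print(eta_closed(**args), eta_numeric(**args))
```

The closed form divides by $\sigma$, so the deterministic case $\sigma = 0$ of example 3' has to be taken as a limit rather than by direct substitution.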
Example 3'
Same as example 3, except: $(\mu, \sigma) = (K_cC, 0)$.

Notation
$K_c$ a constant
$C$ cost of the solution path obtained using the given planning method
The worst-case system reliability for A* occurs for $T_p = T(k2^\epsilon + 1)$:
$$\eta_{T_p = T(k2^\epsilon+1)} = \begin{cases} \exp[-\lambda_h(k2^\epsilon T + T + K_cC)], & \text{if } t_R > T(k2^\epsilon + 1) + K_cC \\ 0, & \text{otherwise} \end{cases} \qquad (21)$$
The interpretation of (21) is intuitively clear. For A* to satisfy the real-time constraint, $t_R$ must exceed $T(k2^\epsilon + 1) + K_cC$, i.e.:
$$k < \frac{t_R - K_cC - T}{2^\epsilon T} \qquad (22)$$
On the other hand, if the pure Branch & Bound heuristic is used to guide the search, then in the worst case the real-time constraint is met when:
$$k < \log_2[(t_R - K_cC + T)/T] - 1 \qquad (23)$$
The system reliability is:
$$\exp[-\lambda_h(2^{k+1}T - T + K_cC)] \qquad (24)$$
when the inequality is true. Example 3' shows that strategies that do not use heuristic functions are not necessarily worse than those that use heuristics, especially if certain information is given a priori. For example, if the depth of the unique goal state is known, (23) can be used as a criterion for choosing the pure Branch & Bound heuristic to enhance the reliability of the system. This is particularly important when the heuristic error, $\epsilon$, is unknown.
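The following sketch carries out the worst-case comparison of example 3'. It assumes that both strategies find the same solution cost $C$, since the solution path is unique, and it passes the product $K_cC$ as one number. The helper `crossover_depth` is our own addition. It finds the smallest goal depth $k$ at which pure Branch & Bound's worst-case node count $2^{k+1} - 1$ exceeds A*'s $k2^\epsilon + 1$. All parameter values are illustrative.

```python
import math

def astar_worst(k: int, eps: float, T: float, KcC: float, t_r: float, lam_h: float) -> float:
    """(21): worst-case A* visits k*2**eps + 1 nodes; reliability is 0 if t_R is missed."""
    tp = T * (k * 2 ** eps + 1)
    return math.exp(-lam_h * (tp + KcC)) if t_r > tp + KcC else 0.0

def bnb_worst(k: int, T: float, KcC: float, t_r: float, lam_h: float) -> float:
    """Worst-case pure Branch & Bound: 2**(k+1) - 1 nodes, nonzero only when (23) holds."""
    tp = T * (2 ** (k + 1) - 1)
    return math.exp(-lam_h * (tp + KcC)) if t_r > tp + KcC else 0.0

def crossover_depth(eps: float) -> int:
    """Smallest goal depth k at which pure Branch & Bound's worst case exceeds A*'s."""
    k = 1
    while 2 ** (k + 1) - 1 <= k * 2 ** eps + 1:
        k += 1
    return k

# Illustrative values: eps = 3, T = 1, K_c*C = 3, t_R = 100, lam_h = 0.01.
for k in (3, 5):
    print(k, astar_worst(k, 3, 1.0, 3.0, 100.0, 0.01), bnb_worst(k, 1.0, 3.0, 100.0, 0.01))
print(crossover_depth(3))
```

For shallow goals, the pure Branch & Bound strategy visits fewer nodes in the worst case and therefore gives the higher reliability. This matches the remark that a known goal depth can justify choosing the strategy without heuristics.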
3.2 Multiple Solution Paths
In section 3.1, we assumed that there is a unique solution path. Here, we consider the possibility of multiple paths, all of which could lead to the same goal state. Of these multiple solution paths, some are optimal and others are not, so the solutions differ in quality. Generally, the more time one invests in finding a solution, the more likely that solution is close to optimal. However, investing more time might not be permitted in real time. A near-optimal solution might then be desired as a compromise between the real-time constraint and the maximum system reliability.

This section illustrates this compromise by comparing two search strategies that use two heuristics: A* and hill climbing [8]. A* is well known in the domain of artificial-intelligence search. Hill climbing represents an extreme case: search efficiency, rather than solution quality as in A*, is used to guide the search for a solution path.
Assumptions for Hill Climbing
1. The search space is an infinite binary tree (as in the analysis in section 3.1).
2. Arcs no longer have the same weight.
3. Unlike A*, which uses arc weight as the search heuristic, hill climbing uses the remaining number of nodes to guide the search for a solution path.
4. To obtain maximum search efficiency at the expense of solution quality, the search is streamlined from level to level without checking whether other nodes at the same level may lead to a more optimal solution path.
5. To ensure that there is at least one solution:
a. All leaf nodes are solution nodes.
b. The search tree has the monotonic and admissible property [4]:
$$h(n) \le h(n') + C(n, n') \qquad (25)$$
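Condition (25) can be checked arc by arc. The sketch below lists the arcs that violate it for a hypothetical weighted tree and heuristic. A heuristic that satisfies (25) on every arc and equals 0 at goal nodes is also admissible, i.e., $h(n) \le h^*(n)$.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Arc = Tuple[Hashable, Hashable, float]   # (n, n', C(n, n'))

def monotone_violations(h: Dict[Hashable, float], arcs: Iterable[Arc]) -> List[Arc]:
    """Arcs on which h(n) <= h(n') + C(n, n'), i.e. condition (25), fails."""
    return [(n, m, c) for n, m, c in arcs if h[n] > h[m] + c]

# Hypothetical weighted tree whose leaves are goal nodes (h = 0 there).
arcs = [("root", "a", 2.0), ("root", "b", 3.0), ("a", "leaf1", 1.0), ("b", "leaf2", 4.0)]
h = {"root": 3.0, "a": 1.0, "b": 3.0, "leaf1": 0.0, "leaf2": 0.0}
print(monotone_violations(h, arcs))                 # []
print(monotone_violations({**h, "a": 0.5}, arcs))   # [('root', 'a', 2.0)]
```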
Notation
$n'$ any successor node of $n$
$C(n, n')$ actual distance between $n$ and $n'$
$k$ depth of an optimal solution path in the tree
Assumptions for A*
1. [same as Hill Climbing assumptions 1, 2, and 5]
Thus, $C(n, n') = 1$ for hill climbing and $C(n, n') > 0$ for A*.
Number of Nodes to Be Visited
A. Hill climbing: $k + 1$ in the worst case, 2 in the best case, and $(k + 3)/2$ on average (assuming a uniform distribution over $[2, k + 1]$).
B. A*: between $k + 1$ and $2^{k+1} - 1$, and $2^k + k/2$ on average (assuming, as in A, a uniform distribution over this range). The worst-case upper bound occurs when the probability that the relative error exceeds some fixed positive quantity is greater than 1/2 [1].
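The averages quoted above follow from the discrete uniform assumption. The short check below, our own addition using exact fractions, confirms $(k+3)/2$ for hill climbing and $2^k + k/2$ for A* for small $k$.

```python
from fractions import Fraction

def mean_uniform_int(lo: int, hi: int) -> Fraction:
    """Exact mean of the discrete uniform distribution on {lo, ..., hi}."""
    return Fraction(sum(range(lo, hi + 1)), hi - lo + 1)

for k in range(1, 9):
    hill = mean_uniform_int(2, k + 1)                  # hill climbing: j in [2, k+1]
    astar = mean_uniform_int(k + 1, 2 ** (k + 1) - 1)  # A*: j in [k+1, 2^(k+1) - 1]
    print(k, hill, astar)
```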
The gain in search efficiency in hill climbing comes with a decline in the quality of the solution:
$$C_{hc} = (1 + \epsilon_{hc})\,C_{A^*} \qquad (26)$$

Notation
$C_{hc}$ cost of the solution path associated with hill climbing
$C_{A^*}$ cost of the solution path associated with A*
$\epsilon_{hc}$ non-negative number indicating the degradation of the solution quality
$\Pr\{j\}$ Pr{$j$ nodes are expanded}
The mean system reliability can be calculated by conditioning on $\Pr\{T_p = jT\}$:
$$\Pr\{\text{mission succeeds} \mid \text{planning method}\} = \sum_j \eta_{T_p=jT}\,\Pr\{j\} = \sum_j \Pr\{j\}\int_0^{t_R - jT} R_h(jT + t_e;\alpha)\,f_e(t_e \mid \text{planning method})\,dt_e \qquad (27)$$
The last equality in (27) follows from (18).

Example 4
Assumptions
1. $\Pr\{j\}$ is the uniform distribution over $[k+1,\ 2^{k+1} - 1]$ for A* and over $[2,\ k+1]$ for hill climbing.
2. The hardware failure rate is a constant, $\lambda_h$.
3. $F_e(t_e)$ is given by (19).
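Denote by $r(y)$ the mean system reliability (27) of strategy $y$, where $y$ is A* or $hc$ (hill climbing). Terms with $jT > t_R$ contribute 0. Then:
$$r(\text{A*}) = \frac{e^{(\lambda_h^2\sigma_{A^*}^2 - 2\mu_{A^*}\lambda_h)/2}}{2^{k+1} - k - 1}\sum_{j=k+1}^{2^{k+1}-1} e^{-\lambda_h jT}\left[\text{gauf}\!\left(\frac{t_R - jT + \lambda_h\sigma_{A^*}^2 - \mu_{A^*}}{\sigma_{A^*}}\right) - \text{gauf}\!\left(\frac{\lambda_h\sigma_{A^*}^2 - \mu_{A^*}}{\sigma_{A^*}}\right)\right] \qquad (28)$$
$$r(hc) = \frac{e^{(\lambda_h^2\sigma_{hc}^2 - 2\mu_{hc}\lambda_h)/2}}{k}\sum_{j=2}^{k+1} e^{-\lambda_h jT}\left[\text{gauf}\!\left(\frac{t_R - jT + \lambda_h\sigma_{hc}^2 - \mu_{hc}}{\sigma_{hc}}\right) - \text{gauf}\!\left(\frac{\lambda_h\sigma_{hc}^2 - \mu_{hc}}{\sigma_{hc}}\right)\right] \qquad (29)$$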
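The sketch below evaluates the mean reliabilities (28) and (29). It averages $\eta_{T_p=jT}$ from (20) over the uniform $\Pr\{j\}$ of example 4. When $\sigma = 0$ it switches to the deterministic limit used in example 4'. There, the 0/1 factor plays the role of the real-time indicators $S_j$ in (30)-(33). The code also computes the right-hand side of (34). Function names and parameter values are illustrative assumptions, not from the paper.

```python
import math

def gauf(x: float) -> float:
    """Standard normal Cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def eta(tp: float, t_r: float, lam_h: float, mu: float, sigma: float) -> float:
    """eta_{Tp=tp} with R_h = exp(-lam_h t): (20) if sigma > 0, its sigma -> 0 limit if sigma == 0."""
    if tp > t_r:
        return 0.0                                   # plan invalid: deadline passed while planning
    if sigma == 0.0:                                 # deterministic execution time t_e = mu (example 4')
        return math.exp(-lam_h * (tp + mu)) if t_r > tp + mu else 0.0
    pref = math.exp(-lam_h * tp + (lam_h**2 * sigma**2 - 2 * mu * lam_h) / 2)
    return pref * (gauf((t_r - tp + lam_h * sigma**2 - mu) / sigma)
                   - gauf((lam_h * sigma**2 - mu) / sigma))

def mean_reliability(js: range, T: float, t_r: float, lam_h: float, mu: float, sigma: float) -> float:
    """(27) with Pr{j} uniform over the integers in js."""
    return sum(eta(j * T, t_r, lam_h, mu, sigma) for j in js) / len(js)

def r_astar(k: int, T: float, t_r: float, lam_h: float, mu: float, sigma: float) -> float:
    return mean_reliability(range(k + 1, 2 ** (k + 1)), T, t_r, lam_h, mu, sigma)  # j = k+1 .. 2^(k+1)-1

def r_hc(k: int, T: float, t_r: float, lam_h: float, mu: float, sigma: float) -> float:
    return mean_reliability(range(2, k + 2), T, t_r, lam_h, mu, sigma)             # j = 2 .. k+1

def eps_hc_threshold(k: int, T: float, t_r: float, KcCA: float) -> float:
    """Right-hand side of (34); KcCA is the product K_c * C_A*."""
    return (t_r - (k + 1) * T) / KcCA - 1

# Illustrative (assumed) values for example 4': mu_y = K_c * C_y, sigma_y = 0.
k, T, lam_h, t_r, KcCA = 4, 1.0, 0.01, 20.0, 10.0
eps_hc = 0.3
print(r_astar(k, T, t_r, lam_h, KcCA, 0.0), r_hc(k, T, t_r, lam_h, KcCA * (1 + eps_hc), 0.0))
print(eps_hc_threshold(k, T, t_r, KcCA))
```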
Example 4'
Same as example 4, except: $(\mu_y, \sigma_y) = (K_cC_y, 0)$, where $y$ = A*, hc, similar to example 3'. Then (28) & (29) simplify to:
$$r(\text{A*}) = \frac{e^{-\lambda_h K_c C_{A^*}}}{2(2^{k+1} - k - 1)}\sum_{j=k+1}^{2^{k+1}-1} e^{-\lambda_h jT}\,[S_j(\text{A*}) + 1] \qquad (30)$$
$$S_j(\text{A*}) = \begin{cases} 1, & \text{if } t_R > jT + K_cC_{A^*} \\ -1, & \text{otherwise} \end{cases} \qquad (31)$$
$$r(hc) = \frac{e^{-\lambda_h K_c(1+\epsilon_{hc})C_{A^*}}}{2k}\sum_{j=2}^{k+1} e^{-\lambda_h jT}\,[S_j(hc) + 1] \qquad (32)$$
$$S_j(hc) = \begin{cases} 1, & \text{if } t_R > jT + K_c(1+\epsilon_{hc})C_{A^*} \\ -1, & \text{otherwise} \end{cases} \qquad (33)$$

Eq (30) - (33) imply the following. Suppose the real-time constraint can be satisfied for both A* and hill climbing, i.e., $S_j(\text{A*}) = S_j(hc) = 1$ for all $j$. Then, for the same planning time $t_p$, A* has a better system reliability than hill climbing, i.e.,
$$\exp[-\lambda_h(t_p + K_cC_{A^*})] \ge \exp[-\lambda_h(t_p + K_c(1+\epsilon_{hc})C_{A^*})]$$
This is so because $\epsilon_{hc} \ge 0$.

The advantage of the hill-climbing strategy is that it has a higher probability of satisfying the real-time constraint. This can be seen from the condition it requires during the response computation period: hill climbing expands at most $k + 1$ nodes, whereas A* expands at least $k + 1$. However, this advantage disappears as $\epsilon_{hc}$ becomes larger, i.e., when:
$$\epsilon_{hc} > \frac{t_R - (k+1)T}{K_cC_{A^*}} - 1 \qquad (34)$$
These results show a tradeoff between the system reliability and the satisfaction of the real-time constraint, especially when the constraint, $t_R$, is stringent.

ACKNOWLEDGMENT
… Foundation under grant CCR-9110816.

REFERENCES
[1] N. Huyn, R. Dechter, J. Pearl, "Probabilistic analysis of the complexity of A*", Artificial Intelligence, vol 15, 1980, pp 241-254.
[2] R. E. Korf, "Depth-first iterative-deepening: an optimal admissible tree search", Artificial Intelligence, vol 27, 1985, pp 97-109.
[3] J. Pearl, "Some recent results in heuristic search theory", IEEE Trans. Pattern Analysis and Machine Intelligence, vol PAMI-6, 1984 Jan, pp 1-12.
[4] J. Pearl, Heuristics, 1984; Addison-Wesley.
[5] J. Pearl, J. H. Kim, "Studies in semi-admissible heuristics", IEEE Trans. Pattern Analysis and Machine Intelligence, vol PAMI-4, 1982 Jul, pp 392-399.
[6] I. Pohl, "First results on the effect of error in heuristic search", Machine Intelligence, vol 5, 1970, pp 219-236.
[7] N. Viswanadham, V. V. S. Sarma, M. G. Singh, Reliability of Computer and Control Systems, 1987; North Holland.
[8] P. H. Winston, Artificial Intelligence, 2nd edition, 1984; Addison-Wesley.

AUTHORS
Dr. Ing-Ray Chen; Department of Computer and Information Science; Weir 302; University of Mississippi; University, Mississippi 38677 USA.
Ing-Ray Chen (S'86, M'90) received the BS from the National Taiwan University in 1978, and the MS & PhD in Computer Science from the University of Houston, University Park in 1985 & 1988. He is an Assistant Professor of Computer and Information Science at the University of Mississippi. His research interests include distributed systems, fault-tolerant systems, performance & reliability evaluation, and application of artificial intelligence to industrial process-control.
Dr. Farokh B. Bastani; Department of Computer Science; University of Houston; Houston, Texas 77004 USA.
Farokh B. Bastani (M'82): For biography, see IEEE Trans. Reliability, vol 39, 1990 Jun.

Manuscript TR89-101 received 1989 July 10; revised 1990 April 2; revised 1991 February 4.
IEEE Log Number 00174