
Effect of Artificial-Intelligence Planning-Procedures on System Reliability

Ing-Ray Chen, Member IEEE

University of Mississippi, University

Farokh B Bastani, Member IEEE

University of Houston, Houston

Key Words - Real-time process-control system, Artificial intelligence, System reliability, Search heuristics

Reader Aids -

Purpose: Analyze a reliability model

Special math needed for explanations: Probability

Special math needed to use results: None

Results useful to: Computer designers and reliability analysts

Abstract - For an embedded real-time process-control system incorporating artificial-intelligence programs, the system reliability is determined by both the software-driven response computation time and the hardware-driven response execution time. A general model, based on the probability that the system can accomplish its mission under a time constraint without incurring failure, is proposed to estimate the software/hardware reliability of such a system. The factors which influence the proposed reliability measure are identified, and the effects of mission time, heuristics and real-time constraints on the system reliability with artificial-intelligence planning procedures are illustrated. An optimal search procedure might not always yield a higher reliability than that of a non-optimal search procedure. Hence, design parameters and conditions under which one search procedure is preferred over another, in terms of improved software/hardware reliability, are identified.

0018-9529/91/0800-0364$01.00 ©1991 IEEE

1 INTRODUCTION

In embedded computer systems, the computer is a part of a larger system such as an automated manufacturing system, a robot, or a defense system. The computer usually provides control functions and must operate in real-time to cope with deadlines. Typically, it executes an infinite loop in which it first reads the sensor values, then spends time $T_p$ to compute or plan a response, and time $T_e$ to execute the response. The reliability of an embedded computer system during time $t$ can be determined by viewing it as a series connection of statistically independent hardware and software components, so that:

$$R_{system}(t) = R_{hardware}(t)\,R_{software}(t) \qquad (1)$$

However, a more appropriate reliability measure is the probability that the system accomplishes its mission:

$$\Pr\{\text{mission accomplished}\} = \int_0^\infty \int_0^\infty \Pr\{\text{mission accomplished} \mid T_p = t_p,\, T_e = t_e\}\, dF(t_p,t_e) \qquad (2)$$

Notation

$F$  joint Cdf of the time to compute and execute the response; it depends on both the hardware and software parts of the system

The integrand in (2) can be expressed as:

Hardware failures are due to many things such as wear and tear on components, while software failures are due to:

• residual faults in the program
• use of suboptimal algorithms, such as heuristics
• failure to meet real-time constraints

We make the following general assumptions concerning the reliability of an embedded computer system.

1 Mission time. The mission time is $T_p + T_e$; the longer it is, the more likely that the software or the hardware will fail. $T_p$ & $T_e$ are inversely correlated, since generally the more time spent in planning (computing a response) the more likely that the strategy is optimal, and vice versa.

2 Hardware-component reliability. Hardware components have a variety of failure modes & mechanisms, at least some of which can be affected by the software, eg, imposing excessive "stress" on some hardware components. When software affects the reliability of hardware components, then software & hardware failures are only conditionally statistically independent.

3 Hardware-system reliability. The software can affect the form of $R_{hardware}(t)$. For example, consider a 2-component hardware system. One plan might require that both components be active in order to react to the sensor inputs, while another plan might require only one component to be active. When software affects the reliability of hardware in this way, then software & hardware failures are only conditionally statistically independent.

4 Residual software faults. If software is not modified, its failure rate due to faults that remain undetected is constant.

5 Intrinsic software faults. Such faults, if any, are due to fundamental limitations of the algorithm used in the software. For example, the use of heuristics can result in occasional failures even though the algorithm is devoid of any residual faults. This is modeled by the Pr{algorithm works correctly during planning time}.

6 Failures due to real-time constraints. In embedded computer systems, the environment can change dynamically. Hence, if the planning time is too long then, when the response is executed, the environment might have changed too much for the response to have any effect. These failures are often characterized by the inability of the system to meet real-time constraints. This is modeled by the Pr{plan is correct for sensor readings during the planning & execution times | plan is based on observations at time 0}.

All 6 elements can be combined to yield (4) for Pr{mission succeeds}; under the detailed assumptions of section 2 it becomes:

$$\Pr\{\text{mission succeeds}\} = \int_0^\infty \int_0^\infty R_h(t_p+t_e;a)\,P_c(t_p)\,P_v(t_p,t_e)\,dF(t_p,t_e) \qquad (5)$$
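As a rough numerical illustration of (5): when the joint distribution F is discrete, the double integral reduces to a weighted sum over the possible (planning time, execution time) pairs. The sketch below assumes a hypothetical three-point distribution, an exponential hardware reliability, no heuristic errors, and a plan that is valid only while the mission time stays within an illustrative deadline. The function name and all numbers are assumptions made for this example, not values from the paper.

```python
import math

def mission_success(atoms, r_h, p_c, p_v):
    """Evaluate (5) for a discrete F given as a list of (tp, te, prob) atoms."""
    total_prob = sum(prob for _, _, prob in atoms)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("atom probabilities must sum to 1")
    return sum(prob * r_h(tp + te) * p_c(tp) * p_v(tp, te)
               for tp, te, prob in atoms)

# Hypothetical parameters, for illustration only.
LAMBDA_H = 0.01  # constant hardware-failure rate
T_R = 10.0       # real-time constraint

def r_h(t):
    """Hardware reliability with a constant failure rate."""
    return math.exp(-LAMBDA_H * t)

def p_c(tp):
    """No failures caused by heuristics."""
    return 1.0

def p_v(tp, te):
    """The plan stays valid only if the mission finishes within T_R."""
    return 1.0 if tp + te <= T_R else 0.0

# Three possible (tp, te) outcomes with their probabilities.
atoms = [(2.0, 3.0, 0.5), (6.0, 3.0, 0.3), (9.0, 3.0, 0.2)]
print(mission_success(atoms, r_h, p_c, p_v))
```

The third outcome exceeds the deadline, so it contributes nothing to the sum even though the hardware may still be alive.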

Section 2 discusses the system reliability for two search techniques:

• generate and test
• heuristic pruning [8]

Example 1 illustrates the effect of mission time on the system reliability while example 2 considers failures due to the use of heuristics.

Section 3 illustrates the tradeoff between the response computation time and the response execution time for some search procedures in real-time situations and identifies conditions under which a non-optimal search strategy can provide a better system reliability than an optimal strategy.

2 SEARCH HEURISTICS

In this paper we focus on the application of some artificial-intelligence search strategies for fault-tolerant process-control systems and address the design tradeoff in the optimality of search strategies vs the satisfaction of real-time constraints.

Notation

$t_p$  planning time
$t_e$  execution time
$R_i$  reliability of component $i$ during time $t_p + t_e$
$\lambda_s$  constant software-failure rate due to residual faults
$\lambda_h$  constant hardware-failure rate (used only in examples)
$R_{h,gen}$(times; parameters)  general hardware-reliability function
$R_h(t;a)$  hardware-reliability function with parameter $a$, when hardware & software failures are statistically independent
$P_c(t)$  Pr{heuristics used in the planning method do work during time $t$}
$P_v(t_1,t_2)$  Pr{plan is valid during time $t_1 + t_2$ | plan is based on observations at time 0}
gauf(·)  standard normal (Gaussian) Cdf
$f_e(t)$, $F_e(t)$  pdf, Cdf of $t_e$

Other, standard notation is given in "Information for Readers & Authors" at the rear of each issue.

Detailed Assumptions

1 Hardware-system failure is statistically independent of software; this allows us to replace $R_{h,gen}$ by $R_h$, the effective hardware reliability.

2 In all the examples, $R_h = \exp(-\lambda_h t)$, for simplicity and tractability.

3 The software has been debugged completely, ie, $\lambda_s = 0$.

Then (4) becomes (5).

2.1 Effect of Mission Time

Assumptions

1 $P_c(t_p) = 1$, ie, there are no software errors due to the use of heuristics or other approximate algorithms.

2 $P_v(t_p,t_e) = 1$, ie, there are no real-time constraints.

3 $f_e(t_e) = \delta(t_e)$, ie, the response execution time is zero; $\delta(t)$ is the standard impulse function: it has unit area concentrated in the immediate vicinity of $t = 0$.

Then (5) becomes:

$$\Pr\{\text{mission succeeds}\} = \int_0^\infty R_h(t_p;a)\,f_p(t_p)\,dt_p \qquad (6)$$

Example 1

Consider a Generate & Test search procedure where each step -

• takes $T$ time units to process
• has a probability $p$ of passing the test

This implies that:

$$\Pr\{T_p = iT\} = p\,(1-p)^{i-1}, \quad i = 1, 2, \ldots \qquad (7)$$

Then,

$$\Pr\{\text{Search procedure terminates and the system is alive}\} = \sum_{i=1}^{\infty} R_h(iT;a)\,p\,(1-p)^{i-1} \qquad (8)$$

Example 1'

Same as example 1, except: To obtain a closed-form solution, make the additional simplistic assumption of constant hardware-failure rate. Then (8) becomes:

$$\Pr\{\text{Search procedure terminates and the system is alive}\} = \sum_{i=1}^{\infty} p\,(1-p)^{i-1} \exp(-\lambda_h iT) = \frac{p\,\exp(-\lambda_h T)}{1 - (1-p)\exp(-\lambda_h T)} \qquad (9)$$

Thus for this simplistic case, the reliability of the Generate & Test search procedure is good when the hardware failure rate is small and is poor when the hardware failure rate is large - a reasonable result.
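The behavior described in Example 1' can be checked numerically. The sketch below assumes that the number of Generate & Test steps is geometric with success probability p and that the hardware has a constant failure rate. It compares a truncated version of the series with its geometric-series closed form. The function names and parameter values are illustrative assumptions.

```python
import math

def generate_and_test_series(p, T, lam_h, terms=2000):
    """Truncated sum over i of p*(1-p)**(i-1) * exp(-lam_h*i*T)."""
    return sum(p * (1 - p) ** (i - 1) * math.exp(-lam_h * i * T)
               for i in range(1, terms + 1))

def generate_and_test_closed_form(p, T, lam_h):
    """Closed form of the same geometric series."""
    q = math.exp(-lam_h * T)  # hardware survival over one step
    return p * q / (1 - (1 - p) * q)

# Hypothetical values: 20% pass probability, unit step time.
for lam_h in (0.001, 0.05, 1.0):
    print(lam_h,
          generate_and_test_series(0.2, 1.0, lam_h),
          generate_and_test_closed_form(0.2, 1.0, lam_h))
```

With a failure rate of zero the closed form equals 1, since the search terminates with probability 1; the value falls as the failure rate grows.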

IEEE TRANSACTIONS ON RELIABILITY, VOL 40, NO 3 , 1991 AUGUST

2.2 Effect of Heuristics

Assumptions

1 $P_v(t_p,t_e) = 1$, ie, there are no real-time constraints.

2 $f_e(t_e) = \delta(t_e)$, ie, the response execution time is zero.

Then (5) becomes:

$$\Pr\{\text{mission succeeds}\} = \int_0^\infty R_h(t_p;a)\,P_c(t_p)\,f_p(t_p)\,dt_p \qquad (10)$$

Example 2

Consider the Branch & Bound method [8] which uses heuristics to limit the search for an optimal solution. The Generate & Test technique discussed in section 2.1 is not suitable where optimal or reasonably optimal solutions are required, since all solutions must be inspected.

Notation

$b$  branching factor of the resulting tree
$D$  maximum depth of the tree
$r$  Pr{heuristic does result in a correct answer}, viz, the reliability of the heuristic
$d$  depth of the search tree
$h(i)$  Pr{$i$ nodes are examined}
$T$  time duration to analyze each node
$U(a,b)$  uniform Cdf over $(a,b)$

Assumption

1 $d/D$ = Pr{correct answer is found | $d$}

Now, if the depth of the search tree is limited to $d$, then between $2b^{d/2}$ and $b^d$ nodes have to be examined [8]. The results are:

$T_p \sim U(2b^{d/2}T,\; b^dT)$

$P_c(iT) = rd/D$

Example 2'

Same as example 2, except: To obtain a closed-form solution, make the additional simplistic assumption of constant hardware-failure rate. Then -

$$\Pr\{\text{system completes the search successfully}\} = \sum_i h(i)\,\frac{rd}{D}\,\exp(-\lambda_h iT) \qquad (11)$$

When $h$ is the uniform distribution over $(2b^{d/2}, b^d)$, then (11) becomes:

$$R(r,d) = \frac{rd}{D} \cdot \frac{\exp(-\lambda_h 2b^{d/2}T) - \exp[-\lambda_h(b^d+1)T]}{(b^d - 2b^{d/2} + 1)\,(1 - \exp(-\lambda_h T))} \qquad (12)$$

The optimal value of $d$ can be obtained by solving:

$$\partial R(r,d)/\partial d = 0 \qquad (13)$$

Example 2"

Same as example 2 except: Examine each node up to depth $d$ without using any heuristics. Thus, $r = 1$, and the system reliability is:

The optimal value of $d$ is obtained by solving:

The optimal value of $d$ (for this example 2") is independent of the maximum depth of the tree $D$ ($1 \le d \le D$). Compare this with the heuristic search (example 2'); there is a value of $r$ below which a complete search results in a more reliable system.
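The depth tradeoff behind (12) can be explored by enumerating integer depths instead of solving for a stationary point. The sketch below evaluates the closed form (12), cross-checks it against a term-by-term sum for an even depth (where the lower bound on the node count is an integer), and picks the depth with the highest value. The function names, the restriction to depths where b^d is at least 2b^(d/2), and all parameter values are assumptions of this example.

```python
import math

def heuristic_search_reliability(r, d, D, b, T, lam_h):
    """Closed form (12): node count uniform between 2*b**(d/2) and b**d."""
    lo = 2 * b ** (d / 2)  # fewest nodes examined
    hi = b ** d            # most nodes examined
    if hi < lo:
        raise ValueError("need b**d >= 2*b**(d/2)")
    geometric = math.exp(-lam_h * lo * T) - math.exp(-lam_h * (hi + 1) * T)
    count = hi - lo + 1
    # -expm1(-x) computes 1 - exp(-x) accurately for small x
    return (r * d / D) * geometric / (count * -math.expm1(-lam_h * T))

def direct_sum(r, d, D, b, T, lam_h):
    """Term-by-term average of exp(-lam_h*i*T); requires an even depth d."""
    lo = 2 * b ** (d // 2)
    hi = b ** d
    terms = [math.exp(-lam_h * i * T) for i in range(lo, hi + 1)]
    return (r * d / D) * sum(terms) / len(terms)

def best_depth(r, D, b, T, lam_h):
    """Integer depth in 1..D that maximizes (12), among admissible depths."""
    candidates = [d for d in range(1, D + 1) if b ** d >= 2 * b ** (d / 2)]
    return max(candidates,
               key=lambda d: heuristic_search_reliability(r, d, D, b, T, lam_h))

# Hypothetical binary tree of maximum depth 6 with heuristic reliability 0.9.
print(best_depth(0.9, 6, 2, 1.0, 1e-6))  # reliable hardware favors deep search
print(best_depth(0.9, 6, 2, 1.0, 1.0))   # unreliable hardware favors shallow search
```

A deeper search raises the chance of a correct answer (the factor rd/D) but lengthens the mission time, so the preferred depth shrinks as the hardware-failure rate grows.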

3 REAL-TIME CONSTRAINTS

In section 2 we assumed that the system state does not change while the response is being computed. While this is sometimes true, such as when playing chess, it is not true for real-time systems. Normally, in real-time process-control systems, there is a stringent real-time constraint which must be satisfied. When a real-time situation arises, there is a response computation period in which an optimal or a near-optimal strategy must be formulated. This period is followed by a response execution time which activates the underlying hardware mechanisms and carries out the strategy. Very frequently, if the system spends too much time in the response computation period for formulating an optimal strategy, then there is a high risk that the response is not completed within the real-time constraint - since there is not enough time left for response execution. On the other hand, if a non-optimal strategy is selected in order to meet the real-time constraint, the reliability of the resulting strategy might not be acceptable due to the poor hardware reliability associated with the strategy selection. Thus, there is a tradeoff between response computation-time and response execution-time under a real-time situation.

Assumptions

1 $P_c(t_p) = 1$, ie, there are no planning errors.

2 $$P_v(t_p,t_e) = \begin{cases} 1, & \text{if } t_p + t_e \le t_R, \text{ the real-time constraint} \\ 0, & \text{otherwise} \end{cases} \qquad (16)$$

Then -

$$\Pr\{\text{mission succeeds} \mid \text{planning method}\} = \int_0^{t_R} r|_{T_p=t_p}\, f_p(t_p \mid \text{planning method})\, dt_p, \qquad (17)$$

$$r|_{T_p=t_p} \equiv \int_0^{t_R - t_p} R_h(t_p+t_e;a)\, f_e(t_e \mid \text{planning method})\, dt_e \qquad (18)$$

This section illustrates this tradeoff by investigating some artificial-intelligence search procedures. Specifically, we investigate the use of A*, which is known to be optimal [3], with some other search heuristics which, we show, can provide a better system reliability (hardware & software) under certain conditions. Our intention is not to explore findings of heuristics which would lead to better real-time performance of search procedures [4]; rather, we are interested in identifying the conditions under which a search strategy can provide a better system-reliability over the others. We restrict our analysis to a very simple problem-space so as to obtain tractable results.

3.1 Unique Solution Path

Notation

$h^*(n)$  minimum path cost from a node $n$ to a goal node
$h(n)$  an estimate of $h^*(n)$
$\epsilon$  heuristic error, a non-negative number (usually small & positive)
$T$  time duration to analyze each node

It is well known that in the context of a graph problem, A* (or its corresponding algorithms such as IDA* [2]) is optimal when its admissibility condition is satisfied. That is, it is guaranteed that A* always finds a minimal cost (optimal) path to a goal when $h(n) \le h^*(n)$ for all nodes $n$. However, in real situations, it is not practical to rely on the statement, "a heuristic always satisfies the inequality condition", other than for trivial cases such as $h(n) = 0$.

Pohl [6] has analyzed the worst case for A* when -

$$h^*(n) - \epsilon \le h(n) \le h^*(n) + \epsilon$$

In the problem space of infinite binary trees with unit cost on all the arcs of the search graph, Pohl concluded that, in the worst case, $k2^\epsilon + 1$ nodes have to be visited before the unique goal node, which resides at level $k$ of the tree, can be located. When this result is compared with the worst case of other search strategies (eg, pure Branch & Bound) which requires $2^{k+1} - 1$, we can perform a worst case tradeoff analysis for the two search strategies as follows.

In the infinite binary-tree problem-space when A* is used to search for the solution path, $t_p = (k2^\epsilon + 1)T$ in the worst case. The $f_e(t_e \mid \text{planning method})$ can be any probability function obtained from a large representative sample of problem instances pertaining to a system.

Example 3

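Let $F_e(t_e \mid \text{planning method}) = \text{gauf}[(t_e - \mu)/\sigma]$  (19)

This models a system in which the time to execute a path has a normal (Gaussian) distribution with mean $\mu$ and standard deviation $\sigma$. Hence, with $R_h = \exp(-\lambda_h t)$, (18) becomes:

$$r|_{T_p=t_p} = \exp(-\lambda_h t_p)\,\exp[(\lambda_h^2\sigma^2 - 2\mu\lambda_h)/2] \cdot [\text{gauf}((t_R - t_p + \lambda_h\sigma^2 - \mu)/\sigma) - \text{gauf}((\lambda_h\sigma^2 - \mu)/\sigma)] \qquad (20)$$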
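The Gaussian closed form can be sanity-checked against a direct numerical integration of (18). The sketch below assumes a constant hardware-failure rate and a normal execution-time density, and compares a midpoint-rule integral with the completed-square expression built from the standard normal Cdf. The function names, step count, and parameter values are illustrative assumptions.

```python
import math

def gauf(x):
    """Standard normal (Gaussian) Cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def r_given_tp_closed_form(tp, t_r, lam_h, mu, sigma):
    """Closed form of (18) for exponential R_h and Gaussian execution time."""
    if t_r <= tp:
        return 0.0  # no time left to execute the response
    shift = lam_h * sigma ** 2 - mu
    prefactor = math.exp(-lam_h * tp
                         + (lam_h ** 2 * sigma ** 2 - 2 * mu * lam_h) / 2)
    return prefactor * (gauf((t_r - tp + shift) / sigma) - gauf(shift / sigma))

def r_given_tp_numeric(tp, t_r, lam_h, mu, sigma, steps=20000):
    """Midpoint-rule integration of (18) over te in (0, t_r - tp)."""
    upper = t_r - tp
    if upper <= 0:
        return 0.0
    width = upper / steps
    total = 0.0
    for i in range(steps):
        te = (i + 0.5) * width
        pdf = (math.exp(-0.5 * ((te - mu) / sigma) ** 2)
               / (sigma * math.sqrt(2 * math.pi)))
        total += math.exp(-lam_h * (tp + te)) * pdf
    return total * width

# Hypothetical values: deadline 10, planning time 2, execution time N(4, 1).
print(r_given_tp_closed_form(2.0, 10.0, 0.05, 4.0, 1.0))
print(r_given_tp_numeric(2.0, 10.0, 0.05, 4.0, 1.0))
```

The second gauf term removes the probability mass that the normal model places on negative execution times, which matches the lower integration limit of 0 in (18).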

Example 3’

Same as example 3, except: $(\mu, \sigma) = (K_cC, 0)$

Notation

$K_c$  a constant
$C$  cost of the solution path obtained using the given planning method

The worst-case system reliability for A* occurs for $T_p = T(k2^\epsilon + 1)$:

$$r|_{T_p = T(k2^\epsilon+1)} = \begin{cases} \exp[-\lambda_h(k2^\epsilon T + T + K_cC)], & \text{if } t_R > T(k2^\epsilon + 1) + K_cC \\ 0, & \text{otherwise} \end{cases} \qquad (21)$$

The interpretation of (21) is intuitively clear -

$$\epsilon < \log_2[(t_R - K_cC - T)/(kT)] \qquad (22)$$

in order for A* to satisfy the real-time constraint, $t_R$. On the other hand, if the pure Branch & Bound heuristic is used to guide the search, then in the worst case the real-time constraint is met when -

$$k < \log_2[(t_R - K_cC + T)/T] - 1 \qquad (23)$$

The system reliability is:

$$\exp[-\lambda_h(2^{k+1}T - T + K_cC)]$$

when the inequality is true. From this example 3' we see that strategies that do not use heuristic functions are not necessarily worse than those that use heuristics - especially if certain information is given a priori. For example, if the depth of the unique goal state is known, then (23) can be used as a criterion for using the pure Branch & Bound heuristic to enhance the reliability of the system. This is particularly important when the heuristic error, $\epsilon$, is unknown.
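The worst-case comparison of example 3' is easy to tabulate. The sketch below assumes the worst-case node counts quoted in section 3.1 (k·2^ε + 1 for A* with heuristic error ε, and 2^(k+1) − 1 for pure Branch & Bound), a deterministic execution time K_c·C, and a constant hardware-failure rate. The function names and numbers are hypothetical.

```python
import math

def astar_worst_case(k, eps, T, lam_h, kc, cost, t_r):
    """Worst-case reliability for A*: (k*2**eps + 1) nodes are visited."""
    mission_time = (k * 2 ** eps + 1) * T + kc * cost
    return math.exp(-lam_h * mission_time) if t_r > mission_time else 0.0

def branch_and_bound_worst_case(k, T, lam_h, kc, cost, t_r):
    """Worst-case reliability for pure Branch & Bound: 2**(k+1) - 1 nodes."""
    mission_time = (2 ** (k + 1) - 1) * T + kc * cost
    return math.exp(-lam_h * mission_time) if t_r > mission_time else 0.0

# Hypothetical setting: goal at depth 4, unit node time, deadline 37.
for eps in (1, 3):
    print(eps,
          astar_worst_case(4, eps, 1.0, 0.01, 1.0, 5.0, 37.0),
          branch_and_bound_worst_case(4, 1.0, 0.01, 1.0, 5.0, 37.0))
```

With a small heuristic error A* finishes well inside the deadline, while with a large error its worst case visits more nodes than the uninformed search and misses the deadline that Branch & Bound still meets.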

3.2 Multiple Solution Paths

In section 3.1, we assumed that there is a unique solution path. Here, we consider the possibility of multiple paths, all of which could lead to the same goal state. Certainly, of these multiple solution paths, some are optimal whereas others are not. Therefore, there is a difference in the quality of the solution. Generally, the more time one invests in finding a solution, the more likely that solution is close to optimal. However, investing more time might not be permitted in real-time and a near-optimal solution might be desired as a compromise between the real-time constraint and the maximum system-reliability.

This section illustrates this compromise by comparing two search strategies using two heuristics: A* and hill-climbing [8]. A* is well known in the domain of artificial-intelligence search. Hill-climbing represents an extreme case where search efficiency, rather than solution quality as used in A*, is used to guide the search for a solution path.

Assumptions for Hill Climbing

1 The search space is an infinite binary tree (as in the analysis in section 3.1).

2 Arcs no longer have the same weight.

3 Unlike A*, which uses arc weight as the search heuristic, hill climbing uses the remaining number of nodes to guide the search for a solution path.

4 To obtain maximum search efficiency at the expense of solution quality, the search is streamlined from level to level without checking whether there are other nodes at the same level that may lead to a more optimal solution path.

5 To ensure that there is at least one solution -

a All leaf nodes are solution nodes.

b The search tree has the monotonic and admissible property [4]:

$$h(n) \le h(n') + C(n,n') \qquad (25)$$

Notation

$n'$  any successor node of $n$
$C(n,n')$  actual distance between $n$ and $n'$
$k$  depth of an optimal solution-path in the tree

Assumptions for A*

1 [same as Hill Climbing assumptions 1, 2 and 5]

Thus, $C(n,n') = 1$ for hill climbing and $C(n,n') > 0$ for A*.

Number of Nodes to Be Visited

A Hill Climbing: $k + 1$ in the worst case, 2 in the best case, and $(k+3)/2$ for the average (assuming a uniform distribution over $[2, k+1]$).

B A*: Between $k + 1$ and $2^{k+1} - 1$, and $2^k + k/2$ for the average (assuming a uniform distribution, as in A). The worst-case upper bound occurs when the probability that the relative error exceeds some fixed positive quantity is greater than 1/2 [1].

The gain in search efficiency in hill climbing is associated with the decline in the quality of the solution:

$$C_{hc} = (1 + \epsilon_{hc})\,C_{A^*} \qquad (26)$$

Notation

$C_{hc}$  cost of the solution path associated with hill climbing
$C_{A^*}$  cost of the solution path associated with A*
$\epsilon_{hc}$  non-negative number indicating the degradation of the solution quality
Pr{$j$}  Pr{$j$ nodes are expanded}

The mean system-reliability can be calculated by conditioning on Pr{$T_p = jT$}:

$$r = \sum_j \Pr\{j\}\; r|_{T_p=jT} = \sum_j \Pr\{j\} \int_0^{t_R - jT} R_h(jT+t_e;a)\, f_e(t_e \mid \text{planning method})\, dt_e \qquad (27)$$

The last equality in (27) follows from (18).
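A minimal sketch of (27) follows, under simplifying assumptions chosen for this illustration: the execution time is deterministic (K_c·C for A*, and K_c·(1 + ε_hc)·C for hill climbing), the hardware-failure rate is constant, and the number of expanded nodes is uniform over [k+1, 2^(k+1) − 1] for A* and over [2, k+1] for hill climbing. With a deterministic execution time the inner integral collapses to a single term that counts only when the deadline is met. The function names and parameter values are hypothetical.

```python
import math

def mean_reliability(node_counts, T, lam_h, t_exec, t_r):
    """Average of (27) with Pr{j} uniform over node_counts and fixed t_exec."""
    pr_j = 1.0 / len(node_counts)
    total = 0.0
    for j in node_counts:
        mission_time = j * T + t_exec
        if t_r > mission_time:  # response completes within the deadline
            total += pr_j * math.exp(-lam_h * mission_time)
    return total

def compare(k, T, lam_h, kc, c_astar, eps_hc, t_r):
    """Return (A* reliability, hill-climbing reliability)."""
    r_astar = mean_reliability(range(k + 1, 2 ** (k + 1)), T, lam_h,
                               kc * c_astar, t_r)
    r_hc = mean_reliability(range(2, k + 2), T, lam_h,
                            kc * (1 + eps_hc) * c_astar, t_r)
    return r_astar, r_hc

# Hypothetical stringent deadline: hill climbing wins while eps_hc is small
# and loses its advantage once the degraded path takes too long to execute.
print(compare(k=3, T=1.0, lam_h=0.05, kc=1.0, c_astar=5.0, eps_hc=0.2, t_r=9.5))
print(compare(k=3, T=1.0, lam_h=0.05, kc=1.0, c_astar=5.0, eps_hc=1.0, t_r=9.5))
```

In the first call only the shortest A* search meets the deadline, so hill climbing is more reliable; in the second, the longer hill-climbing path never finishes in time.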

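Example 4

Assumptions

1 Pr{$j$} is the uniform distribution over $[k+1, 2^{k+1} - 1]$ for A* and over $[2, k+1]$ for hill climbing.

2 The hardware failure rate is a constant, $\lambda_h$.

3 $F_e(t_e)$ is given by (19), with parameters $(\mu_y, \sigma_y)$ for planning method $y$ = A*, hc.

Then (27) and (20) give:

$$r(A^*) = \frac{\exp[(\lambda_h^2\sigma_{A^*}^2 - 2\mu_{A^*}\lambda_h)/2]}{2^{k+1} - k - 1} \sum_{j=k+1}^{2^{k+1}-1} \exp(-\lambda_h jT)\,[\text{gauf}((t_R - jT + \lambda_h\sigma_{A^*}^2 - \mu_{A^*})/\sigma_{A^*}) - \text{gauf}((\lambda_h\sigma_{A^*}^2 - \mu_{A^*})/\sigma_{A^*})] \qquad (28)$$

$$r(hc) = \frac{\exp[(\lambda_h^2\sigma_{hc}^2 - 2\mu_{hc}\lambda_h)/2]}{k} \sum_{j=2}^{k+1} \exp(-\lambda_h jT)\,[\text{gauf}((t_R - jT + \lambda_h\sigma_{hc}^2 - \mu_{hc})/\sigma_{hc}) - \text{gauf}((\lambda_h\sigma_{hc}^2 - \mu_{hc})/\sigma_{hc})] \qquad (29)$$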

Example 4'

Same as example 4, except: $(\mu_y, \sigma_y) = (K_cC_y, 0)$ where $y$ = A*, hc - similar to example 3'. Then (28) & (29) simplify to:

$$r(A^*) = \frac{\exp(-\lambda_h K_cC_{A^*})}{2\,(2^{k+1} - k - 1)} \sum_{j=k+1}^{2^{k+1}-1} \exp(-\lambda_h jT)\,(S_j(A^*) + 1) \qquad (30)$$

$$S_j(A^*) = \begin{cases} 1, & \text{if } t_R > jT + K_cC_{A^*} \\ -1, & \text{otherwise} \end{cases} \qquad (31)$$

$$r(hc) = \frac{\exp[-\lambda_h K_c(1+\epsilon_{hc})C_{A^*}]}{2k} \sum_{j=2}^{k+1} \exp(-\lambda_h jT)\,(S_j(hc) + 1) \qquad (32)$$

$$S_j(hc) = \begin{cases} 1, & \text{if } t_R > jT + K_c(1+\epsilon_{hc})C_{A^*} \\ -1, & \text{otherwise} \end{cases} \qquad (33)$$

Eq (30) - (33) imply:

If the real-time constraint can be satisfied for both A* and hill climbing, ie, for all $j$, $S_j(A^*) = S_j(hc) = 1$, then A* will have a better system reliability than that obtained from hill climbing, ie,

$$\exp[-\lambda_h(t_p + K_cC_{A^*})] \ge \exp[-\lambda_h(t_p + K_c(1+\epsilon_{hc})C_{A^*})]$$

This is so because $\epsilon_{hc} \ge 0$.

The advantage of hill-climbing strategy is that it has a higher probability of satisfying the real-time constraint as can be seen by the condition required by it during the response computation period; however, this advantage disappears as $\epsilon_{hc}$ becomes larger, ie, when:

$$\epsilon_{hc} > \frac{t_R - (k+1)T}{K_cC_{A^*}} - 1 \qquad (34)$$

From these results, we see that there is a tradeoff between the system reliability and the satisfaction of the real-time constraint especially when the constraint, $t_R$, is stringent.

ACKNOWLEDGMENT

... Foundation under grant CCR-9110816

REFERENCES

[1] N Huyn, R Dechter, J Pearl, "Probabilistic analysis of the complexity of A*", Artificial Intelligence, vol 15, 1980, pp 241-254

[2] R E Korf, "Depth-first iterative-deepening: an optimal admissible tree search", Artificial Intelligence, vol 27, 1985, pp 97-109

[3] J Pearl, "Some recent results in heuristic search theory", IEEE Trans Pattern Analysis and Machine Intelligence, vol PAMI-6, 1984 Jan, pp 1-12

[4] J Pearl, Heuristics, 1984; Addison-Wesley

[5] J Pearl, J H Kim, "Studies in semi-admissible heuristics", IEEE Trans Pattern Analysis and Machine Intelligence, vol PAMI-4, 1982 Jul, pp 392-399

[6] I Pohl, "First results on the effect of error in heuristic search", Machine Intelligence, vol 5, 1970, pp 219-236

[7] N Viswanadham, V V S Sarma, M G Singh, Reliability of Computer and Control Systems, 1987; North Holland

[8] P H Winston, Artificial Intelligence, 2nd edition, 1984; Addison-Wesley

AUTHORS

Dr Ing-Ray Chen; Department of Computer and Information Science; Weir 302; University of Mississippi, University, Mississippi 38677 USA

Ing-Ray Chen (S'86, M'90) received the BS from the National Taiwan University in 1978, and the MS & PhD in Computer Science from the University of Houston, University Park in 1985 & 1988. He is an Assistant Professor of Computer and Information Science at the University of Mississippi. His research interests include distributed systems, fault-tolerant systems, performance & reliability evaluation, and application of artificial intelligence to industrial process-control.

Dr Farokh B Bastani; Department of Computer Science; University of Houston; Houston Texas 77004 USA

Farokh B Bastani (M'82): For biography, see IEEE Trans Reliability, vol 39, 1990 Jun

Manuscript TR89-101 received 1989 July 10; revised 1990 April 2; revised 1991 February 4

IEEE Log Number 00174
