Using a simulated user to explore human-robot interfaces

D. VAN ROOY, FRANK E. RITTER, Member, IEEE, AND R. ST.-AMANT, Member, IEEE

Manuscript for the Special Issue on Human-Robot Interaction, IEEE Transactions on Systems, Man and Cybernetics.


Human-robot interfaces (HRI) can be difficult to use. We examine urban search and rescue (USR) robots as an example. We present here a theory of their use based on a simulated user written in the ACT-R cognitive modeling language. The model, using a simulated eye and hand, interacts directly with an unmodified and simple tele-operating task of maneuvering in an environment to avoid other moving objects. The model user also performs a secondary task. In addition to describing the knowledge the human operator must have, as well as what aspects of the task will be difficult for the operator, the model makes quantitative predictions about how the speed of the robot influences the quality of the navigation and performance on the secondary task. These results are examples of the types of outputs available from a model user.

As the model now interacts with the USR simulator using only the bitmap, the model should be widely applicable to testing other simulators and to actual robots. The model already suggests why human-robot interfaces are difficult to use and where they can be improved.

Index Terms

cognitive model, ACT-R, human-robot interfaces

I. INTRODUCTION

In the future, it might be that robots will become completely autonomous and will act largely independently. However, such a level of independence has not yet been achieved and is in some cases simply undesirable. Many of the tasks that robots face today, like exploration, reconnaissance, and surveillance, will continue to require supervision [1]. Furthermore, people often do not have enough confidence in a completely autonomous robot to let it operate independently. So it seems that the level to which the use of robots will be integrated in our society will depend largely on the robots' ability to communicate with humans in understandable and friendly modalities [2].

Despite its importance, a general theory of human-robot interface use seems to be lacking. Many human-robot interfaces do not even respect the most fundamental HCI principles. In this paper, we present the beginnings of a theory that indicates the issues that make human-robot interfaces difficult to use. Concurrently, we present a quantitative tool in the form of a simulated user that can be used to identify problems associated with human-robot interface use. Specifically, we introduce a methodology in which a cognitive model autonomously exercises human-robot interfaces, indicating ways to improve the interface and laying bare problems that can serve as starting points for a general theory of human-robot interface use.


One of the reasons that there does not seem to be a general theory of human-robot interface use is the complexity of the task domain, which is reflected in the diversity of types of human-robot interactions. An application that illustrates this well is robot-assisted Urban Search and Rescue (USR). USR involves the detection and rescue of victims from urban structures like collapsed buildings. Because of the extreme physical and perceptual demands of USR, these applications are usually mixed-initiative human-robot interactions, in which a human operator and a robot interact in some manner to produce adequate performance [3]. This means that it might be optimal for the robot to exhibit a fair amount of autonomy in some situations, for instance, in navigating in a confined space using its own sensors. However, other situations might require human intervention: an operator may have to assist in freeing a robot because its sensors do not provide enough information for autonomous recovery [3]. And yet further interventions, some only imagined, such as providing medication to trapped survivors, will legally require human intervention. This illustrates how, in the case of enhanced robot autonomy, the role of the operator could often shift from control to monitoring and diagnosis [1].

There are several reasons why principles from HCI are missing from many human-robot systems. First of all, the task domain of human-robot systems is more complex and diverse, making it very hard to meet the needs of diverse users or to come up with a general metaphor. Furthermore, these systems are typically more expensive than regular commercial software packages. At the same time, they are not built as often as regular software, and when they are built, it is usually not by people trained in HCI. Currently, USR robots are directly driven by operators. As they become more autonomous, these problems will become more complex. What is needed is a way to test and improve these interfaces.

II. USING A SIMULATED USER TO EXPLORE HUMAN-ROBOT INTERFACES

In this section, we introduce a cross-platform architecture in which a cognitive model simulates user performance. Specifically, we introduce a simulated user, consisting of a cognitive model and a pair of simulated eyes and hands, that can be applied to sample human-robot interfaces (or, with additional knowledge, to any other interface for that matter). Ultimately, the intention is to provide a quantitative tool to guide the design process of human-robot interfaces. This tool will enable designers to apply psychological theories in real time, providing a simulated user that acts like, and interacts with the same interface as, a real user.

A cognitive model forms the cognition of our simulated user. A cognitive model is a theory of human cognition realized as a running computer program. It produces human-like performance in that it takes time, commits errors, deploys strategies, and learns. It presents a means of applying cognitive psychology data and theory to HCI problems in real time and in an interactive environment [4-6]. We have developed a system consisting of the cognitive architecture ACT-R [7] and a simulated eyes and hands suite called Segman [8] that can be applied to virtually any type of interface running on any operating system. We will begin by describing the parts that make up the system and then provide a demonstration. Subsequently, we will discuss how this


system can be applied as a simulated user to explore human-robot interaction, and how it supports explanations of users' behavior and evaluation of interfaces.

The ACT-R architecture integrates theories of cognition [7], visual attention [9], and motor movement [10]. It has been applied successfully to higher-level cognition phenomena, such as modeling scientific reasoning [11], differences in working memory [12], and skill acquisition [13], to name but a few. Recently it has been applied successfully to a number of HCI issues [14] [15] [6]. ACT-R makes a distinction between two types of long-term knowledge, declarative and procedural. Declarative knowledge is factual and holds information like "2 + 3 = 5" or "George Bush is the president of the USA". The basic units of declarative knowledge are chunks, which are schema-like structures, effectively forming a propositional network. Procedural knowledge consists of production rules that encode skills and take the form of condition-action pairs. Production rules correspond to specific goals or sub-goals, and mainly retrieve and change declarative knowledge.
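To make this concrete, the sketch below shows what a chunk and a condition-action production rule can look like in ACT-R's Lisp-based modeling language. The chunk type, slot names, and the rule itself are hypothetical illustrations written for this explanation; they are not taken from the model described later in this paper.

(chunk-type addition-fact addend1 addend2 sum)

(add-dm
 (fact23 isa addition-fact addend1 2 addend2 3 sum 5))

(p recall-sum
   ;; condition side: the goal holds two addends but no sum yet
   =goal>
      isa addition-fact
      addend1 =a
      addend2 =b
      sum nil
==>
   ;; action side: ask declarative memory for the matching fact
   +retrieval>
      isa addition-fact
      addend1 =a
      addend2 =b)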

Besides the symbolic procedural and declarative components, ACT-R also has a sub-symbolic component that determines the use of the symbolic knowledge. Each symbolic construct, be it a production or a chunk, has sub-symbolic parameters associated with it that reflect its past use. In this way, the system keeps track of the usefulness of the symbolic information. Which information is currently available in the declarative memory module is partially determined by the odds that a particular piece of information will be used in that context.
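One of the standard sub-symbolic quantities is a chunk's base-level activation, which grows with frequent and recent use. A minimal sketch of ACT-R's base-level learning equation, B = ln(sum over past uses of t^(-d)), is given below; the decay of 0.5 is the conventional default, and the usage times in the example are made up.

(defun base-level-activation (times-since-use &key (decay 0.5))
  "Base-level activation of a chunk, given the times (in seconds) since
each of its past uses: B = ln(sum over uses of t^(-decay))."
  (log (reduce #'+ (mapcar (lambda (tj) (expt tj (- decay)))
                           times-since-use))))

;; A chunk used 2, 10, and 50 seconds ago:
;; (base-level-activation '(2 10 50))  ; => roughly 0.15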

An important aspect of the ACT-R architecture is that models created in it predict human behavior qualitatively and quantitatively: each covert step of cognition (production firing, retrieval from declarative memory, procedural knowledge application) and each overt action (a mouse click, moving visual attention) has a latency associated with it that is based on psychological theories and data. For instance, taking a cognitive action, such as firing a production rule, takes 50 ms (modulated by other factors such as practice), and the time needed to move a mouse is calculated using Fitts' law (e.g., [16]). In this way, the system provides a way to apply psychological knowledge in real time.
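As an illustration of how such timing predictions are derived, the sketch below computes a pointing time from Fitts' law in its common Shannon formulation, MT = a + b log2(D/W + 1). The intercept and slope used here are placeholder values chosen for the example, not the coefficients used by ACT-R/PM.

(defun fitts-movement-time (distance width &key (a 0.1) (b 0.1))
  "Predicted time in seconds to point at a target of size WIDTH at
distance DISTANCE, using the Shannon formulation of Fitts' law.
A and B are illustrative coefficients, not ACT-R/PM's calibrated values."
  (+ a (* b (log (+ (/ distance width) 1) 2))))

;; A 20-pixel-wide button 200 pixels away:
;; (fitts-movement-time 200 20)  ; => about 0.45 seconds with these values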


Figure 1. ACT-R 5 system diagram. The production system and buffers run in parallel, but each component is itself serial. The graded areas indicate the novel functionality provided by SEGMAN that overrides the original perceptual-motor functionality of ACT-R 5, which is indicated by the dashed lines.


A schematic of the current implementation of the theory, ACT-R 5.0 (act.psy.cmu.edu/ACT-R_5.0), is shown in Figure 1. At the heart of the architecture is a production system, which represents central cognition and interacts with a number of buffers. These buffers represent the information that the system is currently acting on: the Goal buffer contains the present goal of the system, the Declarative buffer contains the declarative knowledge that is currently available, and the perceptual and motor buffers indicate the state of the perceptual and motor modules (busy or free, and their contents). The communication between central cognition and the buffers is regulated by production rules. As mentioned, production rules are condition-action pairs: the first part of a production rule, the condition side, typically tests whether certain declarative knowledge (in the form of a chunk) is present in a certain buffer. The second part, the action side, then sends a request to a buffer to either change the current goal, retrieve knowledge from a buffer such as declarative memory, or perform some action.

The perceptual and motor buffers allow the model to "look" at an interface and manipulate objects in that interface. The perceptual buffer builds a representation of the display in which each object is represented by a feature. Productions can send commands to the perceptual buffer to direct attention to an object on the screen and create a chunk in declarative memory that represents that object and its location on the screen. The production system can then send commands, initiated by a production rule, to the motor buffer to manipulate these objects.

Central cognition and the various buffers run in parallel with one another, but each of the perceptual and motor buffers is serial (with a few rare exceptions) and can only contain one chunk of information. This means that the production system might retrieve a chunk from declarative memory while the perceptual buffer shifts attention and the motor buffer moves the mouse. We will mainly concentrate on the motor and perceptual buffers, which are most relevant for our purpose.

ACT-R 5 in its current release (act.psy.cmu.edu) interacts with interfaces using its perceptual-motor component (ACT-R/PM). ACT-R/PM [15] includes tools for creating interfaces and annotating existing interfaces in Macintosh Common Lisp so that models can see and interact with objects in the interface. This allows most models to interact in some way with most interfaces that are written in that language, and lets all models interact with all interfaces written with the special tools.

For our simulations, we developed a more general version of ACT-R/PM, which provides ACT-R 5 direct access to an interface, thus removing the need for a specific interface creation tool. This is done by extending ACT-R/PM with the Segman suite (www.csc.ncsu.edu/faculty/stamant/cognitive-modeling.html).


As Figure 1 shows, Segman [8] takes pixel-level input from the screen (i.e., the screen bitmap), runs the bitmap through image processing algorithms, and builds a structured representation of the screen. This representation is then passed to ACT-R through the ACT-R/PM theory of visual perception (i.e., the perceptual buffer). ACT-R/PM moderates what is visible and how long it takes to see and recognize objects.

Segman can also generate mouse and keyboard inputs to manipulate objects on the screen. This functionality is called through the ACT-R/PM theory of motor output, but we have extended the output results to work with any Windows interface. This is done by creating very primitive events (click icon, select button, etc.), which are implemented as functions at the operating system level. As such, they are indistinguishable from human-generated events. Currently, we have a fully functional system that runs under Windows 98 and 2000.

III. THE MODEL OF ROBOT DRIVING

We will now describe an implementation of our system called DUMAS (pronounced [doo 'maa]; see also smartAHS [17]), which stands for Driver User Model in ACT-R & Segman. DUMAS drives a car in a Java-implemented game, which was downloaded from www.theebest.com/games/3ddriver/3ddriver.shtml. For the simulations reported below, no changes were made to the game.

We chose the 3D driver game for several reasons. First, it has a direct interface, in that the operator directs the car using the keyboard. This perspective is often referred to as "inside-out" driving, because the operator feels as if she is inside the vehicle and looking out, and it is a common method for vehicle or robot tele-operation [1]. Second, driving behavior is a prototypical example of real-time, interactive decision making in an interactive environment [18] [14] and is as such comparable to many tele-operated robot tasks. Third, the source code is extensible, which means that aspects of the environment (e.g., slow or fast driving) and of the interface (e.g., bigger or smaller buttons) can be manipulated in a controlled fashion. Because the code is Java, this can be done on multiple platforms. And finally, because we did not write it, it helps to show the generality of this approach.

Models of driving have been targets of research for decades (the analysis of Gibson and Crooks in 1938 provides one of the earliest examples [19]; see Bellet and Tattegrain-Veste [20] for a concise historical overview from a cognitive ergonomics perspective). The hierarchical risk model of van der Molen and Botticher is a representative example of recent models [21]. Driving can be seen as structured into strategic, tactical, and operational levels. Moving up the hierarchy, each level describes an increasingly abstract set of behaviors that govern choices at the level below it. At the strategic level, planning activity takes place, such as the choice of route and travel speed. At the tactical level, decisions encompass more concrete, situation-dependent actions, such as lane changing, passing, and so forth. The operational level describes skilled but routine activities, such as steering and acceleration.

The different levels of abstraction represent different demands on the cognitive, perceptual, and motor abilities of the driver. For example, feedback from assistive technology such as ABS or power steering is provided at the operational level through haptic channels, often imperceptibly. Feedback for travel speed, in contrast, requires some cognitive activity at the strategic level, to interpret speedometer readings. If the feedback channels from these different activities were reversed (e.g., if the driver had to interpret a numerical value to determine power steering assist), their usability would be seriously impaired. Many task domains in HRI, in particular urban search and rescue, share this layered structure.

We set out to let the model perform some standard tasks, like staying on track, avoiding traffic, and increasing or decreasing speed. At this point, the model can start the game by clicking the mouse on the game window, accelerate by pushing the "A" key, brake by pushing the "Z" key, and steer by using the left and right arrow keys.

Perceptual processing in the model is based on observations from the literature on human driving, as is common for other driving models. Land and Horwood's [22] study of driving behavior describes a "double model" of steering, in which a region of the visual field relatively far away from the driver (about 4 degrees below the horizon) provides information about road curvature, while a closer region (7 degrees below the horizon) provides position-in-lane information. Attention to the visual field at an intermediate distance, 5.5 degrees below the horizon, provides a balance of this information, resulting in the best performance.

The visual interface of the 3D Driver Game, which is the same interface the model uses, is shown in Figure 2. The default procedure for perception in the model is as follows. The model computes position-in-lane information by detecting the edges of the road and the stripes down the center of the road. The stripes are combined into a smoothed curve to provide the left boundary of the lane, while the right edge of the road directly gives the right lane boundary. The model computes the midpoint between these two curves at 5.5 degrees below the horizon. This point, offset by a small amount to account for the left-hand driving position of the car, is treated as the goal. If the center of the visual field (the default focal point) is to the right of this point, the model must steer to the right, otherwise to the left.
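A minimal sketch of this default steering decision is shown below. It reduces the computation to a single look-ahead scan line at 5.5 degrees below the horizon; the function and parameter names are our own illustration rather than code taken from the model.

(defun steering-direction (lane-left-x lane-right-x focal-x &key (offset 0))
  "Given the x coordinates (in pixels) of the left and right lane
boundaries at the look-ahead scan line, the x coordinate of the default
focal point, and a small OFFSET for the left-hand driving position,
return which way to steer, following the rule described in the text."
  (let ((goal-x (+ (/ (+ lane-left-x lane-right-x) 2) offset)))
    (if (> focal-x goal-x)
        :steer-right    ; focal point lies to the right of the goal point
        :steer-left)))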

Perceptual processing in the model has limitations. For example, it is not entirely robust: determining the center of the lane can break down if the road is curving off too fast in one direction. Segman can also return some of the information that it has extracted. For example, it can determine road curvature from more distant points, as is done in models of human driving [23]. However, this has not yet led to improved performance in this simulation environment.


In its current form, the model has problems staying on the road because visual cognition is not yet perfect. The amount of change in speed depends on how the visual environment is changing. At the moment, the model takes snapshots of the whole visual scene to determine its actions. It detects changes by recording the locations of specific points in the visual field and then measuring the distance they move from one snapshot to the next. It turns out that this is not a good way to handle visual flow: suppose that at time t the model analyzes the road, records the data for estimating visual flow, and determines that steering in one direction or another is appropriate. At time t+1 some steering command is issued, and the simulated car moves in that direction. At time t+1 or later the road is again analyzed so that flow can be computed, but at this point the action of the model has resulted in changes in the visual field, independent of changes that would have occurred otherwise. This contribution needs to be accounted for, or the car might end up braking every time it steers.
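The snapshot-differencing scheme, and the confound it creates, can be sketched roughly as follows. The point representation and the averaging are simplifications introduced for this illustration and are not the model's actual code.

(defun mean-visual-flow (previous-points current-points)
  "Estimate visual flow as the mean distance, in pixels, that a set of
tracked screen points (stored as (x . y) pairs) moved between two
successive snapshots of the scene."
  (/ (reduce #'+ (mapcar (lambda (p q)
                           (let ((dx (- (car q) (car p)))
                                 (dy (- (cdr q) (cdr p))))
                             (sqrt (+ (* dx dx) (* dy dy)))))
                         previous-points current-points))
     (length previous-points)))

;; The difficulty described above: part of this displacement is produced by
;; the model's own steering between the two snapshots, and that contribution
;; would have to be subtracted before the remainder is used to adjust speed.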

The model still represents a rather restricted model of driving behavior. Whereas previous driving models use more than 40 rules [18], the complete behavior of DUMAS is currently determined by only 20 production rules. Foremost, this reflects that the production system of DUMAS does not yet use the full range of perceptual-motor capabilities offered by the ACT-R architecture through the ACT-R/PM theory. Nevertheless, the demonstrations below will illustrate that even this relatively simple ACT-R model already demonstrates some of the approach's capabilities and produces behavior that is fully in line with more established models of driving.

B. Two demonstrations

We provide two example analyses of the 3D driver game interface. The first one assesses the influence of speed on the ability to drive; the second examines how multi-tasking influences driving. These demonstrations are really proofs of existence. They are examples of the type of measures that would be helpful in testing and designing more advanced human-robot interfaces. In order to increase the realism of our simulation, we will need to expand the perceptual-motor capabilities of the ACT-R model. However, even though the model at this point only simulates constraints in cognitive functioning, it is able to simulate realistic driving behavior. Figure 3 shows a screenshot of the desktop during a simulation run. On the left is a GNU Emacs window, in which a trace of the cognitive model appears (which we discuss later). The top right of the figure shows the debug window of the Allegro Lisp package.


Figure 2: Two snapshots of the driving environment.


DUMAS starts the game autonomously by clicking on the game window shown in the bottom right of the figure. Note that the model "knows" where the gaming window resides on the desktop. By limiting the attention of the model to the position and dimensions of the game window, we create a virtual bounding box on the screen.
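The virtual bounding box amounts to a simple containment test on screen coordinates; a hedged sketch with hypothetical names is given below.

(defstruct game-window left top width height)

(defun inside-game-window-p (x y window)
  "Is the screen point (X, Y) inside the virtual bounding box that
restricts the model's attention to the game window?"
  (and (<= (game-window-left window) x
           (+ (game-window-left window) (game-window-width window)))
       (<= (game-window-top window) y
           (+ (game-window-top window) (game-window-height window)))))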

Next, the model accelerates, drives at a constant speed, and slows down if necessary (e.g., in a sharp curve). Because at this point the model cannot pass, a run typically ends when the model hits traffic in its own lane. Runs can also end when it commits an error and runs off the road. Figure 4 shows the results of the model simulation on two dependent measures: lateral deviation, which shows the position of the car with respect to the center of the right lane, and total driving time in minutes. These should be seen as example measures. The model and architecture can provide other measures, such as working memory load, spare capacity in the processor, and learning.

Speed

In the speed demonstration, DUMAS completed three sets of 10 runs in the 3D Driving Game: one at low, one at medium, and one at high speed. We looked at the influence of speed on total driving time and the amount of lane deviation, which reflects the ability of the model to stay on its ideal line of driving. Lane deviation is commonly used in driving studies to measure the influence of factors such as multi-tasking and drug use [24]. Figure 4 summarizes the results.


Figure 3: Screen capture showing a GNU Emacs window on the left, an Allegro Lisp window in the top right corner, and the driving game in the bottom right corner.


The left panel of Figure 4 shows that the model predicts that average lane deviation will increase as speed increases, which is in line with experimental data and previous models. The model needs a certain amount of time to update its representation of the environment, mainly determined by constraints built into the ACT-R model. As a result, the distance between steering adjustments increases as speed increases, thus leading to larger lane deviations.

The right panel of Figure 4 shows how total average driving time, measured in minutes, drops significantly as speed increases. The explanation for this is made clear by the types of errors in each condition: because more distance passes between two steering adjustments, the chance of accidents also increases. In the Slow condition, DUMAS had only 3 accidents, compared to 7 and 10 in the Medium and Fast conditions, respectively.

Multi-tasking

In the multi-tasking demonstration, we illustrate how dividing the model's attention produces the same effect as increasing speed. In essence, the model's performance is determined by the speed and accuracy with which it reacts and adapts to the environment. As a consequence, anything that diverts attention from driving will affect performance. More precisely, the time between updating moments will increase, leading to behavior that is less adapted to the environment.

To simulate the influence of anxiety as a dual task, we added useless knowledge to the system, designed to interfere with driving. Specifically, we added to the model's procedural knowledge simple rules that can fire at any time while the model is driving. This simulates the influence of distracting thoughts, as well as the effects of reduced working memory: due to the serial nature of rule firing in ACT-R, whenever one of the useless rules fires, it results in a slowing down of the execution of the relevant driving productions. As a result, performance will be more error prone (for related work, see ritter.ist.psu.edu/acs-lab/#ACT-R/AC) [25].
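Such a distractor rule might look like the hypothetical ACT-R-style sketch below. The goal chunk type and the rule's content are invented for this illustration; the point is only that the rule matches whenever driving is under way and, by firing, consumes a production cycle that a driving rule could otherwise have used.

(p worry-about-world-cup
   ;; condition side: matches at any time while the model is driving
   =goal>
      isa drive
==>
   ;; action side: contributes nothing to driving; its only effect is to
   ;; occupy one serial production-firing cycle
   !output! ("thinking about the World Cup"))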


Figure 4. Speed demonstration: lane deviation (in degrees) and total driving time (in minutes) of DUMAS as a function of speed. Slow corresponds to a driving speed within the range of 15-20, medium 20-25, and fast 30-35, as measured on the speedometer in the simulation.


We compared the slow speed condition from the speed demonstration (Standard) to a condition in which the model drove at the same speed but was bothered by "obtrusive" thoughts (Worried). Figure 5 shows the results for the same set of dependent measures. The left panel shows that the model predicts that average lane deviation increases when the model is worried. This confirms data generated with more complex driver models that show how a secondary task affects performance [14]. The second measure further confirms this. The right panel of Figure 5 shows how total average driving time, measured in minutes, drops significantly in the Worried condition due to an increase in the number of accidents.

A very useful aspect of the ACT-R model is that it also generates a protocol of behavioral output, illustrating how separate parts of a complex behavior like driving unfold over time. Figure 6 depicts a test run of the model, starting with the "go" production and ending with a crash. For illustrative purposes, we chose a particularly short run. As can be seen, the protocol indicates what behavior (steering, cruising, "thinking about the World Cup" as worry) is taking place at what time. This protocol gives insight into the behavior, in that it shows the sequence and timing of behavior and can also indicate critical points in a behavior. Furthermore, it can be compared to the behavior of human subjects as a further validation of the model, or to gain further insight into a complex behavior such as driving.

Even though DUMAS is still in its beginning stages and more work needs to be done, it already illustrates many of the issues that a theory of human-robot interface use will have to face. More specifically, it allows identifying a set of subtasks that appear relevant to human-robot interface use. What are these?

1. Visual orientation: Visual input is undoubtedly the most important source of information in driving [26]. Nevertheless, the human visual system seems badly equipped for a task like driving: we only see sharply in a small center of the visual


Figure 5: Lane deviation (in degrees) and total driving time (in minutes) of DUMAS in the Standard and Worried conditions.
