
Proceedings of EACL '99

Robust and Flexible Mixed-Initiative Dialogue for Telephone Services

Relaño Gil, José¹ and Tapias, Daniel and Gancedo, María C.

Charfuelan, Marcela¹ and Hernández, Luis A.

Speech Technology Group, Telefónica Investigación y Desarrollo, S.A.

C. Emilio Vargas, 6, 28043 Madrid (Spain). Tel: 34.1.549500, Fax: 34.1.3367350, e-mail: jretanio@gaps.ssr.upm.es

Abstract

In this work, we present an experimental analysis of a Dialogue System for the automation of simple telephone services. Starting from the evaluation of a preliminary version of the system, we conclude that it is necessary to design a robust and flexible system able to apply different dialogue control strategies depending on the characteristics of the user and the performance of the speech recognition module. Experimental results following the PARADISE framework show an important improvement both in terms of task success and dialogue cost for the proposed system.

1 INTRODUCTION

In this contribution we present some improvements to the design of a Dialogue Management System for the automation of simple telephone tasks in a PABX environment (automatic name dialing, voice messaging, ...). From the point of view of its functionality, our system is a very simple one because there is no need for advanced Plan Recognition strategies or General Problem Solving methods. However, we think that even for this kind of dialogue system there is still a long way to go to demonstrate usability in real situations by the "general public".

In our work we concentrate on systems designed for the telephone line and for a wide range of potential users. Therefore our evaluations take into account different levels of speech recognition performance and user behaviours. In particular, we propose and evaluate strategies aimed at increasing robustness against recognition errors and flexibility to deal with a wide range of users. We use the PARADISE evaluation framework (Walker et al., 1998) to analyze both task success and agent dialogue behaviour in relation to subjective user satisfaction.

¹ Dep. SSR, ETSIT-UPM, Spain

2 ROBUST AND FLEXIBLE SYSTEM

Following the classification of Dialogue Systems proposed by Allen (Allen, 1997), our baseline dialogue system can be described as a system with topic-based performance capabilities, adaptive single task, a minimal pair clarification/correction dialogue manager and fixed mixed-initiative. One of the most important objectives of our dialogue manager has been the implementation of a collaborative dialogue model. The system therefore has to be able to understand all the user actions, in whatever order they appear, even if the focus of the dialogue has been changed by the user. To achieve this, we organize the information in an information tree, controlled by a task knowledge interpreter, and we let the data participate in driving the dialogue. However, to control a mixed-initiative strategy we use three separate sources of information: the user data, the world knowledge embedded in the task structure and the general dialogue acts.

From this preliminary evaluation of the system we found that, in order to increase its performance, two major points should be addressed: a) robustness against recognition and parser errors, and b) more flexibility to be able to deal with different user models. We designed four complementary strategies to improve its performance:

1. To estimate the performance of the speech recognition module. This was done from a count of the number of corrections during previous interactions with the same user.

2. To classify each user as belonging to Group A or Group B, which are described later in the Experimental Results section. This was done by combining a normalized average number of utterances per task and the amount of information in each utterance, especially at some particular dialogue points (for example, when answering the system's question "do you want to do any other task?").


3. To include a control module that, from the results of steps 1 and 2, defines two different kinds of control management allowing a flexible mixed-initiative strategy: more user initiative for Group A users and high recognition rates, and more restrictive strategies for Group B users and/or low recognition performance.

All of these strategies have been included in our system, as depicted in Figure 1.
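The three steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `UserHistory` fields, the function names and every threshold value are hypothetical, chosen only to show how a correction count and per-user utterance statistics could feed a strategy selector.

```python
from dataclasses import dataclass


@dataclass
class UserHistory:
    """Counts gathered from previous interactions with one user (hypothetical fields)."""
    corrections: int   # corrections the user had to make
    utterances: int    # total user utterances
    tasks: int         # tasks attempted
    info_items: int    # pieces of task information supplied by the user


def estimate_asr_quality(h: UserHistory, max_correction_rate: float = 0.15) -> str:
    """Step 1: estimate recognition performance from the share of corrections."""
    rate = h.corrections / h.utterances if h.utterances else 0.0
    return "high" if rate <= max_correction_rate else "low"


def classify_user(h: UserHistory, max_utts_per_task: float = 6.0,
                  min_info_per_utt: float = 1.0) -> str:
    """Step 2: Group A users need few utterances per task and pack information into each."""
    if h.tasks == 0 or h.utterances == 0:
        return "B"  # no evidence yet, so assume the more restrictive group
    utts_per_task = h.utterances / h.tasks
    info_per_utt = h.info_items / h.utterances
    fluent = utts_per_task <= max_utts_per_task and info_per_utt >= min_info_per_utt
    return "A" if fluent else "B"


def select_strategy(h: UserHistory) -> str:
    """Step 3: more user initiative only for Group A users under high recognition rates."""
    if estimate_asr_quality(h) == "high" and classify_user(h) == "A":
        return "user_initiative"
    return "restrictive"
```

Under these assumed thresholds, a user who supplies several items per utterance and rarely corrects the system is given more initiative, while a terse user or a noisy line falls back to the restrictive strategy.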

3 EXPERIMENTAL RESULTS

In order to test the improvements over our original system (described in (Alvarez et al., 1996)), we designed a simulated evaluation environment where the performance of the Speech Recognition Module (recognition rate) was artificially controlled. A Wizard of Oz simulation environment was designed to obtain different levels of recognition performance for a vocabulary of 1170 words: 96.4% word recognition rate for high performance and 80% for low performance. A pre-defined single fixed mixed-initiative strategy was used in all cases.

We used an annotated database composed of 50 dialogues with 50 different novice users and 6 different simple telephone tasks in each dialogue: 25 dialogues were simulated using a 94.6% recognition rate and 25 with 80%. Performance results were obtained using the PARADISE evaluation framework (Walker et al., 1998), determining the contributions of task success and dialogue cost to user satisfaction. As the task success measure we obtained the Kappa coefficient, while dialogue cost measures were based on the number of user turns. In this case it is important to point out that, as each tested dialogue is composed of a set of six different tasks which require quite different numbers of turns, the number of turns for each task was normalized to its $N(x) = \frac{x - \bar{x}}{\sigma_x}$ score.
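The per-task normalization can be sketched as follows, so that tasks needing many turns and tasks needing few become comparable. This is an illustration only: the function name, the task names and the use of the population standard deviation are assumptions, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def normalize_turns(turns_by_task):
    """Map each task's raw turn counts to z-scores N(x) = (x - mean) / std.

    turns_by_task maps a task name to the list of user-turn counts observed
    for that task across dialogues (hypothetical data layout).
    """
    normalized = {}
    for task, turns in turns_by_task.items():
        mu = mean(turns)
        sigma = pstdev(turns)  # population standard deviation (assumption)
        # When every dialogue needed the same number of turns, all scores are 0.
        normalized[task] = [0.0 if sigma == 0 else (x - mu) / sigma for x in turns]
    return normalized
```

After this step, a score of 0 means "as many turns as the average dialogue needed for this task", whatever the task's typical length.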

               Both groups            High ASR
               Low ASR    High ASR    Group A    Group B
Satisfaction   26.4       30.1        35.4       25.2

Table 1: Mean results for both groups in low and high ASR situations, and separately for Group A and Group B in the high ASR situation only.

User satisfaction in Table 1 was obtained as a cumulative satisfaction score for each dialogue by summing the scores of a set of questions similar to those proposed in (Walker et al., 1998). The ANOVA for Kappa, the cost measure and user satisfaction demonstrated a significant effect of ASR performance. As could be predicted, we found that in all cases a low recognition rate corresponds to a dramatic decrease in the absolute number of successfully completed tasks and an important increase in the average number of utterances. However, we also found that in the high ASR situation the task success measure of Kappa was surprisingly low.
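For reference, the Kappa coefficient used as the task success measure can be computed from a confusion matrix as sketched below. The sketch assumes the usual PARADISE definition, with rows as the expected attribute values, columns as the values obtained in the dialogues, and chance agreement estimated from the column totals; how the matrix was built for this system is not stated in the text, and the function name is hypothetical.

```python
def kappa(confusion):
    """Return Kappa = (P(A) - P(E)) / (1 - P(E)) for a square confusion matrix.

    confusion[i][j] counts the cases where value i was expected and value j
    was obtained (assumed layout).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # P(A): observed agreement, the share of counts on the diagonal.
    p_agree = sum(confusion[i][i] for i in range(n)) / total
    # P(E): agreement expected by chance, from the column totals.
    col_totals = [sum(row[j] for row in confusion) for j in range(n)]
    p_chance = sum((t / total) ** 2 for t in col_totals)
    return (p_agree - p_chance) / (1 - p_chance)
```

A value of 1 means every attribute value was obtained correctly, and a value near 0 means the agreement is no better than chance, which is why a low Kappa under high recognition rates is a notable result.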

A closer inspection of the dialogues in Table 1 revealed that this low performance under high ASR situations was due to the presence of two groups of users. A first group, Group A, showed a "fluent" interaction with the system, similar to the one assumed by the mixed-initiative strategy (for example, as an answer to the system's question "do you want to do any other task?", these users could answer something like "yes, I would like to send a message to John Smith"). The other group of users, Group B, exhibited a very restrictive interaction with the system (for example, a short answer "yes" to the same question).

As a conclusion of this first evaluation we found that, in order to increase the performance of our baseline system, two major points should be addressed: a) robustness against recognition and parser errors, and b) more flexibility to be able to deal with different user models.

Therefore we designed an adaptive strategy to adapt our dialogue manager to Group A or Group B users and to High and Low ASR situations. The adaptation was based on linear discrimination, as illustrated in Figure 2, using both the average number of turns and the recognition errors from the first two tasks in each dialogue.
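A linear discriminant over these two features can be sketched as below. The weights, the bias and the mode names are hypothetical placeholders; in practice they would be fitted on annotated dialogues, and the values actually used in the system are not given in the text.

```python
from typing import NamedTuple


class Features(NamedTuple):
    """Measurements taken over the first two tasks of a dialogue."""
    avg_turns: float    # average number of user turns per task
    error_rate: float   # recognition errors, in percent


# Hypothetical weights and bias of the linear discriminant g(x) = w . x + b.
W_TURNS = 1.0
W_ERROR = 0.5
BIAS = -12.0


def discriminant(f: Features) -> float:
    """Signed score: which side of the separating line the dialogue falls on."""
    return W_TURNS * f.avg_turns + W_ERROR * f.error_rate + BIAS


def choose_mode(f: Features) -> str:
    """Many turns or many errors push the dialogue towards the restrictive strategy."""
    return "restrictive" if discriminant(f) > 0 else "user_initiative"
```

With these placeholder values, a dialogue averaging 5 turns with a 4% error rate stays in the user-initiative mode, while one averaging 7.5 turns with a 20% error rate switches to the restrictive mode.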

               Low ASR        High ASR
               Both groups    Group A    Group B
Kappa          0.71
User turns     7.2            5.3        6.1
Satisfaction   26.9           32.1       29.4

Table 2: Mean results for each group in high ASR situations and for both groups in low ASR situations.

Table 2 shows mean results for each group of users (A and B) for High ASR performance, and for all users in Low ASR situations. These results show a more stable behaviour of the system, that is, less difference in performance between users of Group A and Group B and, although to a lesser extent, between high and low recognition rates.

4 CONCLUSIONS

The main conclusion of the work is the necessity of designing adaptive dialogue management strategies to make the system robust to varying recognition performance and different user behaviours.


References

James Allen. 1997. Tutorial: Dialogue Modeling. In ACL/EACL Workshop on Spoken Dialogue System, Madrid, Spain.

D. Tapias. 1996. The Natural Language Processing Module for a Voice Assisted Operator at Telefónica I+D. In ICSLP '96, Philadelphia, USA.

M. Walker, D. Litman, C. Kamm, and A. Abella. 1998. Evaluating spoken dialog agents with PARADISE: Two case studies. In Computer Speech and Language.


[Block diagram; labels recoverable from the figure include PARSER, CORRECTION DETECTOR, USERS GROUPS SELECTOR, DIALOG STRATEGY SELECTOR, TASK KNOWLEDGE INTERPRETER, ACTS INTERPRETER and APPLICATION.]

Figure 1: Modules of the Robust and Flexible Mixed-Initiative Dialogue system

[Scatter plot; horizontal axis: % error rate.]

Figure 2: User classification
