Quality of Telephone-Based Spoken Dialogue Systems, Part 9



Definition of Interaction Parameters


[...] griechisch, spanisch, orientalisch, asiatisch)

Das Restaurant location hat am weekday Ruhetag (location: am Schauspielhaus, in der Innenstadt, am Hauptbahnhof, am Stadtpark, am Kunstmuseum, am Stadion, am Opernhaus; weekday: Montag, Dienstag, Mittwoch, Donnerstag, Freitag, Samstag, Sonntag)

Wann möchten Sie location foodtype essen gehen? (location: am Schauspielhaus, in der Innenstadt, am Hauptbahnhof, am Stadtpark, am Kunstmuseum, am Stadion, am Opernhaus; foodtype: vegetarisch, italienisch, französisch, griechisch, spanisch, orientalisch, asiatisch)

Das Lokal price und öffnet um time Uhr (price: ist billig, ist preiswert, ist teuer, hat gehobene Preise, ist in der unteren Preisklasse, ist in der mittleren Preisklasse, ist in der oberen Preisklasse; time: dreizehn, sieben, fünfzehn, achtzehn, zwanzig, vierzehn, siebzehn)

Die Gerichte im foodtype Restaurant beginnen bei price Mark (foodtype: vegetarischen, italienischen, französischen, griechischen, spanischen, orientalischen, asiatischen; price: fünfzehn, zwanzig, vierzig, achtzehn, dreißig, dreizehn, siebzehn)

[...] Spanish, oriental, Asian)

The restaurant location is closed on weekday (location: at the theater, in town center, at the main station, at the city park, at the art museum, at the stadium, at the opera house; weekday: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday)

When would you like to eat foodtype food location? (foodtype: vegetarian, Italian, French, Greek, Spanish, oriental, Asian; location: at the theater, in town center, at the main station, at the city park, at the art museum, at the stadium, at the opera house)

The restaurant price and opens at time (price: is cheap, is good value, is expensive, has high prices, is in the lower price category, is in the middle price category, is in the upper price category; time: one p.m., seven o’clock, three p.m., six p.m., eight p.m., two p.m., five p.m.)

The menu in the foodtype restaurant starts at price DM (foodtype: vegetarian, Italian, French, Greek, Spanish, oriental, Asian; price: fifteen, twenty, forty, eighteen, thirty, thirteen, seventeen)
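The sentence templates above pair a carrier sentence with lists of slot fillers. As a rough illustration of how such templates expand into concrete sentences, the following Python sketch fills one of the English templates. The function name `expand` and the choice to combine every location with every weekday are assumptions made for this example; the source does not say whether the fillers were fully crossed or paired one-to-one.

```python
from itertools import product

# One template from the list above, with the placeholders written as
# Python format fields (the field syntax is an assumption of this sketch).
TEMPLATE = "The restaurant {location} is closed on {weekday}"

FILLERS = {
    "location": [
        "at the theater", "in town center", "at the main station",
        "at the city park", "at the art museum", "at the stadium",
        "at the opera house",
    ],
    "weekday": [
        "Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday",
    ],
}


def expand(template, fillers):
    """Return all sentences obtained by crossing every filler list.

    Hypothetical helper: full crossing is an assumption, not something
    the source specifies.
    """
    names = list(fillers)
    combos = product(*(fillers[name] for name in names))
    return [template.format(**dict(zip(names, combo))) for combo in combos]


sentences = expand(TEMPLATE, FILLERS)
print(len(sentences))   # 7 locations x 7 weekdays
print(sentences[0])
```

If the fillers were instead meant to be paired position by position, `zip` over the lists would replace `product`, giving seven sentences per template.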


Appendix C

BoRIS Dialogue Structure

Figure C.1 Dialogue flow in the BoRIS restaurant information system of experiment 6.3, part 1.

Figure C.2 Dialogue flow in the BoRIS restaurant information system of experiment 6.3, part 2. For a legend see Figure C.1.

Figure C.3 Dialogue flow in the BoRIS restaurant information system of experiment 6.3, part 3. For a legend see Figure C.1.

Appendix D

Instructions and Scenarios

BoRIS

Dear participant!

Thank you for taking the time to do this experiment!

During the next hour you will get to know BoRIS via the telephone: the Bochum restaurant information system.

This test will show how you experience a telephone call with BoRIS. For this purpose, we ask you to call BoRIS five times. Before each call you will get a small task. At the end of each telephone call, we ask you to write down what you think about the system. You can do this easily by filling out a questionnaire.

Before the test starts, we would like to ask you to answer the questions given on the following pages. For the test evaluation, we need some personal information from you, which will of course be treated anonymously.

At the end of the whole experiment, we ask you to give an overall judgment about all the calls you had with BoRIS.

For some assessments you will find the following scale:

extremely bad | bad | poor | fair | good | excellent | ideal

Usually, your judgment should be in the range between bad and excellent. In the unlikely case of an extreme judgment, you can use the thinly drawn edges of the scale as well. Please also use the spaces between the grid marks, as depicted above.

Assess the system in a very self-confident way and remember during the whole test session:

You are not being tested; you are testing our system!

And now: Have a lot of fun!


Dialogue no.

You plan to eat out in Bochum. Because your favorite restaurant is closed for holidays, ask BoRIS for a restaurant.

Please write down first which specifications you want to give to BoRIS.

If BoRIS is unable to find a matching restaurant, please search for an alternative until BoRIS indicates at least one restaurant.

Restaurant name(s):


The following guidelines describe the steps an evaluation expert has to perform in order to analyze and annotate an interaction with the BoRIS restaurant information system, see experiments 6.1 to 6.3. A number of criteria are given which have to be judged upon in each step, and it is illustrated how these criteria have to be interpreted in the context of the restaurant information task. It has to be noted that the criteria and recommendations are not strict rules. Instead, the evaluation expert often has a certain degree of freedom for interpretation. In order to take a decision in an individual case, the expert should consider the objective of the criteria, and the course of the interaction up to the specific point. In the case that a certain interpretation is chosen, the expert should try to adhere to this interpretation in order to reach consistent results for all dialogues in the analysis set.

The analysis and annotation procedure consists of the following steps:

Task AVM analysis

Task success labelling

Contextual appropriateness labelling

System correction turn labelling

User correction turn labelling

Cancel attempt labelling

Help request labelling

User question labelling

System answer labelling

Speech understanding labelling

Automatic calculation of speech-recognition-related measures

Automatic calculation of further interaction parameters

The following guidelines focus on steps where the expert has to make a judgment on a specific interaction aspect (Steps 1 and 3 to 14). Practical information on the operation of the CSLU-based WoZ workbench and of the expert evaluation tool is given in Skowronek (2002).

Step 1: Scenario Definition

The scenario for the restaurant information task consists of six slots for attribute-value pairs, namely Task, Foodtype, Date, Time, Price and Location. The field Task can take two different values: “Get information” where the aim of the dialogue is to obtain information about restaurants, or “unknown” where the user asks for a task which is not supported by the system, e.g. a reservation. The expert has to interpret relative date specifications like today, tomorrow, etc. as follows:

Today, tomorrow, the day after tomorrow, etc.: the corresponding weekday

Now, in a little while, etc.: the corresponding day and time


This interpretation corresponds to the canonical values used by the language understanding component of the system. The following expressions should not be changed in this way because they are beyond the understanding capability of the system:

During the week, weekdays, weekend, etc.: leave unchanged

In the case that no specifications for a slot are given in the scenario definition, the corresponding slot should be left undetermined. The same principle applies to the free scenario.
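The date interpretation rule of Step 1 can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the function name `canonical_date`, the lookup tables and the English expressions are hypothetical, and the handling of "now" or "in a little while" (day and time) is omitted.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical table: relative expressions and their offset in days.
RELATIVE_OFFSETS = {
    "today": 0,
    "tomorrow": 1,
    "the day after tomorrow": 2,
}

# Expressions the guidelines say must be left unchanged.
UNCHANGED = {"during the week", "weekdays", "weekend"}


def canonical_date(expression, reference):
    """Map a date expression to the value an annotator would enter.

    `reference` is the calendar date on which the dialogue took place.
    """
    key = expression.strip().lower()
    if key in UNCHANGED:
        return expression            # outside the system's understanding
    if key in RELATIVE_OFFSETS:
        day = reference + timedelta(days=RELATIVE_OFFSETS[key])
        return WEEKDAYS[day.weekday()]   # date.weekday(): Monday is 0
    return expression                # anything else is left to the expert


friday = date(2002, 2, 1)            # an arbitrary Friday
print(canonical_date("today", friday))
print(canonical_date("tomorrow", friday))
print(canonical_date("weekend", friday))
```

Keeping the unchanged expressions in an explicit set mirrors the guideline that they are not to be resolved, rather than relying on them simply being absent from the offset table.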

Step 3: Transcription

The system utterances are automatically logged during the interaction. Thus, only the user utterances have to be transcribed by the expert, in the case that no transcription has been produced during the interaction (which is the case for simulated recognition). The user utterance transcription has to include literally everything that has been articulated during a user’s turn, including laughing, talking to himself/herself, etc. In this way, it will reflect the input to the system in a real-life environment.

The expert has to type the transcription into the corresponding field of the evaluation tool. All letters (including German “Umlaute”) and punctuation marks are allowed. Line breaks are generated automatically, but they can also be enforced by pressing the return key. However, it has to be ensured that no empty lines are transcribed, except when the whole user utterance is empty. Scrolling over several lines is possible.

Step 4: Barge-In Labelling

This step refers to the user utterances only. A barge-in attempt is counted when the user intentionally addresses the system while the system is still speaking. Under this definition, user utterances which are not intended to influence the course of the dialogue (laughing, speaking to himself/herself) are not counted as barge-ins; they are treated as spontaneous reactions.

All barge-in attempts are labelled by setting the corresponding radio button in the expert evaluation tool. The barge-in utterance is not transcribed until the user repeats it once it is the user’s turn again.

Step 5: Task AVM Analysis

The “Scenario AVM” is specified by the scenario and consists of six attribute-value pairs for the slots Task, Date, Foodtype, Time, Price and Location.

During the course of the interaction, it may happen that the user changes one or several of the specifications given in the scenario, either by adding further constraints, by omitting to give constraints, or by changing the constraint values. Such a change may happen either on the user’s own authority, or because the system requested it. In both cases the “Scenario AVM” has to be amended, resulting in a “Modified Scenario AVM” and in a “Changed AVM”.

In a first step, the attributes of the user query which differ from the specification given in the scenario have to be identified. These attributes and the corresponding values are written down in the corresponding “Modified Scenario AVM”.

If the user voluntarily sets the value of an attribute to a neutral value (e.g. by saying “don’t know”, “doesn’t matter”, etc.), the value “neutral” has to be set in the AVM. However, in the case that the user has no possibility to specify the value (e.g. because the system did not ask him/her to do so), the AVM remains unchanged at this point. This guideline assumes that the user would have provided the missing information but the system prematurely directed the dialogue in a different way.


In the case that the user specifies a value for an attribute that is not indicated in the scenario, this value has to be included in the “Modified Scenario AVM”, independently of whether the system asked for it or not.

In the case that the system asks the user to modify attribute values during the interaction (e.g. because it did not find a matching restaurant), such modifications should be included in the “Changed AVM”.

In the case that the user changes an attribute value which was previously specified without being asked to do so by the system, two situations have to be distinguished:

If the user changes a specification spontaneously, by intuition, this modification should be handled in the “Modified Scenario AVM”.

If the user changes a specification because the system obviously did not process his/her first specification attempt, this modification should be handled by the “Changed AVM”. This principle is in accordance with the definition of user correction turns, see below. When the expert would rate such an utterance as a user correction turn, then the modification should be handled by the “Changed AVM”.

When the modification occurs in an explicit confirmation situation (e.g. as a response to a system confirmation utterance like “Do you really want to eat out in ?”), then it should be handled by the “Modified Scenario AVM”.

The expert is only allowed to provide values which are in the system’s vocabulary (so-called “canonical values”). Other values, although they might be specified by the user, should not be introduced in the AVMs. This rule corresponds to a system-orientated point of view.

All AVMs are amended by an additional slot, namely the one with the restaurants which match the specified attribute values. This slot is automatically calculated from the system database.
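The bookkeeping described above can be sketched in Python by treating each AVM as a mapping from slot names to values. This is a minimal sketch, not the evaluation tool's data model: the helper `derive_avm`, the use of `None` for an undetermined slot, the example values, and the choice to derive the “Changed AVM” from the “Modified Scenario AVM” are all assumptions made for illustration. The automatically calculated restaurant slot is left out.

```python
SLOTS = ("Task", "Foodtype", "Date", "Time", "Price", "Location")


def derive_avm(base, changes):
    """Return a copy of `base` with the given slot values overwritten.

    Hypothetical helper; the original AVM is never modified, so the
    Scenario AVM stays available for comparison.
    """
    unknown = set(changes) - set(SLOTS)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    avm = dict(base)
    avm.update(changes)
    return avm


# Scenario AVM with made-up values; None marks an undetermined slot.
scenario_avm = {
    "Task": "get information",
    "Foodtype": "Greek",
    "Date": "Friday",
    "Time": None,
    "Price": None,
    "Location": "in town center",
}

# The user adds a constraint on his/her own authority and answers
# "doesn't matter" for the time: both go into the Modified Scenario AVM.
modified_avm = derive_avm(scenario_avm, {"Price": "cheap", "Time": "neutral"})

# The system finds no match and asks for a modification; the user's
# answer goes into the Changed AVM.
changed_avm = derive_avm(modified_avm, {"Foodtype": "Italian"})

print(modified_avm["Foodtype"], changed_avm["Foodtype"])
```

Copying instead of updating in place keeps the three AVMs distinct, which matters because task success is later judged against the “Modified Scenario AVM” and the “Changed AVM” separately.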

User: “Today, I’d like to eat out in a Greek restaurant downtown.”

System: “I’m sorry. There’s no restaurant that matches your query. Would you like to change your query?”

User: “Yes, please.”

System: “You can change the type of food, the preferred price range, ... What is your modification?”

User: “Well, I want Italian food.”


Remark: Current weekday is Friday

The resulting “Modified Scenario AVM” looks as follows:

The “Changed AVM” is as follows:

Step 6: Task Success Labelling

Task success describes whether the aim of the dialogue was achieved or not. In order to judge task success, the expert makes use of the restaurant slot of the “Modified Scenario AVM”, the restaurant slot of the “Changed AVM”, and the transcription of the dialogue.

The following categories are available:

1. S: Succeed

The aim of the dialogue is achieved and the user obtains the requested information, provided that the requested information is covered by the system functionality. The aim of the dialogue can be regarded as achieved when all restaurants given by the system are contained in the restaurant slot of the “Modified Scenario AVM”. However, not all restaurants of the “Modified Scenario AVM” need to be included in the system answer.

2. SCu: Succeed with constraint relaxation by the user

The user made a query which is covered by the functionality of the system, but the system cannot find a matching restaurant in the database. The user follows the system request for modification, and the system is able to provide an answer for the modified request. This case is regarded as a constraint relaxation by the user. The restaurants given by the system all have to be contained in the “Changed AVM”. However, not all restaurants of the “Changed AVM” need to be included in the system answer.

3. SCs: Succeed with constraint relaxation by the system

The aim of the dialogue is achieved although the user was not able to provide all specifications. All restaurants provided by the system have to be part of the “Modified Scenario AVM”. However, not all restaurants of the “Modified Scenario AVM” need to be included in the system answer.

4. SCuCs: Succeed with constraint relaxation by the user and by the system

Combination of the previous two cases: The aim of the dialogue is achieved after the user had to change his/her query, but nevertheless the user was unable to give all specifications. The aim of the dialogue can be regarded as achieved if all restaurants provided by the system are contained in the “Changed AVM”. However, not all restaurants of the “Changed AVM” need to be contained in the system answer.

5. SN: Succeed in spotting that no answer exists

The user made a query which can be covered by the current system functionality; however, the system cannot find a matching restaurant in the database. The user does not follow the system request for modification, and the interaction ends by spotting that no answer exists. This case is regarded as a success in spotting that no answer exists (SN). The SN category also has to be chosen when the system informs the user that his/her request is outside the current system functionality, e.g. when the user asks for a reservation. Neither the “Modified Scenario AVM”, the “Changed AVM” nor the system response contains any restaurants in this case.

6. FS: Failed because of system behavior

The system provides an answer which is contained neither in the “Modified Scenario AVM” nor in the “Changed AVM”, or it finishes the interaction prematurely although the user behaves cooperatively.

7. FU: Failed because of user behavior

The aim of the dialogue cannot be achieved because the user behaves uncooperatively, e.g. by permanently giving senseless answers. The interaction is also classified as FU in the case that the user finishes the interaction prematurely (e.g. by simply hanging up), irrespective of the user’s motivation.

A first proposal for task success is made by the evaluation tool. This proposal might have to be modified by the expert, notably in the case of unexpected termination of the dialogue. In such a case, the expert tool might not be able to provide any meaningful proposal for task success.
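How a first proposal might be derived from the category definitions above can be sketched in Python. This is not the evaluation tool's actual algorithm, which the source does not describe: the function `propose_task_success`, its parameters and the boolean flags are hypothetical, and the case of an empty system answer while matching restaurants exist is not covered by the definitions and is mapped to FS here by assumption.

```python
def propose_task_success(answer, modified, changed, *,
                         user_relaxed=False, system_relaxed=False,
                         user_cooperative=True, premature_end_by=None):
    """Propose a task success label from sets of restaurant names.

    answer   -- restaurants named by the system
    modified -- restaurant slot of the "Modified Scenario AVM"
    changed  -- restaurant slot of the "Changed AVM"
    user_relaxed   -- the user followed a system request for modification
    system_relaxed -- the user could not provide all specifications
    premature_end_by -- "user", "system" or None
    """
    if not user_cooperative or premature_end_by == "user":
        return "FU"
    if premature_end_by == "system":
        return "FS"
    if not answer:
        if not modified and not changed:
            return "SN"          # nothing to find, and the system said so
        return "FS"              # assumption: a missed answer is a system failure
    reference = changed if user_relaxed else modified
    if not answer <= reference:  # some named restaurant does not match
        return "FS"
    if user_relaxed:
        return "SCuCs" if system_relaxed else "SCu"
    return "SCs" if system_relaxed else "S"


print(propose_task_success({"A"}, {"A", "B"}, set()))
print(propose_task_success({"C"}, set(), {"C", "D"}, user_relaxed=True))
print(propose_task_success(set(), set(), set()))
```

The subset test reflects the rule that every restaurant given by the system must appear in the reference AVM, while the reference AVM may contain restaurants the system did not mention. As the guidelines stress, such a proposal still has to be checked by the expert.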

Step 7: Contextual Appropriateness Labelling

The expert has to judge each system utterance with respect to its appropriateness in the current dialogue context. The following criteria serve as a basis for this judgment:

Informativeness: Make your contribution as informative as required (for the current purposes of the exchange); do not make your contribution more informative than is required.

Truth and evidence: Do not say what you believe to be false; do not say that for which you lack adequate evidence.
