Definition of Interaction Parameters
griechisch, spanisch, orientalisch, asiatisch)
Das Restaurant location hat am weekday Ruhetag (location: am Schauspielhaus, in der Innenstadt, am Hauptbahnhof, am Stadtpark, am Kunstmuseum, am Stadion, am Opernhaus; weekday: Montag, Dienstag, Mittwoch, Donnerstag, Freitag, Samstag, Sonntag)
Wann möchten Sie location foodtype essen gehen? (location: am Schauspielhaus, in der Innenstadt, am Hauptbahnhof, am Stadtpark, am Kunstmuseum, am Stadion, am Opernhaus; foodtype: vegetarisch, italienisch, französisch, griechisch, spanisch, orientalisch, asiatisch)
Das Lokal price und öffnet um time Uhr (price: ist billig, ist preiswert, ist teuer, hat gehobene Preise, ist in der unteren Preisklasse, ist in der mittleren Preisklasse, ist in der oberen Preisklasse; time: dreizehn, sieben, fünfzehn, achtzehn, zwanzig, vierzehn, siebzehn)
Die Gerichte im foodtype Restaurant beginnen bei price Mark (foodtype: vegetarischen, italienischen, französischen, griechischen, spanischen, orientalischen, asiatischen; price: fünfzehn, zwanzig, vierzig, achtzehn, dreißig, dreizehn, siebzehn)
Spanish, oriental, asian)
The restaurant location is closed on weekday (location: at the theater, in town center, at the main station, at the city park, at the art museum, at the stadium, at the opera house; weekday: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday)
When would you like to eat foodtype food location? (foodtype: vegetarian, Italian, French, Greek, Spanish, oriental, asian; location: at the theater, in town center, at the main station, at the city park, at the art museum, at the stadium, at the opera house)
The restaurant price and opens at time (price: is cheap, is good value, is expensive, has high prices, is in the lower price category, is in the middle price category, is in the upper price category; time: one p.m., seven o’clock, three p.m., six p.m., eight p.m., two p.m., five p.m.)
The menu in the foodtype restaurant starts at price DM (foodtype: vegetarian, Italian, French, Greek, Spanish, oriental, asian; price: fifteen, twenty, forty, eighteen, thirty, thirteen, seventeen)
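The sentence templates above combine a fixed carrier sentence with two slots, each drawn from a list of seven values. As a minimal sketch of how such a template can be expanded, the following Python snippet fills the "closed on weekday" template. The brace notation and the function name `all_sentences` are illustrative assumptions, and the source does not say whether every combination was actually used in the experiment.

```python
import itertools

# Carrier sentence and value lists copied from the English material above.
# The {location}/{weekday} brace notation is an assumption for illustration.
TEMPLATE = "The restaurant {location} is closed on {weekday}"
LOCATIONS = [
    "at the theater", "in town center", "at the main station",
    "at the city park", "at the art museum", "at the stadium",
    "at the opera house",
]
WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def all_sentences(template, **value_lists):
    """Yield the template filled with every combination of slot values."""
    names = list(value_lists)
    for combo in itertools.product(*(value_lists[name] for name in names)):
        yield template.format(**dict(zip(names, combo)))


sentences = list(all_sentences(TEMPLATE, location=LOCATIONS, weekday=WEEKDAYS))
print(len(sentences))   # 7 locations x 7 weekdays
print(sentences[0])
```

With two seven-valued slots, the full cross product yields 49 sentences per template.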
Appendix C
BoRIS Dialogue Structure
Figure C.1 Dialogue flow in the BoRIS restaurant information system of experiment 6.3, part 1.
Figure C.2 Dialogue flow in the BoRIS restaurant information system of experiment 6.3, part 2. For a legend see Figure C.1.
Figure C.3 Dialogue flow in the BoRIS restaurant information system of experiment 6.3, part 3. For a legend see Figure C.1.
Appendix D
Instructions and Scenarios
BoRIS
Dear participant!
Thank you for taking the time to do this experiment!
During the next hour you will get to know BoRIS via the telephone: the Bochum restaurant information system.
This test will show how you experience a telephone call with BoRIS. For this purpose, we ask you to call BoRIS five times. Before each call you will get a small task. At the end of each telephone call, we ask you to write down what you think about the system. You can do this easily by filling out a questionnaire.
Before the test starts, we would like to ask you to answer the questions given on the following pages. For the test evaluation, we need some personal information from you, which will of course be treated anonymously.
At the end of the whole experiment, we ask you to give an overall judgment about all the calls you had with BoRIS.
For some assessments you will find the following scale:
extremely bad | bad | poor | fair | good | excellent | ideal
Usually, your judgment should be in the range between bad and excellent. In case of an unpredictable extreme judgment, you can use the thinly drawn edges of the scale as well. Please also use the spaces between the grid marks, as depicted above.
Assess the system in a very self-confident way and remember during the whole test session:
It is not you who is being tested; you are testing our system!
And now: Have a lot of fun!
Dialogue no.
You plan to eat out in Bochum. Because your favorite restaurant is closed for holidays, ask BoRIS for a restaurant.
Please write down first which specifications you want to give to BoRIS.
If BoRIS is unable to find a matching restaurant, please search for an alternative until BoRIS indicates at least one restaurant.
Restaurant name(s):
The following guidelines describe the steps an evaluation expert has to perform in order to analyze and annotate an interaction with the BoRIS restaurant information system, see experiments 6.1 to 6.3. A number of criteria are given which have to be judged in each step, and it is illustrated how these criteria have to be interpreted in the context of the restaurant information task. It has to be noted that the criteria and recommendations are not strict rules. Instead, the evaluation expert often has a certain degree of freedom for interpretation. In order to reach a decision in an individual case, the expert should consider the objective of the criteria and the course of the interaction up to the specific point. Once a certain interpretation has been chosen, the expert should adhere to it in order to reach consistent results for all dialogues in the analysis set.
The analysis and annotation procedure consists of the following steps:
Task AVM analysis
Task success labelling
Contextual appropriateness labelling
System correction turn labelling
User correction turn labelling
Cancel attempt labelling
Help request labelling
User question labelling
System answer labelling
Speech understanding labelling
Automatic calculation of speech-recognition-related measures
Automatic calculation of further interaction parameters
The following guidelines focus on steps where the expert has to make a judgment on a specific interaction aspect (Steps 1 and 3 to 14). Practical information on the operation of the CSLU-based WoZ workbench and of the expert evaluation tool is given in Skowronek (2002).
Step 1: Scenario Definition
The scenario for the restaurant information task consists of six slots for attribute-value pairs, namely Task, Foodtype, Date, Time, Price and Location. The field Task can take two different values: “Get information”, where the aim of the dialogue is to obtain information about restaurants, or “unknown”, where the user asks for a task which is not supported by the system, e.g. a reservation. The expert has to interpret relative date specifications like today, tomorrow, etc. as follows:
Today, tomorrow, the day after tomorrow, etc. → the corresponding weekday
Now, in a little while, etc. → the corresponding day and time
This interpretation corresponds to the canonical values used by the language understanding component of the system. The following expressions should not be changed in this way because they are beyond the understanding capability of the system:
During the week, weekdays, weekend, etc. → leave unchanged
If no specification for a slot is given in the scenario definition, the corresponding slot should be left undetermined. The same principle applies to the free scenario.
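The date rule of Step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `canonical_date`, the English expression keys and the lookup table are hypothetical and are not taken from the BoRIS system or the expert tool. Time expressions such as "now" are left out for brevity.

```python
import datetime

# datetime.date.weekday() returns 0 for Monday and 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Relative expressions that are mapped to a weekday (assumed English keys).
DAY_OFFSETS = {"today": 0, "tomorrow": 1, "the day after tomorrow": 2}


def canonical_date(expression, call_date):
    """Map a relative date expression to the corresponding weekday.

    Expressions outside the table (e.g. "weekend", "during the week")
    are returned unchanged, as the guideline requires.
    """
    key = expression.strip().lower()
    if key in DAY_OFFSETS:
        target = call_date + datetime.timedelta(days=DAY_OFFSETS[key])
        return WEEKDAYS[target.weekday()]
    return expression


friday = datetime.date(2002, 3, 1)  # an arbitrary Friday
print(canonical_date("today", friday))
print(canonical_date("tomorrow", friday))
print(canonical_date("weekend", friday))
```

The lookup-table design keeps unsupported expressions untouched by default, which mirrors the "leave unchanged" rule above.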
Step 3: Transcription
The system utterances are automatically logged during the interaction. Thus, only the user utterances have to be transcribed by the expert, and only if no transcription has been produced during the interaction (which is the case for simulated recognition). The user utterance transcription has to include literally everything that has been articulated during a user’s turn, including laughing, talking to himself/herself, etc. In this way, it will reflect the input to the system in a real-life environment.
The expert has to type the transcription into the corresponding field of the evaluation tool. All letters (including German “Umlaute”) and punctuation marks are allowed. Line breaks are generated automatically, but they can also be forced by pressing the return key. However, it has to be ensured that no empty lines are transcribed, except when the whole user utterance is empty. Scrolling over several lines is possible.
Step 4: Barge-In Labelling
This step refers to the user utterances only. A barge-in attempt is counted when the user intentionally addresses the system while the system is still speaking. Under this definition, user utterances which are not intended to influence the course of the dialogue (laughing, speaking to himself/herself) are not counted as barge-ins. They are treated as spontaneous reactions.
All barge-in attempts are labelled by setting the corresponding radio button in the expert evaluation tool. The barge-in utterance will not be transcribed until the user repeats it when it is the user’s turn again.
Step 5: Task AVM Analysis
The “Scenario AVM” is specified by the scenario and consists of six attribute-value pairs for the slots Task, Date, Foodtype, Time, Price and Location.
During the course of the interaction, it may happen that the user changes one or several of the specifications given in the scenario, either by adding further constraints, by omitting constraints, or by changing the constraint values. Such a change may happen either on the user’s own initiative, or because the system requested it. In both cases the “Scenario AVM” has to be amended, resulting in a “Modified Scenario AVM” and in a “Changed AVM”.
In a first step, the attributes of the user query which differ from the specification given in the scenario have to be identified. These attributes and the corresponding values are written down in the corresponding “Modified Scenario AVM”.
If the user voluntarily sets the value of an attribute to a neutral value (e.g. by saying “don’t know”, “doesn’t matter”, etc.), the value “neutral” has to be set in the AVM. However, if the user has no possibility to specify the value (e.g. because the system did not ask him/her to do so), the AVM remains unchanged at this point. This guideline assumes that the user would have provided the missing information but the system prematurely directed the dialogue in a different way.
If the user specifies a value for an attribute that is not indicated in the scenario, this value has to be included in the “Modified Scenario AVM”, independently of whether the system asked for it or not.
If the system asks the user to modify attribute values during the interaction (e.g. because it did not find a matching restaurant), such modifications should be included in the “Changed AVM”.
If the user changes a previously specified attribute value without being asked to do so by the system, two situations have to be distinguished:
If the user changes a specification spontaneously, by intuition, this modification should be handled in the “Modified Scenario AVM”.
If the user changes a specification because the system obviously did not process his/her first specification attempt, this modification should be handled by the “Changed AVM”.
This principle is in accordance with the definition of user correction turns, see below. When the expert would rate such an utterance as a user correction turn, the modification should be handled by the “Changed AVM”.
When the modification occurs in an explicit confirmation situation (e.g. as a response to a system confirmation utterance like “Do you really want to eat out in ...?”), it should be handled by the “Modified Scenario AVM”.
The expert is only allowed to provide values which are in the system’s vocabulary (so-called “canonical values”). Other values, although they might be specified by the user, should not be introduced in the AVMs. This rule corresponds to a system-orientated point of view.
All AVMs are amended by an additional slot, namely the one with the restaurants which match the specified attribute values. This slot is automatically calculated from the system database.
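The routing rules of Step 5 can be summarized as a small decision function. The following Python sketch is an interpretation, not part of the expert tool: the function name `target_avm` and its boolean parameters are hypothetical, and the order in which the explicit-confirmation rule and the user-correction rule are checked is an assumption, since the guidelines do not state a precedence.

```python
def target_avm(system_requested, user_correction=False,
               explicit_confirmation=False):
    """Return the AVM in which a user modification should be recorded.

    system_requested:      the system asked the user to modify a value
    user_correction:       the expert would rate the utterance as a user
                           correction turn (the first attempt was not
                           processed by the system)
    explicit_confirmation: the change answers an explicit confirmation
                           question of the system
    """
    if system_requested:
        return "Changed AVM"
    if explicit_confirmation:        # assumed to take precedence
        return "Modified Scenario AVM"
    if user_correction:
        return "Changed AVM"
    return "Modified Scenario AVM"   # spontaneous change by the user


print(target_avm(system_requested=True))
print(target_avm(system_requested=False, user_correction=True))
print(target_avm(system_requested=False))
```

As the guidelines stress, these are not strict rules. The expert may deviate in individual cases as long as the chosen interpretation is applied consistently.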
User: “Today, I’d like to eat out in a Greek restaurant downtown.”
System: “I’m sorry. There’s no restaurant that matches your query. Would you like to change your query?”
User: “Yes, please.”
System: “You can change the type of food, the preferred price range, ... What is your modification?”
User: “Well, I want Italian food.”
Remark: Current weekday is Friday.
The resulting “Modified Scenario AVM” looks as follows:
The “Changed AVM” is as follows:
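The two AVM tables are not reproduced here. As a rough sketch of how the AVMs for the example dialogue could be represented, the following Python snippet builds both and derives the restaurant slot. Everything beyond the six slot names is an assumption: the helper names, the toy database with its restaurant names and attributes, the use of “downtown” as a canonical location value, and the choice to let the “Changed AVM” carry over all unmodified values are hypothetical.

```python
SLOTS = ("Task", "Foodtype", "Date", "Time", "Price", "Location")


def make_avm(**values):
    """Create an AVM; slots that are not given stay undetermined (None)."""
    unknown = set(values) - set(SLOTS)
    if unknown:
        raise KeyError(f"unknown slots: {sorted(unknown)}")
    return {slot: values.get(slot) for slot in SLOTS}


def amend(avm, **changes):
    """Return a copy of an AVM with some slot values replaced."""
    return make_avm(**{**avm, **changes})


def matching_restaurants(avm, database):
    """Names of all restaurants compatible with the AVM (Time is ignored)."""
    def wanted(slot):
        value = avm[slot]
        return None if value in (None, "neutral") else value

    names = []
    for entry in database:
        attributes_ok = all(
            wanted(slot) is None or entry[slot] == wanted(slot)
            for slot in ("Foodtype", "Price", "Location")
        )
        open_on_date = (wanted("Date") is None
                        or entry["closed_on"] != wanted("Date"))
        if attributes_ok and open_on_date:
            names.append(entry["name"])
    return names


# Hypothetical toy database, invented for this illustration only.
DATABASE = [
    {"name": "Restaurant A", "Foodtype": "Italian", "Price": "cheap",
     "Location": "downtown", "closed_on": "Monday"},
    {"name": "Restaurant B", "Foodtype": "Italian", "Price": "expensive",
     "Location": "downtown", "closed_on": "Friday"},
    {"name": "Restaurant C", "Foodtype": "French", "Price": "cheap",
     "Location": "downtown", "closed_on": "Sunday"},
]

# "Today" becomes "Friday" because the current weekday is Friday.
modified = make_avm(Task="get information", Foodtype="Greek",
                    Date="Friday", Location="downtown")
changed = amend(modified, Foodtype="Italian")  # system-requested change

print(matching_restaurants(modified, DATABASE))
print(matching_restaurants(changed, DATABASE))
```

In this toy setting the first query matches nothing, which is what triggers the system’s request for modification, and the changed query matches one restaurant.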
Step 6: Task Success Labelling
Task success describes whether the aim of the dialogue was achieved or not. In order to judge task success, the expert makes use of the restaurant slot of the “Modified Scenario AVM”, the restaurant slot of the “Changed AVM”, and the transcription of the dialogue.
The following categories are available:
1. S: Succeed
The aim of the dialogue is achieved and the user obtains the requested information, provided that the requested information is covered by the system functionality. The aim of the dialogue can be regarded as achieved when all restaurants given by the system are contained in the restaurant slot of the “Modified Scenario AVM”. However, not all restaurants of the “Modified Scenario AVM” need to be included in the system answer.
2. SCu: Succeed with constraint relaxation by the user
The user made a query which is covered by the functionality of the system, but the system cannot find a matching restaurant in the database. The user follows the system request for modification, and the system is able to provide an answer for the modified request. This case is regarded as a constraint relaxation by the user. The restaurants given by the system all have to be contained in the “Changed AVM”. However, not all restaurants of the “Changed AVM” need to be included in the system answer.
3. SCs: Succeed with constraint relaxation by the system
The aim of the dialogue is achieved although the user was not able to provide all specifications. All restaurants provided by the system have to be part of the “Modified Scenario AVM”. However, not all restaurants of the “Modified Scenario AVM” need to be included in the system answer.
4. SCuCs: Succeed with constraint relaxation by the user and by the system
Combination of the previous two cases: The aim of the dialogue is achieved after the user had to change his/her query, but nevertheless the user was unable to give all specifications. The aim of the dialogue can be regarded as achieved if all restaurants provided by the system are contained in the “Changed AVM”. However, not all restaurants of the “Changed AVM” need to be contained in the system answer.
5. SN: Succeed in spotting that no answer exists
The user made a query which is covered by the current system functionality, but the system cannot find a matching restaurant in the database. The user does not follow the system request for modification, and the interaction ends by spotting that no answer exists. The SN category also has to be chosen when the system informs the user that his/her request is outside the current system functionality, e.g. when the user asks for a reservation. In this case, neither the “Modified Scenario AVM”, the “Changed AVM” nor the system response contains any restaurants.
6. FS: Failed because of system behavior
The system provides an answer which is contained neither in the “Modified Scenario AVM” nor in the “Changed AVM”, or it finishes the interaction prematurely although the user behaves cooperatively.
7. FU: Failed because of user behavior
The aim of the dialogue cannot be achieved because the user behaves uncooperatively, e.g. by persistently giving senseless answers. The interaction is also classified as FU if the user finishes the interaction prematurely (e.g. by simply hanging up), irrespective of the user’s motivation.
A first proposal for task success is made by the evaluation tool. This proposal might have to be modified by the expert, notably in the case of an unexpected termination of the dialogue. In such a case, the expert tool might not be able to provide any meaningful proposal for task success.
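The category definitions of Step 6 lend themselves to a simple decision procedure. The sketch below is one possible reading in Python and is not the algorithm of the evaluation tool, which the source does not describe. The function name, the boolean flags, the precedence of FU over FS, and the treatment of an empty answer when matching restaurants exist are all assumptions.

```python
def task_success(answer, modified, changed=None, *,
                 user_relaxed=False, system_relaxed=False,
                 user_failed=False, system_aborted=False):
    """Propose a task success label from the restaurant slots.

    answer:         restaurants named by the system
    modified:       restaurant slot of the "Modified Scenario AVM"
    changed:        restaurant slot of the "Changed AVM" (None if unused)
    user_relaxed:   the user followed a system request for modification
    system_relaxed: the user was unable to give all specifications
    user_failed:    uncooperative user or premature hang-up
    system_aborted: the system ended the dialogue prematurely
    """
    answer, modified = set(answer), set(modified)
    changed = set(changed or [])

    if user_failed:                  # assumed to take precedence over FS
        return "FU"
    if system_aborted:
        return "FS"
    if not answer:
        # SN requires that no matching restaurant exists at all; an empty
        # answer despite existing matches is treated as FS (assumption).
        return "SN" if not modified and not changed else "FS"

    reference = changed if user_relaxed else modified
    if not answer <= reference:      # some named restaurant does not match
        return "FS"
    if user_relaxed and system_relaxed:
        return "SCuCs"
    if user_relaxed:
        return "SCu"
    if system_relaxed:
        return "SCs"
    return "S"


print(task_success(["A"], ["A", "B"]))
print(task_success(["A"], [], ["A"], user_relaxed=True))
print(task_success([], []))
```

The subset test reflects the rule that every restaurant given by the system must be contained in the reference AVM, while the system answer need not list all of them. As with the tool’s own proposal, such an automatic label would still have to be checked by the expert.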
Step 7: Contextual Appropriateness Labelling
The expert has to judge each system utterance with respect to its appropriateness in the current dialogue context. The following criteria serve as a basis for this judgment:
Informativeness: Make your contribution as informative as required (for the current purposes of the exchange); do not make your contribution more informative than is required.
Truth and evidence: Do not say what you believe to be false; do not say that for which you lack adequate evidence.