STOCHASTIC NATURAL LANGUAGE GENERATION FOR SPOKEN DIALOG SYSTEMS


Alice H. Oh, Alexander I. Rudnicky

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA


Alex Rudnicky (air+@cs.cmu.edu)

School of Computer Science

5000 Forbes Avenue

Pittsburgh, PA 15213

USA


N-gram language models have been found useful for many automatic speech recognition applications. Although it is not clear whether the simple n-grams can adequately model human language, we show an application of this ubiquitous modeling technique to the task of natural language generation (NLG). This work shows that it is possible to employ a purely corpus-based approach to NLG within a spoken dialog system. In this paper, we discuss applying this corpus-based stochastic language generation at two levels: content selection and sentence planning/realization. At the content selection level, the utterances are modeled by bigrams, and the appropriate attributes are chosen using the bigram statistics. For sentence planning and realization, the utterances in the corpus are modeled by n-grams of varying length, and each new utterance is generated stochastically. This paper presents the details of the implementation in the CMU Communicator, some preliminary evaluation results, and the potential contribution to the general problem of natural language generation for spoken dialog, written dialog, and text generation.


1 Introduction

Natural language generation (NLG) is the process of generating text from a meaning representation. It can be thought of as the reverse of natural language understanding (NLU) (see Figure 1). While it is clear that NLG is an important part of natural language processing, there has been considerably less research activity in NLG than in NLU. This can be partly explained by the reality that NLU, at least until now, has had more application potential, due to the enormous amount of text present in the world (Mellish and Dale, 1998). In contrast, it is unclear what the input to NLG should be, and other than some systems in which input to NLG is automatically created by another module, all input must be created somehow just for this purpose.

(Insert Figure 1 about here)

Nevertheless, NLG plays a critical role in applications such as text summarization, machine translation, and dialog systems. Presently, NLG researchers are actively advancing the field of text generation, and as a result, many useful technologies are being developed. However, there is still a critical shortage of adequate NLG technologies for spoken dialog systems, where NLG plays an especially important role. The current work, along with other related efforts such as Baptist and Seneff (2000) and Ratnaparkhi (2000), strives to highlight the importance of, and initiate progress in, NLG research for spoken dialog systems.

In developing and maintaining a natural language generation (NLG) module for a spoken dialog system, we recognized the limitations of the current NLG technologies for our purposes. While several general-purpose rule-based generation systems have been developed (cf. Elhadad and Robin, 1996), they are often quite difficult to adapt to small, task-oriented applications because of their generality. To overcome this problem, several people have proposed different solutions. Bateman and Henschel (1999) have described a lower cost and more efficient generation system for a specific application using an automatically customized subgrammar. Busemann and Horacek (1998) describe a system that mixes templates and rule-based generation. This approach takes advantage of templates and rule-based generation as needed by specific sentences or utterances. Stent (1999) has also proposed a similar approach for a spoken dialog system. However, for all of these, there is still the burden of writing grammar rules and acquiring the appropriate lexicon.

Because comparatively less effort is needed, many current dialog systems use template-based generation. But there is one obvious disadvantage to templates: the quality of the output depends entirely on the set of templates. Even in a relatively simple domain, such as travel reservations, the number of templates necessary for reasonable quality can become quite large (on the order of hundreds), such that maintenance becomes a serious problem. There is an unavoidable tradeoff between the amount of time and effort spent creating and maintaining templates and the variety and quality of the output utterances. This will become clear when we present the details of our previous template system in Section 4.2.1.

Recognizing these limitations of the rule-based and template-based techniques, we developed a novel approach to natural language generation. It features a simple, yet effective corpus-based technique that uses statistical models of task-oriented language spoken by domain experts to generate system utterances. We have applied this technique to sentence realization and content planning, and have incorporated the resulting generation component into a working spoken dialog system. In our evaluation experiments, this technique performed well for our spoken dialog system. This shows that the corpus-based approach is a promising avenue to further explore.

Outline

This article is structured as follows:

Section 2 starts with a high-level view of the general role of NLG in various language technology applications, and then continues with a discussion of the specific role that NLG plays in spoken dialog systems. Section 3 is a survey of some of the popular techniques in NLG. This section will also serve as an introduction to the terminology used throughout the rest of the paper. Section 4 briefly describes the Carnegie Mellon Communicator, how our stochastic NLG fits into the system, and how the NLG module has changed over time. Section 5 gives a detailed description of the corpora we used, and how we prepared them before building the statistical models. Sections 6 and 7 describe our implementation at the content planning and surface realization levels, respectively. Section 8 describes our evaluation methodology and presents the results. Section 9 is the conclusion of this paper.

2 Natural Language Generation for Spoken Dialog Systems

Natural Language Generation (NLG) and Spoken Dialog Systems are two distinct and non-overlapping research fields within language technologies. Most researchers in these two fields have not worked together until very recently. However, since every spoken dialog system needs an NLG component, spoken dialog researchers have started to look at what technologies the NLG community can provide them. While that may be a good approach, it is also possible for the spoken dialog community to take the initiative and contribute to the NLG community. This work is one of the first attempts at such a contribution, along with Stent (1999), Ratnaparkhi (2000), and Baptist and Seneff (2000). It is not merely an application of well-developed NLG techniques to a spoken dialog system, but it introduces a novel generation technique potentially well-suited to many applications. In this section, we compare the different roles that NLG must play in text-based applications and in spoken dialog systems.

2.1 Natural Language Generation

In this section, we give a general introduction to NLG systems for text-based applications, such as machine translation. By doing so, we hope to set a stage for describing how the needs of spoken dialog systems differ from those of the applications for which most NLG technologies were developed.

One of the most important applications of NLG is machine translation (MT). In an MT system where interlingua is used as an intermediate representation between the source language and the target language, NLG is an essential component that maps interlingua to text in the target language (see Figure 2). One example of such a system is Nitrogen (Langkilde and Knight, 1998), the generation component used in Gazelle, a broad-coverage machine translation system.

(Insert Figure 2 about here)

As one can imagine, building an MT system for written documents is a difficult task. The machine-generated output must be coherent, grammatical, and correct. Even if the source language is mapped correctly to a representation in the interlingua, it is the generation system's responsibility to plan the structure of the final text, choose the right words, and put the words in the correct order according to the grammar of the target language.

One of the reasons that building an NLG system is so difficult is that the generation grammar requires the input to NLG to be rich with many linguistic details. As an example, the Nitrogen system is one of the more flexible systems, but it still requires a fair amount of linguistic detail (e.g., θ-roles such as agent and patient; see Figure 3) that can be daunting for some applications. While this may not pose much of a problem for MT systems, where the linguistic features can be extracted from the source language, there are other applications of NLG where it is a major problem.

(Insert Figure 3 about here)

Some other applications in which NLG plays an essential role include personalized letter writing (Reiter et al., 2000) and report generation systems, including multilingual report generation (Goldberg et al., 1994).

In all of these text-based applications, the focus of the NLG component is on automatically generating well-structured, grammatical, and well-written text. In the next section, we discuss some of the different issues that arise when designing an NLG component for spoken dialog systems.


2.2 Spoken Dialog Systems

A spoken dialog system enables human-computer interaction via spoken natural language. A task-oriented spoken dialog system speaks as well as understands natural language to complete a well-defined task. This is a relatively new research area, but many task-oriented spoken dialog systems are already fairly advanced. Examples include a complex travel planning system (Rudnicky et al., 1999), a publicly available worldwide weather information system (Zue et al., 2000), and an automatic call routing system (Gorin et al., 1997).

Building an NLG component for a spoken dialog system differs from the more traditional NLG systems for generating documents, but it is a very interesting problem that can provide a novel way of looking at NLG. The following are some characteristics of task-oriented spoken dialog systems that contribute to the unconventional issues for the developer of the NLG component.

1. The language used in spoken dialog is different from the language used in written text. Spoken utterances are generally shorter in length compared to sentences in written text. Spoken utterances follow grammatical rules, but much less strictly than written text. Also, the syntactic structures used tend to be much simpler and less varied than those in written text. We expect this is due to the limited cognitive capacity of humans. In a spoken dialog, simple and short utterances may be easier to say and understand, thus enabling more effective communication.¹

2. The language used in task-oriented dialogs is largely domain-specific. The domains for the current dialog systems are fairly narrow (e.g., flight reservation, weather information). Hence, the lexicon for a given spoken dialog system is small and domain-specific.

3. NLG is usually not the main focus in building/maintaining these systems. Yet the NLG module is critical in system performance and user satisfaction. In a telephone-based dialog system, NLG and text-to-speech (TTS) synthesis are the only modules users will experience directly, but with limited development resources, NLG has traditionally been overlooked by spoken dialog system developers.

¹ Note that the unit utterance is different from the unit turn; a turn may consist of several utterances.

Taking these characteristics into account, NLG for task-oriented spoken dialog systems must

(Insert Table 1 about here)


Using the most general segmentation in the first column, content planning entails taking the overall communicative goal of the document, breaking it down into smaller sub-goals, and laying out the ordering of those sub-goals in a coherent manner. In spoken dialog systems, the dialog manager is responsible for content planning.

In sentence planning, the appropriate syntactic structure and meaning words are chosen, as well as sentence boundaries. In syntactic realization, the meaning words are connected to produce well-formed strings.

In the rest of this section (and in our NLG system), we choose not to cover content/text planning in detail and instead concentrate on sentence planning and syntactic realization. This is because content planning is not as relevant in the current spoken dialog systems. Also, for our convenience, we will group sentence planning and syntactic realization under the name “surface realization”. The term “surface realization” here also includes some aspects of sentence planning. This conflicts with the use of the term in Mellish and Dale (1998), but for template generation and NLG in spoken dialog systems, combining sentence planning into surface realization seems more natural.

3.1 Surface Realization

A definition of surface realization given in Mellish and Dale (1998) is as follows:

Determining how the underlying content of a text should be mapped into a sequence of grammatically correct sentences … An NLG system has to decide which syntactic form to use, and it has to ensure that the resulting text is syntactically and morphologically correct.

One technique for surface realization is using templates, which is widely used in simple text generation applications. At the other end of the spectrum is a technique based on generation grammar rules, which most research systems have focused on. Recently, there has been work on hybrid systems and corpus-based methods. The next section describes the existing approaches, gives examples, and then analyzes the characteristics of the different approaches for comparison.

3.2.1 Grammar Rule-Based Technique

Many research NLG systems use generation grammar rules, much like NLU systems with semantic or syntactic grammars. Most generation grammars are syntactic, and some rely on one or more linguistic theories. A good example of a rule-based surface realizer is SURGE (Elhadad and Robin, 1996). It incorporates a couple of different linguistic theories, namely systemic grammar and lexicalist linguistics. One drawback of using specific linguistic theories is that no one theory covers all possible sentence constructions. SURGE, with its combination of theories, still cannot generate some sentences which the underlying theories do not provide rules for. Nevertheless, generation grammar rules enable an NLG system to have wide coverage, be domain independent, and be reusable, as proven by the many very different applications that use SURGE as the surface realizer.

Input to SURGE, and to all other rule-based systems, needs to be very richly specified with linguistic features such as verb tense, number, NP heads, categorization, definiteness, and others (see Figure 4). This is definitely a major disadvantage of the rule-based technique. For a rule-based system to be more efficient than a template system, the amount of time and effort spent designing and adhering to such a rich input representation must be justified. This is probably why most spoken dialog systems, as well as other systems in which NLG is necessary but not the main focus, use templates rather than grammar rules. Most rule-based systems do provide default values, but using the default values too often defeats the purpose of the rule-based technique; it would produce output equivalent to a poorly developed set of templates. With richly specified input, though, SURGE produces a wide variety of sentences. Here are some examples (Elhadad and Robin, 1999).

Michael Jordan scored two-thirds of his 36 points with 3 point shots, to enable Chicago to hand New York a season high sixth straight loss.

Dallas, TX – Charles Barkley matched his season record with 42 points Friday night as the Phoenix Suns handed the Dallas Mavericks their franchise worst 27th defeat in a row at home 123-97.

Figure 4 shows an example of an input to SURGE, which produces the sentence “Michael Jordan scored 36 points”. As you can see, it is a very rich input representation. Many linguistic features must be specified in the input. In the next section, we look at an approach that compensates for this drawback with a statistical model.

As is true for parsing grammars, generation grammar rules take much effort and time to develop. The set of grammar rules is the core knowledge source of any rule-based NLG system, and the quality of output and coverage of sentence types depend on the set of grammar rules. Not only do they take time and effort, but only a highly skilled grammar writer can write the rules. If a new type of sentence needs to be generated, new rules must be added and checked for consistency with the existing rules. Such maintenance is costly. One advantage, though, is that many of the rules are domain independent, so they can be reused in different applications, provided the input specification conforms.

3.2.2 Template-based Technique

In a template-based generation system, the developer handcrafts a set of templates and canned expressions to be used. A “canned expression” is just a fixed string (or strings) to be inserted into any place in the document. A “template” is a canned expression with one or more slots that are left empty to be filled with appropriate values.

The following example (artificially created for a simple demonstration of templates in a letter-generating system) illustrates how canned expressions and templates are typically used.

Dear Ms Brown,

We are sorry to inform you that your application for the ABC platinum Visa card has not been approved. We hope that you will consider us again for your future needs.

Dear Mr White,

We are sorry to inform you that your application for the ABC classic Visa card has not been approved. We hope that you will consider us again for your future needs.

Dear Ms Oh,

We are happy to inform you that your application for the ABC gold Visa card has been approved. Your credit limit is $5,000. We look forward to a great relationship with you.

In the above example, there are two different letter forms: one for notifying that the application has been accepted, and another for notifying that the application has been declined. Each of the letters consists of some canned expressions (e.g., “We hope that you will …”), and templates (e.g., “Dear ,”), where the underlined words show where the slots of a template have been filled with appropriate values.
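The two letter forms described above can be sketched as slot-filling templates. This is a minimal illustration in Python, not the implementation of any system discussed in this paper; the names `LETTER_TEMPLATES` and `render_letter` and the slot names (`title`, `name`, `card`, `limit`) are hypothetical, and the wording is taken from the example letters.

```python
from string import Template

# Hypothetical letter forms; the canned text follows the example letters.
# "$$" is an escaped dollar sign, so "$$$limit" renders as "$" + the limit.
LETTER_TEMPLATES = {
    "declined": Template(
        "Dear $title $name,\n"
        "We are sorry to inform you that your application for the ABC $card "
        "Visa card has not been approved. "
        "We hope that you will consider us again for your future needs."
    ),
    "approved": Template(
        "Dear $title $name,\n"
        "We are happy to inform you that your application for the ABC $card "
        "Visa card has been approved. Your credit limit is $$$limit. "
        "We look forward to a great relationship with you."
    ),
}


def render_letter(letter_type, **slots):
    """Fill the slots of one letter form; a missing slot raises KeyError."""
    return LETTER_TEMPLATES[letter_type].substitute(**slots)


if __name__ == "__main__":
    print(render_letter("approved", title="Ms", name="Oh",
                        card="gold", limit="5,000"))
```

The input is only the letter type and the slot fillers, which is the minimal specification that makes templates attractive. Any case the forms do not anticipate (for example, declining one card while approving another) needs a new template.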

The template-based technique is popular among business applications where similar documents are produced in large quantities (e.g., customer service letters). It is also used often in dialog systems where there is only a finite and limited set of system output sentences.


Although the templates seem to work fine for the above example, the limitations of the template system can be reached quite quickly. Consider, for example, revising the first letter to say that the application for platinum Visa has been declined, but she can instead receive a gold Visa card with the credit limit of $3,000. Using the current templates, the letter would be:

Dear Ms Brown,

We are happy to inform you that your application for the ABC gold Visa card has been approved. Your credit limit is $3,000. We look forward to a great relationship with you.

With some clever rearrangement and deletion of one sentence, the letter can be improved:

Dear Ms Brown,

We are sorry to inform you that your application for the ABC platinum Visa card has not been approved. We are happy to inform you that your application for the ABC gold Visa card has been approved. Your credit limit is $3,000. We look forward to a great relationship with you.

The first two sentences of the letter would be better if aggregated, producing something like this:

Dear Ms Brown,

We are sorry to inform you that your application for the ABC platinum Visa card has not been approved, but we are happy to inform you that your application for the ABC gold Visa card has been approved. Your credit limit is $3,000. We look forward to a great relationship with you.

To do this using templates, however, would require writing one more template for that sentence. We leave it up to the readers to imagine how difficult it would be to extend the template system to create any sophisticated documents. There are, however, some template systems that build in capabilities to connect sentences using the appropriate connectives (see Reiter, 1995). Nevertheless, one can always find the limitations of template-based systems, and building in more and more capabilities to account for the limitations would result in a very complex system, which would lose the advantage over rule-based systems. In principle, a template-based system can generate every sentence a rule-based system can generate, if the set of templates covers all of the possible sentences the grammar rules can generate. Obviously, that is not very practical.

There are, however, advantages of using a template-based system. One is that the input to a template-based system can be minimally specified. In the example above, only the letter type and the slot-fillers (e.g., customer’s name, credit limit) are needed. Compare this to the input representation that a rule-based system would need (see previous section), and one can see that the work has been merely transferred to another part of the application that must produce the rich input representation.

Another advantage of a template-based system is that anyone who can write English (or any other language one wishes to automatically generate) can design a template-based system, and if that person can also program in a computer language, he/she can implement the system.

3.2.3 Hybrid Technique: Rules + Statistics

This technique was developed to overcome the drawback of the rule-based technique that the input needs to be richly specified. The general idea is to use a statistical n-gram language model to compensate for any missing features in the input or any gaps in the knowledge bases. Nitrogen (Langkilde and Knight, 1998), a generation engine within an MT system, is the first to apply this technique. In cases where the input specification is not complete (e.g., verb tense is missing) or there is more than one word to express the concept (i.e., lexical

sentences, it first generates a lattice of possible sentences, and then uses bigram statistics to choose the best one. For example, if the lexicon did not have the plural form of “goose”, then instead of using general English morphology to generate the incorrect phrase “two gooses”, it would use the bigram statistics to generate the correct phrase “two geese”. This certainly adds a robustness to Nitrogen that strictly rule-based systems can only get by carefully designing knowledge bases and input frames.
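The idea of using bigram statistics to choose among candidate sentences can be sketched as follows. This is a simplified illustration, not Nitrogen's actual algorithm: it scores an explicit list of candidates rather than a word lattice, uses add-one smoothing (an assumption made here for simplicity), and all function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def train_bigrams(corpus):
    """Count context words and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return unigrams, bigrams


def log_score(sentence, unigrams, bigrams, vocab_size):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) /
                          (unigrams[prev] + vocab_size))
    return total


def choose_best(candidates, corpus):
    """Return the candidate sentence that the bigram model scores highest."""
    unigrams, bigrams = train_bigrams(corpus)
    vocab = {w for s in corpus for w in s.split()} | {"</s>"}
    vocab_size = len(vocab) + 1  # one extra type reserved for unseen words
    return max(candidates,
               key=lambda c: log_score(c, unigrams, bigrams, vocab_size))


if __name__ == "__main__":
    toy_corpus = ["two geese flew south", "the two geese landed", "a goose landed"]
    print(choose_best(["two gooses landed", "two geese landed"], toy_corpus))
```

Because “two geese” occurs in the toy corpus and “two gooses” does not, the model prefers the candidate containing “geese” without any morphological rule for the irregular plural.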

This and other hybrid systems, such as Busemann and Horacek (1998), which uses templates and grammar rules, combine the advantages of different techniques. In the next section, we will describe our NLG system, which is also a hybrid system.

4 NLG in the Carnegie Mellon Communicator

The Carnegie Mellon Communicator has been in development since the fall of 1998, and at that time, a simple version of the generation module was first designed and implemented. In this section, we first give a general description of the Carnegie Mellon Communicator and where NLG fits in, and then we describe the evolution of our NLG module, from a purely template-based system to the current hybrid system of templates and statistical generation.

4.1 Carnegie Mellon Communicator

The Carnegie Mellon Communicator (Rudnicky et al., 1999) is a spoken dialog system with a telephone interface. It enables users to arrange complex travel itineraries via natural conversation. It is made up of several modules working together to provide a smooth interaction for the users while completing the task of arranging travel to fit the user’s constraints.

The Sphinx-II automatic speech recognizer takes the user’s speech and outputs a string. The Phoenix semantic parser takes the string and turns it into a semantic frame. The dialog manager (Xu and Rudnicky, 2000) handles the semantic frames from the Phoenix parser and interacts with various domain agents such as the date/time module, the user preferences database, and the flight information agent.

(Figure: Input Frame {act query content depart_time depart_date 20000501}; Language Models; Generation; Candidate Utterances: “What time on {depart_date}?”, “At what time would you be leaving {depart_city}?”; Scoring)

When the dialog manager decides that something needs to be communicated to the user, it sends a frame to NLG, which then takes the frame and generates an utterance to be synthesized by the TTS module. The input frame from the dialog manager consists of “act”, which is similar to a speech act, “content”, which, together with “act”, defines the utterance class, or the equivalent of a dialog act, and attribute-value pairs, some of which are hierarchically structured. The input frame, which is an object that the dialog manager and other modules use to communicate, has no linguistic features such as subject, patient, tense, head, etc. An example of an input frame is in Figure 5.
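The input frame described above can be pictured as a small data structure. The sketch below is an assumption-laden illustration in Python (the actual module is written in Perl): the class `InputFrame`, the function `parse_frame`, the flat whitespace-separated frame syntax, and the `act_content` naming of the utterance class are all hypothetical, and hierarchically structured attributes are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class InputFrame:
    """Hypothetical view of a dialog manager frame: no linguistic features."""
    act: str                                   # similar to a speech act
    content: str                               # with act, defines the utterance class
    attributes: dict = field(default_factory=dict)  # attribute-value pairs

    @property
    def utterance_class(self):
        # Assumed naming convention for illustration only.
        return f"{self.act}_{self.content}"


def parse_frame(text):
    """Parse a flat '{key value key value ...}' frame (assumed syntax)."""
    tokens = text.strip().strip("{}").split()
    if len(tokens) % 2 != 0:
        raise ValueError("frame must contain key/value pairs")
    pairs = dict(zip(tokens[0::2], tokens[1::2]))
    act = pairs.pop("act")          # KeyError if the frame has no act
    content = pairs.pop("content")  # KeyError if the frame has no content
    return InputFrame(act, content, pairs)


if __name__ == "__main__":
    frame = parse_frame("{act query content depart_time depart_date 20000501}")
    print(frame.utterance_class, frame.attributes)
```

The point of the sketch is that the frame carries only an act, a content label, and slot values; everything linguistic has to be supplied by the generation module.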

4.2 Evolution of NLG in Communicator

The NLG module communicates with the dialog manager and TTS via socket connections. The module is implemented in Perl. These and other implementation details have not changed much since its inception by Kevin Lenzo.

4.2.1 Template-based system

Our NLG module started off with around 50 templates, including the “say” speech act, which allows the dialog manager to send a template as part of the input frame. This adds flexibility to the system during the development cycle, as the dialog manager has complete control to “add” templates as needed. This functionality is still in the system, but the use of it is limited only to development purposes. This is because the control of the system’s language should reside only in the NLG module. Otherwise, the modularity of our system would be compromised.

The number of templates grew as we added more functionality to our system. The largest expansion came with the addition of the “help” speech act, when we added 16 templates to provide context-sensitive help messages. More information about the actual templates is available in the Appendix. Similar to Axelrod’s (2000) template generation system, which is also designed for a spoken dialog system in the travel reservations domain, our templates are
