UNGRAMMATICALITY AND EXTRA-GRAMMATICALITY IN NATURAL LANGUAGE UNDERSTANDING SYSTEMS


Stan C. Kwasny**, The Ohio State University, Columbus, Ohio

I. Introduction

Among the components included in Natural Language Understanding (NLU) systems is a grammar which specifies much of the linguistic structure of the utterances that can be expected. However, it is certain that inputs that are ill-formed with respect to the grammar will be received, both because people regularly form ungrammatical utterances and because there are a variety of forms that cannot be readily included in current grammatical models and are hence "extra-grammatical".

"... understanding requires, at the very least, some attempt to interpret, rather than merely reject, what seem to be ill-formed utterances." [WIL76]

This paper investigates several language phenomena commonly considered ungrammatical or extra-grammatical and proposes techniques directed at integrating them as much as possible into the conventional grammatical processing performed by NLU systems through Augmented Transition Network (ATN) grammars. For each NLU system, a "normative" grammar is assumed which specifies the structure of well-formed inputs. Rules that are either manually added to the original grammar or automatically constructed during parsing analyze the ill-formed input. The ill-formedness is shown at the completion of a parse by deviance from fully grammatical structures. We have been able to do this processing while preserving the structural characteristics of the original grammar and its inherent efficiency.

[...] considered previously in particular NLU systems; see, for example, the ellipsis handling in LIFER [HEN77]. Some techniques similar to ours have been used for parsing; see, for example, the conjunction mechanism in LUNAR [WOO73]. On the linguistic side, Chomsky [CHO64] and Katz [KAT64], among others, have considered the treatment [...] Weischedel and Black [WEI79]. The present study is distinguished by the range of phenomena considered, its structural and efficiency goals, and the inclusion of the techniques proposed within one implementation.

[...] mechanisms aimed at solving the problems, and describes [...] extensions are suggested. Unless otherwise noted, all ideas have been tested through implementation. A more detailed and extended discussion of all points may be found in Kwasny [KWA79].

II. Language Phenomena

[...] extra-grammatical input depends on two factors. The first is the identification of types of ill-formedness and the patterns they follow. The second is the relating of ill-formed input to the parsing path of a [...] introduces the types of ill-formedness we have studied, [...] structures in terms of ATN grammars.

** Current address: Computer Science Department, Indiana University, Bloomington, Indiana

By Norman K. Sondheimer, Sperry Univac, Blue Bell, Pennsylvania

II.1 Co-Occurrence Violations

Our first class of errors can be connected to co-occurrence restrictions within a sentence. There are many occasions in a sentence where two or more parts must agree (* indicates an ill-formed or ungrammatical sentence):

*Draw a circles
*I will stay from now under midnight

The errors in the above involve coordination between the underlined words. The first example illustrates simple agreement problems. The second involves a complicated relation between at least the three underlined terms. Such phenomena do occur naturally. For example, Shores [SHO77] analyzes fifty-six freshman English papers written by Black college students and reveals patterns of nonstandard usage ranging from uninflected plurals, possessives, and third person singulars to overinflection (use of inappropriate endings).

For co-occurrence violations, the blocks that keep inputs from being parsed as the user intended arise from a failure of a test on an arc or the failure to satisfy an arc type restriction, e.g., failure of a word to be in the correct category. The essential block in the first example would likely occur on an agreement test on an arc accepting a noun. The essential blockage in the second example is likely to come from failure of the arc testing the final preposition.

II.2 Ellipsis and Extraneous Terms

In handling ellipsis, the most relevant distinction to make is between contextual and telegraphic ellipsis.

Contextual ellipsis occurs when a form only makes proper sense in the context of other sentences. For example, the form

*President Carter has

seems ungrammatical without the preceding question form

Who has a daughter named Amy?
President Carter has

Telegraphic ellipsis, on the other hand, occurs when a form only makes proper sense in a particular situation. For example, the forms

3 chairs no waiting (sign in barber shop)
Yanks split (headline in sports section)
Profit margins for each product (query submitted to a NLU system)

[...] noted in parentheses. The final example is from an experimental study of NLU for management information which indicated that such forms must be considered [MAL75].

Another type of ungrammaticality related to ellipsis occurs when the user puts unnecessary words or phrases in an utterance. The reason for an extra word may be a change of intention in the middle of an utterance, an oversight, or simply for emphasis. For example,

*Draw a line with from here to there
*List prices of single unit prices for 72 and 73

The second example comes from Malhotra [MAL75].

The best way to see the errors in terms of the ATN is to think of the user as trying to complete a path through the grammar, but having produced an input that has too many or too few forms necessary to traverse all arcs.

II.3 Conjunction

Conjunction is an extremely common phenomenon, but it is seldom directly treated in a grammar. We have considered several types of conjunction.

Simple forms of conjunction occur most frequently, as in

John loves Mary and hates Sue

Gapping occurs when internal segments of the second conjunct are missing, as in

John loves Mary and Mary John

The list form of conjunction occurs when more than two elements are joined in a single phrase, as in

John loves Mary, Sue, Nancy and Bill

Correlative conjunction occurs in sentences to coordinate the joining of constituents, as in

John both loves and hates Sue

The reason conjuncts are generally left out of grammars is that they can appear in so many places that inclusion would dramatically increase the size of the grammar. The same argument applies to the ungrammatical phenomena. Since they allow so much variation compared to grammatical forms, including them with existing techniques would dramatically increase the size of a grammar. Further, there is a real distinction in terms of completeness and clarity of intent between grammatical and ungrammatical forms. Hence we feel justified in suggesting special techniques for their treatment.

III. Proposed Mechanisms and How They Apply

The following presentation of our techniques assumes an understanding of the ATN model. The techniques are applied to the language phenomena discussed in the previous section.

III.1 Relaxation Techniques

The first two methods described are relaxation methods which allow the successful traversal of ATN arcs that might not otherwise be traversed. During parsing, whenever an arc cannot be taken, a check is made to see if some form of relaxation can apply. If it can, then a backtrack point is created which includes the relaxed version of the arc. These alternatives are not considered until after all possible grammatical paths have been attempted, thereby insuring that grammatical inputs are still handled correctly. Relaxation of previously relaxed arcs is also possible. Two methods of relaxation have been investigated.

Our first method involves relaxing a test on an arc, similar to the method used by Weischedel in [WEI79]. Test relaxation occurs when the test portion of an arc contains a relaxable predicate and the test fails. Two methods of test relaxation have been identified and implemented based on predicate type. Predicates can be designated by the grammar writer as either absolutely violable, in which case the opposite value of the predicate (determined by the LISP function NOT applied to the predicate) is substituted for the predicate during relaxation, or conditionally violable, in which case a substitute predicate is provided. For example, consider the following to be a test that fails:

(AND (INFLECTING V) (INTRANS V))

If the predicate INFLECTING was declared absolutely violable and its use in this test returned the value NIL, then the negation of (INFLECTING V) would replace it in the test, creating a new arc with the test:

(AND T (INTRANS V))

If INTRANS were conditionally violable with the substitute predicate TRANS, then the following test would appear on the new arc:

(AND (INFLECTING V) (TRANS V))

Whenever more than one test in a failing arc is violable, all possible single relaxations are attempted independently. Absolutely violable predicates can be permitted in cases where the test describes some superficial consistency checking or where the test's failure or success doesn't have a direct effect on meaning, while conditionally violable predicates apply to predicates which must be relaxed cautiously or else loss of meaning may result.
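The test relaxation scheme just described can be illustrated with a small Python sketch. This is not the authors' LISP implementation: the tuple encoding of tests, the VIOLABLE table, and the function names are all assumptions made for illustration, and the sketch relaxes every violable conjunct without first checking which one actually failed.

```python
# A conjunct is a tuple such as ("INFLECTING", "V"); a test is the list of
# conjuncts under an implicit AND, mirroring (AND (INFLECTING V) (INTRANS V)).
ABSOLUTE = "absolute"

# Hypothetical declarations a grammar writer might supply.
VIOLABLE = {
    "INFLECTING": ABSOLUTE,                # absolutely violable
    "INTRANS": ("conditional", "TRANS"),   # conditionally violable, with substitute
}

def relax_conjunct(conjunct):
    """Return the relaxed form of one conjunct, or None if it is not violable."""
    name, *args = conjunct
    decl = VIOLABLE.get(name)
    if decl is None:
        return None
    if decl == ABSOLUTE:
        # The opposite value is substituted: wrap the predicate in NOT.
        return ("NOT", conjunct)
    _, substitute = decl
    # A substitute predicate takes the place of the original one.
    return (substitute, *args)

def single_relaxations(test):
    """Build one relaxed test per violable conjunct (single relaxations only)."""
    relaxed = []
    for i, conjunct in enumerate(test):
        replacement = relax_conjunct(conjunct)
        if replacement is not None:
            relaxed.append(test[:i] + [replacement] + test[i + 1:])
    return relaxed

failing_test = [("INFLECTING", "V"), ("INTRANS", "V")]
relaxed_tests = single_relaxations(failing_test)
```

For the failing test above, relaxed_tests holds two alternatives, one with (NOT (INFLECTING V)) and one with (TRANS V). Each would become the test of a separate relaxed arc stored at a backtrack point.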

Chomsky discusses the notion of organizing word categories hierarchically in developing his ideas on degrees of grammaticalness. We have applied and extended these ideas in our second method of relaxation, called category relaxation. In this method, the grammar writer produces, along with the grammar, a hierarchy describing the relationship among words, categories, and phrase types which is utilized by the relaxation mechanism to construct relaxed versions of arcs that have failed. When an arc fails because of an arc type failure (i.e., because a particular word, category, or phrase was not found), a new arc (or arcs) may be created according to the description of the word, category, or phrase in the hierarchy. Typically, PUSH arcs will relax to PUSH arcs, CAT arcs to CAT or PUSH arcs, and WRD or MEM arcs to CAT arcs. Consider, for example, the syntactic category hierarchy for pronouns shown in Figure 1. For this example, the category relaxation [...] pronouns to include the category PRONOUN. The arc produced from category relaxation of PERSONAL pronouns also includes the subcategories REFLEXIVE and DEMONSTRATIVE in order to expand the scope of terms during relaxation. As with test relaxation, successive relaxations could occur.

For both methods of relaxation, "deviance notes" [...] multiple levels of relaxation occur, a note is generated for each of these. The entire list of deviance notes accompanies the final structure produced by the parser. In this way, the final structure is marked as deviant and the nature of the deviance is available for use by other components of the understanding system.

In our implementation, test relaxation has been fully implemented, while category relaxation has been implemented for all cases except those involving PUSH [...] requires a modification to our backtracking algorithm.
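As a rough illustration of category relaxation and its deviance notes, the following Python sketch encodes a pronoun hierarchy as a child-to-parent table and relaxes a failed arc one level up. The PARENT table, the assignment of words to subcategories, the one-level-up policy, and the wording of the note are assumptions for this example rather than details taken from the implementation described here.

```python
# Hypothetical encoding of a pronoun hierarchy: child -> parent.
PARENT = {
    "he": "PERSONAL", "she": "PERSONAL",
    "yourself": "REFLEXIVE",
    "this": "DEMONSTRATIVE", "that": "DEMONSTRATIVE",
    "PERSONAL": "PRONOUN", "REFLEXIVE": "PRONOUN", "DEMONSTRATIVE": "PRONOUN",
}

def words_under(label):
    """Return every word dominated by a category (a word dominates itself)."""
    children = [child for child, parent in PARENT.items() if parent == label]
    if not children:
        return {label}
    found = set()
    for child in children:
        found |= words_under(child)
    return found

def relax_category(arc):
    """Relax a failed (kind, label) arc to a CAT arc on the parent category.

    Returns (new_arc, deviance_note), or None when the label has no parent.
    """
    kind, label = arc
    parent = PARENT.get(label)
    if parent is None:
        return None
    note = f"{kind} {label} relaxed to CAT {parent}"
    return ("CAT", parent), note

# A CAT PERSONAL arc fails on "yourself"; the relaxed arc accepts it.
relaxed_arc, deviance_note = relax_category(("CAT", "PERSONAL"))
```

Here relaxed_arc is ("CAT", "PRONOUN"), whose scope also covers the reflexive and demonstrative words. Calling relax_category again on an already relaxed arc would model successive relaxations, with one note collected per step.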

III.2 Co-Occurrence and Relaxation

The solution being proposed to handle forms that are deviant because of co-occurrence violations centers around the use of relaxation methods. Where simple tests exist within a grammar to filter out unacceptable forms of the type noted above, these tests may be relaxed to allow the acceptance of these forms. This doesn't eliminate the need for such tests, since these tests help in disambiguation and provide a means by which sentences are marked as having violated certain rules.

For co-occurrence violations, the point in the grammar where parsing becomes blocked is often exactly where the test or category violation occurs. An arc at that point is being attempted and fails due to a failure [...] alternative generated which may be explored at a later point via backtracking. For example, the sentence:

*John love Mary

shows a disagreement between the subject (John) and the verb (love). Most probably this would show up during parsing when an arc is attempted which is expecting the verb of the sentence. The test would fail and the traversal would not be allowed. At that point, an [...] backtracking to consider.

III.3 Patterns and the Pattern Arc

In this section, relaxation techniques, as applied to the grammar itself, are introduced through the use of patterns and pattern-matching algorithms. Other systems have used patterns for parsing. We have devised a formalism, patterns which are flexible and useful [...] implemented and are now testing, a pattern is a linear sequence of ATN arcs which is matched against the input string. A pattern arc (PAT) has been added to the ATN formalism whose form is similar to that of other arcs:

(PAT <pat spec> <test> <act>* <term>)

The pattern specification (<pat spec>) is defined as:

<pat spec> ::= (<patt> <mode>*)
<patt>     ::= (<p arc>*)
             | <pat name>
             | >
<mode>     ::= UNANCHOR
             | OPTIONAL
             | SKIP
<p arc>    ::= <arc>
             | > <arc>
<pat name> ::= user-assigned pattern name

The pattern (<patt>) is either the name of a pattern, a ">", or a list of ATN arcs, each of which may be preceded by the symbol ">", while the pattern mode (<mode>) can be any of the keywords UNANCHOR, OPTIONAL, or SKIP. [...] patterns by name, a dictionary of patterns is supported. A dictionary of arcs is also supported, allowing the referencing of arcs by name as well. Further, named arcs are defined as macros, allowing the dictionary and the grammar to be substantially reduced in size.

THE PATTERN MATCHER

Pattern matching proceeds by matching each arc in the pattern against the input string, but is affected by the chosen "mode" of matching. Since the individual component arcs are, in a sense, complex patterns, the ATN interpreter can be considered part of the matching algorithm as well. In arcs within patterns, explicit transfer to a new state is ignored and the next arc attempted on success is the one following in the pattern. An arc in a pattern prefaced by ">" can be considered optional, if the OPTIONAL mode has been selected to activate this feature. When this is done, the matching algorithm still attempts to match optional arcs, but may ignore them. A pattern unanchoring capability is activated by specifying the mode UNANCHOR. In this mode, patterns are permitted to skip words prior [...] results in words being ignored between matches of the arcs within a pattern. This is a generalization of the UNANCHOR mode.

Pattern matching again results in deviance notes. For patterns, they contain information necessary to determine how matching succeeded.
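The effect of the matching modes can be sketched in Python under some simplifying assumptions. Each pattern arc is reduced to a category check against a toy LEXICON, a pattern is a list of (optional, category) pairs, and a match must consume the whole input. The reading of SKIP as "words may be ignored between matches" follows the surviving text, but the details are a guess, and all names here are hypothetical.

```python
# Toy lexicon standing in for full ATN arc evaluation (an assumption).
LEXICON = {"John": "NP", "Mary": "NP", "Sue": "NP",
           "loves": "V", "hates": "V", "uh": "FILLER"}

def match(pattern, words, modes=frozenset(), started=False):
    """Match a list of (optional, category) arcs against a list of words.

    Returns the deviance notes for a successful match, or None on failure.
    """
    if not pattern:
        return [] if not words else None
    (optional, cat), rest = pattern[0], pattern[1:]
    # 1. Always try to match the arc against the next word first.
    if words and LEXICON.get(words[0]) == cat:
        notes = match(rest, words[1:], modes, True)
        if notes is not None:
            return notes
    # 2. In OPTIONAL mode, an arc marked optional may be ignored.
    if optional and "OPTIONAL" in modes:
        notes = match(rest, words, modes, started)
        if notes is not None:
            return [f"arc {cat} omitted"] + notes
    # 3. UNANCHOR skips words before the first match; SKIP skips anywhere.
    can_skip = "SKIP" in modes or ("UNANCHOR" in modes and not started)
    if words and can_skip:
        notes = match(pattern, words[1:], modes, started)
        if notes is not None:
            return [f"word {words[0]!r} skipped"] + notes
    return None

# A pattern as it might be derived from "John loves Mary", every arc optional.
PATTERN = [(True, "NP"), (True, "V"), (True, "NP")]
gapped = match(PATTERN, ["Mary", "John"], {"OPTIONAL"})
```

With OPTIONAL selected, the gapped conjunct "Mary John" matches and the returned notes record that the verb arc was omitted, which is the kind of information another component could use to supply the missing verb.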

SOURCE OF PATTERNS

An automatic pattern generation mechanism has been implemented using the trace of the current execution path to produce a pattern. This is invoked by using a ">" as the pattern name. Patterns produced in this fashion contain only those arcs traversed at the current level of recursion in the network, although we are planning to implement a generalization of this in which [...] subnetwork paths. Each arc in an automatic pattern is marked as optional. Patterns can also be constructed dynamically in precisely the same way grammatical structures are built using BUILDQ. The vehicle by which this is accomplished is discussed next.
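A minimal Python sketch of automatic pattern generation follows, assuming the execution trace is available as a list of (recursion level, arc) pairs. The trace format, the arc labels, and the function name are hypothetical; only the two properties stated above are modelled, namely that arcs come from the current level of recursion and that each is marked optional.

```python
def automatic_pattern(trace, level):
    """Build a pattern from the arcs traversed at one recursion level.

    Each resulting pattern arc is an (optional, arc) pair with optional=True.
    """
    return [(True, arc) for depth, arc in trace if depth == level]

# Hypothetical trace for "John loves Mary": the level-1 entries are arcs
# traversed inside the pushed noun-phrase subnetwork.
trace = [
    (0, "PUSH NP/"), (1, "CAT NPR"),
    (0, "CAT V"),
    (0, "PUSH NP/"), (1, "CAT NPR"),
]
top_level_pattern = automatic_pattern(trace, 0)
```

The resulting top-level pattern keeps the two PUSH arcs and the verb arc but none of the arcs traversed inside the subnetworks.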

AUTOMATIC PRODUCTION OF ARCS

Pattern arcs enter the grammar in two ways. They are manually written into the grammar in those cases where the ungrammaticalities are common, and they are added to the grammar automatically in those cases where the ungrammaticality is dependent on context. Pattern arcs produced dynamically enter the grammar through one of two devices. They may be constructed as needed by [...] use through an expectation mechanism.

As the expectation-based parsing efforts clearly show, syntactic elements, especially words, contain important clues on processing. Indeed, we also have found it useful to make the ATN mechanism more "active" by allowing it to produce new arcs based on such clues. To achieve this, the CAT, MEM, TST, and WRD arcs have been generalized and four new "macro" arcs, known as CAT*, MEM*, TST*, and WRD*, have been added to the ATN formalism. These are similar in every way to their counterparts, except that as a final action, instead of indicating the state to which the traversal leads, a new arc is constructed dynamically and immediately executed. The difference in the form that the new arc takes is seen in the following pair, where <creat act> is used to define the dynamic arc:

(CAT <cat> <test> <act>* <term>)
(CAT* <cat> <test> <act>* <creat act>)

Arcs computed by macro arcs can be of any type permitted by the ATN, but one of the most useful arcs to compute in this manner is the PAT arc discussed above.

EXPECTATIONS

The macro arc forces immediate execution of an arc. Arcs may also be computed and temporarily added to the grammar for later execution through an "expectation" mechanism. Expectations are performed as actions within arcs (analogous to the HOLD action for parsing structures) or as actions elsewhere in the NLU system (e.g., during generation when particular types of responses can be foreseen). Two forms are allowed:

(EXPECT <creat act> <state>)
(EXPECT <creat act>)

In the first case, the arc created is bound to a state as specified. When later processing leads to that state, the expected arc will be attempted as one alternative at that state. In the second case, where no state is specified, the effect is to attempt the arc at every state visited during the parse.

The range of an expectation produced during parsing is ordinarily limited to a single sentence, with the arc disappearing after it has been used; however, the start state, S*, is reserved for expectations intended to be active at the beginning of the next sentence. These will disappear in turn at the end of processing for that sentence.
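The bookkeeping behind expectations might look roughly like the following Python sketch. The Expectations class, its method names, and the string arc labels are assumptions for illustration; the sketch models only the three behaviours stated above: binding an arc to a state, attempting an unbound arc at every state, and holding S* expectations until the next sentence.

```python
ANY_STATE = None     # hypothetical marker for (EXPECT <creat act>) with no state
START_NEXT = "S*"    # reserved start state named in the text

class Expectations:
    def __init__(self):
        self.active = {}    # state (or ANY_STATE) -> expected arcs, this sentence
        self.pending = []   # arcs waiting for the start of the next sentence

    def expect(self, arc, state=ANY_STATE):
        """Record an expected arc, bound to a state or to every state."""
        if state == START_NEXT:
            self.pending.append(arc)
        else:
            self.active.setdefault(state, []).append(arc)

    def alternatives(self, state):
        """Expected arcs to attempt at a state: bound ones, then unbound ones."""
        return self.active.get(state, []) + self.active.get(ANY_STATE, [])

    def consume(self, arc):
        """Remove an expected arc once it has been used."""
        for arcs in self.active.values():
            if arc in arcs:
                arcs.remove(arc)
                return

    def end_sentence(self):
        """Drop this sentence's expectations and activate those queued for S*."""
        self.active = {START_NEXT: self.pending}
        self.pending = []

exp = Expectations()
exp.expect("PAT yes/no answer", START_NEXT)
exp.expect("PAT np fragment", "NP/")
exp.expect("PAT filler")
at_np = exp.alternatives("NP/")
exp.end_sentence()
at_start_of_next = exp.alternatives(START_NEXT)
```

In this run the NP/ state sees both its bound arc and the unbound one, while the yes/no pattern only becomes available at S* after end_sentence is called, and the earlier expectations are gone by then.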

III.4 Patterns, Ellipsis, and Extraneous Forms

The Pattern arc is proposed as the primary mechanism for handling ellipsis and extraneous forms. A Pattern arc can be seen as capturing a single path through a network. The matcher gives some freedom in how that path relates to a string. We propose that the appropriate parsing path through a network relates to an elliptical sentence or one with extra words in the same way. With contextual ellipsis, the relationship will be in having some of the arcs on the correct path not satisfied. In Pattern arcs, these will be represented by arcs marked as optional. With contextual ellipsis, dialogue context will provide the defaults for the missing components. With Pattern arcs, the deviance notes will show what was left out and the other components in the NLU system will be responsible for supplying the values.

The source of patterns for contextual ellipsis is important. In LIFER [HEN77], the previous user input can be seen as a pattern for elliptical processing of the current input. The automatic pattern generator developed here, along with the expectation mechanism, will capture this level of processing. But with the ability to construct arbitrary patterns and to add them to the grammar from other components of the NLU system, our approach can accomplish much more. For example, a question generation routine could add an expectation of a yes/no answer in front of a transformed rephrasing of a question, as in

Did Amy kiss anyone?
Yes, Jimmy was kissed

Patterns for telegraphic ellipsis will have to be added to the grammar manually. Generally, patterns of usage must be identified, say in a study like that of Malhotra, so that appropriate patterns can be constructed. Patterns for extraneous forms will also be added in advance. These will either use the UNANCHOR option in order to skip false starts, or dynamically produced patterns to catch repetitions for emphasis. In general, only a limited number of these patterns should be required. The value of the pattern mechanism here, especially in the case of telegraphic ellipsis, will be in connecting the ungrammatical to grammatical forms.

III.5 Conjunction and Macro Arcs

Pattern arcs are also proposed as the primary mechanism for handling conjunction. The rationale for this is the often noted connection between conjunction and ellipsis; see, for example, Halliday and Hasan [HAL76]. This is clear with gapping, as in the following, where the parentheses show the missing component:

John loves Mary and Mary (loves) John

But it also can be seen with other forms, as in

John loves Mary and (John) hates Sue
John loves Mary, (John loves) Sue, (John loves) Nancy, and (John loves) Bill

Whenever a conjunction is seen, a pattern is developed from the already identified elements and matched against the remaining segments of input. The heuristics for deciding from which level to produce the pattern force the most general interpretation in order to encourage an elliptical reading.

All of the forms of conjunction described above are treated through a globally defined set of "conjunction arcs". (Some restricted cases, such as "and" following "between", have the conjunction built into the grammar.) In general, this set will be made up of macro arcs which compute Pattern arcs. The automatic pattern mechanism is heavily used. With simple conjunctions, the rightmost elements in the patterns are matched. Internal elements in patterns are skipped with gapping. The list form of conjunction can also be handled through the careful construction of dynamic patterns which are then expected at a later point. Correlatives are treated similarly, with expectations based on the dynamic building of patterns.

There are a number of details in our proposal which will not be presented. There are also visible limits [...] it is instructive to compare the proposal to the SYSCONJ facility of Woods [WOO73]. It treats conjunction as [...] allows for sentences such as

He drove his car through and broke a plate glass window

which at best we will accept with a misleading deviance note. However, it cannot handle the obvious elliptical cases, such as gapping, or the tightly constrained cases, [...] investigating the pattern approach.

III.6 Interaction of Techniques

As grammatical processing proceeds, ungrammatical possibilities are continually being suggested from the various mechanisms we have implemented. To coordinate all of these activities, the backtracking mechanism has been improved to keep track of these alternatives. All paths in the original grammar are attempted first. Only when these all fail are the conjunction alternatives and the manually added and dynamically produced [...] alternatives of these sorts connected with a single state can be thought of as a single possibility. A selection mechanism is used to determine which backtrack point among the many potential alternatives is worth exploring next. Currently, we use a method also used by [...] alternative with the longest path length.
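The selection policy described in this section can be sketched in Python as follows. The BacktrackPoint record and its fields are hypothetical, and the choice of the most recently created point among grammatical alternatives is an assumption standing in for ordinary ATN backtracking. Only two rules come from the text: grammatical paths are exhausted first, and the longest path is preferred among the remaining alternatives.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BacktrackPoint:
    state: str
    path_length: int                 # arcs traversed to reach this point
    grammatical: bool                # True if it stems from the original grammar
    notes: List[str] = field(default_factory=list)   # deviance notes so far

def select_next(points: List[BacktrackPoint]) -> Optional[BacktrackPoint]:
    """Choose the next backtrack point to explore, or None if none remain."""
    if not points:
        return None
    grammatical = [p for p in points if p.grammatical]
    if grammatical:
        # Original-grammar paths come first; most recent first is an assumption.
        return grammatical[-1]
    # Otherwise apply the longest-path heuristic to the relaxed alternatives.
    return max(points, key=lambda p: p.path_length)

points = [
    BacktrackPoint("S/", 0, False, ["subject test relaxed"]),
    BacktrackPoint("VP/V", 2, False, ["verb agreement test relaxed"]),
    BacktrackPoint("NP/", 1, True),
]
first = select_next(points)
points.remove(first)
second = select_next(points)
```

In this run the grammatical point at NP/ is explored first, and only then the relaxed alternative with the longer path, which here is the one relaxing the verb test rather than the subject test.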

IV. Conclusion and Open Questions

These results are significant, we believe, because they extend the state of the art in several ways. Most obvious are the following:

The use of the category hierarchy to handle arc type failures;

The use of the pattern mechanism to allow for contextual ellipsis and gapping;

More generally, the use of patterns to allow for many sorts of ellipsis and conjunctions; and

Finally, the orchestration of all of the techniques [...] grammatical alternatives are tried first and no modifications are made to the original grammar, its inherent efficiency and structure are preserved.

IV.1 Open Problems

Various questions for further research have arisen during the course of this work. The most important of these are discussed here.

Better control must be exercised over the selection of viable alternatives when ungrammatical possibilities are being attempted. The longest-path heuristic is somewhat weak. The process that decides this would need to take into consideration, among other things, whether to allow relaxation of a criterion applied to the subject or to the verb in a case where the subject and verb do not agree. The current path length heuristic would always relax the verb, which is clearly not always correct.

No consideration has been given to the possible connection of one error with another. In some cases, one error can lead to or affect another.

[...] considered in this study, for example, idioms, metaphors, incorrect word order, run-together sentences, incorrect punctuation, misspelling, and presuppositional failure. Either little is known about these processes or they have been studied elsewhere independently. In either case, work remains to be done.

V. Acknowledgments

We wish to acknowledge the comments of Ralph Weischedel and Marc Fogel on previous drafts of this paper. Although we would like to blame them, any shortcomings are clearly our own fault.

VI. Bibliography

[CHO64] Chomsky, N., "Degrees of Grammaticalness," in [FOD64], 384-389.

[FOD64] Fodor, J. A. and J. J. Katz, The Structure of Language: Readings in the Philosophy of Language, Prentice-Hall, Englewood Cliffs, New Jersey, 1964.

[HAL76] Halliday, M. A. K. and R. Hasan, Cohesion in English, Longman, London, 1976.

[HEN77] Hendrix, G. G., "The LIFER Manual," Technical [...] Stanford Research Institute, Menlo Park, California, February, 1977.

[KAT64] Katz, J. J., "Semi-Sentences," in [FOD64], 400-416.

[KWA79] Kwasny, S., "Treatment of Ungrammatical and Extragrammatical Phenomena in Natural Language Understanding Systems," PhD dissertation (forthcoming), Ohio State University, 1979.

[MAL75] Malhotra, [...] Management: An Experimental Analysis," MAC TR-146, M.I.T., Cambridge, Ma, February, 1975.

[SHO77] Shores, D. L., "Black English and Black Attitudes," in Papers in Language Variation, D. L. Shores and C. P. Hines (Eds.), The University of Alabama Press, University, Alabama, 1977.

[WEI79] Weischedel, R. M., and J. Black, "Responding to Potentially Unparseable Sentences," manuscript, [...] Delaware, 1979.

[WIL76] Wilks, Y., "Natural Language Understanding Systems Within the A.I. Paradigm: A Survey," American Journal of Computational Linguistics, [...] 1976.

[WOO73] Woods, W. A., "An Experimental Parsing System for Transition Network Grammars," in Natural Language Processing, R. Rustin (Ed.), Algorithmics Press, 1973.

[Figure: a tree rooted at PRONOUN; the recoverable labels are the subcategory REFLEXIVE and the words he, she, yourself, this, that.]

Figure 1. A Category Hierarchy
