Stan C. Kwasny**
The Ohio State University
Columbus, Ohio

Norman K. Sondheimer
Sperry Univac
Blue Bell, Pennsylvania

** Current address: Computer Science Department, Indiana University, Bloomington, Indiana

I. Introduction

Among the components included in Natural Language Understanding (NLU) systems is a grammar which specifies much of the linguistic structure of the utterances that can be expected. However, it is certain that inputs that are ill-formed with respect to the grammar will be received, both because people regularly form ungrammatical utterances and because there are a variety of forms that cannot be readily included in current grammatical models and are hence "extra-grammatical". As Wilks stresses, "... understanding requires, at the very least, some attempt to interpret, rather than merely reject, what seem to be ill-formed utterances." [WIL76]

This paper investigates several language phenomena commonly considered ungrammatical or extra-grammatical and proposes techniques directed at integrating them as much as possible into the conventional grammatical processing performed by NLU systems through Augmented Transition Network (ATN) grammars. For each NLU system, a "normative" grammar is assumed which specifies the structure of well-formed inputs. Rules that are either manually added to the original grammar or automatically constructed during parsing analyze the ill-formed input. The ill-formedness is shown at the completion of a parse by deviance from fully grammatical structures. We have been able to do this processing while preserving the structural characteristics of the original grammar and its inherent efficiency.

Some of these phenomena have been considered previously in particular NLU systems; see, for example, the ellipsis handling in LIFER [HEN77]. Some techniques similar to ours have been used for parsing; see, for example, the conjunction mechanism in LUNAR [WOO73]. On the linguistic side, Chomsky [CHO64] and Katz [KAT64], among others, have considered the treatment of ungrammatical forms. Related work is that of Weischedel and Black [WEI79]. The present study is distinguished by the range of phenomena considered, its structural and efficiency goals, and the inclusion of the techniques proposed within one implementation.

This paper introduces the phenomena, proposes mechanisms aimed at solving the problems, and describes how the mechanisms apply. Finally, some extensions are suggested. Unless otherwise noted, all ideas have been tested through implementation. A more detailed and extended discussion of all points may be found in Kwasny [KWA79].

II. Language Phenomena

Handling ungrammatical and extra-grammatical input depends on two factors. The first is the identification of types of ill-formedness and the patterns they follow. The second is the relating of ill-formed input to the parsing path of a grammatical input. This section introduces the types of ill-formedness we have studied, and relates them to grammatical structures in terms of ATN grammars.
II.1 Co-Occurrence Violations

Our first class of errors can be connected to co-occurrence restrictions within a sentence. There are many occasions in a sentence where two or more parts must agree (* indicates an ill-formed or ungrammatical sentence):

    *Draw a circles.
    *I will stay from now under midnight.

The errors in the above involve coordination between words. The first example illustrates simple agreement problems (between "a" and "circles"). The second involves a complicated relation between at least three terms.

Such phenomena do occur naturally. For example, Shores [SHO77] analyzes fifty-six freshman English papers written by Black college students and reveals patterns of nonstandard usage ranging from uninflected plurals, possessives, and third person singulars to overinflection (use of inappropriate endings).

For co-occurrence violations, the blocks that keep inputs from being parsed as the user intended arise from a failure of a test on an arc or the failure to satisfy an arc type restriction, e.g., failure of a word to be in the correct category. The essential block in the first example would likely occur on an agreement test on an arc accepting a noun. The essential blockage in the second example is likely to come from failure of the arc testing the final preposition.
II.2 Ellipsis and Extraneous Terms

In handling ellipsis, the most relevant distinction to make is between contextual and telegraphic ellipsis.

Contextual ellipsis occurs when a form only makes proper sense in the context of other sentences. For example, the form

    *President Carter has.

seems ungrammatical without the preceding question form

    Who has a daughter named Amy?
    President Carter has.

Telegraphic ellipsis, on the other hand, occurs when a form only makes proper sense in a particular situation. For example, the forms

    3 chairs no waiting (sign in barber shop)
    Yanks split (headline in sports section)
    Profit margins for each product (query submitted to a NLU system)

make sense only in the situations noted in parentheses. The final example is from an experimental study of NLU for management information which indicated that such forms must be considered [MAL75].
Another type of ungrammaticality related to ellipsis occurs when the user puts unnecessary words or phrases in an utterance. The reason for an extra word may be a change of intention in the middle of an utterance, an oversight, or simply emphasis. For example,

    *Draw a line with from here to there.
    *List prices of single unit prices for 72 and 73.

The second example comes from Malhotra [MAL75].

The best way to see the errors in terms of the ATN is to think of the user as trying to complete a path through the grammar, but having produced an input that has too many or too few forms necessary to traverse all arcs.
II.3 Conjunction

Conjunction is an extremely common phenomenon, but it is seldom directly treated in a grammar. We have considered several types of conjunction.

Simple forms of conjunction occur most frequently, as in

    John loves Mary and hates Sue.

Gapping occurs when internal segments of the second conjunct are missing, as in

    John loves Mary and Mary John.

The list form of conjunction occurs when more than two elements are joined in a single phrase, as in

    John loves Mary, Sue, Nancy, and Bill.

Correlative conjunction occurs in sentences to coordinate the joining of constituents, as in

    John both loves and hates Sue.

The reason conjuncts are generally left out of grammars is that they can appear in so many places that inclusion would dramatically increase the size of the grammar. The same argument applies to the ungrammatical phenomena. Since they allow so much variation compared to grammatical forms, including them with existing techniques would dramatically increase the size of a grammar. Further, there is a real distinction in terms of completeness and clarity of intent between grammatical and ungrammatical forms. Hence we feel justified in suggesting special techniques for their treatment.
III. Proposed Mechanisms and How They Apply

The following presentation of our techniques assumes an understanding of the ATN model. The techniques are applied to the language phenomena discussed in the previous section.
III.1 Relaxation Techniques

The first two methods described are relaxation methods which allow the successful traversal of ATN arcs that might not otherwise be traversed. During parsing, whenever an arc cannot be taken, a check is made to see if some form of relaxation can apply. If it can, then a backtrack point is created which includes the relaxed version of the arc. These alternatives are not considered until after all possible grammatical paths have been attempted, thereby ensuring that grammatical inputs are still handled correctly. Relaxation of previously relaxed arcs is also possible. Two methods of relaxation have been investigated.

Our first method involves relaxing a test on an arc, similar to the method used by Weischedel in [WEI79]. Test relaxation occurs when the test portion of an arc contains a relaxable predicate and the test fails. Two methods of test relaxation have been identified and implemented based on predicate type. Predicates can be designated by the grammar writer as either absolutely violable, in which case the opposite value of the predicate (determined by the LISP function NOT applied to the predicate) is substituted for the predicate during relaxation, or conditionally violable, in which case a substitute predicate is provided. For example, consider the following to be a test that fails:

    (AND (INFLECTING V) (INTRANS V))

If the predicate INFLECTING was declared absolutely violable and its use in this test returned the value NIL, then the negation of (INFLECTING V) would replace it in the test, creating a new arc with the test:

    (AND T (INTRANS V))

If INTRANS were conditionally violable with the substitute predicate TRANS, then the following test would appear on the new arc:

    (AND (INFLECTING V) (TRANS V))

Whenever more than one test in a failing arc is violable, all possible single relaxations are attempted independently. Absolutely violable predicates can be permitted in cases where the test describes some superficial consistency checking or where the test's failure or success doesn't have a direct effect on meaning, while conditionally violable predicates apply to predicates which must be relaxed cautiously or else loss of meaning may result.
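
As an illustration only, the two kinds of test relaxation can be sketched in Python. The table VIOLABLE, the function relax_test, and the wording of the notes are hypothetical names introduced for this sketch; they are not taken from the implementation described here. Tests are simplified to flat conjunctions of predicate names, and only conjuncts that actually failed are relaxed, which is one reading of the description above.

```python
from typing import Dict, List, Optional, Tuple

# A test is simplified to a flat conjunction of predicate names, standing in
# for a form such as (AND (INFLECTING V) (INTRANS V)).
Conjunction = Tuple[str, ...]

# Hypothetical declarations by the grammar writer (an assumption of this sketch):
#   None   -> absolutely violable: the failed predicate is negated
#   "NAME" -> conditionally violable: NAME is the substitute predicate
VIOLABLE: Dict[str, Optional[str]] = {"INFLECTING": None, "INTRANS": "TRANS"}


def relax_test(test: Conjunction,
               values: Dict[str, bool]) -> List[Tuple[Conjunction, str]]:
    """Return every single relaxation of a failed test, each with a note.

    `values` maps each predicate to the value it returned on the failed arc.
    """
    relaxed: List[Tuple[Conjunction, str]] = []
    for i, pred in enumerate(test):
        if pred not in VIOLABLE or values[pred]:
            continue  # not relaxable, or this conjunct did not fail
        substitute = VIOLABLE[pred]
        if substitute is None:
            # The negation of a predicate that returned NIL is true,
            # so the conjunct is written as T.
            new_pred, note = "T", f"violated {pred}"
        else:
            new_pred, note = substitute, f"substituted {substitute} for {pred}"
        relaxed.append((test[:i] + (new_pred,) + test[i + 1:], note))
    return relaxed


alternatives = relax_test(("INFLECTING", "INTRANS"),
                          {"INFLECTING": False, "INTRANS": False})
for new_test, note in alternatives:
    print("(AND " + " ".join(new_test) + ")", "--", note)
```

Each relaxation is produced independently of the others, so a test with two violable conjuncts yields two alternative arcs rather than one doubly relaxed arc.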
Chomsky discusses the notion of organizing word categories hierarchically in developing his ideas on degrees of grammaticalness. We have applied and extended these ideas in our second method of relaxation, called category relaxation. In this method, the grammar writer produces, along with the grammar, a hierarchy describing the relationship among words, categories, and phrase types which is utilized by the relaxation mechanism to construct relaxed versions of arcs that have failed. When an arc fails because of an arc type failure (i.e., because a particular word, category, or phrase was not found), a new arc (or arcs) may be created according to the description of the word, category, or phrase in the hierarchy. Typically, PUSH arcs will relax to PUSH arcs, CAT arcs to CAT or PUSH arcs, and WRD or MEM arcs to CAT arcs. Consider, for example, the syntactic category hierarchy for pronouns shown in Figure 1. For this example, the category relaxation mechanism would allow an arc for PERSONAL pronouns to be relaxed to include the category PRONOUN. The arc produced from category relaxation of PERSONAL pronouns also includes the subcategories REFLEXIVE and DEMONSTRATIVE in order to expand the scope of terms during relaxation. As with test relaxation, successive relaxations could occur.
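
A minimal sketch of category relaxation over the pronoun hierarchy of Figure 1 follows. The child-to-parent dictionary PARENT and the helper names are assumptions made for illustration; the paper does not specify how the hierarchy is stored.

```python
from typing import Dict, List

# Hypothetical encoding of the Figure 1 hierarchy as child -> parent links.
PARENT: Dict[str, str] = {
    "he": "PERSONAL",
    "she": "PERSONAL",
    "yourself": "REFLEXIVE",
    "this": "DEMONSTRATIVE",
    "that": "DEMONSTRATIVE",
    "PERSONAL": "PRONOUN",
    "REFLEXIVE": "PRONOUN",
    "DEMONSTRATIVE": "PRONOUN",
}


def relaxation_chain(item: str) -> List[str]:
    """Successively more general categories for a word or category."""
    chain: List[str] = []
    while item in PARENT:
        item = PARENT[item]
        chain.append(item)
    return chain


def words_under(category: str) -> List[str]:
    """All words accepted by an arc for `category` after relaxation."""
    children = [child for child, parent in PARENT.items() if parent == category]
    if not children:
        return [category]  # a node without children is treated as a word
    words: List[str] = []
    for child in children:
        words.extend(words_under(child))
    return words


# A WRD arc for "he" relaxes first to PERSONAL and then to PRONOUN.
print(relaxation_chain("he"))
# Relaxing PERSONAL to PRONOUN widens the set of acceptable words.
print(words_under("PERSONAL"), "->", words_under("PRONOUN"))
```

Walking the chain one step at a time corresponds to successive relaxations; stopping after the first step gives the single relaxation described for PERSONAL pronouns.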
For both methods of relaxation, "deviance notes" are generated which record the relaxation that was applied. Where multiple levels of relaxation occur, a note is generated for each of these. The entire list of deviance notes accompanies the final structure produced by the parser. In this way, the final structure is marked as deviant and the nature of the deviance is available for use by other components of the understanding system.

In our implementation, test relaxation has been fully implemented, while category relaxation has been implemented for all cases except those involving PUSH arcs. Handling these requires a modification to our backtracking algorithm.
III.2 Co-Occurrence and Relaxation

The solution being proposed to handle forms that are deviant because of co-occurrence violations centers around the use of relaxation methods. Where simple tests exist within a grammar to filter out unacceptable forms of the type noted above, these tests may be relaxed to allow the acceptance of these forms. This doesn't eliminate the need for such tests, since these tests help in disambiguation and provide a means by which sentences are marked as having violated certain rules.

For co-occurrence violations, the point in the grammar where parsing becomes blocked is often exactly where the test or category violation occurs. An arc at that point is being attempted and fails due to a failure of a test or category. Relaxation is then applied and an alternative generated which may be explored at a later point via backtracking. For example, the sentence:

    *John love Mary.

shows a disagreement between the subject (John) and the verb (love). Most probably this would show up during parsing when an arc is attempted which is expecting the verb of the sentence. The test would fail and the traversal would not be allowed. At that point, an ungrammatical alternative is created for backtracking to consider.
III.3 Patterns and the Pattern Arc

In this section, relaxation techniques, as applied to the grammar itself, are introduced through the use of patterns and pattern-matching algorithms. Other systems have used patterns for parsing. We have devised a method of integrating, within the ATN formalism, patterns which are flexible and useful. In the formulation we have implemented and are now testing, a pattern is a linear sequence of ATN arcs which is matched against the input string. A pattern arc (PAT) has been added to the ATN formalism whose form is similar to that of other arcs:

    (PAT <pat spec> <test> <act>* <term>)

The pattern specification (<pat spec>) is defined as:

    <pat spec> ::= (<patt> <mode>*)
    <patt>     ::= (<p arc>*)
                 | <pat name>
                 | >
    <mode>     ::= UNANCHOR
                 | OPTIONAL
                 | SKIP
    <p arc>    ::= <arc>
                 | > <arc>
    <pat name> ::= user-assigned pattern name

The pattern (<patt>) is either the name of a pattern, a ">", or a list of ATN arcs, each of which may be preceded by the symbol ">", while the pattern mode (<mode>) can be any of the keywords UNANCHOR, OPTIONAL, or SKIP. To allow the referencing of patterns by name, a dictionary of patterns is supported. A dictionary of arcs is also supported, allowing the referencing of arcs by name as well. Further, named arcs are defined as macros, allowing the dictionary and the grammar to be substantially reduced in size.
THE PATTERN MATCHER

Pattern matching proceeds by matching each arc in the pattern against the input string, but is affected by the chosen "mode" of matching. Since the individual component arcs are, in a sense, complex patterns, the ATN interpreter can be considered part of the matching algorithm as well. In arcs within patterns, explicit transfer to a new state is ignored and the next arc attempted on success is the one following in the pattern. An arc in a pattern prefaced by ">" can be considered optional, if the OPTIONAL mode has been selected to activate this feature. When this is done, the matching algorithm still attempts to match optional arcs, but may ignore them. A pattern unanchoring capability is activated by specifying the mode UNANCHOR. In this mode, patterns are permitted to skip words prior to matching. The SKIP mode results in words being ignored between matches of the arcs within a pattern. This is a generalization of the UNANCHOR mode.

Pattern matching again results in deviance notes. For patterns, they contain information necessary to determine how matching succeeded.
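
The effect of the three modes can be sketched with a small backtracking matcher. This is a simplification under stated assumptions: each pattern arc is reduced to a name, a predicate on a single word, and an optional flag (standing in for the ">" prefix), whereas real ATN arcs carry tests, actions, and may consume whole phrases. The function match, the note strings, and the tiny word lists are all hypothetical. UNANCHOR is read here as permission to skip words before the first arc of the pattern, and the matcher stops once every arc is accounted for, leaving any remaining words to the rest of the parse.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

# (arc name, predicate accepting one word, marked optional with ">")
PArc = Tuple[str, Callable[[str], bool], bool]


def match(pattern: Sequence[PArc], words: Sequence[str], modes: Set[str],
          p: int = 0, w: int = 0) -> Optional[List[str]]:
    """Return the deviance notes of the first successful match, else None."""
    if p == len(pattern):
        return []  # every arc in the pattern has been accounted for
    name, accepts, optional = pattern[p]
    # 1. Always try to match the arc against the current word first.
    if w < len(words) and accepts(words[w]):
        rest = match(pattern, words, modes, p + 1, w + 1)
        if rest is not None:
            return rest
    # 2. OPTIONAL mode: an arc marked ">" may be ignored.
    if optional and "OPTIONAL" in modes:
        rest = match(pattern, words, modes, p + 1, w)
        if rest is not None:
            return [f"omitted {name}"] + rest
    # 3. UNANCHOR skips words before the first arc; SKIP also between arcs.
    may_skip = "SKIP" in modes or ("UNANCHOR" in modes and p == 0)
    if may_skip and w < len(words):
        rest = match(pattern, words, modes, p, w + 1)
        if rest is not None:
            return [f"skipped word {words[w]!r}"] + rest
    return None


DET: PArc = ("DET", lambda word: word in {"a", "the"}, True)
NOUN: PArc = ("NOUN", lambda word: word in {"dog", "line"}, False)
VERB: PArc = ("VERB", lambda word: word in {"barks"}, False)
PATTERN = [DET, NOUN, VERB]

print(match(PATTERN, ["dog", "barks"], {"OPTIONAL"}))
print(match(PATTERN, ["uh", "the", "dog", "barks"], {"UNANCHOR"}))
print(match(PATTERN, ["the", "dog", "loudly", "barks"], {"SKIP"}))
print(match(PATTERN, ["the", "dog", "loudly", "barks"], set()))
```

Because a direct match is always tried before an arc is ignored or a word is skipped, a fully grammatical input produces an empty list of notes.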
SOURCE OF PATTERNS

An automatic pattern generation mechanism has been implemented using the trace of the current execution path to produce a pattern. This is invoked by using a ">" as the pattern name. Patterns produced in this fashion contain only those arcs traversed at the current level of recursion in the network, although we are planning to implement a generalization of this in which subnetwork paths are also included. Each arc in an automatic pattern is marked as optional. Patterns can also be constructed dynamically in precisely the same way grammatical structures are built using BUILDQ. The vehicle by which this is accomplished is discussed next.
AUTOMATIC PRODUCTION OF ARCS

Pattern arcs enter the grammar in two ways. They are manually written into the grammar in those cases where the ungrammaticalities are common, and they are added to the grammar automatically in those cases where the ungrammaticality is dependent on context. Pattern arcs produced dynamically enter the grammar through one of two devices. They may be constructed as needed by macro arcs, or they may be constructed for later use through an expectation mechanism.
As the expectation-based parsing efforts clearly show, syntactic elements, especially words, contain important clues on processing. Indeed, we also have found it useful to make the ATN mechanism more "active" by allowing it to produce new arcs based on such clues. To achieve this, the CAT, MEM, TST, and WRD arcs have been generalized and four new "macro" arcs, known as CAT*, MEM*, TST*, and WRD*, have been added to the ATN formalism. These are similar in every way to their counterparts, except that as a final action, instead of indicating the state to which the traversal leads, a new arc is constructed dynamically and immediately executed. The difference in the form that the new arc takes is seen in the following pair, where <creat act> is used to define the dynamic arc:

    (CAT <cat> <test> <act>* <term>)
    (CAT* <cat> <test> <act>* <creat act>)

Arcs computed by macro arcs can be of any type permitted by the ATN, but one of the most useful arcs to compute in this manner is the PAT arc discussed above.
EXPECTATIONS

The macro arc forces immediate execution of an arc. Arcs may also be computed and temporarily added to the grammar for later execution through an "expectation" mechanism. Expectations are performed as actions within arcs (analogous to the HOLD action for parsing structures) or as actions elsewhere in the NLU system (e.g., during generation when particular types of responses can be foreseen). Two forms are allowed:

    (EXPECT <creat act> <state>)
    (EXPECT <creat act>)

In the first case, the arc created is bound to a state as specified. When later processing leads to that state, the expected arc will be attempted as one alternative at that state. In the second case, where no state is specified, the effect is to attempt the arc at every state visited during the parse.

The range of an expectation produced during parsing is ordinarily limited to a single sentence, with the arc disappearing after it has been used; however, the start state, S*, is reserved for expectations intended to be active at the beginning of the next sentence. These will disappear in turn at the end of processing for that sentence.
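
The scoping rules for expectations can be sketched as a small registry. The class Expectations, its method names, the state names "S", "NP/HEAD", and "VP", and the use of strings in place of constructed arcs are all assumptions made for this illustration. Removal of an arc after it has been used is not modelled.

```python
from collections import defaultdict
from typing import Dict, List, Optional

NEXT_SENTENCE = "S*"  # reserved state: active at the start of the next sentence
START_STATE = "S"     # hypothetical name of the grammar's start state


class Expectations:
    """Expected arcs, kept per state for the duration of one sentence."""

    def __init__(self) -> None:
        # The key None holds arcs expected at every state.
        self.current: Dict[Optional[str], List[str]] = defaultdict(list)
        self.pending: List[str] = []  # arcs bound to S*

    def expect(self, arc: str, state: Optional[str] = None) -> None:
        """Model (EXPECT arc state) or, with no state, (EXPECT arc)."""
        if state == NEXT_SENTENCE:
            self.pending.append(arc)
        else:
            self.current[state].append(arc)

    def arcs_at(self, state: str) -> List[str]:
        """Expected arcs to try as extra alternatives at `state`."""
        return self.current[state] + self.current[None]

    def end_sentence(self) -> None:
        """Drop this sentence's expectations and activate the S* ones."""
        self.current = defaultdict(list)
        self.current[START_STATE] = self.pending
        self.pending = []


exp = Expectations()
exp.expect("PAT-1", "NP/HEAD")     # bound to one state
exp.expect("PAT-2")                # attempted at every state
exp.expect("PAT-3", NEXT_SENTENCE)
print(exp.arcs_at("NP/HEAD"), exp.arcs_at("VP"))
exp.end_sentence()
print(exp.arcs_at(START_STATE), exp.arcs_at("VP"))
```

A second call to end_sentence leaves the registry empty, matching the statement that expectations bound to S* disappear at the end of processing for the sentence in which they become active.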
III.4 Patterns, Ellipsis, and Extraneous Forms

The Pattern arc is proposed as the primary mechanism for handling ellipsis and extraneous forms. A Pattern arc can be seen as capturing a single path through a network. The matcher gives some freedom in how that path relates to a string. We propose that the appropriate parsing path through a network relates to an elliptical sentence or one with extra words in the same way. With contextual ellipsis, the relationship will be in having some of the arcs on the correct path not satisfied. In Pattern arcs, these will be represented by arcs marked as optional. With contextual ellipsis, dialogue context will provide the defaults for the missing components. With Pattern arcs, the deviance notes will show what was left out and the other components in the NLU system will be responsible for supplying the values.
The source of patterns for contextual ellipsis is important. In LIFER [HEN77], the previous user input can be seen as a pattern for elliptical processing of the current input. The automatic pattern generator developed here, along with the expectation mechanism, will capture this level of processing. But with the ability to construct arbitrary patterns and to add them to the grammar from other components of the NLU system, our approach can accomplish much more. For example, a question generation routine could add an expectation of a yes/no answer in front of a transformed rephrasing of a question, as in

    Did Amy kiss anyone?
    Yes, Jimmy was kissed.

Patterns for telegraphic ellipsis will have to be added to the grammar manually. Generally, patterns of usage must be identified, say in a study like that of Malhotra, so that appropriate patterns can be constructed. Patterns for extraneous forms will also be added in advance. These will either use the UNANCHOR option in order to skip false starts, or dynamically produced patterns to catch repetitions for emphasis. In general, only a limited number of these patterns should be required. The value of the pattern mechanism here, especially in the case of telegraphic ellipsis, will be in connecting the ungrammatical to grammatical forms.
III.5 Conjunction and Macro Arcs

Pattern arcs are also proposed as the primary mechanism for handling conjunction. The rationale for this is the often noted connection between conjunction and ellipsis; see, for example, Halliday and Hasan [HAL76]. This is clear with gapping, as in the following, where the parentheses show the missing component:

    John loves Mary and Mary (loves) John.

But it also can be seen with other forms, as in

    John loves Mary and (John) hates Sue.
    John loves Mary, (John loves) Sue, (John loves) Nancy, and (John loves) Bill.

Whenever a conjunction is seen, a pattern is developed from the already identified elements and matched against the remaining segments of input. The heuristics for deciding from which level to produce the pattern force the most general interpretation in order to encourage an elliptical reading.

All of the forms of conjunction described above are treated through a globally defined set of "conjunction arcs". (Some restricted cases, such as "and" following "between", have the conjunction built into the grammar.) In general, this set will be made up of macro arcs which compute Pattern arcs. The automatic pattern mechanism is heavily used. With simple conjunctions, the rightmost elements in the patterns are matched. Internal elements in patterns are skipped with gapping. The list form of conjunction can also be handled through the careful construction of dynamic patterns which are then expected at a later point. Correlatives are treated similarly, with expectations based on the dynamic building of patterns.
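
The idea of developing a pattern from the elements already identified and matching it against the remaining input can be sketched for the simple and gapping cases. Everything here is a toy assumption: the three-category lexicon, the functions pattern_from and match_conjunct, and the greedy left-to-right matching. In particular, filling an omitted element with the earlier word (shown in parentheses, as in the examples above) is done here only to make the result visible; in the approach described, the deviance notes record the omission and other components supply the values.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mini-lexicon mapping words to categories.
LEXICON: Dict[str, str] = {
    "John": "NP", "Mary": "NP", "Sue": "NP",
    "loves": "V", "hates": "V",
}


def pattern_from(parsed: List[str]) -> List[Tuple[str, str]]:
    """Build an all-optional pattern of (category, earlier word) pairs
    from the words of the first conjunct."""
    return [(LEXICON[word], word) for word in parsed]


def match_conjunct(pattern: List[Tuple[str, str]],
                   rest: List[str]) -> Optional[List[str]]:
    """Match the words after the conjunction against the pattern.

    Arcs with no matching word are treated as omitted and shown in
    parentheses. Returns None if some input words are left unmatched.
    """
    filled: List[str] = []
    i = 0
    for category, earlier_word in pattern:
        if i < len(rest) and LEXICON.get(rest[i]) == category:
            filled.append(rest[i])
            i += 1
        else:
            filled.append(f"({earlier_word})")
    return filled if i == len(rest) else None


first_conjunct = ["John", "loves", "Mary"]
gapped = match_conjunct(pattern_from(first_conjunct), ["Mary", "John"])
simple = match_conjunct(pattern_from(first_conjunct), ["hates", "Sue"])
print(" ".join(gapped))
print(" ".join(simple))
```

In the gapped case the internal element of the pattern is the one left unmatched, and in the simple case only the rightmost elements are matched, which agrees with the description of the two forms above.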
There are a number of details in our proposal which will not be presented. There are also visible limits. It is instructive to compare the proposal to the SYSCONJ facility of Woods [WOO73]. Its treatment of conjunction allows for sentences such as

    He drove his car through and broke a plate glass window.

which at best we will accept with a misleading deviance note. However, it cannot handle the obvious elliptical cases, such as gapping, or the tightly constrained cases. We are continuing to investigate the pattern approach.
III.6 Interaction of Techniques

As grammatical processing proceeds, ungrammatical possibilities are continually being suggested from the various mechanisms we have implemented. To coordinate all of these activities, the backtracking mechanism has been improved to keep track of these alternatives. All paths in the original grammar are attempted first. Only when these all fail are the conjunction alternatives and the manually added and dynamically produced alternatives attempted. All alternatives of these sorts connected with a single state can be thought of as a single possibility. A selection mechanism is used to determine which backtrack point among the many potential alternatives is worth exploring next. Currently, we use a method, also used in other work, of selecting the alternative with the longest path length.
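
The selection policy just described can be sketched as follows. The record BacktrackPoint, the function select_next, and the choice of the most recently created point among grammatical alternatives (ordinary depth-first backtracking) are assumptions of this sketch; only the two rules "grammatical paths first" and "longest path among the rest" come from the text.

```python
from typing import List, NamedTuple, Optional


class BacktrackPoint(NamedTuple):
    path_length: int   # number of arcs traversed to reach this point
    grammatical: bool  # True if the alternative is in the original grammar
    label: str


def select_next(points: List[BacktrackPoint]) -> Optional[BacktrackPoint]:
    """Remove and return the backtrack point to explore next."""
    if not points:
        return None
    grammatical = [point for point in points if point.grammatical]
    if grammatical:
        # Assumed policy: resume the most recently created grammatical point.
        best = grammatical[-1]
    else:
        # Ungrammatical alternatives: prefer the longest path so far.
        best = max(points, key=lambda point: point.path_length)
    points.remove(best)
    return best


agenda = [
    BacktrackPoint(3, False, "relax agreement test on verb arc"),
    BacktrackPoint(1, True, "next arc at start state"),
    BacktrackPoint(5, False, "pattern arc skipping a word"),
]
order = []
while agenda:
    order.append(select_next(agenda).label)
print(order)
```

Replacing the key function passed to max is the natural place to experiment with selection criteria other than path length.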
IV. Conclusion and Open Questions

These results are significant, we believe, because they extend the state of the art in several ways. Most obvious are the following:

    The use of the category hierarchy to handle arc type failures;

    The use of the pattern mechanism to allow for contextual ellipsis and gapping;

    More generally, the use of patterns to allow for many sorts of ellipsis and conjunctions; and

    Finally, the orchestration of all of the techniques so that, because grammatical alternatives are tried first and no modifications are made to the original grammar, its inherent efficiency and structure are preserved.
IV.1 Open Problems

Various questions for further research have arisen during the course of this work. The most important of these are discussed here.

Better control must be exercised over the selection of viable alternatives when ungrammatical possibilities are being attempted. The longest-path heuristic is somewhat weak. The process that decides this would need to take into consideration, among other things, whether to allow relaxation of a criterion applied to the subject or to the verb in a case where the subject and verb do not agree. The current path length heuristic would always relax the verb, which is clearly not always correct.

No consideration has been given to the possible connection of one error with another. In some cases, one error can lead to or affect another.

Other types of ill-formedness were not considered in this study, for example, idioms, metaphors, incorrect word order, run together sentences, incorrect punctuation, misspelling, and presuppositional failure. Either little is known about these processes or they have been studied elsewhere independently. In either case, work remains to be done.
V. Acknowledgments

We wish to acknowledge the comments of Ralph Weischedel and Marc Fogel on previous drafts of this paper. Although we would like to blame them, any shortcomings are clearly our own fault.
VI. Bibliography

[CHO64] Chomsky, N., "Degrees of Grammaticalness," in [FOD64], 384-389.

[FOD64] Fodor, J. A. and J. J. Katz, The Structure of Language: Readings in the Philosophy of Language, Prentice-Hall, Englewood Cliffs, New Jersey, 1964.

[HAL76] Halliday, M. A. K. and R. Hasan, Cohesion in English, Longman, London, 1976.

[HEN77] Hendrix, G. G., "The LIFER Manual," Technical Note, Stanford Research Institute, Menlo Park, California, February 1977.

[KAT64] Katz, J. J., "Semi-Sentences," in [FOD64], 400-416.

[KWA79] Kwasny, S., "Treatment of Ungrammatical and Extragrammatical Phenomena in Natural Language Understanding Systems," PhD dissertation (forthcoming), Ohio State University, 1979.

[MAL75] Malhotra, A., "... Management: An Experimental Analysis," MAC TR-146, MIT, Cambridge, MA, February 1975.

[SHO77] Shores, D. L., "Black English and Black Attitudes," in Papers in Language Variation, D. L. Shores and C. P. Hines (Eds.), The University of Alabama Press, University, Alabama, 1977.

[WEI79] Weischedel, R. M. and J. Black, "Responding to Potentially Unparseable Sentences," manuscript, University of Delaware, 1979.

[WIL76] Wilks, Y., "Natural Language Understanding Systems Within the A.I. Paradigm: A Survey," American Journal of Computational Linguistics, 1976.

[WOO73] Woods, W. A., "An Experimental Parsing System for Transition Network Grammars," in Natural Language Processing, R. Rustin (Ed.), Algorithmics Press, 1973.
                     PRONOUN
            /           |            \
      PERSONAL      REFLEXIVE     DEMONSTRATIVE
       /    \           |           /      \
     he     she      yourself     this     that

Figure 1. A Category Hierarchy