ON REPRESENTING GOVERNED PREPOSITIONS AND
HANDLING "INCORRECT" AND NOVEL PREPOSITIONS
Hatte R. Blejer, Sharon Flank, and Andrew Kehler
SRA Corporation
2000 15th St. North
Arlington, VA 22201, USA
ABSTRACT
NLP systems, in order to be robust, must handle novel and ill-formed input. One common type of error involves the use of non-standard prepositions to mark arguments. In this paper, we argue that such errors can be handled in a systematic fashion, and that a system designed to handle them offers other advantages. We offer a classification scheme for preposition usage errors. Further, we show how the knowledge representation employed in the SRA NLP system facilitates handling these data.
1.0 INTRODUCTION
It is well known that NLP systems, in order to be robust, must handle ill-formed input. One common type of error involves the use of non-standard prepositions to mark arguments. In this paper, we argue that such errors can be handled in a systematic fashion, and that a system designed to handle them offers other advantages.
The examples of non-standard prepositions we present in this paper are taken from colloquial language, both written and oral. The type of error these examples represent is quite frequent in colloquial written language, and the frequency of such examples rises sharply in evolving sub-languages and in oral colloquial language. In developing an NLP system to be used by various U.S. government customers, we have become sensitized to the need to handle variation and innovation in preposition usage. Handling this type of variation or innovation is part of our overall capability to handle novel predicates, which are frequent in sub-languages. Novel predicates created for sub-languages are less "stable" in how they mark arguments (ARGUMENT MAPPING) than the general English "core" predicates which speakers learn as children. The eventual advent of successful speech understanding systems can be expected to further emphasize the need to handle this and other variation.
The NLP system under development at SRA incorporates a Natural Language Knowledge Base (NLKB), a major part of which consists of objects representing SEMANTIC PREDICATE CLASSES. The system uses hierarchical knowledge sources: all general "class-level" characteristics of a semantic predicate class, including the number, type, and marking of its arguments, are put in the NLKB. This leads to increased efficiency in a number of system aspects; e.g., the lexicon is more compact and easier to modify, since it contains only idiosyncratic information. This representation allows us to distinguish between lexically and semantically determined ARGUMENT MAPPING and to formulate general class-level constraint relaxation mechanisms.
1.1 CLASSIFYING PREPOSITION USAGE
Preposition usage in English in positions governed by predicating elements, whether adjectival, verbal, or nominal, may be classified as (1) lexically determined, (2) syntactically determined, or (3) semantically determined. Examples are:
LEXICALLY DETERMINED: laugh at, afraid of
SYNTACTICALLY DETERMINED: by in passive sentences
SEMANTICALLY DETERMINED: move to/from
Preposition usage in idiomatic phrases is also considered to be lexically determined, e.g., with respect to.
1.2 A TYPOLOGY OF ERRORS IN PREPOSITION USAGE
We have classified our corpus of examples of the use of non-standard prepositions into the following categories: (1) substitution of a semantically appropriate preposition, either from the same class or another, for a semantically determined one; (2) substitution of a semantically appropriate preposition for a lexically determined one; (3) false starts; (4) blends; and (5) substitution of a semantically appropriate preposition for a syntactically determined one. A small percentage of the non-standard use of prepositions appears to be random.
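A minimal Python sketch of how the two classifications in Sections 1.1 and 1.2 relate, assuming hypothetical names (`Determination`, `ErrorType`, `classify_substitution`) that do not come from the SRA system: three of the five error types are keyed by how the replaced preposition is determined, while false starts and blends cannot be identified from that information alone.

```python
from enum import Enum
from typing import Optional


class Determination(Enum):
    """How the standard preposition for an argument is fixed (Section 1.1)."""
    LEXICAL = "lexical"        # laugh at, afraid of, with respect to
    SYNTACTIC = "syntactic"    # by in passives
    SEMANTIC = "semantic"      # move to/from


class ErrorType(Enum):
    """Categories of non-standard preposition use (Section 1.2)."""
    SEMANTIC_FOR_SEMANTIC = 1
    SEMANTIC_FOR_LEXICAL = 2
    FALSE_START = 3
    BLEND = 4
    SEMANTIC_FOR_SYNTACTIC = 5


# Substitution errors are keyed by how the replaced preposition was determined.
_SUBSTITUTION = {
    Determination.SEMANTIC: ErrorType.SEMANTIC_FOR_SEMANTIC,
    Determination.LEXICAL: ErrorType.SEMANTIC_FOR_LEXICAL,
    Determination.SYNTACTIC: ErrorType.SEMANTIC_FOR_SYNTACTIC,
}


def classify_substitution(standard: Determination,
                          substitute_is_semantically_appropriate: bool) -> Optional[ErrorType]:
    """Return the substitution category, or None when the error must be
    diagnosed some other way (false start, blend, or random use)."""
    if not substitute_is_semantically_appropriate:
        return None
    return _SUBSTITUTION[standard]
```

For example, aspiring FOR (standard to, lexically determined) falls under `ErrorType.SEMANTIC_FOR_LEXICAL`.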
1.3 COMPUTATIONAL APPLICATIONS OF THIS WORK
In a theoretical linguistics forum (Blejer and Flank 1988), we argued that these examples of the use of non-standard prepositions to mark arguments (1) represent the kind of principled variation that underlies language change, and (2) support a semantic analysis of government that utilizes thematic roles. We cited other evidence for the semantic basis of prepositional case marking from studies of language dysfunction (Aitchison 1987:103), language acquisition (Pinker 1982:678; Menyuk 1969:56), and typological, cross-linguistic studies of case-marking systems. More theoretical aspects of our work (including diachronic change and arguments for and against particular linguistic theories) were covered in that paper; here we concentrate on issues of interest to a computational linguistics forum. First, our natural language knowledge representation and processing strategies take into account the semantic basis of prepositional case marking, and thus facilitate handling non-standard and novel use of prepositions to mark arguments. The second contribution is our typology of errors in preposition usage. We claim that an NLP system which accepts naturally occurring input must recognize the type of the error to know how to compensate for it. Furthermore, the knowledge representation scheme we have implemented is an efficient representation for English and lends itself to adaptation to representing non-English case marking as well.
There is wide variation in computational strategies for mapping from the actual natural language expression to some sort of PREDICATE-ARGUMENT representation. At issue is how the system recognizes the arguments of the predicate. At one end of the spectrum is an approach which allows any marking of arguments if the type of the argument is correct for that predicate. This approach is inadequate because it ignores vital information carried by the preposition. At the other extreme is a semantically constrained syntactic parse, in many ways a highly desirable strategy. This latter method, however, constrains more strictly than what humans actually produce and understand. Our strategy has been to use the latter method, allowing relaxation of its constraints under certain well-specified circumstances.
Constraint relaxation has been recognized as a viable strategy for handling ill-formed input. Most discussion centers around orthographic errors and errors in subject-verb agreement. Jensen, Heidorn, Miller, and Ravin (1983:158) note the importance of "relaxing restrictions in the grammar rules in some principled way." Knowing which constraints to relax, however, and avoiding a proliferation of incorrect parses, is a non-trivial task. Weischedel and Sondheimer (1983:163ff) offer cautionary advice on this subject.
There has been some discussion of errors similar to those cited in our paper. Carbonell and Hayes (1983:132) observed that "problems created by the absence of expected case markers can be overcome by the application of domain knowledge" using case frame instantiation. We agree with these authors that the use of domain knowledge is an important element in understanding ill-formed input. However, in instances where the preposition is not omitted but rather replaced by a non-standard preposition, we claim that an understanding of the linguistic principles involved in the substitution is necessary.
To explain how constraint relaxation is accomplished, a brief system description is needed. Our system uses a parser based on Tomita (1986), with modifications to allow constraints and structure-building. It uses context-free phrase structure rules, augmented with morphological, contextual, and semantic constraints. Application of the phrase structure rules results in a parse tree, similar to a Lexical-Functional Grammar (LFG) "c-structure" (Bresnan 1982). The constraints are unified at parse time to produce a functionally labelled template (FLT). The FLT is then input to a semantic translation module. Using ARGUMENT MAPPING rules and other operator-operand semantic rules, semantic translation creates situation frames (SF). SFs consist of a predicate and entity frames (EF), whose semantic roles in the situation are labeled. Other semantic objects are relational frames (e.g., prepositional phrases), property frames (e.g., adjective phrases), and unit frames (measure phrases). During the semantic interpretation and discourse analysis phase, the situation frame is interpreted, resulting in one or more instantiated knowledge base (KB) objects, which are state or event descriptions with entity participants.
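The data flow just described can be pictured with a few Python dataclasses. This is a sketch under stated assumptions: the class and field names, the dictionary shape of the FLT, and the role labels in the example are invented for illustration, and property and unit frames are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EntityFrame:
    """EF: a participant; nlkb_class is what type-checking would inspect."""
    head: str
    nlkb_class: str


@dataclass
class RelationalFrame:
    """A prepositional phrase not claimed as an argument of the predicate."""
    preposition: str
    obj: EntityFrame


@dataclass
class SituationFrame:
    """SF: a predicate plus entity frames labelled with semantic roles."""
    predicate: str
    roles: Dict[str, EntityFrame] = field(default_factory=dict)
    modifiers: List[RelationalFrame] = field(default_factory=list)


def translate(flt: dict, argument_mapping: Dict[str, str]) -> SituationFrame:
    """Toy semantic translation from a hypothetical FLT dictionary.

    Grammatical functions such as "SUBJ" and prepositions such as "at" are
    looked up in argument_mapping to obtain a semantic role; PPs whose
    preposition is not an argument marker become relational frames.
    """
    sf = SituationFrame(predicate=flt["pred"])
    for function, value in flt.items():
        if function in ("pred", "PPS"):
            continue
        role = argument_mapping.get(function)
        if role is not None:
            sf.roles[role] = value
    pps: List[Tuple[str, EntityFrame]] = flt.get("PPS", [])
    for prep, np in pps:
        role = argument_mapping.get(prep)
        if role is not None and role not in sf.roles:
            sf.roles[role] = np
        else:
            sf.modifiers.append(RelationalFrame(prep, np))
    return sf


# Illustrative input: "The train arrived at the station during the storm."
flt = {"pred": "arrive",
       "SUBJ": EntityFrame("train", "vehicle"),
       "PPS": [("at", EntityFrame("station", "location")),
               ("during", EntityFrame("storm", "event"))]}
sf = translate(flt, {"SUBJ": "THEME", "at": "GOAL", "from": "SOURCE"})
```

Here the at-phrase fills the GOAL role, while the during-phrase is kept as a relational frame.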
2.0 REPRESENTING ARGUMENT MAPPING IN AN NLP SYSTEM
In our lexicons, verbs and adjectives are linked to one or more predicate classes which are defined in the Natural Language Knowledge Base (NLKB). Predicates typically govern one or more arguments or thematic roles. All general, class-level information about the thematic roles which a given predicate governs is represented at the highest possible level; only idiosyncratic information is represented in the lexicon. When lexicons are loaded, the idiosyncratic information in the lexicon is unified with the general information in the NLKB. Our representation scheme has certain implementational advantages: lexicons are less error-prone and easier to modify, the data are more compact, constraint relaxation is facilitated, etc. More importantly, we claim that such semantic classes are psychologically valid.
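A minimal sketch of class-level storage with lexical override, assuming a single-inheritance hierarchy and invented names (`PredicateClass`, `load_entry`, and the class TRANSFER-OF-KNOWLEDGE with its markers). Real unification would also detect conflicting values; the sketch simply lets the lexicon override the class default.

```python
from typing import Dict, Optional


class PredicateClass:
    """A node in a hypothetical NLKB predicate-class hierarchy."""

    def __init__(self, name: str, parent: Optional["PredicateClass"] = None,
                 mapping: Optional[Dict[str, str]] = None):
        self.name = name
        self.parent = parent
        # role -> marker stated at this level only
        self.local_mapping = dict(mapping or {})

    def argument_mapping(self) -> Dict[str, str]:
        """Class-level mapping: ancestors first, overridden by this level."""
        inherited = self.parent.argument_mapping() if self.parent else {}
        return {**inherited, **self.local_mapping}


def load_entry(word: str, pclass: PredicateClass,
               idiosyncratic: Optional[Dict[str, str]] = None) -> dict:
    """Combine a sparse lexical entry with class-level NLKB information.

    A plain dictionary override stands in for unification: any
    idiosyncratic marking in the lexicon wins over the inherited default.
    """
    mapping = pclass.argument_mapping()   # a fresh dict, so classes are untouched
    mapping.update(idiosyncratic or {})
    return {"word": word, "class": pclass.name, "mapping": mapping}


predicate = PredicateClass("PREDICATE", mapping={"AGENT": "SUBJ"})
transfer = PredicateClass("TRANSFER-OF-KNOWLEDGE", parent=predicate,
                          mapping={"GOAL": "to"})
explain = load_entry("explain", transfer)                 # nothing idiosyncratic
inform = load_entry("inform", transfer, {"GOAL": "OBJ"})  # GOAL as direct object
```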
Our representation scheme is based on the principle that ARGUMENT MAPPING is generally determined at the class level, i.e., predicates group along semantic lines as to the type of ARGUMENT MAPPING they take. Our work draws from theoretical linguistic studies of thematic relations (e.g., Gruber 1976, Jackendoff 1983, and Ostler 1980). We do not accept the "strong" version of localism, i.e., that all form mirrors function and that all ARGUMENT MAPPING classes arise from metaphors based on spatial relations. Unlike case grammar, we limit the number of cases or roles to a small set, based on how they are manifested in surface syntax. We subsequently "interpret" roles based on the semantic class of the predicate, e.g., the GOAL of an ATTITUDE is generally an animate "experiencer".
For example, in the NLKB the ARGUMENT MAPPING of predicates which denote a change in spatial relation specifies a GOAL argument, marked with prepositions which posit a GOAL relation (to, into, and onto), and a SOURCE argument, marked with prepositions which posit a SOURCE relation (from, out of, off of). A sub-class of these predicates, namely Vendler's (1967) achievements, marks the GOAL argument with prepositions which posit an OVERLAP relation (at, in). Compare:
MOVE to/into/onto; from/out of/off of
ARRIVE at/in; from
The entries for these verbs in SRA's lexicon merely specify which semantic class they belong to (e.g., SPATIAL-RELATION), whether they are stative or dynamic, whether they allow an agent, and whether they denote an achievement. Their ARGUMENT MAPPING is not entered explicitly in the lexicon. The verb reach, on the other hand, which marks its GOAL idiosyncratically as a direct object, would have this fact in its lexical entry.
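The MOVE/ARRIVE/reach contrast can be sketched in Python as follows. The dictionary layout and the `argument_mapping` function are assumptions made for illustration, and for simplicity the sketch gives every change-in-spatial-relation verb the full set of SOURCE markers, although only from is listed for ARRIVE above.

```python
from typing import Dict, List, Optional

GOAL_MARKERS = ["to", "into", "onto"]           # posit a GOAL relation
SOURCE_MARKERS = ["from", "out of", "off of"]   # posit a SOURCE relation
OVERLAP_MARKERS = ["at", "in"]                  # posit an OVERLAP relation

# Hypothetical lexical entries: class membership and features only,
# with no ARGUMENT MAPPING except where it is idiosyncratic.
LEXICON: Dict[str, dict] = {
    "move":   {"class": "SPATIAL-RELATION", "dynamic": True, "achievement": False},
    "arrive": {"class": "SPATIAL-RELATION", "dynamic": True, "achievement": True},
    "reach":  {"class": "SPATIAL-RELATION", "dynamic": True, "achievement": True,
               "goal": ["DIRECT-OBJECT"]},
}


def argument_mapping(verb: str) -> Dict[str, List[str]]:
    """Derive the mapping of a change-in-spatial-relation verb from its class."""
    entry = LEXICON[verb]
    # Achievements mark GOAL with OVERLAP prepositions.
    goal = OVERLAP_MARKERS if entry["achievement"] else GOAL_MARKERS
    mapping = {"GOAL": list(goal), "SOURCE": list(SOURCE_MARKERS)}
    override: Optional[List[str]] = entry.get("goal")
    if override is not None:        # idiosyncratic marking stored in the lexicon
        mapping["GOAL"] = list(override)
    return mapping
```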
2.1 GROUPING SEMANTIC ROLES
On both implementational and theoretical grounds, we have grouped certain semantic roles into superclasses. Such groupings are common in the literature on case and valency (see Somers 1987) and are also supported by cross-linguistic evidence. Our grouping of roles follows previous work. For example, the AGENT SUPERCLASS covers both animate agents and inanimate instruments. A GROUND SUPERCLASS (as discussed in Talmy 1985) includes both SOURCE and GOAL, and a GOAL SUPERCLASS includes GOAL, PURPOSE, and DIRECTION. Certain semantic roles, like GOAL and SOURCE, as well as being sisters, are "privatives", that is, semantic opposites.
Our representation scheme differentiates between lexically and semantically determined prepositions. We will show how this representation facilitates recognition of the type of error, and therefore principled relaxation of the constraints. Furthermore, a principled relaxation of the constraints depends in many instances on knowing the relationship between the non-standard and the expected prepositions: are they sisters or privatives, or is the non-standard preposition a parent of the expected preposition?
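A minimal Python sketch of the three relationships, assuming that each preposition has already been mapped to the role it marks, and that a superclass is itself a node that a marker could denote (the "parent" case). The superclass labels are our own.

```python
from typing import Dict, FrozenSet, Optional, Set

# Role superclasses as described in Section 2.1 (labels are illustrative).
SUPERCLASSES: Dict[str, Set[str]] = {
    "AGENT-SUPERCLASS": {"AGENT", "INSTRUMENT"},
    "GROUND-SUPERCLASS": {"SOURCE", "GOAL"},
    "GOAL-SUPERCLASS": {"GOAL", "PURPOSE", "DIRECTION"},
}
# Sisters that are also semantic opposites.
PRIVATIVES: Set[FrozenSet[str]] = {frozenset({"GOAL", "SOURCE"})}


def relation(nonstandard: str, expected: str) -> Optional[str]:
    """Relationship between the role marked by the non-standard preposition
    and the role marked by the expected preposition."""
    if nonstandard == expected:
        return "same"
    if frozenset({nonstandard, expected}) in PRIVATIVES:
        return "privative"      # checked first: privatives are also sisters
    if expected in SUPERCLASSES.get(nonstandard, set()):
        return "parent"
    if any({nonstandard, expected} <= members for members in SUPERCLASSES.values()):
        return "sister"
    return None
```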
In the following section we present examples of the five types of preposition usage errors. In the subsequent section, we discuss how our system presently handles these errors, or how it might eventually handle them.
3.0 THE DATA
We have classified the variation data according to the type of substitution. The main types are:
(1) semantic for semantic (Section 3.1),
(2) semantic for lexical (Section 3.2),
(3) false starts (Section 3.3),
(4) blends (Section 3.4), and
(5) semantic for syntactic (Section 3.5).
The data presented below are a representative sample of a larger group of examples. The current paper covers the classifications which we have encountered so far; we expect that analysis of additional data will provide further types of substitution within each class.
3.1 SEMANTIC FOR SEMANTIC
3.1.1 To/From
The substitution of the goal marker for the source marker is recognized cross-linguistically in the case literature (e.g., Ikegami 1987). In English, this appears to be more pronounced in certain regional dialects. Common source/goal alternations cited by Ikegami (1987:125) include: averse from/to, different from/to, immune from/to, and distinction from/to. The majority of examples involve to substituting for from in lexical items which incorporate a negation of the predicate; the standard marker of GROUND in this class of predicates is a SOURCE marker, e.g., different from. The "positive" counterparts mark the GROUND with GOAL, e.g., similar to, as discussed in detail in Gruber (1976). Variation between to and from can only occur with predicates which incorporate a negative; otherwise the semantic distinction which these prepositions denote is necessary.
(1) The way that he came on to that bereaved brother completely alienated me TO Mr. Bush. 9/26/88 MCS
(2) At this moment I'm different TO primitive man. 10/12/88 The Mind, PBS
3.1.2 To/With
Communication and transfer of knowledge can be expressed either as a process with multiple, equally involved participants, or as an asymmetric process with one of the participants as the "agent" of the transfer of information. Our data document the substitution of the GOAL marker for the CO-THEME marker; this may reflect the tendency of English to prefer "agent" focussing. The participants in a COMMUNICATION situation are similar in their semantic roles, the only difference being one of "viewpoint." By no means do all communication predicates operate in this way, however: e.g., EXPLANATION and TRANSFER OF KNOWLEDGE are more clearly asymmetric. The system differentiates between "mutual" and "asymmetric" communication predicates.
(3) The only reason they'll chat TO you is, you're either pretty, or they need something from your husband. 9/30/88 MCS
(4) I'll have to sit down and explore this TO you. 10/16/88
3.2 SEMANTIC FOR LEXICAL
3.2.1 Goal Superclass (Goal/Purpose/Direction)
Goal and purpose are frequently expressed by the same case marking, with the DIRECTION marker alternating with these at times. The standard preposition in these examples is lexically determined. In example (6), instead of the lexically determined to, which also marks the semantic role GOAL, another preposition within the same superclass is chosen. In example (5), the phrasally determined for is replaced by the GOAL marker. There is abundant cross-linguistic evidence for a GOAL SUPERCLASS which includes GOAL and PURPOSE; to a lesser extent, DIRECTION also patterns with these cross-linguistically.
(5) It's changing TO the better. 8/3/88 MCS
(6) Mr. Raspberry is almost 200 years behind Washingtonians aspiring FOR full citizenship. 10/13/88 WP
3.2.2 On/Of
Several examples involve lexical items expressing knowledge or cognition, for which the standard preposition is lexically determined. This preposition is uniformly replaced by on, also a marker of the semantic role of REFERENT. Examples include abreast of, grasp of, an idea of, and knowledge of. We claim that the association of the role REFERENT with knowledge and cognition (as well as with transfer-of-information predicates) is among the more salient associations that language learners encounter.
(7) Terry Brown, 47, a truck driver, agreed; "with eight years in the White House," he said, "Bush ought to have a better grasp ON the details." 9/27/88 NYT p. B8
(8) I did get an idea ON the importance of consistency as far as reward and penalty are concerned. 11/88 ETM journal
3.2.3 With/From/To
In this class, we believe that "mutual action verbs" such as marry and divorce routinely show the CO-THEME marker with being substituted for either to or from. Such predicates have a SECONDARY-MAPPING of PLURAL-THEME in the NLKB. Communication predicates are another class which allows a PLURAL-THEME and shows alternation of GOAL and CO-THEME (Section 3.1.2).
(9) Today Robin Givens said she won't ask for any money in her divorce WITH Mike Tyson. 10/19/88 ATC
3.3 FALSE STARTS
The next set of examples suggests that the speaker has "retrieved" a preposition from a different ARGUMENT MAPPING for the verb, or for a different argument than the one which is eventually produced. For example, confused with replaces confused by in (10), and say to replaces say about in (11). Such examples are more prevalent in oral language. Handling these examples is difficult, since all sorts of contextual information, linguistic and non-linguistic, goes into detecting the error.
(10) They didn't want to be confused WITH the facts. 11/14/88 DRS
(11) The memorial service was really well done. The rabbi did a good job. What do you say TO a kid who died like that? 11/14/88
3.4 BLENDS
Here, a lexically or phrasally determined preposition is replaced by a preposition associated with a semantically similar lexical item. In (12), Quayle says he was smitten about Marilyn, possibly thinking of crazy about. In (13) he may be thinking of on the subject/topic of. The questioner in (14) may have in support/favor of in mind. In (15) Quayle may have meant we learn by making mistakes. In (16), the idiomatic phrase in support of is confused with the ARGUMENT MAPPING of the noun support, e.g., "he showed his support for the president".
(12) I was very smitten ABOUT her. I saw a good thing and I responded rather quickly and she did too. 10/20/88 WP, p. C8
(13) ON the area of the federal budget deficit. 10/5/88 Sen. Quayle in VP debate (& NYT 10/7/88 p. B6)
(14) You made one of the most eloquent speeches IN behalf of contra aid. 10/5/88 Questioner in VP debate (& NYT 10/7/88 p. B6)
(15) We learn BY our mistakes. 10/5/88 Sen. Quayle in VP debate (& NYT 10/7/88 p. B6)
(16) We testified in support FOR medical leave. 10/22/88 FFS
3.5 SEMANTIC FOR SYNTACTIC WITH/BY
In the majority of the following examples, the syntactically governed by marking passives is replaced by WITH. This alternation of with and by in passives has been attested for hundreds of years, and we hypothesize that English may be in the process of reinterpreting by, as well as replacing it with with in certain contexts. On the one hand, by is being reinterpreted as a marker of "archetypal" agents, i.e., those high on the scale of AGENTIVITY (speaker > human > animate > potent > non-animate, non-potent). On the other hand, a semantically appropriate marker is being substituted for by.
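The hypothesis can be made concrete as a ranking over the AGENTIVITY scale. This Python sketch assumes that noun phrases carry feature labels matching the scale points, and it picks an arbitrary cutoff for "archetypal" agents; neither detail comes from the text.

```python
from typing import Set

# The AGENTIVITY scale from the text, most agentive first.
AGENTIVITY_SCALE = ["speaker", "human", "animate", "potent",
                    "non-animate, non-potent"]


def agentivity_rank(features: Set[str]) -> int:
    """Index of the highest scale point the noun phrase reaches (0 = speaker).

    A noun phrase with none of the listed features falls at the bottom.
    """
    for rank, point in enumerate(AGENTIVITY_SCALE[:-1]):
        if point in features:
            return rank
    return len(AGENTIVITY_SCALE) - 1


def expected_passive_markers(features: Set[str], cutoff: str = "animate") -> Set[str]:
    """Hypothesized markers: only by for archetypal agents, by or with below.

    The cutoff separating "archetypal" agents is an assumption, not a value
    given in the text.
    """
    if agentivity_rank(features) <= AGENTIVITY_SCALE.index(cutoff):
        return {"by"}
    return {"by", "with"}
```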
We analyze the WITH in these examples either as the less agentive AGENT (namely the INSTRUMENT), in example (18), or as the less agentive CO-THEME, in example (17). The substitutions are semantically appropriate, and the substitutes are semantically related to AGENT.
(17) All of Russian life was accompanied WITH some kind of singing. 8/5/88 ATC
(18) Audiences here are especially enthused WITH Dukakis's description of the Reagan-Bush economic policies. 11/5/88 ATC
4.0 THE COMPUTATIONAL IMPLEMENTATION
Of the five types of errors cited in Section 3, substitutions of semantic for semantic (Section 3.1), semantic for lexical (Section 3.2), and semantic for syntactic (Section 3.5) are the simplest to handle computationally.
4.1 SEMANTIC FOR SEMANTIC OR LEXICAL
The representation scheme described above (Section 2) facilitates handling the semantic for semantic and semantic for lexical substitutions.
Semantic for semantic substitutions are allowed if
(i) the predicate belongs to the communication class and the standard CO-THEME marker is replaced by a GOAL marker, or
(ii) the predicate incorporates a negative and GOAL is substituted for a standard SOURCE, or vice versa.
Semantic for lexical substitutions are allowed if
(iii) the non-standard preposition is a non-privative sister of the standard preposition (e.g., in the GOAL SUPERCLASS),
(iv) the non-standard preposition is the NLKB-inherited "default" preposition for the predicate (e.g., on, the REFERENT marker, for predicates of cognition and knowledge), or
(v) in the NLKB the predicate allows a SECONDARY-MAPPING of PLURAL-THEME (e.g., marital predicates, as in the divorce with example).
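Conditions (i) through (v) amount to a small decision procedure. The following Python sketch is one way to write it down; the preposition-to-role table, the `Predicate` fields, and the class names in the examples are our assumptions, and the sketch reports which condition fired so that a caller could weight the resulting parse.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical preposition -> role table; a lexically determined "of" is
# deliberately left unmapped.
PREP_ROLES: Dict[str, Set[str]] = {
    "to": {"GOAL"}, "for": {"PURPOSE"}, "toward": {"DIRECTION"},
    "from": {"SOURCE"}, "with": {"CO-THEME"}, "on": {"REFERENT"},
}
GOAL_SUPERCLASS = {"GOAL", "PURPOSE", "DIRECTION"}
PRIVATIVES = {frozenset({"GOAL", "SOURCE"})}


@dataclass
class Predicate:
    name: str
    nlkb_class: str
    incorporates_negative: bool = False
    plural_theme: bool = False           # SECONDARY-MAPPING of PLURAL-THEME
    default_prep: Optional[str] = None   # NLKB-inherited default marker


def licensing_condition(pred: Predicate, standard: str, found: str,
                        lexical: bool) -> Optional[str]:
    """Return which of conditions (i)-(v) licenses `found` in place of
    `standard`, or None if the substitution is not allowed."""
    std = PREP_ROLES.get(standard, set())
    new = PREP_ROLES.get(found, set())
    if not lexical:                                   # semantic for semantic
        if pred.nlkb_class == "COMMUNICATION" and "CO-THEME" in std and "GOAL" in new:
            return "i"
        if pred.incorporates_negative and (
                ("SOURCE" in std and "GOAL" in new) or ("GOAL" in std and "SOURCE" in new)):
            return "ii"
        return None
    # Semantic for lexical: (iii) non-privative sisters in the GOAL SUPERCLASS.
    if any(a != b and {a, b} <= GOAL_SUPERCLASS and frozenset({a, b}) not in PRIVATIVES
           for a in std for b in new):
        return "iii"
    if found == pred.default_prep:                    # (iv) inherited default
        return "iv"
    if pred.plural_theme and "CO-THEME" in new:       # (v) PLURAL-THEME
        return "v"
    return None
```

For instance, with `default_prep="on"` on a cognition predicate, grasp ON the details is licensed by condition (iv), and divorce WITH by condition (v).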
Handling the use of a non-standard preposition marking an argument crucially involves "type-checking", wherein the "type" of the noun phrase is checked, e.g., for membership in an NLKB class such as animate-creature, time, etc. Type-checking is also used to narrow the possible senses of the preposition in a prepositional phrase, as well as to prefer certain modifier attachments.
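Type-checking can be sketched as a walk up an is-a hierarchy, and it feeds the ranking of PREP-ARG and ADJUNCT readings described in the next paragraph. In this Python sketch the hierarchy fragment, the numeric scores, and the `rank_readings` signature are all assumptions; `licensed` stands for the outcome of checking conditions (i)-(v).

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical fragment of the NLKB is-a hierarchy: class -> parent class.
ISA: Dict[str, Optional[str]] = {
    "entity": None,
    "animate-creature": "entity",
    "human": "animate-creature",
    "time": "entity",
    "location": "entity",
}

# Assumed scores; only their order matters.
STANDARD, ADJUNCT, RELAXED = 1.0, 0.8, 0.5


def is_a(cls: str, target: str) -> bool:
    """Type-check: does NLKB class `cls` equal or inherit from `target`?"""
    node: Optional[str] = cls
    while node is not None:
        if node == target:
            return True
        node = ISA.get(node)
    return False


def rank_readings(prep: str, np_class: str, standard_prep: str, arg_type: str,
                  licensed: bool, adjunct_types: Sequence[str]) -> List[Tuple[str, float]]:
    """Candidate readings of one PP, best first."""
    if prep == standard_prep and is_a(np_class, arg_type):
        return [("PREP-ARG", STANDARD)]   # standard marker: ADJUNCT disallowed
    readings: List[Tuple[str, float]] = []
    if prep != standard_prep and licensed and is_a(np_class, arg_type):
        readings.append(("PREP-ARG (relaxed)", RELAXED))
    if any(is_a(np_class, t) for t in adjunct_types):
        readings.append(("ADJUNCT", ADJUNCT))
    return sorted(readings, key=lambda r: r[1], reverse=True)
```

When both readings survive type-checking, the ADJUNCT reading sorts ahead of the relaxed PREP-ARG, mirroring the preference stated below.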
Prepositional phrases can have two relations to predicating expressions: a governed argument (PREP-ARG) or an ADJUNCT. During parsing, the system accesses the ARGUMENT MAPPING for the predicate; once the preposition is recognized as the standard marker of an argument, an ADJUNCT reading is disallowed. The rule for PREP-ARG is a separate rule in the grammar. When the preposition does not match the expected preposition, the system checks whether any of the above conditions (i-v) hold; if so, the parse is accepted, but is assigned a lower likelihood. If a parse of the PP as an ADJUNCT is also accepted, it will be preferred over the ill-formed PREP-ARG.
4.2 SEMANTIC FOR SYNTACTIC
The substitution of semantic marking for syntactic (WITH for BY) is easily handled: during semantic mapping, by phrases in the ADJUNCTS are mapped to the role of the active subject, assuming that "type checking" allows that interpretation of the noun phrase. It is also possible for such a sentence to be ambiguous, e.g., "he was seated by the man". We treat with phrases similarly, except that ambiguity between CO-THEME and PASSIVE SUBJECT is not allowed, based on our observation that with for by is used for noun phrases low on the animacy scale. Thus, only the CO-THEME interpretation is valid if the noun phrase is animate.
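One reading of this rule, sketched in Python: the function name, the reading labels, and the choice to fall back to an ADJUNCT reading when type-checking fails are our assumptions, and `np_features` stands in for the result of NLKB type-checking.

```python
from typing import List, Set


def passive_oblique_readings(prep: str, np_features: Set[str],
                             type_ok: bool = True) -> List[str]:
    """Candidate interpretations of a by- or with-phrase in a passive clause.

    ACTIVE-SUBJECT means the role the active subject would bear (the
    "PASSIVE SUBJECT" reading); ADJUNCT keeps the ordinary adjunct sense.
    """
    if prep == "by":
        # Mapped to the active-subject role when type-checking allows, but the
        # adjunct sense stays available ("he was seated by the man").
        return ["ACTIVE-SUBJECT", "ADJUNCT"] if type_ok else ["ADJUNCT"]
    if prep == "with":
        # No ambiguity allowed: animate NPs get only the CO-THEME reading.
        if "animate" in np_features:
            return ["CO-THEME"]
        return ["ACTIVE-SUBJECT"] if type_ok else ["ADJUNCT"]
    return []
```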
4.3 FALSE STARTS AND BLENDS
False starts are more difficult, requiring an approach similar to that of case grammar. In these examples, the preposition is acceptable with the verb, but not to mark that particular argument. The type of the argument marked with the "incorrect" preposition must be quite inconsistent with that sense of the predicate for the error even to be noticed, since the preposition is acceptable with some other sense. We are assessing the frequency of false starts in the various genres in which our system is being used, to determine whether we need to implement a strategy to handle these examples. We predict that future systems for understanding spoken language will need to accommodate this phenomenon.
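The system does not yet handle false starts, so the following is only an illustration of the detection condition stated above, not the authors' strategy. The Python sketch assumes each sense of a verb lists its argument markers with the NLKB classes that type-check for them; the senses and classes given for confuse are invented, and a real detector would also need the contextual information mentioned in Section 3.3.

```python
from typing import Dict, List, Set

# Hypothetical senses of a verb: each maps an argument marker to the NLKB
# classes that type-check for that argument. The classes are invented.
Sense = Dict[str, Set[str]]
CONFUSE: List[Sense] = [
    {"with": {"concrete-object", "person"}},   # mistake one thing for another
    {"by": {"information", "event"}},          # be bewildered by something
]


def false_start_candidates(senses: List[Sense], prep: str, np_class: str) -> List[str]:
    """Markers from other mappings of the verb that would fit the NP.

    A possible false start is flagged only if the preposition is acceptable
    with the verb but the NP fails type-checking for every argument it marks.
    """
    uses = [sense for sense in senses if prep in sense]
    if not uses:
        return []        # preposition not acceptable with this verb at all
    if any(np_class in sense[prep] for sense in uses):
        return []        # type-checks: no error would be noticed
    return sorted({p for sense in senses for p, classes in sense.items()
                   if p != prep and np_class in classes})
```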
We do not handle blends currently. They involve a form of analogy, i.e., smitten is like mad, syntactically, semantically, and even stylistically; they may shed some light on language storage and retrieval. Recognizing the similarity in order to allow principled handling seems very difficult.
In addition, blends may provide evidence for a "top down" language production strategy, in which the argument structure is determined before the lexical items are chosen/inserted. Our data suggest that some people may be more prone to making this type of error than others. Finally, blends are more frequent in genres in which people attempt to use a style that they do not command (e.g., student papers, radio talk shows).
5.0 DIRECTIONS FOR FUTURE WORK
In this paper we have described a frequent type of ill-formed input which NLP systems must handle, involving the use of non-standard prepositions to mark arguments. We presented a classification of these errors and described our algorithm for handling some of these error types. The importance of handling such non-standard input will increase as speech recognition becomes more reliable, because spoken input is less formal.
In the near term, planned enhancements include adjusting the weighting scheme to reflect the empirical data more accurately. A frequency-based model of preposition usage, based on a much larger and broader sampling of text, will improve system handling of these errors.
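The paper does not specify this model. As a hedged illustration, one simple frequency-based weighting is an add-alpha smoothed relative frequency per predicate and role; the function name, the (predicate, role, preposition) observation format, and the smoothing choice are assumptions, and the counts in the example are toy values, not data from our corpus.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def preposition_weights(observations: Iterable[Tuple[str, str, str]],
                        alpha: float = 1.0) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Add-alpha smoothed relative frequency of each preposition per
    (predicate, role) slot, over all prepositions seen in the corpus."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    vocab = set()
    for predicate, role, prep in observations:
        counts[(predicate, role)][prep] += 1
        vocab.add(prep)
    weights = {}
    for slot, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        weights[slot] = {p: (c[p] + alpha) / total for p in sorted(vocab)}
    return weights


# Toy counts: three "different from" and one "different to".
obs = [("different", "GROUND", "from")] * 3 + [("different", "GROUND", "to")]
w = preposition_weights(obs)   # from: 4/6, to: 2/6
```

Weights of this kind could stand in for fixed likelihood penalties, so that a frequently attested substitution is penalized less than a rare one.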
ACKNOWLEDGEMENTS
We would like to express our appreciation of our colleagues' contributions to the SRA NLP system: Gayle Ayers, Andrew Fang, Ben Fine, Karyn German, Mary Dee Harris, David Reel, and Robert M. Simmons.
REFERENCES
1. Aitchison, Jean. 1987. Words in the Mind. Blackwell, New York.
2. Blejer, Hatte and Sharon Flank. 1988. More Evidence for the Semantic Basis of Prepositional Case Marking. Delivered December 28, 1988, Linguistic Society of America Annual Meeting, New Orleans.
3. Bresnan, Joan, ed. 1982. The Mental Representation of Grammatical Relations. MIT Press, Cambridge.
4. Carbonell, Jaime and Philip Hayes. 1983. Recovery Strategies for Parsing Extragrammatical Language. American Journal of Computational Linguistics 9(3-4): 123-146.
5. Chierchia, Gennaro, Barbara Partee, and Raymond Turner, eds. 1989. Properties, Types and Meaning. Kluwer, Dordrecht.
6. Chomsky, Noam. 1981. Lectures on Government and Binding. Foris, Dordrecht.
7. Croft, William. 1986. Categories and Relations in Syntax: The Clause-Level Organization of Information. Ph.D. Dissertation, Stanford University.
8. Dahlgren, Kathleen. 1988. Naive Semantics for Natural Language Understanding. Kluwer, Boston.
9. Dirven, Rene and Gunter Radden, eds. 1987. Concepts of Case. Gunter Narr, Tubingen.
10. Dowty, David. 1989. On the Semantic Content of the Notion of 'Thematic Role'. In Chierchia et al., II: 69-129.
11. Foley, William and Robert Van Valin Jr. 1984. Functional Syntax and Universal Grammar. Cambridge University Press, Cambridge.
12. Gawron, Jean Mark. 1988. Lexical Representations and the Semantics of Complementation. Garland, New York.
13. Gazdar, Gerald, Ewan Klein, Geoffrey Pullum, and Ivan Sag (GKPS). 1985. Generalized Phrase Structure Grammar. Harvard University Press, Cambridge.
14. Gruber, Jeffrey. 1976. Lexical Structures in Syntax and Semantics. North-Holland, Amsterdam.
15. Haiman, John. 1985. Natural Syntax: Iconicity and Erosion. Cambridge University Press, Cambridge.
16. Hirst, Graeme. 1987. Semantic Interpretation and the Resolution of Ambiguity. Cambridge University Press, Cambridge.
17. Ikegami, Yoshihiko. 1987. 'Source' vs. 'Goal': a Case of Linguistic Dissymmetry. In Dirven and Radden, 122-146.
18. Jackendoff, Ray. 1983. Semantics and Cognition. MIT Press, Cambridge.
19. Jensen, Karen, George Heidorn, Lance Miller, and Yael Ravin. 1983. Parse Fitting and Prose Fixing: Getting a Hold on Ill-formedness. American Journal of Computational Linguistics 9(3-4): 147-160.
20. Menyuk, Paula. 1969. Sentences Children Use. MIT Press, Cambridge.
21. Miller, George and Philip Johnson-Laird. 1976. Language and Perception. Harvard University Press, Cambridge.
22. Ostler, Nicholas. 1980. A Theory of Case Linking and Agreement. Indiana University Linguistics Club.
23. Pinker, Steven. 1982. A Theory of the Acquisition of Lexical Interpretive Grammars. In Bresnan, 655-726.
24. Shopen, Timothy, ed. 1985. Language Typology and Syntactic Description. Cambridge University Press, Cambridge.
25. Somers, H. L. 1987. Valency and Case in Computational Linguistics. Edinburgh University Press, Edinburgh.
26. Talmy, Leonard. 1985. Lexicalization Patterns: Semantic Structure in Lexical Forms. In Shopen, III: 57-149.
27. Tomita, Masaru. 1986. Efficient Parsing for Natural Language. Kluwer, Boston.
28. Vendler, Zeno. 1967. Linguistics in Philosophy. Cornell University Press, Ithaca.
29. Weischedel, Ralph and Norman Sondheimer. 1983. Meta-rules as a Basis for Processing Ill-Formed Input. American Journal of Computational Linguistics 9(3-4): 161-177.
APPENDIX A. DATA SOURCES
ATC: National Public Radio news program, "All Things Considered"
ME: National Public Radio news program, "Morning Edition"
WE: National Public Radio news program, "Weekend Edition"
MCS: WAMU radio, Washington, D.C., interview program, "The Mike Cuthbert Show"
DRS: WAMU radio, Washington, D.C., interview program, "Diane Rehm Show"
FFS: WAMU radio, Washington, D.C., interview program, "Fred Fiske Saturday"
AIH: Canadian Broadcasting Corporation radio news program, "As It Happens"
NYT: The New York Times
WP: The Washington Post
ETM: Student journal for "Effective Teaching Methods," a junior undergraduate course