
ON REPRESENTING GOVERNED PREPOSITIONS AND HANDLING "INCORRECT" AND NOVEL PREPOSITIONS

Hatte R. Blejer, Sharon Flank, and Andrew Kehler
SRA Corporation
2000 15th St. North
Arlington, VA 22201, USA

ABSTRACT

NLP systems, in order to be robust, must handle novel and ill-formed input. One common type of error involves the use of non-standard prepositions to mark arguments. In this paper, we argue that such errors can be handled in a systematic fashion, and that a system designed to handle them offers other advantages. We offer a classification scheme for preposition usage errors. Further, we show how the knowledge representation employed in the SRA NLP system facilitates handling these data.

1.0 INTRODUCTION

It is well known that NLP systems, in order to be robust, must handle ill-formed input. One common type of error involves the use of non-standard prepositions to mark arguments. In this paper, we argue that such errors can be handled in a systematic fashion, and that a system designed to handle them offers other advantages.

The examples of non-standard prepositions we present in the paper are taken from colloquial language, both written and oral. The type of error these examples represent is quite frequent in colloquial written language. The frequency of such examples rises sharply in evolving sub-languages and in oral colloquial language. In developing an NLP system to be used by various U.S. government customers, we have been sensitized to the need to handle variation and innovation in preposition usage. Handling this type of variation or innovation is part of our overall capability to handle novel predicates, which are frequent in sub-language. Novel predicates created for sub-languages are less "stable" in how they mark arguments (ARGUMENT MAPPING) than general English "core" predicates which speakers learn as children. It can be expected that the eventual advent of successful speech understanding systems will further emphasize the need to handle this and other variation.

The NLP system under development at SRA incorporates a Natural Language Knowledge Base (NLKB), a major part of which consists of objects representing SEMANTIC PREDICATE CLASSES. The system uses hierarchical knowledge sources; all general "class-level" characteristics of a semantic predicate class, including the number, type, and marking of its arguments, are put in the NLKB. This leads to increased efficiency in a number of system aspects, e.g., the lexicon is more compact and easier to modify since it only contains idiosyncratic information. This representation allows us to distinguish between lexically and semantically determined ARGUMENT MAPPING and to formulate general class-level constraint relaxation mechanisms.

1.1 CLASSIFYING PREPOSITION USAGE

Preposition usage in English in positions governed by predicating elements, whether adjectival, verbal, or nominal, may be classified as (1) lexically determined, (2) syntactically determined, or (3) semantically determined. Examples are:

LEXICALLY DETERMINED: laugh at, afraid of
SYNTACTICALLY DETERMINED: by in passive sentences
SEMANTICALLY DETERMINED: move to/from

Preposition usage in idiomatic phrases is also considered to be lexically determined, e.g., with respect to.

1.2 A TYPOLOGY OF ERRORS IN PREPOSITION USAGE

We have classified our corpus of examples of the use of non-standard prepositions into the following categories: (1) substitution of a semantically appropriate preposition either from the same class or another for a semantically determined one, (2) substitution of a semantically appropriate preposition for a lexically determined one, (3) false starts, (4) blends, and (5) substitution of a semantically appropriate preposition for a syntactically determined one. A small percentage of the non-standard use of prepositions appears to be random.

1.3 COMPUTATIONAL APPLICATIONS OF THIS WORK

In a theoretical linguistics forum (Blejer and Flank 1988), we argued that these examples of the use of non-standard prepositions to mark arguments (1) represent the kind of principled variation that underlies language change, and (2) support a semantic analysis of government that utilizes thematic roles, citing other evidence for the semantic basis of prepositional case marking from studies of language dysfunction (Aitchison 1987:103), language acquisition (Pinker 1982:678; Menyuk 1969:56), and typological, cross-linguistic studies on case-marking systems.

More theoretical aspects of our work (including diachronic change and arguments for and against particular linguistic theories) were covered in that paper; here we concentrate on issues of interest to a computational linguistics forum. First, our natural language knowledge representation and processing strategies take into account the semantic basis of prepositional case marking, and thus facilitate handling non-standard and novel use of prepositions to mark arguments. The second contribution is our typology of errors in preposition usage. We claim that an NLP system which accepts naturally occurring input must recognize the type of the error to know how to compensate for it. Furthermore, the knowledge representation scheme we have implemented is an efficient representation for English and lends itself to adaptation to representing non-English case-marking as well.

There is wide variation in computational strategies for mapping from the actual natural language expression to some sort of PREDICATE-ARGUMENT representation. At issue is how the system recognizes the arguments of the predicate. At one end of the spectrum is an approach which allows any marking of arguments if the type of the argument is correct for that predicate. This approach is inadequate because it ignores vital information carried by the preposition. At the other extreme is a semantically constrained syntactic parse, in many ways a highly desirable strategy. This latter method, however, constrains more strictly than what humans actually produce and understand. Our strategy has been to use the latter method, allowing relaxation of those constraints under certain well-specified circumstances.

Constraint relaxation has been recognized as a viable strategy for handling ill-formed input. Most discussion centers around orthographic errors and errors in subject-verb agreement. Jensen, Heidorn, Miller, and Ravin (1983:158) note the importance of "relaxing restrictions in the grammar rules in some principled way." Knowing which constraints to relax and avoiding a proliferation of incorrect parses, however, is a non-trivial task. Weischedel and Sondheimer (1983:163ff) offer cautionary advice on this subject.

There has been some discussion of errors similar to those cited in our paper. Carbonell and Hayes (1983:132) observed that "problems created by the absence of expected case markers can be overcome by the application of domain knowledge" using case frame instantiation. We agree with these authors that the use of domain knowledge is an important element in understanding ill-formed input. However, in instances where the preposition is not omitted, but rather replaced by a non-standard preposition, we claim that an understanding of the linguistic principles involved in the substitution is necessary.

To explain how constraint relaxation is accomplished, a brief system description is needed. Our system uses a parser based on Tomita (1986), with modifications to allow constraints and structure-building. It uses context-free phrase structure rules, augmented with morphological, contextual, and semantic constraints. Application of the phrase structure rules results in a parse tree, similar to a Lexical-Functional Grammar (LFG) "c-structure" (Bresnan 1982). The constraints are unified at parse time to produce a functionally labelled template (FLT). The FLT is then input to a semantic translation module. Using ARGUMENT MAPPING rules and other operator-operand semantic rules, semantic translation creates situation frames (SF). SFs consist of a predicate and entity frames (EF), whose semantic roles in the situation are labeled. Other semantic objects are relational frames (e.g., prepositional phrases), property frames (e.g., adjective phrases), and unit frames (measure phrases). During the semantic interpretation and discourse analysis phase, the situation frame is interpreted, resulting in one or more instantiated knowledge base (KB) objects, which are state or event descriptions with entity participants.

2.0 REPRESENTING ARGUMENT MAPPING IN AN NLP SYSTEM

In our lexicons, verbs and adjectives are linked to one or more predicate classes which are defined in the Natural Language Knowledge Base (NLKB). Predicates typically govern one or more arguments or thematic roles. All general, class-level information about the thematic roles which a given predicate governs is represented at the highest possible level. Only idiosyncratic information is represented in the lexicon. When lexicons are loaded, the idiosyncratic information in the lexicon is unified with the general information in the NLKB. Our representation scheme has certain implementational advantages: lexicons are less error-prone and easier to modify, the data are more compact, constraint relaxation is facilitated, etc. More importantly, we claim that such semantic classes are psychologically valid.

Our representation scheme is based on the principle that ARGUMENT MAPPING is generally determined at the class-level, i.e., predicates group along semantic lines as to the type of ARGUMENT MAPPING they take. Our work draws from theoretical linguistic studies of thematic relations (e.g., Gruber 1976, Jackendoff 1983, and Ostler 1980). We do not accept the "strong" version of localism, i.e., that all form mirrors function and that ARGUMENT MAPPING classes arise from metaphors based on spatial relations. Unlike case grammar, we limit the number of cases or roles to a small set, based on how they are manifested in surface syntax. We subsequently "interpret" roles based on the semantic class of the predicate, e.g., the GOAL of an ATTITUDE is generally an animate "experiencer".

For example, in the NLKB the ARGUMENT MAPPING of predicates which denote a change in spatial relation specifies a GOAL argument, marked with prepositions which posit a GOAL relation (to, into, and onto), and a SOURCE argument, marked with prepositions which posit a SOURCE relation (from, out of, off of). A sub-class of these predicates, namely Vendler's (1967) achievements, marks the GOAL argument with prepositions which posit an OVERLAP relation (at, in). Compare:

PREDICATE   GOAL            SOURCE
MOVE        to/into/onto    from/out of/off of
ARRIVE      at/in           from

The entries for these verbs in SRA's lexicon merely specify which semantic class they belong to (e.g., SPATIAL-RELATION), whether they are stative or dynamic, whether they allow an agent, and whether they denote an achievement. Their ARGUMENT MAPPING is not entered explicitly in the lexicon. The verb reach, on the other hand, which marks its GOAL idiosyncratically, as a direct object, would have this fact in its lexical entry.
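The division of labor between class-level and lexical information described above can be sketched as follows. This is only an illustration under stated assumptions, not the SRA implementation: the names `PredicateClass`, `LEXICON`, and `load_entry` are hypothetical, "unification" is reduced to a dictionary merge, and placing reach in the achievement class is our assumption rather than something the text states.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PredicateClass:
    """Hypothetical stand-in for an NLKB semantic predicate class."""
    name: str
    parent: Optional["PredicateClass"] = None
    mapping: dict = field(default_factory=dict)  # role -> standard markers

    def argument_mapping(self) -> dict:
        # Class-level information is inherited; a subclass overrides per role.
        inherited = self.parent.argument_mapping() if self.parent else {}
        return {**inherited, **self.mapping}


SPATIAL_RELATION = PredicateClass(
    "SPATIAL-RELATION",
    mapping={"GOAL": ("to", "into", "onto"),
             "SOURCE": ("from", "out of", "off of")},
)
# Achievements mark GOAL with OVERLAP prepositions (the ARRIVE row above).
ACHIEVEMENT = PredicateClass(
    "SPATIAL-ACHIEVEMENT",
    parent=SPATIAL_RELATION,
    mapping={"GOAL": ("at", "in"), "SOURCE": ("from",)},
)

# The lexicon stores only the class link plus idiosyncratic facts.
LEXICON = {
    "move": {"class": SPATIAL_RELATION},
    "arrive": {"class": ACHIEVEMENT},
    # Assumption: reach is treated as an achievement with a direct-object GOAL.
    "reach": {"class": ACHIEVEMENT, "mapping": {"GOAL": ("DIRECT-OBJECT",)}},
}


def load_entry(verb: str) -> dict:
    """Merge class-level mapping with the entry's idiosyncratic mapping."""
    entry = LEXICON[verb]
    return {**entry["class"].argument_mapping(), **entry.get("mapping", {})}
```

With this layout, changing the markers of a whole class requires editing a single `PredicateClass` object, while an entry such as reach carries only its exception.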

2.1 GROUPING SEMANTIC ROLES

Both on implementational and on theoretical grounds, we have grouped certain semantic roles into superclasses. Such groupings are common in the literature on case and valency (see Somers 1987) and are also supported by cross-linguistic evidence. Our grouping of roles follows previous work. For example, the AGENT SUPERCLASS covers both animate agents and inanimate instruments. A GROUND SUPERCLASS (as discussed in Talmy 1985) includes both SOURCE and GOAL. A GOAL SUPERCLASS includes GOAL, PURPOSE, and DIRECTION.

Certain semantic roles, like GOAL and SOURCE, as well as being sisters are "privatives", that is, opposites semantically.

Our representation scheme differentiates between lexically and semantically determined prepositions. We will show how this representation facilitates recognition of the type of error, and therefore principled relaxation of the constraints. Furthermore, a principled relaxation of the constraints depends in many instances on knowing the relationship between the non-standard and the expected prepositions: are they sisters, privatives, or is the non-standard preposition a parent of the expected preposition?
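The sister, privative, and parent relationships can be computed from a small table of role superclasses. The sketch below is illustrative only: the table `SUPERCLASSES`, the set `PRIVATIVES`, and the function `relation` are hypothetical names, the relation is stated over the roles that the prepositions mark, and the membership lists contain only the groupings mentioned in this section.

```python
# Each role lists the superclasses it belongs to (a role may have several).
SUPERCLASSES = {
    "SOURCE": {"GROUND"},
    "GOAL": {"GROUND", "GOAL-SUPERCLASS"},
    "PURPOSE": {"GOAL-SUPERCLASS"},
    "DIRECTION": {"GOAL-SUPERCLASS"},
    "AGENT": {"AGENT-SUPERCLASS"},
    "INSTRUMENT": {"AGENT-SUPERCLASS"},
}

# Sister roles that are semantic opposites.
PRIVATIVES = {frozenset({"GOAL", "SOURCE"})}


def relation(expected: str, actual: str) -> str:
    """Classify how the role actually marked relates to the expected role."""
    if expected == actual:
        return "same"
    expected_supers = SUPERCLASSES.get(expected, set())
    if actual in expected_supers:
        return "parent"
    if expected_supers & SUPERCLASSES.get(actual, set()):
        if frozenset({expected, actual}) in PRIVATIVES:
            return "privative"
        return "sister"
    return "unrelated"
```

For instance, `relation("SOURCE", "GOAL")` yields `"privative"`, while `relation("GOAL", "PURPOSE")` yields `"sister"`.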

In the following section we present examples of the five types of preposition usage errors. In the subsequent section, we discuss how our system presently handles these errors, or how it might eventually handle them.

3.0 THE DATA

We have classified the variation data according to the type of substitution. The main types are:

(1) semantic for semantic (Section 3.1),
(2) semantic for lexical (Section 3.2),
(3) false starts (Section 3.3),
(4) blends (Section 3.4), and
(5) semantic for syntactic (Section 3.5).

The data presented below are a representative sample of a larger group of examples. The current paper covers the classifications which we have encountered so far; we expect that analysis of additional data will provide further types of substitutions within each class.

3.1 SEMANTIC FOR SEMANTIC

3.1.1 To/From

The substitution of the goal marker for the source marker cross-linguistically is recognized in the case literature (e.g., Ikegami 1987). In English, this appears to be more pronounced in certain regional dialects. Common source/goal alternations cited by Ikegami (1987:125) include: averse from/to, different from/to, immune from/to, and distinction from/to. The majority of examples involve to substituting for from in lexical items which incorporate a negation of the predicate; the standard marker of GROUND in this class of predicates is a SOURCE marker, e.g., different from. The "positive" counterparts mark the GROUND with GOAL, e.g., similar to, as discussed in detail in Gruber (1976). Variation between to and from can only occur with verbs which incorporate a negative; otherwise the semantic distinction which these prepositions denote is necessary.

(1) The way that he came on to that bereaved brother completely alienated me TO Mr. Bush. 9/26/88 MCS

(2) At this moment I'm different TO primitive man. 10/12/88 The Mind, PBS

3.1.2 To/With

Communication and transfer of knowledge can be expressed either as a process with multiple, equally involved participants, or as an asymmetric process with one of the participants as the "agent" of the transfer of information. Our data document the substitution of the GOAL marker for the CO-THEME marker; this may reflect the tendency of English to prefer "agent" focussing. The participants in a COMMUNICATION situation are similar in their semantic roles, the only difference being one of "viewpoint." By no means all communication predicates operate in this way: e.g., EXPLANATION, TRANSFER OF KNOWLEDGE are more clearly asymmetric. The system differentiates between "mutual" and "asymmetric" communication predicates.

(3) The only reason they'll chat TO you is, you're either pretty, or they need something from your husband. 9/30/88 MCS

(4) I'll have to sit down and explore this TO you. 10/16/88

3.2 SEMANTIC FOR LEXICAL

3.2.1 Goal Superclass (Goal/Purpose/Direction)

Goal and purpose are frequently expressed by the same case-marking, with the DIRECTION marker alternating with these at times. The standard preposition in these examples is lexically determined. In example (6), instead of the lexically determined to, which also marks the semantic role GOAL, another preposition within the same superclass is chosen. In example (5) the phrasally determined for is replaced by the GOAL marker. There is abundant cross-linguistic evidence for a GOAL SUPERCLASS which includes GOAL and PURPOSE; to a lesser extent DIRECTION also patterns with these cross-linguistically.

(5) It's changing TO the better. 8/3/88 MCS

(6) Mr. Raspberry is almost 200 years behind Washingtonians aspiring FOR full citizenship. 10/13/88 WP

3.2.2 On/Of

Several examples involve lexical items expressing knowledge or cognition, for which the standard preposition is lexically determined. This preposition is uniformly replaced by on, also a marker of the semantic role of REFERENT. Examples include abreast of, grasp of, an idea of, and knowledge of. We claim that the association of the role REFERENT with knowledge and cognition (as well as with transfer-of-information predicates) is among the more salient associations that language learners encounter.

(7) Terry Brown, 47, a truck driver, agreed; "with eight years in the White House," he said, "Bush ought to have a better grasp ON the details." 9/27/88 NYT p. B8

(8) I did get an idea ON the importance of consistency as far as reward and penalty are concerned. 11/88 ETM journal

3.2.3 With/From/To

In this class, we believe that "mutual action verbs" such as marry and divorce routinely show a CO-THEME marker with being substituted for either to or from. Such predicates have a SECONDARY-MAPPING of PLURAL-THEME in the NLKB. Communication predicates are another class which allows a PLURAL-THEME and shows alternation of GOAL and CO-THEME (Section 3.1.2).

(9) Today Robin Givens said she won't ask for any money in her divorce WITH Mike Tyson. 10/19/88 ATC

3.3 FALSE STARTS

The next set of examples suggests that the speaker has "retrieved" a preposition from a different ARGUMENT MAPPING for the verb or for a different argument than the one which is eventually produced. For example, confused with replaces confused by in (10), and say to replaces say about in (11). Such examples are more prevalent in oral language. Handling these examples is difficult since all sorts of contextual information, linguistic and non-linguistic, goes into detecting the error.

(10) They didn't want to be confused WITH the facts. 11/14/88 DRS

(11) The memorial service was really well done. The rabbi did a good job. What do you say TO a kid who died like that? 11/14/88

3.4 BLENDS

Here, a lexically or phrasally determined preposition is replaced by a preposition associated with a semantically similar lexical item. In (12) Quayle says he was smitten about Marilyn, possibly thinking of crazy about. In (13) he may be thinking of on the subject/topic of. The questioner in (14) may have in support/favor of in mind. In (15) Quayle may have meant we learn by making mistakes. In (16), the idiomatic phrase in support of is confused with the ARGUMENT MAPPING of the noun support, e.g., "he showed his support for the president".

(12) I was very smitten ABOUT her. I saw a good thing and I responded rather quickly and she did too. 10/20/88 WP, p. C8

(13) ON the area of the federal budget deficit 10/5/88 Sen. Quayle in VP debate (& NYT 10/7/88 p. B6)

(14) You made one of the most eloquent speeches IN behalf of contra aid. 10/5/88 Questioner in VP debate (& NYT 10/7/88 p. B6)

(15) We learn BY our mistakes. 10/5/88 Sen. Quayle in VP debate (& NYT 10/7/88 p. B6)

(16) We testified in support FOR medical leave. 10/22/88 FFS

3.5 SEMANTIC FOR SYNTACTIC: WITH/BY

In the majority of the following examples, the syntactically governed by marking passives is replaced by with. This alternation of with and by in passives has been attested for hundreds of years, and we hypothesize that English may be in the process of reinterpreting by, as well as replacing it with with in certain contexts. On the one hand, by is being reinterpreted as a marker of "archetypal" agents, i.e., those high on the scale of AGENTIVITY (i.e., speaker > human > animate > potent > non-animate, non-potent). On the other hand, a semantically appropriate marker is being substituted for by.

We analyze the with in these examples either as the less agentive AGENT (namely the INSTRUMENT) in example (18), or the less agentive CO-THEME in example (17). The substitutions are semantically appropriate and the substitutes are semantically related to AGENT.

(17) All of Russian life was accompanied WITH some kind of singing. 8/5/88 ATC

(18) Audiences here are especially enthused WITH Dukakis's description of the Reagan-Bush economic policies. 11/5/88 ATC

4.0 THE COMPUTATIONAL IMPLEMENTATION

Of the five types of errors cited in Section 3, substitutions of semantic for semantic (Section 3.1), semantic for lexical (Section 3.2), and semantic for syntactic (Section 3.5) are the simplest to handle computationally.

4.1 SEMANTIC FOR SEMANTIC OR LEXICAL

The representation scheme described above (Section 2) facilitates handling the semantic for semantic and semantic for lexical substitutions.

Semantic for semantic substitutions are allowed if
(i) the predicate belongs to the communication class and the standard CO-THEME marker is replaced by a GOAL marker, or
(ii) the predicate incorporates a negative and GOAL is substituted for a standard SOURCE, or vice versa.

Semantic for lexical substitutions are allowed if
(iii) the non-standard preposition is a non-privative sister of the standard preposition (e.g., in the GOAL SUPERCLASS),
(iv) the non-standard preposition is the NLKB-inherited, "default" preposition for the predicate (e.g., REFERENT for predicates of cognition and knowledge), or
(v) in the NLKB the predicate allows a SECONDARY-MAPPING of PLURAL-THEME (e.g., marital predicates as in the divorce with example).
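Conditions (i) through (v) can be read as a small decision procedure. The following sketch restates them under several assumptions of ours: `Predicate`, `SISTER_GROUPS`, and `relaxation_condition` are hypothetical names, prepositions are represented by the roles they mark, and only the GOAL SUPERCLASS is listed as a sister group because it is the only one the conditions name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Predicate:
    """Hypothetical summary of the facts the five conditions consult."""
    name: str
    classes: frozenset = frozenset()     # e.g. frozenset({"COMMUNICATION"})
    incorporates_negative: bool = False  # e.g. different, alienate
    default_role: Optional[str] = None   # class-level "default" marker role
    allows_plural_theme: bool = False    # SECONDARY-MAPPING of PLURAL-THEME


# Non-privative sister roles; only the grouping named in condition (iii).
SISTER_GROUPS = [{"GOAL", "PURPOSE", "DIRECTION"}]


def relaxation_condition(pred: Predicate, determined: str,
                         expected: Optional[str], actual: str) -> Optional[str]:
    """Return the label of the condition licensing the substitution, or None.

    `determined` says how the standard preposition is fixed:
    "semantic" or "lexical". `expected` and `actual` are role names.
    """
    if determined == "semantic":
        if ("COMMUNICATION" in pred.classes
                and (expected, actual) == ("CO-THEME", "GOAL")):
            return "i"
        if pred.incorporates_negative and {expected, actual} == {"GOAL", "SOURCE"}:
            return "ii"
    elif determined == "lexical":
        if expected != actual and any(
                expected in group and actual in group for group in SISTER_GROUPS):
            return "iii"
        if pred.default_role is not None and actual == pred.default_role:
            return "iv"
        if pred.allows_plural_theme and actual == "CO-THEME":
            return "v"
    return None
```

Returning the label rather than a boolean keeps the type of error available to later stages, in line with the claim that the system must recognize the type of the error to compensate for it.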

Handling the use of a non-standard preposition marking an argument crucially involves "type-checking", wherein the "type" of the noun phrase is checked, e.g., for membership in an NLKB class such as animate-creature, time, etc. Type-checking is also used to narrow the possible senses of the preposition in a prepositional phrase, as well as to prefer certain modifier attachments.
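A minimal sketch of type-checking, assuming a single-inheritance class hierarchy: the class names other than animate-creature and time, the sense labels for by, and the functions `is_a` and `narrow_senses` are all hypothetical and are not taken from the NLKB.

```python
from typing import List

# Child class -> parent class (illustrative fragment of a class hierarchy).
ISA = {
    "human": "animate-creature",
    "animate-creature": "physical-object",
    "physical-object": "entity",
    "date": "time",
    "time": "entity",
}

# Preposition -> {sense: class required of the noun phrase}; labels are ours.
PREP_SENSES = {
    "by": {"AGENT": "animate-creature",
           "DEADLINE": "time",
           "LOCATION": "physical-object"},
}


def is_a(noun_type: str, target_class: str) -> bool:
    """Walk up the hierarchy from noun_type looking for target_class."""
    current = noun_type
    while current is not None:
        if current == target_class:
            return True
        current = ISA.get(current)
    return False


def narrow_senses(preposition: str, noun_type: str) -> List[str]:
    """Keep only the senses whose type restriction the noun phrase meets."""
    senses = PREP_SENSES.get(preposition, {})
    return [sense for sense, required in senses.items()
            if is_a(noun_type, required)]
```

Under these assumptions a human noun phrase after by keeps both the AGENT and LOCATION senses, while a date keeps only DEADLINE.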

Prepositional phrases can have two relations to predicating expressions, i.e., a governed argument (PREP-ARG) or an ADJUNCT. During parsing, the system accesses the ARGUMENT MAPPING for the predicate; once the preposition is recognized as the standard marker of an argument, an ADJUNCT reading is disallowed. The rule for PREP-ARG is a separate rule in the grammar. When the preposition does not match the expected preposition, the system checks whether any of the above conditions (i-v) hold; if so, the parse is accepted, but is assigned a lower likelihood. If a parse of the PP as an ADJUNCT is also accepted, it will be preferred over the ill-formed PREP-ARG.

4.2 SEMANTIC FOR SYNTACTIC

The substitution of semantic marking for syntactic (with for by) is easily handled: during semantic mapping, by phrases in the ADJUNCTS are mapped to the role of the active subject, assuming that "type checking" allows that interpretation of the noun phrase. It is also possible for such a sentence to be ambiguous, e.g., "he was seated by the man". We treat with phrases similarly, except that ambiguity between CO-THEME and PASSIVE SUBJECT is not allowed, based on our observation that with for by is used for noun phrases low on the animacy scale. Thus, only the CO-THEME interpretation is valid if the noun phrase is animate.

4.3 FALSE STARTS AND BLENDS

False starts are more difficult, requiring an approach similar to that of case grammar. In these examples, the preposition is acceptable with the verb, but not to mark that particular argument. The type of the argument marked with the "incorrect" preposition must be quite inconsistent with that sense of the predicate for the error even to be noticed, since the preposition is acceptable with some other sense. We are assessing the frequency of false starts in the various genres in which our system is being used, to determine whether we need to implement a strategy to handle these examples. We predict that future systems for understanding spoken language will need to accommodate this phenomenon.

We do not handle blends currently. They involve a form of analogy, i.e., smitten is like mad, syntactically, semantically, and even stylistically; they may shed some light on language storage and retrieval. Recognizing the similarity in order to allow a principled handling seems very difficult.

In addition, blends may provide evidence for a "top down" language production strategy, in which the argument structure is determined before the lexical items are chosen/inserted. Our data suggest that some people may be more prone to making this type of error than are others. Finally, blends are more frequent in genres in which people attempt to use a style that they do not command (e.g., student papers, radio talk shows).

5.0 DIRECTIONS FOR FUTURE WORK

In this paper we have described a frequent type of ill-formed input which NLP systems must handle, involving the use of non-standard prepositions to mark arguments. We presented a classification of these errors and described our algorithm for handling some of these error types. The importance of handling such non-standard input will increase as speech recognition becomes more reliable, because spoken input is less formal.

In the near term, planned enhancements include adjusting the weighting scheme to more accurately reflect the empirical data. A frequency-based model of preposition usage, based on a much larger and broader sampling of text, will improve system handling of those errors.

ACKNOWLEDGEMENTS

We would like to express our appreciation of our colleagues' contributions to the SRA NLP system: Gayle Ayers, Andrew FanG, Ben Fine, Karyn German, Mary Dee Harris, David Reel, and Robert M. Simmons.

REFERENCES

1. Aitchison, Jean. 1987. Words in the Mind. Blackwell, NY.
2. Blejer, Hatte and Sharon Flank. 1988. More Evidence for the Semantic Basis of Prepositional Case Marking, delivered December 28, 1988, Linguistic Society of America Annual Meeting, New Orleans.
3. Bresnan, Joan, ed. 1982. The Mental Representation of Grammatical Relations. MIT Press, Cambridge.
4. Carbonell, Jaime and Philip Hayes. 1983. Recovery Strategies for Parsing Extragrammatical Language. American Journal of Computational Linguistics 9(3-4): 123-146.
5. Chierchia, Gennaro, Barbara Partee, and Raymond Turner, eds. 1989. Properties, Types and Meaning. Kluwer, Dordrecht.
6. Chomsky, Noam. 1981. Lectures on Government and Binding. Foris, Dordrecht.
7. Croft, William. 1986. Categories and Relations in Syntax: The Clause-Level Organization of Information. Ph.D. Dissertation, Stanford University.
8. Dahlgren, Kathleen. 1988. Naive Semantics for Natural Language Understanding. Kluwer, Boston.
9. Dirven, Rene and Gunter Radden, eds. 1987. Concepts of Case. Gunter Narr, Tubingen.
10. Dowty, David. 1989. On the Semantic Content of the Notion of 'Thematic Role'. In Chierchia, et al. II:69-129.
11. Foley, William and Robert Van Valin Jr. 1984. Functional Syntax and Universal Grammar. Cambridge Univ. Press, Cambridge.


12. Gawron, Jean Mark. 1988. Lexical Representations and the Semantics of Complementation. Garland, NY.
13. Gazdar, Gerald, Ewan Klein, Geoffrey Pullum, and Ivan Sag (GKPS). 1985. Generalized Phrase Structure Grammar. Harvard Univ. Press, Cambridge.
14. Gruber, Jeffrey. 1976. Lexical Structures in Syntax and Semantics. North-Holland, Amsterdam.
15. Haiman, John. 1985. Natural Syntax: Iconicity and Erosion. Cambridge University Press, Cambridge.
16. Hirst, Graeme. 1987. Semantic Interpretation and the Resolution of Ambiguity. Cambridge University Press, Cambridge.
17. Ikegami, Yoshihiko. 1987. 'Source' vs. 'Goal': a Case of Linguistic Dissymmetry, in Dirven and Radden 122-146.
18. Jackendoff, Ray. 1983. Semantics and Cognition. MIT Press, Cambridge.
19. Jensen, Karen, George Heidorn, Lance Miller, and Yael Ravin. 1983. Parse Fitting and Prose Fixing: Getting a Hold on Ill-formedness. American Journal of Computational Linguistics 9(3-4): 147-160.
20. Menyuk, Paula. 1969. Sentences Children Use. MIT Press, Cambridge.
21. Miller, George and Philip Johnson-Laird. 1976. Language and Perception. Harvard University Press, Cambridge.
22. Ostler, Nicholas. 1980. A Theory of Case Linking and Agreement. Indiana University Linguistics Club.
23. Pinker, Steven. 1982. A Theory of the Acquisition of Lexical Interpretive Grammars, in Bresnan 655-726.
24. Shopen, Timothy, ed. 1985. Language Typology and Syntactic Description. Cambridge University Press, Cambridge.
25. Somers, H. L. 1987. Valency and Case in Computational Linguistics. Edinburgh University Press, Edinburgh.
26. Talmy, Leonard. 1985. Lexicalization Patterns: Semantic Structure in Lexical Forms. In Shopen III:57-149.
27. Tomita, Masaru. 1986. Efficient Parsing for Natural Language. Kluwer, Boston.
28. Vendler, Zeno. 1967. Linguistics in Philosophy. Cornell University Press, Ithaca.
29. Weischedel, Ralph and Norman Sondheimer. 1983. Meta-rules as a Basis for Processing Ill-Formed Input. American Journal of Computational Linguistics 9(3-4): 161-177.

APPENDIX A: DATA SOURCES

ATC: National Public Radio news program, "All Things Considered"
ME: National Public Radio news program, "Morning Edition"
WE: National Public Radio news program, "Weekend Edition"
MCS: WAMU radio, Washington D.C., interview program, "The Mike Cuthbert Show"
DRS: WAMU radio, Washington D.C., interview program, "Diane Rehm Show"
FFS: WAMU radio, Washington D.C., interview program, "Fred Fiske Saturday"
AIH: Canadian Broadcasting Company radio news program, "As It Happens"
NYT: The New York Times
WP: The Washington Post
ETM: Student journal for "Effective Teaching Methods," a junior undergraduate course
