SENTENCE FRAGMENTS REGULAR STRUCTURES

Marcia C. Linebarger, Deborah A. Dahl, Lynette Hirschman, Rebecca J. Passonneau

Paoli Research Center, Unisys Corporation, P.O. Box 517, Paoli, PA

ABSTRACT

This paper describes an analysis of telegraphic fragments as regular structures (not errors) handled by minimal extensions to a system designed for processing the standard language. The modular approach which has been implemented in the Unisys natural language processing system PUNDIT is based on a division of labor in which syntax regulates the occurrence and distribution of elided elements, and semantics and pragmatics use the system's standard mechanisms to interpret them.

1 INTRODUCTION

In this paper we discuss the syntactic, semantic, and pragmatic analysis of fragmentary sentences in English. Our central claim is that these sentences, which have often been classified in the literature with truly erroneous input such as misspellings (see, for example, the work discussed in [Kwasny1980, Thompson1980, Kwasny1981, Sondheimer1983, Eastman1981, Jensen1983]), are regular structures which can be processed by adding a small number of rules to the grammar and other components of the system. The syntactic regularity of fragment structures has been demonstrated elsewhere, notably in [Marsh1983, Hirschman1983]; we will focus here upon the regularity of these structures across all levels of linguistic representation. Because the syntactic component regularizes these structures into a form almost indistinguishable from full assertions, the semantic and pragmatic components are able to interpret them with few or no extensions to existing mechanisms. This process of incremental regularization of fragment structures is possible only within a linguistically modular system. Furthermore, we claim that although fragments may occur more frequently in specialized sublanguages than in the standard grammar, they do not provide evidence that sublanguages are based on grammatical principles fundamentally different from those underlying standard languages, as claimed by [Fitzpatrick1986], for example.

† This work has been supported in part by DARPA under contract N00014-85-C-0012, administered by the Office of Naval Research; by National Science Foundation contract DCR-85-02205; and by Independent R&D funding from System Development Corporation, now part of Unisys Corporation. Approved for public release, distribution unlimited.

This paper is divided into five sections. The introductory section defines fragments and describes the scope of our work. In the second section, we consider certain properties of sentence fragments which motivate a modular approach. The third section describes our implementation of processing for fragments, to which each component of the system makes a distinct contribution. The fourth section describes the temporal analysis of fragments. Finally, the fifth section discusses the status of sublanguages characterized by these telegraphic constructions.

We define fragments as regular structures which are distinguished from full assertions by a missing element or elements which are normally syntactically obligatory. We distinguish them from errors on the basis of their regularity and consistency of interpretation, and because they appear to be generated intentionally. We are not denying the existence of true errors, nor that processing sentences containing true errors may require sophisticated techniques and deep reasoning. Rather, we are saying that fragments are distinct from errors, and can be handled in a quite general fashion, with minimal extensions to normal processing. Because we base the definition of fragment on the absence of a syntactically obligatory element, noun phrases without articles are not considered to be fragmentary, since this omission is conditioned heavily by semantic factors such as the mass vs. count distinction. However, we have implemented a pragmatically based treatment of noun phrases without determiners, which is briefly discussed in Section 3.

Fragments, then, are defined here as elisions. We describe below the way in which these omissions are detected and subsequently 'filled in' by different modules of the system.

The problem of processing fragmentary sentences has arisen in the context of a large-scale natural language processing research project conducted at Unisys over the past five years [Palmer1986, Hirschman1986, Dowding1987, Dahl1987]. We have developed a portable, broad-coverage text-processing system, PUNDIT.¹ Our initial applications have involved various message types, including: field engineering reports for maintenance of computers; Navy maintenance reports (Casualty Reports, or CASREPS) for starting air compressors; Navy intelligence reports (RAINFORM); trouble and failure reports (TFRs) from Navy vessels; and recently we have examined several medical domains (radiology reports, COMMENTS fields from a DNA sequence database). At least half the sentences in these corpora are fragments; Table 1 below gives a summary of the fragment content of three domains, showing the percent of centers which are classified as fragments. (Centers comprise all sentence types: assertions, questions, fragments, and so forth.)

Table 1. Fragments in three domains

Total centers Percent fragments

The PUNDIT system is highly modular: it consists of a syntactic component, based on string grammar and restriction grammar [Sager1981, Hirschman1985]; a semantic component, based on inference-driven mapping, which decomposes predicating expressions into predicates and thematic roles [Palmer1983, Palmer1985]; and a pragmatics component which processes both referring expressions [Dahl1986] and temporal expressions [Passonneau1987, Passonneau1988].

¹ Prolog UNDerstanding of Integrated Text

2 DIVISION OF LABOR AMONG SYNTAX, SEMANTICS, AND PRAGMATICS

We argue here that sentence fragments provide a strong case for linguistically modular systems such as PUNDIT, because such elisions have distinct consequences at different levels of linguistic description. Our approach to fragments can be summarized by saying that syntax detects 'holes' in surface structure and creates dummy elements as placeholders for the missing elements; semantics and pragmatics interpret these placeholders at the appropriate point in sentence processing, utilizing the same mechanisms for fragments as for full assertions.

Syntax regulates the holes. Fragment elisions cannot be accounted for in purely semantic/pragmatic terms. This is evidenced by the fact that there are syntactic restrictions on omissions; the acceptability of a sentence fragment hinges on grammatical factors rather than, e.g., how readily the elided material can be inferred from context. For example, the discourse Old house too small. *New one will be larger than _ was (where the elided object of than is understood to be old house) is ill-formed, whereas a comparable discourse First repairman ordered new air conditioner. Second repairman will install _ (where the elided object of install is understood to be air conditioner) is acceptable. In both cases above, the referent of the elided element is available from context, and yet only the second ellipsis sounds well-formed. Thus an appreciation of where such ellipses may occur is part of the linguistic knowledge of speakers of English and not simply a function of the contextual salience of elided elements. Since these restrictions concern structure rather than content, they would be difficult or impossible to state in a system such as a 'pure' semantic grammar which only recognized such omissions at the level of semantic/pragmatic representation.

Furthermore, it matters to semantics and pragmatics HOW an argument is omitted. The syntactic component must tell semantics whether a verb argument is missing because the verb is used intransitively (as in The tiger was eating, where the patient argument is not specified) or because of a fragment ellipsis (as in Eaten by a tiger, where the patient argument is missing because the subject of a passive sentence has been elided). Only in the latter case does the missing argument of eat function as an antecedent subsequently in the discourse: compare Eaten by a tiger. Had screamed bloody murder right before the attack (where the victim and the screamer are the same) vs. The tiger was eating. Had screamed bloody murder right before the attack (where it is difficult or impossible to get the reading in which the victim and the screamer are the same).

Semantics and pragmatics fill the holes. In PUNDIT's treatment of fragments, each component contributes exactly what is appropriate to the specification of elided elements. Thus the syntax does not attempt to 'fill in' the holes that it discovers, unless that information is completely predictable given the structure at hand. Instead, it creates a dummy element. If the missing element is an elided subject, then the dummy element created by the syntactic component is assigned a referent by the pragmatics component. This referent is then assigned a thematic role by the semantics component like any other referent, and is subject to any selectional restrictions associated with the thematic role assigned to it. If the missing element is a verb, it is specified in either the syntactic or the semantic component, depending upon the fragment type.

3 PROCESSING FRAGMENTS IN PUNDIT

Although the initial PUNDIT system was designed to handle full, as opposed to fragmentary, sentences, one of the interesting results of our work is that it has required only very minor changes to the system to handle the basic fragment types introduced below. These included the additions of: 6 fragment BNF definitions to the grammar (a 5% increase in grammar size) and 7 context-sensitive restrictions (a 12% increase in the number of restrictions); one semantic rule for the interpretation of the dummy element inserted for missing verbs; a minor modification to the reference resolution mechanism to treat elided noun phrases like pronouns; and a small addition to the temporal processing mechanism to handle tenseless fragments. The small number of changes to the semantic and pragmatic components reflects the fact that these components are not 'aware' that they are interpreting fragmentary structures, because the regularization performed by the syntactic component renders them structurally indistinguishable from full assertions.

Fragments present parsing problems because the ellipsis creates degenerate structures. For example, a sequence such as chest negative can be analyzed as a 'zero-copula' fragment meaning the chest X-ray is negative, or a noun compound like the negative of the chest. This is compounded by the lack of derivational and inflectional morphology in English, so that in many cases it may not be possible to distinguish a noun from a verb (repair parts) or a past tense from a past participle (decreased medication). Adding fragment definitions to the grammar (especially if determiner omission is also allowed) results in an explosion of ambiguity. This problem has been noted and discussed by Kwasny and Sondheimer [Kwasny1981]. Their solution to the problem is to suggest special relaxation techniques for the analysis of fragments. However, in keeping with our thesis that fragments are normal constructions, we have chosen the alternative of constraining the explosion of parses in two ways. The first is the addition of a control structure to implement a limited form of preference via 'unbacktrackable' or (xor). This binary operator tries its second argument only if its first argument does not lead to a parse. In the grammar, this is used to prefer "the most structured" alternative. That is, full assertions are preferred over fragments: if an assertion or other non-fragment parse is obtained, the parser does not try for a fragment parse.
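The xor control structure just described can be sketched as a committed choice between two parsers. The Python below is only an illustration under stated assumptions: the names xor, assertion, fragment, and the toy VERBS lexicon are hypothetical stand-ins introduced here, not PUNDIT's grammar or code.

```python
from typing import Callable, List, Sequence

Parse = str  # hypothetical stand-in for a parse tree
Parser = Callable[[Sequence[str]], List[Parse]]


def xor(first: Parser, second: Parser) -> Parser:
    """'Unbacktrackable' or: try `second` only if `first` yields no parse."""
    def combined(tokens: Sequence[str]) -> List[Parse]:
        parses = first(tokens)
        if parses:
            return parses  # commit to the first alternative
        return second(tokens)
    return combined


# Toy parsers, purely illustrative (not PUNDIT's grammar).
VERBS = {"is", "was", "failed"}


def assertion(tokens: Sequence[str]) -> List[Parse]:
    # Accept only strings with an overt tensed verb after a subject.
    for i, tok in enumerate(tokens):
        if tok in VERBS and i > 0:
            return [f"assertion(subj={' '.join(tokens[:i])}, verb={tok})"]
    return []


def fragment(tokens: Sequence[str]) -> List[Parse]:
    return [f"fragment({' '.join(tokens)})"] if tokens else []


center = xor(assertion, fragment)
print(center("disk is bad".split()))
print(center("disk bad".split()))
```

With this ordering, a string that parses as an assertion never receives a fragment analysis, which is the preference for "the most structured" alternative.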

The second mechanism that helps to control generation of incorrect parses is selection. PUNDIT applies surface selectional constraints incrementally, as the parse is built up [Lang1988]. For example, the phrase air compressor would NOT be allowed as a zerocopula because the construction air is compressor would fail selection.²

² Similarly, the assertion parse for the title of this paper would fail selection (sentences don't fragment structures), permitting the zerocopula fragment parse.

3.1 Fragment Types

The fragment types currently treated in PUNDIT include the following:

Zerocopula: a subject followed by a predicate, differing from a full clause only in the absence of a verb, as in Impellor blade tip erosion evident;

Tvo (tensed verb + object): a sentence missing its subject, as in Believe the coupling from diesel to sac lube oil pump to be sheared;

Nstg_frag: an isolated noun phrase (noun-string fragment), as in Loss of oil pump pressure;

Objbe_frag (object-of-be fragment): an isolated complement appropriate to the main verb be, as in Unable to consistently start nr 1b gas turbine;

Predicate: an isolated complement appropriate to auxiliary be, as in Believed due to worn bushings, where the full sentence counterpart is Failure is believed (to be) due to worn bushings;³

Obj_gap_fragment: a center (assertion, question, or other fragment structure) missing an obligatory noun phrase object, as in Field engineer will replace _.

Note that we do not address here the processing of response fragments which occur in interactive discourse, typically as responses to questions.

The relative frequency of these six fragment types (expressed as a percentage of the total fragment content of each corpus) is summarized below.⁴

Table 2. Breakdown of fragments by type

TFR 61% 18.8% 18.8% S.S% 0% 0%

The processing of these basic fragment types can be summarized briefly as follows: a detailed surface parse tree is provided which represents the overt lexical content in its surface order. At this level, fragments bear very little resemblance to full assertions. But at the level of the Intermediate Syntactic Representation (ISR), which is a regularized representation of syntactic structure [Dahl1987], fragments are regularized to parallel full assertions by the use of dummy elements standing in for the missing subject or verb. The CONTENT of these dummy elements, however, is left unspecified in most cases, to be filled in by the semantic or pragmatic components of the system.

³ It is interesting to note that at least some of these types of fragments resemble non-fragmentary structures in other languages. Tvo fragments, for example, can be compared to zero-subject sentences in Japanese, zerocopulas resemble copular sentences in Arabic and Russian, and structures similar to predicate can be found in Cantonese (our thanks to K. Fu for the Cantonese data). This being the case, it is not surprising that analogous sentences in English can be processed without resorting to extragrammatical mechanisms.

⁴ ZC = zerocopula; NF = nstg_fragment; PRED = predicate; OBJBE = objbe_frag; OBJ_GAP = obj_gap_fragment.

Tvo. We consider first the tvo, a subjectless tensed clause such as Operates normally. This is parsed as a sequence of tensed verb and object: no subject is inferred at the level of surface structure. In the ISR, the missing subject is filled in by the dummy element elided. At the level of the ISR, then, the fragment operates normally differs from a full assertion such as It operates normally only by virtue of the element elided in place of an overt pronoun. The element elided is assigned a referent which subsequently fills a thematic role, exactly as if it were a pronoun; thus these two sentences get the same treatment from semantics and reference resolution [Dahl1986, Palmer1988].

Elided subjects in the domains we have looked at often refer to the writer of the report, so one strategy for interpreting them might be simply to assume that the filler of the elided subject is the writer of the report. This simple strategy is not sufficient in all cases. For example, in the CASREPS corpus we observe sequences such as the following, where the filler of the elided subject is provided by the previous sentence, and is clearly not the writer of the report.

(1) Problem appears to be caused by one or more of two hydraulic valves. Requires disassembly and investigation.

(2) Sac lube oil pressure decreases below alarm point approximately seven minutes after engagement. Believed due to worn bushings.

Thus, it is necessary to be able to treat elided subjects as pronouns in order to handle these sentences.
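The idea of sending elided subjects down the same path as pronouns can be illustrated with a small sketch. Everything here is an assumption made for illustration: the focus list, its ordering, the acceptable check, and the function name resolve are hypothetical and are not taken from PUNDIT's reference resolution component.

```python
from typing import Callable, List, Optional

# Hypothetical marker set; "elided" is the dummy subject named in the text.
PRONOUN_LIKE = {"elided", "it", "he", "she", "they"}


def resolve(np: str, focus: List[str],
            acceptable: Callable[[str], bool]) -> Optional[str]:
    """Resolve a subject; elided subjects take the same path as pronouns."""
    if np not in PRONOUN_LIKE:
        return np  # full noun phrases are outside this sketch
    for candidate in focus:        # most salient entity first (assumed)
        if acceptable(candidate):  # e.g. a selectional restriction check
            return candidate
    return None


# Entities made available by the first sentence of example (1).
focus = ["problem", "hydraulic valves"]
anything = lambda entity: True

print(resolve("elided", focus, anything))
print(resolve("it", focus, anything))
```

Because the dummy element and the overt pronoun share one code path, the elided subject of Requires disassembly and investigation is resolved against the previous sentence rather than defaulting to the writer of the report.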

The effect of an elided subject on subsequent focusing is the same as that of an overt pronoun. We demonstrated in section 2 that elided subjects, but not semantically implicit arguments, are expected foci (or forward-looking centers [Grosz1988]) for later sentences.

The basic assumption underlying this treatment is that the pragmatic analysis for elided subjects should be as similar to that of pronouns as possible. One piece of supporting evidence for this assumption is that in many languages, such as Japanese [Gundel1980, Hinds1983, Kameyama1985], the functional equivalent of unstressed pronouns in English is a zero, or elided noun phrase.⁵ If zeros in other languages can correspond to unstressed pronouns in English, then we hypothesize that zeros in a sublanguage of English can correspond functionally to pronouns in standard English. In addition, since processing of pronouns is independently motivated, it is a priori simpler to try to fit elision into the pronominal paradigm, if possible, than to create an entirely separate component for handling elision. Under this hypothesis, then, tvo fragments represent simply a realization of a grammatical strategy that is generally available to languages of the world.⁶

Zerocopula. For a zerocopula (e.g., Disk bad), the surface parse tree rather than the ISR inserts a dummy verb, in order to enforce subcategorization constraints on the object. And in the ISR, this null verb is 'filled in' as the verb be. It is possible to fill in the verb at this level because no further semantic or pragmatic information is required in order to determine its content.⁷ Hence the representation for Disk bad is nearly indistinguishable from that assigned to the corresponding Disk is bad; the only difference is in the absence of tense from the former. If the null verb represents auxiliary be, then, like an overt auxiliary, it does not appear in the regularized form. Sac failing thus receives a regularization with fail as the main verb. Thus the null verb inserted in the syntax is treated in the ISR in a fashion exactly parallel to the treatment of overt occurrences of be.

⁵ Stressed pronouns in English correspond to overt pronouns in languages like Japanese, as discussed in [Gundel1980, Gundel1981], and [Dahl1982].

⁶ An interesting hypothesis, discussed by Gundel and Kameyama, is that the more topic prominent a language is, the more likely it is to have zero-NP's. Perhaps the fact that sublanguage messages are characterized by rigid, contextually supplied, topics contributes to the availability of the tvo fragment type in English.

⁷ In some restricted subdomains, however, other verbs may be omitted: for example, in certain radiology reports an omitted verb may be interpreted as show rather than be. Hence we find Chest Films 1/.10 little change, paraphrasable as Chest Films show little change.

Nstg_frag. The syntactic parse tree for this fragment type contains no empty elements; it is a regular noun phrase, labeled as an nstg_frag. The ISR transforms it into a VSO sequence. This is done by treating it as the subject of an element empty_verb; in the semantic component, the subject of empty_verb is treated as the sole argument of a predicate existential(X). As a result, the nstg_frag Failure of sac and a synonymous assertion such as Failure of sac occurred are eventually mapped onto similar final representations by virtue of the temporal semantics of empty_verb and of the head of the noun phrase.

Objbe_frag and predicate. These are isolated complements; the same devices described above are utilized in their processing. The surface parse tree of these fragment types contains no empty elements; as with zerocopula, the untensed verb be is inserted into the ISR; as with tvo, the dummy subject elided is also inserted in the ISR, to be filled in by reference resolution. Thus the simple adjective Inoperative will receive an ISR quite similar to that of He/she/it is inoperative.
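The regularizations described so far (tvo, zerocopula, nstg_frag, objbe_frag and predicate) can be summarized in a short sketch. This is a simplified illustration, not PUNDIT's ISR: the ISR dataclass, its fields, and the regularize function are hypothetical, the auxiliary-be case (Sac failing) is left out, and the sketch collapses the surface-tree and ISR steps that the text keeps separate for zerocopulas. Only the dummy element names elided and empty_verb come from the text.

```python
from dataclasses import dataclass
from typing import Optional

ELIDED = "elided"          # dummy subject, later resolved like a pronoun
EMPTY_VERB = "empty_verb"  # dummy verb for isolated noun phrases


@dataclass
class ISR:
    """Hypothetical verb-subject-complement record standing in for an ISR."""
    verb: str
    subject: str
    complement: Optional[str] = None
    tensed: bool = False


def regularize(frag_type: str, **parts: str) -> ISR:
    if frag_type == "tvo":         # tensed verb + object, subject missing
        return ISR(parts["verb"], ELIDED, parts.get("object"), tensed=True)
    if frag_type == "zerocopula":  # subject + predicate, verb missing
        return ISR("be", parts["subject"], parts["predicate"])
    if frag_type == "nstg_frag":   # isolated noun phrase
        return ISR(EMPTY_VERB, parts["np"])
    if frag_type in ("objbe_frag", "predicate"):  # isolated complement
        return ISR("be", ELIDED, parts["complement"])
    raise ValueError(f"unknown fragment type: {frag_type}")


print(regularize("tvo", verb="requires", object="disassembly and investigation"))
print(regularize("zerocopula", subject="disk", predicate="bad"))
print(regularize("nstg_frag", np="failure of sac"))
print(regularize("objbe_frag", complement="inoperative"))
```

In each case the output has the same shape as the record for a full assertion, which is why later components need not know that the input was a fragment.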

Obj_gap_fragment. The final fragment type to be considered here is the elided noun phrase object. Such object elisions occur more widely in English in the context of instructions, as in Handle _ with care. Cookbooks are especially well-known repositories of elided objects, presumably because they are filled with instructions. Object elision also occurs in telegrammatic sublanguages generally, as in Took _ under fire with missiles from the Navy sighting messages. If these omissions occurred only in direct object position following the verb, one might argue for a lexical treatment; that is, such omissions could be treated as a lexical process of intransitivization rather than by explicitly representing gaps in the syntactic structure. However, noun phrase objects of prepositions may also be omitted, as in Fragile. Do not tamper with _. Thus we have chosen to represent such elisions with an explicit surface structure gap. This gap is permitted in most contexts where nstgo (noun phrase object) is found: as a direct object of the verb and as an object of a preposition.⁸ In PUNDIT, elided objects are permitted only in a fragment type called obj_gap_fragment, which, like other fragment types, may be attempted only if an assertion parse has failed. Thus a sentence such as Pressure was decreasing rapidly will never be analyzed as containing an elided object, because there is a semantically acceptable assertion parse. In contrast, John was decreasing gradually will receive an elided object analysis, paraphrasable as John was decreasing IT gradually, because John is not an acceptable subject of intransitive decrease; only pressure or some equally measurable entity may be said to decrease. This selectional failure of the assertion parse permits the elided object analysis.

⁸ Note, however, that there are some restrictions on the occurrence of these elements. They seem not to occur in predicative objects, in double dative constructions, and, perhaps, in sentence adjuncts rather than arguments of the verb. (Thus compare Patient very ill. Do not operate on _ with Operating room closed on Sunday. Do not perform surgery on _.) One possibility is that these expressions can occur only where a definite pronoun would also be acceptable. In general, object gaps seem most acceptable where they represent an argument of a verb, either as direct object or as object of a preposition selected for by a verb.

Our working hypothesis for determining the reference of object gaps is that they are, just like subject gaps, appropriately treated as pronouns. However, we have not as yet seen extensive data relevant to this hypothesis, and it remains subject to further testing.

These, then, are the fragment types currently implemented in PUNDIT. As mentioned above, we do not consider noun phrases without determiners to be fragments, because it is not clear that the missing element is syntactically obligatory. The interpretation of these noun phrases is treated as a pragmatic problem. In the style of speech characteristic of the CASREPs, determiners are nearly always omitted. Their function must therefore be replaced by other mechanisms. One possible approach to this problem would be to have the system try to determine what the determiner would have been, had there been one, insert it, and then resume processing as if the determiner had been there all along. This approach was taken by [Marsh1981]. However, it was rejected here for two reasons. The first is that it was judged to be more error-prone than simply equipping the reference resolution component with the ability to handle noun phrases without determiners directly.⁹ The second reason for not selecting this approach is that it would eliminate the distinction between noun phrases which originally had a determiner and those which did not. At some point in the development of the system it may become necessary to use this information.

⁹ This ability would be required in any case, should the system be extended to process languages which do not have articles.

The basic approach currently taken is to assume that the noun phrase is definite, that is, it triggers a search through the discourse context for a previously mentioned referent. If the search succeeds, the noun phrase is assumed to refer to that entity. If the search fails, a new discourse entity is created.

In summary, then, these fragment types are parsed 'as is' at the surface level; dummy elements are inserted into the ISR to bring fragments into close parallelism with full assertions. Because of the resulting structural similarity between these two sentence types, the semantic and pragmatic components can apply exactly the same interpretive processes to both fragments and assertions, using preexisting mechanisms to 'fill in' the holes detected by syntax.

4 TEMPORAL ANALYSIS OF FRAGMENTS

Temporal processing of fragmentary sentences further supports the efficacy of a modular approach to the analysis of these strings.¹⁰ In PUNDIT's current message domains, a single assumption leads to assignment of present or past tense in untensed fragments, depending on the aspectual properties of the fragment.¹¹ This assumption is that the messages report on actual situations which are of present relevance. Consequently, the default tense assignment is present unless this prevents assigning an actual time.¹² For sentences having progressive grammatical aspect or stative lexical aspect, the assignment of present tense always permits interpreting a situation as having an actual time [Passonneau1987]. Thus, a present tense reading is always assigned to an untensed progressive fragment, such as pressure decreasing; or an untensed zerocopula with a non-participial complement, such as pump inoperative.

¹⁰ For a discussion of the temporal component, cf. [Passonneau1987, Passonneau1988].

¹¹ Since the tvo fragment is tensed, its input to the time component is indistinguishable from that of a full sentence.

¹² PUNDIT does not currently take full advantage of modifier information that could indicate whether a situation has real time associated with it (e.g., potential sac failure), or whether a situation is past or present (e.g., sac failure yesterday; pump now operating normally).

A non-progressive zerocopula fragment containing a cognitive state verb, as in failure believed due to worn bushings, is assigned a present tense reading. However, if the lexical verb has non-stative aspect,¹³ e.g., tests conducted (process) or new sac received (transition event), then assignment of present tense conflicts with the assumption that the mentioned situation has occurred or is occurring. The simple present tense form of verbs in this class is given a habitual or iterative reading. That is, the corresponding full sentences in the present, tests are conducted and new sac is received, are interpreted as referring to types of situations that tend to occur, rather than to situations that have occurred. In order to permit actual temporal reference, these fragments are assigned a past tense reading.

Nstg_frag represents another case where present tense may conflict with lexical aspect. If an nstg_frag refers to a non-stative situation, the situation is interpreted as having an actual past time. This can be the case if the head of the noun phrase is a nominalization, and is derived from a verb in the process or transition event aspectual class. Thus, investigation of problem would be interpreted as an actual process which took place prior to the report time, and similarly, sac failure would be interpreted as a past transition event. On the other hand, an nstg_frag which refers to a stative situation, as in inoperative pump, is assigned present tense.
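The tense assignment for untensed fragments described in this section reduces to a small decision rule, sketched below. The aspect class names (state, process, transition event) come from the text; the function assign_tense and the table pairing each example with an aspect class are hypothetical illustrations, not PUNDIT's temporal component or lexicon.

```python
# Aspect classes named in the text.
STATE = "state"
PROCESS = "process"
TRANSITION_EVENT = "transition event"


def assign_tense(lexical_aspect: str, progressive: bool = False) -> str:
    """Pick a tense reading for an untensed fragment."""
    if progressive or lexical_aspect == STATE:
        # Present tense still permits an actual-time interpretation.
        return "present"
    if lexical_aspect in (PROCESS, TRANSITION_EVENT):
        # Simple present would be read as habitual, so past is assigned.
        return "past"
    raise ValueError(f"unknown aspect class: {lexical_aspect}")


# Hypothetical aspect annotations for the examples used in the text.
EXAMPLES = {
    "pressure decreasing": (PROCESS, True),
    "pump inoperative": (STATE, False),
    "failure believed due to worn bushings": (STATE, False),
    "tests conducted": (PROCESS, False),
    "new sac received": (TRANSITION_EVENT, False),
    "investigation of problem": (PROCESS, False),
    "sac failure": (TRANSITION_EVENT, False),
    "inoperative pump": (STATE, False),
}

for text, (aspect, prog) in EXAMPLES.items():
    print(f"{text}: {assign_tense(aspect, prog)}")
```

Tensed tvo fragments are deliberately left out of the sketch, since their own tense is passed to the time component as for a full sentence.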

5 RELATION OF FRAGMENTS TO THE LARGER GRAMMAR

An important finding which has emerged from the investigation of sentence fragments in a variety of sublanguage domains is that the linguistic properties of these constructions are largely domain-independent. Assuming that these sentence fragments remain constant across different sublanguages, what is their relationship to the language at large? As indicated above, we believe that fragments should not be regarded as ERRORS, a position taken also by [Lehrberger1982, Marsh1983], and others. Fragments do occur with disproportionate frequency in some domains, such as field reports of mechanical failure or newspaper headlines. However, despite this frequency variation, it appears that the parser's preferences remain constant across domains. Therefore, even in telegraphic domains the preference is for a full assertion parse, if one is available. As discussed above, we have enforced this preference by means of the xor ('unbacktrackable' or) connective. Thus despite the greater frequency of fragments we do not require either a grammar or a preference structure different from that of standard English in order to apply the stable system grammar to these telegraphic messages.

Footnote: Mourelatos' class of occurrences [Mourelatos1981].
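The preference for a full assertion parse, enforced by the xor connective, can be illustrated with a small Python sketch. Everything in it is hypothetical: the toy parsers `parse_assertion` and `parse_fragment`, the `TENSED_VERBS` set, and the string-valued parses are stand-ins invented for illustration, and the code does not show how PUNDIT implements xor. It only contrasts an ordinary, backtrackable disjunction with a committed one.

```python
from typing import Callable, Iterator, List, Optional, Sequence

Parser = Callable[[Sequence[str]], Optional[str]]

TENSED_VERBS = {"resumed", "failed"}  # hypothetical toy lexicon


def parse_assertion(tokens: Sequence[str]) -> Optional[str]:
    """Toy full-assertion parser: a subject word followed by a tensed verb."""
    if len(tokens) >= 2 and tokens[1] in TENSED_VERBS:
        return "assertion(" + " ".join(tokens) + ")"
    return None


def parse_fragment(tokens: Sequence[str]) -> Optional[str]:
    """Toy fragment parser: accepts any non-empty input."""
    if tokens:
        return "fragment(" + " ".join(tokens) + ")"
    return None


def ordinary_or(parsers: List[Parser], tokens: Sequence[str]) -> Iterator[str]:
    """Backtrackable disjunction: every alternative that succeeds is offered."""
    for parser in parsers:
        result = parser(tokens)
        if result is not None:
            yield result


def xor(parsers: List[Parser], tokens: Sequence[str]) -> Optional[str]:
    """Committed disjunction: once an alternative succeeds, later ones are never tried."""
    for parser in parsers:
        result = parser(tokens)
        if result is not None:
            return result
    return None


options = [parse_assertion, parse_fragment]  # full assertion is tried first
full = "sac resumed normal operation".split()
telegraphic = "new sac received".split()

print(list(ordinary_or(options, full)))  # both readings are offered
print(xor(options, full))                # only the assertion reading
print(xor(options, telegraphic))         # fragment reading, since no assertion parse exists
```

Because the ordering of alternatives is the same for every input, the same preference structure applies whether fragments are rare or frequent in a domain.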

Others have argued against this view of the relationship between sublanguages and the language at large. For example, Fitzpatrick et al. [Fitzpatrick1986] propose that fragments are subject to a constraint quite unlike any found in English generally. Their Transitivity Constraint (TC) requires that if a verb occurs as a transitive in a sublanguage with fragmentary messages, then it may not also occur in an intransitive form, even if the verb is ambiguous in the language at large. This constraint, they argue, provides evidence that sublanguage grammars have "a life of their own", since there is no such principle governing standard languages. The TC would also cut down on ambiguities arising out of object deletion, since a verb would be permitted to occur transitively or intransitively in a given subdomain, but not both.

As the authors recognize, this hypothesis runs into difficulty in the face of verbs such as resume (we find both Sac resumed normal operation and Noise has resumed), since resume occurs both transitively and intransitively in these cases. For these cases, the authors are forced to appeal to a problematic analysis of resume as syntactically transitive in both cases; they analyze The noise has resumed, for example, as deriving from a structure of the form (Someone/something) resumed the noise; that is, it is analyzed as underlyingly transitive. Other transitivity alternations which present potential counter-examples are treated as syntactic gapping processes. In fact, with these two mechanisms available, it is not clear what COULD provide a counter-example to the TC. The effect of all this insulation is to render the Transitivity Constraint vacuous. If all transitive/intransitive alternations can be treated as underlyingly transitive, then of course there will be no counter-examples to the transitivity constraint. Therefore we see no evidence that sublanguage grammars are subject to additional constraints of this nature.
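The argument about the Transitivity Constraint can be made concrete with a short Python sketch. It assumes a surface-level reading of the TC over hand-labelled observations. The function names `tc_violations` and `reanalyze_as_transitive`, the `(verb, frame)` data format, and the `replace` entry are hypothetical and introduced only for illustration; the two `resume` observations correspond to the examples cited in the text.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Observation = Tuple[str, str]  # (verb, "transitive" or "intransitive")


def tc_violations(observations: Iterable[Observation]) -> List[str]:
    """Return verbs observed in both frames, which a surface TC forbids."""
    frames = defaultdict(set)
    for verb, frame in observations:
        frames[verb].add(frame)
    return sorted(
        verb for verb, seen in frames.items()
        if {"transitive", "intransitive"} <= seen
    )


def reanalyze_as_transitive(observations: Iterable[Observation]) -> List[Observation]:
    """Treat every occurrence as underlyingly transitive, as in the reanalysis of resume."""
    return [(verb, "transitive") for verb, _ in observations]


OBSERVATIONS = [
    ("resume", "transitive"),    # Sac resumed normal operation
    ("resume", "intransitive"),  # Noise has resumed
    ("replace", "transitive"),   # hypothetical additional entry
]

print(tc_violations(OBSERVATIONS))                           # surface counter-examples
print(tc_violations(reanalyze_as_transitive(OBSERVATIONS)))  # none can remain
```

Once every intransitive occurrence may be relabelled as underlyingly transitive, the check cannot fail on any input, which is the sense in which the constraint is rendered vacuous.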

In summary, this supports the view that fragmentary constructions in English are regular, differing minimally from the standard language, rather than ill-formed, unpredictable sublanguage exotica. Within a modular system such as PUNDIT this regularity can be captured with the limited augmentations of the grammar described above.

ACKNOWLEDGMENTS

The system described in this paper has been developed by the entire natural language group at Unisys. In particular, we wish to acknowledge the contributions of John Dowding, who developed the ISR in conjunction with Deborah Dahl; and Martha Palmer's work on the semantics component. The ISR is based upon the work of Mark Gawron.

We thank Tim Finin and Martha Palmer as well as the anonymous reviewers for useful comments on an earlier version of this paper.

References

[Dahl1987]
Deborah A. Dahl, John Dowding, Lynette Hirschman, Francois Lang, Marcia Linebarger, Martha Palmer, Rebecca Passonneau, and Leslie Riley, Integrating Syntax, Semantics, and Discourse: DARPA Natural Language Understanding Program, R&D Status Report, Paoli Research Center, Unisys Defense Systems, May 14, 1987.

[Dahl1986]
Deborah A. Dahl, Focusing and Reference Resolution in PUNDIT, Presented at AAAI, Philadelphia, PA, 1986.

[Dahl1982]
Deborah A. Dahl and Jeanette K. Gundel, Identifying Referents for two kinds of Pronouns. In Minnesota Working Papers in Linguistics and Philosophy of Language, Kathleen Houlihan (ed.), 1982, pp. 10-29.

[Dahl1987]
Deborah A. Dahl, Martha S. Palmer, and Rebecca J. Passonneau, Nominalizations in PUNDIT, Proceedings of the 25th Annual Meeting of the ACL, Stanford, CA, July, 1987.

[Dowding1987]
John Dowding and Lynette Hirschman, Dynamic Translation for Rule Pruning in Restriction Grammar. In Proc. of the 2nd International Workshop on Natural Language Understanding and Logic Programming, 1987.

[Eastman1981]
C.M. Eastman and D.S. McLean, On the Need for Parsing Ill-Formed Input. American Journal of Computational Linguistics, 1981.

[Fitzpatrick1986]
E. Fitzpatrick, J. Bachenko, and D. Hindle, The Status of Telegraphic Sublanguages. In Analyzing Language in Restricted Domains, R. Grishman and R. Kittredge (ed.), Lawrence Erlbaum Associates, Hillsdale, NJ, 1986.

[Grosz1986]
Barbara J. Grosz, Aravind K. Joshi, and Scott Weinstein, Towards a Computational Theory of Discourse Interpretation, Ms., 1986.

[Gundel1981]
Jeanette K. Gundel and Deborah A. Dahl, The Comprehension of Focussed and Non-Focussed Pronouns, Proceedings of the Third Annual Meeting of the Cognitive Science Society, Berkeley, CA, August, 1981.


[Gundel1980]
Jeanette K. Gundel, Zero-NP Anaphora in Russian. Chicago Linguistic Society Parasession on Pronouns and Anaphora, 1980.

[Hinds1983]
John Hinds, Topic Continuity in Japanese. In Topic Continuity in Discourse, T. Givon (ed.), John Benjamins Publishing Company, Philadelphia, 1983.

[Hirschman1983]
Lynette Hirschman and Naomi Sager, Automatic Information Formatting of a Medical Sublanguage. In Sublanguage: Studies of Language in Restricted Semantic Domains, R. Kittredge and J. Lehrberger (ed.), Series of Foundations of Communications, Walter de Gruyter, Berlin, 1983, pp. 27-80.

[Hirschman1986]
L. Hirschman, Conjunction in Meta-Restriction Grammar. J. of Logic Programming 3(4), 1986, pp. 299-328.

[Hirschman1985]
L. Hirschman and K. Puder, Restriction Grammar: A Prolog Implementation. In Logic Programming and its Applications, D.H.D. Warren and M. VanCaneghem (ed.), 1985.

[Jensen1983]
K. Jensen, G.E. Heidorn, L.A. Miller, and Y. Ravin, Parse Fitting and Prose Fixing: Getting a Hold on Ill-Formedness. American Journal of Computational Linguistics 9, 1983.

[Kameyama1985]
Megumi Kameyama, Zero Anaphora: The Case of Japanese, Ph.D. thesis, Stanford University, 1985.

[Kwasny1981]
S.C. Kwasny and N.K. Sondheimer, Relaxation Techniques for Parsing Ill-Formed Input. Am. J. of Computational Linguistics 7, 1981, pp. 99-108.

[Kwasny1980]
Stan C. Kwasny, Treatment of Ungrammatical and Extra-Grammatical Phenomena in Natural Language Understanding Systems. Indiana University Linguistics Club, 1980.

[Lang1988]
Francois Lang and Lynette Hirschman, Improved Portability and Parsing Through Interactive Acquisition of Semantic Information, Proc. of the Second Conference on Applied Natural Language Processing, Austin, TX, February, 1988.

[Lehrberger1982]
J. Lehrberger, Automatic Translation and the Concept of Sublanguage. In Sublanguage: Studies of Language in Restricted Semantic Domains, R. Kittredge and J. Lehrberger (ed.), de Gruyter, Berlin, 1982.

[Marsh1983]
Elaine Marsh, Utilizing Domain-Specific Information for Processing Compact Text. In Proceedings of the Conference on Applied Natural Language Processing, 1983, pp. 99-103.

[Marsh1981]
Elaine Marsh, A or THE? Reconstruction of Omitted Articles in Medical Notes, Mss., 1981.

[Mourelatos1981]
Alexander P. D. Mourelatos, Events, Processes and States. In Syntax and Semantics: Tense and Aspect, P. J. Tedeschi and A. Zaenen (ed.), Academic Press, New York, 1981, pp. 191-212.

[Palmer1983]
M. Palmer, Inference Driven Semantic Analysis. In Proceedings of the National Conference on Artificial Intelligence, 1983.


[Palmer1986]
Martha S. Palmer, Deborah A. Dahl, Rebecca J. [Passonneau] Schiffman, Lynette Hirschman, Marcia Linebarger, and John Dowding, Recovering Implicit Information, Presented at the 24th Annual Meeting of the Association for Computational Linguistics, Columbia University, New York, August 1986.

[Palmer1985]
Martha S. Palmer, Driving Semantics for a Limited Domain, Ph.D. thesis, University of Edinburgh, 1985.

[Passonneau1988]
Rebecca J. Passonneau, A Computational Model of the Semantics of Tense and Aspect. Computational Linguistics, 1988.

[Passonneau1987]
Rebecca J. Passonneau, Situations and Intervals, Presented at the 25th Annual Meeting of the Association for Computational Linguistics, Stanford University, California, July 1987.

[Sager1981]
N. Sager, Natural Language Information Processing: A Computer Grammar of English and Its Applications. Addison-Wesley, Reading, Mass., 1981.

[Sondheimer1983]
N. K. Sondheimer and R. M. Weischedel, Meta-rules as a Basis for Processing Ill-Formed Input. American Journal of Computational Linguistics 9(3-4), 1983.

[Thompson1980]
Bozena H. Thompson, Linguistic Analysis of Natural Language Communication with Computers. In Proceedings of the 8th International Conference on Computational Linguistics, Tokyo, 1980.
