
CORRECTING ILLEGAL NP OMISSIONS USING LOCAL FOCUS

Linda Z. Suri¹
Department of Computer and Information Sciences
University of Delaware
Newark, DE 19716
Internet: suri@udel.edu

1 INTRODUCTION

The work described here is in the context of developing a system that will correct the written English of native users of American Sign Language (ASL) who are learning English as a second language. In this paper we focus on one error class that we have found to be particularly prevalent: the illegal omission of NP's.

Our previous analysis of the written English of ASL natives has led us to conclude that language transfer (LT) can explain many errors, and should thus be taken advantage of by an instructional system (Suri, 1991; Suri and McCoy, 1991). We believe that many of the omission errors we have found are among the errors explainable by LT.

Lillo-Martin (1991) investigates null argument structures in ASL. She identifies two classes of ASL verbs that allow different types of null argument structures. Plain verbs do not carry morphological markings for subject or object agreement and yet allow null argument structures in some contexts. These structures, she claims, are analogous to the null argument structures found in languages (like Chinese) that allow a null argument if the argument co-specifies the topic of a previous sentence (Huang, 1984). Such languages are said to be discourse-oriented languages.

As it turns out, our writing samples collected from deaf writers contain many instances of omitted NP's where those NP's are the topic of a previous sentence and where the verb involved would be a plain verb in ASL. We believe these errors can be explained as a result of the ASL native carrying over conventions of (discourse-oriented) ASL to (sentence-oriented) English.

If this is the case, then these omissions can be corrected if we track the topic, or, in computational linguistics terms, the local focus, and the actor focus.² We propose to do this by developing a modified version of Sidner's focus tracking algorithm (1979, 1983) that includes mechanisms for handling complex sentence types and illegally omitted NP's.

¹ This research was supported in part by NSF Grant IRI-9010112. Support was also provided by the Nemours Foundation. We thank Gallaudet University, the National Technical Institute for the Deaf, the Pennsylvania School for the Deaf, the Margaret S. Sterck School, and the Bicultural Center for providing us with writing samples.

² Grosz, Joshi and Weinstein (1983) use the notion of centering to track something similar to local focus and argue against the use of a separate actor focus. However, we think that the example they use does not argue against a separate actor focus, but illustrates the need for extensions to Sidner's algorithm to specify how complex sentences should be processed.


2 FOCUS TRACKING

Our focusing algorithm is based on Sidner's focusing algorithm for tracking local and actor foci (Sidner 1979; Sidner 1983).³ In each sentence, the actor focus (AF) is identified with the (thematic) agent of the sentence. The Potential Actor Focus List (PAFL) contains all NP's that specify an animate element of the database but are not the agent of the sentence.

Tracking local focus is more complex. The first sentence in a text can be said to be about something. That something is called the current focus (CF) of the sentence and can generally be identified via syntactic means, taking into consideration the thematic roles of the elements in the sentence. In addition to the CF, an initial sentence introduces a number of other items (any of which can become the focus of the next sentence). Thus, these items are recorded in a potential focus list (PFL).
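The four registers introduced so far (AF, PAFL, CF, PFL) can be pictured as one small record per sentence. The following is a minimal sketch in Python, not the author's implementation; the class name `FocusState` and its field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FocusState:
    """Focus registers recorded after one sentence (hypothetical layout)."""
    af: Optional[str] = None                       # actor focus: the thematic agent
    pafl: List[str] = field(default_factory=list)  # animate NP's that are not the agent
    cf: Optional[str] = None                       # current focus
    pfl: List[str] = field(default_factory=list)   # other items that may become the focus


# Values the paper reports after S1 of the example in Section 3.
# The paper does not list a PAFL for S1, so that field is simply left at its default.
s1 = FocusState(af="I", cf="I", pfl=["SUMMER", "HOME", "LIVE VP"])
```

Discourse elements are represented as plain strings only to keep the sketch short; a real system would presumably use database entities.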

At any given point in a well-formed text, after the first sentence, the writer has a number of options:

• Continue talking about the same thing; in this case, the CF doesn't change.

• Talk about something just introduced; in this case, the CF is selected from the previous sentence's PFL.

• Return to a topic of previous discussion; in this case, that topic must have been the CF of a previous sentence.

• Discuss an item previously introduced, but which was not the topic of previous discussion; in this case, that item must have been on the PFL of a previous sentence.

The decision (by the reader/hearer/algorithm) as to which of these alternatives was chosen by the speaker is based on the thematic roles (with particular attention to the agent role) held by the anaphora of the current sentence, and whether their co-specification is the CF, a previous CF, or a member of the current PFL or a previous PFL. Confirmation of co-specifications requires inferencing based on general knowledge and semantics.
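The four options can be read as a lookup of an anaphor's co-specifier against the focus registers. The sketch below is a simplified illustration under stated assumptions: it ignores thematic roles and the inferencing step, and the function name `classify_move` and its return labels are hypothetical.

```python
from typing import Optional, Sequence


def classify_move(item: str,
                  cf: Optional[str],
                  pfl: Sequence[str],
                  stacked_cfs: Sequence[str],
                  stacked_pfls: Sequence[Sequence[str]]) -> str:
    """Name which of the four options a co-specified item corresponds to."""
    if item == cf:
        return "continue"        # the same thing is still under discussion
    if item in pfl:
        return "shift"           # something just introduced
    if item in stacked_cfs:
        return "return-to-cf"    # an earlier topic of discussion
    if any(item in old_pfl for old_pfl in stacked_pfls):
        return "return-to-pfl"   # introduced earlier, but never the topic
    return "unknown"             # no register matches; not covered by the four options
```

With the registers the paper gives after S2 of the Section 3 example (CF `I`, PFL containing `MONEY` and the `BUDGET VP`, and the S1 registers on the stack), `MONEY` would be classified as a shift and `HOME` as a return to a stacked PFL.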

At each sentence in the discourse, the CF and PFL of the previous sentence are stacked for the possibility of subsequent return.⁴ When one of these items is returned to, the stacked CF's and PFL's above it are popped, and are thus no longer available for return.
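The stacking behaviour just described can be sketched as follows. This is an illustration only: the class `FocusStack` is a hypothetical name, and keeping the returned-to frame on the stack (rather than popping it too) is an assumption, since the text only says that the entries above it are popped.

```python
from typing import List, Optional, Tuple

Frame = Tuple[str, List[str]]  # (CF, PFL) of one earlier sentence


class FocusStack:
    """Stack of earlier (CF, PFL) pairs, most recent on top."""

    def __init__(self) -> None:
        self._frames: List[Frame] = []

    def push(self, cf: str, pfl: List[str]) -> None:
        self._frames.append((cf, list(pfl)))

    def return_to(self, item: str) -> Optional[Frame]:
        """Search from the top; on a hit, pop every frame above the hit."""
        for i in range(len(self._frames) - 1, -1, -1):
            cf, pfl = self._frames[i]
            if item == cf or item in pfl:
                del self._frames[i + 1:]  # frames above are no longer available
                return self._frames[i]
        return None  # the item was never a stacked CF or PFL member

    def __len__(self) -> int:
        return len(self._frames)
```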

³ Carter (1987) extended Sidner's work to handle intrasentential anaphora, but for space reasons we do not discuss these extensions.

⁴ Sidner did not stack PFL's. Our reasons for stacking PFL's are discussed in section 4.


2.1 FILLING IN A MISSING NP

We propose extending this algorithm to identify an illegally omitted NP. To do this, we treat the omitted NP as an anaphor which, like Sidner's treatment of full definite NP's and personal pronouns, co-specifies an element recorded by the focusing algorithm. This approach is based on the belief that an omitted NP is likely to be the topic of a previous sentence. We define preferences among the focus data structures which are similar to Sidner's preferences.

More specifically, when we encounter an omitted NP that is not the agent, we first try to fill the deleted NP with the CF of the immediately preceding sentence. If syntax, semantics or inferencing based on general knowledge cause this co-specification to be rejected, we then consider members of the PFL of the previous sentence as fillers for the deleted NP. If these too are rejected, we consider stacked CF's and elements of stacked PFL's, taking into account preferences (yet to be determined) among these elements.
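The search for a filler of an omitted non-agent NP can be sketched as a loop over candidates in the stated order. This is a rough illustration under assumptions: the `acceptable` callback stands in for the syntactic, semantic and inferencing checks, which the paper does not spell out, and because preferences among stacked elements are left open, the sketch simply walks the stack top-down with each CF before its PFL as a placeholder.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def fill_omitted_non_agent(prev_cf: str,
                           prev_pfl: List[str],
                           stacked: Iterable[Tuple[str, List[str]]],
                           acceptable: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that the (assumed) filter does not reject."""
    # Previous CF first, then members of the previous PFL.
    candidates = [prev_cf] + list(prev_pfl)
    # Placeholder ordering for stacked elements (top of stack first).
    for cf, pfl in stacked:
        candidates.append(cf)
        candidates.extend(pfl)
    for candidate in candidates:
        if acceptable(candidate):
            return candidate
    return None  # no recorded element can fill the omitted NP
```

For S5 of the Section 3 example, the previous CF is `MONEY`; if the filter accepts it, the loop stops at the first candidate, which matches the outcome described there.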

When we encounter an omitted agent NP, in a simple sentence or a sentence-initial clause, we first test the AF of the previous sentence as co-specifier, then members of the PAFL, the previous CF, and finally stacked AF's, CF's and PAFL's. To identify a missing agent NP in a non-sentence-initial clause, our algorithm will first test the AF of the previous clause, and then follow the same preferences just given. Further preferences are yet to be determined, including those between the stacked AF, stacked PAFL, and stacked CF.
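The candidate order for an omitted agent can be written down as a list builder. The sketch below is illustrative only: `agent_candidates` and its parameter names are hypothetical, and the relative order of stacked AF's, CF's and PAFL's is a placeholder because the text leaves it undetermined.

```python
from typing import List, Optional, Sequence


def agent_candidates(prev_af: Optional[str],
                     prev_pafl: Sequence[str],
                     prev_cf: Optional[str],
                     stacked_afs: Sequence[str] = (),
                     stacked_cfs: Sequence[str] = (),
                     stacked_pafls: Sequence[Sequence[str]] = (),
                     prev_clause_af: Optional[str] = None) -> List[str]:
    """List co-specifier candidates for an omitted agent, most preferred first."""
    ordered: List[Optional[str]] = []
    if prev_clause_af is not None:
        # Non-sentence-initial clause: the AF of the previous clause is tested first.
        ordered.append(prev_clause_af)
    ordered.append(prev_af)
    ordered.extend(prev_pafl)
    ordered.append(prev_cf)
    # Placeholder order among stacked elements (preferences not yet determined).
    ordered.extend(stacked_afs)
    ordered.extend(stacked_cfs)
    for pafl in stacked_pafls:
        ordered.extend(pafl)
    # Drop missing entries and duplicates while keeping the first occurrence.
    seen = set()
    result: List[str] = []
    for candidate in ordered:
        if candidate is not None and candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result
```

Each candidate would then be checked in turn against syntax, semantics and general knowledge, as for the non-agent case.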

2.2 COMPUTING THE CF

To compute the CF of a sentence without any illegally omitted NP's, we prefer the CF of the last sentence over members of the PFL, and PFL members over members of the focus stacks. Exceptions to these preferences involve picking a non-agent anaphor co-specifying a PFL member over an agent co-specifying the CF, and preferring a PFL member co-specified by a pronoun to the CF co-specified by a full definite description.
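One possible reading of these preferences and their two exceptions is sketched below. It is a simplification under assumptions: co-specification is taken as already resolved, the stacks are flattened into one sequence, and the names `Anaphor` and `compute_cf` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Anaphor:
    cospecifier: str  # discourse element the anaphor is taken to co-specify
    is_agent: bool    # True if the anaphor fills the thematic agent role
    form: str         # "pronoun" or "definite" (full definite description)


def compute_cf(anaphors: Sequence[Anaphor],
               last_cf: str,
               last_pfl: Sequence[str],
               stacked: Sequence[str] = ()) -> Optional[str]:
    """Pick the new CF for a sentence with no illegally omitted NP's."""
    cf_hits = [a for a in anaphors if a.cospecifier == last_cf]
    pfl_hits = [a for a in anaphors if a.cospecifier in last_pfl]

    if cf_hits and pfl_hits:
        # Exception 1: a non-agent co-specifying a PFL member beats an agent
        # co-specifying the CF.
        if all(a.is_agent for a in cf_hits):
            for a in pfl_hits:
                if not a.is_agent:
                    return a.cospecifier
        # Exception 2: a pronoun co-specifying a PFL member beats a full
        # definite description co-specifying the CF.
        if all(a.form == "definite" for a in cf_hits):
            for a in pfl_hits:
                if a.form == "pronoun":
                    return a.cospecifier

    if cf_hits:
        return last_cf                    # default: keep the last CF
    if pfl_hits:
        return pfl_hits[0].cospecifier    # next: a member of the last PFL
    for a in anaphors:
        if a.cospecifier in stacked:
            return a.cospecifier          # last resort: the focus stacks
    return None
```

Applied to S3 of the Section 3 example, where the agent `I` co-specifies the last CF and the non-agent "money" co-specifies a PFL member, the first exception selects `MONEY`, in line with the analysis given there.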

To compute the CF of a sentence with an illegally omitted NP, our algorithm treats illegally omitted NP's as anaphora since they (implicitly) co-specify something in the preceding discourse. However, it is important to remember that discourse-oriented languages allow deletions of NP's that are the topic of the discourse. Thus, we prefer a deleted non-agent as the focus, as long as it closely ties to the previous sentence. Therefore, we prefer the co-specifier of the omitted non-agent NP as the (new) CF if it co-specifies either the last CF or a member of the last PFL. If the omitted NP is the thematic agent, we prefer for the new CF to be a pronominal (or, as a second choice, full definite description) non-agent anaphor co-specifying either the last CF or a member of the last PFL (allowing the deleted agent NP to be the AF and keeping the AF and CF different).⁵ If no anaphor meets these criteria, then the members of the CF and PFL focus stacks will be considered, testing a co-specifier of the omitted NP before co-specifiers of pronouns and definite descriptions at each stack level.

⁵ As future work, we will explore how to resolve more than one non-agent anaphor in a sentence co-specifying PFL elements.

3 EXAMPLE

Below, we describe the behavior of the extended algorithm on an example from our collected texts containing both a deleted non-agent and agent.

Example: "(S1) First, in summer I live at home with my parents. (S2) I can budget money easily. (S3) I did not spend lot of money at home because at home we have lot of good foods, I ate lot of foods. (S4) While living at college I spend lot of money because _ go out to eat almost everyday. (S5) At home, sometimes my parents gave me some money right away when I need _."

After S1, the AF is I, the CF is I, and the PFL contains SUMMER, HOME, and the LIVE VP. For S2, I is the only anaphor, so it becomes the CF, the PFL contains MONEY and the BUDGET VP, and the focus stack contains I and the previous PFL.

S3 is a complex sentence using the conjunction "because." Such sentences are not explicitly handled by Sidner's algorithm. Our analysis so far suggests that we should not split this sentence into two,⁶ and should prefer elements of the main clause as focus candidates. Thus, we take the CF from the first clause, and rank other elements in that clause before elements in the second clause on the PFL.⁷ In this case, we have several anaphora: I, money, at home. The AF remains I. The CF becomes MONEY since it co-specifies a member of the PFL and since the co-specifier of the last CF is the agent. Ordering the elements of the first clause before the elements in the second results in the PFL containing HOME, the NOT SPEND VP, GOOD FOOD, and the HAVE VP. We stack the CF and the PFL of S2.

Note that S4 has a missing agent in the second clause. To identify the missing agent in a non-sentence-initial clause, our algorithm will first test the AF of the preceding clause for possible co-specification. Because this co-specification would cause no contradiction, the omitted NP is filled with "I", which is eventually taken as the AF of S4. The CF is computed by first considering the first clause of S4, since the X clause is the preferred clause of an X BECAUSE Y construct. Since "money" co-specifies the CF of S3, and nothing else in the preferred clause co-specifies a member of the PFL, MONEY remains the CF. The PFL contains COLLEGE, the SPEND VP, EVERY DAY, the TO EAT VP, and the GO OUT TO EAT VP. We stack the CF and PFL of S3.

S5 contains a subordinate clause with a missing non-agent. Our algorithm first considers the CF, MONEY, as the co-specifier of the omitted NP; syntax, semantics and general knowledge inferencing do not prevent this co-specification, so it is adopted. MONEY is also chosen as the CF since it is the co-specifier of the omitted NP occurring in the verb complement clause which is the preferred clause in this type of construct.

⁶ If we were to split the sentence up, then the focus would shift away from MONEY when we process the second clause (which contradicts our intuition of what the focus is in this paragraph).

⁷ The appropriateness of placing elements from both clauses in one PFL and ranking them according to clause membership will be further investigated. This construct ("X BECAUSE Y") is further discussed in section 4.

4 DISCUSSION OF EXTENSIONS

One of the major extensions needed in Sidner's algorithm is a mechanism for handling complex sentences. Based on a limited analysis of sample texts, we propose computing the CF and PFL of a complex sentence based on a classification of sentence types. For instance, for a sentence of the form "X BECAUSE Y" or "BECAUSE Y, X", we prefer the expected focus of the effect clause as CF, and order elements of the X clause on the PFL before elements of the Y clause. Analogous PFL orderings apply to other sentence types described here. For a sentence of the form "X CONJ Y", where X and Y are sentences, and CONJ is "and", "or", or "but", we prefer the expected focus of the Y clause. For a sentence of the form "IF X (THEN) Y", we prefer the expected focus of the THEN clause, while for "X, IF Y", we prefer the expected focus of the X clause. Further study is needed to determine other preferences and actions (including how to further order elements on the PFL) for these and other sentence types. These preferences will likely depend on thematic roles and syntactic criteria (e.g., whether an element occurs in the clause containing the expected CF).
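The sentence-type preferences listed above amount to a small lookup table. The sketch below is one way to write it down, under assumptions: the string keys and the names `PREFERRED_CLAUSE` and `order_pfl` are illustrative, and reading "analogous PFL orderings" as "preferred clause first" is an interpretation rather than something the text states outright.

```python
from typing import Dict, List

# Which clause supplies the expected focus for each sentence type.
PREFERRED_CLAUSE: Dict[str, str] = {
    "X BECAUSE Y": "X",    # the effect clause
    "BECAUSE Y, X": "X",
    "X CONJ Y": "Y",       # CONJ is "and", "or", or "but"
    "IF X (THEN) Y": "Y",  # the THEN clause
    "X, IF Y": "X",
}


def order_pfl(sentence_type: str,
              x_items: List[str],
              y_items: List[str]) -> List[str]:
    """Rank PFL elements of the preferred clause before those of the other clause."""
    if PREFERRED_CLAUSE[sentence_type] == "X":
        return list(x_items) + list(y_items)
    return list(y_items) + list(x_items)
```

For S3 of the example, an "X BECAUSE Y" sentence, placing the first-clause elements before the second-clause elements yields the PFL order reported in Section 3.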

The decisions about how these and other extensions should proceed have been or will be based on analysis of both standard written English and the written English of deaf students. The algorithm will be developed to match the intuitions of native English speakers as to how focus shifts.

A second difference between our algorithm and Sidner's is that we stack the PFL's as well as the CF's. We think that stacking the PFL's may be needed for processing standard English (and not just for our purposes) since focus sometimes revolves around the theme of one of the clauses of a complex sentence, and later returns to revolve around items of another clause. Further investigation may indicate that we need to add new data structures or enhance existing ones to handle focus shifts related to these and other complex discourse patterns.

We should note that while we prefer the CF as the co-specifier of an omitted NP, Sidner's recency rule suggests that perhaps we should prefer a member of the PFL if it is the last constituent of the previous sentence (since a null argument seems similar to pronominal reference). However, our studies show that a rule analogous to the recency rule does not seem to be needed for resolving the co-specifier of an omitted NP. In addition, Carter (1987) feels the recency rule leads to unreliable predictions for co-specifiers of pronouns. Thus, we do not expect to change our algorithm to reflect the recency rule. (We also believe we will abandon the recency rule for resolving pronouns.)


Another task is to specify focus preferences among stacked PFL's and stacked CF's, perhaps using thematic and syntactic information.

An important question raised by our analysis is how to handle a paragraph-initial, but not discourse-initial, sentence. Do we want to treat it as discourse-initial, or as any other non-discourse-initial sentence? We suggest (based on analysis of samples) that we should treat the sentence as any non-discourse-initial sentence, unless its sentence type matches one of a set of sentence types (which often mark focus movement from one element to a new one). In this latter case, we will treat the sentence as discourse-initial by calculating the CF and PFL in the same manner as a discourse-initial sentence, but we will retain the focus stacks. We have identified a number of sentence types that should be included in the set of types which trigger the latter treatment; we will explore whether other sentence types should be included in this set.
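The treatment of paragraph-initial sentences reduces to a three-way decision, sketched below. The function name, its return labels, and the `shift_types` parameter are hypothetical; the paper does not enumerate the triggering sentence types, so the caller has to supply that set.

```python
from typing import AbstractSet


def initial_treatment(discourse_initial: bool,
                      paragraph_initial: bool,
                      sentence_type: str,
                      shift_types: AbstractSet[str]) -> str:
    """Decide how the CF and PFL of a sentence should be calculated."""
    if discourse_initial:
        return "initial"              # first sentence: no focus stacks exist yet
    if paragraph_initial and sentence_type in shift_types:
        # Calculate CF and PFL as for a discourse-initial sentence,
        # but retain the focus stacks built so far.
        return "initial-keep-stacks"
    return "non-initial"              # ordinary processing against the registers
```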

5 CONCLUSIONS

We have discussed proposed extensions to Sidner's algorithm to track local focus in the presence of illegally omitted NP's, and to use the extended focusing algorithm to identify the intended co-specifiers of omitted NP's. This strategy is reasonable since LT may lead a native signer of ASL to use discourse-oriented strategies that allow the omission of an NP that is the topic of a preceding sentence when writing English.

REFERENCES

David Carter (1987). Interpreting Anaphors in Natural Language Texts. John Wiley and Sons, New York.

Barbara J. Grosz, Aravind K. Joshi and Scott Weinstein (1983). Providing a unified account of definite noun phrases in discourse. In Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, 44-50.

C.-T. James Huang (1984). On the distribution and reference of empty pronouns. Linguistic Inquiry, 15(4):531-574.

Diane C. Lillo-Martin (1991). Universal Grammar and American Sign Language. Kluwer Academic Publishers, Boston.

Candace L. Sidner (1979). Towards a Computational Theory of Definite Anaphora Comprehension in English Discourse. Ph.D. thesis, M.I.T., Cambridge, MA.

Candace L. Sidner (1983). Focusing in the comprehension of definite anaphora. In Robert C. Berwick and Michael Brady, eds., Computational Models of Discourse, chapter 5, 267-330. M.I.T. Press, Cambridge, MA.

Linda Z. Suri and Kathleen F. McCoy (1991). Language transfer in deaf writing: A correction methodology for an instructional system. TR-91-20, Dept. of CIS, University of Delaware.

Linda Z. Suri (1991). Language transfer: A foundation for correcting the written English of ASL signers. TR-91-19, Dept. of CIS, University of Delaware.
