CORRECTING ILLEGAL NP OMISSIONS USING LOCAL FOCUS
Linda Z. Suri¹
Department of Computer and Information Sciences
University of Delaware, Newark, DE 19716. Internet: suri@udel.edu
1 INTRODUCTION
The work described here is in the context of developing a system that will correct the written English of native users of American Sign Language (ASL) who are learning English as a second language. In this paper we focus on one error class that we have found to be particularly prevalent: the illegal omission of NP's.
Our previous analysis of the written English of ASL natives has led us to conclude that language transfer (LT) can explain many errors, and should thus be taken advantage of by an instructional system (Suri, 1991; Suri and McCoy, 1991). We believe that many of the omission errors we have found are among the errors explainable by LT.
Lillo-Martin (1991) investigates null argument structures in ASL. She identifies two classes of ASL verbs that allow different types of null argument structures. Plain verbs do not carry morphological markings for subject or object agreement and yet allow null argument structures in some contexts. These structures, she claims, are analogous to the null argument structures found in languages (like Chinese) that allow a null argument if the argument co-specifies the topic of a previous sentence (Huang, 1984). Such languages are said to be discourse-oriented languages.
As it turns out, our writing samples collected from deaf writers contain many instances of omitted NP's where those NP's are the topic of a previous sentence and where the verb involved would be a plain verb in ASL. We believe these errors can be explained as a result of the ASL native carrying over conventions of (discourse-oriented) ASL to (sentence-oriented) English.
If this is the case, then these omissions can be corrected if we track the topic, or, in computational linguistics terms, the local focus, and the actor focus.² We propose to do this by developing a modified version of Sidner's focus tracking algorithm (1979, 1983) that includes mechanisms for handling complex sentence types and illegally omitted NP's.
¹ This research was supported in part by NSF Grant #IRI-9010112. Support was also provided by the Nemours Foundation. We thank Gallaudet University, the National Technical Institute for the Deaf, the Pennsylvania School for the Deaf, the Margaret S. Sterck School, and the Bicultural Center for providing us with writing samples.
² Grosz, Joshi and Weinstein (1983) use the notion of centering to track something similar to local focus and argue against the use of a separate actor focus. However, we think that the example they use does not argue against a separate actor focus, but illustrates the need for extensions to Sidner's algorithm to specify how complex sentences should be processed.
2 FOCUS TRACKING
Our focusing algorithm is based on Sidner's focusing algorithm for tracking local and actor foci (Sidner 1979; Sidner 1983).³ In each sentence, the actor focus (AF) is identified with the (thematic) agent of the sentence. The Potential Actor Focus List (PAFL) contains all NP's that specify an animate element of the database but are not the agent of the sentence.
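The AF/PAFL bookkeeping is simple enough to sketch in a few lines of Python. This is a minimal illustration, not Sidner's implementation: it assumes each NP already carries a thematic-role label and an animacy flag, and the names `NP`, `role`, `animate`, and `actor_foci` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NP:
    """A noun phrase annotated with features this sketch assumes are already available."""
    entity: str    # database element the NP specifies (hypothetical label)
    role: str      # thematic role, e.g. "agent", "theme", "recipient"
    animate: bool  # whether the specified database element is animate


def actor_foci(nps: List[NP]) -> Tuple[Optional[str], List[str]]:
    """Return (AF, PAFL) for one sentence.

    The AF is the entity of the thematic agent (None if there is no overt agent);
    the PAFL holds every animate element that is not the agent, in sentence order.
    """
    af = next((np.entity for np in nps if np.role == "agent"), None)
    pafl = [np.entity for np in nps if np.animate and np.role != "agent"]
    return af, pafl


# "My parents gave me some money": PARENTS is the agent, I is an animate non-agent.
sentence = [NP("PARENTS", "agent", True), NP("I", "recipient", True), NP("MONEY", "theme", False)]
print(actor_foci(sentence))  # ('PARENTS', ['I'])
```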
Tracking local focus is more complex. The first sentence in a text can be said to be about something. That something is called the current focus (CF) of the sentence and can generally be identified via syntactic means, taking into consideration the thematic roles of the elements in the sentence. In addition to the CF, an initial sentence introduces a number of other items (any of which can become the focus of the next sentence). Thus, these items are recorded in a potential focus list (PFL).
At any given point in a well-formed text, after the first sentence, the writer has a number of options:
• Continue talking about the same thing; in this case, the CF doesn't change.
• Talk about something just introduced; in this case, the CF is selected from the previous sentence's PFL.
• Return to a topic of previous discussion; in this case, that topic must have been the CF of a previous sentence.
• Discuss an item previously introduced, but which was not the topic of previous discussion; in this case, that item must have been on the PFL of a previous sentence.
The decision (by the reader/hearer/algorithm) as to which of these alternatives was chosen by the speaker is based on the thematic roles (with particular attention to the agent role) held by the anaphora of the current sentence, and on whether their co-specification is the CF, a previous CF, or a member of the current PFL or a previous PFL. Confirmation of co-specifications requires inferencing based on general knowledge and semantics.
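The four options map onto four lookups in the focus data structures. The sketch below shows only that bookkeeping, assuming co-specification has already been resolved to an entity label; it deliberately ignores thematic roles and the inferencing step, and the names `FocusState` and `classify_move` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FocusState:
    """Local-focus bookkeeping after a sentence: CF, PFL, and the stacked history."""
    cf: Optional[str] = None
    pfl: List[str] = field(default_factory=list)
    stacked_cfs: List[str] = field(default_factory=list)
    stacked_pfls: List[List[str]] = field(default_factory=list)


def classify_move(state: FocusState, entity: str) -> str:
    """Name the option a writer took, given the entity an anaphor co-specifies.

    Checks follow the order the options are listed: same CF, previous PFL,
    an earlier CF, then an earlier PFL. Thematic roles and inferencing are ignored.
    """
    if entity == state.cf:
        return "continue"          # same thing; CF unchanged
    if entity in state.pfl:
        return "just-introduced"   # CF selected from the previous PFL
    if entity in state.stacked_cfs:
        return "return-to-topic"   # was the CF of an earlier sentence
    if any(entity in pfl for pfl in state.stacked_pfls):
        return "earlier-item"      # on an earlier PFL, never the topic
    return "none"                  # not recorded by the focus structures


after_s1 = FocusState(cf="I", pfl=["SUMMER", "HOME", "LIVE VP"])
print(classify_move(after_s1, "HOME"))  # just-introduced
```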
At each sentence in the discourse, the CF and PFL of the previous sentence are stacked for the possibility of subsequent return.⁴ When one of these items is returned to, the stacked CF's and PFL's above it are popped, and are thus no longer available for return.
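A minimal sketch of this stacking discipline, with each frame holding a (CF, PFL) pair. The class and method names are hypothetical, and one detail is an assumption: the text says only that the frames *above* the returned-to item are popped, so this sketch leaves the matching frame itself on the stack.

```python
from typing import List, Optional, Tuple


class FocusStack:
    """Stack of (CF, PFL) pairs from earlier sentences; the last entry is the most recent.

    PFLs are stacked along with CFs, as proposed here (Sidner stacked CFs only).
    """

    def __init__(self) -> None:
        self._frames: List[Tuple[Optional[str], List[str]]] = []

    def push(self, cf: Optional[str], pfl: List[str]) -> None:
        """Stack the previous sentence's CF and PFL before processing a new sentence."""
        self._frames.append((cf, list(pfl)))

    def return_to(self, entity: str) -> bool:
        """Pop every frame above the most recent frame holding entity.

        Assumption: the matching frame itself stays. Returns False if no frame holds entity.
        """
        for i in range(len(self._frames) - 1, -1, -1):
            cf, pfl = self._frames[i]
            if entity == cf or entity in pfl:
                del self._frames[i + 1:]  # frames above it are no longer available
                return True
        return False

    def cfs(self) -> List[Optional[str]]:
        """Stacked CFs, oldest first."""
        return [cf for cf, _ in self._frames]


stack = FocusStack()
stack.push("I", ["SUMMER", "HOME", "LIVE VP"])  # S1
stack.push("I", ["MONEY", "BUDGET VP"])         # S2
stack.push("MONEY", ["HOME", "NOT SPEND VP"])   # shortened S3 PFL, for illustration
print(stack.return_to("SUMMER"), stack.cfs())   # True ['I']
```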
³ Carter (1987) extended Sidner's work to handle intrasentential anaphora, but for space reasons we do not discuss these extensions.
⁴ Sidner did not stack PFL's. Our reasons for stacking PFL's are discussed in section 4.
2.1 FILLING IN A MISSING NP
We propose extending this algorithm to identify an illegally omitted NP. To do this, we treat the omitted NP as an anaphor which, like Sidner's treatment of full definite NP's and personal pronouns, co-specifies an element recorded by the focusing algorithm. This approach is based on the belief that an omitted NP is likely to be the topic of a previous sentence. We define preferences among the focus data structures which are similar to Sidner's preferences.
More specifically, when we encounter an omitted NP that is not the agent, we first try to fill the deleted NP with the CF of the immediately preceding sentence. If syntax, semantics or inferencing based on general knowledge cause this co-specification to be rejected, we then consider members of the PFL of the previous sentence as fillers for the deleted NP. If these too are rejected, we consider stacked CF's and elements of stacked PFL's, taking into account preferences (yet to be determined) among these elements.
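The preference order for an omitted non-agent can be sketched as a candidate generator plus a rejection test. This is an illustration under stated assumptions: `acceptable` is a hypothetical stand-in for the syntactic, semantic and inferencing checks, and because the preferences among stacked elements are still open, the stack walk (most recent frame first, CF before PFL) is only a placeholder.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Frame = Tuple[Optional[str], List[str]]  # a stacked (CF, PFL) pair, oldest first in the list


def non_agent_candidates(last_cf: Optional[str], last_pfl: List[str],
                         stack: List[Frame]) -> Iterator[str]:
    """Yield candidate co-specifiers for an omitted non-agent NP, most preferred first."""
    if last_cf is not None:
        yield last_cf          # 1. CF of the immediately preceding sentence
    yield from last_pfl        # 2. members of that sentence's PFL
    # 3. Stacked CFs and PFL elements. Their relative preferences are undetermined;
    #    placeholder: most recent frame first, its CF before its PFL.
    for cf, pfl in reversed(stack):
        if cf is not None:
            yield cf
        yield from pfl


def fill_omitted_non_agent(last_cf: Optional[str], last_pfl: List[str], stack: List[Frame],
                           acceptable: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that `acceptable` does not reject, or None."""
    candidates = non_agent_candidates(last_cf, last_pfl, stack)
    return next((c for c in candidates if acceptable(c)), None)


stack = [("I", ["SUMMER", "HOME", "LIVE VP"])]
# A made-up rejection of MONEY, just to show the fall-through to the PFL.
print(fill_omitted_non_agent("MONEY", ["COLLEGE", "SPEND VP"], stack,
                             acceptable=lambda e: e != "MONEY"))  # COLLEGE
```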
When we encounter an omitted agent NP in a simple sentence or a sentence-initial clause, we first test the AF of the previous sentence as co-specifier, then members of the PAFL, the previous CF, and finally stacked AF's, CF's and PAFL's. To identify a missing agent NP in a non-sentence-initial clause, our algorithm will first test the AF of the previous clause, and then follow the same preferences just given. Further preferences are yet to be determined, including those between the stacked AF, stacked PAFL, and stacked CF.
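The agent-side ordering can be sketched the same way, returning an ordered candidate list. The function name and frame layout are hypothetical, and since the preferences among stacked AF's, CF's and PAFL's are undetermined, the per-level order used here (AF, then CF, then PAFL, most recent level first) is only a placeholder.

```python
from typing import List, Optional, Tuple

# One stacked level: (AF, CF, PAFL) saved from an earlier sentence; oldest first in the list.
AgentFrame = Tuple[Optional[str], Optional[str], List[str]]


def agent_candidates(prev_af: Optional[str], prev_pafl: List[str], prev_cf: Optional[str],
                     stack: List[AgentFrame],
                     prev_clause_af: Optional[str] = None) -> List[str]:
    """Ordered co-specifier candidates for an omitted agent NP.

    Pass prev_clause_af only for a non-sentence-initial clause: the AF of the
    preceding clause is then tried before everything else.
    """
    ordered: List[Optional[str]] = []
    if prev_clause_af is not None:
        ordered.append(prev_clause_af)
    ordered.append(prev_af)      # AF of the previous sentence
    ordered.extend(prev_pafl)    # then members of the PAFL
    ordered.append(prev_cf)      # then the previous CF
    # Placeholder order for stacked material: each level's AF, CF, then PAFL.
    for af, cf, pafl in reversed(stack):
        ordered.extend([af, cf, *pafl])
    result: List[str] = []
    for entity in ordered:
        if entity is not None and entity not in result:  # drop gaps and repeats
            result.append(entity)
    return result


# S4, second clause ("because __ go out to eat"); S3's PAFL is left empty for brevity.
print(agent_candidates(prev_af="I", prev_pafl=[], prev_cf="MONEY", stack=[],
                       prev_clause_af="I"))  # ['I', 'MONEY']
```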
2.2 COMPUTING THE CF
To compute the CF of a sentence without any illegally omitted NP's, we prefer the CF of the last sentence over members of the PFL, and PFL members over members of the focus stacks. Exceptions to these preferences involve picking a non-agent anaphor co-specifying a PFL member over an agent co-specifying the CF, and preferring a PFL member co-specified by a pronoun to the CF co-specified by a full definite description.
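A minimal sketch of these preferences and their two exceptions, assuming each anaphor's co-specifier has already been resolved and that its form is tagged as "pronoun" or "definite" (full definite description). `Anaphor` and `compute_cf` are hypothetical names, and inferencing and stack-level ordering are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Anaphor:
    entity: str     # what the anaphor co-specifies (assumed already resolved)
    is_agent: bool  # whether it fills the thematic agent role
    form: str       # "pronoun" or "definite" (full definite description)


def compute_cf(anaphors: List[Anaphor], last_cf: Optional[str], last_pfl: List[str],
               stacked: Set[str]) -> Optional[str]:
    """CF of a sentence with no omitted NP: last CF > last PFL > stacks, with two exceptions."""
    on_cf = [a for a in anaphors if a.entity == last_cf]
    # PFL co-specifiers, ranked by the PFL's own order
    on_pfl = sorted((a for a in anaphors if a.entity in last_pfl),
                    key=lambda a: last_pfl.index(a.entity))
    if on_cf and on_pfl:
        # Exception 1: a non-agent on the PFL beats an agent co-specifying the CF.
        non_agents = [a for a in on_pfl if not a.is_agent]
        if all(a.is_agent for a in on_cf) and non_agents:
            return non_agents[0].entity
        # Exception 2: a pronoun on the PFL beats a full definite description of the CF.
        pronouns = [a for a in on_pfl if a.form == "pronoun"]
        if all(a.form == "definite" for a in on_cf) and pronouns:
            return pronouns[0].entity
    if on_cf:
        return on_cf[0].entity
    if on_pfl:
        return on_pfl[0].entity
    return next((a.entity for a in anaphors if a.entity in stacked), None)


# S3 of the example: the agent "I" co-specifies the last CF, the non-agent "money" a PFL member.
s3 = [Anaphor("I", True, "pronoun"), Anaphor("MONEY", False, "definite"),
      Anaphor("HOME", False, "definite")]
print(compute_cf(s3, "I", ["MONEY", "BUDGET VP"], {"I", "SUMMER", "HOME", "LIVE VP"}))  # MONEY
```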
To compute the CF of a sentence with an illegally omitted NP, our algorithm treats illegally omitted NP's as anaphora, since they (implicitly) co-specify something in the preceding discourse. However, it is important to remember that discourse-oriented languages allow deletions of NP's that are the topic of the discourse. Accordingly, we prefer a deleted non-agent as the focus, as long as it ties closely to the previous sentence: we take the co-specifier of the omitted non-agent NP as the (new) CF if it co-specifies either the last CF or a member of the last PFL. If the omitted NP is the thematic agent, we prefer for the new CF to be a pronominal (or, as a second choice, full definite description) non-agent anaphor co-specifying either the last CF or a member of the last PFL (allowing the deleted agent NP to be the AF and keeping the AF and CF different).⁵ If no anaphor meets these criteria, then the members of the CF and PFL focus stacks will be considered, testing a co-specifier of the omitted NP before co-specifiers of pronouns and definite descriptions at each stack level.
⁵ As future work, we will explore how to resolve more than one non-agent anaphor in a sentence co-specifying PFL elements.
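The omitted-NP case can be sketched as follows, assuming exactly one omitted NP per sentence (the multi-anaphor case is left as future work above) and that the omitted NP's co-specifier has already been filled in. `Anaphor`, `FORM_RANK`, and `compute_cf_with_omission` are hypothetical names, and clause structure is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FORM_RANK = {"omitted": 0, "pronoun": 1, "definite": 2}  # test order within a stack level


@dataclass(frozen=True)
class Anaphor:
    entity: str     # co-specified element (assumed already resolved)
    is_agent: bool
    form: str       # "omitted", "pronoun", or "definite"


def compute_cf_with_omission(anaphors: List[Anaphor], last_cf: Optional[str],
                             last_pfl: List[str],
                             stack: List[Tuple[Optional[str], List[str]]]) -> Optional[str]:
    """CF of a sentence containing one illegally omitted NP."""
    omitted = next((a for a in anaphors if a.form == "omitted"), None)
    if omitted is None:
        raise ValueError("expected an anaphor with form 'omitted'")
    recent = set(last_pfl) | ({last_cf} if last_cf is not None else set())
    if not omitted.is_agent:
        # A deleted non-agent becomes the CF if it ties to the last CF or PFL.
        if omitted.entity in recent:
            return omitted.entity
    else:
        # Deleted agent: prefer a pronominal, then a full definite, non-agent anaphor
        # tied to the last CF or PFL, keeping the AF and CF different.
        for form in ("pronoun", "definite"):
            for a in anaphors:
                if a.form == form and not a.is_agent and a.entity in recent:
                    return a.entity
    # Otherwise search the stacks, most recent level first; at each level the
    # omitted NP's co-specifier is tested before pronouns and definite descriptions.
    ranked = sorted(anaphors, key=lambda a: FORM_RANK[a.form])
    for cf, pfl in reversed(stack):
        level = set(pfl) | ({cf} if cf is not None else set())
        for a in ranked:
            if a.entity in level:
                return a.entity
    return None


# S5: the omitted non-agent of "need" co-specifies MONEY, the CF of S4.
s5 = [Anaphor("PARENTS", True, "definite"), Anaphor("I", False, "pronoun"),
      Anaphor("MONEY", False, "omitted")]
print(compute_cf_with_omission(s5, "MONEY", ["COLLEGE", "SPEND VP"], []))  # MONEY
```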
3 EXAMPLE
Below, we describe the behavior of the extended algorithm on an example from our collected texts containing both a deleted non-agent and a deleted agent.
Example: "(S1) First, in summer I live at home with my parents. (S2) I can budget money easily. (S3) I did not spend lot of money at home because at home we have lot of good foods, I ate lot of foods. (S4) While living at college I spend lot of money because __ go out to eat almost everyday. (S5) At home, sometimes my parents gave me some money right away when I need __."
After S1, the AF is I, the CF is I, and the PFL contains SUMMER, HOME, and the LIVE VP. For S2, I is the only anaphor, so it becomes the CF, the PFL contains MONEY and the BUDGET VP, and the focus stack contains I and the previous PFL.
S3 is a complex sentence using the conjunction "because." Such sentences are not explicitly handled by Sidner's algorithm. Our analysis so far suggests that we should not split this sentence into two,⁶ and should prefer elements of the main clause as focus candidates. Thus, we take the CF from the first clause, and rank other elements in that clause before elements in the second clause on the PFL.⁷ In this case, we have several anaphora: I, money, at home. The AF remains I. The CF becomes MONEY since it co-specifies a member of the PFL and since the co-specifier of the last CF is the agent. Ordering the elements of the first clause before the elements in the second results in the PFL containing HOME, the NOT SPEND VP, GOOD FOOD, and the HAVE VP. We stack the CF and the PFL of S2.
Note that S4 has a missing agent in the second clause. To identify the missing agent in a non-sentence-initial clause, our algorithm will first test the AF of the preceding clause for possible co-specification. Because this co-specification would cause no contradiction, the omitted NP is filled with "I", which is eventually taken as the AF of S4. The CF is computed by first considering the first clause of S4, since the X clause is the preferred clause of an "X BECAUSE Y" construct. Since "money" co-specifies the CF of S3, and nothing else in the preferred clause co-specifies a member of the PFL, MONEY remains the CF. The PFL contains COLLEGE, the SPEND VP, EVERY DAY, the TO EAT VP, and the GO OUT TO EAT VP. We stack the CF and PFL of S3.
S5 contains a subordinate clause with a missing non-agent. Our algorithm first considers the CF, MONEY, as the co-specifier of the omitted NP; syntax, semantics and general knowledge inferencing do not prevent this co-specification, so it is adopted. MONEY is also chosen as the CF since it is the co-specifier of the omitted NP occurring in the verb complement clause, which is the preferred clause in this type of construct.
⁶ If we were to split the sentence up, then the focus would shift away from MONEY when we process the second clause (which contradicts our intuition of what the focus is in this paragraph).
⁷ The appropriateness of placing elements from both clauses in one PFL and ranking them according to clause membership will be further investigated. This construct ("X BECAUSE Y") is further discussed in section 4.
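The walk-through can be partly replayed in code. The sketch below hard-codes the (CF, PFL) values reported for S1 to S4, derives the focus stack from the rule that each sentence stacks its predecessor's CF and PFL, and fills the gap in S5 using the non-agent preference order. The names `STATES`, `stack_after`, and `fill_missing_non_agent` are hypothetical, and `acceptable` is a permissive stand-in for the syntax, semantics and inferencing checks, which the text reports do not block MONEY.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (CF, PFL) after each sentence, as reported in the walk-through.
STATES: Dict[str, Tuple[str, List[str]]] = {
    "S1": ("I", ["SUMMER", "HOME", "LIVE VP"]),
    "S2": ("I", ["MONEY", "BUDGET VP"]),
    "S3": ("MONEY", ["HOME", "NOT SPEND VP", "GOOD FOOD", "HAVE VP"]),
    "S4": ("MONEY", ["COLLEGE", "SPEND VP", "EVERY DAY", "TO EAT VP", "GO OUT TO EAT VP"]),
}
ORDER = ["S1", "S2", "S3", "S4"]


def stack_after(label: str) -> List[str]:
    """Sentences whose (CF, PFL) are stacked once `label` is processed, oldest first."""
    return ORDER[:ORDER.index(label)]


def fill_missing_non_agent(prev: str, acceptable: Callable[[str], bool]) -> Optional[str]:
    """Try prev's CF, then prev's PFL, then stacked levels from most recent to oldest."""
    cf, pfl = STATES[prev]
    candidates = [cf, *pfl]
    for level in reversed(stack_after(prev)):
        level_cf, level_pfl = STATES[level]
        candidates += [level_cf, *level_pfl]
    return next((c for c in candidates if acceptable(c)), None)


print(stack_after("S4"))                                  # ['S1', 'S2', 'S3']
print(fill_missing_non_agent("S4", lambda entity: True))  # MONEY
```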
4 DISCUSSION OF EXTENSIONS
One of the major extensions needed in Sidner's algorithm is a mechanism for handling complex sentences. Based on a limited analysis of sample texts, we propose computing the CF and PFL of a complex sentence based on a classification of sentence types. For instance, for a sentence of the form "X BECAUSE Y" or "BECAUSE Y, X", we prefer the expected focus of the effect clause (X) as CF, and order elements of the X clause on the PFL before elements of the Y clause. Analogous PFL orderings apply to the other sentence types described here. For a sentence of the form "X CONJ Y", where X and Y are sentences and CONJ is "and", "or", or "but", we prefer the expected focus of the Y clause. For a sentence of the form "IF X (THEN) Y", we prefer the expected focus of the THEN clause, while for "X, IF Y", we prefer the expected focus of the X clause. Further study is needed to determine other preferences and actions (including how to further order elements on the PFL) for these and other sentence types. These preferences will likely depend on thematic roles and syntactic criteria (e.g., whether an element occurs in the clause containing the expected CF).
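The sentence-type classification amounts to a lookup of the preferred clause, followed by a PFL that lists the preferred clause's elements first. This is a sketch under assumptions: the expected focus and the element list of each clause are taken as inputs computed elsewhere, the clause labels "X"/"Y" follow the patterns above (for "IF X (THEN) Y" the THEN clause is Y), and the names `PREFERRED_CLAUSE` and `complex_cf_and_pfl` are hypothetical. The example inputs are illustrative, chosen to reproduce the S3 result.

```python
from typing import Dict, List, Tuple

# Preferred clause for each construct described in the text.
PREFERRED_CLAUSE: Dict[str, str] = {
    "X BECAUSE Y": "X",
    "BECAUSE Y, X": "X",
    "X CONJ Y": "Y",        # CONJ is "and", "or", or "but"
    "IF X (THEN) Y": "Y",   # the THEN clause
    "X, IF Y": "X",
}


def complex_cf_and_pfl(sentence_type: str, expected_focus: Dict[str, str],
                       elements: Dict[str, List[str]]) -> Tuple[str, List[str]]:
    """CF from the preferred clause; PFL lists that clause's elements before the other's."""
    preferred = PREFERRED_CLAUSE[sentence_type]
    other = "Y" if preferred == "X" else "X"
    cf = expected_focus[preferred]
    # dict.fromkeys keeps the first occurrence of each element, preserving order
    ordered = dict.fromkeys(elements[preferred] + elements[other])
    return cf, [e for e in ordered if e != cf]


cf, pfl = complex_cf_and_pfl(
    "X BECAUSE Y",
    expected_focus={"X": "MONEY", "Y": "GOOD FOOD"},
    elements={"X": ["MONEY", "HOME", "NOT SPEND VP"], "Y": ["HOME", "GOOD FOOD", "HAVE VP"]},
)
print(cf, pfl)  # MONEY ['HOME', 'NOT SPEND VP', 'GOOD FOOD', 'HAVE VP']
```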
The decisions about how these and other extensions should proceed have been or will be based on analysis of both standard written English and the written English of deaf students. The algorithm will be developed to match the intuitions of native English speakers as to how focus shifts.
A second difference between our algorithm and Sidner's is that we stack the PFL's as well as the CF's. We think that stacking the PFL's may be needed for processing standard English (and not just for our purposes), since focus sometimes revolves around the theme of one of the clauses of a complex sentence, and later returns to revolve around items of another clause. Further investigation may indicate that we need to add new data structures or enhance existing ones to handle focus shifts related to these and other complex discourse patterns.
We should note that while we prefer the CF as the co-specifier of an omitted NP, Sidner's recency rule suggests that perhaps we should prefer a member of the PFL if it is the last constituent of the previous sentence (since a null argument seems similar to pronominal reference). However, our studies show that a rule analogous to the recency rule does not seem to be needed for resolving the co-specifier of an omitted NP. In addition, Carter (1987) feels the recency rule leads to unreliable predictions for co-specifiers of pronouns. Thus, we do not expect to change our algorithm to reflect the recency rule. (We also believe we will abandon the recency rule for resolving pronouns.)
Another task is to specify focus preferences among stacked PFL's and stacked CF's, perhaps using thematic and syntactic information.
An important question raised by our analysis is how to handle a paragraph-initial, but not discourse-initial, sentence. Do we want to treat it as discourse-initial, or as any other non-discourse-initial sentence? We suggest (based on analysis of samples) that we should treat the sentence as any other non-discourse-initial sentence, unless its sentence type matches one of a set of sentence types (which often mark focus movement from one element to a new one). In this latter case, we will treat the sentence as discourse-initial by calculating the CF and PFL in the same manner as for a discourse-initial sentence, but we will retain the focus stacks. We have identified a number of sentence types that should be included in the set of types which trigger the latter treatment; we will explore whether other sentence types should be included in this set.
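A sketch of this decision, assuming the CF/PFL computations are supplied as callbacks. The trigger set is not listed here, so the type name in the example is hypothetical, as are `FocusState` and `process_paragraph_initial`. One further assumption: the previous sentence's CF and PFL are stacked as usual, so the earlier paragraph's focus stays available for return.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Tuple

Frame = Tuple[Optional[str], List[str]]  # stacked (CF, PFL)


@dataclass
class FocusState:
    cf: Optional[str]
    pfl: List[str]
    stack: List[Frame] = field(default_factory=list)


def process_paragraph_initial(sentence_type: str, state: FocusState,
                              trigger_types: FrozenSet[str],
                              as_discourse_initial: Callable[[], Tuple[Optional[str], List[str]]],
                              as_non_initial: Callable[[FocusState], FocusState]) -> FocusState:
    """Handle a paragraph-initial (but not discourse-initial) sentence.

    Trigger types get a fresh CF/PFL computed as for a discourse-initial sentence,
    but the focus stacks are retained; every other type is processed normally.
    """
    if sentence_type not in trigger_types:
        return as_non_initial(state)
    cf, pfl = as_discourse_initial()
    # Assumption: the previous sentence's CF and PFL are stacked as usual.
    return FocusState(cf, pfl, state.stack + [(state.cf, list(state.pfl))])


TRIGGERS = frozenset({"hypothetical-shift-type"})  # the actual set is not given in the text
before = FocusState("MONEY", ["COLLEGE"], [("I", ["SUMMER"])])
after = process_paragraph_initial("hypothetical-shift-type", before, TRIGGERS,
                                  as_discourse_initial=lambda: ("FOOD", ["HOME"]),
                                  as_non_initial=lambda s: s)
print(after.cf, after.stack)  # FOOD [('I', ['SUMMER']), ('MONEY', ['COLLEGE'])]
```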
5 CONCLUSIONS
We have discussed proposed extensions to Sidner's algorithm to track local focus in the presence of illegally omitted NP's, and to use the extended focusing algorithm to identify the intended co-specifiers of omitted NP's. This strategy is reasonable since LT may lead a native signer of ASL to use discourse-oriented strategies that allow the omission of an NP that is the topic of a preceding sentence when writing English.
REFERENCES
David Carter (1987). Interpreting Anaphors in Natural Language Texts. John Wiley and Sons, New York.
Barbara J. Grosz, Aravind K. Joshi and Scott Weinstein (1983). Providing a unified account of definite noun phrases in discourse. In Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, 44-50.
C.-T. James Huang (1984). On the distribution and reference of empty pronouns. Linguistic Inquiry, 15(4):531-574.
Diane C. Lillo-Martin (1991). Universal Grammar and American Sign Language. Kluwer Academic Publishers, Boston.
Candace L. Sidner (1979). Towards a Computational Theory of Definite Anaphora Comprehension in English Discourse. Ph.D. thesis, M.I.T., Cambridge, MA.
Candace L. Sidner (1983). Focusing in the comprehension of definite anaphora. In Robert C. Berwick and Michael Brady, eds., Computational Models of Discourse, chapter 5, 267-330. M.I.T. Press, Cambridge, MA.
Linda Z. Suri (1991). Language transfer: A foundation for correcting the written English of ASL signers. TR-91-19, Dept. of CIS, University of Delaware.
Linda Z. Suri and Kathleen F. McCoy (1991). Language transfer in deaf writing: A correction methodology for an instructional system. TR-91-20, Dept. of CIS, University of Delaware.