Oliviero Stock
I.P. - Consiglio Nazionale delle Ricerche
Via dei Monti Tiburtini 509
00157 Roma, Italy
ABSTRACT
An account is given of flexible idiom processing within a lexicon based parser. The view is a compositional one. The parser's behaviour is basically the "literal" one, unless a certain threshold is crossed by the weight of a particular idiom. A new process will then be added. The parser, besides yielding all idiomatic and literal interpretations, embodies some claims of human processing simulation.
1. Motivation and comparison with other approaches
Idioms are a pervasive phenomenon in natural languages. For instance, the first page of this paper (even if written by a non-native speaker) includes no fewer than half a dozen of them. Linguists have proposed different accounts for idioms, which are derived from two basic points of view: one point of view considers idioms as the basic units of language, with holistic characteristics, perhaps including words as a particular case; the other point of view emphasizes instead the fact that idioms are made up of normal parts of speech, that play a precise role in the complete idiom. An explicit statement within this approach is the Principle of Decompositionality (Wasow, Sag and Nunberg 1982): "When an expression admits analysis as morphologically or syntactically complex, assume as an operating hypothesis that the sense of the expression arises from the composition of the senses of its constituent parts". The syntactic consequence is that idioms are not a different thing from "normal" forms.
Our view is of the latter kind. We are aware of the fact that the flexibility of an idiom depends on how recognizable its metaphorical origin is. Within flexible word order languages the flexibility of idioms seems to be even more closely linked to the strengths of particular syntactic constructions.
Let us now briefly discuss some computational approaches to idiom understanding. Applied computational systems must necessarily have a capacity for analyzing idioms. In some systems there is a preprocessor delegated to the recognition of idiomatic forms. This preprocessor replaces the group of words that make up one idiom with the word or words that convey the meaning involved. In ATN systems instead, especially if oriented towards a particular domain, sometimes there are sequences of particular arcs inserted in the network, which, if transited, lead to the recognition of a particular idiom (e.g. PLANES, Waltz 1978). LIFER (Hendrix 1977), one of the most successful applied systems, was based on a semantic grammar, and within this mechanism idiom recognition was easy to implement, without considering flexibility. Of course, in all these systems there is no intention to give an account of human processing. PHRAN (Wilensky and Arens 1980) is a system based entirely on pattern recognition. Idiom recognition, following Fillmore's view (Fillmore 1979), is considered the basic resource all the way down, to replace the concept of grammar based parsing. PHRAN is based on a data base of patterns (including single words, at the same level), and proceeds deterministically, applying the two principles "when in doubt choose the more specific pattern" and "choose the longest pattern". The limits of this approach lie in the interpretations in case of ambiguity and in running the risk of having an excessive spread of nonterminal symbols if the data base of idioms is large. A recent work on idioms with a similar perspective is Dyer and Zernik (1986).
The approach we have followed is different. The goals we had with our work must be stated explicitly: 1) to yield a cognitive model of idiom processing; 2) to integrate idioms in our lexical data, just as further information concerning words (as in a traditional dictionary); 3) to insert all this in the framework of WEDNESDAY 2 (Stock 1986), a nondeterministic lexicon based parser.
To anticipate the cognitive solution we are discussing here: idiom understanding is based on normal syntactic analysis with word driven recognition in the background. When a certain threshold is crossed by the weight of a particular idiom, the latter starts a process of its own, that may eventually lead to a complete interpretation.
Some of the questions we have dealt with are: a) how are idioms to be specified? b) when are they recognized? c) what happens when they are recognized? d) what happens afterwards?
2. A summary of WEDNESDAY 2
WEDNESDAY 2 (Stock 1986) is a parser based on linguistic knowledge distributed fundamentally through the lexicon. The general viewpoint of the linguistic representation is not far from LFG (Kaplan & Bresnan 1982), although independently conceived. A word interpretation includes:
- a semantic representation of the word, in the form of a semantic net shred;
- static syntactic information, including the category, features, indication of linguistic functions that are bound to particular nodes in the net. One particular specification is the Main node, the head of the syntactic constituent the word occurs in;
- dynamic syntactic information, including impulses to connect pieces of semantic information, guided by syntactic constraints. Impulses look for "fillers" on a given search space. They have alternatives (for instance the word tell has an impulse to merge its object node with the Main node of either an NP or a subordinate clause). An alternative includes: a contextual condition of applicability, a category, features, marking, side effects (through which, for example, coreference between subject of a subordinate clause and a function of the main clause can be indicated). Impulses may also be directed to a different search space than the normal one, with a mechanism that can deal with long distance dependencies;
- measures of likelihood. These are measures that are used in order to derive an overall measure of likelihood of a partial analysis. Measures are included for the likelihood of that particular reading of the word and for aspects attached to an impulse: a) for one particular alternative, b) for the relative position of the filler, c) for the overall necessity of finding a filler;
- a characterization of idioms involving that word (see next paragraph).
The only other data that the parser uses are in the form of simple (non augmented) transition networks that only provide restrictions on search spaces where impulses can look for fillers. In more traditional words, these networks deal with the distribution of constituents. A distinguished symbol, SEXP, indicates that only the occurrence of something expected by preceding words (i.e. for which an impulse was set up) will allow the transition. It is stressed that inside a constituent the position of elements can be free. In WEDNESDAY 2 one can specify, in a natural and nonredundant way, all the graduality from obligatory positions, to obligatory precedences, to simple likelihoods of relative positions.
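The components of a word interpretation listed above can be pictured as a small record structure. The following is only an illustrative sketch in Python (the system itself is written in Interlisp-D): every class name, field name and numeric value is a hypothetical choice made for this example, not part of the WEDNESDAY 2 lexicon format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Alternative:
    """One way of satisfying an impulse (hypothetical field names)."""
    category: str                  # category of the expected filler
    likelihood: float              # likelihood of this particular alternative
    marking: Optional[str] = None  # e.g. a case marking, if any


@dataclass
class Impulse:
    """An impulse looking for a filler of one linguistic function."""
    function: str                      # e.g. "obj"
    alternatives: List[Alternative]
    necessity: Optional[float] = None  # overall necessity of finding a filler


@dataclass
class WordEntry:
    """A word interpretation: semantics, static and dynamic syntax, idioms."""
    sem_units: list               # semantic net shred
    main: str                     # the Main node
    category: str
    reading_likelihood: float     # likelihood of this reading of the word
    functions: Dict[str, str]     # linguistic function -> node
    impulses: List[Impulse]
    idioms: list = field(default_factory=list)


# The verb "tell": its object impulse accepts either an NP or a
# subordinate clause. All numbers below are invented for illustration.
tell = WordEntry(
    sem_units=[("n1", ("p-tell", "n2", "n3"))],
    main="n1",
    category="v",
    reading_likelihood=0.9,
    functions={"subj": "n2", "obj": "n3"},
    impulses=[
        Impulse("obj", [Alternative("np", 0.6),
                        Alternative("subordinate-clause", 0.4)]),
    ],
)
```

Keeping idioms as one more field of the word entry mirrors the claim that idioms are further information about a word, as in a traditional dictionary.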
The parser is based on an extension of the idea of chart parsing [Kay 1980, Kaplan 1973] [see Stock 1986]. What is relevant here is the fact that "edges" correspond to search spaces. They are complex data structures provided with a rich amount of information, including a semantic interpretation of the fragment, syntactic data, pending impulses, an overall measure of likelihood, etc. Data on an edge are "unified" dynamically.
Parsing goes basically bottom-up with top-down confirmation, improving the so called Left Corner technique. When a lexical edge with category C is added to the chart, its First Left Cross References F(C) are fetched. First Left Cross References are defined recursively: for every lexical category C, the set of initial states that allow for transitions on C, or the set of initial states (without repetitions) that allow for transitions on symbols in F(C). So, for instance, F(Det) = {NP, S}, at least. For each element in F(C) an edge of a special kind is added to the chart. These special edges are called sleeping edges. A sleeping edge S at a vertex Vs is awakened, i.e. causes the introduction of a normal active edge, iff there is an active edge arriving at Vs that may be extended with an edge with the category of S. If they are not awakened, sleeping edges play no role at all in the process.
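The recursive definition of the First Left Cross References amounts to a closure computation over the initial arcs of the transition networks. The sketch below is one way to write it in Python, assuming that each network is identified by the constituent it recognizes and is summarized by the set of symbols its initial state can transit on. The toy grammar and the names `INITIAL_ARCS` and `first_left_cross_references` are hypothetical.

```python
def first_left_cross_references(category, initial_arcs):
    """Return F(category): every network whose initial state allows a
    transition on `category`, or on a network already in the result."""
    result = set()
    frontier = [category]
    while frontier:
        symbol = frontier.pop()
        for network, symbols in initial_arcs.items():
            if symbol in symbols and network not in result:
                result.add(network)       # no repetitions
                frontier.append(network)  # this network may start others
    return result


# Toy grammar (an assumption for this example): an NP may start with a
# Det or an N, an S starts with an NP, a PP starts with a Prep.
INITIAL_ARCS = {
    "NP": {"Det", "N"},
    "S": {"NP"},
    "PP": {"Prep"},
}

print(first_left_cross_references("Det", INITIAL_ARCS))
```

With this toy grammar the call yields the set containing NP and S, matching the F(Det) example in the text. A sleeping edge would then be created for each member of the returned set.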
An agenda is provided which includes tasks of several different types, including lexical tasks, extension tasks, insertion tasks and virtual tasks. A lexical task specifies a possible reading of a word to be introduced in the chart as an inactive edge. An extension task specifies an active edge and an inactive edge that can extend it (together with some more information). An insertion task specifies a nondeterministic unification operation. A virtual task consists in extending an active edge with an edge displaced to another point of the sentence, according to the mechanism that treats long distance dependencies. At each stage the next task chosen for execution is the value of a scheduling-selecting function.
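An agenda of this kind can be sketched as a priority queue whose ordering is delegated to a pluggable scoring function. This is a minimal Python illustration only: the text does not say how the scheduling-selecting function ranks tasks, so ranking by a `likelihood` value is an assumption, and the `Agenda` class and the task dictionaries are hypothetical.

```python
import heapq
import itertools


class Agenda:
    """Holds pending tasks; `score` plays the scheduling-selecting role."""

    def __init__(self, score):
        self._score = score
        self._heap = []
        self._counter = itertools.count()  # tie-breaker, keeps insertion order

    def add(self, task):
        # heapq is a min-heap, so the score is negated to pop the best first.
        heapq.heappush(self._heap,
                       (-self._score(task), next(self._counter), task))

    def next_task(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Assumed scoring: prefer the task with the highest likelihood.
agenda = Agenda(score=lambda task: task["likelihood"])
agenda.add({"type": "lexical", "likelihood": 0.8})
agenda.add({"type": "extension", "likelihood": 0.9})
agenda.add({"type": "virtual", "likelihood": 0.3})

order = [agenda.next_task()["type"] for _ in range(len(agenda))]
```

Because the scoring function is passed in from outside, different heuristics can be tried without touching the agenda itself.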
The parser works asymmetrically with respect to the "arrival" of the Main node: before the Main node arrives, an extension of an edge causes almost nothing. On the arrival of the Main, all the candidate fillers must find a compatible impulse and all impulses concerning the main node must find satisfaction. If all this does not happen then the new edge supposedly to be added to the chart is not added: the situation is recognized as a failure. After the arrival of the Main, each new head must find an impulse to merge with, and each incoming impulse must find satisfaction. Again, if all this does not happen, the new edge will not be added to the chart.
Dynamically, apart from the general behaviour of the parser, there are some particular restrictions for its nondeterministic behaviour, that put into effect syntax-based dynamic disambiguation:
1) The SEXP arc allows for a transition only if the configuration in the active edge includes an impulse to link with the Main of the proposed inactive edge.
2) The sleeping edge mechanism prevents edges not compatible with the left context from being established.
3) A search space can be closed only if no impulse that was specified as having to be satisfied remains. In other words, in a state with an outgoing EXIT arc, an active edge can cause the establishing of an inactive edge only if there are no obligatory impulses left.
4) A proposed new edge A' with a verb tense not matching the expected values causes a failure, i.e. A' will not be introduced in the chart.
5) Failure is caused by inadequate mergings, with relation to the presence, absence or ongoing introduction of the Main node.
Comparing to the criteria established for LFG for functional compatibility of an f-structure [Kaplan & Bresnan 1982], the following can be said of the dynamics outlined here. Incompleteness recognition performs as specified in 3), and furthermore there is an earlier check when the Main arrives, in case there were obligatory impulses to be satisfied at that point (e.g. an argument that must occur before the Main). Incoherence is completely avoided after the Main has arrived, by the SEXP arc mechanism; before this point, it is recognized as specified in 5) above, and causes an immediate failure. Inconsistency is detected as indicated in 4) and 5). As far as 5) is concerned, though, the attitude is to "activate" impulses when the right premises are present and to "look for the right thing", and not to "check if what was done is consistent".
Note that a morphological analyzer, WED-MORPH, linked to WEDNESDAY 2, plays a substantial role, especially if the language is Italian. In Italian you may find words like rifacendogliene, that stands for while making some (of them) for him again. The morphological analyzer not only recognizes complex forms, but must be able to put together complex constraints originated in part by the stem and in part by the affixes. The same holds for the semantic representation and will have consequences in our dealing with idioms. Fig. 1 shows a diagram of WEDNESDAY 2.
Fig. 1: Diagram of WEDNESDAY 2
Idioms are introduced in the lexicon as further specifications of words, just as in a normal dictionary. They may be of two types: a) canned phrases, that just behave as several-word entries in the lexicon (there is nothing particularly interesting in that, so we shall not go into detail here); b) flexible idioms; these idioms are described in the lexicon bound to the particular word representing the "thread" of that idiom; in WEDNESDAY 2 terms, this is the word that bears the Main of the immediate constituent including the idiom. Thus, if we have an idiom like to build castles in the air, it will be described along with the verb, to build.
After the normal word specifications, the word may include a list of idiomatic entries. Fig. 2 shows a BNF specification of idioms in the lexicon. (The symbol + stands for "at least one occurrence of what precedes".) Each idiom is described in two sections: the first one describes the elements that characterize that idiom, expressed coherently with the normal characterization of the word; the second one describes the interpretation, i.e. which substitutions should be performed when the idiom is recognized.
Let us briefly describe Fig. 2. The lexicalform indicates whether passivization (that in our theory, like in LFG, is treated in the lexicon) is admitted in the idiomatic reading. The idiom-stats, describing configurations of the components of an idiom, are based on the basic impulses included in the word. In other words, constituents of an idiom are described as particular fillers of linguistic functions or particular modifiers. For example build castles in the air, when build is in an active form, has castles as a further description of the filler of the OBJ function and the string in the air as a further specification of a particular modifier that may be attached to the Main node. MORESPECIFIC, the further specification of an impulse to set a filler for a function, includes: a reference to one of the possible alternative types of fillers specified in the normal impulse, a specification that describes the fragment that is to play this particular role in the idiom, and the weight that this component has in the overall recognition of the idiom. IDMODIFIER is a specification of a modifier, including the description of the fragment and the weight of this component. CHANGEIMPULSE and REMOVEIMPULSE allow an alteration of the normal syntactic behaviour. The former specifies a new alternative for a filler for an existing function, including the description of the component and its weight (for instance the new alternative may be a partial NP instead of a complete NP (as in take care), or an NP marked differently from usual). The latter specifies that a certain impulse, specified for the word, is to be considered to have been removed for this idiom description.
There are a number of possible fragment specifications, including string patterns, semantic patterns, morphological variations, coreferences etc.
Substitutions include the semantics of the idiom, which are supposed to take the place of the literal semantics, plus the specification of the new Main and of the bindings for the functions. New bindings may be included to specify new semantic linkings not present in the literal meaning (e.g. take care of <someone>: if the meaning is to attend to <someone>, then <someone> must become an argument of attend).
<idioms> ::= (IDIOMS <idiomentry>+)
<idiomentry> ::= (<lexicalform> <idiom-stat>+ SUBSTITUTIONS <idiomsubst>+)
<lexicalform> ::= T / (NOT-PASSIVE)
<idiom-stat> ::= (MORESPECIFIC <lingfunc> <alternnum> <fragmentspec> <weight>) /
                 (CHANGEIMPULSE <lingfunc> <alternative>+ <fragmentspec> <weight>) /
                 (IDMODIFIER <fragmentspec> <weight>) /
                 (REMOVEIMPULSE <lingfunc>)
<alternative> ::= (<test> <fillertype> <beforelh> <features> <mark> <sideffect> <fragmentspec>)
<fragmentspec> ::= (WORD <word>) / (FIXWORDS <wordseq>) / (FIRSTWORDS <wordseq>) /
                   (MORPHWORD <wordroot>) / (SEM (<concept>+) <prep>) / (EQSUBJ)
<idiomsubst> ::= (SEM-UNITS <sem-unit>+) / (MAIN <node>) /
                 (BINDINGS (<lingfunc> <node>)+) /
                 (NEWBINDINGS (<node> <lingfunc path>)+)
Fig. 2
4. Idiom processing
Idiom processing works in WEDNESDAY 2 integrated in the nondeterministic, multiprocessing-based behaviour of the parser. As the normal (literal) analysis proceeds and partial representations are built, impulses are monitored in the background, checking for possible idiomatic fragments. Monitoring is carried on only for fragments of idioms not in contrast with the present configuration. A dynamic activation table is introduced with the occurrence of a word that has some idiom specification associated. Occurrence of an expected fragment of an idiom in the table raises the level of activation of that idiom, in proportion to the relative weight of the fragment. If the configuration of the sentence contrasts with one fragment then the corresponding idiom is discarded from the table. So all the normal processing goes on, including the possible nondeterministic choices, the establishing of new processes etc. The activation tables are included in the edges of the chart.
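The bookkeeping performed by an activation table can be sketched as follows. This Python fragment is an illustration under simplifying assumptions: fragments are matched as plain strings, whereas the parser monitors impulses and fragment specifications, and the class and method names are hypothetical. The two weights are those printed in Fig. 3 for the idiom bound to prendere.

```python
class ActivationTable:
    """Tracks the activation level of each idiom still compatible with
    the current configuration."""

    def __init__(self, idioms):
        # idioms: {idiom name: {expected fragment: weight}}
        self._pending = {name: dict(frags) for name, frags in idioms.items()}
        self.activation = {name: 0 for name in idioms}

    def observe(self, fragment):
        """An expected fragment occurred: raise activation by its weight."""
        for name, frags in self._pending.items():
            if fragment in frags:
                # pop() so that a fragment is counted only once
                self.activation[name] += frags.pop(fragment)

    def contrast(self, fragment):
        """The configuration contrasts with a fragment: discard the idiom."""
        for name in [n for n, f in self._pending.items() if fragment in f]:
            del self._pending[name]
            del self.activation[name]


IDIOM = "prendere il toro per le corna"
table = ActivationTable({IDIOM: {"il toro": 8, "per le corna": 10}})

table.observe("per le corna")      # the modifier shows up
level_after_modifier = table.activation[IDIOM]

table.contrast("il toro")          # the object turns out not to be "il toro"
still_monitored = IDIOM in table.activation
```

Here `level_after_modifier` is 10 and `still_monitored` is false. In the parser one such table would be carried by each edge, so that alternative analyses keep separate activation levels.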
When the activation level of a particular idiom crosses a fixed threshold, a new process is introduced, dedicated to that particular idiom. In that process, only that idiomatic interpretation is considered. Thus, in the first place, an edge is introduced, in which substitutions are carried on; the process will proceed with the idiomatic representation. Note that the process begins at that precise point, with all the previous literal analysis acquired to the idiomatic analysis. The original process goes on as well (unless the fragment that caused the new process is non syntactic and only peculiar to that idiom); only, the idiom is removed from the active idiom table. At this point there are two working processes and it is a matter of the (external) scheduling function to decide priorities. What is relevant is: a) still, the idiomatic process may result in a failure: further analysis may not confirm what has been hypothesized as an idiom; b) a different idiomatic process may be parted from the literal process at a later stage, when its own activation level crosses the threshold.
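The forking behaviour described above can be sketched in a few lines of Python. This is a simplified illustration, not the parser's mechanism: a process is reduced to a list of analysed fragments plus a table of activation levels, the names `Process` and `record_fragment` are hypothetical, and reading "crosses" as "reaches or exceeds" (`>=`) is an assumption. The threshold value 10 is the one quoted in the example section.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

THRESHOLD = 10  # fixed threshold, value taken from the example section


@dataclass
class Process:
    kind: str                                           # "literal" or an idiom name
    analysis: List[str] = field(default_factory=list)   # fragments analysed so far
    active_idioms: Dict[str, int] = field(default_factory=dict)


def record_fragment(literal: Process, fragment: str,
                    idiom: str, weight: int) -> Optional[Process]:
    """Add a fragment to the literal analysis and raise the activation of
    `idiom`; return a new idiomatic process if the threshold is reached."""
    literal.analysis.append(fragment)
    literal.active_idioms[idiom] += weight
    if literal.active_idioms[idiom] >= THRESHOLD:   # assumption: >= means "crosses"
        # The literal process goes on, but stops monitoring this idiom.
        del literal.active_idioms[idiom]
        # The new process starts here and inherits the analysis built so
        # far (copied, so the two processes then evolve independently).
        return Process(kind=idiom, analysis=list(literal.analysis))
    return None


IDIOM = "prendere il toro per le corna"
literal = Process("literal", ["l'informatico", "prese"], {IDIOM: 0})
spawned = record_fragment(literal, "per le corna", IDIOM, 10)
```

After the call both processes are alive: `spawned` holds a copy of the three fragments analysed so far, and `literal` continues without that idiom in its table. Which of the two runs next is left to the external scheduling function.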
Altogether, this yields all the analyses, literal and idiomatic, with likelihoods for the different interpretations. In addition, it seems a reasonable model of how humans process idioms. Some psycholinguistic experiments have supported this view (Cacciari & Stock, in preparation), which is also compatible with the model presented by Swinney and Cutler (1978).
Here we have disregarded the situation in which a possible idiomatic form occurs and its role in disambiguating. The whole parsing mechanism in WEDNESDAY 2 is based on dynamic unification, i.e. at every step in the parsing process a partial interpretation is provided; dynamic choices are performed scheduling the agenda on the basis of the relation between partial interpretations and the context.
5. An example
As an example let us consider the Italian idiom prendere il toro per le corna (literally: to take the bull by the horns; idiomatically: to confront a difficult situation). The verb prendere (to take) in the lexicon includes some descriptions of idioms. Fig. 3 shows the representation of prendere in the lexicon. The stem representation will be unified with other information and constraints coming from the affixes involved in a particular form of the verb. The first portion of the representation is devoted to the literal interpretation of the word, and includes the semantic representation, the likelihood of that reading, and functional information, including the specification of impulses for unification. The numbers are likelihoods of the presence of an argument or of a relative position of an argument. The second portion, after "idioms", includes the idioms involving "prendere". In Fig. 3 only one such idiom is specified. It is indicated that the idiom can also occur in a passive form and the specification of the expected fragments is given. The numbers here are the weights of the fragments (the threshold is fixed to 10). The substitutions include the new semantic representation, with the specification of the main node and of the binding of the subject. Note that the surface functional representation will not be destroyed after the substitutions; only the semantic (logical) representation will be recomputed, imposing its own bindings.
(sem-units (n1 (p-take n2 n3)))
(likeliradix 0.8)
(main n1)
(lingfunctions (subj n2) (obj n3))
(cat v)
(uni (subj)
     (must 0.7)
     ((t np 0.9 nil nom)))
(uni (obj)
     (must)
     ((t np 0.3 nil acc)))
(idioms
  ((t
    (morespecific (obj) 1 (fixwords il toro) 8)
    (idmodifier (fixwords per le corna) 10)
    substitutions
    (sem-units (m1 (p-confront m2 m3))
               (m4 (p-situation m3))
               (m5 (p-difficult m3)))
    (main m1)
    (bindings (subj m2))]
Fig. 3
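To make the structure of such an idiom entry concrete, the following Python sketch reads a parenthesized entry into nested lists and collects the weight attached to each string fragment. It is an illustration only: the reader is a generic S-expression tokenizer, not the Interlisp-D lexicon loader, the function names are hypothetical, and only FIXWORDS fragments are handled. The entry text is transcribed from the idiom portion of Fig. 3, with the closing parentheses written out in full.

```python
def parse_sexp(text):
    """Read one parenthesized expression into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        items = []
        while pos < len(tokens):
            token = tokens[pos]
            if token == "(":
                sub, pos = read(pos + 1)
                items.append(sub)
            elif token == ")":
                return items, pos + 1
            else:
                items.append(token)
                pos += 1
        return items, pos

    return read(0)[0][0]


def fragment_weights(entry):
    """Map each FIXWORDS fragment of an idiom entry to its weight."""
    weights = {}
    for item in entry:
        if item == "substitutions":
            break  # what follows describes the interpretation
        if isinstance(item, list) and item[0] in ("morespecific", "idmodifier"):
            fragment, weight = item[-2], item[-1]
            if fragment[0] == "fixwords":
                weights[" ".join(fragment[1:])] = int(weight)
    return weights


ENTRY = """
(t (morespecific (obj) 1 (fixwords il toro) 8)
   (idmodifier (fixwords per le corna) 10)
   substitutions
   (sem-units (m1 (p-confront m2 m3)) (m4 (p-situation m3)) (m5 (p-difficult m3)))
   (main m1)
   (bindings (subj m2)))
"""

entry = parse_sexp(ENTRY)
weights = fragment_weights(entry)
```

For this entry `weights` maps "il toro" to 8 and "per le corna" to 10, the two fragment weights that feed the activation level of the idiom.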
As mentioned, Italian allows great flexibility. Let the input sentence be l'informatico prese per le corna la capra (literally: the computer scientist took by the horns the goat). When prese (took) is analyzed its idiom activation table is inserted. When the modifier per le corna (by the horns) shows up, the activation of the idiom referred to above crosses the threshold (the sum of the two weights goes up to 12). A new process starts at this point, with the new interpretation unified with the previous interpretation of the Subject. Also, semantic specifications coming from the suffixes are reused in the new partial interpretation. The process just departs from the literal process; no backtracking is performed. At this point we have two processes going on: an idiomatic process, where the interpretation is already the computer scientist is confronting a difficult situation, and a literal process, where, in the background, still other active idioms monitor the events. In Fig. 4 the two semantic representations, in the form of semantic networks, are shown. When the last NP, la capra (the goat), is recognized, the idiomatic process fails (it needed the bull as object). The literal process yields its analysis, but also another idiom crosses the threshold, starts its process with the substitutions and immediately concludes positively. This latter, unlikely, idiomatic interpretation means the computer scientist confused the goat and the horns.
6. Implementation
WEDNESDAY 2 is implemented in Interlisp-D and runs on a Xerox 1186. The idiom recognition ability was easily integrated into the system. The performance is very satisfying, in particular with regard to the flexibility present in Italian. Around the parser a rich environment has been built. Besides allowing easy editing and graphic inspecting of resulting structures, it allows interaction with the agenda and exploration of heuristics in order to drive the multiprocessing mechanism of WEDNESDAY 2.
Fig. 4: The two semantic representations, a) and b), in the form of semantic networks
This environment constitutes a basic resource for exploring cognitive aspects, complementary to laboratory experiments with humans.
At present we are also working on an implementation of a generator that includes the ability to produce idioms, based on the same data structure and principles as the parser.
Acknowledgements
Thanks to Cristina Cacciari for many discussions and to Federico Cecconi for his continuous help.
References
Dyer, M. & Zernik, U. Encoding and Acquiring Meaning for Figurative Phrases. In Proceedings of the 24th Meeting of the Association for Computational Linguistics. New York (1986).
Fillmore, C. Innocence: a Second Idealization for Linguistics. In Proceedings of the Fifth Annual Meeting of the Berkeley Linguistics Society. University of California at Berkeley, 63-76 (1979).
Hendrix, G.G. LIFER: a Natural Language Interface Facility. SIGART Newsletter Vol. 61 (1977).
Kaplan, R. A general syntactic processor. In Rustin, R. (Ed.), Natural Language Processing. Englewood Cliffs, N.J.: Prentice-Hall (1973).
Kaplan, R. & Bresnan, J. Lexical-Functional Grammar: a formal system for grammatical representation. In Bresnan, J., Ed. The Mental Representation of Grammatical Relations. The MIT Press, Cambridge, 173-281 (1982).
Kay, M. Algorithm Schemata and Data Structures in Syntactic Processing. Report CSL-80-12, Xerox Palo Alto Research Center, Palo Alto (1980).
Stock, O. Dynamic Unification in Lexically Based Parsing. In Proceedings of the Seventh European Conference on Artificial Intelligence. Brighton, 212-221 (1986).
Swinney, D.A. & Cutler, A. The Access and Processing of Idiomatic Expressions. Journal of Verbal Learning and Verbal Behavior, 18, 523-534 (1978).
Waltz, D. An English Language Question Answering System for a Large Relational Database. Communications of the Association for Computing Machinery, Vol. 21, N. 7 (1978).
Wasow, T., Sag, I., Nunberg, G. Idioms: an interim report. Preprints of the International Congress of Linguistics, 87-96, Tokyo (1982).
Wilensky, R. & Arens, Y. PHRAN. A Knowledge Based Approach to Natural Language Analysis. University of California at Berkeley, ERL Memorandum No. UCB/ERL M80/34 (1980).