
GETTING IDIOMS INTO A LEXICON BASED PARSER'S HEAD


Oliviero Stock
I.P. - Consiglio Nazionale delle Ricerche
Via dei Monti Tiburtini 509
00157 Roma, Italy

ABSTRACT

An account is given of flexible idiom processing within a lexicon based parser. The view is a compositional one. The parser's behaviour is basically the "literal" one, unless a certain threshold is crossed by the weight of a particular idiom. A new process will then be added. The parser, besides yielding all idiomatic and literal interpretations, embodies some claims of human processing simulation.

1 Motivation and comparison with other approaches

Idioms are a pervasive phenomenon in natural languages. For instance, the first page of this paper (even if written by a non-native speaker) includes no less than half a dozen of them. Linguists have proposed different accounts for idioms, which are derived from two basic points of view: one point of view considers idioms as the basic units of language, with holistic characteristics, perhaps including words as a particular case; the other point of view emphasizes instead the fact that idioms are made up of normal parts of speech, that play a precise role in the complete idiom. An explicit statement within this approach is the Principle of Decompositionality (Wasow, Sag and Nunberg 1982): "When an expression admits analysis as morphologically or syntactically complex, assume as an operating hypothesis that the sense of the expression arises from the composition of the senses of its constituent parts". The syntactic consequence is that idioms are not a different thing from "normal" forms.

Our view is of the latter kind. We are aware of the fact that the flexibility of an idiom depends on how recognizable its metaphorical origin is. Within flexible word order languages the flexibility of idioms seems to be even more closely linked to the strengths of particular syntactic constructions.

Let us now briefly discuss some computational approaches to idiom understanding. Applied computational systems must necessarily have a capacity for analyzing idioms. In some systems there is a preprocessor delegated to the recognition of idiomatic forms. This preprocessor replaces the group of words that make up one idiom with the word or words that convey the meaning involved. In ATN systems instead, especially if oriented towards a particular domain, sometimes there are sequences of particular arcs inserted in the network, which, if transited, lead to the recognition of a particular idiom (e.g. PLANES, Waltz 1978). LIFER (Hendrix 1977), one of the most successful applied systems, was based on a semantic grammar, and within this mechanism idiom recognition was easy to implement, without considering flexibility. Of course, in all these systems there is no intention to give an account of human processing. PHRAN (Wilensky and Arens 1980) is a system based entirely on pattern recognition. Idiom recognition, following Fillmore's view (Fillmore 1979), is considered the basic resource all the way down to replace the concept of grammar based parsing. PHRAN is based on a data base of patterns (including single words, at the same level), and proceeds deterministically, applying the two principles "when in doubt choose the more specific pattern" and "choose the longest pattern". The limits of this approach lie in the [...] interpretations in case of ambiguity and in running the risk of having an excessive spread of nonterminal symbols if the data base of idioms is large. A recent work on idioms with a similar perspective is Dyer and Zernik (1986).

The approach we have followed is different. The goals we had with our work must be stated explicitly: 1) to yield a cognitive model of idiom processing; 2) to integrate idioms in our lexical data, just as further information concerning words (as in a traditional dictionary); 3) to insert all this in the framework of WEDNESDAY 2 (Stock 1986), a nondeterministic lexicon based parser. To anticipate the cognitive solution we are discussing here: idiom understanding is based on normal syntactic analysis with word driven recognition in the background. When a certain threshold is crossed by the weight of a particular idiom, the latter starts a process of its own, that may eventually lead to a complete interpretation.

Some of the questions we have dealt with are: a) how are idioms to be specified? b) when are they recognized? c) what happens when they are recognized? d) what happens afterwards?

2 A summary of WEDNESDAY 2

WEDNESDAY 2 (Stock 1986) is a parser based on linguistic knowledge distributed fundamentally through the lexicon. The general viewpoint of the linguistic representation is not far from LFG (Kaplan & Bresnan 1982), although independently conceived. A word interpretation includes:

- a semantic representation of the word, in the form of a semantic net shred;

- static syntactic information, including the category, features, indication of linguistic functions that are bound to particular nodes in the net. One particular specification is the Main node, the head of the syntactic constituent the word occurs in;

- dynamic syntactic information, including impulses to connect pieces of semantic information, guided by syntactic constraints. Impulses look for "fillers" on a given search space. They have alternatives (for instance the word tell has an impulse to merge its object node with the Main node of either an NP or a subordinate clause). An alternative includes: a contextual condition of applicability, a category, features, marking, side effects (through which, for example, coreference between the subject of a subordinate clause and a function of the main clause can be indicated). Impulses may also be directed to a different search space than the normal one with a mechanism that can deal with long distance dependencies;

- measures of likelihood. These are measures that are used in order to derive an overall measure of likelihood of a partial analysis. Measures are included for the likelihood of that particular reading of the word and for aspects attached to an impulse: a) for one particular alternative, b) for the relative position of the filler, c) for the overall necessity of finding a filler;

- a characterization of idioms involving that word (see next paragraph).

The only other data that the parser uses are in the form of simple (non augmented) transition networks that only provide restrictions on search spaces where impulses can look for fillers. In more traditional words these networks deal with the distribution of constituents. A distinguished symbol, $EXP, indicates that only the occurrence of something expected by preceding words (i.e. for which an impulse was set up) will allow the transition. It is stressed that inside a constituent the position of elements can be free. In WEDNESDAY 2 one can specify in a natural and nonredundant way all the graduality from obligatory positions, to obligatory precedences, to simple likelihoods of relative positions.

The parser is based on an extension of the idea of chart parsing [Kay 1980, Kaplan 1973] [see Stock 1986]. What is relevant here is the fact that "edges" correspond to search spaces. They are complex data structures provided with a rich amount of information including a semantic interpretation of the fragment, syntactic data, pending impulses, an overall measure of likelihood, etc. Data on an edge are "unified" dynamically.

Parsing goes basically bottom-up with top-down confirmation, improving the so called Left Corner technique. When a lexical edge with category C is added to the chart, its First Left Cross References F(C) are fetched. First Left Cross References are defined recursively: for every lexical category C, the set of initial states that allow for transitions on C, or the set of initial states (without repetitions) that allow for transitions on symbols in F(C). So, for instance, F(Det) = {NP, S}, at least.
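The recursive definition of F(C) can be read as a small closure computation. The following is only a sketch under stated assumptions: the function name, the `INITIAL_ARCS` table and the toy grammar in it are hypothetical and are not taken from WEDNESDAY 2. Each transition network is identified here with its initial state, and the table lists the labels of the arcs leaving that state.

```python
def first_left_cross_references(category, initial_arcs):
    """Return F(category): every network whose initial state allows a
    transition on `category`, or on a network already in F(category)."""
    result = set()
    frontier = [category]
    while frontier:
        symbol = frontier.pop()
        for network, labels in initial_arcs.items():
            if symbol in labels and network not in result:
                result.add(network)        # no repetitions
                frontier.append(network)   # recurse on the new symbol
    return result


# Hypothetical toy grammar: labels of the arcs leaving each initial state.
INITIAL_ARCS = {
    "NP": {"Det", "N"},
    "PP": {"Prep"},
    "S": {"NP"},
}

print(sorted(first_left_cross_references("Det", INITIAL_ARCS)))
```

With this toy table, a Det can start an NP and an NP can start an S, so the result for Det contains both NP and S, in line with the F(Det) example in the text.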

For each element in F(C) an edge of a special kind is added to the chart. These special edges are called sleeping edges. A sleeping edge S at a vertex Vs is awakened, i.e. causes the introduction of a normal active edge, iff there is an active edge arriving at Vs that may be extended with an edge with the category of S. If they are not awakened, sleeping edges play no role at all in the process.

An agenda is provided which includes tasks of several different types, including lexical tasks, extension tasks, insertion tasks and virtual tasks. A lexical task specifies a possible reading of a word to be introduced in the chart as an inactive edge. An extension task specifies an active edge and an inactive edge that can extend it (together with some more information). An insertion task specifies a nondeterministic unification operation. A virtual task consists in extending an active edge with an edge displaced to another point of the sentence, according to the mechanism that treats long distance dependencies. At each stage the next task chosen for execution is the value of a scheduling-selecting function.
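The agenda described above can be pictured as a pool of typed tasks plus a pluggable selection function. The sketch below is an assumption-laden illustration, not the WEDNESDAY 2 implementation: the `Task` fields, the `likelihood` score and the `most_likely_first` policy are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    kind: str          # "lexical", "extension", "insertion" or "virtual"
    payload: dict      # hypothetical: the edges or word reading involved
    likelihood: float  # hypothetical score used by the example scheduler


@dataclass
class Agenda:
    select: Callable[[List[Task]], Task]   # the scheduling-selecting function
    tasks: List[Task] = field(default_factory=list)

    def add(self, task: Task) -> None:
        self.tasks.append(task)

    def next_task(self) -> Task:
        """Pick the task returned by the scheduler (agenda must not be empty)."""
        chosen = self.select(self.tasks)
        self.tasks.remove(chosen)
        return chosen


def most_likely_first(tasks: List[Task]) -> Task:
    """One possible policy (an assumption): prefer the most likely task."""
    return max(tasks, key=lambda t: t.likelihood)


agenda = Agenda(select=most_likely_first)
agenda.add(Task("lexical", {"word": "prese"}, 0.8))
agenda.add(Task("extension", {}, 0.9))
print(agenda.next_task().kind)
```

Keeping the selection function outside the agenda mirrors the idea that scheduling is an external matter, so different heuristics can be tried without touching the task pool.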

The parser works asymmetrically with respect to the "arrival" of the Main node: before the Main node arrives, an extension of an edge causes almost nothing. On the arrival of the Main, all the candidate fillers must find a compatible impulse and all impulses concerning the Main node must find satisfaction. If all this does not happen then the new edge supposedly to be added to the chart is not added: the situation is recognized as a failure. After the arrival of the Main, each new head must find an impulse to merge with, and each incoming impulse must find satisfaction. Again, if all this does not happen, the new edge will not be added to the chart.

Dynamically, apart from the general behaviour of the parser, there are some particular restrictions for its nondeterministic behaviour, that put into effect syntax-based dynamic disambiguation:

1) The $EXP arc allows for a transition only if the configuration in the active edge includes an impulse to link with the Main of the proposed inactive edge.

2) The sleeping edge mechanism prevents edges not compatible with the left context from being established.

3) A search space can be closed only if no impulse that was specified as having to be satisfied remains. In other words, in a state with an outgoing EXIT arc, an active edge can cause the establishing of an inactive edge only if there are no obligatory impulses left.

4) A proposed new edge A' with a verb tense not matching the expected values causes a failure, i.e. A' will not be introduced in the chart.

5) Failure is caused by inadequate mergings, with relation to the presence, absence or ongoing introduction of the Main node.

Comparing to the criteria established for LFG for functional compatibility of an f-structure [Kaplan & Bresnan 1982], the following can be said of the dynamics outlined here. Incompleteness recognition performs as specified in 3) and furthermore there is an earlier check when the Main arrives, in case there were obligatory impulses to be satisfied at that point (e.g. an argument that must occur before the Main). Incoherence is completely avoided after the Main has arrived, by the $EXP arc mechanism; before this point, it is recognized as specified in 5) above, and causes an immediate failure. Inconsistency is detected as indicated in 4) and 5). As far as 5) is concerned, though, the attitude is to "activate" impulses when the right premises are present and to "look for the right thing", and not to "check if what was done is consistent".

Note that a morphological analyzer, WED-MORPH, linked to WEDNESDAY 2, plays a substantial role, especially if the language is Italian. In Italian you may find words like rifacendogliene, that stands for while making some (of them) for him again. The morphological analyzer not only recognizes complex forms, but must be able to put together complex constraints originated in part by the stem and in part by the affixes. The same holds for the semantic representation and will have consequences in our dealing with idioms. Fig. 1 shows a diagram of WEDNESDAY 2.

[Diagram of WEDNESDAY 2]

Fig. 1

3 Specification of idioms in the lexicon

Idioms are introduced in the lexicon as further specifications of words, just as in a normal dictionary. They may be of two types: a) canned phrases, that just behave as several-word entries in the lexicon (there is nothing particularly interesting in that, so we shall not go into detail here); b) flexible idioms; these idioms are described in the lexicon bound to the particular word representing the "thread" of that idiom; in WEDNESDAY 2 terms, this is the word that bears the Main of the immediate constituent including the idiom. Thus, if we have an idiom like to build castles in the air, it will be described along with the verb, to build.

After the normal word specifications, the word may include a list of idiomatic entries. Fig. 2 shows a BNF specification of idioms in the lexicon (the symbol + stands for "at least one occurrence of what precedes"). Each idiom is described in two sections: the first one describes the elements that characterize that idiom, expressed coherently with the normal characterization of the word; the second one describes the interpretation, i.e. which substitutions should be performed when the idiom is recognized.

Let us briefly describe Fig. 2. The lexicalform indicates whether passivization (that in our theory, like in LFG, is treated in the lexicon) is admitted in the idiomatic reading. The idiom-stats, describing configurations of the components of an idiom, are based on the basic impulses included in the word. In other words, constituents of an idiom are described as particular fillers of linguistic functions or particular modifiers. For example build castles in the air, when build is in an active form, has castles as a further description of the filler of the OBJ function and the string in the air as a further specification of a particular modifier that may be attached to the Main node. MORESPECIFIC, the further specification of an impulse to set a filler for a function, includes: a reference to one of the possible alternative types of fillers specified in the normal impulse, a specification that describes the fragment that is to play this particular role in the idiom, and the weight that this component has in the overall recognition of the idiom. IDMODIFIER is a specification of a modifier, including the description of the fragment and the weight of this component. CHANGEIMPULSE and REMOVEIMPULSE permit an alteration of the normal syntactic behaviour. The former specifies a new alternative for a filler for an existing function, including the description of the component and its weight (for instance the new alternative may be a partial NP instead of a complete NP, as in take care, or an NP marked differently from usual). The latter specifies that a certain impulse, specified for the word, is to be considered to have been removed for this idiom description.

There are a number of possible fragment specifications, including string patterns, semantic patterns, morphological variations, coreferences etc.

Substitutions include the semantics of the idiom, which are supposed to take the place of the literal semantics, plus the specification of the new Main and of the bindings for the functions. New bindings may be included to specify new semantic linkings not present in the literal meaning (e.g. take care of <someone>: if the meaning is to attend to <someone>, then <someone> must become an argument of attend).

    <idioms>       ::= (IDIOMS <idiomentry>+)
    <idiomentry>   ::= (<lexicalform> <idiom-stat>+ SUBSTITUTIONS <idiomsubst>+)
    <lexicalform>  ::= T / (NOT-PASSIVE)
    <idiom-stat>   ::= (MORESPECIFIC <lingfunc> <alternnum> <fragmentspec> <weight>) /
                       (CHANGEIMPULSE <lingfunc> <alternative>+ <fragmentspec> <weight>) /
                       (IDMODIFIER <fragmentspec> <weight>) /
                       (REMOVEIMPULSE <lingfunc>)
    <alternative>  ::= (<test> <fillertype> <beforelh> <features> <mark> <sideffect> <fragmentspec>)
    <fragmentspec> ::= (WORD <word>) / (FIXWORDS <wordseq>) / (FIRSTWORDS <wordseq>) /
                       (MORPHWORD <wordroot>) / (SEM (<concept>+) <prep>) / (EQSUBJ)
    <idiomsubst>   ::= (SEM-UNITS <sem-unit>+) / (MAIN <node>) /
                       (BINDINGS (<lingfunc> <node>)+) /
                       (NEWBINDINGS (<node> <lingfunc path>)+)

Fig. 2


4 Idiom processing

Idiom processing works in WEDNESDAY 2 integrated in the nondeterministic, multiprocessing-based behaviour of the parser. As the normal (literal) analysis proceeds and partial representations are built, impulses are monitored in the background, checking for possible idiomatic fragments. Monitoring is carried on only for fragments of idioms not in contrast with the present configuration. A dynamic activation table is introduced with the occurrence of a word that has some idiom specification associated. Occurrence of an expected fragment of an idiom in the table raises the level of activation of that idiom, in proportion to the relative weight of the fragment. If the configuration of the sentence contrasts with one fragment then the relative idiom is discarded from the table. So all the normal processing goes on, including the possible nondeterministic choices, the establishing of new processes etc. The activation tables are included in the edges of the chart.
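The bookkeeping described in this paragraph can be sketched as a small table that is raised by expected fragments and pruned by contrasting ones. This is a minimal illustration under assumptions: the class and field names, the `exclusive` flag (used here to decide when a different filler contrasts with the idiom) and the weights are hypothetical, and activation is simply increased by the fragment's weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    slot: str        # e.g. "obj" or "modifier"
    words: str       # the expected string for that slot
    weight: float    # contribution to the idiom's activation
    exclusive: bool  # True if any other filler for the slot rules the idiom out


class ActivationTable:
    def __init__(self):
        self.fragments = {}  # idiom name -> list of Fragment
        self.levels = {}     # idiom name -> current activation level

    def introduce(self, idiom, fragments):
        """Called when a word carrying this idiom specification occurs."""
        self.fragments[idiom] = list(fragments)
        self.levels[idiom] = 0.0

    def observe(self, slot, words):
        """Update every monitored idiom with a newly analysed fragment."""
        for idiom in list(self.levels):
            for frag in self.fragments[idiom]:
                if frag.slot != slot:
                    continue
                if frag.words == words:
                    self.levels[idiom] += frag.weight   # expected fragment
                elif frag.exclusive:
                    del self.levels[idiom]              # contrast: discard
                    del self.fragments[idiom]
                    break


table = ActivationTable()
table.introduce("build castles in the air", [
    Fragment("obj", "castles", 6.0, True),          # hypothetical weights
    Fragment("modifier", "in the air", 6.0, False),
])
table.observe("modifier", "in the air")
print(table.levels)
```

A later `table.observe("obj", "bridges")` would remove the idiom from the table, since the object slot is filled by something the idiom does not admit, while the literal analysis would be unaffected.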

When the activation level of a particular idiom crosses a fixed threshold, a new process is introduced, dedicated to that particular idiom. In that process, only that idiomatic interpretation is considered. Thus, in the first place, an edge is introduced, in which substitutions are carried out; the process will proceed with the idiomatic representation. Note that the process begins at that precise point, with all the previous literal analysis acquired to the idiomatic analysis. The original process goes on as well (unless the fragment that caused the new process is non syntactic and only peculiar to that idiom); only, the idiom is removed from the active idiom table. At this point there are two working processes and it is a matter of the (external) scheduling function to decide priorities. What is relevant is: a) still, the idiomatic process may result in a failure: further analysis may not confirm what has been hypothesized as an idiom; b) a different idiomatic process may be parted from the literal process at a later stage, when its own activation level crosses the threshold.
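The forking behaviour can be sketched as follows. This is an illustration under assumptions, not the actual mechanism: `Process`, `raise_activation` and the string stand-ins for semantic representations are hypothetical, "crossing" is taken to mean reaching or exceeding the threshold, and the special case of a non syntactic fragment (where the original process does not go on) is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional

THRESHOLD = 10.0  # assumed value; the example in Section 5 fixes it to 10


@dataclass
class Process:
    semantics: str                   # stand-in for the partial semantic net
    bindings: Dict[str, str]         # analysis acquired so far, e.g. the subject
    active_idioms: Dict[str, float]  # idiom name -> activation level
    idiomatic: Optional[str] = None  # set only in a process dedicated to an idiom


def raise_activation(process, idiom, weight, substitutions):
    """Raise one idiom's level and return the processes that go on."""
    if idiom not in process.active_idioms:
        return [process]
    process.active_idioms[idiom] += weight
    if process.active_idioms[idiom] < THRESHOLD:
        return [process]
    # Threshold crossed: the literal process stops monitoring this idiom
    # and a dedicated process starts from the analysis built so far.
    del process.active_idioms[idiom]
    forked = Process(
        semantics=substitutions[idiom],    # substitutions carried out
        bindings=dict(process.bindings),   # previous analysis is acquired
        active_idioms={},                  # only this reading is considered
        idiomatic=idiom,
    )
    return [process, forked]


literal = Process("p-take", {"subj": "informatico"}, {"toro": 0.0})
running = raise_activation(literal, "toro", 12.0, {"toro": "p-confront"})
print([p.idiomatic for p in running])
```

No backtracking is involved: the literal process keeps its own state, and the dedicated process may later fail on its own without affecting it.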

Altogether, this yields all the analyses, literal and idiomatic, with likelihoods for the different interpretations. In addition, it seems a reasonable model of how humans process idioms. Some psycholinguistic experiments have supported this view (Cacciari & Stock, in preparation), which is also compatible with the model presented by Swinney and Cutler (1978).

Here we have disregarded the situation in which a possible idiomatic form occurs and its role in disambiguating. The whole parsing mechanism in WEDNESDAY 2 is based on dynamic unification, i.e. at every step in the parsing process a partial interpretation is provided; dynamic choices are performed scheduling the agenda on the basis of the relation between partial interpretations and the context.

5 An example

As an example let us consider the Italian idiom prendere il toro per le corna (literally: to take the bull by the horns; idiomatically: to confront a difficult situation). The verb prendere (to take) in the lexicon includes some descriptions of idioms. Fig. 3 shows the representation of prendere in the lexicon. The stem representation will be unified with other information and constraints coming from the affixes involved in a particular form of the verb. The first portion of the representation is devoted to the literal interpretation of the word, and includes the semantic representation, the likelihood of that reading, and functional information, including the specification of impulses for unification. The numbers are likelihoods of the presence of an argument or of a relative position of an argument. The second portion, after "idioms", includes the idioms involving prendere. In Fig. 3 only one such idiom is specified. It is indicated that the idiom can also occur in a passive form and the specification of the expected fragments is given. The numbers here are the weights of the fragments (the threshold is fixed to 10). The substitutions include the new semantic representation, with the specification of the main node and of the binding of the subject. Note that the surface functional representation will not be destroyed after the substitutions; only the semantic (logical) representation will be recomputed, imposing its own bindings.

    (sem-units (n1 (p-take n2 n3)))
    (likeliradix 0.8)
    (main n1)
    (lingfunctions (subj n2) (obj n3))
    (cat v)
    (uni (subj)
         (must 0.7)
         ((t np 0.9 nil nom)))
    (uni (obj)
         (must)
         ((t np 0.3 nil acc)))
    (idioms
      ((t
        (morespecific (obj) 1 (fixwords il toro) 8)
        (idmodifier (fixwords per le corna) 10)
        substitutions
        (sem-units (m1 (p-confront m2 m3))
                   (m4 (p-situation m3))
                   (m5 (p-difficult m3)))
        (main m1)
        (bindings (subj m2))]

Fig. 3
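To make the shape of the idiom part of Fig. 3 easier to follow, it can be mirrored as plain data. The sketch below is only one possible encoding: the class and field names are hypothetical, while the values (function, alternative number, fixed words, weights, semantic units, main node and binding) are copied from Fig. 3.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MoreSpecific:
    lingfunc: str              # linguistic function being further specified
    altern_num: int            # which alternative of the normal impulse
    fragment: Tuple[str, ...]  # fragment spec, e.g. ("fixwords", "il", "toro")
    weight: int


@dataclass(frozen=True)
class IdModifier:
    fragment: Tuple[str, ...]
    weight: int


@dataclass(frozen=True)
class IdiomEntry:
    passive_allowed: bool      # lexicalform: T versus (NOT-PASSIVE)
    stats: tuple               # the idiom-stats
    sem_units: tuple           # substitutions: new semantic units
    main: str                  # substitutions: new main node
    bindings: tuple            # substitutions: function-to-node bindings


TORO = IdiomEntry(
    passive_allowed=True,
    stats=(
        MoreSpecific("obj", 1, ("fixwords", "il", "toro"), 8),
        IdModifier(("fixwords", "per", "le", "corna"), 10),
    ),
    sem_units=(
        ("m1", ("p-confront", "m2", "m3")),
        ("m4", ("p-situation", "m3")),
        ("m5", ("p-difficult", "m3")),
    ),
    main="m1",
    bindings=(("subj", "m2"),),
)


def total_weight(entry: IdiomEntry) -> int:
    """Sum of the weights of all fragments listed for the idiom."""
    return sum(stat.weight for stat in entry.stats)


print(total_weight(TORO))
```

Separating the recognition part (`stats`) from the interpretation part (`sem_units`, `main`, `bindings`) follows the two sections of an idiom entry described for Fig. 2.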

As mentioned, Italian allows great flexibility. Let the input sentence be l'informatico prese per le corna la capra (literally: the computer scientist took by the horns the goat). When prese (took) is analyzed its idiom activation table is inserted. When the modifier per le corna (by the horns) shows up, the activation of the idiom referred to above crosses the threshold (the sum of the two weights goes up to 12). A new process starts at this point, with the new interpretation unified with the previous interpretation of the Subject. Also, semantic specifications coming from the suffixes are reused in the new partial interpretation. The process just departs from the literal process; no backtracking is performed. At this point we have two processes going on: an idiomatic process, where the interpretation is already the computer scientist is confronting a difficult situation, and a literal process, where, in the background, still other active idioms monitor the events. In Fig. 4 the two semantic representations, in the form of semantic networks, are shown. When the last NP, la capra (the goat), is recognized, the idiomatic process fails (it needed the bull as object). The literal process yields its analysis, but also another idiom crosses the threshold, starts its process with the substitutions and immediately concludes positively. This latter, unlikely, idiomatic interpretation means the computer scientist confused the goat and the horns.

6 Implementation

WEDNESDAY 2 is implemented in Interlisp-D and runs on a Xerox 1186. The idiom recognition ability was easily integrated into the system. The performance is very satisfying, in particular with regard to the flexibility present in Italian. Around the parser a rich environment has been built. Besides allowing easy editing and graphic inspecting of resulting structures, it allows interaction with the agenda and exploration of heuristics in order to drive the multiprocessing mechanism of WEDNESDAY 2.

[Semantic networks a) and b) for the two interpretations]

Fig. 4

This environment constitutes a basic resource for exploring cognitive aspects, complementary to laboratory experiments with humans.

At present we are also working on an implementation of a generator that includes the ability to produce idioms, based on the same data structure and principles as the parser.

Acknowledgements

Thanks to Cristina Cacciari for many discussions and to Federico Cecconi for his continuous help.

References

Dyer, M. & Zernik, U. Encoding and Acquiring Meaning for Figurative Phrases. In Proceedings of the 24th Meeting of the Association for Computational Linguistics. New York (1986)

Fillmore, C. Innocence: a Second Idealization for Linguistics. In Proceedings of the Fifth Annual Meeting of the Berkeley Linguistics Society. University of California at Berkeley, 63-76 (1979)

Hendrix, G.G. LIFER: a Natural Language Interface Facility. SIGART Newsletter Vol. 61 (1977)

Kaplan, R. A general syntactic processor. In Rustin, R. (Ed.), Natural Language Processing. Englewood Cliffs, N.J.: Prentice-Hall (1973)

Kaplan, R. & Bresnan, J. Lexical-Functional Grammar: a formal system for grammatical representation. In Bresnan, J. (Ed.), The Mental Representation of Grammatical Relations. The MIT Press, Cambridge, 173-281 (1982)

Kay, M. Algorithm Schemata and Data Structures in Syntactic Processing. Report CSL-80-12, Xerox Palo Alto Research Center, Palo Alto (1980)

Stock, O. Dynamic Unification in Lexically Based Parsing. In Proceedings of the Seventh European Conference on Artificial Intelligence. Brighton, 212-221 (1986)

Swinney, D.A. & Cutler, A. The Access and Processing of Idiomatic Expressions. Journal of Verbal Learning and Verbal Behaviour, 18, 523-534 (1978)

Waltz, D. An English Language Question Answering System for a Large Relational Database. Communications of the Association for Computing Machinery, Vol. 21, N. 7 (1978)

Wasow, T., Sag, I., Nunberg, G. Idioms: an interim report. Preprints of the International Congress of Linguistics, 87-96, Tokyo (1982)

Wilensky, R. & Arens, Y. PHRAN. A Knowledge Based Approach to Natural Language Analysis. University of California at Berkeley, ERL Memorandum No. UCB/ERL M80/34 (1980)
