
Towards resolution of bridging descriptions

Renata Vieira and Simone Teufel

Centre for Cognitive Science - University of Edinburgh

2, Buccleuch Place EH8 9LW Edinburgh UK {renata, simone}@cogsci.ed.ac.uk

Abstract

We present preliminary results concerning robust techniques for resolving bridging definite descriptions. We report our analysis of a collection of 20 Wall Street Journal articles from the Penn Treebank Corpus and our experiments with WordNet to identify relations between bridging descriptions and their antecedents.

1 Background

As part of our research on definite description (DD) interpretation, we asked 3 subjects to classify the uses of DDs in a corpus using a taxonomy related to the proposals of (Hawkins, 1978), (Prince, 1981) and (Prince, 1992). Of the 1040 DDs in our corpus, 312 (30%) were identified as anaphoric (same head), 492 (47%) as larger situation/unfamiliar (Prince's discourse new), and 204 (20%) as bridging references, defined as uses of DDs whose antecedents, coreferential or not, have a different head noun; the remainder were classified as idioms or were cases about which the subjects expressed doubt. See (Poesio and Vieira, 1997) for a description of the experiments.

In previous work we implemented a system capable of interpreting DDs in a parsed corpus (Vieira and Poesio, 1997). Our implementation employed fairly simple techniques; we concentrated on anaphoric (same head) descriptions (resolved by matching the head nouns of DDs with those of their antecedents) and larger situation/unfamiliar descriptions (identified by certain syntactic structures, as suggested in (Hawkins, 1978)). In this paper we describe our subsequent work on bridging DDs, which involve more complex forms of commonsense reasoning.

2 Bridging descriptions: a corpus study

Linguistic and computational theories of bridging references acknowledge two main problems in their resolution: first, to find their antecedents (anchors); second, to identify the relations holding between the descriptions and their anchors (Clark, 1977; Sidner, 1979; Heim, 1982; Carter, 1987; Fraurud, 1990; Chinchor and Sundheim, 1995; Strand, 1997). A speaker is licensed in using a bridging DD when he/she can assume that the commonsense knowledge required to identify the relation is shared by the listener (Hawkins, 1978; Clark and Marshall, 1981; Prince, 1981). This reliance on shared knowledge means that, in general, a system could only resolve bridging references when supplied with an adequate lexicon; the best results have been obtained by restricting the domain and feeding the system with specific knowledge (Carter, 1987). We used the publicly available lexical database WordNet (WN) (Miller, 1993) as an approximation of a knowledge base containing generic information.

Bridging DDs and WordNet. As a first experiment, we used WN to automatically find the anchor of a bridging DD among the NPs contained in the previous five sentences. The system reports a semantic link between the DD and the NP if one of the following is true:

• The NP and the DD are synonyms of each other, as in the suit -- the lawsuit.

• The NP and the DD are in a direct hyponymy relation with each other, for instance, dollar -- the currency.

• There is a direct or indirect meronymy (part-of relation) between the NP and the DD. Indirect meronymy holds when a concept inherits parts from its hypernyms, as car inherits the part wheel from its hypernym wheeled_vehicle.

• Due to WN's idiosyncratic encoding, it is often necessary to look for a semantic relation between sisters, i.e. hyponyms of the same hypernym, such as home -- the house.
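The four checks listed above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the toy dictionaries stand in for WordNet and contain just the examples from the text (their structure is not WordNet's actual encoding), and every name used here (`find_link`, `candidate_anchors`, `HYPERNYMS`, and so on) is hypothetical rather than part of the system described in this paper.

```python
# Toy stand-in for WordNet: hand-entered entries, for illustration only.
SYNONYMS = {"suit": {"lawsuit"}}
HYPERNYMS = {                      # word -> direct hypernyms (assumed data)
    "dollar": {"currency"},
    "car": {"wheeled_vehicle"},
    "home": {"housing"},
    "house": {"housing"},
}
PARTS = {"wheeled_vehicle": {"wheel"}}   # whole -> direct parts (assumed data)


def all_hypernyms(word):
    """Return every direct or inherited hypernym of `word`."""
    seen, stack = set(), [word]
    while stack:
        for parent in HYPERNYMS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def has_part(whole, part):
    """Direct or indirect meronymy: parts may be inherited from hypernyms."""
    return any(part in PARTS.get(w, ()) for w in {whole} | all_hypernyms(whole))


def find_link(np_head, dd_head):
    """Name the first relation found between an NP head and a DD head, or None."""
    a, b = np_head, dd_head
    if b in SYNONYMS.get(a, ()) or a in SYNONYMS.get(b, ()):
        return "synonymy"
    if b in HYPERNYMS.get(a, ()) or a in HYPERNYMS.get(b, ()):
        return "hyponymy"
    if has_part(a, b) or has_part(b, a):
        return "meronymy"
    if a != b and HYPERNYMS.get(a, set()) & HYPERNYMS.get(b, set()):
        return "sisters"
    return None


def candidate_anchors(dd_head, np_heads):
    """Pair each preceding NP head with its link to the DD, skipping unlinked NPs."""
    links = ((np, find_link(np, dd_head)) for np in np_heads)
    return [(np, link) for np, link in links if link is not None]
```

For example, `candidate_anchors("house", ["dollar", "home", "suit"])` keeps only `home`, which is linked to `house` through the sister check. Reporting every linked NP in the window, rather than a single best one, mirrors the experiment as described, which is one plausible reason for a large number of false positives.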

An automatic search for a semantic relation in 5481 possible anchor/DD pairs (relative to 204 bridging DDs) found a total of 240 relations, distributed over 107 cases of DDs. There were 54 correct resolutions (distributed over 34 DDs) and 186 false positives.
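The figures just reported can be turned into rates with simple arithmetic. The sketch below only restates the counts given in the text; the variable names and the choice of which ratios to compute are ours, not the paper's.

```python
# Counts as reported in the text for the first experiment.
bridging_dds = 204
relations_found = 240
dds_with_relation = 107
correct_relations = 54
dds_correctly_resolved = 34
false_positives = 186

# Share of proposed relations that were correct (relation-level precision).
relation_precision = correct_relations / relations_found

# Share of bridging DDs for which any relation was proposed.
dd_coverage = dds_with_relation / bridging_dds

# Share of bridging DDs for which a correct anchor was found.
dd_resolution_rate = dds_correctly_resolved / bridging_dds

print(f"relation precision: {relation_precision:.1%}")
print(f"DDs with some relation: {dd_coverage:.1%}")
print(f"DDs correctly resolved: {dd_resolution_rate:.1%}")
```

On these counts, roughly 22.5% of the proposed relations are correct, and a correct anchor is found for about 17% of the 204 bridging DDs.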

Types of bridging definite descriptions. A closer analysis revealed one reason for the poor results: anchors and descriptions are often linked by other means than direct lexico-semantic relations. According to different anchor/link types and their processing requirements, we observed six major classes of bridging DDs in our corpus:

Synonymy/Hyponymy/Meronymy. These DDs are in a semantic relation with their anchors that might be encoded in WN. Examples are: a) Synonymy: new album -- the record, three bills -- the legislation; b) Hypernymy-Hyponymy: rice -- the plant, the television show -- the program; c) Meronymy: plants -- the pollen, the house -- the chimney.

Names. Definite descriptions may be anchored to proper names, as in: Mrs Park -- the housewife and Pinkerton's Inc -- the company.

Events. There are cases where the anchor of a bridging DD is not an NP but a VP or a sentence. Examples are: individual investors contend -- They make the argument in letters; Kadane Oil Co is currently drilling two wells -- The activity.

Compound Nouns. This class of DDs requires considering not only the head nouns of a DD and its anchor for its resolution but also the premodifiers. Examples include: stock market crash -- the markets, and discount packages -- the discounts.

Discourse Topic. There are some cases of DDs which are anchored to an implicit discourse topic rather than to some specific NP or VP. For instance, the industry (the topic being oil companies) and the first half (the topic being a concert).

Inference. One other class of bridging DDs includes cases based on a relation of reason, cause, consequence, or set-members between an anchor (previous NP) and the DD (as in Republicans/Democrats -- the two sides, and last week's earthquake -- the suffering people are going through).

The relative importance of these classes in our corpus is shown in Table 1. These results explain in part the poor results obtained in our first experiment: only 19% of the cases of bridging DDs fall into the category which we might expect WN to handle.

Class        #     %
S/H/M        38    19%
Names        49    24%
Events       40    20%
C Nouns      25    12%
D Topic      15     7%
Inference    37    18%

Table 1: Distribution of types of bridging DDs

3 Other experiments with WordNet

Cases that WN could handle. Next, we considered only the 38 cases of syn/hyp/mer relations and tested whether WN encoded a semantic relation between them and their (manually identified) anchors. The results for these 38 DDs are summarized in Table 2. Overall recall was 39% (15/38). [1]

Class    Total    Found in WN    Not Found
Syn      12       4              8
Hyp      14       8              6
Mer      12       3              9

Table 2: Search for semantic relations in WN
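The overall recall figure follows directly from Table 2, and the same counts give a per-class breakdown. The following is a small sketch of that arithmetic; the dictionary layout and function name are assumptions made for illustration.

```python
# (total, found in WN) per class, copied from Table 2.
TABLE_2 = {"Syn": (12, 4), "Hyp": (14, 8), "Mer": (12, 3)}


def recall(found, total):
    """Fraction of manually identified relations that WN also encodes."""
    return found / total


per_class = {name: recall(found, total) for name, (total, found) in TABLE_2.items()}
overall = recall(
    sum(found for _, found in TABLE_2.values()),
    sum(total for total, _ in TABLE_2.values()),
)

for name, value in per_class.items():
    print(f"{name}: {value:.0%}")
print(f"Overall: {overall:.0%}")
```

On these counts hyponymy is the best covered relation (8 of 14) and meronymy the worst (3 of 12).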

Problems with WordNet. Some of the missing relations are due to the unexpected way in which knowledge is organized in WN. For example, our method could not find an association between house and walls, because house was not entered as a hyponym of building but of housing, and housing does not have a meronymy link to wall whereas building does. On the other hand, specific houses (schoolhouse, smoke house, tavern) were encoded in WN as hyponyms of building rather than hyponyms of house (Fig. 1).

[Figure 1: Part of WN's semantic net for buildings]

[1] Our previous experiment found correct relations for 34 DDs, of which only 18 were in the syn/hyp/mer class. Among these 18, 8 were based on different anchors from the ones we identified manually (for instance, we identified pound -- the currency, whereas our automatic search found sterling -- the currency). The other 16 correct relations resulting from the automatic search were found for DDs which we have ascribed manually to classes other than syn/hyp/mer; for instance, a relation was found for the pair Bach -- the composer, in which the anchor is a name. Also, whereas we identified the pair Koreans -- the population, the search found a WN relation for nation -- the population.

Discourse structure. Another problem found in our first test with WN was the large number of false positives. Ideally, we should have a mechanism for focus tracking to reduce the number of false positives (Sidner, 1979), (Grosz, 1977). We repeated our first experiment using a simpler heuristic: considering only the closest anchor found in a five-sentence window (instead of all possible anchors). By adopting this heuristic we found the correct anchors for 30 DDs (instead of 34) and reduced the number of false positives from 186 to 77.
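The closest-anchor heuristic can be sketched as a backwards scan over the candidate NPs. This is a minimal sketch, assuming the NP heads have already been restricted to the five-sentence window and are given in text order; `closest_anchor`, `toy_has_link` and the toy link table are hypothetical names and data, with the table standing in for a WordNet lookup.

```python
def closest_anchor(dd_head, preceding_np_heads, has_link):
    """Return the nearest preceding NP head linked to the DD, or None.

    `preceding_np_heads` is in text order (oldest first) and is assumed to be
    already restricted to the five-sentence window.
    """
    for np_head in reversed(preceding_np_heads):
        if has_link(np_head, dd_head):
            return np_head
    return None


# Hypothetical link table standing in for a WordNet lookup.
TOY_LINKS = {
    ("pound", "currency"),
    ("sterling", "currency"),
    ("nation", "population"),
}


def toy_has_link(np_head, dd_head):
    """Report whether the toy table records a link for this pair."""
    return (np_head, dd_head) in TOY_LINKS
```

With the NP heads `["pound", "bank", "sterling"]` preceding the DD the currency, `closest_anchor("currency", ["pound", "bank", "sterling"], toy_has_link)` returns `sterling`: the heuristic proposes at most one anchor per DD, so it cannot produce several false positives for the same description, but it can also pass over an earlier anchor.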

4 Future work

We are currently working on a revised version of the system that takes the problems just discussed into account. A few names are available in WN, such as famous people, countries, cities and languages. For other names, if we can infer their entity type we could resolve them using WN. Entity types can be identified by complements like Mr., Co., Inc., etc. An initial implementation of this idea resulted in the resolution of 53% (26/49) of the cases based on names. Some relations are not found in WN, for instance, Mr Morishita (type person) -- the 57 year-old. To process DDs based on events we could try first to transform verbs into their nominalisations, and then look for a relation between nouns in a semantic net. Some rule-based heuristics or a stochastic method are required to 'guess' the form of a nominalisation. We propose to use WN's morphology component as a stemmer, and to augment the verbal stems with the most common suffixes for nominalisations, like -ment, -ion. In our corpus, 16% (7/43) of the cases based on events are direct nominalisations (for instance, changes were proposed -- the proposals), and another 16% were based on semantic relations holding between nouns and verbs (such as borrowed -- the loan). The other 29 cases (68%) of DDs based on events require inference reasoning based on the compositional meaning of the phrases (as in It went looking for a partner -- the prospect); these cases are out of reach just now, as well as the cases listed under "discourse topic" and "inference". We still have to look in more detail at compound nouns.
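The two proposals for names and events can be illustrated with a short sketch. It is not the implementation referred to above: the complement table, the crude suffix-stripping stemmer (used here in place of WN's morphology component), and all function names are assumptions made for illustration. The suffixes -ment and -ion come from the text; -al is an added assumption so that the example proposed -- the proposals is covered.

```python
# Hypothetical mapping from name complements to entity types.
COMPLEMENT_TYPES = {"Mr": "person", "Mrs": "person", "Co": "company", "Inc": "company"}

# -ment and -ion come from the text; -al is an added assumption.
NOMINAL_SUFFIXES = ("ment", "ion", "al")


def infer_entity_type(name):
    """Guess an entity type from a complement such as Mr., Co. or Inc."""
    for token in name.split():
        entity_type = COMPLEMENT_TYPES.get(token.rstrip("."))
        if entity_type:
            return entity_type
    return None


def crude_stem(verb):
    """Very rough stand-in for a real stemmer: strip one inflectional ending."""
    for ending in ("ing", "ed"):
        if verb.endswith(ending) and len(verb) > len(ending) + 2:
            return verb[: -len(ending)]
    if verb.endswith("s") and not verb.endswith("ss"):
        return verb[:-1]
    return verb


def nominalisation_candidates(verb):
    """Guess possible nominalisations by attaching common suffixes to the stem."""
    stem = crude_stem(verb.lower())
    return {stem + suffix for suffix in NOMINAL_SUFFIXES}
```

Here `infer_entity_type("Pinkerton's Inc")` yields `company`, which could then be looked up in the semantic net in place of the name itself, and `nominalisation_candidates("proposed")` includes `proposal`. Most generated candidates are not real words, so in practice each candidate would have to be checked against the lexicon before searching for a relation with the head noun of the DD.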

References

Carter, D. M. 1987. Interpreting Anaphors in Natural Language Texts. Ellis Horwood, Chichester, UK.

Chinchor, N. A. and B. Sundheim. 1995. Message Understanding Conference (MUC) tests of discourse processing. In Proc. AAAI SS on Empirical Methods in Discourse Interpretation and Generation, pages 21-26, Stanford.

Clark, H. H. 1977. Bridging. In Johnson-Laird and Wason, eds., Thinking: Readings in Cognitive Science. Cambridge University Press, Cambridge.

Clark, H. H. and C. R. Marshall. 1981. Definite reference and mutual knowledge. In Joshi, Webber and Sag, eds., Elements of Discourse Understanding. Cambridge University Press, Cambridge.

Fraurud, K. 1990. Definiteness and the Processing of Noun Phrases in Natural Discourse. Journal of Semantics, 7, pages 395-433.

Grosz, B. J. 1977. The Representation and Use of Focus in Dialogue Understanding. Ph.D. thesis, Stanford University.

Hawkins, J. A. 1978. Definiteness and Indefiniteness. Croom Helm, London.

Heim, I. 1982. The Semantics of Definite and Indefinite Noun Phrases. Ph.D. thesis, University of Massachusetts at Amherst.

Miller, G. et al. 1993. Five papers in WordNet. Technical Report CSL Report 43, Cognitive Science Laboratory, Princeton University.

Poesio, M. and R. Vieira. 1997. A corpus-based investigation of definite description use. Manuscript, Centre for Cognitive Science, University of Edinburgh.

Prince, E. 1981. Toward a taxonomy of given/new information. In Cole, ed., Radical Pragmatics. Academic Press, New York, pages 223-255.

Prince, E. 1992. The ZPG letter: subjects, definiteness, and information-status. In Thompson and Mann, eds., Discourse description: diverse analyses of a fund raising text. Benjamins, Amsterdam, pages 295-325.

Sidner, C. L. 1979. Towards a computational theory of definite anaphora comprehension in English discourse. Ph.D. thesis, MIT.

Strand, K. 1997. A Taxonomy of Linking Relations. Journal of Semantics, forthcoming.

Vieira, R. and M. Poesio. 1997. Corpus-based processing of definite descriptions. In Botley and McEnery, eds., Corpus-based and computational approaches to anaphora. UCL Press, London.
