
Scientific report: "Evaluation of Importance of Sentences Based on Connectivity to Title"


DOCUMENT INFORMATION

Basic information

Title: Evaluation of Importance of Sentences Based on Connectivity to Title
Authors: Takehiko Yoshimi, Toshiyuki Okunishi, Takahiro Yamaji, Yoji Fukumochi
Institution: Software Business Development Center, SHARP Corporation
Field: Software Development
Type: Scientific report
City: Nara
Pages: 5
File size: 419.88 KB


Contents


Evaluation of Importance of Sentences based on Connectivity to Title

Takehiko Yoshimi and Toshiyuki Okunishi

Takahiro Yamaji and Yoji Fukumochi

Software Business Development Center, SHARP Corporation

492 Minosho-cho, Yamatokoriyama, Nara, Japan

Abstract

This paper proposes a method of selecting important sentences from a text by evaluating the connectivity between sentences using surface information. We assume that the title of a text is the most concise statement expressing the most essential information of the text, and that the more closely a sentence relates to an important sentence, the more important that sentence is. The importance of a sentence is defined as the connectivity between the sentence and the title. The connectivity between two sentences is measured based on coreference between a pronoun and a preceding (pro)noun, and on lexical cohesion between lexical items. In an experiment with 80 English texts averaging 29.0 sentences each, the proposed method achieved a recall of 78.2% and a precision of 57.7% at a selection ratio of 25%. These recall and precision values surpass those achieved by conventional methods, which indicates that our method is more effective in abridging relatively short texts.
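The idea stated in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: connectivity is approximated by plain content-word overlap (coreference resolution is omitted entirely), a sentence inherits importance from whichever preceding sentence, or the title, connects to it best, and the stopword list and the names `content_words`, `connectivity`, `importance_scores`, and `select_important` are all hypothetical. It is not a reproduction of the authors' formulas.

```python
import re

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = frozenset({"a", "an", "the", "of", "to", "in", "on", "and",
                       "is", "are", "was", "for", "by", "with", "at"})


def content_words(sentence):
    """Return the set of lowercased alphabetic tokens minus stopwords."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in tokens if w not in STOPWORDS}


def connectivity(earlier, later):
    """Hypothetical stand-in for lexical cohesion: the share of the later
    sentence's content words that also occur in the earlier sentence."""
    if not later:
        return 0.0
    return len(earlier & later) / len(later)


def importance_scores(title, sentences):
    """Score each sentence by its strongest link to the title, either
    directly or through an already-scored preceding sentence."""
    words = [content_words(title)] + [content_words(s) for s in sentences]
    scores = [1.0]  # the title is treated as maximally important
    for i in range(1, len(words)):
        scores.append(max(scores[j] * connectivity(words[j], words[i])
                          for j in range(i)))
    return scores[1:]  # drop the title's own score


def select_important(title, sentences, ratio=0.25):
    """Return indices of the top-scoring sentences, in document order."""
    scores = importance_scores(title, sentences)
    k = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])
```

Calling `select_important(title, sentences)` with the default `ratio=0.25` mirrors the 25% selection ratio mentioned in the abstract. Multiplying scores along a chain means a sentence linked to the title only indirectly can never outrank the sentence it is linked through, which is one simple way to express "the closer a sentence relates to an important sentence, the more important it is".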

[The Japanese body text is not recoverable from this extraction. The opening passage cites Luhn (1958), Edmundson (1969), Salton et al. (1994), Brandow et al. (1995), Ono et al. (1994), Hoey (1991), Collier (1994), and Halliday and Hasan (1976), together with several Japanese-language works dated 1987 to 1997 whose author names are garbled. It leads into Sections 2 and 2.1, whose headings are also lost.]


[Figure 1: caption not recoverable. The surviving labels are $2 and $4 (probably S2 and S4) and the sets {A,B,C}, {A,D,E}, {B,C,G}, {A,D,E,F}, and {D,H}.]

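[Equation (1): not recoverable from this extraction; the only surviving fragment is an index condition printed as "i<3".]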

2.2 [Heading and text not recoverable. The passage contains Equation (2), of which only the subscripted symbols M and S survive.]


2.2.1 [Heading not recoverable.]

2.2.2 [Heading and text not recoverable. The surviving English examples contrast "put pressure on" with "put", and "cabinet meeting" with "meeting".]


2.2.3 [Heading and text not recoverable. The passage cites Edmundson (1969), a Japanese-language work from 1989, and Watanabe (1996).]

2.2.4 [Heading and text not recoverable. The passage cites Givon (1979) and a Japanese-language work from 1985, and mentions the fraction 1/4.]

3 [Heading and text not recoverable. Surviving fragments mention a figure of 17.9%, the labels A, B, and C, and a ratio of 25%.]
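The abstract reports recall, precision, and a selection ratio. As a hedged illustration, the sketch below computes these three quantities using their standard definitions, given the sentence indices a method selected and a reference set of important sentences. How the reference sets were built in the paper's experiment is not recoverable here, and the function name `evaluate_selection` is hypothetical.

```python
def evaluate_selection(selected, reference, total_sentences):
    """Return (recall, precision, selection_ratio) for one text.

    selected        -- indices of sentences chosen by the method
    reference       -- indices of sentences judged important (assumed given)
    total_sentences -- number of sentences in the text
    """
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)  # correctly selected sentences
    recall = hits / len(reference) if reference else 0.0
    precision = hits / len(selected) if selected else 0.0
    selection_ratio = len(selected) / total_sentences if total_sentences else 0.0
    return recall, precision, selection_ratio
```

For a 16-sentence text, `evaluate_selection([0, 2, 5, 7], [0, 2, 3], 16)` selects four sentences (a ratio of 0.25), two of which are in the three-sentence reference set. Fixing the selection ratio, as the abstract does at 25%, keeps recall and precision comparable across methods, since selecting more sentences raises recall at the expense of precision.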


[Figure 2: caption and axis names not recoverable.]

     Recall   Precision   Selection ratio
A    72.3%    52.6%       26%
B    61.7%    39.5%       29%
C    61.4%    40.9%       29%
D    57.5%    42.2%       27%

[Text not recoverable. The surviving English examples are the word pairs "shooting" and "gunfire", and "announce" and "announcement", the latter in connection with a base form.]


4 [Heading and text not recoverable. The passage restates the results of 78.2% and 57.7% and cites Hearst (1997).]

References

R. Brandow, K. Mitze, and L. F. Rau. 1995. Automatic Condensation of Electronic Publications by Sentence Selection. Information Processing & Management, 31(5):675-685.


A. Collier. 1994. A System for Automating Concordance Line Selection. In Proceedings of NeMLaP, pages 95-100.

H. P. Edmundson. 1969. New Methods in Automatic Extracting. Journal of the ACM, 16(2):264-285.

T. Givon. 1979. From Discourse to Syntax: Grammar as a Processing Strategy. In T. Givon, editor, Discourse and Syntax, pages 81-112. Academic Press.

M. A. K. Halliday and R. Hasan. 1976. Cohesion in English. Longman.

M. A. Hearst. 1997. TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages. Computational Linguistics, 23(1):33-64.

M. Hoey. 1991. Patterns of Lexis in Text. Describing English Language. Oxford University Press.

H. P. Luhn. 1958. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 2(2):159-165.

K. Ono, K. Sumita, and S. Miike. 1994. Abstract Generation based on Rhetorical Structure Extraction. In Proceedings of COLING, pages 344-348.

G. Salton, J. Allan, C. Buckley, and A. Singhal. 1994.

K. Zechner. 1996. Fast Generation of Abstracts from General Domain Text Corpora by Extracting Relevant Sentences. In Proceedings of COLING, pages 986-989.

[Several Japanese-language references dated 1985 to 1997 are not recoverable from this extraction. Surviving fragments include the report numbers NLC89-40 (1989) and NL63-6 (1987), the volume and page references 36(10):2371-2379 (1995), 2(1):39-55 (1995), and 36(8):1838-1844 (1995), and the name GREEN.]

Date posted: 23/03/2014, 19:20
