Evaluation of Importance of Sentences based on Connectivity to Title
Takehiko Yoshimi and Toshiyuki Okunishi
Takahiro Yamaji and Yoji Fukumochi
Software Business Development Center, SHARP Corporation
492 Minosho-cho Yamatokoriyama Nara, Japan
Abstract
This paper proposes a method of selecting important sentences from a text by evaluating the connectivity between sentences using surface information. We assume that the title of a text is the most concise statement of its most essential information, and that the more closely a sentence relates to an important sentence, the more important that sentence is. The importance of a sentence is defined as the connectivity between the sentence and the title. The connectivity between two sentences is measured on the basis of coreference between a pronoun and a preceding (pro)noun, and of lexical cohesion between lexical items. In an experiment with 80 English texts averaging 29.0 sentences each, the proposed method achieved a recall of 78.2% and a precision of 57.7% at a selection ratio of 25%. These recall and precision values surpass those achieved by conventional methods, which indicates that our method is more effective in abridging relatively short texts.
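The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper. It assumes that connectivity is approximated by content-word overlap alone (coreference is left out), that the title has importance 1.0, and that a sentence inherits the best product of a preceding sentence's importance and its connectivity to that sentence. The stopword list and the function names `content_words`, `connectivity`, `importance_scores` and `select` are hypothetical.

```python
import math
import re

# Hypothetical stopword list, used only for this illustration.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and",
             "is", "was", "were", "that", "after", "by", "with"}


def content_words(sentence):
    """Lowercased alphabetic tokens of the sentence, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS}


def connectivity(a, b):
    """Assumed measure: shared content words over the smaller word set."""
    wa, wb = content_words(a), content_words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))


def importance_scores(title, sentences):
    """Propagate importance from the title (fixed at 1.0) through the text.

    Assumption: a sentence takes the maximum, over the title and all
    preceding sentences, of (importance of that unit) * (connectivity to it).
    """
    prior = [(title, 1.0)]
    scores = []
    for s in sentences:
        score = max(imp * connectivity(prev, s) for prev, imp in prior)
        scores.append(score)
        prior.append((s, score))
    return scores


def select(title, sentences, ratio=0.25):
    """Indices of the top-scoring sentences, returned in text order."""
    scores = importance_scores(title, sentences)
    k = max(1, math.ceil(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


title = "Gunfire near the cabinet meeting"
sentences = [
    "Police reported gunfire outside the building.",
    "The weather was mild that morning.",
    "The cabinet meeting was postponed after the gunfire.",
    "Local shops stayed open.",
]
print(select(title, sentences, ratio=0.25))
```

With a selection ratio of 25%, one of the four toy sentences is kept. Ties are broken in favor of earlier sentences, which is a design choice of this sketch and not something stated in the paper.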
(Luhn, 1958; Edmundson, 1969; i ~ I ~ I ~ , 1987;
l~)g~)~ and ~ I ~ , 1988; r ~ ] i ~ l ~ et al., 1989;
Salton et al., 1994; Brandow et al., 1995; ~ ' 2 ~ t Z ~
~ t S ~ (F,~i~k~ et al., 1989; Ono et al., 1994) (Hoey, 1991; Collier, 1994; ~ g i ~ - - , 1997; { ~ ;~
- - ~ et al., 1993) 7)~b5 ~ & i ~ - ~ t , ¢ < ' ~ - ~ ot,¢7)~ 9 ~)~b;5 (Halliday and Hasan, 1976) 7)~, ~
2 2.1 ff-#Y, I- ~Ili~ J- R:q) ~_~l~ I= ~ ~ 7~ {K~
[Figure 1: nodes such as S2 and S4, labeled with the sets {A,B,C}, {A,D,E}, {B,C,G}, {A,D,E,F}, {D,H}]
~
i<3
(1)
2.2 ~ a ~ ~9 ~ 1 ~ o~ ~ f f i
.~.~ ~ © ~ - e ~ : , #o~ ~ - - , ~ Iz t ~ t ~ - c ~, ~
I~, ~ - ~ , ) , ~ f ~ , .~-~, ~ - ~ , ~ l ~ ©
~ , - ~ - e ~ ~ ~ L.,= e~_~°, t ~ ~
& e S ~ ¢ ) ~ t ~ = M~,I (e)
y_,y_z2, Mi, i ~ S i ~ © ~ ~ t ~ S ~
(2) ~ 9 , ~ ~ t ~ 2.2A ~ _ / ~ - ~ - t 6 _ - - 9 ¢ ~
2 2 1 A , ~ ~ ( ~ ) :~O)~,,~o)~t~
2 2 2 ~ @ ~ ~ " ~ ~/J¢ ~.J ~ ~
t o ~ = ~ d ' h ~ t ~ , ff~J~_[~, "put pressure on"
"put" [ : t ~ ' - - ~ W , "cabinet meeting" ~ "meet- ing" g~ ~ - - ~ k ' ~ , ~ , ~ h ~ - 6
2.2.3 ~ - , , 0) ~ : g ~ {-~
:k:-~ t ~ { [ ~ _ ~ (Edmundson, 1969; P , ~ 8 , ~ et al.,
1989; Watanabe, 1996) © ~ ) { ~ ' ~ b T o ~g~'~t±,
t 1
b, ~ ~ I : w : 5 ~ b t :
2.2.4 ~ - ~ : a ) ~ a ) - - 9 ~ 9
(Givon, 1979) ~ o ~ : , ~ Sj ¢ ) ~ Y ~ S ~ ~ e ) ~ t Z ] ~
( t ~ , 1985)~L ~ T ~ , ~ e { £ ~ $ < ,
~ X ~ ¢ ) 1/4 ~-F¢~
3
¢ ) ~ © 17.9% ~ b o ~:
t
F
~ S ¢ ) ~ _ ~ = - -
N
- : z x ~ ~ A, ~ ~ , ~ o © - : z : ~ - ~ B, C,
25% ~ LT: ~ 2 1:-3:~l~, - - ~ - - + Y i = ~ J ~
[Figure 2]
        Recall   Precision   Selection ratio
A       72.3%    52.6%       26%
B       61.7%    39.5%       29%
C       61.4%    40.9%       29%
D       57.5%    42.2%       27%
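Recall and precision for a set of selected sentences can be computed as in the following sketch. It assumes the usual definitions against a reference set of sentences judged important by hand; the exact evaluation procedure used in the experiment is not restated here, and the function name `recall_precision` is hypothetical.

```python
def recall_precision(selected, reference):
    """Recall and precision of selected sentence indices.

    Assumed definitions:
      recall    = |selected and reference| / |reference|
      precision = |selected and reference| / |selected|
    """
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    recall = hits / len(reference) if reference else 0.0
    precision = hits / len(selected) if selected else 0.0
    return recall, precision


# Toy data: four selected sentences, three reference sentences, two in common.
recall, precision = recall_precision([0, 2, 5, 7], [0, 2, 3])
print(f"recall={recall:.3f} precision={precision:.3f}")
```

Because a higher selection ratio tends to raise recall and lower precision, figures for different systems are easiest to compare when their selection ratios are close.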
~ g / ~ o t z = k - ~ 5 ~za~-~x b-~l~, "shooting"
"gunfire" © ~ 1 ~ , ~ - ~ - ~ t ; ~ z ~ , "gun-
fire" ~ Z E ~ t ~ g ' ~ ~ l z ~o ot~7]~ ~ t~!, , ~ 7%
~ # © ( ~ b ~ : ~ (base) © ~ J ~ =~ ~ t ~ ~_tff,
nounce" ~ "announcement" t~:, ~ ~ L ~ b~"
t$5
4 ~ 3 ~ ~) l :
t : - - ~ b t z ~ -~, ~ 78.2%, ~ # ~ 57.7% © ~
~ 6 : k ~ , U~=
RIU (Hearst, 1997), ~r+)-:~ " b l:°y~ ' = ' ~ { : - ~ - T - ~
References
R. Brandow, K. Mitze, and L. F. Rau. 1995. Automatic Condensation of Electronic Publications by Sentence Selection. Information Processing & Management, 31(5):675-685.
[Figure: plot with a vertical axis from 0 to 100 (%) and a horizontal axis in (%)]
A. Collier. 1994. A System for Automating Concordance Line Selection. In Proceedings of NeMLaP, pages 95-100.
H. P. Edmundson. 1969. New Methods in Automatic Extracting. Journal of the ACM, 16(2):264-285.
T. Givon. 1979. From Discourse to Syntax: Grammar as a Processing Strategy. In T. Givon, editor, Discourse and Syntax, pages 81-112. Academic Press.
M. A. K. Halliday and R. Hasan. 1976. Cohesion in English. Longman.
M. A. Hearst. 1997. TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages. Computational Linguistics, 23(1):33-64.
M. Hoey. 1991. Patterns of Lexis in Text. Describing English Language. Oxford University Press.
H. P. Luhn. 1958. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 2(2):159-165.
K. Ono, K. Sumita, and S. Miike. 1994. Abstract Generation based on Rhetorical Structure Extraction. In Proceedings of COLING, pages 344-348.
G. Salton, J. Allan, C. Buckley, and A. Singhal.
K. Zechner. 1996. Fast Generation of Abstracts from General Domain Text Corpora by Extracting Relevant Sentences. In Proceedings of COLING, pages 986-989.
r , ~ ] i ¢ ~ , ~ i / ~ , and ~ I ~ - 1989 ~P)~3~cP~')~]~t~
~I:-"p~'~ NLC89-40, ~ - T - ~ { ~
~ I ~ B 1987 ~ ~ t ~ - : , ' : x f f - 2 ~ NL63-
6, ' 1 ~ ¢ ~ ~
~ ' ~ - - ~ , ~ , and P ~ J ~ 1993 ~ 3 ~
~
~J~J~Fq, ~ , and ~ 1 ~ - - 1995 ~-Y-=~ - : ~ © ~ ' 4 ~ - ~ ~ - ~ ~ ~ : ~ , 36(10):2371-2379
~ g ~ n ~ : , ~ 1 ~ , and I ~ - - 1995 ~ t ~
~ - ~ I z $ ~ J ~ L ~ ' : ~ ~ - : / ~ ff-.h GREEN
~ J ~ , 2(1):39-55
~'2~1:1:~ and ~<;~gl~ 1995 ~ f f l / ¢ ~ - - : / © ~
~ ~ ~ , 36(8):1838-1844
{ ~ 1997 ~ L ~ $ ~ J ~ b t : ~ [ ~ • t ~ - - b ~
~
~ 1985 * ~ © ~ t : k ~
~2g?~ 1997 3 ~ © ~ # ~ t : - ~ - ' - J < I ~ 1 ~ In