
EVALUATION OF NATURAL LANGUAGE INTERFACES TO DATA BASE SYSTEMS


Bozena Henisz Thompson California Institute of Technology

INTRODUCTION

Is evaluation, like beauty, in the eye of the beholder?

The answer is far from simple because it depends on who is considered to be the proper beholder. Evaluators may range from casual users to society as a whole, with system builders, sophisticated users, linguists, grant providers, system buyers, and others in between. The members of this panel are system builders and linguists (or rather the two fused into one) but, I believe, interested in all or almost all actual or potential bodies of evaluators. One of our colleagues expressed a forceful opinion while being a member of a similar panel at last year's ACL conference: "Those of us on this panel and other researchers in the field simply don't have the right to determine whether a system is practical. Only the users of such a system can make that determination. Only a user can decide whether the [natural language] capability constitutes sufficient added value to be deemed practical. Only a user can decide if the system's frequency of inappropriate response is sufficiently low to be deemed practical. Only a user can decide whether the overall NL interaction, taken in toto, offers enough benefits over alternative formal interactions to be deemed practical" [1].

It is hard for me to disagree, since I argued as forcefully on the basis of my study of users' evaluation of machine translation [2], a study which was prompted by the evaluations of the quality of machine translation as viewed by linguists and users, ranging from 35% acceptable for the former to 90% for the latter. What the study also showed was that the practicality of the output could indeed only be judged by the users, since even incomplete and stylistically very inelegant translations were found quite useful in practice: on the one hand, they provided, however crudely, the information sought by the users, and, on the other hand, the users themselves brought knowledge that made the texts far more understandable and useful than might appear to a nonspecialist linguist. But this endorsement on my part of the user as the ultimate judge in evaluations does not preclude my fully subscribing to Norm Sondheimer's [3] introductory comments to this panel, stating that to "make progress as a field, we need to be able to evaluate." We are now less likely to confuse the evaluation by people like ourselves with the judgment of the users, less likely to be surprised at the discrepancies, and less likely to be surprised at the users.

Also, we are far more aware of the fact that evaluations of "worth" or "quality" have to be conducted in the context of actual, perceived needs. In extensive studies on the evaluation of innovations, Mosteller [4], the recently retired president of AAAS, found that "successful innovators better understand user needs; [and] pay more attention to marketing." The same source, however, leads me to the notorious difficulties of evaluation, given the wide range of evaluators and their purposes. We are all undoubtedly convinced of the value of NLI for society as a whole, but the evaluation of experiments with these interfaces is another matter.

Mosteller was dealing with the social, sociomedical, and medical fields. Let me recount some of the studies he and his team made, for reasons which will soon become obvious. His team scored a given program on a scale from plus two to minus two, with zero meaning there was essentially no gain. Accordingly, a study of delinquent girls that identified [...] but failed to prevent them from delinquency received a zero. Likewise, a zero was [...] used: (1) no treatment, (2) an alcoholic clinic, and (3) Alcoholics Anonymous. Since the "no treatment" group performed somewhat better, short-term referrals were considered of no value. A minus one was given to a study whose results were opposite to those hoped for: a major insurance company increased outpatient benefits in the hope of decreasing hospital costs, but the outpatient group's hospital stays increased. Finally, a double plus was awarded to an experiment involving the Salk vaccine, which was, predictably, very successful. Now, this kind of evaluation may be justified when the needs of society are at stake. I have gone into these details, however, for the purpose of expressing the opinion, in which I know I'm not alone, that negative results are as important as positive ones, and that evaluation in our case is almost equivalent to the amount of information obtained in an experiment. An experiment whose results would be totally predictable would be almost useless, but one with results different from those hoped for might be embarrassing but very valuable.
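One way to make the link between an experiment's value and "the amount of information obtained" concrete is Shannon's self-information: an outcome assigned probability p in advance carries log2(1/p) bits. The measure and the function name below are assumptions brought in for illustration; the paper does not commit to any formula. A minimal Python sketch:

```python
import math


def surprisal_bits(p: float) -> float:
    """Self-information, in bits, of an outcome assigned prior probability p.

    Shannon's measure, used here only to illustrate the claim that a fully
    predictable result tells the experimenter nothing new.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return math.log2(1.0 / p)


if __name__ == "__main__":
    # A result everyone expected (p = 1) carries no information.
    print(surprisal_bits(1.0))                # 0.0
    # A very likely, hoped-for result carries little...
    print(f"{surprisal_bits(0.9):.2f} bits")  # 0.15 bits
    # ...while an unexpected (perhaps embarrassing) result carries much more.
    print(f"{surprisal_bits(0.1):.2f} bits")  # 3.32 bits
```

On this reading, a result that was certain in advance scores zero, matching the remark that a totally predictable experiment would be almost useless, while the least expected results score highest.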

Another comment prompted by those evaluations is that [...] inappropriate in the case of NLI evaluations.

NLI EVALUATIONS

A METHODOLOGY AND SOME RESULTS

It had been widely taken for granted some time ago that an NLI is as good as its grammar, and a grammar is as good as it is extensive. The specific needs of users, the requirements of special tasks, and the like took a back seat. The nature of human discourse was yet to be explored. Happily, we have been in a different situation for some time. When the REL [5, 6, 7] system was getting into reasonably sturdy shape with respect to speed and bugs, I started planning experiments to test it. There was important literature about discourse, especially in sociology, such as the work of Schegloff. It was thus clear that successful NLI experiments had to be based on knowledge of human discourse. It was also clear that this was the way to make the interface more natural. This assumption has already been fruitful: the NL interface in POL [9], a successor to REL, has already been extensively improved as a result of the REL-related experiments.

Experiments were made in three modes: in addition to face-to-face (F-F) and human-to-computer (H-C), terminal-to-terminal (T-T) communication was examined, since at present that is the only practical mode of accessing the computer. Through early 1980, over 80 subjects, 80,000 words, and over 50 hours were analyzed in great detail. In the fall of 1980, another 13 subjects were tested in the computational mode only, adding approximately 20 hours. From the start, the experiments were encouraging, although limited to two modes: F-F and T-T. Interactions not only showed a great deal of structure but also extensive similarities in both modes, the most important being the constancy of the number of words in sentences (about 70%); the length of sentences (about 7 words); the existence of fragments (70% of messages in F-F and 50% in T-T containing them); and phatics (10% of total for [...]

[...] modes were a candidate for consideration in experiments in the computational mode, the T-T mode being seemingly quite far removed from natural F-F. The sentence having historically been the unit of analysis (and since phatics were considered of lesser importance from the computational view, although of great interest in general), my attention turned to fragments. REL allowed for three non-sentence type structures: "NP?" (including number parsed into NP); "all/none or number" answers; and [...]


[...]sible to include individual knowledge and terminology.

The analysis of F-F and T-T protocols, however, showed the existence of other fragment categories, finally analyzed into a dozen categories (see [8]). Since they constitute a considerable amount of F-F conversations and even of T-T protocols, they clearly had to be watched for in computational experiments.

The experiments for actually observing user-system interaction were conducted in the winter term of 1979/80 and produced 21 protocols, the analysis of which was compared with the results of eight F-F and four T-T experiments. Another 13 computational experiments done in the fall confirmed the results of the earlier ones. The task in all three modes was a real one: loading cargo onto a ship, the data coming from the actual environment of loading U.S. Navy ships by a group in San Diego, California. In the F-F and T-T experiments, two persons were involved, one given the cargo items to be loaded, the other information about decks (details in [8]). In the computational mode (H-C), the ship data was in the computer and the list of cargo to be loaded was handed to the subjects, all with a Caltech background. Since details are available elsewhere and space here is limited, only some major results are given. Table 1 shows the comparison of the three modes.

TABLE 1

                               F-F       T-T       H-C
Sentence length                6.8       6.1       7.8
Fragment length                2.7       2.8       2.8
% words in sentences           68.8      72.8      89.3
% words in fragments           17.2      21.1      10.7

                               Total  Avg.   Total  Avg.   Total  Avg.
Parsed & nonparsed                                         1615   77
Sentences                      5302   663    385    77     882    42
Phatics (including
  connectors & tags)           4842   605    148    37     46     2

                               Total         Total         Total
Words in messages              49800         3285          8525
Words in sentences             34266         2393          6880
Words in fragments             8584          694           823

As can be seen, several statistics show similarities: sentence length, message length, fragment length, and the percentage of words in sentences and in fragments. The closeness of the average number of messages in T-T and of parsed and nonparsed inputs in H-C is striking.
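The percentages and averages in Table 1 can be re-derived from the table's own totals, a useful check on a table that survives only in garbled form. The Python sketch below does this; the column labels F-F, T-T and H-C, the constant and function names, and the two candidate denominators are assumptions, since the table does not state its bases. Under these assumptions, the F-F and T-T word percentages follow from dividing by words in messages, whereas the H-C figures (89.3 and 10.7) follow only if the base is words in sentences plus words in fragments (6880 + 823 = 7703); the 8525 total would give 80.7 and 9.7. The "Avg." entries match totals divided by the number of experiments per mode (e.g., 5302/8 is about 663, 1615/21 about 77), except the T-T sentence entry, where 385/4 is about 96 rather than the printed 77, so that cell may be garbled in this copy.

```python
"""Re-derive the Table 1 percentages and averages from the table's own totals."""
from __future__ import annotations

MODES = ("F-F", "T-T", "H-C")

# Totals transcribed from Table 1.
WORDS_IN_MESSAGES = {"F-F": 49800, "T-T": 3285, "H-C": 8525}
WORDS_IN_SENTENCES = {"F-F": 34266, "T-T": 2393, "H-C": 6880}
WORDS_IN_FRAGMENTS = {"F-F": 8584, "T-T": 694, "H-C": 823}

# Experiments per mode, from the text: eight F-F, four T-T, 21 H-C protocols.
EXPERIMENTS = {"F-F": 8, "T-T": 4, "H-C": 21}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as printed in Table 1."""
    return round(100 * part / whole, 1)


def word_shares(mode: str) -> dict[str, tuple[float, float]]:
    """(% words in sentences, % words in fragments) under two candidate bases."""
    s = WORDS_IN_SENTENCES[mode]
    f = WORDS_IN_FRAGMENTS[mode]
    total = WORDS_IN_MESSAGES[mode]
    return {
        "base=words in messages": (pct(s, total), pct(f, total)),
        "base=sentence+fragment words": (pct(s, s + f), pct(f, s + f)),
    }


def per_experiment(total: int, mode: str) -> int:
    """A Table 1 total divided by the number of experiments in that mode."""
    return round(total / EXPERIMENTS[mode])


if __name__ == "__main__":
    for mode in MODES:
        print(mode, word_shares(mode))
    print("F-F sentences per experiment:", per_experiment(5302, "F-F"))
    print("H-C inputs per subject:", per_experiment(1615, "H-C"))
```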

Table 2 (the meaning of abbreviations is given below the table) deals with fragments. It is mostly self-explanatory, as is the absence of definitions from F-F and T-T (although some abbreviations used there fall in this category) and the absence of some other categories from T-T and H-C. At least two comments, however, are necessary. The surprisingly low use of terse questions in H-C may be accounted for by the tendency toward a formal style in computational interaction. The definitions used were often of quite complex character, although far fewer than could be hoped for, due apparently to lack of familiarity with this capability. The complex character of definitions undoubtedly had some effect on the length of sentences in the H-C mode.

TABLE 2

CORE   56   1.7
DEF
       91   37.8
       67   27.8
       30   12.4
       53   22.0

Abbreviations

E (Echo): An exact or partial repetition, usually of the other speaker's string. Often an NP, but it may be an elliptical structure of various forms.

ADD (Added Information): An elliptical structure, often NP, used to clarify or complete a previous utterance, often one's own, e.g., "It doesn't say anything here about weight, or breaking this down. Except for crushables.", "It's smaller 36"x20"x17"." Spelling out words was included here.

CORE (Correction): This may be done by either speaker. If done by the same speaker it is related to a false start, but semantic considerations suggest a correction, e.g., "Those are 30, uh, 48 length by 40 width by 14 height."

COMP (Completion): Completion of the other speaker's utterance, distinguished from interruption by the cooperative nature of the utterance, e.g., "A: I've got a lot of, I've [...] B: two pages A: Yeah."

SELF (Talking to Oneself): Mutterings, even to the point of undecipherability, not intended for the other person.

TR (Terse Reply): An elliptical reply, often NP, e.g., "No.", "Probably meters.", "50 and 7.62."

TQ (Terse Question): An elliptical question, often NP, e.g., "Why?", "How about pyrotechnics?", "Which ones?"

TI (Terse Information): A rather elusive category, neither question, reply, nor command; an elliptical statement, but one often requiring an action.

FS (False Start): These are also abandoned utterances, but immediately followed by usually syntactically and semantically related ones, e.g., "They may, they may be identical classes", "Well, the height, the next largest height I've got is 34."

TRUN (Truncated): An incomplete utterance, voluntarily abandoned.

DEF (Definition): E.g., "Define: ED: each deck of the Alamo."

P (Phatics): The largest subgroup of fragments, whose name is borrowed from Malinowski's term "phatic communion," with which he referred to those vocal utterances that serve to establish social relations rather than the direct purpose of communication. This term has been broadened to include all fragments which help keep the channel of communication open, such as "Well", "Wait", and even "You turkey". Two subcategories of phatics are:

C (Dialogue Connectors): Words such as "Then", "And", "Because" (at the beginning of a message or utterance).

T (Tag Questions): E.g., "They're all under 60, aren't they?"


AND ERROR ANALYSIS

System performance can obviously be evaluated in a number of ways, but without good response time meaningful experiments are impossible. When much data is involved in processing, a delay of a few minutes can probably be tolerated, but the vast majority of requests should be responded to within seconds. The latter was the case in my experiments: fairly complex messages of about 12 words were responded to in about 10 seconds. The system clearly has to be reasonably free of bugs; in my case, 12 bugs were hit in the total of 1615 parsed and nonparsed messages. The adequate extent of natural language syntax is impossible to determine. Table 3 shows the syntax used by my subjects.

[...] suspicion of the computer's limitations.

An interesting fact to note is that similar results with respect to syntax were obtained in the experiments with USL, the "sister system" of REL developed by IBM Heidelberg [10], with German used as the NL in two studies of high school students: a predominance of wh-questions (317 in a total of 451); not many relative clauses (66); commands (35); conjunctions (26); quantifiers (15); definitions (11); comparisons (2); yes/no questions (1).

An evaluation which would not include an analysis of unparsed input would at best be of limited value. It was shown in Table 1 that 1093 out of 1615, or about two thirds, were parsed in my experiments.
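The parse rate behind "about two thirds", the number of unparsed inputs such an analysis has to cover, and the bug rate reported for the same 1615 messages can all be computed directly from the stated counts. A minimal Python sketch; the function and constant names are hypothetical:

```python
def percent(part: int, whole: int) -> float:
    """Share of `whole` that `part` represents, as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts stated in the text for the H-C experiments.
TOTAL_INPUTS = 1615  # parsed and nonparsed messages
PARSED = 1093        # messages the system parsed
BUGS_HIT = 12        # bugs encountered over all inputs

unparsed = TOTAL_INPUTS - PARSED
parse_rate = percent(PARSED, TOTAL_INPUTS)
bug_rate = percent(BUGS_HIT, TOTAL_INPUTS)

print(f"parsed {parse_rate:.1f}%, unparsed inputs to analyze: {unparsed}")
print(f"bugs hit per input: {bug_rate:.2f}%")
# Prints:
# parsed 67.7%, unparsed inputs to analyze: 522
# bugs hit per input: 0.74%
```

The 522 unparsed inputs are the material an error analysis has to account for; reporting only the 67.7% would hide them.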

TABLE 3. SENTENCE TYPES

Sentence type                                                              Total
All sentences                                                                882
Simple sentences, e.g., "List the decks [...]"                               651
Sentences with pronouns, e.g., "What is its length?", "What is in its pyro-[...]"
Sentences with quantifier(s), e.g., [...]
Sentences with conjunctions, e.g., "What is the maximum stow height and bale cube of the pyrotechnic locker of the [...]"
Sentences with quantifier and conjunction(s), e.g., "List hatch width and hatch length of each deck of the Alamo."    13    2.6
Sentences with relative clause, e.g., [...]
Sentences with relative clause (or related construction) and comparator, e.g., "List the ships with a beam less [...]"
Sentences with quantifier and relative clause, e.g., "List height of each [...]"
Sentences with quantifier, conjunction and relative clause, e.g., "List length, width and height of each content whose [...]"
Sentences with quantifiers and comparator, e.g., "How many ships have a beam greater [...]"
Statements (data addition)                                                     5    0

Considering the wide range of REL syntax [7], the paucity of complex sentences is surprising. The use of definitions, which often involved complex constructions (relative clauses, conjunctions, even quantifiers), had a definite influence. So, undoubtedly, did the task situation, which caused optimization of work methods. The influence of the specific nature of the task would require additional studies, but the special device provided by the system (a loading prompt sequence, which was not analyzed) was employed by every subject. Devices such as these obviously are a great aid in accomplishing tasks. They should be tested extensively to determine how they can augment the naturalness of NLIs. Other reasons for the relatively simple syntax used were special strategies: paraphrasing into simpler syntax even though a sentence did not parse for other reasons; a "success strategy" resulting in repetitious simple [...]

TABLE 4

Table 4 summarizes the categories of errors. The predominance of vocabulary errors is not surprising, but the relatively small number of syntactic errors is. In part this may be due to the method of scoring, in which errors were counted only once: if a sentence contained an unknown vocabulary item (e.g., "On what decks of the Alamo cargo be stored?") but would have failed on syntactic grounds as well, it would fall in the vocabulary category. A comparison can be made here with Damerau's study [11] of the use of the TQA system by the city planning department in White Plains, at least with regard to the ratio of total queries to those completed: 788 to 513. So, again, roughly two thirds were parsed. In other categories, "parsing failure" is 147, "lookup failures" 119, "nothing in data base" 61, and "program error" 39, but this only points to the general difficulties of comparing system performance.
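The single-count scoring rule can be sketched as a precedence list: each failed input is assigned to the first category, in a fixed order, among the reasons it would fail. Only "vocabulary before syntax" comes from the text; the rest of the order, the category names, and the functions are hypothetical. Under such a rule, syntactic problems are undercounted whenever they co-occur with unknown words, which is the effect suggested above. It also shows why tallies from different systems are hard to compare: Damerau's four categories sum to 366 (147 + 119 + 61 + 39), more than the 275 queries (788 - 513) that were not completed, so they cannot be a one-per-query count of the uncompleted queries alone; the source does not say how they were assigned.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical precedence order. The text states only that a vocabulary
# failure takes precedence over a syntactic one; "other" is a placeholder.
PRECEDENCE = ("vocabulary", "syntax", "other")


def score(reasons: set[str]) -> str:
    """Return the single category under which a failed input is counted."""
    for category in PRECEDENCE:
        if category in reasons:
            return category
    raise ValueError(f"no known failure category in {reasons!r}")


def tally(failed_inputs: list[set[str]]) -> Counter[str]:
    """Count each failed input exactly once, by its highest-precedence reason."""
    return Counter(score(reasons) for reasons in failed_inputs)


if __name__ == "__main__":
    # Invented inputs: the second would also fail on syntax,
    # but it is counted only under vocabulary.
    failures = [{"vocabulary"}, {"vocabulary", "syntax"}, {"syntax"}]
    print(tally(failures))  # Counter({'vocabulary': 2, 'syntax': 1})
```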

SOME CONCLUSIONS

Norm Sondheimer suggested some questions we might try to answer. What has been learned about user needs? What are the most important linguistic phenomena to allow for? What other kinds of interactions? Error analysis points in the obvious directions of user needs, and so do the types of sentences employed. While it is justified to quit the search for an almost perfect grammar, it would be a mistake to constrain it to the constructions used. Improved naturalness can be achieved with diagnostics, definitions, and devices geared to specific tasks such [...]

[...] objective measurement is probably impossible, but the percentage of requests processed might give some idea. In the case of a task situation such as loading cargo items, the percentage of task completion may signal both [...]

[...] response times are a very important measure. The questionnaire method can be and has been used (in the case of MT and USL), but as yet there is too little experience to measure user satisfaction. Users seem very good at adapting to systems. They paraphrase, use the success strategy, simplify syntax, and use special devices; what they really do is maximize their performance with respect to a given task.
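The measures suggested here (percentage of requests processed, percentage of task completion, response times), together with the criterion of whether a user returns to the system, can be computed from per-session records. The sketch below assumes a hypothetical record layout; the field names, the pooling of counts across sessions, and the use of the median response time are illustrative choices, not the paper's.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class Session:
    """Hypothetical record of one subject's session with an NL interface."""
    requests: int                  # inputs the user typed
    processed: int                 # inputs the system handled
    items_to_load: int             # size of the task, e.g. cargo items listed
    items_loaded: int              # items placed by the end of the session
    response_times_s: list[float]  # per-request response times, in seconds
    returned_later: bool           # did the user come back to the system?


def summarize(sessions: list[Session]) -> dict[str, float]:
    """Pool counts over all sessions (sum of parts / sum of wholes)."""
    requests = sum(s.requests for s in sessions)
    processed = sum(s.processed for s in sessions)
    to_load = sum(s.items_to_load for s in sessions)
    loaded = sum(s.items_loaded for s in sessions)
    times = [t for s in sessions for t in s.response_times_s]
    returned = sum(1 for s in sessions if s.returned_later)
    return {
        "requests_processed_pct": 100.0 * processed / requests,
        "task_completion_pct": 100.0 * loaded / to_load,
        "median_response_s": median(times),
        "return_rate_pct": 100.0 * returned / len(sessions),
    }


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    demo = [
        Session(10, 7, 20, 15, [8.0, 12.0, 9.0], True),
        Session(10, 6, 20, 20, [10.0, 11.0], False),
    ]
    print(summarize(demo))
```

Pooling weights long sessions more heavily than averaging per-session percentages would; either choice is defensible, but it should be stated when systems are compared.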


[...] important to know what to look for; hence the need for good knowledge of human-to-human discourse. Good system response times are a sine qua non. Controlled experiments have the advantage of being replicable, a crucial factor in arriving at evaluation criteria. Determining user bias and experience may be important, but even more so is user training. Controlled experiments can show what methods are most effective (e.g., a manual or study of protocols). Study of user comments (phatic material) gives some measure of user (dis)satisfaction. (I have seen "You lie," but I have yet to see "Good boy, you!") Clearly, the best indication of user satisfaction is whether he or she uses the system again. Extensive long-term studies are needed for that.

What should the future look like? Task-oriented situations seem to be a promising environment for NLI. The standards of NL system performance will be set by the users. Future evaluations? As Antoine de Saint-Exupéry wrote, "As for the Future, your task is not to foresee, but to enable it."

1. Harris, Larry E. "Prospects of Practical Natural Language Systems." Proceedings of the 18th Annual Meeting of the Association for Computational Linguistics, June 1980, p. 129.

2. Henisz-Dostert, B.; Macdonald, R. E.; and Zarechnak, M. Machine Translation. The Hague: Mouton, 1979.

3. Sondheimer, N. K. "Evaluation of Natural Language Interfaces to Data Base Systems." Proceedings of the 19th Annual Meeting of the Association for Computational Linguistics, June 1981.

4. Mosteller, F. "Innovation and Evaluation." Science (February 27, 1981): 881-886.

5. Thompson, F. B. and Thompson, Bozena H. "Practical Natural Language Processing: The REL System as Prototype." In Advances in Computers, ed. M. Rubinoff and M. C. Yovits, Vol. 13. New York: Academic Press, 1975.

6. Thompson, Bozena H. and Thompson, F. B. "Rapidly Extendable Natural Language." Proceedings of the 1978 National Conference of the ACM, pp. 173-182.

7. Thompson, Bozena H. REL English for the User. Pasadena: California Institute of Technology, 1978.

8. Thompson, Bozena H. "Linguistic Analysis of Natural Language Communication with Computers." COLING 80: Proceedings of the 8th International Conference on Computational Linguistics, Tokyo, October 1980, pp. 190-201.

9. Thompson, Bozena H. and Thompson, F. B. "Shifting to a Higher Gear in a Natural Language System." Proceedings of the National Computer Conference, May 1981.

10. Lehmann, Hubert; Ott, Nikolaus; Zoeppritz, Magdalena. "User Experiments with Natural Language for Data Base Access." COLING 78: Proceedings of the 7th International Conference on Computational Linguistics, Bergen, August 1978.

11. Damerau, Fred J. The Transformational Question Answering (TQA) System: Operational Statistics - 1978. RC 7739. Yorktown Heights: IBM T. J. Watson Research Center, June 1979.
