Bozena Henisz Thompson
California Institute of Technology
INTRODUCTION
Is evaluation, like beauty, in the eye of the beholder? The answer is far from simple because it depends on who is considered to be the proper beholder. Evaluators may range from casual users to society as a whole, with system builders, sophisticated users, linguists, grant providers, system buyers, and others in between. The members of this panel are system builders and linguists -- or rather the two fused into one -- but, I believe, interested in all or almost all actual or potential bodies of evaluators. One of our colleagues expressed a forceful opinion while being a member of a similar panel at last year's ACL conference: "Those of us on this panel and other researchers in the field simply don't have the right to determine whether a system is practical. Only the users of such a system can make that determination. Only a user can decide whether the NL [natural language] capability constitutes sufficient added value to be deemed practical. Only a user can decide if the system's frequency of inappropriate response is sufficiently low to be deemed practical. Only a user can decide whether the overall NL interaction, taken in toto, offers enough benefits over alternative formal interactions to be deemed practical." [1]
It is hard for me to disagree, since I argued as forcefully on the basis of my study of users' evaluation of machine translation [2] -- a study which was prompted by the evaluations of the quality of machine translation as viewed by linguists and users, ranging from 35% acceptable for the former to 90% for the latter. What the study also showed was that the practicality of the output could indeed only be judged by the users, since even incomplete and stylistically very inelegant translations were found quite useful in practice because they, on the one hand, provided, however crudely, the information sought by the users, and, on the other hand, the users themselves brought knowledge that made the texts far more understandable and useful than might appear to a nonspecialist linguist. But this endorsement on my part of the user as the ultimate judge in evaluations does not preclude my fully subscribing to Norm Sondheimer's [3] introductory comments to this panel stating that to "make progress as a field, we need to be able to evaluate." We are now less likely to confuse the issue of the evaluation by people like ourselves and the judgment of the users, less likely to be surprised at the discrepancies, and less likely to be surprised at the users' [...]
Also, we are far more aware of the fact that evaluations of "worth" or "quality" have to be conducted in the contexts of the actual, perceived needs. In extensive studies on evaluation of innovations, Mosteller [4], the recently retired president of AAAS, found that "successful innovators better understand user needs; [and] pay more attention to marketing." The same source, however, leads me to the notorious difficulties of evaluation given the wide range of evaluators and their purposes. We are all undoubtedly convinced of the value of NLI for the society as a whole, but the evaluation of experiments with these interfaces is another matter. Mosteller was faced with social, sociomedical, and medical fields. Let me recount some of the studies he and his team made for reasons which will soon become obvious. His team scored a given program on a scale from plus two to minus two with zero meaning there was essentially no gain. Accordingly, a study of delinquent girls that identified them but failed to prevent them from delinquency received a zero. Likewise, a zero was [...] used: (1) no treatment, (2) an alcoholic clinic, and (3) Alcoholics Anonymous. Since the "no treatment" group performed somewhat better, short-term referrals were considered of no value. A minus one was given to a study whose results were opposite to those hoped for: a major insurance company increased outpatient benefits in the hope of decreasing hospital costs, but the outpatient group's hospital stays increased. Finally, a double plus was awarded to an experiment involving the Salk vaccine, which was, predictably, very successful. Now this kind of evaluation may be justified when the needs of the society are at stake. I have gone into these details, however, for the purpose of expressing the opinion, in which I know I'm not alone, that negative results are as important as positive ones, that evaluation in our case is almost equivalent to the amount of information obtained in an experiment. An experiment whose results would be totally predictable would be almost useless, but one with results different from those hoped for might be embarrassing but very valuable. Another comment prompted by those evaluations is that [...] inappropriate in the case of NLI evaluations.
NLI EVALUATIONS
A METHODOLOGY AND SOME RESULTS
It had been widely taken for granted some time ago that NLI is as good as is its grammar, and a grammar is as good as it is extensive. The specific needs of users, the requirements of special tasks and the like took a back seat. The nature of human discourse was yet to be explored. Happily, we have been in a different situation for some time. When the REL [5, 6, 7] system was getting into a reasonably sturdy shape with respect to speed and bugs, I started planning experiments to test it. There was important literature about discourse, especially in sociology, such as the work of Schegloff. It was thus clear that successful NLI experiments had to be based on knowledge of human discourse. It was also clear that that was the way to make the interface more natural. This assumption has already been fruitful: the NL interface in POL [9], a successor to REL, has already been extensively improved as a result of the REL-related experiments.
Experiments were made in three modes: in addition to face-to-face and human-to-computer, terminal-to-terminal communication was examined, since at present that is the only practical mode of accessing the computer. Through early 1980, over 80 subjects, 80,000 words, and over 50 hours were analyzed in great detail. In the fall of 1980, another 13 subjects were tested in the computational mode only, adding approximately 20 hours. From the start, the experiments were encouraging, although limited to two modes: F-F and T-T. Interactions not only showed a great deal of structure but extensive similarities in both modes, the most important being the constancy of the number of words in sentences (about 70%); the length of sentences (about 7 words); the existence of fragments (70% of messages in F-F and 50% in T-T containing them); and phatics (10% of total for [...] modes were a candidate for consideration in experiments in the computational mode, the T-T mode being seemingly quite far removed from natural F-F. The sentence having historically been the unit of analysis (and since phatics were considered of lesser importance from the computational view, although of great interest in general), my attention turned to fragments. REL allowed for three non-sentence type structures: "NP?" (including number parsed into NP); "all/none or number" answers; and
[...]sible to include individual knowledge and terminology. The analysis of F-F and T-T protocols, however, showed the existence of other fragment categories, finally analyzed into a dozen categories (see [8]). Since they constitute a considerable amount of F-F conversations and even T-T protocols, they clearly had to be watched for in computational experiments.
The experiments for actually observing user-system interaction were conducted in the winter term of 1979/80 and produced 21 protocols, the analysis of which was compared with results of eight F-F and four T-T experiments. Another 13 computational experiments done in the fall confirmed the results of the earlier ones. The task in all three modes was a real one: loading cargo onto a ship, the data coming from the actual environment of loading U.S. navy ships by a group in San Diego, California. In the F-F and T-T experiments, two persons were involved, one given cargo items to be loaded, the other information about decks (details in [8]). In the computational mode (H-C) the ship data was in the computer and the list of cargo to be loaded was handed to the subjects, all with Caltech background. Details being available elsewhere and space limited here, only some major results are given here. Table 1 shows the comparison of the three modes.
TABLE 1
                          F-F             T-T             H-C
Sentence length           6.8             6.1             7.8
Fragment length           2.7             2.8             2.8
% words in sentences      68.8            72.8            89.3
% words in fragments      17.2            21.1            10.7
                          Total   Avg.    Total   Avg.    Total   Avg.
Parsed & nonparsed                                        1615    77
Sentences                 5302    663     385     77      882     42
Phatics (including
  connectors & tags)      4842    605     148     37      46      2
                          Total           Total           Total
Words in messages         49800           3285            8525
Words in sentences        34266           2393            6880
Words in fragments        8584            694             823
As can be seen, several statistics show similarities: sentence length, message length, fragment length, percentage of words in sentences and fragments. The closeness of the average of messages in T-T and parsed and nonparsed inputs in H-C is striking.
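The percentage rows of Table 1 can be checked against the word totals in the same table. The sketch below is only an arithmetic check: the dictionary layout and the names `pct` and `shares` are illustrative assumptions, and the column order F-F, T-T, H-C is inferred from the per-experiment averages rather than stated in the table. Under these assumptions the F-F and T-T percentages follow from dividing by words in messages, while the H-C figures (89.3 and 10.7) are reproduced only when the base is words in sentences plus words in fragments. That second point is an observation from the arithmetic, not something the text states.

```python
# Word totals as printed in Table 1 (column order F-F, T-T, H-C is assumed).
WORDS = {
    "F-F": {"messages": 49800, "sentences": 34266, "fragments": 8584},
    "T-T": {"messages": 3285, "sentences": 2393, "fragments": 694},
    "H-C": {"messages": 8525, "sentences": 6880, "fragments": 823},
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def shares(mode, base="messages"):
    """Percent of words in sentences and in fragments for one mode.

    base="messages" divides by all words in messages;
    base="classified" divides by sentence words plus fragment words.
    """
    w = WORDS[mode]
    if base == "messages":
        whole = w["messages"]
    else:
        whole = w["sentences"] + w["fragments"]
    return pct(w["sentences"], whole), pct(w["fragments"], whole)


if __name__ == "__main__":
    for mode in WORDS:
        print(mode, shares(mode), shares(mode, base="classified"))
```

Comparing the two bases side by side makes it easier to see which denominator a published percentage was computed against before drawing conclusions across modes.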
Table 2 (the meaning of abbreviations is given below the table) deals with fragments. It is mostly self-explanatory, as is the absence of definitions from F-F and T-T (although some abbreviations used there fall in this category) and the absence of some other categories from T-T and H-C. At least two comments, however, are necessary. The surprisingly low use of terse questions in H-C may be accounted for by the tendency toward a formal style in computational interaction. The definitions used were often of quite complex character, although far fewer than could be hoped for due apparently to lack of familiarity with this capability. The complex character of definitions undoubtedly had some effect on the length of sentences in the H-C mode.
TABLE 2
CORR    56    1.7
DEF
        91    37.8
        67    27.8
        30    12.4
        53    22.0
Abbreviations
E (Echo): An exact or partial repetition of usually the other speaker's string. Often an NP, but it may be an elliptical structure of various forms.
ADD (Added Information): An elliptical structure, often NP, used to clarify or complete a previous utterance, often one's own, e.g., "It doesn't say anything here about weight, or breaking things down. Except for crushables.", "It's smaller 36"x20"x17"." Spelling out words was included here.
CORR (Correction): This may be done by either speaker. If done by the same speaker it is related to false start, but semantic considerations suggest a correction, e.g., "Those are 30, uh, 48 length by 40 width by 14 height."
COMP (Completion): Completion of the other speaker's utterance, distinguished from interruption by the cooperative nature of the utterance, e.g., "A: I've got a lot of I've two B: two pages A: Yeah."
STLK (Talking to Oneself): Mutterings, even to the point of undecipherability, not intended for the other person.
TR (Terse Reply): An elliptical reply, often NP, e.g., "No.", "Probably meters.", "50 and 7.62."
TQ (Terse Question): An elliptical question, often NP, e.g., "Why?", "How about pyrotechnics?", "Which ones?"
TI (Terse Information): A rather elusive category, neither question, reply nor command, an elliptical statement but one often requiring an action.
FS (False Start): These are also abandoned utterances, but immediately followed by usually syntactically and semantically related ones, e.g., "They may, they may be identical classes", "Well, the height, the next largest height I've got is 34."
TRUN (Truncated): An incomplete utterance, voluntarily abandoned.
DEF (Definition): E.g., "Define: ED: each deck of the Alamo."
P (Phatics): The largest subgroup of fragments whose name is borrowed from Malinowski's term "phatic communion" with which he referred to those vocal utterances that serve to establish social relations rather than the direct purpose of communication. This term has been broadened to include all fragments which help keep the channel of communication open, such as "Well", "Wait", and even "You turkey". Two subcategories of phatics are:
  C (Dialogue Connectors): Words such as "Then", "And", "Because" (at the beginning of a message or utterance).
  T (Tag Questions): E.g., "They're all under 60, aren't they?"
AND ERROR ANALYSIS
System performance can obviously be evaluated in a number of ways, but without good response time meaningful experiments are impossible. When much data is involved in processing a delay of a few minutes can probably be tolerated, but the vast majority of requests should be responded to within seconds. The latter was the case in my experiments. Fairly complex messages of about 12 words were responded to in about 10 seconds. The system clearly has to be reasonably free of bugs -- in my case, 12 bugs were hit in the total of 1615 parsed and nonparsed messages. The adequate extent of natural language syntax is impossible to determine. Table 3 shows the syntax used by my subjects.
suspicion of the computer's limitations
An interesting fact to note is that similar results with respect to syntax were obtained in the experiments with USL, the "sister system" of REL developed by IBM Heidelberg [10], with German used as NLI, in two studies of high school students: predominance of wh-questions (317 in total of 451); not many relative clauses (66); commands (35); conjunctions (26); quantifiers (15); definitions (11); comparisons (2); yes/no questions (1).
An evaluation which would not include an analysis of unparsed input would at best be of limited value. It was shown in Table 1 that 1093 out of 1615 or about two thirds were parsed in my experiments.
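The "about two thirds" figure is simple arithmetic on counts given in the text. A minimal sketch, in which `parse_rate` is a hypothetical helper name and 1615 is the total of parsed and nonparsed messages reported for the H-C experiments:

```python
def parse_rate(parsed, total):
    """Fraction of inputs that parsed, given the parsed count and the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return parsed / total


# Counts from the text: 1093 parsed out of 1615 parsed and nonparsed messages.
rate = parse_rate(1093, 1615)
unparsed = 1615 - 1093  # messages left over for error analysis

print(f"parsed: {rate:.1%}, unparsed messages: {unparsed}")
```

The remainder, the unparsed messages, is the material that the error analysis has to account for.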
TABLE 3 SENTENCE TYPES
                                                      Total
All sentences                                           882
Simple sentences, e.g., "List the decks [...]           651
Sentences with pronouns, e.g., "What is
  its length?", "What is in its pyro[...]
Sentences with quantifier(s), e.g., [...]
Sentences with conjunctions, e.g., "What
  is the maximum stow height and bale
  cube of the pyrotechnic locker of the [...]
Sentences with quantifier and conjunction(s),
  e.g., "List hatch width and hatch length
  of each deck of the Alamo."                            13   2.6
Sentences with relative clause, e.g., [...]
Sentences with relative clause (or
  related construction) and comparator,
  e.g., "List the ships with a beam less [...]
Sentences with quantifier and relative
  clause, e.g., "List height of each [...]
Sentences with quantifier, conjunction
  and relative clause, e.g., "List length,
  width and height of each content whose [...]
Sentences with quantifiers and comparator,
  e.g., "How many ships have a beam greater [...]
Statements (data addition)                                5   0
Considering the wide range of REL syntax [7], the paucity of complex sentences is surprising. The use of definitions which often involved complex constructions (relative clauses, conjunctions, even quantifiers) had a definite influence. So did, undoubtedly, the task situation causing optimization of work methods. The influence of the specific nature of the task would require additional studies, but the special device provided by the system (a loading prompt sequence -- which was not analyzed) was employed by every subject. Devices such as these obviously are a great aid in accomplishing tasks. They should be tested extensively to determine how they can augment the naturalness of NLIs. Other reasons for the relatively simple syntax used were special strategies: paraphrasing into simpler syntax even though a sentence did not parse for other reasons; "success strategy" resulting in repetitious simple [...]
TABLE 4
Table 4 summarizes the categories of errors. The predominance of vocabulary is not surprising, but relatively few syntactic errors are. In part this may be due to the method of scoring in which errors were counted only once, so if a sentence contained an unknown vocabulary item (e.g. "On what decks of the Alamo cargo be stored?") but would have failed on syntactic grounds as well, it would fall in the vocabulary category. A comparison can be made here with Damerau's study [11] of the use of the TQA system by the city planning department in White Plains, at least with regard to the total of queries to those completed: 788 to 513. So, again, roughly two thirds were parsed. In other categories "parsing failure" is 147, "lookup failures" 119, "nothing in data base" 61, "program error" 39, but this only points to the general difficulties of comparisons of system performance.
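The scoring rule described above, in which each failed input is counted once and a vocabulary failure takes precedence over a syntactic one, can be sketched as follows. This is an illustration only: the "other" category, the precedence list, the function names, and the sample data are assumptions made for the example and are not taken from Table 4. The last lines recompute the completed-to-total ratio from the figures quoted from Damerau's study.

```python
from collections import Counter

# Assumed precedence: the first matching category wins, so each failed
# input is counted exactly once (vocabulary before syntax, as in the text).
PRECEDENCE = ["vocabulary", "syntax", "other"]


def score_once(problems):
    """Return the single category under which one failed input is counted."""
    for category in PRECEDENCE:
        if category in problems:
            return category
    raise ValueError("failed input has no recognized problem category")


def tally(failed_inputs):
    """Count failed inputs by category, one count per input."""
    return Counter(score_once(p) for p in failed_inputs)


# Hypothetical sample: each set lists every problem found in one input.
sample = [{"vocabulary", "syntax"}, {"syntax"}, {"vocabulary"}, {"other"}]
counts = tally(sample)
print(counts)

# Completed-to-total ratio from the figures quoted above: 513 of 788.
completion = 513 / 788
print(f"completed: {completion:.1%}")
```

Under a rule of this kind the syntax count only covers inputs with no higher-precedence problem, so it is a lower bound on the inputs with syntactic problems.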
SOME CONCLUSIONS
Norm Sondheimer suggested some questions we might try to answer. What has been learned about user needs? What most important linguistic phenomena to allow for? What other kinds of interactions? Error analysis points in the obvious directions of user needs, and so do the types of sentences employed. While it is justified to quit the search for an almost perfect grammar, it would be a mistake to constrain it to the constructions used. Improved naturalness can be achieved with diagnostics, definitions, and devices geared to specific tasks such [...] objective measurement is probably impossible, but the percentage of requests processed might give some idea. In the case of a task situation such as loading cargo items, the percentage of task completion may signal both [...] response times are a very important measure. The questionnaire method can and has been used (in the case of MT and USL), but as yet there is too little experience to measure user satisfaction. Users seem very good at adapting to systems. They paraphrase, use success strategy, simplify syntax, use special devices -- what they really do is maximize their performance with respect to a given task.
[...] important to know what to look for, therefore the need for good knowledge of human to human discourse. Good system response times are a sine qua non. Controlled experiments have the advantage of being replicable, a crucial factor in arriving at evaluation criteria. Determining user bias and experience may be important, but even more so is user training. Controlled experiments can show what methods are most effective (e.g. a manual or study of protocols?). Study of user comments -- phatic material -- gives some measure of user (dis)satisfaction (I have seen "You lie," but I have yet to see "Good boy, you!"). Clearly, the best indication of user satisfaction is whether he or she uses the system again. Extensive long-term studies are needed for that.
What should the future look like? Task oriented situations seem to be a promising environment for NLI. The standards of NL systems performance will be set by the users. Future evaluations? As Antoine de Saint-Exupéry wrote, "As for the Future, your task is not to foresee, but to enable it."
1. Harris, Larry R. "Prospects of Practical Natural Language Systems." Proceedings of the 18th Annual Meeting of the Association for Computational Linguistics, June 1980, p. 129.
2. Henisz-Dostert, B.; Macdonald, R. R.; and Zarechnak, M. Machine Translation. The Hague: Mouton, 1979.
3. Sondheimer, N. K. "Evaluation of Natural Language Interfaces to Data Base Systems." Proceedings of the 19th Annual Meeting of the Association for Computational Linguistics, June 1981.
4. Mosteller, F. "Innovation and Evaluation." Science (February 27, 1981): 881-886.
5. Thompson, F. B. and Thompson, Bozena H. "Practical Natural Language Processing: The REL System as Prototype." In Advances in Computers, ed. M. Rubinoff and M. C. Yovits. Vol. 13. New York: Academic Press, 1975.
6. Thompson, Bozena H. and Thompson, F. B. "Rapidly Extendable Natural Language." Proceedings of the 1978 National Conference of the ACM, pp. 173-182.
7. Thompson, Bozena H. REL English for the User. Pasadena: California Institute of Technology, 1978.
8. Thompson, Bozena H. "Linguistic Analysis of Natural Language Communication with Computers." COLING 80: Proceedings of the 8th International Conference on Computational Linguistics, Tokyo, October 1980, pp. 190-201.
9. Thompson, Bozena H. and Thompson, F. B. "Shifting to a Higher Gear in a Natural Language System." Proceedings of the National Computer Conference, May 1981.
10. Lehmann, Hubert; Ott, Nikolaus; Zoeppritz, Magdalena. "User Experiments with Natural Language for Data Base Access." COLING 78: Proceedings of the 7th International Conference on Computational Linguistics. Bergen, August 1978.
11. Damerau, Fred J. The Transformational Question Answering (TQA) System: Operational Statistics - 1978. RC 7739. Yorktown Heights: IBM T. J. Watson Research Center, June 1979.