FIELD TESTING THE TRANSFORMATIONAL QUESTION ANSWERING (TQA) SYSTEM
S. R. Petrick
IBM T.J. Watson Research Center
PO Box 218 Yorktown Heights, New York 10598
The Transformational Question Answering (TQA) system
was developed over a period of time beginning in the
early part of the last decade and continuing to the
present. Its syntactic component is a transformational
grammar parser [1, 2, 3], and its semantic component is
a Knuth attribute grammar [4, 5]. The combination of
these components provides sufficient generality,
convenience, and efficiency to implement a broad range
of linguistic models; in addition to a wide spectrum of
transformational grammars, Gazdar-type phrase
structure grammar [6] and lexical functional grammar
[7] systems appear to be cases in point, for example.
The particular grammar which was, in fact, developed,
however, was closest to those of the generative
semantics variety of transformational grammar; both the
underlying structures assigned to sentences and the
transformations employed to effect that assignment
traced their origins to the generative semantics model.
The system works by finding the underlying structures
corresponding to English queries through the use of the
transformational parsing facility. Those underlying
structures are then translated to logical forms in a
domain relational calculus by the Knuth attribute
grammar component. Evaluation of logical forms with
respect to a given data base completes the
question-answering process. Our first logical form
evaluator took the form of a toy implementation of a
relational data base system in LISP. We soon replaced
the low level tuple retrieval facilities of this
implementation with the RSS (Relational Storage System)
portion of the IBM System R [8]. This version of logical
form evaluation was the one employed in the field
testing to be described. In a more recent version of the
system, however, it has been replaced by a translation
of logical forms, first to equivalent logical forms in
a set domain relational calculus and then to
appropriate expressions in the SQL language, System R's
high level query language.
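The last step described above, rendering a logical form as an
SQL expression, can be illustrated with a minimal Python
sketch. Everything in it is an assumption made for
illustration: the LogicalForm structure, the to_sql function,
and the parcels relation with its ward and area columns are
hypothetical and are not taken from the TQA implementation,
whose logical forms were considerably richer. The "?"
placeholders follow the Python DB-API convention, not
anything specific to System R.

```python
from dataclasses import dataclass

# Hypothetical, simplified logical form: "find the values of `target`
# for rows of `relation` that satisfy every (column, operator, value) test".
@dataclass(frozen=True)
class LogicalForm:
    target: str
    relation: str
    conditions: tuple  # of (column, operator, value) triples

ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">=", "<>"}

def to_sql(form):
    """Return a parameterized SQL string and its list of parameters."""
    clauses, params = [], []
    for column, operator, value in form.conditions:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        clauses.append(f"{column} {operator} ?")
        params.append(value)
    sql = f"SELECT {form.target} FROM {form.relation}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

# Illustrative (invented) query:
# "Which parcels in ward 3 are larger than 5000 square feet?"
form = LogicalForm(
    target="parcel_id",
    relation="parcels",
    conditions=(("ward", "=", 3), ("area", ">", 5000)),
)
sql, params = to_sql(form)
print(sql)
print(params)
```

Keeping the values separate from the SQL text is a design
choice of this sketch only; it simply avoids quoting problems
when the values come from user input.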
The first data base to which the system was applied was
one concerning business statistics such as the sales,
earnings, number of employees, etc. of 60 large
companies over a five-year period. This was a toy data
base, to be sure, but it was useful to us in developing
our system. A later data base contained the basic land
identification records of about 16,000 parcels of land,
for use by members of the city planning department and
(less frequently) other departments to answer questions
concerning the information in that file. Our purpose in
making the system available to those city employees
was, of course, to provide access to a data base of real
interest to a group of users and to field test our
system by evaluating their use of it. Accordingly, the
TQA system was tailored to the land use file
application and installed at City Hall at the end of
1977. It remained there during 1978 and 1979, during
which time it was used intermittently as the need arose
for ad hoc query to supplement the report generation
programs that were already available for the extraction
of information.
Total usage of the system was less than we had expected
would be the case when we made the decision to proceed
with this application. This resulted from a number of
factors, including a change in mission for the planning
department, a reduction in the number of people in that
department, a decision to rebuild the office space
during the period of usage, and a degree of
obsolescence of the data due to the length of time
between updates (which were to have been supplied by
the planning department). During 1978 a total of 788
queries were addressed to the system, and during 1979
the total was 210. Damerau [9] gives the distribution
of these queries by month, and he also breaks them down
by month into a number of different categories.
Damerau's report of the gross performance statistics
for the year 1978, and a similar, as yet unpublished
report of his for 1979, contain a wealth of data that I
will not attempt to include in this brief note. Even
though his reports contain a large quantity of
statistical performance data, however, there are a lot
of important observations which can only be made from a
detailed analysis of the day-by-day transcript of
system usage. An analysis of sequences of related
questions is a case in point, as is an analysis of the
attempts of users to phrase new queries in response to
failure of the system to process certain sentences. A
paper in preparation by Plath is concerned with
treating these and similar issues with the care and
detail which they warrant. Time and space
considerations limit my contribution in this note to
just highlighting some of the major findings of Damerau
and Plath.
Consider first a summary of the 1978 statistics:
Termination Conditions:
  Completed (Answer reached)           513
  Aborted (System crash, etc.)
  User Cancelled
  Program Error
  Parsing Failure
  Unknown
Other Relevant Events:
  User Comment
  Operator Message
  Lexical Choice Resolved by User      119
  "Nothing in Data Base" Answer         61
The percentage of successfully processed sentences is
consistent with but slightly smaller than that of such
other investigators as Woods [10], Ballard and Biermann
[11], and Hershman et al. [12]. Extreme care should be
exercised in interpreting any such overall numbers,
however, and even more care must be exercised in
comparing numbers from different studies. Let me just
mention a few considerations that must be kept in mind
in interpreting the TQA results above.
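For orientation, the counts that are stated in this note can
be expressed as shares of the 788 queries addressed to the
system during 1978. The following short Python calculation is
only a sketch of that arithmetic; the variable and function
names are my own, and it assumes that each count is to be
taken relative to the full 1978 total.

```python
TOTAL_QUERIES_1978 = 788  # total number of 1978 queries stated in the text

# Only counts that are stated alongside the 1978 summary are used here.
counts = {
    "Completed (answer reached)": 513,
    "Lexical choice resolved by user": 119,
    '"Nothing in data base" answer': 61,
}

def share(count, total=TOTAL_QUERIES_1978):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)

shares = {label: share(n) for label, n in counts.items()}
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

On these figures, roughly 65% of the 1978 queries ran to a
completed answer, which is the percentage of successfully
processed sentences discussed above.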
First of all, our users' purposes varied tremendously
from day to day and even from question to question. On
one occasion, for example, a session might be devoted
to a serious attempt to extract data needed for a
federal grant proposal, and either the query complexity
might be relatively limited so as to minimize the
chance of error, or else the questions might be
essentially repetitions of the same query, with minor
variations to select different data. On another
occasion, however, the session might be a
demonstration, or a serious attempt to determine the
limits of the system's understanding capability, or
even a frivolous query to satisfy the user's curiosity
as to the computer's response to a question outside its
area of expertise. (One of our failures was the
sentence, "Who killed Cock Robin?".)
Our users varied widely in terms of their familiarity
with the contents of the data base. None knew anything
about the internal organization of information (e.g.,
how the data was arranged into relations), but some had
good knowledge of just what kind of data was stored,
some had limited knowledge, and some had no knowledge
and even false expectations as to what knowledge was
included in the data base. In addition, they varied
widely with respect to the amount of prior experience
they had with the system. Initially we provided no
formal training in the use of the system, but some
users acquired significant knowledge of the system
through its sustained use over a period of time.
Something over half of the total usage was made by the
individual from the planning department who was
responsible for starting the system up and shutting it
down each day. Usage was also made by other members of
the planning department, by members of other
departments, and by summer interns.
It should also be noted that the TQA system itself did
not stay constant over the two-year period of testing.
As problems were encountered, modifications were made
to many components of the system. In particular, the
lexicon, grammar, semantic interpretation rules
(attribute grammar rules), and logical form evaluation
functions all evolved over the period in question
(continuously, but at a decreasing rate). The parser
and the semantic interpreter changed little, if any. A
rerun of all sentences, using the version of the
grammar that existed at the conclusion of the field
test program, showed that 50% of the sentences which
previously failed were processed correctly. This is
impressive when it is observed that a large percentage
of the remaining 50% constitute sentences which are
either ungrammatical (sometimes sufficiently to
preclude human comprehension) or else contain
references to semantic concepts outside our universe of
(land use) discourse.
On the whole, our users indicated they were satisfied
with the performance of the system. In a conference
with them at one point during the field test, they
indicated they would prefer us to spend our time
bringing more of their files on line (e.g., the zoning
board of appeals file) rather than to spend more time
providing additional syntactic and associated semantic
capability. Those instances where an unsuccessful
query was followed up by attempts to rephrase the query
so as to permit its processing showed few instances
where success was not achieved within three attempts.
This data is obscured somewhat by the fact that users
called us on a few occasions to get advice as to how to
reword a query. On other occasions the terminal message
facility was invoked for the purpose of obtaining
advice, and this left a record in our automatic logging
facility. That facility preserved a record of all
traffic between the user's terminal, the computer and
our own monitoring terminal (which was not always
turned on or attended), and it included a time stamp for
every line displayed on the users' terminal.
A word is in order on the real time performance of the
system and on the amount of CPU time required. Damerau
[9] includes a chart which shows how many queries
required a given number of minutes of real time for
complete processing. The total elapsed time for a
query was typically around three minutes (58% of the
sentences were processed in four minutes or less).
Elapsed time depended primarily on machine load and
user behavior at the terminal. The computer on which
the system operated was an IBM System 370/168 with an
attached processor, 8 megabytes of memory and extensive
peripheral storage, operating under the VM/370
operating system. There were typically in excess of 200
users competing for resources on the system at the
times when the TQA system was running during the
1978-1979 field tests. Besides queuing for the CPU and
memory, this system developed queues for the IBM 3850
Mass Storage System, on which the TQA data base was
stored.
Users had no complaints about real time response, but
this may have been due to their procedure for handling
ad hoc queries prior to the installation of the TQA
system. That procedure called for ad hoc queries to be
coded in RPG by members of the data processing
department, and the turnaround time was a matter of
days rather than minutes. It is likely that the real
time performance of the system caused users sometimes
to look up data about a specific parcel in a hard copy
printout rather than giving it to the system. Queries
were most often of the type requiring statistical
processing of a set of parcels or of the type requiring
a search for the parcel or parcels that satisfied given
search criteria.
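The two kinds of query just mentioned can be made concrete
with a small Python sketch that uses the standard sqlite3
module in place of System R. The table layout, the column
names (parcel_id, ward, land_use, area), and the three rows of
data are all invented for illustration and do not reflect the
actual land use file.

```python
import sqlite3

# In-memory stand-in for a parcel file; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels "
    "(parcel_id TEXT, ward INTEGER, land_use TEXT, area REAL)"
)
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?, ?)",
    [
        ("P1", 1, "residential", 4000.0),
        ("P2", 1, "commercial", 9000.0),
        ("P3", 2, "residential", 6000.0),
    ],
)

# Type 1: statistical processing of a set of parcels,
# here the average area of all residential parcels.
avg_area = conn.execute(
    "SELECT AVG(area) FROM parcels WHERE land_use = ?",
    ("residential",),
).fetchone()[0]

# Type 2: search for the parcels that satisfy given criteria,
# here parcels in ward 1 with an area above 5000.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT parcel_id FROM parcels "
        "WHERE ward = ? AND area > ? ORDER BY parcel_id",
        (1, 5000.0),
    )
]
conn.close()

print(avg_area)
print(matches)
```

Both kinds of query touch many records at once, which is the
situation in which looking through a hard copy printout is
least practical.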
The CPU requirements of the system, broken down into a
number of categories, are also plotted by Damerau [9].
The typical time to process a sentence was ten seconds,
but sentences with large data base retrieval demands
took up to a minute. System hardware improvements made
subsequent to the 1978-1979 field tests have cut this
processing time approximately in half. Throughout our
development of the TQA system, considerations of speed
have been secondary. We have identified many areas in
which recoding should produce a dramatic increase in
speed, but this has been assigned a lesser priority
than basic enhancement of the system and the coverage
of English provided through its transformational
grammar.
Our experiment has shown that field testing of question
answering systems provides certain information that is
not otherwise available. The day to day usage of the
system was different in many respects from usage that
results from controlled, but inevitably somewhat
artificial, experiments. We did not influence our users
by the wording of problems posed to them because we gave
them no problems; their requests for information were
solely for their own purposes. Our sample queries that
we initially exhibited to city employees to indicate
the system was ready to be tested were invariably
greeted with mirth due to the improbability that
anyone would want to know the information requested.
(They asked for reassurance that the system would also
answer "real" questions.) We also obtained valuable
information on such matters as how long users persist
in rephrasing queries when they encounter difficulties
of various kinds, and how successful they are in
correcting initial errors. I hope to discuss these and
other matters in more detail in the oral version of this
paper.
Valuable as our field tests are, they cannot provide
certain information that must be obtained from
controlled experiments. Accordingly, we hope to conduct
such experiments in the near future, using the latest
enhanced version of the system and carefully
controlling such factors as user training and problem
statement. After teaching a course in data base
management systems at Queens College and the Pratt
Institute, and after running informal experiments there
comparing students' relative success in using TQA,
ALPHA, relational algebra, QBE, and SEQUEL, I am
convinced that even for educated, programming-oriented
users with a fair amount of experience in learning a
formal query language, the TQA system offers
significant advantages over formal query languages in
retrieving data quickly and correctly. This remains to
be proved (or disproved) by conducting appropriate
formal experiments.
REFERENCES
[1] Plath, W. J., Transformational Grammar and
    Transformational Parsing in the REQUEST System, IBM
    Research Report RC 4396, Thomas J. Watson Research
    Center, Yorktown Heights, N.Y., 1973.
[2] Plath, W. J., String Transformations in the REQUEST
    System, American Journal of Computational Linguistics,
    Microfiche 8, 1974.
[3] Petrick, S. R., Transformational Analysis, Natural
    Language Processing (R. Rustin, ed.), Algorithmics
    Press, 1973.
[4] Knuth, D. E., Semantics of Context-Free Languages,
    Mathematical Systems Theory, Vol. 2, No. 2, June 1968,
    pp. 127-145.
[5] Petrick, S. R., Semantic Interpretation in the REQUEST
    System, in Computational and Mathematical Linguistics,
    Proceedings of the International Conference on
    Computational Linguistics, Pisa, 27/VIII-1/IX 1973,
    pp. 585-610.
[6] Gazdar, G. J. M., Phrase Structure Grammar, to appear
    in The Nature of Syntactic Representation (eds. P.
    Jacobson and G. K. Pullum), 1979.
[7] Bresnan, J. W. and Kaplan, R. M., Lexical-Functional
    Grammar: A Formal System for Grammatical
    Representation, to appear in The Mental Representation
    of Grammatical Relations (J. W. Bresnan, ed.),
    Cambridge: MIT Press.
[8] Astrahan, M. M.; Blasgen, M. W.; Chamberlin, D. D.;
    Eswaran, K. P.; Gray, J. N.; Griffiths, P. P.; King,
    W. F.; Lorie, R. A.; McJones, P. R.; Mehl, J. W.;
    Putzolu, G. R.; Traiger, I. L.; Wade, B. W.; and
    Watson, V., System R: Relational Approach to Database
    Management, ACM Transactions on Database Systems,
    Vol. 1, No. 2, June 1976, pp. 97-137.
[9] Damerau, F. J., The Transformational Question
    Answering (TQA) System Operational Statistics - 1978,
    to appear in AJCL, 1981.
[10] Woods, W. A., Transition Network Grammars, Natural
    Language Processing (R. Rustin, ed.), Algorithmics
    Press, 1973.
[11] Biermann, A. W. and Ballard, B. W., Toward Natural
    Language Computation, AJCL, Vol. 6, No. 2, April-June
    1980, pp. 71-86.
[12] Hershman, R. L., Kelley, R. T. and Miller, H. G., User
    Performance with a Natural Language Query System for
    Command Control, NPRDC TR 79-7, Navy Personnel Research
    and Development Center, San Diego, Cal. 92152, January
    1979.