FIELD TESTING THE TRANSFORMATIONAL QUESTION ANSWERING (TQA) SYSTEM

S. R. Petrick

IBM T.J. Watson Research Center

PO Box 218 Yorktown Heights, New York 10598

The Transformational Question Answering (TQA) system was developed over a period of time beginning in the early part of the last decade and continuing to the present. Its syntactic component is a transformational grammar parser [1, 2, 3], and its semantic component is a Knuth attribute grammar [4, 5]. The combination of these components provides sufficient generality, convenience, and efficiency to implement a broad range of linguistic models; in addition to a wide spectrum of transformational grammars, Gazdar-type phrase structure grammar [6] and lexical functional grammar [7] systems appear to be cases in point, for example. The particular grammar which was, in fact, developed, however, was closest to those of the generative semantics variety of transformational grammar; both the underlying structures assigned to sentences and the transformations employed to effect that assignment traced their origins to the generative semantics model.

The system works by finding the underlying structures corresponding to English queries through the use of the transformational parsing facility. Those underlying structures are then translated to logical forms in a domain relational calculus by the Knuth attribute grammar component. Evaluation of logical forms with respect to a given data base completes the question-answering process. Our first logical form evaluator took the form of a toy implementation of a relational data base system in LISP. We soon replaced the low level tuple retrieval facilities of this implementation with the RSS (Relational Storage System) portion of the IBM System R [8]. This version of logical form evaluation was the one employed in the field testing to be described. In a more recent version of the system, however, it has been replaced by a translation of logical forms, first to equivalent logical forms in a set domain relational calculus and then to appropriate expressions in the SQL language, System R's high level query language.
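The last step described above, turning a logical form into an SQL expression and evaluating it, can be illustrated with a minimal sketch. Nothing here reflects the actual TQA logical form notation or the System R interface: the `LogicalForm` and `Condition` classes, the `parcels` table, and its columns are all hypothetical, and SQLite stands in for System R.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One restriction on a tuple, e.g. ward = 3 (hypothetical structure)."""
    attribute: str
    operator: str
    value: object


@dataclass(frozen=True)
class LogicalForm:
    """A toy stand-in for a logical form: what to return, from where, under which conditions."""
    targets: tuple
    relation: str
    conditions: tuple


ALLOWED_OPERATORS = {"=", "<", ">"}


def to_sql(form):
    """Translate a LogicalForm into a parameterized SQL string and its arguments.

    The sketch trusts attribute and relation names; only values are parameterized.
    """
    for cond in form.conditions:
        if cond.operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {cond.operator}")
    sql = f"SELECT {', '.join(form.targets)} FROM {form.relation}"
    if form.conditions:
        where = " AND ".join(f"{c.attribute} {c.operator} ?" for c in form.conditions)
        sql += f" WHERE {where}"
    return sql, [c.value for c in form.conditions]


def evaluate(form, connection):
    """Evaluate the logical form against a data base by running the generated SQL."""
    sql, args = to_sql(form)
    return connection.execute(sql, args).fetchall()


def build_demo_db():
    """Create an in-memory table with invented rows, purely for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parcels (parcel_id INTEGER, ward INTEGER, area REAL)")
    conn.executemany(
        "INSERT INTO parcels VALUES (?, ?, ?)",
        [(1, 3, 0.5), (2, 3, 1.25), (3, 5, 1.5)],
    )
    return conn


# Hypothetical reading of "Which parcels in ward 3 have an area over 1?"
demo_form = LogicalForm(
    targets=("parcel_id",),
    relation="parcels",
    conditions=(Condition("ward", "=", 3), Condition("area", ">", 1.0)),
)
```

Calling `evaluate(demo_form, build_demo_db())` runs the generated query against the invented rows. Keeping the logical form separate from the SQL text is what allows the evaluator to be swapped, as the move from the LISP evaluator to RSS and then to SQL suggests.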

The first data base to which the system was applied was one concerning business statistics such as the sales, earnings, number of employees, etc. of 60 large companies over a five-year period. This was a toy data base, to be sure, but it was useful to us in developing our system. A later data base contained the basic land identification records of about 16,000 parcels of land, for use by members of the city planning department and (less frequently) other departments to answer questions concerning the information in that file. Our purpose in making the system available to those city employees was, of course, to provide access to a data base of real interest to a group of users and to field test our system by evaluating their use of it. Accordingly, the TQA system was tailored to the land use file application and installed at City Hall at the end of 1977. It remained there during 1978 and 1979, during which time it was used intermittently as the need arose for ad hoc query to supplement the report generation programs that were already available for the extraction of information.

Total usage of the system was less than we had expected would be the case when we made the decision to proceed with this application. This resulted from a number of factors, including a change in mission for the planning department, a reduction in the number of people in that department, a decision to rebuild the office space during the period of usage, and a degree of obsolescence of the data due to the length of time between updates (which were to have been supplied by the planning department). During 1978 a total of 788 queries were addressed to the system, and during 1979 the total was 210. Damerau [9] gives the distribution of these queries by month, and he also breaks them down by month into a number of different categories.

Damerau's report of the gross performance statistics for the year 1978, and a similar, as yet unpublished report of his for 1979, contain a wealth of data that I will not attempt to include in this brief note. Even though his reports contain a large quantity of statistical performance data, however, there are a lot of important observations which can only be made from a detailed analysis of the day-by-day transcript of system usage. An analysis of sequences of related questions is a case in point, as is an analysis of the attempts of users to phrase new queries in response to failure of the system to process certain sentences. A paper in preparation by Plath is concerned with treating these and similar issues with the care and detail which they warrant. Time and space considerations limit my contribution in this note to just highlighting some of the major findings of Damerau and Plath.

Consider first a summary of the 1978 statistics:

Termination Conditions:
  Completed (Answer reached)          513
  Aborted (System crash, etc.)
  User Cancelled
  Program Error
  Parsing Failure
  Unknown

Other Relevant Events:
  User Comment
  Operator Message
  Lexical Choice Resolved by User     119 (15%)
  "Nothing in Data Base" Answer        61

The percentage of successfully processed sentences is consistent with but slightly smaller than that of such other investigators as Woods [10], Ballard and Biermann [11], and Hershman et al. [12]. Extreme care should be exercised in interpreting any such overall numbers, however, and even more care must be exercised in comparing numbers from different studies. Let me just mention a few considerations that must be kept in mind in interpreting the TQA results above.
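To make the overall percentage concrete, here is a minimal sketch that recomputes shares from the counts stated in this note (788 queries addressed to the system in 1978, of which 513 were completed). The function name and dictionary keys are illustrative only; no counts beyond those given in the text are assumed.

```python
TOTAL_QUERIES_1978 = 788  # total queries addressed to the system during 1978


def share(count, total=TOTAL_QUERIES_1978):
    """Return count as a percentage of total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 1)


# Only the counts stated in this note are used here.
reported_counts = {
    "completed": 513,
    "lexical_choice_resolved_by_user": 119,
    "nothing_in_data_base_answer": 61,
}

shares = {label: share(count) for label, count in reported_counts.items()}
```

A figure computed this way is a raw completion rate over every query typed, including demonstrations and deliberately out-of-domain questions, which is one reason it should not be compared directly with rates from controlled studies.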

First of all, our users' purposes varied tremendously from day to day and even from question to question. On one occasion, for example, a session might be devoted to a serious attempt to extract data needed for a federal grant proposal, and either the query complexity might be relatively limited so as to minimize the chance of error, or else the questions might be essentially repetitions of the same query, with minor variations to select different data. On another occasion, however, the session might be a demonstration, or a serious attempt to determine the limits of the system's understanding capability, or even a frivolous query to satisfy the user's curiosity as to the computer's response to a question outside its area of expertise. (One of our failures was the sentence, "Who killed Cock Robin?".)

Our users varied widely in terms of their familiarity with the contents of the data base. None knew anything about the internal organization of information (e.g., how the data was arranged into relations), but some had good knowledge of just what kind of data was stored, some had limited knowledge, and some had no knowledge and even false expectations as to what knowledge was included in the data base. In addition, they varied widely with respect to the amount of prior experience they had with the system. Initially we provided no formal training in the use of the system, but some users acquired significant knowledge of the system through its sustained use over a period of time. Something over half of the total usage was made by the individual from the planning department who was responsible for starting the system up and shutting it down each day. Usage was also made by other members of the planning department, by members of other departments, and by summer interns.

It should also be noted that the TQA system itself did not stay constant over the two-year period of testing. As problems were encountered, modifications were made to many components of the system. In particular, the lexicon, grammar, semantic interpretation rules (attribute grammar rules), and logical form evaluation functions all evolved over the period in question (continuously, but at a decreasing rate). The parser and the semantic interpreter changed little, if any. A rerun of all sentences, using the version of the grammar that existed at the conclusion of the field test program, showed that 50% of the sentences which previously failed were processed correctly. This is impressive when it is observed that a large percentage of the remaining 50% constitute sentences which are either ungrammatical (sometimes sufficiently to preclude human comprehension) or else contain references to semantic concepts outside our universe of (land use) discourse.

On the whole, our users indicated they were satisfied with the performance of the system. In a conference with them at one point during the field test, they indicated they would prefer us to spend our time bringing more of their files on line (e.g., the zoning board of appeals file) rather than to spend more time providing additional syntactic and associated semantic capability. Those instances where an unsuccessful query was followed up by attempts to rephrase the query so as to permit its processing showed few instances where success was not achieved within three attempts. This data is obscured somewhat by the fact that users called us on a few occasions to get advice as to how to reword a query. On other occasions the terminal message facility was invoked for the purpose of obtaining advice, and this left a record in our automatic logging facility. That facility preserved a record of all traffic between the user's terminal, the computer and our own monitoring terminal (which was not always turned on or attended), and it included a time stamp for every line displayed on the users' terminal.
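The "within three attempts" observation could be checked mechanically once rephrasing sequences have been identified in a transcript. The following is only a sketch under stated assumptions: each chain is a list of outcomes for the rephrasings that followed an initial failed query, how chains are identified is left open, and the sample chains are invented rather than taken from the TQA logs.

```python
from typing import Iterable, Optional, Sequence


def attempts_until_success(outcomes: Sequence[bool]) -> Optional[int]:
    """Return the 1-based number of the first successful rephrasing, or None if none succeeded."""
    for attempt_number, succeeded in enumerate(outcomes, start=1):
        if succeeded:
            return attempt_number
    return None


def fraction_resolved_within(chains: Iterable[Sequence[bool]], limit: int = 3) -> float:
    """Return the fraction of rephrasing chains that reached success in at most `limit` attempts."""
    chain_list = list(chains)
    if not chain_list:
        raise ValueError("at least one chain is required")
    resolved = 0
    for chain in chain_list:
        attempts = attempts_until_success(chain)
        if attempts is not None and attempts <= limit:
            resolved += 1
    return resolved / len(chain_list)


# Invented chains for illustration only: True marks a rephrasing the system processed.
sample_chains = [
    [True],
    [False, True],
    [False, False, False, True],
    [False, False],
]
```

Advice obtained by telephone would not appear in such chains, so a count of this kind would share the limitation noted above.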

A word is in order on the real time performance of the system and on the amount of CPU time required. Damerau [9] includes a chart which shows how many queries required a given number of minutes of real time for complete processing. The total elapsed time for a query was typically around three minutes (58% of the sentences were processed in four minutes or less). Elapsed time depended primarily on machine load and user behavior at the terminal. The computer on which the system operated was an IBM System 370/168 with an attached processor, 8 megabytes of memory and extensive peripheral storage, operating under the VM/370 operating system. There were typically in excess of 200 users competing for resources on the system at the times when the TQA system was running during the 1978-1979 field tests. Besides queuing for the CPU and memory, this system developed queues for the IBM 3850 Mass Storage System, on which the TQA data base was stored.

Users had no complaints about real time response, but this may have been due to their procedure for handling ad hoc queries prior to the installation of the TQA system. That procedure called for ad hoc queries to be coded in RPG by members of the data processing department, and the turnaround time was a matter of days rather than minutes. It is likely that the real time performance of the system caused users sometimes to look up data about a specific parcel in a hard copy printout rather than giving it to the system. Queries were most often of the type requiring statistical processing of a set of parcels or of the type requiring a search for the parcel or parcels that satisfied given search criteria.
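The two query types just mentioned, a search for parcels meeting given criteria and statistical processing over a set of parcels, can be sketched as follows. The records, field names, and helper functions are hypothetical and are not drawn from the land use file.

```python
from statistics import mean

# Invented parcel records; the real file's fields are not described in this note.
PARCELS = [
    {"parcel_id": 1, "ward": 3, "land_use": "residential", "area": 0.5},
    {"parcel_id": 2, "ward": 3, "land_use": "commercial", "area": 1.25},
    {"parcel_id": 3, "ward": 5, "land_use": "residential", "area": 1.5},
]


def find_parcels(parcels, **criteria):
    """Search query: return the parcels whose fields equal every given criterion."""
    return [p for p in parcels if all(p.get(k) == v for k, v in criteria.items())]


def average_area(parcels, **criteria):
    """Statistical query: mean area over the matching parcels, or None if nothing matches."""
    matches = find_parcels(parcels, **criteria)
    if not matches:
        return None  # comparable to a "nothing in data base" answer
    return mean(p["area"] for p in matches)
```

For example, `find_parcels(PARCELS, ward=3)` is a search, while `average_area(PARCELS, land_use="residential")` aggregates over a set. A lookup of one specific parcel is the degenerate case that users sometimes preferred to do from a printout.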

The CPU requirements of the system, broken down into a number of categories, are also plotted by Damerau [9]. The typical time to process a sentence was ten seconds, but sentences with large data base retrieval demands took up to a minute. System hardware improvements made subsequent to the 1978-1979 field tests have cut this processing time approximately in half. Throughout our development of the TQA system, considerations of speed have been secondary. We have identified many areas in which recoding should produce a dramatic increase in speed, but this has been assigned a lesser priority than basic enhancement of the system and the coverage of English provided through its transformational grammar.

Our experiment has shown that field testing of question answering systems provides certain information that is not otherwise available. The day to day usage of the system was different in many respects from usage that results from controlled, but inevitably somewhat artificial, experiments. We did not influence our users by the wording of problems posed to them because we gave them no problems; their requests for information were solely for their own purposes. The sample queries that we initially exhibited to city employees to indicate the system was ready to be tested were invariably greeted with mirth due to the improbability that anyone would want to know the information requested. (They asked for reassurance that the system would also answer "real" questions.) We also obtained valuable information on such matters as how long users persist in rephrasing queries when they encounter difficulties of various kinds and how successful they are in correcting initial errors. I hope to discuss these and other matters in more detail in the oral version of this paper.

Valuable as our field tests are, they cannot provide certain information that must be obtained from controlled experiments. Accordingly, we hope to conduct such experiments in the near future, using the latest enhanced version of the system and carefully controlling such factors as user training and problem statement. After teaching a course in data base management systems at Queens College and the Pratt Institute, and after running informal experiments there comparing students' relative success in using TQA, ALPHA, relational algebra, QBE, and SEQUEL, I am convinced that even for educated, programming-oriented users with a fair amount of experience in learning a formal query language, the TQA system offers significant advantages over formal query languages in retrieving data quickly and correctly. This remains to be proved (or disproved) by conducting appropriate formal experiments.

REFERENCES

[1] Plath, W. J., Transformational Grammar and Transformational Parsing in the Request System, IBM Research Report RC 4396, Thomas J. Watson Research Center, Yorktown Heights, N.Y., 1973.

[2] Plath, W. J., String Transformations in the REQUEST System, American Journal of Computational Linguistics, Microfiche 8, 1974.

[3] Petrick, S. R., Transformational Analysis, Natural Language Processing (R. Rustin, ed.), Algorithmics Press, 1973.

[4] Knuth, D. E., Semantics of Context-Free Languages, Mathematical Systems Theory, 2, June 1968, pp. 127-145.

[5] Petrick, S. R., Semantic Interpretation in the Request System, in Computational and Mathematical Linguistics, Proceedings of the International Conference on Computational Linguistics, Pisa, 27/VIII-1/IX 1973, pp. 585-610.

[6] Gazdar, G. J. M., Phrase Structure Grammar, to appear in The Nature of Syntactic Representation (eds. P. Jacobson and G. K. Pullum), 1979.

[7] Bresnan, J. W. and Kaplan, R. M., Lexical-Functional Grammar: A Formal System for Grammatical Representation, to appear in The Mental Representation of Grammatical Relations (J. W. Bresnan, ed.), Cambridge: MIT Press.

[8] Astrahan, M. M.; Blasgen, M. W.; Chamberlin, D. D.; Eswaran, K. P.; Gray, J. N.; Griffiths, P. P.; King, W. F.; Lorie, R. A.; McJones, P. R.; Mehl, J. W.; Putzolu, G. R.; Traiger, I. L.; Wade, B. W.; and Watson, V., System R: Relational Approach to Database Management, ACM Transactions on Database Systems, Vol. 1, No. 2, June 1976, pp. 97-137.

[9] Damerau, F. J., The Transformational Question Answering (TQA) System Operational Statistics - 1978, to appear in AJCL, June 1981.

[10] Woods, W. A., Transition Network Grammars, Natural Language Processing (R. Rustin, ed.), Algorithmics Press, 1973.

[11] Biermann, A. W. and Ballard, B. W., Toward Natural Language Computation, AJCL, Vol. 6, No. 2, April-June 1980, pp. 71-86.

[12] Hershman, R. L., Kelley, R. T. and Miller, H. G., User Performance with a Natural Language Query System for Command Control, NPRDC TR 79-7, Navy Personnel Research and Development Center, San Diego, Cal. 92152, January 1979.
