


Semantic Acquisition In TELI: A Transportable, User-Customized Natural Language Processor

Bruce W. Ballard
Douglas E. Stumberger
AT&T Bell Laboratories
600 Mountain Avenue
Murray Hill, NJ 07974

Abstract

We discuss ways of allowing the users of a natural language processor to define, examine, and modify the definitions of any domain-specific words or phrases known to the system. An implementation of this work forms a critical portion of the knowledge acquisition component of the Transportable English Language Interface (TELI), which answers English questions about tabular (first-normal-form) data files and runs on a Symbolics Lisp Machine. However, our techniques enable the design of customization modules that are largely independent of the syntactic and retrieval components of the specific system they supply information to. In addition to its obvious practical value, this area of research is important because it requires careful attention to the formalisms [...] interactions among the modules based on those formalisms.

1 Introduction

[...] Language Interface system (TELI), we have sought to [...] scientific nature. Concerning the applied side of computational linguistics, we seek to redress the fact that many natural language prototypes, despite their sophistication and even their robustness, have fallen into disuse because of failures (1) to make known to users exactly what inputs are allowed (e.g., what words [...] capabilities that meet the precise needs of a given user or group of users (e.g., appropriate vocabulary, syntax, and semantics). Since experience has shown that neither users nor system designers can predict in [...] meanings that will arise in accessing a given database (cf. Tennant, 1979), we have sought to make TELI "transportable" in an extreme sense, where customizations may be performed (1) by end users, as opposed to the system designers, and (2) at any time during the processing of English sentences, rather [...] English processing may occur.

In addition to the potential practical benefits of [...] conceived transportability projects can make useful scientific contributions to computational linguistics, since single-domain systems and, to a lesser extent, systems adapted over weeks or months by their designers afford opportunities to circumvent, rather than squarely address, important issues concerning (a) the precise nature of the formalisms the system is [...] system modules. Although customization efforts offer [...] above are less likely to go unnoticed when dealing with a system whose domain-specific information is supplied at run time, especially when that information is being provided by the actual users of the system.

By way of overview, we note that the TELI system derives from previous work on the LDC project, as documented in Ballard (1982), Ballard (1984), Ballard, Lusth and Tinkham (1984), and Ballard and Tinkham (1984). The initial prototype of TELI, which runs on a Symbolics Lisp Machine, is [...] information stored in one or more tables (i.e., a first-normal-form relational database). A sample view of the display screen during a session with TELI, which may give the flavor of how the system operates, is shown in Figure 1. Information on some aspects of knowledge acquisition not discussed in this paper, particularly with regard to syntactic case frames, can be found in Ballard (1986).

2 Types of Modifiers Available in TELI

The syntactic and semantic models adopted for TELI are intended to provide a unified treatment of a broad and extendible class of word and phrase types.

By providing for an "extendible" class of constructs, we make the knowledge acquisition module of TELI independent of the natural language portion of the system, whose earlier version has been described in Ballard and Tinkham (1984) and Ballard, Lusth and Tinkham (1984). In the remainder of this paper, the reader should bear in mind that the acquisition modules of TELI, including the menus they generate, are driven by extensible data structures that convey the linguistic coverage of the underlying natural language processor (NLP) for which information is being acquired. For example, incorporating adjective phrases into the system involved adding 12 lines of Lisp-like data specifications. This brevity is largely due to the use of case frames that embody dynamically alterable selectional restrictions (Ballard, 1986).
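The data-driven design described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that each phrase type is a registry entry listing its case slots and that each case frame carries a mutable table of selectional restrictions; the names `PHRASE_TYPES`, `CaseFrame`, and `add_phrase_type` are hypothetical and are not taken from TELI, whose specifications were Lisp-like data.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical registry: phrase type -> case slot names (cf. "subj"/"arg" in Section 4.2).
PHRASE_TYPES: dict[str, tuple[str, ...]] = {
    "prep-ph": ("subj", "arg"),
    "verb-ph": ("subj", "arg"),
}


def add_phrase_type(name: str, slots: tuple[str, ...]) -> None:
    """Extend coverage by adding data; no acquisition code has to change."""
    PHRASE_TYPES[name] = slots


@dataclass
class CaseFrame:
    phrase_type: str
    head: str                     # e.g. "in", "lead to", "responsible for"
    restrictions: dict[str, str]  # slot -> required entity type (alterable at run time)

    def accepts(self, fillers: dict[str, str]) -> bool:
        """True if each filled slot exists and its entity type meets any restriction."""
        slots = PHRASE_TYPES[self.phrase_type]
        return all(
            slot in slots and self.restrictions.get(slot, etype) == etype
            for slot, etype in fillers.items()
        )


# Adding adjective phrases ("employees responsible for the planning projects").
add_phrase_type("adj-ph", ("subj", "arg"))
frame = CaseFrame("adj-ph", "responsible for", {"subj": "employee", "arg": "project"})
```

Here `frame.accepts({"subj": "employee", "arg": "project"})` is true and `frame.accepts({"subj": "project"})` is false; assigning to `frame.restrictions` later changes what may attach without touching any other code.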

To give an initial feeling for the coverage of the NLP for which information is currently acquired, TELI provides semantics for the word categories

Adjective: e.g., an expensive restaurant
Noun Modifier: e.g., a graduate student
Noun: e.g., a pub

and the phrase types

Adjective Phrase: e.g., employees responsible for the planning projects
Noun-Modifier Phrase: e.g., the speech researchers
Prepositional Phrase: e.g., the trails on the Franconia-Region map
Verb Phrase: e.g., employees that report to Brachman
Functional Noun Phrase: e.g., the size of department 11387, the colleagues of Litman

In addition to these user-defined modifier types, the system currently provides for negation, comparative and superlative forms of adjectives, possessives, and ordinals. Among the grammatical features supported [...] prepositional and adjective phrases, fronting of verb phrase complements, and other minor features. One important area for expansion involves quantifiers, both logical (e.g., "all") and numerical (e.g., "at least 3").

3 Principles Behind Semantic Acquisition

As noted above, our goal is to devise techniques that enable end users of a natural language processor to furnish all domain-specific information required by the system. This information includes (1) the vocabulary needed for the data at hand; (2) various types of selectional restrictions that define acceptable phrase attachments; and, most critically, (3) the definitions of words and phrases. With this in mind, the primary criteria around which the semantic acquisition component of TELI has been designed are as follows.

To allow users to define, examine, or modify domain-specific information at any time. This derives from our belief that the needs of a user or group of users cannot all be predicted in advance, and will probably change once the system has begun operation.

To enable users to impart new concepts to the system. We provide more than just synonym and paraphrase capabilities; in fact, definitions may be arbitrarily complex, being given either (a) in terms of other definitions, or (b) as the conjunction of an arbitrary number of constraints.

English Input: Which trails that aren't long lead to a mountain on Franconia Ridge?

Internal Representation:
    (TRAIL (VERBINFO (TRAIL LEAD NIL NIL TO MOUNTAIN)
             (SUBJ ?) (ARG (MOUNTAIN (QUANT = NIL)
                             (PREPINFO (MOUNTAIN ON TRAIL)
                               (SUBJ ?) (ARG (TRAIL (= FRANCONIA-RIDGE)))))))
           (NOT (ADJ LONG)))

Algebra Query:
    (SELECT trails (trail length-km)
      (and (< length-km 67)
           (= trail (SELECT (TJOIN trails(trail) = mtn-trails(trail)) (trail)
                      (= name (SELECT (TJOIN mountains(name) = mtn-trails(name)) (name)
                                (= trail 'franconia-ridge)))))))

Answer:
    (TRAILS)  TRAIL            LENGTH-KM
              OLD-BRIDLE-PATH  4.1
              LIBERTY-SPRING   4.7

Top-level menu ("What's Your Pleasure?"): Answer a Question; Edit the Last Input; Print Parse Tree; Run Pieces of the NLP; Exit; Begin a Customization (Vocabulary, Syntax, Semantics); General Info; Clear Screen; Edit Global Flags; Save/Retrieve Session

Figure 1: Sample Display Screen; Top-Level Menu of TELI


To provide definition capabilities independent of [...] prepositional phrases, verb phrases, and so forth are all defined in precisely the same way. This is achieved in part by treating all modifiers as n-place predicates.

To allow definitions to be given at various conceptual [...] English; (b) in terms of the meanings of previously [...] "conceptual" relationships, which have been abstracted to a level above that of the physical data files; or (d) in terms of database columns. We strive to minimize the need for low-level database references, since this helps (1) to avoid tedious and redundant references, and (2) to assure that most of our techniques will be applicable beyond the current conventional database setting.

[...] example, the menu scheme described in Section 7.2 offers the user more assistance in making definitions, but is less powerful than the alternative English and English-like methods described in Section 7.3. We prefer to let users decide when each modality is appropriate, rather than force a compromise among simplicity, reliability, and power.

To enable the system to provide help or guidance to the [...] all current modifiers of, or functions associated with, [...] opportunities exist for co-operation on the part of the system. To avoid unnecessary limitations, however, users are generally able to override any hints made by the system.

4 Semantic Processing in TELI

The semantic model developed for TELI, in which definitions are acquired from users, assumes that (1) modifier meanings will be purely extensional, and can thus be treated as n-place predicates, and (2) [...] compositional. Concerning the latter assumption, we [...] including problems of word sense, will have been [...] restrictions (Ballard and Tinkham, 1984), and (b) minimal re-ordering does occur in converting parse trees into internal representations.

4.1 Types of Semantics

All user-defined semantics, however acquired, are stored in a global Lisp structure indexed by the word or phrase being defined. Single-word modifiers are indexed by the word being defined, its part of speech, and the entity it modifies; phrasal modifiers are indexed by the phrase type and the associated case frame. For example, the internal references

    (new adj room)
    (prep-ph (restaurant in county))

respectively index the definitions of "new", when used as an adjective modifier of rooms, and "in", as it relates restaurants to counties. As suggested by this indexing scheme, word meanings arise only in the context of their occurrence, never in isolation. Thus, "new room" and "restaurant in county" receive definitions, not "new" or "in". This decision lends [...] additional effort thereby needed to make multiple [...] borrowed meanings, as described in Section 7.4.

Although our representation strategies allow for definitions that involve relatively elaborate traversals of the physical data files, TELI does not presently provide for arithmetic computations. Thus, the input "Which restaurants are within 3 blocks of China Gardens?" requires a 2-place "distance" function and, unless the underlying data files provide distances [...] distances to account for), the necessary semantics cannot be supplied.
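The indexing scheme described above can be mimicked with an ordinary dictionary. In this sketch (Python rather than Lisp; the names `SEMANTICS`, `define_word`, `define_phrase`, and the record fields are hypothetical), a single-word modifier is keyed by (word, part of speech, modified entity) and a phrasal modifier by (phrase type, case frame), and every definition is an n-place predicate over database records.

```python
from typing import Callable

# Global store of user-defined semantics: one n-place predicate per key.
SEMANTICS: dict[tuple, Callable[..., bool]] = {}


def define_word(word: str, pos: str, entity: str, pred: Callable[..., bool]) -> None:
    SEMANTICS[(word, pos, entity)] = pred


def define_phrase(phrase_type: str, case_frame: tuple, pred: Callable[..., bool]) -> None:
    SEMANTICS[(phrase_type, case_frame)] = pred


# "new room": a 1-place predicate (hypothetical "built" column).
define_word("new", "adj", "room", lambda room: room["built"] >= 1980)

# "restaurant in county": a 2-place predicate relating two records.
define_phrase("prep-ph", ("restaurant", "in", "county"),
              lambda rest, county: rest["county"] == county["name"])
```

Because the key includes the modified entity, the entry for ("new", "adj", "room") says nothing about "new car"; that combination would need its own definition, or a borrowed one as in Section 7.4.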

4.2 Internal Representations

As an example of the "internal representation" (IR) of an input, which results from a recursive traversal of a completed parse tree and which illustrates preparations for compositional analysis, the (artificially complex) input

"Which Mexican restaurants in the largest city other than New Providence that are not expensive are open for lunch?"

will have (roughly) the internal representation

    (restaurant (not (adj expensive))
      (nounmod-ph (food restaurant)
        (nounmod (food (= Mexican))) (head ?))
      (prep-ph (restaurant in city)
        (subj ?) (arg (city (super large)
                            (!= New-Providence))))
      (adj-ph (restaurant open for meal) (subj ?)
        (arg (meal (= lunch)))))

This top-level interpretation of the input instructs the system to find all restaurants that satisfy (a) the [...] "expensive", and (b) the three 2-place predicates associated with the noun-noun, prepositional, and adjective phrases. Note that modifiers associated with [...] phrasal modifiers are referenced by their case frame, e.g., "restaurant in city". Within the scope of these references, case labels (e.g., "subj" and "arg") indicate which slots have been instantiated and which slot has been relativized, the latter denoted by "?". The list of slot names associated with each phrase type is stored globally. In most instances, the argument of a case slot can be an arbitrary IR structure, in keeping with the recursive nature of the English inputs being recognized.
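A rough sketch of how such a top-level interpretation might be evaluated: each modifier in a simplified IR names a stored predicate, the relativized slot "?" is bound to each candidate restaurant, and a candidate is kept only if every (possibly negated) predicate holds. This is an illustrative Python sketch, not TELI's retrieval component (which produced relational-algebra queries, as in Figure 1); the `SEM` table, the toy data, and the flattening of the nested "city" argument to a constant are all assumptions.

```python
# Hypothetical predicates, keyed the way Section 4.1 describes.
SEM = {
    ("expensive", "adj", "restaurant"): lambda r: r["price"] >= 3,
    ("nounmod-ph", ("food", "restaurant")): lambda food, r: r["food"] == food,
    ("prep-ph", ("restaurant", "in", "city")): lambda r, city: r["city"] == city,
    ("adj-ph", ("restaurant", "open for", "meal")): lambda r, meal: meal in r["meals"],
}

# Simplified IR: (head, modifier, ...); a modifier is (key, {slot: value or "?"})
# or ("not", modifier). Slot order gives argument order.
IR = ("restaurant",
      ("not", (("expensive", "adj", "restaurant"), {"subj": "?"})),
      (("nounmod-ph", ("food", "restaurant")), {"nounmod": "Mexican", "head": "?"}),
      (("prep-ph", ("restaurant", "in", "city")), {"subj": "?", "arg": "Summit"}),
      (("adj-ph", ("restaurant", "open for", "meal")), {"subj": "?", "arg": "lunch"}))


def holds(mod, candidate) -> bool:
    """Apply one modifier's predicate, binding the relativized slot to candidate."""
    if mod[0] == "not":
        return not holds(mod[1], candidate)
    key, slots = mod
    args = [candidate if v == "?" else v for v in slots.values()]
    return SEM[key](*args)


def answer(ir, entities):
    """Names of head entities satisfying every modifier (a conjunction)."""
    head, *mods = ir
    return [e["name"] for e in entities[head] if all(holds(m, e) for m in mods)]


ENTITIES = {"restaurant": [  # toy data invented for this sketch
    {"name": "Casa", "food": "Mexican", "price": 2, "city": "Summit", "meals": {"lunch", "dinner"}},
    {"name": "Pico", "food": "Mexican", "price": 4, "city": "Summit", "meals": {"lunch"}},
    {"name": "Rome", "food": "Italian", "price": 1, "city": "Summit", "meals": {"lunch"}},
    {"name": "Luna", "food": "Mexican", "price": 1, "city": "Summit", "meals": {"dinner"}},
]}
```

`answer(IR, ENTITIES)` returns `["Casa"]`: Pico fails the negated "expensive" test, Rome is not Mexican, and Luna is not open for lunch.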

Since IR structures are built around the word and phrase types of the English being dealt with, and since the meanings of words and phrases are stored globally, IR structures should not be regarded as a "knowledge representation" in the sense of KL-ONE, logical form, and so forth. Systems similar in goals to TELI but which revolve around logical form include TEAM (Grosz, 1983; Grosz, Appelt, Martin, and Pereira, 1985), IRUS (Bates and Bobrow, 1983; Bates, Moser, and Stallard, 1984), and TQA (Plath, 1976; Damerau, 1985). One system similar to TELI in building intermediate structures that contain references to language-specific concepts is DATALOG (Hafner and Godden, 1985).

5 The Initial Phase of Customization

When a user asks TELI to begin learning about a new domain, the system spends from five to thirty minutes, depending on the complexity of the application, obtaining basic information about each table in the database (see Figure 2). Users are first asked to give the key column of the table. This information is used primarily to guide the system in inferring the semantics of certain noun-noun and "of"-based patterns. Next, users are asked which columns contain entity values as opposed to property values. Typical properties are "size", "color", and "length", which differ from entities in that (a) their values do not appear as an argument to arbitrary verbs and prepositions (that is, other than "have", "with", etc.) and (b) they will not themselves have properties associated with them. Finally, users are asked to specify the type of value each column contains. This information allows subsequent references to concepts (e.g., "color") rather than physical column names. It also aids the system in forming subsequent suggestions to the user (e.g., defaults that can be overridden).

Having obtained the information indicated above, the system constructs definitions that allow simple questions to be answered, such as

"What is Sally's social security number?"
"What is the age of John?"

Along with information freely volunteered by the user, these definitions can subsequently be examined or changed at the user's request.
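The initial phase can be pictured as recording, for each column, whether it holds entities or properties and what type of value it contains, and then deriving one functional ("of") definition per non-key column. The Python below is a hypothetical sketch: `Column`, `Table`, `functional_definitions`, `entity_columns`, and the concept names are assumptions, and the rows are modeled on Figure 2.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    kind: str        # "entity" or "property", as answered by the user
    value_type: str  # concept supplied by the user, e.g. "student"


@dataclass
class Table:
    name: str
    key: str
    columns: list
    rows: list


def entity_columns(table: Table) -> list:
    """Columns whose values may fill arbitrary verb and preposition slots."""
    return [c.name for c in table.columns if c.kind == "entity"]


def functional_definitions(table: Table) -> dict:
    """Map (concept, key concept) -> lookup, e.g. "the ssn of <student>"."""
    types = {c.name: c.value_type for c in table.columns}
    defs = {}
    for col in table.columns:
        if col.name == table.key:
            continue

        def lookup(key_value, col_name=col.name):
            for row in table.rows:
                if row[table.key] == key_value:
                    return row[col_name]
            return None

        defs[(col.value_type, types[table.key])] = lookup
    return defs


student_info = Table(
    "STUDENT-INFO", "STUDENT",
    [Column("STUDENT", "entity", "student"),
     Column("SSN", "property", "social security number"),
     Column("CLASS", "property", "class"),
     Column("ADVIS", "entity", "instructor")],
    [{"STUDENT": "SALLY", "SSN": "314-15-9265", "CLASS": "4", "ADVIS": "BRACHMAN"},
     {"STUDENT": "JOHN", "SSN": "555-33-1234", "CLASS": "2", "ADVIS": "JONES"}],
)
defs = functional_definitions(student_info)
```

With these assumptions, `defs[("social security number", "student")]("SALLY")` returns `"314-15-9265"`, the kind of lookup behind "What is Sally's social security number?".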

(a) STUDENT-INFO

    STUDENT  SSN          CLASS  ADVIS
    BILL     123-45-6789  1      BALLARD
    DOUG     111-22-3333  3      LITMAN
    FRED     321-54-9876  3      MARCUS
    JOHN     555-33-1234  2      JONES
    SALLY    314-15-9265  4      BRACHMAN
    SUE      987-65-4321  3      BACHENKO
    TERESA   333-22-4444  G      BORGIDA

Which is the "key" column of STUDENT-INFO?  STUDENT (BILL, DOUG, ...); SSN (...); CLASS (1, 2, ...); ADVIS (BACHENKO, BALLARD, ...)

Columns of STUDENT-INFO (Entity / Property): STUDENT (BILL, ...); SSN (111-22-3333, ...); CLASS (1, ...); ADVIS (BACHENKO, ...); Return

Entity Type for STUDENT (BILL, DOUG, ...): student
Entity Type for ADVIS (BACHENKO, BALLARD, ...): instructor

Figure 2: Initial Acquisitions

Based upon the answers to the questions described above, a small number of follow-up questions, mostly unrelated to the subject of this paper, will be asked. For example, the system will propose its best guess as to the morphological variants of nouns, verbs, and other words for the user to confirm or correct.

6 Intermediate Customizations

Having learned about each physical relation, TELI asks for information which, though not needed immediately, is either (a) more simply obtained at the outset, in a context relevant to its semantics, than at a later, arbitrary point, or (b) acquirable collectively, thus preventing several subsequent acquisitions. Unlike the initial acquisitions described in Section 5, intermediate customizations could be excised from the system without any loss in processing ability. We now summarize three forms of intermediate customization, the last of which may be requested by the user at any time. Allowing users to ask for the other forms as well would be a simple matter.

First, the system will ask which columns contain values that either correspond to or are themselves English modifiers. In Figure 2-a, the values "1" through "G" in the "class" column might correspond (respectively) to "freshman" through "graduate student", in which case acquisitions might continue as suggested in Figure 3. From this information, the system constructs a definition for each user-defined modifier; for example, the internal definition of "sophomore" will be

    ((sophomore noun student) ((class p-noun) = 2))
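Building such definitions from the user's answers in Figure 3 is mechanical. A minimal Python sketch, assuming the answers arrive as a mapping from coded values to (word, part of speech) pairs; `modifier_definitions`, `satisfies`, and the tuple encoding are hypothetical, chosen to mirror the Lisp form above.

```python
def modifier_definitions(column: str, entity: str, codes: dict) -> dict:
    """codes: coded value -> list of (word, part of speech) supplied by the user."""
    defs = {}
    for value, words in codes.items():
        for word, pos in words:
            # Mirrors ((sophomore noun student) ((class p-noun) = 2)).
            defs[(word, pos, entity)] = ((column, "p-noun"), "=", value)
    return defs


def satisfies(definition: tuple, record: dict) -> bool:
    """Check one record (the p-noun) against an equality constraint on a column."""
    (column, _param), op, value = definition
    return op == "=" and record[column] == value


defs = modifier_definitions("class", "student", {
    1: [("freshman", "noun")],
    2: [("sophomore", "noun")],
    "G": [("graduate", "nounmod"), ("graduate", "noun")],
})
```

Keying on the part of speech lets "graduate" carry separate noun-modifier ("a graduate student") and noun ("a graduate") entries, matching the Adjective/Nounmod/Noun choice offered in Figure 3.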

A second intermediate acquisition, carried out subject to user confirmation, involves the acceptability of hypothesized syntax and semantics for (a) phrases based on "of", (b) phrases built around "have", "with", and "in", and (c) noun-noun phrases. In deciding what case frames to propose, TELI considers the information it has already acquired about simple functional ("of") relationships.

A third form of intermediate acquisition involves the system's invitation for the user to give lexical and syntactic information for one or more user-defined categories, namely titles, adjectives, common nouns, noun modifiers, prepositions, and verbs. For example, the user might specify six adjectives and the entities they modify, followed by four or five verbs and their associated case frames, and so forth.

7 On-Line Customization

In general, definitions are supplied to TELI whenever (a) an undefined modifier is encountered during the processing of an English input, or (b) the user asks to supply or modify a definition. In each case, the same methods are available for making definitions, and they are independent of the modifier type being defined. When creating or modifying a meaning, users are presented with information as shown in Figure 4-a; upon asking to "add a constraint", they are given the menu shown in Figure 4-b. Multiple "constraints" appearing in a semantic specification are presently presumed to be conjoined.

Which columns contain (encoded) English words?  STUDENT (BILL, DOUG, ...) [ ]   ADVIS (BACHENKO, BALLARD, ...) [ ]   Abort [ ]   Return [ ]

Words associated with the CLASS value 1: freshman
Words associated with the CLASS value G: graduate

Modifiers in CLASS: Adjective / Nounmod / Noun   Return [ ]

Figure 3: Intermediate Acquisitions

(a) Semantic Specification   [Sample Usage: Sage is LARGE]
        the LENGTH of Sage ... 300
        (Add a constraint)
        [return]

(b) Define the semantics of Verb Phrase: TRAIL LEADS TO MOUNTAIN
        by Menu Selection
        English(like) Reference
        Database References
        Borrowing from an Existing Meaning
        [return]

Figure 4: Top-Level Semantics Menus

As suggested in Figure 4-a and below, definitions are made in terms of sample values, which the system treats as formal parameters. In this way we avoid the problem of defining a phrase two or more of whose case slots may be filled by the same type of entity (cf. "a student is a classmate of a student if ..."). To assure that any domain value may appear as a constant, the user is able to alter the system's choice of sample names at any time.
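Treating sample values as formal parameters amounts to a substitution in each direction. In the Python sketch below (hypothetical names; constraints encoded as nested tuples), the user's definition of "Bill is a classmate of Doug" is abstracted over the two sample students, which keeps the subject and object slots distinct even though both hold students, and is later instantiated with actual values.

```python
def substitute(expr, mapping: dict):
    """Recursively replace atoms found in mapping; other atoms are left alone."""
    if isinstance(expr, tuple):
        return tuple(substitute(e, mapping) for e in expr)
    return mapping.get(expr, expr)


def parameterize(definition, samples: dict):
    """samples: slot name -> sample value chosen by the system (or the user)."""
    return substitute(definition, {value: "?" + slot for slot, value in samples.items()})


def instantiate(param_def, actuals: dict):
    """Bind each formal parameter to an actual value."""
    return substitute(param_def, {"?" + slot: value for slot, value in actuals.items()})


# "BILL is a classmate of DOUG" if their classes are equal (hypothetical encoding).
given = (("class", "BILL"), "=", ("class", "DOUG"))
generic = parameterize(given, {"subj": "BILL", "arg": "DOUG"})
```

Since every occurrence of "BILL" becomes a parameter, BILL cannot also appear as a constant in the same definition; this is why letting the user change the sample names matters.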

7.1 Specification at the Database Level

As noted in Section 3, semantic specifications at the database level are primitive but useful. As shown in Figure 5, a database-level specification comprises (a) a relation, possibly arrived at via a user-defined join, and (b) references to columns that correspond to the parameters of the phrase whose semantics is being defined. In many cases, the system can utilize its column type information, acquired as described in Section 5, to predict both the relation to be used (or pair of relations for joining) and the appropriate columns to join over, in which case the menu(s) presented will contain boldface selections for the user to confirm or alter.
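Predicting the relation and columns from column type information might look like the following Python sketch. It considers only single relations (TELI also proposes a pair of relations to join), and the catalog, including the typing of ELEVATION as a height, is an assumption modeled on Figure 5; `CATALOG` and `propose` are hypothetical names.

```python
# Hypothetical column-type catalog from the initial acquisitions (Section 5).
CATALOG = {
    "MOUNTAINS": {"NAME": "mountain", "ELEVATION": "height", "MAP": "map"},
    "CAMPSITES": {"SITE": "campsite", "CAPACITY": "capacity", "TYPE": "type"},
}


def propose(param_types: list) -> list:
    """Relations containing a column of every parameter type, with those columns."""
    proposals = []
    for relation, columns in CATALOG.items():
        # If two columns share a type, the last one wins in this sketch.
        by_type = {ctype: col for col, ctype in columns.items()}
        if all(t in by_type for t in param_types):
            proposals.append((relation, [by_type[t] for t in param_types]))
    return proposals
```

`propose(["mountain", "height"])` yields `[("MOUNTAINS", ["NAME", "ELEVATION"])]`, the defaults a menu would show in boldface; an empty result is the case where a join would have to be proposed or the user asked.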

7.2 Specification by Menu

In our previous experience with LDC, we found that a large variety of meanings could be defined by a predicate in which the result of some function is compared using some relational operator to a specified [...] enhancement to this scheme where definitions (a) may involve more than one argument, (b) may contain more than one function reference, and (c) are acquired in menu form. The current internal representation of a menu specification is a triple of the form suggested by

    <spec>  -> <term> <relop> <term>
    <term>  -> <atom> | <func> ( <atom> )
    <atom>  -> <constant> | <parameter>
    <relop> -> = | < | <= | > | >= | !=

Which relation gives the meaning of HEIGHT of MOUNTAIN?
    MOUNTAINS: NAME, ELEVATION, MAP
    CAMPSITES: SITE, CAPACITY, TYPE
    ...
    [Join Two Relations]
    [return]

To find the HEIGHT of a MOUNTAIN:
    Which column gives MOUNTAIN: NAME  ELEVATION  MAP
    Which column gives HEIGHT:   NAME  ELEVATION  MAP
    MOUNTAINS: NAME (WASHINGTON, ADAMS, ...)  ELEVATION (1917, 1768, ...)  MAP (6, 6, ...)
    Exit [ ]

Figure 5: Database Specification
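The menu-specification grammar of Section 7.2 is small enough to interpret directly. The Python sketch below is an assumption-laden illustration: it writes a parameter as a string beginning with "?", a <func>(<atom>) term as a (function, atom) pair, supplies functions as a dictionary, and conjoins several constraints, as Section 7 says TELI presently does. The example definition of "large file", its threshold, and all names are hypothetical.

```python
import operator

RELOPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
          "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def eval_atom(atom, bindings: dict):
    """<atom> -> <constant> | <parameter>; parameters are looked up in bindings."""
    if isinstance(atom, str) and atom.startswith("?"):
        return bindings[atom]
    return atom


def eval_term(term, bindings: dict, functions: dict):
    """<term> -> <atom> | <func> ( <atom> ), the latter encoded as (func, atom)."""
    if isinstance(term, tuple):
        func, atom = term
        return functions[func](eval_atom(atom, bindings))
    return eval_atom(term, bindings)


def eval_spec(spec: tuple, bindings: dict, functions: dict) -> bool:
    """<spec> -> <term> <relop> <term>."""
    left, relop, right = spec
    return RELOPS[relop](eval_term(left, bindings, functions),
                         eval_term(right, bindings, functions))


def eval_definition(specs: list, bindings: dict, functions: dict) -> bool:
    """Multiple constraints are conjoined."""
    return all(eval_spec(s, bindings, functions) for s in specs)


# Hypothetical "FILE is LARGE": the LENGTH of the file exceeds 300.
LENGTHS = {"sage": 412, "notes": 120}
functions = {"length": LENGTHS.__getitem__}
large = [(("length", "?file"), ">", 300)]
```

Here `eval_definition(large, {"?file": "sage"}, functions)` is true and the same call with `"notes"` is false; adding a second triple to `large` narrows the meaning, which is the conjunction behavior described in Section 7.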

An example of how menu semantics operates is given in Figure 6. When a semantics menu first appears, its "Function" field contains a list of all functions known to apply to at least one of the entities that the definition relates to. This reduces the number of keystrokes required from the user and, more importantly, helps guard against an inadvertent proliferation of concept names.
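Populating the "Function" field can be sketched as a filter over a catalog of known functions. The catalog and the function `function_menu` below are hypothetical, with entries chosen to echo Figure 6.

```python
# Hypothetical catalog: function name -> entity type it applies to.
FUNCTIONS = {
    "CREATION-DATE": "file", "LENGTH": "file", "OWNER": "file",
    "HEIGHT": "mountain", "CLASS": "student",
}


def function_menu(entity_types: list) -> list:
    """Functions applying to at least one entity in the definition, then "(none)"."""
    wanted = set(entity_types)
    return sorted(f for f, etype in FUNCTIONS.items() if etype in wanted) + ["(none)"]
```

For a definition about files, `function_menu(["file"])` gives `["CREATION-DATE", "LENGTH", "OWNER", "(none)"]`; offering existing names first, rather than a blank field, is what steers users away from coining new ones.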

7.3 English and English-Like Specifications

In addition to the database and menu schemes just described, users may supply definitions in terms of English already known to the system. Some advantages to this are that (1) definitions may be arbitrarily complex, limited only by the coverage of the underlying syntactic component, and (2) users will implicitly be learning to supply semantics at the same time they learn to use the NLP itself. Some disadvantages are that (1) a user might want to define something that cannot be paraphrased within the bounds of the grammatical coverage of the system, and (2) unless optimizations are carried out, references to user-defined concepts may entail inefficient processing.

An alternative to English specification, which functions similarly from the user's standpoint, is to provide for "English-like" specifications, in which an expression supplied by the user is translated by some pattern-matching algorithm different from, and probably less sophisticated than, the process involved in actual English parsing. The primary advantage of English-like specification over English specification is that translations into internal form can be more efficient, since definitions or parts of definitions will be handled on a case-by-case basis. One probable disadvantage is that the scheme will be less general, in terms of definable concepts, and perhaps "spotty" in terms of what it makes available.

In TELI, both English and English-like specification are done in terms of sample domain values, which are treated as formal parameters. An example appears in Figure 7. In the current implementation, English-like specifications include (a) any definition definable by menu, and (b) definitions that involve (possibly negated) adjective or noun references. As of this writing, only English specifications that involve no nested parameter references can be processed.
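An English-like translator in this spirit can be as simple as a set of patterns. The Python sketch below handles a single hypothetical pattern, "the <function> of <sample> is greater|less than <number>", of the kind typed in Figure 7; it replaces the sample value with a formal parameter and returns a <term> <relop> <term> triple of the kind defined in Section 7.2. The pattern, the name `translate`, the parameter name, and the threshold in the example are assumptions.

```python
import re

PATTERN = re.compile(
    r"the (?P<func>[\w-]+) of (?P<sample>[\w-]+) is (?P<cmp>greater|less) than (?P<num>\d+)",
    re.IGNORECASE,
)


def translate(text: str, sample: str, param: str = "?subj"):
    """Return ((func, param), relop, number), or None if the pattern does not apply."""
    m = PATTERN.fullmatch(text.strip())
    if m is None or m["sample"].lower() != sample.lower():
        return None  # outside this pattern's coverage; another modality is needed
    relop = ">" if m["cmp"].lower() == "greater" else "<"
    return ((m["func"].lower(), param), relop, int(m["num"]))
```

`translate("the height of adams is greater than 1500", "Adams")` returns `(("height", "?subj"), ">", 1500)`, while any phrasing outside the pattern returns `None`, which is the "spotty" coverage noted above.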

7.4 Specification by Borrowing

In addition to whatever mechanisms an NL system specifically provides for semantic acquisitions, it is reasonable to allow users to define one meaning directly in terms of another (in addition to indirect dependence, as in the case of English specification). In TELI, users may ask to "borrow" from an existing meaning at any time. As shown in Figure 8, the system responds by finding all current items defined in terms of all or some of the parameters (i.e., entities) of the item for which the borrowing is being done. This assures that the entire borrowed meaning can be modified to apply to the item being defined. After being copied, a borrowed meaning may be edited just as though it had been entered from scratch.
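Borrowing can be sketched as a search for definitions whose parameter types are a subset of the new item's, followed by a deep copy that the user can then edit. The Python below is hypothetical throughout (the `DEFINITIONS` store, the function names, and the sample definitions).

```python
from copy import deepcopy

# key -> (parameter entity types, list of constraints); hypothetical entries.
DEFINITIONS = {
    ("freshman", "noun", "student"): (("student",), [(("class", "?subj"), "=", 1)]),
    ("senior", "noun", "student"): (("student",), [(("class", "?subj"), "=", 4)]),
    ("tall", "adj", "mountain"): (("mountain",), [(("height", "?subj"), ">", 1500)]),
}


def borrow_candidates(target_types: tuple) -> list:
    """Items defined over all or some of the target's parameter types."""
    wanted = set(target_types)
    return [key for key, (types, _) in DEFINITIONS.items() if set(types) <= wanted]


def borrow(source_key: tuple, target_key: tuple, target_types: tuple) -> list:
    """Copy the source meaning to the target; the copy is edited independently."""
    _, body = DEFINITIONS[source_key]
    DEFINITIONS[target_key] = (target_types, deepcopy(body))
    return DEFINITIONS[target_key][1]


# Define "advanced student" starting from "senior", then relax the constraint.
body = borrow(("senior", "noun", "student"), ("advanced", "adj", "student"), ("student",))
body[0] = (("class", "?subj"), ">=", 3)
```

Because the body is deep-copied, editing the borrowed meaning leaves the definition of "senior" unchanged.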

Adjective: FILE is LARGE   [Sample Usage: Sage is LARGE]
    Function:  CREATION-DATE  LENGTH  OWNER  (none)    other: NIL
    Argument:  Sage                                    other: NIL
    Relation:  !=  <  <=  >  >=
    Function:  CREATION-DATE  LENGTH  OWNER  (none)    other: NIL
    Argument:  300  Sage                               other: NIL
    Retain this definition: Yes  No
    Exit [ ]

Figure 6: Menu Specification

Adjective: MOUNTAIN is TALL   [Sample Usage: Adams is TALL]
Type an English(like) Reference:
    the height of adams is greater than [...]

Figure 7: English-like Specification


Is the meaning of STUDENT is ADVANCED related to one of the following?
    STUDENT is a FRESHMAN
    STUDENT is a GRADUATE
    STUDENT is a GRADUATE STUDENT
    STUDENT is a JUNIOR
    STUDENT is a SENIOR
    STUDENT is a SOPHOMORE
    STUDENT is an UNDERGRADUATE
    ...
    CLASS of STUDENT

Figure 8: Borrowing a Meaning

8 Relation to Similarly Motivated Systems

At the most abstract level, our approach to transportability is unusual in that we have begun by building a moderately sophisticated NLP which, from the outset, fundamentally includes replete customization [...] first built, perhaps over a period of several years, a [...] distinctive, though perhaps less so, in seeking to allow for customization by end users, as opposed to (say) a database administrator (cf. Thompson and Thompson, 1975, 1983, 1985; Johnson, 1985).

Some of the systems which, like TELI, seek to provide for user customization within the context of database query are ASK (Thompson and Thompson, 1983, 1985), formerly REL (Thompson and Thompson, 1975), from Caltech; INTELLECT, formerly Robot (Harris, 1977), marketed by Artificial Intelligence Corporation; IRUS (Bates and Bobrow, 1983; Bates, Moser, and Stallard, 1984), from BBN Laboratories; TQA (Damerau, 1985), formerly REQUEST (Plath, 1976), from IBM Yorktown Heights; TEAM (Grosz, 1983; Grosz et al., 1985), from SRI International; and USL (Lehmann, 1978), from IBM Heidelberg. Other [...] DATALOG (Hafner and Godden, 1985), from General Motors Research Labs; HAM-ANS (Wahlster, 1984), from the University of Hamburg; and PHLIQA (Bronnenberg et al., 1978-1979), from Philips Research.

We now provide a comparison of TELI's customization strategies with those of the TEAM, IRUS, TQA, and ASK systems (other comparisons would also have been instructive, time and space permitting). Although we have recently spoken with at least one designer of each of these systems (see the Acknowledgements), it is possible that, in addition to intended simplifications, we may have overlooked or [...] undocumented features, in which case we apologize to the reader. Also, we note that our remarks are [...] overall quality of TELI or any other system.

8.1 A Comparison with TEAM

Both TEAM and TELI represent English-language interfaces that have been applied to several [...] Each system provides for a variety of customizations [...] system has claimed success with actual users in either customization or English processing mode. In terms of method, each system obtains (among other things) information about each column of each relation (table) of the database. We proceed to point out some of the more significant differences between the projects, as suggested by Grosz et al. (1985) and indicated by Martin (1986).

To begin with, TEAM incorporates a more powerful natural language processor than does TELI, with provisions for quantifiers, simple pronouns, conjunction, and numerous smaller features. Its "sort hierarchy" provides a taxonomy more general than that of TELI. It also incorporates disambiguation heuristics which seek to obviate the need for users to [...] prepositional phrases based on "on", "from", "with", and "in"), and its preparations to deal with time and place references are without counterpart in TELI.

On the other hand, the customization features of TELI appear to offer greater sophistication, and [...] sophistication, TELI always offers multiple ways of acquiring information, provides the ability to examine and borrow existing definitions, and is able to invoke the appropriate knowledge acquisition module when missing lexical, syntactic, or semantic information is required.

TELI generally provides for more complex definitions of words and phrases than does TEAM, as described in Sections 5-7. For example, whereas the SRI system typically requires a verb to map into some explicit or virtual relation (e.g. a join of explicit relations), TELI also allows an arbitrary number of properties of objects to be used in definitions (e.g. an old employee is one hired before 1980, or an employee admires a manager that works more hours than she does).
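To make the contrast concrete, the following is a minimal Python sketch, under our own assumptions, of the two example definitions written as predicates over properties of objects rather than as a mapping onto an explicit relation. The `Person` record and its `role`, `hire_year`, and `hours` fields are hypothetical stand-ins for columns of a first normal-form file; this is not TELI's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    role: str       # hypothetical column: "employee" or "manager"
    hire_year: int  # hypothetical column
    hours: int      # hypothetical column: hours worked per week


def is_old(p: Person) -> bool:
    """'An old employee is one hired before 1980': a 1-place predicate
    defined by a property of the object, not by an explicit relation."""
    return p.role == "employee" and p.hire_year < 1980


def admires(e: Person, m: Person) -> bool:
    """'An employee admires a manager that works more hours than she does':
    a 2-place predicate that compares a property of each argument."""
    return e.role == "employee" and m.role == "manager" and m.hours > e.hours


staff = [
    Person("Ames", "employee", 1975, 40),
    Person("Baker", "employee", 1984, 50),
    Person("Cole", "manager", 1970, 45),
]

old_employees = [p.name for p in staff if is_old(p)]
admiration = [(e.name, m.name) for e in staff for m in staff if admires(e, m)]
print(old_employees)  # ['Ames']
print(admiration)     # [('Ames', 'Cole')]
```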

In TEAM, "acquisition is centered around the relations and fields in the database". In contrast, TELI provides several customization modes, as described in Section 3, and discourages low-level database specifications.


In contrast to the principles we espoused for TELI in Section 3, TEAM couples its methods of acquisition with the type of modifier being defined. For example, when it encounters a "feature field", which contains exactly two distinct values, the system asks for "positive adjectives" and "negative adjectives" associated with these values (e.g. "volcanic" is a positive adjective associated with the database value "Y"). In TELI, these relationships arise as a special case of the acquisitions shown in Figures 3, 6, and 7b.
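A minimal sketch of the feature-field idea, assuming a hypothetical `volcano` column holding "Y"/"N" and a hypothetical negative adjective "nonvolcanic": a column counts as a feature field when it has exactly two distinct values, and each acquired adjective is recorded as the (column, value) test it stands for. This illustrates the behavior described for TEAM; it is not TEAM's code.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def is_feature_field(rows: List[Row], column: str) -> bool:
    """A "feature field" contains exactly two distinct values."""
    return len({r[column] for r in rows}) == 2


def adjective_lexicon(column: str, pos_value: str, neg_value: str,
                      positive: List[str], negative: List[str]
                      ) -> Dict[str, Tuple[str, str]]:
    """Record each acquired adjective as the (column, value) test it denotes."""
    entries = {adj: (column, pos_value) for adj in positive}
    entries.update({adj: (column, neg_value) for adj in negative})
    return entries


def select(rows: List[Row], lexicon: Dict[str, Tuple[str, str]],
           adjective: str) -> List[str]:
    """Names of the rows that the adjective applies to."""
    column, value = lexicon[adjective]
    return [r["name"] for r in rows if r[column] == value]


mountains = [
    {"name": "Etna", "volcano": "Y"},
    {"name": "Everest", "volcano": "N"},
    {"name": "Fuji", "volcano": "Y"},
]

if is_feature_field(mountains, "volcano"):
    lexicon = adjective_lexicon("volcano", "Y", "N", ["volcanic"], ["nonvolcanic"])
    print(select(mountains, lexicon, "volcanic"))     # ['Etna', 'Fuji']
    print(select(mountains, lexicon, "nonvolcanic"))  # ['Everest']
```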

An interesting similarity between TEAM and TELI is that each provides for English(-like) definitions. For example, TEAM might be told that "a volcano erupts", from which it infers that a mountain erupts just in case it is a volcano.

8.2 A Comparison with IRUS

Another recently developed facility to allow [...] represented by the IRACQ component of the IRUS system (Ayuso and Weischedel, 1986). In addition to its practical value, IRACQ is intended as a vehicle that permits experimental work with sophisticated knowledge representation formalisms.

IRACQ is similar to TELI in shielding the user from the layout of the underlying data files. Another similarity is that each system accepts case frame specifications in English-like form, but IRACQ allows proper nouns as well as common nouns to be used. Thus a user might suggest the case frame of the verb "write" by saying "Jones wrote some articles". Since [...] relationships among defined concepts (e.g. nouns), IRACQ proceeds to ascertain which of the possibly several classes that "Jones" belongs to is the most general one that can act as the subject of "write".
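The search just described can be sketched as follows, assuming a hypothetical single-parent taxonomy and an `acceptable` callback that stands in for however IRACQ actually decides that a class may be the subject of "write" (for instance, by asking the user). The sketch gathers every class "Jones" belongs to, including superclasses, keeps those that are acceptable, and returns the one closest to the root.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical single-parent taxonomy; None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "thing": None,
    "person": "thing",
    "employee": "person",
    "author": "person",
}


def ancestors(cls: str) -> List[str]:
    """cls followed by its superclasses, most specific first."""
    chain: List[str] = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def depth(cls: str) -> int:
    """Distance from the root; smaller means more general."""
    return len(ancestors(cls)) - 1


def most_general_subject(classes: Iterable[str],
                         acceptable: Callable[[str], bool]) -> Optional[str]:
    """Most general class of the individual that may act as the subject."""
    candidates = {c for cls in classes for c in ancestors(cls)}
    allowed = [c for c in candidates if acceptable(c)]
    if not allowed:
        return None
    # Ties at equal depth are broken alphabetically, only for determinism.
    return min(allowed, key=lambda c: (depth(c), c))


# "Jones" is both an employee and an author; suppose any person may write.
jones_classes = ["employee", "author"]
print(most_general_subject(jones_classes, lambda c: c != "thing"))  # person
```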

One important difference between TELI and IRACQ is that IRUS distinguishes conceptual information, which resides within its KR framework, from the linguistic information that characterizes the English to be used. Thus, while IRACQ supports predicates, as does TELI, it assumes that any concepts needed to define a new language item have already been specified. These representations, acquired by a separate module called KREME, involve the KL-ONE notions of "concept" and "relation", which are similar to, but more sophisticated than, the 1- and 2-place predicates that come into existence during a session with TELI.

At present, IRACQ allows users to define case [...] phrases, and noun phrases involving "of". Its treatment of prepositional phrases is very much like that of TELI in that the head noun being modified is considered part of the noun-preposition-noun triple for which a definition is being acquired (cf. Section 4.1). Definitions for individual words (e.g. nouns and adjectives) are not supported but are being considered for future versions of the system, as are facilities that enable the system to inform the user of existing predicates that might be useful in defining a new language item. This facility will be similar in spirit to TELI's provisions for "borrowing" definitions, as described in Section 7.4.

8.3 A Comparison with TQA

Unlike most efforts at transportability, TQA has been designed as a working prototype, capable of being customized for complex database applications by actual users. The primary responsibility of the customization module is to acquire information that relates language concepts (e.g. the subject of a given verb) to the columns of the database at hand.

Like TELI, TQA avoids having to copy all database values into the lexicon by constructing "shape" information to recognize numbers and similar patterns. For example, the system might deduce that all database values referring to a department are of the form "letter followed by two digits", which allows for valuable disambiguations during parsing. Thus, in a database where employees manage projects and supervisors manage departments, the question "Who manages K34?" can be understood to be asking about supervisors without having to find "K34" in either the lexicon or the database.
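A minimal sketch of shape-based disambiguation, assuming shapes are computed by mapping letters to `L` and digits to `D`. The column names, sample values, and the `MANAGER_OF` mapping are hypothetical, and TQA's actual shape language may well be richer than this.

```python
from typing import Dict, List, Set


def shape(value: str) -> str:
    """Abstract a value to its shape: letters -> 'L', digits -> 'D'."""
    return "".join("L" if ch.isalpha() else "D" if ch.isdigit() else ch
                   for ch in value)


def column_shapes(values: List[str]) -> Set[str]:
    return {shape(v) for v in values}


# Hypothetical sample values for two columns.
SHAPES: Dict[str, Set[str]] = {
    "department": column_shapes(["K34", "A17", "M02"]),
    "project": column_shapes(["PX-1001", "QZ-2040"]),
}

# Hypothetical: who manages objects of each kind.
MANAGER_OF = {"department": "supervisor", "project": "employee"}


def who_manages(token: str) -> List[str]:
    """Readings of 'Who manages <token>?' consistent with the token's shape."""
    s = shape(token)
    return [MANAGER_OF[col] for col, shapes in SHAPES.items() if s in shapes]


print(shape("K34"))          # LDD
print(who_manages("K34"))    # ['supervisor']
print(who_manages("RT-77"))  # []
```

Neither the lexicon nor the data is searched for "K34" itself; only its shape is compared against the shapes recorded for each column.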

A related problem, which TQA addresses more squarely than most systems (including TELI), concerns the appearance and possible equivalence of database values. For example, "vac lnd" might indicate "vacant land", "grn" and "green" might be used interchangeably, and so forth. Many practical applications require that these sorts of issues be addressed in order for a user to obtain reliable information.
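One simple way to treat such equivalences, sketched here under our own assumptions, is to normalize case and spacing and then consult an alias table acquired during customization. The `ALIASES` table is hypothetical, and this is not a description of TQA's mechanism.

```python
# Hypothetical alias table acquired during customization.
ALIASES = {"vac lnd": "vacant land", "grn": "green"}


def canonical(value: str) -> str:
    """Normalize case and internal spacing, then map known aliases."""
    v = " ".join(value.lower().split())
    return ALIASES.get(v, v)


def same_value(a: str, b: str) -> bool:
    return canonical(a) == canonical(b)


print(same_value("grn", "Green"))             # True
print(same_value("vac  lnd", "vacant land"))  # True
print(same_value("grn", "gray"))              # False
```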

Another useful feature concerns the acquisition [...] formatting. In simple cases, a database administrator might want nine-digit values appearing in columns associated with social security numbers to be printed with dashes at the appropriate points (e.g. 123456789 becomes 123-45-6789). In more complicated situations, values might actually need to be decoded, so that 0910 becomes "vacant land". This provision for decoding is similar to the form of intermediate acquisition shown in Figure 3, though here it is being used for the opposite effect.
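The two formatting cases can be sketched as per-value output functions. The function names and the `LAND_USE_CODES` decoding table are hypothetical illustrations of the behavior described, not TQA's interface.

```python
from typing import Dict


def format_ssn(value: str) -> str:
    """Print a nine-digit value with dashes: 123456789 -> 123-45-6789."""
    if len(value) != 9 or not value.isdigit():
        raise ValueError(f"not a nine-digit value: {value!r}")
    return f"{value[:3]}-{value[3:5]}-{value[5:]}"


# Hypothetical decoding table for a land-use column.
LAND_USE_CODES: Dict[str, str] = {"0910": "vacant land"}


def decode(value: str, table: Dict[str, str]) -> str:
    """Replace a stored code by its description; unknown codes print as-is."""
    return table.get(value, value)


print(format_ssn("123456789"))         # 123-45-6789
print(decode("0910", LAND_USE_CODES))  # vacant land
```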


8.4 A Comparison with ASK

The current ASK prototypes, which run on Sun, Vax, and HP desktop systems, are derived from earlier work on the REL system, which itself derives from work on the DEACON project, which stems from the early 1960s. Unlike most recent efforts, [...] features into an existing more-or-less single-domain system, the work with REL, the "Rapidly Extensible [...] capabilities as early as 1969.

To begin with, ASK provides quite general customization facilities, allowing English definitions at least as sophisticated as those outlined in Section 7.3. An example is "ships 'carry' coal to Oslo if there is a shipment whose carrier is ships, type is coal and destination is Oslo". Arithmetic facilities are also provided, e.g. "area equals length times beam".
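Read operationally, the "carry" definition is an existential test over a shipment relation, and the area definition is a computed attribute. The sketch below assumes hypothetical `Shipment` and `Ship` records; it illustrates what the two definitions mean, not how ASK stores or evaluates them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Shipment:  # hypothetical record type
    carrier: str
    cargo_type: str
    destination: str


@dataclass(frozen=True)
class Ship:  # hypothetical record type
    name: str
    length: float
    beam: float


def carries(ship: str, cargo: str, port: str, shipments: List[Shipment]) -> bool:
    """A ship 'carries' cargo to a port if some shipment has that carrier,
    type, and destination."""
    return any(s.carrier == ship and s.cargo_type == cargo and s.destination == port
               for s in shipments)


def area(ship: Ship) -> float:
    """'area equals length times beam'."""
    return ship.length * ship.beam


shipments = [Shipment("Vega", "coal", "Oslo"), Shipment("Nord", "grain", "Bergen")]
print(carries("Vega", "coal", "Oslo", shipments))  # True
print(carries("Nord", "coal", "Oslo", shipments))  # False
print(area(Ship("Vega", 100.0, 15.0)))             # 1500.0
```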

[...] integrated information management system, rather than provide simple sentence-by-sentence database retrieval. One feature allows ASK to be connected to several external database systems, drawing information from each of them in the context of answering a user's question. A second feature allows a user to provide a specification of a record type, followed by information used to populate the newly created relation.

Acknowledgements

The current TELI system derives from work on the LDC project, which was carried out at Duke University by John Lusth and Nancy Tinkham. In converting the NL portions of LDC to operate in our [...] Bachenko, Alan Biermann, Marcia Derr, George Heidorn, Mark Jones and Mitch Marcus. We also wish to thank Paul Martin of SRI, Damaris Ayuso and Ralph Weischedel of BBN, Fred Damerau of IBM Yorktown Heights, and Fred Thompson of Caltech, for their willingness to answer a number of questions that helped us to formulate the comparisons given in Section 8. Finally, we wish to thank Marcia Derr for many useful comments on a draft of our paper.

References

[...] Communication, April 1986.

[...] Cognition and Brain Theory 5, 3 (1982), 269-287.

Ballard, B. "The Syntax and Semantics of User-[...] University, July 1984, 52-56.

Ballard, B. "User Specification of Syntactic Case Frames in TELI, A Transportable, User-Customized Natural Language Processor", Proc. Coling-86, Bonn, West Germany, August 1986.

Ballard, B., Lusth, J., and Tinkham, N. "LDC-1: A Transportable Natural Language Processor for Office [...] Information Systems 2, 1 (1984), 1-23.

Ballard, B. and Tinkham, N. "A Phrase-Structured Grammatical Framework for Transportable Natural Language Processing", Computational Linguistics 10, 2 (1984), 81-96.

Bates, M. and Bobrow, R. "A Transportable Natural Language Interface for Information Retrieval", Proc. 6th Int. ACM SIGIR Conference, Washington, D.C., June 1983.

Bates, M., Moser, M. and Stallard, D. "The IRUS Transportable Natural Language Interface", Proc. First Int. Workshop on Expert Database Systems, Kiawah Island, October 1984, 258-274.

[...] Schoenmakers, W. and van Utteren, E. "PHLIQA-1, [...] Consultation in Natural English", Philips Tech. Rev. 38 (1978-79), 229-239 and 269-284.

[...] Customization of Natural Language Database Front Ends", ACM Transactions on Office Information Systems 3, 2 (1985), 165-184.

[...] Language Interface System", Conf. on Applied Natural Language Processing, Santa Monica, 1983, 39-45.

Grosz, B., Appelt, D., Martin, P. and Pereira, F. "[...] Transportable Natural-Language Interfaces", Artificial Intelligence, in press.

Hafner, C. and Godden, C. "Portability of Syntax and Semantics in Datalog", ACM Transactions on Office Information Systems 3, 2 (1985), 141-164.


Harris, L. "User-Oriented Database Query with the ROBOT Natural Language System", Int. Journal of Man-Machine Studies 9 (1977), 697-713.

[...] Commercial Applications, Ovum Ltd, London, 1985.

Lehmann, H. "Interpretation of natural language in an information system", IBM J. Res. Dev. 22, 5 (1978), pp. 560-571.

Martin, P. Personal communication, March 1986.

Tennant, H. "Experience With the Evaluation of Natural Language Question Answerers", Int. Joint Conf. on Artificial Intelligence, 1979, pp. 275-281.

Thompson, F. and Thompson, B. "Practical Natural Language Processing: The REL System as Prototype", In Advances in Computers, Vol. 3, M. Rubinoff and M. Yovits, Eds., Academic Press, 1975.

Thompson, B. and Thompson, F. "Introducing ASK: A Simple Knowledgeable System", Conf. on Applied Natural Language Processing, Santa Monica, 1983, 17-24.

[...] Transportable in Half a Dozen Ways", ACM Trans. on Office Information Systems 3, 2 (1985), 185-203.

Wahlster, W. "User Models in Dialog Systems", Invited talk at Coling-84, Stanford University, July 1984.
