Semantic Acquisition In TELI: A Transportable, User-Customized Natural Language Processor
Bruce W. Ballard
Douglas E. Stumberger
AT&T Bell Laboratories
600 Mountain Avenue
Murray Hill, NJ 07974
Abstract
We discuss ways of allowing the users of a natural
language processor to define, examine, and modify the
definitions of any domain-specific words or phrases
known to the system. An implementation of this work
forms a critical portion of the knowledge acquisition
component of our Transportable English-Language
Interface (TELI), which answers English questions
about tabular (first normal-form) data files and runs
on a Symbolics Lisp Machine. However, our
techniques enable the design of customization modules
that are largely independent of the syntactic and
retrieval components of the specific system they
supply information to. In addition to its obvious
practical value, this area of research is important
because it requires careful attention to the formalisms
used by a natural language system and to the
interactions among the modules based on those
formalisms.
1 Introduction
In constructing the Transportable English-Language
Interface system (TELI), we have sought to address
problems of both an applied and a scientific nature.
Concerning the applied side of computational
linguistics, we seek to redress the fact that many
natural language prototypes, despite their
sophistication and even their robustness, have fallen
into disuse because of failures (1) to make known to
users exactly what inputs are allowed (e.g. what words
and phrases are known) and (2) to provide
capabilities that meet the precise needs of a given user
or group of users (e.g. appropriate vocabulary, syntax
and semantics). Since experience has shown that
neither users nor system designers can predict in
advance all the words, phrases and meanings that will
arise in accessing a given database (cf. Tennant, 1979),
we have sought to make TELI "transportable" in an
extreme sense, where customizations may be performed
(1) by end users, as opposed to the system designers,
and (2) at any time during the processing of English
sentences, rather than only before English processing
may occur.
In addition to the potential practical benefits of
customization, well-conceived transportability projects
can make useful scientific contributions to
computational linguistics, since single-domain systems
and, to a lesser extent, systems adapted over weeks or
months by their designers, afford opportunities to
circumvent, rather than squarely address, important
issues concerning (a) the precise nature of the
formalisms the system is built around and (b) the
interactions among system modules. Although
customization efforts offer [...], the issues noted
above are less likely to go unnoticed when dealing with
a system whose domain-specific information is supplied
at run-time, especially when that information is being
provided by the actual users of the system.
By way of overview, we note that the TELI system
derives from previous work on the LDC project, as
documented in Ballard (1982), Ballard (1984), Ballard,
Lusth and Tinkham (1984) and Ballard and Tinkham
(1984). The initial prototype of TELI, which runs on a
Symbolics Lisp Machine, is designed to answer English
questions about information stored in one or more
tables (i.e. a first-normal-form relational database).
A sample view of the display screen during a session
with TELI, which may give the flavor of how the system
operates, is shown in Figure 1. Information on some
aspects of knowledge acquisition not discussed in this
paper, particularly with regard to syntactic case
frames, can be found in Ballard (1986).
2 Types of Modifiers Available in TELI
The syntactic and semantic models adopted for TELI
are intended to provide a unified treatment of a broad
and extendible class of word and phrase types. By
providing for an "extendible" class of constructs, we
make the knowledge acquisition module of TELI
independent of the natural language portion of the
system, whose earlier version has been described in
Ballard and Tinkham (1984) and Ballard, Lusth and
Tinkham (1984). In the remainder of this paper, the
reader should bear in mind that the acquisition
modules of TELI, including the menus they generate,
are driven by extensible data structures that convey
the linguistic coverage of the underlying natural
language processor (NLP) for which information is
being acquired. For example, incorporating adjective
phrases into the system involved adding 12 lines of
Lisp-like data specifications. This brevity is largely
due to the use of case frames that embody dynamically
alterable selectional restrictions (Ballard, 1986).
As an initial feeling for the coverage of the
NLP for which information is currently acquired,
TELI provides semantics for the word categories
    Adjective               e.g. an expensive restaurant
    Noun Modifier           e.g. a graduate student
    Noun                    e.g. a pub
and the phrase types
    Adjective Phrase        e.g. employees responsible for the planning projects
    Noun-Modifier Phrase    e.g. the speech researchers
    Prepositional Phrase    e.g. the trails on the Franconia-Region map
    Verb Phrase             e.g. employees that report to Brachman
    Functional Noun Phrase  e.g. the size of department 11387,
                                 the colleagues of Litman
In addition to these user-defined modifier types, the
system currently provides for negation, comparative
and superlative forms of adjectives, possessives, and
ordinals. Among the grammatical features supported
[...] prepositional and adjective phrases, fronting of
verb phrase complements, and other minor features.
One important area for expansion involves quantifiers,
both logical (e.g. "all") and numerical (e.g. "at least 3").
3 Principles Behind Semantic Acquisition
As noted above, our goal is to devise techniques
that enable end users of a natural language processor
to furnish all domain-specific information needed by
the system. This information includes (1) the vocabulary
needed for the data at hand; (2) various types of
selectional restrictions that define acceptable phrase
attachments; and, most critically, (3) the definitions of
words and phrases. With this in mind, the primary
criteria around which the semantic acquisition
component of TELI has been designed are as follows.
To allow users to define, examine or modify
domain-specific information at any time. This derives
from our belief that the needs of a user or group of
users cannot all be predicted in advance, and will
probably change once the system has begun operation.
To enable users to impart new concepts to the system.
We provide more than just synonym and paraphrase
capabilities; in fact, definitions may be arbitrarily
complex, being defined either (a) in terms of other
definitions, or (b) as the conjunction of an arbitrary
number of constraints.
English Input:
    Which trails that aren't long lead to a mountain on Franconia Ridge
Internal Representation:
    (TRAIL (VERBINFO (TRAIL LEAD NIL NIL TO MOUNTAIN)
                     (SUBJ ?) (ARG (MOUNTAIN (QUANT = NIL)
                                     (PREPINFO (MOUNTAIN ON TRAIL)
                                               (SUBJ ?) (ARG (TRAIL (= FRANCONIA-RIDGE)))))))
           (NOT (ADJ LONG)))
Algebra Query:
    (SELECT trails (trail length-km)
      (and (< length-km 6)
           (= trail (SELECT
                      (TJOIN trails(trail) = mtn-trails(trail)) (trail)
                      (= name (SELECT
                                (TJOIN mountains(name) = mtn-trails(name)) (name)
                                (= trail 'franconia-ridge)))))))
Answer:
    (TRAILS) TRAIL - LENGTH-KM
    OLD-BRIDLE-PATH   4.1
    LIBERTY-SPRING    4.7
What's Your Pleasure?
    Answer a Question
    Edit the Last Input
    Print Parse Tree
    Run Pieces of the NLP
    Exit
    Begin a Customization: Vocabulary, Syntax, Semantics
    General Info
    Clear Screen
    Edit Global Flags
    Save/Retrieve Session
Figure 1: Sample Display Screen; Top-Level Menu of TELI
To provide definition capabilities independent of
modifier type. Adjectives, prepositional phrases, verb
phrases, and so forth are all defined in precisely the
same way. This is achieved in part by treating all
modifiers as n-place predicates.
To allow definitions to be given at various conceptual
levels. Definitions may be given (a) in English; (b) in
terms of the meanings of previously defined words or
phrases; (c) in terms of "conceptual" relationships,
which have been abstracted to a level above that of
the physical data files; or (d) in terms of database
columns. We strive to minimize the need for low-level
database references, since this helps (1) to avoid
tedious and redundant references, and (2) to assure
that most of our techniques will be applicable beyond
the current conventional database setting.
To provide multiple modalities for making definitions.
For example, the menu scheme described in Section 7.2
offers the user more assistance in making definitions,
but is less powerful, than the alternative English and
English-like methods described in Section 7.3. We
prefer to let users decide when each modality is
appropriate, rather than force a compromise among
simplicity, reliability, and power.
To enable the system to provide help or guidance to the
user. Since the system knows all current modifiers of,
or functions associated with, a given entity,
opportunities exist for co-operation on the part of the
system. To avoid unnecessary limitations, however,
users are generally able to override any hints made by
the system.
4 Semantic Processing in TELI
The semantic model developed for TELI, in
which definitions are acquired from users, assumes
that (1) modifier meanings will be purely extensional,
and can thus be treated as n-place predicates, and (2)
the semantics of an input will be compositional.
Concerning the latter assumption, we note that (a)
many ambiguities, including problems of word sense,
will have been resolved by selectional restrictions
(Ballard and Tinkham, 1984), and (b) minimal
re-ordering does occur in converting parse trees into
internal representations.
4.1 Types of Semantics
All user-defined semantics, however acquired,
are stored in a global Lisp structure indexed by the
word or phrase being defined. Single-word modifiers
are indexed by the word being defined, its part of
speech, and the entity it modifies; phrasal modifiers
are indexed by the phrase type and the associated case
frame. For example, the internal references
(new adj room) (prep-ph (restaurant in county))
respectively index the definitions of "new", when used
as an adjective modifier of rooms, and "in", as it
relates restaurants to counties. As suggested by this
indexing scheme, word meanings arise only in the
context of their occurrence, never in isolation. Thus,
"new room" and "restaurant in county" receive
definitions, not "new" or "in". This decision lends
[...] additional effort thereby needed to make multiple
[...] borrowed meanings, as described in Section 7.4.
Although our representation strategies allow for
definitions that involve relatively elaborate traversals
of the physical data files, TELI does not presently
provide for arithmetic computations. Thus, the input
"Which restaurants are within 3 blocks of China
Gardens?" requires a 2-place "distance" function and,
unless the underlying data files provide distances
([...] distances to account for), the necessary
semantics cannot be supplied.
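The indexing scheme described above can be illustrated with a small sketch. It is written in Python rather than the Lisp of the actual system, and every name in it (SEMANTICS, word_key, phrase_key, define, lookup) is hypothetical; the stored meanings are placeholder strings, since the internal form of a definition is not being modeled here.

```python
# A minimal sketch of context-indexed definitions, assuming a plain
# dictionary stands in for TELI's global Lisp structure.
SEMANTICS = {}


def word_key(word, part_of_speech, entity):
    """Index for a single-word modifier: word, part of speech, entity modified."""
    return (word, part_of_speech, entity)


def phrase_key(phrase_type, case_frame):
    """Index for a phrasal modifier: phrase type and its case frame."""
    return (phrase_type, tuple(case_frame))


def define(key, meaning):
    SEMANTICS[key] = meaning


def lookup(key):
    # Returns None when the modifier has no definition in this context.
    return SEMANTICS.get(key)


# The two internal references from the text, with placeholder meanings.
define(word_key("new", "adj", "room"), "meaning of 'new room'")
define(phrase_key("prep-ph", ["restaurant", "in", "county"]),
       "meaning of 'restaurant in county'")
```

Because the entity is part of the key, looking up "new" as a modifier of some other entity finds nothing, which mirrors the point that "new room" receives a definition while "new" in isolation does not.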
4.2 Internal Representations
As an example of the "internal representation" (IR)
of an input, which results from a recursive traversal
of a completed parse tree, and which illustrates
preparations for compositional analysis, the
(artificially complex) input
"Which Mexican restaurants in the largest city other than New Providence that are not expensive are open for lunch?"
will have (roughly) the internal representation
(restaurant (not (adj expensive))
            (nounmod-ph (food restaurant)
                        (nounmod (food (= Mexican))) (head ?))
            (prep-ph (restaurant in city)
                     (subj ?) (arg (city (super large)
                                         (!= New-Providence))))
            (adj-ph (restaurant open for meal) (subj ?)
                    (arg (meal (= lunch)))))
This top-level interpretation of the input instructs
the system to find all restaurants that satisfy (a) the
negation of the 1-place predicate associated with
"expensive", and (b) the three 2-place predicates
associated with the noun-noun, prepositional, and
adjective phrases. Note that phrasal modifiers are
referenced by their case frame, e.g. "restaurant in
city". Within the scope of these references, case
labels (e.g. "subj" and "arg") indicate which slots
have been instantiated and which slot has been
relativized, the latter denoted by "?". The list of
slot names associated with each phrase type is stored
globally. In most instances, the argument of a case
slot can be an arbitrary IR structure, in keeping with
the recursive nature of the English inputs being
recognized.
Since IR structures are built around the word
and phrase types of the English being dealt with, and
since the meanings of words and phrases are stored
globally, IR structures should not be regarded as a
"knowledge representation" in the sense of KL-ONE,
logical form and so forth. Systems similar in goals to
TELI but which revolve around logical form include
TEAM (Grosz, 1983; Grosz, Appelt, Martin, and
Pereira, 1985), IRUS (Bates and Bobrow, 1983; Bates,
Moser, and Stallard, 1984), and TQA (Plath, 1976;
Damerau, 1985). One system similar to TELI in
building intermediate structures that contain
references to language-specific concepts is
DATALOG (Hafner and Godden, 1985).
5 The Initial Phase of Customization
When a user asks TELI to begin learning about
a new domain, the system spends from five to thirty
minutes, depending on the complexity of the
application, obtaining basic information about each
table in the database (see Figure 2). Users are
first asked to give the key column of the table. This
information is used primarily to guide the system in
inferring the semantics of certain noun-noun and
"of"-based patterns. Next, users are asked which columns
contain entity values as opposed to property values.
Typical properties are "size", "color", and "length",
which differ from entities in that (a) their values do
not appear as an argument to arbitrary verbs and
prepositions (e.g. other than "have", "with", etc.) and
(b) they will not themselves have properties associated
with them. Finally, users are asked to specify the type
of value each column contains. This information
allows subsequent references to concepts (e.g. "color")
rather than physical column names. It also aids the
system in forming subsequent suggestions to the user
(e.g. defaults that can be overridden).
Having obtained the information indicated
above, the system constructs definitions that allow
simple questions to be answered, such as
"What is Sally's social security number?"
"What is the age of John?"
Along with information freely volunteered by the
user, these definitions can be subsequently examined
or changed at the user's request.
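The three pieces of information gathered for each table (key column, entity versus property columns, and the type of value in each column) can be pictured as a small record per table. The sketch below is one possible arrangement, not the paper's representation: the class and method names are hypothetical, and the role and type given to the SSN and CLASS columns are assumptions, since only the entity types of STUDENT and ADVIS are legible in Figure 2.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    role: str        # "entity" or "property"
    value_type: str  # concept name used instead of the physical column name


@dataclass
class Table:
    name: str
    key: str
    columns: dict = field(default_factory=dict)

    def add(self, name, role, value_type):
        self.columns[name] = Column(name, role, value_type)

    def column_for(self, concept):
        """Map a concept such as "instructor" back to a physical column."""
        for column in self.columns.values():
            if column.value_type == concept:
                return column.name
        return None


# Answers a user might give for the STUDENT-INFO table of Figure 2.
student_info = Table("STUDENT-INFO", key="STUDENT")
student_info.add("STUDENT", "entity", "student")
student_info.add("SSN", "property", "social-security-number")  # assumed
student_info.add("CLASS", "property", "class")                 # assumed
student_info.add("ADVIS", "entity", "instructor")
```

With such a record in place, later definitions can refer to a concept like "instructor" and leave the lookup of the physical column ADVIS to the system.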
(a) STUDENT-INFO
    STUDENT   SSN           CLASS   ADVIS
    BILL      123-45-6789   1       BALLARD
    DOUG      111-22-3333   3       LITMAN
    FRED      321-54-9876   3       MARCUS
    JOHN      555-33-1234   2       JONES
    SALLY     314-15-9265   4       BRACHMAN
    SUE       987-65-4321   3       BACHENKO
    TERESA    333-22-4444   G       BORGIDA
Which is the "key" column of STUDENT-INFO?
    STUDENT (BILL, DOUG, ...)
    SSN (111-22-3333, ...)
    CLASS (1, 2, ...)
    ADVIS (BACHENKO, BALLARD, ...)
Columns of STUDENT-INFO        Entity   Property
    STUDENT (BILL ...)         [x]      [ ]
    SSN (111-22-3333 ...)      [ ]      [ ]
    CLASS (1 ...)              [ ]      [ ]
    ADVIS (BACHENKO ...)       [ ]      [ ]
    Return [ ]
Entity Type for STUDENT (BILL, DOUG, ...): student
Entity Type for ADVIS (BACHENKO, BALLARD, ...): instructor
Figure 2: Initial Acquisitions
Based upon the answers to the questions
described above, a small number of follow-up
questions, mostly unrelated to the subject of this
paper, will be asked. For example, the system will
propose its best guess as to the morphological variants
of nouns, verbs, and other words for the user to
confirm or correct.
6 Intermediate Customizations
Having learned about each physical relation,
TELI asks for information which, though not needed
immediately, is either (a) more simply obtained at the
outset, in a context relevant to its semantics, than at
a later, arbitrary point, or (b) acquirable
collectively, thus preventing several subsequent
acquisitions. Unlike the initial acquisitions described
in Section 5, intermediate customizations could be
excised from the system without any loss in processing
ability. We now summarize three forms of intermediate
customizations, the last of which may be requested by
the user at any time. Allowing users to ask for the
other forms as well would be a simple matter.
First, the system will ask which columns contain
values that either correspond to or are themselves
English modifiers. In Figure 2-a, the values "1"
through "G" in the "class" column might correspond
(respectively) to "freshman" through "graduate
student", in which case acquisitions might continue as
suggested in Figure 3. From this information, the
system constructs a definition for each user-defined
modifier; for example, the internal definition of
"sophomore" will be
((sophomore noun student) ((class p-noun) = 2))
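The step from encoded column values to one definition per modifier can be sketched as follows. The function name and the Python tuple layout are hypothetical; they simply mirror the shape of the internal definition shown above, with "p-noun" carried along as an opaque tag and column values kept as strings because the "class" column mixes digits with "G".

```python
def code_definitions(column, entity, part_of_speech, words_by_value):
    """Build one definition per encoded value, shaped like
    ((word pos entity) ((column p-noun) = value))."""
    definitions = {}
    for value, word in words_by_value.items():
        index = (word, part_of_speech, entity)
        definitions[index] = ((column, "p-noun"), "=", value)
    return definitions


# Words a user might associate with values of the "class" column.
class_words = {"1": "freshman", "2": "sophomore", "G": "graduate"}
class_definitions = code_definitions("class", "student", "noun", class_words)
```

Here class_definitions[("sophomore", "noun", "student")] corresponds to the "sophomore" definition given in the text.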
A second intermediate acquisition, carried out
subject to user confirmation, involves the acceptability
of hypothesized syntax and semantics for (a) phrases
based on "of", (b) phrases built around "have", "with"
and "in", and (c) noun-noun phrases. In deciding what
case frames to propose, TELI considers the
information it has already acquired about simple
functional ("of") relationships.
A third form of intermediate acquisition
involves the system's invitation for the user to give
lexical and syntactic information for one or more
user-defined categories, namely titles, adjectives,
common nouns, noun modifiers, prepositions, and
verbs. For example, the user might specify six
adjectives and the entities they modify, followed by
four or five verbs and their associated case frames,
and so forth.
7 On-Line Customization
In general, definitions are supplied to TELI
whenever (a) an undefined modifier is encountered
during the processing of an English input, or (b) the
user asks to supply or modify a definition. In each
case, the same methods are available for making
definitions, and are independent of the modifier type
being defined. When creating or modifying a
meaning, users are presented with information as
shown in Figure 4-a; upon asking to "add a constraint",
they are given the menu shown in Figure 4-b.
Multiple "constraints" appearing in a semantic
specification are presently presumed to be conjoined.
Which columns contain (encoded) English words?
    STUDENT (BILL, DOUG, ...)        [ ]
    ADVIS (BACHENKO, BALLARD, ...)   [ ]
    Abort [ ]   Return [ ]
Words associated with the CLASS value 1: freshman
Words associated with the CLASS value G: graduate
Modifiers in CLASS: Adjective   Nounmod   Noun
    Return [ ]
Figure 3: Intermediate Acquisitions
(a) Semantic Specification
    [ Sample Usage: Sage is LARGE ]
    the LENGTH of Sage > 300
    (add a constraint)
    [ return ]
(b) Define the semantics of
    Verb Phrase: TRAIL LEADS TO MOUNTAIN
    by  Menu Selection
        English(like) Reference
        Database References
        Borrowing from an Existing Meaning
    [ return ]
Figure 4: Top-Level Semantics Menus
As suggested in Figure 4-a and below, definitions
are made in terms of sample values, which the system
treats as formal parameters. In this way we avoid the
problem of defining a phrase two or more of whose case
slots may be filled by the same type of entity
(cf. "a student is a classmate of a student if ...").
To assure that any domain value may appear as a
constant, the user is able to alter the system's choice
of sample names at any time.
7.1 Specification at the Database Level
As noted in Section 3, semantic specifications at
the database level are primitive but useful. As shown
in Figure 5, a database level specification comprises
(a) a relation, possibly arrived at via a user-defined
join, and (b) references to columns that correspond to
the parameters of the phrase whose semantics is being
defined. In many cases, the system can utilize its
column type information, acquired as described in
Section 5, to predict both the relation to be used (or
pair of relations for joining) and the appropriate
columns to join over, in which case the menu(s) that
are presented will contain boldface selections for the
user to confirm or alter.
7.2 Specification by Menu
In our previous experience with LDC, we found
that a large variety of meanings could be defined by a
predicate in which the result of some function is
compared using some relational operator to a specified
value. TELI provides an enhancement to this scheme
where definitions (a) may involve more than one
argument, (b) may contain more than one function
reference, and (c) are acquired in menu form. The
current internal representation of a menu specification
is a triple of the form suggested by
Which relation gives the meaning of
HEIGHT of MOUNTAIN
    MOUNTAINS: NAME, ELEVATION, MAP
    CAMPSITES: SITE, CAPACITY, TYPE
    ...
    [Join Two Relations]
    [ return ]
To find the HEIGHT of a MOUNTAIN:
    Which column gives MOUNTAIN: NAME ELEVATION MAP
    Which column gives HEIGHT:   NAME ELEVATION MAP
    MOUNTAINS: NAME (WASHINGTON, ADAMS, ...)
               ELEVATION (1917, 1768, ...)
               MAP (6, 6, ...)
    Exit [ ]
Figure 5: Database Specification
<spec>  -> <term> <relop> <term>
<term>  -> <atom> | <func> ( <atom> )
<atom>  -> <constant> | <parameter>
<relop> -> = | < | <= | > | >= | !=
An example of how menu semantics operates is given
in Figure 6. When a semantics menu first appears, its
"Function" field contains a list of all functions known
to apply to at least one of the entities that the
definition relates to. This reduces the number of
keystrokes required from the user and, more
importantly, helps guard against an inadvertent
proliferation of concept names.
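As an illustration of the triple form given by the grammar above, the sketch below evaluates a specification for "FILE is LARGE" in which the LENGTH of the sample file Sage is compared with a constant. The evaluator, the Python encoding of terms, and the file data are all assumptions made for this example; in particular the threshold of 300 and the file named Memo are illustrative only.

```python
import operator

# Relational operators permitted by <relop>.
RELOPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
          ">": operator.gt, ">=": operator.ge, "!=": operator.ne}


def eval_atom(atom, bindings):
    """A parameter (a sample value such as "Sage") is replaced by its
    binding; anything else is treated as a constant."""
    if isinstance(atom, str):
        return bindings.get(atom, atom)
    return atom


def eval_term(term, bindings, functions):
    """<term> is either an atom or a (function, atom) pair."""
    if isinstance(term, tuple):
        function, atom = term
        return functions[function][eval_atom(atom, bindings)]
    return eval_atom(term, bindings)


def holds(spec, bindings, functions):
    """Evaluate a <term> <relop> <term> triple."""
    left, relop, right = spec
    return RELOPS[relop](eval_term(left, bindings, functions),
                         eval_term(right, bindings, functions))


# Illustrative data: function values per file.
FUNCTIONS = {"LENGTH": {"Sage": 380, "Memo": 120}}
LARGE_FILE = (("LENGTH", "Sage"), ">", 300)
```

Treating "Sage" as a formal parameter means the same triple can be tested against any file: holds(LARGE_FILE, {"Sage": "Memo"}, FUNCTIONS) checks whether Memo is large. Several such triples attached to one definition would be conjoined, as noted at the start of Section 7.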
7.3 English and English-Like Specifications
In addition to the database and menu schemes
just described, users may supply definitions in terms of
English already known to the system. Some
advantages to this are that (1) definitions may be
arbitrarily complex, limited only by the coverage of
the underlying syntactic component, and (2) users will
implicitly be learning to supply semantics at the same
time they learn to use the NLP itself. Some
disadvantages are that (1) a user might want to define
something that cannot be paraphrased within the
bounds of the grammatical coverage of the system,
and (2) unless optimizations are carried out,
references to user-defined concepts may entail
inefficient processing.
An alternative to English specification, which
functions similarly from the user's standpoint, is to
provide for "English-like" specifications in which an
expression supplied by the user is translated by some
pattern-matching algorithm different from, and
probably less sophisticated than, the process involved
in actual English parsing. The primary advantage of
English-like specification, over English specification,
is that translations into internal form can be more
efficient, since definitions or parts of definitions will
be handled on a case by case basis. One probable
disadvantage is that the scheme will be less general, in
terms of definable concepts, and perhaps "spotty" in
terms of what it makes available.
In TELI, both English and English-like
specification are done in terms of sample domain
values, which are treated as formal parameters. An
example appears in Figure 7. In the current
implementation, English-like specifications include
(a) any definition definable by menu, and (b)
definitions that involve (possibly negated) adjective
or noun references. As of this writing, only English
specifications that involve no nested parameter
references can be processed.
7.4 Specification by Borrowing
In addition to whatever mechanisms an NL
system specifically provides for semantic acquisitions,
it is reasonable to allow users to define one meaning
directly in terms of another (in addition to indirect
dependence, as in the case of English specification).
In TELI, users may ask to "borrow" from an existing
meaning at any time. As shown in Figure 8, the
system responds by finding all current items defined in
terms of all or some of the parameters (i.e. entities)
of the item for which the borrowing is being done. This
assures that the entire borrowed meaning can be
modified to apply to the item being defined. After
being copied, a borrowed meaning may be edited just
as though it had been entered from scratch.
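The candidate search described above amounts to a subset test on parameters: an existing item qualifies if every entity it mentions is also a parameter of the item being defined. The following sketch shows that test and the copy-then-edit step. The function names and the dictionary of existing items are hypothetical, and the entry "INSTRUCTOR advises STUDENT" is invented purely to show a non-candidate.

```python
import copy


def borrow_candidates(target_entities, existing_items):
    """Return the items whose parameters are all among the target's."""
    target = set(target_entities)
    return [name for name, entities in existing_items.items()
            if set(entities) <= target]


def borrow(meaning):
    """Copy a meaning so that later edits leave the original intact."""
    return copy.deepcopy(meaning)


# Existing items and the entities they are defined over.
EXISTING = {
    "STUDENT is a FRESHMAN": ("student",),
    "STUDENT is a SOPHOMORE": ("student",),
    "CLASS of STUDENT": ("student",),
    "INSTRUCTOR advises STUDENT": ("instructor", "student"),  # invented
}

# Candidates offered when defining "STUDENT is ADVANCED".
candidates = borrow_candidates(("student",), EXISTING)
```

The invented two-entity item is excluded because "instructor" is not a parameter of "STUDENT is ADVANCED", so a meaning borrowed from it could not be fully adapted.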
Adjective: FILE is LARGE
[ Sample Usage: Sage is LARGE ]
    Function: CREATION-DATE LENGTH OWNER (none)   other: NIL
    Argument: Sage   other: NIL
    Relation: != < <= > >=
    Function: CREATION-DATE LENGTH OWNER (none)   other: NIL
    Argument: 300 Sage   other: NIL
    Retain this definition: Yes No
    Exit [ ]
Figure 6: Menu Specification
Adjective: MOUNTAIN is TALL
[ Sample Usage: Adams is TALL ]
    Type an English(like) Reference:
    the height of adams is greater than 4000
Figure 7: English-like Specification
Is the meaning of
STUDENT is ADVANCED
related to one of the following?
    STUDENT is a FRESHMAN
    STUDENT is a GRADUATE
    STUDENT is a GRADUATE STUDENT
    STUDENT is a JUNIOR
    STUDENT is a SENIOR
    STUDENT is a SOPHOMORE
    STUDENT is an UNDERGRADUATE
    ...
    CLASS of STUDENT
Figure 8: Borrowing a Meaning
8 Relation to Similarly Motivated Systems
At the most abstract level, our approach to
transportability is unusual in that we have begun by
building a moderately sophisticated NLP which, from
the outset, fundamentally includes replete customization
[...] first built, perhaps over a period of several
years, a [...] distinctive, though perhaps less so, in
seeking to allow for customization by end users, as
opposed to (say) a database administrator (cf. Thompson
and Thompson, 1975, 1983, 1985; Johnson, 1985).
Some of the systems which, like TELI, seek to
provide for user customization within the context of
database query are ASK (Thompson and Thompson,
1983, 1985), formerly REL (Thompson and Thompson,
1975), from Caltech; INTELLECT, formerly Robot
(Harris, 1977), marketed by Artificial Intelligence
Corporation; IRUS (Bates and Bobrow, 1983; Bates,
Moser, and Stallard, 1984), from BBN Laboratories;
TQA (Damerau, 1985), formerly REQUEST (Plath,
1976), from IBM Yorktown Heights; TEAM (Grosz,
1983; Grosz et al., 1985), from SRI International; and
USL (Lehmann, 1978), from IBM Heidelberg. Other
systems include DATALOG (Hafner and Godden, 1985),
from General Motors Research Labs; HAM-ANS
(Wahlster, 1984), from the University of Hamburg; and
PHLIQA (Bronnenberg et al., 1978-1979), from Philips
Research.
We now provide a comparison of TELI's
customization strategies with those of the TEAM,
IRUS, TQA, and ASK systems (other comparisons
would also have been instructive, time and space
permitting). Although we have recently spoken with
at least one designer of each of these systems (see the
Acknowledgements), it is possible that, in addition to
intended simplifications, we may have overlooked or
misunderstood some, possibly undocumented, features,
in which case we apologize to the reader. Also, we
note that our remarks are not intended as a judgment
of the overall quality of TELI or any other system.
8.1 A Comparison with TEAM
Both TEAM and TELI represent English-language
interfaces that have been applied to several domains.
Each system provides for a variety of customizations
[...] system has claimed success with actual users in
either customization or English processing mode. In
terms of method, each system obtains (among other
things) information about each column of each relation
(table) of the database. We proceed to point out some
of the more significant differences between the
projects, as suggested by Grosz et al. (1985) and
indicated by Martin (1986).
To begin with, TEAM incorporates a more
powerful natural language processor than does TELI,
with provisions for quantifiers, simple pronouns,
conjunction, and numerous smaller features. Its "sort
hierarchy" provides a taxonomy more general than
that of TELI. It also incorporates disambiguation
heuristics which seek to obviate the need for users to
([...] prepositional phrases based on "on", "from",
"with", and "in"), and its preparations to deal with
time and place references are without counterpart in
TELI.
On the other hand, the customization features
of TELI appear to offer greater sophistication and
[...] sophistication, TELI always offers multiple ways
of acquiring information, provides the ability to
examine and borrow existing definitions, and is able to
invoke the appropriate knowledge acquisition module
when missing lexical, syntactic, or semantic
information is required.
generally provides for more complex definitions of words and phrases than does T E A M , as described in Sections 5-7 For example, whereas the SRI system typically requires a verb to map into some explicit or virtual relation (e.g a join of explicit relations), T E L l also allows an arbitrary number of properties of objects to be used in definitions (e.g an old employee
is one hired before I980 or an employee admires a
manager that works more hours than she does)
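The two example definitions above can be read as a 1-place and a 2-place predicate over properties of objects. The following is only an illustrative sketch in Python, not TELI's internal representation: the record layout, the field names (`year_hired`, `weekly_hours`, `is_manager`) and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    # Hypothetical record layout; the field names are assumptions.
    name: str
    year_hired: int
    weekly_hours: int
    is_manager: bool = False


def old(e):
    """1-place predicate: 'an old employee is one hired before 1980'."""
    return e.year_hired < 1980


def admires(e, m):
    """2-place predicate: e admires a manager m who works more hours than e."""
    return m.is_manager and m.weekly_hours > e.weekly_hours


# Invented sample data, for illustration only.
STAFF = [
    Employee("Adams", 1975, 40),
    Employee("Baker", 1983, 45),
    Employee("Clark", 1978, 50, is_manager=True),
]

old_names = [e.name for e in STAFF if old(e)]
admirers_of_clark = [e.name for e in STAFF if admires(e, STAFF[2])]
```

The point of the sketch is that neither definition names a relation or a join: each is stated purely in terms of properties of the objects involved.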
In T E A M , "acquisition is centered around the relations and fields in the database" In contrast,
T E L I provides several customization modes, as described in Section 3, and discourages low-level database specifications
In contrast to the principles we espoused for
TELI in Section 3, TEAM couples its methods of
acquisition with the type of modifier being defined.
For example, when seeing a "feature field", which
contains exactly two distinct values, the system asks
for "positive adjectives" and "negative adjectives"
associated with these values (e.g. "volcanic" is a
positive adjective associated with the database value
"Y"). In TELI, these relationships arise as a special
case of the acquisitions shown in Figures 3, 6, and 7b.
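One way to picture the "feature field" acquisition is as a table from adjectives to a column and one of its two values. The sketch below is hypothetical and is not taken from TEAM: the column name `volcano_flag`, the negative adjective "nonvolcanic", the value "N", and the sample rows are all assumed for illustration; only "volcanic" and "Y" come from the text.

```python
# Adjective -> (column, value) for a two-valued "feature field".
FEATURE_ADJECTIVES = {
    "volcanic": ("volcano_flag", "Y"),     # positive adjective (from the text)
    "nonvolcanic": ("volcano_flag", "N"),  # negative adjective (assumed)
}


def select(rows, adjective):
    """Return the rows whose feature field has the value tied to the adjective."""
    column, value = FEATURE_ADJECTIVES[adjective]
    return [row for row in rows if row[column] == value]


# Invented sample rows, for illustration only.
MOUNTAINS = [
    {"name": "Etna", "volcano_flag": "Y"},
    {"name": "Everest", "volcano_flag": "N"},
]

volcanic_names = [row["name"] for row in select(MOUNTAINS, "volcanic")]
```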
An interesting similarity between TEAM and
TELI is that each provides for English(like)
definitions. For example, TEAM might be told that "a
volcano erupts", from which it infers that a mountain
erupts just in case it is a volcano.
8.2 A Comparison with IRUS
Another recently developed facility to allow
represented by the IRACQ component of the IRUS
system (Ayuso and Weischedel, 1986). In addition to
its practical value, IRACQ is intended as a vehicle
that permits experimental work with sophisticated
knowledge representation formalisms.
IRACQ is similar to TELI in shielding the user
from the layout of the underlying data files. Another
similarity is that each system accepts case frame
specifications in English-like form, but IRACQ allows
proper nouns as well as common nouns to be used.
Thus a user might suggest the case frame of the verb
"write" by saying "Jones wrote some articles". Since
relationships among defined concepts (e.g. nouns),
IRACQ proceeds to ascertain which of the possibly
several classes that "Jones" belongs to is the most
general one that can act as the subject of "write".
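The generalization step can be sketched as a walk up a class hierarchy. This is a minimal illustration under stated assumptions, not IRACQ's actual procedure: the hierarchy in `PARENT` is invented, the `accepts` callback stands in for whatever source (for instance, the user's answers) decides whether a class may act as the subject, and the walk assumes that once a class is rejected, every more general class is rejected as well.

```python
# Hypothetical single-inheritance hierarchy: class -> more general class.
PARENT = {"professor": "person", "person": "physical-object"}


def most_general_subject(start_class, accepts):
    """Climb from start_class toward the root while `accepts` keeps agreeing.

    Returns the most general accepted class, or None if even the
    starting class is rejected.
    """
    chosen = None
    cls = start_class
    while cls is not None and accepts(cls):
        chosen = cls
        cls = PARENT.get(cls)  # None once the top of the hierarchy is passed
    return chosen


# "Jones" is assumed to be a professor; people can write, mere objects cannot.
subject_of_write = most_general_subject(
    "professor", lambda c: c in {"professor", "person"}
)
```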
One important difference between TELI and
IRACQ is that IRUS distinguishes conceptual
information, which resides within its KR framework,
from the linguistic information that characterizes the
English to be used. Thus, while IRACQ supports
predicates, as does TELI, it assumes that any concepts
needed to define a new language item have already
been specified. These representations, acquired by a
separate module called KREME, involve the KL-ONE
notions of "concept" and "relation", which are similar
to, but more sophisticated than, the 1- and 2-place
predicates that come into existence during a session
with TELI.
At present, IRACQ allows users to define case
phrases, and noun phrases involving "of". Its
treatment of prepositional phrases is very much like
that of TELI in that the head noun being modified is
considered part of the noun-preposition-noun triple for which a definition is being acquired (cf. Section 4.1). Definitions for individual words (e.g. nouns and adjectives) are not supported but are being considered for future versions of the system, as are facilities that enable the system to inform the user of existing predicates that might be useful in defining a new language item. This facility will be similar in spirit to TELI's provisions for "borrowing" definitions
as described in Section 7.4.
8.3 A Comparison with TQA
Unlike most efforts at transportability, TQA has been designed as a working prototype, capable of being customized for complex database applications
by actual users. The primary responsibility of the customization module is to acquire information that relates language concepts, e.g. subject of a given verb,
to the columns of the database at hand.
Like TELI, TQA avoids having to copy all database values into the lexicon by constructing
"shape" information to recognize numbers and similar patterns. For example, the system might deduce that all database values referring to a department are of the form "letter followed by two digits", which allows for valuable disambiguations during parsing. Thus, in
a database where employees manage projects and supervisors manage departments, the question "Who manages K34?" can be understood to be asking about supervisors without having to find "K34" in either the lexicon or the database.
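Shape-based disambiguation of this kind can be approximated with regular expressions. The sketch below is illustrative only and does not reflect how TQA or TELI store shapes: the "letter followed by two digits" pattern comes from the text, while the four-digit project pattern and the `MANAGED_BY` table are assumptions added so that the example has something to disambiguate against.

```python
import re

# Field -> shape of its values.
SHAPES = {
    "department": re.compile(r"[A-Z][0-9]{2}"),  # letter followed by two digits
    "project": re.compile(r"[0-9]{4}"),          # assumed shape, for contrast
}

# Who manages what, per the example database in the text.
MANAGED_BY = {"project": "employee", "department": "supervisor"}


def fields_matching(token):
    """Return every field whose shape the whole token matches."""
    return [field for field, shape in SHAPES.items() if shape.fullmatch(token)]


def manager_type(token):
    """Decide what kind of manager 'Who manages <token>?' asks about.

    Returns None when the shape alone does not single out one field.
    """
    fields = fields_matching(token)
    return MANAGED_BY[fields[0]] if len(fields) == 1 else None
```

Here `manager_type("K34")` resolves to supervisors without any lookup of "K34" in a lexicon or in the data, which is the effect described above.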
A related problem, which TQA addresses more squarely than most systems (including TELI), concerns the appearance and possible equivalence of database values. For example, "vac lnd" might indicate "vacant land", "grn" and "green" might be used interchangeably, and so forth. Many practical applications require that these sorts of issues be addressed in order for a user to obtain reliable information.
Another useful feature concerns the acquisition
formatting. In simple cases, a database administrator might want nine-digit values appearing in columns associated with social security numbers to be printed with dashes at the appropriate points (e.g. 123456789 becomes 123-45-6789). In more complicated situations, values might actually need to be decoded, so that 0910 becomes "vacant land". This provision for decoding is similar to the form of intermediate acquisition shown in Figure 3, though here it is being used for opposite effect.
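Both kinds of output formatting are easy to picture as small functions applied to a value just before printing. The following is a hedged sketch rather than a description of TQA: the function names and the `DECODE` table are hypothetical, and only the two input/output pairs (123456789 and 0910) are taken from the text.

```python
def format_ssn(value):
    """Insert dashes into a nine-digit value; leave anything else untouched."""
    digits = str(value)
    if len(digits) != 9 or not digits.isdigit():
        return digits
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


# Hypothetical decoding table for a coded column.
DECODE = {"0910": "vacant land"}


def decode(value):
    """Replace a coded value by its readable form, if one is known."""
    return DECODE.get(value, value)
```

Falling back to the unchanged value in both functions is a design choice of the sketch, so that unexpected data is printed as stored rather than rejected.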
8.4 A Comparison with ASK
The current ASK prototypes, which run on Sun,
Vax, and HP desktop systems, are derived from
earlier work on the REL system, which itself derives
from work on the DEACON project, which stems
from the early 1960's. Unlike most recent efforts,
features into an existing more-or-less single-domain
system, the work with REL, the "Rapidly Extensible
capabilities as early as 1969.
To begin with, ASK provides quite general
customization facilities, allowing English definitions at
least as sophisticated as those outlined in Section 7.3.
An example is "ships 'carry' coal to Oslo if there is a
shipment whose carrier is ships, type is coal and
destination is Oslo". Arithmetic facilities are also
provided, e.g. "area equals length times beam".
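Read operationally, the first definition is an existence test over shipment records and the second is an arithmetic attribute. The sketch below shows one such reading in Python; it is not ASK's notation or implementation, and the record types, field names and sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shipment:
    # Hypothetical record type; field names follow the quoted definition.
    carrier: str
    cargo_type: str
    destination: str


@dataclass(frozen=True)
class Ship:
    name: str
    length: float
    beam: float


def carries(ship_name, cargo, destination, shipments):
    """True if some shipment has this carrier, type and destination."""
    return any(
        s.carrier == ship_name
        and s.cargo_type == cargo
        and s.destination == destination
        for s in shipments
    )


def area(ship):
    """'area equals length times beam'."""
    return ship.length * ship.beam


# Invented sample data, for illustration only.
SHIPMENTS = [Shipment("Maru", "coal", "Oslo")]
maru = Ship("Maru", 100.0, 20.0)
```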
integrated information management system, rather
than provide simple sentence-by-sentence database
retrieval. One feature allows ASK to be connected to
several external database systems, drawing information
from each of them in the context of answering a user's
question. A second feature allows a user to provide
specification of a record type, followed by information
used to populate the newly created relation.
Acknowledgements
The current TELI system derives from work on
the LDC project, which was carried out at Duke
University by John Lusth and Nancy Tinkham. In
converting the NL portions of LDC to operate in our
Bachenko, Alan Biermann, Marcia Derr, George
Heidorn, Mark Jones and Mitch Marcus. We also
wish to thank Paul Martin of SRI, Damaris Ayuso and
Ralph Weischedel of BBN, Fred Damerau of IBM
Yorktown Heights, and Fred Thompson of Caltech,
for their willingness to answer a number of questions
that helped us to formulate the comparisons given in
Section 8. Finally, we wish to thank Marcia Derr for
many useful comments on a draft of our paper.
References
Communication, April 1986
Cognition and Brain Theory 5, 3 (1982), 269-287
Ballard, B "The Syntax and Semantics of User-
University, July 1984, 52-56
Ballard, B "User Specification of Syntactic Case Frames in TELI, A Transportable, User-Customized
Natural Language Processor", Proc Coling-86, Bonn,
West Germany, August, 1986
Ballard, B., Lusth, J., and Tinkham, N "LDC-I: A Transportable Natural Language Processor for Office
Information Systems 2, 1 (1984), 1-23
Ballard, B and Tinkham, N "A Phrase-Structured Grammatical Framework for Transportable Natural
Language Processing", Computational Linguistics 10, 2
(1984), 81-96
Bates, M and Bobrow, R "A Transportable Natural
Language Interface for Information Retrieval", Proc 6th Int ACM SIGIR Conference, Washington, D.C.,
June 1983
Bates, M., Moser, M and Stallard, D "The IRUS Transportable Natural Language Interface", Proc First Int Workshop on Expert Database Systems,
Kiawah Island, October 1984, 258-274
Schoenmakers, W and van Utteren, E "PHLIQA-1,
Consultation in Natural English", Philips tech Rev 38
(1978-79), 229-239 and 269-284
Customization of Natural Language Database Front
Ends", ACM Transactions on Office Information Systems 3, 2 (1985), 165-184
Language Interface System", Conf on Applied Natural Language Processing, Santa Monica, 1983, 39-45
Grosz, B., Appelt, D., Martin, P and Pereira, F
Transportable Natural-Language Interfaces", Artificial Intelligence, in press
Hafner, C and Godden, C "Portability of Syntax and
Semantics in Datalog", ACM Transactions on Office Information Systems 3, 2 (1985), 141-164
Harris, L "User-Oriented Database Query with the ROBOT Natural Language System", Int Journal of Man-Machine Studies 9 (1977), 697-713
Commercial Applications Ovum Ltd, London 1985
Lehmann, H "Interpretation of natural language in
an information system", IBM J Res Dev 22, 5 (1978),
pp 560-571
Martin, P Personal communication, March 1986
Tennant, H "Experience With the Evaluation of Natural Language Question Answerers", Int J Conf
on Artificial Intelligence, 1979, pp 275-281
Thompson, F and Thompson, B "Practical Natural Language Processing: The REL System as Prototype",
In Advances in Computers, Vol 3, M Rubinoff and M Yovits, Eds., Academic Press, 1975
Thompson, B and Thompson, F "Introducing ASK:
A Simple Knowledgeable System", Conf on Applied Natural Language Processing, Santa Monica, 1983, 17-24
Transportable in Half a Dozen Ways", ACM Trans on Office Information Systems 3, 2 (1985), 185-203
Wahlster, W "User Models in Dialog Systems", Invited talk at Coling-84, Stanford University, July
1984