Problems with Domain-Independent Natural Language Database Access Systems
Steven P. Shwartz
Cognitive Systems Inc.
234 Church Street, New Haven, CT 06510
In the past decade, a number of natural language database access systems have been constructed (e.g., Hendrix 1976; Waltz et al. 1976; Sacerdoti 1978; Harris 1979; Lehnert and Shwartz 1982; Shwartz 1982). The level of performance achieved by natural language database access systems varies considerably, with the more robust systems operating within a narrow domain (i.e., content area) and relying heavily on domain-specific knowledge to guide the language understanding process. Transporting a system constructed for one domain into a new domain is extremely resource-intensive because a new set of domain-specific knowledge must be encoded.
In order to reduce the cost of transportation, a great deal of current research has focused on building natural language access systems that are domain-independent. More specifically, these systems attempt to use syntactic knowledge in conjunction with knowledge about the structure of the database as a substitute for conceptual knowledge regarding the database content area. In this paper I examine the issue of whether or not it is possible to build a natural language database access system that achieves an acceptable level of performance without including domain-specific conceptual knowledge.
A performance criterion for natural language access systems.
The principal motivation for building natural language systems for database access is to free the user from the need for data processing instruction. A natural language front end is a step above the "English-like" query systems that presently dominate the commercial database retrieval field. English-like query systems allow the user to phrase requests as English sentences, but permit only a restricted subset of English and impose a rigid syntax on user requests. These English-like query systems are easy to learn, but a training period is still required for the user to learn to phrase requests that conform to these restrictions. However, the training period is often very brief, and natural language systems can be considered superior only if no computer-related training or knowledge is required of the user.
This criterion can only be met if no restrictions are placed on user queries. A user who has previously relied on a programmer-technician to code formal queries for information retrieval should be permitted to phrase information retrieval requests to the program in exactly the same way as to the technician. That is, whatever the technician would understand, the program should understand. For example, a natural language front end to a stock market database should understand that
(1) Did IBM go up yesterday?
refers to PRICE and not VOLUME. However, the system need not understand requests that a programmer-technician would be unable to process, e.g.,
(2) Is GENCO a likely takeover target?
That is, the programmer-technician working for an investment firm would not be expected to know how to process requests that require "expert" knowledge, and neither should a natural language front end. If, however, a natural language system cannot achieve the level of performance of a programmer-technician, it will seem stupid because it does not meet a user's expectations for an English understanding system.
The "programmer-technician criterion" cannot possibly be met by a domain-independent natural language access system because language understanding requires domain-specific world knowledge. On a theoretical level, the need for a knowledge base in a natural language processing system has been well-documented (e.g., Schank & Abelson 1977; Lehnert 1978; Dyer 1982). It will be argued below that in an applied context, a system that does not have a conceptual knowledge base can produce at best only a shallow level of understanding, and one that does not meet the criterion specified above. Further, the domain-independent approach creates a host of problems that are simply nonexistent in knowledge-based systems.
Problems for domain-independent systems: inference, ambiguity, and anaphora.
Inferential processing is an integral part of natural language understanding. Consider the following requests from PEARL (Lehnert and Shwartz 1982; Shwartz 1982) when it operates in the domain of geological map generation:
(4) Show me all oil wells from 8000 to 7000
(5) Show me all oil wells 1 to 2000
(6) Show me all oil wells 40 to 41, 80 to 81
A programmer-technician in the petrochemical industry would infer that (3) refers to drilling dates, (4) refers to well depth, (5) refers to the map scale, and (6) refers to latitude/longitude specifications.
Correct processing of these requests requires inferential processing that is based on knowledge of the petrochemical industry. That is, these conventions are not in everyone's general working knowledge of the English language. Yet they are standard usage for people who communicate with each other about drilling data, and any system that claims to provide a natural language interface to a data base of drilling data must have the knowledge to correctly process requests such as these. Without such inferential processing, the user is required to spell out everything in detail, something that is simply not necessary in normal English discourse.
Another problem for any natural language understanding system is the processing of ambiguous words. In some cases disambiguation can be performed syntactically. In other cases, the structure of the database can provide the information necessary for word sense disambiguation (more on this below). However, in many cases disambiguation can only be performed if domain-specific world knowledge is available. For example, consider the processing of the word "sales" in (7) through (10).
(7) What is the average mark up for sales of stereo equipment?
(8) What is the average mark down for sales of stereo equipment?
(9) What is the average mark up during sales of stereo equipment?
(10) What is the average mark down during sales of stereo equipment?
These four requests, which are so nearly identical both lexically and syntactically, have very distinct meanings that derive from the fact that the correct sense of "sales" in (7) is quite different from the sense of "sales" intended in (8), (9), and (10). Most people have little difficulty determining which sense of "sales" is intended in these sentences, and neither would a knowledge-based understander. The key to the disambiguation process involves world knowledge regarding retail sales.
Anaphora poses similar problems. For example, suppose the following requests were submitted to a personnel data base:
(11) List all salesmen with retirement plans along with their salaries
(12) List all offices with women managers along with their salaries
While these requests are syntactically identical, the referents for "their" in (11) and (12) occupy different syntactic positions. As human information processors, we have no trouble understanding these requests; retirement plans and offices are never considered as possible referents. Again, domain-specific world knowledge is helpful in understanding these requests.
Structural knowledge as a substitute for conceptual knowledge.
One of the innovations to emerge from the construction of domain-independent systems is a clever mechanism that extracts domain-specific knowledge from the structure of the data base. For example, the resolution of the pronoun "their" in both (11) and (12) above could be accomplished by using only structural (rather than conceptual) knowledge of the domain. Suppose, for instance, that the payroll database for (11) were structured such that SALARY and RETIREMENT-PLANS were fields within a SALESMEN file. It would then be possible to infer that "their" refers to "salesmen" in (11) by noting that SALARY is a field in the SALESMEN file, but that SALARY is not an entry in a RETIREMENT-PLANS file.
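The structural inference just described can be sketched in a few lines. This is only an illustration under stated assumptions: the `SCHEMA` dictionary, the `resolve_their` function, and the pairing of each candidate noun with a file name are hypothetical and are not taken from any of the systems cited.

```python
# Hypothetical schema for illustration: file name -> set of field names.
SCHEMA = {
    "SALESMEN": {"NAME", "SALARY", "RETIREMENT-PLANS"},
}

def resolve_their(candidates, attribute, schema):
    """Keep only the candidate nouns whose file has `attribute` as a field.

    `candidates` is a list of (noun, file_name) pairs; a noun that does not
    name a file in the schema is dropped.
    """
    return [noun for noun, file_name in candidates
            if attribute in schema.get(file_name, set())]

# Request (11): the two nouns that "their" could refer to.
candidates_11 = [
    ("salesmen", "SALESMEN"),
    ("retirement plans", "RETIREMENT-PLANS"),  # a field, not a file, in this layout
]
referents_11 = resolve_their(candidates_11, "SALARY", SCHEMA)
print(referents_11)  # ['salesmen']
```

The sketch succeeds only because the assumed layout happens to give salesmen their own file, which is the dependence on database structure at issue here.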
Unfortunately, this approach has limited utility because it relies on a fortuitous database structure. Consider what would happen if the data base had a top-level EMPLOYEES file (rather than individual files for each type of employee) with fields for JOB-TYPE, SALARY, COMMISSIONS, and RETIREMENT-PLANS. With this database organization, consider the request
(13) List all salesmen who have secretaries along with their commissions
It would not be possible to determine that "their" refers to "salesmen" and not "secretaries" in (13) on the basis of the structure of the database. To the naive user, however, the meaning of this sentence is perfectly clear. A person who couldn't determine the referent of "their" in (13) would not be perceived as having an adequate command of the English language, and the same would be true for a computer system that did not understand the request.
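A minimal sketch of why the structural test fails for (13) under the flat layout follows. The `NOUN_TO_FILE` lookup and the `structural_referents` function are hypothetical names introduced for illustration; only the file and field names come from the example above.

```python
# Hypothetical flat layout: one EMPLOYEES file for every type of employee.
SCHEMA = {
    "EMPLOYEES": {"JOB-TYPE", "SALARY", "COMMISSIONS", "RETIREMENT-PLANS"},
}
# Assumed lookup from a noun in the request to the file holding its records.
NOUN_TO_FILE = {"salesmen": "EMPLOYEES", "secretaries": "EMPLOYEES"}

def structural_referents(nouns, attribute):
    """Return the nouns whose file contains `attribute`.

    Database structure is the only evidence used; no knowledge about
    who earns commissions is consulted.
    """
    return [n for n in nouns if attribute in SCHEMA[NOUN_TO_FILE[n]]]

survivors_13 = structural_referents(["salesmen", "secretaries"], "COMMISSIONS")
print(survivors_13)  # ['salesmen', 'secretaries']: the structure cannot choose
```

Both candidates survive because both kinds of employee live in the same file, so choosing between them requires knowledge that the schema does not contain.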
Pitfalls associated with the domain-independent approach.
In a knowledge-based system such as PEARL, a natural language request is parsed into a conceptual representation of the meaning of the request. The retrieval routine is then generated from this conceptual representation. As a result, the parser is independent of the logical structure of the database. That is, the same parser can be used for databases with different logical structures, but the same information content. Further, the same parser can be used whether the required information is located in a single file or in multiple files.
In a domain-independent system, the parser is entirely dependent on the structure of the database for domain-specific knowledge. As a result, one must restructure the parser for databases with identical content but different logical structure. Similarly, the output of the parser must be very different when the required information is contained in multiple files rather than a single file.
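The separation between parsing and retrieval can be illustrated with a small sketch. It is not PEARL's actual representation: the `Concept` class, the two mapping tables, and the PAYROLL file and EMP-ID field are assumptions invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    """Hypothetical conceptual representation: 'the ATTRIBUTE of an ENTITY'."""
    entity: str
    attribute: str

# Two assumed logical structures that hold the same information content.
SINGLE_FILE = {
    ("employee", "salary"): [("EMPLOYEES", "SALARY")],
}
MULTI_FILE = {
    # The salary lives in a second file reached through an employee identifier.
    ("employee", "salary"): [("EMPLOYEES", "EMP-ID"), ("PAYROLL", "SALARY")],
}

def retrieval_plan(concept, mapping):
    """Translate a concept into (file, field) steps using a per-database mapping."""
    return mapping[(concept.entity, concept.attribute)]

# The parser output is identical for both databases; only the mapping differs.
concept = Concept("employee", "salary")
plan_single = retrieval_plan(concept, SINGLE_FILE)
plan_multi = retrieval_plan(concept, MULTI_FILE)
print(plan_single)
print(plan_multi)
```

In this arrangement a change of logical structure touches only the mapping table, whereas a parser that emits file and field names directly would have to change with it.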
Because of their lack of conceptual knowledge regarding the database, domain-independent systems rely heavily on key words or phrases to indicate which database field is being referred to. For example,
(14) What is Bill Smith's job title?
might be easily processed by simply retrieving the contents of a JOB-TITLE field. Different ways of referring to job title can also be handled as synonyms. However, domain-independent systems get into deep trouble when the database field that needs to be accessed is not directly indicated by key words or phrases in the input request. For example,
(15) Is John Jones the child of an alumnus?
is easily processed if there exists a CHILD-OF-AN-ALUMNUS field, but the query
(16) Is one of John Jones' parents an alumnus?
contains no key word or phrase to indicate that the CHILD-OF-AN-ALUMNUS field should be accessed. In a knowledge-based system, the retrieval routine is generated from a conceptual representation of the meaning of the user query and therefore key words or phrases are not required. A related problem occurs with queries involving aggregation or quantity. For example,
(17) How many employees are in the sales department?
might require retrieving the value of a particular field (e.g., NUMBER-OF-EMPLOYEES), or it might require totalling the number of records in the EMPLOYEE file that have the correct DEPARTMENT field value, or, if the departments are broken down into offices, it might require totalling the NUMBER-OF-EMPLOYEES field for each office. In a domain-independent system, the correct parse depends upon the structure of the database and is therefore difficult to handle in a general way. In a knowledge-based system such as PEARL, the different database structures would simply require altering the mapping between the conceptual representation of the parse and the retrieval query.
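The three retrieval strategies for (17) can be sketched as follows. The field names follow the text; the in-memory data, the layout labels, and the `how_many_employees` dispatcher are hypothetical and stand in for whatever mapping a real system would maintain.

```python
def count_from_department_field(db, dept):
    # Layout 1: a department record stores NUMBER-OF-EMPLOYEES directly.
    return db["DEPARTMENTS"][dept]["NUMBER-OF-EMPLOYEES"]

def count_from_employee_records(db, dept):
    # Layout 2: count EMPLOYEE records whose DEPARTMENT field matches.
    return sum(1 for rec in db["EMPLOYEES"] if rec["DEPARTMENT"] == dept)

def count_from_offices(db, dept):
    # Layout 3: total NUMBER-OF-EMPLOYEES over the department's offices.
    return sum(office["NUMBER-OF-EMPLOYEES"]
               for office in db["OFFICES"] if office["DEPARTMENT"] == dept)

# Assumed mapping from a database layout to its retrieval strategy.
STRATEGIES = {
    "department_field": count_from_department_field,
    "employee_records": count_from_employee_records,
    "office_totals": count_from_offices,
}

def how_many_employees(db, layout, dept):
    """Answer the same conceptual question under any of the three layouts."""
    return STRATEGIES[layout](db, dept)

db1 = {"DEPARTMENTS": {"sales": {"NUMBER-OF-EMPLOYEES": 3}}}
db2 = {"EMPLOYEES": [{"DEPARTMENT": "sales"}, {"DEPARTMENT": "sales"},
                     {"DEPARTMENT": "sales"}, {"DEPARTMENT": "legal"}]}
db3 = {"OFFICES": [{"DEPARTMENT": "sales", "NUMBER-OF-EMPLOYEES": 2},
                   {"DEPARTMENT": "sales", "NUMBER-OF-EMPLOYEES": 1},
                   {"DEPARTMENT": "legal", "NUMBER-OF-EMPLOYEES": 5}]}

print(how_many_employees(db1, "department_field", "sales"))  # 3
print(how_many_employees(db2, "employee_records", "sales"))  # 3
print(how_many_employees(db3, "office_totals", "sales"))     # 3
```

The question being asked is the same in all three calls; only the entry selected from `STRATEGIES` changes with the database.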
Finally, this reliance on database structure can lead to wrong answers. A classic example is Harris' (1979) "snowmobile problem." When Harris' ROBOT system interfaces with a file containing information about homeowner's insurance, the word "snowmobile" is defined as any number > 0 in the "snowmobile field" of an insurance policy record. This means that as far as ROBOT is concerned, the question "How many snowmobiles are there?" is no different from "How many policies have snowmobile coverage?" However, the correct answers to the two questions will often be very different. If the first question is asked and the second question is answered, the result is an incorrect answer. If the first question cannot be answered due to the structure of the database, the system should inform the user that this is the case.
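The difference between the two questions can be made concrete with a short sketch. It does not reproduce ROBOT; the record layout, the function names, and the `field_is_count` flag (which stands for knowing what the snowmobile field actually means) are assumptions made for illustration.

```python
# Hypothetical policy records; the meaning of the "snowmobile" value is the
# crux: it may be a count of vehicles or merely a nonzero coverage marker.
policies = [
    {"policy": "A", "snowmobile": 0},
    {"policy": "B", "snowmobile": 2},
    {"policy": "C", "snowmobile": 1},
]

def policies_with_coverage(records):
    """Count the policies whose snowmobile field holds any number > 0."""
    return sum(1 for rec in records if rec["snowmobile"] > 0)

def number_of_snowmobiles(records, field_is_count):
    """Total the snowmobiles, or return None when the data cannot say.

    Summing is valid only if the field really stores a count; otherwise the
    honest response is to report that the question cannot be answered.
    """
    if not field_is_count:
        return None
    return sum(rec["snowmobile"] for rec in records)

print(policies_with_coverage(policies))                       # 2
print(number_of_snowmobiles(policies, field_is_count=True))   # 3
print(number_of_snowmobiles(policies, field_is_count=False))  # None
```

Answering the first question with `policies_with_coverage` would report 2 where the data, read as counts, supports 3, and when the field is not a count no number at all should be returned.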
Conclusions
I have argued above that conceptually-based domain-specific knowledge is absolutely essential for natural language database access systems. Systems that rely on database structure for this domain-specific knowledge will not achieve an acceptable level of performance, i.e., operate at the level of understanding of a programmer-technician.
Because of the requirement for domain-specific knowledge, conceptually-based systems are restricted to limited domains and are not readily portable to new content areas. However, eliminating the domain-specific conceptual knowledge is throwing the baby out with the bath water. The conceptually-based domain-specific knowledge is the key to robust understanding.
The approach of the PEARL project with regard to the transportability problem is to try and identify areas of discourse that are common to most domains and to build robust modules for natural language analysis within these domains. Examples of such domains are temporal reference, location reference, and report generation. These modules are knowledge-based and can be used by a wide variety of domains to help extract the conceptual content of a request.
REFERENCES
Dyer, M. (1982). In-Depth Understanding: A Computer Model of Integrated Processing for Narrative Comprehension. Yale University, Computer Science Dept., Research Report #219.
Harris, L. R. (1979). Experience with ROBOT in 12 commercial natural language data base query applications. Proceedings of the 6th International Joint Conference on Artificial Intelligence.
Hendrix, G. G. (1976). LIFER: A natural language interface facility. SRI Tech. Note 135, Dec. 1976.
Lehnert, W. (1978). The Process of Question Answering. Lawrence Erlbaum Associates, Hillsdale, New Jersey.
Lehnert, W. and Shwartz, S. (1982). Natural Language Data Base Access with Pearl. Proceedings of the Ninth International Conference on Computational Linguistics, Prague, Czechoslovakia.
Sacerdoti, E. D. (1978). A LADDER user's guide. Technical Note 163, SRI Project 6891.
Schank, R. C. and Abelson, R. (1977). Scripts, Plans, Goals and Understanding. Lawrence Erlbaum Associates, Hillsdale, New Jersey.
Shwartz, S. (1982). PEARL: A Natural Language Analysis System for Information Retrieval (submitted to AAAI-82/applications division).
Waltz, D. L., Finin, T., Green, F., Conrad, F., Goodman, B., Hadden, G. (1976). The planes system: natural language access to a large data base. Coordinated Science Lab., Univ. of Illinois, Urbana, Tech. Report T-34 (July 1976).