
Problems with Domain-Independent Natural Language Database Access Systems

Steven P. Shwartz

Cognitive Systems Inc.

234 Church Street, New Haven, CT 06510

In the past decade, a number of natural language database access systems have been constructed (e.g., Hendrix 1976; Waltz et al. 1976; Sacerdoti 1978; Harris 1979; Lehnert and Shwartz 1982; Shwartz 1982). The level of performance achieved by natural language database access systems varies considerably, with the more robust systems operating within a narrow domain (i.e., content area) and relying heavily on domain-specific knowledge to guide the language understanding process. Transporting a system constructed for one domain into a new domain is extremely resource-intensive because a new set of domain-specific knowledge must be encoded.

In order to reduce the cost of transportation, a great deal of current research has focused on building natural language access systems that are domain-independent. More specifically, these systems attempt to use syntactic knowledge in conjunction with knowledge about the structure of the database as a substitute for conceptual knowledge regarding the database content area. In this paper I examine the issue of whether or not it is possible to build a natural language database access system that achieves an acceptable level of performance without including domain-specific conceptual knowledge.

A performance criterion for natural language access systems.

The principal motivation for building natural language systems for database access is to free the user from the need for data processing instruction. A natural language front end is a step above the "English-like" query systems that presently dominate the commercial database retrieval field. English-like query systems allow the user to phrase requests as English sentences, but permit only a restricted subset of English and impose a rigid syntax on user requests. These English-like query systems are easy to learn, but a training period is still required for the user to learn to phrase requests that conform to these restrictions. However, the training period is often very brief, and natural language systems can be considered superior only if no computer-related training or knowledge is required of the user.

This criterion can only be met if no restrictions are placed on user queries. A user who has previously relied on a programmer-technician to code formal queries for information retrieval should be permitted to phrase information retrieval requests to the program in exactly the same way as to the technician. That is, whatever the technician would understand, the program should understand. For example, a natural language front end to a stock market database should understand that

(1) Did IBM go up yesterday?

refers to PRICE and not VOLUME. However, the system need not understand requests that a programmer-technician would be unable to process, e.g.,

(2) Is GENCO a likely takeover target?

That is, the programmer-technician working for an investment firm would not be expected to know how to process requests that require "expert" knowledge, and neither should a natural language front end. If, however, a natural language system cannot achieve the level of performance of a programmer-technician, it will seem stupid because it does not meet a user's expectations for an English understanding system.

The "programmer-technician criterion" cannot possibly be met by a domain-independent natural language access system because language understanding requires domain-specific world knowledge. On a theoretical level, the need for a knowledge base in a natural language processing system has been well documented (e.g., Schank & Abelson 1977; Lehnert 1978; Dyer 1982). It will be argued below that in an applied context, a system that does not have a conceptual knowledge base can produce at best only a shallow level of understanding and one that does not meet the criterion specified above. Further, the domain-independent approach creates a host of problems that are simply nonexistent in knowledge-based systems.

Problems for domain-independent systems: inference, ambiguity, and anaphora.

Inferential processing is an integral part of natural language understanding. Consider the following requests from PEARL (Lehnert and Shwartz 1982; Shwartz 1982) when it operates in the domain of geological map generation:


(4) Show me all oil wells from 8000 to 7000

(5) Show me all oil wells 1 to 2000

(6) Show me all oil wells 40 to 41, 80 to 81

A programmer-technician in the petrochemical industry would infer that (3) refers to drilling dates, (4) refers to well depth, (5) refers to the map scale, and (6) refers to latitude/longitude specifications.

Correct processing of these requests requires inferential processing that is based on knowledge of the petrochemical industry. That is, these conventions are not in everyone's general working knowledge of the English language. Yet they are standard usage for people who communicate with each other about drilling data, and any system that claims to provide a natural language interface to a database of drilling data must have the knowledge to correctly process requests such as these. Without such inferential processing, the user is required to spell out everything in detail, something that is simply not necessary in normal English discourse.
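As a rough illustration of the kind of domain convention involved, the sketch below encodes three hypothetical rules that map the numeric ranges in requests (4) through (6) to database fields. The rule set, the function names `extract_ranges` and `infer_field`, and the field labels are assumptions made for this example. They are not PEARL's actual mechanism, and the drilling-date case is not covered.

```python
import re
from typing import List, Tuple

Range = Tuple[int, int]


def extract_ranges(request: str) -> List[Range]:
    """Find every 'A to B' numeric range in the request text."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\s+to\s+(\d+)", request)]


def infer_field(request: str) -> str:
    """Guess which field a bare numeric range refers to (toy rules)."""
    ranges = extract_ranges(request)
    # Assumed convention: two ranges together give a latitude/longitude window.
    if len(ranges) == 2:
        return "LATITUDE/LONGITUDE"
    if len(ranges) == 1:
        first, second = ranges[0]
        # Assumed convention: "1 to N" is read as a map scale of 1:N.
        if first == 1:
            return "MAP-SCALE"
        # Assumed convention: a range in the thousands is a well depth.
        if min(first, second) >= 1000:
            return "WELL-DEPTH"
    return "UNKNOWN"


examples = [
    "Show me all oil wells from 8000 to 7000",
    "Show me all oil wells 1 to 2000",
    "Show me all oil wells 40 to 41, 80 to 81",
]
inferred = [infer_field(request) for request in examples]
```

Nothing in English syntax separates these three requests. The rules work only because they restate conventions of the petrochemical domain, which is the point of the argument above.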

Another problem for any natural language understanding system is the processing of ambiguous words. In some cases disambiguation can be performed syntactically. In other cases, the structure of the database can provide the information necessary for word sense disambiguation (more on this below). However, in many cases disambiguation can only be performed if domain-specific world knowledge is available. For example, consider the processing of the word "sales" in (7), (8), (9), and (10).

(7) What is the average mark up for sales of stereo equipment?

(8) What is the average mark down for sales of stereo equipment?

(9) What is the average mark up during sales of stereo equipment?

(10) What is the average mark down during sales of stereo equipment?

These four requests, which are so nearly identical both lexically and syntactically, have very distinct meanings that derive from the fact that the correct sense of "sales" in (7) is quite different from the sense of "sales" intended in (8), (9), and (10). Most people have little difficulty determining which sense of "sales" is intended in these sentences, and neither would a knowledge-based understander. The key to the disambiguation process involves world knowledge regarding retail sales.
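A toy sketch of how such retail knowledge might be written down as rules follows. The two sense labels, the function `sense_of_sales`, and the two rules are assumptions made for illustration, not a description of any existing system. The only claim taken from the text is that (7) receives one sense and (8), (9), and (10) receive the other.

```python
TRANSACTION = "selling transactions"
PROMOTION = "discount events"


def sense_of_sales(request: str) -> str:
    """Pick a sense for "sales" using two toy pieces of retail knowledge."""
    text = request.lower()
    # Assumed retail knowledge: "during" selects a period of time, so the
    # noun must name an event (a promotion), not a set of transactions.
    if "during sales" in text:
        return PROMOTION
    # Assumed retail knowledge: goods are marked down at promotions, while
    # a mark up is a property of ordinary selling transactions.
    if "mark down" in text:
        return PROMOTION
    return TRANSACTION


examples = [
    "What is the average mark up for sales of stereo equipment?",
    "What is the average mark down for sales of stereo equipment?",
    "What is the average mark up during sales of stereo equipment?",
    "What is the average mark down during sales of stereo equipment?",
]
senses = [sense_of_sales(request) for request in examples]
```

Both rules are statements about retailing rather than about English grammar, so neither could be recovered from syntax or from a list of field names.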

Anaphora poses similar problems. For example, suppose the following requests were submitted to a personnel database:

(11) List all salesmen with retirement plans along with their salaries

(12) List all offices with women managers along with their salaries

While these requests are syntactically identical, the referents for "their" in (11) and (12) occupy different syntactic positions. As human information processors, we have no trouble understanding them: retirement plans and offices are never considered as possible referents. Again, domain-specific world knowledge is helpful in understanding these requests.

Structural knowledge as a substitute for conceptual knowledge.

One of the innovations to emerge from the construction of domain-independent systems is a clever mechanism that extracts domain-specific knowledge from the structure of the database. For example, the resolution of the pronoun "their" in both (11) and (12) above could be accomplished by using only structural (rather than conceptual) knowledge of the domain. Suppose, for instance, the payroll database for (11) were structured such that SALARY and RETIREMENT-PLANS were fields within a SALESMAN file. It would then be possible to infer that "their" refers to "salesmen" in (11) by noting that SALARY is a field in the SALESMAN file, but that SALARY is not an entry in a RETIREMENT-PLANS file.

Unfortunately, this approach has limited utility because it relies on a fortuitous database structure. Consider what would happen if the database had a top-level EMPLOYEES file (rather than individual files for each type of employee) with fields for JOB-TYPE, SALARY, COMMISSIONS, and RETIREMENT-PLANS. With this database organization, consider

(13) List all salesmen who have secretaries along with their commissions

It would not be possible to determine that "their" refers to "salesmen" and not "secretaries" in (13) on the basis of the structure of the database. To the naive user, however, the meaning of this sentence is perfectly clear. A person who couldn't determine the referent of "their" in (13) would not be perceived as having an adequate command of the English language, and the same would be true for a computer system that did not understand the request.
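The structural heuristic, and the way it breaks down, can be sketched in a few lines. The function `resolve_by_structure`, the dictionary encoding of the schema, and the mapping from nouns to files are hypothetical. The sketch assumes only that a candidate referent is kept when the file it maps to contains the requested field.

```python
from typing import Dict, List, Set

Schema = Dict[str, Set[str]]  # file name -> field names


def resolve_by_structure(candidates: Dict[str, str], field: str, schema: Schema) -> List[str]:
    """Keep each candidate noun whose file (if any) contains `field`."""
    return [noun for noun, file_name in candidates.items()
            if field in schema.get(file_name, set())]


# Request (11) against a database with a dedicated SALESMAN file.
salesman_schema: Schema = {"SALESMAN": {"SALARY", "RETIREMENT-PLANS"}}
referents_11 = resolve_by_structure(
    {"salesmen": "SALESMAN", "retirement plans": "RETIREMENT-PLANS"},
    "SALARY",
    salesman_schema,
)

# Request (13) against a database with one top-level EMPLOYEES file.
employees_schema: Schema = {
    "EMPLOYEES": {"JOB-TYPE", "SALARY", "COMMISSIONS", "RETIREMENT-PLANS"}
}
referents_13 = resolve_by_structure(
    {"salesmen": "EMPLOYEES", "secretaries": "EMPLOYEES"},
    "COMMISSIONS",
    employees_schema,
)
```

With the first layout a single referent survives. With the second, both "salesmen" and "secretaries" map to the same file, so the structure alone leaves the pronoun unresolved.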

Pitfalls associated with the domain-independent approach.

In a knowledge-based system such as PEARL, a natural language request is parsed into a conceptual representation of the meaning of the request. The retrieval routine is then generated from this conceptual representation. As a result, the parser is independent of the logical structure of the database. That is, the same parser can be used for databases with different logical structures, but the same information content. Further, the same parser can be used whether the required information is located in a single file or in multiple files.

In a domain-independent system, the parser is entirely dependent on the structure of the database for domain-specific knowledge. As a result, one must restructure the parser for databases with identical content but different logical structure. Similarly, the output of the parser must be very different when the required information is contained in multiple files rather than a single file.

Because of their lack of conceptual knowledge regarding the database, domain-independent systems rely heavily on key words or phrases to indicate which database field is being referred to. For example,

(14) What is Bill Smith's job title?

might be easily processed by simply retrieving the contents of a JOB-TITLE field. Different ways of referring to job title can also be handled as synonyms. However, domain-independent systems get into deep trouble when the database field that needs to be accessed is not directly indicated by key words or phrases in the input request. For example,

(15) Is John Jones the child of an alumnus?

is easily processed if there exists a CHILD-OF-AN-ALUMNUS field, but the query

(16) Is one of John Jones' parents an alumnus?

contains no key word or phrase to indicate that the CHILD-OF-AN-ALUMNUS field should be accessed. In a knowledge-based system, the retrieval routine is generated from a conceptual representation of the meaning of the user query and therefore key words or phrases are not required.

A related problem occurs with queries involving aggregation or quantity. For example,

(17) How many employees are in the sales department?

might require retrieving the value of a particular field (e.g., NUMBER-OF-EMPLOYEES), or it might require totalling the number of records in the EMPLOYEE file that have the correct DEPARTMENT field value, or, if the departments are broken down into offices, it might require totalling the NUMBER-OF-EMPLOYEES field for each office. In a domain-independent system, the correct parse depends upon the structure of the database and is therefore difficult to handle in a general way. In a knowledge-based system such as PEARL, the different database structures would simply require altering the mapping between the conceptual representation of the parse and the retrieval query.
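The separation between parse and retrieval can be illustrated with a small sketch in which one conceptual representation of request (17) is mapped onto the three database layouts just described. The class `CountEmployees`, the three mapping functions, and the sample records are all hypothetical; PEARL's real representation is not given here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class CountEmployees:
    """Hypothetical conceptual representation of request (17)."""
    department: str


def count_from_department_record(departments: List[Record], concept: CountEmployees) -> int:
    # Layout A: each department record stores its own head count.
    for record in departments:
        if record["DEPARTMENT"] == concept.department:
            return record["NUMBER-OF-EMPLOYEES"]
    return 0


def count_from_employee_records(employees: List[Record], concept: CountEmployees) -> int:
    # Layout B: one record per employee, so matching records are counted.
    return sum(1 for record in employees if record["DEPARTMENT"] == concept.department)


def count_from_office_records(offices: List[Record], concept: CountEmployees) -> int:
    # Layout C: departments are split into offices, so office totals are summed.
    return sum(record["NUMBER-OF-EMPLOYEES"] for record in offices
               if record["DEPARTMENT"] == concept.department)


concept = CountEmployees(department="SALES")  # produced once by the parser

departments = [{"DEPARTMENT": "SALES", "NUMBER-OF-EMPLOYEES": 3},
               {"DEPARTMENT": "LEGAL", "NUMBER-OF-EMPLOYEES": 1}]
employees = [{"NAME": "A", "DEPARTMENT": "SALES"},
             {"NAME": "B", "DEPARTMENT": "SALES"},
             {"NAME": "C", "DEPARTMENT": "SALES"},
             {"NAME": "D", "DEPARTMENT": "LEGAL"}]
offices = [{"OFFICE": "EAST", "DEPARTMENT": "SALES", "NUMBER-OF-EMPLOYEES": 2},
           {"OFFICE": "WEST", "DEPARTMENT": "SALES", "NUMBER-OF-EMPLOYEES": 1},
           {"OFFICE": "EAST", "DEPARTMENT": "LEGAL", "NUMBER-OF-EMPLOYEES": 1}]

answers = [count_from_department_record(departments, concept),
           count_from_employee_records(employees, concept),
           count_from_office_records(offices, concept)]
```

The `concept` object is built once and never changes. Only the mapping function is swapped when the logical structure of the database changes, which mirrors the claim that the parser itself stays independent of that structure.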

Finally, this reliance on database structure can lead to wrong answers. A classic example is Harris' (1979) "snowmobile problem." When Harris' ROBOT system interfaces with a file containing information about homeowner's insurance, the word "snowmobile" is defined as any number > 0 in the "snowmobile field" of an insurance policy record. This means that as far as ROBOT is concerned, the question "How many snowmobiles are there?" is no different from "How many policies have snowmobile coverage?" However, the correct answers to the two questions will often be very different. If the first question is asked and the second question is answered, the result is an incorrect answer. If the first question cannot be answered due to the structure of the database, the system should inform the user that this is the case.
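The gap between the two questions is easy to see on a small made-up policy file. The sketch assumes, purely for illustration, that the SNOWMOBILE field holds the number of insured snowmobiles on each policy; the field semantics, the function names, and the records are not taken from ROBOT.

```python
from typing import Dict, List

Policy = Dict[str, int]


def policies_with_snowmobile_coverage(policies: List[Policy]) -> int:
    # The reading attributed to ROBOT: a policy counts when the field is > 0.
    return sum(1 for policy in policies if policy["SNOWMOBILE"] > 0)


def snowmobiles_covered(policies: List[Policy]) -> int:
    # What "How many snowmobiles are there?" asks for, under the assumption
    # that the field stores a count of insured snowmobiles.
    return sum(policy["SNOWMOBILE"] for policy in policies)


policies = [
    {"POLICY-ID": 1, "SNOWMOBILE": 2},
    {"POLICY-ID": 2, "SNOWMOBILE": 0},
    {"POLICY-ID": 3, "SNOWMOBILE": 1},
]

coverage_count = policies_with_snowmobile_coverage(policies)
snowmobile_count = snowmobiles_covered(policies)
```

On these records the two functions return different numbers. If the field were only a coverage flag, the second function could not be written at all, which is the situation in which the system should tell the user that the question cannot be answered.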

Conclusions.

I have argued above that conceptually-based domain-specific knowledge is absolutely essential for natural language database access systems. Systems that rely on database structure for this domain-specific knowledge will not achieve an acceptable level of performance, i.e., operate at the level of understanding of a programmer-technician.

Because of the requirement for domain-specific knowledge, conceptually-based systems are restricted to limited domains and are not readily portable to new content areas. However, eliminating the domain-specific conceptual knowledge is throwing the baby out with the bath water. The conceptually-based domain-specific knowledge is the key to robust understanding.

The approach of the PEARL project with regard to the transportability problem is to try to identify areas of discourse that are common to most domains and to build robust modules for natural language analysis within these domains. Examples of such domains are temporal reference, location reference, and report generation. These modules are knowledge-based and can be used by a wide variety of domains to help extract the conceptual content of a request.

REFERENCES

Dyer, M. (1982). In-Depth Understanding: A Computer Model of Integrated Processing for Narrative Comprehension. Yale University, Computer Science Dept., Research Report #219.

Harris, L. R. (1979). Experience with ROBOT in 12 commercial natural language data base query applications. Proceedings of the 6th International Joint Conference on Artificial Intelligence.

Hendrix, G. G. (1976). LIFER: A natural language interface facility. SRI Tech. Note 135, Dec. 1976.

Lehnert, W. (1978). The Process of Question Answering. Lawrence Erlbaum Associates, Hillsdale, New Jersey.

Lehnert, W. and Shwartz, S. (1982). Natural Language Data Base Access with Pearl. Proceedings of the Ninth International Conference on Computational Linguistics, Prague, Czechoslovakia.

Sacerdoti, E. D. (1978). A LADDER user's guide. Technical Note 163, SRI Project 6891.

Schank, R. C. and Abelson, R. (1977). Scripts, Plans, Goals and Understanding. Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1977.

Shwartz, S. (1982). PEARL: A Natural Language Analysis System for Information Retrieval (submitted to AAAI-82/applications division).

Waltz, D. L., Finin, T., Green, F., Conrad, F., Goodman, B., Hadden, G. (1976). The PLANES system: natural language access to a large data base. Coordinated Science Lab., Univ. of Illinois, Urbana, Tech. Report T-34 (July 1976).
