SAUMER: SENTENCE ANALYSIS USING METARULES

Fred Popowich
Natural Language Group
Laboratory for Computer and Communications Research
Department of Computing Science
Simon Fraser University
Burnaby, B.C., CANADA V5A 1S6

ABSTRACT

The SAUMER system uses specifications of natural language grammars, which consist of rules and metarules, to provide a semantic interpretation of an input sentence. The SAUMER Specification Language (SSL) is a programming language which combines some of the features of generalised phrase structure grammars (Gazdar, 1981), like the correspondence between syntactic and semantic rules, with definite clause grammars (DCGs) (Pereira and Warren, 1980) to create an executable grammar specification. SSL rules are similar to DCG rules except that they contain a semantic component and may also be left recursive. Metarules are used to generate new rules from existing rules before any parsing is attempted. An implementation is tested which can provide semantic interpretations for sentences containing topicalisation, relative clauses, passivisation, and questions.

1 INTRODUCTION

The SAUMER system allows the user to specify a grammar for a natural language using rules and metarules. This grammar can then be used to obtain a semantic interpretation of an input sentence. The SAUMER Specification Language (SSL), which is a variation of definite clause grammars (DCGs) (Pereira and Warren, 1980), captures some of the features of generalised phrase structure grammars (GPSGs) (Gazdar, 1981) (Gazdar and Pullum, 1982), like rule schemata, rule transformations, and the correspondence between syntactic and semantic rules. The semantics currently used in the system are based on Schubert and Pelletier's description in (Schubert and Pelletier, 1982), which adapts the intensional logic interpretation associated with GPSGs into a more conventional logical notation.

2 THE SEMANTIC LOGICAL NOTATION

The logical notation associated with the grammar differs from the usual notation of intensional logic since it captures some intuitive aspects of natural language.¹ Thus individuals and objects are treated as entities instead of collections of properties, and actions are n-ary relations between these entities. Many of the problems that the intensional notation would solve are handled by [...] notation. Consequently, as is common in other approaches (e.g. Gawron, 1982), much of the processing is deferred to the pragmatic stage. The structure of the lexicon, and the [...] brackets) are designed to reflect this ambiguity.

The lexicon is organised into two levels. For the semantic interpretation, the first level gives each word a tentative [...] complete processing information will result in the final interpretation being obtained from the second level of the lexicon. For example, the sentence John misses John could be given an initial interpretation of:

(2.1) [ John1 miss2 John3 ]

with John1, miss2 and John3 obtained from the first level of the two level lexicon. The pragmatic stage will determine if John1 and John3 both refer to the same entry, say JOHN SMITH1 of the second level of the lexicon, or if they correspond to different entries, say [...] pragmatic stage, the entry of MISS which is referred to by miss2 will be determined (if possible). For example, does John miss John because he has been away for a long time, or is it because he is a poor shot with a rifle?

Any interpretation contained in sharp angle brackets < > may require post processing. This is apparent in interpretations containing determiners and co-ordinators. The proverb:

(2.2) every man loves some woman

could be given the interpretation:

(2.3) [ <every1 man2> love3 <some4 woman5> ]

without explicitly stating which of the two readings is intended. During pragmatic analysis, the scope of every and some would presumably be determined.

¹ It should also be noted that due to the separability of the semantic component from the grammar rule, a different semantic notation could easily be introduced as long as the appropriate semantic processing routines were replaced. The use of SAUMER with "an 'AI-adapted' version of Montague's Intensional Logic" is being examined by Fawcett (1984).


The syntax of this logical notation can be summarised as follows. Sentences and compound predicate formulas are contained within square brackets. So (2.4) states that John wants to kiss Mary:

(2.4) [ John1 want2 [ John1 kiss3 Mary4 ] ]

These formulas can also be expressed equivalently in a more functional form according to the equivalence:

(2.5) [ t_n P t_1 ... t_(n-1) ]
      = ( ... ((P t_1) t_2) ... t_n )
      = ( P t_1 ... t_n )

Consequently (2.4) could also be represented as:

(2.6) ((want2 ((kiss3 Mary4) John1)) John1)

However this notation is usually used for incomplete phrases, with the square brackets used to obtain a conventional final reading. Modified predicate formulas are contained in braces. Thus a little dog likes Fido could be expressed as:

(2.7) [ <a1 {little2 dog3}> likes4 Fido5 ]

The lambda calculus operations of lambda abstraction and elimination are also allowed. When a variable is abstracted from an expression as in:

(2.8) λx [ x want2 [ x love3 Mary4 ] ]

application of this new expression to an argument, say John1:

(2.9) ( λx [ x want2 [ x love3 Mary4 ] ] John1 )

will result in an interpretation of John wants to love Mary:

(2.10) [ John1 want2 [ John1 love3 Mary4 ] ]

Further details on this notation are available in (Schubert and Pelletier, 1982).
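The equivalence between the bracketed form and the functional form can be illustrated with a small sketch. This is not part of SAUMER; it assumes formulas are encoded as nested Python lists of the shape [subject, predicate, other arguments...], and the names to_functional and show are hypothetical.

```python
def to_functional(expr):
    """Rewrite [t_n, P, t_1, ..., t_(n-1)] as (...((P t_1) t_2)... t_n)."""
    if not isinstance(expr, list):
        return expr  # atoms such as 'John1' are returned unchanged
    subject, predicate, *others = [to_functional(e) for e in expr]
    result = predicate
    # Apply the predicate to t_1 ... t_(n-1) first and to the subject t_n last.
    for arg in others + [subject]:
        result = (result, arg)
    return result


def show(expr):
    """Print a curried application tree in the parenthesised style of (2.6)."""
    if isinstance(expr, tuple):
        return "(" + show(expr[0]) + " " + show(expr[1]) + ")"
    return expr


formula = ["John1", "want2", ["John1", "kiss3", "Mary4"]]
print(show(to_functional(formula)))
```

For the nested list corresponding to (2.4), the printed string has the same shape as (2.6). The sketch assumes every bracketed formula has at least a subject and a predicate.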

3 THE SAUMER SPECIFICATION LANGUAGE

The SAUMER Specification Language (SSL) is a programming language that allows the user to define a grammar of a natural language in terms of rules and metarules. Metarules operate on rules to produce new rules. The language is basically a GPSG realised in a DCG setting. Unlike GPSGs, the grammars defined by this system are not required to be context-free, since procedure calls are allowed within the rules and logic variables are allowed in the grammar symbols.

The basic objects of the language are atoms, variables, terms, and lists. Any word starting with a lower case letter, or enclosed in single quotes, is an atom. Variables start with a capital letter or an underscore. A term is an atom optionally followed by a series of objects (arguments), which are enclosed in parentheses and separated by commas. Lastly, a list is a series of one or more objects, separated by commas, that are enclosed in square brackets.

3.1 Rules

The rules are presented in a variation of the DCG notation, augmented with a semantic rule corresponding to each syntactic rule. Each rule is of the form "α --> β : γ", where α is a term which denotes a nonterminal symbol, β is either an atom list representing a terminal symbol or a conjunction of terms (separated by commas) corresponding to nonterminal symbols, and γ is a semantic rule which may reference the interpretation of the components of β in determining the semantics of α. The rule arrow --> separates the two sides of the rule, with the colon : separating the syntactic component from the semantic component. If the rule is preceded by the word add, it can be subjected to the transformations described in section 3.2. The nonterminal symbols can possess arguments, which may be used to capture the flavour of the structured categories of GPSGs. β may also possess arbitrary procedural restrictions contained in braces.

γ consists of expressions in the semantic notation. The different terms of this semantic expression are joined by the semantic connector, the ampersand "&". The ampersand differs from the syntactic connector, the comma, since the former associates to the right while the latter associates to the left. The logical and symbol, which traditionally may also be denoted by the ampersand, must be entered as "&&". Due to constraints imposed by the current implementation, "( expr )" must be entered as "<[ expr ]", "< expr >" as "<<[ expr ]", and "λx expr" as "x lmda expr". An expression may contain references to the interpretations of the elements of β by stating the appropriate nonterminal followed by the left quote ('). To prevent ambiguity in these references that may arise when two identical symbols appear in β, a nonterminal may be appended with a minus sign followed by a unique integer.

Unlike standard Prolog implementations of DCGs, left recursion is allowed in rules, thus permitting more natural descriptions of certain phenomena (like co-ordination). Since the left recursive rules are interpreted, rather than converted into rules that are not left recursive, the number of rules in the database will not be affected. However, the efficiency of the sentence analysis may be affected due to the extra processing required. Rules of the form "A --> A A" are not accepted.

An example of a production that derives John from a proper noun npr is shown in (3.1):

(3.1) npr --> ['John'] : 'John'#

The semantic interpretation of this npr will be John# with "#" replaced by a unique integer during evaluation. (3.2) illustrates a verb phrase rule that could be used in sentences like John wants to walk:

(3.2) vp(Num) -->
          v(Num,Root) with Root in [want,like], vp(inf)
          : x## lmda [ x## & v' & [ x## & vp' ] ]


First notice that a restriction on the verb appears within the with statement. In the GPSG formalism, this type of restriction would be obtained by naming the rules and associating a list of valid rule names with each lexical entry. Although the with restriction may contain any valid procedure, typically the in operation (for determining list membership) is used. The double pound ## is replaced by the same unique integer in the entire expression when the expression is evaluated. If "#" were used instead, each instance of x# would be different. For the above example, if v' is want2 and vp' is run3 then the semantic expression could evaluate to:

(3.3) x4 lmda [ x4 & want2 & [ x4 & run3 ] ]

Furthermore if np' is John1 then:

(3.4) [ np' & vp' ]

could result in:

(3.5) [ John1 & want2 & [ John1 & run3 ] ]
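The difference between the single and double pound suffixes can be sketched as follows. This is only an illustration, assuming semantic expressions are held as plain strings; the function number_suffixes and the use of itertools.count as the source of unique integers are choices made here and are not taken from SAUMER.

```python
import itertools
import re


def number_suffixes(expression, counter):
    """Replace every name## by one shared integer and each name# by a fresh one."""
    shared = None

    def replace(match):
        nonlocal shared
        name, marks = match.group(1), match.group(2)
        if marks == "##":
            if shared is None:
                shared = next(counter)  # generated once, reused afterwards
            return f"{name}{shared}"
        return f"{name}{next(counter)}"  # a new integer for every single '#'

    # '##' is listed first so that it is not read as two separate '#' marks.
    return re.sub(r"(\w+)(##|#)", replace, expression)


print(number_suffixes("x## lmda [ x## & want2 & [ x## & run3 ] ]", itertools.count(4)))
print(number_suffixes("x# & x#", itertools.count(1)))
```

Starting the counter at 4 gives every x## the same suffix, as in (3.3), while the second call gives each x# its own integer.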

3.2 The Metarules

Traditional transformational grammars provide transformations that operate on parse trees, or similar structures, and often require the transformations to be used in sentence recognition rather than in generation (Radford, 1981). However, the approach suggested by (Gazdar, 1981) uses the transformations generatively and applies them to the grammar. Thus the grammar can remain context-free by compiling this transformational knowledge into the grammar. Transformations and rule schemata form the metarules of SSL.²

Rule schemata allow the user to specify entire classes of rules by permitting variables which range over a selection of categories to appear in the rule. To control the values of the variables, the forall control structure can be used in the schema declaration. The schema forall X in List, Body will execute Body for each element of List with X instantiated to the current element. The use of this statement is illustrated in the following metarule that generates the terminal productions for proper nouns:

(3.6) forall Terminal in ['Bob', 'Carol', 'Ted', 'Alice'],
          ( npr --> [Terminal] : Terminal# )
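As a rough illustration of what such a schema produces, the sketch below expands a body once per list element and collects the resulting rules as strings. The helper names expand_schema and proper_noun_rule are hypothetical, and rendering rules as strings is an assumption made only for this example.

```python
def expand_schema(values, make_rule):
    """Run the schema body once per element, as 'forall X in List, Body' does."""
    return [make_rule(value) for value in values]


def proper_noun_rule(terminal):
    # Mirrors the body of the schema: npr --> [Terminal] : Terminal#
    return f"npr --> ['{terminal}'] : '{terminal}'#"


rules = expand_schema(["Bob", "Carol", "Alice"], proper_noun_rule)
for rule in rules:
    print(rule)
```

Each element of the list yields one terminal production, so a three-element list yields three npr rules.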

Transformations match with grammar rules in the database, using a rule pattern that may be augmented with arbitrary procedures, and produce new rules from the old rules. A transformation is of the form:

(3.7) α --> β : γ ==> α' --> β' : γ'

The metarule arrow, ==>, separates the pattern, α --> β : γ, from the template, α' --> β' : γ'.

² Often metarules are considered to consist of transformations only, while schemata are put into a category of their own. However, since they can both be considered as part of a metagrammar, they are called metarules in this discussion.

The syntactic pattern, α --> β, contains nonterminals, which correspond to symbols that must appear in the matched rule, and free variables, which represent don't care regions of zero or more nonterminals. The pattern nonterminals may also possess arguments. For each rule symbol, a matching pattern symbol describes properties that must exist, but not all the properties that may exist. Thus if vp appeared in the pattern, it would match any of vp, vp(Num), or vp(Num,Type) with Type in [trans]. However, pp(to) would not match pp or pp(from), but it would match pp(to,_). The matching conditions are summarised in Figures 3-1 and 3-2. In Figure 3-1, A and B are nonterminals, X is a free variable, and α and β are conjunctions of one or more symbols. γ and δ of Figure 3-2 are also conjunctions of one or more symbols. "=" is defined as unification (Clocksin and Mellish, 1981). Parts of the rule contained in braces are ignored by the pattern matcher. The syntactic pattern may also contain arbitrary restrictions,³ enclosed in braces, that are evaluated during the pattern match.

The semantic pattern, γ, is very primitive. It may contain a free variable, which will bind to the entire semantics field of the matched rule, or it may contain the structure <[? x], which will bind to the entire structure containing the symbol x. If <[? y] then appears in γ', the result will be the semantic component of the matched rule with x replaced by y.

Pattern    Rule (B, β)                    Rule B

(A, α)     A matches B and                A matches B and
           α matches β                    α is a free variable

(X, α)     (X, α) matches β               α matches B
           or α matches (B, β)

A          [...]                          [...]

X          [...]                          [...]

Figure 3-1: Pattern Matching for Conjunctions

Pattern               Rule b(β1, ..., βn)          Rule b(β1, ..., βn) with δ

a(α1, ..., αm)        a = b, m ≤ n,                a = b, m ≤ n,
                      αi = βi, 1 ≤ i ≤ m           αi = βi, 1 ≤ i ≤ m

a(α1, ..., αm)        No                           a = b, m ≤ n,
  with γ                                           αi = βi, 1 ≤ i ≤ m,
                                                   γ matches δ

Figure 3-2: Pattern Matching for Nonterminals
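The conditions for matching a pattern nonterminal against a rule nonterminal (same name, no more pattern arguments than rule arguments, pairwise unifiable arguments) can be sketched in a few lines. The encoding is an assumption made for illustration: a symbol is a (name, argument list) pair, arguments are flat strings, names starting with a capital letter or underscore are variables, and with restrictions are left out entirely.

```python
def is_variable(term):
    return term[0].isupper() or term[0] == "_"


def deref(term, bindings):
    """Follow variable bindings until an unbound variable or an atom is reached."""
    while term in bindings:
        term = bindings[term]
    return term


def unify(x, y, bindings):
    """Unify two flat terms; return the extended bindings, or None on failure."""
    if x == "_" or y == "_":
        return bindings  # the anonymous variable unifies with anything
    x, y = deref(x, bindings), deref(y, bindings)
    if x == y:
        return bindings
    if is_variable(x):
        return {**bindings, x: y}
    if is_variable(y):
        return {**bindings, y: x}
    return None  # two different atoms


def matches(pattern, rule_symbol):
    """a(a1..am) matches b(b1..bn) if a = b, m <= n and ai unifies with bi."""
    (a, pattern_args), (b, rule_args) = pattern, rule_symbol
    if a != b or len(pattern_args) > len(rule_args):
        return False
    bindings = {}
    for p, r in zip(pattern_args, rule_args):
        bindings = unify(p, r, bindings)
        if bindings is None:
            return False
    return True


print(matches(("vp", []), ("vp", ["Num", "Type"])))  # pattern vp against vp(Num,Type)
print(matches(("pp", ["to"]), ("pp", ["from"])))     # pattern pp(to) against pp(from)
```

Under these assumptions the sketch reproduces the examples in the text: vp matches vp(Num,Type), while pp(to) matches pp(to,_) but neither pp nor pp(from).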

³ Apparently not present in the Hewlett Packard system (Gawron, 1982) or the ProGram system (Evans and Gazdar, 1984).


The behaviour of patterns can be seen in the following examples. Consider the sentence rule:

(3.8) s(decl) --> np(nom,Numb),
          vp(_,Numb) with agreement(Numb) : [ np' & vp' ]

The patterns shown in (3.9a) will match (3.8) while those of (3.9b) will not match it.

(3.9) (a) s(A) --> {not element(A,[foo])}, X, vp : Sem
          s --> np(nom), X, vp(pass), Y : Sem
      (b) s(inter) --> np, vp : Sem
          s --> vp : Sem

For the verb phrase rule shown in (3.10):

(3.10) vp(active,[M|N]) -->
           v([M|N],Root,Type,_) with (intrans in Type)
           : v'

the patterns of (3.11a) will result in a successful match, while those of (3.11b) will not:

(3.11) (a) vp --> v : <[? v]
           vp --> v(_,_,Type,_)
               with (X, intrans in Type, Y),
               Z : Sem
       (b) vp --> v(_,_,Type,_)
               with (X, trans in Type) : Sem
           vp --> v(_,Root)
               with (Root in [foo], X)
               : Sem

For every rule that matches the pattern, the template of the transformation is executed, resulting in the creation of a new rule. Any nonterminal N that matches a symbol βi on the left side of the transformation will appear in the new rule if there is a symbol βj' in β' that intra-transformation (IT) matches with βi. If there are several symbols in β' that IT-match βi, the leftmost symbol will be selected. No symbol on one side of the transformation may IT-match with more than one symbol on the other side. Two symbols will IT-match only if they have the same number of arguments, and those arguments are identical. Any with expressions and modifiers associated with symbols are ignored during IT-matching. β' may also contain extra symbols that do not correspond to anything in β. In this case they are inserted directly into the new rule. Once again, if the transformation is preceded by the command add then the new rule can also be subjected to transformations.

3.3 Modifiers

Both rules and metarules may contain modifiers that alter the structure of the nonterminal symbols. There are two types of modification, which have been dubbed external and internal modification.

With external modification, any nonterminal, or variable instantiated to a nonterminal, may be followed by the sequence @mod. This will result in mod being inserted into the argument list following the specified arguments. Thus, if N@junk appeared in a rule when N was instantiated to np(more), it would be expanded as np(more,junk). Similarly, if the pattern symbol vp matched vp(Numb) in a rule, then the appearance of vp@foo in the template would result in vp(foo,Numb). [...] introduced by the modifier, can be useful when dealing with the missing components of slash or derived categories (Gazdar, 1981).

Internal modification allows the modifier to be put directly into the argument list. If an argument is followed by @mod it will be replaced by mod. In the case where @mod appears as an argument by itself, mod is [...] v(Numb@pastpart) were contained in a template, it would IT-match v(Numb) in the pattern, and would result in the appearance of v(pastpart) in the new rule.
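The two kinds of modification can be pictured with the sketch below. It is one possible reading of the examples N@junk, vp@foo and v(Numb@pastpart) given in this section, not the system's actual procedure; symbols are assumed to be (name, argument list) pairs and both function names are hypothetical.

```python
def external_modify(name, specified, modifier, matched=()):
    """symbol@mod: insert mod after the arguments written on the symbol itself.

    `specified` holds the arguments given explicitly on the modified symbol and
    `matched` the arguments of the rule symbol it matched (assumed encoding).
    """
    remaining = list(matched[len(specified):])
    return (name, list(specified) + [modifier] + remaining)


def internal_modify(symbol, position, modifier):
    """arg@mod inside an argument list: the argument at `position` becomes mod."""
    name, args = symbol
    return (name, args[:position] + [modifier] + args[position + 1:])


print(external_modify("np", ["more"], "junk"))              # N@junk with N = np(more)
print(external_modify("vp", [], "foo", matched=["Numb"]))   # vp@foo matched vp(Numb)
print(internal_modify(("v", ["Numb"]), 0, "pastpart"))      # v(Numb@pastpart)
```

With this reading, the first call gives np(more,junk), the second gives vp(foo,Numb), and the third gives v(pastpart).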

4 IMPLEMENTATION

The SAUMER system is currently implemented in highly portable C-Prolog (Pereira, 1984) and runs on a Motorola 68000 based SUN Workstation supporting UNIX.⁴ Calls to Prolog are allowed by the system, thus providing [...] implementation. Implementations in other languages would differ externally only in the syntax of the procedure calls that may appear in each rule. Use of the system is described in detail in (Popowich, 1985).

The current implementation converts the grammar as specified by the rules and metarules into Prolog clauses. This conversion can be examined in terms of how rules are processed, and how the schemata and transformations are processed.

4.1 Rule Processing

The syntactic component of the rule processor is based [...] processor (Clocksin and Mellish, 1981) which has been [...] nonterminal is converted into a Prolog predicate, with two additional arguments, that can be processed by a top-down parser. These extra arguments correspond to the list to be parsed, and the remainder of the list after the predicate has parsed the desired category. With the addition of semantics to each rule, another argument is required to represent the semantic interpretation of the current symbol. Thus whenever a left quoted category name x' [...]

⁴ UNIX is a trademark of Bell Laboratories.


[...] variable bound to the semantic argument of the [...] expression is then evaluated by the eval routine with the result bound to the semantic argument of the nonterminal on the left hand side of the production. For example, the sentence rule:

(4.1) add s(decl) -->
          np(nom,Numb),
          vp(_,Numb) with agreement(Numb) : [ np' & vp' ]

will result in a Prolog expression of the form:

(4.2) s(SemS,decl,_1,_3) :-
          np(SemNP,nom,Numb,_1,_2),
          vp(SemVP,_,Numb,_2,_3),
          agreement(Numb),
          eval([SemNP & SemVP],SemS).

Consequently, to process the sentence John runs one would try to satisfy:

(4.3) :- s(Sem,Type,['John',runs],[]).

The first argument returns the interpretation, the second argument returns the type of sentence, the third is the initial input list, and the final argument corresponds to the list remaining after finding a sentence. Any rule R that is preceded by add will have the axiom rule(R) inserted into the database. These axioms are used by the transformations during pattern matching.
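The way a translated rule threads the input list and its remainder through each nonterminal, alongside a semantic argument, can be imitated with generators. This is a loose analogue under stated assumptions: the two-word lexicon, the number values sing and plur, and the use of plain lists for interpretations are invented for the example, and no eval step is applied.

```python
def np(tokens):
    # Hypothetical lexical rule; yields (semantics, number, remaining tokens).
    if tokens[:1] == ["John"]:
        yield "John", "sing", tokens[1:]


def vp(tokens):
    # Hypothetical lexical rules for a single intransitive verb.
    if tokens[:1] == ["runs"]:
        yield "run", "sing", tokens[1:]
    if tokens[:1] == ["run"]:
        yield "run", "plur", tokens[1:]


def s(tokens):
    """Analogue of a sentence rule: parse an np, then a vp on what is left."""
    for sem_np, numb_np, rest1 in np(tokens):
        for sem_vp, numb_vp, rest2 in vp(rest1):
            if numb_np == numb_vp:  # stands in for the agreement restriction
                yield [sem_np, sem_vp], "decl", rest2


def parse(tokens):
    """Keep only analyses that consume the whole input (empty remainder)."""
    return [(sem, kind) for sem, kind, rest in s(tokens) if rest == []]


print(parse(["John", "runs"]))
print(parse(["John", "run"]))
```

Each function receives the list still to be parsed and yields the remainder, which plays the role of the two extra arguments added to every Prolog predicate; the first yielded value corresponds to the semantic argument.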

The eval routine processes the suffix symbols, # and ##, along with the lambda expressions, and may perform some reorganisation of the given expression before returning a new semantic form. For each expression of the form name#, a unique integer N is created and nameN is returned. With "##" the procedure is the same except that the first occurrence of "##" will generate a unique integer that will be saved for all subsequent occurrences. To evaluate an expression of the form:

(4.4) ( expr_i lmda expr_j & X )

every subexpression of expr_j is recursively searched for an occurrence of expr_i, which is then replaced by X.
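The recursive search-and-replace step used for lambda expressions can be sketched as a substitution over nested lists. The list encoding and the name substitute are assumptions made for this illustration; the actual eval routine may also reorganise the expression, which is not modelled here.

```python
def substitute(expr, target, replacement):
    """Replace every occurrence of `target` inside `expr` by `replacement`."""
    if expr == target:
        return replacement
    if isinstance(expr, list):
        # Search each subexpression recursively.
        return [substitute(sub, target, replacement) for sub in expr]
    return expr


body = ["x4", "want2", ["x4", "run3"]]      # x4 lmda [ x4 & want2 & [ x4 & run3 ] ]
print(substitute(body, "x4", "John1"))
```

Applying the abstraction over x4 to John1 yields a nested list with the same shape as the interpretation [ John1 & want2 & [ John1 & run3 ] ].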

Left recursion is removed with the aid of a gap predicate identical to the one defined to process gapping grammars (Dahl and Abramson, 1984) and unrestricted gapping grammars (Popowich, forthcoming). For any rule of the form:

(4.5) A --> A, B, α

where A does not equal B, the result of the translation is:

(4.6) A(_1,_Nn) :- gap(G,_1,_2), B(_2,_N0), A(G,[]),
          α1(_N0,_N1), ..., αn(_Nn-1,_Nn).

According to (4.6), a phrase is processed by skipping over a region to find a B, the first nonterminal that does not equal A. The skipped region is then examined to ensure that it corresponds to an A before the rest of the phrase is processed.
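The idea of skipping a region, finding the symbol that follows the left-recursive call, and only then checking that the skipped region is itself an instance of the category can be sketched for a toy grammar. The grammar a --> a, b together with a --> [x] and b --> [y] is invented for this example, and trying every split point is a simplification of what a gap predicate does.

```python
def parse_b(tokens):
    # b --> [y]; yields the tokens remaining after a b.
    if tokens[:1] == ["y"]:
        yield tokens[1:]


def parse_a(tokens):
    # Non-recursive alternative: a --> [x]
    if tokens[:1] == ["x"]:
        yield tokens[1:]
    # Left-recursive alternative a --> a, b: skip a non-empty region, find a b
    # after it, then verify that the skipped region is exactly an a.
    for k in range(1, len(tokens)):
        skipped, rest = tokens[:k], tokens[k:]
        for remainder in parse_b(rest):
            if any(r == [] for r in parse_a(skipped)):
                yield remainder


def recognises(tokens):
    return any(remainder == [] for remainder in parse_a(tokens))


print(recognises(["x", "y", "y"]))
print(recognises(["y", "x"]))
```

Because the skipped region is always shorter than the input, the recursive check terminates even though the grammar rule itself is left recursive.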

4.2 Schema Processing

To process the metarule control structures used by schemata, a fail predicate is inserted to force Prolog to try all possible alternatives. The simple recursive definition of forall X in List:

(4.7) forall(X in [], Body).
      forall(X in [Y|Rest], Body) :-
          (X=Y, call1(Body), fail) ; forall(X in Rest, Body).

uses fail to undo the binding of Y, the first element of the list, to X before calling forall with the remainder of the list. The predicate call1 is used to evaluate Body since it will prevent the fail predicate from causing backtracking into Body.

4.3 Transformation Processing

[...] complex processing of all of the metagrammatical operations. This processing can be divided into the three stages of transformation creation, pattern matching, and rule creation.⁵

During the transformation creation phase, the predicate trans(M,X,Y) is created for the metarule M. This predicate will transform a list of elements, X, into another list, Y, according to the syntax specification of the metarule. Elements that IT-match will be represented by the same free variable in both lists. This binding will be one to one since an element cannot match with more than one element on the other side. Symbols that appear on only one side will not have their free variable appearing on the opposite side. Expressions in braces are ignored during this stage. If a transformation like:

(4.8) a --> b, c, X ==> a@foo --> b, X, c(foo)

appears, then a predicate of the form:

(4.9) trans(M, [_1,_2,_3,X], [_1,_2,X,_4])

will be created. Notice that the appearance of a modifier does not cause a@foo to be distinguished from a, since all modifiers are removed before the pattern-template match is attempted. However, c and c(foo) are considered to be different symbols. M is a unique integer associated with the transformation.
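How the two variable lists of a trans predicate could be derived from a pattern and a template is sketched below. Symbols are assumed to be plain strings, free variables are taken to start with a capital letter, and IT-matching is approximated by string equality after stripping @modifiers; the function make_trans is hypothetical and with expressions are not handled.

```python
import re


def strip_modifiers(symbol):
    """Remove @mod sequences, which are ignored when symbols are compared."""
    return re.sub(r"@\w+", "", symbol)


def make_trans(pattern, template):
    """Return the two variable lists for the pattern side and the template side."""
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return f"_{counter}"

    left, slots = [], []
    for symbol in pattern:
        var = symbol if symbol[0].isupper() else fresh()
        left.append(var)
        slots.append((strip_modifiers(symbol), var))

    right = []
    for symbol in template:
        if symbol[0].isupper():          # free variables keep their own name
            right.append(symbol)
            continue
        key = strip_modifiers(symbol)
        for i, (candidate, var) in enumerate(slots):
            if candidate == key:         # IT-match once modifiers are removed
                right.append(var)
                del slots[i]             # keeps the correspondence one to one
                break
        else:
            right.append(fresh())        # symbol occurs on the template side only
    return left, right


print(make_trans(["a", "b", "c", "X"], ["a@foo", "b", "X", "c(foo)"]))
```

For the transformation a --> b, c, X ==> a@foo --> b, X, c(foo), the sketch produces the lists [_1,_2,_3,X] and [_1,_2,X,_4]: a@foo shares a variable with a, while c(foo) receives a new one because it differs from c.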

⁵ (Popowich, forthcoming) examines a method of transformation processing that uses the transformations during the parse, instead of using them to generate new rules.

The pattern match phase determines if a rule matches the pattern, and produces a list for each successful match which will be transformed by the trans predicate. Each element of the list is either one of the matched symbols from the rule or a list of symbols corresponding to the don't care region of the pattern. Any predicates that appear in braces in the pattern are evaluated during the pattern match. Consider the operation of an active-passive verb phrase transformation:

(4.10) vp(active,Numb) -->
           v(Numb,R,Type,SType)
               with (X, trans in Type, Y),
           np, Z
           : <[? np']
       ==>
       vp(pass,Numb) -->
           v(Numb,be,T,S)-1 with aux in T,
           v(Numb@pastpart,R,Type,SType)
               with (X, trans in Type, Y),
           Z, pp(by,_)
           : x## lmda [ pp(by)' & <[? x##] ]

on the following verb phrase:

(4.11) vp(active,Numb) -->
           v(Numb,R,Type,_) with trans in Type,
           np([x,A,x])
           : <[ v' & np' ]

The list produced by the pattern match would resemble:

(4.12) [ vp(active,Numb),
         v(Numb,R,Type,_) with [[], trans in Type, []],
         ... ]

Notice that there was nothing in the rule to bind with X, Y or Z. Consequently these variables were assigned the null list []. The pattern match of the semantics of the rule will result in an expression which lambda abstracts np' out of the semantics:

(4.13) <[ np' lmda <[ v' & np' ] ]

The rule creation phase applies the transformation to the list produced by the pattern match and then uses the new list and the template to obtain a new rule. This phase includes conversion of the new list back into rule form, the application of modifiers, and the addition of any extra symbols that appear on the right hand side only. To continue with our example, the trans predicate associated with (4.10) would be:

(4.14) trans(N, [_1,_2,_3,Z], [_3,_4,_2,Z,_5])

Notice that the two vp's on opposite sides of the metarule do not match. So the transformed list would resemble:

(4.15) [ _3,
         _4,
         v(Numb,R,Type,_) with [[], trans in Type, []],
         [],
         _5 ]

The rule generated by the rule creation phase would be:

(4.16) vp(pass,Numb) -->
           v(Numb,be,T,S)-1 with aux in T,
           v(pastpart,R,Type,_) with trans in Type,
           pp(by,_)
           : x## lmda [ pp(by)' & <[ v' & x## ] ]

Notice that the expression "<[ v' & x## ]", which is contained in the semantics of (4.16), was obtained by the application of (4.13) to x##.

5 APPLICATIONS

To examine the usefulness of this type of grammar implementation, a grammar was developed that uses the [...] (Cercone et al., 1984). The AAA is an interactive information system under development at Simon Fraser University. It is intended to act as an aid in "curriculum planning and management", that accepts natural language queries and generates the appropriate responses. Routines [...] retrieving lexical information were also provided.

[...] permits some possessive forms, and allows auxiliaries to appear in the sentences. From the base of twenty six rules, eighty additional rules were produced by three metarules in about eighty-five seconds. Ten more rules were needed to link the lexicon and the grammar. A selection of the rules and metarules appears in Figure 5-1. [...] (Popowich, 1985).

In the interpretations of some sample sentences, which can be found in Figure 5-2, some liberties are taken with the semantic notation. Variables of the form wN, where N is any integer, represent entities that are to be instantiated from some database. Thus any interpretation containing wN will be a question. Possessives like John's table are represented as:

(5.1) <table & [John poss table]>

Although multiple possessives which associate from left to right are allowed, group possessives as seen in:

(5.2) the man who passed the course's book

and in phrases like:

(5.3) John's driver's licence

[...]

Inverted sentences are preceded by the word Query in the output. Also, proper nouns are assumed to unambiguously refer to some object, and thus are no longer followed by [...] interpretation are given in CPU seconds. The total time includes the time spent looking for all other possible parses.

Results obtained with SAUMER compare favourably to those obtained from the ProGram system (Evans and Gazdar, 1984). ProGram operates on grammars defined according to the current GPSG formalism (Gazdar and Pullum, 1982) but was not developed with efficiency as a major consideration. The grammar used with ProGram, which is given in (Popowich, 1985), is similar to the AAA [...]


add vp(active,Numb) --> v(Numb,Root,T,_) with (Root in [pass,give,teach,offer], indobj in T, trans in T),
    np([x,D,x]), np([x*x])-1 : <[ v' & np' & np-1' ]

/* WH questions in inverted sentences */
eval(w##,Var), NP = np(Case,Numb,Feat)
, ( NP@NP --> [] {agreement(Case)} : Var )
, ( s(inv) --> np([x,A,x],Numb,Feat) with Qword in Feat, s(inv)@np([x,A,x],Numb,Feat)
    : <[ (Var lmda s') & np' ] )

/* passive transformation */
add vp(active,Numb) --> v(Numb,R,Type,Subtype) with (X, trans in Type, Y), np, Z : <[? np']
==> vp(pass,Numb) --> v(Numb,be,T,S)-1 with aux in T,
    v(Numb@pastpart,R,Type,Subtype) with (X, trans in Type, Y),
    Z, optional(pp(by,_)) : x## lmda [ optional' & <[? x##] ]

/* sentence inversion */
add vp(T,[M|N]) --> v([M|N],R,Type,S) with (X, aux in Type, Y), Z : Sem
==> s(inv) --> v([M|N],R,Type,S) with (X, aux in Type, Y), np([N1,x,x],[M|N],_), Z : [ np' & Sem ]

/* metarule for the propagation of "holes" in the "slash" categories */
forall Hole in [pp(Prep,Feat), np(Case,Numb,Feat)],
( forall Cat1 in [s(Type), vp, pp(Prep,Feat), optional]
, ( forall Cat2 in [vp, pp(Prep,Feat), np(Case,Numb,Feat), optional]
, ( Cat1 --> X, Cat2, Y : Sem ==> Cat1@Hole --> X, Cat2@Hole, Y : Sem ) ) )

Figure 5-1: Excerpt from Grammar

Sentence:  did Fred take cmpt101
Query:     [Fred take5 cmpt101]
Analysis:  2.25 sec    Total: 4.28334 sec

Sentence:  who wants to teach Fred's professor's course
Semantics: [ <w1 & [w1 animate]>
             want4 [ <w1 & [w1 animate]>
                     teach13
                     <course14 & [<professor15 & [Fred poss professor15]> poss course14]>
                   ] ]
Analysis:  6.58337 sec    Total: 18.9834 sec

Sentence:  whose course does the student whom John likes want to be taking
Query:     [ <<the38 student39> & [John like45 <the38 student39>]>
             want46 [ <<the38 student39> & [John like45 <the38 student39>]>
                      take56
                      <course29 & [<w30 & [w30 animate]> poss course29]>
                    ] ]
Analysis:  21.9999 sec    Total: 39.4 sec

Sentence:  to whom does the professor want which paper to be given
Query:     [ <the14 professor15>
             want17 [ x39 give35 <w7 & [w7 animate]> <w21 & [w21 paper22]> ]
           ]
Analysis:  14.3167 sec    Total: 29.5167 sec

Figure 5-2: Summary of Test Results

grammar used by SAUMER except that it has a much smaller lexicon, and allows neither relative clauses nor

SAUMER. ProGram required about 35 seconds to parse the sentence does John take cmpt101, with a total processing time of about 140 seconds. SAUMER required just over 2 seconds to parse this phrase, and had a total processing time of about 4 seconds.

As it stands, the semantic notation used by SAUMER does not contain much of the relevant information that would be required by a real system. Tense, number, and adverbial information, including concepts like location and time, would be required in the AAA. If the SSL description were to be extended, with the resulting system behaving as a natural language interface of the AAA, a more database-directed semantic notation would prove invaluable.

6 PRESENT LIMITATIONS

Although this application of metarules allows succinct descriptions of a grammar, several problems have been observed.

Since each metarule is applied to the rule base only once, the order of the metarules is very important. In our sample grammar, the passive verb phrases were generated before the sentence inversion transformation was

transformations were executed. For the current implementation, if a rule generated by transformation T1 is to be subjected to transformation T2, then T1 must appear before T2. Moreover, no rule that is the result of T2 can be operated on by T1. It would be preferable to remove this restriction and impose one that is less severe, such as the finite closure restriction which is described in (Thompson 1982) and used by ProGram. With this improvement, the only restriction would be that a metarule could be applied at most once in the derivation of a rule.
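The difference between single-pass ordered application and finite closure can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: rules are plain strings, the two toy metarules `passive` and `inversion` only mimic the shape of the transformations in the sample grammar, and neither function reflects how SAUMER or ProGram is actually implemented.

```python
from typing import Callable, Iterable, List

Rule = str
Metarule = Callable[[Rule], Iterable[Rule]]


def apply_in_order(rules: List[Rule], metarules: List[Metarule]) -> List[Rule]:
    """Single pass: each metarule sees the base rules plus the output of the
    metarules listed before it, and is never revisited."""
    result = list(rules)
    for metarule in metarules:
        new_rules: List[Rule] = []
        for rule in result:
            for derived in metarule(rule):
                if derived not in result and derived not in new_rules:
                    new_rules.append(derived)
        result.extend(new_rules)
    return result


def finite_closure(rules: List[Rule], metarules: List[Metarule]) -> List[Rule]:
    """Any order is allowed, but each metarule is used at most once
    in the derivation of a given rule."""
    result = list(rules)
    # Each frontier entry pairs a rule with the metarule indices used to derive it.
    frontier = [(rule, frozenset()) for rule in rules]
    seen = set(frontier)
    while frontier:
        rule, used = frontier.pop()
        for index, metarule in enumerate(metarules):
            if index in used:
                continue
            for derived in metarule(rule):
                item = (derived, used | {index})
                if item not in seen:
                    seen.add(item)
                    frontier.append(item)
                    if derived not in result:
                        result.append(derived)
    return result


def passive(rule: Rule) -> Iterable[Rule]:
    # Toy metarule: active verb phrase rules yield passive ones.
    if rule.startswith("vp(active)"):
        yield rule.replace("vp(active)", "vp(pass)", 1)


def inversion(rule: Rule) -> Iterable[Rule]:
    # Toy metarule: any verb phrase rule yields an inverted-sentence rule.
    if rule.startswith("vp("):
        yield "inv:" + rule


base = ["vp(active) --> v np"]
ordered = apply_in_order(base, [passive, inversion])
misordered = apply_in_order(base, [inversion, passive])
closed = finite_closure(base, [inversion, passive])
```

With the toy metarules, listing `inversion` before `passive` loses the inverted passive rule in the single-pass scheme, whereas the finite closure version produces the same rule set regardless of the order in which the metarules are listed.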

The system cannot currently process rules expressed in the Immediate Dominance/Linear Precedence (ID/LP) format (Gazdar and Pullum 1982). With this format, a production rule is expressed with an unordered right hand side and a declaration of linear precedence. For example, a passive verb phrase rule could appear something like:

(6.1) vp(pass,[M|N]) -->
          v([M|N],be),
          v(_,Root,Type,_) with
              (Root in [pass,carry,give],
               indobj in Type,
               trans in Type),
          pp(to), optional(pp(by)) : x## lmda [optional' & <[v' & pp(to)' & x##]]

with the components having a linear precedence of:

(6.2) v(_,be) < v < pp

The result would be that the pp(by) could appear before or after the pp(to), since there is no restriction on their relative positions. If this format were implemented, only one passive metarule would have to be explicitly stated. The direct processing of ID/LP grammars is discussed in (Shieber 1982), (Evans and Gazdar 1984) and (Popowich forthcoming).
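The effect of an ID/LP rule can be sketched by enumerating the orderings of an unordered right hand side that respect the precedence declaration. The Python fragment below is an illustration only: the function `linearisations`, the string encoding of the categories, and the class names in `LP_CLASS` are assumptions, the optionality of pp(by) is ignored, and brute-force enumeration stands in for the direct parsing methods cited above.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple


def linearisations(
    rhs: List[str],
    lp_class: Dict[str, str],
    precedes: Set[Tuple[str, str]],
) -> List[List[str]]:
    """Return every ordering of rhs in which no element appears before
    another element that is required to precede it."""
    valid = []
    for order in permutations(rhs):
        ok = True
        for i in range(len(order)):
            for j in range(i + 1, len(order)):
                earlier, later = lp_class[order[i]], lp_class[order[j]]
                # Violation: the later element's class must precede the earlier one's.
                if (later, earlier) in precedes:
                    ok = False
        if ok:
            valid.append(list(order))
    return valid


# Unordered right hand side of the passive rule (6.1), encoded as strings.
RHS = ["v(be)", "v", "pp(to)", "pp(by)"]

# Both prepositional phrases fall into the same precedence class.
LP_CLASS = {"v(be)": "v_be", "v": "v", "pp(to)": "pp", "pp(by)": "pp"}

# The chain v(_,be) < v < pp from (6.2), written out as pairs.
PRECEDES = {("v_be", "v"), ("v", "pp"), ("v_be", "pp")}

orders = linearisations(RHS, LP_CLASS, PRECEDES)
for order in orders:
    print(" ".join(order))
```

Only two orderings survive, differing in the relative position of pp(to) and pp(by), which is the freedom that would make a second passive metarule unnecessary.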

7 CONCLUSIONS

SSL appears to adequately capture the flavour of GPSG descriptions while allowing more procedural control. Investigation into a relationship between SSL and GPSG grammars could result in a method for translating GPSG grammars into SSL for execution by SAUMER. Further research could also provide a relationship between SSL and other grammar formalisms, such as lexical-functional

implementation of SAUMER, allowing left recursion in rules, should facilitate a more detailed study of the specification language, and of some problems associated with metarule specifications. Due to the easy separability of the semantic rules, one could attempt to introduce a more database-oriented semantic notation and develop an interface to a real database. One could then examine system behaviour with a larger rule base and more involved transformations in an applications environment like that of the AAA. However, as is apparent from the application presented here and from preliminary

further investigation of the efficient operation of this Prolog implementation with large grammars will be required.

ACKNOWLEDGEMENTS

I would like to thank Nick Cercone for reading an earlier version of this paper and providing some useful suggestions. The comments of the referees were also helpful. Facilities for this research were provided by the Laboratory for Computer and Communications Research. This work was supported by the Natural Sciences and Engineering Research Council of Canada under Operating

Postgraduate Scholarship #800

REFERENCES

Cercone N., Hadley R., Martin F., McFetridge P. and Strzalkowski T. Designing and automating the quality assessment of a knowledge-based system: the initial automated academic advisor experience, pages 193-205. IEEE Principles of Knowledge-Based Systems Proceedings, Denver, Colorado, 1984.

Clocksin W.F. and Mellish C.S. Programming in Prolog. Berlin-Heidelberg-New York: Springer-Verlag, 1981.

Dahl V. and Abramson H. On Gapping Grammars. Proceedings of the Second International Joint Conference on Logic, University of Uppsala, Sweden, 1984.

Evans R. and Gazdar G. The ProGram Manual. Cognitive Science Programme, University of Sussex, 1984.

Fawcett B. Personal communication. Dept. of Computing Science, University of Toronto, 1984.

Gawron J.M. et al. Processing English with a Generalized Phrase Structure Grammar, pages 74-81. Proceedings of the 20th Annual Meeting of the Association for Computational Linguistics, June 1982.

Gazdar G. Phrase Structure Grammar. In P. Jacobson and G.K. Pullum (Ed.), The Nature of Syntactic Representation, D. Reidel, Dordrecht, 1981.

Gazdar G. and Pullum G.K. Generalized Phrase Structure Grammar: A Theoretical Synopsis. Technical Report, Indiana University Linguistics Club, Bloomington, Indiana, August 1982.

Kaplan R. and Bresnan J. Lexical-Functional Grammar: A Formal System for Grammatical Representation. In J. Bresnan (Ed.), Mental Representation of Grammatical Relations, MIT Press, 1982.

Pereira F.C.N. (ed). C-Prolog User's Manual. Technical Report, SRI International, Menlo Park, California, 1984.

Pereira F.C.N. and Warren D.H.D. Definite Clause Grammars for Language Analysis. Artificial Intelligence, 1980, 13, 231-278.

Popowich F. SAUMER: Sentence Analysis Using MEtaRules (Preliminary Report). Technical Report TR-84-10 and LCCR TR-84-2, Department of Computing Science, Simon Fraser University, August 1984.

Popowich F. The SAUMER User's Manual. Technical Report TR-85-3 and LCCR TR-85-4, Department of Computing Science, Simon Fraser University, 1985.

Popowich F. Effective Implementation and Application of Unrestricted Gapping Grammars. Master's thesis, Department of Computing Science, Simon Fraser University, forthcoming.

Radford A. Transformational Syntax. Cambridge University Press, 1981.

Schubert L.K. and Pelletier F.J. From English to Logic: Context-Free Computation of "Conventional" Logical Translation. American Journal of Computational Linguistics, January-March 1982, 8(1), 26-44.

Shieber S.M. Direct Parsing of ID/LP Grammars. Draft, 1982.

Thompson H. Handling Metarules in a Parser for GPSG. Technical Report D.A.I. No. 175, Department of Artificial Intelligence, University of Edinburgh, 1982.
