SAUMER: SENTENCE ANALYSIS USING METARULES
Fred Popowich
Natural Language Group, Laboratory for Computer and Communications Research
Department of Computing Science, Simon Fraser University, Burnaby, B.C., CANADA V5A 1S6
ABSTRACT
The SAUMER system uses specifications of natural language grammars, which consist of rules and metarules, to provide a semantic interpretation of an input sentence. The SAUMER Specification Language (SSL) is a programming language which combines some of the features of generalised phrase structure grammars (Gazdar, 1981), like the correspondence between syntactic and semantic rules, with definite clause grammars (DCGs) (Pereira and Warren, 1980) to create an executable grammar specification. SSL rules are similar to DCG rules except that they contain a semantic component and may also be left recursive. Metarules are used to generate new rules from existing rules before any parsing is attempted. An implementation is tested which can provide semantic interpretations for sentences containing topicalisation, relative clauses, passivisation, and questions.
1 INTRODUCTION
The SAUMER system allows the user to specify a grammar for a natural language using rules and metarules. This grammar can then be used to obtain a semantic interpretation of an input sentence. The SAUMER Specification Language (SSL), which is a variation of definite clause grammars (DCGs) (Pereira and Warren, 1980), captures some of the features of generalised phrase structure grammars (GPSGs) (Gazdar, 1981) (Gazdar and Pullum, 1982), like rule schemata, rule transformations and the correspondence between syntactic and semantic rules. The semantics currently used in the system are based on Schubert and Pelletier's description in (Schubert and Pelletier, 1982), which adapts the intensional logic interpretation associated with GPSGs into a more conventional logical notation.
2 THE SEMANTIC LOGICAL NOTATION
The logical notation associated with the grammar differs from the usual notation of intensional logic since it captures some intuitive aspects of natural language.1 Thus individuals and objects are treated as entities instead of collections of properties, and actions are n-ary relations between these entities. Many of the problems that the intensional notation would solve are handled by [...] notation. Consequently, as is common in other approaches (e.g. Gawron, 1982), much of the processing is deferred to the pragmatic stage. The structure of the lexicon, and the [...] brackets) are designed to reflect this ambiguity.
The lexicon is organised into two levels. For the semantic interpretation, the first level gives each word a tentative [...] complete processing information will result in the final interpretation being obtained from the second level of the lexicon. For example, the sentence John misses John could be given an initial interpretation of:
(2.1) [ John1 miss2 John3 ]
with John1, miss2 and John3 obtained from the first level of the two level lexicon. The pragmatic stage will determine if John1 and John3 both refer to the same entry, say JOHN SMITH1 of the second level of the lexicon, or if they correspond to different entries, say [...] pragmatic stage, the entry of MISS which is referred to by miss2 will be determined (if possible). For example, does John miss John because he has been away for a long time, or is it because he is a poor shot with a rifle?
Any interpretation contained in sharp angle brackets < > may require post processing. This is apparent in interpretations containing determiners and co-ordinators. The proverb:
(2.2) every man loves some woman
could be given the interpretation:
(2.3) [ <every1 man2> love3 <some4 woman5> ]
without explicitly stating which of the two readings is intended. During pragmatic analysis, the scope of every and some would presumably be determined.
1 It should also be noted that due to the separability of the semantic component from the grammar rule, a different semantic notation could easily be introduced as long as the appropriate semantic processing routines were replaced. The use of SAUMER with "an "AI-adapted" version of Montague's Intensional Logic" is being examined by Fawcett (1984).
The syntax of this logical notation can be summarised as follows. Sentences and compound predicate formulas are contained within square brackets. So (2.4) states that John wants to kiss Mary:
(2.4) [ John1 want2 [ John1 kiss3 Mary4 ] ]
These formulas can also be expressed equivalently in a more functional form according to the equivalence:
(2.5) [ t_n P t_1 ... t_{n-1} ]
      ≡ ( ... ((P t_1) t_2) ... t_n )
      ≡ ( P t_1 ... t_n )
Consequently (2.4) could also be represented as:
(2.6) ( (want2 ((kiss3 Mary4) John1)) John1 )
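The equivalence in (2.5) can be illustrated with a short Python sketch. It is only an illustration, not part of SAUMER: the list/tuple encoding and the name `to_functional` are assumptions made here, and angle-bracket and brace terms are not handled.

```python
def to_functional(formula):
    """Convert an infix formula [tn, P, t1, ..., tn-1] into curried form.

    Formulas are Python lists (an assumed encoding); atoms are strings.
    """
    if not isinstance(formula, list):
        return formula  # atoms such as 'John1' are left unchanged
    subject, predicate, *rest = formula
    result = predicate
    # Apply the predicate to t1 ... tn-1 first, and to the subject tn last.
    for term in rest + [subject]:
        result = (result, to_functional(term))
    return result


formula_2_4 = ["John1", "want2", ["John1", "kiss3", "Mary4"]]
print(to_functional(formula_2_4))
```

Under this encoding, each pair `(f, a)` stands for the application `(f a)`, so the printed value has the same shape as (2.6).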
However this notation is usually used for incomplete phrases, with the square brackets used to obtain a conventional final reading. Modified predicate formulas are contained in braces. Thus a little dog likes Fido could be expressed as:
(2.7) [ <a1 {little2 dog3}> likes4 Fido5 ]
The lambda calculus operations of lambda abstraction and elimination are also allowed. When a variable is abstracted from an expression as in:
(2.8) λx [ x want2 [ x love3 Mary4 ] ]
application of this new expression to an argument, say John1:
(2.9) ( λx [ x want2 [ x love3 Mary4 ] ] John1 )
will result in an interpretation of John wants to love Mary:
(2.10) [ John1 want2 [ John1 love3 Mary4 ] ]
Further details on this notation are available in (Schubert and Pelletier, 1982).
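The step from (2.9) to (2.10) is a substitution of the argument for the abstracted variable. A minimal Python sketch follows, assuming formulas are nested lists and an abstraction is a (variable, body) pair; the names `substitute` and `apply_lambda` are hypothetical, and no attempt is made to avoid variable capture.

```python
def substitute(expr, var, value):
    """Replace every occurrence of var in a nested-list expression by value."""
    if expr == var:
        return value
    if isinstance(expr, list):
        return [substitute(part, var, value) for part in expr]
    return expr


def apply_lambda(abstraction, argument):
    """Apply an abstraction, given as a (variable, body) pair, to an argument."""
    var, body = abstraction
    return substitute(body, var, argument)


abstraction_2_8 = ("x", ["x", "want2", ["x", "love3", "Mary4"]])
print(apply_lambda(abstraction_2_8, "John1"))
```

With this encoding the result mirrors (2.10): both occurrences of x are replaced by John1.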
3 THE SAUMER SPECIFICATION LANGUAGE
The SAUMER Specification Language (SSL) is a programming language that allows the user to define a grammar of a natural language in terms of rules and metarules. Metarules operate on rules to produce new rules. The language is basically a GPSG realised in a DCG setting. Unlike GPSGs, the grammars defined by this system are not required to be context-free, since procedure calls are allowed within the rules, and since logic variables are allowed in the grammar symbols.
The basic objects of the language are atoms, variables, terms, and lists. Any word starting with a lower case letter, or enclosed in single quotes, is an atom. Variables start with a capital letter or an underscore. A term is an atom optionally followed by a series of objects (arguments), which are enclosed in parentheses and separated by commas. Lastly, a list is a series of one or more objects, separated by commas, that are enclosed in square brackets.
3.1 Rules
The rules are presented in a variation of the DCG notation, augmented with a semantic rule corresponding to each syntactic rule. Each rule is of the form "A --> β : γ", where A is a term which denotes a nonterminal symbol, β is either an atom list representing a terminal symbol or a conjunction of terms (separated by commas) corresponding to nonterminal symbols, and γ is a semantic rule which may reference the interpretation of the components of β in determining the semantics of A. The rule arrow, -->, separates the two sides of the rule, with the colon, :, separating the syntactic component from the semantic component. If the rule is preceded by the word add, it can be subjected to the transformations described in section 3.2. The nonterminal symbols can possess arguments, which may be used to capture the flavour of the structured categories of GPSGs. They may also possess arbitrary procedural restrictions contained in braces.
γ consists of expressions in the semantic notation. The different terms of this semantic expression are joined by the semantic connector, the ampersand "&". The ampersand differs from the syntactic connector, the comma, since the former associates to the right while the latter associates to the left. The logical and symbol, which traditionally may also be denoted by the ampersand, must be entered as "&&". Due to constraints imposed by the current implementation, "( expr )" must be entered as "<[ expr ]", "< expr >" as "<<[ expr ]", and "λx expr" as "x lmda expr". An expression may contain references to the interpretations of the elements of β by stating the appropriate nonterminal followed by the left quote, "'". To prevent ambiguity in these references that may arise when two identical symbols appear in β, a nonterminal may be appended with a minus sign followed by a unique integer.
Unlike standard Prolog implementations of DCGs, left recursion is allowed in rules, thus permitting more natural descriptions of certain phenomena (like co-ordination). Since the left recursive rules are interpreted, rather than converted into rules that are not left recursive, the number of rules in the database will not be affected. However the efficiency of the sentence analysis may be affected due to the extra processing required. Rules of the form "A --> A A" are not accepted.
An example of a production that derives John from a proper noun npr is shown in (3.1):
(3.1) npr --> ['John'] : 'John'#
The semantic interpretation of this npr will be John#, with "#" replaced by a unique integer during evaluation. (3.2) illustrates a verb phrase rule that could be used in sentences like John wants to walk:
(3.2) vp(Num) -->
          v(Num,Root) with Root in [want,like], vp(inf)
          : x## lmda [ x## & v' & [ x## & vp' ] ]
First notice that a restriction on the verb appears within the with statement. In the GPSG formalism, this type of restriction would be obtained by naming the rules and associating a list of valid rule names with each lexical entry. Although the with restriction may contain any valid procedure, typically the in operation (for determining list membership) is used. The double pound, ##, is replaced by the same unique integer in the entire expression when the expression is evaluated. If "#" were used instead, each instance of x# would be different. For the above example, if v' is want2 and vp' is run3 then the semantic expression could evaluate to:
(3.3) x4 lmda [ x4 & want2 & [ x4 & run3 ] ]
Furthermore, if np' is John1 then:
(3.4) [ np' & vp' ]
could result in:
(3.5) [ John1 & want2 & [ John1 & run3 ] ]
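The difference between the "#" and "##" suffixes can be sketched in Python as follows. This is an assumed illustration rather than the system's evaluation routine: the token-list encoding, the function name `number_suffixes`, and the counter starting at 4 (chosen only to mirror (3.3)) are all hypothetical.

```python
import itertools


def number_suffixes(tokens, counter):
    """Replace '#' and '##' suffixes in one expression with integers.

    '##' shares a single fresh integer across the whole expression,
    while each '#' receives its own fresh integer.
    """
    shared = None
    numbered = []
    for token in tokens:
        if token.endswith("##"):
            if shared is None:
                shared = next(counter)  # generated once, then reused
            numbered.append(token[:-2] + str(shared))
        elif token.endswith("#"):
            numbered.append(token[:-1] + str(next(counter)))
        else:
            numbered.append(token)
    return numbered


counter = itertools.count(4)  # assumed starting value
expression = ["x##", "lmda", "[", "x##", "&", "want2", "&",
              "[", "x##", "&", "run3", "]", "]"]
print(" ".join(number_suffixes(expression, counter)))
print(number_suffixes(["x#", "x#"], counter))
```

In the first call every x## receives the same integer, as in (3.3); in the second call the two x# tokens receive different integers.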
3.2 The Metarules
Traditional transformational grammars provide transformations that operate on parse trees, or similar structures, and often require the transformations to be used in sentence recognition rather than in generation (Radford, 1981). However the approach suggested by (Gazdar, 1981) uses the transformations generatively and applies them to the grammar. Thus the grammar can remain context-free by compiling this transformational knowledge into the grammar. Transformations and rule schemata form the metarules of SSL.2
Rule schemata allow the user to specify entire classes of rules by permitting variables which range over a selection of categories to appear in the rule. To control the values of the variables, the forall control structure can be used in the schema declaration. The schema forall X in List, Body will execute Body for each element of List, with X instantiated to the current element. The use of this statement is illustrated in the following metarule that generates the terminal productions for proper nouns:
(3.6) forall Terminal in ['Bob','Carol','Ted','Alice'],
          (npr --> [Terminal] : Terminal#)
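The effect of a schema such as (3.6) can be pictured as a loop that instantiates the rule body once per list element. The Python sketch below is illustrative only: rules are represented as plain strings, the names `expand_schema` and `proper_noun_rule` are hypothetical, and a shortened name list is used.

```python
def expand_schema(values, make_rule):
    """Instantiate a rule template once for each value in the list."""
    return [make_rule(value) for value in values]


def proper_noun_rule(terminal):
    """Build the text of one terminal production for a proper noun."""
    return f"npr --> ['{terminal}'] : '{terminal}'#"


for rule in expand_schema(["Bob", "Carol", "Alice"], proper_noun_rule):
    print(rule)
```

Each generated line has the same shape as the hand-written production (3.1).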
Transformations match with grammar rules in the database, using a rule pattern that may be augmented with arbitrary procedures, and produce new rules from the old rules. A transformation is of the form:
(3.7) α --> β : γ ==> α' --> β' : γ'
The metarule arrow, ==>, separates the pattern, α --> β : γ, from the template, α' --> β' : γ'.
2 Often metarules are considered to consist of transformations only, while schemata are put into a category of their own. However, since they can both be considered as part of a metagrammar, they are called metarules in this discussion.
The syntactic pattern, α --> β, contains nonterminals, which correspond to symbols that must appear in the matched rule, and free variables, which represent don't care regions of zero or more nonterminals. The pattern nonterminals may also possess arguments. For each rule symbol, a matching pattern symbol describes properties that must exist, but not all the properties that may exist. Thus if vp appeared in the pattern, it would match any of vp, vp(Num), or vp(Num,Type) with Type in [trans]. However pp(to) would not match pp or pp(from), but it would match pp(to,_). The matching conditions are summarised in Figures 3-1 and 3-2. In Figure 3-1, A and B are nonterminals, X is a free variable, and α and β are conjunctions of one or more symbols. γ and δ of Figure 3-2 are also conjunctions of one or more symbols. "=" is defined as unification (Clocksin and Mellish, 1981). Parts of the rule contained in braces are ignored by the pattern matcher. The syntactic pattern may also contain arbitrary restrictions,3 enclosed in braces, that are evaluated during the pattern match.
The semantic pattern, γ, is very primitive. It may contain a free variable, which will bind to the entire semantics field of the matched rule, or it may contain the structure <[? x] which will bind to the entire structure containing the symbol x. If <[? y] then appears in γ', the result will be the semantic component of the matched rule with x replaced by y.
Pattern     Rule (B, β)                               Rule B
(A, α)      A matches B and α matches β               A matches B and α is a free variable
(X, α)      (X, α) matches β or α matches (B, β)      α matches B
A           ...                                       ...
X           ...                                       ...
Figure 3-1: Pattern Matching for Conjunctions
Pattern                 Rule b(β1,...,βn)                      Rule b(β1,...,βn) with δ
a(α1,...,αm)            a = b, m <= n, αi = βi (1 <= i <= m)   a = b, m <= n, αi = βi (1 <= i <= m)
a(α1,...,αm) with γ     No                                     a = b, m <= n, αi = βi (1 <= i <= m), γ matches δ
Figure 3-2: Pattern Matching for Nonterminals
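The nonterminal matching conditions (same name, no more pattern arguments than rule arguments, and pairwise unification of the leading arguments) can be sketched in Python. This is a simplified illustration under stated assumptions: symbols are (name, argument list) pairs, arguments are atoms or variables only, variables follow the SSL convention of starting with a capital letter or underscore, pattern and rule variables share one namespace, and "with" restrictions are left out. The names `is_var`, `walk`, `unify` and `matches` are hypothetical.

```python
def is_var(term):
    """Variables start with a capital letter or an underscore."""
    return isinstance(term, str) and (term[0].isupper() or term[0] == "_")


def walk(term, bindings):
    """Follow variable bindings until an unbound variable or an atom is reached."""
    while is_var(term) and term in bindings:
        term = bindings[term]
    return term


def unify(a, b, bindings):
    """Unify two atoms/variables, recording new bindings; '_' matches anything."""
    a, b = walk(a, bindings), walk(b, bindings)
    if a == "_" or b == "_" or a == b:
        return True
    if is_var(a):
        bindings[a] = b
        return True
    if is_var(b):
        bindings[b] = a
        return True
    return False


def matches(pattern, rule_symbol):
    """Pattern symbol a(α1..αm) matches rule symbol b(β1..βn) if a = b,
    m <= n, and αi unifies with βi for every i up to m."""
    p_name, p_args = pattern
    r_name, r_args = rule_symbol
    if p_name != r_name or len(p_args) > len(r_args):
        return False
    bindings = {}
    return all(unify(p, r, bindings) for p, r in zip(p_args, r_args))


print(matches(("vp", []), ("vp", ["Num"])))
print(matches(("pp", ["to"]), ("pp", ["from"])))
print(matches(("pp", ["to"]), ("pp", ["to", "_"])))
```

The three calls correspond to the vp and pp cases discussed in the text: a bare vp pattern matches vp(Num), while pp(to) fails against pp(from) but succeeds against pp(to,_).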
3 Apparently not present in the Hewlett Packard system (Gawron, 1982) or the ProGram system (Evans and Gazdar, 1984).
The behaviour of patterns can be seen in the following examples. Consider the sentence rule:
(3.8) s(decl) --> np(nom,Numb),
          vp(_,Numb) with agreement(Numb) : [ np' & vp' ]
The patterns shown in (3.9a) will match (3.8) while those of (3.9b) will not match it.
(3.9) (a) s(A) --> {not element(A,[foo])}, X, vp : Sem
          s --> np(nom), X, vp(pass), Y : Sem
      (b) s(inter) --> np, vp : Sem
          s --> vp : Sem
For the verb phrase rule shown in (3.10):
(3.10) vp(active,[M|N]) -->
           v([M|N],Root,Type,_) with (intrans in Type)
           : v'
the patterns of (3.11a) will result in a successful match while those of (3.11b) will not:
(3.11) (a) vp --> v : <[? v']
           vp --> v(_,_,Type,_)
               with (X, intrans in Type, Y),
               Z : Sem
       (b) vp --> v(_,_,Type,_)
               with (X, trans in Type) : Sem
           vp --> v(_,Root)
               with (Root in [foo], X)
               : Sem
For every rule that matches the pattern, the template of the transformation is executed, resulting in the creation of a new rule. Any nonterminal N that matches a symbol βi on the left side of the transformation will appear in the new rule if there is a symbol βj' in β' that intra-transformation (IT) matches with βi. If there are several symbols in β' that IT-match βi, the leftmost symbol will be selected. No symbol on one side of the transformation may IT-match with more than one symbol on the other side. Two symbols will IT-match only if they have the same number of arguments, and those arguments are identical. Any with expressions and modifiers associated with symbols are ignored during IT-matching. β' may also contain extra symbols that do not correspond to anything in β. In this case they are inserted directly into the new rule. Once again, if the transformation is preceded by the command add then the [...] transformations.
3.3 Modifiers
Both rules and metarules may contain modifiers that alter the structure of the nonterminal symbols. There are two types of modification, which have been dubbed external and internal modification.
With external modification, any nonterminal, or variable instantiated to a nonterminal, may be followed by the sequence @mod. This will result in mod being inserted into the argument list following the specified arguments. Thus, if N@junk appeared in a rule when N was instantiated to np(more), it would be expanded as np(more,junk). Similarly, if the pattern symbol vp matched vp(Numb) in a rule, then the appearance of vp@foo in the template would result in vp(foo,Numb) [...] introduced by the modifier, can be useful when dealing with the missing components of slash or derived categories (Gazdar, 1981).
Internal modification allows the modifier to be put directly into the argument list. If an argument is followed by @mod it will be replaced by mod. In the case where @mod appears as an argument by itself, mod is [...] v(Numb@pastpart) were contained in a template, it would IT-match v(Numb) in the pattern, and would result in the appearance of v(pastpart) in the new rule.
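The two kinds of modification described in this section can be sketched in Python. This is an assumed illustration, not the system's code: symbols are (name, argument list) pairs, and the `specified` parameter (the number of arguments that were written explicitly on the modified symbol) is a hypothetical device for deciding where an external modifier is inserted.

```python
def external_modify(symbol, mod, specified=None):
    """Insert mod into the argument list after the explicitly specified arguments."""
    name, args = symbol
    position = len(args) if specified is None else specified
    return (name, args[:position] + [mod] + args[position:])


def internal_modify(symbol, index, mod):
    """Replace the argument at the given index by mod."""
    name, args = symbol
    new_args = list(args)
    new_args[index] = mod
    return (name, new_args)


# N@junk with N instantiated to np(more)
print(external_modify(("np", ["more"]), "junk"))
# vp@foo where the pattern symbol vp (no arguments) matched vp(Numb)
print(external_modify(("vp", ["Numb"]), "foo", specified=0))
# v(Numb@pastpart)
print(internal_modify(("v", ["Numb"]), 0, "pastpart"))
```

The three calls reproduce the np(more,junk), vp(foo,Numb) and v(pastpart) cases mentioned in the text.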
4 IMPLEMENTATION
The SAUMER system is currently implemented in highly portable C-Prolog (Pereira, 1984), and runs on a Motorola 68000 based SUN Workstation supporting UNIX.4 Calls to Prolog are allowed by the system, thus providing [...] implementation. Implementations in other languages would differ externally only in the syntax of the procedure calls that may appear in each rule. Use of the system is described in detail in (Popowich, 1985).
The current implementation converts the grammar, as specified by the rules and metarules, into Prolog clauses. This conversion can be examined in terms of how rules are processed, and how the schemata and transformations are processed.
4 UNIX is a trademark of Bell Laboratories.
4.1 Rule Processing
The syntactic component of the rule processor is based [...] processor (Clocksin and Mellish, 1981) which has been [...] nonterminal is converted into a Prolog predicate, with two additional arguments, that can be processed by a top-down parser. These extra arguments correspond to the list to be parsed, and the remainder of the list after the predicate has parsed the desired category. With the addition of semantics to each rule, another argument is required to represent the semantic interpretation of the current symbol. Thus whenever a left quoted category name x' [...] variable bound to the semantic argument of the [...] expression is then evaluated by the eval routine, with the result bound to the semantic argument of the nonterminal on the left hand side of the production. For example the sentence rule:
(4.1) add s(decl) -->
          np(nom,Numb),
          vp(_,Numb) with agreement(Numb) : [ np' & vp' ]
will result in a Prolog expression of the form:
(4.2) s(SemS,decl,_1,_3) :-
          np(SemNP,nom,Numb,_1,_2), vp(SemVP,_,Numb,_2,_3),
          agreement(Numb),
          eval([SemNP & SemVP],SemS).
Consequently, to process the sentence John runs one would try to satisfy:
(4.3) :- s(Sem,Type,['John',runs],[]).
The first argument returns the interpretation, the second argument returns the type of sentence, the third is the initial input list, and the final argument corresponds to the list remaining after finding a sentence. Any rule R that is preceded by add will have the axiom rule(R) inserted into the database. These axioms are used by the transformations during pattern matching.
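The translation from (4.1) to (4.2) threads a semantic argument and two list-position arguments through each nonterminal. The Python sketch below generates the clause text for a rule; it is a simplified illustration with hypothetical names (`translate_rule`, `call`), it expects the semantic expression already rewritten in terms of the Sem variables, and it assumes no two body symbols share a name.

```python
def translate_rule(head, body, semantics, tests=()):
    """Build Prolog clause text for a rule.

    head and body symbols are (name, [args]) pairs. Each predicate receives
    a semantic variable first and two list-position variables last.
    """
    def call(symbol, sem_var, start, end):
        name, args = symbol
        all_args = [sem_var] + list(args) + [f"_{start}", f"_{end}"]
        return f"{name}({','.join(all_args)})"

    goals = []
    for i, symbol in enumerate(body, start=1):
        # The i-th body symbol consumes input between positions i and i+1.
        goals.append(call(symbol, "Sem" + symbol[0].upper(), i, i + 1))
    goals.extend(tests)  # procedural restrictions such as agreement(Numb)
    goals.append(f"eval({semantics},SemS)")
    return f"{call(head, 'SemS', 1, len(body) + 1)} :- " + ", ".join(goals) + "."


clause = translate_rule(
    ("s", ["decl"]),
    [("np", ["nom", "Numb"]), ("vp", ["_", "Numb"])],
    "[SemNP & SemVP]",
    tests=["agreement(Numb)"],
)
print(clause)
```

For the sentence rule above, the generated text has the same structure as (4.2).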
The eval routine processes the suffix symbols, # and ##, along with the lambda expressions, and may perform some reorganisation of the given expression before returning a new semantic form. For each expression of the form name#, a unique integer N is created and nameN is returned. With "##" the procedure is the same except that the first occurrence of "##" will generate a unique integer that will be saved for all subsequent occurrences. To evaluate an expression of the form:
(4.4) ( expr_i lmda expr_j & X )
every subexpression of expr_j is recursively searched for an occurrence of expr_i, which is then replaced by X.
Left recursion is removed with the aid of a gap predicate identical to the one defined to process gapping grammars (Dahl and Abramson, 1984) and unrestricted gapping grammars (Popowich, forthcoming). For any rule of the form:
(4.5) A --> A, B, α
where A does not equal B, the result of the translation is:
(4.6) A(_1,Nn) :- gap(G,_1,_2), B(_2,N0), A(G,[]),
          α1(N0,N1), ..., αn(Nn-1,Nn).
According to (4.6), a phrase is processed by skipping over a region to find a B, the first non-terminal that does not equal A. The skipped region is then examined to ensure that it corresponds to an A before the rest of the phrase is processed.
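The strategy behind (4.6) can be sketched in Python for a small left recursive grammar. Everything here is an assumption made for illustration: the toy grammar (np --> np, conj and np --> [Name], with conj being "and" followed by a name), the function names, and the choice to require a non-empty skipped region. It is not the gap predicate of the cited gapping grammars.

```python
NAMES = {"bob", "carol", "alice"}


def parse_conj(tokens):
    """B: 'and' followed by a name. Returns the remaining tokens, or None."""
    if len(tokens) >= 2 and tokens[0] == "and" and tokens[1] in NAMES:
        return tokens[2:]
    return None


def parse_np(tokens):
    """Return every possible remainder left after parsing an np prefix."""
    remainders = []
    # Non-recursive rule: np --> [Name]
    if tokens and tokens[0] in NAMES:
        remainders.append(tokens[1:])
    # Left recursive rule: np --> np, conj.
    # Skip a non-empty region, look for a conj after it, then check that
    # the skipped region is exactly an np (nothing left over).
    for split in range(1, len(tokens)):
        rest = parse_conj(tokens[split:])
        if rest is not None and [] in parse_np(tokens[:split]):
            remainders.append(rest)
    return remainders


sentence = ["bob", "and", "carol", "and", "alice"]
print([] in parse_np(sentence))
```

Because the recursive call is always made on a strictly shorter skipped region, the sketch terminates even though the grammar rule itself is left recursive; the cost is the extra search over split points, which echoes the efficiency remark in section 3.1.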
4.2 Schema Processing
To process the metarule control structures used by schemata, a fail predicate is inserted to force Prolog to try all possible alternatives. The simple recursive definition of forall X in List:
(4.7) forall(X in [], Body).
      forall(X in [Y|Rest], Body) :- (X=Y, call1(Body), fail) ; forall(X in Rest, Body).
uses fail to undo the binding of Y, the first element of the list, to X before calling forall with the remainder of the list. The predicate call1 is used to evaluate Body since it will prevent the fail predicate from causing backtracking into Body.
4.3 Transformation Processing
[...] complex processing of all of the metagrammatical operations. This processing can be divided into the three stages of transformation creation, pattern matching, and rule creation.5
During the transformation creation phase, the predicate trans(M,X,Y) is created for the metarule M. This predicate will transform a list of elements, X, into another list, Y, according to the syntax specification of the metarule. Elements that IT-match will be represented by the same free variable in both lists. This binding will be one to one since an element cannot match with more than one element on the other side. Symbols that appear on only one side will not have their free variable appearing on the opposite side. Expressions in braces are ignored during this stage. If a transformation like:
(4.8) a --> b, c, X ==> a@foo --> b, X, c(foo)
appears, then a predicate of the form:
(4.9) trans(M,[_1,_2,_3,X],[_1,_2,X,_4])
will be created. Notice that the appearance of a modifier does not cause a@foo to be distinguished from a, since all modifiers are removed before the pattern-template match is attempted. However c and c(foo) are considered to be different symbols. M is a unique integer associated with the transformation.
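The role of the trans predicate in (4.9) can be sketched in Python as a reordering of a matched list driven by shared slot names. The encoding is an assumption: slot names are strings, `make_trans` is a hypothetical helper, and `None` marks a slot that has no counterpart in the pattern and must be filled later from the template.

```python
def make_trans(pattern_slots, template_slots):
    """Build a function that maps a matched list onto the template order."""
    def trans(matched):
        bound = dict(zip(pattern_slots, matched))
        # Slots absent from the pattern stay unbound (None) at this stage.
        return [bound.get(slot) for slot in template_slots]
    return trans


# Slot lists taken from (4.9): a, b, c, X on the left; a, b, X, c(foo) on the right.
trans_4_9 = make_trans(["_1", "_2", "_3", "X"], ["_1", "_2", "X", "_4"])
print(trans_4_9(["a", "b", "c", ["d", "e"]]))
```

Here the don't care region bound to X moves ahead of the final slot, the element matched by c is dropped because its slot does not occur in the template, and the last position is left open for c(foo).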
The pattern match phase determines if a rule matches the pattern, and produces a list for each successful match which will be transformed by the trans predicate. Each element of the list is either one of the matched symbols from the rule or a list of symbols corresponding to the don't care region of the pattern. Any predicates that appear in braces in the pattern are evaluated during the pattern match.
5 (Popowich, forthcoming) examines a method of transformation processing that uses the transformations during the parse, instead of using them to generate new rules.
Consider the operation of an active-passive verb phrase transformation:
(4.10) vp(active,Numb) -->
            v(Numb,R,Type,SType)
                with (X, trans in Type, Y),
            np, Z
            : <[? np']
        ==>
        vp(pass,Numb) -->
            v(Numb,be,T,S)-1 with aux in T,
            v(Numb@pastpart,R,Type,SType)
                with (X, trans in Type, Y),
            Z, pp(by,_)
            : x## lmda [ pp(by)' & <[? x##] ]
on the following verb phrase:
(4.11) vp(active,Numb) -->
            v(Numb,R,Type,_) with trans in Type,
            np([x,A,x])
            : <[ v' & np' ]
The list produced by the pattern match would resemble:
(4.12) [ vp(active,Numb),
         v(Numb,R,Type,_) with [[], trans in Type, []],
         np([x,A,x]),
         [] ]
Notice that there was nothing in the rule to bind with X, Y or Z. Consequently these variables were assigned the null list []. The pattern match of the semantics of the rule will result in an expression which lambda abstracts np' out of the semantics:
(4.13) <[ np' lmda <[ v' & np' ] ]
The rule creation phase applies the transformation to the list produced by the pattern match, and then uses the new list and the template to obtain a new rule. This phase includes conversion of the new list back into rule form, the application of modifiers, and the addition of any extra symbols that appear on the right hand side only. To continue with our example, the trans predicate associated with (4.10) would be:
(4.14) trans(N,[_1,_2,_3,Z],[_4,_5,_2,Z,_6])
Notice that the two vp's on opposite sides of the metarule do not match. So the transformed list would resemble:
(4.15) [ _4,
         _5,
         v(Numb,R,Type,_) with [[], trans in Type, []],
         [],
         _6 ]
The rule generated by the rule creation phase would be:
(4.16) vp(pass,Numb) -->
            v(Numb,be,T,S)-1 with aux in T,
            v(pastpart,R,Type,_) with trans in Type,
            pp(by,_)
            : x## lmda [ pp(by)' & <[ v' & x## ] ]
Notice that the expression "<[ v' & x## ]", which is contained in the semantics of (4.16), was obtained by the application of (4.13) to x##.
5 APPLICATIONS
To examine the usefulness of this type of grammar implementation, a grammar was developed that uses the [...] (Cercone et al., 1984). The AAA is an interactive information system under development at Simon Fraser University. It is intended to act as an aid in "curriculum planning and management", that accepts natural language queries and generates the appropriate responses. Routines [...] retrieving lexical information were also provided [...] permits some possessive forms, and allows auxiliaries to appear in the sentences. From the base of twenty six rules, eighty additional rules were produced by three metarules in about eighty-five seconds. Ten more rules were needed to link the lexicon and the grammar. A selection of the rules and metarules appears in Figure 5-1 [...] (Popowich, 1985).
In the interpretations of some sample sentences, which can be found in Figure 5-2, some liberties are taken with the semantic notation. Variables of the form wN, where N is any integer, represent entities that are to be instantiated from some database. Thus any interpretation containing wN will be a question. Possessives like John's table are represented as:
(5.1) <table & [John poss table]>
Although multiple possessives which associate from left to right are allowed, group possessives as seen in:
(5.2) the man who passed the course's book
and in phrases like:
(5.3) John's driver's licence
[...] Inverted sentences are preceded by the word Query in the output. Also proper nouns are assumed to unambiguously refer to some object, and thus are no longer followed by [...] interpretation are given in CPU seconds. The total time includes the time spent looking for all other possible parses.
Results obtained with SAUMER compare favourably to those obtained from the ProGram system (Evans and Gazdar, 1984). ProGram operates on grammars defined according to the current GPSG formalism (Gazdar and Pullum, 1982) but was not developed with efficiency as a major consideration. The grammar used with ProGram, which is given in (Popowich, 1985), is similar to the AAA
add vp(active,Numb) --> v(Numb,Root,T,_) with (Root in [pass,give,teach,offer], indobj in T, trans in T),
      np([x,D,x]), np([x,*,x])-1 : <[v' & np' & np-1']

/* WH questions in inverted sentences */
eval(y~,Var), NP = np(Case,Numb,Feat)
, ( NP@NP --> [] {agreement(Case)} : Var )
, ( s(inv) --> np([x,A,x],Numb,Feat) with Clword in Feat, s(inv)@np([x,A,x],Numb,Feat)
      : <[(Var lmda s') & np'] )

/* passive transformation */
add vp(active,Numb) --> v(Numb,R,Type,Subtype) with (X, trans in Type, Y), np, Z : <[? np']
==> vp(pass,Numb) --> v(Numb,be,T,S) with aux in T,
      v(Numi:gpaetpart,R,Type,Subtype) with (X, trans in Type, Y),
      Z, optional(pp(by,_)) : x## lmda [optional' & <[? x##]]

/* sentence inversion */
add vp(T,[M|N]) --> v([M|N],R,Type,S) with (X, aux in Type, Y), Z : Sem
==> s(inv) --> v([M|N],R,Type,S) with (X, aux in Type, Y), np([N1,x,x],[M|N],_), Z : [np' & Sem]

/* metarule for the propagation of "holes" in the "slash" categories */
forall Hole in [pp(Prep,Feat), np(Case,Numb,Feat)]
, ( forall Cat1 in [s(Type), vp, pp(Prep,Feat), optional]
  , ( forall Cat2 in [vp, pp(Prep,Feat), np(Case,Numb,Feat), optional]
    , ( Cat1 --> X, Cat2, Y : Sem ==> Cat1@Hole --> X, Cat2@Hole, Y : Sem ) ) )
Figure 5-1: Excerpt from Grammar
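As an illustration only, the effect of the hole-propagation metarule at the end of Figure 5-1 can be sketched in Python rather than SSL. The sketch assumes that rules are plain (left-hand side, right-hand side) pairs, drops all features and semantic components, and writes a category Cat that carries a hole as the string Cat@Hole. The names propagate_holes, HOLES, CAT1 and CAT2 are hypothetical and are not part of SAUMER.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]   # (left-hand side, right-hand side)

# Simplified stand-ins for the three category lists of the metarule.
HOLES = ["pp", "np"]
CAT1 = ["s", "vp", "pp", "optional"]
CAT2 = ["vp", "pp", "np", "optional"]

def propagate_holes(rules: List[Rule]) -> List[Rule]:
    """For every rule Cat1 --> X, Cat2, Y build Cat1@Hole --> X, Cat2@Hole, Y."""
    new_rules: List[Rule] = []
    for lhs, rhs in rules:
        if lhs not in CAT1:
            continue
        for i, cat in enumerate(rhs):
            if cat not in CAT2:
                continue
            for hole in HOLES:
                # The hole is passed from the parent to exactly one daughter.
                new = (f"{lhs}@{hole}",
                       rhs[:i] + (f"{cat}@{hole}",) + rhs[i + 1:])
                if new not in new_rules:
                    new_rules.append(new)
    return new_rules

BASE_RULES: List[Rule] = [("s", ("np", "vp")), ("vp", ("v", "np"))]
SLASHED_RULES = propagate_holes(BASE_RULES)
```

Because the three category lists are finite, a single pass produces a finite set of new rules, which is what allows such a metarule to be expanded before any parsing is attempted.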
Sentence: did Fred take cmpt101
Query:    [Fred take5 cmpt101]
Analysis: 2.25 sec   Total: 4.28334 sec

Sentence:  who wants to teach Fred's professor's course
Semantics: [ <w1 & [w1 animate]>
             want4 [ <w1 & [w1 animate]>
                     teach13
                     <course14 & [<professor15 & [Fred poss professor15]> poss course14]>
                   ] ]
Analysis: 6.58337 sec   Total: 18.9834 sec

Sentence: whose course does the student whom John likes want to be taking
Query:    [ <<the38 student39> & [John like45 <the38 student39>]>
            want46 [ <<the38 student39> & [John like45 <the38 student39>]>
                     take56
                     <course29 & [<w30 & [w30 animate]> poss course29]>
                   ] ]
Analysis: 21.9999 sec   Total: 39.4 sec

Sentence: to whom does the professor want which paper to be given
Query:    [ <the14 professor15>
            want17 [ x39 give35 <w7 & [w7 animate]> <w21 & [w21 paper22]> ]
          ]
Analysis: 14.3167 sec   Total: 29.5167 sec
Figure 5-2: Summary of Test Results
grammar used by SAUMER except that it has a much
smaller lexicon, and allows neither relative clauses nor
SAUMER. ProGram required about 35 seconds to parse the
sentence does John take cmpt101, with a total processing
time of about 140 seconds. SAUMER required just over 2
seconds to parse this phrase, and had a total processing
time of about 4 seconds.
As it stands, the semantic notation used by SAUMER
does not contain much of the relevant information that
would be required by a real system. Tense, number and
adverbial information, including concepts like location and
time, would be required in the AAA. If the SSL
description were to be extended, with the resulting system
behaving as a natural language interface of the AAA, a
more database directed semantic notation would prove
invaluable.
6 PRESENT LIMITATIONS
Although this application of metarules allows succinct
descriptions of a grammar, several problems have been
observed.
Since each metarule is applied to the rule base only
once, the order of the metarules is very important. In
our sample grammar, the passive verb phrases were
generated before the sentence inversion transformation was
transformations were executed. For the current
implementation, if a rule generated by transformation T1
is to be subjected to transformation T2, then T1 must
appear before T2. Moreover, no rule that is the result of
T2 can be operated on by T1. It would be preferable to
remove this restriction and impose one that is less severe,
such as the finite closure restriction which is described in
(Thompson 1982) and used by ProGram. With this
improvement, the only restriction would be that a
derivation of a rule
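The difference between applying each metarule once in a fixed order and applying metarules under finite closure can be sketched in Python. This is a minimal illustration under stated assumptions, not the SAUMER implementation: rules are reduced to (left-hand side, right-hand side) pairs with invented category names such as v_be and optional_pp_by, the two toy metarules passive and inversion only loosely mirror the transformations of the sample grammar, and finite closure is taken to mean that a metarule is used at most once in the derivation of any one rule.

```python
from typing import Callable, List, Optional, Tuple

Rule = Tuple[str, Tuple[str, ...]]            # (left-hand side, right-hand side)
Metarule = Callable[[Rule], Optional[Rule]]   # returns a new rule, or None

def passive(rule: Rule) -> Optional[Rule]:
    """Toy passive: vp(active) --> v, np, Z  ==>  vp(pass) --> v_be, v_pastpart, Z, optional_pp_by."""
    lhs, rhs = rule
    if lhs == "vp(active)" and rhs[:2] == ("v", "np"):
        return ("vp(pass)", ("v_be", "v_pastpart") + rhs[2:] + ("optional_pp_by",))
    return None

def inversion(rule: Rule) -> Optional[Rule]:
    """Toy inversion: vp(T) --> Aux, Z  ==>  s(inv) --> Aux, np, Z."""
    lhs, rhs = rule
    if lhs.startswith("vp(") and rhs and rhs[0] in ("v_aux", "v_be"):
        return ("s(inv)", (rhs[0], "np") + rhs[1:])
    return None

def apply_once_in_order(rules: List[Rule], metarules: List[Metarule]) -> List[Rule]:
    """Each metarule sees the rule base exactly once, in the order given."""
    base = list(rules)
    for meta in metarules:
        for rule in list(base):               # snapshot: output of earlier metarules only
            new = meta(rule)
            if new is not None and new not in base:
                base.append(new)
    return base

def apply_finite_closure(rules: List[Rule], metarules: List[Metarule]) -> List[Rule]:
    """Any order is allowed, but no metarule is used twice in one derivation."""
    agenda = [(rule, frozenset()) for rule in rules]
    seen = set(agenda)
    result = list(rules)
    while agenda:
        rule, used = agenda.pop(0)
        for index, meta in enumerate(metarules):
            if index in used:
                continue
            new = meta(rule)
            if new is None:
                continue
            entry = (new, used | {index})
            if entry not in seen:
                seen.add(entry)
                agenda.append(entry)
                if new not in result:
                    result.append(new)
    return result

BASE_RULES: List[Rule] = [("vp(active)", ("v", "np")),
                          ("vp(active)", ("v_aux", "vp"))]
INVERTED_PASSIVE: Rule = ("s(inv)", ("v_be", "np", "v_pastpart", "optional_pp_by"))
```

With these toy rules, apply_once_in_order(BASE_RULES, [passive, inversion]) yields the inverted passive rule, while the order [inversion, passive] does not, since the passive verb phrase is created only after inversion has already run. apply_finite_closure produces the same rule set for either ordering.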
The system cannot currently process rules expressed
in the Immediate Dominance/Linear Precedence (ID/LP)
format (Gazdar and Pullum 1982). With this format, a
production rule is expressed with an unordered right hand
declaration of linear precedence. For example, a passive
verb phrase rule could appear something like:
(6.1) vp(pass,[M|N]) -->
        v([M|N],be),
        v(_,Root,Type,_) with
          (Root in [pass,carry,give],
           indobj in Type,
           trans in Type),
        pp(to), optional(pp(by)) : x## lmda [optional' & <[v' & pp(to)' & x##]]
with the components having a linear precedence of:
(6.2) v(_,be) < v < pp
The result would be that the pp(by) could appear before
or after the pp(to), since there is no restriction on their relative positions. If this format were implemented, only one passive metarule would have to be explicitly stated. The direct processing of ID/LP grammars is discussed in (Shieber 1982), (Evans and Gazdar 1984) and (Popowich forthcoming).
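One simple way to see what (6.1) and (6.2) license is to expand the unordered right-hand side into every ordering that respects the precedence declaration. The Python sketch below does this by brute force. It is an illustration under assumptions, not a description of how ID/LP rules are parsed directly in the cited work: categories are flattened to hypothetical labels (v_be, v, pp_to, pp_by), the optional pp(by) is treated as present, and a precedence pattern such as v is assumed to cover every label that equals it or begins with it followed by an underscore.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

def matches(item: str, pattern: str) -> bool:
    """A pattern covers its own label and any label refined from it (v covers v_be)."""
    return item == pattern or item.startswith(pattern + "_")

def respects_lp(order: Sequence[str], lp: List[Tuple[str, str]]) -> bool:
    """True if no item matching the later category precedes one matching the earlier."""
    for before, after in lp:
        for i, first in enumerate(order):
            for second in order[i + 1:]:
                if matches(first, after) and matches(second, before):
                    return False
    return True

def expand_id_rule(lhs: str, rhs: Sequence[str],
                   lp: List[Tuple[str, str]]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Turn one unordered (ID) rule into all ordered rules allowed by the LP pairs."""
    return [(lhs, order) for order in permutations(rhs) if respects_lp(order, lp)]

# (6.2) v(_,be) < v < pp, written as two pairwise constraints.
LP_PAIRS = [("v_be", "v"), ("v", "pp")]
PASSIVE_ORDERS = expand_id_rule("vp(pass)", ["v_be", "v", "pp_to", "pp_by"], LP_PAIRS)
```

Only the two orderings that differ in the relative position of pp_to and pp_by survive, which is the behaviour described above: a single ID rule stands in for both ordered passive rules.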
7 CONCLUSIONS
SSL appears to adequately capture the flavour of GPSG descriptions while allowing more procedural control. Investigation into a relationship between SSL and GPSG
grammars could result in a method for translating GPSG
grammars into SSL for execution by SAUMER. Further research could also provide a relationship between SSL and other grammar formalisms, such as lexical-functional
implementation of SAUMER, allowing left recursion in rules, should facilitate a more detailed study of the specification language, and of some problems associated
with metarule specifications. Due to the easy separability
of the semantic rules, one could attempt to introduce a more database oriented semantic notation and develop an interface to a real database. One could then examine system behaviour with a larger rule base and more involved transformations in an applications environment like that of the AAA. However, as is apparent from the application presented here and from preliminary
further investigation of the efficient operation of this Prolog implementation with large grammars will be required.
ACKNOWLEDGEMENTS
I would like to thank Nick Cercone for reading an earlier version of this paper and providing some useful suggestions. The comments of the referees were also helpful. Facilities for this research were provided by the Laboratory for Computer and Communications Research. This work was supported by the Natural Sciences and Engineering Research Council of Canada under Operating
Postgraduate Scholarship #800.
REFERENCES
Cercone N., Hadley R., Martin F., McFetridge P. and Strzalkowski T. Designing and automating the quality assessment of a knowledge-based system: the initial automated academic advisor experience, pages 193-205. IEEE Principles of Knowledge-Based Systems Proceedings, Denver, Colorado, 1984.
Clocksin W.F. and Mellish C.S. Programming in Prolog. Berlin-Heidelberg-New York: Springer-Verlag, 1981.
Dahl V. and Abramson H. On Gapping Grammars. Proceedings of the Second International Joint Conference on Logic, University of Uppsala, Sweden, 1984.
Evans R. and Gazdar G. The ProGram Manual. Cognitive Science Programme, University of Sussex, 1984.
Fawcett B. Personal communication. Dept. of Computing Science, University of Toronto, 1984.
Gawron J.M. et al. Processing English with a Generalized Phrase Structure Grammar, pages 74-81. Proceedings of the 20th Annual Meeting of the Association for Computational Linguistics, June 1982.
Gazdar G. Phrase Structure Grammar. In P. Jacobson and G.K. Pullum (Ed.), The Nature of Syntactic Representation. D. Reidel, Dordrecht, 1981.
Gazdar G. and Pullum G.K. Generalized Phrase Structure Grammar: A Theoretical Synopsis. Technical Report, Indiana University Linguistics Club, Bloomington, Indiana, August 1982.
Kaplan R. and Bresnan J. Lexical-Functional Grammar: A Formal System for Grammatical Representation. In J. Bresnan (Ed.), Mental Representation of Grammatical Relations. MIT Press, 1982.
Pereira F.C.N. (ed). C-Prolog User's Manual. Technical Report, SRI International, Menlo Park, California, 1984.
Pereira F.C.N. and Warren D.H.D. Definite Clause Grammars for Language Analysis. Artificial Intelligence, 1980, 13, 231-278.
Popowich F. SAUMER: Sentence Analysis Using METaRules (Preliminary Report). Technical Report TR-84-10 and LCCR TR-84-2, Department of Computing Science, Simon Fraser University, August 1984.
Popowich F. The SAUMER User's Manual. Technical Report TR-85-3 and LCCR TR-85-4, Department of Computing Science, Simon Fraser University, 1985.
Popowich F. Effective Implementation and Application of Unrestricted Gapping Grammars. Master's thesis, Department of Computing Science, Simon Fraser University, forthcoming.
Radford A. Transformational Syntax. Cambridge University Press, 1981.
Schubert L.K. and Pelletier F.J. From English to Logic: Context-Free Computation of "Conventional" Logical Translation. American Journal of Computational Linguistics, January-March 1982, 8(1), 26-44.
Shieber S.M. Direct Parsing of ID/LP Grammars. Draft, 1982.
Thompson H. Handling Metarules in a Parser for GPSG. Technical Report D.A.I. No. 175, Department of Artificial Intelligence, University of Edinburgh, 1982.