IN AN ANNOTATED PHRASE-STRUCTURE GRAMMAR
Kurt Konolige
SRI International*

1 Introduction

Computational models employed by current natural language understanding systems rely on phrase-structure representations of syntax. Whether implemented as augmented transition nets, BNF grammars, annotated phrase-structure grammars, or similar methods, a phrase-structure representation makes the parsing problem computationally tractable [7]. However, phrase-structure representations have been open to the criticism that they do not capture linguistic generalizations that are easily expressed in transformational grammars.

This paper describes a formalism for specifying syntactic and semantic generalizations across the rules of a phrase-structure grammar (PSG). The formalism consists of two parts:
   1. A declarative description of basic syntactic phrase-structures and their associated semantic translation.
   2. A set of metarules for deriving additional grammar rules from the basic set.
Since metarules operate on grammar rules rather than phrase markers, the transformational effect of metarules can be pre-computed before the grammar is used to analyze input. The computational efficiency of a phrase-structure grammar is thus preserved.

Metarule formulations for PSGs have recently received increased attention in the linguistics literature, especially in [4], which greatly influenced the formalism presented in this paper. Our formalism differs significantly from [4] in that the metarules work on a phrase-structure grammar annotated with arbitrary feature sets (Annotated Phrase-structure Grammar, or APSG [7]). Grammars for a large subset of English have been written using this formalism [9], and its computational viability has been demonstrated [6]. Because of the increased structural complexity of APSGs over PSGs without annotations, new techniques for applying metarules to these structures are developed in this paper, and the notion of a match between a metarule and a grammar rule is carefully defined. The formalism has been implemented as a computer program and preliminary tests have been made to establish its validity and effectiveness.

*This research was supported by the Defense Advanced Research Projects Agency under Contract N00039-79-C-0118 with the Naval Electronics Systems Command. The views and conclusions contained in this document are those of the author and should not be interpreted as representative of the official policies, either expressed or implied, of the U.S. Government. The author is grateful to Jane Robinson and Gary Hendrix for comments on an earlier draft of this paper.

2 Metarules

Metarules are used to capture linguistic generalizations that are not readily expressed in the phrase-structure rules. Consider the two sentences:
   1. John gave a book to Mary
   2. Mary was given a book by John
Although their syntactic structure is different, these two sentences have many elements in common. In particular, the predicate/argument structure they describe is the same: the gift of a book by John to Mary. Transformational grammars capture this correspondence by transforming the phrase marker for (1) into the phrase marker for (2). The underlying predicate/argument structure remains the same, but the surface realization changes. However, the recognition of transformational grammars is a very difficult computational problem.*

By contrast, metarules operate directly on the rules of a PSG to produce more rules for that grammar. As long as the number of derived rules is finite, the resulting set of rules is still a PSG. Unlike transformational grammars, PSGs have efficient algorithms for parsing [3]. In a sense, all of the work of transformations has been pushed off into a pre-processing phase where new grammar rules are derived. We are not greatly concerned with efficiency in pre-processing, because it only has to be done once.

There are still computational limitations on PSGs that must be taken into account by any metarule system. Large numbers of phrase-structure rules can seriously degrade the performance of a parser in terms of its running time,** the storage required for the rules, and the ambiguity of the resulting parses [6]. Moreover, the generation of large numbers of rules seems psychologically implausible. Thus the two criteria we will use to judge the efficacy of metarules will be: can they adequately capture linguistic generalizations, and are they computationally practicable in terms of the number of rules they generate? The formalism of [4] is especially vulnerable to criticism on the latter point, since it generates large numbers of new rules.***

*There has been some success in restricting the power of transformational grammars sufficiently to allow a recognizer to be built; see [8].
**Sheil [10] has shown that, for a simple recursive descent parsing algorithm, running time is a linear function of the number of rules. For other parsing schemes, the relationship between the number of rules and parsing time is unclear.
***This is without considering infinite schemas such as the one for conjunction reduction. Basically, the problem is that the formalism of [4] allows complex features [2] to define new categories, generating an exponential number of categories (and hence rules) with respect to the number of features.

3 Representation

An annotated phrase-structure grammar (APSG) as developed in [7] is the target representation for the metarules. The core component of an APSG is a set of context-free phrase-structure rules. As is customary, these rules are input to a context-free parser to analyze a string, producing a phrase-structure tree as output. In addition, the parse tree so produced may have arbitrary feature sets, called annotations, appended to each node. The annotations are an efficient means of incorporating additional information into the parse tree. Typically, features will exist for syntactic processing (e.g., number agreement), grammatical function of constituents (e.g., subject, direct and indirect objects), and semantic interpretation.

Associated with each rule of the grammar are procedures for operating on feature sets of the phrase markers the rule constructs. These procedures may constrain the application of the rule by testing features on candidate constituents, or add information to the structure created by the rule, based on the features of its constituents. Rule procedures are written in the programming language LISP, giving the grammar the power to recognize class 0 languages. The use of arbitrary procedures and feature set annotations makes APSGs an [...] language, similar to the earlier ATN formalisms [1]. An example of how an APSG can encode a large subset of English is the DIAGRAM grammar [9].

It is unfortunately the very power of APSGs (and ATNs) that makes it difficult to capture linguistic generalizations within these formalisms. Metarules for transforming one annotated phrase-structure rule into another must not only transform the phrase-structure, but also the procedures that operate on feature sets, in an appropriate way. Because the transformation of procedures is notoriously difficult,* one of the tasks of this paper will be to illustrate a declarative notation describing operations on feature sets that is powerful enough to encode the manipulations of features necessary for the grammar, but is still simple enough for metarules to transform.

4 Notation

Every rule of the APSG has three parts:
   1. A phrase-structure rule;
   2. A restriction set (RSET) that restricts the applicability of the rule; and
   3. An assignment set (ASET) that assigns values to features.
The RSET and ASET manipulate features of the phrase marker analyzed by the rule; they are discussed below in detail.

Phrase-structure rules are written as:
      CAT -> C1 C2 ... Cn
where CAT is the dominating category of the phrase, and C1 through Cn are its immediate constituent categories. Terminal strings can be included in the rule by enclosing them in double quote marks.
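
To make the three-part rule format concrete, here is a minimal sketch of one way such a rule could be held in a program. The class name `Rule`, the helpers `parse_phrase_structure` and `is_terminal`, and the choice to keep RSET and ASET entries as uninterpreted strings are all assumptions made for illustration; the paper's own implementation is not described at this level. The annotations on the sample rule are borrowed from the ditransitive VP examples given later in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One APSG rule: a phrase-structure part plus an RSET and an ASET."""
    lhs: str            # dominating category, e.g. "VP"
    rhs: tuple          # immediate constituent categories, left to right
    rset: tuple = ()    # restriction predications (uninterpreted strings here)
    aset: tuple = ()    # feature assignments (uninterpreted strings here)


def parse_phrase_structure(text):
    """Split 'CAT -> C1 C2 ... Cn' into (CAT, (C1, ..., Cn)).

    Simplification: constituents are separated by whitespace, so a quoted
    terminal must not itself contain spaces.
    """
    lhs, arrow, rhs = text.partition("->")
    if not arrow:
        raise ValueError(f"missing '->' in rule: {text!r}")
    return lhs.strip(), tuple(rhs.split())


def is_terminal(symbol):
    """Terminal strings are written inside double quote marks."""
    return len(symbol) >= 2 and symbol.startswith('"') and symbol.endswith('"')


lhs, rhs = parse_phrase_structure("VP -> V NP PP")
ditransitive = Rule(
    lhs,
    rhs,
    rset=("(TRANS V) = 'DI", "(PREP V) = (PREP PP)"),
    aset=("(DIROBJ VP) := (NP VP)", "(INDOBJ VP) := (NP PP)"),
)
print(ditransitive)
```

Making the rule immutable and hashable is a design choice of this sketch: it lets derived rules be stored in a set, which is convenient when metarules may produce the same rule more than once.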

A feature set is associated with each node in the parse tree that is created when a string is analyzed by the grammar. Each feature has a name (a string of uppercase alphanumeric characters) and an associated value. The values a feature can take on (the domain of the feature) are, in general, arbitrary. One of the most useful domains is the set {+, -, NIL}, where NIL is the unmarked case; this domain corresponds to the binary features used in [2]. More complicated domains can be used; for example, a CASE feature might have as its domain the set of tuples {<1 SG>, <2 SG>, <3 SG>, <1 PL>, <2 PL>, <3 PL>}. Most interesting are those features whose domain is a phrase marker. Since phrase markers are just data structures that the parser creates, they can be assigned as the value of a feature. This technique is used to pass phrase markers to various parts of the tree to reflect the grammatical and semantic structure of the input; examples will be given in later sections.

We adopt the following conventions in referring to features and their values:
   - Features are one-place functions that range over phrase markers constructed by the phrase-structure part of a grammar rule. The function is named by the feature name.
   - These functions are represented in prefix form, e.g., (CASE NP) refers to the CASE feature of the NP constituent of a phrase marker. In cases where there is more than one constituent with the same category name, they will be differentiated by a "#" suffix; for example,
         VP -> V NP#1 NP#2
     has two NP constituents.
   - A phrase marker is assumed to have its immediate constituents as features under their category name, e.g., (N NP) refers to the N constituent of the NP.
   - Feature functions may be nested, e.g., (CASE (N NP)) refers to the CASE feature of the N constituent of the NP phrase marker. For these nestings, we adopt the simpler notation (CASE N NP), which is taken to be right-associative.
   - The value NIL always implies the unmarked case. At times it will be useful to consider features that are not explicitly attached to a phrase marker as being present with value NIL.
   - A constant term will be written with a preceding single quote mark, e.g., 'SG refers to the constant token SG.

*It is sometimes hard to even understand what it is that a procedure does, since it may involve recursion, side-effects, and other complications.
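
The conventions above can be sketched as a small lookup routine. This is an illustrative reading, not the paper's code: the `PhraseMarker` class, the `feature` function, the use of Python's `None` for NIL, and the `env` dictionary that maps the category names of one rule to phrase markers are all assumptions.

```python
NIL = None  # stand-in for the paper's unmarked value


class PhraseMarker:
    """A parse-tree node with a category label, constituents, and a feature set."""

    def __init__(self, category, constituents=(), features=None):
        self.category = category
        # Constituents are reachable under their category name, e.g. "N" or "NP#1".
        self.constituents = {c.category: c for c in constituents}
        self.features = dict(features or {})

    def get(self, name):
        """Return a feature value or a constituent; anything unattached is NIL."""
        if name in self.features:
            return self.features[name]
        return self.constituents.get(name, NIL)


def feature(term, env):
    """Evaluate a right-associative term such as ("CASE", "N", "NP").

    The rightmost name is looked up in `env`; the remaining names are then
    applied from right to left, as in (CASE (N NP)).
    """
    *names, base = term
    value = env.get(base, NIL)
    for name in reversed(names):
        if not isinstance(value, PhraseMarker):
            return NIL
        value = value.get(name)
    return value


noun = PhraseMarker("N", features={"CASE": (3, "SG")})
np = PhraseMarker("NP", [PhraseMarker("DET"), noun])
env = {"NP": np}

print(feature(("CASE", "N", "NP"), env))    # CASE of the N constituent of NP
print(feature(("N", "NP"), env) is noun)    # constituents behave like features
print(feature(("GENDER", "N", "NP"), env))  # an unattached feature reads as NIL
```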

4.1 Restrictions

The RSET of a rule restricts the applicability of the rule by a predication on the features of its constituents. The phrase markers used as constituents must satisfy the predications in the RSET before they will be analyzed by the rule to create a new phrase marker. The most useful predicate is equality: a feature can take on only one particular value to be acceptable. For example, in the phrase structure rule:
      S -> NP VP
number agreement could be enforced by the predication:
      (NBR NP) = (NBR VP)
where NBR is a feature whose domain is {SG, PL}.* This would restrict the NBR feature on NP to agree with that on VP before the S phrase was constructed. The economy of the APSG encoding is seen here: only a single phrase-structure rule is required. Also, the linguistic requirement that subjects and their verbs agree in number is enforced by a single statement, rather than being implicit in separate phrase structure rules, one for singular subject-verb combinations, another for plurals.

Besides equality, there are only three additional predications: inequality (≠), set membership (∈) and set non-membership (∉). The last two are useful in dealing with non-binary domains. As discussed in the next section, tight restrictions on predications are necessary if metarules are to be successful in transforming grammar rules. Whether these four predicates are adequate in descriptive power for the grammar we contemplate remains an open empirical question; we are currently accumulating evidence for their sufficiency by rewriting DIAGRAM using just those predicates.

*How NP and VP categories could "inherit" the NBR feature from their N and V constituents is discussed in the next section.

Restriction predications for a rule are collected in the RSET of that rule. All restrictions must hold for the rule to be applicable. As an illustration, consider the subcategorization rule for ditransitive verbs with prepositional objects (e.g., "John gave a book to Mary"):
      VP -> V NP PP
      RSET: (TRANS V) = 'DI;
            (PREP V) = (PREP PP)
The first restriction selects only verbs that are marked as ditransitive; the TRANS feature comes from the lexical entry of the verb. Ditransitive verbs with prepositional arguments are always subcategorized by the particular preposition used, e.g., "give" always uses "to" for its prepositional argument. [...] given verb. The PREP feature of the verb comes from its lexical entry, and must match the preposition of the PP phrase.*
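
The four predications can be evaluated mechanically once features are looked up. The following is a minimal sketch under stated assumptions: phrase markers are plain dictionaries, constants carry the leading single quote as in the text, literal sets are Python `frozenset` values, and the names `lookup`, `PREDICATES`, and `rset_holds` are invented for this example.

```python
NIL = None  # stand-in for the unmarked value


def lookup(term, env):
    """Resolve a quoted constant ("'DI"), a literal set, or a feature path."""
    if isinstance(term, frozenset):
        return term
    if isinstance(term, str):
        if not term.startswith("'"):
            raise ValueError(f"constants must be quoted: {term!r}")
        return term[1:]
    *names, base = term
    value = env.get(base, NIL)
    for name in reversed(names):
        value = value.get(name, NIL) if isinstance(value, dict) else NIL
    return value


# The four predications allowed in an RSET.
PREDICATES = {
    "=": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def rset_holds(rset, env):
    """All restrictions must hold for the rule to be applicable."""
    return all(PREDICATES[op](lookup(lhs, env), lookup(rhs, env))
               for op, lhs, rhs in rset)


# RSET of the ditransitive rule VP -> V NP PP from the text.
DITRANSITIVE_RSET = [
    ("=", ("TRANS", "V"), "'DI"),
    ("=", ("PREP", "V"), ("PREP", "PP")),
]

gave = {"TRANS": "DI", "PREP": "to"}
print(rset_holds(DITRANSITIVE_RSET, {"V": gave, "NP": {}, "PP": {"PREP": "to"}}))
print(rset_holds(DITRANSITIVE_RSET, {"V": gave, "NP": {}, "PP": {"PREP": "for"}}))
```

Because unattached features read as NIL, two constituents that both lack a feature compare as equal in this sketch; whether that is the desired behavior for a given grammar is a separate design question.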

4.2 Assignments

A rule will normally assign features to the dominating node of the phrase marker it constructs, based on the values of the constituents' features. For example, feature inheritance takes place in this way. Assume there is a feature NBR marking the syntactic number of nouns. Then the ASET of a rule for noun phrases might be:
      NP -> DET N
      ASET: (NBR NP) := (NBR N)
This notation is somewhat non-standard; it says that the value of the NBR function on the NP phrase marker is to be the value of the NBR function of the N phrase marker.

An interesting application of feature assignment is to describe the grammatical functions of noun phrases within a clause. Recall that the domain of features can be constituents themselves. Adding an ASET describing the grammatical function of its constituents to the ditransitive VP rule yields the following:
      VP -> V NP PP
      ASET: (DIROBJ VP) := (NP VP);
            (INDOBJ VP) := (NP PP)
This ASET assigns the DIROBJ (direct object) feature of VP the value of the constituent NP. Similarly, the value of INDOBJ (indirect object) is the NP constituent of the PP phrase.

A rule may also assign feature values to the constituents of the phrase marker it constructs. Such assignments are context sensitive, because the values are based on the context in which the constituent occurs.** Again, the most interesting use of this technique is in assigning functional roles to constituents in particular phrases. Consider a rule for main clauses:
      S -> NP VP
      ASET: (SUBJ VP) := (NP S)
The three features SUBJ, DIROBJ, and INDOBJ of the VP phrase marker will have as value the appropriate NP phrase markers, since the DIROBJ and INDOBJ features will be assigned to the VP phrase marker when it is constructed. Thus the grammatical function of the NPs has been identified by assigning features appropriately.
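
The assignments above can be traced with a small sketch that builds the phrase markers for "John gave a book to Mary" bottom-up. Everything here is an assumption made for illustration: phrase markers are dictionaries, the helper names `resolve`, `apply_aset`, and `build` are invented, and assignments to constituents are done by mutating them immediately rather than in the separate second pass that the original APSG parser uses.

```python
def resolve(term, env):
    """Evaluate a right-associative feature path such as ("NP", "PP")."""
    *names, base = term
    value = env[base]
    for name in reversed(names):
        value = value.get(name) if isinstance(value, dict) else None
    return value


def apply_aset(aset, env):
    """Run each assignment ((FEATURE, TARGET), SOURCE) in order."""
    for (feat, target), source in aset:
        env[target][feat] = resolve(source, env)


def build(category, constituents, aset=()):
    """Create a phrase marker, attach its constituents, then apply the ASET."""
    node = {"CAT": category}
    env = {category: node}
    for c in constituents:
        node[c["CAT"]] = c   # constituents are features under their category name
        env[c["CAT"]] = c
    apply_aset(aset, env)
    return node


john = {"CAT": "NP", "WORD": "John"}
book = {"CAT": "NP", "WORD": "a book"}
mary = {"CAT": "NP", "WORD": "Mary"}

pp = build("PP", [{"CAT": "P", "WORD": "to"}, mary])
vp = build("VP", [{"CAT": "V", "WORD": "gave"}, book, pp],
           aset=[(("DIROBJ", "VP"), ("NP", "VP")),
                 (("INDOBJ", "VP"), ("NP", "PP"))])
s = build("S", [john, vp], aset=[(("SUBJ", "VP"), ("NP", "S"))])

# The VP now carries all three grammatical functions.
print(vp["SUBJ"]["WORD"], vp["DIROBJ"]["WORD"], vp["INDOBJ"]["WORD"])
```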

Finally, note that the grammatical functions were assigned to the VP phrase marker. By assembling all of the arguments at this level, it is possible to account for bounded deletion phenomena that are lexically controlled. Consider subcategorization for Equi verbs, in which the subject of the main clause has been deleted from the infinitive complement ("John wants to go"):
      VP -> V INF
      ASET: (SUBJ INF) := (SUBJ VP)
Here the subject NP of the main clause has been passed down to the VP (by the S rule), which in turn passes it to the infinitive as its subject. Not all linguistic phenomena can be formulated so easily with APSGs; in particular, APSGs have trouble describing unbounded deletion and conjunction reduction. Metarule formulations for the latter phenomena have been proposed in [5], and we will not deal with them here.

*Note that we are not considering here prepositional phrases that are essentially meta-arguments to the verb, dealing with time, place, and the like. The prepositions used for meta-arguments are much more variable, and usually depend on semantic considerations.
**The assignment of features to constituents presents some computational problems, since a context-free parser will no longer be sufficient to analyze strings. This was recognized in the original version of APSGs [7], and a two-pass parser was constructed that first uses the context-free component of the grammar to produce an initial parse tree, then adds the assignment of features in context.

5 Metarules for APSGs

Metarules consist of two parts: a match template with variables whose purpose is to match existing grammar rules; and an instantiation template that produces a new grammar rule by using the match template's variable bindings after a successful match. Initially, a basic set of grammar rules is input; metarules derive new rules, which then can recursively be used as input to the metarules. When (if) the process halts, the new set of rules, together with the basic rules, comprises the grammar.
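
The derivation process just described is a closure computation, which can be sketched as a worklist loop. This is a sketch under assumptions, not the paper's program: a metarule is modeled as a Python function that takes a rule and yields zero or more derived rules, rules are hashable `(lhs, rhs)` tuples, and the `max_rules` guard and the toy metarule `drop_final_pp` are invented here to show halting and non-halting behavior.

```python
def derive_grammar(basic_rules, metarules, max_rules=10_000):
    """Apply metarules to basic and derived rules until nothing new appears.

    The process is not guaranteed to halt, so `max_rules` aborts runaway
    derivations instead of looping forever.
    """
    grammar = set(basic_rules)
    agenda = list(grammar)
    while agenda:
        rule = agenda.pop()
        for metarule in metarules:
            for new_rule in metarule(rule):
                if new_rule in grammar:
                    continue
                if len(grammar) >= max_rules:
                    raise RuntimeError("derivation did not halt within max_rules")
                grammar.add(new_rule)
                agenda.append(new_rule)  # derived rules are fed back as input
    return grammar


def drop_final_pp(rule):
    """Toy metarule (hypothetical): derive a VP rule without its final PP."""
    lhs, rhs = rule
    if lhs == "VP" and rhs and rhs[-1] == "PP":
        yield (lhs, rhs[:-1])


basic = {("VP", ("V", "NP", "PP", "PP"))}
for rule in sorted(derive_grammar(basic, [drop_final_pp])):
    print(rule)
```

A metarule that always lengthens its input, such as one that appends another PP, never reaches a fixed point; in this sketch the guard turns that case into an error.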

We will use the following notation for metarules:
      MF => IF
      CSET: C1, C2, ... Cn
where MF is a matching form, IF is an instantiation form, and CSET is a set of predications. Both the MF and IF have the same form as grammar rules, but in addition, they can contain variables. When an MF is matched against a grammar rule, these variables are bound to different parts of the rule if the match succeeds. The IF is instantiated with these bindings to produce a new rule. To restrict the application of metarules, additional conditions on the variable bindings may be specified (CSET); these have the same form as the RSET of grammar rules, but they can mention the variables matched by the MF.

Metarules may be classified into three types:
   1. Introductory metarules, where the MF is empty (=> IF). These metarules introduce a class of grammar rules.
   2. Deletion metarules, where the IF is empty (MF =>). These delete any derived grammar rules that they match.
   3. Derivation metarules, where both MF and IF are present. These derive new grammar rules from old ones.
There are linguistic generalizations that can be captured most perspicuously by each of the three forms. We will focus on derivation metarules here, since they are the most complicated.

6 Matching

An important part of the derivation process is the definition of a match between a metarule matching form and a grammar rule. The matching problem is complicated by the presence of RSET and ASET predications in the grammar rules. Thus, it is helpful to define a match in terms of the phrase markers that will be admitted by the grammar rule and the MF. We will say that an MF matches a grammar rule just in case it admits at least those phrase markers admitted by the grammar rule. This definition of a match is sufficient to allow the formulation of matching algorithms for grammar rules complicated by annotations.

We divide the matching process into two parts: matching phrase-structures, and matching feature sets. Both parts must succeed in order for the match to succeed.

6.1 Matching Phrase-structures

For phrase-structures, the definition of a match can be replaced by a direct comparison of the phrase-structures of the MF and grammar rule. Variables in the MF phrase-structure are used to indicate "don't care" parts of the grammar rule phrase-structure, while constants must match exactly. Single lower case letters are used for variables that must match single categories of the grammar rule. A typical MF might be:
      S -> a VP
which matches
      S -> NP VP      with a=NP;
      S -> SB VP      with a=SB;
      S -> "IT" VP    with a="IT";
      etc.
A variable that appears more than once in an MF must have the same binding for each occurrence for a match to be successful, e.g.,
      VP -> V a a
matches
      VP -> V NP NP   with a=NP
but not
      VP -> V NP PP

Single letter variables must match a single category in a grammar rule. Double letter variables are used to match a number of consecutive categories (including none) in the rule. We have:
      VP -> V uu
matching
      VP -> V         with uu=();
      VP -> V NP      with uu=(NP);
      VP -> V NP PP   with uu=(NP PP);
      etc.
Note that double letter variables are bound to an ordered list of elements from the matched rule. Because of this characteristic, an MF with more than one double letter variable may match a rule in several different ways:
      VP -> V uu vv
matches
      VP -> V NP PP   with uu=(), vv=(NP PP);
                           uu=(NP), vv=(PP);
                           uu=(NP PP), vv=()
All of these are considered to be valid, independent matches. Double and single letter variables may be intermixed freely in an MF.
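
The behavior of single and double letter variables can be reproduced with a short backtracking matcher. This is a sketch under assumptions rather than the paper's algorithm: rules are `(lhs, rhs)` pairs of strings and tuples, the left-hand side is compared as a constant, and the function names `is_single_var`, `is_double_var`, `match_rhs`, and `match_rule` are invented for this example. Feature sets and string variables are ignored here.

```python
def is_single_var(sym):
    """A single lower case letter, e.g. "a"."""
    return len(sym) == 1 and sym.islower()


def is_double_var(sym):
    """A doubled lower case letter, e.g. "uu"."""
    return len(sym) == 2 and sym[0] == sym[1] and sym.islower()


def match_rhs(pattern, categories, bindings=None):
    """Yield every binding under which `pattern` matches `categories`."""
    bindings = {} if bindings is None else bindings
    if not pattern:
        if not categories:
            yield dict(bindings)
        return
    head, rest = pattern[0], pattern[1:]
    if is_double_var(head):
        if head in bindings:
            # A repeated variable must match the same list it matched before.
            seq = bindings[head]
            if tuple(categories[:len(seq)]) == seq:
                yield from match_rhs(rest, categories[len(seq):], bindings)
            return
        # Try every split point, including the empty list.
        for k in range(len(categories) + 1):
            extended = {**bindings, head: tuple(categories[:k])}
            yield from match_rhs(rest, categories[k:], extended)
    elif is_single_var(head):
        if categories and bindings.get(head, categories[0]) == categories[0]:
            extended = {**bindings, head: categories[0]}
            yield from match_rhs(rest, categories[1:], extended)
    elif categories and head == categories[0]:
        # Constants must match exactly.
        yield from match_rhs(rest, categories[1:], bindings)


def match_rule(mf, rule):
    """Match the phrase-structure part of a matching form against a rule."""
    (mf_lhs, mf_rhs), (lhs, rhs) = mf, rule
    if mf_lhs == lhs:
        yield from match_rhs(tuple(mf_rhs), tuple(rhs))


for b in match_rule(("VP", ("V", "uu", "vv")), ("VP", ("V", "NP", "PP"))):
    print(b)
```

Returning a generator of bindings, rather than the first binding found, reflects the statement in the text that each way of matching is a valid, independent match.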

While double letter variables match multiple categories in a phrase structure rule, string variables match parts of a category. String variables occur in both double and single letter varieties; as expected, the former match any number of consecutive characters, while the latter match single characters. String variables are assumed when an MF category contains a mixture of upper and lower case characters, e.g.:
      VP -> V NP#a NPuu
matches
      VP -> V NP#1 NP     with a=1, uu=();
      VP -> V NP#1 NP#2   with a=1, uu=(# 2);
String variables are most useful for matching category names that may use the # convention.
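
One simple way to realize string variables is to compile each mixed-case MF category into a regular expression. The sketch below is an assumption about how this could be done, not the paper's method; the function name `category_pattern` is invented, bindings come back as plain strings rather than character lists, and a variable that occurs twice within one category is not supported.

```python
import re


def category_pattern(mf_category):
    """Compile an MF category such as "NP#a" or "NPuu" into a regex.

    Upper case letters and other characters are literals; a single lower
    case letter matches exactly one character; a doubled lower case letter
    matches any number of characters, including none.
    """
    parts = []
    i = 0
    while i < len(mf_category):
        ch = mf_category[i]
        if ch.islower():
            if i + 1 < len(mf_category) and mf_category[i + 1] == ch:
                parts.append(f"(?P<{ch * 2}>.*)")   # double letter string variable
                i += 2
            else:
                parts.append(f"(?P<{ch}>.)")        # single letter string variable
                i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def match_category(mf_category, category):
    """Return the string-variable bindings, or None if there is no match."""
    m = category_pattern(mf_category).fullmatch(category)
    return m.groupdict() if m else None


print(match_category("NP#a", "NP#1"))
print(match_category("NPuu", "NP"))
print(match_category("NPuu", "NP#2"))
print(match_category("NP#a", "NP"))
```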
6.2 F e a t u r e M a t c h i n g
So f a r v a r i a b l e s have m a t c h e d only the p h r a s e - s t r u c t u r e
p a r t o f g r a m m a r rules, and n o t the f e a t u r e a n n o t a t i o n s F o r
f e a t u r e m a t c h i n g , we must r e t u r n to the o r i g i n a l d e f i n i t i o n o f
m a t c h i n g based on the a d m i s s i b i l i t y o f phrase m a r k e r s The RSET o f a g r a m m a r r u l e is a closed f o r m u l a i n v o l v l n g the
f e a t u r e sees o f the phrase m a r k e r c o n s t r u c t e d by t h e r u l e ; l e t
P stand f o r this f o r m u l a I f P is t r u e f o r a given phrase
m a r k e r , then t h a t phrase m a r k e r is a c c e p t e d by t h e r u l e ; i f
n o t , It ts r e j e c t e d S i m i l a r l y , the RSET o f a m a t c h i n g f o r m is
an open f o r m u l a on the f e a t u r e sets o f the phrase m a r k e r ; l e t
R ( x l , x 2 X n ) stand f o r this f o r m u l a , w h e r e the x I are the
v a r i a b l e s o f the RSET For the MF;s r e s t r i c t i o n s to m a t c h those o f the g r a m m a r r u l e , we must be able to p r o v e the
f o r m u l a :
P => t e a 1 ) ( E X 2 ) _ ( E X n ) R ( x l , x 2 , - X n )
That is, whenever P admits a phrase marker, there exists some
binding for R's free variables that also admits the phrase
marker.
Now the importance of restricting the form of P and R can
be seen. Proving that the above implication holds for general
P and R can be a hard problem, requiring, for example, a
resolution theorem prover. By restricting P and R to simple
conjunctions of equalities, inequalities, and set membership
predicates, the match between P and R can be performed by a
simple and efficient algorithm.
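The paper does not spell out this algorithm, so the following is only one plausible sketch of such a match. It assumes that each conjunct is a triple (operator, left term, right term), that feature paths are tuples such as ("FIN", "VP"), and that MF variables are strings beginning with "?"; all of these encodings and the names `unify_term`, `match_conjunct`, and `match_rsets` are hypothetical. The check is purely syntactic: it succeeds when every conjunct of R is an instance of some conjunct of P under one consistent binding, which is sufficient for the implication but ignores consequences such as transitivity of equality.

```python
def unify_term(pattern, term, bindings):
    """Match one MF term against one rule term; return new bindings or None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        extended = dict(bindings)
        extended[pattern] = term
        return extended
    return bindings if pattern == term else None


def match_conjunct(r, p, bindings):
    """Yield every binding under which conjunct r is an instance of p."""
    r_op, r_lhs, r_rhs = r
    p_op, p_lhs, p_rhs = p
    if r_op != p_op:
        return
    orders = [(p_lhs, p_rhs)]
    if r_op in ("=", "!="):           # these predicates are symmetric
        orders.append((p_rhs, p_lhs))
    for lhs, rhs in orders:
        b = unify_term(r_lhs, lhs, bindings)
        if b is not None:
            b = unify_term(r_rhs, rhs, b)
        if b is not None:
            yield b


def match_rsets(R, P, bindings=None):
    """Find bindings that make every conjunct of R an instance of one in P."""
    bindings = {} if bindings is None else bindings
    if not R:
        return bindings
    first, rest = R[0], R[1:]
    for p in P:
        for b in match_conjunct(first, p, bindings):
            result = match_rsets(rest, P, b)
            if result is not None:
                return result
    return None
```

For instance, with P encoding (NBR NP) = (NBR AUX) and (FIN VP) = +, an MF restriction (FIN VP) = ?x matches with ?x bound to +, while (FIN VP) = - does not match.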
6.3 Instantiation
When a metarule matches a grammar rule, the CSET of the
metarule is evaluated to see if the metarule can indeed be
applied. For example, the MF:
VP -> "BE" xP   CSET: x ≠ 'V
will match any rule for which x is not bound to V.
When an MF matches a rule, and the CSET is satisfied, the
instantiation form of the metarule is used to produce a new
rule. The variables of the IF are instantiated with their values
from the match, producing a new rule. In addition, restriction
and assignment features that do not conflict with the IF's
features are carried over from the rule that matched. This
latter is a very handy property of the instantiation, since that
is usually what the metarule writer desires. Consider a
metarule that derives the subject-aux inverted form of a main
clause with a finite verb phrase:
grammar rule: S -> NP AUX VP
RSET: (NBR NP) = (NBR AUX); (FIN VP) = '+;
metarule: S -> NP AUX VP => SAI -> AUX NP VP
If features were not carried over during an instantiation, the
result of matching and instantiating the metarule would be:
SAI -> AUX NP VP
This does not preserve number agreement, nor does it restrict
the VP to being finite. Of course, the metarule could be
rewritten to have the correct restrictions in the IF, but this
would sharply curb the utility of the metarules, and lead to the
proliferation of metarules with slightly different RSETs.
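The carry-over of features can be sketched as follows. This is a simplified illustration, not the implemented instantiator: it assumes a restriction is a (feature path, value) pair and that a carried-over restriction "conflicts" exactly when the IF already constrains the same feature path. The `Rule` class and the `instantiate` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    lhs: str
    rhs: list
    rset: list = field(default_factory=list)   # (feature path, value) pairs


def instantiate(if_form, matched):
    """Build the derived rule, carrying over non-conflicting restrictions.

    Assumption: a restriction from the matched rule conflicts with the IF
    only if the IF constrains the same feature path.
    """
    if_paths = {path for path, _ in if_form.rset}
    carried = [(path, value) for path, value in matched.rset
               if path not in if_paths]
    return Rule(if_form.lhs, list(if_form.rhs), list(if_form.rset) + carried)


# The subject-aux inversion example: the IF states no restrictions of its own.
base = Rule("S", ["NP", "AUX", "VP"],
            [("(NBR NP)", "(NBR AUX)"), ("(FIN VP)", "'+")])
sai = instantiate(Rule("SAI", ["AUX", "NP", "VP"]), base)
```

Here `sai` keeps both the number-agreement and the finiteness restriction of the original S rule, so the metarule itself does not have to restate them.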
We are now ready to give a short example of two metarules
predicate/argument structure will be described by the feature
PA, whose value is a list:
(V NP1 NP2 ...)
arguments. The order of the arguments is significant, since:
("gave" "John" "a book" "Mary")
<=> gift of a book by John to Mary
("gave" "John" "Mary" "a book")
<=> ?? gift of Mary to a book by John
Adding the PA feature, the rule for ditransitive verbs with
prepositional objects becomes:
VP -> V NP PP
(PREP V) = (PREP PP);
ASET: (PA VP) := '((V VP) (SUBJ VP) (NP VP) (NP PP))
The SUBJ feature is the subject NP passed down by the S rule.
7.1 Dative Movement
In dative movement, the prepositional NP becomes a noun
phrase next to the verb:
1 John gave a book to Mary =>
2 John gave Mary a book
The first object NP of (2) fills the same argument role as the
can be formulated as follows:
metarule DATMOVE
VP -> V uu PP
DATMOVE accepts VPs with a trailing prepositional argument,
and moves the NP from that argument to just after the verb.
The verb must be marked as accepting dative arguments, hence
prepositional argument, the PREP feature of the VP doesn't
have to match it. As for the predicate/argument structure, the
NP#D constituent takes the place of the prepositional NP in
the PA feature.
DATMOVE can be applied to the ditransitive VP rule to
bindings are:
uu = (NP);
c : (NP VP)
Instantiating the IF then gives the dative construction:
VP -> V NP#D NP
ASET: (PA VP) :=
'((V VP) (SUBJ VP) (NP VP) (NP#D VP))
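The phrase-structure part of this derivation can be sketched in a few lines. The full text of DATMOVE is not reproduced in this copy, so the instantiation form used here (the verb, then NP#D, then whatever uu matched) is inferred from the prose description and from the derived rule shown above; it is an assumption, as are the tuple encoding of rules and the name `apply_datmove`. Feature conditions, such as the verb being marked as accepting dative arguments, are left out.

```python
def apply_datmove(rule):
    """Apply a simplified DATMOVE to a rule given as (lhs, rhs tuple).

    Matching form (from the text): VP -> V uu PP
    Instantiation form (inferred, not quoted): VP -> V NP#D uu
    Returns the derived rule, or None if the matching form does not apply.
    """
    lhs, rhs = rule
    if lhs != "VP" or len(rhs) < 2 or rhs[0] != "V" or rhs[-1] != "PP":
        return None
    uu = tuple(rhs[1:-1])            # constituents between the verb and the PP
    return ("VP", ("V", "NP#D") + uu)


ditransitive = ("VP", ("V", "NP", "PP"))
dative = apply_datmove(ditransitive)
```

With uu bound to (NP), `dative` is the rule VP -> V NP#D NP, matching the dative construction above.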
There are other grammar rules that dative movement will apply
Make up a story for me => Make me up a story
This is the reason the double-letter variable "uu" was used in
PP, DATMOVE can apply to yield a dative construction.
7.2 Passive
In the passive transformation, the NP immediately following
moves to an agentive BY-phrase:
(1) John gave a book to Mary =>
(2) A book was given to Mary by John
A metarule for the passive transformation is:
metarule PASSIVE
VP -> V NPuu vv
PASSIVE deletes the NP immediately following the verb, and
participle suffix for the verb. In the predicate/argument
subject, while the new subject is used in place of the original
object NP. Applying PASSIVE to the ditransitive rule yields:
AP -> V PPL PP PP#A
(PREP V) = (PREP PP);
'((V VP) (NP PP#A) (SUBJ VP) (NP PP));
e.g. "A book was given to Mary by John" will be analyzed by
book" "Mary"), which is the same predicate/argument structure
as the corresponding active sentence.
PASSIVE can also apply to the rule generated by DATMOVE
to yield the passive form of VPs with dative objects:
AP -> V PPL NP PP#A
'((V VP) (NP PP#A) (NP VP) (SUBJ VP));
e.g., "Mary was given a book by John"
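Both applications of PASSIVE can be reproduced with a small sketch of the phrase-structure part of the metarule. The instantiation form is not fully preserved in this copy, so the form assumed here (verb, PPL, the constituents matched by vv, then the agentive PP#A) is inferred from the two derived rules above; the tuple encoding and the name `apply_passive` are likewise hypothetical, and all feature annotations are omitted.

```python
def apply_passive(rule):
    """Apply a simplified PASSIVE to a rule given as (lhs, rhs tuple).

    Matching form (from the text): VP -> V NPuu vv
    Instantiation form (inferred from the derived rules): AP -> V PPL vv PP#A
    """
    lhs, rhs = rule
    if lhs != "VP" or len(rhs) < 2 or rhs[0] != "V":
        return None
    if not rhs[1].startswith("NP"):  # NPuu matches NP, NP#D, and so on
        return None
    vv = tuple(rhs[2:])              # everything after the deleted NP
    return ("AP", ("V", "PPL") + vv + ("PP#A",))


from_ditransitive = apply_passive(("VP", ("V", "NP", "PP")))
from_dative = apply_passive(("VP", ("V", "NP#D", "NP")))
```

The first result corresponds to AP -> V PPL PP PP#A and the second to AP -> V PPL NP PP#A. The string variable in NPuu is what lets one matching form cover both the plain object NP and the NP#D produced by DATMOVE.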
8 Implementation
A system has been designed and implemented to test the
validity of this approach. It consists of a matcher/instantiator
for metarules, along with an iteration loop that applies all the
metarules on each cycle until no more new rules are generated.
Metarules for verb subcategorization and finite and non-finite
clause structures have been written and input to the system.
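The iteration loop amounts to computing a closure of the base grammar under the metarules. The sketch below is an illustration under stated assumptions, not the implemented system: rules are (lhs, rhs tuple) pairs, each metarule is a function returning a derived rule or None, the two toy metarules are simplified phrase-structure versions of DATMOVE and PASSIVE, and the `max_cycles` guard is an added safety measure.

```python
def datmove(rule):
    """Toy DATMOVE: VP -> V uu PP  =>  VP -> V NP#D uu (features omitted)."""
    lhs, rhs = rule
    if lhs == "VP" and len(rhs) >= 2 and rhs[0] == "V" and rhs[-1] == "PP":
        return ("VP", ("V", "NP#D") + rhs[1:-1])
    return None


def passive(rule):
    """Toy PASSIVE: VP -> V NPuu vv  =>  AP -> V PPL vv PP#A."""
    lhs, rhs = rule
    if (lhs == "VP" and len(rhs) >= 2 and rhs[0] == "V"
            and rhs[1].startswith("NP")):
        return ("AP", ("V", "PPL") + rhs[2:] + ("PP#A",))
    return None


def close_under_metarules(base_rules, metarules, max_cycles=100):
    """Apply every metarule on each cycle until no new rule appears."""
    rules = set(base_rules)
    for _ in range(max_cycles):
        new = set()
        for metarule in metarules:
            for rule in rules:
                derived = metarule(rule)
                if derived is not None and derived not in rules:
                    new.add(derived)
        if not new:                  # fixed point: nothing new this cycle
            return rules
        rules |= new
    raise RuntimeError("metarules did not reach a fixed point")


grammar = close_under_metarules({("VP", ("V", "NP", "PP"))}, [datmove, passive])
```

Starting from the single ditransitive rule, the loop stops after three cycles with four rules: the original, the dative form, and the two passive forms. Because derived rules are fed back as input, a badly written metarule could generate rules forever, which is why the sketch bounds the number of cycles.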
We were especially concerned:
- To check the perspicuity of metarules for describing
significant fragments of English using the above
representation for grammar rules.
- To check that a reasonably small number of new grammar
rules were generated by the metarules for these fragments.
Both of these considerations are critical for the performance
the metarules worked so well that they exposed gaps in a
over a five-year period and was thought to be reasonably
derived rules generated was encouragingly small:
Subcategorization:
1 grammar rule
Clauses:
8 grammar rules
9 Conclusions
generalizations in the grammar. A great deal of care must be
exercised in writing metarules, because it is easy to state
metarules can be used again as input to the metarules, and this
generalizations will also be a difficult task.
The success of the metarule formulation in deriving a small
definitional power of APSGs over ordinary PSGs. For example,
number agreement and feature inheritance can be expressed
simply by appropriate annotations in an APSG, but require
means that fewer metarules are needed, and hence fewer
derived rules are generated.
REFERENCES
W. Woods, "An Experimental Parsing System for Transition
Processing, Prentice-Hall, Englewood Cliffs, New Jersey,
1973.
N. Chomsky, Aspects of the Theory of Syntax, MIT Press,
Cambridge, Mass., 1965.
J. Earley, "An Efficient Context-Free Parsing Algorithm,"
CACM, Vol. 13 (1970) 94-102.
University of Sussex, (unpublished paper, April, 1979).
Gerald Gazdar, "Unbounded Dependencies and Coordinate
Structure," University of Sussex, (submitted to Inquiry,
October, 1979).
Kurt Konolige, "A Framework for a Portable NL Interface
California (October 1979).
Understanding," Technical Note 142, Artificial Intelligence
Center, SRI International, Menlo Park, California (June 1977).
Analysis," Proceedings of the Interdisciplinary Conference
on Automated Text Processing, (November 1976).
Jane Robinson, "DIAGRAM: A Grammar for Dialogues,"
International, Menlo Park, California (February 1980).