
CAPTURING LINGUISTIC GENERALIZATIONS WITH METARULES
IN AN ANNOTATED PHRASE-STRUCTURE GRAMMAR

Kurt Konolige
SRI International*

1 Introduction

Computational models employed by current natural language understanding systems rely on phrase-structure representations of syntax. Whether implemented as augmented transition nets, BNF grammars, annotated phrase-structure grammars, or similar methods, a phrase-structure representation makes the parsing problem computationally tractable [7]. However, phrase-structure representations have been open to the criticism that they do not capture linguistic generalizations that are easily expressed in transformational grammars.

This paper describes a formalism for specifying syntactic and semantic generalizations across the rules of a phrase-structure grammar (PSG). The formalism consists of two parts:

1. A declarative description of basic syntactic phrase-structures and their associated semantic translation.

2. A set of metarules for deriving additional grammar rules from the basic set.

Since metarules operate on grammar rules rather than phrase markers, the transformational effect of metarules can be pre-computed before the grammar is used to analyze input. The computational efficiency of a phrase-structure grammar is thus preserved.

Metarule formulations for PSGs have recently received increased attention in the linguistics literature, especially in [4], which greatly influenced the formalism presented in this paper. Our formalism differs significantly from [4] in that the metarules work on a phrase-structure grammar annotated with arbitrary feature sets (Annotated Phrase-structure Grammar, or APSG [7]). Grammars for a large subset of English have been written using this formalism [9], and its computational viability has been demonstrated [6]. Because of the increased structural complexity of APSGs over PSGs without annotations, new techniques for applying metarules to these structures are developed in this paper, and the notion of a match between a metarule and a grammar rule is carefully defined. The formalism has been implemented as a computer program and preliminary tests have been made to establish its validity and effectiveness.

2 Metarules

Metarules are used to capture linguistic generalizations that are not readily expressed in the phrase-structure rules. Consider the two sentences:

1. John gave a book to Mary.

2. Mary was given a book by John.

Although their syntactic structure is different, these two sentences have many elements in common. In particular, the predicate/argument structure they describe is the same: the gift of a book by John to Mary. Transformational grammars capture this correspondence by transforming the phrase marker for (1) into the phrase marker for (2). The underlying predicate/argument structure remains the same, but the surface realization changes. However, the recognition of transformational grammars is a very difficult computational problem.*

*This research was supported by the Defense Advanced Research Projects Agency under Contract N00039-79-C-0118 with the Naval Electronics Systems Command. The views and conclusions contained in this document are those of the author and should not be interpreted as representative of the official policies, either expressed or implied, of the U.S. Government. The author is grateful to Jane Robinson and Gary Hendrix for comments on an earlier draft of this paper.

By contrast, metarules operate directly on the rules of a PSG to produce more rules for that grammar. As long as the number of derived rules is finite, the resulting set of rules is still a PSG. Unlike transformational grammars, PSGs have efficient algorithms for parsing [3]. In a sense, all of the work of transformations has been pushed off into a pre-processing phase where new grammar rules are derived. We are not greatly concerned with efficiency in pre-processing, because it only has to be done once.

There are still computational limitations on PSGs that must be taken into account by any metarule system. Large numbers of phrase-structure rules can seriously degrade the performance of a parser, both in terms of its running time,** storage for the rules, and the ambiguity of the resulting parses [6]. Moreover, the generation of large numbers of rules seems psychologically implausible. Thus the two criteria we will use to judge the efficacy of metarules will be: can they adequately capture linguistic generalizations, and are they computationally practicable in terms of the number of rules they generate. The formalism of [4] is especially vulnerable to criticism on the latter point, since it generates large numbers of new rules.***

3 Representation

An annotated phrase-structure grammar (APSG) as developed in [7] is the target representation for the metarules. The core component of an APSG is a set of context-free phrase-structure rules. As is customary, these rules are input to a context-free parser to analyze a string, producing a phrase-structure tree as output. In addition, the parse tree so produced may have arbitrary feature sets, called annotations, appended to each node. The annotations are an efficient means of incorporating additional information into the parse tree. Typically, features will exist for syntactic processing (e.g., number agreement), grammatical function of constituents (e.g., subject, direct and indirect objects), and semantic interpretation.

Associated with each rule of the grammar are procedures for operating on feature sets of the phrase markers the rule constructs. These procedures may constrain the application of the rule by testing features on candidate constituents, or add information to the structure created by the rule, based on the features of its constituents. Rule procedures are written in the programming language LISP, giving the grammar the power to recognize class 0 languages. The use of arbitrary procedures and feature set annotations makes APSGs an [...] language, similar to the earlier ATN formalisms [1]. An example of how an APSG can encode a large subset of English is the DIAGRAM grammar [9].

*There has been some success in restricting the power of transformational grammars sufficiently to allow a recognizer to be built; see [8].

**Sheil [10] has shown that, for a simple recursive descent parsing algorithm, running time is a linear function of the number of rules. For other parsing schemes, the relationship between the number of rules and parsing time is unclear.

***This is without considering infinite schemas such as the one for conjunction reduction. Basically, the problem is that the formalism of [4] allows complex features [2] to define new categories, generating an exponential number of categories (and hence rules) with respect to the number of features.

It is unfortunately the very power of APSGs (and ATNs) that makes it difficult to capture linguistic generalizations within these formalisms. Metarules for transforming one annotated phrase-structure rule into another must not only transform the phrase-structure, but also the procedures that operate on feature sets, in an appropriate way. Because the transformation of procedures is notoriously difficult,* one of the tasks of this paper will be to illustrate a declarative notation describing operations on feature sets that is powerful enough to encode the manipulations of features necessary for the grammar, but is still simple enough for metarules to transform.

4 Notation

Every rule of the APSG has three parts:

1. A phrase-structure rule;

2. A restriction set (RSET) that restricts the applicability of the rule; and

3. An assignment set (ASET) that assigns values to features.

The RSET and ASET manipulate features of the phrase marker analyzed by the rule; they are discussed below in detail.

Phrase-structure rules are written as:

CAT -> C1 C2 ... Cn

where CAT is the dominating category of the phrase, and C1 through Cn are its immediate constituent categories. Terminal strings can be included in the rule by enclosing them in double quote marks.
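The three-part rule structure can be pictured as a small record. The following is only an illustrative sketch in Python, not the paper's LISP implementation: the names `Rule`, `parse_rule`, and `is_terminal` are hypothetical, and the RSET and ASET are kept as opaque tuples because their contents are described later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One APSG rule: a phrase-structure part plus an RSET and an ASET."""
    cat: str              # dominating category, e.g. "S"
    constituents: tuple   # immediate constituents, e.g. ("NP", "VP")
    rset: tuple = ()      # restriction predications (left opaque in this sketch)
    aset: tuple = ()      # feature assignments (left opaque in this sketch)


def parse_rule(text, rset=(), aset=()):
    """Parse 'CAT -> C1 C2 ... Cn'; assumes symbols contain no spaces."""
    lhs, rhs = text.split("->")
    return Rule(lhs.strip(), tuple(rhs.split()), tuple(rset), tuple(aset))


def is_terminal(symbol):
    """Terminal strings are written inside double quote marks."""
    return len(symbol) >= 2 and symbol.startswith('"') and symbol.endswith('"')


rule = parse_rule('VP -> "BE" NP')
print(rule.cat, rule.constituents, [is_terminal(s) for s in rule.constituents])
```

Making the record immutable and hashable is a design choice of this sketch; it lets rules be stored in sets when new rules are derived from old ones.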

A feature set is associated with each node in the parse tree that is created when a string is analyzed by the grammar. Each feature has a name (a string of uppercase alphanumeric characters) and an associated value. The values a feature can take on (the domain of the feature) are, in general, arbitrary. One of the most useful domains is the set {+, -, NIL}, where NIL is the unmarked case; this domain corresponds to the binary features used in [2]. More complicated domains can be used; for example, a CASE feature might have as its domain the set of tuples {<1 SG>, <2 SG>, <3 SG>, <1 PL>, <2 PL>, <3 PL>}. Most interesting are those features whose domain is a phrase marker. Since phrase markers are just data structures that the parser creates, they can be assigned as the value of a feature. This technique is used to pass phrase markers to various parts of the tree to reflect the grammatical and semantic structure of the input; examples will be given in later sections.

We adopt the following conventions in referring to features and their values:

- Features are one-place functions that range over phrase markers constructed by the phrase-structure part of a grammar rule. The function is named by the feature name.

- These functions are represented in prefix form, e.g., (CASE NP) refers to the CASE feature of the NP constituent of a phrase marker. In cases where there is more than one constituent with the same category name, they will be differentiated by a "#" suffix; for example,

VP -> V NP#1 NP#2

has two NP constituents.

- A phrase marker is assumed to have its immediate constituents as features under their category name, e.g., (N NP) refers to the N constituent of the NP.

- Feature functions may be nested, e.g., (CASE (N NP)) refers to the CASE feature of the N constituent of the NP phrase marker. For these nestings, we adopt the simpler notation [...] right-associative.

- The value NIL always implies the unmarked case. At times it will be useful to consider features that are not explicitly attached to a phrase marker as being present with value NIL.

- A constant term will be written with a preceding single quote mark, e.g., 'SG refers to the constant token SG.

*It is sometimes hard to even understand what it is that a procedure does, since it may involve recursion, side-effects, and other complications.
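To make the conventions above concrete, here is a minimal Python sketch of feature functions evaluated over phrase markers. It rests on several assumptions that are not in the paper: phrase markers are modeled as nested dictionaries, a nested reference such as (CASE (N NP)) is written as the flat tuple `("CASE", "N", "NP")` and read from the right, NIL is modeled as `None`, and the feature name `GENDER` is hypothetical.

```python
NIL = None  # stands in for the unmarked value NIL


def feature(path, env):
    """Evaluate a feature path, applying the feature functions right to left.

    `path` is e.g. ("CASE", "N", "NP"); `env` maps constituent names to
    phrase markers. Features that are not attached evaluate to NIL.
    """
    *functions, root = path
    value = env.get(root, NIL)
    for name in reversed(functions):
        if not isinstance(value, dict):
            return NIL
        value = value.get(name, NIL)
    return value


# A toy NP phrase marker whose N constituent carries a CASE feature.
noun = {"CASE": ("3", "SG")}
np = {"DET": {}, "N": noun}
env = {"NP": np}

print(feature(("CASE", "N", "NP"), env))    # CASE feature of the N of the NP
print(feature(("N", "NP"), env) is noun)    # constituents are features too
print(feature(("GENDER", "N", "NP"), env))  # unattached feature gives NIL
```

Storing constituents and ordinary features in the same dictionary mirrors the convention that a phrase marker has its immediate constituents as features under their category names.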

4.1 Restrictions

The RSET of a rule restricts the applicability of the rule by a predication on the features of its constituents. The phrase markers used as constituents must satisfy the predications in the RSET before they will be analyzed by the rule to create a new phrase marker. The most useful predicate is equality: a feature can take on only one particular value to be acceptable. For example, in the phrase structure rule:

S -> NP VP

number agreement could be enforced by the predication:

(NBR NP) = (NBR VP)

where NBR is a feature whose domain is {SG, PL}.* This would restrict the NBR feature on NP to agree with that on VP before the S phrase was constructed. The economy of the APSG encoding is seen here: only a single phrase-structure rule is required. Also, the linguistic requirement that subjects and their verbs agree in number is enforced by a single statement, rather than being implicit in separate phrase structure rules, one for singular subject-verb combinations, another for plurals.

Besides equality, there are only three additional predications: inequality (≠), set membership (∈) and set non-membership (∉). The last two are useful in dealing with non-binary domains. As discussed in the next section, tight restrictions on predications are necessary if metarules are to be successful in transforming grammar rules. Whether these four predicates are adequate in descriptive power for the grammar we contemplate remains an open empirical question; we are currently accumulating evidence for their sufficiency by rewriting DIAGRAM using just those predicates.
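The four predicates are simple enough to evaluate directly. The sketch below is one possible Python rendering under stated assumptions: a predication is a hypothetical triple `(op, left, right)`, feature references are tuples read right to left, constants are strings with a leading single quote, literal sets are `frozenset` values, and phrase markers are dictionaries. None of these encodings comes from the paper.

```python
import operator

NIL = None  # stands in for the unmarked value NIL


def lookup(path, env):
    """Evaluate a feature path such as ("NBR", "NP") against the constituents."""
    *functions, root = path
    value = env.get(root, NIL)
    for name in reversed(functions):
        value = value.get(name, NIL) if isinstance(value, dict) else NIL
    return value


def term_value(term, env):
    if isinstance(term, tuple):      # feature reference
        return lookup(term, env)
    if isinstance(term, frozenset):  # literal set for the membership predicates
        return term
    return term.lstrip("'")          # quoted constant: 'SG denotes the token SG


PREDICATES = {
    "=": operator.eq,
    "!=": operator.ne,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def rset_holds(rset, env):
    """An RSET holds only if every one of its predications holds."""
    return all(
        PREDICATES[op](term_value(left, env), term_value(right, env))
        for op, left, right in rset
    )


agreement = [("=", ("NBR", "NP"), ("NBR", "VP"))]
print(rset_holds(agreement, {"NP": {"NBR": "SG"}, "VP": {"NBR": "SG"}}))
print(rset_holds(agreement, {"NP": {"NBR": "SG"}, "VP": {"NBR": "PL"}}))
```

Note that under this encoding two unmarked features compare equal, since both evaluate to NIL; whether that is the intended behaviour is not settled by the text.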

Restriction predications for a rule are collected in the RSET of that rule. All restrictions must hold for the rule to be applicable. As an illustration, consider the subcategorization rule for ditransitive verbs with prepositional objects (e.g., "John gave a book to Mary"):

VP -> V NP PP
RSET: (TRANS V) = 'DI;
      (PREP V) = (PREP PP)

The first restriction selects only verbs that are marked as ditransitive; the TRANS feature comes from the lexical entry of the verb. Ditransitive verbs with prepositional arguments are always subcategorized by the particular preposition used, e.g., "give" always uses "to" for its prepositional argument.

*How NP and VP categories could "inherit" the NBR feature from their N and V constituents is discussed in the next section.

[...] given verb. The PREP feature of the verb comes from its lexical entry, and must match the preposition of the PP phrase.*

4.2 Assignments

A rule will normally assign features to the dominating node of the phrase marker it constructs, based on the values of the constituents' features. For example, feature inheritance takes place in this way. Assume there is a feature NBR marking the syntactic number of nouns. Then the ASET of a rule for noun phrases might be:

NP -> DET N
ASET: (NBR NP) := (NBR N)

This notation is somewhat non-standard; it says that the value of the NBR function on the NP phrase marker is to be the value of the NBR function of the N phrase marker.

An interesting application of feature assignment is to describe the grammatical functions of noun phrases within a clause. Recall that the domain of features can be constituents themselves. Adding an ASET describing the grammatical function of its constituents to the ditransitive VP rule yields the following:

VP -> V NP PP
ASET: (DIROBJ VP) := (NP VP);
      (INDOBJ VP) := (NP PP)

This ASET assigns the DIROBJ (direct object) feature of VP the value of the constituent NP. Similarly, the value of INDOBJ (indirect object) is the NP constituent of the PP phrase.
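The two ASET examples above can be traced with a short Python sketch. It is a toy model built on assumptions, not the paper's implementation: phrase markers are dictionaries, an assignment is a hypothetical pair `((feature, owner), source_path)`, and the helper names `lookup` and `build_marker` are invented for illustration. The sketch also assumes the dominating category does not share a name with one of its own constituents.

```python
NIL = None  # stands in for the unmarked value NIL


def lookup(path, env):
    """Evaluate a feature path such as ("NP", "PP"), read right to left."""
    *functions, root = path
    value = env.get(root, NIL)
    for name in reversed(functions):
        value = value.get(name, NIL) if isinstance(value, dict) else NIL
    return value


def build_marker(cat, constituents, aset):
    """Build the phrase marker for `cat` and apply the rule's assignments.

    `constituents` is a list of (name, marker) pairs; each entry of `aset`
    is ((feature, owner), source_path), mirroring (FEATURE OWNER) := (...).
    """
    marker = dict(constituents)  # constituents are features under their names
    env = dict(constituents)
    env[cat] = marker            # the new node is addressable by its category
    for (feat, owner), source in aset:
        env[owner][feat] = lookup(source, env)
    return marker


# NP -> DET N with ASET (NBR NP) := (NBR N)
np = build_marker("NP", [("DET", {}), ("N", {"NBR": "PL"})],
                  [(("NBR", "NP"), ("NBR", "N"))])
print(np["NBR"])

# VP -> V NP PP with the DIROBJ and INDOBJ assignments
book, mary = {"N": {"NBR": "SG"}}, {"N": {"NBR": "SG"}}
vp = build_marker(
    "VP",
    [("V", {"TRANS": "DI", "PREP": "to"}), ("NP", book),
     ("PP", {"PREP": "to", "NP": mary})],
    [(("DIROBJ", "VP"), ("NP", "VP")), (("INDOBJ", "VP"), ("NP", "PP"))],
)
print(vp["DIROBJ"] is book, vp["INDOBJ"] is mary)
```

Because the feature values are the constituent markers themselves rather than copies, the direct and indirect objects remain shared with the parse tree, which matches the remark that phrase markers can be assigned as feature values.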

A rule may also assign feature values to the constituents of the phrase marker it constructs. Such assignments are context sensitive, because the values are based on the context in which the constituent occurs.** Again, the most interesting use of this technique is in assigning functional roles to constituents in particular phrases. Consider a rule for main clauses:

S -> NP VP
ASET: (SUBJ VP) := (NP S)

The three features SUBJ, DIROBJ, and INDOBJ of the VP phrase marker will have as value the appropriate NP phrase markers, since the DIROBJ and INDOBJ features will be assigned to the VP phrase marker when it is constructed. Thus the grammatical function of the NPs has been identified by assigning features appropriately.

Finally, note that the grammatical functions were assigned to the VP phrase marker. By assembling all of the arguments at this level, it is possible to account for bounded deletion phenomena that are lexically controlled. Consider subcategorization for Equi verbs, in which the subject of the main clause has been deleted from the infinitive complement ("John wants to go"):

VP -> V INF
ASET: (SUBJ INF) := (SUBJ VP)

Here the subject NP of the main clause has been passed down to the VP (by the S rule), which in turn passes it to the infinitive as its subject. Not all linguistic phenomena can be formulated so easily with APSGs; in particular, APSGs have trouble describing unbounded deletion and conjunction reduction. Metarule formulations for the latter phenomena have been proposed in [5], and we will not deal with them here.

*Note that we are not considering here prepositional phrases that are essentially meta-arguments to the verb, dealing with time, place, and the like. The prepositions used for meta-arguments are much more variable, and usually depend on semantic considerations.

**The assignment of features to constituents presents some computational problems, since a context-free parser will no longer be sufficient to analyze strings. This was recognized in the original version of APSGs [7], and a two-pass parser was constructed that first uses the context-free component of the grammar to produce an initial parse tree, then adds the assignment of features in context.

5 Metarules for APSGs

Metarules consist of two parts: a match template with variables whose purpose is to match existing grammar rules; and an instantiation template that produces a new grammar rule by using the match template's variable bindings after a successful match. Initially, a basic set of grammar rules is input; metarules derive new rules, which then can recursively be used as input to the metarules. When (if) the process halts, the new set of rules, together with the basic rules, comprises the grammar.
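The derivation process just described amounts to computing a closure of the basic rules under the metarules. The following Python sketch shows one way this loop could look; it is an assumption-laden illustration, not the paper's program. Rules are modeled as hashable `(category, constituents)` pairs, a metarule is modeled as a function from one rule to a list of derived rules, and `toy_metarule` and the `max_rounds` guard are hypothetical.

```python
def derive_grammar(basic_rules, metarules, max_rounds=100):
    """Apply metarules repeatedly until no new rules appear.

    Since the process is not guaranteed to halt, this sketch gives up
    after `max_rounds` rounds instead of looping forever.
    """
    grammar = set(basic_rules)
    frontier = set(basic_rules)
    for _ in range(max_rounds):
        derived = set()
        for rule in frontier:
            for metarule in metarules:
                derived.update(metarule(rule))
        derived -= grammar          # keep only rules not seen before
        if not derived:
            return grammar          # fixed point: basic plus derived rules
        grammar |= derived
        frontier = derived          # derived rules are fed back as input
    raise RuntimeError("metarule application did not halt within max_rounds")


def toy_metarule(rule):
    """Hypothetical metarule: turn VP -> V NP PP into VP -> V NP NP."""
    cat, rhs = rule
    if cat == "VP" and rhs == ("V", "NP", "PP"):
        return [("VP", ("V", "NP", "NP"))]
    return []


basic = [("S", ("NP", "VP")), ("VP", ("V", "NP", "PP"))]
for rule in sorted(derive_grammar(basic, [toy_metarule])):
    print(rule)
```

Only newly derived rules are re-examined in each round, which keeps the pre-processing cost proportional to the number of rules actually generated.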

We will use the following notation for metarules:

MF => IF
CSET: C1, C2, ... Cn

where MF is a matching form, IF is an instantiation form, and CSET is a set of predications. Both the MF and IF have the same form as grammar rules, but in addition, they can contain variables. When an MF is matched against a grammar rule, these variables are bound to different parts of the rule if the match succeeds. The IF is instantiated with these bindings to produce a new rule. To restrict the application of metarules, additional conditions on the variable bindings may be specified (CSET); these have the same form as the RSET of grammar rules, but they can mention the variables matched by the MF.

Metarules may be classified into three types:

1. Introductory metarules, where the MF is empty (=> IF). These metarules introduce a class of grammar rules.

2. Deletion metarules, where the IF is empty (MF =>). These delete any derived grammar rules that they match.

3. Derivation metarules, where both MF and IF are present. These derive new grammar rules from old ones.

There are linguistic generalizations that can be captured most perspicuously by each of the three forms. We will focus on derivation metarules here, since they are the most complicated.

6 Matching

An important part of the derivation process is the definition of a match between a metarule matching form and a grammar rule. The matching problem is complicated by the presence of RSET and ASET predications in the grammar rules. Thus, it is helpful to define a match in terms of the phrase markers that will be admitted by the grammar rule and the MF. We will say that an MF matches a grammar rule just in case it admits at least those phrase markers admitted by the grammar rule. This definition of a match is sufficient to allow the formulation of matching algorithms for grammar rules complicated by annotations.

We divide the matching process into two parts: matching phrase-structures, and matching feature sets. Both parts must succeed in order for the match to succeed.


6.1 Matching Phrase-structures

For phrase-structures, the definition of a match can be replaced by a direct comparison of the phrase-structures of the MF and grammar rule. Variables in the MF phrase-structure are used to indicate "don't care" parts of the grammar rule phrase-structure, while constants must match exactly. Single lower case letters are used for variables that must match single categories of the grammar rule. A typical MF might be:

S -> a VP

which matches

S -> NP VP with a=NP;
S -> SB VP with a=SB;
S -> "IT" VP with a="IT";
etc.

A variable that appears more than once in an MF must have the same binding for each occurrence for a match to be successful, e.g.,

VP -> V a a

matches

VP -> V NP NP with a=NP

but not

VP -> V NP PP

Single letter variables must match a single category in a grammar rule. Double letter variables are used to match a number of consecutive categories (including none) in the rule. We have:

VP -> V uu

matching

VP -> V with uu=();
VP -> V NP with uu=(NP);
VP -> V NP PP with uu=(NP PP);
etc.

Note that double letter variables are bound to an ordered list of elements from the matched rule. Because of this characteristic, an MF with more than one double letter variable may match a rule in several different ways:

VP -> V uu vv

matches

VP -> V NP PP with uu=(), vv=(NP PP);
                   uu=(NP), vv=(PP);
                   uu=(NP PP), vv=()

All of these are considered to be valid, independent matches. Double and single letter variables may be intermixed freely in an MF.
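The matching behaviour described in this section, including the multiple independent matches produced by several double letter variables, can be sketched as a small backtracking matcher. This Python version is an illustration under assumptions: rules are `(category, constituents)` pairs, the function names are hypothetical, bindings for double letter variables are tuples, and string variables inside category names are not handled.

```python
def is_single_var(sym):
    return len(sym) == 1 and sym.islower()


def is_double_var(sym):
    return len(sym) == 2 and sym.islower() and sym[0] == sym[1]


def match_rhs(pattern, symbols, binding=None):
    """Yield every binding under which `pattern` matches `symbols`.

    Both arguments are tuples of strings. A single letter variable binds one
    category; a double letter variable binds a (possibly empty) tuple of
    consecutive categories. A repeated variable must keep the same binding.
    """
    binding = {} if binding is None else binding
    if not pattern:
        if not symbols:
            yield binding
        return
    head, rest = pattern[0], pattern[1:]
    if is_double_var(head):
        if head in binding:
            candidates = [binding[head]]
        else:
            candidates = [symbols[:k] for k in range(len(symbols) + 1)]
        for segment in candidates:
            if symbols[:len(segment)] == segment:
                yield from match_rhs(rest, symbols[len(segment):],
                                     {**binding, head: segment})
    elif symbols:
        first = symbols[0]
        if is_single_var(head):
            if binding.get(head, first) == first:
                yield from match_rhs(rest, symbols[1:], {**binding, head: first})
        elif head == first:          # constants must match exactly
            yield from match_rhs(rest, symbols[1:], binding)


def match_rule(mf, rule):
    """Return all bindings of the MF phrase-structure against a grammar rule."""
    (mf_cat, mf_rhs), (cat, rhs) = mf, rule
    if mf_cat != cat:
        return []
    return list(match_rhs(tuple(mf_rhs), tuple(rhs)))


for b in match_rule(("VP", ("V", "uu", "vv")), ("VP", ("V", "NP", "PP"))):
    print(b)
```

Returning every binding rather than the first one reflects the statement that all such matches are valid and independent, so each can give rise to its own derived rule.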

While double letter variables match multiple categories in a phrase structure rule, string variables match parts of a category. String variables occur in both double and single letter varieties; as expected, the former match any number of consecutive characters, while the latter match single characters. String variables are assumed when an MF category contains a mixture of upper and lower case characters, e.g.:

VP -> V NP#a NPuu

matches

VP -> V NP#1 NP with a=1, uu=();
VP -> V NP#1 NP#2 with a=1, uu=(# 2);

String variables are most useful for matching category names that may use the # convention.
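One simple way to experiment with string variables is to translate a mixed-case MF category into a regular expression. This is a swapped-in technique chosen for brevity, not something the paper describes, and it carries assumptions: each variable occurs at most once per category, bindings are returned as strings rather than character lists, and only one match is reported even when several double letter string variables would allow more. The names `compile_category` and `match_category` are hypothetical.

```python
import re


def compile_category(pattern):
    """Translate an MF category such as 'NP#a' or 'NPuu' into a regex.

    A doubled lower case letter matches any number of characters, a single
    lower case letter matches exactly one, and everything else is literal.
    """
    parts, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch.islower():
            if pattern[i + 1:i + 2] == ch:
                parts.append(f"(?P<{ch}{ch}>.*)")
                i += 2
            else:
                parts.append(f"(?P<{ch}>.)")
                i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def match_category(pattern, category):
    """Return the string-variable bindings, or None if there is no match."""
    m = compile_category(pattern).fullmatch(category)
    return m.groupdict() if m else None


print(match_category("NP#a", "NP#1"))
print(match_category("NPuu", "NP"))
print(match_category("NPuu", "NP#2"))
```

A fuller treatment would enumerate all bindings, as the phrase-structure matcher does for double letter category variables.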

6.2 Feature Matching

So far variables have matched only the phrase-structure part of grammar rules, and not the feature annotations. For feature matching, we must return to the original definition of matching based on the admissibility of phrase markers. The RSET of a grammar rule is a closed formula involving the feature sets of the phrase marker constructed by the rule; let $P$ stand for this formula. If $P$ is true for a given phrase marker, then that phrase marker is accepted by the rule; if not, it is rejected. Similarly, the RSET of a matching form is an open formula on the feature sets of the phrase marker; let $R(x_1, x_2, \ldots, x_n)$ stand for this formula, where the $x_i$ are the variables of the RSET. For the MF's restrictions to match those of the grammar rule, we must be able to prove the formula:

$$P \Rightarrow (\exists x_1)(\exists x_2)\cdots(\exists x_n)\, R(x_1, x_2, \ldots, x_n)$$

That is, whenever $P$ admits a phrase marker, there exists some binding for $R$'s free variables that also admits the phrase marker.

Now the importance of restricting the form of $P$ and $R$ can be seen. Proving that the above implication holds for general $P$ and $R$ can be a hard problem, requiring, for example, a resolution theorem prover. By restricting $P$ and $R$ to simple conjunctions of equalities, inequalities, and set membership predicates, the match between $P$ and $R$ can be performed by a simple and efficient algorithm.
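The text does not spell out the matching algorithm, so the following Python sketch shows only one plausible reading, under strong assumptions. Each conjunct is a tuple such as `("=", "(FIN VP)", "+")`, variables of R are strings beginning with `?`, and R is taken to match P when some binding makes every conjunct of R literally identical to a conjunct of P. That is a sufficient syntactic test for the implication above, not a complete one (it ignores, for instance, the symmetry of equality), and the names `unify_atom` and `find_bindings` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterator, List, Optional, Tuple

Atom = Tuple[str, ...]  # e.g. ("=", "(NBR NP)", "(NBR AUX)")


def is_var(term: str) -> bool:
    """Variables of the open formula R are written with a leading '?' (assumed notation)."""
    return term.startswith("?")


def unify_atom(pat: Atom, fact: Atom, env: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `env` so that `pat` equals `fact`, or return None if impossible."""
    if len(pat) != len(fact):
        return None
    env = dict(env)
    for p, f in zip(pat, fact):
        if is_var(p):
            if env.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return env


def find_bindings(R: List[Atom], P: FrozenSet[Atom],
                  env: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield each binding under which every conjunct of R occurs among P's conjuncts."""
    env = {} if env is None else env
    if not R:
        yield env
        return
    for fact in P:
        new_env = unify_atom(R[0], fact, env)
        if new_env is not None:
            yield from find_bindings(R[1:], P, new_env)
```

For example, with `P = frozenset({("=", "(NBR NP)", "(NBR AUX)"), ("=", "(FIN VP)", "+")})`, the open restriction `[("=", "(FIN VP)", "?x")]` matches with `?x` bound to `+`, while a restriction on a feature that P never mentions yields no binding. Because both formulas are flat conjunctions, no general theorem proving is needed, which is the point the restriction on P and R is meant to secure.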

6.3 Instantiation

When a metarule matches a grammar rule, the CSET of the metarule is evaluated to see if the metarule can indeed be applied. For example, the MF:

VP -> "BE" xP    CSET: x ≠ 'V

will match any rule for which x is not bound to V.

When an MF matches a rule, and the CSET is satisfied, the instantiation form of the metarule is used to produce a new rule: the variables of the IF are instantiated with their values from the match. In addition, restriction and assignment features that do not conflict with the IF's features are carried over from the rule that matched. This is a very handy property of the instantiation, since it is usually what the metarule writer desires. Consider a metarule that derives the subject-aux inverted form of a main clause with a finite verb phrase:

grammar rule: S -> NP AUX VP

RSET: (NBR NP) = (NBR AUX); (FIN VP) = +;

metarule: S -> NP AUX VP => SAI -> AUX NP VP

If features were not carried over during an instantiation, the result of matching and instantiating the metarule would be:

SAI -> AUX NP VP

This does not preserve number agreement, nor does it restrict the VP to being finite. Of course, the metarule could be rewritten to have the correct restrictions in the IF, but this would sharply curb the utility of the metarules, and lead to the proliferation of metarules with slightly different RSETs.
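The carry-over of features during instantiation can be illustrated with a small Python sketch of the subject-aux inversion example. The representation is an assumption made for illustration: the RSET is a dictionary from a feature path to its required value, a restriction is said to "conflict" when the IF constrains the same feature path, and the names `Rule` and `instantiate` are hypothetical. Variable substitution in the IF is omitted because this particular IF contains no variables.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Rule:
    lhs: str
    rhs: Tuple[str, ...]
    # Feature path -> required value; a simplified stand-in for the RSET.
    rset: Dict[str, str] = field(default_factory=dict)


def instantiate(matched: Rule, if_lhs: str, if_rhs: Tuple[str, ...],
                if_rset: Dict[str, str]) -> Rule:
    """Build the new rule from the IF, carrying over non-conflicting restrictions."""
    # Assumed notion of conflict: the IF restricts the same feature path.
    carried = {path: val for path, val in matched.rset.items() if path not in if_rset}
    return Rule(if_lhs, if_rhs, {**carried, **if_rset})


main_clause = Rule(
    "S", ("NP", "AUX", "VP"),
    {"(NBR NP)": "(NBR AUX)", "(FIN VP)": "+"},
)
inverted = instantiate(main_clause, "SAI", ("AUX", "NP", "VP"), {})
```

Here `inverted` is the rule SAI -> AUX NP VP and still carries both the number-agreement restriction and the finiteness restriction, even though the metarule itself mentions neither.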


We are now ready to give a short example of two metarules

predicate/argument structure will be described by the feature PA, whose value is a list:

(V NP1 NP2 ...)

arguments. The order of the arguments is significant, since:

("gave" "John" "a book" "Mary")

<=> gift of a book by John to Mary

("gave" "John" "Mary" "a book")

<=> ?? gift of Mary to a book by John

Adding the PA feature, the rule for ditransitive verbs with prepositional objects becomes:

VP -> V NP PP

(PREP V) = (PREP PP);

ASET: (PA VP) := '((V VP) (SUBJ VP) (NP VP) (NP PP))

The SUBJ feature is the subject NP passed down by the S rule.

7.1 Dative Movement

In dative movement, the prepositional NP becomes a noun phrase next to the verb:

(1) John gave a book to Mary =>

(2) John gave Mary a book

The first object NP of (2) fills the same argument role as the

can be formulated as follows:

metarule DATMOVE

VP -> V uu PP

DATMOVE accepts VPs with a trailing prepositional argument, and moves the NP from that argument to just after the verb. The verb must be marked as accepting dative arguments, hence

prepositional argument, the PREP feature of the VP doesn't have to match it. As for the predicate/argument structure, the NP#D constituent takes the place of the prepositional NP in the PA feature.

DATMOVE can be applied to the ditransitive VP rule to

bindings are:

uu = (NP);

c = (NP VP)

Instantiating the IF then gives the dative construction:

VP -> V NP#D NP

ASET: (PA VP) := '((V VP) (SUBJ VP) (NP VP) (NP#D VP))

There are other grammar rules that dative movement will apply

Make up a story for me => Make me up a story

This is the reason the double-letter variable "uu" was used in

PP, DATMOVE can apply to yield a dative construction.

7.2 Passive

In the passive transformation, the NP immediately following

moves to an agentive BY-phrase:

(1) John gave a book to Mary =>

(2) A book was given to Mary by John

A metarule for the passive transformation is:

metarule PASSIVE

VP -> V NPuu vv

PASSIVE deletes the NP immediately following the verb, and

participle suffix for the verb. In the predicate/argument

subject, while the new subject is used in place of the original object NP. Applying PASSIVE to the ditransitive rule yields:

AP -> V PPL PP PP#A

(PREP V) = (PREP PP);

'((V VP) (NP PP#A) (SUBJ VP) (NP PP));

e.g., "A book was given to Mary by John" will be analyzed by

book" "Mary"), which is the same predicate/argument structure as the corresponding active sentence.

PASSIVE can also apply to the rule generated by DATMOVE to yield the passive form of VPs with dative objects:

AP -> V PPL NP PP#A

'((V VP) (NP PP#A) (NP VP) (SUBJ VP));

e.g., "Mary was given a book by John"

8 Implementation

A system has been designed and implemented to test the validity of this approach. It consists of a matcher/instantiator for metarules, along with an iteration loop that applies all the metarules on each cycle until no more new rules are generated. Metarules for verb subcategorization and finite and non-finite clause structures have been written and input to the system.
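The iteration loop described here amounts to computing a closure of the base grammar under the metarules. The Python sketch below illustrates that idea only; it is not the implemented system. Rules are reduced to bare phrase-structure pairs, a metarule is modelled as a function from a rule to the rules it derives, and `close_under_metarules`, `sai`, and the `max_cycles` guard are all hypothetical. The single toy metarule is the subject-aux inversion example from Section 6.3.

```python
from typing import Callable, Iterable, Iterator, Set, Tuple

PSRule = Tuple[str, Tuple[str, ...]]              # (left-hand side, right-hand side)
Metarule = Callable[[PSRule], Iterable[PSRule]]   # derives zero or more new rules


def close_under_metarules(base: Iterable[PSRule],
                          metarules: Iterable[Metarule],
                          max_cycles: int = 100) -> Set[PSRule]:
    """Apply every metarule to every rule on each cycle until nothing new appears."""
    rules = set(base)
    metarule_list = list(metarules)
    for _ in range(max_cycles):
        derived = {new for mr in metarule_list for r in rules for new in mr(r)}
        fresh = derived - rules
        if not fresh:
            return rules          # fixed point: no new rules were generated
        rules |= fresh
    # Guard against metarules that keep producing new rules indefinitely.
    raise RuntimeError("metarules did not reach a fixed point")


def sai(rule: PSRule) -> Iterator[PSRule]:
    """Toy metarule: S -> NP AUX VP  =>  SAI -> AUX NP VP."""
    if rule == ("S", ("NP", "AUX", "VP")):
        yield ("SAI", ("AUX", "NP", "VP"))


grammar = close_under_metarules([("S", ("NP", "AUX", "VP"))], [sai])
```

After the call, `grammar` contains the original S rule and the derived SAI rule; the second cycle produces nothing new, so the loop stops. Since derived rules are fed back to the metarules on later cycles, a cycle limit of this kind is one simple way to catch metarule sets that never converge.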

We were especially concerned:

- To check the perspicuity of metarules for describing significant fragments of English using the above representation for grammar rules.

- To check that a reasonably small number of new grammar rules were generated by the metarules for these fragments.

Both of these considerations are critical for the performance


the metarules worked so well that they exposed gaps in a

over a five-year period and was thought to be reasonably

derived rules generated was encouragingly small:

Subcategorization:

1 grammar rule

Clauses:

8 grammar rules

9 Conclusions

generalizations in the grammar. A great deal of care must be exercised in writing metarules, because it is easy to state

metarules can be used again as input to the metarules, and this

generalizations will also be a difficult task.

The success of the metarule formulation in deriving a small

definitional power of APSGs over ordinary PSGs. For example, number agreement and feature inheritance can be expressed simply by appropriate annotations in an APSG, but require

means that fewer metarules are needed, and hence fewer derived rules are generated.

