COMPOSE-REDUCE PARSING

Henry S. Thompson (1)
Mike Dixon (2)
John Lamping (2)

1: Human Communication Research Centre
   University of Edinburgh
   2 Buccleuch Place
   Edinburgh EH8 9LW
   SCOTLAND

2: Xerox Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, CA 94304
ABSTRACT

Two new parsing algorithms for context-free phrase structure grammars are presented which perform a bounded amount of processing per word per analysis path, independently of sentence length. They are thus capable of parsing in real-time in a parallel implementation which forks processors in response to non-deterministic choice points.
0 INTRODUCTION

The work reported here grew out of our attempt to improve on the O(n^2) performance of the SIMD parallel parser described in (Thompson 1991). Rather than start with a commitment to a specific SIMD architecture, as that work had, we agreed that the best place to start was with a more abstract, architecture-independent consideration of the CF-PSG parsing problem: given arbitrary resources, what algorithms could one envisage which could recognise and/or parse atomic category phrase-structure grammars in O(n)? In the end, two quite different approaches emerged. One took as its starting point non-deterministic shift-reduce parsing, and sought to achieve linear (indeed real-time) complexity by performing a constant-time step per word of the input. The other took as its starting point tabular parsing (Earley, CKY), and sought to achieve linear complexity by performing a constant-time step for the identification/construction of constituents of each length from 0 to n. The latter route has been widely canvassed, although to our knowledge it has not yet been implemented -- see (Nijholt 1989, 90) for extensive references. The former route, whereby real-time parsing is achieved by processor forking at non-deterministic choice points in an extended shift-reduce parser, is to our knowledge new. In this paper we present outlines of two such parsers, which we call compose-reduce parsers.
I COMPOSE-REDUCE PARSING

Why couldn't a simple breadth-first chart parser achieve linear performance on an appropriate parallel system? If you provided enough processors to immediately process all agenda entries as they were created, would not this give the desired result? No, because the processing of a single word might require many serialised steps. Consider processing the word "park" in the sentence "The people who ran in the park got wet." Given a simple traditional sort of grammar, that word completes an NP, which in turn completes a PP, which in turn completes a VP, which in turn completes an S, which in turn completes a REL, which in turn completes an NP. The construction/recognition of these constituents is necessarily serialised, so regardless of the number of processors available a constant-time step is impossible. (Note that this only precludes a real-time parse by this route, but not necessarily a linear one.) In the shift-reduce approach to parsing, all this means is that for non-linear grammars, a single shift step may be followed by many reduce steps. This in turn suggested the beginnings of a way out, based on categorial grammar, namely that multiple reduces can be avoided if composition is allowed. To return to our example above, in a simple shift-reduce parser we would have had all the words preceding the word "park" in the stack. When it was shifted in, there would follow six reduce steps. If alternatively, following a shift step, one was allowed (non-deterministically) a compose step, this could be reduced (!) to a single reduce step. Restricting ourselves to a simpler example, consider just "run in the park" as a VP, given rules

    VP → v PP
    NP → d n
    PP → p NP
With a composition step allowed, the parse would then proceed as follows:

    Shift run as a v
    Shift in as a p
    Compose v and p to give [VP v [PP p • NP]]

where I use a combination of bracketed strings and the 'dotted rule' notation to indicate the result of composition. The categorial equivalent would have been to notate v as VP/PP, p as PP/NP, and the result of the composition as therefore VP/NP.

    Shift the as d
    Compose the dotted VP with d to give [VP v [PP p [NP d • n]]]
    Shift park as n
    Reduce the dotted VP with n to give the complete result
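To make the dotted-structure notation concrete, the following sketch (ours, not from the paper; the representation and helper names are illustrative assumptions) shows one way the trace above could be written down in Python, with a single attach operation standing in for both reduce (plugging in a complete category) and compose (plugging in a further partial structure):

    # A small sketch (ours, not the authors' code) of the dotted structures in
    # the trace above.  A partial constituent is a dict with a category, the
    # daughters found so far, and the categories still awaited; the "dot" sits
    # in the most deeply embedded incomplete daughter.

    def partial(cat, found, awaited):
        return {'cat': cat, 'found': list(found), 'awaited': list(awaited)}

    def complete(t):
        if not isinstance(t, dict):
            return True                     # words (pre-terminal, form) are complete
        return not t['awaited'] and all(complete(d) for d in t['found'])

    def open_node(t):
        """Follow the last daughter down to the node whose dot is filled next."""
        last = t['found'][-1] if t['found'] else None
        if isinstance(last, dict) and not complete(last):
            return open_node(last)
        return t

    def attach(t, item):
        """Fill the leftmost awaited slot: 'reduce' when item is complete,
        'compose' when item is itself a partial constituent."""
        node = open_node(t)
        cat = item['cat'] if isinstance(item, dict) else item[0]
        assert node['awaited'] and node['awaited'][0] == cat
        node['awaited'].pop(0)
        node['found'].append(item)
        return t

    # The trace: raise "run", compose in the raised "in" and "the", reduce "park".
    vp = partial('VP', [('v', 'run')], ['PP'])        # [VP v . PP]
    attach(vp, partial('PP', [('p', 'in')], ['NP']))  # [VP v [PP p . NP]]
    attach(vp, partial('NP', [('d', 'the')], ['n']))  # [VP v [PP p [NP d . n]]]
    attach(vp, ('n', 'park'))                         # complete VP
    assert complete(vp)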
Although a number of details remained to be worked out, this simple move of allowing composition was the enabling step to achieving O(n) parsing. Parallelism would arise by forking processors at each non-deterministic choice point, following the general model of Dixon's earlier work on parallelising the ATMS (Dixon & de Kleer 1988).

Simply allowing composition is not in itself sufficient to achieve O(n) performance. Some means of guaranteeing that each step is constant time must still be provided. Here we found two different ways forward.
II THE FIRST COMPOSE-REDUCE PARSER - CR-I

In this parser there is no stack. We have simply a current structure, which corresponds to the top node of the stack in a normal shift-reduce parser. This is achieved by extending the appeal to composition to include a form of left-embedded raising, which will be discussed further below. Special attention is also required to handle left-recursive rules.
II.1 The Basic Parsing Algorithm

The constant-time parsing step is given below (slightly simplified, in that empty productions and some unit productions are not handled). In this algorithm schema, and in subsequent discussion, the annotation "ND" will be used in situations where a number of alternatives are (or may be) described. The meaning is that these alternatives are to be pursued non-deterministically.
Algorithm CR-I

    1  Shift the next word;
    2  ND look it up in the lexicon;
    3  ND close the resulting category wrt the unit productions;
    4a ND reduce the resulting category with the current structure
       or
    4b ND raise* the resulting category wrt the non-unary rules in the
       grammar for which it is a left corner, and compose the result
       with the current structure.

    If reduction ever completes a category which is marked as the left
    corner of one or more left-recursive rules or rule sequences, ND
    raise* in place wrt those rules (sequences), and propagate the
    marking.
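A minimal sketch of how one non-deterministic branch of this step might be driven, assuming pre-computed tables and passing the structure-building operations in as parameters (all names below are our own, hypothetical ones, not the authors' implementation):

    # Sketch (ours) of one ND branch of CR-I's parsing step.  `lexicon` and
    # `unit_closure` are the pre-computed tables for steps 2 and 3;
    # `raise_table` is the Raise* table of dotted structures discussed in
    # section II.3; `reduce_with` and `compose_with` stand for the operations
    # on the current structure (returning None when they do not apply); `fork`
    # pursues ND alternatives -- forked processors in the parallel model, a
    # worklist in a serial simulation.

    def cr1_step(word, current, lexicon, unit_closure, raise_table,
                 reduce_with, compose_with, fork):
        for preterm in lexicon[word]:                    # step 2: ND lexical lookup
            for cat in unit_closure[preterm]:            # step 3: ND unit-production closure
                reduced = reduce_with(current, cat)      # step 4a: ND reduce with the
                if reduced is not None:                  #          current structure
                    fork(reduced)
                for raised in raise_table.get(cat, ()):  # step 4b: ND raise* the category ...
                    composed = compose_with(current, raised)
                    if composed is not None:             # ... and compose with the
                        fork(composed)                   #     current structure
    # (The left-recursion note above -- ND raise* in place on completing a
    # marked category -- would live inside reduce_with in this sketch.)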
Some of these ND steps may at various points produce complete structures. If the input is exhausted, then those structures are parses, or not, depending on whether or not they have reached the distinguished symbol. If the input is not exhausted, it is of course the incomplete structures, the results of composition or raising, which are carried forward to the next step.
The operation referred to above as "raise*" is more than simple raising, as was involved in the simple example in section I. In order to allow for all possible compositions to take place, all possible left-embedded raising must be pursued. Consider the following grammar fragment:

    S → NP VP
    VP → v NP CMP
    CMP → that S
    NP → propn
    NP → d n

and the utterance "Kim told Robin that the child likes Kim". If we ignore all the ND incorrect paths, the current structure after "that" has been processed is

    [S [NP [propn Kim]]
       [VP [v told]
           [NP [propn Robin]]
           [CMP that • S]]]

In order for the next word, "the", to be correctly processed, it must be raised all the way to S, namely we must have

    [S [NP [d the] • n] VP]

to compose with the current structure. What this means is that for every entry in the normal bottom-up reachability table pairing a left corner with a top category, we need a set of dotted structures, corresponding to all the ways the grammar can get from that left corner to that top category. It is these structures which are ND made available in step 4b of the parsing step algorithm CR-I above.
II.2 Handling Left Recursion

Now this in itself is not sufficient to handle left-recursive structures, since by definition there could be an arbitrary number of left-embeddings of a left-recursive structure. The final note in the description of algorithm CR-I above is designed to handle this. Glossing over some subtleties, left-recursion is handled by marking some of the structures introduced in step 4b, and ND raising in place if the marked structure is ever completed by reduction in the course of a parse. Consider the sentence "Robin likes the child's dog." We add the following two rules to the grammar:

    D → art
    D → NP 's

thereby transforming D from a pre-terminal to a non-terminal. When we shift "the", we will raise to, inter alia,
    [NP [D [art the]] • n]r

with the NP marked for potential re-raising. This structure will be composed with the then current structure to produce

    [S [NP [propn Robin]]
       [VP [v likes]
           [NP (as above)]r]]

After reduction with "child", we will have

    [S [NP [propn Robin]]
       [VP [v likes]
           [NP [D [art the]] [n child]]r]]

The last reduction will have completed the marked NP introduced above, so we ND left-recursively raise in place, giving

    [S [NP [propn Robin]]
       [VP [v likes]
           [NP [D [NP the child] • 's] n]r]]

which will then take us through the rest of the sentence.
One final detail needs to be cleared up. Although directly left-recursive rules, such as NP → NP PP, are correctly dealt with by the above mechanism, indirectly left-recursive sets of rules, such as the one exemplified above, require one additional subtlety. Care must be taken not to introduce the potential for spurious ambiguity. We will introduce the full details in the next section.
II.3 Nature of the required tables

Steps 3 and 4b of CR-I require tables of partial structures: closures of unit productions up from pre-terminals, for step 3; left-reachable raisings up from (unit production closures of) pre-terminals, for step 4b. In this section we discuss the creation of the necessary tables, in particular Raise*, against the background of a simple exemplary grammar, given below as Table 1.

We have grouped the rules according to type: two kinds of unit productions (from pre-terminals or non-terminals), two kinds of left-recursive rules (direct and indirect), and the remainder.

    unit (pre-terminal):        NP → propn     D → art
    unit (non-terminal):        NP → CMP
    left-recursive (direct):    NP → NP PP     VP → VP PP
    left-recursive (indirect):  D → NP 's
    vanilla:                    S → NP VP      VP → v NP      CMP → cmp S
                                PP → prep NP   NP → D n

    Table 1  Exemplary grammar in groups by rule type
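The tables of partial structures themselves (Tables 2-4 below) rest on two simpler pre-computations: the unit-production closure used in step 3 and the left-corner reachability pairs which index Raise*. The following sketch (our own illustrative code, not from the paper) shows how these might be computed from the Table 1 grammar:

    # Sketch (ours) over the Table 1 grammar: Cl* as an upward unit-production
    # closure, and the bottom-up left-corner reachability pairs on which the
    # Raise* table of dotted structures is built.

    RULES = [
        ('S', ['NP', 'VP']), ('VP', ['v', 'NP']), ('CMP', ['cmp', 'S']),
        ('PP', ['prep', 'NP']), ('NP', ['D', 'n']),
        ('NP', ['propn']), ('NP', ['CMP']), ('D', ['art']),
        ('NP', ['NP', 'PP']), ('VP', ['VP', 'PP']), ('D', ['NP', "'s"]),
    ]

    def unit_closure(rules):
        """Cl*: for each category, the categories reachable upward through unit
        productions (e.g. propn -> {propn, NP}, art -> {art, D})."""
        units = {(rhs[0], lhs) for lhs, rhs in rules if len(rhs) == 1}
        cats = {c for lhs, rhs in rules for c in [lhs] + list(rhs)}
        up = {c: {c} for c in cats}
        changed = True
        while changed:
            changed = False
            for child, parent in units:
                for reach in up.values():
                    if child in reach and parent not in reach:
                        reach.add(parent)
                        changed = True
        return up

    def left_corner_pairs(rules):
        """Transitive closure of 'is a left corner of': the (left corner, top
        category) pairs for which Raise* must supply dotted structures."""
        direct = {(rhs[0], lhs) for lhs, rhs in rules}
        pairs = set(direct)
        while True:
            new = {(a, d) for (a, b) in pairs for (c, d) in direct if b == c} - pairs
            if not new:
                return pairs
            pairs |= new

    # e.g. ('art', 'S') is reachable: art -> D -> NP -> S via
    # D -> art, NP -> D n, S -> NP VP.
    assert ('art', 'S') in left_corner_pairs(RULES)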
    Cl*:       [NP propn]^1,2    [D art]^4    [NP CMP]^1,2

    LRdir:     1: [NP NP PP]     3: [VP VP PP]

    LRindir2:  2: [NP [D NP 's] n]     4: [D [NP D n]^1 's]

    Rs*:       [CMP cmp S], [PP prep NP], [VP v NP]^3, [NP D n]^1,2,
               [D NP^1 's]^4, [NP [CMP cmp S]]^1,2,
               [D [NP [CMP cmp S]]^1,2 's], [S [NP [CMP cmp S]]^1,2 VP],
               [S [NP D n]^1,2 VP], [S NP^1,2 VP]

    Table 2  Partial structures for CR-I
    Ras*:  [NP _[NP propn]_ • PP]^1,2,  [NP [D _[NP propn]_ • 's] n]^1,2,
           [D [NP _[D art]_ • n]^1 's]^4,
           [CMP cmp • S],  [NP [CMP cmp • S]]^1,2,
           [D [NP [CMP cmp • S]]^1,2 's],  [S [NP [CMP cmp • S]]^1,2 VP],
           [PP prep • NP],
           [VP v • NP]^3,
           [NP _[D art]_ • n]^1,2,  [S [NP _[D art]_ • n]^1,2 VP],
           [D _[NP propn]_^1 • 's]^4,  [S _[NP propn]_^1,2 • VP]

    Table 3  Projecting non-terminal left daughters
As a first step towards computing the table which step 4b above would use, we can pre-compute the partial structures given above in Table 2. Cl* contains all backbone fragments constructable from the unit productions, and is already essentially what we require for step 3 of the algorithm. LRdir contains all directly left-recursive structures. LRindir2 contains all indirectly left-recursive structures involving exactly two rules, and there might be LRindir3, 4, ... as well. Rs* contains all non-recursive tree fragments constructable from left-embedding of binary or greater rules and non-terminal unit productions. The superscripts denote loci where left-recursion may be appropriate, and identify the relevant structures.

In order to get the full Raise* table needed for step 4b, first we need to project the non-terminal left daughters of rules such as [S NP^1,2 VP] down to terminal left daughters. We achieve this by substituting terminal entries from Cl* wherever we can in LRdir, LRindir2 and Rs*, to give us Table 3 from Table 2 (new embeddings are marked with underscores).

Left recursion has one remaining problem for us. Algorithm CR-I only checks for annotations and ND raises in place after a reduction completes a constituent. But in the last line of Ras* above there are unit constituents such as [NP propn]^1,2 with annotations.
Being already complete, they will not ever be completed, and consequently the annotations will never be checked. So we pre-compute the desired result, augmenting the above list with expansions of those units via the indicated left recursions. This gives us the final version of Raise*, now shown with dots included, in Table 4.

    Raise*:  [NP [NP propn] • PP]^1,2,  [NP [D [NP propn] • 's] n]^1,2,
             [D [NP [D art] • n]^1 's]^4,
             [CMP cmp • S],  [NP [CMP cmp • S]]^1,2,
             [D [NP [CMP cmp • S]]^1,2 's],  [S [NP [CMP cmp • S]]^1,2 VP],
             [PP prep • NP],
             [VP v • NP]^3,
             [NP [D art] • n]^1,2,  [S [NP [D art] • n]^1,2 VP],
             [D [NP propn] • 's]^4,  [D [NP [NP propn] • PP]^1 's]^4,
             [S [NP propn] • VP],  [S [NP [NP propn] • PP]^1,2 VP],
             [S [NP [D [NP propn] • 's] n]^1,2 VP]

    Table 4  Final form of the structure table Raise*
This table is now suited to its role in the algorithm. Every entry has a lexical left daughter, all annotated constituents are incomplete, and all unit productions are factored in. It is interesting to note that with these tree fragments, taken together with the terminal entries in Cl*, as the initial trees, and LRdir, LRindir2, etc. as the auxiliary trees, we have a Tree Adjoining Grammar (Joshi 1985) which is strongly equivalent to the CF-PSG we started with. We might call it the left-lexical TAG for that CF-PSG, after Schabes et al. (1988). Note further that if a TAG parser respected the annotations as restricting adjunction, no spuriously ambiguous parses would be produced.

Indeed it was via this relationship with TAGs that the details of how the annotations are distributed were worked out; they are not presented here to conserve space.
II.4 Implementation and Efficiency

Only a serial pseudo-parallel implementation has been written. Because of the high degree of pre-computation of structure, this version, even though serialised, runs quite efficiently. There is very little computation at each step, as it is straightforward to double index the Raise* table so that only structures which will compose with the current structure are retrieved.
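One way such double indexing might look (our own sketch; the paper does not spell out the data structure): key the pre-computed dotted structures both by their lexical left corner and by the category their root satisfies, so a single lookup returns only candidates that can compose with the current structure.

    # Sketch (ours) of a double-indexed Raise* table: entries are keyed by
    # (left-corner pre-terminal, root category), so step 4b retrieves only
    # structures whose root is the category the current structure is awaiting.

    from collections import defaultdict

    def index_raise_table(entries):
        """`entries` are (left_corner, root_category, dotted_structure) triples."""
        table = defaultdict(list)
        for left_corner, root, structure in entries:
            table[(left_corner, root)].append(structure)
        return table

    def candidates(table, preterminal, awaited):
        """Only structures that both start with the new word's pre-terminal and
        supply the category the current structure is waiting for."""
        return table.get((preterminal, awaited), [])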
The price one pays for this efficiency, whether in serial or parallel versions, is that only left-common structure is shared. Right-common structure, as for instance in PP attachment ambiguity, is not shared between analysis paths. This causes no difficulties for the parallel approach in one sense, in that it does not compromise the real-time performance of the parser. Indeed, it is precisely because no recombination is attempted that the basic parsing step is constant time. But it does mean that if the CF-PSG being parsed is the first half of a two-step process, in which additional constraints are solved in the second pass, then the duplication of structure will give rise to duplication of effort. Any parallel parser which adopts the strategy of forking at non-deterministic choice points will suffer from this weakness, including CR-II below.
III THE SECOND COMPOSE-REDUCE PARSER - CR-II

Our second approach to compose-reduce parsing differs from the first in retaining a stack and having a more complex basic parsing step, while requiring far less pre-processing of the grammar. In particular, no special treatment is required for left-recursive rules. Nevertheless, the basic step is still constant time, and despite the stack there is no potential processing 'balloon' at the end of the input.
III.1 The Basic Parsing Algorithm

Algorithm CR-II

    1  Shift the next word;
    2  ND look it up in the lexicon;
    3  ND close the resulting category wrt the unit productions;
    4  ND reduce the resulting category with the top of the stack -- if
       results are complete and there is input remaining, pop the stack;
    5a ND raise the results of (2), (3) and, where complete, (4)
       and
    5b ND either push the result onto the stack
       or
    5c ND compose the result with the top of the stack, replacing it.
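The sketch below (ours, with the same higher-order conventions and hypothetical names as the CR-I sketch in section II.1) shows one ND branch of this step for a non-final word:

    # Sketch (ours, not the authors' code) of one ND branch of CR-II's step.
    # `reduce_with` performs step 4 against the stack top; `raisings` enumerates
    # the step-5a raisings of a constituent; `compose_with` composes a raised
    # structure into an awaiting one; `is_complete` tests completeness; `fork`
    # pursues ND alternatives.  Stacks are tuples, so each forked configuration
    # carries its own copy.

    def cr2_step(word, stack, lexicon, unit_closure, reduce_with, raisings,
                 compose_with, is_complete, fork):
        for preterm in lexicon[word]:                     # steps 1-2: shift, ND lexical lookup
            for cat in unit_closure[preterm]:             # step 3: ND unit-production closure
                to_raise = [(cat, stack)]                 # results of (2) and (3)
                if stack:
                    result = reduce_with(stack[-1], cat)  # step 4: ND reduce with the stack top
                    if result is not None:
                        if is_complete(result):
                            to_raise.append((result, stack[:-1]))  # complete: pop, then raise
                        else:
                            fork(stack[:-1] + (result,))  # interior daughter of a non-binary
                                                          # rule: the top is simply updated
                for item, rest in to_raise:
                    for raised in raisings(item):         # step 5a: ND raise
                        fork(rest + (raised,))            # step 5b: ND push onto the stack
                        if rest:
                            composed = compose_with(rest[-1], raised)
                            if composed is not None:      # step 5c: ND compose with the
                                fork(rest[:-1] + (composed,))  # stack top, replacing it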
This is not an easy algorithm to understand. In the next section we present a number of different ways of motivating it, together with an illustrative example.
III.2 CR-II Explained

Let us first consider how CR-II will operate on purely left-branching and purely right-branching structures. In each case we will consider the sequence of algorithm steps along the non-deterministically correct path, ignoring the others. We will also restrict ourselves to considering binary branching rules, as pre-terminal unit productions are handled entirely by step 3 of the algorithm, and non-terminal unit productions must be factored into the grammar. On the other hand, interior daughters of non-binary nodes are all handled by step 4 without changing the depth of the stack.

III.2.1 Left-branching analysis

For a purely left-branching structure, the first word will be processed by steps 1, 2, 5a and 5b, producing a stack with one entry which we can schematise as in Figure 1, where filled circles are processed nodes and unfilled ones are waiting.
Figure 1
All subsequent words except the last will be processed by steps 4, 5a and 5b (here and subsequently we will not mention steps 1 and 2, which occur for all words), effectively replacing the previous sole entry in the stack with the one given in Figure 2.
Figure 2
It should be evident that the cycle of steps 4, 5a and 5b constructs a left-branching structure of increasing depth as the sole stack entry, with one right daughter, of the top node, waiting to be filled. The last input word of course is simply processed by step 4 and, as there is no further input, left on the stack as the final result. The complete sequence of steps for any left-branching analysis is thus raise -- reduce&raise* -- reduce. An ordinary shift-reduce or left-corner parser would go through the same sequence of steps.
III.2.2 Right-branching analysis

The first word of a purely right-branching structure is analysed exactly as for a left-branching one, that is, with 5a and 5b, with results as in Figure 1 (repeated here as Figure 3):

Figure 3
Subsequent words, except the last, are processed via steps 5a and 5c, with the result remaining as the sole stack entry, as in Figure 4.
Figure 4
Again it should be evident that cycling steps 5a and 5c will construct a right-branching structure of increasing depth as the sole stack entry, with one right daughter, of the most embedded node, waiting to be filled. Again, the last input word will be processed by step 4. The complete sequence of steps for any right-branching analysis is thus raise -- raise&compose* -- reduce. A categorial grammar parser with a compose-first strategy would go through an isomorphic sequence of steps.
III.2.3 Mixed Left- and Right-branching Analysis

All the steps in algorithm CR-II have now been illustrated, but we have yet to see the stack grow beyond one entry. This will occur where an individual word, as opposed to a completed complex constituent, is processed by steps 5a and 5b, that is, where steps 5a and 5b apply other than to the results of step 4.

Consider for instance the sentence "the child believes that the dog likes biscuits". With a grammar which I trust will be obvious, we would arrive at the structure shown in Figure 5 after processing "the child believes that", having done raise -- reduce&raise -- raise&compose -- raise&compose, that is, a bit of left-branching analysis, followed by a bit of right-branching analysis.
Figure 5
with "the" which will allow immediate
integration with this The ND correct
p a t h a p p l i e s s t e p s 5a a n d 5b,
raise&push, giving a stack as shown
in Figure 6:
Figure 6
We can then apply steps 4, 5a and 5c, reduce&raise&compose, to "dog", with the result shown in Figure 7. This puts us back on the standard right-branching path for the rest of the sentence.
Figure 7
III.3 An Alternative View of CR-II

Returning to a question raised earlier, we can now see how a chart parser could be modified in order to run in real-time, given enough processors to empty the agenda as fast as it is filled. We can reproduce the processing of CR-II within the active chart parsing framework by two modifications to the fundamental rule (see e.g. Gazdar and Mellish 1989 or Thompson and Ritchie 1984 for a tutorial introduction to active chart parsing). First we restrict its normal operation, in which an active and an inactive edge are combined, to apply only in the case of pre-terminal inactive edges. This corresponds to the fact that in CR-II, step 4 (the reduction step) applies only to pre-terminal categories (continuing to ignore unit productions). Secondly we allow the fundamental rule to combine two active edges, provided the category to be produced by one is what is required by the other. This effects composition. If we now run our chart parser left-to-right, left-corner and breadth-first, it will duplicate CR-II. The maximum number of edges along a given analysis path which can be introduced by the processing of a single word is now at most four, corresponding to steps 2, 4, 5a and 5c of CR-II: the pre-terminal itself, a constituent completed by it, an active edge containing that constituent as left daughter, created by left-corner rule invocation, and a further active edge combining that one with one to its left. This in turn means that there is a fixed limit to the amount of processing required for each word.
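The two modified combination conditions can be sketched as follows (our own code; the edge representation is an assumption, and edge spans and the agenda are omitted):

    # Sketch (ours) of the modified fundamental rule.  An edge is (category,
    # needed), where `needed` is the tuple of categories still required; an
    # inactive edge has needed == ().

    def combine(active, other, preterminals):
        """Combine an active edge with the edge to its right, or return None."""
        cat, needed = active
        rcat, rneeded = other
        if not needed or needed[0] != rcat:
            return None
        if rneeded == ():
            # Restriction: active + inactive combination is allowed only when
            # the inactive edge is a pre-terminal (CR-II's step 4 reduces only
            # with pre-terminal categories).
            return (cat, needed[1:]) if rcat in preterminals else None
        # Extension: two active edges combine when the category one will
        # produce is what the other requires -- this effects composition.
        return (cat, rneeded + needed[1:])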
III.4 Implementation and Efficiency

Although clearly not benefiting from as much pre-computation of structure as CR-I, CR-II is also quite efficient. Two modifications can be added to improve efficiency: a reachability filter on step 5b, and a shaper test (Kuno 1965), also on 5b. For the latter, we need simply keep a count of the number of open nodes on the stack (equal to the number of stack entries if all rules are binary), and ensure that this number never exceeds the number of words remaining in the input, as each entry will require a number of words equal to the number of its open nodes to pop it off the stack. This test actually cuts down the number of non-deterministic paths quite dramatically, as the ND optionality of step 5b means that quite deep stacks would otherwise be pursued along some search paths. Again this reduction in search space is of limited significance in a true parallel implementation, but in the serial simulation it makes a big difference.
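The shaper test itself is cheap to state (our sketch; in practice the open-node count would be maintained incrementally with each stack operation):

    # Sketch (ours) of the shaper test: abandon a configuration whose open
    # (awaited) nodes could not all be filled by the words left in the input.

    def shaper_ok(open_counts, words_remaining):
        """`open_counts` gives the number of open nodes per stack entry (1 per
        entry when all rules are binary); each open node needs at least one of
        the remaining words."""
        return sum(open_counts) <= words_remaining

    # A stack of three entries with one open node each needs three more words:
    assert shaper_ok([1, 1, 1], 3)
    assert not shaper_ok([1, 1, 1], 2)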
Note also that no attention has been paid to unit productions, which we pre-compute as in CR-I. Furthermore, neither CR-I nor CR-II addresses empty productions, whose effect would also need to be pre-computed.
IV CONCLUSIONS

Aside from the intrinsic interest in the abstract of real-time parsability, is there any practical significance to these results? Two drawbacks, one already referred to, certainly restrict their significance. One is that the restriction to atomic category CF-PSGs is crucial: the fact that the comparison between a rule element and a node label is atomic and constant time is fundamental. Any move to features or other annotations would put an end to real-time processing. This fact gives added weight to the problem mentioned above in section II.4, that only left-common analysis results are shared between alternatives. Thus if one finesses the atomic category problem by using a parser such as those described here only as the first pass of a two-pass system, one is only putting off the payment of the complexity price to the second pass, in the absence to date of any linear-time solution to the constraint satisfaction problem. On this basis, one would clearly prefer a parallel CKY/Earley algorithm, which does share all common substructure, to the parsers presented here.

Nevertheless, there is one class of applications where the left-to-right real-time behaviour of these algorithms may be of practical benefit, namely in speech recognition. Present day systems require on-line availability of syntactic and domain-semantic constraints to limit the search space at lower levels of the system. Hitherto this has meant these constraints must be brought to bear during recognition as some form of regular grammar, either explicitly