LAZY UNIFICATION

Kurt Godden
Computer Science Department
General Motors Research Laboratories
Warren, MI 48090-9055, USA
CSNet: godden@gmr.com
ABSTRACT

Unification-based NL parsers that copy argument graphs to prevent their destruction suffer from inefficiency. Copying is the most expensive operation in such parsers, and several methods to reduce copying have been devised with varying degrees of success. Lazy Unification is presented here as a new, conceptually elegant solution that reduces copying by nearly an order of magnitude. Lazy Unification requires no new slots in the structure of nodes, and only nominal revisions to the unification algorithm.
PROBLEM STATEMENT
Unification is widely used in natural language processing (NLP) as the primary operation during parsing. The data structures unified are directed acyclic graphs (DAG's), used to encode grammar rules, lexical entries and intermediate parsing structures. A crucial point concerning unification is that the resulting DAG is constructed directly from the raw material of its input DAG's, i.e., unification is a destructive operation. This is especially important when the input DAG's are rules of the grammar or lexical items. If nothing were done to prevent their destruction during unification, then the grammar would no longer have a correct rule, nor the lexicon a valid lexical entry for the DAG's in question. They would have been transformed into the unified DAG as a side effect.
The simplest way to avoid destroying grammar rules and lexical entries by unification is to copy each argument DAG prior to calling the unification routine. This is sufficient to avoid the problem of destruction, but the copying itself then becomes problematic, causing severe degradation in performance. This performance drain is illustrated in Figure 1, where average parsing statistics are given for the original implementation of graph unification in the TASLINK natural language system. TASLINK was built upon the LINK parser in a joint project between GM Research and the University of Michigan. LINK is a descendent of the MOPTRANS system developed by Lytinen (1986). The statistics below are for ten sentences parsed by TASLINK. As can be seen, copying consumes more computation time than unification.

[Figure 1. Relative Cost of Operations during Parsing (pie chart; legend: Unification, Copying, Other).]
PAST SOLUTIONS

Improving the efficiency of unification has been an active area of research in unification-based NLP, where the focus has been on reducing the amount of DAG copying, and several approaches have arisen. Different versions of structure sharing were employed by Pereira (1985) as well as Karttunen and Kay (1985). In Karttunen (1986) structure sharing was abandoned for a technique allowing reversible unification. Wroblewski (1987) presents what he calls a non-destructive unification algorithm that avoids destruction by incrementally copying the DAG nodes as necessary.
All of these approaches to the copying problem suffer from difficulties of their own. For both Pereira and Wroblewski there are special cases involving convergent arcs (arcs from two or more nodes that point to the same destination node) that still require full copying. In Karttunen and Kay's version of structure sharing, all DAG's are represented as binary branching DAG's, even though grammar rules are more naturally represented as non-binary structures. Reversible unification requires two passes over the input DAG's, one to unify them and another to copy the result. Furthermore, in both successful and unsuccessful unification the input DAG's must be restored to their original forms because reversible unification allows them to be destructively modified.
Wroblewski points out a useful distinction between early copying and over copying. Early copying refers to the copying of input DAG's before unification is applied. This can lead to inefficiency when unification fails because only the copying up to the point of failure is necessary. Over copying refers to the fact that when the two input DAG's are copied they are copied in their entirety. Since the resultant unified DAG generally has fewer total nodes than the two input DAG's, more nodes than necessary were copied to produce the result. Wroblewski's algorithm eliminates early copying entirely, but as noted above it can partially over copy on DAG's involving convergent arcs. Reversible unification may also over copy, as will be shown below.
LAZY UNIFICATION

I now present Lazy Unification (LU) as a new approach to the copying problem. In the following section I will present statistics which indicate that LU accomplishes nearly an order of magnitude reduction in copying compared to non-lazy, or eager unification (EU). These results are attained by turning DAG's into active data structures to implement the lazy evaluation of copying. Lazy evaluation is an optimization technique developed for the interpretation of functional programming languages (Field and Harrison, 1988), and has been extended to theorem proving and logic programming in attempts to integrate that paradigm with functional programming (Reddy, 1986).

The concept underlying lazy evaluation is simple: delay the operation being optimized until the value it produces is needed by the calling program, at which point the delayed operation is forced. These actions may be implemented by high-level procedures called delay and force. Delay is used in place of the original call to the procedure being optimized, and force is inserted into the program at each location where the results of the delayed procedure are needed.
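As a concrete illustration of this general pattern (not code from the paper), the following Common Lisp sketch shows delay and force built from closures; the names and the memoization of the forced value are conventional details assumed here.

;; A minimal delay/force sketch using closures.  Names and the
;; caching detail are illustrative assumptions, not the paper's code.
(defun delay-thunk (thunk)
  "Return a closure that runs THUNK only when forced, caching the result."
  (let ((forced nil)
        (value nil))
    (lambda ()
      (unless forced
        (setf value (funcall thunk)
              forced t))
      value)))

(defun force-thunk (delayed)
  "Invoke a delayed computation, obtaining its value."
  (funcall delayed))

;; Usage: the expensive call is suspended until its value is demanded.
;; (defparameter *copy* (delay-thunk (lambda () (expensive-copy *some-dag*))))
;; (force-thunk *copy*)   ; the copy happens here, on first use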
Lazy evaluation is a good technique for the copying problem in graph unification precisely because the overwhelming majority of copying is unnecessary. If all copying can be delayed until a destructive change is about to occur to a DAG, then both early copying and over copying can be completely eliminated.
The delay operation is easily implemented by using closures. A closure is a compound object that is both procedure and data. In the context of LU, the data portion of a closure is a DAG node. The procedural code within a closure is a function that processes a variety of messages sent to the closure. One may generally think of the encapsulated procedure as being a suspended call to the copy function. Let us refer to these closures as active nodes, as contrasted with a simple node not combined with a procedure in a closure. The delay function returns an active node when given a simple node as its argument. For now let us assume that delay behaves as the identity function when applied to an active node. That is, it returns an active node unchanged. As a mnemonic we will refer to the delay function as delay-copy-the-dag.

We now redefine DAG's to allow either simple or active nodes wherever simple nodes were previously allowed in a DAG. An active node will be notated in subsequent diagrams by enclosing the node in angle brackets.
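The sketch below shows one way delay-copy-the-dag might look under these assumptions: simple nodes as a named structure, active nodes as closures that answer a few messages. The slot layout, the message names, and the reference to copy-the-dag (sketched further below) are assumptions, since the paper does not list its code.

;; Hypothetical representation: simple nodes are structures, active nodes
;; are closures.  Slot and message names are assumptions.
(defstruct dag-node
  label                     ; the node's label
  arcs)                     ; alist of (arc-label . child-node)

(defun active-node-p (node)
  "Active nodes are closures; simple nodes are DAG-NODE structures."
  (functionp node))

(defun delay-copy-the-dag (node)
  "Wrap a simple NODE in an active node; act as the identity on a node
that is already active."
  (if (active-node-p node)
      node
      (lambda (message)
        (ecase message
          (:node  node)                          ; the encapsulated simple node
          (:label (dag-node-label node))         ; used when comparing nodes
          (:copy  (copy-the-dag node :delay-arcs))))))  ; the suspended copy (sketched below)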
In LU the unification algorithm proceeds largely as it did before, except that at every point in the algorithm where a destructive change is about to be made to an active node, that node is first replaced by a copy of its encapsulated node. This replacement is mediated through the force function, which we shall call force-delayed-copy. In the case of a simple node argument force-delayed-copy acts as the identity function, but when given an active node it invokes the suspended copy procedure with the encapsulated node as argument. Force-delayed-copy returns the DAG that results from this invocation.
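Under the same assumed representation, force-delayed-copy reduces to a short dispatch; this is a sketch, not the paper's code.

;; Sketch of force-delayed-copy: identity on simple nodes, invoke the
;; suspended copy on active nodes (the message name is an assumption).
(defun force-delayed-copy (node)
  (if (functionp node)          ; active node?
      (funcall node :copy)      ; run the delayed call to copy-the-dag
      node))                    ; simple nodes pass through unchanged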
To avoid copying an entire DAG when only its root node is going to be modified by unification, the copying function is also rewritten. The new version of copy-the-dag takes an optional argument to control how much of the DAG is to be copied. The default is to copy the entire argument, as one would expect of a function called copy-the-dag. But when copy-the-dag is called from inside an active node (by force-delayed-copy invoking the procedural portion of the active node), then the optional argument is supplied with a flag that causes copy-the-dag to copy only the root node of its argument. The nodes at the ends of the outgoing arcs from the new root become active nodes, created by delaying the original nodes in those positions. No traversal of the DAG takes place and the deeper nodes are only present implicitly through the active nodes of the resulting DAG. This is illustrated in Figure 2.
[Figure 2. Copy-the-dag on 'a' from inside an active node: node a is copied to a2, whose outgoing arcs point at active nodes <b>, <c>, and <d>.]
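A sketch of how such a copy-the-dag might be written follows. The mode argument, the flag name :delay-arcs, and the omission of structure sharing during a full copy are simplifying assumptions rather than details taken from the paper.

;; Hypothetical copy-the-dag with an optional mode argument.  With
;; :delay-arcs only the root is copied; its children are merely wrapped
;; in active nodes.  The :full branch assumes simple children and ignores
;; convergent arcs, for brevity.
(defun copy-the-dag (dag &optional (mode :full))
  (ecase mode
    (:full
     (make-dag-node
      :label (dag-node-label dag)
      :arcs (loop for (arc-label . child) in (dag-node-arcs dag)
                  collect (cons arc-label (copy-the-dag child :full)))))
    (:delay-arcs
     (make-dag-node
      :label (dag-node-label dag)
      :arcs (loop for (arc-label . child) in (dag-node-arcs dag)
                  collect (cons arc-label (delay-copy-the-dag child)))))))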
Here, DAG a was initially encapsulated in a closure as an active node. When a is about to undergo a destructive change by being unified with some other DAG, force-delayed-copy activates the suspended call to copy-the-dag with DAG a as its first argument and the message delay-arcs as its optional argument. Copy-the-dag then copies only node a, returning a2 with outgoing arcs pointing at active nodes that encapsulate the original destination nodes b, c, and d. DAG a2 may then be unified with another DAG without destroying DAG a, and the unification algorithm proceeds with the active nodes <b>, <c>, and <d>. As these sub-DAG's are modified, their nodes are likewise copied incrementally. Figure 3 illustrates this by showing DAG a2 after unifying <b>. It may be seen that as active nodes are copied one by one, the resulting unified DAG is eventually constructed.
[Figure 3. DAG a2 after unifying <b>.]

One can see how this scheme reduces the amount of copying if, for example, unification fails at the active node <e>. In this case only nodes a and b will have been copied, and none of the nodes c, d, e, f, g, or h. Copying is also reduced when unification succeeds, this reduction being achieved in two ways.
First, lazy unification only creates new nodes for the DAG that results from unification. Generally this DAG has fewer total nodes than the two input DAG's. For example, if the 8-node DAG a in Figure 2 were unified with the 2-node DAG a-->i, then the resulting DAG would have only nine nodes, not ten. The result DAG would have the arc '-->i' copied onto the 8-node DAG's root. Thus, while EU would copy all ten original nodes, only nine are necessary for the result.
Active nodes that remain in a final DAG represent the other savings for successful unification. Whereas EU copies all ten original nodes to create the 9-node result, LU would only create five new nodes during unification, resulting in the DAG of Figure 4. Note that the "missing" nodes e, f, g, and h are implicit in the active nodes and did not require copying. For larger DAG's, this kind of savings in node copying can be significant as several large sub-DAG's may survive uncopied in the final DAG.
[Figure 4. Saving four node copies with active nodes: the result DAG a2 retains active nodes such as <b> and <c> in place of uncopied sub-DAG's.]
A useful comparison with Karttunen's reversible unification may now be made. Recall that when reversible unification is successful the resulting DAG is copied and the originals restored. Notice that this copying of the entire resulting DAG may overcopy some of the sub-DAG's. This is evident because we have just seen in LU that some of the sub-DAG's of a resulting DAG remain uncopied inside active nodes. Thus, LU offers less real copying than reversible unification.

Let us look again at DAG a in Figure 2 and discuss a potential problem with lazy unification as described thus far. Let us suppose that through unification a has been partially copied, resulting in the DAG shown in Figure 5, with active node <f> about to be copied.
[Figure 5. DAG 'a' partially copied.]
Recall from Figure 2 that node f points at e. Following the procedure described above, <f> would be copied to f2, which would then point at active node <e>, which could lead to another node e3 as shown in Figure 6. What is needed is some form of memory to recognize that e was already copied once and that f2 needs to point at e2, not <e>.
[Figure 6. Erroneous splitting of node e into e2 and e3.]
This memory is implemented with a copy environment, which is an association list relating original nodes to their copies. Before f2 is given an arc pointing at <e>, this alist is searched to see if e has already been copied. Since it has, e2 is returned as the destination node for the outgoing arc from f2, thus preserving the topography of the original DAG.
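A sketch of this bookkeeping is shown below. Here the environment is modeled as a one-element list acting as a mutable box around the alist, so that every closure sharing it sees new pairs immediately; in the paper the environment is instead captured lexically inside the active nodes, and the helper names are assumptions.

;; Hypothetical copy-environment helpers.  COPY-ENV is a one-element list
;; whose CAR holds an alist of (original-node . copied-node) pairs.
(defun make-copy-environment ()
  (list nil))

(defun lookup-copy (node copy-env)
  "Return NODE's copy if it has been copied already, else NIL."
  (cdr (assoc node (car copy-env) :test #'eq)))

(defun record-copy (node copy copy-env)
  "Remember that NODE was copied to COPY; every sharer of COPY-ENV sees it."
  (push (cons node copy) (car copy-env))
  copy)

(defun copy-node-once (node copy-env)
  "Copy NODE at most once: reuse e2 rather than splitting e into e2 and e3."
  (or (lookup-copy node copy-env)
      (record-copy node (copy-the-dag node :delay-arcs) copy-env)))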
Because there are several DAG's that must be preserved during the course of parsing, the copy environment cannot be global but must be associated with each DAG for which it records the copying history. This is accomplished by encapsulating a particular DAG's copy environment in each of the active nodes of that DAG. Looking again at Figure 2, the active nodes for DAG a2 are all created in the scope of a variable bound to an initially empty association list for a2's copy environment. Thus, the closures that implement the active nodes <b>, <c>, and <d> all have access to the same copy environment. When <b> invokes the suspended call to copy-the-dag, this function adds the pair (b b2) to the copy environment as a side effect before returning its value b2. When this occurs, <c> and <d> instantly have access to the new pair through their shared access to the same copy environment. Furthermore, when new active nodes are created as traversal of the DAG continues during unification, they are also created in the scope of the same copy environment. Thus, this alist is pushed forward deeper into the nodes of the parent DAG as part of the data portion of each active node.
Returning to Figure 5, the pair (e e2) was added to the copy environment being maintained for DAG a2 when e was copied to e2. Active node <f> was created in the scope of this list and therefore "remembers" at the time f2 is created that it should point to the previously created e2 and not to a new active node <e>.
There is one more mechanism needed to correctly implement copy environments. We have already seen how some active nodes remain after unification. As intermediate DAG's are reused during the nondeterministic parsing and are unified with other DAG's, it can happen that some of these remaining active nodes become descendents of a root different from their original root node. As those new root DAG's are incrementally copied during unification, a situation can arise whereby an active node's parent node is copied and then an attempt is made to create an active node out of an active node.
For example, let us suppose that the DAG shown in Figure 5 is a sub-DAG of some larger DAG. Let us refer to the root of that larger DAG as node n. As unification of n proceeds, we may reach a2 and start incrementally copying it. This could eventually result in c2 being copied to c3, at which point the system will attempt to create an outgoing arc for c3 pointing at a newly created active node over the already active node <f>. There is no need to try to create such a beast as <<f>>. Rather, what is needed is to assure that active node <f> be given access to the new copy environment for n passed down to <f> from its predecessor nodes. This is accomplished by destructively merging the new copy environment with that previously created for a2 and surviving inside <f>. It is important that this merge be destructive in order to give all active nodes that are descendents of n access to the same information so that the problem of node splitting illustrated in Figure 6 continues to be avoided.
It was mentioned previously how calls to force-delayed-copy must be inserted into the unification algorithm to invoke the incremental copying of nodes. Another modification to the algorithm is also necessary as a result of this incremental copying. Since active nodes are replaced by new nodes in the middle of unification, the algorithm must undergo a revision to effect this replacement. For example, in Figure 5, in order for <b> to be replaced by b2, the corresponding arc from a2 must be replaced. Thus as the unification algorithm traverses a DAG, it also collects such replacements in order to reconstruct the outgoing arcs of a parent DAG.
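One simple way to picture this bookkeeping, under the arc representation assumed in the earlier sketches, is to install the forced copy directly on the parent's arc as it is encountered; the helper below is illustrative only, not the paper's algorithm.

;; Hypothetical helper: force the delayed child under ARC-LABEL and make
;; the parent's outgoing arc point at the copy (e.g. replace <b> by b2 on a2).
(defun force-arc (parent arc-label)
  (let* ((entry  (assoc arc-label (dag-node-arcs parent) :test #'equal))
         (forced (force-delayed-copy (cdr entry))))
    (setf (cdr entry) forced)
    forced))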
In addition to the message delay-arcs sent to an active node to invoke the suspended call to copy-the-dag, other messages are needed. In order to compare active nodes and merge their copy environments, the active nodes must process messages that cause the active node to return either its encapsulated node's label or its copy environment.
EFFECTIVENESS OF LAZY UNIFICATION

Lazy Unification results in an impressive reduction in the amount of copying during parsing. This in turn reduces the overall slice of parse time consumed by copying, as can be seen by contrasting Figure 7 with Figure 1. Keep in mind that these charts illustrate proportional computations, not speed. The pie shown below should be viewed as a smaller pie, representing faster parse times, than that in Figure 1. Speed is discussed below.
[Figure 7. Relative Cost of Operations with Lazy Unification (pie chart; legend: Unification, Copying, Other).]
Lazy Unification copies less than 7% of the nodes copied under eager unification. However, this is not a fair comparison with EU because LU substitutes the creation of active nodes for some of the copying. To get a truer comparison of Lazy vs. Eager Unification, we must add together the number of copied nodes and active nodes created in LU. Even when active nodes are taken into account, the results are highly favorable toward LU because again less than 7% of the nodes copied under EU are accounted for by active nodes in LU. Combining the active nodes with copies, LU still accounts for an 87% reduction over eager unification. Figure 8 graphically illustrates this difference for ten sentences.
[Figure 8. Comparison of Eager vs. Lazy Unification: number of nodes created for ten sentences (eager copies vs. lazy copies and active nodes).]
From the time slice of eager copying shown in Figure 1, we can see that if LU were to incur no overhead then an 87% reduction of copying would result in a faster parse of roughly 59%. The actual speedup is about 50%, indicating that the overhead of implementing LU is 9%. However, the 50% speedup does not consider the effects of garbage collection or paging, since they are system dependent. These effects will be more pronounced in EU than LU because in the former paradigm more data structures are created and referenced. In practice, therefore, LU performs at better than twice the speed of EU.
There are several sources of overhead in LU. The major cost is incurred in distinguishing between active and simple nodes. In our Common Lisp implementation simple DAG nodes are defined as named structures and active nodes as closures. Hence, they are distinguished by the Lisp predicates DAG-P and FUNCTIONP. Disassembly on a Symbolics machine shows both predicates to be rather costly. (The functions TYPE-OF and TYPEP could also be used, but they are also expensive.)
Another expensive operation occurs when the copy environments in active nodes are searched. Currently, these environments are simple association lists which require sequential searching. As was discussed above, the copy environments must sometimes be merged. The merge function presently uses the UNION function. While a far less expensive destructive concatenation of copy environments could be employed, the union operation was chosen initially as a simple way to avoid creation of circular lists during merging.
All of these sources of overhead can and will be attacked by additional work. Nodes can be defined as a tagged data structure, allowing an inexpensive tag test to distinguish between active and inactive nodes. A non-sequential data structure could allow faster than linear searching of copy environments and more efficient merging. These and additional modifications are expected to eliminate most of the overhead incurred by the current implementation of LU. In any case, Lazy Unification was developed to reduce the amount of copying during unification and we have seen its dramatic success in achieving that goal.
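As a rough illustration of these proposed changes (not an implemented design from the paper), a single tagged structure could stand in for both node kinds, and an EQ hash table could replace the alist copy environment:

;; Hypothetical tagged node: one structure type for both kinds, so a slot
;; read replaces the costlier DAG-P/FUNCTIONP discrimination.
(defstruct node
  (tag :simple)     ; :simple or :active
  label
  arcs              ; alist of (arc-label . node)
  thunk)            ; suspended copy, used only when TAG is :active

(defun active-node-p (node)
  (eq (node-tag node) :active))

;; Hypothetical hash-table copy environment: constant-time lookup and
;; cheaper merging compared with a sequentially searched alist.
(defun make-copy-environment ()
  (make-hash-table :test #'eq))

(defun lookup-copy (original copy-env)
  (gethash original copy-env))

(defun record-copy (original copy copy-env)
  (setf (gethash original copy-env) copy))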
CONCLUDING REMARKS
There is another optimization possible regarding certain leaf nodes of a DAG. Depending on the application using graph unification, a subset of the leaf nodes will never be unified with other DAG's. In the TASLINK application these are nodes representing such features as third person singular. This observation can be exploited under both lazy and eager unification to reduce both copying and active node creation. See Godden (1989) for details.
It has been my experience that using lazy evaluation as an optimization technique for graph unification, while elegant in the end result, is slow in development time due to the difficulties it presents for debugging. This property is intrinsic to lazy evaluation (O'Donnell and Hall, 1988).
The problem is that a DAG is no longer copied locally because the copy operation is suspended in the active nodes. When a DAG is eventually copied, that copying is performed incrementally and therefore non-locally in both time and program space. In spite of this distributed nature of the optimized process, the programmer continues to conceptualize the operation as occurring locally, as it would occur in the non-optimized eager mode. As a result of this mismatch between the programmer's visualization of the operation and its actual execution, bugs are notoriously difficult to trace. The development time for a program employing lazy evaluation is, therefore, much longer than would be expected. Hence, this technique should only be employed when the possible efficiency gains are expected to be large, as they are in the case of graph unification. O'Donnell and Hall present an excellent discussion of these and other problems and offer insight into how tools may be built to alleviate some of them.
REFERENCES
Field, Anthony J. and Peter G. Harrison. 1988. Functional Programming. Reading, MA: Addison-Wesley.

Godden, Kurt. 1989. "Improving the Efficiency of Graph Unification." Internal technical report GMR-6928. General Motors Research Laboratories, Warren, MI.

Karttunen, Lauri. 1986. D-PATR: A Development Environment for Unification-Based Grammars. Report No. CSLI-86-61. Stanford, CA.

Karttunen, Lauri and Martin Kay. 1985. "Structure-Sharing with Binary Trees." Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics. Chicago, IL: ACL. pp. 133-136.

Lytinen, Steven L. 1986. "Dynamically Combining Syntax and Semantics in Natural Language Processing." Proceedings of the 5th National Conference on Artificial Intelligence. Philadelphia, PA: AAAI. pp. 574-578.

O'Donnell, John T. and Cordelia V. Hall. 1988. "Debugging in Applicative Languages." Lisp and Symbolic Computation, 1/2. pp. 113-145.

Pereira, Fernando C. N. 1985. "A Structure-Sharing Representation for Unification-Based Grammar Formalisms." Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics. Chicago, IL: ACL. pp. 137-144.

Reddy, Uday S. 1986. "On the Relationship between Logic and Functional Languages." In Doug DeGroot and Gary Lindstrom, eds., Logic Programming: Functions, Relations, and Equations. Englewood Cliffs, NJ: Prentice-Hall. pp. 3-36.

Wroblewski, David A. 1987. "Nondestructive Graph Unification." Proceedings of the 6th National Conference on Artificial Intelligence. Seattle, WA: AAAI. pp. 582-587.