LAZY UNIFICATION

Kurt Godden
Computer Science Department
General Motors Research Laboratories
Warren, MI 48090-9055, USA
CSNet: godden@gmr.com
ABSTRACT

Unification-based NL parsers that copy argument graphs to prevent their destruction suffer from inefficiency. Copying is the most expensive operation in such parsers, and several methods to reduce copying have been devised with varying degrees of success. Lazy Unification is presented here as a new, conceptually elegant solution that reduces copying by nearly an order of magnitude. Lazy Unification requires no new slots in the structure of nodes, and only nominal revisions to the unification algorithm.
PROBLEM STATEMENT
Unification is widely used in natural language processing (NLP) as the primary operation during parsing. The data structures unified are directed acyclic graphs (DAG's), used to encode grammar rules, lexical entries and intermediate parsing structures. A crucial point concerning unification is that the resulting DAG is constructed directly from the raw material of its input DAG's, i.e., unification is a destructive operation. This is especially important when the input DAG's are rules of the grammar or lexical items. If nothing were done to prevent their destruction during unification, then the grammar would no longer have a correct rule, nor the lexicon a valid lexical entry for the DAG's in question. They would have been transformed into the unified DAG as a side effect.
The simplest way to avoid destroying grammar rules and lexical entries by unification is to copy each argument DAG prior to calling the unification routine. This is sufficient to avoid the problem of destruction, but the copying itself then becomes problematic, causing severe degradation in performance. This performance drain is illustrated in Figure 1, where average parsing statistics are given for the original implementation of graph unification in the TASLINK natural language system. TASLINK was built upon the LINK parser in a joint project between GM Research and the University of Michigan. LINK is a descendent of the MOPTRANS system developed by Lytinen (1986). The statistics below are for ten sentences parsed by TASLINK. As can be seen, copying consumes more computation time than unification.

[Figure 1. Relative Cost of Operations during Parsing (pie chart; legend: Unification, Copying, Other).]
PAST SOLUTIONS

Improving the efficiency of unification has been an active area of research in unification-based NLP, where the focus has been on reducing the amount of DAG copying, and several approaches have arisen. Different versions of structure sharing were employed by Pereira (1985) as well as Karttunen and Kay (1985). In Karttunen (1986) structure sharing was abandoned for a technique allowing reversible unification. Wroblewski (1987) presents what he calls a non-destructive unification algorithm that avoids destruction by incrementally copying the DAG nodes as necessary.
All of these approaches to the copying problem suffer from difficulties of their own. For both Pereira and Wroblewski there are special cases involving convergent arcs (arcs from two or more nodes that point to the same destination node) that still require full copying. In Karttunen and Kay's version of structure sharing, all DAG's are represented as binary branching DAG's, even though grammar rules are more naturally represented as non-binary structures. Reversible unification requires two passes over the input DAG's, one to unify them and another to copy the result. Furthermore, in both successful and unsuccessful unification the input DAG's must be restored to their original forms because reversible unification allows them to be destructively modified.
Wroblewski points out a useful distinction between early copying and over copying. Early copying refers to the copying of input DAG's before unification is applied. This can lead to inefficiency when unification fails because only the copying up to the point of failure is necessary. Over copying refers to the fact that when the two input DAG's are copied they are copied in their entirety. Since the resultant unified DAG generally has fewer total nodes than the two input DAG's, more nodes than necessary were copied to produce the result. Wroblewski's algorithm eliminates early copying entirely, but as noted above it can partially over copy on DAG's involving convergent arcs. Reversible unification may also over copy, as will be shown below.
LAZY UNIFICATION

I now present Lazy Unification (LU) as a new approach to the copying problem. In the following section I will present statistics which indicate that LU accomplishes nearly an order of magnitude reduction in copying compared to non-lazy, or eager unification (EU). These results are attained by turning DAG's into active data structures to implement the lazy evaluation of copying. Lazy evaluation is an optimization technique developed for the interpretation of functional programming languages (Field and Harrison, 1988), and has been extended to theorem proving and logic programming in attempts to integrate that paradigm with functional programming (Reddy, 1986).

The concept underlying lazy evaluation is simple: delay the operation being optimized until the value it produces is needed by the calling program, at which point the delayed operation is forced. These actions may be implemented by high-level procedures called delay and force. Delay is used in place of the original call to the procedure being optimized, and force is inserted into the program at each location where the results of the delayed procedure are needed.
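As a concrete illustration of this general pattern (not code from the paper), the following Common Lisp sketch shows delay and force built from closures; the names and the memoization of the forced value are conventional details assumed here.

;; A minimal delay/force sketch using closures.  Names and the
;; caching detail are illustrative assumptions, not the paper's code.
(defun delay-thunk (thunk)
  "Return a closure that runs THUNK only when forced, caching the result."
  (let ((forced nil)
        (value nil))
    (lambda ()
      (unless forced
        (setf value (funcall thunk)
              forced t))
      value)))

(defun force-thunk (delayed)
  "Invoke a delayed computation, obtaining its value."
  (funcall delayed))

;; Usage: the expensive call is suspended until its value is demanded.
;; (defparameter *copy* (delay-thunk (lambda () (expensive-copy *some-dag*))))
;; (force-thunk *copy*)   ; the copy happens here, on first use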
Lazy evaluation is a good technique for the copying problem in graph unification precisely because the overwhelming majority of copying is unnecessary. If all copying can be delayed until a destructive change is about to occur to a DAG, then both early copying and over copying can be completely eliminated.
The delay operation is easily implemented by using closures. A closure is a compound object that is both procedure and data. In the context of LU, the data portion of a closure is a DAG node. The procedural code within a closure is a function that processes a variety of messages sent to the closure. One may generally think of the encapsulated procedure as being a suspended call to the copy function. Let us refer to these closures as active nodes, as contrasted with a simple node not combined with a procedure in a closure. The delay function returns an active node when given a simple node as its argument. For now let us assume that delay behaves as the identity function when applied to an active node. That is, it returns an active node unchanged. As a mnemonic we will refer to the delay function as delay-copy-the-dag.

We now redefine DAG's to allow either simple or active nodes wherever simple nodes were previously allowed in a DAG. An active node will be notated in subsequent diagrams by enclosing the node in angle brackets.
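The sketch below shows one way delay-copy-the-dag might look under these assumptions: simple nodes as a named structure, active nodes as closures that answer a few messages. The slot layout, the message names, and the reference to copy-the-dag (sketched further below) are assumptions, since the paper does not list its code.

;; Hypothetical representation: simple nodes are structures, active nodes
;; are closures.  Slot and message names are assumptions.
(defstruct dag-node
  label                     ; the node's label
  arcs)                     ; alist of (arc-label . child-node)

(defun active-node-p (node)
  "Active nodes are closures; simple nodes are DAG-NODE structures."
  (functionp node))

(defun delay-copy-the-dag (node)
  "Wrap a simple NODE in an active node; act as the identity on a node
that is already active."
  (if (active-node-p node)
      node
      (lambda (message)
        (ecase message
          (:node  node)                          ; the encapsulated simple node
          (:label (dag-node-label node))         ; used when comparing nodes
          (:copy  (copy-the-dag node :delay-arcs))))))  ; the suspended copy (sketched below)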
In LU the unification algorithm proceeds largely as it did before, except that at every point in the algorithm where a destructive change is about to be made to an active node, that node is first replaced by a copy of its encapsulated node. This replacement is mediated through the force function, which we shall call force-delayed-copy. In the case of a simple node argument force-delayed-copy acts as the identity function, but when given an active node it invokes the suspended copy procedure with the encapsulated node as argument. Force-delayed-copy returns the DAG that results from this invocation.
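Under the same assumed representation, force-delayed-copy reduces to a short dispatch; this is a sketch, not the paper's code.

;; Sketch of force-delayed-copy: identity on simple nodes, invoke the
;; suspended copy on active nodes (the message name is an assumption).
(defun force-delayed-copy (node)
  (if (functionp node)          ; active node?
      (funcall node :copy)      ; run the delayed call to copy-the-dag
      node))                    ; simple nodes pass through unchanged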
To avoid copying an entire DAG when only its root node is going to be modified by unification, the copying function is also rewritten. The new version of copy-the-dag takes an optional argument to control how much of the DAG is to be copied. The default is to copy the entire argument, as one would expect of a function called copy-the-dag. But when copy-the-dag is called from inside an active node (by force-delayed-copy invoking the procedural portion of the active node), then the optional argument is supplied with a flag that causes copy-the-dag to copy only the root node of its argument. The nodes at the ends of the outgoing arcs from the new root become active nodes, created by delaying the original nodes in those positions. No traversal of the DAG takes place and the deeper nodes are only present implicitly through the active nodes of the resulting DAG. This is illustrated in Figure 2.
[Figure 2. Copy-the-dag on 'a' from inside an active node: node a is copied to a2, whose outgoing arcs point at active nodes <b>, <c>, and <d>.]
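A sketch of how such a copy-the-dag might be written follows. The mode argument, the flag name :delay-arcs, and the omission of structure sharing during a full copy are simplifying assumptions rather than details taken from the paper.

;; Hypothetical copy-the-dag with an optional mode argument.  With
;; :delay-arcs only the root is copied; its children are merely wrapped
;; in active nodes.  The :full branch assumes simple children and ignores
;; convergent arcs, for brevity.
(defun copy-the-dag (dag &optional (mode :full))
  (ecase mode
    (:full
     (make-dag-node
      :label (dag-node-label dag)
      :arcs (loop for (arc-label . child) in (dag-node-arcs dag)
                  collect (cons arc-label (copy-the-dag child :full)))))
    (:delay-arcs
     (make-dag-node
      :label (dag-node-label dag)
      :arcs (loop for (arc-label . child) in (dag-node-arcs dag)
                  collect (cons arc-label (delay-copy-the-dag child)))))))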
Here, DAG a was initially encapsulated in a closure as an active node. When a is about to undergo a destructive change by being unified with some other DAG, force-delayed-copy activates the suspended call to copy-the-dag with DAG a as its first argument and the message delay-arcs as its optional argument. Copy-the-dag then copies only node a, returning a2 with outgoing arcs pointing at active nodes that encapsulate the original destination nodes b, c, and d. DAG a2 may then be unified with another DAG without destroying DAG a, and the unification algorithm proceeds with the active nodes <b>, <c>, and <d>. As these sub-DAG's are modified, their nodes are likewise copied incrementally. Figure 3 illustrates this by showing DAG a2 after unifying <b>. It may be seen that as active nodes are copied one by one, the resulting unified DAG is eventually constructed.
[Figure 3. DAG a2 after unifying <b>.]

One can see how this scheme reduces the amount of copying if, for example, unification fails at the active node <e>. In this case only nodes a and b will have been copied, and none of the nodes c, d, e, f, g, or h. Copying is also reduced when unification succeeds, this reduction being achieved in two ways.
First, lazy unification only creates new nodes for the DAG that results from unification. Generally this DAG has fewer total nodes than the two input DAG's. For example, if the 8-node DAG a in Figure 2 were unified with the 2-node DAG a-->i, then the resulting DAG would have only nine nodes, not ten. The result DAG would have the arc '-->i' copied onto the 8-node DAG's root. Thus, while EU would copy all ten original nodes, only nine are necessary for the result.
Active nodes that remain in a final DAG represent the other savings for successful unification. Whereas EU copies all ten original nodes to create the 9-node result, LU would only create five new nodes during unification, resulting in the DAG of Figure 4. Note that the "missing" nodes e, f, g, and h are implicit in the active nodes and did not require copying. For larger DAG's, this kind of savings in node copying can be significant as several large sub-DAG's may survive uncopied in the final DAG.
[Figure 4. Saving four node copies with active nodes: the result DAG a2 retains active nodes such as <b> and <c> in place of uncopied sub-DAG's.]
A useful comparison with Karttunen's reversible unification may now be made. Recall that when reversible unification is successful the resulting DAG is copied and the originals restored. Notice that this copying of the entire resulting DAG may overcopy some of the sub-DAG's. This is evident because we have just seen in LU that some of the sub-DAG's of a resulting DAG remain uncopied inside active nodes. Thus, LU offers less real copying than reversible unification.

Let us look again at DAG a in Figure 2 and discuss a potential problem with lazy unification as described thus far. Let us suppose that through unification a has been partially copied, resulting in the DAG shown in Figure 5, with active node <f> about to be copied.
[Figure 5. DAG 'a' partially copied.]
Recall from Figure 2 that node f points at e. Following the procedure described above, <f> would be copied to f2, which would then point at active node <e>, which could lead to another node e3 as shown in Figure 6. What is needed is some form of memory to recognize that e was already copied once and that f2 needs to point at e2, not <e>.
[Figure 6. Erroneous splitting of node e into e2 and e3.]
This memory is implemented with a copy environment, which is an association list relating original nodes to their copies. Before f2 is given an arc pointing at <e>, this alist is searched to see if e has already been copied. Since it has, e2 is returned as the destination node for the outgoing arc from f2, thus preserving the topography of the original DAG.
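A sketch of this bookkeeping is shown below. Here the environment is modeled as a one-element list acting as a mutable box around the alist, so that every closure sharing it sees new pairs immediately; in the paper the environment is instead captured lexically inside the active nodes, and the helper names are assumptions.

;; Hypothetical copy-environment helpers.  COPY-ENV is a one-element list
;; whose CAR holds an alist of (original-node . copied-node) pairs.
(defun make-copy-environment ()
  (list nil))

(defun lookup-copy (node copy-env)
  "Return NODE's copy if it has been copied already, else NIL."
  (cdr (assoc node (car copy-env) :test #'eq)))

(defun record-copy (node copy copy-env)
  "Remember that NODE was copied to COPY; every sharer of COPY-ENV sees it."
  (push (cons node copy) (car copy-env))
  copy)

(defun copy-node-once (node copy-env)
  "Copy NODE at most once: reuse e2 rather than splitting e into e2 and e3."
  (or (lookup-copy node copy-env)
      (record-copy node (copy-the-dag node :delay-arcs) copy-env)))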
Because there are several DAG's that must be preserved during the course of parsing, the copy environment cannot be global but must be associated with each DAG for which it records the copying history. This is accomplished by encapsulating a particular DAG's copy environment in each of the active nodes of that DAG. Looking again at Figure 2, the active nodes for DAG a2 are all created in the scope of a variable bound to an initially empty association list for a2's copy environment. Thus, the closures that implement the active nodes <b>, <c>, and <d> all have access to the same copy environment. When <b> invokes the suspended call to copy-the-dag, this function adds the pair (b b2) to the copy environment as a side effect before returning its value b2. When this occurs, <c> and <d> instantly have access to the new pair through their shared access to the same copy environment. Furthermore, when new active nodes are created as traversal of the DAG continues during unification, they are also created in the scope of the same copy environment. Thus, this alist is pushed forward deeper into the nodes of the parent DAG as part of the data portion of each active node.
Returning to Figure 5, the pair (e e2) was added to the copy environment being maintained for DAG a2 when e was copied to e2. Active node <f> was created in the scope of this list and therefore "remembers" at the time f2 is created that it should point to the previously created e2 and not to a new active node <e>.
There is one more mechanism needed to correctly implement copy environments. We have already seen how some active nodes remain after unification. As intermediate DAG's are reused during the nondeterministic parsing and are unified with other DAG's, it can happen that some of these remaining active nodes become descendents of a root different from their original root node. As those new root DAG's are incrementally copied during unification, a situation can arise whereby an active node's parent node is copied and then an attempt is made to create an active node out of an active node.
For example, let us suppose that the DAG shown in Figure 5 is a sub-DAG of some larger DAG. Let us refer to the root of that larger DAG as node n. As unification of n proceeds, we may reach a2 and start incrementally copying it. This could eventually result in c2 being copied to c3, at which point the system will attempt to create an outgoing arc for c3 pointing at a newly created active node over the already active node <f>. There is no need to try to create such a beast as <<f>>. Rather, what is needed is to assure that active node <f> be given access to the new copy environment for n passed down to <f> from its predecessor nodes. This is accomplished by destructively merging the new copy environment with that previously created for a2 and surviving inside <f>. It is important that this merge be destructive in order to give all active nodes that are descendents of n access to the same information so that the problem of node splitting illustrated in Figure 6 continues to be avoided.
It was mentioned previously how calls to force-delayed-copy must be inserted into the unification algorithm to invoke the incremental copying of nodes. Another modification to the algorithm is also necessary as a result of this incremental copying. Since active nodes are replaced by new nodes in the middle of unification, the algorithm must undergo a revision to effect this replacement. For example, in Figure 5, in order for <b> to be replaced by b2, the corresponding arc from a2 must be replaced. Thus as the unification algorithm traverses a DAG, it also collects such replacements in order to reconstruct the outgoing arcs of a parent DAG.
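One simple way to picture this bookkeeping, under the arc representation assumed in the earlier sketches, is to install the forced copy directly on the parent's arc as it is encountered; the helper below is illustrative only, not the paper's algorithm.

;; Hypothetical helper: force the delayed child under ARC-LABEL and make
;; the parent's outgoing arc point at the copy (e.g. replace <b> by b2 on a2).
(defun force-arc (parent arc-label)
  (let* ((entry  (assoc arc-label (dag-node-arcs parent) :test #'equal))
         (forced (force-delayed-copy (cdr entry))))
    (setf (cdr entry) forced)
    forced))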
In addition to the message delay-arcs sent to an active node to invoke the suspended call to copy-the-dag, other messages are needed. In order to compare active nodes and merge their copy environments, the active nodes must process messages that cause the active node to return either its encapsulated node's label or its copy environment.
EFFECTIVENESS OF LAZY UNIFICATION

Lazy Unification results in an impressive reduction in the amount of copying during parsing. This in turn reduces the overall slice of parse time consumed by copying, as can be seen by contrasting Figure 7 with Figure 1. Keep in mind that these charts illustrate proportional computations, not speed. The pie shown below should be viewed as a smaller pie, representing faster parse times, than that in Figure 1. Speed is discussed below.
[Figure 7. Relative Cost of Operations with Lazy Unification (pie chart; legend: Unification, Copying, Other).]
Lazy Unification copies less than 7% of the nodes copied under eager unification. However, this is not a fair comparison with EU because LU substitutes the creation of active nodes for some of the copying. To get a truer comparison of Lazy vs. Eager Unification, we must add together the number of copied nodes and active nodes created in LU. Even when active nodes are taken into account, the results are highly favorable toward LU because again less than 7% of the nodes copied under EU are accounted for by active nodes in LU. Combining the active nodes with copies, LU still accounts for an 87% reduction over eager unification. Figure 8 graphically illustrates this difference for ten sentences.
[Figure 8. Comparison of Eager vs. Lazy Unification: number of nodes created for ten sentences (eager copies vs. lazy copies and active nodes).]
From the time slice of eager copying shown in Figure 1, we can see that if LU were to incur no overhead then an 87% reduction of copying would result in a faster parse of roughly 59%. The actual speedup is about 50%, indicating that the overhead of implementing LU is 9%. However, the 50% speedup does not consider the effects of garbage collection or paging, since they are system dependent. These effects will be more pronounced in EU than LU because in the former paradigm more data structures are created and referenced. In practice, therefore, LU performs at better than twice the speed of EU.
There are several sources of overhead in LU. The major cost is incurred in distinguishing between active and simple nodes. In our Common Lisp implementation simple DAG nodes are defined as named structures and active nodes as closures. Hence, they are distinguished by the Lisp predicates DAG-P and FUNCTIONP. Disassembly on a Symbolics machine shows both predicates to be rather costly. (The functions TYPE-OF and TYPEP could also be used, but they are also expensive.)
Another expensive operation occurs when the copy environments in active nodes are searched. Currently, these environments are simple association lists which require sequential searching. As was discussed above, the copy environments must sometimes be merged. The merge function presently uses the UNION function. While a far less expensive destructive concatenation of copy environments could be employed, the union operation was chosen initially as a simple way to avoid creation of circular lists during merging.
All of these sources of overhead can and will be attacked by additional work. Nodes can be defined as a tagged data structure, allowing an inexpensive tag test to distinguish between active and inactive nodes. A non-sequential data structure could allow faster than linear searching of copy environments and more efficient merging. These and additional modifications are expected to eliminate most of the overhead incurred by the current implementation of LU. In any case, Lazy Unification was developed to reduce the amount of copying during unification and we have seen its dramatic success in achieving that goal.
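As a rough illustration of these proposed changes (not an implemented design from the paper), a single tagged structure could stand in for both node kinds, and an EQ hash table could replace the alist copy environment:

;; Hypothetical tagged node: one structure type for both kinds, so a slot
;; read replaces the costlier DAG-P/FUNCTIONP discrimination.
(defstruct node
  (tag :simple)     ; :simple or :active
  label
  arcs              ; alist of (arc-label . node)
  thunk)            ; suspended copy, used only when TAG is :active

(defun active-node-p (node)
  (eq (node-tag node) :active))

;; Hypothetical hash-table copy environment: constant-time lookup and
;; cheaper merging compared with a sequentially searched alist.
(defun make-copy-environment ()
  (make-hash-table :test #'eq))

(defun lookup-copy (original copy-env)
  (gethash original copy-env))

(defun record-copy (original copy copy-env)
  (setf (gethash original copy-env) copy))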
CONCLUDING REMARKS
There is another optimization possible regarding certain leaf nodes of a DAG. Depending on the application using graph unification, a subset of the leaf nodes will never be unified with other DAG's. In the TASLINK application these are nodes representing such features as third person singular. This observation can be exploited under both lazy and eager unification to reduce both copying and active node creation. See Godden (1989) for details.
It has been my experience that using lazy evaluation as an optimization technique for graph unification, while elegant in the end result, is slow in development time due to the difficulties it presents for debugging. This property is intrinsic to lazy evaluation (O'Donnell and Hall, 1988).
The problem is that a DAG is no longer copied locally because the copy operation is suspended in the active nodes. When a DAG is eventually copied, that copying is performed incrementally and therefore non-locally in both time and program space. In spite of this distributed nature of the optimized process, the programmer continues to conceptualize the operation as occurring locally, as it would occur in the non-optimized eager mode. As a result of this mismatch between the programmer's visualization of the operation and its actual execution, bugs are notoriously difficult to trace. The development time for a program employing lazy evaluation is, therefore, much longer than would be expected. Hence, this technique should only be employed when the possible efficiency gains are expected to be large, as they are in the case of graph unification. O'Donnell and Hall present an excellent discussion of these and other problems and offer insight into how tools may be built to alleviate some of them.
REFERENCES
Field, Anthony J. and Peter G. Harrison. 1988. Functional Programming. Reading, MA: Addison-Wesley.

Godden, Kurt. 1989. "Improving the Efficiency of Graph Unification." Internal technical report GMR-6928. General Motors Research Laboratories, Warren, MI.

Karttunen, Lauri. 1986. D-PATR: A Development Environment for Unification-Based Grammars. Report No. CSLI-86-61. Stanford, CA.

Karttunen, Lauri and Martin Kay. 1985. "Structure-Sharing with Binary Trees." Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics. Chicago, IL: ACL. pp. 133-136.

Lytinen, Steven L. 1986. "Dynamically Combining Syntax and Semantics in Natural Language Processing." Proceedings of the 5th National Conference on Artificial Intelligence. Philadelphia, PA: AAAI. pp. 574-578.

O'Donnell, John T. and Cordelia V. Hall. 1988. "Debugging in Applicative Languages." Lisp and Symbolic Computation, 1/2. pp. 113-145.

Pereira, Fernando C. N. 1985. "A Structure-Sharing Representation for Unification-Based Grammar Formalisms." Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics. Chicago, IL: ACL. pp. 137-144.

Reddy, Uday S. 1986. "On the Relationship between Logic and Functional Languages." In Doug DeGroot and Gary Lindstrom, eds., Logic Programming: Functions, Relations, and Equations. Englewood Cliffs, NJ: Prentice-Hall. pp. 3-36.

Wroblewski, David A. 1987. "Nondestructive Graph Unification." Proceedings of the 6th National Conference on Artificial Intelligence. Seattle, WA: AAAI. pp. 582-587.