Lehtola, A., Jäppinen, H., Nelimarkka, E.
SITRA Foundation (*) and
Helsinki University of Technology
Helsinki, Finland
ABSTRACT
This paper introduces a special programming environment for the definition of grammars and for the implementation of corresponding parsers. In natural language processing systems it is advantageous to have linguistic knowledge and processing mechanisms separated. Our environment accepts grammars consisting of binary dependency relations and grammatical functions. Well-formed expressions of functions and relations provide constituent surroundings for syntactic categories in the form of two-way automata. These relations, functions, and automata are described in a special definition language.

In focusing on high level descriptions a linguist may ignore computational details of the parsing process. He writes the grammar into a DPL-description and a compiler translates it into efficient LISP-code. The environment has also a tracing facility for the parsing process, grammar-sensitive lexical maintenance programs, and routines for the interactive graphic display of parse trees and grammar definitions. Translator routines are also available for the transport of compiled code between various LISP-dialects. The environment itself exists currently in INTERLISP and FRANZLISP. This paper focuses on knowledge engineering issues and does not enter linguistic argumentation.
INTRODUCTION
Our objective has been to build a parser for Finnish to work as a practical tool in real production applications. In the beginning of our work we were faced with two major problems. First, so far there was no formal description of the Finnish grammar. The second difficulty was that Finnish differs greatly in its structure from the Indoeuropean languages. Finnish has relatively free word order, and syntactico-semantic knowledge in a sentence is often expressed in the inflections of the words. Therefore existing parsing methods for Indoeuropean languages (e.g. ATN, DCG, LFG etc.) did not seem to grasp the idiosyncrasies of Finnish.

The parser system we have developed is based on functional dependency. Grammar is specified by a family of two-way finite automata and by dependency function and relation definitions. Each automaton expresses the valid dependency context of one constituent type. In an abstract sense the working storage of the parser consists of two constituent stacks and of a register which holds the current constituent (Figure 1).
[Figure 1. The working storage of DPL-parsers: the register of the current constituent, with the left stack positions L1 L2 L3 and the right stack positions R1 R2 R3.]
(*) SITRA Foundation, P.O. Box 329, SF-00121 Helsinki, Finland
[Figure 2. A two-way automaton for Finnish verbs. Surviving state labels: BUILD PHRASE ON RIGHT, FIND REGENT ON RIGHT. Surviving arc labels: +Phrase Subject, +Phrase Adverbial, -Phrase, Nominal, empty left-hand side, end of input. Notation: arcs carry the condition or function tested on the dependent candidate (if not otherwise stated); double circles are used to denote entries and exits of an automaton; inside a state is expressed the manner of operation.]
The two stacks hold the right and left contexts of the current constituent. The parsing process is always directed by the expectations of the current constituent. Dynamic local control is realized by permitting the automata to activate one another. The basic decision for the automaton associated with the current constituent is to accept or reject a neighbor via a valid syntactico-semantic subordinate relation. Acceptance subordinates the neighbor, and it disappears from the stack. The structure an input sentence receives is an annotated tree of such binary relations.
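The working storage and the accept step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' LISP implementation: the names `Constituent`, `WorkingStorage`, `accept_left`, `accept_right` and `show` are hypothetical, and the dependency test itself is omitted (the caller simply supplies the label of the accepted relation).

```python
from dataclasses import dataclass, field


@dataclass
class Constituent:
    lexeme: str
    # (label, dependent) pairs: the annotated binary relations found so far
    dependents: list = field(default_factory=list)


@dataclass
class WorkingStorage:
    left: list             # left context; left[-1] is the nearest neighbor L1
    current: Constituent   # the register of the current constituent
    right: list            # right context; right[0] is the nearest neighbor R1

    def accept_left(self, label):
        """Subordinate L1 to the current constituent; L1 leaves the stack."""
        dep = self.left.pop()
        self.current.dependents.append((label, dep))

    def accept_right(self, label):
        """Subordinate R1 to the current constituent; R1 leaves the stack."""
        dep = self.right.pop(0)
        self.current.dependents.append((label, dep))


def show(c):
    """Render the annotated dependency tree as a nested list."""
    return [c.lexeme] + [[label, show(d)] for label, d in c.dependents]


ws = WorkingStorage(left=[Constituent("POIKA")],
                    current=Constituent("TULLA"),
                    right=[Constituent("ILTA")])
ws.accept_left("Subject")
ws.accept_right("Adverbial")
tree = show(ws.current)
print(tree)
```

After both neighbors have been accepted the stacks are empty and the register holds the whole tree, which mirrors the statement that an accepted neighbor disappears from the stack.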
An automaton for verbs is described in Figure 2. When a verb becomes the current constituent for the first time it will enter the automaton through the START node. The automaton expects to find a dependent from the left (?V). If the left neighbor has the constituent feature [...] Subject and then for Object. When a function test succeeds, the neighbor will [...] the state indicated by arcs. The double circle states denote entry and exit points of the automaton.
If completed constituents do not exist as neighbors, an automaton may defer decision. In Figure 2 the states labelled "BUILD PHRASE ON RIGHT" and "FIND REGENT ON RIGHT" push the verb to the left stack and pop the right stack for the current constituent. When the verb is activated later on, the control flow will continue from the state expressed in the deactivation command.
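The deferral step can be illustrated with a small sketch. It assumes, for illustration only, that a deferred constituent is stored on the left stack together with the name of its return state; the function names `build_phrase_on_right` and `reactivate` and the pair representation are hypothetical, and constituents are reduced to plain strings.

```python
def build_phrase_on_right(left, current, right, return_state):
    """Deactivate `current` and activate its right neighbor R1.

    `current` is pushed to the left stack together with the state from
    which it should resume; R1 is popped to become the new current
    constituent.
    """
    left.append((current, return_state))
    return right.pop(0)


def reactivate(left):
    """Pop the most recently deferred constituent and its resume state."""
    return left.pop()


left, right = [], ["ILTA", "KENTTÄ"]
current = build_phrase_on_right(left, "TULLA", right, return_state="V?")
print(current, left, right)

resumed, state = reactivate(left)
print(resumed, state)
```

Keeping the return state next to the deferred constituent is one simple way to let control continue "from the state expressed in the deactivation command"; the actual storage layout of the DPL-parsers is not described in the text.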
There are two distinct search strategies involved. If a single parse is sufficient, the graphs (i.e. the automata) are searched depth first [...] expressed in a special conditional expression formalism DPL (for Dependency [...] inflectional languages as well.
DPL-DESCRIPTIONS
The main object in DPL is a constituent [...] structures of the lexical entries are also [...] these declarations can be seen in Figure 3. [...] referred in a uniform manner using their values straight. The system automatically [...] details associated to property types. For example, the system is automatically tuned to notice the inheritance of properties in [...] multidimensional analysis has been one of [...] the DPL-formalism. Patterning can be done [...] set associated to constituents can easily be extended.

An example of a constituent structure and [...] definitions further specify ConstFeat and [...] definition of a category tree SemCat is given. This tree has sets of property [...] DPL-system automatically takes care of [...] constituent that belongs to the semantic category Human the system automatically [...] +Countable, and +Concr.

[...] relations are defined using the syntax in [...] value the binary construct built from the current constituent (C) and its dependent candidate (D), or it returns NIL. [...] pairs of C and D constituents that have passed the associated predicate filter. By choosing operators a user may vary a predication between simple equality (=) [...]

    <constituent structure>  ::= ( CONSTITUENT: <list of properties> )
    <subtree of constituent> ::= ( SUBTREE: <glue node> <list of properties> ) |
                                 ( LEXICON-ENTRY: <glue node> <list of properties> )
    <list of properties>     ::= ( <list of properties> ) | ( <property name> )
    <property name>          ::= <type name> | <glue node name>
    <type name>              ::= <unique lisp atom>
    <glue node name>         ::= <unique lisp atom>
    <glue node>              ::= <glue node name in upper level>

    <property declaration>   ::= ( PROPERTY: <type name> <possible values> ) |
                                 ( FEATURE: <type name> <possible values> ) |
                                 ( CATEGORY: <type name> < <node definition> > )
    <possible values>        ::= < <default value> <unique lisp atom> >
    <default value>          ::= NoDefault | <unique lisp atom>
    <node definition>        ::= ( <node name> <feature set> <father node> )
    <node name>              ::= <unique lisp atom>
    <feature set>            ::= ( <feature value> ) | <empty>
    <father node>            ::= / <name of an already defined node> | <empty>
    <empty>                  ::=

Figure 3. The syntax of constituent structure and property definitions
    (CONSTITUENT: (FunctionRole ConstFeat PropOfLexeme MorphChar))
    (LEXICON-ENTRY: PropOfLexeme
        ((SyntCat SyntFeat) (SemCat SemFeat) (FrameCat LexFrame) AKO))
    (SUBTREE: MorphChar
        (Polar Voice Modal Tense Comparison Number Case PersonN PersonP Clit1 Clit2))
    (CATEGORY: SemCat
        < (Entity)
          (Concrete (+Concr) / Entity)
          (Animate (+Anim +Countable) / Concrete)
          (Human (+Hum) / Animate)
          (Animals / Animate)
          (NonAnim / Concrete)
          (Matter (-Countable) / NonAnim)
          (Thing (+Countable) / NonAnim) >)

Figure 4. An example of a constituent structure specification and the definition of a category tree
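The inheritance of features along the category tree of Figure 4 can be sketched as follows. The table below is transcribed from the SemCat definition; the dictionary layout and the function name `inherited_features` are assumptions made for this illustration and say nothing about how the DPL-system stores the tree internally.

```python
# Node name -> (own features, father node), transcribed from the SemCat
# tree of Figure 4. The representation itself is hypothetical.
SEM_CAT = {
    "Entity":   ([], None),
    "Concrete": (["+Concr"], "Entity"),
    "Animate":  (["+Anim", "+Countable"], "Concrete"),
    "Human":    (["+Hum"], "Animate"),
    "Animals":  ([], "Animate"),
    "NonAnim":  ([], "Concrete"),
    "Matter":   (["-Countable"], "NonAnim"),
    "Thing":    (["+Countable"], "NonAnim"),
}


def inherited_features(node, tree=SEM_CAT):
    """Collect the features of `node` and of all its ancestors."""
    features = []
    while node is not None:
        own, father = tree[node]
        features.extend(own)
        node = father
    return features


print(inherited_features("Human"))
# ['+Hum', '+Anim', '+Countable', '+Concr']
```

For the category Human the walk towards the root yields +Hum, +Anim, +Countable and +Concr, which agrees with the features the text mentions for that category.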
[...] implicit AND-operator. An arrow triggers defaults on: the elements of expressions to the right of an arrow are in the OR-relation and those to the left of it are in the AND-relation. Two kinds of arrows are in use. A simple arrow (->) performs all operations on the right and a double arrow (=>) terminates the execution at the first successful operation.
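The difference between the two arrows can be made concrete with a short sketch. It is an interpretation, not the compiler's generated code: conditions and operations are modelled as zero-argument Python callables, the OR-relation is read as "the expression succeeds if at least one operation succeeds", and the names `run_arrow` and `op` are hypothetical.

```python
def run_arrow(conditions, arrow, operations):
    """Evaluate one `conditions <arrow> operations` expression.

    conditions: callables combined with AND (left of the arrow).
    operations: callables combined with OR (right of the arrow).
    arrow: "->" runs every operation; "=>" stops at the first success.
    """
    if not all(cond() for cond in conditions):
        return False
    succeeded = False
    for operation in operations:
        if operation():
            succeeded = True
            if arrow == "=>":
                break
    return succeeded


log = []


def op(name, result):
    """Build an operation that records its name and returns `result`."""
    def run():
        log.append(name)
        return result
    return run


run_arrow([lambda: True], "->", [op("a", True), op("b", True)])
simple = list(log)    # both operations were executed
log.clear()
run_arrow([lambda: True], "=>", [op("a", True), op("b", True)])
double = list(log)    # execution stopped after the first success
print(simple, double)
```

With the simple arrow both operations run; with the double arrow the second one is never reached because the first already succeeded.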
In Figure 6 is an example of how one may define Subject. If the relation RecSubj holds between the regent and the dependent candidate the latter will be labelled Subject and subordinated to the former. The relational expression RecSubj defines the property patterns the constituents should match.

A grammar definition ends with the context specifications of constituents expressed as two-way automata. The automata are described using the notation shown in somewhat simplified form in Figure 7. An automaton can refer up to three constituents to the right or left using indexed names: L1, L2, L3, R1, R2 or R3.

    <function>          ::= ( FUNCTION: <function name> <operation expr> )
    <relation>          ::= ( RELATION: <relation name> <operation expr> )
    <operation expr>    ::= ( <predicate expr> <impl> <operation expr> ) |
                            <predicate expr> |
                            <relation name> |
                            ( DEL <constituent label> )
    <predicate expr>    ::= < <predicate expr> > |
                            ( <predicate expr> ) |
                            ( <constituent pointer> <operator> <value expr> )
    <impl>              ::= -> | =>
    <constituent label> ::= C | D
    <operator>          ::= = | := | :- | =:=
    <value expr>        ::= < <value expr> > |
                            ( <value expr> ) |
                            <value of some property> | '<lexeme> |
                            ( <property name> <constituent label> )

Figure 5. The syntax of DPL-functions and DPL-relations
    (FUNCTION: Subject
        ( RecSubj -> (D := Subject) ))
    (RELATION: RecSubj
        ((C = Act < Ind Cond Pot Imper >) (D = -Sentence +Nominal)
            -> ((D = Nom)
                  -> (D = PersPron (PersonP C) (PersonN C))
                     ((C = P) (D = PL))))
        ((D = Part) (C = S 3P)
            -> ((C = 'OLLA)
                  => (C :- +Existence))
               ((C = -Transitive +Existence))))

Figure 6. A realisation of Subject

    <state in autom>  ::= ( STATE: <state name> <direction> <state expr> )
    <state expr>      ::= ( <lhs of s expr> <impl> <state expr> ) |
                          ( <lhs of s expr> <impl> <state change> )
    <lhs of s expr>   ::= <function name> | <predicate expr>
    <state change>    ::= ( C := <name of next state> ) |
                          ( BUILD-PHRASE-ON <direction> <s state ch> ) |
                          ( PARSED )
    <state change>    ::= <work sp manip> <state change>
    <s state ch>      ::= ( C := <name of return state> )
    <work sp manip>   ::= ( DEL <constituent label> ) |
                          ( TRANSPOSE <constituent label> <constituent label> )

Figure 7. Simplified syntax of state specifications
    ( (D = +Phrase) -> (Subject -> (C := VS?))
                       (Adverbial -> (C := V?)) )
    ( (D = -Phrase) -> (BUILD-PHRASE-ON RIGHT (C := V?)) )

Figure 8. The expression of V? in Figure 2
[...] selects the dependent candidate normally as L1 or R1. A switch of state takes place by an assignment in the same way as linguistic properties are assigned. As an example the node V? of Figure 2 is defined formally in Figure 8.

More linguistically oriented argumentation of the DPL-formalism appears elsewhere (Nelimarkka, 1984a, and Nelimarkka, 1984b).
THE ARCHITECTURE OF THE DPL-ENVIRONMENT
The architecture of the DPL-environment is described schematically in Figure 9. The main parts are highlighted by heavy lines. Single arrows represent data transfer; double arrows indicate the production of data structures. All modules have been implemented in LISP. The realisations do not rely on specifics of underlying LISP-environments.
The DPL-compiler
A compilation results in executable code of a parser. The compiler produces highly [...] Internally data structures are only partly dynamic for the reason of fast information fetch. Ambiguities are expressed locally to minimize redundant search. The principle of structure sharing is followed whenever new data structures are built. In the manipulation of constituent structures there exists a special service routine for each combination of property and predication types. These routines take special care of time and memory consumption. For instance, with regard to replacements and insertions the copying includes physically only the path from the root of the list structure to the changed sublist. The logically shared parts will be shared also physically. This stipulation minimizes memory usage.
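The copying discipline described above is commonly known as path copying. The sketch below shows the idea on nested Python lists; it is an illustration only, the function name `replace_at` and the sample property values are hypothetical, and the service routines of the DPL-compiler operate on LISP list structures rather than on Python lists.

```python
def replace_at(tree, path, new_value):
    """Return a copy of `tree` with the element at `path` replaced.

    Only the lists along `path` are copied; every other sublist is
    shared with the original structure.
    """
    if not path:
        return new_value
    index, rest = path[0], path[1:]
    copy = list(tree)                  # shallow copy of this level only
    copy[index] = replace_at(tree[index], rest, new_value)
    return copy


# Hypothetical constituent structure, used only to demonstrate sharing.
old = [["SyntCat", "Noun"], ["SemCat", ["Human"]], ["Case", "Nom"]]
new = replace_at(old, [2, 1], "Part")

print(new[2])            # the changed sublist
print(new[0] is old[0])  # untouched sublists are physically shared
print(old[2])            # the original structure is unchanged
```

Because the original structure is never modified, alternative interpretations can keep referring to it while the new version reuses every untouched part.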
In the state transition network level the search is done depth first. To handle ambiguities DPL-functions and -relations process all alternative interpretations in parallel. In fact the alternatives are stored in the stacks and in the C-register as trees of alternants.
In the first version of the DPL-compiler the generation rules were intermixed with the compiler code. The maintenance of the compiler grew harder when we experimented with new computational features. We [...] metacompiler in which compilation is defined by rules. At the moment we are testing it and soon it will be in everyday use. The amount of LISP-code has been greatly reduced with the rule based approach, and we are now planning to install the DPL-environment into IBM PC.

[Figure 9. The architecture of the DPL-environment. Surviving labels: parser facility, lexicon maintenance, information extraction system with graphic output.]
Our parsers were aimed to be practical tools in real production applications. It was hence important to make the produced programs transferable. As of now we have a rule-based translator which converts parsers between LISP dialects. The translator accepts currently INTERLISP, FranzLISP and Common Lisp.

The environment has a special maintenance program for lexicons. The program uses video graphics to ease updating and it performs various checks to guarantee the consistency of the lexical entries. It also co-operates with the information extraction system to help the user in the selection of properties.
The Tracing Facility
The tracing facility is a convenient tool for grammar debugging. For example, in Figure 10 appears the trace of the parsing of the sentence "Poikani tuli illalla kentältä heittämästä kiekkoa" (= "My son came back in the evening from the stadium where he had been throwing the discus").

    .03 seconds
    0.0 seconds, garbage collection time
    PARSED_PATH ()

    => (POIKA) (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)           ?N
    (POIKA) <= (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)           N?
    => (POIKA) (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)           ?NFinal
    (##) (POIKA) (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)         NIL
    (POIKA) => (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)           ?V
    => ((POIKA) TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)           ?VS
    ((POIKA) TULLA) <= (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)           VS?
    ((POIKA) TULLA) (ILTA) <= (KENTTÄ) (HEITTÄÄ) (KIEKKO)           N?
    ((POIKA) TULLA) => (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)           ?NFinal
    ((POIKA) TULLA (ILTA)) => (KENTTÄ) (HEITTÄÄ) (KIEKKO)           ?NFinal
    ((POIKA) TULLA (ILTA)) <= (KENTTÄ) (HEITTÄÄ) (KIEKKO)           VS?
    ((POIKA) TULLA (ILTA) (KENTTÄ)) <= (HEITTÄÄ) (KIEKKO)           VS?
    DONE

Figure 10. A trace of parsing process

Each row represents a state of the parser before the control enters the state mentioned in the right-hand column. The thus-far found constituents are shown by the parentheses. An arrow head points from a dependent candidate (one which is subjected to dependency tests) towards the current constituent.
The tracing facility gives also the consumed CPU-time and two quality indicators: search efficiency and connection efficiency. Search efficiency is 100%, if no useless state transitions took place in the search. This figure is meaningless when the system is parameterized to full search because then all transitions are tried.
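The two indicators can be written down as simple ratios. Connection efficiency follows the definition the paper gives for it (connections remaining in a result divided by connections attempted for it). For search efficiency the paper only states when the value is 100%, so the formula used here, the share of state transitions that were not useless, is an assumption, as are the function names and the treatment of an empty search.

```python
def connection_efficiency(remaining, attempted):
    """Connections remaining in the result / connections attempted for it."""
    if attempted == 0:
        return 1.0  # assumption: an empty search counts as fully efficient
    return remaining / attempted


def search_efficiency(useless, total):
    """Assumed formula: share of state transitions that were not useless."""
    if total == 0:
        return 1.0  # assumption, as above
    return (total - useless) / total


ce = connection_efficiency(remaining=5, attempted=8)
se = search_efficiency(useless=0, total=12)
print(f"connection efficiency {ce:.1%}, search efficiency {se:.0%}")
```

The numbers in the usage lines are invented for the example and are not measurements from the DPL-environment.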
Connection efficiency is the ratio of the number of connections remaining in a result to the total number of connections attempted for it during the search. We are currently developing other measuring tools to extract statistical information, e.g. about the frequency distribution of [...] is also automatic book-keeping of all sentences input to the system. These will be divided into two groups: parsed and not parsed. The first group constitutes growing test material to ensure monotonic improvement of grammars: after a non-trivial change is done in the grammar, a new compiled parser runs all test sentences and the results are compared to the previous ones.
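The comparison of new results against the stored ones amounts to a regression check. A minimal sketch follows, assuming that parse results can be compared for equality and that an unparsed sentence is represented by `None`; the function name `regression_report`, the toy parser and the sample data are hypothetical.

```python
def regression_report(parse, sentences, previous):
    """Re-parse stored sentences and list those whose result changed.

    parse: function mapping a sentence to its parse result
           (None if the sentence is not parsed).
    previous: dict of sentence -> result recorded with the old grammar.
    """
    changed = []
    for sentence in sentences:
        result = parse(sentence)
        if result != previous.get(sentence):
            changed.append((sentence, previous.get(sentence), result))
    return changed


# Toy data: results recorded earlier, and a "new parser" that has lost
# one of the previously parsed sentences.
previous = {"a b": "(a b)", "c": "(c)"}
new_results = {"a b": "(a b)", "c": None}
changed = regression_report(new_results.get, list(previous), previous)
print(changed)
```

An empty report means that every previously parsed sentence still receives the same analysis, which is the monotonic improvement the book-keeping is meant to ensure.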
Information Extraction System
In an actual working situation there may be thousands of linguistic symbols in the work space. To make such a complex manageable, we have implemented an information system that for a given symbol pretty-prints all information associated with it.
The environment has routines for the graphic display of parsing results. A user can select information by pointing with the cursor. The example in Figure 11 demonstrates the use of this facility. The command SHOW() inquires the results of the parsing process described in Figure 10. The system replies by first printing the start state and then the found result(s) in compressed form. The cursor has been moved on top of this parse and CTRL-G has been typed. The system now draws the picture of the tree structure. Subsequently one of the nodes has been opened. The properties of the node POIKA appear pretty-printed. The user has furthermore asked information about the property type ConstFeat. All these operations are general; they do not use the special features of any particular terminal.

[Figure 11. An example of information extraction utilities. Surviving text of the display: the command _SHOW(); tree nodes and labels TULLA, Subject, HEITTÄÄ, Adverbial, KIEKKO, Object, Neutral; and for the property type ConstFeat:
    Default value: -Phrase
    Associated values: (+Declarative -Declarative +Main -Main +Nominal -Nominal +Phrase -Phrase +Predicative -Predicative +Relative -Relative
    Associated functions: (ConstFeat/INIT ConstFeat/FN ConstFeat/= ConstFeat/=:= ConstFeat/:- ]
CONCLUSION
The parsing strategy applied for the DPL-formalism was originally viewed as a cognitive model. It has proved to result in practical and efficient parsers as well. Experiments with a non-trivial set of Finnish sentence structures have been performed both on DEC-2060 and on VAX-11/780 systems. The analysis of an eight word sentence, for instance, takes between 20 and 600 ms of DEC CPU-time in the INTERLISP-version depending on whether one wants only the first or, through complete search, all parses for structurally ambiguous sentences. The MacLISP-version of the parser runs about 20 % faster on the same computer. The NIL-version (Common Lisp compatible) is about 5 times slower on VAX. The whole environment has been transferred also to FranzLISP on VAX. We have not yet focused on optimality issues in grammar descriptions. We believe that rearranging the orderings of expectations in the automata will improve efficiency.
REFERENCES
1. Lehtola, A., Compilation and Implementation of 2-way Tree Automata for the Parsing of Finnish. M.Sc. Thesis, Helsinki University of Technology, Department of Physics, 1984, 120 p. (in Finnish)
2. Nelimarkka, E., Jäppinen, H. and Lehtola, A., Two-way Finite Automata and Dependency Theory: A Parsing Method for Inflectional Free Word Order Languages. Proc. COLING84/ACL, Stanford, 1984a, pp. 389-392.
3. Nelimarkka, E., Jäppinen, H. and Lehtola, A., Parsing an Inflectional Free Word Order Language with Two-way Finite Automata. Proc. of the 6th European Conference on Artificial Intelligence, Pisa, 1984b, pp. 167-176.
4. Winograd, T., Language as a Cognitive Process. Volume I: Syntax. Addison-Wesley Publishing Company, Reading, 1983, 640 p.