
Scientific report: "LANGUAGE-BASED ENVIRONMENT FOR NATURAL LANGUAGE PARSING"


Lehtola, A., Jäppinen, H., Nelimarkka, E.

SITRA Foundation (*) and

Helsinki University of Technology

Helsinki, Finland

ABSTRACT

This paper introduces a special programming environment for the definition of grammars and for the implementation of corresponding parsers. In natural language processing systems it is advantageous to have linguistic knowledge and processing mechanisms separated. Our environment accepts grammars consisting of binary dependency relations and grammatical functions. Well-formed expressions of functions and relations provide constituent surroundings for syntactic categories in the form of two-way automata. These relations, functions, and automata are described in a special definition language.

In focusing on high level descriptions a linguist may ignore computational details of the parsing process. He writes the grammar into a DPL-description and a compiler translates it into efficient LISP-code. The environment has also a tracing facility for the parsing process, grammar-sensitive lexical maintenance programs, and routines for the interactive graphic display of parse trees and grammar definitions. Translator routines are also available for the transport of compiled code between various LISP-dialects. The environment itself exists currently in INTERLISP and FRANZLISP. This paper focuses on knowledge engineering issues and does not enter linguistic argumentation.

INTRODUCTION

Our objective has been to build a parser for Finnish to work as a practical tool in real production applications. In the beginning of our work we were faced with two major problems. First, so far there was no formal description of the Finnish grammar. The second difficulty was that Finnish differs greatly in structure from the Indo-European languages. Finnish has relatively free word order, and syntactico-semantic knowledge in a sentence is often expressed in the inflections of the words. Therefore existing parsing methods for Indo-European languages (e.g. ATN, DCG, LFG etc.) did not seem to grasp the idiosyncrasies of Finnish.

The parser system we have developed is based on functional dependency. Grammar is specified by a family of two-way finite automata and by dependency function and relation definitions. Each automaton expresses the valid dependency context of one constituent type. In an abstract sense the working storage of the parser consists of two constituent stacks and of a register which holds the current constituent (Figure 1).

[Figure 1. The working storage of DPL-parsers: the register of the current constituent, with the left stack positions L1, L2, L3 and the right stack positions R1, R2, R3.]

(*) SITRA Foundation, P.O. Box 329, SF-00121 Helsinki, Finland

[Figure 2. A two-way automaton for Finnish verbs. Recoverable labels: arc tests on +Phrase / -Phrase, Subject, Adverbial and Nominal, an empty left-hand side, and end of input; states BUILD PHRASE ON RIGHT and FIND REGENT ON RIGHT. Notation: arc conditions apply to the dependent candidate if not otherwise stated; double circles denote entries and exits of an automaton; inside each state is expressed the manner of operation.]

The two stacks hold the right and left contexts of the current constituent. The parsing process is always directed by the expectations of the current constituent. Dynamic local control is realized by permitting the automata to activate one another. The basic decision for the automaton associated with the current constituent is to accept or reject a neighbor via a valid syntactico-semantic subordinate relation. Acceptance subordinates the neighbor, and it disappears from the stack. The structure an input sentence receives is an annotated tree of such binary relations.
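
The working storage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' LISP implementation: the class names `Constituent` and `WorkingStorage`, the methods `accept_left` and `accept_right`, and the three-word example are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Constituent:
    """A word or phrase together with the dependents it has accepted so far."""
    head: str
    dependents: list = field(default_factory=list)  # (relation label, Constituent) pairs


@dataclass
class WorkingStorage:
    """Two stacks around a register; the last element of each stack is the nearest neighbor."""
    left: list
    current: Constituent
    right: list

    def accept_left(self, label: str) -> None:
        """Subordinate the nearest left neighbor (L1) to the current constituent."""
        neighbor = self.left.pop()  # the accepted neighbor disappears from the stack
        self.current.dependents.append((label, neighbor))

    def accept_right(self, label: str) -> None:
        """Subordinate the nearest right neighbor (R1) to the current constituent."""
        neighbor = self.right.pop()
        self.current.dependents.append((label, neighbor))


# The verb is the current constituent, with one neighbor on each side.
ws = WorkingStorage(left=[Constituent("POIKA")],
                    current=Constituent("TULLA"),
                    right=[Constituent("ILTA")])
ws.accept_left("Subject")
ws.accept_right("Adverbial")
print([(label, dep.head) for label, dep in ws.current.dependents])
```

Each acceptance adds one labelled binary relation, so the finished structure is a tree rooted at the constituent that was never subordinated.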

An automaton for verbs is described in Figure 2. When a verb becomes the current constituent for the first time it will enter the automaton through the START node. The automaton expects to find a dependent from the left (?V). If the left neighbor has the constituent feature [...] Subject and then for Object. When a function test succeeds, the neighbor will [...] the state indicated by arcs. The double circle states denote entry and exit points of the automaton.

If completed constituents do not exist as neighbors, an automaton may defer decision. In Figure 2 the states labelled "BUILD PHRASE ON RIGHT" and "FIND REGENT ON RIGHT" push the verb to the left stack and pop the right stack for the current constituent. When the verb is activated later on, the control flow will continue from the state expressed in the deactivation command.
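
A deferral of this kind might look as follows. The sketch assumes a hypothetical `Item` record that carries the state to resume from, and a hypothetical helper `build_phrase_on_right`; neither name comes from the paper, and the word list is only an example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    head: str
    state: Optional[str] = None  # state to resume from; None means "enter at START"


def build_phrase_on_right(left, current, right, return_state):
    """Deactivate `current` and make its nearest right neighbor the current constituent."""
    current.state = return_state   # remembered, as in the deactivation command
    left.append(current)           # push the deferred constituent to the left stack
    new_current = right.pop()      # pop the right stack for the new current constituent
    return left, new_current, right


left, right = [], [Item("KIEKKO"), Item("ILTA")]  # last element = nearest neighbor (R1)
left, current, right = build_phrase_on_right(left, Item("TULLA"), right, "VS?")
print(current.head, left[-1].head, left[-1].state)
```

When the deferred verb is later popped back into the register, its stored `state` tells the automaton where to continue.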

There are two distinct search strategies involved. If a single parse is sufficient, the graphs (i.e. the automata) are searched depth first [...]

[...] expressed in a special conditional expression formalism DPL (for Dependency [...] inflectional languages as well.

DPL-DESCRIPTIONS

The main object in DPL is a constituent [...] structures of the lexical entries are also [...] these declarations can be seen in Figure 3. [...] referred in a uniform manner using their values straight. The system automatically [...] details associated to property types. For example, the system is automatically tuned to notice the inheritance of properties in [...] multidimensional analysis has been one of [...] the DPL-formalism. Patterning can be done [...] set associated to constituents can easily be extended.

An example of a constituent structure and [...] definitions further specify ConstFeat and [...] definition of a category tree SemCat is given. This tree has sets of property [...] DPL-system automatically takes care of [...] constituent that belongs to the semantic category Human the system automatically [...] +Countable, and +Concr.

[...] relations are defined using the syntax in [...] value the binary construct built from the current constituent (C) and its dependent candidate (D), or it returns NIL. [...] pairs of C and D constituents that have passed the associated predicate filter. By choosing operators a user may vary a predication between simple equality (=) [...]

    <constituent structure>   ::= ( CONSTITUENT: <list of properties> )
    <subtree of constituent>  ::= ( SUBTREE: <glue node> <list of properties> ) |
                                  ( LEXICON-ENTRY: <glue node> <list of properties> )
    <list of properties>      ::= ( <list of properties> ) | ( <property name> )
    <property name>           ::= <type name> | <glue node name>
    <type name>               ::= <unique lisp atom>
    <glue node name>          ::= <unique lisp atom>
    <glue node>               ::= <glue node name in upper level>

    <property declaration>    ::= ( PROPERTY: <type name> <possible values> ) |
                                  ( FEATURE: <type name> <possible values> ) |
                                  ( CATEGORY: <type name> < <node definition> > )
    <possible values>         ::= < <default value> <unique lisp atom> >
    <default value>           ::= NoDefault | <unique lisp atom>
    <node definition>         ::= ( <node name> <feature set> <father node> )
    <node name>               ::= <unique lisp atom>
    <feature set>             ::= ( <feature value> ) | <empty>
    <father node>             ::= / <name of an already defined node> | <empty>
    <empty>                   ::=

Figure 3. The syntax of constituent structure and property definitions


    (CONSTITUENT:   (FunctionRole ConstFeat PropOfLexeme MorphChar))
    (LEXICON-ENTRY: PropOfLexeme
                    ((SyntCat SyntFeat) (SemCat SemFeat) (FrameCat LexFrame) AKO))
    (SUBTREE:       MorphChar
                    (Polar Voice Modal Tense Comparison Number Case PersonN PersonP Clit1 Clit2))
    (CATEGORY:      SemCat
                    < (Entity)
                      (Concrete (+Concr) / Entity)
                      (Animate (+Anim +Countable) / Concrete)
                      (Human (+Hum) / Animate)
                      (Animals / Animate)
                      (NonAnim / Concrete)
                      (Matter (-Countable) / NonAnim)
                      (Thing (+Countable) / NonAnim) >)

Figure 4. An example of a constituent structure specification and the definition of a category tree
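
The inheritance along the SemCat tree of Figure 4 can be illustrated with a small Python sketch. The dictionary encoding and the function name `inherited_features` are assumptions made for this example; only the node names, feature values and father links are taken from the figure.

```python
# Category tree from Figure 4: node name -> (own features, father node or None).
SEM_CAT = {
    "Entity":   ((), None),
    "Concrete": (("+Concr",), "Entity"),
    "Animate":  (("+Anim", "+Countable"), "Concrete"),
    "Human":    (("+Hum",), "Animate"),
    "Animals":  ((), "Animate"),
    "NonAnim":  ((), "Concrete"),
    "Matter":   (("-Countable",), "NonAnim"),
    "Thing":    (("+Countable",), "NonAnim"),
}


def inherited_features(tree, node):
    """Collect the features of `node` and of all its ancestors, nearest first."""
    features = []
    while node is not None:
        own, father = tree[node]
        features.extend(f for f in own if f not in features)
        node = father
    return features


print(inherited_features(SEM_CAT, "Human"))
```

Walking from Human up to Entity yields +Hum, +Anim, +Countable and +Concr, which is the kind of automatic association the text describes for a constituent of the semantic category Human.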

[...] implicit AND-operator. An arrow triggers defaults on: the elements of expressions to the right of an arrow are in the OR-relation and those to the left of it are in the AND-relation. Two kinds of arrows are in use. A simple arrow (->) performs all operations on the right and a double arrow (=>) terminates the execution at the first successful operation.
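
The difference between the two arrows can be sketched in Python as follows. This is one interpretation under explicit assumptions: failure is modelled as `None` (standing in for NIL), and the function names `simple_arrow` and `double_arrow` and the helper `op` are hypothetical.

```python
def simple_arrow(conditions, operations):
    """'->': if all left-hand conditions hold, perform every operation on the right."""
    if not all(cond() for cond in conditions):   # left-hand side is an AND
        return None
    results = [op() for op in operations]        # every operation is executed
    succeeded = [r for r in results if r is not None]
    return succeeded or None                     # right-hand side is an OR


def double_arrow(conditions, operations):
    """'=>': if all left-hand conditions hold, stop at the first successful operation."""
    if not all(cond() for cond in conditions):
        return None
    for op in operations:
        result = op()
        if result is not None:
            return [result]
    return None


calls = []  # records which operations were actually executed


def op(name, result):
    def run():
        calls.append(name)
        return result
    return run


ops = [op("a", None), op("b", "B"), op("c", "C")]
print(simple_arrow([lambda: True], ops), calls)
calls.clear()
print(double_arrow([lambda: True], ops), calls)
```

With the simple arrow all three operations run and both successes are kept; with the double arrow execution stops after `b`, so `c` is never tried.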

In Figure 6 is an example of how one may define Subject. If the relation RecSubj holds between the regent and the dependent candidate the latter will be labelled Subject and subordinated to the former. The relational expression RecSubj defines the property patterns the constituents should match.

A grammar definition ends with the context specifications of constituents expressed as two-way automata. The automata are described using the notation shown in somewhat simplified form in Figure 7. An automaton can refer up to three constituents to the right or left using indexed names: L1, L2, L3, R1, R2 or R3.

    <function>          ::= ( FUNCTION: <function name> <operation expr> )
    <relation>          ::= ( RELATION: <relation name> <operation expr> )
    <operation expr>    ::= ( <predicate expr> <impl> <operation expr> ) |
                            <predicate expr> |
                            <relation name> |
                            ( DEL <constituent label> )
    <predicate expr>    ::= < <predicate expr> > |
                            ( <predicate expr> ) |
                            ( <constituent pointer> <operator> <value expr> )
    <impl>              ::= -> | =>
    <constituent label> ::= C | D
    <operator>          ::= = | := | :- | =:=
    <value expr>        ::= < <value expr> > |
                            ( <value expr> ) |
                            <value of some property> | '<lexeme> |
                            ( <property name> <constituent label> )

Figure 5. The syntax of DPL-functions and DPL-relations

    (FUNCTION: Subject
      ( RecSubj -> (D := Subject) )
    )
    (RELATION: RecSubj
      ( (C = Act <Ind Cond Pot Imper>) (D = -Sentence +Nominal)
        -> ( (D = Nom)
             -> (D = PersPron (PersonP C) (PersonN C))
                ( (C = P) (D = PL) ) ) )
      ( (D = Part) (C = S3P)
        -> ( (C = 'OLLA)
             => (C :- +Existence) )
           ( (C = -Transitive +Existence) ) ) )

Figure 6. A realisation of Subject

    <state in autom>   ::= ( STATE: <state name> <direction> <state expr> )
    <state expr>       ::= ( <lhs of s. expr> <impl> <state expr> ) |
                           ( <lhs of s. expr> <impl> <state change> )
    <lhs of s. expr>   ::= <function name> | <predicate expr>
    <state change>     ::= ( C := <name of next state> ) |
                           ( BUILD-PHRASE-ON <direction> <s. state ch.> ) |
                           ( PARSED )
    <state change>     ::= <work sp. manip.> <state change>
    <s. state ch.>     ::= ( C := <name of return state> )
    <work sp. manip.>  ::= ( DEL <constituent label> ) |
                           ( TRANSPOSE <constituent label> <constituent label> )

Figure 7. Simplified syntax of state specifications

    ( (D = +Phrase) -> ( Subject -> (C := VS?) )
                       ( Adverbial -> (C := V?) ) )
    ( (D = -Phrase) -> ( BUILD-PHRASE-ON RIGHT (C := V?) ) )

Figure 8. The expression of V? in Figure 2
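
Read procedurally, the state expression of Figure 8 could be paraphrased as the Python sketch below. It is an interpretation, not compiler output: the function `state_v`, the dictionary representation of the dependent candidate D, and the toy predicates `is_subject` and `is_adverbial` (including the case labels they test) are all assumptions for illustration.

```python
def state_v(d, subject, adverbial):
    """Return the state changes that V? proposes for dependent candidate `d`."""
    changes = []
    if d.get("Phrase") == "+":
        # '->' performs every operation on its right-hand side.
        if subject(d):
            changes.append(("C :=", "VS?"))
        if adverbial(d):
            changes.append(("C :=", "V?"))
    elif d.get("Phrase") == "-":
        # No completed phrase on the right yet: defer and return to V? later.
        changes.append(("BUILD-PHRASE-ON", "RIGHT", "V?"))
    return changes


# Hypothetical stand-ins for the DPL functions Subject and Adverbial.
def is_subject(d):
    return d.get("Case") == "Nom"


def is_adverbial(d):
    return d.get("Case") in {"Ade", "Abl"}


print(state_v({"Phrase": "+", "Case": "Nom"}, is_subject, is_adverbial))
print(state_v({"Phrase": "-"}, is_subject, is_adverbial))
```

Returning a list rather than a single state keeps room for the case where more than one function test succeeds for the same candidate.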


[...] selects the dependent candidate normally as L1 or R1. A switch of state takes place by an assignment in the same way as linguistic properties are assigned. As an example the node V? of Figure 2 is defined formally in Figure 8.

More linguistically oriented argumentation of the DPL-formalism appears elsewhere (Nelimarkka, 1984a, and Nelimarkka, 1984b).

THE ARCHITECTURE OF THE DPL-ENVIRONMENT

The architecture of the DPL-environment is described schematically in Figure 9. The main parts are highlighted by heavy lines. Single arrows represent data transfer; double arrows indicate the production of data structures. All modules have been implemented in LISP. The realisations do not rely on specifics of underlying LISP-environments.

The DPL-compiler

A compilation results in executable code of a parser. The compiler produces highly [...] Internally data structures are only partly dynamic for the reason of fast information fetch. Ambiguities are expressed locally to minimize redundant search. The principle of structure sharing is followed whenever new data structures are built.

In the manipulation of constituent structures there exists a special service routine for each combination of property and predication types. These routines take special care of time and memory consumption. For instance, with regard to replacements and insertions the copying includes physically only the path from the root of the list structure to the changed sublist. The logically shared parts will be shared also physically. This stipulation minimizes memory usage.
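
The copying discipline described here is what is now often called path copying. A minimal Python sketch, assuming nested tuples in place of LISP list structures and a hypothetical helper `replace_at`, shows the idea; the property values in the example are invented.

```python
def replace_at(tree, path, new_value):
    """Return a copy of nested tuple `tree` with the element at `path` replaced.

    Only the nodes on the path from the root to the change are rebuilt;
    every other subtree is reused by reference.
    """
    if not path:
        return new_value
    index, rest = path[0], path[1:]
    return tree[:index] + (replace_at(tree[index], rest, new_value),) + tree[index + 1:]


morph = ("Sg", "Nom")
lexeme = ("Noun", ("Human", "+Hum"))
old = ("Subject", lexeme, morph)
new = replace_at(old, (2, 1), "Part")  # change the case value inside `morph`

print(new)
print(new[1] is old[1])  # the untouched subtree is physically shared
```

The old structure stays intact, which matters when several alternative interpretations of the same constituent are kept side by side.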

In the state transition network level the search is done depth first. To handle ambiguities DPL-functions and -relations process all alternative interpretations in parallel. In fact the alternatives are stored in the stacks and in the C-register as trees of alternants.

In the first version of the DPL-compiler the generation rules were intermixed with the compiler code. The maintenance of the compiler grew harder when we experimented with new computational features. We [...] metacompiler in which compilation is defined by rules. At the moment we are testing it and soon it will be in everyday use. The amount of LISP-code has greatly reduced with the rule based approach, and we are now planning to install the DPL-environment into IBM PC.

[Figure 9. The architecture of the DPL-environment. Recoverable labels: parser facility, lexicon maintenance, information extraction system with graphic output.]

Our parsers were aimed to be practical tools in real production applications. It was hence important to make the produced programs transferable. As of now we have a rule-based translator which converts parsers between LISP dialects. The translator accepts currently INTERLISP, FranzLISP and Common Lisp.

The environment has a special maintenance program for lexicons. The program uses video graphics to ease updating and it performs various checks to guarantee the consistency of the lexical entries. It also co-operates with the information extraction system to help the user in the selection of properties.

The Tracing Facility

The tracing facility is a convenient tool for grammar debugging. For example, in Figure 10 appears the trace of the parsing of the sentence "Poikani tuli illalla kentältä heittämästä kiekkoa" (= "My son came back in the evening from the stadium where he had been throwing the discus").

    [...] conses
    .03 seconds
    0.0 seconds, garbage collection time
    PARSED_PRTH ( )

                                  => (POIKA) (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)   ?N
    (POIKA)                       <= (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)           N?
                                  => (POIKA) (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)   ?NFinal
    (##) (POIKA) (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                               NIL
    (POIKA)                       => (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)           ?V
                                  => ((POIKA) TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)   ?VS
    ((POIKA) TULLA)               <= (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                   VS?
    ((POIKA) TULLA) (ILTA)        <= (KENTTÄ) (HEITTÄÄ) (KIEKKO)                          N?
    ((POIKA) TULLA)               => (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                   ?NFinal
    ((POIKA) TULLA (ILTA))        => (KENTTÄ) (HEITTÄÄ) (KIEKKO)                          ?NFinal
    ((POIKA) TULLA (ILTA))        <= (KENTTÄ) (HEITTÄÄ) (KIEKKO)                          VS?
    ((POIKA) TULLA (ILTA) (KENTTÄ)) <= (HEITTÄÄ) (KIEKKO)                                 VS?
    [...]
    DONE

Figure 10. A trace of parsing process

Each row represents a state of the parser before the control enters the state mentioned on the right-hand column. The thus-far found constituents are shown by the parenthesis. An arrow head points from a dependent candidate (one which is subjected to dependency tests) towards the current constituent.

The tracing facility gives also the consumed CPU-time and two quality indicators: search efficiency and connection efficiency. Search efficiency is 100%, if no useless state transitions took place in the search. This figure is meaningless when the system is parameterized to full search because then all transitions are tried.

Connection efficiency is the ratio of the number of connections remaining in a result to the total number of connections attempted for it during the search. We are currently developing other measuring tools to extract statistical information, e.g. about the frequency distribution of [...] is also automatic book-keeping of all sentences input to the system. These will be divided into two groups: parsed and not parsed. The first group constitutes growing test material to ensure monotonic improvement of grammars: after a non-trivial change is done in the grammar, a new compiled parser runs all test sentences and the results are compared to the previous ones.
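
The book-keeping and re-testing cycle can be sketched as follows. The paper does not describe how results are stored or compared, so the function names `record` and `run_regression`, the dictionary storage, and the toy parsers built from `dict.get` are all assumptions for illustration.

```python
def record(sentence, result, parsed, not_parsed):
    """Book-keeping: file the sentence under 'parsed' or 'not parsed'."""
    if result is None:
        not_parsed.add(sentence)
    else:
        parsed[sentence] = result


def run_regression(parse, baseline):
    """Re-parse every previously parsed sentence and report results that changed.

    `parse` maps a sentence to a parse result (or None if parsing fails);
    `baseline` maps each previously parsed sentence to its stored result.
    """
    changed = {}
    for sentence, previous in baseline.items():
        result = parse(sentence)
        if result != previous:
            changed[sentence] = (previous, result)
    return changed


# Toy parsers: a lookup table stands in for a compiled parser.
parsed, not_parsed = {}, set()
old_parser = {"poika tuli": "((POIKA) TULLA)"}.get
for s in ["poika tuli", "tuli poika"]:
    record(s, old_parser(s), parsed, not_parsed)

new_parser = {"poika tuli": "((POIKA) TULLA)", "tuli poika": "(TULLA (POIKA))"}.get
print(run_regression(new_parser, parsed))  # empty when no stored result has changed
print(sorted(not_parsed))
```

An empty report after a grammar change means that every sentence that parsed before still yields the same result, which is the monotonic improvement the text aims at.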

Information Extraction System

In an actual working situation there may be thousands of linguistic symbols in the work space. To make such a complex manageable, we have implemented an information system that for a given symbol pretty-prints all information associated with it.

The environment has routines for the graphic display of parsing results. A user can select information by pointing with the cursor. The example in Figure 11 demonstrates the use of this facility. The command SHOW() inquires the results of the parsing process described in Figure 10. The system replies by first printing the start state and then the found result(s) in compressed form. The cursor has been moved on top of this parse and CTRL-G has been typed. The system now draws the picture of the tree structure. Subsequently one of the nodes has been opened. The properties of the node POIKA appear pretty-printed. The user has furthermore asked information about the property type ConstFeat. All these operations are general; they do not use the special features of any particular terminal.

    _SHOW ( )

    [Parse tree display. Recoverable node and relation labels: TULLA, Subject, HEITTÄÄ, Adverbial, KIEKKO, Object, Neutral.]

    Default value: -Phrase
    Associated values: (+Declarative -Declarative +Main -Main +Nominal -Nominal
                        +Phrase -Phrase +Predicative -Predicative +Relative -Relative [...]
    Associated functions: (ConstFeat/INIT ConstFeat/FN ConstFeat/= ConstFeat/=:= ConstFeat/:- [...]

Figure 11. An example of information extraction utilities

CONCLUSION

The parsing strategy applied for the DPL-formalism was originally viewed as a cognitive model. It has proved to result in practical and efficient parsers as well. Experiments with a non-trivial set of Finnish sentence structures have been performed both on DEC-2060 and on VAX-11/780 systems. The analysis of an eight word sentence, for instance, takes between 20 and 600 ms of DEC CPU-time in the INTERLISP-version, depending on whether one wants only the first or, through complete search, all parses for structurally ambiguous sentences. The MacLISP-version of the parser runs about 20 % faster on the same computer. The NIL-version (Common Lisp compatible) is about 5 times slower on VAX. The whole environment has been transferred also to FranzLISP on VAX. We have not yet focused on optimality issues in grammar descriptions. We believe that rearranging the orderings of expectations in the automata will improve efficiency.

REFERENCES

1. Lehtola, A., Compilation and Implementation of 2-way Tree Automata for the Parsing of Finnish. M.Sc. Thesis, Helsinki University of Technology, Department of Physics, 1984, 120 p. (in Finnish)

2. Nelimarkka, E., Jäppinen, H. and Lehtola, A., Two-way Finite Automata and Dependency Theory: A Parsing Method for Inflectional Free Word Order Languages. Proc. COLING84/ACL, Stanford, 1984a, pp. 389-392.

3. Nelimarkka, E., Jäppinen, H. and Lehtola, A., Parsing an Inflectional Free Word Order Language with Two-way Finite Automata. Proc. of the 6th European Conference on Artificial Intelligence, Pisa, 1984b, pp. 167-176.

4. Winograd, T., Language as a Cognitive [...] Addison-Wesley Publishing Company, Reading, 1983, 640 p.
