
SEMANTIC PARSING AS GRAPH LANGUAGE TRANSFORMATION - A MULTIDIMENSIONAL APPROACH TO PARSING HIGHLY INFLECTIONAL LANGUAGES



Eero Hyvönen
Helsinki University of Technology
Digital Systems Laboratory
Otakaari 5A, 02150 Espoo 15, FINLAND

ABSTRACT

The structure of many languages with "free" word order and rich morphology, like Finnish, is configurational rather than linear. Although non-linear structures can be represented by linear formalisms, it is often more natural to study multidimensional arrangements of symbols. Graph grammars are a multidimensional generalization of linear string grammars: in graph grammars, string rewrite rules are generalized into graph rewrite rules. This paper presents a graph grammar formalism and a parsing scheme for parsing languages with an inherent configurational flavor. A small experimental Finnish parsing system has been implemented (Hyvönen 1983).

1 A SIMPLE GRAPH GRAMMAR FORMALISM WITH A CONTROL FACILITY

In applying string grammars to parsing natural Finnish, several problems arise in representing complex word structures, agreements, "free" word ordering, discontinuity, and intermediate dependencies between morphology, syntax and semantics. A strong, multidimensional formalism that can cope with different levels of language seems necessary. In this chapter a graph grammar formalism based on the notions of relational graph grammars (Rajlich 1975) and attributed programmed graph grammars (Bunke 1982) is developed for parsing languages with configurational structure.

Definition 1.1 (relational graph, r-graph)

Let ARCS, NODES, and PROPS be finite sets of symbols. A relational graph (r-graph) RG is a pair RG = (EDGES, NP) consisting of a set of edges

$$\mathrm{EDGES} \subseteq \mathrm{ARCS} \times \mathrm{NODES} \times \mathrm{NODES}$$

and a function NP that associates each node in EDGES with a set of labeled property values:

$$\mathrm{NP}: \mathrm{NODES} \times \mathrm{PROPS} \to \mathrm{PVALUES}$$

PVALUES is the set of possible node property values. They are represented as sets of symbols or lists.

Example: Figure 1.1 depicts the morphological r-graph representation of the Finnish word "ihmisten" (the humans') and its edges as a list. The EXT property expresses the set of symbols the node currently refers to (its extension); CAT tells the syntactico-semantic category of the node.

[Figure 1.1: star-like r-graph centered on node N1; legible node labels: EXT = (PL), EXT = (IHMINEN), CAT = (SUBST-IHMINEN)]

((NOUN N1 N2) (CASE N1 N3) (NR N1 N4) (PERS N1 N5) (PS N1 N6) (EP N1 N7))

Fig 1.1 Morphological r-graph representation of the word "ihmisten" (the humans')
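An r-graph in the sense of Definition 1.1 can be written down directly as data. The following Python sketch is only an illustration: the class name `RGraph`, its field names, and the placement of the EXT and CAT values on nodes N2 and N4 are assumptions, since the drawing of Fig 1.1 is not recoverable. Only the edge list is taken from the figure.

```python
from dataclasses import dataclass, field

Edge = tuple[str, str, str]  # (arc label, source node, target node)


@dataclass
class RGraph:
    """Relational graph: a set of labeled edges plus node properties (NP)."""
    edges: set[Edge] = field(default_factory=set)
    # NP maps (node, property name) to a set of symbols.
    props: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def nodes(self) -> set[str]:
        """All nodes that occur in at least one edge."""
        return {n for _, src, dst in self.edges for n in (src, dst)}


# Edge list of Fig 1.1 for the word "ihmisten".
ihmisten = RGraph(
    edges={
        ("NOUN", "N1", "N2"), ("CASE", "N1", "N3"), ("NR", "N1", "N4"),
        ("PERS", "N1", "N5"), ("PS", "N1", "N6"), ("EP", "N1", "N7"),
    },
    # Assumed placement of the property values (not recoverable from the figure).
    props={
        ("N2", "EXT"): {"IHMINEN"},
        ("N2", "CAT"): {"SUBST-IHMINEN"},
        ("N4", "EXT"): {"PL"},
    },
)

print(sorted(ihmisten.nodes()))
```

Storing the edges as a set of triples keeps the representation close to the list notation of the figure and makes the set operations used by the later definitions one-liners.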

Definition 1.2 (r-production)

An r-production RP is a pair:

RP = (LS, RS)

LS (left side) and RS (right side) are r-graphs. An RP is said to be applicable to an r-graph G iff $\mathrm{EDGES}_{LS} \subseteq \mathrm{EDGES}_{G}$ and the values in $\mathrm{NP}_{LS}$ are subsets of the corresponding values in $\mathrm{NP}_{G}$ for each node in LS.
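The applicability condition can be checked with two subset tests. The sketch below reads Definition 1.2 literally and assumes that the node names of LS already coincide with node names of G, so no matching or renaming of nodes is attempted. The function name `is_applicable`, the type aliases and the sample graphs are hypothetical.

```python
Edge = tuple[str, str, str]                # (arc label, source, target)
Props = dict[tuple[str, str], set[str]]    # NP: (node, property) -> symbols


def is_applicable(ls_edges: set[Edge], ls_props: Props,
                  g_edges: set[Edge], g_props: Props) -> bool:
    """Literal reading of Definition 1.2 (no node renaming is attempted)."""
    if not ls_edges <= g_edges:
        return False
    # Every property value set of LS must be contained in the one of G.
    return all(values <= g_props.get(key, set())
               for key, values in ls_props.items())


# Invented object graph: word W1 precedes word W2.
g_edges = {(">", "W1", "W2"), ("NR", "W1", "X1")}
g_props = {("W1", "CAT"): {"ADJECTIVE"}, ("W2", "CAT"): {"NOUN", "NOUN-HUMAN"}}

ls_edges = {(">", "W1", "W2")}
print(is_applicable(ls_edges, {("W2", "CAT"): {"NOUN"}}, g_edges, g_props))
print(is_applicable(ls_edges, {("W2", "CAT"): {"VERB"}}, g_edges, g_props))
```

The first call succeeds because both the edge and the CAT value of the left side occur in the object graph; the second fails on the property test alone.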

Definition 1.3 (direct r-derivation)

The direct r-derivation of r-graph H from r-graph G via an r-production RP = (LS, RS) is defined by the following algorithm:

Algorithm 1.1 (Direct r-derivation)

Input: An r-graph G and an r-production RP = (LS, RS)
Output: An r-graph H derived via RP from G

PROCEDURE Direct-r-derivation:
BEGIN
  IF RP is applicable to G (see text)
  THEN
    EDGES_G := EDGES_G - EDGES_LS
    H := G U RS
    RETURN H
  ELSE
    RETURN "Not applicable"
END

Here U is an operation defined for two r-graphs RG1 and RG2 as follows:

$$H = RG_1 \cup RG_2$$

iff

$$\mathrm{EDGES}_{H} = \mathrm{EDGES}_{RG_1} \cup \mathrm{EDGES}_{RG_2}$$

and

$$\mathrm{NP}_{H}(n_i, \mathit{prop}_j) = \mathrm{NP}_{RG_2}(n_i, \mathit{prop}_j)$$

for any property $\mathit{prop}_j$ in every node $n_i$ in RG2.
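Algorithm 1.1 and the U operation translate almost line by line into set and dictionary operations. The following Python sketch makes three assumptions that the text does not state: node names of LS coincide with those of G (no matching step), property values that RG2 does not mention are carried over from RG1, and `None` is returned where the algorithm returns "Not applicable". All names in the code are hypothetical.

```python
from typing import Optional

Edge = tuple[str, str, str]                # (arc label, source, target)
Props = dict[tuple[str, str], set[str]]    # NP: (node, property) -> symbols
RGraph = tuple[set[Edge], Props]           # (EDGES, NP)


def applicable(ls: RGraph, g: RGraph) -> bool:
    """Definition 1.2, read literally (no node renaming)."""
    ls_edges, ls_props = ls
    g_edges, g_props = g
    return ls_edges <= g_edges and all(
        values <= g_props.get(key, set()) for key, values in ls_props.items()
    )


def union(rg1: RGraph, rg2: RGraph) -> RGraph:
    """RG1 U RG2: edges are united, property values of RG2 take precedence."""
    # Assumption: properties that RG2 does not mention are kept from RG1.
    return rg1[0] | rg2[0], {**rg1[1], **rg2[1]}


def direct_r_derivation(g: RGraph, ls: RGraph, rs: RGraph) -> Optional[RGraph]:
    """Algorithm 1.1; returns None where the paper returns "Not applicable"."""
    if not applicable(ls, g):
        return None
    # Remove the left-side edges from G, then unite the rest with RS.
    return union((g[0] - ls[0], g[1]), rs)


# Invented example: replace arc "a" by arc "c" and overwrite EXT of N1.
g = ({("a", "N1", "N2"), ("b", "N2", "N3")}, {("N1", "EXT"): {"X"}})
ls = ({("a", "N1", "N2")}, {})
rs = ({("c", "N1", "N2")}, {("N1", "EXT"): {"Y"}})
h = direct_r_derivation(g, ls, rs)
print(h)
```

The function builds a new graph instead of updating G in place, so the same object graph can be reused when several productions are tried against it.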

Time complexity: Direct r-derivations are essentially set operations and can be performed efficiently. By using a hash table, the expected time complexity is O(n) with respect to the size of the production (it does not depend on the size of the object graph). The worst-case complexity is O(n^2).

Example: Figure 1.2 represents an r-production and figure 1.3 its application to an r-graph. We have designed a meta-production description facility for r-productions by which match-predicates can be attached to nodes and arcs in order to test and modify node properties. The instantiation of a [...] context-dependently while matching the production left side. It is also possible to specify some special modifications to the derivation graph by meta-productions.

Fig 1.2 Production ADJ-ATTR to identify adjective attributes

Definition 1.4 (r-graph grammar and r-graph language)

An r-graph grammar (RGG) is a pair:

RGG = (PROD, START)

PROD is a set of r-productions and START is a set of r-graphs.

An r-graph language (RGL) generated by an r-graph grammar is the set of all r-graphs derivable from any r-graph in START by any sequence of applicable r-productions of PROD:

$$\mathrm{RGL} = \{\, \text{r-graph} \mid \mathrm{START} \Rightarrow^{*} \text{r-graph} \,\}$$
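Because an r-graph language may be infinite, it can only be enumerated up to a bound. The sketch below performs a breadth-first search over derivations and stops after `max_steps` derivation steps. It is a simplification under stated assumptions: node properties are left out so that graphs are hashable frozensets of edges, no node matching is attempted, and the names `derive` and `language` are hypothetical.

```python
from collections import deque
from typing import Optional

Edge = tuple[str, str, str]
Graph = frozenset[Edge]              # node properties omitted in this sketch
Production = tuple[Graph, Graph]     # (LS edges, RS edges)


def derive(g: Graph, prod: Production) -> Optional[Graph]:
    """Direct r-derivation on edges only; None if the production is not applicable."""
    ls, rs = prod
    return (g - ls) | rs if ls <= g else None


def language(start: set[Graph], prods: list[Production],
             max_steps: int) -> set[Graph]:
    """All r-graphs derivable from START in at most max_steps derivations."""
    seen = set(start)                       # zero steps: START itself
    queue = deque((g, 0) for g in start)
    while queue:
        g, depth = queue.popleft()
        if depth == max_steps:
            continue
        for prod in prods:
            h = derive(g, prod)
            if h is not None and h not in seen:
                seen.add(h)
                queue.append((h, depth + 1))
    return seen


# Invented grammar: arc "a" may be rewritten to "b", and "b" to "c".
a = frozenset({("a", "N1", "N2")})
b = frozenset({("b", "N1", "N2")})
c = frozenset({("c", "N1", "N2")})
prods = [(a, b), (b, c)]
print(len(language({a}, prods, 1)), len(language({a}, prods, 2)))
```

The start graphs themselves belong to the language, since the derivation relation in the definition includes the empty sequence of productions.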

[Figure 1.3: r-graph before and after the production is applied; legible node labels include EXT = (PL), CAT = ADJECTIVE and EXT = (BIG); the AFTER panel carries the note "Node properties as above"]

Fig 1.3 The effect of applying production ADJ-ATTR (fig 1.2) to an r-graph

Definition 1.5 (controlled r-graph grammar)

A controlled r-graph grammar (CRG) is a pair:

CRG = (CG, RGG)

CG is an r-graph called the control graph (c-graph). Its interpretation is defined very much in the same way as with ATN networks. The actions associated with arcs are direct r-derivations (def 1.3). RGG is an r-graph grammar (def 1.4).

Example: Figure 1.4 illustrates a c-graph expressing potential attribute configurations of nouns belonging to the category NOUN-HUMAN. Adjective, pronoun and genitive attributes and a quantifier may be identified by the corresponding r-productions (the meaning of the (READWORD) and (PUT-LAST) arcs is not relevant here).


[Figure 1.4: control graph drawing; the only legible arc label is PRON-ATTR]

Fig 1.4 A control graph expressing attribute configurations of the syntactico-semantic word category NOUN-HUMAN

Definition 1.6 (controlled graph language)

A controlled graph language (CGL) corresponding to a controlled r-graph grammar CRG = (CG, RGG) is the set of r-graphs derived by the CG using the start graphs START and the productions of the grammar RGG.
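One way to picture a controlled graph language is as the result of walking a transition network whose arcs fire productions. The Python sketch below is an assumption-laden illustration, not the interpreter of the paper: the control graph is reduced to states and arcs labeled with production names, an arc can be followed only if its production is applicable, graphs reaching a final state are collected, and node properties are omitted. The state names, the ATTR arc label and the contents of the two productions are invented; only the names ADJ-ATTR, PRON-ATTR and KIND come from the text.

```python
from typing import Optional

Edge = tuple[str, str, str]
Graph = frozenset[Edge]              # node properties omitted for brevity
Production = tuple[Graph, Graph]     # (LS edges, RS edges)


def derive(g: Graph, prod: Production) -> Optional[Graph]:
    """Direct r-derivation on edges only; None if not applicable."""
    ls, rs = prod
    return (g - ls) | rs if ls <= g else None


def run_control_graph(arcs: dict[str, list[tuple[str, str]]],
                      productions: dict[str, Production],
                      start_state: str, final_states: set[str],
                      start_graph: Graph, max_steps: int = 10) -> set[Graph]:
    """Collect every graph that reaches a final state of the control graph."""
    results: set[Graph] = set()
    stack = [(start_state, start_graph, 0)]
    while stack:
        state, g, steps = stack.pop()
        if state in final_states:
            results.add(g)
        if steps == max_steps:          # guard against looping control graphs
            continue
        for prod_name, next_state in arcs.get(state, []):
            h = derive(g, productions[prod_name])
            if h is not None:           # the arc is passable only if applicable
                stack.append((next_state, h, steps + 1))
    return results


productions = {
    "ADJ-ATTR": (frozenset({(">", "A", "N")}), frozenset({("KIND", "N", "A")})),
    "PRON-ATTR": (frozenset({(">", "P", "A")}), frozenset({("ATTR", "N", "P")})),
}
arcs = {"S0": [("ADJ-ATTR", "S1")], "S1": [("PRON-ATTR", "S2")]}
start = frozenset({(">", "P", "A"), (">", "A", "N")})
result = run_control_graph(arcs, productions, "S0", {"S2"}, start)
print(result)
```

The step bound is a design choice of the sketch: it keeps the search finite when the control graph contains cycles, at the price of cutting off longer derivations.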

2 A GRAPH GRAMMAR PARSING SCHEME

2.1 Function and structure

Figure 2.1 depicts an RGG-based parsing scheme that we have applied to natural language parsing. Roughly speaking, the input of the parser, i.e. the set START of a CRG, is the morphological representation(s) of a sentence. The output is a set of corresponding semantic deep case representations. Parsing is seen as a multidimensional transformation between the morphological and semantic levels of a language. These levels are seen as graph languages. The parser essentially defines a "meaning preserving" mapping from the morphological representations of a sentence into its semantic representations. The transformation is specified by a controlled r-graph grammar. The control graph is not predefined but is constructed dynamically according to the individual words of the current sentence. During parsing, morphological and semantic representations are generated in parallel as words are read from left to right.

2.2 Specification of the morphological and semantic graph languages

Morphological level. The morphological representation of a sentence consists of star-like morphological representations of the words (fig 1.1) that are glued together by sequential >- and <-relations (fig 1.3).

Semantic level. The semantic representation of a sentence consists of a semantic deep case structure corresponding to the main verb. Deep case constituents have their own semantic case structures corresponding to their main words.

[Figure 2.1: block diagram with the labels SOURCE GRAPH LANGUAGE (MORPHOLOGY), Controlled r-graph grammar (CRG), INTERPRETER, PRODUCTIONS, and GOAL GRAPH LANGUAGE (SEMANTICS)]

Fig 2.1 A parsing scheme for transforming graph languages

Example: Figure 2.2 illustrates the semantic representation of the question "Kuka luennoitsija on luennoinut jonkun tietojenkäsittelyteoriasta syksyllä 1981" ("Which lecturer has lectured some seminar-type course on computer science in the autumn 1981").

Fig 2.2 Semantic graph representation of a Finnish question. Node properties are not shown.

2.3 Specification of the graph language transformation

The transformation is specified by an agenda of prioritized c-graphs. Initially, the agenda consists of a set of sentence independent "transformational" c-graphs (that, for example, transform passive clauses into active ones) and sentence dependent c-graphs corresponding to the syntactico-semantic categories of the individual words in the sentence. For example, the c-graph of fig 1.4 corresponds to nouns belonging to the category NOUN-HUMAN. It tries to identify semantic case constituents by the productions corresponding to the arcs. Fig 1.2 illustrates the production ADJ-ATTR (adjective attribute) used in the c-graph of fig 1.4. The interpretation of the production is: if there is an adjective preceding a noun in the same case and number, the words are in a semantic KIND relation with each other. As a whole, the agenda constitutes a modular, sentence dependent c-graph.
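The verbal interpretation of ADJ-ATTR can be restated as a small test over adjacent words. The Python sketch below is not the r-production of Fig 1.2; it only mirrors its stated meaning. It assumes that "preceding" means "immediately preceding", that the KIND arc points from the noun to the adjective, and that words are flat records instead of star-like r-graphs. The `Word` class, the feature codes and the sample word forms are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str
    cat: str    # syntactic category, e.g. "ADJECTIVE" or "NOUN"
    case: str   # grammatical case, e.g. "GEN"
    nr: str     # number, e.g. "PL"


def adj_attr(words: list[Word]) -> set[tuple[str, str, str]]:
    """Return KIND edges for adjective-noun pairs agreeing in case and number."""
    edges = set()
    for left, right in zip(words, words[1:]):
        if (left.cat == "ADJECTIVE" and right.cat == "NOUN"
                and left.case == right.case and left.nr == right.nr):
            # Assumed direction: from the noun to its adjective attribute.
            edges.add(("KIND", right.form, left.form))
    return edges


agreeing = [Word("isojen", "ADJECTIVE", "GEN", "PL"),
            Word("ihmisten", "NOUN", "GEN", "PL")]
clashing = [Word("iso", "ADJECTIVE", "NOM", "SG"),
            Word("ihmisten", "NOUN", "GEN", "PL")]
print(adj_attr(agreeing))
print(adj_attr(clashing))
```

Only the agreeing pair yields a KIND edge; the second pair fails the case and number test and produces nothing.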

Parsing is performed by interpreting the agenda. Different strategies could be applied here; the structure of the c-graphs depends on the choice. In our experimental system, parsing is performed by interpreting the first c-graph in the agenda. The c-graphs are defined in such a way that they interpret each other and glue morphological representations of words into the derivation graph (arcs (READWORD) and (PUTLAST) in fig 1.4) until a grammatical semantic representation (or, in ambiguous cases, several) is reached.
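Since the text leaves the interpretation strategy open, the following Python sketch shows only the simplest conceivable one: every c-graph on the agenda is interpreted once, in priority order. This is not the strategy of the experimental system, where the first c-graph is interpreted and the c-graphs then call on each other. Each c-graph is modeled as a plain function from derivation graph to derivation graph, a smaller number means a higher priority, and entry names are assumed to be unique so that ties never compare the functions. All names are hypothetical.

```python
import heapq
from typing import Callable

Edge = tuple[str, str, str]
Graph = frozenset[Edge]
CGraph = Callable[[Graph], Graph]    # stand-in for interpreting one c-graph


def interpret_agenda(agenda: list[tuple[int, str, CGraph]],
                     graph: Graph) -> Graph:
    """Interpret all c-graphs in priority order (smaller number first)."""
    heap = list(agenda)              # copy, so the caller's agenda is untouched
    heapq.heapify(heap)
    while heap:
        _priority, _name, c_graph = heapq.heappop(heap)
        graph = c_graph(graph)
    return graph


order: list[str] = []


def make_c_graph(name: str) -> CGraph:
    """Build a dummy c-graph that records its call and marks the graph."""
    def run(g: Graph) -> Graph:
        order.append(name)
        return g | {("DONE", name, name)}
    return run


agenda = [(2, "NOUN-HUMAN", make_c_graph("NOUN-HUMAN")),
          (1, "PASSIVE", make_c_graph("PASSIVE"))]
final_graph = interpret_agenda(agenda, frozenset())
print(order)
```

A heap keeps insertion of further c-graphs cheap, which would matter if c-graphs were allowed to add new entries to the agenda while it is being interpreted.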

2.4 Linguistic and computational motivations

The most influential linguistic theories and ideas behind our parser are dependency grammar, semantic case grammar, and the notion of "word expert" parsing. The idea is that the c-graphs of word categories actively try to find the dependents of the main words and identify what semantic roles they are in (cf. the ADJ-ATTR production of fig 1.2). In some cases it is useful to assign an active role to dependents. The c-graphs serve as illustrative linguistic descriptions of the syntactico-semantic features of word categories and other phenomena.

Computationally, our formalism and parsing scheme give high expressive power, but their time complexity is not high. Only potentially relevant productions are tried during parsing. Graphs are illustrative and can be used to express both procedural and declarative knowledge. New word category models can be added to the parser rather independently of the other models.

Our small experimental graph grammar parser for Finnish (Hyvönen 1983) is still linguistically quite naive, containing some 150 lexical entries, 50 productions, and 50 c-graphs. A larger subset of Finnish needs to be modelled in order to evaluate the approach properly. We are currently developing the graph grammar approach further by generalizing the formalism into hierarchic graphs. In this way, for example, large graph structures could be manipulated more easily as single entities, and identical structures could have different interpretations in different contexts. Also, a more elaborate coroutine-based control structure for interpreting the c-graphs is under development. We feel that the idea of seeing parsing as a multidimensional transformation of relational graphs, instead of as a delinearization process of a string into a parse tree, is worth investigating further.

3 ACKNOWLEDGEMENTS

Thanks are due to Rauno Heinonen, Harri Jäppinen, Leo Ojala, Jouko Seppänen and the personnel of the Digital Systems Laboratory for fruitful discussions. The Finnish Academy, the Finnish Cultural Foundation, the Siemens Foundation, and the Technical Foundation of Finland have supported our work financially.

4 REFERENCES

Bunke H. (1982): Attributed graph grammars and their application to schematic diagram interpretation. IEEE Trans. of pattern analysis and machine intelligence, No 6, pp 574-582.

Hyvönen E. (1983): Graph grammar approach to natural language parsing and understanding. Proceedings of IJCAI-83, Karlsruhe.

Rajlich V. (1975): Dynamics of discrete structures and pattern reproduction. Journal of computer and system sciences, No 11, pp 186-202.
