
INCORPORATING INHERITANCE AND FEATURE STRUCTURES INTO A LOGIC GRAMMAR FORMALISM

Harry H. Porter, III
Oregon Graduate Center
19600 N.W. Von Neumann Dr.
Beaverton, Oregon 97008-1999

ABSTRACT

Hassan Ait-Kaci introduced the ψ-term, an informational structure resembling feature-based functional structures but which also includes taxonomic inheritance (Ait-Kaci, 1984). We describe ψ-terms and how they have been incorporated into the Logic Grammar formalism. The result, which we call Inheritance Grammar, is a proper superset of DCG and includes many features of PATR-II. Its taxonomic reasoning facilitates semantic type-class reasoning during grammatical analysis.

INTRODUCTION

The Inheritance Grammar (IG) formalism is an extension of Hassan Ait-Kaci's work on ψ-terms (Ait-Kaci, 1984; Ait-Kaci and Nasr, 1986). A ψ-term is an informational structure similar to both the feature structure of PATR-II (Shieber, 1985; Shieber, et al, 1986) and the first-order term of logic. ψ-terms are ordered by subsumption and form a lattice in which unification of ψ-terms amounts to taking greatest lower bounds (GLB, ⊓). In Inheritance Grammar, ψ-terms are incorporated into a computational paradigm similar to the Definite Clause Grammar (DCG) formalism (Pereira and Warren, 1980). Unlike feature structures and first-order terms, the atomic symbols of ψ-terms are ordered in an IS-A taxonomy, a distinction that is useful in performing semantic type-class reasoning during grammatical analysis. We begin by discussing this ordering.

THE IS-A RELATION AMONG FEATURE VALUES

Like other grammar formalisms using feature-based functional structures, we will assume a fixed set of symbols called the signature. These symbols represent lexical, syntactic and semantic categories and other feature values. In many formalisms (e.g. DCG and PATR-II), equality is the only operation for symbols; in IG, symbols are related in an IS-A hierarchy. These relationships are indicated in the grammar using statements such as¹:

boy < masculineObject
girl < feminineObject
man < masculineObject
woman < feminineObject
{boy, girl} < child
{man, woman} < adult
{child, adult} < human

The symbol < can be read as "is a", and the notation {a1, ..., an} < b is an abbreviation for a1 < b, ..., an < b. The grammar writer need not distinguish between instances and classes, or between syntactic and semantic categories, when the hierarchy is specified. Such distinctions are only determined by how the symbols are used in the grammar. Note that this example ordering exhibits multiple inheritance: feminineObjects are not necessarily humans and humans are not necessarily feminineObjects, yet a girl is both a human and a feminineObject.
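The multiple-inheritance claim can be checked mechanically. The Python sketch below encodes the IS-A statements as a hypothetical `ISA` table and reads each symbol as the set of symbols at or below it. That reading is an assumption made here to give "GLB as set intersection" a concrete form; it is not necessarily the construction of Maier (1980). When an intersection matches no existing symbol, the lattice would need a new symbol.

```python
# Hypothetical encoding of the IS-A statements: symbol -> direct parents.
ISA = {
    "boy": {"masculineObject", "child"},
    "girl": {"feminineObject", "child"},
    "man": {"masculineObject", "adult"},
    "woman": {"feminineObject", "adult"},
    "child": {"human"},
    "adult": {"human"},
}
SYMBOLS = set(ISA).union(*ISA.values())


def is_a(a, b):
    """True if a < b follows from the IS-A statements (or a == b)."""
    return a == b or any(is_a(parent, b) for parent in ISA.get(a, ()))


def below(sym):
    """Set reading of a symbol: every symbol at or below it."""
    return frozenset(s for s in SYMBOLS if is_a(s, sym))


def glb(a, b):
    """GLB as set intersection.

    Returns the existing symbol whose set equals the intersection, the
    intersection itself when a new symbol must be added to the lattice,
    or None when it is empty (the symbols do not unify).
    """
    common = below(a) & below(b)
    if not common:
        return None
    for s in SYMBOLS:
        if below(s) == common:
            return s
    return common


print(is_a("girl", "human"), is_a("girl", "feminineObject"))  # True True
print(is_a("feminineObject", "human"))                         # False
print(glb("child", "feminineObject"))                          # girl
print(sorted(glb("human", "feminineObject")))                  # ['girl', 'woman']
print(glb("boy", "girl"))                                      # None
```

Under this reading the intersections for human ⊓ feminineObject and human ⊓ masculineObject are exactly the two categories (human females and human males) that the paper says are added automatically.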

Computation of LUB (⊔) and GLB (⊓) in arbitrary partial orders is problematic. In IG, the grammar writer specifies an arbitrary ordering which the rule execution system automatically embeds in a lattice by the addition of newly created symbols (Maier, 1980). Symbols may be thought of as standing for conceptual sets or semantic types, and the IS-A relationship can be thought of as set inclusion. Finding the GLB (i.e. unification of symbols) then amounts to set intersection. For the partial order specified above, two new symbols are automatically added, representing semantic categories implied by the IS-A statements, i.e. human females and human males. The first new category (human females) can be thought of as the intersection of human and feminineObject or as the union of girl and woman², and similarly for human males. The signature resulting from the IS-A statements is shown in Figure 1.

¹ Symbols appearing in the grammar but not in the IS-A statements are taken to be unrelated.

² Or anything in between. One is the most liberal interpretation, the other the most conservative. The signature could be extended by adding both classes, and any number in between.

ψ-TERMS AS FEATURE STRUCTURES

Much work in computational linguistics is focused around the application of unification to an informational structure that maps attribute names (also called feature names, slot names, or labels) to values (Kay, 1984a; Kay, 1984b; Shieber, 1985; Shieber, et al, 1986). A value is either atomic or (recursively) another such mapping. These mappings are called by various names: feature structures, functional structures, f-structures, and feature matrices. The feature structures of PATR-II are most easily understood by viewing them as directed, acyclic graphs (DAGs) whose arcs are annotated with feature labels and whose leaves are annotated with atomic feature values (Shieber, 1985).
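A minimal sketch of the label-to-value mappings described above, assuming Python dicts for mappings and strings for atomic values. It unifies only tree-shaped structures (no shared substructure), and, as for PATR-II symbols, atoms unify only when they are equal. The names `unify_fs` and `UnificationFailure` are hypothetical.

```python
from typing import Union

# A feature structure is an atomic value or a mapping from labels to values.
FS = Union[str, dict]


class UnificationFailure(Exception):
    pass


def unify_fs(a: FS, b: FS) -> FS:
    """Unify two tree-shaped feature structures; inputs are not modified."""
    if isinstance(a, str) or isinstance(b, str):
        if a == b:  # atomic values: equality is the only test
            return a
        raise UnificationFailure(f"{a!r} does not unify with {b!r}")
    result = dict(a)
    for label, value in b.items():
        # Shared labels unify recursively; labels in only one input are kept.
        result[label] = unify_fs(result[label], value) if label in result else value
    return result


print(unify_fs({"cat": "np", "agreement": {"number": "singular"}},
               {"agreement": {"person": "third"}}))
# {'cat': 'np', 'agreement': {'number': 'singular', 'person': 'third'}}
```

The ψ-terms introduced next differ in two ways that this sketch ignores: atomic symbols are compared by GLB in the signature rather than by equality, and the graph may share or even cycle back to nodes.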

IGs use ψ-terms, an informational structure that is best described as a rooted, possibly cyclic, directed graph. Each node (both leaf and interior) is annotated with a symbol from the signature. Each arc of the graph is labelled with a feature label (an attribute). The set of feature labels is unordered and is distinct from the signature. The formal definition of ψ-terms, given in set-theoretic terms, is complicated in several ways beyond the scope of this presentation; see the definition of well-formed types in (Ait-Kaci, 1984). We give several examples to give the flavor of ψ-terms.

Feature structures are often represented using a bracketed matrix notation, in addition to the DAG notation. ψ-terms, on the other hand, are represented using a textual notation similar to that of first-order terms. The syntax of the textual representation is given by the following extended BNF grammar³:

term        ::= symbol [ featureList ] | featureList
featureList ::= ( feature , feature , ... , feature )
feature     ::= label => term | label => variable [ : term ]

³ The vertical bar separates alternate constituents, brackets enclose optional constituents, and ellipses are used (loosely) to indicate repetition. The characters ( ) => , and : are terminals.

Figure 1. A signature (nodes include feminineObject, human, masculineObject, adult, humanFemale, humanMale, child).

Our first example contains the symbols np, singular, and third. The label of the root node, np, is called the head symbol. This ψ-term contains two features, labelled by number and person.

np(number => singular,
   person => third)
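The extended BNF above maps directly onto a recursive-descent parser. The Python sketch below is one hypothetical reading of it: the `Term` and `Var` classes and the `parse` function are assumptions, a name starting with an uppercase letter is treated as a variable (following the naming convention the paper states for variables), and list notation and elided numeric labels are not handled.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

TOKEN = re.compile(r"=>|[(),:]|\w+|\S")


@dataclass
class Var:
    name: str                      # uppercase-initial, e.g. "X"
    term: Optional["Term"] = None  # subterm written after "X:", if any


@dataclass
class Term:
    symbol: Optional[str]          # None: the top symbol is implied
    features: Dict[str, Union["Term", Var]] = field(default_factory=dict)


class Parser:
    """Recursive-descent parser for the extended BNF of psi-term syntax."""

    def __init__(self, text: str):
        self.tokens: List[str] = TOKEN.findall(text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def word(self) -> str:
        tok = self.peek()
        if tok is None or not re.fullmatch(r"\w+", tok):
            raise SyntaxError(f"expected a name, got {tok!r}")
        self.pos += 1
        return tok

    def term(self) -> Term:
        # term ::= symbol [ featureList ] | featureList
        symbol = None if self.peek() == "(" else self.word()
        features = self.feature_list() if self.peek() == "(" else {}
        return Term(symbol, features)

    def feature_list(self) -> Dict[str, Union[Term, Var]]:
        # featureList ::= ( feature , feature , ... , feature )
        self.expect("(")
        features = {}
        while True:
            label, value = self.feature()
            features[label] = value
            if self.peek() != ",":
                break
            self.expect(",")
        self.expect(")")
        return features

    def feature(self) -> Tuple[str, Union[Term, Var]]:
        # feature ::= label => term | label => variable [ : term ]
        label = self.word()
        self.expect("=>")
        tok = self.peek()
        if tok is not None and tok[0].isupper():
            name = self.word()
            sub = None
            if self.peek() == ":":
                self.expect(":")
                sub = self.term()
            return label, Var(name, sub)
        return label, self.term()


def parse(text: str) -> Term:
    parser = Parser(text)
    result = parser.term()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at {parser.peek()!r}")
    return result


t = parse("np(number => singular, person => third)")
print(t.symbol, sorted(t.features))  # np ['number', 'person']
```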

The next example includes a subterm at agreement=>:

(agreement => (number => singular,
               person => third))

In this ψ-term the head symbol is missing, as is the head symbol of the subterm. When a symbol is missing, the most general symbol of the signature (⊤) is implied.

In traditional first-order terms, a variable serves two purposes. First, as a wild card, it serves as a place holder which will match any term. Second, as a tag, one variable can constrain several positions in the term to be filled by the same structure. In ψ-terms, the wild card function is filled by the maximal symbol of the signature (⊤), which will match any ψ-term during unification. Variables are used exclusively for the tagging function to indicate ψ-term coreference. By convention, variables always begin with an uppercase letter, while symbols and labels begin with lowercase letters and digits.

In the following ψ-term, representing The […] used to identify the subject of wants with the subject of dance.

sentence(
    subject => X: man,
    predicate => wants,
    verbComp => clause(
        subject => X,
        predicate => dance,
        object => mary))

If a variable X appears in a term tagging a subterm t, then all subterms tagged by other occurrences of X must be consistent with (i.e. unify with) t⁴. If a variable appears without a subterm following it, the term consisting of simply the top symbol (⊤) is assumed. The constraint implied by variable coreference is not just equality of structure but equality of reference: further unifications that add information to one sub-structure will necessarily add it to the other. Thus, in this example, X constrains the terms appearing at the paths subject=> and verbComp=>subject=> to be the same term.
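Equality of reference, as opposed to equality of structure, corresponds to object identity in a pointer-based representation. In the Python sketch below, built on a hypothetical `Node` class, the tag X of the example becomes a single object reachable by two paths. The `number` feature added along one path is an illustrative assumption, not part of the paper's example.

```python
class Node:
    """A psi-term node: a head symbol plus labelled arcs to other nodes."""

    def __init__(self, symbol="top", **arcs):  # "top" stands in for the top symbol
        self.symbol = symbol
        self.arcs = dict(arcs)

    def at(self, *labels):
        """Follow a path of feature labels from this node."""
        node = self
        for label in labels:
            node = node.arcs[label]
        return node


# The variable X becomes one shared Node object.
x = Node("man")
sentence = Node(
    "sentence",
    subject=x,
    predicate=Node("wants"),
    verbComp=Node("clause", subject=x, predicate=Node("dance"), object=Node("mary")),
)

# Information added at subject=> is visible at verbComp=>subject=>.
sentence.at("subject").arcs["number"] = Node("singular")  # hypothetical feature
print(sentence.at("verbComp", "subject") is sentence.at("subject"))  # True
print(sentence.at("verbComp", "subject", "number").symbol)           # singular
```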

In the ψ-term representation of the sentence The man with the toupee sneezed, shown below, the np filling the subject role, X, has two attributes. One is a qualifier filled by a relativeClause whose subject is X itself.

sentence(
    subject => X: np(
        head => man,
        qualifier => relativeClause(
            subject => X,
            predicate => wear,
            object => toupee)),
    predicate => sneezed)

As the graphical representation of this term in Figure 2 clearly shows, this ψ-term is cyclic.
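Cyclicity can be detected with an ordinary depth-first search that looks for an arc back to a node still on the current search path. The Python sketch below (hypothetical `Node` and `is_cyclic`) builds the toupee example, where X reaches itself through qualifier=>subject=>, and for contrast a term in which one node is shared by two paths without any cycle.

```python
class Node:
    """A psi-term node: a head symbol plus labelled arcs to other nodes."""

    def __init__(self, symbol, **arcs):
        self.symbol = symbol
        self.arcs = dict(arcs)


def is_cyclic(root):
    """Depth-first search for an arc back to a node on the current path."""
    on_path, finished = set(), set()

    def visit(node):
        if node in on_path:
            return True
        if node in finished:
            return False
        on_path.add(node)
        found = any(visit(child) for child in node.arcs.values())
        on_path.discard(node)
        finished.add(node)
        return found

    return visit(root)


# The man with the toupee sneezed: X's relative clause has X as its subject.
x = Node("np", head=Node("man"))
x.arcs["qualifier"] = Node("relativeClause", subject=x,
                           predicate=Node("wear"), object=Node("toupee"))
sneezed = Node("sentence", subject=x, predicate=Node("sneezed"))

# Sharing without a cycle: one node reached by two paths.
y = Node("man")
dag = Node("sentence", subject=y, verbComp=Node("clause", subject=y))

print(is_cyclic(sneezed), is_cyclic(dag))  # True False
```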

UNIFICATION OF ψ-TERMS

The unification of two ψ-terms is similar to the unification of two feature structures in PATR-II or two first-order terms in logic. Unification of two terms t1 and t2 proceeds as follows. First, the head symbols of t1 and t2 are unified; that is, the GLB of the two symbols in the signature lattice becomes the head symbol of the result. Second, the subterms of t1 and t2 are unified. When t1 and t2 both contain the feature f, the corresponding subterms are unified and added as feature f of the result. If one term, say t1, contains feature f and the other term does not, then the result will contain feature f with the value from t1. This is the same result that would obtain if t2 contained feature f with value ⊤. Finally, the subterm coreference constraints implied by the variables in t1 and t2 are respected. That is, the result is the least constrained ψ-term such that if two paths (addresses) in t1 (or t2) are tagged by the same variable (i.e. they corefer), then they will corefer in the result.

⁴ Normally, the subterm at X will be written following the first occurrence of X, and all other occurrences of X will not include subterms.

For example, when the ψ-term

(agreement => X: (number => singular),
 subject => (agreement => X))

is unified with

(subject => (agreement => (person => third)))

the result is

(agreement => X: (number => singular,
                  person => third),
 subject => (agreement => X))
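The three steps above (GLB of head symbols, feature-by-feature unification, respecting coreference) can be sketched as a destructive graph unification in the style of union-find. This is a hypothetical Python sketch, not the IG interpreter. Variables are represented by shared `Node` objects; a merged node is forwarded through a `ref` pointer that is set before recursing, so cyclic terms terminate; there is no trail for undoing a failed unification; and `glb` is simplified to comparable symbols of a small hypothetical `ISA` table rather than a completed lattice.

```python
TOP = "top"  # stands in for the maximal symbol of the signature

# Hypothetical fragment of an IS-A ordering: symbol -> direct parents.
ISA = {"girl": {"child", "feminineObject"}, "child": {"human"}}


def is_a(a, b):
    """a < b or a == b via the IS-A table; every symbol is below TOP."""
    return a == b or b == TOP or any(is_a(p, b) for p in ISA.get(a, ()))


def glb(a, b):
    """Simplified GLB: the lower of two comparable symbols, else None."""
    if is_a(a, b):
        return a
    if is_a(b, a):
        return b
    return None


class UnificationFailure(Exception):
    pass


class Node:
    """A psi-term node: head symbol, labelled arcs, and a forwarding pointer."""

    def __init__(self, symbol=TOP, **arcs):
        self.symbol = symbol
        self.arcs = dict(arcs)
        self.ref = None  # set when this node is merged into another one


def deref(node):
    while node.ref is not None:
        node = node.ref
    return node


def unify(a, b):
    """Destructively unify two psi-term graphs and return the merged node."""
    a, b = deref(a), deref(b)
    if a is b:
        return a
    symbol = glb(a.symbol, b.symbol)
    if symbol is None:
        raise UnificationFailure(f"{a.symbol} and {b.symbol} have no GLB")
    b.ref = a  # forward b to a before recursing, so cycles terminate
    a.symbol = symbol
    for label, child in b.arcs.items():
        if label in a.arcs:
            unify(a.arcs[label], child)
        else:
            a.arcs[label] = child  # feature present in only one term
    return a


def at(node, *path):
    """Follow a path of labels, dereferencing forwarded nodes."""
    node = deref(node)
    for label in path:
        node = deref(node.arcs[label])
    return node


# The example from the text; the variable X becomes one shared node.
x = Node(number=Node("singular"))
t1 = Node(agreement=x, subject=Node(agreement=x))
t2 = Node(subject=Node(agreement=Node(person=Node("third"))))
result = unify(t1, t2)
print(at(result, "agreement") is at(result, "subject", "agreement"))  # True
print(at(result, "agreement", "person").symbol)                       # third
print(unify(Node("human"), Node("girl")).symbol)                      # girl
```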

INHERITANCE GRAMMARS

An IG consists of several IS-A statements and several grammar rules. A grammar rule is a definite clause which uses ψ-terms in place of the first-order literals used in first-order logic programming⁵. Much of the notation of Prolog and DCGs is used. In particular, the :- symbol separates a rule head from the ψ-terms comprising the rule body. Analogously to Prolog, list notation (using [, |, and ]) can be used as a shorthand for ψ-terms representing lists and containing head and tail features. When the --> symbol is used instead of :-, the rule is treated as a context-free grammar rule, and the interpreter automatically appends two additional arguments (start and end) to facilitate parsing. The final syntactic sugar allows feature labels to be elided; sequentially numbered numeric labels are automatically supplied.

Our first simple Inheritance Grammar consists of the rules:

sent --> noun(Num), verb(Num).
noun(plural) --> [cats].
verb(plural) --> [meow].
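The translation of --> rules described above can be sketched as a string-to-string rewrite. The Python sketch below is hypothetical: the function names and the tuple encoding of rules are assumptions, and bodies that mix terminals and nonterminals are not handled. It numbers elided labels 1, 2, ..., threads start/end variables P1, P2, ... through the body, and expands a terminal list into nested list(head => ..., tail => ...) terms, which reproduces the expanded clauses the paper gives for these three rules.

```python
from typing import List, Tuple, Union

Goal = Tuple[str, List[str]]         # (predicate name, positional arguments)
Body = Union[List[Goal], List[str]]  # nonterminal goals, or a list of words


def with_labels(name: str, args: List[str], start: str, end: str) -> str:
    """Number the elided labels 1, 2, ... and append start and end."""
    feats = [f"{i} => {arg}" for i, arg in enumerate(args, 1)]
    feats += [f"start => {start}", f"end => {end}"]
    return f"{name}({', '.join(feats)})"


def word_list(words: List[str], tail: str) -> str:
    """Expand list notation into nested list(head => ..., tail => ...) terms."""
    term = tail
    for word in reversed(words):
        term = f"list(head => {word}, tail => {term})"
    return term


def translate(head: Goal, body: Body) -> str:
    """Rewrite a --> rule as an IG clause (mixed bodies not handled)."""
    name, args = head
    if all(isinstance(item, str) for item in body):
        # Terminal list, e.g. noun(plural) --> [cats]; [] gives start => L, end => L.
        return with_labels(name, args, word_list(body, "L"), "L") + "."
    points = [f"P{i}" for i in range(1, len(body) + 2)]
    goals = [with_labels(g, a, points[i], points[i + 1])
             for i, (g, a) in enumerate(body)]
    return (with_labels(name, args, points[0], points[-1]) + " :-\n    "
            + ",\n    ".join(goals) + ".")


print(translate(("sent", []), [("noun", ["Num"]), ("verb", ["Num"])]))
print(translate(("noun", ["plural"]), ["cats"]))
print(translate(("verb", ["plural"]), ["meow"]))
```

The goal clause's list argument is produced the same way, e.g. word_list(["cats", "meow"], "nil").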

The sentence to be parsed is supplied as a goal clause, as in:

:- sent([cats, meow], []).

The interpreter first translates these clauses into the following equivalent IG clauses, expanding away the notational sugar, before execution begins.

sent(start => P1, end => P3) :-
    noun(1 => Num, start => P1, end => P2),
    verb(1 => Num, start => P2, end => P3).
noun(1 => plural,
     start => list(head => cats, tail => L),
     end => L).
verb(1 => plural,
     start => list(head => meow, tail => L),
     end => L).
:- sent(start => list(
            head => cats,
            tail => list(
                head => meow,
                tail => nil)),
        end => nil).

As this example indicates, every DCG is an Inheritance Grammar. However, since the arguments may be arbitrary ψ-terms, IG can also accommodate feature structure manipulation.

⁵ This is to be contrasted with LOGIN, in which ψ-terms replace first-order terms rather than predications.

Figure 2. Graphical representation of a ψ-term.

TYPE-CLASS REASONING IN PARSING

Several logic-based grammars have used semantic categorization of verb arguments to disambiguate word senses and fill case slots (e.g. Dahl, 1979; Dahl, 1981; McCord, 1980). The primary motivation for using ψ-terms for grammatical analysis is to facilitate such semantic type-class reasoning during the parsing stage.

As an example, the DCG presented in (McCord, 1980) uses unification to do taxonomic reasoning. Two types unify iff one is a subtype of the other; the result is the most specific type. For example, if the first-order term smith:_, representing an untyped individual⁶, is unified with the type expression X:person:student, representing the student subtype of person, the result is smith:person:student.

⁶ Here the colon is used as a right-associative infix operator meaning subtype.

While this grammar achieves extensive coverage, we perceive two shortcomings to the approach. (1) The semantic hierarchy is somewhat inflexible because it is distributed throughout the lexicon, rather than being maintained separately. (2) Multiple inheritance is not accommodated (although see McCord, 1985). In IG, the ψ-term student can act as a typed variable and unifies with the ψ-term smith (yielding smith), assuming the presence of IS-A statements such as:

student < person
{smith, jones, brown} < student

The taxonomy is specified separately (even with the potential of dynamic modification), and multiple inheritance is accommodated naturally.
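A hypothetical Python sketch of the typed-variable behaviour: `unify_symbols` returns the greatest symbol below both arguments, using a separately maintained `ISA` table that holds the statements above. Because the taxonomy lives in one table, adding a symbol at run time (the `green` entry below is invented for illustration) changes unification results without touching any rule. The sketch assumes the table needs no lattice completion and raises an error otherwise.

```python
# Hypothetical table holding the IS-A statements quoted above.
ISA = {"student": {"person"}, "smith": {"student"},
       "jones": {"student"}, "brown": {"student"}}


def is_a(a, b):
    """True if a < b follows from the table (or a == b)."""
    return a == b or any(is_a(p, b) for p in ISA.get(a, ()))


def unify_symbols(a, b):
    """GLB of two symbols: the greatest symbol below both, or None on failure."""
    symbols = set(ISA).union(*ISA.values()) | {a, b}
    common = {s for s in symbols if is_a(s, a) and is_a(s, b)}
    if not common:
        return None
    for s in common:
        if all(is_a(t, s) for t in common):
            return s
    # Several maximal lower bounds: the lattice would need a new symbol.
    raise ValueError(f"no single GLB for {a} and {b}; lattice completion needed")


print(unify_symbols("student", "smith"))  # smith  (student acts as a typed variable)
print(unify_symbols("person", "student"))  # student
print(unify_symbols("smith", "jones"))     # None   (distinct individuals)

ISA["green"] = {"student"}  # hypothetical symbol added at run time
print(unify_symbols("student", "green"))   # green
```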

OTHER GRAMMATICAL APPLICATIONS OF TAXONOMIC REASONING

The taxonomic reasoning mechanism of IG has applications in lexical and syntactic categorization as well as in semantic type-class reasoning. As an illustration which uses ψ-term predications, consider the problem of writing a grammar that accepts a prepositional phrase or a relative clause after a noun phrase but only accepts a prepositional phrase after the verb phrase. So The flower under the tree wilted, The […] and relativeClause are npModifiers, but only a prepositionalPhrase is a vpModifier. The following highly abbreviated IG shows one simple solution:

{prepositionalPhrase, relativeClause} < npModifier
prepositionalPhrase < vpModifier

sent() --> np(), vp(), vpModifier().
np() --> np(), npModifier().
np() --> ...
vp() --> ...
prepositionalPhrase() --> ...
relativeClause() --> ...
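Why a vpModifier() goal selects only the prepositionalPhrase rule can be sketched by checking, for each rule head, whether its symbol has a non-empty GLB with the goal's symbol. The Python below is a hypothetical sketch: it reads each symbol as the set of symbols at or below it, so two symbols unify exactly when those sets intersect. This is an assumption about the completed lattice, not code from the paper.

```python
# Hypothetical encoding of the two IS-A statements above.
ISA = {"prepositionalPhrase": {"npModifier", "vpModifier"},
       "relativeClause": {"npModifier"}}
SYMBOLS = set(ISA).union(*ISA.values())
RULE_HEADS = ["prepositionalPhrase", "relativeClause"]


def is_a(a, b):
    """True if a < b follows from the IS-A statements (or a == b)."""
    return a == b or any(is_a(p, b) for p in ISA.get(a, ()))


def below(sym):
    """Set reading of a symbol: every symbol at or below it."""
    return {s for s in SYMBOLS if is_a(s, sym)}


def rules_for(goal):
    """Rule heads whose symbol has a non-empty GLB with the goal's symbol."""
    return [head for head in RULE_HEADS if below(head) & below(goal)]


print(rules_for("npModifier"))  # ['prepositionalPhrase', 'relativeClause']
print(rules_for("vpModifier"))  # ['prepositionalPhrase']
```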

IMPLEMENTATION

We have implemented an IG development environment in Smalltalk on the Tektronix 4406. The IS-A statements are handled by an ordering package which dynamically performs the lattice extension and which allows interactive display of the ordering. Many of the techniques used in standard depth-first Prolog execution have been carried over to IG execution.

To speed grammar execution, our system precompiles the grammar rules. To speed grammar development, incremental compilation allows individual rules to be compiled when modified. We are currently developing a large grammar using this environment.

As in Prolog, top-down evaluation is not […] ren, 1980; Porter, 1986), a sound and complete evaluation strategy for logic programs, frees the writer of DCGs from the worry of infinite left-recursion. Earley Deduction is essentially a generalized form of chart parsing (Kaplan, 1973; Winograd, 1983), applicable to DCGs. We are investigating the application of alternative execution strategies, such as Earley Deduction and Extension Tables (Dietrich and Warren, 1986), to the execution of IGs.
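Earley Deduction generalizes chart parsing to definite clauses. The Python sketch below shows only the context-free special case (Earley's recognizer), over a hypothetical toy grammar that includes the left-recursive rule np --> np, npModifier, with words replaced by their categories. It illustrates why a chart-based strategy does not loop on left recursion: an item already recorded in a chart column is never predicted again. It does not unify ψ-terms.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]
Item = Tuple[str, Tuple[str, ...], int, int]  # (lhs, rhs, dot position, origin)


def recognize(grammar: Grammar, start: str, tokens: List[str]) -> bool:
    """Earley recognizer for a context-free grammar with no empty rules.

    Symbols that are not keys of `grammar` are terminals, matched against
    the tokens. An item already in a chart column is never added again,
    which is what stops left-recursive rules from looping.
    """
    chart: List[Set[Item]] = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new: List[Item] = []
            if dot < len(rhs) and rhs[dot] in grammar:  # predict
                new = [(rhs[dot], r, 0, i) for r in grammar[rhs[dot]]]
            elif dot < len(rhs):  # scan
                if i < len(tokens) and tokens[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:  # complete: origin < i because there are no empty rules
                new = [(l2, r2, d2 + 1, o2) for (l2, r2, d2, o2) in chart[origin]
                       if d2 < len(r2) and r2[d2] == lhs]
            for item in new:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[-1])


# Hypothetical toy grammar; words are replaced by their categories.
GRAMMAR = {
    "sent": [("np", "vp")],
    "np": [("np", "npModifier"), ("det", "noun")],  # left-recursive
    "npModifier": [("pp",)],
    "pp": [("prep", "np")],
    "vp": [("verb",)],
}
# The flower under the tree wilted
print(recognize(GRAMMAR, "sent", ["det", "noun", "prep", "det", "noun", "verb"]))  # True
print(recognize(GRAMMAR, "sent", ["det", "noun", "verb", "prep"]))                # False
```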

ACKNOWLEDGEMENTS

Valuable interactions with the following people are gratefully acknowledged: Hassan Ait-Kaci, David Maier, David S. Warren, Fernando Pereira, and Lauri Karttunen.

REFERENCES

Ait-Kaci, Hassan. 1984. A Lattice Theoretic Approach to Computation Based on a Calculus of Partially Ordered Type Structures. Ph.D. Dissertation, University of Pennsylvania, Philadelphia, PA.

Ait-Kaci, Hassan and Nasr, Roger. 1986. LOGIN: A Logic Programming Language with Built-in Inheritance. Journal of Logic Programming.

Dahl, Veronica. 1979. Logical Design of Deductive NL Consultable Data Bases. Proc. 5th Intl. Conf. on Very Large Data Bases, Rio de Janeiro.

Dahl, Veronica. 1981. Translating Spanish into Logic through Logic. Am. Journal of […]

Dietrich, Susan Wagner and Warren, David S. 1986. Extension Tables: Memo Relations in Logic Programming. Technical Report 86/18, C.S. Dept., SUNY, Stony Brook, New York.

Kaplan, Ronald. 1973. A General Syntactic Processor. In: Randall Rustin, Ed., […] Press, New York, NY.

Kay, Martin. 1984a. Functional Unification Grammar: A Formalism for Machine Translation. Proc. 22nd Ann. Meeting of the […] Stanford University, Palo Alto, CA.

Kay, Martin. 1984b. Unification in Grammar. Natural Lang. Understanding and […] INRIA, Rennes, France.

Maier, David. 1980. DAGs as Lattices: Extended Abstract. Unpublished manuscript.

McCord, Michael C. 1980. Using Slots and Modifiers in Logic Grammars for Natural Language. Artificial Intelligence, 18(3):327-368.

McCord, Michael C. 1985. Modular Logic Grammars. Proc. of the 23rd ACL Conference, Chicago, IL.

Pereira, F.C.N. and Warren, D.H.D. 1980. Definite Clause Grammars for Language Analysis: A Survey of the Formalism and a Comparison with Augmented Transition Networks. Artificial Intelligence, 13:231-278.

Pereira, F.C.N. and Warren, D.H.D. 1983. Parsing as Deduction. 21st Annual Meeting of the […]ton, MA.

Porter, Harry H. 1986. Earley Deduction. Technical Report CS/E-86-002, Oregon Graduate Center, Beaverton, OR.

Shieber, Stuart M. 1985. An Introduction to Unification-Based Approaches to Grammar. Tutorial Session Notes, 23rd Annual Meeting of the […]cago, IL.

Shieber, S.M., Pereira, F.C.N., Karttunen, L. and Kay, M. 1986. A Compilation of Papers […] and Information, Stanford.

Winograd, Terry. 1983. Language as a […] Wesley, Reading, MA.
