
Semantic Relevance and Aspect Dependency in a Given Subject Domain


Contents-driven algorithmic processing of fuzzy word meanings to form dynamic stereotype representations

Burghard B. Rieger
Arbeitsgruppe für mathematisch-empirische Systemforschung (MESY)
German Department, Technical University of Aachen, Aachen, West Germany

ABSTRACT

Cognitive principles underlying the (re-)construction of word meaning and/or world knowledge structures are poorly understood yet. In a rather sharp departure from more orthodox lines of introspective acquisition of structural data on meaning and knowledge representation in cognitive science, an empirical approach is explored that analyses natural language data statistically, represents its numerical findings fuzzy-set theoretically, and interprets its intermediate constructs (stereotype meaning points) topologically as elements of semantic space. As connotative meaning representations, these elements allow an aspect-controlled, contents-driven algorithm to operate which reorganizes them dynamically in dispositional dependency structures (DDS-trees) which constitute a procedurally defined meaning representation format.

0 Introduction

Modelling system structures of word meanings and/or world knowledge is to face the problem of their mutual and complex relatedness. As the cognitive principles underlying these structures are poorly understood yet, the work of psychologists, AI-researchers, and linguists active in that field appears to be determined by the respective discipline's general line of approach rather than by consequences drawn from these approaches' intersecting results in their common field of interest. In linguistic semantics, cognitive psychology, and knowledge representation most of the necessary data concerning lexical, semantic and/or external world information is still provided introspectively. Researchers are exploring (or make test-persons explore) their own linguistic/cognitive capacities and memory structures to depict their findings (or let hypotheses about them be tested) in various representational formats (lists, arrays, trees, nets, active networks, etc.). It is widely accepted that these model structures do have a more or less ad hoc character and tend to be confined to their limited theoretical or operational performances within a specified approach, subject domain or implemented system. Basically interpretative approaches like these, however, lack the most salient characteristics of more constructive model structures that can be developed along the lines of an entity-relationship approach (CHEN 1980). Their properties of flexibility and dynamics are needed for automatic meaning representation from input texts to build up and/or modify the realm and scope of their own knowledge, however baseline and vague that may appear compared to human understanding.

In a rather sharp departure from those more orthodox lines of introspective data acquisition in meaning and knowledge representation research, the present approach (1) has been based on the algorithmic analysis of discourse that real speakers/writers produce in actual situations of performed or intended communication on a certain subject domain, and (2) the approach makes essential use of the word-usage/entity-relationship paradigm in combination with procedural means to map fuzzy word meanings and their connotative interrelations in a format of stereotypes. Their dynamic dependencies (3) constitute semantic dispositions that render only those conceptual interrelations accessible to automatic processing which can - under differing aspects differently - be considered relevant. Such dispositional dependency structures (DDS) would seem to be an operational prerequisite to and a promising candidate for the simulation of contents-driven (analogically-associative), instead of formal (logically-deductive) inferences in semantic processing.

1 The approach

The empirical analysis of discourse and the formal representation of vague word meanings in natural language texts as a system of interrelated concepts (RIEGER 1980) is based on a WITTGENSTEINian assumption according to which a great number of texts analysed for any of the employed terms' usage regularities will reveal essential parts of the concepts and hence the meanings conveyed.

It has been shown elsewhere (RIEGER 1980) that in a sufficiently large sample of pragmatically homogeneous texts, called corpus, only a restricted vocabulary, i.e. a limited number of lexical items, will be used by the interlocutors, however comprehensive their personal vocabularies in general might be. Consequently, the lexical items employed to convey information on a certain subject domain under consideration in the discourse concerned will be distributed according to their conventionalized communicative properties, constituting semantic regularities which may be detected empirically from the texts.

For the quantitative analysis not of propositional strings but of their elements, namely words in natural language texts, rather simple statistics serve the basically descriptive purpose. Developed from and centred around a correlational measure to specify intensities of co-occurring lexical items used in natural language discourse, these analysing [...] fragment of the lexical structure constituted by the vocabulary employed in the texts as part of the concomitantly conveyed world knowledge.

A correlation coefficient appropriately modified for the purpose has been used as a mapping function (RIEGER 1981a). It allows the relational interdependency of any two lexical items to be computed from their textual frequencies. Those items which co-occur frequently in a number of texts will be positively correlated and hence called affined; those of which only one (and not the other) frequently occurs in a number of texts will be negatively correlated and hence called repugnant. Different degrees of word-repugnancy and word-affinity may thus be ascertained without recourse to an investigator's or his test-persons' word and/or world knowledge (semantic competence), but can instead be based solely upon the usage regularities of lexical items observed in a corpus of pragmatically homogeneous texts, spoken or written by real speakers/hearers in actual or intended acts of communication (communicative performance).
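As a rough illustration of how affinity and repugnancy could be read off textual frequencies, the following Python sketch correlates the per-text frequencies of two lexical items. The modified coefficient of RIEGER (1981a) is not reproduced in this paper, so plain Pearson correlation is used here as a stand-in; the function name `correlation` and the toy frequency lists are assumptions made purely for illustration.

```python
import math


def correlation(freq_i, freq_j):
    """Pearson correlation of two items' frequencies over the same texts.

    Assumption: unmodified Pearson's r stands in for the paper's
    modified coefficient, which is not specified here.
    """
    n = len(freq_i)
    if n == 0 or n != len(freq_j):
        raise ValueError("need two equally long, non-empty frequency lists")
    mean_i = sum(freq_i) / n
    mean_j = sum(freq_j) / n
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(freq_i, freq_j))
    var_i = sum((a - mean_i) ** 2 for a in freq_i)
    var_j = sum((b - mean_j) ** 2 for b in freq_j)
    if var_i == 0 or var_j == 0:
        return 0.0  # a constant frequency profile carries no co-variation
    return cov / math.sqrt(var_i * var_j)


# Hypothetical frequencies of three items in four texts.
freqs = {
    "A": [3, 0, 2, 0],
    "B": [2, 0, 3, 0],  # tends to occur where A occurs
    "C": [0, 4, 0, 1],  # tends to occur where A does not
}

affinity = correlation(freqs["A"], freqs["B"])    # positive value: affined
repugnancy = correlation(freqs["A"], freqs["C"])  # negative value: repugnant
```

In this toy data, A and B come out positively correlated and A and C negatively correlated, which mirrors the affined/repugnant distinction without any appeal to the investigator's own semantic competence.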

2 The semantic space structure

Following a system-theoretic approach and taking each word employed as a potential descriptor to characterize any other word's virtual meaning, the modified correlation coefficient can be used to map each lexical item into fuzzy subsets (ZADEH 1981) of the vocabulary according to its numerically specified usage regularities. Measuring the differences of any one lexical item's usages, represented as fuzzy subsets of the vocabulary, against those of all others allows for a consecutive mapping of items onto another abstract entity of the theoretical construct. These new operationally defined entities - called an item's meanings - may verbally be characterized as a function of all the differences of all regularities any one item is used with compared to any other item in the same corpus of discourse.
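The two consecutive mappings described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each item's usage regularities are held as one correlation value per vocabulary item, and the difference between two such profiles is measured as a Euclidean distance. The Euclidean choice, the names `usage_profile` and `meaning_distance`, and the values in `alpha` are illustrative assumptions, not taken from the paper.

```python
import math


def usage_profile(item, vocabulary, alpha):
    """First mapping: one correlation value per vocabulary item."""
    return [alpha[item][other] for other in vocabulary]


def meaning_distance(item_a, item_b, vocabulary, alpha):
    """Second mapping: difference between two usage profiles.

    Assumption: Euclidean distance is used as the difference measure.
    """
    profile_a = usage_profile(item_a, vocabulary, alpha)
    profile_b = usage_profile(item_b, vocabulary, alpha)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(profile_a, profile_b)))


# Hypothetical correlation values for a three-item vocabulary.
vocabulary = ["A", "B", "C"]
alpha = {
    "A": {"A": 1.0, "B": 0.8, "C": -0.6},
    "B": {"A": 0.8, "B": 1.0, "C": -0.5},
    "C": {"A": -0.6, "B": -0.5, "C": 1.0},
}

# Distance-relational structure: every item against every other item.
semantic_space = {
    a: {b: meaning_distance(a, b, vocabulary, alpha) for b in vocabulary}
    for a in vocabulary
}
```

Items with similar usage profiles (A and B) end up close together, while items used in contrasting ways (A and C) end up far apart.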

Lexical item         Distance    Lexical item         Distance
UNTERNEHM/enterpr    0.000       VERBAND/assoc        2.299
SYSTEM/system        2.035       STELLE/position      2.620
ELEKTR/electron      2.195       SCHREIB/write        2.791
DIPLOM/diploma       2.288       AUFTRAG/order        3.058
INDUSTR/industry     2.538       BERUF/professn       3.477
SUCHE/search         2.772       UNTERR/instruct      3.586
SCHUL/school         2.922       VERWALT/administ     3.952
FOLGE/consequ        3.135       WUNSCH/wish/desir    4.081
ERFAHR/experienc     3.485
ORGANISAT/organis    3.84?

Table 1: Topological environment E<UNTERNEHM>

The resulting system of sets of fuzzy subsets constitutes the semantic space. As a distance-relational data structure of stereotypically formatted meaning representations it may be interpreted topologically as a hyperspace with a natural metric. Its linguistically labelled elements represent meaning points, and their mutual distances represent meaning differences.

The position of a meaning point may be described by its semantic environment. Tab. 1 shows the topological environment E<UNTERNEHM>, i.e. those points being situated within the hypersphere of a certain diameter around its center meaning point UNTERNEHM/enterprise, as computed from a corpus of German newspaper texts comprising some 8000 tokens of 360 types in 175 texts from the 1964 editions of the daily DIE WELT.
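Reading an environment off the distance data amounts to a simple range query. The Python sketch below selects all points within a given distance of a center point and lists them nearest first. The distance values are taken from Table 1; the function name `environment`, the use of a radius rather than a diameter as parameter, and the cut-off of 2.6 are assumptions chosen only for illustration.

```python
def environment(center, distances, radius):
    """Points whose distance from `center` is at most `radius`, nearest first."""
    row = distances[center]
    inside = [(label, d) for label, d in row.items() if d <= radius]
    return sorted(inside, key=lambda pair: pair[1])


# A few distances from the center point UNTERNEHM, as listed in Table 1.
distances = {
    "UNTERNEHM": {
        "UNTERNEHM": 0.000,
        "SYSTEM": 2.035,
        "ELEKTR": 2.195,
        "INDUSTR": 2.538,
        "WUNSCH": 4.081,
    }
}

# Hypothetical cut-off: keep everything within a distance of 2.6.
env = environment("UNTERNEHM", distances, radius=2.6)
```

With this cut-off, WUNSCH falls outside the environment while SYSTEM, ELEKTR and INDUSTR remain inside it.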

Having checked a great number of environments, it was ascertained that they do in fact assemble meaning points of a certain semantic affinity. Further investigation revealed (RIEGER 1983) that there are regions of higher point density in the semantic space, forming clouds and clusters. These were detected by multivariate and cluster-analyzing methods which showed, however, that the items related both paradigmatically and syntagmatically formed what may be named connotative clouds rather than what is known to be called semantic fields. Although their internal relations appeared to be unspecifiable in terms of any logically deductive or concept hierarchical system, their elements' positions showed a high degree of stable structures which suggested a regular form of contents-dependent associative connectedness (RIEGER 1981b).

3 The dispositional dependency

Following a more semiotic understanding of meaning constitution, the present semantic space model may become part of a word meaning/world knowledge representation system which separates the format of a basic (stereotype) meaning representation from its latent (dependency) relational organization. Whereas the former is a rather static, topologically structured (associative) memory representing the data that text analysing algorithms provide, the latter can be characterized as a collection of dynamic and flexible structuring processes to re-organize these data under various principles (RIEGER 1981b). Other than declarative knowledge that can be represented in pre-defined semantic network structures, meaning relations of lexical relevance and semantic dispositions which are heavily dependent on context and domain of knowledge concerned will more adequately be defined procedurally, i.e. by generative algorithms that induce them on changing data only and whenever necessary. This is achieved by a recursively defined procedure that produces hierarchies of meaning points, structured under given aspects according to and in dependence of their meanings' relevancy (RIEGER 1984b).

Corroborating ideas expressed within the theories of spreading activation and the process of priming studied in cognitive psychology (LORCH 1982), a new algorithm has been developed which operates on the semantic space data and generates - other than in RIEGER (1982) - dispositional dependency structures (DDS) in the format of n-ary trees. Given one meaning point's position as a start, the algorithm of least distances (LD) will first list all its neighbouring points and stack them by increasing distances, and second prime the starting point as head node or root of the DDS-tree to be generated, before, third, the algorithm's generic procedure takes over. It will take the first entry from the stack, generate a list of its neighbours, determine from it the least distant one that has already been primed, and identify it as the ancestor-node to which the new point is linked as descendant-node to be primed next. Repeated successively for each of the meaning points stacked and in turn primed in accordance with this procedure, the algorithm will select a particular fragment of the relational structure latently inherent in the semantic space data and depending on the aspect, i.e. the initially primed meaning point the algorithm is started with. Working its way through and consuming all labeled points in the space structure - unless stopped under conditions of given target nodes, number of nodes to be processed, or threshold of maximum distance - the algorithm transforms prevailing similarities of meanings as represented by adjacent points to establish a binary, non-symmetric, and transitive relation of semantic relevance between them. This relation allows for the hierarchical re-organization of meaning points as nodes under a primed head in an n-ary DDS-tree (RIEGER 1984a).
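The three steps of the least-distance procedure can be sketched in Python as follows. This is a minimal reading of the verbal description above, not the author's implementation: the function name `build_dds_tree`, the parent-map representation of the tree, the interpretation of the distance threshold as distance from the start point, and the four toy points are all assumptions made for illustration.

```python
import math


def build_dds_tree(start, distances, max_nodes=None, max_distance=None):
    """Return {node: (ancestor, distance_to_ancestor)} for a DDS-style tree."""
    # Step 1: stack all other points by increasing distance from the start.
    stack = sorted(
        (p for p in distances if p != start),
        key=lambda p: distances[start][p],
    )
    # Step 2: prime the start point as root of the tree.
    tree = {start: (None, 0.0)}
    # Step 3: link each stacked point to its nearest already primed point.
    for point in stack:
        if max_nodes is not None and len(tree) >= max_nodes:
            break  # assumed stop condition: number of nodes processed
        if max_distance is not None and distances[start][point] > max_distance:
            break  # assumed stop condition: threshold measured from the start
        ancestor = min(tree, key=lambda q: distances[point][q])
        tree[point] = (ancestor, distances[point][ancestor])  # now primed
    return tree


# Hypothetical two-dimensional configuration of four points.
points = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (0, 3)}
distances = {
    p: {q: math.hypot(px - qx, py - qy) for q, (qx, qy) in points.items()}
    for p, (px, py) in points.items()
}

tree_a = build_dds_tree("a", distances)  # aspect: point a
tree_c = build_dds_tree("c", distances)  # aspect: point c
```

Started from a, the sketch links b under a, c under b, and d under a; started from c, it links b under c, a under b, and d under a. The same distance data thus yield different hierarchies depending on the point primed first.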

Without introducing the algorithms formally, some of their operative characteristics can well be illustrated in the sequel by a few simplified examples. Beginning with the schema of a distance-like data structure as shown in the two-dimensional configuration of 11 points, labeled a to k (Fig. 1.1), the stimulation of e.g. points a or c will start the procedure and produce two specific selections of distances activated among these 11 points (Fig. 1.2). The order of how these particular distances are selected can be represented either by step-lists (Fig. 1.3), or n-ary tree-structures (Fig. 1.4), or their binary transformations (Fig. 1.5). It is apparent that stimulation of other points within the same configuration of basic data points will result in similar but nevertheless differing trees, depending on the aspect under which the structure is accessed, i.e. the point initially stimulated to start the algorithm with.

Applied to the semantic space data of 360 defined meaning points calculated from the text corpus of the 1964 editions of the German newspaper DIE WELT, the DDS-tree of UNTERNEHM/enterprise is given in Fig. 2 as generated by the procedure described.

Besides giving distances between nodes in the DDS-tree, a numerical measure has been devised which describes any node's degree of relevance according to that tree structure. As a numerical measure, a node's criteriality is to be calculated with respect to its root or aspect and has been defined as a function of both its distance values and its level in the tree concerned. For a wide range of purposes in processing DDS-trees, different criterialities of nodes can be used to estimate which paths are more likely to be taken, as against others less likely to be followed, under priming of certain meaning points. Source-oriented, contents-driven search and retrieval procedures may thus be performed effectively on the semantic space structure, allowing for the activation of dependency paths. These are to trace those intermediate nodes which determine the associative transitions of any target node under any specifiable aspect.
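Tracing a dependency path can be sketched in Python as a walk along ancestor links. The sketch assumes a tree stored as a mapping from each node to its ancestor and the distance to it; this representation, the function name `dependency_path`, and the toy tree are hypothetical. The criteriality measure itself is not sketched, since its defining function is not given here.

```python
def dependency_path(tree, target):
    """Nodes from the root (aspect) down to `target`, following ancestor links."""
    if target not in tree:
        raise KeyError(f"{target!r} is not part of this tree")
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = tree[node][0]  # step up to the ancestor; the root has None
    path.reverse()
    return path


# Hypothetical tree under aspect "a": node -> (ancestor, distance).
tree = {
    "a": (None, 0.0),
    "b": ("a", 1.0),
    "c": ("b", 1.0),
    "d": ("a", 3.0),
}

path_to_c = dependency_path(tree, "c")  # intermediate node b lies on the path
level_of_c = len(path_to_c) - 1         # depth of c below the root
```

The intermediate nodes on the returned path are the associative transitions leading from the aspect to the target; the path length also yields the node's level in the tree, one of the two inputs the text names for criteriality.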

[Fig. 1.1 to Fig. 1.5, not reproduced: two-dimensional configuration of 11 points a to k (1.1); distances activated from start points a and c (1.2); step-lists (1.3); n-ary tree-structures (1.4); binary transformations (1.5)]

[Fig. 2, not reproduced: DDS-tree of UNTERNEHM/enterprise, with a pair of numerical values attached to each node]

[...] proved particularly promising in an analogical, [...] which - as opposed to logical deduction - has operationally been described in RIEGER (1984c) and simu[...] more) dependency-trees

REFERENCES

[...] Approach to Systems Analysis and Design (UCLA), Amsterdam/New York (North Holland) 1980

[...] Semantic Memory: A Test of Three Models of [...]

[...] Representation, Proceedings of COLING 80, Tokyo 1980, 76-84

[...] Berlin/New York (de Gruyter) 1981, 193-209

Rieger, B. (1981b): Connotative Dependency Structures [...] 622-711


[...] (North Holland) 1982, 319-324

[...] Informatique et Sciences Humaines, Université de Liège (LASLA), 1983, 805-814

Rieger, B. (1984a): Semantische Dispositionen. Pro[...] Hamburg (Buske) 1983 (in print)

[...] a Distance-like Data Structure of Fuzzy Word Meaning Representation. In: Allen, R. F. (Ed.): Data Bases in the Humanities and Social Sciences [...] Amsterdam/New York (North Holland) 1984 (in print)

[...]ters (Eds.): Meaning and the Lexicon. Nijmegen University (M.I.S. Press) 1984 (in print)

Zadeh, L. A. (1981): Test-Score Semantics for Natural Languages and Meaning Representation via PRUF [...] Bochum (Brockmeyer) 1981, 281-349
