Contents-driven algorithmic processing of fuzzy word meanings
to form dynamic stereotype representations
Burghard B. Rieger
Arbeitsgruppe für mathematisch-empirische Systemforschung (MESY)
German Department, Technical University of Aachen,
Aachen, West Germany
ABSTRACT
Cognitive principles underlying the (re-)construction of word meaning
and/or world knowledge structures are as yet poorly understood. In a
rather sharp departure from more orthodox lines of introspective
acquisition of structural data on meaning and knowledge representation
in cognitive science, an empirical approach is explored that analyses
natural language data statistically, represents its numerical findings
fuzzy-set theoretically, and interprets its intermediate constructs
(stereotype meaning points) topologically as elements of semantic
space. As connotative meaning representations, these elements allow an
aspect-controlled, contents-driven algorithm to operate which
reorganizes them dynamically in dispositional dependency structures
(DDS-trees) which constitute a procedurally defined meaning
representation format.
0 Introduction
Modelling system structures of word meanings and/or world knowledge is
to face the problem of their mutual and complex relatedness. As the
cognitive principles underlying these structures are as yet poorly
understood, the work of psychologists, AI-researchers, and linguists
active in that field appears to be determined by the respective
discipline's general line of approach rather than by consequences drawn
from these approaches' intersecting results in their common field of
interest. In linguistic semantics, cognitive psychology, and knowledge
representation most of the necessary data concerning lexical, semantic
and/or external world information is still provided introspectively.
Researchers are exploring (or make test-persons explore) their own
linguistic/cognitive capacities and memory structures to depict their
findings (or let hypotheses about them be tested) in various
representational formats (lists, arrays, trees, nets, active networks,
etc.). It is widely accepted that these model structures do have a more
or less ad hoc character and tend to be confined to their limited
theoretical or operational performances within a specified approach,
subject domain or implemented system. Basically interpretative
approaches like these, however, lack the most salient characteristics
of more constructive model structures that can be developed along the
lines of an entity-relationship approach (CHEN 1980). Their properties
of flexibility and dynamics are needed for automatic meaning
representation from input texts to build up and/or modify the realm and
scope of their own knowledge, however baseline and vague that may
appear compared to human understanding.
In a rather sharp departure from those more orthodox lines of
introspective data acquisition in meaning and knowledge representation
research, the present approach (1) has been based on the algorithmic
analysis of discourse that real speakers/writers produce in actual
situations of performed or intended communication on a certain subject
domain, and (2) the approach makes essential use of the
word-usage/entity-relationship paradigm in combination with procedural
means to map fuzzy word meanings and their connotative interrelations
in a format of stereotypes. Their dynamic dependencies (3) constitute
semantic dispositions that render only those conceptual interrelations
accessible to automatic processing which can - under differing aspects
differently - be considered relevant. Such dispositional dependency
structures (DDS) would seem to be an operational prerequisite to and a
promising candidate for the simulation of contents-driven
(analogically-associative), instead of formal (logically-deductive)
inferences in semantic processing.
1 The approach
The empirical analysis of discourse and the formal representation of
vague word meanings in natural language texts as a system of
interrelated concepts (RIEGER 1980) is based on a WITTGENSTEINian
assumption according to which a great number of texts analysed for any
of the employed terms' usage regularities will reveal essential parts
of the concepts and hence the meanings conveyed.
It has been shown elsewhere (RIEGER 1980) that in a sufficiently large
sample of pragmatically homogeneous texts, called corpus, only a
restricted vocabulary, i.e. a limited number of lexical items, will be
used by the interlocutors, however comprehensive their personal
vocabularies in general might be. Consequently, the lexical items
employed to convey information on a certain subject domain under
consideration in the discourse concerned will be distributed according
to their conventionalized communicative properties, constituting
semantic regularities which may be detected empirically from the texts.
For the quantitative analysis not of propositional strings but of their
elements, namely words in natural language texts, rather simple
statistics serve the basically descriptive purpose. Developed from and
centred around a correlational measure to specify intensities of
co-occurring lexical items used in natural language discourse, these
analysing [...] fragment of the lexical structure constituted by the
vocabulary employed in the texts as part of the concomitantly conveyed
world knowledge.
A correlation coefficient appropriately modified for the purpose has
been used as a mapping function (RIEGER 1981a). It allows the
relational interdependency of any two lexical items to be computed from
their textual frequencies. Those items which co-occur frequently in a
number of texts will be positively correlated and hence called affined;
those of which only one (and not the other) frequently occurs in a
number of texts will be negatively correlated and hence called
repugnant. Different degrees of word-repugnancy and word-affinity may
thus be ascertained without recourse to an investigator's or his
test-persons' word and/or world knowledge (semantic competence), but
can instead solely be based upon the usage regularities of lexical
items observed in a corpus of pragmatically homogeneous texts, spoken
or written by real speakers/hearers in actual or intended acts of
communication (communicative performance).
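The notions of affinity and repugnancy can be made concrete with a small numerical sketch. The modification of the coefficient is not specified in this paper, so the example below falls back on the ordinary Pearson product-moment correlation as a stand-in; this substitution, the function name `correlation`, and the toy frequency counts are all assumptions made for illustration only.

```python
import math

def correlation(freq_a, freq_b):
    """Pearson correlation of two items' per-text frequencies.

    Stand-in for the modified coefficient, which the text does not spell out.
    """
    n = len(freq_a)
    mean_a = sum(freq_a) / n
    mean_b = sum(freq_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(freq_a, freq_b))
    var_a = sum((a - mean_a) ** 2 for a in freq_a)
    var_b = sum((b - mean_b) ** 2 for b in freq_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant frequency offers no usage contrast
    return cov / math.sqrt(var_a * var_b)

# Hypothetical counts: occurrences of three items in five texts.
frequencies = {
    "SYSTEM": [3, 0, 2, 0, 4],
    "ELEKTR": [2, 0, 3, 0, 3],
    "WUNSCH": [0, 4, 0, 3, 0],
}

# Items that co-occur in the same texts come out positive (affined),
# items that avoid each other come out negative (repugnant).
affinity = correlation(frequencies["SYSTEM"], frequencies["ELEKTR"])
repugnancy = correlation(frequencies["SYSTEM"], frequencies["WUNSCH"])
```

Only the sign and relative size of the values matter here; the counts are not taken from the corpus.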
2 The semantic space structure
Following a system-theoretic approach and taking each word employed as
a potential descriptor to characterize any other word's virtual
meaning, the modified correlation coefficient can be used to map each
lexical item into fuzzy subsets (ZADEH 1981) of the vocabulary
according to its numerically specified usage regularities. Measuring
the differences of any one lexical item's usages, represented as fuzzy
subsets of the vocabulary, against those of all others allows for a
consecutive mapping of items onto another abstract entity of the
theoretical construct. These new operationally defined entities -
called an item's meanings - may verbally be characterized as a function
of all the differences of all regularities any one item is used with
compared to any other item in the same corpus of discourse.
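The step from usage regularities to differences of usage may be sketched as follows. The distance measure is not stated in this section; the example assumes a Euclidean distance between usage profiles, where a profile holds one item's correlation values against every item of the vocabulary. The names `profile_distance` and `distance_matrix` and the profile values are hypothetical.

```python
import math

def profile_distance(profile_a, profile_b):
    """Euclidean distance between two usage profiles (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))

def distance_matrix(profiles):
    """Distances between all pairs of items, keyed by (item, item)."""
    items = sorted(profiles)
    return {
        (i, j): profile_distance(profiles[i], profiles[j])
        for i in items
        for j in items
    }

# Hypothetical profiles over a three-item vocabulary,
# each listed in the order (SYSTEM, ELEKTR, WUNSCH).
profiles = {
    "SYSTEM": [1.0, 0.8, -0.7],
    "ELEKTR": [0.8, 1.0, -0.6],
    "WUNSCH": [-0.7, -0.6, 1.0],
}
distances = distance_matrix(profiles)
```

Items with similar profiles end up close together, which is the property the semantic space relies on.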
Meaning point        Distance    Meaning point        Distance
UNTERNEHM/enterpr    0.000       VERBAND/assoc        2.299
SYSTEM/system        2.035       STELLE/position      2.620
ELEKTR/electron      2.195       SCHREIB/write        2.791
DIPLOM/diploma       2.288       AUFTRAG/order        3.058
INDUSTR/industry     2.538       BERUF/professn       3.477
SUCHE/search         2.772       UNTERR/instruct      3.586
SCHUL/school         2.922       VERWALT/administ     3.952
FOLGE/consequ        3.135       WUNSCH/wish/desir    4.081
ERFAHR/experienc     3.485
ORGANISAT/organis    3.846
Table 1: Topological environment E<UNTERNEHM>
The resulting system of sets of fuzzy subsets constitutes the semantic
space. As a distance-relational data structure of stereotypically
formatted meaning representations it may be interpreted topologically
as a hyperspace with a natural metric. Its linguistically labelled
elements represent meaning points, and their mutual distances represent
meaning differences.
The position of a meaning point may be described by its semantic
environment. Tab. 1 shows the topological environment E<UNTERNEHM>,
i.e. those points situated within the hypersphere of a certain diameter
around its center meaning point UNTERNEHM/enterprise, as computed from
a corpus of German newspaper texts comprising some 8000 tokens of 360
types in 175 texts from the 1964 editions of the daily DIE WELT.
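How such an environment is read off the space can be sketched in a few lines. The function name `environment` and the use of a radius parameter (rather than the diameter mentioned in the text) are assumptions of this sketch; the distance values are an excerpt of those listed in Table 1.

```python
def environment(distances_from_center, radius):
    """Points lying within `radius` of the center, nearest first."""
    inside = [
        (label, distance)
        for label, distance in distances_from_center.items()
        if distance <= radius
    ]
    return sorted(inside, key=lambda pair: pair[1])

# Excerpt of the distances from UNTERNEHM given in Table 1.
table_1_excerpt = {
    "UNTERNEHM": 0.000,
    "SYSTEM": 2.035,
    "ELEKTR": 2.195,
    "INDUSTR": 2.538,
    "SUCHE": 2.772,
    "WUNSCH": 4.081,
}

# The radius 2.6 is an arbitrary illustrative choice.
env = environment(table_1_excerpt, radius=2.6)
```

Widening the radius admits more distant meaning points, so the size of an environment is a matter of the threshold chosen.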
Having checked a great number of environments, it was ascertained that
they do in fact assemble meaning points of a certain semantic affinity.
Further investigation revealed (RIEGER 1983) that there are regions of
higher point density in the semantic space, forming clouds and
clusters. These were detected by multivariate and cluster-analyzing
methods which showed, however, that the items, related both
paradigmatically and syntagmatically, formed what may be named
connotative clouds rather than what is known as semantic fields.
Although their internal relations appeared to be unspecifiable in terms
of any logically deductive or concept-hierarchical system, their
elements' positions showed a high degree of stable structure, which
suggested a regular form of contents-dependent associative
connectedness (RIEGER 1981b).
3 The dispositional dependency
Following a more semiotic understanding of meaning constitution, the
present semantic space model may become part of a word meaning/world
knowledge representation system which separates the format of a basic
(stereotype) meaning representation from its latent (dependency)
relational organization. Whereas the former is a rather static,
topologically structured (associative) memory representing the data
that text analysing algorithms provide, the latter can be characterized
as a collection of dynamic and flexible structuring processes to
reorganize these data under various principles (RIEGER 1981b). Unlike
declarative knowledge that can be represented in pre-defined semantic
network structures, meaning relations of lexical relevance and semantic
dispositions, which are heavily dependent on the context and domain of
knowledge concerned, will more adequately be defined procedurally, i.e.
by generative algorithms that induce them on changing data only and
whenever necessary. This is achieved by a recursively defined procedure
that produces hierarchies of meaning points, structured under given
aspects according to and in dependence on their meanings' relevancy
(RIEGER 1984b).
Corroborating ideas expressed within the theories of spreading
activation and the process of priming studied in cognitive psychology
(LORCH 1982), a new algorithm has been developed which operates on the
semantic space data and generates - unlike in RIEGER (1982) -
dispositional dependency structures (DDS) in the format of n-ary trees.
Given one meaning point's position as a start, the algorithm of least
distances (LD) will first list all its neighbouring points and stack
them by increasing distances, second prime the starting point as head
node or root of the DDS-tree to be generated before, third, the
algorithm's generic procedure takes over. It will take the first entry
from the stack, generate a list of its neighbours, determine from it
the least distant one that has already been primed, and identify it as
the ancestor-node to which the new point is linked as descendant-node
to be primed next. Repeated successively for each of the meaning points
stacked and in turn primed in accordance with this procedure, the
algorithm will select a particular fragment of the relational structure
latently inherent in the semantic space data and depending on the
aspect, i.e. the initially primed meaning point the algorithm is
started with. Working its way through and consuming all labeled points
in the space structure - unless stopped under conditions of given
target nodes, number of nodes to be processed, or threshold of maximum
distance - the algorithm transforms prevailing similarities of meanings
as represented by adjacent points to establish a binary, non-symmetric,
and transitive relation of semantic relevance between them. This
relation allows for the hierarchical re-organization of meaning points
as nodes under a primed head in an n-ary DDS-tree (RIEGER 1984a).
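The three steps of the least-distance procedure can be sketched as follows. This is a simplified reading of the verbal description, not the formal definition given in RIEGER (1984a): points are assumed to be coordinate tuples with Euclidean distances, the distance threshold is assumed to be measured from the start point, ties are broken alphabetically, and the stop condition on target nodes is omitted. The names `euclid`, `dds_tree`, `max_nodes` and `max_distance` are hypothetical.

```python
import math

def euclid(p, q):
    """Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def dds_tree(points, start, max_nodes=None, max_distance=None):
    """Return {node: ancestor} for the tree grown from `start`."""
    # Step 1: stack all other points by increasing distance from the start.
    stack = sorted(
        (label for label in points if label != start),
        key=lambda label: (euclid(points[start], points[label]), label),
    )
    # Step 2: prime the start point as root of the tree.
    ancestor_of = {start: None}
    # Step 3: link each stacked point to its nearest already primed point.
    for label in stack:
        if max_nodes is not None and len(ancestor_of) >= max_nodes:
            break
        if (max_distance is not None
                and euclid(points[start], points[label]) > max_distance):
            break
        nearest = min(
            ancestor_of,
            key=lambda primed: (euclid(points[label], points[primed]), primed),
        )
        ancestor_of[label] = nearest  # the new point is now primed as well
    return ancestor_of

# Hypothetical two-dimensional configuration of four points.
points = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (0, 3)}
tree_a = dds_tree(points, "a")  # aspect: point a
tree_c = dds_tree(points, "c")  # aspect: point c
```

Starting from a, point c depends on b; starting from c, the dependency is reversed and a hangs below b. The same data thus yield different hierarchies under different aspects.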
Without introducing the algorithms formally, some of their operative
characteristics can well be illustrated in the sequel by a few
simplified examples. Beginning with the schema of a distance-like data
structure as shown in the two-dimensional configuration of 11 points,
labeled a to k (Fig. 1.1), the stimulation of e.g. points a or c will
start the procedure and produce two specific selections of distances
activated among these 11 points (Fig. 1.2). The order in which these
particular distances are selected can be represented either by
step-lists (Fig. 1.3), or n-ary tree-structures (Fig. 1.4), or their
binary transformations (Fig. 1.5). It is apparent that stimulation of
other points within the same configuration of basic data points will
result in similar but nevertheless differing trees, depending on the
aspect under which the structure is accessed, i.e. the point initially
stimulated to start the algorithm with.
Applied to the semantic space data of 360 defined meaning points
calculated from the text corpus of the 1964 editions of the German
newspaper DIE WELT, the DDS-tree of UNTERNEHM/enterprise is given in
Fig. 2 as generated by the procedure described.
Besides giving distances between nodes in the DDS-tree, a numerical
measure has been devised which describes any node's degree of relevance
according to that tree structure. As a numerical measure, a node's
criteriality is to be calculated with respect to its root or aspect and
has been defined as a function of both its distance values and its
level in the tree concerned. For a wide range of purposes in processing
DDS-trees, different criterialities of nodes can be used to estimate
which paths are more likely to be taken as against others less likely
to be followed under priming of certain meaning points.
Source-oriented, contents-driven search and retrieval procedures may
thus be performed effectively on the semantic space structure, allowing
for the activation of dependency paths. These are to trace those
intermediate nodes which determine the associative transitions of any
target node under any specifiable aspect.
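Tracing a dependency path amounts to walking from the target node back to the root and reversing the result. The sketch below assumes a tree stored as a mapping from each node to its ancestor, with the root mapped to None; this representation, the name `dependency_path`, and the toy tree are illustrative assumptions. The criteriality function is not reproduced here because its definition is not given in this paper.

```python
def dependency_path(ancestor_of, target):
    """Nodes from the root (aspect) down to `target`, in that order."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = ancestor_of[node]  # step up to the ancestor; root maps to None
    path.reverse()
    return path

# Hypothetical DDS-tree under aspect "a": c depends on b, b and d on a.
ancestor_of = {"a": None, "b": "a", "c": "b", "d": "a"}
path = dependency_path(ancestor_of, "c")
```

The intermediate nodes of the returned list (here only b) are those that mediate the transition from the aspect to the target.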
[Fig. 1.1 to Fig. 1.5: configuration of 11 points a to k, activated distances, step-lists, n-ary trees and binary trees for start points a and c]
[Fig. 2: DDS-tree of UNTERNEHM/enterprise]
[...] proved particularly promising in an analogical [...] which - as
opposed to logical deduction - has operationally been described in
RIEGER (1984c) and simu[...] more) dependency-trees.
REFERENCES
[...] Approach to Systems Analysis and Design (UCLA), Amsterdam/New York (North Holland) 1980
[...] Semantic Memory: A Test of Three Models of [...]
[...] Representation, Proceedings of COLING 80, Tokyo 1980, 76-84
[...]lin/New York (de Gruyter) 1981, 193-209
Rieger, B. (1981b): Connotative Dependency Structures [...]
[...] 622-711
[...] (North Holland) 1982, 319-324
[...] Informatique et Sciences Humaines, Université de Liège (LASLA), 1983, 805-814
Rieger, B. (1984a): Semantische Dispositionen. Pro[...]
[...] Hamburg (Buske) 1983 (in print)
[...] a Distance-like Data Structure of Fuzzy Word Meaning Representation. In: Allen, R.F. (Ed.): Data Bases in the Humanities and Social Scien[...]
[...] Amsterdam/New York (North Holland) 1984 (in print)
[...]ters (Eds.): Meaning and the Lexicon. Nijmegen University (M.I.S. Press) 1984 (in print)
Zadeh, L.A. (1981): Test-Score Semantics for Natural Languages and Meaning Representation via PRUF [...]
[...]chum (Brockmeyer) 1981, 281-349