Association-based Natural Language Processing with Neural Networks AMANO Sin-ya Information Systems Laboratory Research and Development Center TOSHIBA Corp.. Therefore, simulating word
Trang 1Association-based Natural Language Processing
with Neural Networks
AMANO Sin-ya Information Systems Laboratory Research and Development Center
TOSHIBA Corp
1 Komukai-T6siba-ty6, Saiwai-ku, Kawasaki 210 Japan
kim~isl.rdc.toshiba.co.jp
A b s t r a c t
This paper describes a natural language pro-
cessing system reinforced by the use of associ-
ation of words and concepts, implemented as a
neural network Combining an associative net-
work with a conventional system contributes
to semantic disambiguation in the process of
interpretation The model is employed within
a kana-kanji conversion s y s t e m and the advan-
tages over conventional ones are shown
1 I n t r o d u c t i o n
Currently, most practical applications in nat-
ural language processing (NLP) have been
realized via symbolic manipulation engines,
such as grammar parsers However, the cur-
rent trend (and focus of research) is shift-
ing to consider aspects of semantics and dis-
course as part of NLP This can be seen in
the emergence of new theories of language,
such as Situation Theory [Barwise 83] and
Discourse Representation Theory [Kamp 84]
While these theories provide an excellent the-
oretical framework for natural language un-
derstanding, the practical treatment of con- text dependency within the language can also
be improved by enhancing underlying compo- nent technologies, such as knowledge based systems In particular, alternate approaches
to symbolic manipulation provided by connec- tionist models [Rumelhart 86] have emerged Connectionist approaches enable the extrac- tion of processing knowledge from examples, instead of building knowledge bases manually The model described here represents the unification of the connectionist approach and conventional symbolic manipulation; its most valuable feature is the use of word as- sociations using neural network technology Word and concept associations appear to
be central in human cognition [Minsky 88] Therefore, simulating word associations con- tributes to semantic disambiguation in the computational process of interpreting sen- tences by putting a strong preference to ex- pected words(meanings)
The paper describes NLP reinforced by as- sociation of concepts and words via a con- nectionist network The model is employed
within a NLP application system for kana-
Trang 2kanji conversion x Finally, an evaluation of
the system and advantages over conventional
systems are presented
2 A b r i e f o v e r v i e w o f
k a n a - k a n j i c o n v e r s i o n
Japanese has a several interesting feature in
its variety of letters Especially the ex-
istence of several thousand of kanji (based
on Chinese characters; ~ , 111, ) made typing
task hard before the invention of kana-kanji
conversion[Amano 79] Now it has become
a standard method in inputting Japanese to
computers It is also used in word processors
and is familiar to those who are not computer
experts It comes from the simpleness of op-
erations By only typing sentences by pho-
netic expressions of Japanese (kan a), the kana-
kanji converter automatically converts kana
into meaningful expressions(kanji) The sim-
plified mechanism of kana-kanji conversion can
be described as two stages of processing: mor-
phological analysis and homonym selection
• Morphological Analysis
Kana-inputted (fragment of) sentences
are morphologically analized through dic-
tionary look up, both lexicons and gram-
mars There are m a n y ambiguities in
word division due to the agglutinative na-
ture of Japanese (Japanese has no spaces
in text), Each partitioning of the kana
is then further open to being a possible
interpretation of several alternate kanji
The spoken word douki, for example, can
mean motivation, pulsation, synchroniza-
tion, or copperware All of them are spelt
identically in kana( k°5 -~), but have dif-
ferent kanji e h a r a c t e r s ( ~ , ~'t-~, ~ ] , ~1
1 M a n y commercial p r o d u c t s use kana-kanji conver-
sion technology in J a p a n , including the T O S H I B A
Tosword-series of Japanese word processors
~-~,respectively) Some kana words have
10 or more possible meanings Therefore the stage of Homonym Selection is indis- pensable to kana-kanji conversion for the reduction of homonyms
Homonym Selection Preferable semantic homonyms are se- lected according to the co-occurrence restrictions and selectional restrictions The frequency of use of each word is also taken into account Usually, the selection
is also reinforced by a simple context hold- ing mechanism; when homonyms appear
in previous discourse and one of t h e m is chosen by a user, the word is automat- ically memorized in the system as in a cache technology Then, when the same homonyms appear the memorized word is selected as the most preferred candidate and is shown to the user
3 A s s o c i a t i o n - b a s e d k a n a - kanji c o n v e r s i o n
The above mechanisms are simple and effec- tive in regarding kana-kanji converter as a typ- ing aid However, the abundance of homonyms
in Japanese contributes to m a n y of the am- biguities and a user is forced to choose the desired kanji from m a n y candidates To re- duce homonym ambiguities a variety of tech- niques are available; however, these tend to
be limited from a semantic disambiguation perspective In using word co-occurrence re- strictions, it is necessary to collect a large amount of co-occurrence phenomena, a prac- tically impossible task In the case of the use of selectional restrictions, an appropri- ate thesaurus is necessary but it is known
t h a t defining the conceptual hierarchy is dif- ficult work [Lenat 89][EDR 90] Techniques for storing previous kanji selections (cache)
Trang 3j
',,'- / ~ \
~ ~ \ ',, "t ~ 2 " ~ J
Figure 1: Kana-Kanji Conversion with a Neural Network
are too simple to disambiguate between possi-
ble previous selections for the same h o m o n y m
with respect to the context or between context
switches
T o avoid these problems without increasing
c o m p u t a t i o n a l costs, we propose the use of the
associative functionality of neural networks
T h e use of association is a natural extension to
the conventional context holding mechanism
T h e idea is summarized as follows T h e r e are
two stages of processing: network generation
and kana-kanji conversion
A network representing the strength of word
association is automatically generated from
real documents Real documents can be con-
s i d e r e d a s training d a t a because they are made
of correctly converted kanji Each node in the network uniquely correspond to a word
e n t r y in the dictionary of kana-kanji conver- sion Each node has an activation level
T h e link between nodes is a weighted link and represents the strength of association be- tween words T h e network is a Hopfield-type network[Hopfield 84]; links are bidirectional and a network is one layered
When the user chooses a word from
h o m o n y m candidates, a certain value is in-
p u t t e d to the node corresponding to the cho- sen word and the node will be activated T h e activation level of nodes connected to the ac- tivated node will be then activated In this manner, the activation spreads over the net-
Trang 4work through the links and the active part of
the network can be considered as the associa-
tive words in that context In kana-kanji con-
version, the converter decides the preference
of word order for homonyms in the given con-
text by comparing the node activation level of
each node of homonyms An example of the
m e t h o d is shown in Figure 1
Assume the network is already built from
certain documents A user is inputting a text
whose topic is related to computer hardware
In the example, words like clock ( ~ t~ ~ ~ )
ous context, so their activation levels are rela-
tively high When the word DOUKI (~") ~)
is inputted in kana and the conversion starts,
the activation level of synchronization (~J~)
is higher than that of other candidates due to
its relationship to clock or signal T h e input
nization ([~jtj])
T h e advantages of our m e t h o d are:
* T h e m e t h o d enables kanji to be given
based on a preference related to the cur-
rent context Alternative kanji selections
are not discarded but are just given a
lower context weighing Should the con-
text switch, the other possible selections
will obtain a stronger context preference;
this strategy allows the system to capably
handle context change
* Word preferences of a user are reflected in
the network
• T h e correctness of the conversion is im-
proved without high-cost computation
such as semantic/discourse analyses
T h e system was built on Toshiba AS-4000
workstation (Sun4 compatible machine) using
C T h e system configuration is shown in Fig- ure 2
T h e left-hand side of the dashed line repre- sents an off-line network building process T h e right-hand side represents a kana-kanji con- version process reinforced with a neural net-
are done in parallel with kana-kanji conver- sion T h e kana-kanji converter receives kana-
sequences from a user It searches the dictio-
and finally creates a list of possible h o m o n y m candidates T h e n the neural network handler
is requested for activation levels of homonyms After the selection of preferred homonyms, it shows the candidates in kanji to a user W h e n the user chooses the desired one, the chosen word information is sent to the neural network
and the corresponding node is activated
T h e roles and the functions of main compo- nents are described as follows
* Neural Network G e n e r a t o r Several real d o c u m e n t s are analyzed and the network nodes and the weights of links are automatically decided T h e docu- ments consist of the m i x t u r e of kana and
given context are also provided T h e doc- uments, therefore, can be seen as training
d a t a for the neural network T h e analysis proceeds through the following steps
1 Analyze the d o c u m e n t s morpholog- ically and convert into a sequence
of words Note t h a t particles and demonstratives are ignored because they have no characteristics in word association
2 C o u n t up the frequency of the all combination of co-appeared word- pair in a p a r a g r a p h a n d memorize
Trang 5a s s ~ l a C l v e
net~rX
I
1
F ~
H ~ d l e r
Lex.lcons £ 1 ~ammars h i r a g a n a
sequeltce4
I Kana.Kaq/i -!
activation l e v e l s
o1" n e u r o n s homonym
c a n d l d a t e s
fin kanJ$)
actlvet~ngchoeen neurons
Figure 2: System Configuration
! ~ j
I,u.#'~
i
them as the strength of connection
A paragraph is recognized only by a
format information of documents
3 Sum up the strength of connection
for each word-pair
4 Regularize the training data; this
involves removing low occurrences
(noise) and partitioning the fre-
quency range in order to obtain
a monotonically decreasing (in fre-
quency) training set
Although the network data have
only positive links and not all nodes
are connected, non-connected nodes
are assumed to be connected by neg-
ative weights so that the Hopfield
conditions [Hopfield 84] are satisfied
As described above, the technique used here is a morphological and statistical analysis Actually this module is a pat- tern learning of co-appearing words in a paragraph
The idea behind of this approach is that words that appear together in a para- graph have some sort of associative con- nection By accumulating them, pairs without such relationships will be statis- tically rejected
From a practical point of view, automated network generation is inevitable Since human word association differ by individ-
Trang 6ual, creation of a general purpose asso-
ciative network is not realistic Because
the training d a t a for the network is sup-
posed to be supplied by users' d o c u m e n t s
in our system, a u t o m a t i c network genera-
tion m e c h a n i s m is necessary even if the
generated network is s o m e w h a t inaccu-
rate
• Neural Network Handler
T h e role of the module is to recall the
total p a t t e r n s of co-appearing words in a
p a r a g r a p h from the partial p a t t e r n s of the
current p a r a g r a p h given by a user
T h e o u t p u t value Oj for each node j is
calculated by following equations
Oj = f ( n j )
nj = (1 - 5)nj + 6 ( Z wjiO i -11- I j )
i
where
f : a sigmoidal function
: a real n u m b e r representing the inertia
of the network(0 < ~ < 1)
nj : input value to node j
Ij : external input value to node j
wjl : weight of a link from node i to node
j ; W j i W i j , Wii ~ O
T h e external input value Ij takes a cer-
tain positive value when the word corre-
sponding to node j is chosen by a user
Otherwise zero
Although the module is software imple-
mented, it is fast enough to follow tile
typing speed of a user 2
• Kana-Kanji Converter
2A certain optinfization technique is used respect-
ing for the spm-seness of the network
Tile basic algorithm is almost s a m e as the conventional one T h e difference is
t h a t holnonym candidates are sorted by the activation levels of the correspond- ing nodes in the network, except when lo- cal constraints such as word co-occurrence restrictions are applicable to the candi- dates T h e associative information also affects the preference decision of g r a m - matical ambiguities
5 E v a l u a t i o n
To evaluate tile m e t h o d , we tested the im- plemented s y t e m by doing kana-kanji conver- sion for real documents T h e training d a t a and tested d a t a were t a k e n from four types
of documents: business letters, personal let- ters, news articles, and technical articles T h e
a m o u n t of training d a t a and tested d a t a was over 100,000 phrases and 10,000 phrases re- spectively, for each t y p e of document T h e measure for accuracy of conversion was a re- duction r a t i o ( R R ) of the h o m o n y m choice operations of a user For comparison, we also evaluated the reduction r a t i o ( R R ~) of the kana-kanji conversion with a conventional con- text holding mechanism
R R = (A - B ) / A
R R ' = ( A - C ) / A whe1:e
A : number of clmice operations required when
an untrained kana-kanji converter was used
B : n u m b e r of choice operations required when
a NN-trained kana-kanji converter was used
C : nunlber of choice operations required when a kana-kanji converter with a conven- tional context holding m e c h a n i s m was used Tile result is shown in Table 1 T h e ad- vantages of our m e t h o d is clear for each t y p e
Trang 7Table 1: Result of the Evaluation
d o c u m e n t - t y p e RR(%) RR'(%)
business letters 41.8 32.6 personal letters 20.7 12.7 news articles 23.4 12.2 technical articles 45.6 40.7
of documents Especially, it is notable t h a t
the advantages in business letter field is promi-
nent, because more than 80% of word proces-
sor users write business letters
6 D i s c u s s i o n
Although the result of conversion test is sat-
isfactory, word associations by neural network
are not human-like ones yet Following is a list
o f improvements t h a t m a n y further enhance
the system:
• Improvements for generating a network
T h e quality of the network depends on
how to reduce noisy word occurrence in
the network from the point of view of as-
sociation T h e existence of noisy words
is inevitable in automatic generation but
plays a role to make unwanted associa-
tions One approach to reducing noisy
words is to identify those words which
are context independent and remove t h e m
from the network generation stage T h e
identification can be based on word cat-
egories and meanings In most cases,
words representing very abstract concepts
are noisy because they force unwanted ac-
tivations in unrelated contexts There-
fore they should be detected through ex-
periments Another problem arises be-
cause of the ambiguity of morphological
analysis Word extraction from real doc-
uments is not always correct because of
the agglutinative nature of the Japanese language O t h e r possibility for network improvement is to consider a syntactic relationship or co-occurrence relationship while deciding link weights In addition, there are keywords in a d o c u m e n t in gen- eral which play a central role in associa- tion T h e y will be reflected in a network more in consideration of technical t e r m s Preference decision in kana-kanji conver- sion
T h e reinforcement of associative informa- tion complicates the decision of h o m o n y m preference in kana-kanji conversion We already have several means of seman- tic disambiguation of homonyms: co- occurrence restrictions and selectional re- strictions As building a complete the- saurus is very difficult, our thesaurus
is still not enough to select the cor- rect meaning(kanfi-conversion) of kana-
written word So selectional restrictions should be weak constraints in h o m o n y m selection In the same vein, associative information should be considered a weak constraint because associations by neural networks are not always reliable Pos- sible conflict between selectional restric- tions and associative information, added
to tile grammatical ambiguities remaining
in the stage of h o m o n y m selection, make kanji selection very complex T h e prob- lem of multiply and weakly constrained
Trang 8homonyms is one to which we have not
yet found the best solution
7 C o n c l u s i o n
This paper described an association based nat-
ural language processing and its application
to kana.kanji conversion We showed advan-
tages of the method over the conventional one
through the experiments After the improve-
ments discussed above, we are planning to de-
velop a neuro-word processor available in com-
mercial use We are also planning the applica-
tion of the method to other fields including
machine translations and discourse analyses
for natural language interface to computers
R e f e r e n c e s
[Amano 79]
[Barwise 83]
[EDR 90]
[Hopfield 84]
[Kamp 84]
Kawada, T and Amano, S.,
"Japanese Word Processor,"
Proc IJCAI-79, pp 466-468,
1979
Barwise, J and Perry, J., "Sit- uations and Attitudes," MIT Press, 1983
Japan Electronic Dictionary Research Institute,
"Concept Dictionary," Tech
Rep No.027, 1990
Hopfield, J., "Neurons with Graded Response Have Col- lective Computational Proper- ties Like Those of Two-State Neurons," Proc Natl Acad
Sci USA 81, pp 3088-3092,
1984
Kamp, H., "A Theory of Truth and Semantic Repre- sentation," in Groenendijk et
[Lenat 89]
[Minsky 88]
[Rumelhart 86]
[Waltz 85]
al(eds.) "Truth, Interpreta- tion and Information", 1984 Lenat, D and Guha, R.,
"Building Large Knowledge- Based Systems: Represen-
tation and Inference in the Cyc Project," Addison- Wesley, 1989
Minsky, M., "The Society Of Mind,", Simon gz Schuster Inc., 1988
Rumelhart, D., McClelland, J., and the PDP Research Group, "Parallel Distributed Processing: Explorations in the Microstructure of Cogni- tion," MIT Press, 1986 Waltz, D and Pollack, J.,
"Massively Parallel Parsing:
A Strongly Interactive Model
of Natural Language Interpre- tation," Cognitive Science, pp 51-74, 1985