SELF ORGANIZING MAPS - APPLICATIONS AND NOVEL ALGORITHM DESIGN
Edited by Josphat Igadwa Mwasiagi
Published by InTech
Janeza Trdine 9, 51000 Rijeka, Croatia
Copyright © 2011 InTech
All chapters are Open Access articles distributed under the Creative Commons Non Commercial Share Alike Attribution 3.0 license, which permits to copy, distribute, transmit, and adapt the work in any medium, so long as the original work is properly cited. After this work has been published by InTech, authors have the right to republish it, in whole or part, in any publication of which they are the author, and to make other personal use of the work. Any republication, referencing or personal use of the work must explicitly identify the original source.
Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book.
Publishing Process Manager Jelena Marusic
Technical Editor Teodora Smiljanic
Cover Designer Martina Sirotic
Image Copyright riri, 2010. Used under license from Shutterstock.com
First published January, 2011
Printed in India
A free online edition of this book is available at www.intechopen.com
Additional hard copies can be obtained from orders@intechweb.org
Self Organizing Maps - Applications and Novel Algorithm Design,
Edited by Josphat Igadwa Mwasiagi
p. cm.
ISBN 978-953-307-546-4
Contents
Ryotaro Kamimura
Privacy-Preserving Clustering on Distributed Databases: A Review and Some Contributions
Flavius L. Gorgônio and José Alfredo F. Costa
A Method for Project Member Role Assignment in Open Source Software Development using Self-Organizing Maps
Shingo Kawamura, Minoru Uehara, and Hideki Mori
Data Envelopment Analysis
Modelling with Self-Organising Maps and Data Envelopment Analysis: A Case Study in Educational Evaluation
Lidia Angulo Meza, Luiz Biondi Neto, Luana Carneiro Brandão, Fernando do Valle Silva Andrade, João Carlos Correia Baptista Soares de Mello and Pedro Henrique Gouvêa Coelho
Self-Organizing Maps Infusion with Data Envelopment Analysis
Mithun J. Sharma and Yu Song Jin
The Study of Multi-media and Web-based Contents
A Speech Recognition System for Embedded Applications Using the SOM and TS-SOM Networks
Amauri H. Souza Júnior, Guilherme A. Barreto and Antonio T. Varela
Combining SOMs and Ontologies for Effective Web Site Mining
Dimitris Petrilis and Constantin Halatsis
A Study on Facial Expression Recognition Model using an Adaptive Learning Capability
Masaki Ishii
Self-Organization and Aggregation of Knowledge
Koichiro Ishikawa, Yoshihisa Shinozawa and Akito Sakurai
Image Search in a Visual Concept Feature Space with SOM-Based Clustering and Modified Inverted Indexing
Mahmudur Rahman
Mel-Frequency Cepstrum Coefficients as Higher Order Statistics Representation to Characterize Speech Signal for Speaker Identification System in Noisy Environment using Hidden Markov Model
Agus Buono, Wisnu Jatmiko and Benyamin Kusumoputro
Improvements in the Transportation Industry
Ship’s Hydroacoustics Signatures Classification Using Neural Networks
A Review of Self-Organizing Map Applications in Meteorology and Oceanography
Yonggang Liu and Robert H. Weisberg
Using Self Organising Maps
M. L. Gonçalves, J. A. F. Costa and M. L. A. Netto
Applications of Complex-Valued Self-Organizing Maps to Ground Penetrating Radar Imaging Systems
Akira Hirose and Yukimasa Nakano
Automated Mapping of Hydrographic Systems from Satellite Imagery Using Self-Organizing Maps and Principal Curves
Marek B. Zaremba
Application of SOM in Medical and Biological Sciences
Computational Approaches as a Tool to Study Developmental Biology in New World Primates
Maria Bernardete Cordeiro de Sousa, Allan Medeiros, Dijenaide Chaves de Castro, Adriano de Castro Leão and Adrião Duarte Dória Neto
Clustering Genes, Tissues, Cells and Bioactive Chemicals by Sphere SOM
Yuh Sugii, Takayuki Kudoh, Takayuki Otani, Masashi Ikeda, Heizo Tokutaka and Masaharu Seno
Application of Self-Organizing Maps in Chemistry. The Case of Phenyl Cations
Daniele Dondi, Armando Buttafava and Angelo Albini
Myoelectric Knee Angle Estimation Algorithms for Control of Active Transfemoral Leg Prostheses
Alberto L. Delis, Joao L. A. Carvalho, Adson F. da Rocha, Francisco A. O. Nascimento and Geovany A. Borges
A Self Organizing Map Based Postural Transition Detection System
Wattanapong Kurdthongmee
Apparent Age Estimation System Based on Age Perception
Hironobu Fukai, Hironori Takimoto, Yasue Mitsukura, and Minoru Fukumi
Use of SOM in the Mechanical and Manufacturing Engineering
Parametric and Robust Optimization Study of a Vibration Absorber with a Generalized Cubic, Quadratic and Non Integer Nonlinearities of Damping and Stiffness
M.-Lamjed Bouazizi and S. Ghanmi and R. Nasri
Harmonic Potential Fields: An Effective Tool for Generating a Self-organizing Behavior
Ahmad A. Masoud
Kohonen Maps Combined to Fuzzy C-means, a Two Level Clustering Approach. Application to Electricity Load Data
Khadir M. Tarek and Benabbas Farouk
Fault Localization Upon Non-Supervised Neural Networks and Unknown Input Observers for Bounded Faults
Benítez-Pérez H. and Ortega-Arjona J. L.
Use of SOM to Study Cotton Growing and Spinning
Josphat Igadwa Mwasiagi
Design and Application of Novel Variants of SOM
Associative Self-Organizing Map
Magnus Johnsson, Max Martinsson, David Gil and Germund Hesslow
Growing Topology Learning Self-Organizing Map
Vilson L. Dalle Mole and Aluizio F. R. Araújo
Is it Visible? Micro-artefacts’ Nonlinear Structure and Natural Formation Processes
Dimitris Kontogiorgos and Alexandros Leontitsis
Self-Organization of Object Categories in a Cortical Artificial Model
Alessio Plebe
Applying SOFM and Its FPGA Implementation on Event Processing of PET Block Detector
The advent of Self Organizing Maps (SOM) provided an opportunity for scientists to experiment with its ability to solve hitherto complicated problems in all spheres of life. SOM has found application in practically all fields, especially those which tend to handle high dimensional data. SOM can be used for the clustering of genes, in the medical field, the study of multimedia and web-based content and in the transportation industry, just to name a few. The complex data found in meteorological and remotely sensed images commonly acquired using satellite sensing can also be analyzed using SOM. The impact of SOM in the improvement of human life cannot be overstated. The wide application of SOM in many other areas which include data management, data envelopment analysis and manufacturing engineering has enabled a thorough study of its strengths and weaknesses. This has resulted in the design of novel variants of SOM algorithms aimed at addressing some of the weaknesses of SOM.
This book seeks to highlight the application of SOM in varied types of industries. Novel variants of the SOM algorithms will also be discussed.
Dr Josphat Igadwa Mwasiagi
School of Engineering, Moi University, Eldoret,
Kenya
Part 1
Data Interpretation and Management
Information-Theoretic Approach to Interpret Internal Representations of Self-Organizing Maps
In this chapter, we propose a new method to measure the importance of input variables and to examine the effect of the input variables on other components. We applied the method to competitive learning, in particular, self-organizing maps, to demonstrate the performance of our method. Because our method is based upon our information-theoretic competitive learning, it is easy to incorporate the idea of the importance of input variables into the method. In addition, by using the SOM, we demonstrate visually how the importance of input variables affects the outputs from the other components, such as competitive units.
In this section, we first state that our objective is to interpret the network configurations as clearly as possible. Then, we show why the importance of input variables should be taken into account. Finally, we will briefly survey our information-theoretic competitive learning and its relation to the importance of input variables.
The objective of the new method is to interpret network configurations, focusing upon the meaning of input variables in particular, because we think that one of the most important tasks in neural learning is that of interpreting network configurations explicitly (Rumelhart et al., 1986; Gorman & Sejnowski, 1988). In applications of neural networks, it has been very difficult to explain how neural networks respond to input patterns and produce their outputs, due to the complexity and non-linear nature of data transformation (Mak & Munakata, 2002), namely, the low degree of human comprehensibility (Thrun, 1995; Kahramanli & Allahverdi, 2009) in neural networks. One of the major approaches for interpretation is rule extraction from trained neural networks by symbolic interpretations with three types of methods, namely, decompositional, pedagogical and eclectic (Kahramanli & Allahverdi, 2009). In the decompositional approach (Towell & Shavlik, 1993; Andrews et al., 1993; Tsukimoto, 2000; Garcez et al., 2001), we analyze the hidden unit activations and connection weights for better understanding of network configurations. On the other hand, in the pedagogical approach (Andrews et al., 1993), the neural network is considered to be a black box, and we only focus upon the imitation of input-output relations exhibited by the neural networks. Finally, in the eclectic approach (Andrews et al., 1993; Barakat & Diederich, 2005), both pedagogical and decompositional approaches are incorporated. In the popular decompositional approach, much attention has been paid to hidden units as well as connection weights. The importance of input variables has been implicitly taken into account. For example, Tsukimoto (2000) used the absolute values of connection weights or the squared connection weights to input variables (attributes) for measuring the importance of input variables. In addition, Garcez et al. (2001) pointed out that the pruning of input vectors maintained the highest possible precision.
On the other hand, in machine learning, variable selection or the interpretation of input variables has received much attention. In data processing, the number of input variables has become extremely large (Guyon & Elisseeff, 2003). Thus, it is important to estimate which input variables should be taken into account in actual data processing. Variable selection aims to improve the prediction performance, to reduce the cost in prediction and to understand the main mechanism of data processing (Guyon & Elisseeff, 2003). The third aim is more related to the present paper. To cope with this variable selection, many methods have been developed so far (Steppe & K. W. Bauer, 1997; Belue & K. W. Bauer, 1995; Petersen et al., 1998). However, few attempts have been made in the field of unsupervised learning, for example, competitive learning and the SOM, to apply variable selection or to take into account the effect of input variables. The methods for input variables in neural networks are mainly related to supervised learning, because of the easy implementation of the measures to represent the importance of input variables (Guyon & Elisseeff, 2003). Thus, it is necessary to examine the effect of input variables through the visualization abilities of the SOM.
In unsupervised learning, explicit evaluation functions have not been established for variable selection (Guyon & Elisseeff, 2003). We have introduced variable selection in unsupervised competitive learning by introducing a method of information loss (Kamimura, 2007; 2008a;b) or information enhancement (Kamimura, 2008c; 2009). In the information loss method, a specific input unit or variable is temporarily deleted, and the change in mutual information between competitive units and input patterns is measured. If the difference between mutual information with and without the input unit is large, the target input unit certainly plays a very important role. On the other hand, in information enhancement, a specific input unit is used to enhance competitive units or to increase the selectivity of competitive units. If the selectivity measured by mutual information between competitive units and input patterns is large, the target input unit is important to increase the selectivity.
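The information loss idea can be illustrated with a minimal Python sketch. It is not the implementation used in the cited work: the Gaussian-like activation `exp(-d2)`, the uniform pattern probability 1/S, the toy data, and the function names `unit_outputs`, `mutual_information` and `information_loss` are all assumptions made for illustration. The sketch deletes one input variable, recomputes the mutual information between competitive units and input patterns, and reports the difference.

```python
import math


def unit_outputs(patterns, weights, skip=None):
    """Outputs v[s][j]; input variable `skip` is left out of the distance."""
    outputs = []
    for x in patterns:
        row = []
        for w in weights:
            d2 = sum((xk - wk) ** 2
                     for k, (xk, wk) in enumerate(zip(x, w)) if k != skip)
            row.append(math.exp(-d2))  # assumed activation, not from the chapter
        outputs.append(row)
    return outputs


def mutual_information(outputs):
    """Mutual information between units and patterns, assuming p(s) = 1/S."""
    n_patterns = len(outputs)
    probs = [[v / sum(row) for v in row] for row in outputs]  # p(j | s)
    n_units = len(probs[0])
    p_j = [sum(row[j] for row in probs) / n_patterns for j in range(n_units)]
    return sum(p * math.log(p / p_j[j]) / n_patterns
               for row in probs for j, p in enumerate(row) if p > 0.0)


def information_loss(patterns, weights, k):
    """Mutual information with all inputs minus that with input k deleted."""
    full = mutual_information(unit_outputs(patterns, weights))
    reduced = mutual_information(unit_outputs(patterns, weights, skip=k))
    return full - reduced


# Toy data: variable 0 separates the patterns, variable 1 is constant.
patterns = [[0.0, 0.5], [1.0, 0.5], [0.0, 0.5], [1.0, 0.5]]
weights = [[0.0, 0.5], [1.0, 0.5]]
loss_var0 = information_loss(patterns, weights, 0)
loss_var1 = information_loss(patterns, weights, 1)
```

In this toy setting, deleting the discriminating variable removes all mutual information, so its loss is positive, while deleting the constant variable changes nothing.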
One of the major difficulties with these information-theoretic methods is that it is extremely difficult to determine explicitly how much information should be contained. In those methods, there are some parameters to determine how much information should be acquired. However, there are no ways to adjust the parameters and to determine the appropriate amount of information to be acquired. We must adjust the parameters heuristically by examining final results such as competitive unit output and connection weights. In this context, we propose a new method to measure information content to be stored in input variables. The parameters in the methods are changed to increase this information content as much as possible. The basic principle to determine the parameters is how these parameters can maximize the information of the input variables. Compared with the previous methods, the criterion to determine the parameters is more explicit. With the ability to explicitly determine the information content, we can interpret network configurations with more confidence, because our method presents a network configuration with the maximum possible information state.
Our method has been developed based on information-theoretic competitive learning. Thus, our method is best suited for competitive learning. However, we applied the method to the self-organizing maps, for two reasons. First, the self-organizing map is a convenient tool to visualize the good performance of our method, better than pure competitive learning, because the good performance can be intuitively understood by visualization techniques related to the SOM. Second, we think that the self-organizing map is also an attempt to interpret network configurations not by symbolic but by visual representation. Though the SOM has been developed for clustering and data mining of high-dimensional data (Kohonen, 1988; 1995; Tasdemir & Merenyi, 2009), the SOM’s main contribution consists in the visualization of high dimensional data in terms of the lower dimensions with various visualization techniques. In the SOM, different final configurations are made explicit by using various visualization techniques, taking into account codebooks and data distribution (Polzlbauer et al., 2006; Vesanto, 1999; Kaski et al., 1998; Mao & Jain, 1995; Ultsch & Siemon, 1990; Ultsch, 2003). From our point of view, the approach of visual representations to interpret network configurations corresponds conceptually to the decompositional approach in rule extraction, though symbolic representations are not extracted. We think that visualization is an effective tool for interpreting final configurations, corresponding to the extraction of symbolic rules in rule extraction.
Fig. 1. A concept of the information-theoretic approach.
2 Theory and computational methods
Fig. 2. Competitive unit outputs for an initial state (a), an intermediate state (b) and a state with maximum mutual information (c). The black and white competitive units represent the strong and weak firing rates, respectively.
information content in input units. As shown in Figure 1(b2), this information should be increased as much as possible. When this information is increased, the number of important input variables is decreased. We focus here on input units, or variables, and then information maximization should be biased toward information contained in input units. Thus, mutual information in competitive units should be increased under the condition that the increase in the mutual information does not prevent a network from increasing information in input units. In the following section, we first explain mutual information between competitive units and input patterns. Then, using the mutual information, we define the importance of input units, by which the information of input variables is defined. Finally, we explain how to compromise between these two types of information.
Fig. 3. Competitive unit outputs for conditional entropy minimization (a) and mutual information maximization (b). The black and white competitive units represent the strong and weak firing rates, respectively.
2.2 Information-theoretic competitive learning
We begin with information for competitive units, because information of input units is defined based upon the information for competitive units. We have so far demonstrated that competitive processes in competitive learning can be described by using the mutual information between competitive units and input patterns (Kamimura & Kamimura, 2000; Kamimura et al., 2001; Kamimura, 2003a;b;c;d). In other words, the degree of organization of competitive units can be described by using mutual information between competitive units and input patterns. Figures 2(a), (b) and (c) show three states that depend on the amount of information stored in competitive unit outputs. Figure 2(a) shows an initial state without any information on input patterns, where competitive unit outputs respond equally to all input patterns. When some quantity of information is stored in competitive unit outputs, several neurons tend to fire at the corners, shown in Figure 2(b). When mutual information between input patterns and competitive units is maximized, shown in Figure 2(c), only one competitive unit is turned on for specific input patterns.
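The three states described above can be checked numerically with a small Python sketch. It uses the standard definition of mutual information between patterns and units, I = Σ_s p(s) Σ_j p(j | s) log(p(j | s) / p(j)); the uniform pattern probability p(s) = 1/S, the function name `mutual_information`, and the three example probability tables are assumptions for illustration, since the chapter's own formula is not shown in this excerpt.

```python
import math


def mutual_information(p_j_given_s):
    """Mutual information from a table p_j_given_s[s][j], assuming p(s) = 1/S."""
    n_patterns = len(p_j_given_s)
    n_units = len(p_j_given_s[0])
    p_s = 1.0 / n_patterns
    # Marginal firing probability p(j) of each competitive unit.
    p_j = [sum(row[j] for row in p_j_given_s) * p_s for j in range(n_units)]
    info = 0.0
    for row in p_j_given_s:
        for j, p in enumerate(row):
            if p > 0.0:  # terms with p = 0 contribute nothing
                info += p_s * p * math.log(p / p_j[j])
    return info


# Hypothetical tables for four patterns and four units.
initial = [[0.25] * 4 for _ in range(4)]  # all units respond equally
intermediate = [[0.7 if j == s else 0.1 for j in range(4)] for s in range(4)]
maximal = [[1.0 if j == s else 0.0 for j in range(4)] for s in range(4)]

info_initial = mutual_information(initial)
info_intermediate = mutual_information(intermediate)
info_maximal = mutual_information(maximal)
```

Under these assumptions the uniform table carries no information, the one-unit-per-pattern table reaches the upper bound log M, and the intermediate table falls between the two.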
We explain this mutual information more exactly by using the network architecture shown in Figure 1. In the network, $x_k^s$, $w_{jk}$ and $v_j^s$ represent the $k$th element of the $s$th input pattern, connection weights from the $k$th input to the $j$th competitive unit and the $j$th competitive unit output for the $s$th input pattern. The competitive unit outputs can be normalized as $p(j \mid s)$ to represent the firing probability of the $j$th competitive unit. In the network, we have $L$ input units, $M$ competitive units and $S$ input patterns.
First, the $j$th competitive unit output $v_j^s$ for the $s$th input pattern can be computed by
The firing probability of the $j$th competitive unit for the $s$th input pattern can be obtained by normalizing these competitive unit outputs:
$$p(j \mid s) = \frac{v_j^s}{\sum_{m=1}^{M} v_m^s}$$
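The normalization step can be sketched in a few lines of Python, assuming that normalizing means dividing each output by the sum of the outputs of all M competitive units for the same input pattern. The function name `firing_probabilities` and the example values are hypothetical.

```python
def firing_probabilities(outputs):
    """Normalize outputs[s][j] (unit j, pattern s) into p(j | s)."""
    probs = []
    for row in outputs:
        total = sum(row)
        if total <= 0:
            raise ValueError("each pattern needs at least one positive output")
        # Each row is divided by its own sum, so it becomes a distribution.
        probs.append([v / total for v in row])
    return probs


# Two input patterns, three competitive units (made-up outputs).
v = [[2.0, 1.0, 1.0], [0.5, 0.5, 3.0]]
p = firing_probabilities(v)
```

Each row of `p` sums to one, so it can be read as the firing probabilities of the competitive units for one input pattern.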
we have the high possibility that only one competitive unit at the corner in the figure is always turned on. On the other hand, when mutual information is maximized, different competitive units respond to different input patterns, as shown in Figure 2(b). Thus, mutual information maximization can realize a process of competition in competitive learning.
Fig. 4. Importance p(k) with large (a), small (b) and estimated importance (c).
Fig. 5. Importance p(k) with large (a), small (b) and estimated importance (c).