Neural Networks in a Softcomputing Framework

K.-L. Du and M.N.S. Swamy

With 116 Figures
M.N.S. Swamy, PhD, D.Sc. (Eng)
Centre for Signal Processing and Communications
Department of Electrical and Computer Engineering

Neural networks in a softcomputing framework
1. Neural networks (Computer science)
I. Title  II. Swamy, M.N.S.
006.3'2

ISBN-13: 978-1-84628-302-4
ISBN-10: 1-84628-302-7
e-ISBN: 1-84628-303-5

Library of Congress Control Number: 2006923485

Printed on acid-free paper
© Springer-Verlag London Limited 2006
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed in Germany
9 8 7 6 5 4 3 2 1
Springer Science+Business Media
springer.com
OUR PARENTS
AND
TEACHERS
Preface

Softcomputing, a concept introduced by L.A. Zadeh in the early 1990s, is an evolving collection of methodologies for the representation of the ambiguity in human thinking. The core methodologies of softcomputing are fuzzy logic, neural networks, and evolutionary computation. Softcomputing aims at exploiting the tolerance for imprecision and uncertainty, approximate reasoning, and partial truth in order to achieve tractability, robustness, and low-cost solutions.

Research on neural networks dates back to the 1940s, and the discipline is now well developed, with wide applications in almost all areas of science and engineering. The widespread adoption of neural networks is due to their strong learning and generalization capability. After a neural network learns the unknown relation from given examples, it can then predict, by generalization, outputs for new samples that are not included in the learning sample set. The neural-network method is model free: a neural network is a black box that directly learns the internal relations of an unknown system. This takes us away from guessing functions for describing cause-and-effect relationships. In addition to function approximation, other capabilities of neural networks such as nonlinear mapping, parallel and distributed processing, associative memory, vector quantization, optimization, and fault tolerance also contribute to their widespread application.

The theory of fuzzy logic and fuzzy sets was introduced by L.A. Zadeh in 1965. Fuzzy logic provides a means for treating uncertainty and computing with words. This is especially useful for mimicking human recognition, which skillfully copes with uncertainty. Fuzzy systems are conventionally created from explicit knowledge expressed in the form of fuzzy rules, which are designed based on experts' experience. A fuzzy system can explain its actions by its fuzzy rules. Fuzzy systems can also be used for function approximation. The synergy of fuzzy logic and neural networks generates neurofuzzy systems, which inherit the learning capability of neural networks and the knowledge-representation capability of fuzzy systems.

Evolutionary computation is a computational method for obtaining the best possible solutions in a huge solution space based on Darwin's survival-of-the-fittest principle. Evolutionary algorithms are a class of robust adaptation and global optimization techniques for many hard problems. Among evolutionary algorithms, the genetic algorithm is the best known and most studied, while the evolutionary strategy is more efficient for numerical optimization. More and more biologically or nature-inspired algorithms are emerging. Evolutionary computation has been applied to the optimization of the structure or parameters of neural networks, fuzzy systems, and neurofuzzy systems. The hybridization of neural networks, fuzzy logic, and evolutionary computation provides a powerful means for solving engineering problems.

At the invitation of Springer, we initially intended to write a monograph on neural-network applications in array signal processing. Since neural-network methods are general-purpose methods for data analysis, signal processing, and pattern recognition, we decided instead to write an advanced textbook on neural networks for graduate students. More specifically, neural networks can be used in system identification, control, communications, data compression and reconstruction, audio and speech processing, image processing, clustering analysis, feature extraction, classification, and pattern recognition. Conventional model-based data-processing methods require experts' knowledge for the modeling of a system, and they are computationally expensive. Neural-network methods provide a model-free, adaptive, parallel-processing solution.

In this book, we elaborate on the most popular neural-network models and their associated techniques. These include multilayer perceptrons, radial basis function networks, Hopfield networks, Boltzmann machines and stochastic neural-network models, as well as many models and algorithms for clustering analysis and principal component analysis. The applications of these models constitute the majority of all neural-network applications. Self-contained fundamentals of fuzzy logic and evolutionary algorithms, the other two paradigms of softcomputing, are also introduced, and their synergies with neural networks are described.

We include in this book a thorough review of various models. Major research results published in the past decades are introduced. Problems of array signal processing are given as examples to illustrate the applications of each neural-network model.

This book is divided into ten chapters and an appendix. Chapter 1 gives an introduction to neural networks. Chapter 2 describes some fundamentals of neural networks and softcomputing. A detailed description of the network architecture and the theory of operation for each softcomputing method is given in Chapters 3 through 9. Chapter 10 lists some other interesting or emerging neural-network and softcomputing methods and also mentions some topics that have received recent attention. Some mathematical preliminaries are given in the appendix. The contents of the various chapters are as follows.
• In Chapter 1, a general introduction to neural networks is given. This involves the history of neural-network research, the McCulloch–Pitts neuron, network topologies, learning methods, as well as the properties and applications of neural networks.
• Chapter 2 introduces some topics of neural networks and softcomputing, such as statistical learning theory, learning and generalization, model selection, and robust learning, as well as feature selection and feature extraction.
• Chapter 3 is dedicated to multilayer perceptrons. Perceptron learning is first introduced. This is followed by the backpropagation learning algorithm and its numerous improvement measures. Many other learning algorithms, including second-order algorithms, are described.
• Hopfield networks and Boltzmann machines are described in Chapter 4. Some aspects of associative memory and combinatorial optimization are developed. Simulated annealing is introduced as a global optimization method. Some unsupervised learning algorithms for Hopfield networks and Boltzmann machines are also discussed.
• Chapter 5 treats competitive learning and clustering networks. Dozens of clustering algorithms, such as Kohonen's self-organizing map, learning vector quantization, adaptive resonance theory (ART), C-means, the neural gas, and fuzzy C-means, are introduced.
• Chapter 6 systematically deals with radial basis function networks, which are fast alternatives to the multilayer perceptron. Some recent learning algorithms are also introduced, and a comparison with the multilayer perceptron is made.
• Numerous neural networks and algorithms for principal component analysis, minor component analysis, independent component analysis, and singular value decomposition are described in Chapter 7.
• Fuzzy logic and neurofuzzy systems are described in Chapter 8. The relation between neural networks and fuzzy logic is addressed. Some popular neurofuzzy models, including the ANFIS, are detailed in this chapter.
• In Chapter 9, we elaborate on evolutionary algorithms, with emphasis on genetic algorithms and evolutionary strategies. Applications of evolutionary algorithms to the optimization of the structure and parameters of a neural network or a fuzzy system are also described.
• A brief summary of the book is given in Chapter 10, where some other useful or emerging neural-network models and softcomputing paradigms are briefly discussed. In Chapter 10, we also give an outlook on this discipline.

This book is intended for scientists and practitioners who are working in engineering and computer science. The softcomputing paradigms are general purpose in nature; thus, this book is also useful to people who are interested in applications of neural networks, fuzzy logic, or evolutionary computation to their specific fields. This book can be used as a textbook for graduate students. Researchers interested in a particular topic will benefit from the appropriate chapter of the book, since each chapter provides a systematic introduction and survey of the respective topic. The book contains 1272 references. The state-of-the-art survey leads readers to the most recent results, saving them enormous amounts of time in document retrieval.

In this book, all acronyms and symbols are explained at their first appearance. Readers who encounter an abbreviation or symbol not explained in a particular section can refer to the lists of abbreviations and symbols at the beginning of the book.

We would like to thank the editors of Springer for their support. We also would like to thank our respective families for their patience and understanding during the course of writing this book.

K.-L. Du
M.N.S. Swamy
Concordia University, Montreal, Canada
March 2006
Contents

List of Abbreviations xxiii
List of Symbols xxix
1 Introduction 1
1.1 A Brief History of Neural Networks 1
1.2 Neurons 3
1.3 Analog VLSI Implementation 6
1.4 Architecture of Neural Networks 7
1.5 Learning Methods 10
1.5.1 Supervised Learning 11
1.5.2 Unsupervised Learning 11
1.5.3 Reinforcement Learning 13
1.5.4 Evolutionary Learning 14
1.6 Operation of Neural Networks 14
1.6.1 Adaptive Neural Networks 15
1.7 Properties of Neural Networks 15
1.8 Applications of Neural Networks 16
1.8.1 Function Approximation 16
1.8.2 Classification 17
1.8.3 Clustering and Vector Quantization 17
1.8.4 Associative Memory 18
1.8.5 Optimization 18
1.8.6 Feature Extraction and Information Compression 18
1.9 Array Signal Processing as Examples 19
1.9.1 Array Signal Model 19
1.9.2 Direction Finding and Beamforming 21
1.10 Scope of the Book 24
1.10.1 Summary by Chapters 25
2 Fundamentals of Machine Learning and Softcomputing 27
2.1 Computational Learning Theory 27
2.1.1 Vapnik–Chervonenkis Dimension 28
2.1.2 Empirical Risk-minimization Principle 29
2.1.3 Probably Approximately Correct (PAC) Learning 30
2.2 No Free Lunch Theorem 31
2.3 Neural Networks as Universal Machines 31
2.3.1 Boolean Function Approximation 32
2.3.2 Linear Separability and Nonlinear Separability 33
2.3.3 Binary Radial Basis Function 34
2.3.4 Continuous Function Approximation 35
2.4 Learning and Generalization 36
2.4.1 Size of Training Set 37
2.4.2 Generalization Error 37
2.4.3 Generalization by Stopping Criterion 38
2.4.4 Generalization by Regularization 39
2.5 Model Selection 40
2.5.1 Crossvalidation 41
2.5.2 Complexity Criteria 41
2.6 Bias and Variance 43
2.7 Robust Learning 44
2.8 Neural-network Processors 46
2.8.1 Preprocessing and Postprocessing 46
2.8.2 Linear Scaling and Data Whitening 49
2.8.3 Feature Selection and Feature Extraction 50
2.9 Gram–Schmidt Orthonormalization Transform 52
2.10 Principal Component Analysis 53
2.11 Linear Discriminant Analysis 53
3 Multilayer Perceptrons 57
3.1 Single-layer Perceptron 57
3.1.1 Perceptron Learning Algorithm 58
3.1.2 Least Mean Squares Algorithm 60
3.1.3 Other Learning Algorithms 61
3.2 Introduction to Multilayer Perceptrons 62
3.2.1 Universal Approximation 63
3.2.2 Sigma-Pi Networks 64
3.3 Backpropagation Learning Algorithm 65
3.4 Criterion Functions 69
3.5 Incremental Learning versus Batch Learning 71
3.6 Activation Functions for the Output Layer 72
3.6.1 Linear Activation Function 72
3.6.2 Generalized Sigmoidal Function 73
3.7 Optimizing Network Structure 73
3.7.1 Network Pruning 74
3.7.2 Network Growing 79
3.8 Speeding Up Learning Process 82
3.8.1 Preprocessing of Data Set 82
3.8.2 Eliminating Premature Saturation 83
3.8.3 Adapting Learning Parameters 84
3.8.4 Initializing Weights 89
3.8.5 Adapting Activation Function 94
3.8.6 Other Acceleration Techniques 95
3.9 Backpropagation with Global Descent 98
3.9.1 Global Descent 98
3.9.2 Backpropagation with Tunneling 99
3.10 Robust Backpropagation Algorithms 100
3.11 Resilient Propagation 101
3.12 Second-order Learning Methods 103
3.12.1 Newton’s Methods 104
3.12.2 Quasi-Newton Methods 109
3.12.3 Conjugate-gradient Methods 113
3.12.4 Extended Kalman Filtering Methods 115
3.13 Miscellaneous Learning Algorithms 118
3.13.1 Layerwise Linear Learning 119
3.13.2 Natural-gradient Method 121
3.13.3 Binary Multilayer Perceptrons 121
3.14 Escaping Local Minima 121
3.14.1 Some Heuristics for Escaping Local Minima 122
3.14.2 Global Optimization Techniques 123
3.14.3 Deterministic Global-descent Techniques 123
3.14.4 Stochastic Learning Techniques 124
3.15 Hardware Implementation of Perceptrons 125
3.16 Backpropagation for Temporal Learning 127
3.16.1 Recurrent Multilayer Perceptrons with Backpropagation 127
3.16.2 Recurrent Neural Networks with Recurrent Backpropagation 128
3.17 Complex-valued Multilayer Perceptrons and Their Learning 129
3.17.1 Split Complex Backpropagation 130
3.17.2 Fully Complex Backpropagation 130
3.18 Applications and Computer Experiments 131
3.18.1 Application 3.1: NETtalk — A Speech Synthesis System 131
3.18.2 Application 3.2: Handwritten Digit Recognition 132
3.18.3 Example 3.1: Iris Classification 135
3.18.4 Example 3.2: DoA Estimation 138
4 Hopfield Networks and Boltzmann Machines 141
4.1 Recurrent Neural Networks 141
4.2 Hopfield Model 143
4.2.1 Dynamics of the Hopfield Model 143
4.2.2 Stability of the Hopfield Model 144
4.2.3 Applications of the Hopfield Model 145
4.3 Analog Implementation of Hopfield Networks 146
4.4 Associative-memory Models 148
4.4.1 Hopfield Model: Storage and Retrieval 149
4.4.2 Storage Capability 153
4.4.3 Multilayer Perceptrons as Associative Memories 156
4.4.4 The Hamming Network 158
4.5 Simulated Annealing 160
4.5.1 Classic Simulated Annealing 160
4.5.2 Variants of Simulated Annealing 162
4.6 Combinatorial Optimization Problems 163
4.6.1 Formulation of Combinatorial Optimization Problems 164
4.6.2 Escaping Local Minima for Combinatorial Optimization Problems 165
4.6.3 Combinatorial Optimization Problems with Equality and Inequality Constraints 167
4.7 Chaotic Neural Networks 168
4.8 Hopfield Networks for Other Optimization and Signal-processing Problems 170
4.9 Multistate Hopfield Networks 171
4.9.1 Multilevel Hopfield Networks 171
4.9.2 Complex-valued Multistate Hopfield Networks 172
4.10 Boltzmann Machines and Learning 174
4.10.1 The Boltzmann Machine 175
4.10.2 The Boltzmann Learning Algorithm 176
4.10.3 The Mean-field-theory Machine 178
4.11 Discussion 179
4.12 Computer Experiments 180
4.12.1 Example 4.1: A Comparison of Three Learning Algorithms 181
4.12.2 Example 4.2: Using the Hopfield Network for DoA Estimation 183
5 Competitive Learning and Clustering 187
5.1 Vector Quantization 187
5.2 Competitive Learning 188
5.3 The Kohonen Network 191
5.3.1 Self-organizing Maps 191
5.3.2 Applications of Self-organizing Maps 194
5.3.3 Extensions of Self-organizing Maps 194
5.4 Learning Vector Quantization 195
5.5 C-means Clustering 197
5.5.1 Improvements on the C-means 199
5.6 Mountain and Subtractive Clustering 200
5.7 Neural Gas 203
5.7.1 Competitive Hebbian Learning 205
5.7.2 The Topology-representing Network 206
5.8 ART Networks 206
5.8.1 ART Models 206
5.8.2 ARTMAP Models 213
5.9 Fuzzy Clustering 215
5.9.1 Fuzzy C-means Clustering 215
5.9.2 Conditional Fuzzy C-means Clustering 218
5.9.3 Other Fuzzy Clustering Algorithms 219
5.10 Supervised Clustering 222
5.11 The Underutilization Problem 223
5.11.1 Competitive Learning with Conscience 223
5.11.2 Rival Penalized Competitive Learning 225
5.11.3 Softcompetitive Learning 226
5.12 Robust Clustering 227
5.12.1 Noise Clustering 227
5.12.2 Possibilistic C-means 228
5.12.3 A Unified Framework for Robust Clustering 229
5.12.4 Other Robust Clustering Problems 230
5.13 Clustering Using Non-Euclidean Distance Measures 230
5.14 Hierarchical Clustering 231
5.14.1 Partitional, Hierarchical, and Density-based Clustering 232
5.14.2 Distance Measures, Cluster Representations, and Dendrograms 233
5.14.3 Agglomerative Clustering Methods 234
5.14.4 Combinations of Hierarchical and Partitional Clustering 235
5.15 Constructive Clustering Techniques 236
5.16 Miscellaneous Clustering Methods 238
5.17 Cluster Validity 239
5.17.1 Measures Based on Maximal Compactness and Maximal Separation of Clusters 239
5.17.2 Measures Based on Minimal Hypervolume and Maximal Density of Clusters 240
5.18 Computer Experiments 242
5.18.1 Example 5.1: Vector Quantization Using the Self-organizing Map 242
5.18.2 Example 5.2: Solving the TSP Using the Self-organizing Map 244
5.18.3 Example 5.3: Three Clustering Algorithms — A Comparison 246
5.18.4 Example 5.4: Clustering Analog Signals Using ART 2A 248
6 Radial Basis Function Networks 251
6.1 Introduction 251
6.1.1 Architecture of the Radial Basis Function Network 252
6.1.2 Universal Approximation of Radial Basis Function Networks 253
6.1.3 Learning for Radial Basis Function Networks 253
6.2 Radial Basis Functions 254
6.3 Learning RBF Centers 257
6.3.1 Selecting RBF Centers Randomly from Training Sets 258
6.3.2 Selecting RBF Centers by Clustering Training Sets 259
6.4 Learning the Weights 261
6.4.1 Least Squares Methods for Weight Learning 261
6.4.2 Kernel Orthonormalization-based Weight Learning 261
6.5 RBFN Learning Using Orthogonal Least Squares 263
6.5.1 Batch Orthogonal Least Squares 263
6.5.2 Recursive Orthogonal Least Squares 265
6.6 Supervised Learning of All Parameters 266
6.6.1 Supervised Learning for General Radial Basis Function Networks 267
6.6.2 Supervised Learning for Gaussian Radial Basis Function Networks 268
6.6.3 Implementations of Supervised Learning 269
6.7 Evolving Radial Basis Function Networks 270
6.8 Robust Learning of Radial Basis Function Networks 272
6.9 Various Learning Methods 273
6.10 Normalized Radial Basis Function Networks 274
6.11 Optimizing Network Structure 276
6.11.1 Constructive Methods 276
6.11.2 Resource-allocating Networks 278
6.11.3 Constructive Methods with Pruning 281
6.11.4 Pruning Methods 282
6.12 Radial Basis Function Networks for Modeling Dynamic Systems 283
6.13 Hardware Implementations of Radial Basis Function Networks 284
6.14 Complex Radial Basis Function Networks 286
6.15 Properties of Radial Basis Function Networks 287
6.15.1 Receptive-field Networks 287
6.15.2 Generalization Error and Approximation Error 287
6.16 Radial Basis Function Networks vs Multilayer Perceptrons 288
6.17 Computer Experiments 290
6.17.1 Example 6.1: Radial Basis Function Networks for Beamforming 291
6.17.2 Example 6.2: Radial Basis Function Network-based DoA Estimation 291
7 Principal Component Analysis Networks 295
7.1 Stochastic Approximation Theory 295
7.2 Hebbian Learning Rule 296
7.3 Oja’s Learning Rule 297
7.4 Principal Component Analysis 298
7.5 Hebbian Rule-based Principal Component Analysis 300
7.5.1 Subspace Learning Algorithms 301
7.5.2 Generalized Hebbian Algorithm 304
7.5.3 Other Hebbian Rule-based Algorithms 304
7.6 Least Mean Squared Error-based Principal Component Analysis 306
7.6.1 The Least Mean Square Error Reconstruction Algorithm 306
7.6.2 The PASTd Algorithm 307
7.6.3 The Robust RLS Algorithm 308
7.7 Other Optimization-based Principal Component Analysis 309
7.7.1 Novel Information Criterion Algorithm 309
7.7.2 Coupled Principal Component Analysis 310
7.8 Anti-Hebbian Rule-based Principal Component Analysis 312
7.8.1 Rubner–Tavan Principal Component Analysis Algorithm 312
7.8.2 APEX Algorithm 313
7.9 Nonlinear Principal Component Analysis 315
7.9.1 Kernel Principal Component Analysis 316
7.9.2 Robust/Nonlinear Principal Component Analysis 317
7.9.3 Autoassociative Network-based Nonlinear Principal Component Analysis 320
7.9.4 Other Networks for Dimensionality Reduction 322
7.10 Minor Component Analysis 322
7.10.1 Extracting the First Minor Component 323
7.10.2 Oja’s Minor Subspace Analysis 323
7.10.3 Self-stabilizing Minor Component Analysis 324
7.10.4 Orthogonal Oja Algorithm 324
7.10.5 Other Developments 325
7.11 Independent Component Analysis 326
7.11.1 Formulation of Independent Component Analysis 326
7.11.2 Independent Component Analysis and Regression 328
7.11.3 Approaches to Independent Component Analysis 328
7.11.4 FastICA Algorithm 329
7.11.5 Independent Component Analysis Networks 330
7.11.6 Nonlinear Independent Component Analysis 333
7.12 Constrained Principal Component Analysis 334
7.13 Localized Principal Component Analysis 335
7.14 Extending Algorithms to Complex Domain 336
7.15 Other Generalizations of the PCA 338
7.16 Crosscorrelation Asymmetric Networks 339
7.16.1 Extracting Multiple Principal Singular Components 339
7.16.2 Extracting the Largest Singular Component 342
7.16.3 Extracting Multiple Principal Singular Components for Nonsquare Matrices 342
7.17 Computer Experiments 343
7.17.1 Example 7.1: A Comparison of the Weighted SLA, the GHA, and the APEX 343
7.17.2 Example 7.2: Image Compression 348
8 Fuzzy Logic and Neurofuzzy Systems 353
8.1 Fundamentals of Fuzzy Logic 353
8.1.1 Definitions and Terminologies 354
8.1.2 Membership Function 360
8.1.3 Intersection and Union 361
8.1.4 Aggregation, Fuzzy Implication, and Fuzzy Reasoning 363
8.1.5 Fuzzy Inference Systems and Fuzzy Controllers 364
8.1.6 Fuzzy Rules and Fuzzy Interference 365
8.1.7 Fuzzification and Defuzzification 366
8.1.8 Mamdani Model and Takagi–Sugeno–Kang Model 367
8.1.9 Complex Fuzzy Logic 371
8.2 Fuzzy Logic vs Neural Networks 372
8.3 Fuzzy Rules and Multilayer Perceptrons 373
8.3.1 Equality Between Multilayer Perceptrons and Fuzzy Inference Systems 373
8.3.2 Extracting Rules According to Activation Functions 374
8.3.3 Representing Fuzzy Rules Using Multilayer Perceptrons 375
8.4 Fuzzy Rules and Radial Basis Function Networks 376
8.4.1 Equivalence Between Takagi–Sugeno–Kang Model and Radial Basis Function Networks 376
8.4.2 Fuzzy Rules and Radial Basis Function Networks: Representation and Extraction 377
8.5 Rule Generation from Trained Neural Networks 377
8.6 Extracting Rules from Numerical Data 379
8.6.1 Rule Generation Based on Fuzzy Partitioning 380
8.6.2 Hierarchical Rule Generation 382
8.6.3 Rule Generation Based on Look-up Table 383
8.6.4 Other Methods 384
8.7 Interpretability 386
8.8 Fuzzy and Neural: A Synergy 387
8.9 Neurofuzzy Models 389
8.9.1 The ANFIS Model 389
8.9.2 Generic Fuzzy Perceptron 392
8.9.3 Other Neurofuzzy Models 394
8.10 Fuzzy Neural Circuits 397
8.11 Computer Experiments 399
8.11.1 Example 8.1: Solve the DoA Estimation Using the ANFIS with Grid Partitioning 399
8.11.2 Example 8.2: Solve the DoA Estimation Using the ANFIS with Scatter Partitioning 401
9 Evolutionary Algorithms and Evolving Neural Networks 405
9.1 Evolution vs Learning 405
9.2 Introduction to Evolutionary Algorithms 406
9.2.1 Terminologies 407
9.3 Genetic Algorithms 410
9.3.1 Encoding/Decoding 410
9.3.2 Selection/Reproduction 411
9.3.3 Crossover/Mutation 413
9.3.4 Real-coded Genetic Algorithms for Continuous Numerical Optimization 418
9.3.5 Genetic Algorithms for Sequence Optimization 421
9.3.6 Exploitation vs Exploration 422
9.3.7 Adaptation 423
9.3.8 Variants of the Genetic Algorithm 424
9.3.9 Parallel Genetic Algorithms 424
9.3.10 Two-dimensional Genetic Algorithms 425
9.4 Evolutionary Strategies 426
9.4.1 Crossover, Mutation, and Selection Strategies 426
9.4.2 Evolutionary Strategies vs Genetic Algorithms 427
9.4.3 New Mutation Operators 427
9.5 Other Evolutionary Algorithms 428
9.5.1 Genetic Programming 429
9.5.2 Evolutionary Programming 429
9.5.3 Memetic Algorithms 429
9.6 Theoretical Aspects 430
9.6.1 Schema Theorem and Building-block Hypothesis 430
9.6.2 Dynamics of Evolutionary Algorithms 431
9.6.3 Deceptive Problems 432
9.7 Other Population-based Optimization Methods 432
9.7.1 Particle Swarm Optimization 432
9.7.2 Immune Algorithms 433
9.7.3 Ant-colony Optimization 434
9.8 Multiobjective, Multimodal, and Constraint-satisfaction Optimizations 436
9.8.1 Multiobjective Optimization 436
9.8.2 Multimodal Optimization 437
9.9 Evolutionary Algorithms vs Simulated Annealing 439
9.9.1 Comparison Between Evolutionary Algorithms and Simulated Annealing 439
9.9.2 Synergy of Evolutionary Algorithms and Simulated Annealing 440
9.10 Constructing Neural Networks Using Evolutionary Algorithms 441
9.10.1 Permutation Problem 441
9.10.2 Hybrid Training 442
9.10.3 Evolving Network Parameters 443
9.10.4 Evolving Network Architecture 444
9.10.5 Simultaneously Evolving Architecture and Parameters 446
9.10.6 Evolving Activation Functions and Learning Rules 447
9.11 Constructing Fuzzy Systems Using Evolutionary Algorithms 447
9.12 Constructing Neurofuzzy Systems Using Evolutionary Algorithms 448
9.13 Constructing Evolutionary Algorithms Using Fuzzy Logic 450
9.13.1 Fuzzy Encoding for Genetic Algorithms 450
9.13.2 Adaptive Parameter Setting Using Fuzzy Logic 451
9.14 Computer Experiments 452
9.14.1 Example 9.1: Optimization of Rosenbrock’s Function 452
9.14.2 Example 9.2: Iris Classification 454
10 Discussion and Outlook 457
10.1 A Brief Summary 457
10.2 Support Vector Machines 458
10.2.1 Support Vector Machines for Classification 459
10.2.2 Support Vector Regression 461
10.2.3 Support Vector Clustering 463
10.3 Other Neural-network Models and Softcomputing Approaches 464
10.3.1 Generalized Single-layer Networks 464
10.3.2 Cellular Neural Networks 465
10.3.3 Wavelet Neural Networks 465
10.3.4 Tabu Search 466
10.3.5 Rough Set 467
10.3.6 Emerging Computing Paradigms 467
10.4 Some Research Topics 468
10.4.1 Face Recognition 469
10.4.2 Data Mining 469
10.4.3 Functional Data Analysis 470
Appendix A: Mathematical Preliminaries 471
A.1 Linear Algebra 471
A.2 Stability of Dynamic Systems 477
A.3 Probability Theory and Stochastic Processes 478
A.4 Numerical Optimization Techniques 481
References 483
Index 545
List of Abbreviations

ACO ant-colony optimization
ACS ant-colony system
adaline adaptive linear element
A/D analog-to-digital
AFC adaptive fuzzy clustering
AIC Akaike information criterion
ALA adaptive learning algorithm
ANFIS adaptive-network-based FIS
AOSVR accurate online SVR
APEX adaptive principal components extraction
ARBP annealing robust BP
ARC adaptive resolution classifier
ARLA annealing robust learning algorithm
ARRBFN annealing robust RBFN
ART adaptive resonance theory
ASIC application-specific integrated circuit
ASP array signal processing
ASSOM adaptive-subspace SOM
BAM bidirectional associative memory
BCL branching competitive learning
BER bit error rate
BFGS Broyden–Fletcher–Goldfarb–Shanno
BIC Bayesian information criterion
BIRCH balanced iterative reducing and clustering using hierarchies
BSB brain-states-in-a-box
BSS blind source separation
CAM content-addressable memory
CDF cumulative distribution function
CFA clustering for function approximation
CFHN compensated fuzzy Hopfield network
CICA constrained ICA
CMA covariance matrix adaptation
CNN cellular neural network
COP combinatorial optimization problem
DAC digital-to-analog converter
dARTMAP distributed ARTMAP
DBSCAN density-based spatial clustering of applications with noise
DCS dynamic cell structures
DCT discrete cosine transform
DEKF decoupled EKF algorithm
DFA deterministic finite-state automaton
DFP Davidon–Fletcher–Powell
DFT discrete Fourier transform
DFNN dynamic fuzzy neural network
DPE dynamic parameter encoding
DWT discrete wavelet transform
EA evolutionary algorithm
ECAM exponential correlation associative memory model
ECFC entropy-constrained fuzzy clustering
ECLVQ entropy-constrained LVQ
EEBP equalized error BP
EHF extended H∞ filtering
EKF extended Kalman filtering
ELSA evolutionary local selection clustering
E-step expectation step
ETF elementary transcendental function
EVD eigenvalue decomposition
FALVQ fuzzy algorithms for LVQ
FAM fuzzy associative memory
FBFN fuzzy basis function network
FCL fuzzy competitive learning
FDA functional data analysis
FFA fuzzy finite-state automaton
FFT fast Fourier transform
FHN fuzzy Hopfield network
FIR finite impulse response
FIS fuzzy inference system
FKCN fuzzy Kohonen clustering network
flop floating-point operation
FNN feedforward neural network
FOSART fully self-organizing SART
FPE final prediction error
FSCL frequency-sensitive competitive learning
FuGeNeSys fuzzy genetic neural system
GAP-RBF growing and pruning algorithm for RBF
GAVaPS GA with varying population size
GCS growing cell structures
GEFREX genetic fuzzy rule extractor
GESA guided evolutionary SA
GFP generic fuzzy perceptron
GGAP-RBF generalized GAP-RBF algorithm
GII global identical index
GLVQ-F generalized LVQ family algorithms
GNG-U GNG with utility criterion
GOTA globally optimal training algorithm
HFPNN hybrid fuzzy polynomial neural network
HWO hidden weight optimization
HUFC hierarchical unsupervised fuzzy clustering
HUX half-uniform crossover
HyFIS Hybrid neural FIS
ICA independent component analysis
LBG-U LBG with utility
LCMV linear constrained minimum variance
LDA linear discriminant analysis
LII local identical index
LLCS life-long learning cell structures
LLLS local linearized LS
LMAM LM with adaptive momentum
LMS least mean squares
LMSE least mean squared error
LMSER least mean square error reconstruction
LTG linear threshold gate
LVQ learning vector quantization
MAD median of the absolute deviation
MBCL multiplicatively biased competitive learning
MSA minor subspace analysis
M-step maximization step
NARX nonlinear autoregressive with exogenous input
NEFCLASS neurofuzzy classification
NIC novel information criterion
NLCPCA nonlinear complex PCA
NLDA nonlinear discriminant analysis
NOOja normalized orthogonal Oja
NOVEL nonlinear optimization via external lead
NSGA nondominated sorting GA
OBS optimal brain surgeon
ODE ordinary differential equation
OLS orthogonal least squares
OmeGA ordering messy GA
PAES Pareto archived ES
PAST projection approximation subspace tracking
PASTd PAST with deflation
PCA principal component analysis
PCB printed circuit board
PCG projected conjugate gradient
PCM possibilistic C-means
PDF probability density function
PESA Pareto envelope-based selection algorithm
PMX partial matched crossover
PNN probabilistic neural network
PSA principal subspace analysis
PSO particle swarm optimization
PTG polynomial threshold gate
PWM pulse width modulation
QR-cp QR with column pivoting
RAN resource-allocating network
RBF radial basis function
RBFN radial basis function network
RCA robust competitive agglomeration
RCAM recurrent correlation associative memory
RCE restricted Coulomb energy
RecSOM recursive SOM
RLS recursive least squares
RNN recurrent neural network
ROLS recursive OLS
RPCL rival penalized competitive learning
RProp resilient propagation
RRLSA robust RLS algorithm
RTRL real-time recurrent learning
SA simulated annealing
SAM standard additive model
SCL simple competitive learning
SCS soft competition scheme
SER average storage error rate
S-Fuzzy ART symmetric fuzzy ART
SISO single-input single-output
SOMSD SOM for structured data
SVR support vector regression
TABP terminal attractor-based BP
TDNN time-delay neural network
TDRL time-dependent recurrent learning
TLMS total least mean squares
TLS total least squares
TNGS theory of neuronal group selection
TREAT trust-region-based error aggregated training
TRUST terminal repeller unconstrained subenergy tunneling
TSP traveling salesman problem
UD-FMEKF UD factorization-based FMEKF
UNBLOX uniform block crossover
VHDL very high level hardware description language
VLSI very large scale integrated
WINC weighted information criterion
WNN wavelet neural network
List of Symbols

|·| the cardinality of the set or region within; Also the absolute value of the scalar within
‖·‖_A the weighted Euclidean norm
‖·‖_F the Frobenius norm
‖·‖_p the p-norm or L_p-norm
|·|_ε the ε-insensitive loss function
ˆ[·], ˜[·] the estimate of the parameter within
‾[·], ¬[·] the complement of the set or fuzzy set within
[·] the normalized form or unit direction of the vector within
[·]† the pseudoinverse of the matrix within
[·]* the conjugate of the matrix within; Also the fixed point or optimum point of the variable within
[·]^H the Hermitian transpose of the matrix within
[·]^T the matrix transpose of the matrix within
[·]_ϕ0 the operator that finds in the interval (−π, π] the quantization of the variable, which can be a discrete argument mϕ0
[·]_max the maximal value of the quantity within
[·]_min the minimal value of the quantity within
[·] ◦ [·] the max-min composition of the two fuzzy sets within
[·] [·] the min-max composition of the two fuzzy sets within
∂[·]/∂x the partial derivative of the quantity within with respect to x
∧ the logic AND operator; Also the intersection operator; Also the t-norm operator; Also the minimum operator
1 a vector or matrix with all its entries being unity
α the momentum factor in the BP algorithm; Also an annealing
schedule parameter in the SA; Also a design parameter in themountain and subtractive clustering; Also the parameter definingthe size of neighborhood; Also a scaling factor; Also a pheromonedecay parameter in the ant system; Also the inertia weight in thePSO
α the diagonal damping coefficient matrix with the (i, i)th entry
α i; Also an eigenvector of the kernel matrix K
α i the damping coefficient of the ith neuron in the Hopfield model;
Also the ith entry of the eigenvector of the kernel matrix K; Also the quantization of the phase of net iin the complex Hopfield-like
network; Also the Lagrange multiplier for the ith example in the
SVM
α i the ith eigenvector of the kernel matrix K
α i,j the jth entry of α i
α ik a coefficient used in the GSO procedure
α (m) ij the momentum factor corresponding to w (m) ij
αmax the upper bound for α (m) ij in the Quickprop
β the gain of the sigmoidal function; Also a positive parameter that
determines the relative importance of pheromone versus distance
in the ant system; Also a deterministic annealing scale estimator;Also a scaling factor in the chaotic neural network; Also a designparameter in the mountain and subtractive clustering; Also thescale estimator, known as the cutoff parameter, in a loss function
β the variable vector containing the Lagrange multipliers for all the
examples in the SVR
β(t) a step size to decide d(t + 1) in the CG method
β1, β2 time-varying error cutoff points of Hampel’s tanh estimator
β i the phase of x iin the complex Hopfield-like network; Also a shape
parameter associated with the ith dimension of the RBF; Also the weighting factor of the constraint term for the ith cluster; Also an annealing scaling factor for the learning rate of the ith neuron in the ALA algorithm; Also the ith entry of β
β i (m) the gain of the sigmoidal function of the ith neuron at the mth
layer; Also a scaling factor for the weight vector to the ith neuron
at the mth layer, w (m) i
δ a small positive constant; Also a threshold for detecting noise and
outliers in the noise clustering method; Also a global step size inthe CMA-ES
δ(H) the defining length of a schema H
δ i the approximation accuracy at the ith phase of the successive
approximative BP
δ i (t) an exponentially weighted estimate of the ith eigenvalue in the
PASTd
δ ij the Kronecker delta
δ (m) p,v the delta function of the vth neuron in the mth layer for the pth
pattern
δ t the radius of the trust region at the tth step
δσ i a parameter for mutating σ iin the ES
δy p (t) the training error vector for the pth example at the tth phase
∆[·] the change in the variable within
∆(t, y) a function with domain [0, y] whose probability of being close to
0 increases as t increases
∆(m) ij (t) a parameter associated with w ij (m)in the RProp
∆max the upper bound on ∆(m) ij (t)
∆min the lower bound on ∆(m) ij (t)
∆p[·] the change in the variable within due to the pth example (∆E) i the saliency of the ith weight
the measurement noise; Also a perturbation related to W; Also
an error vector, whose ith entry corresponds to the L2-norm of
the approximation error of the ith example
i the error vector as a nonlinear extension to ei; Also the encoded
complex memory state of xi; Also a perturbation vector for
split-ting the ith RBF prototype
i (t) an instantaneous representation error vector for nonlinear PCA
i,j the jth entry of i
ε the decaying coefficient at each weight change in the weight-decaying
technique; Also a positive constant in the delta-bar-delta; Also a
threshold parameter in the mountain and subtractive clustering
ε, ε the two thresholds used in the mountain and subtractive clustering
ε0, ε1 predefined small positive numbers
εmax the largest scale of the threshold ε(t) in the RAN
εmin the smallest scale of the threshold ε(t) in the RAN
φ( ·) the activation function; Also the nonlinearity introduced in the
nonlinear PCA and ICA
φ̇(·) the first-order derivative of φ(·)
φ −1(·) the inverse of φ( ·)
φ (m) a vector comprising all the activation functions in the mth layer
φ1(·), φ2(·), φ3(·) nonlinear functions introduced in the ICA
φ i the azimuth angle of the ith point source in the space; Also the
ith RBF in the RBFN;
φ i,j the jth entry of φ i
φ (m) i the activation function at the ith node of the mth layer, the ith
entry ofφ (m)
φ l the lth row of Φ
φ i(x) the normalized form of the ith RBF node, over all the examples
and all the nodes
φ i the vector comprising all φ i(xp ), p = 1, · · · , N
φ µ (net) an activation function defined according to φ(net)
φI(·) the imaginary part of a complex activation function
φR(·) the real part of a complex activation function
⟨φ i (x), φ j(x)⟩ the inner product of the two RBFs
Φ a nonlinear mapping between the input and the output of the
examples
Φ the response matrix of the hidden layer of the RBFN
γ a proportional factor; Also a constant in the LM method; Also a
bias parameter in the chaotic neural network
γ G (f ; c, σ) the Gaussian spectrum of the known function
η the learning rate or step size
η0 the initial learning rate
η0−, η0+ two learning parameters in the RProp or the SuperSAB
ηbatch the learning rate for batch learning
η ij (m) the learning rate corresponding to w ij (m)
ηinc the learning rate for incremental learning
η k the learning rate for the kth prototype
ηr the learning rate for the rival prototype
ηw the learning rate for the winning prototype
η β the learning rate for adapting the gain of the sigmoidal function
ϕ a nonlinear mapping from R J1 to R J2
ϕ (x i) the phase of the ith array element
ϕ0 the Lth root of unity; Also the resolution of the phase quantizer
ϕ1(·) a very robust estimate of the influence function in the τ -estimator
ϕ2(·) a highly efficient estimate of the influence function in the τ -estimator
ϑ a positive constant taking value between 0 and 1; Also a variable
used in the OOja
κ (x i , x j) the kernel function defined for kernel methods
κ (y i) the kurtosis of signal y i
λ the number of offspring generated from the population in the ES
λ(t) the exact step size to the local minimum of E along the direction
of d(t)
λc the regularization parameter for Ec
λ i the wavelength of the radiation from the ith source; Also the ith
eigenvalue of the Hessian H; Also the ith eigenvalue of the correlation matrix C; Also the ith eigenvalue of the kernel matrix
auto-K; Also the ith generalized eigenvalue in the GEVD problem
λ i the prototype of a hyperspherical shell
λ̃ i the ith principal eigenvalue of Cs
λEVDi the ith eigenvalue of C, calculated by the EVD method
λmax the largest eigenvalue of the Hessian matrix of the error function;
Also the largest eigenvalues of C
λo the regularization parameter for Eo
Λ the diagonal matrix with all the eigenvalues of C as its diagonal
entries, Λ = diag (λ1, · · · , λ J2)
µ the mean of the data set{x i }; Also the membership degree of a
fuzzy set; Also a positive number; Also the forgetting factor inthe RLS method; Also the population size in the ES
µ i the degree of activation of the ith rule
µ(1)i an MF of the premise part of the NEFPROX
µ(2)i an MF of the consequence part of the NEFPROX
µ j the mean of all the data in class j
µ A_i^j the association between the jth input of A and the ith rule
µ B_i^k the association between the kth input of B and the ith rule
µ A (x) the membership degree of x to the fuzzy set A
µ A [α] the α-cut of the fuzzy set A
µ A (x) the membership degree of x to the fuzzy setA
µ A i(x) the membership degree of x to the fuzzy setA i
µ B i(y) the membership degree of y to the fuzzy setB i
µ kp the connection weight assigned to prototype ck with respect to
xp , denoting the membership of pattern p into cluster k
µ R (x, y) the degree of membership for association between x and y
θ the bias or threshold at a single neuron
θ the bias vector; Also a vector of parameters to estimate
θ i the threshold for the ith neuron; Also the elevation angle of the
ith point source in the space; Also the angle between w iand ci
in PCA
θ (m) the bias vector at the mth layer
ρ the condition number of a matrix; Also a vigilance parameter
in ART models; Also a small positive constant, representing thepower of the repeller in the global-descent method; Also a smallpositive tolerance
ρ+, ρ − two positive constants in the bold driver technique
ρ0 a positive constant; Also the initial neighborhood parameter for
the NG
ρf the final neighborhood parameter for the NG
ρ (j) i the scaling factor for the weights connected to the ith neuron at
the jth layer
ρ + ij the (i, j)th entry of the correlation matrix of the state vector x
in the clamped condition
ρ − ij the (i, j)th entry of the correlation matrix of the state vector x
in the free-running condition
ρ t the ratio of the actual reduction in error to the predicted
reduc-tion in error
σ the variance parameter; Also the width of the Gaussian RBF;
Also a shifting parameter in the global-descent method
σ the strategy parameters in the ES, the vector containing all
stan-dard deviations σ i in the ES
σ(t) a small positive value in the LM method, used to indirectly
con-trol the size of the trust region
σ (c i) the standard deviation of cluster i
σ( X ) the standard deviation of datasetX
σ − , σ+ the lower and upper thresholds of σ iin the DDA algorithm
σ − , σ+ the left and right standard deviations used in the
pseudo-Gaussian function
σ i the standard deviation, width or radius of the ith Gaussian RBF;
Also the ith singular value of C xy ; Also the ith singular value of
X
σ i the vector containing all the diagonal entries of Σi
σ i a quantity obtained by mutating σ iin the ES
σ k the variance vector of the kth cluster
σ k,i the ith entry of σ k
σmax the maximum singular values of a matrix
σmin the minimum singular values of a matrix
Σ the covariance matrix for the Gaussian function; Also the
singular value matrix arising from the SVD of X
Σi the covariance matrix for the ith Gaussian RBF
ΣJ2 the singular value matrix with the J2 principal singular values of
X as diagonal entries
τ the size of neighborhood used for estimating the step size of the
line search; Also a decay constant in the RAN
τ the circuit time constant matrix, which is a diagonal matrix with
the (i, i)th entry being τ i
τ i the circuit time constant of the ith neuron in the Hopfield network
or an RNN
τ i,j the intensity of the pheromone on edge i → j
τ i,j k (t) the intensity of the pheromone on edge i → j contributed by ant
k at generation t
τ l (φ i , θ i) the time delay on the lth array element for the ith source
ς j the jth value obtained by dividing the interval of the output y
Ω(E) a non-negative continuous function of E
ξ the vector containing all the slack variables ξ i
ξ i a zero-mean Gaussian white noise process for the regression of
the ith RBFN weight; Also a slack variable
ψ( ·) a repulsive potential function
ψ ij(·) a continuous function of one variable
ζ i a slack variable used in the SVR
a the radix of the activation function in the ECAM; Also a scaling
factor; Also a shape parameter of the fuzzy MF
a the coefficient vector in the objective function of the LP problem;
Also the output vector of the left part of the crosscorrelationAPCA network
a1, a2 real parameters
a i a shape parameter associated with the ith dimension of the RBF;
Also the ith entry of a
ai the ith column of the mixing matrix A; Also the regression
pa-rameter vector for the ith RBF weight
a ij a constant integer in a class of COPs
a i,j the adjustable parameter corresponding to the ith rule and the
jth input; Also the jth entry of a i
a j i,k the adjustable parameter corresponding to the kth input, ith rule,
and jth output
a i j a premise parameter in the ANFIS model, corresponding to the
ith input and jth rule
a p a variable defined for the pth eigenvalue in the APCA network
A the grid of neurons in the Kohonen network
A a general matrix; Also a matrix defined in the LMSER; Also the
mixing matrix in the ICA data model
A( ·) a nonlinear function in the NOVEL method
A an input fuzzy set of an FIS
A j an input fuzzy set of an FIS
A i a fuzzy set obtained by fuzzifying x i ; Also the ith partition of the
fuzzy set A
A1, A2 two algorithms; Also two weighting parameters in the cost
func-tion of the COP
Ai the transformed form of A for extracting the ith principal
sin-gular component of A; Also a decorrelating matrix used in the
LEAP
A i the fuzzy set associated with the antecedent part of the ith rule;
Also a fuzzy set corresponding to the antecedent part of the ith
fuzzy rule
A_i^j the fuzzy subset associated with the ith fuzzy rule and the jth
input
A × B the Cartesian product of fuzzy setsA and B
b a shape parameter for an activation function or a fuzzy MF; Also
a positive number used in the nonuniform mutation
b1 a shape parameter for a nonmonotonic activation function
b a vector; Also the output vector of the right part of the APCA
network
b i the ith entry of b; Also a numerical value associated with the
consequent of the ith rule, B i ; Also the ith binary code bit in a
binary code; Also a constant integer in a class of COPs
b̃ ij the (i, j)th entry of B̃
b ij the (i, j)th entry of B
b i j a premise parameter in the ANFIS model, corresponding to the
ith input and jth rule
b p a variable defined for the pth eigenvalue in the APCA network
B the size of a block in a pattern; Also a weighting parameter in
the cost function of the COP
B the rotation matrix in the mutation operator in the CMA-ES
B a matrix obtained during the ROLS procedure at the tth
itera-tion; Also a matrix obtained during the batch OLS procedure
B a matrix obtained during the batch OLS procedure
B( ·) a nonlinear function in the NOVEL method
B(t) a variable used in the OSS method
B(t) a matrix obtained during the ROLS procedure at the tth iteration
B (k) the fuzzy set corresponding to the consequent part of the rule R (k)
Bi a matrix defined for the ith neuron in the LMSER; Also a
decor-relating matrix used in the LEAP
B i a fuzzy set corresponding to the consequent part of the ith fuzzy rule
c speed of light; Also a center parameter for an activation function;
Also a shape parameter of the fuzzy MF; Also the accelerationconstant in the PSO, positive
c( ·) the center parameter of an MF of a fuzzy variable
c(t) a coefficient of self-coupling in the chaotic neural network
cin the center of the input space
c1 the cognitive parameter in the PSO, positive constant; Also a
shape parameter in the π-shaped MF; Also a real constant
c2 the social parameter in the PSO, positive constant; Also a shape
parameter in the π-shaped MF; Also a real constant
ci the eigenvectors of C corresponding to eigenvalue λ i; Also the
prototype of the ith cluster in VQ; Also the ith prototypes in the RBFN; Also the feedback weights from the F2 neuron i to all
input nodes in the ART model
cin
˜i the ith principal eigenvectors of the skewed autocorrelation
ma-trix Cs
c i,j its jth entry of c i;
c ij the connectivity from nodes i to j; Also a constant integer
coef-ficient in a class of COPs; Also the (i, j)th entry of C
c i j a premise parameter in the ANFIS model, corresponding to the
ith input and jth rule
cx,j the input part of the augmented cluster center cj in supervised
clustering
cy,j the output part of the augmented cluster center cj in supervised
clustering
co(·) the core of the fuzzy set within
csign(u) the multivalued complex-signum activation function
C a weighting parameter in the cost function of the COP; Also the
product of the gradients at time t and time t + 1 in the RProp;
Also the number of classes; Also a prespecified constant thattrades off wide margin with a small number of margin failures
in the SVM
C the autocorrelation of a set of vectors{x}; Also a transform
ma-trix in the kernel orthogonalization-based RBFN weight learning
C all concepts in a class, C = {C n }; Also the complex plane
C(t) a variable used in the OSS method
C(U) the set of all continuous real-valued functions on a compact
do-mainU
C (W, W ∗) the criterion function used in the global-descent method
C ∗ (x, y) the t-conorm using the drastic union
C J the J -dimensional complex space
C1 the autocorrelation matrix in the feature space
Cb(x, y) the t-conorm using the bounded sum
C i the capacitance associated with neuron i in the Hopfield network
Ci a matrix defined for the ith neuron in the LMSER
Cm(x, y) the t-conorm using the standard union
C n a set of target concepts over the instance space {0, 1} n , n ≥ 1;
Also the set of input vectors represented by c n according to the
nearest-neighbor paradigm, namely the nth cluster; Also a fuzzy
set corresponding to the condition part of the nth fuzzy rule
Cp(x, y) the t-conorm using the algebraic sum
Cs the skewed autocorrelation matrix
Cxy the crosscorrelation matrix of two sets of random vectors {x t }
d(t) the update step of the weight vector − →w; Also the descent direction
approximating Newton’s direction
d ( C1, C2) the distance between clusters C1 andC2
d (x1, x2) the distance between data points x1 and x2
d0 the steering vector in the desired direction
dBCS(ck , c l) the between-cluster separation for cluster k and cluster l
dH(·, ·) the Hamming distance between the two binary vectors within
di the steering vector associate with the ith source; Also the
coeffi-cient vector of the ith inequality constraint in LP problems
d i,j the jth entry of d i ; Also the distance between nodes i and j in
the ant system; Also the (i, j)th entry of D; Also the distance
between the ith pattern and the jth prototype
di,j the distance vector between pattern xiand prototypeλ jin
spher-ical shell clustering
dWCS(ck) the within-cluster scatter for cluster k
dmax the maximum distance between the selected RBF centers
dmin the shortest of the distances between the new cluster center and
all the existing cluster centers in the mountain and subtractive clustering
defuzz(·) the defuzzification function of the fuzzy set within
det(·) the determinant of the matrix within
dimBVC(N ) the Boolean VC dimension of the class of functions or the neural
network within
dimVC(·) the VC dimension of the class of functions or the neural network
D (m) j the degree of saturation of the jth neuron at the mth layer
Din the maximum possible distance between two points of the input
Dg an approximation to D at the constraint plane of the network in
the nonideal case
D(s) the decoding transformation in the GA
D (R p) a degree of fulfillment of the rule associated with the pth example
ei the error vector between the network output and the desired
out-put for the ith example
e(t) the instantaneous representation error vector for the tth input
for PCA
e i the average of e p,iover all the patterns
ei (t) the instantaneous representation error vector associated with the
ith output node for the tth input in robust PCA
emax the maximum error at the output nodes for a given pattern
e i,j the jth entry of e i
err the training error for a model
E an objective function for optimization such as the MSE between
the actual network output and the desired output
E a matrix whose columns are the eigenvectors of the covariance
matrix C
E[·] the expectation operator
E ∗ the optimal value of the cost function
E0 an objective function used in the SVM
E1, E2 two objective functions used in the SVR
E3 an objective function used in the SVC
Ec the constraint term in the cost function
Ecoupled the information criterion for coupled PCA/MCA
Eo the objective term in the cost function
E p the error contribution due to the pth pattern
EAFD the average fuzzy density criterion function
EAIC the AIC criterion function
EAPCA the objective function for the APCA network
EBIC the BIC criterion function
ECMP, ECMP1 two cluster compactness measures
ECPCA the criterion function for the CPCA problem
ECPCA∗ the minimum of ECPCA
ECSA the extra energy term in the CSA
ECV the crossvalidation criterion function
EDoA the error function defined for the DoA problem
E DoA,l the error function defined for the DoA problem, corresponding to
the lth snapshot
EFHV the fuzzy hypervolume criterion function
EGEVD the criterion function for GEVD
EHebb the instantaneous criterion function for Hebbian learning
E LDA,1 , E LDA,2 , E LDA,3 three criterion functions for LDA
EMDL the total description length
ENIC∗ the global maximum of ENIC
EOCQ the overall cluster quality measure
EPCA the criterion function for PCA
ESEP, ESEP1 two cluster separation measures
ESLA the criterion function for the SLA
ESVC the objective function for the SVC
ESVM the objective function for the SVM classification
ESVR the objective function for the SVR
ET the total optimization objective function comprising the objective
and regularization terms
ETHK the average shell thickness criterion function
ETj the individual objective function corresponding to the jth cluster
in the PCM
EWBR the ratio of the sum of the within-cluster scatters to the
between-cluster separation
E α , E β the energy levels of a physical system in states α and β
Err the generalization error on the new data
ER the cost function of robust learning
ERR k the ERR due to the kth RBF neuron
E S the expectation operation over all possible training sets
f (·) the fitness function in EAs
f (·) the operator to perform the function of the MLP, used in the
EKF
f (µ ji) the fuzzy complement of µ ji
f (H) the average fitness of all strings in the population matched by
the schema H
f (t) the average fitness of the whole population at time t
f (x) the vector containing multiple functions as entries
ḟ (x) the first-order derivative of f (x), that is, df (x)/dx
f (x) the objective function obtained by the penalty function method
f(z, t) a vector with functions as entries
f : X → Y a mapping from fuzzy setsX onto Y
fc the carrier frequency
f i(·) the output function in the TSK model for the ith rule; Also the
nonlinear relation characterized by the ith TSK system of the
hierarchical fuzzy system
f i(x) the ith entry of the function vector f (x)
fi(x) the crisp vector function of x, related to the ith rule and the
output of the TSK model
f i j(x) the jth entry of f i (x), related to the jth output component of
the TSK model
fp(x) the penalty term characterizing the constraints
fuzz(·) a fuzzification operator
F the set of all functions; Also a set of real continuous functions
F (·) the fairness function in the FSCL; Also the CDF of the random
variable within
F (x) the weighted objective of all the entries of f (x)
Fi the fuzzy covariance matrix of the ith cluster
g(t) the gradient vector of E with respect to − →w(t)
g(m) the gradient vector of E with respect to − →w(m)
g ij (m) (t) the gradient of E with respect to w ij (m) (t)
g (m) ij (t) a gradient term decided by g (m) ij (t)
gτ (t) the gradient vector of E (− →w(t) + τ d(t)) with respect to − →w
G ij the conductance of the jth resistor of neuron i in the Hopfield
network
h a hypothesis in the PAC theory; the tournament size in the GA
h(·) a function defined as the square root of the loss function σ(·)
h j a constant term in the jth linear inequality in the COP and LP
problems
h j(·) a continuous function of one variable
h kw (t) the neighborhood function, defining the response of neuron k when cw is the excitation center
hgt(·) the height of the fuzzy set within
H a schema of length l, defined over the three-letter alphabet
{0, 1, ∗}
H(y) the joint entropy of all the entries of y
H (y i) the marginal entropy of component i
Hb the block diagonal Hessian for the MLP
H(m)b the (m, m)th diagonal partition matrix of Hb, corresponding to − →w(m)
H i a set of hypotheses over the instance space {0, 1} i , i ≥ 1
H ij the (i, j)th entry of H
HBFGS the Hessian obtained by the BFGS method
HDFP the Hessian obtained by the DFP method
HGN the Hessian matrix obtained by the Gauss–Newton method
HLM the Hessian matrix obtained by the LM method
i → j an edge from node i to node j
I the intersection of fuzzy sets A and B
I(i) a running length for summation for extracting the ith PC
I(x; y) the mutual information between signal vectors x and y
I(y) the mutual information between the components of vector y in
the ICA
I i the external bias current source for neuron i in the Hopfield network
Ik the identity matrix of size k × k
Im(·) the operator taking the imaginary part of a complex number
j √−1, the imaginary unit
J the dimensionality of the input data
J(− →w) the Jacobian matrix
J i the number of nodes in the ith layer
J ij the (i, j)th entry of the Jacobian matrix J
J k i the set of nodes that remain to be visited by ant k positioned at node i
k an index for iteration; Also a scaling factor that controls the
variance of the Gaussian machine
k a vector in the set of all the index vectors of a fuzzy rule base, k = (k1, · · · , kn)T
k i the index of the partitioned fuzzy subsets of the interval of x i
k p i the index of the partitioned fuzzy subsets of the interval of x i,
corresponding to the pth pattern
K the number of clusters; Also the number of prototypes in the
RBFN; Also the number of rules in the ANFIS model
K the Kalman gain matrix; Also a kernel matrix
K the index set of a fuzzy rule base
K ij the (i, j)th entry of the kernel matrix K
l p the index of the partitioned fuzzy subsets of the interval of the
output, y, corresponding to the pth pattern
l i the bit-length of the gene x i in the chromosome
L an integer used as the quantization step for phase quantization; Also the number of array elements; Also a constant parameter in the ART 1 model
L the undesirable subspace in the CPCA problem
L J2 a J2-dimensional subspace that is constrained to be orthogonal to L
L(t) the Lipschitz constant at time t
L p (R p , dx) the L p space, where (R p , dx) is a measure space and p a positive
number
L(W(D i)|D i) the likelihood evaluated on the data set D i
L k the length of the tour performed by ant k
L N(WN) the likelihood estimated for a training set of size N and the model
parameters WN
LT[·] the operator extracting the lower triangle of the matrix contained
within
m an index for iteration; Also an integer; Also the fuzzifier
m(H, t) the number of examples of a particular schema H within a
population at time t
m i the number of fuzzy subsets obtained by partitioning the interval
of x i
m i (t) the complex modulating function for the ith source
m y the number of fuzzy subsets obtained by partitioning the interval
of y
max (x1, x2) the operation that gives a vector with each entry obtained by
taking the maximum of the corresponding entries of x1 and x2
min (x1, x2) the operation that gives a vector with each entry obtained by
taking the minimum of the corresponding entries of x1 and x2
M the number of signal sources; Also the number of layers of FNNs;
Also the effective size of a time window
n(t) an unbiased noise term at a particular instant
n (j) i the number of weights to the ith unit at the jth layer
n y the time window of a time series
net the net input to a single neuron
net the net input vector to the SLP
net i the ith entry of net
net(m) p the net input to the mth layer for the pth pattern
net p,j the net input of the jth neuron for the pth sample
net (m) p,v the net input to the vth neuron of the mth layer for the pth
pattern
N (0, σ) a random number drawn from a normal distribution with zero
mean and standard deviation σ i; Also a normal distribution with
zero mean and standard deviation σ i
NPAC the sample complexity of a learning algorithm
N , N k neural networks
N1 a neural network whose hidden neurons are LTGs
N2 a neural network whose hidden neurons are binary RBF neurons
N3 a neural network whose hidden neurons are generalized binary RBF neurons
N i the number of samples in the ith cluster or class
Nmax the storage capability of an associative memory network; Also
the maximum number of fundamental memories
Nn the number of nodes in a network
NOT [·] the complement of the set or fuzzy set within
Nphase the number of training phases in the successive approximative
BP learning
NP the size of the population; the number of ants in the ant system
Nw the total number of weights (free parameters) of a network
Nw(m) the total number of weights (free parameters) at the mth layer
of a network
o(H) the order of a schema H, namely the number of fixed positions
(the number of 0s or 1s) present in the template
o (m) i the output of the ith node in the mth layer
o(m) p the output vector at the mth layer for the pth pattern
o (m) p,i the ith entry of o (m) p
O a J-dimensional hypercube, {−1, 1} J
O(·) of the order of the parameter within
OP the degree of optimism inherent in a particular estimate
p the index for iteration; Also the density of fundamental memories
p(x) the marginal PDF of the vector variable x ∈ R n
p(x, y) the joint PDF of x and y
p(y) the joint PDF of all the elements of y
p1, p2 two points
p i (y i) the marginal PDF of y i
P the number of hypothesis parameters in a model; Also the
probability of a state change
P the conditional error covariance matrix; Also the mean output
power matrix of the beamformer
P(0) the initial value of the conditional error covariance matrix P
P (i) the potential measure for the ith data point, x i
P (k) the potential measure for the kth cluster center, c k
P(t) the population at generation t in an EA
P (x) a data distribution
P α , P β the probabilities of a physical system being in states α and β
Pc the probability of recombination
P i the probability of state change of the ith neuron in the Boltzmann
machine; Also the selection probability of the ith chromosome in
a population
P i (t) the inverse of the covariance of the output of the ith neuron
σmax the maximum singular value of a matrix
σmin the minimum singular value of a matrix
Σ the covariance matrix for the Gaussian function; Also...
Also the output vector of the left part of the crosscorrelation APCA network
a1, a2 real parameters
a i a shape parameter