Search and Optimization by Metaheuristics
ISBN 978-3-319-41191-0 ISBN 978-3-319-41192-7 (eBook)
DOI 10.1007/978-3-319-41192-7
Library of Congress Control Number: 2016943857
Mathematics Subject Classification (2010): 49-04, 68T20, 68W15
© Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
This book is published under the trade name Birkhäuser.
The registered company is Springer International Publishing AG, Switzerland
(www.birkhauser-science.com)
To My Friends Jiabin Lu and Biaobiao Zhang
Ke-Lin Du
and
To My Parents
M.N.S. Swamy
Optimization is a branch of applied mathematics and numerical analysis. Almost every problem in engineering, science, economics, and life can be formulated as an optimization or a search problem. While some of these problems are simple enough to be solved by traditional optimization methods based on mathematical analysis, most of them are very hard to solve using analysis-based approaches. Fortunately, we can solve these hard optimization problems by drawing inspiration from nature, since we know that nature is a system of vast complexity and it always generates a near-optimum solution.
Natural computing is concerned with computing inspired by nature, as well as with computations taking place in nature. Well-known examples of natural computing are evolutionary computation, neural computation, cellular automata, swarm intelligence, molecular computing, quantum computation, artificial immune systems, and membrane computing. Together, they constitute the discipline of computational intelligence.
Among all the nature-inspired computational paradigms, evolutionary computation is the most influential. It is a computational method for obtaining the best possible solutions in a huge solution space based on Darwin's survival-of-the-fittest principle. Evolutionary algorithms are a class of effective global optimization techniques for many hard problems.
More and more biologically inspired methods have been proposed in the past two decades. The most prominent ones are particle swarm optimization, ant colony optimization, and the immune algorithm. These methods are widely used due to their particular features compared with evolutionary computation. All these biologically inspired methods are population-based. Computation is performed by autonomous agents, and these agents exchange information through social behaviors. The memetic algorithm models the behavior of knowledge propagation among animals.
There are also many other nature-inspired metaheuristics for search and optimization. These include methods inspired by physical laws, chemical reactions, biological phenomena, social behaviors, and animal thinking.
Metaheuristics are a class of intelligent self-learning algorithms for finding near-optimum solutions to hard optimization problems, mimicking intelligent processes and behaviors observed in nature, sociology, thinking, and other disciplines. Metaheuristics may be nature-inspired paradigms, stochastic, or probabilistic algorithms. Metaheuristics-based search and optimization are widely used for fully automated decision-making and problem-solving.
In this book, we provide a comprehensive introduction to nature-inspired metaheuristic methods for search and optimization. While each metaheuristics-based method has its specific strengths for particular cases, according to the no free lunch theorem it actually has the same performance as random search when considered over the entire set of search and optimization problems. Thus, when talking about the performance of an optimization method, the claim is actually based on benchmarking examples that are representative of some particular class of problems.
This book is intended as an accessible introduction to metaheuristic optimization for a broad audience. It provides an understanding of some fundamental insights on metaheuristic optimization, and serves as a helpful starting point for those interested in more in-depth studies of metaheuristic optimization. The computational paradigms described in this book are of general purpose in nature. This book can be used as a textbook for advanced undergraduate students and graduate students. All those interested in search and optimization can benefit from this book. Readers interested in a particular topic will benefit from the appropriate chapter.

A roadmap for navigating through the book is given as follows. Except for the introductory Chapter 1, the contents of the book can be roughly divided into five categories and an appendix.
• Evolution-based approach is covered in Chapters 3–8:
Chapter 3 Genetic Algorithms
Chapter 4 Genetic Programming
Chapter 5 Evolutionary Strategies
Chapter 6 Differential Evolution
Chapter 7 Estimation of Distribution Algorithms
Chapter 8 Topics in Evolutionary Algorithms
• Swarm intelligence-based approach is covered in Chapters 9–15:
Chapter 9 Particle Swarm Optimization
Chapter 10 Artificial Immune Systems
Chapter 11 Ant Colony Optimization
Chapter 12 Bee Metaheuristics
Chapter 13 Bacterial Foraging Algorithm
Chapter 14 Harmony Search
Chapter 15 Swarm Intelligence
• Sciences-based approach is covered in Chapters 2, 16–18:
Chapter 2 Simulated Annealing
Chapter 16 Biomolecular Computing
Chapter 17 Quantum Computing
Chapter 18 Metaheuristics Based on Sciences
• Human-based approach is covered in Chapters 19–21:
Chapter 19 Memetic Algorithms
Chapter 20 Tabu Search and Scatter Search
Chapter 21 Search Based on Human Behaviors
• General optimization problems are treated in Chapters 22–23:
Chapter 22 Dynamic, Multimodal, and Constrained Optimizations
Chapter 23 Multiobjective Optimization
• The appendix contains auxiliary benchmarks helpful for testing new and existing algorithms.
In this book, hundreds of different metaheuristic methods are introduced. However, due to space limitations, we give detailed descriptions only for a large number of the most popular metaheuristic methods. Some computational examples for representative metaheuristic methods are given. The MATLAB codes for these examples are available at the book website. We have also collected MATLAB codes for some other metaheuristics. These codes are of general purpose in nature. Readers need only run these codes with their own objective functions.

For instructors, this book has been designed to serve as a textbook for courses on evolutionary algorithms or nature-inspired optimization. The book can be taught in 12 two-hour sessions. We recommend that Chapters 1–11, 19, 22, and 23 be taught. In order to acquire a mastery of these popular metaheuristic algorithms, some programming exercises using the benchmark functions given in the appendix should be assigned to the students. The MATLAB codes provided with the book are useful for learning the algorithms.
For readers, we suggest that you start with Chapter 1, which covers basic concepts in optimization and metaheuristics. When you have digested the basics, you can delve into one or more specific metaheuristic paradigms that you are interested in or that suit your specific problems. The MATLAB codes accompanying the book are very useful for learning those popular algorithms, and they can be directly used for solving your specific problems. The benchmark functions are also very useful for researchers evaluating their own algorithms.
We would like to thank Limin Meng (Zhejiang University of Technology, China) and Yongyao Yang (SUPCON Group Inc., China) for their consistent help. We would like to thank all the helpful and thoughtful staff at Xonlink Inc. Last but not least, we would like to recognize the assistance of Benjamin Levitt and the production team at Springer.
1 Introduction
1.1 Computation Inspired by Nature
1.2 Biological Processes
1.3 Evolution Versus Learning
1.4 Swarm Intelligence
1.4.1 Group Behaviors
1.4.2 Foraging Theory
1.5 Heuristics, Metaheuristics, and Hyper-Heuristics
1.6 Optimization
1.6.1 Lagrange Multiplier Method
1.6.2 Direction-Based Search and Simplex Search
1.6.3 Discrete Optimization Problems
1.6.4 P, NP, NP-Hard, and NP-Complete
1.6.5 Multiobjective Optimization Problem
1.6.6 Robust Optimization
1.7 Performance Indicators
1.8 No Free Lunch Theorem
1.9 Outline of the Book
References
2 Simulated Annealing
2.1 Introduction
2.2 Basic Simulated Annealing
2.3 Variants of Simulated Annealing
References
3 Genetic Algorithms
3.1 Introduction to Evolutionary Computation
3.1.1 Evolutionary Algorithms Versus Simulated Annealing
3.2 Terminologies of Evolutionary Computation
3.3 Encoding/Decoding
3.4 Selection/Reproduction
3.5 Crossover
3.6 Mutation
3.7 Noncanonical Genetic Operators
3.8 Exploitation Versus Exploration
3.9 Two-Dimensional Genetic Algorithms
3.10 Real-Coded Genetic Algorithms
3.11 Genetic Algorithms for Sequence Optimization
References
4 Genetic Programming
4.1 Introduction
4.2 Syntax Trees
4.3 Causes of Bloat
4.4 Bloat Control
4.4.1 Limiting on Program Size
4.4.2 Penalizing the Fitness of an Individual with Large Size
4.4.3 Designing Genetic Operators
4.5 Gene Expression Programming
References
5 Evolutionary Strategies
5.1 Introduction
5.2 Basic Algorithm
5.3 Evolutionary Gradient Search and Gradient Evolution
5.4 CMA Evolutionary Strategies
References
6 Differential Evolution
6.1 Introduction
6.2 DE Algorithm
6.3 Variants of DE
6.4 Binary DE Algorithms
6.5 Theoretical Analysis on DE
References
7 Estimation of Distribution Algorithms
7.1 Introduction
7.2 EDA Flowchart
7.3 Population-Based Incremental Learning
7.4 Compact Genetic Algorithms
7.5 Bayesian Optimization Algorithm
7.6 Convergence Properties
7.7 Other EDAs
7.7.1 Probabilistic Model Building GP
References
8 Topics in Evolutionary Algorithms
8.1 Convergence of Evolutionary Algorithms
8.1.1 Schema Theorem and Building-Block Hypothesis
8.1.2 Finite and Infinite Population Models
8.2 Random Problems and Deceptive Functions
8.3 Parallel Evolutionary Algorithms
8.3.1 Master–Slave Model
8.3.2 Island Model
8.3.3 Cellular EAs
8.3.4 Cooperative Coevolution
8.3.5 Cloud Computing
8.3.6 GPU Computing
8.4 Coevolution
8.4.1 Coevolutionary Approaches
8.4.2 Coevolutionary Approach for Minimax Optimization
8.5 Interactive Evolutionary Computation
8.6 Fitness Approximation
8.7 Other Heredity-Based Algorithms
8.8 Application: Optimizing Neural Networks
References
9 Particle Swarm Optimization
9.1 Introduction
9.2 Basic PSO Algorithms
9.2.1 Bare-Bones PSO
9.2.2 PSO Variants Using Gaussian or Cauchy Distribution
9.2.3 Stability Analysis of PSO
9.3 PSO Variants Using Different Neighborhood Topologies
9.4 Other PSO Variants
9.5 PSO and EAs: Hybridization
9.6 Discrete PSO
9.7 Multi-swarm PSOs
References
10 Artificial Immune Systems
10.1 Introduction
10.2 Immunological Theories
10.3 Immune Algorithms
10.3.1 Clonal Selection Algorithm
10.3.2 Artificial Immune Network
10.3.3 Negative Selection Algorithm
10.3.4 Dendritic Cell Algorithm
References
11 Ant Colony Optimization
11.1 Introduction
11.2 Ant Colony Optimization
11.2.1 Basic ACO Algorithm
11.2.2 ACO for Continuous Optimization
References
12 Bee Metaheuristics
12.1 Introduction
12.2 Artificial Bee Colony Algorithm
12.2.1 Algorithm Flowchart
12.2.2 Modifications on ABC Algorithm
12.2.3 Discrete ABC Algorithms
12.3 Marriage in Honeybees Optimization
12.4 Bee Colony Optimization
12.5 Other Bee Algorithms
12.5.1 Wasp Swarm Optimization
References
13 Bacterial Foraging Algorithm
13.1 Introduction
13.2 Bacterial Foraging Algorithm
13.3 Algorithms Inspired by Molds, Algae, and Tumor Cells
References
14 Harmony Search
14.1 Introduction
14.2 Harmony Search Algorithm
14.3 Variants of Harmony Search
14.4 Melody Search
References
15 Swarm Intelligence
15.1 Glowworm-Based Optimization
15.1.1 Glowworm Swarm Optimization
15.1.2 Firefly Algorithm
15.2 Group Search Optimization
15.3 Shuffled Frog Leaping
15.4 Collective Animal Search
15.5 Cuckoo Search
15.6 Bat Algorithm
15.7 Swarm Intelligence Inspired by Animal Behaviors
15.7.1 Social Spider Optimization
15.7.2 Fish Swarm Optimization
15.7.3 Krill Herd Algorithm
15.7.4 Cockroach-Based Optimization
15.7.5 Seven-Spot Ladybird Optimization
15.7.6 Monkey-Inspired Optimization
15.7.7 Migrating-Based Algorithms
15.7.8 Other Methods
15.8 Plant-Based Metaheuristics
15.9 Other Swarm Intelligence-Based Metaheuristics
References
16 Biomolecular Computing
16.1 Introduction
16.1.1 Biochemical Networks
16.2 DNA Computing
16.2.1 DNA Data Embedding
16.3 Membrane Computing
16.3.1 Cell-Like P System
16.3.2 Computing by P System
16.3.3 Other P Systems
16.3.4 Membrane-Based Optimization
References
17 Quantum Computing
17.1 Introduction
17.2 Fundamentals
17.2.1 Grover's Search Algorithm
17.3 Hybrid Methods
17.3.1 Quantum-Inspired EAs
17.3.2 Other Quantum-Inspired Hybrid Algorithms
References
18 Metaheuristics Based on Sciences
18.1 Search Based on Newton's Laws
18.2 Search Based on Electromagnetic Laws
18.3 Search Based on Thermal-Energy Principles
18.4 Search Based on Natural Phenomena
18.4.1 Search Based on Water Flows
18.4.2 Search Based on Cosmology
18.4.3 Black Hole-Based Optimization
18.5 Sorting
18.6 Algorithmic Chemistries
18.6.1 Chemical Reaction Optimization
18.7 Biogeography-Based Optimization
18.8 Methods Based on Mathematical Concepts
18.8.1 Opposition-Based Learning
References
19 Memetic Algorithms
19.1 Introduction
19.2 Cultural Algorithms
19.3 Memetic Algorithms
19.3.1 Simplex-Based Memetic Algorithms
19.4 Application: Searching Low Autocorrelation Sequences
References
20 Tabu Search and Scatter Search
20.1 Tabu Search
20.1.1 Iterative Tabu Search
20.2 Scatter Search
20.3 Path Relinking
References
21 Search Based on Human Behaviors
21.1 Seeker Optimization Algorithm
21.2 Teaching–Learning-Based Optimization
21.3 Imperialist Competitive Algorithm
21.4 Several Metaheuristics Inspired by Human Behaviors
References
22 Dynamic, Multimodal, and Constrained Optimizations
22.1 Dynamic Optimization
22.1.1 Memory Scheme
22.1.2 Diversity Maintaining or Reinforcing
22.1.3 Multiple Population Scheme
22.2 Multimodal Optimization
22.2.1 Crowding and Restricted Tournament Selection
22.2.2 Fitness Sharing
22.2.3 Speciation
22.2.4 Clearing, Local Selection, and Demes
22.2.5 Other Methods
22.2.6 Metrics for Multimodal Optimization
22.3 Constrained Optimization
22.3.1 Penalty Function Method
22.3.2 Using Multiobjective Optimization Techniques
References
23 Multiobjective Optimization
23.1 Introduction
23.2 Multiobjective Evolutionary Algorithms
23.2.1 Nondominated Sorting Genetic Algorithm II
23.2.2 Strength Pareto Evolutionary Algorithm 2
23.2.3 Pareto Archived Evolution Strategy (PAES)
23.2.4 Pareto Envelope-Based Selection Algorithm
23.2.5 MOEA Based on Decomposition (MOEA/D)
23.2.6 Several MOEAs
23.2.7 Nondominated Sorting
23.2.8 Multiobjective Optimization Based on Differential Evolution
23.3 Performance Metrics
23.4 Many-Objective Optimization
23.4.1 Challenges in Many-Objective Optimization
23.4.2 Pareto-Based Algorithms
23.4.3 Decomposition-Based Algorithms
23.5 Multiobjective Immune Algorithms
23.6 Multiobjective PSO
23.7 Multiobjective EDAs
23.8 Tabu/Scatter Search Based Multiobjective Optimization
23.9 Other Methods
23.10 Coevolutionary MOEAs
References
Appendix A: Benchmarks
Index
ABC Artificial bee colony
AbYSS Archive-based hybrid scatter search
ACO Ant colony optimization
ADF Automatically defined function
AI Artificial intelligence
aiNet Artificial immune network
AIS Artificial immune system
BBO Biogeography-based optimization
BFA Bacterial foraging algorithm
BMOA Bayesian multiobjective optimization algorithm
CCEA Cooperative coevolutionary algorithm
CLONALG Clonal selection algorithm
CMA Covariance matrix adaptation
C-MOGA Cellular multiobjective GA
COMIT Combining optimizers with mutual information trees algorithm
COP Combinatorial optimization problem
CRO Chemical reaction optimization
CUDA Compute unified device architecture
DE Differential evolution
DEMO DE for multiobjective optimization
DMOPSO Dynamic population multiple-swarm multiobjective PSO
DNA Deoxyribonucleic acid
DOP Dynamic optimization problem
DSMOPSO Dynamic multiple swarms in multiobjective PSO
DT-MEDA Decision-tree-based multiobjective EDA
EA Evolutionary algorithms
EASEA Easy specification of EA
EBNA Estimation of Bayesian networks algorithm
EDA Estimation of distribution algorithm
EGNA Estimation of Gaussian networks algorithm
ELSA Evolutionary local selection algorithm
EPUS-PSO Efficient population utilization strategy for PSO
ES Evolution strategy
FDR-PSO Fitness-distance-ratio-based PSO
G3 Generalized generation gap
GEP Gene expression programming
GPU Graphics processing unit
HypE Hypervolume-based algorithm
IDCMA Immune dominance clonal multiobjective algorithm
IDEA Iterated density-estimation EA
IEC Interactive evolutionary computation
IMOEA Incrementing MOEA
IMOGA Incremental multiple-objective GA
LABS Low autocorrelation binary sequences
LCSS Longest common subsequence
LDWPSO Linearly decreasing weight PSO
LMI Linear matrix inequality
MCMC Markov chain Monte Carlo
meCGA Multiobjective extended compact GA
MIMD Multiple instruction multiple data
MIMIC Mutual information maximization for input clustering
MISA Multiobjective immune system algorithm
MOEA/D MOEA based on decomposition
MOGA Multiobjective GA
MOGLS Multiple-objective genetic local search
mohBOA Multiobjective hierarchical BOA
MOP Multiobjective optimization problem
moPGA Multiobjective parameterless GA
MPMO Multiple populations for multiple objectives
MST Minimum spanning tree
MTSP Multiple traveling salesmen problem
NetKeys Network random keys
NMR Nuclear magnetic resonance
NNIA Nondominated neighbor immune algorithm
NPGA Niched-Pareto GA
NSGA Nondominated sorting GA
opt-aiNet Optimized aiNet
PAES Pareto archived ES
PBIL Population-based incremental learning
PCB Printed circuit board
PCSEA Pareto corner search EA
PCX Parent-centric recombination
PICEA Preference-inspired coevolutionary algorithm
PIPE Probabilistic incremental program evolution
POLE Program optimization with linkage estimation
PSL Peak sidelobe level
PSO Particle swarm optimization
QAP Quadratic assignment problem
QSO Quantum swarm optimization
REDA Restricted Boltzmann machine-based multiobjective EDA
RM-MEDA Regularity model-based multiobjective EDA
SAGA Speciation adaptation GA
SAMC Stochastic approximation Monte Carlo
SDE Shift-based density estimation
SIMD Single instruction multiple data
SPEA Strength Pareto EA
SVLC Synapsing variable-length crossover
TLBO Teaching–learning-based optimization
TOPSIS Technique for order preference by similarity to an ideal solution
TSP Traveling salesman problem
TVAC Time-varying acceleration coefficients
UMDA Univariate marginal distribution algorithm
UNBLOX Uniform block crossover
VEGA Vector-evaluated GA
VIV Virtual virus
1 Introduction

This chapter introduces background material on global optimization and the concept of metaheuristics. Basic definitions of optimization, swarm intelligence, biological processes, evolution versus learning, and the no-free-lunch theorem are described. We hope this chapter will arouse your interest in reading the other chapters.
1.1 Computation Inspired by Nature
Artificial intelligence (AI) is an old discipline for making intelligent machines. Search is a key concept of AI, because it serves all disciplines. In general, the search spaces of practical problems are typically so large that enumerating them is impossible. This precludes the use of traditional calculus-based and enumeration-based methods. Computational intelligence paradigms were initiated for this purpose, and the approach mainly depends on the cooperation of agents.
Optimization is the process of searching for the optimal solution. The three search mechanisms are analytical, enumerative, and heuristic search techniques. Analytical search is calculus-based. The search algorithms may be guided by the gradient or the Hessian of the function, leading to a local minimum solution. Random search and enumeration are unguided search methods that simply enumerate the search space and exhaustively search for the optimal solution. Heuristic search is guided search that in most cases produces high-quality solutions.
Computational intelligence is a field of AI. It investigates adaptive mechanisms to facilitate intelligent behaviors in complex environments. Unlike AI, which relies on knowledge derived from human expertise, computational intelligence depends upon collected numerical data. It includes a set of nature-inspired computational paradigms. Major subjects in computational intelligence include neural networks for pattern recognition, fuzzy systems for reasoning under uncertainty, and evolutionary computation for stochastic optimization search.
Nature is the primary source of inspiration for new computational paradigms. For instance, Wiener's cybernetics was inspired by feedback control processes observable in biological systems. Changes in nature, from the microscopic scale to the ecological scale, can be treated as computations. Natural processes always reach an equilibrium that is optimal. Such analogies can be used for finding useful solutions for search and optimization. Examples of natural computing paradigms are artificial neural networks [43], simulated annealing (SA) [37], genetic algorithms [30], swarm intelligence [22], artificial immune systems [16], DNA-based molecular computing [1], quantum computing [28], membrane computing [51], and cellular automata (von Neumann 1966).
From bacteria to humans, biological entities have social interaction ranging from altruistic cooperation to conflict. Swarm intelligence borrows the idea of the collective behavior of biological populations. Cooperative problem-solving is an approach that achieves a certain goal through the cooperation of a group of autonomous entities. Cooperation mechanisms are common in agent-based computing paradigms, be they biologically based or not. Cooperative behavior has inspired research in biology, economics, and multi-agent systems. This approach is based on the notion of the payoffs associated with pursuing certain strategies.
Game theory studies situations of competition and cooperation between multiple parties. The discipline starts with von Neumann's study on zero-sum games [48]. It has many applications in strategic warfare, economic or social problems, animal behaviors, and political voting.
Evolutionary computation, DNA computing, and membrane computing are dependent on knowledge of the microscopic cell structure of life. Evolutionary computation evolves a population of individuals over generations, generates offspring by mutation and recombination, and selects the fittest to survive each generation. DNA computing and membrane computing are emerging computational paradigms at the molecular level.

Quantum computing is characterized by principles of quantum mechanics, combined with computational intelligence [46]. Quantum mechanics is a mathematical framework or set of rules for the construction of physical theories.
All effective formal behaviors can be simulated by Turing machines. For physical devices used for computational purposes, it is widely assumed that all physical machine behaviors can be simulated by Turing machines. When a computational model computes the same class of functions as the Turing machine, and potentially faster, it is called a super-Turing model. Hypercomputation refers to computation that goes beyond the Turing limit, in the sense of super-Turing computation. While Deutsch's (1985) universal quantum computer is a super-Turing model, it is not hypercomputational. The physicality of hypercomputational behavior is considered in [55] from first principles, by showing that quantum theory can be reformulated in a way that explains why physical behaviors can be regarded as computing something in the standard computational state-machine sense.
1.2 Biological Processes
Deoxyribonucleic acid (DNA) is the carrier of the genetic information of organisms. Nucleic acids are linear unbranched polymers, i.e., chain molecules, of nucleotides. Nucleotides are divided into purines (adenine, A; guanine, G) and pyrimidines (thymine, T; cytosine, C). The DNA is organized into a double helix structure. Complementary nucleotides (bases) are paired against each other: A and T, as well as G and C.

The DNA structure is shown in Figure 1.1. The double helix, composed of phosphate groups (triangles) and sugar components (squares), is the backbone of the DNA structure. The double helix is stabilized by two hydrogen bonds between A and T, and three hydrogen bonds between G and C.

A sequence of three nucleotides is a codon or triplet. With three exceptions, all 4^3 = 64 codons code one of 20 amino acids, and the synonyms code identical amino acids. Proteins are polypeptide chains consisting of the 20 amino acids. An amino acid consists of a carboxyl and an amino group, and amino acids differ in other groups that may also contain the hexagonal benzene molecule. The peptide bond of the long polypeptide chains forms between the amino group and the carboxyl group of the neighboring molecule. Proteins are the basic modules of all cells and are the actors of life processes. They build characteristic three-dimensional structures, e.g., the alpha helix molecule.
The human genome is about 3 billion base pairs long and specifies about 20,488 genes, arranged in 23 pairs of homologous chromosomes. All base pairs of the DNA from a single human cell have an overall length of 2.6 m when unraveled and stretched out, but are compressed in the nucleus to a size of 200 µm. Locations on these chromosomes are referred to as loci. A locus which has a specific function is known as a gene. The state of the genes is called the genotype, and the observable expression of the genotype is called the phenotype. A genetic marker is a locus with a known DNA sequence which can be found in each person in the general population.
The transformation from genotype to phenotype is called gene expression. In the transcription phase, the DNA is transcribed into RNA. In the translation phase, the RNA then synthesizes proteins.
Figure 1.1 The DNA structure.
Trang 22Figure 1.2 A gene on a
chromosome (Courtesy U.S.
Department of Energy,
Human Genome Program).
Figure 1.2 displays a chromosome, its DNA makeup, and identifies one gene. The genome directs the construction of a phenotype, especially because the genes specify sequences of amino acids which, when properly folded, become proteins. The phenotype contains the genome. It provides the environment necessary for survival, maintenance, and replication of the genome.
Heredity is relevant to information theory as a communication process [5]. The conservation of genomes over intervals at the geological timescale and the existence of mutations at shorter intervals can be reconciled by assuming that genomes possess intrinsic error-correction codes. The constraints incurred by DNA molecules result in a nested structure. Genomic codes resemble modern codes, such as low-density parity-check (LDPC) codes or turbocodes [5]. The high redundancy of genomes achieves good error-correction performance by simple means. At the same time, DNA is a cheap material.
In AI, some of the most important components comprise the processes of memory formation, filtering, and pattern recognition. In biological systems, as in the human brain, a model can be constructed of a network of neurons that fire signals with different time-sequence patterns for various input signals. The unit pulse is called an action potential, involving a depolarization of the cell membrane and the successive repolarization to the resting potential. The physical basis of this unit pulse is the active transport of ions by chemical pumps [29]. The learning process is achieved by taking into account the plasticity of the weights with which the neurons are connected to one another. In biological nervous systems, the input data are first processed locally and then sent to the central nervous system [33]. This preprocessing partly serves to avoid overburdening the central nervous system.
Connectionist systems (neural networks) are mainly based on a single brain-like connectionist principle of information processing, where learning and information exchange occur in the connections. In [36], the connectionist paradigm is extended to integrative connectionist learning systems that integrate in their structure and learning algorithms principles from different hierarchical levels of information processing in the brain, including the neuronal, genetic, and quantum levels. Spiking neural networks are used as a basic connectionist learning model.
1.3 Evolution Versus Learning
The adaptation of creatures to their environments results from the interaction of two processes, namely, evolution and learning. Evolution is a slow stochastic process at the population level that determines the basic structures of a species. Evolution operates on biological entities, rather than on the individuals themselves. At the other end, learning is a process of gradually improving an individual's adaptation capability to the environment by tuning the structure of the individual.

Evolution is based on the Darwinian model, also called the principle of natural selection or survival of the fittest, while learning is based on the connectionist model of the human brain. In the Darwinian evolution, knowledge acquired by an individual during its lifetime cannot be transferred into its genome and subsequently passed on to the next generation. Evolutionary algorithms (EAs) are stochastic search methods that employ a search technique based on the Darwinian model, whereas neural networks are learning methods based on the connectionist model.

Combinations of learning and evolution, embodied by evolving neural networks, have better adaptability to a dynamic environment [39,66]. Evolution and learning can interact in the form of the Lamarckian evolution or be based on the Baldwin effect. Both approaches use learning to accelerate evolution.
The Lamarckian strategy allows the inheritance of traits acquired during an individual's life into the genetic code, so that the offspring can inherit its characteristics. Everything an individual learns during its life is encoded back into the chromosome and remains in the population. Although the Lamarckian evolution is biologically implausible, EAs as artificial biological systems can benefit from the Lamarckian theory. Ideas and knowledge are passed from generation to generation, and the Lamarckian theory can be used to characterize the evolution of human cultures. The Lamarckian evolution has proved effective within computer applications. Nevertheless, the Lamarckian strategy has been pointed out to distort the population so that the schema theorem no longer applies [62].
The Baldwin effect is biologically more plausible. In the Baldwin effect, learning has an indirect influence; that is, learning makes individuals adapt better to their environments, thus increasing their reproduction probability. In effect, learning smoothes the fitness landscape and thus facilitates evolution [27]. On the other hand, learning has a cost; thus, there is evolutionary pressure to find instinctive replacements for learned behaviors. When a population evolves a new behavior, in the early phase there will be a selective pressure in favor of learning, and in the later phase there will be a selective pressure in favor of instinct. Strong bias is analogous to instinct, and weak bias is analogous to learning [60]. The Baldwin effect only alters the fitness landscape, and the basic evolutionary mechanism remains purely Darwinian. Thus, the schema theorem still applies to the Baldwin effect [59].
A parent cannot pass its learned traits to its offspring; instead, only the fitness after learning is retained. In other words, the learned behaviors become instinctive behaviors in subsequent generations, and there is no direct alteration of the genotype. The acquired traits finally come under direct genetic control after many generations, namely, genetic assimilation. The Baldwin effect is purely Darwinian, not Lamarckian, in its mechanism, although it has consequences that are similar to those of the Lamarckian evolution [59]. A computational model of the Baldwin effect is presented in [27].
Hybridization of EAs and local search can be based either on the Lamarckian strategy or on the Baldwin effect. Local search corresponds to the phenotypic plasticity in biological evolution. The hybrid methods based on the Lamarckian strategy and the Baldwin effect are very successful, with numerous implementations.
1.4 Swarm Intelligence
The definition of swarm intelligence was introduced in 1989, in the context of cellular robotic systems [6]. Swarm intelligence is the collective intelligence of groups of simple agents [8]. Swarm intelligence deals with the collective behaviors of decentralized and self-organized swarms, which result from the local interactions of individual components with one another and with their environment [8]. Although there is normally no centralized control structure dictating how individual agents should behave, local interactions among such agents often lead to the emergence of global behavior.
Most species of animals show social behaviors. Biological entities often engage in a rich repertoire of social interaction that can range from altruistic cooperation to open conflict. Well-known examples of swarms are bird flocks, herds of quadrupeds, bacteria molds, fish schools among vertebrates, and colonies of social insects such as termites, ants, bees, and cockroaches, which perform collective behavior. Through flocking, individuals gain a number of advantages, such as reduced chances of being captured by predators, following migration routes in a precise and robust way through collective sensing, improved energy efficiency during travel, and the opportunity of mating.
The concept of individual–organization [57] has been widely used to understand the collective behavior of animals. The principle of individual–organization indicates that simple repeated interactions between individuals can produce complex behavioral patterns at the group level [57]. The agents of these swarms behave without supervision, and each of these agents has a stochastic behavior due to its perception of, and also influence on, the neighborhood and the environment. The behaviors can be accurately described in terms of individuals following simple sets of rules. The existence of collective memory in animal groups [15] establishes that the previous history of the group structure influences the collective behavior in future stages.

Grouping individuals often have to make rapid decisions about where to move or what behavior to perform, in uncertain or dangerous environments. Groups are often composed of individuals that differ with respect to their informational status, and individuals are usually not aware of the informational state of others. Some animal groups are based on a hierarchical structure according to a fitness principle known as dominance. The top member of the group leads all members of that group, e.g., in the cases of lions, monkeys, and deer. Such animal behaviors lead to stable groups with better cohesion properties among individuals [9]. Some animals, like birds, fishes, and sheep, live in groups but have no leader. This type of animal has no knowledge about its group and environment. Instead, they can move in the environment by exchanging data with their adjacent members.
Different swarm intelligence systems have inspired several approaches, including particle swarm optimization (PSO) [21], based on the movement of bird flocks and fish schools; the immune algorithm, based on the immune systems of mammals; bacterial foraging optimization [50], which models the chemotactic behavior of Escherichia coli; ant colony optimization (ACO) [17], inspired by the foraging behavior of ants; and artificial bee colony (ABC) [35], based on the foraging behavior of honeybee swarms. Unlike EAs, which are primarily competitive among the population, PSO and ACO adopt a more cooperative strategy. They can be treated as ontogenetic, since the population resembles a multicellular organism optimizing its performance by adapting to its environment.
Many population-based metaheuristics are actually social algorithms. The cultural algorithm [53] was introduced for modeling social evolution and learning. Ant colony optimization is a metaheuristic inspired by ant colony behavior in finding the shortest path to food sources. Particle swarm optimization is inspired by the social behavior and movement dynamics of insect swarms, bird flocking, and fish schooling. The artificial immune system is inspired by biological immune systems, and exploits their characteristics of learning and memory to solve optimization problems. The society and civilization method [52] utilizes the intra- and intersociety interactions within a society and the civilization model.
1.4.1 Group Behaviors
In animal behavioral ecology, group living is a widespread phenomenon. Animal search behavior is an active movement by which an animal attempts to find resources such as food, mates, oviposition, or nesting sites. In nature, group members often have different search and competitive abilities. Subordinates, who are less efficient foragers than the dominant, will be dispersed from the group. Dispersed animals may adopt ranging behavior to explore and colonize new habitats.
Group search usually adopts two foraging strategies within the group: producing (searching for food) and joining (scrounging). Joining is a ubiquitous trait found in most social animals such as birds, fish, spiders, and lions. In order to analyze the optimal policy for joining, two models for joining are information-sharing [13] and producer–scrounger [4]. The information-sharing model assumes that foragers search concurrently for their own resource while searching for opportunities to join. In the producer–scrounger model, foragers are assumed to use producing (finding) or joining (scrounging) strategies exclusively; they are divided into leaders and followers. For the joining policy of ground-feeding birds, the producer–scrounger model is more plausible than the information-sharing model. In the producer–scrounger model, three basic scrounging strategies are observed in house sparrows (Passer domesticus): area copying, moving across to search in the immediate area around the producer; following, trailing another animal around without exhibiting any searching behavior; and snatching, taking a resource directly from the producer.

The organization of collective behaviors in social insects can be understood as a combination of the four functions of organization: coordination, cooperation, deliberation, and collaboration [3]. The coordination function regulates the spatio-temporal density of individuals, while the collaboration function regulates the allocation of their activities. The deliberation function represents the mechanisms that support the decisions of the colony, while the cooperation function represents the mechanisms that overstep the limitations of the individuals. Together, the four functions of organization produce solutions to the colony's problems.
The general cooperative group behaviors, search strategies, and communication methods extracted from these systems are useful within a computing context [3].

• Cooperation and group behavior
Cooperation among individuals of the same or different species must benefit the cooperators, whether directly or indirectly. Socially, the group may be individuals working together for mutual benefit, or individuals each with their own specialized role. Competition for the available resources may restrict the size of the group.
• Search strategies
The success of a species depends on many factors, including its ability to search effectively for resources, such as food and water, in a given environment. Search strategies can be broadly divided into sit-and-wait (for ambushers) and foraging widely (for active searchers). Compared to the latter, the former has a lower opportunity to get food, but also a lower energy consumption.
Interactions between different species can be observed as symbiosis, host–parasite systems, and prey–predator systems, in which two organisms mutually support each other, one exploits the other, or they fight against each other. For instance, symbiosis between plants and fungi is very common, where the fungus invades and lives among the cortex cells of the secondary roots and, in turn, helps the host plant absorb minerals from the soil. Cleaning symbiosis is common in fish.
1.4.2 Foraging Theory
Natural selection has a tendency to eliminate animals having poor foraging strategies and to favor the ones with successful foraging strategies to propagate their genes. After many generations, poor foraging strategies are either eliminated or shaped into good ones, and many groups of animals cooperatively forage.

Some animals forage as individuals and others forage as groups with a type of collective intelligence. While an animal needs communication capabilities to perform social foraging, it can exploit essentially the sensing capabilities of the group. The group can catch large prey, and individuals can obtain protection from predators while in a group.
In general, a foraging strategy involves finding a patch of food, deciding whether to proceed and search for food, and deciding when to leave the patch. There are predators and risks, energy required for travel, and physiological constraints (sensing, memory, cognitive capabilities). Foraging scenarios can be modeled, and optimal policies can be found, using dynamic programming. Search and optimal foraging decision-making of animals can be one of three basic types: cruise (e.g., tuna fish and hawks), saltatory (e.g., birds, fish, lizards, and insects), and ambush (e.g., snakes and lions). In cruise search, an animal searches the perimeter of a region; in an ambush, it sits and waits; in saltatory search, an animal typically moves in some direction, stops or slows down, looks around, and then changes direction, over a whole region.
1.5 Heuristics, Metaheuristics, and Hyper-Heuristics
Many real-life optimization problems are difficult to solve by exact optimization methods, due to properties such as high dimensionality, multimodality, epistasis (parameter interaction), and nondifferentiability. Hence, approximate algorithms are an alternative approach for these problems. Approximate algorithms can be decomposed into heuristics and metaheuristics. The words meta and heuristic both have their origin in old Greek: meta means upper level, and heuristic denotes the art of discovering new strategies [58].
Heuristic refers to experience-based techniques for problem-solving and learning. A heuristic gives a satisfactory solution in a reasonable amount of computational time, but the solution may not be optimal. Specific heuristics are problem-dependent and designed only for the solution of a particular problem. Examples of this method include using a rule of thumb, an educated guess, an intuitive judgment, or even common sense. Many algorithms, either exact algorithms or approximation algorithms, are heuristics.

The term metaheuristic was coined by Glover in 1986 [25] to refer to a set of methodologies conceptually ranked above heuristics in the sense that they guide the design of heuristics. A metaheuristic is a higher-level procedure or heuristic designed to find, generate, or select a lower-level procedure or heuristic (partial search algorithm) that may provide a sufficiently good solution to an optimization
problem. By searching over a large set of feasible solutions, metaheuristics can often find good solutions with less computational effort than calculus-based methods or simple heuristics can.
Metaheuristics can be single-solution-based or population-based. Single-solution-based metaheuristics are based on a single solution at any time and comprise local search-based metaheuristics such as SA, tabu search, iterated local search [40,42], guided local search [61], pattern search or random search [31], the Solis–Wets algorithm [54], and variable neighborhood search [45]. In population-based metaheuristics, a number of solutions are updated iteratively until the termination condition is satisfied. Population-based metaheuristics are generally categorized into EAs and swarm-based algorithms. Single-solution-based metaheuristics are regarded as more exploitation-oriented, whereas population-based metaheuristics are more exploration-oriented.
The idea of hyper-heuristics can be traced back to the early 1960s [23]. Hyper-heuristics can be thought of as heuristics to choose heuristics, or as search algorithms that explore the space of problem solvers. A hyper-heuristic is a heuristic search method that seeks to automate the process of selecting, combining, generating, or adapting several simpler heuristics to efficiently solve hard search problems. The low-level heuristics are simple local search operators or domain-dependent heuristics, which operate directly on the solution space for a given problem instance. Unlike metaheuristics, which search in a space of problem solutions, hyper-heuristics always search in a space of low-level heuristics.
Heuristic selection and heuristic generation are currently the two main methodologies in hyper-heuristics. In the first method, the hyper-heuristic chooses heuristics from a set of known domain-dependent low-level heuristics. In the second method, the hyper-heuristic evolves new low-level heuristics by utilizing the components of the existing ones. Hyper-heuristics can be based on genetic programming [11] or grammatical evolution [10], which are excellent candidates for heuristic generation.
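To make the heuristic-selection methodology concrete, here is a minimal sketch in Python. It is an illustration only, not an algorithm from this book: the roulette-wheel credit scores, the improving-moves-only acceptance rule, and the toy move operators are all simplifying assumptions.

```python
import random

def selection_hyper_heuristic(solution, f, low_level_heuristics, iters=1000):
    """Pick a low-level heuristic in proportion to its past success
    (roulette-wheel scores) and accept only improving candidates."""
    scores = [1.0] * len(low_level_heuristics)   # uniform initial credit
    for _ in range(iters):
        i = random.choices(range(len(low_level_heuristics)), weights=scores)[0]
        candidate = low_level_heuristics[i](solution)
        if f(candidate) < f(solution):           # minimization
            solution = candidate
            scores[i] += 1.0                     # reward the chosen heuristic
    return solution

# Toy usage: two hypothetical low-level move operators on a real vector.
f = lambda x: sum(v * v for v in x)
moves = [
    lambda x: [v + random.uniform(-1, 1) for v in x],   # large random step
    lambda x: [v * 0.9 for v in x],                     # shrink toward origin
]
print(selection_hyper_heuristic([5.0, -3.0], f, moves))
```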
Several Single-Solution-Based Metaheuristics
Search strategies that randomly generate initial solutions and perform a local search are also called multi-start descent search methods. However, randomly creating an initial solution and performing a local search often results in low solution quality, as the complete search space is searched uniformly and the search cannot focus on promising areas of the search space.
Variable neighborhood search [45] combines local search strategies with dynamic neighborhood structures subject to the search progress. The local search is an intensification step focusing the search in the direction of high-quality solutions. Diversification is a result of changing neighborhoods. By changing neighborhoods, the method can easily escape from local optima. With an increasing cardinality of the neighborhoods, diversification gets stronger, as the shaking steps can choose from a larger set of solutions and local search covers a larger area of the search space.

Guided local search [61] uses a similar principle and dynamically changes the fitness landscape subject to the progress made during the search, so that local search can escape from local optima. The neighborhood structure remains constant. The method starts from a random solution x0 and performs a local search returning the local optimum x1. To escape the local optimum, a penalty is added to the fitness function f such that the resulting fitness function h allows local search to escape. A new local search is started from x1 using the modified fitness function h. Search continues until a termination criterion is met.
Iterated local search [40,42] connects the otherwise unrelated local search phases, as it creates initial solutions not randomly but based on solutions found in previous local search runs. If the perturbation steps are too small, the search cannot escape from a local optimum. If perturbation is too strong, the search shows the same behavior as multi-start descent search methods. The modification step as well as the acceptance criterion can depend on the search history.
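The interplay of perturbation and local search can be illustrated with a minimal Python sketch. The hill-climbing local search, the random-kick perturbation, and the elitist acceptance criterion used here are assumptions made for the example, not prescriptions from the text.

```python
import random

def hill_climb(f, x, step=0.1, iters=200):
    """A simple local search: accept random coordinate-wise moves that improve f."""
    for _ in range(iters):
        cand = [xi + random.uniform(-step, step) for xi in x]
        if f(cand) < f(x):
            x = cand
    return x

def perturb(x, strength=1.5):
    """Random kick; it must be strong enough to leave the current basin."""
    return [xi + random.uniform(-strength, strength) for xi in x]

def iterated_local_search(f, x0, restarts=50):
    best = hill_climb(f, x0)
    for _ in range(restarts):
        candidate = hill_climb(f, perturb(best))
        if f(candidate) < f(best):      # elitist acceptance criterion
            best = candidate
    return best

# Example on a 1D multimodal function with local minima near x = -2 and x = 2.
f = lambda x: (x[0] ** 2 - 4) ** 2 + x[0]
print(iterated_local_search(f, [5.0]))
```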
1.6 Optimization
Optimization can generally be categorized into discrete or continuous optimization, depending on whether the variables are discrete or continuous. There may be limits or constraints on the variables. Optimization can be a static or a dynamic problem, depending upon whether the output is a function of time. Traditionally, optimization is solved by calculus-based methods, or based on random search or enumerative search. Heuristics-based optimization is the topic treated in this book.

Optimization techniques can generally be divided into derivative methods and nonderivative methods, depending on whether or not derivatives of the objective function are required for the calculation of the optimum. Derivative methods are calculus-based methods, which can be either gradient search methods or second-order methods. These methods are local optimizers. Gradient descent is also known as steepest descent. It searches for a local minimum by taking steps along the negative direction of the gradient of the function. Examples of second-order methods are Newton's method, the Gauss–Newton method, quasi-Newton methods, the trust-region method, and the Levenberg–Marquardt method. Conjugate gradient and natural gradient methods can also be viewed as reduced forms of the quasi-Newton method.
Derivative methods can also be classified into model-based and metric-based methods. Model-based methods improve the current point by a local approximating model. Newton and quasi-Newton methods are model-based methods. Metric-based methods perform a transformation of the variables and then apply a gradient search method to improve the point. The steepest-descent, quasi-Newton, and conjugate gradient methods belong to this latter category.
Methods that do not require gradient information to perform a search and that sequentially explore the solution space are called direct search methods. They maintain a group of points. They utilize some sort of deterministic exploration method to search the space, and almost always utilize a greedy method to update the maintained group.
Hill climbing iteratively improves a single solution and terminates when it reaches a local optimum. When operating on continuous space, it is called gradient ascent. Other nonderivative search methods include univariate search parallel to an axis (i.e., the coordinate search method), the sequential simplex method, and acceleration methods in direct search such as the Hooke–Jeeves method, Powell's method, and Rosenbrock's method. Interior-point methods represent state-of-the-art techniques for solving linear, quadratic, and nonlinear optimization programs.
Example 1.1: The Rosenbrock function is given by

f(\mathbf{x}) = \sum_{i=1}^{n-1} \left[ 100 \left( x_{i+1} - x_i^2 \right)^2 + \left( x_i - 1 \right)^2 \right].

We restrict our attention to the two-dimensional case (n = 2), with x1, x2 ∈ [−204.8, 204.8]. The landscape of this function is shown in Figure 1.3.

Figure 1.3 The landscape of the Rosenbrock function f(x) with two variables x1, x2 ∈ [−204.8, 204.8]. The spacing of the grid is set as 1. There are many local minima, and the global minimum f(x) = 0 is located at (1, 1).
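For readers who want to experiment, a plain NumPy definition of the Rosenbrock function is sketched below. This snippet is an independent illustration, not one of the MATLAB codes mentioned in the preface.

```python
import numpy as np

def rosenbrock(x):
    """Rosenbrock function; the global minimum f = 0 is at x = (1, ..., 1)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2
                        + (x[:-1] - 1.0) ** 2))

print(rosenbrock([1.0, 1.0]))   # 0.0, the global minimum for n = 2
print(rosenbrock([0.0, 0.0]))   # 1.0
```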
1.6.1 Lagrange Multiplier Method
The Lagrange multiplier method can be used to analytically solve continuous function optimization problems subject to equality constraints [24]. By introducing the Lagrangian formulation, the dual problem associated with the primal problem is obtained, based on which the optimal values of the Lagrange multipliers can be found.
Let f(\mathbf{x}) be the objective function and h_i(\mathbf{x}) = 0, i = 1, \ldots, m, be the constraints. The Lagrange function can be constructed as

L(\mathbf{x}; \lambda_1, \ldots, \lambda_m) = f(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i h_i(\mathbf{x}),    (1.1)

where \lambda_i, i = 1, \ldots, m, are called the Lagrange multipliers.
The constrained optimization problem is converted into an unconstrained optimization problem: optimize L(\mathbf{x}; \lambda_1, \ldots, \lambda_m). By setting

\frac{\partial}{\partial \mathbf{x}} L(\mathbf{x}; \lambda_1, \ldots, \lambda_m) = \mathbf{0},    (1.2)

\frac{\partial}{\partial \lambda_i} L(\mathbf{x}; \lambda_1, \ldots, \lambda_m) = 0, \quad i = 1, \ldots, m,    (1.3)

and solving the resulting set of equations, we can obtain the position \mathbf{x} at the extremum of f(\mathbf{x}) under the constraints.
To deal with inequality constraints, the Karush–Kuhn–Tucker (KKT) theorem, as a generalization of the Lagrange multiplier method, introduces a slack variable into each inequality constraint before applying the Lagrange multiplier method. The conditions derived from the procedure are known as the KKT conditions [24].
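A standard textbook illustration (not taken from this chapter) makes the procedure concrete: minimize f(\mathbf{x}) = x_1^2 + x_2^2 subject to h(\mathbf{x}) = x_1 + x_2 - 1 = 0.

```latex
\begin{align*}
  L(\mathbf{x}; \lambda) &= x_1^2 + x_2^2 + \lambda\,(x_1 + x_2 - 1),\\
  \frac{\partial L}{\partial x_1} &= 2x_1 + \lambda = 0, \qquad
  \frac{\partial L}{\partial x_2} = 2x_2 + \lambda = 0,\\
  \frac{\partial L}{\partial \lambda} &= x_1 + x_2 - 1 = 0
  \;\Longrightarrow\; x_1 = x_2 = \tfrac{1}{2}, \quad \lambda = -1,
\end{align*}
```

giving the constrained minimum f(1/2, 1/2) = 1/2.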
1.6.2 Direction-Based Search and Simplex Search
In direct search, generally the gradient information cannot be obtained; thus, it is impractical to implement a step in the negative gradient direction for a minimization problem. However, when the objectives of a group of solutions are available, the best one can guide the search direction of the other solutions. Many direction-based search methods and EAs are inspired by this intuitive idea.
Some of the direct search methods use improvement direction information to search the objective space. Thus, it is useful to embed these directions into an EA as either a local search method or an exploration operator.
Simplex search [47], introduced by Nelder and Mead in 1965, is a well-known deterministic direction-based search method. MATLAB contains a direct search toolbox based on simplex search. Scatter search [26] includes an elitism mechanism in simplex search. Like simplex search, for a group of points, the algorithm finds new points, accepts the better ones, and discards the worse ones. Differential evolution (DE) [56] uses directional information from the current population. The mutation operator of DE needs three randomly selected different individuals from the current population for each individual to form a simplex-like triangle.

Simplex Search
Simplex search is a group-based deterministic local search method capable of exploring the objective space very fast. Thus, many EAs use simplex search as a local search method after mutation.

A simplex is a collection of n + 1 points in n-dimensional space. In an optimization problem involving n variables, the simplex method searches for an optimal solution by evaluating a set of n + 1 points. The method continuously forms new simplices by replacing the point having the worst performance in a simplex with a new point. The new point is generated by reflection, expansion, and contraction operations.
In a multidimensional space, the subtraction of two vectors gives a new vector starting at one vector and ending at the other, like x2 − x1. We often refer to the subtraction of two vectors as a direction. Addition of two vectors can be implemented in a triangular way, moving the start of one vector to the end of the other to form another vector. The expression x3 + (x2 − x1) can be regarded as the destination of a moving point that starts at x3 and has the length and direction of x2 − x1.
For every new simplex, several points are assigned according to their objective values. Then simplex search repeats reflection, expansion, contraction, and shrinking in a very efficient and deterministic way. Vertices of the simplex move toward the optimal point, and the simplex becomes smaller and smaller. Stop criteria can be selected as a predetermined maximal number of iterations, the length of an edge, or the improving rate of the best vertex.
Simplex search for minimization is shown in Algorithm 1.1. The coefficients for the reflection, expansion, contraction, and shrinking operations are typically selected as α = 1, β = 2, γ = −1/2, and δ = 1/2. The initial simplex is important: the search may easily get stuck for too small an initial simplex. This simplex should be selected depending on the nature of the problem.

Algorithm 1.1 (Simplex search for minimization).
Repeat:
a. Find the worst and best individuals as x_h and x_l. Calculate the centroid of all x_i's, i ≠ h, as x̄.
b. Enter reflection mode, applying expansion, contraction, or shrinking as needed;
until the termination condition is satisfied.
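To make the operations concrete, here is a minimal NumPy sketch of the trial points generated from one simplex, using the typical coefficients above (the example simplex is arbitrary; this is a sketch, not the book's Algorithm 1.1 itself):

import numpy as np

alpha, beta, gamma, delta = 1.0, 2.0, -0.5, 0.5     # typical coefficients

def trial_points(simplex, worst):
    """Reflection, expansion, and contraction points for the worst vertex.

    simplex: (n+1, n) array of vertices; worst: index of the worst vertex.
    Each trial point has the form c + coef * (c - x_w), where c is the
    centroid of the remaining n vertices and x_w the worst vertex.
    """
    others = np.delete(simplex, worst, axis=0)
    c = others.mean(axis=0)                 # centroid excluding the worst vertex
    xw = simplex[worst]
    reflection = c + alpha * (c - xw)
    expansion = c + beta * (c - xw)
    contraction = c + gamma * (c - xw)      # gamma < 0: contraction toward x_w
    return reflection, expansion, contraction

# Shrinking moves every vertex toward the best vertex x_l:
# x_i <- x_l + delta * (x_i - x_l).
simplex = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(trial_points(simplex, worst=2))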
1.6.3 Discrete Optimization Problems
The discrete optimization problem is also known as the combinatorial optimization problem (COP). Any problem that has a large set of discrete solutions and a cost function for rating those solutions relative to one another is a COP. COPs are known to be NP-complete, i.e., nondeterministic polynomial-time complete. The goal for COPs is to find an optimal solution or, sometimes, a nearly optimal solution. In COPs, the number of solutions grows exponentially with the problem size n, at O(n!) or O(e^n), so that no algorithm can find the global minimum solution in polynomial computational time.
Definition 1.1 (Discrete optimization problem) A discrete optimization problem is denoted as (X, f, Ω), or as minimizing the objective function

min f(x), x ∈ X, subject to Ω, (1.4)
where X ⊂ R^N is the search space defined over a finite set of N discrete decision variables x = (x1, x2, …, xN)^T, f : X → R is the objective function, and Ω is the set of constraints on x. The space X is constructed according to all the constraints imposed on the problem.
Definition 1.2 (Feasible solution) A vector x that satisfies the set of constraints for
an optimization problem is called a feasible solution.
The traveling salesman problem (TSP) is perhaps the most famous COP. Given a set of points, either nodes on a graph or cities on a map, find the shortest possible tour that visits every point exactly once and then returns to its starting point. There are (n − 1)!/2 possible tours for an n-city TSP. TSP arises in numerous applications, from routing of wires on a printed circuit board (PCB) and VLSI circuit design to fast food delivery.
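A brute-force sketch makes the factorial growth tangible. It assumes NumPy and hypothetical random city coordinates, fixes the starting city to avoid counting rotations of the same tour, and is practical only for very small n:

import itertools
import numpy as np

rng = np.random.default_rng(0)
cities = rng.random((8, 2))        # 8 hypothetical cities in the unit square

def tour_length(order):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(np.linalg.norm(cities[order[i]] - cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

# Fix city 0 as the start; 7! = 5040 tours are examined.
best = min(((0,) + p for p in itertools.permutations(range(1, len(cities)))),
           key=tour_length)
print(best, tour_length(best))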
The multiple traveling salesmen problem (MTSP) generalizes TSP to more than one salesman. Given a set of cities and a depot, m salesmen must visit all cities subject to the constraints that the route formed by each salesman must start and end at the depot, that each intermediate city must be visited exactly once and by a single salesman, and that the cost of the routes must be minimum. TSP with a time window is a variant of TSP in which each city is visited within a given time window.

The vehicle routing problem concerns the transport of items between depots and customers by means of a fleet of vehicles. It can be used for logistics and public services, such as milk delivery, mail or parcel pick-up and delivery, school bus routing, solid waste collection, dial-a-ride systems, and job scheduling. Two well-known routing problems are TSP and MTSP.
The location-allocation problem is defined as follows. Given a set of facilities, each of which serves a certain number of nodes on a graph, the objective is to place the facilities on the graph so that the average distance between each node and its serving facility is minimized.
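The objective itself is simple to evaluate. A NumPy sketch with hypothetical random coordinates, scoring one candidate placement by the average node-to-nearest-facility distance:

import numpy as np

rng = np.random.default_rng(1)
nodes = rng.random((50, 2))          # hypothetical demand nodes in the unit square
facilities = rng.random((4, 2))      # a candidate placement of 4 facilities

def avg_service_distance(nodes, facilities):
    """Average distance from each node to its nearest facility."""
    d = np.linalg.norm(nodes[:, None, :] - facilities[None, :, :], axis=2)
    return d.min(axis=1).mean()      # each node is served by its nearest facility

print(avg_service_distance(nodes, facilities))   # the value to be minimized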
1.6.4 P, NP, NP-Hard, and NP-Complete
An issue related to the efficiency and efficacy of an algorithm is how hard the problem itself is. To analyze its hardness, the optimization problem is first transformed into a decision problem.
Problems that can be solved using a polynomial-time algorithm are tractable. A polynomial-time algorithm has an upper bound O(n^k) on its running time, where k is a constant and n is the problem size (input size). Usually, tractable problems are easy to solve, as the running time increases relatively slowly with n. In contrast, problems are intractable if they cannot be solved by a polynomial-time algorithm and there is a lower bound Ω(k^n) on the running time, where k > 1 is a constant and n is the input size.
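The gap between the two bounds is easy to see numerically; compare, say, a polynomial n^3 with an exponential 2^n:

# Growth of a polynomial bound n^3 versus an exponential bound 2^n.
for n in (10, 20, 40, 80):
    print(n, n**3, 2**n)
# n^3 grows from 1,000 to 512,000, while 2^n explodes from about 1e3 to 1.2e24.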
The complexity class P (standing for polynomial time complexity) is defined as the set of decision problems that can be solved by a deterministic Turing machine using an algorithm with worst-case polynomial time complexity. P problems are usually easy, as there are algorithms that solve them in polynomial time.
The class NP (standing for nondeterministic polynomial time complexity) is the set of all decision problems that can be verified by a nondeterministic Turing machine using a nondeterministic algorithm in worst-case polynomial time. Although nondeterministic algorithms cannot be executed directly on conventional computers, this concept is important and helpful for the analysis of the computational complexity of problems. All problems in P also belong to the class NP, i.e., P ⊆ NP. There are also problems whose correct solutions cannot be verified in polynomial time.

All decision problems in P are tractable. Those problems that are in NP but not in P are difficult, as no polynomial-time algorithms exist for them. There are problems in NP for which no polynomial algorithm is available and which can be transformed into one another with polynomial effort. A problem is said to be NP-hard if every problem in NP is polynomial-time reducible to it; that is, an algorithm for solving the NP-hard problem can be used to solve any problem in NP with only polynomial overhead. Therefore, NP-hard problems are at least as hard as any other problem in NP, and are not necessarily themselves in NP.
The set of NP-complete problems is a subset of NP [14]. A decision problem A is said to be NP-complete if A is in NP and A is also NP-hard. NP-complete problems are the hardest problems in NP, and they are all polynomial-time reducible to one another, so they share the same complexity. They are difficult, as no polynomial-time algorithms are known for them. Decision problems that are not in NP are even more difficult. The relationship between all these classes is illustrated in Figure 1.4.
Practical COPs are generally NP-complete or NP-hard. At present, no algorithm with polynomial time complexity can guarantee that an optimal solution will be found.
1.6.5 Multiobjective Optimization Problem
A multiobjective optimization problem (MOP) requires finding a variable vector x in the domain X that optimizes the objective vector f(x).
Definition 1.3 (Multiobjective optimization problem) An MOP is to optimize a system with k conflicting objectives, i.e., to find x ∈ X that optimizes the objective vector f(x) = (f1(x), …, fk(x))^T.

Objectives conflict when increasing the quality of one objective tends to simultaneously decrease the quality of another objective. The solution to an MOP is therefore not a single optimal solution, but a set of solutions representing the best trade-offs among the objectives.
In order to optimize a system with conflicting objectives, the weighted sum of these objectives is usually used as the compromise for the system.
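A common form of this scalarization (the weights w_i, chosen by the decision maker, encode the relative importance of the objectives) is

\min_{x \in \mathcal{X}} \; F(x) = \sum_{i=1}^{k} w_i f_i(x),
\qquad w_i \ge 0, \quad \sum_{i=1}^{k} w_i = 1.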
The Pareto method is a popular approach to multiobjective optimization. It is based on the principle of nondominance. The Pareto optimum gives a set of solutions for which there is no way of improving one criterion without deteriorating another criterion. In MOPs, the concept of dominance provides a means by which multiple solutions can be compared and subsequently ranked.
Definition 1.4 (Pareto dominance) A variable vector x1 ∈ R^n is said to dominate another vector x2 ∈ R^n, denoted x1 ≺ x2, if and only if x1 is better than or equal to x2 in all attributes, and strictly better in at least one attribute, i.e., ∀i: f_i(x1) ≥ f_i(x2) ∧ ∃j: f_j(x1) > f_j(x2).
For two solutions x1 and x2, if x1 is better than x2 in all objectives, x1 is said to strongly dominate x2. If x1 is not worse than x2 in all objectives and better in at least one objective, x1 is said to dominate x2. A nondominated set is a set of solutions that are not weakly dominated by any other solution in the set.
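Pareto dominance translates directly into code. A sketch assuming maximization, to match the ≥ convention of Definition 1.4 (the example objective vectors are arbitrary):

def dominates(f1, f2):
    """True if objective vector f1 Pareto-dominates f2 (maximization)."""
    return (all(a >= b for a, b in zip(f1, f2))
            and any(a > b for a, b in zip(f1, f2)))

def nondominated(front):
    """Filter a list of objective vectors down to its nondominated set."""
    return [f for f in front
            if not any(dominates(g, f) for g in front if g is not f)]

# (3, 4) and (2, 5) are mutually nondominated; the rest are dominated.
print(nondominated([(3, 4), (2, 5), (2, 4), (1, 1)]))   # -> [(3, 4), (2, 5)]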
Definition 1.5 (Nondominance) A variable vector x1 ∈ X ⊂ R^n is nondominated with respect to X if there does not exist another vector x2 ∈ X such that x2 ≺ x1.
Definition 1.6 (Pareto optimality) A variable vector x∗ ∈ F ⊂ R^n (F is the feasible region) is Pareto optimal if it is nondominated with respect to F.

Definition 1.7 (Pareto optimal frontier) The Pareto optimal frontier P∗ is the set in R^n formed by all Pareto optimal solutions: P∗ = {x ∈ F | x is nondominated with respect to F}.

Definition 1.8 (Pareto front) The Pareto front PF∗ is the image of the Pareto optimal frontier in the objective space: PF∗ = {f(x) | x ∈ P∗}.

Multiobjective optimization aims to find a set of Pareto optimal solutions that is as diverse as possible, so that no regions are left unexplored.
An illustration of Pareto optimal solutions for a two-dimensional problem with two objectives is given in Figure 1.5. The upper border from points A to B of the domain X, denoted P∗, contains all Pareto optimal solutions. The frontier from points f_A to f_B along the lower border of the domain Y, denoted PF∗, contains the entire Pareto frontier in the objective space. For two points a and b, the mapping f_a dominates f_b, denoted f_a ≺ f_b; hence, the decision vector x_a is a nondominated solution. Figure 1.6 illustrates that Pareto fronts can be convex, concave, or discontinuous.

Figure 1.5 An illustration of Pareto optimal solutions for a two-dimensional problem with two objectives. X ⊂ R^n is the domain of x, and Y ⊂ R^m is the domain of f(x).

Figure 1.6 Different Pareto fronts: a convex, b concave, c discontinuous.
Definition 1.9 (ε-dominance) A variable vector x1 ∈ R^n is said to ε-dominate another vector x2 ∈ R^n, denoted x1 ≺_ε x2, if and only if x1 is better than or equal to εx2 in all attributes, and strictly better in at least one attribute, i.e., ∀i: f_i(x1) ≥ f_i(εx2) ∧ ∃j: f_j(x1) > f_j(εx2) [69].
If ε = 1, ε-dominance is the same as Pareto dominance; otherwise, the area dominated by x1 is enlarged or shrunk. Thus, ε-dominance relaxes the area of Pareto dominance by a factor of ε.
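Only the comparison changes relative to plain Pareto dominance. A sketch that applies the ε scaling to the objective values of x2 (an assumed reading of the definition above; maximization as before):

def eps_dominates(f1, f2, eps=1.0):
    """True if f1 epsilon-dominates f2: f1 is compared against eps-scaled f2."""
    scaled = [eps * b for b in f2]
    return (all(a >= b for a, b in zip(f1, scaled))
            and any(a > b for a, b in zip(f1, scaled)))

# eps = 1 recovers Pareto dominance; eps < 1 enlarges the dominated area here.
print(eps_dominates((3, 4), (3, 5), eps=0.8))   # True: (3, 4) vs (2.4, 4.0)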
1.6.6 Robust Optimization
The robustness of a particular solution can be confirmed by resampling or by reusing neighborhood solutions. Resampling is reliable but computationally expensive. In contrast, the method of reusing neighborhood solutions is cheap but unreliable; a confidence measure increases the reliability of the latter method. In [44], confidence-based operators are defined for robust metaheuristics: the confidence metric and five confidence-based operators are employed to design a confidence-based robust PSO and a confidence-based robust GA. History can be utilized to help estimate the expected fitness of an individual, producing more robust solutions in EAs.
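A sketch of the resampling approach (a hypothetical noisy setting; NumPy assumed): each candidate is re-evaluated m times under random perturbations, and the mean serves as its robust fitness:

import numpy as np

rng = np.random.default_rng(2)

def f(x):
    """A hypothetical objective that is sensitive to perturbations of x."""
    return float(np.sum(x ** 2))

def robust_fitness(x, m=20, sigma=0.1):
    """Resampling: average f over m perturbed copies of x (reliable but costly)."""
    samples = [f(x + rng.normal(0.0, sigma, size=x.shape)) for _ in range(m)]
    return float(np.mean(samples))

x = np.array([0.5, -0.25])
print(f(x), robust_fitness(x))   # nominal value vs. expected value under noise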
The confidence metric defines the confidence level of a robust solution. The highest confidence is achieved when a large number of solutions are available with the greatest diversity within a suitable neighborhood around the solution in the parameter space. A mathematical expression for confidence is given in [44].

1.7 Performance Indicators
Overall Performance Indicators
The overall performance indicators provide a general description of the performance. Overall performance can be compared according to efficacy, efficiency, and reliability on a benchmark problem over many runs.

Efficacy evaluates the quality of the results without caring about the speed of an algorithm. Mean best fitness (MBF) is defined as the average of the best fitness in the last population over all runs. The best fitness values found so far can be used as a more absolute measure of efficacy.
Reliability indicates the extent to which the algorithm can provide acceptable results. Success rate (SR) is defined as the percentage of runs terminated with success. A successful run is one in which the difference between the best fitness value in the last generation, f∗, and a predefined target value f_o is below a predefined threshold ε.
Efficiency concerns finding the global optimal solution rapidly. Average number of evaluations to a solution (AES) is defined as the average number of evaluations taken by the successful runs. If an algorithm has no successful runs, its AES is undefined.
Low SR and high MBF may indicate that the algorithm converges slowly, while high SR and low MBF may indicate that the algorithm is basically reliable but may occasionally produce very bad results. It is desirable to have a smaller AES and a larger SR; thus, the small-AES/SR criterion considers reliability and efficiency at the same time.
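These indicators are easy to compute from per-run records. A sketch with hypothetical field names ('best_fitness' for the final best value, 'evals' for the evaluations used), with success defined as in the text by |f∗ − f_o| < ε:

def summarize(runs, f_o, eps):
    """Compute MBF, SR, and AES from a list of per-run records."""
    mbf = sum(r['best_fitness'] for r in runs) / len(runs)            # MBF
    successes = [r for r in runs if abs(r['best_fitness'] - f_o) < eps]
    sr = len(successes) / len(runs)                                   # SR
    aes = (sum(r['evals'] for r in successes) / len(successes)
           if successes else None)                                    # AES undefined if SR = 0
    return mbf, sr, aes

runs = [{'best_fitness': 0.01, 'evals': 1200},
        {'best_fitness': 0.90, 'evals': 5000},
        {'best_fitness': 0.02, 'evals': 1500}]
print(summarize(runs, f_o=0.0, eps=0.1))   # -> (0.31, 0.666..., 1350.0)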
Evolving Performance Indicators
Several generation-based evolving performance indicators can provide more detailed information.
• Best-so-far (BSF) records the best solution found by the algorithm thus far for each generation in every run. The BSF index is monotonic.
• Best-of-current-population (BCP) records the best solution in each generation in every run. MBF is the average of the final BCP or final BSF over multiple runs.
• Average-of-current-population (ACP) records the average solution in each generation in every run.
• Worst-of-current-population (WCP) records the worst solution in each generation in every run.
After many runs with random initial settings, we can draw conclusions about an algorithm by applying statistical descriptions, e.g., statistical visualization, descriptive statistics, and statistical inference.

Statistical visualization uses graphs to describe and compare algorithms. The box plot is widely used for this purpose. Suppose we run an algorithm on a problem 100 times and get 100 values of a performance indicator; we can rank the 100 numbers in ascending order. On each box, the central mark is the median, and the lower and upper edges are the 25th and 75th percentiles; the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually by '+'. The interquartile range (IQR) is the distance between the lower and upper edges of the box. Any data point that lies more than 1.5 IQR below the lower quartile or 1.5 IQR above the upper quartile is considered an outlier. Two lines called whiskers are plotted to indicate the smallest number that is not a lower outlier and the largest number that is not an upper outlier. The default 1.5 IQR corresponds to approximately ±2.7σ and 99.3% coverage if the data are normally distributed.
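The box plot quantities are straightforward to compute. A NumPy sketch in which random numbers stand in for the 100 recorded indicator values:

import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(0.0, 2.0, size=100)     # stand-in for 100 indicator values

q1, med, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                             # interquartile range
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # outlier cutoffs (~ +/- 2.7 sigma)
outliers = data[(data < lo) | (data > hi)]
whisker_low = data[data >= lo].min()      # smallest non-outlier
whisker_high = data[data <= hi].max()     # largest non-outlier
print(med, iqr, whisker_low, whisker_high, len(outliers))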
The box plots of the BSF performance of two algorithms are illustrated in Figure 1.7. Algorithm 2 has a larger median BSF and a smaller IQR, that is, better average performance along with smaller variance; thus, it outperforms Algorithm 1. Also, for the evolving process over many runs, the convergence graph, which illustrates performance versus the number of fitness evaluations (NOFE), is quite useful.

Figure 1.7 Box plot of the BSF of two algorithms.
Graphs are easy to understand. When the difference between algorithms is small, one has to calculate specific numbers to describe and compare the performance. The most often used descriptive statistics are the mean and variance (or standard deviation) of the performance indicators. Statistical inference includes parameter estimation, hypothesis testing, and many other techniques.

1.8 No Free Lunch Theorem
Before the no free lunch theorem [63,64] was proposed in 1995, people intuitively believed that universally beneficial search algorithms exist, and many actually made efforts to design such algorithms. The no free lunch theorem asserts that there is no universally beneficial algorithm.
The original no free lunch theorem for optimization states that no search algorithm is better than another in locating an extremum of a function when averaged over the set of all possible discrete functions. That is, all search algorithms achieve the same performance as random enumeration when evaluated over the set of all functions.
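This claim can be verified exhaustively on a tiny instance. The sketch below averages, over all 2^3 functions f : {0, 1, 2} → {0, 1}, the number of evaluations that two fixed non-revisiting search orders need to first sample a global minimum; both averages come out identical:

import itertools

domain = [0, 1, 2]

def evals_to_min(f, order):
    """Evaluations a fixed search order needs to first hit a global minimum of f."""
    fmin = min(f)
    return next(i + 1 for i, x in enumerate(order) if f[x] == fmin)

orders = [(0, 1, 2), (2, 0, 1)]             # two non-revisiting "algorithms"
for order in orders:
    total = sum(evals_to_min(f, order)
                for f in itertools.product((0, 1), repeat=len(domain)))
    print(order, total / 2 ** len(domain))  # 1.5 for every order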
Theorem 1.1 (No free lunch theorem) Given the set of all functions F and a set of benchmark functions F1, if algorithm A1 is better on average than algorithm A2 on F1, then algorithm A2 must be better than algorithm A1 on F \ F1.
When there is no structural knowledge at all, all algorithms have equal performance. The no free lunch theorem applies to non-revisiting algorithms with no problem-specific knowledge. It seems plausible because of deceptive functions and random functions: deceptive functions lead a hill-climber away from the optimum, and for random functions the search for the optimum has no guidance at all. For these two classes of functions, optimization is like finding a needle in a haystack.
The no free lunch theorem is concerned with the average performance over all problems. In applications, such a scenario is hardly ever realistic, since there is almost always some knowledge about typical solutions. Practical problems always contain priors such as smoothness, symmetry, and i.i.d. samples. The performance of any algorithm is determined by the knowledge concerning the cost function; thus, it is meaningless to evaluate the performance of an algorithm without specifying the prior knowledge. Developing search algorithms therefore amounts to building special-purpose methods for application-specific problems. For example, there are potentially free lunches for coevolutionary approaches [65].
The no free lunch theorem was later extended to coding methods, cross-validation [67], early stopping [12], avoidance of overfitting, and noise prediction [41]. Again, it has been asserted that no single method is better than the others for all problems.