Search and Optimization by Metaheuristics
ISBN 978-3-319-41191-0 ISBN 978-3-319-41192-7 (eBook)
DOI 10.1007/978-3-319-41192-7
Library of Congress Control Number: 2016943857
Mathematics Subject Classification (2010): 49-04, 68T20, 68W15
© Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
This book is published under the trade name Birkhäuser.
The registered company is Springer International Publishing AG, Switzerland
(www.birkhauser-science.com)
To My Friends Jiabin Lu and Biaobiao Zhang
Ke-Lin Du
and
To My Parents
M.N.S. Swamy
Optimization is a branch of applied mathematics and numerical analysis. Almost every problem in engineering, science, economics, and life can be formulated as an optimization or a search problem. While some of these problems are simple enough to be solved by traditional optimization methods based on mathematical analysis, most of them are very hard to solve using analysis-based approaches. Fortunately, we can solve these hard optimization problems by drawing inspiration from nature, since we know that nature is a system of vast complexity and it always generates a near-optimum solution.
Natural computing is concerned with computing inspired by nature, as well as with computations taking place in nature. Well-known examples of natural computing are evolutionary computation, neural computation, cellular automata, swarm intelligence, molecular computing, quantum computation, artificial immune systems, and membrane computing. Together, they constitute the discipline of computational intelligence.
Among all the nature-inspired computational paradigms, evolutionary computation is the most influential. It is a computational method for obtaining the best possible solutions in a huge solution space based on Darwin's survival-of-the-fittest principle. Evolutionary algorithms are a class of effective global optimization techniques for many hard problems.
More and more biologically inspired methods have been proposed in the past two decades. The most prominent ones are particle swarm optimization, ant colony optimization, and the immune algorithm. These methods are widely used due to their particular features compared with evolutionary computation. All these biologically inspired methods are population-based. Computation is performed by autonomous agents, and these agents exchange information through social behaviors. The memetic algorithm models the behavior of knowledge propagation among animals.
There are also many other nature-inspired metaheuristics for search and optimization. These include methods inspired by physical laws, chemical reactions, biological phenomena, social behaviors, and animal thinking.
Metaheuristics are a class of intelligent self-learning algorithms for finding near-optimum solutions to hard optimization problems, mimicking intelligent processes and behaviors observed in nature, sociology, thinking, and other disciplines. Metaheuristics may be nature-inspired paradigms, stochastic, or probabilistic algorithms. Metaheuristics-based search and optimization are widely used for fully automated decision-making and problem-solving.
In this book, we provide a comprehensive introduction to nature-inspired metaheuristic methods for search and optimization. While each metaheuristics-based method has its specific strengths for particular cases, according to the no free lunch theorem it actually has the same performance as random search when considered over the entire set of search and optimization problems. Thus, when talking about the performance of an optimization method, the claim is actually based on benchmarking examples that are representative of some particular class of problems.
This book is intended as an accessible introduction to metaheuristic optimization for a broad audience. It provides an understanding of some fundamental insights on metaheuristic optimization, and serves as a helpful starting point for those interested in more in-depth studies of metaheuristic optimization. The computational paradigms described in this book are of general purpose in nature. This book can be used as a textbook for advanced undergraduate students and graduate students. All those interested in search and optimization can benefit from this book. Readers interested in a particular topic will benefit from the appropriate chapter.

A roadmap for navigating through the book is given as follows. Except for the introductory Chapter 1, the contents of the book can be roughly divided into five categories and an appendix.
• Evolution-based approach is covered in Chapters 3–8:
Chapter 3 Genetic Algorithms
Chapter 4 Genetic Programming
Chapter 5 Evolutionary Strategies
Chapter 6 Differential Evolution
Chapter 7 Estimation of Distribution Algorithms
Chapter 8 Topics in Evolutionary Algorithms
• Swarm intelligence-based approach is covered in Chapters 9–15:
Chapter 9 Particle Swarm Optimization
Chapter 10 Artificial Immune Systems
Chapter 11 Ant Colony Optimization
Chapter 12 Bee Metaheuristics
Chapter 13 Bacterial Foraging Algorithm
Chapter 14 Harmony Search
Chapter 15 Swarm Intelligence
• Sciences-based approach is covered in Chapters 2, 16–18:
Chapter 2 Simulated Annealing
Chapter 16 Biomolecular Computing
Chapter 17 Quantum Computing
Chapter 18 Metaheuristics Based on Sciences
• Human-based approach is covered in Chapters 19–21:
Chapter 19 Memetic Algorithms
Chapter 20 Tabu Search and Scatter Search
Chapter 21 Search Based on Human Behaviors
• General optimization problems are treated in Chapters 22–23:
Chapter 22 Dynamic, Multimodal, and Constrained Optimizations
Chapter 23 Multiobjective Optimization
• The appendix contains auxiliary benchmarks helpful for testing new and existing algorithms.
In this book, hundreds of different metaheuristic methods are introduced. However, due to space limitations, we give detailed descriptions only for a large number of the most popular metaheuristic methods. Some computational examples for representative metaheuristic methods are given. The MATLAB codes for these examples are available at the book website. We have also collected MATLAB codes for some other metaheuristics. These codes are of general purpose in nature. Readers need only run these codes with their own objective functions.

For instructors, this book has been designed to serve as a textbook for courses on evolutionary algorithms or nature-inspired optimization. The book can be taught in 12 two-hour sessions. We recommend that Chapters 1–11, 19, 22, and 23 be taught. In order to acquire a mastery of these popular metaheuristic algorithms, some programming exercises using the benchmark functions given in the appendix should be assigned to the students. The MATLAB codes provided with the book are useful for learning the algorithms.
For readers, we suggest that you start with Chapter 1, which covers basic concepts in optimization and metaheuristics. When you have digested the basics, you can delve into one or more specific metaheuristic paradigms that you are interested in or that suit your specific problems. The MATLAB codes accompanying the book are very useful for learning those popular algorithms, and they can be directly used for solving your specific problems. The benchmark functions are also very useful for researchers evaluating their own algorithms.
We would like to thank Limin Meng (Zhejiang University of Technology, China) and Yongyao Yang (SUPCON Group Inc., China) for their consistent help. We would like to thank all the helpful and thoughtful staff at Xonlink Inc. Last but not least, we would like to recognize the assistance of Benjamin Levitt and the production team at Springer.
1 Introduction
1.1 Computation Inspired by Nature
1.2 Biological Processes
1.3 Evolution Versus Learning
1.4 Swarm Intelligence
1.4.1 Group Behaviors
1.4.2 Foraging Theory
1.5 Heuristics, Metaheuristics, and Hyper-Heuristics
1.6 Optimization
1.6.1 Lagrange Multiplier Method
1.6.2 Direction-Based Search and Simplex Search
1.6.3 Discrete Optimization Problems
1.6.4 P, NP, NP-Hard, and NP-Complete
1.6.5 Multiobjective Optimization Problem
1.6.6 Robust Optimization
1.7 Performance Indicators
1.8 No Free Lunch Theorem
1.9 Outline of the Book
References
2 Simulated Annealing
2.1 Introduction
2.2 Basic Simulated Annealing
2.3 Variants of Simulated Annealing
References
3 Genetic Algorithms
3.1 Introduction to Evolutionary Computation
3.1.1 Evolutionary Algorithms Versus Simulated Annealing
3.2 Terminologies of Evolutionary Computation
3.3 Encoding/Decoding
3.4 Selection/Reproduction
3.5 Crossover
3.6 Mutation
3.7 Noncanonical Genetic Operators
3.8 Exploitation Versus Exploration
3.9 Two-Dimensional Genetic Algorithms
3.10 Real-Coded Genetic Algorithms
3.11 Genetic Algorithms for Sequence Optimization
References
4 Genetic Programming
4.1 Introduction
4.2 Syntax Trees
4.3 Causes of Bloat
4.4 Bloat Control
4.4.1 Limiting on Program Size
4.4.2 Penalizing the Fitness of an Individual with Large Size
4.4.3 Designing Genetic Operators
4.5 Gene Expression Programming
References
5 Evolutionary Strategies
5.1 Introduction
5.2 Basic Algorithm
5.3 Evolutionary Gradient Search and Gradient Evolution
5.4 CMA Evolutionary Strategies
References
6 Differential Evolution
6.1 Introduction
6.2 DE Algorithm
6.3 Variants of DE
6.4 Binary DE Algorithms
6.5 Theoretical Analysis on DE
References
7 Estimation of Distribution Algorithms
7.1 Introduction
7.2 EDA Flowchart
7.3 Population-Based Incremental Learning
7.4 Compact Genetic Algorithms
7.5 Bayesian Optimization Algorithm
7.6 Convergence Properties
7.7 Other EDAs
7.7.1 Probabilistic Model Building GP
References
8 Topics in Evolutionary Algorithms
8.1 Convergence of Evolutionary Algorithms
8.1.1 Schema Theorem and Building-Block Hypothesis
8.1.2 Finite and Infinite Population Models
8.2 Random Problems and Deceptive Functions
8.3 Parallel Evolutionary Algorithms
8.3.1 Master–Slave Model
8.3.2 Island Model
8.3.3 Cellular EAs
8.3.4 Cooperative Coevolution
8.3.5 Cloud Computing
8.3.6 GPU Computing
8.4 Coevolution
8.4.1 Coevolutionary Approaches
8.4.2 Coevolutionary Approach for Minimax Optimization
8.5 Interactive Evolutionary Computation
8.6 Fitness Approximation
8.7 Other Heredity-Based Algorithms
8.8 Application: Optimizing Neural Networks
References
9 Particle Swarm Optimization
9.1 Introduction
9.2 Basic PSO Algorithms
9.2.1 Bare-Bones PSO
9.2.2 PSO Variants Using Gaussian or Cauchy Distribution
9.2.3 Stability Analysis of PSO
9.3 PSO Variants Using Different Neighborhood Topologies
9.4 Other PSO Variants
9.5 PSO and EAs: Hybridization
9.6 Discrete PSO
9.7 Multi-swarm PSOs
References
10 Artificial Immune Systems
10.1 Introduction
10.2 Immunological Theories
10.3 Immune Algorithms
10.3.1 Clonal Selection Algorithm
10.3.2 Artificial Immune Network
10.3.3 Negative Selection Algorithm
10.3.4 Dendritic Cell Algorithm
References
11 Ant Colony Optimization
11.1 Introduction
11.2 Ant Colony Optimization
11.2.1 Basic ACO Algorithm
11.2.2 ACO for Continuous Optimization
References
12 Bee Metaheuristics
12.1 Introduction
12.2 Artificial Bee Colony Algorithm
12.2.1 Algorithm Flowchart
12.2.2 Modifications on ABC Algorithm
12.2.3 Discrete ABC Algorithms
12.3 Marriage in Honeybees Optimization
12.4 Bee Colony Optimization
12.5 Other Bee Algorithms
12.5.1 Wasp Swarm Optimization
References
13 Bacterial Foraging Algorithm
13.1 Introduction
13.2 Bacterial Foraging Algorithm
13.3 Algorithms Inspired by Molds, Algae, and Tumor Cells
References
14 Harmony Search
14.1 Introduction
14.2 Harmony Search Algorithm
14.3 Variants of Harmony Search
14.4 Melody Search
References
15 Swarm Intelligence
15.1 Glowworm-Based Optimization
15.1.1 Glowworm Swarm Optimization
15.1.2 Firefly Algorithm
15.2 Group Search Optimization
15.3 Shuffled Frog Leaping
15.4 Collective Animal Search
15.5 Cuckoo Search
15.6 Bat Algorithm
15.7 Swarm Intelligence Inspired by Animal Behaviors
15.7.1 Social Spider Optimization
15.7.2 Fish Swarm Optimization
15.7.3 Krill Herd Algorithm
15.7.4 Cockroach-Based Optimization
15.7.5 Seven-Spot Ladybird Optimization
15.7.6 Monkey-Inspired Optimization
15.7.7 Migrating-Based Algorithms
15.7.8 Other Methods
15.8 Plant-Based Metaheuristics
15.9 Other Swarm Intelligence-Based Metaheuristics
References
16 Biomolecular Computing
16.1 Introduction
16.1.1 Biochemical Networks
16.2 DNA Computing
16.2.1 DNA Data Embedding
16.3 Membrane Computing
16.3.1 Cell-Like P System
16.3.2 Computing by P System
16.3.3 Other P Systems
16.3.4 Membrane-Based Optimization
References
17 Quantum Computing
17.1 Introduction
17.2 Fundamentals
17.2.1 Grover's Search Algorithm
17.3 Hybrid Methods
17.3.1 Quantum-Inspired EAs
17.3.2 Other Quantum-Inspired Hybrid Algorithms
References
18 Metaheuristics Based on Sciences
18.1 Search Based on Newton's Laws
18.2 Search Based on Electromagnetic Laws
18.3 Search Based on Thermal-Energy Principles
18.4 Search Based on Natural Phenomena
18.4.1 Search Based on Water Flows
18.4.2 Search Based on Cosmology
18.4.3 Black Hole-Based Optimization
18.5 Sorting
18.6 Algorithmic Chemistries
18.6.1 Chemical Reaction Optimization
18.7 Biogeography-Based Optimization
18.8 Methods Based on Mathematical Concepts
18.8.1 Opposition-Based Learning
References
19 Memetic Algorithms
19.1 Introduction
19.2 Cultural Algorithms
19.3 Memetic Algorithms
19.3.1 Simplex-Based Memetic Algorithms
19.4 Application: Searching Low Autocorrelation Sequences
References
20 Tabu Search and Scatter Search
20.1 Tabu Search
20.1.1 Iterative Tabu Search
20.2 Scatter Search
20.3 Path Relinking
References
21 Search Based on Human Behaviors
21.1 Seeker Optimization Algorithm
21.2 Teaching–Learning-Based Optimization
21.3 Imperialist Competitive Algorithm
21.4 Several Metaheuristics Inspired by Human Behaviors
References
22 Dynamic, Multimodal, and Constrained Optimizations
22.1 Dynamic Optimization
22.1.1 Memory Scheme
22.1.2 Diversity Maintaining or Reinforcing
22.1.3 Multiple Population Scheme
22.2 Multimodal Optimization
22.2.1 Crowding and Restricted Tournament Selection
22.2.2 Fitness Sharing
22.2.3 Speciation
22.2.4 Clearing, Local Selection, and Demes
22.2.5 Other Methods
22.2.6 Metrics for Multimodal Optimization
22.3 Constrained Optimization
22.3.1 Penalty Function Method
22.3.2 Using Multiobjective Optimization Techniques
References
23 Multiobjective Optimization
23.1 Introduction
23.2 Multiobjective Evolutionary Algorithms
23.2.1 Nondominated Sorting Genetic Algorithm II
23.2.2 Strength Pareto Evolutionary Algorithm 2
23.2.3 Pareto Archived Evolution Strategy (PAES)
23.2.4 Pareto Envelope-Based Selection Algorithm
23.2.5 MOEA Based on Decomposition (MOEA/D)
23.2.6 Several MOEAs
23.2.7 Nondominated Sorting
23.2.8 Multiobjective Optimization Based on Differential Evolution
23.3 Performance Metrics
23.4 Many-Objective Optimization
23.4.1 Challenges in Many-Objective Optimization
23.4.2 Pareto-Based Algorithms
23.4.3 Decomposition-Based Algorithms
23.5 Multiobjective Immune Algorithms
23.6 Multiobjective PSO
23.7 Multiobjective EDAs
23.8 Tabu/Scatter Search Based Multiobjective Optimization
23.9 Other Methods
23.10 Coevolutionary MOEAs
References
Appendix A: Benchmarks
Index
ABC Artificial bee colony
AbYSS Archive-based hybrid scatter search
ACO Ant colony optimization
ADF Automatically defined function
AI Artificial intelligence
aiNet Artificial immune network
AIS Artificial immune system
BBO Biogeography-based optimization
BFA Bacterial foraging algorithm
BMOA Bayesian multiobjective optimization algorithm
CCEA Cooperative coevolutionary algorithm
CLONALG Clonal selection algorithm
CMA Covariance matrix adaptation
C-MOGA Cellular multiobjective GA
COMIT Combining optimizers with mutual information trees algorithm
COP Combinatorial optimization problem
CRO Chemical reaction optimization
CUDA Compute unified device architecture
DE Differential evolution
DEMO DE for multiobjective optimization
DMOPSO Dynamic population multiple-swarm multiobjective PSO
DNA Deoxyribonucleic acid
DOP Dynamic optimization problem
DSMOPSO Dynamic multiple swarms in multiobjective PSO
DT-MEDA Decision-tree-based multiobjective EDA
EA Evolutionary algorithms
EASEA Easy specification of EA
EBNA Estimation of Bayesian networks algorithm
EDA Estimation of distribution algorithm
EGNA Estimation of Gaussian networks algorithm
ELSA Evolutionary local selection algorithm
EPUS-PSO Efficient population utilization strategy for PSO
ES Evolution strategy
FDR-PSO Fitness-distance-ratio-based PSO
G3 Generalized generation gap
GEP Gene expression programming
GPU Graphics processing unit
HypE Hypervolume-based algorithm
IDCMA Immune dominance clonal multiobjective algorithm
IDEA Iterated density-estimation EA
IEC Interactive evolutionary computation
IMOEA Incrementing MOEA
IMOGA Incremental multiple-objective GA
LABS Low autocorrelation binary sequences
LCSS Longest common subsequence
LDWPSO Linearly decreasing weight PSO
LMI Linear matrix inequality
MCMC Markov chain Monte Carlo
meCGA Multiobjective extended compact GA
MIMD Multiple instruction multiple data
MIMIC Mutual information maximization for input clustering
MISA Multiobjective immune system algorithm
MOEA/D MOEA based on decomposition
MOGA Multiobjective GA
MOGLS Multiple-objective genetic local search
mohBOA Multiobjective hierarchical BOA
MOP Multiobjective optimization problem
moPGA Multiobjective parameterless GA
MPMO Multiple populations for multiple objectives
MST Minimum spanning tree
MTSP Multiple traveling salesmen problem
NetKeys Network random keys
NMR Nuclear magnetic resonance
NNIA Nondominated neighbor immune algorithm
NPGA Niched-Pareto GA
NSGA Nondominated sorting GA
opt-aiNet Optimized aiNet
PAES Pareto archived ES
PBIL Population-based incremental learning
PCB Printed circuit board
PCSEA Pareto corner search EA
PCX Parent-centric recombination
PICEA Preference-inspired coevolutionary algorithm
PIPE Probabilistic incremental program evolution
POLE Program optimization with linkage estimation
PSL Peak sidelobe level
PSO Particle swarm optimization
QAP Quadratic assignment problem
QSO Quantum swarm optimization
REDA Restricted Boltzmann machine-based multiobjective EDA
RM-MEDA Regularity model-based multiobjective EDA
SAGA Speciation adaptation GA
SAMC Stochastic approximation Monte Carlo
SDE Shift-based density estimation
SIMD Single instruction multiple data
SPEA Strength Pareto EA
SVLC Synapsing variable-length crossover
TLBO Teaching–learning-based optimization
TOPSIS Technique for order preference by similarity to an ideal solution
TSP Traveling salesman problem
TVAC Time-varying acceleration coefficients
UMDA Univariate marginal distribution algorithm
UNBLOX Uniform block crossover
VEGA Vector-evaluated GA
VIV Virtual virus
1 Introduction

This chapter introduces background material on global optimization and the concept of metaheuristics. Basic definitions of optimization, swarm intelligence, biological processes, evolution versus learning, and the no-free-lunch theorem are described. We hope this chapter will arouse your interest in reading the other chapters.
1.1 Computation Inspired by Nature
Artificial intelligence (AI) is an old discipline for making intelligent machines. Search is a key concept of AI, because it serves all disciplines. In general, the search spaces of practical problems are typically so large that enumerating them is impossible. This precludes the use of traditional calculus-based and enumeration-based methods. Computational intelligence paradigms were initiated for this purpose, and the approach mainly depends on the cooperation of agents.
Optimization is the process of searching for the optimal solution. The three search mechanisms are analytical, enumerative, and heuristic search techniques. Analytical search is calculus-based. The search algorithms may be guided by the gradient or the Hessian of the function, leading to a local minimum solution. Random search and enumeration are unguided search methods that simply enumerate the search space and exhaustively search for the optimal solution. Heuristic search is guided search that in most cases produces high-quality solutions.
Computational intelligence is a field of AI. It investigates adaptive mechanisms to facilitate intelligent behaviors in complex environments. Unlike AI, which relies on knowledge derived from human expertise, computational intelligence depends upon collected numerical data. It includes a set of nature-inspired computational paradigms. Major subjects in computational intelligence include neural networks for pattern recognition, fuzzy systems for reasoning under uncertainty, and evolutionary computation for stochastic optimization search.
Nature is the primary source of inspiration for new computational paradigms. For instance, Wiener's cybernetics was inspired by feedback control processes observable in biological systems. Changes in nature, from the microscopic scale to the ecological scale, can be treated as computations. Natural processes always reach an equilibrium that is optimal. Such analogies can be used for finding useful solutions for search and optimization. Examples of natural computing paradigms are artificial neural networks [43], simulated annealing (SA) [37], genetic algorithms [30], swarm intelligence [22], artificial immune systems [16], DNA-based molecular computing [1], quantum computing [28], membrane computing [51], and cellular automata (von Neumann 1966).
From bacteria to humans, biological entities have social interaction ranging from altruistic cooperation to conflict. Swarm intelligence borrows the idea of the collective behavior of biological populations. Cooperative problem-solving is an approach that achieves a certain goal through the cooperation of a group of autonomous entities. Cooperation mechanisms are common in agent-based computing paradigms, be they biologically based or not. Cooperative behavior has inspired research in biology, economics, and multi-agent systems. This approach is based on the notion of the payoffs associated with pursuing certain strategies.
Game theory studies situations of competition and cooperation between multiple parties. The discipline starts with von Neumann's study on zero-sum games [48]. It has many applications in strategic warfare, economic or social problems, animal behaviors, and political voting.
Evolutionary computation, DNA computing, and membrane computing are dependent on knowledge of the microscopic cell structure of life. Evolutionary computation evolves a population of individuals over generations, generates offspring by mutation and recombination, and selects the fittest to survive each generation. DNA computing and membrane computing are emerging computational paradigms at the molecular level.

Quantum computing is characterized by principles of quantum mechanics, combined with computational intelligence [46]. Quantum mechanics is a mathematical framework or set of rules for the construction of physical theories.
All effective formal behaviors can be simulated by Turing machines. For physical devices used for computational purposes, it is widely assumed that all physical machine behaviors can be simulated by Turing machines. When a computational model computes the same class of functions as the Turing machine, and potentially faster, it is called a super-Turing model. Hypercomputation refers to computation that goes beyond the Turing limit, in the sense of super-Turing computation. While Deutsch's (1985) universal quantum computer is a super-Turing model, it is not hypercomputational. The physicality of hypercomputational behavior is considered in [55] from first principles, by showing that quantum theory can be reformulated in a way that explains why physical behaviors can be regarded as computing something in the standard computational state-machine sense.
1.2 Biological Processes
Deoxyribonucleic acid (DNA) is the carrier of the genetic information of organisms. Nucleic acids are linear unbranched polymers, i.e., chain molecules, of nucleotides. Nucleotides are divided into purines (adenine, A; guanine, G) and pyrimidines (thymine, T; cytosine, C). The DNA is organized into a double helix structure. Complementary nucleotides (bases) are paired against each other: A and T, as well as G and C.

The DNA structure is shown in Figure 1.1. The double helix, composed of phosphate groups (triangles) and sugar components (squares), is the backbone of the DNA structure. The double helix is stabilized by two hydrogen bonds between A and T, and three hydrogen bonds between G and C.

A sequence of three nucleotides is a codon or triplet. With three exceptions, all 4^3 = 64 codons code one of 20 amino acids, and the synonyms code identical amino acids. Proteins are polypeptide chains consisting of the 20 amino acids. An amino acid consists of a carboxyl and an amino group, and amino acids differ in other groups that may also contain the hexagonal benzene molecule. The peptide bond of the long polypeptide chains forms between the amino group and the carboxyl group of the neighboring molecule. Proteins are the basic modules of all cells and are the actors of life processes. They build characteristic three-dimensional structures, e.g., the alpha helix molecule.
The human genome is about 3 billion base pairs long and specifies about 20,488 genes, arranged in 23 pairs of homologous chromosomes. All base pairs of the DNA from a single human cell have an overall length of 2.6 m when unraveled and stretched out, but are compressed in the nucleus to a size of 200 µm. Locations on these chromosomes are referred to as loci. A locus which has a specific function is known as a gene. The state of the genes is called the genotype, and the observable expression of the genotype is called the phenotype. A genetic marker is a locus with a known DNA sequence which can be found in each person in the general population.
The transformation from genotype to phenotype is called gene expression. In the transcription phase, the DNA is transcribed into RNA. In the translation phase, the RNA then synthesizes proteins.
Figure 1.1 The DNA structure.
Trang 22Figure 1.2 A gene on a
chromosome (Courtesy U.S.
Department of Energy,
Human Genome Program).
Figure 1.2 displays a chromosome, its DNA makeup, and identifies one gene. The genome directs the construction of a phenotype, especially because the genes specify sequences of amino acids which, when properly folded, become proteins. The phenotype contains the genome. It provides the environment necessary for survival, maintenance, and replication of the genome.
Heredity is relevant to information theory as a communication process [5]. The conservation of genomes over intervals at the geological timescale and the existence of mutations at shorter intervals can be reconciled by assuming that genomes possess intrinsic error-correction codes. The constraints incurred by DNA molecules result in a nested structure. Genomic codes resemble modern codes, such as low-density parity-check (LDPC) codes or turbocodes [5]. The high redundancy of genomes achieves good error-correction performance by simple means. At the same time, DNA is a cheap material.
In AI, some of the most important components comprise the processes of memory formation, filtering, and pattern recognition. In biological systems, as in the human brain, a model can be constructed of a network of neurons that fire signals with different time-sequence patterns for various input signals. The unit pulse is called an action potential, involving a depolarization of the cell membrane and the successive repolarization to the resting potential. The physical basis of this unit pulse is the active transport of ions by chemical pumps [29]. The learning process is achieved by taking into account the plasticity of the weights with which the neurons are connected to one another. In biological nervous systems, the input data are first processed locally and then sent to the central nervous system [33]. This preprocessing partly serves to avoid overburdening the central nervous system.
Connectionist systems (neural networks) are mainly based on a single brain-like connectionist principle of information processing, where learning and information exchange occur in the connections. In [36], the connectionist paradigm is extended to integrative connectionist learning systems that integrate in their structure and learning algorithms principles from different hierarchical levels of information processing in the brain, including the neuronal, genetic, and quantum levels. Spiking neural networks are used as a basic connectionist learning model.
1.3 Evolution Versus Learning
The adaptation of creatures to their environments results from the interaction of two processes, namely, evolution and learning. Evolution is a slow stochastic process at the population level that determines the basic structures of a species. Evolution operates on biological entities, rather than on the individuals themselves. At the other end, learning is a process of gradually improving an individual's adaptation capability to the environment by tuning the structure of the individual.

Evolution is based on the Darwinian model, also called the principle of natural selection or survival of the fittest, while learning is based on the connectionist model of the human brain. In the Darwinian evolution, knowledge acquired by an individual during its lifetime cannot be transferred into its genome and subsequently passed on to the next generation. Evolutionary algorithms (EAs) are stochastic search methods that employ a search technique based on the Darwinian model, whereas neural networks are learning methods based on the connectionist model.

Combinations of learning and evolution, embodied by evolving neural networks, have better adaptability to a dynamic environment [39,66]. Evolution and learning can interact in the form of the Lamarckian evolution or be based on the Baldwin effect. Both approaches use learning to accelerate evolution.
The Lamarckian strategy allows the inheritance of traits acquired during an individual's life into the genetic code, so that the offspring can inherit its characteristics. Everything an individual learns during its life is encoded back into the chromosome and remains in the population. Although the Lamarckian evolution is biologically implausible, EAs as artificial biological systems can benefit from the Lamarckian theory. Ideas and knowledge are passed from generation to generation, and the Lamarckian theory can be used to characterize the evolution of human cultures. The Lamarckian evolution has proved effective within computer applications. Nevertheless, the Lamarckian strategy has been pointed out to distort the population so that the schema theorem no longer applies [62].
The Baldwin effect is biologically more plausible. In the Baldwin effect, learning has an indirect influence; that is, learning makes individuals adapt better to their environments, thus increasing their reproduction probability. In effect, learning smoothes the fitness landscape and thus facilitates evolution [27]. On the other hand, learning has a cost; thus, there is evolutionary pressure to find instinctive replacements for learned behaviors. When a population evolves a new behavior, in the early phase there will be a selective pressure in favor of learning, and in the later phase there will be a selective pressure in favor of instinct. Strong bias is analogous to instinct, and weak bias is analogous to learning [60]. The Baldwin effect only alters the fitness landscape, and the basic evolutionary mechanism remains purely Darwinian. Thus, the schema theorem still applies to the Baldwin effect [59].
A parent cannot pass its learned traits to its offspring; instead, only the fitness after learning is retained. In other words, the learned behaviors become instinctive behaviors in subsequent generations, and there is no direct alteration of the genotype. The acquired traits finally come under direct genetic control after many generations, namely, genetic assimilation. The Baldwin effect is purely Darwinian, not Lamarckian, in its mechanism, although it has consequences that are similar to those of the Lamarckian evolution [59]. A computational model of the Baldwin effect is presented in [27].
Hybridization of EAs and local search can be based either on the Lamarckian strategy or on the Baldwin effect. Local search corresponds to the phenotypic plasticity in biological evolution. The hybrid methods based on the Lamarckian strategy and the Baldwin effect are very successful, with numerous implementations.
1.4 Swarm Intelligence
The definition of swarm intelligence was introduced in 1989, in the context of cellular robotic systems [6]. Swarm intelligence is the collective intelligence of groups of simple agents [8]. Swarm intelligence deals with the collective behaviors of decentralized and self-organized swarms, which result from the local interactions of individual components with one another and with their environment [8]. Although there is normally no centralized control structure dictating how individual agents should behave, local interactions among such agents often lead to the emergence of global behavior.
Most species of animals show social behaviors. Biological entities often engage in a rich repertoire of social interaction that can range from altruistic cooperation to open conflict. Well-known examples of swarms are bird flocks, herds of quadrupeds, bacteria molds, fish schools among vertebrates, and colonies of social insects such as termites, ants, bees, and cockroaches, which perform collective behavior. Through flocking, individuals gain a number of advantages, such as reduced chances of being captured by predators, following migration routes in a precise and robust way through collective sensing, improved energy efficiency during travel, and the opportunity of mating.
The concept of individual–organization [57] has been widely used to understand the collective behavior of animals. The principle of individual–organization indicates that simple repeated interactions between individuals can produce complex behavioral patterns at the group level [57]. The agents of these swarms behave without supervision, and each of these agents has a stochastic behavior due to its perception of, and also influence on, the neighborhood and the environment. The behaviors can be accurately described in terms of individuals following simple sets of rules. The existence of collective memory in animal groups [15] establishes that the previous history of the group structure influences the collective behavior in future stages.

Grouping individuals often have to make rapid decisions about where to move or what behavior to perform, in uncertain or dangerous environments. Groups are often composed of individuals that differ with respect to their informational status, and individuals are usually not aware of the informational state of others. Some animal groups are based on a hierarchical structure according to a fitness principle known as dominance. The top member of the group leads all members of that group, e.g., in the cases of lions, monkeys, and deer. Such animal behaviors lead to stable groups with better cohesion properties among individuals [9]. Some animals, like birds, fishes, and sheep, live in groups but have no leader. This type of animal has no knowledge about its group and environment. Instead, they can move in the environment by exchanging data with their adjacent members.
Different swarm intelligence systems have inspired several approaches, including particle swarm optimization (PSO) [21], based on the movement of bird flocks and fish schools; the immune algorithm, based on the immune systems of mammals; bacterial foraging optimization [50], which models the chemotactic behavior of Escherichia coli; ant colony optimization (ACO) [17], inspired by the foraging behavior of ants; and artificial bee colony (ABC) [35], based on the foraging behavior of honeybee swarms. Unlike EAs, which are primarily competitive among the population, PSO and ACO adopt a more cooperative strategy. They can be treated as ontogenetic, since the population resembles a multicellular organism optimizing its performance by adapting to its environment.
Many population-based metaheuristics are actually social algorithms. The cultural algorithm [53] was introduced for modeling social evolution and learning. Ant colony optimization is a metaheuristic inspired by ant colony behavior in finding the shortest path to food sources. Particle swarm optimization is inspired by the social behavior and movement dynamics of insect swarms, bird flocking, and fish schooling. The artificial immune system is inspired by biological immune systems, and exploits their characteristics of learning and memory to solve optimization problems. The society and civilization method [52] utilizes the intra- and intersociety interactions within a society and the civilization model.
1.4.1 Group Behaviors
In animal behavioral ecology, group living is a widespread phenomenon. Animal search behavior is an active movement by which an animal attempts to find resources such as food, mates, oviposition, or nesting sites. In nature, group members often have different search and competitive abilities. Subordinates, who are less efficient foragers than the dominant, will be dispersed from the group. Dispersed animals may adopt ranging behavior to explore and colonize new habitats.
Group search usually adopts two foraging strategies within the group: producing (searching for food) and joining (scrounging). Joining is a ubiquitous trait found in most social animals such as birds, fish, spiders, and lions. In order to analyze the optimal policy for joining, two models for joining are information-sharing [13] and producer–scrounger [4]. The information-sharing model assumes that foragers search concurrently for their own resource while searching for opportunities to join. In the producer–scrounger model, foragers are assumed to use producing (finding) or joining (scrounging) strategies exclusively; they are divided into leaders and followers. For the joining policy of ground-feeding birds, the producer–scrounger model is more plausible than the information-sharing model. In the producer–scrounger model, three basic scrounging strategies are observed in house sparrows (Passer domesticus): area copying, moving across to search in the immediate area around the producer; following, trailing another animal around without exhibiting any searching behavior; and snatching, taking a resource directly from the producer.

The organization of collective behaviors in social insects can be understood as a combination of the four functions of organization: coordination, cooperation, deliberation, and collaboration [3]. The coordination function regulates the spatio-temporal density of individuals, while the collaboration function regulates the allocation of their activities. The deliberation function represents the mechanisms that support the decisions of the colony, while the cooperation function represents the mechanisms that overstep the limitations of the individuals. Together, the four functions of organization produce solutions to the colony's problems.
The general cooperative group behaviors, search strategies, and communication methods extracted from these systems are useful within a computing context [3].

• Cooperation and group behavior
Cooperation among individuals of the same or different species must benefit the cooperators, whether directly or indirectly. Socially, the group may be individuals working together for mutual benefit, or individuals each with their own specialized role. Competition for the available resources may restrict the size of the group.
• Search strategies
The success of a species depends on many factors, including its ability to search effectively for resources, such as food and water, in a given environment. Search strategies can be broadly divided into sit-and-wait (for ambushers) and foraging widely (for active searchers). Compared to the latter, the former has a lower opportunity to get food, but also a lower energy consumption.
Interactions between different species can be observed as symbiosis, host–parasite systems, and prey–predator systems, in which two organisms mutually support each other, one exploits the other, or they fight against each other. For instance, symbiosis between plants and fungi is very common, where the fungus invades and lives among the cortex cells of the secondary roots and, in turn, helps the host plant absorb minerals from the soil. Cleaning symbiosis is common in fish.
1.4.2 Foraging Theory
Natural selection has a tendency to eliminate animals having poor foraging strategies and to favor the ones with successful foraging strategies to propagate their genes. After many generations, poor foraging strategies are either eliminated or shaped into good ones, and many groups of animals cooperatively forage.

Some animals forage as individuals and others forage as groups with a type of collective intelligence. While an animal needs communication capabilities to perform social foraging, it can exploit essentially the sensing capabilities of the group. The group can catch large prey, and individuals can obtain protection from predators while in a group.
In general, a foraging strategy involves finding a patch of food, deciding whether to proceed and search for food, and deciding when to leave the patch. There are predators and risks, energy required for travel, and physiological constraints (sensing, memory, cognitive capabilities). Foraging scenarios can be modeled, and optimal policies can be found, using dynamic programming. Search and optimal foraging decision-making of animals can be one of three basic types: cruise (e.g., tuna fish and hawks), saltatory (e.g., birds, fish, lizards, and insects), and ambush (e.g., snakes and lions). In cruise search, an animal searches the perimeter of a region; in an ambush, it sits and waits; in saltatory search, an animal typically moves in some direction, stops or slows down, looks around, and then changes direction, over a whole region.
1.5 Heuristics, Metaheuristics, and Hyper-Heuristics
Many real-life optimization problems are difficult to solve by exact optimization methods, due to properties such as high dimensionality, multimodality, epistasis (parameter interaction), and nondifferentiability. Hence, approximate algorithms are an alternative approach for these problems. Approximate algorithms can be decomposed into heuristics and metaheuristics. The words meta and heuristic both have their origin in old Greek: meta means upper level, and heuristic denotes the art of discovering new strategies [58].
Heuristic refers to experience-based techniques for problem-solving and learning. A heuristic gives a satisfactory solution in a reasonable amount of computational time, but the solution may not be optimal. Specific heuristics are problem-dependent and designed only for the solution of a particular problem. Examples of this method include using a rule of thumb, an educated guess, an intuitive judgment, or even common sense. Many algorithms, either exact algorithms or approximation algorithms, are heuristics.

The term metaheuristic was coined by Glover in 1986 [25] to refer to a set of methodologies conceptually ranked above heuristics in the sense that they guide the design of heuristics. A metaheuristic is a higher-level procedure or heuristic designed to find, generate, or select a lower-level procedure or heuristic (partial search algorithm) that may provide a sufficiently good solution to an optimization
problem. By searching over a large set of feasible solutions, metaheuristics can often find good solutions with less computational effort than calculus-based methods or simple heuristics can.
Metaheuristics can be single-solution-based or population-based. Single-solution-based metaheuristics are based on a single solution at any time and comprise local search-based metaheuristics such as SA, tabu search, iterated local search [40,42], guided local search [61], pattern search or random search [31], the Solis–Wets algorithm [54], and variable neighborhood search [45]. In population-based metaheuristics, a number of solutions are updated iteratively until the termination condition is satisfied. Population-based metaheuristics are generally categorized into EAs and swarm-based algorithms. Single-solution-based metaheuristics are regarded as more exploitation-oriented, whereas population-based metaheuristics are more exploration-oriented.
The idea of hyper-heuristics can be traced back to the early 1960s [23]. Hyper-heuristics can be thought of as heuristics to choose heuristics, or as search algorithms that explore the space of problem solvers. A hyper-heuristic is a heuristic search method that seeks to automate the process of selecting, combining, generating, or adapting several simpler heuristics to efficiently solve hard search problems. The low-level heuristics are simple local search operators or domain-dependent heuristics, which operate directly on the solution space for a given problem instance. Unlike metaheuristics, which search in a space of problem solutions, hyper-heuristics always search in a space of low-level heuristics.
Heuristic selection and heuristic generation are currently the two main methodologies in hyper-heuristics. In the first method, the hyper-heuristic chooses heuristics from a set of known domain-dependent low-level heuristics. In the second method, the hyper-heuristic evolves new low-level heuristics by utilizing the components of the existing ones. Hyper-heuristics can be based on genetic programming [11] or grammatical evolution [10], which are excellent candidates for heuristic generation.
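To make the heuristic-selection methodology concrete, here is a minimal sketch in Python. It is an illustration only, not an algorithm from this book: the roulette-wheel credit scores, the improving-moves-only acceptance rule, and the toy move operators are all simplifying assumptions.

```python
import random

def selection_hyper_heuristic(solution, f, low_level_heuristics, iters=1000):
    """Pick a low-level heuristic in proportion to its past success
    (roulette-wheel scores) and accept only improving candidates."""
    scores = [1.0] * len(low_level_heuristics)   # uniform initial credit
    for _ in range(iters):
        i = random.choices(range(len(low_level_heuristics)), weights=scores)[0]
        candidate = low_level_heuristics[i](solution)
        if f(candidate) < f(solution):           # minimization
            solution = candidate
            scores[i] += 1.0                     # reward the chosen heuristic
    return solution

# Toy usage: two hypothetical low-level move operators on a real vector.
f = lambda x: sum(v * v for v in x)
moves = [
    lambda x: [v + random.uniform(-1, 1) for v in x],   # large random step
    lambda x: [v * 0.9 for v in x],                     # shrink toward origin
]
print(selection_hyper_heuristic([5.0, -3.0], f, moves))
```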
Several Single-Solution-Based Metaheuristics
Search strategies that randomly generate initial solutions and perform a local search are also called multi-start descent search methods. However, randomly creating an initial solution and performing a local search often results in low solution quality, as the complete search space is searched uniformly and the search cannot focus on promising areas of the search space.
Variable neighborhood search [45] combines local search strategies with dynamic neighborhood structures subject to the search progress. The local search is an intensification step focusing the search in the direction of high-quality solutions. Diversification is a result of changing neighborhoods. By changing neighborhoods, the method can easily escape from local optima. With an increasing cardinality of the neighborhoods, diversification gets stronger, as the shaking steps can choose from a larger set of solutions and local search covers a larger area of the search space.

Guided local search [61] uses a similar principle and dynamically changes the fitness landscape subject to the progress made during the search, so that local search can escape from local optima. The neighborhood structure remains constant. The method starts from a random solution x0 and performs a local search returning the local optimum x1. To escape the local optimum, a penalty is added to the fitness function f such that the resulting fitness function h allows local search to escape. A new local search is started from x1 using the modified fitness function h. Search continues until a termination criterion is met.
Iterated local search [40,42] connects the otherwise unrelated local search phases, as it creates initial solutions not randomly but based on solutions found in previous local search runs. If the perturbation steps are too small, the search cannot escape from a local optimum. If perturbation is too strong, the search shows the same behavior as multi-start descent search methods. The modification step as well as the acceptance criterion can depend on the search history.
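The interplay of perturbation and local search can be illustrated with a minimal Python sketch. The hill-climbing local search, the random-kick perturbation, and the elitist acceptance criterion used here are assumptions made for the example, not prescriptions from the text.

```python
import random

def hill_climb(f, x, step=0.1, iters=200):
    """A simple local search: accept random coordinate-wise moves that improve f."""
    for _ in range(iters):
        cand = [xi + random.uniform(-step, step) for xi in x]
        if f(cand) < f(x):
            x = cand
    return x

def perturb(x, strength=1.5):
    """Random kick; it must be strong enough to leave the current basin."""
    return [xi + random.uniform(-strength, strength) for xi in x]

def iterated_local_search(f, x0, restarts=50):
    best = hill_climb(f, x0)
    for _ in range(restarts):
        candidate = hill_climb(f, perturb(best))
        if f(candidate) < f(best):      # elitist acceptance criterion
            best = candidate
    return best

# Example on a 1D multimodal function with local minima near x = -2 and x = 2.
f = lambda x: (x[0] ** 2 - 4) ** 2 + x[0]
print(iterated_local_search(f, [5.0]))
```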
1.6 Optimization
Optimization can generally be categorized into discrete or continuous optimization, depending on whether the variables are discrete or continuous. There may be limits or constraints on the variables. Optimization can be a static or a dynamic problem, depending upon whether the output is a function of time. Traditionally, optimization is solved by calculus-based methods, or based on random search or enumerative search. Heuristics-based optimization is the topic treated in this book.

Optimization techniques can generally be divided into derivative methods and nonderivative methods, depending on whether or not derivatives of the objective function are required for the calculation of the optimum. Derivative methods are calculus-based methods, which can be either gradient search methods or second-order methods. These methods are local optimizers. Gradient descent is also known as steepest descent. It searches for a local minimum by taking steps along the negative direction of the gradient of the function. Examples of second-order methods are Newton's method, the Gauss–Newton method, quasi-Newton methods, the trust-region method, and the Levenberg–Marquardt method. Conjugate gradient and natural gradient methods can also be viewed as reduced forms of the quasi-Newton method.
Derivative methods can also be classified into model-based and metric-based methods. Model-based methods improve the current point by a local approximating model. Newton and quasi-Newton methods are model-based methods. Metric-based methods perform a transformation of the variables and then apply a gradient search method to improve the point. The steepest-descent, quasi-Newton, and conjugate gradient methods belong to this latter category.
Methods that do not require gradient information to perform a search and that sequentially explore the solution space are called direct search methods. They maintain a group of points. They utilize some sort of deterministic exploration method to search the space, and almost always utilize a greedy method to update the maintained group.
Hill climbing iteratively improves a single solution and terminates when it reaches a local optimum. When operating on continuous space, it is called gradient ascent. Other nonderivative search methods include univariate search parallel to an axis (i.e., the coordinate search method), the sequential simplex method, and acceleration methods in direct search such as the Hooke–Jeeves method, Powell's method, and Rosenbrock's method. Interior-point methods represent state-of-the-art techniques for solving linear, quadratic, and nonlinear optimization programs.
Example 1.1: The Rosenbrock function is given by

f(\mathbf{x}) = \sum_{i=1}^{n-1} \left[ 100 \left( x_{i+1} - x_i^2 \right)^2 + \left( x_i - 1 \right)^2 \right].

We restrict our attention to the two-dimensional case (n = 2), with x1, x2 ∈ [−204.8, 204.8]. The landscape of this function is shown in Figure 1.3.

Figure 1.3 The landscape of the Rosenbrock function f(x) with two variables x1, x2 ∈ [−204.8, 204.8]. The spacing of the grid is set as 1. There are many local minima, and the global minimum f(x) = 0 is located at (1, 1).
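For readers who want to experiment, a plain NumPy definition of the Rosenbrock function is sketched below. This snippet is an independent illustration, not one of the MATLAB codes mentioned in the preface.

```python
import numpy as np

def rosenbrock(x):
    """Rosenbrock function; the global minimum f = 0 is at x = (1, ..., 1)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2
                        + (x[:-1] - 1.0) ** 2))

print(rosenbrock([1.0, 1.0]))   # 0.0, the global minimum for n = 2
print(rosenbrock([0.0, 0.0]))   # 1.0
```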
1.6.1 Lagrange Multiplier Method
The Lagrange multiplier method can be used to analytically solve continuous function optimization problems subject to equality constraints [24]. By introducing the Lagrangian formulation, the dual problem associated with the primal problem is obtained, based on which the optimal values of the Lagrange multipliers can be found.
Let f(\mathbf{x}) be the objective function and h_i(\mathbf{x}) = 0, i = 1, \ldots, m, be the constraints. The Lagrange function can be constructed as

L(\mathbf{x}; \lambda_1, \ldots, \lambda_m) = f(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i h_i(\mathbf{x}),    (1.1)

where \lambda_i, i = 1, \ldots, m, are called the Lagrange multipliers.
The constrained optimization problem is converted into an unconstrained optimization problem: optimize L(\mathbf{x}; \lambda_1, \ldots, \lambda_m). By setting

\frac{\partial}{\partial \mathbf{x}} L(\mathbf{x}; \lambda_1, \ldots, \lambda_m) = \mathbf{0},    (1.2)

\frac{\partial}{\partial \lambda_i} L(\mathbf{x}; \lambda_1, \ldots, \lambda_m) = 0, \quad i = 1, \ldots, m,    (1.3)

and solving the resulting set of equations, we can obtain the position \mathbf{x} at the extremum of f(\mathbf{x}) under the constraints.
To deal with inequality constraints, the Karush–Kuhn–Tucker (KKT) theorem, as a generalization of the Lagrange multiplier method, introduces a slack variable into each inequality constraint before applying the Lagrange multiplier method. The conditions derived from the procedure are known as the KKT conditions [24].
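A standard textbook illustration (not taken from this chapter) makes the procedure concrete: minimize f(\mathbf{x}) = x_1^2 + x_2^2 subject to h(\mathbf{x}) = x_1 + x_2 - 1 = 0.

```latex
\begin{align*}
  L(\mathbf{x}; \lambda) &= x_1^2 + x_2^2 + \lambda\,(x_1 + x_2 - 1),\\
  \frac{\partial L}{\partial x_1} &= 2x_1 + \lambda = 0, \qquad
  \frac{\partial L}{\partial x_2} = 2x_2 + \lambda = 0,\\
  \frac{\partial L}{\partial \lambda} &= x_1 + x_2 - 1 = 0
  \;\Longrightarrow\; x_1 = x_2 = \tfrac{1}{2}, \quad \lambda = -1,
\end{align*}
```

giving the constrained minimum f(1/2, 1/2) = 1/2.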
1.6.2 Direction-Based Search and Simplex Search
In direct search, generally the gradient information cannot be obtained; thus, it is impractical to implement a step in the negative gradient direction for a minimization problem. However, when the objectives of a group of solutions are available, the best one can guide the search direction of the other solutions. Many direction-based search methods and EAs are inspired by this intuitive idea.
Some of the direct search methods use improvement direction information to search the objective space. Thus, it is useful to embed these directions into an EA as either a local search method or an exploration operator.
Simplex search [47], introduced by Nelder and Mead in 1965, is a well-known deterministic direction-based search method. MATLAB contains a direct search toolbox based on simplex search. Scatter search [26] includes an elitism mechanism in simplex search. Like simplex search, for a group of points, the algorithm finds new points, accepts the better ones, and discards the worse ones. Differential evolution (DE) [56] uses directional information from the current population. The mutation operator of DE needs three randomly selected different individuals from the current population for each individual to form a simplex-like triangle.

Simplex Search
Simplex search is a group-based deterministic local search method capable of exploring the objective space very fast. Thus, many EAs use simplex search as a local search method after mutation.

A simplex is a collection of n + 1 points in n-dimensional space. In an optimization problem involving n variables, the simplex method searches for an optimal solution by evaluating a set of n + 1 points. The method continuously forms new simplices by replacing the point having the worst performance in a simplex with a new point. The new point is generated by reflection, expansion, and contraction operations.
In a multidimensional space, the subtraction of two vectors gives a new vector starting at one vector and ending at the other, like x2 − x1. We often refer to the subtraction of two vectors as a direction. Addition of two vectors can be implemented in a triangular way, moving the start of one vector to the end of the other to form another vector. The expression x3 + (x2 − x1) can be regarded as the destination of a moving point that starts at x3 and has the length and direction of x2 − x1.
For every new simplex, several points are assigned according to their objective values. Then simplex search repeats reflection, expansion, contraction, and shrinking in a very efficient and deterministic way. Vertices of the simplex move toward the optimal point, and the simplex becomes smaller and smaller. Stop criteria can be selected as a predetermined maximal number of iterations, the length of an edge, or the improving rate of the best vertex.
Simplex search for minimization is shown in Algorithm 1.1. The coefficients for the reflection, expansion, contraction, and shrinking operations are typically selected as α = 1, β = 2, γ = −1/2, and δ = 1/2. The initial simplex is important: the search may easily get stuck for too small an initial simplex. This simplex should be selected depending on the nature of the problem.

Algorithm 1.1 (Simplex search for minimization).
Repeat:
a. Find the worst and best individuals as x_h and x_l. Calculate the centroid of all x_i's, i ≠ h, as x̄.
b. Enter reflection mode, applying expansion, contraction, or shrinking as needed;
until the termination condition is satisfied.
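To make the operations concrete, here is a minimal NumPy sketch of the trial points generated from one simplex, using the typical coefficients above (the example simplex is arbitrary; this is a sketch, not the book's Algorithm 1.1 itself):

import numpy as np

alpha, beta, gamma, delta = 1.0, 2.0, -0.5, 0.5     # typical coefficients

def trial_points(simplex, worst):
    """Reflection, expansion, and contraction points for the worst vertex.

    simplex: (n+1, n) array of vertices; worst: index of the worst vertex.
    Each trial point has the form c + coef * (c - x_w), where c is the
    centroid of the remaining n vertices and x_w the worst vertex.
    """
    others = np.delete(simplex, worst, axis=0)
    c = others.mean(axis=0)                 # centroid excluding the worst vertex
    xw = simplex[worst]
    reflection = c + alpha * (c - xw)
    expansion = c + beta * (c - xw)
    contraction = c + gamma * (c - xw)      # gamma < 0: contraction toward x_w
    return reflection, expansion, contraction

# Shrinking moves every vertex toward the best vertex x_l:
# x_i <- x_l + delta * (x_i - x_l).
simplex = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(trial_points(simplex, worst=2))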
1.6.3 Discrete Optimization Problems
The discrete optimization problem is also known as the combinatorial optimization problem (COP). Any problem that has a large set of discrete solutions and a cost function for rating those solutions relative to one another is a COP. COPs are known to be NP-complete, i.e., nondeterministic polynomial-time complete. The goal for COPs is to find an optimal solution or, sometimes, a nearly optimal solution. In COPs, the number of solutions grows exponentially with the problem size n, at O(n!) or O(e^n), so that no algorithm can find the global minimum solution in polynomial computational time.
Definition 1.1 (Discrete optimization problem) A discrete optimization problem is denoted as (X, f, Ω), or as minimizing the objective function

min f(x), x ∈ X, subject to Ω, (1.4)
where X ⊂ R^N is the search space defined over a finite set of N discrete decision variables x = (x1, x2, …, xN)^T, f : X → R is the objective function, and Ω is the set of constraints on x. The space X is constructed according to all the constraints imposed on the problem.
Definition 1.2 (Feasible solution) A vector x that satisfies the set of constraints for
an optimization problem is called a feasible solution.
The traveling salesman problem (TSP) is perhaps the most famous COP. Given a set of points, either nodes on a graph or cities on a map, find the shortest possible tour that visits every point exactly once and then returns to its starting point. There are (n − 1)!/2 possible tours for an n-city TSP. TSP arises in numerous applications, from routing of wires on a printed circuit board (PCB) and VLSI circuit design to fast food delivery.
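A brute-force sketch makes the factorial growth tangible. It assumes NumPy and hypothetical random city coordinates, fixes the starting city to avoid counting rotations of the same tour, and is practical only for very small n:

import itertools
import numpy as np

rng = np.random.default_rng(0)
cities = rng.random((8, 2))        # 8 hypothetical cities in the unit square

def tour_length(order):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(np.linalg.norm(cities[order[i]] - cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

# Fix city 0 as the start; 7! = 5040 tours are examined.
best = min(((0,) + p for p in itertools.permutations(range(1, len(cities)))),
           key=tour_length)
print(best, tour_length(best))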
The multiple traveling salesmen problem (MTSP) generalizes TSP to more than one salesman. Given a set of cities and a depot, m salesmen must visit all cities subject to the constraints that the route formed by each salesman must start and end at the depot, that each intermediate city must be visited exactly once and by a single salesman, and that the cost of the routes must be minimum. TSP with a time window is a variant of TSP in which each city is visited within a given time window.

The vehicle routing problem concerns the transport of items between depots and customers by means of a fleet of vehicles. It can be used for logistics and public services, such as milk delivery, mail or parcel pick-up and delivery, school bus routing, solid waste collection, dial-a-ride systems, and job scheduling. Two well-known routing problems are TSP and MTSP.
The location-allocation problem is defined as follows. Given a set of facilities, each of which serves a certain number of nodes on a graph, the objective is to place the facilities on the graph so that the average distance between each node and its serving facility is minimized.
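The objective itself is simple to evaluate. A NumPy sketch with hypothetical random coordinates, scoring one candidate placement by the average node-to-nearest-facility distance:

import numpy as np

rng = np.random.default_rng(1)
nodes = rng.random((50, 2))          # hypothetical demand nodes in the unit square
facilities = rng.random((4, 2))      # a candidate placement of 4 facilities

def avg_service_distance(nodes, facilities):
    """Average distance from each node to its nearest facility."""
    d = np.linalg.norm(nodes[:, None, :] - facilities[None, :, :], axis=2)
    return d.min(axis=1).mean()      # each node is served by its nearest facility

print(avg_service_distance(nodes, facilities))   # the value to be minimized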
1.6.4 P, NP, NP-Hard, and NP-Complete
An issue related to the efficiency and efficacy of an algorithm is how hard the problem itself is. To analyze its hardness, the optimization problem is first transformed into a decision problem.
Problems that can be solved using a polynomial-time algorithm are tractable. A polynomial-time algorithm has an upper bound O(n^k) on its running time, where k is a constant and n is the problem size (input size). Usually, tractable problems are easy to solve, as the running time increases relatively slowly with n. In contrast, problems are intractable if they cannot be solved by a polynomial-time algorithm and there is a lower bound Ω(k^n) on the running time, where k > 1 is a constant and n is the input size.
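The gap between the two bounds is easy to see numerically; compare, say, a polynomial n^3 with an exponential 2^n:

# Growth of a polynomial bound n^3 versus an exponential bound 2^n.
for n in (10, 20, 40, 80):
    print(n, n**3, 2**n)
# n^3 grows from 1,000 to 512,000, while 2^n explodes from about 1e3 to 1.2e24.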
The complexity class P (standing for polynomial time complexity) is defined as the set of decision problems that can be solved by a deterministic Turing machine using an algorithm with worst-case polynomial time complexity. P problems are usually easy, as there are algorithms that solve them in polynomial time.
The class NP (standing for nondeterministic polynomial time complexity) is the set of all decision problems that can be verified by a nondeterministic Turing machine using a nondeterministic algorithm in worst-case polynomial time. Although nondeterministic algorithms cannot be executed directly on conventional computers, this concept is important and helpful for the analysis of the computational complexity of problems. All problems in P also belong to the class NP, i.e., P ⊆ NP. There are also problems whose correct solutions cannot be verified in polynomial time.

All decision problems in P are tractable. Those problems that are in NP but not in P are difficult, as no polynomial-time algorithms exist for them. There are problems in NP for which no polynomial algorithm is available and which can be transformed into one another with polynomial effort. A problem is said to be NP-hard if every problem in NP is polynomial-time reducible to it; that is, an algorithm for solving the NP-hard problem can be used to solve any problem in NP with only polynomial overhead. Therefore, NP-hard problems are at least as hard as any other problem in NP, and are not necessarily themselves in NP.
The set of NP-complete problems is a subset of NP [14]. A decision problem A is said to be NP-complete if A is in NP and A is also NP-hard. NP-complete problems are the hardest problems in NP, and they are all polynomial-time reducible to one another, so they share the same complexity. They are difficult, as no polynomial-time algorithms are known for them. Decision problems that are not in NP are even more difficult. The relationship between all these classes is illustrated in Figure 1.4.
Practical COPs are generally NP-complete or NP-hard. At present, no algorithm with polynomial time complexity can guarantee that an optimal solution will be found.
1.6.5 Multiobjective Optimization Problem
A multiobjective optimization problem (MOP) requires finding a variable vector x in the domain X that optimizes the objective vector f(x).
Definition 1.3 (Multiobjective optimization problem) An MOP is to optimize a system with k conflicting objectives, i.e., to find x ∈ X that optimizes the objective vector f(x) = (f1(x), …, fk(x))^T.

Objectives conflict when increasing the quality of one objective tends to simultaneously decrease the quality of another objective. The solution to an MOP is therefore not a single optimal solution, but a set of solutions representing the best trade-offs among the objectives.
In order to optimize a system with conflicting objectives, the weighted sum of these objectives is usually used as the compromise for the system.
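A common form of this scalarization (the weights w_i, chosen by the decision maker, encode the relative importance of the objectives) is

\min_{x \in \mathcal{X}} \; F(x) = \sum_{i=1}^{k} w_i f_i(x),
\qquad w_i \ge 0, \quad \sum_{i=1}^{k} w_i = 1.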
The Pareto method is a popular approach to multiobjective optimization. It is based on the principle of nondominance. The Pareto optimum gives a set of solutions for which there is no way of improving one criterion without deteriorating another criterion. In MOPs, the concept of dominance provides a means by which multiple solutions can be compared and subsequently ranked.
Definition 1.4 (Pareto dominance) A variable vector x1 ∈ R^n is said to dominate another vector x2 ∈ R^n, denoted x1 ≺ x2, if and only if x1 is better than or equal to x2 in all attributes, and strictly better in at least one attribute, i.e., ∀i: f_i(x1) ≥ f_i(x2) ∧ ∃j: f_j(x1) > f_j(x2).
For two solutions x1 and x2, if x1 is better than x2 in all objectives, x1 is said to strongly dominate x2. If x1 is not worse than x2 in all objectives and better in at least one objective, x1 is said to dominate x2. A nondominated set is a set of solutions that are not weakly dominated by any other solution in the set.
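Pareto dominance translates directly into code. A sketch assuming maximization, to match the ≥ convention of Definition 1.4 (the example objective vectors are arbitrary):

def dominates(f1, f2):
    """True if objective vector f1 Pareto-dominates f2 (maximization)."""
    return (all(a >= b for a, b in zip(f1, f2))
            and any(a > b for a, b in zip(f1, f2)))

def nondominated(front):
    """Filter a list of objective vectors down to its nondominated set."""
    return [f for f in front
            if not any(dominates(g, f) for g in front if g is not f)]

# (3, 4) and (2, 5) are mutually nondominated; the rest are dominated.
print(nondominated([(3, 4), (2, 5), (2, 4), (1, 1)]))   # -> [(3, 4), (2, 5)]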
Definition 1.5 (Nondominance) A variable vector x1 ∈ X ⊂ R^n is nondominated with respect to X if there does not exist another vector x2 ∈ X such that x2 ≺ x1.
Definition 1.6 (Pareto optimality) A variable vector x∗ ∈ F ⊂ R^n (F is the feasible region) is Pareto optimal if it is nondominated with respect to F.

Definition 1.7 (Pareto optimal frontier) The Pareto optimal frontier P∗ is the set in R^n formed by all Pareto optimal solutions: P∗ = {x ∈ F | x is nondominated with respect to F}.

Definition 1.8 (Pareto front) The Pareto front PF∗ is the image of the Pareto optimal frontier in the objective space: PF∗ = {f(x) | x ∈ P∗}.

Multiobjective optimization aims to find a set of Pareto optimal solutions that is as diverse as possible, so that no regions are left unexplored.
An illustration of Pareto optimal solutions for a two-dimensional problem with two objectives is given in Figure 1.5. The upper border from points A to B of the domain X, denoted P∗, contains all Pareto optimal solutions. The frontier from points f_A to f_B along the lower border of the domain Y, denoted PF∗, contains the entire Pareto frontier in the objective space. For two points a and b, the mapping f_a dominates f_b, denoted f_a ≺ f_b; hence, the decision vector x_a is a nondominated solution. Figure 1.6 illustrates that Pareto fronts can be convex, concave, or discontinuous.

Figure 1.5 An illustration of Pareto optimal solutions for a two-dimensional problem with two objectives. X ⊂ R^n is the domain of x, and Y ⊂ R^m is the domain of f(x).

Figure 1.6 Different Pareto fronts: a convex, b concave, c discontinuous.
Definition 1.9 (ε-dominance) A variable vector x1 ∈ R^n is said to ε-dominate another vector x2 ∈ R^n, denoted x1 ≺_ε x2, if and only if x1 is better than or equal to εx2 in all attributes, and strictly better in at least one attribute, i.e., ∀i: f_i(x1) ≥ f_i(εx2) ∧ ∃j: f_j(x1) > f_j(εx2) [69].
If ε = 1, ε-dominance is the same as Pareto dominance; otherwise, the area dominated by x1 is enlarged or shrunk. Thus, ε-dominance relaxes the area of Pareto dominance by a factor of ε.
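Only the comparison changes relative to plain Pareto dominance. A sketch that applies the ε scaling to the objective values of x2 (an assumed reading of the definition above; maximization as before):

def eps_dominates(f1, f2, eps=1.0):
    """True if f1 epsilon-dominates f2: f1 is compared against eps-scaled f2."""
    scaled = [eps * b for b in f2]
    return (all(a >= b for a, b in zip(f1, scaled))
            and any(a > b for a, b in zip(f1, scaled)))

# eps = 1 recovers Pareto dominance; eps < 1 enlarges the dominated area here.
print(eps_dominates((3, 4), (3, 5), eps=0.8))   # True: (3, 4) vs (2.4, 4.0)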
1.6.6 Robust Optimization
The robustness of a particular solution can be confirmed by resampling or by reusing neighborhood solutions. Resampling is reliable but computationally expensive. In contrast, the method of reusing neighborhood solutions is cheap but unreliable; a confidence measure increases the reliability of the latter method. In [44], confidence-based operators are defined for robust metaheuristics: the confidence metric and five confidence-based operators are employed to design a confidence-based robust PSO and a confidence-based robust GA. History can be utilized to help estimate the expected fitness of an individual, producing more robust solutions in EAs.
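A sketch of the resampling approach (a hypothetical noisy setting; NumPy assumed): each candidate is re-evaluated m times under random perturbations, and the mean serves as its robust fitness:

import numpy as np

rng = np.random.default_rng(2)

def f(x):
    """A hypothetical objective that is sensitive to perturbations of x."""
    return float(np.sum(x ** 2))

def robust_fitness(x, m=20, sigma=0.1):
    """Resampling: average f over m perturbed copies of x (reliable but costly)."""
    samples = [f(x + rng.normal(0.0, sigma, size=x.shape)) for _ in range(m)]
    return float(np.mean(samples))

x = np.array([0.5, -0.25])
print(f(x), robust_fitness(x))   # nominal value vs. expected value under noise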
The confidence metric defines the confidence level of a robust solution. The highest confidence is achieved when a large number of solutions are available with the greatest diversity within a suitable neighborhood around the solution in the parameter space. A mathematical expression for confidence is given in [44].

1.7 Performance Indicators
Overall Performance Indicators
The overall performance indicators provide a general description of the performance. Overall performance can be compared according to efficacy, efficiency, and reliability on a benchmark problem over many runs.

Efficacy evaluates the quality of the results without caring about the speed of an algorithm. Mean best fitness (MBF) is defined as the average of the best fitness in the last population over all runs. The best fitness values found so far can be used as a more absolute measure of efficacy.
Reliability indicates the extent to which the algorithm can provide acceptable results. Success rate (SR) is defined as the percentage of runs terminated with success. A successful run is one in which the difference between the best fitness value in the last generation, f∗, and a predefined target value f_o is below a predefined threshold ε.
Efficiency concerns finding the global optimal solution rapidly. Average number of evaluations to a solution (AES) is defined as the average number of evaluations taken by the successful runs. If an algorithm has no successful runs, its AES is undefined.
Low SR and high MBF may indicate that the algorithm converges slowly, while high SR and low MBF may indicate that the algorithm is basically reliable but may occasionally produce very bad results. It is desirable to have a smaller AES and a larger SR; thus, the small-AES/SR criterion considers reliability and efficiency at the same time.
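These indicators are easy to compute from per-run records. A sketch with hypothetical field names ('best_fitness' for the final best value, 'evals' for the evaluations used), with success defined as in the text by |f∗ − f_o| < ε:

def summarize(runs, f_o, eps):
    """Compute MBF, SR, and AES from a list of per-run records."""
    mbf = sum(r['best_fitness'] for r in runs) / len(runs)            # MBF
    successes = [r for r in runs if abs(r['best_fitness'] - f_o) < eps]
    sr = len(successes) / len(runs)                                   # SR
    aes = (sum(r['evals'] for r in successes) / len(successes)
           if successes else None)                                    # AES undefined if SR = 0
    return mbf, sr, aes

runs = [{'best_fitness': 0.01, 'evals': 1200},
        {'best_fitness': 0.90, 'evals': 5000},
        {'best_fitness': 0.02, 'evals': 1500}]
print(summarize(runs, f_o=0.0, eps=0.1))   # -> (0.31, 0.666..., 1350.0)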
Evolving Performance Indicators
Several generation-based evolving performance indicators can provide more detailed information.
• Best-so-far (BSF) records the best solution found by the algorithm thus far for each generation in every run. The BSF index is monotonic.
• Best-of-current-population (BCP) records the best solution in each generation in every run. MBF is the average of the final BCP or final BSF over multiple runs.
• Average-of-current-population (ACP) records the average solution in each generation in every run.
• Worst-of-current-population (WCP) records the worst solution in each generation in every run.
After many runs with random initial settings, we can draw conclusions about an algorithm by applying statistical descriptions, e.g., statistical visualization, descriptive statistics, and statistical inference.

Statistical visualization uses graphs to describe and compare algorithms. The box plot is widely used for this purpose. Suppose we run an algorithm on a problem 100 times and get 100 values of a performance indicator; we can rank the 100 numbers in ascending order. On each box, the central mark is the median, and the lower and upper edges are the 25th and 75th percentiles; the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually by '+'. The interquartile range (IQR) is the distance between the lower and upper edges of the box. Any data point that lies more than 1.5 IQR below the lower quartile or 1.5 IQR above the upper quartile is considered an outlier. Two lines called whiskers are plotted to indicate the smallest number that is not a lower outlier and the largest number that is not an upper outlier. The default 1.5 IQR corresponds to approximately ±2.7σ and 99.3% coverage if the data are normally distributed.
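The box plot quantities are straightforward to compute. A NumPy sketch in which random numbers stand in for the 100 recorded indicator values:

import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(0.0, 2.0, size=100)     # stand-in for 100 indicator values

q1, med, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                             # interquartile range
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # outlier cutoffs (~ +/- 2.7 sigma)
outliers = data[(data < lo) | (data > hi)]
whisker_low = data[data >= lo].min()      # smallest non-outlier
whisker_high = data[data <= hi].max()     # largest non-outlier
print(med, iqr, whisker_low, whisker_high, len(outliers))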
The box plots of the BSF performance of two algorithms are illustrated in Figure 1.7. Algorithm 2 has a larger median BSF and a smaller IQR, that is, better average performance along with smaller variance; thus, it outperforms Algorithm 1. Also, for the evolving process over many runs, the convergence graph, which illustrates performance versus the number of fitness evaluations (NOFE), is quite useful.

Figure 1.7 Box plot of the BSF of two algorithms.
Graphs are easy to understand. When the difference between algorithms is small, one has to calculate specific numbers to describe and compare the performance. The most often used descriptive statistics are the mean and variance (or standard deviation) of the performance indicators. Statistical inference includes parameter estimation, hypothesis testing, and many other techniques.

1.8 No Free Lunch Theorem
Before the no free lunch theorem [63,64] was proposed in 1995, people intuitively believed that universally beneficial search algorithms exist, and many actually made efforts to design such algorithms. The no free lunch theorem asserts that there is no universally beneficial algorithm.
The original no free lunch theorem for optimization states that no search algorithm is better than another in locating an extremum of a function when averaged over the set of all possible discrete functions. That is, all search algorithms achieve the same performance as random enumeration when evaluated over the set of all functions.
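This claim can be verified exhaustively on a tiny instance. The sketch below averages, over all 2^3 functions f : {0, 1, 2} → {0, 1}, the number of evaluations that two fixed non-revisiting search orders need to first sample a global minimum; both averages come out identical:

import itertools

domain = [0, 1, 2]

def evals_to_min(f, order):
    """Evaluations a fixed search order needs to first hit a global minimum of f."""
    fmin = min(f)
    return next(i + 1 for i, x in enumerate(order) if f[x] == fmin)

orders = [(0, 1, 2), (2, 0, 1)]             # two non-revisiting "algorithms"
for order in orders:
    total = sum(evals_to_min(f, order)
                for f in itertools.product((0, 1), repeat=len(domain)))
    print(order, total / 2 ** len(domain))  # 1.5 for every order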
Theorem 1.1 (No free lunch theorem) Given the set of all functions F and a set of benchmark functions F1, if algorithm A1 is better on average than algorithm A2 on F1, then algorithm A2 must be better than algorithm A1 on F \ F1.
When there is no structural knowledge at all, all algorithms have equal performance. The no free lunch theorem applies to non-revisiting algorithms with no problem-specific knowledge. It seems plausible because of deceptive functions and random functions: deceptive functions lead a hill-climber away from the optimum, and for random functions the search for the optimum has no guidance at all. For these two classes of functions, optimization is like finding a needle in a haystack.
The no free lunch theorem is concerned with the average performance over all problems. In applications, such a scenario is hardly ever realistic, since there is almost always some knowledge about typical solutions. Practical problems always contain priors such as smoothness, symmetry, and i.i.d. samples. The performance of any algorithm is determined by the knowledge concerning the cost function; thus, it is meaningless to evaluate the performance of an algorithm without specifying the prior knowledge. Developing search algorithms therefore amounts to building special-purpose methods for application-specific problems. For example, there are potentially free lunches for coevolutionary approaches [65].
The no free lunch theorem was later extended to coding methods, cross-validation [67], early stopping [12], avoidance of overfitting, and noise prediction [41]. Again, it has been asserted that no single method is better than the others for all problems.