Multifactorial evolutionary algorithms for clustered minimum routing cost tree problems in the multi domain network


HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

MASTER THESIS

Multifactorial Evolutionary Algorithms for Clustered Minimum Routing Cost Tree Problems in the Multi-domain Network

TA BAO THANG

Data science and Artificial intelligence

Supervisor: Assoc Prof Huynh Thi Thanh Binh

HA NOI, 2022

School: School of Information and Communication Technology


Supervisor’s Signature

SOCIALIST REPUBLIC OF VIETNAM

Independence – Freedom – Happiness

CONFIRMATION OF MASTER'S THESIS REVISION

Full name of the thesis author: Tạ Bảo Thắng

Thesis title:

Vietnamese: Giải thuật tiến hóa đa nhân tố giải bài toán cây phân cụm có chi phí định tuyến nhỏ nhất trên mạng đa miền

English: Multifactorial Evolutionary Algorithms for Clustered Minimum Routing Cost Tree Problems in the Multi-domain Network

Major: Data Science and Artificial Intelligence

Student ID: 20202647M

The author, the scientific supervisor, and the thesis examination committee confirm that the author has revised and supplemented the thesis according to the minutes of the committee meeting held on 28/04/2022, with the following changes:

- Corrected spelling, style, grammar, and notation errors in the thesis

- Added information on the standard deviation of the experimental results

26 May 2022

CHAIR OF THE COMMITTEE


Declaration of Authorship and Topic Sentences

1 Personal information

Full name: Ta Bao Thang

Phone number: +84344277998

Email: thang.tb202647M@sis.hust.edu.vn

Major: Data Science and Artificial Intelligence

2 Topic: Multifactorial evolutionary algorithms for clustered minimum routing cost tree problems in the multi-domain network

3 Contributions

• Develop a new encoding and decoding scheme for two clustered tree problems: Clustered Minimum Routing Cost Tree (CluMRCT) and Clustered Shortest Path Tree (CluSPT). The proposed method allows evolutionary algorithms to function on complete and sparse graphs.

• Design efficient multifactorial evolutionary algorithms to solve CluSPT and CluMRCT problems simultaneously.

• Evaluate the efficiency of the proposed algorithms and encoding methods on various instances. The results showed that the proposed methods outperformed all existing approaches in terms of solution quality and convergence trend.

4 Declaration of Authorship

I declare that my thesis, titled "Multifactorial Evolutionary Algorithms for Clustered Minimum Routing Cost Tree Problems in the Multi-domain Network", is the work of myself and my supervisor, Associate Professor Huynh Thi Thanh Binh. All papers, sources, and tables used in this thesis have been thoroughly cited.

5 Supervisor confirmation

Hanoi, April 2022

Supervisor

Associate Professor Huynh Thi Thanh Binh

in the near future. I am also grateful to my friends who assisted me in improving the quality of my thesis.

Finally, I would like to thank Vingroup JSC, the Vingroup Innovation Foundation, and the School of Information and Communication Technology (SoICT) for supporting my studies during the Master's program. I was funded by Vingroup JSC and supported by the Master, Ph.D. Scholarship Programme of Vingroup Innovation Foundation (VINIF), Institute of Big Data, codes VINIF.2020.ThS.BK.01 and VINIF.2021.ThS.BK.01, for two years, 2021 and 2022. This support allowed me to focus entirely on my scientific research.

Real-world network architectures seldom exist in isolation. Many of them are either repetitive or share domain-specific similarities. A good network architecture for one system can also be helpful for another. Therefore, knowledge drawn from solving previous network design problems may be reused to solve new problems more quickly and efficiently. Meanwhile, traditional optimization algorithms often solve only one problem at a time from scratch and assume zero prior knowledge about the problem at hand. As a result, the capabilities of solvers do not automatically grow with experience. This thesis proposes multitasking evolutionary algorithms to solve multiple clustered tree problems in multi-domain networks simultaneously. The thesis focuses on two clustered tree problems: Clustered Shortest Path Tree (CluSPT) and Clustered Minimum Routing Cost Tree (CluMRCT). Both are NP-hard and are representative cases of client-server and peer-to-peer topologies in multi-domain networks, respectively. The proposed algorithms help reduce the total optimization time and facilitate online knowledge transfer between problems during the optimization process, thereby yielding superior results to traditional single-task optimization methods.

Keywords: Evolutionary Algorithms, Multitasking Evolutionary Algorithm, Multifactorial Evolutionary Algorithm, Clustered Tree Problems

Author

Ta Bao Thang

1.1 Overview of Meta-heuristic Algorithm

1.2 Multifactorial Evolutionary Algorithm

2 Problem Formulation

2.1 Problem formulation

2.2 Related Works

3 Multitask Algorithm for Clustered Shortest Path Tree

3.1 Individual Encoding

3.2 Individual Decoding

3.3 Repairing Method

3.4 Algorithmic Structure

3.4.1 Unified search space

3.4.2 Individual Initialization Method

3.4.3 Crossover Operator

3.4.4 Mutation Operator

3.4.5 Mapping Individual Method

4 Multitask Algorithm for Clustered Minimum Routing Cost Tree

4.1 Individual Encoding

4.2 Individual Decoding

4.3 Algorithmic Structure

4.3.1 Knowledge Transfer Method

4.3.2 Fireflies' Movement-based Mutation

5 Experiments

5.1 Dataset

5.2 Experimental criteria

5.3 Results and Discussions on Clustered Shortest Path Tree

5.3.1 Comprehensive comparisons between the proposed algorithm and several state-of-the-art approaches

5.3.2 Analyze the effect of the input graph size on the performance of the proposed algorithm

5.3.3 Analyze the effectiveness of the proposed multi-parent crossover

5.4 Results and Discussions on Clustered Minimum Routing Cost Tree

5.4.1 Analyze the effectiveness of the proposed encoding and decoding method

5.4.2 Analyze the effectiveness of proposed hybridization strategy

5.4.3 Comprehensive comparisons between the proposed algorithm and several state-of-the-art approaches

A.1 Experimental results of the Clustered Shortest-Path Tree (CluSPT) problem

A.2 Experimental results of the Clustered Minimum Routing Cost Tree (CluMRCT) problem

List of Figures

2.1 Solutions of the CluMRCT and CluSPT problems

3.1 A CluSPT encoding example

3.2 Decoding method for the CluSPT problem

3.3 An invalid CluSPT encoding

3.4 An example of the repairing process

3.5 An example of constructing the unified search space

3.6 An example of the proposed crossover operator

3.7 An example of constructing specific-task representation from an individual in the unified search space

4.1 A CluMRCT encoding example

4.2 Proposed decoding method in the first level: a) An input graph; b) A CluMRCT encoding; c-g) Steps to construct an intra-routing spanning tree in Cluster 3

4.3 Steps in the second level of the proposed decoding method: a) An incomplete solution obtained after the first level; b) The priorities of clusters; c-d) Steps to build inter-routing spanning tree

4.4 An example of individual representation

5.1 Compare PI values of K-MFEA and state-of-the-art algorithms

5.2 Compare RPD values of K-MFEA and state-of-the-art algorithms

5.3 Convergence trends of K-MFEA and G-MFEA on Type 1, Type 5, and Type 6

5.4 Running time of K-MFEA, G-MFEA and HB-RGA

5.5 Running time of K-MFEA in comparison with HB-RGA

5.6 Relationship between the number of vertices and the performance of K-MFEA in Types 1, 5, and 6

5.7 Relationship between the improvement percentage (PI) and size of the input graph

5.8 Compare MFEA-FA with PMFEA and PFA

5.9 Compare MFEA-FA with state-of-the-art algorithms

5.10 Compare the convergence trend

List of Tables

5.1 Summary of comparisons between K-MFEA and existing algorithms

5.2 The different results obtained by K-MFEA when running with the number of parents in range 2 to 10

5.3 Parameter of MFEA-FA and comparison algorithms

5.4 Summary results of PMFEA and E-MFEA. Better, Equal, and Worse denote the number of tasks that PMFEA is better, equal and worse than E-MFEA

5.5 Ranking of algorithms given by Friedman test

5.6 Wilcoxon signed rank test with α = 0.05

A.1 Results obtained by G-MFEA, HB-RGA and K-MFEA on Type 1

A.2 Results obtained by G-MFEA, HB-RGA and K-MFEA on Types 3 and 4

A.3 Results obtained by G-MFEA, HB-RGA and K-MFEA on Type 5

A.4 Results obtained by G-MFEA, HB-RGA and K-MFEA on Type 6

A.5 Averaged Experimental results (unit: $10^4$) on Type 1 where +, =, and − denote the number of tasks that each algorithm was better, equal, and worse than the MFEA-FA algorithm

A.6 Averaged Experimental results (unit: $10^6$) on Type 3 and Type 4 where +, =, and − denote the number of tasks that each algorithm was better, equal, and worse than the MFEA-FA algorithm

A.7 Averaged Experimental results (unit: $10^5$) on Type 5 where +, =, and − denote the number of tasks that each algorithm was better, equal, and worse than the MFEA-FA algorithm

A.8 Averaged Experimental results (unit: $10^4$) on Type 6 Small where +, =, and − denote the number of tasks that each algorithm was better, equal, and worse than the MFEA-FA algorithm

A.9 Experimental results (unit: $10^6$) on Type 6 Large where +, =, and − denote the number of tasks that each algorithm was better, equal, and worse than the MFEA-FA algorithm

In the years following massive globalization efforts, multi-domain network architectures, where end-users or devices are divided into clusters, can be found in many real-world applications. Examples include agricultural irrigation systems, transportation services, logistics, cable TV systems, power networks, and distribution systems. Therefore, network design problems on multi-domain networks have attracted much interest, not only within research communities but even more so from governments and industry.

This thesis focuses on two network design problems in the multi-domain network: Clustered Shortest-Path Tree (CluSPT) [16] and Clustered Minimum Routing Cost Tree (CluMRCT) [27]. The objective of the CluSPT problem is to find a multi-domain client-server network architecture that minimizes routing costs from a central server to all other devices. Meanwhile, the CluMRCT problem aims to find a network design that minimizes the total routing cost between any two devices in the multi-domain peer-to-peer network. Additionally, these problems require communications within a cluster to be routed locally and not pass through any vertex from other clusters, so that private information among nodes in each cluster is circulated internally and not transmitted through other clusters. This allows network systems to have high security while reducing operational costs. Solving CluSPT and CluMRCT thus brings high economic efficiency not only for computer network systems but also for various areas such as computational biology, product transportation, logistics, and agricultural irrigation [31, 36].

However, both problems are NP-hard. Therefore, the preferred way to tackle them is through meta-heuristic algorithms, since solving large instances with exact approaches is infeasible within a reasonable amount of time. A family of meta-heuristic algorithms that has found considerable success in dealing with NP-hard problems is Evolutionary Algorithms (EAs) [5, 18]. These algorithms base their mechanisms on Darwin's theory of evolution and natural selection. Essentially, a population of solutions is first generated at random and encoded so that the solutions can be modified by evolutionary operators, namely mutation and crossover. While EAs themselves have been subjects of research since the 1990s, they still have many shortcomings, the most important being that they always solve a new problem from scratch regardless of how similar it is to problems already solved in the past. Knowledge drawn from past experience is not reused to tackle new problems. Meanwhile, network architectures in practice often share many common features and seldom exist in isolation. A network architecture in one system may be helpful in another. Therefore, a multitasking variant of EA, the Multifactorial Evolutionary Algorithm (MFEA) [6, 17, 21], was proposed to handle these drawbacks. The MFEA can not only solve multiple problems simultaneously but also facilitate implicit knowledge transfer between problems during the optimization process to obtain better solutions than solving them in isolation.

There have been many efforts to use MFEA to solve CluSPT and CluMRCT [17, 49, 50, 52, 55]. These methods, however, have a large computational time and only function on complete graphs. Meanwhile, in practical networks, each vertex only connects to a certain number of other vertices, which makes these algorithms hard to apply to real-world applications. Besides, these algorithms do not control negative transfers when solving low-similarity problems, leading to low-quality solutions in some cases.

The main contributions of this thesis include:

• Propose two novel encodings for the CluSPT and CluMRCT problems, designed with performance and memory usage in mind. Notably, the proposed encodings are much smaller than those of existing approaches, allowing multitask and other meta-heuristic algorithms to function effectively on both complete and sparse graphs. The computational complexity of these methods is also analyzed.

• Examine the effectiveness of multi-parent crossover in multitasking algorithms. To the best of my knowledge, there is no similar approach in the multitasking literature. The results showed that it is a promising approach for large-scale problems.

• Design a novel combination of multifactorial evolutionary and firefly algorithms to tackle low-similarity tasks. The proposed algorithm not only enhances the self-evolution of each task but also improves inter-task knowledge transfer by delivering higher-quality solutions.

• Examine the effectiveness of the proposed algorithms and encoding methods on various types of instances. The results showed that the proposed methods outperformed all existing approaches in terms of solution quality, convergence trend, and computational time.

The thesis is organized as follows:

• Chapter 1 provides an overview of popular meta-heuristic algorithms, especially the multifactorial evolutionary algorithm.

• Chapter 2 describes problem formulations and outlines related works.

• Chapter 3 presents a new encoding strategy and a novel multitasking algorithm using the multi-parent crossover to solve multiple CluSPT problems simultaneously.

• Chapter 4 presents a novel encoding strategy and a novel hybrid multitasking algorithm for the CluMRCT problem.

• Chapter 5 provides experimental results on various types of instances. Besides, the effectiveness of each component in the proposed algorithms and the factors affecting the quality of the algorithms are also examined.

• Chapter 6 concludes the thesis and outlines future works.


1.1 Overview of Meta-heuristic Algorithm

Meta-heuristic algorithms are a class of methods that use stochastic factors to find global or near-optimal solutions to complex optimization problems. Each meta-heuristic optimization algorithm has two main features: exploration and exploitation. Exploration is the ability to search throughout the search space to find an optimal solution and avoid local optima. Exploitation, on the other hand, is the ability to search locally around elite solutions to improve their quality. All meta-heuristic algorithms rely on these two features, but each framework realizes them with its own operators and mechanisms. The advantages of these algorithms are simplicity, flexibility, and independence from the nature of the problem. Because they rely on stochastic factors, meta-heuristic algorithms do not need derivative information about the problem and are therefore an effective method for finding good solutions to a given optimization problem. Another significant advantage is flexibility: by following the pre-defined structure of its framework, each algorithm can be applied to a wide range of optimization problems within a reasonable amount of time. Therefore, meta-heuristics are becoming more popular and have received significant attention from the research community.

Interestingly, most meta-heuristics draw on familiar sources of inspiration from real life, such as natural evolution, animal behaviors, human behaviors, or physical phenomena. For example, Swarm Intelligence (SI) [33] is a class of meta-heuristics that imitates the social behavior of animal groups. The mechanism behind this method is based on sharing the collective information of all individuals during the optimization process. Particle Swarm Optimization (PSO) [19, 25], developed by Kennedy and Eberhart [25], is one of the most representative algorithms of this group. PSO simulates the movement of organisms in a bird flock or fish school cooperating to find food. During the optimization process, all candidates (particles) follow the best solutions found so far. Each particle keeps track of its coordinates in the search space and compares them with the position of the best solution it has achieved itself (pbest) and the best solution obtained so far in the population (gbest). Other recent algorithms belonging to this class are Grey Wolf Optimizer (GWO) [32], Galactic Swarm Optimization (GSO) [30], Whale Optimization Algorithm (WOA) [20], Bat Algorithm (BA) [1], and Firefly Algorithm (FA) [54]. Because they incorporate information about the best solutions and their successful history, these SI-based algorithms have a fast convergence trend and strong exploitation ability [1, 20, 54].
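The pbest and gbest bookkeeping described above can be illustrated with a minimal sketch. It assumes the common textbook velocity update with an inertia weight `w` and acceleration coefficients `c1` and `c2`; the function name `pso_minimize`, the parameter values, and the `sphere` test function are illustrative assumptions and are not taken from the thesis.

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
                 lo=-5.0, hi=5.0, seed=0):
    """Minimal PSO sketch (assumed textbook variant with inertia weight)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position of each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position of the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull the particle toward its own best and the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Illustrative objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)
```

Calling `pso_minimize(sphere, dim=3)` returns the best position found and its objective value; since gbest is only replaced by strictly better positions, the returned value never gets worse as `iters` grows for a fixed seed.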

Physical phenomena inspire another class of meta-heuristics. These algorithms are derived from physical laws in nature and characterize the interaction of search agents through rules based on physical processes. One typical example of this class is the Gravitational Search Algorithm [39], an optimization method inspired by Newtonian gravity that updates the position of a candidate toward the optimum point. Recently, several meta-heuristics based on human behaviors have also been researched and developed. An instance of this type is Teaching-Learning-Based Optimization (TLBO) [38]. The framework is inspired by the influence of a teacher on his or her learners, with the optimization process divided into two main phases: the "teacher phase" (learners learn from the teacher) and the "learner phase" (learning by interacting with other learners). These two phases are repeated until the algorithm converges.

Meanwhile, Evolutionary Algorithms (EAs) [5], whose most famous representative is the Genetic Algorithm (GA) [18], form the most classic meta-heuristic class. EAs are inspired by natural evolution, with a mechanism directly based on Darwin's theory of evolution and natural selection. The algorithm mirrors the natural selection process in which the fittest individuals are selected for reproduction to produce the offspring of the next generation. To achieve this, it uses two evolutionary operators called crossover and mutation. Mutation preserves population diversity and performs a local search around current individuals, while crossover aims to combine good genes of the parents to obtain offspring with better fitness. Due to their simplicity and independence from the problem, EAs have shown remarkable success in dealing with NP-hard problems [8].

Furthermore, recent years have also witnessed the strong development of hybrid methods, which combine meta-heuristic algorithms with other techniques for complex optimization problems. Lyu [28] combined an EA-based algorithm and an SI-based algorithm, namely GA and PSO, for periodic charging planning in wireless rechargeable sensor networks. Aydilek [3] combined two SI-based algorithms, FA and PSO, to tackle computationally expensive problems. Bernal [26] proposed a combination of multiple meta-heuristics and fuzzy logic-based initialization for autonomous robot navigation. Besides, some studies that combine meta-heuristics and machine learning have reported outstanding results. Sánchez [44] adopted the Firefly Algorithm (FA) and modular granular neural networks for human recognition. Zivkovic [58] proposed a hybrid method between machine learning and beetle antennae search [57] to predict COVID-19 cases. Bacanin [4] adopted FA and a convolutional neural network for magnetic resonance image classification of glioma brain tumor grade. Because their constituent methods complement each other, hybrid methods are considered a promising and effective approach to complex problems in many practical applications.

These methods, however, still have some shortcomings [48]:

• Firstly, existing meta-heuristics have not yet imitated the multitasking ability of humans. Each meta-heuristic solves only one problem at a time. Meanwhile, many real-world systems, such as cloud computing, usually face many tasks submitted by multiple users simultaneously.

• Secondly, meta-heuristics always tackle new problems from scratch regardless of how similar they are to those already solved in the past. Meta-heuristics assume zero prior knowledge about the problem at hand. As a result, the capabilities of solvers do not automatically grow with problem-solving experience. Meanwhile, real-world problems seldom exist in isolation. Many of them are either repetitive or share domain-specific similarities, and therefore humans routinely use a pool of knowledge drawn from past experiences when faced with a new task.

It is therefore necessary to design meta-heuristics that can imitate the human multitasking ability and facilitate knowledge transfer between problems during the optimization process, in order to reduce the total time for task completion and solve new problems more efficiently and quickly.

1.2 Multifactorial Evolutionary Algorithm

Over the past few years, a novel research direction called Evolutionary Multitask Optimization (EMTO) [48] has emerged in the field of intelligent computing. Inspired by classic EAs, the EMTO paradigm focuses on finding promising solutions for different problems simultaneously. By optimizing multiple problems together, valuable genetic material can be transferred between tasks, which often plays a crucial part in improving solution quality, convergence characteristics, and resource usage compared to solving the problems separately.

The emergence of cloud services and the associated optimization challenges [7] prompted Gupta et al. [22] to devise Multifactorial Optimization (MFO), which has matured into one of the most efficient and popular approaches for complex multitasking environments. Along with it, an algorithm named MFEA was introduced based on the concept of multifactorial inheritance from biology. MFEA has since been applied in a wide range of domains. Sagarna [43] adopted the MFEA to tackle more than two software testing problems at the same time. Chandra [10] used the MFEA to search many neural network topologies simultaneously. Many other studies have explored applications of MFEA, such as fuzzy cognitive maps [47], data mining [56], big data [45], and wireless sensor networks [14, 24].

The driving idea behind MFEA is to exploit the implicit parallelism of population-based search together with the omnidirectional transfer of knowledge across different tasks through crossover-based operators. Therefore, the MFEA not only leverages the advantage of the traditional EA, evolving the population towards optimality with fast convergence, but also implicitly captures the similarity between the solutions of different tasks and uses this mutual information to improve the population in subsequent generations.

To facilitate knowledge exchange in a multitask setting, the populations of all constitutive tasks must be defined in a shared space, called the unified search space. Unifying all task-specific search spaces into one with a single representation creates a common platform on which the transfer of genetic material among tasks can occur seamlessly. Without loss of generality, consider $K$ minimization tasks $T_1, T_2, \dots, T_K$ to be solved simultaneously, where the search space dimensionalities of the tasks are $D_1, D_2, \dots, D_K$, respectively. In such a scenario, the unified search space $X$ can be defined with dimensionality $D_{unified} = \max\{D_1, D_2, \dots, D_K\}$. The MFEA then employs a single population $P$ of individuals to solve the $K$ optimization tasks concurrently, where each task contributes an added factor influencing the evolution of the population. Given this background, several terms are defined to evaluate an individual $p_i$ in population $P$ as follows:
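As a small illustration of the unified dimensionality, the sketch below assumes a random-key representation in which every individual is a vector in $[0, 1]^{D_{unified}}$ and a task of dimensionality $D_j$ reads only the first $D_j$ genes. That representation and the helper names are assumptions made for illustration; the encodings proposed in this thesis differ.

```python
import random

def unified_dimension(task_dims):
    """D_unified = max{D_1, ..., D_K}."""
    return max(task_dims)

def random_individual(task_dims, rng):
    """One individual in the unified space, with genes drawn from [0, 1)."""
    return [rng.random() for _ in range(unified_dimension(task_dims))]

def task_view(individual, task_dim):
    """Genes seen by a task of dimensionality task_dim (assumed: the first ones)."""
    return individual[:task_dim]
```

For `task_dims = [3, 5, 2]`, every individual has five genes, and the task with dimensionality 2 ignores the last three.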

• Skill Factor: The skill factor $\tau_i$ of $p_i$ is the one task on which the individual is most effective. This may be defined as $\tau_i = \arg\min_j \{r_j^i\}$, where $r_j^i$ is the rank of individual $p_i$ on task $T_j$ ($j = 1, \dots, K$).

• Scalar Fitness: Given the skill factor $\tau_i$, the scalar fitness $\varphi_i$ of $p_i$ is calculated based on its best rank over all tasks, i.e., $\varphi_i = 1 / r_{\tau_i}^i$.
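The two definitions can be traced on a small example. The sketch below assumes that every individual has been evaluated on every task and that ties are broken by population index; the function names are hypothetical, and tasks are indexed from 0 in the code rather than from 1 as in the text.

```python
def factorial_ranks(costs):
    """costs[i][j] is the objective value of individual i on task j (minimization).
    Returns ranks[i][j], the 1-based rank of individual i on task j."""
    n, k = len(costs), len(costs[0])
    ranks = [[0] * k for _ in range(n)]
    for j in range(k):
        # sorted() is stable, so ties are broken by population index (assumption).
        order = sorted(range(n), key=lambda i: costs[i][j])
        for position, i in enumerate(order, start=1):
            ranks[i][j] = position
    return ranks

def skill_factors_and_scalar_fitness(costs):
    """Skill factor = task with the best rank; scalar fitness = 1 / that rank."""
    ranks = factorial_ranks(costs)
    skill = [min(range(len(r)), key=lambda j: r[j]) for r in ranks]
    fitness = [1.0 / r[tau] for r, tau in zip(ranks, skill)]
    return skill, fitness
```

With `costs = [[1.0, 9.0], [2.0, 3.0], [5.0, 4.0]]`, the ranks are `[[1, 3], [2, 1], [3, 2]]`, so the skill factors are `[0, 1, 1]` and the scalar fitness values are `[1.0, 1.0, 0.5]`.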

Algorithm 1: Pseudocode of basic MFEA

begin
    P(0) ← Randomly generate N individuals;
    Calculate skill factor and update scalar fitness for each individual in P(0);
    ...
            xi, xj ← Intra-task crossover on pa and pb;
            Assign offspring xi and xj skill factor τa;
    ...
            xi, xj ← Inter-task crossover on pa and pb;
            Randomly assign offspring xi, xj skill factor τa or τb;
    ...
        Update scalar fitness for each individual in Pi(t);
        P(t + 1) ← get the N biggest scalar fitness individuals in Pi(t);

to transfer knowledge through crossover-based genetic exchange among tasks

As can be seen in Algorithm 1, the behavior of the MFEA is strongly governed by a preset parameter that sets the level of inter-task knowledge exchange, called the random mating probability (rmp). However, the underlying similarities among different optimization tasks are hard to foresee in practice, which makes it difficult to choose this parameter precisely.
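To make the role of rmp concrete, the following is a hedged sketch of a basic MFEA loop, not the thesis's implementation. It assumes random-key genomes in $[0, 1]^{D_{unified}}$, uniform crossover, Gaussian mutation when inter-task mating is rejected, and ranking within each skill group; all function and parameter names are illustrative.

```python
import random

def mfea(tasks, dims, pop_size=20, generations=40, rmp=0.3, seed=0):
    """Sketch of a basic MFEA for minimization tasks (assumed operators)."""
    rng = random.Random(seed)
    k, d = len(tasks), max(dims)  # d is the unified dimensionality

    def cost(genome, tau):
        return tasks[tau](genome[:dims[tau]])

    def update_scalar_fitness(pop):
        # Rank individuals within each skill group; scalar fitness = 1 / rank.
        for tau in range(k):
            group = sorted((ind for ind in pop if ind["skill"] == tau),
                           key=lambda ind: ind["cost"])
            for rank, ind in enumerate(group, start=1):
                ind["fitness"] = 1.0 / rank

    # Initial population: evaluate on every task, keep the best-ranked task.
    genomes = [[rng.random() for _ in range(d)] for _ in range(pop_size)]
    all_costs = [[cost(g, tau) for tau in range(k)] for g in genomes]
    pop = []
    for i, g in enumerate(genomes):
        ranks = [sum(all_costs[o][tau] < all_costs[i][tau] for o in range(pop_size))
                 for tau in range(k)]
        tau = min(range(k), key=lambda t: ranks[t])
        pop.append({"genome": g, "skill": tau, "cost": all_costs[i][tau]})
    update_scalar_fitness(pop)

    def crossover(a, b):
        # Uniform crossover (an assumed operator, chosen for brevity).
        mask = [rng.random() < 0.5 for _ in range(d)]
        c1 = [x if m else y for x, y, m in zip(a, b, mask)]
        c2 = [y if m else x for x, y, m in zip(a, b, mask)]
        return c1, c2

    def mutate(a, sigma=0.1):
        # Gaussian perturbation clipped to [0, 1] (also an assumption).
        return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in a]

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            pa, pb = rng.sample(pop, 2)
            if pa["skill"] == pb["skill"] or rng.random() < rmp:
                # Intra-task crossover, or inter-task crossover with probability rmp.
                g1, g2 = crossover(pa["genome"], pb["genome"])
                s1 = rng.choice([pa["skill"], pb["skill"]])
                s2 = rng.choice([pa["skill"], pb["skill"]])
            else:
                g1, g2 = mutate(pa["genome"]), mutate(pb["genome"])
                s1, s2 = pa["skill"], pb["skill"]
            for g, s in ((g1, s1), (g2, s2)):
                # Selective evaluation: only on the inherited skill factor.
                offspring.append({"genome": g, "skill": s, "cost": cost(g, s)})
        merged = pop + offspring
        update_scalar_fitness(merged)
        merged.sort(key=lambda ind: ind["fitness"], reverse=True)
        pop = merged[:pop_size]

    best = {}
    for ind in pop:
        if ind["skill"] not in best or ind["cost"] < best[ind["skill"]]["cost"]:
            best[ind["skill"]] = ind
    return best
```

The function returns, for each task that is still represented in the final population, the best individual found. Setting `rmp=0` disables inter-task crossover entirely, while values near 1 mix genetic material across tasks in almost every mating, which is the trade-off discussed above.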

G = (V, E) where V is the set of nodes, E is the set of edges, and each edge is associated with a positive weight. The vertex set V is divided into m disjoint clusters $\{C_1, C_2, \dots, C_m\}$. For a cluster $C_i$, the induced graph $G[C_i]$ is the maximal subgraph of G whose vertex set is $C_i$. The objective of the CluSPT problem is to find a multi-domain network architecture that minimizes routing costs from a central server to all other devices. Meanwhile, the CluMRCT problem aims to find a network design that minimizes the total routing cost between any two devices in the multi-domain network. Solving these problems is in demand not only for computer network design but also for optimization in various areas such as computational biology, product transportation, logistics, and agricultural irrigation management [31, 36].

The problem formulations of CluSPT and CluMRCT are as follows:

Clustered Shortest Path Tree Problem

Input: - V is partitioned into m clusters $\{C_1, C_2, \dots, C_m\}$

- A source vertex $s \in V$

Constraint: - The induced graph $T[C_i]$ ($i = 1, \dots, m$) is connected

Objective: Minimize $\sum_{v \in V} d_T(s, v)$, where $d_T(s, v)$ is the cost of the shortest path from source vertex s to vertex v on T

Clustered Minimum Routing Cost Tree Problem

Input: - V is partitioned into m clusters $\{C_1, C_2, \dots, C_m\}$

Constraint: - The induced graph $T[C_i]$ ($i = 1, \dots, m$) is connected

Objective: Minimize $\sum_{u, v \in V} d_T(u, v)$, where $d_T(u, v)$ is the cost of the shortest path from vertex u to vertex v on T
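Both objectives can be evaluated directly on a candidate tree. The sketch below assumes the tree is given as a list of `(u, v, weight)` tuples and follows the sums as written above, so the CluMRCT value counts every ordered pair of vertices; the function names are illustrative.

```python
from collections import defaultdict

def tree_distances(tree_edges, source):
    """Path costs from `source` to every vertex of a weighted tree.
    tree_edges is an iterable of (u, v, weight) tuples."""
    adj = defaultdict(list)
    for u, v, w in tree_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:  # in a tree, the first path found is the only path
                dist[v] = dist[u] + w
                stack.append(v)
    return dist

def cluspt_cost(tree_edges, source):
    """Sum of d_T(s, v) over all vertices v."""
    return sum(tree_distances(tree_edges, source).values())

def clumrct_cost(tree_edges):
    """Sum of d_T(u, v) over all ordered pairs (u, v)."""
    vertices = {x for u, v, _ in tree_edges for x in (u, v)}
    return sum(sum(tree_distances(tree_edges, u).values()) for u in vertices)
```

For the path a-b-c with weights 1 and 2, `cluspt_cost(edges, "a")` is 4 (0 + 1 + 3) and `clumrct_cost(edges)` is 12 (4 + 3 + 5). This naive evaluation runs one traversal per vertex and is meant only to make the definitions concrete.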

(a) An input graph (b) An invalid solution (c) A valid solution

Figure 2.1: Solutions of the CluMRCT and CluSPT problems

Solutions to both problems are illustrated in Figure 2.1. Figure 2.1(a) shows an input graph G whose vertex set is divided into three clusters. The induced graph of G on cluster 1 (denoted $G[C_1]$) is composed of the vertex set $\{v_1, v_2, v_3, v_4\}$ and the edge set $\{(v_1, v_2), (v_1, v_3), (v_1, v_4)\}$. Figure 2.1(b) shows an invalid solution in which the induced graph of cluster 2 is not connected. A valid solution is presented in Figure 2.1(c), where the whole solution is a spanning tree and the induced graph in each cluster is also a tree. Communications among vertices in each cluster are routed locally and do not pass through any vertex from other clusters.
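The validity condition illustrated in Figure 2.1 can be checked mechanically. The sketch below assumes the candidate solution is a list of edges (optionally carrying a weight as a third element), that the clusters are non-empty sets partitioning the vertex set, and that there are no duplicate edges or self-loops; the function names are hypothetical.

```python
def induced_connected(tree_edges, cluster):
    """True if the subgraph induced by `cluster` on the given edges is connected."""
    cluster = set(cluster)
    adj = {v: set() for v in cluster}
    for u, v, *_ in tree_edges:
        if u in cluster and v in cluster:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(cluster))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == cluster

def is_valid_clustered_tree(tree_edges, clusters):
    """Spanning tree of all vertices whose restriction to each cluster is connected."""
    vertices = set().union(*clusters)
    # A connected graph on n vertices with n - 1 edges is a tree.
    return (len(tree_edges) == len(vertices) - 1
            and induced_connected(tree_edges, vertices)
            and all(induced_connected(tree_edges, c) for c in clusters))
```

With clusters `[{1, 2}, {3, 4}]`, the edge list `[(1, 2), (2, 3), (3, 4)]` is valid, whereas `[(1, 3), (3, 2), (2, 4)]` is a spanning tree but is rejected because vertices 1 and 2 are connected only through another cluster.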

2.2 Related Works

The CluSPT and CluMRCT problems have attracted a lot of attention due to their practical importance. Because both problems are NP-hard, approximation algorithms, heuristics, and meta-heuristic algorithms are the effective methods for tackling them.

In recent years, several multitasking meta-heuristic approaches have been developed for solving multiple CluSPT problems at the same time. The authors of [50] proposed a Multifactorial Evolutionary Algorithm with new genetic operators. The main idea of these operators is to first construct a spanning tree for the smallest task and then construct spanning trees for the larger tasks. In [17, 50], the authors took advantage of the Cayley code to encode CluSPT solutions and proposed corresponding genetic operators. These operators are conceptually similar to the genetic operators for binary and permutation representations. However, the Cayley code restricts the approach to complete graphs, so the proposed MFEA is suitable exclusively for complete graphs. Binh et al. [8] discussed a new algorithm based on the EA and Dijkstra's algorithm. In a divide-and-conquer fashion, the proposed algorithm decomposes the CluSPT problem into two sub-problems. The first sub-problem is solved by an EA, while Dijkstra's algorithm solves the second. The goal of the first sub-problem is to determine a spanning tree that connects the clusters, while that of the second is to determine the best spanning tree for each cluster. The authors of [51] proposed a heuristic based on a randomized greedy algorithm and Dijkstra's algorithm. In [52], the authors described a method of applying MFEA based on decomposing the original problem into two problems. In the proposed MFEA, the second task acts as a local search method for improving the solutions determined in the first task.

For the CluMRCT problem, both meta-heuristics and approximation methodswere proposed An approximation method called R-Star was first proposed in [27]

It constructed a local star tree in each cluster and a global star tree to connectclusters However, this method works only on complete graphs It is considered a2-approximation if the edges of the input graph obey the triangle inequality Be-sides, several multitask methods based on the MFEA were proposed for the CluM-RCT In [55], the authors proposed an algorithm named E-MFEA for the CluMRCTproblem with two-level evolutionary operators The algorithm builds a solution forthe smallest dimensional tasks and then constructs it for higher dimensional tasks.However, this algorithm does not control negative transfers when solving unrelatedtasks, leading to low-quality solutions than single-task algorithms in some cases


The authors of [49] proposed an adaptation of MFEA-II (called aMFEA-II), which is an improved MFEA that limits negative transfers by learning the transfer rate online in each generation. The method outperformed E-MFEA in most cases, but it was only more effective than R-Star on small and medium instances. Moreover, both methods use the edge representation, making them suitable only for complete graphs, which require every vertex (device) to connect to all other vertices (devices).

Although a host of MFEA algorithms were proposed for solving the CluSPT and CluMRCT problems in practice, they have revealed multiple drawbacks: they are only applicable to complete graphs, are inefficient at finding solutions in large search spaces, cannot avoid negative transfers between low-similarity problems, and require significant computational time. In the following chapters, the thesis introduces novel encoding strategies for each problem to allow existing algorithms to function on both complete and sparse graphs. Besides, the thesis also designs a novel transfer mechanism based on multi-parent crossover and a hybrid approach to enhance the knowledge transfer quality of existing algorithms.


3.1 Individual Encoding

Although many encoding approaches have been proposed for the CluSPT problem, such as Edge-set encoding [23, 50, 51] and Cayley code [52, 53], these methods can only be applied to complete graphs. Meanwhile, in practical networks, each vertex only connects to a certain number of other vertices, which makes these algorithms hard to apply to real-world applications. In this work, an efficient and small-dimensional encoding is proposed for the CluSPT to reduce computational resources and help multitasking algorithms function effectively on both complete and sparse graphs.

Given a CluSPT solution T, a local root r is defined as the first vertex to be visited within its cluster when starting traversal from the source vertex s. The cost of T can be calculated as follows:

Besides, from Equation 3.1, it can be observed that if a set of local roots (r0, r1, ..., rm) is given, the optimal solution for the CluSPT problem can be found in polynomial time by Dijkstra's algorithm. Different sets of local roots result in different CluSPT solutions. Therefore, instead of saving all vertices and edges, this thesis only uses information about the local root of each cluster to represent a CluSPT solution. Moreover, only a few nodes can be selected as local roots for each cluster. Because a local root needs to connect to another cluster, vertices selected as the local root of a cluster must have direct connections to vertices in other clusters. As a result, the sparser the input graph, the smaller the search space that needs to be explored. Hence, the optimization process can be done more quickly.
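The observation above rests on the fact that every path in T from s to a vertex of cluster Ci enters the cluster through its local root ri. A sketch of the resulting decomposition is given here; it is reconstructed from the definitions in this section rather than copied from the thesis, and it assumes that d_T(u, v) denotes the cost of the path between u and v in T and that the sum runs over all clusters:

```latex
\mathrm{cost}(T)
  = \sum_{v \in V} d_T(s, v)
  = \sum_{i} \Bigl( |C_i| \cdot d_T(s, r_i) + \sum_{v \in C_i} d_T(r_i, v) \Bigr)
```

Once the local roots are fixed, the inner sum depends only on the tree built inside each cluster, while the first term depends only on how the clusters are linked together, which is why the two parts can be optimized separately.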

Figure 3.1: A CluSPT encoding example

For cluster Cj, let Cj∗ denote the set of vertices connecting directly to vertices in other clusters. Cj∗ is called the inter-vertex set of cluster Cj. An example is shown in Figure 3.1. Figure 3.1a describes an input graph G with 14 vertices and 3 clusters. The inter-vertex sets of the corresponding clusters in G are illustrated in Figure 3.1b. Figure 3.1c shows an individual encoding of the CluSPT problem, in which each element represents the selected local root of the respective cluster.
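The inter-vertex sets can be computed in a single pass over the edges. The following is a minimal sketch, not the thesis implementation; the function name `inter_vertex_sets`, the edge-list input, and the `cluster_of` mapping are assumptions made for illustration, and the toy graph is hypothetical rather than the graph of Figure 3.1.

```python
from collections import defaultdict


def inter_vertex_sets(edges, cluster_of):
    """Map each cluster id to the vertices that have a neighbour in another cluster."""
    inter = defaultdict(set)
    for u, v in edges:
        if cluster_of[u] != cluster_of[v]:
            # Both endpoints of an inter-cluster edge are candidate local roots.
            inter[cluster_of[u]].add(u)
            inter[cluster_of[v]].add(v)
    return dict(inter)


# Hypothetical toy graph: two clusters joined by the single edge (2, 3).
edges = [(1, 2), (2, 3), (3, 4)]
cluster_of = {1: 0, 2: 0, 3: 1, 4: 1}
result = inter_vertex_sets(edges, cluster_of)
```

On a sparse graph only a few vertices per cluster end up in these sets, which is the reduction of the search space described above.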


Notably, the length of the proposed encoding is equal to the number of clusters in the input graph. Meanwhile, in practical networks, the number of clusters is always much smaller than the number of nodes. Therefore, it can be concluded that the proposed encoding strategy significantly reduces the search space dimension compared to existing encoding approaches.

3.2 Individual Decoding

As described above, each CluSPT solution's encoding is an integer array whose dimension equals the number of clusters in the input graph. The ith element is the local root of the ith cluster. Given an encoding {r0, r1, ..., rm}, a corresponding CluSPT solution is constructed as follows:

Step 1: Dijkstra's algorithm is used to build the shortest path tree in each cluster, starting from its local root.

Step 2: Connections among the clusters are built by a customized version of Dijkstra's algorithm. First, the proposed method adds the vertices of the cluster containing the source vertex s into a closed set V. Among the remaining clusters connected to V through their local roots, the proposed method adds to V the cluster Ci for which the total routing cost from the source vertex to its vertices (|Ci| × dT(s, ri)) is minimum.

Step 3: Repeat Step 2 until all clusters have been added to V.

Dijkstra's algorithm in this work is implemented using the binary heap structure, and its complexity when running on the input graph G(V, E) is O(|E| + |V| × log|V|). So m runs on the m clusters (Ci, Ei) will cost O(Σi (|Ei| + |Ci| × log|Ci|)).
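The three decoding steps can be sketched as follows. This is an illustrative reading of the procedure, not the thesis code: the names `build_adj`, `dijkstra`, and `decode_cost`, the adjacency-dictionary graph format, and the choice to return only the total routing cost (or `None` for an encoding that cannot be connected) are all assumptions. The sketch also assumes that the local root of the source cluster is the source vertex itself.

```python
import heapq


def build_adj(weighted_edges):
    """Undirected adjacency dict {u: {v: weight}} from (u, v, weight) triples."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def dijkstra(adj, start, allowed):
    """Shortest-path distances from start, visiting only vertices in `allowed`."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if v in allowed and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def decode_cost(adj, clusters, roots, s):
    """Total routing cost of the tree decoded from `roots`, or None if invalid."""
    src = next(i for i, c in enumerate(clusters) if s in c)
    roots = list(roots)
    roots[src] = s  # assumption: the source cluster is rooted at the source
    # Step 1: shortest-path tree inside each cluster, from its local root.
    local = [dijkstra(adj, r, c) for r, c in zip(roots, clusters)]
    if any(len(d) != len(c) for d, c in zip(local, clusters)):
        return None  # a cluster is not internally connected
    dist = dict(local[src])  # d_T(s, v) for vertices already in the closed set
    remaining = set(range(len(clusters))) - {src}
    while remaining:  # Steps 2 and 3
        best = None
        for i in remaining:
            links = [dist[u] + w for u, w in adj[roots[i]].items() if u in dist]
            if links:
                d_root = min(links)
                cand = (len(clusters[i]) * d_root, d_root, i)
                if best is None or cand < best:
                    best = cand
        if best is None:
            return None  # no remaining local root touches the closed set
        _, d_root, i = best
        for v, d in local[i].items():
            dist[v] = d_root + d
        remaining.remove(i)
    return sum(dist.values())


# Hypothetical 5-vertex instance with three clusters and source vertex 1.
adj = build_adj([(1, 2, 1), (2, 3, 2), (3, 4, 1), (1, 4, 5), (4, 5, 1)])
clusters = [{1, 2}, {3, 4}, {5}]
cost_a = decode_cost(adj, clusters, [1, 3, 5], 1)
cost_b = decode_cost(adj, clusters, [1, 4, 5], 1)
```

In this toy instance the two encodings differ only in the local root of the second cluster, yet they decode to different routing costs, which illustrates that the choice of local roots alone determines the solution.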

Figure 3.2 presents the process of constructing a CluSPT solution. Figure 3.2(a) shows the input graph, while an individual encoding is shown in Figure 3.2(b). Vertex 1 is the source vertex of the input graph. The local roots of the three clusters are 1, 12, and 3, respectively. Firstly, the shortest path trees starting from the local roots are created, as shown in Figure 3.2(c). Next, the cluster containing source vertex 1 is added into V (red), as shown in Figure 3.2(d). The decoding method now considers all remaining clusters that connect directly to V, in this case both cluster 2 (through edge (1, 12)) and cluster 3 (through edge (3, 6)). Comparing these two routes, it can be seen that the route from 12 to 1 is shorter than the route from 3 to 1, and thus cluster 2 is added to V (Figure 3.2(e)). Cluster 3 can now connect to V through both 6 and 11. Since the path 3 → 6 → 1 is shorter than the path 3 → 11 → 1, cluster 3 is added into V through edge (3, 6).

Figure 3.2: Decoding method for the CluSPT problem

3.3 Repairing Method

Although the proposed encoding method has many advantages in storing, calculating, and executing evolutionary operators, it has a weakness on incomplete graphs. Because the local roots of clusters are randomly selected, they may not guarantee connectivity between clusters in Step 2 of the decoding method.

An example is depicted in Figure 3.3. Figure 3.3(a) describes the input graph with 19 vertices and 4 clusters. Figure 3.3(b) presents an invalid CluSPT encoding. As shown in Figure 3.3(b), vertices 12, 8, and 19 are assigned to be the local roots of clusters 2, 3, and 4, respectively. However, in the input graph, there is no path from 1 (the source vertex) to 8 and 19 such that each of them is the first visited vertex in its cluster when starting traversal from 1. Therefore, the local root property of the selected vertices is not guaranteed. A repairing method (RIM) is proposed to fix these errors as follows:

Step 3: If there are still clusters outside of V' after Step 2, then do:

(a) Randomly select an edge (u, v) with v ∈ V' and u ∉ V'.

(b) Determine the cluster containing u and change the local root of that cluster to u.

(c) Add the vertices of that cluster to V'.

Step 4: Repeat Step 2 and Step 3 until all clusters are added to V'.
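The repairing loop can be sketched as below. Only Steps 3 and 4 are listed above, so the first two steps in this sketch (start from the cluster containing the source, then absorb every cluster whose local root has an edge into V') are an assumption inferred from the worked example of Figure 3.4. The function name `repair`, the adjacency format, and the toy graph are hypothetical, and each cluster is assumed to be internally connected.

```python
import random


def repair(adj, clusters, roots, s, rng=random):
    """Return a copy of `roots` in which every cluster can be attached via its local root."""
    roots = list(roots)
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    src = cluster_of[s]
    closed = set(clusters[src])  # assumed Step 1: V' starts as the source cluster
    outside = set(range(len(clusters))) - {src}
    while outside:  # Step 4: repeat until every cluster is in V'
        # Assumed Step 2: clusters whose local root has an edge into V'.
        linked = [i for i in outside if any(u in closed for u in adj[roots[i]])]
        if linked:
            for i in linked:
                closed |= clusters[i]
            outside -= set(linked)
            continue
        # Step 3(a): random edge (u, v) with v inside V' and u outside V'.
        frontier = sorted((u, v) for v in closed for u in adj[v] if u not in closed)
        if not frontier:
            raise ValueError("input graph is disconnected")
        u, _ = rng.choice(frontier)
        i = cluster_of[u]
        roots[i] = u             # Step 3(b): re-root the cluster at u
        closed |= clusters[i]    # Step 3(c)
        outside.discard(i)
    return roots


# Hypothetical instance: vertex 4 cannot be a valid local root because its
# only neighbours outside its own cluster lie in cluster 2, not in V'.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
clusters = [{1, 2}, {3, 4}, {5, 6}]
fixed = repair(adj, clusters, [1, 4, 6], s=1)
```

In this example a single position is changed (the local root of the second cluster becomes 3), after which the third cluster attaches through its original local root 6.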

Figure 3.4: An example of the repairing process

An example of the repairing process is shown in Figure 3.4. In Figure 3.4(a), cluster 1, which contains the source vertex, is added into V'. Then, only cluster 2 is directly connected to V' through the edge (1, 12) (highlighted in red). Thus, cluster 2 is added into V', as shown in Figure 3.4(b). Next, since clusters 3 and 4 lack an explicit connection to V' from their local roots, RIM randomly selects an edge that satisfies the condition in Step 3(a). It can be seen that the edges (3, 6), (3, 9), (6, 11), (18, 13), (14, 7), (14, 13), and (11, 14) are eligible for selection. Assume that the edge (11, 14) is chosen. As a result, the local root of cluster 4 is changed to 14, and cluster 4 is added to V' (Figure 3.4(c)). Finally, the algorithm runs through the clusters outside of V' again, in this case only cluster 3. Because cluster 3 can connect to V' through the edge (8, 19), it is then added to V'. The local root property of the resulting encoding is now guaranteed. The computational complexity of the proposed repairing method is O(m × |V|), where m and |V| are the number of clusters and the number of vertices in the input graph, respectively.

Furthermore, to analyze the proposed repairing method's efficacy, the thesis provides and proves a lemma about the maximum number of positions to be fixed for any invalid encoding, as shown in Lemma 3.3.1.

Lemma 3.3.1. The maximum number of positions that must be fixed in the encoding representation is ⌊m/2⌋, with m being the number of clusters.

Proof. Consider each cluster C in the input graph G as a vertex c in a new graph G1. An edge between two vertices c1 and c2 exists in G1 if cluster C1 connects to cluster C2 via its local root in G, or vice versa. Because local roots are randomly selected from the inter-vertex set, each cluster in G always has at least one direct connection to another cluster through its local root. Therefore, each vertex in G1 is always connected to another vertex. In the worst case, G1 is a forest with ⌊m/2⌋ connected components. To connect these components, only one cluster's local root needs to be modified in each component. Therefore, the maximum number of positions that need to be corrected is ⌊m/2⌋.

Besides, from Lemma 3.3.1, it can be concluded that the smaller the number of clusters, the smaller the probability of invalid encodings.

3.4 Algorithmic Structure

In this section, the thesis introduces an approach based on MFEA (called K-MFEA) that uses the proposed encoding and decoding strategy to solve multiple CluSPT tasks simultaneously. Besides, instead of only performing crossover between two parents, the proposed K-MFEA is equipped with a novel multi-parent crossover to enhance knowledge transfer between tasks.

The ith CluSPT task is performed on an input graph Gi = (Vi, Ei, si), i = 1, ..., K, where Vi, Ei, and si are the set of vertices, the set of edges, and the source vertex, respectively. Vi is divided into mi clusters. The jth cluster of the ith task is denoted by Cji, j = 1, ..., mi, and Ci = {C1i, C2i, ..., Cmii}. The local root of the jth cluster of the ith task is denoted by rij.

The proposed algorithm's structure is presented in Algorithm 2, and the implementation steps of the algorithm are discussed in detail in the subsections below.

3.4.1 Unified search space

The USS for K CluSPT tasks is defined on a graph Gu(V, C, m) as follows:

• The number of clusters m = max(m1, m2, ..., mK), where mi is the number of clusters in the ith CluSPT task.

• The jth cluster of Gu is denoted by Cj, and Cj = Cj1∗ ∪ Cj2∗ ∪ ... ∪ CjK∗, where Cji∗ is the inter-vertex set of the jth cluster in the ith task.

in Figure 3.5b and Figure 3.5d, respectively. The remaining vertices all fulfill the requirement of a local root.


Algorithm 2: Proposed K-MFEA Algorithm

    Assign skill factor τi = i % N + 1;
    Construct specific representation sri of individual pi in task τi (refer to Algorithm 4);
    Construct and evaluate the CluSPT solution from sri for task τi only (refer to Section 3.2);
  end
  Update scalar fitness of each individual in P(0);
  while stopping conditions are not satisfied do
    Offspring population Pc(t) ← ∅;
    while |Pc(t)| < N do
      Choose k random individuals pi (i = 1, ..., k) from P(t);
      /* Perform multi-parent crossover operator */
      if (all selected individuals have the same skill factor) or (rand < rmp) then
        oi (i = 1, ..., k) ← Perform the multi-parent crossover on pi (i = 1, ..., k) (refer to subsection 3.4.3);
        Assign randomly the skill factor of the parents to offspring;
      else
        oi ← Perform mutation on each parent pi (i = 1, ..., k) (refer to subsection 3.4.4);
        Assign respectively the skill factor of the parent to offspring;
      end
      Construct specific representation sr for each offspring (refer to Algorithm 4);
      Construct and evaluate the CluSPT solution for the offspring (refer to Section 3.2);
      Pc(t) ← Pc(t) ∪ {oi}, i = 1, ..., k;
    end
    PB(t) ← the top 50% best individuals from P(t);
    R(t) ← Pc(t) ∪ PB(t);
    Update scalar fitness of each individual in R(t);
    P(t + 1) ← Get N fittest individuals from R(t);
  end
end


Figure 3.5: An example of constructing the unified search space


Next, the inter-vertices are merged in the respective clusters. Figure 3.5e shows the obtained USS, where the red area in each cluster denotes the vertices used for the first task, while the orange area marks the vertices for the second task.
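The merging step can be sketched as a cluster-wise union of the inter-vertex sets of all tasks. This is a minimal illustration under assumptions: the function name `unified_search_space` and the nested-list input format are hypothetical, and tasks with fewer clusters are assumed to simply contribute nothing to the higher-numbered clusters.

```python
def unified_search_space(inter_sets_per_task):
    """inter_sets_per_task[i][j] is the inter-vertex set of cluster j in task i."""
    m = max(len(task) for task in inter_sets_per_task)  # m = max(m1, ..., mK)
    unified = [set() for _ in range(m)]
    for task in inter_sets_per_task:
        for j, inter in enumerate(task):
            unified[j] |= set(inter)  # merge cluster j of this task
    return unified


# Hypothetical example with two tasks having 2 and 3 clusters.
task_1 = [{1, 2}, {3}]
task_2 = [{2, 5}, {3, 4}, {7}]
uss = unified_search_space([task_1, task_2])
```

The unified space has as many clusters as the largest task, and a vertex such as 2 that serves both tasks appears only once in its cluster.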

3.4.2 Individual Initialization Method

Each element in a unified individual representation is randomly selected from the respective cluster in Gu(V, C, m). The initialization method is detailed in Algorithm 3, with a computational complexity of O(m).
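A minimal sketch of this initialization, assuming each unified cluster is stored as a list so that a uniform random pick takes constant time; the name `init_individual` and the `rng` parameter are illustrative and not taken from the thesis.

```python
import random


def init_individual(unified_clusters, rng=random):
    """Pick one vertex uniformly at random from each cluster of the unified space."""
    return [rng.choice(cluster) for cluster in unified_clusters]


# Hypothetical unified search space with three clusters.
unified_clusters = [[1, 2, 5], [3, 4], [7]]
individual = init_individual(unified_clusters, random.Random(1))
```

The loop performs one constant-time draw per cluster, which matches the stated O(m) cost.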

Algorithm 3: Initialization Individual Method

is similar to how farmers breed new crops by inheriting the excellent traits from multiple trees: the pest resistance of plant A, the high yield of plant B, and the fruit quality of plant C.

Figure 3.6 shows the offspring obtained after performing the steps of N-MPCX. Figure 3.6(a) presents n (n = 4) input parents and n − 1 random cut-points. The ith offspring preserves two segments of genes from its corresponding parent: the segment from the (i − 1)th cut-point to the ith cut-point, and the segment from the (N − 1)th cut-point to the end. The other segments of the offspring are determined by the gene segments of the next parents in order. The computational complexity of the proposed crossover is O(N × m), with N and m being the number of parents and the number of clusters in the unified search space, respectively.

Figure 3.6: An example of the proposed crossover operator

3.4.4 Mutation Operator

The mutation operator is performed by randomly selecting a cluster and replacing its local root with another inter-vertex in the same cluster. The computational complexity of the mutation method is O(1).
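The mutation can be sketched as follows. The name `mutate` and the list-based cluster format are assumptions. For clarity this sketch filters the candidate clusters and replacement vertices explicitly, so it does not achieve the O(1) cost stated above; a constant-time variant would draw a cluster and a vertex directly and accept the rare case in which the gene is unchanged.

```python
import random


def mutate(individual, unified_clusters, rng=random):
    """Replace the local root of one random cluster by another vertex of that cluster."""
    child = list(individual)
    # Clusters with a single vertex offer no alternative local root.
    candidates = [j for j, c in enumerate(unified_clusters) if len(c) > 1]
    if not candidates:
        return child
    j = rng.choice(candidates)
    child[j] = rng.choice([v for v in unified_clusters[j] if v != child[j]])
    return child


# Hypothetical unified search space and parent encoding.
unified_clusters = [[1, 2, 5], [3, 4], [7]]
parent = [1, 3, 7]
child = mutate(parent, unified_clusters, random.Random(0))
```

The parent is left untouched and the child differs from it in exactly one position.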

3.4.5 Mapping Individual Method

This part presents a method to construct a task-specific individual representation from a unified individual representation.

Algorithm 4: Mapping Individual Method

Input:

• An input graph of the ith task Gi(Vi, Ei, Ci, Ci∗, si, mi).

• Unified search space Gu(V, E, C, m).

• A unified individual representation I = {r1, r2, ..., rm}.

Output: A task-specific individual representation I' = {r'1, r'2, ..., r'mi}

p ← find maximum index of vertex l in Cjh∗, ∀h ∈ [1, ..., K] and (h ≠ i);

sizej ← the number of elements in Cji∗

Let l be the vertex corresponding to the ith element. If l can be found within the ith cluster of the current task, it is selected as the local root. Otherwise, the method locates the maximum index of vertex l among the ith clusters of all other tasks. It then takes the vertex whose index is the remainder of the division of that maximum index by the current cluster's size. The method is described in Algorithm 4, with a computational complexity of O(m²), where m is the number of clusters in the unified search space.
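The mapping rule can be sketched as below. This is an illustrative reading of the description, not Algorithm 4 itself: the name `map_individual`, the 0-based task and cluster indices, and the representation of each inter-vertex set as an ordered list (so that a vertex has a well-defined index) are assumptions. The data are hypothetical, chosen to mirror the kind of case discussed in the text.

```python
def map_individual(unified, task_index, inter_lists):
    """Translate a unified encoding into the encoding of one task.

    inter_lists[h][j] is the ordered inter-vertex list of cluster j in task h.
    """
    own = inter_lists[task_index]
    mapped = []
    for j, cluster in enumerate(own):  # only the first m_i genes are used
        l = unified[j]
        if l in cluster:
            mapped.append(l)  # the vertex is valid for this task as it is
            continue
        # Largest index of l among the j-th clusters of all other tasks.
        p = max(
            other[j].index(l)
            for h, other in enumerate(inter_lists)
            if h != task_index and j < len(other) and l in other[j]
        )
        mapped.append(cluster[p % len(cluster)])
    return mapped


# Hypothetical inter-vertex lists for two tasks with three clusters each.
inter_lists = [
    [[1, 6, 5], [14, 12], [9, 3, 8]],
    [[2, 1], [14, 7], [6, 10]],
]
unified = [1, 14, 6]
mapped_task_0 = map_individual(unified, 0, inter_lists)
mapped_task_1 = map_individual(unified, 1, inter_lists)
```

For the first task, vertex 6 is not available in the third cluster; its index in the other task is 0, so the vertex at index 0 mod 3, namely 9, becomes the local root. The second task can use all three genes unchanged.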


1 ={1, 6, 5}, C1∗

14 exists within the inter-vertex set C21∗ of Cluster 2, it is selected to be the local root of Cluster 2. However, vertex 6 does not appear in the inter-vertex set C31∗, and its maximum index in cluster 3 of the other task is 0. Therefore, vertex 9, having the same index 0 (= 0 mod 3), will be chosen as the local root of Cluster 3. The process for Task 2 is similar. Figure 3.7(b) demonstrates the two task-specific individual representations obtained after applying the mapping method.

Chapter summary

This chapter proposed a novel encoding strategy based on local roots for the CluSPT problem. Notably, the proposed encoding allows multitasking algorithms to function on both complete and sparse graphs and significantly reduces the search space dimension compared to existing encoding approaches. Besides, a multitasking algorithm with a novel multi-parent crossover is designed to enhance knowledge transfer between tasks. To the best of my knowledge, this is the first effort to examine the effectiveness of multi-parent crossover in multitasking algorithms.


to function efficiently on both complete and sparse graphs. Besides, to reduce the effect of blind and random transfers, this chapter introduces a hybrid multitasking algorithm named multifactorial firefly algorithm to tackle multiple Clustered Minimum Routing Cost Tree (CluMRCT) problems at the same time.

4.1 Individual Encoding

There are many methods to represent a spanning tree in the literature. A good encoding strategy offers many benefits for solving complex problems. The most prominent spanning-tree encoding schemes are Edge-set Encoding (ESE) [41] (uses a list of edges), Characteristic Vector Encoding (CSE) [40] (utilizes a binary structure whose size is the number of edges), Predecessor Encoding (PE) [40] (stores the id of the parent node), Prüfer Number Encoding (PNE) [17, 34] (employs an integer vector whose size is less than the number of vertices by 2), Network Random Keys Encoding (NRKE) [42] (based on the priority of edges), and Node-Depth Encoding (NDE) [9, 15] (based on the depth and degree of nodes). These methods, however, still have some shortcomings [9, 40]. For example, they are highly redundant (e.g., NRKE), only work efficiently on simple complete graphs (e.g., PE, PNE, NDE), easily generate infeasible solutions and require complex evolutionary operators (e.g., ESE, CSE, PE, PNE, NDE), or cannot be applied directly to the CluMRCT problem. Besides, due to the lack of information about the source vertex, the proposed encoding for the CluSPT problem in the previous chapter also cannot be applied to
