Master's thesis: Evolutionary and Deep Reinforcement Learning Algorithms for Optimizing the Lifetime of Wireless Sensor Networks


HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

MASTER THESIS

Evolutionary and Deep Reinforcement

Learning Algorithms for Optimizing the

Lifetime of Wireless Sensor Networks

BUI HONG NGOC

ngoc.bh212155m@sis.hust.edu.vn

Major: Data Science (Elitech)


Thesis advisor: Dr. Nguyen Phi Le

Signature of advisor

Department: Department of Computer Science

Institute: School of Information and Communication Technology

Hanoi, 4-2023


SOCIALIST REPUBLIC OF VIETNAM

Independence - Freedom - Happiness

CONFIRMATION OF MASTER'S THESIS REVISIONS

Full name of the thesis author: Bùi Hồng Ngọc

Thesis title: Evolutionary and reinforcement learning algorithms for optimizing the lifetime of wireless sensor networks

Major: Data Science (Elitech)

Student ID: 20212155M

The author, the scientific advisor, and the thesis examination committee confirm that the author has revised and supplemented the thesis according to the minutes of the committee meeting on 28/04/2023, with the following contents:

- Revised the titles of Chapter 4 and Chapter 5 to clarify the content presented.

- Added discussions in Chapter 5 to clarify the differences between the contributions in Chapter 3 and the engineering graduation project of the same author.

- Added discussions in Chapter 4 and Chapter 5 to clarify the motivation of the proposed algorithms and how they differ from previous algorithms.

- Added discussions on the limitations of the assumptions, compared with real-world environments, in the problems of Chapter 4 and Chapter 5.

- Reviewed and corrected typographical errors.


GRADUATION THESIS ASSIGNMENT

Student's information

Name: Bui Hong Ngoc

Phone: 098 490 934

Email: ngoc.bh212155m@sis.hust.edu.vn

Class: Data Science

Affiliation: Hanoi University of Science and Technology

Duration: 10/11/2022 - 22/04/2023

Thesis title: Evolutionary and Deep Reinforcement Learning Algorithms for Optimizing the Lifetime of Wireless Sensor Networks

Thesis statement

This thesis proposes evolutionary and deep reinforcement learning algorithms to tackle two emerging techniques in prolonging the lifetime of wireless sensor networks.

Declarations/Disclosures:

I, Bui Hong Ngoc, hereby warrant that the work and presentation in this thesis are performed by myself under the supervision of Dr. Nguyen Phi Le. All results presented in this thesis are truthful and are not copied from any other works. All references in this thesis, including images, tables, figures, and quotes, are clearly and fully documented in the bibliography. I will take full responsibility for even one copy that violates school regulations.

Hanoi, date month year 2023

Author

Bui Hong Ngoc

Attestation of thesis advisor:

I certify that I have read this thesis and that, in my opinion, it is fully adequate in scope and quality as a thesis for the degree of Master of Science.

Hanoi, date month year 2023

Thesis Advisor

Dr. Nguyen Phi Le


Acknowledgments

I would like to express my gratitude to all those who have motivated and aided me in achieving this significant milestone in my life. My heartfelt appreciation goes out to my thesis advisors, Dr. Nguyen Phi Le and Prof. Do Phan Thuan, whose encouragement and guidance were invaluable in shaping the research questions and methodology. Without their supervision and insightful suggestions, completing this thesis would have been an insurmountable challenge.

I owe a debt of gratitude to my advisor at VinAI, Dr. Nguyen Viet Anh, for his unwavering support throughout my two years at VinAI. His expertise and guidance have enabled me not only to complete this thesis but also to gain the knowledge and skills necessary for my future career. Beyond academic research, he was always available to provide advice and guidance on various aspects of my personal and professional development. His support has been a source of inspiration and motivation for me throughout my journey.

Additionally, the invaluable feedback and suggestions provided by Prof. Huynh Thi Thanh Binh and Dr. Nguyen Thi Tam were essential to the completion of this thesis. Their extensive expertise in the field was crucial to the development of various aspects of this work, and I am deeply grateful for their support and guidance throughout the process. Without their contributions, this thesis would not have been completed.

I also wish to acknowledge my friends at HUST and VinAI, whose unwavering support and companionship have been a source of strength and motivation throughout my time at the university and company. The memories of playing games, watching movies, and enjoying football matches are priceless and have kept me going.

Lastly, I dedicate this thesis to my family: my parents, who have always encouraged me to pursue my dreams; my girlfriend, who has stood by me during my most difficult moments, listening and empathizing with my grievances about the world; and my sister, who may sometimes be a nuisance but always adds color to my life. Together, they have shaped the person I am today.


Evolutionary and Deep Reinforcement Learning Algorithms for Optimizing the Lifetime of Wireless Sensor Networks

Abstract

In recent years, wireless sensor networks (WSNs) have become an essential part of many civilian applications, such as smart agriculture, health monitoring, and smart cities. However, the limited energy capacity of the sensors poses a significant challenge to ensuring continuous surveillance. In this thesis, we study two emerging techniques to prolong the network's lifetime: energy-efficient routing via relay node placement and routing strategies for the mobile charger in wireless rechargeable sensor networks (WRSNs).

The first problem involves optimizing the routing protocol by deploying non-sensing relay nodes (RNs). Recent research has focused either on finding the minimum number of RNs required to reduce deployment costs or on developing efficient routing schemes to reduce energy use. However, striking a balance between these criteria remains a significant challenge. To address this issue, we propose a multi-objective approach to constructing an efficient communication structure with the least possible number of RNs, thereby prolonging the network's lifetime while ensuring its connectivity and reliability. We conduct extensive experiments to demonstrate the effectiveness of our method and show that it provides a better trade-off compared to existing algorithms.

In the second problem, we focus on addressing the challenges faced in WRSNs, where a mobile charger can be employed to move around and charge the sensors. Existing approaches struggle to design an optimal charging path for the mobile charger while accounting for the uncertainties arising in the network, which could come from node failures or deployments. To overcome this challenge, we propose a novel charging scheme that uses a deep reinforcement learning (DRL) approach to guide the mobile charger adaptively. Our approach enables the mobile charger to adapt to spontaneous changes in the network topology. The experiments show the superiority of our model compared to existing on-demand methods in prolonging the network lifetime.

Our proposed solutions for the two problems of relay node placement and mobile charger routing can significantly prolong the network's lifespan and reduce maintenance costs. The potential for combining these two techniques in WRSNs to further enhance the network's sustainability and reliability is an interesting future research direction. Adopting these solutions, especially new advancements in deep reinforcement learning, can facilitate the development of more efficient and effective WSNs, enabling us to better monitor and manage various systems and processes in our daily lives.


Evolutionary and Reinforcement Learning Algorithms for Optimizing the Lifetime of Wireless Sensor Networks

Thesis summary

In recent years, wireless sensor networks have become an essential part of many civilian applications, such as health monitoring, agriculture, and smart cities. However, the limited amount of energy stored in the sensors poses a major challenge to guaranteeing service in applications that require continuous, high-frequency monitoring. In this thesis, we study two new methods for extending network lifetime: energy-efficient routing through the placement of relay nodes, and routing strategies for the mobile charger in wireless rechargeable sensor networks.

The first problem concerns optimizing the routing protocol by deploying non-sensing relay nodes. Recent studies have focused either on finding the minimum number of relay nodes needed to reduce deployment costs or on developing routing strategies that reduce energy use. However, balancing these criteria remains a major challenge. To address this issue, we propose a multi-objective method for constructing an efficient communication structure with the fewest possible relay nodes, thereby extending network lifetime while ensuring connectivity. We conduct experiments to demonstrate the effectiveness of our method and show that it provides a better trade-off than existing algorithms.

In the second problem, we focus on the challenges in wireless rechargeable sensor networks, where a mobile charger can be used to move around and charge the sensors. The key problem is to design an efficient charging strategy for the mobile charger [...] the network by means of new mechanisms in machine learning. The experiments show the superiority of our model over existing methods in extending network lifetime.

Our proposed solutions for the two problems of relay node placement and mobile charger routing can significantly extend the network's lifetime and reduce maintenance costs. Combining these two techniques in rechargeable sensor networks to strengthen the sustainability and reliability of the network is a promising direction for future research. Adopting these solutions, especially new advances in deep reinforcement learning, can pave the way for more efficient sensor network models, helping us better monitor and manage various systems in our lives.


2.1 (Multi-objective) Evolutionary Algorithms

2.1.1 Multi-objective optimization problems

2.1.2 Non-dominated sorting genetic algorithm

2.2 Deep Reinforcement Learning

2.2.1 Reinforcement learning and key concepts

2.2.2 Policy Gradient methods

2.2.3 Attention mechanisms

2.2.4 Pointing mechanism

3 Literature Review

3.1 Network lifetime in WSNs

3.2 Energy-efficient routing via relay node placement

4.2.2 Energy consumption model

4.2.3 Problem formulation

4.3 Proposed method

4.3.1 Solution representation

4.3.2 Initialization

5 A Reinforcement Learning-based Charging Policy in WRSNs

5.3.1 Formulation of the DRL Framework

5.3.2 Model Architecture

5.3.3 Policy Optimization


List of Figures

1.1.1 Wireless sensor network architecture (source: Bali (2018))

2.1.1 Pareto dominance (Source: Verma et al. (2021))

2.1.2 Examples of the non-dominated sorting algorithm and crowding distance calculation (Source: Verma et al. (2021))

3.1.1 A block diagram of the architecture of the sensor node in the WSN

3.3.1 A comparison of offline and online charging schemes

4.2.1 An example of a network with 3 relays, 6 sensors, and max-hop constraint is 3

4.3.1 An example of calculating the maximum number of children. Dashed lines denote potential edges

4.3.2 An example of the energy-oriented mutation. The dashed blue lines denote the potential added edges

4.3.3 An example of the relay-oriented mutation

4.3.4 The distribution of the number of used relays generated by different random-tree algorithms on a graph with 100 relays and 100 sensors

4.4.1 Height heatmaps of terrains

4.4.2 Feasible ratio of the population on various max-hop constraints

4.4.3 Comparison of five algorithms on Nin1

4.4.4 Box plots for C-metric. The rectangle at row A and column B represents C(A, B). Each rectangle includes 12 box plots (left to right) corresponding to 12 instances (Nin? to Nin24). C-metric values are scaled to [0, 1]

4.4.5 The comparison of Pareto fronts on test set S2 (Nin? to Nin12) and S3 (Nin13 to Nin18)

4.4.6 Performance of competing algorithms in different fon network instance Nino

4.4.7 Comparison of hypervolume, delta-metric and ONVG with different communication radius on network instance Nin9

5.1.1 The drawback of the on-demand charging scheme. Node B sends a charging request right after the MC decides to charge node A next

5.2.1 An illustration of wireless rechargeable sensor network for target-covering

5.3.1 Learning model of a reinforcement learning system

5.3.3 Evaluation of the network lifetime of competing algorithms when varying the number of sensors, number of targets, or package generation probability

5.4.1 The comparison of the aggregated energy consumption rate and the number of node failures when increasing the number of sensors, number of


List of Tables

4.4.1 Description of network instances. The last column refers to the density of

4.4.2 The max-hop constraint (h) on each type of dataset. All instances are ensured to have valid solutions

4.4.3 The operator's probability of each algorithm. p_c is the crossover probability and p_m is the mutation probability. For Glrim, p_m represents a pair of energy-oriented and relay-oriented mutation probabilities

4.4.4 Performance of competing algorithms on the set S1. Bold values indicate the best values

4.4.5 Performance of competing algorithms on test sets S2 and S3. Each table shows the results on one metric and bold values indicate the best value

4.4.6 Complexity comparison for each algorithm

4.4.7 Average algorithm running time (in seconds)

5.3.1 State information. The notation S and D indicate static and dynamic in-


List of Acronyms

AI Artificial Intelligence

BS base station

DEM digital elevation model

DL deep learning

DRL deep reinforcement learning

DRL-TC deep reinforcement learning approach for target coverage and con-

IoT Internet of Things

MC mobile charger

MDP Markov decision process

MOEA multi-objective evolutionary algorithm

MOEA/D multiobjective evolutionary algorithm based on decomposition

MOO multi-objective optimization

MST minimum spanning tree algorithm

NEBP Node-Energy Bottleneck problem in multi-hop wireless sensor networks

NetKeys network random keys

NJNP Nearest-Job-Next with Preemption

NSGA-II non-dominated sorting genetic algorithm

PF Pareto front

QoS Quality of Service

RL reinforcement learning

RN relay node

SN sensor node

WRSN wireless rechargeable sensor network

WSN wireless sensor network


List of Notations

AMC battery capacity of the MC

5° battery capacity of a sensor

$\mathcal{R}$ a reward function

$\mathcal{T}$ a transition model

$\gamma$ a discount factor (discount rate)

$\mathcal{A}$ a set of legal actions

$\mathcal{M}$ Markov decision process

$P$ a set of deployed sensors

$Q$ a set of critical targets

$\mathcal{S}$ a state space

ue charging rate

$v$ velocity of the MC

$w^{move}$ ECR of the MC for traveling

$w$ energy consumption rate

$e^{MC}$ residual energy of the MC

$e$ residual energy of a sensor

$m$ number of critical targets

$n$ number of deployed sensors


home automation (Pirbhulal et al., 2016), air/earthquake monitoring (Alphonsa and Ravi, 2016; Kingsy Grace and Manju, 2019), smart agriculture (Sanjeevi et al., 2020), health monitoring (Abdulkarem et al., 2020; Gardašević et al., 2020), and smart cities (Csáji et al., 2017). A WSN typically consists of a few to hundreds or thousands of spatially dispersed and dedicated sensors that monitor specific targets or areas of interest. Each sensor node (SN) is equipped with sensing, processing, and communication capabilities to convert an analog signal of a physical quantity into a digital signal and connect the node to the network. The sensing data is cooperatively transmitted through the wireless network to a base station (BS), also known as a sink, where the data can be observed and analyzed.

Figure 1.1.1: Wireless sensor network architecture (source: Bali (2018))

Such a design allows WSNs to be deployed on the fly and to operate unattended, self-organizing without any pre-existing infrastructure and with little maintenance. The sensor nodes collaborate to achieve a common goal, such as sensing and reporting temperature, humidity, or motion in a specific area. Communication among the sensor nodes is usually achieved through multi-hop routing, where the data is forwarded from node to node until it reaches the base station. Therefore, WSNs can cover large areas and gather data from numerous sources simultaneously, providing real-time monitoring of the target area. Additionally, the use of WSNs can reduce labor and maintenance costs since the SNs are equipped with low-cost, low-power batteries that can operate for extended periods (Akyildiz et al., 2002).

However, the limited energy capacity of the sensors poses a crucial challenge for ensuring continuous surveillance. Once the battery is fully consumed, the sensor can no longer monitor the targets or relay the data, leading to network fragmentation and data loss in some parts of the sensing field. Further, WSNs are often deployed in harsh environments that are inaccessible to humans, such as underground tunnels and battlefields, making it challenging to replace the sensors' batteries. Hence, in most WSN applications, one of the primary objectives is to maximize the network's lifespan while keeping it functional to ensure continuous data transmission and monitoring of the targets (Yetgin et al., 2017).

The last two decades have seen a remarkable effort by researchers to design new paradigms and protocols that prolong the network lifetime. The approaches can be broadly classified into two main categories: sensor functioning optimization and energy replenishment. Sensor functioning optimization aims to increase the efficiency of SNs and reduce energy consumption through methods such as energy-efficient routing (Raj et al., 2019), data aggregation (Goyal et al., 2019), and sensor scheduling (Haimour and Abu-Sharkh, 2019), while energy replenishment focuses on providing external sources of energy, such as energy harvesting (Adu-Manu et al., 2018; Shaikh and Zeadally, 2016) or wireless charging (Kaswan et al., 2022; Qureshi et al., 2022), to the sensor nodes.

One advantage of optimizing sensor functioning is that it can extend network lifespan without added hardware or infrastructure. These techniques can also save energy and be customized for various applications. However, they may not suffice for high-throughput needs that demand real-time data transmission, as this approach can only extend the sensors' lifetime for a certain amount of time. The battery will eventually be exhausted if no external source supplies the sensors. Meanwhile, energy replenishment provides a continuous energy supply to sensors, possibly eliminating the need for battery replacements. This technique works in remote or harsh environments but may require extra hardware or infrastructure, such as energy harvesters or wireless chargers. Hardware costs and maintenance are also potential drawbacks in some applications.

The choice of technique to prolong the network lifetime in wireless sensor networks depends on various factors, including application requirements, energy constraints, and cost considerations. In this thesis, we study two problems, each built on an emerging technique that has attracted significant attention in recent years for its potential to prolong the network lifetime of WSNs. Specifically, we investigate (1) energy-efficient routing via relay node placement and (2) adaptive routing strategies for the mobile charger in wireless rechargeable sensor networks (WRSNs).

Energy-efficient routing via relay node placement. Relay node placement is an essential technique to optimize the functioning of WSNs. The idea is to deploy non-sensing relay nodes (RNs) to increase the network's capability and balance the energy consumption among nodes. As the positions of RNs are often determined after deploying SNs, we can optimize the placement of RNs to balance the energy consumption among nodes, which in turn prolongs the network lifetime. Past research efforts have focused on finding the minimum number of RNs required to reduce deployment costs while ensuring QoS criteria such as network coverage and connectivity. Recent works have also considered its potential to extend the network lifetime. However, striking a balance between these criteria remains a significant challenge. To address this issue, we propose a multi-objective approach to constructing an efficient communication structure (routing tree) with the least possible number of RNs, thereby prolonging the network's lifetime while ensuring its connectivity.

Adaptive charging strategies in WRSNs. Despite remarkable progress in recent years, sensor function optimization and energy harvesting techniques cannot provide reliable service for high-throughput applications requiring continuous surveillance. Recent advancements in wireless charging provide a foundation for a novel scheme: wireless rechargeable sensor networks (WRSNs). The idea is to employ a mobile charger (MC) with a high-power battery to go around and charge the sensors wirelessly. The main challenge here is to design a suitable charging strategy for the mobile charger while accounting for the uncertainties arising in the network. Existing charging schemes either make a strict assumption about constant energy consumption rates or cannot adapt to unpredictable changes in the network topology. To overcome this challenge, we propose a novel charging scheme that uses a deep reinforcement learning (DRL) approach to guide the mobile charger adaptively. Our approach enables the mobile charger to adapt to spontaneous changes in the network topology.

1.2 Thesis contributions

The main contributions of this thesis can be summarized as follows:

- In this study, we examine common approaches for prolonging the network lifetime in wireless sensor networks, which can be classified into two main groups: sensor functioning optimization and energy replenishment. We investigate one emerging issue for each group, which can help in extending the network lifetime further.

- In Chapter 4, we investigate energy-efficient routing techniques by considering the relay node placement as a multi-objective problem. We introduce a novel multi-objective evolutionary algorithm that exploits problem-specific features to find a Pareto front that balances the trade-off between the number of relay nodes and the network's energy consumption. We conduct extensive experiments to demonstrate the effectiveness of our method and show that it provides a better trade-off compared to existing algorithms. The paper summarizing the results is under review in the Soft Computing journal (Bui et al., 2023).

The results were published at the IEEE MASS, 2022 (Bui et al., 20...)

1.3 Thesis outline

The thesis is structured as follows:

In Chapter 2, we provide an overview of the evolutionary and deep reinforcement learning algorithms utilized in this study. We introduce the fundamental concepts and principles of these algorithms to provide readers with the necessary background knowledge for subsequent chapters.

In Chapter 3, we present a comprehensive survey of the network lifetime problem in WSNs. Our survey covers current state-of-the-art works in sensor functioning optimization and energy replenishment approaches. We also provide an in-depth analysis of energy-efficient routing approaches and adaptive charging schemes in WRSNs. This analysis positions our work within the literature.

In Chapter 4, we delve into the relay node placement problem and present our multi-objective framework for addressing this problem. We describe the problem statement and the design of our algorithm, which takes into account problem-specific properties to find a Pareto front that balances the number of relay nodes and the network's energy consumption. We also conduct experiments to demonstrate the effectiveness of our approach and compare it with existing algorithms.

In Chapter 5, we present an adaptive deep reinforcement learning framework for scheduling charging trajectories for mobile chargers in WRSN settings. We describe the design of our approach, which enables the mobile charger to choose the next charging destination adaptively, based on the current energy levels of the sensors and the network's topology. We demonstrate the effectiveness of our model through extensive experiments and compare it with existing on-demand charging methods.

Finally, in Chapter 5.5, we conclude the thesis by summarizing our contributions and highlighting the key findings of our study, and then provide recommendations for future work.


Chapter 2

Background

In this chapter, we present an overview of the evolutionary and deep reinforcement learning algorithms used in the study. The purpose of this chapter is to introduce readers to the basic concepts and principles of these algorithms, which will be necessary for understanding the subsequent chapters. In Section 2.1, we introduce multi-objective optimization (MOO) problems and describe how evolutionary algorithms simulate the process of natural selection to evolve optimal solutions to MOO problems. In Section 2.2, we explain how deep reinforcement learning algorithms combine neural networks and reinforcement learning to enable agents to learn from their experiences and improve their decision-making abilities over time.

2.1 (Multi-objective) Evolutionary Algorithms

2.1.1 Multi-objective optimization problems

Multi-objective optimization (MOO) is a branch of optimization that deals with problems involving multiple conflicting objectives. In MOO, the goal is to find a set of solutions that simultaneously optimize multiple objectives. These objectives often conflict with each other, so no single solution is optimal for all objectives. Mathematically, a multi-objective optimization problem can be formulated as

$$\min_{x \in \mathcal{X}} \; f(x) = [f_1(x), f_2(x), \dots, f_m(x)]$$

where $f_i : \mathcal{X} \to \mathbb{R}$ is the $i$-th objective function. As there are conflicting criteria, we first need a definition of solution domination to compare solutions.

Definition 2.1.1 (Solution domination) A solution $x_a$ is said to dominate another solution $x_b$, denoted $x_a \succ x_b$, if $x_a$ is no worse than $x_b$ in all objectives and is strictly better in at least one objective.

A solution $x_a$ is said to be non-dominated if there is no other solution in the population that dominates $x_a$. The Pareto front represents the set of non-dominated solutions. A solution is Pareto optimal if no other feasible solution dominates it. The Pareto front can be mathematically formulated as follows. Let $x \in \mathcal{X}$ denote a solution vector, where $\mathcal{X}$ is the feasible region. Let $f(x) = [f_1(x), f_2(x), \dots, f_m(x)]$ denote the vector of $m$ objective functions. Then, the Pareto front can be defined as:

$$P = \{x \in \mathcal{X} \mid \not\exists\, x' \in \mathcal{X} \text{ such that } f(x') \succ f(x)\}$$

where $f(x') \succ f(x)$ denotes that $x'$ dominates $x$, meaning that $x'$ is no worse than $x$ in all objectives and strictly better than $x$ in at least one objective. Figure 2.1.1 shows an example of the Pareto front. The Pareto front is a useful tool for decision-making in MOO because it provides a set of trade-off solutions that decision-makers can choose from based on their preferences.

Figure 2.1.1: Pareto dominance (Source: Verma et al. (2021))
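The dominance relation and the Pareto front definition can be illustrated with a minimal Python sketch. It assumes that every objective is minimized; the function names `dominates` and `pareto_front` and the sample (number of relays, energy cost) pairs are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a dominates b (assumption: all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (number of relays, energy cost) pairs, both to be minimized.
candidates = [(3, 9.0), (4, 7.5), (5, 7.5), (6, 4.0), (7, 4.2)]
front = pareto_front(candidates)
print(front)
```

In this example, (5, 7.5) is dominated by (4, 7.5) and (7, 4.2) is dominated by (6, 4.0), so the remaining three points form the trade-off set a decision-maker would choose from.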

2.1.2 Non-dominated sorting genetic algorithm

Multi-objective evolutionary algorithms (MOEAs) are a class of optimization algorithms designed to tackle MOO problems. MOEAs are inspired by the process of natural selection, where individuals with better fitness are more likely to survive and reproduce. In MOEAs, candidate solutions are represented as individuals in a population, and the fitness of each individual is evaluated based on multiple objective functions. The goal of MOEAs is to find a set of Pareto optimal solutions.

The Non-dominated Sorting Genetic Algorithm II (NSGA-II) is one of the most popular MOEAs and uses the genetic algorithm (GA) as a search method. NSGA-II was first introduced by Deb et al. (2002) as an improvement over the traditional GA for multi-objective optimization problems. The core idea behind NSGA-II is to sort the population of candidate solutions into several fronts based on their non-domination relationship. The first front contains the non-dominated solutions, the second front contains solutions that are dominated only by solutions in the first front, and so on. By sorting the population into fronts, NSGA-II is able to maintain a diverse set of Pareto optimal solutions.

Similar to the original genetic algorithm, NSGA-II uses two main genetic operators: crossover and mutation. Crossover is a process of combining two parent solutions to generate a new offspring solution, while mutation is a process of introducing random changes to a solution. NSGA-II uses tournament selection to choose parent solutions for crossover and mutation. In each generation, NSGA-II performs non-dominated sorting to rank the population into fronts and assigns a crowding distance value to each solution in the front. The crowding distance value measures how crowded a solution is in the front based on the distance to its neighboring solutions. The crowding distance value of a solution $x_i$ in front $F_j$ is defined as

$$\text{crowding\_distance}(x_i, F_j) = \sum_{k=1}^{m} \frac{f_k(x_{i+1}, F_j) - f_k(x_{i-1}, F_j)}{f_k^{\max} - f_k^{\min}}$$

where $m$ is the number of objectives, $f_k(x_{i+1}, F_j)$ and $f_k(x_{i-1}, F_j)$ are the objective function values of the neighboring solutions of $x_i$ in front $F_j$ along the $k$-th objective, and $f_k^{\max}$ and $f_k^{\min}$ are the maximum and minimum objective function values, respectively, along the $k$-th objective over the entire population. The crowding distance value ensures that solutions that are diverse in the objective space are preserved in the population. NSGA-II then selects the best solutions from the fronts based on their rank and crowding distance and uses them to generate the next population. The selection operator ensures that solutions in the first front are always selected, and solutions in the subsequent fronts are selected based on their crowding distance value. We provide the pseudocode of NSGA-II in Algorithm 1.

Figure 2.1.2: Examples of the non-dominated sorting algorithm and crowding distance calculation (Source: Verma et al. (2021))
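The two ranking steps described above, sorting into fronts and computing crowding distances, can be sketched in Python as follows. This is a sketch under stated assumptions: all objectives are minimized, the sort peels fronts one at a time rather than using the bookkeeping of the fast non-dominated sort of Deb et al. (2002), and boundary solutions of a front receive an infinite distance, as is customary in NSGA-II. The names `non_dominated_sort` and `crowding_distance` and the sample data are hypothetical.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a dominates b (assumption: all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs: List[Sequence[float]]) -> List[List[int]]:
    """Split solution indices into fronts F_1, F_2, ... by repeated peeling."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        # A solution belongs to the current front if nothing left dominates it.
        front = [i for i in sorted(remaining)
                 if not any(dominates(objs[j], objs[i]) for j in remaining)]
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(objs: List[Sequence[float]], front: List[int]) -> Dict[int, float]:
    """Crowding distance of each index in `front`; boundary solutions get infinity."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[0])):
        f_min = min(o[k] for o in objs)  # normalization over the entire population
        f_max = max(o[k] for o in objs)
        ordered = sorted(front, key=lambda i: objs[i][k])
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if f_max == f_min:
            continue  # objective k is constant, so it contributes nothing
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][k] - objs[prev][k]) / (f_max - f_min)
    return dist


objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
fronts = non_dominated_sort(objs)
distances = crowding_distance(objs, fronts[0])
print(fronts)
print(distances)
```

Here the first front contains solutions 0, 1, and 2. The interior solution 1 receives a finite distance (3/4 from the first objective plus 4/4 from the second), while the two extremes are marked as infinitely uncrowded so that they are never discarded first.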

Algorithm 1 Non-dominated Sorting Genetic Algorithm II (NSGA-II)

Input: Population size N, number of generations T, crossover probability p_c, mutation probability p_m, selection operator

Initialize population P_0 with N individuals
Evaluate objective functions for each individual in P_0
t ← 0
while t < T do
    Q_t ← Create empty population
    R_t ← Create empty population
    P'_t ← Perform tournament selection on P_t
    for i ← 1 to N by 2 do
        Select parents x_i, x_{i+1} from P'_t using binary tournament selection
        if Random number < p_c then
            Apply crossover to x_i and x_{i+1} to create two offspring
        else
            Copy x_i and x_{i+1} as the two offspring
        end if
        for j ← i to i + 1 do
            if Random number < p_m then
                Mutate offspring j
            end if
            Evaluate offspring j and add it to Q_t
        end for
    end for
    Merge P_t and Q_t into R_t
    Perform non-dominated sorting on R_t to obtain fronts F_1, F_2, ...
    Set P_{t+1} ← ∅, i ← 1
    while |P_{t+1}| + |F_i| ≤ N do
        Add all individuals in F_i to P_{t+1}; i ← i + 1
    end while
    Sort remaining individuals in F_i by crowding distance
    Add top (N − |P_{t+1}|) individuals in F_i to P_{t+1}
    t ← t + 1
end while
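The survivor-selection part of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming minimization of every objective and a plain quadratic-time sorting routine rather than the faster bookkeeping used in production NSGA-II implementations. The function names (`dominates`, `non_dominated_sort`, `crowding_distance`, `select_next_population`) and the sample objective vectors are hypothetical and chosen only for this example.

```python
from math import inf

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(objs):
    """Return fronts as lists of indices; the first front is non-dominated."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(objs[j], objs[i])
                                  for j in remaining if j != i))
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding_distance(objs, front):
    """Crowding distance per index in one front; boundary points get inf."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        # Normalize by the objective range over the entire population.
        f_min = min(o[k] for o in objs)
        f_max = max(o[k] for o in objs)
        dist[ordered[0]] = dist[ordered[-1]] = inf
        if f_max == f_min:
            continue
        for pos in range(1, len(ordered) - 1):
            gap = objs[ordered[pos + 1]][k] - objs[ordered[pos - 1]][k]
            dist[ordered[pos]] += gap / (f_max - f_min)
    return dist

def select_next_population(objs, n):
    """Fill the next population front by front, then by crowding distance."""
    chosen = []
    for front in non_dominated_sort(objs):
        if len(chosen) + len(front) <= n:
            chosen.extend(front)
        else:
            dist = crowding_distance(objs, front)
            ranked = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(ranked[: n - len(chosen)])
            break
    return chosen

objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
fronts = non_dominated_sort(objs)
dist = crowding_distance(objs, fronts[0])
chosen = select_next_population(objs, 2)
```

With only two survivor slots, the two boundary solutions of the first front are kept, because their infinite crowding distance marks them as the extremes of the trade-off.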

2.2.1 Reinforcement learning and key concepts

Reinforcement learning (RL) is one of the fundamental paradigms of machine learning, alongside supervised learning and unsupervised learning. It involves an agent that interacts with an environment, receiving rewards for its actions. The objective of RL is to enable the agent to learn a policy that maximizes the rewards it receives by iteratively trying different strategies and receiving feedback. This way, the agent can adapt to changes in the environment and improve its performance over time. Figure 2.2.1 illustrates the basic framework of reinforcement learning.

In RL problems, an agent interacts with an environment over a sequence of discrete time steps. At each time step, the agent observes a state of the environment, takes an action, receives a reward, and transitions to a new state. The state of the environment at each time step depends only on the previous state and action and not on any states or actions that occurred before that.

RL problems are often represented as Markov Decision Processes (MDPs), which are characterized by a set of states $S$, a set of actions $A$, transition probabilities $P$, a reward function $R$, and a discount factor $\gamma$. The state space is the set of all possible states that the environment can be in. The action space is the set of all possible actions that the agent can take. The transition probabilities describe the probability of transitioning from one state to another when taking a particular action. The reward function maps each state-action pair to a scalar reward. The discount factor is a parameter that determines the relative importance of future rewards.

The goal of an RL agent in an MDP is to learn a policy $\pi$, which is a mapping from states to actions, that maximizes the expected cumulative reward over time. The optimal policy is the policy that maximizes the expected cumulative reward for all possible initial states. To maximize the cumulative rewards, the agent needs to determine an appropriate policy $\pi(x)$ that selects the best action for each state, as well as a value function $V(x)$ that estimates the future rewards that will be obtained by following the policy. The interaction between the agent and the environment involves a sequence of actions that cause the environment to change state. This sequence can be described as an episode that ends when the environment reaches a terminal state. At each time step $t \in \{1, 2, \ldots, T\}$, the agent observes the current state $X_t$, takes action $A_t$, and receives a reward $R_{t+1}$. The trajectory is the resulting sequence of states, actions, and rewards, ending at the terminal state $X_T$.

The model gives all the necessary information about an environment: the transition probability function $P$ and the reward function $R$. At state $x_t$, the agent chooses an action $a_t$, which leads to a new state $x_{t+1}$ and yields a reward $r_{t+1}$. This is a transition step $(x_t, a_t, x_{t+1}, r_{t+1})$. The probability of this transition is

$$P(x_{t+1}, r_{t+1} \mid x_t, a_t) = \mathbb{P}[X_{t+1} = x_{t+1}, R_{t+1} = r_{t+1} \mid X_t = x_t, A_t = a_t] \quad (2.3)$$

The policy $\pi$ models the agent's behavior at a state $x$: $\pi(a \mid x) = \mathbb{P}_\pi[A = a \mid X = x]$. The return $G_t$ is the accumulated discounted reward: $G_t = r_{t+1} + \gamma r_{t+2} + \ldots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$. The discount factor $\gamma \in [0, 1]$ determines how much the agent cares about rewards in the distant future relative to those in the immediate future. The value function is the expected return of state $x$ at time $t$, which can be calculated by:

$$V_\pi(x) = \mathbb{E}_\pi[G_t \mid X_t = x]$$

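The action-value function (Q-function) is defined analogously as the expected return of a state-action pair:

$$Q_\pi(x, a) = \mathbb{E}_\pi[G_t \mid X_t = x, A_t = a]$$

The following equation draws the connection between the value function and the Q-function:

$$V_\pi(x) = \sum_{a \in A} Q_\pi(x, a)\, \pi(a \mid x)$$

In some cases, it is favorable to use the difference in return of a state-action pair compared to the expected return of that state. We define that difference as the advantage value:

$$A_\pi(x, a) = Q_\pi(x, a) - V_\pi(x)$$

Optimal value and policy. The optimal value function produces the maximum return:

$$V_*(x) = \max_\pi V_\pi(x), \quad Q_*(x, a) = \max_\pi Q_\pi(x, a) \quad (2.8)$$

The optimal policy achieves optimal value functions:

$$\pi_* = \arg\max_\pi V_\pi(x), \quad \pi_* = \arg\max_\pi Q_\pi(x, a)$$

Thus, $V_{\pi_*}(x) = V_*(x)$ and $Q_{\pi_*}(x, a) = Q_*(x, a)$.

Bellman equations. The Bellman equation expresses the relationship between the value of a state and the values of its successor states. It states that the value of a state is equal to the immediate reward obtained by taking an action in that state plus the discounted value of the next state that the agent transitions to.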
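To make the Bellman relationship concrete, the sketch below repeatedly applies the Bellman optimality backup, $V(x) \leftarrow \max_a \sum_{x'} P(x' \mid x, a)\,[r + \gamma V(x')]$, until the values stop changing (value iteration). The three-state MDP, its rewards, and the names `value_iteration` and `greedy_policy` are hypothetical and serve only as an illustration; the thesis itself does not prescribe this procedure.

```python
def value_iteration(P, gamma=0.9, tol=1e-10):
    """P[x][a] is a list of (probability, next_state, reward) triples."""
    V = {x: 0.0 for x in P}
    while True:
        delta = 0.0
        for x, actions in P.items():
            if not actions:  # terminal state: its value stays 0
                continue
            # Bellman optimality backup: immediate reward plus the
            # discounted value of the successor state, maximized over actions.
            best = max(
                sum(p * (r + gamma * V[y]) for p, y, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[x]))
            V[x] = best
        if delta < tol:
            return V

def greedy_policy(P, V, gamma=0.9):
    """Pick, in every non-terminal state, the action with the best backup."""
    policy = {}
    for x, actions in P.items():
        if actions:
            policy[x] = max(
                actions,
                key=lambda a: sum(p * (r + gamma * V[y]) for p, y, r in actions[a]),
            )
    return policy

# Toy MDP (assumed): stopping in A pays 1 now, going on pays 5 one step later.
P = {
    "A": {"go": [(1.0, "B", 0.0)], "stop": [(1.0, "end", 1.0)]},
    "B": {"go": [(1.0, "end", 5.0)]},
    "end": {},
}
V = value_iteration(P)
policy = greedy_policy(P, V)
```

In this toy case the discounted later reward (0.9 × 5 = 4.5) exceeds the immediate reward of 1, so the greedy policy chooses "go" in state A.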

Traditional RL methods use a combination of model-free techniques (e.g., Q-learning, SARSA) and model-based approaches (e.g., dynamic programming) to learn optimal policies in environments with discrete and small state-action spaces.

Deep reinforcement learning (DRL) is a recent extension of RL that leverages deep neural networks to enable learning in high-dimensional state and action spaces. In DRL, the agent learns to directly map raw sensory inputs (e.g., pixel values in images) to actions without relying on hand-engineered features. This is achieved by combining deep neural networks with traditional RL algorithms, such as Q-learning and policy gradient methods. As this thesis uses actor-critic algorithms, which build on policy gradient methods, we provide background knowledge about policy gradient methods in the following section.

2.2.2 Policy Gradient methods

Policy gradient methods are a class of reinforcement learning algorithms that aim to optimize policies directly rather than computing a value function and then deriving a policy from it. These methods have gained popularity recently due to their ability to handle continuous action spaces and their success in achieving state-of-the-art results in various applications, including game-playing, robotics, and natural language processing.

Policy gradient methods learn the policy directly with a function parameterized by $\theta$, $\pi(a \mid x; \theta)$. We train the agent to maximize the expected return. Specifically, in continuous space with $X_0$ as the initial starting state:

$$J(\theta) = V_{\pi_\theta}(X_0)$$

We maximize the objective function $J(\theta)$ using the gradient ascent method. The gradient of $J$ can be computed using the Policy Gradient Theorem (Sutton and Barto, 2018) as follows:

$$\nabla J(\theta) = \mathbb{E}_\pi[\nabla \ln \pi(a \mid x; \theta)\, Q_\pi(x, a)]$$

Having the gradient, we can optimize policies directly using gradient ascent algorithms. However, early policy gradient methods suffered from high variance, which made it difficult for them to converge to optimal policies.
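The gradient ascent step can be illustrated on the simplest possible case: a single-state problem (a bandit) with a softmax policy over three actions. This is a sketch under stated assumptions, not the training procedure used later in the thesis. With one state, $Q_\pi(x, a)$ reduces to the fixed reward of action $a$, so the expectation in the policy gradient can be evaluated exactly instead of being sampled. The reward values, step size, and function names are hypothetical.

```python
import math

def softmax(theta):
    """Numerically stable softmax over a list of preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def policy_gradient(theta, rewards):
    """Exact gradient of J(theta) = sum_a pi(a) r(a) via the score function.

    For a softmax policy, d ln pi(a) / d theta_k = 1[k == a] - pi(k).
    """
    pi = softmax(theta)
    grad = [0.0] * len(theta)
    for a, (p_a, r_a) in enumerate(zip(pi, rewards)):
        for k in range(len(theta)):
            score = (1.0 if k == a else 0.0) - pi[k]
            grad[k] += p_a * score * r_a  # E_pi[grad ln pi(a) * Q(a)]
    return grad

rewards = [1.0, 2.0, 0.5]   # assumed rewards; action 1 is the best
theta = [0.0, 0.0, 0.0]     # start from a uniform policy
for _ in range(500):
    g = policy_gradient(theta, rewards)
    theta = [t + 0.5 * gk for t, gk in zip(theta, g)]  # gradient ascent
pi = softmax(theta)
```

After the updates, most of the probability mass sits on the highest-reward action. In practice the expectation is estimated from sampled trajectories, which is where the variance problem mentioned above originates.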

To address this issue, actor-critic methods combine a policy network (the actor) with a value function network (the critic) to provide a more stable estimate of the policy gradient. Specifically, two networks are maintained in actor-critic methods. One network, the critic, learns the value function and is denoted $V_\phi$; the other, the actor, learns the mapping between states and actions directly and is denoted $\pi_\theta$. The critic is used to criticize the actions made by the actor, and the actor adjusts its parameters in the direction suggested by the critic. The gradient of the actor is now given by:

Over the last decade, the encoder-decoder architecture has emerged as one of the most prominent deep learning architectures. Originally introduced to solve the problem of mapping fixed-length input to output in sequence-to-sequence learning, the vanilla encoder-decoder encodes a variable-length input sequence to an internal, fixed-dimensional representation. This representation is then used by an RNN-based decoder to produce a variable-length output until a termination criterion is detected. However, a major drawback of this approach is its inability to remember long sentences. To overcome this limitation, attention mechanisms were introduced. Attention allows the decoder to use any of the encoder's hidden states instead of relying on the fixed-length representation produced by the encoder. This is achieved by creating shortcuts that combine the entire input sequence into a context vector, with weights assigned to represent how much attention is devoted to each input. Mathematically, let us denote the encoder and decoder hidden states by $(e_1, e_2, \ldots, e_n)$ and $(d_1, d_2, \ldots, d_m)$. The context vector at decoding time $i$ is given by

$$c_i = \sum_{j=1}^{n} a_j^i e_j$$

where $a^i$ is an alignment of the input vector, which is calculated by:

$$a_j^i = \mathrm{softmax}(u_j^i)$$

where $u_j^i$ scores how well the encoder state $e_j$ matches the decoder state $d_i$.

This context vector $c_i$ is later concatenated with the decoder state $d_i$ to make a prediction or compute a hidden vector for the next steps of the recurrent model.
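The context-vector computation can be sketched as follows. The scoring function used here is a plain dot product between each encoder state and the decoder state, which is an assumption made for brevity; learned scoring functions with trainable weights are common in practice. The function names and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attention_context(encoder_states, decoder_state):
    """Return (weights, context) for one decoding step.

    Scores are dot products (assumed); weights are their softmax;
    the context is the weighted sum of encoder states.
    """
    scores = [sum(e_k * d_k for e_k, d_k in zip(e, decoder_state))
              for e in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * e[k] for w, e in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_context(encoder_states, [0.0, 2.0])
```

Here the second and third encoder states align equally well with the decoder state, so they receive equal weight, while the first state contributes little to the context vector.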

2.2.4 Pointing mechanism

The pointing mechanism is a technique first proposed in (Vinyals et al., 2015) to produce discrete outputs that correspond to positions in the input. For example, in the Traveling Salesman Problem, a combinatorial problem, the solution is a permutation of the input positions.

In (Vinyals et al., 2015), the authors proposed a Pointer network, which is an encoder-decoder LSTM model. The input, a sequence of node positions, is encoded by an LSTM encoder. In the decoder, instead of blending the encoder hidden states $e_j$ into a context vector $c$ at each decoder step, a reduction of the attention mechanism is used to point to a member of the input sequence to be selected as the output:

$$a_j^i = \mathrm{softmax}(u_j^i), \quad j \in \{1, 2, \ldots, n\} \quad (2.19)$$

where $a_j^i$ is considered as the probability of selecting input $j$ in decoder step $i$. The node with the highest probability is chosen to be visited next. The procedure is iteratively repeated to obtain the final solution.
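The decoding loop described above can be sketched without any neural network by replacing the learned scores $u_j^i$ with a stand-in. In the example below the stand-in is the negative distance from the last visited node, which is purely an assumption for illustration. Masking already-visited positions so that the output is a valid permutation is also an assumption of this sketch. All names are hypothetical.

```python
import math

def pointer_probs(scores, visited):
    """Softmax over input positions, with visited positions masked out."""
    masked = [(-math.inf if j in visited else s) for j, s in enumerate(scores)]
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]  # exp(-inf) evaluates to 0.0
    z = sum(exps)
    return [e / z for e in exps]

def greedy_pointer_decode(score_fn, n):
    """At each step, point to the unvisited input with the highest probability."""
    tour, visited = [], set()
    for _ in range(n):
        probs = pointer_probs(score_fn(tour), visited)
        nxt = max(range(n), key=lambda j: probs[j])
        tour.append(nxt)
        visited.add(nxt)
    return tour

points = [(0.0, 1.0), (5.0, 5.0), (1.0, 1.0), (4.0, 5.0)]

def distance_scores(tour):
    """Stand-in for learned scores: closer nodes get higher scores."""
    last = points[tour[-1]] if tour else (0.0, 0.0)
    return [-math.hypot(p[0] - last[0], p[1] - last[1]) for p in points]

tour = greedy_pointer_decode(distance_scores, len(points))
```

The result is a permutation of the input indices, which is exactly the kind of discrete output the pointing mechanism is meant to produce.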


et al., 2002). Figure 3.1.1 provides a simplified diagram of the sensor node architecture.

Figure 3.1.1: A block diagram of the architecture of the sensor node in the WSN

Sensor nodes are generally low-cost, so their batteries have low capacity and are usually non-renewable. Therefore, extending the network's lifetime is a crucial factor that determines the overall efficiency and effectiveness of these networks. The network lifetime is generally defined as the time until either coverage or connectivity is lost (Zhao and Gurusamy, 2008). Numerous efforts have been dedicated to extending the lifespan of wireless sensor networks (WSNs), which can be classified into two main groups: sensor functioning optimization and energy replenishment.

Sensor functioning optimization focuses on enhancing the efficiency of sensor nodes while reducing their energy consumption. Various methods have been proposed in this area, such as data reduction, which involves removing redundant or unnecessary data to reduce the amount of information transmitted and, consequently, the energy consumption (Goyal et al., 2019). Another approach is sleep/wakeup schemes, which enable sensors to alternate between sleep and active modes based on predefined schedules or events, thereby reducing energy consumption during idle periods (Haimour and Abu-Sharkh, 2019). Additionally, energy-efficient routing protocols have been developed to minimize energy consumption by reducing the number of hops between nodes and selecting paths with low energy costs (Raj et al., 2019).

These methods have the advantage of being able to prolong the network lifetime without requiring additional hardware or infrastructure and can be implemented at the software level. They also have the potential to achieve significant energy savings and can be tailored to suit different application requirements. However, these techniques may not be sufficient for high-throughput applications that require real-time or continuous data transmission, as this approach can only extend the sensors' lifetime for a certain amount of time. The battery will eventually be exhausted if no external source supplies the sensors.

On the other hand, energy replenishment aims to provide external sources of energy to the sensor nodes to extend their lifetime. One promising approach is energy harvesting, which involves converting ambient energy from the environment, such as solar, thermal, or kinetic energy, into electrical energy to power the sensors (Adu-Manu et al., 2018). Nevertheless, this technique depends heavily on an ambient source that is usually unstable and uncontrollable. Another approach is wireless charging, which involves wirelessly transmitting energy to the sensors using electromagnetic waves or magnetic resonance. The idea is to employ one or more mobile chargers (MCs), each equipped with a high-capacity battery and a transmission coil, to travel around the sensing field and charge the sensors wirelessly (Qureshi et al., 2022).

Energy replenishment techniques can provide a continuous supply of energy to the sensor nodes without the need for battery replacements. They also have the advantage of being able to operate in remote or harsh environments where traditional power sources may not be available. However, these techniques may require additional hardware and infrastructure, such as energy harvesters or wireless chargers. Moreover, the cost and maintenance of such hardware and infrastructure may be a concern in some applications.

In the following sections, we focus on two problems: energy-efficient routing via relay node placement and charging policies for WRSNs.

3.2 Energy-efficient routing via relay node placement

Energy-efficient routing is a well-established research area that aims to reduce energy consumption while maintaining reliable data delivery (Behera et al., 2022, Raj et al., 2019). Traditional approaches mostly focused on the existing structure of the WSNs to determine the routing protocols. Popular ones include hierarchical routing protocols, data-centric routing protocols, and location-based routing protocols. Hierarchical routing protocols like LEACH (Heinzelman et al., 2000) divide the network into clusters to minimize energy consumption, while data-centric routing protocols like Directed Diffusion (Intanagonwiwat et al., 2003) route data based on the content of the message. Location-based routing protocols like GPSR (Karp and Kung, 2000) use location information to make routing decisions. However, the effectiveness of these traditional routing protocols can be limited by the existing network topology and node density.

In recent years, there has been a growing interest in relay node placement strategies to reduce energy consumption in wireless sensor networks (Verma et al., 2015). The idea is to deploy additional non-sensing relay nodes to increase the network's capability and balance the energy consumption among nodes. Originally, relay node placement was designed to enhance the QoS criteria of WSNs, such as network connectivity and fault tolerance (Hanh et al., 2019, Lee et al., 2015, Ma et al., 2015, Sheikhi et al., 2021). Recently, relay node placement has been considered more for its potential to help balance the energy consumption among nodes, which in turn prolongs the network lifetime (Tam et al., 2020). Tam et al. (2020) considered two objectives: minimizing the number of relay nodes and minimizing the maximum node energy consumption to prolong the network lifetime. They proposed a weighted-sum approach to finding a routing tree that maximizes the network's lifetime with minimum additional relay nodes. Although their work is limited to 2-hop WSNs and employs only the weighted-sum algorithm, it shows the potential of applying relay node placement in prolonging the network lifetime. This thesis aims to extend the existing results of Tam et al. (2020) to multi-hop networks with a novel objective-oriented multi-objective algorithm. We discuss this problem in Chapter 4.

Figure 3.3.1: A comparison of offline and online charging schemes: (a) offline charging scheme, (b) online charging scheme.

Wireless Rechargeable Sensor Networks (WRSNs) are a subset of WSNs that use wireless charging technology to replenish the sensor nodes' energy. This technology provides a promising solution to one of the most significant challenges facing WSNs, which is the limited energy capacity of the sensors. WRSNs consist of a mobile charger that traverses the sensing field and wirelessly charges the sensors that need energy. This approach ensures continuous operation of the WSN without requiring manual battery replacements, making it ideal for remote or harsh environments where accessing the sensor nodes is difficult. However, constructing an efficient charging policy for mobile chargers (MCs) to meet the dynamic charging requirements of the sensors is one of the most critical challenges in WRSNs. Various proposals have been put forward to address this challenge, and they can be broadly classified into two main categories: offline charging schemes and online charging schemes (Figure 3.3.1).

Offline charging scheme. Offline charging schemes aim to optimize the charging policy before the charging process begins, usually by considering the expected workload, the capacity of the MCs, and the energy demand of the sensors. This approach enables the MC to charge the sensors with the optimal charging schedule, minimizing the overall energy consumption and extending the network lifetime. Lyu et al. (2019) propose a periodic charging planning for mobile Wireless Charging Equipment with limited traveling energy. Because the problem is NP-hard, they propose a Hybrid Particle Swarm Optimization Genetic Algorithm (HPSOGA). In (Jiang et al., 2017), the authors jointly consider charging tour planning and MC depot positioning for large-scale WSNs. Their method consists of charging tour planning, candidate depot identification and reduction, depot deployment, and charging tour assignment. The charging scheme also considers the association between the MC charging cycle and the sensor nodes' lifetime. Ma et al. (2018b) aim to minimize the sensor energy expiration time and the charging tour length of the mobile charger. They develop an approximation algorithm for the charging utility maximization problem if the energy consumption of the mobile charger on its charging tour is negligible, and otherwise an efficient heuristic through a non-trivial reduction from a length-constrained utility maximization problem. However, this approach requires a strict assumption about the constant energy consumption rate of sensor nodes, which is unrealistic in practice.

Online charging scheme. Online charging schemes determine the optimal charging policy while the charging process is ongoing, usually by monitoring the energy levels of the sensors and the remaining energy of the MC. Online charging schemes are generally more flexible and adaptive, as they can adjust the charging policy in response to changing network conditions. In the on-demand charging problem, sensor nodes request charging from the MC when their energy is depleted or falls under a predefined threshold. The MC maintains a pool of these requests and determines the next sensor to charge among the requesting nodes in the pool. The NJNP algorithm (He et al., 2013) charges the closest sensor node in the queue, while DWDP (Lin et al., 2019) uses double warning thresholds and double preemption to optimize charging priorities and recharge deadlines. ESync (Fu et al., 2015) constructs nested TSP tours to reduce travel distance and charging delay, and TSCA (Lin et al., 2017) minimizes the number of failed nodes while maximizing energy efficiency. Kaswan et al. (2018) present a Linear Programming (LP) formulation for the on-demand scheduling problem and then introduce an efficient solution based on a gravitational search algorithm (GSA) to tackle the problem. PA and INMA (Zhu et al., 2018) are two efficient online charging algorithms that first consider dynamic energy consumption rates based on their history statistics and real-time energy consumption. Recently, reinforcement learning-based algorithms (Cao et al., 2021, La et al., 2020) have also been considered for designing on-demand charging schemes. However, a common drawback of the above on-demand algorithms is their dependence on the chosen threshold for the charging requests, which makes them sensitive to that setting. We propose in this thesis a novel adaptive charging scheme that eliminates the on-demand charging request and uses deep reinforcement learning to train the policy. The details will be discussed in Chapter 5.

Chapter 4

An Evolutionary Algorithm for Optimal

Node Placement

This chapter investigates the energy-efficient routing problem in WSNs through relay node placement. Existing approaches solely focus on minimizing the number of used relay nodes without considering energy consumption among nodes. We propose a relay node placement approach in multi-hop wireless sensor networks with two objectives: minimizing the number of used relay nodes and minimizing the maximum node energy consumption. The first objective is to restrict the deployment costs, while the second is to balance the energy consumption among nodes, which in turn extends the network's lifetime. To improve the network's reliability, we also consider a hop count bound, which acts as a delay constraint for transmitting packets. To solve our problem, we propose a novel objective-oriented multi-objective evolutionary algorithm (MOEA) that leverages the problem-specific properties to improve the algorithm's convergence rate. Simulation results on 3D datasets show that our algorithm outperforms existing algorithms across the measured metrics.

However, optimizing the placement of RNs is a challenging task as it often involves multiple conflicting criteria. There are two variants of relay node placement problems in the literature: unconstrained and constrained problems. In the former problem, relay nodes can be placed anywhere in the terrain. In contrast, to avoid unrealistic relay deployments due to physical constraints, the latter restricts the position of the additional relay nodes to certain locations, which are determined in advance. However, both typically form NP-hard problems (Lloyd and Xue, 2006, Misra et al., 2009); thus, in practical settings, it is hard to obtain an optimal solution in a reasonable amount of time.

Numerous works have been carried out to approximate the placement of RNs while ensuring several constraints of connectivity and fault tolerance. Ma et al. (2015) studied the constrained relay node placement in WSNs and proposed a connectivity-aware local search algorithm to find the minimum number of relay nodes so that each sensor is covered by at least one relay node. Lee et al. (2015) assured the fault tolerance of a partitioned WSN by establishing a bi-connected inter-partition topology while still deploying the least count of relay nodes. Bagaa et al. (2017) leveraged a Rayleigh block-fading channel and weighted communication graph to construct a routing tree with a minimum number of additional relay nodes. Hanh et al. (2019) introduced a multi-objective problem that simultaneously considers the target coverage, connectivity, and fault tolerance of WSNs. Sheikhi et al. (2021) proposed a two-phase approach to provide multi-path routing and fault tolerance with higher network connectivity in heterogeneous WSNs.

Most of the above works consider multi-hop communication to ensure connectivity and prevent long-hop transmission. However, unlimited hop communication could increase the network's latency and reduce its reliability (Bhattacharya and Kumar, 2014). As it is difficult to measure network latency before node deployments, a hop count bound is often used as a surrogate constraint for network latency (Bhattacharya and Kumar, 2014, Liang et al., 2019, Ma et al., 2017, 2018a). Bhattacharya and Kumar (2014) first studied a constrained relay placement problem with hop count bound and showed its NP-hardness. They then proposed a polynomial time approximation algorithm for the problem. Ma et al. (2017, 2018a) also used the hop count to measure delay and reliability and formulated the 2-connected hop-constrained relay node placement (HCRNP) problem. Two approximation algorithms are proposed to solve this problem. Liang et al. (2019) later conducted extensive real-world deployments of WSNs using existing algorithms and then proposed a Set-Covering-based Algorithm (SCA) to ensure the quality of communication in the network with a hop count bound as a delay constraint.

Despite the promising results, a drawback of the aforementioned works is that they do not consider the energy consumption of nodes in the placement. The energy consumption of the nodes in a WSN is well-known to be imbalanced since it depends heavily on the number of relayed packets and the distance to the next node in the network topology (Guleria and Verma, 2019). Thus, balancing the load among nodes is essential to prolong the network's lifetime. However, the network's lifetime and the cost of deploying additional relay nodes are two conflicting criteria. Deploying more relay nodes typically increases the network's capability and provides more possibilities for load balancing but induces more cost in the deployment. Recently, Tam et al. (2020) first considered these two objectives in their design. They proposed a weighted-sum approach to finding a routing tree that maximizes the network lifetime with minimum additional relay nodes. However, they only consider 2-hop WSNs, which are only suitable for small networks. Additionally, the weighted-sum strategy must make certain assumptions when assigning weight values regarding how 'important' a criterion is compared to the other.

To overcome these issues, we introduce a novel problem, called the Node-Energy Bottleneck Problem (NEBP), which considers both objectives: minimizing the number of relays used and maximizing the network lifetime in a multi-hop setting with a hop count bound. This chapter aims to establish a communication structure (routing tree) that balances communication among nodes with minimal additional RNs. The main difference compared to Tam et al. (2020) is that we consider a multi-hop scheme with a delay constraint by limiting the maximum number of communication hops for each SN towards the BS. Moreover, we focus instead on using MOEAs to optimize the two objectives simultaneously.

MOEAs are favorable for their ability to provide Pareto fronts of non-dominated solutions in the objective function space. These Pareto fronts allow decision-makers to select a solution that fits them best. In evolutionary-based approaches, a population of candidate solutions is maintained and evolved toward better solutions. There are two main types of representation of an individual in the population: indirect and direct.

In an indirect representation, the candidate solutions are mapped to a different space where standard crossover and mutation operators can be applied. As we are focusing on constructing a routing tree, a standard approach could use Prüfer encoding (Prüfer, 1918), link and node biased encoding (Palmer and Kershenbaum, 1994), or Network random keys (NetKeys) (Rothlauf et al., 2002) as the solution encoder. Recently, Prakash et al. (2020) leveraged a permutation encoding with a heuristic decoder to propose a hybrid multi-objective evolutionary algorithm (HMOEA) to find a minimal spanning tree with a minimum diameter. The advantage of indirect representation is that we can adopt standard operators directly on the solution representation (Nayyar et al., 2018). However, most of these representations suffer from low locality (small changes in the code can lead to large changes in the decoded tree) (Prüfer, 1918), or infeasible and redundant representations (Prakash et al., 2020, Rothlauf et al., 2002).

On the other hand, direct representations can use a simple encoding method such as edge-set encoding (Raidl and Julstrom, 2003) and then perform problem-specific crossover and mutation directly on phenotypes to create new offspring (Rothlauf, 2006). The main advantage of this scheme is the ability to apply a heuristic to guide search operators (Hao and Lin, 2017). Therefore, in this chapter, we use edge-set encoding to represent the solutions and then propose novel crossover and mutation operators to solve the Node-Energy Bottleneck problem in multi-hop wireless sensor networks (NEBP).

We outline the contributions of this chapter as follows.

• First, we introduce a novel problem called the Node-Energy Bottleneck problem (NEBP), which considers multi-hop networks with a hop count bound. We aim to minimize two objectives: i) the number of used relay nodes; ii) the maximum node energy consumption, to prolong the network lifetime.

• Secondly, we propose Guided Prim NSGA-II (GPrim) to solve the proposed problem. The novelties of the proposed GPrim can be summarized as follows: i) according to the problem-specific characteristics, edge-set-based encoding and decoding methodologies are developed to represent the solution space; ii) we leverage the problem's energy property to develop a heuristic Prim-based crossover and two mutations, an energy-oriented mutation and a relay-oriented mutation, to improve the convergence rate of the algorithm.

• The proposed algorithm is validated against different encoding methods, including Permutation, Prüfer code, NetKeys, and Edge sets. The comparison is carried out on various metrics, showing that our algorithm outperforms existing approaches.

We consider two types of connections: relay node to base station, and sensor node to sensor node or relay node. Sensing data is gathered by SNs and sent to the BS through a relay node or other sensors. The data transmitted to relay nodes can only be forwarded directly to the base station rather than to other sensors or relays. We assume the sensor network is static, meaning SNs have already been deployed, and a finite set of potential positions for RNs is known in advance. The base station is a sink node deployed at the center of the terrain with an unlimited power supply, while relay nodes (RNs) and SNs have the same initial energy, which cannot be replenished.

4.2.2 Energy consumption model

Numerous energy dissipation models in WSNs are studied with different assumptions. In this work, we use the same energy model as in Gawade and Nalbalwar (2016), which accounts for the dissipated energy at both the receiver and transmitter during a transmission. The free space model ($d^2$ power loss) is used for proximal transmissions, and the multi-path fading model ($d^4$ power loss) is considered for long-distance transmissions. Thus, the energy dissipated by the transmitter for transmitting an $l$-bit packet to a distance $d$ is given by:

$$E_{Tx}(l, d) = \begin{cases} l E_{elec} + l \epsilon_{fs} d^2, & d < d_0 \\ l E_{elec} + l \epsilon_{mp} d^4, & d_0 \le d \le r \end{cases} \quad (4.1)$$

where $d_0 = \sqrt{\epsilon_{fs} / \epsilon_{mp}}$ is the distance threshold for swapping amplification models and $r$ indicates the range within which a node can communicate. In other words, no connection will be established among the nodes out of this range.

The energy consumption of the receiver for receiving an $l$-bit packet is given by:

$$E_{Rx}(l) = l E_{elec} \quad (4.2)$$

The dissipated energy of a node receiving $x$ packets and transmitting them to the parent node is calculated by Equation 4.3, where $E_{Tx}$ and $E_{Rx}$ are calculated as in Equations 4.1 and 4.2, respectively. The argument $c$ equals 1 if the node is a sensor node, and 0 otherwise. The argument $d$ is the transmission distance. The network parameters shown in Table 4.2.1 are set as in Wu and Liu (2013).
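The energy model can be sketched in Python as below. This is an illustration under several assumptions: the constants are commonly used textbook values for a first-order radio model and are not taken from Table 4.2.1; the function names are hypothetical; and the way `node_energy` combines reception and transmission (receive x packets, then forward them plus c packets of the node's own) is one plausible reading of the per-node energy, not a confirmed restatement of Equation 4.3.

```python
import math

# Illustrative constants (assumed; not the values of Table 4.2.1).
E_ELEC = 50e-9        # electronics energy, J/bit
EPS_FS = 10e-12       # free-space amplifier energy, J/bit/m^2
EPS_MP = 0.0013e-12   # multi-path amplifier energy, J/bit/m^4
D0 = math.sqrt(EPS_FS / EPS_MP)  # threshold where the two models meet

def tx_energy(l_bits, d):
    """Energy to transmit l_bits over distance d (d^2 below D0, d^4 above)."""
    if d < D0:
        return l_bits * (E_ELEC + EPS_FS * d ** 2)
    return l_bits * (E_ELEC + EPS_MP * d ** 4)

def rx_energy(l_bits):
    """Energy to receive l_bits."""
    return l_bits * E_ELEC

def node_energy(x, c, d, l_bits=4000):
    """Energy of a node that receives x packets and sends to its parent.

    Assumption: the node forwards the x received packets plus c packets
    of its own (c = 1 for a sensor node, c = 0 for a relay node).
    """
    return x * rx_energy(l_bits) + (x + c) * tx_energy(l_bits, d)

leaf = node_energy(x=0, c=1, d=10.0)    # sensor with no children
relay = node_energy(x=3, c=0, d=10.0)   # relay forwarding three packets
```

Under these assumed constants the two amplifier models agree at the threshold distance of roughly 88 m, so the transmit energy is continuous in d.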

4.2.3 Problem formulation

We consider a wireless sensor network including a set of deployed sensor nodes $S = \{s_1, s_2, \ldots, s_n\}$, a set of potential relay nodes $R = \{r_1, r_2, \ldots, r_m\}$, and a base station denoted $s_0$. The position of each node is represented as a single point in 3D space that is interpolated from the digital elevation model (DEM) (Florinsky, 2016). The communication between two nodes can only be established if the Euclidean distance between them does not exceed the communication range $r$. We also impose a hop count bound $H$ that limits the maximum number of communication hops for each SN towards the BS (depth of the routing tree). We denote the problem as the Node Energy Bottleneck problem with hop count bound. The formal formulation below models the desired structure as a Steiner tree (Hwang and Richards, 1992).

Definition 4.2.1 (Steiner tree). Given an undirected graph $G = (V, E)$ and a set of terminal nodes $N \subseteq V$, a tree $T = (V_T, E_T)$ is called a Steiner tree if it contains no cycles and spans all terminal nodes, $N \subseteq V_T \subseteq V$. The set of nodes $V_T \setminus N$ is called Steiner nodes.

Input

+ $G = (V, E)$ is an undirected graph, where $V = S \cup R \cup \{s_0\}$ is the set of vertices in the graph, $S$ is the set of sensor nodes, $R$ is the set of relays, and $s_0$ corresponds to the base station

+ $N = S \cup \{s_0\}$ denotes the set of terminal nodes

+ $r \in \mathbb{R}^+$ is the communication range

+ $d: V \times V \to \mathbb{R}^+$ is the distance function. An edge $e = (u, v) \in E$ only if $d(u, v) \le r$

+ $h$ is the max-hop bound, which requires $\mathrm{len}(s_0, v) \le h$ for all $v \in V_T$, where $\mathrm{len}(u, v)$ denotes the length of the unique path between two nodes $u$ and $v$

Output: A valid solution is a Steiner tree $T = (V_T, E_T)$ that spans the set of terminal nodes $N = S \cup \{s_0\}$ and satisfies the above constraints.
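The input constraints and the validity conditions above can be sketched as follows. The function names `build_edges` and `is_valid_solution`, the dictionary-based node positions, and the use of `h` for the max-hop bound are illustrative assumptions rather than the thesis's own code.

```python
import math
from collections import deque


def build_edges(pos, r):
    """Edge (u, v) exists only if the 3D Euclidean distance is at most r."""
    nodes = list(pos)
    return {
        (u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if math.dist(pos[u], pos[v]) <= r
    }


def is_valid_solution(tree_edges, terminals, root, h, edges):
    """Check that tree_edges uses only allowed edges, forms a tree rooted at
    root, spans all terminals, and keeps every node within h hops of root."""
    allowed = edges | {(v, u) for u, v in edges}
    if any(e not in allowed for e in tree_edges):
        return False
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    # Breadth-first search from the root gives the hop count of each node.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    # A connected graph on k vertices is a tree iff it has k - 1 edges.
    is_tree = len(depth) == len(adj) and len(tree_edges) == len(adj) - 1
    return (is_tree and set(terminals) <= depth.keys()
            and max(depth.values()) <= h)
```

For example, a base station connected to one relay that serves two sensors is valid for `h = 2` but violates the bound for `h = 1`.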

Figure 4.2.1 shows an example of a network with three relays and six sensors.

Objectives: The Node-Energy Bottleneck problem in multi-hop wireless sensor networks (NEBP) seeks a Steiner tree $T = (V_T, E_T)$ in the valid output space that optimizes the two following objectives:


Figure 4.2.1: An example of a network with 3 relays, 6 sensors, and a max-hop constraint of 3. Panels: potential network, a candidate solution, an infeasible solution.

* Minimize the number of selected RNs (Steiner nodes): $\min |V_T \setminus N|$

* Minimize the maximum energy consumption of each node: $\min \max_{v \in V_T} E_v$

where $E_v$ is calculated as in Equation 4.3.
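The two objectives above can be evaluated for a candidate routing tree roughly as follows. This sketch assumes that the Steiner nodes are the relay nodes (sensors and the base station being terminals), that each sensor generates one packet per round, and that the base station's own energy is not counted. The `parent` map, the `node_energy` callback, and all other names are hypothetical.

```python
import math


def evaluate(parent, sensors, relays, pos, node_energy):
    """Return (number of used relays, maximum node energy) for a routing tree.

    parent maps every non-root node of the tree to its parent; the root
    (base station) is the only node without an entry.
    node_energy(n_received, is_sensor, d) is an assumed per-node energy model.
    """
    # Every sensor sends one packet, which each of its ancestors receives.
    received = {v: 0 for v in parent}
    for s in sensors:
        v = parent[s]
        while v in parent:  # stop at the root, which has no parent
            received[v] += 1
            v = parent[v]

    # A relay counts as selected only if it forwards at least one packet.
    used_relays = sum(1 for v in relays if received.get(v, 0) > 0)
    max_energy = max(
        node_energy(received[v], v in sensors,
                    math.dist(pos[v], pos[parent[v]]))
        for v in parent
        if v in sensors or received[v] > 0
    )
    return used_relays, max_energy
```

Idle relays are skipped in both objectives, which mirrors the idea that unused relay nodes are not part of the output tree.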

We propose a phenotype-based multi-objective evolutionary algorithm (MOEA) named Guided Prim NSGA-II (GPrim) to solve the NEBP. In this scheme, the population is first initialized by a random-tree algorithm in which the candidate solutions are encoded by the edge-set encoding method. Then, we apply the NSGA-II algorithm (Deb et al., 2002) to maintain and evolve the population towards better solutions using problem-specific search operators. We leverage the problem's energy property to develop a heuristic Prim-based crossover and two objective-oriented mutations to reduce ineffective moves from standard search operators. The specifics of solution representation, initialization, crossover, and mutation are described below.
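NSGA-II ranks candidate solutions by Pareto dominance over the objective vectors. The snippet below is a minimal sketch of that dominance relation and of extracting the first non-dominated front for minimization problems; it is not the full NSGA-II procedure (no front-by-front sorting or crowding distance), and the function names are illustrative.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(points):
    """Return the points that no other point dominates (the first front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Here each point would be a pair such as (number of selected relays, maximum node energy), so a solution survives in the first front only if no other solution is at least as good on both objectives and strictly better on one.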

4.3.1 Solution representation

Although a direct representation needs no mapping between the phenotypic and genotypic space, a data structure is still necessary for processing (Li, 2001; Rothlauf, 2006). We use edge-set encoding on this problem for its simplicity. This encoding can act as the basis for evaluating the solution or be converted to an adjacency list in linear time. As we want to find a Steiner tree that connects all SNs to the BS, the number of vertices varies across solutions. For simplicity, we initialize solutions with the connections from the BS to every RN, and this structure is maintained in all candidate solutions in the population. The RNs with no connection to any SNs are later removed from the output structure by the decoder.
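The edge-set representation and the decoding step described above can be sketched as follows. `to_adjacency` performs the linear-time conversion to an adjacency list, and `decode` is one possible reading of the decoder: it repeatedly drops relay nodes that are leaves of the tree, since such relays serve no sensor. Both function names and the set-of-tuples edge format are assumptions.

```python
def to_adjacency(edge_set):
    """Convert an edge set to an adjacency list in time linear in the edge count."""
    adj = {}
    for u, v in edge_set:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def decode(edge_set, relays):
    """Remove relays that serve no sensor by repeatedly pruning relay leaves."""
    adj = {u: set(vs) for u, vs in to_adjacency(edge_set).items()}
    stack = [v for v in adj if v in relays and len(adj[v]) == 1]
    while stack:
        v = stack.pop()
        if v not in adj or len(adj[v]) != 1:
            continue  # already removed, or no longer a leaf
        (p,) = adj.pop(v)        # the single neighbour of the leaf relay
        adj[p].discard(v)
        if p in relays and len(adj[p]) == 1:
            stack.append(p)      # the neighbour became a relay leaf itself
    # An edge survives iff both of its endpoints survived the pruning.
    return {(u, v) for u, v in edge_set if u in adj and v in adj}
```

Pruning leaves iteratively also removes chains of relays that lead to no sensor, not just relays attached directly to the BS.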


a spanning tree from a start node by adding an adjacent node at random, regardless of its weight. Moreover, we adapt PrimRST to consider the max-hop constraint by maintaining the nodes' depths while creating a tree. We call this algorithm HCPrimRST (Algorithm 2).

Applying HCPrimRST with the max-hop constraint may lead to an invalid, non-connected structure. The initialization is thus divided into two phases. The first phase initializes a tree under the max-hop constraint. In the second phase, we relax this constraint and continue to build the tree obtained from the first phase to get a valid connected tree.

Algorithm 2 HCPrimRST

Input: The set of initialized edges $T$, set of vertices $V$, set of potential edges $E$, max-hop constraint $h$

Output: The set of used edges $T$

$d \leftarrow$ depth of vertices in partial tree $T$  # by DFS from root 0
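The two-phase initialization described above can be sketched as follows. This is an illustration of the idea (random Prim-style growth that tracks node depths, followed by a relaxed pass), not a transcription of Algorithm 2; the names `hc_prim_rst` and `initialize`, the explicit `root` argument, and the `rng` parameter are assumptions.

```python
import random


def hc_prim_rst(tree_edges, vertices, edges, h, root, rng=random):
    """Grow a random tree from a partial tree while keeping every depth <= h."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Depth of the vertices already in the partial tree, by DFS from the root.
    tree = list(tree_edges)
    tree_adj = {}
    for u, v in tree:
        tree_adj.setdefault(u, []).append(v)
        tree_adj.setdefault(v, []).append(u)
    depth = {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in tree_adj.get(u, []):
            if v not in depth:
                depth[v] = depth[u] + 1
                stack.append(v)

    # Frontier: edges from a tree node below the hop limit to an outside node.
    frontier = [(u, v) for u in depth if depth[u] < h
                for v in adj[u] if v not in depth]
    while frontier:
        i = rng.randrange(len(frontier))  # pick a random edge, ignoring weights
        frontier[i], frontier[-1] = frontier[-1], frontier[i]
        u, v = frontier.pop()
        if v in depth:
            continue  # v was attached through another edge in the meantime
        depth[v] = depth[u] + 1
        tree.append((u, v))
        if depth[v] < h:
            frontier.extend((v, w) for w in adj[v] if w not in depth)
    return tree, depth


def initialize(tree_edges, vertices, edges, h, root, rng=random):
    """Phase 1 respects h; phase 2 relaxes it to attach any leftover vertices."""
    tree, depth = hc_prim_rst(tree_edges, vertices, edges, h, root, rng)
    if len(depth) < len(vertices):
        # A bound of |V| hops can never bind, so it acts as "no constraint".
        tree, depth = hc_prim_rst(tree, vertices, edges, len(vertices), root, rng)
    return tree, depth
```

On a path graph 0-1-2-3 with `h = 2`, the first phase stops at node 2 and the relaxed second phase attaches node 3 at depth 3, giving a connected tree that later operators would still have to repair with respect to the hop bound.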

Title	Evolutionary and Deep Reinforcement Learning Algorithms for Optimizing the Lifetime of Wireless Sensor Networks
Author	Bui Hong Ngoc
Advisor	Dr. Nguyen Phi Le
University	Hanoi University of Science and Technology
Major	Data Science
Type	Thesis
Year of publication	2023
City	Hanoi