HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
MASTER THESIS
Evolutionary and Deep Reinforcement
Learning Algorithms for Optimizing the
Lifetime of Wireless Sensor Networks
BUI HONG NGOC
ngoc.bh212155m@sis.hust.edu
Major: Data Science (Elitech)
Thesis advisor: Dr. Nguyen Phi Le
Signature of advisor
Department: Department of Computer Science
Institute: School of Information and Communication Technology
Hanoi, 4-2023
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness
CONFIRMATION OF MASTER'S THESIS REVISIONS
Full name of thesis author: Bui Hong Ngoc
Thesis title: Evolutionary and reinforcement learning algorithms for optimizing the lifetime of wireless sensor networks
Major: Data Science (Elitech)
Student ID: 20212155M
The author, the scientific advisor, and the thesis examination committee confirm that the author has revised and supplemented the thesis according to the minutes of the committee meeting on 28/04/2023, with the following changes:
- Revised the titles of Chapter 4 and Chapter 5 to make the presented content clearer
- Added discussions in Chapter 5 to clarify how the contributions in Chapter 3 differ from the same author's engineering graduation project
- Added discussions in Chapter 4 and Chapter 5 to clarify the motivation for the proposed algorithms and how they differ from earlier algorithms
- Added discussions of how the assumptions in the problems of Chapter 4 and Chapter 5 are limited compared with real-world environments
- Reviewed and corrected typographical errors
GRADUATION THESIS ASSIGNMENT
Student's information
Name: Bui Hong Ngoc
Phone: 098 490 934 Email: ngoc.bh212155m@sis.hust.edu
Class: Data Science
Affiliation: Hanoi University of Science and Technology
Duration: 10/11/2022 - 22/04/2023
Thesis title: Evolutionary and Deep Reinforcement Learning Algorithms for Optimizing the Lifetime of Wireless Sensor Networks
Thesis statement
This thesis proposes evolutionary and deep reinforcement learning algorithms to tackle two emerging techniques in prolonging the lifetime of wireless sensor networks.
Declarations/Disclosures:
I, Bui Hong Ngoc, hereby warrant that the work and presentation in this thesis are performed by myself under the supervision of Dr. Nguyen Phi Le. All results presented in this thesis are truthful and are not copied from any other works. All references in this thesis, including images, tables, figures, and quotes, are clearly and fully documented in the bibliography. I will take full responsibility for even one copy that violates school regulations.
Hanoi, date month year 2023
Author
Bui Hong Ngoc
Attestation of thesis advisor:
I certify that I have read this thesis and that, in my opinion, it is fully adequate in scope and quality as a thesis for the degree of Master of Science.
Hanoi, date month year 2023
Thesis Advisor
Dr. Nguyen Phi Le
Acknowledgments
I would like to express my gratitude to all those who have motivated and aided me in achieving this significant milestone in my life. My heartfelt appreciation goes out to my thesis advisors, Dr. Nguyen Phi Le and Prof. Do Phan Thuan, whose encouragement and guidance were invaluable in shaping the research questions and methodology. Without their supervision and insightful suggestions, completing this thesis would have been an insurmountable challenge.
I owe a debt of gratitude to my advisor at VinAI, Dr. Nguyen Viet Anh, for his unwavering support throughout my two years at VinAI. His expertise and guidance have enabled me not only to complete this thesis but also to gain the knowledge and skills necessary for my future career. Beyond academic research, he was always available to provide advice and guidance on various aspects of my personal and professional development. His support has been a source of inspiration and motivation for me throughout my journey.
Additionally, the invaluable feedback and suggestions provided by Prof. Huynh Thi Thanh Binh and Dr. Nguyen Thi Tam were essential to the completion of this thesis. Their extensive expertise in the field was crucial to the development of various aspects of this work, and I am deeply grateful for their support and guidance throughout the process. Without their contributions, this thesis would not have been completed.
I also wish to acknowledge my friends at HUST and VinAI, whose unwavering support and companionship have been a source of strength and motivation throughout my time at the university and the company. The memories of playing games, watching movies, and enjoying football matches are priceless and have kept me going.
Lastly, I dedicate this thesis to my family: my parents, who have always encouraged me to pursue my dreams; my girlfriend, who has stood by me during my most difficult moments, listening and empathizing with my grievances about the world; and my sister, who may sometimes be a nuisance but always adds color to my life. Together, they have shaped the person I am today.
Evolutionary and Deep Reinforcement Learning Algorithms
for Optimizing the Lifetime of Wireless Sensor Networks
Abstract
In recent years, wireless sensor networks (WSNs) have become an essential part of many civilian applications, such as smart agriculture, health monitoring, and smart cities. However, the limited energy capacity of the sensors poses a significant challenge to ensuring continuous surveillance. In this thesis, we study two emerging techniques to prolong the network's lifetime: energy-efficient routing via relay node placement and routing strategies for the mobile charger in wireless rechargeable sensor networks (WRSNs).
The first problem involves optimizing the routing protocol by deploying non-sensing relay nodes (RNs). Recent research has focused either on finding the minimum number of RNs required to reduce deployment costs or on developing efficient routing schemes to reduce energy use. However, striking a balance between these criteria remains a significant challenge. To address this issue, we propose a multi-objective approach to constructing an efficient communication structure with the least possible number of RNs, thereby prolonging the network's lifetime while ensuring its connectivity and reliability. We conduct extensive experiments to demonstrate the effectiveness of our method and show that it provides a better trade-off compared to existing algorithms.
In the second problem, we focus on addressing the challenges faced in WRSNs, where a mobile charger can be employed to move around and charge the sensors. Existing approaches struggle to design an optimal charging path for the mobile charger while accounting for the uncertainties arising in the network, which could come from node failures or deployments. To overcome this challenge, we propose a novel charging scheme that uses a deep reinforcement learning (DRL) approach to guide the mobile charger adaptively. Our approach enables the mobile charger to adapt to spontaneous changes in the network topology. The experiments show the superiority of our model compared to existing on-demand methods in prolonging the network lifetime.
Our proposed solutions for the two problems of relay node placement and mobile charger routing can significantly prolong the network's lifespan and reduce maintenance costs. The potential for combining these two techniques in WRSNs to further enhance the network's sustainability and reliability is an interesting future research direction. Adopting these solutions, especially new advancements in deep reinforcement learning, can facilitate the development of more efficient and effective WSNs, enabling us to better monitor and manage various systems and processes in our daily lives.
Evolutionary and Reinforcement Learning Algorithms for Optimizing the Lifetime
of Wireless Sensor Networks
Thesis summary
In recent years, wireless sensor networks have become an essential part of many civilian applications, such as health monitoring, agriculture, and smart cities. However, the limited energy available in the sensors poses a major challenge to guaranteeing service in applications that require continuous, high-frequency monitoring. In this thesis, we study two new methods for extending network lifetime: energy-efficient routing through relay node placement, and routing strategies for the mobile charger in wireless rechargeable sensor networks.
The first problem concerns optimizing the routing protocol by deploying non-sensing relay nodes. Recent studies have focused either on finding the minimum number of relay nodes needed to reduce deployment costs or on developing routing strategies that reduce energy use. However, balancing these criteria remains a major challenge. To address this problem, we propose a multi-objective method for building an efficient communication structure with as few relay nodes as possible, thereby extending the network lifetime while ensuring its connectivity. We conduct experiments to demonstrate the effectiveness of our method and show that it offers a better balance than existing algorithms.
In the second problem, we focus on the challenges in wireless rechargeable sensor networks, where a mobile charger can be used to move around and charge the sensors. The key problem is to design an efficient charging strategy for the mobile charger [...] network using new mechanisms in machine learning. The experiments show the superiority of our model over existing methods in extending network lifetime.
Our proposed solutions for the two problems of relay node placement and mobile charger routing can significantly extend the network lifetime and reduce maintenance costs. Combining these two techniques in rechargeable sensor networks to strengthen the sustainability and reliability of the network is a promising direction for future research. Applying these solutions, especially new advances in deep reinforcement learning, can pave the way for more efficient sensor network models, helping us better monitor and manage the various systems in our lives.
Contents
2.1 (Multi-objective) Evolutionary Algorithms
2.1.1 Multi-objective optimization problems
2.1.2 Non-dominated sorting genetic algorithm
2.2 Deep Reinforcement Learning
2.2.1 Reinforcement learning and key concepts
2.2.2 Policy Gradient methods
2.2.3 Attention mechanisms
2.2.4 Pointing mechanism
3 Literature Review
3.1 Network lifetime in WSNs
3.2 Energy-efficient routing via relay node placement
4.2.2 Energy consumption model
4.2.3 Problem formulation
4.3 Proposed method
4.3.1 Solution representation
4.3.2 Initialization
5.3 A Reinforcement Learning-based Charging Policy in WRSNs
5.3.1 Formulation of the DRL Framework
5.3.2 Model Architecture
5.3.3 Policy Optimization
List of Figures
1.1.1 Wireless sensor network architecture (source: Bali (2018))
2.1.1 Pareto dominance (Source: Verma et al. (2021))
2.1.2 Examples of the non-dominated sorting algorithm and crowding distance calculation (Source: Verma et al. (2021))
3.1.1 A block diagram of the architecture of the sensor node in the WSN
3.3.1 A comparison of offline and online charging schemes
4.2.1 An example of a network with 3 relays, 6 sensors, and a max-hop constraint of 3
4.3.1 An example of calculating the maximum number of children. Dashed lines denote potential edges
4.3.2 An example of the energy-oriented mutation. The dashed blue lines denote the potential added edges
4.3.3 An example of the relay-oriented mutation
4.3.4 The distribution of the number of used relays generated by different random-tree algorithms on a graph with 100 relays and 100 sensors
4.4.1 Height heatmaps of terrains
4.4.2 Feasible ratio of the population on various max-hop constraints
4.4.3 Comparison of five algorithms on Nin1
4.4.4 Box plots for C-metric. The rectangle at row A and column B represents C(A, B). Each rectangle includes 12 box plots (left to right) corresponding to 12 instances (Nin? to Nin24). C-metric values are scaled to [0, 1]
4.4.5 The comparison of Pareto fronts on test sets S2 (Nin7 to Nin12) and S3 (Nin13 to Nin18)
4.4.6 Performance of competing algorithms in different [...] on network instance Nino
4.4.7 Comparison of hypervolume, delta-metric and ONVG with different communication radii on network instance Ning
5.1.1 The drawback of the on-demand charging scheme. Node B sends a charging request right after the MC decides to charge node A next
5.2.1 An illustration of a wireless rechargeable sensor network for target covering
5.3.1 Learning model of a reinforcement learning system
5.3.3 Evaluation of the network lifetime of competing algorithms when varying the number of sensors, number of targets, or package generation probability
5.4.1 The comparison of the aggregated energy consumption rate and the number of node failures when increasing the number of sensors, number of ...
4.4.1 Description of network instances The last column refers to the density of
4.4.2 The max-hop constraint (/i) on each type of dataset All instances are
eusured lo have valid solutions sài + sài sài +
4.4.3 The operator's probability of each algorithm 7w:s the crossover prababil-
ity and pra is the nvutation probability For Glrim, pir: represents a pair
of energy-omented and relay-nriented mutation probability
4.4.4 Performance of competing algorithns onthe set 51 Bold values indicate
the best values
4.4.5 Perfomance of competing algoridnns on teslsets 52 and $3, Buch lable
shows the results on one metric and hold values indicate the best value
4.4.6 Complexity comparison for cach algouithm
4.4.7 Average algonthm running time (in seconds)
5.3.1 State information The notation S$ and 1) indicate static and dynamic in-
List of Acronyms
AI Artificial Intelligence
BS base station
DEM digital elevation model
DL deep learning
DRL deep reinforcement learning
DRL-TC deep reinforcement learning approach for target coverage and con...
IoT Internet of Things
MC mobile charger
MDP Markov decision process
MOEA multi-objective evolutionary algorithm
MOEA/D multiobjective evolutionary algorithm based on decomposition
MOO multi-objective optimization
MST minimum spanning tree algorithm
NEBP Node-Energy Bottleneck problem in multi-hop wireless sensor networks
NetKeys network random keys
NJNP Nearest-Job-Next with Preemption
NSGA-II non-dominated sorting genetic algorithm
PF Pareto front
QoS Quality of Service
RL reinforcement learning
RN relay node
SN sensor node
WRSN wireless rechargeable sensor network
WSN wireless sensor network
List of Notations
$E^{MC}$ battery capacity of the MC
$E^{s}$ battery capacity of a sensor
$\mathcal{R}$ a reward function
$\mathcal{T}$ a transition model
$\gamma$ a discount factor (discount rate)
$\mathcal{A}$ a set of legal actions
$\mathcal{M}$ Markov decision process
$P$ a set of deployed sensors
$Q$ a set of critical targets
$\mathcal{S}$ a state space
$\mu$ charging rate
$v$ velocity of the MC
$\omega^{move}$ ECR of the MC for traveling
$\omega$ energy consumption rate
$e^{MC}$ residual energy of the MC
$e$ residual energy of a sensor
$m$ number of critical targets
$n$ number of deployed sensors
... home automation (Pirbhulal et al., 2016), air/earthquake monitoring (Alphonsa and Ravi, 2016; Kingsy Grace and Manju, 2019), smart agriculture (Sanjeevi et al., 2020), health monitoring (Abdulkarem et al., 2020; Gardašević et al., 2020), and smart cities (Csáji et al., 2017). A WSN typically consists of a few to hundreds or thousands of spatially dispersed, dedicated sensors that monitor specific targets or areas of interest. Each sensor node (SN) is equipped with sensing, processing, and communication capabilities to convert an analog signal of a physical quantity into a digital signal and connect the node to the network. The sensing data is cooperatively transmitted through the wireless network to a base station (BS), also known as a sink, where the data can be observed and analyzed.
Figure 1.1.1: Wireless sensor network architecture (source: Bali (2018))
Such a design allows WSNs to be deployed on the fly and to operate unattended and self-organized, without requiring any pre-existing infrastructure and with little maintenance. The sensor nodes collaborate to achieve a common goal, such as sensing and reporting temperature, humidity, or motion in a specific area. The communication among the sensor nodes is usually achieved through multi-hop routing, where the data is forwarded from node to node until it reaches the base station. Therefore, they can cover large areas and gather data from numerous sources simultaneously, providing real-time monitoring of the target area. Additionally, the use of WSNs can reduce labor and maintenance costs since the SNs are equipped with low-cost, low-power batteries that can operate for extended periods (Akyildiz et al., 2002).
However, the limited energy capacity of the sensors poses a crucial challenge for ensuring continuous surveillance. Once the battery is fully consumed, the sensor can no longer monitor the targets or relay the data, leading to network fragmentation and data loss in some parts of the sensing field. Further, WSNs are often deployed in environments that are harsh and inaccessible for humans, such as underground tunnels and battlefields, making it challenging to replace the sensors' batteries. Hence, in most WSN applications, one of the primary objectives is to maximize the network's lifespan while keeping it functional to ensure continuous data transmission and monitoring of the targets (Yetgin et al., 2017).
Over the last two decades, researchers have put remarkable effort into designing new paradigms and protocols to prolong the network lifetime. The approaches can be broadly classified into two main categories: sensor functioning optimization and energy replenishment. Sensor functioning optimization aims to increase the efficiency of SNs and reduce energy consumption through methods such as energy-efficient routing (Raj et al., 2019), data aggregation (Goyal et al., 2019), and sensor scheduling (Haimour and Abu-Sharkh, 2019), while energy replenishment focuses on providing external sources of energy, such as energy harvesting (Adu-Manu et al., 2018; Shaikh and Zeadally, 2016) or wireless charging (Keswan et al., 2022; Qureshi et al., 2022), to the sensor nodes.
One advantage of optimizing sensor functioning is that it can extend network lifespan without added hardware or infrastructure. These techniques can also save energy and be customized for various applications. However, they may not suffice for high-throughput needs that demand real-time data transmission, as this approach can only extend the sensors' lifetime for a certain amount of time. The battery will eventually be exhausted if no external source supplies the sensors. Meanwhile, energy replenishment provides a continuous energy supply to sensors, possibly eliminating the need for battery replacements. This technique works in remote or harsh environments but may require extra hardware or infrastructure, such as energy harvesters or wireless chargers. Hardware costs and maintenance are also potential drawbacks in some applications.
The choice of technique to prolong the network lifetime in wireless sensor networks depends on various factors, including application requirements, energy constraints, and cost considerations. In this thesis, we study two problems, each of which concerns an emerging technique that has attracted significant attention in recent years for its potential to prolong the network lifetime of WSNs. Specifically, we investigate (1) energy-efficient routing via relay node placement and (2) adaptive routing strategies for the mobile charger in wireless rechargeable sensor networks (WRSNs).
Energy-efficient routing via relay node placement. Relay node placement is an essential technique to optimize the functioning of WSNs. The idea is to deploy non-sensing relay nodes (RNs) to increase the network's capability and balance the energy consumption among nodes. As the RN positions are often determined after the SNs are deployed, we can optimize the placement of RNs to balance the energy consumption among nodes, which in turn prolongs the network lifetime. Past research efforts have focused on finding the minimum number of RNs required to reduce deployment costs while ensuring QoS criteria such as network coverage and connectivity. Recent works have also considered its potential to extend the network lifetime. However, striking a balance between these criteria remains a significant challenge. To address this issue, we propose a multi-objective approach to constructing an efficient communication structure (routing tree) with the least possible number of RNs, thereby prolonging the network's lifetime while ensuring its connectivity.
Adaptive charging strategies in WRSNs. Despite remarkable progress in recent years, sensor function optimization and energy harvesting techniques cannot provide reliable service for high-throughput applications requiring continuous surveillance. Recent advancements in wireless charging provide a foundation for a novel scheme: wireless rechargeable sensor networks (WRSNs). The idea is to employ a mobile charger (MC) with a high-power battery to go around and charge the sensors wirelessly. The main challenge here is to design a suitable charging strategy for the mobile charger while accounting for the uncertainties arising in the network. Existing charging schemes either make a strict assumption about constant energy consumption rates or cannot adapt to unpredictable changes in the network topology. To overcome this challenge, we propose a novel charging scheme that uses a deep reinforcement learning (DRL) approach to guide the mobile charger adaptively. Our approach enables the mobile charger to adapt to spontaneous changes in the network topology.
1.2 Thesis contributions
The main contributions of this thesis can be summarized as follows:
• In this study, we examine common approaches for prolonging the network lifetime in wireless sensor networks, which can be classified into two main groups: sensor functioning optimization and energy replenishment. We investigate one emerging issue for each group, which can help in extending the network lifetime further.
• In Chapter 4, we investigate energy-efficient routing techniques by considering the relay node placement as a multi-objective problem. We introduce a novel multi-objective evolutionary algorithm that exploits problem-specific features to find a Pareto front that balances the trade-off between the number of relay nodes and the network's energy consumption. We conduct extensive experiments to demonstrate the effectiveness of our method and show that it provides a better trade-off compared to existing algorithms. The paper summarizing the results is under review in the Soft Computing journal (Bui et al., 2023).
• [...] The results were published at IEEE MASS 2022 (Bui et al., 20...)
1.3 Thesis outline
The thesis is structured as follows:
In Chapter 2, we provide an overview of the evolutionary and deep reinforcement learning algorithms utilized in this study. We introduce the fundamental concepts and principles of these algorithms to provide readers with the necessary background knowledge for subsequent chapters.
In Chapter 3, we present a comprehensive survey of the network lifetime problem in WSNs. Our survey covers current state-of-the-art works in sensor functioning optimization and energy replenishment approaches. We also provide an in-depth analysis of energy-efficient routing approaches and adaptive charging schemes in WRSNs. This analysis will position our work better in the literature.
In Chapter 4, we delve into the relay node placement problem and present our multi-objective framework for addressing this problem. We describe the problem statement and the design of our algorithm, which takes into account problem-specific properties to find a Pareto front that balances the number of relay nodes and the network's energy consumption. We also conduct experiments to demonstrate the effectiveness of our approach and compare it with existing algorithms.
In Chapter 5, we present an adaptive deep reinforcement learning framework for scheduling charging trajectories for mobile chargers in WRSN settings. We describe the design of our approach, which enables the mobile charger to choose the next charging destination adaptively, based on the current energy levels of the sensors and the network's topology. We demonstrate the effectiveness of our model through extensive experiments and compare it with existing on-demand charging methods.
Finally, in Chapter 5.5, we conclude the thesis by summarizing our contributions and highlighting the key findings of our study, and then providing recommendations for future works.
Chapter 2
Background
In this chapter, we present an overview of the evolutionary and deep reinforcement learning algorithms used in the study. The purpose of this chapter is to introduce readers to the basic concepts and principles of these algorithms, which will be necessary for understanding the subsequent chapters. In Section 2.1, we introduce multi-objective optimization (MOO) problems and describe how evolutionary algorithms simulate the process of natural selection to evolve optimal solutions to MOO problems. In Section 2.2, we explain how deep reinforcement learning algorithms combine neural networks and reinforcement learning to enable agents to learn from their experiences and improve their decision-making abilities over time.
2.1 (Multi-objective) Evolutionary Algorithms
2.1.1 Multi-objective optimization problems
Multi-objective optimization (MOO) is a branch of optimization that deals with problems involving multiple conflicting objectives. In MOO, the goal is to find a set of solutions that simultaneously optimize multiple objectives. These objectives often conflict with each other, so no single solution is optimal for all objectives. Mathematically, a multi-objective optimization problem can be formulated as
$$\min_{x \in \mathcal{X}} \; f(x) = [f_1(x), f_2(x), \ldots, f_m(x)],$$
where $f_i : \mathcal{X} \to \mathbb{R}$ is the $i$-th objective function. As there are conflicting criteria, we first need a definition of solution domination to compare solutions.
Definition 2.1.1 (Solution domination) A solution $x_a$ is said to dominate another solution $x_b$, denoted $x_a \succ x_b$, if $x_a$ is no worse than $x_b$ in all objectives and is strictly better in at least one objective.
A solution $x_a$ is said to be non-dominated if there is no other solution in the population that dominates $x_a$. The Pareto front is the set of non-dominated solutions. A solution is Pareto optimal if no other feasible solution dominates it. The Pareto front can be formulated mathematically as follows. Let $x \in \mathcal{X}$ denote a solution vector, where $\mathcal{X}$ is the feasible region. Let $f(x) = [f_1(x), f_2(x), \ldots, f_m(x)]$ denote the vector of $m$ objective functions. Then, the Pareto front can be defined as:
$$P = \{x \in \mathcal{X} : \nexists\, x' \in \mathcal{X} \text{ such that } f(x') \succ f(x)\},$$
where $f(x') \succ f(x)$ denotes that $x'$ dominates $x$, meaning that $x'$ is no worse than $x$ in all objectives and strictly better than $x$ in at least one objective. Figure 2.1.1 shows an example of the Pareto front. The Pareto front is a useful tool for decision-making in MOO because it provides a set of trade-off solutions that decision-makers can choose from based on their preferences.
Figure 2.1.1: Pareto dominance (Source: Verma et al. (2021))
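The dominance test and the Pareto front definition above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: it assumes every objective is minimized, and the function names `dominates` and `pareto_front` as well as the sample objective vectors are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if objective vector `a` dominates `b` (minimization assumed)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical objective vectors: (number of relays, energy consumption).
solutions = [(3, 9.0), (4, 7.5), (5, 7.5), (6, 4.0), (4, 8.0)]
front = pareto_front(solutions)
print(front)
```

Here `(5, 7.5)` and `(4, 8.0)` are both dominated by `(4, 7.5)`, so only the remaining three vectors form the trade-off set. The quadratic scan is fine for small populations; larger ones usually call for the sorting procedure used by NSGA-II.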
2.1.2 Non-dominated sorting genetic algorithm
Multi-objective evolutionary algorithms (MOEAs) are a class of optimization algorithms that are designed to tackle MOO problems. MOEAs are inspired by the process of natural selection, where individuals with better fitness are more likely to survive and reproduce. In MOEAs, candidate solutions are represented as individuals in a population, and the fitness of each individual is evaluated based on multiple objective functions. The goal of MOEAs is to find a set of Pareto optimal solutions.
The Non-dominated Sorting Genetic Algorithm II (NSGA-II) is one of the most popular MOEAs and uses the genetic algorithm (GA) as a search method. NSGA-II was first introduced by Deb et al. (2002) as an improvement over the traditional GA for multi-objective optimization problems. The core idea behind NSGA-II is to sort the population of candidate solutions into several fronts based on their non-domination relationship. The first front contains the non-dominated solutions, the second front contains solutions that are dominated only by solutions in the first front, and so on. By sorting the population into fronts, NSGA-II is able to maintain a diverse set of Pareto optimal solutions.
crossover andmutalion Crossover is a process of combining, two parent solutions to gen-
srale anew offspring solution, while mulationis a proces: of introducing random changes
toa solution NSGA uses toumament selection to choose parent solutions for crossover
and mutation Tn each generation, NSGA performs non-dorinated sorting to rank the
6
Trang 21
si Nga dnmfnalelsotirg Đi daueneetrlenhilom
Figure 2.1.2: Examples of the non-<dominated sorting algorithm and crowding distance calcnilation (Sonrce Verma et al (2021)
population into fronts and assigns a crowding distance value to each solution in the front The crowding distance value measmes how crowded a solution is in the front based on the distance to its neighboring solutions The crowding distance value of a sokttion x, in front #; is deÑmed ag
crowding_distance(x:.F,) =~
wel
where 17 is the number of objectives, f,(x;,1 ;) and &,(x\1, /) are the abjective fime-
tion values of the neighboring solutions of ; infront F along the #-th objective, and [?""* and 7" ore the maximum and zninimum objective fiction values, respectively, along the k-th objective over the entire population The crowding distance value ensures that s0- Intions that are diverse in the objective space are preserved in the population NSGA then
selects the best solutions trom the fronts based on their rank and crowding distance and
uses hem to generale Ute ext population ‘The selection operator ensures thal solutions i the first front are always selected, and solutions in the subsequent fronts are selected based
on ther crowding distance value We provide a pseudocode of NSGA-i in Algorithm 1
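The crowding distance of each solution in a single front can be computed as sketched below. This is an illustrative sketch under stated assumptions: boundary solutions along each objective receive an infinite distance, which is the convention of the original NSGA-II paper rather than something stated in the text above, and the optional `f_min` / `f_max` arguments (hypothetical names) let the caller pass population-wide bounds; when omitted, the bounds of the front itself are used.

```python
import math
from typing import List, Optional, Sequence


def crowding_distance(
    front: Sequence[Sequence[float]],
    f_min: Optional[Sequence[float]] = None,
    f_max: Optional[Sequence[float]] = None,
) -> List[float]:
    """Crowding distance of every objective vector in one front."""
    n = len(front)
    if n == 0:
        return []
    m = len(front[0])
    if f_min is None:
        f_min = [min(p[k] for p in front) for k in range(m)]
    if f_max is None:
        f_max = [max(p[k] for p in front) for k in range(m)]

    dist = [0.0] * n
    for k in range(m):
        # Sort the front along objective k to find each solution's neighbors.
        order = sorted(range(n), key=lambda i: front[i][k])
        dist[order[0]] = dist[order[-1]] = math.inf  # assumed boundary convention
        span = f_max[k] - f_min[k]
        if span == 0:
            continue  # objective k cannot discriminate between solutions
        for pos in range(1, n - 1):
            prev_val = front[order[pos - 1]][k]
            next_val = front[order[pos + 1]][k]
            dist[order[pos]] += (next_val - prev_val) / span
    return dist


# Hypothetical front with two objectives.
distances = crowding_distance([(1, 5), (2, 3), (4, 1)])
print(distances)
```

For the middle vector `(2, 3)`, each objective contributes a normalized neighbor gap of 1.0, giving a distance of 2.0, while the two extreme vectors are kept unconditionally through their infinite distance.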
Algorithm 1 Non-dominated Sorting Genetic Algorithm (NSGA-II)
Input: Population size N, number of generations T, crossover probability p_c, mutation probability p_m, selection operator
Initialize population P_0 with N individuals
Evaluate objective functions for each individual in P_0
t ← 0
while t < T do
    Q_t ← Create empty population
    R_t ← Create empty population
    P'_t ← Perform tournament selection on P_t
    for i ← 1 to N by 2 do
        Select parents x_i, x_{i+1} from P'_t using binary tournament selection
        if Random number < p_c then
            Apply crossover to x_i and x_{i+1} to produce two offspring
        for j ← i to i + 1 do
            if Random number < p_m then
                Apply mutation to offspring j
            Evaluate offspring j and add it to Q_t
    Merge P_t and Q_t into R_t
    Perform non-dominated sorting on R_t to obtain fronts F_1, F_2, ...
    Set P_{t+1} ← ∅, i ← 1
    while |P_{t+1}| + |F_i| ≤ N do
        Add all individuals in F_i to P_{t+1}; i ← i + 1
    Sort remaining individuals in F_i by crowding distance
    Add top (N − |P_{t+1}|) individuals in F_i to P_{t+1}
    t ← t + 1
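The binary tournament selection step of Algorithm 1 is commonly paired with the crowded-comparison rule of Deb et al. (2002): prefer the lower front rank, and break ties by the larger crowding distance. The sketch below assumes that rule and that ranks and distances have already been computed; the function names `crowded_better` and `binary_tournament` and the sample values are hypothetical.

```python
import random
from typing import List, Sequence


def crowded_better(i: int, j: int, rank: Sequence[int], distance: Sequence[float]) -> bool:
    """True if solution i beats solution j under the crowded-comparison rule."""
    if rank[i] != rank[j]:
        return rank[i] < rank[j]       # a lower front index wins
    return distance[i] > distance[j]   # same front: the less crowded solution wins


def binary_tournament(
    rank: Sequence[int],
    distance: Sequence[float],
    n_select: int,
    rng: random.Random,
) -> List[int]:
    """Pick `n_select` parent indices, each as the winner of a random pair."""
    n = len(rank)  # requires at least two individuals
    winners = []
    for _ in range(n_select):
        i, j = rng.sample(range(n), 2)
        winners.append(i if crowded_better(i, j, rank, distance) else j)
    return winners


# Hypothetical ranks (front indices) and crowding distances for four individuals.
rank = [0, 0, 1, 2]
distance = [float("inf"), 0.5, float("inf"), float("inf")]
parents = binary_tournament(rank, distance, n_select=4, rng=random.Random(0))
print(parents)
```

Passing an explicit `random.Random` instance keeps the selection reproducible, which is convenient when comparing algorithm variants across repeated runs.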
2.2 Deep Reinforcement Learning
2.2.1 Reinforcement learning and key concepts
Reinforcement learning (RL) is one of the fundamental paradigms of machine learning, alongside supervised learning and unsupervised learning. It involves an agent that interacts with an environment, receiving rewards for its actions. The objective of RL is to enable the agent to learn a policy that maximizes the rewards it receives by iteratively trying different strategies and receiving feedback. This way, the agent can adapt to changes in the environment and improve its performance over time. Figure 5.3.1 illustrates the basic framework of reinforcement learning.
IRL problems, an agent interacts with an environment aver a sequence of discrete lime steps Al each lime step, the agenl observes a sale of Le environment, lakes action, receives a reward, and transitions to a new state The state of the environment at each time
step depends only on the previous state and action and not on any stafes or actions that
occumed before thai
RL problems are often represented as Markov Decision Processes (MDPs), which are characterized by a set of states $S$, actions $A$, transition probabilities $P$, reward function $R$, and a discount factor $\gamma$. The state space is the set of all possible states that the environment can be in. The action space is the set of all possible actions that the agent can take. The transition probabilities describe the probability of transitioning from one state to another when taking a particular action. The reward function maps each state-action pair to a scalar reward. The discount factor is a parameter that determines the relative importance of future rewards.

The goal of an RL agent in an MDP is to learn a policy $\pi$, which is a mapping from states to actions, that maximizes the expected cumulative reward over time. The optimal policy is the policy that maximizes the expected cumulative reward for all possible initial states. To maximize the cumulative rewards, the agent needs to determine an appropriate policy $\pi(s)$ that selects the best action for each state, as well as a value function $V(s)$ that estimates the future rewards that will be obtained by following the policy. The interaction between the agent and the environment involves a sequence of actions that cause the environment to change state. This sequence can be described as an episode that ends when the environment reaches a terminal state. At each time step $t \in \{1, 2, \dots, T\}$, the agent observes the current state $X_t$, takes action $A_t$, and receives a reward $R_{t+1}$. The trajectory is the sequence $\{X_1, A_1, R_2, X_2, A_2, \dots, X_T\}$.
The model gives all the necessary information about an environment: the transition probability function $P$ and the reward function $R$. At state $x_t$, the agent chooses an action $a_t$, which leads to a new state $x_{t+1}$, and receives a reward $r_{t+1}$. This is a transition step $(x_t, a_t, r_{t+1}, x_{t+1})$. The probability of this transition is

$$P(x_{t+1}, r_{t+1} \mid x_t, a_t) = \mathbb{P}[X_{t+1} = x_{t+1}, R_{t+1} = r_{t+1} \mid X_t = x_t, A_t = a_t] \quad (2.3)$$
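The policy $\pi$ models the agent's behavior at a state $x$: $\pi(a \mid x) = \mathbb{P}_\pi[A = a \mid X = x]$, and the return $G_t$ is the accumulated discounted reward:

$$G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$$

The discount factor $\gamma \in [0, 1]$ determines how much the agent cares about rewards in the distant future relative to those in the immediate future. The value function is the expected return of state $x$ at time $t$, which can be calculated by:

$$V_\pi(x) = \mathbb{E}_\pi[G_t \mid X_t = x]$$

Similarly, the Q-function is the expected return of a state-action pair:

$$Q_\pi(x, a) = \mathbb{E}_\pi[G_t \mid X_t = x, A_t = a]$$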
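As a small worked illustration of the return $G_t$, the sketch below computes the discounted return of every time step of a finite episode with a single backward pass, using the recursion $G_t = r_{t+1} + \gamma G_{t+1}$. The function name and the list-based reward layout are assumptions made for this example only.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """rewards[t] holds r_{t+1}; the result holds G_t for every step t of the episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so that each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns
```

For rewards `[1, 0, 2]` and $\gamma = 0.5$, the last step has return 2, the middle step $0 + 0.5 \cdot 2 = 1$, and the first step $1 + 0.5 \cdot 1 = 1.5$.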
The following equation draws the connection between the value function and the Q-function:

$$V_\pi(x) = \sum_{a} \pi(a \mid x)\, Q_\pi(x, a)$$

In some cases, it is favorable to use the difference in return of a state-action pair compared to the expected return of that state. We define that difference as the advantage value:

$$A_\pi(x, a) = Q_\pi(x, a) - V_\pi(x)$$
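Optimal value and policy. The optimal value function produces the maximum return:

$$V_*(x) = \max_\pi V_\pi(x), \qquad Q_*(x, a) = \max_\pi Q_\pi(x, a) \quad (2.8)$$

The optimal policy achieves optimal value functions:

$$\pi_* = \arg\max_\pi V_\pi(x), \qquad \pi_* = \arg\max_\pi Q_\pi(x, a)$$

Thus, $V_{\pi_*}(x) = V_*(x)$ and $Q_{\pi_*}(x, a) = Q_*(x, a)$.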
Bellman equations. The Bellman equation expresses the relationship between the value of a state and the values of its successor states. It states that the value of a state is equal to the immediate reward obtained by taking an action in that state plus the discounted value of the next state that the agent transitions to.
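The recursive structure described above can be turned directly into an iterative procedure. The following sketch applies the Bellman expectation backup $V(x) \leftarrow \sum_a \pi(a \mid x)\,[R(x, a) + \gamma \sum_{x'} P(x' \mid x, a) V(x')]$ repeatedly on a small tabular MDP. It is only an illustration: the dictionary layout of `P`, `R`, and `policy`, the fixed number of sweeps, and the function name are assumptions of this example.

```python
from typing import Dict


def policy_evaluation(
    P: Dict[str, Dict[str, Dict[str, float]]],  # P[x][a][x2] = transition probability
    R: Dict[str, Dict[str, float]],             # R[x][a] = immediate reward
    policy: Dict[str, Dict[str, float]],        # policy[x][a] = pi(a | x)
    gamma: float,
    sweeps: int = 200,
) -> Dict[str, float]:
    """Estimate V_pi by repeatedly applying the Bellman expectation backup."""
    V = {x: 0.0 for x in P}
    for _ in range(sweeps):
        V = {
            x: sum(
                policy[x][a]
                * (R[x][a] + gamma * sum(p * V[x2] for x2, p in P[x][a].items()))
                for a in P[x]
            )
            for x in P
        }
    return V
```

For a single state that loops onto itself with reward 1 and $\gamma = 0.9$, the estimate approaches $1 / (1 - 0.9) = 10$, which is the fixed point of the Bellman equation for that state.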
Traditional RL methods use a combination of model-free techniques (e.g., Q-learning, SARSA) and model-based approaches (e.g., dynamic programming) to learn optimal policies in environments with discrete and small state-action spaces.

Deep reinforcement learning (DRL) is a recent extension of RL that leverages deep neural networks to enable learning in high-dimensional state and action spaces. In DRL, the agent learns to directly map raw sensory inputs (e.g., pixel values in images) to actions without relying on hand-engineered features. This is achieved by combining deep neural networks with traditional RL algorithms, such as Q-learning and policy gradient methods. As this thesis uses actor-critic algorithms, which are largely based on policy gradient methods, we provide background knowledge about policy gradient methods in the following section.
2.2.2 Policy Gradient methods
Policy gradient methods are a class of reinforcement learning algorithms that aim to optimize policies directly rather than computing a value function and then deriving a policy from it. These methods have gained popularity recently due to their ability to handle continuous action spaces and their success in achieving state-of-the-art results in various applications, including game-playing, robotics, and natural language processing.

Policy gradient methods learn the policy directly with a parameterized function with respect to $\theta$, $\pi(a \mid x; \theta)$. We train the agent to maximize the expected return. Specifically, in continuous space with $X_1$ as the initial starting state:

$$J(\theta) = V_{\pi_\theta}(X_1)$$

We maximize the objective function $J(\theta)$ using the gradient ascent method. The gradient of $J$ can be computed using the Policy Gradient Theorem (Sutton and Barto, 2018) as follows:

$$\nabla J(\theta) = \mathbb{E}_\pi\left[\nabla \ln \pi(a \mid x; \theta)\, Q_\pi(x, a)\right]$$

Having the gradient, we can optimize policies directly using gradient ascent algorithms. However, early policy gradient methods suffered from high variance, which made it difficult for them to converge to optimal policies.
To address this issue, actor-critic methods combine a policy network (the actor) with a value function network (the critic) to provide a more stable estimate of the policy gradient. Specifically, two networks are maintained in actor-critic methods. One network is used for learning the value function, namely the critic, denoted $V_\phi$, and another learns the mapping between states and actions directly, known as the actor, denoted $\pi_\theta$. The critic is used to criticize the actions made by the actor, and the actor adjusts its parameters in the direction suggested by the critic. The gradient of the actor is then computed with the critic's value estimate in place of the sampled return.
Over the last decade, the encoder-decoder architecture has emerged as one of the most prominent deep learning architectures. Originally introduced to solve the problem of mapping fixed-length input to output in sequence-to-sequence learning, the vanilla encoder-decoder encodes a variable-length input sequence into an internal, fixed-dimensional representation. This representation is then used by an RNN-based decoder to produce a variable-length output until a termination criterion is detected. However, a major drawback of this approach is its inability to remember long sentences. To overcome this limitation, attention mechanisms were introduced. Attention allows the decoder to use any of the encoder's hidden states instead of relying on the fixed-length representation produced by the encoder. This is achieved by creating shortcuts that combine the entire input sequence into a context vector, with weights assigned to represent how much attention is devoted to each input. Mathematically, let us denote the encoder and decoder hidden states with $(e_1, e_2, \dots, e_n)$ and $(d_1, d_2, \dots, d_m)$. The context vector at decoding time $i$ is given by

$$c_i = \sum_{j=1}^{n} a_j^i e_j$$

where $a^i$ is an alignment of the input vector, which is calculated by:

$$a_j^i = \mathrm{softmax}(u_j^i), \quad j \in \{1, 2, \dots, n\}$$

where:

$$u_j^i = v^\top \tanh(W_1 e_j + W_2 d_i)$$

and $v$, $W_1$, and $W_2$ are learnable parameters. This context vector $c$ is later concatenated with the decoder state $d$ to make a prediction or compute a hidden vector for the next steps of the recurrent model.
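To make the computation of the context vector concrete, the following pure-Python sketch scores every encoder state against one decoder state with an additive (tanh) scoring function, normalizes the scores with a softmax, and returns the weighted sum of encoder states. The additive scoring form and the parameter names `W1`, `W2`, and `v` are assumptions of this example (they follow the common formulation of Vinyals et al., 2015); in a trained model these parameters would be learned rather than supplied by hand.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention(enc_states: List[Vector], dec_state: Vector,
              W1: Matrix, W2: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """Return (attention weights, context vector) for one decoding step."""
    scores = []
    for e in enc_states:
        hidden = [math.tanh(a + b)
                  for a, b in zip(matvec(W1, e), matvec(W2, dec_state))]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(enc_states[0])
    context = [sum(w * e[k] for w, e in zip(weights, enc_states))
               for k in range(dim)]
    return weights, context
```

The weights always sum to one, so the context vector is a convex combination of the encoder hidden states.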
2.2.4 Pointing mechanism
The pointing mechanism is a technique first proposed in (Vinyals et al., 2015) to produce discrete outputs that correspond to positions in the input. For example, in the combinatorial Traveling Salesman Problem, the solution is a permutation of the input positions. In (Vinyals et al., 2015), the authors proposed a Pointer network, which is an encoder-decoder LSTM model. The input, a sequence of node positions, is encoded by an LSTM encoder. In the decoder, instead of blending the encoder hidden states $e_j$ into a context vector $c$ at each decoder step, a reduction of the attention mechanism is used to point to a member of the input sequence to be selected as the output:

$$a_j^i = \mathrm{softmax}(u_j^i), \quad j \in \{1, 2, \dots, n\} \quad (2.19)$$

where $a_j^i$ is considered as the probability to select input $j$ in the decoder step $i$. The node with the highest probability is chosen to be visited next. The procedure is iteratively repeated to obtain the final solution.
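The iterative selection procedure can be sketched as a greedy decoding loop. In the sketch below, `score_fn` is a hypothetical stand-in for the network that produces the scores $u^i$ at each decoder step. Masking already selected positions so that the output is a valid permutation is an assumption of this example; it is common practice for routing problems but is not stated explicitly above.

```python
import math
from typing import Callable, List, Set


def masked_softmax(logits: List[float], visited: Set[int]) -> List[float]:
    """Softmax over the inputs, giving zero probability to visited positions."""
    masked = [float("-inf") if j in visited else u for j, u in enumerate(logits)]
    m = max(masked)
    exps = [math.exp(u - m) for u in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pointer_decode(score_fn: Callable[[int, List[int]], List[float]],
                          n: int) -> List[int]:
    """Pick the most probable unvisited input at each step until all are selected."""
    visited: Set[int] = set()
    tour: List[int] = []
    for step in range(n):
        probs = masked_softmax(score_fn(step, tour), visited)
        nxt = max(range(n), key=lambda j: probs[j])
        tour.append(nxt)
        visited.add(nxt)
    return tour
```

During training, the next position is usually sampled from the probabilities instead of taking the maximum, which lets the policy explore alternative orderings.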
et al., 2002). Figure 3.1.1 provides a simplified diagram of the sensor node architecture.
Figure 3.1.1: A block diagram of the architecture of the sensor node in the WSN
Sensor nodes are generally low-cost, resulting in a battery with a low capacity that is usually non-renewable. Therefore, extending the network's lifetime is a crucial factor that determines the overall efficiency and effectiveness of these networks. The network lifetime is generally defined as the time until either coverage or connectivity is lost (Zhao and Gurusamy, 2008). Numerous efforts have been dedicated to extending the lifespan of wireless sensor networks (WSNs), which can be classified into two main groups: sensor functioning optimization and energy replenishment.

Sensor functioning optimization focuses on enhancing the efficiency of sensor nodes while reducing their energy consumption. Various methods have been proposed in this area, such as data reduction, which involves removing redundant or unnecessary data to reduce the amount of information transmitted and, consequently, the energy consumption (Goyal et al., 2019). Another approach is sleep/wakeup schemes, which enable sensors to alternate between sleep and active modes based on predefined schedules or events, thereby reducing energy consumption during idle periods (Haimour and Abu-Sharkh, 2019). Additionally, energy-efficient routing protocols have been developed to minimize energy consumption by reducing the number of hops between nodes and selecting paths with low energy costs (Raj et al., 2019).
The above methods have the advantage of being able to prolong the network lifetime without requiring additional hardware or infrastructure and can be implemented at the software level. They also have the potential to achieve significant energy savings and can be tailored to suit different application requirements. However, these techniques may not be sufficient for high-throughput applications that require real-time or continuous data transmission, as this approach can only extend the sensors' lifetime for a certain amount of time. The battery will eventually be exhausted if no external source supplies the sensors.
On the other hand, energy replenishment aims to provide external sources of energy to the sensor nodes to extend their lifetime. One promising approach is energy harvesting, which involves converting ambient energy from the environment, such as solar, thermal, or kinetic energy, into electrical energy to power the sensors (Adu-Manu et al., 2018). Nevertheless, this technique depends heavily on an ambient source that is usually unstable and uncontrollable. Another approach is wireless charging, which involves wirelessly transmitting energy to the sensors using electromagnetic waves or magnetic resonance. The idea is to employ one or more mobile chargers (MCs), each equipped with a high-capacity battery and a transmission coil, to travel around the sensing field and charge the sensors wirelessly (Qureshi et al., 2022).
Energy replenishment techniques can provide a continuous supply of energy to the sensor nodes without the need for battery replacements. They also have the advantage of being able to operate in remote or harsh environments where traditional power sources may not be available. However, these techniques may require additional hardware and infrastructure, such as energy harvesters or wireless chargers. Moreover, the cost and maintenance of such hardware and infrastructure may be a concern in some applications.

In the following sections, we focus on two problems: energy-efficient routing via relay node placement and charging policies for WRSNs.
3.2 Energy-efficient routing via relay node placement
Energy-efficient routing is a well-established research area that aims to reduce energy consumption while maintaining reliable data delivery (Behera et al., 2022, Raj et al., 2019). Traditional approaches mostly focused on the existing structure of the WSNs to determine the routing protocols. Popular ones include hierarchical routing protocols, data-centric routing protocols, and location-based routing protocols. Hierarchical routing protocols like LEACH (Heinzelman et al., 2000) divide the network into clusters to minimize energy consumption, while data-centric routing protocols like Directed Diffusion (Intanagonwiwat et al., 2003) route data based on the content of the message. Location-based routing protocols like GPSR (Karp and Kung, 2000) use location information to make routing decisions. However, the effectiveness of these traditional routing protocols can be limited by the existing network topology and node density.

In recent years, there has been a growing interest in relay node placement strategies to reduce energy consumption in wireless sensor networks (Verma et al., 2015). The idea is to deploy additional non-sensing relay nodes to increase the network's capability and balance the energy consumption among nodes. Originally, relay node placement was designed to enhance the QoS criteria of WSNs, such as network connectivity and fault tolerance (Hanh et al., 2019, Lee et al., 2015, Ma et al., 2015, Sheikhi et al., 2021). Recently, relay node placement has been considered more for its potential to help balance the energy consumption among nodes, which in turn prolongs the network lifetime (Tam et al., 2020). Tam et al. (2020) have considered two objectives: minimizing the number of relay nodes and minimizing the maximum node energy consumption to prolong the network lifetime. They proposed a weighted-sum approach to finding a routing tree that maximizes the network's lifetime with minimum additional relay nodes. Although their work is limited to 2-hop WSNs and employs only the weighted-sum algorithm, it shows the potential of applying relay node placement in prolonging the network lifetime. This thesis aims to extend the existing results of Tam et al. (2020) to multi-hop networks with a novel objective-oriented multi-objective algorithm. We will discuss this problem in Chapter 4.
(a) Offline charging scheme (b) Online charging scheme
Figure 3.3.1: A comparison of offline and online charging schemes

Wireless Rechargeable Sensor Networks (WRSNs) are a subset of WSNs that use wireless charging technology to replenish the sensor nodes' energy. This technology provides a promising solution to one of the most significant challenges facing WSNs, which is the limited energy capacity of the sensors. WRSNs consist of a mobile charger that traverses the sensing field and wirelessly charges the sensors that need energy. This approach ensures continuous operation of the WSN without requiring manual battery replacements, making it ideal for remote or harsh environments where accessing the sensor nodes is difficult. However, constructing an efficient charging policy for mobile chargers (MCs) to meet the dynamic charging requirements of the sensors is one of the most critical challenges in WRSNs. Various proposals have been put forward to address this challenge, and they can be broadly classified into two main categories: offline charging schemes and online charging schemes (Figure 3.3.1).
Offline charging scheme. Offline charging schemes aim to optimize the charging policy before the charging process begins, usually by considering the expected workload, the capacity of the MCs, as well as the energy demand of the sensors. This approach enables the MC to charge the sensors with the optimal charging schedule, minimizing the overall energy consumption and extending the network lifetime. Lyu et al. (2019) propose a periodic charging planning for mobile Wireless Charging Equipment with limited traveling energy. They propose a Hybrid Particle Swarm Optimization Genetic Algorithm (HPSOGA) because of the NP-hardness of the problem. In (Jiang et al., 2017), the authors jointly consider charging tour planning and MC depot positioning for large-scale WSNs. Their method consists of charging tour planning, candidate depot identification and reduction, depot deployment, and charging tour assignment. The charging scheme also considers the association between the MC charging cycle and the sensor nodes' lifetime. Ma et al. (2018b) aim to minimize the sensor energy expiration time and the charging tour length of the mobile charger. They develop an approximation algorithm for the charging utility maximization problem if the energy consumption of the mobile charger on its charging tour is negligible, and an efficient heuristic through a non-trivial reduction from a length-constrained utility maximization problem otherwise. However, this approach requires a strict assumption about the constant energy consumption rate of sensor nodes, which is unrealistic in practice.
Online charging scheme. Online charging schemes determine the optimal charging policy while the charging process is ongoing, usually by monitoring the energy levels of the sensors and the remaining energy of the MC. Online charging schemes are generally more flexible and adaptive, as they can adjust the charging policy in response to changing network conditions. In the on-demand charging problem, sensor nodes request charging from the MC when their energy is depleted or falls under a predefined threshold. The MC maintains a pool of these requests and determines the next sensor to charge among the requested ones in the pool. The NJNP (He et al., 2013) algorithm charges the closest sensor node in the queue, while DWDP (Lin et al., 2019) uses double warning thresholds and double preemption to optimize charging priorities and recharge deadlines. TSync (Fu et al., 2015) constructs nested TSP tours to reduce travel distance and charging delay, and TSCA (Lin et al., 2017) minimizes the number of failed nodes while maximizing energy efficiency. Kaswan et al. (2018) present a Linear Programming (LP) formulation for the on-demand scheduling problem and then introduce an efficient solution based on a gravitational search algorithm (GSA) to tackle the problem. PA and INMA (Zhu et al., 2018) are two efficient online charging algorithms that first consider dynamic energy consumption rates based on their history statistics and real-time energy consumption. Recently, reinforcement learning-based algorithms (Cao et al., 2021, La et al., 2020) have also been considered for designing on-demand charging schemes. However, a common drawback of the above on-demand algorithms is the dependence on the chosen threshold for the charging requests, making them sensitive to that setting. We propose in this thesis a novel adaptive charging scheme by eliminating the on-demand charging request and using deep reinforcement learning to train the policy. The detail will be discussed in Chapter 5.
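The dependence of on-demand schemes on the request threshold can be seen in a minimal sketch of a nearest-request policy. The code below is a simplified illustration in the spirit of the closest-request rule described above, not a faithful implementation of any cited algorithm: the function names, the energy units, and the absence of preemption and travel-time modeling are all assumptions of this example.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def pending_requests(energies: Sequence[float], threshold: float) -> List[int]:
    """Sensors whose residual energy has fallen below the request threshold."""
    return [i for i, e in enumerate(energies) if e < threshold]


def nearest_request(mc_pos: Point, positions: Sequence[Point],
                    requests: List[int]) -> Optional[int]:
    """Pick the requesting sensor closest to the mobile charger, if any."""
    if not requests:
        return None
    return min(requests, key=lambda i: math.dist(mc_pos, positions[i]))
```

With positions `[(0, 0), (10, 0), (3, 4)]` and energies `[5, 1, 2]`, a threshold of 3 makes sensors 1 and 2 request charging and the charger at the origin serves sensor 2 first, whereas a threshold of 1.5 leaves only sensor 1 in the pool. The charging order therefore changes with the threshold alone, which is the sensitivity noted above.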
Chapter 4
An Evolutionary Algorithm for Optimal Node Placement
This chapter investigates the energy-efficient routing problem in WSNs through relay node placement. Existing approaches solely focus on minimizing the number of used relay nodes without considering energy consumption among nodes. We propose a relay node placement approach in multi-hop wireless sensor networks with two objectives: minimizing the number of used relay nodes and minimizing the maximum node energy consumption. The first objective is to restrict the deployment costs, while the second is to balance the energy consumption among nodes, which in turn extends the network's lifetime. To improve the network's reliability, we also consider a hop count bound, which acts as a delay constraint for transmitting packets. To solve our problem, we propose a novel objective-oriented multi-objective evolutionary algorithm (MOEA) that leverages the problem-specific properties to improve the algorithm's convergence rate. Simulation results on 3D datasets show that our algorithm outperforms existing algorithms in all mea-
However, optimizing the placement of RNs is a challenging task as it often involves multiple conflicting criteria. There are two variants of relay node placement problems in the literature: unconstrained and constrained problems. In the former problem, relay nodes can be placed anywhere in the terrain. In contrast, to avoid unrealistic relay deployments due to physical constraints, the latter restricts the position of the additional relay nodes to certain locations, which are determined in advance. However, both typically form NP-hard problems (Lloyd and Xue, 2006, Misra et al., 2009); thus, in practical settings, it is hard to obtain an optimal solution in a suitable amount of time.

Numerous works have been carried out to approximate the placement of RNs while ensuring several constraints of connectivity and fault tolerance. Ma et al. (2015) studied the constrained relay node placement in WSNs and proposed a connectivity-aware local search algorithm to find the minimum number of relay nodes so that each sensor is covered by at least one relay node. Lee et al. (2015) assured the fault tolerance of a partitioned WSN by establishing a bi-connected inter-partition topology while still deploying the least count of relay nodes. Bagaa et al. (2017) leveraged a Rayleigh block-fading channel and a weighted communication graph to construct a routing tree with a minimum number of additional relay nodes. Hanh et al. (2019) introduced a multi-objective problem that simultaneously considers the target coverage, connectivity, and fault tolerance of WSNs. Sheikhi et al. (2021) proposed a two-phase approach to provide multi-path routing and fault tolerance with higher network connectivity in heterogeneous WSNs.
Most of the above works consider multi-hop communication to ensure connectivity and prevent long-hop transmission. However, unlimited hop communication could increase the network's latency and reduce its reliability (Bhattacharya and Kumar, 2014). As it is difficult to measure network latency before node deployments, a hop count bound is often used as a surrogate constraint for network latency (Bhattacharya and Kumar, 2014, Liang et al., 2019, Ma et al., 2017, 2018a). Bhattacharya and Kumar (2014) first studied a constrained relay placement problem with hop count bound and showed its NP-hardness. They then proposed a polynomial time approximation algorithm for the problem. Ma et al. (2017, 2018a) also used the hop count to measure delay and reliability and formulated the 2-connected hop-constrained relay node placement (HCRNP) problem. Two approximation algorithms are proposed to solve this problem. Liang et al. (2019) later conducted extensive real-world deployments of WSNs using existing algorithms and then proposed a Set-Covering-based Algorithm (SCA) to ensure the quality of communication in the network with a hop count bound as a delay constraint.
Despite the promising results, a drawback of the aforementioned works is the lack of consideration of the energy consumption of nodes in the placement. The energy consumption of the nodes in a WSN is well-known to be imbalanced since it depends heavily on the number of relayed packets and the distance to the next node in the network topology (Guleria and Verma, 2019). Thus, balancing the load among nodes is essential to prolong the network's lifetime. However, the network's lifetime and the cost of deploying additional relay nodes are two conflicting criteria. Deploying more relay nodes typically increases the network's capability and provides more possibilities for load balancing but induces more cost in the deployment. Recently, Tam et al. (2020) first considered these two objectives in their design. They proposed a weighted-sum approach to finding a routing tree that maximizes the network lifetime with minimum additional relay nodes. However, they only consider 2-hop WSNs, which are only suitable for small networks. Additionally, the weighted-sum strategy must make certain assumptions when assigning weight values regarding how 'important' a criterion is compared to the other.
To overcome these issues, we introduce a novel problem, called the Node-Energy Bottleneck Problem (NEBP), which considers both objectives: minimize the number of relays used and maximize the network lifetime in a multi-hop setting with a hop count bound. This chapter aims to establish a communication structure (routing tree) that balances communication among nodes with minimal additional RNs. The main difference compared to Tam et al. (2020) is that we consider a multi-hop scheme with a delay constraint by limiting the maximum number of communication hops for each SN towards the BS. Moreover, we focus instead on using MOEAs to solve two objectives simultaneously.
MOEAs are favorable for their ability to provide Pareto fronts of non-dominated solutions in the objective function space. These Pareto fronts enable decision-makers to select a solution that fits them best. In evolutionary-based approaches, a population of candidate solutions is maintained and evolved toward better solutions. There are two main types of representation of an individual in the population: indirect and direct.

In an indirect representation, the candidate solutions are mapped to a different space where standard crossover and mutation operators can be applied. As we are focusing on constructing a routing tree, a standard approach could use Prüfer encoding (Prüfer, 1918), link and node biased encoding (Palmer and Kershenbaum, 1994), or Network random keys (NetKeys) (Rothlauf et al., 2002) as the solution encoder. Recently, Prakash et al. (2020) leveraged a permutation encoding with a heuristic decoder to propose a hybrid multi-objective evolutionary algorithm (HMOEA) to find a minimal spanning tree with a minimum diameter. The advantage of indirect representation is that we can adopt standard operators directly on the solution representation (Nayyar et al., 2018). However, most of these representations suffer from low locality (small changes in the code can lead to large changes in the decoded tree) (Prüfer, 1918), or infeasible and redundant representations (Prakash et al., 2020, Rothlauf et al., 2002).
On the other hand, direct representations can use a simple encoding method such as edge-set encoding (Raidl and Julstrom, 2003) and then perform a problem-specific crossover and mutation directly on phenotypes to create new offspring (Rothlauf, 2006). The main advantage of this scheme is the ability to apply a heuristic to guide search operators (Hao and Lin, 2017). Therefore, in this chapter, we use edge-set encoding to represent the solutions and then propose novel crossover and mutation operators to solve the Node-Energy Bottleneck problem in multi-hop wireless sensor networks (NEBP).
We outline the contributions of this chapter as follows.
• First, we introduce a novel problem called the Node-Energy Bottleneck Problem (NEBP), which considers multi-hop networks with a hop count bound. We aim to minimize two objectives: i) the number of used relay nodes; ii) the maximum node energy consumption to prolong the network lifetime.
• Secondly, we propose Guided Prim NSGA-II (GPrim) to solve the proposed problem. The novelties of the proposed GPrim can be summarized as follows: i) according to the problem-specific characteristics, edge-set-based encoding and decoding methodologies are developed to represent the solution space; ii) we leverage the problem's energy property to develop a heuristic Prim-based crossover and two mutations, including energy-oriented mutation and relay-oriented mutation, to improve the convergence rate of the algorithm.
• The proposed algorithm is validated against different encoding methods, including Permutation, Prüfer code, NetKeys, and Edge sets. The comparison is delivered on various metrics showing that our algorithm outperforms existing approaches by a
terrains. We consider two types of connections: relay nodes to the base station, and sensor nodes to sensor nodes or relay nodes. Sensing data is gathered by SNs and sent to the BS through a relay node or other sensors. The data transmitted to relay nodes can only be forwarded directly to the base station rather than to other sensors or relays. We assume the sensor network is static, meaning SNs have already been deployed, and a finite set of potential positions for RNs is known in advance. The base station is a sink node deployed at the central terrain with an unlimited power supply, while relay nodes (RNs) and SNs have the same initial energy, which cannot be replenished.
4.2.2 Energy consumption model
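Numerous energy dissipation models in WSNs are studied with different assumptions. In this work, we use the same energy model as in Gawade and Nalbalwar (2016), which accounts for the dissipated energy at both the receiver and transmitter during a transmission. The free space model ($d^2$ power loss) is used for proximal transmissions, and the multipath fading model ($d^4$ power loss) is considered for large-distance transmissions. Thus the energy dissipated by the transmitter for transmitting an $l$-bit packet to a distance $d$ is given by:

$$E_T(l, d) = \begin{cases} l E_{elec} + l \epsilon_{fs} d^2, & d < d_0 \\ l E_{elec} + l \epsilon_{mp} d^4, & d_0 \le d \le r \end{cases} \quad (4.1)$$

where $E_{elec}$ is the energy dissipated per bit by the radio electronics, $\epsilon_{fs}$ and $\epsilon_{mp}$ are the amplifier coefficients of the free space and multipath models, $d_0 = \sqrt{\epsilon_{fs} / \epsilon_{mp}}$ is the distance threshold for swapping amplification models, and $r$ indicates the range within which a node can communicate. In other words, no connection will be established among the nodes out of this range.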
The energy consumption of the receiver is given by:

$$E_R(l) = l E_{elec} \quad (4.2)$$

The dissipated energy of a node receiving $\eta$ packets and transmitting them to the parent node is calculated by the following formula:

$$E = \eta E_R + (\eta + c) E_T(d) \quad (4.3)$$

where $E_T(d)$ and $E_R$ are calculated as in Equations 4.1 and 4.2, respectively. The argument $c$ equals 1 if the node is a sensor node, and 0 otherwise. The argument $d$ is the transmission distance. The network parameters shown in Table 4.2.1 are set as in Wu and Liu (2013).
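The energy model above can be sketched as three small Python functions. This is a hedged illustration only: the constants are values commonly used with the first-order radio model and are not taken from Table 4.2.1, the default packet length of 4000 bits is arbitrary, and the node-level formula is an assumed reading of the description above (a node receives the relayed packets and forwards them, plus its own packet if it is a sensor).

```python
import math

# Assumed, commonly used first-order radio model constants (not from Table 4.2.1).
E_ELEC = 50e-9       # J/bit, electronics energy
EPS_FS = 10e-12      # J/bit/m^2, free space amplifier coefficient
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier coefficient
D0 = math.sqrt(EPS_FS / EPS_MP)  # threshold distance between the two models


def tx_energy(l_bits: int, d: float) -> float:
    """Energy to transmit an l-bit packet over distance d."""
    if d < D0:
        return l_bits * (E_ELEC + EPS_FS * d ** 2)
    return l_bits * (E_ELEC + EPS_MP * d ** 4)


def rx_energy(l_bits: int) -> float:
    """Energy to receive an l-bit packet."""
    return l_bits * E_ELEC


def node_energy(n_relayed: int, d: float, is_sensor: bool, l_bits: int = 4000) -> float:
    """Assumed node-level cost: receive n packets, forward them plus the node's own one."""
    c = 1 if is_sensor else 0
    return n_relayed * rx_energy(l_bits) + (n_relayed + c) * tx_energy(l_bits, d)
```

With these constants the threshold distance is roughly 88 m, and the two branches of the transmission cost coincide at that distance, so the cost is continuous in $d$.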
4.2.3 Problem formulation
We consider a wireless sensor network including a set of deployed sensor nodes $S = \{s_1, s_2, \dots, s_n\}$, a set of potential relay nodes $R = \{r_1, r_2, \dots, r_m\}$, and a base station denoted $s_0$. The position of each node is represented as a single point in 3D space that is interpolated from the digital elevation model (DEM) (Florinsky, 2016). The communication between two nodes can only be established if the Euclidean distance between them does not exceed the communication range $r$. We also consider a hop count bound $H$ that limits the maximum number of communication hops for each SN towards the BS (depth of the routing tree). We denote the problem as the Node-Energy Bottleneck problem with hop count bound. The formal formulation below models the desired structure as a Steiner tree (Hwang and Richards, 1992).
Definition 4.2.1 (Steiner tree). Given an undirected graph $G = (V, E)$ and a set of terminal nodes $N \subseteq V$, a tree $T = (V_T, E_T)$ is called a Steiner tree if it contains no cycles and spans all terminal nodes, $N \subseteq V_T \subseteq V$. The set of nodes $V_T \setminus N$ is called Steiner nodes.
Input:
• $G = (V, E)$ is an undirected graph, where $V = S \cup R \cup \{s_0\}$ is the set of vertices in the graph, $S$ is the set of sensor nodes, $R$ is the set of relays, and $s_0$ corresponds to the base station.
• $N = S \cup \{s_0\}$ denotes the set of terminal nodes.
• $r \in \mathbb{R}^+$ is the communication range.
• $d: V \times V \to \mathbb{R}^+$ is the distance function. An edge $e = (u, v) \in E$ exists only if $d(u, v) \le r$.

Constraint: the number of hops from the base station to every node of the tree is bounded,

$$len(s_0, v) \le H, \quad \forall v \in V_T,$$

where $len(u, v)$ denotes the length of the unique path between two nodes $u$ and $v$.

Output: A valid solution is a Steiner tree $T = (V_T, E_T)$ that spans the set of terminal nodes $N = S \cup \{s_0\}$ and satisfies the above constraints.
Figure 4.2.1 shows an example of a network with three relays and six sensors
Objectives: ‘The Node-Euergy Bollleneck yroblem in multi-hop wireless sensor met works (NEBP} seeks a Steiner tree T = (Vp Er} inthe valid output space that optimizes
two following objectives:
Figure 4.2.1: An example of a network with 3 relays, 6 sensors, and a max-hop constraint of 3. Panels: potential network, a candidate solution, an infeasible solution. Legend: base station, relay, sensor.
* Minimize the number of selected relay nodes (Steiner nodes): $\min |V_T \setminus N|$
* Minimize the maximum energy consumption of each node: $\min \max_{v \in V_T} E_v$
where $E_v$ is calculated as in Equation 4.3.
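The two objectives can be sketched in Python as follows. This is a minimal illustration, not the thesis's implementation: the `energy` argument is a hypothetical callable standing in for the per-node energy of Equation 4.3, which is not reproduced here, and the function name and edge-list representation are likewise assumptions.

```python
def objectives(edges, terminals, energy):
    """Return (number of Steiner nodes, maximum per-node energy) for a tree.

    `energy` is a hypothetical callable v -> energy consumed by node v;
    it stands in for Equation 4.3 and is an assumption of this sketch.
    """
    # Every node that appears in some edge belongs to the tree.
    nodes = {u for edge in edges for u in edge}
    # Steiner nodes are the tree nodes that are not terminals.
    steiner_count = len(nodes - set(terminals))
    # Bottleneck objective: the largest energy consumption over all tree nodes.
    max_energy = max(energy(v) for v in nodes)
    return steiner_count, max_energy
```

Both values are to be minimized, so a solution that uses fewer relays may still be worse on the second objective if it overloads a single node.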
We propose a phenotype-based multi-objective evolutionary algorithm (MOEA) named Guided Prim NSGA-II (GPrim) to solve the NEBP. In this scheme, the population is first initialized by a random-tree algorithm in which the candidate solutions are encoded by the edge-set encoding method. Then, we apply the NSGA-II algorithm (Deb et al., 2002) to maintain and evolve the population towards better solutions using problem-specific search operators. We leverage the problem's energy property to develop a heuristic Prim-based crossover and two objective-oriented mutations to reduce ineffective moves from standard search operators. The specifics of solution representation, initialization, crossover, and mutation are described below.
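NSGA-II ranks candidate solutions by Pareto dominance. As a reminder of what that means for the two minimization objectives of the NEBP, the sketch below shows a dominance test and the extraction of the first non-dominated front. It is a generic illustration with hypothetical function names, not the sorting procedure used in the thesis.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point dominates (the first front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among the objective vectors (1, 3.0), (2, 2.0), and (2, 3.0), the last one is dominated by the first, while the first two are mutually non-dominated.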
4.3.1 Solution representation
Although a direct representation needs no mapping between the phenotypic and genotypic space, a data structure is still necessary for processing (Li, 2001; Rothlauf, 2006). We use edge-set encoding on this problem for its simplicity. This encoding can act as the basis for evaluating the solution or be converted to an adjacency list in linear time. As we want to find a Steiner tree that connects all SNs to the BS, the number of vertices varies between solutions. For simplicity, we initialize solutions with the connections from the BS to every RN, and this structure is maintained in all candidate solutions in the population. The RNs with no connection to any SNs are later removed from the output structure by the decoder.
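The conversion and decoding steps described above can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, node identifiers are assumed to be comparable (for example integers), and the decoder is written here as repeated pruning of leaf relays, which is one plausible reading of "RNs with no connection to any SNs".

```python
def to_adjacency(edges):
    """Convert an undirected edge set to an adjacency map in O(|E|) time."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def decode(edges, relays):
    """Remove relays that serve no sensor and return the remaining edge set.

    Assumption of this sketch: a relay with at most one neighbour forwards
    nothing, so it is pruned; pruning repeats while new leaf relays appear.
    """
    adj = to_adjacency(edges)
    stack = [v for v in adj if v in relays and len(adj[v]) <= 1]
    while stack:
        v = stack.pop()
        if v not in adj:
            continue  # already removed
        for u in adj.pop(v):
            adj[u].discard(v)
            if u in relays and len(adj[u]) <= 1:
                stack.append(u)
    # Emit each undirected edge once, with the smaller identifier first.
    return {(u, v) for u in adj for v in adj[u] if u < v}
```

With the base station 0 connected to relays 1, 2, and 3, and sensors 4 and 5 attached to relay 1, decoding keeps only the edges (0, 1), (1, 4), and (1, 5).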
PrimRST builds a spanning tree from a start node by adding an adjacent node at random, regardless of its weight. Moreover, we adapt PrimRST to consider the max-hop constraint by maintaining the nodes' depth while creating a tree. We call this algorithm HCPrimRST (Algorithm 2).
Applying HCPrimRST with the max-hop constraint may lead to an invalid, non-connected structure. The initialization is thus divided into two phases. The first phase initializes the edges $T = \{(0, v) \in E \mid v \in R\}$ and runs the algorithm with the max-hop constraint; in the second phase, we relax the constraint and continue to build the tree obtained from the first phase to get a valid connected tree.
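The two-phase initialization can be illustrated with the Python sketch below. It is written from the prose description only and makes several assumptions: the function names, the storage of tree edges as (parent, child) pairs, the adjacency map `adj` keyed by every vertex, and the way the constraint is relaxed in the second phase (by setting the bound to the number of vertices) are all hypothetical choices, not the thesis's pseudocode.

```python
import random


def hc_prim_rst(tree, adj, root, max_hop, rng):
    """Grow `tree` by random edges, never placing a vertex deeper than max_hop."""
    # Depth of the vertices already in the partial tree, walking down from root.
    children = {}
    for parent, child in tree:
        children.setdefault(parent, []).append(child)
    depth = {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in children.get(u, []):
            depth[v] = depth[u] + 1
            stack.append(v)
    # Candidate edges from tree vertices that may still take a child.
    frontier = [(u, v) for u in depth if depth[u] < max_hop
                for v in adj[u] if v not in depth]
    while frontier:
        # Pick a candidate edge uniformly at random, ignoring edge weights.
        i = rng.randrange(len(frontier))
        frontier[i], frontier[-1] = frontier[-1], frontier[i]
        u, v = frontier.pop()
        if v in depth:
            continue  # v was attached through another edge in the meantime
        depth[v] = depth[u] + 1
        tree.append((u, v))
        if depth[v] < max_hop:
            frontier.extend((v, w) for w in adj[v] if w not in depth)
    return tree, depth


def initialize(relays, adj, root, max_hop, rng):
    """Two-phase initialization: bounded growth first, then relaxed growth."""
    # Start from the base-station-to-relay edges that exist in the graph.
    tree = [(root, r) for r in relays if r in adj[root]]
    tree, depth = hc_prim_rst(tree, adj, root, max_hop, rng)  # phase 1
    if len(depth) < len(adj):
        # Phase 2: relax the hop bound so the remaining vertices can be attached.
        tree, depth = hc_prim_rst(tree, adj, root, len(adj), rng)
    return tree
```

A solution completed in the second phase may exceed the hop bound; how such solutions are handled afterwards is outside the scope of this sketch.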
Algorithm 2 HCPrimRST
Input: The set of initialized edges $T$, set of vertices $V$, set of potential edges $E$, max-hop constraint $H$
Output: The set of used edges $T$
$d \leftarrow$ depth of vertices in partial tree $T$  # by DFS from root 0