HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Master's Thesis in Data Science and Artificial Intelligence

An Efficient, On-demand Charging for WRSNs Using Fuzzy Logic and Q-Learning

La Van Quan
Quan.LV202335M@sis.hust.edu.vn

Department: Department of Software Engineering
Institute: School of Information and Communication Technology

Hanoi, 2022
Declaration of Authorship and Topic Sentences

1. Personal information
Full name: La Van Quan
Phone number: 039 721 1659
Email: Quan.LV202335M@sis.hust.edu.vn
Major: Data Science and Artificial Intelligence

2. Topic
An Efficient, On-demand Charging for WRSNs Using Fuzzy Logic and Q-Learning
3. Contributions
• Propose a Fuzzy logic-based algorithm that determines the energy level to be charged to the sensors.
• Introduce a new method that determines the optimal charging time at each charging location to maximize the number of alive sensors.
• Propose Fuzzy Q-charging, which uses Q-learning in its charging scheme to guarantee the target coverage and connectivity.
4. Declaration of Authorship
I hereby declare that my thesis, titled "An Efficient, On-demand Charging for WRSNs Using Fuzzy Logic and Q-Learning", is the work of myself and my supervisor Dr. Nguyen Phi Le. All papers, sources, tables, and so on used in this thesis have been thoroughly cited.
Acknowledgments

I would like to thank my supervisor, Dr. Nguyen Phi Le, for her continued support and guidance throughout the course of my Master's studies. She has been a great teacher and mentor for me since my undergraduate years, and I am proud to have completed this thesis under her supervision.

I want to thank my family and my friends, who have given me their unconditional love and support to finish my Master's studies.

Finally, I would like to again thank Vingroup and the Vingroup Innovation Foundation, who have supported my studies through their Domestic Master/Ph.D Scholarship program.
Parts of this work were published in the paper "Q-learning Based, Optimized On-demand Charging Algorithm in WRSN" by La Van Quan, Phi Le Nguyen, Thanh Hung Nguyen, and Kien Nguyen, in the Proceedings of the 19th IEEE International Symposium on Network Computing and Applications, 2020.

La Van Quan was funded by Vingroup Joint Stock Company and supported by the Domestic Master/Ph.D Scholarship Programme of Vingroup Innovation Foundation (VINIF), Vingroup Big Data Institute, code VINIF.2020.ThS.BK.03.
Sensor nodes perform several important tasks, two of which are sensing and communication. Every time these tasks are performed, the sensor's energy is consumed. Therefore, some sensor nodes may die; a sensor node is considered dead when it runs out of energy. Correspondingly, the lifetime of a WSN is defined as the duration of operation until a sensor dies.

The limited battery capacity of the sensor is always a "bottleneck" that greatly affects the life of the network. To solve this problem, Wireless Rechargeable Sensor Networks (WRSNs) were born. WRSNs include sensors equipped with battery chargers and one or more mobile chargers (Mobile Chargers, MCs) responsible for adding power to the sensors. In WRSNs, MCs move around the network, stopping at specific locations (called charging sites) and charging the sensors. Thus, it is necessary to find a charging route for the MC to improve the lifetime of WRSNs [2][3].
Contents
4.3 Charging time determination
4.4 Fuzzy logic-based safe energy level determination
4.4.1 Motivation
4.4.2 Fuzzification
4.4.3 Fuzzy controller
4.4.4 Defuzzification
4.5 Reward function
5.2 Comparison with existing algorithms

List of Figures
Comparison of non-monitored targets over time
Comparison of dead sensors over time
Fuzzy rules evaluation
A Wireless Sensor Network (WSN) includes many battery-powered sensor nodes monitoring several targets. In the WSNs, it is necessary to provide sufficient monitoring quality surrounding the targets (i.e., guaranteeing target coverage). Moreover, the WSNs need to have adequate capacity for the communication between the sensors and the base station (i.e., ensuring connectivity) [6][7][8]. The target coverage and connectivity are severely affected by the depletion of the battery on sensor nodes. When a node runs out of battery, it becomes a dead node without sensing and communication capability, damaging the whole network in terms of coverage and connectivity. The Wireless Rechargeable Sensor Network (WRSN) leverages the advances in wireless power transfer to deal with this problem.
In a normal operation, the MC moves around the network and performs a charging strategy, which can be classified as either periodic charging [9][10][11][12] or on-demand charging [13][14][15][16][17][18]. In the former, the MC, with a predefined trajectory, stops at charging locations to charge the nearby sensors' batteries. In the latter, the MC moves and charges upon receiving requests from the sensors whose remaining energy is below a threshold. The periodic strategy is limited since it cannot adapt to the dynamics of the sensors' energy consumption rate. On the contrary, the on-demand charging approach potentially deals with the uncertainty of the energy consumption rate. Since a sensor with a draining battery triggers the on-demand operation, the MC's charging strategy faces a new time-constraint challenge. The MC needs to handle two crucial issues: deciding the next charging location and the staying period at that location.
Although numerous, the existing on-demand charging schemes in the literature face two serious problems. The first one is the consideration of the same role for all the sensor nodes in WRSNs. That is somewhat unrealistic since, intuitively, several sensors, depending on their locations, impact the target coverage and the connectivity more significantly than others. Hence, the existing charging schemes may enrich unnecessary sensors' power while letting necessary ones run out of energy, leading to the charging algorithms' inefficiency. It is of great importance to take into account the target coverage and connectivity simultaneously. The second problem is about the MC's charging amount, which is either the full capacity (of the sensor battery) or a fixed amount of energy. The former case may cause: 1) a long waiting time for other sensors staying near the charging location; 2) quick exhaustion of the MC's energy. In contrast, charging a too small amount to a node may lead to its lack of power to operate. Therefore, the MC should adjust the transferred energy level dynamically following the network condition.
My proposal, named Fuzzy Q-charging, is a charging scheme for WRSNs that assures the target coverage and connectivity and adjusts the energy level charged to the sensors dynamically. Fuzzy Q-charging aims to maximize the network lifetime, which is the time until the first target is not monitored. First, this work exploits Fuzzy logic in an optimization algorithm that determines the optimal charging time at each charging location, aiming to maximize the numbers of alive sensors and monitored targets. Fuzzy logic is used to cope with network dynamics by taking various network parameters into account during the determination of the optimal charging time. Second, this thesis leverages the Q-learning technique in a new algorithm that selects the next charging location to maximize the network lifetime. The MC maintains a Q-table containing the charging locations' Q-values, which represent the charging locations' goodness. The Q-values are updated in a real-time manner whenever there is a new charging request from a sensor. I design the Q-value to prioritize charging locations at which the MC can charge a node depending on its critical role. After finishing its tasks in one place, the MC chooses the next one, which has the highest Q-value, and determines an optimal charging time. The main contributions of this thesis are as follows.
• This thesis proposes a Fuzzy logic-based algorithm that determines the energy level to be charged to the sensors. The energy level is adjusted dynamically following the network condition.
• Based on the above algorithm, this thesis introduces a new method that determines the optimal charging time at each charging location. It considers several parameters (i.e., remaining energy, energy consumption rate, sensor-to-charging-location distance) to maximize the number of alive sensors.
• This thesis proposes Fuzzy Q-charging, which uses Q-learning in its charging scheme to guarantee the target coverage and connectivity. Fuzzy Q-charging's reward function is designed to maximize the charged amount to essential sensors and the number of monitored targets.
The rest of this thesis is constructed as follows.
• Chapter 2 describes the theoretical basis of this study, including an overview of the WSN and the WRSN, and the concepts of Q-learning and Fuzzy logic.
• Chapter 3 reviews the previous works on the charging scheme optimization problem.
• Chapter 4 presents the proposed algorithms, which combine the Fuzzy logic and Q-learning approaches.
• Chapter 5 evaluates and compares the performance of the proposed algorithms with existing research.
• Chapter 6 concludes the thesis and discusses future works.
Chapter 2
Theoretical Basis
A Wireless Sensor Network (WSN) is a network that consists of several spatially distributed and specialized sensors connected by a communications infrastructure to monitor and record physical conditions in a variety of situations. The sensors in WSNs collectively convey the sensing data to the Base Station (BS), also known as a sink, where the data is gathered, processed, and/or acted upon as needed. A typical WSN connecting with end-users is shown in Fig. 2.1.
Sensors play an important role in a sensor network. To monitor the physical environments and communicate with others efficiently, sensors have to satisfy many requirements. They not only need to record surroundings accurately and precisely and be capable of computing, analyzing, and storing the sensing data, but also have to be small in size, low in cost, and effective in power consumption.

Sensors commonly comprise four fundamental units: a sensing unit, which monitors environments and converts the analog signal into a digital signal; a processing unit, which processes the digital data and stores it in memory; a transceiver unit, which provides communication capability; and a power unit, which supplies energy to the sensor [1]. In addition, some sensors also have a navigation system to determine their location.

WSNs have a wide range of applications in a variety of fields. They were first deployed in the 1950s as part of a sound surveillance system designed by the US Navy to detect and track Soviet submarines. WSNs are now used in a variety of civilian applications, including environmental monitoring, health monitoring, smart agriculture, and so on. A WSN can be used in a forest, for example, to alert authorities to the risk of a forest fire. Furthermore, a WSN can track the location of a fire before it has a chance to expand out of control. WSNs also have a lot of potential in health monitoring; for example, an implanted system can record the fluctuation rate of glucose in a patient's blood.
Figure 2.3: Network model (legend: sensor, base station, mobile charger (MC), depot, charging location, monitored target, non-monitored target, data transmission, sensing range).
Many efforts have been made to reduce the energy usage of WSNs. Researchers have attempted to optimize radio signals using cognitive radio standardization, lower the data rate using data aggregation, save more energy using sleep/wake-up schemes, and pick energy-efficient routing protocols. However, none of them completely solves the energy problem of the sensor node in WSNs. The battery will ultimately run out if there is no external source of electricity for the sensors. Gathering energy from the environment is another way to overcome the sensor's energy depletion problem.
In recent years, thanks to advancements in wireless energy transfer and rechargeable battery technology, a recharging device can be used to recharge the battery of sensors in WSNs. As a result, WRSNs, a new generation of sensor networks, were born (Fig. 2.3). The sensor nodes in WRSNs are equipped with a wireless energy receiver and are charged via radio waves based on electromagnetic radiation and magnetic resonant coupling technology, giving them an edge over standard WSNs. WRSNs use one or more chargers to recharge sensor nodes on a regular basis; as a result, the lifetime of the network is optimally prolonged for eternal operations. WRSNs, in particular, necessitate a charger employment approach. Charging terminals and MCs are the two types of chargers available. A charging terminal is a device that has a fixed placement in the network and can recharge many sensors. Because the network scale is normally large, a significant number of charging terminals would be needed. An MC, in contrast, can move around the network, allowing it to cover a large region. If it runs out of power, it will return to the BS to replenish its battery. As a result, the only issue is that we need to figure out an efficient charging strategy for the MC.
Q-Learning [19] is one of the most often used Reinforcement Learning (RL) algorithms. It learns to predict the quality, in terms of the expected cumulative reward, of an action in a specific state (the Q-value) [1]. Moreover, as it is a model-free reinforcement learning algorithm [20], [15], the agent does not have a model representation of the environment; it simply learns and acts without knowing the changes being caused in the environment. The methods in which an environment model is known are called model-based. In this case, the agent knows approximately how the environment is going to evolve. This is the reason why model-based methods focus on planning while model-free ones focus on learning [19].
The standard Q-learning framework consists of four components: an environment, one or more agents, a state space, and an action space, as shown in Fig. 2.4. The Q-value represents the approximate goodness of an action concerning the agent's goal. An agent chooses actions according to the policy and the Q-value. After performing an action, the agent modifies its policy to attain its goal. The Q-value is updated using the Bellman equation as follows:

$Q(S_t, A_t) \leftarrow (1 - \alpha)\, Q(S_t, A_t) + \alpha \left[ R_t + \gamma \max_a Q(S_{t+1}, a) \right]$,   (2.1)

where $Q(S_t, A_t)$ is the Q-value of action $A_t$ at a given state $S_t$, and $R_t$ is the reward obtained when performing action $A_t$ in the state $S_t$. Moreover, $\max_a Q(S_{t+1}, a)$ is the maximum possible Q-value in the next state $S_{t+1}$ over all possible actions $a$. $\alpha$ and $\gamma$ are the learning rate and the future reward discount factor, respectively; their values are set between 0 and 1.
An explicit procedure to implement the Q-learning algorithm is provided in Algorithm 1.

Figure 2.4: The agent-environment interaction in Q-learning (state $S_t$, reward $R_t$, action $A_t$).

Algorithm 1: Q-learning
1: Initialize Q(s, a);
2: for each episode do
3:     Get initial state s;
4:     Select a using the policy derived from Q;
5:     Take action a, observe the next state s' and obtain the reward r;
6:     Update Q(s, a) by equation (2.1);
7:     s ← s';
8: end for
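To make the update concrete, the following is a minimal Python sketch of tabular Q-learning following equation (2.1). The state/action sizes, the epsilon-greedy policy, and all variable names are illustrative assumptions rather than part of the thesis.

import numpy as np

n_states, n_actions = 10, 4
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate
Q = np.zeros((n_states, n_actions))

def choose_action(s):
    # Epsilon-greedy policy derived from Q (Algorithm 1, line 4).
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def update(s, a, r, s_next):
    # Equation (2.1):
    # Q(S_t, A_t) <- (1 - alpha) Q(S_t, A_t) + alpha [R_t + gamma max_a Q(S_{t+1}, a)]
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * Q[s_next].max())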
A statement is not certainly true or false in some cases. In such situations, fuzzy logic can be used as a flexible method for reasoning under the uncertainty.

In Boolean logic, a classic logical statement is a declarative sentence that delivers factual information. If the information is correct, the statement is true; if the information is erroneous, the statement is false. However, sometimes, true or false values are not enough.

Lotfi Zadeh [7] coined the term "fuzzy logic" in the 1960s to describe a type of logic processing that contains more than two truth values. The fact that some assertions contain imprecise or non-numerical information influences fuzzy logic. The term "fuzzy" was also used to describe ambiguity and unclear information. As a result, fuzzy logic can describe and manipulate ambiguous and uncertain data, and it has been used in a variety of industries.

Following the fuzzy method, fuzzy logic uses particular input values, such as multi-numeric values or linguistic variables, to produce a specific output. The fuzzy technique will determine whether an object fully or partially possesses a property, even if the property is ambiguous. For example, the term "extremely strong engine" is based on the fuzzy method: there are hidden degrees of intensity ("very") of the trait in question ("strong").
A fuzzy rule with multiple inputs and one output has the following form:

R_k: IF (I_1 is A_1) ⊙ (I_2 is A_2) ⊙ ... ⊙ (I_n is A_n),

where {I_1, ..., I_n} represents the crisp inputs to the rule, {A_1, ..., A_n} are linguistic variables, and the operator ⊙ can be AND, OR, or NOT. The Inference Engine is in charge of the estimation of the fuzzy output set. It calculates the membership degree ($\mu$) of the output for all linguistic variables by applying the rule set described in the Knowledge Base. For fuzzy rules with many inputs, the output calculation depends on the operators used inside them, i.e., AND, OR, or NOT. The calculation for each type of operator is described as follows:

(I_i is A_i AND I_j is A_j): $\mu_{A_i \wedge A_j}(u) = \min\left(\mu_{A_i}(u), \mu_{A_j}(u)\right)$,
(I_i is A_i OR I_j is A_j): $\mu_{A_i \vee A_j}(u) = \max\left(\mu_{A_i}(u), \mu_{A_j}(u)\right)$,

where $\mu_Z(u)$ is the output membership function of the linguistic variable Z.
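As an illustration of the min/max semantics above, here is a small Python sketch. The triangular membership function and the linguistic variables ("energy is Low", "distance is Near") are hypothetical examples, not the thesis's actual fuzzy sets.

def triangular(x, a, b, c):
    # Membership degree of x in a triangular fuzzy set with feet a, c and peak b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

mu_low = triangular(0.25, 0.0, 0.2, 0.5)     # degree to which "energy is Low"
mu_near = triangular(30.0, 0.0, 20.0, 60.0)  # degree to which "distance is Near"

mu_and = min(mu_low, mu_near)  # (energy is Low) AND (distance is Near)
mu_or = max(mu_low, mu_near)   # (energy is Low) OR (distance is Near)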
Chapter 3
Literature Review
Initially, this thesis introduces the existing works related to periodic charging in WRSNs. In [1], the authors leverage PSO and GA to propose a charging path determination algorithm that minimizes the docking time during which the MC stays at the depot; [1] jointly considers the charging path planning and the depot positioning. Lin et al. derive a new energy transfer model with distance and angle factors to charge all nodes. They use linear programming and obtain the optimal solution. As the charging schedule is always fixed, the periodic scheme fails to adapt to the dynamics of sensors' energy consumption.
Regarding the on-demand charging, the authors in [17] address the node failure problem. They first propose to choose the next charging node based on the charging probability. Second, they introduce a charging node selection method to minimize the number of other requesting nodes suffering from energy depletion. In [2, 11], aiming to maximize the charging throughput, the authors propose a double warning threshold charging scheme; two dynamic warning thresholds are triggered depending on the residual energy of sensors. The authors in [15] studied how to optimize the serving order of the charging requests waiting in the queue using the gravitational search algorithm. In [10], X. Cao et al. introduce a new metric (i.e., the charging reward), which quantifies the charging scheme's quality. The authors then address the problem of maximizing the total reward in each charging tour under the constraints of the MC's energy and the sensors' charging time windows. They use a deep reinforcement learning-based on-demand charging algorithm to solve the addressed problem.
The existing charging algorithms have two serious problems. First, the charging time problem has not been thoroughly considered. Most of the charging schemes leverage either the full charging approach [1, 2, 9, 10, 14, 17] or the partial charging one [21]. I want to emphasize that the charging time is an essential factor that decides how much the charging algorithm can prolong the network lifetime. Moreover, there is no existing work considering the target coverage and connectivity constraints concurrently. Most previous works treat all sensors in WRSNs evenly; hence, the MC may charge unnecessary sensors while necessary ones may run out of energy. Unlike them, this work addresses the target coverage and connectivity constraints in the charging schedule optimization. This thesis uniquely considers the different roles of the sensors concerning the target coverage and connectivity.

Fuzzy logic has been applied in many fields, such as robotics [24] and embedded controllers [25]. In WSNs, Fuzzy logic is a promising technique in dealing with various problems, including localization, routing [26, 27], clustering [19], and data aggregation [28, 29]. R. M. Al-Kiyumi et al. in [26] propose a Fuzzy logic-based routing for lifetime enhancement in WSNs, which maps network status into corresponding cost values to calculate the shortest path. In [20], the authors also leverage Fuzzy logic and Q-learning, but in a cooperative multi-agent system for controlling the energy of a microgrid. In [11], Fuzzy logic and Q-learning are combined to address the problem of thermal unit commitment. Specifically, each input state vector is mapped with the Fuzzy rules to determine all the possible actions with corresponding Q-values. Recently, the authors in [15] use Fuzzy logic in an algorithm for adaptively determining the charging threshold and deciding the charging schedule. Different from the others, I use Fuzzy logic and Q-learning in my unique Fuzzy Q-charging proposal. The earlier version of this work has been published in [31], which considers only Q-charging.
Figure 3.1: Network model (legend: sensor, base station, mobile charger (MC), depot, charging location).

Figure 3.1 shows the considered network model, in which a WRSN monitors several targets. The network has three main components: an MC, sensor nodes, and a base station. The MC is a robot that can move and carry a wireless power charger. The sensor nodes can receive charged energy from the MC via a wireless medium. The base station is static and responsible for gathering the sensing information. We assume that there are n sensors S_j (j = 1, ..., n) and m targets T_k (k = 1, ..., m). We call a sensor a target-covering sensor if it covers at least one target. Moreover, if there exists an alive routing path between a sensor and the base station, the sensor is connected to the base station. A target is defined to be monitored when at least one sensor connected to the base station covers it.
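The monitored-target definition above can be checked algorithmically: a target is monitored if some alive sensor covering it can reach the base station over alive nodes. The following Python sketch assumes hypothetical adjacency and coverage structures; it illustrates the definition and is not code from the thesis.

from collections import deque

def sensors_connected_to_bs(alive, neighbors, bs):
    # Breadth-first search from the base station over alive nodes only.
    reached, queue = {bs}, deque([bs])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v in alive and v not in reached:
                reached.add(v)
                queue.append(v)
    return reached

def monitored_targets(cover, alive, neighbors, bs):
    # cover[t] is the set of sensors covering target t.
    reached = sensors_connected_to_bs(alive, neighbors, bs)
    return {t for t, sensors in cover.items() if any(s in reached for s in sensors)}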
A sensor node whose remaining energy is below E_th (i.e., a predefined threshold) sends a charging request to the MC. We target a non-preemptive charging schedule, in which charging requests from sensors are queued at the MC. We assume that there are k charging locations denoted by D_1, ..., D_k in the network. When the MC completes its tasks at a charging location, it runs the proposed algorithm to select the next optimal charging location from D_1, ..., D_k. Moreover, the MC also determines the optimal charging time at that charging location. When the energy of the MC goes below a threshold, it returns to the depot to recharge itself. Besides gathering the sensing information, the base station is also responsible for collecting information about the remaining energy of the sensors. Based on that, the MC performs the following procedures to update the Q-table:
• The MC leverages Fuzzy logic to calculate a so-called safe energy level, which is sufficiently higher than E_th. The MC then uses the algorithm described in Section 4.3 to determine the charging time at each charging location. The charging time is optimized to maximize the number of sensors that reach the safe energy level.
• The MC calculates the reward of every charging location using (4.9) and updates the Q-table using equation (4.1).

After finishing charging at a charging location, the MC selects the next charging location as the one with the highest Q-value. Finally, the MC moves to the next charging location and charges for the determined charging time. When the energy of the MC goes below a threshold, it returns to the depot to recharge itself. Figure 4.1 presents the overview of our charging algorithm, and a sketch of the decision loop is given below.
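The following Python sketch summarizes the loop just described. The mc object and the helper functions (optimal_charging_time, update_q_table) stand in for the procedures of Sections 4.3-4.6 and are assumptions for illustration only.

def mc_decision_loop(Q, locations, mc, optimal_charging_time, update_q_table):
    # Q: dict of dicts mapping (current location -> next location -> Q-value).
    current = mc.location
    while mc.network_alive():
        if mc.energy < mc.energy_threshold:
            mc.return_to_depot()  # recharge the MC itself, then resume
            continue
        # Greedy step: the next location is the one with the highest Q-value.
        nxt = max(locations, key=lambda d: Q[current][d])
        tau = optimal_charging_time(nxt)  # optimized as in Section 4.3
        mc.move_to(nxt)
        mc.charge_for(tau)
        update_q_table(Q, current, nxt)   # update by equation (4.1)
        current = nxt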
In our Q-learning-based model, the network is considered the environment while the MC is the agent. A state is defined by the current charging location of the MC, and an action is a move to the next charging location. Each MC maintains its own Q-table, which is a two-dimensional array. Each row represents a state, and each column represents an action. An item Q(D_j, D_k) in the j-th row and k-th column represents the Q-value of the action in which the MC moves from the current charging location D_j to the next charging location D_k. Figure 4.2 shows an illustration of our Q-table. In the figure, the gray row represents the Q-values concerning all possible actions when the MC stays at the current charging location, and the green cell depicts the maximum Q-value regarding the next charging location.

Figure 4.2: Illustration of the Q-table (rows: states, i.e., charging locations; columns: actions, i.e., moves to the next charging location).
Let D_c be the current charging location and D_i be an arbitrary charging location; then the Q-value of the action of moving from D_c to D_i is iteratively updated by using the Bellman equation as follows:

$Q(D_c, D_i) \leftarrow Q(D_c, D_i) + \alpha \left( r(D_i) + \gamma \max_{D_j} Q(D_i, D_j) - Q(D_c, D_i) \right)$   (4.1)

The equation's right side consists of two elements: the current Q-value and the temporal difference. The temporal difference measures the gap between the estimated target, i.e., $r(D_i) + \gamma \max_{D_j} Q(D_i, D_j)$, and the old Q-value, i.e., $Q(D_c, D_i)$. $\alpha$ and $\gamma$ are two hyper-parameters named the learning rate and the discount factor, respectively. $r(D_i)$ is our proposed reward function, which will be detailed in Section 4.5.
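For concreteness, here is a minimal Python sketch of the update in equation (4.1). Q is assumed to be a dictionary of dictionaries indexed by charging locations, and reward stands for the r(D_i) function of Section 4.5; these representations are illustrative assumptions.

def update_q_value(Q, D_c, D_i, alpha, gamma, reward):
    # Estimated target: r(D_i) + gamma * max_{D_j} Q(D_i, D_j).
    target = reward(D_i) + gamma * max(Q[D_i].values())
    # Move the old Q-value toward the target by the temporal difference.
    Q[D_c][D_i] += alpha * (target - Q[D_c][D_i])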
In the following, we first describe our algorithms to determine the optimal charging time and the safety energy level in Sections 4.3 and 4.4. Then, we present the details of the reward function and the mechanism for updating the Q-table in Sections 4.5 and 4.6.
This work aims to design a charging strategy so that the number of sensors reaching a safe energy level is as large as possible after each charging round. Here, the safe energy level means an energy amount that is sufficiently greater than E_th. We define the safe energy level, E_s, as a function of E_th and E_max that is determined by the Fuzzy logic-based algorithm of Section 4.4, where E_max is the maximum energy capacity of the sensor. We then determine the optimal charging time T_i to minimize the number of critical sensors.

We adopt the multi-node charging model, in which the MC can charge multiple sensors simultaneously. Let $p^i_j$ denote the charging rate received by sensor S_j when the MC stays at D_i, and let $e_j$ denote the energy consumption rate of S_j, which is estimated by the MC. Suppose that the MC charges S_j at D_i; we denote the remaining energy of S_j when the charging process starts and finishes as $E_j$ and $E'_j$, respectively. Then $E'_j = E_j + (p^i_j - e_j) \times T_i$. At the charging location D_i, we call $p^i_j - e_j$ the energy gain of S_j. The remaining energy of S_j will increase if its energy gain is positive and decrease otherwise. Note that the energy of S_j reaches the safety energy level E_s if the charging time equals $(E_s - E_j)/(p^i_j - e_j)$; this value is named the safety charging time of S_j with respect to the charging location D_i and is denoted as $\Delta^i_j$.
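A short Python sketch of these quantities, under the stated model, may make them concrete; the variable names mirror the notation above and the code is an illustration, not part of the thesis.

def energy_gain(p_ji, e_j):
    # p_ji: charging rate of S_j at D_i; e_j: energy consumption rate of S_j.
    return p_ji - e_j

def safety_charging_time(E_s, E_j, p_ji, e_j):
    # Time needed for S_j to climb from E_j to the safe energy level E_s.
    gain = energy_gain(p_ji, e_j)
    if gain <= 0:
        return float("inf")  # S_j can never reach E_s while the MC stays at D_i
    return (E_s - E_j) / gain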
The sensors can be classified into four groups. The first and second groups contain normal sensors with positive energy gain and critical sensors with negative energy gain, respectively. The third and fourth groups contain normal sensors with negative energy gain and critical sensors with positive energy gain, respectively. Obviously, the sensors in the first and second groups do not change their status no matter how long the MC charges at D_i. In contrast, a sensor S_j in the third group will fall into the critical status, and a sensor in the fourth group can alleviate the critical status, if the