
An Efficient, On-demand Charging for WRSNs Using Fuzzy Logic and Q-Learning


DOCUMENT INFORMATION

Basic information

Title: An Efficient, On-demand Charging for WRSNs Using Fuzzy Logic and Q-Learning
Author: La Van Quan
Supervisor: Dr. Nguyen Phi Le
University: Hanoi University of Science and Technology
Major: Data Science and Artificial Intelligence
Type: Master's thesis
Year: 2022
City: Hanoi
Pages: 46
File size: 1.3 MB


HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Institute: School of Information and Communication Technology

Hanoi, 2022


Declaration of Authorship and Topic Sentences

• Propose a Fuzzy logic-based algorithm that determines the energy level to be charged to the sensors.

• Introduce a new method that optimizes the charging time at each charging location to maximize the number of alive sensors.

• Propose Fuzzy Q-charging, which uses Q-learning in its charging scheme to guarantee the target coverage and connectivity.

4 Declaration of Authorship

I hereby declare that my thesis, titled “An Efficient, On-demand Charging for WRSNs Using Fuzzy Logic and Q-Learning”, is the work of myself and my supervisor, Dr. Nguyen Phi Le. All papers, sources, tables, and so on used in this thesis have been thoroughly cited.

5 Supervisor confirmation

Ha Noi, April 2022

Supervisor

Dr. Nguyen Phi Le


I would like to thank my supervisor, Dr. Nguyen Phi Le, for her continued support and guidance throughout the course of my Master's studies. She has been a great teacher and mentor for me since my undergraduate years, and I am proud to have completed this thesis under her supervision.

I want to thank my family and my friends, who have given me their unconditional love and support to finish my Master's studies.

Finally, I would like to again thank Vingroup and the Vingroup Innovation Foundation, who have supported my studies through their Domestic Master/Ph.D. Scholarship program.

Parts of this work were published in the paper “Q-learning-based, Optimized On-demand Charging Algorithm in WRSN” by La Van Quan, Phi Le Nguyen, Thanh-Hung Nguyen, and Kien Nguyen in the Proceedings of the 19th IEEE International Symposium on Network Computing and Applications, 2020.

La Van Quan was funded by Vingroup Joint Stock Company and supported by the Domestic Master/Ph.D. Scholarship Programme of Vingroup Innovation Foundation (VINIF), Vingroup Big Data Institute, code VINIF.2020.ThS.BK.03.


In recent years, Wireless Sensor Networks (WSNs) have attracted great attention worldwide. WSNs consist of sensor nodes deployed over a surveillance area to monitor and control the physical environment. In WSNs, every sensor node needs to perform several important tasks, two of which are sensing and communication. Each time these tasks are performed, the sensor consumes energy, so some sensor nodes may eventually die. A sensor node is considered dead when it runs out of energy. Correspondingly, the lifetime of a WSN is defined as the time from the start of operation until a sensor dies [1]. Thus, one of the important issues in improving the quality of WSNs is to maximize the network lifetime.

In classical WSNs, sensor nodes have a fixed energy budget that only decreases over time. The limited battery capacity of the sensors is a "bottleneck" that greatly affects the network lifetime. To solve this problem, Wireless Rechargeable Sensor Networks (WRSNs) were introduced. WRSNs include sensors equipped with battery chargers and one or more Mobile Chargers (MCs) responsible for replenishing the sensors' energy. In WRSNs, MCs move around the network, stopping at specific locations (called charging sites) and charging the sensors. Thus, it is necessary to find a charging route for the MC that improves the lifetime of WRSNs [2], [3].

Keywords: Wireless Rechargeable Sensor Network, Fuzzy Logic, Reinforcement Learning, Q-Learning, Network Lifetime

Author

La Van Quan

1.1 Problem overview
1.2 Thesis contributions
1.3 Thesis structure

2 Theoretical Basis
2.1 Wireless Rechargeable Sensor Networks
2.2 Q-learning
2.3 Fuzzy Logic

3 Literature Review
3.1 Related Work
3.2 Problem definition

4 Fuzzy Q-charging algorithm
4.1 Overview
4.2 State space, action space and Q table
4.3 Charging time determination
4.4 Fuzzy logic-based safe energy level determination
4.4.1 Motivation
4.4.2 Fuzzification
4.4.3 Fuzzy controller
4.4.4 Defuzzification
4.5 Reward function
4.6 Q table update

5 Experimental Results
5.1 Impacts of parameters
5.1.1 Impacts of α
5.1.2 Impacts of γ
5.2 Comparison with existing algorithms
5.2.1 Impacts of the number of sensors
5.2.2 Impacts of the number of targets
5.2.3 Impacts of the packet generation frequency
5.2.4 Non-monitored targets and dead sensors over time

List of Figures

2.1 A wireless sensor network
2.2 A sensor structure
2.3 Network model
2.4 Q-learning overview
3.1 Network model
4.1 The flow of Fuzzy Q-learning-based charging algorithm
4.2 Illustration of the Q-table
4.3 Fuzzy input membership functions
4.4 Fuzzy output membership function
5.1 Impact of α on the network lifetime
5.2 Impact of γ on the network lifetime
5.3 Network lifetime vs the number of sensors
5.4 Network lifetime vs the number of targets
5.5 Network lifetime vs the packet generation frequency
5.6 Comparison of non-monitored targets over time
5.7 Comparison of dead sensors over time

List of Tables

4.1 Input variables with their linguistic values and corresponding membership function
4.2 Output variable with its linguistic values and membership function
4.3 Fuzzy rules for safe energy level determination
4.4 Inputs of linguistic variables
4.5 Fuzzy rules evaluation
5.1 System parameters

to provide sufficient monitoring quality surrounding the targets (i.e., guaranteeing target coverage). Moreover, the WSNs need to have adequate capacity for the communication between the sensors and the base station (i.e., ensuring connectivity) [6][7][8]. The target coverage and connectivity are severely affected by the depletion of the battery on sensor nodes. When a node runs out of battery, it becomes a dead node without sensing and communication capability, damaging the whole network as a consequence. Wireless Rechargeable Sensor Networks (WRSNs) leverage the advantages of wireless power transfer technology to solve that critical issue in WSNs. A WRSN uses a mobile charger (MC) to wirelessly compensate for the energy consumption of the rechargeable battery on a sensor node, aiming to guarantee both the target coverage and connectivity.

In normal operation, the MC moves around the network and performs charging strategies, which can be classified into periodic [9][1][10][11][12] or on-demand charging [13][2][14][15][16][17][18]. In the former, the MC, with a predefined trajectory, stops at charging locations to charge the nearby sensors' batteries. In the latter, the MC moves and charges upon receiving requests from sensors whose remaining energy is below a threshold. The periodic strategy is limited since it cannot adapt to the dynamics of the sensors' energy consumption rate. In contrast, the on-demand charging approach can potentially deal with the uncertainty of the energy consumption rate. Since a sensor with a draining battery triggers the on-demand operation, the MC's charging strategy faces a new time-constraint challenge. The MC needs to handle two crucial issues: deciding the next charging location and the staying period at that location.

Although numerous, the existing on-demand charging schemes in the literature face two serious problems. The first is that they assume the same role for all sensor nodes in WRSNs. That is somewhat unrealistic since, intuitively, some sensors, depending on their locations, impact the target coverage and the connectivity significantly more than others. Hence, the existing charging schemes may replenish unnecessary sensors while letting necessary ones run out of energy, making the charging algorithms inefficient. It is therefore of great importance to take the target coverage and connectivity into account simultaneously. The second problem concerns the MC's charging amount, which is either the full capacity of the sensor battery or a fixed amount of energy. Charging to full capacity may cause 1) a long waiting time for other sensors staying near the charging location and 2) quick exhaustion of the MC's energy. In contrast, charging too small an amount to a node may leave it without enough power to operate until the next charging round. Therefore, the charging strategy should adjust the transferred energy level dynamically according to the network condition.

1.2 Thesis contributions

Motivated by the above, this thesis proposes a novel on-demand charging scheme for WRSNs that assures the target coverage and connectivity and dynamically adjusts the energy level charged to the sensors. My proposal, named Fuzzy Q-charging, aims to maximize the network lifetime, which is the time until the first target is no longer monitored. First, this work exploits Fuzzy logic in an optimization algorithm that determines the optimal charging time at each charging location, aiming to maximize the numbers of alive sensors and monitored targets. Fuzzy logic is used to cope with network dynamics by taking various network parameters into account when determining the optimal charging time. Second, this thesis leverages the Q-learning technique in a new algorithm that selects the next charging location to maximize the network lifetime. The MC maintains a Q-table containing the Q-values of the charging locations, which represent how good each charging location is. The Q-values are updated in real time whenever there is a new charging request from a sensor. I design the Q-value to prioritize charging locations at which the MC can charge nodes that play a critical role. After finishing its tasks in one place, the MC chooses the next charging location, which is the one with the highest Q-value, and determines an optimal charging time. The main contributions of the thesis are as follows.

• This thesis proposes a Fuzzy logic-based algorithm that determines the energy level to be charged to the sensors. The energy level is adjusted dynamically according to the network condition.

• Based on the above algorithm, this thesis introduces a new method that optimizes the charging time at each charging location. It considers several parameters (i.e., remaining energy, energy consumption rate, sensor-to-charging location's distance) to maximize the number of alive sensors.

• This thesis proposes Fuzzy Q-charging, which uses Q-learning in its charging scheme to guarantee the target coverage and connectivity. Fuzzy Q-charging's reward function is designed to maximize the amount charged to essential sensors and the number of monitored targets.

1.3 Thesis structure

The rest of this thesis is organized as follows.

• Chapter 2 describes concepts related to wireless sensor networks, Q-learning, and fuzzy logic.

• Chapter 3 introduces the related knowledge of this study, including the overview of the WSN and the WRSN, the previous works on the charging scheme optimization problem, and some optimization algorithms.

• Chapter 4 presents the proposed algorithm, which combines fuzzy logic and Q-learning.

• Chapter 5 evaluates the performance of the proposed algorithm and compares it with existing research.

• The final chapter concludes the thesis and discusses future work.


Chapter 2

Theoretical Basis

2.1 Wireless Rechargeable Sensor Networks

A Wireless Sensor Network (WSN) is a network that consists of several spatially distributed and specialized sensors connected by a communications infrastructure to monitor and record physical conditions in a variety of situations. The sensors in a WSN collectively convey the sensing data to the Base Station (BS), also known as a sink, where the data is gathered and processed and where actions are taken as needed. A typical WSN connected with end-users is shown in Fig. 2.1.

Sensors play an important role in a sensor network. To monitor the physical environment and communicate with others efficiently, sensors have to meet many requirements. They not only need to record their surroundings accurately and precisely and be capable of computing, analyzing, and storing the sensing data, but also have to be small in size, low in cost, and efficient in power consumption.

Sensors commonly comprise four fundamental units: a sensing unit, which monitors the environment and converts the analog signal into a digital signal; a processing unit, which processes the digital data and stores it in memory; a transceiver unit, which provides communication capability; and a power unit, which supplies energy to the sensor [1]. In addition, some sensors also have a navigation system to determine the positions of themselves, other sensors, and sensing targets, a mobilizer to add mobility, etc. A sensor structure is shown in Fig. 2.2.

WSNs have a wide range of applications in a variety of fields. They were first deployed in the 1950s as part of a sound surveillance system designed by the US Navy to detect and track Soviet submarines. WSNs are now used in a variety of civilian applications, including environmental monitoring, health monitoring, smart agriculture, and so on. A WSN can be used in a forest, for example, to alert authorities to the risk of a forest fire. Furthermore, a WSN can track the location of a fire before it has a chance to spread out of control. WSNs also have a lot of potential in the area of health monitoring. Scientists, for example, have developed a WSN-based sugar monitoring device. The system can record the fluctuation rate of glucose in diabetes patients' blood and warn them in time.

Figure 2.1: A wireless sensor network

Figure 2.2: A sensor structure

Despite its many benefits, a WSN has several limitations. Because a WSN must remain low-cost, some features must be sacrificed. A sensor in a WSN, for example, has a low-capacity battery that is often non-rechargeable. Battery replacement is impractical if WSNs are installed in remote terrain. When a sensor's battery dies, it can no longer record, send data, or communicate, causing the network to malfunction. Furthermore, WSN sensors are tiny, resulting in limited computing and storage capabilities. Although there are many challenges with WSNs, such as communication range, signal attenuation, security, and so on, this thesis focuses on the energy problem as one of the most important. Avoiding the energy depletion of sensor nodes has attracted a lot of interest from researchers and network users all around the world.


Figure 2.3: Network model

Many efforts have been made to reduce the energy usage of WSNs. Researchers have attempted to optimize radio signals using cognitive radio standardization, lower the data rate using data aggregation, save energy using sleep/wake-up schemes, and select energy-efficient routing protocols. However, none of these approaches completely solves the energy problem of the sensor nodes in WSNs. The battery will ultimately run out if there is no external source of electricity for the sensors. Gathering energy from the environment is another way to overcome the sensors' energy depletion problem. Each sensor has an energy harvester that converts power from external sources such as solar, thermal, wind, kinetic, and other forms of energy into electrical power. Sensors can use the converted power to recharge their batteries, extending the network's lifespan. However, this strategy relies heavily on an unstable and uncontrollable ambient energy supply.

In recent years, thanks to advancements in wireless energy transfer and rechargeable battery technology, a recharging device can be used to recharge the batteries of sensors in WSNs. As a result, WRSNs, a new generation of sensor networks, were born (Fig. 2.3). The sensor nodes in WRSNs are equipped with a wireless energy receiver that works via radio waves based on electromagnetic radiation and magnetic resonant coupling technology, giving them an edge over standard WSNs. WRSNs use one or more chargers to recharge sensor nodes on a regular basis. As a result, the lifetime of the network can, in principle, be prolonged indefinitely. Long-term, reliable operation is easier to maintain in WRSNs than in typical WSNs because they offer more flexible, customizable, and dependable energy replenishment.

A new network generation, on the other hand, raises new issues that have never been faced before. WRSNs, in particular, require a charger deployment approach. Charging terminals and MCs are the two types of chargers available. A charging terminal is a device that has a fixed placement in the network and can recharge many sensors. Because the network scale is normally large, a significant number of charging terminals would be required. Determining how many charging terminals are needed to replenish all of the sensors is a problem analogous to the coverage issue in WSNs. Furthermore, a charging terminal does not have an infinite energy supply, so it will need to be recharged after some time. As a result, the charging terminal is not a dependable device for long-term operation. It is a better idea to use MCs to recharge sensors. An MC can travel through the network, allowing it to cover a large region. If it runs out of power, it returns to the BS to replenish its battery. The remaining issue is then to design the MC's charging scheme.

2.2 Q-learning

The standard Q-learning framework consists of four components: an environment, one or more agents, a state space, and an action space, as shown in Fig. 2.4. The Q-value represents the approximate goodness of an action with respect to the agent's goal. An agent chooses actions according to its policy and the Q-values. After performing an action, the agent modifies its policy to attain its goal. The Q-value is updated using the Bellman equation as follows:

\[ Q(S_t, A_t) \leftarrow (1 - \alpha) Q(S_t, A_t) + \alpha \left[ R_t + \gamma \max_{a} Q(S_{t+1}, a) \right], \tag{2.1} \]

where $Q(S_t, A_t)$ is the Q-value of action $A_t$ at a given state $S_t$, and $R_t$ is the reward obtained by performing action $A_t$ in state $S_t$. Moreover, $\max_{a} Q(S_{t+1}, a)$ is the maximum possible Q-value in the next state $S_{t+1}$ over all possible actions $a$. $\alpha$ and $\gamma$ are the learning rate and the future reward discount factor, respectively. Their values are set between 0 and 1.

An explicit procedure to implement the Q-learning algorithm is provided in Algorithm 1.

Figure 2.4: Q-learning overview

Algorithm 1: Q-Learning Algorithm

Initialize Q(s, a);
for each episode do
    Get initial state s;
    Select a using policy derived from Q;
    Take action a, observe next state s′ and obtain reward r;
    Update Q(s, a) by equation (2.1);
    s ← s′;
end
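The update rule (2.1) and the loop of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the toy chain environment, the epsilon-greedy policy, and the names `q_update`, `epsilon_greedy`, and `train_chain` are all assumptions introduced here.

```python
import random

def q_update(q, s, a, r, s_next, alpha, gamma):
    """Apply equation (2.1) to the entry Q(s, a) of the table q (list of lists)."""
    best_next = max(q[s_next])  # max over all actions in the next state
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (r + gamma * best_next)
    return q[s][a]

def epsilon_greedy(q, s, epsilon, rng):
    """Assumed policy: random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[s]))
    return max(range(len(q[s])), key=lambda a: q[s][a])

def train_chain(n_states=4, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Toy environment (hypothetical): states 0..n-1 on a line.

    Action 0 moves left (staying put at state 0), action 1 moves right.
    Reaching the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0  # initial state
        while s != n_states - 1:
            a = epsilon_greedy(q, s, epsilon, rng)
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            q_update(q, s, a, r, s_next, alpha, gamma)
            s = s_next
    return q
```

After training, moving right has the larger Q-value in every non-terminal state of this toy chain, which is the behaviour one would expect from the update rule.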

2.3 Fuzzy Logic

Decision-making is a daily activity in our lives. However, in some cases it is hard to decide whether a statement is certainly true or false. In such situations, fuzzy logic can be used as a flexible method for reasoning under uncertainty.

In Boolean logic, a classical logical statement is a declarative sentence that delivers factual information. If the information is correct, the statement is true; if the information is erroneous, the statement is false. However, true or false values are sometimes not enough.

Lotfi Zadeh [7] introduced the term "fuzzy logic" in the 1960s to describe a type of logic processing that contains more than two truth values. Fuzzy logic is motivated by the fact that some assertions contain imprecise or non-numerical information. The term "fuzzy" was also used to describe ambiguity and unclear information. As a result, fuzzy logic can describe and manipulate ambiguous and uncertain data, and it has been used in a variety of industries.

Following the fuzzy method, fuzzy logic uses particular input values, such as multi-numeric values or linguistic variables, to produce a specific output. The fuzzy technique determines whether an object fully or partially possesses a property, even if the property is ambiguous. For example, the term "very strong engine" is based on the fuzzy method: there are hidden degrees of intensity ("very") of the trait in question ("strong").


A fuzzy logic system consists of three components: fuzzification, fuzzy logic controller, and defuzzification. The first component converts the crisp values of the variables into their fuzzy form using membership functions. The second one is responsible for simulating the human reasoning process by making fuzzy inferences based on the inputs and a set of defined IF-THEN rules. This module can be separated into two subcomponents, namely the Knowledge Base and the Inference Engine. The Knowledge Base is a set of specifically designed rules which, together with the input states of the variables, produce consistent results. Each rule has the form "IF {set of inputs} THEN {output}". More explicitly, a fuzzy rule $R_i$ with $k$ inputs and one output has the following form:

\[ R_i: \text{IF } (I_1 \text{ is } A_{i1}) \,\Theta\, (I_2 \text{ is } A_{i2}) \,\Theta \cdots \Theta\, (I_k \text{ is } A_{ik}) \text{ THEN } (O \text{ is } B_i), \tag{2.2} \]

where $\{I_1, \ldots, I_k\}$ represents the crisp inputs to the rule, and $\{A_{i1}, \ldots, A_{ik}\}$ and $B_i$ are linguistic variables. The operator $\Theta$ can be AND, OR, or NOT. The Inference Engine is in charge of estimating the fuzzy output set. It calculates the membership degree ($\mu$) of the output for all linguistic variables by applying the rule set described in the Knowledge Base. For fuzzy rules with several inputs, the output calculation depends on the operators used in the rule, i.e., AND, OR, or NOT. The calculation for each type of operator is described as follows:

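\[
\begin{aligned}
(I_i \text{ is } A_i \text{ AND } I_j \text{ is } A_j) &: \quad \mu_{A_i \cap A_j}(I_{ij}) = \min\left(\mu_{A_i}(I_i), \mu_{A_j}(I_j)\right), \\
(I_i \text{ is } A_i \text{ OR } I_j \text{ is } A_j) &: \quad \mu_{A_i \cup A_j}(I_{ij}) = \max\left(\mu_{A_i}(I_i), \mu_{A_j}(I_j)\right), \\
(\text{NOT } I_i \text{ is } A_i) &: \quad \mu_{\bar{A}_i}(I_i) = 1 - \mu_{A_i}(I_i).
\end{aligned}
\]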
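As a small illustration of the three operators, the sketch below evaluates the firing strength of a two-input rule. The triangular membership shape and the function names are assumptions made for this example; the membership functions actually used in the thesis are defined later (Section 4.4).

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set (assumed shape)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzy_and(mu_a, mu_b):
    """AND: minimum of the two membership degrees."""
    return min(mu_a, mu_b)

def fuzzy_or(mu_a, mu_b):
    """OR: maximum of the two membership degrees."""
    return max(mu_a, mu_b)

def fuzzy_not(mu_a):
    """NOT: complement of the membership degree."""
    return 1.0 - mu_a

# Hypothetical rule: IF (I1 is A1) AND (I2 is A2) THEN ...
mu_1 = triangular(3.0, 0.0, 5.0, 10.0)   # degree to which I1 = 3.0 is A1
mu_2 = triangular(8.0, 0.0, 5.0, 10.0)   # degree to which I2 = 8.0 is A2
firing_strength = fuzzy_and(mu_1, mu_2)
```

Here `firing_strength` is the degree to which the rule's antecedent holds; the Inference Engine would use it to weight the rule's output set.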

The last component converts the fuzzy output set of the linguistic variables into a crisp value. The most popular defuzzification method is the centroid technique, described as follows:

\[ \text{Center of Gravity of } B\ (CoG_B) = \frac{\int_{-\infty}^{+\infty} \mu_B(z)\, z \, dz}{\int_{-\infty}^{+\infty} \mu_B(z)\, dz}, \tag{2.3} \]

where $\mu_B(z)$ is the output membership function of the linguistic variable $B$.
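In practice the integrals in (2.3) are usually approximated numerically over the bounded support of the output set. The following sketch uses the midpoint rule; the function name `centroid`, the integration bounds, and the example membership function are assumptions chosen for illustration.

```python
def centroid(mu, z_min, z_max, steps=1000):
    """Approximate the center of gravity (2.3) of mu on [z_min, z_max].

    mu is a callable returning the membership degree at z. The interval is
    assumed to contain the whole support of the output set.
    """
    dz = (z_max - z_min) / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        z = z_min + (i + 0.5) * dz   # midpoint of the i-th sub-interval
        m = mu(z)
        numerator += m * z * dz
        denominator += m * dz
    if denominator == 0.0:
        raise ValueError("membership function is zero on the whole interval")
    return numerator / denominator

# Example output set (hypothetical): membership decreasing linearly from 1 to 0 on [0, 1].
crisp_value = centroid(lambda z: 1.0 - z, 0.0, 1.0)
```

For this example the exact centroid is 1/3, since the numerator integrates to 1/6 and the denominator to 1/2, and the numerical result agrees with it closely.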


Chapter 3

Literature Review

3.1 Related Work

First, this thesis reviews the existing works related to periodic charging in WRSNs. In [9], the authors leverage PSO and GA to propose a charging path determination algorithm that minimizes the docking time during which the MC recharges itself at the depot. The work in [1] jointly considers charging path planning and depot positioning to minimize the number of MCs while ensuring that no sensor runs out of energy before being recharged. The work in [10] determines a charging path to maximize the MC's accumulative charging utility gain or minimize the MC's energy consumption during traveling. The authors then propose approximation algorithms with constant ratios for the maximization and minimization problems. Arguing that a single MC cannot fulfill all sensors' demands in dense networks, W. Xu et al. in [11] introduce a multi-charger approximation model to increase the charging speed. In [12], C. Lin et al. derive a new energy transfer model with distance and angle factors. They also consider the problem of minimizing the total charging delay for all nodes, using linear programming to obtain the optimal solution. As the charging schedule is always fixed, the periodic scheme fails to adapt to the dynamics of the sensors' energy consumption.

Regarding on-demand charging, the authors in [17] address the node failure problem. They first propose to choose the next charging node based on the charging probability. Second, they introduce a charging node selection method to minimize the number of other requesting nodes suffering from energy depletion. In [2, 14], aiming to maximize the charging throughput, the authors propose a double warning threshold charging scheme. Two dynamic warning thresholds are triggered depending on the residual energy of the sensors. The authors in [18] study how to optimize the serving order of the charging requests waiting in the queue using the gravitational search algorithm. In [16], X. Cao et al. introduce a new metric (i.e., charging reward), which quantifies the quality of the charging scheme. The authors then address the problem of maximizing the total reward in each charging tour under the constraints of the MC's energy and the sensors' charging time windows. They use a deep reinforcement learning-based on-demand charging algorithm to solve the addressed problem.

The existing charging algorithms have two serious problems. First, the charging time problem has not been thoroughly considered. Most of the charging schemes leverage either the full charging approach [1, 2, 9, 10, 13, 14, 17] or the partial charging one [21]. I want to emphasize that the charging time is an essential factor that decides how much the charging algorithm can prolong the network lifetime. Second, no existing work considers the target coverage and connectivity constraints concurrently. Most previous works treat all sensors in WRSNs evenly; hence, the MC may charge unnecessary sensors while necessary ones run out of energy. Unlike them, this work addresses the target coverage and connectivity constraints in charging schedule optimization. This thesis uniquely considers the optimization of charging time and charging location simultaneously. I use Fuzzy logic and Q-learning in my proposal.

Fuzzy logic has been applied in many fields such as signal processing [22, 23], robotics [24], and embedded controllers [25]. In WSNs, Fuzzy logic is a promising technique for dealing with various problems, including localization, routing [26, 27], clustering [19], and data aggregation [28, 29]. R. M. Al-Kiyumi et al. in [26] propose a Fuzzy logic-based routing scheme for lifetime enhancement in WSNs, which maps network status into corresponding cost values to calculate the shortest path. In [20], the authors also leverage Fuzzy logic and Q-learning, but in a cooperative multi-agent system for controlling the energy of a microgrid. In [30], Fuzzy logic and Q-learning are combined to address the problem of thermal unit commitment. Specifically, each input state vector is mapped with the Fuzzy rules to determine all the possible actions with corresponding Q-values. Recently, the authors in [15] use Fuzzy logic in an algorithm for adaptively determining the charging threshold and deciding the charging schedule. Different from the others, I use Fuzzy logic and Q-learning in my unique Fuzzy Q-charging proposal. An earlier version of this work was published in [31], which considers only Q-charging.

3.2 Problem definition

We call a sensor a target-covering sensor if it covers at least one target. Moreover, if there exists an alive routing path between a sensor and the base station, the sensor is connected to the base station. A target is said to be monitored when at least one sensor connected to the base station covers it.

Figure 3.1: Network model

A sensor node whose remaining energy falls below $E_{th}$ (i.e., a predefined threshold) sends a charging request to the MC. We target a non-preemptive charging schedule, in which charging requests from sensors are queued at the MC. We assume that there are $k$ charging locations, denoted by $D_1, \ldots, D_k$, in the network. When the MC completes its tasks at a charging location, it runs the proposed algorithm to select the next optimal charging location from $D_1, \ldots, D_k$. Moreover, the MC also determines the optimal charging time at that charging location. When the energy of the MC goes below a threshold, it returns to the depot to recharge itself. Besides gathering the sensing information, the base station is also responsible for collecting information about the remaining energy of the sensors. Based on that, the MC estimates every sensor's energy consumption rate using the weighted averaging method. Given the locations of all sensors and targets, the on-demand charging algorithm aims to maximize the number of monitored targets.
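The notion of a monitored target can be made concrete with a short sketch. It assumes disc models for communication and sensing (two nodes can talk if they are within `comm_range`, and a sensor covers a target within `sens_range`); these ranges, the 2-D coordinates, and the function names are hypothetical and not taken from the thesis.

```python
import math
from collections import deque

def connected_sensors(sensors, alive, base, comm_range):
    """Indices of alive sensors that reach the base station over alive hops."""
    reached = set()
    queue = deque()
    # Seed the search with alive sensors in direct range of the base station.
    for i, pos in enumerate(sensors):
        if alive[i] and math.dist(pos, base) <= comm_range:
            reached.add(i)
            queue.append(i)
    # Breadth-first search over alive neighbours.
    while queue:
        u = queue.popleft()
        for v, pos in enumerate(sensors):
            if alive[v] and v not in reached and math.dist(sensors[u], pos) <= comm_range:
                reached.add(v)
                queue.append(v)
    return reached

def monitored_targets(targets, sensors, alive, base, comm_range, sens_range):
    """Indices of targets covered by at least one connected sensor."""
    connected = connected_sensors(sensors, alive, base, comm_range)
    return [t for t, pos in enumerate(targets)
            if any(math.dist(pos, sensors[s]) <= sens_range for s in connected)]
```

With this check, the death of a single relay sensor can leave a target non-monitored even though the sensor that covers it is still alive, which is why treating all sensors evenly can be inefficient.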


Chapter 4

Fuzzy Q-charging algorithm

4.1 Overview

• The MC leverages Fuzzy logic to calculate a so-called safe energy level, which is sufficiently higher than $E_{th}$. The MC then uses the algorithm described in Section 4.3 to determine the charging time at each charging location. The charging time is optimized to maximize the number of sensors that reach the safe energy level.

• The MC calculates the reward of every charging location using (4.9) and updates the Q-table using equation (4.1).

After finishing charging at a charging location, the MC selects the next charging location as the one with the highest Q-value. Finally, the MC moves to the next charging location and charges for the determined charging time. When the energy of the MC goes below a threshold, it returns to the depot to recharge itself. Figure 4.1 presents the overview of our charging algorithm.

4.2 State space, action space and Q table

In our Q-learning-based model, the network is considered the environment while the MC is the agent. A state is defined by the current charging location of the MC, and an action is a move to the next charging location. Each MC maintains its own Q-table, which is a two-dimensional array. Each row represents a state, and each column represents an action. An item $Q(D_j, D_i)$ in the $j$-th row and $i$-th column represents the Q-value of the action in which the MC moves from the current charging location $D_j$ to the next charging location $D_i$. Figure 4.2 shows an illustration of our Q-table. In the figure, the gray row represents the Q-values of all possible actions when the MC stays at the charging location $D_c$. The green cell depicts the maximum Q-value regarding the next charging location.

Figure 4.1: The flow of Fuzzy Q-learning-based charging algorithm

Figure 4.2: Illustration of the Q-table

Let $D_c$ be the current charging location and $D_i$ be an arbitrary charging location; then the Q-value of the action of moving from $D_c$ to $D_i$ is iteratively updated using the Bellman equation as follows:

\[ Q(D_c, D_i) \leftarrow Q(D_c, D_i) + \alpha \left( r(D_i) + \gamma \max_{1 \le j \le k} Q(D_i, D_j) - Q(D_c, D_i) \right). \tag{4.1} \]

The right side of the equation consists of two elements: the current Q-value and the temporal difference. The temporal difference measures the gap between the estimated target, i.e., $r(D_i) + \gamma \max_{1 \le j \le k} Q(D_i, D_j)$, and the old Q-value, i.e., $Q(D_c, D_i)$. $\alpha$ and $\gamma$ are two hyper-parameters called the learning rate and the discount factor, respectively. $r(D_i)$ is our proposed reward function, which will be detailed in Section 4.5.
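A minimal sketch of how an MC might refresh the row of its Q-table for the current location and then pick the next location is given below. It assumes that the maximum in the update ranges over all moves out of the candidate location $D_i$, as in the standard rule (2.1), and that the rewards $r(D_i)$ are already computed; the function names and the numbers in the example are hypothetical.

```python
def update_q_row(q, current, rewards, alpha, gamma):
    """Update Q(D_c, D_i) for every candidate next location D_i.

    q is a k x k list of lists (rows: current location, columns: next location)
    and rewards[i] stands for r(D_i). The best follow-up values are read
    before any entry is changed, so the result does not depend on loop order.
    """
    best_next = [max(row) for row in q]
    for i in range(len(q)):
        target = rewards[i] + gamma * best_next[i]        # estimated target
        q[current][i] += alpha * (target - q[current][i])  # temporal difference step

def next_location(q, current):
    """Index of the charging location with the highest Q-value in the current row."""
    return max(range(len(q)), key=lambda i: q[current][i])

# Example with k = 2 charging locations and made-up rewards.
q_table = [[0.0, 0.0], [0.0, 0.0]]
update_q_row(q_table, current=0, rewards=[1.0, 2.0], alpha=0.5, gamma=0.9)
chosen = next_location(q_table, current=0)
```

Whether the MC may select its current location again is not specified in the text above, so the sketch leaves all $k$ columns eligible.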

In the following, we first describe our algorithms to determine the optimal charging time and the safe energy level in Sections 4.3 and 4.4. Then, we present the details of the reward function and the mechanism for updating the Q-table in Sections 4.5 and 4.6.

4.3 Charging time determination

This work aims to design a charging strategy so that the number of sensors reaching a safe energy level is as large as possible after each charging round. Here, the safe energy level means an energy amount that is sufficiently greater than $E_{th}$. We define the safe energy level, $E_{sf}$, as

\[ E_{sf} = E_{th} + \theta E_{max}, \tag{4.2} \]

where $E_{max}$ is the maximum energy capacity of the sensors and $\theta$ is an adaptive parameter that is determined by Fuzzy logic. The algorithm determining $\theta$ will be described in Section 4.4.

A sensor node has the critical status if its remaining energy is smaller than $E_{sf}$. A sensor with critical status is called a critical sensor; otherwise, it is called a normal sensor. For each charging location $D_i$ ($1 \le i \le k$), we want to determine the optimal charging time $T_i$ that minimizes the number of critical sensors.

We adopt the multi-node charging model, in which the MC can simultaneously charge all sensors. According to [32], the energy per second that a sensor $S_j$ receives when the MC stays at $D_i$ is given by

\[ p_{ij} = \frac{\lambda}{(d(S_j, D_i) + \beta)^2}, \tag{4.3} \]

where $\lambda$ and $\beta$ are known constants decided by the hardware of the charger and the receiver, and $d(S_j, D_i)$ is the Euclidean distance between $S_j$ and $D_i$. We denote by $e_j$ the energy consumption rate of $S_j$, which is estimated by the MC. Suppose that the MC charges $S_j$ at $D_i$, and denote the remaining energy of $S_j$ when the charging process starts and finishes by $E_j$ and $E'_j$, respectively; then $E'_j = E_j + (p_{ij} - e_j) \times T_i$. At the charging location $D_i$, we call $p_{ij} - e_j$ the energy gain of $S_j$. The remaining energy of $S_j$ increases if its energy gain is positive and decreases otherwise. Note that the energy of $S_j$ equals the safe energy level if the charging time equals $\frac{E_{sf} - E_j}{p_{ij} - e_j}$, which is called the safe charging time of $S_j$ with respect to the charging location $D_i$ and is denoted by $\Delta_j^i$.

The sensors can be classified into four groups. The first and second groups contain normal sensors with positive energy gain and critical sensors with negative energy gain, respectively. The third and fourth groups contain normal sensors with negative energy gain and critical sensors with positive energy gain, respectively. Obviously, the sensors in the first and second groups do not change their status no matter how long the MC charges at $D_i$. In contrast, a sensor $S_j$ in the third group will fall into the critical status, and a sensor in the fourth group can escape the critical status, if the
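The quantities introduced in this section can be tied together in a short sketch. It assumes the inverse-square form $p_{ij} = \lambda / (d(S_j, D_i) + \beta)^2$ for the charging model, computes the safe energy level $E_{sf} = E_{th} + \theta E_{max}$ for a given $\theta$, and derives the safe charging time by solving $E_j + (p_{ij} - e_j) T = E_{sf}$ for $T$. All function names and numbers are hypothetical, and a zero energy gain is treated as non-positive because the text does not cover that case.

```python
import math

def charging_power(distance, lam, beta):
    """Energy per second received at the given distance (assumed inverse-square model)."""
    return lam / (distance + beta) ** 2

def safe_energy_level(e_th, theta, e_max):
    """Safe energy level E_sf = E_th + theta * E_max."""
    return e_th + theta * e_max

def safe_charging_time(e_j, e_sf, p_ij, e_rate):
    """Charging time at which the sensor's energy equals E_sf.

    A negative value means the sensor never crosses E_sf while the MC stays.
    """
    gain = p_ij - e_rate
    if gain == 0:
        return math.inf  # energy stays constant, so E_sf is never crossed
    return (e_sf - e_j) / gain

def sensor_group(e_j, e_sf, p_ij, e_rate):
    """Group index (1 to 4) following the classification in the text."""
    critical = e_j < e_sf
    positive_gain = p_ij - e_rate > 0
    if not critical:
        return 1 if positive_gain else 3
    return 4 if positive_gain else 2

# Example: a critical sensor 1 m from the charging location.
p = charging_power(1.0, lam=36.0, beta=2.0)          # 36 / (1 + 2)^2
e_sf = safe_energy_level(e_th=10.0, theta=0.2, e_max=100.0)
t_safe = safe_charging_time(e_j=20.0, e_sf=e_sf, p_ij=p, e_rate=2.0)
```

In this example the sensor gains 2 units of energy per second, so it needs 5 seconds to climb from 20 to the safe level of 30, and it belongs to the fourth group.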

Date uploaded: 10/10/2022, 07:42