Opportunistic Spectrum Access for Energy-Constrained Cognitive Radios

Anh Tuan Hoang, Ying-Chang Liang, David Tung Chong Wong, Yonghong Zeng, and Rui Zhang

Institute for Infocomm Research, Singapore

Abstract

This paper considers a scenario in which a secondary user makes opportunistic use of a channel allocated to some primary network. The primary network operates in a time-slotted manner and switches between idle and active states according to a stationary Markovian process. At the beginning of each time slot, the secondary user can choose to stay idle or to carry out spectrum sensing to detect whether the primary network is idle or active. If the primary network is detected as idle, the secondary user can carry out data transmission. Spectrum sensing consumes time and energy and introduces false alarms and mis-detections. Given the delay cost associated with staying idle, the energy costs associated with spectrum sensing and data transmission, and the throughput gain associated with successful transmissions, the objective is to decide, for each time slot, whether the secondary user should stay idle or carry out sensing, and if so, for how long, to maximize the expected net reward. We formulate this problem as a partially observable Markov decision process (POMDP) and prove several structural properties of the optimal spectrum sensing/accessing policies. Based on these properties, heuristic control policies with low complexity and good performance are proposed.

I. INTRODUCTION

The traditional approach of fixed radio spectrum allocation leads to under-utilization. It has been reported in recent studies by the US Federal Communications Commission (FCC) that there are vast temporal and spatial variations in the usage of allocated spectrum [1]. This motivates the concept of opportunistic spectrum access (OSA), which allows secondary cognitive radio (CR) systems to opportunistically exploit the under-utilized spectrum.

One of the core components of an OSA system is the spectrum-sensing module, which examines a spectrum of interest to determine whether the primary network that owns the spectrum is currently active or idle. Spectrum sensing, therefore, is a binary hypothesis test or a series of binary hypothesis tests. Spectrum sensing can have several effects on the OSA system, which are highlighted as follows.

• Energy consumption: carrying out spectrum sampling and subsequent signal processing consumes energy. In general, the longer the sensing time, the more energy is consumed.

• Time consumption: to avoid self-interference with the sensing process, the OSA system may need to suspend its data communications when carrying out spectrum sensing. This means spectrum sensing consumes communication time.

• False alarms: a false alarm occurs when the spectrum sensing module mistakes an idle primary network as active. This leads to a spectrum access opportunity being overlooked.

• Mis-detections: a mis-detection occurs when the spectrum sensing module mistakes an active primary network as idle. This leads to possible collision between secondary and primary transmissions.

In this paper, we take the above effects into account when designing an energy-constrained OSA system.

A. General Control Problem

We consider a scenario in which a secondary user makes opportunistic use of a channel allocated to some primary network. The primary network operates in a time-slotted manner and switches between idle and active states according to a stationary Markovian process. Within each time slot, the state of the primary network remains unchanged. At the beginning of each time slot, the instantaneous state of the primary network is not directly observable and the secondary user needs to decide whether to stay idle or to carry out spectrum sensing. If the secondary user chooses to carry out spectrum sensing, it needs to decide the duration of the sensing period and to configure related parameters to meet a minimum detection probability. Subsequently, if spectrum sensing indicates that the primary network is idle, the secondary user proceeds to transmit data during the rest of the time slot.

There are important trade-offs when the secondary user makes the above control decisions. By staying idle in a particular time slot, the secondary user conserves energy, but at the same time suffers an increase in delay and a reduction in throughput. By carrying out spectrum sensing, the secondary user consumes time and energy to acquire knowledge of the state of the primary network, and stands a chance to transmit data if the primary network is idle. Furthermore, there are trade-offs involving energy consumption, sensing accuracy, and transmission time when the duration of sensing periods is varied. When the required probability of detection is fixed, increasing the sensing time can reduce the probability of false alarms and therefore increase the probability of transmission for the secondary user. However, increasing the sensing time also reduces the time available for transmission.

For the secondary user, given the delay cost associated with staying idle in a time slot, the energy costs associated with spectrum sensing and data transmission, and the throughput gain associated with a successful transmission, we consider the problem of finding an optimal policy which decides the idle and sensing modes, together with the spectrum sensing time, to maximize the expected net reward. Here, the reward is defined as a function of delay and energy costs and throughput gain.

B. Contributions

The main contributions of this paper are as follows.

• We formulate the control problem that captures important throughput and delay/energy trade-offs for OSA systems. The problem involves important decisions for the secondary user, i.e., to stay idle or to carry out spectrum sensing, and to determine the optimal duration of each sensing period.

• We analyze the problem using the framework of partially observable Markov decision processes (POMDP) and prove important structural properties of the optimal control policies that maximize the expected net reward.

• Based on the theoretical characterization of the optimal policies, we propose heuristic control policies that can be obtained at lower complexity while achieving good performance. One of these policies is based on grid approximation of POMDP solutions.

• Finally, we obtain numerical results to support our theoretical analysis.

C. Related Work

There has been a series of recent works on optimizing spectrum-sensing activities in OSA systems [2]–[6]. These works can roughly be classified into two groups, i.e., those that focus on the control within each time slot, when the status of a primary network is more or less static [2]–[4], and those that focus on the time dynamics of the control problem [5], [6].

In [2]–[4], the sensing duration within each time slot can be varied, i.e., the spectrum-sensing module can operate at different ROC curves. The objective then is to trade off between sensing accuracy and time available for communications. Assuming the mis-detection probability is fixed, the longer the sensing duration, the lower the false alarm probability. However, the longer the sensing duration, the less time is available for communications. As the focus is on control within each time slot, the dynamics of primary networks are not taken into account in [2]–[4].

In [5], Zhao et al. consider a spectrum access scenario similar to ours, where primary networks switch between idle and active states in a Markovian manner and a secondary user carries out spectrum sensing prior to opportunistic data transmission. An important result of [5] is the separation principle, which decouples the design of the sensing policy from that of the spectrum sensor and the access policy. Unlike in our model, in [5], the energy cost of sensing is not of concern and the secondary user carries out sensing in every time slot. Furthermore, in [5], determining the sensing duration is not part of the control problem; rather, it is assumed that a fixed Receiver Operating Characteristic (ROC) that defines the relationship between false-alarm and mis-detection probabilities is given. In our model, varying the spectrum-sensing duration results in different ROC curves. The work in [6] does take into account the energy and power consumption when scheduling spectrum sensing. However, in [6], spectrum sensing is assumed perfect, i.e., there are no false alarms and mis-detections.

To some extent, this paper bridges the gap between the two classes of problems considered in [2]–[4] and in [5], [6]. In particular, our control problem recognizes the energy costs of sensing and transmission, allows the variation of spectrum-sensing duration within each time slot, and incorporates the dynamics of the primary network over time. All these factors are taken into account for the final objective of maximizing the expected long-term net reward received by the secondary user.

It is also interesting to note that some important results in this paper bear significant similarities to those in [7] and [8], even though the control scenarios are totally different. In [7], [8], the problem of scheduling packet transmission over time-varying channels with memory is considered, with the objective of balancing throughput and energy consumption. The state of the channel is not directly observed and the authors also formulate and analyze the problem using a POMDP framework.

D. Paper Organization

The rest of this paper is organized as follows. In Section II, we describe the system model and the control problem. Important properties of the optimal control policies are proved and discussed in Section III. In Section IV, heuristic policies are proposed. Numerical results and discussion are presented in Section V. Finally, we conclude the paper and highlight future directions in Section VI.

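II. SYSTEM MODEL

The primary network operates in time slots of duration T and switches between idle and active states according to a stationary Markov chain with state-transition matrix

$$M = \begin{bmatrix} 1-b & b \\ g & 1-g \end{bmatrix}, \tag{1}$$

where b is the probability that the primary network becomes active in the next time slot, given that it is idle in the current slot, and g is the probability that the primary network becomes idle in the next time slot, given that it is active in the current time slot. The stationary probabilities of being idle and active for the primary network are πi = g/(b + g) and πa = b/(b + g), respectively.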

From the secondary user's point of view, when the primary network is idle, the user has a 'good' channel to exploit. On the other hand, an active primary network results in a 'bad' channel for the secondary user. This leads to an interesting observation that M can be regarded as the state-transition matrix for a virtual Gilbert-Elliot (GE) channel of the secondary user. In [9], for a GE channel with state-transition matrix M, the channel memory is defined as µ = 1 − b − g. When µ > 0, it is said that the channel has positive memory, i.e., the probability of remaining in a particular state is greater than or equal to the stationary probability of that state. In this paper, we also assume that µ = 1 − b − g > 0.
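The stationary probabilities and the channel memory above can be checked numerically. The following is a minimal sketch, not part of the original paper; the function names and the simulation helper are hypothetical and only illustrate the two-state chain with parameters b and g.

```python
import random


def stationary_probabilities(b, g):
    """Return (pi_idle, pi_active) for the two-state idle/active chain."""
    if not (0.0 < b < 1.0 and 0.0 < g < 1.0):
        raise ValueError("b and g must lie strictly between 0 and 1")
    return g / (b + g), b / (b + g)


def channel_memory(b, g):
    """Channel memory mu = 1 - b - g; positive memory means mu > 0."""
    return 1.0 - b - g


def simulate_idle_fraction(b, g, num_slots, seed=0):
    """Empirical fraction of idle slots over a simulated sample path."""
    rng = random.Random(seed)
    idle = True
    idle_count = 0
    for _ in range(num_slots):
        idle_count += idle
        if idle:
            idle = rng.random() >= b   # stays idle with probability 1 - b
        else:
            idle = rng.random() < g    # becomes idle with probability g
    return idle_count / num_slots
```

For example, with b = 0.2 and g = 0.1 the chain has memory 0.7 and is idle about one third of the time, which a long simulated path should approximately reproduce.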

B. Opportunistic Spectrum Access

A secondary user opportunistically accesses the channel when the primary network is idle by first synchronizing with the slot structure of the primary network and then carrying out the following mechanism (illustrated in Fig. 1).

1) Spectrum Sensing: If the secondary user wishes to transmit in a particular slot, it will first spend a time duration τ at the beginning of the slot to carry out spectrum sensing. This basically involves sampling the channel and carrying out a binary hypothesis test:

H0: the primary network is idle,
versus H1: the primary network is active.    (2)

Let θ denote the outcome of the above binary hypothesis test, where θ = 0 means H0 is detected and θ = 1 otherwise. Associated with the spectrum sensing activity are the probability of false alarm, i.e., mistaking H0 for H1, and the probability of mis-detection, i.e., mistaking H1 for H0. In this paper, we assume that the secondary network must carry out spectrum sensing to meet a fixed probability of detection Pd. Then, the probability of false alarm is a function of the sensing time τ and is denoted by Pfa(τ). The sensing duration τ must be within the interval [τmin, τmax], where 0 < τmin ≤ τmax < T. It is assumed that for the given range of τ, 0 < Pfa(τ) < Pd < 1. This is in fact a reasonable assumption, as in practical cognitive radio systems [10] we normally have Pd > 90% and Pfa < 10%. Furthermore, we assume that Pfa(τ) is continuous, differentiable, and decreasing in τ ∈ [τmin, τmax].
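The paper only requires Pfa(τ) to be continuous, differentiable, and decreasing. As one illustration of a curve with these properties, the sketch below uses a Gaussian approximation for an energy detector that is common in the sensing-throughput trade-off literature. This specific model, the SNR and sampling-rate parameters, and all function names are assumptions made for the example, not necessarily the model used by the authors.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sigma 1)


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x)."""
    return 1.0 - _STD_NORMAL.cdf(x)


def q_inverse(prob):
    """Inverse of the Q-function for 0 < prob < 1."""
    return _STD_NORMAL.inv_cdf(1.0 - prob)


def false_alarm_probability(tau, pd, snr, sampling_rate):
    """Assumed energy-detector model for Pfa(tau) at a fixed target Pd.

    tau           -- sensing time in seconds
    pd            -- target detection probability
    snr           -- primary-signal SNR at the sensor (linear scale)
    sampling_rate -- samples per second (hypothetical parameter)
    """
    num_samples = tau * sampling_rate
    return q_function(sqrt(2.0 * snr + 1.0) * q_inverse(pd)
                      + sqrt(num_samples) * snr)
```

Under this assumed model, a longer sensing time gives more samples and therefore a smaller false-alarm probability for the same Pd, which is the qualitative behaviour the analysis relies on.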

2) Data Transmission: If the spectrum sensing results in θ = 0, the secondary user proceeds to transmit data in the rest of the time slot. Otherwise, if θ = 1, the secondary user must stay quiet and wait until the next time slot to try again.

3) Acknowledgment: Even though the spectrum sensing outcome indicates θ = 0, this can be due to a mis-detection. Mis-detections result in collisions between primary and secondary transmissions. In this paper, we assume that if a collision happens due to mis-detection, a negative acknowledgment (NAK) is returned. On the other hand, if the secondary transmission is carried out when the primary network is actually idle, a positive acknowledgment (ACK) is returned.
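The sense/transmit/acknowledge mechanism for a slot in which sensing is carried out can be sketched as a small simulation. This is an illustrative sketch only; the function name and the outcome labels are hypothetical, and Pfa is passed in as a number because its dependence on τ is model-specific.

```python
import random


def simulate_slot(primary_idle, pfa, pd, rng):
    """Simulate one sensed slot and return 'BUSY', 'ACK', or 'NAK'.

    primary_idle -- True if the primary network is idle in this slot
    pfa          -- false-alarm probability for the chosen sensing time
    pd           -- detection probability
    rng          -- a random.Random instance
    """
    if primary_idle:
        detected_active = rng.random() < pfa   # false alarm
    else:
        detected_active = rng.random() < pd    # correct detection
    if detected_active:
        return "BUSY"                          # theta = 1: stay quiet
    # theta = 0: transmit; the outcome depends on the true primary state
    return "ACK" if primary_idle else "NAK"
```

Over many slots in which the primary network is idle with probability p, the empirical frequencies of the three outcomes should approach pPfa + (1 − p)Pd, p(1 − Pfa), and (1 − p)(1 − Pd), respectively.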

C. POMDP Formulation

At the beginning of each time slot, the secondary user decides whether or not to carry out spectrum sensing, and if so, for how long. As the instantaneous state of the primary network is not directly observed, our control problem can be classified as a discrete-time POMDP with the following components.

1) Belief State: In a discrete-time POMDP, the decision maker selects an action and receives some reward, together with some observation which reveals information about the actual system state. It is well known [11] that for each POMDP, all information that is useful for making decisions can be encapsulated in the posterior distribution vector of the system states. In our control problem, at the beginning of each time slot, based on previous actions and observations, the secondary user can calculate the probability that the primary network is idle in the time slot. We denote this probability by p and name it the 'belief state'. After each time slot, depending on the action taken by the secondary user and the corresponding outcome, the belief state p can be updated according to one of the following four cases.

Case 1: The secondary user stays idle and does not carry out spectrum sensing. Then, the next belief state, i.e., the probability that the primary network is idle in the next time slot, can be derived as:

L1(p) = p(1 − b) + (1 − p)g = p(1 − b − g) + g (3)

Case 2: The secondary user senses the channel for the duration τ, obtains the outcome θ = 1, i.e., the primary network is detected as active, and therefore needs to keep quiet in the rest of the time slot. Using Bayes' rule, the belief state in the next time slot can be derived as:

$$L_2(p, \tau) = \frac{p P_{fa}(\tau)(1-b) + (1-p) P_d\, g}{p P_{fa}(\tau) + (1-p) P_d} \tag{4}$$

Case 3: The secondary user senses the channel for the duration τ, obtains θ = 0, i.e., the primary network is detected as idle, carries out transmission, and subsequently receives an ACK at the end of the slot. The ACK implies that the primary network is actually idle during the current time slot and therefore, the belief state in the next time slot is L3 = 1 − b.

Case 4: All the same as Case 3, except that a NAK is received at the end of the slot. This implies that a mis-detection happened during spectrum sensing and the primary network is actually active during the current time slot; therefore, the belief state in the next time slot is L4 = g.

As can be noted, L3 and L4 do not depend on p and τ. However, in the rest of this paper, in order to simplify the notation, we use L1(p, τ), L3(p, τ), L4(p, τ) interchangeably with L1(p), L3, L4 defined above. Also, letting Qi(p, τ), i = 2, 3, 4, denote the probabilities that Case i above happens, we have:

Q2(p, τ) = pPfa(τ) + (1 − p)Pd, Q3(p, τ) = p(1 − Pfa(τ)), and Q4(p, τ) = (1 − p)(1 − Pd) (5)
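The four belief updates and the outcome probabilities in (5) translate directly into code. The sketch below is illustrative only; the function names are hypothetical, and Pfa(τ) is passed in as the number pfa so that no particular sensing model is assumed.

```python
def belief_after_idle(p, b, g):
    """Case 1, L1(p): next belief when the user stays idle."""
    return p * (1.0 - b - g) + g


def belief_after_busy(p, pfa, pd, b, g):
    """Case 2, L2(p, tau): next belief after sensing outcome theta = 1."""
    numerator = p * pfa * (1.0 - b) + (1.0 - p) * pd * g
    denominator = p * pfa + (1.0 - p) * pd
    return numerator / denominator


def belief_after_ack(b):
    """Case 3, L3: the primary network was idle in the current slot."""
    return 1.0 - b


def belief_after_nak(g):
    """Case 4, L4: the primary network was active in the current slot."""
    return g


def outcome_probabilities(p, pfa, pd):
    """Return (Q2, Q3, Q4), the probabilities of Cases 2, 3 and 4."""
    q2 = p * pfa + (1.0 - p) * pd
    q3 = p * (1.0 - pfa)
    q4 = (1.0 - p) * (1.0 - pd)
    return q2, q3, q4
```

A useful sanity check is that Q2 + Q3 + Q4 = 1 and that averaging the next belief over the three sensing outcomes gives back L1(p): sensing changes what the user knows, not how the primary network evolves.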

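2) Properties of Li(p, τ): From (3) and the assumption that the state-transition matrix M has positive memory, i.e., 1 − b − g > 0, it follows that L1(p) is increasing in p. Also, it can be verified that:

$$\frac{\partial L_2(p, \tau)}{\partial p} = \frac{P_{fa}(\tau) P_d (1-b-g)}{\big(p P_{fa}(\tau) + (1-p) P_d\big)^2} \tag{6}$$

is positive and increasing in p. Therefore, L2(p) is convex and increasing in p. At the same time:

$$L_2(p) = \frac{(1-b)\big(p P_{fa}(\tau) + (1-p) P_d\big) - (1-p) P_d (1-b-g)}{p P_{fa}(\tau) + (1-p) P_d} \le 1-b. \tag{7}$$

Similarly, it can be shown that L2(p) ≥ g. So

$$g \le L_2(p) \le 1-b. \tag{8}$$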

3) Costs, Reward, and Control Objective: When carrying out spectrum sensing, the secondary user spends energy in channel sampling and signal processing. We assume that the energy cost of carrying out spectrum sensing for τ units of time is a continuous, non-negative, and increasing function of τ and is denoted by cs(τ). If the sensing outcome is θ = 0, the secondary user proceeds to transmit during the rest of the time slot. Let rt and ct respectively be the gain in throughput and the energy cost, both measured per unit of transmission time. It is reasonable to assume that rt > ct ≥ 0; otherwise, there is no justification for the secondary user to carry out transmission.

Note that the secondary user can also choose to stay idle, i.e., carry out neither spectrum sensing nor data transmission, during a time slot to conserve energy. However, doing so results in negative effects such as lower throughput and longer latency. We assume that, for the secondary user, the cost of staying idle during each time slot is ci, ci ≥ 0.

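In a particular time slot, if the probability of the primary network being idle is p and the secondary user carries out spectrum sensing for τ units of time, then the expected net gain can be calculated as:

$$G(p, \tau) = -c_s(\tau) + p\big(1 - P_{fa}(\tau)\big)(T - \tau)(r_t - c_t) - (1-p)(1 - P_d)(T - \tau)\, c_t. \tag{9}$$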

To simplify the notation, we also use τ = 0 to represent that the secondary user chooses to stay idle. We then have the following expected reward when the sensing decision is set to τ:

(10)

We are interested in the following problem.

Definition 1: Let pn denote the probability that the primary network is idle during time slot n. Select the sensing times τn, τn ∈ {0} ∪ [τmin, τmax], to maximize the following discounted reward function

where 0 < α < 1 is a discounting factor, and 1 ≤ N ≤ ∞ is the control horizon.

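III. STRUCTURE OF OPTIMAL POLICIES

A. Monotonicity and Convexity of Value Functions

When N < ∞, let VN(p) denote the maximum achievable discounted reward function in Definition 1. VN(p) satisfies the following Bellman equation [11]:

$$V_N(p) = \max\Big\{ -c_i + \alpha V_{N-1}\big(L_1(p)\big),\ \max_{\tau_{min} \le \tau \le \tau_{max}} \Big[ G(p, \tau) + \alpha \sum_{i=2,3,4} Q_i(p, \tau)\, V_{N-1}\big(L_i(p, \tau)\big) \Big] \Big\}, \quad N \ge 2, \tag{12}$$

$$V_1(p) = \max_{\tau \in [\tau_{min}, \tau_{max}]} \big\{ -c_i,\ G(p, \tau) \big\}. \tag{13}$$

When N = ∞, let V(p) denote the maximum achievable discounted reward function in Definition 1. V(p) satisfies the following Bellman equation [11]:

$$V(p) = \max\Big\{ -c_i + \alpha V\big(L_1(p)\big),\ \max_{\tau_{min} \le \tau \le \tau_{max}} \Big[ G(p, \tau) + \alpha \sum_{i=2,3,4} Q_i(p, \tau)\, V\big(L_i(p, \tau)\big) \Big] \Big\}. \tag{14}$$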

Note that in (14), G(p, τ) is the immediate gain obtained by sensing for duration τ, while the expected discounted future gain given this sensing duration is α Σ_{i=2,3,4} Qi(p, τ)V(Li(p, τ)). As can be seen, both immediate and future gains are dependent on τ.

It can be shown [11] that limn→∞ Vn(p) = V(p). It can also be verified that VN(p) and V(p) are continuous in p. Let us now prove some important structural results for VN(p) and V(p).

Proposition 1: VN(p) and V(p) are nondecreasing in p.

Proof: First, let us prove the property for VN(p), N < ∞. The proof proceeds by induction. As G(p, τ) is increasing in p, it follows from (13) that V1(p) is nondecreasing in p. Now assuming that Vn(p) is nondecreasing in p for some value of n ≥ 1, we have

are both nondecreasing in p, it follows that E1 is nondecreasing in p. The first term in E2, i.e., G(p, τ), is nondecreasing in p. For the second term in E2, letting 0 < q < p, we have:

is also increasing in p. So the second term in E2 is increasing in p, which implies that E2 is also increasing in p. As both E1 and E2 are nondecreasing in p, it follows that Vn+1(p) is nondecreasing in p.

As limn→∞ Vn(p) = V(p), it follows that V(p) is also nondecreasing in p.

Proposition 2: VN(p) and V(p) are convex in p.

Please refer to Appendix A for the proof.

Remark 1: Proposition 1 states intuitively that the higher the probability p that the primary network is idle at the beginning of the control process, the higher the maximum achievable expected rewards VN(p) and V(p). Proposition 2 then indicates how fast VN(p) and V(p) increase in p. As VN(p) and V(p) are convex, VN(p) and V(p) increase at least linearly in p.

B. Properties of Optimal Policies

Let us explore some useful structural properties of the optimal control policies. Letting

$$G^*(p) = \max_{\tau_{min} \le \tau \le \tau_{max}} G(p, \tau),$$

as G(p, τ) is increasing in p, so is G∗(p). We state the following property of the optimal control policies.

Proposition 3: Let p∗ be the minimum value of p such that G∗(p∗) > −ci. If, in a particular time slot, the probability of the primary network being idle is p such that p ≥ p∗, then an optimal policy must carry out spectrum sensing in that time slot.

Proof: Let us prove for the case when N < ∞. We have V1(p) = max_{τ∈[τmin,τmax]} {−ci, G(p, τ)} = max{−ci, G∗(p)}

the optimal control policies for our POMDP model may not possess the threshold-based structure (in the value of p). To the best of our knowledge, there have been a limited number of works that prove the threshold-based characteristic of the optimal control policies for some specific POMDP models (see [12]–[14]). Unfortunately, our problem does not directly fit into these models.

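A natural question to ask is, given that sensing is carried out, how the optimal sensing time τ would vary with p. Let

$$F_N(p, \tau) = \alpha \sum_{i=2,3,4} Q_i(p, \tau)\, V_{N-1}\big(L_i(p, \tau)\big), \qquad F(p, \tau) = \alpha \sum_{i=2,3,4} Q_i(p, \tau)\, V\big(L_i(p, \tau)\big)$$

denote the expected discounted future rewards in the Bellman equations for N < ∞ and N = ∞, respectively.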

Proposition 4: FN(p, τ) and F(p, τ) are nondecreasing in τ.

Please refer to Appendix B for the proof.

Remark 3: Essentially, Proposition 4 makes concrete the intuition that the more sensing is carried out in the current time slot, the better the expected reward in future time slots. This is because increasing the sensing time gives the secondary user more accurate knowledge of the state of the primary network, which in turn improves future control.

To further study the effect of varying the sensing time, we need to make the following assumptions.

• A1: The probability of false alarm, i.e., Pfa(τ), is convex and decreasing in τ, τmin ≤ τ ≤ τmax.

• A2: The energy cost of sensing, i.e., cs(τ), is convex and increasing in τ.

For the justifications of assumptions A1 and A2, please refer to Appendix E.

Lemma 1: Given assumptions A1 and A2, the function G(p, τ) is concave in τ, τmin ≤ τ ≤ τmax.

Please refer to Appendix C for the proof.

As the function G(p, τ) is concave, continuous, and has a strictly decreasing first-order derivative for all τ in the interval [τmin, τmax], there exists a unique maximum point for this function. Letting

$$\tau^*(p) = \arg\max_{\tau_{min} \le \tau \le \tau_{max}} G(p, \tau), \tag{23}$$

the following proposition relates the optimal sensing time to the value of τ∗(p) defined in (23).

Proposition 5: Given assumptions A1 and A2, if at the beginning of a particular time slot, the probability of the primary network being idle is p and sensing is carried out, then the optimal sensing time τopt is greater than or equal to τ∗(p).

Proof: We prove the case N = ∞; the case N < ∞ is similar. The proof is by contradiction. Suppose the optimal sensing time τopt < τ∗(p). Due to the concavity of G(p, τ) in τ, we have G(p, τopt) < G(p, τ∗(p)). Furthermore, from Proposition 4, we have F(p, τopt) ≤ F(p, τ∗(p)); therefore

$$G(p, \tau_{opt}) + F(p, \tau_{opt}) < G(p, \tau^*(p)) + F(p, \tau^*(p)), \tag{24}$$

which contradicts the fact that τopt is the optimal sensing time given p. This completes the proof.

Proposition 5 gives the lower bound for the optimal sensing time τopt. To see how this lower bound varies with p, consider the first-order derivative of G(p, τ) with respect to τ:

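$$\frac{\partial G}{\partial \tau} = p\Big[(r_t - c_t)\big(P_{fa}(\tau) - (T - \tau) P'_{fa}(\tau) - 1\big) - (1 - P_d)\, c_t\Big] - \big[c'_s(\tau) - (1 - P_d)\, c_t\big] = p\, u(\tau) - v(\tau), \tag{25}$$

where u(τ) = (rt − ct)(Pfa(τ) − (T − τ)P′fa(τ) − 1) − (1 − Pd)ct and v(τ) = c′s(τ) − (1 − Pd)ct.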

Lemma 2: Given assumptions A1 and A2, we have the following cases.

• Case 1: if v(τmin) ≥ 0, then τ∗(p) is nondecreasing in p.

• Case 2: if v(τmax) < 0, then τ∗(p) is nonincreasing in p.

• Case 3: v(τmin) < 0 ≤ v(τmax). Letting v(τv) = 0, τmin < τv ≤ τmax,

a) if u(τv) > 0, then τ∗(p) is nondecreasing in p.

b) if u(τv) ≤ 0, then τ∗(p) is nonincreasing in p.

Please refer to Appendix D for the proof.

Remark 4: Proposition 5 states that when the probability of the primary network being idle is p and sensing is carried out, the lower bound of the optimal sensing time is τ∗(p). Lemma 2 further shows that the lower bound τ∗(p) is always monotonic in p.
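Because G(p, τ) is concave in τ under A1 and A2, the lower bound τ∗(p) can be found with a simple one-dimensional search. The sketch below uses ternary search. Everything numerical in it is an assumption made for illustration: the form of the net gain (sensing cost, plus throughput gain net of transmission energy when the channel is idle and no false alarm occurs, minus transmission energy wasted on a collision), the exponential false-alarm curve, the linear sensing cost, and all parameter values.

```python
from math import exp

T = 1.0                  # slot duration (assumed units)
P_D = 0.9                # target detection probability (assumed)
R_T, C_T = 10.0, 1.0     # throughput gain / energy cost per unit tx time


def pfa(tau):
    """Hypothetical convex, decreasing false-alarm curve (satisfies A1)."""
    return 0.5 * exp(-20.0 * tau)


def sensing_cost(tau):
    """Hypothetical linear sensing energy cost (convex, increasing: A2)."""
    return 2.0 * tau


def net_gain(p, tau):
    """Assumed form of the expected net gain G(p, tau)."""
    tx_time = T - tau
    return (-sensing_cost(tau)
            + p * (1.0 - pfa(tau)) * tx_time * (R_T - C_T)
            - (1.0 - p) * (1.0 - P_D) * tx_time * C_T)


def argmax_concave(f, lo, hi, tol=1e-7):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1          # the maximizer lies to the right of m1
        else:
            hi = m2          # the maximizer lies to the left of m2
    return 0.5 * (lo + hi)


def tau_star(p, tau_min=0.01, tau_max=0.5):
    """Myopically optimal sensing time for belief p."""
    return argmax_concave(lambda tau: net_gain(p, tau), tau_min, tau_max)
```

With these assumed parameters the marginal sensing cost exceeds (1 − Pd)ct for every τ, so the computed τ∗(p) grows with p, in line with the monotonic behaviour that Lemma 2 describes.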

Remark 5: If the energy costs of sensing and transmission are negligible, then the gain of sensing for τ units of time when the probability of the primary network being idle is p can be simplified to:

$$\tilde{G}(p, \tau) = p\big(1 - P_{fa}(\tau)\big)(T - \tau)\, r_t \tag{26}$$

Then the value τ = τ̃ that maximizes G̃(p, τ) does not depend on p. Therefore, it can be verified that the policy that carries out sensing for τ̃ units of time in every time slot maximizes the expected reward. Our optimization problem is then equivalent to the problem considered in [2], [3], which focus on maximizing throughput within each time slot.
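As a quick check of why the maximizer in Remark 5 is independent of p, the first-order condition can be written out. This is a sketch under the added assumptions that Pfa is differentiable, that p > 0 and rt > 0, and that the maximizer lies in the interior of [τmin, τmax]:

```latex
\frac{\partial \tilde{G}}{\partial \tau}
  = p\, r_t \Bigl[ -P'_{fa}(\tau)\,(T - \tau) - \bigl(1 - P_{fa}(\tau)\bigr) \Bigr] = 0
\quad\Longrightarrow\quad
-P'_{fa}(\tilde{\tau})\,(T - \tilde{\tau}) = 1 - P_{fa}(\tilde{\tau}).
```

The factor p rt multiplies the whole bracket and cancels, so the condition that determines τ̃ involves only Pfa and T.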

IV. HEURISTIC POLICIES

Directly solving the POMDP described in Section III can be computationally challenging. In this section, suboptimal control policies that can be obtained at lower complexity are discussed.

A. Grid-based Approximation

Grid-based approximation is a widely used approach for approximating solutions to POMDPs. In this approach, the value function is approximated at a finite number of belief points on a grid. The value function at belief points not belonging to the grid is evaluated using interpolation. In this paper, we employ the fixed-resolution, regular-grid approach proposed by Lovejoy [15]. Applied to our POMDP, the range of the belief state is first divided into P equally spaced points p0, p1, ..., pP−1. Then, the value function at these grid points is calculated using the following iteration

As pointed out in [15], the iteration in (27) is guaranteed to converge and the value Ṽ(p) in (28) is an upper bound of the optimal value function V(p) (as V(p) is convex). After obtaining the approximation function Ṽ(p), we can substitute it into the Bellman equation (14) to obtain the corresponding sensing time.
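The grid-based procedure can be sketched as value iteration on a regular belief grid with linear interpolation between grid points. This is a hedged illustration rather than the paper's exact iteration: the finite set of candidate sensing times `taus`, the stopping rule, and all function and parameter names are assumptions, and the gain function G(p, τ) and the false-alarm curve are supplied by the caller.

```python
def interpolate(grid_values, p):
    """Linear interpolation on an equally spaced grid over [0, 1]."""
    n = len(grid_values) - 1
    x = min(max(p, 0.0), 1.0) * n
    k = min(int(x), n - 1)          # index of the left grid neighbour
    w = x - k                       # weight of the right grid neighbour
    return (1.0 - w) * grid_values[k] + w * grid_values[k + 1]


def grid_value_iteration(b, g, pd, pfa, gain, idle_cost, taus, alpha,
                         num_points=51, tol=1e-8, max_iter=10000):
    """Approximate the infinite-horizon value function on a belief grid.

    pfa(tau)     -- false-alarm probability for sensing time tau
    gain(p, tau) -- expected immediate net gain G(p, tau)
    taus         -- finite list of candidate sensing times (an assumption;
                    the paper allows any tau in [tau_min, tau_max])
    """
    grid = [k / (num_points - 1) for k in range(num_points)]
    values = [0.0] * num_points
    for _ in range(max_iter):
        new_values = []
        for p in grid:
            # Option 1: stay idle; the belief evolves to L1(p).
            best = -idle_cost + alpha * interpolate(values, p * (1 - b - g) + g)
            # Option 2: sense for tau, then average over the three outcomes.
            for tau in taus:
                f = pfa(tau)
                q2 = p * f + (1 - p) * pd
                q3 = p * (1 - f)
                q4 = (1 - p) * (1 - pd)
                l2 = (p * f * (1 - b) + (1 - p) * pd * g) / q2
                future = (q2 * interpolate(values, l2)
                          + q3 * interpolate(values, 1 - b)
                          + q4 * interpolate(values, g))
                best = max(best, gain(p, tau) + alpha * future)
            new_values.append(best)
        diff = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if diff < tol:
            break
    return grid, values
```

Once the grid values have converged, the sensing decision for an arbitrary belief p can be read off by evaluating the same two options with the interpolated values and keeping the better one.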

B. Myopic Policy ζm(·)

Proposition 3 identifies the sufficient condition on the probability that the primary network is idle for sensing to be carried out. Furthermore, Proposition 5 gives the lower bound on the optimal sensing time, given that sensing is carried out. Based on these two propositions, we consider the following policy:
