CHAPTER 3 FICTITIOUS PLAY FOR NETWORK SECURITY
3.3 Classical fictitious play with decision and observation errors
3.3.1 Classical fictitious play with observation errors
We present in this subsection some analytical results for the case where the error probabilities associated with the sensor systems are known to the players. We consider a discrete-time fictitious play process where, at each time step $k$, Player $P_i$ computes the empirical frequency of its opponent's actions and picks the best response to this empirical frequency, as described in Subsection 3.2.3.
Figure 3.2: Players observe their opponent's actions through binary channels with error probabilities $\alpha$, $\gamma$, $\epsilon$, and $\mu$. (P1, the Attacker, has actions $a_{11}$ (A) and $a_{21}$ (N); P2, the Defender, has actions $a_{12}$ (D) and $a_{22}$ (N).)
Proposition 3.3. Consider the discrete-time two-player fictitious play with imperfect observations given in Figure 3.2. Let $e_\alpha$, $e_\gamma$, $e_\epsilon$, and $e_\mu$ be the empirical error frequencies of the observations corresponding to the error probabilities $\alpha$, $\gamma$, $\epsilon$, and $\mu$. It holds that
$$\lim_{k\to\infty}\text{a.s. } e_\alpha = \alpha, \qquad \lim_{k\to\infty}\text{a.s. } e_\gamma = \gamma, \qquad \lim_{k\to\infty}\text{a.s. } e_\epsilon = \epsilon, \qquad \lim_{k\to\infty}\text{a.s. } e_\mu = \mu,$$
where we use $\lim$ a.s. to denote almost sure convergence.
Proof. Given that the binary channels are independent from stage to stage, this proposition can be proved using the strong law of large numbers.
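As a quick numerical illustration of Proposition 3.3 (not part of the original text), the following Python sketch simulates one binary channel with an assumed error probability and tracks the empirical error frequency, which settles near the true value as the number of stages grows, in line with the strong law of large numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1          # assumed error probability of the channel (illustrative value)
k = 10_000           # number of stages

# At each stage the channel flips the transmitted action with probability alpha.
flips = rng.random(k) < alpha

# Empirical error frequency after each stage.
e_alpha = np.cumsum(flips) / np.arange(1, k + 1)

print(f"e_alpha after {k} stages: {e_alpha[-1]:.4f} (true alpha = {alpha})")
```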
Proposition 3.4. Consider the discrete-time two-player fictitious play with imperfect observations given in Figure 3.2. Let $\tilde{p}_i(k)$ be the observed frequency and $p_i(k)$ the empirical frequency of Player $P_i$ at time step $k$. It holds that
$$\lim_{k\to\infty}\text{a.s. } \tilde{p}_i(k) = C_i\, p_i(k), \qquad i = 1, 2, \tag{3.15}$$
where $C_i$, $i = 1, 2$, are the channel matrices given by
$$C_1 = \begin{pmatrix} 1-\alpha & \gamma \\ \alpha & 1-\gamma \end{pmatrix}, \qquad C_2 = \begin{pmatrix} 1-\epsilon & \mu \\ \epsilon & 1-\mu \end{pmatrix}. \tag{3.16}$$
Proof. At stage $k$, let $\tilde{N}_1(k)$ and $\tilde{N}_2(k)$ respectively be the numbers of Player P2's action 1 and action 2 that Player P1 has observed. Also, let $N_1(k)$ and $N_2(k)$ be the actual numbers of action 1 and action 2 played by Player P2. Clearly, $\tilde{N}_1(k) + \tilde{N}_2(k) = N_1(k) + N_2(k) = k$. We have that
$$\begin{pmatrix} \tilde{N}_1(k) \\ \tilde{N}_2(k) \end{pmatrix} = \begin{pmatrix} 1 - e_\epsilon(k) & e_\mu(k) \\ e_\epsilon(k) & 1 - e_\mu(k) \end{pmatrix} \begin{pmatrix} N_1(k) \\ N_2(k) \end{pmatrix}. \tag{3.17}$$
Dividing both sides of (3.17) by $k$, letting $k \to \infty$, and using Proposition 3.3, we obtain the part of (3.15) for Player P2. The part for Player P1 can be proved similarly. For notational simplicity, from now on we suppress the time step and write $p_i$ and $\tilde{p}_i$.
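Relation (3.15) can also be checked by simulation. The sketch below (illustrative values for $\epsilon$, $\mu$, and P2's mixed strategy; variable names are ours) draws P2's actions from a fixed mixed strategy, passes them through the binary channel of Figure 3.2, and compares the observed frequency with $C_2 p_2$, as well as the corrected estimate $C_2^{-1}\tilde{p}_2$ with the true empirical frequency.

```python
import numpy as np

rng = np.random.default_rng(1)
eps, mu = 0.2, 0.15                       # assumed channel error probabilities
C2 = np.array([[1 - eps, mu],
               [eps, 1 - mu]])            # channel matrix as in (3.16)

p2 = np.array([0.6, 0.4])                 # illustrative mixed strategy of P2
k = 50_000

actions = rng.choice(2, size=k, p=p2)     # P2's actual actions (0 = action 1, 1 = action 2)
noise = rng.random(k)
# Flip action 1 with probability eps and action 2 with probability mu.
observed = np.where(actions == 0,
                    np.where(noise < eps, 1, 0),
                    np.where(noise < mu, 0, 1))

p2_emp = np.bincount(actions, minlength=2) / k    # empirical frequency
p2_obs = np.bincount(observed, minlength=2) / k   # observed frequency

print("observed frequency       :", p2_obs)
print("C2 @ empirical frequency :", C2 @ p2_emp)
print("C2^{-1} @ observed freq. :", np.linalg.solve(C2, p2_obs))
```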
Now suppose that the play order is P1-P2 and that P2 observes P1's actions (with errors). We state the following proposition.
Proposition 3.5. If the error probabilities are known to the players, then at each stage the best response based on the distribution over the information sets coincides with the best response based on the empirical frequency.
Proof. Table 3.1 gives the expected payoffs of P2 for the pure strategies D and N, given the true empirical frequency $(p_{11}, p_{21})$ of P1. In classical FP, if P2 ignores the errors, it simply picks from this table the pure strategy that yields the higher payoff (or randomizes over the two pure strategies with probability 0.5 each if they yield the same payoff). We call this the best response based on the empirical frequency.
Now consider the P1-P2 extensive form with imperfect observations plotted in Figure 3.3. In this graph, we model the imperfect observations as a play by Nature [10]. The information sets [10] here are exactly the observations of the Defender. Table 3.2 shows the payoffs of P2 given a particular information set: each entry is the expected payoff of P2 conditioned on an information set and a pure strategy of P2 (a weighted average over the nodes of that information set). As can be seen from the table, if P2 plays D, its expected payoff is
$$U_{21} = q_{11}\,\frac{e\,p_{11}(1-\alpha) + g\,p_{21}\gamma}{p_{11}(1-\alpha) + p_{21}\gamma} + q_{12}\,\frac{e\,p_{11}\alpha + g\,p_{21}(1-\gamma)}{p_{11}\alpha + p_{21}(1-\gamma)},$$
where $q_{11}$ and $q_{12}$ are the probabilities of information sets I and II, respectively: $q_{11} = p_{11}(1-\alpha) + p_{21}\gamma$ and $q_{12} = p_{11}\alpha + p_{21}(1-\gamma)$. Since each denominator equals the corresponding weight, we then have
$$U_{21} = e\,p_{11}(1-\alpha) + g\,p_{21}\gamma + e\,p_{11}\alpha + g\,p_{21}(1-\gamma) = e\,p_{11} + g\,p_{21}.$$
Otherwise, if P2 plays N, the expected payoff is obtained similarly: $U_{22} = f\,p_{11} + h\,p_{21}$. These two expected payoffs (when P2 plays D and N, respectively) are exactly those in Table 3.1. Thus Proposition 3.5 is proved.
Remark 3.1. Although the result given in this proposition is not surprising, it does pave the way for us to propose two FP algorithms, which will be presented in the next section, where the players simply compensate for the effects of the observation errors before playing the regular FPs.
Table 3.1: Expected payoffs of P2 for pure strategies D and N, given the true empirical frequency $(p_{11}, p_{21})$ of P1.

P2 \ P1 | A w.p. $p_{11}$ and N w.p. $p_{21}$ ($p_{11} + p_{21} = 1$)
D | $e\,p_{11} + g\,p_{21}$
N | $f\,p_{11} + h\,p_{21}$
Table 3.2: Payoffs of P2 at a given information set.

P2 \ P1 | I ($q_{11}$) | II ($q_{12}$)
D | $\dfrac{e\,p_{11}(1-\alpha) + g\,p_{21}\gamma}{p_{11}(1-\alpha) + p_{21}\gamma}$ | $\dfrac{e\,p_{11}\alpha + g\,p_{21}(1-\gamma)}{p_{11}\alpha + p_{21}(1-\gamma)}$
N | $\dfrac{f\,p_{11}(1-\alpha) + h\,p_{21}\gamma}{p_{11}(1-\alpha) + p_{21}\gamma}$ | $\dfrac{f\,p_{11}\alpha + h\,p_{21}(1-\gamma)}{p_{11}\alpha + p_{21}(1-\gamma)}$
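The identity behind Proposition 3.5 can also be verified numerically. The sketch below (arbitrary illustrative values for the payoffs $e, f, g, h$, the error probabilities, and the empirical frequency; none are taken from the text) averages the information-set payoffs of Table 3.2 with the weights $q_{11}$ and $q_{12}$ and confirms that the result matches the empirical-frequency payoffs of Table 3.1.

```python
# Illustrative payoffs of P2 (e, g: play D; f, h: play N) and error probabilities.
e, f, g, h = 4.0, 3.0, 1.0, 5.0
alpha, gamma = 0.1, 0.05
p11, p21 = 0.7, 0.3          # arbitrary empirical frequency of P1 (A, N)

# Probabilities of information sets I (observe A) and II (observe N).
q11 = p11 * (1 - alpha) + p21 * gamma
q12 = p11 * alpha + p21 * (1 - gamma)

# Expected payoffs averaged over the information sets (Table 3.2);
# each denominator equals the corresponding weight, so it cancels.
U21_info = (q11 * (e * p11 * (1 - alpha) + g * p21 * gamma) / q11
            + q12 * (e * p11 * alpha + g * p21 * (1 - gamma)) / q12)
U22_info = (q11 * (f * p11 * (1 - alpha) + h * p21 * gamma) / q11
            + q12 * (f * p11 * alpha + h * p21 * (1 - gamma)) / q12)

# Expected payoffs based directly on the empirical frequency (Table 3.1).
U21_emp = e * p11 + g * p21
U22_emp = f * p11 + h * p21

print(U21_info, U21_emp)     # both 3.1
print(U22_info, U22_emp)     # both 3.6
```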
Figure 3.3: P1-P2 extensive form with imperfect observations. P1 moves first (A or N), Nature then determines P2's observation, which defines the information sets I and II, and P2 chooses D or N; the leaf payoffs are the pairs (a, e), (b, f), (c, g), and (d, h).
Theorem 3.1. Consider a classical nonzero-sum 2×2 fictitious play process with imperfect observations, and let $C_i$, $i = 1, 2$, be the channel matrices given in (3.16), where $0 \le \alpha, \gamma, \epsilon, \mu < 0.5$. At time $k$, let player $i$, $i = 1, 2$, carry out the following steps (Algorithm 5):
(i) Update the observed frequency $\tilde{p}_{-i}$ of the opponent using (3.12).
(ii) Compute the estimated frequency using $p_{-i} = C_{-i}^{-1}\,\tilde{p}_{-i}$, $i = 1, 2$.
(iii) Pick the optimal pure strategy using Table 3.1. If there are multiple optimal strategies, randomize over them with equal probabilities.
Then the mixed strategies of the players converge to the Nash equilibrium of the underlying static game as $k \to \infty$.
Proof. The theorem follows from Proposition 3.3 together with the convergence proof for classical nonzero-sum 2×2 fictitious play [26]. Note that under the assumption $0 \le \alpha, \gamma, \epsilon, \mu < 0.5$, the matrices $C_1$ and $C_2$ are always invertible, since $\det C_1 = 1 - \alpha - \gamma > 0$ and $\det C_2 = 1 - \epsilon - \mu > 0$.
Remark 3.2. Theorem 3.1 only states the convergence of the classical version. A convergence proof for the stochastic discrete-time FP, however, is not yet available.
3.3.2 Classical fictitious play with decision errors
In this subsection, we consider situations where the players are not fully rational or where the channels carrying their commands are error prone. Specifically, P1 makes decision errors with probabilities $\alpha_{ij}$, where $\alpha_{ij}$, $i, j \in \{1, 2\}$, is the probability that P1 intends to play action $i$ but ends up playing action $j$, with $\alpha_{ij} \ge 0$ and $\sum_{j=1}^{2} \alpha_{ij} = 1$, $i = 1, 2$. Similarly, P2's decision error probabilities are given by $\epsilon_{ij}$, with $\epsilon_{ij} \ge 0$ and $\sum_{j=1}^{2} \epsilon_{ij} = 1$, $i = 1, 2$. This is known as the "trembling hand" problem in the game theory literature (see, for example, [10], Subsection 3.5.5). Writing $\alpha = \alpha_{12}$, $\gamma = \alpha_{21}$, $\epsilon = \epsilon_{12}$, and $\mu = \epsilon_{21}$, the decision error matrices $D_1$ and $D_2$ are
$$D_1 = \begin{pmatrix} 1-\alpha & \gamma \\ \alpha & 1-\gamma \end{pmatrix}, \qquad D_2 = \begin{pmatrix} 1-\epsilon & \mu \\ \epsilon & 1-\mu \end{pmatrix}. \tag{3.18}$$
These decision errors are illustrated in Figure 3.4.
Figure 3.4: A 2×2 game where the players make decision errors with probabilities $\alpha$, $\gamma$, $\epsilon$, and $\mu$. (P1, the Attacker, has actions $a_{11}$ (A) and $a_{21}$ (N); P2, the Defender, has actions $a_{12}$ (D) and $a_{22}$ (N).)
With the same arguments as in the case of FP with observation errors, we can see that if each player precompensates for the decision errors (by randomizing her action with $D_i^{-1}$, $i = 1, 2$), both players end up playing their intended strategies. The FP process then converges to the Nash equilibrium of the static game. A more detailed treatment can be found in Section 3.5.
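A minimal sketch of this precompensation step is given below, assuming the decision error matrix is known and its error probabilities are below 0.5; the function name and parameter values are illustrative. The player solves $D_i x = \sigma$ for the strategy $x$ to actually randomize over, so that the realized (post-error) strategy equals the intended one; the correction is only feasible when the solution remains a valid probability vector.

```python
import numpy as np

def precompensate(sigma_intended, D):
    """Return the strategy to randomize over so that the realized (post-error)
    strategy equals sigma_intended.

    D[j, i] is the probability that intended action i is realized as action j,
    so realized = D @ played.  Solving D @ x = sigma_intended gives x, which is
    a valid probability vector for interior strategies when errors are < 0.5."""
    x = np.linalg.solve(D, sigma_intended)
    if np.any(x < 0) or not np.isclose(x.sum(), 1.0):
        raise ValueError("intended strategy cannot be exactly precompensated")
    return x

# Illustrative values (not from the original text).
alpha, gamma = 0.1, 0.05
D1 = np.array([[1 - alpha, gamma],
               [alpha, 1 - gamma]])

sigma = np.array([0.8, 0.2])            # intended mixed strategy of P1
played = precompensate(sigma, D1)
print("play", played, "-> realized", D1 @ played)   # realized ≈ [0.8, 0.2]
```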
3.3.3 Algorithms
We present in this subsection two algorithms for classical FP. Algorithm 4, derived from [25, 26], is for the perfect observation case. Players also use this algorithm when they have no estimates of the error probabilities of their sensor systems and thus ignore the errors. Algorithm 5, developed based on the analysis in Subsection 3.3.1, is used by players who have estimates of the error probabilities and want to compensate for them.
1: Given payoff matrix $M_i$, $i = 1, 2$.
2: for k ∈ {0,1,2, . . .} do
3: Update the empirical frequency of the opponent using (3.12).
4: Pick the optimal pure strategy using Table 3.1. If there are multiple optimal strategies, randomize over the optimal strategies with equal probabilities.
5: end for
Algorithm 4: Classical FP with perfect observations.
1: Given payoff matrix $M_i$ and channel matrix $C_i$ given by (3.16), $i = 1, 2$.
2: for k ∈ {0,1,2, . . .} do
3: Update the observed frequency $\tilde{p}_{-i}$ of the opponent using (3.12).
4: Compute the estimated frequency using $p_{-i} = C_{-i}^{-1}\,\tilde{p}_{-i}$, $i = 1, 2$.
5: Pick the optimal pure strategy using Table 3.1. If there are multiple optimal strategies, randomize over the optimal strategies with equal probabilities.
6: end for
Algorithm 5: Classical FP with imperfect observations.
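The following Python sketch (numpy assumed; function and variable names are ours) implements both listings for the 2×2 game: each player tracks the observed frequency of the opponent's actions, optionally corrects it with the inverse channel matrix as in step 4 of Algorithm 5, and best responds to the resulting estimate.

```python
import numpy as np

def best_response(M, p_opp, rng):
    """Best response to the opponent frequency p_opp for payoff matrix M
    (rows: own actions, columns: opponent actions); ties broken uniformly."""
    payoffs = M @ p_opp
    return rng.choice(np.flatnonzero(np.isclose(payoffs, payoffs.max())))

def classical_fp(M1, M2, C1, C2, steps, compensate=True, seed=0):
    """Classical FP with imperfect observations: Algorithm 5 if compensate=True,
    Algorithm 4 run on the raw noisy observations otherwise."""
    rng = np.random.default_rng(seed)
    payoffs, channels = [M1, M2], [C1, C2]
    emp = [np.zeros(2), np.zeros(2)]   # true action counts of each player
    obs = [np.ones(2), np.ones(2)]     # observed counts of each player's actions
                                       # (initialized uniformly to avoid division by zero)
    for _ in range(steps):
        actions = []
        for i in range(2):
            p_obs = obs[1 - i] / obs[1 - i].sum()      # observed freq. of the opponent
            p_est = np.linalg.solve(channels[1 - i], p_obs) if compensate else p_obs
            actions.append(best_response(payoffs[i], p_est, rng))
        for i, a in enumerate(actions):
            emp[i][a] += 1
            # The opponent sees player i's action through the binary channel C_i:
            # the action is reported correctly with probability C_i[a, a].
            seen = a if rng.random() < channels[i][a, a] else 1 - a
            obs[i][seen] += 1
    return [c / c.sum() for c in emp]  # empirical frequencies of the two players
```

With the matrices of (3.16) and (3.19), calling classical_fp with compensate=True corresponds to Algorithm 5, while compensate=False corresponds to players who ignore the errors and apply Algorithm 4 to the noisy observations.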
3.3.4 Simulation results
We present in this subsection some simulation results for the classical discrete-time FP. The payoff matrices of Player P1 and Player P2 are chosen to be
$$M_1 = \begin{pmatrix} 1 & 5 \\ 3 & 2 \end{pmatrix}, \qquad M_2 = \begin{pmatrix} 4 & 1 \\ 3 & 5 \end{pmatrix}, \tag{3.19}$$
which satisfy Assumptions 3.1. The static game with simultaneous moves does not have a pure strategy NE; the mixed strategy NE is (0.8, 0.2) for the Attacker and (0.6, 0.4) for the Defender. The error probabilities of the binary channels are $\alpha = 0.1$, $\gamma = 0.05$, $\epsilon = 0.2$, and $\mu = 0.15$. The number of steps for the simulations of classical FP is 10,000. The empirical frequencies of the players in classical FP are plotted in Figures 3.5, 3.6, and 3.7.
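For reference, the mixed strategy NE quoted above follows from the standard indifference conditions for a 2×2 game; the short sketch below (numpy assumed, names ours) recovers (0.8, 0.2) and (0.6, 0.4) directly from the payoff matrices in (3.19).

```python
import numpy as np

def mixed_ne_2x2(M1, M2):
    """Fully mixed NE of a 2x2 game via indifference conditions.
    Rows of M_i are player i's own actions, columns the opponent's actions.
    Assumes a fully mixed equilibrium exists, as for the game in (3.19)."""
    def indifferent(M):
        # Opponent frequency x that equalizes this player's two action payoffs:
        # (M[0] - M[1]) @ x = 0,  x[0] + x[1] = 1.
        d = M[0] - M[1]
        A = np.array([d, [1.0, 1.0]])
        return np.linalg.solve(A, np.array([0.0, 1.0]))

    p1 = indifferent(M2)   # P1's strategy makes P2 indifferent
    p2 = indifferent(M1)   # P2's strategy makes P1 indifferent
    return p1, p2

M1 = np.array([[1, 5], [3, 2]], dtype=float)
M2 = np.array([[4, 1], [3, 5]], dtype=float)
print(mixed_ne_2x2(M1, M2))   # -> (array([0.8, 0.2]), array([0.6, 0.4]))
```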
Figure 3.5: Classical FP with perfect observations.
Figure 3.6: Classical FP with imperfect observations where players are aware of the error probabilities.
Figure 3.7: Classical FP with imperfect observations where players are unaware of errors.
As can be seen from the graphs for classical FP, when the observations are perfect, the empirical frequencies converge to the mixed strategy NE (Figure 3.5). When the observations are erroneous and the players use Algorithm 5 to compensate for the errors, the empirical frequencies exhibit larger fluctuations but still converge to the NE (Figure 3.6). Finally, if the observations are erroneous but the error probabilities are unknown to the players, who therefore still use Algorithm 4, the empirical frequencies deviate from the original NE, as shown in Figure 3.7.