Optimal Control Problem for the Lyapunov Exponents of Random Matrix Products

N. H. DU

Communicated by G. P. Papavassilopoulos
Abstract. This paper deals with the optimal control problem for the Lyapunov exponents of stochastic matrix products when these matrices depend on a controlled Markov process with values in a finite or countable set. Under some hypotheses, the reduced process satisfies the Doeblin condition and the existence of an optimal control is proved. Furthermore, with this optimal control, the spectrum of the system consists of only one element.
Key Words. Random matrix products, Lyapunov exponents, Markov processes, decision models, optimal policy, optimal control, system spectrum.
For any admissible control (u_t), which we shall define exactly below, we consider the R^d-valued random variables (X_n: n = 0, 1, ...) given by the following difference equation:

    X_{n+1} = M(ξ_{n+1}, Y_{n+1}) X_n,    (1a)

where (Y_n) is a sequence of i.i.d. random variables and M(i, y) are invertible d×d matrices. The behavior of the solutions of the system (1) when the transition probability does not depend on a has been studied by many authors; see Refs. 1–3. Let X^u_n(x) be the solution of (1) associated with the control (u_t). The process (u_t) affects the solutions of the system through the transition probability P(a). We define the Lyapunov exponent of X^u_n(x) by

    λ^u[x] = lim sup_{n→∞} (1/n) log ‖X^u_n(x)‖.    (2)
For any admissible control u, the Lyapunov exponent of the system (1) is in general a random variable; then, in order to exclude the randomness, we introduce here the new concept of essential Lyapunov exponent,

    Λ^u[x] = P-ess sup λ^u[x].

Λ^u[·] takes many finite values. Hence, for the trivial solution X ≡ 0 to be stable, it suffices to choose a control (u_t) such that the condition

    Λ^u[x] < 0, for all x ≠ 0,    (3)

holds. However, this condition is not always satisfied because, in some cases, among the class of admissible controls, we are not able to find a control (u_t) that yields negative Lyapunov exponents of the solutions of (1). So, in view of applications, it is natural that we want to find a control (u_t) with which the system (1) is nearest to stability. This means that the Lyapunov exponents of our system must be as small as possible.
This question leads us to consider the problem of minimizing the function Λ^u[x] over the class of admissible controls. In this article, the main idea for solving this problem is to relate it to the Markov decision problem with per-unit average cost.

The paper is organized as follows. Section 2 introduces the fundamental notations and hypotheses, in terms of which we define the policies and the objective function for the problem. Sections 3–4 contain the main results: we reduce the state space and prove that, under the assumptions introduced in Section 2, our model satisfies the Doeblin condition. From this, we can use methods dealt with in Ref. 4 and the properties of Lyapunov exponents to show the existence of an optimal policy. Furthermore, with this policy, the spectrum of the system (1) consists of only one element.
2. Notations and Hypotheses
Let A be a compact metric space, called the space of actions, and let N = {1, 2, ...} be the set of natural numbers. Throughout this paper, if m(·) is a measure and f is an m-integrable function, we denote ∫ f(x) dm by m(f); and if S is a topological space, we write B(S) for the Borel sets of S. Let Y be a measurable space endowed with the σ-algebra B(Y), and suppose that µ is a probability measure on (Y, B(Y)). Let I be a finite or countable set. Suppose that, for every a ∈ A, we have a transition matrix

    P(a) = (P_ij(a): i, j ∈ I).

Let M: I × Y → Gl(d, R) be a measurable map from I × Y into the group of invertible matrices Gl(d, R). Throughout this paper, we shall make the following assumptions.

Assumption A2. There exist a finite set K ⊂ I, a number α > 0, and an integer n_0 ≥ 1 such that

    ∑_{j_1, ..., j_{n_0−1} ∈ I, j_{n_0} ∈ K} P_{ij_1}(a_1) P_{j_1 j_2}(a_2) ··· P_{j_{n_0−1} j_{n_0}}(a_{n_0}) ≥ α,    (5)

for any i ∈ I and a_1, a_2, ..., a_{n_0} ∈ A.
Assumption A3. For the distribution Q(i, H) = µ(y ∈ Y: M(i, y) ∈ H), H ∈ B(Gl(d, R)), of M(i, ·) on Gl(d, R), there exists a number n_1 > 0 such that

    Q(i_1, i_2, ..., i_{n_1}, ·) = Q(i_1, ·) ∗ Q(i_2, ·) ∗ ··· ∗ Q(i_{n_1}, ·),    (6)

for any i_1, i_2, ..., i_{n_1} ∈ I, has a nonvanishing absolutely continuous part in its Lebesgue decomposition. The asterisk in (6) denotes the convolution operation.

Assumption A3 means that, if

    Q(i_1, i_2, ..., i_{n_1}, ·) = Q_c(·) + Q_s(·),

where Q_c and Q_s are respectively the absolutely continuous and singular parts with respect to the Lebesgue measure, then Q_c(Gl(d, R)) > 0.
Example 2.1. Let I = {+, −} and let

    M(±, y) = [ 1  ±y ]
              [ 1   0 ].

Suppose that Y_n ∼ γ, where γ has a continuous distribution; then, it is easy to see that Assumption A3 is true with n_1 = 4.
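To see Example 2.1 at work numerically, the following is a minimal sketch (in Python) that estimates the Lyapunov exponent (2) for these matrices by simulating the product and renormalizing at each step. The choices Y_n ~ N(0, 1) and i.i.d. fair signs for ξ_n are illustrative assumptions, not taken from the text.

    import numpy as np

    rng = np.random.default_rng(0)

    def M(sign, y):
        # The matrix of Example 2.1, with rows (1, sign*y) and (1, 0).
        return np.array([[1.0, sign * y], [1.0, 0.0]])

    n_steps = 100_000
    x = np.array([1.0, 0.0])
    log_growth = 0.0
    for _ in range(n_steps):
        sign = 1.0 if rng.random() < 0.5 else -1.0   # illustrative: i.i.d. fair signs
        x = M(sign, rng.standard_normal()) @ x       # illustrative: Y_n ~ N(0, 1)
        r = np.linalg.norm(x)
        log_growth += np.log(r)
        x /= r                                       # renormalize so the product never overflows
    print("lambda estimate:", log_growth / n_steps)

The renormalization trick is the standard way to evaluate (2): only the accumulated log-norms matter, and keeping x on the unit sphere avoids overflow.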
We shall now formulate the problem in canonical space. Denote by Ω_0 the set of all sequences (w_n) with w_n = (ξ_n, y_n, a_n), where ξ_n ∈ I, y_n ∈ Y, and a_n ∈ A.

A decision π_t at time t is a stochastic kernel on B(A) × (R^d × (I × Y × A)^{t−1} × (I × Y)), namely,

    π_t = π_t(·|x, w_1, w_2, ..., w_{t−1}, ξ_t, y_t).

A sequence of decisions π = (π_1, π_2, ...) is called a policy. We use Π to denote the class of all policies.
Let π ∈ Π be a policy, and let q ∈ P(I), ν ∈ P(R^d), where P(S) denotes the set of probability measures on S for any measurable space S. Then, we can define a probability measure P on (Ω, F_t, F) such that the following conditions are satisfied for any n = 1, 2, ..., B ∈ B(Y), and i, j ∈ I:

    (i)   P(Y_n ∈ B | F_{n−1}, ξ_n) = µ(B),    (7)
    (ii)  P(ξ_{n+1} = j | F_n) = P_{ξ_n j}(A_n),    (8)
    (iii) P(A_n ∈ · | F_{n−1}, ξ_n, Y_n) = π_n(· | x_0, W_1, W_2, ..., W_{n−1}, ξ_n, Y_n),    (9)
    (iv)  P(ξ_0 = i) = q_i, q = (q_1, q_2, ...),
    (v)   P(x_0 ∈ B) = ν(B), for all B ∈ B(R^d),

with the convention W_0 = const.

The probability P is called the control associated with the policy π and the initial distributions q, ν. We denote by R(q, ν) the class of controls starting from (q, ν). It is well known that R(q, ν) is a convex, closed set.
Let P ∈ R(q, ν) be a control associated with the policy π ∈ Π and q, ν. We consider a difference equation in the form

    X_{n+1} = M(ξ_{n+1}, Y_{n+1}) X_n.    (10a)

Suppose that X(n, x_0) is the solution of (10) starting at x_0, i.e.,

    X(0, x_0) = x_0, P-a.s.
We consider the following two objective functions:

    Λ(q, ν, π) = P-ess sup{lim sup_{t→∞} (1/t) log ‖X(t, x_0)‖},    (11)

with the essential supremum taken over the probability P, and

    Ψ(q, ν, π) = E^π_{q,ν} lim sup_{t→∞} (1/t) log ‖X(t, x_0)‖,    (12)

where E^π_{q,ν} denotes the expectation with respect to the measure P_{q,ν}. If q and ν are degenerate at i and x, we will write simply Λ(i, x, π) and Ψ(i, x, π) instead of Λ(q, ν, π) and Ψ(q, ν, π), respectively. It is evident that

    Λ(q, ν, π) ≥ Ψ(q, ν, π),    (13)
for any q, ν, π. Let

    Λ* = inf{Λ(i, x, π): i ∈ I, x ∈ R^d, π ∈ Π},
    Ψ* = inf{Ψ(i, x, π): i ∈ I, x ∈ R^d, π ∈ Π}.    (14)

From (13), we get

    Λ* ≥ Ψ*.

So, if (q, ν, π) is a minimum for problem (12) and Λ(q, ν, π) = Ψ*, then (q, ν, π) is also a minimum for problem (11). Therefore, we hope that, under suitable hypotheses, it is sufficient to consider problem (12) to find an optimal control for problem (11).
3. Reduced Markov Decision Model
It is well known that the objective function given in the form (11) or (12) is independent of the length of the vectors. Therefore, we may reduce the state space. Any two nonzero vectors are said to be equivalent if they are proportional. The space of equivalence classes is denoted by P^{d−1}. The action of a matrix g on R^d preserves the equivalence relation. We use g again to denote the quotient action on P^{d−1}. Let us consider the F_n-adapted reduced process

    Z_n = (ξ_n, S_n), n = 1, 2, ...,    (15)

defined on Ω with values in I × P^{d−1}, where S_n is the equivalence class of X(n, x_0) in P^{d−1}. We denote this transition by

    T(j × B | i, s, a) = P_ij(a) ∫ 1_B[M(j, y)s / ‖M(j, y)s‖] µ(dy).    (18)

The policy π = (π_1, π_2, ...) is said to be Markov stationary for the control problem of Lyapunov exponents (or randomized stationary, see Ref. 4) if there exists a kernel Φ on B(A) × (I × P^{d−1}) such that, for t = 1, 2, ...,

    π_t(da | x_0, W_1, W_2, ..., W_{t−1}, ξ_t, Y_t) = Φ(da | Z_t).

We write Φ^∞ for the policy (Φ, Φ, ...).
A Markov stationary policy Φ is called a stationary policy (or determined stationary policy) if Φ(·|i, s) is the Dirac mass for any i ∈ I, s ∈ P^{d−1}. In this case, a stationary policy is described completely by a measurable mapping f: I × P^{d−1} → A such that

    Φ({f(i, s)}|i, s) = 1, for i ∈ I, s ∈ P^{d−1};

see Refs. 5–6. We denote this policy by f^∞.
Let Φ(da|i, s) be a Markov stationary policy; then, under the probability associated with Φ, the process (Z_t) is Markov with transition probability T_Φ given by

    T_Φ(C|i, s) = ∫ T(C|i, s, a) Φ(da|i, s).
Lemma 3.1. Under Assumptions A2 and A3, for any Markov stationary policy, the Markov chain (Z_n) satisfies the Doeblin condition (see Refs. 7–8) with respect to the product of the Lebesgue measure meas(·) on P^{d−1} and a counting measure on I.
Proof. We have to prove that, for any Markov stationary policy Φ(·|i, s), there exist a counting measure on I (γ, say) and numbers ε > 0, δ > 0, and m_0 such that, for every i ∈ I and s ∈ P^{d−1},

    T^{m_0}_Φ(C|i, s) ≤ 1 − ε,    (19)

for any C ∈ B(I) × B(P^{d−1}) such that γ × meas(C) < δ, where meas(·) denotes the Lebesgue measure on P^{d−1}.
Let K and α, n_0 be given as in Assumption A2. Then,

    ∑_{j ∉ K} ∑_{j_1, j_2, ..., j_{n_0} ∈ I} P_{ij_1}(a_0) P_{j_1 j_2}(a_1) ··· P_{j_{n_0} j}(a_{n_0}) ≤ 1 − α,    (20)

for any i ∈ I and a_0, a_1, ..., a_{n_0} ∈ A. We note that, if (20) is satisfied for n_0, then it is satisfied for any n ≥ n_0; indeed, conditioning on the state reached after the first n − n_0 steps and applying (20) from that state yields the same bound.
Furthermore, if Assumption A3 is true for n_1, then it is still true for any n ≥ n_1, by the following property: if one of the measures σ_1 and σ_2 is absolutely continuous with respect to σ on a topological group, then their convolution is absolutely continuous with respect to σ. Hence, without loss of generality, we can suppose that n_0 = n_1 = 1, and we shall show that (19) is satisfied for m_0 = 1. To avoid complexities, we put

    Q(i, s, B) = µ{y: M(i, y)s/‖M(i, y)s‖ ∈ B},
    Q(i, H) = µ{y: M(i, y) ∈ H},

and we let γ be the measure on I assigning mass 1/r to each element of K, where r is the number of elements of K. We denote by m(·) the product measure γ(·) × meas(·) on I × P^{d−1}. Suppose that δ_1 < 1/r and δ_2 ≤ δ_1/r; then, from m(C) < δ_2, it follows that

    meas(C_i) < δ_1, for any i ∈ K,

where C_i = {s: (i, s) ∈ C}.
By using the definition of T_Φ, it suffices to bound Q(i, s, B) for sets B with meas(B) small. By Assumption A3, we can choose a set H_0 ⊂ Gl(d, R) such that Q_c(i, H_0) ≥ σ > 0 and the density F of Q_c(i, ·) is essentially bounded on H_0. We suppose that

    ‖g‖ ≤ c, F(g) ≤ k, for any g ∈ H_0.

From this, we have

    Q_s(i, H_0) + Q(i, H̄_0) ≤ 1 − σ,

where H̄_0 denotes the complement of H_0. Letting

    c·B = {x: x/‖x‖ ∈ B, ‖x‖ < c},

we get

    Q(i, s, B) ≤ Q(i, H̄_0) + Q_s(i, H_0) + Q_c(i, {g ∈ H_0: gs/‖gs‖ ∈ B})
              ≤ 1 − σ + Q_c(i, H_0 ∩ {g: gs ∈ c·B}).    (22)

But

    Q_c(i, H_0 ∩ {g: gs ∈ c·B}) ≤ k · meas(c·B) ≤ k · c^d · meas(B),

where there is an abuse of notation between meas(·) on Gl(d, R) and meas(·) on R^d. This implies that, if meas(B) is sufficiently small, then Q(i, s, B) stays bounded away from 1, so that (19) holds with

    ε = (1/2)ασ and δ = min{1/r², δ_2}.    ∎
In connection with the value functions Λ(·) and Ψ(·), we consider the Markov decision model mentioned above with value function in the form

    V(i, s, π) = lim sup_{t→∞} (1/t) E^π_i ∑_{n=1}^t ρ_n(Z_n), Z_0 = (i, s),    (23)

which is familiar to us, where the one-step cost is

    ρ(i, s, a) = ∑_{j∈I} P_ij(a) E log ‖M(j, Y_n)s‖.

By Assumption A1, ρ(i, s, a) is a bounded continuous function, and it is easy to see that

    E^π_i ρ_n(Z_n) = E^π_i ρ(Z_{n−1}, A_{n−1}), n = 1, 2, ....

From this, we get

    V(i, s, π) = lim sup_{t→∞} (1/t) ∑_{n=1}^t E^π_i ρ(Z_n, A_n).    (24)
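A numerical reading of (24): under a stationary policy, V(i, s, π) can be estimated by averaging the one-step costs ρ(Z_n, A_n) along a simulated trajectory of the reduced chain. In the sketch below, the model ingredients (P(a), the matrices M, the policy) are invented for illustration, and ρ is itself estimated by Monte Carlo.

    import numpy as np

    rng = np.random.default_rng(2)

    P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
         1: np.array([[0.5, 0.5], [0.5, 0.5]])}

    def M(j, y):
        sign = 1.0 if j == 0 else -1.0
        return np.array([[1.0, sign * y], [1.0, 0.0]])

    def rho(i, s, a, n_mc=200):
        # Monte Carlo estimate of rho(i, s, a) = sum_j P_ij(a) E log||M(j, Y)s||.
        total = 0.0
        for j in range(2):
            draws = [np.log(np.linalg.norm(M(j, y) @ s))
                     for y in rng.standard_normal(n_mc)]
            total += P[a][i, j] * np.mean(draws)
        return total

    # Long-run average of the one-step costs along the reduced chain, as in
    # (24), under the deterministic stationary policy f(i, s) = 0.
    i, s = 0, np.array([1.0, 0.0])
    T, avg = 1000, 0.0
    for _ in range(T):
        a = 0
        avg += rho(i, s, a) / T
        j = int(rng.choice(2, p=P[a][i]))
        v = M(j, rng.standard_normal()) @ s
        i, s = j, v / np.linalg.norm(v)
    print("estimate of V(0, s, f):", avg)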
Because (Y_n) is an i.i.d. sequence, by Assumption A1 the sequence (ρ(Z_n)) is uniformly integrable for any π ∈ Π. By virtue of the Fatou lemma, we have

    Ψ(i, s, π) ≥ V(i, s, π), i ∈ I, s ∈ P^{d−1}, π ∈ Π.    (25)

Hence, Ψ* ≥ inf{V(i, s, π): i ∈ I, s ∈ P^{d−1}, π ∈ Π}.

Theorem 3.1. Under Assumptions A1–A3, for any Markov stationary policy Φ^∞, the state space I × P^{d−1} decomposes into a transient set F and a finite number of ergodic sets C_1, ..., C_l on which V(·, ·, Φ^∞) is constant.
Proof. Under the policy Φ^∞, (Z_n) is a Markov process with transition probability

    T(C|i, s, Φ) = ∫_A T(C|i, s, a) Φ(da|i, s) ≕ T_Φ(C|i, s).

By Lemma 3.1, the Markov process (Z_n) satisfies the Doeblin condition with respect to the measure m(·) defined in the proof of Lemma 3.1, with constants δ and ε. So, we can define a decomposition of the state space I × P^{d−1} into a transient set F and a finite number of ergodic sets C_1, C_2, ..., C_l, with

    m(C_r) ≥ δ, 1 ≤ r ≤ l.

The restriction of (Z_n) to C_r is ergodic, so it is Harris recurrent with respect to the invariant measure γ_r(·). If we put

    ρ_Φ(i, s) = ∫_A ρ(i, s, a) Φ(da|i, s),

then it is easy to show that

    ∫ [ρ_Φ(i, s) − V(i, s, Φ)] γ_r(di, ds) = 0.

This implies that ρ_Φ − V(i, s, Φ) is a charge on C_r. Hence, the Poisson equation

    (E − T_Φ) h = ρ_Φ − V(·, ·, Φ),    (28)

where E is the identity operator on C_r, has a bounded solution. Let h be such a solution of (28).
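When the chain is restricted to finitely many states, the Poisson equation (28) can be solved directly with linear algebra; the rank-one correction below pins the solution down by forcing it to integrate to zero against the invariant distribution. The 3-state transition matrix and cost vector standing in for T_Φ and ρ_Φ on C_r are invented for the demonstration.

    import numpy as np

    # Illustrative finite restriction: a 3-state ergodic chain standing in
    # for (Z_n) on one ergodic set C_r; P_phi and rho_phi are assumptions.
    P_phi = np.array([[0.5, 0.3, 0.2],
                      [0.2, 0.6, 0.2],
                      [0.3, 0.3, 0.4]])
    rho_phi = np.array([0.1, -0.4, 0.25])

    # Stationary distribution: left eigenvector of P_phi for eigenvalue 1.
    w, vl = np.linalg.eig(P_phi.T)
    pi = np.real(vl[:, np.argmin(np.abs(w - 1.0))])
    pi /= pi.sum()

    V = pi @ rho_phi                      # the average cost, i.e. the constant V_r
    n = len(rho_phi)
    # Poisson equation (E - T_Phi) h = rho_Phi - V, solved via the fundamental
    # matrix; the rank-one term makes the system nonsingular and gives pi @ h = 0.
    h = np.linalg.solve(np.eye(n) - P_phi + np.outer(np.ones(n), pi),
                        rho_phi - V)
    print("V_r =", V)
    print("residual:", (np.eye(n) - P_phi) @ h - (rho_phi - V))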
Using a proof similar to that of (30), we can show that

    V_r = V(i, s, Φ^∞), when (i, s) ∈ C_r.

On the other hand, if (i, s) ∈ F, then

    V(i, s, Φ^∞) = E ∑_{r=1}^l 1_{{Z_τ ∈ C_r}} V_r,

where τ is the first time at which (Z_n) enters one of the ergodic sets. Since (30) takes place P-a.s., it is easy to establish a relation between the value functions (11) and (23).
between the value functions (11) and (23)
Theorem 3.2. Under the assumptions of Theorem 3.1, for any Markovstationary policyΦS
G(Φ,Φ, ), letτ and C r , V , 1YrYl, defined as in
Trang 14the proof of Theorem 3.1 Then,
Λ(i, s,ΦS
)G5V (i, s,Φ), if (i, s)∈C r,
PAess sup{∑l
r G11{Zr∈C r}· V r }, if (i, s)∈F
We now turn to the reduced problem. Denote by Π̄ the subclass of Π consisting of kernels of the form π̄(da|Z_1, A_1, ..., Z_{t−1}, A_{t−1}, Z_t), which may be considered as kernels on B(A) × ((I × P^{d−1} × A)^{t−1} × (I × P^{d−1})). Let π ∈ Π be an arbitrary policy; then one can construct a policy π̄ ∈ Π̄ such that V(i, s, π̄) = V(i, s, π), i.e., π and π̄ have the same value.

Therefore, for the control problem of Lyapunov exponents, we can reduce our model by considering that (Z_n, A_n) is a canonical process defined on the canonical space

    Ω̄ = {f: N → I × P^{d−1} × A},

and the policies are in the class Π̄ with controlled transition probability (18). This reduced model has many advantages because P^{d−1} is compact. Therefore, in the following, we consider only the reduced model. Furthermore, we can find optimal policies in Π̄, as we now show.
4. Existence of an Optimal Policy

To prove the existence of an optimal policy, we use the ideas of Kurano, which are explained in Ref. 4. In this section, we replace Assumption A2 by the following assumption.

Assumption A2′. The map a → P(a) is continuous and the family {P(a): a ∈ A} is tight, i.e., for any ε > 0, there is a finite set K such that, for any i ∈ I, a ∈ A,

    ∑_{j ∈ K} P_ij(a) ≥ 1 − ε.
Lemma 4.1. See Ref. 4, Lemma 2.1. For any policy π ∈ Π̄ and any initial distribution q, ν of ξ_0, S_0, we can find a probability measure σ on I × P^{d−1} × A such that

    ∫_{I×P^{d−1}×A} ρ(z, a) σ(dz, da) ≤ V(q, ν, π)    (33)

and

    ∫ g(z) σ(dz, da) = ∫ [∫ g(z′) T(dz′|z, a)] σ(dz, da),    (34)

for any bounded continuous function g.
Proof. For given π ∈ Π̄, q ∈ P(I), and ν ∈ P(P^{d−1}), we put

    µ_T(D) = (1/T) ∑_{n=1}^T E 1_D(Z_n, A_n),

where the expectation is taken with respect to the control associated with (q, ν, π). The family {µ_T(·): T = 1, 2, ...} is tight, so there exists a sequence {T_n} and a probability measure σ on I × P^{d−1} × A such that µ_{T_n} converges weakly to σ; this limit satisfies (33) and (34). ∎

It follows that

    inf_ν {V(q, ν, π): π is a policy} = inf_ν {V(q, ν, Φ): Φ is Markov}.
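The occupation measures µ_T in this proof have a straightforward empirical counterpart. The sketch below builds µ_T for an invented two-state chain under a fixed policy and checks that its limit satisfies the invariance relation (34), which for a finite chain reduces to µP ≈ µ.

    import numpy as np

    rng = np.random.default_rng(3)

    P = np.array([[0.7, 0.3], [0.4, 0.6]])   # illustrative T_Phi on 2 states
    T_steps = 100_000
    counts = np.zeros(2)
    z = 0
    for _ in range(T_steps):
        counts[z] += 1                       # accumulate the occupation measure
        z = int(rng.choice(2, p=P[z]))
    mu_T = counts / T_steps
    print("mu_T     :", mu_T)
    print("mu_T @ P :", mu_T @ P)            # (34): the two should nearly agree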
Lemma 4.2. If {σ_n} is a sequence of probability measures which satisfy (34), then {σ_n} is tight.
Proof. Let ε > 0, and let K be as mentioned in Assumption A2′. Then, from (34), we have

    σ_n(K × P^{d−1} × A) = ∫ σ_n(dz, da) T(K × P^{d−1}|z, a) ≥ 1 − ε,

by Assumption A2′, which proves the tightness of {σ_n}. ∎
Here, there is an abuse of notation, but we can define V(ν, π) exactly as V(i, s, π). Under the Doeblin condition on the process (Z_n), we have a decomposition of the state space as in the proof of Theorem 3.1, namely, the subsets F, C_1, ..., C_l, for which

    m(C_r) ≥ δ, for any 1 ≤ r ≤ l.

On the other hand, for any fixed i ∈ I such that meas(C_i) ≠ 0, where C_i = {s: (i, s) ∈ C}, the map

    s → V(i, s, Φ^∞)

satisfies the general property of the Lyapunov exponents (see Ref. 10). Hence, the set S on which V(·, ·, Φ^∞) attains its minimal value is well defined.
Lemma 4.3. S is an invariant set.

Proof. Suppose that S is not invariant: there is an (i_0, s_0) ∈ S from which the process leaves S with positive probability. Since Φ is a Markov policy, we have by Theorem 3.1 that the value at (i_0, s_0) then exceeds the minimum of V, which contradicts (i_0, s_0) ∈ S. Thus, S is invariant. ∎
Theorem 4.1. Suppose that there exists a Markov policy L^∞ under which the minimal value of V is attained. From (38), it follows that P(τ < ∞) = 1. So, in a way similar to the proofs of Theorem 3.1 and Theorem 3.2, we get that Φ^∞_* is an optimal policy for the objective function V(·). The fact that Φ^∞_* is optimal for the objective function Λ(·) follows from Theorem 3.2, and its optimality for Ψ(·) is deduced from Inequalities (13) and (25). In this case, we have

    T_Φ(·|i, s) = T(·|i, s, Φ) = ∫ T(·|i, s, a) Φ(da|i, s).