
A Smoothing Newton-BiCGStab Method for Least Squares Matrix Nuclear Norm Problems




I would like to express my deepest thanks and respect to my supervisor, Professor Sun Defeng. He has patiently introduced me to the field of optimization and has provided guidance and encouragement throughout my study. My sincere respect for him comes from his enthusiasm for the optimization field and his effort in organizing weekly optimization discussion sessions, which have been a fruitful experience and a great learning opportunity for me in this research field.

My sincere thanks also go to all the friends in the Department of Mathematics: Gao Yan, Liu Yongjin, Zhao Xinyuan, Jiang Kaifeng, Ding Chao and Yang Zhe, for their kind help and support throughout the project.

Luo Yanying/Jan 2010


A Smoothing Newton-BiCGStab Method for Least Squares Matrix Nuclear Norm Problems

Luo Yanying

Department of Mathematics, Faculty of Science

National University of Singapore

Master’s thesis

Abstract

In this thesis, we study a smoothing Newton-BiCGStab method for least squares nonsymmetric matrix nuclear norm problems. For this type of problem, when linear inequality and second-order cone constraints are present, the dual problem is equivalent to a system of nonsmooth equations. Smoothing functions are introduced for the nonsmooth layers of the system. We prove that the smoothed system of equations for nonsymmetric matrix problems inherits the strong semismoothness property from the real-valued smoothing functions. As a result, we show that the smoothing Newton-BiCGStab method, which was introduced for solving least squares semidefinite programming problems, can be extended to solve least squares nonsymmetric matrix nuclear norm problems.


$\sigma_{n_1}(X)$ are singular values of $X$. Let $\|\cdot\|_2$ stand for the Euclidean norm, and $\|\cdot\|_F$ denote the Frobenius norm, which is induced by the standard trace inner product $\langle X, Y\rangle = \operatorname{trace}(Y^T X)$ in $\Re^{n_1\times n_2}$. Let $\{A_e, A_l, A_q, A_u\}$ be the linear operators used in four types of constraints respectively: linear equality, linear inequality, second-order cone, and linear vector space constraints. Each of these operators is a linear mapping from $\Re^{n_1\times n_2}$ to $\Re^{m_*}$ defined respectively by


where the constants are required to satisfy $\rho \ge 0$, $\mu > 0$, $\lambda > 0$; $C$ is some matrix in $\Re^{n_1\times n_2}$; and $K^{m_q}$ denotes a second-order cone, which is defined by


smoothing functions to solve the least squares covariance matrix (LSCM) problems with equality and inequality constraints [6],

In the absence of inequality constraints, we have $Q^+ = \Re^{m_e}$, which implies that the dual of the (LSCM) problem is an unconstrained convex optimization problem. Based on a result of [18], we know that $\nabla\theta$ is a strongly semismooth function even though it is not continuously differentiable. One can still find a quadratically convergent method for solving (LSCM) problems [16]. When inequality constraints are present, the dual problem becomes a constrained problem, which can be transformed into a system of equations,

$$F(y) := y - \Pi_{Q^+}(y - \nabla\theta(y)) = 0. \tag{1.4}$$

In this system, the projector $\Pi_{Q^+}(\cdot)$ is a metric projection from $\Re^{m_e+m_l}$ to $Q^+$. The function $\nabla\theta$ involves another metric projector onto the symmetric positive semidefinite cone. The two layers of metric projectors create obstacles to the direct use of Newton-type algorithms to achieve a quadratic convergence rate. To tackle this problem, Gao and Sun [6] applied smoothing functions to the two nonsmooth layers of metric projectors in $F$. A Newton-BiCGStab algorithm is used to solve a smoothed system of (1.4). Their results show a promising quadratic convergence rate for the (LSCM) problems with linear inequality constraints.

The (LSCM) problem has recently been used by Gao and Sun [7] to iteratively solve H-weighted least squares semidefinite programming problems with an additional rank constraint,

where $H \ge 0$ is a given matrix and "$\circ$" denotes the Hadamard product of two matrices. Note that $\sum_{i=k+1}^{n} \sigma_i(X) = 0$ if and only if $\operatorname{rank}(X) \le k$. The rank constraint may be replaced by putting a penalty term $\rho(\cdots)$ into the objective function. The idea of the majorized penalty approach given in [7] is to solve a sequence of (LSCM) problems of the form,


Given the potential importance of problem (1.1) for solving structure-preserving low rank approximation problems and beyond, we will focus on solving problem (1.1).

In this thesis, the least squares matrix nuclear norm minimization problems will be shown to have properties similar to those of the (LSCM) problems. The smoothing Newton-BiCGStab method will be applied to solve problem (1.1). Preliminaries such as the derivation of the dual problem, optimality conditions, the construction of smoothing functions, and the continuity and differentiability properties of the nonsymmetric matrix-valued functions involved in solving problem (1.1) are presented in the next chapter. In Chapter 3, the smoothing Newton-BiCGStab method is presented together with its convergence analysis. Implementation issues and numerical experiments are discussed in Chapter 4, followed by conclusions in Chapter 5.


Chapter 2

Preliminaries

2.1 The Lagrangian Dual Problem and Optimality Conditions

In this chapter, we denote the primal problem (1.2) by (P).

The Lagrangian function $L(X, x_u, y) : \Re^{n_1\times n_2} \times \Re^{m_u} \times \Re^{m} \to \Re$ for (P) is defined


defined by:

$$D_\tau(X) := U D_\tau(\Sigma) V_1^T, \qquad D_\tau(\Sigma) = \operatorname{diag}(\{(\sigma_i - \tau)_+\}),$$

where $t_+ := \max(0, t)$. The singular value thresholding operator is a proximity operator associated with the nuclear norm. Details of proximity operators can be found in [9]. The following proposition allows us to obtain the result of $\inf_X \{\rho\|X\|_* + \frac{\lambda}{2}\|X - C - \frac{1}{\lambda}W^* y\|_F^2\}$. Its proof can be found in [2, 12].

Proposition 2.1.1. For each $\tau \ge 0$ and $Y \in \Re^{n_1\times n_2}$, the singular value thresholding operator obeys

The objective function $\theta$ in the dual problem (D) is a continuously differentiable convex function. However, it is not twice continuously differentiable. Its first order derivative is given by


The dual problem (D) of problem (P) is a convex constrained vector-valued problem, in contrast to the matrix-valued problem (P). When it is easier to apply optimization algorithms to (D) than to (P), one can use Rockafellar's dual approach [17] to first find an optimal solution $\bar y$ for (D). An optimal solution $\bar X$ for (P) can then be obtained by

Before introducing optimality conditions, we assume that the Slater condition holds for the primal problem (P):

where $\operatorname{ri}(Q)$ denotes the relative interior of $Q$. When the Slater condition is satisfied, the following proposition, which is a straightforward application of Rockafellar's results in [17], holds.

Proposition 2.1.2. Under the Slater condition (2.4), the following results hold:

(i) There exists at least one $\bar y \in Q^+$ that solves the dual problem (D). The unique solution to the primal problem (P) is given by

$$(\bar X, \bar x_u) = \big(D_{\rho/\lambda}(C + \tfrac{1}{\lambda} W^* \bar y),\ -\mu^{-1}\bar y_u\big). \tag{2.5}$$

(ii) For every real number $\varepsilon$, the constrained level set $\{y \in Q^+ \mid \theta(y) \le \varepsilon\}$ is closed, bounded and convex.

The convexity in the second part of Proposition 2.1.2 allows us to apply any gradient-based optimization method to obtain an optimal solution for the dual problem (D). Once a solution is found for (D), one can always use (2.5) to obtain the unique optimal solution to the primal problem (P).
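As an illustration of the recovery step (2.5), the following is a minimal NumPy sketch of the singular value thresholding operator and of the map from a dual vector to the primal matrix. It is not the implementation used in the thesis. The matrix `Wstar_y` is a hypothetical stand-in for $W^* y$, the data are random, and the helper names are ours. The function `prox_objective` encodes the standard proximal characterization of $D_\tau$ from the singular value thresholding literature, namely that $D_\tau(Y)$ minimizes $\tau\|X\|_* + \frac{1}{2}\|X - Y\|_F^2$; we assume this is the property the excerpt attributes to [2, 12].

```python
import numpy as np

def svt(Y, tau):
    """Singular value thresholding D_tau(Y): soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def prox_objective(X, Y, tau):
    """tau*||X||_* + 0.5*||X - Y||_F^2 (assumed proximal objective for D_tau)."""
    return tau * np.linalg.norm(X, "nuc") + 0.5 * np.linalg.norm(X - Y, "fro") ** 2

def recover_primal(C, Wstar_y, rho, lam):
    """Primal matrix from (2.5): X = D_{rho/lam}(C + W^* y / lam)."""
    return svt(C + Wstar_y / lam, rho / lam)

rng = np.random.default_rng(0)
C = rng.standard_normal((4, 6))
Wstar_y = rng.standard_normal((4, 6))  # hypothetical stand-in for W^* applied to y
rho, lam = 1.5, 2.0
X_bar = recover_primal(C, Wstar_y, rho, lam)
```

Because the thresholded singular values are computed from a thin SVD, the cost of one recovery is dominated by that factorization.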


With respect to problem (D), the Lagrangian function may be defined by $L(y, \alpha) = \theta(y) - \langle \alpha, y\rangle$. For some Lagrange multiplier $\bar\alpha$, the Karush-Kuhn-Tucker conditions require the optimal solutions $\bar y$ of problem (D) to satisfy:

On the other hand, we define $F : \Re^m \to \Re^m$ by

$$F(y) := y - \Pi_{Q^+}(y - \nabla\theta(y)), \quad \forall\, y \in \Re^m. \tag{2.7}$$

It can be verified with the results from [4] that solving the variational inequality (2.6) is equivalent to solving the system of

It is known that $F$ is globally Lipschitz continuous but not everywhere continuously differentiable. One may use Newton's methods based on Clarke's generalized Jacobian to solve problem (2.8). However, those methods cannot be globalized because $F$ is not the gradient mapping of any real-valued function. Nevertheless, the smoothing Newton-BiCGStab method has been shown to resolve this difficulty for least squares semidefinite programming problems [6]. Similarly, we may introduce smoothing functions for the least squares nonsymmetric matrix nuclear norm problems and design a Newton-BiCGStab method for solving a smoothed system of (2.8).
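To make the role of the map $F$ in (2.7) concrete, here is a small NumPy sketch on a toy problem. Several things are assumptions made only for illustration: $\theta$ is replaced by a hypothetical strongly convex quadratic rather than the thesis dual objective, $Q^+$ is taken to be $\Re^{m_e} \times \Re^{m_l}_+$ with no second-order cone block, and a plain projected gradient iteration (not the smoothing Newton-BiCGStab method) is used to find the solution. The point is only that $F$ vanishes at a solution of the constrained problem.

```python
import numpy as np

def project_Qplus(z, m_e):
    """Project onto R^{m_e} x R_+^{m_l}: equality block free, inequality block >= 0."""
    out = z.copy()
    out[m_e:] = np.maximum(out[m_e:], 0.0)
    return out

def natural_residual(y, grad_theta, m_e):
    """F(y) = y - Pi_{Q+}(y - grad theta(y)), as in (2.7)."""
    return y - project_Qplus(y - grad_theta(y), m_e)

# Toy strongly convex theta(y) = 0.5*y'My - b'y (hypothetical data).
M = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])
b = np.array([1.0, -2.0, 0.5])
grad_theta = lambda y: M @ y - b
m_e = 1  # first component unconstrained, remaining two constrained to be >= 0

# Projected gradient iteration with step 1/L, L = spectral norm of M.
y = np.zeros(3)
step = 1.0 / np.linalg.norm(M, 2)
for _ in range(500):
    y = project_Qplus(y - step * grad_theta(y), m_e)

res = np.linalg.norm(natural_residual(y, grad_theta, m_e))
```

The residual `res` can serve as a stopping criterion for any method applied to this toy problem, since it is zero exactly at the constrained minimizers.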


2.2 The Differential Properties of the Smoothing Functions

Definition 2.2.1. Suppose that a vector-valued function $f : \Re^{m_1} \to \Re^{m_2}$ is locally Lipschitz continuous at $x \in \Re^{m_1}$. $f$ is said to be semismooth at $x$ if $f$ is directionally differentiable at $x$, and for any $V \in \partial f(x + \Delta x)$, the generalized Clarke


It is known that both $\phi_H$ and $\phi_S$ are globally Lipschitz continuous, continuously differentiable around $(\varepsilon, t)$ whenever $\varepsilon \ne 0$, and strongly semismooth at $(0, t)$ (see [21] and references therein for details). The outer layer vector-valued functions defined in (2.7), when they are compositions of $(t)_+$ and a linear function, can be smoothed by using either $\phi_H$ or $\phi_S$. Under certain conditions, the smoothing functions inherit the Lipschitz continuity, differentiability, and semismoothness properties of $\phi_H$ or $\phi_S$. With respect to the inner layer of $F$ in (2.7), where the singular value thresholding operator is involved, we will also show that the nonsymmetric matrix-valued functions can be smoothed by applying either $\phi_H$ or $\phi_S$ to the singular values of the matrix. The resulting matrix-valued function will be shown to inherit the related differential properties from $\phi_H$ (or $\phi_S$). Since $\phi_H$ and $\phi_S$ share similar differential properties, in the following, unless otherwise specified, we use $\phi$ to denote either smoothing function $\phi_H$ or $\phi_S$.
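The definitions of $\phi_H$ and $\phi_S$ are not reproduced in this excerpt. The sketch below assumes the forms commonly used for the Huber and Smale smoothing functions of $t_+ = \max(0, t)$ in the smoothing Newton literature; the function names are ours, and the thesis definitions may differ in scaling. Both functions reduce to $\max(0, t)$ at $\varepsilon = 0$, and their distance from $\max(0, t)$ is at most $|\varepsilon|/8$ and $|\varepsilon|/2$ respectively under these assumed forms.

```python
import math

def phi_huber(eps, t):
    """Assumed Huber smoothing of max(0, t); equals max(0, t) when eps == 0."""
    a = abs(eps)
    if a == 0.0:
        return max(0.0, t)
    if t >= a / 2:
        return t
    if t <= -a / 2:
        return 0.0
    # Quadratic piece joining the two linear pieces with matching value and slope.
    return (t + a / 2) ** 2 / (2 * a)

def phi_smale(eps, t):
    """Assumed Smale smoothing of max(0, t): (t + sqrt(eps^2 + t^2)) / 2."""
    return 0.5 * (t + math.sqrt(eps * eps + t * t))
```

In a smoothing Newton scheme, $\varepsilon$ is treated as an extra variable driven to zero, so that the smooth system approaches the original nonsmooth one.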

The function $F(y)$ in (2.7) is given by

where $T(y_u) = [0; 0; 0; y_u]$. $F$ contains a composition of two nonsmooth functions. In the outer layer, $\Pi_{Q^+}(\cdot)$ is a metric projection operator from $\Re^m$ to $Q^+$

where $z = [z_e; z_l; z_q; z_u]$ and $\Pi_{K^{m_q}}(z)$ denotes the projection of $z$ onto the second-order cone $K^{m_q}$. The properties of second-order cones have been well studied. The following well-known proposition gives an analytical solution to $\Pi_{K^n}(\cdot)$, the metric projection onto a second-order cone $K^n$ of dimension $n$. See [14] and references therein for more discussions on $\Pi_{K^{m_q}}(\cdot)$.
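For orientation, the well-known closed form for the second-order cone projection can be sketched as follows. The cone convention $K^n = \{[z_t; z_n] : \|z_t\|_2 \le z_n\}$, with the scalar part stored last, is an assumption on our part, since the defining formula is not reproduced in this excerpt; the three cases are the standard ones (inside the cone, inside the polar cone, and the remaining boundary case).

```python
import numpy as np

def project_soc(z):
    """Project z = [z_t; z_n] onto {z : ||z_t||_2 <= z_n} (assumed cone convention)."""
    z = np.asarray(z, dtype=float)
    zt, zn = z[:-1], z[-1]
    nt = np.linalg.norm(zt)
    if nt <= zn:       # already inside the cone
        return z.copy()
    if nt <= -zn:      # inside the polar cone: the projection is the origin
        return np.zeros_like(z)
    # Otherwise the projection lies on the boundary of the cone.
    scale = 0.5 * (1.0 + zn / nt)
    return np.concatenate([scale * zt, [scale * nt]])
```

For example, under this convention the point $[3; 4; 0]$ is mapped to $[1.5; 2; 2.5]$, which lies on the boundary of the cone and is orthogonal to the residual $z - \Pi(z)$.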

Proposition 2.2.1. For any $z \in \Re^n$, let $z = [z_t; z_n]$ where $z_t \in \Re^{n-1}$ and $z_n \in \Re$.

It has been shown in [21, Theorem 5.1] that $\phi_{K^n}(\cdot, \cdot)$ is globally Lipschitz continuous, and strongly semismooth on $\Re_+ \times \Re^n$, if the smoothing function $\phi$ is globally Lipschitz continuous and strongly semismooth. Furthermore, a smoothing function $\psi : \Re \times \Re^m \to \Re^m$ for the outer layer of the metric projector (2.12) may now be


With the above known results, $\psi$ is a globally Lipschitz continuous, and strongly

the SVD of $X$, we let a symmetric matrix $Y_X \in S^{(n_1+n_2)\times(n_1+n_2)}$ be defined by

For some $\beta > 0$, we define a real-valued function $g_\beta$ and a corresponding matrix-valued function $G_\beta(Y_X) : S^{(n_1+n_2)\times(n_1+n_2)} \to S^{(n_1+n_2)\times(n_1+n_2)}$ such that

$$G_\beta(Y_X) := (Y_X - \beta I)_+ - (-Y_X - \beta I)_+. \tag{2.17}$$


Here $I$ denotes the identity matrix of dimension $(n_1+n_2)$ and the matrix-valued operator $(\cdot)_+$ is the metric projection $\Pi_{S^n_+}(\cdot)$ onto the symmetric positive semidefinite cone. Then one can check [10] that


For any $Y \in S^n$, $\lambda(Y) \in \Re^n$ denotes the vector of eigenvalues of $Y$. Let $Y = P \operatorname{diag}(\lambda(Y)) P^T$ be the eigenvalue decomposition of $Y$. A Löwner function $F : S^n \to S^n$ is then defined with respect to a real-valued function $f(\cdot)$,

$$F(Y) := P \operatorname{diag}[f(\lambda_1(Y)), f(\lambda_2(Y)), \ldots, f(\lambda_n(Y))] P^T. \tag{2.22}$$
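The definition (2.22) translates directly into a few lines of NumPy. The sketch below also applies it to $G_\beta$ from (2.17), using the scalar function $g_\beta(t) = (t - \beta)_+ - (-t - \beta)_+$ and the symmetric embedding $Y_X = \begin{bmatrix} 0 & X \\ X^T & 0 \end{bmatrix}$. Both of these are assumptions on our part, since their definitions are not reproduced in this excerpt. Under these assumptions, the off-diagonal block of $G_\beta(Y_X)$ can be compared numerically with the soft-thresholded SVD of $X$.

```python
import numpy as np

def lowner(Y, f):
    """Löwner function F(Y) = P diag(f(lambda_i)) P^T for symmetric Y, as in (2.22)."""
    lam, P = np.linalg.eigh(Y)
    return (P * f(lam)) @ P.T

def g_beta(t, beta):
    """Assumed scalar function behind (2.17): (t - beta)_+ - (-t - beta)_+."""
    return np.maximum(t - beta, 0.0) - np.maximum(-t - beta, 0.0)

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 5))
n1, n2 = X.shape
beta = 0.7

# Assumed symmetric embedding Y_X = [[0, X], [X^T, 0]].
Y_X = np.block([[np.zeros((n1, n1)), X], [X.T, np.zeros((n2, n2))]])
G = lowner(Y_X, lambda lam: g_beta(lam, beta))

# Soft-threshold the singular values of X directly, for comparison.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
D = (U * np.maximum(s - beta, 0.0)) @ Vt
```

The comparison works because the nonzero eigenvalues of the embedding are plus and minus the singular values of $X$, and $g_\beta$ is an odd function.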


When $f$ is differentiable at $\mu$, a first divided difference function $F^{[1]}$ at $\mu \in \Re^n$ is defined by

With the results of Löwner (see [1] for details), we have the following lemma.

Lemma 2.2.1. If a real-valued function $f(\cdot)$ is continuously differentiable in an open interval $(a_1, a_2)$ containing all the eigenvalues $\{\lambda_i(Y)\}$ of $Y$, then the Löwner function $F(\cdot)$ is differentiable at $Y$. For any $H \in S^n$, the derivative of $F(\cdot)$ is given by

$$F'(Y)H = P\big(F^{[1]}(\lambda(Y)) \circ (P^T H P)\big)P^T.$$
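The derivative formula of Lemma 2.2.1 can be checked numerically against a central finite difference. In the sketch below, the first divided difference matrix is assumed to take its usual form, with entries $(f(\mu_i) - f(\mu_j))/(\mu_i - \mu_j)$ when $\mu_i \ne \mu_j$ and $f'(\mu_i)$ otherwise, since its definition is not reproduced in this excerpt. The scalar function is an assumed Smale-type smoothing of $\max(0, t)$ with a fixed $\varepsilon = 0.5$, and the test matrices are random.

```python
import numpy as np

def lowner(Y, f):
    """Löwner function F(Y) = P diag(f(lambda_i)) P^T for symmetric Y."""
    lam, P = np.linalg.eigh(Y)
    return (P * f(lam)) @ P.T

def first_divided_difference(lam, f, df, tol=1e-12):
    """Entries (f(l_i)-f(l_j))/(l_i-l_j), or f'(l_i) when l_i and l_j coincide."""
    diff = lam[:, None] - lam[None, :]
    fdiff = f(lam)[:, None] - f(lam)[None, :]
    same = np.abs(diff) < tol
    safe = np.where(same, 1.0, diff)  # avoid division by zero on coincident pairs
    return np.where(same, df(lam)[:, None] * np.ones_like(diff), fdiff / safe)

def lowner_derivative(Y, H, f, df):
    """F'(Y)H = P (F^[1](lambda(Y)) o (P^T H P)) P^T from Lemma 2.2.1."""
    lam, P = np.linalg.eigh(Y)
    return P @ (first_divided_difference(lam, f, df) * (P.T @ H @ P)) @ P.T

eps = 0.5  # fixed smoothing parameter for this check
f = lambda t: 0.5 * (t + np.sqrt(eps**2 + t**2))
df = lambda t: 0.5 * (1.0 + t / np.sqrt(eps**2 + t**2))

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)); Y = (A + A.T) / 2
B = rng.standard_normal((4, 4)); H = (B + B.T) / 2

h = 1e-6
fd = (lowner(Y + h * H, f) - lowner(Y - h * H, f)) / (2 * h)
err = np.linalg.norm(lowner_derivative(Y, H, f, df) - fd)
```

A small value of `err` indicates agreement between the closed-form derivative and the finite difference for this particular choice of $f$, $Y$, and $H$.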



With Lemma 2.2.1, we have that $\Phi_G$ is differentiable at $(\varepsilon, Y_X)$ for any $\varepsilon > 0$, and its derivative is given by


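we divide $\Omega(\varepsilon, \lambda(Y_X))$ into nine parts,

$$P_{12} = (\Phi_{D_\beta})'_X(\varepsilon, X)\Delta X$$

$$\tfrac{1}{2}U\big((A + A^T)\circ\Omega_{11} + (A - A^T)\circ\Omega_{12}\big)V_1^T + U(B\circ\Omega_{13})V_2^T$$

Similarly to (2.24), we have that $\Phi_{D_\beta}$ is differentiable at $(\varepsilon, X)$ when $\varepsilon > 0$, and its derivative is given by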

values of the matrix are given by $\phi_g$, which is a sum of two strongly semismooth functions. The sum of two strongly semismooth functions is also strongly semismooth. From the results of [21], we know that the smoothing matrix-valued function $\Phi_G$ inherits the global Lipschitz continuity and strong semismoothness of $\phi_g$. We have seen above that the derivative of $\Phi_{D_\beta}$ has a transformation form analogous to that of the derivative of $\Phi_G$, as from $X$ to $Y_X$. Thus $\Phi_{D_\beta}$ analogously inherits the global Lipschitz continuity and strong semismoothness at any $(0, X) \in \Re \times \Re^{n_1\times n_2}$. In particular, for any $\Delta X \to 0$ and $\varepsilon \to 0$ and $V \in \partial\Phi_{D_\beta}(\varepsilon, X + \Delta X)$,

$$\Phi_{D_\beta}(\varepsilon, X + \Delta X) - \Phi_{D_\beta}(0, X) - V(\varepsilon, \Delta X) = O(\|(\varepsilon, \Delta X)\|^2). \tag{2.27}$$

Now we are ready to introduce a smoothing function $\Upsilon : \Re \times \Re^m \to \Re^m$ for $F$ defined in (2.7) with (2.13) and (2.20),

The differential properties of $\Upsilon$, which will be used in the convergence analysis of our algorithm, are summarized in the following proposition.

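Proposition 2.2.2. Let $\Upsilon : \Re \times \Re^m \to \Re^m$ be defined by (2.28). Let $y \in \Re^m$. Then it holds that

(i) $\Upsilon$ is globally Lipschitz continuous on $\Re \times \Re^m$.

(ii) $\Upsilon$ is continuously differentiable around $(\varepsilon, y)$ where $\varepsilon \ne 0$. If $m_q = 0$, then for any fixed $\varepsilon \in \Re$, $\Upsilon(\varepsilon, \cdot)$ is a $P_0$-function, i.e., for any $y, h \in \Re^m$ with $y \ne h$, it holds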


This implies that $g_\varepsilon$ is a $P_0$-function on $\Re^m$. Let $y, h \in \Re^m$ with $y \ne h$. Then there exists $i \in \{1, \ldots, m\}$ with $y_i \ne h_i$ such that

Thus $\Upsilon$ is a $P_0$-function and (2.29) holds for any $y, h \in \Re^m$ such that $y \ne h$.

(iii) We have shown that the smoothing function $\psi$ defined in (2.13) is strongly semismooth at any $(0, y) \in \Re \times \Re^m$, and $\Phi_{D_{\rho/\lambda}}$ defined in (2.20) is strongly semismooth at any $(0, X) \in \Re \times \Re^{n_1\times n_2}$. With the known result that a composition of strongly semismooth functions is also strongly semismooth [5], we can conclude that $\Upsilon$ is strongly semismooth at $(0, y)$.

(iv) Both $\psi$ and $\Phi_{D_\beta}$ are directionally differentiable. For any $(\varepsilon, y') \in \Re \times \Re^m$ such that $\Upsilon$ is Fréchet differentiable at $(\varepsilon, y')$, the directional derivative gives that

where $T(h_u) = [0; 0; 0; h_u]$ and $z' = y' - \nabla\theta(y')$. With the semismoothness of $\psi$ and $\Phi_{D_\beta}$, it implies that

Posted: 26/09/2015, 09:56
