from the distribution of X. In Chapter 6 we saw that this is by no means a trivial problem, even for the simplest functions h(·). Indeed, the results in this area are almost exclusively related to simple functions of normally distributed r.v.'s, most of which were derived in Chapter 6. For more complicated functions, even in the case of normality, very few results are available. Given, however, that statistical inference depends crucially on being able to determine the distribution of such functions h(X), we need to tackle the problem somehow. Intuition suggests that the limit theorems discussed in Chapter 9, when extended, might enable us to derive approximate solutions to the distribution problem.
The limit theorems considered in Chapter 9 tell us that, under certain conditions which ensure that no one r.v. in a sequence $\{X_n, n \ge 1\}$ dominates the behaviour of the sum $\sum_{i=1}^{n} X_i$, we can deduce that:
184 Introduction to asymptotic theory

$$\frac{1}{n}\sum_{i=1}^{n}\bigl(X_i - E(X_i)\bigr) \xrightarrow{D} 0.$$
In order to be able to extend these results to arbitrary Borel functions h(X), not just $\sum_{i=1}^{n} X_i$, we first need to extend the various modes of convergence (convergence in probability, almost sure convergence, convergence in distribution) to apply to any sequence of r.v.'s $\{X_n, n \ge 1\}$.
The various modes of convergence related to the above limit theorems are considered in Section 10.2. The main purpose of this section is to relate the various mathematical notions of convergence to the probabilistic convergence modes needed in asymptotic theory. One important mode of convergence not encountered in the context of the limit theorems is 'convergence in the rth mean', which refers to convergence of moments. Section 10.3 discusses various concepts related to the convergence of moments, such as asymptotic moments, limits of moments and probability limits, in an attempt to distinguish between these concepts, which are often confused in asymptotic theory. In Chapter 9 it was stressed that an important ingredient underlying the conditions giving rise to the various limit theorems is the notion of the 'order of magnitude'; an example is the Markov condition needed for the WLLN.
Asymptotic theory results are resorted to by necessity, when finite sample results are not available in a usable form. This is because asymptotic results provide only approximations. 'How good the approximations are' is commonly unknown, because we could answer such a question only when the finite sample result is available; but in that case the asymptotic result is not needed! There are, however, various 'rough' error bounds which can shed some light on the magnitude of the approximation error. Moreover, it is often possible to 'improve' upon the asymptotic results using what we call
asymptotic expansions, such as the Edgeworth expansion. The purpose of Section 10.6 is to introduce the reader to this important literature on error bounds and asymptotic expansions. The discussion is only introductory and much more intuitive than formal, in an attempt to demystify this literature, which plays an important role in econometrics. For a more complete and formal discussion see Phillips (1980), Rothenberg (1984), inter alia.
10.2 Modes of convergence
The notions of 'limit' and 'convergence' play a very important role in probability theory, not only because of the limit theorems discussed in Chapter 9 but also because they underlie some of the most fundamental concepts, such as probability and distribution functions, density functions, the mean, the variance, as well as higher moments. This was not made explicit in Chapters 3-7 because of the mathematical subtleties involved.
In order to understand the various modes of convergence in probability theory, let us begin by reminding ourselves of the notion of convergence in mathematical analysis. A sequence $\{a_n, n \in \mathbb{N}\}$ is defined to be a function from the natural numbers $\mathbb{N} = \{1, 2, 3, \ldots\}$ to the real line $\mathbb{R}$.
Definition 1
A sequence $\{a_n, n \in \mathbb{N}\}$ is said to converge to a limit $a$ if for every arbitrarily small number $\varepsilon > 0$ there corresponds a number $N(\varepsilon)$ such that the inequality $|a_n - a| < \varepsilon$ holds for all terms $a_n$ of the sequence with $n > N(\varepsilon)$; we denote this by $\lim_{n\to\infty} a_n = a$.
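As a concrete numerical sketch (illustrative, not from the text), Definition 1 can be checked for the sequence $a_n = 1/n$ with limit $a = 0$: since $|1/n - 0| < \varepsilon$ whenever $n > 1/\varepsilon$, one valid choice is $N(\varepsilon) = \lfloor 1/\varepsilon \rfloor$.

```python
import math

# For a_n = 1/n with limit a = 0, N(eps) = floor(1/eps) is one valid choice,
# because |1/n - 0| < eps holds for every n > 1/eps.

def a(n):
    return 1.0 / n

def n_of_eps(eps):
    return math.floor(1 / eps)   # a valid N(eps) for this particular sequence

def check(eps, horizon=100_000):
    # verify |a_n - 0| < eps for all n > N(eps) up to a finite horizon
    N = n_of_eps(eps)
    return all(abs(a(n) - 0.0) < eps for n in range(N + 1, horizon))

print(check(0.25), check(0.01))  # the definition holds for each tolerance tried
```

Different sequences require different $N(\varepsilon)$; the definition only demands that some such $N(\varepsilon)$ exists for every $\varepsilon > 0$.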
Definition 2
A function $h(x)$ is said to converge to a limit $l$ as $x \to x_0$ if for every $\varepsilon > 0$, however small, there exists a number $\delta(\varepsilon) > 0$ such that
$$|h(x) - l| < \varepsilon$$
holds for every $x$ satisfying the condition $0 < |x - x_0| < \delta(\varepsilon)$.
Example 2
For $h(x) = e^x$, $\lim_{x\to-\infty} h(x) = 0$, and for the polynomial function $h(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n$, $\lim_{x\to 0} h(x) = a_n$.

Note that the condition $0 < |x - x_0| < \delta(\varepsilon)$ excludes the point $x = x_0$ in the above definition, and thus for $h(x) = (x^2 - 9)/(x - 3)$, $\lim_{x\to 3} h(x) = 6$, even though $h(x)$ is not defined at $x = 3$.
Definition 3

A function $h(x)$, defined over some interval $D(h) \subseteq \mathbb{R}$ with $x_0 \in D(h)$, is said to be continuous at the point $x_0$ if for each $\varepsilon > 0$ there exists a $\delta(\varepsilon) > 0$ such that
$$|h(x) - h(x_0)| < \varepsilon$$
for every $x$ satisfying the restriction $|x - x_0| < \delta(\varepsilon)$. We denote this by $\lim_{x\to x_0} h(x) = h(x_0)$. A function $h(x)$ is said to be continuous if it is continuous at every point of its domain $D(h)$.
Example 3
The functions $h(x) = ax + b$ and $h(x) = e^x$ are continuous for all $x \in \mathbb{R}$ (verify!).
Definition 4

A sequence of functions $\{h_n(x), n \in \mathbb{N}\}$ defined on a set $A \subseteq \mathbb{R}$ is said to converge to a function $h(x)$ on $A$ if for every $\varepsilon > 0$ and each $x \in A$ there exists a number $N(\varepsilon, x)$ such that
$$|h_n(x) - h(x)| < \varepsilon \quad \text{holds for all } n > N(\varepsilon, x),\ x \in A.$$
Example 4
For
$$h_n(x) = \sum_{k=0}^{n} \frac{x^k}{k!}, \qquad \lim_{n\to\infty} h_n(x) = e^x \quad \text{for all } x \in \mathbb{R}.$$
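A numerical sketch of this example (the tolerances are illustrative): the partial sums approach $e^x$ for every fixed $x$, but the number of terms needed for a given accuracy grows with $|x|$, i.e. $N(\varepsilon, x)$ genuinely depends on $x$.

```python
import math

# Partial sums h_n(x) = sum_{k=0}^{n} x^k / k! converge pointwise to e^x,
# but more slowly for larger |x|.

def h(n, x):
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

print(abs(h(30, 1.0) - math.exp(1.0)))   # essentially zero at x = 1
print(abs(h(10, 5.0) - math.exp(5.0)))   # still visibly off at x = 5 with n = 10
```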
In the case where $N(\varepsilon, x)$ does not depend on $x$ (only on $\varepsilon$), then $\{h_n(x), n \in \mathbb{N}\}$ is said to converge uniformly on $A$. The importance of uniform convergence stems from the fact that if each $h_n(x)$ in the sequence is continuous and $h_n(x)$ converges uniformly to $h(x)$ on $D$, then the limit $h(x)$ is also continuous.
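A standard illustration (not from the text) of pointwise but non-uniform convergence is $h_n(x) = x^n$ on $[0, 1)$: it converges pointwise to $0$, yet the supremum of $|h_n(x)|$ over $[0, 1)$ equals $1$ for every $n$, so no $x$-free $N(\varepsilon)$ exists.

```python
# h_n(x) = x**n on [0, 1): pointwise limit is 0, but the sup of |h_n| over
# [0, 1) stays at 1 for every n, so the convergence is not uniform.

def h_n(n, x):
    return x ** n

def approx_sup(n, grid=1000):
    # approximate sup over [0, 1) on a finite grid of points i/grid
    return max(h_n(n, i / grid) for i in range(grid))

print(h_n(50, 0.5))      # tiny: pointwise convergence at a fixed x < 1
print(approx_sup(50))    # near 1: the sup does not shrink with n
```

This also shows why uniform convergence is needed to preserve continuity: each $h_n$ is continuous on $[0, 1]$, yet the pointwise limit jumps at $x = 1$.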
With the above notions of continuity and limit in mathematical analysis in mind, let us consider the question of convergence in the context of the probability spaces $(S, \mathcal{F}, P(\cdot))$ and $(\mathbb{R}, \mathcal{B}, P_X(\cdot))$. Given that a random variable $X(\cdot)$ is a function from $S$ to $\mathbb{R}$, we can define pointwise and uniform convergence on $S$ for the sequence $\{X_n(s), n \in \mathbb{N}\}$ by
$$|X_n(s) - X(s)| < \varepsilon \quad \text{for } n > N(\varepsilon, s),\ s \in S, \qquad (10.10)$$
and
$$|X_n(s) - X(s)| < \varepsilon \quad \text{for } n > N(\varepsilon),\ s \in S, \qquad (10.11)$$
respectively. These notions of convergence are of little interest because the probabilistic structure of $\{X_n(s), n \in \mathbb{N}\}$ is ignored. Although the probability set functions $P(\cdot)$ and $P_X(\cdot)$ do not come into the definition of a
random variable, they play a crucial role in its behaviour. If we take its probabilistic structure into consideration, both of the above forms of convergence are much too strong, because they imply that for $n > N$
$$|X_n(s) - X(s)| < \varepsilon \quad \text{whatever the outcome } s \in S. \qquad (10.12)$$
The form of probabilistic convergence closest to this is almost sure convergence, which allows for convergence of $X_n(s)$ to $X(s)$ for all $s$ except for some $s$-set $A \subset S$ for which $P(A) = 0$; $A$ is said to be a set of probability zero. The term almost sure is used to emphasise the convergence on $S - A$, not the whole of $S$.
Definition 5
A sequence of r.v.'s $\{X_n(s), n \in \mathbb{N}\}$ is said to converge almost surely (a.s.) to a r.v. $X(s)$, denoted by $X_n \xrightarrow{a.s.} X$, if
$$\Pr\bigl(s: \lim_{n\to\infty} X_n(s) = X(s)\bigr) = 1. \qquad (10.13)$$
An equivalent way of defining almost sure convergence is by
$$\lim_{n\to\infty} \Pr\bigl(s: |X_m(s) - X(s)| < \varepsilon, \text{ for all } m > n\bigr) = 1. \qquad (10.14)$$
(see Chung (1974)). Almost sure convergence is the mode of convergence associated with the strong law of large numbers (SLLN).
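A simulation sketch of the SLLN behind almost sure convergence (the distribution, sample size and seed are illustrative choices, not from the text): along almost every realised path, the running sample mean of i.i.d. Uniform(0, 1) draws converges to the common mean 0.5.

```python
import random

# One simulated path of the running sample mean of i.i.d. Uniform(0, 1) draws;
# by the SLLN it converges to 0.5 for almost every path.

random.seed(0)
n = 100_000
running_total = 0.0
for _ in range(n):
    running_total += random.random()
sample_mean = running_total / n
print(sample_mean)  # close to 0.5 on this path
```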
Another mode of convergence not considered in relation to the limit theorems (see Chapter 9) is that of convergence in rth mean
Definition 6
Let $\{X_n(s), n \in \mathbb{N}\}$ be a sequence of r.v.'s such that $E(|X_n|^r) < \infty$ for all $n \in \mathbb{N}$ and $E(|X|^r) < \infty$ for $r > 0$; then the sequence is said to converge in rth mean to $X$, denoted by $X_n \xrightarrow{r} X$, if
$$\lim_{n\to\infty} E\bigl(|X_n - X|^r\bigr) = 0. \qquad (10.15)$$
Definition 7
A sequence of r.v.'s $\{X_n(s), n \in \mathbb{N}\}$ is said to converge in probability to $X(s)$, denoted by $X_n \xrightarrow{P} X$, if for every $\varepsilon > 0$
$$\lim_{n\to\infty} \Pr\bigl(s: |X_n(s) - X(s)| < \varepsilon\bigr) = 1. \qquad (10.16)$$
Definition 8

A sequence of r.v.'s $\{X_n(s), n \in \mathbb{N}\}$ with distribution functions $\{F_n(x), n \in \mathbb{N}\}$ is said to converge in distribution to $X(s)$, denoted by $X_n \xrightarrow{D} X$, if
$$\lim_{n\to\infty} F_n(x) = F(x) \qquad (10.17)$$
at every continuity point $x$ of $F(x)$.
This is nothing more than the pointwise convergence of a sequence of functions considered above. In the case where the convergence is also uniform, $F(x)$ is continuous, and vice versa. It is important, however, to note that $F(x)$ in (10.17) might not be a proper distribution function (see Chapter 4).
Without any further restrictions on the sequence of r.v.'s $\{X_n(s), n \in \mathbb{N}\}$, the above four modes of convergence are related as shown in Fig. 10.1. As we can see, convergence in distribution is the weakest mode of convergence, being implied by all three other modes. Moreover, almost sure and rth mean convergence are not directly related, but they both imply convergence in probability. In order to be able to relate almost sure and rth mean convergence we need to impose some more restrictions on the sequence $\{X_n(s), n \in \mathbb{N}\}$, such as the existence of moments up to order $r$.
convergence than (10.16), which holds for all $n$. The implication $\xrightarrow{r} \Rightarrow \xrightarrow{P}$ is based on the inequality
$$\Pr\bigl(|X_n - X| \ge \varepsilon\bigr) \le \varepsilon^{-r} E\bigl(|X_n - X|^r\bigr),$$
which implies that $\Pr(|X_n - X| > \varepsilon) \le \varepsilon^{-r} E(|X_n - X|^r) \to 0$ as $n \to \infty$.
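An empirical sketch of the inequality underlying this implication (a Markov-type bound; the sample and choices of $\varepsilon$, $r$ are illustrative): for any r.v. $Z$, $\Pr(|Z| \ge \varepsilon) \le \varepsilon^{-r} E(|Z|^r)$. Applied to the empirical distribution of a sample, the bound holds exactly, whatever the sample.

```python
import random

# Check Pr(|Z| >= eps) <= E(|Z|^r) / eps^r on the empirical distribution
# of a simulated sample; the bound is an identity-level consequence of
# E(|Z|^r) >= eps^r * Pr(|Z| >= eps).

random.seed(1)
draws = [random.gauss(0.0, 1.0) for _ in range(20_000)]

def bound_holds(eps, r):
    lhs = sum(abs(z) >= eps for z in draws) / len(draws)
    rhs = (sum(abs(z) ** r for z in draws) / len(draws)) / eps ** r
    return lhs <= rhs

print(bound_holds(1.0, 2), bound_holds(2.0, 4))
```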
In order to go from convergence in probability or almost sure convergence to rth mean convergence, we need to ensure that the sequence of r.v.'s $\{X_n, n \in \mathbb{N}\}$ is bounded and the moments up to order $r$ exist. In particular, if
$$E\bigl(|X_n|^r\bigr) < \infty \quad \text{then} \quad E\bigl(|X_n|^l\bigr) < \infty \quad \text{for } 0 < l \le r. \qquad (10.22)$$
That is, if the rth moment exists (is bounded), then all the moments of order less than $r$ also exist. This is the reason why, when we assume that $\text{Var}(X_n) < \infty$, we do not need to add that $E(X_n) < \infty$, given that it is always implied.
In applying asymptotic theory we often need to extend the above convergence results to transformed sequences of random vectors $\{g(\mathbf{X}_n), n \in \mathbb{N}\}$. The above convergence results are said to hold for a random vector sequence $\{\mathbf{X}_n, n \in \mathbb{N}\}$ if they hold for each component $X_{in}$, $i = 1, 2, \ldots, k$,
of $g(\cdot)$; see Mann and Wald (1943). Borel functions have a distinct advantage over continuous functions in the present context because the limits of such functions are commonly Borel functions themselves, without requiring uniform convergence. Continuous functions are Borel functions, but not vice versa. In order to get some idea about the generality of Borel functions, note that if $h$ and $g$ are Borel functions then the following are also Borel functions: (i) $ah + bg$, $a, b \in \mathbb{R}$; (ii) $|h|$; (iii) $\max(h, g)$; (iv) $\min(h, g)$; (v) …
If $F_n(x)$ has a continuous derivative (as in the case of a continuous r.v.), then $dF_n(x)$ is equivalent to the differential $f_n(x)\,dx$, where $f_n(x) = dF_n(x)/dx$ is the corresponding density function.
The limit of the rth moment, $E(X_n^r)$, is defined by
$$\lim_{n\to\infty} E(X_n^r),$$
and it refers to the ordinary mathematical limit of the sequence $\{E(X_n^r), n \ge 1\}$. This limit is by no means equivalent to the asymptotic moments of $X_n$, defined by
$$E_\infty(X_n^r) = \int x^r\, dF(x),$$
i.e. the moments
of its asymptotic distribution $F(x)$ and not its finite sample distribution $F_n(x)$. In view of the fact that $F_n(x)$ might have moments up to order $m$ and $F(x)$ might not (or vice versa), there is no reason why $E(X_n^r)$, $\lim_{n\to\infty} E(X_n^r)$ and $E_\infty(X_n^r)$ should be equal for all $r \le m$ and all $n$. Indeed, we can show that the limit inferior of $E(|X_n|^r)$ for some $r \ge 1$ provides upper bounds for the corresponding asymptotic moments. In particular:
Lemma 10.6

If $X_n \xrightarrow{r} X$ and $E(|X|^r) < \infty$, then $\lim_{n\to\infty} E(X_n^r) = E(X^r)$.
Lemma 10.7

If $X_n \xrightarrow{P} X$, $E(|X|^r) < \infty$ and $\{X_n^r, n \ge 1\}$ is uniformly integrable, then $\lim_{n\to\infty} E(X_n^r) = E(X^r)$.
Lemma 10.8
If X,— X and lim, ,, inf E(\X,|")< EX) then
lim, , E(X")= E(X") nya
(For these lemmas see Serfling (1980).) Looking at these results, we can see that the important condition for the equality of the limit of the rth moment and the rth asymptotic moment is the uniform integrability of $\{X_n^r, n \ge 1\}$, which allows us to interchange limits with expectations.
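A classic textbook-style illustration (not from this text) of why uniform integrability is needed: let $X_n = n$ with probability $1/n$ and $0$ otherwise. Then $X_n \xrightarrow{P} 0$, so the asymptotic mean is $0$, yet $E(X_n) = 1$ for every $n$.

```python
# X_n = n with probability 1/n, 0 otherwise: X_n -> 0 in probability
# (Pr(X_n != 0) = 1/n -> 0), but E(X_n) = n * (1/n) = 1 for all n, so the
# limit of the means is 1, not the asymptotic mean 0. The family {X_n} fails
# uniform integrability, so limits and expectations cannot be interchanged.

def prob_nonzero(n):
    return 1.0 / n          # Pr(X_n = n); tends to 0, giving X_n ->P 0

def mean(n):
    return n * (1.0 / n)    # E(X_n) = 1 for every n

print([prob_nonzero(n) for n in (10, 100, 1000)])  # shrinking
print([mean(n) for n in (10, 100, 1000)])          # constant at 1
```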
Beyond the distinction between moments, limits of moments and asymptotic moments, we sometimes encounter the concept of approximate moments.

Consider the Taylor series expansion of $g(m_r)$, where $m_r = (1/n)\sum_{i=1}^{n} X_i^r$.
This expansion is often used to derive approximate moments for $g(m_r)$. Under certain regularity conditions (see Sargan (1974)),
$$\text{Var}(g(m_r)) \approx \bigl[g^{(1)}(\mu_r)\bigr]^2 \,\text{Var}(m_r), \qquad (10.33)$$
$$E\bigl[g(m_r) - E(g(m_r))\bigr]^3 \approx \bigl[g^{(1)}(\mu_r)\bigr]^3 E\bigl[(m_r - \mu_r)^3\bigr] + \cdots,$$
where '≈' reads 'approximately equal'. These moments are viewed as moments of a statistic purporting to approximate $g(m_r)$, and under certain conditions can be treated as approximations to the moments of $g(m_r)$ (see Sargan (1974)). Such approximations must be distinguished from $\lim_{n\to\infty} E(X_n^r)$ as well as $E_\infty(X_n^r)$. The approximate moments derived above can be very useful in choosing the functions $g(\cdot)$ so as to make the asymptotic results more accurate, in the context of variance-stabilising transformations and asymptotic expansions (see Rothenberg (1984)). In deriving the asymptotic distribution of $g(m_r)$ only the first two moments are utilised, and one can improve upon the normal approximation by utilising the above approximate higher moments in the context of asymptotic expansions. A brief introduction to asymptotic expansions is given in Section 10.6.
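A simulation sketch of the approximate-variance formula (10.33). The particular choices here, $g(x) = \log x$ and $m_n$ the sample mean of i.i.d. Exponential(1) draws (so $\mu = 1$ and $g^{(1)}(\mu) = 1$), are illustrative assumptions, not from the text.

```python
import math
import random

# Compare the simulated variance of g(m_n) = log(m_n) against the
# approximation [g'(mu)]^2 * Var(m_n), with m_n the mean of n Exp(1) draws.

random.seed(2)
n, reps = 200, 2000
means, transformed = [], []
for _ in range(reps):
    m = sum(random.expovariate(1.0) for _ in range(n)) / n
    means.append(m)
    transformed.append(math.log(m))

def sample_var(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)

exact = sample_var(transformed)        # simulated Var(g(m_n))
approx = (1.0) ** 2 * sample_var(means)  # [g'(1)]^2 * Var(m_n), since g'(1) = 1
print(exact, approx)                   # the two should be close for large n
```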
10.4 The 'big O' and 'little o' notation
As argued above, the essence of asymptotic theory is approximation: approximation of Borel functions, random variables, distribution functions, means, variances and higher moments (see Section 10.5). A particularly useful notion in the context of any approximation theory is that of the accuracy, or order of magnitude, of the approximations. In mathematical analysis, the order of magnitude of the various quantities involved in an approximation is 'kept track of' by the use of the 'big O, little o' notation. It turns out that this notation can be extended to probabilistic approximations with minor modifications. The purpose of this section is to review the O, o notation and consider its extension to asymptotic theory.
Let $\{a_n, b_n, n \in \mathbb{N}\}$ be a double sequence of real numbers.
$$(n + 1) = O(n) = o(n^2); \quad \exp\{-n\} = o(n^{-\delta}),\ \delta > 0; \quad \left(\tfrac{1}{n}\right) = O(n^{-1}); \quad \log n = o(n^a),\ a > 0; \quad (6n^2 + 3n) = o(n^3) = O(n^2).$$
A very important implication stemming from these examples is that if $a_n = O(n^a)$ then $a_n = o(n^{a+\delta})$, $\delta > 0$.
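A numerical sketch of two of the order-of-magnitude claims above (the grid of $n$ values is illustrative): $a_n = 6n^2 + 3n$ is $O(n^2)$, since $a_n/n^2$ stays bounded, and $o(n^3)$, since $a_n/n^3 \to 0$.

```python
# a_n = 6n^2 + 3n: the ratio a_n / n^2 stays bounded (tends to 6), so
# a_n = O(n^2); the ratio a_n / n^3 tends to 0, so a_n = o(n^3).

def a(n):
    return 6 * n**2 + 3 * n

ratios_O = [a(n) / n**2 for n in (10, 100, 1000, 10_000)]
ratios_o = [a(n) / n**3 for n in (10, 100, 1000, 10_000)]
print(ratios_O)  # bounded, approaching 6
print(ratios_o)  # shrinking toward 0
```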
The O, o notation satisfies the following properties:

(P1) If $a_n = O(c_n)$ and $b_n = O(d_n)$, then $a_n b_n = O(c_n d_n)$ and $(a_n + b_n) = O(\max(c_n, d_n))$.
Analogous definitions can be given for functions $h(\cdot)$ and $g(\cdot)$ with common domain $D \subseteq \mathbb{R}$. We say $h(x) = O(g(x))$ as $x \to x_0$ if for some constant $K > 0$,
$$\lim_{x\to x_0} \left|\frac{h(x)}{g(x)}\right| \le K, \quad x \in (D - x_0).$$
Moreover, we say that $h(x) = o(g(x))$ if
$$\lim_{x\to x_0} \left(\frac{h(x)}{g(x)}\right) = 0, \quad x \in (D - x_0).$$
In the case where
$$h(x) - g(x) = O(l(x)) \quad \text{we write} \quad h(x) = g(x) + O(l(x)),$$
and for
$$h(x) - g(x) = o(l(x)) \quad \text{we write} \quad h(x) = g(x) + o(l(x)).$$
This notation is particularly useful in the case of the Taylor expansion, where we can show that if $h(x)$ is differentiable of order $n$ (i.e. the derivatives $d^j h/dx^j = h^{(j)}$, $j = 1, 2, \ldots, n$, exist for some positive integer $n$) at $x = x_0$, then
$$h(x_0 + \delta) = h(x_0) + \sum_{j=1}^{n} \frac{h^{(j)}(x_0)}{j!}\,\delta^j + o(\delta^n) \quad \text{as } \delta \to 0. \qquad (10.37)$$
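A numerical sketch of the $o(\delta^n)$ remainder in (10.37), for the illustrative choice $h(x) = e^x$, $x_0 = 0$, $n = 2$: the remainder $R(\delta) = e^\delta - (1 + \delta + \delta^2/2)$ should satisfy $R(\delta)/\delta^2 \to 0$ as $\delta \to 0$.

```python
import math

# The second-order Taylor remainder of exp at 0, divided by delta^2,
# shrinks as delta -> 0, confirming R(delta) = o(delta^2).

def remainder_ratio(delta):
    taylor2 = 1 + delta + delta**2 / 2
    return (math.exp(delta) - taylor2) / delta**2

for d in (0.1, 0.01, 0.001):
    print(d, remainder_ratio(d))  # ratios shrink roughly like delta / 6
```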
The O, o notation considered above can be extended to the case of stochastic convergence: convergence almost surely and in probability.
Definition 11
Let $\{X_n, n \in \mathbb{N}\}$ be a sequence of r.v.'s and $\{c_n, n \in \mathbb{N}\}$ a sequence of positive real numbers. We say that

(i) $X_n$ is at most of order $c_n$ in probability if there exists