from the distribution of X. In Chapter 6 we saw that this is by no means a trivial problem, even for the simplest functions h(·). Indeed, the results in this area are almost exclusively related to simple functions of normally distributed r.v.'s, most of which were derived in Chapter 6. For more complicated functions, even in the case of normality, very few results are available. Given, however, that statistical inference depends crucially on being able to determine the distribution of such functions h(X), we need to tackle the problem somehow. Intuition suggests that the limit theorems discussed in Chapter 9, when extended, might enable us to derive approximate solutions to the distribution problem.
The limit theorems considered in Chapter 9 tell us that, under certain conditions which ensure that no one r.v. in a sequence $\{X_n, n \ge 1\}$ dominates the behaviour of the sum $\sum_{i=1}^{n} X_i$, we can deduce that:
184 Introduction to asymptotic theory

$$\frac{1}{n}\sum_{i=1}^{n}\bigl(X_i - E(X_i)\bigr) \xrightarrow{D} 0.$$
In order to be able to extend these results to arbitrary Borel functions h(X), not just $\sum_{i=1}^{n} X_i$, we first need to extend the various modes of convergence (convergence in probability, almost sure convergence, convergence in distribution) to apply to any sequence of r.v.'s $\{X_n, n \ge 1\}$.
The various modes of convergence related to the above limit theorems are considered in Section 10.2. The main purpose of this section is to relate the various mathematical notions of convergence to the probabilistic convergence modes needed in asymptotic theory. One important mode of convergence not encountered in the context of the limit theorems is 'convergence in the rth mean', which refers to convergence of moments. Section 10.3 discusses various concepts related to the convergence of moments, such as asymptotic moments, limits of moments and probability limits, in an attempt to distinguish between these concepts, which are often confused in asymptotic theory. In Chapter 9 it was stressed that an important ingredient underlying the conditions giving rise to the various limit theorems is the notion of the 'order of magnitude'; an example is the Markov condition needed for the WLLN.
Asymptotic theory results are resorted to by necessity, when finite sample results are not available in a usable form. This is because asymptotic results provide only approximations. 'How good the approximations are' is commonly unknown, because we could answer such a question only when the finite sample result is available; but in that case the asymptotic result is not needed! There are, however, various 'rough' error bounds which can shed some light on the magnitude of the approximation error. Moreover, it is often possible to 'improve' upon the asymptotic results using what we call
asymptotic expansions, such as the Edgeworth expansion. The purpose of Section 10.6 is to introduce the reader to this important literature on error bounds and asymptotic expansions. The discussion is only introductory and much more intuitive than formal, in an attempt to demystify this literature, which plays an important role in econometrics. For a more complete and formal discussion see Phillips (1980), Rothenberg (1984), inter alia.
10.2 Modes of convergence
The notions of 'limit' and 'convergence' play a very important role in probability theory, not only because of the limit theorems discussed in Chapter 9 but also because they underlie some of the most fundamental concepts, such as probability and distribution functions, density functions, the mean, the variance, as well as higher moments. This was not made explicit in Chapters 3-7 because of the mathematical subtleties involved.
In order to understand the various modes of convergence in probability theory, let us begin by reminding ourselves of the notion of convergence in mathematical analysis. A sequence $\{a_n, n \in \mathbb{N}\}$ is defined to be a function from the natural numbers $\mathbb{N} = \{1, 2, 3, \ldots\}$ to the real line $\mathbb{R}$.
Definition 1
A sequence $\{a_n, n \in \mathbb{N}\}$ is said to converge to a limit $a$ if for every arbitrarily small number $\varepsilon > 0$ there corresponds a number $N(\varepsilon)$ such that the inequality $|a_n - a| < \varepsilon$ holds for all terms $a_n$ of the sequence with $n > N(\varepsilon)$; we denote this by $\lim_{n\to\infty} a_n = a$.
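As a concrete numerical sketch (illustrative, not from the text), Definition 1 can be checked for the sequence $a_n = 1/n$ with limit $a = 0$: since $|1/n - 0| < \varepsilon$ whenever $n > 1/\varepsilon$, one valid choice is $N(\varepsilon) = \lfloor 1/\varepsilon \rfloor$.

```python
import math

# For a_n = 1/n with limit a = 0, N(eps) = floor(1/eps) is one valid choice,
# because |1/n - 0| < eps holds for every n > 1/eps.

def a(n):
    return 1.0 / n

def n_of_eps(eps):
    return math.floor(1 / eps)   # a valid N(eps) for this particular sequence

def check(eps, horizon=100_000):
    # verify |a_n - 0| < eps for all n > N(eps) up to a finite horizon
    N = n_of_eps(eps)
    return all(abs(a(n) - 0.0) < eps for n in range(N + 1, horizon))

print(check(0.25), check(0.01))  # the definition holds for each tolerance tried
```

Different sequences require different $N(\varepsilon)$; the definition only demands that some such $N(\varepsilon)$ exists for every $\varepsilon > 0$.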
Definition 2
A function $h(x)$ is said to converge to a limit $l$ as $x \to x_0$ if for every $\varepsilon > 0$, however small, there exists a number $\delta(\varepsilon) > 0$ such that
$$|h(x) - l| < \varepsilon$$
holds for every $x$ satisfying the condition $0 < |x - x_0| < \delta(\varepsilon)$.
Example 2
For $h(x) = e^x$, $\lim_{x\to-\infty} h(x) = 0$, and for the polynomial function $h(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n$, $\lim_{x\to 0} h(x) = a_n$.

Note that the condition $0 < |x - x_0| < \delta(\varepsilon)$ excludes the point $x = x_0$ in the above definition, and thus for $h(x) = (x^2 - 9)/(x - 3)$, $\lim_{x\to 3} h(x) = 6$, even though $h(x)$ is not defined at $x = 3$.
Definition 3

A function $h(x)$, defined over some interval $D(h) \subseteq \mathbb{R}$ with $x_0 \in D(h)$, is said to be continuous at the point $x_0$ if for each $\varepsilon > 0$ there exists a $\delta(\varepsilon) > 0$ such that
$$|h(x) - h(x_0)| < \varepsilon$$
for every $x$ satisfying the restriction $|x - x_0| < \delta(\varepsilon)$. We denote this by $\lim_{x\to x_0} h(x) = h(x_0)$. A function $h(x)$ is said to be continuous if it is continuous at every point of its domain $D(h)$.
Example 3
The functions $h(x) = ax + b$ and $h(x) = e^x$ are continuous for all $x \in \mathbb{R}$ (verify!).
Definition 4

A sequence of functions $\{h_n(x), n \in \mathbb{N}\}$ defined on a set $A \subseteq \mathbb{R}$ is said to converge to a function $h(x)$ on $A$ if for every $\varepsilon > 0$ and each $x \in A$ there exists a number $N(\varepsilon, x)$ such that
$$|h_n(x) - h(x)| < \varepsilon \quad \text{holds for all } n > N(\varepsilon, x),\ x \in A.$$
Example 4
For
$$h_n(x) = \sum_{k=0}^{n} \frac{x^k}{k!}, \qquad \lim_{n\to\infty} h_n(x) = e^x \quad \text{for all } x \in \mathbb{R}.$$
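A numerical sketch of this example (the tolerances are illustrative): the partial sums approach $e^x$ for every fixed $x$, but the number of terms needed for a given accuracy grows with $|x|$, i.e. $N(\varepsilon, x)$ genuinely depends on $x$.

```python
import math

# Partial sums h_n(x) = sum_{k=0}^{n} x^k / k! converge pointwise to e^x,
# but more slowly for larger |x|.

def h(n, x):
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

print(abs(h(30, 1.0) - math.exp(1.0)))   # essentially zero at x = 1
print(abs(h(10, 5.0) - math.exp(5.0)))   # still visibly off at x = 5 with n = 10
```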
In the case where $N(\varepsilon, x)$ does not depend on $x$ (only on $\varepsilon$), then $\{h_n(x), n \in \mathbb{N}\}$ is said to converge uniformly on $A$. The importance of uniform convergence stems from the fact that if each $h_n(x)$ in the sequence is continuous and $h_n(x)$ converges uniformly to $h(x)$ on $D$, then the limit $h(x)$ is also continuous.
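A standard illustration (not from the text) of pointwise but non-uniform convergence is $h_n(x) = x^n$ on $[0, 1)$: it converges pointwise to $0$, yet the supremum of $|h_n(x)|$ over $[0, 1)$ equals $1$ for every $n$, so no $x$-free $N(\varepsilon)$ exists.

```python
# h_n(x) = x**n on [0, 1): pointwise limit is 0, but the sup of |h_n| over
# [0, 1) stays at 1 for every n, so the convergence is not uniform.

def h_n(n, x):
    return x ** n

def approx_sup(n, grid=1000):
    # approximate sup over [0, 1) on a finite grid of points i/grid
    return max(h_n(n, i / grid) for i in range(grid))

print(h_n(50, 0.5))      # tiny: pointwise convergence at a fixed x < 1
print(approx_sup(50))    # near 1: the sup does not shrink with n
```

This also shows why uniform convergence is needed to preserve continuity: each $h_n$ is continuous on $[0, 1]$, yet the pointwise limit jumps at $x = 1$.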
With the above notions of continuity and limit in mathematical analysis in mind, let us consider the question of convergence in the context of the probability spaces $(S, \mathcal{F}, P(\cdot))$ and $(\mathbb{R}, \mathcal{B}, P_X(\cdot))$. Given that a random variable $X(\cdot)$ is a function from $S$ to $\mathbb{R}$, we can define pointwise and uniform convergence on $S$ for the sequence $\{X_n(s), n \in \mathbb{N}\}$ by
$$|X_n(s) - X(s)| < \varepsilon \quad \text{for } n > N(\varepsilon, s),\ s \in S, \qquad (10.10)$$
and
$$|X_n(s) - X(s)| < \varepsilon \quad \text{for } n > N(\varepsilon),\ s \in S, \qquad (10.11)$$
respectively. These notions of convergence are of little interest because the probabilistic structure of $\{X_n(s), n \in \mathbb{N}\}$ is ignored. Although the probability set functions $P(\cdot)$ and $P_X(\cdot)$ do not come into the definition of a
random variable, they play a crucial role in its behaviour. If we take its probabilistic structure into consideration, both of the above forms of convergence are much too strong, because they imply that for $n > N$
$$|X_n(s) - X(s)| < \varepsilon \quad \text{whatever the outcome } s \in S. \qquad (10.12)$$
The form of probabilistic convergence closest to this is almost sure convergence, which allows for convergence of $X_n(s)$ to $X(s)$ for all $s$ except for some $s$-set $A \subset S$ for which $P(A) = 0$; $A$ is said to be a set of probability zero. The term almost sure is used to emphasise the convergence on $S - A$, not the whole of $S$.
Definition 5
A sequence of r.v.'s $\{X_n(s), n \in \mathbb{N}\}$ is said to converge almost surely (a.s.) to a r.v. $X(s)$, denoted by $X_n \xrightarrow{a.s.} X$, if
$$\Pr\bigl(s: \lim_{n\to\infty} X_n(s) = X(s)\bigr) = 1. \qquad (10.13)$$
An equivalent way of defining almost sure convergence is by
$$\lim_{n\to\infty} \Pr\bigl(s: |X_m(s) - X(s)| < \varepsilon, \text{ for all } m > n\bigr) = 1. \qquad (10.14)$$
(see Chung (1974)). Almost sure convergence is the mode of convergence associated with the strong law of large numbers (SLLN).
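A simulation sketch of the SLLN behind almost sure convergence (the distribution, sample size and seed are illustrative choices, not from the text): along almost every realised path, the running sample mean of i.i.d. Uniform(0, 1) draws converges to the common mean 0.5.

```python
import random

# One simulated path of the running sample mean of i.i.d. Uniform(0, 1) draws;
# by the SLLN it converges to 0.5 for almost every path.

random.seed(0)
n = 100_000
running_total = 0.0
for _ in range(n):
    running_total += random.random()
sample_mean = running_total / n
print(sample_mean)  # close to 0.5 on this path
```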
Another mode of convergence not considered in relation to the limit theorems (see Chapter 9) is that of convergence in rth mean
Definition 6
Let $\{X_n(s), n \in \mathbb{N}\}$ be a sequence of r.v.'s such that $E(|X_n|^r) < \infty$ for all $n \in \mathbb{N}$ and $E(|X|^r) < \infty$ for $r > 0$; then the sequence is said to converge in rth mean to $X$, denoted by $X_n \xrightarrow{r} X$, if
$$\lim_{n\to\infty} E\bigl(|X_n - X|^r\bigr) = 0. \qquad (10.15)$$
Definition 7
A sequence of r.v.'s $\{X_n(s), n \in \mathbb{N}\}$ is said to converge in probability to $X(s)$, denoted by $X_n \xrightarrow{P} X$, if for every $\varepsilon > 0$
$$\lim_{n\to\infty} \Pr\bigl(s: |X_n(s) - X(s)| < \varepsilon\bigr) = 1. \qquad (10.16)$$
Definition 8

A sequence of r.v.'s $\{X_n(s), n \in \mathbb{N}\}$ with distribution functions $\{F_n(x), n \in \mathbb{N}\}$ is said to converge in distribution to $X(s)$, denoted by $X_n \xrightarrow{D} X$, if
$$\lim_{n\to\infty} F_n(x) = F(x) \qquad (10.17)$$
at every continuity point $x$ of $F(x)$.
This is nothing more than the pointwise convergence of a sequence of functions considered above. In the case where the convergence is also uniform, $F(x)$ is continuous, and vice versa. It is important, however, to note that $F(x)$ in (10.17) might not be a proper distribution function (see Chapter 4).
Without any further restrictions on the sequence of r.v.'s $\{X_n(s), n \in \mathbb{N}\}$, the above four modes of convergence are related as shown in Fig. 10.1. As we can see, convergence in distribution is the weakest mode of convergence, being implied by all three other modes. Moreover, almost sure and rth mean convergence are not directly related, but they both imply convergence in probability. In order to be able to relate almost sure and rth mean convergence we need to impose some more restrictions on the sequence $\{X_n(s), n \in \mathbb{N}\}$, such as the existence of moments up to order $r$.
convergence than (10.16), which holds for all $n$. The implication $\xrightarrow{r} \Rightarrow \xrightarrow{P}$ is based on the inequality
$$\Pr\bigl(|X_n - X| \ge \varepsilon\bigr) \le \varepsilon^{-r} E\bigl(|X_n - X|^r\bigr),$$
which implies that $\Pr(|X_n - X| > \varepsilon) \le \varepsilon^{-r} E(|X_n - X|^r) \to 0$ as $n \to \infty$.
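An empirical sketch of the inequality underlying this implication (a Markov-type bound; the sample and choices of $\varepsilon$, $r$ are illustrative): for any r.v. $Z$, $\Pr(|Z| \ge \varepsilon) \le \varepsilon^{-r} E(|Z|^r)$. Applied to the empirical distribution of a sample, the bound holds exactly, whatever the sample.

```python
import random

# Check Pr(|Z| >= eps) <= E(|Z|^r) / eps^r on the empirical distribution
# of a simulated sample; the bound is an identity-level consequence of
# E(|Z|^r) >= eps^r * Pr(|Z| >= eps).

random.seed(1)
draws = [random.gauss(0.0, 1.0) for _ in range(20_000)]

def bound_holds(eps, r):
    lhs = sum(abs(z) >= eps for z in draws) / len(draws)
    rhs = (sum(abs(z) ** r for z in draws) / len(draws)) / eps ** r
    return lhs <= rhs

print(bound_holds(1.0, 2), bound_holds(2.0, 4))
```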
In order to go from convergence in probability or almost sure convergence to rth mean convergence, we need to ensure that the sequence of r.v.'s $\{X_n, n \in \mathbb{N}\}$ is bounded and the moments up to order $r$ exist. In particular, if
$$E\bigl(|X_n|^r\bigr) < \infty \quad \text{then} \quad E\bigl(|X_n|^l\bigr) < \infty \quad \text{for } 0 < l \le r. \qquad (10.22)$$
That is, if the rth moment exists (is bounded), then all the moments of order less than $r$ also exist. This is the reason why, when we assume that $\text{Var}(X_n) < \infty$, we do not need to add that $E(X_n) < \infty$, given that it is always implied.
In applying asymptotic theory we often need to extend the above convergence results to transformed sequences of random vectors $\{g(\mathbf{X}_n), n \in \mathbb{N}\}$. The above convergence results are said to hold for a random vector sequence $\{\mathbf{X}_n, n \in \mathbb{N}\}$ if they hold for each component $X_{in}$, $i = 1, 2, \ldots, k$,
of $g(\cdot)$; see Mann and Wald (1943). Borel functions have a distinct advantage over continuous functions in the present context because the limits of such functions are commonly Borel functions themselves, without requiring uniform convergence. Continuous functions are Borel functions, but not vice versa. In order to get some idea about the generality of Borel functions, note that if $h$ and $g$ are Borel functions then the following are also Borel functions: (i) $ah + bg$, $a, b \in \mathbb{R}$; (ii) $|h|$; (iii) $\max(h, g)$; (iv) $\min(h, g)$; (v) …
If $F_n(x)$ has a continuous derivative (as in the case of a continuous r.v.), then $dF_n(x)$ is equivalent to the differential $f_n(x)\,dx$, where $f_n(x) = dF_n(x)/dx$ is the corresponding density function.
The limit of the rth moment, $E(X_n^r)$, is defined by
$$\lim_{n\to\infty} E(X_n^r),$$
and it refers to the ordinary mathematical limit of the sequence $\{E(X_n^r), n \ge 1\}$. This limit is by no means equivalent to the asymptotic moments of $X_n$, defined by
$$E_\infty(X_n^r) = \int x^r\, dF(x),$$
i.e. the moments
of its asymptotic distribution $F(x)$ and not its finite sample distribution $F_n(x)$. In view of the fact that $F_n(x)$ might have moments up to order $m$ and $F(x)$ might not (or vice versa), there is no reason why $E(X_n^r)$, $\lim_{n\to\infty} E(X_n^r)$ and $E_\infty(X_n^r)$ should be equal for all $r \le m$ and all $n$. Indeed, we can show that the limit inferior of $E(|X_n|^r)$ for some $r \ge 1$ provides upper bounds for the corresponding asymptotic moments. In particular:
Lemma 10.6

If $X_n \xrightarrow{r} X$ and $E(|X|^r) < \infty$, then $\lim_{n\to\infty} E(X_n^r) = E(X^r)$.
Lemma 10.7

If $X_n \xrightarrow{P} X$, $E(|X|^r) < \infty$ and $\{X_n^r, n \ge 1\}$ is uniformly integrable, then $\lim_{n\to\infty} E(X_n^r) = E(X^r)$.
Lemma 10.8
If X,— X and lim, ,, inf E(\X,|")< EX) then
lim, , E(X")= E(X") nya
(For these lemmas see Serfling (1980).) Looking at these results, we can see that the important condition for the equality of the limit of the rth moment and the rth asymptotic moment is the uniform integrability of $\{X_n^r, n \ge 1\}$, which allows us to interchange limits with expectations.
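A classic textbook-style illustration (not from this text) of why uniform integrability is needed: let $X_n = n$ with probability $1/n$ and $0$ otherwise. Then $X_n \xrightarrow{P} 0$, so the asymptotic mean is $0$, yet $E(X_n) = 1$ for every $n$.

```python
# X_n = n with probability 1/n, 0 otherwise: X_n -> 0 in probability
# (Pr(X_n != 0) = 1/n -> 0), but E(X_n) = n * (1/n) = 1 for all n, so the
# limit of the means is 1, not the asymptotic mean 0. The family {X_n} fails
# uniform integrability, so limits and expectations cannot be interchanged.

def prob_nonzero(n):
    return 1.0 / n          # Pr(X_n = n); tends to 0, giving X_n ->P 0

def mean(n):
    return n * (1.0 / n)    # E(X_n) = 1 for every n

print([prob_nonzero(n) for n in (10, 100, 1000)])  # shrinking
print([mean(n) for n in (10, 100, 1000)])          # constant at 1
```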
Beyond the distinction between moments, limits of moments and asymptotic moments, we sometimes encounter the concept of approximate moments.

Consider the Taylor series expansion of $g(m_r)$, where $m_r = (1/n)\sum_{i=1}^{n} X_i^r$.
This expansion is often used to derive approximate moments for $g(m_r)$. Under certain regularity conditions (see Sargan (1974)),
$$\text{Var}(g(m_r)) \approx \bigl[g^{(1)}(\mu_r)\bigr]^2 \,\text{Var}(m_r), \qquad (10.33)$$
$$E\bigl[g(m_r) - E(g(m_r))\bigr]^3 \approx \bigl[g^{(1)}(\mu_r)\bigr]^3 E\bigl[(m_r - \mu_r)^3\bigr] + \cdots,$$
where '≈' reads 'approximately equal'. These moments are viewed as moments of a statistic purporting to approximate $g(m_r)$, and under certain conditions can be treated as approximations to the moments of $g(m_r)$ (see Sargan (1974)). Such approximations must be distinguished from $\lim_{n\to\infty} E(X_n^r)$ as well as $E_\infty(X_n^r)$. The approximate moments derived above can be very useful in choosing the functions $g(\cdot)$ so as to make the asymptotic results more accurate, in the context of variance-stabilising transformations and asymptotic expansions (see Rothenberg (1984)). In deriving the asymptotic distribution of $g(m_r)$ only the first two moments are utilised, and one can improve upon the normal approximation by utilising the above approximate higher moments in the context of asymptotic expansions. A brief introduction to asymptotic expansions is given in Section 10.6.
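A simulation sketch of the approximate-variance formula (10.33). The particular choices here, $g(x) = \log x$ and $m_n$ the sample mean of i.i.d. Exponential(1) draws (so $\mu = 1$ and $g^{(1)}(\mu) = 1$), are illustrative assumptions, not from the text.

```python
import math
import random

# Compare the simulated variance of g(m_n) = log(m_n) against the
# approximation [g'(mu)]^2 * Var(m_n), with m_n the mean of n Exp(1) draws.

random.seed(2)
n, reps = 200, 2000
means, transformed = [], []
for _ in range(reps):
    m = sum(random.expovariate(1.0) for _ in range(n)) / n
    means.append(m)
    transformed.append(math.log(m))

def sample_var(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)

exact = sample_var(transformed)        # simulated Var(g(m_n))
approx = (1.0) ** 2 * sample_var(means)  # [g'(1)]^2 * Var(m_n), since g'(1) = 1
print(exact, approx)                   # the two should be close for large n
```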
10.4 The 'big O' and 'little o' notation
As argued above, the essence of asymptotic theory is approximation: approximation of Borel functions, random variables, distribution functions, means, variances and higher moments (see Section 10.5). A particularly useful notion in the context of any approximation theory is that of the accuracy, or order of magnitude, of the approximations. In mathematical analysis, the order of magnitude of the various quantities involved in an approximation is 'kept track of' by the use of the 'big O, little o' notation. It turns out that this notation can be extended to probabilistic approximations with minor modifications. The purpose of this section is to review the O, o notation and consider its extension to asymptotic theory.
Let $\{a_n, b_n, n \in \mathbb{N}\}$ be a double sequence of real numbers.
$$(n + 1) = O(n) = o(n^2); \quad \exp\{-n\} = o(n^{-\delta}),\ \delta > 0; \quad \left(\tfrac{1}{n}\right) = O(n^{-1}); \quad \log n = o(n^a),\ a > 0; \quad (6n^2 + 3n) = o(n^3) = O(n^2).$$
A very important implication stemming from these examples is that if $a_n = O(n^a)$ then $a_n = o(n^{a+\delta})$, $\delta > 0$.
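A numerical sketch of two of the order-of-magnitude claims above (the grid of $n$ values is illustrative): $a_n = 6n^2 + 3n$ is $O(n^2)$, since $a_n/n^2$ stays bounded, and $o(n^3)$, since $a_n/n^3 \to 0$.

```python
# a_n = 6n^2 + 3n: the ratio a_n / n^2 stays bounded (tends to 6), so
# a_n = O(n^2); the ratio a_n / n^3 tends to 0, so a_n = o(n^3).

def a(n):
    return 6 * n**2 + 3 * n

ratios_O = [a(n) / n**2 for n in (10, 100, 1000, 10_000)]
ratios_o = [a(n) / n**3 for n in (10, 100, 1000, 10_000)]
print(ratios_O)  # bounded, approaching 6
print(ratios_o)  # shrinking toward 0
```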
The O, o notation satisfies the following properties:

(P1) If $a_n = O(c_n)$ and $b_n = O(d_n)$, then $a_n b_n = O(c_n d_n)$ and $(a_n + b_n) = O(\max(c_n, d_n))$.
Analogous definitions can be given for functions $h(\cdot)$ and $g(\cdot)$ with common domain $D \subseteq \mathbb{R}$. We say $h(x) = O(g(x))$ as $x \to x_0$ if for some constant $K > 0$,
$$\lim_{x\to x_0} \left|\frac{h(x)}{g(x)}\right| \le K, \quad x \in (D - x_0).$$
Moreover, we say that $h(x) = o(g(x))$ if
$$\lim_{x\to x_0} \left(\frac{h(x)}{g(x)}\right) = 0, \quad x \in (D - x_0).$$
In the case where
$$h(x) - g(x) = O(l(x)) \quad \text{we write} \quad h(x) = g(x) + O(l(x)),$$
and for
$$h(x) - g(x) = o(l(x)) \quad \text{we write} \quad h(x) = g(x) + o(l(x)).$$
This notation is particularly useful in the case of the Taylor expansion, where we can show that if $h(x)$ is differentiable of order $n$ (i.e. the derivatives $d^j h/dx^j = h^{(j)}$, $j = 1, 2, \ldots, n$, exist for some positive integer $n$) at $x = x_0$, then
$$h(x_0 + \delta) = h(x_0) + \sum_{j=1}^{n} \frac{h^{(j)}(x_0)}{j!}\,\delta^j + o(\delta^n) \quad \text{as } \delta \to 0. \qquad (10.37)$$
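A numerical sketch of the $o(\delta^n)$ remainder in (10.37), for the illustrative choice $h(x) = e^x$, $x_0 = 0$, $n = 2$: the remainder $R(\delta) = e^\delta - (1 + \delta + \delta^2/2)$ should satisfy $R(\delta)/\delta^2 \to 0$ as $\delta \to 0$.

```python
import math

# The second-order Taylor remainder of exp at 0, divided by delta^2,
# shrinks as delta -> 0, confirming R(delta) = o(delta^2).

def remainder_ratio(delta):
    taylor2 = 1 + delta + delta**2 / 2
    return (math.exp(delta) - taylor2) / delta**2

for d in (0.1, 0.01, 0.001):
    print(d, remainder_ratio(d))  # ratios shrink roughly like delta / 6
```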
The O, o notation considered above can be extended to the case of stochastic convergence: convergence almost surely and in probability.
Definition 11
Let $\{X_n, n \in \mathbb{N}\}$ be a sequence of r.v.'s and $\{c_n, n \in \mathbb{N}\}$ a sequence of positive real numbers. We say that

(i) $X_n$ is at most of order $c_n$ in probability if there exists