
CHAPTER 13

Estimation II — methods

The purpose of this chapter is to consider various methods for constructing 'good' estimators for the unknown parameters θ. The methods to be discussed are the least-squares method, the method of moments and the maximum likelihood method. These three methods played an important role in the development of statistical inference from the early nineteenth century to the present day. The historical background is central to the discussion of these methods because they were developed in response to the particular demands of the day and in the context of different statistical frameworks. If we consider these methods in the context of the present-day framework of a statistical model as developed above, we lose most of the early pioneers' insight, and the resulting anachronism can lead to misunderstanding. The method developed in relation to the contemporary statistical model framework is the maximum likelihood method, attributed to Fisher (1922). The other two methods will be considered briefly in relation to their historical context, in an attempt to delineate their role in contemporary statistical inference and, in particular, their relation to the method of maximum likelihood.

The method of maximum likelihood will play a very important role in the discussion and analysis of the statistical models considered in Part IV; a sound understanding of this method will be of paramount importance. After the discussion of the concepts of the likelihood function, the maximum likelihood estimator (MLE) and the score function, we go on to discuss the properties of MLE's. The properties of MLE's are divided into finite-sample and asymptotic properties and discussed in the case of a random as well as a non-random sample. The latter case will be used extensively in Part IV. The actual derivation of MLE's and their asymptotic distributions is emphasised throughout as a prelude to the discussion of estimation in Part IV.

13.1 The method of least-squares

The method of least-squares was first introduced by Legendre in 1805 and Gauss in 1809 in the context of astronomical measurements. The problem as posed at the time was one of approximating a set of noisy observations y_i, i = 1, 2, ..., n, by some known functions g_i(θ₁, θ₂, ..., θ_m), i = 1, ..., n, which depended on the unknown parameters θ = (θ₁, ..., θ_m)', m < n. Legendre argued that in the case g_i(θ) = θ₁, i = 1, 2, ..., n, minimising

Σ_{i=1}^{n} (y_i - θ₁)²,  with respect to θ₁,  (13.1)

gives rise to θ̂₁ = (1/n) Σ_{i=1}^{n} y_i = ȳ_n, the sample mean, which was generally considered to be the most representative value of (y₁, y₂, ..., y_n). On the basis of this result he went on to suggest minimising the squared errors

l(θ) = Σ_{i=1}^{n} (y_i - g_i(θ))²;  the least-squares criterion.  (13.2)
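Spelling out the minimisation in (13.1) makes the step to the sample mean explicit (a short worked detail added here):

\frac{d}{d\theta_1}\sum_{i=1}^{n}(y_i-\theta_1)^2 = -2\sum_{i=1}^{n}(y_i-\theta_1) = 0 \quad\Longrightarrow\quad \hat{\theta}_1 = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}_n ,

and the second derivative 2n > 0 confirms that this stationary point is indeed a minimum.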

Gauss, on the other hand, proposed a probabilistic set-up by reversing the Legendre argument about the mean. Crudely, his argument was that if X = (X₁, X₂, ..., X_n)' is a random sample from some density function f(x) and the sample mean X̄_n is the most representative value for all such X_i's, then the density function must be normal. In this set-up the observations are written as

y_i = g_i(θ) + ε_i,  ε_i ~ NI(0, σ²),  i = 1, 2, ..., n.

(Note: NI(·) 'reads' normal independent, justifying the normality assumption on the grounds of the errors being made up of a large number of independent factors cancelling each other out.) In this form the problem can be viewed as one of estimation in the context of the statistical model:

(i) Φ = { f(y_i; θ) = (σ√(2π))^(-1) exp[ -(1/(2σ²)) (y_i - g_i(θ))² ], θ ∈ Θ },  (13.6)

obtained by transferring the probabilistic assumption from ε_i to y_i, the observable r.v. In the Gauss linear model the systematic component g_i(θ) is linear, i.e.

g_i(θ) = Σ_{k=1}^{m} θ_k x_{ki},  i = 1, 2, ..., n,

and the normality assumption is replaced by the assumptions

E(ε_i) = 0,  Var(ε_i) = σ²,  E(ε_i ε_j) = 0,  i ≠ j,  i, j = 1, 2, ..., n.  (13.10)

In some of the present-day literature this model is considered as an extension of the Gauss formulation, obtained by weakening the normality assumption. For further discussion of the Gauss linear model see Chapter 18.


For simplicity of exposition let us consider the case where m = 1, so that the model becomes

y_i = θ₁ x_{1i} + ε_i,  i = 1, 2, ..., n,

for which

θ̂₁ = ( Σ_{i=1}^{n} x_{1i} y_i ) / ( Σ_{i=1}^{n} x_{1i}² )

is the least-squares estimator of θ₁. Note that in the case where x_{1i} = 1, i = 1, 2, ..., n, θ̂₁ = (1/n) Σ_{i=1}^{n} y_i, i.e. the sample mean. Given that the x_{1i}'s are in general assumed to be known constants, θ̂₁ is a linear function of the r.v.'s y₁, ..., y_n of the form

θ̂₁ = Σ_{i=1}^{n} c_i y_i,  where c_i = x_{1i} / ( Σ_{j=1}^{n} x_{1j}² ).

It can be shown that, under the above assumptions relating to ε_i, if Σ_{i=1}^{n} x_{1i}² ≠ 0, the least-squares estimator θ̂₁ of θ₁ has the smallest variance within the class of linear and unbiased estimators (Gauss–Markov theorem, see Section 21.2).
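To see the estimator in action, here is a minimal Python sketch (an added illustration; the simulated data and all variable names are assumptions of the sketch, not taken from the text). It computes θ̂₁ = Σ x_{1i} y_i / Σ x_{1i}² and checks that the estimator reduces to the sample mean when x_{1i} = 1:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    theta1 = 2.5                        # 'true' parameter assumed for the simulation
    x = rng.normal(size=n)              # known constants x_{1i}
    eps = rng.normal(size=n)            # errors with E(eps)=0, Var(eps)=1
    y = theta1 * x + eps                # y_i = theta1 * x_{1i} + eps_i

    # Least-squares estimator for the one-parameter model
    theta1_hat = np.sum(x * y) / np.sum(x ** 2)

    # Special case x_{1i} = 1: the same formula gives the sample mean
    ones = np.ones(n)
    mean_hat = np.sum(ones * y) / np.sum(ones ** 2)

    print(theta1_hat, mean_hat, y.mean())   # mean_hat coincides with y.mean()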

13.2 The method of moments

From the discussion in the previous section it is clear that the least-squares method is not a general method of estimation because it presupposes the existence of approximating functions g_i(θ), i = 1, 2, ..., n, which play the role of the mean in the context of a probability model. In the context of a probability model Φ, however, the unknown parameters of interest are not only associated with the mean but also with the higher moments. This prompted Pearson in 1894 to suggest the method of moments as a general estimation method. The idea underlying the method can be summarised as follows. Let us assume that X = (X₁, X₂, ..., X_n)' is a random sample from f(x; θ), θ ∈ R^k. The raw moments of f(x; θ), μ'_r = E(X^r), r ≥ 1, are by definition functions of the unknown parameters, since

μ'_r(θ) = ∫ x^r f(x; θ) dx,  r = 1, 2, ....


The corresponding sample raw moments, m_r = (1/n) Σ_{i=1}^{n} X_i^r (for example, m₁ = (1/n) Σ_{i=1}^{n} X_i and m₂ = (1/n) Σ_{i=1}^{n} X_i²), provide natural estimators of the μ'_r. The method suggests solving the system of equations

μ'_i(θ) = m_i,  i = 1, 2, ..., k,

for θ, assuming that

det[ ∂μ'_i(θ)/∂θ_j ] ≠ 0,  i, j = 1, 2, ..., k,  for θ ∈ Θ.  (13.23)

If the equations μ'_i(θ) = m_i, i = 1, 2, ..., k, have a unique solution θ̂_n = (θ̂₁, ..., θ̂_k)' with probability approaching one as n → ∞, then θ̂_n → θ (i.e. θ̂_n is a consistent estimator of θ).
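As a concrete illustration of solving μ'_i(θ) = m_i (a sketch added here; the choice of the normal distribution and all names are assumptions of the example, not taken from the text), take θ = (μ, σ²) with μ'₁ = μ and μ'₂ = σ² + μ², so that the method of moments gives μ̂ = m₁ and σ̂² = m₂ - m₁²:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(loc=3.0, scale=2.0, size=1000)   # assumed 'true' values mu=3, sigma=2

    # Sample raw moments
    m1 = np.mean(x)        # (1/n) * sum X_i
    m2 = np.mean(x ** 2)   # (1/n) * sum X_i^2

    # Solve mu'_1 = m1 and mu'_2 = m2 for (mu, sigma^2)
    mu_hat = m1
    sigma2_hat = m2 - m1 ** 2

    print(mu_hat, sigma2_hat)   # close to 3 and 4 respectively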

Although the method of moments usually yields (strongly) consistent estimators, they are in general inefficient. This was taken up by Fisher in several papers in the 1920s and 30s, arguing in favour of the maximum likelihood method for producing efficient estimators (at least asymptotically). The controversy between Pearson and Fisher about the relative merits of their respective methods of estimation ended in the mid-1930s with Fisher the winner and the absolute dominance since then of the maximum likelihood method.

The basic reason for the inefficiency of the estimators based on the method of moments is not hard to find. It is due to the fact that the method does not use any information relating to the probability model Φ apart from the assumption that raw moments of order k exist. It is important, however, to remember that this method was proposed by Pearson in the late nineteenth century, when no such probability model was postulated a priori. The problem of statistical inference at the time was seen as one of starting from a sample X = (X₁, ..., X_n)' and estimating f(x; θ) without assuming a priori some form for f(·). This point is commonly missed when comparisons between the various methods are made; it was unfortunately missed even by Pearson himself in his exchanges with Fisher. It is no surprise, then, to discover that a method developed in the context of an alternative framework is found wanting when applied to the present-day set-up.

13.3 The maximum likelihood method

The maximum likelihood method of estimation was formulated by Fisher in a series of papers in the 1920s and 30s and extended by various authors such as Cramér, Rao and Wald. In the current statistical literature the method of maximum likelihood is by far the most widely used method of estimation and plays a very important role in hypothesis testing.

(1) The likelihood function

Consider the statistical model:

(i) Φ = { f(x; θ), θ ∈ Θ };

(ii) X = (X₁, X₂, ..., X_n)', a sample from f(x; θ),

where X takes values in 𝒳 = Rⁿ, the observation space. The distribution of the sample D(x₁, x₂, ..., x_n; θ) describes how the density changes as X takes different values in 𝒳 for a given θ ∈ Θ. In deriving the likelihood function we reason as follows: since D(x; θ) incorporates all the information in the statistical model, it makes a lot of intuitive sense to reverse the argument in deriving D(x; θ) and consider the question of which value of θ ∈ Θ is most supported by a given sample realisation X = x. The value of θ under which the observed realisation x has the highest 'likelihood' of arising must intuitively be our best choice of θ. Using this intuitive argument the likelihood function is defined by

L(θ; x) = k(x) D(x; θ),  θ ∈ Θ,

where k(x) > 0 is a function of x only (not θ). In particular
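In the random sample case, for instance, the distribution of the sample factorises into the product of the individual densities, so (as a brief added restatement of the standard result) the likelihood and log-likelihood take the forms

L(\theta; x) = k(x)\prod_{i=1}^{n} f(x_i; \theta), \qquad \log L(\theta; x) = \log k(x) + \sum_{i=1}^{n} \log f(x_i; \theta),

and maximising L(θ; x) over θ ∈ Θ is equivalent to maximising the sum of the log-densities, since k(x) does not involve θ.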


Even though the probability remains attached to X and not θ in defining L(θ; x), it is interpreted as if it is reflected inferentially on θ, reflecting the 'likelihood' of a given X = x arising for different values of θ in Θ. In order to see this, consider the following example.


implies that the likelihood function is non-unique; any monotonic transformation of it represents the same information. In particular:

(1) log L(θ; x), the log-likelihood function; and  (13.27)

(2) The maximum likelihood estimator (MLE)

Given that the likelihood function represents the support given to the various θ ∈ Θ by X = x, it is natural to define the maximum likelihood estimator of θ to be a Borel function θ̂: 𝒳 → Θ such that

L(θ̂; x) ≥ L(θ; x)  for all θ ∈ Θ,

and there may be one, none or many such MLE's.


Note that

log L(θ̂; x) ≥ log L(θ*; x),  for all θ* ∈ Θ.  (13.30)

In the case where L(θ; x) is differentiable, the MLE can be derived as a solution of the equations

d log L(θ; x)/dθ = 0.
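To make this first-order-condition route concrete, here is a small Python sketch (added as an illustration; the exponential model and all names are assumptions of the sketch). For f(x; θ) = θ e^(-θx), x > 0, the log-likelihood is n log θ - θ Σ x_i, and setting its derivative to zero gives θ̂ = n/Σ x_i; the code checks this against a crude numerical maximisation:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.exponential(scale=1 / 1.7, size=500)   # assumed 'true' theta = 1.7

    def log_likelihood(theta, x):
        # log L(theta; x) = n*log(theta) - theta*sum(x_i)
        return len(x) * np.log(theta) - theta * np.sum(x)

    # Closed-form MLE from d log L / d theta = n/theta - sum(x_i) = 0
    theta_mle = len(x) / np.sum(x)

    # Crude check: maximise log L over a grid of theta values
    grid = np.linspace(0.1, 5.0, 10_000)
    theta_grid = grid[np.argmax([log_likelihood(t, x) for t in grid])]

    print(theta_mle, theta_grid)   # the two agree to grid precision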


Before the reader jumps to the erroneous conclusion that deriving the MLE is a matter of simple differentiation, let us consider some examples where the derivation is not as straightforward.

Example 4

Let Z = (Z₁, Z₂, ..., Z_n)', where Z_i = (X_i, Y_i)', be a random sample from the bivariate normal distribution with zero means, unit variances and correlation ρ, i.e.

f(x, y; ρ) = ( 1 / (2π √(1 - ρ²)) ) exp{ -(1/(2(1 - ρ²))) (x² - 2ρxy + y²) },

so that

log L(ρ; x, y) = c - n log 2π - (n/2) log(1 - ρ²) - (1/(2(1 - ρ²))) Σ_{i=1}^{n} (x_i² - 2ρ x_i y_i + y_i²).

Example 5

Let X = (X₁, X₂, ..., X_n)' be a random sample from f(x; θ) = 1/θ, where 0 < x < θ. The likelihood function is

L(θ; x) = θ^(-n),  if 0 < x_i < θ, i = 1, 2, ..., n.

Using dL(θ; x)/dθ = 0 to derive the MLE is out of the question, since L(θ; x) is not continuous at the maximum (see Fig. 13.5). A moment's reflection suggests that the MLE of θ is θ̂ = max(X₁, X₂, ..., X_n).
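A short numerical illustration of example 5 (added here; the simulated data and names are assumptions of the sketch): the likelihood is zero for θ below the sample maximum, jumps to θ^(-n) at θ̂ = max(x_i) and decreases thereafter, so there is no stationary point to find.

    import numpy as np

    rng = np.random.default_rng(3)
    theta_true = 4.0                           # assumed 'true' upper bound
    x = rng.uniform(0.0, theta_true, size=50)

    def likelihood(theta, x):
        # L(theta; x) = theta^(-n) if all x_i <= theta, and 0 otherwise
        return theta ** (-len(x)) if np.all(x <= theta) else 0.0

    theta_hat = np.max(x)                      # MLE for the uniform(0, theta) model

    print(likelihood(theta_hat - 1e-9, x),     # 0: theta just below the largest observation
          likelihood(theta_hat, x),            # positive: the maximum of L
          likelihood(theta_hat + 0.5, x))      # positive but smaller: L decreases in theta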


that L(θ; x) > 0. Since the X_i's are bounded below by θ, θ̂ = min(X₁, X₂, ..., X_n) represents the MLE of θ.

Looking at examples 5 and 6 we can see that the problem with the derivation of the MLE arose because the range of the X_i's depended on the unknown parameter θ. It turns out that in such cases there are not only problems with deriving the MLE, but also the estimators derived do not in general satisfy all the properties MLE's enjoy (see below). For example, θ̂ = max(X₁, ..., X_n) is not asymptotically normal. Such cases are excluded by assumption CR1 of Chapter 12.

So far the examples considered refer to the case where θ is a scalar. In econometrics, however, θ is commonly a k × 1 vector, a case which presents certain additional difficulties. For differentiable likelihood functions the MLE of θ = (θ₁, θ₂, ..., θ_k)' is derived by solving the system of equations

∂ log L(θ; x)/∂θ_j = 0,  j = 1, 2, ..., k.
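As a brief worked instance of such a system (added here; it is the standard normal-sample calculation), take a random sample from N(μ, σ²) with θ = (μ, σ²)'. The two score equations are

\frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0, \qquad \frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0,

whose solution is μ̂_n = (1/n) Σ_{i=1}^{n} x_i = x̄_n and σ̂²_n = (1/n) Σ_{i=1}^{n} (x_i - x̄_n)².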


(3) Finite sample properties

Let us discuss the finite sample properties of MLE's in the context of the simple statistical model:

(i) probability model, Φ = { f(x; θ), θ ∈ Θ };

(ii) sampling model, X = (X₁, ..., X_n)' is a random sample from f(x; θ).

θ̂ = n ( Σ_{i=1}^{n} log X_i )^(-1).


The invariance property of MLE's enables us to deduce that the MLE of φ = 1/θ is φ̂ = (1/n) Σ_{i=1}^{n} log X_i.

In relation to invariance it is important to note that, in general, E[g(θ̂)] ≠ g(E(θ̂)). For example, if g(θ) = θ², it is well known that E(θ̂²) ≠ (E(θ̂))² in general. This contributes to the fact that MLE's are not in general unbiased estimators. For instance, in example 7 above the MLE of σ², σ̂² = (1/n) Σ_{i=1}^{n} (X_i - X̄_n)², is a biased estimator since nσ̂²/σ² ~ χ²(n - 1) (see Section 11.5) and hence E(σ̂²) = [(n - 1)/n] σ² ≠ σ². Thus, in general, unbiased estimators and MLE's do not coincide. In one particular case, however, when unbiasedness is accompanied by full efficiency, the two coincide.
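The bias calculation follows in one line from the distributional fact just cited, since a χ²(n - 1) variable has mean n - 1 (a short worked check added here):

E(\hat{\sigma}^2) = \frac{\sigma^2}{n} E\!\left(\frac{n\hat{\sigma}^2}{\sigma^2}\right) = \frac{\sigma^2}{n}(n - 1) = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2 .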

Unbiasedness, full-efficiency

In the case where Φ satisfies the regularity conditions CR1–CR3 and θ̂ is an unbiased estimator of θ whose variance achieves the Cramér–Rao lower bound, then the likelihood equation has a unique solution equal to θ̂. This suggests that any unbiased, fully efficient estimator θ̂ can be derived as a solution of the likelihood equation (a comforting thought!). In example 7 above the MLE of μ was μ̂_n = X̄_n, which implies that μ̂_n ~ N(μ, σ²/n), since μ̂_n is a linear function of independent r.v.'s. Hence E(μ̂_n) = μ and μ̂_n is an unbiased estimator. Moreover, given that

Var(μ̂_n) = σ²/n,

we can see that Var(μ̂_n) achieves the Cramér–Rao lower bound. On the other hand, the MLE of σ², σ̂² = (1/n) Σ_{i=1}^{n} (X_i - X̄_n)², as discussed above, is not an unbiased estimator.
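For the normal example the Cramér–Rao bound itself is easily verified (a short worked check added here, using only the standard normal log-density): with log f(x; μ, σ²) = -½ log(2πσ²) - (x - μ)²/(2σ²),

I_n(\mu) = -\,n\,E\!\left[\frac{\partial^2 \log f(X; \mu, \sigma^2)}{\partial \mu^2}\right] = \frac{n}{\sigma^2}, \qquad I_n(\mu)^{-1} = \frac{\sigma^2}{n} = \mathrm{Var}(\hat{\mu}_n),

so the sample mean attains the bound.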

The property mostly emphasised by Fisher in support of the method of maximum likelihood was the property of sufficiency.


Sufficiency

If τ(X) is a sufficient statistic for θ and a unique MLE θ̂ of θ exists, then θ̂ is a function of τ(X). In the case of a non-unique MLE, an MLE θ̂ can be found which is a function of τ(X). It is important to note that this does not say that any MLE is a function of τ(X); in the case of non-uniqueness some MLE's are not functions of τ(X). It was shown in Chapter 12 that τ(X) = (Σ_{i=1}^{n} X_i, Σ_{i=1}^{n} X_i²) are jointly minimal sufficient statistics for θ = (μ, σ²) in the case where X = (X₁, ..., X_n)' is a random sample from N(μ, σ²). In example 7 above the MLE's of μ and σ² were

μ̂_n = (1/n) Σ_{i=1}^{n} X_i,   σ̂²_n = (1/n) Σ_{i=1}^{n} (X_i - X̄_n)²,

which are clearly functions of τ(X).
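To make the dependence on τ(X) explicit (an added rewriting of the two estimators in terms of the sufficient statistics):

\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^{2},

so both estimators depend on the data only through τ(X) = (Σ_{i=1}^{n} X_i, Σ_{i=1}^{n} X_i²).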

An important implication for ML estimation when a sufficient statistic exists is that the asymptotic covariance of θ̂_n (see below) can be consistently estimated by the Hessian evaluated at θ = θ̂_n. That is,

(4) Asymptotic properties (IID case)

Although MLE's enjoy several optimum finite sample properties, as seen above, their asymptotic properties provide the main justification for the almost universal appeal of the method of maximum likelihood. As argued below, under certain regularity conditions, MLE's can be shown to be consistent, asymptotically normal and asymptotically efficient.

Let us begin the discussion of the asymptotic properties enjoyed by MLE's by considering the simplest possible case, where the statistical model is as follows:

(i) probability model, Φ = { f(x; θ), θ ∈ Θ }, θ being a scalar;

(ii) sampling model, X = (X₁, ..., X_n)' is a random sample from f(x; θ).

Although this case is of little interest in Part IV, a brief discussion of it will help us understand the non-random sample case considered in the sequel. The regularity conditions needed to prove the above-mentioned asymptotic properties of MLE's can take various forms (see Cramér (1946), Wald (1949), Norden (1972–73), Weiss and Wolfowitz (1974), Serfling (1980), inter alia). For our purposes it suffices to supplement the regularity conditions of Chapter 12, CR1–CR3, with the following condition:
