
PART III

Statistical inference


CHAPTER 11

The nature of statistical inference

11.1 Introduction

In the discussion of descriptive statistics in Part I it was argued that in order to be able to go beyond the mere summarisation and description of the observed data under consideration it was important to develop a mathematical model purporting to provide a generalised description of the data generating process (DGP). Motivated by the various results on frequency curves, a probability model in the form of the parametric family

of density functions Φ = {f(x; θ), θ ∈ Θ} and its various ramifications was formulated in Part II, providing such a mathematical model. Along with the formulation of the probability model Φ, various concepts and results were discussed in order to enable us to extend and analyse the model, preparing the way for statistical inference to be considered in the sequel. Before we go

on to consider that, however, it is important to understand the difference between the descriptive study of data and statistical inference. As suggested above, the concept of a density function, in terms of which the probability model is defined, was motivated by the concept of a frequency curve. It is obvious that any density function f(x; θ) can be used as a frequency curve by reinterpreting it as a non-stochastic function of the observed data. This precludes any suggestions that the main difference between the descriptive study of data and statistical inference proper lies with the use of density functions in describing the observed data. 'What is the main difference then?'

In descriptive statistics the aim is to summarise and describe the data under consideration, and frequency curves provide us with a convenient way to do that. The choice of a frequency curve is entirely based on the data in hand. On the other hand, in statistical inference a probability model Φ is


postulated a priori as a generalised description of the underlying DGP giving rise to the observed data (not the observed data themselves). Indeed, there is nothing stochastic about the set of numbers making up the data. The stochastic element is introduced into the framework in the form of uncertainty relating to the underlying DGP, and the observed data are viewed as one of the many possible realisations. In descriptive statistics we start with the observed data and seek a frequency curve which describes these data as closely as possible. In statistical inference we postulate a probability model Φ a priori, which purports to describe either the DGP giving rise to the data or the population which the observed data came from. These constitute fundamental departures from descriptive statistics, allowing us to make generalisations beyond the numbers in hand. This being the case, the analysis of observed data in statistical inference proper will take a very different form as compared with descriptive statistics briefly

considered in Part I. In order to see this let us return to the income data discussed in Chapter 2. There we considered the summarisation and description of personal income data on 23 000 households using descriptors like the mean, median, mode, variance, the histogram and the frequency curve. These enabled us to get some idea about the distribution of incomes among these households. The discussion ended with us speculating about the possibility of finding an appropriate frequency curve which depends on a few parameters, enabling us to describe the data and analyse them in a much more convenient way. In Section 4.3 we suggested that the parametric family of density functions of the Pareto distribution

f(x; θ) = (θ/4500)(4500/x)^(θ+1), θ > 0, x ≥ 4500,

could provide a reasonable probability model for incomes over £4500. As can be seen, there is only one unknown parameter θ which, once specified, completely determines f(x; θ). In the context of statistical inference we postulate Φ a priori as a stochastic model not of the data in hand but of the distribution of income of the population from which the observed data constitute one realisation, i.e. the UK households. Clearly, there is nothing wrong with using f(x; θ) as a frequency curve in the context of descriptive statistics by returning to the histogram of these data and, after plotting f(x; θ) for various values of θ, say θ = 1, 1.5, 2, choosing the one which comes closest to the frequency polygon. For the sake of the argument let us assume that the curve chosen is θ = 1.5, i.e.

f*(x) = (1.5/4500)(4500/x)^2.5, x ≥ 4500.

This provides us with a very convenient descriptor of these data, as can be easily seen when compared with the cumbersome histogram function

φ̂ᵢ/(xᵢ₊₁ − xᵢ), xᵢ ≤ x < xᵢ₊₁

(see Chapter 2). But it is no more than a convenient descriptor of the data in hand. For example, we cannot make any statements about the distribution of personal income in the UK on the basis of the frequency curve f*(x). In order to do that we need to consider the problem in the context of statistical inference proper. By postulating Φ above as a probability model for the distribution of income in the UK and interpreting the observed data as a sample from the population under study, we could go on to consider questions about the unknown parameter θ as well as further observations from the probability model; see Section 11.4 below.
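The descriptive exercise above, choosing among θ = 1, 1.5, 2 the curve closest to the data, can be sketched in code. The snippet below is only an illustration, not from the text: the simulated sample, the comparison income levels and the matching criterion (squared error between observed relative frequencies and the Pareto survival curve (4500/x)^θ) are my own choices.

```python
import numpy as np

X0 = 4500.0  # income threshold from the text's Pareto example

def pareto_density(x, theta, x0=X0):
    """Pareto density f(x; theta) = (theta/x0) * (x0/x)**(theta + 1), x >= x0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= x0, (theta / x0) * (x0 / x) ** (theta + 1), 0.0)

def closest_theta(incomes, candidates=(1.0, 1.5, 2.0)):
    """Descriptive-statistics style choice: pick the candidate theta whose
    survival curve (x0/x)**theta best matches the observed relative
    frequencies at a few income levels (levels chosen for illustration)."""
    incomes = np.asarray(incomes, dtype=float)
    levels = np.array([6000.0, 9000.0, 18000.0])
    empirical = np.array([(incomes > lv).mean() for lv in levels])
    errors = [np.sum((empirical - (X0 / levels) ** th) ** 2) for th in candidates]
    return candidates[int(np.argmin(errors))]

# Simulated incomes with true theta = 1.5 (NumPy's pareto is shifted by 1).
rng = np.random.default_rng(0)
incomes = X0 * (1.0 + rng.pareto(1.5, size=20_000))
print(closest_theta(incomes))  # the chosen curve: theta = 1.5, as in the text
```

Note that this remains purely descriptive: nothing here licenses a statement about UK incomes beyond the 20 000 simulated numbers in hand, which is exactly the limitation the chapter goes on to address.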

In Section 11.2 the important concept of a sampling model is introduced as a way to link the postulated probability model, say Φ = {f(x; θ), θ ∈ Θ}, to the observed data x = (x₁, …, xₙ)′ available. The sampling model provides the second important ingredient needed to define a statistical model, the starting point of any 'parametric' statistical inference.

In Section 11.3, armed with the concept of a statistical model, we go on to discuss a particular approach to statistical inference, known as the frequency approach. The frequency approach is briefly contrasted with another important approach to statistical inference, the Bayesian approach.

A brief overview of statistical inference is considered in Section 11.4 as a prelude to the discussion of the next three chapters. The most important concept in statistical inference is that of a statistic, which is discussed in Section 11.5. This concept and its distribution provide the cornerstone for estimation, testing and prediction.

11.2 The sampling model

As argued above, the probability model Φ = {f(x; θ), θ ∈ Θ} constitutes a very important component of statistical inference. Another important element in the same context is what we call a sampling model, which provides the link between the probability model and the observed data. It is designed to model the relationship between them and refers to the way the observed data can be viewed in relation to Φ. In order to be able to formulate sampling models we need to define formally the concept of a sample in statistical inference.


Definition 1

A sample is defined to be a set of random variables (r.v.'s) (X₁, X₂, …, Xₙ) whose density functions coincide with the 'true' density function f(x; θ) as postulated by the probability model.

Note that the term sample has a very precise meaning in this context, and it is not the meaning attributed to it in everyday language. In particular, the term does not refer to any observed data, as the everyday use of the term might suggest.

The significance of the concept becomes apparent when we learn that the observed data in this context are considered to be one of the many possible realisations of the sample. In this interpretation lies the inductive argument of statistical inference, which enables us to extend the results based on the observed data in hand to the underlying mechanism giving rise to them. Hence the observed data in this context are no longer just a set of numbers we want to make some sense of; they represent a particular outcome of an experiment, the experiment as defined by the sampling model postulated to complement the probability model Φ = {f(x; θ), θ ∈ Θ}.

Given that a sample is a set of r.v.'s related to Φ, it must have a distribution, which we call the distribution of the sample.

Definition 2

The distribution of the sample X = (X₁, …, Xₙ)′ is defined to be the joint distribution of the r.v.'s X₁, …, Xₙ, denoted by

f(x₁, …, xₙ; θ) ≡ f(x; θ).

The distribution of the sample incorporates both forms of relevant information, the probability as well as the sample information. It must come as no surprise to learn that f(x; θ) plays a very important role in statistical inference. The form of f(x; θ) depends crucially on the nature of the sampling model as well as Φ. The simplest but most widely used form of a sampling model is the one based on the idea of a random experiment ℰ (see Chapter 3) and is called a random sample.

Definition 3

A set of random variables (X₁, X₂, …, Xₙ) is called a random sample from f(x; θ) if the r.v.'s X₁, X₂, …, Xₙ are independent and identically distributed (IID). In this case the distribution of the sample takes the form

f(x₁, x₂, …, xₙ; θ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ) = [f(x; θ)]ⁿ,

the first equality due to independence and the second to the fact that the r.v.'s are identically distributed.
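The two equalities in Definition 3 can be checked directly in a small numerical sketch. The exponential density used here is merely a convenient stand-in for f(x; θ), not a choice made in the text.

```python
import math

def exp_density(x, theta):
    """Stand-in marginal density f(x; theta) = theta * exp(-theta * x), x >= 0."""
    return theta * math.exp(-theta * x)

def sample_density_iid(xs, theta):
    """Distribution of a random (IID) sample: the product of the n marginals."""
    prod = 1.0
    for x in xs:
        prod *= exp_density(x, theta)
    return prod

theta = 2.0
# Independence gives the product form.
print(sample_density_iid([0.3, 1.2, 0.7], theta))
# Identical distribution: for equal observations the product collapses
# to [f(x; theta)]**n.
print(math.isclose(sample_density_iid([0.5, 0.5, 0.5], theta),
                   exp_density(0.5, theta) ** 3))  # True
```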

One of the important ingredients of a random experiment ℰ is that the experiment can be repeated under identical conditions. This enables us to construct a random sample by repeating the experiment n times. Such a procedure of constructing a random sample might suggest that this is feasible only when experimentation is possible. Although there is some truth in this presupposition, the concept of a random sample is also used in cases where the experiment can be repeated under identical conditions, if only conceptually. In order to see this let us consider the personal income example where Φ represents a Pareto family of density functions. 'What is a random sample in this case?' If we can ensure that every household in the UK has the same chance of being selected in one performance of a conceptual experiment, then we can interpret the n households selected as representing a random sample (X₁, X₂, …, Xₙ) and their incomes (the observed data) as being a realisation of the sample. In general we denote the sample by X = (X₁, …, Xₙ)′ and its realisation by x = (x₁, …, xₙ)′, where x is assumed to take values in the observation space 𝒳, i.e. x ∈ 𝒳; usually 𝒳 = Rⁿ.

A less restrictive form of a sampling model is what we call an independent sample, where the identically distributed condition of the random sample is relaxed.

Definition 4

A set of r.v.'s (X₁, …, Xₙ) is said to be an independent sample from f(xᵢ; θᵢ), i = 1, 2, …, n, respectively, if the r.v.'s X₁, …, Xₙ are independent. In this case the distribution of the sample takes the form

f(x₁, x₂, …, xₙ; θ) = ∏ᵢ₌₁ⁿ f(xᵢ; θᵢ).

Usually the density functions f(xᵢ; θᵢ), i = 1, 2, …, n, belong to the same family but their numerical characteristics (moments, etc.) may differ.

If we relax the independence assumption as well, we have what we can call a non-random sample.


Definition 5

A set of r.v.'s (X₁, …, Xₙ) is said to be a non-random sample from f(x₁, …, xₙ; θ) if the r.v.'s X₁, …, Xₙ are non-IID. In this case the only decomposition of the distribution of the sample possible is

f(x₁, x₂, …, xₙ; θ) = ∏ᵢ₌₁ⁿ f(xᵢ/x₁, …, xᵢ₋₁; θᵢ),   (11.5)

given X₀, where f(xᵢ/x₁, …, xᵢ₋₁; θᵢ), i = 1, 2, …, n, represent the conditional distribution of Xᵢ given X₁, X₂, …, Xᵢ₋₁.

A non-random sample is clearly the most general of the sampling models considered above and includes the independent and random samples as special cases, given that

f(xᵢ/x₁, …, xᵢ₋₁; θᵢ) = f(xᵢ; θᵢ), i = 1, 2, …, n,   (11.6)

when X₁, …, Xₙ are independent r.v.'s. Its generality, however, renders the concept non-operational unless certain restrictions are imposed on the heterogeneity and dependence among the Xᵢ's. Such restrictions have been extensively discussed in Sections 8.2–3. In Part IV the restrictions often used are stationarity and asymptotic independence.
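The decomposition (11.5) under a stationarity restriction can be illustrated with a Gaussian first-order autoregressive scheme. This example is my own choice, not the text's: the joint log-density is built up from f(x₁) and the conditionals f(xᵢ/xᵢ₋₁), and setting the dependence parameter to zero recovers the IID simplification (11.6).

```python
import math

def normal_logpdf(x, mean, var):
    """Log-density of N(mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def ar1_sample_logdensity(xs, phi, sigma2):
    """Log of decomposition (11.5) for a stationary Gaussian AR(1).

    Under stationarity X_1 ~ N(0, sigma2 / (1 - phi**2)), and each
    conditional f(x_i / x_{i-1}) is N(phi * x_{i-1}, sigma2); dependence
    means the product of conditionals does not reduce to marginals.
    """
    logf = normal_logpdf(xs[0], 0.0, sigma2 / (1 - phi ** 2))
    for prev, cur in zip(xs, xs[1:]):
        logf += normal_logpdf(cur, phi * prev, sigma2)
    return logf

xs = [0.2, -0.1, 0.4, 0.3]
print(ar1_sample_logdensity(xs, phi=0.5, sigma2=1.0))
# With phi = 0 the conditionals reduce to the marginals, i.e. (11.6),
# and the decomposition coincides with the IID product.
iid = sum(normal_logpdf(x, 0.0, 1.0) for x in xs)
print(math.isclose(ar1_sample_logdensity(xs, phi=0.0, sigma2=1.0), iid))  # True
```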

In the context of statistical inference we need to postulate both a probability as well as a sampling model, and thus we define a statistical model as comprising both.

Definition 6

A statistical model is defined as comprising

(i) a probability model Φ = {f(x; θ), θ ∈ Θ}; and

(ii) a sampling model X = (X₁, X₂, …, Xₙ)′.

The concept of a statistical model provides the starting point of all forms of statistical inference to be considered in the sequel. To be more precise, the concept of a statistical model forms the basis of what is known as parametric inference. There is also a branch of statistical inference known as non-parametric inference, where no Φ is assumed a priori (see Gibbons (1971)). Non-parametric statistical inference is beyond the scope of this book.
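The two-part structure of Definition 6 can be mirrored in code. The sketch below is entirely my own construction (names, labels and structure do not appear in the text): it bundles a density playing the role of f(x; θ) with a sampling-model label, and computes the distribution of the sample for the random-sample case.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class StatisticalModel:
    """A sketch of Definition 6: a probability model plus a sampling model."""
    density: Callable[[float, float], float]  # f(x; theta)
    sampling: str  # 'random' (IID), 'independent', or 'non-random'

    def sample_density(self, xs: Sequence[float], theta: float) -> float:
        """Distribution of the sample; only the IID product is sketched here."""
        if self.sampling != "random":
            raise NotImplementedError("only the random-sample case is sketched")
        prod = 1.0
        for x in xs:
            prod *= self.density(x, theta)
        return prod

# Exponential density as a stand-in probability model, with a random sample.
model = StatisticalModel(lambda x, th: th * math.exp(-th * x), "random")
print(model.sample_density([0.3, 1.2], 2.0))
```

Keeping the two components as separate fields makes the interrelation discussed next explicit: changing the sampling label forces a different computation of the distribution of the sample.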

It must be emphasised at the outset that the two important components of a statistical model, the probability and sampling models, are clearly interrelated. For example, we cannot postulate the probability model Φ = {f(x; θ), θ ∈ Θ} if the sample X is non-random. This is because if the r.v.'s X₁, …, Xₙ are not independent, the probability model must be defined in terms of their joint distribution, i.e. Φ = {f(x₁, …, xₙ; θ), θ ∈ Θ}. Moreover, in the case of an independent but not identically distributed sample we need to specify the individual density functions for each r.v. in the sample, i.e. Φ = {f(xₖ; θₖ), θₖ ∈ Θ, k = 1, 2, …, n}. The most important implication of this relationship is that when the sampling model postulated is found to be inappropriate, it means that the probability model has to be respecified as well. Several examples of this are encountered in Chapters 21 to 23.

11.3 The frequency approach

In developing the concept of a probability model in Part II it was argued that no interpretation of probability was needed. The whole structure was built upon the axiomatic approach, which defined probability as a set function P(·): ℱ → [0, 1] satisfying various axioms and devoid of any interpretations (see Section 3.2). In statistical inference, however, the interpretation of the notion of probability is indispensable. The discerning reader would have noted that in the above introductory discussion we have already adopted a particular attitude towards the meaning of probability. In interpreting the observed data as one of many possible realisations of the DGP as represented by the probability model, we have committed ourselves to the frequency interpretation of probability. This is because we implicitly assumed that if we were to repeat the experiment under identical conditions indefinitely (i.e. with the number of observations going to infinity) we would be able to reconstruct the probability model Φ. In the case of the income example discussed above, this amounts to assuming that

as a natural extension of the descriptive study of data with the introduction

of the concept of a probability model In practice we never have an infinity

of observations in order to recover the probability model completely and hence caution should be exercised in interpreting the results of the

frequency-approach-based statistical methods which we consider in the

sequel These results depend crucially on the probability model which we interpret as referring to a situation where we keep on repeating the

experiment to infinity This suggests that the results should be interpreted

as holding under the same circumstances, i.e ‘in the long run’ or ‘on average’ Adopting such an interpretation implies that we should propose statistical procedures which give rise to ‘optimum results’ according to


[Fig. 11.1 The frequentist approach to statistical inference: the probability model Φ = {f(x; θ), θ ∈ Θ} and the sampling model X = (X₁, X₂, …, Xₙ)′ combine in the distribution of the sample f(x₁, x₂, …, xₙ; θ), which links them to the observed data x = (x₁, x₂, …, xₙ)′.]

criteria related to this 'long-run' interpretation. Hence, it is important to keep this in mind when reading the following chapters on criteria for optimal estimators, tests and predictors.

The various approaches to statistical inference based on alternative interpretations of the notion of probability differ mainly in relation to what constitutes relevant information for statistical inference and how it should be processed. In the case of the frequency approach (sometimes called the classical approach), the relevant information comes in the form of a probability model Φ = {f(x; θ), θ ∈ Θ} and a sampling model X = (X₁, X₂, …, Xₙ)′, providing the link between Φ and the observed data x = (x₁, x₂, …, xₙ)′. The observed data are in effect interpreted as a realisation of the sampling model, i.e. X = x. This relevant information is then processed via the distribution of the sample f(x₁, x₂, …, xₙ; θ) (see Fig. 11.1).

The 'subjective' interpretation of probability, on the other hand, leads to a different approach to statistical inference. This is commonly known as the Bayesian approach because the discussion is based on revising prior beliefs about the unknown parameters θ in the light of the observed data using Bayes' formula. The prior information about θ comes in the form of a probability distribution f(θ); that is, θ is assumed to be a random variable. The revision to the prior f(θ) comes in the form of the posterior distribution f(θ/x) via Bayes' formula:

f(θ/x) = f(x/θ)f(θ)/f(x),

f(x/θ) being the distribution of the sample and f(x) being constant for X = x. For more details and an excellent discussion of the frequency and Bayesian approaches to statistical inference see Barnett (1973). In what follows we concentrate exclusively on the frequency approach.

11.4 An overview of statistical inference

As defined above the simplest form of a statistical model comprises:

(i) a probability model Φ = {f(x; θ), θ ∈ Θ}; and

(ii) a sampling model X = (X₁, X₂, …, Xₙ)′, a random sample.

Using this simple statistical model, let us attempt a brief overview of statistical inference before we consider the various topics individually, in order to keep the discussion which follows in perspective. The statistical model in conjunction with the observed data enables us to consider the following questions:

(1) Are the observed data consistent with the postulated statistical model? (misspecification)

(2) Assuming that the statistical model postulated is consistent with the observed data, what can we infer about the unknown parameters θ ∈ Θ?

(a) Can we decrease the uncertainty about θ by reducing the parameter space from Θ to Θ₀, where Θ₀ is a subset of Θ? (confidence estimation)

(b) Can we decrease the uncertainty about θ by choosing a particular value in Θ, say θ̂, as providing the most representative value of θ? (point estimation)

(c) Can we consider the question that θ belongs to some subset Θ₀ of Θ? (hypothesis testing)

(3) Assuming that a particular representative value θ̂ of θ has been chosen, what can we infer about further observations from the DGP as described by the postulated statistical model? (prediction)

The above questions describe the main areas of statistical inference. Comparing these questions with the ones we could ask in the context of descriptive statistics, we can easily appreciate the role of probability theory in statistical inference.

The second question posed above (the first question is considered in the appendix below) assumes that the statistical model postulated is 'valid' and considers various forms of inference relating to the unknown parameters θ. Point estimation (or just estimation) refers to our attempt to give a numerical value to θ. This entails constructing a mapping h(·): 𝒳 → Θ (see Fig. 11.2). We call the function h(X) an estimator of θ and its value h(x) an
