13.8 Applications of LR Tests: Contingency Tables, Goodness-of-Fit Tests
Now we turn to a slightly different hypothesis testing problem, where the LR test is also appropriate. We consider a random experiment which may result in k possibly different outcomes denoted by O_j, j = 1, ..., k. In n independent repetitions of the experiment, let p_j be the (constant) probability that each one of the trials will result in the outcome O_j, and denote by X_j the number of trials which result in O_j, j = 1, ..., k. Then the joint distribution of the X's is the Multinomial distribution, that is,

P(X_1 = x_1, ..., X_k = x_k) = [n!/(x_1! ··· x_k!)] p_1^{x_1} ··· p_k^{x_k},  x_1 + ··· + x_k = n.
We may suspect that the p's have certain specified values; for example, in the case of a die, the die may be balanced. We then formulate this as a hypothesis and proceed to test it on the basis of the data. More generally, we may want to test the hypothesis that θ lies in a subset ω of Ω.
Consider the case that H: θ ∈ ω = {θ_0} = {(p_{10}, ..., p_{k0})′}. Then, under ω, the MLE of p_j is p_{j0}, while under Ω it is x_j/n, so that the LR statistic becomes

λ = Π_{j=1}^k p_{j0}^{x_j} / Π_{j=1}^k (x_j/n)^{x_j} = Π_{j=1}^k (np_{j0}/x_j)^{x_j},

and H is rejected if −2 log λ > C. The constant C is determined by the desired level of significance α and the fact that −2 log λ is asymptotically χ²_{k−1} distributed under H, as can be shown on the basis of Theorem 6.
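As a quick numerical sketch of this test (the die counts below are hypothetical, and the χ²_5 quantile 11.07 is taken from standard tables):

```python
import math

def minus_two_log_lambda(counts, p0):
    """-2 log lambda for H: p = p0 in a multinomial experiment.

    lambda = prod_j (n * p0_j / x_j)^{x_j}, so
    -2 log lambda = 2 * sum_j x_j * log(x_j / (n * p0_j)).
    """
    n = sum(counts)
    return 2.0 * sum(x * math.log(x / (n * p)) for x, p in zip(counts, p0) if x > 0)

# Hypothetical die data: 120 casts, roughly uniform observed counts.
counts = [18, 22, 19, 21, 23, 17]
p0 = [1 / 6] * 6
stat = minus_two_log_lambda(counts, p0)
# Under H, stat is approximately chi-square with k - 1 = 5 degrees of freedom;
# reject at level 0.05 if stat exceeds 11.07 (the 0.95 quantile of chi^2_5).
reject = stat > 11.07
```

For these counts the statistic is small (about 1.4), so the hypothesis of a balanced die would be accepted.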
Now consider r events A_i, i = 1, ..., r, which form a partition of the sample space S, and let {B_j, j = 1, ..., s} be another partition of S. Let p_ij = P(A_i ∩ B_j) and let

p_i. = Σ_{j=1}^s p_ij,  p_.j = Σ_{i=1}^r p_ij.

Then, clearly, p_i. = P(A_i), p_.j = P(B_j) and

Σ_{i=1}^r p_i. = Σ_{j=1}^s p_.j = Σ_{i=1}^r Σ_{j=1}^s p_ij = 1.

Furthermore, the events {A_1, ..., A_r} and {B_1, ..., B_s} are independent if and only if p_ij = p_i. p_.j for all i and j. For example, A may denote gender, comprising the levels A_1 (male) and A_2 (female), and B may denote educational status, comprising the levels B_1 (elementary school graduate), B_2 (high school graduate) and B_3 (college graduate). In n independent repetitions of the experiment, let X_ij be the number of trials which result in A_i ∩ B_j, and set

X_i. = Σ_{j=1}^s X_ij,  X_.j = Σ_{i=1}^r X_ij.

It is then clear that Σ_{i=1}^r X_i. = Σ_{j=1}^s X_.j = n.
Let θ = (p_ij, i = 1, ..., r, j = 1, ..., s)′. Then the set Ω of all possible values of θ is (rs − 1)-dimensional, since Σ_{i,j} p_ij = 1.
Under the above set-up, the problem of interest is that of testing whether the characteristics A and B are independent. That is, we want to test the existence of probabilities p_i, q_j, i = 1, ..., r, j = 1, ..., s, such that H: p_ij = p_i q_j, i = 1, ..., r, j = 1, ..., s. Since for i = 1, ..., r − 1 and j = 1, ..., s − 1 we have the r + s − 2 independent linear relationships

Σ_{j=1}^s p_ij = p_i,  Σ_{i=1}^r p_ij = q_j,

it follows that the set ω specified by H is an (r + s − 2)-dimensional subset of Ω.
Next, if x_ij is the observed value of X_ij and if we set

x_i. = Σ_{j=1}^s x_ij,  x_.j = Σ_{i=1}^r x_ij,
the likelihood function takes the following forms under Ω and ω, respectively. Writing Π_{i,j} instead of Π_{i=1}^r Π_{j=1}^s, we have

L(Ω) = [n!/Π_{i,j} x_ij!] Π_{i,j} p_ij^{x_ij},
L(ω) = [n!/Π_{i,j} x_ij!] Π_{i=1}^r p_i^{x_i.} Π_{j=1}^s q_j^{x_.j}.

Maximizing these likelihoods gives the MLEs p̂_ij = x_ij/n under Ω, and p̂_i = x_i./n, q̂_j = x_.j/n under ω, so that the LR statistic becomes

λ = [Π_{i=1}^r x_i.^{x_i.} Π_{j=1}^s x_.j^{x_.j}] / [n^n Π_{i,j} x_ij^{x_ij}].
This χ² r.v. can be used for testing the hypothesis H: θ = θ_0. Namely, one sets

χ²_ω = Σ_{j=1}^k (x_j − np_{j0})² / (np_{j0})

and rejects H if χ²_ω is too large, in the sense of being greater than a certain constant C which is specified by the desired level of the test. For the contingency table problem, the corresponding statistic is

χ²_ω = Σ_{i,j} (x_ij − x_i. x_.j/n)² / (x_i. x_.j/n),

and it can further be shown that, under ω, χ²_ω is asymptotically χ²_f distributed with f = (r − 1)(s − 1). Once more this test is asymptotically equivalent to the corresponding test based on −2 log λ.

Tests based on chi-square statistics are known as chi-square tests or goodness-of-fit tests for obvious reasons.
Exercises

13.8.1 Show that the set ω specified by the hypothesis of independence is as claimed in the discussion in this section.

In Exercises 13.8.2–13.8.9 below, the test to be used will be the appropriate χ² test.
13.8.2 Refer to Exercise 13.7.2 and test the hypothesis formulated there at the specified level of significance by using a χ² goodness-of-fit test. Also, compare the cut-off point with that found in Exercise 13.7.2(i).
13.8.3 A die is cast 600 times and the numbers 1 through 6 appear with the frequencies recorded below. At the level of significance α = 0.1, test the fairness of the die.
13.8.4 In a certain genetic experiment, two different varieties of a certain species are crossed and a specific characteristic of the offspring can only occur at three levels A, B and C, say. According to a proposed model, the probabilities for A, B and C are 1/12, 3/12 and 8/12, respectively. Out of 60 offspring, 6, 18 and 36 fall into levels A, B and C, respectively. Test the validity of the proposed model at the level of significance α = 0.05.
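For instance, the chi-square statistic for this exercise can be computed directly (the χ²_2 critical value 5.991 is taken from standard tables):

```python
# Pearson chi-square statistic for Exercise 13.8.4.
observed = [6, 18, 36]
model_probs = [1 / 12, 3 / 12, 8 / 12]
n = sum(observed)                        # 60 offspring
expected = [n * p for p in model_probs]  # [5.0, 15.0, 40.0]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# chi_sq = 0.2 + 0.6 + 0.4 = 1.2, with k - 1 = 2 degrees of freedom.
# The 0.95 quantile of chi^2_2 is 5.991, so H is accepted at alpha = 0.05.
accept = chi_sq <= 5.991
```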
13.8.5 Course work grades are often assumed to be normally distributed. In a certain class, suppose that letter grades are given in the following manner: A for grades in [90, 100], B for grades in [75, 89], C for grades in [60, 74], D for grades in [50, 59] and F for grades in [0, 49]. Use the data given below to check the assumption that the data are coming from an N(75, 9²) distribution. For this purpose, employ the appropriate χ² test and take α = 0.05.
13.8.6 It is often assumed that I.Q. scores of human beings are normally distributed. Test this claim for the data given below by choosing the Normal distribution appropriately and taking α = 0.05.

x ≤ 90   90 < x ≤ 100   100 < x ≤ 110   110 < x ≤ 120   120 < x ≤ 130   x > 130

(Hint: Estimate μ and σ² from the grouped data; take the midpoints for the finite intervals and the points 65 and 160 for the leftmost and rightmost intervals, respectively.)
13.8.7 Consider a group of 100 people living and working under very similar conditions. Half of them are given a preventive shot against a certain disease and the other half serve as control. Of those who received the treatment, 40 did not contract the disease whereas the remaining 10 did so. Of those not treated, 30 did contract the disease and the remaining 20 did not. Test effectiveness of the vaccine at the level of significance α = 0.05.
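A sketch of the corresponding 2×2 contingency-table computation (the χ²_1 critical value 3.841 is taken from standard tables):

```python
# 2x2 contingency-table chi-square for Exercise 13.8.7.
# Rows: treated / not treated; columns: no disease / disease.
table = [[40, 10],
         [20, 30]]
n = sum(sum(row) for row in table)              # 100
row_totals = [sum(row) for row in table]        # [50, 50]
col_totals = [sum(col) for col in zip(*table)]  # [60, 40]
chi_sq = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2)
)
# chi_sq = 16.67 with (r - 1)(s - 1) = 1 degree of freedom; the 0.95 quantile
# of chi^2_1 is 3.841, so independence is rejected at alpha = 0.05
# (the vaccine appears effective).
```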
13.8.8 On the basis of the following scores, appropriately taken, test whether there are gender-associated differences in mathematical ability (as is often claimed!). Take α = 0.05.

Boys: 80 96 98 87 75 83 70 92 97 82
Girls: 82 90 84 70 80 97 76 90 88 86

(Hint: Group the grades into the following intervals: [70, 75), [75, 80), [80, 85), [85, 90), [90, 100).)
13.8.9 From each of four political wards of a city with approximately the same number of voters, 100 voters were chosen at random and their opinions were asked regarding a certain legislative proposal. On the basis of the data given below, test whether the fractions of voters favoring the legislative proposal under consideration differ in the four wards. Take α = 0.05.
13.9 Decision-Theoretic Viewpoint of Testing Hypotheses
For the definition of a decision, loss and risk function, the reader is referred to Section 6 of Chapter 12.
Let X_1, ..., X_n be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ^r, and let ω be a (measurable) subset of Ω. Then the hypothesis to be tested is H: θ ∈ ω against the alternative A: θ ∈ ω^c. Let B be a critical region. Then, by setting z = (x_1, ..., x_n)′, in the present context a non-randomized decision function δ = δ(z) is defined as follows:

δ(z) = 1 if z ∈ B, and δ(z) = 0 otherwise.
We shall confine ourselves to non-randomized decision functions only. Also, an appropriate loss function, corresponding to δ, is of the following form:

L(θ; δ(z)) = L_1 if θ ∈ ω and δ(z) = 1;  L(θ; δ(z)) = L_2 if θ ∈ ω^c and δ(z) = 0;  L(θ; δ(z)) = 0 otherwise,

where L_1, L_2 > 0. As in the point estimation case, we would like to determine a decision function δ for which the corresponding risk would be uniformly (in θ) smaller than the risk corresponding to any other decision function δ*. Since this is not feasible, except for trivial cases, we are led to minimax decision and Bayes decision functions corresponding to a given prior p.d.f. on Ω. Thus, in the case that ω = {θ_0} and ω^c = {θ_1}, δ is minimax if

max[R(θ_0; δ), R(θ_1; δ)] ≤ max[R(θ_0; δ*), R(θ_1; δ*)]

for any other decision function δ*.
Regarding the existence of minimax decision functions, we have the result below. The r.v.'s X_1, ..., X_n form a sample whose p.d.f. is either f(·; θ_0) or else f(·; θ_1). By setting f_0 = f(·; θ_0) and f_1 = f(·; θ_1), we have

THEOREM 7  Let X_1, ..., X_n be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω = {θ_0, θ_1}. We are interested in testing the hypothesis H: θ = θ_0 against the alternative A: θ = θ_1. Define the subset B of ℝ^n by B = {z; f(z; θ_1) > C f(z; θ_0)}, where C is chosen so that

L_1 P_{θ_0}(Z ∈ B) = L_2 P_{θ_1}(Z ∈ B^c), (46)

and set α = P_{θ_0}(Z ∈ B). Then the decision function

δ(z) = 1 if z ∈ B, and δ(z) = 0 otherwise, (47)

is minimax.
PROOF  Let A be any other (measurable) subset of ℝ^n and let δ* be the corresponding decision function. Then

R(θ_0; δ*) = L_1 P_0(Z ∈ A) and R(θ_1; δ*) = L_2 P_1(Z ∈ A^c).

Consider R(θ_0; δ) and R(θ_0; δ*), and suppose first that R(θ_0; δ*) ≤ R(θ_0; δ). This is equivalent to L_1 P_0(Z ∈ A) ≤ L_1 P_0(Z ∈ B), or P_0(Z ∈ A) ≤ α. Then Theorem 1 implies that P_1(Z ∈ A) ≤ P_1(Z ∈ B), because the test defined by (47) is MP in the class of all tests of level ≤ α. Hence P_1(Z ∈ A^c) ≥ P_1(Z ∈ B^c), or L_2 P_1(Z ∈ A^c) ≥ L_2 P_1(Z ∈ B^c), or equivalently, R(θ_1; δ*) ≥ R(θ_1; δ). That is,

if R(θ_0; δ*) ≤ R(θ_0; δ), then R(θ_1; δ) ≤ R(θ_1; δ*). (48)

Since by assumption R(θ_0; δ) = R(θ_1; δ), we have

max[R(θ_0; δ*), R(θ_1; δ*)] = R(θ_1; δ*) ≥ R(θ_1; δ) = max[R(θ_0; δ), R(θ_1; δ)], (49)

whereas if R(θ_0; δ) < R(θ_0; δ*), then

max[R(θ_0; δ*), R(θ_1; δ*)] ≥ R(θ_0; δ*) > R(θ_0; δ) = max[R(θ_0; δ), R(θ_1; δ)]. (50)

Relations (49) and (50) show that δ is minimax, as was to be seen. ▲
REMARK 7  It follows that the minimax decision function defined by (46) and (47) is an LR test and, in fact, is the MP test of level P_0(Z ∈ B) constructed in Theorem 1.
We close this section with a consideration of the Bayesian approach. In connection with this, it is shown that, corresponding to a given p.d.f. on Ω = {θ_0, θ_1}, there is always a Bayes decision function which is actually an LR test. More precisely, we have

THEOREM 8  Let X_1, ..., X_n be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω = {θ_0, θ_1}, and let λ_0 = {p_0, p_1} (0 < p_0 < 1) be a probability distribution on Ω. Then, for testing the hypothesis H: θ = θ_0 against the alternative A: θ = θ_1, there exists a Bayes decision function δ_{λ_0} corresponding to λ_0 = {p_0, p_1}, that is, a decision rule which minimizes the average risk R(θ_0; δ)p_0 + R(θ_1; δ)p_1, and it is given by

δ_{λ_0}(z) = 1 if z ∈ B, and δ_{λ_0}(z) = 0 otherwise,

where B = {z; f(z; θ_1) > C f(z; θ_0)} with C = p_0 L_1 / (p_1 L_2).

REMARK 8  It follows that the Bayes decision function is an LR test and is, in fact, the MP test for testing H against A at the level P_0(Z ∈ B), as follows by Theorem 1.
The following examples are meant as illustrations of Theorems 7 and 8.

EXAMPLE 13  Let X_1, ..., X_n be i.i.d. r.v.'s from N(θ, 1). We are interested in determining the minimax decision function δ for testing the hypothesis H: θ = θ_0 against the alternative A: θ = θ_1 (θ_0 < θ_1). We have

f(z; θ_1)/f(z; θ_0) = exp[(θ_1 − θ_0) Σ_{j=1}^n x_j − (n/2)(θ_1² − θ_0²)],

so that f(z; θ_1) > C f(z; θ_0) is equivalent to x̄ > C_0, where

C_0 = (θ_0 + θ_1)/2 + log C / [n(θ_1 − θ_0)].

The cut-off point C_0 is then determined by relation (46); for the numerical values employed in this example, C_0 = 0.53, as is found by the Normal tables. Therefore the minimax decision function is given by δ(z) = 1 if x̄ > C_0 and δ(z) = 0 otherwise, and the level and the power of the test corresponding to the minimax δ are likewise found by means of the Normal tables.
EXAMPLE 14  Refer to Example 13 and determine the Bayes decision function corresponding to λ_0 = {p_0, p_1}. From the discussion in the previous example it follows that the Bayes decision function is given by δ(z) = 1 if x̄ > C_0 and δ(z) = 0 otherwise, where now C_0 is determined by Theorem 8; here the cut-off point turns out to be 0.55. The type-I error probability of this test is P_{θ_0}(X̄ > 0.55) = P[N(0, 1) > 2.75] = 1 − Φ(2.75) = 0.003, and the power of the test is P_{θ_1}(X̄ > 0.55) = P[N(0, 1) > −2.25] = Φ(2.25) = 0.9878. Therefore relation (51) gives the corresponding Bayes risk.
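These two probabilities can be checked numerically; the sketch below assumes n = 25, θ_0 = 0 and θ_1 = 1, the values implied by the figures 2.75 and −2.25 above:

```python
import math

def std_normal_cdf(x):
    """Phi(x), the standard Normal c.d.f., via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Cut-off point 0.55 for the sample mean of n = 25 observations from N(theta, 1).
n, cutoff, theta0, theta1 = 25, 0.55, 0.0, 1.0
type_I = 1.0 - std_normal_cdf(math.sqrt(n) * (cutoff - theta0))  # P[N(0,1) > 2.75]
power = 1.0 - std_normal_cdf(math.sqrt(n) * (cutoff - theta1))   # P[N(0,1) > -2.25]
# type_I is about 0.003 and power about 0.9878, agreeing with the text.
```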
EXAMPLE 15  Let X_1, ..., X_n be i.i.d. r.v.'s from B(1, θ), and consider the problem of testing H: θ = 0.5 against A: θ = 0.75. Here

f(z; θ_1)/f(z; θ_0) = (θ_1/θ_0)^x [(1 − θ_1)/(1 − θ_0)]^{n−x}, where x = Σ_{j=1}^n x_j,

so that the minimax decision function is again determined by a cut-off point C_0 for x. The r.v. X = Σ_{j=1}^n X_j is B(n, θ) and, for C_0 = 13, we have P_{0.5}(X > 13) = 0.0577 and P_{0.75}(X > 13) = 0.7858, so that P_{0.75}(X ≤ 13) = 0.2142. With the chosen values of L_1 and L_2, it follows then that relation (46) is satisfied. Therefore the minimax decision function is determined by δ(z) = 1 if x > 13 and δ(z) = 0 otherwise. Furthermore, the minimax risk is equal to 0.5 × 0.2142 = 0.1071.
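The quoted binomial tail probabilities can be reproduced exactly; the sketch below assumes n = 20 trials, the sample size consistent with the values 0.0577 and 0.7858:

```python
from math import comb

def binom_tail(n, theta, c):
    """P(X > c) for X ~ B(n, theta), computed exactly."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(c + 1, n + 1))

# Assumed sample size n = 20, cut-off C0 = 13 as in the example.
p0 = binom_tail(20, 0.50, 13)   # about 0.0577
p1 = binom_tail(20, 0.75, 13)   # about 0.7858
risk = 0.5 * (1 - p1)           # 0.5 * 0.2142, about 0.1071
```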
14.1 Some Basic Theorems of Sequential Sampling
In all of the discussions so far, the random sample Z_1, ..., Z_n, say, that we have dealt with was assumed to be of fixed size n. Thus, for example, in the point estimation and testing hypotheses problems the sample size n was fixed beforehand, then the relevant random experiment was supposed to have been independently repeated n times and finally, on the basis of the outcomes, a point estimate or a test was constructed with certain optimal properties. Now, whereas in some situations the random experiment under consideration cannot be repeated at will, in many other cases this is, indeed, possible. In the latter case, as a rule, it is advantageous not to fix the sample size in advance, but to keep sampling and terminate the experiment according to a (random) stopping time.
DEFINITION 1  Let {Z_n} be a sequence of r.v.'s. A stopping time (defined on this sequence) is a positive integer-valued r.v. N such that, for each n, the event (N = n) depends on the r.v.'s Z_1, ..., Z_n alone.
REMARK 1  In certain circumstances, a stopping time N is also allowed to take the value ∞, but with probability equal to zero. In such a case, and when forming EN, the term ∞ · 0 appears, but that is interpreted as 0 and no problem arises.
Next, suppose we observe the r.v.'s Z_1, Z_2, ... one after another, a single one at a time (sequentially), and we stop observing them after a specified event occurs. In connection with such a sampling scheme, we have the following definition.

DEFINITION 2  A sampling procedure which terminates according to a stopping time is called a sequential procedure.

Thus a sequential procedure terminates with the r.v. Z_N, where Z_N is defined as follows:
the value of Z_N at s ∈ S is equal to Z_{N(s)}(s). (1)

Quite often the partial sums S_N = Z_1 + ··· + Z_N, defined by

S_N(s) = Z_1(s) + ··· + Z_{N(s)}(s), s ∈ S, (2)

are of interest, and one of the problems associated with them is that of finding the expectation ES_N of the r.v. S_N. Under suitable regularity conditions, this expectation is provided by a formula due to Wald.

THEOREM 1  (Wald's lemma for sequential analysis) For j ≥ 1, let Z_j be independent r.v.'s (not necessarily identically distributed) with identical first moments such that E|Z_j| = M < ∞, so that EZ_j = μ is also finite. Let N be a stopping time, defined on the sequence {Z_j}, j ≥ 1, and assume that EN is finite. Then E|S_N| < ∞ and ES_N = μEN, where S_N is defined by (2) and Z_N is defined by (1).
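Wald's lemma can be illustrated by simulation; a minimal sketch with uniform Z_j's and a two-barrier stopping time (all numerical choices here are illustrative):

```python
import random

def stopped_sum(mu, c1, c2, rng):
    """Sample (S_N, N), where N is the first n with S_n <= c1 or S_n >= c2,
    for i.i.d. Z_j uniform on (mu - 1, mu + 1), so that E Z_j = mu."""
    s, n = 0.0, 0
    while c1 < s < c2:
        s += rng.uniform(mu - 1.0, mu + 1.0)
        n += 1
    return s, n

rng = random.Random(0)
mu, reps = 0.3, 20000
draws = [stopped_sum(mu, -5.0, 5.0, rng) for _ in range(reps)]
mean_SN = sum(s for s, _ in draws) / reps
mean_N = sum(n for _, n in draws) / reps
# Wald's lemma: E S_N = mu * E N, so the two sample quantities should be close.
```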
The proof of the theorem is simplified by first formulating and proving a lemma. For this purpose, set Y_j = Z_j − μ, j ≥ 1. Then the r.v.'s Y_1, Y_2, ... are independent, EY_j = 0, and they have (common) finite absolute moment of first order, to be denoted by m; that is, E|Y_j| = m < ∞. Also set T_N = Y_1 + ··· + Y_N, where Y_N and T_N are defined in a way similar to the way Z_N and S_N are defined by (1) and (2), respectively. Then we will show that

E|T_N| < ∞ and ET_N = 0. (3)

In all that follows, it is assumed that all conditional expectations, given N = n, are finite for all n for which P(N = n) > 0. We set E(Y_j|N = n) = 0 (accordingly, E(|Y_j| | N = n) = 0) for those n's for which P(N = n) = 0.

LEMMA  In the notation introduced above:

i) Σ_{j=1}^∞ Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n) = mEN < ∞;
ii) the double series in (i) may be summed in either order, with the same (finite) sum.

PROOF OF LEMMA  i) The event (N = n) depends only on Y_1, ..., Y_n, and hence, for j > n, E(|Y_j| | N = n) = E|Y_j| = m. Therefore, for each j,

m = E|Y_j| = Σ_{n=1}^∞ E(|Y_j| | N = n)P(N = n) = mP(N ≤ j − 1) + Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n),

so that Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n) = mP(N ≥ j). Summing over j, we obtain

Σ_{j=1}^∞ Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n) = m Σ_{j=1}^∞ P(N ≥ j) = mEN < ∞, (6)

where the equality Σ_{j=1}^∞ P(N ≥ j) = Σ_{j=1}^∞ jP(N = j) = EN is shown in Exercise 14.1.1. Relation (6) establishes part (i).

ii) By setting p_jn = E(|Y_j| | N = n)P(N = n), this part asserts that the nonnegative double series Σ_j Σ_{n≥j} p_jn may be summed in either order; this follows from part (i) and a standard result on double series (see, for example, T. M. Apostol, Mathematical Analysis, Addison-Wesley, 1957). ▲
PROOF OF THEOREM 1  Since T_N = S_N − μN, it suffices to show (3). To this end, we have

ET_N = Σ_{n=1}^∞ E(T_N | N = n)P(N = n) = Σ_{n=1}^∞ Σ_{j=1}^n E(Y_j | N = n)P(N = n),

and, by the lemma, this double series converges absolutely, so that the order of summation may be interchanged:

ET_N = Σ_{j=1}^∞ Σ_{n=j}^∞ E(Y_j | N = n)P(N = n) = Σ_{j=1}^∞ [EY_j − Σ_{n=1}^{j−1} E(Y_j | N = n)P(N = n)] = 0.

This is so because the event (N = n) depends only on Y_1, ..., Y_n, so that, for j > n, E(Y_j | N = n) = EY_j = 0. The same computation applied to the |Y_j|'s gives E|T_N| ≤ mEN < ∞. These relations complete the proof of the theorem. ▲
Now consider any r.v.'s Z_1, Z_2, ... and let C_1, C_2 be two constants such that C_1 < C_2. Set S_n = Z_1 + ··· + Z_n and define the random quantity N as follows: N is the smallest value of n for which S_n ≤ C_1 or S_n ≥ C_2. If C_1 < S_n < C_2 for all n, then set N = ∞. In other words, for each s ∈ S, the value of N at s, N(s), is assigned as follows: look at S_n(s) for n ≥ 1, and find the first n, N = N(s), say, for which S_N(s) ≤ C_1 or S_N(s) ≥ C_2; if C_1 < S_n(s) < C_2 for all n, then set N(s) = ∞. Then we have the following result.

THEOREM 2  Let Z_1, Z_2, ... be i.i.d. r.v.'s such that P(Z_j = 0) ≠ 1. Set S_n = Z_1 + ··· + Z_n and, for two constants C_1, C_2 with C_1 < C_2, define the r. quantity N as the smallest n for which S_n ≤ C_1 or S_n ≥ C_2; set N = ∞ if C_1 < S_n < C_2 for all n. Then there exist c > 0 and 0 < r < 1 such that

P(N > n) ≤ cr^n, n = 1, 2, .... (13)
PROOF  The assumption P(Z_j = 0) ≠ 1 implies that P(Z_j > 0) > 0 or P(Z_j < 0) > 0. Let us suppose first that P(Z_j > 0) > 0. Then there exists ε > 0 such that P(Z_j > ε) = δ > 0. In fact, if P(Z_j > ε) = 0 for every ε > 0, then, in particular, P(Z_j > 1/n) = 0 for all n. But (Z_j > 1/n) ↑ (Z_j > 0), and hence 0 = P(Z_j > 1/n) → P(Z_j > 0) > 0, a contradiction. Thus, for the case that P(Z_j > 0) > 0, we have that

there exists ε > 0 such that P(Z_j > ε) = δ > 0. (14)

With C_1, C_2 as in the theorem and ε as in (14), there exists a positive integer m such that

mε > C_2 − C_1. (15)

For j = 0, 1, ..., we have

(Z_{jm+1} > ε, ..., Z_{(j+1)m} > ε) ⊆ (Z_{jm+1} + ··· + Z_{(j+1)m} > mε) ⊆ (Z_{jm+1} + ··· + Z_{(j+1)m} > C_2 − C_1), (16)

the first inclusion being obvious because there are m Z's, each one of which is greater than ε, and the second inclusion being true because of (15). Thus

P(Z_{jm+1} + ··· + Z_{(j+1)m} > C_2 − C_1) ≥ δ^m, (17)

the inequality following from (16) and the fact that, by the independence of the Z's and (14), P(Z_{jm+1} > ε, ..., Z_{(j+1)m} > ε) = δ^m. Next,

(N > km) ⊆ (C_1 < S_i < C_2, i = 1, ..., km) ⊆ ∩_{j=0}^{k−1}(Z_{jm+1} + ··· + Z_{(j+1)m} ≤ C_2 − C_1), (18)

the first inclusion being obvious from the definition of N. The second holds because, if for some j = 0, 1, ..., k − 1 we had Z_{jm+1} + ··· + Z_{(j+1)m} > C_2 − C_1, this inequality together with S_{jm} > C_1 would imply that S_{(j+1)m} > C_2, which is in contradiction to C_1 < S_i < C_2, i = 1, ..., km. Therefore, by the independence of the Z's and (17),

P(N > km) ≤ Π_{j=0}^{k−1} P(Z_{jm+1} + ··· + Z_{(j+1)m} ≤ C_2 − C_1) ≤ (1 − δ^m)^k.

For arbitrary n, choose k so that km ≤ n < (k + 1)m; then

P(N > n) ≤ P(N > km) ≤ (1 − δ^m)^k ≤ cr^n,

with r = (1 − δ^m)^{1/m} and c = (1 − δ^m)^{−1}, which is relation (13). The case P(Z_j < 0) > 0 is treated in the same way (see also Exercise 14.1.2). ▲
The theorem just proved has the following important corollary.

COROLLARY  Under the assumptions of Theorem 2, we have (i) P(N < ∞) = 1 and (ii) EN < ∞.

PROOF  i) Set A = (N = ∞) and A_n = (N > n). Then A_n ↓ A, so that P(A_n) → P(A) by Theorem 2 in Chapter 2. But P(A_n) ≤ cr^n by the theorem. Thus lim P(A_n) = 0, so that P(A) = 0, as was to be shown.

ii) We have

EN = Σ_{n=1}^∞ P(N ≥ n) ≤ Σ_{n=1}^∞ cr^{n−1} = c/(1 − r) < ∞. ▲
REMARK 2  The r.v. N is positive integer-valued, and it might also take on the value ∞, but with probability 0, by the first part of the corollary. On the other hand, from the definition of N it follows that, for each n, the event (N = n) depends only on the r.v.'s Z_1, ..., Z_n. Accordingly, N is a stopping time by Definition 1 and Remark 1.
Exercises
14.1.1 For a positive integer-valued r.v. N, show that EN = Σ_{n=1}^∞ P(N ≥ n).
14.1.2 In Theorem 2, assume that P(Z_j < 0) > 0 and arrive at relation (13).
14.2 Sequential Probability Ratio Test
Although in the point estimation and testing hypotheses problems discussed in Chapters 12 and 13, respectively (as well as in the interval estimation problems to be dealt with in Chapter 15), sampling according to a stopping time is, in general, profitable, the mathematical machinery involved is well beyond the level of this book. We are going to consider only the problem of sequentially testing a simple hypothesis against a simple alternative, as a way of illustrating the application of sequential procedures in a concrete problem.

To this end, let X_1, X_2, ... be i.i.d. r.v.'s with p.d.f. either f_0 or else f_1, and suppose that we are interested in testing the (simple) hypothesis H: the true density is f_0 against the (simple) alternative A: the true density is f_1, at level of significance α (0 < α < 1), without fixing in advance the sample size n.

In order to simplify matters, we also assume that {x ∈ ℝ; f_0(x) > 0} = {x ∈ ℝ; f_1(x) > 0}, and define the likelihood ratio

λ_n = λ_n(X_1, ..., X_n; 0, 1) = Π_{j=1}^n f_1(X_j) / Π_{j=1}^n f_0(X_j).

We shall use the same notation λ_n for λ_n(x_1, ..., x_n; 0, 1), where x_1, ..., x_n are the observed values of X_1, ..., X_n.
For testing H against A, consider the following sequential procedure: as long as a < λ_n < b, take another observation; as soon as λ_n ≤ a, stop sampling and accept H; and as soon as λ_n ≥ b, stop sampling and reject H. Here a and b are two numbers with 0 < a < b.

By letting N stand for the smallest n for which λ_n ≤ a or λ_n ≥ b, we have that N takes on the values 1, 2, ... and possibly ∞, and, clearly, for each n, the event (N = n) depends only on X_1, ..., X_n. Under suitable additional assumptions, we shall show that the value ∞ is taken on only with probability 0, so that N will be a stopping time.

The sequential procedure just described is called a sequential probability ratio test (SPRT) for obvious reasons.

In what follows, we restrict ourselves to the common set of positivity of f_0 and f_1, and for j = 1, ..., n, set

Z_j = log[f_1(X_j)/f_0(X_j)].

At this point, we also make the assumption that P_i[f_0(X_1) ≠ f_1(X_1)] > 0 for i = 0, 1; equivalently, if C is the set over which f_0 and f_1 differ, then it is assumed that ∫_C f_0(x)dx > 0 and ∫_C f_1(x)dx > 0 for the continuous case. This assumption is equivalent to P_i(Z_1 ≠ 0) > 0, under which the corollary to Theorem 2 applies. Summarizing, we have the following result.
PROPOSITION 1  Let X_1, X_2, ... be i.i.d. r.v.'s with p.d.f. either f_0 or else f_1, and suppose that P_i[f_0(X_1) ≠ f_1(X_1)] > 0, i = 0, 1. Set

Z_j = log[f_1(X_j)/f_0(X_j)], j = 1, 2, ..., and S_n = Σ_{j=1}^n Z_j.

For two numbers a and b with 0 < a < b, define the random quantity N as the smallest n for which λ_n ≤ a or λ_n ≥ b; equivalently, the smallest n for which S_n ≤ log a or S_n ≥ log b. Then P_i(N < ∞) = 1 and E_iN < ∞, i = 0, 1; that is, the SPRT terminates with probability one, with acceptance or rejection of H, regardless of the true underlying density.
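The procedure just described can be sketched as a short program; the Bernoulli densities and all numerical values below are illustrative, and the cut-off points are the approximate ones a′, b′ discussed in the next subsection:

```python
import math
import random

def sprt(stream, f0, f1, alpha, beta):
    """Wald's SPRT for H: density f0 vs A: density f1, with prescribed
    error probabilities alpha and 1 - beta, using the approximate cut-off
    points a' = (1 - beta)/(1 - alpha) and b' = beta/alpha.
    `stream` yields observations one at a time."""
    log_a = math.log((1 - beta) / (1 - alpha))
    log_b = math.log(beta / alpha)
    s, n = 0.0, 0
    for x in stream:
        n += 1
        s += math.log(f1(x) / f0(x))   # Z_j = log[f1(X_j)/f0(X_j)]
        if s <= log_a:
            return "accept H", n
        if s >= log_b:
            return "reject H", n
    raise RuntimeError("stream exhausted before the SPRT terminated")

# Hypothetical illustration: Bernoulli data, H: theta = 0.5 vs A: theta = 0.75.
f0 = lambda x: 0.5
f1 = lambda x: 0.75 if x == 1 else 0.25
rng = random.Random(1)
stream = (1 if rng.random() < 0.5 else 0 for _ in range(10_000))  # data from H
decision, n_obs = sprt(stream, f0, f1, alpha=0.01, beta=0.95)
```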
In the formulation of the proposition above, the determination of a and b was postponed until later. At this point, we shall see what the exact determination of a and b is, at least from a theoretical point of view. However, the actual identification presents difficulties, as will be seen, and the use of approximate values is often necessary.

To start with, let α and 1 − β be the prescribed probabilities of type-I and type-II errors, respectively, in testing H against A, and let α < β < 1. From their very definitions, we have
α = P_0(λ_1 ≥ b) + P_0(a < λ_1 < b, λ_2 ≥ b) + ··· + P_0(a < λ_1 < b, ..., a < λ_{n−1} < b, λ_n ≥ b) + ··· (20)

and

1 − β = P_1(λ_1 ≤ a) + P_1(a < λ_1 < b, λ_2 ≤ a) + ··· + P_1(a < λ_1 < b, ..., a < λ_{n−1} < b, λ_n ≤ a) + ···. (21)

Relations (20) and (21) allow us to determine theoretically the cut-off points a and b when α and β are given.
In order to find workable values of a and b, we proceed as follows. For each n, set

T′_n = {z ∈ ℝ^n; a < λ_j < b, j = 1, ..., n − 1, λ_n ≤ a} (22)

and

T″_n = {z ∈ ℝ^n; a < λ_j < b, j = 1, ..., n − 1, λ_n ≥ b}. (23)

In other words, T′_n is the set of points in ℝ^n for which the SPRT terminates with n observations and accepts H, while T″_n is the set of points in ℝ^n for which the SPRT terminates with n observations and rejects H.

In the remainder of this section, the arguments will be carried out for the case that the X_j's are continuous, the discrete case being treated in the same way by replacing integrals by summation signs. Also, for simplicity, the differentials in the integrals will not be indicated.
From (20), (22) and (23), and writing f_{0n} and f_{1n} for the joint densities of X_1, ..., X_n under H and A, respectively, one has

α = Σ_{n=1}^∞ ∫_{T″_n} f_{0n}.

But on T″_n, f_{1n}/f_{0n} ≥ b, so that f_{0n} ≤ (1/b)f_{1n}. Therefore

α ≤ (1/b) Σ_{n=1}^∞ ∫_{T″_n} f_{1n} = β/b, (27)

since Σ_n ∫_{T″_n} f_{1n} is the probability, under A, of rejecting H, namely β. In the same way one obtains

1 − β ≤ a(1 − α) (28)

(see Exercise 14.2.1). Hence

(1 − β)/(1 − α) ≤ a and b ≤ β/α. (29)

Relation (29) provides us with a lower bound and an upper bound for the actual cut-off points a and b, respectively.
Now set

a′ = (1 − β)/(1 − α) and b′ = β/α, (30)

and suppose that the SPRT is carried out by employing the cut-off points a′ and b′ given by (30) rather than the original ones a and b. Furthermore, let α′ and 1 − β′ be the two types of errors associated with a′ and b′. Then, replacing α, β, a and b by α′, β′, a′ and b′, respectively, in (29), and also taking into consideration (30), we obtain

(1 − β′)/(1 − α′) ≤ a′ = (1 − β)/(1 − α) and b′ = β/α ≤ β′/α′. (31)

That is,

α′ ≤ α/β and 1 − β′ ≤ (1 − β)/(1 − α), (32)

and from (31) it also follows that

α′ + (1 − β′) ≤ α + (1 − β). (33)

Summarizing the main points of our derivations, we have the following result.
PROPOSITION 2  For testing H against A by means of the SPRT with prescribed error probabilities α and 1 − β such that α < β < 1, the cut-off points a and b are determined by (20) and (21). Relation (30) provides approximate cut-off points a′ and b′, with corresponding error probabilities α′ and 1 − β′, say. Then relation (32) provides upper bounds for α′ and 1 − β′, and inequality (33) shows that their sum α′ + (1 − β′) is always bounded above by α + (1 − β).
REMARK 3  From (33) it follows that α′ > α and 1 − β′ > 1 − β cannot happen simultaneously. Furthermore, typical values of α and 1 − β are 0.01, 0.05 and 0.1, and then it follows from (32) that α′ and 1 − β′ lie close to α and 1 − β, respectively. For example, for α = 0.01 and 1 − β = 0.05, we have α′ < 0.0106 and 1 − β′ < 0.0506. So there is no serious problem as far as α′ and 1 − β′ are concerned. The only problem which may arise is that, because a′ and b′ are used instead of a and b, the resulting α′ and 1 − β′ are too small compared to α and 1 − β, respectively. As a consequence, we would be led to taking a much larger number of observations than would actually be needed to obtain α and β. It can be argued that this does not happen.
Exercise

14.2.1 Derive inequality (28) by using arguments similar to the ones employed in establishing relation (27).
14.3 Optimality of the SPRT; Expected Sample Size
An optimal property of the SPRT is stated in the following theorem, whose proof is omitted as being well beyond the scope of this book.

For testing H against A, the SPRT with error probabilities α and 1 − β minimizes the expected sample size under both H and A (that is, it minimizes E_0N and E_1N) among all tests (sequential or not) with error probabilities bounded above by α and 1 − β and for which the expected sample size is finite under both H and A.
The remaining part of this section is devoted to calculating the expected sample size of the SPRT with given error probabilities, and also to finding approximations to it.

So consider the SPRT with error probabilities α and 1 − β, and let N be the associated stopping time. Then we clearly have

E_iN = Σ_{n=1}^∞ nP_i(a < λ_j < b, j = 1, ..., n − 1, λ_n ≤ a or λ_n ≥ b), i = 0, 1. (34)

Thus formula (34) provides the expected sample size of the SPRT under both H and A, but the actual calculations are tedious. This suggests that we should try to find an approximate value of E_iN, as follows. By setting A = log a and B = log b, we have the relationships below:
S_n = log λ_n = Σ_{j=1}^n Z_j, (35)

so that

(λ_n ≤ a or λ_n ≥ b) = (S_n ≤ A or S_n ≥ B). (36)

From the right-hand side of (36), all partial sums Σ_{j=1}^i Z_j, i = 1, ..., n − 1, lie between A and B, and it is only Σ_{j=1}^n Z_j which is either ≤ A or ≥ B, and this is due to the nth observation Z_n. We would then expect S_N to be close to A or to B upon termination, so that, approximately,

E_0S_N ≈ (1 − α)A + αB and E_1S_N ≈ (1 − β)A + βB. (37)

On the other hand, by assuming that E_i|Z_1| < ∞, i = 0, 1, Theorem 1 gives E_iS_N = (E_iN)(E_iZ_1). Hence, if also E_iZ_1 ≠ 0, then E_iN = (E_iS_N)/(E_iZ_1). By virtue of (37), this becomes

E_0N ≈ [(1 − α)A + αB]/E_0Z_1 and E_1N ≈ [(1 − β)A + βB]/E_1Z_1. (38)

Thus we have the following result.

PROPOSITION 3  In the SPRT with error probabilities α and 1 − β, the expected sample size E_iN, i = 0, 1, is given by (34). If, furthermore, E_i|Z_1| < ∞ and E_iZ_1 ≠ 0, i = 0, 1, then relation (38) provides approximations to E_iN, i = 0, 1.
REMARK 4  Actually, in order to be able to calculate the approximations given by (38), it is necessary to replace A and B by their approximate values taken from (30), that is,

A ≈ log[(1 − β)/(1 − α)] and B ≈ log(β/α). (39)

Exercises

14.3.1 Test the hypothesis H against the alternative A: θ = 0.05 with α = 0.1, 1 − β = 0.05. Find the expected sample sizes under both H and A, and compare them with the fixed sample size of the MP test for testing H against A with the same α and 1 − β as above.
14.3.2 Discuss the same questions as in the previous exercise if the X_j's are independently distributed as Negative Exponential with parameter θ ∈ Ω = (0, ∞).
14.4 Some Examples
This chapter is closed with two examples. In both, the r.v.'s X_1, X_2, ... are i.i.d. with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ, and for θ_0, θ_1 ∈ Ω with θ_0 < θ_1, the problem is that of testing H: θ = θ_0 against A: θ = θ_1 by means of the SPRT with error probabilities α and 1 − β. Thus in the present case f_0 = f(·; θ_0) and f_1 = f(·; θ_1).

What we explicitly do is to set up the formal SPRT and, for selected numerical values of α and 1 − β, calculate a′, b′ and upper bounds for α′ and 1 − β′, estimate E_iN, i = 0, 1, and finally compare the estimated E_iN, i = 0, 1, with the size of the fixed sample size test with the same error probabilities.
EXAMPLE 1  Let X_1, X_2, ... be i.i.d. r.v.'s with p.d.f.

f(x; θ) = θ^x(1 − θ)^{1−x}, x = 0, 1.

Here

λ_n = (θ_1/θ_0)^{Σx_j} [(1 − θ_1)/(1 − θ_0)]^{n−Σx_j},

and we continue sampling as long as a < λ_n < b; equivalently, as long as

A < (Σ_{j=1}^n x_j) log[θ_1(1 − θ_0)/(θ_0(1 − θ_1))] + n log[(1 − θ_1)/(1 − θ_0)] < B.

Furthermore,

Z_1 = log[f(X_1; θ_1)/f(X_1; θ_0)] = X_1 log(θ_1/θ_0) + (1 − X_1) log[(1 − θ_1)/(1 − θ_0)],

so that

E_iZ_1 = θ_i log(θ_1/θ_0) + (1 − θ_i) log[(1 − θ_1)/(1 − θ_0)], i = 0, 1.

For a numerical application, take α = 0.01 and 1 − β = 0.05. Then the cut-off points a and b are approximately equal to a′ and b′, respectively, where a′ and b′ are given by (30). In the present case,

a′ = 0.05/0.99 ≈ 0.0505 and b′ = 0.95/0.01 = 95.

For the cut-off points a′ and b′, the corresponding error probabilities α′ and 1 − β′ are bounded as follows, according to (32):

α′ ≤ 0.01/0.95 ≈ 0.0106 and 1 − β′ ≤ 0.05/0.99 ≈ 0.0506.

Next, relation (39) gives

A ≈ log 0.0505 ≈ −2.99 and B ≈ log 95 ≈ 4.55.

With specific numerical values of θ_0 and θ_1, the quantities E_0Z_1 and E_1Z_1 follow from the expression above, and relation (38) then yields approximations to E_0N and E_1N.
EXAMPLE 2  Let X_1, X_2, ... be i.i.d. r.v.'s from N(θ, 1). Here

Z_1 = log[f(X_1; θ_1)/f(X_1; θ_0)] = (θ_1 − θ_0)X_1 − (θ_1² − θ_0²)/2.

By using the same values of α and 1 − β as in the previous example, we have the same A and B as before. Taking θ_0 = 0 and θ_1 = 1, we have

E_0Z_1 = −0.5 and E_1Z_1 = 0.5.

Thus relation (38) gives

E_0N ≈ 5.82 and E_1N ≈ 8.35.

Now the fixed sample size MP test is given by (13) in Chapter 13. From this we find that n ≈ 15.84. Both E_0N and E_1N compare very favorably with the fixed value of n which provides the same protection.
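The approximations of relation (38) for the Normal case can be reproduced directly; the sketch below takes θ_0 = 0, θ_1 = 1, α = 0.01 and 1 − β = 0.05, and uses natural logarithms throughout, consistently with the definition of Z_1:

```python
import math

# Approximate expected sample sizes for the SPRT, relation (38),
# with A = log a' and B = log b' from relation (30).
alpha, beta = 0.01, 0.95           # type-I error 0.01, type-II error 1 - beta = 0.05
A = math.log((1 - beta) / (1 - alpha))   # log a' = log(0.05/0.99)
B = math.log(beta / alpha)               # log b' = log(0.95/0.01)
E0_Z1, E1_Z1 = -0.5, 0.5                 # E_i Z_1 for Z_1 = X_1 - 1/2
E0_N = ((1 - alpha) * A + alpha * B) / E0_Z1
E1_N = ((1 - beta) * A + beta * B) / E1_Z1
# E0_N is about 5.82 and E1_N about 8.35, both well below the fixed
# sample size n of about 15.84 of the MP test with the same error probabilities.
```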
15.1 Confidence Intervals
Let X_1, ..., X_n be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ^r. In Chapter 12, we considered the problem of point estimation of a real-valued function of θ, g(θ). That is, we considered the problem of estimating g(θ) by a statistic (based on the X's) having certain optimality properties.

In the present chapter, we return to the estimation problem, but in a different context. First, we consider the case that θ is a real-valued parameter and proceed to define what is meant by a random interval and a confidence interval.
DEFINITION 1  A random interval is a finite or infinite interval, where at least one of the end points is an r.v.

DEFINITION 2  Let L(X_1, ..., X_n) and U(X_1, ..., X_n) be two statistics such that L(X_1, ..., X_n) ≤ U(X_1, ..., X_n). We say that the r. interval [L(X_1, ..., X_n), U(X_1, ..., X_n)] is a confidence interval for θ with confidence coefficient 1 − α (0 < α < 1) if

P_θ[L(X_1, ..., X_n) ≤ θ ≤ U(X_1, ..., X_n)] ≥ 1 − α for all θ ∈ Ω. (1)

Also, we say that U(X_1, ..., X_n) and L(X_1, ..., X_n) is an upper and a lower confidence limit for θ, respectively, with confidence coefficient 1 − α, if, for all θ ∈ Ω,

P_θ[θ ≤ U(X_1, ..., X_n)] ≥ 1 − α and P_θ[L(X_1, ..., X_n) ≤ θ] ≥ 1 − α. (2)

Thus, with probability at least 1 − α, the r. interval [L(X_1, ..., X_n), U(X_1, ..., X_n)] covers the parameter θ, no matter what θ ∈ Ω is.
The interpretation of this statement is as follows: suppose that the r. experiment under consideration is carried out independently n times and, if x_j is the observed value of X_j, j = 1, ..., n, construct the interval [L(x_1, ..., x_n), U(x_1, ..., x_n)]. Suppose now that this process is repeated independently N times, so that we obtain N intervals. Then, as N gets larger and larger, at least (1 − α)N of the N intervals will cover the true parameter θ. A similar interpretation holds true for an upper and a lower confidence limit of θ.
REMARK 1  By relations (1) and (2), and the fact that

P_θ[L(X_1, ..., X_n) ≤ θ ≤ U(X_1, ..., X_n)] = P_θ[θ ≤ U(X_1, ..., X_n)] + P_θ[L(X_1, ..., X_n) ≤ θ] − 1,

it follows that, if L(X_1, ..., X_n) and U(X_1, ..., X_n) is a lower and an upper confidence limit for θ, respectively, each with confidence coefficient 1 − α/2, then [L(X_1, ..., X_n), U(X_1, ..., X_n)] is a confidence interval for θ with confidence coefficient 1 − α.

Since the length of a confidence interval measures the accuracy with which θ is located, it is natural that we would be interested in finding the shortest confidence interval within a certain class of confidence intervals. This will be done explicitly in a number of interesting examples.
At this point, it should be pointed out that a general procedure for constructing a confidence interval is as follows: we start out with an r.v. T_n(θ) = T(X_1, ..., X_n; θ) which depends on θ and on the X's only through a sufficient statistic for θ, and whose distribution, under P_θ, is completely determined. Then L_n = L(X_1, ..., X_n) and U_n = U(X_1, ..., X_n) are some rather simple functions of T_n(θ) which are chosen in an obvious manner. The examples which follow illustrate the point.
In each of the examples below, we construct a confidence interval (and also the shortest confidence interval within a certain class) for θ with confidence coefficient 1 − α.
EXAMPLE 1 Let X1, , X n be i.i.d r.v.’s from N(μ, σ2
) First, suppose that σ is known, sothatμ is the parameter, and consider the r.v T n(μ) = √n(X¯ − μ)/σ Then T n(μ)
depends on the X’s only through the sufficient statistic X¯ of μ and its
distribu-tion is N(0, 1) for all μ
Next, determine two numbers a and b (a < b) such that

P[a ≤ N(0, 1) ≤ b] = 1 − α.  (3)

From (3), we have P[X̄ − bσ/√n ≤ μ ≤ X̄ − aσ/√n] = 1 − α, so that

[X̄ − bσ/√n, X̄ − aσ/√n]  (4)

is a confidence interval for μ with confidence coefficient 1 − α. Its length is equal to (b − a)σ/√n. From this it follows that, among all confidence intervals with confidence coefficient 1 − α which are of the form (4), the shortest one is that for which b − a is smallest, where a and b satisfy (3). It can be seen (see also Exercise 15.2.1) that this happens if b = c (> 0) and a = −c, where c is the upper α/2 quantile of the N(0, 1) distribution, which we denote by zα/2. Therefore the shortest confidence interval for μ with confidence coefficient 1 − α (and which is of the form (4)) is given by

[X̄ − zα/2·σ/√n, X̄ + zα/2·σ/√n].  (5)
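As a quick numerical check of this interval, the quantile zα/2 and the half-length can be computed with Python's standard library; the function name below is ours, chosen for illustration, and the sketch assumes the known-σ setting just described.

```python
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, alpha=0.05):
    """Shortest confidence interval for mu when sigma is known:
    [xbar - z_{alpha/2}*sigma/sqrt(n), xbar + z_{alpha/2}*sigma/sqrt(n)]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile z_{alpha/2}
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half

# For n = 25, sigma = 1, alpha = 0.05 the half-length is z_{0.025}/5,
# which matches the value 0.392 in the numerical application below.
lo, hi = mean_ci_known_sigma(xbar=0.0, sigma=1.0, n=25)
```
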
Next, assume that μ is known, so that σ² is the parameter, and consider the r.v.

Tn(σ²) = nSₙ²/σ², where Sₙ² = (1/n) Σⱼ₌₁ⁿ (Xj − μ)².

Then Tn(σ²) depends on the X's only through the sufficient statistic Sₙ² of σ², and its distribution is χ²ₙ for all σ². Now determine two numbers a and b (0 < a < b) such that

P(a ≤ χ²ₙ ≤ b) = 1 − α.  (6)

From (6), we obtain P[nSₙ²/b ≤ σ² ≤ nSₙ²/a] = 1 − α, so that

[nSₙ²/b, nSₙ²/a]  (7)

is a confidence interval for σ² with confidence coefficient 1 − α and its length is equal to (1/a − 1/b)nSₙ². The expected length is equal to (1/a − 1/b)nσ².
Now, although there are infinitely many pairs of numbers a and b satisfying (6), in practice they are often chosen by assigning mass α/2 to each one of the tails of the χ²ₙ distribution. However, this is not the best choice, because then the corresponding interval (7) is not the shortest one. For the determination of the shortest confidence interval, we work as follows. From (6), it is obvious that a and b are not independent of each other; one is a function of the other. So let b = b(a). Since the length of the confidence interval in (7) is l = (1/a − 1/b)nSₙ², it clearly follows that the a for which l is shortest is given by dl/da = 0, which is equivalent to

db/da = b²/a².  (8)

On the other hand, differentiating both sides of (6) with respect to a, where gₙ denotes the p.d.f. of the χ²ₙ distribution, we obtain gₙ(b)(db/da) − gₙ(a) = 0, so that db/da = gₙ(a)/gₙ(b). Thus (8) becomes a²gₙ(a) = b²gₙ(b). By means of this result and (6), it follows that a and b are determined by

a²gₙ(a) = b²gₙ(b) and ∫ₐᵇ gₙ(t) dt = 1 − α.  (9)
(These equations must be solved numerically; tables of the solutions are given in the paper "Optimal confidence intervals for the variance of a normal distribution," Journal of the American Statistical Association, 1959, Vol. 54, pp. 674–682, for n = 2(1)29 and 1 − α = 0.90, 0.95, 0.99, 0.995, 0.999.)
To summarize then, the shortest (both in actual and expected length) confidence interval for σ² with confidence coefficient 1 − α (and which is of the form (7)) is given by

[nSₙ²/b, nSₙ²/a],

where a and b are determined by (9).
As a numerical application, let n = 25, σ = 1, and 1 − α = 0.95. Then zα/2 = 1.96, so that (5) gives [X̄ − 0.392, X̄ + 0.392]. Next, for the equal-tails confidence interval given by (7), we have a = 13.120 and b = 40.646, so that the equal-tails confidence interval itself is given by

[25S₂₅²/40.646, 25S₂₅²/13.120],

and the ratio of the lengths of the equal-tails and the shortest confidence intervals is approximately 1.07.
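The endpoint values and the length ratio quoted above can be reproduced numerically with the standard library alone. The sketch below (the helper names are ours, not from the text) builds the χ² c.d.f. from the regularized incomplete gamma function, inverts it for the equal-tails endpoints, and solves equations (9) for the shortest pair, using the fact that x²gₙ(x) is proportional to the χ²ₙ₊₄ density, so the first equation in (9) becomes gₙ₊₄(a) = gₙ₊₄(b).

```python
import math

def gamma_p(s, x, eps=1e-12, itmax=300):
    """Regularized lower incomplete gamma function P(s, x)."""
    if x <= 0.0:
        return 0.0
    lg = math.lgamma(s)
    if x < s + 1.0:                      # power-series branch
        term = total = 1.0 / s
        a = s
        for _ in range(itmax):
            a += 1.0
            term *= x / a
            total += term
            if abs(term) < abs(total) * eps:
                break
        return total * math.exp(-x + s * math.log(x) - lg)
    # modified-Lentz continued fraction for Q(s, x); then P = 1 - Q
    tiny = 1e-300
    b = x + 1.0 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, itmax + 1):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return 1.0 - math.exp(-x + s * math.log(x) - lg) * h

def chi2_cdf(x, k):
    return gamma_p(k / 2.0, x / 2.0)

def chi2_pdf(x, k):
    return math.exp((k / 2.0 - 1.0) * math.log(x) - x / 2.0
                    - math.lgamma(k / 2.0) - (k / 2.0) * math.log(2.0))

def chi2_ppf(p, k):
    """Quantile by bisection on the c.d.f."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def shortest_chi2_pair(k, conf):
    """Solve (9): a^2 g_k(a) = b^2 g_k(b) with CDF(b) - CDF(a) = conf.
    The first equation is g_{k+4}(a) = g_{k+4}(b), with a and b on either
    side of the chi2_{k+4} mode k + 2; coverage is decreasing in a."""
    mode = k + 2.0

    def b_of(a):
        target = chi2_pdf(a, k + 4)
        hi = mode
        while chi2_pdf(hi, k + 4) > target:   # step right until density drops
            hi += k
        lo = mode
        for _ in range(200):                  # pdf is decreasing on [mode, hi]
            mid = 0.5 * (lo + hi)
            if chi2_pdf(mid, k + 4) > target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lo, hi = 1e-9, mode
    for _ in range(200):
        a = 0.5 * (lo + hi)
        if chi2_cdf(b_of(a), k) - chi2_cdf(a, k) > conf:
            lo = a
        else:
            hi = a
    a = 0.5 * (lo + hi)
    return a, b_of(a)

# Equal-tails endpoints for n = 25, 1 - alpha = 0.95, and the length ratio.
a_eq, b_eq = chi2_ppf(0.025, 25), chi2_ppf(0.975, 25)
a_sh, b_sh = shortest_chi2_pair(25, 0.95)
ratio = (1 / a_eq - 1 / b_eq) / (1 / a_sh - 1 / b_sh)
```
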
EXAMPLE 2 Let X1, ..., Xn be i.i.d. r.v.'s from the Gamma distribution with parameter β and with α a known positive integer, call it r. Then Σⱼ₌₁ⁿ Xj is a sufficient statistic for β (see Exercise 11.1.2(iii), Chapter 11). Furthermore, for each j = 1, ..., n, the r.v. 2Xj/β is χ²₂ᵣ, so that, by independence,

(2/β) Σⱼ₌₁ⁿ Xj is distributed as χ²₂ᵣₙ.
Therefore, with a and b (0 < a < b) determined by

P(a ≤ χ²₂ᵣₙ ≤ b) = 1 − α,  (10)

the interval

[2Σⱼ₌₁ⁿ Xj/b, 2Σⱼ₌₁ⁿ Xj/a]  (11)

is a confidence interval for β with confidence coefficient 1 − α, and its length is l = (1/a − 1/b)·2Σⱼ₌₁ⁿ Xj.
In order to determine the shortest confidence interval, one has to minimize l subject to (10). But this is the same problem as the one we solved in the second part of Example 1. It follows then that the shortest (both in actual and expected length) confidence interval with confidence coefficient 1 − α (which is of the form (11)) is given by (11) with a and b determined by

a²g₂ᵣₙ(a) = b²g₂ᵣₙ(b) and ∫ₐᵇ g₂ᵣₙ(t) dt = 1 − α,

where g₂ᵣₙ is the p.d.f. of the χ²₂ᵣₙ distribution.
For instance, for n = 7, r = 2, and 1 − α = 0.95, we have, by means of the tables cited in Example 1, a = 16.5128 and b = 49.3675. Thus the corresponding shortest confidence interval is

[2Σⱼ₌₁⁷ Xj/49.3675, 2Σⱼ₌₁⁷ Xj/16.5128],

and the ratio of the lengths of the equal-tails and the shortest confidence intervals is approximately equal to 1.075.
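A small simulation (illustrative only; the function names and the choice β = 3 are ours) checks that the interval (11) with the quoted endpoints a = 16.5128 and b = 49.3675 covers β about 95% of the time. A Gamma r.v. with integer shape r is generated as a sum of r exponentials.

```python
import random

def gamma_beta_ci(xs, a=16.5128, b=49.3675):
    """Interval [2*sum(X)/b, 2*sum(X)/a] for the Gamma scale parameter beta,
    based on the pivot 2*sum(X_j)/beta ~ chi-square with 2rn d.f."""
    t = 2.0 * sum(xs)
    return t / b, t / a

def coverage(beta=3.0, n=7, r=2, reps=20000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Gamma(r, beta) with integer shape r = sum of r Exp(mean beta) draws
        xs = [sum(rng.expovariate(1.0 / beta) for _ in range(r)) for _ in range(n)]
        lo, hi = gamma_beta_ci(xs)
        hits += lo <= beta <= hi
    return hits / reps
```
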
EXAMPLE 3 Let X1, ..., Xn be i.i.d. r.v.'s from the Beta distribution with β = 1 and α = θ unknown. Then Πⱼ₌₁ⁿ Xj, or −Σⱼ₌₁ⁿ log Xj, is a sufficient statistic for θ. (See Exercise 11.1.2(iv) in Chapter 11.) Consider the r.v. Yj = −2θ log Xj. It is easily seen that its p.d.f. is (1/2)exp(−yj/2), yj > 0, which is the p.d.f. of a χ²₂. This shows that

−2θ Σⱼ₌₁ⁿ log Xj is distributed as χ²₂ₙ.

Thus, with a and b (0 < a < b) determined by

P(a ≤ χ²₂ₙ ≤ b) = 1 − α,  (12)

the interval

[a/(−2Σⱼ₌₁ⁿ log Xj), b/(−2Σⱼ₌₁ⁿ log Xj)]  (13)

is a confidence interval for θ with confidence coefficient 1 − α, and its length is l = (b − a)/(−2Σⱼ₌₁ⁿ log Xj).
Considering dl/da = 0 in conjunction with (12), in the same way as it was done in Example 2, we have that the shortest (both in actual and expected length) confidence interval (which is of the form (13)) is found by numerically solving the equations

g₂ₙ(a) = g₂ₙ(b) and ∫ₐᵇ g₂ₙ(t) dt = 1 − α,

where g₂ₙ is the p.d.f. of the χ²₂ₙ distribution. However, no tables which would facilitate this solution are available.
For example, for n = 25 and 1 − α = 0.95, the equal-tails confidence interval for θ is given by (13) with a = 32.357 and b = 71.420.
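The same kind of check applies here (again a sketch with our own function names and an arbitrary θ = 2): since X = U^(1/θ) with U uniform on (0, 1) has the Beta(θ, 1) p.d.f. θx^(θ−1), the coverage of (13) with the equal-tails endpoints just quoted can be estimated by simulation.

```python
import math
import random

def beta_theta_ci(xs, a=32.357, b=71.420):
    """Equal-tails interval (13) for theta, from the pivot
    -2*theta*sum(log X_j) ~ chi-square with 2n d.f."""
    s = -2.0 * sum(math.log(x) for x in xs)
    return a / s, b / s

def coverage(theta=2.0, n=25, reps=20000, seed=7):
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.random() ** (1.0 / theta) for _ in range(n)]  # X = U^(1/theta)
        lo, hi = beta_theta_ci(xs)
        hits += lo <= theta <= hi
    return hits / reps
```
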
EXAMPLE 4 Let X1, ..., Xn be i.i.d. r.v.'s from U(0, θ). Then Yn = X(n) is a sufficient statistic for θ (see Example 7, Chapter 11), and its p.d.f. gn is given by

gn(y) = n yⁿ⁻¹/θⁿ, 0 ≤ y ≤ θ (and 0 otherwise).  (14)

From (14), we get Pθ(a ≤ Yn/θ ≤ b) = bⁿ − aⁿ, so that a and b (0 < a < b ≤ 1) satisfying bⁿ − aⁿ = 1 − α give Pθ(a ≤ Yn/θ ≤ b) = 1 − α, which is equivalent to Pθ[X(n)/b ≤ θ ≤ X(n)/a] = 1 − α. Therefore a confidence interval for θ with confidence coefficient 1 − α is given by

[X(n)/b, X(n)/a].  (15)

Its length is X(n)(1/a − 1/b), and it can be shown that, subject to bⁿ − aⁿ = 1 − α, this is minimized by taking b = 1 and a = α^(1/n), so that the shortest confidence interval of the form (15) is [X(n), X(n)·α^(−1/n)]. For further results, see the paper by Guenther on "Shortest confidence intervals" in The American Statistician, 1969, Vol. 23, Number 1.
Exercises
15.2.1 Let Φ be the d.f. of the N(0, 1) distribution and let a and b with a < b be such that Φ(b) − Φ(a) = γ (0 < γ < 1). Show that b − a is minimum if b = c (> 0) and a = −c. (See also the discussion of the second part of Example 1.)
15.2.2 Let X1, ..., Xn be independent r.v.'s having the Negative Exponential distribution with parameter θ ∈ Ω = (0, ∞), and set U = Σᵢ₌₁ⁿ Xi.
ii) Show that the r.v. U is distributed as Gamma with parameters (n, θ) and that the r.v. 2U/θ is distributed as χ²₂ₙ;