13.8 Applications of LR Tests: Contingency Tables, Goodness-of-Fit Tests
Now we turn to a slightly different hypothesis testing problem, where the LR test is also appropriate. We consider a random experiment which may result in k possibly different outcomes denoted by O_j, j = 1, ..., k. In n independent repetitions of the experiment, let p_j be the (constant) probability that each one of the trials will result in the outcome O_j, and denote by X_j the number of trials which result in O_j, j = 1, ..., k. Then the joint distribution of the X's is the Multinomial distribution, that is,

P(X_1 = x_1, ..., X_k = x_k) = [n!/(x_1! ··· x_k!)] p_1^{x_1} ··· p_k^{x_k},  x_1 + ··· + x_k = n.
We may suspect that the p's have certain specified values; for example, in the case of a die, the die may be balanced. We then formulate this as a hypothesis and proceed to test it on the basis of the data. More generally, we may want to test the hypothesis that θ lies in a subset ω of Ω.
Consider the case that H: θ ∈ ω = {θ_0} = {(p_{10}, ..., p_{k0})′}. Then, under ω, the MLE of p_j is p_{j0}, while under Ω it is x_j/n, so that the LR statistic becomes

λ = Π_{j=1}^k p_{j0}^{x_j} / Π_{j=1}^k (x_j/n)^{x_j} = Π_{j=1}^k (np_{j0}/x_j)^{x_j},

and H is rejected if −2 log λ > C. The constant C is determined by the desired level of significance α and the fact that −2 log λ is asymptotically χ²_{k−1} distributed under H, as can be shown on the basis of Theorem 6.
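As a quick numerical sketch of this test (the die counts below are hypothetical, and the χ²_5 quantile 11.07 is taken from standard tables):

```python
import math

def minus_two_log_lambda(counts, p0):
    """-2 log lambda for H: p = p0 in a multinomial experiment.

    lambda = prod_j (n * p0_j / x_j)^{x_j}, so
    -2 log lambda = 2 * sum_j x_j * log(x_j / (n * p0_j)).
    """
    n = sum(counts)
    return 2.0 * sum(x * math.log(x / (n * p)) for x, p in zip(counts, p0) if x > 0)

# Hypothetical die data: 120 casts, roughly uniform observed counts.
counts = [18, 22, 19, 21, 23, 17]
p0 = [1 / 6] * 6
stat = minus_two_log_lambda(counts, p0)
# Under H, stat is approximately chi-square with k - 1 = 5 degrees of freedom;
# reject at level 0.05 if stat exceeds 11.07 (the 0.95 quantile of chi^2_5).
reject = stat > 11.07
```

For these counts the statistic is small (about 1.4), so the hypothesis of a balanced die would be accepted.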
Now consider r events A_i, i = 1, ..., r, which form a partition of the sample space S, and let {B_j, j = 1, ..., s} be another partition of S. Let p_ij = P(A_i ∩ B_j) and let

p_i. = Σ_{j=1}^s p_ij,  p_.j = Σ_{i=1}^r p_ij.

Then, clearly, p_i. = P(A_i), p_.j = P(B_j) and

Σ_{i=1}^r p_i. = Σ_{j=1}^s p_.j = Σ_{i=1}^r Σ_{j=1}^s p_ij = 1.

Furthermore, the events {A_1, ..., A_r} and {B_1, ..., B_s} are independent if and only if p_ij = p_i. p_.j for all i and j. For example, A may denote gender, comprising the levels A_1 (male) and A_2 (female), and B may denote educational status, comprising the levels B_1 (elementary school graduate), B_2 (high school graduate) and B_3 (college graduate). In n independent repetitions of the experiment, let X_ij be the number of trials which result in A_i ∩ B_j, and set

X_i. = Σ_{j=1}^s X_ij,  X_.j = Σ_{i=1}^r X_ij.

It is then clear that Σ_{i=1}^r X_i. = Σ_{j=1}^s X_.j = n.
Let θ = (p_ij, i = 1, ..., r, j = 1, ..., s)′. Then the set Ω of all possible values of θ is (rs − 1)-dimensional, since Σ_{i,j} p_ij = 1.
Under the above set-up, the problem of interest is that of testing whether the characteristics A and B are independent. That is, we want to test the existence of probabilities p_i, q_j, i = 1, ..., r, j = 1, ..., s, such that H: p_ij = p_i q_j, i = 1, ..., r, j = 1, ..., s. Since for i = 1, ..., r − 1 and j = 1, ..., s − 1 we have the r + s − 2 independent linear relationships

Σ_{j=1}^s p_ij = p_i,  Σ_{i=1}^r p_ij = q_j,

it follows that the set ω specified by H is an (r + s − 2)-dimensional subset of Ω.
Next, if x_ij is the observed value of X_ij and if we set

x_i. = Σ_{j=1}^s x_ij,  x_.j = Σ_{i=1}^r x_ij,
the likelihood function takes the following forms under Ω and ω, respectively. Writing Π_{i,j} instead of Π_{i=1}^r Π_{j=1}^s, we have

L(Ω) = [n!/Π_{i,j} x_ij!] Π_{i,j} p_ij^{x_ij},
L(ω) = [n!/Π_{i,j} x_ij!] Π_{i=1}^r p_i^{x_i.} Π_{j=1}^s q_j^{x_.j}.

Maximizing these likelihoods gives the MLEs p̂_ij = x_ij/n under Ω, and p̂_i = x_i./n, q̂_j = x_.j/n under ω, so that the LR statistic becomes

λ = [Π_{i=1}^r x_i.^{x_i.} Π_{j=1}^s x_.j^{x_.j}] / [n^n Π_{i,j} x_ij^{x_ij}].
This χ² r.v. can be used for testing the hypothesis H: θ = θ_0. Namely, one sets

χ²_ω = Σ_{j=1}^k (x_j − np_{j0})² / (np_{j0})

and rejects H if χ²_ω is too large, in the sense of being greater than a certain constant C which is specified by the desired level of the test. For the contingency table problem, the corresponding statistic is

χ²_ω = Σ_{i,j} (x_ij − x_i. x_.j/n)² / (x_i. x_.j/n),

and it can further be shown that, under ω, χ²_ω is asymptotically χ²_f distributed with f = (r − 1)(s − 1). Once more this test is asymptotically equivalent to the corresponding test based on −2 log λ.

Tests based on chi-square statistics are known as chi-square tests or goodness-of-fit tests for obvious reasons.
Exercises

13.8.1 Show that the set ω specified by the hypothesis of independence is as claimed in the discussion in this section.

In Exercises 13.8.2–13.8.9 below, the test to be used will be the appropriate χ² test.
13.8.2 Refer to Exercise 13.7.2 and test the hypothesis formulated there at the specified level of significance by using a χ² goodness-of-fit test. Also, compare the cut-off point with that found in Exercise 13.7.2(i).
13.8.3 A die is cast 600 times and the numbers 1 through 6 appear with the frequencies recorded below. At the level of significance α = 0.1, test the fairness of the die.
13.8.4 In a certain genetic experiment, two different varieties of a certain species are crossed and a specific characteristic of the offspring can only occur at three levels A, B and C, say. According to a proposed model, the probabilities for A, B and C are 1/12, 3/12 and 8/12, respectively. Out of 60 offspring, 6, 18 and 36 fall into levels A, B and C, respectively. Test the validity of the proposed model at the level of significance α = 0.05.
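For instance, the chi-square statistic for this exercise can be computed directly (the χ²_2 critical value 5.991 is taken from standard tables):

```python
# Pearson chi-square statistic for Exercise 13.8.4.
observed = [6, 18, 36]
model_probs = [1 / 12, 3 / 12, 8 / 12]
n = sum(observed)                        # 60 offspring
expected = [n * p for p in model_probs]  # [5.0, 15.0, 40.0]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# chi_sq = 0.2 + 0.6 + 0.4 = 1.2, with k - 1 = 2 degrees of freedom.
# The 0.95 quantile of chi^2_2 is 5.991, so H is accepted at alpha = 0.05.
accept = chi_sq <= 5.991
```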
13.8.5 Course work grades are often assumed to be normally distributed. In a certain class, suppose that letter grades are given in the following manner: A for grades in [90, 100], B for grades in [75, 89], C for grades in [60, 74], D for grades in [50, 59] and F for grades in [0, 49]. Use the data given below to check the assumption that the data are coming from an N(75, 9²) distribution. For this purpose, employ the appropriate χ² test and take α = 0.05.
13.8.6 It is often assumed that I.Q. scores of human beings are normally distributed. Test this claim for the data given below by choosing the Normal distribution appropriately and taking α = 0.05.

x ≤ 90   90 < x ≤ 100   100 < x ≤ 110   110 < x ≤ 120   120 < x ≤ 130   x > 130

(Hint: Estimate μ and σ² from the grouped data; take the midpoints for the finite intervals and the points 65 and 160 for the leftmost and rightmost intervals, respectively.)
13.8.7 Consider a group of 100 people living and working under very similar conditions. Half of them are given a preventive shot against a certain disease and the other half serve as control. Of those who received the treatment, 40 did not contract the disease whereas the remaining 10 did so. Of those not treated, 30 did contract the disease and the remaining 20 did not. Test effectiveness of the vaccine at the level of significance α = 0.05.
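A sketch of the corresponding 2×2 contingency-table computation (the χ²_1 critical value 3.841 is taken from standard tables):

```python
# 2x2 contingency-table chi-square for Exercise 13.8.7.
# Rows: treated / not treated; columns: no disease / disease.
table = [[40, 10],
         [20, 30]]
n = sum(sum(row) for row in table)              # 100
row_totals = [sum(row) for row in table]        # [50, 50]
col_totals = [sum(col) for col in zip(*table)]  # [60, 40]
chi_sq = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2)
)
# chi_sq = 16.67 with (r - 1)(s - 1) = 1 degree of freedom; the 0.95 quantile
# of chi^2_1 is 3.841, so independence is rejected at alpha = 0.05
# (the vaccine appears effective).
```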
13.8.8 On the basis of the following scores, appropriately taken, test whether there are gender-associated differences in mathematical ability (as is often claimed!). Take α = 0.05.

Boys: 80 96 98 87 75 83 70 92 97 82
Girls: 82 90 84 70 80 97 76 90 88 86

(Hint: Group the grades into the following intervals: [70, 75), [75, 80), [80, 85), [85, 90), [90, 100).)
13.8.9 From each of four political wards of a city with approximately the same number of voters, 100 voters were chosen at random and their opinions were asked regarding a certain legislative proposal. On the basis of the data given below, test whether the fractions of voters favoring the legislative proposal under consideration differ in the four wards. Take α = 0.05.
13.9 Decision-Theoretic Viewpoint of Testing Hypotheses
For the definition of a decision, loss and risk function, the reader is referred to Section 6 of Chapter 12.
Let X_1, ..., X_n be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ^r, and let ω be a (measurable) subset of Ω. Then the hypothesis to be tested is H: θ ∈ ω against the alternative A: θ ∈ ω^c. Let B be a critical region. Then, by setting z = (x_1, ..., x_n)′, in the present context a non-randomized decision function δ = δ(z) is defined as follows:

δ(z) = 1 if z ∈ B, and δ(z) = 0 otherwise.
We shall confine ourselves to non-randomized decision functions only. Also, an appropriate loss function, corresponding to δ, is of the following form:

L(θ; δ(z)) = L_1 if θ ∈ ω and δ(z) = 1;  L(θ; δ(z)) = L_2 if θ ∈ ω^c and δ(z) = 0;  L(θ; δ(z)) = 0 otherwise,

where L_1, L_2 > 0. As in the point estimation case, we would like to determine a decision function δ for which the corresponding risk would be uniformly (in θ) smaller than the risk corresponding to any other decision function δ*. Since this is not feasible, except for trivial cases, we are led to minimax decision and Bayes decision functions corresponding to a given prior p.d.f. on Ω. Thus, in the case that ω = {θ_0} and ω^c = {θ_1}, δ is minimax if

max[R(θ_0; δ), R(θ_1; δ)] ≤ max[R(θ_0; δ*), R(θ_1; δ*)]

for any other decision function δ*.
Regarding the existence of minimax decision functions, we have the result below. The r.v.'s X_1, ..., X_n form a sample whose p.d.f. is either f(·; θ_0) or else f(·; θ_1). By setting f_0 = f(·; θ_0) and f_1 = f(·; θ_1), we have

THEOREM 7  Let X_1, ..., X_n be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω = {θ_0, θ_1}. We are interested in testing the hypothesis H: θ = θ_0 against the alternative A: θ = θ_1. Define the subset B of ℝ^n by B = {z; f(z; θ_1) > C f(z; θ_0)}, where C is chosen so that

L_1 P_{θ_0}(Z ∈ B) = L_2 P_{θ_1}(Z ∈ B^c), (46)

and set α = P_{θ_0}(Z ∈ B). Then the decision function

δ(z) = 1 if z ∈ B, and δ(z) = 0 otherwise, (47)

is minimax.
PROOF  Let A be any other (measurable) subset of ℝ^n and let δ* be the corresponding decision function. Then

R(θ_0; δ*) = L_1 P_0(Z ∈ A) and R(θ_1; δ*) = L_2 P_1(Z ∈ A^c).

Consider R(θ_0; δ) and R(θ_0; δ*), and suppose first that R(θ_0; δ*) ≤ R(θ_0; δ). This is equivalent to L_1 P_0(Z ∈ A) ≤ L_1 P_0(Z ∈ B), or P_0(Z ∈ A) ≤ α. Then Theorem 1 implies that P_1(Z ∈ A) ≤ P_1(Z ∈ B), because the test defined by (47) is MP in the class of all tests of level ≤ α. Hence P_1(Z ∈ A^c) ≥ P_1(Z ∈ B^c), or L_2 P_1(Z ∈ A^c) ≥ L_2 P_1(Z ∈ B^c), or equivalently, R(θ_1; δ*) ≥ R(θ_1; δ). That is,

if R(θ_0; δ*) ≤ R(θ_0; δ), then R(θ_1; δ) ≤ R(θ_1; δ*). (48)

Since by assumption R(θ_0; δ) = R(θ_1; δ), we have

max[R(θ_0; δ*), R(θ_1; δ*)] = R(θ_1; δ*) ≥ R(θ_1; δ) = max[R(θ_0; δ), R(θ_1; δ)], (49)

whereas if R(θ_0; δ) < R(θ_0; δ*), then

max[R(θ_0; δ*), R(θ_1; δ*)] ≥ R(θ_0; δ*) > R(θ_0; δ) = max[R(θ_0; δ), R(θ_1; δ)]. (50)

Relations (49) and (50) show that δ is minimax, as was to be seen. ▲
REMARK 7  It follows that the minimax decision function defined by (46) and (47) is an LR test and, in fact, is the MP test of level P_0(Z ∈ B) constructed in Theorem 1.
We close this section with a consideration of the Bayesian approach. In connection with this, it is shown that, corresponding to a given p.d.f. on Ω = {θ_0, θ_1}, there is always a Bayes decision function which is actually an LR test. More precisely, we have

THEOREM 8  Let X_1, ..., X_n be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω = {θ_0, θ_1}, and let λ_0 = {p_0, p_1} (0 < p_0 < 1) be a probability distribution on Ω. Then, for testing the hypothesis H: θ = θ_0 against the alternative A: θ = θ_1, there exists a Bayes decision function δ_{λ_0} corresponding to λ_0 = {p_0, p_1}, that is, a decision rule which minimizes the average risk R(θ_0; δ)p_0 + R(θ_1; δ)p_1, and it is given by

δ_{λ_0}(z) = 1 if z ∈ B, and δ_{λ_0}(z) = 0 otherwise,

where B = {z; f(z; θ_1) > C f(z; θ_0)} with C = p_0 L_1 / (p_1 L_2).

REMARK 8  It follows that the Bayes decision function is an LR test and is, in fact, the MP test for testing H against A at the level P_0(Z ∈ B), as follows by Theorem 1.
The following examples are meant as illustrations of Theorems 7 and 8.

EXAMPLE 13  Let X_1, ..., X_n be i.i.d. r.v.'s from N(θ, 1). We are interested in determining the minimax decision function δ for testing the hypothesis H: θ = θ_0 against the alternative A: θ = θ_1 (θ_0 < θ_1). We have

f(z; θ_1)/f(z; θ_0) = exp[(θ_1 − θ_0) Σ_{j=1}^n x_j − (n/2)(θ_1² − θ_0²)],

so that f(z; θ_1) > C f(z; θ_0) is equivalent to x̄ > C_0, where

C_0 = (θ_0 + θ_1)/2 + log C / [n(θ_1 − θ_0)].

The cut-off point C_0 is then determined by relation (46); for the numerical values employed in this example, C_0 = 0.53, as is found by the Normal tables. Therefore the minimax decision function is given by δ(z) = 1 if x̄ > C_0 and δ(z) = 0 otherwise, and the level and the power of the test corresponding to the minimax δ are likewise found by means of the Normal tables.
EXAMPLE 14  Refer to Example 13 and determine the Bayes decision function corresponding to λ_0 = {p_0, p_1}. From the discussion in the previous example it follows that the Bayes decision function is given by δ(z) = 1 if x̄ > C_0 and δ(z) = 0 otherwise, where now C_0 is determined by Theorem 8; here the cut-off point turns out to be 0.55. The type-I error probability of this test is P_{θ_0}(X̄ > 0.55) = P[N(0, 1) > 2.75] = 1 − Φ(2.75) = 0.003, and the power of the test is P_{θ_1}(X̄ > 0.55) = P[N(0, 1) > −2.25] = Φ(2.25) = 0.9878. Therefore relation (51) gives the corresponding Bayes risk.
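These two probabilities can be checked numerically; the sketch below assumes n = 25, θ_0 = 0 and θ_1 = 1, the values implied by the figures 2.75 and −2.25 above:

```python
import math

def std_normal_cdf(x):
    """Phi(x), the standard Normal c.d.f., via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Cut-off point 0.55 for the sample mean of n = 25 observations from N(theta, 1).
n, cutoff, theta0, theta1 = 25, 0.55, 0.0, 1.0
type_I = 1.0 - std_normal_cdf(math.sqrt(n) * (cutoff - theta0))  # P[N(0,1) > 2.75]
power = 1.0 - std_normal_cdf(math.sqrt(n) * (cutoff - theta1))   # P[N(0,1) > -2.25]
# type_I is about 0.003 and power about 0.9878, agreeing with the text.
```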
EXAMPLE 15  Let X_1, ..., X_n be i.i.d. r.v.'s from B(1, θ), and consider the problem of testing H: θ = 0.5 against A: θ = 0.75. Here

f(z; θ_1)/f(z; θ_0) = (θ_1/θ_0)^x [(1 − θ_1)/(1 − θ_0)]^{n−x}, where x = Σ_{j=1}^n x_j,

so that the minimax decision function is again determined by a cut-off point C_0 for x. The r.v. X = Σ_{j=1}^n X_j is B(n, θ) and, for C_0 = 13, we have P_{0.5}(X > 13) = 0.0577 and P_{0.75}(X > 13) = 0.7858, so that P_{0.75}(X ≤ 13) = 0.2142. With the chosen values of L_1 and L_2, it follows then that relation (46) is satisfied. Therefore the minimax decision function is determined by δ(z) = 1 if x > 13 and δ(z) = 0 otherwise. Furthermore, the minimax risk is equal to 0.5 × 0.2142 = 0.1071.
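The quoted binomial tail probabilities can be reproduced exactly; the sketch below assumes n = 20 trials, the sample size consistent with the values 0.0577 and 0.7858:

```python
from math import comb

def binom_tail(n, theta, c):
    """P(X > c) for X ~ B(n, theta), computed exactly."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(c + 1, n + 1))

# Assumed sample size n = 20, cut-off C0 = 13 as in the example.
p0 = binom_tail(20, 0.50, 13)   # about 0.0577
p1 = binom_tail(20, 0.75, 13)   # about 0.7858
risk = 0.5 * (1 - p1)           # 0.5 * 0.2142, about 0.1071
```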
14.1 Some Basic Theorems of Sequential Sampling
In all of the discussions so far, the random sample Z_1, ..., Z_n, say, that we have dealt with was assumed to be of fixed size n. Thus, for example, in the point estimation and testing hypotheses problems the sample size n was fixed beforehand, then the relevant random experiment was supposed to have been independently repeated n times and finally, on the basis of the outcomes, a point estimate or a test was constructed with certain optimal properties. Now, whereas in some situations the random experiment under consideration cannot be repeated at will, in many other cases this is, indeed, possible. In the latter case, as a rule, it is advantageous not to fix the sample size in advance, but to keep sampling and terminate the experiment according to a (random) stopping time.
DEFINITION 1  Let {Z_n} be a sequence of r.v.'s. A stopping time (defined on this sequence) is a positive integer-valued r.v. N such that, for each n, the event (N = n) depends on the r.v.'s Z_1, ..., Z_n alone.
REMARK 1  In certain circumstances, a stopping time N is also allowed to take the value ∞, but with probability equal to zero. In such a case, and when forming EN, the term ∞ · 0 appears, but that is interpreted as 0 and no problem arises.
Next, suppose we observe the r.v.'s Z_1, Z_2, ... one after another, a single one at a time (sequentially), and we stop observing them after a specified event occurs. In connection with such a sampling scheme, we have the following definition.

DEFINITION 2  A sampling procedure which terminates according to a stopping time is called a sequential procedure.

Thus a sequential procedure terminates with the r.v. Z_N, where Z_N is defined as follows:
the value of Z_N at s ∈ S is equal to Z_{N(s)}(s). (1)

Quite often the partial sums S_N = Z_1 + ··· + Z_N, defined by

S_N(s) = Z_1(s) + ··· + Z_{N(s)}(s), s ∈ S, (2)

are of interest, and one of the problems associated with them is that of finding the expectation ES_N of the r.v. S_N. Under suitable regularity conditions, this expectation is provided by a formula due to Wald.

THEOREM 1  (Wald's lemma for sequential analysis) For j ≥ 1, let Z_j be independent r.v.'s (not necessarily identically distributed) with identical first moments such that E|Z_j| = M < ∞, so that EZ_j = μ is also finite. Let N be a stopping time, defined on the sequence {Z_j}, j ≥ 1, and assume that EN is finite. Then E|S_N| < ∞ and ES_N = μEN, where S_N is defined by (2) and Z_N is defined by (1).
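Wald's lemma can be illustrated by simulation; a minimal sketch with uniform Z_j's and a two-barrier stopping time (all numerical choices here are illustrative):

```python
import random

def stopped_sum(mu, c1, c2, rng):
    """Sample (S_N, N), where N is the first n with S_n <= c1 or S_n >= c2,
    for i.i.d. Z_j uniform on (mu - 1, mu + 1), so that E Z_j = mu."""
    s, n = 0.0, 0
    while c1 < s < c2:
        s += rng.uniform(mu - 1.0, mu + 1.0)
        n += 1
    return s, n

rng = random.Random(0)
mu, reps = 0.3, 20000
draws = [stopped_sum(mu, -5.0, 5.0, rng) for _ in range(reps)]
mean_SN = sum(s for s, _ in draws) / reps
mean_N = sum(n for _, n in draws) / reps
# Wald's lemma: E S_N = mu * E N, so the two sample quantities should be close.
```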
The proof of the theorem is simplified by first formulating and proving a lemma. For this purpose, set Y_j = Z_j − μ, j ≥ 1. Then the r.v.'s Y_1, Y_2, ... are independent, EY_j = 0, and they have (common) finite absolute moment of first order, to be denoted by m; that is, E|Y_j| = m < ∞. Also set T_N = Y_1 + ··· + Y_N, where Y_N and T_N are defined in a way similar to the way Z_N and S_N are defined by (1) and (2), respectively. Then we will show that

E|T_N| < ∞ and ET_N = 0. (3)

In all that follows, it is assumed that all conditional expectations, given N = n, are finite for all n for which P(N = n) > 0. We set E(Y_j|N = n) = 0 (accordingly, E(|Y_j| | N = n) = 0) for those n's for which P(N = n) = 0.

LEMMA  In the notation introduced above:

i) Σ_{j=1}^∞ Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n) = mEN < ∞;
ii) the double series in (i) may be summed in either order, with the same (finite) sum.

PROOF OF LEMMA  i) The event (N = n) depends only on Y_1, ..., Y_n, and hence, for j > n, E(|Y_j| | N = n) = E|Y_j| = m. Therefore, for each j,

m = E|Y_j| = Σ_{n=1}^∞ E(|Y_j| | N = n)P(N = n) = mP(N ≤ j − 1) + Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n),

so that Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n) = mP(N ≥ j). Summing over j, we obtain

Σ_{j=1}^∞ Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n) = m Σ_{j=1}^∞ P(N ≥ j) = mEN < ∞, (6)

where the equality Σ_{j=1}^∞ P(N ≥ j) = Σ_{j=1}^∞ jP(N = j) = EN is shown in Exercise 14.1.1. Relation (6) establishes part (i).

ii) By setting p_jn = E(|Y_j| | N = n)P(N = n), this part asserts that the nonnegative double series Σ_j Σ_{n≥j} p_jn may be summed in either order; this follows from part (i) and a standard result on double series (see, for example, T. M. Apostol, Mathematical Analysis, Addison-Wesley, 1957). ▲
PROOF OF THEOREM 1  Since T_N = S_N − μN, it suffices to show (3). To this end, we have

ET_N = Σ_{n=1}^∞ E(T_N | N = n)P(N = n) = Σ_{n=1}^∞ Σ_{j=1}^n E(Y_j | N = n)P(N = n),

and, by the lemma, this double series converges absolutely, so that the order of summation may be interchanged:

ET_N = Σ_{j=1}^∞ Σ_{n=j}^∞ E(Y_j | N = n)P(N = n) = Σ_{j=1}^∞ [EY_j − Σ_{n=1}^{j−1} E(Y_j | N = n)P(N = n)] = 0.

This is so because the event (N = n) depends only on Y_1, ..., Y_n, so that, for j > n, E(Y_j | N = n) = EY_j = 0. The same computation applied to the |Y_j|'s gives E|T_N| ≤ mEN < ∞. These relations complete the proof of the theorem. ▲
Now consider any r.v.'s Z_1, Z_2, ... and let C_1, C_2 be two constants such that C_1 < C_2. Set S_n = Z_1 + ··· + Z_n and define the random quantity N as follows: N is the smallest value of n for which S_n ≤ C_1 or S_n ≥ C_2. If C_1 < S_n < C_2 for all n, then set N = ∞. In other words, for each s ∈ S, the value of N at s, N(s), is assigned as follows: look at S_n(s) for n ≥ 1, and find the first n, N = N(s), say, for which S_N(s) ≤ C_1 or S_N(s) ≥ C_2; if C_1 < S_n(s) < C_2 for all n, then set N(s) = ∞. Then we have the following result.

THEOREM 2  Let Z_1, Z_2, ... be i.i.d. r.v.'s such that P(Z_j = 0) ≠ 1. Set S_n = Z_1 + ··· + Z_n and, for two constants C_1, C_2 with C_1 < C_2, define the r. quantity N as the smallest n for which S_n ≤ C_1 or S_n ≥ C_2; set N = ∞ if C_1 < S_n < C_2 for all n. Then there exist c > 0 and 0 < r < 1 such that

P(N > n) ≤ cr^n, n = 1, 2, .... (13)
PROOF  The assumption P(Z_j = 0) ≠ 1 implies that P(Z_j > 0) > 0 or P(Z_j < 0) > 0. Let us suppose first that P(Z_j > 0) > 0. Then there exists ε > 0 such that P(Z_j > ε) = δ > 0. In fact, if P(Z_j > ε) = 0 for every ε > 0, then, in particular, P(Z_j > 1/n) = 0 for all n. But (Z_j > 1/n) ↑ (Z_j > 0), and hence 0 = P(Z_j > 1/n) → P(Z_j > 0) > 0, a contradiction. Thus, for the case that P(Z_j > 0) > 0, we have that

there exists ε > 0 such that P(Z_j > ε) = δ > 0. (14)

With C_1, C_2 as in the theorem and ε as in (14), there exists a positive integer m such that

mε > C_2 − C_1. (15)

For j = 0, 1, ..., we have

(Z_{jm+1} > ε, ..., Z_{(j+1)m} > ε) ⊆ (Z_{jm+1} + ··· + Z_{(j+1)m} > mε) ⊆ (Z_{jm+1} + ··· + Z_{(j+1)m} > C_2 − C_1), (16)

the first inclusion being obvious because there are m Z's, each one of which is greater than ε, and the second inclusion being true because of (15). Thus

P(Z_{jm+1} + ··· + Z_{(j+1)m} > C_2 − C_1) ≥ δ^m, (17)

the inequality following from (16) and the fact that, by the independence of the Z's and (14), P(Z_{jm+1} > ε, ..., Z_{(j+1)m} > ε) = δ^m. Next,

(N > km) ⊆ (C_1 < S_i < C_2, i = 1, ..., km) ⊆ ∩_{j=0}^{k−1}(Z_{jm+1} + ··· + Z_{(j+1)m} ≤ C_2 − C_1), (18)

the first inclusion being obvious from the definition of N. The second holds because, if for some j = 0, 1, ..., k − 1 we had Z_{jm+1} + ··· + Z_{(j+1)m} > C_2 − C_1, this inequality together with S_{jm} > C_1 would imply that S_{(j+1)m} > C_2, which is in contradiction to C_1 < S_i < C_2, i = 1, ..., km. Therefore, by the independence of the Z's and (17),

P(N > km) ≤ Π_{j=0}^{k−1} P(Z_{jm+1} + ··· + Z_{(j+1)m} ≤ C_2 − C_1) ≤ (1 − δ^m)^k.

For arbitrary n, choose k so that km ≤ n < (k + 1)m; then

P(N > n) ≤ P(N > km) ≤ (1 − δ^m)^k ≤ cr^n,

with r = (1 − δ^m)^{1/m} and c = (1 − δ^m)^{−1}, which is relation (13). The case P(Z_j < 0) > 0 is treated in the same way (see also Exercise 14.1.2). ▲
The theorem just proved has the following important corollary.

COROLLARY  Under the assumptions of Theorem 2, we have (i) P(N < ∞) = 1 and (ii) EN < ∞.

PROOF  i) Set A = (N = ∞) and A_n = (N > n). Then A_n ↓ A, so that P(A_n) → P(A) by Theorem 2 in Chapter 2. But P(A_n) ≤ cr^n by the theorem. Thus lim P(A_n) = 0, so that P(A) = 0, as was to be shown.

ii) We have

EN = Σ_{n=1}^∞ P(N ≥ n) ≤ Σ_{n=1}^∞ cr^{n−1} = c/(1 − r) < ∞. ▲
REMARK 2  The r.v. N is positive integer-valued, and it might also take on the value ∞, but with probability 0, by the first part of the corollary. On the other hand, from the definition of N it follows that, for each n, the event (N = n) depends only on the r.v.'s Z_1, ..., Z_n. Accordingly, N is a stopping time by Definition 1 and Remark 1.
Exercises
14.1.1 For a positive integer-valued r.v. N, show that EN = Σ_{n=1}^∞ P(N ≥ n).
14.1.2 In Theorem 2, assume that P(Z_j < 0) > 0 and arrive at relation (13).
14.2 Sequential Probability Ratio Test
Although in the point estimation and testing hypotheses problems discussed in Chapters 12 and 13, respectively (as well as in the interval estimation problems to be dealt with in Chapter 15), sampling according to a stopping time is, in general, profitable, the mathematical machinery involved is well beyond the level of this book. We are going to consider only the problem of sequentially testing a simple hypothesis against a simple alternative, as a way of illustrating the application of sequential procedures in a concrete problem.

To this end, let X_1, X_2, ... be i.i.d. r.v.'s with p.d.f. either f_0 or else f_1, and suppose that we are interested in testing the (simple) hypothesis H: the true density is f_0 against the (simple) alternative A: the true density is f_1, at level of significance α (0 < α < 1), without fixing in advance the sample size n.

In order to simplify matters, we also assume that {x ∈ ℝ; f_0(x) > 0} = {x ∈ ℝ; f_1(x) > 0}, and define the likelihood ratio

λ_n = λ_n(X_1, ..., X_n; 0, 1) = Π_{j=1}^n f_1(X_j) / Π_{j=1}^n f_0(X_j).

We shall use the same notation λ_n for λ_n(x_1, ..., x_n; 0, 1), where x_1, ..., x_n are the observed values of X_1, ..., X_n.
For testing H against A, consider the following sequential procedure: as long as a < λ_n < b, take another observation; as soon as λ_n ≤ a, stop sampling and accept H; and as soon as λ_n ≥ b, stop sampling and reject H. Here a and b are two numbers with 0 < a < b.

By letting N stand for the smallest n for which λ_n ≤ a or λ_n ≥ b, we have that N takes on the values 1, 2, ... and possibly ∞, and, clearly, for each n, the event (N = n) depends only on X_1, ..., X_n. Under suitable additional assumptions, we shall show that the value ∞ is taken on only with probability 0, so that N will be a stopping time.

The sequential procedure just described is called a sequential probability ratio test (SPRT) for obvious reasons.

In what follows, we restrict ourselves to the common set of positivity of f_0 and f_1, and for j = 1, ..., n, set

Z_j = log[f_1(X_j)/f_0(X_j)].

At this point, we also make the assumption that P_i[f_0(X_1) ≠ f_1(X_1)] > 0 for i = 0, 1; equivalently, if C is the set over which f_0 and f_1 differ, then it is assumed that ∫_C f_0(x)dx > 0 and ∫_C f_1(x)dx > 0 for the continuous case. This assumption is equivalent to P_i(Z_1 ≠ 0) > 0, under which the corollary to Theorem 2 applies. Summarizing, we have the following result.
PROPOSITION 1  Let X_1, X_2, ... be i.i.d. r.v.'s with p.d.f. either f_0 or else f_1, and suppose that P_i[f_0(X_1) ≠ f_1(X_1)] > 0, i = 0, 1. Set

Z_j = log[f_1(X_j)/f_0(X_j)], j = 1, 2, ..., and S_n = Σ_{j=1}^n Z_j.

For two numbers a and b with 0 < a < b, define the random quantity N as the smallest n for which λ_n ≤ a or λ_n ≥ b; equivalently, the smallest n for which S_n ≤ log a or S_n ≥ log b. Then P_i(N < ∞) = 1 and E_iN < ∞, i = 0, 1; that is, the SPRT terminates with probability one, with acceptance or rejection of H, regardless of the true underlying density.
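The procedure just described can be sketched as a short program; the Bernoulli densities and all numerical values below are illustrative, and the cut-off points are the approximate ones a′, b′ discussed in the next subsection:

```python
import math
import random

def sprt(stream, f0, f1, alpha, beta):
    """Wald's SPRT for H: density f0 vs A: density f1, with prescribed
    error probabilities alpha and 1 - beta, using the approximate cut-off
    points a' = (1 - beta)/(1 - alpha) and b' = beta/alpha.
    `stream` yields observations one at a time."""
    log_a = math.log((1 - beta) / (1 - alpha))
    log_b = math.log(beta / alpha)
    s, n = 0.0, 0
    for x in stream:
        n += 1
        s += math.log(f1(x) / f0(x))   # Z_j = log[f1(X_j)/f0(X_j)]
        if s <= log_a:
            return "accept H", n
        if s >= log_b:
            return "reject H", n
    raise RuntimeError("stream exhausted before the SPRT terminated")

# Hypothetical illustration: Bernoulli data, H: theta = 0.5 vs A: theta = 0.75.
f0 = lambda x: 0.5
f1 = lambda x: 0.75 if x == 1 else 0.25
rng = random.Random(1)
stream = (1 if rng.random() < 0.5 else 0 for _ in range(10_000))  # data from H
decision, n_obs = sprt(stream, f0, f1, alpha=0.01, beta=0.95)
```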
In the formulation of the proposition above, the determination of a and b was postponed until later. At this point, we shall see what the exact determination of a and b is, at least from a theoretical point of view. However, the actual identification presents difficulties, as will be seen, and the use of approximate values is often necessary.

To start with, let α and 1 − β be the prescribed probabilities of type-I and type-II errors, respectively, in testing H against A, and let α < β < 1. From their very definitions, we have
α = P_0(λ_1 ≥ b) + P_0(a < λ_1 < b, λ_2 ≥ b) + ··· + P_0(a < λ_1 < b, ..., a < λ_{n−1} < b, λ_n ≥ b) + ··· (20)

and

1 − β = P_1(λ_1 ≤ a) + P_1(a < λ_1 < b, λ_2 ≤ a) + ··· + P_1(a < λ_1 < b, ..., a < λ_{n−1} < b, λ_n ≤ a) + ···. (21)

Relations (20) and (21) allow us to determine theoretically the cut-off points a and b when α and β are given.
In order to find workable values of a and b, we proceed as follows. For each n, set

T′_n = {z ∈ ℝ^n; a < λ_j < b, j = 1, ..., n − 1, λ_n ≤ a} (22)

and

T″_n = {z ∈ ℝ^n; a < λ_j < b, j = 1, ..., n − 1, λ_n ≥ b}. (23)

In other words, T′_n is the set of points in ℝ^n for which the SPRT terminates with n observations and accepts H, while T″_n is the set of points in ℝ^n for which the SPRT terminates with n observations and rejects H.

In the remainder of this section, the arguments will be carried out for the case that the X_j's are continuous, the discrete case being treated in the same way by replacing integrals by summation signs. Also, for simplicity, the differentials in the integrals will not be indicated.
From (20), (22) and (23), and writing f_{0n} and f_{1n} for the joint densities of X_1, ..., X_n under H and A, respectively, one has

α = Σ_{n=1}^∞ ∫_{T″_n} f_{0n}.

But on T″_n, f_{1n}/f_{0n} ≥ b, so that f_{0n} ≤ (1/b)f_{1n}. Therefore

α ≤ (1/b) Σ_{n=1}^∞ ∫_{T″_n} f_{1n} = β/b, (27)

since Σ_n ∫_{T″_n} f_{1n} is the probability, under A, of rejecting H, namely β. In the same way one obtains

1 − β ≤ a(1 − α) (28)

(see Exercise 14.2.1). Hence

(1 − β)/(1 − α) ≤ a and b ≤ β/α. (29)

Relation (29) provides us with a lower bound and an upper bound for the actual cut-off points a and b, respectively.
Now set

a′ = (1 − β)/(1 − α) and b′ = β/α, (30)

and suppose that the SPRT is carried out by employing the cut-off points a′ and b′ given by (30) rather than the original ones a and b. Furthermore, let α′ and 1 − β′ be the two types of errors associated with a′ and b′. Then, replacing α, β, a and b by α′, β′, a′ and b′, respectively, in (29), and also taking into consideration (30), we obtain

(1 − β′)/(1 − α′) ≤ a′ = (1 − β)/(1 − α) and b′ = β/α ≤ β′/α′. (31)

That is,

α′ ≤ α/β and 1 − β′ ≤ (1 − β)/(1 − α), (32)

and from (31) it also follows that

α′ + (1 − β′) ≤ α + (1 − β). (33)

Summarizing the main points of our derivations, we have the following result.
PROPOSITION 2  For testing H against A by means of the SPRT with prescribed error probabilities α and 1 − β such that α < β < 1, the cut-off points a and b are determined by (20) and (21). Relation (30) provides approximate cut-off points a′ and b′, with corresponding error probabilities α′ and 1 − β′, say. Then relation (32) provides upper bounds for α′ and 1 − β′, and inequality (33) shows that their sum α′ + (1 − β′) is always bounded above by α + (1 − β).
REMARK 3  From (33) it follows that α′ > α and 1 − β′ > 1 − β cannot happen simultaneously. Furthermore, typical values of α and 1 − β are 0.01, 0.05 and 0.1, and then it follows from (32) that α′ and 1 − β′ lie close to α and 1 − β, respectively. For example, for α = 0.01 and 1 − β = 0.05, we have α′ < 0.0106 and 1 − β′ < 0.0506. So there is no serious problem as far as α′ and 1 − β′ are concerned. The only problem which may arise is that, because a′ and b′ are used instead of a and b, the resulting α′ and 1 − β′ are too small compared to α and 1 − β, respectively. As a consequence, we would be led to taking a much larger number of observations than would actually be needed to obtain α and β. It can be argued that this does not happen.
Exercise

14.2.1 Derive inequality (28) by using arguments similar to the ones employed in establishing relation (27).
14.3 Optimality of the SPRT; Expected Sample Size
An optimal property of the SPRT is stated in the following theorem, whose proof is omitted as being well beyond the scope of this book.

For testing H against A, the SPRT with error probabilities α and 1 − β minimizes the expected sample size under both H and A (that is, it minimizes E_0N and E_1N) among all tests (sequential or not) with error probabilities bounded above by α and 1 − β and for which the expected sample size is finite under both H and A.
The remaining part of this section is devoted to calculating the expected sample size of the SPRT with given error probabilities, and also to finding approximations to it.

So consider the SPRT with error probabilities α and 1 − β, and let N be the associated stopping time. Then we clearly have

E_iN = Σ_{n=1}^∞ nP_i(a < λ_j < b, j = 1, ..., n − 1, λ_n ≤ a or λ_n ≥ b), i = 0, 1. (34)

Thus formula (34) provides the expected sample size of the SPRT under both H and A, but the actual calculations are tedious. This suggests that we should try to find an approximate value of E_iN, as follows. By setting A = log a and B = log b, we have the relationships below:
S_n = log λ_n = Σ_{j=1}^n Z_j, (35)

so that

(λ_n ≤ a or λ_n ≥ b) = (S_n ≤ A or S_n ≥ B). (36)

From the right-hand side of (36), all partial sums Σ_{j=1}^i Z_j, i = 1, ..., n − 1, lie between A and B, and it is only Σ_{j=1}^n Z_j which is either ≤ A or ≥ B, and this is due to the nth observation Z_n. We would then expect S_N to be close to A or to B upon termination, so that, approximately,

E_0S_N ≈ (1 − α)A + αB and E_1S_N ≈ (1 − β)A + βB. (37)

On the other hand, by assuming that E_i|Z_1| < ∞, i = 0, 1, Theorem 1 gives E_iS_N = (E_iN)(E_iZ_1). Hence, if also E_iZ_1 ≠ 0, then E_iN = (E_iS_N)/(E_iZ_1). By virtue of (37), this becomes

E_0N ≈ [(1 − α)A + αB]/E_0Z_1 and E_1N ≈ [(1 − β)A + βB]/E_1Z_1. (38)

Thus we have the following result.

PROPOSITION 3  In the SPRT with error probabilities α and 1 − β, the expected sample size E_iN, i = 0, 1, is given by (34). If, furthermore, E_i|Z_1| < ∞ and E_iZ_1 ≠ 0, i = 0, 1, then relation (38) provides approximations to E_iN, i = 0, 1.
REMARK 4  Actually, in order to be able to calculate the approximations given by (38), it is necessary to replace A and B by their approximate values taken from (30), that is,

A ≈ log[(1 − β)/(1 − α)] and B ≈ log(β/α). (39)

Exercises

14.3.1 Test the hypothesis H against the alternative A: θ = 0.05 with α = 0.1, 1 − β = 0.05. Find the expected sample sizes under both H and A, and compare them with the fixed sample size of the MP test for testing H against A with the same α and 1 − β as above.
14.3.2 Discuss the same questions as in the previous exercise if the X_j's are independently distributed as Negative Exponential with parameter θ ∈ Ω = (0, ∞).
14.4 Some Examples
This chapter is closed with two examples. In both, the r.v.'s X_1, X_2, ... are i.i.d. with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ, and for θ_0, θ_1 ∈ Ω with θ_0 < θ_1, the problem is that of testing H: θ = θ_0 against A: θ = θ_1 by means of the SPRT with error probabilities α and 1 − β. Thus in the present case f_0 = f(·; θ_0) and f_1 = f(·; θ_1).

What we explicitly do is to set up the formal SPRT and, for selected numerical values of α and 1 − β, calculate a′, b′ and upper bounds for α′ and 1 − β′, estimate E_iN, i = 0, 1, and finally compare the estimated E_iN, i = 0, 1, with the size of the fixed sample size test with the same error probabilities.
EXAMPLE 1  Let X_1, X_2, ... be i.i.d. r.v.'s with p.d.f.

f(x; θ) = θ^x(1 − θ)^{1−x}, x = 0, 1.

Here

λ_n = (θ_1/θ_0)^{Σx_j} [(1 − θ_1)/(1 − θ_0)]^{n−Σx_j},

and we continue sampling as long as a < λ_n < b; equivalently, as long as

A < (Σ_{j=1}^n x_j) log[θ_1(1 − θ_0)/(θ_0(1 − θ_1))] + n log[(1 − θ_1)/(1 − θ_0)] < B.

Furthermore,

Z_1 = log[f(X_1; θ_1)/f(X_1; θ_0)] = X_1 log(θ_1/θ_0) + (1 − X_1) log[(1 − θ_1)/(1 − θ_0)],

so that

E_iZ_1 = θ_i log(θ_1/θ_0) + (1 − θ_i) log[(1 − θ_1)/(1 − θ_0)], i = 0, 1.

For a numerical application, take α = 0.01 and 1 − β = 0.05. Then the cut-off points a and b are approximately equal to a′ and b′, respectively, where a′ and b′ are given by (30). In the present case,

a′ = 0.05/0.99 ≈ 0.0505 and b′ = 0.95/0.01 = 95.

For the cut-off points a′ and b′, the corresponding error probabilities α′ and 1 − β′ are bounded as follows, according to (32):

α′ ≤ 0.01/0.95 ≈ 0.0106 and 1 − β′ ≤ 0.05/0.99 ≈ 0.0506.

Next, relation (39) gives

A ≈ log 0.0505 ≈ −2.99 and B ≈ log 95 ≈ 4.55.

With specific numerical values of θ_0 and θ_1, the quantities E_0Z_1 and E_1Z_1 follow from the expression above, and relation (38) then yields approximations to E_0N and E_1N.
EXAMPLE 2  Let X_1, X_2, ... be i.i.d. r.v.'s from N(θ, 1). Here

Z_1 = log[f(X_1; θ_1)/f(X_1; θ_0)] = (θ_1 − θ_0)X_1 − (θ_1² − θ_0²)/2.

By using the same values of α and 1 − β as in the previous example, we have the same A and B as before. Taking θ_0 = 0 and θ_1 = 1, we have

E_0Z_1 = −0.5 and E_1Z_1 = 0.5.

Thus relation (38) gives

E_0N ≈ 5.82 and E_1N ≈ 8.35.

Now the fixed sample size MP test is given by (13) in Chapter 13. From this we find that n ≈ 15.84. Both E_0N and E_1N compare very favorably with the fixed value of n which provides the same protection.
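The approximations of relation (38) for the Normal case can be reproduced directly; the sketch below takes θ_0 = 0, θ_1 = 1, α = 0.01 and 1 − β = 0.05, and uses natural logarithms throughout, consistently with the definition of Z_1:

```python
import math

# Approximate expected sample sizes for the SPRT, relation (38),
# with A = log a' and B = log b' from relation (30).
alpha, beta = 0.01, 0.95           # type-I error 0.01, type-II error 1 - beta = 0.05
A = math.log((1 - beta) / (1 - alpha))   # log a' = log(0.05/0.99)
B = math.log(beta / alpha)               # log b' = log(0.95/0.01)
E0_Z1, E1_Z1 = -0.5, 0.5                 # E_i Z_1 for Z_1 = X_1 - 1/2
E0_N = ((1 - alpha) * A + alpha * B) / E0_Z1
E1_N = ((1 - beta) * A + beta * B) / E1_Z1
# E0_N is about 5.82 and E1_N about 8.35, both well below the fixed
# sample size n of about 15.84 of the MP test with the same error probabilities.
```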
15.1 Confidence Intervals
Let X_1, ..., X_n be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ^r. In Chapter 12, we considered the problem of point estimation of a real-valued function of θ, g(θ). That is, we considered the problem of estimating g(θ) by a statistic (based on the X's) having certain optimality properties.

In the present chapter, we return to the estimation problem, but in a different context. First, we consider the case that θ is a real-valued parameter and proceed to define what is meant by a random interval and a confidence interval.
DEFINITION 1  A random interval is a finite or infinite interval, where at least one of the end points is an r.v.

DEFINITION 2  Let L(X_1, ..., X_n) and U(X_1, ..., X_n) be two statistics such that L(X_1, ..., X_n) ≤ U(X_1, ..., X_n). We say that the r. interval [L(X_1, ..., X_n), U(X_1, ..., X_n)] is a confidence interval for θ with confidence coefficient 1 − α (0 < α < 1) if

P_θ[L(X_1, ..., X_n) ≤ θ ≤ U(X_1, ..., X_n)] ≥ 1 − α for all θ ∈ Ω. (1)

Also, we say that U(X_1, ..., X_n) and L(X_1, ..., X_n) is an upper and a lower confidence limit for θ, respectively, with confidence coefficient 1 − α, if, for all θ ∈ Ω,

P_θ[θ ≤ U(X_1, ..., X_n)] ≥ 1 − α and P_θ[L(X_1, ..., X_n) ≤ θ] ≥ 1 − α. (2)

Thus, with probability at least 1 − α, the r. interval [L(X_1, ..., X_n), U(X_1, ..., X_n)] covers the parameter θ, no matter what θ ∈ Ω is.
The interpretation of this statement is as follows: suppose that the r. experiment under consideration is carried out independently n times and, if x_j is the observed value of X_j, j = 1, ..., n, construct the interval [L(x_1, ..., x_n), U(x_1, ..., x_n)]. Suppose now that this process is repeated independently N times, so that we obtain N intervals. Then, as N gets larger and larger, at least (1 − α)N of the N intervals will cover the true parameter θ. A similar interpretation holds true for an upper and a lower confidence limit of θ.
REMARK 1  By relations (1) and (2), and the fact that

P_θ[L(X_1, ..., X_n) ≤ θ ≤ U(X_1, ..., X_n)] = P_θ[θ ≤ U(X_1, ..., X_n)] + P_θ[L(X_1, ..., X_n) ≤ θ] − 1,

it follows that, if L(X_1, ..., X_n) and U(X_1, ..., X_n) is a lower and an upper confidence limit for θ, respectively, each with confidence coefficient 1 − α/2, then [L(X_1, ..., X_n), U(X_1, ..., X_n)] is a confidence interval for θ with confidence coefficient 1 − α.

Since the length of a confidence interval measures the accuracy with which θ is located, it is natural that we would be interested in finding the shortest confidence interval within a certain class of confidence intervals. This will be done explicitly in a number of interesting examples.
At this point, it should be pointed out that a general procedure for constructing a confidence interval is as follows: we start out with an r.v. T_n(θ) = T(X_1, ..., X_n; θ) which depends on θ and on the X's only through a sufficient statistic for θ, and whose distribution, under P_θ, is completely determined. Then L_n = L(X_1, ..., X_n) and U_n = U(X_1, ..., X_n) are some rather simple functions of T_n(θ) which are chosen in an obvious manner. The examples which follow illustrate the point.
In each of the examples below, we construct a confidence interval (and also the shortest confidence interval within a certain class) for θ with confidence coefficient 1 − α.
EXAMPLE 1 Let X1, , X n be i.i.d r.v.’s from N(μ, σ2
) First, suppose that σ is known, sothatμ is the parameter, and consider the r.v T n(μ) = √n(X¯ − μ)/σ Then T n(μ)
depends on the X’s only through the sufficient statistic X¯ of μ and its
distribu-tion is N(0, 1) for all μ
Next, determine two numbers a and b (a < b) such that

P[a ≤ N(0, 1) ≤ b] = 1 − α.  (3)

From (3), we have P[X̄ − bσ/√n ≤ μ ≤ X̄ − aσ/√n] = 1 − α, so that

[X̄ − bσ/√n, X̄ − aσ/√n]  (4)

is a confidence interval for μ with confidence coefficient 1 − α. Its length is equal to (b − a)σ/√n. From this it follows that, among all confidence intervals with confidence coefficient 1 − α which are of the form (4), the shortest one is that for which b − a is smallest, where a and b satisfy (3). It can be seen (see also Exercise 15.2.1) that this happens if b = c (> 0) and a = −c, where c is the upper α/2 quantile of the N(0, 1) distribution, which we denote by zα/2. Therefore the shortest confidence interval for μ with confidence coefficient 1 − α (and which is of the form (4)) is given by

[X̄ − zα/2·σ/√n, X̄ + zα/2·σ/√n].  (5)
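As a quick numerical check of this interval, the quantile zα/2 and the half-length can be computed with Python's standard library; the function name below is ours, chosen for illustration, and the sketch assumes the known-σ setting just described.

```python
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, alpha=0.05):
    """Shortest confidence interval for mu when sigma is known:
    [xbar - z_{alpha/2}*sigma/sqrt(n), xbar + z_{alpha/2}*sigma/sqrt(n)]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile z_{alpha/2}
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half

# For n = 25, sigma = 1, alpha = 0.05 the half-length is z_{0.025}/5,
# which matches the value 0.392 in the numerical application below.
lo, hi = mean_ci_known_sigma(xbar=0.0, sigma=1.0, n=25)
```
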
Next, assume that μ is known, so that σ² is the parameter, and consider the r.v.

Tn(σ²) = nSₙ²/σ², where Sₙ² = (1/n) Σⱼ₌₁ⁿ (Xj − μ)².

Then Tn(σ²) depends on the X's only through the sufficient statistic Sₙ² of σ², and its distribution is χ²ₙ for all σ². Now determine two numbers a and b (0 < a < b) such that

P(a ≤ χ²ₙ ≤ b) = 1 − α.  (6)

From (6), we obtain P[nSₙ²/b ≤ σ² ≤ nSₙ²/a] = 1 − α, so that

[nSₙ²/b, nSₙ²/a]  (7)

is a confidence interval for σ² with confidence coefficient 1 − α and its length is equal to (1/a − 1/b)nSₙ². The expected length is equal to (1/a − 1/b)nσ².
Now, although there are infinitely many pairs of numbers a and b satisfying (6), in practice they are often chosen by assigning mass α/2 to each one of the tails of the χ²ₙ distribution. However, this is not the best choice, because then the corresponding interval (7) is not the shortest one. For the determination of the shortest confidence interval, we work as follows. From (6), it is obvious that a and b are not independent of each other; one is a function of the other. So let b = b(a). Since the length of the confidence interval in (7) is l = (1/a − 1/b)nSₙ², it clearly follows that the a for which l is shortest is given by dl/da = 0, which is equivalent to

db/da = b²/a².  (8)

On the other hand, differentiating both sides of (6) with respect to a, where gₙ denotes the p.d.f. of the χ²ₙ distribution, we obtain gₙ(b)(db/da) − gₙ(a) = 0, so that db/da = gₙ(a)/gₙ(b). Thus (8) becomes a²gₙ(a) = b²gₙ(b). By means of this result and (6), it follows that a and b are determined by

a²gₙ(a) = b²gₙ(b) and ∫ₐᵇ gₙ(t) dt = 1 − α.  (9)
(These equations must be solved numerically; tables of the solutions are given in the paper "Optimal confidence intervals for the variance of a normal distribution," Journal of the American Statistical Association, 1959, Vol. 54, pp. 674–682, for n = 2(1)29 and 1 − α = 0.90, 0.95, 0.99, 0.995, 0.999.)
To summarize then, the shortest (both in actual and expected length) confidence interval for σ² with confidence coefficient 1 − α (and which is of the form (7)) is given by

[nSₙ²/b, nSₙ²/a],

where a and b are determined by (9).
As a numerical application, let n = 25, σ = 1, and 1 − α = 0.95. Then zα/2 = 1.96, so that (5) gives [X̄ − 0.392, X̄ + 0.392]. Next, for the equal-tails confidence interval given by (7), we have a = 13.120 and b = 40.646, so that the equal-tails confidence interval itself is given by

[25S₂₅²/40.646, 25S₂₅²/13.120],

and the ratio of the lengths of the equal-tails and the shortest confidence intervals is approximately 1.07.
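The endpoint values and the length ratio quoted above can be reproduced numerically with the standard library alone. The sketch below (the helper names are ours, not from the text) builds the χ² c.d.f. from the regularized incomplete gamma function, inverts it for the equal-tails endpoints, and solves equations (9) for the shortest pair, using the fact that x²gₙ(x) is proportional to the χ²ₙ₊₄ density, so the first equation in (9) becomes gₙ₊₄(a) = gₙ₊₄(b).

```python
import math

def gamma_p(s, x, eps=1e-12, itmax=300):
    """Regularized lower incomplete gamma function P(s, x)."""
    if x <= 0.0:
        return 0.0
    lg = math.lgamma(s)
    if x < s + 1.0:                      # power-series branch
        term = total = 1.0 / s
        a = s
        for _ in range(itmax):
            a += 1.0
            term *= x / a
            total += term
            if abs(term) < abs(total) * eps:
                break
        return total * math.exp(-x + s * math.log(x) - lg)
    # modified-Lentz continued fraction for Q(s, x); then P = 1 - Q
    tiny = 1e-300
    b = x + 1.0 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, itmax + 1):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return 1.0 - math.exp(-x + s * math.log(x) - lg) * h

def chi2_cdf(x, k):
    return gamma_p(k / 2.0, x / 2.0)

def chi2_pdf(x, k):
    return math.exp((k / 2.0 - 1.0) * math.log(x) - x / 2.0
                    - math.lgamma(k / 2.0) - (k / 2.0) * math.log(2.0))

def chi2_ppf(p, k):
    """Quantile by bisection on the c.d.f."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def shortest_chi2_pair(k, conf):
    """Solve (9): a^2 g_k(a) = b^2 g_k(b) with CDF(b) - CDF(a) = conf.
    The first equation is g_{k+4}(a) = g_{k+4}(b), with a and b on either
    side of the chi2_{k+4} mode k + 2; coverage is decreasing in a."""
    mode = k + 2.0

    def b_of(a):
        target = chi2_pdf(a, k + 4)
        hi = mode
        while chi2_pdf(hi, k + 4) > target:   # step right until density drops
            hi += k
        lo = mode
        for _ in range(200):                  # pdf is decreasing on [mode, hi]
            mid = 0.5 * (lo + hi)
            if chi2_pdf(mid, k + 4) > target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lo, hi = 1e-9, mode
    for _ in range(200):
        a = 0.5 * (lo + hi)
        if chi2_cdf(b_of(a), k) - chi2_cdf(a, k) > conf:
            lo = a
        else:
            hi = a
    a = 0.5 * (lo + hi)
    return a, b_of(a)

# Equal-tails endpoints for n = 25, 1 - alpha = 0.95, and the length ratio.
a_eq, b_eq = chi2_ppf(0.025, 25), chi2_ppf(0.975, 25)
a_sh, b_sh = shortest_chi2_pair(25, 0.95)
ratio = (1 / a_eq - 1 / b_eq) / (1 / a_sh - 1 / b_sh)
```
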
EXAMPLE 2 Let X1, ..., Xn be i.i.d. r.v.'s from the Gamma distribution with parameter β and with α a known positive integer, call it r. Then Σⱼ₌₁ⁿ Xj is a sufficient statistic for β (see Exercise 11.1.2(iii), Chapter 11). Furthermore, for each j = 1, ..., n, the r.v. 2Xj/β is χ²₂ᵣ, so that, by independence,

(2/β) Σⱼ₌₁ⁿ Xj is distributed as χ²₂ᵣₙ.
Therefore, with a and b (0 < a < b) determined by

P(a ≤ χ²₂ᵣₙ ≤ b) = 1 − α,  (10)

the interval

[2Σⱼ₌₁ⁿ Xj/b, 2Σⱼ₌₁ⁿ Xj/a]  (11)

is a confidence interval for β with confidence coefficient 1 − α, and its length is l = (1/a − 1/b)·2Σⱼ₌₁ⁿ Xj.
In order to determine the shortest confidence interval, one has to minimize l subject to (10). But this is the same problem as the one we solved in the second part of Example 1. It follows then that the shortest (both in actual and expected length) confidence interval with confidence coefficient 1 − α (which is of the form (11)) is given by (11) with a and b determined by

a²g₂ᵣₙ(a) = b²g₂ᵣₙ(b) and ∫ₐᵇ g₂ᵣₙ(t) dt = 1 − α,

where g₂ᵣₙ is the p.d.f. of the χ²₂ᵣₙ distribution.
For instance, for n = 7, r = 2, and 1 − α = 0.95, we have, by means of the tables cited in Example 1, a = 16.5128 and b = 49.3675. Thus the corresponding shortest confidence interval is

[2Σⱼ₌₁⁷ Xj/49.3675, 2Σⱼ₌₁⁷ Xj/16.5128],

and the ratio of the lengths of the equal-tails and the shortest confidence intervals is approximately equal to 1.075.
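A small simulation (illustrative only; the function names and the choice β = 3 are ours) checks that the interval (11) with the quoted endpoints a = 16.5128 and b = 49.3675 covers β about 95% of the time. A Gamma r.v. with integer shape r is generated as a sum of r exponentials.

```python
import random

def gamma_beta_ci(xs, a=16.5128, b=49.3675):
    """Interval [2*sum(X)/b, 2*sum(X)/a] for the Gamma scale parameter beta,
    based on the pivot 2*sum(X_j)/beta ~ chi-square with 2rn d.f."""
    t = 2.0 * sum(xs)
    return t / b, t / a

def coverage(beta=3.0, n=7, r=2, reps=20000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Gamma(r, beta) with integer shape r = sum of r Exp(mean beta) draws
        xs = [sum(rng.expovariate(1.0 / beta) for _ in range(r)) for _ in range(n)]
        lo, hi = gamma_beta_ci(xs)
        hits += lo <= beta <= hi
    return hits / reps
```
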
EXAMPLE 3 Let X1, ..., Xn be i.i.d. r.v.'s from the Beta distribution with β = 1 and α = θ unknown. Then Πⱼ₌₁ⁿ Xj, or −Σⱼ₌₁ⁿ log Xj, is a sufficient statistic for θ. (See Exercise 11.1.2(iv) in Chapter 11.) Consider the r.v. Yj = −2θ log Xj. It is easily seen that its p.d.f. is (1/2)exp(−yj/2), yj > 0, which is the p.d.f. of a χ²₂. This shows that

−2θ Σⱼ₌₁ⁿ log Xj is distributed as χ²₂ₙ.

Thus, with a and b (0 < a < b) determined by

P(a ≤ χ²₂ₙ ≤ b) = 1 − α,  (12)

the interval

[a/(−2Σⱼ₌₁ⁿ log Xj), b/(−2Σⱼ₌₁ⁿ log Xj)]  (13)

is a confidence interval for θ with confidence coefficient 1 − α, and its length is l = (b − a)/(−2Σⱼ₌₁ⁿ log Xj).
Considering dl/da = 0 in conjunction with (12), in the same way as it was done in Example 2, we have that the shortest (both in actual and expected length) confidence interval (which is of the form (13)) is found by numerically solving the equations

g₂ₙ(a) = g₂ₙ(b) and ∫ₐᵇ g₂ₙ(t) dt = 1 − α,

where g₂ₙ is the p.d.f. of the χ²₂ₙ distribution. However, no tables which would facilitate this solution are available.
For example, for n = 25 and 1 − α = 0.95, the equal-tails confidence interval for θ is given by (13) with a = 32.357 and b = 71.420.
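The same kind of check applies here (again a sketch with our own function names and an arbitrary θ = 2): since X = U^(1/θ) with U uniform on (0, 1) has the Beta(θ, 1) p.d.f. θx^(θ−1), the coverage of (13) with the equal-tails endpoints just quoted can be estimated by simulation.

```python
import math
import random

def beta_theta_ci(xs, a=32.357, b=71.420):
    """Equal-tails interval (13) for theta, from the pivot
    -2*theta*sum(log X_j) ~ chi-square with 2n d.f."""
    s = -2.0 * sum(math.log(x) for x in xs)
    return a / s, b / s

def coverage(theta=2.0, n=25, reps=20000, seed=7):
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.random() ** (1.0 / theta) for _ in range(n)]  # X = U^(1/theta)
        lo, hi = beta_theta_ci(xs)
        hits += lo <= theta <= hi
    return hits / reps
```
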
EXAMPLE 4 Let X1, ..., Xn be i.i.d. r.v.'s from U(0, θ). Then Yn = X(n) is a sufficient statistic for θ (see Example 7, Chapter 11), and its p.d.f. gn is given by

gn(y) = n yⁿ⁻¹/θⁿ, 0 ≤ y ≤ θ (and 0 otherwise).  (14)

From (14), we get Pθ(a ≤ Yn/θ ≤ b) = bⁿ − aⁿ, so that a and b (0 < a < b ≤ 1) satisfying bⁿ − aⁿ = 1 − α give Pθ(a ≤ Yn/θ ≤ b) = 1 − α, which is equivalent to Pθ[X(n)/b ≤ θ ≤ X(n)/a] = 1 − α. Therefore a confidence interval for θ with confidence coefficient 1 − α is given by

[X(n)/b, X(n)/a].  (15)

Its length is X(n)(1/a − 1/b), and it can be shown that, subject to bⁿ − aⁿ = 1 − α, this is minimized by taking b = 1 and a = α^(1/n), so that the shortest confidence interval of the form (15) is [X(n), X(n)·α^(−1/n)]. For further results, see the paper by Guenther on "Shortest confidence intervals" in The American Statistician, 1969, Vol. 23, Number 1.
Exercises
15.2.1 Let Φ be the d.f. of the N(0, 1) distribution and let a and b with a < b be such that Φ(b) − Φ(a) = γ (0 < γ < 1). Show that b − a is minimum if b = c (> 0) and a = −c. (See also the discussion of the second part of Example 1.)
15.2.2 Let X1, ..., Xn be independent r.v.'s having the Negative Exponential distribution with parameter θ ∈ Ω = (0, ∞), and set U = Σᵢ₌₁ⁿ Xi.
ii) Show that the r.v. U is distributed as Gamma with parameters (n, θ) and that the r.v. 2U/θ is distributed as χ²₂ₙ;