where m is the number of degrees of freedom. If |ρ_rom| ≤ 3, then the null hypothesis should be accepted; if |ρ_rom| > 3, then the null hypothesis should be rejected. This test has a fixed size α = 0.0027.
2◦. Test for excluding outliers. To test the null hypothesis, the following statistic is used:
$$ x^* = \frac{1}{s^*}\max_{1\le i\le n}|X_i - m^*|, \qquad (21.3.3.2) $$
where s* is the sample mean-square deviation. The null hypothesis H0 for a given size α is accepted if x* < x_{1−α}, where x_α is the α-quantile of the statistic x* under the assumption that the sample is normal. The values of x_α for various n and α can be found in statistical tables.
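The statistic (21.3.3.2) is straightforward to compute; only the critical value x_{1−α} has to be taken from tables or obtained by simulation. Below is a minimal sketch using NumPy (the library choice and the simulated sample are illustrative assumptions, not part of the original text); s* is taken here as the adjusted sample standard deviation.

```python
import numpy as np

def outlier_statistic(x):
    """Compute x* = (1/s*) max_i |X_i - m*| from (21.3.3.2)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()             # sample mean m*
    s = x.std(ddof=1)        # sample mean-square deviation s* (adjusted)
    return np.max(np.abs(x - m)) / s

# Illustrative usage with simulated data; the critical value x_{1-alpha}
# still has to come from statistical tables for the given n and alpha.
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=25)
print("x* =", outlier_statistic(sample))
```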
3◦. Test based on the sample first absolute central moment. The test is based on the statistic
$$ \mu^* = \mu^*(X_1,\dots,X_n) = \frac{1}{n s^*}\sum_{i=1}^{n}|X_i - m^*|. \qquad (21.3.3.3) $$
Under the assumption that the null hypothesis H0 is true, the distribution of the statistic μ* depends only on the sample size n and is independent of the parameters a and σ². The null hypothesis H0 for a given size α is accepted if μ_{α/2} < μ* < μ_{1−α/2}, where μ_α is the α-quantile of the statistic μ*. The values of μ_α for various n and α can be found in statistical tables.
4◦. Test based on the sample asymmetry coefficient. The test is based on the statistic
$$ \gamma_1^* = \gamma_1^*(X_1,\dots,X_n) = \frac{1}{n (s^*)^3}\sum_{i=1}^{n}(X_i - m^*)^3. \qquad (21.3.3.4) $$
The null hypothesis H0 for a given size α is accepted if |γ_1*| < γ_{1,1−α/2}, where γ_{1,α} is the α-quantile of the statistic γ_1*. The values of γ_{1,α} for various n and α can be found in statistical tables.
5◦. Test based on the sample excess coefficient. The test verifies the closeness of the sample excess (the test statistic)
$$ \gamma_2^* = \gamma_2^*(X_1,\dots,X_n) = \frac{1}{n (s^*)^4}\sum_{i=1}^{n}(X_i - m^*)^4 \qquad (21.3.3.5) $$
and the theoretical excess γ2 + 3 = E{(X − E{X})^4}/(Var{X})², which equals 3 for the normal law. The null hypothesis H0 for a given size α is accepted if the inequality γ_{2,α/2} < γ_2* < γ_{2,1−α/2} holds, where γ_{2,α} is the α-quantile of the statistic γ_2*. The values of γ_{2,α} for various n and α can be found in statistical tables.
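The three moment-based statistics (21.3.3.3)–(21.3.3.5) share the same ingredients (the sample mean m* and the mean-square deviation s*), so they can be computed together. A minimal NumPy sketch follows; the simulated sample is an illustrative assumption, and the acceptance decision still requires the tabulated quantiles mentioned above.

```python
import numpy as np

def normality_moment_statistics(x):
    """Statistics (21.3.3.3)-(21.3.3.5) for a one-dimensional sample."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = x.mean()                 # sample mean m*
    s = x.std(ddof=1)            # sample mean-square deviation s*
    mu_star = np.sum(np.abs(x - m)) / (n * s)       # (21.3.3.3)
    gamma1 = np.sum((x - m) ** 3) / (n * s ** 3)    # (21.3.3.4)
    gamma2 = np.sum((x - m) ** 4) / (n * s ** 4)    # (21.3.3.5)
    return mu_star, gamma1, gamma2

rng = np.random.default_rng(1)
sample = rng.normal(size=50)
# Each statistic is compared with its tabulated alpha/2- and (1 - alpha/2)-quantiles;
# for the normal law gamma2 is expected to be close to 3.
print(normality_moment_statistics(sample))
```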
21.3.3-3 Comparison of expectations of two normal populations
Suppose that X and Y are two populations with known variances σ1² and σ2² and unknown expectations a1 and a2. Two independent samples X1, ..., Xn and Y1, ..., Yk are drawn from the populations, and the sample expectations (means) m1* and m2* are calculated. The hypothesis that the expectations are equal to each other is tested using the statistic
$$ \frac{m_1^* - m_2^*}{\sqrt{\sigma_1^2/n + \sigma_2^2/k}}, \qquad (21.3.3.6) $$
which has a normal distribution with parameters (0, 1) under the assumption that the null hypothesis H0: a1 = a2 is true.
If the variances of the populations are unknown, then either the sample size should be sufficiently large to obtain a reliable estimator or the variances should coincide; otherwise, the known tests are inefficient. If the variances of the populations are equal to each other, σ1² = σ2², then one can test the null hypothesis H0: a1 = a2 using the statistic
$$ T = (m_1^* - m_2^*)\left[\left(\frac{1}{n}+\frac{1}{k}\right)\frac{s_1^2(n-1)+s_2^2(k-1)}{n+k-2}\right]^{-1/2}, \qquad (21.3.3.7) $$
which has the t-distribution (Student's distribution) with ν = n + k − 2 degrees of freedom. The choice of the critical region depends on the form of the alternative hypothesis:
1. For the alternative hypothesis H1: a1 > a2, one should choose a right-sided critical region.
2. For the alternative hypothesis H1: a1 < a2, one should choose a left-sided critical region.
3. For the alternative hypothesis H1: a1 ≠ a2, one should choose a two-sided critical region.
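A minimal sketch of both two-sample statistics, using NumPy and SciPy quantiles (the libraries and the simulated data are assumptions made for illustration): it computes (21.3.3.6) when the variances are known and (21.3.3.7) otherwise, together with a two-sided critical value.

```python
import numpy as np
from scipy import stats

def two_sample_mean_statistic(x, y, var_x=None, var_y=None, alpha=0.05):
    """Statistic (21.3.3.6) or (21.3.3.7) plus a two-sided critical value."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, k = x.size, y.size
    diff = x.mean() - y.mean()
    if var_x is not None and var_y is not None:
        # known variances: statistic (21.3.3.6), standard normal under H0
        stat = diff / np.sqrt(var_x / n + var_y / k)
        crit = stats.norm.ppf(1 - alpha / 2)
    else:
        # unknown but equal variances: statistic (21.3.3.7), Student's t under H0
        s2x, s2y = x.var(ddof=1), y.var(ddof=1)
        pooled = (1 / n + 1 / k) * ((n - 1) * s2x + (k - 1) * s2y) / (n + k - 2)
        stat = diff / np.sqrt(pooled)
        crit = stats.t.ppf(1 - alpha / 2, df=n + k - 2)
    return stat, crit

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 30)
y = rng.normal(0.3, 1.0, 40)
t_stat, crit = two_sample_mean_statistic(x, y)
print(t_stat, crit)   # for H1: a1 != a2 reject H0 when |t_stat| > crit
```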
21.3.3-4 Tests for equality of variances
Suppose that there are L independent samples
$$ X_{11},\dots,X_{1n_1};\quad X_{21},\dots,X_{2n_2};\quad \dots;\quad X_{L1},\dots,X_{Ln_L} $$
of sizes n1, ..., nL drawn from distinct normal populations with unknown expectations a1, ..., aL and unknown variances σ1², ..., σL². It is required to test the simple hypothesis H0: σ1² = ··· = σL² (the variances of all populations are the same) against the alternative hypothesis H1 that some variances are different.
1◦. Bartlett's test. The statistic in this test has the form
$$ b = N\ln\left[\frac{1}{N}\sum_{i=1}^{L}(n_i-1)s_i^2\right] - \sum_{i=1}^{L}(n_i-1)\ln s_i^2, \qquad (21.3.3.8) $$
where
$$ N = \sum_{i=1}^{L}(n_i-1), \qquad s_j^2 = \frac{1}{n_j-1}\sum_{i=1}^{n_j}(X_{ji}-m_j^*)^2, \qquad m_j^* = \frac{1}{n_j}\sum_{i=1}^{n_j}X_{ji}. $$
The statistic b permits reducing the problem of testing the hypothesis that the variances of normal samples are equal to each other to the problem of testing the hypothesis that the expectations of approximately normal samples are equal to each other. If the null hypothesis H0 is true and all n_i > 5, then the ratio
$$ B = b\left[1 + \frac{1}{3(L-1)}\left(\sum_{i=1}^{L}\frac{1}{n_i-1} - \frac{1}{N}\right)\right]^{-1} $$
is distributed approximately according to the chi-square law with L − 1 degrees of freedom. The null hypothesis H0 for a given size α is accepted if B < χ²_{1−α}(L − 1), where χ²_α is the α-quantile of the chi-square distribution with L − 1 degrees of freedom.
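A minimal sketch of Bartlett's statistic and its corrected ratio B, using NumPy with a SciPy chi-square quantile (the library choice and the simulated groups are illustrative assumptions); scipy.stats.bartlett implements the same corrected statistic and can serve as a cross-check.

```python
import numpy as np
from scipy import stats

def bartlett_ratio(samples):
    """Statistic b from (21.3.3.8) divided by the correction factor, i.e. the ratio B."""
    s2 = np.array([np.var(s, ddof=1) for s in samples])   # adjusted sample variances s_i^2
    df = np.array([len(s) - 1 for s in samples])          # n_i - 1
    N, L = df.sum(), len(samples)
    b = N * np.log((df * s2).sum() / N) - (df * np.log(s2)).sum()
    correction = 1 + (np.sum(1.0 / df) - 1.0 / N) / (3 * (L - 1))
    return b / correction

rng = np.random.default_rng(3)
groups = [rng.normal(0.0, 1.0, n) for n in (12, 15, 20)]
B = bartlett_ratio(groups)
crit = stats.chi2.ppf(0.95, df=len(groups) - 1)
print(B, crit)   # accept H0 (equal variances) at size 0.05 if B < crit
```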
2◦. Cochran's test. If all samples have the same size (n1 = ··· = nL = n), then the null hypothesis H0 is tested against the alternative hypothesis H1 using the Cochran statistic
$$ G = \frac{s_{\max}^2}{s_1^2 + \dots + s_L^2}, \qquad \text{where}\quad s_{\max}^2 = \max_{1\le i\le L}s_i^2. $$
Cochran's test is, in general, less powerful than Bartlett's test, but it is simpler. The null hypothesis H0 for a given size α is accepted if G < G_α, where G_α is the α-quantile of the statistic G. The values of G_α for various α, L, and ν = n − 1 can be found in statistical tables.
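The Cochran statistic itself is easy to compute; only the critical value G_α comes from tables. A brief NumPy sketch (the simulated groups are an illustrative assumption):

```python
import numpy as np

def cochran_statistic(samples):
    """G = s_max^2 / (s_1^2 + ... + s_L^2) for L samples of equal size n."""
    s2 = np.array([np.var(s, ddof=1) for s in samples])
    return s2.max() / s2.sum()

rng = np.random.default_rng(4)
groups = [rng.normal(0.0, 1.0, 10) for _ in range(5)]   # L = 5, n = 10
G = cochran_statistic(groups)
print(G)   # compare with the tabulated critical value for the chosen alpha, L, nu = n - 1
```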
3◦. Fisher's test. For L = 2, to test the null hypothesis H0 that the variances of two samples coincide, it is most expedient to use Fisher's test based on the statistic
$$ \Psi = \frac{s_2^2}{s_1^2}, $$
where s1² and s2² are the adjusted sample variances of the two samples. The statistic Ψ has the F-distribution (Fisher–Snedecor distribution) with n2 − 1 and n1 − 1 degrees of freedom.
The one-sided Fisher test verifies the null hypothesis H0: σ1² = σ2² against the alternative hypothesis H1: σ1² < σ2²; the critical region of the one-sided Fisher test for a given size α is determined by the inequality Ψ > Ψ_{1−α}(n2 − 1, n1 − 1).
The two-sided Fisher test verifies the null hypothesis H0: σ1² = σ2² against the alternative hypothesis H1: σ1² ≠ σ2²; for a given size α, the null hypothesis is accepted if Ψ_{α/2} < Ψ < Ψ_{1−α/2}, where Ψ_α is the α-quantile of the F-distribution with parameters n2 − 1 and n1 − 1.
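Since F-quantiles are readily available, this test is easy to carry out numerically. A minimal SciPy-based sketch (the simulated samples are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def fisher_variance_test(x1, x2, alpha=0.05):
    """Psi = s_2^2 / s_1^2 with its two-sided acceptance bounds."""
    n1, n2 = len(x1), len(x2)
    psi = np.var(x2, ddof=1) / np.var(x1, ddof=1)
    lo = stats.f.ppf(alpha / 2, n2 - 1, n1 - 1)
    hi = stats.f.ppf(1 - alpha / 2, n2 - 1, n1 - 1)
    return psi, lo, hi

rng = np.random.default_rng(5)
x1 = rng.normal(0.0, 1.0, 25)
x2 = rng.normal(0.0, 1.5, 30)
psi, lo, hi = fisher_variance_test(x1, x2)
print(psi, lo, hi)   # accept H0: sigma1^2 = sigma2^2 if lo < psi < hi
```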
21.3.3-5 Sample correlation
Suppose that a sample X1, ..., Xn is two-dimensional and its elements Xi = (X_{i1}, X_{i2}) are two-dimensional random variables with a joint normal distribution with means a1 and a2, variances σ1² and σ2², and correlation r. It is required to test the hypothesis that the components X^(1) and X^(2) of the vector X are independent, i.e., to test the hypothesis that the correlation is zero.
Estimation of the correlation r is based on the sample correlation
$$ r^* = \frac{\sum_{i=1}^{n}(X_{i1}-m_1^*)(X_{i2}-m_2^*)}{\sqrt{\sum_{i=1}^{n}(X_{i1}-m_1^*)^2\sum_{i=1}^{n}(X_{i2}-m_2^*)^2}}, \qquad (21.3.3.11) $$
where
$$ m_1^* = \frac{1}{n}\sum_{i=1}^{n}X_{i1}, \qquad m_2^* = \frac{1}{n}\sum_{i=1}^{n}X_{i2}. $$
Under the assumption that the null hypothesis H0 is true, the distribution of the statistic r* depends only on the sample size n. The statistic r* itself is a consistent and asymptotically efficient estimator of the correlation r.
The null hypothesis H0: r = 0 that X^(1) and X^(2) are independent (against the alternative hypothesis H1: r ≠ 0 that X^(1) and X^(2) are dependent) is accepted if the inequality r_{α/2} < r* < r_{1−α/2} is satisfied. Here r_α is the α-quantile of the sample correlation under the assumption that the null hypothesis H0 is true; the relation r_α = −r_{1−α} holds because of symmetry.
To construct the confidence intervals, one should use the Fisher transformation
$$ y = \operatorname{arctanh} r^* = \frac{1}{2}\ln\frac{1+r^*}{1-r^*}, \qquad (21.3.3.12) $$
which, for n > 10, is approximately normal with parameters
$$ \mathrm{E}\{y\} \approx \frac{1}{2}\ln\frac{1+r}{1-r} + \frac{r}{2(n-3)}, \qquad \operatorname{Var}\{y\} \approx \frac{1}{n-3}. $$
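A short sketch of the sample correlation (21.3.3.11) and an approximate confidence interval obtained from the transformation (21.3.3.12); the small bias term r/(2(n − 3)) is neglected here, and NumPy/SciPy and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def sample_correlation(x1, x2):
    """r* from (21.3.3.11)."""
    d1, d2 = x1 - x1.mean(), x2 - x2.mean()
    return np.sum(d1 * d2) / np.sqrt(np.sum(d1 ** 2) * np.sum(d2 ** 2))

def fisher_confidence_interval(r_star, n, gamma=0.95):
    """Approximate level-gamma interval for r via y = arctanh r* (21.3.3.12)."""
    y = np.arctanh(r_star)
    half = stats.norm.ppf((1 + gamma) / 2) / np.sqrt(n - 3)
    return np.tanh(y - half), np.tanh(y + half)   # back-transform the bounds

rng = np.random.default_rng(6)
x1 = rng.normal(size=40)
x2 = 0.5 * x1 + rng.normal(size=40)
r_star = sample_correlation(x1, x2)
print(r_star, fisher_confidence_interval(r_star, n=40))
```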
21.3.3-6 Regression analysis
Let X1, ..., Xn be the results of n independent observations
$$ X_i = \sum_{j=1}^{L}\theta_j f_j(t_i) + \varepsilon_i, $$
where f1(t), ..., fL(t) are known functions, θ1, ..., θL are unknown parameters, and ε1, ..., εn are random errors known to be independent and normally distributed with zero mean and with the same unknown variance σ².
The regression parameters are subject to the following constraints:
1. The number of observations n is greater than the number L of unknown parameters.
2. The vectors
$$ \mathbf{f}_i = (f_i(t_1),\dots,f_i(t_n)) \quad (i = 1, 2, \dots, L) \qquad (21.3.3.13) $$
must be linearly independent.
1◦. Estimation of unknown parameters θ1, ..., θL and construction of (one-dimensional) confidence intervals for them.
To solve this problem, we consider the sum of squares
$$ S^2 = S^2(\theta_1,\dots,\theta_L) = \sum_{i=1}^{n}\bigl[X_i - \theta_1 f_1(t_i) - \dots - \theta_L f_L(t_i)\bigr]^2. \qquad (21.3.3.14) $$
The estimators θ1*, ..., θL* form a solution of the system of equations
$$ \theta_1^*\sum_{j=1}^{n}f_1(t_j)f_i(t_j) + \dots + \theta_L^*\sum_{j=1}^{n}f_L(t_j)f_i(t_j) = \sum_{j=1}^{n}X_j f_i(t_j), \qquad i = 1,\dots,L. \qquad (21.3.3.15) $$
The estimators θ1*, ..., θL* are linear and efficient; in particular, they are unbiased and have the minimal variance among all unbiased estimators.
Remark. If we omit the requirement that the errors ε1, ..., εn are normally distributed and only assume that they are uncorrelated and have zero expectation and the same variance σ², then the estimators θ1*, ..., θL* are linear, unbiased, and have the minimal variance in the class of all linear estimators.
The confidence intervals for a given confidence level γ for the unknown parameters θ1, ..., θL have the form
$$ |\theta_i - \theta_i^*| < t_{(1+\gamma)/2}\sqrt{c_i^2 s_0^2}, \qquad (21.3.3.16) $$
where t_γ is the γ-quantile of the t-distribution with n − L degrees of freedom,
$$ s_0^2 = \frac{1}{n-L}\min_{\theta_1,\dots,\theta_L}S^2(\theta_1,\dots,\theta_L) = \frac{S^2(\theta_1^*,\dots,\theta_L^*)}{n-L}, \qquad c_i^2 = \sum_{j=1}^{n}c_{ij}^2, $$
and c_ij are the coefficients in the representation θ_i* = Σ_{j=1}^n c_ij X_j.
System (21.3.3.15) can be solved in the simplest way if the vectors (21.3.3.13) are orthogonal. In this case, system (21.3.3.15) splits into the separate equations
$$ \theta_i^*\sum_{j=1}^{n}f_i^2(t_j) = \sum_{j=1}^{n}X_j f_i(t_j). $$
Then the estimators θ1*, ..., θL* are independent, linear, and efficient.
Remark. If we omit the requirement that the errors ε1, ..., εn are normally distributed, then the estimators θ1*, ..., θL* are uncorrelated, linear, and unbiased and have the minimal variance in the class of all linear estimators.
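In matrix form, system (21.3.3.15) reads F^T F θ = F^T X, where F is the n × L matrix whose columns are the vectors (21.3.3.13). A minimal NumPy sketch under assumed basis functions f1(t) = 1, f2(t) = t, f3(t) = t² and simulated observations (all of these are illustrative assumptions):

```python
import numpy as np

def fit_regression(t, x, basis):
    """Solve the normal equations (21.3.3.15) and return theta* and s0^2."""
    F = np.column_stack([f(t) for f in basis])      # column i holds f_i(t_1), ..., f_i(t_n)
    theta = np.linalg.solve(F.T @ F, F.T @ x)       # least-squares estimators theta_i*
    resid = x - F @ theta
    s0_sq = resid @ resid / (len(x) - len(basis))   # residual variance s0^2
    return theta, s0_sq

# assumed basis: f1(t) = 1, f2(t) = t, f3(t) = t^2
basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t ** 2]
rng = np.random.default_rng(7)
t = np.linspace(0.0, 1.0, 30)
x = 1.0 + 2.0 * t - 0.5 * t ** 2 + rng.normal(0.0, 0.1, t.size)
theta, s0_sq = fit_regression(t, x, basis)
print(theta, s0_sq)
```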
2◦. Testing the hypothesis that some θ_i are zero. Suppose that it is required to test the null hypothesis H0: θ_{k+1} = ··· = θ_L = 0 (0 ≤ k < L). This problem can be solved using the statistic
$$ \Psi = \frac{s_1^2}{s_0^2}, $$
where
$$ s_0^2 = \frac{1}{n-L}\min_{\theta_1,\dots,\theta_L}S^2(\theta_1,\dots,\theta_L), \qquad s_1^2 = \frac{S_2^2 - (n-L)s_0^2}{L-k}, \qquad S_2^2 = \min_{\theta_1,\dots,\theta_k}S^2(\theta_1,\dots,\theta_k,0,\dots,0). $$
The hypothesis H0 for a given size γ is accepted if Ψ < Ψ_{1−γ}, where Ψ_γ is the γ-quantile of the F-distribution with parameters L − k and n − L.
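A minimal sketch of this nested-model F test: it fits the full and the restricted models by least squares and compares Ψ with the corresponding F-quantile (the libraries, basis functions, and simulated data are illustrative assumptions).

```python
import numpy as np
from scipy import stats

def test_trailing_thetas_zero(t, x, basis, k, alpha=0.05):
    """Psi = s1^2 / s0^2 for H0: theta_{k+1} = ... = theta_L = 0."""
    def rss(funcs):
        F = np.column_stack([f(t) for f in funcs])
        theta, *_ = np.linalg.lstsq(F, x, rcond=None)
        r = x - F @ theta
        return r @ r
    n, L = len(x), len(basis)
    s0_sq = rss(basis) / (n - L)                      # full model
    s2_sq = rss(basis[:k])                            # restricted model (last L - k thetas zero)
    s1_sq = (s2_sq - (n - L) * s0_sq) / (L - k)
    psi = s1_sq / s0_sq
    return psi, stats.f.ppf(1 - alpha, L - k, n - L)  # accept H0 if psi < this quantile

basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t ** 2]
rng = np.random.default_rng(8)
t = np.linspace(0.0, 1.0, 30)
x = 1.0 + 2.0 * t + rng.normal(0.0, 0.1, t.size)      # true theta_3 = 0
psi, crit = test_trailing_thetas_zero(t, x, basis, k=2)
print(psi, crit)
```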
3◦. Finding the estimator x*(t) of the regression x(t) = Σ_{i=1}^L θ_i f_i(t) at an arbitrary time and the construction of confidence intervals.
The estimator x*(t) of the regression x(t) is obtained if the θ_i in x(t) are replaced by their estimators:
$$ x^*(t) = \sum_{i=1}^{L}\theta_i^* f_i(t). $$
The estimator x*(t) is a linear, efficient, normally distributed, and unbiased estimator of the regression x(t).
The confidence interval of confidence level γ is given by the inequality
$$ |x(t) - x^*(t)| < t_{(1+\gamma)/2}\sqrt{c(t)\,s_0^2}, \qquad (21.3.3.17) $$
where
$$ c(t) = \sum_{j=1}^{n}\Bigl(\sum_{i=1}^{L}c_{ij}f_i(t)\Bigr)^{2} $$
and t_γ is the γ-quantile of the t-distribution with n − L degrees of freedom.
Example 1. Consider a linear regression x(t) = θ1 + θ2 t.
1◦. The estimators θ1* and θ2* of the unknown parameters θ1 and θ2 are given by the formulas
$$ \theta_1^* = \sum_{j=1}^{n}c_{1j}X_j, \qquad \theta_2^* = \sum_{j=1}^{n}c_{2j}X_j, $$
where
$$ c_{1j} = \frac{\sum_{k=1}^{n}t_k^2 - t_j\sum_{k=1}^{n}t_k}{n\sum_{k=1}^{n}t_k^2 - \bigl(\sum_{k=1}^{n}t_k\bigr)^2}, \qquad c_{2j} = \frac{n t_j - \sum_{k=1}^{n}t_k}{n\sum_{k=1}^{n}t_k^2 - \bigl(\sum_{k=1}^{n}t_k\bigr)^2}. $$
The statistic s_0*², up to the factor (n − 2)/σ², has the χ²-distribution with n − 2 degrees of freedom and is determined by the formula
$$ s_0^{*2} = \frac{1}{n-2}\sum_{j=1}^{n}(X_j - \theta_1^* - \theta_2^* t_j)^2. $$
The confidence intervals of confidence level γ for the parameters θ_i are given by the formula
$$ |\theta_i - \theta_i^*| < t_{(1+\gamma)/2}\sqrt{c_i^2 s_0^{*2}}, \qquad i = 1, 2, $$
where t_γ is the γ-quantile of the t-distribution with n − 2 degrees of freedom.
2◦. We test the null hypothesis H0: θ2 = 0, i.e., the hypothesis that x(t) is independent of time. The value of S_2² is given by the formula
$$ S_2^2 = \sum_{i=1}^{n}(X_i - m^*)^2, \qquad m^* = \frac{1}{n}\sum_{i=1}^{n}X_i, $$
and the value of s_1*² is given by the formula
$$ s_1^{*2} = S_2^2 - (n-2)s_0^{*2}. $$
Thus, the hypothesis H0 for a given confidence level γ is accepted if Ψ = s_1*²/s_0*² < Ψ_γ, where Ψ_γ is the γ-quantile of the F-distribution with parameters 1 and n − 2.
3◦. The estimator x*(t) of the regression x(t) has the form
$$ x^*(t) = \theta_1^* + \theta_2^* t. $$
The coefficient c(t) is determined by the formula
$$ c(t) = \sum_{j=1}^{n}(c_{1j} + c_{2j}t)^2 = \sum_{j=1}^{n}c_{1j}^2 + 2t\sum_{j=1}^{n}c_{1j}c_{2j} + t^2\sum_{j=1}^{n}c_{2j}^2 = b_0 + b_1 t + b_2 t^2. $$
Thus the boundaries of the confidence interval for a given confidence level γ are given by the formulas
$$ x_L^*(t) = \theta_1^* + \theta_2^* t - t_{(1+\gamma)/2}\sqrt{s_0^{*2}(b_0 + b_1 t + b_2 t^2)}, $$
$$ x_R^*(t) = \theta_1^* + \theta_2^* t + t_{(1+\gamma)/2}\sqrt{s_0^{*2}(b_0 + b_1 t + b_2 t^2)}, $$
where t_γ is the γ-quantile of the t-distribution with n − 2 degrees of freedom.
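A direct transcription of the formulas of Example 1 into NumPy/SciPy, with simulated observations used purely for illustration; it computes θ1*, θ2*, s_0*², the coefficients b_0, b_1, b_2, and the confidence band on a small grid of t values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 25
t = np.linspace(0.0, 4.0, n)
x = 1.5 + 0.8 * t + rng.normal(0.0, 0.3, n)        # observations X_j

# coefficients c_1j, c_2j of the estimators theta_1*, theta_2*
den = n * np.sum(t ** 2) - np.sum(t) ** 2
c1 = (np.sum(t ** 2) - t * np.sum(t)) / den
c2 = (n * t - np.sum(t)) / den
theta1, theta2 = c1 @ x, c2 @ x

s0_sq = np.sum((x - theta1 - theta2 * t) ** 2) / (n - 2)
b0, b1, b2 = np.sum(c1 ** 2), 2 * np.sum(c1 * c2), np.sum(c2 ** 2)

gamma = 0.95
tq = stats.t.ppf((1 + gamma) / 2, df=n - 2)
grid = np.linspace(0.0, 4.0, 5)
half = tq * np.sqrt(s0_sq * (b0 + b1 * grid + b2 * grid ** 2))
print("theta1*, theta2*:", theta1, theta2)
print("lower band:", theta1 + theta2 * grid - half)
print("upper band:", theta1 + theta2 * grid + half)
```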
21.3.3-7 Analysis of variance
Analysis of variance is a statistical method for clarifying the influence of several factors on experimental results and for planning subsequent experiments.
1◦. The simplest problem of analysis of variance. Suppose that there are L independent samples
$$ X_{11},\dots,X_{1n_1};\quad X_{21},\dots,X_{2n_2};\quad \dots;\quad X_{L1}, X_{L2},\dots,X_{Ln_L}, \qquad (21.3.3.18) $$
drawn from normal populations with unknown expectations a1, ..., aL and unknown but equal variances σ². It is necessary to test the null hypothesis H0: a1 = ··· = aL that all theoretical expectations a_i are the same against the alternative hypothesis H1 that some theoretical expectations are different.
The intragroup variances are determined by the formulas
$$ s_{(i)}^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}(X_{ij} - m_i^*)^2 \quad (i = 1, 2, \dots, L), \qquad (21.3.3.19) $$
where m_i* = (1/n_i) Σ_{j=1}^{n_i} X_ij is the sample mean of the corresponding sample. The random variable (n_i − 1)s_{(i)}²/σ² has the chi-square distribution with n_i − 1 degrees of freedom. An unbiased estimator of the unknown variance σ² is given by the statistic
$$ s_0^2 = \frac{\sum_{i=1}^{L}(n_i-1)s_{(i)}^2}{\sum_{i=1}^{L}(n_i-1)}, \qquad (21.3.3.20) $$
called the residual variance.
The intergroup sample variance is defined to be the statistic
$$ s_1^2 = \frac{1}{L-1}\sum_{i=1}^{L}n_i(m_i^* - m^*)^2, \qquad (21.3.3.21) $$
where m* is the common sample mean of the generalized sample. The statistic s_1² is independent of s_0²; if the null hypothesis H0 is true, then s_1² is an unbiased estimator of the unknown variance σ². The random variable s_0² Σ_{i=1}^{L}(n_i − 1)/σ² is distributed according to the chi-square law with Σ_{i=1}^{L}(n_i − 1) degrees of freedom, and under H0 the random variable s_1²(L − 1)/σ² is distributed according to the chi-square law with L − 1 degrees of freedom.
According to the one-sided Fisher test, the null hypothesis H0 must be accepted for a given confidence level γ if Ψ = s_1²/s_0² < Ψ_γ, where Ψ_γ is the γ-quantile of the F-distribution with parameters L − 1 and Σ_{i=1}^{L}(n_i − 1).
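A compact sketch of this one-way analysis of variance, computing s_0², s_1², and the Fisher ratio with the corresponding F-quantile (NumPy/SciPy and the simulated groups are assumptions made for illustration).

```python
import numpy as np
from scipy import stats

def one_way_anova(samples, gamma=0.95):
    """Residual variance s0^2, intergroup variance s1^2, and the Fisher test."""
    L = len(samples)
    n = np.array([len(s) for s in samples])
    means = np.array([np.mean(s) for s in samples])        # group means m_i*
    s2 = np.array([np.var(s, ddof=1) for s in samples])    # intragroup variances (21.3.3.19)
    grand = np.concatenate(samples).mean()                 # common sample mean m*
    s0_sq = np.sum((n - 1) * s2) / np.sum(n - 1)            # residual variance (21.3.3.20)
    s1_sq = np.sum(n * (means - grand) ** 2) / (L - 1)      # intergroup variance (21.3.3.21)
    psi = s1_sq / s0_sq
    crit = stats.f.ppf(gamma, L - 1, np.sum(n - 1))
    return psi, crit                                        # accept H0 if psi < crit

rng = np.random.default_rng(10)
groups = [rng.normal(0.0, 1.0, m) for m in (8, 10, 12)]
psi, crit = one_way_anova(groups)
print(psi, crit)
```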
2◦. Multifactor analysis of variance. We consider two-factor analysis of variance. Suppose that the first factor acts at L1 levels and the second factor acts at L2 levels (the two-factor (L1, L2)-level model of analysis of variance). Suppose that we have n_ij observations in which the first factor acted at the ith level and the second factor acted at the jth level. The observation results X_ijk are independent normally distributed random variables with the same (unknown) variance σ² and unknown expectations a_ij. It is required to test the null hypothesis H0 that the first and second factors do not affect the results of observations, i.e., that all a_ij are the same.
The action of two factors at levels L1 and L2 is identified with the action of a single factor at L1 L2 levels; then to test the hypothesis H0, it is expedient to use the one-factor L1 L2-level model. The statistics s0² and s1² are determined by the formulas
$$ s_0^2 = \frac{\sum_{i=1}^{L_1}\sum_{j=1}^{L_2}\sum_{k=1}^{n_{ij}}(X_{ijk} - m_{ij}^*)^2}{\sum_{i=1}^{L_1}\sum_{j=1}^{L_2}(n_{ij}-1)}, \qquad s_1^2 = \frac{1}{L_1 L_2 - 1}\sum_{i=1}^{L_1}\sum_{j=1}^{L_2}n_{ij}(m_{ij}^* - m^*)^2, \qquad (21.3.3.22) $$