where m is the number of degrees of freedom. If |ρ_rom| ≤ 3, then the null hypothesis should be accepted; if |ρ_rom| > 3, then the null hypothesis should be rejected. This test has a fixed size α = 0.0027.
2◦. Test for excluding outliers. To test the null hypothesis, the following statistic is used:
$$ x^* = \frac{1}{s^*}\max_{1\le i\le n}|X_i - m^*|, \qquad (21.3.3.2) $$
where s* is the sample mean-square deviation. The null hypothesis H0 for a given size α is accepted if x* < x_{1−α}, where x_α is the α-quantile of the statistic x* under the assumption that the sample is normal. The values of x_α for various n and α can be found in statistical tables.
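The statistic (21.3.3.2) is straightforward to compute; only the critical value x_{1−α} has to be taken from tables or obtained by simulation. Below is a minimal sketch using NumPy (the library choice and the simulated sample are illustrative assumptions, not part of the original text); s* is taken here as the adjusted sample standard deviation.

```python
import numpy as np

def outlier_statistic(x):
    """Compute x* = (1/s*) max_i |X_i - m*| from (21.3.3.2)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()             # sample mean m*
    s = x.std(ddof=1)        # sample mean-square deviation s* (adjusted)
    return np.max(np.abs(x - m)) / s

# Illustrative usage with simulated data; the critical value x_{1-alpha}
# still has to come from statistical tables for the given n and alpha.
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=25)
print("x* =", outlier_statistic(sample))
```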
3◦. Test based on the sample first absolute central moment. The test is based on the statistic
$$ \mu^* = \mu^*(X_1,\dots,X_n) = \frac{1}{n s^*}\sum_{i=1}^{n}|X_i - m^*|. \qquad (21.3.3.3) $$
Under the assumption that the null hypothesis H0 is true, the distribution of the statistic μ* depends only on the sample size n and is independent of the parameters a and σ². The null hypothesis H0 for a given size α is accepted if μ_{α/2} < μ* < μ_{1−α/2}, where μ_α is the α-quantile of the statistic μ*. The values of μ_α for various n and α can be found in statistical tables.
4◦. Test based on the sample asymmetry coefficient. The test is based on the statistic
$$ \gamma_1^* = \gamma_1^*(X_1,\dots,X_n) = \frac{1}{n (s^*)^3}\sum_{i=1}^{n}(X_i - m^*)^3. \qquad (21.3.3.4) $$
The null hypothesis H0 for a given size α is accepted if |γ_1*| < γ_{1,1−α/2}, where γ_{1,α} is the α-quantile of the statistic γ_1*. The values of γ_{1,α} for various n and α can be found in statistical tables.
5◦. Test based on the sample excess coefficient. The test verifies the closeness of the sample excess (the test statistic)
$$ \gamma_2^* = \gamma_2^*(X_1,\dots,X_n) = \frac{1}{n (s^*)^4}\sum_{i=1}^{n}(X_i - m^*)^4 \qquad (21.3.3.5) $$
and the theoretical excess γ2 + 3 = E{(X − E{X})^4}/(Var{X})², which equals 3 for the normal law. The null hypothesis H0 for a given size α is accepted if the inequality γ_{2,α/2} < γ_2* < γ_{2,1−α/2} holds, where γ_{2,α} is the α-quantile of the statistic γ_2*. The values of γ_{2,α} for various n and α can be found in statistical tables.
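The three moment-based statistics (21.3.3.3)–(21.3.3.5) share the same ingredients (the sample mean m* and the mean-square deviation s*), so they can be computed together. A minimal NumPy sketch follows; the simulated sample is an illustrative assumption, and the acceptance decision still requires the tabulated quantiles mentioned above.

```python
import numpy as np

def normality_moment_statistics(x):
    """Statistics (21.3.3.3)-(21.3.3.5) for a one-dimensional sample."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = x.mean()                 # sample mean m*
    s = x.std(ddof=1)            # sample mean-square deviation s*
    mu_star = np.sum(np.abs(x - m)) / (n * s)       # (21.3.3.3)
    gamma1 = np.sum((x - m) ** 3) / (n * s ** 3)    # (21.3.3.4)
    gamma2 = np.sum((x - m) ** 4) / (n * s ** 4)    # (21.3.3.5)
    return mu_star, gamma1, gamma2

rng = np.random.default_rng(1)
sample = rng.normal(size=50)
# Each statistic is compared with its tabulated alpha/2- and (1 - alpha/2)-quantiles;
# for the normal law gamma2 is expected to be close to 3.
print(normality_moment_statistics(sample))
```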
21.3.3-3 Comparison of expectations of two normal populations
Suppose that X and Y are two populations with known variances σ1² and σ2² and unknown expectations a1 and a2. Two independent samples X1, ..., Xn and Y1, ..., Yk are drawn from the populations, and the sample expectations (means) m1* and m2* are calculated. The hypothesis that the expectations are equal to each other is tested using the statistic
$$ \frac{m_1^* - m_2^*}{\sqrt{\sigma_1^2/n + \sigma_2^2/k}}, \qquad (21.3.3.6) $$
which has a normal distribution with parameters (0, 1) under the assumption that the null hypothesis H0: a1 = a2 is true.
If the variances of the populations are unknown, then either the sample size should be sufficiently large to obtain a reliable estimator or the variances should coincide; otherwise, the known tests are inefficient. If the variances of the populations are equal to each other, σ1² = σ2², then one can test the null hypothesis H0: a1 = a2 using the statistic
$$ T = (m_1^* - m_2^*)\left[\left(\frac{1}{n}+\frac{1}{k}\right)\frac{s_1^2(n-1)+s_2^2(k-1)}{n+k-2}\right]^{-1/2}, \qquad (21.3.3.7) $$
which has the t-distribution (Student's distribution) with ν = n + k − 2 degrees of freedom. The choice of the critical region depends on the form of the alternative hypothesis:
1. For the alternative hypothesis H1: a1 > a2, one should choose a right-sided critical region.
2. For the alternative hypothesis H1: a1 < a2, one should choose a left-sided critical region.
3. For the alternative hypothesis H1: a1 ≠ a2, one should choose a two-sided critical region.
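A minimal sketch of both two-sample statistics, using NumPy and SciPy quantiles (the libraries and the simulated data are assumptions made for illustration): it computes (21.3.3.6) when the variances are known and (21.3.3.7) otherwise, together with a two-sided critical value.

```python
import numpy as np
from scipy import stats

def two_sample_mean_statistic(x, y, var_x=None, var_y=None, alpha=0.05):
    """Statistic (21.3.3.6) or (21.3.3.7) plus a two-sided critical value."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, k = x.size, y.size
    diff = x.mean() - y.mean()
    if var_x is not None and var_y is not None:
        # known variances: statistic (21.3.3.6), standard normal under H0
        stat = diff / np.sqrt(var_x / n + var_y / k)
        crit = stats.norm.ppf(1 - alpha / 2)
    else:
        # unknown but equal variances: statistic (21.3.3.7), Student's t under H0
        s2x, s2y = x.var(ddof=1), y.var(ddof=1)
        pooled = (1 / n + 1 / k) * ((n - 1) * s2x + (k - 1) * s2y) / (n + k - 2)
        stat = diff / np.sqrt(pooled)
        crit = stats.t.ppf(1 - alpha / 2, df=n + k - 2)
    return stat, crit

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 30)
y = rng.normal(0.3, 1.0, 40)
t_stat, crit = two_sample_mean_statistic(x, y)
print(t_stat, crit)   # for H1: a1 != a2 reject H0 when |t_stat| > crit
```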
21.3.3-4 Tests for equality of variances
Suppose that there are L independent samples
$$ X_{11},\dots,X_{1n_1};\quad X_{21},\dots,X_{2n_2};\quad \dots;\quad X_{L1},\dots,X_{Ln_L} $$
of sizes n1, ..., nL drawn from distinct normal populations with unknown expectations a1, ..., aL and unknown variances σ1², ..., σL². It is required to test the simple hypothesis H0: σ1² = ··· = σL² (the variances of all populations are the same) against the alternative hypothesis H1 that some variances are different.
1◦. Bartlett's test. The statistic in this test has the form
$$ b = N\ln\left[\frac{1}{N}\sum_{i=1}^{L}(n_i-1)s_i^2\right] - \sum_{i=1}^{L}(n_i-1)\ln s_i^2, \qquad (21.3.3.8) $$
where
$$ N = \sum_{i=1}^{L}(n_i-1), \qquad s_j^2 = \frac{1}{n_j-1}\sum_{i=1}^{n_j}(X_{ji}-m_j^*)^2, \qquad m_j^* = \frac{1}{n_j}\sum_{i=1}^{n_j}X_{ji}. $$
The statistic b permits reducing the problem of testing the hypothesis that the variances of normal samples are equal to each other to the problem of testing the hypothesis that the expectations of approximately normal samples are equal to each other. If the null hypothesis H0 is true and all n_i > 5, then the ratio
$$ B = b\left[1 + \frac{1}{3(L-1)}\left(\sum_{i=1}^{L}\frac{1}{n_i-1} - \frac{1}{N}\right)\right]^{-1} $$
is distributed approximately according to the chi-square law with L − 1 degrees of freedom. The null hypothesis H0 for a given size α is accepted if B < χ²_{1−α}(L − 1), where χ²_α is the α-quantile of the chi-square distribution with L − 1 degrees of freedom.
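A minimal sketch of Bartlett's statistic and its corrected ratio B, using NumPy with a SciPy chi-square quantile (the library choice and the simulated groups are illustrative assumptions); scipy.stats.bartlett implements the same corrected statistic and can serve as a cross-check.

```python
import numpy as np
from scipy import stats

def bartlett_ratio(samples):
    """Statistic b from (21.3.3.8) divided by the correction factor, i.e. the ratio B."""
    s2 = np.array([np.var(s, ddof=1) for s in samples])   # adjusted sample variances s_i^2
    df = np.array([len(s) - 1 for s in samples])          # n_i - 1
    N, L = df.sum(), len(samples)
    b = N * np.log((df * s2).sum() / N) - (df * np.log(s2)).sum()
    correction = 1 + (np.sum(1.0 / df) - 1.0 / N) / (3 * (L - 1))
    return b / correction

rng = np.random.default_rng(3)
groups = [rng.normal(0.0, 1.0, n) for n in (12, 15, 20)]
B = bartlett_ratio(groups)
crit = stats.chi2.ppf(0.95, df=len(groups) - 1)
print(B, crit)   # accept H0 (equal variances) at size 0.05 if B < crit
```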
2◦. Cochran's test. If all samples have the same size (n1 = ··· = nL = n), then the null hypothesis H0 is tested against the alternative hypothesis H1 using the Cochran statistic
$$ G = \frac{s_{\max}^2}{s_1^2 + \dots + s_L^2}, \qquad \text{where}\quad s_{\max}^2 = \max_{1\le i\le L}s_i^2. $$
Cochran's test is, in general, less powerful than Bartlett's test, but it is simpler. The null hypothesis H0 for a given size α is accepted if G < G_α, where G_α is the α-quantile of the statistic G. The values of G_α for various α, L, and ν = n − 1 can be found in statistical tables.
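The Cochran statistic itself is easy to compute; only the critical value G_α comes from tables. A brief NumPy sketch (the simulated groups are an illustrative assumption):

```python
import numpy as np

def cochran_statistic(samples):
    """G = s_max^2 / (s_1^2 + ... + s_L^2) for L samples of equal size n."""
    s2 = np.array([np.var(s, ddof=1) for s in samples])
    return s2.max() / s2.sum()

rng = np.random.default_rng(4)
groups = [rng.normal(0.0, 1.0, 10) for _ in range(5)]   # L = 5, n = 10
G = cochran_statistic(groups)
print(G)   # compare with the tabulated critical value for the chosen alpha, L, nu = n - 1
```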
3◦. Fisher's test. For L = 2, to test the null hypothesis H0 that the variances of two samples coincide, it is most expedient to use Fisher's test based on the statistic
$$ \Psi = \frac{s_2^2}{s_1^2}, $$
where s1² and s2² are the adjusted sample variances of the two samples. The statistic Ψ has the F-distribution (Fisher–Snedecor distribution) with n2 − 1 and n1 − 1 degrees of freedom.
The one-sided Fisher test verifies the null hypothesis H0: σ1² = σ2² against the alternative hypothesis H1: σ1² < σ2²; the critical region of the one-sided Fisher test for a given size α is determined by the inequality Ψ > Ψ_{1−α}(n2 − 1, n1 − 1).
The two-sided Fisher test verifies the null hypothesis H0: σ1² = σ2² against the alternative hypothesis H1: σ1² ≠ σ2²; for a given size α, the null hypothesis is accepted if Ψ_{α/2} < Ψ < Ψ_{1−α/2}, where Ψ_α is the α-quantile of the F-distribution with parameters n2 − 1 and n1 − 1.
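Since F-quantiles are readily available, this test is easy to carry out numerically. A minimal SciPy-based sketch (the simulated samples are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def fisher_variance_test(x1, x2, alpha=0.05):
    """Psi = s_2^2 / s_1^2 with its two-sided acceptance bounds."""
    n1, n2 = len(x1), len(x2)
    psi = np.var(x2, ddof=1) / np.var(x1, ddof=1)
    lo = stats.f.ppf(alpha / 2, n2 - 1, n1 - 1)
    hi = stats.f.ppf(1 - alpha / 2, n2 - 1, n1 - 1)
    return psi, lo, hi

rng = np.random.default_rng(5)
x1 = rng.normal(0.0, 1.0, 25)
x2 = rng.normal(0.0, 1.5, 30)
psi, lo, hi = fisher_variance_test(x1, x2)
print(psi, lo, hi)   # accept H0: sigma1^2 = sigma2^2 if lo < psi < hi
```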
21.3.3-5 Sample correlation
Suppose that a sample X1, ..., Xn is two-dimensional and its elements Xi = (X_{i1}, X_{i2}) are two-dimensional random variables with a joint normal distribution with means a1 and a2, variances σ1² and σ2², and correlation r. It is required to test the hypothesis that the components X^(1) and X^(2) of the vector X are independent, i.e., to test the hypothesis that the correlation is zero.
Estimation of the correlation r is based on the sample correlation
$$ r^* = \frac{\sum_{i=1}^{n}(X_{i1}-m_1^*)(X_{i2}-m_2^*)}{\sqrt{\sum_{i=1}^{n}(X_{i1}-m_1^*)^2\sum_{i=1}^{n}(X_{i2}-m_2^*)^2}}, \qquad (21.3.3.11) $$
where
$$ m_1^* = \frac{1}{n}\sum_{i=1}^{n}X_{i1}, \qquad m_2^* = \frac{1}{n}\sum_{i=1}^{n}X_{i2}. $$
Under the assumption that the null hypothesis H0 is true, the distribution of the statistic r* depends only on the sample size n. The statistic r* itself is a consistent and asymptotically efficient estimator of the correlation r.
The null hypothesis H0: r = 0 that X^(1) and X^(2) are independent (against the alternative hypothesis H1: r ≠ 0 that X^(1) and X^(2) are dependent) is accepted if the inequality r_{α/2} < r* < r_{1−α/2} is satisfied. Here r_α is the α-quantile of the sample correlation under the assumption that the null hypothesis H0 is true; the relation r_α = −r_{1−α} holds because of symmetry.
To construct the confidence intervals, one should use the Fisher transformation
$$ y = \operatorname{arctanh} r^* = \frac{1}{2}\ln\frac{1+r^*}{1-r^*}, \qquad (21.3.3.12) $$
which, for n > 10, is approximately normal with parameters
$$ \mathrm{E}\{y\} \approx \frac{1}{2}\ln\frac{1+r}{1-r} + \frac{r}{2(n-3)}, \qquad \operatorname{Var}\{y\} \approx \frac{1}{n-3}. $$
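A short sketch of the sample correlation (21.3.3.11) and an approximate confidence interval obtained from the transformation (21.3.3.12); the small bias term r/(2(n − 3)) is neglected here, and NumPy/SciPy and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def sample_correlation(x1, x2):
    """r* from (21.3.3.11)."""
    d1, d2 = x1 - x1.mean(), x2 - x2.mean()
    return np.sum(d1 * d2) / np.sqrt(np.sum(d1 ** 2) * np.sum(d2 ** 2))

def fisher_confidence_interval(r_star, n, gamma=0.95):
    """Approximate level-gamma interval for r via y = arctanh r* (21.3.3.12)."""
    y = np.arctanh(r_star)
    half = stats.norm.ppf((1 + gamma) / 2) / np.sqrt(n - 3)
    return np.tanh(y - half), np.tanh(y + half)   # back-transform the bounds

rng = np.random.default_rng(6)
x1 = rng.normal(size=40)
x2 = 0.5 * x1 + rng.normal(size=40)
r_star = sample_correlation(x1, x2)
print(r_star, fisher_confidence_interval(r_star, n=40))
```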
21.3.3-6 Regression analysis
Let X1, ..., Xn be the results of n independent observations
$$ X_i = \sum_{j=1}^{L}\theta_j f_j(t_i) + \varepsilon_i, $$
where f1(t), ..., fL(t) are known functions, θ1, ..., θL are unknown parameters, and ε1, ..., εn are random errors known to be independent and normally distributed with zero mean and with the same unknown variance σ².
The regression parameters are subject to the following constraints:
1. The number of observations n is greater than the number L of unknown parameters.
2. The vectors
$$ \mathbf{f}_i = (f_i(t_1),\dots,f_i(t_n)) \quad (i = 1, 2, \dots, L) \qquad (21.3.3.13) $$
must be linearly independent.
1◦. Estimation of unknown parameters θ1, ..., θL and construction of (one-dimensional) confidence intervals for them.
To solve this problem, we consider the sum of squares
$$ S^2 = S^2(\theta_1,\dots,\theta_L) = \sum_{i=1}^{n}\bigl[X_i - \theta_1 f_1(t_i) - \dots - \theta_L f_L(t_i)\bigr]^2. \qquad (21.3.3.14) $$
The estimators θ1*, ..., θL* form a solution of the system of equations
$$ \theta_1^*\sum_{j=1}^{n}f_1(t_j)f_i(t_j) + \dots + \theta_L^*\sum_{j=1}^{n}f_L(t_j)f_i(t_j) = \sum_{j=1}^{n}X_j f_i(t_j), \qquad i = 1,\dots,L. \qquad (21.3.3.15) $$
The estimators θ1*, ..., θL* are linear and efficient; in particular, they are unbiased and have the minimal variance among all unbiased estimators.
Remark. If we omit the requirement that the errors ε1, ..., εn are normally distributed and only assume that they are uncorrelated and have zero expectation and the same variance σ², then the estimators θ1*, ..., θL* are linear, unbiased, and have the minimal variance in the class of all linear estimators.
The confidence intervals for a given confidence level γ for the unknown parameters θ1, ..., θL have the form
$$ |\theta_i - \theta_i^*| < t_{(1+\gamma)/2}\sqrt{c_i^2 s_0^2}, \qquad (21.3.3.16) $$
where t_γ is the γ-quantile of the t-distribution with n − L degrees of freedom,
$$ s_0^2 = \frac{1}{n-L}\min_{\theta_1,\dots,\theta_L}S^2(\theta_1,\dots,\theta_L) = \frac{S^2(\theta_1^*,\dots,\theta_L^*)}{n-L}, \qquad c_i^2 = \sum_{j=1}^{n}c_{ij}^2, $$
and c_ij are the coefficients in the representation θ_i* = Σ_{j=1}^n c_ij X_j.
System (21.3.3.15) can be solved in the simplest way if the vectors (21.3.3.13) are orthogonal. In this case, system (21.3.3.15) splits into the separate equations
$$ \theta_i^*\sum_{j=1}^{n}f_i^2(t_j) = \sum_{j=1}^{n}X_j f_i(t_j). $$
Then the estimators θ1*, ..., θL* are independent, linear, and efficient.
Remark. If we omit the requirement that the errors ε1, ..., εn are normally distributed, then the estimators θ1*, ..., θL* are uncorrelated, linear, and unbiased and have the minimal variance in the class of all linear estimators.
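In matrix form, system (21.3.3.15) reads F^T F θ = F^T X, where F is the n × L matrix whose columns are the vectors (21.3.3.13). A minimal NumPy sketch under assumed basis functions f1(t) = 1, f2(t) = t, f3(t) = t² and simulated observations (all of these are illustrative assumptions):

```python
import numpy as np

def fit_regression(t, x, basis):
    """Solve the normal equations (21.3.3.15) and return theta* and s0^2."""
    F = np.column_stack([f(t) for f in basis])      # column i holds f_i(t_1), ..., f_i(t_n)
    theta = np.linalg.solve(F.T @ F, F.T @ x)       # least-squares estimators theta_i*
    resid = x - F @ theta
    s0_sq = resid @ resid / (len(x) - len(basis))   # residual variance s0^2
    return theta, s0_sq

# assumed basis: f1(t) = 1, f2(t) = t, f3(t) = t^2
basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t ** 2]
rng = np.random.default_rng(7)
t = np.linspace(0.0, 1.0, 30)
x = 1.0 + 2.0 * t - 0.5 * t ** 2 + rng.normal(0.0, 0.1, t.size)
theta, s0_sq = fit_regression(t, x, basis)
print(theta, s0_sq)
```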
2◦. Testing the hypothesis that some θ_i are zero. Suppose that it is required to test the null hypothesis H0: θ_{k+1} = ··· = θ_L = 0 (0 ≤ k < L). This problem can be solved using the statistic
$$ \Psi = \frac{s_1^2}{s_0^2}, $$
where
$$ s_0^2 = \frac{1}{n-L}\min_{\theta_1,\dots,\theta_L}S^2(\theta_1,\dots,\theta_L), \qquad s_1^2 = \frac{S_2^2 - (n-L)s_0^2}{L-k}, \qquad S_2^2 = \min_{\theta_1,\dots,\theta_k}S^2(\theta_1,\dots,\theta_k,0,\dots,0). $$
The hypothesis H0 for a given size γ is accepted if Ψ < Ψ_{1−γ}, where Ψ_γ is the γ-quantile of the F-distribution with parameters L − k and n − L.
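A minimal sketch of this nested-model F test: it fits the full and the restricted models by least squares and compares Ψ with the corresponding F-quantile (the libraries, basis functions, and simulated data are illustrative assumptions).

```python
import numpy as np
from scipy import stats

def test_trailing_thetas_zero(t, x, basis, k, alpha=0.05):
    """Psi = s1^2 / s0^2 for H0: theta_{k+1} = ... = theta_L = 0."""
    def rss(funcs):
        F = np.column_stack([f(t) for f in funcs])
        theta, *_ = np.linalg.lstsq(F, x, rcond=None)
        r = x - F @ theta
        return r @ r
    n, L = len(x), len(basis)
    s0_sq = rss(basis) / (n - L)                      # full model
    s2_sq = rss(basis[:k])                            # restricted model (last L - k thetas zero)
    s1_sq = (s2_sq - (n - L) * s0_sq) / (L - k)
    psi = s1_sq / s0_sq
    return psi, stats.f.ppf(1 - alpha, L - k, n - L)  # accept H0 if psi < this quantile

basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t ** 2]
rng = np.random.default_rng(8)
t = np.linspace(0.0, 1.0, 30)
x = 1.0 + 2.0 * t + rng.normal(0.0, 0.1, t.size)      # true theta_3 = 0
psi, crit = test_trailing_thetas_zero(t, x, basis, k=2)
print(psi, crit)
```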
3◦. Finding the estimator x*(t) of the regression x(t) = Σ_{i=1}^L θ_i f_i(t) at an arbitrary time and the construction of confidence intervals.
The estimator x*(t) of the regression x(t) is obtained if the θ_i in x(t) are replaced by their estimators:
$$ x^*(t) = \sum_{i=1}^{L}\theta_i^* f_i(t). $$
The estimator x*(t) is a linear, efficient, normally distributed, and unbiased estimator of the regression x(t).
The confidence interval of confidence level γ is given by the inequality
$$ |x(t) - x^*(t)| < t_{(1+\gamma)/2}\sqrt{c(t)\,s_0^2}, \qquad (21.3.3.17) $$
where
$$ c(t) = \sum_{j=1}^{n}\Bigl(\sum_{i=1}^{L}c_{ij}f_i(t)\Bigr)^{2} $$
and t_γ is the γ-quantile of the t-distribution with n − L degrees of freedom.
Example 1. Consider a linear regression x(t) = θ1 + θ2 t.
1◦. The estimators θ1* and θ2* of the unknown parameters θ1 and θ2 are given by the formulas
$$ \theta_1^* = \sum_{j=1}^{n}c_{1j}X_j, \qquad \theta_2^* = \sum_{j=1}^{n}c_{2j}X_j, $$
where
$$ c_{1j} = \frac{\sum_{k=1}^{n}t_k^2 - t_j\sum_{k=1}^{n}t_k}{n\sum_{k=1}^{n}t_k^2 - \bigl(\sum_{k=1}^{n}t_k\bigr)^2}, \qquad c_{2j} = \frac{n t_j - \sum_{k=1}^{n}t_k}{n\sum_{k=1}^{n}t_k^2 - \bigl(\sum_{k=1}^{n}t_k\bigr)^2}. $$
The statistic s_0*², up to the factor (n − 2)/σ², has the χ²-distribution with n − 2 degrees of freedom and is determined by the formula
$$ s_0^{*2} = \frac{1}{n-2}\sum_{j=1}^{n}(X_j - \theta_1^* - \theta_2^* t_j)^2. $$
The confidence intervals of confidence level γ for the parameters θ_i are given by the formula
$$ |\theta_i - \theta_i^*| < t_{(1+\gamma)/2}\sqrt{c_i^2 s_0^{*2}}, \qquad i = 1, 2, $$
where t_γ is the γ-quantile of the t-distribution with n − 2 degrees of freedom.
2◦. We test the null hypothesis H0: θ2 = 0, i.e., the hypothesis that x(t) is independent of time. The value of S_2² is given by the formula
$$ S_2^2 = \sum_{i=1}^{n}(X_i - m^*)^2, \qquad m^* = \frac{1}{n}\sum_{i=1}^{n}X_i, $$
and the value of s_1*² is given by the formula
$$ s_1^{*2} = S_2^2 - (n-2)s_0^{*2}. $$
Thus, the hypothesis H0 for a given confidence level γ is accepted if Ψ = s_1*²/s_0*² < Ψ_γ, where Ψ_γ is the γ-quantile of the F-distribution with parameters 1 and n − 2.
3◦. The estimator x*(t) of the regression x(t) has the form
$$ x^*(t) = \theta_1^* + \theta_2^* t. $$
The coefficient c(t) is determined by the formula
$$ c(t) = \sum_{j=1}^{n}(c_{1j} + c_{2j}t)^2 = \sum_{j=1}^{n}c_{1j}^2 + 2t\sum_{j=1}^{n}c_{1j}c_{2j} + t^2\sum_{j=1}^{n}c_{2j}^2 = b_0 + b_1 t + b_2 t^2. $$
Thus the boundaries of the confidence interval for a given confidence level γ are given by the formulas
$$ x_L^*(t) = \theta_1^* + \theta_2^* t - t_{(1+\gamma)/2}\sqrt{s_0^{*2}(b_0 + b_1 t + b_2 t^2)}, $$
$$ x_R^*(t) = \theta_1^* + \theta_2^* t + t_{(1+\gamma)/2}\sqrt{s_0^{*2}(b_0 + b_1 t + b_2 t^2)}, $$
where t_γ is the γ-quantile of the t-distribution with n − 2 degrees of freedom.
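A direct transcription of the formulas of Example 1 into NumPy/SciPy, with simulated observations used purely for illustration; it computes θ1*, θ2*, s_0*², the coefficients b_0, b_1, b_2, and the confidence band on a small grid of t values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 25
t = np.linspace(0.0, 4.0, n)
x = 1.5 + 0.8 * t + rng.normal(0.0, 0.3, n)        # observations X_j

# coefficients c_1j, c_2j of the estimators theta_1*, theta_2*
den = n * np.sum(t ** 2) - np.sum(t) ** 2
c1 = (np.sum(t ** 2) - t * np.sum(t)) / den
c2 = (n * t - np.sum(t)) / den
theta1, theta2 = c1 @ x, c2 @ x

s0_sq = np.sum((x - theta1 - theta2 * t) ** 2) / (n - 2)
b0, b1, b2 = np.sum(c1 ** 2), 2 * np.sum(c1 * c2), np.sum(c2 ** 2)

gamma = 0.95
tq = stats.t.ppf((1 + gamma) / 2, df=n - 2)
grid = np.linspace(0.0, 4.0, 5)
half = tq * np.sqrt(s0_sq * (b0 + b1 * grid + b2 * grid ** 2))
print("theta1*, theta2*:", theta1, theta2)
print("lower band:", theta1 + theta2 * grid - half)
print("upper band:", theta1 + theta2 * grid + half)
```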
21.3.3-7 Analysis of variance
Analysis of variance is a statistical method for clarifying the influence of several factors on experimental results and for planning subsequent experiments.
1◦. The simplest problem of analysis of variance. Suppose that there are L independent samples
$$ X_{11},\dots,X_{1n_1};\quad X_{21},\dots,X_{2n_2};\quad \dots;\quad X_{L1}, X_{L2},\dots,X_{Ln_L}, \qquad (21.3.3.18) $$
drawn from normal populations with unknown expectations a1, ..., aL and unknown but equal variances σ². It is necessary to test the null hypothesis H0: a1 = ··· = aL that all theoretical expectations a_i are the same against the alternative hypothesis H1 that some theoretical expectations are different.
The intragroup variances are determined by the formulas
$$ s_{(i)}^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}(X_{ij} - m_i^*)^2 \quad (i = 1, 2, \dots, L), \qquad (21.3.3.19) $$
where m_i* = (1/n_i) Σ_{j=1}^{n_i} X_ij is the sample mean of the corresponding sample. The random variable (n_i − 1)s_{(i)}²/σ² has the chi-square distribution with n_i − 1 degrees of freedom. An unbiased estimator of the unknown variance σ² is given by the statistic
$$ s_0^2 = \frac{\sum_{i=1}^{L}(n_i-1)s_{(i)}^2}{\sum_{i=1}^{L}(n_i-1)}, \qquad (21.3.3.20) $$
called the residual variance.
The intergroup sample variance is defined to be the statistic
$$ s_1^2 = \frac{1}{L-1}\sum_{i=1}^{L}n_i(m_i^* - m^*)^2, \qquad (21.3.3.21) $$
where m* is the common sample mean of the generalized sample. The statistic s_1² is independent of s_0²; if the null hypothesis H0 is true, then s_1² is an unbiased estimator of the unknown variance σ². The random variable s_0² Σ_{i=1}^{L}(n_i − 1)/σ² is distributed according to the chi-square law with Σ_{i=1}^{L}(n_i − 1) degrees of freedom, and under H0 the random variable s_1²(L − 1)/σ² is distributed according to the chi-square law with L − 1 degrees of freedom.
According to the one-sided Fisher test, the null hypothesis H0 must be accepted for a given confidence level γ if Ψ = s_1²/s_0² < Ψ_γ, where Ψ_γ is the γ-quantile of the F-distribution with parameters L − 1 and Σ_{i=1}^{L}(n_i − 1).
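A compact sketch of this one-way analysis of variance, computing s_0², s_1², and the Fisher ratio with the corresponding F-quantile (NumPy/SciPy and the simulated groups are assumptions made for illustration).

```python
import numpy as np
from scipy import stats

def one_way_anova(samples, gamma=0.95):
    """Residual variance s0^2, intergroup variance s1^2, and the Fisher test."""
    L = len(samples)
    n = np.array([len(s) for s in samples])
    means = np.array([np.mean(s) for s in samples])        # group means m_i*
    s2 = np.array([np.var(s, ddof=1) for s in samples])    # intragroup variances (21.3.3.19)
    grand = np.concatenate(samples).mean()                 # common sample mean m*
    s0_sq = np.sum((n - 1) * s2) / np.sum(n - 1)            # residual variance (21.3.3.20)
    s1_sq = np.sum(n * (means - grand) ** 2) / (L - 1)      # intergroup variance (21.3.3.21)
    psi = s1_sq / s0_sq
    crit = stats.f.ppf(gamma, L - 1, np.sum(n - 1))
    return psi, crit                                        # accept H0 if psi < crit

rng = np.random.default_rng(10)
groups = [rng.normal(0.0, 1.0, m) for m in (8, 10, 12)]
psi, crit = one_way_anova(groups)
print(psi, crit)
```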
2◦. Multifactor analysis of variance. We consider two-factor analysis of variance. Suppose that the first factor acts at L1 levels and the second factor acts at L2 levels (the two-factor (L1, L2)-level model of analysis of variance). Suppose that we have n_ij observations in which the first factor acted at the ith level and the second factor acted at the jth level. The observation results X_ijk are independent normally distributed random variables with the same (unknown) variance σ² and unknown expectations a_ij. It is required to test the null hypothesis H0 that the first and second factors do not affect the results of observations, i.e., that all a_ij are the same.
The action of two factors at levels L1 and L2 is identified with the action of a single factor at L1 L2 levels; then to test the hypothesis H0, it is expedient to use the one-factor L1 L2-level model. The statistics s0² and s1² are determined by the formulas
$$ s_0^2 = \frac{\sum_{i=1}^{L_1}\sum_{j=1}^{L_2}\sum_{k=1}^{n_{ij}}(X_{ijk} - m_{ij}^*)^2}{\sum_{i=1}^{L_1}\sum_{j=1}^{L_2}(n_{ij}-1)}, \qquad s_1^2 = \frac{1}{L_1 L_2 - 1}\sum_{i=1}^{L_1}\sum_{j=1}^{L_2}n_{ij}(m_{ij}^* - m^*)^2, \qquad (21.3.3.22) $$