As in the case of ratio estimation, here also one can employ the double sampling method to estimate the population total $Y$ whenever the population total $X$ of the auxiliary variable is not known. The difference estimator for the population total under double sampling is defined as
$$\hat{Y}_{DD} = \hat{Y} + \lambda(\hat{X}_d - \hat{X}) \qquad (7.6)$$
where $\hat{X}_d$ is an unbiased estimator of the population total $X$ based on the first phase sample. Evidently the difference estimator is unbiased for the population total in both the cases of double sampling.
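As a numerical sketch of (7.6), the following Python fragment draws two independent simple random samples without replacement from a hypothetical population (all population values and the predetermined $\lambda$ are invented for illustration) and forms the difference estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of N units: auxiliary variable x, study variable y
N = 1000
x = rng.gamma(4.0, 25.0, size=N)
y = 5.0 * x + rng.normal(0.0, 40.0, size=N)

n_prime, n, lam = 300, 60, 5.0  # first/second phase sizes, predetermined lambda

# First phase: SRSWOR of n' units, observing x only -> X_d = N * mean(x)
first = rng.choice(N, size=n_prime, replace=False)
X_d = N * x[first].mean()

# Second phase (drawn independently): SRSWOR of n units, observing x and y
second = rng.choice(N, size=n, replace=False)
Y_hat = N * y[second].mean()
X_hat = N * x[second].mean()

# Difference estimator (7.6)
Y_DD = Y_hat + lam * (X_d - X_hat)
print(Y_DD, y.sum())
```

Since $\hat{X}_d$ and $\hat{X}$ both estimate $X$ without bias, the correction term has expectation zero, which is why the estimator remains unbiased for any fixed $\lambda$.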
Note that
$$V(\hat{Y}_{DD}) = E[\hat{Y}_{DD} - Y]^2$$
$$= E[(\hat{Y}-Y) + \lambda(\hat{X}_d - \hat{X})]^2$$
$$= E\left[(\hat{Y}-Y) + \lambda\{(\hat{X}_d - X) - (\hat{X} - X)\}\right]^2$$
$$= V(\hat{Y}) + \lambda^2[V(\hat{X}) + V(\hat{X}_d) - 2\,\mathrm{cov}(\hat{X}, \hat{X}_d)] - 2\lambda[\mathrm{cov}(\hat{Y}, \hat{X}) - \mathrm{cov}(\hat{Y}, \hat{X}_d)] \qquad (7.7)$$
When the samples are drawn independently, the covariances involving $\hat{X}_d$ vanish and the above variance reduces to
$$V(\hat{Y}_{DD}) = V(\hat{Y}) + \lambda^2[V(\hat{X}) + V(\hat{X}_d)] - 2\lambda\,\mathrm{cov}(\hat{Y}, \hat{X})$$
The following theorem gives the variance of the difference estimator in double sampling when the samples are drawn independently in the two phases of sampling using simple random sampling.
Theorem 7.3 When the samples are drawn independently in the two phases of sampling using simple random sampling, the variance of the difference estimator is
$$V(\hat{Y}_{DD}) = N^2\left[f S_y^2 + \lambda^2(f + f')S_x^2 - 2\lambda f S_{xy}\right]$$
where $f = \frac{N-n}{Nn}$ and $f' = \frac{N-n'}{Nn'}$. Here $n'$ and $n$ are the sample sizes corresponding to the first and second phases of sampling. Further, the minimum variance of the difference estimator in this case is
$$N^2 f S_y^2\left[1 - \frac{f}{f+f'}\,\rho^2\right]$$
where $\rho$ is the correlation coefficient between $x$ and $y$.
Proof Using the results stated in Section 6.8 in the variance expression available in (7.7), we get
$$V(\hat{Y}_{DD}) = \frac{N^2(N-n)}{Nn}S_y^2 + \lambda^2\left[\frac{N^2(N-n')}{Nn'}S_x^2 + \frac{N^2(N-n)}{Nn}S_x^2\right] - 2\lambda\,\frac{N^2(N-n)}{Nn}S_{xy}$$
$$= N^2\left[f S_y^2 + \lambda^2(f + f')S_x^2 - 2\lambda f S_{xy}\right] \qquad (7.8)$$
Differentiating the above variance expression partially with respect to $\lambda$ and equating the derivative to zero, we get $\lambda = \frac{f}{f+f'}\,\frac{S_{xy}}{S_x^2}$. Substituting this value in (7.8) and simplifying the resulting expression, we get the minimum variance
$$N^2 f S_y^2\left[1 - \frac{f}{f+f'}\,\rho^2\right]$$
It is to be noted that the second order derivative is always positive. Hence the proof. ∎
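The optimality of this $\lambda$ can be checked numerically. The sketch below (population quantities $S_y^2$, $S_x^2$, $S_{xy}$ and the sample sizes are assumed values, for illustration only) evaluates the variance in (7.8) at the optimal $\lambda$ and confirms it agrees with the closed-form minimum:

```python
# Hypothetical population quantities (assumed values, for illustration only)
N, n_prime, n = 1000, 300, 60
S_y2, S_x2, S_xy = 64100.0, 2500.0, 12500.0

f = (N - n) / (N * n)
f_prime = (N - n_prime) / (N * n_prime)

def var_diff(lam):
    # Variance (7.8) of the difference estimator, independent phases
    return N**2 * (f * S_y2 + lam**2 * (f + f_prime) * S_x2
                   - 2 * lam * f * S_xy)

# Optimal lambda obtained by differentiating (7.8)
lam_opt = f * S_xy / ((f + f_prime) * S_x2)

# Closed-form minimum variance from Theorem 7.3
rho2 = S_xy**2 / (S_x2 * S_y2)
v_min_closed = N**2 * f * S_y2 * (1 - f / (f + f_prime) * rho2)

print(lam_opt, var_diff(lam_opt), v_min_closed)
```

Because $V(\lambda)$ is a quadratic in $\lambda$ with positive leading coefficient, any $\lambda$ away from `lam_opt` gives a strictly larger variance.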
Theorem 7.4 When the second phase sample is a subsample of the first phase sample and simple random sampling is used in both the phases of sampling, the variance of the difference estimator is
$$V(\hat{Y}_{DD}) = N^2\left[f S_y^2 + \lambda^2(f - f')S_x^2 + 2\lambda(f' - f)S_{xy}\right]$$
The minimum variance of the difference estimator in this case is
$$N^2 S_y^2\left[f'\rho^2 + f(1 - \rho^2)\right]$$
where $f$ and $f'$ are as defined in Theorem 7.3. Proof of this theorem is left as an exercise.
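Theorem 7.4 can be verified numerically in the same spirit. In the subsample case the quadratic in $\lambda$ is minimised at $\lambda = S_{xy}/S_x^2$, and the sketch below (same assumed population quantities as in the earlier illustration) confirms the stated minimum variance:

```python
# Same hypothetical population quantities as before (assumed, for illustration)
N, n_prime, n = 1000, 300, 60
S_y2, S_x2, S_xy = 64100.0, 2500.0, 12500.0

f = (N - n) / (N * n)
f_prime = (N - n_prime) / (N * n_prime)
rho2 = S_xy**2 / (S_x2 * S_y2)

def var_subsample(lam):
    # Variance from Theorem 7.4 (second phase is a subsample of the first)
    return N**2 * (f * S_y2 + lam**2 * (f - f_prime) * S_x2
                   + 2 * lam * (f_prime - f) * S_xy)

lam_opt = S_xy / S_x2                  # minimiser of the quadratic in lambda
v_min = N**2 * S_y2 * (f_prime * rho2 + f * (1 - rho2))

print(var_subsample(lam_opt), v_min)
```

Note that the minimum variance is smaller than $N^2 f S_y^2$, the variance without auxiliary information, whenever $\rho \neq 0$ and $n < n'$.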
7.4 Multivariate Difference Estimation
When information about more than one auxiliary variable is known, the difference estimator defined in Section 7.3 can be extended in a straightforward manner.
Let $\hat{Y}$, $\hat{X}_1$ and $\hat{X}_2$ be unbiased estimators for the population totals $Y$, $X_1$ and $X_2$ of the study variable $y$ and the auxiliary variables $x_1$ and $x_2$ respectively. The difference estimator of the population total $Y$ is defined as
$$\hat{Y}_{D2} = \hat{Y} + B_1(X_1 - \hat{X}_1) + B_2(X_2 - \hat{X}_2) \qquad (7.9)$$
where the constants $B_1$ and $B_2$ are predetermined.
The estimator $\hat{Y}_{D2}$ is unbiased for the population total and its variance is
$$V(\hat{Y}_{D2}) = E\left[(\hat{Y}-Y) + B_1(X_1 - \hat{X}_1) + B_2(X_2 - \hat{X}_2)\right]^2$$
$$= V(\hat{Y}) + B_1^2 V(\hat{X}_1) + B_2^2 V(\hat{X}_2) - 2B_1\,\mathrm{cov}(\hat{Y}, \hat{X}_1) - 2B_2\,\mathrm{cov}(\hat{Y}, \hat{X}_2) + 2B_1B_2\,\mathrm{cov}(\hat{X}_1, \hat{X}_2) \qquad (7.10)$$
Denote
$$V_0 = V(\hat{Y}),\quad V_1 = V(\hat{X}_1),\quad V_2 = V(\hat{X}_2)$$
$$C_{01} = \mathrm{cov}(\hat{Y}, \hat{X}_1),\quad C_{02} = \mathrm{cov}(\hat{Y}, \hat{X}_2),\quad C_{12} = \mathrm{cov}(\hat{X}_1, \hat{X}_2) \qquad (7.11)$$
Differentiating the variance expression partially with respect to $B_1$ and $B_2$ and equating the derivatives to zero, we get the following equations
$$V_1 B_1 + C_{12} B_2 = C_{01}$$
$$C_{12} B_1 + V_2 B_2 = C_{02}$$
Solving these two equations, we obtain
$$B_1 = \frac{C_{01}V_2 - C_{12}C_{02}}{V_1V_2 - C_{12}^2} \qquad (7.12)$$
$$B_2 = \frac{C_{02}V_1 - C_{12}C_{01}}{V_1V_2 - C_{12}^2} \qquad (7.13)$$
Substituting these values in the variance expression, we get after simplification the minimum variance
$$V(\hat{Y})\left[1 - R_{y\cdot x_1 x_2}^2\right] \qquad (7.14)$$
where $R_{y\cdot x_1 x_2}$ is the multiple correlation between $\hat{Y}$ and $\hat{X}_1, \hat{X}_2$. Since the multiple correlation between $\hat{Y}$ and $\hat{X}_1, \hat{X}_2$ is always at least as large as the correlation between $\hat{Y}$ and $\hat{X}_1$ and that between $\hat{Y}$ and $\hat{X}_2$, we infer that the use of additional auxiliary information will always increase the efficiency of the estimator. However, it should be noted that the values of $B_1$ and $B_2$ given in (7.12) and (7.13) depend on $C_{01}$ and $C_{02}$, which in general will not be known.
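The normal equations above are a 2×2 linear system, so the closed forms (7.12) and (7.13) can be cross-checked against a direct linear solve. In the sketch below all variances and covariances are invented illustrative values; the minimum variance uses the identity that, at the optimum, $V_{\min} = V_0 - B_1 C_{01} - B_2 C_{02}$, so that $R^2 = (B_1 C_{01} + B_2 C_{02})/V_0$:

```python
import numpy as np

# Assumed variances/covariances of Y_hat, X1_hat, X2_hat (illustrative values)
V0 = 10.0
V1, V2, C12 = 4.0, 9.0, 2.0
C01, C02 = 3.0, 5.0

# Closed forms (7.12) and (7.13)
det = V1 * V2 - C12**2
B1 = (C01 * V2 - C12 * C02) / det
B2 = (C02 * V1 - C12 * C01) / det

# The same coefficients from the normal equations
# V1*B1 + C12*B2 = C01 and C12*B1 + V2*B2 = C02
A = np.array([[V1, C12], [C12, V2]])
b = np.linalg.solve(A, np.array([C01, C02]))

# Minimum variance (7.14): V0 * (1 - R^2)
R2 = (B1 * C01 + B2 * C02) / V0
v_min = V0 * (1 - R2)
print(B1, B2, b, v_min)
```

The determinant $V_1V_2 - C_{12}^2$ is positive whenever the two auxiliary estimators are not perfectly correlated, so the system always has a unique solution in practice.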
The following theorem proves that whenever $b_1$ and $b_2$ are used in place of $B_1$ and $B_2$ given in (7.12) and (7.13), the resulting estimator will have mean square error that is approximately equal to the minimum variance given in (7.14), where
$$b_1 = \frac{c_{01}V_2 - C_{12}c_{02}}{V_1V_2 - C_{12}^2} \qquad (7.15)$$
$$b_2 = \frac{c_{02}V_1 - C_{12}c_{01}}{V_1V_2 - C_{12}^2} \qquad (7.16)$$
and $c_{01}$ and $c_{02}$ are unbiased estimators for $C_{01}$ and $C_{02}$ respectively.
Theorem 7.5 The approximate mean square error of the estimator
$$\hat{Y}_{D2}^{*} = \hat{Y} + b_1(X_1 - \hat{X}_1) + b_2(X_2 - \hat{X}_2)$$
is the same as that of the difference estimator defined in (7.9), where $b_1$ and $b_2$ are as defined in (7.15) and (7.16) respectively.
Proof Let
$$e_0 = \frac{\hat{Y}-Y}{Y},\quad e_1 = \frac{\hat{X}_1 - X_1}{X_1},\quad e_2 = \frac{\hat{X}_2 - X_2}{X_2}$$
$$e' = \frac{c_{01} - C_{01}}{C_{01}},\quad e'' = \frac{c_{02} - C_{02}}{C_{02}}$$
It can be seen that
$$b_1 = \frac{C_{01}(1+e')V_2 - C_{12}C_{02}(1+e'')}{V_1V_2 - C_{12}^2} = B_1 + \frac{C_{01}V_2 e' - C_{12}C_{02}e''}{V_1V_2 - C_{12}^2} \qquad (7.17)$$
Similarly it can be seen that
$$b_2 = B_2 + \frac{C_{02}V_1 e'' - C_{12}C_{01}e'}{V_1V_2 - C_{12}^2} \qquad (7.18)$$
Using (7.17) and (7.18), the estimator $\hat{Y}_{D2}^{*}$ can be written as
$$\hat{Y}_{D2}^{*} = \hat{Y} + \left[B_1 + \frac{C_{01}V_2 e' - C_{12}C_{02}e''}{V_1V_2 - C_{12}^2}\right](X_1 - \hat{X}_1) + \left[B_2 + \frac{C_{02}V_1 e'' - C_{12}C_{01}e'}{V_1V_2 - C_{12}^2}\right](X_2 - \hat{X}_2) \qquad (7.19)$$
Replacing $\hat{Y}$, $\hat{X}_1$ and $\hat{X}_2$ by $Y(1+e_0)$, $X_1(1+e_1)$ and $X_2(1+e_2)$ in (7.19) we obtain
$$\hat{Y}_{D2}^{*} - Y = Ye_0 + \left[B_1 + \frac{C_{01}V_2 e' - C_{12}C_{02}e''}{V_1V_2 - C_{12}^2}\right](-X_1e_1) + \left[B_2 + \frac{C_{02}V_1 e'' - C_{12}C_{01}e'}{V_1V_2 - C_{12}^2}\right](-X_2e_2)$$
Squaring both the sides and ignoring terms of degree greater than two, we obtain
$$E[\hat{Y}_{D2}^{*} - Y]^2 = Y^2E(e_0^2) + B_1^2X_1^2E(e_1^2) + B_2^2X_2^2E(e_2^2) - 2B_1YX_1E(e_0e_1) - 2B_2YX_2E(e_0e_2) + 2B_1B_2X_1X_2E(e_1e_2)$$
$$= V(\hat{Y}) + B_1^2V(\hat{X}_1) + B_2^2V(\hat{X}_2) - 2B_1\,\mathrm{cov}(\hat{Y}, \hat{X}_1) - 2B_2\,\mathrm{cov}(\hat{Y}, \hat{X}_2) + 2B_1B_2\,\mathrm{cov}(\hat{X}_1, \hat{X}_2)$$
Substituting the values given in (7.12) and (7.13) in the above expression, we get the approximate mean square error of $\hat{Y}_{D2}^{*}$ as $V(\hat{Y})\left[1 - R_{y\cdot x_1 x_2}^2\right]$. Hence the proof. ∎
Thus, in the last few sections of this chapter, we constructed the linear regression estimator for the population total by using the fact that the variables $x$ and $y$ are linearly related, and extended this to cover the case of more than one auxiliary variable. In the following section, the problem of identifying the optimal sampling-estimating strategy with the help of superpopulation models is considered.