Sampling and Sample Image

Part of the document Mathematical statistics for applied econometrics (pages 175-181)

Focusing on the sample versus population dichotomy for a moment, the sample image of X, denoted X*, and the empirical distribution function for f(X) can be depicted as a discrete distribution function that places probability 1/n on each of the n sample observations.

Consider an example from production economics. Suppose that we observe data on the level of production for a group of firms and their inputs (e.g., the capital (K), labor (L), energy (E), and material (M) data from Dale Jorgenson's KLEM dataset [22] for a group of industries $i = 1, \dots, N$). Next, assume that we are interested in measuring the inefficiency given an estimate of the efficient amount of production associated with each input ($\hat{y}_i(k_i, l_i, e_i, m_i)$).

$$\epsilon_i = y_i - \hat{y}_i(k_i, l_i, e_i, m_i). \tag{7.1}$$

For the moment assume that the efficient level of production is known without error. One possible assumption is that $-\epsilon_i \sim \Gamma(\alpha, \beta)$, so that all firms are at most efficient ($y_i - \hat{y}_i(k_i, l_i, e_i, m_i) \le 0$). An example of the gamma distribution is presented in Figure 7.1.

Given this specification, we could be interested in estimating the characteristics of the inefficiency for a firm in a specific industry, say the average technical inefficiency of firms in the Food and Fiber Sector. Table 7.1 presents one such sample for 50 firms in ascending order (i.e., this is not the order in which the sample was drawn).

FIGURE 7.1

Density Function for a Gamma Distribution. (Axes: $\epsilon$ versus $f(\epsilon \mid \alpha, \beta)$.)

TABLE 7.1

Small Sample of Gamma Random Variates

Obs.  ε_i     F̂(ε_i)  F(ε_i)    Obs.  ε_i     F̂(ε_i)  F(ε_i)
1     0.4704  0.02     0.0156    26    1.7103  0.52     0.4461
2     0.4717  0.04     0.0157    27    1.7424  0.54     0.4601
3     0.5493  0.06     0.0256    28    1.8291  0.56     0.4971
4     0.6324  0.08     0.0397    29    1.9420  0.58     0.5436
5     0.6978  0.10     0.0532    30    1.9559  0.60     0.5491
6     0.7579  0.12     0.0676    31    1.9640  0.62     0.5524
7     0.9646  0.14     0.1303    32    2.1041  0.64     0.6061
8     0.9849  0.16     0.1375    33    2.2862  0.66     0.6698
9     0.9998  0.18     0.1428    34    2.3390  0.68     0.6868
10    1.0667  0.20     0.1677    35    2.3564  0.70     0.6923
11    1.0927  0.22     0.1778    36    2.5629  0.72     0.7522
12    1.1193  0.24     0.1883    37    2.6581  0.74     0.7766
13    1.1895  0.26     0.2169    38    2.8669  0.76     0.8234
14    1.2258  0.28     0.2321    39    2.9415  0.78     0.8381
15    1.3933  0.30     0.3051    40    3.0448  0.80     0.8566
16    1.4133  0.32     0.3140    41    3.0500  0.82     0.8575
17    1.4354  0.34     0.3238    42    3.0869  0.84     0.8637
18    1.5034  0.36     0.3543    43    3.1295  0.86     0.8705
19    1.5074  0.38     0.3561    44    3.1841  0.88     0.8788
20    1.5074  0.40     0.3561    45    4.0159  0.90     0.9585
21    1.5459  0.42     0.3733    46    4.1773  0.92     0.9667
22    1.5639  0.44     0.3814    47    4.2499  0.94     0.9699
23    1.5823  0.46     0.3896    48    4.4428  0.96     0.9770
24    1.5827  0.48     0.3898    49    4.4562  0.98     0.9774
25    1.6533  0.50     0.4211    50    4.6468  1.00     0.9828

In this table we define the empirical cumulative distribution as

$$\hat{F}(\epsilon_i) = \frac{i}{N} \tag{7.2}$$

where N = 50 (the number of observations) and i is the rank of the observation in the ordered sample. The next column gives the theoretical cumulative distribution function, $F(\epsilon_i)$, obtained by integrating the gamma density function from 0 to $\epsilon_i$. The relationship between the empirical and theoretical cumulative distribution functions is presented in Figure 7.2. From this graphical depiction, we conclude that the sample image (i.e., the empirical cumulative distribution) approaches the theoretical distribution. Given the data presented in Table 7.1, the sample mean is 2.03 and the sample variance is 1.27.
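As a rough check on Table 7.1, the sketch below recomputes the empirical and theoretical cumulative distributions for a few tabulated observations. It assumes the Γ(α = 4, β = 2) parameters stated later in this section, with β treated as a rate parameter. The function name `gamma_cdf_integer_shape` and the choice of rows are illustrative and not part of the original text.

```python
import math

def gamma_cdf_integer_shape(x, shape, rate):
    """Gamma CDF for an integer shape parameter (Erlang closed form)."""
    z = rate * x
    partial_sum = sum(z**j / math.factorial(j) for j in range(shape))
    return 1.0 - math.exp(-z) * partial_sum

N = 50  # sample size of Table 7.1

# (rank i, observed value) pairs copied from Table 7.1
selected = [(1, 0.4704), (10, 1.0667), (25, 1.6533), (40, 3.0448), (50, 4.6468)]

rows = []
for i, eps in selected:
    empirical = i / N  # equation (7.2)
    # Assumption: shape = 4 and rate = 2, as stated later in the section
    theoretical = gamma_cdf_integer_shape(eps, shape=4, rate=2.0)
    rows.append((i, eps, empirical, theoretical))

for i, eps, empirical, theoretical in rows:
    print(f"{i:3d}  {eps:.4f}  {empirical:.2f}  {theoretical:.4f}")
```

The closed form is used only because the shape parameter is an integer here; a general gamma CDF would need a numerical routine for the incomplete gamma function.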

FIGURE 7.2

Empirical versus Theoretical Cumulative Distribution Functions: Small Sample. (Series: Empirical, Theoretical.)

FIGURE 7.3

Empirical versus Theoretical Cumulative Distribution Functions: Large Sample. (Series: Empirical, Theoretical.)

FIGURE 7.4

Probability and Cumulative Beta Distributions. (Series: Density, Cumulative; axes: Probability Density, Cumulative Density.)

Next, we extend the sample to N = 200 observations. The empirical and theoretical cumulative distribution functions for this sample are presented in Figure 7.3. Intuitively, the sample image for the larger sample is closer to the underlying distribution function than the smaller sample. Empirically, the mean of the larger sample is 2.03 and the variance is 1.02. Given that the true underlying distribution is a $\Gamma(\alpha = 4, \beta = 2)$, the theoretical mean is 2 and the variance is 1 (with $\beta$ acting as a rate parameter, the mean is $\alpha/\beta$ and the variance is $\alpha/\beta^2$). Hence, while there is little improvement in the estimate of the mean from the larger sample, the estimate of the variance for the larger sample is much closer to its true value.
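The comparison of sample and theoretical moments can be sketched with simulated draws. The example below is an illustration under stated assumptions, not the author's data: it treats β = 2 as a rate (so the scale passed to the generator is 1/β), and the seed and sample sizes are arbitrary. The simulated moments will therefore differ from the 2.03, 1.27, and 1.02 reported in the text.

```python
import random
import statistics

ALPHA = 4.0  # shape parameter
RATE = 2.0   # beta in the text, treated here as a rate (assumption)

def theoretical_moments(alpha, rate):
    """Mean and variance of a gamma distribution in shape/rate form."""
    return alpha / rate, alpha / rate**2

def sample_moments(n, rng):
    """Draw n gamma variates and return their sample mean and variance."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate
    draws = [rng.gammavariate(ALPHA, 1.0 / RATE) for _ in range(n)]
    return statistics.fmean(draws), statistics.variance(draws)

rng = random.Random(12345)  # arbitrary seed, for reproducibility only
true_mean, true_var = theoretical_moments(ALPHA, RATE)
print(f"theory: mean={true_mean:.2f} variance={true_var:.2f}")

for n in (50, 200, 20000):
    mean_n, var_n = sample_moments(n, rng)
    print(f"N={n:5d}: mean={mean_n:.3f} variance={var_n:.3f}")
```

Any single pair of small samples may or may not show the variance improving with N; the tendency is only reliable on average or for large samples.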

To develop the concept of sampling from a distribution, assume that we are interested in estimating the share of a household's income spent on housing. One possibility for this effort is the beta distribution, which is a two-parameter distribution for a continuous random variable with values between zero and one (depicted in Figure 7.4). Assume that our population is the set of 40 faculty of some academic department. Further assume that the true underlying beta distribution is the one tabulated in Table 7.2. Assume that it is too costly to sample all 40 faculty members for some reason and that we will only be able to collect a sample of 8 faculty (i.e., there are two days remaining in the spring semester, so the best we can hope for is to contact 8 faculty). The question is: how does our sample of eight faculty relate to the true beta distribution?

First, assume that we rank the faculty by the percent of their income spent on housing from the lowest to the highest. Next, assume that we draw a sample of eight faculty members from this ranked list at random.

$$s = \{34, 27, 19, 29, 33, 12, 23, 35\}. \tag{7.3}$$

Taking the first point, 34/40 is equivalent to a uniform outcome of 0.850. Graphically, we can map this draw from a uniform random outcome into a beta outcome, as depicted in Figure 7.5, yielding a value of the beta random variable of 0.6266.

TABLE 7.2

Density and Cumulative Density Functions for Beta Distribution

x      f(x|α, β)  F(x|α, β)    x      f(x|α, β)  F(x|α, β)
0.025  0.2852     0.0036       0.525  1.4214     0.7240
0.050  0.5415     0.0140       0.550  1.3365     0.7585
0.075  0.7701     0.0305       0.575  1.2463     0.7908
0.100  0.9720     0.0523       0.600  1.1520     0.8208
0.125  1.1484     0.0789       0.625  1.0547     0.8484
0.150  1.3005     0.1095       0.650  0.9555     0.8735
0.175  1.4293     0.1437       0.675  0.8556     0.8962
0.200  1.5360     0.1808       0.700  0.7560     0.9163
0.225  1.6217     0.2203       0.725  0.6579     0.9340
0.250  1.6875     0.2617       0.750  0.5625     0.9492
0.275  1.7346     0.3045       0.775  0.4708     0.9621
0.300  1.7640     0.3483       0.800  0.3840     0.9728
0.325  1.7769     0.3926       0.825  0.3032     0.9814
0.350  1.7745     0.4370       0.850  0.2295     0.9880
0.375  1.7578     0.4812       0.875  0.1641     0.9929
0.400  1.7280     0.5248       0.900  0.1080     0.9963
0.425  1.6862     0.5675       0.925  0.0624     0.9984
0.450  1.6335     0.6090       0.950  0.0285     0.9995
0.475  1.5711     0.6491       0.975  0.0073     0.9999
0.500  1.5000     0.6875       1.000  0.0000     1.0000

This value requires a linear interpolation. The uniform value (i.e., the value of the cumulative distribution for beta) lies between 0.8484 (x = 0.625) and 0.8735 (x = 0.650).

$$x = 0.625 + (0.8500 - 0.8484) \times \frac{0.650 - 0.625}{0.8735 - 0.8484} = 0.6266. \tag{7.4}$$

Thus, if our distribution is true ($B(\alpha = 3, \beta = 2)$), the 34th individual in the ranked list will spend 0.6266 of their income on housing. The housing shares for the sampled individuals are then

$$t = \{0.6266, 0.4919, 0.3715, 0.5257, 0.6038, 0.2724, 0.4295, 0.6516\}. \tag{7.5}$$

Table 7.3 presents a larger sample of random variables drawn according to the theoretical distribution. Figure 7.6 presents the sample and theoretical cumulative density functions for the data presented in Table 7.3.
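The mapping from the drawn ranks to housing shares can be sketched as a table lookup with linear interpolation, as in equation (7.4). One assumption needs flagging: the density values tabulated in Table 7.2 match 12x(1 − x)^2, whose cumulative distribution is 6x^2 − 8x^3 + 3x^4, and that closed form is inferred here from the table rather than taken from the text's parameter labels. The names `beta_cdf_from_table` and `inverse_cdf_interp` are hypothetical.

```python
from bisect import bisect_left

def beta_cdf_from_table(x):
    """CDF consistent with Table 7.2 (inferred: density 12 x (1 - x)^2)."""
    return 6 * x**2 - 8 * x**3 + 3 * x**4

# Grid of x values in steps of 0.025, as in Table 7.2 (plus x = 0)
GRID = [k / 40 for k in range(41)]
CDF = [beta_cdf_from_table(x) for x in GRID]

def inverse_cdf_interp(u):
    """Map a uniform outcome u to x by linear interpolation on the grid."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    j = bisect_left(CDF, u)  # first grid point with CDF >= u
    if j == 0:
        return GRID[0]
    x0, x1 = GRID[j - 1], GRID[j]
    f0, f1 = CDF[j - 1], CDF[j]
    return x0 + (u - f0) * (x1 - x0) / (f1 - f0)

population_size = 40
s = [34, 27, 19, 29, 33, 12, 23, 35]          # equation (7.3)
uniform_draws = [rank / population_size for rank in s]
t = [inverse_cdf_interp(u) for u in uniform_draws]

for rank, u, x in zip(s, uniform_draws, t):
    print(f"rank {rank:2d} -> uniform {u:.3f} -> share {x:.4f}")
```

Because the grid values here are not rounded to four decimals as in the printed table, the results can differ from equation (7.5) in the last digit.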

FIGURE 7.5

Inverse Beta Distribution. (Axes: Uniform versus Beta. Draws 34, 27, and 19 map from uniform values 0.850, 0.675, and 0.475 to beta values 0.6266, 0.4919, and 0.3715.)

TABLE 7.3

Random Sample of Betas

Obs.  U[0,1]  B(α, β)    Obs.  U[0,1]  B(α, β)
1     0.3900  0.3235     21    0.3944  0.3260
2     0.8403  0.6177     22    0.0503  0.0977
3     0.3312  0.2902     23    0.5190  0.3967
4     0.5652  0.4236     24    0.4487  0.3566
5     0.7302  0.5295     25    0.7912  0.5753
6     0.4944  0.3826     26    0.4874  0.3785
7     0.3041  0.2748     27    0.7320  0.5307
8     0.3884  0.3227     28    0.4588  0.3623
9     0.2189  0.2241     29    0.1510  0.1799
10    0.9842  0.8357     30    0.9094  0.6915
11    0.8840  0.6616     31    0.6834  0.4973
12    0.0244  0.0657     32    0.6400  0.4694
13    0.0354  0.0806     33    0.6833  0.4973
14    0.0381  0.0837     34    0.3476  0.2996
15    0.8324  0.6105     35    0.3600  0.3066
16    0.0853  0.1302     36    0.0993  0.1417
17    0.5128  0.3931     37    0.5149  0.3943
18    0.7460  0.5409     38    0.7397  0.5364
19    0.4754  0.3717     39    0.0593  0.1066
20    0.0630  0.1101     40    0.4849  0.3771

FIGURE 7.6

Sample and Theoretical Beta Distributions. (Series: Sample, Theoretical; axes: x versus F(x|α, β).)

The point of the discussion is that a sample drawn at random from a population that obeys any specific distribution function will, as the sample size grows, replicate that distribution function (the sample converges in probability to the theoretical distribution). The uniform distribution is simply the collection of all individuals in the population. We assume that each individual is equally likely to be drawn for the sample. We order the underlying uniform distribution in our discussion as a matter of convenience. However, given that we draw the sample randomly, no knowledge of the underlying ordering of the population is actually used.
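The convergence claim can be illustrated by simulation: draw uniform outcomes, map each through the inverse cumulative distribution, and measure the largest gap between the sample and theoretical cumulative distributions. This is a sketch under assumptions: the closed-form CDF 6x^2 − 8x^3 + 3x^4 is inferred from the tabulated values in Table 7.2, the inverse is found by bisection rather than table interpolation, and the seed, sample sizes, and function names are arbitrary.

```python
import random

def beta_cdf(x):
    """CDF consistent with Table 7.2 (inferred closed form)."""
    return 6 * x**2 - 8 * x**3 + 3 * x**4

def beta_inverse_cdf(u, tol=1e-10):
    """Invert the CDF on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def max_cdf_gap(sample):
    """Largest distance between the empirical and theoretical CDFs."""
    ordered = sorted(sample)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered, start=1):
        f = beta_cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x
        gap = max(gap, i / n - f, f - (i - 1) / n)
    return gap

def simulate_gap(n, rng):
    """Draw n uniforms, map them to beta outcomes, and return the gap."""
    sample = [beta_inverse_cdf(rng.random()) for _ in range(n)]
    return max_cdf_gap(sample)

rng = random.Random(7)  # arbitrary seed
for n in (8, 40, 400, 4000):
    print(f"n={n:4d}  max gap={simulate_gap(n, rng):.4f}")
```

The gap is random for any given sample, so it need not shrink at every step, but it tends toward zero as n grows.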
