Title: Normal distribution
Author: Nguyen Hoang Bao
Field: Econometrics
Type: Lecture notes
Year of publication: 2004

Applied Econometrics

Lecture 1: Normal Distribution

For many random variables, the probability distribution is a specific bell-shaped curve, called the normal curve or Gaussian curve. This is the most common and useful distribution in statistics.

1) Standard normal distribution

The standard normal distribution has the following probability density function:

$$P(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^{2}}$$

Features of the curve are:

1) As z moves away from zero in either direction, z² increases in the negative exponent. Therefore P(z) decreases, approaching 0 symmetrically in both tails.

2) The mean, which is zero (μ = 0), is the balancing point, or center of symmetry.

3) The standard deviation is one (σ = 1).
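
The features listed above can be checked numerically. The following is a minimal sketch, assuming the usual form of the standard normal density; the function name `standard_normal_pdf` is chosen here for illustration and does not come from the lecture notes.

```python
import math

def standard_normal_pdf(z: float) -> float:
    """Height P(z) of the standard normal curve at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# The height is largest at the mean (z = 0) and falls off symmetrically.
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"z = {z:+d}  P(z) = {standard_normal_pdf(z):.4f}")
```

The printed heights are equal at +z and -z and shrink toward 0 as z moves into either tail.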

Example 1.1: If z has a standard normal distribution, find P(-2 < z < 2).

Solution: P(-2 < z < 2) = 1 – P(z < -2) – P(z > 2) = 1 – 2(0.023) = 0.954
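
As a cross-check on the table lookup in Example 1.1, the probability can be computed from the error function in the Python standard library. This is a sketch, not part of the original notes; the helper name `phi` is an assumption made for illustration.

```python
import math

def phi(z: float) -> float:
    """Cumulative probability P(Z < z) for a standard normal variable Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

lower_tail = phi(-2.0)            # P(z < -2); equals P(z > 2) by symmetry
prob = 1.0 - 2.0 * lower_tail     # same decomposition as in the solution
print(f"P(z < -2)     = {lower_tail:.3f}")
print(f"P(-2 < z < 2) = {prob:.4f}")
```

The unrounded value is about 0.9545, which agrees with the 0.954 obtained from the rounded table entry 0.023.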

2) General normal distribution

The general normal distribution has the following probability density function:

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$$

The quantity Y, which is the height of the curve at any point along the scale of X, is known as the probability density at that particular value of the variable X.

Example 2.1: The local authorities in a certain city install 2,000 electric lamps in the streets of the city. If these lamps have an average life of 1,000 burning hours, with a standard deviation of 200 hours, what number of the lamps might be expected to fail in the first 700 burning hours?


Solution: In this case, we want the probability corresponding to the area under the normal curve below the standardized value t = (700 – 1,000)/200 = -1.5. By symmetry, we ignore the sign and enter our table at 1.5 to find that the probability of a life shorter than 700 hours is P = 0.067. Hence the expected number of failures is 2,000 x 0.067 = 134.
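
The same calculation can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are illustrative assumptions; only the figures 2,000, 1,000, 200 and 700 come from the example.

```python
from statistics import NormalDist

lamps = 2000
life = NormalDist(mu=1000, sigma=200)   # lamp life in burning hours

t = (700 - life.mean) / life.stdev      # standardized value
p_fail = life.cdf(700)                  # P(life < 700 hours)
expected_failures = lamps * p_fail

print(f"t = {t:.1f}")
print(f"P(life < 700) = {p_fail:.4f}")
print(f"expected failures = {expected_failures:.0f}")
```

The unrounded probability is about 0.0668, so the expected count rounds to the same 134 lamps found from the table.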

Example 2.2: What number of lamps may be expected to fail between 900 and 1,300 burning hours?

Solution:

• Lamps failing before 900 hours: the corresponding value is t = (900 – 1,000)/200 = -0.5. Entering the table with this value of t, we find the probability of failure below 900 hours: P = 0.309.

• Lamps failing after 1,300 hours: the corresponding value is t = (1,300 – 1,000)/200 = 1.5. Entering the table with this value of t, we find the probability of failure over 1,300 hours: P = 0.067.

• Hence the probability of failure outside the limits 900 to 1,300 hours is 0.309 + 0.067 = 0.376. It follows that the number of lamps we may expect to fail outside these limits is 2,000 x 0.376 = 752. But we were asked for the number likely to fail inside the stated limits. This is 2,000 – 752 = 1,248.
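
Example 2.2 can also be computed directly as a difference of two cumulative probabilities. This is a sketch using `statistics.NormalDist`; the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

lamps = 2000
life = NormalDist(mu=1000, sigma=200)   # lamp life in burning hours

p_below_900 = life.cdf(900)             # lower tail
p_above_1300 = 1.0 - life.cdf(1300)     # upper tail
p_inside = 1.0 - p_below_900 - p_above_1300

print(f"P(life < 900)        = {p_below_900:.4f}")
print(f"P(life > 1300)       = {p_above_1300:.4f}")
print(f"P(900 < life < 1300) = {p_inside:.4f}")
print(f"expected failures inside the limits = {lamps * p_inside:.0f}")
```

With unrounded probabilities the inside probability is about 0.625, so the expected count comes out one lamp higher than the 1,248 obtained from the three-decimal table values. The difference is rounding only.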

Example 2.3: After what period of burning hours would you expect that 10% of the lamps would have failed?

Solution: What we want here is the value of t corresponding to a probability P = 0.1. Looking along our table, we find that when t = 1.25 the probability is P = 0.106. This is near enough for our purpose of prediction. Hence we may take it that 10% of the lamps will have failed at 1.25 standard deviations below the mean. Since one standard deviation is equal to 200 hours, it follows that 10% of the lamps will fail before 1,000 – 1.25 x 200 = 1,000 – 250 = 750 hours.
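
Example 2.3 runs the table lookup in reverse, from a probability to a value of t. A sketch with `NormalDist.inv_cdf`, which returns the exact quantile rather than the nearest table entry; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

life = NormalDist(mu=1000, sigma=200)   # lamp life in burning hours

t_10 = NormalDist().inv_cdf(0.10)       # standardized value with 10% below it
hours_10 = life.inv_cdf(0.10)           # same quantile on the hours scale

print(f"t for P = 0.10: {t_10:.3f}")
print(f"10% of lamps fail before about {hours_10:.0f} hours")
```

The exact quantile is roughly t = -1.28, giving about 744 hours. The notes use the nearest table entry, t = 1.25, which gives the slightly later figure of 750 hours.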

3) Moment-based characteristics of a distribution

First moment

Mean > Median: the distribution is skewed to the right

Mean ≅ Median ≅ Mode: the distribution is symmetric

Mean < Median: the distribution is skewed to the left


Second moment

The spread of a distribution is measured by its standard deviation:

$$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}}$$

Third moment

Coefficient of skewness: $a_3 = \frac{1}{nS^{3}}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{3}$, where $S$ is the standard deviation

• The cubic power preserves the sign of a deviation but inflates larger deviations proportionally much more than smaller ones. If the distribution is symmetric, the negative and positive cubed deviations cancel each other out.

• The cubic power of the standard deviation in the denominator standardizes the measure and so removes the dimension (i.e., the measure does not depend on the units in which the variable is measured).

• If a3 > 0, the distribution is skewed to the right (its long tail is to the right) and the mean is greater than the median.

If a3 ≅ 0, the distribution is approximately symmetric, as a normal distribution is, and the mean is approximately equal to the median.

If a3 < 0, the distribution is skewed to the left (its long tail is to the left) and the mean is smaller than the median.
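
The sign rules for a3 can be illustrated with a short sketch. It assumes the sample definitions of Table 3.1, with the standard deviation computed using the n – 1 divisor; the function name and the two small data sets are made up for illustration.

```python
import math
from statistics import median

def sample_skewness(xs):
    """a3 = (1 / (n * S^3)) * sum((Xi - mean)^3), with S using the n - 1 divisor."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return sum((x - mean) ** 3 for x in xs) / (n * s ** 3)

symmetric = [1, 2, 3, 4, 5]          # illustrative data, symmetric about 3
right_skewed = [1, 1, 2, 2, 3, 10]   # illustrative data with a long right tail

for name, xs in (("symmetric", symmetric), ("right_skewed", right_skewed)):
    mean = sum(xs) / len(xs)
    print(f"{name}: a3 = {sample_skewness(xs):.3f}, mean = {mean:.2f}, median = {median(xs)}")
```

For the symmetric sample the cubed deviations cancel and a3 is zero. For the right-skewed sample a3 is positive and the mean exceeds the median.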

Fourth moment

Coefficient of kurtosis: $a_4 = \frac{1}{nS^{4}}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{4}$, where $S$ is the standard deviation

• Fourth powers make every deviation positive and inflate larger deviations even more than cubes or squares would.

• The presence of heavy tails therefore tends to inflate the numerator proportionally more than the denominator. The fatter the tails, the higher the kurtosis.

• The fourth power of the standard deviation in the denominator standardizes the measure.

• If a4 > 3, the distribution has heavier tails than a normal distribution.

If a4 < 3, the distribution has thinner tails than a normal distribution; an example is a rectangular distribution, which has a body but no tails.
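
A sketch of the kurtosis coefficient on three illustrative samples follows. It assumes the sample definition of Table 3.1 (standard deviation with the n – 1 divisor). The data are constructed for illustration only: an evenly spaced (rectangular) sample, evenly spaced normal quantiles standing in for a normal sample, and a small sample with two extreme values.

```python
from statistics import NormalDist

def sample_kurtosis(xs):
    """a4 = (1 / (n * S^4)) * sum((Xi - mean)^4), with S^2 using the n - 1 divisor."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return sum((x - mean) ** 4 for x in xs) / (n * s2 ** 2)

rectangular = list(range(1, 101))                      # body but no tails
normal_like = [NormalDist().inv_cdf((i + 0.5) / 2000)  # evenly spaced normal quantiles
               for i in range(2000)]
heavy_tailed = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]     # two extreme values

for name, xs in (("rectangular", rectangular),
                 ("normal_like", normal_like),
                 ("heavy_tailed", heavy_tailed)):
    print(f"{name}: a4 = {sample_kurtosis(xs):.2f}")
```

The rectangular sample gives a value well below 3, the normal-like sample a value close to 3, and the sample with extreme values a value above 3.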

Table 3.1: Moment-based characteristics of a distribution

| Moment | Measure | Population | Sample | X ∼ N(0,1) |
|---|---|---|---|---|
| Second moment | Spread | E(X − μ)² = σ² | S² = [1/(n − 1)] ∑(Xi − X̄)² | 1 |
| Third moment | Skewness | (1/σ³) E(X − μ)³ | a3 = [1/(nS³)] ∑(Xi − X̄)³ | 0 |
| Fourth moment | Kurtosis | (1/σ⁴) E(X − μ)⁴ | a4 = [1/(nS⁴)] ∑(Xi − X̄)⁴ | 3 |

4) The skewness – kurtosis (Jarque – Bera) test for normality

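The null hypothesis of normality, H0, is as follows:

H0: α3 = 0 and α4 = 3

against

H1: α3 ≠ 0 or α4 ≠ 3 or both

The relevant test statistic is BJ, which under H0 follows (asymptotically) a chi-square distribution with two degrees of freedom:

$$BJ = \frac{n}{6}\,a_3^{2} + \frac{n}{24}\,(a_4 - 3)^{2}$$

If BJ > 5.99, the 5% critical value of the chi-square distribution with two degrees of freedom, the hypothesis of normality is formally rejected.

If BJ ≤ 5.99, we cannot reject the hypothesis of normality; this does not prove that the distribution is normal.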
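
The test can be sketched in a few lines. The sketch assumes the sample coefficients a3 and a4 as defined in Table 3.1 and uses the fact that the upper-tail probability of a chi-square distribution with two degrees of freedom is exp(-x/2). The function name and both data sets are illustrative assumptions: evenly spaced normal quantiles and evenly spaced exponential quantiles, not real data.

```python
import math
from statistics import NormalDist

def jarque_bera(xs):
    """Return (BJ, p-value) using a3 and a4 with S based on the n - 1 divisor."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    a3 = sum((x - mean) ** 3 for x in xs) / (n * s ** 3)
    a4 = sum((x - mean) ** 4 for x in xs) / (n * s ** 4)
    bj = n / 6 * a3 ** 2 + n / 24 * (a4 - 3) ** 2
    p_value = math.exp(-bj / 2)   # chi-square(2) upper-tail probability
    return bj, p_value

n = 200
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
right_skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]  # exponential quantiles

for name, xs in (("normal_like", normal_like), ("right_skewed", right_skewed)):
    bj, p = jarque_bera(xs)
    verdict = "reject normality" if bj > 5.99 else "cannot reject normality"
    print(f"{name}: BJ = {bj:.2f}, p = {p:.4f} -> {verdict}")
```

The normal-like sample stays below the critical value 5.99, while the strongly right-skewed sample exceeds it by a wide margin.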

5) Transformations towards normality

If the data are unimodal but skewed, a data transformation is called for to correct for the skewness in the data. To do this we rely on the ladder of power transformations, which enables us to correct for differences in the direction of skewness (positive or negative) and its strength. Often, but not always, a transformation renders the transformed data symmetric and, hopefully, also more normal in shape. If so, the classical model of inference about the population mean using the sample mean as estimator can again be used. Table 5.1 illustrates the hierarchy of these power transformations and their impact on the skewness in the data.


Table 5.1: Ladder of Power to Reduce Skewness

| Power | Transformation | Effect |
|---|---|---|
| 3 | X³ | Reduces extreme negative skewness |
| 2 | X² | Reduces negative skewness |
| 1 | X | Leaves data unchanged |
| 0 | ln X | Reduces positive skewness |
| -1 | 1/X | Reduces extreme positive skewness |

The power used in a transformation need not be an integer; it can be a fraction as well. The choice of an appropriate transformation often involves a trade-off between one which is ideal for the purposes of data analysis and one which performs reasonably well on this count but also has the advantage that it lends itself to a more straightforward interpretation (in substantive terms) of the results.
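
The effect of moving down the ladder can be sketched on a positively skewed sample. The data here are an assumption made for illustration: exponentials of evenly spaced normal quantiles, i.e. an idealized lognormal sample. The helper names are also illustrative. Power 0 is treated as the logarithm, as in Table 5.1, and a fractional power (0.5, the square root) is included.

```python
import math
from statistics import NormalDist

def sample_skewness(xs):
    """a3 with the standard deviation based on the n - 1 divisor."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return sum((x - mean) ** 3 for x in xs) / (n * s ** 3)

def ladder_transform(xs, power):
    """Apply X**power, with power 0 meaning ln X (positive data only)."""
    if power == 0:
        return [math.log(x) for x in xs]
    return [x ** power for x in xs]

n = 200
data = [math.exp(NormalDist().inv_cdf((i + 0.5) / n)) for i in range(n)]

for power in (1, 0.5, 0):
    a3 = sample_skewness(ladder_transform(data, power))
    print(f"power {power}: a3 = {a3:.3f}")
```

For this sample the skewness falls as the power moves from 1 to 0.5 to 0, and the logarithm removes it entirely because the data were built to be lognormal. Real data will rarely behave this cleanly.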



Workshop 1: Normal Distribution

1) Phil and Kim Bell do not know whether to buy a house now or wait a year, in which case a price increase may put a house beyond their reach. Their best guess is that, if they wait a year, the price increase will be approximately normally distributed, with a mean of 8% and, reflecting the uncertainty of the market, a standard deviation of 10%.

1.1) If the price increase exceeds 25%, they feel they will be unable to afford a house. What is the chance of this?

1.2) On the other hand, if the price drops, they will have won their gamble handsomely. What is the chance of this?

2) Using the data file SOCECON (world socioeconomic data for 1990) on the diskette, make histograms and compute the mean, median, and mode of each of the following variables:

GNP (gross national product) per capita

HDI (human development index)

FERT (fertility rate)

LEXPM and LEXPF (male and female life expectancy)

POPGRWTH (population growth rate)

In each case, discuss the different averages in the light of the shape of the empirical distribution. Would you say that any of the distributions is reasonably symmetrical and bell-shaped?

3) Collect the macroeconomic indicators Y (GDP), I (investment), C (consumption), X (exports) and M (imports) at constant prices from the World Development Indicators 2003 for 200 countries, then:

3.1) Make histograms and compute the mean, median, and mode of each of the above variables

3.2) Calculate the coefficients of skewness and kurtosis

3.3) Use the Jarque – Bera test to test each variable for normality


4) Collect data on life expectancy (LE) and GDP per capita (Y) for 200 countries (WDI 2003), then:

4.1) Plot the histogram (frequency graph) for each of your two samples (life expectancy and income per capita)

4.2) Calculate the mean, mode, and median for each of your two samples

4.3) Calculate the skewness and kurtosis for each of your two samples

4.4) Use the Jarque – Bera test for normality for each of your two samples

4.5) In each case find the most appropriate transformation so that the data are approximately normal

4.6) Calculate the regression coefficients from regressing LE on Y using the following functional forms

LE = a0 + a1Y

ln(LE) = b0 + b1Y

LE = c0 + c1lnY

ln(LE) = d0 + d1lnY

and compare their coefficients of determination

4.7) Which of the models you have estimated best fits the data? Discuss your results
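
The four functional forms in 4.6 can each be fitted as a simple regression after transforming one or both variables. The sketch below shows the mechanics only. The six (Y, LE) pairs are hypothetical placeholder values invented for illustration, not WDI data, and the helper `ols` is an assumption of this sketch rather than anything prescribed by the workshop.

```python
import math

def ols(x, y):
    """Return (intercept, slope, R^2) for the simple regression y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Hypothetical placeholder values, NOT real WDI data.
Y = [300, 800, 2000, 5000, 12000, 30000]   # GDP per capita
LE = [48, 56, 63, 69, 74, 78]              # life expectancy in years

ln_Y = [math.log(v) for v in Y]
ln_LE = [math.log(v) for v in LE]

models = {
    "LE = a0 + a1 Y": ols(Y, LE),
    "ln(LE) = b0 + b1 Y": ols(Y, ln_LE),
    "LE = c0 + c1 lnY": ols(ln_Y, LE),
    "ln(LE) = d0 + d1 lnY": ols(ln_Y, ln_LE),
}
for name, (intercept, slope, r2) in models.items():
    print(f"{name}: intercept = {intercept:.4f}, slope = {slope:.6f}, R^2 = {r2:.3f}")
```

With real data, replace the two placeholder lists with the collected series; the rest of the sketch is unchanged.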

Posted: 27/01/2014, 11:20
