Estimating Single-Independent-Variable Models with OLS


The purpose of regression analysis is to take a purely theoretical equation like:

Yi = β0 + β1Xi + εi    (2.1)


and use a set of data to create an estimated equation like:

Ŷi = β̂0 + β̂1Xi    (2.2)

where each “hat” indicates a sample estimate of the true population value.

(In the case of Y, the “true population value” is E[Y|X].) The purpose of the estimation technique is to obtain numerical values for the coefficients of an otherwise completely theoretical regression equation.

The most widely used method of obtaining these estimates is Ordinary Least Squares (OLS), which has become so standard that its estimates are presented as a point of reference even when results from other estimation techniques are used. Ordinary Least Squares (OLS) is a regression estimation technique that calculates the β̂s so as to minimize the sum of the squared residuals, thus:1

OLS minimizes Σᵢ₌₁ᴺ ei²    (i = 1, 2, …, N)    (2.3)

Since these residuals (eis) are the differences between the actual Ys and the estimated Ys produced by the regression (the Ŷis in Equation 2.2), Equation 2.3 is equivalent to saying that OLS minimizes Σ(Yi − Ŷi)².
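To make Equation 2.3 concrete, here is a minimal Python sketch of the quantity OLS minimizes; the data and the candidate coefficients below are made up for illustration and do not come from the text:

```python
# Hypothetical data, for illustration only.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.1, 3.9, 6.2, 8.1]

def sum_squared_residuals(b0, b1, X, Y):
    """Sum of squared residuals ei = Yi - (b0 + b1*Xi) for candidate estimates."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))

# OLS chooses the (b0, b1) pair that makes this sum as small as possible.
print(sum_squared_residuals(0.0, 2.0, X, Y))  # SSR for one arbitrary candidate pair
```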

Why Use Ordinary Least Squares?

Although OLS is the most-used regression estimation technique, it’s not the only one. Indeed, econometricians have developed what seem like zillions of different estimation techniques, a number of which we’ll discuss later in this text.

There are at least three important reasons for using OLS to estimate regression models:

1. OLS is relatively easy to use.

2. The goal of minimizing Σei² is quite appropriate from a theoretical point of view.

3. OLS estimates have a number of useful characteristics.

1. The summation symbol, Σ, indicates that all terms to its right should be added (or summed) over the range of the i values attached to the bottom and top of the symbol. In Equation 2.3, for example, this would mean adding up ei² for all integer values between 1 and N:

Σᵢ₌₁ᴺ ei² = e1² + e2² + … + eN²

Often the Σ notation is simply written as Σᵢ, and it is assumed that the summation is over all observations from i = 1 to i = N. Sometimes, the i is omitted entirely and the same assumption is made implicitly.


The first reason for using OLS is that it’s the simplest of all econometric estimation techniques. Most other techniques involve complicated nonlinear formulas or iterative procedures, many of which are extensions of OLS itself. In contrast, OLS estimates are simple enough that, if you had to, you could calculate them without using a computer or a calculator (for a single-independent-variable model). Indeed, in the “dark ages” before computers and calculators, econometricians calculated OLS estimates by hand!

The second reason for using OLS is that minimizing the summed, squared residuals is a reasonable goal for an estimation technique. To see this, recall that the residual measures how close the estimated regression equation comes to the actual observed data:

ei = Yi − Ŷi    (i = 1, 2, …, N)    (1.15)

Since it’s reasonable to want our estimated regression equation to be as close as possible to the observed data, you might think that you’d want to minimize these residuals. The main problem with simply totaling the residuals is that ei can be negative as well as positive. Thus, negative and positive residuals might cancel each other out, allowing a wildly inaccurate equation to have a very low Σei. For example, if Y = 100,000 for two consecutive observations and if your equation predicts 1.1 million and −900,000, respectively, your residuals will be +1 million and −1 million, which add up to zero!

We could get around this problem by minimizing the sum of the absolute values of the residuals, but absolute values are difficult to work with mathematically. Luckily, minimizing the summed squared residuals does the job. Squared functions pose no unusual mathematical difficulties in terms of manipulations, and the technique avoids canceling positive and negative residuals because squared terms are always positive.
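The two-observation example above is easy to verify. A quick Python check, using only the numbers from the text:

```python
# The wildly inaccurate equation from the example above.
actual = [100_000, 100_000]
predicted = [1_100_000, -900_000]

residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]

print(sum(residuals))                  # 0: positive and negative residuals cancel
print(sum(abs(e) for e in residuals))  # 2,000,000: absolute values don't cancel
print(sum(e**2 for e in residuals))    # 2,000,000,000,000: squares don't cancel
```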

The final reason for using OLS is that its estimates have at least two useful properties:2

1. The sum of the residuals is exactly zero.

2. OLS can be shown to be the “best” estimator possible under a set of specific assumptions. We’ll define “best” in Chapter 4.

An estimator is a mathematical technique that is applied to a sample of data to produce a real-world numerical estimate of the true population regression coefficient (or other parameters). Thus, OLS is an estimator, and a β̂ produced by OLS is an estimate.

2. These properties, and indeed all the properties of OLS that we discuss in this book, are true as long as a constant term is included in the regression equation. For more on this, see Section 7.1.


How Does OLS Work?

How would OLS estimate a single-independent-variable regression model like Equation 2.1?

Yi = β0 + β1Xi + εi    (2.1)

OLS selects those estimates of β0 and β1 that minimize the squared residuals, summed over all the sample data points.

For an equation with just one independent variable, these coefficients are:3

β̂1 = [Σᵢ₌₁ᴺ (Xi − X̄)(Yi − Ȳ)] / [Σᵢ₌₁ᴺ (Xi − X̄)²]    (2.4)

and, given this estimate of β1,

β̂0 = Ȳ − β̂1X̄    (2.5)

where X̄ = the mean of X, or ΣXi/N, and Ȳ = the mean of Y, or ΣYi/N.

Note that for each different data set, we’ll get different estimates of β1 and β0, depending on the sample.
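Equations 2.4 and 2.5 translate almost line for line into code. Here is a minimal Python sketch (the function name and the sample data are ours, not from the text):

```python
def ols_simple(X, Y):
    """Estimate beta1-hat (Equation 2.4) and beta0-hat (Equation 2.5)."""
    n = len(X)
    x_bar = sum(X) / n   # X-bar, the mean of X
    y_bar = sum(Y) / n   # Y-bar, the mean of Y
    # Equation 2.4: sum of cross-deviations over sum of squared X-deviations.
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
          / sum((x - x_bar) ** 2 for x in X))
    # Equation 2.5: the fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical sample; a different sample would yield different estimates.
b0_hat, b1_hat = ols_simple([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8])
print(b0_hat, b1_hat)
```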

3. Since Σᵢ₌₁ᴺ ei² = Σᵢ₌₁ᴺ (Yi − Ŷi)² and Ŷi = β̂0 + β̂1Xi, OLS actually minimizes

Σᵢ ei² = Σᵢ (Yi − β̂0 − β̂1Xi)²

by choosing the β̂s that do so. For those with a moderate grasp of calculus and algebra, the derivation of these equations is informative.


An Illustration of OLS Estimation

The equations for calculating regression coefficients might seem a little forbidding, but it’s not hard to apply them yourself to data sets that have only a few observations and independent variables. Although you’ll usually want to use regression software packages to do your estimation, you’ll understand OLS better if you work through the following illustration.

To keep things simple, let’s attempt to estimate the regression coefficients of the height and weight data given in Section 1.4. For your convenience in following this illustration, the original data are reproduced in Table 2.1. As was noted previously, the formulas for OLS estimation for a regression equation with one independent variable are Equations 2.4 and 2.5:

β̂1 = [Σᵢ₌₁ᴺ (Xi − X̄)(Yi − Ȳ)] / [Σᵢ₌₁ᴺ (Xi − X̄)²]    (2.4)

β̂0 = Ȳ − β̂1X̄    (2.5)

If we undertake the calculations outlined in Table 2.1 and substitute them into Equations 2.4 and 2.5, we obtain these values:

β̂1 = 590.20/92.50 = 6.38

β̂0 = 169.4 − (6.38 · 10.35) = 103.4

or

Ŷi = 103.4 + 6.38Xi    (2.6)

If you compare these estimates, you’ll find that the manually calculated coefficient estimates are the same as the computer regression results summarized in Section 1.4.

As can be seen in Table 2.1, the sum of the Ŷs (column 8) equals the sum of the Ys (column 2), so the sum of the residuals (column 9) does indeed equal zero (except for rounding errors).
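Because Table 2.1 itself is not reproduced here, the following Python sketch verifies only the final arithmetic, using the summary figures reported above:

```python
# Summary figures from Table 2.1 (the underlying observations are not
# reproduced here).
cross_deviation_sum = 590.20   # sum of (Xi - X-bar)(Yi - Y-bar)
x_deviation_sq_sum = 92.50     # sum of (Xi - X-bar) squared
y_bar, x_bar = 169.4, 10.35    # sample means of Y (weight) and X (height)

b1_hat = cross_deviation_sum / x_deviation_sq_sum   # Equation 2.4
b0_hat = y_bar - b1_hat * x_bar                     # Equation 2.5

print(round(b1_hat, 2))   # 6.38
print(round(b0_hat, 1))   # 103.4
```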
