The Simple Regression Model
Chapter 21
21.1 The Simple Regression Model
How can we test the CAPM (Capital Asset Pricing Model) for Berkshire Hathaway stock?
Formulate the simple regression with the percentage change in Berkshire Hathaway stock as y and the percentage change in value of the whole stock market as x
Inference requires standard errors, confidence intervals and hypothesis tests
The simple regression model (SRM) describes the association in the population between an explanatory variable x and response y.
Consider the data to be a sample from a population
Linear on Average
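The equation of the SRM describes how the conditional mean of Y depends on X.
The SRM assumes that these means lie on a line with intercept β0 and slope β1:
$\mu_{y|x} = E(Y \mid X = x) = \beta_0 + \beta_1 x$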
Deviations from the Mean
The deviations of responses around the conditional mean $\mu_{y|x}$ are called errors
The error is denoted by ε, where $\varepsilon = y - \mu_{y|x}$, and E(ε) = 0
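The SRM makes three assumptions about ε:
1. Independent. Errors are independent of each other.
2. Equal variance. All errors have the same variance, $\sigma_\varepsilon^2$.
3. Normal. The errors are normally distributed.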
Data Generating Process
Let Y denote a company's monthly sales and X denote its spending on advertising (both in thousands of dollars)
The SRM assumes a normal distribution at each x
Eventually the data shown below are observed
The true regression line is a characteristic of the population, not the observed data
The SRM is a model and offers a simplified view
of reality
Simple Regression Model (SRM)
Observed values of the response Y are linearly related to the values of the explanatory variable X by the equation:
$y = \beta_0 + \beta_1 x + \varepsilon, \quad \varepsilon \sim N(0, \sigma_\varepsilon^2)$
The observations are independent of one another, have equal variance around the regression line, and are normally distributed around the regression line.
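The data generating process described by the SRM can be sketched as a small simulation. This is only an illustration: the parameter values (beta0 = 10, beta1 = 2, sigma = 5) and the x values are hypothetical and do not come from the slides.

```python
import random


def simulate_srm(xs, beta0, beta1, sigma, seed=0):
    """Draw one response per x from y = beta0 + beta1*x + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical values chosen only for illustration (not from the slides).
xs = [float(x) for x in range(1, 201)]
ys = simulate_srm(xs, beta0=10.0, beta1=2.0, sigma=5.0)

# Errors are deviations from the true (population) line; they average close to 0.
errors = [y - (10.0 + 2.0 * x) for x, y in zip(xs, ys)]
mean_error = sum(errors) / len(errors)
print(round(mean_error, 3))
```

Each error is drawn independently, with the same standard deviation at every x, from a normal distribution, which mirrors the three assumptions of the SRM.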
21.2 Conditions for the SRM
Conditions for the SRM – Checklist
Is the association between y and x linear?
Have lurking variables been ruled out?
Are the errors evidently independent?
Are the variances of the residuals similar?
Are the residuals nearly normal?
Conditions for the SRM – CAPM Example
Linearity condition is satisfied; no pattern in the residuals.
Data are shifted to the right because of
No obvious lurking variable (according to CAPM theory).
Similar variances condition is satisfied. Check the plot of residuals versus x for any fan-shaped pattern (none visible).
Evidently independent. No dependence apparent in the timeplot of the residuals.
The residuals are not normally distributed. Check the sample size condition (satisfied) in order to rely on the Central Limit Theorem (CLT).
Modeling Process
Before looking at plots, ask two questions:
1. Does a linear relationship make sense?
2. Is the relationship free of lurking variables?
Then begin working with data
Plot y versus x and verify a linear association.
Fit the least squares line and obtain residuals
Plot the residuals versus x.
If time series data, construct a timeplot of
residuals
Inspect the histogram and quantile plot of the
residuals
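The fitting step of the modeling process can be sketched as follows. The five data points are hypothetical, and the plotting steps (scatterplot, residual plot, timeplot, histogram) are left out; the sketch only shows how the least squares line and its residuals are obtained.

```python
def fit_line(xs, ys):
    """Return the least squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Return the residuals e = y - (b0 + b1*x)."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)
print(round(b0, 2), round(b1, 2))
```

By construction, least squares residuals sum to zero and are uncorrelated with x, so any visible pattern in a plot of residuals versus x points to a failure of linearity or of similar variances rather than to an artifact of the fit.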
21.3 Inference in Regression
Parameters and Estimates for SRM
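The estimated standard error of the slope b1 is
$se(b_1) = \frac{s_e}{\sqrt{n-1}} \cdot \frac{1}{s_x} \approx \frac{s_e}{\sqrt{n}\, s_x}$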
Estimated Standard Error of b1
Influenced by:
Standard deviation of the residuals. As it increases, the standard error increases.
Sample size. As it increases, the standard error decreases.
Standard deviation of x. As it increases, the standard error decreases.
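The three influences can be checked numerically with the usual formula se(b1) = s_e / (sqrt(n - 1) * s_x). The input values below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def se_slope(s_e, n, s_x):
    """Estimated standard error of the slope: s_e / (sqrt(n - 1) * s_x)."""
    return s_e / (math.sqrt(n - 1) * s_x)


# Hypothetical baseline: s_e = 2, n = 101, s_x = 4.
base = se_slope(s_e=2.0, n=101, s_x=4.0)

more_noise = se_slope(s_e=4.0, n=101, s_x=4.0)    # larger residual SD
more_data = se_slope(s_e=2.0, n=401, s_x=4.0)     # larger sample
more_spread = se_slope(s_e=2.0, n=101, s_x=8.0)   # more variation in x

print(base, more_noise, more_data, more_spread)
```

More residual variation makes the slope estimate less precise, while more observations or more spread in x make it more precise.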
Software Results for CAPM Example
Confidence Intervals
The 95% confidence interval for β1 is
$b_1 \pm t_{0.025,\,n-2} \, se(b_1)$
The 95% confidence interval for β0 is
$b_0 \pm t_{0.025,\,n-2} \, se(b_0)$
Confidence Intervals – CAPM Example
The 95% confidence interval for β1 is
0.7223495 ± 1.97 × 0.077763 = [0.569 to 0.876]
The 95% confidence interval for β0 is
1.3962046 ± 1.97 × 0.339682 = [0.727 to 2.065]
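The arithmetic behind the two CAPM intervals can be reproduced in a few lines. The estimates, standard errors and the critical value 1.97 are the ones quoted on the slide; the function name is just an illustrative helper.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Values quoted in the CAPM example: slope b1 and intercept b0.
slope_ci = confidence_interval(0.7223495, 0.077763, 1.97)
intercept_ci = confidence_interval(1.3962046, 0.339682, 1.97)

print([round(v, 3) for v in slope_ci])
print([round(v, 3) for v in intercept_ci])
```

Neither interval contains zero, which agrees with the hypothesis tests for the slope and the intercept.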
Hypothesis Tests – CAPM Example
The t-statistic of 9.29 with p-value of < 0.0001 indicates that the slope is significantly different from zero.
The t-statistic of 4.11 with p-value of < 0.0001 indicates that the intercept is significantly different from zero.
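Both t-statistics follow from dividing each estimate by its standard error, using the values quoted in the CAPM confidence intervals. The p-values in this sketch use a normal approximation to the t distribution, which is an assumption that is reasonable only for large samples; statistical software would use the t distribution with n - 2 degrees of freedom.

```python
import math


def t_statistic(estimate, std_error, null_value=0.0):
    """t = (estimate - null_value) / std_error."""
    return (estimate - null_value) / std_error


def two_sided_p_normal(t):
    """Two-sided p-value using a normal approximation (large-sample assumption)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


t_slope = t_statistic(0.7223495, 0.077763)
t_intercept = t_statistic(1.3962046, 0.339682)

print(round(t_slope, 2), round(t_intercept, 2))
print(two_sided_p_normal(t_slope) < 0.0001, two_sided_p_normal(t_intercept) < 0.0001)
```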
4M Example 21.1:
LOCATING A FRANCHISE OUTLET
Motivation
Does traffic volume affect gasoline sales?
How much more gasoline can be expected
to be sold at a franchise location with an
average of 40,000 drive-bys compared to
one with an average of 32,000 drive-bys?
Method
A confidence interval for 8,000 times the estimated slope will indicate how much more gas is expected to sell at the busier location.
Mechanics
Hence, a difference of 8,000 cars in daily traffic volume implies a difference in average daily sales of approximately 1,507 to 2,281 gallons.
Message
Based on a sample of 80 gas stations, we expect that a station located at a site with 40,000 drive-bys will sell on average from 1,507 to 2,281 more gallons of gas daily than a location with 32,000 drive-bys.
21.4 Prediction Intervals
Leveraging the SRM
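A prediction interval is an interval designed to hold a fraction (usually 95%) of the values of the response for a given value of x.
A prediction interval differs from a confidence interval because it makes a statement about the location of a new observation rather than a parameter of the population.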
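The 95% prediction interval for a new observation at x_new is
$\hat{y}_{new} \pm t_{0.025,\,n-2} \, se(\hat{y}_{new})$
where
$\hat{y}_{new} = b_0 + b_1 x_{new}$
$se(\hat{y}_{new}) = s_e \sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{(n-1)\, s_x^2}}$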
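A minimal sketch of the prediction interval calculation follows, using the standard formula se(ŷ_new) = s_e * sqrt(1 + 1/n + (x_new - x̄)² / ((n - 1) s_x²)). The ten data points are hypothetical, and the critical value is passed in by the caller; 2.306 is the 0.025 upper-tail t value for 8 degrees of freedom.

```python
import math


def prediction_interval(xs, ys, x_new, t_crit):
    """Return (lower, y_hat, upper) for a new response at x_new under the SRM."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)  # equals (n - 1) * s_x^2
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))  # standard deviation of the residuals
    y_hat = b0 + b1 * x_new
    se_pred = s_e * math.sqrt(1.0 + 1.0 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat - t_crit * se_pred, y_hat, y_hat + t_crit * se_pred


# Hypothetical data, for illustration only (n = 10, so n - 2 = 8 degrees of freedom).
xs = [float(x) for x in range(1, 11)]
ys = [2.3, 4.1, 5.8, 8.4, 9.9, 12.2, 13.8, 16.1, 18.3, 19.7]

near = prediction_interval(xs, ys, x_new=5.5, t_crit=2.306)   # at the mean of x
far = prediction_interval(xs, ys, x_new=20.0, t_crit=2.306)   # extrapolation
print(near[2] - near[0], far[2] - far[0])
```

The interval is narrowest at the mean of x and widens as x_new moves away from it, which is one reason extrapolated predictions deserve extra caution.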
A simple approximation for a 95% prediction interval is
$\hat{y} \pm 2 s_e$
Prediction intervals are reliable within the range of observed data. They are also sensitive to the assumptions of constant variance and normality.
4M Example 21.2:
MANAGING NATURAL RESOURCES
Motivation
In managing commercial fishing fleets, the level of effort (number of boat-days) is assumed to influence the size of the catch.
What is the predicted crab catch in a season with 7,500 days of effort?
Method
Use regression with Y equal to the catch
near Vancouver Island from 1980 – 2007
measured in thousands of pounds of
Dungeness crabs with X equal to the level
of effort (total number of days by boats
catching Dungeness crabs).
Mechanics
Evidently independent
The t-statistic (and p-value) indicate that the slope is significantly different from zero.
The predicted catch in a year with x = 7,500 days of effort is 1,173.24 thousand pounds.
The 95% prediction interval is from 908.44 to 1,438.11 thousand pounds.
Message
On average, each additional day of effort (per boat) increases the harvest by about 160 pounds.
In a season with 7,500 days of effort, there is an expected total harvest of 1,173,240 pounds.
There is a 95% probability that the catch will be between 908,440 and 1,438,110 pounds.
Best Practices
Verify that your model makes sense, both visually and substantively
Consider other possible explanatory variables
Check the conditions, in the listed order
Use confidence intervals to express what you know about the slope and intercept
Check the assumptions of the SRM carefully before using prediction intervals
Be careful when extrapolating
Don’t overreact to residual plots
Do not mistake varying amounts of data for