Building Regression Models
Chapter 24
24.1 Identifying Explanatory Variables
What explanatory variables belong in a
regression model for stock returns?
Initial model motivated by theory such as CAPM
Seek additional variables that improve fit and
produce better predictions
The process is typically complicated by correlated
explanatory variables (i.e., collinearity)
The Initial Model
Build a model that describes returns on
Sony stock
CAPM provides a theoretical starting point: use % change for the whole stock market
as an explanatory variable
The Initial Model – Scatterplot
Association appears linear, two outliers identified
The Initial Model – Timeplot of Residuals
Locates outliers in time (Dec 1999 and Apr 2003)
No evidence of dependence
The Initial Model – Regression Results
The Initial Model – Residual Plot
Aside from the two outliers, residuals have similar variances.
The Initial Model – Check Normality
Aside from the two outliers, residuals are nearly
normal.
The Initial Model – Proceed to Inference
Estimates are consistent with CAPM
The estimated intercept is not significantly
different from zero with a p-value of 0.6964.
The estimated slope is highly significant with a
p-value less than 0.0001
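The inference step for the initial model can be sketched in code. The example below is a minimal illustration on synthetic data, not the Sony returns: the variable names, sample values, and the helper `simple_regression` are all assumptions made for the example. It fits the simple regression and reports a t-statistic for the intercept and for the slope, which is what the p-values above summarize.

```python
import numpy as np

def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x, with a t-statistic for each estimate."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    s_e = np.sqrt(np.sum(resid ** 2) / (n - 2))           # residual standard deviation
    se_b1 = s_e / np.sqrt(sxx)
    se_b0 = s_e * np.sqrt(1.0 / n + x.mean() ** 2 / sxx)
    return {"b0": b0, "b1": b1, "t_b0": b0 / se_b0, "t_b1": b1 / se_b1}

# Synthetic monthly % changes (hypothetical values, NOT the Sony data).
rng = np.random.default_rng(0)
market = rng.normal(0.5, 4.5, size=168)
stock = 1.3 * market + rng.normal(0.0, 8.0, size=168)    # true intercept is 0

fit = simple_regression(market, stock)
print({k: round(float(v), 3) for k, v in fit.items()})
```

A t-statistic larger than about 2 in absolute value corresponds roughly to a p-value below 0.05. Under CAPM the intercept's t-statistic is expected to be small and the slope's to be large.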
Identifying Other Variables
Research in finance suggests that other variables
should be added to the initial model.
Three of these variables are: percentage change
in the DJIA (Dow % Change) and differences in
performance between small and large companies
(Small-Big) and between growth and value stocks (High-Low)
Correlation Matrix
Scatterplot Matrix
Identifying Other Variables
The correlation matrix indicates that percentage
changes in the DJIA and in the whole market
index are highly correlated
The scatterplot matrix indicates that the
association between the response and each of these
variables appears linear
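A correlation matrix of the candidate explanatory variables takes only a few lines to compute. The sketch below uses synthetic series in place of the real data; the names and numeric settings are assumptions, and `dow` is deliberately built to track `market` closely so that the high correlation described above shows up in the output.

```python
import numpy as np

# Synthetic stand-ins for the four candidate variables (hypothetical values).
rng = np.random.default_rng(1)
n = 168
market = rng.normal(0.0, 4.5, n)
dow = 0.9 * market + rng.normal(0.0, 2.0, n)   # constructed to follow the market
small_big = rng.normal(0.0, 3.5, n)
high_low = rng.normal(0.0, 3.5, n)

names = ["Market", "Dow", "Small-Big", "High-Low"]
X = np.column_stack([market, dow, small_big, high_low])

# rowvar=False treats each column as one variable.
corr = np.corrcoef(X, rowvar=False)
for name, row in zip(names, corr):
    print(f"{name:>10} " + " ".join(f"{r:6.2f}" for r in row))
```

Large off-diagonal entries flag pairs of explanatory variables that are likely to cause collinearity once both enter the model.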
Adding Explanatory Variables
The data consist of 168 observations with four
candidate explanatory variables
Begin model building by including all four
variables in the multiple regression model
MRM with All Four Explanatory Variables
Residual Plot: Residuals vs Fitted Values
Outliers are still present; however, this and other
residual plots show the conditions for MRM are
satisfied
MRM with All Four Explanatory Variables
The F-statistic is 21.59 with p-value of 0.0001; this
multiple regression equation explains statistically
significant variation in percentage changes in the
value of Sony stock.
Based on the t-statistics, only the variable
Small-Big improves a regression that contains all of the
other explanatory variables.
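The F-statistic reported above can be tied back to $R^2$. As a worked sketch, assuming the usual relation between F and $R^2$ for a regression with $k$ explanatory variables and $n$ observations, and taking $n = 168$ and $k = 4$ from the earlier slides (the implied $R^2$ is a back-calculation, not a value reported in the deck):

```latex
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}
\quad\Longrightarrow\quad
R^2 = \frac{kF}{kF + (n - k - 1)}
    = \frac{4(21.59)}{4(21.59) + 163}
    = \frac{86.36}{249.36} \approx 0.35
```

So the four variables together would account for roughly a third of the variation in the response, even though most of the individual t-statistics are unimpressive.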
Adding other explanatory variables to the initial
model alters the slope for Market % Change.
This once important variable is no longer
statistically significant in explaining percentage
changes in the value of Sony stock
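The change in the Market % Change slope can be reproduced on synthetic data. The sketch below is an illustration under assumptions: the data are simulated, the helper `ols` is hypothetical, and the coefficients used to generate the response are invented. It fits the one-variable model and the four-variable model, then compares the slope and standard error for the market variable.

```python
import numpy as np

def ols(X, y):
    """Least squares with an intercept: estimates, SEs, t-statistics, R^2, overall F."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    se = np.sqrt(sse / (n - k - 1) * np.diag(np.linalg.inv(A.T @ A)))
    r2 = 1.0 - sse / sst
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return {"coef": beta, "se": se, "t": beta / se, "r2": r2, "F": f_stat}

# Synthetic data (hypothetical): dow is built to be collinear with market.
rng = np.random.default_rng(2)
n = 168
market = rng.normal(0.0, 4.5, n)
dow = 0.9 * market + rng.normal(0.0, 2.0, n)
small_big = rng.normal(0.0, 3.5, n)
high_low = rng.normal(0.0, 3.5, n)
stock = 1.3 * market + 0.6 * small_big + rng.normal(0.0, 8.0, n)

marginal = ols(market, stock)
full = ols(np.column_stack([market, dow, small_big, high_low]), stock)
print("marginal slope, SE:", marginal["coef"][1], marginal["se"][1])
print("partial slope,  SE:", full["coef"][1], full["se"][1])
print("overall F:", full["F"], "t-statistics:", full["t"][1:])
```

Index 0 of each returned array is the intercept, so index 1 is the market slope in both fits. The partial slope's standard error is expected to be noticeably larger than the marginal one because `dow` carries much of the same information.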
24.2 Collinearity
Marginal and Partial Slopes
There is a high correlation between Market %
Change and Dow % Change (r = 0.89).
This collinearity produces imprecise estimates of the partial slopes
It explains the difference between the marginal and
partial slopes
Variance Inflation Factor (VIF)
Variance inflation factor: quantifies the amount of unique variation in each explanatory variable and measures the effect of collinearity
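The VIF for the slope $b_j$ is
$$\mathrm{VIF}(b_j) = \frac{1}{1 - R_j^2}$$
where $R_j^2$ is the $R^2$ from regressing $x_j$ on the other explanatory variables.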
Results for Sony Stock Value Example
Is High-Low not statistically significant because it
is redundant or simply unrelated to the response?
Because it has a VIF near 1, collinearity has little effect on this variable (not redundant)
Generally, VIF > 5 or 10 suggests redundancy
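A VIF can be computed directly from its definition: regress each explanatory variable on the others and plug the resulting $R^2$ into $1/(1 - R^2)$. The sketch below does this on synthetic data; the function name `vif` and the simulated variables are assumptions made for illustration, not part of the original analysis.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic explanatory variables (hypothetical); the first two are collinear.
rng = np.random.default_rng(1)
n = 168
market = rng.normal(0.0, 4.5, n)
dow = 0.9 * market + rng.normal(0.0, 2.0, n)
small_big = rng.normal(0.0, 3.5, n)
high_low = rng.normal(0.0, 3.5, n)
X = np.column_stack([market, dow, small_big, high_low])

v = vif(X)
for name, value in zip(["Market", "Dow", "Small-Big", "High-Low"], v):
    print(f"{name:>10}: VIF = {value:.2f}")
```

Variables that are nearly unrelated to the rest, like the simulated High-Low here, should come out with a VIF close to 1, while the collinear pair should come out well above it.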
Signs of Collinearity
$R^2$ increases less than we’d expect.
Slopes of correlated explanatory variables in the model change dramatically
The F-statistic is more impressive than individual
t-statistics.
Standard errors for partial slopes are larger than those for marginal slopes
Variance inflation factors increase
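The last two signs are connected. A sketch using the standard expression for the standard error of a partial slope, where $s_e$ (residual standard deviation) and $s_{x_j}$ (standard deviation of $x_j$) are notation assumed here rather than taken from the slides:

```latex
\mathrm{se}(b_j) = \frac{s_e}{\sqrt{n-1}\; s_{x_j}} \times \sqrt{\mathrm{VIF}(b_j)},
\qquad
\mathrm{VIF}(b_j) = \frac{1}{1 - R_j^2}
```

For example, with $r = 0.89$ between Market % Change and Dow % Change, $R_j^2$ for either variable is at least $0.89^2 \approx 0.79$. Its VIF is therefore at least $1/(1 - 0.79) \approx 4.8$, and collinearity inflates its standard error by a factor of at least $\sqrt{4.8} \approx 2.2$ compared with the same variable being uncorrelated with the others.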
Remedies for Collinearity
Remove redundant explanatory variables
Re-express explanatory variables (e.g., use the
average of Market % Change and Dow % Change
as an explanatory variable)
Do nothing if the explanatory variables are
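The re-expression remedy can be checked numerically. The sketch below, on synthetic data with hypothetical variable names, replaces two collinear variables with their average and compares VIFs before and after. It uses the fact that the VIFs equal the diagonal of the inverse of the correlation matrix of the explanatory variables.

```python
import numpy as np

def vif(X):
    """VIFs as the diagonal of the inverse correlation matrix (needs >= 2 columns)."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

# Synthetic explanatory variables (hypothetical values).
rng = np.random.default_rng(5)
n = 168
market = rng.normal(0.0, 4.5, n)
dow = 0.9 * market + rng.normal(0.0, 2.0, n)   # collinear with market
small_big = rng.normal(0.0, 3.5, n)
high_low = rng.normal(0.0, 3.5, n)

before = vif(np.column_stack([market, dow, small_big, high_low]))

# Re-express: one averaged index replaces the two collinear variables.
avg_index = (market + dow) / 2.0
after = vif(np.column_stack([avg_index, small_big, high_low]))

print("VIFs before:", np.round(before, 2))
print("VIFs after: ", np.round(after, 2))
```

The trade-off is interpretive: the averaged variable no longer separates the market effect from the Dow effect, which is acceptable only when that distinction is not the point of the analysis.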
24.3 Removing Explanatory Variables
Issues
After adding several explanatory variables to a
model, some of those added and some of those
originally present may not be statistically
significant.
4M Example 24.1: MARKET SEGMENTATION
Motivation
Within which magazine should a
manufacturer of a new mobile phone
advertise? One has an older audience
The manufacturer collects consumer ratings of the new
phone design along with consumers’ ages and reported incomes.
Method
Use multiple regression with ratings as the
response and age and income as the
explanatory variables. Examine the
correlation matrix and scatterplot matrix.
Mechanics – Estimation Results
Mechanics – Examine Plots
MRM conditions are satisfied.
Mechanics
The F-statistic has a p-value of < 0.0001.
The model explains statistically significant variation in the ratings. Although collinear, both predictors (age and income) are
statistically significant.
Based on the confidence interval for the slope of Age, an
affluent audience that is younger by 20 years
assigns, on average, ratings that are 1 to 2 points higher than the older, affluent audience.
The slope of Age changes sign when adjusted for differences in
income. Substantively, this makes sense because younger customers with money find the new
design attractive.
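A sign change between a marginal and a partial slope is easy to reproduce with simulated data. The sketch below is an illustration only: the numbers are invented, not the survey data, and the helper `slopes` is hypothetical. Age and income are generated to be positively correlated, ratings rise with income and fall with age, and the fitted slope for age is compared with and without income in the model.

```python
import numpy as np

def slopes(y, *xs):
    """Least-squares slopes (intercept dropped) of y on the given explanatory variables."""
    A = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:]

# Simulated survey (hypothetical values): older respondents tend to earn more.
rng = np.random.default_rng(3)
n = 500
age = rng.normal(45.0, 10.0, n)
income = 2.0 * age + rng.normal(0.0, 10.0, n)
rating = 5.0 - 0.07 * age + 0.05 * income + rng.normal(0.0, 0.5, n)

marginal_age = slopes(rating, age)[0]
partial_age, partial_income = slopes(rating, age, income)

print("marginal slope of age:", round(float(marginal_age), 3))
print("partial slope of age: ", round(float(partial_age), 3))
print("effect of being 20 years younger at fixed income:",
      round(float(-20 * partial_age), 2))
```

The marginal slope mixes the direct effect of age with the effect of the income that tends to come with age. Holding income fixed isolates the comparison the slide describes, between younger and older audiences that are equally affluent.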
4M Example 24.2: RETAIL PROFITS
Motivation
A chain of pharmacies is looking to expand
into a new community. It has data for 110
cities on the following variables: income,
disposable income, birth rate, social
security recipients, cardiovascular deaths,
and percentage of the local population aged 65
or more.
Method
Use multiple regression. The response
variable is profit. Examine the correlation
matrix and the scatterplot matrix.
Several high correlations are present (shaded in table) and indicate the presence of collinearity.
This partial scatterplot matrix identifies communities that are distinct from others.
The linearity and no-lurking-variables conditions are met.
Mechanics – Estimation Results
Mechanics – Examine Plots
These and other plots (not shown here) indicate
that all MRM conditions are satisfied
Mechanics
The F-statistic indicates that this collection of
explanatory variables explains statistically
significant variation in profits. The VIFs indicate that some explanatory variables are redundant and
should be removed (one at a time) from the
model.
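Removing variables one at a time can be sketched as a loop that recomputes the VIFs after each removal. The version below is one possible mechanical rule, not the procedure used in the example: it drops whichever variable currently has the largest VIF until all VIFs fall below a threshold. The function names, the threshold of 5, and the simulated data are all assumptions. In practice the choice of which redundant variable to drop should also reflect substantive judgment.

```python
import numpy as np

def vifs(X):
    """VIFs as the diagonal of the inverse correlation matrix (needs >= 2 columns)."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

def prune_by_vif(X, names, threshold=5.0):
    """Drop the column with the largest VIF, one at a time, until all VIFs <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vifs(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)   # VIFs are recomputed on the next pass
        names.pop(worst)
    return X, names

# Simulated city-level variables (hypothetical values, standardized units).
rng = np.random.default_rng(4)
n = 110
income = rng.normal(size=n)
disposable = 0.95 * income + rng.normal(scale=0.15, size=n)
age65 = rng.normal(size=n)
soc_sec = 0.9 * age65 + rng.normal(scale=0.3, size=n)
cv_deaths = 0.8 * age65 + rng.normal(scale=0.4, size=n)
birth = rng.normal(size=n)

names = ["Income", "Disposable Income", "Pct 65+",
         "Social Security", "CV Deaths", "Birth Rate"]
X = np.column_stack([income, disposable, age65, soc_sec, cv_deaths, birth])

X_kept, kept = prune_by_vif(X, names)
print("kept:", kept)
print("VIFs:", np.round(vifs(X_kept), 2))
```

Recomputing after every removal matters because dropping one member of a collinear group usually lowers the VIFs of the others, so removing several at once can discard more than necessary.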
Mechanics – Simplified Model
This multiple regression separates the effects of birth rates from age (and income). It reveals that cities with higher birth rates produce higher profits when compared to cities with lower birth rates but comparable income and a comparable share of the local population aged 65 or more.
Message
Three characteristics of the local community affect estimated profits: disposable
income, age, and birth rates. Increases in
each of these lead to higher profits. The
data show that the pharmacy chain will
have to trade off these characteristics in
selecting a site for expansion.
Best Practices
Begin a regression analysis by looking at plots
Use the F-statistic for the overall model and a
t-statistic for each explanatory variable
Learn to recognize the presence of collinearity
Don’t fear collinearity – understand it