Chapter 4
Further development and analysis of the
classical linear regression model
Phan Tuyết Trinh
Tô Thị Phương Thảo Nguyễn Hoàng Minh Huy
Lâm Bá Du
Lê Chí Cang Huỳnh Thái Huy
Supervisor: Dr. Phùng Đức Nam
1 Generalising the simple model to multiple linear regression
2 The constant term
3 How are the parameters calculated in the generalised case?
4 Testing multiple hypotheses: the F-test
5 Sample output for multiple hypothesis tests
6 Multiple regression using an APT-style model
7 Data mining and the true size of the test
8 Goodness of fit statistics
9 Hedonic pricing models
10 Tests of non-nested hypotheses
11 Quantile regression
4.1 Generalising the simple model
to multiple linear regression
Stock returns might be purported to depend on their sensitivity to unexpected changes in:
• inflation
• the differences in returns on short- and long-dated bonds
• industrial production
• default risks
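Written out, a multiple regression of this kind has the general form below. The notation ($x_{2t}, \dots, x_{kt}$ for the explanatory variables and $u_t$ for the disturbance) is the usual textbook convention and is assumed here, since the slide's own equation did not survive extraction; with the four factors listed above plus a constant, $k = 5$.

```latex
y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt} + u_t,
\qquad t = 1, 2, \dots, T
```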
4.2 The constant term
k is defined as the number of ‘explanatory variables’ or
‘regressors’, including the constant term.
This equals the number of parameters that are estimated in the regression equation.
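A minimal NumPy sketch of why the constant counts towards k: in matrix form the constant is simply a column of ones in the regressor matrix, so the number of columns equals the number of estimated parameters. The helper name `add_constant` and the data are hypothetical.

```python
import numpy as np

def add_constant(regressors: np.ndarray) -> np.ndarray:
    """Prepend a column of ones (the constant term) to a T x (k-1) regressor matrix."""
    regressors = np.asarray(regressors, dtype=float)
    if regressors.ndim == 1:
        regressors = regressors.reshape(-1, 1)
    ones = np.ones((regressors.shape[0], 1))
    return np.hstack([ones, regressors])

# Hypothetical data: T = 5 observations on two explanatory variables.
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x3 = np.array([0.5, 0.1, 0.9, 0.3, 0.7])
X = add_constant(np.column_stack([x2, x3]))

T, k = X.shape  # k counts the constant, so here k = 3 parameters are estimated
print(T, k)
```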
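4.3 How are the parameters (the elements of the β vector) calculated in the generalised case?
The elements of the β vector
● SRF (Sample Regression Function): $\hat{y} = X\hat{\beta}$, from the model $y = X\beta + u$, where:
• $y$ is $T \times 1$, $X$ is $T \times k$, $\beta$ is $k \times 1$ and $u$ is $T \times 1$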
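Ordinary least squares (OLS): $\hat{\beta} = (X'X)^{-1}X'y$
● $s^2 = \dfrac{\hat{u}'\hat{u}}{T-k}$: an estimate of the variance of the errors, $\sigma^2$
● $\operatorname{var}(\hat{\beta}) = s^2 (X'X)^{-1}$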
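The OLS estimator $\hat{\beta} = (X'X)^{-1}X'y$, the error-variance estimate $s^2 = \hat{u}'\hat{u}/(T-k)$ and $\operatorname{var}(\hat{\beta}) = s^2 (X'X)^{-1}$ translate directly into NumPy. This is a sketch on simulated data (the names `ols`, `true_beta` and the parameter values are illustrative assumptions); it forms $(X'X)^{-1}$ explicitly to mirror the algebra, whereas a least-squares solver such as `np.linalg.lstsq` is numerically preferable in practice.

```python
import numpy as np

def ols(X: np.ndarray, y: np.ndarray):
    """OLS in matrix form: beta_hat = (X'X)^{-1} X'y, s^2 = u'u/(T-k), var = s^2 (X'X)^{-1}."""
    T, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)      # (X'X)^{-1}, a k x k matrix
    beta_hat = xtx_inv @ X.T @ y          # k x 1 vector of coefficient estimates
    resid = y - X @ beta_hat              # residuals u_hat, T x 1
    s2 = (resid @ resid) / (T - k)        # estimate of the error variance
    var_beta = s2 * xtx_inv               # variance-covariance matrix of beta_hat
    se = np.sqrt(np.diag(var_beta))       # standard errors: square roots of the diagonal
    return beta_hat, s2, var_beta, se

# Simulated (hypothetical) data with known true parameters.
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
true_beta = np.array([1.0, 0.5, -2.0])
y = X @ true_beta + rng.normal(scale=0.3, size=T)

beta_hat, s2, var_beta, se = ols(X, y)
print(beta_hat.round(2), se.round(3))
```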
Example
Example (continued): $\operatorname{var}(\hat{\beta}) = s^2 (X'X)^{-1}$
Summary
● $\operatorname{var}(\hat{\beta}) = s^2 (X'X)^{-1}$
4.4 Testing multiple hypotheses: the F-test
● Original model / unrestricted model (UnRestricted)
Estimating it by OLS gives the residual sum of squares
URSS, with degrees of freedom df = T – k
● Restricted model (a reduced model in which m regression coefficients are dropped)
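The two residual sums of squares are compared with the F-test statistic $F = \frac{RRSS - URSS}{URSS} \times \frac{T-k}{m}$, where RRSS is the residual sum of squares of the restricted model, m is the number of restrictions and, under the null hypothesis, $F \sim F(m, T-k)$. A small Python sketch of the arithmetic, with hypothetical input values:

```python
def f_test_statistic(rrss: float, urss: float, T: int, k: int, m: int) -> float:
    """F = [(RRSS - URSS) / URSS] * [(T - k) / m].

    rrss: residual sum of squares of the restricted model
    urss: residual sum of squares of the unrestricted model
    T: number of observations; k: parameters in the unrestricted model
    m: number of restrictions; under H0, F follows an F(m, T - k) distribution.
    """
    if rrss < urss:
        raise ValueError("RRSS cannot be smaller than URSS for nested OLS models")
    return ((rrss - urss) / urss) * ((T - k) / m)

# Hypothetical numbers: 2 restrictions, 104 observations, 4 parameters.
F = f_test_statistic(rrss=120.0, urss=100.0, T=104, k=4, m=2)
print(F)  # compare with the 5% critical value of F(2, 100)
```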
Example of a restricted model
In EViews: View/Coefficient Diagnostics/Wald Test – Coefficient Restrictions
4.6 Multiple regression using an APT-style model
• Question: can the monthly returns on Microsoft stock be explained by reference to unexpected changes in a set of macroeconomic and financial variables?
=> Arbitrage pricing theory (APT)
The steps to estimate the regression model
• Step 1: Open a new Eviews workfile
• Step 2: Import the data
• Step 3: Generate variables:
The APT posits that the stock return can be
explained by reference to the unexpected changes
in the macroeconomic variables rather than their levels
Unexpected value = Actual value – expected value
Generate variables
• Genr
Dspread = baa_aaa_spread - baa_aaa_spread(-1)
Dcredit = consumer_credit - consumer_credit(-1)
Rmsoft = 100*dlog(microsoft)
Rsandp = 100*dlog(sandp)
Dmoney = m1money_supply - m1money_supply(-1)
Inflation = 100*dlog(cpi)
Term = ustb10y - ustb3m
Dinflation = inflation - inflation(-1)
Mustb3m = ustb3m/12
Rterm = term - term(-1)
Ermsoft = rmsoft - mustb3m
Ersandp = rsandp - mustb3m
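For readers working outside EViews, the same transformations can be sketched in pandas, where `shift(1)` plays the role of the EViews lag `x(-1)` and `dlog(x)` is the first difference of the natural log. The column names assume the series are stored under the names used in the genr statements above, and the three-row data frame is purely hypothetical. DPROD, which appears in the regression below, has no genr statement on this slide and is therefore not generated here.

```python
import numpy as np
import pandas as pd

def lag(s: pd.Series) -> pd.Series:
    """EViews x(-1): the value one period earlier (NaN for the first observation)."""
    return s.shift(1)

def dlog(s: pd.Series) -> pd.Series:
    """EViews dlog(x) = log(x) - log(x(-1))."""
    return np.log(s) - np.log(lag(s))

def generate_apt_variables(df: pd.DataFrame) -> pd.DataFrame:
    """Mirror the genr statements; column names are assumed, not taken from a real workfile."""
    out = pd.DataFrame(index=df.index)
    out["dspread"] = df["baa_aaa_spread"] - lag(df["baa_aaa_spread"])
    out["dcredit"] = df["consumer_credit"] - lag(df["consumer_credit"])
    out["rmsoft"] = 100 * dlog(df["microsoft"])
    out["rsandp"] = 100 * dlog(df["sandp"])
    out["dmoney"] = df["m1money_supply"] - lag(df["m1money_supply"])
    out["inflation"] = 100 * dlog(df["cpi"])
    out["term"] = df["ustb10y"] - df["ustb3m"]
    out["dinflation"] = out["inflation"] - lag(out["inflation"])
    out["mustb3m"] = df["ustb3m"] / 12          # annual T-bill rate to a monthly rate
    out["rterm"] = out["term"] - lag(out["term"])
    out["ermsoft"] = out["rmsoft"] - out["mustb3m"]  # excess return on Microsoft
    out["ersandp"] = out["rsandp"] - out["mustb3m"]  # excess return on the S&P 500
    return out

# Hypothetical three-month sample.
raw = pd.DataFrame({
    "microsoft": [100.0, 110.0, 99.0],
    "sandp": [1000.0, 1010.0, 1020.0],
    "cpi": [200.0, 201.0, 202.0],
    "baa_aaa_spread": [1.0, 1.2, 1.1],
    "consumer_credit": [50.0, 51.0, 53.0],
    "m1money_supply": [300.0, 302.0, 301.0],
    "ustb10y": [4.0, 4.2, 4.1],
    "ustb3m": [1.2, 1.5, 1.3],
})
apt = generate_apt_variables(raw)
print(apt.round(3))
```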
The steps to estimate the regression model
• Step 4: Object/New Object/Equation, named msoftreg, with the specification: ERMSOFT C ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD RTERM
• Method: Least Squares
The steps to estimate the regression model
• View/Coefficient Diagnostics/Wald Test – Coefficient Restrictions
• C(3) = 0, C(4) = 0, C(5) = 0, C(6) = 0, C(7) = 0, i.e. the coefficients on DPROD, DCREDIT, DINFLATION, DMONEY and DSPREAD are jointly zero
Null Hypothesis: C(3)=0, C(4)=0, C(5)=0, C(6)=0, C(7)=0
Null Hypothesis Summary:
Normalized Restriction (= 0)    Value    Std. Err.
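Outside EViews, the Wald test of several zero restrictions can be sketched with NumPy using the F-form $F = (R\hat{\beta} - r)'[R\,\hat{V}\,R']^{-1}(R\hat{\beta} - r)/m$, where $\hat{V} = s^2 (X'X)^{-1}$. The data are simulated and the function names are hypothetical; the example tests three zero restrictions rather than the five on the slide, and cross-checks the result against the RRSS/URSS form of the F-test, which gives the same value for linear restrictions in OLS.

```python
import numpy as np

def wald_f(X, y, R, r):
    """F-form of the Wald test of H0: R beta = r in an OLS regression of y on X.

    Under H0 the statistic follows an F(m, T - k) distribution, m = number of rows of R.
    """
    T, k = X.shape
    m = R.shape[0]
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (T - k)
    diff = R @ b - r
    middle = np.linalg.inv(R @ (s2 * xtx_inv) @ R.T)
    return float(diff @ middle @ diff / m)

def rss(Xm, y):
    """Residual sum of squares from an OLS fit of y on Xm."""
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    e = y - Xm @ b
    return float(e @ e)

# Hypothetical data: constant + 4 regressors, i.e. C(1)..C(5) in EViews numbering.
rng = np.random.default_rng(1)
T = 120
X = np.column_stack([np.ones(T), rng.normal(size=(T, 4))])
y = X @ np.array([0.5, 1.0, 0.0, 0.0, 0.0]) + rng.normal(size=T)

# H0: C(3) = C(4) = C(5) = 0, i.e. the last three slope coefficients are jointly zero.
R = np.zeros((3, 5))
R[0, 2] = R[1, 3] = R[2, 4] = 1.0
r = np.zeros(3)
F_wald = wald_f(X, y, R, r)

# Cross-check: the restricted model simply drops the three restricted regressors.
F_rss = (rss(X[:, :2], y) - rss(X, y)) / rss(X, y) * (T - 5) / 3
print(F_wald, F_rss)
```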
Stepwise regression
• Stepwise regression is an automatic variable selection procedure which chooses the jointly most important explanatory variables from a set of candidate variables
• The simplest is the uni-directional forwards method
• Start with no variables => add first the variable with the lowest p-value => then add the variable with the next lowest p-value, given those already included, and so on until the stopping criterion is reached
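The forwards method just described can be sketched in Python as follows. It is an illustrative implementation under stated assumptions (two-sided t-test p-values, candidates added one at a time, stopping when the best remaining p-value is not below a threshold such as 0.2); it is not EViews' STEPLS algorithm, and the simulated data and function names are hypothetical. It uses NumPy and SciPy's `stats.t.sf`.

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided t-test p-values for each OLS coefficient."""
    T, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (T - k)
    t = b / np.sqrt(s2 * np.diag(xtx_inv))
    return 2 * stats.t.sf(np.abs(t), df=T - k)

def forward_stepwise(always, candidates, y, p_stop=0.2):
    """Uni-directional forwards selection (illustrative sketch).

    always: T x a matrix of regressors always included (e.g. the constant C)
    candidates: dict name -> length-T array of candidate regressors
    At each step, add the candidate with the lowest p-value given the variables
    already chosen; stop when that lowest p-value is not below p_stop.
    """
    chosen = []
    remaining = dict(candidates)
    while remaining:
        best_name, best_p = None, np.inf
        for name, x in remaining.items():
            cols = [always] + [candidates[c].reshape(-1, 1) for c in chosen] + [x.reshape(-1, 1)]
            p = ols_pvalues(np.hstack(cols), y)[-1]  # p-value of the newly added variable
            if p < best_p:
                best_name, best_p = name, p
        if best_p >= p_stop:
            break
        chosen.append(best_name)
        del remaining[best_name]
    return chosen

# Hypothetical data: x1 and x2 matter, x3..x5 are pure noise.
rng = np.random.default_rng(2)
T = 300
cands = {f"x{i}": rng.normal(size=T) for i in range(1, 6)}
y = 1.0 + 2.0 * cands["x1"] + 0.5 * cands["x2"] + rng.normal(size=T)
selected = forward_stepwise(np.ones((T, 1)), cands, y, p_stop=0.2)
print(selected)
```

With a loose threshold such as 0.2, a noise variable may occasionally enter after the genuine ones, which is exactly the data-mining concern raised later in the section.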
• Object/New Object
• Equation: Msoftstepwise
• Method: STEPLS – Stepwise Least Squares
• Dependent variable followed by the regressors always included: ERMSOFT C
• List of search regressors (candidate explanatory variables): ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD RTERM
• Options: Forwards, p-value: 0.2
Stepwise procedures have been strongly criticised by statistical purists. At the most basic level, they are sometimes argued to be
no better than automated procedures for data mining, in particular if the list of potential candidate variables is long and results from a
‘fishing trip’ rather than a strong prior financial theory.
Sample sizes and asymptotic theory
• A question that is often asked by those new to econometrics
is ‘what is an appropriate sample size for model estimation?’
- Most testing procedures in econometrics rely on asymptotic
theory. The results in theory hold only if there are an infinite number of observations.
4.7 Data mining and the true size of the test
• Test statistics are assumed to follow a particular distribution (e.g. t or F) under the null hypothesis
=> they will take on extreme values that fall in the rejection region some of the time by chance alone
=> there is always the possibility of rejecting a correct null hypothesis
• If enough explanatory variables are employed
in a regression, often one or more will be significant by chance alone
• If an α% size of test is used, on average one in every (100/α) regressions will have a significant
slope coefficient by chance alone (e.g. one in every 20 regressions for a 5% test)
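A small Monte Carlo experiment makes the last point concrete: regress pure noise on an unrelated regressor many times and count how often the slope is ‘significant’. This is a sketch with arbitrary, hypothetical settings (T = 100, 2,000 replications, a 5% two-sided test); the rejection rate should come out close to 0.05, i.e. roughly one regression in twenty.

```python
import numpy as np
from scipy import stats

def false_rejection_rate(n_reps=2000, T=100, alpha=0.05, seed=3):
    """Share of regressions of pure noise on an irrelevant regressor in which
    the slope is 'significant' at size alpha, i.e. a true null is rejected."""
    rng = np.random.default_rng(seed)
    crit = stats.t.ppf(1 - alpha / 2, df=T - 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_reps):
        x = rng.normal(size=T)
        y = rng.normal(size=T)  # y is unrelated to x, so H0: slope = 0 is true
        X = np.column_stack([np.ones(T), x])
        xtx_inv = np.linalg.inv(X.T @ X)
        b = xtx_inv @ X.T @ y
        e = y - X @ b
        s2 = e @ e / (T - 2)
        t_slope = b[1] / np.sqrt(s2 * xtx_inv[1, 1])
        rejections += int(abs(t_slope) > crit)
    return rejections / n_reps

rate = false_rejection_rate()
print(rate)
```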
• Trying many variables in a regression without
basing the selection of the candidate variables
on a financial or economic theory is known as
‘data mining’ or ‘data snooping’.
=> The true significance level will be considerably greater than the nominal significance level assumed
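If m candidate regressors are all irrelevant and, as a simplifying assumption, their tests are independent, the probability that at least one appears significant at size α is $1 - (1 - \alpha)^m$. With α = 0.05 and m = 20 this is already about 0.64, which is why the true size of the test exceeds the nominal one. A one-function Python sketch:

```python
def prob_at_least_one_significant(alpha: float, m: int) -> float:
    """P(at least one of m independent irrelevant regressors is significant at size alpha)."""
    return 1 - (1 - alpha) ** m

for m in (1, 5, 20):
    print(m, round(prob_at_least_one_significant(0.05, m), 3))
```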
To avoid data mining:
• ensuring that the selection of candidate regressors for inclusion in a model is made on the basis of financial or economic theory
• examining the forecast performance of the model in an ‘out-of-sample’ data set
4.8 Goodness of fit statistics
R²
“How well does the model containing the explanatory variables that was proposed actually explain variations in the dependent variable?”
• Quantities known as goodness of fit statistics
are available to test how well the SRF fits the data – that is, how ‘close’ the fitted regression line is to all of the data points taken together
What measures might make plausible candidates
to be goodness of fit statistics?
• RSS
The value of RSS depends to a great extent on the
scale of the dependent variable
• R²
A scaled version of RSS
• R² is the square of the correlation coefficient between $y$ and $\hat{y}$, i.e. the square of the correlation between the values
of the dependent variable and the corresponding fitted values from the model
• Being a squared correlation, R² must lie between 0 and 1
• If this correlation is high, the model fits the data well, while if the correlation is low (close to zero), the model is not providing a good fit to the data
The total sum of squares (TSS), $\sum_t (y_t - \bar{y})^2$, can be split into two parts:
• the part that has been explained by the model (the
explained sum of squares, ESS)
• the part that the model was not able to explain (the RSS).
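The decomposition TSS = ESS + RSS and the two equivalent views of R² (ESS/TSS and the squared correlation between $y$ and $\hat{y}$) can be checked numerically. This sketch assumes an OLS fit that includes a constant term, for which both identities hold exactly; the data are simulated and the function name is hypothetical.

```python
import numpy as np

def r_squared_decomposition(y, y_hat):
    """Return TSS, ESS, RSS and R^2 = ESS/TSS for fitted values from OLS with a constant."""
    y_bar = y.mean()
    tss = np.sum((y - y_bar) ** 2)      # total sum of squares
    ess = np.sum((y_hat - y_bar) ** 2)  # explained sum of squares
    rss = np.sum((y - y_hat) ** 2)      # residual sum of squares
    return tss, ess, rss, ess / tss

rng = np.random.default_rng(4)
T = 100
x = rng.normal(size=T)
y = 2.0 + 1.5 * x + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

tss, ess, rss, r2 = r_squared_decomposition(y, y_hat)
print(r2, 1 - rss / tss, np.corrcoef(y, y_hat)[0, 1] ** 2)
```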
If RSS = TSS, i.e. ESS = 0, then R² = ESS/TSS = 0:
• The model has not succeeded in explaining any
of the variability of y about its mean value
• This would happen only where the estimated values of all of the slope coefficients are exactly zero
If ESS = TSS, i.e. RSS = 0, then R² = ESS/TSS = 1:
• The model has explained all of the variability of
y about its mean value
• This would happen only in the case where all of the observation points lie exactly on the fitted line
Problems with R² as a goodness of fit measure
• R² is defined in terms of variation about the mean of y, so that if a model is reparameterised (rearranged) and the dependent variable changes, R² will change.
• R² never falls if more regressors are added to the regression.
• R² can take values of 0.9 or higher for time series regressions, and hence it is not good at discriminating between models, since a wide array of models will
frequently have broadly similar (and high) values of R².
Adjusted R²
$\bar{R}^2 = 1 - \left[\dfrac{T-1}{T-k}\,(1 - R^2)\right]$
So if an extra regressor (variable) is added to the
model, k increases and unless R² increases by a
more than off-setting amount, $\bar{R}^2$ will actually fall
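The adjusted R² formula, $\bar{R}^2 = 1 - \frac{T-1}{T-k}(1 - R^2)$, is easy to evaluate directly. In this Python sketch the values of R², T and k are hypothetical: a fourth parameter raises R² only from 0.500 to 0.502, which is not enough to offset the loss of a degree of freedom, so the adjusted measure falls.

```python
def adjusted_r2(r2: float, T: int, k: int) -> float:
    """Adjusted R^2 = 1 - [(T - 1) / (T - k)] * (1 - R^2); k includes the constant."""
    return 1 - (T - 1) / (T - k) * (1 - r2)

# Hypothetical: adding a 4th parameter raises R^2 only from 0.500 to 0.502.
before = adjusted_r2(0.500, T=100, k=3)
after = adjusted_r2(0.502, T=100, k=4)
print(round(before, 4), round(after, 4))  # adjusted R^2 falls
```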
4.9 Hedonic pricing models
• One application of econometric techniques where the coefficients have a particularly intuitively appealing interpretation is in the area
of hedonic pricing models
• Hedonic models are often used to produce appraisals or valuations of properties, given their characteristics (e.g. size of dwelling, number of bedrooms, location, number of bathrooms, etc.). In these models, the coefficient estimates represent ‘prices of the characteristics’
• One such application of a hedonic pricing model is given by Des Rosiers and Theriault (1996), who consider the effect of various amenities on rental values for buildings and apartments in five sub-markets in the Quebec area of Canada
• The paper employs 1990 data for the Quebec City region, and there are 13,378 observations
LnAGE: log of the apparent age of the property
NBROOMS: number of bedrooms
AREABYRM: area per room (in square metres)
ELEVATOR: a dummy variable = 1 if the building has an elevator; 0 otherwise
BASEMENT: a dummy variable = 1 if the unit is located in a basement; 0 otherwise
OUTPARK: number of outdoor parking spaces
INDPARK: number of indoor parking spaces
NOLEASE: a dummy variable = 1 if the unit has no lease attached to it; 0 otherwise
LnDISTCBD: log of the distance in kilometres to the central business district (CBD)
SINGLPAR: percentage of single parent families in the area where the building stands
DSHOPCNTR: distance in kilometres to the nearest shopping centre
VACDIFF1: vacancy difference between the building and the census figure
Hedonic model of rental values in Quebec City, 1990 (dependent variable: Canadian dollars per month)
Variable    Coefficient    t-ratio    A priori sign expected
• This list includes several dummy variables.
• Dummy variables can be used in the context of cross-sectional or time series regressions.
• Dummy variables are used in the same way as other explanatory variables, and the coefficients on the dummy variables can be interpreted as the average differences in the values of the dependent variable for each category, given all of the other factors in the model.
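A tiny numerical illustration of the dummy-variable interpretation: when a regression contains only a constant and one dummy, the dummy's coefficient is exactly the difference between the two group means (and the constant is the mean of the omitted group). In the full hedonic model the same coefficient is that difference holding the other regressors fixed. The rents below are hypothetical and are not taken from Des Rosiers and Theriault (1996).

```python
import numpy as np

# Hypothetical monthly rents (CAD) and an ELEVATOR-style dummy (1 = has elevator).
rent = np.array([600.0, 650.0, 620.0, 700.0, 720.0, 760.0])
elevator = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

X = np.column_stack([np.ones_like(elevator), elevator])
(b_const, b_dummy), *_ = np.linalg.lstsq(X, rent, rcond=None)

mean_diff = rent[elevator == 1].mean() - rent[elevator == 0].mean()
print(b_const, b_dummy, mean_diff)
```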
The relationship between the regression F-statistic and R²
• Recall that the regression F-statistic tests the
null hypothesis that all of the regression slope parameters are simultaneously zero
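The link follows from the F-test formula $F = \frac{RRSS - URSS}{URSS} \times \frac{T-k}{m}$. Under this null, the restricted model is a regression on a constant only, whose fitted values all equal $\bar{y}$, so RRSS = TSS and the number of restrictions is m = k − 1 (URSS is simply the RSS of the full model). Dividing numerator and denominator by TSS then gives:

```latex
F = \frac{RRSS - URSS}{URSS} \times \frac{T-k}{m}
  = \frac{TSS - RSS}{RSS} \times \frac{T-k}{k-1}
  = \frac{ESS / TSS}{RSS / TSS} \times \frac{T-k}{k-1}
  = \frac{R^2}{1 - R^2} \times \frac{T-k}{k-1}
```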
• One limitation of such studies that is worth
mentioning at this stage is their assumption that the implicit price of each characteristic is
identical across types of property, and that
these characteristics do not become saturated.
4.10 Tests of non-nested hypotheses
• Suppose that there are two researchers
working independently, each with a separate
financial theory for explaining the variation in
some variable, $y_t$. The two competing models, and a hybrid (encompassing) model containing both explanatory variables, can be written as:
$y_t = \alpha_1 + \alpha_2 x_{2t} + u_t$   (4.48)
$y_t = \beta_1 + \beta_2 x_{3t} + v_t$   (4.49)
$y_t = \gamma_1 + \gamma_2 x_{2t} + \gamma_3 x_{3t} + w_t$   (4.50)
Selecting between models
1. γ2 is statistically significant but γ3 is not. In this case, (4.50)
collapses to (4.48), and the latter is the preferred model.
2. γ3 is statistically significant but γ2 is not. In this case, (4.50)
collapses to (4.49), and the latter is the preferred model.
3. γ2 and γ3 are both statistically significant. This would imply that
both x2 and x3 have incremental explanatory power for y, in
which case both variables should be retained. Models (4.48)
and (4.49) are both discarded and (4.50) is the preferred model.
4. Neither γ2 nor γ3 is statistically significant. In this case, none
of the models can be dropped, and some other method for
choosing between them must be employed.
• There are several limitations to the use of encompassing regressions to select between non-nested models.
• It could be the case that if both variables are included, neither γ2 nor γ3 is statistically significant, while each is significant in its
separate regression, (4.48) and (4.49) respectively
4.11 Quantile regression
Background and motivation
• We may think of there being a non-linear
(∩-shaped) relationship between regulation and
GDP growth
• Estimating a standard linear regression model may lead to seriously misleading estimates: it will ‘average’ the positive and negative effects from very low and very high regulation
• Quantile regressions, developed by Koenker
and Bassett (1978), represent a more natural and flexible way to capture the complexities inherent in the relationship by estimating models for the conditional quantile functions
• Quantile regressions can be conducted in both time series and cross-sectional contexts
• In the literature on quantile regressions, it is usually assumed that the dependent
variable (response variable) is independently distributed and homoscedastic
• Quantile regression is a non-parametric technique
• Quantiles, denoted τ, refer to the position where
an observation falls within an ordered series for
y:
$Q(\tau) = \inf\{y : F(y) \ge \tau\}$
where $F$ is the cumulative distribution function of y and inf refers to the infimum, or the ‘greatest
lower bound’, which is the smallest value of y
satisfying the inequality
• τ must lie between 0 and 1
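The infimum definition $Q(\tau) = \inf\{y : F(y) \ge \tau\}$ can be applied directly to a sample by taking $F$ to be the empirical distribution function, in which case $Q(\tau)$ is the $\lceil T\tau \rceil$-th smallest observation. A short Python sketch (function name and data hypothetical):

```python
import numpy as np

def empirical_quantile(y, tau: float) -> float:
    """Q(tau) = inf{y : F(y) >= tau}, with F the empirical CDF of the sample."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    ys = np.sort(np.asarray(y, dtype=float))
    n = ys.size
    # The empirical CDF reaches (i + 1) / n at the i-th sorted value (0-based),
    # so the infimum is the first sorted value at which that share is >= tau.
    cdf = np.arange(1, n + 1) / n
    return float(ys[np.argmax(cdf >= tau)])

sample = [3, 1, 4, 1, 5, 9, 2, 6]
print(empirical_quantile(sample, 0.5), empirical_quantile(sample, 0.9))
```

Note that this definition picks an actual observation (3 for the median of the eight values above), whereas `np.median` would interpolate to 3.5.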
4.11 Quantile regression
Estimation of quantile functions
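The slide on estimation did not survive extraction. For reference, the standard Koenker and Bassett (1978) estimator chooses $\hat{\beta}_\tau$ to minimise $\sum_t \rho_\tau(y_t - x_t'\beta)$, with the ‘check’ function $\rho_\tau(u) = u\,(\tau - \mathbf{1}[u < 0])$, and this problem can be written as a linear programme. The sketch below solves that programme with SciPy's `linprog` (HiGHS method, available in recent SciPy versions); it is one illustrative way to compute the estimates, not necessarily the algorithm used by EViews or by the authors, and the dense formulation is only suitable for small samples. Function and variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(X, y, tau=0.5):
    """Estimate beta_tau = argmin sum_t rho_tau(y_t - x_t' beta) as a linear programme:
        min  tau * 1'u_plus + (1 - tau) * 1'u_minus
        s.t. X beta + u_plus - u_minus = y,  u_plus, u_minus >= 0,  beta free.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    T, k = X.shape
    c = np.concatenate([np.zeros(k), tau * np.ones(T), (1 - tau) * np.ones(T)])
    A_eq = np.hstack([X, np.eye(T), -np.eye(T)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * T)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k]

# Intercept-only model: the tau = 0.5 estimate is the sample median.
y_demo = np.array([1.0, 2.0, 3.0, 10.0, 20.0])
b_median = quantile_regression(np.ones((5, 1)), y_demo, tau=0.5)

# Hypothetical data with an exact linear relation: every quantile line is y = 1 + 2x.
x_demo = np.arange(5.0)
X_demo = np.column_stack([np.ones(5), x_demo])
b_q25 = quantile_regression(X_demo, 1.0 + 2.0 * x_demo, tau=0.25)
print(b_median, b_q25)
```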
An application of quantile regression: evaluating fund performance
• A study by Bassett and Chen (2001) performs a style attribution analysis for a mutual fund and, for comparison, the S&P500 index
• Examine how a portfolio’s exposure to various styles varies with performance
• Bassett and Chen (2001) conduct a style
analysis in this spirit by regressing the returns
of a fund on the returns of a large growth
portfolio, the returns of a large value portfolio, the returns of a small growth portfolio, and the returns of a small value portfolio