Term paper for the valuation course: Further development and analysis of the classical linear regression model


Trang 1

Chapter 4

Further development and analysis of the classical linear regression model

Phan Tuyết Trinh
Tô Thị Phương Thảo
Nguyễn Hoàng Minh Huy
Lâm Bá Du
Lê Chí Cang
Huỳnh Thái Huy

Supervisor: Dr Phùng Đức Nam

Trang 2

1 Generalising the simple model to multiple linear regression

2 The constant term

3 How are the parameters calculated in the generalised case?

4 Testing multiple hypotheses: the F-test

5 Sample output for multiple hypothesis tests

6 Multiple regression using an APT-style model

7 Data mining and the true size of the test

8 Goodness of fit statistics

9 Hedonic pricing models

10 Tests of non-nested hypotheses

11 Quantile regression

Trang 3

4.1 Generalising the simple model to multiple linear regression

Stock returns might be purported to depend on their sensitivity to unexpected changes in:

• inflation

• the differences in returns on short- and long-dated bonds

• industrial production

• default risks
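Written out, the generalisation adds one slope parameter per explanatory variable. A sketch in standard notation, assuming T observations indexed by t and k parameters including the intercept (the symbols are the usual textbook ones, not taken from the slide itself):

```latex
y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt} + u_t,
\qquad t = 1, 2, \dots, T
```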

Trang 4

4.2 The constant term

k is defined as the number of ‘explanatory variables’ or ‘regressors’ including the constant term, which equals the number of parameters estimated in the regression equation.

Trang 5

4.3 How are the parameters (the elements of the β vector) calculated in the generalised case?

● SRF (Sample Regression Function): $y = X\hat{\beta} + \hat{u}$, where:

• $y$ is T×1, $X$ is T×k, $\hat{\beta}$ is k×1 and $\hat{u}$ is T×1

Trang 6

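Ordinary least squares (OLS)

● $\hat{\beta} = (X'X)^{-1}X'y$

● $s^2 = \hat{u}'\hat{u}/(T-k)$ ($s^2$: an estimate of the variance of the errors, $\sigma^2$)

● $\mathrm{var}(\hat{\beta}) = s^2 (X'X)^{-1}$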
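The matrix formulas for OLS, namely β̂ = (X'X)⁻¹X'y, s² = û'û/(T − k) and var(β̂) = s²(X'X)⁻¹, can be checked numerically. The following is a minimal sketch in Python with NumPy on simulated data; the function name `ols`, the sample size and the "true" coefficients are illustrative assumptions, not part of the slides.

```python
import numpy as np

def ols(X, y):
    """Return (beta_hat, s2, var_beta, resid) for the model y = X beta + u."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y          # (X'X)^{-1} X'y
    resid = y - X @ beta_hat              # u_hat
    s2 = resid @ resid / (T - k)          # estimate of the error variance
    var_beta = s2 * XtX_inv               # variance-covariance matrix of beta_hat
    return beta_hat, s2, var_beta, resid

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])  # constant + 2 regressors
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=T)

beta_hat, s2, var_beta, resid = ols(X, y)
std_err = np.sqrt(np.diag(var_beta))      # coefficient standard errors
print(beta_hat, std_err)
```

The standard errors are the square roots of the diagonal elements of the variance-covariance matrix, which is how they would be read off in a worked example.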

Trang 7

Trang 8

Example

Trang 9


Example

Trang 11


Summary

Trang 12

● Original model / unrestricted model (UnRestricted)

 Estimating by OLS gives the residual sum of squares URSS, with degrees of freedom df = T – k

● Restricted model (the reduced model, which drops m regression coefficients)
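The two residual sums of squares feed the F-test statistic, F = [(RRSS − URSS)/URSS] × (T − k)/m, where RRSS is the residual sum of squares of the restricted model and m is the number of restrictions. A minimal sketch on simulated data follows; the variable names and the data-generating process are assumptions made for illustration.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS regression of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    return float(e @ e)

rng = np.random.default_rng(1)
T = 120
x2, x3, x4 = rng.normal(size=(3, T))
y = 1.0 + 0.8 * x2 + rng.normal(size=T)   # x3 and x4 are irrelevant by construction

X_u = np.column_stack([np.ones(T), x2, x3, x4])  # unrestricted model, k = 4
X_r = np.column_stack([np.ones(T), x2])          # restricted: beta3 = beta4 = 0
k, m = X_u.shape[1], 2                           # m = number of restrictions

urss, rrss = rss(X_u, y), rss(X_r, y)
f_stat = (rrss - urss) / urss * (T - k) / m      # compare with F(m, T - k)
print(urss, rrss, f_stat)
```

The statistic would be compared with a critical value from the F(m, T − k) distribution; RRSS can never be smaller than URSS, so the statistic is non-negative.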

Trang 13

Example of a restricted model (Restricted)

Trang 15

View/Coefficient Diagnostics/Wald Test – Coefficient Restrictions

Trang 17

4.6 Multiple regression using an APT-style model

• Can the monthly returns on Microsoft stock be explained by reference to unexpected changes in a set of macroeconomic and financial variables?

=> Arbitrage pricing theory (APT)

Trang 18

Steps to estimate the regression model

• Step 1: Open a new Eviews workfile

• Step 2: Import the data

• Step 3: Generate variables:

The APT posits that the stock return can be explained by reference to the unexpected changes in the macroeconomic variables rather than their levels

Unexpected value = Actual value – expected value

Trang 19

Generate variables

• Genr

Dspread = baa_aaa_spread - baa_aaa_spread(-1)

Dcredit = consumer_credit - consumer_credit(-1)

Rmsoft = 100*dlog(microsoft)

Rsandp = 100*dlog(sandp)

Dmoney = m1money_supply - m1money_supply(-1)

Inflation = 100*dlog(cpi)

Term = ustb10y - ustb3m

Dinflation = inflation - inflation(-1)

Mustb3m = ustb3m/12

Rterm = term - term(-1)

Ermsoft = rmsoft - mustb3m

Ersandp = rsandp - mustb3m
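For readers without EViews, the two kinds of transformation used above (a first difference, and 100 times the change in logs) can be sketched in Python with NumPy. The helper names `diff` and `dlog_pct` and the three-observation series are made up for illustration; they are not the data set used in the slides.

```python
import numpy as np

def diff(x):
    """First difference x_t - x_{t-1}; the first observation is lost (NaN)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    out[1:] = x[1:] - x[:-1]
    return out

def dlog_pct(x):
    """100 * (log x_t - log x_{t-1}): continuously compounded percentage change."""
    return 100.0 * diff(np.log(np.asarray(x, dtype=float)))

# Made-up series, for illustration only
microsoft = np.array([100.0, 110.0, 121.0])   # price level
ustb3m = np.array([1.2, 1.2, 1.2])            # annualised 3-month yield, in percent

rmsoft = dlog_pct(microsoft)     # monthly return, like 100*dlog(microsoft)
mustb3m = ustb3m / 12            # monthly risk-free rate, like ustb3m/12
ermsoft = rmsoft - mustb3m       # excess return
print(ermsoft)
```

As with the `(-1)` lag operator, every differenced series loses its first observation, so the regression sample starts one period later than the raw data.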

Trang 20

• Step 4: Object/New Object/ Equation msoftreg: ERMSOFT C ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD RTERM

• Method: Least Squares

Trang 22

• View/Coefficient Diagnostics/Wald Test – Coefficient Restrictions

• C(3) = 0, C(4) = 0, C(5) = 0, C(6) = 0, C(7) = 0

Trang 23

Null Hypothesis: C(3)=0, C(4)=0, C(5)=0, C(6)=0, C(7)=0

Null Hypothesis Summary:

Normalized Restriction (= 0) | Value | Std. Err.

Trang 24

Stepwise regression

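• Stepwise regression is an automatic variable selection procedure which chooses the jointly most ‘important’ explanatory variables from a set of candidate variables

• The simplest is the uni-directional forwards method

• It starts with no variables in the regression, first adds the candidate with the lowest p-value, then the candidate with the next lowest p-value given those already included, and stops once the lowest remaining p-value exceeds a chosen threshold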
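The forwards method can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the EViews STEPLS algorithm: at each step all candidates share the same degrees of freedom, so picking the largest absolute t-ratio is equivalent to picking the lowest p-value, and the stopping rule uses |t| < 1.28 as a large-sample stand-in for a two-sided p-value above 0.2. The function names and the simulated data are hypothetical.

```python
import numpy as np

def t_ratios(X, y):
    """OLS t-ratios for every column of X."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    s2 = e @ e / (T - k)
    return beta / np.sqrt(s2 * np.diag(XtX_inv))

def forward_stepwise(y, always, candidates, t_crit=1.28):
    """always: T x a matrix of always-included regressors;
    candidates: dict mapping a name to a length-T series."""
    chosen, remaining, X = [], dict(candidates), always
    while remaining:
        # |t| on each candidate if it were added to the current model
        scores = {name: abs(t_ratios(np.column_stack([X, x]), y)[-1])
                  for name, x in remaining.items()}
        best = max(scores, key=scores.get)
        if scores[best] < t_crit:        # nothing left that passes the threshold
            break
        chosen.append(best)
        X = np.column_stack([X, remaining.pop(best)])
    return chosen

# Simulated example: "a" and "b" matter, "c" is irrelevant by construction
rng = np.random.default_rng(2)
T = 300
cands = {name: rng.normal(size=T) for name in ("a", "b", "c")}
y = 1.0 + 2.0 * cands["a"] + 0.5 * cands["b"] + rng.normal(size=T)
chosen = forward_stepwise(y, np.ones((T, 1)), cands)
print(chosen)
```

The constant is passed in as an always-included regressor, mirroring the way the intercept is kept in the model regardless of the search.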

Trang 25

• Object/New Object

• Equation: Msoftstepwise

• Method: STEPLS - Stepwise Least Squares

• Dependent variable followed by the always-included regressors: ERMSOFT C

• List of search regressors: ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD RTERM

• Option: Forward, p-value: 0.2

Trang 27

Stepwise procedures have been strongly criticised by statistical purists. At the most basic level, they are sometimes argued to be no better than automated procedures for data mining, in particular if the list of potential candidate variables is long and results from a ‘fishing trip’ rather than a strong prior financial theory.

Trang 28

Sample sizes and asymptotic theory

• A question that is often asked by those new to econometrics is ‘what is an appropriate sample size for model estimation?’

- Most testing procedures in econometrics rely on asymptotic theory. The results in theory hold only if there are an infinite number of observations

Trang 29

4.7 Data mining and the true size of the test

• Test statistics are assumed to follow a random distribution

=> they will take on extreme values that fall in the rejection region some of the time by chance alone

=> the possibility of rejecting a correct null hypothesis

Trang 30

• If enough explanatory variables are employed in a regression, often one or more will be significant by chance alone

• If an α% size of test is used, on average one in every (100/α) regressions will have a significant slope coefficient by chance alone

Trang 31

• Trying many variables in a regression without

basing the selection of the candidate variables

on a financial or economic theory is known as

‘data mining’ or ‘data snooping’.

=> The true significance level will be considerably greater than the nominal significance level assumed
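A small simulation can illustrate why the true size exceeds the nominal one. In the sketch below, y is pure noise and each of 10 candidate regressors is also pure noise, so every null hypothesis is true. The sample size, number of candidates, number of replications and the use of 1.96 as an approximate 5% two-sided critical value are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_candidates, n_reps, t_crit = 200, 10, 1000, 1.96

def slope_t(x, y):
    """t-ratio on the slope in a regression of y on a constant and x."""
    xd, yd = x - x.mean(), y - y.mean()
    b = (xd @ yd) / (xd @ xd)
    e = yd - b * xd
    s2 = e @ e / (len(y) - 2)
    return b / np.sqrt(s2 / (xd @ xd))

single = 0    # rejections when only one pre-specified regressor is tried
any_sig = 0   # rejections when the 'best' of all candidates is reported
for _ in range(n_reps):
    y = rng.normal(size=T)
    ts = [abs(slope_t(rng.normal(size=T), y)) for _ in range(n_candidates)]
    single += int(ts[0] > t_crit)
    any_sig += int(max(ts) > t_crit)

rate_single = single / n_reps
rate_any = any_sig / n_reps
print(rate_single, rate_any)
```

With a single pre-specified regressor the rejection rate should sit near the nominal 5%, while searching over 10 candidates should reject far more often; for independent tests the probability of at least one rejection is 1 − 0.95¹⁰, roughly 40%.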

Trang 32

To avoid data mining:

• ensure that the selection of candidate regressors for inclusion in a model is made on the basis of financial or economic theory

• examine the forecast performance of the model in an ‘out-of-sample’ data set

Trang 33

4.8 Goodness of fit statistics

R2

“How well does the model containing the explanatory variables that was proposed actually explain variations in the dependent variable?”

Trang 34

R2

• Quantities known as goodness of fit statistics

are available to test how well the SRF fits the data – that is, how ‘close’ the fitted regression line is to all of the data points taken together

Trang 35

R2

What measures might make plausible candidates

to be goodness of fit statistics?

• RSS

The value of RSS depends to a great extent on the

scale of the dependent variable

• R2

A scaled version of RSS

Trang 36

R2

• R² is the square of the correlation coefficient between y and ŷ

• that is, the square of the correlation between the values of the dependent variable and the corresponding fitted values from the model

• It must lie between 0 and 1

• If this correlation is high, the model fits the data well, while if the correlation is low (close to zero), the model is not providing a good fit to the data

Trang 37

R2

The TSS can be split into 2 parts:

• the part that has been explained by the model (the

explained sum of squares, ESS)

• the part that the model was not able to explain (the RSS).
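The decomposition TSS = ESS + RSS, and the resulting R² = ESS/TSS = 1 − RSS/TSS, can be verified numerically for any regression that includes a constant. The sketch below uses simulated data; the coefficients and sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 100
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])  # constant + 2 regressors
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=T)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_fit = X @ beta

tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
ess = np.sum((y_fit - y.mean()) ** 2)   # explained sum of squares
rss = np.sum((y - y_fit) ** 2)          # residual sum of squares

r2 = ess / tss
r2_corr = np.corrcoef(y, y_fit)[0, 1] ** 2   # squared correlation of y and fitted y
print(r2, 1 - rss / tss, r2_corr)
```

All three printed numbers should agree, which ties the sum-of-squares definition to the squared-correlation definition of R².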

Trang 38

R2

Trang 39

R2

RSS = TSS, i.e. ESS = 0, so R² = ESS/TSS = 0

• The model has not succeeded in explaining any of the variability of y about its mean value

• This would happen only where the estimated values of all of the slope coefficients are exactly zero

Trang 40

R2

ESS = TSS, i.e. RSS = 0, so R² = ESS/TSS = 1

• The model has explained all of the variability of

y about its mean value

• This would happen only in the case where all of the observation points lie exactly on the fitted line

Trang 41

Trang 43

Problems with R2 as a goodness of fit measure

• R² is defined in terms of variation about the mean of y, so that if a model is reparameterised (rearranged) and the dependent variable changes, R² will change.

• R² never falls if more regressors are added to the regression

• R² quite often takes values of 0.9 or higher for time series regressions, and hence it is not good at discriminating between models, since a wide array of models will frequently have broadly similar (and high) values of R²

Trang 44

Adjusted R²

$\bar{R}^2 = 1 - \left[\frac{T-1}{T-k}\,(1 - R^2)\right]$

So if an extra regressor (variable) is added to the model, k increases and, unless R² increases by a more than off-setting amount, $\bar{R}^2$ will actually fall
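The trade-off can be seen with a little arithmetic using the adjusted R² formula, 1 − [(T − 1)/(T − k)](1 − R²). In the sketch below the values T = 50, R² = 0.50 with k = 3 and R² = 0.505 with k = 4 are invented purely to illustrate the point.

```python
def adjusted_r2(r2, T, k):
    """Adjusted R-squared: 1 - (T - 1)/(T - k) * (1 - R^2)."""
    return 1.0 - (T - 1) / (T - k) * (1.0 - r2)

before = adjusted_r2(0.50, T=50, k=3)    # model with 3 parameters
after = adjusted_r2(0.505, T=50, k=4)    # one more regressor, tiny rise in R^2
print(before, after)
```

Here R² rises slightly when the fourth parameter is added, yet adjusted R² falls, because the gain in fit does not offset the loss of a degree of freedom.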

Trang 45

4.9 Hedonic pricing models

• One application of econometric techniques where the coefficients have a particularly intuitively appealing interpretation is in the area of hedonic pricing models

• Hedonic models are often used to produce appraisals or valuations of properties, given their characteristics (e.g. size of dwelling, number of bedrooms, location, number of bathrooms, etc.). In these models, the coefficient estimates represent ‘prices of the characteristics’

Trang 46

• One such application of a hedonic pricing model is given by Des Rosiers and Theriault (1996), who consider the effect of various amenities on rental values for buildings and apartments in five sub-markets in the Quebec area of Canada

• The paper employs 1990 data for the Quebec City region, and there are 13,378 observations

Trang 47

LnAGE: log of the apparent age of the property
NBROOMS: number of bedrooms
AREABYRM: area per room (in square metres)
ELEVATOR: a dummy variable = 1 if the building has an elevator; 0 otherwise
BASEMENT: a dummy variable = 1 if the unit is located in a basement; 0 otherwise
OUTPARK: number of outdoor parking spaces
INDPARK: number of indoor parking spaces
NOLEASE: a dummy variable = 1 if the unit has no lease attached to it; 0 otherwise
LnDISTCBD: log of the distance in kilometres to the central business district (CBD)
SINGLPAR: percentage of single parent families in the area where the building stands
DSHOPCNTR: distance in kilometres to the nearest shopping centre
VACDIFF1: vacancy difference between the building and the census figure

Trang 48

Hedonic model of rental values in Quebec City, 1990.

Dependent variable: Canadian dollars per month

Variable | Coefficient | t-ratio | A priori sign expected

Trang 49

• This list includes several variables that are dummy variables.

• Dummy variables can be used in the context of cross-sectional or time series regressions

• The dummy variables are used in the same way as other explanatory variables, and the coefficients on the dummy variables can be interpreted as the average differences in the values of the dependent variable for each category
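The ‘average difference’ interpretation is easy to confirm in the simplest case of a regression on a constant and a single dummy. The rents and the elevator indicator below are made-up numbers, not data from the Quebec study.

```python
import numpy as np

# Made-up monthly rents and an elevator dummy (1 = building has an elevator)
rent = np.array([500.0, 520.0, 480.0, 600.0, 640.0, 620.0])
elevator = np.array([0, 0, 0, 1, 1, 1])

X = np.column_stack([np.ones(len(rent)), elevator])
const, coef = np.linalg.lstsq(X, rent, rcond=None)[0]

mean_without = rent[elevator == 0].mean()
mean_with = rent[elevator == 1].mean()
print(const, coef, mean_without, mean_with - mean_without)
```

The intercept reproduces the mean rent of units without an elevator and the dummy coefficient reproduces the difference in mean rents between the two groups. With further regressors in the model the coefficient becomes the average difference holding those other characteristics fixed.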

Trang 50

The relationship between the regression F-statistic and R²

• Recall that the regression F-statistic tests the null hypothesis that all of the regression slope parameters are simultaneously zero
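Under that null the restricted model contains only a constant, so RRSS = TSS and there are m = k − 1 restrictions. Substituting into the F-test formula gives F = [R²/(1 − R²)] × (T − k)/(k − 1). The sketch below checks on simulated data (arbitrary, assumed values) that the two routes give the same number.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 80
X = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])  # k = 4 parameters
y = X @ np.array([1.0, 0.4, 0.0, -0.3]) + rng.normal(size=T)
k = X.shape[1]

beta = np.linalg.lstsq(X, y, rcond=None)[0]
rss = np.sum((y - X @ beta) ** 2)        # URSS of the full model
tss = np.sum((y - y.mean()) ** 2)        # RRSS of the constant-only model
r2 = 1 - rss / tss

f_from_rss = (tss - rss) / rss * (T - k) / (k - 1)
f_from_r2 = r2 / (1 - r2) * (T - k) / (k - 1)
print(f_from_rss, f_from_r2)
```

The identity shows that, for a given T and k, a higher R² always corresponds to a higher regression F-statistic.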

Trang 51

• One limitation of such studies that is worth

mentioning at this stage is their assumption that the implicit price of each characteristic is

identical across types of property, and that

these characteristics do not become saturated.

Trang 52

• Suppose that there are two researchers

working independently, each with a separate

financial theory for explaining the variation in

some variable, yt
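The equation numbers cited on the following slide refer to the two rival models and the hybrid (encompassing) model that nests them. The slide images carrying these equations were lost in extraction; the block below is a reconstruction that assumes the standard textbook form, chosen to be consistent with the references to γ2, γ3, x2 and x3 in the text.

```latex
\begin{align}
y_t &= \alpha_1 + \alpha_2 x_{2t} + u_t \tag{4.48} \\
y_t &= \beta_1 + \beta_2 x_{3t} + v_t \tag{4.49} \\
y_t &= \gamma_1 + \gamma_2 x_{2t} + \gamma_3 x_{3t} + w_t \tag{4.50}
\end{align}
```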

Trang 53

4.10 Tests of non-nested hypotheses

Selecting between models

1. γ2 is statistically significant but γ3 is not. In this case, (4.50) collapses to (4.48), and the latter is the preferred model.

2. γ3 is statistically significant but γ2 is not. In this case, (4.50) collapses to (4.49), and the latter is the preferred model.

3. γ2 and γ3 are both statistically significant. This would imply that both x2 and x3 have incremental explanatory power for y, in which case both variables should be retained. Models (4.48) and (4.49) are both ditched and (4.50) is the preferred model.

4. Neither γ2 nor γ3 is statistically significant. In this case, none of the models can be dropped, and some other method for choosing between them must be employed.

Trang 54

4.10 Tests of non-nested hypotheses

• There are several limitations to the use of encompassing regressions to select between non-nested models.

• It could be the case that if they are both included, neither γ2 nor γ3 is statistically significant, while each is significant in its separate regression, (4.48) and (4.49)

Trang 55

4.11 Quantile regression

Background and motivation

• We may think of there being a non-linear (∩-shaped) relationship between regulation and GDP growth

• Estimating a standard linear regression model may lead to seriously misleading estimates: it will ‘average’ the positive and negative effects from very low and very high regulation

Trang 56

• Quantile regressions, developed by Koenker

and Bassett (1978), represent a more natural and flexible way to capture the complexities inherent in the relationship by estimating models for the conditional quantile functions

Trang 57

• Quantile regressions can be conducted in both time series and cross-sectional contexts

• It is usually assumed that the dependent variable (often called the response variable in the literature on quantile regressions) is independently distributed and homoscedastic

• Quantile regression is effectively a non-parametric technique, since no distributional assumptions are required to estimate the parameters

Trang 58

• Quantiles, denoted τ, refer to the position where an observation falls within an ordered series for y

• The τ-th quantile, Q(τ), of a random variable y with cumulative distribution F(y) is

$Q(\tau) = \inf\{\, y : F(y) \ge \tau \,\}$

where inf refers to the infimum, or the ‘greatest lower bound’, which is the smallest value of y satisfying the inequality

• quantiles must lie between 0 and 1
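For a sample, F(y) can be replaced by the empirical distribution function, and the definition Q(τ) = inf{y : F(y) ≥ τ} then picks out the smallest observation whose cumulative share reaches τ. The sketch below is one way to code that definition; the function name and the sample values are illustrative assumptions, and it is restricted to 0 < τ ≤ 1.

```python
import math
import numpy as np

def empirical_quantile(y, tau):
    """Smallest sample value q with (share of observations <= q) >= tau."""
    if not 0 < tau <= 1:
        raise ValueError("tau must satisfy 0 < tau <= 1")
    ys = np.sort(np.asarray(y, dtype=float))
    # round() guards against floating-point noise such as 0.7 * 10 = 7.000000000000001
    rank = math.ceil(round(tau * len(ys), 9))
    return ys[rank - 1]

sample = [3, 1, 4, 1, 5, 9, 2, 6]        # sorted: 1 1 2 3 4 5 6 9
q25 = empirical_quantile(sample, 0.25)
q50 = empirical_quantile(sample, 0.50)
q90 = empirical_quantile(sample, 0.90)
print(q25, q50, q90)
```

For τ = 0.5 the function returns 3, since four of the eight observations (a share of 0.5) are less than or equal to 3 and no smaller value reaches that share.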

Trang 59

Estimation of quantile functions
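Quantile regression, following Koenker and Bassett (1978), estimates the parameters by minimising a sum of asymmetrically weighted absolute residuals, Σ ρτ(yᵢ − xᵢ'β), where ρτ(u) = u(τ − 1[u < 0]) is often called the check function. The sketch below illustrates only the simplest case, a model with a constant and no regressors, where the minimiser is the sample τ-quantile. The function names and data are hypothetical, and the search over sample points is a shortcut for illustration rather than the linear-programming methods used in practice.

```python
import numpy as np

def check_loss(u, tau):
    """Sum of rho_tau(u) = u * (tau - 1[u < 0]) over all residuals u."""
    u = np.asarray(u, dtype=float)
    return float(np.sum(u * (tau - (u < 0).astype(float))))

def quantile_by_check_loss(y, tau):
    """Constant-only quantile regression: search the sample points for the
    value q that minimises the check loss of the residuals y - q."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - q, tau) for q in y]
    return y[int(np.argmin(losses))]

sample = [3, 1, 4, 1, 5, 9, 2]           # odd sample size, so the median is unique
median = quantile_by_check_loss(sample, 0.5)
upper = quantile_by_check_loss(sample, 0.9)
print(median, upper)
```

With τ = 0.5 positive and negative residuals are weighted equally and the minimiser is the median; with τ = 0.9 positive residuals are penalised nine times as heavily as negative ones, which pushes the fitted value towards the top of the distribution.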

Trang 60

An application of quantile regression: evaluating fund performance

• A study by Bassett and Chen (2001) performs a style attribution analysis for a mutual fund and, for comparison, the S&P500 index

• Examine how a portfolio’s exposure to various styles varies with performance

Trang 61

An application of quantile regression: evaluating fund performance

• Bassett and Chen (2001) conduct a style

analysis in this spirit by regressing the returns

of a fund on the returns of a large growth

portfolio, the returns of a large value portfolio, the returns of a small growth portfolio, and the returns of a small value portfolio
