ADVANCED ECONOMETRIC MODELS with MATLAB
Stepwise Regression to Select Appropriate Models
LinearModel.stepwise creates a linear model and automatically adds to or trims the model. To create a small model, start from a constant model. To create a large model, start with a model containing many terms. A large model usually has lower error as measured by the fit to the original data, but might not have any advantage in predicting new data.
LinearModel.stepwise can use all the name-value options from LinearModel.fit, with additional options relating to the starting and bounding models. In particular:
• For a small model, start with the default lower bounding model: 'constant' (a model that has no predictor terms).
• The default upper bounding model has linear terms and interaction terms (products of pairs of predictors). For an upper bounding model that also includes squared terms, set the Upper name-value pair to 'quadratic'.
Compare large and small stepwise models
This example shows how to compare the models that LinearModel.stepwise returns starting from a constant model and starting from a full interaction model.
Load the carbig data and create a dataset array from some of the data.
load carbig
ds = dataset(Acceleration,Displacement,Horsepower,Weight,MPG);
Create a mileage model stepwise starting from the constant model.
mdl1 = LinearModel.stepwise(ds,'constant','ResponseVar','MPG')
1 Adding Weight, FStat = 888.8507, pValue = 2.9728e-103
2 Adding Horsepower, FStat = 3.8217, pValue = 0.00049608
3 Adding Horsepower:Weight, FStat = 64.8709, pValue = 9.93362e-15
mdl1 =
Linear regression model: MPG ~ 1 + Horsepower*Weight
Estimated Coefficients:
                         Estimate      SE            tStat      pValue
    (Intercept)            63.558        2.3429      27.127     1.2343e-91
    Horsepower           -0.25084      0.027279     -9.1952     2.3226e-18
    Weight              -0.010772    0.00077381     -13.921     5.1372e-36
    Horsepower:Weight   5.3554e-05    6.6491e-06      8.0542     9.9336e-15
Number of observations: 392, Error degrees of freedom: 388
Root Mean Squared Error: 3.93
R-squared: 0.748, Adjusted R-Squared 0.746
F-statistic vs constant model: 385, p-value = 7.26e-116
Create a mileage model stepwise starting from the full interaction model.
mdl2 = LinearModel.stepwise(ds,'interactions','ResponseVar','MPG')
1 Removing Acceleration:Displacement, FStat = 0.024186, pValue = 0.8765
2 Removing Displacement:Weight, FStat = 0.33103, pValue = 0.56539
3 Removing Acceleration:Horsepower, FStat = 1.7334, pValue = 0.18876
4 Removing Acceleration:Weight, FStat = 0.93269, pValue = 0.33477
5 Removing Horsepower:Weight, FStat = 0.64486, pValue = 0.42245
mdl2 =
Linear regression model: MPG ~ 1 + Acceleration + Weight + Displacement*Horsepower
Estimated Coefficients:
                               Estimate      SE           tStat      pValue
    (Intercept)                  61.285       2.8052      21.847     1.8593e-69
    Acceleration               -0.34401      0.11862        -2.9      0.0039445
    Displacement              -0.081198     0.010071     -8.0623     9.5014e-15
    Horsepower                 -0.24313     0.026068     -9.3265     8.6556e-19
    Weight                   -0.0014367   0.00084041     -1.7095       0.088166
    Displacement:Horsepower  0.00054236   5.7987e-05      9.3531     7.0527e-19
Number of observations: 392, Error degrees of freedom: 386
Root Mean Squared Error: 3.84
R-squared: 0.761, Adjusted R-Squared 0.758
F-statistic vs constant model: 246, p-value = 1.32e-117

Notice that:
• mdl1 has four coefficients (the Estimate column), and mdl2 has six coefficients.
• The adjusted R-squared of mdl1 is 0.746, which is slightly less (worse) than that of mdl2, 0.758.
Create a mileage model stepwise with a full quadratic model as the upper bound, starting from the full quadratic model:
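The call itself is missing from the excerpt; based on the Upper name-value pair described above, it presumably had the form (the variable name mdl3 is illustrative):

mdl3 = LinearModel.stepwise(ds,'quadratic','ResponseVar','MPG','Upper','quadratic')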
The models have similar residuals. It is not clear which fits the data better. Interestingly, the more complex models have larger maximum deviations of the residuals:
Rrange1 = [min(mdl1.Residuals.Raw),max(mdl1.Residuals.Raw)];
Rrange2 = [min(mdl2.Residuals.Raw),max(mdl2.Residuals.Raw)];
“What Is Robust Regression?” on page 9-116
“Robust Regression versus Standard Least-Squares Fit” on page 9-116
What Is Robust Regression?
The models described in “What Are Linear Regression Models?” on page 9-7 are based on certain assumptions, such as a normal distribution of errors in the observed responses. If the distribution of errors is asymmetric or prone to outliers, model assumptions are invalidated, and parameter estimates, confidence intervals, and other computed statistics become unreliable. Use LinearModel.fit with the RobustOpts name-value pair to create a model that is not much affected by outliers. The robust fitting method is less sensitive than ordinary least squares to large changes in small parts of the data.
Robust regression works by assigning a weight to each data point. Weighting is done automatically and iteratively using a process called iteratively reweighted least squares. In the first iteration, each point is assigned equal weight and model coefficients are estimated using ordinary least squares. At subsequent iterations, weights are recomputed so that points farther from model predictions in the previous iteration are given lower weight. Model coefficients are then recomputed using weighted least squares. The process continues until the values of the coefficient estimates converge within a specified tolerance.
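To make the process concrete, here is a minimal sketch of iteratively reweighted least squares with a bisquare weight function, assuming a design matrix X (including a column of ones) and a response vector y. This is only an illustration, not the implementation used by LinearModel.fit:

b = X \ y;                                % iteration 1: equal weights (ordinary least squares)
for iter = 1:50
    r = y - X*b;                          % residuals from the previous iteration
    s = mad(r,1)/0.6745;                  % robust estimate of the residual scale
    u = r./(4.685*s);                     % scaled residuals (bisquare tuning constant)
    w = (abs(u) < 1).*(1 - u.^2).^2;      % points far from the fit receive low weight
    bnew = lscov(X,y,w);                  % weighted least-squares refit
    if norm(bnew - b) < 1e-8, break, end  % stop when the coefficient estimates converge
    b = bnew;
end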
Robust Regression versus Standard Least-Squares Fit
This example shows how to use robust regression. It compares the results of a robust fit to a standard least-squares fit.
Step 1 Prepare data.
Load the moore data. The data is in the first five columns, and the response in the sixth.

load moore
X = moore(:,1:5);
y = moore(:,6);
Step 2 Fit robust and nonrobust models.
Fit two linear models to the data, one using robust fitting, one not.

mdl = LinearModel.fit(X,y);                     % not robust
mdlr = LinearModel.fit(X,y,'RobustOpts','on');  % robust
Step 3 Examine model residuals.
Examine the residuals of the two models.

subplot(1,2,1); plotResiduals(mdl,'probability')
subplot(1,2,2); plotResiduals(mdlr,'probability')

The residuals from the robust fit (right half of the plot) are nearly all closer to the straight line, except for the one obvious outlier.
Step 4 Remove the outlier from the standard model.
Find the index of the outlier. Examine the weight of the outlier in the robust fit.

[~,outlier] = max(mdlr.Residuals.Raw);
mdlr.Robust.Weights(outlier)
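The refit itself is not shown in the excerpt; one way to exclude the outlier and refit the standard model uses the Exclude name-value pair (the variable name mdl2 here is illustrative):

mdl2 = LinearModel.fit(X,y,'Exclude',outlier);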
“Introduction to Ridge Regression” on page 9-119 “Ridge Regression” on page 9-119
Introduction to Ridge Regression
Coefficient estimates for the models described in “Linear Regression” on page 9-11 rely on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximate linear dependence, the matrix (X'X)^(-1) becomes close to singular. As a result, the least-squares estimate

    b = (X'X)^(-1) X'y

becomes highly sensitive to random errors in the observed response y, producing a large variance. Ridge regression addresses the problem by estimating the coefficients as

    b = (X'X + kI)^(-1) X'y

where k is the ridge parameter and I is the identity matrix. Small positive values of k improve the conditioning of the problem and reduce the variance of the estimates. While biased, the reduced variance of ridge estimates often results in a smaller mean square error when compared to least-squares estimates. The Statistics Toolbox function ridge carries out ridge regression.
xlabel('x2'); ylabel('x3'); grid on; axis square

Note the correlation between x1 and the other two predictor variables.
Use ridge and x2fx to compute coefficient estimates for a multilinear model with interaction terms, for a range of ridge parameters:
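The fitting code is missing from the excerpt; a sketch using the acetylene data (assumed, since the predictors are named x1, x2, and x3 above) might look like this:

load acetylene                  % assumed data set with predictors x1, x2, x3 and response y
X = [x1 x2 x3];
D = x2fx(X,'interaction');      % design matrix with linear and interaction terms
D(:,1) = [];                    % drop the constant column; ridge adds its own
k = 0:1e-5:5e-3;                % range of ridge parameters
b = ridge(y,D,k);               % one column of coefficient estimates per value of k
figure
plot(k,b,'LineWidth',2)
xlabel('Ridge parameter'); ylabel('Standardized coefficient');
title('Ridge Trace')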
The estimates stabilize to the right of the plot. Note that the coefficient of the x2x3 interaction term changes sign at a value of the ridge parameter of approximately 5×10^-4.
Lasso and Elastic Net
In this section
“What Are Lasso and Elastic Net?” on page 9-123 “Lasso Regularization” on page 9-123
“Lasso and Elastic Net with Cross Validation” on page 9-126 “Wide Data via Lasso and Parallel
Computing” on page 9-129 “Lasso and Elastic Net Details” on page 9-134 “References” on page 9-136
What Are Lasso and Elastic Net?
Lasso is a regularization technique. Use lasso to:
• Reduce the number of predictors in a regression model.
• Identify important predictors.
• Select among redundant predictors.
• Produce shrinkage estimates with potentially lower predictive errors than ordinary least squares.
Elastic net is a related technique. Use elastic net when you have several highly correlated variables. lasso provides elastic net regularization when you set the Alpha name-value pair to a number strictly between 0 and 1.
See “Lasso and Elastic Net Details” on page 9-134.
For lasso regularization of regression ensembles, see regularize.
Lasso Regularization
To see how lasso identifies and discards unnecessary predictors:
1 Generate 200 samples of five-dimensional artificial data X from exponential distributions with various means:

rng(3,'twister') % for reproducibility
X = zeros(200,5);
for ii = 1:5
    X(:,ii) = exprnd(ii,200,1);
end
2 Generate response data Y = X*r + eps, where r has just two nonzero components, and the noise eps is normal with standard deviation 0.1:
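The data-generation and fitting code is missing from the excerpt. A sketch consistent with the output shown later (rhat is close to [0; 2; 0; -3; 0], and the later steps reference b, fitinfo, and 10-fold cross validation) might be:

r = [0;2;0;-3;0];               % two nonzero components, consistent with rhat below
Y = X*r + randn(200,1)*0.1;     % noise with standard deviation 0.1

[b,fitinfo] = lasso(X,Y,'CV',10);                          % 10-fold cross-validated lasso fit
lassoPlot(b,fitinfo,'PlotType','Lambda','XScale','log');   % trace plot described below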
The plot shows the nonzero coefficients in the regression for various values of the Lambda regularization parameter. Larger values of Lambda appear on the left side of the graph, meaning more regularization, resulting in fewer nonzero regression coefficients.
The dashed vertical lines represent the Lambda value with minimal mean squared error (on the right), and the Lambda value with minimal mean squared error plus one standard deviation. This latter value is a recommended setting for Lambda. These lines appear only when you perform cross validation. Cross validate by setting the 'CV' name-value pair. This example uses 10-fold cross validation.
The upper part of the plot shows the degrees of freedom (df), meaning the number of nonzero coefficients in the regression, as a function of Lambda. On the left, the large value of Lambda causes all but one coefficient to be 0. On the right all five coefficients are nonzero, though the plot shows only two clearly. The other three coefficients are so small that you cannot visually distinguish them from 0. For small values of Lambda (toward the right in the plot), the coefficient values are close to the least-squares estimate. See step 5 on page 9-126.
4 Find the Lambda value of the minimal cross-validated mean squared error plus one standard deviation. Examine the MSE and coefficients of the fit at that Lambda:

lam = fitinfo.Index1SE;
fitinfo.MSE(lam)
lasso did a good job finding the coefficient vector r.
5 For comparison, find the least-squares estimate of r:
rhat = X\Y
rhat =
   -0.0038
    1.9952
    0.0014
   -2.9993
    0.0031
The estimate b(:,lam) has slightly more mean squared error than the mean squared error of rhat:
res = X*rhat - Y;      % calculate residuals
MSEmin = res'*res/200  % b(:,lam) value is 0.1398
MSEmin =
0.0088
But b(:,lam) has only two nonzero components, and therefore can provide better predictive estimates on new data.
Lasso and Elastic Net with Cross Validation
Consider predicting the mileage (MPG) of a car based on its weight, displacement, horsepower, and acceleration. The carbig data contains these measurements. The data seem likely to be correlated, making elastic net an attractive choice.
1 Load the data:
load carbig
2 Extract the continuous (noncategorical) predictors (lasso does not handle categorical predictors):
X = [Acceleration Displacement Horsepower Weight];
3 Perform a lasso fit with 10-fold cross validation:
[b fitinfo] = lasso(X,MPG,'CV',10);
4 Plot the result:
lassoPlot(b,fitinfo,'PlotType','Lambda','XScale','log');
5 Calculate the correlation of the predictors:
% Eliminate NaNs so corr runs
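The code for this step, and for step 6, is missing from the excerpt. A sketch that removes rows with NaNs, computes the correlations, and then performs the elastic net fit assumed by step 7 (the Alpha value of 0.5 is an assumption) might be:

nonan = ~any(isnan([X MPG]),2);     % keep only rows with no missing values
Xnonan = X(nonan,:);
corr(Xnonan)                        % correlation matrix of the predictors

% Step 6 (elastic net fit producing ba and fitinfoa used in step 7):
[ba,fitinfoa] = lasso(X,MPG,'CV',10,'Alpha',0.5);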
7 Plot the result. Name each predictor so you can tell which curve is which:
pnames = {'Acceleration','Displacement', 'Horsepower','Weight'};
lassoPlot(ba,fitinfoa,'PlotType','Lambda', 'XScale','log','PredictorNames',pnames);
When you activate the data cursor and click the plot, you see the name of the predictor, the coefficient, the value of Lambda, and the index of that point, meaning the column in b associated with that fit.
Here, the elastic net and lasso results are not very similar. Also, the elastic net plot reflects a notable qualitative property of the elastic net technique. The elastic net retains three nonzero coefficients as Lambda increases (toward the left of the plot), and these three coefficients reach 0 at about the same Lambda value. In contrast, the lasso plot shows two of the three coefficients becoming 0 at the same value of Lambda, while another coefficient remains nonzero for higher values of Lambda.
This behavior exemplifies a general pattern. In general, elastic net tends to retain or drop groups of highly correlated predictors as Lambda increases. In contrast, lasso tends to drop smaller groups, or even individual predictors.
Wide Data via Lasso and Parallel Computing
Lasso and elastic net are especially well suited to wide data, meaning data with more predictors than observations. Obviously, there are redundant predictors in this type of data. Use lasso along with cross validation to identify important predictors.
Cross validation can be slow. If you have a Parallel Computing Toolbox license, speed the computation using parallel computing.
1 Load the spectra data:

load spectra
Description
Description =
== Spectral and octane data of gasoline ==
NIR spectra and octane numbers of 60 gasoline samples
NIR:      NIR spectra, measured in 2 nm intervals from 900 nm to 1700 nm
octane:   octane numbers
spectra:  a dataset array containing variables for NIR and octane
Reference: Kalivas, John H., "Two Data Sets of Near Infrared Spectra," Chemometrics and Intelligent Laboratory Systems, v.37 (1997), pp. 255-259.
2 Compute the default lasso fit:
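The fitting code for steps 2 through 4 is missing from the excerpt; based on the cross-validated call shown later with parallel options, the serial version was presumably:

[b fitinfo] = lasso(NIR,octane);          % default lasso fit

% Steps 3-4 (assumed): refit with 10-fold cross validation and time it
tic
[b fitinfo] = lasso(NIR,octane,'CV',10);
toc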
Elapsed time is 226.876926 seconds
5 Plot the result:
lassoPlot(b,fitinfo,'PlotType','Lambda','XScale','log');
You can see the suggested value of Lambda is over 1e-2, and the Lambda with minimal MSE is under 1e-2. These values are in the fitinfo structure:
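The code for this step is missing; a plausible reconstruction, given that lambdaindex is used just below and the reported MSE is 0.0532, is:

lambdaindex = fitinfo.Index1SE;    % index of the suggested (one-standard-deviation) Lambda
fitinfo.MSE(lambdaindex)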
ans =
0.0532
fitinfo.DF(lambdaindex)
ans = 11
The fit uses just 11 of the 401 predictors, and achieves a cross-validated MSE of 0.0532.
7 Examine the plot of cross-validated MSE:
lassoPlot(b,fitinfo,'PlotType','CV');
% Use a log scale for MSE to see small MSE values better
set(gca,'YScale','log');
As Lambda increases (toward the left), MSE increases rapidly. The coefficients are reduced too much and they do not adequately fit the responses.
As Lambda decreases, the models are larger (have more nonzero coefficients). The increasing MSE suggests that the models are overfitted.
The default set of Lambda values does not include values small enough to include all predictors. In this case, there does not appear to be a reason to look at smaller values. However, if you want smaller values than the default, use the LambdaRatio parameter, or supply a sequence of Lambda values using the Lambda parameter. For details, see the lasso reference page.
8 To compute the cross-validated lasso estimate faster, use parallel computing (available with a Parallel Computing Toolbox license):
matlabpool open
Starting matlabpool using the 'local' configuration connected to 4 labs
opts = statset('UseParallel',true);
tic;
[b fitinfo] = lasso(NIR,octane,'CV',10,'Options',opts);
toc
Elapsed time is 107.539719 seconds
Computing in parallel is more than twice as fast on this problem using a quad-core processor.
Lasso and Elastic Net Details
Overview of Lasso and Elastic Net
Lasso is a regularization technique for performing linear regression. Lasso includes a penalty term that constrains the size of the estimated coefficients. Therefore, it resembles ridge regression. Lasso is a shrinkage estimator: it generates coefficient estimates that are biased to be small. Nevertheless, a lasso estimator can have smaller mean squared error than an ordinary least-squares estimator when you apply it to new data.
Unlike ridge regression, as the penalty term increases, lasso sets more coefficients to zero. This means that the lasso estimator is a smaller model, with fewer predictors. As such, lasso is an alternative to stepwise regression and other model selection and dimensionality reduction techniques.
Elastic net is a related technique. Elastic net is a hybrid of ridge regression and lasso regularization. Like lasso, elastic net can generate reduced models by generating zero-valued coefficients. Empirical studies have suggested that the elastic net technique can outperform lasso on data with highly correlated predictors.
Definition of Lasso
The lasso technique solves this regularization problem. For a given value of λ, a nonnegative parameter, lasso solves the problem

    min over β0 and β of   (1/(2N)) * Σ_{i=1..N} (y_i - β0 - x_i'β)^2  +  λ * Σ_{j=1..p} |β_j|

where:
• N is the number of observations.
• y_i is the response at observation i.
• x_i is the data, a vector of p values at observation i.
• λ is a positive regularization parameter corresponding to one value of Lambda.
• The parameters β0 and β are a scalar and a p-vector, respectively.
As λ increases, the number of nonzero components of β decreases.
The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.
Definition of Elastic Net
The elastic net technique solves this regularization problem. For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem

    min over β0 and β of   (1/(2N)) * Σ_{i=1..N} (y_i - β0 - x_i'β)^2  +  λ * P_α(β)

where the penalty term is

    P_α(β) = Σ_{j=1..p} ( ((1 - α)/2) * β_j^2 + α * |β_j| )

Elastic net is the same as lasso when α = 1. As α shrinks toward 0, elastic net approaches ridge regression. For other values of α, the penalty term P_α(β) interpolates between the L1 norm of β and the squared L2 norm of β.
References
[1] Tibshirani, R. "Regression shrinkage and selection via the lasso." Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.
[2] Zou, H., and T. Hastie. "Regularization and variable selection via the elastic net." Journal of the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320, 2005.
[3] Friedman, J., R. Tibshirani, and T. Hastie. "Regularization paths for generalized linear models via coordinate descent." Journal of Statistical Software, Vol. 33, No. 1, 2010. http://www.jstatsoft.org/v33/i01
[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd edition. Springer, New York, 2008.
In this section
“Introduction to Partial Least Squares” on page 9-137 “Partial Least Squares” on page 9-138
Introduction to Partial Least Squares
Partial least-squares (PLS) regression is a technique used with data that contain correlated predictor variables. This technique constructs new predictor variables, known as components, as linear combinations of the original predictor variables. PLS constructs these components while considering the observed response values, leading to a parsimonious model with reliable predictive power.
The technique is something of a cross between multiple linear regression and principal component analysis:
• Multiple linear regression finds a combination of the predictors that best fits a response.
• Principal component analysis finds combinations of the predictors with large variance, reducing correlations. The technique makes no use of response values.
• PLS finds combinations of the predictors that have a large covariance with the response values.
PLS therefore combines information about the variances of both the predictors and the responses, while also considering the correlations among them.
PLS shares characteristics with other regression and feature transformation techniques. It is similar to ridge regression in that it is used in situations with correlated predictors. It is similar to stepwise regression (or more general feature selection techniques) in that it can be used to select a smaller set of model terms. PLS differs from these methods, however, by transforming the original predictor space into the new component space.
The Statistics Toolbox function plsregress carries out PLS regression. For example, consider the data on biochemical oxygen demand in moore.mat, padded with noisy versions of the predictors to introduce correlations:
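The data-preparation and plotting code is missing from the excerpt. A sketch that pads the five moore predictors with noisy copies (giving the ten predictors referenced below) and plots the percent variance explained in y might be (the noise scale is an assumption):

load moore
y = moore(:,6);                    % response
X0 = moore(:,1:5);                 % original predictors
X1 = X0 + 10*randn(size(X0));      % noisy copies to introduce correlations
X = [X0 X1];                       % ten correlated predictors

[XL,yl,XS,YS,beta,PCTVAR] = plsregress(X,y,10);
plot(1:10,cumsum(100*PCTVAR(2,:)),'-o')
xlabel('Number of PLS components'); ylabel('Percent variance explained in y');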
Choosing the number of components in a PLS model is a critical step. The plot gives a rough indication, showing nearly 80% of the variance in y explained by the first component, with as many as five additional components making significant contributions.
The following computes the six-component model:
[XL,yl,XS,YS,beta,PCTVAR,MSE,stats] = plsregress(X,y,6);
yfit = [ones(size(X,1),1) X]*beta;
plot(y,yfit,'o')
The scatter shows a reasonable correlation between fitted and observed responses, and this is confirmed by the R^2 statistic:
TSS = sum((y-mean(y)).^2);
RSS = sum((y-yfit).^2);
Rsquared = 1 - RSS/TSS
Rsquared =
0.8421
A plot of the weights of the ten predictors in each of the six components shows that two of the components (the last two computed) explain the majority of the variance in X:
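The plotting code is missing from the excerpt; a sketch using the predictor weights returned in stats.W might be (the legend labels are assumptions):

plot(1:10,stats.W,'o-')
legend({'c1','c2','c3','c4','c5','c6'},'Location','NW')   % one curve per component
xlabel('Predictor'); ylabel('Weight');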
The calculation of mean-squared errors by plsregress is controlled by optional parameter name/value pairs specifying cross-validation type and the number of Monte Carlo repetitions.
Generalized Linear Models
In this section
“What Are Generalized Linear Models?” on page 9-143 “Prepare Data” on page 9-144
“Choose Generalized Linear Model and Link Function” on page 9-146 “Choose Fitting Method and Model” on page 9-150
“Fit Model to Data” on page 9-155
“Examine Quality and Adjust the Fitted Model” on page 9-156 “Predict or Simulate Responses to New Data” on page 9-168 “Share Fitted Models” on page 9-171
“Generalized Linear Model Workflow” on page 9-173
What Are Generalized Linear Models?
Linear regression models describe a linear relationship between a response and one or more predictive terms. Many times, however, a nonlinear relationship exists. “Nonlinear Regression” on page 9-198 describes general nonlinear models. A special class of nonlinear models, called generalized linear models, uses linear methods.
Recall that linear models have these characteristics:
• At each set of values for the predictors, the response has a normal distribution with mean μ.
• A coefficient vector b defines a linear combination Xb of the predictors X.
• The model is μ = Xb.
In generalized linear models, these characteristics are generalized as follows:
• At each set of values for the predictors, the response has a distribution that can be normal, binomial, Poisson, gamma, or inverse Gaussian, with parameters including a mean μ.
• A coefficient vector b defines a linear combination Xb of the predictors X.
• A link function f defines the model as f(μ) = Xb.
Prepare Data
To begin fitting a regression, put your data into a form that fitting functions expect. All regression techniques begin with input data in an array X and response data in a separate vector y, or input data in a dataset array ds and response data as a column in ds. Each row of the input data represents one observation. Each column represents one predictor (variable).
For a dataset array ds, indicate the response variable with the 'ResponseVar' name-value pair:
mdl = LinearModel.fit(ds,'ResponseVar','BloodPressure');
% or
mdl = GeneralizedLinearModel.fit(ds,'ResponseVar','BloodPressure');
The response variable is the last column by default.
You can use numeric categorical predictors. A categorical predictor is one that takes values from a fixed set of possibilities.
• For a numeric array X, indicate the categorical predictors using the 'Categorical' name-value pair. For example, to indicate that predictors 2 and 3 out of six are categorical:
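The example call itself is missing from the excerpt; it presumably had the form (using the generic X and y from this section):

mdl = LinearModel.fit(X,y,'Categorical',[2,3]);
% or
mdl = GeneralizedLinearModel.fit(X,y,'Categorical',[2,3]);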
• For a dataset array ds, fitting functions treat these data types as categorical:
- Categorical (nominal or ordinal)
- String or character array
If you want to indicate that a numeric predictor is categorical, use the 'Categorical' name-value pair. Represent missing numeric data as NaN. To represent missing data for other data types, see “Missing Group Values” on page 2-53.
• For a 'binomial' model with data matrix X, the response y can be:
- Binary column vector — Each entry represents success (1) or failure (0).
- Two-column matrix of integers — The first column is the number of successes in each observation, the
second column is the number of trials in that observation
• For a 'binomial' model with dataset ds:
- Use the ResponseVar name-value pair to specify the column of ds that gives the number of successes in each observation.
ds = dataset(MPG,Weight);
ds.Year = ordinal(Model_Year);
Numeric Matrix for Input Data, Numeric Vector for Response
For example, to create numeric arrays from workspace variables:
load carsmall
X = [Weight Horsepower Cylinders Model_Year];
y = MPG;
To create numeric arrays from an Excel spreadsheet:
[X Xnames] = xlsread('hospital.xls');
y = X(:,4);     % response y is systolic pressure
X(:,4) = [];    % remove y from the X matrix

Notice that the nonnumeric entries, such as sex, do not appear in X.
Choose Generalized Linear Model and Link Function
Often, your data suggests the distribution type of the generalized linear model.

Response Data Type                                         Suggested Model Distribution Type
Any real number                                            'normal'
Any positive number                                        'gamma' or 'inverse gaussian'
Any nonnegative integer                                    'poisson'
Integer from 0 to n, where n is a fixed positive value     'binomial'
Set the model distribution type with the Distribution name-value pair. After selecting your model type, choose a link function to map between the mean μ and the linear predictor Xb.
Value : Description
'comploglog' : log(-log(1 - μ)) = Xb
'identity', default for the distribution 'normal' : μ = Xb
'log', default for the distribution 'poisson' : log(μ) = Xb
'logit', default for the distribution 'binomial' : log(μ/(1 - μ)) = Xb
'loglog' : log(-log(μ)) = Xb
'probit' : norminv(μ) = Xb, where norminv is the inverse of the standard normal cumulative distribution
p (a number), default for the distribution 'inverse gaussian' (with p = -2) : μ^p = Xb
Cell array of the form {FL FD FI}, containing three function handles, created using @, that define the link (FL), the derivative of the link (FD), and the inverse link (FI); equivalently, a structure of function handles with field Link containing FL, field Derivative containing FD, and field Inverse containing FI : User-specified link function (see “Custom Link Function” on page 9-147)
The nondefault link functions are mainly useful for binomial models. These nondefault link functions are 'comploglog', 'loglog', and 'probit'.
Custom Link Function
The link function defines the relationship f(μ) = Xb between the mean response μ and the linear combination Xb = X*b of the predictors. You can choose one of the built-in link functions or define your own by specifying the link function FL, its derivative FD, and its inverse FI:
• The link function FL calculates f(μ).
• The derivative of the link function FD calculates df(μ)/dμ.
• The inverse function FI calculates g(Xb) = μ.
You can specify a custom link function in either of two equivalent ways. Each way contains function handles that accept a single array of values representing μ or Xb, and returns an array of the same size. The function handles are either in a cell array or a structure:
• Cell array of the form {FL FD FI}, containing three function handles,
created using @, that define the link (FL), the derivative of the link (FD), and the inverse link (FI)
• Structure s with three fields, each containing a function handle created using @:
- s.Link — Link function
- s.Derivative — Derivative of the link function
- s.Inverse — Inverse of the link function
For example, to fit a model using the 'probit' link function:
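The fitting call is missing from the excerpt; given the custom-link version shown just below, which performs identically, the probit fit was presumably:

g = GeneralizedLinearModel.fit(x,[y n],'linear','distr','binomial','link','probit')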
Chi^2-statistic vs constant model: 241, p-value = 2.25e-54
You can perform the same fit using a custom link function that performs identically to the 'probit' link function:

s = {@norminv,@(x)1./normpdf(norminv(x)),@normcdf};
g = GeneralizedLinearModel.fit(x,[y n],'linear','distr','binomial','link',s)

Generalized Linear regression model: link(y) ~ 1 + x1
Chi^2-statistic vs constant model: 241, p-value = 2.25e-54
Choose Fitting Method and Model
There are two ways to create a fitted model.
• Use GeneralizedLinearModel.fit when you have a good idea of your generalized linear model, or when you want to adjust your model later to include or exclude certain terms.
• Use GeneralizedLinearModel.stepwise when you want to fit your model using stepwise regression. GeneralizedLinearModel.stepwise starts from one model, such as a constant, and adds or subtracts terms one at a time, choosing an optimal term each time in a greedy fashion, until it cannot improve further. Use stepwise fitting to find a good model, one that has only relevant terms.
The result depends on the starting model. Usually, starting with a constant model leads to a small model. Starting with more terms can lead to a more complex model, but one that has lower mean squared error.
In either case, provide a model to the fitting function (which is the starting model for GeneralizedLinearModel.stepwise).
Specify a model using one of these methods.
• “Brief String” on page 9-150
• “Terms Matrix” on page 9-151
'constant'       Model contains only a constant (intercept) term.
'linear'         Model contains an intercept and linear terms for each predictor.
'interactions'   Model contains an intercept, linear terms, and all products of pairs of distinct predictors (no squared terms).
'purequadratic'  Model contains an intercept, linear terms, and squared terms.
'quadratic'      Model contains an intercept, linear terms, interactions, and squared terms.
'polyijk'        Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, and so on. Use numerals 0 through 9. For example, 'poly2111' has a constant plus all linear and product terms, and also contains terms with predictor 1 squared.
Terms Matrix
A terms matrix is a T-by-(P + 1) matrix specifying terms in a model, where T is the number of terms, P is the number of predictor variables, and the extra column is for the response variable. The value of T(i,j) is the exponent of variable j in term i. For example, if there are three predictor variables A, B, and C:

[0 0 0 0] % constant term or intercept
[0 1 0 0] % B; equivalently, A^0 * B^1 * C^0
[1 0 1 0] % A*C
[2 0 0 0] % A^2
[0 1 2 0] % B*(C^2)
The 0 at the end of each term represents the response variable. In general:
• If you have the variables in a dataset array, then a 0 must represent the response variable, depending on the position of the response variable in the dataset array. For example:
Load sample data and define the dataset array.

Now, the response variable is the first term in the dataset array. Specify the same linear model, 'BloodPressure ~ 1 + Sex + Age + Smoker', using a terms matrix.
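The terms matrix itself is missing from the excerpt; a sketch, assuming the dataset array ds has its variables ordered as BloodPressure, Sex, Age, Smoker, might be:

% Columns correspond to [BloodPressure Sex Age Smoker]; the response column is always 0.
T = [0 0 0 0     % intercept
     0 1 0 0     % Sex
     0 0 1 0     % Age
     0 0 0 1];   % Smoker
mdl = LinearModel.fit(ds,T)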
• If you have the predictor and response variables in a matrix and column vector, then you must include a 0 for the response variable at the end of each term. For example:
Load sample data and define the matrix of predictors.
load carsmall
X = [Acceleration,Weight];
Specify the model 'MPG ~ Acceleration + Weight + Acceleration:Weight + Weight^2' using a terms matrix and fit the model to data. This model includes the main effect and two-way interaction terms for the variables, Acceleration and Weight, and a second-order term for the variable, Weight.
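The terms matrix and fitting call are missing from the excerpt; a sketch consistent with the model described above might be:

% Columns correspond to [Acceleration Weight MPG]; the response column is always 0.
T = [0 0 0     % intercept
     1 0 0     % Acceleration
     0 1 0     % Weight
     1 1 0     % Acceleration:Weight
     0 2 0];   % Weight^2
mdl = LinearModel.fit(X,MPG,T)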
Number of observations: 94, Error degrees of freedom: 89 Root Mean Squared Error: 4.1
R-squared: 0.751, Adjusted R-Squared 0.739
F-statistic vs constant model: 67, p-value = 4.99e-26
Only the intercept and x2 term, which corresponds to the Weight variable, are significant at the 5% significance level.
Now, perform a stepwise regression with a constant model as the starting model and a linear model with interactions as the upper model.
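The call itself is missing from the excerpt; it presumably had the form (the variable name mdl2 is illustrative):

mdl2 = LinearModel.stepwise(X,MPG,'constant','Upper','interactions')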
Number of observations: 94, Error degrees of freedom: 92 Root Mean Squared Error: 4.13
R-squared: 0.738, Adjusted R-Squared 0.735
F-statistic vs constant model: 259, p-value = 1.64e-28
The results of the stepwise regression are consistent with the results of LinearModel.fit in the previous step.
Formula
A formula is a string specifying the model in the form 'Y ~ terms', where the terms are defined using these operators:
- + to include the next variable
- - to exclude the next variable
- : to define an interaction, a product of terms
- * to define an interaction and all lower-order terms
- ^ to raise the predictor to a power, exactly as in * repeated, so ^ includes lower-order terms as well
- () to group terms
Tip Formulas include a constant (intercept) term by default. To exclude a constant term from the model, include -1 in the formula.
Examples:
'Y ~ A + B + C' is a three-variable linear model with intercept.
'Y ~ A + B + C - 1' is a three-variable linear model without intercept.
'Y ~ A + B + C + B^2' is a three-variable model with intercept and a B^2 term.
'Y ~ A + B^2 + C' is the same as the previous example, since B^2 includes a B term.
'Y ~ A + B + C + A:B' includes an A*B term.
'Y ~ A*B + C' is the same as the previous example, since A*B = A + B + A:B.
'Y ~ A*B*C - A:B:C' has all interactions among A, B, and C, except the three-way interaction.
'Y ~ A*(B + C + D)' has all linear terms, plus products of A with each of the other variables.
Fit Model to Data
Create a fitted model using GeneralizedLinearModel.fit or GeneralizedLinearModel.stepwise. Choose between them as in “Choose Fitting Method and Model” on page 9-150. For generalized linear models other than those with a normal distribution, give a Distribution name-value pair as in “Choose Generalized Linear Model and Link Function” on page 9-146. For example,

mdl = GeneralizedLinearModel.fit(X,y,'linear','Distribution','poisson')
% or
mdl = GeneralizedLinearModel.fit(X,y,'quadratic','Distribution','binomial')
Examine Quality and Adjust the Fitted Model
After fitting a model, examine the result.
• “Model Display” on page 9-156
• “Diagnostic Plots” on page 9-157
• “Residuals — Model Quality for Training Data” on page 9-160
• “Plots to Understand Predictor Effects and How to Modify a Model” on page 9-163
Model Display
A linear regression model shows several diagnostics when you enter its name or enter disp(mdl). This display gives some of the basic information to check whether the fitted model represents the data adequately.
For example, fit a Poisson model to data constructed with two out of five predictors not affecting the response, and with no intercept term:
rng('default') % for reproducibility
X = randn(100,5);
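The rest of the construction is missing from the excerpt; a sketch consistent with the true coefficient values [0;.4;0;0;.2;.3] noted below might be:

mu = exp(X(:,[1 4 5])*[.4;.2;.3]);   % predictors 2 and 3 do not affect the response
y = poissrnd(mu);
mdl = GeneralizedLinearModel.fit(X,y,'linear','Distribution','poisson')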
100 observations, 94 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs constant model: 44.9, p-value = 1.55e-08
Notice that:
• The display contains the estimated values of each coefficient in the Estimate column. These values are reasonably near the true values [0;.4;0;0;.2;.3], except possibly the coefficient of x3 is not terribly near 0.
• There is a standard error column for the coefficient estimates.
• The reported pValues (which are derived from the t statistics under the assumption of normal errors) for predictors 1, 4, and 5 are small. These are the three predictors that were used to create the response data y.
• The pValues for (Intercept), x2 and x3 are larger than 0.01. These three predictors were not used to create the response data y. The pValue for x3 is just over .05, so might be regarded as possibly significant.
It is reasonable to assume that the values of poor follow binomial distributions, with the number of trials given by total and the percentage of successes depending on w. This distribution can be accounted for in the context of a logistic model by using a generalized linear model with link function log(μ/(1 - μ)) = Xb. This link function is called 'logit'.
12 observations, 10 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs constant model: 242, p-value = 1.3e-54
See how well the model fits the data.
This is typical of a regression with points ordered by the predictor variable. The leverage of each point on the fit is higher for points with relatively extreme predictor values (in either direction) and low for points with average predictor values. In examples with multiple predictors and with points not ordered by predictor value, this plot can help you identify which observations have high leverage because they are outliers as measured by their predictor values.
Residuals — Model Quality for Training Data
There are several residual plots to help you discover errors, outliers, or correlations in the model or data. The simplest residual plots are the default histogram plot, which shows the range of the residuals and their frequencies, and the probability plot, which shows how the distribution of the residuals compares to a normal distribution with matched variance.
This example shows residual plots for a fitted Poisson model. The data construction has two out of five predictors not affecting the response, and no intercept term:
rng('default') % for reproducibility
X = randn(100,5);
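The rest of the construction and the first plotting call are missing from the excerpt; a sketch (the coefficient values are assumptions) might be:

mu = exp(X(:,[1 4 5])*[2;1;.5]);     % predictors 2 and 3 do not affect the response
y = poissrnd(mu);
mdl = GeneralizedLinearModel.fit(X,y,'linear','Distribution','poisson');
plotResiduals(mdl)                   % default histogram of raw residuals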
Trang 32While most residualscluster near 0, there are several near ±18 So examine a different residuals plot.
Now it is clear. The residuals do not follow a normal distribution. Instead, they have fatter tails, much as an underlying Poisson distribution would.
Plots to Understand Predictor Effects and How to Modify a Model
This example shows how to understand the effect each predictor has on a regression model, and how to modify the model to remove unnecessary terms.
1 Create a model from some predictors in artificial data. The data do not use the second and third columns in X. So you expect the model not to show much dependence on those predictors.

rng('default') % for reproducibility
X = randn(100,5);
mu = exp(X(:,[1 4 5])*[2;1;.5]);
y = poissrnd(mu);
mdl = GeneralizedLinearModel.fit(X,y,'linear','Distribution','poisson');

2 Examine a slice plot of the responses. This displays the effect of each predictor separately.

plotSlice(mdl)
The scale of the first predictor is overwhelming the plot. Disable it using the Predictors menu.
Now it is clear that predictors 2 and 3 have little to no effect.
You can drag the individual predictor values, which are represented by dashed blue vertical lines. You can also choose between simultaneous and non-simultaneous confidence bounds, which are represented by dashed red curves. Dragging the predictor lines confirms that predictors 2 and 3 have little to no effect.
3 Remove the unnecessary predictors using either removeTerms or step. Using step can be safer, in case there is an unexpected importance to a term that becomes apparent after removing another term. However, sometimes removeTerms can be effective when step does not proceed. In this case, the two give identical results.
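The removal step itself is missing from the excerpt; a sketch (the variable name mdl1 is illustrative) might be:

mdl1 = removeTerms(mdl,'x2 + x3')
% or remove the terms one at a time with step, for example:
% mdl1 = step(mdl,'Lower','y ~ x1 + x4 + x5','NSteps',5)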
Generalized Linear regression model:
    log(y) ~ 1 + x1 + x4 + x5
    Distribution = Poisson

Estimated Coefficients:
                   Estimate    SE          tStat     pValue
    (Intercept)     0.17604    0.062215    2.8295       0.004662
    x1               1.9122    0.024638    77.614              0
    x4              0.98521    0.026393    37.328    5.6696e-305
    x5              0.61321    0.038435    15.955     2.6473e-57

100 observations, 96 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs constant model: 4.97e+04, p-value = 0
Predict or Simulate Responses to New Data
There are three ways to use a linear model to predict the response to new data: the predict method, the feval method, and the random method.
1 Create a model from some predictors in artificial data. The data do not use the second and third columns in X. So you expect the model not to show much dependence on these predictors. Construct the model stepwise to include the relevant predictors automatically.
rng('default') % for reproducibility
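The rest of the construction is missing from the excerpt; based on the identical example in “Share Fitted Models” below, it was presumably:

X = randn(100,5);
mu = exp(X(:,[1 4 5])*[2;1;.5]);
y = poissrnd(mu);
mdl = GeneralizedLinearModel.stepwise(X,y, ...
    'constant','upper','linear','Distribution','poisson');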
1 Adding x1, Deviance = 2515.02869, Chi2Stat = 47242.9622, PValue = 0
2 Adding x4, Deviance = 328.39679, Chi2Stat = 2186.6319, PValue = 0
3 Adding x5, Deviance = 96.3326, Chi2Stat = 232.0642, PValue = 2.114384e-52
2 Generate some new data, and evaluate the predictions from the data.

Xnew = randn(3,5) + repmat([1 2 3 4 5],[3,1]); % new data
[ynew,ynewci] = predict(mdl,Xnew)
ynew =
1.0e+04 *
This example shows how to predict mean responses using the feval method.
1 Create a model from some predictors in artificial data. The data do not use the second and third columns in X. So you expect the model not to show much dependence on these predictors. Construct the model stepwise to include the relevant predictors automatically.
rng('default') % for reproducibility
1 Adding x1, Deviance = 2515.02869, Chi2Stat = 47242.9622, PValue = 0
2 Adding x4, Deviance = 328.39679, Chi2Stat = 2186.6319, PValue = 0
3 Adding x5, Deviance = 96.3326, Chi2Stat = 232.0642, PValue = 2.114384e-52
2 Generate some new data, and evaluate the predictions from the data.
Xnew = randn(3,5) + repmat([1 2 3 4 5],[3,1]); % new data
ynew = feval(mdl,Xnew(:,1),Xnew(:,4),Xnew(:,5)) % only need predictors 1, 4, and 5
1.7375
3.7471
random
The random method generates new random response values for specified predictor values. The distribution of the response values is the distribution used in the model. random calculates the mean of the distribution from the predictors, estimated coefficients, and link function. For distributions such as normal, the model also provides an estimate of the variance of the response. For the binomial and Poisson distributions, the variance of the response is determined by the mean; random does not use a separate "dispersion" estimate.
This example shows how to simulate responses using the random method.
1 Create a model from some predictors in artificial data. The data do not use the second and third columns in X. So you expect the model not to show much dependence on these predictors. Construct the model stepwise to include the relevant predictors automatically.
rng('default') % for reproducibility
1 Adding x1, Deviance = 2515.02869, Chi2Stat = 47242.9622, PValue = 0
2 Adding x4, Deviance = 328.39679, Chi2Stat = 2186.6319, PValue = 0
3 Adding x5, Deviance = 96.3326, Chi2Stat = 232.0642, PValue = 2.114384e-52
2 Generate some new data, and evaluate the predictions from the data.

Xnew = randn(3,5) + repmat([1 2 3 4 5],[3,1]); % new data
ysim = random(mdl,Xnew)
ysim =
1111
17121
37457
The predictions from random are Poisson samples, so they are integers.
3 Evaluate the random method again; the result changes.
Share Fitted Models
The model display contains enough information to enable someone else to recreate the model in a theoretical sense. For example,
rng('default') % for reproducibility
X = randn(100,5);
mu = exp(X(:,[1 4 5])*[2;1;.5]);
y = poissrnd(mu);
mdl = GeneralizedLinearModel.stepwise(X,y, ...
    'constant','upper','linear','Distribution','poisson')
1 Adding x1, Deviance = 2515.02869, Chi2Stat = 47242.9622, PValue = 0
2 Adding x4, Deviance = 328.39679, Chi2Stat = 2186.6319, PValue = 0
3 Adding x5, Deviance = 96.3326, Chi2Stat = 232.0642, PValue = 2.114384e-52
mdl =
Generalized Linear regression model: log(y) ~ 1 + x1 + x4 + x5 Distribution = Poisson
Estimated Coefficients:
                   Estimate    SE          tStat     pValue
    (Intercept)     0.17604    0.062215    2.8295       0.004662
    x1               1.9122    0.024638    77.614              0
    x4              0.98521    0.026393    37.328    5.6696e-305
    x5              0.61321    0.038435    15.955     2.6473e-57

100 observations, 96 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs constant model: 4.97e+04, p-value = 0
You can access the model description programmatically, too. For example,
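The example is missing from the excerpt; a sketch of programmatic access using properties of the fitted-model object might be:

mdl.Coefficients.Estimate     % vector of coefficient estimates
mdl.Formula                   % the model formula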
Generalized Linear Model Workflow
This example shows a typical workflow: import data, fit a generalized linear model, test its quality, modify it to improve the quality, and make predictions based on the model. It computes the probability that a flower is in one of two classes, based on the Fisher iris data.
Step 1 Load the data.
Load the Fisher iris data. Extract the rows that have classification versicolor or virginica. These are rows 51 to 150. Create logical response variables that are true for versicolor flowers.
load fisheriris
X = meas(51:end,:);                        % versicolor and virginica
y = strcmp('versicolor',species(51:end));
Step 2 Fit a generalized linear model.
Fit a binomial generalized linear model to the data.
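The fitting call is missing from the excerpt; it presumably had the form:

mdl = GeneralizedLinearModel.fit(X,y,'linear','Distribution','binomial')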
Dispersion: 1
Chi^2-statistic vs constant model: 127, p-value = 1.95e-26
Step 3 Examine the result, consider alternative models.
Some p-values in the pValue column are not very small. Perhaps the model can be simplified.
See if some 95% confidence intervals for the coefficients include 0. If so, perhaps these model terms could be removed.
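The code is missing here; the confidence intervals can be obtained with coefCI:

confint = coefCI(mdl)    % 95% confidence intervals for the coefficients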
Only two of the predictors have coefficients whose confidence intervals do not include 0.
The coefficients of 'x1' and 'x2' have the largest p-values. Test whether both coefficients could be zero.
M = [0 1 0 0 0 % picks out coefficient for column 1
0 0 1 0 0]; % picks out coefficient for column 2
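The test itself is missing from the excerpt; it can be carried out with coefTest:

p = coefTest(mdl,M)      % p-value for the hypothesis that both coefficients are zero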
Perhaps it would have been better to have GeneralizedLinearModel.stepwise identify the model initially.
mdl2 = GeneralizedLinearModel.stepwise(X,y,
'constant','Distribution','binomial','upper','linear')
1 Adding x4, Deviance = 33.4208, Chi2Stat = 105.2086, PValue = 1.099298e-24
2 Adding x3, Deviance = 20.5635, Chi2Stat = 12.8573, PValue = 0.000336166
3 Adding x2, Deviance = 13.2658, Chi2Stat = 7.29767, PValue = 0.00690441
mdl2 =
Generalized Linear regression model: logit(y) ~ 1 + x2 + x3 + x4 Distribution = Binomial
Estimated Coefficients:
                   Estimate    SE        tStat      pValue
    (Intercept)      50.527    23.995     2.1057    0.035227
    x2               8.3761    4.7612     1.7592    0.078536
    x3              -7.8745    3.8407    -2.0503    0.040334
    x4               -21.43    10.707    -2.0014     0.04535

100 observations, 96 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs constant model: 125, p-value = 5.4e-27
GeneralizedLinearModel.stepwise included 'x2' in the model, because it neither adds nor removes terms
with p-values between 0.05 and 0.10.
Step 4 Look for outliers and exclude them.
Examine a leverage plot to look for influential outliers.
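The plotting call is missing from the excerpt; the leverage plot can be produced with plotDiagnostics:

plotDiagnostics(mdl2,'leverage')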