Statistics in Geophysics: Linear Regression II
Steffen Unkel
Department of Statistics, Ludwig-Maximilians-University Munich, Germany
Model definition
Suppose we have the following model under consideration:
$$y = X\beta + \varepsilon,$$
where $y = (y_1, \ldots, y_n)^\top$ is an $n \times 1$ vector of observations on the response, $\beta = (\beta_0, \beta_1, \ldots, \beta_k)^\top$ is a $(k+1) \times 1$ vector of parameters, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^\top$ is an $n \times 1$ vector of random errors, and $X$ is the $n \times (k+1)$ design matrix
$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}.$$
It follows that $E(y) = X\beta$ and $\mathrm{Cov}(y) = \sigma^2 I$. Under the normality assumption (assumption 4), we have $y \sim N(X\beta, \sigma^2 I)$.
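A minimal Python sketch (NumPy assumed; all data and names illustrative) of how the design matrix is assembled and how observations arise from this model:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 100, 2
covariates = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), covariates])  # first column of ones multiplies beta_0
beta = np.array([1.0, 0.5, -2.0])              # (k + 1) x 1 parameter vector
eps = rng.normal(scale=1.0, size=n)            # random errors, N(0, sigma^2 I)
y = X @ beta + eps                             # the model y = X beta + eps
```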
Modelling the effects of continuous covariates
Nonlinear relationships between continuous covariates and the response can often be handled within the scope of linear models. Two simple methods for dealing with nonlinearity (a sketch follows below):
1 Variable transformation;
2 Polynomial regression.
Sometimes it is customary to transform the continuous response as well.
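A minimal sketch, assuming NumPy: polynomial regression keeps the model linear in $\beta$; only the design matrix gains a column for the transformed covariate (here $x^2$; data simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 0.5 * x - 0.2 * x**2 + rng.normal(scale=0.5, size=x.size)

# Design matrix with intercept, linear, and quadratic terms.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimates of (beta_0, beta_1, beta_2)
```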
Modelling the effects of categorical covariates
Suppose a covariate is categorical with two or more distinct levels.
In such a case we cannot set up a continuous scale for the categorical variable.
Instead, we recode the categories into several binary variables, and estimate a separate effect for each category of the original covariate.
We can deal with $c$ categories by the introduction of $c - 1$ dummy variables.
Example: Turkey data
Variables: weight, age, origin.
Dummy coding for categorical covariates
Given a covariate $x \in \{1, \ldots, c\}$ with $c$ categories, we define the $c - 1$ dummy variables
$$x_{ij} = \begin{cases} 1, & \text{if } x_i = j, \\ 0, & \text{otherwise,} \end{cases} \qquad j = 1, \ldots, c - 1,$$
so that category $c$ serves as the reference category.
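A minimal NumPy sketch of this coding (the data vector is illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 1, 2, 3, 3])  # categorical covariate with c = 3 levels
c = 3
# Column j is the indicator of category j; category c = 3 is the reference.
dummies = np.column_stack([(x == j).astype(float) for j in range(1, c)])
print(dummies)
```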
Design matrix for the turkey data using dummy coding
Interactions between covariates
An interaction between predictor variables exists if the effect of a covariate depends on the value of at least one other covariate.
Consider the following model between a response $y$ and two covariates $x_1$ and $x_2$:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon,$$
where the term $\beta_3 x_1 x_2$ is called an interaction between $x_1$ and $x_2$.
The terms $\beta_1 x_1$ and $\beta_2 x_2$ depend on only one variable and are called main effects.
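A minimal NumPy sketch (simulated data, illustrative names) showing how the interaction enters as one extra column of the design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 3.0 * x1 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with intercept, main effects, and the interaction column.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be close to (1, 2, -1, 3)
```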
Least squares estimation of regression coefficients
The error sum of squares is
$$\mathrm{LS}(\beta) = \sum_{i=1}^n \varepsilon_i^2 = (y - X\beta)^\top (y - X\beta).$$
Minimizing it with respect to $\beta$ yields the least squares estimator $\hat\beta = (X^\top X)^{-1} X^\top y$.
Maximum likelihood estimation of regression coefficients
Assuming normally distributed errors yields the likelihood
$$L(\beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}(y - X\beta)^\top (y - X\beta)\right).$$
Maximizing the log-likelihood $l(\beta, \sigma^2)$ with respect to $\beta$ is equivalent to minimizing $(y - X\beta)^\top (y - X\beta)$, which is the least squares criterion.
The MLE, $\hat\beta_{ML}$, is therefore identical to the least squares estimator.
Fitted values and residuals
Based on $\hat\beta = (X^\top X)^{-1} X^\top y$, we can estimate the (conditional) mean of $y$ by $\widehat{E(y)} = \hat{y} = X\hat\beta$.
Substituting the least squares estimator further results in
$$\hat{y} = X(X^\top X)^{-1} X^\top y = Hy,$$
where the $n \times n$ matrix $H$ is called the hat matrix.
The residuals are
$$\hat\varepsilon = y - \hat{y} = (I - H)y.$$
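A minimal NumPy sketch (simulated data) computing $\hat\beta$, the hat matrix, fitted values, and residuals directly from these formulas. In practice one avoids forming the $n \times n$ hat matrix explicitly; it is built here only to mirror the formula:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y via the normal equations
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix (n x n)
y_hat = H @ y                                 # fitted values H y = X beta_hat
resid = y - y_hat                             # residuals (I - H) y
```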
Estimation of the error variance
Maximization of the log-likelihood with respect to $\sigma^2$ yields
$$\hat\sigma^2_{ML} = \frac{1}{n} \hat\varepsilon^\top \hat\varepsilon = \frac{1}{n} \sum_{i=1}^n \hat\varepsilon_i^2.$$
This estimator is biased; the unbiased estimator $\hat\sigma^2 = \hat\varepsilon^\top \hat\varepsilon / (n - p)$, with $p = k + 1$, is used in the interval estimates below.
Properties of the least squares estimator
For the least squares estimator we have
$$E(\hat\beta) = \beta, \qquad \mathrm{Cov}(\hat\beta) = \sigma^2 (X^\top X)^{-1}.$$
Gauss-Markov theorem: among all linear and unbiased estimators $\hat\beta_L$, the least squares estimator has minimal variance, implying
$$\mathrm{Var}(\hat\beta_j) \leq \mathrm{Var}(\hat\beta_j^{L}), \qquad j = 0, \ldots, k.$$
If $\varepsilon \sim N(0, \sigma^2 I)$, then $\hat\beta \sim N(\beta, \sigma^2 (X^\top X)^{-1})$.
ANOVA table for multiple linear regression and R²
The multiple coefficient of determination is still computed as $R^2 = \mathrm{SSR}/\mathrm{SST}$, but it is no longer the square of the Pearson correlation between the response and any of the predictor variables.
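A minimal NumPy sketch (simulated data) computing $R^2 = \mathrm{SSR}/\mathrm{SST}$ from a least squares fit:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.4, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
r_squared = ssr / sst
```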
Interval estimation and tests
We would like to construct confidence intervals and statistical tests for hypotheses regarding the unknown regression coefficients.
Testing linear hypotheses
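As a hedged sketch of the standard setup (the usual likelihood-based F test is assumed here), a general linear hypothesis can be written as $H_0\colon C\beta = d$ versus $H_1\colon C\beta \neq d$, where $C$ is an $r \times p$ matrix with $\operatorname{rank}(C) = r$. It can be tested with

$$F = \frac{1}{r}\,\frac{(C\hat\beta - d)^\top \left[ C (X^\top X)^{-1} C^\top \right]^{-1} (C\hat\beta - d)}{\hat\sigma^2},$$

which follows an $F_{r,\,n-p}$ distribution under $H_0$; $H_0$ is rejected at level $\alpha$ if $F > F_{r,\,n-p}(1-\alpha)$.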
Confidence intervals and regions for regression coefficients
Confidence interval for $\beta_j$:
A confidence interval for $\beta_j$ with level $1 - \alpha$ is given by
$$\left[\hat\beta_j - t_{n-p}(1 - \alpha/2)\,\hat\sigma_{\hat\beta_j},\ \hat\beta_j + t_{n-p}(1 - \alpha/2)\,\hat\sigma_{\hat\beta_j}\right].$$
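A minimal sketch, assuming NumPy and SciPy (simulated data, illustrative names), of this interval for a single coefficient:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 60, 3  # n observations, p = k + 1 parameters
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.4, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)  # unbiased estimate of sigma^2
XtX_inv = np.linalg.inv(X.T @ X)

j, alpha = 1, 0.05
se_j = np.sqrt(sigma2_hat * XtX_inv[j, j])      # sigma_hat of beta_hat_j
t_quant = stats.t.ppf(1 - alpha / 2, df=n - p)  # t_{n-p}(1 - alpha/2)
ci = (beta_hat[j] - t_quant * se_j, beta_hat[j] + t_quant * se_j)
```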
Confidence region for subvector $\beta_1$:
A confidence ellipsoid for $\beta_1 = (\beta_1, \ldots, \beta_r)^\top$ with level $1 - \alpha$ is given by
$$\left\{\beta_1 : \frac{1}{r}(\hat\beta_1 - \beta_1)^\top \widehat{\mathrm{Cov}}(\hat\beta_1)^{-1} (\hat\beta_1 - \beta_1) \leq F_{r,\,n-p}(1 - \alpha)\right\}.$$
Prediction interval for a future observation
A prediction interval for a future observation $y_0$ at location $x_0$ with level $1 - \alpha$ is given by
$$x_0^\top \hat\beta \pm t_{n-p}(1 - \alpha/2)\,\hat\sigma \left(1 + x_0^\top (X^\top X)^{-1} x_0\right)^{1/2}.$$
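A minimal NumPy/SciPy sketch of this formula (the new covariate vector x0 is illustrative and includes the intercept entry):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.4, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - p))
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 0.5, -1.0])  # new covariate vector, including the intercept 1
alpha = 0.05
t_quant = stats.t.ppf(1 - alpha / 2, df=n - p)
half_width = t_quant * sigma_hat * np.sqrt(1.0 + x0 @ XtX_inv @ x0)
interval = (x0 @ beta_hat - half_width, x0 @ beta_hat + half_width)
```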
The corrected coefficient of determination
We already defined the coefficient of determination, $R^2$, as a measure of the goodness-of-fit to the data.
The use of $R^2$ is limited, since it will never decrease with the addition of a new covariate into the model.
The corrected coefficient of determination, $R^2_{\text{adj}}$, adjusts for this problem by including a correction term for the number of parameters.
It is defined by
$$R^2_{\text{adj}} = 1 - \frac{n-1}{n-p}\,(1 - R^2).$$
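A one-line helper, as a sketch of the formula above (the function name is illustrative):

```python
def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """R^2_adj = 1 - (n - 1)/(n - p) * (1 - R^2), with p = k + 1 parameters."""
    return 1.0 - (n - 1) / (n - p) * (1.0 - r_squared)

print(adjusted_r_squared(0.90, n=50, p=3))  # slightly below the raw R^2 of 0.90
```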
Akaike information criterion
The Akaike information criterion (AIC) is one of the most widely used criteria for model choice within the scope of likelihood-based inference.
The AIC is defined by
$$\mathrm{AIC} = -2\,l(\hat\beta_{ML}, \hat\sigma^2_{ML}) + 2(p+1),$$
where $l(\hat\beta_{ML}, \hat\sigma^2_{ML})$ is the maximum value of the log-likelihood.
Smaller values of the AIC correspond to a better model fit.
Bayesian information criterion
The Bayesian information criterion (BIC) is defined by
$$\mathrm{BIC} = -2\,l(\hat\beta_{ML}, \hat\sigma^2_{ML}) + \log(n)\,(p+1).$$
The BIC multiplied by 1/2 is also known as the Schwarz criterion.
Smaller values of the BIC correspond to a better model fit.
The BIC penalizes complex models much more than the AIC.
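A minimal NumPy sketch (simulated data) computing the Gaussian log-likelihood at the ML estimates and the resulting AIC and BIC; here $p + 1$ counts the entries of $\beta$ plus $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.4, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_ml = resid @ resid / n  # ML estimate of sigma^2

# Gaussian log-likelihood at the ML estimates: -n/2 * log(2*pi*sigma^2) - n/2.
loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2_ml) + 1.0)
aic = -2.0 * loglik + 2.0 * (p + 1)
bic = -2.0 * loglik + np.log(n) * (p + 1)
```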
Practical use of model choice criteria
To select the most promising models from candidate models, we first obtain a preselection of potential models.
All potential models can now be assessed with the aid of one of the various model choice criteria (AIC, BIC).
This method is not always practical, since the number of regressor variables and modelling variants can be very large in many applications.
In this case, we can use the following partially heuristic methods.
Practical use of model choice criteria II
Complete model selection: if the number of predictor variables is not too large, we can determine the best model with the "leaps-and-bounds" algorithm.
Forward selection (see the sketch after this list):
1 Based on a starting model, forward selection includes one additional variable in every iteration of the algorithm.
2 The variable which offers the greatest reduction of a preselected model choice criterion is chosen.
3 The algorithm terminates if no further reduction is possible.
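A minimal sketch of greedy forward selection by AIC, assuming NumPy; gaussian_aic and forward_selection are hypothetical helper names, not from the slides:

```python
import numpy as np

def gaussian_aic(X, y):
    """AIC of a Gaussian linear model with design matrix X."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma2_ml = resid @ resid / n
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2_ml) + 1.0)
    return -2.0 * loglik + 2.0 * (p + 1)

def forward_selection(candidates, y):
    """candidates: dict mapping variable name -> column (1-D array)."""
    n = y.size
    selected, X = [], np.ones((n, 1))  # start from the intercept-only model
    best_aic = gaussian_aic(X, y)
    improved = True
    while improved and len(selected) < len(candidates):
        improved = False
        # In each iteration, try every remaining variable and keep the one
        # offering the greatest reduction of the criterion.
        for name, col in candidates.items():
            if name in selected:
                continue
            aic_try = gaussian_aic(np.column_stack([X, col]), y)
            if aic_try < best_aic:
                best_aic, best_name, best_col = aic_try, name, col
                improved = True
        if improved:
            selected.append(best_name)
            X = np.column_stack([X, best_col])
    return selected, best_aic  # terminates when no addition reduces the AIC

rng = np.random.default_rng(8)
n = 80
cands = {f"x{j}": rng.normal(size=n) for j in range(1, 5)}
y = 2.0 * cands["x1"] - 1.5 * cands["x3"] + rng.normal(scale=0.5, size=n)
print(forward_selection(cands, y))  # typically selects x1 and x3
```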
Trang 28Practical use of model choice criteria III
eliminated from the model.
3 The algorithm terminates if no further reduction is possible.
Stepwise selection: Stepwise selection is a combination offorward selection and backward elimination In every iteration
of the algorithm, a predictor variable may be added to themodel or removed from the model