Statistics in Geophysics: Linear Regression II
Steffen Unkel
Department of Statistics, Ludwig-Maximilians-University Munich, Germany
Model definition
Suppose we have the following model under consideration:
$$y = X\beta + \varepsilon,$$
where $y = (y_1, \ldots, y_n)^\top$ is an $n \times 1$ vector of observations on the response, $\beta = (\beta_0, \beta_1, \ldots, \beta_k)^\top$ is a $(k+1) \times 1$ vector of parameters, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^\top$ is an $n \times 1$ vector of random errors, and $X$ is the $n \times (k+1)$ design matrix
$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}.$$
It follows that $E(y) = X\beta$ and $\mathrm{Cov}(y) = \sigma^2 I$. Under the normality assumption (assumption 4), we have $y \sim N(X\beta, \sigma^2 I)$.
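A minimal Python sketch (NumPy assumed; all data and names illustrative) of how the design matrix is assembled and how observations arise from this model:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 100, 2
covariates = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), covariates])  # first column of ones multiplies beta_0
beta = np.array([1.0, 0.5, -2.0])              # (k + 1) x 1 parameter vector
eps = rng.normal(scale=1.0, size=n)            # random errors, N(0, sigma^2 I)
y = X @ beta + eps                             # the model y = X beta + eps
```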
Modelling the effects of continuous covariates
Nonlinear relationships between continuous covariates and the response can often be handled within the scope of linear models. Two simple methods for dealing with nonlinearity (a sketch follows below):
1 Variable transformation;
2 Polynomial regression.
Sometimes it is customary to transform the continuous response as well.
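A minimal sketch, assuming NumPy: polynomial regression keeps the model linear in $\beta$; only the design matrix gains a column for the transformed covariate (here $x^2$; data simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 0.5 * x - 0.2 * x**2 + rng.normal(scale=0.5, size=x.size)

# Design matrix with intercept, linear, and quadratic terms.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimates of (beta_0, beta_1, beta_2)
```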
Modelling the effects of categorical covariates
Suppose a covariate is categorical with two or more distinct levels.
In such a case we cannot set up a continuous scale for the categorical variable.
Instead, we recode the categories into several binary variables, and estimate a separate effect for each category of the original covariate.
We can deal with $c$ categories by the introduction of $c - 1$ dummy variables.
Example: Turkey data
Variables: weight, age, origin.
Dummy coding for categorical covariates
Given a covariate $x \in \{1, \ldots, c\}$ with $c$ categories, we define the $c - 1$ dummy variables
$$x_{ij} = \begin{cases} 1, & \text{if } x_i = j, \\ 0, & \text{otherwise,} \end{cases} \qquad j = 1, \ldots, c - 1,$$
so that category $c$ serves as the reference category.
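A minimal NumPy sketch of this coding (the data vector is illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 1, 2, 3, 3])  # categorical covariate with c = 3 levels
c = 3
# Column j is the indicator of category j; category c = 3 is the reference.
dummies = np.column_stack([(x == j).astype(float) for j in range(1, c)])
print(dummies)
```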
Design matrix for the turkey data using dummy coding
Interactions between covariates
An interaction between predictor variables exists if the effect of a covariate depends on the value of at least one other covariate.
Consider the following model between a response $y$ and two covariates $x_1$ and $x_2$:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon,$$
where the term $\beta_3 x_1 x_2$ is called an interaction between $x_1$ and $x_2$.
The terms $\beta_1 x_1$ and $\beta_2 x_2$ depend on only one variable and are called main effects.
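A minimal NumPy sketch (simulated data, illustrative names) showing how the interaction enters as one extra column of the design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 3.0 * x1 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with intercept, main effects, and the interaction column.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be close to (1, 2, -1, 3)
```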
Least squares estimation of regression coefficients
The error sum of squares is
$$\mathrm{LS}(\beta) = \sum_{i=1}^n \varepsilon_i^2 = (y - X\beta)^\top (y - X\beta).$$
Minimizing it with respect to $\beta$ yields the least squares estimator $\hat\beta = (X^\top X)^{-1} X^\top y$.
Maximum likelihood estimation of regression coefficients
Assuming normally distributed errors yields the likelihood
$$L(\beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}(y - X\beta)^\top (y - X\beta)\right).$$
Maximizing the log-likelihood $l(\beta, \sigma^2)$ with respect to $\beta$ is equivalent to minimizing $(y - X\beta)^\top (y - X\beta)$, which is the least squares criterion.
The MLE, $\hat\beta_{ML}$, is therefore identical to the least squares estimator.
Fitted values and residuals
Based on $\hat\beta = (X^\top X)^{-1} X^\top y$, we can estimate the (conditional) mean of $y$ by $\widehat{E(y)} = \hat{y} = X\hat\beta$.
Substituting the least squares estimator further results in
$$\hat{y} = X(X^\top X)^{-1} X^\top y = Hy,$$
where the $n \times n$ matrix $H$ is called the hat matrix.
The residuals are
$$\hat\varepsilon = y - \hat{y} = (I - H)y.$$
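A minimal NumPy sketch (simulated data) computing $\hat\beta$, the hat matrix, fitted values, and residuals directly from these formulas. In practice one avoids forming the $n \times n$ hat matrix explicitly; it is built here only to mirror the formula:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y via the normal equations
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix (n x n)
y_hat = H @ y                                 # fitted values H y = X beta_hat
resid = y - y_hat                             # residuals (I - H) y
```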
Estimation of the error variance
Maximization of the log-likelihood with respect to $\sigma^2$ yields
$$\hat\sigma^2_{ML} = \frac{1}{n} \hat\varepsilon^\top \hat\varepsilon = \frac{1}{n} \sum_{i=1}^n \hat\varepsilon_i^2.$$
This estimator is biased; the unbiased estimator $\hat\sigma^2 = \hat\varepsilon^\top \hat\varepsilon / (n - p)$, with $p = k + 1$, is used in the interval estimates below.
Properties of the least squares estimator
For the least squares estimator we have
$$E(\hat\beta) = \beta, \qquad \mathrm{Cov}(\hat\beta) = \sigma^2 (X^\top X)^{-1}.$$
Gauss-Markov theorem: among all linear and unbiased estimators $\hat\beta_L$, the least squares estimator has minimal variance, implying
$$\mathrm{Var}(\hat\beta_j) \leq \mathrm{Var}(\hat\beta_j^{L}), \qquad j = 0, \ldots, k.$$
If $\varepsilon \sim N(0, \sigma^2 I)$, then $\hat\beta \sim N(\beta, \sigma^2 (X^\top X)^{-1})$.
ANOVA table for multiple linear regression and R²
The multiple coefficient of determination is still computed as $R^2 = \mathrm{SSR}/\mathrm{SST}$, but it is no longer the square of the Pearson correlation between the response and any of the predictor variables.
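A minimal NumPy sketch (simulated data) computing $R^2 = \mathrm{SSR}/\mathrm{SST}$ from a least squares fit:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.4, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
r_squared = ssr / sst
```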
Interval estimation and tests
We would like to construct confidence intervals and statistical tests for hypotheses regarding the unknown regression coefficients.
Testing linear hypotheses
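As a hedged sketch of the standard setup (the usual likelihood-based F test is assumed here), a general linear hypothesis can be written as $H_0\colon C\beta = d$ versus $H_1\colon C\beta \neq d$, where $C$ is an $r \times p$ matrix with $\operatorname{rank}(C) = r$. It can be tested with

$$F = \frac{1}{r}\,\frac{(C\hat\beta - d)^\top \left[ C (X^\top X)^{-1} C^\top \right]^{-1} (C\hat\beta - d)}{\hat\sigma^2},$$

which follows an $F_{r,\,n-p}$ distribution under $H_0$; $H_0$ is rejected at level $\alpha$ if $F > F_{r,\,n-p}(1-\alpha)$.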
Confidence intervals and regions for regression coefficients
Confidence interval for $\beta_j$:
A confidence interval for $\beta_j$ with level $1 - \alpha$ is given by
$$\left[\hat\beta_j - t_{n-p}(1 - \alpha/2)\,\hat\sigma_{\hat\beta_j},\ \hat\beta_j + t_{n-p}(1 - \alpha/2)\,\hat\sigma_{\hat\beta_j}\right].$$
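A minimal sketch, assuming NumPy and SciPy (simulated data, illustrative names), of this interval for a single coefficient:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 60, 3  # n observations, p = k + 1 parameters
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.4, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)  # unbiased estimate of sigma^2
XtX_inv = np.linalg.inv(X.T @ X)

j, alpha = 1, 0.05
se_j = np.sqrt(sigma2_hat * XtX_inv[j, j])      # sigma_hat of beta_hat_j
t_quant = stats.t.ppf(1 - alpha / 2, df=n - p)  # t_{n-p}(1 - alpha/2)
ci = (beta_hat[j] - t_quant * se_j, beta_hat[j] + t_quant * se_j)
```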
Confidence region for subvector $\beta_1$:
A confidence ellipsoid for $\beta_1 = (\beta_1, \ldots, \beta_r)^\top$ with level $1 - \alpha$ is given by
$$\left\{\beta_1 : \frac{1}{r}(\hat\beta_1 - \beta_1)^\top \widehat{\mathrm{Cov}}(\hat\beta_1)^{-1} (\hat\beta_1 - \beta_1) \leq F_{r,\,n-p}(1 - \alpha)\right\}.$$
Prediction interval for a future observation
A prediction interval for a future observation $y_0$ at location $x_0$ with level $1 - \alpha$ is given by
$$x_0^\top \hat\beta \pm t_{n-p}(1 - \alpha/2)\,\hat\sigma \left(1 + x_0^\top (X^\top X)^{-1} x_0\right)^{1/2}.$$
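A minimal NumPy/SciPy sketch of this formula (the new covariate vector x0 is illustrative and includes the intercept entry):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.4, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - p))
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 0.5, -1.0])  # new covariate vector, including the intercept 1
alpha = 0.05
t_quant = stats.t.ppf(1 - alpha / 2, df=n - p)
half_width = t_quant * sigma_hat * np.sqrt(1.0 + x0 @ XtX_inv @ x0)
interval = (x0 @ beta_hat - half_width, x0 @ beta_hat + half_width)
```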
The corrected coefficient of determination
We already defined the coefficient of determination, $R^2$, as a measure of the goodness-of-fit to the data.
The use of $R^2$ is limited, since it will never decrease with the addition of a new covariate into the model.
The corrected coefficient of determination, $R^2_{\text{adj}}$, adjusts for this problem by including a correction term for the number of parameters.
It is defined by
$$R^2_{\text{adj}} = 1 - \frac{n-1}{n-p}\,(1 - R^2).$$
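A one-line helper, as a sketch of the formula above (the function name is illustrative):

```python
def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """R^2_adj = 1 - (n - 1)/(n - p) * (1 - R^2), with p = k + 1 parameters."""
    return 1.0 - (n - 1) / (n - p) * (1.0 - r_squared)

print(adjusted_r_squared(0.90, n=50, p=3))  # slightly below the raw R^2 of 0.90
```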
Akaike information criterion
The Akaike information criterion (AIC) is one of the most widely used criteria for model choice within the scope of likelihood-based inference.
The AIC is defined by
$$\mathrm{AIC} = -2\,l(\hat\beta_{ML}, \hat\sigma^2_{ML}) + 2(p+1),$$
where $l(\hat\beta_{ML}, \hat\sigma^2_{ML})$ is the maximum value of the log-likelihood.
Smaller values of the AIC correspond to a better model fit.
Bayesian information criterion
The Bayesian information criterion (BIC) is defined by
$$\mathrm{BIC} = -2\,l(\hat\beta_{ML}, \hat\sigma^2_{ML}) + \log(n)\,(p+1).$$
The BIC multiplied by 1/2 is also known as the Schwarz criterion.
Smaller values of the BIC correspond to a better model fit.
The BIC penalizes complex models much more than the AIC.
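A minimal NumPy sketch (simulated data) computing the Gaussian log-likelihood at the ML estimates and the resulting AIC and BIC; here $p + 1$ counts the entries of $\beta$ plus $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.4, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_ml = resid @ resid / n  # ML estimate of sigma^2

# Gaussian log-likelihood at the ML estimates: -n/2 * log(2*pi*sigma^2) - n/2.
loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2_ml) + 1.0)
aic = -2.0 * loglik + 2.0 * (p + 1)
bic = -2.0 * loglik + np.log(n) * (p + 1)
```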
Practical use of model choice criteria
To select the most promising models from candidate models, we first obtain a preselection of potential models.
All potential models can now be assessed with the aid of one of the various model choice criteria (AIC, BIC).
This method is not always practical, since the number of regressor variables and modelling variants can be very large in many applications.
In this case, we can use the following partially heuristic methods.
Practical use of model choice criteria II
Complete model selection: if the number of predictor variables is not too large, we can determine the best model with the "leaps-and-bounds" algorithm.
Forward selection (see the sketch after this list):
1 Based on a starting model, forward selection includes one additional variable in every iteration of the algorithm.
2 The variable which offers the greatest reduction of a preselected model choice criterion is chosen.
3 The algorithm terminates if no further reduction is possible.
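A minimal sketch of greedy forward selection by AIC, assuming NumPy; gaussian_aic and forward_selection are hypothetical helper names, not from the slides:

```python
import numpy as np

def gaussian_aic(X, y):
    """AIC of a Gaussian linear model with design matrix X."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma2_ml = resid @ resid / n
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2_ml) + 1.0)
    return -2.0 * loglik + 2.0 * (p + 1)

def forward_selection(candidates, y):
    """candidates: dict mapping variable name -> column (1-D array)."""
    n = y.size
    selected, X = [], np.ones((n, 1))  # start from the intercept-only model
    best_aic = gaussian_aic(X, y)
    improved = True
    while improved and len(selected) < len(candidates):
        improved = False
        # In each iteration, try every remaining variable and keep the one
        # offering the greatest reduction of the criterion.
        for name, col in candidates.items():
            if name in selected:
                continue
            aic_try = gaussian_aic(np.column_stack([X, col]), y)
            if aic_try < best_aic:
                best_aic, best_name, best_col = aic_try, name, col
                improved = True
        if improved:
            selected.append(best_name)
            X = np.column_stack([X, best_col])
    return selected, best_aic  # terminates when no addition reduces the AIC

rng = np.random.default_rng(8)
n = 80
cands = {f"x{j}": rng.normal(size=n) for j in range(1, 5)}
y = 2.0 * cands["x1"] - 1.5 * cands["x3"] + rng.normal(scale=0.5, size=n)
print(forward_selection(cands, y))  # typically selects x1 and x3
```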
Trang 28Practical use of model choice criteria III
eliminated from the model.
3 The algorithm terminates if no further reduction is possible.
Stepwise selection: Stepwise selection is a combination offorward selection and backward elimination In every iteration
of the algorithm, a predictor variable may be added to themodel or removed from the model