
Page 1

Statistics in Geophysics: Generalized Linear Regression

Steffen Unkel

Department of Statistics, Ludwig-Maximilians-University Munich, Germany

Page 2

Components of the classical linear model

Generalized linear models (GLMs) are an extension of classical linear models.

Recall the classical linear regression model: y = Xβ + ε.

The systematic part of the model is a specification for the (conditional) mean of y, which takes the form E(y) = Xβ.

A specialization of the model involves the assumption that ε ∼ N(0, σ²I), where ε is the n × 1 vector of errors.
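As a minimal illustration of the classical model, the least-squares estimate β̂ = (XᵀX)⁻¹Xᵀy can be computed directly; the sketch below uses simulated data and illustrative variable names, not material from the lecture.

```python
import numpy as np

# Simulated data for the classical linear model y = X beta + eps (illustrative only).
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # design matrix with intercept
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)        # eps ~ N(0, sigma^2 I)

# Ordinary least squares, which is also the ML estimate under Gaussian errors.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```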

Page 3

Components of a generalized linear model II

Three-part specification of the classical linear model:

1. A random component: the elements of y have independent Gaussian distributions with E(y) = µ and constant variance σ².

2. A systematic component: a linear predictor η = (η₁, ..., ηₙ)ᵀ, where ηᵢ = xᵢᵀβ, i = 1, ..., n.

3. A link between the random and systematic components: µ = η.

This specification introduces a new symbol η for the linear predictor, and the third component then specifies that µ and η are identical.

Page 4

Classical linear models have a Gaussian distribution in component 1 and the identity function for the link in component 3.

GLMs allow two extensions:

1. The distribution in component 1 may come from an exponential family other than the Gaussian.

2. The link function in component 3 may become any monotonic differentiable function.

Both extensions are illustrated in the sketch below.
Page 5

The second parameter φ is a dispersion parameter.

It can be shown that E(y) = µ = b′(θ) and Var(y) = φ b″(θ).
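For reference, the exponential-dispersion-family density that θ, b(·) and φ refer to can be written in the standard textbook form (this block restates the usual notation rather than the slide's exact layout):

```latex
f(y \mid \theta, \phi) = \exp\!\left\{ \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right\},
\qquad
\mathrm{E}(y) = b'(\theta), \quad \mathrm{Var}(y) = \phi\, b''(\theta).
```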

Page 6

Exponential family parameters, expectation and variance

For the Bernoulli distribution B(1, π):

θ = log(π/(1 − π)),  b(θ) = log(1 + exp(θ)),  φ = 1

E(y) = b′(θ) = π = exp(θ)/(1 + exp(θ)),  b″(θ) = π(1 − π),  Var(y) = b″(θ)φ = π(1 − π)
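A short worked check of these Bernoulli entries, obtained by differentiating b(θ) = log(1 + eᶿ):

```latex
b'(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} = \pi,
\qquad
b''(\theta) = \frac{e^{\theta}}{(1 + e^{\theta})^{2}} = \pi(1 - \pi),
\qquad
\mathrm{Var}(y) = \phi\, b''(\theta) = \pi(1 - \pi) \quad (\phi = 1).
```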

Page 7

Maximum likelihood estimation in GLMs

The ML estimates are computed iteratively as weighted least squares estimates, where

ỹ(η̂⁽ᵗ⁾) = (ỹ₁(η̂₁⁽ᵗ⁾), ..., ỹₙ(η̂ₙ⁽ᵗ⁾))ᵀ

is a vector of "working observations" with elements ỹᵢ(η̂ᵢ) = η̂ᵢ + g′(µ̂ᵢ)(yᵢ − µ̂ᵢ).
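A compact sketch of this iteration for the logit model (canonical link), written from the standard IRLS recipe rather than the slide's exact notation; the function name and simulated data are illustrative.

```python
import numpy as np

def irls_logit(X, y, max_iter=50, tol=1e-8):
    """Iteratively (re)weighted least squares for a logit model."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                              # current linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))             # mean via the inverse logit link
        w = mu * (1.0 - mu)                          # working weights
        z = eta + (y - mu) / w                       # working observations
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Example on simulated data (illustrative only).
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
pi = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * X[:, 1])))
y = rng.binomial(1, pi)
print(irls_logit(X, y))
```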

Page 8

Maximum likelihood estimation in GLMs II

The matrix XᵀW⁽ᵗ⁾X plays a key role in the iterations.

Invertibility of XᵀW⁽ᵗ⁾X does not follow from the invertibility of XᵀX.

Page 9

Maximum likelihood estimation in GLMs III

Asymptotic properties of the ML estimator

Under regularity conditions, the ML estimator β̂ is consistent and asymptotically normal, with covariance matrix given by the inverse Fisher information.

Here F_obs(β) = −∂²l(β)/∂β∂βᵀ is the observed Fisher information matrix and l(β) is the log-likelihood.
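In the usual textbook form (stated here as standard GLM theory, not a transcript of the slide), these properties read:

```latex
\hat{\beta} \xrightarrow{p} \beta,
\qquad
\hat{\beta} \;\overset{a}{\sim}\; N\!\left(\beta,\; F(\hat{\beta})^{-1}\right),
\qquad
F_{\mathrm{obs}}(\beta) = -\frac{\partial^{2} l(\beta)}{\partial \beta\, \partial \beta^{\top}} .
```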

Page 10

Estimation of the scale parameter

Denote by v(µᵢ) = b″(θᵢ) the so-called variance function and note that b″(θᵢ) implicitly depends on µᵢ through the relation µᵢ = b′(θᵢ).
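A common moment-based estimator of the dispersion parameter uses this variance function; it is given here as the standard Pearson-type estimator, which may differ from the slide's own choice:

```latex
\hat{\phi} = \frac{1}{n - p} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^{2}}{v(\hat{\mu}_i)} .
```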

Page 11

Testing linear hypotheses

The null hypothesis is rejected at level α if the test statistic (likelihood ratio lr, Wald w, or score u) exceeds the (1 − α)-quantile of the χ² distribution with r degrees of freedom, i.e. if lr, w, u > χ²ᵣ(1 − α).
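A sketch of the likelihood-ratio variant with statsmodels, testing one linear restriction (β₁ = 0) on simulated Poisson data; names and data are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.normal(size=300)
y = rng.poisson(np.exp(0.3 + 0.6 * x))

full_fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
null_fit = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Poisson()).fit()

lr = 2.0 * (full_fit.llf - null_fit.llf)      # likelihood ratio statistic
r = 1                                          # number of restrictions under H0
alpha = 0.05
print(lr, chi2.ppf(1 - alpha, df=r), lr > chi2.ppf(1 - alpha, df=r))
```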

Page 12

Criteria for model fit

For an adequately fitting model, the deviance and the Pearson statistic are approximately χ²-distributed with n − p degrees of freedom.
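For reference, the two goodness-of-fit statistics typically meant here, in standard notation (restated from textbook definitions, not from the slide):

```latex
D = 2\,\phi \left[ l(y; y) - l(\hat{\mu}; y) \right],
\qquad
\chi^{2}_{P} = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^{2}}{v(\hat{\mu}_i)} .
```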

Page 13

Criteria for model selection

The Akaike information criterion (AIC) for model selection is defined generally as

AIC = −2 l(β̂) + 2p.

The Bayesian information criterion (BIC) is defined generally as

BIC = −2 l(β̂) + log(n) p.

If the model contains a dispersion parameter φ, its ML estimator should be substituted into the respective model and the total number of parameters should be increased to p + 1.
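A small sketch computing both criteria directly from the log-likelihood of a fitted statsmodels GLM; the simulated data are illustrative, and statsmodels also exposes its own aic/bic attributes with slightly different conventions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=200)
X = sm.add_constant(x)
y = rng.poisson(np.exp(0.2 + 0.5 * x))

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
n, p = X.shape                                 # sample size and number of parameters

aic = -2.0 * fit.llf + 2.0 * p
bic = -2.0 * fit.llf + np.log(n) * p
print(aic, bic)
```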

Page 14

Binary regression models

The response yᵢ takes only two possible values, denoted by 0 and 1, where πᵢ = P(yᵢ = 1) and 1 − πᵢ = P(yᵢ = 0) are the probabilities of 'success' and 'failure', respectively.

Page 15

Binary regression models III

The probability πᵢ is connected to the linear predictor through a relation of the form

πᵢ = h(ηᵢ) = h(β₀ + β₁xᵢ₁ + ··· + βₖxᵢₖ),

where the response function h is a strictly monotonically increasing cdf on the real line.

This ensures h(η) ∈ [0, 1], and the relation above can always be expressed in the form

ηᵢ = g(πᵢ),

with the link function g = h⁻¹ (the inverse of the response function).

Logit and probit models are the most widely used binary regression models.

Page 17

Probit model

In the probit model, the response function h is the standard normal cumulative distribution function Φ(·), that is,

π = Φ(η) = Φ(β₀ + β₁x₁ + ··· + βₖxₖ),

or equivalently

g(π) = probit(π) = Φ⁻¹(π) = η = β₀ + β₁x₁ + ··· + βₖxₖ.

A (minor) disadvantage is the required numerical evaluation of Φ in the maximum likelihood estimation of β.
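A sketch of fitting both models with statsmodels on simulated data; variable names and data are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=400)
X = sm.add_constant(x)
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
y = rng.binomial(1, pi_true)

logit_fit = sm.Logit(y, X).fit(disp=0)         # logit link: g = logit
probit_fit = sm.Probit(y, X).fit(disp=0)       # probit link: g = Phi^{-1}

print(logit_fit.params, probit_fit.params)
print(logit_fit.predict(X)[:5])                # estimated response probabilities pi_hat
```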

Page 18

Interpretation of the logit model

Summary:

The odds πᵢ/(1 − πᵢ) = P(yᵢ = 1|xᵢ)/P(yᵢ = 0|xᵢ) follow the multiplicative model

P(yᵢ = 1|xᵢ)/P(yᵢ = 0|xᵢ) = exp(β₀) · exp(xᵢ₁β₁) · ··· · exp(xᵢₖβₖ).

If, for example, xᵢ₁ increases by one unit to xᵢ₁ + 1, the following applies to the odds ratio:

[P(yᵢ = 1|xᵢ₁ + 1, ...) / P(yᵢ = 0|xᵢ₁ + 1, ...)] / [P(yᵢ = 1|xᵢ₁, ...) / P(yᵢ = 0|xᵢ₁, ...)] = exp(β₁).

β₁ > 0: odds ratio > 1,
β₁ < 0: odds ratio < 1.
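A quick numerical check of this identity, using made-up coefficient values (β₀, β₁, β₂ below are hypothetical, not estimates from the lecture):

```python
import numpy as np

beta = np.array([-0.5, 0.7, -0.2])             # hypothetical beta_0, beta_1, beta_2
x = np.array([1.0, 2.0, 3.0])                   # intercept term followed by x_1, x_2

def odds(covariates):
    eta = beta @ covariates                     # linear predictor
    pi = 1.0 / (1.0 + np.exp(-eta))             # logit response function
    return pi / (1.0 - pi)

x_shifted = x.copy()
x_shifted[1] += 1.0                             # increase x_1 by one unit
print(odds(x_shifted) / odds(x), np.exp(beta[1]))   # both equal exp(beta_1)
```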

Page 19

Fitting the logit model

The parameters of the logistic regression model are estimated by maximum likelihood.

The relationship between the estimated response probability and the values x₁, x₂, ..., xₖ can be expressed as

π̂ = exp(β̂₀ + β̂₁x₁ + ··· + β̂ₖxₖ) / (1 + exp(β̂₀ + β̂₁x₁ + ··· + β̂ₖxₖ)).

Page 20

Fitting the logit model II

The estimated value of the linear systematic component of the model for the i-th observation is η̂ᵢ = β̂₀ + β̂₁xᵢ₁ + ··· + β̂ₖxᵢₖ.

Page 21

Standard errors of parameter estimates

Following the estimation of the β-parameters in a logistic regression model, a standard error is needed for each estimate, se(β̂ⱼ), for j = 0, ..., k.

Approximate confidence intervals for the corresponding true value, βⱼ, are β̂ⱼ ± z(1 − α/2) · se(β̂ⱼ), where z(1 − α/2) is the (1 − α/2)-quantile of the standard normal distribution. These interval estimates throw light on the likely range of values of the parameter.
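A sketch of these Wald-type intervals with statsmodels, whose bse attribute holds the estimated standard errors; the data are simulated for illustration.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(6)
x = rng.normal(size=400)
X = sm.add_constant(x)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x))))

fit = sm.Logit(y, X).fit(disp=0)

alpha = 0.05
z = norm.ppf(1 - alpha / 2)                    # z(1 - alpha/2)
lower = fit.params - z * fit.bse
upper = fit.params + z * fit.bse
print(np.column_stack([lower, upper]))         # compare with fit.conf_int()
```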

Page 22

Count data

Count data are frequently observed when the number of events within a fixed time frame or frequencies in a contingency table have to be analyzed.

Sometimes a normal approximation can be sufficient, but in general, models that reflect the specific properties of count data are most appropriate.

The Poisson distribution is the simplest and most widely used choice.

Page 23

Log-linear Poisson model

The most widely used model for count data connects the rate λᵢ = E(yᵢ) of the Poisson distribution with the linear predictor ηᵢ = xᵢᵀβ via

λᵢ = exp(ηᵢ) = exp(β₀) exp(β₁xᵢ₁) · ··· · exp(βₖxᵢₖ),

or in log-linear form through

log(λᵢ) = ηᵢ = xᵢᵀβ = β₀ + β₁xᵢ₁ + ··· + βₖxᵢₖ.

The effect of covariates on the rate λ is thus exponentially multiplicative, similar to the effect on the odds π/(1 − π) in the logit model.

The effect on the logarithm of the rate is linear.
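A sketch of fitting the log-linear Poisson model with statsmodels; exponentiating the coefficients gives the multiplicative effects on the rate (data simulated for illustration).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=300)
X = sm.add_constant(x)
y = rng.poisson(np.exp(0.4 + 0.7 * x))          # log(lambda) = 0.4 + 0.7 x

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)                                # effects on log(lambda), the linear scale
print(np.exp(fit.params))                        # multiplicative effects on the rate lambda
```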

Page 24

The assumption of a Poisson distribution for the responses implies

λᵢ = E(yᵢ) = Var(yᵢ).

For similar reasons as in the binomial case, a significantly higher empirical variance is frequently observed in applications of Poisson regression.

For this reason, it is often useful to introduce an overdispersion parameter φ by assuming Var(yᵢ) = φλᵢ.
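A sketch of checking for overdispersion after a Poisson fit, using the Pearson-based dispersion estimate φ̂ = χ²_P / (n − p) via the pearson_chi2 attribute of a statsmodels GLM fit; the data and names are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.normal(size=500)
X = sm.add_constant(x)
# Negative binomial draws mimic overdispersed counts (Var > mean).
mu = np.exp(0.3 + 0.6 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
n_obs, p = X.shape
phi_hat = fit.pearson_chi2 / (n_obs - p)        # approx. 1 if the Poisson assumption holds
print(phi_hat)                                   # values well above 1 indicate overdispersion
```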
