
DOCUMENT INFORMATION

Basic information

Title: Three Levels of Spline Models: Understanding, Application and Beyond
Author: Boyi Guo
Institution: University of Alabama at Birmingham
Field: Biostatistics
Type: thesis
Year: 2021
City: Birmingham
Pages: 40
File size: 793.96 KB



Three Levels of Spline Models:

Understanding, Application and Beyond

Boyi Guo

Department of Biostatistics, University of Alabama at Birmingham

April 8th, 2021


Who Am I?

- 4th-year Ph.D. student in Biostatistics (BST) at UAB
- Dissertation: Bayesian high-dimensional additive models
- Background:
  - Balanced methodology and collaboration
  - Experienced R programmer and package creator
- Graduating in about 1 year; looking for a
  - Faculty position in biostatistics
  - Postdoc in methodology development for high-dimensional data and causal inference


Overview


- Non-linear Effect Modifier
- Non-proportional Hazard Models
- Generalized Additive Mixed Model


Objectives


- To review the basic concepts of splines
- To raise awareness of advanced spline applications

Disclaimer

- Minimal theoretical justification
- No discussion of model-fitting algorithms or software implementations


Understanding


Previous Solutions:

- Variable categorization: e.g., using quartiles of a continuous variable in a model
  - Assumes all subjects within a group share the same risk/effect
  - Loss of data fidelity
- Polynomial regression:

  $$y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_m X^m + \epsilon$$

  - Precision issues: e.g., if X is a blood pressure measurement, $X^3$ would be extremely large
  - Goodness of fit: deciding which polynomial terms should be included
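The following minimal R sketch makes the precision issue concrete. It uses simulated blood-pressure-like values, so the data are hypothetical. It compares the 2-norm condition number of the raw design matrix (1, X, X^2, X^3) with that of an orthogonal-polynomial design from `stats::poly()`. The orthogonal version is included only for contrast and is not part of the original slide.

```r
# Hypothetical systolic blood pressure values in mmHg (simulated, not real data)
set.seed(1)
bp <- rnorm(200, mean = 130, sd = 15)

# Raw polynomial design matrix: 1, X, X^2, X^3
X_raw <- cbind(1, bp, bp^2, bp^3)
range(X_raw[, 4])   # the X^3 column reaches into the millions

# Orthogonal polynomial design matrix from stats::poly(), for comparison
X_orth <- cbind(1, poly(bp, degree = 3))

# Exact 2-norm condition numbers: very large values signal numerical instability
kappa(X_raw, exact = TRUE)
kappa(X_orth, exact = TRUE)
```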


- Can be easily incorporated into linear regression, generalized linear regression, and Cox regression as regression splines
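As a rough illustration, the same B-spline basis can be dropped into `lm()`, `glm()` and `survival::coxph()` without changes. This sketch uses the `lung` data that ships with the survival package. The outcomes were picked only to show the syntax; they are not a meaningful analysis.

```r
library(splines)   # bs(): B-spline basis
library(survival)  # coxph(), Surv(), and the 'lung' example data

# One cubic B-spline basis for age: bs(age, df = 4) gives 4 columns (1 interior knot)
fit_lm  <- lm(wt.loss ~ bs(age, df = 4), data = lung)                # linear regression
fit_glm <- glm(I(status == 2) ~ bs(age, df = 4), family = binomial,
               data = lung)                                           # logistic regression
fit_cox <- coxph(Surv(time, status) ~ bs(age, df = 4), data = lung)   # Cox regression

length(coef(fit_lm))   # intercept + 4 spline coefficients
length(coef(fit_cox))  # 4 spline coefficients; the Cox model has no intercept
```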


Spline Components

- Order/degree of the polynomial function, m
  - Normally m = 3, i.e., a cubic spline, is sufficient
- An increasing sequence of breakpoints τ
  - a.k.a. knots, where the piecewise functions join
  - e.g., k ≡ |τ| = 5, equally spaced
- Continuity conditions at the knots, v
  - to control the smoothness between pieces
  - e.g., continuous up to the second derivative for a cubic spline
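Here is a small sketch of how these components map onto `splines::bs()`. It assumes k = 5 equally spaced interior knots on a made-up grid of x values. With degree m = 3 and no intercept column, the basis has k + m columns. The continuity conditions of a cubic spline are built into the B-spline construction.

```r
library(splines)

x <- seq(0, 10, length.out = 101)   # made-up grid of X values

# k = 5 equally spaced interior knots; boundary knots default to range(x)
tau <- seq(min(x), max(x), length.out = 7)[2:6]

# Cubic (m = 3) B-spline basis with those knots
B <- bs(x, knots = tau, degree = 3)
ncol(B)   # k + m columns

# With intercept = TRUE there is one extra column, and each row sums to 1
B1 <- bs(x, knots = tau, degree = 3, intercept = TRUE)
ncol(B1)
range(rowSums(B1))
```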


Toy Example

A spline function of the variable X, f(X), of order m = 0 with k = 2 knots (τ1 = 1, τ2 = 5) and no continuity condition
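The toy example can be fitted directly. An order-0 spline with knots at 1 and 5 and no continuity condition is a step function, with one constant per interval. The sketch below builds that basis with `cut()` rather than `bs()` and fits it to simulated data. The true step heights 2, 4 and 1 are hypothetical. The least-squares level on each piece is simply the mean of y within that interval.

```r
set.seed(2021)
x <- runif(300, min = 0, max = 8)

# Hypothetical true step function: 2 below tau1 = 1, 4 on [1, 5), 1 from tau2 = 5 onwards
f_true <- function(x) ifelse(x < 1, 2, ifelse(x < 5, 4, 1))
y <- f_true(x) + rnorm(300, sd = 0.3)

# Order-0 basis: one indicator per interval [-Inf, 1), [1, 5), [5, Inf); no continuity imposed
piece <- cut(x, breaks = c(-Inf, 1, 5, Inf), right = FALSE)
fit <- lm(y ~ 0 + piece)   # no global intercept: each coefficient is the level on one piece

coef(fit)                  # estimated constants, close to the true heights
```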


Visual Demonstration

Figure from Hastie, Tibshirani, and Friedman (2009), p. 142


Cubic Spline

- Cubic polynomial in each piece, i.e., m = 3
- E.g., truncated power basis with 3 knots at τ1, τ2, τ3:

  $$f(X) = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \beta_4 (X - \tau_1)_+^3 + \beta_5 (X - \tau_2)_+^3 + \beta_6 (X - \tau_3)_+^3,$$

  where $(u)_+ = \max(u, 0)$
- Continuous up to the second derivative
  - The smoothest possible interpolant
- Alternative representations
  - B-spline bases for stable computation
  - Natural cubic spline for linearity beyond the boundary knots ($f''(X) = f'''(X) = 0$ there)
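The sketch below uses simulated data with knots assumed at 2.5, 5 and 7.5. It shows that the truncated power basis above and the B-spline basis from `splines::bs()` span the same cubic spline space, so they give identical fitted values. The B-spline design matrix, however, is far better conditioned. The last lines use `splines::ns()` to check the natural-spline property: predictions beyond the boundary knots lie on a straight line. `tp_basis()` is a hypothetical helper written only for this example.

```r
library(splines)

set.seed(8)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.2)   # hypothetical data
tau <- c(2.5, 5, 7.5)                # three knots (positions chosen for illustration)

# Hypothetical helper: truncated power basis X, X^2, X^3, (X - tau_j)_+^3
tp_basis <- function(x, tau) {
  cbind(x, x^2, x^3, sapply(tau, function(t) pmax(x - t, 0)^3))
}

fit_tp <- lm(y ~ tp_basis(x, tau))                 # beta_0 ... beta_6 as in the formula
fit_bs <- lm(y ~ bs(x, knots = tau, degree = 3))   # same space, B-spline basis

# Same cubic spline space, so the fitted curves agree up to rounding error
all.equal(unname(fitted(fit_tp)), unname(fitted(fit_bs)))

# ...but the B-spline design matrix is much better conditioned
kappa(model.matrix(fit_tp), exact = TRUE)
kappa(model.matrix(fit_bs), exact = TRUE)

# Natural cubic spline: linear beyond the boundary knots, so second differences vanish
fit_ns <- lm(y ~ ns(x, knots = tau))
pred_out <- predict(fit_ns, newdata = data.frame(x = c(11, 12, 13)))
diff(diff(pred_out))
```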


Cubic Spline

Figure from Hastie, Tibshirani, and Friedman (2009), p. 143


Software Implementation

Two-step procedure

- Create the 'design' matrix of the spline function, B(X)
- Fit the preferred model, including B(X) as covariates/predictors

library(splines)  # Package for B-splines

set.seed(1)       # Hypothetical example data so the code runs as written
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)

x_spline <- bs(x, degree = 3,  # cubic polynomial
               df = 8)         # df - degree = 5 interior knots
glm(y ~ x_spline)              # Fitting the spline model

# Equivalently
glm(y ~ bs(x, degree = 3, df = 8))


Variability Band

- A delicate statistical problem
  - Confidence about the spline function vs. about point estimates
- Most commonly used: 95% pointwise confidence intervals
  - Can be calculated using statistical contrasts for regression splines
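Here is a minimal sketch of the contrast-based calculation for a linear model on simulated data. Each grid point x0 defines a contrast row c = (1, B(x0)). The estimate is c'β̂, and its standard error is sqrt(c' V c), where V is the estimated covariance matrix of β̂. `predict(..., se.fit = TRUE)` performs the same computation, and the sketch checks the two against each other.

```r
library(splines)

set.seed(11)
x <- runif(150, 0, 1)
y <- sin(2 * pi * x) + rnorm(150, sd = 0.3)   # hypothetical data
fit <- lm(y ~ bs(x, df = 6))

# Contrast rows c = (1, B(x0)) on a grid, reusing the fitted knots via the model terms
grid <- data.frame(x = seq(0.05, 0.95, length.out = 50))
tt   <- delete.response(terms(fit))
C    <- model.matrix(tt, model.frame(tt, grid))

est  <- drop(C %*% coef(fit))                  # point estimates c' beta-hat
se   <- sqrt(rowSums((C %*% vcov(fit)) * C))   # sqrt(c' V c) for every row at once
crit <- qt(0.975, df = df.residual(fit))       # t quantile for a 95% interval
band <- data.frame(x = grid$x, est = est,
                   lower = est - crit * se, upper = est + crit * se)

# predict() carries out the same contrast calculation
p <- predict(fit, newdata = grid, se.fit = TRUE)
all.equal(unname(est), unname(p$fit))
all.equal(unname(se), unname(p$se.fit))
```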


Hypothesis Testing

- Two hypothesis tests
  - Whether the non-linear terms are necessary:
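For the non-linearity test, one common approach in a linear model is a nested-model F test. The cubic B-spline space contains every linear function, so `lm(y ~ x)` is nested in `lm(y ~ bs(x, df = 5))`, and `anova()` tests the extra non-linear terms. The sketch below uses simulated data; the choice df = 5 is arbitrary.

```r
library(splines)

set.seed(21)
x <- runif(150, 0, 1)
y <- sin(2 * pi * x) + rnorm(150, sd = 0.3)   # hypothetical, clearly non-linear data

fit_lin <- lm(y ~ x)               # H0: f(X) is linear in X
fit_spl <- lm(y ~ bs(x, df = 5))   # H1: cubic spline; its span contains every linear function

# Nested-model F test of the extra (non-linear) spline terms
tab <- anova(fit_lin, fit_spl)
tab
p_nonlinear <- tab[["Pr(>F)"]][2]
```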


Rule of Thumb

- Cubic splines for a smooth interpolant
- B-splines for computational stability
- 3-5 equally spaced knots
- Transform variables with extreme values for computational stability
  - e.g., prefer f(log(X)) over f(X) when modeling CRP
- Examine the influence of outliers on a statistically significant non-linear relationship
- Survival models
  - Place knots so that each interval contains an equal number of events
  - Defer to Sleeper and Harrington (1990) for practical guidance
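One way to apply the survival-model rule to a covariate spline is to place the interior knots at quantiles of the covariate among subjects who had the event. Each interval then holds about the same number of events. The sketch below uses the `lung` data from the survival package, where death is coded `status == 2`. Using quartiles is an illustrative choice.

```r
library(splines)
library(survival)

# Interior knots at the quartiles of age among patients who died
event_age <- lung$age[lung$status == 2]
age_knots <- quantile(event_age, probs = c(0.25, 0.5, 0.75))

# Events per age interval: roughly equal by construction (ties in age can unbalance slightly)
counts <- table(cut(event_age, breaks = c(-Inf, age_knots, Inf)))
counts

fit <- coxph(Surv(time, status) ~ bs(age, knots = age_knots), data = lung)
fit
```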


Application


Varying Coefficient

To model a non-constant effect of the variable Z as a function of another variable X:

$$E(y) = f(X) Z,$$

where f(X) is the varying coefficient of Z

- Example: the statistical interaction βXZ, where f(X) = βX
- What if the slope of the effect is not constant across the domain of X?
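The sketch below fits a spline varying coefficient to simulated data, assuming a hypothetical true f(X) = 1 + sin(2πX). The linear interaction fixes f(X) = b0 + b1 X. Multiplying Z by a B-spline basis instead lets the slope of Z change non-linearly with X.

```r
library(splines)

set.seed(3)
n <- 400
x <- runif(n, 0, 1)
z <- rnorm(n)
f_true <- function(x) 1 + sin(2 * pi * x)   # hypothetical varying coefficient of Z
y <- f_true(x) * z + rnorm(n, sd = 0.3)     # E(y) = f(X) Z

# Statistical interaction: f(X) = b0 + b1 X
fit_lin <- lm(y ~ 0 + z + z:x)
# Spline varying coefficient: f(X) = b0 + B(X)' b
fit_vc  <- lm(y ~ 0 + z + z:bs(x, df = 4))

# The estimate of f(x0) is the prediction at z = 1
grid  <- data.frame(x = seq(0.05, 0.95, by = 0.05), z = 1)
f_hat <- predict(fit_vc, newdata = grid)
cor(f_hat, f_true(grid$x))
AIC(fit_lin, fit_vc)   # lower AIC favours the spline version for these data
```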


Non-linear Effect Modification

[Figure: two panels with X on the horizontal axis]


Non-linear Effect Modification

$$E(y) = f(X) + f'(X) Z = \beta_{Z=0}^T B(X) + \beta_{Z=1}^T B(X) \cdot Z$$

- f(X) models the effect of X when Z = 0
- f'(X)Z models the modifying effect of Z at different values of X
- f'(X) is the varying coefficient of Z; using a non-linear function allows a non-constant slope
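Here is a minimal sketch of this model for a binary Z on simulated data; both true curves are made up. In R's formula language, `z * bs(x)` expands to `z + bs(x) + z:bs(x)`, so the Z main effect acts as an intercept column in the modifying function. The estimated modifier is the difference between predictions at Z = 1 and Z = 0. A nested F test then asks whether the effect of Z varies with X.

```r
library(splines)

set.seed(4)
n <- 600
x <- runif(n, 0, 1)
z <- rbinom(n, 1, 0.5)                  # hypothetical binary modifier Z
f_base <- function(x) sin(2 * pi * x)   # effect of X when Z = 0 (assumed)
f_mod  <- function(x) 2 * x^2           # modifying effect of Z (assumed)
y <- f_base(x) + f_mod(x) * z + rnorm(n, sd = 0.3)

# z * bs(...) = z + bs(x) + z:bs(x): B(X) enters on its own and multiplied by Z
fit <- lm(y ~ z * bs(x, df = 5))

# Estimated modifier on a grid: prediction at Z = 1 minus prediction at Z = 0
grid <- seq(0.05, 0.95, by = 0.05)
pred_at <- function(zval) predict(fit, newdata = data.frame(x = grid, z = zval))
mod_hat <- pred_at(1) - pred_at(0)
cor(mod_hat, f_mod(grid))

# Does the effect of Z change with X? Compare with a constant shift for Z
mod_test <- anova(lm(y ~ z + bs(x, df = 5)), fit)
mod_test
```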


Non-linear Effect Modification

- Assumptions to consider
  - Should f(X) be linear or non-linear?
  - Should f(X) use the same basis as f'(X)?
  - Should f(X) have the same level of complexity as f'(X)?
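Each of these questions corresponds to a different model formula. The sketch below fits three variants to the same simulated data and compares them by AIC. The candidate formulas and df values are illustrative assumptions, not recommendations.

```r
library(splines)

set.seed(5)
n <- 600
x <- runif(n, 0, 1)
z <- rbinom(n, 1, 0.5)                                    # hypothetical binary modifier
y <- sin(2 * pi * x) + 2 * x^2 * z + rnorm(n, sd = 0.3)   # hypothetical truth

candidates <- list(
  linear_f   = y ~ x + z + z:bs(x, df = 4),              # effect of X linear, modifier a spline
  same_basis = y ~ z * bs(x, df = 4),                    # both functions share one basis
  richer_f   = y ~ z + bs(x, df = 8) + z:bs(x, df = 4)   # effect of X more flexible than modifier
)
fits <- lapply(candidates, lm)
aics <- sapply(fits, AIC)
aics   # lower is better; a linear effect of X is clearly wrong for these data
```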


Mixed Model

To model a non-linear fixed effect while accounting for random effects

- Good for longitudinal or multi-center studies
- Easy to implement: include the design matrix B(X) in the fixed effects
- gamm() in the R package mgcv
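The slide points to `gamm()` in mgcv. As a lighter sketch of the same idea, the code below puts a B-spline basis in the fixed effects of `nlme::lme()`, with a random intercept per center. The data are simulated from 10 hypothetical centers.

```r
library(splines)
library(nlme)   # lme(); a recommended package shipped with R

set.seed(6)
n_center <- 10
n_per    <- 40
center <- factor(rep(seq_len(n_center), each = n_per))
u <- rnorm(n_center, sd = 0.5)   # hypothetical center-level random intercepts
x <- runif(n_center * n_per, 0, 1)
y <- sin(2 * pi * x) + u[center] + rnorm(n_center * n_per, sd = 0.3)
dat <- data.frame(y, x, center)

# Spline basis B(X) in the fixed effects, random intercept for each center
fit <- lme(y ~ bs(x, df = 4), random = ~ 1 | center, data = dat)
fixef(fit)     # intercept + 4 spline coefficients
VarCorr(fit)   # between-center and residual variance components
```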


Beyond


Spline Surface

- Model the non-linear interaction between two continuous variables
  - Thin-plate splines, tensor product splines
    - Thin-plate splines are scale-sensitive
    - Tensor product splines are scale-invariant
- Dealing with oversmoothing across boundaries
  - Soap film smoothing
- Application:
  - Loop, M. S., Howard, G., de Los Campos, G., Al-Hamdan, M. Z., Safford, M. M., Levitan, E. B., & McClure, L. A. (2017). Heat maps of hypertension, diabetes mellitus, and smoking in the continental United States. Circulation: Cardiovascular Quality and Outcomes, 10(1), e003350.
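The sketch below builds a tensor product spline by hand on simulated data. The hypothetical helper `tensor_basis()` multiplies every column of one marginal B-spline basis by every column of the other. Each marginal basis is built separately, so rescaling one variable leaves the fitted surface unchanged. Here x1 is multiplied by 1000, as a change of units would do. A thin-plate spline works with distances in the joint (x1, x2) space and lacks this property; it is not shown here.

```r
library(splines)

set.seed(7)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) * cos(2 * pi * x2) + rnorm(n, sd = 0.2)   # hypothetical surface

# Hypothetical helper: row-wise tensor product of two marginal bases,
# i.e. every column of B1 multiplied by every column of B2
tensor_basis <- function(B1, B2) {
  B1[, rep(seq_len(ncol(B1)), times = ncol(B2)), drop = FALSE] *
    B2[, rep(seq_len(ncol(B2)), each = ncol(B1)), drop = FALSE]
}

# Marginal cubic bases with intercept = TRUE, so the product also spans main effects
T_orig <- tensor_basis(bs(x1, df = 5, intercept = TRUE), bs(x2, df = 5, intercept = TRUE))
ncol(T_orig)   # 5 x 5 columns
fit_orig <- lm(y ~ 0 + T_orig)

# Rescale x1: its knots rescale with it, so the basis itself is unchanged
T_scaled <- tensor_basis(bs(1000 * x1, df = 5, intercept = TRUE), bs(x2, df = 5, intercept = TRUE))
fit_scaled <- lm(y ~ 0 + T_scaled)

# Scale invariance: the fitted surfaces agree up to rounding
all.equal(unname(fitted(fit_orig)), unname(fitted(fit_scaled)))
```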


Smoothing Spline

- Motivation:
  - To simplify decisions about the knots
- Idea:
  - Set the number of knots to a large value (k = 25, 40, or N)
  - Use variable selection methods, specifically penalized models, to decide the smoothness of the spline


$$\min_f \sum_{i=1}^{N} \{y_i - f(x_i)\}^2 + \lambda \int \{f''(x)\}^2 \, dx$$

- λ is a tuning parameter, selected via (generalized) cross-validation
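`stats::smooth.spline()` fits a penalized criterion of this type. The sketch below applies it to simulated data. `all.knots = TRUE` places a knot at every unique x (k = N), and with the default `cv = FALSE` the smoothing parameter is chosen by generalized cross-validation.

```r
set.seed(9)
x <- runif(200, 0, 1)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)   # hypothetical data

# Knot at every unique x; smoothing parameter chosen by GCV (default cv = FALSE)
fit <- smooth.spline(x, y, all.knots = TRUE)
fit$lambda   # selected tuning parameter
fit$df       # effective degrees of freedom of the fitted curve

# Evaluate the fitted curve on a grid
curve_hat <- predict(fit, x = seq(0, 1, by = 0.1))
```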


Statistical Complications

- Degrees of freedom must be estimated because of shrinkage
- Harder to conduct hypothesis tests and to calculate CIs
- More decisions when modeling effect modification
  - Same smoothness for both spline functions?
  - If the same, how to estimate the shared smoothness?
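The following sketch shows why the degrees of freedom must be estimated. Under shrinkage, a common definition of effective df is the trace of the smoother (hat) matrix. The sketch pairs a P-spline style second-order difference penalty with a 20-column B-spline basis. This penalty is an illustrative choice, not necessarily the one any particular package uses. The trace equals 20 with no penalty and falls as λ grows.

```r
library(splines)

set.seed(10)
x <- sort(runif(200))

# Rich cubic B-spline basis (20 columns) and a second-order difference penalty
# on neighbouring coefficients (P-spline style; an illustrative choice)
B <- bs(x, df = 20, intercept = TRUE)
D <- diff(diag(ncol(B)), differences = 2)

# Effective df = trace of the hat matrix B (B'B + lambda D'D)^{-1} B'
#              = trace of (B'B + lambda D'D)^{-1} B'B
edf <- function(lambda) {
  A <- crossprod(B) + lambda * crossprod(D)
  sum(diag(solve(A, crossprod(B))))
}

lambdas <- c(0, 1, 10, 100, 1e4)
edfs <- sapply(lambdas, edf)
round(edfs, 2)   # 20 with no shrinkage, then decreasing as lambda grows
```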


Function Selection

- Question of interest
  - Whether a variable X has an effect on the outcome Y
  - High-dimensional data analysis, e.g., EHR, genomics
- Solutions
  - Stepwise function selection
  - Group penalized models
  - Bayesian hierarchical models
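Here is a minimal sketch of stepwise function selection on simulated data in which only x1 affects y. Each `bs(...)` term represents one smooth function. `step()` adds or drops a term together with all of its basis columns, so the selection operates on functions rather than on individual coefficients.

```r
library(splines)

set.seed(12)
n <- 300
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + rnorm(n, sd = 0.3)   # hypothetical: only x1 matters

# One smooth function per variable; step() treats each bs(...) term as a unit
full <- lm(y ~ bs(x1, df = 4) + bs(x2, df = 4) + bs(x3, df = 4), data = dat)
selected <- step(full, trace = 0)   # AIC-based stepwise search starting from the full model
attr(terms(selected), "term.labels")
```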


Conclusion


- Reviewed the concepts of splines
- New insights into advanced spline models
- The same set of variables can lead to many models with different assumptions
  - Fit many models and compare them
  - Explore the inconsistencies
- Balance between interpolation and prediction
  - "Black box" models for improved prediction
- Consult a statistician when you are not comfortable working with spline models


Great Book

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R. CRC Press.

- Chapter 7 for examples


Q & A


Reference


Gray, Robert J. 1992. "Flexible Methods for Analyzing Survival Data Using Splines, with Applications to Breast Cancer Prognosis." Journal of the American Statistical Association 87 (420): 942–51. https://doi.org/10.1080/01621459.1992.10476248.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media.

Loop, Matthew Shane, George Howard, Gustavo de Los Campos, Mohammad Z. Al-Hamdan, Monika M. Safford, Emily B. Levitan, and Leslie A. McClure. 2017. "Heat Maps of Hypertension, Diabetes Mellitus, and Smoking in the Continental United States." Circulation: Cardiovascular Quality and Outcomes 10 (1): e003350.

Sleeper, Lynn A., and David P. Harrington. 1990. "Regression Splines in the Cox Model with Application to Covariate Effects in Liver Disease." Journal of the American Statistical Association 85 (412): 941–49.
