Three Levels of Spline Models: Understanding, Application and Beyond
Boyi Guo
Department of Biostatistics, University of Alabama at Birmingham
April 8th, 2021
Who Am I?
- 4th-year Ph.D. student in Biostatistics (BST) at UAB
- Dissertation: Bayesian high-dimensional additive models
- Background:
  - Balanced methodology & collaboration
  - Experienced R programmer & package creator
- Graduating in about 1 year; looking for:
  - Faculty position in biostatistics
  - Postdoc in methodology development for high-dimensional (HD) data and causal inference
Overview
- Non-linear Effect Modifier
- Non-proportional Hazard Models
- Generalized Additive Mixed Model
Objectives
- To review the basic concepts of splines
- To raise awareness of advanced spline applications
Disclaimer
- Minimal theoretical justification
- No discussion of model-fitting algorithms or software implementations
Understanding
Previous Solutions:
- Variable categorization, e.g., using quartiles of a continuous variable in a model
  - Assumes all subjects within a group share the same risk/effect
  - Loss of data fidelity
- Polynomial regression:
  $y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_m X^m + \epsilon$
  - Precision issues, e.g., if X is a blood pressure measurement, X^3 would be extremely large
  - Goodness of fit: deciding which order of polynomial terms should be included
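To make the precision issue concrete, the sketch below uses a hypothetical grid of blood pressure values (90 to 180 mmHg, an assumption for illustration) and compares the condition number of a raw cubic polynomial design with that of an orthogonal polynomial design from base R's `poly()`. The helper `cond_number()` is hypothetical, defined here only for the comparison.

```r
# Hypothetical blood pressure values (mmHg), for illustration only
bp <- seq(90, 180, length.out = 100)

# Hypothetical helper: 2-norm condition number from the singular values
cond_number <- function(M) {
  s <- svd(M)$d
  max(s) / min(s)
}

# Raw polynomial design: 1, X, X^2, X^3 (columns on wildly different scales)
X_raw <- cbind(1, bp, bp^2, bp^3)

# Orthogonal polynomial design: poly() returns orthonormal columns
# that are also orthogonal to the intercept column
X_orth <- cbind(1, poly(bp, degree = 3))

kappa_raw  <- cond_number(X_raw)
kappa_orth <- cond_number(X_orth)

# A large condition number signals loss of numerical precision
print(c(raw = kappa_raw, orthogonal = kappa_orth))
```

Orthogonal polynomials fix the numerical problem but not the goodness-of-fit question of which order to use.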
- Splines can be easily incorporated into linear regression, generalized linear regression, and Cox regression, as regression splines
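As a minimal sketch, assuming the `survival` package (a recommended package distributed with R) and its built-in `lung` dataset, a natural cubic spline of age can be placed directly in a Cox model formula; the choice of `df = 4` is arbitrary here. The same `ns(...)` term works unchanged inside `lm()` or `glm()`.

```r
library(splines)   # ns(): natural cubic spline basis
library(survival)  # coxph(), Surv(), and the lung dataset

# Cox model with a natural cubic spline of age (4 basis columns);
# in lung, status is coded 1 = censored, 2 = dead
fit_cox <- coxph(Surv(time, status) ~ ns(age, df = 4), data = lung)

# The spline basis enters the linear predictor like any other covariates
print(coef(fit_cox))
```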
Spline Components
- Order/degree of the polynomial function, m
  - Normally, m = 3 (i.e., a cubic spline) is sufficient
- An increasing sequence of breakpoints τ
  - a.k.a. knots, where the piecewise functions join
  - e.g., k ≡ |τ| = 5 equally spaced knots
- Continuity conditions at the knots, v
  - To control the smoothness between pieces
  - e.g., continuity up to the second derivative for a cubic spline
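The components above map onto arguments of `splines::bs()`: `degree` plays the role of m and `knots` the role of τ. With distinct knots, `bs()` produces a degree-m spline whose derivatives up to order m - 1 are continuous, so the continuity condition is fixed by the degree rather than set separately. A sketch with k = 5 equally spaced interior knots on a hypothetical covariate in [0, 10]:

```r
library(splines)

x <- seq(0, 10, length.out = 101)          # hypothetical covariate

m   <- 3                                   # degree of each polynomial piece
tau <- seq(0, 10, length.out = 7)[2:6]     # k = 5 equally spaced interior knots

B <- bs(x, degree = m, knots = tau)

# Without an intercept column, bs() returns degree + k basis columns
print(dim(B))               # 101 rows, 3 + 5 = 8 columns
print(attr(B, "knots"))     # the interior knots tau
```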
Toy Example
A spline function of the variable X, f(X), of order m = 0 with k = 2 knots (τ₁ = 1, τ₂ = 5) and no continuity condition
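The toy example is a piecewise-constant (step) function: with knots at 1 and 5 and no continuity condition, f(X) takes one constant value on each of three intervals. A sketch on simulated data, where the data-generating curve and the choice of closing each interval on the left are assumptions; indicator coding via `cut()` is used to build the degree-0 basis.

```r
set.seed(1)
x <- runif(200, 0, 8)                 # simulated covariate
y <- sin(x) + rnorm(200, sd = 0.3)    # simulated outcome

# Degree-0 spline with knots tau1 = 1, tau2 = 5: one indicator per piece,
# intervals [-Inf, 1), [1, 5), [5, Inf)
piece <- cut(x, breaks = c(-Inf, 1, 5, Inf), right = FALSE)
fit0  <- lm(y ~ piece)

# The fitted function is a step function: one level per piece,
# equal to the mean of y within that piece
print(tapply(fitted(fit0), piece, mean))
```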
Visual Demonstration
[Figure from Hastie, Tibshirani, and Friedman (2009), p. 142]
Cubic Spline
- Cubic polynomial in each piecewise function, i.e., m = 3
- E.g., truncated power basis with 3 knots at τ₁, τ₂, τ₃:
  $f(X) = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \beta_4 (X - \tau_1)_+^3 + \beta_5 (X - \tau_2)_+^3 + \beta_6 (X - \tau_3)_+^3$, where $(u)_+ = \max(u, 0)$
- Continuous up to the second derivative
  - The smoothest possible interpolant
- Alternative representations
  - B-spline basis for stable computation
  - Natural cubic spline for linearity beyond the boundary knots ($f''(X) = 0$ outside the boundary knots)
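To check that the truncated power basis and the B-spline basis describe the same cubic spline space, the sketch below builds the truncated power basis by hand and compares fitted values with `bs()` using the same three knots on simulated data. The helper `trunc_power()`, the knot locations and the simulated data are assumptions for illustration; a natural spline fit with `ns()` is included to show its smaller parameter count.

```r
library(splines)

# Hypothetical helper: truncated power basis X, X^2, X^3, (X - tau_j)_+^3
trunc_power <- function(x, knots) {
  tp <- sapply(knots, function(k) pmax(x - k, 0)^3)
  cbind(x, x^2, x^3, tp)
}

set.seed(2)
x <- runif(300, 0, 10)
y <- sin(x) + rnorm(300, sd = 0.3)
knots <- c(2.5, 5, 7.5)   # tau_1, tau_2, tau_3 (arbitrary choice)

fit_tp <- lm(y ~ trunc_power(x, knots))              # beta_0, ..., beta_6
fit_bs <- lm(y ~ bs(x, knots = knots, degree = 3))   # same space, B-spline basis
fit_ns <- lm(y ~ ns(x, knots = knots))               # linear beyond boundary knots

# Same function space, so the fitted curves agree up to rounding error
print(max(abs(fitted(fit_tp) - fitted(fit_bs))))

# Number of coefficients including the intercept: cubic 7, natural 5
print(c(cubic = length(coef(fit_bs)), natural = length(coef(fit_ns))))
```

The B-spline version is the one to prefer in practice; the truncated power basis is shown only because its coefficients match the formula term by term.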
Cubic Spline
[Figure from Hastie, Tibshirani, and Friedman (2009), p. 143]
Software Implementation
Two-step procedure
- Create the 'design' matrix of the spline function B(X)
- Fit the preferred model including B(X) as covariates/predictors
library(splines)                 # base-R package for B-splines (bs) and natural splines (ns)
x_spline <- bs(x, degree = 3,    # cubic polynomial pieces
               df = 8)           # df - degree = 5 interior knots
glm(y ~ x_spline)                # fit the spline model
# Equivalently, with the basis built inside the formula
# (this form also lets predict() rebuild the basis for new data)
glm(y ~ bs(x, degree = 3, df = 8))
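A self-contained version of the two-step procedure on simulated data (the data-generating function and sample size are assumptions). It shows that `df = 8` with `degree = 3` yields 5 interior knots, which `bs()` places by default at quantiles of x rather than equally spaced, and that both forms give the same fit.

```r
library(splines)

set.seed(3)
x <- runif(250, 0, 10)
y <- sin(x) + 0.1 * x + rnorm(250, sd = 0.3)

# Step 1: design matrix B(X) of the spline function
x_spline <- bs(x, degree = 3, df = 8)
print(attr(x_spline, "knots"))    # 5 interior knots at quantiles of x

# Step 2: fit the model with B(X) as predictors
fit_a <- glm(y ~ x_spline)
fit_b <- glm(y ~ bs(x, degree = 3, df = 8))

# Both forms give identical fitted values
print(all.equal(fitted(fit_a), fitted(fit_b)))

# Building the basis inside the formula lets predict() reuse the
# original knots on new data
new_x <- data.frame(x = c(1, 5, 9))
print(predict(fit_b, newdata = new_x))
```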
Variability Band
- A delicate statistical problem
  - Confidence about the spline function vs. point estimates
- Most commonly used: 95% pointwise confidence intervals
  - Can be calculated using statistical contrasts for regression splines
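The pointwise interval at each x is a linear contrast of the coefficients: with b(x) the row of the design matrix at x and V the estimated covariance matrix of the coefficients, se(x) = sqrt(b(x)ᵀ V b(x)). A sketch on simulated data, where the grid, `df = 6` and the normal quantile are assumptions; the result can be checked against the standard errors from `predict()`.

```r
library(splines)

set.seed(4)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)
fit <- lm(y ~ bs(x, df = 6))

# Contrast rows b(x) on a grid, reusing the fitted knots via the model terms
grid  <- data.frame(x = seq(0.5, 9.5, length.out = 50))
X_new <- model.matrix(delete.response(terms(fit)), grid)

est <- drop(X_new %*% coef(fit))
se  <- sqrt(rowSums((X_new %*% vcov(fit)) * X_new))   # diag of X V X^T

# 95% pointwise confidence interval (normal approximation)
z    <- qnorm(0.975)
band <- data.frame(x = grid$x, est = est,
                   lower = est - z * se, upper = est + z * se)
print(head(band))
```

Note that this band is pointwise: it does not give 95% coverage for the whole curve simultaneously.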
Hypothesis Testing
- Two hypothesis tests
  - Whether the non-linear terms are necessary:
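One common way to test whether the non-linear terms are needed, sketched here as an assumption rather than the author's specific procedure: a linear model is nested in a cubic B-spline model with an intercept, so a nested-model F-test compares them (for GLMs, a likelihood-ratio test plays the same role). Simulated data:

```r
library(splines)

set.seed(5)
x <- runif(200, 0, 2 * pi)
y <- sin(x) + rnorm(200, sd = 0.3)

fit_linear <- lm(y ~ x)               # linear effect only
fit_spline <- lm(y ~ bs(x, df = 6))   # contains the linear fit as a special case

# H0: the non-linear terms are not needed
test <- anova(fit_linear, fit_spline)
print(test)
p_nonlinear <- test[["Pr(>F)"]][2]
```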
Rule of Thumb
- Cubic splines for a smooth interpolant
- B-splines for computational stability
- 3-5 equally spaced knots
- Transform variables with extreme values for computational stability
  - e.g., prefer f(log(X)) over f(X) when modeling CRP
- Examine outliers' effects on a statistically significant non-linear relationship
- Survival models
  - Place knots so that each interval contains an equal number of events
  - Defer to Sleeper and Harrington (1990) for practical guidance
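One reading of the "equal number of events" rule, sketched with the `survival` package's `lung` data (using age and 3 interior knots are assumptions): place the interior knots at quantiles of X computed only among subjects who had the event, so each interval between knots holds roughly the same number of events.

```r
library(splines)
library(survival)

# Ages of subjects who died (status == 2 in lung)
age_events <- lung$age[lung$status == 2]

# 3 interior knots at the quartiles of age among events
knots <- quantile(age_events, probs = c(0.25, 0.50, 0.75))

fit <- coxph(Surv(time, status) ~ bs(age, knots = knots, degree = 3),
             data = lung)

# Events per interval between knots are roughly balanced
# (ties in integer-valued age can make the counts unequal)
print(table(cut(age_events, breaks = c(-Inf, knots, Inf))))
print(fit)
```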
Application
Varying Coefficient
To model a non-constant effect of the variable Z as a function of another variable X:
$E(y) = f(X) Z$, where f(X) is the varying coefficient of Z
- Example: the statistical interaction βXZ, where f(X) = βX
- What if the slope of the effect is not constant across the domain of X?
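A minimal sketch of E(y) = f(X)Z on simulated data, assuming f is represented by a natural spline basis that includes its own intercept column, so every basis column is multiplied by Z. The true curve f(X) = sin(X) is an assumption for the simulation. In `mgcv`, the same idea can be written as `s(x, by = z)` with a numeric `z`.

```r
library(splines)

set.seed(6)
n <- 300
x <- runif(n, 0, 2 * pi)
z <- rnorm(n)                         # continuous variable whose effect varies with x
y <- sin(x) * z + rnorm(n, sd = 0.3)  # true varying coefficient f(x) = sin(x)

# E(y) = f(X) Z with f(X) = B(X)^T beta: each basis column is multiplied by z
fit_vc <- lm(y ~ 0 + z:ns(x, df = 4, intercept = TRUE))

# Estimated varying coefficient: predictions at z = 1 equal f(x)
grid  <- data.frame(x = seq(0.5, 5.5, length.out = 50), z = 1)
f_hat <- predict(fit_vc, newdata = grid)
print(cor(f_hat, sin(grid$x)))
```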
Non-linear Effect Modification
[Figure: two panels, each plotted against X]
Non-linear Effect Modification
$E(y) = f(X) + f'(X) Z = \beta_{Z=0}^\top B(X) + \beta_{Z=1}^\top B(X) \cdot Z$
- f(X) models the effect of X when Z = 0
- f'(X)Z models the modifying effect of Z at different values of X
- f'(X) is the varying coefficient of Z; using a non-linear function allows a non-constant slope (here f' denotes a second spline function, not the derivative of f)
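A sketch of E(y) = f(X) + f'(X)Z with a binary modifier Z on simulated data (the true curves are assumptions). Both functions share one cubic B-spline basis: `bs(x, df = 5) * z` expands to the spline for f, the Z main effect (the intercept of f'), and the spline-by-Z interaction. A nested-model F-test then asks whether the effect of Z varies with X.

```r
library(splines)

set.seed(7)
n <- 400
x <- runif(n, 0, 2 * pi)
z <- rbinom(n, 1, 0.5)                          # binary modifier
y <- sin(x) + z * cos(x) + rnorm(n, sd = 0.3)   # effect of x differs by z

fit_main <- lm(y ~ bs(x, df = 5) + z)           # Z only shifts the curve
fit_mod  <- lm(y ~ bs(x, df = 5) * z)           # f(X) + f'(X) Z

# H0: f'(X) is constant, i.e., the effect of Z does not vary with X
print(anova(fit_main, fit_mod))

# Group-specific curves: f(x) when z = 0, f(x) + f'(x) when z = 1
grid <- expand.grid(x = seq(0.5, 5.5, length.out = 5), z = c(0, 1))
grid$fit <- predict(fit_mod, newdata = grid)
print(grid)
```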
Non-linear Effect Modification
- Assumptions to consider:
  - Should f(X) be linear or non-linear?
  - Should f(X) use the same basis as f'(X)?
  - Should f(X) have the same level of complexity as f'(X)?
Mixed Model
To model a non-linear fixed effect while accounting for random effects
- Good for longitudinal or multi-center studies
- Easy to implement: include the design matrix B(X) in the fixed effects
- gamm in the R package mgcv
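Two sketches on simulated clustered data with a random intercept per subject (30 subjects with 10 observations each is an assumption): first B(X) in the fixed effects of a linear mixed model via `nlme::lme()`, then a penalized smooth via `mgcv::gamm()`. Both `nlme` and `mgcv` are recommended packages distributed with R.

```r
library(splines)
library(nlme)   # lme(): linear mixed models
library(mgcv)   # gamm(): generalized additive mixed models

set.seed(8)
n_id  <- 30
n_obs <- 10
d <- data.frame(id = factor(rep(seq_len(n_id), each = n_obs)),
                x  = runif(n_id * n_obs, 0, 2 * pi))
u   <- rnorm(n_id, sd = 0.5)                    # subject-level random intercepts
d$y <- sin(d$x) + u[as.integer(d$id)] + rnorm(nrow(d), sd = 0.3)

# Regression spline B(X) as fixed effects, random intercept per subject
fit_lme <- lme(y ~ bs(x, df = 5), random = ~ 1 | id, data = d)

# Penalized spline s(x) with the same random-effects structure
fit_gamm <- gamm(y ~ s(x), random = list(id = ~ 1), data = d)

print(summary(fit_gamm$gam))   # smooth term: edf and approximate test
print(fit_lme$sigma)           # residual SD from the regression-spline fit
```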
Beyond
Spline Surface
- Model the non-linear interaction between two continuous variables
- Thin-plate splines, tensor product splines
  - Thin-plate splines are scale-sensitive
  - Tensor product splines are scale-invariant
- Dealing with over-smoothing across boundaries
  - Soap film smoothing
- Application:
  - Loop, M. S., Howard, G., de Los Campos, G., Al-Hamdan, M. Z., Safford, M. M., Levitan, E. B., & McClure, L. A. (2017). Heat maps of hypertension, diabetes mellitus, and smoking in the continental United States. Circulation: Cardiovascular Quality and Outcomes, 10(1), e003350.
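A sketch contrasting the two surface bases in `mgcv` on simulated data (the surface and the rescaling factor of 1000 are assumptions): `s(x1, x2)` is an isotropic thin-plate spline, while `te(x1, x2)` is a tensor product. Changing the units of `x2` can change the thin-plate fit, but should leave the tensor-product fit essentially unchanged.

```r
library(mgcv)

set.seed(9)
n  <- 400
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) * cos(2 * pi * x2) + rnorm(n, sd = 0.2)

x2_big <- 1000 * x2      # same variable in different units

fit_tp     <- gam(y ~ s(x1, x2),      method = "REML")   # thin-plate (isotropic)
fit_tp_big <- gam(y ~ s(x1, x2_big),  method = "REML")
fit_te     <- gam(y ~ te(x1, x2),     method = "REML")   # tensor product
fit_te_big <- gam(y ~ te(x1, x2_big), method = "REML")

# Agreement between the original-scale and rescaled fits
print(c(thin_plate = cor(fitted(fit_tp), fitted(fit_tp_big)),
        tensor     = cor(fitted(fit_te), fitted(fit_te_big))))
```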
Smoothing Spline
- Motivation:
  - To simplify the decisions about the knots
- Idea:
  - Set the number of knots to a really large value (k = 25, 40, N)
  - Use variable selection methods, specifically penalized models, to decide the smoothness of the spline
$\min_f \sum_{i=1}^{N} \{y_i - f(X_i)\}^2 + \lambda \int f''(X)^2 \, dX$
- λ is a tuning parameter, selected via (generalized) cross-validation
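A sketch of the "many knots plus a penalty" idea in base R, assuming simulated data: `smooth.spline()` with `all.knots = TRUE` places a knot at every unique x (k = N) and, with the default `cv = FALSE`, chooses λ by generalized cross-validation. A comparable approach in `mgcv` uses a large basis such as `gam(y ~ s(x, k = 40), method = "GCV.Cp")`.

```r
set.seed(10)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

# Knots at all unique x values; lambda chosen by GCV (cv = FALSE is the default)
ss <- smooth.spline(x, y, all.knots = TRUE)

print(ss$lambda)   # selected tuning parameter
print(ss$df)       # effective degrees of freedom, far below the number of knots

pred <- predict(ss, x = c(2, 5, 8))   # list with components x and y
print(pred$y)
```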
Statistical Complications
- Estimated degrees of freedom due to shrinkage
  - Harder to conduct hypothesis tests and calculate CIs
- More decisions when modeling effect modification
  - The same smoothness for both spline functions?
  - If so, how to estimate the common smoothness?
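Why the degrees of freedom must be estimated: for a penalized fit ŷ = S_λ y with S_λ = B(BᵀB + λP)⁻¹Bᵀ, the effective degrees of freedom are tr(S_λ), which shrink from the number of basis columns toward the dimension of the penalty's null space as λ grows. The sketch below uses a B-spline basis with a second-order difference penalty (a P-spline-style penalty, swapped in as a simple stand-in for ∫ f''(X)² dX); the basis size and the λ values are assumptions.

```r
library(splines)

set.seed(11)
x <- sort(runif(200))
B <- bs(x, df = 20, intercept = TRUE)      # 20 B-spline basis columns
D <- diff(diag(ncol(B)), differences = 2)  # second-order difference matrix
P <- crossprod(D)                          # penalty matrix t(D) %*% D

# Effective degrees of freedom: trace of the smoother (hat) matrix
edf <- function(lambda) {
  S <- B %*% solve(crossprod(B) + lambda * P, t(B))
  sum(diag(S))
}

lambdas <- c(0, 1, 100, 1e4, 1e8)
print(sapply(lambdas, edf))   # decreases from 20 toward 2
```

The limit of 2 reflects the null space of a second-difference penalty (coefficients that are linear in their index), which is not shrunk at all.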
Function Selection
- Question of interest
  - Whether a variable X has an effect on the outcome Y
  - High-dimensional data analysis, e.g., EHR, genomics
- Solutions
  - Step-wise function selection
  - Group penalized models
  - Bayesian hierarchical models
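As a sketch of penalty-based function selection, assuming `mgcv`: `gam(..., select = TRUE)` adds an extra penalty to each smooth so that a whole function can be shrunk toward zero. This double-penalty approach is related to, but not the same as, the group-penalized or Bayesian hierarchical models listed above. Simulated data in which only `x1` affects y:

```r
library(mgcv)

set.seed(12)
n  <- 300
x1 <- runif(n)
x2 <- runif(n)                               # irrelevant variable
y  <- sin(2 * pi * x1) + rnorm(n, sd = 0.3)

fit_sel <- gam(y ~ s(x1) + s(x2), select = TRUE, method = "REML")

# Effective degrees of freedom per smooth: s(x2) should be shrunk toward 0
print(summary(fit_sel)$s.table[, "edf"])
```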
Conclusion
- Reviewed the concepts of splines
- New insights into advanced spline models
  - The same set of variables can lead to many models with different assumptions
  - Fit many models and compare
  - Explore the inconsistencies
- Balance between interpolation and prediction
  - "Black box" models for improved prediction
- Consult with statisticians when not comfortable dealing with spline models
Great Book
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R. CRC Press.
- Chapter 7 for examples
Q & A
References
Gray, Robert J. 1992. "Flexible Methods for Analyzing Survival Data Using Splines, with Applications to Breast Cancer Prognosis." Journal of the American Statistical Association 87 (420): 942–51. https://doi.org/10.1080/01621459.1992.10476248.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media.
Loop, Matthew Shane, George Howard, Gustavo de Los Campos, Mohammad Z. Al-Hamdan, Monika M. Safford, Emily B. Levitan, and Leslie A. McClure. 2017. "Heat Maps of Hypertension, Diabetes Mellitus, and Smoking in the Continental United States." Circulation: Cardiovascular Quality and Outcomes 10 (1): e003350.
Sleeper, Lynn A., and David P. Harrington. 1990. "Regression Splines in the Cox Model with Application to Covariate Effects in Liver Disease." Journal of the American Statistical Association 85 (412): 941–49.