Time-Varying Coefficient Models for Longitudinal data

class: center, middle, inverse, title-slide

# Time-Varying Coefficient Models for Longitudinal data
### Isaac Quintanilla
### 2020-10-16

---

## Presentation Access

This presentation is available at:

[gitlab.com/inqs909/vcm](https://gitlab.com/inqs909/vcm)

---
## Thank You to all essential workers!

- Healthcare workers

- Farmworkers

- Grocery workers

- All other essential workers

---

## UC Merced Land Acknowledgement

We pause to acknowledge all local indigenous peoples, including the Yokuts and Miwuk, who inhabited this land. We embrace their continued connection to this region and thank them for allowing us to live, work, learn, and collaborate on their traditional homeland. Let us now take a moment of silence to pay respect to their elders and to all Yokuts and Miwuk people, past and present.

---

## UCR Land Acknowledgement

I would like to respectfully acknowledge and recognize our responsibility to the original and current caretakers of this land, water, and air: the Cahuilla [ka-wee-ahh], Tongva [tong-va], Luiseño [loo-say-ngo], and Serrano [se-ran-oh] peoples and all of their ancestors and descendants, past, present, and future. Today this meeting place is home to many Indigenous peoples from all over the world, including UCR faculty, students, and staff, and we are grateful to have the opportunity to live and work on these homelands.

---
## Table of Contents

* Longitudinal Data

* Time-Varying Coefficient Models

* Standard Error Estimation

* Bandwidth and Kernel Function

* Simulation Study

---
layout: false
class: inverse, middle, center

# Longitudinal Data

---

## Longitudinal Data

Longitudinal Data are repeated measurements for a subject collected at different time points.

- Measurements can be collected at equally spaced or irregular time points

- Each subject may have a different number of measurements

- Measurements from the same subject are correlated with one another

---
layout: false
class: inverse, middle, center

# Time-Varying Coefficient Models
---
## Model
Parametric Model:
.size130[
$$
Y=\boldsymbol X^\mathrm T\boldsymbol\beta+\epsilon
$$
]

- `$\boldsymbol \beta=(\beta_1,\beta_2,...,\beta_p)^T$`
- `$\boldsymbol X=(X_1,X_2,...,X_p)^T$`
- `$Y$` response variable
- `$\epsilon$` error term

---
## Model

Time-Varying Coefficient Model:

.size130[
$$
Y=\boldsymbol X^\mathrm T\boldsymbol\beta(t)+\epsilon
$$
]

- `$\boldsymbol \beta(t)=\{\beta_1(t),\beta_2(t),...,\beta_p(t)\}^T$`
- `$\boldsymbol X=(X_1,X_2,...,X_p)^T$`
- `$Y$` response variable
- `$\epsilon$` error term

???
- `$\boldsymbol \beta$` vector of coefficients
- `$\boldsymbol X$` vector of predictors
- `$Y$` response variable
- `$\epsilon$` error term
- `$\boldsymbol \beta(t)$` varying coefficient

---

## Estimation

- `$\boldsymbol \beta (t)$` is unknown

### Approximation techniques

- Polynomial Splines

- Smoothing Splines

- Local Polynomials

???

We will focus on local polynomials

- directly estimate the function at a grid point

- The function is described by a set of gridpoints

- Use Local Linear models

---

## Local Linear Model

For each grid point `$t_0$`, the varying coefficient is approximated near `$t_0$` with a first-order Taylor expansion
.size130[
$$
\boldsymbol \beta(t)\approx \boldsymbol \beta(t_0)+\boldsymbol \beta^\prime(t_0)(t-t_0)
$$
]

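Writing `$\boldsymbol a=\boldsymbol \beta(t_0)$` and `$\boldsymbol b=\boldsymbol \beta^\prime(t_0)$`, the approximation can be rewritten as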
.size130[
$$
\boldsymbol \beta(t)\approx \boldsymbol a+\boldsymbol b (t-t_0)
$$
]

???

* `$\boldsymbol \beta(t_0)$` is the function value at `$t_0$`
* `$\boldsymbol \beta^\prime(t_0)$` is the first derivative with respect to `$t$`
* As long as we are in the neighborhood of `$t_0$`, this approximation is accurate.
* No higher-order polynomials are necessary
* Lessens the chances to be affected by the curse of dimensionality
* You need to choose a bandwidth and kernel function

---

### Estimation Procedure

---

### Estimating value `$t_0=0.5$`

Bandwidth and Kernel Function

---
## Local Least Squares
To find the estimates of the varying coefficient, we minimize the local least squares function. 
For `$n$` subjects, each subject containing `$n_i$` measurements, the local least squares is formulated as

`\begin{equation}
L(\boldsymbol a,\boldsymbol b)=\sum^n_{i=1}\sum^{n_i}_{j=1}\left[Y_{ij}-\boldsymbol X_i^\mathrm T \{\boldsymbol a+\boldsymbol b(t_{ij}-t_0)\} \right]^2K_h(t_{ij}-t_0)
\end{equation}`

--
- `$t_{ij}$`: time point

--
- `$Y_{ij}$`: outcome

--
- `$\boldsymbol X_{i}$`: time-invariant predictors

--
- `$K_h(\cdot)$`: kernel function with associated bandwidth `$h$`

---
## Weighted Least Squares Estimator

The estimates for `$\boldsymbol a (t_0)$` are found with the weighted least squares estimator:

`\begin{equation}
\hat {\boldsymbol a} (t_0)=(\boldsymbol I_p, \boldsymbol 0_p)\left(\sum_{i=1}^n\mathcal X_i^\mathrm T \mathcal K_i \mathcal X_i\right)^{-1}\left(\sum_{i=1}^n\mathcal X_i^\mathrm T \mathcal K_i \mathcal Y_i\right)
\end{equation}`

--
- `$\mathcal Y_i$`: vector of repeated measurements for `$i^{th}$` subject

- `$\mathcal K_i$`: `$n_i \times n_i$` diagonal matrix accounting for the weights

- `$\mathcal X_i$`: `$n_i \times 2p$` design matrix of local linear model

- `$p$`: number of predictors
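
The estimator above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation behind these slides: the function names, the long-format data layout, the scaling `$K_h(u)=K(u/h)/h$`, and the noiseless toy data are all assumptions made for the example.

```python
import numpy as np


def epanechnikov(z):
    """Epanechnikov kernel K(z) = 0.75 * (1 - z^2)_+."""
    return 0.75 * np.maximum(1.0 - z**2, 0.0)


def local_linear_fit(t0, h, times, X, y):
    """Return (a_hat, b_hat) at t0 from stacked long-format data.

    times: (N,) observation times t_ij
    X:     (N, p) predictor rows (subject i's X_i repeated n_i times)
    y:     (N,) outcomes Y_ij
    """
    p = X.shape[1]
    d = times - t0
    w = epanechnikov(d / h) / h             # K_h(t_ij - t0), assumed scaling
    D = np.hstack([X, X * d[:, None]])      # stacked n_i x 2p design matrices
    A = D.T @ (w[:, None] * D)              # sum_i X_i' K_i X_i
    r = D.T @ (w * y)                       # sum_i X_i' K_i Y_i
    coef = np.linalg.solve(A, r)
    return coef[:p], coef[p:]               # first p entries are a_hat


# Hypothetical toy data: 20 subjects, 25 time points, no noise,
# beta_0(t) = 1 + 2t and beta_1(t) = 3 - t (linear, so the fit is exact).
rng = np.random.default_rng(1)
n, m = 20, 25
grid = np.linspace(0.0, 1.0, m)
times = np.tile(grid, n)
X = np.column_stack([np.ones(n * m), np.repeat(rng.normal(size=n), m)])
y = (1.0 + 2.0 * times) * X[:, 0] + (3.0 - times) * X[:, 1]

a_hat, b_hat = local_linear_fit(0.5, 0.1, times, X, y)
print(a_hat, b_hat)
```

Because the toy coefficients are linear in `$t$`, the local linear approximation holds exactly and the fit should recover `$\boldsymbol\beta(0.5)$` and `$\boldsymbol\beta^\prime(0.5)$` up to rounding.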

???

Since we only care about the function, we do not need to know what the first derivative looks like.

Next slide is messy

---
## Asymptotic Theory

According to Zhang and Lee (2000), the asymptotic distribution is

`\begin{equation}
cov^{-1/2}\{\hat{\boldsymbol a}(t_0)\}[\hat{\boldsymbol a}(t_0)-\boldsymbol a (t_0)-bias\{\hat{\boldsymbol a}(t_0)\}]\xrightarrow{D} N(\boldsymbol 0,\boldsymbol I_{p})
\end{equation}`

`\begin{equation}
bias\left\lbrace\hat{\boldsymbol a}(t_0)\right\rbrace=2^{-1}h^2\mu_2\boldsymbol a^{\prime\prime}(t_0)
\end{equation}`

`\begin{equation}
cov\{\hat{\boldsymbol a}(t_0)\}=\lbrace nh f(t_0)E(\boldsymbol X\boldsymbol X^\mathrm T|T=t_0)\rbrace^{-1}\nu_0\sigma^2(t_0)
\end{equation}`

--

- `$\nu_i=\int t^i K^2(t)dt$`

- `$\mu_i=\int t^iK(t)dt$`

- `$f(t_0)$`: Density function of `$T$`
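
As a worked example, assume the Epanechnikov kernel `$K(t)=\frac{3}{4}(1-t^2)_+$` (the choice of kernel is an assumption at this point in the slides). The two kernel constants that appear in the bias and covariance then evaluate to

$$
\mu_2=\int_{-1}^{1}t^2\,\tfrac{3}{4}(1-t^2)\,dt=\tfrac{3}{4}\left(\tfrac{2}{3}-\tfrac{2}{5}\right)=\tfrac{1}{5},
\qquad
\nu_0=\int_{-1}^{1}\tfrac{9}{16}(1-t^2)^2\,dt=\tfrac{9}{16}\left(2-\tfrac{4}{3}+\tfrac{2}{5}\right)=\tfrac{3}{5}
$$

Under that assumption the bias reduces to `$h^2\boldsymbol a^{\prime\prime}(t_0)/10$`, and the covariance carries the factor `$\nu_0=3/5$`.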

???

The main thing to take away is that the estimators are asymptotically normal

---

## Asymptotic Covariance

.size60[
`\begin{eqnarray*}
& \widehat{cov} \{\hat {\boldsymbol a}(t_0) \} \approx \\
& (\boldsymbol I_p, \boldsymbol 0_p)\left(\sum_{i=1}^n\mathcal X_i^\mathrm T \mathcal K_i \mathcal X_i\right)^{-1}\left(\sum_{i=1}^n\mathcal X_i^\mathrm T \mathcal K_i \mathcal Q_i \mathcal K_i \mathcal X_i\right)\left(\sum_{i=1}^n\mathcal X_i^\mathrm T \mathcal K_i \mathcal X_i\right)^{-1}(\boldsymbol I_p, \boldsymbol 0_p)^\mathrm T
\end{eqnarray*}`
]

- `$\mathcal Q_i$`: a diagonal matrix of the squared residuals

- The sandwich estimator is used
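
A minimal NumPy sketch of this sandwich formula follows. It assumes that `$\mathcal Q_i$` holds squared residuals from the local linear fit at `$t_0$`; the function names and the toy data (independent `$N(0,1)$` errors) are hypothetical and are not taken from the study.

```python
import numpy as np


def epanechnikov(z):
    """Epanechnikov kernel K(z) = 0.75 * (1 - z^2)_+."""
    return 0.75 * np.maximum(1.0 - z**2, 0.0)


def sandwich_cov(t0, h, times, X, y):
    """Return a_hat(t0) and its sandwich covariance estimate."""
    p = X.shape[1]
    d = times - t0
    w = epanechnikov(d / h) / h
    D = np.hstack([X, X * d[:, None]])
    bread = np.linalg.inv(D.T @ (w[:, None] * D))    # (sum_i X_i' K_i X_i)^{-1}
    coef = bread @ (D.T @ (w * y))
    resid = y - D @ coef                             # residuals of the local fit (assumption)
    meat = D.T @ ((w**2 * resid**2)[:, None] * D)    # sum_i X_i' K_i Q_i K_i X_i
    cov = bread @ meat @ bread
    return coef[:p], cov[:p, :p]                     # keep the block for a_hat


# Hypothetical toy data: 50 subjects, 25 time points, independent N(0, 1) errors.
rng = np.random.default_rng(2)
n, m = 50, 25
times = np.tile(np.linspace(0.0, 1.0, m), n)
x = np.repeat(rng.normal(-2.0, 1.0, size=n), m)
X = np.column_stack([np.ones(n * m), x])
y = np.sqrt(times) - np.sin(times) * x + rng.normal(0.0, 1.0, size=n * m)

a_hat, cov = sandwich_cov(0.5, 0.1, times, X, y)
se = np.sqrt(np.diag(cov))
print(a_hat, se)
```

Since `$\mathcal K_i$` and `$\mathcal Q_i$` are diagonal, the middle term is computed with elementwise weights rather than explicit `$n_i\times n_i$` matrices.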

???

The sandwich estimator has been shown to provide consistent results.

---
layout: false
class: inverse, middle, center

# Standard Error Estimation for Longitudinal Data

---

## Standard Errors

Longitudinal data have correlated repeated measurements; therefore, the correlation must be taken into account to accurately estimate the standard errors. The following papers provide methods to estimate the standard error with correlated data.

- Wu and Chiang (2000)

- Fan and Huang (2005)

- Fan, Huang, and Li (2007)

---

## Bootstrap Standard Errors

If the correlation matrix is misspecified, the standard errors may be biased. An alternative method is to estimate the standard errors via bootstrap.
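
One way to set up such a bootstrap is to resample whole subjects with replacement, so that each subject's repeated measurements stay together. The sketch below assumes that scheme; the resampling unit, the function names, and the toy data with a random intercept are illustrative assumptions, not details given in these slides.

```python
import numpy as np


def epanechnikov(z):
    """Epanechnikov kernel K(z) = 0.75 * (1 - z^2)_+."""
    return 0.75 * np.maximum(1.0 - z**2, 0.0)


def fit_a(t0, h, times, X, y):
    """Local linear WLS estimate a_hat(t0)."""
    d = times - t0
    w = epanechnikov(d / h) / h
    D = np.hstack([X, X * d[:, None]])
    coef = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * y))
    return coef[: X.shape[1]]


def bootstrap_se(t0, h, times, X, y, subject, n_boot=200, seed=0):
    """Subject-level bootstrap standard errors for a_hat(t0)."""
    rng = np.random.default_rng(seed)
    ids = np.unique(subject)
    rows = {i: np.flatnonzero(subject == i) for i in ids}
    draws = []
    for _ in range(n_boot):
        sampled = rng.choice(ids, size=ids.size, replace=True)
        idx = np.concatenate([rows[i] for i in sampled])   # keep each subject's rows together
        draws.append(fit_a(t0, h, times[idx], X[idx], y[idx]))
    return np.std(np.asarray(draws), axis=0, ddof=1)


# Hypothetical toy data with a subject-level random intercept to induce correlation.
rng = np.random.default_rng(3)
n, m = 40, 25
subject = np.repeat(np.arange(n), m)
times = np.tile(np.linspace(0.0, 1.0, m), n)
x = np.repeat(rng.normal(-2.0, 1.0, size=n), m)
X = np.column_stack([np.ones(n * m), x])
u = np.repeat(rng.normal(0.0, 0.5, size=n), m)
y = np.sqrt(times) - np.sin(times) * x + u + rng.normal(0.0, 0.5, size=n * m)

se = bootstrap_se(0.5, 0.1, times, X, y, subject, n_boot=100)
print(se)
```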

---
layout: false
class: inverse, middle, center

# Bandwidth and Kernel Function

---

## Bandwidth

The choice of bandwidth has an effect on the bias-variance trade-off

- Smaller `$h$`, smaller bias, larger variance

- Larger `$h$`, larger bias, smaller variance

- Need to find a bandwidth that balances bias and variance

- The ideal bandwidth can be found via a cross-validation approach
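
The slides do not say which cross-validation scheme is used. As one possibility, the sketch below scores each candidate bandwidth by leave-one-subject-out squared prediction error; the scheme, the candidate values, the function names, and the toy data are all assumptions.

```python
import numpy as np


def epanechnikov(z):
    """Epanechnikov kernel K(z) = 0.75 * (1 - z^2)_+."""
    return 0.75 * np.maximum(1.0 - z**2, 0.0)


def fit_a(t0, h, times, X, y):
    """Local linear WLS estimate a_hat(t0)."""
    d = times - t0
    w = epanechnikov(d / h) / h
    D = np.hstack([X, X * d[:, None]])
    coef = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * y))
    return coef[: X.shape[1]]


def cv_score(h, times, X, y, subject):
    """Leave-one-subject-out mean squared prediction error for bandwidth h."""
    sse = 0.0
    for i in np.unique(subject):
        test = subject == i
        for t0, x_row, y_obs in zip(times[test], X[test], y[test]):
            # Fit at the held-out time point using all other subjects.
            a_hat = fit_a(t0, h, times[~test], X[~test], y[~test])
            sse += (y_obs - x_row @ a_hat) ** 2
    return sse / y.size


# Hypothetical toy data: 15 subjects, 25 time points.
rng = np.random.default_rng(4)
n, m = 15, 25
subject = np.repeat(np.arange(n), m)
times = np.tile(np.linspace(0.0, 1.0, m), n)
x = np.repeat(rng.normal(-2.0, 1.0, size=n), m)
X = np.column_stack([np.ones(n * m), x])
y = np.sqrt(times) - np.sin(times) * x + rng.normal(0.0, 0.5, size=n * m)

candidates = [0.1, 0.2, 0.4]
scores = {h: cv_score(h, times, X, y, subject) for h in candidates}
best_h = min(scores, key=scores.get)
print(scores, best_h)
```

Holding out whole subjects rather than single observations keeps correlated measurements from the same subject out of the training set.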

---

## Kernel Function

Use the Epanechnikov Kernel Function:

`$K(z) = \frac{3}{4}(1-z^2)_+$`
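
A small sketch of the kernel and its scaled version follows. The scaling `$K_h(u)=K(u/h)/h$` is a common convention and is assumed here; the function names are hypothetical.

```python
def epanechnikov(z):
    """Epanechnikov kernel K(z) = 0.75 * (1 - z^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0


def scaled_kernel(t, t0, h):
    """Scaled kernel K_h(t - t0) = K((t - t0) / h) / h (assumed convention)."""
    return epanechnikov((t - t0) / h) / h


# Weights for a few time points around t0 = 0.5 with bandwidth h = 0.1.
for t in (0.35, 0.45, 0.50, 0.55, 0.65):
    print(t, round(scaled_kernel(t, 0.5, 0.1), 4))
```

Observations more than one bandwidth away from `$t_0$` get zero weight, so only nearby time points contribute to the local fit.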

---
layout: false
class: inverse, middle, center

# Simulation Study
---
## Normal Simulation Parameters

- 250 Monte Carlo Datasets

- 250 participants

- 25 equally spaced time points from 0 to 1

- 1 predictor from `$N(-2,1)$`

- `$\beta_0(t)=\sqrt t$`

- `$\beta_1(t)=-\sin(t)$`

- Outcome was generated from a normal distribution
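
One Monte Carlo dataset with these settings could be generated roughly as follows. The error standard deviation `sigma` and the independent-error structure are assumptions, since the slides state neither the error variance nor the within-subject correlation; the function name is hypothetical.

```python
import numpy as np


def simulate_normal(n=250, m=25, sigma=1.0, seed=0):
    """Generate one long-format dataset for the normal simulation setting.

    sigma and independent errors are assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, m)                     # 25 equally spaced time points
    subject = np.repeat(np.arange(n), m)
    times = np.tile(grid, n)
    x = np.repeat(rng.normal(-2.0, 1.0, size=n), m)     # time-invariant predictor from N(-2, 1)
    mean = np.sqrt(times) - np.sin(times) * x           # beta_0(t) + beta_1(t) * x
    y = mean + rng.normal(0.0, sigma, size=n * m)
    return subject, times, x, y


subject, times, x, y = simulate_normal()
print(subject.shape, times.min(), times.max())
```

Repeating the call with 250 different seeds would give the 250 Monte Carlo datasets.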

---
## Normal Estimation VCM

- WLS estimator was used to obtain VCM estimates at 100 grid points equally spaced from 0 to 1

- `$h=0.1$`

- Epanechnikov Kernel Function was used

---
## Normal Data Results
<img src="Presentation_UCM_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />
---
## Normal Data Results
<img src="Presentation_UCM_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

## Thank You!

Be Safe and Healthy!

---
layout: false
class: inverse, middle, center

# Appendix

---
layout: false
class: inverse, middle, center

# Useful References

---
## Varying-Coefficient Models

- Fan and Zhang (2008)
- Hastie and Tibshirani (1993)
- Hoover, Rice, Wu, and Yang (1998)
- Cai, Fan, and Li (2000)
- Zhang and Lee (2000)
- Kürüm, Li, Wang, and Şentürk (2014)
- Kürüm, Li, Shiffman, and Yao (2016)

---

## VCM for Longitudinal Data

- Hoover, Rice, Wu, et al. (1998)

- Wu and Chiang (2000)

- Fan and Huang (2005)

- Fan, Huang, and Li (2007)

- Xue and Zhu (2007)

- Fan and Wu (2008)

---
layout: false
class: inverse, middle, center

# Generalized Time-Varying Coefficient Models
---
## Model

`\begin{equation}
g\lbrace m(t,\boldsymbol X)\rbrace=\boldsymbol X^\mathrm T\boldsymbol{\beta}(t),\qquad m(t,\boldsymbol X)=E(Y|\boldsymbol X,t)
\end{equation}`

- `$\boldsymbol X$`: vector of predictors

- `$Y$`: outcome

- `$t$`: time point

- `$g(\cdot)$`: canonical link function

???

g is a canonical link function

---
## Local Linear Model

For each grid point `$t_0$`, the varying coefficient is approximated near `$t_0$` with a first-order Taylor expansion
.size130[
$$
\boldsymbol \beta(t)\approx \boldsymbol \beta(t_0)+\boldsymbol \beta^\prime(t_0)(t-t_0)
$$
]

The model can be rewritten as
.size130[
$$
\boldsymbol \beta(t)\approx \boldsymbol a+\boldsymbol b (t-t_0)
$$
]

???

---
## Local Log-Likelihood Function

For `$n$` subjects, each subject containing `$n_i$` measurements, the local log-likelihood function is constructed as

.size90[
`\begin{equation}
\mathcal{L} (\boldsymbol a,\boldsymbol b)=\sum^n_{i=1}\sum^{n_i}_{j=1}\ell (g^{-1}[ \boldsymbol X_i^\mathrm T\lbrace \boldsymbol a+\boldsymbol b(t_{ij}-t_0)\rbrace],Y_{ij})K_h(t_{ij}-t_0)
\end{equation}`
]

- `$\ell(\cdot,\cdot)$`: log-likelihood function

--
- `$t_{ij}$`: time point

--
- `$Y_{ij}$`: outcome

--
- `$\boldsymbol X_{i}$`: time-invariant predictors

--
- `$K_h(\cdot)$`: kernel function with associated bandwidth `$h$`

---
## Estimator

The estimator that minimizes `$-\mathcal L(\boldsymbol a,\boldsymbol b)$` is found via a Newton-Raphson algorithm with the update

`\begin{equation}
\boldsymbol c^{(it+1)}=\boldsymbol c^{(it)}-\{\mathcal H^{(it)}\}^{-1}\mathcal G^{(it)} 
\end{equation}`

- `$\boldsymbol c = (\boldsymbol a^\mathrm T,\boldsymbol b^\mathrm T)^\mathrm T$`

- `$\boldsymbol c^{(it)}$`: current iteration of `$\boldsymbol c$`

- `$\mathcal H^{(it)}=-\mathcal L^{\prime\prime}(\boldsymbol a,\boldsymbol b)$`

- `$\mathcal G^{(it)}=-\mathcal L^{\prime}(\boldsymbol a,\boldsymbol b)$`
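
For a binary outcome, the update can be sketched as below. The logit link (canonical for Bernoulli data), the zero starting value, the function names, and the toy data are assumptions made for illustration.

```python
import numpy as np


def epanechnikov(z):
    """Epanechnikov kernel K(z) = 0.75 * (1 - z^2)_+."""
    return 0.75 * np.maximum(1.0 - z**2, 0.0)


def newton_local_logistic(t0, h, times, X, y, c0=None, max_iter=25, tol=1e-8):
    """Maximize the local Bernoulli log-likelihood at t0 (logit link assumed)."""
    p = X.shape[1]
    d = times - t0
    w = epanechnikov(d / h) / h
    D = np.hstack([X, X * d[:, None]])
    c = np.zeros(2 * p) if c0 is None else np.asarray(c0, dtype=float).copy()
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(D @ c)))
        G = -(D.T @ (w * (y - mu)))                      # G = -L'(c)
        H = D.T @ ((w * mu * (1.0 - mu))[:, None] * D)   # H = -L''(c)
        step = np.linalg.solve(H, G)
        c = c - step                                     # c - H^{-1} G
        if np.max(np.abs(step)) < tol:
            break
    return c[:p], c[p:]


# Hypothetical toy data: 200 subjects, 25 time points, one predictor.
rng = np.random.default_rng(5)
n, m = 200, 25
times = np.tile(np.linspace(0.0, 1.0, m), n)
x = np.repeat(rng.normal(0.0, 1.0, size=n), m)
X = np.column_stack([np.ones(n * m), x])
eta = np.sin(times) + np.sqrt(times) * x
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)

a_hat, b_hat = newton_local_logistic(0.5, 0.1, times, X, y)
print(a_hat, b_hat)
```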

???

Initial estimates can be obtained from a GLMM

---
## Asymptotic Theory

Based on the regularity conditions provided by Cai, Fan, and Li (2000) and Kürüm, Li, Shiffman, et al. (2016), the asymptotic distribution for `$\boldsymbol c_{ML}$` is given as

`\begin{equation}
\sqrt{nh}\left\{\boldsymbol H\left( \boldsymbol c_{ML}-\boldsymbol c\right)-\mathrm{bias}(\boldsymbol c)+o_P(h^2) \right\}\sim N(0,\boldsymbol \Sigma)
\end{equation}`

- `$\boldsymbol H=diag(1,h)\otimes\boldsymbol I_{p}$`

- `$\boldsymbol \Sigma$`: covariance of `$\boldsymbol c$`

---
## One-Step Estimator

To reduce computational burden, Cai, Fan, and Li (2000) propose the one-step estimator

`\begin{equation}
\boldsymbol c_{OS}=\boldsymbol c^{(0)}-\{\mathcal H^{(0)}\}^{-1}\mathcal G^{(0)} 
\end{equation}`

- `$\boldsymbol c^{(0)}$`: initial value of `$\boldsymbol c$`

- `$\mathcal H^{(0)}=-\mathcal L^{\prime\prime}(\boldsymbol a,\boldsymbol b)$`

- `$\mathcal G^{(0)}=-\mathcal L^{\prime}(\boldsymbol a,\boldsymbol b)$`

---

## One-Step Theorem

Cai, Fan, and Li (2000) show that if the initial estimate satisfies

`\begin{equation}
diag(1,h)\otimes \boldsymbol I_{p}\{ \boldsymbol c^{(0)}-\boldsymbol c\}=O_p\{h^2+(nh)^{-1/2}\},
\end{equation}`

then `$\boldsymbol c_{OS}$` has the same asymptotic distribution as `$\boldsymbol c_{ML}$`. In other words, the one-step estimator only needs an initial estimate that is close to the truth.

---

## Using OS Estimator

- Find the estimates for the first grid point using a Newton-Raphson algorithm

- Use the estimates of the first grid point as the initial values for the next grid point's estimates

- Repeat until all grid points' estimates are found
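
The three steps above can be sketched as follows for a binary outcome. The logit link, the zero starting value at the first grid point, the fixed number of initial iterations, the 21-point grid, and the toy data are assumptions for illustration; the slides use GLMM-based initial values and 100 grid points.

```python
import numpy as np


def epanechnikov(z):
    """Epanechnikov kernel K(z) = 0.75 * (1 - z^2)_+."""
    return 0.75 * np.maximum(1.0 - z**2, 0.0)


def newton_step(c, t0, h, times, X, y):
    """One update c - H^{-1} G of the local logistic likelihood at t0."""
    d = times - t0
    w = epanechnikov(d / h) / h
    D = np.hstack([X, X * d[:, None]])
    mu = 1.0 / (1.0 + np.exp(-(D @ c)))
    G = -(D.T @ (w * (y - mu)))
    H = D.T @ ((w * mu * (1.0 - mu))[:, None] * D)
    return c - np.linalg.solve(H, G)


def one_step_path(grid, h, times, X, y, n_init=25):
    """Full Newton-Raphson at grid[0], then one warm-started step per grid point."""
    p = X.shape[1]
    c = np.zeros(2 * p)
    for _ in range(n_init):                        # iterate at the first grid point
        c = newton_step(c, grid[0], h, times, X, y)
    path = [c[:p]]
    for t0 in grid[1:]:
        c = newton_step(c, t0, h, times, X, y)     # single step from the previous estimate
        path.append(c[:p])
    return np.array(path)


# Hypothetical toy data: 200 subjects, 25 time points, one predictor.
rng = np.random.default_rng(6)
n, m = 200, 25
times = np.tile(np.linspace(0.0, 1.0, m), n)
x = np.repeat(rng.normal(0.0, 1.0, size=n), m)
X = np.column_stack([np.ones(n * m), x])
eta = np.sin(times) + np.sqrt(times) * x
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)

grid = np.linspace(0.0, 1.0, 21)
path = one_step_path(grid, 0.1, times, X, y)
print(path[:3])
```

Each row of `path` holds the one-step estimate of `$\boldsymbol a(t_0)$` at one grid point, so only the first grid point pays the cost of a full iteration.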

---
## Standard Error

`\begin{equation}
\widehat{cov}\{\hat{\boldsymbol a}(t_0)\}=
 (\boldsymbol I_{p},\boldsymbol 0_{p})
\hat{\boldsymbol \Gamma}_1^{-1}\hat{\boldsymbol \Gamma}_2\hat{\boldsymbol \Gamma}_1^{-1}
(\boldsymbol I_{p},\boldsymbol 0_{p})^\mathrm T,  
\end{equation}`
where

.size60[
`\begin{equation}
\hat{\boldsymbol \Gamma}_1=\sum_{i=1}^n \sum_{j=1}^{n_i} z_2\left[\boldsymbol X_i^\mathrm T\left\lbrace\hat{\boldsymbol a}(t_0)+\hat{\boldsymbol b}(t_0)(t_{ij}-t_0)\right\rbrace,Y_{ij}\right]\boldsymbol T_{ij} \otimes(\boldsymbol X_i\boldsymbol X_i^\mathrm T)K_h(t_{ij}-t_0),
\end{equation}`
]

.size60[
`\begin{equation}
\hat{\boldsymbol \Gamma}_2=\sum_{i=1}^n \sum_{j=1}^{n_i} z_1^2\left[\boldsymbol X_i^\mathrm T\left\lbrace\hat{\boldsymbol a}(t_0)+\hat{\boldsymbol b}(t_0)(t_{ij}-t_0)\right\rbrace,Y_{ij}\right]\boldsymbol T_{ij} \otimes(\boldsymbol X_i\boldsymbol X_i^\mathrm T)K_h^2(t_{ij}-t_0),
\end{equation}`
]

- `$z_k=\frac{\partial^k}{\partial s^k}\ell\{g^{-1}(s),y\}$` for `$k=1,2$`

- `$\boldsymbol T_{ij}=(1, t_{ij}-t_0)^\mathrm T(1, t_{ij}-t_0)$` for `$j=1,...,n_i$`.

---
## Binary Simulation Parameters

- 250 Monte Carlo Datasets

- 250 participants

- 25 equally spaced time points from 0 to 1

- 2 predictors from `$N\{(-1,1)^\mathrm T,diag(1.5^2,.5^2)\}$`

- `$\beta_0(t)=\sin (t)$`

- `$\beta_1(t)=\sqrt t$`

- `$\beta_2(t)=-\cos(t)$`

- Outcome was generated from a latent normal distribution

---
## Binary Estimation VCM

- Initial values obtained from GLMM

- OS estimator was used to obtain VCM estimates at 100 grid points equally spaced from 0 to 1

- `$h=0.1$`

- Epanechnikov Kernel Function was used

---
## Binary Data Results

---
layout: false
class: inverse, middle, center

# Bandwidth Selection

---

## Bandwidth Selection

Choosing the correct bandwidth is important for the bias-variance trade-off.

---

## References
.scrollable[
Cai, Z., J. Fan, and R. Li (2000). "Efficient Estimation and Inferences
for Varying-Coefficient Models". In: _Journal of the American
Statistical Association_ 95.451, pp. 888-902. ISSN: 0162-1459. DOI:
[10.2307/2669472](https://doi.org/10.2307%2F2669472).

Fan, J. and T. Huang (2005). "Profile Likelihood Inferences on
Semiparametric Varying-Coefficient Partially Linear Models". In:
_Bernoulli_ 11.6, pp. 1031-1057. ISSN: 1350-7265.

Fan, J., T. Huang, and R. Li (2007). "Analysis of Longitudinal Data
With Semiparametric Estimation of Covariance Function". In: _Journal of
the American Statistical Association_ 102.478, pp. 632-641. ISSN:
0162-1459. DOI:
[10.1198/016214507000000095](https://doi.org/10.1198%2F016214507000000095).

Fan, J. and Y. Wu (2008). "Semiparametric Estimation of Covariance
Matrixes for Longitudinal Data". In: _Journal of the American
Statistical Association_ 103.484, pp. 1520-1533. ISSN: 0162-1459. DOI:
[10.1198/016214508000000742](https://doi.org/10.1198%2F016214508000000742).

Fan, J. and W. Zhang (2008). "Statistical Methods with Varying
Coefficient Models". In: _Statistics and its interface_ 1.1, pp.
179-195. ISSN: 1938-7989.

Hastie, T. and R. Tibshirani (1993). "Varying-Coefficient Models".
In: _Journal of the Royal Statistical Society: Series B
(Methodological)_ 55.4, pp. 757-779. ISSN: 2517-6161. DOI:
[10.1111/j.2517-6161.1993.tb01939.x](https://doi.org/10.1111%2Fj.2517-6161.1993.tb01939.x).

Hoover, D. R., J. A. Rice, C. O. Wu, et al. (1998). "Nonparametric
Smoothing Estimates of Time-Varying Coefficient Models with
Longitudinal Data". In: _Biometrika_ 85.4, pp. 809-822. ISSN:
0006-3444. DOI:
[10.1093/biomet/85.4.809](https://doi.org/10.1093%2Fbiomet%2F85.4.809).

Kürüm, E., R. Li, S. Shiffman, et al. (2016). "Time-Varying Coefficient
Models for Joint Modeling Binary and Continuous Outcomes in
Longitudinal Data". In: _Statistica Sinica_ 26.3, pp. 979-1000. ISSN:
1017-0405.

Kürüm, E., R. Li, Y. Wang, et al. (2014). "Nonlinear
Varying-Coefficient Models with Applications to a Photosynthesis
Study". In: _Journal of Agricultural, Biological, and Environmental
Statistics_ 19.1, pp. 57-81. ISSN: 1537-2693. DOI:
[10.1007/s13253-013-0157-7](https://doi.org/10.1007%2Fs13253-013-0157-7).

Wu, C. O. and C. Chiang (2000). "Kernel Smoothing on Varying
Coefficient Models with Longitudinal Dependent Variable". In:
_Statistica Sinica_ 10.2, pp. 433-456. ISSN: 1017-0405.

Xue, L. and L. Zhu (2007). "Empirical Likelihood for a Varying
Coefficient Model With Longitudinal Data". In: _Journal of the American
Statistical Association_ 102.478, pp. 642-654. ISSN: 0162-1459. DOI:
[10.1198/016214507000000293](https://doi.org/10.1198%2F016214507000000293).

Zhang, W. and S. Lee (2000). "Variable Bandwidth Selection in
Varying-Coefficient Models". In: _Journal of Multivariate Analysis_
74.1, pp. 116-134. ISSN: 0047-259X. DOI:
[10.1006/jmva.1999.1883](https://doi.org/10.1006%2Fjmva.1999.1883).

Aden-Buie, G. (2020). _xaringanthemer: Custom 'Xaringan' CSS Themes_.
https://pkg.garrickadenbuie.com/xaringanthemer,
https://github.com/gadenbuie/xaringanthemer.

Bates, D., M. Maechler, B. Bolker, et al. (2019). _lme4: Linear
Mixed-Effects Models using 'Eigen' and S4_. R package version 1.1-21.
URL:
[https://CRAN.R-project.org/package=lme4](https://CRAN.R-project.org/package=lme4).

Genz, A., F. Bretz, T. Miwa, et al. (2020). _mvtnorm: Multivariate
Normal and t Distributions_. R package version 1.1-0. URL:
[https://CRAN.R-project.org/package=mvtnorm](https://CRAN.R-project.org/package=mvtnorm).

McLean, M. W. (2019). _RefManageR: Straightforward 'BibTeX' and
'BibLaTeX' Bibliography Management_. R package version 1.2.12. URL:
[https://CRAN.R-project.org/package=RefManageR](https://CRAN.R-project.org/package=RefManageR).

R Core Team (2020). _R: A Language and Environment for Statistical
Computing_. R Foundation for Statistical Computing. Vienna, Austria.
URL: [https://www.R-project.org/](https://www.R-project.org/).

Wickham, H. (2019). _tidyverse: Easily Install and Load the
'Tidyverse'_. R package version 1.3.0. URL:
[https://CRAN.R-project.org/package=tidyverse](https://CRAN.R-project.org/package=tidyverse).

Xie, Y. (2020). _xaringan: Presentation Ninja_. R package version 0.15.
URL:
[https://CRAN.R-project.org/package=xaringan](https://CRAN.R-project.org/package=xaringan).
]