Description

Fitting Survival Regression Models via 'Stan'.

Description

Parametric survival regression models under the maximum likelihood approach via 'Stan'. Implemented regression models include accelerated failure time models, proportional hazards models, proportional odds models, accelerated hazard models, Yang and Prentice models, and extended hazard models. Available baseline survival distributions include exponential, Weibull, log-normal, log-logistic, gamma, generalized gamma, rayleigh, Gompertz and fatigue (Birnbaum-Saunders) distributions. References: Lawless (2002) <ISBN:9780471372158>; Bennett (1982) <doi:10.1002/sim.4780020223>; Chen and Wang(2000) <doi:10.1080/01621459.2000.10474236>; Demarqui and Mayrink (2021) <doi:10.1214/20-BJPS471>.

README.md

cran.r-project.org

survstan

The aim of the R package survstan is to provide a

toolkit for fitting survival models using Stan.

The R package survstan can be used to fit right-censored survival data under independent censoring. The implemented models allow the fitting of survival data in the presence/absence of covariates. All inferential procedures are currently based on the maximum likelihood (ML) approach.

Installation

You can install the released version of survstan from CRAN with:

install.packages("survstan")

You can install the development version of survstan from GitHub with:

# install.packages("devtools")
devtools::install_github("fndemarqui/survstan")

Inference procedures

Let $(t_{i}, \delta_{i})$ be the observed survival time and its corresponding failure indicator, $i=1, \cdots, n$, and $\boldsymbol{\theta}$ be a $k \times 1$ vector of parameters. Then, the likelihood function for right-censored survival data under independent censoring can be expressed as:

$$ L(\boldsymbol{\theta}) = \prod_{i=1}^{n}f(t_{i}|\boldsymbol{\theta})^{\delta_{i}}S(t_{i}|\boldsymbol{\theta})^{1-\delta_{i}}. $$

The maximum likelihood estimate (MLE) of $\boldsymbol{\theta}$ is obtained by directly maximization of $\log(L(\boldsymbol{\theta}))$ using the rstan::optimizing() function. The function rstan::optimizing() further provides the hessian matrix of $\log(L(\boldsymbol{\theta}))$, needed to obtain the observed Fisher information matrix, which is given by:

$$ \mathscr{I}(\hat{\boldsymbol{\theta}}) = -\frac{\partial^2}{\partial \boldsymbol{\theta}\boldsymbol{\theta}'} \log L(\boldsymbol{\theta})\mid_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}, $$

Inferences on $\boldsymbol{\theta}$ are then based on the asymptotic properties of the MLE, $\hat{\boldsymbol{\theta}}$, that state that:

$$ \hat{\boldsymbol{\theta}} \asymp N_{k}(\boldsymbol{\theta}, \mathscr{I}^{-1}(\hat{\boldsymbol{\theta}})). $$

Baseline Distributions

Some of the most popular baseline survival distributions are implemented in the R package survstan. Such distributions include:

Exponential
Weibull
Lognormal
Loglogistic
Gamma,
Generalized Gamma (original Stacy’s parametrization)
Generalized Gamma (alternative Prentice’s parametrization)
Gompertz
Rayleigh
Birnbaum-Saunders (fatigue)

The parametrizations adopted in the package survstan are presented next.

Exponential Distribution

If $T \sim \mbox{Exp}(\lambda)$, then

$$ f(t|\lambda) = \lambda\exp\left{-\lambda t\right}I_{[0, \infty)}(t), $$ where $\lambda>0$ is the rate parameter.

The survival and hazard functions in this case are given by:

$$ S(t|\lambda) = \exp\left{-\lambda t\right} $$ and $$ h(t|\lambda) = \lambda. $$

Weibull Distribution

If $T \sim \mbox{Weibull}(\alpha, \gamma)$, then

$$ f(t|\alpha, \gamma) = \frac{\alpha}{\gamma^{\alpha}}t^{\alpha-1}\exp\left{-\left(\frac{t}{\gamma}\right)^{\alpha}\right}I_{[0, \infty)}(t), $$ where $\alpha>0$ and $\gamma>0$ are the shape and scale parameters, respectively.

The survival and hazard functions in this case are given by:

$$ S(t|\alpha, \gamma) = \exp\left{-\left(\frac{t}{\gamma}\right)^{\alpha}\right} $$ and $$ h(t|\alpha, \gamma) = \frac{\alpha}{\gamma^{\alpha}}t^{\alpha-1}. $$

Lognormal Distribution

If $T \sim \mbox{LN}(\mu, \sigma)$, then

$$ f(t|\mu, \sigma) = \frac{1}{\sqrt{2\pi}t\sigma}\exp\left{-\frac{1}{2}\left(\frac{log(t)-\mu}{\sigma}\right)^2\right}I_{[0, \infty)}(t), $$ where $-\infty < \mu < \infty$ and $\sigma>0$ are the mean and standard deviation in the log scale of $T$.

The survival and hazard functions in this case are given by:

$$S(t|\mu, \sigma) = \Phi\left(\frac{-log(t)+\mu}{\sigma}\right)$$ and $$h(t|\mu, \sigma) = \frac{f(t|\mu, \sigma)}{S(t|\mu, \sigma)},$$ where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution.

Loglogistic Distribution

If $T \sim \mbox{LL}(\alpha, \gamma)$, then

$$ f(t|\alpha, \gamma) = \frac{\frac{\alpha}{\gamma}\left(\frac{t}{\gamma}\right)^{\alpha-1}}{\left[1 + \left(\frac{t}{\gamma}\right)^{\alpha}\right]^2}I_{[0, \infty)}(t), ~ \alpha>0, \gamma>0, $$

where $\alpha>0$ and $\gamma>0$ are the shape and scale parameters, respectively.

The survival and hazard functions in this case are given by:

$$S(t|\alpha, \gamma) = \frac{1}{1+ \left(\frac{t}{\gamma}\right)^{\alpha}}$$ and $$ h(t|\alpha, \gamma) = \frac{\frac{\alpha}{\gamma}\left(\frac{t}{\gamma}\right)^{\alpha-1}}{1 + \left(\frac{t}{\gamma}\right)^{\alpha}}. $$

Gamma Distribution

If $T \sim \mbox{Gamma}(\alpha, \lambda)$, then

$$f(t|\alpha, \lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}t^{\alpha-1}\exp\left{-\lambda t\right}I_{[0, \infty)}(t),$$

where $\Gamma(\alpha) = \int_{0}^{\infty}u^{\alpha-1}\exp{-u}du$ is the gamma function.

The survival function is given by

$$S(t|\alpha, \lambda) = 1 - \frac{\gamma^{}(\alpha, \lambda t)}{\Gamma(\alpha)},$$ where $\gamma^{}(\alpha, \lambda t)$ is the lower incomplete gamma function, which is available only numerically. Finally, the hazard function is expressed as:

$$h(t|\alpha, \lambda) = \frac{f(t|\alpha, \lambda)}{S(t|\alpha, \lambda)}.$$

Generalized Gamma Distribution (original Stacy’s parametrization)

If $T \sim \mbox{ggstacy}(\alpha, \gamma, \kappa)$, then

$$f(t|\alpha, \gamma, \kappa) = \frac{\kappa}{\gamma^{\alpha}\Gamma(\alpha/\kappa)}t^{\alpha-1}\exp\left{-\left(\frac{t}{\gamma}\right)^{\kappa}\right}I_{[0, \infty)}(t),$$ for $\alpha>0$, $\gamma>0$ and $\kappa>0$.

It can be show that the survival function can be expressed as:

$$S(t|\alpha, \gamma, \kappa) = S_{G}(x|\nu, 1),$$ where $x = \displaystyle\left(\frac{t}{\gamma}\right)^\kappa$, and $F_{G}(\cdot|\nu, 1)$ corresponds to the distribution function of a gamma distribution with shape parameter $\nu = \alpha/\gamma$ and scale parameter equals to 1.

Finally, the hazard function is expressed as:

$$h(t|\alpha, \gamma, \kappa) = \frac{f(t|\alpha, \gamma, \kappa)}{S(t|\alpha, \gamma, \kappa)}.$$

Generalized Gamma Distribution (alternative Prentice’s parametrization)

If $T \sim \mbox{ggprentice}(\mu, \sigma, \varphi)$, then

$$f(t | \mu, \sigma, \varphi) = \begin{cases} \frac{|\varphi|(\varphi^{-2})^{\varphi^{-2}}}{\sigma t\Gamma(\varphi^{-2})}\exp{\varphi^{-2}[\varphi w - \exp(\varphi w)]}I_{[0, \infty)}(t), & \varphi \neq 0 \
\frac{1}{\sqrt{2\pi}t\sigma}\exp\left{-\frac{1}{2}\left(\frac{log(t)-\mu}{\sigma}\right)^2\right}I_{[0, \infty)}(t), & \varphi = 0 \end{cases} $$ where $w = \frac{\log(t) - \mu}{\sigma}$, for $-\infty < \mu < \infty$, $\sigma>0$ and $-\infty < \varphi < \infty$$.

It can be show that the survival function can be expressed as:

$$ S(t|\mu, \sigma, \varphi) = \begin{cases} S_{G}(x|1/\varphi^2, 1), & \varphi > 0 \
1-S_{G}(x|1/\varphi^2, 1), & \varphi < 0 \
S_{LN}(x|\mu, \sigma), & \varphi = 0 \end{cases} $$ where $x = \frac{1}{\varphi^2}\exp{\varphi w}$, $S_{G}(\cdot|1/\varphi^2, 1)$ is the distribution function of a gamma distribution with shape parameter $1/\varphi^2$ and scale parameter equals to 1, and $S_{LN}(x|\mu, \sigma)$ corresponds to the survival function of a lognormal distribution with location parameter $\mu$ and scale parameter $\sigma$.

Finally, the hazard function is expressed as:

$$h(t|\alpha, \gamma, \kappa) = \frac{f(t|\alpha, \gamma, \kappa)}{S(t|\alpha, \gamma, \kappa)}.$$

Gompertz Distribution

If $T \sim \mbox{Gamma}(\alpha, \gamma)$, then

$$f(t|\alpha, \lambda) = \alpha\exp\left{\gamma x-\frac{\alpha}{\gamma}\left(e^{\gamma x} - 1\right)\right}I_{[0, \infty)}(t).$$

The survival and hazard functions are given, respectively, by

$$S(t|\alpha, \lambda) = \exp\left{-\frac{\alpha}{\gamma}\left(e^{\gamma x} - 1\right)\right}.$$ and

$$h(t|\alpha, \lambda) = \alpha\exp{\gamma x}.$$

Rayleigh Distribution

Let $T \sim \mbox{rayleigh}(\sigma)$, where $\sigma>0$ is a scale parameter. Then, the density, survival and hazard functions are respectively given by:

$$f(t|\sigma) = \frac{x}{\sigma^2}\exp\left{-\frac{x^2}{2\sigma^2}\right},$$ $$S(t|\sigma) = \exp\left{-\frac{x^2}{2\sigma^2}\right}$$ and

$$h(t|\sigma) = \frac{x}{\sigma^2}.$$

Birnbaum-Saunders (fatigue) Distribution

If $T \sim \mbox{fatigue}(\alpha, \gamma)$, then

$$ f(t|\alpha, \gamma) = \frac{\sqrt{\frac{t}{\gamma}}+\sqrt{\frac{\gamma}{t}}}{2 \alpha t}\phi\left(\sqrt{\frac{t}{\gamma}}+\sqrt{\frac{\gamma}{t}}\right)(t), ~ \alpha>0, \gamma>0, $$

where $\phi(\cdot)$ is the probability density function of a standard normal distribution, $\alpha>0$ and $\gamma>0$ are the shape and scale parameters, respectively.

The survival function in this case is given by:

$$ S(t|\alpha, \gamma) =\Phi\left(\sqrt{\frac{t}{\gamma}}-\sqrt{\frac{\gamma}{t}}\right)(t) $$,

where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal distribution. The hazard function is given by $$h(t|\mu, \sigma) = \frac{f(t|\alpha, \gamma)}{S(t|\alpha, \gamma)}. $$

Regression models

When covariates are available, it is possible to fit six different regression models with the R package survstan:

accelerated failure time (AFT) models;
proportional hazards (PH) models;
proportional odds (PO) models;
accelerated hazard (AH) models.
Yang and Prentice (YP) models.
extended hazard (EH) models.

The regression survival models implemented in the R package survstan are briefly described in the sequel. Denote by $\mathbf{x}$ a $1\times p$ vector of covariates, and let $\boldsymbol{\beta}$ and $\boldsymbol{\phi}$ be $p \times 1$ vectors of regression coefficients, and $\boldsymbol{\theta}$ a vector of parameters associated with some baseline survival distribution. To ensure identifiability of the implemented regression models, we shall assume that the linear predictors $\mathbf{x} \boldsymbol{\beta}$ and $\mathbf{x} \boldsymbol{\phi}$ do not include an intercept term.

Accelerate Failure Time Models

Accelerated failure time (AFT) models are defined as

$$ T = \exp{\mathbf{x} \boldsymbol{\beta}}\nu, $$ where $\nu$ follows a baseline distribution with survival function $S_{0}(\cdot|\boldsymbol{\theta})$ so that

$$ f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = e^{-\mathbf{x} \boldsymbol{\beta}}f_{0}(te^{-\mathbf{x} \boldsymbol{\beta}}|\boldsymbol{\theta}) $$ and

$$ S(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = S_{0}(t e^{-\mathbf{x} \boldsymbol{\beta}}|\boldsymbol{\theta}). $$

Proportional Hazards Models

Proportional hazards (PH) models are defined as

$$ h(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = h_{0}(t|\boldsymbol{\theta})\exp{\mathbf{x} \boldsymbol{\beta}}, $$ where $h_{0}(t|\boldsymbol{\theta})$ is a baseline hazard function so that

$$ f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = h_{0}(t|\boldsymbol{\theta})\exp\left{\mathbf{x} \boldsymbol{\beta} - H_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}\right}, $$ and

$$ S(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = \exp\left{ - H_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}\right}. $$

Proportional Odds Models

Proportional Odds (PO) models are defined as

$$ R(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = R_{0}(t|\boldsymbol{\theta})\exp{\mathbf{x} \boldsymbol{\beta}}, $$ where $\displaystyle R_{0}(t|\boldsymbol{\theta}) = \frac{1-S_{0}(t|\boldsymbol{\theta})}{S_{0}(t|\boldsymbol{\theta})} = \exp{H_{0}(t|\boldsymbol{\theta})}-1$ is a baseline odds function so that

$$ f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = \frac{h_{0}(t|\boldsymbol{\theta})\exp{\mathbf{x} \boldsymbol{\beta} + H_{0}(t|\boldsymbol{\theta})}}{[1 + R_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}]^2}. $$

and

$$ S(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = \frac{1}{1 + R_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}}. $$

Accelerated Hazard Models

Accelerated hazard (AH) models can be defined as

$$h(t|\boldsymbol{\theta}, \boldsymbol{\beta},\mathbf{x}) = h_{0}\left(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)$$

so that

$$S(t|\boldsymbol{\theta}, \boldsymbol{\beta},\mathbf{x}) = \exp\left{- H_{0}\left(t/ e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)e^{\mathbf{x}\boldsymbol{\beta}} \right} $$ and $$f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = h_{0}\left(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)\exp\left{- H_{0}\left(t/ e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)e^{\mathbf{x}\boldsymbol{\beta}} \right}. $$

Extended hazard Models

The survival function of the extended hazard (EH) model is given by:

$$S(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \exp\left{-H_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp(\mathbf{x}(\boldsymbol{\beta} + \boldsymbol{\phi}))\right}. $$

The hazard and the probability density functions are then expressed as:

$$h(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = h_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp{\mathbf{x}\boldsymbol{\phi}} $$ and

$$f(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = h_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp{\mathbf{x}\boldsymbol{\beta}}\exp\left{-H_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp(\mathbf{x}(\boldsymbol{\beta}+ \boldsymbol{\phi}))\right}, $$

respectively.

The EH model includes the AH, AFT and PH models as particular cases when $\boldsymbol{\phi} = \mathbf{0}$, $\boldsymbol{\phi} = -\boldsymbol{\beta}$, and $\boldsymbol{\beta} = \mathbf{0}$, respectively.

Yang and Prentice Models

The survival function of the Yang and Prentice (YP) model is given by:

$$S(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]^{-\kappa_{L}}. $$

The hazard and the probability density functions are then expressed as:

$$h(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \frac{\kappa_{S}h_{0}(t|\boldsymbol{\theta})\exp{H_{0}(t|\boldsymbol{\theta})}}{\left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]} $$ and

$$f(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \kappa_{S}h_{0}(t|\boldsymbol{\theta})\exp{H_{0}(t|\boldsymbol{\theta})}\left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]^{-(1+\kappa_{L})}, $$

respectively, where $\kappa_{S} = \exp{\mathbf{x}\boldsymbol{\beta}}$ and $\kappa_{L} = \exp{\mathbf{x}\boldsymbol{\phi}}$.

The YO model includes the PH and PO models as particular cases when $\boldsymbol{\phi} = \boldsymbol{\beta}$ and $\boldsymbol{\phi} = \mathbf{0}$, respectively.

r-survstan

survstan

Installation

Inference procedures

Baseline Distributions

Exponential Distribution

Weibull Distribution

Lognormal Distribution

Loglogistic Distribution

Gamma Distribution

Generalized Gamma Distribution (original Stacy’s parametrization)

Generalized Gamma Distribution (alternative Prentice’s parametrization)

Gompertz Distribution

Rayleigh Distribution

Birnbaum-Saunders (fatigue) Distribution

Regression models

Accelerate Failure Time Models

Proportional Hazards Models

Proportional Odds Models

Accelerated Hazard Models

Extended hazard Models

Yang and Prentice Models

Version

License

Status

Source

Homepage

Platforms (77)

survstan

Installation

Inference procedures

Baseline Distributions

Exponential Distribution

Weibull Distribution

Lognormal Distribution

Loglogistic Distribution

Gamma Distribution

Generalized Gamma Distribution (original Stacy’s parametrization)

Generalized Gamma Distribution (alternative Prentice’s parametrization)

Gompertz Distribution

Rayleigh Distribution

Birnbaum-Saunders (fatigue) Distribution

Regression models

Accelerate Failure Time Models

Proportional Hazards Models

Proportional Odds Models

Accelerated Hazard Models

Extended hazard Models

Yang and Prentice Models

Version

License

Status

Source

Homepage

Platforms77 (77)

Platforms (77)