Description

Broken Stick Model for Irregular Longitudinal Data.

Data on multiple individuals through time are often sampled at times that differ between persons. Irregular observation times can severely complicate the statistical analysis of the data. The broken stick model approximates each subject’s trajectory by one or more connected line segments. The times at which segments connect (breakpoints) are identical for all subjects and under control of the user. A well-fitting broken stick model effectively transforms individual measurements made at irregular times into regular trajectories with common observation times.

Specification of the model requires three variables: time, measurement and subject. The model is a special case of the linear mixed model, with time as a linear B-spline and subject as the grouping factor. The main assumptions are: subjects are exchangeable, trajectories between consecutive breakpoints are straight, random effects follow a multivariate normal distribution, and unobserved data are missing at random.

The package contains functions for fitting the broken stick model to data, for predicting curves in new data and for plotting broken stick estimates. The package supports two optimization methods, and includes options to structure the variance-covariance matrix of the random effects. The analyst may use the software to smooth growth curves by a series of connected straight lines, to align irregularly observed curves to a common time grid, to create synthetic curves at a user-specified set of breakpoints, to estimate the time-to-time correlation matrix and to predict future observations. See <doi:10.18637/jss.v106.i07> for additional documentation on background, methodology and applications.

README.md

brokenstick

The broken stick model describes a set of individual curves by a linear mixed model using second-order linear B-splines. The main use of the model is to align irregularly observed data to a user-specified grid of break ages.

All fitting can be done in the Z-score scale, so nonlinearities and irregular data can be treated as separate problems. This package contains functions for fitting a broken stick model to data, for exporting the parameters of the model for independent use outside this package, and for predicting broken stick curves for new data.

Installation

Install the brokenstick package from CRAN as follows:

install.packages("brokenstick")

The latest version (revision branch) can be installed from GitHub as follows:

install.packages("remotes")
remotes::install_github("growthcharts/brokenstick")

Overview

The broken stick model describes a set of individual curves by a linear mixed model using linear B-splines. The model can be used

to smooth growth curves by a series of connected straight lines;
to align irregularly observed curves to a common age grid;
to create synthetic curves at a user-specified set of break ages;
to estimate the time-to-time correlation matrix;
to predict future observations.

The user specifies a set of break ages at which the straight lines connect. Each individual obtains an estimate at each break age, so the individual's set of estimates forms a smoothed version of the observed trajectory.
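The link between break ages and estimates can be illustrated with base R alone. The sketch below is not the package's internal code: it builds a linear B-spline basis with `splines::bs()` and uses made-up break ages and made-up estimates for one child (both are assumptions for illustration). It shows that the curve passes through the estimate at each break age and is a straight line in between.

```r
# Illustration only: a linear B-spline ("hat function") basis built with
# splines::bs(), not the brokenstick package internals.
library(splines)

breaks <- c(0, 0.5, 1, 2)  # hypothetical break ages (years)

# Basis matrix with one column per break age
basis_at <- function(age) {
  B <- bs(age,
          knots = breaks[2:3],             # internal break ages
          Boundary.knots = range(breaks),  # first and last break age
          degree = 1, intercept = TRUE)
  matrix(as.numeric(B), nrow = length(age))
}

# Hypothetical broken stick estimates (Z-scores) for one child
est <- c(-0.2, 0.4, 0.1, -0.5)

# Evaluated at the break ages, the curve returns the estimates themselves
fitted_at_breaks <- as.numeric(basis_at(breaks) %*% est)

# Between break ages the curve is a straight line: halfway between
# 0.5 and 1 year it is the average of the two neighbouring estimates
mid <- as.numeric(basis_at(0.75) %*% est)
```

Because the basis matrix evaluated at the break ages is the identity matrix, each coefficient can be read directly as the value of the trajectory at its break age.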

The main assumptions of the broken stick model are:

The trajectory between the break ages follows a straight line, and is generally not of particular interest;
Broken stick estimates follow a common multivariate normal distribution;
Missing data are missing at random (MAR);
Individuals are exchangeable and uncorrelated.

In order to conform to the assumption of multivariate normality, the user may fit the broken stick model on suitably transformed data that yield the standard normal ($Z$) scale.

Unique features of the broken stick model are:

Modular: Issues related to non-linearity of the growth curves in the observed scale can be treated separately, i.e., outside the broken stick model;
Local: A given data point will contribute only to the estimates corresponding to the closest break ages;
Exportable: The broken stick model can be exported and reused for prediction for new data in alternative computing environments.
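The "Local" property can be seen in the design matrix of a linear B-spline. The following base R sketch (an illustration with assumed break ages and an assumed observation age, not the package's own code) shows that a single observation has non-zero weight only for the two break ages that bracket it.

```r
# Illustration only: weights of one observation in a linear B-spline basis
library(splines)

breaks  <- c(0, 0.5, 1, 2)  # hypothetical break ages (years)
obs_age <- 0.7              # hypothetical age at measurement

# One row of the design matrix: one weight per break age
w <- as.numeric(bs(obs_age,
                   knots = breaks[2:3],
                   Boundary.knots = range(breaks),
                   degree = 1, intercept = TRUE))
names(w) <- breaks

# Break ages that receive a non-zero weight from this observation
touched <- breaks[w > 0]
```

Here the observation at 0.7 years loads only on the break ages 0.5 and 1, with weights that sum to one; the weights for 0 and 2 years are zero.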

The brokenstick package contains functions for

Fitting the broken stick model to data,
Plotting individual trajectories,
Predicting broken stick estimates for new data.
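A minimal sketch of fitting and predicting is given below. It assumes that the package is installed, that the `smocc_200` example data shipped with the package are available, and that the choice of break ages at 0, 1 and 2 years suits the data; the argument names follow the package documentation as I understand it, so check `?brokenstick` and `?predict.brokenstick` for the installed version.

```r
# Sketch only: runs when the brokenstick package is installed
if (requireNamespace("brokenstick", quietly = TRUE)) {
  library(brokenstick)

  # Fit: measurement ~ time | subject, with break ages at 0, 1 and 2 years
  # (smocc_200 is assumed to be the example data set shipped with the package)
  fit <- brokenstick(hgt_z ~ age | id, data = smocc_200, knots = 0:2)

  # Broken stick estimates at the break ages, one row per child
  est <- predict(fit, newdata = smocc_200, x = "knots", shape = "wide")
  print(head(est))
}
```

The default fitting method is stochastic, so estimates can differ slightly between runs.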

Resources

Background

I took the name broken stick from Ruppert, Wand, and Carroll (2003), pages 59-61, but it is actually much older.
As far as I know, de Kroon et al. (2010) is the first publication that uses the broken stick model without the intercept in a mixed modelling context. See The Terneuzen birth cohort: BMI changes between 2 and 6 years correlate strongest with adult overweight.
The model was formally defined and extended in Flexible Imputation of Missing Data (second edition). See van Buuren (2018).
The evaluation by Anderson et al. (2019) concluded:

We recommend the use of the brokenstick model with standardised Z‐score data. Aside from the accuracy of the fit, another key advantage of the brokenstick model is that it is easier to fit and provides easily interpretable estimates of child growth trajectories.

Instructive materials

Publication: S. van Buuren (2023), Broken Stick Model for Irregular Longitudinal Data. Journal of Statistical Software, 106(7), 1–51. doi:10.18637/jss.v106.i07, html version;
Companion site contains vignettes and articles that explain the model and the use of the software.

Acknowledgment

This work was supported by the Bill & Melinda Gates Foundation. The contents are the sole responsibility of the authors and may not necessarily represent the official views of the Bill & Melinda Gates Foundation or other agencies that may have supported the primary data studies used in the present study.

References

Anderson, C., R. Hafen, O. Sofrygin, L. Ryan, and HBGDki Community. 2019. “Comparing Predictive Abilities of Longitudinal Child Growth Models.” Statistics in Medicine 38 (19): 3555–70. https://doi.org/10.1002/sim.7693.

de Kroon, M. L. A., C. M. Renders, J. P. van Wouwe, S. van Buuren, and R. A. Hirasing. 2010. “The Terneuzen Birth Cohort: BMI Changes Between 2 and 6 Years Correlate Strongest with Adult Overweight.” PLOS One 5 (2): e9155. https://doi.org/10.1371/journal.pone.0009155.

Ruppert, D., M. P. Wand, and R. J. Carroll. 2003. Semiparametric Regression. Cambridge: Cambridge University Press.

van Buuren, S. 2018. Flexible Imputation of Missing Data. 2nd ed. Boca Raton, FL: CRC Press. https://stefvanbuuren.name/fimd/.
