Network-Guided Temporal Forests for Feature Selection in High-Dimensional Longitudinal Data.
TemporalForest
TemporalForest is an R package for reproducible feature selection in high-dimensional longitudinal data.
It combines time-aware network reduction (consensus TOM from WGCNA), mixed-effects model trees that respect within-subject correlation, and stability selection to deliver a small, interpretable, and stable set of predictors.
Why TemporalForest?
- Time-aware modules: Builds a consensus topological overlap network across time points to keep temporally persistent correlations.
- Mixed-effects trees: LMER-trees handle within-subject dependence and reduce split bias common in standard trees.
- Stability selection: Uses bootstrapping and selection probabilities to control false discoveries and improve reproducibility.
- Practical speed-ups: Optionally pass a precomputed dissimilarity_matrix to skip the slow network construction stage.
- Designed for omics & sensors: Works well when p ≫ n and when repeated measures and correlated predictors are the norm.
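To make the stability selection idea concrete, here is a minimal base R sketch. It is not the package's internal implementation: the base selector `select_top_k`, the simulated data, and the 0.6 threshold are all assumptions chosen for illustration. It resamples rows with replacement, reruns a simple selector on each resample, and records how often each feature is chosen.

```r
# Toy stability selection in base R (illustrative only; not TemporalForest internals).
set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("V", 1:p)))
y <- 3 * X[, "V1"] + 2 * X[, "V2"] + rnorm(n)

# Hypothetical base selector: keep the k features most correlated with y.
select_top_k <- function(X, y, k) {
  score <- abs(stats::cor(X, y))[, 1]
  names(sort(score, decreasing = TRUE))[seq_len(k)]
}

n_boot <- 50
k <- 2
counts <- setNames(numeric(p), colnames(X))
for (b in seq_len(n_boot)) {
  idx <- sample.int(n, replace = TRUE)   # bootstrap resample of rows
  chosen <- select_top_k(X[idx, , drop = FALSE], y[idx], k)
  counts[chosen] <- counts[chosen] + 1
}

selection_prob <- counts / n_boot        # per-feature selection frequency
# The 0.6 cut-off is an assumed threshold for this sketch.
stable <- names(selection_prob)[selection_prob >= 0.6]
print(sort(selection_prob, decreasing = TRUE))
```

With repeated measures, one would typically resample whole subjects rather than individual rows so that within-subject correlation is preserved in each resample.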
Installation
You can install the released version of TemporalForest from CRAN with:
install.packages("TemporalForest")
And the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("SisiShao/TemporalForest")
30-second Quick Start
A tiny example that skips network construction by supplying a lightweight dissimilarity matrix:
library(TemporalForest)
set.seed(11)
n_subjects <- 60; n_timepoints <- 2; p <- 20
# Build X: list of length T, each an n × p matrix with identical column names
X <- replicate(n_timepoints, matrix(rnorm(n_subjects * p), n_subjects, p), simplify = FALSE)
colnames(X[[1]]) <- colnames(X[[2]]) <- paste0("V", 1:p)
# Long view + metadata
X_long <- do.call(rbind, X)
id <- rep(seq_len(n_subjects), each = n_timepoints)
time <- rep(seq_len(n_timepoints), times = n_subjects)
# Outcome with three strong signals
u_subj <- rnorm(n_subjects, 0, 0.7)
eps <- rnorm(length(id), 0, 0.08)
Y <- 4*X_long[, "V1"] + 3.5*X_long[, "V2"] + 3.2*X_long[, "V3"] +
  rep(u_subj, each = n_timepoints) + eps
# Simple dissimilarity to bypass Stage 1 (fast demo)
A <- 1 - abs(stats::cor(X_long)); diag(A) <- 0
dimnames(A) <- list(colnames(X[[1]]), colnames(X[[1]]))
fit <- temporal_forest(
  X = X, Y = Y, id = id, time = time,
  dissimilarity_matrix = A,          # skip WGCNA/TOM
  n_features_to_select = 3,          # expect V1, V2, V3
  n_boot_screen = 6, n_boot_select = 18,
  keep_fraction_screen = 1,
  min_module_size = 2,
  alpha_screen = 0.5, alpha_select = 0.6
)
print(fit$top_features)
#> [1] "V1" "V3" "V2"
For a more detailed example and a full pipeline run, please see the package vignette.
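The quick start builds its dissimilarity matrix from the pooled rows of all time points. As a rough illustration of the time-aware alternative, the sketch below computes one similarity matrix per time point and keeps, for each feature pair, only the similarity present at every time point (element-wise minimum, one common consensus rule). This is a simplified stand-in: it uses plain absolute correlation rather than WGCNA's topological overlap, and the name `A_consensus` is hypothetical.

```r
# Simplified consensus dissimilarity across time points (base R only).
# Assumption: |correlation| stands in for WGCNA's TOM in this sketch.
set.seed(11)
n_subjects <- 60; n_timepoints <- 2; p <- 20
X <- replicate(
  n_timepoints,
  matrix(rnorm(n_subjects * p), n_subjects, p,
         dimnames = list(NULL, paste0("V", 1:p))),
  simplify = FALSE
)

# One similarity matrix per time point.
sim_list <- lapply(X, function(x) abs(stats::cor(x)))

# Consensus: element-wise minimum, so a pair is only as similar
# as its weakest time point.
consensus_sim <- Reduce(pmin, sim_list)

# Convert to a dissimilarity with a zero diagonal.
A_consensus <- 1 - consensus_sim
diag(A_consensus) <- 0
dimnames(A_consensus) <- list(colnames(X[[1]]), colnames(X[[1]]))
```

A matrix built this way has the same shape and naming as `A` in the quick start, so it could in principle be supplied through the `dissimilarity_matrix` argument; whether it is a good substitute for the full consensus TOM on real data is not something this sketch establishes.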
Documentation
A long-form guide and reproducible examples can be found in the vignette:
vignette("TemporalForest-Introduction", package = "TemporalForest")
Contributing
Issues and pull requests are welcome! Please report bugs or request features at the official GitHub repository.
Citation
If you use TemporalForest in your work, please cite the manuscript:
Shao, S., Moore, J.H., Ramirez, C.M. (2025). Network-Guided Temporal Forests for Feature Selection in High-Dimensional Longitudinal Data. Manuscript submitted for publication.
You can also get the citation from within R:
citation("TemporalForest")