Description
Missingness Benchmark for Continuous Glucose Monitoring Data.
Description
Evaluates predictive performance under feature-level missingness in repeated-measures continuous glucose monitoring-like data. The benchmark injects missing values at user-specified rates, imputes incomplete feature matrices using an iterative chained-equations approach inspired by multivariate imputation by chained equations (MICE; Azur et al. (2011) <doi:10.1002/mpr.329>), fits Random Forest regression models (Breiman (2001) <doi:10.1023/A:1010933404324>) and k-nearest-neighbor regression models (Zhang (2016) <doi:10.21037/atm.2016.03.37>), and reports mean absolute percentage error and R-squared across missingness rates.
README.md
CGMissingDataR
CGMissingDataR is an R package, based on the CGMissingData Python library, that evaluates model performance under feature missingness by:
- injecting missing values into feature columns at specified masking rates,
- imputing missing values using a Multiple Imputation by Chained Equations (MICE)-style iterative imputer, and
- training Random Forest and k-Nearest Neighbors regressors and reporting Mean Absolute Percentage Error (MAPE) and R-squared across missingness levels.
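The masking and scoring steps listed above can be sketched in base R. This is a minimal illustration, not the package's implementation: `mask_features()`, `mape()`, and `r_squared()` are hypothetical helper names, the toy column names are invented, cells are assumed to be masked completely at random, and MAPE is assumed to be expressed as a percentage.

```r
# Hypothetical helpers for illustration; not part of the CGMissingDataR API.

# Set roughly `rate` of the cells in the feature columns to NA,
# choosing cells completely at random. Other columns are left untouched.
mask_features <- function(data, feature_cols, rate, seed = 1) {
  stopifnot(rate >= 0, rate < 1)
  set.seed(seed)
  feats <- as.matrix(data[feature_cols])
  n_mask <- round(rate * length(feats))
  feats[sample.int(length(feats), n_mask)] <- NA
  masked <- data
  masked[feature_cols] <- as.data.frame(feats)
  masked
}

# Mean absolute percentage error, in percent (assumes no zero targets).
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual)) * 100
}

# Coefficient of determination.
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Toy data with invented column names.
toy <- data.frame(
  glucose = c(100, 120, 140, 160),
  hr      = c(60, 65, 70, 75),
  target  = c(1, 2, 3, 4)
)
masked <- mask_features(toy, c("glucose", "hr"), rate = 0.25)
sum(is.na(masked))  # 2 of the 8 feature cells are masked
```

Masking a fixed number of cells, rather than drawing each cell independently, keeps the realized missingness rate equal to the requested rate up to rounding.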
Before installing, make sure the following R packages are installed:
install.packages(c("FNN", "ranger", "mice"))
Install the development version of CGMissingDataR from GitHub:
devtools::install_github("saraswatsh/CGMissingDataR")
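With the dependencies installed, the imputation and modelling steps can be tried by calling mice, ranger, and FNN directly. The sketch below does not use or reproduce the CGMissingDataR API; the simulated data, column names, 10% masking rate, 75/25 split, and tuning values are all assumptions chosen for illustration.

```r
# Illustrative sketch only: calls mice, ranger, and FNN directly.
set.seed(42)
n <- 200

# Simulated stand-in data; column names are hypothetical.
sim <- data.frame(
  glucose    = rnorm(n, mean = 120, sd = 25),
  heart_rate = rnorm(n, mean = 70, sd = 8),
  activity   = runif(n)
)
sim$target <- 0.5 * sim$glucose + 0.2 * sim$heart_rate + rnorm(n, sd = 5)
features <- c("glucose", "heart_rate", "activity")

# 1. Inject missing values into 10% of the feature cells.
x <- as.matrix(sim[features])
x[sample.int(length(x), round(0.10 * length(x)))] <- NA
masked <- sim
masked[features] <- as.data.frame(x)

# 2. Impute with chained equations, keeping a single completed data set.
imp <- mice::mice(masked, m = 1, maxit = 5, printFlag = FALSE, seed = 42)
imputed <- mice::complete(imp)

# 3. Hold out 25% of the rows for evaluation.
test_idx <- sample.int(n, n / 4)
train <- imputed[-test_idx, ]
test  <- imputed[test_idx, ]

# 4. Fit both regressors and predict on the held-out rows.
rf <- ranger::ranger(target ~ ., data = train, num.trees = 200, seed = 42)
rf_pred <- predict(rf, data = test)$predictions
knn_pred <- FNN::knn.reg(
  train = as.matrix(train[features]),
  test  = as.matrix(test[features]),
  y     = train$target,
  k     = 5
)$pred

# 5. Score each model with MAPE (in percent) and R-squared.
score <- function(actual, predicted) {
  c(mape = mean(abs((actual - predicted) / actual)) * 100,
    r2   = 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2))
}
results <- rbind(
  random_forest = score(test$target, rf_pred),
  knn           = score(test$target, knn_pred)
)
results
```

For brevity, the sketch imputes before splitting and leaves the target in the imputation model, which lets held-out information influence the imputed values. A stricter evaluation would fit the imputer on the training rows only. Repeating steps 1 to 5 over several masking rates gives one row of metrics per rate and model.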
Vignette
A brief vignette illustrating the usage of CGMissingDataR can be found here.
Changelog
The changelog is available here.