Environment Based Clustering for Interpretable Predictive Models in High Dimensional Data.
This package is under active development.
eclust
The eclust package implements the methods developed in the paper "An analytic approach for interpretable predictive models in high dimensional data, in the presence of interactions with exposures" (2017+; preprint). Briefly, eclust is a two-step procedure: 1a) a clustering stage where variables are clustered based on some measure of similarity, 1b) a dimension reduction stage where a summary measure is created for each of the clusters, and 2) a simultaneous variable selection and regression stage on the summarized cluster measures.
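The steps above can be illustrated with a minimal sketch. It does not use the eclust API and is not the paper's method: it assumes plain correlation as the similarity measure (rather than an environment-based one), cluster averages as the summary measure, and the lasso from glmnet for the selection and regression stage. The simulated data and every variable name below are hypothetical.

```r
# Illustrative sketch only: base R plus glmnet, not the eclust functions.
library(glmnet)

set.seed(1)
n <- 100  # number of observations
p <- 30   # number of predictors

# Hypothetical data: three blocks of 10 correlated predictors,
# each block driven by one latent factor plus noise.
latent <- matrix(rnorm(n * 3), n, 3)
x <- latent[, rep(1:3, each = 10)] + matrix(rnorm(n * p, sd = 0.5), n, p)
colnames(x) <- paste0("V", seq_len(p))
y <- 2 * latent[, 1] - latent[, 2] + rnorm(n)

# Step 1a: cluster the variables, using 1 - |correlation| as dissimilarity.
dissimilarity <- as.dist(1 - abs(cor(x)))
tree <- hclust(dissimilarity, method = "average")
membership <- cutree(tree, k = 3)

# Step 1b: summarize each cluster by the average of its member variables.
cluster_ids <- sort(unique(membership))
x_summary <- sapply(cluster_ids, function(k) {
  rowMeans(x[, membership == k, drop = FALSE])
})
colnames(x_summary) <- paste0("cluster", cluster_ids)

# Step 2: variable selection and regression on the cluster summaries,
# here with a cross-validated lasso (alpha = 1).
fit <- cv.glmnet(x_summary, y, alpha = 1)
coef(fit, s = "lambda.min")
```

Averaging is only one possible summary; the first principal component of each cluster is a common alternative, and the number of clusters (fixed at 3 here) would normally be chosen from the data.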
Installation
You can install the development version of eclust from GitHub with:
install.packages("pacman")
pacman::p_install_gh("sahirbhatnagar/eclust")
Vignette
See the online vignette for example usage of the functions.
Credit
This package makes use of several existing packages, including:
- glmnet for lasso and elastic net regression
- earth for MARS models
- WGCNA for topological overlap matrices
Related Work
- Park, M. Y., Hastie, T., & Tibshirani, R. (2007). Averaged gene expressions for regression. Biostatistics, 8(2), 212-227.
- Bühlmann, P., Rütimann, P., van de Geer, S., & Zhang, C. H. (2013). Correlated variables in regression: clustering and sparse estimation. Journal of Statistical Planning and Inference, 143(11), 1835-1858.
Contact
- Issues: https://github.com/sahirbhatnagar/eclust/issues
- Pull Requests: https://github.com/sahirbhatnagar/eclust/
- e-mail: [email protected]
Latest news
You can see the most recent changes to the package in the NEWS.md file.
Code of Conduct
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.