Description

Joint Sparse Regression & Network Learning with Missing Data.

Simultaneously estimates sparse regression coefficients and response network structure in multivariate models with missing data. Unlike traditional approaches requiring imputation, handles missingness natively through unbiased estimating equations (MCAR/MAR compatible). Employs dual L1 regularization with automated selection via cross-validation or information criteria. Includes parallel computation, warm starts, adaptive grids, publication-ready visualizations, and prediction methods. Ideal for genomics, neuroimaging, and multi-trait studies with incomplete high-dimensional outcomes. See Zeng et al. (2025) <doi:10.48550/arXiv.2507.05990>.

missoNet

Multi-task regression and network estimation with missing responses — no imputation required!

missoNet jointly estimates regression coefficients and the response network (precision matrix) from multi-response data where some responses are missing. Estimation is based on unbiased estimating equations, which are compatible with MCAR and MAR missingness, with separate L1 penalties for the coefficients and the precision matrix. This enables multi-trait analysis under incomplete outcomes.
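To make the idea of an unbiased estimating equation concrete, the base R sketch below shows one standard correction under MCAR: the cross-moment between predictors and zero-filled responses shrinks by each response's observation rate, and dividing by that rate removes the bias. This illustrates the general principle only and is not the package's internal code; all variable names are hypothetical.

```r
set.seed(1)
n <- 5000; p <- 3; q <- 2

# Simulated complete data: Y = X B + noise
X <- matrix(rnorm(n * p), n, p)
B <- matrix(c(1, 0, -1, 0, 2, 0), p, q)
Y <- X %*% B + matrix(rnorm(n * q), n, q)

# MCAR mask: response j is missing with probability rho[j] (1 = observed)
rho <- c(0.1, 0.3)
M <- sapply(rho, function(r) rbinom(n, 1, 1 - r))
Z <- Y * M                      # zero-fill the missing entries

full      <- crossprod(X, Y) / n            # cross-moment from complete data
naive     <- crossprod(X, Z) / n            # biased toward zero
obs_rate  <- colMeans(M)                    # estimated observation rates
corrected <- sweep(naive, 2, obs_rate, "/") # rescale each response column

max(abs(naive - full))       # large: shrinkage from zero-filling
max(abs(corrected - full))   # small: only sampling noise remains
```

No imputed values enter the calculation; only the moments are adjusted.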

Why missoNet?

Native handling of missing responses without ad‑hoc imputation.
Joint learning of effects (Beta) and conditional dependency structure (Theta).
Two regularization paths with glmnet-like ergonomics.
Reliable model selection via cross‑validation (with the 1‑SE rule) or information criteria (e.g., BIC).
Built for scale: warm starts, parallel computation, and adaptive lambda grids.
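For readers unfamiliar with the 1-SE rule on a two-dimensional grid, here is a minimal base R sketch on made-up cross-validation errors. It picks the largest penalty along each axis whose error stays within one standard error of the minimum. The numbers, the variable names, and the exact tie-breaking are assumptions for illustration; the package's own rule may differ in detail.

```r
# Hypothetical grid and CV results (rows: lambda.beta, columns: lambda.theta)
lambda_beta  <- c(1.0, 0.5, 0.25, 0.125)
lambda_theta <- c(0.8, 0.4, 0.2)
cv_mean <- matrix(c(1.30, 1.25, 1.28,
                    1.10, 1.04, 1.07,
                    1.04, 1.00, 1.03,
                    1.08, 1.02, 1.06), nrow = 4, byrow = TRUE)
cv_se <- matrix(0.05, nrow = 4, ncol = 3)

# Grid point with the smallest mean CV error
best <- which(cv_mean == min(cv_mean), arr.ind = TRUE)[1, ]
threshold <- cv_mean[best["row"], best["col"]] + cv_se[best["row"], best["col"]]

# Largest lambda.beta within one SE, holding lambda.theta at its minimizer
ok_beta <- which(cv_mean[, best["col"]] <= threshold)
lambda_1se_beta <- max(lambda_beta[ok_beta])

# Largest lambda.theta within one SE, holding lambda.beta at its minimizer
ok_theta <- which(cv_mean[best["row"], ] <= threshold)
lambda_1se_theta <- max(lambda_theta[ok_theta])

c(beta = lambda_1se_beta, theta = lambda_1se_theta)
```

A larger penalty gives a sparser model, so each 1-SE choice trades a statistically negligible loss in CV error for a simpler fit.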

If you only have a single response, classical lasso/elastic net (e.g., glmnet) is simpler and likely faster.

Installation

CRAN (stable)

install.packages("missoNet")

GitHub (development)

# install.packages("devtools")
devtools::install_github("yixiao-zeng/missoNet", build_vignettes = TRUE)

Quick start

library(missoNet)

# Example data with ~15% missing responses (MCAR)
sim <- generateData(n = 300, p = 50, q = 10, rho = 0.15, missing.type = "MCAR")

# Fit along two lambda paths; choose via BIC (no CV)
fit <- missoNet(X = sim$X, Y = sim$Z, GoF = "BIC")

# Extract estimates at the selected solution
Beta  <- fit$est.min$Beta   # p x q regression coefficients
Theta <- fit$est.min$Theta  # q x q precision (conditional network)

# Visualize selection path
plot(fit, type = "scatter")
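A precision matrix is easier to read as a network once it is rescaled to partial correlations, using the standard identity that the partial correlation between responses i and j is -Theta[i, j] / sqrt(Theta[i, i] * Theta[j, j]). The sketch below uses a small hand-made matrix so it runs without the package; in practice `Theta` would be `fit$est.min$Theta` from the quick start. The helper name `partial_cor` is hypothetical.

```r
# Convert a precision matrix to partial correlations
partial_cor <- function(Theta) {
  d <- sqrt(diag(Theta))
  P <- -Theta / outer(d, d)   # -theta_ij / sqrt(theta_ii * theta_jj)
  diag(P) <- 1
  P
}

# Toy precision matrix: responses 1 and 2 are conditionally dependent,
# response 3 is conditionally independent of both
Theta <- matrix(c( 2, -1, 0,
                  -1,  2, 0,
                   0,  0, 1), nrow = 3, byrow = TRUE)

P <- partial_cor(Theta)

# Edges of the conditional network: nonzero off-diagonal entries
edges <- which(P != 0 & upper.tri(P), arr.ind = TRUE)
P
edges
```

Zero entries in Theta correspond to absent edges, which is why the L1 penalty on Theta yields a sparse network.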

Cross‑validation & prediction

# 5-fold CV over (lambda.beta, lambda.theta)
cvfit <- cv.missoNet(X = sim$X, Y = sim$Z, kfold = 5)

# Inspect CV heatmap and selected models (min and 1-SE variants)
plot(cvfit, type = "heatmap")

# Predict responses on new data
Y_hat <- predict(cvfit, newx = sim$X, s = "lambda.min")

Tip: Try s = "lambda.1se.beta" or "lambda.1se.theta" for more conservative sparsity when available.
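Because some responses are missing, prediction error can only be scored on the entries that were actually observed. The base R sketch below computes a per-response mean squared error that skips missing entries, assuming they are coded as NA. The helper `masked_mse` and the toy matrices are hypothetical; with real output you would pass `Y_hat` and the incomplete response matrix instead.

```r
# Per-response MSE over observed entries only (missing responses coded as NA)
masked_mse <- function(Y_hat, Z) {
  stopifnot(all(dim(Y_hat) == dim(Z)))
  # (Y_hat - Z)^2 is NA wherever Z is NA, so na.rm drops those entries
  colMeans((Y_hat - Z)^2, na.rm = TRUE)
}

# Toy example: 3 samples, 2 responses, one missing entry per response
Z     <- matrix(c(1, NA, 3,
                  2,  2, NA), nrow = 3)
Y_hat <- matrix(c(1, 5, 1,
                  2, 4, 9), nrow = 3)

mse <- masked_mse(Y_hat, Z)
mse
```

Scoring predictions on the same rows used for fitting is optimistic; held-out samples give a more honest estimate.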

Parallel processing

library(parallel)

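# Leave one core free; detectCores() can return NA, so fall back to 1 worker
n.cores <- max(1, detectCores() - 1, na.rm = TRUE)
cl <- makeCluster(n.cores)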
cvfit <- cv.missoNet(X = sim$X, Y = sim$Z, kfold = 5,
                     parallel = TRUE, cl = cl)
stopCluster(cl)

Advanced usage

Custom penalty factors

# Reduce the penalty on predictors believed a priori to be important (here rows 1 and 2)
p <- ncol(sim$X); q <- ncol(sim$Z)
beta.pen.factor <- matrix(1, p, q)
beta.pen.factor[c(1, 2), ] <- 0.1

fit <- missoNet(X = sim$X, Y = sim$Z,
                beta.pen.factor = beta.pen.factor)

Adaptive search (faster large runs)

fit <- missoNet(X = sim$X, Y = sim$Z,
                adaptive.search   = TRUE,
                n.lambda.beta     = 50,
                n.lambda.theta    = 50)

Documentation

vignette("missoNet-introduction")
vignette("missoNet-cross-validation")
vignette("missoNet-case-study")

If vignettes are not available from CRAN binaries on your platform, install from source using the GitHub command above with build_vignettes = TRUE.

Performance notes

Handles substantial missingness in responses, without imputation.
Warm starts and adaptive grids often yield 5–10× speedups in large problems.
Scales to p > 1,000 predictors and q > 100 responses with reasonable settings.

Actual performance will depend on sparsity, signal-to-noise, and missingness mechanisms.

When to use (and not)

Great for

Multi-trait genomic studies (eQTL, meQTL, pQTL)
High-dimensional omics with partially observed outcomes
Longitudinal studies with dropout
Network inference under incomplete responses

Not ideal for

Single-response regression (use glmnet or similar)
Extremely sparse information (e.g., >50% missing responses across most traits)
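Before fitting, it can help to check how much of each response is missing against the rough 50% guideline above. The base R sketch below summarizes per-response missingness for a matrix with NA-coded gaps. The function name `missing_summary`, its `threshold` argument, and the toy data are hypothetical.

```r
# Summarize missingness per response column and flag heavily missing ones
missing_summary <- function(Z, threshold = 0.5) {
  rate <- colMeans(is.na(Z))
  list(rate    = rate,                    # fraction missing per response
       flagged = which(rate > threshold), # responses above the threshold
       overall = mean(is.na(Z)))          # fraction missing overall
}

# Toy response matrix: 4 samples, 3 responses
Z <- matrix(c(1.2, 0.4, 2.1, 0.9,
              NA,  NA,  NA,  1.5,
              0.3, NA,  0.8, 1.1), nrow = 4)

s <- missing_summary(Z)
s$rate
s$flagged
```

If most responses are flagged, the observed data may carry too little information for joint estimation to be reliable.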

Citation

If you use missoNet in your research, please cite:

@article{zeng2025missonet,
  title   = {Multivariate regression with missing response data for modelling regional DNA methylation QTLs},
  author  = {Zeng, Yixiao and Alam, Shomoita and Bernatsky, Sasha and Hudson, Marie and Colmegna, In{\'e}s and Stephens, David A and Greenwood, Celia MT and Yang, Archer Y},
  journal = {arXiv preprint arXiv:2507.05990},
  year    = {2025},
  url     = {https://arxiv.org/abs/2507.05990}
}

Contributing

Contributions and issues are welcome! Please open a discussion or pull request on the GitHub repository.

License

GPL-2. See the LICENSE file.
