Description

Supervised Learning with Mandatory Splits and Seeds.

Implements the split-fit-evaluate-assess workflow from Hastie, Tibshirani, and Friedman (2009, ISBN:978-0-387-84857-0) "The Elements of Statistical Learning", Chapter 7. Provides three-way data splitting with automatic stratification, mandatory seeds for reproducibility, automatic data type handling, and 10 algorithms out of the box. Uses 'Rust' backend for cross-language deterministic splitting. Designed for tabular supervised learning with minimal ceremony. Polyglot parity with the 'Python' 'mlw' package on 'PyPI'.

ml

A grammar of machine learning workflows for R.


Split, fit, evaluate, assess — four verbs that encode the workflow from Hastie, Tibshirani & Friedman (The Elements of Statistical Learning, Ch. 7). The evaluate/assess boundary makes data leakage inexpressible: ml_evaluate() runs on validation data and can be called freely; ml_assess() runs on held-out test data and locks after one use.
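The "locks after one use" rule can be pictured with a small closure. The base-R sketch below is an assumption about the mechanism, not the package's code: the hypothetical make_assessor() wraps a metric so that it can be computed on the test set exactly once, and any second call errors.

```r
# Hypothetical sketch (not ml's internals): a closure that allows a single
# test-set assessment, mirroring the rule that ml_assess() locks after use.
make_assessor <- function(metric) {
  used <- FALSE
  function(truth, predicted) {
    if (used) stop("Test set already assessed; a second look would leak information.")
    used <<- TRUE                      # flip the flag in the enclosing environment
    metric(truth, predicted)
  }
}

accuracy <- function(truth, predicted) mean(truth == predicted)
assess_once <- make_assessor(accuracy)

truth     <- c("a", "b", "b", "a")
predicted <- c("a", "b", "a", "a")
assess_once(truth, predicted)          # 0.75
second <- tryCatch(assess_once(truth, predicted), error = conditionMessage)
second                                 # the lock message, not a metric
```

Keeping the flag inside the closure means the only way to "unlock" the test set is to build a new assessor, which makes a repeated peek an explicit act rather than an accident.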

Installation

# Install from GitHub (current)
remotes::install_github("epagogy/ml", subdir = "r")

# install.packages("ml")
# CRAN submission is under review — the line above will work once accepted.

R >= 4.1.0. Optional backends: 'xgboost', 'ranger', 'glmnet', 'kknn', 'e1071', 'naivebayes', 'rpart'.
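To see which of the optional backends listed above are already installed, a generic base-R check works; this uses only requireNamespace() and is not an ml function.

```r
# Check which optional backend packages are available.
# requireNamespace() loads a namespace if it is installed but does not attach it.
backends <- c("xgboost", "ranger", "glmnet", "kknn", "e1071", "naivebayes", "rpart")
installed <- vapply(backends, requireNamespace, logical(1), quietly = TRUE)
installed

# Install whichever are missing:
# install.packages(backends[!installed])
```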

Usage

library(ml)

s <- ml_split(iris, "Species", seed = 42)

model <- ml_fit(s$train, "Species", seed = 42)
ml_evaluate(model, s$valid)       # check performance, tweak, repeat

final <- ml_fit(s$dev, "Species", seed = 42)
ml_assess(final, test = s$test)   # final exam — second call errors

s$dev is train + valid combined and is used for the final refit before assessment. This three-way split (60% train / 20% validation / 20% test), together with the $dev convenience accessor, follows the textbook's train/validate/test protocol.
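For intuition, here is a base-R sketch of a stratified 60/20/20 split with a $dev element. It is an illustration only: stratified_split() is a hypothetical helper, and because ml_split() uses a Rust backend for cross-language determinism, its row assignment will differ from what set.seed() produces here.

```r
# Hypothetical helper (not ml_split()): stratified three-way split in base R.
stratified_split <- function(data, target, seed, props = c(0.6, 0.2, 0.2)) {
  set.seed(seed)                                   # note: changes the global RNG state
  part <- character(nrow(data))
  # Split row indices by class so each part keeps the class proportions
  for (idx in split(seq_len(nrow(data)), data[[target]])) {
    idx <- idx[sample.int(length(idx))]            # shuffle within the class
    n_train <- round(props[1] * length(idx))
    n_valid <- round(props[2] * length(idx))
    part[idx] <- rep(c("train", "valid", "test"),
                     c(n_train, n_valid, length(idx) - n_train - n_valid))
  }
  s <- list(train = data[part == "train", ],
            valid = data[part == "valid", ],
            test  = data[part == "test", ])
  s$dev <- rbind(s$train, s$valid)                 # train + valid, for the final refit
  s
}

s <- stratified_split(iris, "Species", seed = 42)
sapply(s, nrow)   # train 90, valid 30, test 30, dev 120
```

Shuffling inside each class, rather than across the whole table, is what keeps every part at one third setosa, versicolor, and virginica.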

Core verbs

ml_split()      Stratified three-way split → $train, $valid, $test, $dev
ml_fit()        Train a model (per-fold preprocessing, deterministic seeding)
ml_evaluate()   Validation metrics; repeat freely
ml_assess()     Test metrics; once, final, locks after use

These four are the grammar. Everything else extends it:

ml_screen()             Algorithm leaderboard
ml_tune()               Hyperparameter search
ml_stack()              OOF ensemble stacking
ml_predict()            Class labels or probabilities
ml_explain()            Feature importance
ml_compare()            Side-by-side model comparison
ml_validate()           Pass/fail deployment gate
ml_drift()              Distribution shift detection (KS, chi-squared)
ml_calibrate()          Probability calibration (Platt, isotonic)
ml_profile()            Dataset summary
ml_save() / ml_load()   Serialize to .mlr
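The drift tests named for ml_drift() are standard ones available in base R. The sketch below is not ml_drift() itself; drift_check() is a hypothetical helper that applies a two-sample Kolmogorov-Smirnov test to numeric columns and a chi-squared test on category counts to the rest, flagging columns whose p-value falls below alpha.

```r
# Hypothetical helper (not ml_drift()): per-column drift checks in base R.
drift_check <- function(reference, current, alpha = 0.05) {
  cols <- intersect(names(reference), names(current))
  res <- lapply(cols, function(col) {
    x <- reference[[col]]
    y <- current[[col]]
    if (is.numeric(x)) {
      test <- "KS"
      p <- suppressWarnings(ks.test(x, y)$p.value)       # ties would trigger a warning
    } else {
      test <- "chi-squared"
      lv <- union(as.character(x), as.character(y))
      counts <- rbind(table(factor(x, levels = lv)),     # 2 x k table of category counts
                      table(factor(y, levels = lv)))
      p <- suppressWarnings(chisq.test(counts)$p.value)  # small counts would warn
    }
    data.frame(column = col, test = test, p_value = p, drift = p < alpha)
  })
  do.call(rbind, res)
}

set.seed(1)
ref     <- data.frame(x = rnorm(500), g = sample(c("a", "b"), 500, replace = TRUE))
current <- data.frame(x = rnorm(500, mean = 1), g = sample(c("a", "b"), 500, replace = TRUE))
drift_check(ref, current)   # x is shifted by one standard deviation and should be flagged
```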

Algorithms

13 families. engine = "auto" uses the Rust backend when available; engine = "r" forces the R package backend.

Algorithm                String               Classification  Regression  Backend
Logistic                 "logistic"           Y               -           nnet
Decision Tree            "decision_tree"      Y               Y           rpart
Random Forest            "random_forest"      Y               Y           ranger
Extra Trees              "extra_trees"        Y               Y           Rust
Gradient Boosting        "gradient_boosting"  Y               Y           Rust
XGBoost                  "xgboost"            Y               Y           xgboost
Ridge                    "linear"             -               Y           glmnet
Elastic Net              "elastic_net"        -               Y           glmnet
SVM                      "svm"                Y               Y           e1071
KNN                      "knn"                Y               Y           kknn
Naive Bayes              "naive_bayes"        Y               -           naivebayes
AdaBoost                 "adaboost"           Y               -           Rust
Hist. Gradient Boosting  "histgradient"       Y               Y           Rust

Design notes

Seeds. seed = NULL auto-generates a seed and stores it on the result, so the run can be reproduced later; seed = 42 gives full deterministic control.
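The auto-seed behaviour can be sketched in a few lines of base R. resolve_seed() and fit_with_seed() are hypothetical stand-ins, and runif(1) takes the place of a fitted model: when no seed is given one is drawn, used, and kept on the result so the exact run can be repeated.

```r
# Hypothetical sketch of "seed = NULL auto-generates a seed and stores it".
resolve_seed <- function(seed = NULL) {
  if (is.null(seed)) seed <- sample.int(.Machine$integer.max, 1)  # draw a fresh seed
  as.integer(seed)
}

fit_with_seed <- function(seed = NULL) {
  seed <- resolve_seed(seed)
  set.seed(seed)
  list(value = runif(1), seed = seed)   # stand-in for a fitted model plus its seed
}

r1 <- fit_with_seed()                   # random seed, recorded in r1$seed
r2 <- fit_with_seed(r1$seed)            # rerun with the recorded seed
identical(r1$value, r2$value)           # TRUE
```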

Per-fold preprocessing. Scaling and encoding are fitted on the training rows only and then applied unchanged to validation and test data, so no information leaks across the split boundary.
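A base-R sketch of the idea, assuming standardization as the preprocessing step: the centring and scaling statistics come from the training rows alone and are reused as-is on the validation rows. fit_scaler() and apply_scaler() are hypothetical helpers, not the ml package's API.

```r
# Hypothetical helpers: leakage-free standardization of numeric columns.
fit_scaler <- function(train) {
  num <- vapply(train, is.numeric, logical(1))
  list(cols   = names(train)[num],
       center = vapply(train[num], mean, numeric(1)),
       scale  = vapply(train[num], sd, numeric(1)))
}

apply_scaler <- function(scaler, data) {
  for (col in scaler$cols) {
    data[[col]] <- (data[[col]] - scaler$center[[col]]) / scaler$scale[[col]]
  }
  data
}

set.seed(42)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
valid <- iris[-idx, ]

scaler  <- fit_scaler(train)             # statistics from training rows only
train_s <- apply_scaler(scaler, train)
valid_s <- apply_scaler(scaler, valid)   # validation rows never influence the scaler
colMeans(valid_s[scaler$cols])           # not forced to 0: uses the training statistics
```

Fitting the scaler on all 150 rows first would let validation values shape the means and standard deviations, which is exactly the leak this design rules out.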

Error messages. Wrong column name? ml_fit() tells you which columns exist. Wrong algorithm string? It lists the valid ones. Errors aim to tell you how to fix the problem.
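The style of message described above can be sketched in base R. check_target() is a hypothetical helper, not ml's code: when the target column is missing it lists the columns that do exist and, for a case-only mismatch, suggests the likely intended name.

```r
# Hypothetical sketch of a self-explaining error for a missing target column.
check_target <- function(data, target) {
  if (target %in% names(data)) return(invisible(TRUE))
  msg <- sprintf("Column '%s' not found. Available columns: %s",
                 target, paste(names(data), collapse = ", "))
  # Case-insensitive near miss, e.g. "species" vs "Species"
  hint <- names(data)[tolower(names(data)) == tolower(target)]
  if (length(hint) > 0) msg <- paste0(msg, sprintf(". Did you mean '%s'?", hint[1]))
  stop(msg, call. = FALSE)
}

msg <- tryCatch(check_target(iris, "species"), error = conditionMessage)
msg
# "Column 'species' not found. Available columns: Sepal.Length, Sepal.Width,
#  Petal.Length, Petal.Width, Species. Did you mean 'Species'?"
```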

Citation

Roth, S. (2026). A Grammar of Machine Learning Workflows.
doi:10.5281/zenodo.19023838

License

MIT. Simon Roth, 2026.

Metadata

Version

0.1.2

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows