Description

Supervised Learning with Mandatory Splits and Seeds.

Implements the split-fit-evaluate-assess workflow from Hastie, Tibshirani, and Friedman (2009, ISBN:978-0-387-84857-0) "The Elements of Statistical Learning", Chapter 7. Provides three-way data splitting with automatic stratification, mandatory seeds for reproducibility, automatic data type handling, and 13 algorithms out of the box. Uses a 'Rust' backend for cross-language deterministic splitting. Designed for tabular supervised learning with minimal ceremony. Maintains polyglot parity with the 'Python' 'mlw' package on 'PyPI'.

ml

A grammar of machine learning workflows for R.

Split, fit, evaluate, assess: four verbs that encode the workflow from Hastie, Tibshirani & Friedman (The Elements of Statistical Learning, Ch. 7). The evaluate/assess boundary makes data leakage inexpressible: ml_evaluate() runs on validation data and can be called freely, whereas ml_assess() runs on held-out test data and locks after one use.

Installation

```r
# Install from GitHub (current)
remotes::install_github("epagogy/ml", subdir = "r")

# install.packages("ml")
# CRAN submission is under review; the line above will work once accepted.
```

Requires R >= 4.1.0. Optional backends: 'xgboost', 'ranger', 'glmnet', 'kknn', 'e1071', 'naivebayes', 'rpart'.

Usage

```r
library(ml)

s <- ml_split(iris, "Species", seed = 42)

model <- ml_fit(s$train, "Species", seed = 42)
ml_evaluate(model, s$valid)       # check performance, tweak, repeat

final <- ml_fit(s$dev, "Species", seed = 42)
ml_assess(final, test = s$test)   # final exam; a second call errors
```

s$dev is train + valid combined, used for the final refit before assessment. This three-way split (train 60% / valid 20% / test 20%) with a $dev convenience accessor follows the train/validation/test protocol from the textbook.
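The split described above can be approximated in base R. This is a minimal sketch, not the package's implementation: `three_way_split()` and its `props` argument are hypothetical names introduced for illustration, and the real `ml_split()` uses a Rust backend whose row assignments will differ.

```r
# Hypothetical helper, not part of 'ml': a stratified 60/20/20 split in base R.
three_way_split <- function(data, target, seed,
                            props = c(train = 0.6, valid = 0.2, test = 0.2)) {
  stopifnot(abs(sum(props) - 1) < 1e-8, target %in% names(data))
  set.seed(seed)
  part <- character(nrow(data))
  # Assign partitions within each class so class proportions are preserved.
  for (idx in split(seq_len(nrow(data)), data[[target]])) {
    idx <- idx[sample.int(length(idx))]        # shuffle the rows of this class
    n <- length(idx)
    n_train <- round(props[["train"]] * n)
    n_valid <- round(props[["valid"]] * n)
    part[idx] <- rep(c("train", "valid", "test"),
                     times = c(n_train, n_valid, n - n_train - n_valid))
  }
  out <- split(data, factor(part, levels = c("train", "valid", "test")))
  out$dev <- rbind(out$train, out$valid)       # train + valid, for the final refit
  out
}

s <- three_way_split(iris, "Species", seed = 42)
sapply(s, nrow)   # train 90, valid 30, test 30, dev 120
```

Stratifying per class is what keeps each partition's class mix equal to that of the full data; on iris every partition receives the same number of rows from each species.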

Core verbs


| Function | Purpose |
|---|---|
| `ml_split()` | Stratified three-way split → `$train`, `$valid`, `$test`, `$dev` |
| `ml_fit()` | Train a model (per-fold preprocessing, deterministic seeding) |
| `ml_evaluate()` | Validation metrics: repeat freely |
| `ml_assess()` | Test metrics: once, final, locks after use |
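The "locks after use" behaviour of `ml_assess()` can be pictured as a closure that remembers whether it has already run. The sketch below is an assumption about how such a lock might work, not the package's code; `make_assessor()` and `rmse()` are hypothetical names, and a plain `lm()` fit stands in for an `ml` model.

```r
# Hypothetical sketch: a test-set evaluator that can be called exactly once.
make_assessor <- function(model, metric) {
  used <- FALSE
  function(test) {
    if (used) {
      stop("Test set already assessed; refusing a second look.", call. = FALSE)
    }
    used <<- TRUE                       # flip the lock before computing
    metric(model, test)
  }
}

# Root mean squared error on the mtcars 'mpg' column.
rmse <- function(model, data) {
  sqrt(mean((data$mpg - predict(model, newdata = data))^2))
}

fit    <- lm(mpg ~ wt, data = mtcars[1:24, ])
assess <- make_assessor(fit, rmse)

first  <- assess(mtcars[25:32, ])       # first call returns the test RMSE
second <- tryCatch(assess(mtcars[25:32, ]),
                   error = function(e) conditionMessage(e))
second                                  # the error message, not a metric
```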

These four are the grammar. Everything else extends it:


| Function | Purpose |
|---|---|
| `ml_screen()` | Algorithm leaderboard |
| `ml_tune()` | Hyperparameter search |
| `ml_stack()` | Out-of-fold (OOF) ensemble stacking |
| `ml_predict()` | Class labels or probabilities |
| `ml_explain()` | Feature importance |
| `ml_compare()` | Side-by-side model comparison |
| `ml_validate()` | Pass/fail deployment gate |
| `ml_drift()` | Distribution shift detection (KS, chi-squared) |
| `ml_calibrate()` | Probability calibration (Platt, isotonic) |
| `ml_profile()` | Dataset summary |
| `ml_save()` / `ml_load()` | Serialize to `.mlr` |
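As an illustration of one extension verb, the two tests named for drift detection (Kolmogorov-Smirnov and chi-squared) are both available in base R's stats package. The sketch below shows the general idea only; `drift_pvalues()` is a hypothetical helper, and the interface and output of the real `ml_drift()` may differ.

```r
# Hypothetical helper, not the 'ml' API: per-column drift tests in base R.
drift_pvalues <- function(reference, current) {
  cols <- intersect(names(reference), names(current))
  vapply(cols, function(col) {
    x <- reference[[col]]
    y <- current[[col]]
    if (is.numeric(x)) {
      # Two-sample Kolmogorov-Smirnov test for numeric columns.
      suppressWarnings(stats::ks.test(x, y)$p.value)
    } else {
      # Chi-squared test of homogeneity on the category counts.
      lv <- union(as.character(x), as.character(y))
      counts <- rbind(table(factor(x, levels = lv)),
                      table(factor(y, levels = lv)))
      suppressWarnings(stats::chisq.test(counts)$p.value)
    }
  }, numeric(1))
}

set.seed(1)
reference <- data.frame(x = rnorm(500),
                        g = sample(c("a", "b"), 500, replace = TRUE))
current   <- data.frame(x = rnorm(500, mean = 1),   # mean shifted by one unit
                        g = sample(c("a", "b"), 500, replace = TRUE))

p <- drift_pvalues(reference, current)
p[["x"]] < 0.001   # the shifted numeric column is flagged
```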

Algorithms

13 algorithm families are available. Setting engine = "auto" uses the Rust backend when available; engine = "r" forces the R package backend.

| Algorithm | String | Clf | Reg | Backend |
|---|---|---|---|---|
| Logistic | `"logistic"` | Y | | nnet |
| Decision Tree | `"decision_tree"` | Y | Y | rpart |
| Random Forest | `"random_forest"` | Y | Y | ranger |
| Extra Trees | `"extra_trees"` | Y | Y | Rust |
| Gradient Boosting | `"gradient_boosting"` | Y | Y | Rust |
| XGBoost | `"xgboost"` | Y | Y | xgboost |
| Ridge | `"linear"` | | Y | glmnet |
| Elastic Net | `"elastic_net"` | | Y | glmnet |
| SVM | `"svm"` | Y | Y | e1071 |
| KNN | `"knn"` | Y | Y | kknn |
| Naive Bayes | `"naive_bayes"` | Y | | naivebayes |
| AdaBoost | `"adaboost"` | Y | | Rust |
| Hist. Gradient Boosting | `"histgradient"` | Y | Y | Rust |

Design notes

Seeds. Passing seed = NULL auto-generates a seed and stores it on the result so the run can be reproduced later. Passing a fixed value such as seed = 42 gives full deterministic control.
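The seed convention can be sketched in a few lines of base R. This assumes the seed is kept as an attribute on the result; where the package actually stores it is not stated here, and `resolve_seed()` and `shuffle_rows()` are hypothetical names.

```r
# Hypothetical sketch of the seed convention: draw a seed when none is given,
# then keep it with the result so the run can be replayed.
resolve_seed <- function(seed = NULL) {
  if (is.null(seed)) {
    seed <- sample.int(.Machine$integer.max, 1L)   # draw a fresh seed
  }
  stopifnot(is.numeric(seed), length(seed) == 1L)
  as.integer(seed)
}

shuffle_rows <- function(data, seed = NULL) {
  seed <- resolve_seed(seed)
  set.seed(seed)
  out <- data[sample.int(nrow(data)), , drop = FALSE]
  attr(out, "seed") <- seed                        # store the seed on the result
  out
}

a <- shuffle_rows(iris)                            # seed chosen automatically
b <- shuffle_rows(iris, seed = attr(a, "seed"))    # replay with the stored seed
identical(a, b)                                    # TRUE
```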

Per-fold preprocessing. Scaling and encoding fit on training folds only, never on validation or test. No information leaks across the split boundary.
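To make the leakage rule concrete, here is a minimal base-R sketch in which scaling parameters are learned from the training fold and then reused unchanged on the validation fold. `fit_scaler()` and `apply_scaler()` are hypothetical helpers, not functions exported by the package, and the odd/even row split is only for illustration.

```r
# Hypothetical sketch: learn centering and scaling on the training fold only.
fit_scaler <- function(train, cols) {
  list(cols   = cols,
       center = vapply(train[cols], mean, numeric(1)),
       scale  = vapply(train[cols], stats::sd, numeric(1)))
}

# Apply previously learned parameters; nothing is re-estimated from 'data'.
apply_scaler <- function(scaler, data) {
  scaled <- scale(as.matrix(data[scaler$cols]),
                  center = scaler$center, scale = scaler$scale)
  data[scaler$cols] <- as.data.frame(scaled)
  data
}

num   <- names(iris)[1:4]
train <- iris[seq(1, 150, by = 2), ]
valid <- iris[seq(2, 150, by = 2), ]

sc      <- fit_scaler(train, num)
train_s <- apply_scaler(sc, train)
valid_s <- apply_scaler(sc, valid)

round(colMeans(train_s[num]), 10)   # zero by construction
colMeans(valid_s[num])              # generally non-zero: validation statistics were never used
```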

Error messages. Wrong column name? ml_fit() tells you what columns exist. Wrong algorithm string? It lists the valid ones. Each error message aims to tell you how to fix the problem.
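A check in this spirit might look like the following. The function name and the wording of the message are assumptions made for illustration; they are not copied from the package.

```r
# Hypothetical sketch: an input check whose error lists the valid choices.
check_target <- function(data, target) {
  if (!target %in% names(data)) {
    stop(sprintf("Column '%s' not found. Available columns: %s",
                 target, paste(names(data), collapse = ", ")),
         call. = FALSE)
  }
  invisible(TRUE)
}

# R column names are case-sensitive, so "species" does not match "Species".
msg <- tryCatch(check_target(iris, "species"),
                error = function(e) conditionMessage(e))
msg
```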

Citation

Roth, S. (2026). A Grammar of Machine Learning Workflows.
doi:10.5281/zenodo.19023838

License

MIT. Simon Roth, 2026.
