MyNixOS website logo
Description

Weighted Empirical Risk Minimization for Changepoint Regression.

R interface to the 'weightederm' package for 'Python', which provides 'scikit-learn'-style estimators for offline change point regression (data segmentation) via weighted empirical risk minimization. Supports least-squares, Huber, and logistic losses with fixed or cross-validated numbers of change points. Wraps 'Python' via 'reticulate'. Arpino and Venkataramanan (2026) <doi:10.48550/arXiv.2604.11746>.

𐄷 Weighted ERM - fast & accurate change point regression in R


weightederm

arXiv DOI
CRAN status R-CMD-check

R interface to the weightederm Python package — changepoint regression via Weighted Empirical Risk Minimization (WERM).

Overview

weightederm detects changepoints in ordered regression data by minimizing a Weighted Empirical Risk (WERM). The method is described in:

Inferring Change Points in Regression via Sample Weighting by Gabriel Arpino and Ramji Venkataramanan.

Six estimators are available:

LossFixed num_chgptsUnknown num_chgpts via CV
Least squareswerm_least_squares()werm_least_squares_cv()
Huberwerm_huber()werm_huber_cv()
Logisticwerm_logistic()werm_logistic_cv()

Prerequisites

This package calls Python via reticulate. You need:

  1. Python ≥ 3.9 — install from https://www.python.org/ or via conda/pyenv.
  2. The weightederm Python package:
pip install git+https://github.com/gabrielarpino/weightederm.git

Installation

From CRAN

install.packages("weightederm")

Development version from GitHub

# install.packages("pak")
pak::pak("gabrielarpino/weightederm-r")

Or with remotes:

# install.packages("remotes")
remotes::install_github("gabrielarpino/weightederm-r")

Pointing to your Python environment

By default, reticulate auto-discovers Python. If it picks the wrong one, call weightederm_configure_python()before your first model fit:

# Option 1 — point to a specific Python binary
weightederm_configure_python(python = "/path/to/your/venv/bin/python")

# Option 2 — point to the Python binary inside a conda environment
weightederm_configure_python("/path/to/miniconda/envs/my_conda_env/bin/python")

You can also set the RETICULATE_PYTHON environment variable in your .Renviron file to avoid calling this every session:

RETICULATE_PYTHON=/path/to/your/venv/bin/python

Quick start

M1 — fixed number of changepoints, sparse linear regression

library(weightederm)
set.seed(0)

n <- 120L; p <- 20L; true_cp <- 60L
X <- matrix(rnorm(n * p), n, p)

beta_left  <- rep(0, p); beta_left[c(1, 4)]  <- c( 2.0, -1.5)
beta_right <- rep(0, p); beta_right[c(1, 4)] <- c(-1.0,  2.5)

y <- numeric(n)
y[1:true_cp]          <- X[1:true_cp, ]          %*% beta_left  + rnorm(true_cp, sd = 0.2)
y[(true_cp+1):n]      <- X[(true_cp+1):n, ]      %*% beta_right + rnorm(n - true_cp, sd = 0.2)

fit <- werm_least_squares(X, y, num_chgpts = 1L, delta = 5L,
                          search_method = "efficient", fit_intercept = FALSE)
fit
#> WERM Changepoint Estimator (werm_least_squares)
#>   Changepoints (1-indexed): 60
#>   num_chgpts: 1 | num_signals: 2 | n_features_in: 20
#>   Objective: ...

M2 — unknown number of changepoints via cross-validation

set.seed(1)
n <- 180L; p <- 10L
X <- matrix(rnorm(n * p), n, p)

beta_1 <- rep(0, p); beta_1[1]      <- 3.5
beta_2 <- rep(0, p); beta_2[1:2]    <- c(-3.0, 3.0)
beta_3 <- rep(0, p); beta_3[1:3]    <- c(2.5, -2.5, 2.5)

y <- numeric(n)
y[1:60]    <- X[1:60, ]    %*% beta_1 + rnorm(60, sd = 0.05)
y[61:120]  <- X[61:120, ]  %*% beta_2 + rnorm(60, sd = 0.05)
y[121:180] <- X[121:180, ] %*% beta_3 + rnorm(60, sd = 0.05)

fit <- werm_least_squares_cv(X, y, max_num_chgpts = 2L, delta = 5L, cv = 3L,
                              search_method = "efficient", fit_intercept = FALSE)
fit$best_num_chgpts  # 2
fit$changepoints     # near c(60, 120)
fit$cv_results       # data.frame of num_chgpts vs mean_test_score

M3 — binary classification with a structural break

set.seed(2)
n <- 160L; p <- 12L; true_cp <- 80L
X <- matrix(rnorm(n * p), n, p)

beta_left  <- rep(0, p); beta_left[c(1, 3)]  <- c( 2.5, -2.0)
beta_right <- rep(0, p); beta_right[c(1, 3)] <- c(-2.5,  2.0)

eta <- numeric(n)
eta[1:true_cp]       <- X[1:true_cp, ]       %*% beta_left
eta[(true_cp+1):n]   <- X[(true_cp+1):n, ]   %*% beta_right
y <- rbinom(n, 1L, 1 / (1 + exp(-eta)))

fit <- werm_logistic(X, y, num_chgpts = 1L, delta = 5L,
                     search_method = "efficient", fit_intercept = FALSE,
                     max_iter = 300L)
fit$changepoints   # near 80
fit$classes        # c("0", "1")

# Predict probabilities on new data
X_new <- matrix(rnorm(10 * p), 10, p)
predict(fit, X_new, type = "prob")   # 10 × 2 matrix
predict(fit, X_new, type = "class")  # character vector of labels

Key arguments

ArgumentDescriptionDefault
num_chgptsNumber of changepoints (fixed estimators)required
max_num_chgptsUpper bound for CV search (CV estimators)required
deltaMinimum gap between changepoints during search1L
search_method"efficient" (greedy + local refinement) or "brute_force""efficient"
fit_interceptInclude per-segment interceptTRUE
cvNumber of interleaved CV folds (CV estimators)5L
penalty"none", "l1", or "l2""none" ("l2" for logistic)
alphaPenalty strength0.0
epsilonHuber transition parameter1.35
max_iterOptimizer iterations (Huber / logistic)100L

Fitted object

Every werm_*() call returns a named list with class c("werm_<type>", "werm_fit"):

ElementDescription
changepointsInteger vector, 1-indexed (R convention)
num_chgptsNumber of detected changepoints
num_signalsnum_chgpts + 1
objectiveMinimised WERM value
signal_coefs(num_signals × p) Stage-1 WERM coefficient matrix
signal_interceptsNumeric vector or NULL
last_segment_coefCoefficients used by predict()
last_segment_interceptNumeric or NULL
n_features_inNumber of input features
classes(logistic only) Character vector of length 2

CV estimators additionally expose best_num_chgpts, best_score, cv_results (a data frame), num_chgpts_grid, segment_bounds, segment_coefs, and segment_intercepts.

S3 methods

print(fit)           # compact summary
summary(fit)         # print + coefficient table
coef(fit)            # last_segment_coef vector
predict(fit, X_new)  # numeric predictions (regression) or class labels / probabilities (logistic)

Related resources

  • User Guide — algorithm overview, fixed vs CV, penalties, prediction, S3 methods (with tested R examples)
  • weightederm Python package — Python API, parameter reference, benchmark notebook

License

Apache License 2.0 — the same license as the underlying Python package.

Metadata

Version

0.1.0

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows