Description

Comprehensive Regime Change Detection in Time Series.

A unified framework for detecting regime changes (changepoints) in time series data. Implements frequentist methods including Cumulative Sum (CUSUM; Page (1954) <doi:10.1093/biomet/41.1-2.100>), Pruned Exact Linear Time (PELT; Killick, Fearnhead, and Eckley (2012) <doi:10.1080/01621459.2012.737745>), Binary Segmentation, and Wild Binary Segmentation, as well as Bayesian methods such as Bayesian Online Changepoint Detection (BOCPD; Adams and MacKay (2007) <doi:10.48550/arXiv.0710.3742>) and Shiryaev-Roberts. Supports offline analysis for retrospective detection and online monitoring for real-time surveillance. Provides rigorous uncertainty quantification through confidence intervals and posterior distributions. Handles univariate and multivariate series, detecting changes in mean, variance, trend, and distributional properties.
knitr::opts_chunk$set(
  echo = TRUE,
  eval = FALSE,
  message = FALSE,
  warning = FALSE,
  fig.align = "center"
)

\newpage

Introduction

RegimeChange is an R package for detecting structural breaks and regime changes in time series data. It provides a unified interface to both frequentist and Bayesian methods, with particular emphasis on robust performance in challenging scenarios including low signal-to-noise ratios, subtle variance changes, autocorrelated data, and heavy-tailed distributions.

Philosophical Foundations

The Ontological Problem

Regime change detection addresses a fundamental philosophical question:

How do we distinguish a real change from the inherent variability of the world?

Every observable system exhibits fluctuations. The central question is: is this observed fluctuation "noise" (random variation within the same regime) or "signal" (evidence that the system transitioned to a qualitatively different state)?

Conceptual Structure

Duality of States

We assume the world can be described through "regimes" or "states" characterized by parameters $\theta$. A system is in state $\theta_0$ until, at some unknown moment, it transitions to $\theta_1$. This is a discrete ontology of change: the transition is not gradual; it is a rupture.

Imperfect Observation

We do not observe $\theta$ directly, but noisy manifestations $y_i$. The epistemology here is probabilistic: each observation is a "clue" about the true state, but none is conclusive by itself.

Evidence Accumulation

The statistic $S = \sum s_i$ embodies a fundamental epistemological principle: evidence accumulates. Each observation contributes a "vote" in favor of $H_0$ or $H_1$, and the sum represents the total weight of evidence.

The Fundamental Dilemma

There exists an irresolvable epistemological tension between two types of error:

| Error | Technical Name | Consequence |
|---|---|---|
| Detecting change where there is none | False alarm (Type I) | Acting unnecessarily |
| Not detecting a real change | Delayed detection (Type II) | Failing to react in time |

The decision threshold represents precisely this negotiation: how much evidence do we demand before declaring that the world has changed?

Universality of the Problem

This structure appears across diverse domains:

| Domain | $\theta_0$ | $\theta_1$ | Question |
|---|---|---|---|
| Economics | Expansion | Recession | When did the cycle change? |
| Medicine | Stable patient | Deterioration | When did the crisis begin? |
| Finance | Stable market | Bubble/crash | When did the regime change? |
| Ecology | Balanced ecosystem | Disturbed | When did the collapse occur? |
| Manufacturing | Process under control | Defective | When did it go out of adjustment? |
| Climatology | Stationary climate | Climate change | When did the trend begin? |
| Epidemiology | Endemic | Outbreak | When did the epidemic start? |
| Electrical engineering | Normal operation | Failure/transient | When did the event occur? |

Information and Surprise

The likelihood ratio:

$$s_i = \ln \left( \frac{p_{\theta_1}(y_i)}{p_{\theta_0}(y_i)} \right)$$

has a deep informational interpretation: it measures how surprising observation $y_i$ is under each hypothesis. If $s_i > 0$, the observation is "more expected" under $\theta_1$ than under $\theta_0$. It is, literally, a quantification of relative surprise.
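
The log-likelihood ratio and its running sum can be computed directly. A minimal sketch, assuming a known pre-change $N(0,1)$ and post-change $N(2,1)$ (illustration only, not part of the package API):

```r
# Per-observation log-likelihood ratio s_i for H0: N(0, 1) vs H1: N(2, 1),
# and its cumulative sum S (illustrative sketch, not the package API).
log_lr <- function(y, mu0 = 0, mu1 = 2, sigma = 1) {
  dnorm(y, mu1, sigma, log = TRUE) - dnorm(y, mu0, sigma, log = TRUE)
}

set.seed(1)
y <- c(rnorm(100, 0, 1), rnorm(100, 2, 1))  # change at t = 100
s <- log_lr(y)   # s_i > 0: observation more expected under H1
S <- cumsum(s)   # drifts downward before the change, upward after it
```

Note that $s_i = 0$ exactly at $y_i = 1$, the midpoint between the two hypothesized means: such an observation is equally surprising under both regimes.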

\newpage

Motivation

Changepoint detection is a fundamental problem in time series analysis with applications in finance, quality control, climatology, and genomics. While several R packages address this problem, practitioners frequently encounter difficulties with:

  • Autocorrelated data: Most methods assume independence, leading to inflated false positive rates
  • Subtle changes: Low SNR or small variance shifts often go undetected
  • Outliers and heavy tails: Standard Gaussian assumptions break down with contaminated data
  • Method selection: No single algorithm dominates across all scenarios

RegimeChange addresses these challenges through adaptive variance floors for numerical stability, optional pre-whitening for autocorrelated series, robust M-estimation for heavy-tailed data, and ensemble methods that leverage multiple algorithms.

Competitive Positioning

Current State of the Art

| Package | Language | Strengths | Limitations |
|---|---|---|---|
| changepoint | R (C backend) | Optimized PELT, stable, mature | Frequentist only, no online mode, limited uncertainty |
| tidychangepoint | R | Tidyverse interface, unifies methods | Wrapper, adds no new algorithms |
| ruptures | Python | Very complete for offline | Not Bayesian, not online |
| bayesian_changepoint_detection | Python | BOCPD with PyTorch | Bayesian only, doesn't integrate other methods |
| Kats (Meta) | Python | BOCPD, well documented | Meta ecosystem, heavy dependencies |

Gaps Addressed

RegimeChange provides:

  • Integration of frequentist AND Bayesian methods
  • Online mode with streaming updates
  • Rigorous uncertainty quantification
  • Multivariate series support
  • Detection of multiple types of change
  • High-performance backend (Julia) with friendly API (R)
  • Deep learning for complex patterns (optional)

Value Proposition

  1. Unified paradigms: frequentist, Bayesian, and deep learning in a coherent API
  2. Both modes: offline for research, online for production
  3. Native uncertainty: estimations come with confidence intervals
  4. Performance + Usability: optional Julia backend, tidyverse-compatible R interface

\newpage

Installation

# Install from GitHub
# install.packages("devtools")
devtools::install_github("IsadoreNabi/RegimeChange")

Quick Start

library(RegimeChange)

# Generate data with a changepoint at t=200
set.seed(42)
data <- c(rnorm(200, 0, 1), rnorm(200, 2, 1))

# Detect using PELT
result <- detect_regimes(data, method = "pelt")
print(result)

# For autocorrelated data, enable AR correction
result_ar <- detect_regimes(data, method = "pelt", correct_ar = TRUE)

# For data with outliers or heavy tails, enable robust estimation
result_robust <- detect_regimes(data, method = "pelt", robust = TRUE)

\newpage

Taxonomy of Changes

Types of Detectable Changes

The library detects changes in different statistical properties:

  • Location: Mean, Trend, Level
  • Dispersion: Variance, Range
  • Dependence: Autocorrelation, Cross-correlation, Structure
  • Distribution: Shape, Tails, Mode

Formal Specification

| Type | Parameter | Hypothesis | Example |
|---|---|---|---|
| Mean | $\mu$ | $H_0: \mu = \mu_0$ vs $H_1: \mu = \mu_1$ | Level change in GDP |
| Variance | $\sigma^2$ | $H_0: \sigma^2 = \sigma_0^2$ vs $H_1: \sigma^2 = \sigma_1^2$ | Volatility increase |
| Mean and variance | $(\mu, \sigma^2)$ | Joint change | Economic regime transition |
| Trend | $\beta$ (slope) | $H_0: \beta = 0$ vs $H_1: \beta \neq 0$ | Start of inflationary trend |
| Autocorrelation | $\rho$ | $H_0: \rho = \rho_0$ vs $H_1: \rho = \rho_1$ | Change in persistence |
| Distribution | $F$ | $H_0: F = F_0$ vs $H_1: F \neq F_0$ | General non-parametric change |

Simple vs Multiple Changes

Simple change (one point):

A single changepoint $\tau$ divides the series into Regime A and Regime B.

Multiple changes (k points):

Multiple changepoints $\tau_1, \tau_2, \ldots, \tau_k$ divide the series into $k+1$ regimes.

Detection Configuration

# The user can specify what to look for
detect_regimes(
  data,
  type = c("mean", "variance", "both", "trend", "distribution"),
  n_changepoints = c("single", "multiple", "unknown"),
  min_segment_length = 30  # Minimum observations per regime
)

\newpage

Algorithmic Architecture

Method Families

The detection methods are organized into three main families:

  1. Frequentist: PELT, BinSeg, CUSUM, Wild BinSeg, FPOP, E-Divisive, Kernel CPD
  2. Bayesian: BOCPD (Adams-MacKay), Shiryaev-Roberts
  3. Deep Learning: Autoencoders, TCN, Transformer, CPC, Ensemble

Frequentist Methods

CUSUM (Cumulative Sum)

The foundational method. Accumulates deviations from a target value.

Statistic: $$S_n = \sum_{i=1}^{n} (x_i - \mu_0)$$

Decision: Alarm when $|S_n| > h$

Complexity: $O(n)$

Use: Mean change detection, single changepoint.

detect_regimes(data, method = "cusum", threshold = 4)
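
For a single mean change, the location maximizing the absolute centered CUSUM path is the classical changepoint estimator. A minimal sketch of that idea (the calibrated, packaged version is detect_regimes(method = "cusum") above):

```r
# Centered CUSUM path and its argmax as a single-changepoint estimator
# (illustrative sketch, not the package implementation).
set.seed(7)
x <- c(rnorm(100, 0), rnorm(100, 1))   # mean shift at t = 100
S <- cumsum(x - mean(x))               # centered cumulative sum
tau_hat <- which.max(abs(S))           # estimated changepoint, near 100
```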

PELT (Pruned Exact Linear Time)

The gold standard for multiple changepoints.

Idea: Dynamic programming with pruning to find optimal segmentation.

Objective function: $$\sum_{i=1}^{m+1} \left[ \mathcal{C}(y_{(\tau_{i-1}+1):\tau_i}) \right] + \beta m$$

where $\mathcal{C}$ is segment cost and $\beta$ is the penalty for number of changes.

Complexity: $O(n)$ in favorable cases, $O(n^2)$ worst case.

Supported penalties: AIC, BIC, MBIC, MDL, manual.

# PELT (default) - O(n) optimal partitioning
detect_regimes(data, method = "pelt", penalty = "MBIC")

# With AR correction for autocorrelated data
detect_regimes(data, method = "pelt", correct_ar = TRUE)

# With robust estimation for outliers/heavy tails
detect_regimes(data, method = "pelt", robust = TRUE)

Binary Segmentation

Recursive divide and conquer.

Algorithm:

  1. Find the best changepoint in the entire dataset
  2. If significant, split into two segments
  3. Apply recursively to each segment

Complexity: $O(n \log n)$

Limitation: Does not guarantee global optimum (greedy).

detect_regimes(data, method = "binseg", n_changepoints = 3)

Wild Binary Segmentation (WBS)

Improvement on Binary Segmentation with random intervals.

Idea: Instead of searching the entire interval, search M random intervals and take the best.

Advantage: More robust to nearby changes.

detect_regimes(data, method = "wbs", n_intervals = 100)

FPOP (Functional Pruning Optimal Partitioning)

Optimized version of PELT with functional pruning.

Complexity: $O(n)$ guaranteed for certain cost functions.

fpop_detect(data, penalty = "BIC")

E-Divisive

Non-parametric method using energy distance.

Idea: Use energy distance to detect changes in complete distribution.

Advantage: No parametric distribution assumption.

edivisive_detect(data, min_size = 30)

Kernel-based CPD

Idea: Map data to Hilbert space and detect changes in that space.

Advantage: Captures changes in complex data structure.

kernel_cpd_detect(data, kernel = "rbf")

Bayesian Methods

BOCPD (Bayesian Online Changepoint Detection)

Reference: Adams & MacKay (2007)

Central idea: Maintain a posterior distribution over the "run length" (time since last change).

Joint distribution: $$P(r_t, x_{1:t}) = \sum_{r_{t-1}} P(x_t | r_t, x^{(r)}) \cdot P(r_t | r_{t-1}) \cdot P(r_{t-1}, x_{1:t-1})$$

Components:

  • Predictive: $P(x_t | r_t, x^{(r)})$ — probability of datum given run length
  • Changepoint prior: $P(r_t | r_{t-1})$ — prior on change occurrence (typically geometric)
  • Message: $P(r_{t-1}, x_{1:t-1})$ — recursion from previous step

Supported conjugate models:

  • Normal with known variance (Normal-Normal)
  • Normal with unknown mean and variance (Normal-Gamma)
  • Poisson (Gamma-Poisson)
  • Multivariate (Normal-Wishart)

Complexity per observation: $O(t)$ naïve, $O(1)$ with truncation.

# Bayesian Online Changepoint Detection
detect_regimes(data, method = "bocpd")

# Custom prior specification
prior <- normal_gamma(mu0 = 0, kappa0 = 0.1, alpha0 = 1, beta0 = 1)
detect_regimes(data, method = "bocpd", prior = prior)
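
The recursion above can be written out compactly for the simplest conjugate case. A hypothetical sketch for Normal observations with known variance and a constant hazard $H$ (for exposition only; the packaged bocpd implementation handles the full set of conjugate models):

```r
# Minimal BOCPD run-length recursion (Normal data, known variance sigma2,
# constant hazard H). Sketch of the Adams-MacKay update, not the package code.
bocpd_sketch <- function(x, mu0 = 0, tau0 = 1, sigma2 = 1, H = 1 / 100) {
  n <- length(x)
  R <- matrix(0, n + 1, n + 1)   # R[t + 1, r + 1] = P(r_t = r | x_{1:t})
  R[1, 1] <- 1
  mu <- mu0; tau <- tau0         # posterior mean/variance per run length
  for (t in 1:n) {
    pred <- dnorm(x[t], mu, sqrt(sigma2 + tau))   # predictive per run length
    growth <- R[t, 1:t] * pred * (1 - H)          # run continues: r -> r + 1
    cp <- sum(R[t, 1:t] * pred * H)               # changepoint: r -> 0
    R[t + 1, 1:(t + 1)] <- c(cp, growth) / (cp + sum(growth))
    tau_new <- 1 / (1 / tau + 1 / sigma2)         # conjugate Normal update
    mu <- c(mu0, tau_new * (mu / tau + x[t] / sigma2))
    tau <- c(tau0, tau_new)
  }
  R
}

set.seed(6)
x <- c(rnorm(60, 0), rnorm(60, 3))
R <- bocpd_sketch(x)
which.max(R[121, ]) - 1   # MAP run length at t = 120: close to 60
```

The final row of `R` concentrates near run length 60, i.e. the posterior "remembers" that the last change happened about 60 observations ago.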

Shiryaev-Roberts

Statistic: $$R_n = \sum_{k=1}^{n} \prod_{i=k}^{n} \frac{p_{\theta_1}(x_i)}{p_{\theta_0}(x_i)}$$

Interpretation: Sum of likelihood ratios from all possible past changepoints.

Advantage: Asymptotically optimal for minimizing detection delay.

shiryaev_roberts(data, threshold = 100)
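
The statistic admits the recursion $R_n = (1 + R_{n-1}) \Lambda_n$ with $R_0 = 0$, where $\Lambda_n$ is the per-observation likelihood ratio. A minimal sketch for a known Normal pre/post-change pair (hypothetical helper; shiryaev_roberts() above is the packaged version):

```r
# Shiryaev-Roberts via its recursion R_n = (1 + R_{n-1}) * Lambda_n
# (illustrative sketch, not the package implementation).
sr_alarm <- function(x, mu0 = 0, mu1 = 2, sigma = 1, threshold = 1e5) {
  R <- 0
  for (n in seq_along(x)) {
    lambda <- dnorm(x[n], mu1, sigma) / dnorm(x[n], mu0, sigma)
    R <- (1 + R) * lambda
    if (R > threshold) return(n)   # alarm time
  }
  NA_integer_                      # no alarm raised
}

set.seed(2)
x <- c(rnorm(100, 0, 1), rnorm(100, 2, 1))   # change at t = 100
sr_alarm(x)   # alarms a few observations after t = 100
```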

Deep Learning Methods

Note: These methods require optional dependencies (keras, tensorflow).

Autoencoders for Detection

Idea: Train autoencoder on "normal" data, detect change when reconstruction error increases.

autoencoder_detect(data, window_size = 50, threshold = 2.0)

TCN (Temporal Convolutional Network)

Convolutional architecture for sequence modeling.

tcn_detect(data, window_size = 100)

Transformer-based Detection

Attention-based architecture for capturing long-range dependencies.

transformer_detect(data, window_size = 100)

CPC (Contrastive Predictive Coding)

Self-supervised representation learning for change detection.

cpc_detect(data, window_size = 50)

Ensemble Deep Learning

Combine multiple deep learning detectors.

ensemble_dl_detect(data, methods = c("autoencoder", "tcn"))

\newpage

Operation Modes

Offline Mode

Context: Retrospective analysis. Complete dataset is available.

Question: "When did the changes occur?"

Characteristics:

  • Can look forward and backward
  • Seeks globally optimal segmentation
  • No temporal constraint

Appropriate algorithms: PELT, Binary Segmentation, WBS, FPOP, E-Divisive

Use cases:

  • Historical analysis of economic series
  • Study of past climate changes
  • Scientific research in general

Online Mode

Context: Real-time surveillance. Data arrives sequentially.

Question: "Is a change occurring now?"

Characteristics:

  • Only sees past and present
  • Must decide immediately
  • Trade-off speed vs false alarms

Appropriate algorithms: CUSUM, BOCPD, Shiryaev-Roberts

Key metrics:

  • ARL$_0$: Average Run Length under $H_0$ (average time to false alarm)
  • ARL$_1$: Average Run Length under $H_1$ (average detection delay)
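
ARL$_0$ can be estimated by Monte Carlo simulation: run the detector on data generated under $H_0$ and record the first alarm time. A sketch for a one-sided CUSUM with reference value $k$ (names and defaults here are assumptions for illustration, not the package API):

```r
# Monte Carlo estimate of ARL_0 for a one-sided CUSUM W_n = max(0, W_{n-1} + x_n - k)
# (illustrative sketch; not part of the package API).
cusum_run_length <- function(h, k = 0.5, n_max = 5000) {
  W <- 0
  for (n in 1:n_max) {
    W <- max(0, W + rnorm(1) - k)   # one-sided CUSUM recursion under H0
    if (W > h) return(n)            # first alarm = run length
  }
  n_max                             # censored at n_max
}

arl0 <- function(h, reps = 200) mean(replicate(reps, cusum_run_length(h)))

set.seed(3)
c(arl0(2), arl0(4))   # higher threshold -> longer average time to false alarm
```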

Use cases:

  • Industrial monitoring
  • Epidemiological surveillance
  • Fraud detection
  • Algorithmic trading

Online Detection API

# Initialize detector
detector <- regime_detector(method = "bocpd")

# Process streaming data
for (x in new_observations) {
  result <- update(detector, x)
  if (result$alarm) {
    message("Changepoint detected at time ", result$time)
    detector <- reset(detector)
  }
}

Unified Interface

# Offline mode (default)
result <- detect_regimes(data, mode = "offline", method = "pelt")

# Online mode via regime_detector
detector <- regime_detector(method = "bocpd", prior = normal_gamma())

for (x in data_stream) {
  result <- update(detector, x)
  
  if (result$alarm) {
    handle_regime_change(result)
  }
}

\newpage

Uncertainty Quantification

Fundamental Principle

Every changepoint estimate must be accompanied by an uncertainty measure.

This is what distinguishes a scientifically rigorous library from a purely practical tool.

Types of Uncertainty

Location Uncertainty

How confident are we that the change occurred exactly at $\hat{\tau}$?

Representation:

  • Confidence interval: $[\hat{\tau} - \delta, \hat{\tau} + \delta]$
  • Complete posterior distribution: $P(\tau | \text{data})$

Existence Uncertainty

How confident are we that there is a change at all?

Representation:

  • p-value (frequentist)
  • Bayes Factor (Bayesian)
  • Posterior probability of change

Number of Changes Uncertainty

How many changes are there really?

Representation:

  • Distribution over number of changes
  • Optimal penalty (BIC, MDL)

Implementation

Bootstrap for Frequentist Methods

bootstrap_changepoint <- function(data, method, B = 1000) {
  
  estimates <- numeric(B)
  
  for (b in 1:B) {
    # Resample blocks to preserve dependence
    data_boot <- block_bootstrap(data)
    estimates[b] <- detect(data_boot, method)$changepoint
  }
  
  # Confidence interval
  ci <- quantile(estimates, c(0.025, 0.975))
  
  return(list(
    estimate = detect(data, method)$changepoint,
    ci = ci,
    se = sd(estimates),
    distribution = estimates
  ))
}

Complete Posterior for Bayesian Methods

BOCPD naturally produces $P(r_t | x_{1:t})$, from which one can derive:

# Probability of change at each point
prob_change <- bocpd_result$posterior[, 1]  # run length = 0

# Most probable changepoint
map_changepoint <- which.max(prob_change)

# Credible interval
credible_interval <- hpd_interval(prob_change, prob = 0.95)

Result Structure

# Every detection function returns:
result <- list(
  # Point estimates
  changepoints = c(45, 120, 380),
  
  # Location uncertainty
  confidence_intervals = list(
    c(42, 48),
    c(115, 125),
    c(375, 385)
  ),
  
  # Existence uncertainty
  existence_probability = c(0.95, 0.88, 0.72),
  
  # Segment information
  segments = list(
    list(start = 1, end = 45, params = list(mean = 2.3, var = 0.5)),
    list(start = 46, end = 120, params = list(mean = 5.1, var = 0.8))
    # ...
  ),
  
  # Diagnostics
  information_criterion = list(BIC = 234.5, AIC = 228.1),
  
  # Complete posterior (if applicable)
  posterior = matrix(...)  # P(change at t) for each t
)

\newpage

Multivariate Series

The Problem

With $d$ simultaneous series, the following can occur:

  1. Synchronous change: All series change at the same point
  2. Asynchronous change: Different series change at different points
  3. Correlation change: Dependence structure changes, even if marginals don't

General Model

Let $\mathbf{X}_t \in \mathbb{R}^d$ be a vector of observations at time $t$.

Before change: $\mathbf{X}_t \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$

After change: $\mathbf{X}_t \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)$

The change can be in:

  • $\boldsymbol{\mu}$ (mean vector)
  • $\boldsymbol{\Sigma}$ (covariance matrix, including correlations)
  • Both

Multivariate Extensions

Multivariate BOCPD

Use Normal-Wishart predictive distribution:

Prior: $(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \sim \text{NIW}(\mu_0, \kappa_0, \nu_0, \boldsymbol{\Psi}_0)$

Update: Conjugate formulas for updating hyperparameters.

prior <- normal_wishart(mu0 = rep(0, d), kappa0 = 1, 
                        nu0 = d + 1, Psi0 = diag(d))
result <- bocpd(data_matrix, prior = prior)

Sparse Projection (Wang & Samworth)

For high dimension ($d >> n$), project to sparse directions before detecting.

Idea: Find direction $\mathbf{v}$ that maximizes change signal, with $||\mathbf{v}||_0 \leq s$.

sparse_projection_cpd(data_matrix, sparsity = 5)

Multivariate API

# Data: n × d matrix
data <- matrix(...)

# Detect synchronous changes
result <- detect_regimes(data, 
                         type = "multivariate",
                         sync = TRUE)

# Detect changes per series
result <- detect_regimes(data,
                         type = "multivariate", 
                         sync = FALSE,
                         share_changepoints = FALSE)

# Detect correlation change
result <- detect_regimes(data,
                         type = "correlation",
                         method = "normal_wishart")

\newpage

Empirical Performance

We conducted comprehensive benchmarks comparing RegimeChange against established R packages: changepoint (Killick & Eckley, 2014), wbs (Fryzlewicz, 2014), not (Baranowski et al., 2019), and ecp (James & Matteson, 2015). Tests were performed with 50 replications per scenario, using synthetic data under maximum difficulty conditions.

Benchmark 1: Frequentist Methods (17 Scenarios)

17 scenarios $\times$ 50 reps = 850 tests. Tolerance = 15 observations.

Overall Ranking (Mean F1 Score)

| Rank | Method | Mean F1 | Package |
|---|---|---|---|
| 1 | RC_auto | 0.804 | RegimeChange |
| 2 | ECP_pkg | 0.788 | ecp |
| 3 | CP_binseg | 0.777 | changepoint |
| 4 | RC_cusum | 0.761 | RegimeChange |
| 5 | NOT_pkg | 0.722 | not |
| 6 | CP_pelt | 0.720 | changepoint |
| 7 | RC_pelt | 0.719 | RegimeChange |
| 8 | RC_wbs_mean | 0.681 | RegimeChange |
| 9 | RC_wbs | 0.654 | RegimeChange |
| 10 | RC_binseg | 0.652 | RegimeChange |
| 11 | WBS_pkg | 0.634 | wbs |
| 12 | RC_ediv | 0.612 | RegimeChange |
| 13 | RC_not | 0.514 | RegimeChange |

RegimeChange's automatic method selector (RC_auto) achieves the highest overall F1 score, outperforming all reference packages.

Performance by Scenario

| Scenario | Best RC | Best Reference | Winner |
|---|---|---|---|
| Mean δ=3 (easy) | 1.000 | 1.000 | Tie |
| Mean δ=2 (medium) | 1.000 | 1.000 | Tie |
| Mean δ=1 (hard) | 0.980 | 0.980 | Tie |
| Mean δ=0.5 (very hard) | 0.773 | 0.760 | RC (+0.01) |
| Variance 4:1 | 0.980 | 0.980 | Tie |
| Variance 2:1 | 0.667 | 0.400 | RC (+0.27) |
| Multi-CP (3 changes) | 1.000 | 1.000 | Tie |
| Multi-CP (5 changes) | 1.000 | 1.000 | Tie |
| AR(0.3)+change | 1.000 | 1.000 | Tie |
| AR(0.5)+change | 1.000 | 0.983 | RC (+0.02) |
| AR(0.7)+change | 0.900 | 0.720 | RC (+0.18) |
| 2% outliers | 1.000 | 0.967 | RC (+0.03) |
| 5% outliers | 1.000 | 0.963 | RC (+0.04) |
| 10% outliers | 0.993 | 0.973 | RC (+0.02) |
| t-dist (df=5) | 1.000 | 1.000 | Tie |
| t-dist (df=3) | 1.000 | 0.993 | RC (+0.01) |
| t-dist (df=2) | 0.967 | 0.983 | Ref (+0.02) |

Summary: RegimeChange wins or ties in 16 of 17 scenarios. The only loss is marginal (0.016 in t-dist df=2).

Key Advantages

| Scenario | RegimeChange | Reference | Improvement |
|---|---|---|---|
| Subtle variance (2:1) | 0.667 | 0.400 | +67% |
| Strong AR (ρ=0.7) | 0.900 | 0.720 | +25% |
| 5% outliers | 1.000 | 0.963 | +4% |
| Heavy tails t(df=3) | 1.000 | 0.993 | +1% |

Benchmark 2: Robustness to Outliers and Heavy Tails

10 scenarios $\times$ 50 reps = 500 tests. This benchmark specifically evaluates performance under data contamination.

Robustness Results (F1 ± SD)

| Scenario | RC_pelt | CP_pelt | RC_robust | RC_ediv | ECP_pkg |
|---|---|---|---|---|---|
| Clean (baseline) | 1.00±0.00 | 1.00±0.00 | 0.98±0.08 | 0.77±0.20 | 0.99±0.07 |
| 2% outliers | 0.24±0.12 | 0.52±0.26 | 1.00±0.00 | 0.81±0.17 | 0.99±0.07 |
| 5% outliers | 0.14±0.04 | 0.23±0.12 | 0.99±0.05 | 0.82±0.17 | 0.99±0.07 |
| 10% outliers | 0.11±0.08 | 0.17±0.19 | 0.99±0.05 | 0.85±0.18 | 1.00±0.00 |
| 15% outliers | 0.12±0.09 | 0.17±0.25 | 1.00±0.00 | 0.75±0.31 | 0.96±0.12 |
| t-dist (df=5) | 0.89±0.19 | 0.97±0.12 | 1.00±0.00 | 0.78±0.18 | 1.00±0.00 |
| t-dist (df=3) | 0.50±0.21 | 0.72±0.26 | 1.00±0.00 | 0.78±0.17 | 0.97±0.12 |
| t-dist (df=2) | 0.26±0.17 | 0.38±0.32 | 1.00±0.00 | 0.82±0.18 | 0.97±0.12 |
| Mixture (10%) | 0.30±0.14 | 0.64±0.33 | 0.99±0.05 | 0.77±0.18 | 0.97±0.12 |
| 5% Cauchy | 0.41±0.20 | 0.51±0.22 | 1.00±0.00 | 0.77±0.19 | 0.97±0.10 |

Robustness Ranking (Excluding Baseline)

| Rank | Method | Mean F1 |
|---|---|---|
| 1 | RC_robust | 0.998 |
| 2 | ECP_pkg | 0.979 |
| 3 | RC_ediv | 0.795 |
| 4 | CP_pelt | 0.479 |
| 5 | RC_pelt | 0.331 |

Key Finding: RC_robust achieves near-perfect performance (F1 = 0.998) across all contamination scenarios, with perfect scores (F1 = 1.00) in 6 of 9 challenging scenarios. This represents a 108% improvement over the best reference package (changepoint PELT at F1 = 0.479) on contaminated data.

Summary of Findings

Where RegimeChange excels:

  • Contaminated data: RC_robust achieves F1 = 0.998 vs. CP_pelt at 0.479 (+108%)
  • Subtle variance changes: RC achieves 0.667 vs. reference 0.400 (+67%)
  • Autocorrelated series: RC achieves 0.900 vs. reference 0.720 for AR(0.7) (+25%)
  • Overall ranking: RC_auto achieves highest mean F1 (0.804) across all methods

Where RegimeChange ties:

  • Standard scenarios with clear mean shifts: Perfect detection (F1 = 1.00)
  • Multiple changepoint scenarios: Excellent detection (F1 ≥ 0.995)

Where references have slight advantage:

  • Very heavy tails (t-dist df=2): ECP achieves 0.983 vs. RC 0.967 (margin: 0.016)

Configuration Guide

| Data Characteristics | Recommended Settings | Expected F1 |
|---|---|---|
| Clean iid data | default | 0.95+ |
| Mild contamination (2-5%) | robust = TRUE | 0.99+ |
| Heavy contamination (10%+) | robust = TRUE | 0.99+ |
| Heavy tails (t-dist, Cauchy) | robust = TRUE | 0.99+ |
| Autocorrelated (AR) | correct_ar = TRUE | 0.95+ |
| Subtle variance changes | type = "both" | 0.65+ |
| Unknown data quality | detect_regimes(data, method = "auto") | 0.80+ |

Recommendation: For real-world data where contamination or heavy tails are possible, robust = TRUE is strongly recommended. The computational overhead is modest, and the robustness gains are substantial.

Reproducing the Benchmarks

library(RegimeChange)
library(changepoint)

# Example 1: AR(0.7) scenario
set.seed(123)
ar_data <- as.numeric(arima.sim(list(ar = 0.7), n = 400))
ar_data[201:400] <- ar_data[201:400] + 2  # Add mean shift

cp_result <- cpt.meanvar(ar_data, method = "PELT", penalty = "MBIC")
rc_result <- detect_pelt(ar_data, "both", "MBIC", 5)
rc_result_cusum <- cusum(ar_data)

# RC_cusum typically closer to true changepoint at 200

# Example 2: Contaminated data (5% outliers)
set.seed(456)
clean_data <- c(rnorm(200, 0), rnorm(200, 2))
outlier_idx <- sample(400, 20)
contaminated <- clean_data
contaminated[outlier_idx] <- contaminated[outlier_idx] + 
                              sample(c(-10, 10), 20, replace = TRUE)

# Standard methods fail
cp_result <- cpts(cpt.mean(contaminated, method = "PELT", penalty = "MBIC"))
# Often misses the true changepoint or finds spurious ones

# Robust method succeeds
rc_result <- detect_pelt(contaminated, "mean", "MBIC", 5, robust = TRUE)
rc_result$changepoints  # Typically close to 200

\newpage

Metrics and Evaluation

Metrics for Offline Mode

Localization Error

Hausdorff Distance: $$d_H(\hat{\mathcal{T}}, \mathcal{T}) = \max\left( \max_{\hat{\tau} \in \hat{\mathcal{T}}} \min_{\tau \in \mathcal{T}} |\hat{\tau} - \tau|, \max_{\tau \in \mathcal{T}} \min_{\hat{\tau} \in \hat{\mathcal{T}}} |\tau - \hat{\tau}| \right)$$

Interpretation: Maximum distance between an estimated point and the nearest true one.
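
The formula transcribes directly into a few lines of R. A standalone sketch (the package exports hausdorff_distance() for the same computation):

```r
# Hausdorff distance between estimated and true changepoint sets,
# transcribing the formula above (illustrative sketch).
hausdorff_cp <- function(est, true) {
  d1 <- max(sapply(est, function(e) min(abs(e - true))))
  d2 <- max(sapply(true, function(t) min(abs(t - est))))
  max(d1, d2)
}

hausdorff_cp(c(98, 255), c(100, 250))  # -> 5
```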

Segmentation Metrics

Rand Index: $$RI = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP, TN, FP, FN are defined over pairs of points:

  • TP: Same segment in truth and estimation
  • TN: Different segment in truth and estimation
  • FP: Same estimated segment, different true segment
  • FN: Different estimated segment, same true segment

Adjusted Rand Index: Corrected for chance.

Covering Metric

$$Cover(\hat{\mathcal{S}}, \mathcal{S}) = \frac{1}{n} \sum_{S \in \mathcal{S}} |S| \cdot \max_{\hat{S} \in \hat{\mathcal{S}}} \frac{|S \cap \hat{S}|}{|S \cup \hat{S}|}$$

Interpretation: Weighted average of IoU (Intersection over Union) for each segment.
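
A minimal sketch of this computation, representing each segment as a (start, end) pair of indices (illustration only; covering_metric() is the packaged version):

```r
# Covering metric: length-weighted best IoU per true segment
# (illustrative sketch transcribing the formula above).
covering <- function(true_segs, est_segs, n) {
  iou <- function(a, b) {
    inter <- max(0, min(a[2], b[2]) - max(a[1], b[1]) + 1)
    uni <- (a[2] - a[1] + 1) + (b[2] - b[1] + 1) - inter
    inter / uni
  }
  total <- 0
  for (S in true_segs) {
    best <- max(sapply(est_segs, function(Sh) iou(S, Sh)))
    total <- total + (S[2] - S[1] + 1) * best
  }
  total / n
}

true_segs <- list(c(1, 100), c(101, 200))
est_segs  <- list(c(1, 90),  c(91, 200))
covering(true_segs, est_segs, 200)  # about 0.905: boundary off by 10
```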

F1-Score with Tolerance

Given a tolerance margin $M$:

  • TP: Change detected within $\pm M$ of a true one
  • FP: Change detected with no nearby true one
  • FN: True change with no nearby detection

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
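
The tolerance-based matching can be sketched with a greedy one-to-one assignment (hypothetical helper following the same definition as the package's f1_score()):

```r
# F1 with tolerance: greedily match each detection to an unmatched true
# changepoint within +/- tol (illustrative sketch).
f1_tol <- function(detected, true, tol = 10) {
  matched <- logical(length(true))
  tp <- 0
  for (d in detected) {
    hit <- which(!matched & abs(true - d) <= tol)
    if (length(hit) > 0) { matched[hit[1]] <- TRUE; tp <- tp + 1 }
  }
  precision <- if (length(detected)) tp / length(detected) else 0
  recall <- if (length(true)) tp / length(true) else 0
  if (precision + recall == 0) return(0)
  2 * precision * recall / (precision + recall)
}

f1_tol(c(102, 300), c(100, 250), tol = 10)  # one TP, one FP, one FN -> 0.5
```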

Metrics for Online Mode

Average Run Length (ARL)

ARL$_0$ (under $H_0$): Average time to false alarm. We want ARL$_0$ high.

ARL$_1$ (under $H_1$): Average time from change to detection. We want ARL$_1$ low.

Trade-off: Increasing the threshold raises both ARL$_0$ (fewer false alarms, good) and ARL$_1$ (slower detection, bad).

Detection Probability

$$P_D = P(\text{detect change} | \text{change occurred})$$

Measured at different time horizons after the change.

False Alarm Probability

$$P_{FA} = P(\text{detect change} | \text{no change occurred})$$

Per time window.

Evaluation Framework

# Evaluate result against ground truth
result <- detect_regimes(data, method = "pelt")
metrics <- evaluate(result, true_changepoints = c(100, 250), tolerance = 10)
print(metrics)

# Available metrics
hausdorff_distance(result$changepoints, true_cps)
f1_score(result$changepoints, true_cps, tolerance = 10)
rand_index(result, true_cps)
adjusted_rand_index(result, true_cps)
covering_metric(result, true_cps)
precision_score(result$changepoints, true_cps, tolerance = 10)
recall_score(result$changepoints, true_cps, tolerance = 10)
mean_absolute_error(result$changepoints, true_cps)
rmse_changepoints(result$changepoints, true_cps)

Benchmarking

The library includes:

  1. Synthetic datasets with known ground truth
  2. Real reference datasets (economic, industrial, well-log)
  3. Comparison function between methods

# Compare multiple methods
benchmark <- compare_methods(
  data = benchmark_data,
  methods = c("pelt", "bocpd", "wbs", "cusum"),
  true_changepoints = true_cps
)

# Automatic visualization
plot(benchmark)

\newpage

Technical Details

Robust Estimation (robust = TRUE)

When enabled, RegimeChange uses:

  • Adaptive winsorization: Outliers beyond 3 MADs are clipped
  • Huber M-estimation: Loss function with 95% efficiency at normal
  • Qn scale estimator: More efficient than MAD (82% vs 37%)
  • Iteratively reweighted least squares: For joint location-scale estimation

This provides a breakdown point of 50% (maximum theoretical) while maintaining high efficiency for clean data.
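
The winsorization step can be sketched standalone (hypothetical helper illustrating one ingredient of robust = TRUE, not the package internals):

```r
# Adaptive winsorization: clip observations beyond 3 (scaled) MADs of the
# median (sketch of one robust = TRUE ingredient, not the package code).
winsorize_mad <- function(x, k = 3) {
  m <- median(x)
  s <- mad(x)   # scaled MAD, consistent for the Normal
  pmin(pmax(x, m - k * s), m + k * s)
}

winsorize_mad(c(1:9, 100))  # the gross outlier is pulled back toward the bulk
```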

AR Correction (correct_ar = TRUE)

Pre-whitening transformation: $y'_t = y_t - \rho \cdot y_{t-1}$

  • AR(1) coefficient estimated via Yule-Walker (or robust median-based method if robust = TRUE)
  • Applied only when $|\rho| > 0.1$ and $n > 30$
  • Changepoint indices automatically adjusted for the lost observation
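
The transformation can be sketched with base stats tools. A hypothetical helper mirroring what correct_ar = TRUE does internally (the actual package code differs):

```r
# Estimate the AR(1) coefficient by Yule-Walker and difference it out
# before detection (illustrative sketch, not the package internals).
prewhiten_ar1 <- function(y) {
  rho <- ar.yw(y, aic = FALSE, order.max = 1)$ar
  if (length(rho) == 0 || abs(rho) <= 0.1) return(y)  # correction not needed
  y[-1] - rho * y[-length(y)]   # y'_t = y_t - rho * y_{t-1}; one obs lost
}

set.seed(5)
y <- as.numeric(arima.sim(list(ar = 0.6), n = 300))
acf(prewhiten_ar1(y), plot = FALSE)$acf[2]  # lag-1 autocorrelation near 0
```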

\newpage

Technical Architecture

Design Philosophy

"R Usability, Julia Performance, Statistical Rigor"

Principles:

  1. Tidyverse-compatible R API
  2. Optional intensive computations in Julia
  3. Native R visualization (ggplot2)
  4. Extensibility for new methods

Package Structure

RegimeChange/
├── DESCRIPTION
├── NAMESPACE
├── LICENSE
├── README.md
├── R/
│   ├── data.R                  # Dataset documentation
│   ├── detect.R                # Main function detect_regimes()
│   ├── evaluation.R            # Metrics and benchmarking
│   ├── imports.R               # Package imports
│   ├── julia_backend.R         # Julia connection via JuliaCall
│   ├── methods_advanced.R      # FPOP, Kernel CPD, E-Divisive
│   ├── methods_bayesian.R      # BOCPD, Shiryaev-Roberts
│   ├── methods_deeplearning.R  # Autoencoder, TCN, Transformer
│   ├── methods_frequentist.R   # PELT, CUSUM, BinSeg, WBS
│   ├── priors.R                # Prior distributions
│   ├── visualization.R         # Plotting functions
│   └── zzz.R                   # Package initialization
├── inst/
│   └── julia/
│       └── RegimeChangeJulia.jl
├── man/
├── tests/
│   ├── testthat.R
│   └── testthat/
├── vignettes/
│   ├── introduction.Rmd
│   ├── offline-detection.Rmd
│   └── bayesian-methods.Rmd
└── data/

Julia Backend

The optional Julia backend provides high-performance implementations:

# Initialize Julia (optional, for better performance)
init_julia()

# Check availability
julia_available()
julia_status()

# Benchmark R vs Julia backends
benchmark_backends(data, method = "pelt")

Julia module exports: pelt_detect, fpop_detect, bocpd_detect, cusum_detect, kernel_cpd_detect, wbs_detect, segneigh_detect, pelt_multivariate

\newpage

API Reference

Main Function

#' Detect regime changes in time series
#'
#' @param data Numeric vector, ts, or matrix for multivariate
#' @param method Algorithm: "pelt", "bocpd", "cusum", "wbs", 
#'               "shiryaev", etc.
#' @param type Change type: "mean", "variance", "both", "trend", 
#'             "distribution"
#' @param mode Operation mode: "offline" (default), "online"
#' @param penalty For offline methods: "BIC", "AIC", "MBIC", "MDL", 
#'                or numeric
#' @param min_segment Minimum segment length
#' @param prior List with prior specification (for Bayesian methods)
#' @param threshold Detection threshold (for online/CUSUM methods)
#' @param correct_ar Logical: apply AR(1) correction?
#' @param robust Logical: use robust estimation?
#' @param ... Additional method-specific arguments
#'
#' @return Object of class "regime_result"
detect_regimes <- function(data, method = "pelt", ...)

Prior Functions (Bayesian)

# Normal-Gamma prior for BOCPD
normal_gamma(mu0 = 0, kappa0 = 1, alpha0 = 1, beta0 = 1)

# Normal prior with known variance
normal_known_var(mu0 = 0, sigma0 = 1, known_var = 1)

# Normal-Wishart prior for multivariate
normal_wishart(mu0, kappa0, nu0, Psi0)

# Poisson-Gamma prior
poisson_gamma(alpha0 = 1, beta0 = 1)

# Inverse-Gamma prior for variance
inverse_gamma_var(alpha0 = 1, beta0 = 1)

# Hazard rate priors
geometric_hazard(lambda = 0.01)
constant_hazard(rate = 0.01)
negbin_hazard(alpha = 1, beta = 1)

Online API

# Create online detector
detector <- regime_detector(method = "bocpd", prior = normal_gamma())

# Update with new observation
result <- update(detector, x)

# Reset after detection
detector <- reset(detector)

Visualization

# Plot detection results
plot(result)                          # S3 method
plot(result, type = "posterior")      # Posterior probabilities
plot(result, type = "segments")       # Segment visualization

# Additional plots
plot_summary(result)
plot_interactive(result)              # Requires plotly
plot_compare(result1, result2)        # Compare methods

Evaluation

# Evaluate against ground truth
evaluate(result, true_changepoints, tolerance = 5)

# Individual metrics
f1_score(detected, true, tolerance)
hausdorff_distance(detected, true)
rand_index(result, true)
adjusted_rand_index(result, true)
covering_metric(result, true)
precision_score(detected, true, tolerance)
recall_score(detected, true, tolerance)

# Compare methods
compare_methods(data, methods = c("pelt", "bocpd"), 
                true_changepoints = true_cps)
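To make the role of `tolerance` concrete, here is an illustrative (not the package's) implementation of tolerance-based F1: a detection counts as a true positive if it falls within `tolerance` points of a not-yet-matched true changepoint.

```r
# Self-contained sketch of tolerance-based F1 matching.
f1_tolerance <- function(detected, true, tolerance = 5) {
  matched <- logical(length(true))
  tp <- 0
  for (d in detected) {
    i <- which(!matched & abs(true - d) <= tolerance)
    if (length(i) > 0) {              # greedy match to the first free true point
      matched[i[1]] <- TRUE
      tp <- tp + 1
    }
  }
  precision <- if (length(detected) > 0) tp / length(detected) else 0
  recall    <- if (length(true) > 0)     tp / length(true)     else 0
  if (precision + recall == 0) return(0)
  2 * precision * recall / (precision + recall)
}

f1_tolerance(detected = c(98, 205), true = c(100, 200), tolerance = 5)
# Both detections fall within the tolerance window, so precision = recall = F1 = 1
```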

\newpage

Future Directions

Planned: Integration with EmpiricalDynamics

RegimeChange is designed to complement equation discovery workflows:

| Library | Question |
|---------|----------|
| RegimeChange | When do the system's properties change? |
| EmpiricalDynamics | What is the equation governing the system? |

Potential workflow: instead of searching for a single equation over the entire dataset, discover a different equation for each regime:

$$\text{Raw data} \rightarrow \text{RegimeChange} \rightarrow \text{Segments} \rightarrow \text{EmpiricalDynamics} \rightarrow \text{Switching system}$$

Other Future Enhancements

  • Graph-based changepoint detection (Sulem et al., 2024)
  • Improved multivariate optimization for $p \gg n$
  • Additional deep learning architectures
  • Real-time dashboard for online monitoring

\newpage

Limitations

  • Very short series: For $n < 100$, standard PELT implementations may be preferable
  • High-dimensional data: Multivariate support is available but not yet optimized for $p \gg n$
  • Computational cost: Robust estimation is $O(n^2)$ vs $O(n)$ for standard PELT; disable for very large datasets
  • AR + heavy contamination: When both issues are present, robust = TRUE alone typically works better than combining both corrections
  • Deep learning methods: Require optional dependencies (keras, tensorflow) and may need tuning for specific domains

\newpage

References

Fundamental Methods

  • Page, E. S. (1954). Continuous inspection schemes. Biometrika.
  • Basseville, M., & Nikiforov, I. (1993). Detection of Abrupt Changes: Theory and Application.
  • Killick, R., Fearnhead, P., & Eckley, I. A. (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500), 1590-1598.
  • Killick, R., & Eckley, I. A. (2014). changepoint: An R package for changepoint analysis. Journal of Statistical Software, 58(3), 1-19.

Bayesian Methods

  • Adams, R. P., & MacKay, D. J. (2007). Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742.
  • Tartakovsky, A. G., & Moustakides, G. V. (2010). State-of-the-art in Bayesian changepoint detection. Sequential Analysis.

Deep Learning

  • Li, J., Fearnhead, P., Fryzlewicz, P., & Wang, T. (2024). Automatic change-point detection in time series via deep learning. Journal of the Royal Statistical Society: Series B.
  • Gupta, M., Wadhvani, R., & Rasool, A. (2022). Real-time change-point detection. Expert Systems with Applications.

Robust Statistics

  • Rousseeuw, P. J., & Croux, C. (1993). Alternatives to the median absolute deviation. Journal of the American Statistical Association, 88(424), 1273-1283.
  • Huber, P. J. (1981). Robust Statistics. John Wiley & Sons.

Other Packages

  • Truong, C., Oudre, L., & Vayatis, N. (2020). ruptures: change point detection in Python.
  • Baumer, B. et al. (2025). tidychangepoint: A Unified Framework. R package.
  • Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Annals of Statistics, 42(6), 2243-2281.
  • Baranowski, R., Chen, Y., & Fryzlewicz, P. (2019). Narrowest-over-threshold detection of multiple change points and change-point-like features. Journal of the Royal Statistical Society: Series B, 81(3), 649-672.
  • James, N. A., & Matteson, D. S. (2015). ecp: An R package for nonparametric multiple change point analysis of multivariate data. Journal of Statistical Software, 62(7), 1-25.

\newpage

Glossary

| Term | Definition |
|------|------------|
| Changepoint | Point in time where the statistical properties of a series change |
| Regime | Time period with homogeneous statistical properties |
| Run length | Time elapsed since the last changepoint |
| ARL | Average Run Length: average time until a decision |
| Hazard rate | Instantaneous probability that a change occurs |
| Posterior | Probability distribution after observing the data |
| PELT | Pruned Exact Linear Time: optimal segmentation algorithm |
| BOCPD | Bayesian Online Changepoint Detection |
| CUSUM | Cumulative Sum: cumulative statistic for detection |

\newpage

Citation

@software{regimechange2024,
  title = {RegimeChange: Robust Changepoint Detection for Time Series},
  author = {Gómez Julián, José Mauricio},
  year = {2024},
  url = {https://github.com/IsadoreNabi/RegimeChange}
}

License

MIT © José Mauricio Gómez Julián.
