MyNixOS website logo
Description

Empirical Discovery of Differential Equations from Time Series Data.

A comprehensive toolkit for discovering differential and difference equations from empirical time series data using symbolic regression. The package implements a complete workflow from data preprocessing (including Total Variation Regularized differentiation for noisy economic data), visual exploration of dynamical structure, and symbolic equation discovery via genetic algorithms. It leverages a high-performance 'Julia' backend ('SymbolicRegression.jl') to provide industrial-grade robustness, physics-informed constraints, and rigorous out-of-sample validation. Designed for economists, physicists, and researchers studying dynamical systems from observational data.

Based on the detailed documentation provided for EmpiricalDynamics, the addition of strategic diagrams would significantly enhance user understanding, particularly for the complex workflows involved in SDE discovery and the hybrid architecture.

Here is the revised documentation with suggested image tags inserted at high-value locations.

EmpiricalDynamics

High-Performance & Robust Empirical Discovery of Differential Equations from Time Series Data

EmpiricalDynamics is a comprehensive toolkit for discovering differential and difference equations from empirical time series data. It combines the statistical power of R with a high-performance Julia backend (via SymbolicRegression.jl) to offer a robust engine capable of recovering physical laws, economic models, and stochastic differential equations from noisy data.

Performance Benchmarks:

  • Deterministic ODEs: R² > 0.93 on chaotic systems (Lorenz attractor)
  • Stochastic SDEs: Drift R² > 0.86, Diffusion R² > 0.59 with GLS refinement
  • Physics Constants: Precision of 10⁻⁸ recovering π and e from noisy data

🚀 Key Features

FeatureDescription
Hybrid R/Julia ArchitectureSeamlessly leverages Julia's speed for symbolic regression while keeping R's statistical ecosystem for analysis
Multi-threaded SearchAutomatic parallelization across CPU cores via Julia threads
SDE DiscoveryFull pipeline for Stochastic Differential Equations: drift + diffusion recovery
GLS Iterative RefinementIterative Generalized Least Squares for heteroscedastic data
TVR DifferentiationTotal Variation Regularized derivatives for robust estimation in noisy data
Physics-InformedAutomatic detection of fundamental constants (π, e, φ)
Production ReadyHandles edge cases (NaNs, outliers, single-variable inputs) gracefully

📦 Installation

Prerequisites

  • R (>= 4.0.0)
  • Julia (>= 1.9): Must be installed and available in your system path

Install from Source

# 1. Install R dependencies
install.packages(c("CVXR", "minpack.lm", "signal", "lmtest", "tseries", 
                   "ggplot2", "gridExtra", "JuliaCall", "devtools"))

# 2. Install EmpiricalDynamics
devtools::install()

# 3. (Optional) Enable Julia parallelization
# Add to ~/.Renviron:
# JULIA_NUM_THREADS=8

First-Time Setup

library(EmpiricalDynamics)

# Initialize Julia backend (installs required Julia packages)
# This may take a few minutes on first run
julia <- setup_julia_backend()


⚡ Quick Start

Example 1: Discovering a Physical Law

Recovering from noisy observations:

library(EmpiricalDynamics)

# Generate noisy data
set.seed(42)
t <- seq(-3, 3, length.out = 200)
y <- pi * sin(t) + exp(1) * t + rnorm(200, 0, 0.05)
data <- data.frame(y = y, t = t)

# Symbolic search
results <- symbolic_search(
  target = data$y,
  predictors = data[, "t", drop = FALSE],
  complexity_limit = 15,
  n_iterations = 500,
  parsimony = 0.001
)

# Best equation
best <- select_equation(results, criterion = "knee")
print(best)
# Expression: (sin(t) * 3.14159) + (t * 2.71828)
# R²: 0.9999

Example 2: Chaotic System (Lorenz Attractor)

Recovering the Lorenz equations from trajectory data:

# Assuming you have trajectory data with X, Y, Z and their TVR derivatives
data_lorenz <- data.frame(X = X, Y = Y, Z = Z, dX = dX_tvr, dY = dY_tvr, dZ = dZ_tvr)

# Add auxiliary transformations
data_lorenz$XY <- X * Y
data_lorenz$XZ <- X * Z
data_lorenz$YmX <- Y - X

# Search for dx/dt = σ(y - x)
results_dX <- symbolic_search(
  target = data_lorenz$dX,
  predictors = data_lorenz[, c("X", "Y", "Z", "YmX")],
  complexity_limit = 20,
  n_iterations = 1000,
  parsimony = 0.0001
)
# Achieves R² > 0.93


🔬 SDE Discovery: Complete Workflow

Recovering Stochastic Differential Equations of the form:

Where is the drift and is the diffusion coefficient.

The Challenge

In SDEs, the observed derivative contains both signal (drift) and noise (diffusion):

Direct symbolic regression on recovers drift poorly when Signal-to-Noise Ratio (SNR) < 1.

Solution: Hybrid GLS + Symbolic Search

library(EmpiricalDynamics)

# ═══════════════════════════════════════════════════════════════════
# STEP 1: Compute derivatives with TVR
# ═══════════════════════════════════════════════════════════════════

data_spec <- specify_variables(data, endogenous = "Z", exogenous = "X", time_col = "time")

tvr_result <- compute_derivative(
  Z = data_spec$Z,
  t = data_spec$time,
  method = "tvr",
  lambda = "auto",
  solver = "osqp"
)

dZ_tvr <- tvr_result$derivative

# ═══════════════════════════════════════════════════════════════════
# STEP 2: Initial drift search (symbolic)
# ═══════════════════════════════════════════════════════════════════

# Prepare predictors with transformations
data_full <- create_transformations(data, variables = c("X", "Z"))
data_full$dZ <- dZ_tvr
data_full$sinX <- sin(data_full$X)
data_full$absX <- abs(data_full$X)
data_full$Z3 <- data_full$Z^3

drift_initial <- symbolic_search(
  target = data_full$dZ,
  predictors = data_full[, c("X", "Z", "sinX", "Z3")],
  complexity_limit = 20,
  n_iterations = 600,
  parsimony = 0.0002
)

# ═══════════════════════════════════════════════════════════════════
# STEP 3: GLS iterative refinement
# ═══════════════════════════════════════════════════════════════════

sde_refined <- estimate_sde_iterative(
  target = data_full$dZ,
  predictors = data_full[, c("X", "Z", "sinX", "Z3", "absX")],
  data = data_full,
  initial_drift = select_equation(drift_initial, "best"),
  max_iter = 8,
  tol = 0.001
)

# GLS iteratively:
# 1. Estimates diffusion from residuals
# 2. Calculates weights = 1/σ²
# 3. Re-estimates drift with weighted least squares
# 4. Repeats until convergence

# ═══════════════════════════════════════════════════════════════════
# STEP 4: Final diffusion search (symbolic)
# ═══════════════════════════════════════════════════════════════════

# Use refined residuals
residuals_refined <- sde_refined$final_residuals
resid_magnitude <- abs(residuals_refined) / sqrt(dt)

# Smooth with median filter (more robust than mean)
resid_smoothed <- stats::runmed(resid_magnitude, k = 21)

diffusion_final <- symbolic_search(
  target = resid_smoothed,
  predictors = data_full[, c("X", "absX")],
  complexity_limit = 12,
  n_iterations = 400,
  parsimony = 0.0005
)

# ═══════════════════════════════════════════════════════════════════
# STEP 5: Construct final SDE model
# ═══════════════════════════════════════════════════════════════════

sde_final <- construct_sde(
  drift = sde_refined$drift,
  diffusion = select_equation(diffusion_final, "best"),
  variable = "Z"
)

print(sde_final)

Benchmark Results

Test system:

MethodDrift R²Diffusion R²
Direct symbolic search0.820.07
GLS + Symbolic (recommended)0.860.59

When to Use Each Method

ScenarioRecommended Method
Deterministic ODE (no noise)Direct symbolic_search()
SDE with SNR > 2Direct symbolic_search() on dZ, then residuals
SDE with SNR < 1GLS iterative + symbolic (as above)
Unknown noise structuremodel_conditional_variance(method = "symbolic")

🧮 Differentiation Methods

Computing derivatives from noisy time series is critical. We offer multiple methods:

# Total Variation Regularization (recommended for noisy data)
deriv_tvr <- compute_derivative(Z, t, method = "tvr", lambda = "auto", solver = "osqp")

# Savitzky-Golay filter (fast fallback)
deriv_sg <- compute_derivative(Z, t, method = "savgol", p = 3, n = 15)

# Cubic spline (smooth data)
deriv_spline <- compute_derivative(Z, t, method = "spline")

# Spectral (periodic data)
deriv_spectral <- compute_derivative(Z, t, method = "spectral")

Comparison

MethodNoise RobustnessDiscontinuitiesSpeedBest For
TVR⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐Noisy economic/financial data
Savgol⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐General purpose, fallback
Spline⭐⭐⭐⭐⭐⭐Smooth physical data
Spectral⭐⭐⭐⭐⭐Periodic signals

🛠️ Advanced Configuration

Julia Backend Parameters

# Direct Julia call with full control
results <- julia_call(
  "run_symbolic_search_robust",
  data_path,
  response_col,
  predictor_cols,
  max_complexity = 20L,      # Tree size limit
  n_iterations = 500L,       # Genetic algorithm generations
  parsimony = 0.001,         # Complexity penalty
  custom_operators = ""      # e.g., "safe_log,safe_sqrt"
)

Parsimony Guidelines

ValueUse Case
0.1Quick exploration, very simple models
0.01Standard regression
0.001Physics discovery (recommended)
0.0001Complex systems (Lorenz, etc.)

Parallelization

# Check Julia threads
julia_eval("Threads.nthreads()")

# Enable more threads (before julia_setup)
Sys.setenv(JULIA_NUM_THREADS = "8")


📊 Diagnostic Tools

Residual Analysis

# After fitting drift
residuals <- data$dZ - predict(drift_model, data)

# Comprehensive diagnostics
diag <- residual_diagnostics(
  residuals = residuals,
  data = data,
  predictors = c("X", "Z"),
  max_lag = 20,
  plot = TRUE
)

print(diag)
# Ljung-Box (autocorrelation): p=0.XXX
# ARCH-LM (conditional het.): p=0.XXX  
# Breusch-Pagan (het.): p=0.XXX
# Jarque-Bera (normality): p=0.XXX

Model Validation

# Cross-validation
cv_result <- cross_validate(
  equation = best_drift,
  data = data,
  target_col = "dZ",
  k = 5
)

# Trajectory simulation
sim <- simulate_trajectory(
  sde_model = sde_final,
  initial_state = c(Z = 0),
  t_span = c(0, 30),
  dt = 0.01,
  n_paths = 100
)

plot(sim)


📈 Complete Example: Economic Growth Model

Discovering a Solow-Swan type growth equation from GDP data:

library(EmpiricalDynamics)

# Load data
data <- read_empirical_data("gdp_quarterly.csv")
data <- specify_variables(data, endogenous = "GDP", exogenous = c("Capital", "Labor"), 
                          time_col = "Quarter")

# Compute growth rate
deriv <- compute_derivative(data$GDP, data$Quarter, method = "tvr", lambda = "auto")
data$dGDP <- deriv$derivative

# Create transformations
data <- create_transformations(data, variables = c("GDP", "Capital", "Labor"))

# Search for growth equation
results <- symbolic_search(
  target = data$dGDP,
  predictors = data[, c("GDP", "Capital", "Labor", "log_Capital", "log_Labor")],
  complexity_limit = 20,
  n_iterations = 800,
  parsimony = 0.001
)

# Select and validate
best <- select_equation(results, criterion = "knee")
cv <- cross_validate(best, data, "dGDP", k = 5)

# Export results
export_results(best, cv, format = "latex", file = "growth_model.tex")


📚 Function Reference

Core Functions

FunctionDescription
symbolic_search()Main symbolic regression interface
select_equation()Select from Pareto front (best, simplest, knee)
compute_derivative()Numerical differentiation (TVR, Savgol, etc.)
specify_variables()Annotate data with variable roles
create_transformations()Auto-generate log, sqrt, square variants

SDE Functions

FunctionDescription
estimate_sde_iterative()GLS iterative refinement for SDEs
model_conditional_variance()Estimate diffusion from residuals
construct_sde()Combine drift + diffusion into SDE object
simulate_trajectory()Monte Carlo SDE simulation

Diagnostics

FunctionDescription
residual_diagnostics()Ljung-Box, ARCH-LM, Breusch-Pagan, Jarque-Bera
cross_validate()K-fold cross-validation
validate_model()Out-of-sample prediction metrics

🔧 Troubleshooting

Julia Threads Not Working

# Check current threads
julia_eval("Threads.nthreads()")  # Shows 1?

# Solution: Set BEFORE starting R
# Windows: System Environment Variables → JULIA_NUM_THREADS = 8
# Linux/Mac: Add to ~/.Renviron: JULIA_NUM_THREADS=8
# Then restart RStudio completely

TVR Solver Fails

# If OSQP fails, try SCS (slower but more robust)
deriv <- compute_derivative(Z, t, method = "tvr", solver = "scs")

# If all solvers fail, use Savitzky-Golay
deriv <- compute_derivative(Z, t, method = "savgol", p = 3, n = 15)

Low R² on SDE Diffusion

This is expected when SNR < 1. The diffusion term is inherently harder to recover because:

  1. It's estimated from residuals (which contain drift estimation error)
  2. The stochastic noise masks the systematic pattern

Solutions:

  • Use GLS iterative refinement (improves diffusion R² by ~8x)
  • Increase data quantity (more observations)
  • Reduce measurement noise if possible
  • Accept that diffusion R² > 0.5 is often the practical ceiling

📖 Citation

If you use EmpiricalDynamics in your research, please cite:

@software{EmpiricalDynamics,
  title = {EmpiricalDynamics: Robust Discovery of Differential Equations with R and Julia},
  author = {Gómez Julián, José Mauricio},
  year = {2025},
  url = {https://github.com/IsadoreNabi/EmpiricalDynamics},
  version = {0.1.0}
}

📜 License

MIT License. See LICENSE for details.


🗺️ Roadmap

  • [ ] GPU acceleration for symbolic search
  • [ ] Automatic hyperparameter tuning
  • [ ] Neural-guided symbolic regression
  • [ ] Time-varying parameter detection
  • [ ] Integration with Stan for Bayesian SDEs

Built with ❤️ for Science and Economics.

Metadata

Version

0.1.3

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows