MyNixOS website logo
Description

Adaptive Regularization using Cubics for Optimization.

Implements cubic regularization methods (ARC) for local optimization problems common in statistics and applied research. Provides robust handling of ill-conditioned, nonconvex, and indefinite Hessian problems with automatic saddle point escape. Supports box constraints; linear equality constraints are planned for a future release.

arcopt

R-CMD-check Lifecycle: experimental License: MIT

Robust nonlinear optimization for R

When optim() fails on your ill-conditioned model, arcopt is designed to succeed. It uses Adaptive Regularization with Cubics (ARC) to handle indefinite Hessians and escape saddle points automatically.

Installation

# Install from GitHub
pak::pak("marcus-waldman/arcopt")

# Or with devtools
devtools::install_github("marcus-waldman/arcopt")

Quick Start

library(arcopt)

# Rosenbrock function - a classic difficult optimization problem
result <- arcopt(
  x0 = c(-1.2, 1),
  fn = function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2,
  gr = function(x) c(
    -2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
    200 * (x[2] - x[1]^2)
  ),
  hess = function(x) matrix(c(
    1200 * x[1]^2 - 400 * x[2] + 2, -400 * x[1],
    -400 * x[1], 200
  ), 2, 2)
)

result$par
#> [1] 1 1

Two Modes: Exact Hessian vs Quasi-Newton

arcopt offers two optimization strategies:

Mode 1: Exact Hessian (Default)

Best when you can compute the Hessian analytically or via automatic differentiation.

# Provide fn, gr, and hess
result <- arcopt(x0, fn, gr, hess)

Mode 2: Quasi-Newton (No Hessian Required)

Best when computing the Hessian is expensive or unavailable. Uses BFGS/SR1 approximations. arcopt seeds the initial approximation $B_0$ with a one-time finite-difference Hessian computed from the supplied gradient (cost: $2n$ gradient evaluations at startup), which dramatically improves saddle-escape behavior compared to the identity-initialization convention used in classical BFGS/SR1.

# Just provide fn and gr - no Hessian needed
result <- arcopt(x0, fn, gr, control = list(use_qn = TRUE))

Which Mode Should I Use?

ScenarioRecommendation
Analytic Hessian availableUse exact Hessian (default)
Hessian via autodiff (torch, Stan)Use exact Hessian
Hessian expensive (n > 100)Use use_qn = TRUE
No Hessian functionUse use_qn = TRUE

Benchmark Results

Tested on 8 standard functions with 10 starting points each (80 test cases):

MethodConvergenceAvg IterationsHessian Evals
Exact Hessian100%221,443
BFGS QN100%280
SR1 QN99%610

BFGS Quasi-Newton achieves the same 100% convergence as exact Hessian with only 26% more iterations and zero Hessian evaluations.

On problems with symmetric saddle points (e.g., growth-mixture models, factor models with rotational symmetry), qn_method = "hybrid" combines SR1's ability to represent indefinite curvature with BFGS's fast local convergence and matches full-Hessian solvers' saddle-escape reliability using only gradient information.

Box Constraints

# Optimize with bounds: 0 <= x <= 0.5
result <- arcopt(
  x0 = c(0.1, 0.1),
  fn = fn, gr = gr, hess = hess,
  lower = c(0, 0),
  upper = c(0.5, 0.5)
)

Why ARC?

Standard optimizers struggle when the Hessian is indefinite or nearly singular—common in:

  • Maximum likelihood with complex likelihoods
  • Bayesian posterior modes (MAP estimation)
  • Nonlinear mixed effects models
  • Any model with poor identifiability

Cubic regularization automatically handles negative curvature without line searches or trust-region adjustments. It provably escapes saddle points and converges to local minima.

The Key Idea

Instead of the quadratic model used by Newton's method:

m(s) = f + g's + (1/2) s'Hs

ARC adds a cubic term:

m(s) = f + g's + (1/2) s'Hs + (σ/3)||s||³

This cubic term prevents unbounded steps when the Hessian has negative eigenvalues.

Adaptive Solver-Mode Dispatch

arcopt() automatically adapts its subproblem solver to the local landscape:

  1. Cubic (default). Uses adaptive cubic regularization with a Newton-first fast path. Handles saddles and indefinite Hessians naturally.
  2. Trust-region (enabled by default). Fires a one-way switch from cubic when four runtime signals indicate flat-ridge or stuck-indefinite-saddle stagnation. Replaces the cubic penalty with a hard step-norm constraint so the step can walk off a near-singular ridge.
  3. QN-polish (opt-in). Fires a bidirectional switch from cubic when the iterate enters the quadratic attraction basin of a strict minimum. Replaces the cubic subproblem with Wolfe line-search BFGS warmstarted from the current exact Hessian; skips further hess() calls until convergence or revert, saving wall-clock time on expensive Hessians (Stan AD, finite differences).
result <- arcopt(x0, fn, gr, hess, control = list(
  tr_fallback_enabled = TRUE,    # cubic->TR on flat-ridge (default)
  qn_polish_enabled = TRUE       # cubic<->BFGS polish in basin (opt-in)
))

result$diagnostics$solver_mode_final   # "cubic", "tr", or "qn_polish"
result$diagnostics$ridge_switches      # count of cubic->TR transitions
result$diagnostics$qn_polish_switches  # count of cubic->polish transitions
result$diagnostics$qn_polish_reverts   # count of polish->cubic reversions

See design/design-principles.qmd §3a for the σ↔r duality framing and todo/tr-fallback-hybrid-briefing.md / todo/qn-polish-mode-briefing.md for the detector designs.

Common Options

arcopt(x0, fn, gr, hess,
  control = list(
    # Quasi-Newton mode (no exact Hessian needed)
    use_qn = TRUE,
    qn_method = "hybrid",    # recommended for saddle-prone problems;
                             # also "bfgs", "sr1"

    # Convergence tolerances
    gtol_abs = 1e-5,         # Gradient norm
    maxit = 1000,            # Max iterations

    # Hybrid solver-mode dispatch
    tr_fallback_enabled = TRUE,   # cubic->TR on stagnation
    qn_polish_enabled   = FALSE,  # cubic<->polish in quadratic basin

    # Output
    trace   = 1,             # Depth of saved trace data (0/1/2/3)
    verbose = FALSE          # Print one line per iteration to console
  )
)

The full set of advanced controls (cubic regularization tuning, TR-fallback thresholds, polish-mode thresholds, QN routing) lives under ?arcopt_advanced_controls.

Comparison with optim()

Featurearcoptoptim(BFGS)optim(L-BFGS-B)
Handles indefinite Hessian
Escapes saddle points
Box constraints
Hessian-free mode
Best forComplex likelihoodsSmooth convexLarge bound-constrained

When NOT to Use arcopt

arcopt is a local optimizer. Use something else for:

  • Global optimization: Use DEoptim, GenSA, or nloptr with global algorithms
  • Derivative-free: Use nloptr with COBYLA or dfoptim
  • Very large scale (n > 10,000): Use lbfgs or sparse methods
  • Linear/Quadratic programming: Use lpSolve, quadprog

Best for: 2-500 parameters, nonconvex objectives, need robust convergence.

Return Value

result <- arcopt(x0, fn, gr, hess)

result$par                 # Optimal parameters
result$value               # Optimal function value
result$gradient            # Gradient at solution
result$hessian             # Hessian at solution (exact in cubic/tr mode;
                           # BFGS approximation if solver_mode_final == "qn_polish")
result$converged           # TRUE if converged
result$iterations          # Number of iterations
result$evaluations         # List: fn, gr, hess counts
result$message             # Why it stopped

# Solver-mode diagnostics (nested sublist; most users do not inspect this).
# See ?arcopt_advanced_controls for the full schema.
result$diagnostics$solver_mode_final           # "cubic", "tr", or "qn_polish"
result$diagnostics$ridge_switches              # count of cubic->TR transitions
result$diagnostics$radius_final                # final TR radius (NA if no switch)
result$diagnostics$qn_polish_switches          # count of cubic->qn_polish transitions
result$diagnostics$qn_polish_reverts           # count of qn_polish->cubic reversions
result$diagnostics$hess_evals_at_polish_switch # baseline for Hessian-eval savings

References

Adaptive Regularization with Cubics (ARC)

Cartis, C., Gould, N. I. M., & Toint, P. L. (2011). Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming, 127(2), 245-295.

Nesterov, Y., & Polyak, B. T. (2006). Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1), 177-205.

Quasi-Newton with Cubic Regularization

Kamzolov, D., Ziu, K., Agafonov, A., & Takáč, M. (2025). Accelerated Adaptive Cubic Regularized Quasi-Newton Methods. Journal of Optimization Theory and Applications, 208. https://doi.org/10.1007/s10957-025-02804-3

Kamzolov, D., Ziu, K., Agafonov, A., & Takáč, M. (2023). Cubic Regularization is the Key! The First Accelerated Quasi-Newton Method with a Global Convergence Rate of O(k^-2) for Convex Functions. arXiv:2302.04987.

Benson, H. Y., & Shanno, D. F. (2018). Cubic regularization in symmetric rank-1 quasi-Newton methods. Mathematical Programming Computation, 10(4), 457-494.

License

MIT © 2025 Marcus Waldman

Contributing

Issues and PRs welcome at: https://github.com/marcus-waldman/arcopt/issues.

Metadata

Version

0.3.0

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows