Adaptive Regularization using Cubics for Optimization.
arcopt
Robust nonlinear optimization for R
When optim() fails on your ill-conditioned model, arcopt is designed to succeed. It uses Adaptive Regularization with Cubics (ARC) to handle indefinite Hessians and escape saddle points automatically.
Installation
# Install from GitHub
pak::pak("marcus-waldman/arcopt")
# Or with devtools
devtools::install_github("marcus-waldman/arcopt")
Quick Start
library(arcopt)
# Rosenbrock function - a classic difficult optimization problem
result <- arcopt(
x0 = c(-1.2, 1),
fn = function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2,
gr = function(x) c(
-2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
200 * (x[2] - x[1]^2)
),
hess = function(x) matrix(c(
1200 * x[1]^2 - 400 * x[2] + 2, -400 * x[1],
-400 * x[1], 200
), 2, 2)
)
result$par
#> [1] 1 1
Two Modes: Exact Hessian vs Quasi-Newton
arcopt offers two optimization strategies:
Mode 1: Exact Hessian (Default)
Best when you can compute the Hessian analytically or via automatic differentiation.
# Provide fn, gr, and hess
result <- arcopt(x0, fn, gr, hess)
Mode 2: Quasi-Newton (No Hessian Required)
Best when computing the Hessian is expensive or unavailable. Uses BFGS/SR1 approximations. arcopt seeds the initial approximation $B_0$ with a one-time finite-difference Hessian computed from the supplied gradient (cost: $2n$ gradient evaluations at startup), which dramatically improves saddle-escape behavior compared to the identity-initialization convention used in classical BFGS/SR1.
# Just provide fn and gr - no Hessian needed
result <- arcopt(x0, fn, gr, control = list(use_qn = TRUE))
Which Mode Should I Use?
| Scenario | Recommendation |
|---|---|
| Analytic Hessian available | Use exact Hessian (default) |
| Hessian via autodiff (torch, Stan) | Use exact Hessian |
| Hessian expensive (n > 100) | Use use_qn = TRUE |
| No Hessian function | Use use_qn = TRUE |
Benchmark Results
Tested on 8 standard functions with 10 starting points each (80 test cases):
| Method | Convergence | Avg Iterations | Hessian Evals |
|---|---|---|---|
| Exact Hessian | 100% | 22 | 1,443 |
| BFGS QN | 100% | 28 | 0 |
| SR1 QN | 99% | 61 | 0 |
BFGS Quasi-Newton achieves the same 100% convergence as exact Hessian with only 26% more iterations and zero Hessian evaluations.
On problems with symmetric saddle points (e.g., growth-mixture models, factor models with rotational symmetry), qn_method = "hybrid" combines SR1's ability to represent indefinite curvature with BFGS's fast local convergence and matches full-Hessian solvers' saddle-escape reliability using only gradient information.
Box Constraints
# Optimize with bounds: 0 <= x <= 0.5
result <- arcopt(
x0 = c(0.1, 0.1),
fn = fn, gr = gr, hess = hess,
lower = c(0, 0),
upper = c(0.5, 0.5)
)
Why ARC?
Standard optimizers struggle when the Hessian is indefinite or nearly singular—common in:
- Maximum likelihood with complex likelihoods
- Bayesian posterior modes (MAP estimation)
- Nonlinear mixed effects models
- Any model with poor identifiability
Cubic regularization automatically handles negative curvature without line searches or trust-region adjustments. It provably escapes saddle points and converges to local minima.
The Key Idea
Instead of the quadratic model used by Newton's method:
m(s) = f + g's + (1/2) s'Hs
ARC adds a cubic term:
m(s) = f + g's + (1/2) s'Hs + (σ/3)||s||³
This cubic term prevents unbounded steps when the Hessian has negative eigenvalues.
Adaptive Solver-Mode Dispatch
arcopt() automatically adapts its subproblem solver to the local landscape:
- Cubic (default). Uses adaptive cubic regularization with a Newton-first fast path. Handles saddles and indefinite Hessians naturally.
- Trust-region (enabled by default). Fires a one-way switch from cubic when four runtime signals indicate flat-ridge or stuck-indefinite-saddle stagnation. Replaces the cubic penalty with a hard step-norm constraint so the step can walk off a near-singular ridge.
- QN-polish (opt-in). Fires a bidirectional switch from cubic when the iterate enters the quadratic attraction basin of a strict minimum. Replaces the cubic subproblem with Wolfe line-search BFGS warmstarted from the current exact Hessian; skips further
hess()calls until convergence or revert, saving wall-clock time on expensive Hessians (Stan AD, finite differences).
result <- arcopt(x0, fn, gr, hess, control = list(
tr_fallback_enabled = TRUE, # cubic->TR on flat-ridge (default)
qn_polish_enabled = TRUE # cubic<->BFGS polish in basin (opt-in)
))
result$diagnostics$solver_mode_final # "cubic", "tr", or "qn_polish"
result$diagnostics$ridge_switches # count of cubic->TR transitions
result$diagnostics$qn_polish_switches # count of cubic->polish transitions
result$diagnostics$qn_polish_reverts # count of polish->cubic reversions
See design/design-principles.qmd §3a for the σ↔r duality framing and todo/tr-fallback-hybrid-briefing.md / todo/qn-polish-mode-briefing.md for the detector designs.
Common Options
arcopt(x0, fn, gr, hess,
control = list(
# Quasi-Newton mode (no exact Hessian needed)
use_qn = TRUE,
qn_method = "hybrid", # recommended for saddle-prone problems;
# also "bfgs", "sr1"
# Convergence tolerances
gtol_abs = 1e-5, # Gradient norm
maxit = 1000, # Max iterations
# Hybrid solver-mode dispatch
tr_fallback_enabled = TRUE, # cubic->TR on stagnation
qn_polish_enabled = FALSE, # cubic<->polish in quadratic basin
# Output
trace = 1, # Depth of saved trace data (0/1/2/3)
verbose = FALSE # Print one line per iteration to console
)
)
The full set of advanced controls (cubic regularization tuning, TR-fallback thresholds, polish-mode thresholds, QN routing) lives under ?arcopt_advanced_controls.
Comparison with optim()
| Feature | arcopt | optim(BFGS) | optim(L-BFGS-B) |
|---|---|---|---|
| Handles indefinite Hessian | ✓ | ✗ | ✗ |
| Escapes saddle points | ✓ | ✗ | ✗ |
| Box constraints | ✓ | ✗ | ✓ |
| Hessian-free mode | ✓ | ✓ | ✓ |
| Best for | Complex likelihoods | Smooth convex | Large bound-constrained |
When NOT to Use arcopt
arcopt is a local optimizer. Use something else for:
- Global optimization: Use
DEoptim,GenSA, ornloptrwith global algorithms - Derivative-free: Use
nloptrwith COBYLA ordfoptim - Very large scale (n > 10,000): Use
lbfgsor sparse methods - Linear/Quadratic programming: Use
lpSolve,quadprog
Best for: 2-500 parameters, nonconvex objectives, need robust convergence.
Return Value
result <- arcopt(x0, fn, gr, hess)
result$par # Optimal parameters
result$value # Optimal function value
result$gradient # Gradient at solution
result$hessian # Hessian at solution (exact in cubic/tr mode;
# BFGS approximation if solver_mode_final == "qn_polish")
result$converged # TRUE if converged
result$iterations # Number of iterations
result$evaluations # List: fn, gr, hess counts
result$message # Why it stopped
# Solver-mode diagnostics (nested sublist; most users do not inspect this).
# See ?arcopt_advanced_controls for the full schema.
result$diagnostics$solver_mode_final # "cubic", "tr", or "qn_polish"
result$diagnostics$ridge_switches # count of cubic->TR transitions
result$diagnostics$radius_final # final TR radius (NA if no switch)
result$diagnostics$qn_polish_switches # count of cubic->qn_polish transitions
result$diagnostics$qn_polish_reverts # count of qn_polish->cubic reversions
result$diagnostics$hess_evals_at_polish_switch # baseline for Hessian-eval savings
References
Adaptive Regularization with Cubics (ARC)
Cartis, C., Gould, N. I. M., & Toint, P. L. (2011). Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming, 127(2), 245-295.
Nesterov, Y., & Polyak, B. T. (2006). Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1), 177-205.
Quasi-Newton with Cubic Regularization
Kamzolov, D., Ziu, K., Agafonov, A., & Takáč, M. (2025). Accelerated Adaptive Cubic Regularized Quasi-Newton Methods. Journal of Optimization Theory and Applications, 208. https://doi.org/10.1007/s10957-025-02804-3
Kamzolov, D., Ziu, K., Agafonov, A., & Takáč, M. (2023). Cubic Regularization is the Key! The First Accelerated Quasi-Newton Method with a Global Convergence Rate of O(k^-2) for Convex Functions. arXiv:2302.04987.
Benson, H. Y., & Shanno, D. F. (2018). Cubic regularization in symmetric rank-1 quasi-Newton methods. Mathematical Programming Computation, 10(4), 457-494.
License
MIT © 2025 Marcus Waldman
Contributing
Issues and PRs welcome at: https://github.com/marcus-waldman/arcopt/issues.