OFF (Proximal) Newton-type Methods with Inexact Derivatives for Unconstrained Optimization
Pith reviewed 2026-05-21 18:21 UTC · model grok-4.3
The pith
Objective-function-free Newton methods achieve the same convergence rates with inexact derivatives
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Inexact gradients and Hessians suffice for OFF proximal Newton and regularized Newton methods to preserve global and local convergence rates identical to exact versions; for finite-sum problems adaptive sampling produces locally superlinear and quadratic convergence in expectation, while the OFF regularized Newton and negative curvature algorithm attains optimal worst-case iteration and sample complexity for locating approximate second-order stationary points.
What carries the argument
The objective-function-free (OFF) framework that substitutes verifiable inexact gradients and Hessians for exact function and derivative values, supported by lazy gradient and adaptive sampling strategies.
If this is right
- Global convergence rates remain unchanged from the exact-derivative case.
- Local convergence is superlinear or quadratic, matching the exact setting.
- Adaptive sampling removes the sample-size versus accuracy circularity for finite-sum problems.
- Worst-case complexity for approximate second-order points stays optimal.
Where Pith is reading between the lines
- The approach may lower computational cost for large-scale problems where full derivative evaluations are prohibitive.
- Similar inexactness tolerances could be tested in stochastic gradient settings beyond finite sums.
- Integration with existing solvers might allow direct replacement of exact Newton steps in practice.
Load-bearing premise
Inexact gradients and Hessians can be made to satisfy practical accuracy criteria without creating circular dependencies on sample sizes.
What would settle it
An instance of a nonconvex finite-sum problem where the adaptive sampling strategy produces gradient estimates that violate the accuracy criteria, resulting in loss of the claimed convergence rate.
read the original abstract
In this paper, we propose objective-function-free (OFF) variants of the proximal Newton method for nonconvex composite optimization problems and the regularized Newton method for unconstrained optimization problems, respectively, using inexact gradients and Hessians. Theoretical analyses verify that the global and local convergence rates of the proposed algorithms remain consistent with those attained under exact evaluations of the objective function and derivatives. To validate the practical applicability of the theoretical assumptions, a lazy gradient strategy is adopted to provide a verifiable scheme for satisfying the accuracy criteria of approximate gradients. For finite-sum optimization problems, an adaptive sampling strategy is further developed to eliminate the circular dependency between sample size and gradient estimation. The proposed algorithm is proven to achieve locally superlinear and quadratic convergence in expectation. Furthermore, we present an OFF regularized Newton and negative curvature algorithm, which uses inexact derivatives to locate approximate second-order stationary points in unconstrained optimization. The worst-case iteration and (sample) operation complexity of the developed algorithm is consistent with the optimal results in the existing literature.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes objective-function-free (OFF) variants of the proximal Newton method for nonconvex composite optimization and the regularized Newton method for unconstrained optimization, both using inexact gradients and Hessians. Theoretical analyses establish that global and local convergence rates match those achieved with exact derivatives. A lazy gradient strategy provides a verifiable scheme for satisfying the accuracy criteria, while an adaptive sampling strategy for finite-sum problems removes the circular dependency between sample size and gradient estimation error. The algorithms achieve locally superlinear and quadratic convergence in expectation. An additional OFF regularized Newton method incorporating negative curvature is developed to find approximate second-order stationary points, with worst-case iteration and sample complexity matching optimal bounds from the literature.
Significance. If the results hold, the work is significant for extending Newton-type methods to inexact derivative settings while preserving strong theoretical guarantees. The verifiable accuracy conditions via the lazy gradient strategy and the adaptive sampling rule that eliminates circular dependence between sample size and error tolerance are practical strengths, enabling application to large-scale and stochastic problems. The matching of optimal complexities for locating second-order stationary points further enhances relevance in nonconvex optimization.
minor comments (2)
- [Abstract] Abstract: the claim that theoretical analyses 'verify' the rates would be strengthened by a one-sentence outline of how error terms from inexact derivatives are controlled in the global convergence argument.
- The notation for the accuracy criteria (e.g., the specific bounds on gradient and Hessian errors) could be introduced earlier with a brief comparison to standard inexact Newton conditions to improve readability for readers outside the immediate subfield.
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of our manuscript and for recommending minor revision. We appreciate the acknowledgment of the significance of extending Newton-type methods to inexact derivative settings while preserving convergence guarantees, as well as the practical value of the lazy gradient strategy and adaptive sampling approach.
Circularity Check
No significant circularity; derivations extend standard Newton analysis under independently verifiable inexactness conditions
full rationale
The paper presents OFF proximal Newton and regularized Newton methods with inexact gradients/Hessians, proving that global/local convergence rates match the exact-derivative case under explicit accuracy criteria. The lazy gradient strategy and adaptive sampling rule are constructed using only current gradient norms and variance bounds to meet these criteria without circular dependence between sample size and estimation error. No load-bearing steps reduce by definition or self-citation to the target rates; the analysis relies on standard Newton convergence frameworks augmented by verifiable tolerances. The worst-case complexities are shown consistent with known optimal bounds from the literature, without renaming or smuggling ansatzes. This is a self-contained extension with no evidence of the enumerated circular patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The objective function satisfies standard twice-continuous differentiability and Lipschitz continuity assumptions on gradients and Hessians.
- ad hoc to paper Inexact derivatives can be made to satisfy explicit accuracy criteria that are independent of the current iterate in a verifiable way.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theoretical analyses verify that the global and local convergence rates of the proposed algorithms remain consistent with those attained under exact evaluations... worst-case iteration and sample complexity... O(ε^{-3/2})
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanabsolute_floor_iff_bare_distinguishability unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
lazy gradient strategy... adaptive sampling... eliminate the circular dependency between sample size and gradient estimation
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, A. Sekhari, and K. Sridharan. Second-order information in non-convex stochastic optimization: power and limitations. InConference on Learning Theory, pages 242–299. PMLR, 2020. A. Beck.First-Order Methods in Optimization. Society for Industrial and Applied Mathematics,
work page 2020
-
[2]
Beck,First-order methods in optimization,MOS-SIAM Ser
Philadelphia, PA, 2017. doi: 10.1137/1.9781611974997. S. Bellavia, G. Gurioli, B. Morini, and P. L. Toint. Adaptive regularization for nonconvex optimization using inexact function values and randomly perturbed derivatives.Journal of Complexity, 68:101591, 2022a. S. Bellavia, G. Gurioli, B. Morini, and P. L. Toint. Trust-region algorithms: Probabilistic c...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.