OFF (Proximal) Newton-type Methods with Inexact Derivatives for Unconstrained Optimization

Hong Zhu

arxiv: 2512.06825 · v2 · pith:RK37UI7Cnew · submitted 2025-12-07 · 🧮 math.OC

OFF (Proximal) Newton-type Methods with Inexact Derivatives for Unconstrained Optimization

Hong Zhu This is my paper

Pith reviewed 2026-05-21 18:21 UTC · model grok-4.3

classification 🧮 math.OC

keywords objective-function-free methodsproximal Newtoninexact derivativesnonconvex optimizationadaptive samplingsecond-order stationary points

0 comments

The pith

Objective-function-free Newton methods achieve the same convergence rates with inexact derivatives

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops objective-function-free variants of proximal Newton methods for nonconvex composite problems and regularized Newton methods for unconstrained optimization that rely on inexact gradients and Hessians rather than exact evaluations. Theoretical analysis shows these methods retain the global convergence and local superlinear or quadratic rates of their exact counterparts when the approximations meet verifiable accuracy conditions. For finite-sum problems a lazy gradient strategy and adaptive sampling eliminate circular dependencies between sample size and estimation error, yielding convergence in expectation. The work further introduces an OFF regularized Newton and negative curvature algorithm whose worst-case iteration and sample complexity for approximate second-order stationary points matches optimal bounds in the literature.

Core claim

Inexact gradients and Hessians suffice for OFF proximal Newton and regularized Newton methods to preserve global and local convergence rates identical to exact versions; for finite-sum problems adaptive sampling produces locally superlinear and quadratic convergence in expectation, while the OFF regularized Newton and negative curvature algorithm attains optimal worst-case iteration and sample complexity for locating approximate second-order stationary points.

What carries the argument

The objective-function-free (OFF) framework that substitutes verifiable inexact gradients and Hessians for exact function and derivative values, supported by lazy gradient and adaptive sampling strategies.

If this is right

Global convergence rates remain unchanged from the exact-derivative case.
Local convergence is superlinear or quadratic, matching the exact setting.
Adaptive sampling removes the sample-size versus accuracy circularity for finite-sum problems.
Worst-case complexity for approximate second-order points stays optimal.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may lower computational cost for large-scale problems where full derivative evaluations are prohibitive.
Similar inexactness tolerances could be tested in stochastic gradient settings beyond finite sums.
Integration with existing solvers might allow direct replacement of exact Newton steps in practice.

Load-bearing premise

Inexact gradients and Hessians can be made to satisfy practical accuracy criteria without creating circular dependencies on sample sizes.

What would settle it

An instance of a nonconvex finite-sum problem where the adaptive sampling strategy produces gradient estimates that violate the accuracy criteria, resulting in loss of the claimed convergence rate.

read the original abstract

In this paper, we propose objective-function-free (OFF) variants of the proximal Newton method for nonconvex composite optimization problems and the regularized Newton method for unconstrained optimization problems, respectively, using inexact gradients and Hessians. Theoretical analyses verify that the global and local convergence rates of the proposed algorithms remain consistent with those attained under exact evaluations of the objective function and derivatives. To validate the practical applicability of the theoretical assumptions, a lazy gradient strategy is adopted to provide a verifiable scheme for satisfying the accuracy criteria of approximate gradients. For finite-sum optimization problems, an adaptive sampling strategy is further developed to eliminate the circular dependency between sample size and gradient estimation. The proposed algorithm is proven to achieve locally superlinear and quadratic convergence in expectation. Furthermore, we present an OFF regularized Newton and negative curvature algorithm, which uses inexact derivatives to locate approximate second-order stationary points in unconstrained optimization. The worst-case iteration and (sample) operation complexity of the developed algorithm is consistent with the optimal results in the existing literature.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes objective-function-free (OFF) variants of the proximal Newton method for nonconvex composite optimization and the regularized Newton method for unconstrained optimization, both using inexact gradients and Hessians. Theoretical analyses establish that global and local convergence rates match those achieved with exact derivatives. A lazy gradient strategy provides a verifiable scheme for satisfying the accuracy criteria, while an adaptive sampling strategy for finite-sum problems removes the circular dependency between sample size and gradient estimation error. The algorithms achieve locally superlinear and quadratic convergence in expectation. An additional OFF regularized Newton method incorporating negative curvature is developed to find approximate second-order stationary points, with worst-case iteration and sample complexity matching optimal bounds from the literature.

Significance. If the results hold, the work is significant for extending Newton-type methods to inexact derivative settings while preserving strong theoretical guarantees. The verifiable accuracy conditions via the lazy gradient strategy and the adaptive sampling rule that eliminates circular dependence between sample size and error tolerance are practical strengths, enabling application to large-scale and stochastic problems. The matching of optimal complexities for locating second-order stationary points further enhances relevance in nonconvex optimization.

minor comments (2)

[Abstract] Abstract: the claim that theoretical analyses 'verify' the rates would be strengthened by a one-sentence outline of how error terms from inexact derivatives are controlled in the global convergence argument.
The notation for the accuracy criteria (e.g., the specific bounds on gradient and Hessian errors) could be introduced earlier with a brief comparison to standard inexact Newton conditions to improve readability for readers outside the immediate subfield.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive evaluation of our manuscript and for recommending minor revision. We appreciate the acknowledgment of the significance of extending Newton-type methods to inexact derivative settings while preserving convergence guarantees, as well as the practical value of the lazy gradient strategy and adaptive sampling approach.

Circularity Check

0 steps flagged

No significant circularity; derivations extend standard Newton analysis under independently verifiable inexactness conditions

full rationale

The paper presents OFF proximal Newton and regularized Newton methods with inexact gradients/Hessians, proving that global/local convergence rates match the exact-derivative case under explicit accuracy criteria. The lazy gradient strategy and adaptive sampling rule are constructed using only current gradient norms and variance bounds to meet these criteria without circular dependence between sample size and estimation error. No load-bearing steps reduce by definition or self-citation to the target rates; the analysis relies on standard Newton convergence frameworks augmented by verifiable tolerances. The worst-case complexities are shown consistent with known optimal bounds from the literature, without renaming or smuggling ansatzes. This is a self-contained extension with no evidence of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard smoothness and boundedness assumptions typical for Newton convergence proofs plus new accuracy conditions on inexact derivatives that the lazy and adaptive strategies are meant to satisfy.

axioms (2)

domain assumption The objective function satisfies standard twice-continuous differentiability and Lipschitz continuity assumptions on gradients and Hessians.
These are the background conditions required for any Newton-type convergence analysis in nonconvex settings.
ad hoc to paper Inexact derivatives can be made to satisfy explicit accuracy criteria that are independent of the current iterate in a verifiable way.
The paper introduces and relies on these accuracy criteria to prove that convergence rates match the exact case.

pith-pipeline@v0.9.0 · 5699 in / 1365 out tokens · 61455 ms · 2026-05-21T18:21:32.184801+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theoretical analyses verify that the global and local convergence rates of the proposed algorithms remain consistent with those attained under exact evaluations... worst-case iteration and sample complexity... O(ε^{-3/2})
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

lazy gradient strategy... adaptive sampling... eliminate the circular dependency between sample size and gradient estimation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages

[1]

Arjevani, Y

Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, A. Sekhari, and K. Sridharan. Second-order information in non-convex stochastic optimization: power and limitations. InConference on Learning Theory, pages 242–299. PMLR, 2020. A. Beck.First-Order Methods in Optimization. Society for Industrial and Applied Mathematics,

work page 2020
[2]

Beck,First-order methods in optimization,MOS-SIAM Ser

Philadelphia, PA, 2017. doi: 10.1137/1.9781611974997. S. Bellavia, G. Gurioli, B. Morini, and P. L. Toint. Adaptive regularization for nonconvex optimization using inexact function values and randomly perturbed derivatives.Journal of Complexity, 68:101591, 2022a. S. Bellavia, G. Gurioli, B. Morini, and P. L. Toint. Trust-region algorithms: Probabilistic c...

work page doi:10.1137/1.9781611974997 2017

[1] [1]

Arjevani, Y

Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, A. Sekhari, and K. Sridharan. Second-order information in non-convex stochastic optimization: power and limitations. InConference on Learning Theory, pages 242–299. PMLR, 2020. A. Beck.First-Order Methods in Optimization. Society for Industrial and Applied Mathematics,

work page 2020

[2] [2]

Beck,First-order methods in optimization,MOS-SIAM Ser

Philadelphia, PA, 2017. doi: 10.1137/1.9781611974997. S. Bellavia, G. Gurioli, B. Morini, and P. L. Toint. Adaptive regularization for nonconvex optimization using inexact function values and randomly perturbed derivatives.Journal of Complexity, 68:101591, 2022a. S. Bellavia, G. Gurioli, B. Morini, and P. L. Toint. Trust-region algorithms: Probabilistic c...

work page doi:10.1137/1.9781611974997 2017