and XI, Y

SCOTT, M · 2025 · math.NA · arXiv 2511.19716

2 Pith papers cite this work. Polarity classification is still indexing.



abstract

Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix $\mathbf{M}$, deriving bounds in which both the convergence rate and the stochastic noise floor are governed by $\mathbf{M}$-dependent quantities: the rate through an effective condition number in the $\mathbf{M}$-metric, and the floor through the product of that condition number and the preconditioned noise level. For nonconvex objectives, we establish a preconditioner-dependent basin-stability guarantee: when smoothness and basin size are measured in the $\mathbf{M}$-norm, the probability that the iterates remain in a well-behaved local region admits an explicit lower bound. This perspective is particularly relevant in Scientific Machine Learning (SciML), where achieving small training loss under stochastic updates is closely tied to physical fidelity, numerical stability, and constraint satisfaction. The framework applies to both diagonal/adaptive and curvature-aware preconditioners and yields a simple design principle: choose $\mathbf{M}$ to improve local conditioning while attenuating noise. Experiments on a quadratic diagnostic and three SciML benchmarks validate the predicted rate-floor behavior.
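The rate-floor behavior described in the abstract can be illustrated with a minimal sketch on a two-dimensional quadratic. Everything here is an assumption made for illustration, not the paper's experimental setup: the function name `preconditioned_sgd`, the objective $f(x) = \tfrac12 x^\top A x$, the isotropic Gaussian gradient noise, the step sizes, and the choice $\mathbf{M} = \mathrm{diag}(A)$. The update is $x \leftarrow x - \eta\,\mathbf{M}^{-1}(Ax + \xi)$.

```python
import numpy as np

def preconditioned_sgd(A, M_inv, x0, step, noise_std, n_steps, seed=0):
    """Run x <- x - step * M_inv @ (A @ x + noise) and return the loss history.

    The loss is f(x) = 0.5 * x^T A x; the noise is isotropic Gaussian
    (an illustrative assumption, not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    losses = np.empty(n_steps)
    for k in range(n_steps):
        grad = A @ x + noise_std * rng.standard_normal(x.shape)
        x = x - step * (M_inv @ grad)
        losses[k] = 0.5 * x @ A @ x
    return losses

# Ill-conditioned quadratic: curvature 100 in one direction, 1 in the other.
A = np.diag([100.0, 1.0])
x0 = np.array([1.0, 1.0])

# Plain SGD (M = I): the step is limited by the largest curvature.
plain = preconditioned_sgd(A, np.eye(2), x0, step=0.01,
                           noise_std=0.1, n_steps=500)

# Diagonal preconditioner M = diag(A): M^{-1} A = I, so a larger step is stable.
M_inv = np.diag(1.0 / np.diag(A))
precond = preconditioned_sgd(A, M_inv, x0, step=0.5,
                             noise_std=0.1, n_steps=500)

# Early loss reflects the rate; the tail average estimates the noise floor.
print("loss after 20 steps  plain:", plain[19], " preconditioned:", precond[19])
print("tail floor           plain:", plain[-100:].mean(),
      " preconditioned:", precond[-100:].mean())
```

In this toy setting the preconditioned run contracts much faster in the early phase, while its noise floor depends on how $\mathbf{M}^{-1}$ and the step size rescale the gradient noise. That is why the two quantities need to be considered together when choosing $\mathbf{M}$.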

representative citing papers

When Does Dynamic Preconditioning Preserve the Polyak-Ruppert CLT? A Stabilization Threshold

math.ST · 2026-04-26 · unverdicted · novelty 7.0

Dynamic preconditioning preserves the Polyak-Ruppert CLT for averaged SGD if the preconditioner stabilizes at rate β > (α + 1)/2.

On subspace-constrained preconditioning for randomized iterative methods

math.NA · 2026-05-28 · unverdicted · novelty 5.0

Refines subspace preconditioning for randomized linear solvers via QR-like factorization, enabling implicit use and proving expected linear convergence while reducing to a smaller system with good singular values.

citing papers explorer

Showing 2 of 2 citing papers.

When Does Dynamic Preconditioning Preserve the Polyak-Ruppert CLT? A Stabilization Threshold

math.ST · 2026-04-26 · unverdicted · none · ref 35 · internal anchor

On subspace-constrained preconditioning for randomized iterative methods

math.NA · 2026-05-28 · unverdicted · none · ref 68 · internal anchor
