arXiv: 2604.07972 · v1 · submitted 2026-04-09 · 🧮 math.OC · math.DG · math.DS

Smooth, globally Polyak-Łojasiewicz functions are nonlinear least-squares

Pith reviewed 2026-05-10 17:54 UTC · model grok-4.3

classification 🧮 math.OC · math.DG · math.DS
keywords Polyak-Łojasiewicz condition · nonlinear least squares · Riemannian optimization · gradient flow · submersion · minimizer geometry · fiber bundle

The pith

A smooth globally PL function on a contractible manifold must be a nonlinear sum of squares.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that the global Polyak-Łojasiewicz inequality, combined with infinite differentiability, rigidly determines the form of a function on any contractible manifold such as Euclidean space. Under these conditions the function equals its minimum value plus the squared Euclidean norm of a submersion whose image dimension equals the codimension of the minimizer set. This structure is obtained by showing that the endpoint map of negative gradient flow is a trivial smooth fiber bundle over the minimizers. The result implies that the geometry of the minimizer set is highly constrained: it is either diffeomorphic to Euclidean space or must carry exotic topology.

Core claim

If f is C^∞ smooth and the manifold M is contractible, then global PL implies that f(x) equals f* plus the squared norm of a submersion φ from M into Euclidean space of dimension equal to the codimension of the minimizer set S. The proof proceeds by establishing that the endpoint map of the negative gradient flow is a trivial smooth fiber bundle over S.
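
As a sanity check of the stated form, consider the function $f(x,y) = (y - x^2)^2$ on $\mathbb{R}^2$ with the Euclidean metric. This is an illustrative example chosen here, not one taken from the paper. A short computation suggests it fits the template with $k = 1$:

```latex
\begin{align*}
f(x,y) &= \varphi(x,y)^2, \qquad \varphi(x,y) = y - x^2, \qquad f^* = 0, \\
D\varphi(x,y) &= \begin{pmatrix} -2x & 1 \end{pmatrix} \neq 0
  \quad\Longrightarrow\quad \varphi \colon \mathbb{R}^2 \to \mathbb{R} \text{ is a submersion}, \\
\nabla f(x,y) &= 2\,\varphi(x,y)\, D\varphi(x,y)^\top, \\
\|\nabla f(x,y)\|^2 &= 4\,(1 + 4x^2)\,\varphi(x,y)^2 \;\geq\; 4\,\bigl(f(x,y) - f^*\bigr)
  = 2\mu\,\bigl(f(x,y) - f^*\bigr) \quad\text{with } \mu = 2, \\
S &= \{(x,y) : y = x^2\} \quad\text{(a parabola, diffeomorphic to } \mathbb{R}\text{, of codimension } 1\text{)}.
\end{align*}
```

In this example the minimizer set is a smooth curve and the number of residuals equals its codimension, consistent with the core claim.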

What carries the argument

The endpoint map of negative gradient flow, which is shown to be a trivial smooth fiber bundle over the minimizer set S.
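
To make the endpoint map concrete, here is a minimal numerical sketch. It assumes the illustrative function f(x, y) = (y - x^2)^2 on R^2, which is an example chosen here and not one from the paper. The names `grad_f` and `endpoint_map` are hypothetical, and explicit Euler with a fixed step and a finite time horizon only approximates the true flow and its limit.

```python
def f(x, y):
    """Illustrative PL function (assumed example): f = phi^2 with phi = y - x^2."""
    phi = y - x * x
    return phi * phi


def grad_f(x, y):
    """Euclidean gradient of f, namely 2 * phi * (-2x, 1)."""
    phi = y - x * x
    return (-4.0 * x * phi, 2.0 * phi)


def endpoint_map(x, y, step=1e-3, horizon=10.0):
    """Approximate pi(x, y), the limit of the negative gradient flow.

    Explicit Euler up to the finite time `horizon`; the step size and the
    horizon are assumptions of this sketch, not part of the paper.
    """
    n_steps = int(round(horizon / step))
    for _ in range(n_steps):
        gx, gy = grad_f(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


if __name__ == "__main__":
    for start in [(1.0, 3.0), (0.5, -1.0), (-2.0, 4.0)]:
        px, py = endpoint_map(*start)
        # The endpoint should lie (numerically) on S = {y = x^2}.
        print(start, "->", (px, py), "residual:", py - px * px)
```

Points that start on the parabola y = x^2 are left fixed, and other starting points are carried onto it, which is the behavior one expects from a map M → S whose fibers are the sets of points flowing to a common minimizer.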

If this is right

  • The minimizer set S must be a smooth submanifold of M.
  • Either S is diffeomorphic to Euclidean space, in which case a smooth change of coordinates turns f into a convex quadratic, or S must have exotic topology such as that of the Whitehead manifold.
  • There exists a complete Riemannian metric on M under which f remains PL and is geodesically convex.
  • The PL condition forces f to be a nonlinear least-squares problem whose residuals are given by the submersion φ.
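
The convex-quadratic branch of the dichotomy can be seen on the illustrative function $f(x,y) = (y - x^2)^2$ (an example assumed here, not drawn from the paper), whose minimizer set is a parabola diffeomorphic to $\mathbb{R}$. The map $\Psi$ below is a hypothetical choice of coordinates for this one example:

```latex
\begin{align*}
\Psi(x, y) &= (u, v) = (x,\; y - x^2), \qquad \Psi^{-1}(u, v) = (u,\; v + u^2), \\
(f \circ \Psi^{-1})(u, v) &= \bigl((v + u^2) - u^2\bigr)^2 = v^2 .
\end{align*}
```

Both $\Psi$ and its inverse are polynomial, so $\Psi$ is a diffeomorphism of $\mathbb{R}^2$, and in the $(u, v)$ coordinates $f$ becomes the convex quadratic $v^2$ with minimizer set $\{v = 0\}$.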

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Many optimization problems that satisfy PL may admit a hidden sum-of-squares representation after a suitable smooth reparametrization.
  • The possible topologies of minimizer sets for global PL functions are limited on contractible domains, excluding many common manifolds unless the ambient space is non-contractible.
  • Gradient-based methods on PL functions may be implicitly following the fibers of this submersion, which could explain observed fast convergence rates.
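
The role of contractibility in the second bullet can be illustrated with a hand-picked example that is not from the paper: on the punctured plane, which is not contractible, a function satisfying the PL inequality pointwise can have a circle as its minimizer set. Whether this example meets every standing assumption of the paper (for instance, completeness of the metric) is not checked here.

```latex
\begin{align*}
M &= \mathbb{R}^2 \setminus \{0\}, \qquad f(x) = (\|x\| - 1)^2, \qquad f^* = 0, \\
\nabla f(x) &= 2\,(\|x\| - 1)\,\frac{x}{\|x\|}, \qquad
\|\nabla f(x)\|^2 = 4\,(\|x\| - 1)^2 = 2\mu\,\bigl(f(x) - f^*\bigr) \quad\text{with } \mu = 2, \\
S &= \{x : \|x\| = 1\} \cong S^1 \quad\text{(compact, not contractible)}.
\end{align*}
```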

Load-bearing premise

The manifold is contractible and the function is infinitely differentiable, allowing the gradient flow endpoint map to be a trivial fiber bundle.

What would settle it

A C^∞ function on R^n that satisfies the global PL inequality everywhere but whose minimizer set is not the base of a smooth fiber bundle given by gradient flow trajectories.

Figures

Figures reproduced from arXiv: 2604.07972 by Christopher Criscitiello, Nicolas Boumal, Quentin Rebjock.

Figure 1. Background colors indicate level sets of some function …
Figure 2. Illustration of the proof for Theorem 4.6. Background colors indicate level sets of f(x) = 1/2 …

Original abstract

The Polyak-Łojasiewicz (PŁ) condition is often invoked in nonconvex optimization because it allows fast convergence of algorithms beyond strong convexity. A function $f \colon \mathcal{M} \to \mathbb{R}$ on a Riemannian manifold $\mathcal{M}$ is globally PŁ if $\|\nabla f(x)\|^2 \geq 2\mu(f(x) - f^*)$ for all $x$, where $f^* = \inf f$ and $\mu > 0$. How much does this pointwise, first-order inequality constrain $f$ and its set of minimizers $S$? We show that if $f$ is also smooth ($C^\infty$) and $\mathcal{M}$ is contractible (e.g., if $\mathcal{M} = \mathbb{R}^n$), then the PŁ condition imposes a firm global structure: such a function is necessarily of the form $f(x) = f^* + \|\varphi(x)\|^2$ (a nonlinear sum of squares) where $\varphi \colon \mathcal{M} \to \mathbb{R}^k$ is a submersion, and $k$ is the codimension of $S$ in $\mathcal{M}$. The proof hinges on showing that the end-point map of negative gradient flow on $f$ is a trivial smooth fiber bundle over $S$. This rigidity leads to a striking dichotomy. Either $S$ is diffeomorphic to a Euclidean space, in which case $f$ can be transformed into a convex quadratic by a smooth change of coordinates. Or $S$ must display genuinely exotic geometry; for example, it can be diffeomorphic to the Whitehead manifold. As a further consequence, we show that there exists a complete Riemannian metric on $\mathcal{M}$ under which $f$ remains PŁ and becomes geodesically convex.
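
The pointwise inequality in the abstract is easy to probe numerically. The following is a minimal sketch of a sampling check, not a proof. The helper `pl_violations` and both test functions are hypothetical choices made here for illustration. The first function, (y - x^2)^2, satisfies the inequality with mu = 2 by a direct computation; the second, x^4 + y^2, is too flat along the x-axis near the origin to satisfy it for any mu > 0.

```python
import random


def pl_violations(f, grad, mu, f_star, points):
    """Return the sample points where ||grad f||^2 < 2 * mu * (f - f_star)."""
    bad = []
    for p in points:
        g = grad(*p)
        lhs = sum(gi * gi for gi in g)
        rhs = 2.0 * mu * (f(*p) - f_star)
        if lhs < rhs:
            bad.append(p)
    return bad


# Assumed example 1: f = (y - x^2)^2, which satisfies PL with mu = 2.
def f_pl(x, y):
    phi = y - x * x
    return phi * phi


def grad_pl(x, y):
    phi = y - x * x
    return (-4.0 * x * phi, 2.0 * phi)


# Assumed example 2: g = x^4 + y^2, which violates PL near the y-axis.
def g_flat(x, y):
    return x ** 4 + y * y


def grad_flat(x, y):
    return (4.0 * x ** 3, 2.0 * y)


rng = random.Random(0)  # fixed seed so the sample is reproducible
points = [(rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)) for _ in range(5000)]

if __name__ == "__main__":
    print("violations for f_pl:  ", len(pl_violations(f_pl, grad_pl, 2.0, 0.0, points)))
    print("violations for g_flat:", len(pl_violations(g_flat, grad_flat, 2.0, 0.0, points)))
```

A sampling check of this kind can only refute the inequality for a given mu; finding no violations on a finite sample is evidence, not a certificate.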

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that a C^∞ function f on a contractible Riemannian manifold M satisfying the global Polyak-Łojasiewicz (PL) inequality ||∇f||² ≥ 2μ(f - f*) must be a nonlinear least-squares function: f(x) = f* + ||φ(x)||² for a submersion φ: M → R^k (k = codim S), where S is the minimizer set. The proof proceeds by showing that the endpoint map of the negative gradient flow is a C^∞ trivial fiber bundle over S. Consequences include a dichotomy on the topology of S (either diffeomorphic to Euclidean space or exotic, e.g., Whitehead manifold) and the existence of a complete metric on M under which f is both PL and geodesically convex.

Significance. If the central structural claim holds, the result supplies a sharp differential-geometric characterization of globally PL functions, linking first-order optimization conditions to the geometry of fiber bundles and nonlinear least squares. This could inform landscape analysis in nonconvex optimization and the design of coordinate changes that convexify PL problems. The derivation is direct from the PL inequality plus smoothness and contractibility, without fitted parameters or circular reductions.

major comments (2)
  1. [§3] §3 (construction of the endpoint map π): the claim that π(x) := lim_{t→∞} φ_t(x) is C^∞ (and a submersion) requires justifying that the t→∞ limit commutes with all higher derivatives D^k for k≥2. The PL decay f(φ_t(x)) - f* ≤ (f(x) - f*) e^{-2μt} controls C^0 and C^1 behavior via ||∇f||, but supplies no uniform bound on ||D^k φ_t|| that would permit interchanging lim and differentiation. A detailed a-priori estimate or invocation of a specific theorem on smoothness of infinite-time flows is needed; without it the identification f = f* + ||φ||² and the fiber-bundle conclusion are not yet secured.
  2. [Theorem 4.1] Theorem 4.1 (dichotomy for S): the statement that S is either diffeomorphic to R^m or must be exotic (e.g., Whitehead manifold) rests on the same endpoint map being a smooth trivial bundle. If the smoothness of π is only C^1, the topological conclusions weaken and the claim that S cannot be, say, a compact manifold without boundary becomes conditional on an unproven regularity step.
minor comments (2)
  1. [§2] Notation: the codimension k is introduced as the dimension of the target of φ, but the relation k = codim_M S is stated only in the abstract; add an explicit sentence in §2 relating the rank of Dφ to the dimension of the normal bundle of S.
  2. [§5] The final metric-construction result (existence of a complete metric making f geodesically convex while preserving PL) is stated without a numbered theorem or equation reference; label it as Theorem 5.3 and give the explicit conformal factor or warping function used.
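
The decay estimate referred to in the first major comment is a standard consequence of the PL inequality. A sketch, writing $x(t)$ for the negative gradient flow started at $x$ and $h(t) = f(x(t)) - f^*$, assuming the flow exists for all $t \geq 0$ and working where $h(t) > 0$:

```latex
\begin{align*}
\dot h(t) &= \langle \nabla f(x(t)), \dot x(t) \rangle = -\|\nabla f(x(t))\|^2 \le -2\mu\, h(t)
  &&\Longrightarrow\quad h(t) \le h(0)\, e^{-2\mu t}, \\
\frac{d}{dt}\sqrt{h(t)} &= -\frac{\|\nabla f(x(t))\|^2}{2\sqrt{h(t)}}
  \le -\sqrt{\tfrac{\mu}{2}}\;\|\nabla f(x(t))\|
  &&\Longrightarrow\quad \int_0^\infty \|\dot x(t)\|\, dt \le \sqrt{\tfrac{2}{\mu}}\;\sqrt{h(0)}.
\end{align*}
```

The second line bounds the length of every trajectory, which is what makes the limit point well defined when the metric is complete. Neither estimate by itself controls higher derivatives of the flow map, which is the gap the first major comment points to.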

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive report. The comments correctly identify that the current exposition in §3 leaves the higher-order regularity of the endpoint map insufficiently justified. We will revise the manuscript to supply the missing a-priori estimates, thereby securing both the C^∞ structure and the subsequent topological conclusions.

Point-by-point responses
  1. Referee: [§3] §3 (construction of the endpoint map π): the claim that π(x) := lim_{t→∞} φ_t(x) is C^∞ (and a submersion) requires justifying that the t→∞ limit commutes with all higher derivatives D^k for k≥2. The PL decay f(φ_t(x)) - f* ≤ (f(x) - f*) e^{-2μt} controls C^0 and C^1 behavior via ||∇f||, but supplies no uniform bound on ||D^k φ_t|| that would permit interchanging lim and differentiation. A detailed a-priori estimate or invocation of a specific theorem on smoothness of infinite-time flows is needed; without it the identification f = f* + ||φ||² and the fiber-bundle conclusion are not yet secured.

    Authors: We agree that the present argument does not explicitly bound the higher derivatives of the flow. In the revised manuscript we will insert a new technical lemma (Lemma 3.4) that derives uniform-in-t bounds on ||D^k φ_t|| for every k by induction on k. The base cases k=0,1 follow directly from the global PL inequality and the exponential decay of f(φ_t) - f*. For the inductive step we differentiate the ODE dφ_t/dt = -∇f(φ_t) repeatedly, apply Faà di Bruno’s formula, and close the resulting differential inequality for the k-th derivative via a Gronwall estimate that exploits the exponential decay of the lower-order terms. The resulting bound is independent of t, so the limit π inherits C^∞ regularity. Surjectivity of dπ (hence the submersion property) follows from the fact that the fibers are the stable manifolds of the gradient flow, which are transverse to the level sets of f. With this lemma in place the identification f = f* + ||φ||² and the trivial-bundle structure are fully justified.
    revision: yes

  2. Referee: [Theorem 4.1] Theorem 4.1 (dichotomy for S): the statement that S is either diffeomorphic to R^m or must be exotic (e.g., Whitehead manifold) rests on the same endpoint map being a smooth trivial bundle. If the smoothness of π is only C^1, the topological conclusions weaken and the claim that S cannot be, say, a compact manifold without boundary becomes conditional on an unproven regularity step.

    Authors: The proof of Theorem 4.1 invokes the smooth trivial-bundle theorem for the endpoint map π to deduce that S is either diffeomorphic to a Euclidean space or must carry exotic topology. Once the C^∞ regularity of π is established by the new lemma in §3, the topological dichotomy follows without further hypotheses. We will add a short clarifying paragraph after the statement of Theorem 4.1 that explicitly records this dependence and notes that the argument rules out compact S of positive dimension (or any non-contractible S) precisely because a smooth trivial bundle over such an S would contradict contractibility of M.
    revision: yes
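
The contractibility argument in this response is short enough to spell out. A sketch, assuming only that π: M → S is a trivial bundle with fiber $F$, so that $M$ is homeomorphic to $S \times F$, and writing $p_0$ for an arbitrarily chosen (hypothetical) base point of $F$:

```latex
\begin{align*}
&\iota \colon S \to S \times F, \quad \iota(s) = (s, p_0), \qquad
 \mathrm{pr}_1 \colon S \times F \to S, \quad \mathrm{pr}_1(s, p) = s, \\
&\mathrm{pr}_1 \circ \iota = \mathrm{id}_S
 \quad\text{and}\quad S \times F \simeq \ast
 \;\Longrightarrow\; \iota \simeq \mathrm{const}
 \;\Longrightarrow\; \mathrm{id}_S = \mathrm{pr}_1 \circ \iota \simeq \mathrm{const}.
\end{align*}
```

Hence $S$ is contractible whenever $M$ is. A compact manifold without boundary of positive dimension is never contractible, which is why such an $S$ is excluded.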

Circularity Check

0 steps flagged

No circularity: direct derivation from PL + smoothness + contractibility

full rationale

The paper's central claim follows from the global PL inequality ||∇f||² ≥ 2μ(f - f*) together with C^∞ smoothness and contractibility of M. It shows that the negative-gradient-flow endpoint map is a trivial smooth fiber bundle over S, yielding f = f* + ||φ||² with φ a submersion. This is a standard differential-geometric argument that does not presuppose the conclusion, fit parameters to data, rename known results, or rely on load-bearing self-citations. External background results from Riemannian geometry are invoked without circular reduction. The derivation chain is self-contained and non-tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper introduces no free parameters or invented entities. It relies only on standard domain assumptions from smooth manifold theory and the given definition of the global PL inequality.

axioms (2)
  • domain assumption: f is C^∞ smooth on the Riemannian manifold M
    Invoked to guarantee that the negative gradient flow is a smooth dynamical system whose end-point map is well-defined and smooth.
  • domain assumption: M is contractible
    Used to conclude that the fiber bundle formed by the gradient-flow end-point map is trivial.

pith-pipeline@v0.9.0 · 5662 in / 1679 out tokens · 49771 ms · 2026-05-10T17:54:25.608045+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    We show that if f is also smooth (C^∞) and M is contractible ... then the PŁ condition imposes a firm global structure: such a function is necessarily of the form f(x) = f* + ||φ(x)||² ... where φ : M → R^k is a submersion ... The proof hinges on showing that the end-point map of negative gradient flow on f is a trivial smooth fiber bundle over S.

  • IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    the end-point map π : M → S ... is a smooth submersion ... π is a trivial smooth fiber bundle ... f(y) = f* + ||φ(y)||²

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
