Smooth, globally Polyak-Łojasiewicz functions are nonlinear least-squares
Pith reviewed 2026-05-10 17:54 UTC · model grok-4.3
The pith
A smooth globally PL function on a contractible manifold must be a nonlinear sum of squares.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
If f is C^∞ smooth and the manifold M is contractible, then global PL implies that f(x) equals f* plus the squared norm of a submersion φ from M into Euclidean space of dimension equal to the codimension of the minimizer set S. The proof proceeds by establishing that the endpoint map of the negative gradient flow is a trivial smooth fiber bundle over S.
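To make the statement concrete, here is a minimal numerical sketch on a toy function that is not taken from the paper: f(x, y) = (y - x²)² on R², for which f* = 0, φ(x, y) = y - x², k = 1, and S is the parabola y = x². The constant MU = 2 is worked out by hand for this example only (||∇f||² = 4 f (1 + 4x²) ≥ 4 f); all function names are illustrative.

```python
import numpy as np

# Toy example (an assumption for illustration, not from the paper):
# f(x, y) = (y - x^2)^2, so f* = 0 and phi(x, y) = y - x^2.

def phi(p):
    x, y = p
    return y - x**2

def grad_phi(p):
    x, _ = p
    return np.array([-2.0 * x, 1.0])

def f(p):
    return phi(p) ** 2

def grad_f(p):
    # Chain rule: grad f = 2 * phi * grad phi.
    return 2.0 * phi(p) * grad_phi(p)

MU = 2.0  # hand-derived PL constant for this toy example

def pl_gap(p, mu=MU):
    """Return ||grad f(p)||^2 - 2*mu*(f(p) - f*); nonnegative iff PL holds at p."""
    g = grad_f(p)
    return float(g @ g - 2.0 * mu * f(p))

rng = np.random.default_rng(0)
points = rng.uniform(-3.0, 3.0, size=(1000, 2))

# Smallest PL gap over the sample, and smallest norm of grad phi
# (grad phi never vanishing is what makes phi a submersion here).
min_gap = min(pl_gap(p) for p in points)
min_grad_phi_norm = min(np.linalg.norm(grad_phi(p)) for p in points)
print(min_gap >= -1e-9, min_grad_phi_norm >= 1.0)
```

A random sample cannot prove a global inequality; the check only illustrates the two properties the claim ties together, namely the PL inequality and a residual map φ whose derivative never loses rank.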
What carries the argument
The endpoint map of negative gradient flow, which is shown to be a trivial smooth fiber bundle over the minimizer set S.
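The endpoint map can be approximated numerically. The sketch below reuses the toy function f(x, y) = (y - x²)² (an assumption for illustration, not the paper's example) and replaces the continuous flow with explicit Euler steps, so it only approximates the true endpoint map π. The step size, horizon, and starting point are arbitrary choices.

```python
import numpy as np

def f(p):
    x, y = p
    return (y - x**2) ** 2

def grad_f(p):
    x, y = p
    r = y - x**2  # residual phi(x, y)
    return np.array([-4.0 * x * r, 2.0 * r])

STEP = 1e-3   # Euler step size (arbitrary choice)
MU = 2.0      # hand-derived PL constant for this toy function

def endpoint(p0, step=STEP, n_steps=10_000):
    """Approximate pi(p0), the limit of the negative gradient flow, by explicit Euler."""
    p = np.array(p0, dtype=float)
    values = [f(p)]
    for _ in range(n_steps):
        p = p - step * grad_f(p)
        values.append(f(p))
    return p, np.array(values)

p_inf, values = endpoint([1.5, -1.0])

# Compare the observed decay with the PL rate (f* = 0 here).
t = STEP * np.arange(len(values))
bound = values[0] * np.exp(-2.0 * MU * t)

print("approximate endpoint:", p_inf)
print("residual at endpoint:", p_inf[1] - p_inf[0] ** 2)
print("decay bound respected:", bool(np.all(values <= bound * (1 + 1e-9))))
```

Running `endpoint` from several starting points gives a rough picture of the fibers of π: all points whose trajectories land on the same point of the parabola belong to one fiber.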
If this is right
- The minimizer set S must be a smooth submanifold of M.
- Either S is diffeomorphic to Euclidean space, in which case a smooth change of coordinates turns f into a convex quadratic, or S must have exotic topology such as that of the Whitehead manifold.
- There exists a complete Riemannian metric on M under which f remains PL and is geodesically convex.
- The PL condition forces f to be a nonlinear least-squares problem whose residuals are given by the submersion φ.
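The "convex quadratic after a change of coordinates" case can be seen on a toy example (chosen here for illustration, not taken from the paper) where S is a parabola, hence diffeomorphic to R:

```latex
% Toy example (assumption): f(x,y) = (y - x^2)^2 on R^2, f^* = 0, S = {y = x^2}.
\[
  \Psi(x, y) = (u, v) := (x,\; y - x^2),
  \qquad
  \Psi^{-1}(u, v) = (u,\; v + u^2).
\]
% \Psi is a smooth diffeomorphism of R^2, and in the new coordinates
\[
  (f \circ \Psi^{-1})(u, v) = \bigl((v + u^2) - u^2\bigr)^2 = v^2 ,
\]
% a convex quadratic whose minimizer set is the line \{v = 0\} = \Psi(S).
```

Here the residual map is φ(x, y) = y - x², and the second coordinate of Ψ is exactly φ, which is the pattern the second and fourth consequences describe.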
Where Pith is reading between the lines
- Many optimization problems that satisfy PL may admit a hidden sum-of-squares representation after a suitable smooth reparametrization.
- On contractible domains, the minimizer set of a global PL function must itself be contractible, because M is a trivial bundle over S. This excludes spheres, tori, and other closed manifolds of positive dimension, unless the ambient space is non-contractible.
- Gradient-based methods on PL functions may be implicitly following the fibers of this submersion, which could explain observed fast convergence rates.
Load-bearing premise
The manifold is contractible and the function is infinitely differentiable, allowing the gradient flow endpoint map to be a trivial fiber bundle.
What would settle it
A C^∞ function on R^n that satisfies the global PL inequality everywhere but whose minimizer set is not the base of a smooth fiber bundle given by gradient flow trajectories.
Original abstract
The Polyak-{\L}ojasiewicz (P{\L}) condition is often invoked in nonconvex optimization because it allows fast convergence of algorithms beyond strong convexity. A function $f \colon \mathcal{M} \to \mathbb{R}$ on a Riemannian manifold $\mathcal{M}$ is globally P{\L} if $\|\nabla f(x)\|^2 \geq 2\mu(f(x) - f^*)$ for all $x$, where $f^* = \inf f$ and $\mu > 0$. How much does this pointwise, first-order inequality constrain $f$ and its set of minimizers $S$? We show that if $f$ is also smooth ($C^\infty$) and $\mathcal{M}$ is contractible (e.g., if $\mathcal{M} = \mathbb{R}^n$), then the P{\L} condition imposes a firm global structure: such a function is necessarily of the form $f(x) = f^* + \|\varphi(x)\|^2$ (a nonlinear sum of squares) where $\varphi \colon \mathcal{M} \to \mathbb{R}^k$ is a submersion, and $k$ is the codimension of $S$ in $\mathcal{M}$. The proof hinges on showing that the end-point map of negative gradient flow on $f$ is a trivial smooth fiber bundle over $S$. This rigidity leads to a striking dichotomy. Either $S$ is diffeomorphic to a Euclidean space, in which case $f$ can be transformed into a convex quadratic by a smooth change of coordinates. Or $S$ must display genuinely exotic geometry; for example, it can be diffeomorphic to the Whitehead manifold. As a further consequence, we show that there exists a complete Riemannian metric on $\mathcal{M}$ under which $f$ remains P{\L} and becomes geodesically convex.
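For orientation, the easy converse direction can be written out. This is a standard computation and not quoted from the paper; it assumes $\mathcal{M} = \mathbb{R}^n$ with the Euclidean metric and a uniform lower bound $\sigma > 0$ on the smallest singular value of $D\varphi$, which is stronger than asking only that $\varphi$ be a submersion.

```latex
% Assumptions (not from the paper): M = R^n, f = f^* + \|\varphi\|^2,
% and \sigma_{\min}(D\varphi(x)) \ge \sigma > 0 for all x.
\[
  \nabla f(x) = 2\, D\varphi(x)^{\top} \varphi(x),
\]
\[
  \|\nabla f(x)\|^2
  = 4\, \bigl\| D\varphi(x)^{\top} \varphi(x) \bigr\|^2
  \;\ge\; 4 \sigma^2 \|\varphi(x)\|^2
  = 2\,(2\sigma^2)\,\bigl(f(x) - f^*\bigr).
\]
% Hence f is globally PL with constant \mu = 2\sigma^2.
```

The abstract's result runs in the opposite, harder direction: starting from the PŁ inequality alone, it produces the submersion $\varphi$.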
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a C^∞ function f on a contractible Riemannian manifold M satisfying the global Polyak-Łojasiewicz (PL) inequality ||∇f||² ≥ 2μ(f - f*) must be a nonlinear least-squares function: f(x) = f* + ||φ(x)||² for a submersion φ: M → R^k (k = codim S), where S is the minimizer set. The proof proceeds by showing that the endpoint map of the negative gradient flow is a C^∞ trivial fiber bundle over S. Consequences include a dichotomy on the topology of S (either diffeomorphic to Euclidean space or exotic, e.g., Whitehead manifold) and the existence of a complete metric on M under which f is both PL and geodesically convex.
Significance. If the central structural claim holds, the result supplies a sharp differential-geometric characterization of globally PL functions, linking first-order optimization conditions to the geometry of fiber bundles and nonlinear least squares. This could inform landscape analysis in nonconvex optimization and the design of coordinate changes that convexify PL problems. The derivation is direct from the PL inequality plus smoothness and contractibility, without fitted parameters or circular reductions.
major comments (2)
- [§3] §3 (construction of the endpoint map π): the claim that π(x) := lim_{t→∞} φ_t(x), with φ_t the time-t flow map (distinct from the submersion φ), is C^∞ (and a submersion) requires justifying that the t→∞ limit commutes with all higher derivatives D^k for k≥2. The PL decay f(φ_t(x)) - f* ≤ (f(x) - f*)e^{-2μt} controls C^0 and C^1 behavior via ||∇f||, but supplies no uniform bound on ||D^k φ_t|| that would permit interchanging lim and differentiation. A detailed a-priori estimate or invocation of a specific theorem on smoothness of infinite-time flows is needed; without it the identification f = f* + ||φ||² and the fiber-bundle conclusion are not yet secured.
- [Theorem 4.1] Theorem 4.1 (dichotomy for S): the statement that S is either diffeomorphic to R^m or must be exotic (e.g., Whitehead manifold) rests on the same endpoint map being a smooth trivial bundle. If the smoothness of π is only C^1, the topological conclusions weaken and the claim that S cannot be, say, a compact manifold without boundary becomes conditional on an unproven regularity step.
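For reference, the decay estimate cited in the first major comment, together with the finite-length bound it implies, follows from a standard computation along the flow. The sketch below is not quoted from the paper; it writes φ_t for the negative gradient flow and assumes f(φ_t(x)) > f* along the trajectory.

```latex
% Decay of the function value along \dot{x} = -\nabla f(x):
\[
  \frac{d}{dt}\bigl(f(\varphi_t(x)) - f^*\bigr)
  = -\|\nabla f(\varphi_t(x))\|^2
  \;\le\; -2\mu \bigl(f(\varphi_t(x)) - f^*\bigr)
  \;\Longrightarrow\;
  f(\varphi_t(x)) - f^* \le \bigl(f(x) - f^*\bigr) e^{-2\mu t}.
\]
% Finite trajectory length, using \|\nabla f\| \ge \sqrt{2\mu (f - f^*)}:
\[
  \frac{d}{dt}\sqrt{f(\varphi_t(x)) - f^*}
  = -\frac{\|\nabla f(\varphi_t(x))\|^2}{2\sqrt{f(\varphi_t(x)) - f^*}}
  \;\le\; -\sqrt{\tfrac{\mu}{2}}\;\|\nabla f(\varphi_t(x))\|
  \;\Longrightarrow\;
  \int_0^{\infty} \|\nabla f(\varphi_t(x))\|\, dt
  \;\le\; \sqrt{\tfrac{2\,(f(x) - f^*)}{\mu}} .
\]
```

These two bounds give existence of the limit and first-order control. They say nothing about the higher derivatives D^k φ_t, which is the gap the first comment points to.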
minor comments (2)
- [§2] Notation: the codimension k is introduced as the dimension of the target of φ, but the relation k = codim_M S is stated only in the abstract; add an explicit sentence in §2 relating the rank of Dφ to the dimension of the normal bundle of S.
- [§5] The final metric-construction result (existence of a complete metric making f geodesically convex while preserving PL) is stated without a numbered theorem or equation reference; label it as Theorem 5.3 and give the explicit conformal factor or warping function used.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive report. The comments correctly identify that the current exposition in §3 leaves the higher-order regularity of the endpoint map insufficiently justified. We will revise the manuscript to supply the missing a-priori estimates, thereby securing both the C^∞ structure and the subsequent topological conclusions.
Point-by-point responses
- Referee: [§3] §3 (construction of the endpoint map π): the claim that π(x) := lim_{t→∞} φ_t(x), with φ_t the time-t flow map (distinct from the submersion φ), is C^∞ (and a submersion) requires justifying that the t→∞ limit commutes with all higher derivatives D^k for k≥2. The PL decay f(φ_t(x)) - f* ≤ (f(x) - f*)e^{-2μt} controls C^0 and C^1 behavior via ||∇f||, but supplies no uniform bound on ||D^k φ_t|| that would permit interchanging lim and differentiation. A detailed a-priori estimate or invocation of a specific theorem on smoothness of infinite-time flows is needed; without it the identification f = f* + ||φ||² and the fiber-bundle conclusion are not yet secured.
  Authors: We agree that the present argument does not explicitly bound the higher derivatives of the flow. In the revised manuscript we will insert a new technical lemma (Lemma 3.4) that derives uniform-in-t bounds on ||D^k φ_t|| for every k by induction on k. The base cases k=0,1 follow directly from the global PL inequality and the exponential decay of f(φ_t) - f*. For the inductive step we differentiate the ODE dφ_t/dt = -∇f(φ_t) repeatedly, apply Faà di Bruno's formula, and close the resulting differential inequality for the k-th derivative via a Gronwall estimate that exploits the exponential decay of the lower-order terms. The resulting bound is independent of t, so the limit π inherits C^∞ regularity. Surjectivity of dπ (hence the submersion property) follows from the fact that the fibers are the stable manifolds of the gradient flow, which are transverse to the level sets of f. With this lemma in place the identification f = f* + ||φ||² and the trivial-bundle structure are fully justified.
  revision: yes
- Referee: [Theorem 4.1] Theorem 4.1 (dichotomy for S): the statement that S is either diffeomorphic to R^m or must be exotic (e.g., Whitehead manifold) rests on the same endpoint map being a smooth trivial bundle. If the smoothness of π is only C^1, the topological conclusions weaken and the claim that S cannot be, say, a compact manifold without boundary becomes conditional on an unproven regularity step.
  Authors: The proof of Theorem 4.1 invokes the smooth trivial-bundle theorem for the endpoint map π to deduce that S is either diffeomorphic to a Euclidean space or has exotic topology (e.g., that of the Whitehead manifold). Once the C^∞ regularity of π is established by the new lemma in §3, the topological dichotomy follows without further hypotheses. We will add a short clarifying paragraph after the statement of Theorem 4.1 that explicitly records this dependence and notes that the argument rules out compact S of positive dimension (and, more generally, any non-contractible S) precisely because a smooth trivial bundle over such an S would contradict contractibility of M.
  revision: yes
Circularity Check
No circularity: direct derivation from PL + smoothness + contractibility
full rationale
The paper's central claim follows from the global PL inequality ||∇f||² ≥ 2μ(f - f*) together with C^∞ smoothness and contractibility of M. It shows that the negative-gradient-flow endpoint map is a trivial smooth fiber bundle over S, yielding f = f* + ||φ||² with φ a submersion. This is a standard differential-geometric argument that does not presuppose the conclusion, fit parameters to data, rename known results, or rely on load-bearing self-citations. External background results from Riemannian geometry are invoked without circular reduction. The derivation chain is self-contained and non-tautological.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: f is C^∞ smooth on the Riemannian manifold M
- domain assumption: M is contractible
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: We show that if f is also smooth (C^∞) and M is contractible ... then the PŁ condition imposes a firm global structure: such a function is necessarily of the form f(x) = f* + ||φ(x)||² ... where φ : M → R^k is a submersion ... The proof hinges on showing that the end-point map of negative gradient flow on f is a trivial smooth fiber bundle over S.
- IndisputableMonolith/Foundation/BranchSelection.lean, branch_selection (tag: echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: the end-point map π : M → S ... is a smooth submersion ... π is a trivial smooth fiber bundle ... f(y) = f* + ||φ(y)||²
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.