pith. machine review for the scientific record.

arxiv: 2605.08877 · v2 · submitted 2026-05-09 · 🧮 math.NA · cs.NA

Recognition: no theorem link

Non-Uniqueness of Solutions in Neural Variational Methods

Andreas Langer

Pith reviewed 2026-05-13 07:30 UTC · model grok-4.3

classification 🧮 math.NA cs.NA
keywords: non-uniqueness · neural variational methods · Deep Ritz method · weak PINNs · ill-posed optimization · finite measurements · variational regularization

The pith

Neural variational methods produce non-unique minimizers because finite linear measurements cannot uniquely determine expressive neural trial functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an abstract framework showing that variational neural discretizations constrain trial functions only through finitely many linear measurements such as quadrature evaluations or finite test spaces. This structural feature renders the discrete optimization problems ill-posed, with non-unique or degenerate minimizers. The result holds for the Deep Ritz method, neural discretizations of variational regularization functionals, and weak PINNs alike. It occurs independently of whether the underlying continuous variational problem is well-posed. A reader would care because the same finite-measurement setup is common in these widely used approaches.
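The mechanism is concrete enough to reproduce in a few lines. Below is a minimal numerical sketch (ours, not the paper's): for a one-dimensional Deep Ritz energy evaluated at uniform quadrature nodes x_i = i/N on [0,1], any perturbation that vanishes together with its derivative at every node, here the hypothetical choice v(x) = ε sin²(πNx), leaves the quadrature energy unchanged at floating-point level while moving the trial function by a fixed L² distance.

```python
import numpy as np

# Quadrature-based Deep Ritz energy for -u'' = f on (0,1):
#   E_h(u) = sum_i w_i * (0.5 * u'(x_i)^2 - f(x_i) * u(x_i)),
# which sees u only through values and derivatives at N+1 nodes.
N = 32
x = np.linspace(0.0, 1.0, N + 1)        # uniform nodes x_i = i/N
w = np.full_like(x, 1.0 / N)            # uniform weights (immaterial here)

f  = lambda t: np.pi**2 * np.sin(np.pi * t)   # source term
u  = lambda t: np.sin(np.pi * t)              # candidate minimizer
du = lambda t: np.pi * np.cos(np.pi * t)

# Perturbation vanishing WITH its derivative at every node:
#   v(x_i)  = eps * sin(pi*N*x_i)^2          = 0,
#   v'(x_i) = eps * pi * N * sin(2*pi*N*x_i) = 0.
eps = 0.3
v  = lambda t: eps * np.sin(np.pi * N * t) ** 2
dv = lambda t: eps * np.pi * N * np.sin(2.0 * np.pi * N * t)

def E_h(uu, duu):
    return np.sum(w * (0.5 * duu(x) ** 2 - f(x) * uu(x)))

e0 = E_h(u, du)
e1 = E_h(lambda t: u(t) + v(t), lambda t: du(t) + dv(t))
print(abs(e0 - e1))                      # ~1e-12: identical discrete energies

xx = (np.arange(8192) + 0.5) / 8192      # dense grid for the L2 norm
print(np.sqrt(np.mean(v(xx) ** 2)))      # ~0.18: a genuinely different function
```

The continuous Dirichlet energy does distinguish u from u + v, since the perturbation carries large gradients between nodes; only the finite quadrature view is blind to it.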

Core claim

We develop an abstract analytical framework that isolates this finite-information mechanism and extends its applicability beyond strong-form formulations. We apply the framework to three representative variational neural discretizations: the Deep Ritz method, neural network discretizations of variational regularization functionals, and weak PINNs. Despite their differing formulations, these methods constrain the neural trial function only through finitely many linear measurements, such as quadrature evaluations or finite-dimensional test spaces. We show that this structural feature leads to ill-posed discrete optimization problems, manifested by non-uniqueness or degeneracy of minimizers, independently of the well-posedness of the underlying continuous variational problem.

What carries the argument

Finite-information mechanism: the use of only finitely many linear measurements to constrain sufficiently expressive neural trial spaces.
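Stated symbolically (our notation, consistent with the abstract): the discrete loss factors through the measurement map, so it cannot separate functions that differ by a kernel element.

```latex
% Finite-information mechanism (notation ours).
% \ell_1,\dots,\ell_m : the finitely many linear measurements
% (quadrature evaluations or pairings with a finite test space).
\[
  Mu \;=\; \bigl(\ell_1(u),\,\dots,\,\ell_m(u)\bigr) \in \mathbb{R}^m,
  \qquad
  L(u) \;=\; \varphi(Mu),
\]
\[
  Mv = 0 \;\Longrightarrow\; L(u+v) = L(u) \quad \text{for every admissible } u .
\]
```

Expressivity enters only through the claim that the neural trial set actually contains nonzero directions v with Mv = 0.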

If this is right

  • Discrete optimization problems in these methods are ill-posed.
  • Minimizers are non-unique or degenerate even when the continuous problem is well-posed.
  • The degeneracy appears across Deep Ritz, variational regularization, and weak PINN formulations (a minimal weak-form illustration follows this list).
  • The issue is independent of the specific differential equation or regularization functional.
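For the weak-form bullet, a minimal sketch of the finite-test-space variant (our construction, not the paper's): test -u'' = f on (0,1) against φ_j(x) = sin(jπx), j = 1,…,J, under the bilinear form a(u,φ) = ∫ u'φ'. The perturbation v(x) = sin((J+1)πx) satisfies a(v,φ_j) = 0 for every j ≤ J by orthogonality of the cosines, so all J weak residuals are unchanged.

```python
import numpy as np

# Weak residuals r_j(u) = a(u, phi_j) - (f, phi_j), a(u, phi) = int u' phi',
# tested only against J functions phi_j(x) = sin(j*pi*x).
J = 8
xx = (np.arange(8192) + 0.5) / 8192          # midpoint grid for the integrals

def inner(g, h):                             # midpoint rule for int_0^1 g h
    return np.mean(g(xx) * h(xx))

f  = lambda t: np.pi**2 * np.sin(np.pi * t)
du = lambda t: np.pi * np.cos(np.pi * t)     # derivative of u(x) = sin(pi x)

# Direction invisible to the test space: a(v, phi_j) = 0 for all j <= J,
# since cos((J+1)*pi*x) is orthogonal to cos(j*pi*x) for j != J+1.
dv = lambda t: (J + 1) * np.pi * np.cos((J + 1) * np.pi * t)

def residuals(duu):
    out = []
    for j in range(1, J + 1):
        phi  = lambda t, j=j: np.sin(j * np.pi * t)
        dphi = lambda t, j=j: j * np.pi * np.cos(j * np.pi * t)
        out.append(inner(duu, dphi) - inner(f, phi))
    return np.array(out)

r0 = residuals(du)
r1 = residuals(lambda t: du(t) + dv(t))
print(np.max(np.abs(r0 - r1)))               # ~0 (floating-point level)
```

Any weak PINN-style loss built from these J residuals is therefore constant along the line u + t·sin((J+1)πx).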

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Increasing the number of measurement points may shrink but not remove the set of minimizers without also restricting the neural space.
  • Additional parameter regularization on the network weights could be required to restore uniqueness in practice (a sketch follows this list).
  • Hybrid discretizations that combine finite measurements with infinite-dimensional constraints on the trial space might avoid the degeneracy.
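A way to make the regularization bullet concrete (our sketch, continuing the quadrature example above): along a flat direction v in the kernel of the measurement map, the data loss E_h(u + t·v) is constant in t, so adding any strictly convex penalty, e.g. a Tikhonov-style λ‖·‖²_{L²} term, singles out exactly one t.

```python
import numpy as np

# Along the flat direction v (invisible to the quadrature, see the Deep Ritz
# sketch above), only the penalty varies; it is strictly convex in t.
N, eps, lam = 32, 0.3, 1e-2
xx = (np.arange(8192) + 0.5) / 8192

u = np.sin(np.pi * xx)
v = eps * np.sin(np.pi * N * xx) ** 2

l2sq = lambda g: np.mean(g ** 2)             # ||g||_{L2(0,1)}^2, midpoint rule

for t in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"t = {t:+.1f}   penalty = {lam * l2sq(u + t * v):.6f}")

# Unique minimizer over the flat direction, in closed form:
t_star = -np.mean(u * v) / l2sq(v)
print("t* =", t_star)
```

Whether an analogous penalty on the network weights (rather than on the realized function) restores uniqueness is exactly the practical question the bullet raises.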

Load-bearing premise

Neural trial spaces are sufficiently expressive that finite-dimensional linear measurements fail to uniquely determine the trial function.
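This premise reduces to a dimension count (our formalization): m linear measurements cannot pin down any trial set containing a linear subspace of dimension greater than m.

```latex
% Rank-nullity: the restriction of the measurement map M to any linear
% subspace U_0 of the trial set has a nontrivial kernel once dim U_0 > m.
\[
  \dim U_0 > m
  \;\Longrightarrow\;
  \dim \ker\!\bigl(M|_{U_0}\bigr) \;\ge\; \dim U_0 - m \;>\; 0 .
\]
```

For neural trial spaces the subtlety is that the trial set is not linear; the premise asserts that it nonetheless contains such kernel directions around any candidate minimizer.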

What would settle it

An explicit example of one of these methods in which the minimizer is provably unique when only finitely many linear measurements are used would falsify the non-uniqueness claim.

Original abstract

Recent work has shown that strong-form physics-informed neural networks (PINNs) based on pointwise enforcement of differential operators can be ill-posed due to the combination of sufficiently expressive neural network trial spaces with finitely many measurements. In this work, we develop an abstract analytical framework that isolates this finite-information mechanism and extends its applicability beyond strong-form formulations. We apply the framework to three representative variational neural discretizations: the Deep Ritz method, neural network discretizations of variational regularization functionals, and weak PINNs. Despite their differing formulations, these methods constrain the neural trial function only through finitely many linear measurements, such as quadrature evaluations or finite-dimensional test spaces. We show that this structural feature leads to ill-posed discrete optimization problems, manifested by non-uniqueness or degeneracy of minimizers, independently of the well-posedness of the underlying continuous variational problem.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated author's rebuttal, circularity audit, and axiom-and-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops an abstract analytical framework showing that three classes of neural variational methods—the Deep Ritz method, neural discretizations of variational regularization functionals, and weak PINNs—yield ill-posed discrete optimization problems. The core mechanism is that each method constrains the neural trial function only through finitely many linear measurements (quadrature evaluations or pairings against a finite test space). Consequently, the discrete loss is constant on positive-dimensional fibers of the measurement map, producing non-unique or degenerate minimizers independently of whether the underlying continuous variational problem is well-posed.

Significance. If the framework holds, the result supplies a structural explanation for observed non-uniqueness and instability in neural variational solvers that is independent of coercivity or uniqueness in the continuous limit. The generality across strong-form, weak-form, and energy-based formulations is a useful organizing principle that could inform the design of measurement-augmented or regularized training procedures.

major comments (2)
  1. [§3.2] Definition 3.1: the measurement map M is stated to be finite-rank for all three methods, but the argument that its fibers have positive dimension relies on the neural trial space being sufficiently expressive; the manuscript should supply a precise statement of the minimal expressivity assumption (e.g., density in a Sobolev space or surjectivity onto a finite-dimensional subspace) that guarantees dim(ker M) > 0.
  2. [§4.1] Theorem 4.3: the proof that every loss value is attained by a continuum of distinct functions is given in function space, yet the actual optimization is performed over network parameters θ. The manuscript does not address whether the parametrization map θ ↦ u_θ is surjective onto the relevant fiber; without this, non-uniqueness in function space need not imply multiple distinct parameter vectors attaining the same loss.
minor comments (2)
  1. [§2] Notation for the finite test space V_h is introduced in §2 but reused without redefinition in §5; a single consolidated notation table would improve readability.
  2. [Figure 2] Figure 2 caption states that the loss surface is plotted 'over a two-dimensional slice'; the axis labels and the precise parametrization of the slice should be stated explicitly.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive feedback. The comments highlight important points regarding the assumptions and the distinction between function space and parameter space, which we will clarify in the revision.

Point-by-point responses
  1. Referee: [§3.2] Definition 3.1: the measurement map M is stated to be finite-rank for all three methods, but the argument that its fibers have positive dimension relies on the neural trial space being sufficiently expressive; the manuscript should supply a precise statement of the minimal expressivity assumption (e.g., density in a Sobolev space or surjectivity onto a finite-dimensional subspace) that guarantees dim(ker M) > 0.

    Authors: We agree with this observation. The current manuscript implicitly assumes sufficient expressivity of the neural networks to ensure that the measurement map has a non-trivial kernel. In the revised version, we will introduce a precise minimal expressivity assumption (e.g., that the neural trial functions are dense in the Sobolev space or that their linear span intersects the kernel of M non-trivially) immediately following Definition 3.1. This will make the condition for positive-dimensional fibers explicit. revision: yes

  2. Referee: [§4.1] Theorem 4.3: the proof that every loss value is attained by a continuum of distinct functions is given in function space, yet the actual optimization is performed over network parameters θ. The manuscript does not address whether the parametrization map θ ↦ u_θ is surjective onto the relevant fiber; without this, non-uniqueness in function space need not imply multiple distinct parameter vectors attaining the same loss.

    Authors: The referee correctly identifies a gap between the function-space analysis and the parametric optimization. Theorem 4.3 shows non-uniqueness in the infinite-dimensional function space. To bridge this, we will add a discussion in §4.1 noting that, by the universal approximation property of neural networks, for sufficiently wide networks the range of the parametrization map θ ↦ u_θ is dense in the function space. Consequently, the preimage of each fiber under the parametrization is non-empty and typically contains a continuum of parameters, leading to multiple distinct θ with the same loss value. We will also reference relevant results on the geometry of neural network loss landscapes. revision: yes
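The continuum of loss-equal parameters invoked here can also be exhibited without any density argument: positive rescaling of a ReLU unit is a well-known exact symmetry of the realized function (a generic fact about ReLU networks, not a construction specific to this paper).

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

# One hidden ReLU unit: u_theta(x) = a * relu(w*x + b).
# Positive homogeneity, relu(c*z) = c*relu(z) for c > 0, yields an exact
# one-parameter curve of DISTINCT parameter vectors realizing the SAME
# function: (a, w, b) -> (a/c, c*w, c*b).
a, w, b = 1.7, -0.9, 0.4
x = np.linspace(-2.0, 2.0, 1001)

u = lambda a, w, b: a * relu(w * x + b)

for c in (0.5, 2.0, 10.0):
    diff = np.max(np.abs(u(a, w, b) - u(a / c, c * w, c * b)))
    print(f"c = {c:4}: max deviation = {diff:.1e}")   # 0 up to rounding

# Any loss that depends on theta only through u_theta -- finitely many
# measurements or not -- is therefore constant along this curve.
```

This symmetry alone gives non-uniqueness in parameter space; the paper's point is the stronger function-space degeneracy, which the referee's comment correctly separates from it.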

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The derivation isolates the finite-measurement property of the discrete loss (quadrature or finite test-space pairings) as the source of non-uniqueness in the optimization problem. This follows directly from the definitions of the trial space and loss functional without any reduction to fitted parameters, self-referential citations, or ansatzes smuggled from prior work. The argument is self-contained: any loss value attained by one trial function is attained by a positive-dimensional fiber of functions under the measurement map, independently of continuous coercivity. No load-bearing step collapses by construction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that neural trial spaces remain sufficiently expressive relative to any finite set of linear measurements; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Neural trial functions are sufficiently expressive that finite linear measurements do not uniquely determine the function.
    Invoked to explain why finite constraints produce non-uniqueness or degeneracy.

pith-pipeline@v0.9.0 · 5431 in / 1155 out tokens · 38825 ms · 2026-05-13T07:30:13.315249+00:00 · methodology

