Non-Uniqueness of Solutions in Neural Variational Methods
Pith reviewed 2026-05-13 07:30 UTC · model grok-4.3
The pith
Neural variational methods produce non-unique minimizers because finite linear measurements cannot uniquely determine expressive neural trial functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop an abstract analytical framework that isolates the finite-information mechanism (sufficiently expressive neural trial spaces constrained by only finitely many measurements) and extends its applicability beyond strong-form formulations. We apply the framework to three representative variational neural discretizations: the Deep Ritz method, neural network discretizations of variational regularization functionals, and weak PINNs. Despite their differing formulations, these methods constrain the neural trial function only through finitely many linear measurements, such as quadrature evaluations or finite-dimensional test spaces. We show that this structural feature leads to ill-posed discrete optimization problems, manifested by non-uniqueness or degeneracy of minimizers, independently of the well-posedness of the underlying continuous variational problem.
What carries the argument
Finite-information mechanism: the use of only finitely many linear measurements to constrain sufficiently expressive neural trial spaces.
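As a concrete illustration of this mechanism (a toy sketch, not taken from the paper): a Deep Ritz-style energy evaluated by quadrature cannot distinguish a trial function from one perturbed by a bump whose values and derivatives both vanish at every quadrature node. The model problem (-u'' = 2 on (0, 1)), the node count, and the trapezoid rule below are all illustrative choices.

```python
import numpy as np

n = 16
x = np.arange(n + 1) / n            # trapezoid quadrature nodes x_j = j/n on [0, 1]
w = np.full(n + 1, 1.0 / n)         # trapezoid weights
w[0] = w[-1] = 0.5 / n

def ritz_loss(u, du):
    # quadrature approximation of the Ritz energy E(u) = int(0.5 u'^2 - f u)
    # for the model problem -u'' = f with f = 2 (exact solution u = x(1 - x))
    return np.sum(w * (0.5 * du(x) ** 2 - 2.0 * u(x)))

u  = lambda t: t * (1 - t)                       # continuous minimizer
du = lambda t: 1 - 2 * t
g  = lambda t: np.sin(n * np.pi * t) ** 2        # g and g' vanish at every node x_j
dg = lambda t: n * np.pi * np.sin(2 * n * np.pi * t)

loss1 = ritz_loss(u, du)
loss2 = ritz_loss(lambda t: u(t) + g(t), lambda t: du(t) + dg(t))
assert np.isclose(loss1, loss2)     # the discrete energy cannot tell them apart
```

The perturbed function differs from the minimizer by an O(1) amount between nodes, yet both attain the same discrete loss: the quadrature rule is a finite set of linear measurements, and the perturbation lies in its kernel.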
If this is right
- Discrete optimization problems in these methods are ill-posed.
- Minimizers are non-unique or degenerate even when the continuous problem is well-posed.
- The degeneracy appears across Deep Ritz, variational regularization, and weak PINN formulations.
- The issue is independent of the specific differential equation or regularization functional.
Where Pith is reading between the lines
- Increasing the number of measurement points may shrink but not remove the set of minimizers without also restricting the neural space.
- Additional parameter regularization on the network weights could be required to restore uniqueness in practice.
- Hybrid discretizations that combine finite measurements with infinite-dimensional constraints on the trial space might avoid the degeneracy.
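The first inference can be made concrete in the simplest possible setting, a linear trial space standing in for a neural one (an illustrative sketch, not from the paper): the fiber of trial functions sharing all measurements has dimension d - m, which shrinks as measurements are added but stays positive until m reaches the trial-space dimension d.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50                                # dimension of an over-expressive linear trial space

for m in (10, 30, 45):                # number of linear measurements
    M = rng.standard_normal((m, d))   # finite-rank measurement map (full rank a.s.)
    _, s, Vt = np.linalg.svd(M)
    kernel_dim = d - int(np.sum(s > 1e-10))
    assert kernel_dim == d - m        # fiber shrinks with m but stays positive-dimensional

    c = rng.standard_normal(d)        # one coefficient vector
    k = Vt[-1]                        # a unit vector in ker(M)
    # identical measurements, hence identical loss, for distinct trial functions
    assert np.allclose(M @ c, M @ (c + 5.0 * k))
```

Restoring uniqueness thus requires either restricting the trial space (d) or adding information that is not a finite set of linear measurements, which is exactly the design space the last two bullets describe.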
Load-bearing premise
Neural trial spaces are sufficiently expressive that finite-dimensional linear measurements fail to uniquely determine the trial function.
What would settle it
An explicit example of one of these methods in which the minimizer is provably unique when only finitely many linear measurements are used would falsify the non-uniqueness claim.
Original abstract
Recent work has shown that strong-form physics-informed neural networks (PINNs) based on pointwise enforcement of differential operators can be ill-posed due to the combination of sufficiently expressive neural network trial spaces with finitely many measurements. In this work, we develop an abstract analytical framework that isolates this finite-information mechanism and extends its applicability beyond strong-form formulations. We apply the framework to three representative variational neural discretizations: the Deep Ritz method, neural network discretizations of variational regularization functionals, and weak PINNs. Despite their differing formulations, these methods constrain the neural trial function only through finitely many linear measurements, such as quadrature evaluations or finite-dimensional test spaces. We show that this structural feature leads to ill-posed discrete optimization problems, manifested by non-uniqueness or degeneracy of minimizers, independently of the well-posedness of the underlying continuous variational problem.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops an abstract analytical framework showing that three classes of neural variational methods—the Deep Ritz method, neural discretizations of variational regularization functionals, and weak PINNs—yield ill-posed discrete optimization problems. The core mechanism is that each method constrains the neural trial function only through finitely many linear measurements (quadrature evaluations or pairings against a finite test space). Consequently, the discrete loss is constant on positive-dimensional fibers of the measurement map, producing non-unique or degenerate minimizers independently of whether the underlying continuous variational problem is well-posed.
Significance. If the framework holds, the result supplies a structural explanation for observed non-uniqueness and instability in neural variational solvers that is independent of coercivity or uniqueness in the continuous limit. The generality across strong-form, weak-form, and energy-based formulations is a useful organizing principle that could inform the design of measurement-augmented or regularized training procedures.
major comments (2)
- [§3.2, Definition 3.1] The measurement map M is stated to be finite-rank for all three methods, but the argument that its fibers have positive dimension relies on the neural trial space being sufficiently expressive; the manuscript should supply a precise statement of the minimal expressivity assumption (e.g., density in a Sobolev space or surjectivity onto a finite-dimensional subspace) that guarantees dim(ker M) > 0.
- [§4.1, Theorem 4.3] The proof that every loss value is attained by a continuum of distinct functions is given in function space, yet the actual optimization is performed over network parameters θ. The manuscript does not address whether the parametrization map θ ↦ u_θ is surjective onto the relevant fiber; without this, non-uniqueness in function space need not imply multiple distinct parameter vectors attaining the same loss.
minor comments (2)
- [§2] Notation for the finite test space V_h is introduced in §2 but reused without redefinition in §5; a single consolidated notation table would improve readability.
- [Figure 2] The Figure 2 caption states that the loss surface is plotted "over a two-dimensional slice"; the axis labels and the precise parametrization of the slice should be stated explicitly.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive feedback. The comments highlight important points regarding the assumptions and the distinction between function space and parameter space, which we will clarify in the revision.
Point-by-point responses
- Referee [§3.2, Definition 3.1]: the measurement map M is stated to be finite-rank for all three methods, but the argument that its fibers have positive dimension relies on the neural trial space being sufficiently expressive; the manuscript should supply a precise statement of the minimal expressivity assumption (e.g., density in a Sobolev space or surjectivity onto a finite-dimensional subspace) that guarantees dim(ker M) > 0.
  Authors: We agree with this observation. The current manuscript implicitly assumes that the neural networks are expressive enough for the measurement map to have a non-trivial kernel. In the revised version, we will introduce a precise minimal expressivity assumption (e.g., that the neural trial functions are dense in the relevant Sobolev space, or that their linear span intersects the kernel of M non-trivially) immediately following Definition 3.1. This will make the condition for positive-dimensional fibers explicit. Revision: yes.
- Referee [§4.1, Theorem 4.3]: the proof that every loss value is attained by a continuum of distinct functions is given in function space, yet the actual optimization is performed over network parameters θ. The manuscript does not address whether the parametrization map θ ↦ u_θ is surjective onto the relevant fiber; without this, non-uniqueness in function space need not imply multiple distinct parameter vectors attaining the same loss.
  Authors: The referee correctly identifies a gap between the function-space analysis and the parametric optimization. Theorem 4.3 shows non-uniqueness in the infinite-dimensional function space. To bridge this, we will add a discussion in §4.1 noting that, by the universal approximation property of neural networks, the range of the parametrization map is dense in the function space for sufficiently wide networks. Consequently, the preimage of each fiber under the parametrization is non-empty and typically contains a continuum of parameters, so multiple distinct θ attain the same loss value. We will also reference relevant results on the geometry of neural network loss landscapes. Revision: yes.
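Independent of the surjectivity question the referee raises, one elementary effect already produces many parameter vectors with the same loss: distinct θ can realize the exact same network function, for instance under hidden-neuron permutations. A minimal sketch (the one-hidden-layer ReLU network and all values below are illustrative, not the paper's construction):

```python
import numpy as np

def relu_net(x, W1, b1, w2):
    # one-hidden-layer ReLU network u_theta(x) with 4 hidden neurons
    return np.maximum(W1 * x[:, None] + b1, 0.0) @ w2

rng = np.random.default_rng(1)
W1, b1, w2 = (rng.standard_normal(4) for _ in range(3))
x = np.linspace(0.0, 1.0, 7)

# permuting the hidden neurons changes theta but not the function u_theta
perm = np.array([2, 0, 3, 1])
assert not np.array_equal(W1, W1[perm])
assert np.allclose(relu_net(x, W1, b1, w2),
                   relu_net(x, W1[perm], b1[perm], w2[perm]))
```

Such symmetries show that parameter-space minimizers are never unique; the substantive question, as the referee notes, is whether the function-space fibers themselves are reachable under the parametrization.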
Circularity Check
No significant circularity identified
Full rationale
The derivation isolates the finite-measurement property of the discrete loss (quadrature or finite test-space pairings) as the source of non-uniqueness in the optimization problem. This follows directly from the definitions of the trial space and loss functional without any reduction to fitted parameters, self-referential citations, or ansatzes smuggled from prior work. The argument is self-contained: any loss value attained by one trial function is attained by a positive-dimensional fiber of functions under the measurement map, independently of continuous coercivity. No load-bearing step collapses by construction to its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Neural trial functions are sufficiently expressive that finite linear measurements do not uniquely determine the function.
Reference graph
Works this paper leans on
- [1] S. Alliney. A property of the minimum vectors of a regularizing functional defined by means of the absolute norm. IEEE Transactions on Signal Processing, 45(4):913–917, 1997.
- [2]
- [3] M. Bergounioux and L. Piffet. A second-order model for image denoising. Set-Valued and Variational Analysis, 18(3):277–306, 2010.
- [4] S. Berrone, C. Canuto, and M. Pintore. Variational physics informed neural networks: the role of quadratures and test functions. Journal of Scientific Computing, 92(3):100, 2022.
- [5] A. Chambolle, V. Caselles, D. Cremers, M. Novaga, and T. Pock. An introduction to total variation for image analysis. Theoretical foundations and numerical methods for sparse recovery, 9:263–340, 2010.
- [6] T. Chan, A. Marquina, and P. Mulet. High-order total variation-based image restoration. SIAM Journal on Scientific Computing, 22(2):503–516, 2000.
- [7] T. F. Chan, S. H. Kang, and J. Shen. Euler's elastica and curvature-based inpainting. SIAM J. Appl. Math., 63(2):564–592, 2002.
- [8]
- [9] L. Courte and M. Zeinhofer. Robin pre-training for the deep Ritz method. Proceedings of the Northern Lights Deep Learning Workshop, 4, Jan. 2023.
- [10] T. De Ryck, S. Mishra, and R. Molinaro. wPINNs: Weak physics informed neural networks for approximating entropy solutions of hyperbolic conservation laws. SIAM Journal on Numerical Analysis, 62(2):811–841, 2024.
- [11] W. E and B. Yu. The Deep Ritz Method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
- [12] O. Fuks and H. A. Tchelepi. Limitations of physics informed machine learning for nonlinear two-phase transport in porous media. Journal of Machine Learning for Modeling and Computing, 1(1), 2020.
- [13] W. Hinterberger and O. Scherzer. Variational methods on the space of functions of bounded Hessian for convexification and denoising. Computing, 76, 2006.
- [14] M. Hintermüller and A. Langer. Subspace correction methods for a class of nonsmooth and nonadditive convex variational problems with mixed L1/L2 data-fidelity in image processing. SIAM Journal on Imaging Sciences, 6(4):2134–2173, 2013.
- [15] R. Khodayi-Mehr and M. Zavlanos. VarNet: Variational neural networks for the solution of partial differential equations. In A. M. Bayen, A. Jadbabaie, G. Pappas, P. A. Parrilo, B. Recht, C. Tomlin, and M. Zeilinger, editors, Proceedings of the 2nd Conference on Learning for Dynamics and Control, volume 120 of Proceedings of Machine Learning Research, page..., 2020.
- [16] A. Krishnapriyan, A. Gholami, S. Zhe, R. Kirby, and M. W. Mahoney. Characterizing possible failure modes in physics-informed neural networks. Advances in Neural Information Processing Systems, 34:26548–26560, 2021.
- [17] I. E. Lagaris, A. Likas, and D. I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998.
- [18]
- [19] A. Langer. Automated parameter selection in the L1-L2-TV model for removing Gaussian plus impulse noise. Inverse Problems, 33(7):074002, 2017.
- [20]
- [21] A. Langer and S. Behnamian. DeepTV: A neural network approach for total variation minimization. arXiv preprint arXiv:2409.05569, 2024.
- [22] K. L. Lim, R. Dutta, and M. Rotaru. Physics informed neural network using finite difference method. In 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 1828–1833. IEEE, 2022.
- [23] M. Lysaker, A. Lundervold, and X.-C. Tai. Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Transactions on Image Processing, 12(12):1579–1590, 2003.
- [24] M. Lysaker and X.-C. Tai. Iterative image restoration combining total variation minimization and a second-order functional. International Journal of Computer Vision, 66(1):5–18, 2006.
- [25]
- [26]
- [27] M. Nikolova, M. K. Ng, S. Zhang, and W.-K. Ching. Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization. SIAM J. Imaging Sci., 1(1):2–25, 2008.
- [28] K. Papafitsoros and C. B. Schönlieb. A combined first and second order variational approach for image reconstruction. J. Math. Imaging Vision, 48(2):308–338, 2014.
- [29] A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195, 1999.
- [30] M. Raissi. Deep hidden physics models: Deep learning of nonlinear partial differential equations. Journal of Machine Learning Research, 19(25):1–24, 2018.
- [31]
- [32] J. A. Rivera, J. M. Taylor, Á. J. Omella, and D. Pardo. On quadrature rules for solving partial differential equations using neural networks. Computer Methods in Applied Mechanics and Engineering, 393:114710, 2022.
- [33] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1):259–268, 1992.
- [34]
- [35] X.-C. Tai, J. Hahn, and G. J. Chung. A fast algorithm for Euler's elastica model using augmented Lagrangian method. SIAM J. Imaging Sci., 4(1):313–344, 2011.
- [36] S. Wang, X. Yu, and P. Perdikaris. When and why PINNs fail to train: A neural tangent kernel perspective. Journal of Computational Physics, 449:110768, 2022.