Sampling Distributions as Regularization in Learned Inverse Problems

Jodi Mead; Sandra R. Babyale

arxiv: 2605.25177 · v1 · pith:YQXQLGQNnew · submitted 2026-05-24 · 🧮 math.NA · cs.NA

Pith reviewed 2026-06-29 23:30 UTC · model grok-4.3

classification 🧮 math.NA cs.NA

keywords inverse problems · neural networks · regularization · sampling distribution · synthetic data · physics-informed neural networks · Fredholm integral equation · seismic tomography


The pith

Sampling parameters from the forward model to create synthetic training data defines an implicit regularization operator on the learned inverse map.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

When neural networks are trained to solve inverse problems using synthetic data, the choice of how to sample unknown parameters from the forward model is not neutral. The network minimizes empirical risk on those pairs, and the classical fact that conditional expectation minimizes mean-square error implies that the learned operator converges to a regularized solution whose regularization is fixed by the sampling distribution. This holds in both the infinite-data limit and finite-data settings, including for physics-informed networks. Numerical experiments on a linear Fredholm equation, nonlinear subsurface inversion, and 2D seismic tomography show three distinct regularization sources—prior sampling, architecture, and physics residuals—and demonstrate that a mismatched sampling distribution degrades accuracy in ways the other two cannot fully correct.

Core claim

The learned inverse operator obtained by empirical-risk minimization on pairs generated by sampling parameters from the forward model converges to the conditional expectation of the parameters given the observations; this conditional expectation is itself an implicit regularization operator whose form is determined by the sampling measure. The same mechanism applies to physics-informed neural networks. In finite samples the operator remains influenced by the sampling distribution, and mismatched sampling produces reconstruction errors that neither richer architectures nor additional physics residuals can eliminate.
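The linear-Gaussian special case makes "implicit regularization operator" concrete. The block below is a sketch under assumed notation that is not taken from the paper: $x$ is the unknown parameter, $N(m, C)$ the sampling distribution, $A$ a linear forward operator, and $\Gamma$ the noise covariance, with $C$ and $\Gamma$ symmetric positive definite.

```latex
% Assumed notation (not from the paper):
%   x ~ N(m, C)                      sampling distribution of the parameter
%   y = A x + \varepsilon,           \varepsilon ~ N(0, \Gamma), independent of x
%   \|v\|_M^2 = v^\top M v           weighted norm
\begin{align*}
\mathbb{E}[x \mid y]
  &= m + C A^\top \left( A C A^\top + \Gamma \right)^{-1} (y - A m) \\
  &= \arg\min_{x} \left\{ \lVert A x - y \rVert_{\Gamma^{-1}}^2
       + \lVert x - m \rVert_{C^{-1}}^2 \right\}.
\end{align*}
```

In this special case the conditional expectation coincides with a generalized Tikhonov solution whose penalty is set by the mean $m$ and covariance $C$ of the sampling distribution, so changing how parameters are sampled changes the regularizer directly.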

What carries the argument

The implicit regularization operator induced by the sampling distribution used to generate training pairs (synthetic observations, true parameters) from the forward model.

If this is right

In the infinite-data limit the learned map equals the conditional expectation under the chosen sampling measure.
Finite-data training still inherits bias from the sampling distribution in addition to architectural and physics-informed effects.
A mismatched sampling distribution produces persistent reconstruction artifacts that cannot be removed by increasing network capacity or adding physics residuals alone.
The sampling distribution must be designed with the same attention given to a classical regularization functional.
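The first consequence can be checked on a toy problem. The sketch below is not the paper's experiment: it assumes a scalar forward model `y = a*x + noise` with a Gaussian sampling distribution, and it uses a least-squares affine fit as a stand-in for a trained network (in this Gaussian case the conditional expectation is itself affine). All function names and parameter values are hypothetical.

```python
import random


def sample_pairs(n, a, prior_mean, prior_std, noise_std, rng):
    """Draw x from the sampling distribution and simulate y = a*x + noise."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(prior_mean, prior_std)
        y = a * x + rng.gauss(0.0, noise_std)
        pairs.append((y, x))
    return pairs


def fit_affine(pairs):
    """Empirical risk minimization over affine maps x_hat = w*y + b (least squares)."""
    n = len(pairs)
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    cov_yx = sum((y - mean_y) * (x - mean_x) for y, x in pairs) / n
    var_y = sum((y - mean_y) ** 2 for y, _ in pairs) / n
    w = cov_yx / var_y
    return w, mean_x - w * mean_y


def conditional_mean_map(a, prior_mean, prior_std, noise_std):
    """Closed-form E[x | y] = w*y + b for the assumed Gaussian toy model."""
    w = a * prior_std**2 / (a**2 * prior_std**2 + noise_std**2)
    return w, prior_mean - w * a * prior_mean


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the comparison is repeatable
    for prior_mean, prior_std in [(1.0, 2.0), (1.0, 0.5)]:
        pairs = sample_pairs(20000, 0.5, prior_mean, prior_std, 1.0, rng)
        fitted = fit_affine(pairs)
        exact = conditional_mean_map(0.5, prior_mean, prior_std, 1.0)
        print(f"prior_std={prior_std}: fitted={fitted}, conditional mean={exact}")
```

With many samples the fitted slope and intercept should approach the closed-form values. In this toy model a narrower sampling distribution gives a smaller slope, meaning stronger shrinkage toward the prior mean.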

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

One could deliberately engineer sampling distributions to realize target regularization operators that are difficult to express as explicit penalties.
The same sampling-induced regularization may appear in any learned inverse method that generates its own training pairs from a forward simulator, not only neural networks.
In practice, validation on held-out synthetic or real data could be used to tune the sampling distribution itself rather than only the network weights.
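The last extension, tuning the sampling distribution on held-out data, might look like the following sketch. It is an illustration under assumptions, not a procedure from the paper: a scalar model `y = a*x + noise`, a Gaussian sampling distribution whose standard deviation is the only tuned quantity, and an affine least-squares fit standing in for network training. The helper names are hypothetical.

```python
import random


def simulate(n, a, mean, std, noise_std, rng):
    """Sample parameters x, then synthetic observations y = a*x + noise."""
    xs = [rng.gauss(mean, std) for _ in range(n)]
    ys = [a * x + rng.gauss(0.0, noise_std) for x in xs]
    return ys, xs


def fit_affine(ys, xs):
    """Least-squares fit of x_hat = w*y + b (stand-in for training a network)."""
    n = len(xs)
    my, mx = sum(ys) / n, sum(xs) / n
    cov = sum((y - my) * (x - mx) for y, x in zip(ys, xs))
    var = sum((y - my) ** 2 for y in ys)
    w = cov / var
    return w, mx - w * my


def mse(w, b, ys, xs):
    """Mean-square reconstruction error of the affine map on a data set."""
    return sum((w * y + b - x) ** 2 for y, x in zip(ys, xs)) / len(xs)


def select_prior_std(candidates, val_ys, val_xs, a, prior_mean, noise_std, n_train, rng):
    """Train one estimator per candidate sampling std; keep the best on validation."""
    scores = {}
    for std in candidates:
        ys, xs = simulate(n_train, a, prior_mean, std, noise_std, rng)
        w, b = fit_affine(ys, xs)
        scores[std] = mse(w, b, val_ys, val_xs)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical held-out set drawn from a reference distribution N(1, 2^2).
    val_ys, val_xs = simulate(5000, 0.5, 1.0, 2.0, 1.0, rng)
    best, scores = select_prior_std([0.5, 2.0, 8.0], val_ys, val_xs, 0.5, 1.0, 1.0, 5000, rng)
    print("selected sampling std:", best)
    print("validation MSE per candidate:", scores)
```

The design choice here is that the sampling distribution is treated as a hyperparameter and scored only on data that played no part in fitting, in the same way a regularization weight would be.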

Load-bearing premise

The neural inverse operator is trained by minimizing empirical risk on input-output pairs that are generated by sampling parameters from the forward model.

What would settle it

Train the same network architecture on the same three test problems but replace the original sampling distribution with one that differs only in its moments; if the reconstruction error does not change in the manner predicted by the conditional-expectation formula while architecture and physics terms are held fixed, the claim is falsified.
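For a scalar Gaussian toy model the prediction to test against can be written in closed form. The sketch below is an assumption-laden illustration rather than the paper's formula: it takes `y = a*x + noise`, builds the conditional-mean map under a possibly mismatched sampling distribution, and evaluates its exact mean-square error under a reference distribution. Names and numbers are hypothetical.

```python
def conditional_mean_map(a, prior_mean, prior_std, noise_std):
    """E[x | y] = w*y + b when x ~ N(prior_mean, prior_std^2), y = a*x + noise."""
    w = a * prior_std**2 / (a**2 * prior_std**2 + noise_std**2)
    return w, prior_mean - w * a * prior_mean


def exact_mse(w, b, a, true_mean, true_std, noise_std):
    """Exact E[(w*y + b - x)^2] when x ~ N(true_mean, true_std^2), y = a*x + noise.

    The error (w*a - 1)*x + w*noise + b has a mean (bias) and a variance;
    the mean-square error is variance plus squared bias.
    """
    slope_err = w * a - 1.0
    bias = slope_err * true_mean + b
    variance = slope_err**2 * true_std**2 + w**2 * noise_std**2
    return variance + bias**2


if __name__ == "__main__":
    a, noise_std = 0.5, 1.0
    true_mean, true_std = 1.0, 2.0  # hypothetical reference distribution
    variants = {
        "matched": (1.0, 2.0),
        "shifted mean": (3.0, 2.0),
        "narrower std": (1.0, 0.5),
    }
    for name, (m, s) in variants.items():
        w, b = conditional_mean_map(a, m, s, noise_std)
        print(name, exact_mse(w, b, a, true_mean, true_std, noise_std))
```

If measured errors from trained networks track these predicted values as only the moments of the sampling distribution change, that supports the claim; a systematic departure with architecture and physics terms held fixed would count against it.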

Figures

Figures reproduced from arXiv: 2605.25177 by Jodi Mead, Sandra R. Babyale.

**Figure 2.** Wing problem: best-performing architecture per prior family under the …

**Figure 3.** Wing problem: best-performing NN (solid curves, lowest …

**Figure 4.** Subsurface interface inversion: best-performing architecture per prior family …

**Figure 5.** Subsurface interface inversion: effect of covariance structure on reconstruction …

**Figure 6.** Subsurface interface inversion: best-performing NN (solid curves, lowest …

**Figure 7.** Cross-well seismic tomography: (a) true velocity field and Gauss-Newton …

**Figure 8.** Cross-well seismic tomography: best-performing architecture per prior family …

**Figure 9.** Cross-well seismic tomography: effect of covariance structure on reconstruction …

Original abstract

Neural networks have emerged as effective tools for solving ill-posed inverse problems. In many scientific applications, however, observational training data are insufficient, and learned inverse operators must instead be trained on synthetic data generated from the forward model. This requires specifying unknown parameters in the forward model and solving the model to generate synthetic observations. Typically, the unknown parameters are sampled from a prescribed probability distribution. Here, we show that this sampling strategy is not a neutral preprocessing step, but instead defines an implicit regularization operator. This result follows from the fact that the learned inverse operator minimizes empirical risk together with the classical result that conditional expectation minimizes mean-square error. We present theoretical results for the implicit regularization operator in both infinite- and finite-data settings, including Physics Informed Neural Networks (PINNs). These results are demonstrated numerically on three inverse problems of increasing complexity: a 1D linear Fredholm integral equation, a 1D nonlinear subsurface interface inversion, and a 2D nonlinear cross-well seismic traveltime tomography problem. Across all three problems, three distinct sources of regularization are identified in the learned operator: prior sampling, architectural, and physics-informed regularization. A mismatched sampling distribution is shown to degrade reconstruction quality in ways that neither more expressive architectures nor augmented physics residuals can fully correct. The results demonstrate that the sampling distribution should be chosen with the same care as a classical regularization functional and provide a practical framework for implementing more sophisticated regularization operators using neural networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Sampling distributions act as implicit regularization via conditional expectation in learned inverses, with solid numerics on three problems but an open question on the PINN case.


The main result is that the distribution used to sample parameters when generating synthetic training pairs is not neutral: it defines an implicit regularization operator because the trained network approximates the conditional expectation of the parameters given the observations. This follows from standard empirical risk minimization plus the classical property that conditional expectation minimizes mean-square error.

The paper does a clean job laying out the theory for both infinite- and finite-data settings and then testing it on three inverse problems that increase in complexity: a linear Fredholm integral equation, a nonlinear subsurface interface problem, and 2D cross-well seismic tomography. The experiments separate prior-sampling regularization from architectural and physics-informed contributions, and they show that a mismatched sampling distribution degrades results in ways that neither richer networks nor extra physics residuals fully repair.

The soft spot is the PINN extension. The abstract includes PINNs in the theoretical claims, yet standard PINN training minimizes a physics residual over collocation points rather than a direct supervised term on sampled parameter-observation pairs. If the manuscript does not supply a specific argument showing why the conditional-expectation equivalence still holds under that objective, the PINN part rests on an unverified step. The numerical results themselves look careful and the citation pattern is appropriate.
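The gap can be made explicit by writing the objectives side by side. The block below is a sketch in assumed notation, not the paper's formulation: $G_\theta$ is the network, $F$ the forward map, and $(x_i, y_i)$ are synthetic pairs with $x_i$ drawn from the sampling distribution. The residual shown is one hypothetical data-consistency form; the paper's PINN loss may instead use a differential-equation residual at collocation points.

```latex
% Assumed notation (not from the paper): G_\theta network, F forward map,
% (x_i, y_i) synthetic training pairs, \lambda >= 0 a residual weight.
\begin{align*}
\mathcal{L}_{\mathrm{sup}}(\theta)
  &= \frac{1}{N} \sum_{i=1}^{N} \lVert G_\theta(y_i) - x_i \rVert^2, \\
\mathcal{L}_{\mathrm{phys}}(\theta)
  &= \frac{1}{N} \sum_{i=1}^{N} \lVert F\bigl(G_\theta(y_i)\bigr) - y_i \rVert^2, \\
\mathcal{L}_{\lambda}(\theta)
  &= \mathcal{L}_{\mathrm{sup}}(\theta) + \lambda\, \mathcal{L}_{\mathrm{phys}}(\theta).
\end{align*}
```

With $\lambda = 0$ the conditional-expectation argument applies as stated. The residual term never compares the output with $x_i$, so its population minimizer need not be $\mathbb{E}[x \mid y]$; for example, with an invertible linear $F$ and noisy data it is $F^{-1} y$, which ignores the sampling distribution. That is the step for which the note asks for an explicit argument.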

This is for people who train neural solvers for ill-posed inverse problems when real data are scarce. A reader working on training-data design or regularization in scientific machine learning will find the perspective and the three-problem comparison useful. The work is coherent on its own terms and grounded enough to merit a serious referee.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that the choice of sampling distribution for generating synthetic training pairs (observations, parameters) from a forward model is not neutral but defines an implicit regularization operator on the learned inverse. This follows because the neural network minimizes empirical risk (MSE) on those pairs and the conditional expectation E[parameter | observation] under the sampling measure is the unique minimizer of population MSE. Theoretical results are derived for both infinite- and finite-data regimes and are stated to extend to Physics-Informed Neural Networks; the claim is illustrated on three inverse problems of increasing complexity, with the sampling distribution, network architecture, and physics residual each identified as distinct regularization sources.

Significance. If the central equivalence holds, the result is significant because it supplies a precise statistical interpretation of a ubiquitous but rarely analyzed preprocessing choice in learned inverse problems, elevating parameter sampling to the same status as an explicit regularization functional. The explicit identification of three regularization mechanisms and the demonstration that a mismatched prior cannot be fully compensated by architecture or physics terms are practically useful. The paper receives credit for stating results in both population and finite-sample settings and for including PINNs within the theoretical scope.

major comments (2)

[theoretical results for PINNs] The section presenting theoretical results for PINNs: the central derivation equates the learned operator to the conditional expectation because the network minimizes empirical MSE on (observation, parameter) pairs. Standard PINN training, however, minimizes a physics-residual loss over collocation points (plus optional data misfit) without an explicit supervised term ||NN(obs) - true_param||^2. The manuscript must supply the additional argument showing that the residual objective induces the same population risk; without it the inclusion of PINNs in the theoretical claims is not supported.
[numerical experiments] Numerical experiments on the three inverse problems (1D Fredholm, 1D subsurface, 2D cross-well tomography): the claim that mismatched sampling degrades reconstructions in ways that neither more expressive architectures nor augmented physics residuals can fully correct is load-bearing for the practical recommendation. The experiments must demonstrate that the architecture and physics variants were varied while holding the sampling distribution fixed at the mismatched choice; otherwise the separation of the three regularization sources is not cleanly established.
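The control requested in the second major comment amounts to a factorial design in which only architecture and physics weight vary. The sketch below shows one way to enumerate such runs; every configuration value and key name is hypothetical and none is taken from the paper.

```python
from itertools import product

# Hypothetical settings. The sampling distribution is held fixed at one
# mismatched choice, while architecture and physics-residual weight vary.
SAMPLING = {"family": "gaussian", "mean": 0.0, "std": 0.5}
ARCHITECTURES = [
    {"depth": 2, "width": 32},
    {"depth": 4, "width": 64},
    {"depth": 8, "width": 128},
]
PHYSICS_WEIGHTS = [0.0, 0.1, 1.0]


def build_runs(sampling, architectures, physics_weights):
    """Enumerate every (architecture, physics weight) pair under one fixed sampling choice."""
    return [
        {"sampling": dict(sampling), "architecture": dict(arch), "physics_weight": lam}
        for arch, lam in product(architectures, physics_weights)
    ]


runs = build_runs(SAMPLING, ARCHITECTURES, PHYSICS_WEIGHTS)
for run in runs:
    print(run)
```

Repeating the same grid once per sampling distribution would give a full three-way comparison, in which differences within a grid can be attributed to architecture or physics terms and differences between grids to the sampling choice.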

minor comments (2)

[theoretical development] Notation for the sampling distribution and the induced measure on observations should be introduced once and used consistently; the current alternation between p(\theta) and the push-forward measure occasionally obscures whether statements refer to the parameter prior or the induced observation distribution.
[figures] Figure captions for the reconstruction-error plots should state the precise sampling distributions used in each panel so that the mismatch experiments can be reproduced without consulting the main text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments, which help clarify the scope of our theoretical claims and the presentation of the experiments. We respond to each major comment below.


Referee: [theoretical results for PINNs] The section presenting theoretical results for PINNs: the central derivation equates the learned operator to the conditional expectation because the network minimizes empirical MSE on (observation, parameter) pairs. Standard PINN training, however, minimizes a physics-residual loss over collocation points (plus optional data misfit) without an explicit supervised term ||NN(obs) - true_param||^2. The manuscript must supply the additional argument showing that the residual objective induces the same population risk; without it the inclusion of PINNs in the theoretical claims is not supported.

Authors: We agree that the core derivation relies on minimization of the supervised empirical MSE, which yields the conditional expectation. The manuscript states that the results extend to PINNs, but does not supply a separate argument showing that a pure residual loss induces equivalent population risk. In revision we will either restrict the theoretical statements to supervised settings or add the required argument, for example by considering PINN formulations that incorporate a parameter data-misfit term or by analyzing the residual objective in the limit of the inverse problem. We will include this clarification as a new subsection.
Revision: yes
Referee: [numerical experiments] Numerical experiments on the three inverse problems (1D Fredholm, 1D subsurface, 2D cross-well tomography): the claim that mismatched sampling degrades reconstructions in ways that neither more expressive architectures nor augmented physics residuals can fully correct is load-bearing for the practical recommendation. The experiments must demonstrate that the architecture and physics variants were varied while holding the sampling distribution fixed at the mismatched choice; otherwise the separation of the three regularization sources is not cleanly established.

Authors: The experiments were performed by holding the sampling distribution fixed at each mismatched choice while separately varying architecture (depth, width, activations) and physics residuals (added or removed terms, different collocation densities). These controlled comparisons appear in the result figures for all three problems. To make the isolation of regularization sources explicit, we will add direct statements in the text and figure captions confirming that sampling remains fixed in the architecture and physics variants. This will strengthen the presentation without altering the experimental design.
Revision: partial

Circularity Check

0 steps flagged

No circularity; derivation applies standard conditional-expectation property to ERM objective


The paper's central claim states that sampling defines implicit regularization because the learned operator minimizes empirical risk and conditional expectation minimizes MSE. This is a direct invocation of a classical, externally verifiable result (not derived within the paper) applied to the supervised training objective on synthetic pairs. No step reduces a prediction to a fitted input by construction, renames a known result, or relies on a load-bearing self-citation chain. The extension to PINNs is asserted under the paper's stated assumptions without exhibiting a self-referential reduction. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The derivation rests on one standard mathematical axiom and one modeling assumption about training; the sampling distribution itself is the main free choice that determines the regularization strength.

free parameters (1)

parameter sampling distribution
The probability distribution from which forward-model parameters are drawn is chosen by the user and directly sets the implicit regularization operator.

axioms (1)

standard math: conditional expectation minimizes mean-square error
Invoked to identify the implicit regularization operator that the learned inverse map converges to.
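For reference, the axiom follows from a short decomposition. The block below is a standard textbook argument written in assumed notation, not copied from the paper: $x$ is the parameter, $y$ the observation, and $G$ any square-integrable estimator.

```latex
% Assumed notation: x parameter, y observation, G any measurable map with
% E||G(y)||^2 < \infty; \hat{x}(y) = E[x | y] denotes the conditional expectation.
\begin{align*}
\mathbb{E}\,\lVert G(y) - x \rVert^2
  &= \mathbb{E}\,\lVert G(y) - \hat{x}(y) \rVert^2
   + \mathbb{E}\,\lVert \hat{x}(y) - x \rVert^2
   + 2\,\mathbb{E}\,\bigl\langle G(y) - \hat{x}(y),\ \hat{x}(y) - x \bigr\rangle, \\
\mathbb{E}\,\bigl\langle G(y) - \hat{x}(y),\ \hat{x}(y) - x \bigr\rangle
  &= \mathbb{E}\,\Bigl[ \bigl\langle G(y) - \hat{x}(y),\
       \mathbb{E}[\hat{x}(y) - x \mid y] \bigr\rangle \Bigr] = 0, \\
\mathbb{E}\,\lVert G(y) - x \rVert^2
  &= \mathbb{E}\,\lVert G(y) - \hat{x}(y) \rVert^2
   + \mathbb{E}\,\lVert \hat{x}(y) - x \rVert^2
   \ \ge\ \mathbb{E}\,\lVert \hat{x}(y) - x \rVert^2 .
\end{align*}
```

The cross term vanishes by the tower property, since $G(y) - \hat{x}(y)$ depends only on $y$ and $\mathbb{E}[\hat{x}(y) - x \mid y] = 0$. Equality holds exactly when $G(y) = \hat{x}(y)$ almost surely, and because the expectation is taken under the joint law of $(x, y)$, the minimizer inherits its dependence on the sampling distribution of $x$.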

pith-pipeline@v0.9.1-grok · 5785 in / 1342 out tokens · 41926 ms · 2026-06-29T23:30:47.865556+00:00 · methodology

