Adjoint-Compatible Surrogates of the Expected Information Gain for Optimal Experimental Design
Pith reviewed 2026-05-14 22:29 UTC · model grok-4.3
The pith
Surrogates of the expected information gain enable adjoint-based optimization for Bayesian experimental design.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce adjoint-compatible surrogates of the expected information gain (EIG) based on an exact chain-rule decomposition and tractable approximations of the posterior distribution of the unknown parameter. This leads to two surrogate criteria: an instantaneous surrogate, obtained by replacing the posterior with the prior, and a Gaussian tilting surrogate, obtained by reweighting the prior through a design-driven quadratic information factor. They also propose a multi-center tilting surrogate to improve robustness for complex or multimodal priors. Theoretical properties include exactness of the Gaussian tilting surrogate in the linear-Gaussian setting, and the surrogates are shown to be competitive in nearly Gaussian regimes.
What carries the argument
Adjoint-compatible EIG surrogates constructed via chain-rule decomposition and posterior approximations including prior replacement and Gaussian tilting.
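For orientation, the standard definition of the EIG and the chain rule of mutual information that such a decomposition presumably builds on can be written as below. The notation is an assumption made here for illustration: θ is the unknown parameter, y_{1:N} are observations at N sampling times, and ξ is the design. The paper treats controlled ODEs, so its own decomposition may be stated in continuous time or in a different form.

```latex
\mathrm{EIG}(\xi)
  = I(\theta; y_{1:N} \mid \xi)
  = \mathbb{E}_{y_{1:N} \mid \xi}\!\left[
      D_{\mathrm{KL}}\big( p(\theta \mid y_{1:N}, \xi) \,\|\, p(\theta) \big)
    \right],
\qquad
I(\theta; y_{1:N} \mid \xi)
  = \sum_{k=1}^{N} I(\theta; y_k \mid y_{1:k-1}, \xi).
```

Each summand involves the posterior given earlier observations, p(θ | y_{1:k-1}, ξ). On this reading, that is where the posterior approximations (prior replacement or tilting) would enter to make the sum tractable.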
If this is right
- The surrogates yield time-additive objectives that are compatible with adjoint-based optimal control methods.
- The Gaussian tilting surrogate is exactly equal to the true EIG in linear-Gaussian settings.
- The proposed surrogates remain competitive with full EIG in nearly Gaussian regimes.
- They provide clearer benefits over Fisher-based designs when the prior uncertainty is non-Gaussian or multimodal.
- The multi-center tilting surrogate enhances robustness for complex or multimodal priors.
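The linear-Gaussian case referred to above can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a model y_k = g_k · θ + noise with a Gaussian prior and independent Gaussian noise, and all names (`eig_linear_gaussian`, `eig_chain_rule`, `G`, `prior_cov`, `noise_var`) are hypothetical. It compares the closed-form EIG, written through the additive Fisher information, with the same quantity accumulated one observation at a time via the chain rule.

```python
import numpy as np

def eig_linear_gaussian(G, prior_cov, noise_var):
    """Closed-form EIG for y = G @ theta + noise with a Gaussian prior."""
    d = prior_cov.shape[0]
    # Fisher information is additive over observations: sum_k g_k g_k^T / noise_var
    fisher = G.T @ G / noise_var
    _, logdet = np.linalg.slogdet(np.eye(d) + prior_cov @ fisher)
    return 0.5 * logdet

def eig_chain_rule(G, prior_cov, noise_var):
    """Same EIG, accumulated one scalar observation at a time."""
    cov = prior_cov.copy()
    total = 0.0
    for g in G:
        s = g @ cov @ g                        # variance of g . theta under current posterior
        total += 0.5 * np.log1p(s / noise_var)  # conditional mutual information of this observation
        gain = cov @ g / (s + noise_var)
        cov = cov - np.outer(gain, g @ cov)     # Kalman-style covariance update
    return total

rng = np.random.default_rng(0)
G = rng.normal(size=(6, 3))            # hypothetical sensitivities, one row per observation
prior_cov = np.diag([1.0, 0.5, 2.0])
noise_var = 0.1
closed = eig_linear_gaussian(G, prior_cov, noise_var)
summed = eig_chain_rule(G, prior_cov, noise_var)
print(closed, summed)
```

The two values agree up to floating-point error because, in this setting, the posterior covariance does not depend on the observed data, so each chain-rule term has a closed form.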
Where Pith is reading between the lines
- These surrogates could facilitate the application of information-theoretic design criteria to larger-scale problems where full EIG evaluation is prohibitive.
- Similar approximation strategies might be developed for other information measures in optimal design beyond the EIG.
- Extension to systems with stochastic dynamics or partial observations could broaden the method's applicability.
- Validation through closed-loop experiments would test if the surrogate-optimized designs indeed yield higher information in practice.
Load-bearing premise
The approximations to the posterior distribution used in the surrogates are sufficiently accurate to retain the advantages of the full expected information gain over Fisher-based criteria, especially under strong nonlinearities or non-Gaussian priors.
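To make the idea of reweighting a prior by a quadratic factor concrete, here is a small one-dimensional sketch. It is an illustration under stated assumptions, not the paper's tilting surrogate: the factor exp(-0.5 * info * (theta - center)^2) and the names `tilt_on_grid`, `info`, and `center` are hypothetical stand-ins for the design-driven quantities. For a Gaussian prior the tilted density is again Gaussian with precision 1/prior_var + info, which is consistent with exactness in the linear-Gaussian setting; for a bimodal prior the tilted density remains a mixture.

```python
import numpy as np

def tilt_on_grid(theta, prior_density, info, center):
    """Reweight a gridded 1D prior by a quadratic factor and renormalize to unit mass."""
    weights = prior_density * np.exp(-0.5 * info * (theta - center) ** 2)
    return weights / np.sum(weights)

def moments(theta, mass):
    """Mean and variance of a discrete distribution on the grid."""
    mean = np.sum(mass * theta)
    var = np.sum(mass * (theta - mean) ** 2)
    return mean, var

theta = np.linspace(-10.0, 10.0, 20001)

# Gaussian prior N(0, prior_var): tilting gives a Gaussian with summed precisions.
prior_var, info, center = 2.0, 3.0, 0.7
gauss_prior = np.exp(-0.5 * theta**2 / prior_var)
mean, var = moments(theta, tilt_on_grid(theta, gauss_prior, info, center))
expected_var = 1.0 / (1.0 / prior_var + info)
expected_mean = expected_var * info * center
print(mean, expected_mean, var, expected_var)

# Symmetric bimodal prior: the tilted density keeps two modes, so a single
# Gaussian summary (mean and variance) no longer describes it fully.
bimodal_prior = np.exp(-0.5 * (theta - 2.0) ** 2 / 0.25) + np.exp(-0.5 * (theta + 2.0) ** 2 / 0.25)
mean_bi, var_bi = moments(theta, tilt_on_grid(theta, bimodal_prior, 0.5, 0.0))
print(mean_bi, var_bi)
```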
What would settle it
Computing the true EIG of a surrogate-optimized design in a strongly nonlinear system with a multimodal prior, and finding that it falls below the true EIG of a Fisher-optimized design.
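Such a check needs a reference estimate of the true EIG for a fixed design. One standard option is a nested Monte Carlo estimator; the sketch below assumes additive Gaussian noise with known standard deviation, and the function names (`nested_mc_eig`, `forward`, `sample_prior`) are hypothetical. It is validated here only on a linear toy model where the exact value is known; for a strongly nonlinear ODE model the sample sizes would likely need to be much larger.

```python
import numpy as np

def log_likelihood(y, pred, noise_std):
    """Gaussian log-density of y given prediction pred, independent components."""
    resid = (y - pred) / noise_std
    m = pred.shape[-1]
    return -0.5 * np.sum(resid**2, axis=-1) - m * np.log(noise_std * np.sqrt(2.0 * np.pi))

def nested_mc_eig(forward, sample_prior, noise_std, n_outer, n_inner, rng):
    """Nested Monte Carlo estimate of E[log p(y | theta) - log p(y)]."""
    theta_outer = sample_prior(n_outer, rng)
    theta_inner = sample_prior(n_inner, rng)
    pred_outer = forward(theta_outer)            # shape (n_outer, m)
    pred_inner = forward(theta_inner)            # shape (n_inner, m)
    y = pred_outer + noise_std * rng.normal(size=pred_outer.shape)
    log_lik = log_likelihood(y, pred_outer, noise_std)
    # log p(y_i | theta'_j) for every outer/inner pair, shape (n_outer, n_inner)
    log_pair = log_likelihood(y[:, None, :], pred_inner[None, :, :], noise_std)
    mx = log_pair.max(axis=1, keepdims=True)     # stabilize the log-mean-exp
    log_evidence = mx[:, 0] + np.log(np.mean(np.exp(log_pair - mx), axis=1))
    return float(np.mean(log_lik - log_evidence))

# Linear toy model with a standard normal prior, used only to validate the estimator.
rng = np.random.default_rng(1)
G = np.array([[1.0], [0.5]])
noise_std = 0.5
estimate = nested_mc_eig(
    lambda th: th @ G.T,
    lambda n, r: r.normal(size=(n, 1)),
    noise_std, 1000, 1000, rng,
)
exact = 0.5 * np.log(np.linalg.det(np.eye(1) + G.T @ G / noise_std**2))
print(estimate, exact)
```

Evaluating this estimator at both a surrogate-optimized and a Fisher-optimized design, with the same random seed and sample sizes, would give the comparison described above.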
Original abstract
We consider optimal experimental design for parameter estimation in dynamical systems governed by controlled ordinary differential equations. In such problems, Fisher-based criteria are attractive because they lead to time-additive objectives compatible with adjoint-based optimal control, but they remain intrinsically local and may perform poorly under strong nonlinearities or non-Gaussian prior uncertainty. By contrast, the expected information gain (EIG) provides a principled Bayesian objective, yet it is typically too costly to evaluate and does not naturally admit an adjoint-compatible formulation. In this work, we introduce adjoint-compatible surrogates of the EIG based on an exact chain-rule decomposition and tractable approximations of the posterior distribution of the unknown parameter. This leads to two surrogate criteria: an instantaneous surrogate, obtained by replacing the posterior with the prior, and a Gaussian tilting surrogate, obtained by reweighting the prior through a design-driven quadratic information factor. We also propose a multi-center tilting surrogate to improve robustness for complex or multimodal priors. We establish theoretical properties of these surrogates, including exactness of the Gaussian tilting surrogate in the linear-Gaussian setting, and illustrate their behavior on benchmark controlled dynamical systems. The results show that the proposed surrogates remain competitive in nearly Gaussian regimes and provide clearer benefits over Fisher-based designs when prior uncertainty is non-Gaussian or multimodal.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes adjoint-compatible surrogate criteria for the expected information gain (EIG) in optimal experimental design for parameter estimation in controlled ODE systems. The surrogates are constructed from an exact chain-rule decomposition of the EIG together with tractable posterior approximations: an instantaneous surrogate that replaces the posterior by the prior, a Gaussian tilting surrogate that reweights the prior by a design-dependent quadratic information factor, and a multi-center tilting variant for multimodal priors. Exactness of the Gaussian tilting surrogate is proven in the linear-Gaussian setting; numerical results on benchmark dynamical systems are presented to illustrate competitiveness with Fisher criteria, with claimed advantages under non-Gaussian or multimodal priors.
Significance. If the surrogates can be shown to deliver designs that meaningfully outperform Fisher-based criteria while preserving adjoint compatibility and computational tractability, the work would offer a practical route to Bayesian OED for nonlinear dynamical systems. The chain-rule decomposition and the exact linear-Gaussian result are clear technical strengths that could be leveraged in scalable optimal-control formulations.
major comments (3)
- [§4.2, Theorem 4.1] §4.2 (Gaussian tilting surrogate definition) and Theorem 4.1: exactness holds only in the linear-Gaussian case; no a priori bounds are derived on the total-variation or KL distance between the tilted measure and the true posterior as a function of the Lipschitz constant of the forward map or the magnitude of the quadratic tilting factor. This gap is load-bearing for the central claim that the surrogates retain EIG advantages over Fisher criteria under strong nonlinearities.
- [§5] §5 (numerical benchmarks): the reported comparisons show competitiveness in near-Gaussian regimes but provide no quantitative assessment of how the approximation error grows with nonlinearity or prior non-Gaussianity relative to the EIG–Fisher performance gap; without such diagnostics it is unclear whether the observed benefits survive once the tilting error exceeds the information-theoretic advantage.
- [§3.1] §3.1 (chain-rule decomposition): while the decomposition itself is exact, the subsequent replacement of the posterior by the prior (instantaneous surrogate) or by the tilted measure is introduced without an accompanying error-propagation analysis that would justify the claim of retained Bayesian advantages for general ODEs.
minor comments (2)
- [Eq. (12)] The quadratic information factor appearing in the tilting surrogate (Eq. (12)) would benefit from an explicit line-by-line derivation showing how the design enters the exponent.
- [§1] A short paragraph contrasting the proposed surrogates with existing adjoint-compatible information criteria (e.g., those based on the Fisher information matrix) would clarify the precise novelty.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify the scope and limitations of our work. We address each major point below and outline the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: [§4.2, Theorem 4.1] §4.2 (Gaussian tilting surrogate definition) and Theorem 4.1: exactness holds only in the linear-Gaussian case; no a priori bounds are derived on the total-variation or KL distance between the tilted measure and the true posterior as a function of the Lipschitz constant of the forward map or the magnitude of the quadratic tilting factor. This gap is load-bearing for the central claim that the surrogates retain EIG advantages over Fisher criteria under strong nonlinearities.
  Authors: We agree that Theorem 4.1 establishes exactness of the Gaussian tilting surrogate exclusively in the linear-Gaussian setting, and that no general a priori error bounds (in total variation or KL divergence) are derived in terms of the forward map's Lipschitz constant or the tilting factor. In the revised manuscript we will explicitly restate the theorem's scope, add a dedicated paragraph in §4.2 discussing this limitation, and note that obtaining such bounds for nonlinear ODEs is an open question left for future work. The numerical results in §5 nevertheless provide empirical support for practical utility under moderate nonlinearities.
  Revision: partial
- Referee: [§5] §5 (numerical benchmarks): the reported comparisons show competitiveness in near-Gaussian regimes but provide no quantitative assessment of how the approximation error grows with nonlinearity or prior non-Gaussianity relative to the EIG–Fisher performance gap; without such diagnostics it is unclear whether the observed benefits survive once the tilting error exceeds the information-theoretic advantage.
  Authors: We acknowledge that the benchmarks in §5 illustrate competitiveness mainly in near-Gaussian regimes without supplying explicit quantitative diagnostics (e.g., error-vs-nonlinearity curves or scaling of tilting error relative to the EIG–Fisher gap). In the revision we will augment §5 with additional experiments on the benchmark systems that plot the surrogate approximation error against increasing nonlinearity strength and prior non-Gaussianity, together with a direct comparison to the observed performance gap versus Fisher criteria.
  Revision: yes
- Referee: [§3.1] §3.1 (chain-rule decomposition): while the decomposition itself is exact, the subsequent replacement of the posterior by the prior (instantaneous surrogate) or by the tilted measure is introduced without an accompanying error-propagation analysis that would justify the claim of retained Bayesian advantages for general ODEs.
  Authors: The chain-rule decomposition in §3.1 is exact by construction. However, we did not provide a full error-propagation analysis quantifying how the posterior approximation errors translate into EIG surrogate error for general nonlinear ODEs. In the revised version we will insert a short subsection following §3.1 that uses the exact decomposition to derive a first-order propagation bound and discusses the conditions under which the surrogate retains a Bayesian advantage over Fisher information.
  Revision: partial
Circularity Check
No circularity: surrogates defined via explicit decomposition and approximations
The derivation introduces two EIG surrogates (instantaneous and Gaussian-tilting) from an exact chain-rule decomposition of the expected information gain together with explicit, tractable replacements for the posterior (prior substitution or design-driven quadratic reweighting). These constructions are stated directly in terms of the forward map, prior, and design variables; they do not reduce by the paper's own equations to quantities that are fitted from the target EIG or defined in terms of the surrogates themselves. Exactness is shown only for the linear-Gaussian case by direct substitution, which is a verification rather than a tautology. No load-bearing self-citations, uniqueness theorems imported from prior work by the same authors, or renaming of known empirical patterns appear in the derivation chain. The central claim therefore remains independent of its inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] Exact chain-rule decomposition of the expected information gain
- [domain assumption] Tractable approximations of the posterior are sufficient to preserve EIG benefits
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  "We introduce adjoint-compatible surrogates of the EIG based on an exact chain-rule decomposition and tractable approximations of the posterior distribution... Gaussian tilting surrogate, obtained by reweighting the prior through a design-driven quadratic information factor."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  "the Gaussian tilting surrogate is exact in the linear-Gaussian setting"