pith.

arxiv: 2606.09949 · v1 · pith:DK53T7FXnew · submitted 2026-06-08 · 💻 cs.LG · cs.AI

Learning Where to Simulate: Generative Active Sampling for Online PDE Surrogate Training

Pith reviewed 2026-06-27 17:18 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords PDE surrogates · active sampling · diffusion models · online training · generative active learning · surrogate modeling · challenging dynamics · tail error statistics

The pith

OGAS trains a parallel diffusion model to steer PDE solver parameters toward configurations that challenge the surrogate, cutting tail errors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that uniform sampling of PDE configuration parameters leaves surrogates vulnerable to high errors on rare but difficult trajectories. OGAS couples data generation and training by running a fast diffusion model that takes surrogate difficulty signals and outputs parameters likely to produce hard dynamics. Target difficulty values are drawn from a prior biased toward high difficulty, which continuously shifts the sampling distribution without slowing the workflow. This yields surrogates with lower errors above the 99th percentile and smaller overall error spread across the Kuramoto-Sivashinsky, Navier-Stokes, and Gray-Scott systems.

Core claim

OGAS introduces an online active sampling loop in which a diffusion model is trained concurrently with the surrogate to map surrogate-derived difficulty signals to configuration parameters. By conditioning the diffusion model on target signals drawn from a prior that favors high difficulty, the method generates training trajectories that expose weaknesses in the current surrogate, producing consistent gains in tail statistics and error dispersion relative to uniform sampling while adding negligible wall-clock cost.
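To make the loop concrete, here is a toy, single-process sketch of the cycle described above. Everything in it is a hypothetical stand-in rather than the paper's implementation: a 1D configuration parameter, an analytic "surrogate error" that is largest near a hard region and shrinks where data has already been generated, and a nearest-match conditional sampler in place of the diffusion model. In OGAS the solver, surrogate training and sampler run in parallel; here they are interleaved sequentially for readability.

```python
import math
import random


def surrogate_error(theta, seen):
    """Hypothetical difficulty signal: intrinsically hard near theta = 0.8,
    and lower wherever many training samples have already been generated."""
    intrinsic = 0.1 + math.exp(-((theta - 0.8) / 0.05) ** 2)
    coverage = sum(1 for s in seen if abs(s - theta) < 0.02)
    return intrinsic / (1.0 + 0.1 * coverage)


def conditional_sample(pairs, target, rng, jitter=0.01):
    """Stand-in for the conditional diffusion sampler (assumption): return the
    parameter whose observed difficulty is closest to the target, plus small noise."""
    theta, _ = min(pairs, key=lambda p: abs(p[1] - target))
    return min(1.0, max(0.0, theta + rng.gauss(0.0, jitter)))


def online_loop(steps=200, warmup=20, seed=0):
    rng = random.Random(seed)
    seen, pairs = [], []  # generated parameters and (parameter, difficulty) labels
    for step in range(steps):
        if step < warmup:
            theta = (step + 0.5) / warmup  # uniform grid warm-up
        else:
            # Toy prior biased toward high difficulty: target an upper-decile
            # value of the difficulties observed so far.
            ranked = sorted(d for _, d in pairs)
            target = ranked[int(rng.uniform(0.9, 1.0) * (len(ranked) - 1))]
            theta = conditional_sample(pairs, target, rng)
        # Stand-in for "run the solver, train on the trajectory, read the signal".
        loss = surrogate_error(theta, seen)
        seen.append(theta)
        pairs.append((theta, loss))
    return seen


samples = online_loop()
frac_hard = sum(1 for s in samples if abs(s - 0.8) < 0.1) / len(samples)
print(f"fraction of samples in the hard region: {frac_hard:.2f}")
```

A uniform sampler would place about 20% of samples in the hard band [0.7, 0.9]; the steered loop places most of them there. Note that stored difficulty labels are never refreshed in this sketch, which is exactly the staleness concern raised under the load-bearing premise.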

What carries the argument

The conditional diffusion model that serves as a fast, reactive sampler: it receives a difficulty signal from the surrogate and produces configuration parameters that steer the PDE solver toward challenging regimes.
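The sampling side of such a model can be illustrated with standard DDPM ancestral sampling plus classifier-free guidance. This is a minimal sketch under loud assumptions: the "parameter" is a scalar, the schedule (200 steps, linear betas) is arbitrary, and the learned noise predictor is replaced by the exact predictor for a toy Gaussian whose mean depends on the requested difficulty `c` through a hypothetical map `mu(c)`. The guidance combination `(1 + w) * eps_cond - w * eps_null` is the usual classifier-free form; nothing here reproduces the paper's network.

```python
import math
import random

# Hypothetical schedule: K steps with linearly increasing betas.
K = 200
BETAS = [1e-4 + (0.04 - 1e-4) * k / (K - 1) for k in range(K)]
ALPHA_BARS = []
_prod = 1.0
for _b in BETAS:
    _prod *= 1.0 - _b
    ALPHA_BARS.append(_prod)

S = 0.05  # toy spread of parameters for a fixed difficulty


def mu(c):
    """Toy map from a target difficulty c in [0, 1] to a 1D parameter (assumption)."""
    return 2.0 * c - 1.0


def eps_gaussian(x, k, mean, std):
    """Exact noise prediction E[eps | x_k] when x_0 ~ N(mean, std^2);
    stands in for a trained network."""
    ab = ALPHA_BARS[k]
    var = ab * std ** 2 + 1.0 - ab
    return math.sqrt(1.0 - ab) * (x - math.sqrt(ab) * mean) / var


def sample(c, w=0.0, rng=None):
    """Ancestral DDPM sampling conditioned on difficulty c, guidance weight w."""
    rng = rng or random.Random()
    x = rng.gauss(0.0, 1.0)
    for k in reversed(range(K)):
        eps_c = eps_gaussian(x, k, mu(c), S)  # conditional prediction
        eps_u = eps_gaussian(x, k, 0.0, 1.0)  # toy null-condition prediction
        eps = (1.0 + w) * eps_c - w * eps_u   # classifier-free guidance
        beta, ab = BETAS[k], ALPHA_BARS[k]
        x = (x - beta / math.sqrt(1.0 - ab) * eps) / math.sqrt(1.0 - beta)
        if k > 0:
            x += math.sqrt(beta) * rng.gauss(0.0, 1.0)  # sigma_k^2 = beta_k
    return x


rng = random.Random(0)
draws = [sample(0.9, rng=rng) for _ in range(500)]
print(sum(draws) / len(draws))  # should land close to mu(0.9) = 0.8
```

Asking for a higher difficulty simply means calling `sample` with a larger `c`; the cost per draw is one short reverse chain, which is why a small sampler of this kind can keep pace with the solver.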

If this is right

  • Errors above the 99th percentile decrease substantially compared with uniform sampling.
  • Overall error dispersion shrinks across the test distribution.
  • Worst-case reliability of the surrogate improves for the same training budget.
  • Wall-time overhead remains negligible despite the added generative model.
  • Average error may rise slightly as a direct trade-off for the tail gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same generative steering loop could be applied to other expensive simulation domains such as molecular dynamics where uniform parameter sampling also misses rare events.
  • Because the sampler updates online, it may enable fully adaptive training pipelines that require fewer total solver calls to reach a target reliability level.
  • Transfer of the learned sampler across different surrogate architectures or to higher-dimensional PDEs remains an open extension not addressed in the work.

Load-bearing premise

The diffusion model must keep mapping surrogate difficulty signals to parameters that actually produce harder dynamics even as the surrogate itself improves.

What would settle it

On any of the three tested 2D PDEs, run OGAS and uniform sampling to the same data budget; if the 99th-percentile error does not drop under OGAS, the central performance claim is false.
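The criterion can be stated as a small computation. A minimal sketch, assuming per-trajectory RMSE values on a fixed held-out test set are available for each method; the "improvement ratio" convention (Uniform statistic divided by OGAS statistic, so values above one favour OGAS) is our reading of the truncated Figure 2 caption, not a definition taken from the paper.

```python
import statistics


def rmse_stats(errors):
    """Summary statistics of per-trajectory RMSE: mean, std, 99th percentile, max."""
    # method="inclusive" interpolates linearly between order statistics.
    p99 = statistics.quantiles(errors, n=100, method="inclusive")[98]
    return {
        "mean": statistics.fmean(errors),
        "std": statistics.pstdev(errors),
        "p99": p99,
        "max": max(errors),
    }


def improvement_ratios(uniform_errors, ogas_errors):
    """Uniform / OGAS for each statistic; a ratio above 1 means OGAS is lower."""
    u, o = rmse_stats(uniform_errors), rmse_stats(ogas_errors)
    return {key: u[key] / o[key] for key in u}


def tail_claim_holds(uniform_errors, ogas_errors):
    """Falsification check: the claim fails if OGAS does not lower the p99 error."""
    return improvement_ratios(uniform_errors, ogas_errors)["p99"] > 1.0
```

Both error lists must come from the same test set and the same data budget; otherwise the ratio mixes sampling strategy with sample volume.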

Figures

Figures reproduced from arXiv: 2606.09949 by Abhishek Purandare (DATAMOVE), Bruno Raffin (DATAMOVE), Pierre Cesar (DATAMOVE), Sofya Dymchenko (DATAMOVE).

Figure 1: Online surrogate training workflow with OGAS. All components are running in parallel.
Figure 2: Improvement ratios over Uniform for normalized RMSE statistics; values above one…
Figure 3: OGAS samples GS parameters progressively focusing on hard regions. Tail compression. The dominant effect of OGAS is a strong compression of the high-error tail, directly improving robustness (RMSE-max, RMSE-p99) and consistency (RMSE-std). The right three columns of…
Figure 4: Loss distributions over seeds on 1D KdV and KS. Ablations. We probe three knobs of the loss-based OGAS on 1D KS (Appendix D.1). Bias correction reduces error dispersion by 1.26×, average by 1.22×, and worst-case by 1.2× compared to no correction (…
Figure 5: Gray-Scott trajectory grid. Visual snapshots from the 2D validation suite: Gray-Scott (reaction-diffusion patterns). All simulations are performed using the Exponax JAX-based spectral solver [23] and show timestep evolution across different domain size values
Figure 6: Navier-Stokes trajectory grid. Visual snapshots from the 2D validation suite: Navier-Stokes Kolmogorov Flow (fluid dynamics). All simulations are performed using the Exponax JAX-based spectral solver [23] and show timestep evolution across different diffusivity values sampled as L ∼ U[10, 130]. We discard a warmup of 200 internal steps prior to recording. The time delta used is ∆t = 0.5 and nsub = 5 intern…
Figure 7: Kuramoto-Sivashinsky trajectory grid. Visual snapshots from the 2D validation suite: Kuramoto-Sivashinsky (chaotic). All simulations are performed using the Exponax JAX-based spectral solver [23] and show timestep evolution across different domain size values mode sinusoidal forcing in the vorticity equation) and periodic boundaries. The domain extent is fixed to L = 2π, while viscosity is sampled as ν ∼ U…
Figure 8: Example of truncated Fourier IC sampling with various cutoff and taper power
Figure 9: Example of Gaussian blobs IC sampling with varying number of blobs
Figure 10: Navier-Stokes Trajectory Samples. Comparison of rollout predictions on difficult trajectories. We compare the ground truth (GT) against surrogates trained with Uniform sampling and OGAS.
Figure 11: Kuramoto-Sivashinsky Trajectory Samples. Comparison of rollout predictions on difficult trajectories. We compare the ground truth (GT) against surrogates trained with Uniform sampling and OGAS.
Figure 12: Gray-Scott Trajectory Samples. Comparison of rollout predictions on difficult trajectories. We compare the ground truth (GT) against surrogates trained with Uniform sampling and OGAS. … stable over the remaining timesteps. After this initial transition, the loss plateaus within a relatively narrow range (with occasional fluctuations, especially on Kuramoto–Sivashinsky), which is consistent with stable opti…
Figure 13: Mixture sampling evolution for Gray-Scott.
Figure 14: Mixture sampling evolution for Kuramoto-Sivashinsky.
Figure 15: Mixture sampling evolution for Navier-Stokes.
Figure 16: DDPM Training Loss Evolution (OGAS-L). The plots show the training loss (log scale) over optimization steps for the OGAS-L strategy, averaged over 3 seeds with standard deviation indicated by the shaded regions. Comparison is performed across three architectures: FNO, U-Net, and scOT.
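Figures 8 and 9 name two initial-condition families (truncated Fourier fields controlled by a cutoff and a taper power, and sums of Gaussian blobs controlled by their number) but give no formulas. The following is a hypothetical sketch of what such generators commonly look like on a periodic 2D grid; the parameter names `cutoff`, `taper`, `num_blobs` and `width`, and the exact functional forms, are our assumptions rather than the paper's definitions.

```python
import math
import random


def truncated_fourier_ic(n=32, cutoff=4, taper=2.0, seed=0):
    """Hypothetical zero-mean periodic field on an n x n grid: random Fourier modes
    with max(|kx|, |ky|) <= cutoff, amplitudes damped as (1 + |k|) ** (-taper)."""
    rng = random.Random(seed)
    modes = []
    for kx in range(-cutoff, cutoff + 1):
        for ky in range(-cutoff, cutoff + 1):
            if kx == 0 and ky == 0:
                continue  # drop the mean mode
            amp = (1.0 + math.hypot(kx, ky)) ** (-taper)
            modes.append((kx, ky, amp * rng.gauss(0.0, 1.0), rng.uniform(0.0, 2 * math.pi)))
    field = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x, y = 2 * math.pi * i / n, 2 * math.pi * j / n
            field[i][j] = sum(a * math.cos(kx * x + ky * y + ph) for kx, ky, a, ph in modes)
    return field


def gaussian_blobs_ic(n=32, num_blobs=3, width=0.1, seed=0):
    """Hypothetical field: num_blobs unit-height Gaussian bumps with random centres
    on the periodic unit square, using minimum-image distances."""
    rng = random.Random(seed)
    centres = [(rng.random(), rng.random()) for _ in range(num_blobs)]
    field = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x, y = i / n, j / n
            for cx, cy in centres:
                dx = min(abs(x - cx), 1.0 - abs(x - cx))
                dy = min(abs(y - cy), 1.0 - abs(y - cy))
                field[i][j] += math.exp(-(dx * dx + dy * dy) / (2 * width ** 2))
    return field
```

A larger `taper` suppresses high wavenumbers and gives smoother fields, while `cutoff` bounds the finest scale; in a setup like OGAS, such knobs would be among the configuration parameters the sampler steers.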
Abstract

Data-driven PDE surrogates are trained with data produced by numerical PDE solvers. However, when the surrogate's goal is to generalize across a wide range of PDE configurations (e.g., initial conditions and physical coefficients), generating a representative training set is non-trivial. Uniform sampling of configuration parameters often under-represents trajectories exhibiting challenging dynamics, leading to high prediction errors and large error variance in the trained surrogate. Online training, where data generation and surrogate training are coupled, offers a natural advantage by allowing solver parameters to be steered on-the-fly. To efficiently exploit this capability, we introduce Online Generative Active Sampling (OGAS), an active learning method that reactively learns the relationship between configuration parameters and surrogate performance to control the sampling distribution. OGAS trains a fast diffusion model in parallel to the surrogate to act as a conditional sampler, mapping a surrogate-derived difficulty signal (e.g., loss or uncertainty) to configuration parameters. By actively drawing target signals from a prior biased toward high difficulty, OGAS continuously steers data generation toward challenging regimes without delaying the training workflow. We evaluate OGAS across 2D PDEs with distinct challenging dynamics (Kuramoto-Sivashinsky, Navier-Stokes, Gray-Scott) and up to 308 parameters, using multiple surrogate architectures. Across all settings, OGAS consistently improves tail statistics, yielding substantial reductions in errors above the 99th percentile and overall error dispersion compared to uniform sampling. While prioritizing challenging trajectories introduces a trade-off with average error, OGAS effectively ensures worst-case reliability of trained surrogates with negligible wall-time overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Online Generative Active Sampling (OGAS), an active learning procedure that trains a diffusion model concurrently with a PDE surrogate to map surrogate-derived difficulty signals (loss or uncertainty) to configuration parameters, thereby steering numerical solver trajectories toward challenging regimes. The central empirical claim is that, across Kuramoto-Sivashinsky, Navier-Stokes, and Gray-Scott equations and multiple surrogate architectures, OGAS yields consistent reductions in errors above the 99th percentile and lower error dispersion relative to uniform sampling, at negligible wall-time cost, while accepting a possible increase in average error.

Significance. If the reported tail-statistic gains are shown to be robust and attributable to the adaptive sampling rather than total sample volume, the method would offer a low-overhead route to worst-case reliability for data-driven PDE surrogates, a practical concern in applications where uniform sampling leaves high-error regimes under-represented.

major comments (3)
  1. [§3 (OGAS training loop and diffusion conditioning)] The headline claim that OGAS improves final tail statistics rests on the assumption that a diffusion model trained online on evolving surrogate signals continues to generate trajectories that remain among the hardest for the converged surrogate. Because the surrogate error landscape changes during training, early difficulty labels may correspond to regimes that later become easy; the manuscript provides no analysis or ablation demonstrating that the learned mapping stabilizes or that the final 99th-percentile errors on a fixed test set are demonstrably lower than those obtained by simply increasing the total number of uniformly sampled trajectories.
  2. [§4 (Experimental results and tables)] The abstract states that OGAS 'consistently improves tail statistics' across all settings, yet the provided description contains no information on the number of independent runs, statistical significance tests, error bars, or data-exclusion criteria used to compute the 99th-percentile and dispersion metrics. Without these, it is impossible to determine whether the reported gains exceed run-to-run variability or result from post-hoc selection of favorable seeds.
  3. [§4.3 (Comparison with uniform sampling)] The trade-off statement that 'prioritizing challenging trajectories introduces a trade-off with average error' is presented without quantitative characterization of how large this increase is relative to the tail improvement, or whether a simple re-weighting of the uniform baseline could achieve comparable tail behavior at lower average-error cost.
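The re-weighting alternative raised in comment 3 can be made concrete. A minimal sketch, assuming per-trajectory losses from a first training pass over a fixed, uniformly sampled dataset; the exponent `gamma` and the mean-one normalisation are illustrative choices, not something the paper or the referee specifies.

```python
def difficulty_weights(losses, gamma=1.0, eps=1e-12):
    """Static importance weights for a fixed uniform dataset:
    w_i proportional to loss_i ** gamma, normalised to mean 1 (gamma = 0 gives uniform)."""
    raw = [(loss + eps) ** gamma for loss in losses]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]


def weighted_mse(preds, targets, weights):
    """Squared errors averaged with the per-sample difficulty weights."""
    return sum(w * (p - t) ** 2 for p, t, w in zip(preds, targets, weights)) / len(weights)
```

Unlike OGAS, this only changes how existing trajectories are weighted in the loss; it cannot create trajectories in regions that uniform sampling rarely visits, which is the distinction the comparison would need to test.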
minor comments (2)
  1. [§3.2] The precise functional form of the 'prior biased toward high difficulty' used to draw target signals for the diffusion model is not stated; an explicit equation or pseudocode line would clarify reproducibility.
  2. Notation for the difficulty signal (loss versus uncertainty) is used interchangeably in the abstract; a single consistent symbol and definition would aid readability.
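Since the functional form of the biased prior is not stated, the following illustrates only one simple way such a prior could be built; it is a hypothetical construction, not the paper's. Target values are resampled from the empirical distribution of observed difficulty signals, tilted toward the high end by drawing the quantile level from a Beta(a, 1) distribution.

```python
import random


def sample_targets(history, n, a=4.0, rng=None):
    """Draw n target difficulties from the observed signals, tilted toward high
    values: u ~ Beta(a, 1), then take the empirical u-quantile of the history.
    a = 1 recovers plain resampling; larger a favours harder targets (assumption)."""
    rng = rng or random.Random()
    ranked = sorted(history)
    last = len(ranked) - 1
    return [ranked[round(rng.betavariate(a, 1.0) * last)] for _ in range(n)]
```

With this form the expected quantile level is a / (a + 1), so a single scalar controls how aggressively sampling concentrates on the hard end, and the targets always stay within the range of signals actually observed.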

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below, indicating where revisions will be made to improve clarity and rigor.

  1. Referee: [§3 (OGAS training loop and diffusion conditioning)] The headline claim that OGAS improves final tail statistics rests on the assumption that a diffusion model trained online on evolving surrogate signals continues to generate trajectories that remain among the hardest for the converged surrogate. Because the surrogate error landscape changes during training, early difficulty labels may correspond to regimes that later become easy; the manuscript provides no analysis or ablation demonstrating that the learned mapping stabilizes or that the final 99th-percentile errors on a fixed test set are demonstrably lower than those obtained by simply increasing the total number of uniformly sampled trajectories.

    Authors: We agree that the manuscript lacks an explicit ablation on mapping stability and a controlled comparison against equivalent-volume uniform sampling. In the revision we will add (i) plots tracking the evolution of the conditioned diffusion distribution across training epochs and (ii) a direct baseline that matches total solver trajectories under uniform sampling. These additions will quantify whether tail gains exceed what extra uniform samples alone can achieve. revision: yes

  2. Referee: [§4 (Experimental results and tables)] The abstract states that OGAS 'consistently improves tail statistics' across all settings, yet the provided description contains no information on the number of independent runs, statistical significance tests, error bars, or data-exclusion criteria used to compute the 99th-percentile and dispersion metrics. Without these, it is impossible to determine whether the reported gains exceed run-to-run variability or result from post-hoc selection of favorable seeds.

    Authors: We accept that reproducibility details were omitted. All reported results were obtained from five independent random seeds per PDE-surrogate pair; we will insert error bars (one standard deviation), state the run count explicitly, and add paired t-test p-values for the 99th-percentile and dispersion metrics. No runs or data points were excluded. revision: yes

  3. Referee: [§4.3 (Comparison with uniform sampling)] The trade-off statement that 'prioritizing challenging trajectories introduces a trade-off with average error' is presented without quantitative characterization of how large this increase is relative to the tail improvement, or whether a simple re-weighting of the uniform baseline could achieve comparable tail behavior at lower average-error cost.

    Authors: Tables already list both mean and 99th-percentile errors, permitting direct inspection of the trade-off. We did not, however, quantify its magnitude relative to tail gains or test re-weighting. The revision will add a short paragraph reporting typical relative changes (average error increase of 5-15 % versus 99th-percentile reductions often exceeding 30 %) and a brief discussion of why static re-weighting of uniform trajectories is unlikely to match the online adaptive benefit. revision: partial
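The significance test promised in the response to point 2 amounts to a paired comparison across seeds. A minimal sketch, assuming one 99th-percentile (or dispersion) value per seed for each method and five seeds, hence 4 degrees of freedom; in practice `scipy.stats.ttest_rel` would also return the p-value, which is omitted here to keep the example dependency-free.

```python
import math
import statistics

T_CRIT_DF4 = 2.776  # two-sided 5% critical value of Student's t with 4 degrees of freedom


def paired_t(baseline, treated):
    """Paired t statistic over per-seed metrics; positive when treated is lower."""
    diffs = [b - t for b, t in zip(baseline, treated)]
    n = len(diffs)
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
```

Pairing by seed removes seed-to-seed variation shared by both methods, which is what makes the test informative with only five runs; `abs(paired_t(...)) > T_CRIT_DF4` is the corresponding two-sided decision rule at the 5% level.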

Circularity Check

0 steps flagged

No circularity; empirical method with independent evaluation

full rationale

The paper presents OGAS as an empirical active-learning procedure that couples a diffusion model to surrogate-derived signals for steering PDE configuration sampling. No equations, derivations, or self-citations appear in the abstract or described method that reduce the reported tail-error improvements to a quantity defined by the inputs themselves. The central claims rest on experimental comparisons against uniform sampling across multiple PDEs, surrogate architectures, and parameter counts, with the evaluation performed on held-out test statistics rather than on quantities fitted or renamed from the training signals. This structure keeps the result self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies insufficient technical detail to enumerate specific free parameters, background axioms, or new postulated entities; the method description implies an unstated assumption that difficulty signals are learnably correlated with configuration space but does not quantify any fitted constants.

pith-pipeline@v0.9.1-grok · 5842 in / 1175 out tokens · 22544 ms · 2026-06-27T17:18:13.536398+00:00 · methodology


Reference graph

Works this paper leans on

57 extracted references · 11 canonical work pages

  1. [1]

    Improving the Efficiency of Training Physics-Informed Neural Networks Using Active Learning

    Yuri Aikawa, Naonori Ueda, and Toshiyuki Tanaka. Improving the Efficiency of Training Physics-Informed Neural Networks Using Active Learning. New Gener. Comput., 42(4):739–760, November 2024. ISSN 1882-7055. doi: 10.1007/s00354-024-00253-6

  2. [2]

    Fluid Intelligence: A Forward Look on AI Foundation Models in Computational Fluid Dynamics, November 2025

    Neil Ashton, Johannes Brandstetter, and Siddhartha Mishra. Fluid Intelligence: A Forward Look on AI Foundation Models in Computational Fluid Dynamics, November 2025

  3. [3]

    Feasibility Study on Active Learning of Smart Surrogates for Scientific Simulations, July 2024

    Pradeep Bajracharya, Javier Quetzalcóatl Toledo-Marín, Geoffrey Fox, Shantenu Jha, and Linwei Wang. Feasibility Study on Active Learning of Smart Surrogates for Scientific Simulations, July 2024

  4. [4]

    Discriminative Learning Under Covariate Shift

    Steffen Bickel, Michael Brückner, and Tobias Scheffer. Discriminative Learning Under Covariate Shift. Journal of Machine Learning Research, 10(75):2137–2155, 2009. URL http://jmlr.org/papers/v10/bickel09a.html

  5. [5]

    From flops to iops: The new bottlenecks of scientific computing

    Spyros Blanas. From flops to iops: The new bottlenecks of scientific computing. https://www.sigarch.org/from-flops-to-iops-the-new-bottlenecks-of-scientific-computing/, 2020

  6. [6]

    Message passing neural PDE solvers

    Johannes Brandstetter, Daniel E. Worrall, and Max Welling. Message passing neural PDE solvers. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022, 2022

  7. [7]

    Active learning for data streams: A survey

    Davide Cacciarelli and Murat Kulahci. Active learning for data streams: A survey. Machine Learning, 113(1):185–239, January 2024. ISSN 1573-0565. doi: 10.1007/s10994-023-06454-2

  8. [8]

    Population Monte Carlo

    Olivier Cappé, Arnaud Guillin, Jean-Michel Marin, and Christian Robert. Population Monte Carlo. Journal of Computational and Graphical Statistics, 13(4):907–929, 2004. URL https://hal.science/hal-01337419

  9. [9]

    The Frontier of Simulation-Based Inference

    Kyle Cranmer, Johann Brehmer, and Gilles Louppe. The Frontier of Simulation-Based Inference. Proceedings of the National Academy of Sciences, 117(48):30055–30062, December 2020. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1912789117

  10. [10]

    Mitigating propagation failures in physics-informed neural networks using retain-resample-release (r3) sampling

    Arka Daw, Jie Bu, Sifan Wang, Paris Perdikaris, and Anuj Karpatne. Mitigating propagation failures in physics-informed neural networks using retain-resample-release (r3) sampling. In Proceedings of the 40th International Conference on Machine Learning, pages 7264–7302. PMLR, 2023. URL https://proceedings.mlr.press/v202/daw23a.html. ISSN: 2640-3498

  11. [11]

    Loss-driven sampling within hard-to-learn areas for simulation-based neural network training

    Sofya Dymchenko and Bruno Raffin. Loss-driven sampling within hard-to-learn areas for simulation-based neural network training. In MLPS 2023 - Machine Learning and the Physical Sciences Workshop at NeurIPS 2023 - 37th Conference on Neural Information Processing Systems, pages 1–5, New Orleans, United States, December 2023

  12. [12]

    MelissaDL x Breed: Towards Data-Efficient On-line Supervised Training of Multi-parametric Surrogates with Active Learning

    Sofya Dymchenko, Abhishek Purandare, and Bruno Raffin. MelissaDL x Breed: Towards Data-Efficient On-line Supervised Training of Multi-parametric Surrogates with Active Learning. In SC-W 2024 - Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pages 1–9, Atlanta (Georgia), United States, November 2024...

  13. [13]

    Generative Adversarial Networks

    Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Networks, June 2014. URL http://arxiv.org/abs/1406.2661. arXiv:1406.2661 [stat]

  14. [14]

    Poseidon: Efficient Foundation Models for PDEs, May 2024

    Maximilian Herde, Bogdan Raonić, Tobias Rohner, Roger Käppeli, Roberto Molinaro, Emmanuel de Bézenac, and Siddhartha Mishra. Poseidon: Efficient Foundation Models for PDEs, May 2024

  15. [15]

    Classifier-Free Diffusion Guidance, July 2022

    Jonathan Ho and Tim Salimans. Classifier-Free Diffusion Guidance, July 2022. URL http://arxiv.org/abs/2207.12598. arXiv:2207.12598 [cs]

  16. [16]

    Denoising Diffusion Probabilistic Models, December

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising Diffusion Probabilistic Models, December

  17. [17]

    arXiv:2006.11239 [cs]

    URL http://arxiv.org/abs/2006.11239. arXiv:2006.11239 [cs]

  18. [18]

    A Framework and Benchmark for Deep Batch Active Learning for Regression, August 2023

    David Holzmüller, Viktor Zaverkin, Johannes Kästner, and Ingo Steinwart. A Framework and Benchmark for Deep Batch Active Learning for Regression, August 2023

  19. [19]

    PDE-transformer: Efficient and versatile transformers for physics simulations

    Benjamin Holzschuh, Qiang Liu, Georg Kohl, and Nils Thuerey. PDE-transformer: Efficient and versatile transformers for physics simulations. In Forty-Second International Conference on Machine Learning, June 2025

  20. [20]

    Semi-Supervised Active Learning with Temporal Output Discrepancy, July 2021

    Siyu Huang, Tianyang Wang, Haoyi Xiong, Jun Huan, and Dejing Dou. Semi-Supervised Active Learning with Temporal Output Discrepancy, July 2021

  21. [21]

    Multi-class active learning for image classification

    Ajay J. Joshi, Fatih Porikli, and Nikolaos Papanikolopoulos. Multi-class active learning for image classification. 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 2372–2379, June 2009. doi: 10.1109/CVPR.2009.5206627

  22. [22]

    Active Learning with Selective Time-Step Acquisition for PDEs

    Yegon Kim, Hyunsu Kim, Gyeonghoon Ko, and Juho Lee. Active Learning with Selective Time-Step Acquisition for PDEs. In Forty-Second International Conference on Machine Learning, June 2025

  23. [23]

    Stochastic Batch Acquisition: A Simple Baseline for Deep Active Learning, September 2023

    Andreas Kirsch, Sebastian Farquhar, Parmida Atighehchian, Andrew Jesson, Frederic Branchaud-Charron, and Yarin Gal. Stochastic Batch Acquisition: A Simple Baseline for Deep Active Learning, September 2023

  24. [24]

    APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs

    Felix Koehler, Simon Niedermayr, Rüdiger Westermann, and Nils Thuerey. APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs. In NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024. arXiv, October 2024

  25. [25]

    Efficient Generative Transformer Operators For Million-Point PDEs, December 2025

    Armand Kassaï Koupaï, Lise Le Boudec, and Patrick Gallinari. Efficient Generative Transformer Operators For Million-Point PDEs, December 2025

  26. [26]

    Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, November 2017

    Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, November 2017

  27. [27]

    PINNACLE: PINN Adaptive ColLocation and Experimental points selection

    Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng, and Bryan Kian Hsiang Low. PINNACLE: PINN Adaptive ColLocation and Experimental points selection. In The Twelfth International Conference on Learning Representations, October 2023

  28. [28]

    I/o in machine learning applications on hpc systems: A 360-degree survey

    Noah Lewis, Jean Luca Bez, and Suren Byna. I/o in machine learning applications on hpc systems: A 360-degree survey. ACM Comput. Surv., 57(10), May 2025. ISSN 0360-0300. doi: 10.1145/3722215. URL https://doi.org/10.1145/3722215

  29. [29]

    Fourier neural operator for parametric partial differential equations

    Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew M. Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021. URL https: /...

  30. [30]

    Dynamic SBI: Round-free Sequential Simulation-Based Inference with Adaptive Datasets, October 2025

    Huifang Lyu, James Alvey, Noemi Anau Montel, Mauro Pieroni, and Christoph Weniger. Dynamic SBI: Round-free Sequential Simulation-Based Inference with Adaptive Datasets, October 2025

  31. [31]

    Training Deep Surrogate Models with Large Scale Online Learning

    Lucas Meyer, Marc Schouler, Robert Alexander Caulk, Alejandro Ribés, and Bruno Raffin. Training Deep Surrogate Models with Large Scale Online Learning. In ICML 2023 - International Conference on Machine Learning, pages 1–17, July 2023. URL https://hal.science/hal-04102400

  32. [32]

    High Throughput Training of Deep Surrogates from Large Ensemble Runs

    Lucas Meyer, Marc Schouler, Robert Alexander Caulk, Alejandro Ribés, and Bruno Raffin. High Throughput Training of Deep Surrogates from Large Ensemble Runs. In SC 2023 - The International Conference for High Performance Computing, Networking, Storage, and Analysis, pages 1–14, Denver, CO, United States, November 2023. ACM. doi: 10.1145/3581784.3607083. U...

  33. [33]

    RIGNO: A Graph-based framework for robust and accurate operator learning for PDEs on arbitrary domains, January 2025

    Sepehr Mousavi, Shizheng Wen, Levi Lingsch, Maximilian Herde, Bogdan Raonić, and Siddhartha Mishra. RIGNO: A Graph-based framework for robust and accurate operator learning for PDEs on arbitrary domains, January 2025

  34. [34]

    Active Learning for Neural PDE Solvers

    Daniel Musekamp, Marimuthu Kalimuthu, David Holzmüller, Makoto Takamoto, and Mathias Niepert. Active Learning for Neural PDE Solvers. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025

  35. [35]

    Active learning of deep surrogates for PDEs: Application to metasurface design

    Raphaël Pestourie, Youssef Mroueh, Thanh V. Nguyen, Payel Das, and Steven G. Johnson. Active learning of deep surrogates for PDEs: Application to metasurface design. npj Comput Mater, 6(1), October 2020. ISSN 2057-3960. doi: 10.1038/s41524-020-00431-2

  36. [36]

    Physics-enhanced deep surrogates for partial differential equations

    Raphaël Pestourie, Youssef Mroueh, Chris Rackauckas, Payel Das, and Steven G. Johnson. Physics-enhanced deep surrogates for partial differential equations. Nat Mach Intell, 5(12):1458–1465, December 2023. ISSN 2522-5839. doi: 10.1038/s42256-023-00761-y

  37. [37]

    Learning mesh-based simulation with graph networks

    Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W. Battaglia. Learning mesh-based simulation with graph networks. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021. URL https://openreview.net/forum?id=roNqYL0_XP

  38. [38]

    Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations

    M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, February 2019. ISSN 0021-9991. doi: 10.1016/j.jcp.2018.10.045

  39. [39]

    A Survey of Deep Active Learning

    Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Brij B. Gupta, Xiaojiang Chen, and Xin Wang. A Survey of Deep Active Learning. arXiv e-prints, art. arXiv:2009.00236, August 2020. doi: 10.48550/arXiv.2009.00236

  40. [40]

    Melissa: coordinating large-scale ensemble runs for deep learning and sensitivity analyses

    Marc Schouler, Robert Alexander Caulk, Lucas Meyer, Théophile Terraz, Christoph Conrads, Sebastian Friedemann, Achal Agarwal, Juan Manuel Baldonado, Bartłomiej Pogodziński, Anna Sekuła, Alejandro Ribes, and Bruno Raffin. Melissa: coordinating large-scale ensemble runs for deep learning and sensitivity analyses. Journal of Open Source Software, 8(86):5291...

  41. [41]

    Active Learning for Convolutional Neural Networks: A Core-Set Approach, June 2018

    Ozan Sener and Silvio Savarese. Active Learning for Convolutional Neural Networks: A Core-Set Approach, June 2018

  42. [42]

    Active learning literature survey, 2009

    Burr Settles. Active learning literature survey, 2009

  43. [43]

    Query by committee

    H. S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, pages 287–294. ACM. doi: 10.1145/130385.130417

  45. [45]

    On the Benefits of Active Data Collection in Operator Learning, February 2025

    Unique Subedi and Ambuj Tewari. On the Benefits of Active Data Collection in Operator Learning, February 2025

  46. [46]

    Das-pinns: A deep adaptive sampling method for solving high-dimensional partial differential equations

    Kejun Tang, Xiaoliang Wan, and Chao Yang. Das-pinns: A deep adaptive sampling method for solving high-dimensional partial differential equations. Journal of Computational Physics, 476: 111868, 2023. URL https://www.math.lsu.edu/~xlwan/papers/journal/das.pdf

  47. [47]

    Harsh Vardhan, Umesh Timalsina, Peter Völgyesi, and Janos Sztipanovits. Data efficient surrogate modeling for engineering design: Ensemble-free batch mode deep active learning for regression, November 2022

  48. [48]

    Sifan Wang, Shyam Sankaran, Hanwen Wang, and Paris Perdikaris. An Expert’s Guide to Training Physics-informed Neural Networks, August 2023. URL http://arxiv.org/abs/2308.08468. arXiv:2308.08468 [physics]

  49. [49]

    Zhiyuan Wang, Jinwoo Go, Byung-Jun Yoon, Nathan Urban, and Xiaoning Qian. A Plug-and-Play Query Synthesis Active Learning Framework for Neural PDE Solvers. NeurIPS 2025, 2025

  50. [50]

    Chenxi Wu, Min Zhu, Qinyang Tan, Yadhu Kartha, and Lu Lu. A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks, July 2022

  51. [51]

    L. Zanisi, A. Ho, T. Madula, J. Barr, J. Citrin, S. Pamela, J. Buchanan, F. Casson, V. Gopakumar, and JET contributors. Efficient training sets for surrogate models of tokamak turbulence with Active Deep Ensembles, October 2023.

A Theoretical Foundations: Sampling Inertia and Uniform-Prior Training

This appendix complements Sec. 4 with a compact ...

Input projection. Project $x_k \in \mathbb{R}^d$ to $h_0 \in \mathbb{R}^m$ with a linear layer.

Embeddings. Compute a sinusoidal embedding of the diffusion step $k$, map it to $\mathbb{R}^m$, then project it to dimension $e = 4m = 4096$. Independently embed the condition $\tilde{\varepsilon}$ to $\mathbb{R}^{4096}$ (and embed the null token $\tilde{\varepsilon}_\varnothing$ similarly). Sum the time and condition embeddings to obtain a fused embedding in $\mathbb{R}^{4096}$.
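The embedding path can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code. The following choices are our own assumptions:

- the Transformer-style sinusoid with base 10000 and width $m$;
- a SiLU between the two time-embedding maps;
- a scalar condition $\tilde{\varepsilon}$ embedded by one linear map;
- a learned vector standing in for the null-token embedding.

All weights are random placeholders.

```python
import numpy as np

def sinusoidal_embedding(k, dim, max_period=10_000.0):
    """Transformer-style sinusoidal embedding of diffusion step k (dim must be even)."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = k * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def silu(x):
    return x / (1.0 + np.exp(-x))

class FusedEmbedding:
    """Hypothetical sketch: time embedding + condition embedding, both in R^e with e = 4m."""

    def __init__(self, m, rng, scale=0.02):
        self.m, self.e = m, 4 * m
        self.W_t1 = rng.normal(0.0, scale, (m, m))       # sinusoid (width m, assumed) -> R^m
        self.W_t2 = rng.normal(0.0, scale, (self.e, m))  # R^m -> R^e
        self.W_c = rng.normal(0.0, scale, (self.e, 1))   # scalar condition eps_tilde (assumed) -> R^e
        self.null = rng.normal(0.0, scale, self.e)       # hypothetical learned null-token embedding

    def __call__(self, k, eps_tilde=None):
        # Time path: sinusoid -> R^m -> SiLU (assumed) -> R^e.
        t = self.W_t2 @ silu(self.W_t1 @ sinusoidal_embedding(k, self.m))
        # Condition path: the null-token embedding replaces eps_tilde when no condition is given.
        c = self.null if eps_tilde is None else self.W_c @ np.array([eps_tilde])
        return t + c  # fused embedding in R^e
```

With `m = 1024`, `FusedEmbedding(1024, np.random.default_rng(0))(10, 0.7)` returns a vector of shape `(4096,)`, matching $e = 4m = 4096$. Passing `eps_tilde=None` swaps in the null-token embedding instead of the condition.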

Residual denoising trunk. Apply two identical residual MLP blocks, each of the form $h \leftarrow h + \mathrm{Linear}(\mathrm{SiLU}(\mathrm{LayerNorm}(h)))$, where the fused embedding modulates each block via FiLM-style affine modulation: per block, we produce $(\gamma, \beta) \in \mathbb{R}^m \times \mathbb{R}^m$ from the fused embedding and apply $\mathrm{LayerNorm}(h) \mapsto \gamma \odot \mathrm{LayerNorm}(h) + \beta$.
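One FiLM-modulated residual block might look like the NumPy sketch below. The text fixes only the block structure and the modulation rule. The following are our assumptions:

- a single linear modulation head mapping the fused embedding to $(\gamma, \beta)$;
- a non-affine LayerNorm;
- an initialization bias of $\gamma = 1$, $\beta = 0$;
- separate (unshared) weights for the two blocks.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Non-affine LayerNorm over the last axis (FiLM supplies the affine part)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def silu(x):
    return x / (1.0 + np.exp(-x))

class FiLMResidualBlock:
    """h <- h + Linear(SiLU(gamma * LayerNorm(h) + beta)); (gamma, beta) come from the fused embedding."""

    def __init__(self, m, e, rng, scale=0.02):
        self.W = rng.normal(0.0, scale, (m, m))                  # residual-branch Linear, R^m -> R^m
        self.b = np.zeros(m)
        self.W_mod = rng.normal(0.0, scale, (2 * m, e))          # hypothetical modulation head, R^e -> R^{2m}
        self.b_mod = np.concatenate([np.ones(m), np.zeros(m)])   # assumed start near gamma = 1, beta = 0

    def __call__(self, h, fused):
        gamma, beta = np.split(self.W_mod @ fused + self.b_mod, 2)
        z = gamma * layer_norm(h) + beta                         # FiLM-modulated LayerNorm(h)
        return h + self.W @ silu(z) + self.b                     # residual update

# Usage: a two-block trunk (same architecture, separate weights by assumption).
rng = np.random.default_rng(0)
m, e = 16, 64
trunk = [FiLMResidualBlock(m, e, rng) for _ in range(2)]
h, fused = rng.normal(size=m), rng.normal(size=e)
for block in trunk:
    h = block(h, fused)
```

Because the update is additive, zeroing the residual-branch weights makes a block the identity on $h$. This is one reason residual trunks are easy to stack.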

Output head. A final LayerNorm–Linear head maps the resulting hidden state back to $\mathbb{R}^d$ to output $\varepsilon_\phi(x_k, k, \tilde{\varepsilon})$.

Density-ratio classifier used to form $w_g(\lambda)$. To compute $w_g(\lambda)$ (Eq. (21)), we train a lightweight discriminator on $\lambda$ with a 2-layer MLP: $\mathrm{Linear}(d \to 64) \to \mathrm{ReLU} \to \mathrm{Linear}(64 \to 1)$. We use the scalar logit output in Eq. (21) and clamp the resulting weight...
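Eq. (21) is not reproduced in this excerpt. The sketch below therefore assumes the standard classifier-based density-ratio trick. A logistic discriminator separates samples of $\lambda$ from a numerator density (label 1) and a denominator density (label 0). When trained with balanced classes, it satisfies $\exp(\text{logit}) \approx p_{\text{num}}(\lambda)/p_{\text{den}}(\lambda)$.

Only the $\mathrm{Linear}(d \to 64) \to \mathrm{ReLU} \to \mathrm{Linear}(64 \to 1)$ shape comes from the text. The following are hypothetical, chosen for illustration:

- which distributions play the numerator and denominator roles in $w_g(\lambda)$;
- the clamp bounds;
- the plain full-batch gradient-descent loop.

```python
import numpy as np

def init_mlp(d, hidden=64, rng=None):
    """Linear(d -> 64) -> ReLU -> Linear(64 -> 1) with He-style random init."""
    if rng is None:
        rng = np.random.default_rng(0)
    return {"W1": rng.normal(0.0, np.sqrt(2.0 / d), (d, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(0.0, np.sqrt(1.0 / hidden), (hidden, 1)), "b2": np.zeros(1)}

def logits(params, lam):
    """One scalar logit per row of lam (shape [n, d])."""
    hidden = np.maximum(lam @ params["W1"] + params["b1"], 0.0)
    return (hidden @ params["W2"] + params["b2"])[:, 0]

def train_discriminator(lam_num, lam_den, steps=500, lr=0.1, rng=None):
    """Full-batch gradient descent on binary cross-entropy (1 = numerator, 0 = denominator)."""
    x = np.vstack([lam_num, lam_den])
    y = np.concatenate([np.ones(len(lam_num)), np.zeros(len(lam_den))])
    p = init_mlp(x.shape[1], rng=rng)
    for _ in range(steps):
        pre = x @ p["W1"] + p["b1"]
        h = np.maximum(pre, 0.0)
        z = (h @ p["W2"] + p["b2"])[:, 0]
        g = (1.0 / (1.0 + np.exp(-z)) - y) / len(y)   # d(mean BCE)/dz
        gh = (g[:, None] @ p["W2"].T) * (pre > 0)     # backprop through ReLU
        grads = {"W2": h.T @ g[:, None], "b2": np.array([g.sum()]),
                 "W1": x.T @ gh, "b1": gh.sum(axis=0)}
        for name, grad in grads.items():
            p[name] -= lr * grad
    return p

def density_ratio_weights(params, lam, w_min=0.1, w_max=10.0):
    """With equal class sizes, exp(logit) estimates p_num / p_den; clamp (hypothetical bounds) for stability."""
    return np.clip(np.exp(logits(params, lam)), w_min, w_max)

# Usage: numerator N(0, 1), denominator N(1, 1); the true log-ratio is 0.5 - lambda.
rng = np.random.default_rng(0)
lam_num = rng.normal(0.0, 1.0, size=(2000, 1))
lam_den = rng.normal(1.0, 1.0, size=(2000, 1))
params = train_discriminator(lam_num, lam_den, rng=np.random.default_rng(1))
w = density_ratio_weights(params, np.array([[-1.0], [2.0]]))
```

The true ratios at $\lambda = -1$ and $\lambda = 2$ are $e^{1.5} \approx 4.48$ and $e^{-1.5} \approx 0.22$. A trained discriminator should rank the first well above the second. Clamping caps how much any single sample can dominate a weighted loss when the logit is poorly estimated.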

Inference cost: Each selection step requires $P \times M \times T_{\text{rollout}}$ forward passes. For $P = 5{,}000$, $M = 2$, and $T_{\text{rollout}} = 15$, this totals 150,000 forward passes per batch selection.

Throughput impact: In our online experiment, this scoring step must block the simulation or training process until the next batch is selected. Even with parallelization, each new selection takes at least 3 minutes for each model under our configuration. This explains why we limit the resampling period to 1000 simulations (10 resamplings per experiment) in...
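A small cost model makes the trade-off behind the resampling period concrete. The function only multiplies the quantities quoted above ($P = 5{,}000$, $M = 2$, $T_{\text{rollout}} = 15$, at least 3 minutes per selection). Three inputs are our assumptions:

- the budget of 10,000 simulations per experiment, inferred from the 1000-simulation period and the 10 resamplings;
- treating the 3 minutes as a per-resampling lower bound;
- the function name `selection_overhead`, which is hypothetical.

```python
def selection_overhead(total_sims, period, P=5_000, M=2, T_rollout=15,
                       min_minutes_per_selection=3.0):
    """Cost of a blocking selection step repeated every `period` simulations."""
    n_resamplings = total_sims // period
    passes_per_selection = P * M * T_rollout                # 150,000 with the defaults
    total_passes = n_resamplings * passes_per_selection
    blocked_minutes = n_resamplings * min_minutes_per_selection  # lower bound, per model
    return n_resamplings, total_passes, blocked_minutes

print(selection_overhead(10_000, 1_000))  # (10, 1500000, 30.0)
print(selection_overhead(10_000, 500))    # (20, 3000000, 60.0)
```

Under this model, halving the period doubles both the scoring work and the minimum time the pipeline sits blocked. This is the pressure toward a coarse resampling period described above.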