pith. machine review for the scientific record.

arxiv: 2605.08832 · v1 · submitted 2026-05-09 · 💻 cs.LG · physics.flu-dyn

Recognition: 2 theorem links · Lean Theorem

Inpainting physics: self-supervised learning for context-driven fluid simulation

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:35 UTC · model grok-4.3

classification: 💻 cs.LG · physics.flu-dyn
keywords: inpainting · self-supervised learning · fluid simulation · neural surrogate · computational fluid dynamics · velocity field · masked autoencoder · flow matching

The pith

Reformulating steady fluid simulation as inpainting lets a self-supervised prior over velocity fields adapt to new boundary conditions at inference time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard neural surrogates for computational fluid dynamics map explicit geometry and boundary conditions directly to solution fields during training, locking them to the conditions seen in the data. This paper instead trains models on velocity fields alone in a self-supervised manner to learn a general prior, then imposes arbitrary boundary constraints only at inference by fixing known regions and treating the rest as an inpainting task. A local neighbourhood tokeniser converts high-resolution 3D velocity fields into compact latent tokens so that masked autoencoder and flow-matching models can scale to large meshes. On intracranial aneurysm hemodynamics, the resulting model reconstructs complete velocity fields from sparse context, exceeds supervised baselines when boundaries or datasets shift, and supports local geometry edits by reusing unchanged context regions. The central move is to convert task-specific predictors into reusable flow priors conditioned on context.

Core claim

Steady CFD inference can be recast as an inpainting problem: a self-supervised prior is learned over velocity fields without explicit boundary conditions in training, after which new constraints are imposed at inference by fixing known inlet, outlet, or unchanged geometry regions, allowing full-field reconstruction from sparse context and local edits without retraining.
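The inference recipe in this claim, fix the known regions and inpaint the rest, can be sketched in a few lines. This is a hedged toy illustration, not the paper's implementation: `inpaint_tokens` and `mean_prior` are hypothetical names, and `prior_fn` stands in for the trained masked autoencoder or flow-matching model operating on latent tokens.

```python
import numpy as np

def inpaint_tokens(tokens, known_mask, prior_fn):
    """Inpainting-style inference sketch: tokens in known regions
    (e.g. inlet/outlet or unchanged geometry) are held fixed while
    the learned prior fills in the rest.

    tokens     : (N, D) latent tokens; rows where known_mask is False
                 may hold arbitrary placeholder values.
    known_mask : (N,) boolean, True where velocity context is supplied.
    prior_fn   : maps (tokens, known_mask) -> (N, D) full prediction;
                 a stand-in for the paper's self-supervised model.
    """
    pred = prior_fn(tokens, known_mask)
    # Known context is imposed, never overwritten by the prior.
    pred[known_mask] = tokens[known_mask]
    return pred

# Deliberately trivial stand-in prior: fill masked tokens with the
# mean of the supplied context tokens.
def mean_prior(tokens, known_mask):
    out = tokens.copy()
    out[~known_mask] = tokens[known_mask].mean(axis=0)
    return out

tokens = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
known = np.array([True, True, False])
result = inpaint_tokens(tokens, known, mean_prior)
```

The point of the sketch is the division of labour: the prior is generic, and all problem-specific information enters through `known_mask` and the fixed rows at inference time.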

What carries the argument

A local neighbourhood tokeniser that converts high-resolution 3D velocity fields into compact spatial latent tokens, on which latent flow-matching and masked-autoencoder models are trained in a self-supervised manner.
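Based on the tokeniser description (ball-shaped neighbourhoods, per-point features xr, yr, zr, wall distance d, and velocities vx, vy, vz, encoded by a shared MLP with max-pooling, as in the paper's Figure 6), a minimal NumPy sketch might look as follows. All names, sizes, and weights here are illustrative assumptions; the real model uses learned weights and presumably a principled centre selection such as farthest-point sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def ball_neighbourhoods(points, centres, radius):
    """Group mesh points into ball-shaped neighbourhoods around centres."""
    return [np.where(np.linalg.norm(points - c, axis=1) <= radius)[0]
            for c in centres]

def encode_neighbourhood(feats, W1, W2):
    """PointNet-style encoder sketch: shared MLP applied per point,
    then max-pooling over the neighbourhood yields one latent token."""
    h = np.maximum(feats @ W1, 0.0)   # shared MLP layer with ReLU
    h = h @ W2
    return h.max(axis=0)              # permutation-invariant pooling

# Tiny synthetic example: 50 mesh points with velocity and wall distance.
points = rng.uniform(-1, 1, size=(50, 3))
vel = rng.normal(size=(50, 3))
wall_d = rng.uniform(0, 1, size=(50, 1))
centres = points[:4]                  # hypothetical centre choice
W1, W2 = rng.normal(size=(7, 16)), rng.normal(size=(16, 8))  # random stand-ins

tokens = []
for c, idx in zip(centres, ball_neighbourhoods(points, centres, 0.5)):
    rel = points[idx] - c             # local coordinates x_r, y_r, z_r
    feats = np.concatenate([rel, wall_d[idx], vel[idx]], axis=1)  # 7 features
    tokens.append(encode_neighbourhood(feats, W1, W2))
tokens = np.stack(tokens)             # (num_centres, latent_dim)
```

Max-pooling makes each token invariant to point ordering and robust to varying neighbourhood sizes, which is what lets irregular high-resolution meshes be compressed to a fixed token budget.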

If this is right

  • Full velocity fields can be reconstructed from sparse boundary context on 3D meshes.
  • The approach outperforms supervised neural surrogates when boundary conditions or training datasets shift.
  • Local geometry edits become possible by reusing unchanged simulation context without full recomputation.
  • Neural surrogates function as reusable flow priors rather than task-specific predictors tied to fixed conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same inpainting formulation could be tested on time-dependent flows by including temporal context tokens.
  • Similar self-supervised priors might apply to reconstruction of other physical fields such as pressure or temperature.
  • Lower data requirements for surrogate modeling could follow if explicit problem specifications are needed only at inference.
  • Application to CFD problems outside hemodynamics would test whether the prior generalizes across different flow regimes.

Load-bearing premise

A self-supervised prior learned over velocity fields without boundary conditions will accurately and stably incorporate arbitrary new boundary constraints and local geometry changes when applied at inference to unseen data.

What would settle it

Direct numerical comparison of the inpainted velocity field against a high-fidelity CFD solver on a held-out 3D mesh that uses inlet and outlet profiles never present in the training distribution.

Figures

Figures reproduced from arXiv: 2605.08832 by Benedikt Wiestler, Daniel Rückert, Jonas Weidner, Julian Suk, Yeray Martin-Ruisanchez.

Figure 1
Figure 1. Figure 1: Inpainting physics. (1) We tokenise raw velocity fields into local ball-shaped latent representations. (2) We train a self-supervised model on these tokenised velocity fields using latent flow matching or a masked autoencoder. (3) At inference, boundary conditions are explicitly enforced by fixing known regions like inflow and outflow during inpainting, enabling generalisation to unseen geometries and flow… view at source ↗
Figure 2
Figure 2. Figure 2: The neighbourhood tokeniser demonstrates low reconstruction error over all mass flows. We compare our tokeniser using 2500 tokens with naive baselines of random downsampling and re-interpolation. Additionally, we show examples of the latent ball representation and the reconstruction on both datasets. Neighbourhood tokens accurately represent velocity fields. Our inpainting formulation requires a representa… view at source ↗
Figure 3
Figure 3. Figure 3: Supervised models perform best on the forward prediction on in-distribution tasks. L-MAE and L-FM perform better with additional context. We compare different supervised and self-supervised models on the prediction of velocity fields provided with varying amounts of context (higher masking fraction means less context). Inpainting improves out-of-distribution generalisation under boundary-condition and data… view at source ↗
Figure 4
Figure 4. Figure 4: The best supervised model (red) fails out-of-distribution (ood), while the L-MAE (orange) provides solid results, especially with context. Left, we show the performance over all mass flows. Supervised-Att fails to extrapolate the expected linear mean velocity scaling ood. Right, we provide the nMSE for varied contexts at m = 1 and on the external AneuG test set. Local geometry editing benefits from reusabl… view at source ↗
Figure 5
Figure 5. Figure 5: Inpainting local geometry edits benefits from global context. We locally deform two example geometries by modelling the growth of an aneurysm. By generously masking the area around it and conditioning our inpainting approach on the original simulation, we achieve superior results compared to neural surrogates that simulate the full geometry from scratch for every edit. We show the difference to the ground … view at source ↗
Figure 6
Figure 6. Figure 6: Overview of the architecture of the neighbourhood tokeniser (NT). (1) We obtain each neighbourhood, and for every point we obtain the corresponding input for the encoder, containing xr, yr, zr as the local position, d as the distance to the wall, and vx, vy, vz for the velocity values. (2) We encode the data through an MLP followed by max-pooling, yielding the latent space for the neighbourhood. (3) We expand by cr… view at source ↗
Figure 7
Figure 7. Figure 7: Analysis of nMSE values for different numbers of centres. 2500 is the best scenario. view at source ↗
Figure 8
Figure 8. Figure 8: The supervised models’ prediction breaks for the external AneuG dataset and OOD mass flows. L-MAE inpainting is the only method consistently outperforming the naive baselines. We compare different supervised and self-supervised models on the prediction of velocity fields provided with varying levels of context. Experiments are conducted on the external AneuG [Ding et al., 2025] dataset. We show the respect… view at source ↗
Figure 9
Figure 9. Figure 9: Combining L-FM with L-MAE improves performance slightly. We test additional integration schemes for L-FM. Iterative masking or soft boundaries show partial improvement, while initialising L-FM with the solution of the L-MAE slightly improves on the L-MAE alone. We evaluate on the Aneumo dataset at mass flow 3. view at source ↗
read the original abstract

Neural surrogate models for computational fluid dynamics (CFD) are typically trained as forward operators that map explicit problem specifications, such as geometry and boundary conditions, to solution fields. This ties the model to the conditioning variables seen during training and limits reuse under boundary-condition shifts or local geometry changes. We propose to reformulate steady CFD inference as an inpainting problem: instead of training on explicit boundary conditions, we learn a self-supervised prior over velocity fields and impose boundary constraints only during inference by fixing known regions such as inlet, outlet or unchanged regions from previous simulations. To scale this idea to large 3D meshes, we introduce a local neighbourhood tokeniser that represents high-resolution velocity fields as compact spatial latent tokens and train latent flow-matching and masked-autoencoder models on these tokens. On intracranial aneurysm hemodynamics, our method reconstructs full velocity fields from sparse boundary context, outperforms supervised neural surrogates under boundary-condition and dataset shift and enables local geometry editing by reusing unchanged simulation context. These results suggest that viewing CFD inference as context-conditioned inpainting can turn neural surrogates from task-specific predictors into reusable flow priors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes reformulating steady CFD inference as an inpainting task: a self-supervised prior over velocity fields is learned via masked autoencoders and latent flow-matching on velocity fields tokenized by a local neighbourhood tokeniser, without explicit boundary conditions or physics residuals during training. At inference, new boundary constraints and local geometry changes are imposed solely by fixing sparse known velocity patches (e.g., inlet/outlet or unchanged context), enabling full-field reconstruction, improved generalization under BC and dataset shifts, and reusable context on intracranial aneurysm hemodynamics data.

Significance. If the central claims hold, the work could meaningfully advance neural surrogates for CFD by converting them from task-specific forward maps into reusable, context-conditioned flow priors. The local neighbourhood tokeniser is a practical contribution for scaling self-supervised models to high-resolution 3D meshes. The self-supervised training strategy that avoids conditioning on BCs during learning directly targets a known limitation of supervised surrogates.

major comments (2)
  1. [§3.2] §3.2 (Inference procedure): The method imposes new boundary conditions and geometry edits exclusively by fixing known velocity patches at inference time, yet no mechanism (constrained sampling, projection, or auxiliary loss) is described to guarantee exact matching to the fixed regions or to enforce physical properties such as divergence-free flow. This assumption is load-bearing for the claims of accurate reconstruction from sparse context and stable performance under arbitrary shifts.
  2. [§4] §4 (Experiments): The reported outperformance over supervised neural surrogates under boundary-condition and dataset shift is stated without accompanying quantitative metrics, baseline specifications, error bars, or ablations on the neighbourhood token size or model components. This weakens the ability to evaluate the strength of the central generalization claim.
minor comments (2)
  1. [§3.1] The definition and hyperparameter sensitivity of the neighbourhood token size should be expanded with an explicit equation or pseudocode in §3.1 to improve reproducibility.
  2. A brief discussion of related inpainting or masked-modeling work in physics-informed ML would better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of reformulating CFD inference as context-conditioned inpainting. We address each major comment below and indicate the revisions planned for the manuscript.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Inference procedure): The method imposes new boundary conditions and geometry edits exclusively by fixing known velocity patches at inference time, yet no mechanism (constrained sampling, projection, or auxiliary loss) is described to guarantee exact matching to the fixed regions or to enforce physical properties such as divergence-free flow. This assumption is load-bearing for the claims of accurate reconstruction from sparse context and stable performance under arbitrary shifts.

    Authors: We agree that the inference procedure section would benefit from greater precision. The manuscript explains that known velocity patches are supplied as unmasked tokens to the latent flow-matching or masked autoencoder model at inference, allowing the generative process to condition on them. However, we acknowledge that an explicit mechanism guaranteeing exact reproduction of the fixed patches is not described. In the revised manuscript we will augment §3.2 with a lightweight post-sampling projection step that overwrites the generated values in the fixed regions with the supplied known velocities, thereby ensuring exact matching without altering the learned prior. With respect to physical properties such as divergence-free flow, the model acquires these properties implicitly through training on physics-consistent data; no auxiliary loss or constrained sampling is applied at inference. We will add a concise discussion of this design choice and its implications, noting that explicit enforcement could be explored as future work (e.g., via a latent-space divergence regularizer). These clarifications strengthen the description while preserving the reported experimental outcomes. revision: partial
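The projection the rebuttal proposes, combined with flow-matching-style inpainting, might look roughly like the sketch below. The Euler scheme, the linear noise-to-data path used to re-impose the context during integration, and `velocity_fn` (a stand-in for the learned latent flow-matching model) are all assumptions for illustration, not the paper's actual sampler.

```python
import numpy as np

def inpaint_flow_matching(x_noise, context, fixed_mask, velocity_fn, steps=20):
    """Sketch of latent flow-matching inpainting with a final projection.
    Each Euler step moves all tokens along the learned velocity field;
    fixed regions are re-imposed along a straight noise-to-data path so
    they stay consistent with the integration time, and a hard overwrite
    at the end guarantees exact matching of the supplied context."""
    x = x_noise.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
        # Re-impose the known context at the current integration time.
        s = t + dt
        x[fixed_mask] = (1 - s) * x_noise[fixed_mask] + s * context[fixed_mask]
    x[fixed_mask] = context[fixed_mask]  # post-sampling projection
    return x

# Toy check with a constant rectified-flow-style velocity field.
noise = np.zeros((3, 2))
context = np.array([[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]])
target = np.array([[1.0, 1.0], [2.0, 2.0], [5.0, 5.0]])
fixed = np.array([True, True, False])
out = inpaint_flow_matching(noise, context, fixed,
                            lambda x, t: target - noise, steps=10)
```

Note that the final overwrite guarantees exact matching only at the fixed tokens; smoothness of the field across the fixed/free boundary still depends on the prior, which is exactly the referee's residual concern.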

  2. Referee: [§4] §4 (Experiments): The reported outperformance over supervised neural surrogates under boundary-condition and dataset shift is stated without accompanying quantitative metrics, baseline specifications, error bars, or ablations on the neighbourhood token size or model components. This weakens the ability to evaluate the strength of the central generalization claim.

    Authors: We accept the referee’s observation that the experimental presentation requires additional quantitative detail to support the generalization claims. Although comparative results are shown, the manuscript does not provide the full set of metrics, error statistics, baseline descriptions, or component ablations requested. In the revised version we will expand §4 to include: tables reporting relative L2 and velocity-magnitude errors with standard deviations computed over multiple random seeds; explicit specifications of the supervised neural surrogate baselines (architectures, training regimes, and hyper-parameters); and ablation studies varying neighbourhood token size as well as the relative contributions of the masked-autoencoder and latent flow-matching components. These additions will allow readers to assess the strength of the reported improvements under boundary-condition and dataset shifts. revision: yes
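The promised metrics are simple to pin down. A hedged sketch of nMSE with seed-level error bars, assuming nMSE is defined as squared error normalised by the reference field's energy (one plausible convention; the paper may normalise differently):

```python
import numpy as np

def nmse(pred, ref):
    """Normalised mean squared error between velocity fields:
    ||pred - ref||^2 / ||ref||^2, summed over points and components."""
    return np.sum((pred - ref) ** 2) / np.sum(ref ** 2)

def nmse_over_seeds(runs, ref):
    """Aggregate nMSE over predictions from multiple random seeds,
    returning the mean and standard deviation for error bars."""
    vals = np.array([nmse(p, ref) for p in runs])
    return vals.mean(), vals.std()

ref = np.ones((100, 3))                 # reference CFD velocity field
runs = [ref + 0.1, ref - 0.1]           # two hypothetical seed outputs
mean, std = nmse_over_seeds(runs, ref)
```

Reporting the standard deviation over seeds alongside each baseline's mean is the minimal addition that would let readers judge whether the claimed out-of-distribution gains exceed run-to-run noise.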

Circularity Check

0 steps flagged

No significant circularity in the self-supervised inpainting formulation

full rationale

The paper trains a self-supervised prior over velocity fields via masked autoencoders and latent flow-matching on tokenized meshes, without BCs or physics residuals in training. Inference imposes new boundary constraints solely by fixing sparse known velocity patches and inpainting the remainder. This does not reduce any claimed result to its inputs by construction: the generative model is not fitted to test-time BC values or meshes, and no derivation step equates a prediction to a training fit or self-citation. Evaluation on held-out aneurysm data with imposed contexts remains an independent empirical test. No self-definitional, fitted-input-renamed-as-prediction, or load-bearing self-citation patterns appear in the central claims.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the existence of a learnable prior over velocity fields that remains useful when boundary conditions are imposed only at inference, plus the assumption that the local tokeniser preserves sufficient spatial information for accurate inpainting on large 3D meshes.

free parameters (1)
  • neighbourhood token size
    The spatial extent and resolution of each local token in the neighbourhood tokeniser is chosen to balance compactness and fidelity for high-resolution velocity fields.
axioms (1)
  • domain assumption Steady fluid velocity fields possess statistical structure that can be captured by a self-supervised prior without explicit boundary conditioning during training
    This underpins the entire reformulation of CFD inference as inpainting.
invented entities (1)
  • local neighbourhood tokeniser no independent evidence
    purpose: Represent high-resolution 3D velocity fields as compact spatial latent tokens to enable scalable training of latent flow-matching and masked-autoencoder models
    New component introduced to handle large meshes; no independent evidence provided beyond the claimed performance on aneurysm data.

pith-pipeline@v0.9.0 · 5509 in / 1470 out tokens · 31295 ms · 2026-05-12T01:35:47.019800+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

  1. Benedikt Alkin, Maurits Bleeker, Richard Kurle, Tobias Kronlachner, Reinhard Sonnleitner, Matthias Dorfer, Johannes Brandstetter. Transactions on Machine Learning Research, 2025.
  2. LaB-GATr: geometric algebra transformers for large biomedical surface and volume meshes. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2024.
  3. Aneumo: A Large-Scale Multimodal Aneurysm Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks. arXiv preprint arXiv:2505.14717.
  4. AneuG-Flow: A Large-Scale Synthetic Dataset of Diverse Intracranial Aneurysm Geometries and Hemodynamics. The Thirty-ninth Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track.
  5. Scalable diffusion models with transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  6. Masked autoencoders are scalable vision learners. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  7. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems.
  8. Cristian Bodnar, Wessel Bruinsma, Ana Lucic, Megan Stanley, Anna Allen, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan Weyn, Haiyu Dong, Jayesh Gupta, Kit Thambiratnam, Alexander Archibald, Chun-Chieh Wu, Elizabeth Heider, Max Welling, Richard Turner, Paris Perdikaris. A foundation model for the Earth system.
  9. Jonas Spinner, Victor Bres… . Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics, 2024.
  10. Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, George Karniadakis. Learning nonlinear operators via … . Nature Machine Intelligence.
  11. Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew M. Stuart, Anima Anandkumar. 9th International Conference on Learning Representations, 2021.
  12. Huakun Luo, Haixu Wu, Hang Zhou, Lanxiang Xing, Yichen Di, Jianmin Wang, Mingsheng Long. Transolver++: An Accurate Neural Solver for … , 2025.
  13. Benjamin Holzschuh, Georg Kohl, Florian Redinger, Nils Thuerey. 2026.
  14. Alberto Pepe, Mattia Montanari, Joan Lasenby. Fengbo: a Clifford Neural Operator pipeline for 3D … , 2025.
  15. Phillip Lippe, Bas Veeling, Paris Perdikaris, Richard E. Turner, Johannes Brandstetter. PDE-Refiner: Achieving Accurate Long Rollouts with Neural … . Advances in Neural Information Processing Systems 36 (NeurIPS 2023), New Orleans, LA, USA, December 10-16, 2023.
  16. Aliaksandra Shysheya, Cristiana Diaconu, Federico Bergamin, Paris Perdikaris, Jos… . On conditional diffusion models for … . Advances in Neural Information Processing Systems 38 (NeurIPS 2024), Vancouver, BC, Canada, December 10-15, 2024.
  17. Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation. The Thirty-ninth Annual Conference on Neural Information Processing Systems.
  18. Marten Lienen, David L. … . From Zero to Turbulence: Generative Modeling for … . The Twelfth International Conference on Learning Representations, 2024.
  19. Utkarsh, P., … . Physics-constrained flow matching: Sampling generative models with hard constraints. arXiv preprint arXiv:2506.04171.
  20. Mario Lino, Tobias Pfaff, Nils Thuerey. The Thirteenth International Conference on Learning Representations, 2025.
  21. Pan Du, Meet Parikh, Xiantao Fan, Xin-Yang Liu, Jian-Xun Wang. Conditional neural field latent diffusion model for generating spatiotemporal turbulence. Nature Communications.
  22. Luning Sun, Xu Han, Han Gao, Jian… . Unifying Predictions of Deterministic and Stochastic Physics in Mesh-reduced Space with Sequential Flow Generative Model, 2023.
  23. C. Drygala, B. Winhart, F. di Mare, H. Gottschalk. Physics of Fluids, 2022. doi:10.1063/5.0082562.
  24. M. Buzzicotti, F. Bonaccorso, P. Clark Di Leoni, L. Biferale. Reconstruction of turbulent data with deep generative models for semantic inpainting from … , 2021. doi:10.1103/PhysRevFluids.6.050503.
  25. Benjamin J. Holzschuh, Qiang Liu, Georg Kohl, Nils Thuerey. Forty-second International Conference on Machine Learning, 2025.
  26. The farthest point strategy for progressive image sampling. IEEE Transactions on Image Processing, 1997.
  27. PointNet: Deep learning on point sets for 3D classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  28. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Proceedings of the IEEE International Conference on Computer Vision.
  29. Optuna: A next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
  30. Guido Nannini, Simone Saitta, Luca Mariani, Riccardo Maragna, Andrea Baggiano, Saima Mushtaq, Gianluca Pontone, Alberto Redaelli. An automated and time-efficient framework for simulation of coronary blood flow under steady and pulsatile conditions, 2024. doi:10.1016/j.cmpb.2024.108415.
  31. Neil Ashton, Charles Mockett, Marian Fuchs, Louis Fliessbach, Hendrik Hetmann, Thilo Knacke, Norbert Schonwald, Vangelis Skaperdas, Grigoris Fotiadis, Astrid Walle, Burkhard Hupertz, Danielle Maddix. DrivAerML: High-Fidelity Computational Fluid Dynamics Dataset for Road-Car External Aerodynamics.
  32. Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Bj… . High-Resolution Image Synthesis with Latent Diffusion Models, 2022. doi:10.1109/CVPR52688.2022.01042.
  33. Shizheng Wen, Arsh Kumbhat, Levi Lingsch, Sepehr Mousavi, Yizhou Zhao, Praveen Chandrashekar, Siddhartha Mishra. Geometry Aware Operator Transformer as an efficient and accurate neural surrogate for … , 2026.