
arxiv: 2606.11657 · v1 · pith:LMID66D6new · submitted 2026-06-10 · 💻 cs.LG · cs.AI

Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics

Pith reviewed 2026-06-27 10:53 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords sparse autoencoder · mechanistic interpretability · foundation model · continuum dynamics · shear flow · enstrophy · fluid dynamics emulation

The pith

A foundation model for continuum dynamics recruits SAE features in piecewise consistent but physically unaligned patterns across shear flow setups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies a sparse autoencoder to a layer inside the Walrus foundation model for continuum dynamics and uses enstrophy to triage more than 20,000 features. In shear-flow test cases it compares feature activation across different numerical parameter values and finds that some features recur in similar roles. This recurrence is only partial and does not line up with conventional physical quantities such as energy or vorticity fields. Output mismatches between the emulator and direct simulation, including overly diffuse or localized structures, are traced to shifts in particular SAE features. The work points to open practical problems in ranking mechanistically relevant features and separating stable internal structure from single-layer or SAE artifacts.

Core claim

Across multiple shear-flow setups the model shows evidence of piecewise consistency in which subsets of SAE features recur in similar roles, but this structure is intermittent and does not map cleanly onto standard physical decompositions; parts of the observed discrepancies between numerical simulation and emulator outputs can be connected to changes in specific SAE feature usage.

What carries the argument

Sparse autoencoder features triaged by enstrophy in one selected layer of the Walrus foundation model for continuum dynamics.
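
As a rough illustration of what enstrophy-based triage could look like (a minimal sketch, not the authors' pipeline): compute the domain-integrated enstrophy of each snapshot from a 2D velocity field, then score every SAE feature by the Spearman rank correlation between enstrophy and that feature's total activation over time, in the spirit of the Figure 2 comparison. The array layouts, function names, uniform grid spacing, and finite-difference vorticity are all assumptions made for this example.

```python
import numpy as np

def enstrophy(u, v, dx=1.0, dy=1.0):
    """Domain-integrated enstrophy, 0.5 * sum(omega^2) * dx * dy, for one snapshot.

    u, v are 2D arrays indexed [y, x]; vorticity is omega = dv/dx - du/dy.
    """
    dvdx = np.gradient(v, dx, axis=1)
    dudy = np.gradient(u, dy, axis=0)
    omega = dvdx - dudy
    return 0.5 * np.sum(omega ** 2) * dx * dy

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks.

    Assumption: no tie correction, so inputs should not contain repeated values.
    """
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

def triage_by_enstrophy(velocity, activations):
    """Score features against enstrophy.

    velocity:    (T, 2, H, W) array holding u and v per time step (hypothetical layout).
    activations: (T, F) array of total activation per feature per time step.
    Returns (enstrophy series, rho per feature); rho is NaN for constant features.
    """
    ens = np.array([enstrophy(step[0], step[1]) for step in velocity])
    rho = np.full(activations.shape[1], np.nan)
    for j in range(activations.shape[1]):
        col = activations[:, j]
        if col.max() > col.min():  # skip features that never change
            rho[j] = spearman(ens, col)
    return ens, rho
```

Sorting features by the absolute value of `rho` gives one possible triage order. Whether such a ranking keeps the mechanistically relevant features is exactly the question the load-bearing premise below raises.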

If this is right

  • Subsets of SAE features recur in similar roles across different shear-flow parameter values.
  • The observed consistency remains only piecewise and does not align with standard physical decompositions.
  • Some systematic output discrepancies between simulation and emulation are traceable to changes in particular SAE feature usage.
  • Single-layer SAE analysis leaves open how to separate stable internal structure from analysis artifacts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Using additional or combined physical metrics for triage might expose whether low-enstrophy features carry overlooked mechanistic information.
  • The intermittent consistency could indicate that the model develops effective yet non-physical internal representations for continuum tasks.
  • Extending the same probing approach to other layers or to different foundation models would test whether these interpretability issues are widespread.

Load-bearing premise

Enstrophy supplies a sufficient and unbiased filter for selecting important SAE features from over 20,000 without missing low-enstrophy but mechanistically relevant ones or creating selection artifacts that alter the reported consistency and discrepancy patterns.

What would settle it

Repeating the triage and comparison with a different physical quantity such as integrated kinetic energy instead of enstrophy, and obtaining feature sets that map cleanly onto physical decompositions with stable roles across all setups, would falsify the claim of intermittent and non-mapping structure.
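
One way to make this comparison concrete is to measure how much the selected feature set changes when the triage metric changes. The sketch below assumes each feature already has a score under each metric (for example a rank correlation against enstrophy and against integrated kinetic energy) and reports the Jaccard overlap of the top-k sets. The function names, the use of absolute scores, and the choice of Jaccard overlap are illustrative assumptions, not something taken from the paper.

```python
import numpy as np

def top_k(scores, k):
    """Indices of the k features with the largest |score|; NaN scores are ignored."""
    s = np.abs(np.asarray(scores, dtype=float))
    s = np.where(np.isnan(s), -np.inf, s)
    order = np.argsort(-s, kind="stable")
    return {int(i) for i in order[:k] if np.isfinite(s[i])}

def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| of two index sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def overlap_vs_k(scores_a, scores_b, ks):
    """Overlap of the two top-k sets for several cut-offs k (a simple sensitivity scan)."""
    return {k: jaccard(top_k(scores_a, k), top_k(scores_b, k)) for k in ks}
```

An overlap that stays high across cut-offs would suggest the triage is not very sensitive to the choice of metric; a low or unstable overlap would suggest the reported patterns depend on the filter.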

Figures

Figures reproduced from arXiv: 2606.11657 by Katherine Rosenfeld, Maike Sonnewald.

Figure 1: Comparing Walrus to the numerical reference: the tracer field, bias, and resulting energy …
Figure 2: Comparing enstrophy, E, of Sim50 versus total feature activation for the feature with greatest Spearman's rank correlation coefficient (ρ = 0.85). We also show the tracer field (middle row) and blue enstrophy overlaid by red feature activation heatmaps (bottom row).
Figure 3: As in Fig. 2 for feature 8245 in Sim…
Figure 4: As in Fig. 2 for feature 8245 in Sim…
Figure 5: Energy spectra for simulations Sim50 (left) and Sim3 (right) and two timesteps near the beginning (solid line) and middle (dashed line) of the simulation. We show results from the numerical simulation (dark blue) as well as the Walrus single step prediction (light blue).
Figure 6: MSE loss (left), Aux loss (center), and alive fraction (right) from our training runs for …
Figure 7: Enstrophy distributions per time-step and trajectory. We order the simulations by the mean …
Figure 8: Distribution of Spearman's rank coefficient, …
Abstract

Generative AI emulators are increasingly used in scientific domains where we already have strong theory, benchmarks, and physical intuition. This raises a central evaluation and interpretability question: when a foundation-style model can reproduce known continuum dynamics, what internal mechanism supports that behavior, is the internal behaviour consistent with known physics, and how does it relate to where the emulator succeeds or fails? We investigate a cross-domain foundation model for continuum dynamics, Walrus by Polymathic, using mechanistic interpretability guided by physical principles. We apply a sparse autoencoder (SAE) to probe a selected layer, and address the practical challenge of triaging a large feature set (over 20,000) using enstrophy as a physically grounded metric. As a deliberately simple testbed, we focus on shear flow and compare feature recruitment across multiple shear-flow setups, i.e. parameter values in the numerical simulation. Across setups we find evidence of piecewise consistency, with subsets of features recurring in similar roles, but this structure is intermittent and does not map cleanly onto standard physical decompositions. In parallel, direct comparisons between numerical simulation and the emulator reveal systematic output-level discrepancies, including regimes where energy/structures become too diffuse or too localized. We connect parts of these discrepancies to changes in specific SAE feature usage. Our work highlights open questions for scientific foundation models: how to robustly prioritize mechanistically meaningful features, how to separate stable structure from analysis artifacts (including single-layer and SAE limitations), and how to use established benchmarks to decide when "different" internal representations are genuinely informative rather than merely effective.
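
For readers unfamiliar with the probe itself, the following is a minimal sketch of a sparse autoencoder forward pass on a batch of layer activations. It assumes a top-k activation rule, which is one common SAE variant; the abstract does not say which variant, dictionary size, or layer the authors used, so the shapes, parameter names, and the top-k choice are assumptions for illustration only.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Top-k sparse autoencoder forward pass (illustrative, not the authors' model).

    x:     (N, d) activations taken from one layer of the probed model.
    W_enc: (d, F) encoder weights, b_enc: (F,) encoder bias.
    W_dec: (F, d) decoder weights, b_dec: (d,) decoder bias.
    k:     number of latents kept per row, with 1 <= k <= F.
    Returns (z, x_hat): sparse latents (N, F) and reconstruction (N, d).
    """
    pre = (x - b_dec) @ W_enc + b_enc
    # Column indices of the k largest pre-activations in each row.
    idx = np.argpartition(-pre, k - 1, axis=1)[:, :k]
    rows = np.arange(pre.shape[0])[:, None]
    z = np.zeros_like(pre)
    z[rows, idx] = np.maximum(pre[rows, idx], 0.0)  # keep top-k, clip negatives
    x_hat = z @ W_dec + b_dec
    return z, x_hat

def mse(x, x_hat):
    """Mean squared reconstruction error."""
    return float(np.mean((x - x_hat) ** 2))

def alive_fraction(z):
    """Fraction of latent features that fire at least once in the batch."""
    return float(np.mean(np.any(z > 0, axis=0)))
```

Each column of `z` is one "feature" in the sense used throughout this page. Summing a column over space for each time step yields the kind of total-activation series that is compared against enstrophy.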

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper applies sparse autoencoders to a selected layer of the Walrus foundation model for continuum dynamics, using enstrophy to triage >20,000 features in shear-flow simulations across parameter values. It reports piecewise consistency in recurring feature roles that is intermittent and does not map cleanly to standard physical decompositions, while linking some output discrepancies (diffuse vs. localized structures) between simulation and emulator to changes in specific SAE feature usage.

Significance. If the central empirical observations hold after addressing triage validation, the work usefully surfaces open methodological questions for interpretability in scientific foundation models: robust prioritization of mechanistically meaningful features, separation of stable structure from single-layer/SAE artifacts, and criteria for when internal differences are informative. The deliberate choice of a simple shear-flow testbed and external physical metric (enstrophy) is a strength for grounding the analysis.

major comments (2)
  1. [Methods] The enstrophy triage procedure (described in the methods) is load-bearing for the claims of piecewise consistency and discrepancy-feature linkages, yet the manuscript supplies no quantitative check (e.g., overlap with a non-enstrophy metric, recall of known vorticity features, or sensitivity analysis) that the threshold preserves the relevant mechanistic subspace rather than systematically excluding low-enstrophy but causally important features such as subtle boundary or gradient encodings.
  2. [Results] The abstract and results sections state findings of 'piecewise consistency' and causal connections to output discrepancies without accompanying quantitative support (overlap fractions, statistical tests, ablation on feature subsets, or error bars), which prevents assessment of effect sizes and reproducibility of the intermittency observation.
minor comments (1)
  1. Notation for SAE feature indices and the precise layer chosen should be defined explicitly on first use to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which identify important opportunities to strengthen the methodological grounding and quantitative presentation of our case study. We respond to each major comment below.

  1. Referee: [Methods] The enstrophy triage procedure (described in the methods) is load-bearing for the claims of piecewise consistency and discrepancy-feature linkages, yet the manuscript supplies no quantitative check (e.g., overlap with a non-enstrophy metric, recall of known vorticity features, or sensitivity analysis) that the threshold preserves the relevant mechanistic subspace rather than systematically excluding low-enstrophy but causally important features such as subtle boundary or gradient encodings.

    Authors: We agree that the triage procedure would benefit from additional validation. Enstrophy was selected because it is a physically natural metric for the vorticity-dominated shear-flow testbed. In the revision we will add a sensitivity analysis across threshold values and report feature overlap with a secondary metric (kinetic energy) to check whether low-enstrophy but potentially relevant encodings are excluded. This will make explicit the extent to which the selected subspace is preserved. revision: yes

  2. Referee: [Results] The abstract and results sections state findings of 'piecewise consistency' and causal connections to output discrepancies without accompanying quantitative support (overlap fractions, statistical tests, ablation on feature subsets, or error bars), which prevents assessment of effect sizes and reproducibility of the intermittency observation.

    Authors: The reported patterns are qualitative observations drawn from the deliberately limited shear-flow testbed. We will add overlap fractions for recurring features across parameter values and include a limited ablation on the most frequently recruited feature subsets to quantify their contribution to the observed output discrepancies. Because the intermittency itself is the central empirical finding, formal statistical tests are not straightforward, but we will clarify the exploratory character of the results and note reproducibility across the tested configurations. revision: yes
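
The limited ablation mentioned in the second response could take roughly the following form: zero a chosen subset of SAE latents, decode, and measure how far the reconstruction moves. This is a sketch under stated assumptions. The function names are hypothetical, the effect is measured only at the SAE reconstruction, and a full ablation would also propagate the edited activations through the remaining layers of the model and compare outputs against the numerical simulation.

```python
import numpy as np

def ablate_and_decode(z, W_dec, b_dec, features):
    """Decode SAE latents after zeroing the given feature indices.

    z: (N, F) latents, W_dec: (F, d) decoder weights, b_dec: (d,) decoder bias.
    """
    idx = np.asarray(list(features), dtype=int)
    z_abl = z.copy()
    z_abl[:, idx] = 0.0
    return z_abl @ W_dec + b_dec

def ablation_effect(z, W_dec, b_dec, features):
    """Relative change in the reconstruction caused by ablating `features`."""
    full = z @ W_dec + b_dec
    ablated = ablate_and_decode(z, W_dec, b_dec, features)
    denom = np.linalg.norm(full)
    return float(np.linalg.norm(full - ablated) / denom) if denom > 0 else 0.0
```

Comparing `ablation_effect` for the frequently recruited subset against randomly drawn subsets of the same size would give a simple baseline for judging whether those features matter more than chance.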

Circularity Check

0 steps flagged

No circularity: empirical case study with external physical triage metric


The paper is an observational interpretability case study applying SAE probes to a pre-trained foundation model and triaging >20k features via the external physical quantity enstrophy. No derivation chain, fitted-parameter predictions, self-definitional steps, or load-bearing self-citations exist. Claims of piecewise consistency and discrepancy linkage rest on direct comparisons to numerical simulations and feature activation patterns, not on any reduction to the paper's own inputs or prior author work by construction. The enstrophy triage is an analysis choice whose adequacy is debatable on methodological grounds but does not create circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities; the work rests on standard SAE machinery and the domain assumption that enstrophy is a suitable triage signal.

pith-pipeline@v0.9.1-grok · 5823 in / 1265 out tokens · 42119 ms · 2026-06-27T10:53:00.316956+00:00 · methodology

