pith.

arxiv: 2605.23778 · v1 · pith:KNRYHDAV · submitted 2026-05-22 · physics.ao-ph · cs.LG · physics.comp-ph

The physics of AI weather models

Pith reviewed 2026-05-25 02:18 UTC · model grok-4.3

classification: physics.ao-ph · cs.LG · physics.comp-ph
keywords: AI weather models · gradient flow · latent space · free energy functional · particle description · centered kernel alignment · GraphCast · Aurora

The pith

The authors propose that AI weather models implement a particle description of the atmosphere, in which the particles move by gradient flow toward the minimum of a learned free energy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether AI weather models are implicitly solving physical equations, possibly different from those used in conventional numerical weather prediction (NWP). Correlations of forecast skill and Centered Kernel Alignment (CKA) indicate that models with different architectures represent the atmosphere in similar ways. The authors propose that these models use a particle-like description, in which the latent variables at each mesh point give the position of a particle in a high-dimensional space and the particles move by gradient flow toward the minimum of a learned free energy. A layer-by-layer analysis of GraphCast and Aurora is offered as support: the changes made by the processor layers shift from large to small spatial scales with increasing depth. If correct, the picture implies that a model's architecture and training constrain which physical laws it can learn.
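The CKA comparison can be sketched as follows. This is the linear variant of CKA from Kornblith et al. (2019), assuming activations from two models have been extracted at the same mesh points and times and stacked row-wise, one row per sample; the review does not say which CKA variant or sampling the authors used, and the function name `linear_cka` is hypothetical.

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear Centered Kernel Alignment (Kornblith et al., 2019).

    X: (n_samples, d1) activations from model A.
    Y: (n_samples, d2) activations from model B for the same samples, same order.
    Returns a value in [0, 1]; 1 means the representations agree up to an
    orthogonal transform and isotropic scaling.
    """
    # Center every feature over the samples.
    Xc = X - X.mean(axis=0, keepdims=True)
    Yc = Y - Y.mean(axis=0, keepdims=True)
    # With linear kernels, HSIC reduces to Frobenius norms of cross-products.
    cross = np.linalg.norm(Yc.T @ Xc, ord="fro") ** 2
    self_x = np.linalg.norm(Xc.T @ Xc, ord="fro")
    self_y = np.linalg.norm(Yc.T @ Yc, ord="fro")
    return float(cross / (self_x * self_y))


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64))
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))    # random orthogonal matrix
print(linear_cka(X, X @ Q))                           # 1.0 up to rounding
print(linear_cka(X, rng.standard_normal((500, 32))))  # small: unrelated features
```

Because CKA is invariant to rotations and rescaling of the latent space and accepts different feature dimensions d1 and d2, it can compare models as different as a graph network and a transformer.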

Core claim

The authors propose that the AI models implement a particle description of the atmosphere, where the latent variables at each mesh point correspond to the position of a particle in the high-dimensional latent space. They hypothesize that the movement of the particles follows a gradient flow in the latent space towards a minimum of a learned free energy functional. They support this with evidence that different models learn similar representations, and with the observed progression from large-scale to small-scale changes with increasing layer depth, which they describe as consistent with the hypothesis.
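One way to write the hypothesis down, offered as an illustrative formalization rather than the authors' own notation: take N mesh points with latent vectors x_i in R^d, a processor-layer index ℓ, a step size τ and a scalar F (for finitely many particles the functional reduces to a function of all positions; every symbol here is an assumption). A residual layer is the forward-Euler step of an ODE, the connection exploited by neural ODEs (Chen et al., 2018); the gradient-flow hypothesis is the extra claim that the update is the negative gradient of a single scalar.

```latex
\frac{\mathrm{d}\mathbf{x}_i}{\mathrm{d}t}
  = -\nabla_{\mathbf{x}_i}\,\mathcal{F}(\mathbf{x}_1,\dots,\mathbf{x}_N),
  \qquad i = 1,\dots,N,

\mathbf{x}_i^{(\ell+1)}
  = \mathbf{x}_i^{(\ell)} - \tau\,\nabla_{\mathbf{x}_i}\,
    \mathcal{F}\big(\mathbf{x}_1^{(\ell)},\dots,\mathbf{x}_N^{(\ell)}\big),

\frac{\mathrm{d}\mathcal{F}}{\mathrm{d}t}
  = \sum_{i=1}^{N} \nabla_{\mathbf{x}_i}\mathcal{F}\cdot\frac{\mathrm{d}\mathbf{x}_i}{\mathrm{d}t}
  = -\sum_{i=1}^{N}\big\|\nabla_{\mathbf{x}_i}\mathcal{F}\big\|^{2} \le 0.
```

The last line is what makes "free energy" a meaningful label: along an exact gradient flow, F can only decrease.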

What carries the argument

A particle description of the atmosphere in which latent variables represent particle positions and evolution follows gradient flow on a learned free energy functional.
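A toy version of this picture, not extracted from GraphCast or Aurora: particles (rows of `x`) feel a quadratic confinement plus a Gaussian pairwise attraction, and each hypothetical "layer" is one gradient step on that free energy. The potential, interaction kernel, `a`, `tau` and layer count are all illustrative choices. Transformer layers have been analysed as interacting particle systems in the literature the paper cites (Geshkovski et al., 2023; Rigollet, 2026), but this specific energy is an assumption.

```python
import numpy as np


def free_energy(x: np.ndarray, a: float = 0.1) -> float:
    """Toy free energy of N particles x with shape (N, d).

    Quadratic confinement plus a Gaussian pairwise attraction
    W(z) = -exp(-|z|^2 / 2), averaged over pairs (mean-field scaling).
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]           # (N, N, d) pairwise offsets
    sq = np.sum(diff ** 2, axis=-1)                # (N, N) squared distances
    confinement = 0.5 * a * np.sum(x ** 2)
    interaction = -np.sum(np.exp(-0.5 * sq)) / (2 * n)
    return float(confinement + interaction)


def grad_free_energy(x: np.ndarray, a: float = 0.1) -> np.ndarray:
    """Gradient of free_energy with respect to every particle position."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    # grad W(z) = z * exp(-|z|^2 / 2); both pair orderings give the same term.
    pair = np.sum(diff * np.exp(-0.5 * sq)[..., None], axis=1) / n
    return a * x + pair


def gradient_flow(x0: np.ndarray, tau: float = 0.1, n_layers: int = 12,
                  a: float = 0.1) -> list:
    """One forward-Euler step x <- x - tau * grad F per hypothetical 'layer'."""
    states = [x0]
    for _ in range(n_layers):
        states.append(states[-1] - tau * grad_free_energy(states[-1], a))
    return states


rng = np.random.default_rng(1)
states = gradient_flow(rng.standard_normal((50, 8)))   # 50 mesh points, d = 8
energies = [free_energy(s) for s in states]
print([round(e, 3) for e in energies])                  # non-increasing
```

The energy cannot increase here because `tau` is well below 2/L, where L bounds the Lipschitz constant of this gradient; nothing in the sketch is fitted to a real weather model.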

If this is right

  • Different AI weather models represent the atmosphere in similar ways despite architectural differences.
  • The models process information from large spatial scales in early layers to smaller scales in deeper layers.
  • The architecture and training constrain the form of physical laws simulated by the models.
  • Evidence from Centered Kernel Alignment supports convergence on similar representations.
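The forecast-skill side of the similarity evidence can be sketched as a correlation, across forecast cases, of two models' errors: if both models find the same situations hard, their per-case errors move together. This assumes both models' forecasts and a verifying analysis are on a common grid; plain (unweighted) RMSE is used for brevity, whereas operational scores are normally latitude-weighted, and the review does not state which score or variables the authors correlated. Function names are hypothetical.

```python
import numpy as np


def per_case_rmse(forecasts: np.ndarray, truth: np.ndarray) -> np.ndarray:
    """forecasts, truth: (n_cases, n_gridpoints). Returns one RMSE per case."""
    return np.sqrt(np.mean((forecasts - truth) ** 2, axis=1))


def skill_correlation(fc_a: np.ndarray, fc_b: np.ndarray, truth: np.ndarray) -> float:
    """Pearson correlation, across cases, between two models' per-case errors."""
    return float(np.corrcoef(per_case_rmse(fc_a, truth),
                             per_case_rmse(fc_b, truth))[0, 1])


# Synthetic check: both models struggle on the same cases.
rng = np.random.default_rng(2)
truth = rng.standard_normal((200, 100))
difficulty = rng.uniform(0.5, 2.0, size=(200, 1))   # shared case difficulty
fc_a = truth + difficulty * rng.standard_normal((200, 100))
fc_b = truth + difficulty * rng.standard_normal((200, 100))
print(skill_correlation(fc_a, fc_b, truth))           # close to 1
```

A high value shows only that the models agree on which cases are hard; it says nothing directly about their internal representations, which is where the CKA comparison comes in.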

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the gradient flow interpretation holds, modifying the latent space dynamics could improve model stability or interpretability.
  • This particle view might link AI weather prediction to concepts from statistical mechanics.
  • Future models could be designed to explicitly minimize a free energy functional to enhance physical consistency.
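As an illustration of the last point, a residual block can be built so that its update is exactly a gradient step on a scalar energy. The construction below (softplus energy, weight matrix `W`, bias `b`, step `tau`, weight `lam`) is a hypothetical sketch, not taken from the paper or from any existing weather model; it relies on the identity d/dz softplus(z) = sigmoid(z).

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


class GradientResidualBlock:
    """Hypothetical residual block whose update is an exact gradient step.

    Energy:   E(x) = sum_k softplus(w_k . x + b_k) + (lam / 2) * |x|^2
    Gradient: grad E(x) = W^T sigmoid(W x + b) + lam * x
    Update:   x -> x - tau * grad E(x)
    """

    def __init__(self, dim: int, width: int, tau: float = 0.1,
                 lam: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((width, dim)) / np.sqrt(dim)
        self.b = rng.standard_normal(width)
        self.tau, self.lam = tau, lam

    def energy(self, x: np.ndarray) -> float:
        # np.logaddexp(0, z) is a numerically stable softplus.
        return float(np.sum(np.logaddexp(0.0, self.W @ x + self.b))
                     + 0.5 * self.lam * x @ x)

    def grad(self, x: np.ndarray) -> np.ndarray:
        return self.W.T @ sigmoid(self.W @ x + self.b) + self.lam * x

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x - self.tau * self.grad(x)


block = GradientResidualBlock(dim=8, width=32)
x = np.random.default_rng(3).standard_normal(8)
for layer in range(5):
    print(layer, round(block.energy(x), 4))   # decreases layer by layer
    x = block(x)
```

The design choice is to impose the gradient structure (a symmetric Jacobian and an energy that cannot rise for small enough `tau`) rather than hope a generic residual block learns it.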

Load-bearing premise

The observed change from large-scale modifications in early layers to small-scale modifications in deeper layers reflects gradient flow on a free energy landscape rather than an effect of the model architecture or training procedure.
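One way to quantify the large-to-small scale progression is to take the difference between consecutive layer outputs and ask where its spectral power sits. The sketch assumes latent fields sampled along a single periodic coordinate (for instance one latitude circle) and uses the power-weighted mean wavenumber; a global analysis would use spherical harmonics, and the review does not describe the authors' exact diagnostic. The function names are hypothetical. Running the same measurement on an untrained or shuffled network is the kind of control this premise calls for.

```python
import numpy as np


def spectral_centroid(update: np.ndarray) -> float:
    """Power-weighted mean wavenumber of a field on a periodic 1-D grid.

    update: (n_points,) or (n_points, n_channels), e.g. the change made by
    one processor layer along one latitude circle.
    """
    u = np.asarray(update, dtype=float)
    if u.ndim == 1:
        u = u[:, None]
    u = u - u.mean(axis=0, keepdims=True)                # remove wavenumber 0
    power = np.sum(np.abs(np.fft.rfft(u, axis=0)) ** 2, axis=1)
    k = np.arange(power.size)
    return float(np.sum(k * power) / np.sum(power))


def centroids_by_layer(hidden: list) -> list:
    """hidden: per-layer latent arrays of shape (n_points, n_channels)."""
    return [spectral_centroid(h1 - h0) for h0, h1 in zip(hidden, hidden[1:])]


# Synthetic stack whose successive layers add ever finer structure.
theta = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
hidden = [np.zeros((256, 4))]
for k in (2, 5, 12, 30):
    hidden.append(hidden[-1] + np.sin(k * theta)[:, None] * np.ones((1, 4)))
print(centroids_by_layer(hidden))   # approximately [2, 5, 12, 30]
```

A rising sequence of centroids is the signature described above; by itself it cannot tell gradient flow apart from generic multi-scale processing.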

What would settle it

A demonstration that a model whose layer order or training has been altered to remove the large-to-small scale progression still achieves equivalent forecast skill would challenge the gradient flow hypothesis.

Original abstract

Could it be that AI weather models are solving physical equations, although they may not be the equations used by conventional NWP models? We compute correlations of forecast skill and Centered Kernel Alignment, providing evidence that different AI weather models represent the atmosphere in similar ways, despite differences in architecture and capacity. We argue that the architecture and training of the AI models constrain the form of the physical laws that they might simulate. In particular, we propose that the models implement a particle description of the atmosphere, where the latent variables at each mesh point correspond to the position of a particle in the high-dimensional latent space. We hypothesize that the movement of the particles follows a gradient flow in the latent space towards a minimum of a learned free energy functional. Analysis of the GraphCast and Aurora models shows that they make changes on large spatial scales in the early processor layers and move to smaller scales with increasing layer depth, consistent with the gradient flow hypothesis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper computes CKA correlations between GraphCast and Aurora to argue that AI weather models represent the atmosphere similarly despite architectural differences. It proposes that latent variables at mesh points act as particle positions in high-dimensional latent space and hypothesizes that layer-wise processing implements gradient flow toward a minimum of a learned free-energy functional, with supporting evidence from the observed progression of changes from large spatial scales in early layers to smaller scales in deeper layers.

Significance. If the gradient-flow interpretation could be made rigorous, the work would offer a physically motivated lens on why AI weather models generalize and a potential route to extracting effective equations from trained networks. The CKA similarity result is a modest but concrete observation; the particle/free-energy framing remains a hypothesis without a derived mapping from network weights to the claimed functional.

major comments (3)
  1. [Abstract] Abstract (final paragraph) and the hypothesis statement: the observed large-to-small scale refinement with depth is presented as consistent with gradient flow on a free-energy landscape, yet no derivation shows that this ordering is diagnostic of variational dynamics rather than a generic consequence of the multi-scale graph/transformer processors used in both models.
  2. [Hypothesis section] The central hypothesis equates latent variables with particle positions and layer updates with gradient steps, but the manuscript contains no explicit construction or extraction of the free-energy functional from the trained weights, nor any computation of its gradient that could be compared to the observed layer updates.
  3. [Analysis of GraphCast and Aurora] No statistical tests, error bars, or controls are reported for the CKA values or the scale-progression measurements; the link between these quantities and the free-energy claim therefore rests on qualitative inspection alone.
minor comments (1)
  1. [Proposal of particle description] Notation for the latent-space particle description is introduced without a precise mapping from mesh-point indices to the high-dimensional coordinates.
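Major comment 3 asks for uncertainty estimates. One hedged way to attach an interval to a CKA value is a bootstrap over whole forecast cases, which keeps each case's spatially correlated mesh points together. The array layout, the case-level resampling and the function names are assumptions for illustration, not the procedure the authors report (their rebuttal mentions variability across lead times instead).

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between row-aligned activation matrices."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    return float(np.linalg.norm(Yc.T @ Xc) ** 2
                 / (np.linalg.norm(Xc.T @ Xc) * np.linalg.norm(Yc.T @ Yc)))


def case_bootstrap_cka(X: np.ndarray, Y: np.ndarray, n_boot: int = 200,
                       alpha: float = 0.05, seed: int = 0):
    """Point estimate and percentile interval for linear CKA.

    X: (n_cases, n_points, d1) and Y: (n_cases, n_points, d2) activations of two
    models for the same forecast cases and mesh points. Whole cases are resampled.
    """
    rng = np.random.default_rng(seed)
    n_cases = X.shape[0]

    def flatten(A, idx):
        return A[idx].reshape(-1, A.shape[-1])   # stack the selected cases row-wise

    everything = np.arange(n_cases)
    point = linear_cka(flatten(X, everything), flatten(Y, everything))
    boots = [linear_cka(flatten(X, idx), flatten(Y, idx))
             for idx in (rng.integers(0, n_cases, size=n_cases) for _ in range(n_boot))]
    low, high = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return point, float(low), float(high)


rng = np.random.default_rng(5)
X = rng.standard_normal((20, 50, 16))
Y = X @ rng.standard_normal((16, 8)) + 0.5 * rng.standard_normal((20, 50, 8))
print(case_bootstrap_cka(X, Y, n_boot=100))   # (point, lower, upper)
```

Resampling individual mesh points instead would treat neighbouring, strongly correlated points as independent and give intervals that are too narrow.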

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their insightful comments on our manuscript. We address each of the major comments below and have made revisions to improve clarity and acknowledge limitations where appropriate.

Point-by-point responses
  1. Referee: [Abstract] Abstract (final paragraph) and the hypothesis statement: the observed large-to-small scale refinement with depth is presented as consistent with gradient flow on a free-energy landscape, yet no derivation shows that this ordering is diagnostic of variational dynamics rather than a generic consequence of the multi-scale graph/transformer processors used in both models.

    Authors: We agree that the observed scale progression is a characteristic of the multi-scale processing in these architectures and is not by itself a unique signature of gradient flow. In the revised manuscript, we have modified the abstract and hypothesis section to present this as suggestive consistency with the proposed gradient flow rather than diagnostic evidence. We have also added a short discussion noting that this behavior could arise from other mechanisms but is particularly aligned with variational minimization. revision: yes

  2. Referee: [Hypothesis section] The central hypothesis equates latent variables with particle positions and layer updates with gradient steps, but the manuscript contains no explicit construction or extraction of the free-energy functional from the trained weights, nor any computation of its gradient that could be compared to the observed layer updates.

    Authors: The manuscript presents the particle and free-energy interpretation as a hypothesis inspired by the model architecture and empirical observations, without claiming a full derivation. We do not extract an explicit functional or match gradients to updates, as this would require new techniques beyond the scope of the current work. We have revised the text to more explicitly label this as a hypothesis and to highlight the need for future work on deriving the functional. revision: partial

  3. Referee: [Analysis of GraphCast and Aurora] No statistical tests, error bars, or controls are reported for the CKA values or the scale-progression measurements; the link between these quantities and the free-energy claim therefore rests on qualitative inspection alone.

    Authors: We acknowledge the qualitative nature of the presented analysis. For the revised version, we have added error bars to the CKA similarity measures based on variability across different forecast lead times and included a note on the absence of formal statistical testing as a limitation of the current study. Additional controls, such as comparisons with untrained networks, are discussed as potential extensions. revision: partial

standing simulated objections not resolved
  • Providing an explicit construction of the free-energy functional from the trained network weights and verifying that layer updates correspond to its gradient steps.
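The unresolved objection can be partly probed without constructing the functional. A continuously differentiable vector field on a simply connected domain is the gradient of some scalar exactly when its Jacobian is symmetric, so a layer's update map can be screened by measuring the antisymmetric part of its finite-difference Jacobian. This is a necessary condition only: a symmetric Jacobian neither identifies F nor shows that it behaves like a free energy. The function names are hypothetical, and for layers that mix mesh points (attention, message passing) the map must be taken over the concatenated state of all particles.

```python
import numpy as np


def numerical_jacobian(f, x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Central-difference Jacobian J[i, j] = d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        cols.append((f(x + e) - f(x - e)) / (2 * eps))
    return np.stack(cols, axis=1)


def asymmetry(f, x: np.ndarray, eps: float = 1e-6) -> float:
    """Relative size of the antisymmetric part of the Jacobian of f at x.

    Values well above finite-difference noise rule out an exact
    gradient-flow interpretation of the update map f.
    """
    J = numerical_jacobian(f, x, eps)
    return float(np.linalg.norm(J - J.T) / np.linalg.norm(J))


rng = np.random.default_rng(6)
W = rng.standard_normal((10, 6))
A = rng.standard_normal((6, 6))
x0 = rng.standard_normal(6)


def gradient_update(x):
    # Minus the gradient of sum(log cosh(W x)): an exact gradient field.
    return -W.T @ np.tanh(W @ x)


def rotating_update(x):
    # Antisymmetric linear map: a pure rotation with no scalar potential.
    return (A - A.T) @ x


print(asymmetry(gradient_update, x0))   # ~0 (finite-difference noise)
print(asymmetry(rotating_update, x0))   # about 2
```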

Circularity Check

0 steps flagged

No significant circularity; hypothesis presented as interpretive proposal with consistency check

full rationale

The paper proposes a particle-gradient-flow interpretation of AI weather models as a hypothesis and reports that observed layer-wise scale progression (large scales early, smaller scales later) is consistent with it. This is an interpretive link rather than a derivation chain containing self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations. No quoted text shows the hypothesis being defined in terms of the observations or vice versa; the scale changes are treated as independent empirical findings. The claim may be under-supported or non-unique, but it does not reduce to its inputs by construction under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The claim rests on two invented entities (latent-space particles and a learned free-energy functional) whose only support is the interpretive fit to observed layer behavior; no independent evidence or falsifiable prediction is supplied.

axioms (2)
  • domain assumption: Centered Kernel Alignment and forecast-skill correlations measure similarity of physical representations across models
    Invoked when the authors conclude that different architectures represent the atmosphere in similar ways.
  • ad hoc to paper: Progression from large to small spatial scales with layer depth is diagnostic of gradient flow
    Used to link the GraphCast/Aurora layer analysis to the hypothesized dynamics.
invented entities (2)
  • particle in high-dimensional latent space (no independent evidence)
    purpose: to represent the atmospheric state at each mesh point
    Introduced to give a physical picture of the latent variables; no independent evidence supplied.
  • learned free energy functional (no independent evidence)
    purpose: to define the minimum toward which latent particles flow
    Postulated to explain the hypothesized dynamics; no independent evidence supplied.

pith-pipeline@v0.9.0 · 5691 in / 1594 out tokens · 24992 ms · 2026-05-25T02:18:47.038625+00:00 · methodology


Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 10 internal anchors

  1. [1]

    Alet, F., and Coauthors, 2025: Skillful joint probabilistic weather forecasting from marginals. arXiv, https://arxiv.org/abs/2506.10772, arXiv:2506.10772 [cs], doi:10.48550/arXiv.2506.10772

  2. [2]

    Alvarez-Melis, D., Y. Schiff, and Y. Mroueh, 2021: Optimizing Functionals on the Space of Probabilities with Input Convex Neural Networks. arXiv, https://arxiv.org/abs/2106.00774, arXiv:2106.00774 [stat], doi:10.48550/arXiv.2106.00774

  3. [3]

    Belrose, N., Z. Furman, L. Smith, D. Halawi, I. Ostrovsky, L. McKinney, S. Biderman, and J. Steinhardt, 2023: Eliciting Latent Predictions from Transformers with the Tuned Lens. arXiv, https://arxiv.org/abs/2303.08112, arXiv:2303.08112 [cs], doi:10.48550/arXiv.2303.08112

  4. [4]

    Bereska, L., and E. Gavves, 2024: Mechanistic Interpretability for AI Safety – A Review. arXiv, https://arxiv.org/abs/2404.14082, arXiv:2404.14082 [cs], doi:10.48550/arXiv.2404.14082

  5. [5]

    Bi, K., L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, 2023: Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619 (7970), 533–538, doi:10.1038/s41586-023-06185-3, https://www.nature.com/articles/s41586-023-06185-3

  6. [6]

    Bodnar, C., and Coauthors, 2025: A foundation model for the Earth system. Nature, 641 (8065), 1180–1187, doi:10.1038/s41586-025-09005-y, https://www.nature.com/articles/s41586-025-09005-y

  7. [7]

    Bonev, B., and Coauthors, 2025: FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale. arXiv, https://arxiv.org/abs/2507.12144, arXiv:2507.12144 [cs], doi:10.48550/arXiv.2507.12144

  8. [8]

    Chen, R. T. Q., Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, 2018: Neural Ordinary Differential Equations. Advances in Neural Information Processing Systems, Curran Associates, Inc., Vol. 31, https://papers.nips.cc/paper_files/paper/2018/hash/69386f6bb1dfed68692a24c8686939b9-Abstract.html

  9. [9]

    Couairon, G., C. Lessig, A. Charantonis, and C. Monteleoni, 2024: ArchesWeather: An efficient AI weather forecasting model at 1.5° resolution. arXiv, https://arxiv.org/abs/2405.14527, doi:10.48550/arXiv.2405.14527

  10. [10]

    Couairon, G., R. Singh, A. Charantonis, C. Lessig, and C. Monteleoni, 2026: ArchesWeatherGen: Skillful and compute-efficient probabilistic weather forecasting with machine learning. Science Advances, 12 (17), eadx2372, doi:10.1126/sciadv.adx2372, https://www.science.org/doi/full/10.1126/sciadv.adx2372

  11. [11]

    Cuomo, S., V. S. d. Cola, F. Giampaolo, G. Rozza, M. Raissi, and F. Piccialli, 2022: Scientific Machine Learning through Physics-Informed Neural Networks: Where we are and What's next. arXiv, https://arxiv.org/abs/2201.05624, arXiv:2201.05624 [cs], doi:10.48550/arXiv.2201.05624

  12. [12]

    Dunstan, T., and Coauthors, 2025: FastNet: Improving the physical consistency of machine-learning weather prediction models through loss function design. arXiv, https://arxiv.org/abs/2509.17601, arXiv:2509.17601 [physics], doi:10.48550/arXiv.2509.17601

  13. [13]

    ECMWF, 2025: ECMWF Charts. https://charts.ecmwf.int/catalogue/packages/ai_models/

  14. [14]

    Edamadaka, S., S. Yang, J. Li, and R. Gómez-Bombarelli, 2025: Universally Converging Representations of Matter Across Scientific Foundation Models. arXiv, https://arxiv.org/abs/2512.03750, arXiv:2512.03750 [cs], doi:10.48550/arXiv.2512.03750

  15. [15]

    Elhage, N., and Coauthors, 2021: A Mathematical Framework for Transformer Circuits. https://transformer-circuits.pub/2021/framework/index.html

  16. [16]

    Figalli, A., 2024: An Introduction to Optimal Transport and Wasserstein Gradient Flows. Optimal Transport on Quantum Structures, J. Maas, S. Rademacher, T. Titkos, and D. Virosztek, Eds., Vol. 29, Springer Nature Switzerland, Cham, 1–28, doi:10.1007/978-3-031-50466-2_1, https://link.springer.com/10.1007/978-3-031-50466-2_1, series Title: Bolyai Society Math...

  17. [17]

    Geshkovski, B., C. Letrouit, Y. Polyanskiy, and P. Rigollet, 2023: A mathematical perspective on Transformers. https://arxiv.org/abs/2312.10794v4

  18. [18]

    Huang, G., Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, 2016: Deep Networks with Stochastic Depth. Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds., Springer International Publishing, Cham, 646–661, doi:10.1007/978-3-319-46493-0_39

  19. [19]

    Huh, M., B. Cheung, T. Wang, and P. Isola, 2024: The Platonic Representation Hypothesis. https://arxiv.org/abs/2405.07987v5

  20. [20]

    Jordan, R., D. Kinderlehrer, and F. Otto, 1998: The Variational Formulation of the Fokker–Planck Equation. SIAM Journal on Mathematical Analysis, 29 (1), 1–17, doi:10.1137/S0036141096303359, https://epubs.siam.org/doi/10.1137/S0036141096303359

  21. [21]

    Kornblith, S., M. Norouzi, H. Lee, and G. Hinton, 2019: Similarity of Neural Network Representations Revisited. arXiv, https://arxiv.org/abs/1905.00414, arXiv:1905.00414 [cs.LG], doi:10.48550/arXiv.1905.00414

  22. [22]

    Kurth, T., and Coauthors, 2023: FourCastNet: Accelerating Global High-Resolution Weather Forecasting Using Adaptive Fourier Neural Operators. Proceedings of the Platform for Advanced Scientific Computing Conference, Association for Computing Machinery, New York, NY, USA, 1–11, PASC '23, doi:10.1145/3592979.3593412, https://dl.acm.org/doi/10.1145/3592979.3593412

  23. [23]

    Lam, R., and Coauthors, 2023: Learning skillful medium-range global weather forecasting. Science, 382 (6677), 1416–1421, doi:10.1126/science.adi2336, https://www.science.org/doi/10.1126/science.adi2336

  24. [24]

    Lang, S., and Coauthors, 2024a: AIFS - ECMWF's data-driven forecasting system. arXiv, https://arxiv.org/abs/2406.01465, doi:10.48550/arXiv.2406.01465

  25. [25]

    Lang, S., and Coauthors, 2024b: AIFS-CRPS: Ensemble forecasting using a model trained with a loss function based on the Continuous Ranked Probability Score. arXiv, https://arxiv.org/abs/2412.15832, arXiv:2412.15832 [physics] version: 1, doi:10.48550/arXiv.2412.15832

  26. [26]

    Lin, Z., and Coauthors, 2025: A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models. arXiv, https://arxiv.org/abs/2502.17516, arXiv:2502.17516 [cs], doi:10.48550/arXiv.2502.17516

  27. [27]

    Lorenz, E. N., 1969: The predictability of a flow which possesses many scales of motion. Tellus, 21 (3), 289–307, doi:10.1111/j.2153-3490.1969.tb00444.x, https://onlinelibrary.wiley.com/doi/abs/10.1111/j.2153-3490.1969.tb00444.x, eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.2153-3490.1969.tb00444.x

  28. [28]

    Loshchilov, I., and F. Hutter, 2019: Decoupled Weight Decay Regularization. arXiv, https://arxiv.org/abs/1711.05101, arXiv:1711.05101 [cs], doi:10.48550/arXiv.1711.05101

  29. [29]

    MacMillan, T., and N. T. Ouellette, 2025: Towards mechanistic understanding in a data-driven weather model: internal activations reveal interpretable physical features. https://arxiv.org/abs/2512.24440v1

  30. [30]

    Marion, P., Y.-H. Wu, M. E. Sander, and G. Biau, 2024: Implicit regularization of deep residual networks towards neural ODEs. arXiv, https://arxiv.org/abs/2309.01213, arXiv:2309.01213 [cs, stat] version: 3, doi:10.48550/arXiv.2309.01213

  31. [31]

    Mermin, N. D., 1990: What's wrong with this pillow? Boojums All the Way through: Communicating Science in a Prosaic Age, Cambridge University Press, Cambridge, 198–204, doi:10.1017/CBO9780511608216.017, https://www.cambridge.org/core/books/boojums-all-the-way-through/whats-wrong-with-this-pillow/9B6C0AFA094ED6667647D8E9706784A0

  32. [32]

    Olivetti, L., and G. Messori, 2024: Do data-driven models beat numerical models in forecasting weather extremes? A comparison of IFS HRES, Pangu-Weather, and GraphCast. Geoscientific Model Development, 17 (21), 7915–7962, doi:10.5194/gmd-17-7915-2024, https://gmd.copernicus.org/articles/17/7915/2024/

  33. [33]

    Palmer, T. N., A. Döring, and G. Seregin, 2014: The real butterfly effect. Nonlinearity, 27 (9), R123, doi:10.1088/0951-7715/27/9/R123, https://dx.doi.org/10.1088/0951-7715/27/9/R123

  34. [34]

    Pathak, J., and Coauthors, 2022: FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators. arXiv, https://arxiv.org/abs/2202.11214, arXiv:2202.11214 [physics]

  35. [35]

    Peyré, G., 2025a: The Mathematics of Artificial Intelligence. arXiv, https://arxiv.org/abs/2501.10465, arXiv:2501.10465 [math], doi:10.48550/arXiv.2501.10465

  36. [36]

    Peyré, G., 2025b: Optimal and Diffusion Transports in Machine Learning. arXiv, https://arxiv.org/abs/2512.06797, arXiv:2512.06797 [math], doi:10.48550/arXiv.2512.06797

  37. [37]

    Price, I., and Coauthors, 2024: GenCast: Diffusion-based ensemble forecasting for medium-range weather. arXiv, https://arxiv.org/abs/2312.15796, arXiv:2312.15796 [physics] version: 2, doi:10.48550/arXiv.2312.15796

  38. [38]

    Rai, D., Y. Zhou, S. Feng, A. Saparov, and Z. Yao, 2025: A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models. arXiv, https://arxiv.org/abs/2407.02646, arXiv:2407.02646 [cs], doi:10.48550/arXiv.2407.02646

  39. [39]

    Rigollet, P., 2026: The Mean-Field Dynamics of Transformers. arXiv, https://arxiv.org/abs/2512.01868, arXiv:2512.01868 [cs], doi:10.48550/arXiv.2512.01868

  40. [40]

    Sander, M. E., P. Ablin, M. Blondel, and G. Peyré, 2022: Sinkformers: Transformers with Doubly Stochastic Attention. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR, 3515–3530, https://proceedings.mlr.press/v151/sander22a.html

  41. [41]

    Santambrogio, F., 2016: {Euclidean, Metric, and Wasserstein} Gradient Flows: an overview. arXiv, https://arxiv.org/abs/1609.03890, arXiv:1609.03890 [math], doi:10.48550/arXiv.1609.03890

  42. [42]

    Selz, T., W. Bruinsma, G. C. Craig, S. Markou, R. Turner, and A. Vaughan, 2025: On the effective resolution of AI weather prediction models. doi:10.22541/essoar.174139239.94807670/v1, https://www.authorea.com/users/645836/articles/1274105-on-the-effective-resolution-of-ai-weather-prediction-models

  43. [43]

    Selz, T., and G. C. Craig, 2023: Can Artificial Intelligence-Based Weather Prediction Models Simulate the Butterfly Effect? Geophysical Research Letters, 50 (20), e2023GL105747, doi:10.1029/2023GL105747, https://onlinelibrary.wiley.com/doi/abs/10.1029/2023GL105747, eprint: https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2023GL105747

  44. [44]

    Shepherd, T. G., and Coauthors, 2018: Storylines: an alternative approach to representing uncertainty in physical aspects of climate change. Climatic Change, 151 (3), 555–571, doi:10.1007/s10584-018-2317-9, https://doi.org/10.1007/s10584-018-2317-9

  45. [45]

    Smith, S. L., and Q. V. Le, 2018: A Bayesian Perspective on Generalization and Stochastic Gradient Descent. arXiv, https://arxiv.org/abs/1710.06451, arXiv:1710.06451 [cs], doi:10.48550/arXiv.1710.06451

  46. [46]

    Sun, Y. Q., P. Hassanzadeh, M. Zand, A. Chattopadhyay, J. Weare, and D. S. Abbot, 2025: Can AI weather models predict out-of-distribution gray swan tropical cyclones? Proceedings of the National Academy of Sciences, 122 (21), e2420914122, doi:10.1073/pnas.2420914122, https://www.pnas.org/doi/10.1073/pnas.2420914122

  47. [47]

    Tempest, K. I., M. Beylich, and G. C. Craig, 2026: Mechanistic Interpretability Tool for AI Weather Models. arXiv, https://arxiv.org/abs/2604.20467, arXiv:2604.20467 [physics], doi:10.48550/arXiv.2604.20467

  48. [48]

    Vuckovic, J., A. Baratin, and R. T. d. Combes, 2020: A Mathematical Theory of Attention. arXiv, https://arxiv.org/abs/2007.02876, arXiv:2007.02876 [stat.ML], doi:10.48550/arXiv.2007.02876

  49. [49]

    Álvarez López, A., B. Geshkovski, and D. Ruiz-Balet, 2026: Perceptrons and localization of attention's mean-field landscape. arXiv, https://arxiv.org/abs/2601.21366, arXiv:2601.21366 [cs], doi:10.48550/arXiv.2601.21366