The physics of AI weather models

George Craig; Kirsten I. Tempest; Matthias Beylich; Tobias Selz

arXiv: 2605.23778 · v1 · pith:KNRYHDAVnew · submitted 2026-05-22 · ⚛️ physics.ao-ph · cs.LG · physics.comp-ph


Pith reviewed 2026-05-25 02:18 UTC · model grok-4.3

classification ⚛️ physics.ao-ph · cs.LG · physics.comp-ph

keywords AI weather models · gradient flow · latent space · free energy functional · particle description · centered kernel alignment · GraphCast · Aurora


The pith

AI weather models implement a particle description of the atmosphere with movements driven by gradient flow toward a learned free energy minimum.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether AI weather models are implicitly solving physical equations, though possibly different from traditional ones. It finds that models with different architectures represent the atmosphere similarly, as shown by correlations in forecast skill and Centered Kernel Alignment. The authors propose that these models use a particle-like description where each mesh point's latent variables represent a particle's position in high-dimensional space, and the particles move according to gradient flow minimizing a learned free energy. Analysis of layer processing supports this by showing shifts from large to small spatial scales with depth. If correct, this constrains the possible physical laws the models can learn based on their structure and training.

Core claim

The authors propose that the AI models implement a particle description of the atmosphere, where the latent variables at each mesh point correspond to the position of a particle in the high dimensional latent space. They hypothesize that the movement of the particles follows a gradient flow in the latent space towards a minimum of a learned free energy functional. This is evidenced by similar representations across models and the observed progression from large-scale to small-scale changes with increasing layer depth.

What carries the argument

A particle description of the atmosphere in which latent variables represent particle positions and evolution follows gradient flow on a learned free energy functional.
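To make the particle picture concrete, here is a minimal sketch under stated assumptions, not the paper's method. Each row of `x` plays the role of one mesh point's latent vector, and each "layer" is one explicit gradient step on a toy free energy. The potentials `V` and `W`, the step size, and the names `energy`, `grad_energy`, and `run_flow` are all hypothetical choices made for illustration; the learned functional the paper hypothesizes is not reproduced here.

```python
import numpy as np

def energy(x):
    """Toy free energy: sum_i V(x_i) + (1/2n) sum_ij W(x_i - x_j).

    Assumed forms (illustrative only): V(x) = |x|^2 / 2, W(r) = |r|^2 / 2.
    """
    n = x.shape[0]
    confinement = 0.5 * np.sum(x ** 2)
    diff = x[:, None, :] - x[None, :, :]      # all pairwise differences
    interaction = 0.25 / n * np.sum(diff ** 2)
    return confinement + interaction

def grad_energy(x):
    """Analytic gradient per particle: x_k + (x_k - mean of all particles)."""
    return 2.0 * x - x.mean(axis=0, keepdims=True)

def run_flow(x0, n_layers=12, step=0.1):
    """Apply n_layers residual updates, each one explicit Euler gradient step."""
    x = x0.copy()
    energies = [energy(x)]
    for _ in range(n_layers):
        x = x - step * grad_energy(x)
        energies.append(energy(x))
    return x, np.array(energies)

rng = np.random.default_rng(1)
x0 = rng.normal(size=(64, 8))                 # 64 particles in an 8-dim latent space
x_final, energies = run_flow(x0)
```

With this convex toy energy and a small step, `energies` decreases at every layer. A trained model need not show such a clean decrease, which is part of what the hypothesis would have to demonstrate.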

If this is right

Different AI weather models represent the atmosphere in similar ways despite architectural differences.
The models process information from large spatial scales in early layers to smaller scales in deeper layers.
The architecture and training constrain the form of physical laws simulated by the models.
Evidence from Centered Kernel Alignment supports convergence on similar representations.
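For readers unfamiliar with Centered Kernel Alignment, the following is a minimal sketch of linear CKA in the form introduced by Kornblith et al. (2019). This summary does not state which CKA variant the paper uses, so the linear form is an assumption, as are the function name `linear_cka` and the random toy activations standing in for latent variables.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between activation matrices with matched rows.

    x: (n_samples, d1), y: (n_samples, d2). Row i of x and row i of y
    must describe the same sample (for example, the same mesh point).
    """
    x = x - x.mean(axis=0, keepdims=True)     # center every feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)

rng = np.random.default_rng(0)
a = rng.normal(size=(200, 16))
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
b = 3.0 * a @ q                                  # rotated, rescaled copy of a
c = rng.normal(size=(200, 32))                   # unrelated representation

cka_same = linear_cka(a, b)
cka_unrelated = linear_cka(a, c)
```

Linear CKA is invariant to rotations and isotropic rescaling of either representation, so `cka_same` equals 1 up to rounding, while `cka_unrelated` is much smaller. This invariance is why the measure can compare models whose latent spaces have different widths and bases.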

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the gradient flow interpretation holds, modifying the latent space dynamics could improve model stability or interpretability.
This particle view might link AI weather prediction to concepts from statistical mechanics.
Future models could be designed to explicitly minimize a free energy functional to enhance physical consistency.

Load-bearing premise

The observed change from large-scale modifications in early layers to small-scale modifications in deeper layers reflects gradient flow on a free energy landscape rather than an effect of the model architecture or training procedure.
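One way to quantify a large-to-small scale progression is the power-weighted mean wavenumber of each layer's update. The sketch below is an illustration only. It assumes a 1D periodic grid rather than the sphere, uses synthetic sine waves as hypothetical layer updates, and introduces the helper `mean_wavenumber`; none of this is taken from the paper's analysis.

```python
import numpy as np

def mean_wavenumber(update):
    """Power-weighted mean wavenumber of a 1D periodic field."""
    power = np.abs(np.fft.rfft(update)) ** 2
    k = np.arange(power.size)
    return float(np.sum(k * power) / np.sum(power))

n = 256
grid = 2 * np.pi * np.arange(n) / n

# Hypothetical per-layer updates (latent after a layer minus latent before it),
# constructed so that deeper layers act on shorter wavelengths.
layer_wavenumbers = [1, 2, 4, 8, 16]
updates = [np.sin(k * grid) for k in layer_wavenumbers]

scales = [mean_wavenumber(u) for u in updates]
```

Here `scales` recovers the wavenumber of each synthetic update and increases with depth. The diagnostic measures the progression itself; it cannot say whether the cause is gradient flow or the multi-scale design of the processor.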

What would settle it

A demonstration that altering the layer order or training to remove the large-to-small scale progression still yields equivalent forecast skill would challenge the gradient flow hypothesis.

Original abstract

Could it be that AI weather models are solving physical equations, although they may not be the equations used by conventional NWP models? We compute correlations of forecast skill and Centered Kernel Alignment, providing evidence that different AI weather models represent the atmosphere in similar ways, despite differences in architecture and capacity. We argue that the architecture and training of the AI models constrains the form of the physical laws that they might simulate. In particular, we propose that the models implement a particle description of the atmosphere, where the latent variables at each mesh point correspond to the position of a particle in the high dimensional latent space. We hypothesize that the movement of the particles follows a gradient flow in the latent space towards a minimum of a learned free energy functional. Analysis of the GraphCast and Aurora models show that they make changes on large spatial scales in the early processor layers and move to smaller scale with increasing layer depth, consistent with the gradient flow hypothesis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper hypothesizes that AI weather models run a particle gradient flow on a learned free energy in latent space, backed by CKA alignment and layer-wise scale refinement, but the inference stays correlational and untested against architectural alternatives.


The main takeaway is that this work proposes AI models like GraphCast and Aurora encode the atmosphere as particles in high-dimensional latent space whose updates follow gradient flow toward a minimum of some learned free energy. The claim rests on two observations: different models show similar internal representations via Centered Kernel Alignment and forecast-skill correlations, and both shift from large-scale adjustments in early layers to small-scale refinements in deeper layers.

Referee Report

3 major / 1 minor

Summary. The paper computes CKA correlations between GraphCast and Aurora to argue that AI weather models represent the atmosphere similarly despite architectural differences. It proposes that latent variables at mesh points act as particle positions in high-dimensional latent space and hypothesizes that layer-wise processing implements gradient flow toward a minimum of a learned free-energy functional, with supporting evidence from the observed progression of changes from large spatial scales in early layers to smaller scales in deeper layers.

Significance. If the gradient-flow interpretation could be made rigorous, the work would offer a physically motivated lens on why AI weather models generalize and a potential route to extracting effective equations from trained networks. The CKA similarity result is a modest but concrete observation; the particle/free-energy framing remains a hypothesis without a derived mapping from network weights to the claimed functional.

major comments (3)

[Abstract] Abstract (final paragraph) and the hypothesis statement: the observed large-to-small scale refinement with depth is presented as consistent with gradient flow on a free-energy landscape, yet no derivation shows that this ordering is diagnostic of variational dynamics rather than a generic consequence of the multi-scale graph/transformer processors used in both models.
[Hypothesis section] The central hypothesis equates latent variables with particle positions and layer updates with gradient steps, but the manuscript contains no explicit construction or extraction of the free-energy functional from the trained weights, nor any computation of its gradient that could be compared to the observed layer updates.
[Analysis of GraphCast and Aurora] No statistical tests, error bars, or controls are reported for the CKA values or the scale-progression measurements; the link between these quantities and the free-energy claim therefore rests on qualitative inspection alone.
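The kind of control the third comment asks for could look like a permutation test on the CKA value. This is a sketch under assumptions, not something the paper reports: it uses linear CKA, breaks the row correspondence by shuffling, and treats samples as exchangeable. The names `linear_cka` and `cka_permutation_test` and the toy data are hypothetical.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between activation matrices with matched rows."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, ord="fro")
                    * np.linalg.norm(y.T @ y, ord="fro"))

def cka_permutation_test(x, y, n_perm=200, seed=0):
    """Compare observed CKA with a null built by shuffling the rows of y."""
    rng = np.random.default_rng(seed)
    observed = linear_cka(x, y)
    null = np.array([
        linear_cka(x, y[rng.permutation(len(y))]) for _ in range(n_perm)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

# Toy data: two "models" that encode the same 10-dim signal differently.
rng = np.random.default_rng(2)
shared = rng.normal(size=(150, 10))
x = shared @ rng.normal(size=(10, 20)) + 0.1 * rng.normal(size=(150, 20))
y = shared @ rng.normal(size=(10, 24)) + 0.1 * rng.normal(size=(150, 24))

observed, null, p_value = cka_permutation_test(x, y)
```

Mesh points in a real forecast are spatially correlated, so a plain row shuffle would likely understate the null. A block or spatially aware permutation would be a more careful choice.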

minor comments (1)

[Proposal of particle description] Notation for the latent-space particle description is introduced without a precise mapping from mesh-point indices to the high-dimensional coordinates.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their insightful comments on our manuscript. We address each of the major comments below and have made revisions to improve clarity and acknowledge limitations where appropriate.


Referee: [Abstract] Abstract (final paragraph) and the hypothesis statement: the observed large-to-small scale refinement with depth is presented as consistent with gradient flow on a free-energy landscape, yet no derivation shows that this ordering is diagnostic of variational dynamics rather than a generic consequence of the multi-scale graph/transformer processors used in both models.

Authors: We agree that the observed scale progression is a characteristic of the multi-scale processing in these architectures and is not by itself a unique signature of gradient flow. In the revised manuscript, we have modified the abstract and hypothesis section to present this as suggestive consistency with the proposed gradient flow rather than diagnostic evidence. We have also added a short discussion noting that this behavior could arise from other mechanisms but is particularly aligned with variational minimization.

revision: yes

Referee: [Hypothesis section] The central hypothesis equates latent variables with particle positions and layer updates with gradient steps, but the manuscript contains no explicit construction or extraction of the free-energy functional from the trained weights, nor any computation of its gradient that could be compared to the observed layer updates.

Authors: The manuscript presents the particle and free-energy interpretation as a hypothesis inspired by the model architecture and empirical observations, without claiming a full derivation. We do not extract an explicit functional or match gradients to updates, as this would require new techniques beyond the scope of the current work. We have revised the text to more explicitly label this as a hypothesis and to highlight the need for future work on deriving the functional.

revision: partial

Referee: [Analysis of GraphCast and Aurora] No statistical tests, error bars, or controls are reported for the CKA values or the scale-progression measurements; the link between these quantities and the free-energy claim therefore rests on qualitative inspection alone.

Authors: We acknowledge the qualitative nature of the presented analysis. For the revised version, we have added error bars to the CKA similarity measures based on variability across different forecast lead times and included a note on the absence of formal statistical testing as a limitation of the current study. Additional controls, such as comparisons with untrained networks, are discussed as potential extensions.

revision: partial

standing simulated objections (not resolved)

Providing an explicit construction of the free-energy functional from the trained network weights and verifying that layer updates correspond to its gradient steps.
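If a candidate functional were available, the verification step named in this objection could be sketched as a cosine-similarity check between each layer update and the descent direction. Everything below is hypothetical: `candidate_energy` is an arbitrary stand-in, the updates are synthetic, and the helper names are invented for illustration. Extracting a real functional from trained weights is exactly the open problem.

```python
import numpy as np

def candidate_energy(x):
    """Hypothetical candidate functional (not derived from any model)."""
    return 0.5 * np.sum(x ** 2) + 0.25 * np.sum(x ** 4)

def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        shift = np.zeros_like(x)
        shift[idx] = eps
        grad[idx] = (f(x + shift) - f(x - shift)) / (2 * eps)
    return grad

def alignment(update, grad):
    """Cosine similarity between an update and the descent direction -grad."""
    return float(np.sum(update * -grad)
                 / (np.linalg.norm(update) * np.linalg.norm(grad)))

rng = np.random.default_rng(3)
x = rng.normal(size=(16, 4))                  # 16 mesh points, 4 latent dims
g = numerical_gradient(candidate_energy, x)

# Synthetic stand-ins for layer updates: one follows the gradient, one does not.
gradient_like_update = -0.1 * g + 0.01 * rng.normal(size=x.shape)
random_update = rng.normal(size=x.shape)

score_gradient_like = alignment(gradient_like_update, g)
score_random = alignment(random_update, g)
```

An alignment near 1 across layers would support the gradient-step reading for that candidate functional, while values scattered around 0 would argue against it.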

Circularity Check

0 steps flagged

No significant circularity; hypothesis presented as interpretive proposal with consistency check


The paper proposes a particle-gradient-flow interpretation of AI weather models as a hypothesis and reports that observed layer-wise scale progression (large scales early, smaller scales later) is consistent with it. This is an interpretive link rather than a derivation chain containing self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations. No quoted text shows the hypothesis being defined in terms of the observations or vice versa; the scale changes are treated as independent empirical findings. The claim may be under-supported or non-unique, but it does not reduce to its inputs by construction under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The claim rests on two invented entities (latent-space particles and a learned free-energy functional) whose only support is the interpretive fit to observed layer behavior; no independent evidence or falsifiable prediction is supplied.

axioms (2)

domain assumption: Centered Kernel Alignment and forecast-skill correlations measure similarity of physical representations across models.
Invoked when the authors conclude that different architectures represent the atmosphere in similar ways.
ad hoc to paper: Progression from large to small spatial scales with layer depth is diagnostic of gradient flow.
Used to link the GraphCast/Aurora layer analysis to the hypothesized dynamics.

invented entities (2)

particle in high-dimensional latent space (no independent evidence)
purpose: to represent the atmospheric state at each mesh point
Introduced to give a physical picture of the latent variables; no independent evidence supplied.
learned free energy functional (no independent evidence)
purpose: to define the minimum toward which latent particles flow
Postulated to explain the hypothesized dynamics; no independent evidence supplied.

pith-pipeline@v0.9.0 · 5691 in / 1594 out tokens · 24992 ms · 2026-05-25T02:18:47.038625+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

We hypothesize that the movement of the particles follows a gradient flow in the latent space towards a minimum of a learned free energy functional... v_s = −∇(δG/δα) ... G(α) = ∫H(α)dx + ∫V(x)α(x)dx + ½∫∫W(x,x′)α(x)α(x′)dxdx′

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

the system of mesh points as a set of interacting particles... mean-field limit... continuity equation ∂_s α̃_s + ∇·(α̃_s v_s) = 0
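The two quoted passages fit together through a standard calculation, sketched here as a reading aid rather than as the paper's own derivation. It assumes that boundary terms vanish, that α̃_s is non-negative, that G has no explicit dependence on s, and that W is symmetric in its arguments. Under those assumptions the functional G cannot increase along the flow:

```latex
% Assumes: vanishing boundary terms, \tilde\alpha_s \ge 0, W(x,x') = W(x',x).
\begin{align}
\frac{\delta G}{\delta \alpha}(x)
  &= H'(\alpha(x)) + V(x) + \int W(x,x')\,\alpha(x')\,dx' \\
\frac{d}{ds}\, G(\tilde\alpha_s)
  &= \int \frac{\delta G}{\delta \alpha}\,\partial_s \tilde\alpha_s \,dx
   = -\int \frac{\delta G}{\delta \alpha}\,
      \nabla\cdot(\tilde\alpha_s v_s)\,dx \\
  &= \int \tilde\alpha_s\,\nabla\frac{\delta G}{\delta \alpha}\cdot v_s \,dx
   = -\int \tilde\alpha_s
      \left|\nabla\frac{\delta G}{\delta \alpha}\right|^{2} dx \;\le\; 0 .
\end{align}
```

The second line uses the continuity equation, and the third uses integration by parts followed by v_s = −∇(δG/δα).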

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

arXiv, ://arxiv.org/abs/2506.10772, arXiv:2506.10772 [cs], doi:10.48550/arXiv.2506.10772

Alet, F., and Coauthors, 2025: Skillful joint probabilistic weather forecasting from marginals. arXiv, ://arxiv.org/abs/2506.10772, arXiv:2506.10772 [cs], doi:10.48550/arXiv.2506.10772

work page doi:10.48550/arxiv.2506.10772 2025

[2] [2]

Schiff, and Y

Alvarez-Melis, D., Y. Schiff, and Y. Mroueh, 2021: Optimizing Functionals on the Space of Probabilities with Input Convex Neural Networks . arXiv, ://arxiv.org/abs/2106.00774, arXiv:2106.00774 [stat], doi:10.48550/arXiv.2106.00774

work page doi:10.48550/arxiv.2106.00774 2021

[3] Belrose, N., Z. Furman, L. Smith, D. Halawi, I. Ostrovsky, L. McKinney, S. Biderman, and J. Steinhardt, 2023: Eliciting Latent Predictions from Transformers with the Tuned Lens. arXiv, https://arxiv.org/abs/2303.08112, arXiv:2303.08112 [cs], doi:10.48550/arXiv.2303.08112

[4] Bereska, L., and E. Gavves, 2024: Mechanistic Interpretability for AI Safety -- A Review. arXiv, https://arxiv.org/abs/2404.14082, arXiv:2404.14082 [cs], doi:10.48550/arXiv.2404.14082

[5] Bi, K., L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, 2023: Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619 (7970), 533–538, doi:10.1038/s41586-023-06185-3, https://www.nature.com/articles/s41586-023-06185-3

[6] Bodnar, C., and Coauthors, 2025: A foundation model for the Earth system. Nature, 641 (8065), 1180–1187, doi:10.1038/s41586-025-09005-y, https://www.nature.com/articles/s41586-025-09005-y

[7] Bonev, B., and Coauthors, 2025: FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale. arXiv, https://arxiv.org/abs/2507.12144, arXiv:2507.12144 [cs], doi:10.48550/arXiv.2507.12144

[8] Chen, R. T. Q., Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, 2018: Neural Ordinary Differential Equations. Advances in Neural Information Processing Systems, Curran Associates, Inc., Vol. 31, https://papers.nips.cc/paper_files/paper/2018/hash/69386f6bb1dfed68692a24c8686939b9-Abstract.html

[9] Couairon, G., C. Lessig, A. Charantonis, and C. Monteleoni, 2024: ArchesWeather: An efficient AI weather forecasting model at 1.5° resolution. arXiv, https://arxiv.org/abs/2405.14527, doi:10.48550/arXiv.2405.14527

[10] Couairon, G., R. Singh, A. Charantonis, C. Lessig, and C. Monteleoni, 2026: ArchesWeatherGen: Skillful and compute-efficient probabilistic weather forecasting with machine learning. Science Advances, 12 (17), eadx2372, doi:10.1126/sciadv.adx2372, https://www.science.org/doi/full/10.1126/sciadv.adx2372

[11] Cuomo, S., V. S. d. Cola, F. Giampaolo, G. Rozza, M. Raissi, and F. Piccialli, 2022: Scientific Machine Learning through Physics-Informed Neural Networks: Where we are and What's next. arXiv, https://arxiv.org/abs/2201.05624, arXiv:2201.05624 [cs], doi:10.48550/arXiv.2201.05624

[12] Dunstan, T., and Coauthors, 2025: FastNet: Improving the physical consistency of machine-learning weather prediction models through loss function design. arXiv, https://arxiv.org/abs/2509.17601, arXiv:2509.17601 [physics], doi:10.48550/arXiv.2509.17601

[13] ECMWF, 2025: ECMWF Charts. https://charts.ecmwf.int/catalogue/packages/ai_models/

[14] Edamadaka, S., S. Yang, J. Li, and R. Gómez-Bombarelli, 2025: Universally Converging Representations of Matter Across Scientific Foundation Models. arXiv, https://arxiv.org/abs/2512.03750, arXiv:2512.03750 [cs], doi:10.48550/arXiv.2512.03750

[15] Elhage, N., and Coauthors, 2021: A Mathematical Framework for Transformer Circuits. https://transformer-circuits.pub/2021/framework/index.html

[16] Figalli, A., 2024: An Introduction to Optimal Transport and Wasserstein Gradient Flows. Optimal Transport on Quantum Structures, J. Maas, S. Rademacher, T. Titkos, and D. Virosztek, Eds., Vol. 29, Springer Nature Switzerland, Cham, 1–28, doi:10.1007/978-3-031-50466-2_1, https://link.springer.com/10.1007/978-3-031-50466-2_1, series Title: Bolyai Society Math...

[17] Geshkovski, B., C. Letrouit, Y. Polyanskiy, and P. Rigollet, 2023: A mathematical perspective on Transformers. https://arxiv.org/abs/2312.10794v4

[18] Huang, G., Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, 2016: Deep Networks with Stochastic Depth. Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds., Springer International Publishing, Cham, 646–661, doi:10.1007/978-3-319-46493-0_39

[19] Huh, M., B. Cheung, T. Wang, and P. Isola, 2024: The Platonic Representation Hypothesis. https://arxiv.org/abs/2405.07987v5

[20] Jordan, R., D. Kinderlehrer, and F. Otto, 1998: The Variational Formulation of the Fokker–Planck Equation. SIAM Journal on Mathematical Analysis, 29 (1), 1–17, doi:10.1137/S0036141096303359, https://epubs.siam.org/doi/10.1137/S0036141096303359

[21] Kornblith, S., M. Norouzi, H. Lee, and G. Hinton, 2019: Similarity of Neural Network Representations Revisited. arXiv, https://arxiv.org/abs/1905.00414, arXiv:1905.00414 [cs.LG], doi:10.48550/arXiv.1905.00414

[22] Kurth, T., and Coauthors, 2023: FourCastNet: Accelerating Global High-Resolution Weather Forecasting Using Adaptive Fourier Neural Operators. Proceedings of the Platform for Advanced Scientific Computing Conference, Association for Computing Machinery, New York, NY, USA, 1–11, PASC '23, doi:10.1145/3592979.3593412, https://dl.acm.org/doi/10.1145/3592979.3593412

[23] Lam, R., and Coauthors, 2023: Learning skillful medium-range global weather forecasting. Science, 382 (6677), 1416–1421, doi:10.1126/science.adi2336, https://www.science.org/doi/10.1126/science.adi2336

[24] Lang, S., and Coauthors, 2024a: AIFS - ECMWF's data-driven forecasting system. arXiv, https://arxiv.org/abs/2406.01465, doi:10.48550/arXiv.2406.01465

[25] Lang, S., and Coauthors, 2024b: AIFS-CRPS: Ensemble forecasting using a model trained with a loss function based on the Continuous Ranked Probability Score. arXiv, https://arxiv.org/abs/2412.15832, arXiv:2412.15832 [physics], version 1, doi:10.48550/arXiv.2412.15832

[26] Lin, Z., and Coauthors, 2025: A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models. arXiv, https://arxiv.org/abs/2502.17516, arXiv:2502.17516 [cs], doi:10.48550/arXiv.2502.17516

[27] Lorenz, E. N., 1969: The predictability of a flow which possesses many scales of motion. Tellus, 21 (3), 289–307, doi:10.1111/j.2153-3490.1969.tb00444.x, https://onlinelibrary.wiley.com/doi/abs/10.1111/j.2153-3490.1969.tb00444.x

[28] Loshchilov, I., and F. Hutter, 2019: Decoupled Weight Decay Regularization. arXiv, https://arxiv.org/abs/1711.05101, arXiv:1711.05101 [cs], doi:10.48550/arXiv.1711.05101

[29] MacMillan, T., and N. T. Ouellette, 2025: Towards mechanistic understanding in a data-driven weather model: internal activations reveal interpretable physical features. https://arxiv.org/abs/2512.24440v1

[30] Marion, P., Y.-H. Wu, M. E. Sander, and G. Biau, 2024: Implicit regularization of deep residual networks towards neural ODEs. arXiv, https://arxiv.org/abs/2309.01213, arXiv:2309.01213 [cs, stat], version 3, doi:10.48550/arXiv.2309.01213

[31] Mermin, N. D., 1990: What's wrong with this pillow? Boojums All the Way through: Communicating Science in a Prosaic Age, Cambridge University Press, Cambridge, 198–204, doi:10.1017/CBO9780511608216.017, https://www.cambridge.org/core/books/boojums-all-the-way-through/whats-wrong-with-this-pillow/9B6C0AFA094ED6667647D8E9706784A0

[32] Olivetti, L., and G. Messori, 2024: Do data-driven models beat numerical models in forecasting weather extremes? A comparison of IFS HRES, Pangu-Weather, and GraphCast. Geoscientific Model Development, 17 (21), 7915–7962, doi:10.5194/gmd-17-7915-2024, https://gmd.copernicus.org/articles/17/7915/2024/

[33] Palmer, T. N., A. Döring, and G. Seregin, 2014: The real butterfly effect. Nonlinearity, 27 (9), R123, doi:10.1088/0951-7715/27/9/R123, https://dx.doi.org/10.1088/0951-7715/27/9/R123

[34] Pathak, J., and Coauthors, 2022: FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators. arXiv, https://arxiv.org/abs/2202.11214, arXiv:2202.11214 [physics]

[35] Peyré, G., 2025a: The Mathematics of Artificial Intelligence. arXiv, https://arxiv.org/abs/2501.10465, arXiv:2501.10465 [math], doi:10.48550/arXiv.2501.10465

[36] Peyré, G., 2025b: Optimal and Diffusion Transports in Machine Learning. arXiv, https://arxiv.org/abs/2512.06797, arXiv:2512.06797 [math], doi:10.48550/arXiv.2512.06797

[37] Price, I., and Coauthors, 2024: GenCast: Diffusion-based ensemble forecasting for medium-range weather. arXiv, https://arxiv.org/abs/2312.15796, arXiv:2312.15796 [physics], version 2, doi:10.48550/arXiv.2312.15796

[38] Rai, D., Y. Zhou, S. Feng, A. Saparov, and Z. Yao, 2025: A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models. arXiv, https://arxiv.org/abs/2407.02646, arXiv:2407.02646 [cs], doi:10.48550/arXiv.2407.02646

[39] Rigollet, P., 2026: The Mean-Field Dynamics of Transformers. arXiv, https://arxiv.org/abs/2512.01868, arXiv:2512.01868 [cs], doi:10.48550/arXiv.2512.01868

[40] Sander, M. E., P. Ablin, M. Blondel, and G. Peyré, 2022: Sinkformers: Transformers with Doubly Stochastic Attention. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR, 3515–3530, https://proceedings.mlr.press/v151/sander22a.html

[41] Santambrogio, F., 2016: {Euclidean, Metric, and Wasserstein} Gradient Flows: an overview. arXiv, https://arxiv.org/abs/1609.03890, arXiv:1609.03890 [math], doi:10.48550/arXiv.1609.03890

[42] Selz, T., W. Bruinsma, G. C. Craig, S. Markou, R. Turner, and A. Vaughan, 2025: On the effective resolution of AI weather prediction models. doi:10.22541/essoar.174139239.94807670/v1, https://www.authorea.com/users/645836/articles/1274105-on-the-effective-resolution-of-ai-weather-prediction-models

[43] Selz, T., and G. C. Craig, 2023: Can Artificial Intelligence-Based Weather Prediction Models Simulate the Butterfly Effect? Geophysical Research Letters, 50 (20), e2023GL105747, doi:10.1029/2023GL105747, https://onlinelibrary.wiley.com/doi/abs/10.1029/2023GL105747

[44] Shepherd, T. G., and Coauthors, 2018: Storylines: an alternative approach to representing uncertainty in physical aspects of climate change. Climatic Change, 151 (3), 555–571, doi:10.1007/s10584-018-2317-9, https://doi.org/10.1007/s10584-018-2317-9

[45] Smith, S. L., and Q. V. Le, 2018: A Bayesian Perspective on Generalization and Stochastic Gradient Descent. arXiv, https://arxiv.org/abs/1710.06451, arXiv:1710.06451 [cs], doi:10.48550/arXiv.1710.06451

[46] Sun, Y. Q., P. Hassanzadeh, M. Zand, A. Chattopadhyay, J. Weare, and D. S. Abbot, 2025: Can AI weather models predict out-of-distribution gray swan tropical cyclones? Proceedings of the National Academy of Sciences, 122 (21), e2420914122, doi:10.1073/pnas.2420914122, https://www.pnas.org/doi/10.1073/pnas.2420914122

[47] Tempest, K. I., M. Beylich, and G. C. Craig, 2026: Mechanistic Interpretability Tool for AI Weather Models. arXiv, https://arxiv.org/abs/2604.20467, arXiv:2604.20467 [physics], doi:10.48550/arXiv.2604.20467

[48] Vuckovic, J., A. Baratin, and R. T. d. Combes, 2020: A Mathematical Theory of Attention. arXiv, https://arxiv.org/abs/2007.02876, arXiv:2007.02876 [stat.ML], doi:10.48550/arXiv.2007.02876

[49] Álvarez López, A., B. Geshkovski, and D. Ruiz-Balet, 2026: Perceptrons and localization of attention's mean-field landscape. arXiv, https://arxiv.org/abs/2601.21366, arXiv:2601.21366 [cs], doi:10.48550/arXiv.2601.21366