The physics of AI weather models
Pith reviewed 2026-05-25 02:18 UTC · model grok-4.3
The pith
AI weather models implement a particle description of the atmosphere with movements driven by gradient flow toward a learned free energy minimum.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors propose that the AI models implement a particle description of the atmosphere, where the latent variables at each mesh point correspond to the position of a particle in the high dimensional latent space. They hypothesize that the movement of the particles follows a gradient flow in the latent space towards a minimum of a learned free energy functional. This is evidenced by similar representations across models and the observed progression from large-scale to small-scale changes with increasing layer depth.
What carries the argument
A particle description of the atmosphere in which latent variables represent particle positions and evolution follows gradient flow on a learned free energy functional.
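To make the particle picture concrete, here is a minimal sketch of a gradient flow for a handful of interacting particles. It is an illustration, not the paper's model: the quadratic confinement V, the quadratic pair interaction W, the step size, and the function names are all assumptions, and the internal-energy term of the functional is left out. Each explicit Euler step stands in for one processor layer.

```python
import numpy as np


def free_energy(x):
    """Discrete free energy: sum_i V(x_i) + 1/(2N) sum_{i,j} W(x_i, x_j).

    Assumed toy choices (not from the paper): V(x) = |x|^2 / 2 and
    W(x, x') = |x - x'|^2 / 2. Any internal-energy term is omitted.
    """
    n = x.shape[0]
    confinement = 0.5 * np.sum(x ** 2)
    diffs = x[:, None, :] - x[None, :, :]  # pairwise differences, shape (N, N, d)
    interaction = 0.5 * np.sum(diffs ** 2) / (2 * n)
    return confinement + interaction


def free_energy_grad(x):
    """Gradient of free_energy with respect to every particle position."""
    return x + (x - x.mean(axis=0, keepdims=True))


def gradient_flow(x0, n_layers=16, step=0.1):
    """Explicit Euler steps x <- x - step * grad G(x), one step per 'layer'."""
    x = x0.copy()
    energies = [free_energy(x)]
    for _ in range(n_layers):
        x = x - step * free_energy_grad(x)
        energies.append(free_energy(x))
    return x, np.array(energies)


rng = np.random.default_rng(0)
x0 = rng.normal(size=(32, 8))  # 32 mesh points, 8 latent dimensions (hypothetical sizes)
x_final, energies = gradient_flow(x0)
print(energies[0], energies[-1])
```

In this toy setting the energy decreases at every step, which is the behavior the hypothesis attributes to successive layers. Whether a trained network's layers behave this way is exactly what the paper leaves open.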
If this is right
- Different AI weather models represent the atmosphere in similar ways despite architectural differences.
- The models process information from large spatial scales in early layers to smaller scales in deeper layers.
- The architecture and training constrain the form of physical laws simulated by the models.
- Evidence from Centered Kernel Alignment supports convergence on similar representations.
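For readers unfamiliar with Centered Kernel Alignment, the sketch below implements the linear variant described by Kornblith et al. (2019). Whether the paper uses the linear or a kernel variant is not stated on this page, so treat the choice, the function name, and the synthetic activations as assumptions.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_samples, n_features)."""
    x = x - x.mean(axis=0, keepdims=True)  # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


rng = np.random.default_rng(0)
acts_a = rng.normal(size=(200, 16))               # stand-in for model A activations

# A rotated and rescaled copy of the same representation.
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
acts_b = 3.0 * acts_a @ q

acts_c = rng.normal(size=(200, 24))               # an unrelated representation

cka_same = linear_cka(acts_a, acts_b)
cka_unrelated = linear_cka(acts_a, acts_c)
print(cka_same, cka_unrelated)
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so the rotated copy scores essentially 1 while the unrelated matrix scores much lower. This invariance is what makes the measure usable across models with different latent widths.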
Where Pith is reading between the lines
- If the gradient flow interpretation holds, modifying the latent space dynamics could improve model stability or interpretability.
- This particle view might link AI weather prediction to concepts from statistical mechanics.
- Future models could be designed to explicitly minimize a free energy functional to enhance physical consistency.
Load-bearing premise
The observed change from large-scale modifications in early layers to small-scale modifications in deeper layers reflects gradient flow on a free energy landscape rather than an effect of the model architecture or training procedure.
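One way to quantify a large-to-small scale progression is to take the difference between consecutive layer states and compute its energy-weighted mean wavenumber. The sketch below does this on a one-dimensional periodic grid with synthetic layer states. The grid, the synthetic updates, and the function name are assumptions; an analysis on a spherical mesh would more plausibly use spherical harmonics.

```python
import numpy as np


def mean_wavenumber(update):
    """Energy-weighted mean wavenumber of a 1-D periodic field."""
    power = np.abs(np.fft.rfft(update)) ** 2
    k = np.arange(power.size)
    return float(np.sum(k * power) / np.sum(power))


n_points = 256
grid = 2 * np.pi * np.arange(n_points) / n_points

# Synthetic layer states: each layer adds a wave of doubling wavenumber.
layer_wavenumbers = [1, 2, 4, 8, 16]
states = [np.zeros(n_points)]
for k in layer_wavenumbers:
    states.append(states[-1] + np.sin(k * grid))

# Scale of the change made by each layer (state after minus state before).
scales = [mean_wavenumber(after - before)
          for before, after in zip(states[:-1], states[1:])]
print(scales)
```

A monotone increase in this diagnostic with depth would reproduce the reported progression. As the premise above notes, such an increase alone does not distinguish gradient flow from an architectural effect.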
What would settle it
A demonstration that altering the layer order or training to remove the large-to-small scale progression still yields equivalent forecast skill would challenge the gradient flow hypothesis.
Original abstract
Could it be that AI weather models are solving physical equations, although they may not be the equations used by conventional NWP models? We compute correlations of forecast skill and Centered Kernel Alignment, providing evidence that different AI weather models represent the atmosphere in similar ways, despite differences in architecture and capacity. We argue that the architecture and training of the AI models constrain the form of the physical laws that they might simulate. In particular, we propose that the models implement a particle description of the atmosphere, where the latent variables at each mesh point correspond to the position of a particle in the high dimensional latent space. We hypothesize that the movement of the particles follows a gradient flow in the latent space towards a minimum of a learned free energy functional. Analysis of the GraphCast and Aurora models shows that they make changes on large spatial scales in the early processor layers and move to smaller scales with increasing layer depth, consistent with the gradient flow hypothesis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper computes CKA correlations between GraphCast and Aurora to argue that AI weather models represent the atmosphere similarly despite architectural differences. It proposes that latent variables at mesh points act as particle positions in high-dimensional latent space and hypothesizes that layer-wise processing implements gradient flow toward a minimum of a learned free-energy functional, with supporting evidence from the observed progression of changes from large spatial scales in early layers to smaller scales in deeper layers.
Significance. If the gradient-flow interpretation could be made rigorous, the work would offer a physically motivated lens on why AI weather models generalize and a potential route to extracting effective equations from trained networks. The CKA similarity result is a modest but concrete observation; the particle/free-energy framing remains a hypothesis without a derived mapping from network weights to the claimed functional.
major comments (3)
- [Abstract] Abstract (final paragraph) and the hypothesis statement: the observed large-to-small scale refinement with depth is presented as consistent with gradient flow on a free-energy landscape, yet no derivation shows that this ordering is diagnostic of variational dynamics rather than a generic consequence of the multi-scale graph/transformer processors used in both models.
- [Hypothesis section] The central hypothesis equates latent variables with particle positions and layer updates with gradient steps, but the manuscript contains no explicit construction or extraction of the free-energy functional from the trained weights, nor any computation of its gradient that could be compared to the observed layer updates.
- [Analysis of GraphCast and Aurora] No statistical tests, error bars, or controls are reported for the CKA values or the scale-progression measurements; the link between these quantities and the free-energy claim therefore rests on qualitative inspection alone.
minor comments (1)
- [Proposal of particle description] Notation for the latent-space particle description is introduced without a precise mapping from mesh-point indices to the high-dimensional coordinates.
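The request for error bars and controls on the CKA values can be illustrated with a bootstrap over samples plus a shuffled-pairing control. This is one possible approach, not the procedure used in the paper or its revision: the linear CKA variant, the resampling scheme, the synthetic data, and all function names are assumptions.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between activation matrices of shape (n_samples, n_features)."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, ord="fro")
                    * np.linalg.norm(y.T @ y, ord="fro"))


def bootstrap_cka(x, y, n_boot=200, seed=0):
    """95% percentile interval from resampling paired samples with replacement."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    values = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        values[b] = linear_cka(x[idx], y[idx])
    return np.percentile(values, [2.5, 97.5])


def shuffled_cka(x, y, seed=0):
    """Control: break the sample pairing by permuting the rows of y."""
    rng = np.random.default_rng(seed)
    return linear_cka(x, y[rng.permutation(y.shape[0])])


# Synthetic stand-ins: two "models" that mix the same 4 latent factors differently.
rng = np.random.default_rng(1)
shared = rng.normal(size=(300, 4))
x = shared @ rng.normal(size=(4, 12)) + 0.1 * rng.normal(size=(300, 12))
y = shared @ rng.normal(size=(4, 20)) + 0.1 * rng.normal(size=(300, 20))

point = linear_cka(x, y)
low, high = bootstrap_cka(x, y)
null = shuffled_cka(x, y)
print(point, (low, high), null)
```

A similarity claim is more convincing when the interval sits well above the shuffled control. Comparing against untrained networks, which the authors mention as a possible extension, would be a stronger control than shuffling.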
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our manuscript. We address each of the major comments below and have made revisions to improve clarity and acknowledge limitations where appropriate.
Point-by-point responses
- Referee: [Abstract] Abstract (final paragraph) and the hypothesis statement: the observed large-to-small scale refinement with depth is presented as consistent with gradient flow on a free-energy landscape, yet no derivation shows that this ordering is diagnostic of variational dynamics rather than a generic consequence of the multi-scale graph/transformer processors used in both models.
  Authors: We agree that the observed scale progression is a characteristic of the multi-scale processing in these architectures and is not by itself a unique signature of gradient flow. In the revised manuscript, we have modified the abstract and hypothesis section to present this as suggestive consistency with the proposed gradient flow rather than diagnostic evidence. We have also added a short discussion noting that this behavior could arise from other mechanisms but is particularly aligned with variational minimization.
  Revision: yes
- Referee: [Hypothesis section] The central hypothesis equates latent variables with particle positions and layer updates with gradient steps, but the manuscript contains no explicit construction or extraction of the free-energy functional from the trained weights, nor any computation of its gradient that could be compared to the observed layer updates.
  Authors: The manuscript presents the particle and free-energy interpretation as a hypothesis inspired by the model architecture and empirical observations, without claiming a full derivation. We do not extract an explicit functional or match gradients to updates, as this would require new techniques beyond the scope of the current work. We have revised the text to more explicitly label this as a hypothesis and to highlight the need for future work on deriving the functional.
  Revision: partial
- Referee: [Analysis of GraphCast and Aurora] No statistical tests, error bars, or controls are reported for the CKA values or the scale-progression measurements; the link between these quantities and the free-energy claim therefore rests on qualitative inspection alone.
  Authors: We acknowledge the qualitative nature of the presented analysis. For the revised version, we have added error bars to the CKA similarity measures based on variability across different forecast lead times and included a note on the absence of formal statistical testing as a limitation of the current study. Additional controls, such as comparisons with untrained networks, are discussed as potential extensions.
  Revision: partial
- Providing an explicit construction of the free-energy functional from the trained network weights and verifying that layer updates correspond to its gradient steps.
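Short of constructing the functional, a weaker necessary check is available: on a Euclidean space, a smooth update field can only be the gradient of some potential if its Jacobian is symmetric. The sketch below estimates the Jacobian by finite differences and reports the relative size of its antisymmetric part. The two example fields and all names are hypothetical; applying this to a real model would mean wrapping one layer's update (layer output minus input) as the `field` function, and it treats the update as a plain Euclidean field rather than a mean-field one.

```python
import numpy as np


def jacobian_fd(field, x, eps=1e-5):
    """Central finite-difference Jacobian of a vector field at point x."""
    d = x.size
    jac = np.empty((d, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        jac[:, j] = (field(x + step) - field(x - step)) / (2 * eps)
    return jac


def asymmetry(field, x):
    """Relative norm of the antisymmetric part; near 0 for a gradient field."""
    jac = jacobian_fd(field, x)
    return np.linalg.norm(jac - jac.T) / np.linalg.norm(jac)


rng = np.random.default_rng(0)
m = rng.normal(size=(6, 6))
sym, skew = m + m.T, m - m.T


def gradient_update(x):
    # Minus the gradient of the potential 0.5 * x @ sym @ x + 0.25 * sum(x**4).
    return -(sym @ x + x ** 3)


def rotational_update(x):
    # A pure rotation field, which is not the gradient of any potential.
    return skew @ x


x = rng.normal(size=6)
score_gradient = asymmetry(gradient_update, x)
score_rotational = asymmetry(rotational_update, x)
print(score_gradient, score_rotational)
```

A layer update with a large asymmetry score could not be a Euclidean gradient step, which would count against the hypothesis. A small score would be consistent with it but would not identify the functional.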
Circularity Check
No significant circularity; hypothesis presented as interpretive proposal with consistency check
full rationale
The paper proposes a particle-gradient-flow interpretation of AI weather models as a hypothesis and reports that observed layer-wise scale progression (large scales early, smaller scales later) is consistent with it. This is an interpretive link rather than a derivation chain containing self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations. No quoted text shows the hypothesis being defined in terms of the observations or vice versa; the scale changes are treated as independent empirical findings. The claim may be under-supported or non-unique, but it does not reduce to its inputs by construction under the enumerated patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Centered Kernel Alignment and forecast-skill correlations measure similarity of physical representations across models
- ad hoc to paper: Progression from large to small spatial scales with layer depth is diagnostic of gradient flow
invented entities (2)
- particle in high-dimensional latent space: no independent evidence
- learned free energy functional: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: We hypothesize that the movement of the particles follows a gradient flow in the latent space towards a minimum of a learned free energy functional... v_s = −∇(δG/δα) ... G(α) = ∫H(α)dx + ∫V(x)α(x)dx + ½∫∫W(x,x′)α(x)α(x′)dxdx′
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: the system of mesh points as a set of interacting particles... mean-field limit... continuity equation ∂_s α̃_s + ∇·(α̃_s v_s) = 0
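For reference, the formulas quoted in the two passages above can be typeset together as follows. The first three lines restate the quoted expressions. The last two are standard consequences added here for illustration, under the assumptions that W is symmetric, that α̃_s in the continuity equation can be identified with the density α in the functional, and that boundary terms vanish. None of these assumptions is confirmed by the excerpts.

```latex
\begin{align}
  G(\alpha) &= \int H(\alpha)\,dx + \int V(x)\,\alpha(x)\,dx
             + \frac{1}{2}\iint W(x,x')\,\alpha(x)\,\alpha(x')\,dx\,dx', \\
  v_s &= -\nabla \frac{\delta G}{\delta \alpha}, \\
  \partial_s \tilde{\alpha}_s + \nabla\cdot\bigl(\tilde{\alpha}_s\, v_s\bigr) &= 0, \\
  % Added steps (assumptions: W symmetric, \tilde{\alpha}_s = \alpha_s, no boundary terms)
  \frac{\delta G}{\delta \alpha}(x) &= H'(\alpha(x)) + V(x) + \int W(x,x')\,\alpha(x')\,dx', \\
  \frac{d}{ds}\, G(\alpha_s) &= -\int \alpha_s \left|\nabla \frac{\delta G}{\delta \alpha}\right|^2 dx \;\le\; 0 .
\end{align}
```

The final line is what "towards a minimum" means in this setting: under the stated assumptions the functional cannot increase along the flow.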
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.