Global Attention with Linear Complexity for Exascale Generative Data Assimilation in Earth System Prediction
Pith reviewed 2026-05-10 08:46 UTC · model grok-4.3
The pith
STORM achieves linear-complexity global attention in a generative DA framework, scaling to 20 billion tokens and 1.6 ExaFLOPs of sustained performance on 32,768 GPUs for km-scale Earth-system modeling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On 32,768 GPUs of the Frontier supercomputer, our method achieves 63% strong scaling efficiency and 1.6 ExaFLOP sustained performance. We further scale to 20 billion spatiotemporal tokens, enabling km-scale global modeling over 177k temporal frames.
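The two headline figures reduce to simple ratios. Below is a minimal sketch. It assumes the standard fixed-problem-size definition of strong scaling efficiency and reads "1.6 ExaFLOP sustained" as 1.6 × 10^18 FLOP/s. The paper's baseline GPU count and wall-clock times are not given here, so the timings in the example are hypothetical.

```python
# Hypothetical helpers for the two headline metrics. The paper's baseline run
# for its 63% figure is not stated here, so the timings below are made up.


def strong_scaling_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Fixed total problem size: E = (t_ref * n_ref) / (t_n * n).

    E = 1.0 means multiplying the GPU count by k divides wall-clock time by k.
    """
    return (t_ref * n_ref) / (t_n * n)


def sustained_flops_per_gpu(total_flops_per_s: float, n_gpus: int) -> float:
    """Average sustained throughput per device, in FLOP/s."""
    return total_flops_per_s / n_gpus


# 1.6 ExaFLOP/s spread over 32,768 GPUs is about 48.8 TFLOP/s per GPU.
per_gpu = sustained_flops_per_gpu(1.6e18, 32_768)
print(f"{per_gpu / 1e12:.1f} TFLOP/s per GPU")  # 48.8 TFLOP/s per GPU

# Illustrative only: a run that is 8x faster on 16x the GPUs is 50% efficient.
print(strong_scaling_efficiency(t_ref=800.0, n_ref=2048, t_n=100.0, n=32_768))  # 0.5
```

Under these assumptions, the 63% figure means the 32,768-GPU run achieved 63% of the speedup that perfect scaling from the (unstated) baseline would predict.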
Load-bearing premise
The generative one-stage framework accurately samples the Bayesian posterior for Earth-system states without the conventional forecast-update cycle, and the linear-complexity attention preserves the fidelity needed for reliable assimilation at scale.
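A toy problem with a closed-form answer shows what "assimilation as Bayesian posterior sampling" means. This minimal sketch assumes a 1-D linear-Gaussian prior and observation model, and all of its numbers are hypothetical. The posterior score is the prior score plus the likelihood score, and a generic unadjusted Langevin sampler draws from it. In a learned generative DA system, the prior score would come from a trained network rather than a formula. This is not STORM's sampler, and it says nothing about whether the premise holds at Earth-system scale.

```python
import numpy as np

# Toy 1-D linear-Gaussian setup (all numbers hypothetical):
# prior x ~ N(mu0, s0^2), observation y = x + e with e ~ N(0, r^2).
mu0, s0, r, y = 0.0, 1.0, 1.0, 2.0


def posterior_score(x: np.ndarray) -> np.ndarray:
    """Bayes' rule in score form: grad log p(x|y) = grad log p(x) + grad log p(y|x)."""
    prior_score = -(x - mu0) / s0**2
    likelihood_score = (y - x) / r**2
    return prior_score + likelihood_score


def langevin_sample(n_chains: int = 10_000, n_steps: int = 2_000,
                    step: float = 0.01, seed: int = 0) -> np.ndarray:
    """Unadjusted Langevin dynamics run in parallel chains:
    x <- x + step * score(x) + sqrt(2 * step) * standard normal noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_chains)
        x = x + step * posterior_score(x) + np.sqrt(2 * step) * noise
    return x


samples = langevin_sample()

# Closed-form posterior for comparison (the Kalman-style update).
post_var = 1.0 / (1.0 / s0**2 + 1.0 / r**2)
post_mean = post_var * (mu0 / s0**2 + y / r**2)
print(samples.mean(), post_mean)  # both close to 1.0
print(samples.var(), post_var)    # both close to 0.5
```

Because the toy problem is linear-Gaussian, the sampled mean and variance can be checked against the closed form. The premise at issue is whether a learned sampler preserves that agreement for high-dimensional, non-Gaussian Earth-system states.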
Original abstract
Accurate weather and climate prediction relies on data assimilation (DA), which estimates the Earth system state by integrating observations with models. While exascale computing has significantly advanced earth simulation, scalable and accurate inference of the Earth system state remains a fundamental bottleneck, limiting uncertainty quantification and prediction of extreme events. We introduce a unified one-stage generative DA framework that reformulates assimilation as Bayesian posterior sampling, replacing the conventional forecast-update cycle with compute-dense, GPU-efficient inference. At the core is STORM, a novel spatiotemporal transformer with a global attention linear-complexity scaling algorithm that breaks the quadratic attention barrier. On 32,768 GPUs of the Frontier supercomputer, our method achieves 63% strong scaling efficiency and 1.6 ExaFLOP sustained performance. We further scale to 20 billion spatiotemporal tokens, enabling km-scale global modeling over 177k temporal frames, regimes previously unreachable, establishing a new paradigm for Earth system prediction.
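The "quadratic attention barrier" is easiest to see as arithmetic at the paper's token count. This back-of-the-envelope sketch assumes a single attention head of dimension d = 128 and 2-byte (bf16) scores, and it counts only the dominant matrix multiplies. Both d and the byte width are hypothetical; only N = 20 billion comes from the abstract.

```python
# Back-of-the-envelope cost of one attention layer at the paper's token count.
# Assumptions (hypothetical): head dimension d = 128, 2 bytes per score (bf16),
# 2 FLOPs per multiply-add, and only the dominant matrix multiplies counted.
N = 20_000_000_000   # 20 billion spatiotemporal tokens (from the abstract)
d = 128              # hypothetical head dimension

# Full softmax attention forms an N x N score matrix.
score_entries = N * N                  # 4e20 entries
score_bytes = 2 * score_entries        # 8e20 bytes if materialized
quadratic_flops = 2 * 2 * N * N * d    # Q K^T, then (scores) V

# A kernelized linear formulation instead costs O(N d^2):
# phi(K)^T V, then phi(Q) (phi(K)^T V).
linear_flops = 2 * 2 * N * d * d

print(f"score matrix: {score_bytes / 1e18:.0f} EB")  # score matrix: 800 EB
print(quadratic_flops / linear_flops)  # equals N / d
```

Memory-efficient kernels such as FlashAttention-2 [25] avoid storing the score matrix, but their FLOP count is still quadratic in N. They remove the memory term but not the compute term.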
Editorial analysis
A structured set of objections, weighed in public.
Circularity Check
No circularity: performance claims are empirical measurements, not derived by construction
Full rationale
The abstract presents the 63% strong scaling efficiency, the 1.6 ExaFLOP sustained performance, and the scaling to 20 billion spatiotemporal tokens as direct measurements from runs on 32,768 GPUs of the Frontier supercomputer. The STORM linear-complexity global attention algorithm is introduced as the novel component that enables these regimes. However, no equations, derivations, or self-citations are shown that reduce the reported numbers to fitted parameters, renamed inputs, or tautological definitions. The one-stage generative DA reformulation is a methodological choice whose fidelity is asserted as an assumption; it is not justified by reduction to the authors' own prior results. The claims therefore rest on measurements that can be checked against external benchmarks, and they exhibit none of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Global attention can be reformulated to achieve linear complexity while retaining sufficient expressivity for spatiotemporal Earth-system data.
- domain assumption One-stage generative sampling can replace the iterative forecast-update cycle without loss of assimilation accuracy.
invented entities (1)
- STORM spatiotemporal transformer (no independent evidence)
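The ledger's first axiom is that global attention can be reformulated at linear cost. STORM's own algorithm is not described on this page, so the sketch below uses a different, well-known technique as a stand-in: kernelized linear attention in the style of Katharopoulos et al. (2020). That technique replaces the softmax with a positive feature map, so the key-value summary is computed once and shared by all queries. It is not equivalent to softmax attention. The sketch only shows that "global" and "linear" are compatible. It says nothing about whether STORM retains the expressivity the axiom assumes.

```python
import numpy as np


def feature_map(x: np.ndarray) -> np.ndarray:
    """Positive feature map phi(x) = elu(x) + 1 (x + 1 for x > 0, exp(x) otherwise)."""
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """O(N d^2) global attention: every query still attends to every key.

    q, k: (N, d); v: (N, d_v). Uses associativity to compute
    phi(Q) (phi(K)^T V) instead of (phi(Q) phi(K)^T) V.
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v          # (d, d_v): one pass over all N keys
    z = phi_k.sum(axis=0)     # (d,): normalizer accumulated over keys
    return (phi_q @ kv) / (phi_q @ z)[:, None]


def quadratic_reference(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Same kernel, computed with the explicit N x N weight matrix (O(N^2 d))."""
    w = feature_map(q) @ feature_map(k).T   # (N, N)
    return (w @ v) / w.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n, d = 512, 16
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(np.allclose(linear_attention(q, k, v), quadratic_reference(q, k, v)))  # True
```

The two functions compute the same output. The linear version never forms the N × N matrix, so its cost grows with N·d² rather than N²·d.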
Reference graph
Works this paper leans on
[1] K. Zhang et al., “The E3SM-MMF case study: A case study in global cloud-resolving multi-scale modeling at exascale,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC23). ACM, 2023.
[2] W. Lin et al., “Pushing the frontier: Global cloud-resolving climate simulations at 1km resolution on the Frontier exascale system,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC25). ACM, 2025.
[3] T. Kurth et al., “FourCastNet-V2: Multi-scale global data-driven weather forecasting at 0.1 degree resolution on exascale systems,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC24). ACM, 2024.
[4] M. Taylor et al., “The simple cloud-resolving E3SM atmosphere model running on the Frontier exascale system,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023, pp. 1–11.
[5] D. Klocke et al., “Computing the full Earth system at 1km resolution,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2025, pp. 125–136.
[6] R. Lam et al., “Learning skillful medium-range global weather forecasting,” Science, vol. 382, no. 6677, pp. 1416–1424, 2023.
[7] K. Bi, L. Xie, H. Zhang, X. Chen, L. Gu, and Q. Tian, “Accurate medium-range global weather forecasting with 3D neural networks,” Nature, vol. 619, no. 7970, pp. 533–538, 2023.
[8] J. Pathak et al., “FourCastNet: A global data-driven high-resolution weather forecasting model using adaptive Fourier neural operators,” arXiv preprint arXiv:2202.11214, 2022.
[9] E. N. Lorenz, “Deterministic nonperiodic flow,” Journal of the Atmospheric Sciences, vol. 20, no. 2, pp. 130–141, 1963.
[10] R. Rotunno and C. Snyder, “A generalization of Lorenz’s model for the predictability of flows with many scales of motion,” Journal of the Atmospheric Sciences, vol. 65, no. 3, pp. 1063–1076, 2008.
[11] R. Rotunno, C. Snyder, and F. Judt, “Upscale versus ‘up-amplitude’ growth of forecast-error spectra,” Journal of the Atmospheric Sciences, vol. 80, no. 1, pp. 63–72, 2023.
[12] G. Bertasius, H. Wang, and L. Torresani, “Is space-time attention all you need for video understanding?” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021, pp. 813–824.
[13] T. N. Palmer, “The ECMWF ensemble prediction system: Looking back (more than) 25 years and projecting forward 25 years,” Quarterly Journal of the Royal Meteorological Society, vol. 145, no. S1, pp. 12–24, 2018.
[14] L. Huang et al., “DiffDA: A diffusion model for weather-scale data assimilation,” in Proceedings of the 41st International Conference on Machine Learning, ser. ICML’24, 2024.
[15] D. Hodyss and M. Morzfeld, “Using diffusion models to do data assimilation,” Monthly Weather Review, vol. 153, no. 6, pp. 1245–1262, 2025.
[16] P. Manshausen et al., “Generative data assimilation of sparse weather station observations at kilometer scales,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 2154–2163.
[17] F. Bao, Z. Zhang, and G. Zhang, “A score-based filter for nonlinear data assimilation,” Journal of Computational Physics, vol. 514, p. 113207, 2024.
[18] F. Bao, Z. Zhang, and G. Zhang, “An ensemble score filter for tracking high-dimensional nonlinear dynamical system,” Computer Methods in Applied Mechanics and Engineering, vol. 432, no. Part B, p. 117447, 2024.
[19] F. Bao, H. Chipilski, S. Liang, G. Zhang, and J. Whitaker, “Nonlinear ensemble filtering with diffusion models: application to the surface quasi-geostrophic dynamics,” Monthly Weather Review, vol. 153, no. 7, pp. 1155–1169, 2025.
[20] I. Price et al., “GenCast: Diffusion-based ensemble forecasting for medium-range weather,” Nature, 2024.
[21] Z. Liu et al., “Swin Transformer V2: Scaling up capacity and resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 12009–12019.
[22] Z. Tu et al., “MaxViT: Multi-axis vision transformer,” ECCV, 2022.
[23] X. Wang et al., “ORBIT-2: Scaling exascale vision foundation models for weather and climate downscaling,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’25. New York, NY, USA: Association for Computing Machinery, 2025, pp. 86–98.
[24] T. Karras, M. Aittala, T. Aila, and S. Laine, “Elucidating the design space of diffusion-based generative models,” in Proc. NeurIPS, 2022.
[25] T. Dao, “FlashAttention-2: Faster attention with better parallelism and work partitioning,” 2023.
[26] X. Wang et al., “ORBIT: Oak Ridge base foundation model for Earth system predictability,” ser. SC ’24, 2024.
[27] H. Hersbach et al., “The ERA5 global reanalysis,” Quarterly Journal of the Royal Meteorological Society, vol. 146, no. 730, pp. 1999–2049, 2020.
[28] D. C. Dowell et al., “The High-Resolution Rapid Refresh (HRRR): An hourly updating convection-allowing forecast model. Part I: Motivation and system description,” Weather and Forecasting, vol. 37, no. 8, pp. 1371–1395, 2022.
[29] National Weather Service, “KMZ event files for Hurricanes Laura, Delta, Michael, and Teddy,” Weather event KMZ files, 2018 and 2020, individual storm-specific KMZ files obtained from weather.gov.