pith. machine review for the scientific record

arxiv: 2605.01126 · v1 · submitted 2026-05-01 · 💻 cs.LG


Extreme Weather Bench: A framework and benchmark for evaluation of high-impact weather


Pith reviewed 2026-05-09 19:17 UTC · model grok-4.3

classification 💻 cs.LG
keywords extreme weather · benchmark suite · AI weather models · model verification · high-impact weather · numerical weather prediction · case studies · impact metrics

The pith

Extreme Weather Bench supplies standardized case studies, data, and impact metrics to evaluate AI and NWP models on high-impact weather.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Extreme Weather Bench as an open-source benchmark suite designed to standardize evaluation of weather models on events that matter to the public. It supplies a fixed collection of case studies spanning different scales and weather types, along with observational data, impact-focused metrics, and code for running the tests. Current AI weather model assessments often rely on global averages or researcher-chosen examples, which makes fair comparisons difficult. A shared benchmark lets developers verify models against the same high-impact phenomena and track progress toward better real-world performance. The suite is intended to grow through community input and apply to both AI and traditional numerical weather prediction systems.

Core claim

Extreme Weather Bench is a community-driven framework that supplies a standard set of high-impact weather case studies across multiple spatial and temporal scales, paired with observational data, impact-based metrics, and open-source evaluation code. This setup allows direct verification of AI and NWP models on phenomena that affect people, enabling consistent comparisons across models and focusing development on events that carry real consequences rather than on global-scale statistics alone.

What carries the argument

The Extreme Weather Bench suite itself, which organizes case studies, data, metrics, and code into a reusable evaluation system.
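The abstract describes the suite only at the level of case studies, data, metrics, and code, so the following is a minimal sketch of how an evaluation system of that shape might be organized. Every name here (`CaseStudy`, `hit_rate`, the toy heat-wave numbers) is hypothetical, not EWB's actual schema or API.

```python
from dataclasses import dataclass

@dataclass
class CaseStudy:
    """One benchmark event: what to evaluate and against which impact threshold.

    Field names are illustrative assumptions, not EWB's actual interface.
    """
    name: str
    variable: str          # e.g. 2 m temperature
    threshold: float       # value defining the "high-impact" exceedance
    obs: list[float]       # observed values over the event window
    fcst: list[float]      # model forecast at the same times/points

def mean_abs_error(case: CaseStudy) -> float:
    """Traditional accuracy score over the event window."""
    return sum(abs(f - o) for f, o in zip(case.fcst, case.obs)) / len(case.obs)

def hit_rate(case: CaseStudy) -> float:
    """Impact-oriented score: fraction of observed exceedances the model also forecast."""
    hits = sum(1 for f, o in zip(case.fcst, case.obs)
               if o >= case.threshold and f >= case.threshold)
    events = sum(1 for o in case.obs if o >= case.threshold)
    return hits / events if events else float("nan")

# A toy heat-wave case: observations cross a 40 °C threshold on days 2-4.
heatwave = CaseStudy(
    name="toy-heatwave", variable="t2m", threshold=40.0,
    obs=[38.0, 41.0, 42.5, 41.5, 37.0],
    fcst=[37.5, 40.5, 41.0, 39.0, 36.5],
)

print(round(mean_abs_error(heatwave), 2))  # average error in °C
print(round(hit_rate(heatwave), 2))        # share of threshold exceedances caught
```

The point of the sketch is the separation of concerns the paper claims as its contribution: the case study fixes the event, the data, and the threshold, so any model scored through the same functions is directly comparable.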

If this is right

  • Models can be tested and ranked on identical high-impact events instead of researcher-selected examples.
  • Direct head-to-head comparisons become possible between AI weather models and traditional NWP systems.
  • Evaluation shifts from global error statistics toward metrics that reflect public impacts such as damage or disruption.
  • Open code and data lower the barrier for new groups to participate in model verification.
  • Ongoing community additions will expand coverage to more phenomena and regions over time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adoption of EWB could create a shared reference point similar to standard datasets in other AI domains, making it easier to measure genuine progress in weather prediction.
  • The benchmark may surface systematic weaknesses in current models for specific hazard types that global metrics currently obscure.
  • Integration with real-time forecast systems could turn the benchmark into a continuous monitoring tool rather than a one-time test set.
  • If the metrics prove predictive of operational value, they could influence how funding agencies and operational centers prioritize model development.

Load-bearing premise

The selected case studies and impact-based metrics represent the full range of high-impact weather events around the world and will produce model improvements that transfer to operational forecasting.

What would settle it

A set of models that perform well on the EWB cases but show no corresponding gains in real-world forecasts of high-impact events outside those cases, or a finding that the chosen cases miss major types of hazards experienced globally.

read the original abstract

Forecasting the wide variety of high-impact weather events experienced globally is a challenge for both Artificial Intelligence (AI) and Numerical Weather Prediction (NWP) models and it is critical that such models be properly verified before deployment. Although AI weather models are rapidly evolving, much of their evaluation is currently done either with a global-scale evaluation or by hand-picking a small number of case studies or a region. A widely-used open-source benchmark suite focusing on high-impact weather will help to drive the science forward for all scales of weather models, as it has for other AI fields. Here we introduce Extreme Weather Bench (EWB), a new community-driven benchmark suite that facilitates model validation and verification on a variety of high-impact hazards that matter to people around the globe. EWB provides a standard set of case studies (spanning across multiple spatial and temporal scales and different parts of the weather spectrum), observational data, impact-based metrics, and open-source code for users to evaluate their models. Verifying that a model works against a standard set of case studies, especially events that are high-impact for the general public, is a key piece of improving the trustworthiness of AI models. EWB will help to drive the science forward for all weather models, enabling true comparisons across models and evaluating models on specific high-impact phenomena through the use of case studies. EWB is a free open-source community-driven system and will continue to evolve to include additional phenomena, test cases and metrics in collaboration with the worldwide weather and forecast verification community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces Extreme Weather Bench (EWB), a community-driven open-source benchmark suite for evaluating AI and NWP models on high-impact weather. It supplies a standardized collection of case studies spanning multiple spatial and temporal scales, associated observational data, impact-based metrics, and code to support verification and enable comparisons across models.

Significance. If widely adopted, EWB could establish a common reference for assessing model performance on societally relevant hazards, addressing the current reliance on global aggregates or ad-hoc case selection. The extensible, community-oriented design and emphasis on impact-based evaluation are strengths that align with successful benchmark efforts in other AI domains.

major comments (1)
  1. Abstract: the positioning of EWB as critical for 'improving the trustworthiness of AI models' and 'driving the science forward' rests on the untested assumption that the chosen cases and impact-based metrics are representative and will produce meaningful improvements; the manuscript provides no quantitative validation results, example model evaluations, or comparisons to demonstrate this utility.
minor comments (2)
  1. Abstract: the term 'different parts of the weather spectrum' is introduced without definition or elaboration, which reduces clarity for readers outside the immediate domain.
  2. The manuscript should include explicit links or DOIs to the open-source code repository, data sources, and case-study files in the main text (rather than only in supplementary material) to support immediate reproducibility.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and recommendation of minor revision. We address the single major comment below.

read point-by-point responses
  1. Referee: Abstract: the positioning of EWB as critical for 'improving the trustworthiness of AI models' and 'driving the science forward' rests on the untested assumption that the chosen cases and impact-based metrics are representative and will produce meaningful improvements; the manuscript provides no quantitative validation results, example model evaluations, or comparisons to demonstrate this utility.

    Authors: We agree that the abstract makes forward-looking claims without accompanying quantitative demonstrations or model comparisons in the current manuscript. The paper's core contribution is the introduction of the standardized benchmark (case studies, data, metrics, and code) rather than its application to specific models. To address this directly, we will revise the abstract to moderate the language and add a short section with example evaluations of baseline AI and NWP models on a subset of cases, thereby illustrating the benchmark's intended use. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper introduces Extreme Weather Bench (EWB) as a community-driven benchmark suite supplying case studies, observational data, impact-based metrics, and open-source code for model evaluation on high-impact weather. No derivation chain, equations, fitted parameters, or predictive claims exist that could reduce to inputs by construction. The contribution is the factual assembly and release of these artifacts, with representativeness framed as an assumption about downstream utility rather than a precondition or self-referential step. No self-citations are load-bearing for any central claim, and the work is self-contained as a framework release without internal reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on domain assumptions about what constitutes high-impact weather and how to measure model skill on it; no free parameters or invented physical entities are introduced.

axioms (2)
  • domain assumption A fixed set of case studies can serve as a representative and stable benchmark for global high-impact weather evaluation.
    Invoked when stating that the provided cases span scales and phenomena sufficiently for community use.
  • domain assumption Impact-based metrics provide a more relevant assessment of model performance than traditional verification scores alone.
    Stated as a key feature of the benchmark design.
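The second axiom contrasts impact-based metrics with traditional verification scores. As one concrete illustration, the classical 2×2 contingency-table scores from the forecast-verification literature reward catching threshold exceedances rather than minimizing average error; whether EWB adopts exactly these scores is an assumption of this sketch, not a claim about the paper.

```python
def contingency_scores(fcst, obs, threshold):
    """Standard 2x2 contingency-table scores for an exceedance event.

    POD, FAR, and CSI are classical verification metrics; this assumes at
    least one event was observed or forecast (no zero-division guard).
    """
    hits = misses = false_alarms = 0
    for f, o in zip(fcst, obs):
        f_yes, o_yes = f >= threshold, o >= threshold
        if f_yes and o_yes:
            hits += 1
        elif o_yes:
            misses += 1
        elif f_yes:
            false_alarms += 1
    pod = hits / (hits + misses)                 # probability of detection
    far = false_alarms / (hits + false_alarms)   # false-alarm ratio
    csi = hits / (hits + misses + false_alarms)  # critical success index
    return pod, far, csi

# Toy wind-gust forecasts (m/s) against a 25 m/s damage threshold.
fcst = [26, 24, 27, 30, 20, 26]
obs  = [27, 26, 23, 29, 19, 25]
pod, far, csi = contingency_scores(fcst, obs, threshold=25)
print(round(pod, 2), round(far, 2), round(csi, 2))
```

A model can have a small average wind-speed error and still score poorly here, which is exactly the gap between global error statistics and impact-relevant skill that the axiom presumes.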

pith-pipeline@v0.9.0 · 5598 in / 1255 out tokens · 27757 ms · 2026-05-09T19:17:29.013797+00:00 · methodology


Reference graph

Works this paper leans on

69 extracted references · 41 canonical work pages

  1. [1] Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., et al.: Learning skillful medium-range global weather forecasting. Science 382(6669), 1416–1421 (2023) https://doi.org/10.1126/science.adi2336
  2. [2] Price, I., Sanchez-Gonzalez, A., Alet, F., Andersson, T.R., El-Kadi, A., Masters, D., Ewalds, T., et al.: Probabilistic weather forecasting with machine learning. Nature 637(8044), 84–90 (2025) https://doi.org/10.1038/s41586-024-08252-9
  3. [3] Alet, F., Price, I., El-Kadi, A., Masters, D., Markou, S., Andersson, T.R., Stott, J., et al.: Skillful joint probabilistic weather forecasting from marginals. arXiv preprint arXiv:2506.10772 (2025) https://doi.org/10.48550/arXiv.2506.10772
  4. [4] Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., Tian, Q.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023) https://doi.org/10.1038/s41586-023-06185-3
  5. [5] Rasp, S., Dueben, P.D., Scher, S., Weyn, J.A., Mouatadid, S., Thuerey, N.: WeatherBench: A benchmark data set for data-driven weather forecasting. Journal of Advances in Modeling Earth Systems 12(11) (2020) https://doi.org/10.1029/2020ms002203
  6. [6] Rasp, S., Hoyer, S., Merose, A., Langmore, I., Battaglia, P., Russell, T., Sanchez-Gonzalez, A., et al.: WeatherBench 2: A benchmark for the next generation of data-driven global weather models. Journal of Advances in Modeling Earth Systems 16(6) (2024) https://doi.org/10.1029/2023ms004019
  7. [7] Feldmann, M., Beucler, T., Gomez, M., Martius, O.: Lightning-fast convective outlooks: Predicting severe convective environments with global AI-based weather models. Geophysical Research Letters 51(22) (2024) https://doi.org/10.1029/2024gl110960
  8. [8] Liu, C.-C., Hsu, K., Peng, M.S., Chen, D.-S., Chang, P.-L., Hsiao, L.-F., Fong, C.-T., et al.: Evaluation of five global AI models for predicting weather in Eastern Asia and Western Pacific. npj Climate and Atmospheric Science 7(1), 1–12 (2024) https://doi.org/10.1038/s41612-024-00769-0
  9. [9] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2009)
  10. [10] McGovern, A., Demuth, J., Bostrom, A., Wirz, C.D., Tissot, P.E., Cains, M.G., Musgrave, K.D.: The value of convergence research for developing trustworthy AI for weather, climate, and ocean hazards. npj Natural Hazards 1(1), 1–6 (2024) https://doi.org/10.1038/s44304-024-00014-x
  11. [11] Cains, M.G., Wirz, C.D., Demuth, J.L., Bostrom, A., Gagne, D.J., McGovern, A., Sobash, R.A., Madlambayan, D.: Exploring NWS forecasters' assessment of AI guidance trustworthiness. Weather and Forecasting 39, 1219–1241 (2024) https://doi.org/10.1175/WAF-D-23-0180.1
  12. [12] Wirz, C.D., Demuth, J.L., Cains, M.G., White, M., Radford, J., Bostrom, A.: National Weather Service (NWS) forecasters' perceptions of AI/ML and its use in operational forecasting. Bulletin of the American Meteorological Society 105, 2194–2215 (2024) https://doi.org/10.1175/BAMS-D-24-0044.1
  13. [13] Bostrom, A., Demuth, J.L., Wirz, C.D., Cains, M.G., Schumacher, A., Madlambayan, D., Bansal, A.S., Bearth, A., Chase, R., Crosman, K.M., Ebert-Uphoff, I., Gagne, D.J., Guikema, S., Hoffman, R., Johnson, B.B., Kumler-Bonfanti, C., Lee, J.D., Lowe, A., McGovern, A., Williams, J.K.: Trust and trustworthy artificial intelligence: A research agenda for AI ... Risk Analysis 44, 1498–1513 (2024) https://doi.org/10.1111/risa.14245
  14. [14] Murphy, A.H.: What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting 8(2), 281–293 (1993) https://doi.org/10.1175/1520-0434(1993)008⟨0281:WIAGFA⟩2.0.CO;2
  15. [15] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477), 359–378 (2007)
  16. [16] Gneiting, T.: Making and evaluating point forecasts. Journal of the American Statistical Association 106(494), 746–762 (2011)
  17. [17] Murphy, A.H.: The Finley affair: A signal event in the history of forecast verification. Weather and Forecasting 11(1), 3–20 (1996) https://doi.org/10.1175/1520-0434(1996)011⟨0003:TFAASE⟩2.0.CO;2
  18. [18] Brooks, H.E., Flora, M.L., Baldwin, M.E.: A rose by any other name: On basic scores from the 2×2 table and the plethora of names attached to them. Artificial Intelligence for the Earth Systems 3, 230104 (2024) https://doi.org/10.1175/AIES-D-23-0104.1
  19. [19] Hakim, G.J., Masanam, S.: Dynamical tests of a deep-learning weather prediction model. Artificial Intelligence for the Earth Systems (2024) https://doi.org/10.1175/AIES-D-23-0090.1
  20. [20] Demuth, J.L., Morss, R.E., Jankov, I., Alcott, T.I., Alexander, C.R., Nietfeld, D., Jensen, T.L., Novak, D.R., Benjamin, S.G.: Recommendations for developing useful and usable convection-allowing model ensemble information for NWS forecasters. Weather and Forecasting 35(4), 1381–1406 (2020) https://doi.org/10.1175/WAF-D-19-0108.1
  21. [21] Ripberger, J., Bell, A., Fox, A., Forney, A., Livingston, W., Gaddie, C., Silva, C., Jenkins-Smith, H.: Communicating probability information in weather forecasts: Findings and recommendations from a living systematic review of the research literature. Weather, Climate, and Society 14(2), 481–498 (2022) https://doi.org/10.1175/WCAS-D-21-0034.1
  22. [22] Richardson, D.S.: Skill and relative economic value of the ECMWF ensemble prediction system. Quarterly Journal of the Royal Meteorological Society 126(563), 649–667 (2000) https://doi.org/10.1002/qj.49712656313
  23. [23] Finley, J.P.: Tornado predictions. American Meteorological Journal 1, 19–24 (1884)
  24. [24] Jolliffe, I.T., Stephenson, D.B.: Forecast Verification: A Practitioner's Guide in Atmospheric Science, p. 274. Wiley-Blackwell, Oxford, U.K. (2012)
  25. [25] Olivetti, L., Messori, G.: Do data-driven models beat numerical models in forecasting weather extremes? A comparison of IFS HRES, Pangu-Weather, and GraphCast. Geoscientific Model Development 17(21), 7915–7962 (2024) https://doi.org/10.5194/gmd-17-7915-2024
  26. [26] Leeuwenburg, T., Loveday, N., Ebert, E.E., Cook, H., Khanarmuei, M., Taggart, R.J., Ramanathan, N., et al.: Scores: A Python package for verifying and evaluating models and predictions with xarray. Journal of Open Source Software 9(99), 6889 (2024) https://doi.org/10.21105/joss.06889
  27. [27] Xiang, A., Andrews, J.T.A., Bourke, R.L., Thong, W., LaChance, J.M., Georgievski, T., Modas, A., et al.: Fair human-centric image dataset for ethical AI benchmarking. Nature 648(8092), 97–108 (2025) https://doi.org/10.1038/s41586-025-09716-2
  28. [28] Kryshtafovych, A., Moult, J., Albrecht, R., Chang, G.A., Chao, K., Fraser, A., Greenfield, J., Hartmann, M.D., Herzberg, O., Josts, I., Leiman, P.G., Linden, S.B., Lupas, A.N., Nelson, D.C., Rees, S.D., Shang, X., Sokolova, M.L., Tidow, H., AlphaFold2 team: Computational models in the service of X-ray and cryo-electron microscopy structure determination.... Proteins 89(12), 1633–1646 (2021) https://doi.org/10.1002/prot.26223
  29. [29] Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A multi-task benchmark and analysis platform for natural language understanding. In: Proceedings of the International Conference on Learning Representations (ICLR), pp. 353–355 (2019) https://doi.org/10.18653/v1/W18-5446
  30. [30] Lin, N., Emanuel, K.: Grey swan tropical cyclones. Nature Climate Change 6(1), 106–111 (2016) https://doi.org/10.1038/nclimate2777
  31. [31] Nath, S., Palmer, T.: Can AI models reliably forecast extreme weather events? Nature 651(8106), 583–584 (2026) https://doi.org/10.1038/d41586-026-00842-z
  32. [32] Lerch, S., Thorarinsdottir, T.L., Ravazzolo, F., Gneiting, T.: Forecaster's dilemma: Extreme events and forecast evaluation. Statistical Science 32(1), 106–127 (2017) https://doi.org/10.1214/16-sts588
  33. [33] Magnusson, L., Ackerley, D., Bouteloup, Y., Chen, J.-H., Doyle, J., Earnshaw, P., Kwon, Y.C., et al.: Skill of medium-range forecast models using the same initial conditions. Bulletin of the American Meteorological Society 103(9), 2050–2068 (2022) https://doi.org/10.1175/bams-d-21-0234.1
  34. [34] Integrated Research on Disaster Risk: Peril classification and hazard glossary. Data Project Report 1, Integrated Research on Disaster Risk (2014). https://council.science/wp-content/uploads/2019/12/Peril-Classification-and-Hazard-Glossary-1.pdf
  35. [36] Jin, W., Weyn, J., Zhao, P., Xiang, S., Bian, J., Fang, Z., Dong, H., Sun, H., Thambiratnam, K., Zhang, Q.: WeatherReal: A benchmark based on in-situ observations for evaluating weather models. arXiv [Physics.Ao-Ph] (2024)
  36. [37] Lavers, D.A., Simmons, A., Vamborg, F., Rodwell, M.J.: An evaluation of ERA5 precipitation for climate monitoring. Quarterly Journal of the Royal Meteorological Society 148(748), 3124–3137 (2022) https://doi.org/10.1002/qj.4351
  37. [38] Buschow, S.: Tropical convection in ERA5 has partly shifted from parameterized to resolved. Quarterly Journal of the Royal Meteorological Society 150(758), 436–446 (2024) https://doi.org/10.1002/qj.4604
  38. [39] World Meteorological Organization: Guide to the Global Observing System. WMO, vol. 488. WMO, Geneva (2010)
  39. [40] Dunn, R.J.H., Willett, K.M., Parker, D.E., Mitchell, L.: GHCNh. Geoscientific Instrumentation, Methods and Data Systems Discussions (2016) https://doi.org/10.5194/gi-2016-9
  40. [41] Brimelow, J.C., Kopp, G.A., Sills, D.M.: The Northern Hail Project: A renaissance in hail research in Canada. In: ECSS2023 (2023). Copernicus Meetings
  41. [42] Sills, D.M., Kopp, G.A., Elliott, L., Jaffe, A.L., Sutherland, L., Miller, C.S., Kunkel, J.M., Hong, E., Stevenson, S., Wang, W.: The Northern Tornadoes Project: Uncovering Canada's true tornado climatology. Bulletin of the American Meteorological Society 101(12), 2113–2132 (2020) https://doi.org/10.1175/BAMS-D-20-0012.1
  42. [43] Knapp, K.R., Kruk, M.C., Levinson, D.H., Gibney, E.J.: Archive compiles new resource for global tropical cyclone research. Eos, Transactions, AGU 90(46) (2009) https://doi.org/10.1029/2009EO060002
  43. [44] Mo, R.: EDARA: An ERA5-based dataset for atmospheric river analysis. Scientific Data 11, 900 (2024) https://doi.org/10.1038/s41597-024-03679-1
  44. [45] Gilleland, E., Roux, G.: A new approach to testing forecast predictive accuracy. Meteorological Applications 22(3), 534–543 (2015) https://doi.org/10.1002/met.1485
  45. [46] Anderson, B., Bell, M.: Weather-related mortality: how heat, cold, and heat waves affect mortality in the United States. Epidemiology 20, 205–213 (2009) https://doi.org/10.1097/EDE.0b013e318190ee08
  46. [47] Lang, S., Alexe, M., Chantry, M., Dramsch, J., Pinault, F., Raoult, B., Clare, M.C.A., et al.: AIFS - ECMWF's data-driven forecasting system. arXiv [Physics.Ao-Ph] (2024)
  47. [48] Radford, J.T., Ebert-Uphoff, I., Stewart, J.Q., Musgrave, K.D., DeMaria, R., Tourville, N., Hilburn, K.: Accelerating community-wide evaluation of AI models for global weather prediction by facilitating access to model output. Bulletin of the American Meteorological Society 106(1), 68–76 (2025) https://doi.org/10.1175/bams-d-24-0057.1
  48. [49] Kurth, T., Subramanian, S., Harrington, P., Pathak, J., Mardani, M., Hall, D., Miele, A., Kashinath, K., Anandkumar, A.: FourCastNet: Accelerating global high-resolution weather forecasting using adaptive Fourier neural operators. In: Proceedings of the Platform for Advanced Scientific Computing Conference, pp. 1–11. ACM, New York, NY, USA (2023). https:/...
  49. [50] Bonev, B., Kurth, T., Hundt, C., Pathak, J., Baust, M., Kashinath, K., Anandkumar, A.: Spherical Fourier neural operators: Learning stable dynamics on the sphere. arXiv [Cs.LG] (2023)
  50. [51] Blanchonnet, H.: IFS Documentation. https://www.ecmwf.int/en/publications/ifs-documentation. Last accessed: 22 April 2026 (2022)
  51. [52] Christidis, N., Jones, G., Stott, P.: Dramatically increasing chance of extremely hot summers since the 2003 European heatwave. Nature Climate Change 5, 46–50 (2015) https://doi.org/10.1038/nclimate2468
  52. [53] Meehl, G.A., Tebaldi, C.: More intense, more frequent, and longer lasting heat waves in the 21st century. Science 305, 994–997 (2004) https://doi.org/10.1126/science.1098704
  53. [54] Callahan, C.W., Trok, J., Wilson, A.J., Gould, C.F., Heft-Neal, S., Diffenbaugh, N.S., Burke, M.: Increasing risk of mass human heat mortality if historical weather patterns recur. Nature Climate Change 16(1), 26–32 (2026) https://doi.org/10.1038/s41558-025-02480-1
  54. [55] Henderson, S.B., McLean, K.E., Lee, M.J., Kosatsky, T.: Analysis of community deaths during the catastrophic 2021 heat dome: Early evidence to inform the public health response during subsequent events in greater Vancouver, Canada. Environmental Epidemiology 6(1), 189 (2022) https://doi.org/10.1097/EE9.0000000000000189
  55. [56] White, E., Hill, A.J.: Severe weather forecasts from artificial intelligence weather prediction models. Artificial Intelligence for the Earth Systems 5(1) (2026) https://doi.org/10.1175/aies-d-25-0065.1
  56. [57] Hua, Z., Sobash, R.A., Gagne, D.J., Sha, Y., Anderson-Frey, A.: Improving medium-range severe weather prediction through transformer postprocessing of AI weather forecasts. Artificial Intelligence for the Earth Systems 5(1) (2026) https://doi.org/10.1175/aies-d-25-0045.1
  57. [58] Gensini, V.A., Haberlie, A.M., Marsh, P.T.: Practically perfect hindcasts of severe convective storms. Bulletin of the American Meteorological Society (2020) https://doi.org/10.1175/BAMS-D-19-0321.1
  58. [59] Craven, J.P., Brooks, H.E.: Baseline climatology of sounding derived parameters associated with deep moist convection. National Weather Digest 28, 13–24 (2004)
  59. [60] Dettinger, M.D., Ralph, F.M., Das, T., Neiman, P.J., Cayan, D.: Atmospheric rivers, floods, and the water resources of California. Water 3, 455–478 (2011) https://doi.org/10.3390/w3020445
  60. [61] Guan, B., Waliser, D.E.: A regionally refined quarter-degree global atmospheric rivers database based on ERA5. Scientific Data 11(1), 440 (2024) https://doi.org/10.1038/s41597-024-03258-4
  61. [62] Newell, R.E., Newell, N.E., Zhu, Y., Scott, C.: Tropospheric rivers?—A pilot study. Geophysical Research Letters 19, 2401–2404 (1992) https://doi.org/10.1029/92GL02916
  62. [63] Habeeb, D., Vargo, J., Stone, B.: Rising heat wave trends in large US cities. Natural Hazards 76(3), 1651–1665 (2015)
  63. [64] United States Environmental Protection Agency: Technical documentation: Heat waves. Technical report (2021). https://www.epa.gov/sites/default/files/2021-04/documents/heat-waves td.pdf
  64. [65] Allen, J.T., Allen, E.R., Richter, H., Lepore, C.: Australian tornadoes in 2013: Implications for climatology and forecasting. Monthly Weather Review 149(5), 1211–1232 (2021) https://doi.org/10.1175/mwr-d-20-0248.1
  65. [66] May, R.M., Goebbert, K.H., Thielen, J.E., Leeman, J.R., Camron, M.D., Bruick, Z., Bruning, E.C., Manser, R.P., Arms, S.C., Marsh, P.T.: MetPy: A meteorological Python library for data analysis and visualization. Bulletin of the American Meteorological Society 103(10), 2273–2284 (2022) https://doi.org/10.1175/BAMS-D-21-0125.1
  66. [67] Ullrich, P.A., Zarzycki, C.M., McClenny, E.E., Pinheiro, M.C., Stansfield, A.M., Reed, K.A.: TempestExtremes v2.1: a community framework for feature detection, tracking, and analysis in large datasets. Geoscientific Model Development 14, 5023–5048 (2021) https://doi.org/10.5194/gmd-14-5023-2021
  67. [68] Walsh, K.J.E., et al.: Hurricanes and climate: The U.S. CLIVAR working group on hurricanes. Bulletin of the American Meteorological Society 96, 997–1017 (2015) https://doi.org/10.1175/BAMS-D-13-00242.1
  68. [69] Walsh, K.J.E., Fiorino, M., Landsea, C.W., McInnes, K.L.: Objectively determined resolution-dependent threshold criteria for the detection of tropical cyclones in climate models and reanalyses. Journal of Climate 20, 2307–2314 (2007) https://doi.org/10.1175/JCLI4074.1
  69. [70] Trabing, B.C., Musgrave, K.D., DeMaria, M., Zachry, B.C., Brennan, M.J., Rappaport, E.N.: The development and evaluation of a tropical cyclone probabilistic landfall forecast product. Weather and Forecasting 38, 1363–1374 (2023) https://doi.org/10.1175/WAF-D-22-0199.1