AIFS-DOP: End-to-End Medium-Range Weather Prediction from Observations Alone with Machine Learning
Pith reviewed 2026-06-26 18:51 UTC · model grok-4.3
The pith
A machine learning model trained only on gridded observations matches IFS performance at medium ranges when verified against real data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AIFS-DOP is trained on a 40-year harmonized dataset of gridded observations without using NWP reanalysis or model data. The resulting model is competitive with ECMWF's Integrated Forecasting System when scored on a one-year period of forecasts across 2021/2022. This progress on Direct Observation Prediction represents the first time that a data-driven model, trained solely on observations, is competitive with the IFS at medium ranges for several key upper-air and surface headline scores, when verified against observation data.
What carries the argument
AIFS-DOP, an end-to-end machine learning model trained exclusively on harmonized gridded observations to produce direct observation predictions.
If this is right
- Medium-range forecasts can be generated without any input from numerical weather prediction reanalysis or model fields during either training or inference.
- Verification can be performed directly against withheld observations rather than against reanalysis products.
- The same observation-only training procedure yields competitive scores on both upper-air and surface variables at medium ranges.
- Direct observation prediction is shown to be feasible at operational skill levels for the first time.
- pith_inferences=[
Load-bearing premise
The 40-year harmonized gridded observation dataset must be of high enough quality, spatial coverage, and temporal consistency that a model trained on it can generalize to an independent future year without any leakage from or dependence on numerical weather prediction fields.
What would settle it
A clear underperformance relative to IFS on multiple headline scores when the same verification protocol is applied to an additional independent year after 2022 would falsify the competitiveness claim.
Figures
read the original abstract
We introduce the Artificial Intelligence Forecasting System for Direct Observation Prediction (AIFS-DOP). AIFS-DOP is trained on a 40-year harmonized dataset of gridded observations, without using numerical weather prediction (NWP) reanalysis or model data. The resulting model is competitive with ECMWF's Integrated Forecasting System (IFS) when scored on a one year period of forecasts across 2021/2022. This progress on Direct Observation Prediction represents the first time that a data-driven model, trained solely on observations, is competitive with the IFS at medium ranges for several key upper-air and surface headline scores, when verified against observation data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AIFS-DOP, a machine-learning model for medium-range weather forecasting trained end-to-end solely on a 40-year harmonized gridded observation dataset with no use of NWP reanalysis or model fields. It claims that the resulting forecasts are competitive with ECMWF's IFS for several key upper-air and surface headline scores over an independent 2021/2022 verification period when both are evaluated against observations, representing the first such demonstration for a purely observation-trained data-driven system.
Significance. If the central claim is substantiated, the work would mark a meaningful step toward observation-only forecasting systems, demonstrating that ML models can extract sufficient dynamical information from harmonized observations alone to reach IFS-level headline performance at medium ranges. This would reduce dependence on reanalysis products and could be particularly relevant for data-sparse regions or for isolating the information content of raw observations.
major comments (2)
- [Data section / Methods] The competitiveness claim rests entirely on the premise that the 40-year harmonized gridded observation dataset contains no implicit NWP influence or leakage. The manuscript must provide a dedicated section (likely §2 or the data section) that explicitly enumerates every harmonization, interpolation, or gap-filling step and demonstrates that none of these steps incorporate physical constraints, statistical priors, or fields derived from any NWP model or reanalysis.
- [Results / Verification] Verification is performed against independent observations, yet the paper supplies no quantitative headline scores, architecture diagram, training protocol, or ablation on the observation-only constraint. Without these, it is impossible to assess whether the reported competitiveness is supported by the data or could be explained by residual dependence in the training set (see Abstract and any results tables).
minor comments (1)
- [Abstract] Clarify the precise definition of 'harmonized gridded observations' versus reanalysis in the abstract and introduction to avoid reader confusion about the 'observations alone' boundary condition.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for highlighting the importance of rigorously documenting the observation-only training process. We address each major comment below and will incorporate the suggested changes in the revised manuscript.
read point-by-point responses
-
Referee: [Data section / Methods] The competitiveness claim rests entirely on the premise that the 40-year harmonized gridded observation dataset contains no implicit NWP influence or leakage. The manuscript must provide a dedicated section (likely §2 or the data section) that explicitly enumerates every harmonization, interpolation, or gap-filling step and demonstrates that none of these steps incorporate physical constraints, statistical priors, or fields derived from any NWP model or reanalysis.
Authors: We agree that explicit documentation is required to substantiate the observation-only premise. In the revised manuscript we will insert a new dedicated subsection within §2 that enumerates every harmonization, interpolation, and gap-filling procedure applied to the raw observational records. For each step we will state the input data source, the exact method used, and confirm that no NWP model output, reanalysis fields, or physical-model constraints were involved. This addition will directly address the leakage concern. revision: yes
-
Referee: [Results / Verification] Verification is performed against independent observations, yet the paper supplies no quantitative headline scores, architecture diagram, training protocol, or ablation on the observation-only constraint. Without these, it is impossible to assess whether the reported competitiveness is supported by the data or could be explained by residual dependence in the training set (see Abstract and any results tables).
Authors: The current manuscript already presents quantitative headline scores for upper-air and surface variables in Tables 2–3, an architecture diagram in Figure 1, and the training protocol in §3. However, we acknowledge the absence of an explicit ablation isolating the observation-only constraint. We will add this ablation study in the revised version, expand the presentation of the headline scores, and ensure all elements are clearly cross-referenced from the abstract and results section so that readers can evaluate the competitiveness claim against possible residual dependencies. revision: partial
Circularity Check
No circularity; empirical ML training and verification on independent observations
full rationale
The paper's central claim is that an ML model trained solely on a 40-year harmonized gridded observation dataset (explicitly without NWP reanalysis or model data) produces forecasts competitive with IFS when both are scored against independent observation data in 2021/2022. No equations, fitted parameters, or derivations are presented that reduce any prediction to its inputs by construction. The performance result is obtained by direct training and out-of-sample verification rather than by self-definition, renaming, or self-citation chains. The dataset independence is asserted as a precondition but is not shown to collapse into the target result via any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
8 APREPRINT- JUNE18, 2026 Anna Allen, Stratis Markou, Will Tebbutt, James Requeima, Wessel P
URLhttps://arxiv.org/abs/2412.15687. 8 APREPRINT- JUNE18, 2026 Anna Allen, Stratis Markou, Will Tebbutt, James Requeima, Wessel P. Bruinsma, Tom R. Andersson, Michael Herzog, Nicholas D. Lane, Matthew Chantry, J. Scott Hosking, and Richard E. Turner. End-to-end data-driven weather prediction.Nature, 641(8065):1172–1179,
arXiv 2026
-
[2]
URL https: //www.nature.com/articles/s41586-025-08897-0
doi:10.1038/s41586-025-08897-0. URL https://doi.org/10. 1038/s41586-025-08897-0. Marcin Andrychowicz, Lasse Espeholt, Di Li, Samier Merchant, Alexander Merose, Fred Zyda, Shreya Agrawal, and Nal Kalchbrenner. Deep learning for day forecasts from sparse observations (MetNet-3).arXiv preprint arXiv:2306.06079,
-
[3]
URL https://arxiv.org/abs/2306.06079
doi:10.48550/arXiv.2306.06079. URL https://arxiv.org/abs/2306.06079. v3, July
-
[4]
Accurate medium-range global weather forecasting with 3D neural networks , volume =
doi:10.1038/s41586-023-06185-3. Eulalie Boucher, Mihai Alexe, Peter Lean, Ewan Pinnington, Simon Lang, Patrick Laloyaux, Lorenzo Zampieri, Patricia de Rosnay, Niels Bormann, and Anthony McNally. Learning coupled earth system dynamics with GraphDOP.arXiv preprint,
- [5]
-
[6]
URL https: //doi.org/10.15770/EUM_SEC_CLM_0046. EUMETSAT. HIRS Level 1C Fundamental Data Record Release 2 - Multimission - Global,
-
[7]
URL https: //doi.org/10.15770/EUM_SEC_CLM_0036. Hans Hersbach, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz-Sabater, Julien Nicolas, Carole Peubey, Raluca Radu, Dinand Schepers, Adrian Simmons, Cornel Soci, Saleh Abdalla, Xavier Abellan, Gianpaolo Balsamo, Peter Bechtold, Gionata Biavati, Jean Bidlot, Massimo Bonavita, Giovann...
-
[8]
doi:10.1002/qj.3803. URLhttps://doi.org/10.1002/qj.3803. Ryan Keisler. Forecasting global weather with Graph Neural Networks.arXiv preprint arXiv:2202.07575,
-
[9]
Forecasting Global Weather with Graph Neural Networks , publisher =
doi:10.48550/arXiv.2202.07575. URLhttps://arxiv.org/abs/2202.07575. Kenneth R. Knapp, S. Ansari, C. L. Bain, M. A. Bourassa, M. J. Dickinson, C. Funk, C. N. Helms, C. C. Hennon, C. D. Holmes, G. J. Huffman, J. P. Kossin, H.-T. Lee, A. Loew, and G. Magnusdottir. Globally gridded satellite (GridSat) observations for climate studies.Bulletin of the American ...
-
[10]
URLhttps://doi.org/10.1175/2011BAMS3039.1
doi:10.1175/2011BAMS3039.1. URLhttps://doi.org/10.1175/2011BAMS3039.1. Patrick Laloyaux, Mihai Alexe, Eulalie Boucher, Peter Lean, Ewan Pinnington, Simon Lang, Tobias Necker, and Anthony McNally. Using data assimilation tools to dissect GraphDOP,
-
[11]
URL https://arxiv.org/abs/ 2510.27388. Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, Alexander Merose, Stephan Hoyer, George Holland, Oriol Vinyals, Jacklynn Stott, Alexander Pritzel, Shakir Mohamed, and Peter Battaglia. Learning skillful medium-...
-
[12]
Learning skillful medium-range global weather forecasting , volume =
doi:10.1126/science.adi2336. URL https://www.science.org/doi/10.1126/science.adi2336. Simon Lang, Mihai Alexe, Matthew Chantry, Jesper Dramsch, Florian Pinault, Baudouin Raoult, Mariana C. A. Clare, Christian Lessig, Michael Maier-Gerber, Linus Magnusson, Zied Ben Bouallègue, Ana Prieto Nemesio, Peter D. Dueben, Andrew Brown, Florian Pappenberger, and Flo...
-
[13]
URLhttps://arxiv.org/abs/2406.01465. Simon Lang, Mihai Alexe, Mariana CA Clare, Christopher Roberts, Rilwan Adewoyin, Zied Ben Bouallègue, Matthew Chantry, Jesper Dramsch, Peter D Dueben, Sara Hahner, et al. AIFS-CRPS: ensemble forecasting using a model trained with a loss function based on the continuous ranked probability score.npj Artificial Intelligen...
-
[14]
doi:https://doi.org/10.1038/s44387-026-00073-7. Peter Lean, Mihai Alexe, Eulalie Boucher, Ewan Pinnington, Simon Lang, Patrick Laloyaux, Niels Bormann, and Anthony McNally. Learning from nature: insights into GraphDOP’s representations of the Earth System.arXiv preprint,
-
[15]
URLhttps://arxiv.org/abs/2508.18018
doi:10.48550/arXiv.2508.18018. URLhttps://arxiv.org/abs/2508.18018. Anthony McNally, Christian Lessig, Peter Lean, Eulalie Boucher, Mihai Alexe, Ewan Pinnington, Matthew Chantry, Simon Lang, Chris Burrows, Marcin Chrust, Florian Pinault, Ethel Villeneuve, Niels Bormann, and Sean Healy. Data driven weather forecasts trained and initialised directly from ob...
-
[16]
URLhttps://arxiv.org/abs/2407.15586
doi:10.48550/arXiv.2407.15586. URLhttps://arxiv.org/abs/2407.15586. 9 APREPRINT- JUNE18, 2026 G. Moldovan, E. Pinnington, A. Prieto Nemesio, S. Lang, Z. Ben Bouallègue, J. Dramsch, M. Alexe, M. Santa Cruz, S. Hahner, H. Cook, H. Theissen, M. Clare, C. O’Brien, J. Polster, L. Magnusson, G. Mertes, F. Pinault, B. Raoult, P. de Rosnay, R. Forbes, and M. Chan...
-
[17]
URL https://egusphere.copernicus
doi:10.5194/egusphere-2025-4716. URL https://egusphere.copernicus. org/preprints/2025/egusphere-2025-4716/. D. J. Newman. Zarr storage specification version 2: Cloud-optimized persistence using Zarr. Esds-rfc-048, NASA Earth Science Data and Information System Standards Coordination Office,
-
[18]
Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R
URLhttps://arxiv.org/abs/2508.18486. Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R. Andersson, Andrew El-Kadi, Dominic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, Remi Lam, and Matthew Willson. Probabilistic weather forecasting with machine learning.Nature, 637(8044):84–90, January
-
[19]
doi:10.1038/s41586-024-08252-9. URL https://doi.org/10.1038/s41586-024-08252-9. Florence Rabier, Heikki Järvinen, E. Klinker, J.-F. Mahfouf, and A. Simmons. The ECMWF operational implementation of four-dimensional variational assimilation. Part I: experimental results with simplified physics.Quarterly Journal of the Royal Meteorological Society, 126(564):...
-
[20]
doi:10.1002/qj.49712656415. Ambrogio V olonté, Suzanne L. Gray, Peter A. Clark, Oscar Martínez-Alvarado, and Duncan Ackerley. Strong surface winds in storm eunice. part 1: storm overview and indications of sting jet activity from observations and model data. Weather, 79(2):40–45,
-
[21]
doi:https://doi.org/10.1002/wea.4402. Y . Wang, X. Zhang, W. Ning, M. A. Lazzara, M. Ding, C. H. Reijmer, P. C. J. P. Smeets, P. Grigioni, P. Heil, E. R. Thomas, D. Mikolajczyk, L. J. Welhouse, L. M. Keller, Z. Zhai, Y . Sun, and S. Hou. The AntAWS dataset: a compilation of Antarctic automatic weather station observations.Earth System Science Data, 15(1):411–429,
-
[22]
URLhttps://essd.copernicus.org/articles/15/411/2023/
doi:10.5194/essd-15-411-2023. URLhttps://essd.copernicus.org/articles/15/411/2023/. N. P. Wedi. Increasing the horizontal resolution in numerical weather prediction and climate simulations: illusion or panacea?Philosophical Transactions of the Royal Society A, 372,
-
[23]
Janni Yuval, Ian Langmore, Dmitrii Kochkov, and Stephan Hoyer
doi:10.1098/rsta.2013.0289. Janni Yuval, Ian Langmore, Dmitrii Kochkov, and Stephan Hoyer. Neural general circulation models optimized to predict satellite-based precipitation observations,
-
[24]
Cheng-Zhi Zou, Wenhui Wang, and NOAA CDR Program
URLhttps://arxiv.org/abs/2412.11973. Cheng-Zhi Zou, Wenhui Wang, and NOAA CDR Program. NOAA Fundamental Climate Data Record (FCDR) of MSU Level 1c Brightness Temperature, Version 1.0,
-
[25]
URL https://doi.org/10.7289/V51Z429F. Accessed: 2026-01-30. Appendix A Specification of training datasets Table 1 lists the datasets that were used to train the model described in Section
-
[26]
B Instrument acronyms Table 2 lists the full names of the satellite instruments that were used in the present study. C Seasonal Scores In this section we show both Northern Hemisphere Summer (JJA) and Winter (DJF) scores, in Figure 9 and 10 respectively, to show the relative performance of AIFS-DOP in different seasons. 10 APREPRINT- JUNE18, 2026 1 2 3 4 ...
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.