arxiv: 2604.16643 · v1 · submitted 2026-04-17 · ⚛️ physics.ao-ph

Recognition: unknown

WP-MIP: An Artificial Intelligence, Hybrid and Physically Based Model Intercomparison Project for Weather Prediction

Alex Kaltenbaugh, Amy McGovern, Andre L. O. Neves, Ankur Srivastava, Barbara Casati, Beth J. Woodham, Bruno S. Guimaraes, Caio A. S. Coelho, Catherine de Burgh-Day, Chen Li, Chris Harris, Chrstian Lussana, Claude Gilbert, David S. Richardson, Debra Hudson, Duncan Ackerley, Eun-Hee Lee, Fanglin Yang, Gan Zhang, Gregor Skok, Hongyan Zhu, Inna Polichtchouk, Jan-Huey Chen, Joel Miller, John Pill, Jorge L. Garcia Franco, Kathryn Newman, Koos van der Merwe, Kyounngmi Cho, Leo Separovic, Linus Magnusson, Maheswar Pradhan, Manuel Fuentes, Marion Mittermaier, Marta Koch, Martin Koehler, Masashi Ujiie, Massimo Bonavita, Michelle Harrold, Michelle Simoes Reboita, Mikhail Tolstykh, Mohau J. Mateyisi, Molly James, Nicholas Loveday, Nurizana Amir Aziz, Paulo Y. Kubota, Radomir Zaripov, Richard Mladek, Roland Potthast, Ron McTaggart-Cowan, Rostislav Fadeev, Stephane Chamberland, Subhrajit Rath, Syed Husain, Wei Li, Weiwei Li, Zhuo Wang, Zied Ben Bouallegue, Zubiar Maalick

Pith reviewed 2026-05-10 06:37 UTC · model grok-4.3

classification ⚛️ physics.ao-ph

keywords weather prediction · model intercomparison · machine learning · hybrid models · forecast verification · global deterministic forecasts · WMO initiative


The pith

WP-MIP will build a shared database of physical, machine-learning, and hybrid weather forecasts to compare their performance and develop best practices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Weather Prediction Model Intercomparison Project (WP-MIP), a WMO-supported effort to address open questions about machine-learning models for weather prediction. It will gather forecasts from physically based models, current machine-learning systems, and hybrids that combine elements of both, with each model run from the contributing center's preferred initial conditions and from a common set. This collection will support broad, distributed evaluation to reveal where each approach succeeds or falls short in producing physically consistent predictions. The project aims to produce specialized verification techniques and concrete guidance for model developers and national weather services on improving forecast quality.

Core claim

WP-MIP creates a centralized repository of global deterministic forecasts from physically based, machine-learning, and hybrid models, contributed by institutions on six continents. By using both center-specific and common initial conditions, the project enables sensitivity studies that isolate the effects of initialization choices from model architecture. The resulting data will drive the development of AI-ready verification methods that highlight relative strengths and weaknesses, with the explicit goal of generating best-practice recommendations for operational forecasting systems.
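The dual-initialization design can be read as a two-by-two comparison. The sketch below is a hypothetical illustration, not the project's protocol: the names `Scores` and `split_skill_gap` and all RMSE values are invented, and it assumes each model has been scored under both its own and the common initial conditions.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """RMSE of one model under the two initializations (hypothetical values)."""
    own_ic: float     # run from the center's preferred initial conditions
    common_ic: float  # run from the shared initial conditions


def split_skill_gap(a: Scores, b: Scores) -> dict:
    """Split the RMSE gap between models a and b into two additive parts.

    The gap under common initial conditions is attributed to the models;
    the remainder of the native gap is attributed to initialization.
    """
    total = a.own_ic - b.own_ic
    model_part = a.common_ic - b.common_ic
    return {
        "total": total,
        "model": model_part,
        "initialization": total - model_part,
    }


# Invented numbers, for illustration only.
physical = Scores(own_ic=52.0, common_ic=55.0)
ml = Scores(own_ic=48.0, common_ic=50.0)
gap = split_skill_gap(physical, ml)
print(gap)
```

This additive split is a simplification: it ignores any interaction between a model and the initial conditions it was tuned or trained on, which a real sensitivity study would need to examine.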

What carries the argument

The centralized forecast database that stores predictions under both center-specific and common initial conditions to support sensitivity analysis and generalizable verification.

If this is right

Machine-learning model developers will receive targeted feedback on where their systems produce physically inconsistent outputs.
National weather centers will have data to evaluate whether hybrid models offer advantages in speed and accuracy for operational use.
New verification methods will emerge that are designed specifically for assessing machine-learning and hybrid forecasts.
International collaboration will produce guidance that directly influences the design of next-generation prediction systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The dual-initialization design could isolate whether differences in skill stem from model physics or from how each system handles starting conditions.
Results may identify specific forecast variables or lead times where hybrid approaches provide the clearest benefit over pure machine-learning or physical models.
The verification techniques developed here could serve as templates for similar intercomparisons in regional or climate modeling.

Load-bearing premise

Participating institutions will contribute enough forecasts under both initialization types to allow development of verification techniques that generalize across conditions and model classes.

What would settle it

If the collected forecasts fail to support verification techniques that perform reliably across different weather regimes and model types, the project cannot deliver the intended best-practice guidance.

Original abstract

Rapid progress in the field of machine-learning for weather prediction has led to the emergence of algorithms whose forecasting skill can exceed that of traditional physically based models. This development represents an opportunity to improve the quality of forecasting services provided by operational centers, particularly given the speed at which machine-learning based models generate predictions. Despite the clear promise of these systems, questions remain about the ability of the current generation of machine-learning models to generate physically consistent predictions of the full suite of required forecast fields under all conditions. Answering these questions will require careful comparisons between the well-understood physically based models, current state-of-the-art machine-learning models, and the hybrid models that combine elements of these two archetypes. The Weather Prediction Model Intercomparison Project (WP-MIP) is a World Meteorological Organization-supported initiative whose initial goal is to create a centralized database of physically based, machine-learning and hybrid model forecasts to enable a distributed assessment and evaluation effort. The first instance of WP-MIP focuses on global deterministic predictions using both center-specific and common initializations to facilitate sensitivity studies. Forecasts contributed by institutions across six continents will be used to develop AI-ready verification techniques that highlight the strengths and weaknesses of each class of prediction system, with the goal of establishing best-practice guidance to model developers and national weather centers. The broad engagement of the operational and forecast-evaluation communities in WP-MIP will ensure that the project results are highly relevant to the development and deployment of next-generation weather prediction systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a project announcement for WP-MIP that sets up a database and evaluation plan for physical, ML, and hybrid weather models but reports no data or completed comparisons.


The main thing to know is that this paper describes the launch of the Weather Prediction Model Intercomparison Project rather than delivering any new forecasts, metrics, or findings. It outlines plans for a centralized collection of global deterministic runs from traditional physics-based models, current machine-learning systems, and hybrids, with contributions from centers on six continents. The setup includes both each group's native initial conditions and a common set to support sensitivity checks on initialization versus model architecture.

That distinction is a reasonable way to isolate where ML approaches may lose physical consistency across variables and conditions. The stated aim of building AI-ready verification methods that flag strengths and weaknesses for each class of model is also a practical focus, given how standard scores can miss issues like conservation violations in learned predictions. Broad operational involvement is noted as a way to keep the eventual guidance relevant to real services. The paper does a decent job laying out these intentions in plain terms without overclaiming what has been achieved so far.

The obvious limitation is that everything rests on future participation, data volume, and whether the new verification techniques actually generalize. No sample forecasts, error statistics, or pilot results are shown, so there is no way to test the framework from the text. Potential practical hurdles such as data formats, sharing policies, or compute demands for the common-initialization runs are not addressed in any detail.

This is mainly of interest to operational weather centers and groups already working on ML components for forecasting. A reader who needs ideas for benchmarking hybrid systems could extract useful structure from the proposed design, but the document itself adds no citable results or empirical support. I would send it for peer review. The topic is timely and the basic plan is coherent; referees could help refine the evaluation approach and participation requirements before the project collects its first large dataset.
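To make the point about conservation concrete, here is a minimal sketch of one check that a pointwise score would not provide: tracking the area-weighted global mean of surface pressure, used here as a rough proxy for total atmospheric mass, across lead times. The function names, the three-row grid, and the pressure values are all assumptions made for illustration; the paper does not specify any such diagnostic.

```python
import math


def global_mean(field, lats_deg):
    """Area-weighted mean of field[lat][lon] on a regular latitude-longitude grid."""
    num = 0.0
    den = 0.0
    for row, lat in zip(field, lats_deg):
        w = math.cos(math.radians(lat))  # grid-cell area scales with cos(latitude)
        num += w * sum(row)
        den += w * len(row)
    return num / den


def mass_drift(forecast, lats_deg):
    """Change in global-mean surface pressure relative to the first lead time."""
    means = [global_mean(step, lats_deg) for step in forecast]
    return [m - means[0] for m in means]


# Toy grid: three latitude rows, two longitudes, pressures in hPa (invented).
lats = [-60.0, 0.0, 60.0]
start = [[1000.0, 1000.0] for _ in lats]
leaky = [[1001.0, 1001.0] for _ in lats]  # pressure rises everywhere
shifted = [[999.0, 999.0], [1001.0, 1001.0], [999.0, 999.0]]  # mass only moves

drift_leaky = mass_drift([start, leaky], lats)
drift_conserving = mass_drift([start, shifted], lats)
print(drift_leaky, drift_conserving)
```

Both toy forecasts differ from the starting field by 1 hPa at every grid point, so a pointwise error measure would treat them alike, yet only the first changes the global mean.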

Referee Report

1 major / 1 minor

Summary. The manuscript describes the Weather Prediction Model Intercomparison Project (WP-MIP), a World Meteorological Organization-supported initiative to create a centralized database of global deterministic forecasts from physically based numerical models, machine-learning models, and hybrid models. Forecasts will be contributed by institutions across six continents using both center-specific and common initial conditions to support sensitivity studies. The resulting data are intended to enable distributed evaluation, the development of AI-ready verification techniques that identify strengths and weaknesses of each model class, and the formulation of best-practice guidance for model developers and national weather centers.

Significance. If the project secures adequate participation and successfully produces generalizable verification methods, it would be significant for the weather prediction community. It would establish the first coordinated, multi-institutional framework for systematically comparing traditional physics-based forecasts with rapidly advancing machine-learning and hybrid systems, directly addressing questions of physical consistency across conditions. The resulting database and evaluation protocols could provide actionable guidance to operational centers on integrating machine-learning components into forecasting suites.

major comments (1)

Abstract: The central claim that WP-MIP will 'develop AI-ready verification techniques that highlight the strengths and weaknesses of each class of prediction system' and 'establish best-practice guidance' rests on the prospective availability of sufficient forecast data under common initializations and the subsequent creation of robust, generalizable metrics. No concrete details on proposed verification metrics, data standards, minimum participation thresholds, or example techniques are supplied, rendering the feasibility of these outcomes impossible to assess from the manuscript.

minor comments (1)

The manuscript would benefit from an explicit project timeline, data-access and licensing policies, and a list of institutions that have already committed to contributing forecasts.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive evaluation of WP-MIP's potential significance and for the constructive major comment. We acknowledge that the original manuscript, as a high-level project description, provided insufficient concrete details on verification metrics, data standards, participation thresholds, and example techniques, making feasibility hard to assess. We have revised the manuscript to address this directly.


Referee: Abstract: The central claim that WP-MIP will 'develop AI-ready verification techniques that highlight the strengths and weaknesses of each class of prediction system' and 'establish best-practice guidance' rests on the prospective availability of sufficient forecast data under common initializations and the subsequent creation of robust, generalizable metrics. No concrete details on proposed verification metrics, data standards, minimum participation thresholds, or example techniques are supplied, rendering the feasibility of these outcomes impossible to assess from the manuscript.

Authors: We agree that the original abstract and body text were too high-level on these points, which limits evaluation of the project's concrete plans. WP-MIP is an ongoing WMO-supported initiative whose specific protocols are still being finalized through community consultation, so the manuscript intentionally focused on the overall framework rather than finalized deliverables. In the revised version we have: (1) updated the abstract to qualify the claims as goals contingent on data collection; (2) added a new subsection (Section 3.3) that specifies data standards (GRIB2/NetCDF with prescribed variables, horizontal/vertical resolutions, and metadata requirements), minimum participation thresholds (at least five institutions contributing at least one model per class, with a target of ten total contributors), and example verification techniques (traditional scores such as RMSE and anomaly correlation combined with AI-driven methods for detecting physical inconsistencies, e.g., conservation-law violations or regime-dependent error patterns). We also describe how common-initialization experiments will be used to isolate model-class differences. These additions make the intended outcomes more assessable while preserving the manuscript's prospective character. We believe the revision strengthens the paper without overstating current readiness.

Revision: yes
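For readers unfamiliar with the traditional scores named in the response, the following is a minimal sketch of latitude-weighted RMSE and anomaly correlation on a regular latitude-longitude grid. It is not taken from the manuscript: the function names, the toy fields, and the choice of the uncentered anomaly-correlation form are assumptions made for illustration.

```python
import math


def _weights(lats_deg):
    """cos(latitude) weights, one per grid row."""
    return [math.cos(math.radians(lat)) for lat in lats_deg]


def weighted_rmse(fc, obs, lats_deg):
    """Latitude-weighted root-mean-square error of fc against obs."""
    num = den = 0.0
    for f_row, o_row, w in zip(fc, obs, _weights(lats_deg)):
        for f, o in zip(f_row, o_row):
            num += w * (f - o) ** 2
            den += w
    return math.sqrt(num / den)


def weighted_acc(fc, obs, clim, lats_deg):
    """Latitude-weighted anomaly correlation (uncentered form)."""
    cov = var_f = var_o = 0.0
    for f_row, o_row, c_row, w in zip(fc, obs, clim, _weights(lats_deg)):
        for f, o, c in zip(f_row, o_row, c_row):
            fa, oa = f - c, o - c  # anomalies relative to climatology
            cov += w * fa * oa
            var_f += w * fa * fa
            var_o += w * oa * oa
    return cov / math.sqrt(var_f * var_o)


# Toy fields on three latitude rows and two longitudes (invented values).
lats = [-30.0, 0.0, 30.0]
clim = [[10.0, 10.0] for _ in lats]
obs = [[11.0, 9.0], [12.0, 8.0], [11.0, 9.0]]
biased = [[v + 1.0 for v in row] for row in obs]  # forecast with a uniform +1 bias

rmse_biased = weighted_rmse(biased, obs, lats)
acc_perfect = weighted_acc(obs, obs, clim, lats)
acc_biased = weighted_acc(biased, obs, clim, lats)
print(rmse_biased, acc_perfect, acc_biased)
```

Neither score says anything about whether a forecast respects physical constraints, which is the gap the proposed AI-driven consistency checks are meant to fill.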

Circularity Check

0 steps flagged

No significant circularity identified


The manuscript is a descriptive project announcement outlining the planned WP-MIP database, participation framework, and evaluation goals. It contains no equations, derivations, fitted parameters, or load-bearing claims that reduce to prior inputs by construction. All statements are prospective (e.g., future contributions and verification techniques) and rest on external factors such as institutional participation rather than any self-referential or fitted logic internal to the text. No self-citations are invoked to justify uniqueness or ansatzes, and the document makes no predictions that collapse to its own assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The document contains no mathematical derivations, empirical fits, or new physical postulates. It rests on the standard assumption that model intercomparison projects can usefully inform operational practice, which is drawn from prior climate and weather MIPs.

pith-pipeline@v0.9.0 · 5850 in / 1039 out tokens · 26822 ms · 2026-05-10T06:37:06.260723+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AIMIP Phase 1: systematic evaluations of AI weather and climate models
physics.ao-ph 2026-05 unverdicted novelty 6.0

AIMIP Phase 1 shows AI models simulate historical climate and El Niño responses as well as traditional models, though some underestimate trends and diverge in generalization tests, with a public dataset released for f...
