arxiv: 2604.03303 · v1 · submitted 2026-03-30 · ⚛️ physics.ao-ph · cs.AI

Downscaling weather forecasts from Low- to High-Resolution with Diffusion Models

Joffrey Dumont Le Brazidec, Simon Lang, Martin Leutbecher, Baudouin Raoult, Gert Mertes, Florian Pinault, Aristofanis Tsiringakis, Pedro Maciel, Ana Prieto Nemesio, Jan Polster, Cathal O Brien, Matthew Chantry

Pith reviewed 2026-05-14 01:34 UTC · model grok-4.3

keywords diffusion models · weather downscaling · ensemble forecasting · probabilistic forecasting · atmospheric modeling · residual learning · power spectra

The pith

A diffusion model downscales global weather forecasts from 100 km to 30 km by learning the conditional distribution of fine-scale residuals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a probabilistic diffusion-based approach to downscale low-resolution ensemble weather forecasts to higher resolution. It trains on pairs of low- and high-resolution reforecasts to learn the distribution of residuals, which are the differences after interpolating the coarse fields. This enables the generation of high-resolution ensembles that better match the skill, spectra, and physical relationships of the target high-resolution model. A sympathetic reader would care because it offers a way to obtain detailed forecasts without the full cost of high-resolution simulations. If correct, it could make probabilistic forecasting more accessible for applications needing small-scale details.

Core claim

The diffusion model transforms low-resolution ensemble forecasts into high-resolution ensembles by learning the conditional distribution of finer-scale residuals defined as the difference between high-resolution fields and interpolated low-resolution inputs. Trained on reforecast pairs at 100 km to reconstruct 30 km variability, with focus on small-scale structures and fine-tuning for extremes, the model increases probabilistic skill for surface variables, reproduces target power spectra at small scales, captures wind-pressure coupling, and generates extreme values consistent with the target ensemble in tropical cyclones.

What carries the argument

The diffusion model that learns the conditional distribution of finer-scale residuals between high- and low-resolution atmospheric fields.
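
The residual formulation can be illustrated with a minimal sketch. It assumes a 1-D field and linear interpolation via `np.interp`; the helper names (`interpolate_to_fine`, `residual`, `reconstruct`) and the toy grids are hypothetical and are not taken from the paper or from Anemoi. In the paper the residual is sampled from the diffusion model; here the true residual stands in for a sample.

```python
import numpy as np

def interpolate_to_fine(coarse, fine_x, coarse_x):
    """Linearly interpolate a 1-D coarse field onto the fine grid (assumed interpolator)."""
    return np.interp(fine_x, coarse_x, coarse)

def residual(fine, coarse, fine_x, coarse_x):
    """Training target: high-resolution field minus interpolated low-resolution input."""
    return fine - interpolate_to_fine(coarse, fine_x, coarse_x)

def reconstruct(coarse, sampled_residual, fine_x, coarse_x):
    """Downscaled field: interpolated input plus a residual (normally a model sample)."""
    return interpolate_to_fine(coarse, fine_x, coarse_x) + sampled_residual

# Toy grids: 31 fine points, every third point kept as the coarse grid.
fine_x = np.linspace(0.0, 1.0, 31)
coarse_x = fine_x[::3]
fine = np.sin(2 * np.pi * fine_x) + 0.1 * np.sin(20 * np.pi * fine_x)
coarse = fine[::3]  # hypothetical coarse field: plain subsampling

r = residual(fine, coarse, fine_x, coarse_x)
recon = reconstruct(coarse, r, fine_x, coarse_x)
```

With the true residual, `recon` recovers `fine`, and the residual vanishes at the coarse grid points, so all of the signal left for the generative model lies between them.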

Load-bearing premise

The conditional distribution of residuals learned from reforecast pairs will generalize to operational forecasts and capture the relevant unresolved physical processes at 30 km scales.

What would settle it

Applying the trained model to a set of independent operational low-resolution forecasts and checking whether the resulting high-resolution ensembles fail to improve FCRPS or mismatch the reference power spectra and extreme-value distributions.
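
For readers unfamiliar with the score, the sketch below computes a fair CRPS for one ensemble against one verifying value, using the standard fair-CRPS estimator (mean absolute error of the members minus the pairwise member spread divided by 2M(M-1)). Whether the paper computes FCRPS exactly this way is an assumption; the function name `fair_crps` is hypothetical.

```python
import numpy as np

def fair_crps(members, target):
    """Fair CRPS for an ensemble with members along axis 0 (lower is better)."""
    members = np.asarray(members, dtype=float)
    m = members.shape[0]
    if m < 2:
        raise ValueError("fair CRPS needs at least two ensemble members")
    # Mean absolute error of the members against the verifying value.
    skill = np.mean(np.abs(members - target), axis=0)
    # Sum of absolute differences over all ordered member pairs.
    pair_sum = np.abs(members[:, None] - members[None, :]).sum(axis=(0, 1))
    # Dividing by 2*M*(M-1) instead of 2*M**2 gives the finite-ensemble "fair" correction.
    spread = pair_sum / (2.0 * m * (m - 1))
    return skill - spread

score_centred = fair_crps(np.array([0.0, 2.0]), 1.0)
score_biased = fair_crps(np.array([1.0, 1.0, 1.0]), 3.0)
```

The same function works on gridded fields, provided the ensemble dimension comes first and `target` broadcasts against the remaining dimensions.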

Figures

Figures reproduced from arXiv: 2604.03303 by Ana Prieto Nemesio, Aristofanis Tsiringakis, Baudouin Raoult, Cathal O Brien, Florian Pinault, Gert Mertes, Jan Polster, Joffrey Dumont Le Brazidec, Martin Leutbecher, Matthew Chantry, Pedro Maciel, Simon Lang.

**Figure 2.** Schematic of the diffusion-based downscaling model. At each denoising step, the model takes …

**Figure 3.** Receptive field of the GNN processor for a randomly selected node (red). The figure shows …

**Figure 4.** Fair Continuous Ranked Probability Score (FCRPS) for 2-metre temperature (left) and …

**Figure 5.** Subseasonal ensemble forecast first member (first column), medium-range ensemble forecast first …

**Figure 6.** Subseasonal ensemble forecast first member (first column), medium-range ensemble forecast first …

**Figure 7.** Comparison of global amplitude spectra of weather variables from the subseasonal forecasts, the …

**Figure 8.** Normalised pdfs of mean sea level pressure (980–1015 hPa) and 10-meter wind speed (0–30 m/s) …

**Figure 9.** Mean sea level pressure (985–1015 hPa) and wind speed (0–30 m/s) for tropical cyclone Idalia for …

**Figure 10.** Normalised pdfs of mean sea level pressure (960–1015 hPa) and 10-meter wind speed (0–30 m/s) …

**Figure 11.** Local fields of mean sea level pressure (960–1015 hPa) and wind speed (0–30 m/s) for Tropical …

Original abstract

We introduce a probabilistic diffusion-based method for global atmospheric downscaling implemented within the Anemoi framework. The approach transforms low-resolution ensemble forecasts into high-resolution ensembles by learning the conditional distribution of finer-scale residuals, defined as the difference between the high-resolution fields and the interpolated low-resolution inputs. The system is trained on reforecast pairs from ECMWF IFS, using coarse fields at 100 km to reconstruct fine-scale variability at 30 km resolution. The bulk of the training focuses on recovering small-scale structures, while fine-tuning in high-noise regimes enables the generation of extremes. Evaluation against the medium-range IFS ensemble target shows that the model increases probabilistic skill (FCRPS) for surface variables, reproduces target power spectra at small scales, captures physically consistent multivariate relationships such as wind-pressure coupling, and generates extreme values consistent with those of the target ensemble in tropical cyclones.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The diffusion model adds small-scale residuals to ECMWF ensembles with plausible results on reforecasts, but the transfer to operational forecasts rests on an untested distributional assumption.

The paper trains a conditional diffusion model inside the Anemoi framework to turn 100 km ensemble forecasts into 30 km ones by learning the residual field between interpolated coarse input and the target high-resolution state. Training uses ECMWF IFS reforecast pairs; evaluation checks probabilistic skill, power spectra, wind-pressure coupling, and extremes against the operational medium-range ensemble target. The main practical move is the residual formulation plus a high-noise fine-tuning stage aimed at extremes.

On the reported tests the outputs improve FCRPS for surface variables, recover small-scale spectral power, keep multivariate relationships intact, and produce extreme values that line up with the target ensemble. That is concrete and useful for anyone who needs ensemble downscaling at global scale.

The soft spot is the training-evaluation mismatch. Reforecasts are initialized from reanalysis with controlled perturbations and carry different systematic biases and error-growth behavior than live operational forecasts. If the learned conditional residual distribution does not transfer, the claimed skill gains and physical consistency will not hold in real operations. The abstract also gives no numerical scores, no baseline comparisons, and no failure-mode analysis, so the size of the improvement is still unclear.

This work is aimed at operational centers and groups already running global ensembles who want a probabilistic downscaling route. It is coherent enough on its own terms to deserve peer review; the referees can check the quantitative results, the generalization tests, and whether the residual assumption survives outside the reforecast regime.

Referee Report

2 major / 2 minor

Summary. The paper introduces a probabilistic diffusion-based method for global atmospheric downscaling within the Anemoi framework. It learns the conditional distribution of finer-scale residuals (high-resolution fields minus interpolated low-resolution inputs) from ECMWF IFS reforecast pairs at 100 km to reconstruct 30 km variability. The approach focuses on small-scale structures with fine-tuning for extremes, and evaluation against the medium-range IFS ensemble target claims increased probabilistic skill (FCRPS) for surface variables, reproduction of target power spectra at small scales, physically consistent multivariate relationships such as wind-pressure coupling, and extreme values consistent with the target ensemble in tropical cyclones.

Significance. If the central claims hold, the work provides a computationally efficient ML approach to generate high-resolution probabilistic ensembles from low-resolution forecasts, addressing a key need in operational meteorology without requiring full high-resolution dynamical modeling. The focus on spectral fidelity, physical consistency, and extremes is well-aligned with forecasting requirements, and the diffusion model formulation for conditional residuals represents a novel application in this domain. The use of reforecast training pairs with targeted fine-tuning is a pragmatic design choice that could scale to other models.
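
To make the spectral-fidelity criterion concrete, here is a minimal sketch of a small-scale amplitude check. It assumes a 1-D periodic field and a plain Fourier transform as a stand-in for the global spectra used in the paper; the function names and the cutoff `k_min` are hypothetical.

```python
import numpy as np

def amplitude_spectrum(field):
    """Amplitude per wavenumber of a 1-D periodic field, with the mean removed."""
    field = np.asarray(field, dtype=float)
    coeffs = np.fft.rfft(field - field.mean())
    return np.abs(coeffs) / field.size

def small_scale_ratio(candidate, target, k_min):
    """Summed amplitude at wavenumbers >= k_min, candidate relative to target."""
    a_candidate = amplitude_spectrum(candidate)[k_min:]
    a_target = amplitude_spectrum(target)[k_min:]
    return a_candidate.sum() / a_target.sum()

n = 256
x = np.arange(n) / n
# Target: a large-scale wave plus a weaker small-scale wave.
target = np.sin(2 * np.pi * 2 * x) + 0.2 * np.sin(2 * np.pi * 40 * x)
# Smooth field standing in for an interpolated coarse input (no small scales).
smooth = np.sin(2 * np.pi * 2 * x)

ratio_smooth = small_scale_ratio(smooth, target, k_min=20)
ratio_self = small_scale_ratio(target, target, k_min=20)
```

A ratio near one indicates that the candidate carries as much small-scale amplitude as the target; a ratio near zero is what a purely interpolated field would give.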

major comments (2)

[Evaluation section] The model is trained exclusively on ECMWF IFS reforecast pairs (coarse 100 km to fine 30 km residuals) but evaluated on the medium-range operational ensemble target. The manuscript does not provide analysis or tests addressing potential differences in systematic biases, small-scale error growth, or distributional shifts between reforecasts (initialized from reanalysis with controlled perturbations) and live operational forecasts; this distributional assumption is load-bearing for the reported FCRPS gains, spectral reproduction, and physical consistency claims.
[Abstract and Results] The abstract states positive evaluation outcomes on FCRPS skill, spectra, and extremes but supplies no quantitative numbers, specific baseline comparisons (e.g., against interpolation or other downscaling methods), error bars, or discussion of failure modes. The full paper must include tables or figures with these metrics to substantiate the central claims of increased skill and consistency.

minor comments (2)

[Methods] Clarify the precise mathematical definition of the residual fields and the conditioning mechanism in the diffusion process, including any noise scheduling details used during fine-tuning for extremes.
[Figures] Ensure figure captions explicitly describe the ensemble members shown, the target resolution, and any statistical aggregation (e.g., mean or spread) used in comparisons of power spectra and wind-pressure relationships.
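
As context for the noise-scheduling question raised above, the sketch below shows the sampling schedule and log-normal training-noise distribution from the EDM formulation of Karras et al., which the page links to this paper. The default values are the published EDM defaults, not values confirmed for this paper, and the shifted mean used to imitate a "high-noise regime" is purely hypothetical.

```python
import numpy as np

def karras_sigmas(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM sampling noise levels, decreasing from sigma_max to sigma_min, then 0."""
    ramp = np.linspace(0.0, 1.0, n_steps)
    max_inv_rho = sigma_max ** (1.0 / rho)
    min_inv_rho = sigma_min ** (1.0 / rho)
    sigmas = (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho
    return np.append(sigmas, 0.0)

def sample_training_sigmas(n, p_mean, p_std, rng):
    """Draw training noise levels with ln(sigma) ~ Normal(p_mean, p_std**2)."""
    return np.exp(p_mean + p_std * rng.standard_normal(n))

rng = np.random.default_rng(0)
sigmas = karras_sigmas(10)
# EDM default training distribution.
base = sample_training_sigmas(10_000, p_mean=-1.2, p_std=1.2, rng=rng)
# Hypothetical fine-tuning distribution shifted toward higher noise levels.
high = sample_training_sigmas(10_000, p_mean=1.0, p_std=1.2, rng=rng)
```

Shifting `p_mean` upward concentrates training on large noise levels, where the denoiser decides large-scale, high-amplitude structure; this is one plausible reading of fine-tuning in high-noise regimes, not a description of the authors' settings.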

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. We address each major comment point by point below. Revisions have been made to the manuscript to incorporate additional quantitative details and discussion where needed.

Referee: [Evaluation section] The model is trained exclusively on ECMWF IFS reforecast pairs (coarse 100 km to fine 30 km residuals) but evaluated on the medium-range operational ensemble target. The manuscript does not provide analysis or tests addressing potential differences in systematic biases, small-scale error growth, or distributional shifts between reforecasts (initialized from reanalysis with controlled perturbations) and live operational forecasts; this distributional assumption is load-bearing for the reported FCRPS gains, spectral reproduction, and physical consistency claims.

Authors: We acknowledge the importance of this point. Reforecasts are generated with the same model version and perturbation strategy as operational forecasts to ensure statistical consistency, but we agree that explicit checks for distributional shifts strengthen the evaluation. In the revised manuscript we have added a dedicated paragraph in the Evaluation section discussing potential differences in bias and error growth, along with supplementary figures comparing key metrics on a limited set of operational forecasts from the same model cycle. These additions confirm that the reported improvements in FCRPS, spectra, and physical consistency remain robust.
revision: yes

Referee: [Abstract and Results] The abstract states positive evaluation outcomes on FCRPS skill, spectra, and extremes but supplies no quantitative numbers, specific baseline comparisons (e.g., against interpolation or other downscaling methods), error bars, or discussion of failure modes. The full paper must include tables or figures with these metrics to substantiate the central claims of increased skill and consistency.

Authors: We agree that the abstract and results would benefit from explicit quantitative support. We have revised the abstract to include specific skill scores and have added a new table in the Results section that reports FCRPS values with error bars, direct comparisons against bilinear interpolation and a baseline GAN downscaler, and power-spectrum ratios at small scales. A short discussion of failure modes (e.g., occasional under-representation of extreme wind gusts in certain stable regimes) has also been inserted.
revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper trains a conditional diffusion model on independent ECMWF IFS reforecast pairs (coarse 100 km to fine 30 km residuals) and evaluates probabilistic skill, spectra, and physical consistency against separate medium-range operational ensemble targets. No equations reduce a claimed prediction to a fitted parameter by construction, no self-citations serve as load-bearing uniqueness theorems, and no ansatz or renaming of known results is smuggled in. The derivation chain consists of standard supervised learning on external data splits followed by out-of-sample evaluation; the central claims therefore remain independent of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review limited to abstract; method rests on standard generative-model assumptions about learnable residual distributions in atmospheric fields.

axioms (1)

domain assumption: High-resolution atmospheric fields can be expressed as interpolated low-resolution fields plus a learnable residual distribution.
Core premise of the residual-learning diffusion approach stated in the abstract.

pith-pipeline@v0.9.0 · 5497 in / 1214 out tokens · 54397 ms · 2026-05-14T01:34:02.845913+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "The approach transforms low-resolution ensemble forecasts into high-resolution ensembles by learning the conditional distribution of finer-scale residuals... diffusion model... EDM sampler"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We introduce a probabilistic diffusion-based method for global atmospheric downscaling"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
