An intercomparison of generative machine learning methods for downscaling precipitation at fine spatial scales

Bryn Ward-Leikis; Hong-Yang Liu; Jeff Adie; Neelesh Rampal; Peter B. Gibson; Steven C. Sherwood; Tristan Meyers; Vassili Kitsios; Yang Juntao; Yun Sing Koh

arxiv: 2512.13987 · v2 · pith:S7FYM6T7new · submitted 2025-12-16 · ⚛️ physics.ao-ph

classification ⚛️ physics.ao-ph

keywords climate · flow · matching · skill · across · change · diffusion · gans

Machine learning (ML) offers a computationally efficient approach for generating large ensembles of high-resolution climate projections, but deterministic ML methods often smooth fine-scale structures and underestimate extremes. While stochastic generative models show promise, few studies have compared their skill under both present-day and future climates. This study compares Generative Adversarial Networks (GANs), flow matching and diffusion models across multiple configurations for downscaling daily precipitation from a regional climate model (RCM) over New Zealand. Model skill is assessed across spatial structure, distributional metrics, climatological means, extremes, ensemble calibration, and climate change signals. Unlike GANs, diffusion and flow matching models generate predictions through many sequential steps. Here we show that higher-order differential equation solvers can reduce the number of steps required with only a minor loss of skill, substantially lowering the computational burden of downscaling large ensembles, which might otherwise have prevented the use of these models in operational settings. Overall, GANs, flow matching and diffusion perform competitively across most metrics, except that diffusion and flow matching produce higher-fidelity predictions and better-calibrated ensembles than GANs, which are under-dispersive. Most approaches capture mean precipitation signals reasonably well, but underestimate end-of-century climate change signals of extreme precipitation, despite being trained on RCM simulations spanning the future period. Only one GAN and one flow matching configuration can reproduce this change signal reliably. These results highlight the importance of evaluating model performance across a comprehensive set of metrics, and show that neither visual realism nor good skill on standard metrics guarantees skill in predicting climate change signals.
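The trade-off between solver order and step count can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: `toy_velocity` is a hypothetical closed-form stand-in for a trained velocity network, Heun's method is chosen as one example of a second-order solver (the abstract does not name the solvers used), and the step counts of 20 and 5 are arbitrary. The sketch integrates the same ODE from t = 0 to t = 1 with first-order Euler steps and with second-order Heun steps, and counts velocity evaluations, which for a real model would be the dominant cost.

```python
import math
import random


def toy_velocity(x, t):
    """Hypothetical stand-in for a trained velocity network: dx/dt = -2 t x.

    The exact solution is x(1) = x(0) * exp(-1), which lets us measure error.
    """
    return -2.0 * t * x


def integrate(velocity, x0, n_steps, method="euler"):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1.

    Returns the final state and the number of velocity evaluations used.
    """
    x = float(x0)
    h = 1.0 / n_steps
    n_evals = 0
    for n in range(n_steps):
        t = n * h
        k1 = velocity(x, t)
        n_evals += 1
        if method == "euler":
            # First-order update: one evaluation per step.
            x = x + h * k1
        elif method == "heun":
            # Second-order update: Euler predictor, then average the slopes.
            k2 = velocity(x + h * k1, t + h)
            n_evals += 1
            x = x + 0.5 * h * (k1 + k2)
        else:
            raise ValueError(f"unknown method: {method}")
    return x, n_evals


def max_abs_error(samples, n_steps, method):
    """Worst-case error over all samples, plus evaluations per sample."""
    worst, evals = 0.0, 0
    for x0 in samples:
        x1, evals = integrate(toy_velocity, x0, n_steps, method)
        worst = max(worst, abs(x1 - x0 * math.exp(-1.0)))
    return worst, evals


random.seed(0)
x0_samples = [random.gauss(0.0, 1.0) for _ in range(1000)]  # "noise" inputs

err_euler, evals_euler = max_abs_error(x0_samples, n_steps=20, method="euler")
err_heun, evals_heun = max_abs_error(x0_samples, n_steps=5, method="heun")

print(f"Euler: {evals_euler} evaluations, max error {err_euler:.5f}")
print(f"Heun:  {evals_heun} evaluations, max error {err_heun:.5f}")
```

In this toy setting, Heun with 5 steps uses half as many velocity evaluations as Euler with 20 steps and still ends up with the smaller error. Whether a similar saving holds for a trained downscaling model depends on how smooth its learned velocity field is, so this should be read as an illustration of the mechanism rather than a reproduction of the paper's result.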

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CORDEX-ML-Bench: A Benchmark for Data-Driven Regional Climate Downscaling - Experiment Design and Overview
physics.ao-ph 2026-06 unverdicted novelty 7.0

CORDEX-ML-Bench benchmarks 40 ML models for climate downscaling and finds generative models outperform deterministic ones on precipitation while historically trained models underestimate future climate signals.

Flow Matching for Convective-Scale Precipitation Downscaling
physics.ao-ph 2026-05 unverdicted novelty 4.0

Flow matching produces better spatial structure than diffusion models for convective precipitation downscaling but underestimates heavy rainfall amounts.