pith. machine review for the scientific record.

arxiv: 2604.22522 · v1 · submitted 2026-04-24 · ⚛️ physics.ao-ph

Recognition: unknown

Hybrid weather prediction using spectral nudging toward machine-learning forecasts

Pith reviewed 2026-05-08 08:56 UTC · model grok-4.3

classification ⚛️ physics.ao-ph
keywords hybrid weather forecasting · spectral nudging · machine learning weather prediction · numerical weather prediction · large-scale skill · tropical cyclone forecasts · forecast busts

The pith

Spectral nudging of large scales from machine-learning forecasts into a physics-based model improves overall weather prediction skill while keeping small-scale behavior intact.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether a physics-based weather model can be gently corrected at broad scales by a machine-learning forecast to gain accuracy without breaking the detailed physical processes that the physics model handles well. Nudging is limited to large-scale virtual temperature and vorticity so that local storms, fronts, and other fine features continue to evolve according to established fluid and thermodynamic rules. Gains appear in large-scale metrics, fewer outright forecast failures, and better tropical cyclone paths, all while extremes and variability remain comparable to the un-nudged physics run. The approach is presented as a concrete way to let each modeling style contribute where it is strongest.

Core claim

Scale-selective spectral nudging applied only to the large scales of virtual temperature and vorticity in the ECMWF IFS model toward a machine-learned forecast raises large-scale forecast skill by up to 1.5 days in the tropics and 12-18 hours in the extra-tropics, reduces the number of busts, preserves forecast variability and the representation of extremes, and improves tropical cyclone track forecasts while leaving intensity and small-scale physics consistent with the original physics-based model.

What carries the argument

Scale-selective spectral nudging, which relaxes only the large-scale spectral coefficients of virtual temperature and vorticity in the physics-based model toward the machine-learning solution at each time step.
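
The relaxation described above can be illustrated with a minimal sketch. It is not the IFS implementation: it assumes a 1-D periodic grid with an FFT standing in for the spherical-harmonic transform, a sharp cut-off with no taper, and a hypothetical relaxation time scale `tau`. The function names `spectral_nudge` and `amplitude` and the synthetic fields are invented for illustration; only the cut-off wavenumber 21 is taken from the Figure 1 caption.

```python
import numpy as np


def spectral_nudge(model_field, target_field, cutoff, dt, tau):
    """Relax wavenumbers <= cutoff of model_field toward target_field.

    Hypothetical helper: a 1-D periodic grid and an FFT stand in for the
    spherical-harmonic transform a global model would use.
    """
    model_hat = np.fft.rfft(model_field)
    target_hat = np.fft.rfft(target_field)
    k = np.arange(model_hat.size)      # integer wavenumbers 0 .. n/2
    large = k <= cutoff                # sharp cut-off, no taper (assumption)
    alpha = dt / tau                   # fraction of the gap closed this step
    model_hat[large] += alpha * (target_hat[large] - model_hat[large])
    return np.fft.irfft(model_hat, n=model_field.size)


def amplitude(field, wavenumber):
    """Amplitude of one sinusoidal component of a real periodic field."""
    return 2.0 * np.abs(np.fft.rfft(field)[wavenumber]) / field.size


n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
physics = np.sin(2 * x) + 0.3 * np.sin(40 * x)    # synthetic "physics" state
ml = 1.5 * np.sin(2 * x) + 0.1 * np.sin(50 * x)   # synthetic "ML" state

nudged = spectral_nudge(physics, ml, cutoff=21, dt=1.0, tau=2.0)

print(amplitude(nudged, 2))    # large scale moves halfway toward the ML value
print(amplitude(nudged, 40))   # small scale keeps the physics amplitude
print(amplitude(nudged, 50))   # ML small-scale content is not imported
```

With `dt / tau = 0.5`, the wavenumber-2 amplitude moves from 1.0 halfway to the ML value of 1.5, while everything above the cut-off is left exactly as the physics state had it. In this toy setting the preservation of small scales holds by construction; in a real model the small scales can still respond indirectly through nonlinear interactions with the nudged large scales.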

If this is right

  • Large-scale forecast skill improves relative to the free-running physics model.
  • The frequency of forecast busts decreases.
  • Forecast variability and the representation of extreme near-surface weather remain comparable to the physics-only model.
  • Tropical cyclone tracks benefit from the improved large-scale steering flow while intensity stays physically consistent with the physics model.
  • The hybrid setup offers a practical route to combine machine-learning and physics-based systems without replacing either entirely.
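
The bust count can be made concrete with a short sketch. The Figure 7 caption defines a forecast bust as a day-6 anomaly correlation (ACC) of geopotential height below 40%. The code below assumes a centred, unweighted ACC; operational verification would normally apply latitude weighting, which is omitted here. The function names and the sample ACC series are invented for illustration.

```python
import numpy as np


def anomaly_correlation(forecast, analysis, climatology):
    """Centred anomaly correlation between forecast and analysis.

    Assumption: unweighted grid points (no cos-latitude area weights).
    """
    f = forecast - climatology
    a = analysis - climatology
    f = f - f.mean()                   # remove the mean anomaly of each field
    a = a - a.mean()
    return float(np.sum(f * a) / np.sqrt(np.sum(f**2) * np.sum(a**2)))


def count_busts(acc_series, threshold=0.40):
    """Number of forecasts whose ACC falls strictly below the threshold."""
    return int(np.sum(np.asarray(acc_series) < threshold))


# Synthetic day-6 ACC values for five forecast start dates (illustrative only).
acc_day6 = [0.85, 0.35, 0.62, 0.39, 0.40]
print(count_busts(acc_day6))           # two values lie strictly below 0.40
```

The strict inequality follows the caption's wording ("drops below 40%"), so a value of exactly 0.40 is not counted as a bust in this sketch.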

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same nudging strategy could be tested with other machine-learning models or applied to additional large-scale variables to see if further gains appear.
  • If machine-learning models continue to improve at large scales, the hybrid approach might allow physics models to focus computational effort on smaller scales.
  • Operational centers could adopt the method incrementally by nudging only selected variables or regions first.

Load-bearing premise

Nudging applied only to the large scales of virtual temperature and vorticity will preserve the dynamical and physical behaviour of the underlying physics-based model at smaller scales.

What would settle it

A direct comparison showing that the hybrid run produces statistically different small-scale variability, different distributions of extreme near-surface variables, or physically inconsistent tropical cyclone intensities relative to the free-running IFS would falsify the preservation claim.
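
One ingredient of such a comparison is the variance carried by scales smaller than the nudging cut-off. The sketch below is a toy diagnostic under stated assumptions, not the paper's method: it uses a 1-D periodic field with an FFT instead of spherical harmonics, synthetic signals, and a hypothetical helper `small_scale_variance`. A real test would need many forecasts and a significance test on the resulting statistics.

```python
import numpy as np


def small_scale_variance(field, cutoff):
    """Variance of a real periodic 1-D field carried by wavenumbers > cutoff."""
    n = field.size
    power = np.abs(np.fft.rfft(field)) ** 2 / n**2
    weights = np.full(power.size, 2.0)  # each k > 0 also stands for its -k partner
    weights[0] = 1.0                    # the mean has no partner
    if n % 2 == 0:
        weights[-1] = 1.0               # neither has the Nyquist wavenumber
    k = np.arange(power.size)
    small = k > cutoff
    return float(np.sum(weights[small] * power[small]))


n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
control = np.sin(2 * x) + 0.3 * np.sin(40 * x)        # synthetic free-running state
hybrid = 1.25 * np.sin(2 * x) + 0.3 * np.sin(40 * x)  # synthetic nudged state

ratio = small_scale_variance(hybrid, 21) / small_scale_variance(control, 21)
print(ratio)  # a ratio near 1 means the small-scale variance is unchanged
```

A ratio that departs from 1 consistently across many forecasts, lead times, and levels would count against the preservation claim; a ratio near 1 is necessary but not sufficient, since distributions of extremes and tropical cyclone intensities would still need to be checked separately.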

Figures

Figures reproduced from arXiv: 2604.22522 by B. Vanniere, E. Gascón, I. Polichtchouk, M. C. A. Clare, M. Chantry, M. Maier-Gerber, S. Lang.

Figure 1. Global kinetic energy spectra as a function of total wavenumber at 500 hPa for forecast day 10 from AIFS-Single-ML (dotted black), IFS (blue), hy-IFS (dashed red) and operational AIFS-Single (dark green). Spectra are averaged for 3 months of forecasts. Black vertical line shows total wavenumber 21, which is the cut-off wavenumber for nudging.
Figure 2. Specific humidity at 700 hPa over western Europe at 18 UTC on 3 October 2025 in (a) the ECMWF analysis and t=114 h forecasts by (b) IFS, (c) hy-IFS, and (d) AIFS. The black box in (a) marks the region shown in …
Figure 3. Mean sea level pressure (black contours) and 10 m wind speed (shading) over the region marked by black box in Figure 2a for (a) ECMWF analysis and t=114 h forecasts by (b) IFS, (c) hy-IFS, and (d) AIFS. The pure AIFS forecast underestimates the near-surface wind maxima over western Scotland and out of the three forecasts, the hy-IFS most closely reproduces the analysed storm structure and near-surface wind…
Figure 4. Summary scorecard comparing forecast skill differences between the spectrally nudged IFS (hy-IFS) and the control IFS, using anomaly correlation (CCAF) and root mean square error (RMSEF), verified against the ECMWF operational analysis (top) and against radiosonde and SYNOP observations (bottom). Shades of blue indicate improvement in hy-IFS relative to IFS, and shades of red indicate degradation. Each box…
Figure 5. RMSE of 500 hPa geopotential height (top); ACC of 850 hPa temperature (middle); and RMSE of 500 hPa wind speed (bottom) for hy-IFS (dashed red), IFS (solid blue) and operational AIFS (dotted green) for the (a,d,g) Northern Hemisphere extra-tropics (20°N-90°N), (b,e,h) tropics (20°N-20°S), and (c,f,i) Southern Hemisphere extra-tropics (20°S-90°S). Thin black lines show hybrid forecasts with AIFS-Single-ML f…
Figure 6. RMSE of (a) 10-m wind speed and (b) 2-m temperature, and SEEPS of (c) total precipitation for hy-IFS (dashed red), IFS (solid blue) and operational AIFS (dotted green) for the Northern Hemisphere extra-tropics. Thin black lines show hybrid forecasts with AIFS-Single-ML fine-tuned on ECMWF analysis only for 2016–2020, rather than 2016–2023 as in hy-IFS (dashed red). Lower RMSE values and higher SEEPS values…
Figure 7. Timeseries of day 6 ACC of geopotential height for hy-IFS (red), IFS (blue) and operational AIFS (green) for (a) Europe (35°N-75°N; 12.5°W-42.5°E), (b) North America (25°N-60°N; 120°W-75°W) and (c) East Asia (25°N-60°N; 102.5°E-150°E). A forecast bust is defined to occur when ACC drops below 40%. The number of busts for each system is shown in the legend.
Figure 8. (a) Mean position error and (b) mean maximum wind for tropical cyclones as a function of forecast lead time for the hy-IFS (in red), IFS (in blue), and AIFS-Single (in green). Verification statistics are computed with respect to IBTrACS (Gahtan et al., 2024; Knapp et al., 2010) for forecasts initialised at 00 UTC between 1 June 2024 and 31 October 2025. Case counts for each lead time are displayed below th…
Figure 9. Relative change (%) in threshold-weighted mean absolute error (twMAE) for extreme near-surface variables in the Northern Hemisphere, expressed as percentage difference (hy-IFS minus IFS) relative to IFS. Results are shown as a function of forecast lead time (days 1–10) and stratified by orographic complexity (flat, hilly, mountainous). Panels show (a) extreme high 2-m temperature events (≥ 35°C), (b) extre…
Figure 10. Impact of doubling the entrainment rate in the deep convection parametrisation in forecasts without nudging (left) and with spectral nudging (right). Vertical cross-sections of (a,b) temperature and (c,d) vector wind component RMSE differences of an experiment which uses double entrainment when nudging is not active (a,c) and when nudging is active (b,d). Note the different contour intervals for (a) and (…
Figure 11. Total precipitation accumulated over 1 hour between 23:00–00:00 UTC on 2 January 2025 for (a) IMERG observations and the three forecast experiments initialized at 00:00 UTC on 1 January 2025 using: (b) standard convection, (c) doubling entrainment rate, and (d) doubling entrainment rate combined with spectral nudging.
Original abstract

A hybrid approach to numerical weather prediction is investigated, in which the unperturbed physics-based ECMWF Integrated Forecasting System (IFS) is spectrally nudged toward forecasts from a machine-learned weather forecast model, trained to forecast on model levels. Nudging is applied only to the large scales of virtual temperature and vorticity, with the objective of improving large-scale forecast skill while preserving the dynamical and physical behaviour of the underlying physics-based model at smaller scales. Consistent with previous studies, spectral nudging substantially improves large-scale forecast skill relative to the free-running IFS, with gains of up to 1.5 days in the tropics and 12-18 hours in the extra-tropics, and a reduced frequency of forecast busts. These improvements are achieved while preserving forecast variability. The representation of extreme near-surface weather is maintained or improved. Tropical cyclone track forecasts benefit from improved large-scale steering flow, while storm intensity remains comparable to that of the physics-based model and more physically consistent than in pure machine-learned weather forecast models. These results confirm that scale-selective spectral nudging provides a practical pathway for combining machine-learning and physics-based forecasting systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes and tests a hybrid NWP system in which the free-running ECMWF IFS is spectrally nudged toward a machine-learned forecast model (trained on model levels). Nudging is restricted to the large scales of virtual temperature and vorticity. The central claim is that this scale-selective nudging improves large-scale forecast skill (up to 1.5 days in the tropics, 12-18 h in the extratropics, fewer busts) while preserving the IFS's small-scale dynamical and physical behaviour, forecast variability, extreme near-surface weather, and tropical-cyclone intensity.

Significance. If the preservation of small-scale behaviour can be demonstrated, the work supplies a concrete, immediately usable route for injecting ML large-scale skill into an operational physics-based model without discarding the model's small-scale physics and variability. The reported skill gains and reduced bust frequency are practically relevant; the maintenance of TC intensity and extremes is a non-trivial positive result.

major comments (2)
  1. [Abstract / Results] Abstract and Results: The claim that 'the dynamical and physical behaviour of the underlying physics-based model at smaller scales' is preserved rests on aggregate diagnostics (forecast variability, representation of extreme near-surface weather). No scale-decomposed diagnostics—kinetic-energy spectra, cross-scale energy fluxes, or statistics of parameterized processes (convection, boundary-layer turbulence)—are shown to confirm that small-scale behaviour remains statistically indistinguishable from the free-running IFS. This verification is load-bearing for the 'practical pathway' conclusion.
  2. [Methods] Methods: The precise spectral cutoff, nudging strength, and vertical structure of the nudging operator are not stated with sufficient quantitative detail to allow reproduction or to assess possible leakage into the small scales. Without these parameters, it is impossible to judge whether the reported preservation is robust or specific to the chosen cutoff.
minor comments (2)
  1. [Abstract] The abstract states gains 'of up to 1.5 days' and '12-18 hours' but does not indicate the verification metric (e.g., anomaly correlation, RMSE) or the exact lead times at which these gains are measured.
  2. [Results] Baseline comparison is only against the free-running IFS; a direct comparison against the pure ML model on the same small-scale diagnostics would strengthen the claim that the hybrid retains physical consistency advantages.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and positive review, which highlights the practical relevance of the hybrid approach. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation and evidence.

Point-by-point responses
  1. Referee: [Abstract / Results] Abstract and Results: The claim that 'the dynamical and physical behaviour of the underlying physics-based model at smaller scales' is preserved rests on aggregate diagnostics (forecast variability, representation of extreme near-surface weather). No scale-decomposed diagnostics—kinetic-energy spectra, cross-scale energy fluxes, or statistics of parameterized processes (convection, boundary-layer turbulence)—are shown to confirm that small-scale behaviour remains statistically indistinguishable from the free-running IFS. This verification is load-bearing for the 'practical pathway' conclusion.

    Authors: We appreciate the referee's emphasis on the need for more direct verification of small-scale preservation. The manuscript currently supports the claim through multiple aggregate but physically relevant diagnostics, including preserved forecast variability (which reflects small-scale energy), maintained or improved representation of extreme near-surface weather (a small-scale phenomenon), and comparable tropical-cyclone intensity with more physical consistency than pure ML models. Nevertheless, we agree that explicit scale-decomposed diagnostics would provide stronger, more targeted evidence. In the revised manuscript we will add kinetic-energy spectra for the nudged and free-running IFS runs, and we will include basic statistics on parameterized processes where these can be extracted from the model output. revision: yes

  2. Referee: [Methods] Methods: The precise spectral cutoff, nudging strength, and vertical structure of the nudging operator are not stated with sufficient quantitative detail to allow reproduction or to assess possible leakage into the small scales. Without these parameters, it is impossible to judge whether the reported preservation is robust or specific to the chosen cutoff.

    Authors: We acknowledge that the original submission did not present the nudging parameters with the quantitative precision required for full reproducibility. The revised Methods section will explicitly state the spectral cutoff wavenumber, the nudging relaxation coefficient (including its units and time scale), and the vertical structure of the nudging operator (including any tapering or level-dependent weighting). These details will be accompanied by a brief justification of the chosen values and a note on how the cutoff was selected to minimize leakage into smaller scales. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical comparisons to free-running IFS

Full rationale

The paper reports an experimental hybrid forecasting setup in which spectral nudging is applied to selected large-scale fields of the IFS toward an independently trained ML model. Forecast skill, variability, extreme-event statistics, and tropical-cyclone properties are then measured directly against the un-nudged IFS baseline. No equations, parameters, or central claims are shown to reduce to fitted inputs or prior self-citations by construction; the reported gains (e.g., 1.5 days in the tropics) are obtained from fresh integrations and verified against external observables. Self-citations to earlier nudging studies are present but serve only as background and are not invoked to establish uniqueness or to substitute for the present empirical evidence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that large-scale nudging leaves small-scale physics intact; this is a standard premise in atmospheric modeling but is not independently verified in the provided abstract.

axioms (1)
  • domain assumption: Large-scale components of virtual temperature and vorticity can be adjusted via nudging without adversely affecting small-scale dynamical and physical behaviour.
    Invoked in the abstract as the objective of the nudging strategy.

pith-pipeline@v0.9.0 · 5528 in / 1251 out tokens · 50421 ms · 2026-05-08T08:56:42.651733+00:00 · methodology
