Comparing Ocean Forecasts Driven with Machine Learning-based and Physics-based Atmospheric Forcings
Pith reviewed 2026-05-10 17:54 UTC · model grok-4.3
The pith
Ocean forecasts forced by machine learning atmospheric data show comparable or enhanced skill versus physics-based forcing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The ocean forecasts forced with AIFS atmospheric data exhibit comparable or enhanced predictive skill compared to those forced with ACCESS-G3 data.
What carries the argument
Side-by-side evaluation of NEMO ocean model skill under AIFS versus ACCESS-G3 atmospheric forcing, using identical initial conditions and assessing surface variables against reanalysis and observations.
Load-bearing premise
That differences in ocean forecast skill are caused only by the atmospheric forcing and not by other model biases or the limited two-year initialization window.
What would settle it
Repeating the experiment over a five-year period or with an independent ocean model and finding that AIFS-forced runs show systematically higher errors than ACCESS-G3 runs would falsify the central claim.
Original abstract
Operational ocean forecasting systems conventionally employ dynamical ocean models driven by atmospheric forcing derived from numerical weather prediction (NWP) models. Recent advancements in artificial intelligence and machine learning (ML) have led to the development of ML-based atmospheric weather models, which have competitive, if not better, medium-range forecast accuracy compared to traditional NWP systems. This study evaluates the impact of ML-based atmospheric forcing on ocean forecast skill through two sets of 10-day forecasts using the UK Met Office GOSI9 configuration of the NEMO dynamical ocean model. Both experiments share identical ocean initial conditions but differ in atmospheric forcing: one uses ECMWF's ML-based AIFS model, while the other uses the Australian Bureau of Meteorology's physics-based NWP model, ACCESS-G3. Forecasts were initialized on the first day of each month over the period 2023-2024. The quality of the atmospheric forcing was assessed by comparing AIFS and ACCESS-G3 forecast skill against both ECMWF reanalysis v5 (ERA5) and ACCESS-G3 analyses. Results indicate that AIFS consistently outperforms ACCESS-G3, either from the initial forecast time or after the first few days. Oceanic forecast skill was evaluated against both the GOSI9 reanalysis and observations, focusing on key surface variables including sea surface temperature, salinity, sea level, and ocean currents. The ocean forecasts forced with AIFS atmospheric data exhibit comparable or enhanced predictive skill compared to those forced with ACCESS-G3 data. These findings underscore the potential of ML-based atmospheric models to replace traditional NWP forcing in operational ocean forecasting systems, offering improved accuracy and computational efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares 10-day NEMO/GOSI9 ocean forecasts initialized monthly in 2023-2024 with identical ocean initial conditions but differing atmospheric forcings: ECMWF's ML-based AIFS versus the physics-based ACCESS-G3 NWP model. Atmospheric forcing quality is assessed against ERA5 and ACCESS-G3 analyses, while ocean forecast skill for SST, salinity, sea level, and currents is evaluated against GOSI9 reanalysis and observations. The central claim is that AIFS-driven forecasts exhibit comparable or enhanced predictive skill relative to ACCESS-G3-driven forecasts.
Significance. If the attribution holds, the work provides evidence that ML-based atmospheric models can serve as effective replacements for traditional NWP forcings in operational ocean forecasting, with potential gains in accuracy and efficiency. The design using shared ocean initial conditions and multi-variable evaluation against both reanalysis and observations is a strength that isolates the forcing impact at a high level.
major comments (2)
- [Abstract and Results (oceanic forecast skill evaluation)] The evaluation relies on at most 24 forecast cases initialized monthly over 2023-2024 only (as stated in the abstract). Ocean variables exhibit strong seasonal and interannual variability, and without multi-year baselines, cross-validation across periods, or statistical significance tests on skill deltas, it is not possible to rule out that any AIFS advantage is an artifact of the sampled period rather than a general property of the ML forcing. This directly affects the central attribution claim.
- [Methods and Results sections] The manuscript does not report full details on the error metrics, statistical tests applied to skill differences, or controls for potential model-specific biases and confounding factors in the NEMO/GOSI9 setup. This omission leaves the robustness of the 'comparable or enhanced' skill conclusion under-supported given the small sample.
minor comments (1)
- [Abstract] The abstract could explicitly state the exact number of forecasts performed and the precise initialization dates to clarify the sample size.
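The sample size behind the referee's comments can be made concrete with a short sketch. It assumes that "the first day of each month over the period 2023-2024" means every calendar month in both years was run. The abstract does not confirm that, so the count below is an upper bound rather than a reported figure. The function name `monthly_init_dates` is hypothetical and used only for illustration.

```python
from datetime import date


def monthly_init_dates(first_year: int, last_year: int) -> list[date]:
    """Return the first day of every month from first_year through last_year.

    Assumption: no month was skipped. The source only states that forecasts
    were initialized on the first day of each month over 2023-2024.
    """
    return [
        date(year, month, 1)
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]


init_dates = monthly_init_dates(2023, 2024)
print(len(init_dates))                # number of candidate forecast cases
print(init_dates[0], init_dates[-1])  # first and last initialization dates
```

Under that assumption the list holds 24 dates, which matches the "at most 24 forecast cases" cited in the first major comment.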
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. We have addressed each major comment point by point below, providing clarifications and indicating where revisions will be made to strengthen the paper. Our responses aim to enhance the robustness and transparency of the analysis without overstating the current results.
- Referee: [Abstract and Results (oceanic forecast skill evaluation)] The evaluation relies on at most 24 forecast cases initialized monthly over 2023-2024 only (as stated in the abstract). Ocean variables exhibit strong seasonal and interannual variability, and without multi-year baselines, cross-validation across periods, or statistical significance tests on skill deltas, it is not possible to rule out that any AIFS advantage is an artifact of the sampled period rather than a general property of the ML forcing. This directly affects the central attribution claim.
  Authors: We agree that the sample of 24 monthly-initialized forecasts over 2023-2024 is limited and does not capture full interannual variability, which is a genuine constraint on generalizability. This is a fair point regarding the central attribution. To mitigate this, we will add statistical significance tests on the skill differences (e.g., paired Wilcoxon signed-rank tests or bootstrap resampling with confidence intervals) in the revised Results section. We will also expand the Discussion to explicitly note this temporal limitation and recommend longer-term evaluations in future work. The experimental design, with identical ocean initial conditions, helps isolate the forcing impact, and the consistency of AIFS advantages across multiple variables and against both reanalysis and independent observations provides supporting evidence within the sampled period. We do not claim universality but demonstrate potential applicability.
  Revision: partial
- Referee: [Methods and Results sections] The manuscript does not report full details on the error metrics, statistical tests applied to skill differences, or controls for potential model-specific biases and confounding factors in the NEMO/GOSI9 setup. This omission leaves the robustness of the 'comparable or enhanced' skill conclusion under-supported given the small sample.
  Authors: We thank the referee for highlighting this omission, which we agree weakens the support for our conclusions. In the revised manuscript, we will expand the Methods section to provide: complete definitions and mathematical formulations of all error metrics (RMSE, bias, anomaly correlation coefficient, etc.); descriptions of statistical tests for skill differences (including those we will newly apply); and explicit details on experimental controls, such as the use of identical ocean initial conditions to isolate atmospheric forcing effects, along with any bias-handling procedures or configuration choices in the NEMO/GOSI9 model that address potential confounding factors. These additions will be cross-referenced in the Results to better substantiate the 'comparable or enhanced' skill findings.
  Revision: yes
- The two-year period (2023-2024) inherently limits our ability to perform multi-year baselines or cross-validation across independent periods without substantial additional data and computational resources.
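The metrics and the bootstrap test named in the responses above can be sketched in a few lines. This is an illustrative sketch and not the authors' code. The function names, the uncentered form of the anomaly correlation, the percentile bootstrap, and the numeric values in the usage lines are all assumptions made for the example. The paper may define its metrics differently (for instance with area weighting or a centered correlation).

```python
import math
import random


def bias(forecast, reference):
    """Mean error: average of (forecast - reference)."""
    return sum(f - r for f, r in zip(forecast, reference)) / len(forecast)


def rmse(forecast, reference):
    """Root-mean-square error between forecast and reference."""
    sq = sum((f - r) ** 2 for f, r in zip(forecast, reference))
    return math.sqrt(sq / len(forecast))


def anomaly_correlation(forecast, reference, climatology):
    """Uncentered anomaly correlation (assumed form, no area weighting).

    Raises ZeroDivisionError if either anomaly series is identically zero.
    """
    fa = [f - c for f, c in zip(forecast, climatology)]
    ra = [r - c for r, c in zip(reference, climatology)]
    num = sum(a * b for a, b in zip(fa, ra))
    den = math.sqrt(sum(a * a for a in fa) * sum(b * b for b in ra))
    return num / den


def paired_bootstrap_ci(score_a, score_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile confidence interval for mean(score_a - score_b).

    Forecast cases are resampled with replacement as pairs, so the pairing
    by initialization date (same ocean initial conditions) is preserved.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(score_a, score_b)]
    n = len(diffs)
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n for _ in range(n_resamples)
    )
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Synthetic per-case RMSE values, invented purely to exercise the function.
# They are NOT results from the paper.
rmse_a = [0.42, 0.39, 0.47, 0.40, 0.44, 0.41]
rmse_b = [0.45, 0.40, 0.46, 0.43, 0.47, 0.42]
low, high = paired_bootstrap_ci(rmse_a, rmse_b)
print(f"95% CI for mean RMSE difference (A - B): [{low:.3f}, {high:.3f}]")
```

An interval that excludes zero would suggest that the skill difference is unlikely to be a sampling artifact within the evaluated period. With only about 24 paired cases, and with cases drawn from just two years, such an interval still says nothing about interannual variability, which is the limitation acknowledged above.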
Circularity Check
No significant circularity; independent empirical comparison
full rationale
The paper performs a straightforward side-by-side evaluation of ocean forecast skill under two external atmospheric forcing datasets (ECMWF AIFS and BoM ACCESS-G3) using identical NEMO/GOSI9 initial conditions and the same ocean model configuration. Skill metrics are computed against independent references (GOSI9 reanalysis, in-situ observations, ERA5). No derivation chain, fitted parameters, self-citations, or ansatzes are invoked to support the central claim; the result is a direct data-driven comparison rather than a reduction of outputs to inputs by construction. The short 2023-2024 sample is a methodological limitation but does not constitute circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Dynamical ocean models like NEMO produce reliable forecasts when provided with accurate atmospheric forcing.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: The ocean forecasts forced with AIFS atmospheric data exhibit comparable or enhanced predictive skill compared to those forced with ACCESS-G3 data.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.