
arxiv: 2605.15470 · v1 · pith:JCS2PZMKnew · submitted 2026-05-14 · 💻 cs.LG · physics.ao-ph

Njord: A Probabilistic Graph Neural Network for Ensemble Ocean Forecasting

Pith reviewed 2026-05-19 14:39 UTC · model grok-4.3

classification 💻 cs.LG physics.ao-ph
keywords ocean forecasting · graph neural networks · probabilistic models · ensemble prediction · machine learning · uncertainty estimation · ocean dynamics

The pith

A probabilistic graph neural network for ocean forecasting achieves the lowest errors on a global benchmark while providing uncertainty estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Njord, a model that combines deep latent variables with graph neural networks to generate probabilistic ensemble forecasts for ocean dynamics in both global and regional settings. This approach allows sampling multiple forecasts in a single forward pass, unlike deterministic machine learning models that ignore the chaotic nature of ocean systems. To handle large irregular grids, the model uses K-means cluster meshes that adapt to sea surface geometry at 0.25 degree global and 2 km regional resolutions. On the OceanBench benchmark against real observations, Njord records the lowest average errors across upper-ocean variables, with the biggest gains in surface temperature prediction.

Core claim

Njord integrates a deep latent variable framework with a graph neural network architecture on K-means cluster meshes, enabling single-pass sampling of ensemble forecasts that outperform deterministic baselines on upper-ocean variables while supplying uncertainty estimates from the ensembles.
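The uncertainty estimates mentioned above are typically scored with ensemble metrics such as RMSE of the ensemble mean, the spread-skill ratio (SSR), and CRPS. The following is a minimal sketch of those metrics on synthetic data; the function name `ensemble_metrics`, the finite-ensemble correction factor, and the (non-fair) CRPS estimator are assumptions chosen for illustration, not necessarily the exact definitions used in the paper or in OceanBench.

```python
import numpy as np

def ensemble_metrics(ens, obs):
    """ens: (members, points) ensemble forecast; obs: (points,) verifying values."""
    m = ens.shape[0]
    mean = ens.mean(axis=0)
    # Skill: RMSE of the ensemble mean against the observations.
    rmse = np.sqrt(np.mean((mean - obs) ** 2))
    # Spread: square root of the mean unbiased ensemble variance.
    spread = np.sqrt(np.mean(ens.var(axis=0, ddof=1)))
    # Spread-skill ratio with a finite-ensemble correction (assumed convention).
    ssr = np.sqrt((m + 1) / m) * spread / rmse
    # Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'|.
    term1 = np.mean(np.abs(ens - obs), axis=0)
    term2 = np.mean(np.abs(ens[:, None, :] - ens[None, :, :]), axis=(0, 1))
    crps = np.mean(term1 - 0.5 * term2)
    return {"rmse": rmse, "spread": spread, "ssr": ssr, "crps": crps}

# Synthetic, perfectly calibrated case: members and observation are
# exchangeable draws around the same (hypothetical) signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=5000)
obs = signal + rng.normal(size=5000)
ens = signal + rng.normal(size=(20, 5000))
scores = ensemble_metrics(ens, obs)
print(scores)
```

In this calibrated toy case the SSR comes out close to 1; values well below 1 would indicate an under-dispersive ensemble, and values above 1 an over-dispersive one.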

What carries the argument

K-means cluster meshes adapted to irregular sea surface geometry, combined with a deep latent variable model that supports efficient probabilistic sampling within the graph neural network.
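To make the mesh idea concrete, here is a minimal sketch of placing mesh nodes by running K-means over sea points only. Everything in it is an assumption for illustration: the helper names `lonlat_to_xyz` and `kmeans_mesh_nodes`, the plain Lloyd iterations on unit-sphere coordinates, and the toy rectangular domain with a land strip. The paper's actual construction may differ.

```python
import numpy as np

def lonlat_to_xyz(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to points on the unit sphere."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=-1,
    )

def kmeans_mesh_nodes(sea_xyz, k, iters=20, seed=0):
    """Plain Lloyd's K-means on sea points; centers are re-projected to the sphere."""
    rng = np.random.default_rng(seed)
    centers = sea_xyz[rng.choice(len(sea_xyz), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every sea point to every center.
        d2 = ((sea_xyz[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = sea_xyz[labels == j]
            if len(members):
                c = members.mean(axis=0)
                centers[j] = c / np.linalg.norm(c)
    return centers, labels

# Toy domain: a 1-degree grid with a hypothetical land strip in the middle.
lon, lat = np.meshgrid(np.arange(40.0), np.arange(20.0))
sea_mask = ~((lon >= 15) & (lon < 25))
sea_xyz = lonlat_to_xyz(lon[sea_mask], lat[sea_mask])
centers, labels = kmeans_mesh_nodes(sea_xyz, k=16)
print(centers.shape, np.bincount(labels, minlength=16))
```

Because only sea points are clustered, node density follows the water, not the bounding box. Note that a centroid is not guaranteed to lie over water when a cluster wraps around land, which is one reason mask handling needs to be stated explicitly.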

Load-bearing premise

K-means cluster meshes adapt sufficiently well to irregular sea-surface geometry to allow accurate and efficient scaling of the graph neural network to global 0.25-degree and regional 2 km grids.

What would settle it

Demonstrating that a competing model produces lower average errors than Njord across upper-ocean variables on the OceanBench benchmark when validated against real-world observations would undermine the performance advantage.

Figures

Figures reproduced from arXiv: 2605.15470 by Daniel Holmberg, Erik Larsson, Fredrik Lindsten, Joel Oskarsson, Teemu Roos.

Figure 1
Figure 1. Figure 1: Njord. at global short-range (1–10 days) timescales. These models are, however, deterministic: they produce a single trajectory and are typically trained with mean squared error, which encourages predictions toward the conditional mean of the future state rather than capturing the full predictive distribution. Consequently, they tend to smooth over fine-scale variance and offer limited insight into the pro… view at source ↗
Figure 2
Figure 2. Figure 2: One-step prediction in the Njord model. Residuals are predicted at time [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Example of graph node placement in the Red Sea. 4.1 A graph adapted to ocean geometry Graph-based global weather forecasting models use icosahedral meshes [30, 9, 31] for constructing the spatial graph that the model operates over. These meshes are constructed by iteratively subdividing an icosahedron, with each subdivision quadrupling the number of nodes and edges [30]. As the size of the graph heavily … view at source ↗
Figure 4
Figure 4. Figure 4: RMSE for Sea Surface Temperature (SST), Sea Surface Height (SSH), Sea Surface Salin [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: SSR averaged over all global ocean variables. The Spread-Skill Ratio (SSR) in [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Global SST at a 10 d lead, initialized on 2024-01-30. Ground truth is GLO12 analysis. [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Arctic SIT at 10 d lead time, initialized 2024-01-30. Ground truth is GLO12 analysis. [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Global SST predictions evaluated on satellite measurements. To further evaluate SST forecasts outside of OceanBench, we compare the predicted potential temperature of the uppermost ocean layer against a global ocean bias-adjusted SST product [42], based on multi-sensor satellite observations [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗
Figure 10
Figure 10. Figure 10: RMSE for Temperature (T), Salinity (S), Zonal Current (U) at 47 m depth, as well as Sea [PITH_FULL_IMAGE:figures/full_fig_p009_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Baltic Sea SST at 10 d lead time, initialized 2024-03-05. Ground truth is NEMO analysis. [PITH_FULL_IMAGE:figures/full_fig_p009_11.png] view at source ↗
Figure 9
Figure 9. Figure 9: SSR averaged over Baltic Sea variables. Across variables, Njord-Baltic achieves RMSE values comparable to SeaCast while providing probabilistic forecasts. In this regional setting, GLO12 exhibits a relatively flat error curve, similar to a climatological baseline. Both Njord-Baltic and SeaCast clearly outperform persistence. Njord-Baltic matches SeaCast in deterministic accuracy while additionally provi… view at source ↗
Figure 12
Figure 12. Figure 12: One-step prediction in the Njord-Baltic model. Residuals are predicted at time [PITH_FULL_IMAGE:figures/full_fig_p018_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Global graphs used by Njord, with grid nodes in blue, encoding/decoding edges in black, [PITH_FULL_IMAGE:figures/full_fig_p019_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Regional graphs used by Njord, with grid nodes in blue, M2G and G2M edges in black, [PITH_FULL_IMAGE:figures/full_fig_p020_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Example of mesh node placement in the Gulf of California (latitude [PITH_FULL_IMAGE:figures/full_fig_p021_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Example of mesh node placement in the northern Red Sea and Suez Canal (latitude [PITH_FULL_IMAGE:figures/full_fig_p021_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Example of mesh node placement in the Bråviken bay and Östergötland Archipelago, on [PITH_FULL_IMAGE:figures/full_fig_p022_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Example of mesh node placement in the Turku Archipelago in south-western Finland. [PITH_FULL_IMAGE:figures/full_fig_p022_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: Ensemble mean CRPS scorecards. The heatmaps display the relative difference between [PITH_FULL_IMAGE:figures/full_fig_p026_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: The heatmaps display the relative difference in RMSE and CRPS between Njord trained [PITH_FULL_IMAGE:figures/full_fig_p027_20.png] view at source ↗
Figure 21
Figure 21. Figure 21: Spatial evaluation of SIC at a 30-day lead time. The panels compare the ground truth [PITH_FULL_IMAGE:figures/full_fig_p029_21.png] view at source ↗
Figure 22
Figure 22. Figure 22: Log-scaled scatter density heatmaps evaluating predicted versus observed SIC and SIT at [PITH_FULL_IMAGE:figures/full_fig_p030_22.png] view at source ↗
Figure 23
Figure 23. Figure 23: Ensemble mean CRPS scorecards. The heatmaps display the relative difference between [PITH_FULL_IMAGE:figures/full_fig_p030_23.png] view at source ↗
Figure 24
Figure 24. Figure 24: Global RMSE of SST by forecast lead time, where Njord has the lowest error compared to satellite measurements. The dataset merges multi-sensor satellite observations into a Level-3 global grid [PITH_FULL_IMAGE:figures/full_fig_p035_24.png] view at source ↗
Figure 25
Figure 25. Figure 25: Spatial distribution of normalized RMSE difference for SST between Njord ensemble [PITH_FULL_IMAGE:figures/full_fig_p035_25.png] view at source ↗
Figure 26
Figure 26. Figure 26: Surface variables: SSH, SIC, and SIT. Columns from left to right show RMSE, CRPS, [PITH_FULL_IMAGE:figures/full_fig_p036_26.png] view at source ↗
Figure 27
Figure 27. Figure 27: Temperature at six different depths. Columns from left to right show RMSE, CRPS, and [PITH_FULL_IMAGE:figures/full_fig_p037_27.png] view at source ↗
Figure 28
Figure 28. Figure 28: Salinity at six different depths. Columns from left to right show RMSE, CRPS, and SSR. [PITH_FULL_IMAGE:figures/full_fig_p038_28.png] view at source ↗
Figure 29
Figure 29. Figure 29: Zonal current at six different depths. Columns from left to right show RMSE, CRPS, and [PITH_FULL_IMAGE:figures/full_fig_p039_29.png] view at source ↗
Figure 30
Figure 30. Figure 30: Normalized RMSE difference for various variables and depth levels, comparing ensemble [PITH_FULL_IMAGE:figures/full_fig_p040_30.png] view at source ↗
Figure 31
Figure 31. Figure 31: Sea ice concentration at lead time 10 d, init 2024-12-24. [PITH_FULL_IMAGE:figures/full_fig_p041_31.png] view at source ↗
Figure 32
Figure 32. Figure 32: Sea ice thickness at lead time 10 d, init 2024-12-24. [PITH_FULL_IMAGE:figures/full_fig_p042_32.png] view at source ↗
Figure 33
Figure 33. Figure 33: Temperature at the surface, lead time 10 d, init 2024-12-24. [PITH_FULL_IMAGE:figures/full_fig_p042_33.png] view at source ↗
Figure 34
Figure 34. Figure 34: Salinity at the surface, lead time 10 d, init 2024-12-24. [PITH_FULL_IMAGE:figures/full_fig_p043_34.png] view at source ↗
Figure 35
Figure 35. Figure 35: Zonal current at the surface, lead time 10 d, init 2024-12-24. [PITH_FULL_IMAGE:figures/full_fig_p043_35.png] view at source ↗
Figure 36
Figure 36. Figure 36: Meridional current at the surface, lead time 10 d, init 2024-12-24. [PITH_FULL_IMAGE:figures/full_fig_p044_36.png] view at source ↗
Figure 37
Figure 37. Figure 37: Sea surface height at lead time 10 d, init 2024-12-24. [PITH_FULL_IMAGE:figures/full_fig_p044_37.png] view at source ↗
Figure 38
Figure 38. Figure 38: Surface variables: SLA, SIC and SIT. Reanalysis variants are shown dashed and analysis [PITH_FULL_IMAGE:figures/full_fig_p045_38.png] view at source ↗
Figure 39
Figure 39. Figure 39: Temperature at 1, 9, 28, 47 and 91 m depth. [PITH_FULL_IMAGE:figures/full_fig_p046_39.png] view at source ↗
Figure 40
Figure 40. Figure 40: Salinity at 1, 9, 28, 47 and 91 m depth. [PITH_FULL_IMAGE:figures/full_fig_p047_40.png] view at source ↗
Figure 41
Figure 41. Figure 41: Meridional current at 1, 9, 28, 47 and 91 m depth. [PITH_FULL_IMAGE:figures/full_fig_p048_41.png] view at source ↗
Figure 42
Figure 42. Figure 42: Sea ice concentration at lead time 10 d, init 2024-02-20. [PITH_FULL_IMAGE:figures/full_fig_p049_42.png] view at source ↗
Figure 43
Figure 43. Figure 43: Sea ice thickness at lead time 10 d, init 2024-02-20. [PITH_FULL_IMAGE:figures/full_fig_p050_43.png] view at source ↗
Figure 44
Figure 44. Figure 44: Temperature at the surface, lead time 10 d, init 2024-02-20. [PITH_FULL_IMAGE:figures/full_fig_p050_44.png] view at source ↗
Figure 45
Figure 45. Figure 45: Salinity at the surface, lead time 10 d, init 2024-02-20. [PITH_FULL_IMAGE:figures/full_fig_p051_45.png] view at source ↗
Figure 46
Figure 46. Figure 46: Zonal current at the surface, lead time 10 d, init 2024-02-20. [PITH_FULL_IMAGE:figures/full_fig_p051_46.png] view at source ↗
Figure 47
Figure 47. Figure 47: Meridional current at the surface, lead time 10 d, init 2024-02-20. [PITH_FULL_IMAGE:figures/full_fig_p052_47.png] view at source ↗
Figure 48
Figure 48. Figure 48: Sea level anomaly at lead time 10 d, init 2024-02-20. [PITH_FULL_IMAGE:figures/full_fig_p052_48.png] view at source ↗
Original abstract

Ocean dynamics are inherently chaotic, yet existing machine learning ocean models produce only deterministic forecasts. We introduce Njord, a probabilistic data-driven model for ocean forecasting, applicable to both global and regional domains. Njord combines a deep latent variable framework with a graph neural network architecture, enabling sampling each forecast step in a single forward pass. We apply Njord globally at 0.25° resolution and regionally to the Baltic Sea at 2 km resolution. To scale to these large ocean grids we introduce K-means cluster meshes that adapt to irregular sea surface geometry. Experiments demonstrate strong performance on both domains compared to deterministic machine learning baselines, while also providing uncertainty estimates from the sampled ensemble forecasts. On the global OceanBench benchmark, Njord achieves the lowest errors on average across upper-ocean variables when evaluated against real-world observations, with the largest improvements in surface temperature prediction.
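The phrase "sampling each forecast step in a single forward pass" can be illustrated with a toy latent-variable rollout. This is only a sketch under strong assumptions: the state is a small vector instead of a graph, the update is a hypothetical one-layer residual map (`step`), and the latent variable is drawn from a standard normal, whereas the paper's model presumably uses learned, state-conditioned distributions inside a graph neural network.

```python
import numpy as np

def step(x, z, W_x, W_z):
    """Residual update: next state = current state + predicted residual."""
    return x + np.tanh(x @ W_x + z @ W_z)

def sample_ensemble(x0, n_members, n_steps, W_x, W_z, rng):
    """Roll out n_members trajectories, drawing one latent sample per member and step."""
    latent_dim = W_z.shape[0]
    x = np.repeat(x0[None, :], n_members, axis=0)
    traj = [x]
    for _ in range(n_steps):
        # One latent draw and one forward pass per step: no iterative sampler.
        z = rng.normal(size=(n_members, latent_dim))
        x = step(x, z, W_x, W_z)
        traj.append(x)
    return np.stack(traj, axis=1)  # (members, steps + 1, state_dim)

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)                  # hypothetical initial state
W_x = 0.1 * rng.normal(size=(4, 4))      # hypothetical "learned" weights
W_z = 0.5 * rng.normal(size=(3, 4))
ens = sample_ensemble(x0, n_members=8, n_steps=5, W_x=W_x, W_z=W_z, rng=rng)
print(ens.shape, ens[:, -1].std(axis=0))
```

All members start from the same state and diverge only through the latent draws, so the spread across members at a given lead time serves as the uncertainty estimate.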

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces Njord, a probabilistic graph neural network for ensemble ocean forecasting that combines a deep latent variable model with GNN message passing to generate sampled forecasts in a single forward pass. It scales the approach to a global 0.25° grid and a regional 2 km Baltic Sea grid by introducing K-means cluster meshes that adapt to irregular sea-surface geometry. The central empirical claim is that Njord attains the lowest average errors across upper-ocean variables on the OceanBench benchmark when evaluated against real-world observations, with the largest gains in surface temperature, while also supplying uncertainty estimates from the ensemble.

Significance. If the performance and scaling claims are substantiated, the work would be significant for demonstrating that probabilistic GNNs can deliver calibrated ensemble forecasts for chaotic ocean dynamics at both global and high-resolution regional scales. The provision of uncertainty estimates alongside competitive point forecasts against real observations addresses a practical gap in existing deterministic ML ocean models. The adaptive mesh construction, if shown to respect physical boundaries, could serve as a reusable technique for applying graph-based methods to masked geophysical domains.

major comments (1)
  1. [Abstract] Abstract and mesh-construction section: the claim that K-means cluster meshes 'adapt to irregular sea surface geometry' is load-bearing for the scaling argument to 0.25° global and 2 km regional grids, yet no description is given of how land-sea masks are enforced, whether invalid cross-land edges are removed, or what mesh-quality metrics (e.g., connectivity, boundary fidelity) are satisfied. Standard K-means on latitude-longitude coordinates does not inherently respect masks; without explicit post-processing or boundary-aware clustering, message passing can produce unphysical connections, undermining the applicability claim.
minor comments (1)
  1. [Abstract] Abstract: quantitative error values, baseline definitions, and training details are omitted even though the headline performance claim is stated; adding at least the key RMSE or MAE numbers and the names of the deterministic ML baselines would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed review. The concern about insufficient description of the K-means mesh construction and mask handling is well-taken. We address this point below and will revise the manuscript to provide the requested technical details.

read point-by-point responses
  1. Referee: [Abstract] Abstract and mesh-construction section: the claim that K-means cluster meshes 'adapt to irregular sea surface geometry' is load-bearing for the scaling argument to 0.25° global and 2 km regional grids, yet no description is given of how land-sea masks are enforced, whether invalid cross-land edges are removed, or what mesh-quality metrics (e.g., connectivity, boundary fidelity) are satisfied. Standard K-means on latitude-longitude coordinates does not inherently respect masks; without explicit post-processing or boundary-aware clustering, message passing can produce unphysical connections, undermining the applicability claim.

    Authors: We agree that the manuscript currently provides insufficient detail on how the K-means meshes enforce land-sea boundaries. In the revised version we will expand the mesh-construction section with the following additions: (i) clustering is performed exclusively on sea-grid points identified by the land-sea mask; (ii) after clustering, any graph edges connecting nodes separated by land are explicitly removed by a post-processing step that checks line-of-sight connectivity within the masked domain; (iii) we will report quantitative mesh-quality metrics including average node degree, fraction of boundary nodes, and verification that no cross-land edges remain. These clarifications will substantiate the adaptation claim and rule out unphysical message passing. We believe the revised description will fully address the referee’s concern.

    revision: yes
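The post-processing step described in (ii) could look roughly like the sketch below, which drops any edge whose straight segment passes over a land cell and then reports average node degree as in (iii). The function names `crosses_land` and `filter_edges`, the planar grid-index geometry, and the fixed number of samples per edge are assumptions for illustration; a spherical implementation would sample along great circles instead.

```python
import numpy as np

def crosses_land(p, q, sea_mask, samples=50):
    """p, q: (row, col) positions in grid-index units; sea_mask: True where sea."""
    ts = np.linspace(0.0, 1.0, samples)
    rows = np.rint(p[0] + ts * (q[0] - p[0])).astype(int)
    cols = np.rint(p[1] + ts * (q[1] - p[1])).astype(int)
    # The edge is invalid if any sampled cell along the segment is land.
    return not sea_mask[rows, cols].all()

def filter_edges(nodes, edges, sea_mask):
    """Keep only edges with an unobstructed line of sight over sea cells."""
    return [(i, j) for i, j in edges
            if not crosses_land(nodes[i], nodes[j], sea_mask)]

# Toy mask: a 10x10 sea with a vertical land barrier in column 5.
sea_mask = np.ones((10, 10), dtype=bool)
sea_mask[:, 5] = False
nodes = np.array([[2.0, 1.0], [2.0, 3.0], [2.0, 8.0], [7.0, 8.0]])
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]

kept = filter_edges(nodes, edges, sea_mask)
avg_degree = 2 * len(kept) / len(nodes)
print(kept, avg_degree)
```

Here the two edges that cross the barrier are removed and the two that stay within one basin are kept. The sampling density must be high enough relative to edge length that no land cell is skipped, which is worth stating alongside any reported "no cross-land edges" verification.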

Circularity Check

0 steps flagged

No circularity; derivation and claims are self-contained with external validation

full rationale

The paper presents Njord as a novel probabilistic latent-variable GNN for ensemble ocean forecasting, with K-means cluster meshes introduced to handle irregular sea-surface geometry at global 0.25° and regional 2 km scales. The central performance claim rests on evaluation against real-world observations on the public OceanBench benchmark, which is independent of the model's fitted parameters or internal definitions. No equations, predictions, or uniqueness arguments in the abstract or described content reduce by construction to inputs, self-citations, or ansatzes; the architecture and mesh adaptation are positioned as original contributions whose validity is tested externally rather than assumed via prior self-referential results.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The work rests on standard neural-network training assumptions and the domain premise that graph representations plus clustering suffice for ocean geometry; no new physical axioms or invented entities are introduced.

free parameters (1)
  • Neural network hyperparameters (depth, width, learning rate, latent dimension)
    Chosen or tuned during training; typical for any deep learning model and not derived from first principles.
axioms (1)
  • domain assumption Ocean dynamics on irregular domains can be faithfully represented by graph neural networks on K-means-derived meshes
    Invoked to justify scaling to global and regional grids; stated in the abstract description of the architecture.

pith-pipeline@v0.9.0 · 5684 in / 1251 out tokens · 73474 ms · 2026-05-19T14:39:20.562050+00:00 · methodology



Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 2 internal anchors

  1. [1]

    From observation to information and users: The Copernicus Marine Service perspective. Frontiers in Marine Science, 6:234, 2019

    Pierre Yves Le Traon, Antonio Reppucci, Enrique Alvarez Fanjul, Lotfi Aouf, Arno Behrens, Maria Belmonte, Abderrahim Bentamy, Laurent Bertino, Vittorio Ernesto Brando, Matilde Brandt Kreiner, et al. From observation to information and users: The Copernicus Marine Service perspective. Frontiers in Marine Science, 6:234, 2019

  2. [2]

    Evolution of the Copernicus Marine Service global ocean analysis and forecasting high-resolution system: Potential benefit for a wide range of users

    Jean-Michel Lellouche, Eric Greiner, Giovanni Ruggiero, Romain Bourdallé-Badie, Charles-Emmanuel Testut, Olivier Le Galloudec, Mounir Benkiran, and Gilles Garric. Evolution of the Copernicus Marine Service global ocean analysis and forecasting high-resolution system: Potential benefit for a wide range of users. In EuroGOOS International Conference, volume...

  3. [3]

    Nemo-Nordic 2.0: Operational marine forecast model for the Baltic Sea. Geoscientific Model Development, 14(9):5731–5749, 2021

    Tuomas Kärnä, Patrik Ljungemyr, Saeed Falahat, Ida Ringgaard, Lars Axell, Vasily Korabel, Jens Murawski, Ilja Maljutenko, Anja Lindenthal, Simon Jandt-Scheelke, et al. Nemo-Nordic 2.0: Operational marine forecast model for the Baltic Sea. Geoscientific Model Development, 14(9):5731–5749, 2021

  4. [4]

    GLONET: Mercator’s end-to-end neural global ocean forecasting system. Journal of Geophysical Research: Machine Learning and Computation, 2(3), 2025

    Anass El Aouni, Quentin Gaudel, Charly Regnier, Simon Van Gennip, Olivier Le Galloudec, Marie Drevillon, Yann Drillet, and Jean-Michel Lellouche. GLONET: Mercator’s end-to-end neural global ocean forecasting system. Journal of Geophysical Research: Machine Learning and Computation, 2(3), 2025

  5. [5]

    Accurate Mediterranean Sea forecasting via graph-based deep learning. Scientific Reports, 15(45051), 2025

    Daniel Holmberg, Emanuela Clementi, Italo Epicoco, and Teemu Roos. Accurate Mediterranean Sea forecasting via graph-based deep learning. Scientific Reports, 15(45051), 2025

  6. [6]

    Forecasting the eddying ocean with a deep neural network

    Yingzhe Cui, Ruohan Wu, Xiang Zhang, Ziqi Zhu, Bo Liu, Jun Shi, Junshi Chen, Hailong Liu, Shenghui Zhou, Liang Su, et al. Forecasting the eddying ocean with a deep neural network. Nature Communications, 16(1):2268, 2025

  7. [7]

    XiHe: A data-driven model for global ocean eddy-resolving forecasting. arXiv preprint arXiv:2402.02995, 2024

    Xiang Wang, Renzhi Wang, Ningzi Hu, Pinqiang Wang, Peng Huo, Guihua Wang, Huizan Wang, Senzhang Wang, Junxing Zhu, Jianbo Xu, et al. XiHe: A data-driven model for global ocean eddy-resolving forecasting. arXiv preprint arXiv:2402.02995, 2024

  8. [8]

    FuXi-Ocean: A global ocean forecasting system with sub-daily resolution

    Qiusheng Huang, Yuan Niu, Xiaohui Zhong, Anboyu Guo, Lei Chen, Dianjun Zhang, Xuefeng Zhang, and Hao Li. FuXi-Ocean: A global ocean forecasting system with sub-daily resolution. In Advances in Neural Information Processing Systems, volume 38, 2025

  9. [9]

    Probabilistic weather forecasting with hierarchical graph neural networks

    Joel Oskarsson, Tomas Landelius, Marc P Deisenroth, and Fredrik Lindsten. Probabilistic weather forecasting with hierarchical graph neural networks. In Advances in Neural Information Processing Systems, volume 37, 2024

  10. [10]

    Probabilistic weather forecasting with machine learning. Nature, 637(8044):84–90, 2025

    Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R Andersson, Andrew El-Kadi, Dominic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, et al. Probabilistic weather forecasting with machine learning. Nature, 637(8044):84–90, 2025

  11. [11]

    OceanNet: A principled neural operator-based digital twin for regional oceans. Scientific Reports, 14(21181), 2024

    Ashesh Chattopadhyay, Michael Gray, Tianning Wu, Anna B Lowe, and Ruoying He. OceanNet: A principled neural operator-based digital twin for regional oceans. Scientific Reports, 14(21181), 2024

  12. [12]

    OceanBench: A benchmark for data-driven global ocean forecasting systems

    Anass El Aouni, Quentin Gaudel, Juan Emmanuel Johnson, Regnier Charly, Julien Le Sommer, Ronan Fablet, Marie Drevillon, Yann Drillet, Pierre Yves Le Traon, et al. OceanBench: A benchmark for data-driven global ocean forecasting systems. In Neural Information Processing Systems, volume 39, 2025

  13. [13]

    Seasonal Arctic sea ice forecasting with probabilistic deep learning. Nature Communications, 12(1):5124, 2021

    Tom R Andersson, J Scott Hosking, María Pérez-Ortiz, Brooks Paige, Andrew Elliott, Chris Russell, Stephen Law, Daniel C Jones, Jeremy Wilkinson, Tony Phillips, et al. Seasonal Arctic sea ice forecasting with probabilistic deep learning. Nature Communications, 12(1):5124, 2021

  14. [14]

    Coupled ocean-atmosphere dynamics in a machine learning Earth system model. arXiv preprint arXiv:2406.08632, 2024

    Chenggong Wang, Michael S Pritchard, Noah Brenowitz, Yair Cohen, Boris Bonev, Thorsten Kurth, Dale Durran, and Jaideep Pathak. Coupled ocean-atmosphere dynamics in a machine learning Earth system model. arXiv preprint arXiv:2406.08632, 2024

  15. [15]

    Samudra: An AI global ocean emulator for climate. Geophysical Research Letters, 52(10), 2025

    Surya Dheeshjith, Adam Subel, Alistair Adcroft, Julius Busecke, Carlos Fernandez-Granda, Shubham Gupta, and Laure Zanna. Samudra: An AI global ocean emulator for climate. Geophysical Research Letters, 52(10), 2025

  16. [16]

    Data-driven ensemble prediction of the global ocean. arXiv preprint arXiv:2603.19591, 2026

    Qiusheng Huang, Xiaohui Zhong, Anboyu Guo, Ziyi Peng, Lei Chen, and Hao Li. Data-driven ensemble prediction of the global ocean. arXiv preprint arXiv:2603.19591, 2026

  17. [17]

    Kilometer-scale convection-allowing model emulation using generative diffusion modeling. Science Advances, 12(5):eadv0423, 2026

    Jaideep Pathak, Yair Cohen, Piyush Garg, Peter Harrington, Noah Brenowitz, Dale Durran, Morteza Mardani, Arash Vahdat, Shaoming Xu, Karthik Kashinath, et al. Kilometer-scale convection-allowing model emulation using generative diffusion modeling. Science Advances, 12(5):eadv0423, 2026

  18. [18]

    Diffusion-LAM: Probabilistic limited area weather forecasting with diffusion

    Erik Larsson, Joel Oskarsson, Tomas Landelius, and Fredrik Lindsten. Diffusion-LAM: Probabilistic limited area weather forecasting with diffusion. In ICLR 2025 Workshop on Tackling Climate Change with Machine Learning, 2025

  19. [19]

    AIFS-CRPS: Ensemble forecasting using a model trained with a loss function based on the continuous ranked probability score. npj Artificial Intelligence, 2(1):18, 2026

    Simon Lang, Mihai Alexe, Mariana CA Clare, Christopher Roberts, Rilwan Adewoyin, Zied Ben Bouallègue, Matthew Chantry, Jesper Dramsch, Peter D Dueben, Sara Hahner, et al. AIFS-CRPS: Ensemble forecasting using a model trained with a loss function based on the continuous ranked probability score. npj Artificial Intelligence, 2(1):18, 2026

  20. [20]

    Probabilistic forecasting with generative networks via scoring rule minimization. Journal of Machine Learning Research, 25(45):1–64, 2024

    Lorenzo Pacchiardi, Rilwan A Adewoyin, Peter Dueben, and Ritabrata Dutta. Probabilistic forecasting with generative networks via scoring rule minimization. Journal of Machine Learning Research, 25(45):1–64, 2024

  21. [21]

    FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale. arXiv preprint arXiv:2507.12144, 2025

    Boris Bonev, Thorsten Kurth, Ankur Mahesh, Mauro Bisson, Jean Kossaifi, Karthik Kashinath, Anima Anandkumar, William D Collins, Michael S Pritchard, and Alexander Keller. FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale. arXiv preprint arXiv:2507.12144, 2025

  22. [22]

    Skillful joint probabilistic weather forecasting from marginals. arXiv preprint arXiv:2506.10772, 2025

    Ferran Alet, Ilan Price, Andrew El-Kadi, Dominic Masters, Stratis Markou, Tom R Andersson, Jacklynn Stott, Remi Lam, Matthew Willson, Alvaro Sanchez-Gonzalez, et al. Skillful joint probabilistic weather forecasting from marginals. arXiv preprint arXiv:2506.10772, 2025

  23. [23]

    CRPS-LAM: Regional ensemble weather forecasting from matching marginals

    Erik Larsson, Joel Oskarsson, Tomas Landelius, and Fredrik Lindsten. CRPS-LAM: Regional ensemble weather forecasting from matching marginals. In EurIPS 2025 Workshop on AI for Climate and Conservation, 2025

  24. [24]

    High-resolution probabilistic data-driven weather modeling with a stretched-grid. arXiv preprint arXiv:2511.23043, 2025

    Even Marius Nordhagen, Håvard Homleid Haugen, Aram Farhad Shafiq Salihi, Magnus Sikora Ingstad, Thomas Nils Nipen, Ivar Ambjørn Seierstad, Inger-Lise Frogner, Mariana Clare, Simon Lang, Matthew Chantry, et al. High-resolution probabilistic data-driven weather modeling with a stretched-grid. arXiv preprint arXiv:2511.23043, 2025

  25. [25]

    Learning structured output representation using deep conditional generative models

    Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 28, 2015

  26. [26]

    AERIS: Argonne Earth systems model for reliable and skillful predictions

    Väinö Hatanpää, Eugene Ku, Jason Stock, Murali Emani, Sam Foreman, Chunyong Jung, Sandeep Madireddy, Tung Nguyen, Varuni Sastry, Ray AO Sinurat, et al. AERIS: Argonne Earth systems model for reliable and skillful predictions. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 72–85, 2025

  27. [27]

    Towards diffusion models for large-scale sea-ice modelling

    Tobias Sebastian Finn, Charlotte Durand, Alban Farchi, Marc Bocquet, and Julien Brajard. Towards diffusion models for large-scale sea-ice modelling. In ICML 2024 Workshop on Machine Learning for Earth System Modeling, 2024

  28. [28]

    SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. Journal of Advances in Modeling Earth Systems, 15(2), 2023

    Yuan Hu, Lei Chen, Zhibin Wang, and Hao Li. SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. Journal of Advances in Modeling Earth Systems, 15(2), 2023

  29. [29]

    Interaction networks for learning about objects, relations and physics

    Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, volume 29, 2016

  30. [30]

    Learning skillful medium-range global weather forecasting. Science, 382(6677):1416–1421, 2023

    Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, et al. Learning skillful medium-range global weather forecasting. Science, 382(6677):1416–1421, 2023

  31. [31]

    AIFS–ECMWF’s data-driven forecasting system. arXiv preprint arXiv:2406.01465, 2024

    Simon Lang, Mihai Alexe, Matthew Chantry, Jesper Dramsch, Florian Pinault, Baudouin Raoult, Mariana CA Clare, Christian Lessig, Michael Maier-Gerber, Linus Magnusson, et al. AIFS–ECMWF’s data-driven forecasting system. arXiv preprint arXiv:2406.01465, 2024

  32. [32]

    Convolutional conditional neural processes

    Jonathan Gordon, Wessel P Bruinsma, Andrew YK Foong, James Requeima, Yann Dubois, and Richard E Turner. Convolutional conditional neural processes. In International Conference on Learning Representations, 2020

  33. [33]

    A foundation model for the Earth system. Nature, 641(8065):1180–1187, 2025

    Cristian Bodnar, Wessel P Bruinsma, Ana Lucic, Megan Stanley, Anna Allen, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A Weyn, Haiyu Dong, et al. A foundation model for the Earth system. Nature, 641(8065):1180–1187, 2025

  34. [34]

    Andreas Griewank and Andrea Walther. Algorithm 799: Revolve: An implementation of checkpointing for the reverse or adjoint mode of computational differentiation. ACM Transactions on Mathematical Software, 26(1):19–45, 2000

  35. [35]

    Training Deep Nets with Sublinear Memory Cost

    Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016

  36. [36]

    Regional ocean forecasting with hierarchical graph neural networks

    Daniel Holmberg, Emanuela Clementi, and Teemu Roos. Regional ocean forecasting with hierarchical graph neural networks. In NeurIPS 2024 Workshop on Tackling Climate Change with Machine Learning, 2024

  37. [37]

    Building machine learning limited area models: Kilometer-scale weather forecasting in realistic settings. arXiv preprint arXiv:2504.09340, 2025

    Simon Adamov, Joel Oskarsson, Leif Denby, Tomas Landelius, Kasper Hintz, Simon Christiansen, Irene Schicker, Carlos Osuna, Fredrik Lindsten, Oliver Fuhrer, et al. Building machine learning limited area models: Kilometer-scale weather forecasting in realistic settings. arXiv preprint arXiv:2504.09340, 2025

  38. [38]

    The Copernicus global 1/12° oceanic and sea ice GLORYS12 reanalysis. Frontiers in Earth Science, 9:698876, 2021

    Jean-Michel Lellouche, Eric Greiner, Romain Bourdallé-Badie, Gilles Garric, Angélique Melet, Marie Drévillon, Clément Bricaud, Mathieu Hamon, Olivier Le Galloudec, Charly Regnier, et al. The Copernicus global 1/12° oceanic and sea ice GLORYS12 reanalysis. Frontiers in Earth Science, 9:698876, 2021

  39. [39]

    NEMO ocean engine

    Gurvan Madec and the NEMO team. NEMO ocean engine. Technical report, Institut Pierre-Simon Laplace, 2016

  40. [40]

    The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146(730):1999–2049, 2020

    Hans Hersbach, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz-Sabater, Julien Nicolas, Carole Peubey, Raluca Radu, Dinand Schepers, et al. The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146(730):1999–2049, 2020

  41. [41]

    Integrated forecasting system, 2024. https://www.ecmwf.int/en/forecasts/documentation-and-support/changes-ecmwf-model

    ECMWF. Integrated forecasting system, 2024. https://www.ecmwf.int/en/forecasts/documentation-and-support/changes-ecmwf-model

  42. [42]

    ODYSSEA global ocean - sea surface temperature multi-sensor L3 observations

    E.U. Copernicus Marine Service Information. ODYSSEA global ocean - sea surface temperature multi-sensor L3 observations, 2026. URL https://doi.org/10.48670/moi-00164

  43. [43]

    Graph-based neural weather prediction for limited area modeling

    Joel Oskarsson, Tomas Landelius, and Fredrik Lindsten. Graph-based neural weather prediction for limited area modeling. In NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning, 2023

  44. [44]

    C. A. T. Ferro. Fair scores for ensemble forecasts. Quarterly Journal of the Royal Meteorological Society, 140(683):1917–1923, 2014

  45. [45]

    Decoupled weight decay regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019

  46. [46]

    Why should ensemble spread match the RMSE of the ensemble mean?

    V. Fortin, M. Abaza, F. Anctil, and R. Turcotte. Why should ensemble spread match the RMSE of the ensemble mean? Journal of Hydrometeorology, 15(4):1708–1713, 2014

A Model Details

A.1 Graph-EFM details

We adopt the probabilistic framework of Graph-EFM [9], a latent variable model in which stochasticity is introduced through latent variables Z defined o...