An efficient multi-GPU implementation for the Discontinuous Galerkin ocean model SLIM

Ange P. Ishimwe; Colin Scherpereel; Emmanuel Hanert; Jonathan Lambrechts; Miguel De Le Court; Vincent Legat

arxiv: 2605.16082 · v1 · pith:WWZZXKGKnew · submitted 2026-05-15 · 💻 cs.DC · physics.ao-ph · physics.comp-ph · physics.flu-dyn

Pith reviewed 2026-05-19 18:37 UTC · model grok-4.3

classification 💻 cs.DC · physics.ao-ph · physics.comp-ph · physics.flu-dyn

keywords Discontinuous Galerkin · GPU computing · ocean modeling · SLIM · multi-GPU · unstructured mesh · coastal simulation · high-performance computing

The pith

A GPU-optimized Discontinuous Galerkin ocean model achieves the speed of roughly 1500 CPU cores on a single card and scales to 1024 GPUs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a complete 3D implementation of the SLIM unstructured-mesh ocean model that uses Discontinuous Galerkin finite elements and is rewritten for both single-GPU and multi-GPU hardware. It reports that one high-end GPU delivers performance comparable to about 1500 CPU cores and that swapping a 128-core CPU node for a four-GPU node produces roughly fifty times the speed. The code keeps high weak-scaling efficiency out to 1024 GPUs and is tested on a Great Barrier Reef domain at five times the spatial resolution of earlier models while running one hundred times faster than real time. This work targets the long-standing barrier that high computational cost has placed on detailed coastal simulations with DG methods.

Core claim

Mapping the DG-FE ocean equations to GPU kernels through optimized memory layouts, element-wise parallelization, and matrix-free treatment of vertical processes produces an implementation that runs efficiently on both NVIDIA and AMD GPUs, maintains weak scaling to 1024 devices, and supports real-world coastal runs at previously unattainable resolution.
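
To make the memory-layout idea concrete, here is a minimal sketch of a Structure-of-Arrays (SoA) index for one cell, using the sizes quoted in the figure captions (128 columns per cell, 6 values per layer for a scalar field). The function names `soa_index` and `aos_index`, the ordering of the three indices, and the constants are illustrative assumptions, not the paper's actual layout. The only point is that letting the column index vary fastest makes 128 consecutive threads touch 128 consecutive memory slots.

```python
# Hypothetical SoA indexing sketch; the index ordering is an assumption,
# not taken from the SLIM source code.
COLUMNS_PER_CELL = 128   # one thread per column, as in a 128-thread block
VALUES_PER_LAYER = 6     # nodal values of a scalar field in one layer


def soa_index(layer: int, node: int, column: int) -> int:
    """Flat offset with the column index varying fastest."""
    if not 0 <= node < VALUES_PER_LAYER:
        raise ValueError("node out of range")
    if not 0 <= column < COLUMNS_PER_CELL:
        raise ValueError("column out of range")
    return (layer * VALUES_PER_LAYER + node) * COLUMNS_PER_CELL + column


def aos_index(layer: int, node: int, column: int, n_layers: int) -> int:
    """Array-of-Structures counterpart: each column stores its own values contiguously."""
    return (column * n_layers + layer) * VALUES_PER_LAYER + node


# Offsets read by threads 0..3 for the same (layer, node) pair.
soa_offsets = [soa_index(1, 2, c) for c in range(4)]
aos_offsets = [aos_index(1, 2, c, n_layers=2) for c in range(4)]
print(soa_offsets)  # neighbouring threads read neighbouring slots
print(aos_offsets)  # neighbouring threads are n_layers * VALUES_PER_LAYER apart
```

In the SoA variant neighbouring threads read neighbouring offsets, which is the access pattern GPUs can coalesce into few memory transactions; in the AoS variant the same reads are strided.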

What carries the argument

GPU kernels for Discontinuous Galerkin finite elements that use matrix-free vertical solvers and distributed multi-GPU communication.

If this is right

A four-GPU node can replace a 128-core CPU node and deliver about fifty times higher throughput for the same coastal model.
Spatial resolution five times finer than the most accurate existing model becomes feasible while still running about one hundred times faster than real time.
Weak scaling that holds to 1024 GPUs opens the door to ensemble forecasts or basin-scale high-resolution studies.
The same kernel strategies apply to both NVIDIA and AMD architectures, reducing dependence on a single vendor.
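
The single-GPU and node-level figures can be cross-checked with simple arithmetic. The sketch below assumes, purely for illustration, that throughput adds linearly across the GPUs of a node and that CPU throughput is proportional to core count; neither assumption is stated in the paper, and the function name `node_speedup` is hypothetical.

```python
# Back-of-the-envelope check; ideal intra-node scaling is an assumption.
CORES_PER_GPU_EQUIVALENT = 1500   # "approximately 1500 CPU cores" per A100 (abstract)
GPUS_PER_NODE = 4                 # 4xA100 GPU node (abstract)
CPU_CORES_PER_NODE = 128          # 128-core CPU node (abstract)


def node_speedup(cores_per_gpu: float, gpus: int, cpu_cores: int) -> float:
    """Implied GPU-node over CPU-node speedup under ideal intra-node scaling."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return cores_per_gpu * gpus / cpu_cores


implied = node_speedup(CORES_PER_GPU_EQUIVALENT, GPUS_PER_NODE, CPU_CORES_PER_NODE)
print(f"implied node speedup: {implied:.1f}x")
```

Under these assumptions the implied value is about 47x, close to the reported "around 50x", so the two claims are mutually consistent at the precision quoted.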

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar GPU mappings could be applied to other DG-based fluid models in atmosphere or ice-sheet science.
Routine availability of such resolution may improve forecasts of localized coastal hazards such as reef bleaching or storm surge.
The approach could be combined with adaptive mesh refinement to focus compute only where needed.
Operational centers might adopt GPU clusters to run multiple high-resolution scenarios within the same wall-clock window.

Load-bearing premise

The Discontinuous Galerkin formulation and vertical processes can be turned into GPU kernels whose communication overhead stays low enough that the reported benchmarks remain representative of full production runs.

What would settle it

A timing measurement on the Great Barrier Reef case that shows the physical-to-numerical time ratio falling well below 100 because of unexpected data-transfer costs would falsify the performance claims.
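
The physical-to-numerical time ratio translates directly into wall-clock cost, which is what such a measurement would expose. A minimal sketch follows; the function name `wall_clock_hours` and the 61-day duration (a stand-in for a run of roughly two months) are illustrative choices, not values taken from the paper.

```python
def wall_clock_hours(simulated_days: float, time_ratio: float) -> float:
    """Wall-clock hours needed to simulate `simulated_days` at a given
    physical-to-numerical time ratio."""
    if time_ratio <= 0:
        raise ValueError("time_ratio must be positive")
    return simulated_days * 24.0 / time_ratio


# Illustrative 61-day run at the claimed ratio and at half that ratio.
hours_at_100 = wall_clock_hours(61, 100)
hours_at_50 = wall_clock_hours(61, 50)
print(f"ratio 100: {hours_at_100:.2f} h, ratio 50: {hours_at_50:.2f} h")
```

At a ratio of 100 such a run takes under 15 hours; if data-transfer costs halved the ratio, the same run would take more than a day.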

Figures

Figures reproduced from arXiv: 2605.16082 by Ange P. Ishimwe, Colin Scherpereel, Emmanuel Hanert, Jonathan Lambrechts, Miguel De Le Court, Vincent Legat.

**Figure 2.** (a) Schematic view of the five main components of a time step. The ordering shown corresponds …

**Figure 3.** Structure-of-Arrays (SoA) memory layout used in our implementation. Prisms within a column …

**Figure 4.** Example of a cell layout for the same mesh as Figure …

**Figure 5.** Example of a cell matrix with two layers and 128 columns for a scalar field (6 values per layer).

**Figure 6.** Subset of a cell processed by a block of 128 threads. This example shows a cell with 128 …

**Figure 7.** Kernels used for the computation of the horizontal terms of the momentum equation.

**Figure 8.** Kernels of the 2D external mode during a full step of the scheme. In this example, the external …

**Figure 9.** Kernels used for the computation of the vertical terms of the momentum equation during an …

**Figure 10.** Kernels used for the computation of the vertical terms of the momentum equation during an …

**Figure 11.** Timeline of both the Compute stream and Communications stream for the two main phases …

**Figure 12.** Timeline of both the Compute stream and Communications stream for 3 iterations of the 2D …

**Figure 13.** Performance of the 3D model on various hardware platforms with 32 layers and increasing …

**Figure 14.** Global memory bandwidth and floating-point throughput as a percent of peak over a complete …

**Figure 15.** Normalized time per iteration of the 3D scheme as a function of the number of layers for an …

**Figure 16.** Scaling of the 3D model with 32 layers on the MeluXina cluster with A100 GPUs.

**Figure 17.** Efficiency of the 3D model with 32 layers on the MeluXina cluster with A100 GPUs.

**Figure 18.** Scaling of the 3D model with 32 layers on the LUMI cluster equipped with MI250X GPUs.

**Figure 19.** Computational mesh for the Great Barrier Reef configuration. The horizontal resolution …

**Figure 20.** Surface vertical vorticity at increasing levels of zoom in the Great Barrier Reef domain. The …

**Figure 21.** Modelled sea surface temperature on October 31, 2024 at 10:00, after two months of sim…

Original abstract

Unstructured-mesh ocean models are increasingly used for coastal applications due to their ability to represent complex geometries and apply local grid refinement where needed. However, their broader use has been hindered by their high computational cost, particularly for models based on the Discontinuous Galerkin finite element (DG-FE) method, which involves significantly more degrees of freedom than traditional finite volume or continuous finite element approaches. The rapid emergence of GPU-based high-performance computing architectures now offers a pathway to address this limitation, as DG-FE formulations are inherently well suited to massively parallel, element-wise computations. Here, we present a full 3D DG-FE ocean model implementation optimized for both single- and multi-GPU systems, with support for both NVIDIA and AMD architectures. We detail the computational strategies employed to achieve high performance, including memory layout optimization, kernel-level parallelization, and matrix-free solvers for key vertical processes. Benchmark results demonstrate that a single HPC-grade GPU (e.g. NVIDIA A100) delivers performance equivalent to approximately 1500 CPU cores, while replacing a 128-core CPU node with a 4xA100 GPU node yields a speedup of around 50x. Weak-scaling efficiency is maintained up to 1024 GPUs. We further demonstrate the model's capabilities on a real-world application in the Great Barrier Reef, achieving a spatial resolution five times finer than the most accurate existing model while maintaining a physical-to-numerical time ratio of 100. These results highlight how GPU-accelerated DG-FE methods can dramatically advance the capabilities of unstructured-mesh ocean modeling, enabling ultra-high-resolution coastal simulations that were previously infeasible.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Solid engineering port of SLIM to multi-GPU with cross-vendor support and real-application benchmarks, but the 1024-GPU weak scaling rests on unshown halo-exchange costs.

The paper delivers a working multi-GPU version of the SLIM DG-FE ocean model, including memory-layout changes, kernel parallelization, matrix-free vertical solvers, and support for both NVIDIA and AMD hardware. The headline numbers are a single A100 matching roughly 1500 CPU cores, a 50x node-level speedup, weak scaling that holds to 1024 GPUs, and a Great Barrier Reef run on a mesh five times finer than prior models at a 100:1 physical-to-numerical time ratio. That is concrete engineering output that could let coastal groups run higher-resolution cases on existing GPU clusters without rewriting the whole code base from scratch. The real-world example shows the practical payoff rather than just synthetic kernels.

The soft spot is the scaling story. Unstructured coastal meshes produce irregular halo regions whose volume grows with local refinement, and the abstract gives no breakdown of compute versus communication time at the largest scale. If standard GPU-aware MPI is used without explicit overlap or compression, latency could start to dominate well before 1024 GPUs; the paper would be stronger with a short profile or table showing that overhead stays under 15 percent. Baselines also lack error bars and exact CPU node specs, so the 1500-core equivalence is plausible but not fully pinned down.

This is for people who already run or maintain unstructured ocean models and want GPU porting recipes they can adapt. A reader working on coastal hazard or ecosystem applications would get usable ideas and numbers. It is worth sending to peer review; the implementation is substantial enough that referees can check the missing profiles and confirm the claims hold in the full code and data.

Referee Report

1 major / 2 minor

Summary. This paper presents an efficient implementation of the Discontinuous Galerkin finite element (DG-FE) ocean model SLIM for multi-GPU systems, including optimizations for memory layout, kernel parallelization, and matrix-free solvers. It reports performance benchmarks showing a single NVIDIA A100 GPU equivalent to approximately 1500 CPU cores, a 50x speedup when replacing a 128-core CPU node with a 4xA100 GPU node, maintained weak-scaling efficiency up to 1024 GPUs, and a real-world application to the Great Barrier Reef achieving five times finer spatial resolution with a physical-to-numerical time ratio of 100.

Significance. Should the reported performance and scaling results prove robust, this work would be significant for the field of computational ocean modeling. It demonstrates how GPU acceleration can overcome the high computational cost of DG-FE methods on unstructured meshes, potentially enabling ultra-high-resolution simulations of complex coastal environments that were not feasible before.

major comments (1)

[Weak scaling results] The claim that weak-scaling efficiency is maintained up to 1024 GPUs is central to the multi-GPU contribution. However, the manuscript does not provide a breakdown of the fraction of wall time spent on inter-GPU communication versus computation at large scales. On unstructured meshes with refinement, such as the Great Barrier Reef application, halo exchange volumes can be irregular and substantial; without profiling data showing that communication remains a small percentage of total time, the scaling efficiency cannot be confidently assessed.
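
The breakdown requested here can be computed from two timings per run. The sketch below uses the usual definition of weak-scaling efficiency (step time on one GPU divided by step time on N GPUs at fixed work per GPU) together with a simple exposed-communication fraction. The `Run` record, its field names, and every timing value are made-up placeholders, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    gpus: int
    step_time_s: float      # wall time per time step
    exposed_comm_s: float   # part of step_time_s spent waiting on halo exchange


def weak_scaling_efficiency(baseline: Run, run: Run) -> float:
    """Baseline step time over the step time at larger scale (1.0 is ideal)."""
    return baseline.step_time_s / run.step_time_s


def comm_fraction(run: Run) -> float:
    """Share of a time step not hidden behind computation."""
    return run.exposed_comm_s / run.step_time_s


# Placeholder timings for illustration only.
runs = [Run(1, 1.00, 0.00), Run(64, 1.05, 0.04), Run(1024, 1.25, 0.20)]
for r in runs:
    eff = weak_scaling_efficiency(runs[0], r)
    print(f"{r.gpus:5d} GPUs  efficiency {eff:.3f}  comm {comm_fraction(r):.3f}")
```

Reporting both columns side by side shows whether a drop in efficiency is explained by communication or by something else, such as load imbalance.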

minor comments (2)

[Benchmark description] The performance equivalence of one A100 to 1500 CPU cores and the 50x node speedup lack accompanying error bars, details on the exact CPU configuration (e.g., core count per node, processor type), and verification that all overheads are accounted for in the 1500-core equivalence.
[Abstract] The abstract mentions support for both NVIDIA and AMD architectures but provides no specific performance numbers for AMD GPUs, which would help assess portability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for recognizing the potential significance of our multi-GPU DG-FE implementation. We address the major comment below and will revise the manuscript to incorporate additional profiling data.

Referee: [Weak scaling results] The claim that weak-scaling efficiency is maintained up to 1024 GPUs is central to the multi-GPU contribution. However, the manuscript does not provide a breakdown of the fraction of wall time spent on inter-GPU communication versus computation at large scales. On unstructured meshes with refinement, such as the Great Barrier Reef application, halo exchange volumes can be irregular and substantial; without profiling data showing that communication remains a small percentage of total time, the scaling efficiency cannot be confidently assessed.

Authors: We appreciate the referee's point that a communication-versus-computation breakdown would strengthen the scaling claims, particularly for the irregular halo exchanges that arise on locally refined unstructured meshes. The reported weak-scaling efficiencies are derived from full wall-clock timings that already include all inter-GPU communication; however, we agree that explicit profiling data would allow readers to assess the overhead more directly. In the revised manuscript we will add a new figure and accompanying text that report the measured fraction of wall time spent on halo exchanges (via CUDA-aware MPI or equivalent) at representative scale points up to 1024 GPUs. Where possible we will also include the corresponding breakdown for the Great Barrier Reef configuration.

revision: yes

Circularity Check

0 steps flagged

No circularity: performance claims are direct experimental measurements

This is an implementation and benchmarking paper whose central results consist of measured wall-clock times, speedups, and weak-scaling efficiencies obtained on actual GPU hardware. The reported equivalences (single A100 ≈ 1500 CPU cores, 50× node speedup, scaling to 1024 GPUs) and the Great Barrier Reef run metrics are direct outcomes of the described kernels and MPI/GPU-direct exchanges; they are not obtained by fitting parameters to a subset of the same data and then re-deriving the same quantities, nor by self-definitional equations or load-bearing self-citations. The derivation chain is therefore self-contained against external hardware benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is an engineering implementation of an established DG-FE ocean model; it introduces no new physical axioms or invented entities and relies on standard parallel computing practices and existing ocean-model assumptions.

axioms (1)

domain assumption: Standard assumptions of the Discontinuous Galerkin finite-element discretization for shallow-water and 3D ocean equations remain valid on GPU architectures.
The implementation inherits the mathematical formulation of the original SLIM model without re-deriving or altering its governing equations.


work page

[31] [31]

2004 , month = nov, journal =

A finite element model for the Venice Lagoon: Development, setup, calibration, and validation , author =. 2004 , month = nov, journal =. doi:10.1016/j.jmarsys.2004.05.009 , urldate =

work page doi:10.1016/j.jmarsys.2004.05.009 2004

[32] [32]

2025 , month = feb, number =

High-level, high-resolution ocean modeling at all scales with Oceananigans , author =. 2025 , month = feb, number =. doi:10.48550/arXiv.2502.14148 , urldate =. 2502.14148 , archiveprefix =

work page doi:10.48550/arxiv.2502.14148 2025

[33] [33]

and Danilov, S

Wang, Q. and Danilov, S. and Sidorenko, D. and Timmermann, R. and Wekerle, C. and Wang, X. and Jung, T. and Schr. The finite element sea ice-ocean model (FESOM) v1.4: Formulation of an ocean general circulation model , shorttitle =. 2014 , month = apr, journal =. doi:10.5194/gmd-7-663-2014 , urldate =

work page doi:10.5194/gmd-7-663-2014 2014

[34] [34]

2024 , month = nov, journal =

Accelerating LASG/IAP climate system ocean model version 3 for performance portability using Kokkos , author =. 2024 , month = nov, journal =. doi:10.1016/j.future.2024.06.029 , urldate =

work page doi:10.1016/j.future.2024.06.029 2024

[35] [35]

2008 , month = mar, journal =

A basin- to channel-scale unstructured grid hurricane storm surge model applied to southern Louisiana , author =. 2008 , month = mar, journal =. doi:10.1175/2007MWR1946.1 , urldate =

work page doi:10.1175/2007mwr1946.1 2008

[36] [36]

and Huang, Xiaomeng and Zhang, Yan and Fu, Haohuan and Oey, Lie-Yauw and Xu, Fanghua and Yang, G

Xu, S. and Huang, Xiaomeng and Zhang, Yan and Fu, Haohuan and Oey, Lie-Yauw and Xu, Fanghua and Yang, G. , year =. gpuPOM: A GPU-based Princeton Ocean Model , shorttitle =. Geoscientific Model Development Discussions , volume =

work page

[37] [37]

2016 , month = jun, journal =

Seamless cross-scale modeling with SCHISM , author =. 2016 , month = jun, journal =. doi:10.1016/j.ocemod.2016.05.002 , urldate =

work page doi:10.1016/j.ocemod.2016.05.002 2016

[38] [38]

2008 , month = aug, journal =

A multi-scale model of the hydrodynamics of the whole Great Barrier Reef , author =. 2008 , month = aug, journal =. doi:10.1016/j.ecss.2008.03.016 , urldate =

work page doi:10.1016/j.ecss.2008.03.016 2008

[39] [39]

2023 , month = jun, journal =

Biophysical model resolution affects coral connectivity estimates , author =. 2023 , month = jun, journal =. doi:10.1038/s41598-023-36158-5 , urldate =

work page doi:10.1038/s41598-023-36158-5 2023

[40] [40]

2010 , journal =

Multi-scale modelling of coastal, shelf, and global ocean dynamics , author =. 2010 , journal =

work page 2010

[41] [41]

2006 , month = dec, journal =

Algorithms for density, potential temperature, conservative temperature, and the freezing temperature of seawater , author =. 2006 , month = dec, journal =. doi:10.1175/JTECH1946.1 , urldate =

work page doi:10.1175/jtech1946.1 2006

[42] [42]

2013 , month = dec, journal =

Multiscale modeling of coastal, shelf, and global ocean dynamics , author =. 2013 , month = dec, journal =. doi:10.1007/s10236-013-0655-8 , urldate =

work page doi:10.1007/s10236-013-0655-8 2013

[43] [43]

2013 , month = jan, journal =

A baroclinic discontinuous Galerkin finite element model for coastal flows , author =. 2013 , month = jan, journal =. doi:10.1016/j.ocemod.2012.09.009 , urldate =

work page doi:10.1016/j.ocemod.2012.09.009 2013

[44] [44]

2014 , month = jan, journal =

An efficient parallel implementation of explicit multirate Runge--Kutta schemes for discontinuous Galerkin computations , author =. 2014 , month = jan, journal =. doi:10.1016/j.jcp.2013.07.041 , urldate =

work page doi:10.1016/j.jcp.2013.07.041 2014

[45] [45]

2003 , month = jan, journal =

A generic length-scale equation for geophysical turbulence models , author =. 2003 , month = jan, journal =

work page 2003

[46] [46]

2025 , month = jan, journal =

A multi-scale IMEX second-order Runge--Kutta method for 3D hydrodynamic ocean models , author =. 2025 , month = jan, journal =. doi:10.1016/j.jcp.2024.113482 , urldate =

work page doi:10.1016/j.jcp.2024.113482 2025

[47] [47]

2023 , month = dec, journal =

A split-explicit second-order Runge--Kutta method for solving 3D hydrodynamic equations , author =. 2023 , month = dec, journal =. doi:10.1016/j.ocemod.2023.102273 , urldate =

work page doi:10.1016/j.ocemod.2023.102273 2023

[48] [48]

2020 , month = jun, journal =

Discontinuous Galerkin discretization for two-equation turbulence closure models , author =. 2020 , month = jun, journal =. doi:10.1016/j.ocemod.2020.101619 , urldate =

work page doi:10.1016/j.ocemod.2020.101619 2020

[49] [49]

Frontiers in Applied Mathematics, vol

Discontinuous Galerkin methods for solving elliptic and parabolic equations , author =. 2008 , month = jan, series =. doi:10.1137/1.9780898717440 , isbn =

work page doi:10.1137/1.9780898717440 2008

[50] [50]

2014 , journal =

Penalty-free discontinuous Galerkin methods for incompressible Navier--Stokes equations , author =. 2014 , journal =

work page 2014

[51] [51]

Proceedings of the ACM/IEEE Supercomputing Conference (SC) , year=

The TOP500 list and progress in high-performance computing , author=. Proceedings of the ACM/IEEE Supercomputing Conference (SC) , year=

work page