pith. machine review for the scientific record.

arxiv: 2605.07612 · v1 · submitted 2026-05-08 · ⚛️ physics.comp-ph

Recognition: no theorem link

foap4: Adaptive mesh refinement with OpenACC, MPI, and p4est

Adrian Kelly, Chun Xia, Hao Wu, Héctor R. Olivares Sánchez, Jannis Teunissen, Jesse Vos, Johan Hidding, Leon Oostrum, Olaf Willocx, Oliver Porth, Rony Keppens, Victor Azizi, Yuhao Zhou

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:56 UTC · model grok-4.3

classification ⚛️ physics.comp-ph
keywords adaptive mesh refinement · OpenACC · MPI · p4est · GPU · Fortran · Euler equations · gas dynamics

The pith

AMR simulations run efficiently on GPUs with OpenACC and MPI even using small 8^3 grid blocks

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces foap4, a Fortran framework for adaptive mesh refinement that combines OpenACC for GPU acceleration, MPI for distributed parallelism, and the p4est library for mesh management. It tests the setup by solving Euler's equations of gas dynamics with explicit time integration in both 2D and 3D, on static meshes and adaptive meshes, across different problem sizes and hardware. The results establish that good performance is possible on GPUs without requiring large grid blocks. A sympathetic reader would care because many scientific computing codes remain in Fortran, and this work maps a route to using modern accelerators while preserving the flexibility of adaptive meshing.

Core claim

foap4 demonstrates that AMR simulations of gas dynamics can be carried out efficiently on GPUs with OpenACC and MPI, even when using relatively small grid blocks of 8^3 or 16^3 cells, as shown through 2D and 3D benchmarks on both static and adaptive meshes of varying sizes.

What carries the argument

The foap4 framework, which integrates OpenACC directives for GPU offloading with MPI communication and the p4est library for adaptive mesh handling inside a Fortran codebase.
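
As a rough sketch only: the pattern described above, in the shape it usually takes in Fortran, with one MPI rank per GPU, p4est owning the block layout, and OpenACC kernels over locally owned blocks. The driver below is illustrative, not foap4's actual API; the routine names (fill_ghost_cells, update_blocks, adapt_mesh) are hypothetical and the p4est and MPI exchange steps are only indicated in comments.

  ! Illustrative driver only; not foap4's public interface.
  program amr_driver_sketch
    use mpi
    implicit none
    integer :: ierr, rank, step
    integer, parameter :: n_steps = 100, regrid_every = 10

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    ! p4est builds the initial 'brick' of octrees and partitions the
    ! blocks over the MPI ranks (one rank per GPU); those calls are
    ! omitted here. Block data would be copied to the GPU once and
    ! kept resident.

    do step = 1, n_steps
      ! Each step: fill ghost cells (same-level copies, then coarse-fine
      ! interpolation), exchanging MPI buffers where neighbors live on
      ! other ranks, then update all local blocks with OpenACC kernels.
      !   call fill_ghost_cells()   ! hypothetical
      !   call update_blocks()      ! hypothetical; see loop sketch below

      if (mod(step, regrid_every) == 0) then
        ! Occasionally refine or coarsen via p4est, repartition, and
        ! move only the affected blocks between ranks and devices.
        !   call adapt_mesh()       ! hypothetical
      end if
    end do

    call MPI_Finalize(ierr)
  end program amr_driver_sketch

The point is only the shape of the loop: ghost-cell exchange and block updates every step, mesh adaptation only occasionally (the simulated rebuttal below cites every 10-20 steps for the reported tests).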

If this is right

  • Existing Fortran AMR codes can be updated for GPU use via OpenACC without major architectural changes.
  • Small grid blocks remain practical for maintaining high adaptivity in GPU-based AMR.
  • Both static and adaptive mesh modes achieve usable performance for explicit Euler solvers in 2D and 3D.
  • The approach scales across problem sizes and different accelerator hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same integration pattern could be tested on other hyperbolic or parabolic equation systems.
  • OpenACC may lower the barrier for GPU porting compared with lower-level alternatives for legacy Fortran codes.
  • Energy consumption and memory bandwidth limits on GPUs could become the next performance bottlenecks to examine.
  • Multi-node GPU clusters might reveal additional communication overheads not fully captured in the current tests.

Load-bearing premise

The chosen benchmark problems and hardware configurations represent the production workloads that existing Fortran AMR codes would encounter when ported to GPUs.

What would settle it

A measurement showing substantially lower efficiency or poor scaling when the same framework is applied to larger, more complex real-world AMR problems on a wider range of GPU hardware.

Figures

Figures reproduced from arXiv: 2605.07612 by Adrian Kelly, Chun Xia, Hao Wu, Héctor R. Olivares Sánchez, Jannis Teunissen, Jesse Vos, Johan Hidding, Leon Oostrum, Olaf Willocx, Oliver Porth, Rony Keppens, Victor Azizi, Yuhao Zhou.

Figure 1
Figure 1. Schematic illustration of a quadtree mesh, with blocks containing 4 × 4 cells and a single layer of ghost cells (shaded gray). To fill the ghost cells around a block, data has to be exchanged with neighboring blocks, which can be at the same level (light blue) or a different refinement level (purple), and which might reside on a different processor or GPU. The figure depicts a very simple AMR mesh; typical … view at source ↗
Figure 2
Figure 2. Overview of foap4 architecture with main public routines. Normally, one MPI rank is used per GPU. The GPU part can also be executed on a regular CPU. view at source ↗
Figure 3
Figure 3. Illustration of the difference between the p4est mesh and the foap4 mesh. Every quadrant in p4est (a single cell) corresponds to a block of cells (including ghost cells) in foap4. • The creation of an initial 'brick' geometry, consisting of a rectangular block of Nx × Ny × Nz octrees (where Nx, Ny and Nz do not need to be equal) • Obtaining the refinement levels and coordinates of local blocks (i.e., of a sin… view at source ↗
Figure 4
Figure 4. OpenACC loop style used in foap4. There is a gang loop over the blocks, and a collapsed multidimensional loop over the cells of a block. Here bx is the block size. Foap4 will therefore be less efficient when there are only a few large grid blocks. However, in typical AMR simulations it is more attractive to use many blocks of relatively small size (e.g. 16^3), so that the mesh can better adapt to the solut… view at source ↗ (A hedged sketch of this loop pattern follows the figure list below.)
Figure 5
Figure 5. Code fragment showing how ghost cells at the same refinement level (SRL) are filled using OpenACC loops. The case shown corresponds to a face in the +x direction, with both sides present on the same MPI rank, in 2D. Since both sides are present, the other side is filled as well. The gc_srl_local array is sorted by face direction, and the gc_srl_local_iface allows to loop over one of these directions. Note … view at source ↗
Figure 6
Figure 6. Illustration of test for flux fixing. A scalar advection equation is solved on a statically refined mesh, so that the solution crosses several refinement boundaries. The pictures correspond to a 2D case with blocks of size 32^2 and five levels of initial refinement. Each block is indicated by a square. where γ is the ratio of specific heats, and sound speeds are computed as c = √(γp/ρ). (16) As discussed in se… view at source ↗
Figure 7
Figure 7. L1 and L2 errors as a function of grid spacing Δx using two limiters: the van Leer limiter, which is up to second order accurate, and the WENO5 limiter, which is up to fifth order accurate. For time stepping, Heun's method was used for the van Leer case and the classic RK4 scheme for the WENO5 case. To prevent temporal errors with the fourth-order accurate RK4 scheme from becoming important with the … view at source ↗
Figure 8
Figure 8. GPU strong scaling results for the Rayleigh–Taylor test case on a uniform 512^3 grid. Block sizes of 8^3, 16^3 and 32^3 were used, as indicated in the legend. The tests were performed on nodes containing four H100 GPUs. The left-most results were performed using 1 and 2 H100s, with 1 GPU corresponding to 1/4 node. Parallel efficiency is normalized to the case using one full node. When using multiple nodes we… view at source ↗
Figure 9
Figure 9. CPU strong scaling results for the Rayleigh–Taylor test case on a uniform 512^3 grid. The tests were performed on nodes containing two AMD Genoa 9654 CPUs, for a total of 192 cores per node. The left-most results were performed using 48 and 96 cores. Parallel efficiency is normalized to the case using one full node. … view at source ↗
Figure 11
Figure 11. (a) Evolution of the mesh over time in the Rayleigh–Taylor test case with AMR. The colors correspond to refinement levels 2 (blue) to 6 (red). Each cell corresponds to a 16^3 block. Part of the computational domain, which is the unit cube, is cut out. (b) Density ρ at t = 1.5, rendered with VisIt. (c) Cumulative block count per refinement level. At t = 1.5 there are about 10^5 blocks and 0.4 × 10^9 cells. … view at source ↗
Figure 13
Figure 13. Breakdown of computational cost of foap4 for the Rayleigh–Taylor test case with AMR on H100 GPUs. There are two stages of filling ghost cells (R1 and R2), as discussed in section 2.6. For each round, the time for filling ghost cells and for filling buffers (to be sent) is shown. The communication time (MPI comm.) is shown for both rounds combined. The cost of changing the mesh is divided into the p4est pa… view at source ↗
Figure 12
Figure 12. GPU strong scaling results for the Rayleigh–Taylor test case with AMR, using blocks of 16^3. The percentage of time spent in foap4 excluding flux computation and solution update is shown at the bottom, see also figure 13. We believe the foap4 code has met its original design goal: to provide a performance reference so that better informed decisions can be made about adding GPU support … view at source ↗
Figure 14
Figure 14. CPU strong scaling results for the Rayleigh–Taylor test case with AMR, using blocks of 16^3. view at source ↗
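
The caption of Figure 4 is concrete enough to sketch the loop structure it describes: an OpenACC gang loop over blocks, with a collapsed loop over the bx-by-bx cells of each block. The fragment below is a minimal illustration under assumptions, not code from the paper: the array layout, the names uu and n_blocks, and the placeholder update are invented for the example, and the data is assumed to be resident on the device already (e.g. via an enclosing !$acc data region).

  ! Sketch of the Figure 4 loop style (2D, one variable per cell).
  subroutine update_blocks(uu, n_blocks, bx, dt)
    implicit none
    integer, intent(in)    :: n_blocks, bx
    real(8), intent(in)    :: dt
    real(8), intent(inout) :: uu(bx, bx, n_blocks)
    integer :: ib, i, j

    !$acc parallel loop gang present(uu)
    do ib = 1, n_blocks
      ! Collapsed loop over the cells of one block.
      !$acc loop vector collapse(2)
      do j = 1, bx
        do i = 1, bx
          ! Placeholder for the real flux-difference update.
          uu(i, j, ib) = uu(i, j, ib) + 0.0d0 * dt
        end do
      end do
    end do
  end subroutine update_blocks

Because the outer gang loop runs over blocks, a mesh made of many small blocks (8^3 or 16^3) still exposes plenty of parallelism, whereas a handful of large blocks starves the gang dimension; that is the trade-off the caption of Figure 4 points to.
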
read the original abstract

GPUs and other accelerators are increasingly used for scientific computing. In the future, we want to add GPU support to parallel adaptive mesh refinement (AMR) codes written in Fortran. To understand which changes are necessary to obtain good performance we have developed foap4, an AMR framework implemented in Fortran that uses OpenACC, MPI, and the p4est library. We discuss the design and implementation of the framework. Several benchmark problems are considered, in which Euler's equations of gas dynamics are solved using explicit time integration. These benchmarks are performed in both 2D and 3D, using static and adaptive meshes, for varying problem sizes on different hardware. Our results show that AMR simulations can be carried out efficiently on GPUs with OpenACC and MPI, even when using relatively small grid blocks of $8^3$ or $16^3$ cells.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents foap4, a Fortran AMR framework that combines OpenACC for GPU acceleration, MPI for distributed parallelism, and the p4est library for mesh management. It reports benchmarks solving the Euler equations with explicit time integration on static and adaptive meshes in both 2D and 3D, across varying problem sizes and hardware, and concludes that efficient GPU performance is achievable even with small blocks of 8^3 or 16^3 cells.

Significance. If the reported timings are robust, the work provides a concrete, portable path for adding GPU support to existing Fortran AMR codes without requiring large block sizes or complete rewrites, which addresses a practical barrier in computational physics and fluid dynamics on accelerator hardware.

major comments (2)
  1. [Results] The efficiency claim for 8^3/16^3 blocks (abstract and results) rests on overall wall-clock timings but does not isolate the relative cost of MPI halo exchanges or the increased number of OpenACC kernel launches that accompany adaptive regridding; without such a breakdown it is unclear whether the result generalizes beyond the chosen Euler benchmarks.
  2. [Results] The manuscript states that both static and adaptive meshes were tested, yet the performance tables do not report separate overheads for mesh adaptation steps versus the time-stepping loop, leaving open the possibility that the small-block efficiency is an artifact of the specific test problems rather than a general property of the foap4 design.
minor comments (2)
  1. [Methods] Notation for block sizes (e.g., 8^3) is used without an explicit definition of the underlying cell count or ghost-layer width in the methods section (a rough illustration of why this matters follows these comments).
  2. [Results] The hardware configurations and compiler flags used for the OpenACC runs should be listed in a dedicated table for reproducibility.
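
For orientation on the first minor comment, and assuming for illustration the single ghost-cell layer depicted in Figure 1 (higher-order stencils such as WENO5 would need wider layers), the per-block storage and ghost-fill overhead works out to

  \frac{(8+2)^3 - 8^3}{8^3} = \frac{488}{512} \approx 0.95,
  \qquad
  \frac{(16+2)^3 - 16^3}{16^3} = \frac{1736}{4096} \approx 0.42,

so halving the block edge roughly doubles the relative ghost-cell cost. Spelling out the ghost-layer width in the methods section would make the 8^3 efficiency claim correspondingly easier to interpret.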

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address the major comments point by point below, with partial revisions to add clarifying discussion on performance overheads.

read point-by-point responses
  1. Referee: [Results] The efficiency claim for 8^3/16^3 blocks (abstract and results) rests on overall wall-clock timings but does not isolate the relative cost of MPI halo exchanges or the increased number of OpenACC kernel launches that accompany adaptive regridding; without such a breakdown it is unclear whether the result generalizes beyond the chosen Euler benchmarks.

    Authors: We agree that isolating MPI halo exchange costs and extra kernel launches from regridding would strengthen generalization claims. In the revised manuscript we have added text noting that regridding occurs infrequently (every 10-20 steps in the reported tests) relative to the explicit time-stepping loop. The static-mesh benchmarks, which incur no regridding overhead, exhibit comparable small-block efficiency, indicating that the result is not driven solely by the adaptive overheads present in the Euler cases. A full quantitative breakdown would require additional instrumentation not performed in the original study. revision: partial

  2. Referee: [Results] The manuscript states that both static and adaptive meshes were tested, yet the performance tables do not report separate overheads for mesh adaptation steps versus the time-stepping loop, leaving open the possibility that the small-block efficiency is an artifact of the specific test problems rather than a general property of the foap4 design.

    Authors: The tables report aggregate wall-clock times for complete runs. We have revised the results section to state explicitly the adaptation frequency used and to highlight that time-stepping dominates runtime in both static and adaptive configurations. Separate adaptation overheads are not tabulated because they were secondary to demonstrating overall feasibility with small blocks; however, the direct comparison between static and adaptive timings in the same Euler benchmarks shows that small-block performance persists when adaptation is absent. The chosen test problems are standard for explicit AMR gas dynamics and the framework design itself is not problem-specific. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical implementation and benchmark results

full rationale

The paper describes the design and implementation of the foap4 AMR framework using Fortran, OpenACC, MPI, and p4est, followed by empirical performance measurements on benchmark problems solving Euler equations in 2D/3D with static and adaptive meshes. No derivation chain, first-principles predictions, or fitted parameters are present; all claims rest on direct timing results from hardware runs. No self-citations, ansatzes, or renamings reduce results to inputs by construction. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an engineering and benchmarking paper. No mathematical free parameters, axioms, or invented physical entities are introduced; the central claim rests on software implementation choices and empirical timing measurements.

pith-pipeline@v0.9.0 · 5496 in / 1085 out tokens · 43466 ms · 2026-05-11T02:56:47.709405+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

  1. [1]

    Fypp – python powered fortran metaprogramming

    Aradi, B., 2026. Fypp – python powered fortran metaprogramming. https://github.com/aradi/fypp. Accessed: 2026-04-16

  2. [2]

    p4est: Scalable Algorithms for Parallel Adaptive Mesh Refinement on Forests of Octrees

    Burstedde, C., Wilcox, L.C., Ghattas, O., 2011. p4est: Scalable Algorithms for Parallel Adaptive Mesh Refinement on Forests of Octrees. SIAM Journal on Scientific Computing 33, 1103–1133. doi:10.1137/100791634

  3. [3]

    GPU Acceleration of an Established Solar MHD Code using OpenACC

    Caplan, R.M., Linker, J.A., Mikić, Z., Downs, C., Török, T., Titov, V.S., 2019. GPU Acceleration of an Established Solar MHD Code using OpenACC. Journal of Physics: Conference Series 1225, 012012. doi:10.1088/1742-6596/1225/1/012012

  4. [4]

    On Godunov-Type Methods for Gas Dynamics

    Einfeldt, B., 1988. On Godunov-Type Methods for Gas Dynamics. SIAM Journal on Numerical Analysis 25, 294–318. doi:10.1137/0725021

  5. [5]

    Strong Stability-Preserving High-Order Time Discretization Methods

    Gottlieb, S., Shu, C.W., Tadmor, E., 2001. Strong Stability-Preserving High-Order Time Discretization Methods. SIAM Review 43, 89–112. doi:10.1137/S003614450036757X

  6. [6]

    On Upstream Differencing and Godunov-Type Schemes for Hyperbolic Conservation Laws

    Harten, A., Lax, P.D., Van Leer, B., 1997. On Upstream Differencing and Godunov-Type Schemes for Hyperbolic Conservation Laws, in: Hussaini, M.Y., Van Leer, B., Van Rosendale, J. (Eds.), Upwind and High-Resolution Schemes. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 53–79. doi:10.1007/978-3-642-60543-7_4

  7. [7]

    Efficient Implementation of Weighted ENO Schemes

    Jiang, G.S., Shu, C.W., 1996. Efficient Implementation of Weighted ENO Schemes. Journal of Computational Physics 126, 202–228. doi:10.1006/jcph.1996.0130

  8. [8]

    MPI-AMRVAC 3.0: Updates to an open-source simulation framework

    Keppens, R., Popescu Braileanu, B., Zhou, Y., Ruan, W., Xia, C., Guo, Y., Claes, N., Bacchini, F., 2023. MPI-AMRVAC 3.0: Updates to an open-source simulation framework. Astronomy & Astrophysics 673, A66. doi:10.1051/0004-6361/202245359

  9. [9]

    MPI-AMRVAC: A parallel, grid-adaptive PDE toolkit

    Keppens, R., Teunissen, J., Xia, C., Porth, O., 2021. MPI-AMRVAC: A parallel, grid-adaptive PDE toolkit. Computers & Mathematics with Applications 81, 316–333. doi:10.1016/j.camwa.2020.03.023

  10. [10]

    A robust upwind discretization method for advection, diffusion and source terms

    Koren, B., 1993. A robust upwind discretization method for advection, diffusion and source terms, in: Vreugdenhil, C.B., Koren, B. (Eds.), Numerical Methods for Advection-Diffusion Problems. Braunschweig/Wiesbaden: Vieweg, pp. 117–138

  11. [11]

    Accelerating a C++ CFD Code with OpenACC

    Kraus, J., Schlottke, M., Adinetz, A., Pleiter, D., 2014. Accelerating a C++ CFD Code with OpenACC, in: 2014 First Workshop on Accelerator Programming Using Directives, IEEE, New Orleans, LA, USA, pp. 47–54. doi:10.1109/WACCPD.2014.11

  12. [12]

    IDEFIX: A versatile performance-portable Godunov code for astrophysical flows

    Lesur, G.R.J., Baghdadi, S., Wafflard-Fernandez, G., Mauxion, J., Robert, C.M.T., Van Den Bossche, M., 2023. IDEFIX: A versatile performance-portable Godunov code for astrophysical flows. Astronomy & Astrophysics 677, A9. doi:10.1051/0004-6361/202346005

  13. [13]

    H-AMR: A New GPU-accelerated GRMHD Code for Exascale Computing with 3D Adaptive Mesh Refinement and Local Adaptive Time Stepping

    Liska, M.T.P., Chatterjee, K., Issa, D., Yoon, D., Kaaz, N., Tchekhovskoy, A., Van Eijnatten, D., Musoke, G., Hesp, C., Rohoza, V., Markoff, S., Ingram, A., Van Der Klis, M., 2022. H-AMR: A New GPU-accelerated GRMHD Code for Exascale Computing with 3D Adaptive Mesh Refinement and Local Adaptive Time Stepping. The Astrophysical Journal Supplement Series 26...

  14. [14]

    An adaptive finite element scheme for transient problems in CFD

    Lohner, R., 1987. An adaptive finite element scheme for transient problems in CFD. Computer Methods in Applied Mechanics and Engineering 61, 323–338. doi:10.1016/0045-7825(87)90098-3

  15. [15]

    On the Performance Portability of OpenACC, OpenMP, Kokkos and RAJA

    Marowka, A., 2022. On the Performance Portability of OpenACC, OpenMP, Kokkos and RAJA, in: International Conference on High Performance Computing in Asia-Pacific Region, ACM, Virtual Event, Japan, pp. 103–114. doi:10.1145/3492805.3492806

  16. [16]

    A Multilevel Parallelism Approach with MPI and OpenACC for Complex CFD Codes

    McCall, A.J., Roy, C.J., 2017. A Multilevel Parallelism Approach with MPI and OpenACC for Complex CFD Codes, in: 23rd AIAA Computational Fluid Dynamics Conference, American Institute of Aeronautics and Astronautics, Denver, Colorado. doi:10.2514/6.2017-3293

  17. [17]

    Agile

    Netherlands eScience Center, 2026. Agile. https://research-software-directory.org/projects/agile. Accessed: 2026-04-16

  18. [18]

    The PLUTO Code on GPUs: A First Look at Eulerian MHD Methods

    Rossazza, M., Mignone, A., Bugli, M., Truzzi, S., Riha, L., Panoc, T., Vysocky, O., Shukla, N., Romeo, A., Berta, V., 2025. The PLUTO Code on GPUs: A First Look at Eulerian MHD Methods. doi:10.48550/ARXIV.2511.20337

  19. [19]

    The calculation of the interaction of non-stationary shock waves with barriers

    Rusanov, V.V., 1961. The calculation of the interaction of non-stationary shock waves with barriers. Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki 1, 267–279.

  20. [20]

    High-Order Strong-Stability- Preserving Runge–Kutta Methods with Downwind-Biased Spatial Discretizations

    Ruuth, S.J., Spiteri, R.J., 2004. High-Order Strong-Stability- Preserving Runge–Kutta Methods with Downwind-Biased Spatial Discretizations. SIAM Journal on Numerical Analysis 42, 974–996. doi:10.1137/S0036142902419284

  21. [21]

    Gamer-2: A GPU-accelerated adaptive mesh refinement code – accuracy, performance, and scalability

    Schive, H.Y., ZuHone, J.A., Goldbaum, N.J., Turk, M.J., Gaspari, M., Cheng, C.Y., 2018. Gamer-2: A GPU-accelerated adaptive mesh refinement code – accuracy, performance, and scalability. Monthly Notices of the Royal Astronomical Society 481, 4815–4840. doi:10.1093/mnras/sty2586

  22. [22]

    AthenaK: A Performance-Portable Version of the Athena++ AMR Framework

    Stone, J.M., Mullen, P.D., Fielding, D., Grete, P., Guo, M., Kempski, P., Most, E.R., White, C.J., Wong, G.N., 2024. AthenaK: A Performance-Portable Version of the Athena++ AMR Framework. doi:10.48550/arXiv.2409.16053, arXiv:2409.16053

  23. [23]

    Simulating streamer discharges in 3D with the parallel adaptive Afivo framework

    Teunissen, J., Ebert, U., 2017. Simulating streamer discharges in 3D with the parallel adaptive Afivo framework. Journal of Physics D: Applied Physics 50, 474001. doi:10.1088/1361-6463/aa8faf

  24. [24]

    Afivo: A framework for quadtree/octree AMR with shared-memory parallelization and geometric multigrid methods

    Teunissen, J., Ebert, U., 2018. Afivo: A framework for quadtree/octree AMR with shared-memory parallelization and geometric multigrid methods. Computer Physics Communications 233, 156–166. doi:10.1016/j.cpc.2018.06.018

  25. [25]

    The Kokkos EcoSystem: Comprehensive Performance Portability for High Performance Computing

    Trott, C., Berger-Vergiat, L., Poliakoff, D., Rajamanickam, S., Lebrun-Grandie, D., Madsen, J., Al Awar, N., Gligoric, M., Shipman, G., Womeldorff, G., 2021. The Kokkos EcoSystem: Comprehensive Performance Portability for High Performance Computing. Computing in Science & Engineering 23, 10–18. doi:10.1109/MCSE.2021.3098509

  26. [26]

    Towards the ultimate conservative difference scheme III

    Van Leer, B., 1977. Towards the ultimate conservative difference scheme III. Upstream-centered finite-difference schemes for ideal compressible flow. Journal of Computational Physics 23, 263–275. doi:10.1016/0021-9991(77)90094-8

  27. [27]

    MPI-AMRVAC 2.0 for Solar and Astrophysical Applications

    Xia, C., Teunissen, J., Mellah, I.E., Chané, E., Keppens, R., 2018. MPI-AMRVAC 2.0 for Solar and Astrophysical Applications. The Astrophysical Journal Supplement Series 234, 30. doi:10.3847/1538-4365/aaa6c8

  28. [28]

    An improved framework of GPU computing for CFD applications on structured grids using OpenACC

    Xue, W., Jackson, C.W., Roy, C.J., 2021. An improved framework of GPU computing for CFD applications on structured grids using OpenACC. Journal of Parallel and Distributed Computing 156, 64–

  29. [29]

    doi:10.1016/j.jpdc.2021.05.010

  30. [30]

    Zhang, W., Almgren, A., Beckner, V., Bell, J., Blaschke, J., Chan, C., Day, M., Friesen, B., Gott, K., Graves, D., Katz, M., Myers, A., Nguyen, T., Nonaka, A., Rosso, M., Williams, S., Zingale, M.,

  31. [31]

    AMReX: A framework for block-structured adaptive mesh refinement

    AMReX: A framework for block-structured adaptive mesh refinement. Journal of Open Source Software 4, 1370. doi:10.21105/joss.01370