Recognition: no theorem link
foap4: Adaptive mesh refinement with OpenACC, MPI, and p4est
Pith reviewed 2026-05-11 02:56 UTC · model grok-4.3
The pith
AMR simulations run efficiently on GPUs with OpenACC and MPI, even with small 8^3 or 16^3 grid blocks
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
foap4 demonstrates that AMR simulations of gas dynamics can be carried out efficiently on GPUs with OpenACC and MPI, even when using relatively small grid blocks of 8^3 or 16^3 cells, as shown through 2D and 3D benchmarks on both static and adaptive meshes of varying sizes.
What carries the argument
The foap4 framework, which integrates OpenACC directives for GPU offloading with MPI communication and the p4est library for adaptive mesh handling inside a Fortran codebase.
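A minimal sketch of the loop structure this integration pattern implies, assuming one OpenACC kernel per grid block; the names, the single ghost layer, and the placeholder stencil are illustrative and are not taken from foap4's source.

! Sketch only: one OpenACC kernel updating a single 8^3 block with one ghost layer.
program block_update_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 8, ngc = 1            ! interior cells per side, ghost layers (assumed)
  real(dp), parameter :: dt = 1.0e-3_dp, dx = 1.0_dp / nx
  real(dp) :: u(1-ngc:nx+ngc, 1-ngc:nx+ngc, 1-ngc:nx+ngc)
  real(dp) :: u_new(1-ngc:nx+ngc, 1-ngc:nx+ngc, 1-ngc:nx+ngc)
  integer :: i, j, k

  u = 1.0_dp
  u_new = u

  ! In an AMR code the ghost layer would first be filled from neighbouring blocks
  ! (local copies or MPI exchanges guided by the p4est connectivity); omitted here.
  ! Data clauses are used for self-containment; a production code would keep blocks
  ! resident on the device between steps.
  !$acc parallel loop collapse(3) copyin(u) copy(u_new)
  do k = 1, nx
     do j = 1, nx
        do i = 1, nx
           ! Placeholder 7-point stencil standing in for the finite-volume Euler update.
           u_new(i,j,k) = u(i,j,k) + dt/dx * ( &
                u(i+1,j,k) + u(i-1,j,k) + u(i,j+1,k) + u(i,j-1,k) + &
                u(i,j,k+1) + u(i,j,k-1) - 6.0_dp*u(i,j,k) )
        end do
     end do
  end do

  print *, 'interior value after one update:', u_new(1, 1, 1)
end program block_update_sketch

Without an OpenACC compiler the directive line is an ordinary comment; with one, the triple loop becomes a single small device kernel per block, which is why many blocks per MPI rank are needed to keep a GPU occupied at 8^3.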
If this is right
- Existing Fortran AMR codes can be updated for GPU use via OpenACC without major architectural changes.
- Small grid blocks remain practical for maintaining high adaptivity in GPU-based AMR.
- Both static and adaptive mesh modes achieve usable performance for explicit Euler solvers in 2D and 3D.
- The approach scales across problem sizes and different accelerator hardware.
Where Pith is reading between the lines
- The same integration pattern could be tested on other hyperbolic or parabolic equation systems.
- OpenACC may lower the barrier for GPU porting compared with lower-level alternatives for legacy Fortran codes.
- Energy consumption and memory bandwidth limits on GPUs could become the next performance bottlenecks to examine.
- Multi-node GPU clusters might reveal additional communication overheads not fully captured in the current tests.
Load-bearing premise
The chosen benchmark problems and hardware configurations represent the production workloads that existing Fortran AMR codes would encounter when ported to GPUs.
What would settle it
A measurement showing substantially lower efficiency or poor scaling when the same framework is applied to larger, more complex real-world AMR problems on a wider range of GPU hardware.
Original abstract
GPUs and other accelerators are increasingly used for scientific computing. In the future, we want to add GPU support to parallel adaptive mesh refinement (AMR) codes written in Fortran. To understand which changes are necessary to obtain good performance we have developed foap4, an AMR framework implemented in Fortran that uses OpenACC, MPI, and the p4est library. We discuss the design and implementation of the framework. Several benchmark problems are considered, in which Euler's equations of gas dynamics are solved using explicit time integration. These benchmarks are performed in both 2D and 3D, using static and adaptive meshes, for varying problem sizes on different hardware. Our results show that AMR simulations can be carried out efficiently on GPUs with OpenACC and MPI, even when using relatively small grid blocks of $8^3$ or $16^3$ cells.
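For orientation, the benchmarks solve the compressible Euler equations of gas dynamics in conservation form with explicit time integration; the equations are written here in standard textbook notation rather than transcribed from the paper:

\[
\frac{\partial}{\partial t}\begin{pmatrix} \rho \\ \rho\mathbf{u} \\ E \end{pmatrix}
+ \nabla\cdot\begin{pmatrix} \rho\mathbf{u} \\ \rho\mathbf{u}\otimes\mathbf{u} + p\,\mathbf{I} \\ (E+p)\,\mathbf{u} \end{pmatrix} = 0,
\qquad p = (\gamma-1)\left(E - \tfrac{1}{2}\rho\,|\mathbf{u}|^2\right).
\]

A single forward-Euler stage of the finite-volume update on cells of width \(\Delta x\), from which higher-order explicit schemes such as SSP Runge-Kutta methods are assembled, reads

\[
U_i^{n+1} = U_i^{n} - \frac{\Delta t}{\Delta x}\left(F_{i+1/2} - F_{i-1/2}\right).
\]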
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents foap4, a Fortran AMR framework that combines OpenACC for GPU acceleration, MPI for distributed parallelism, and the p4est library for mesh management. It reports benchmarks solving the Euler equations with explicit time integration on static and adaptive meshes in both 2D and 3D, across varying problem sizes and hardware, and concludes that efficient GPU performance is achievable even with small blocks of 8^3 or 16^3 cells.
Significance. If the reported timings are robust, the work provides a concrete, portable path for adding GPU support to existing Fortran AMR codes without requiring large block sizes or complete rewrites, which addresses a practical barrier in computational physics and fluid dynamics on accelerator hardware.
Major comments (2)
- [Results] The efficiency claim for 8^3/16^3 blocks (abstract and results) rests on overall wall-clock timings but does not isolate the relative cost of MPI halo exchanges or the increased number of OpenACC kernel launches that accompany adaptive regridding; without such a breakdown it is unclear whether the result generalizes beyond the chosen Euler benchmarks (a minimal timing sketch follows this list).
- [Results] The manuscript states that both static and adaptive meshes were tested, yet the performance tables do not report separate overheads for mesh adaptation steps versus the time-stepping loop, leaving open the possibility that the small-block efficiency is an artifact of the specific test problems rather than a general property of the foap4 design.
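A minimal sketch of the instrumentation that would supply the requested breakdown: separate wall-clock accumulators for halo exchange, kernel work, and regridding. All routine names are hypothetical stubs, not foap4's API.

program phase_timing_sketch
  use mpi
  implicit none
  integer, parameter :: n_steps = 100, regrid_interval = 10   ! assumed values
  integer :: step, ierr
  real(8) :: t0, t_halo, t_kernel, t_regrid

  call MPI_Init(ierr)
  t_halo = 0.0d0; t_kernel = 0.0d0; t_regrid = 0.0d0

  do step = 1, n_steps
     t0 = MPI_Wtime()
     call exchange_ghost_cells()        ! stand-in for the MPI halo exchange
     t_halo = t_halo + (MPI_Wtime() - t0)

     t0 = MPI_Wtime()
     call advance_all_blocks()          ! stand-in for the OpenACC kernels over local blocks;
                                        ! a real code would synchronize the device before
                                        ! reading the timer
     t_kernel = t_kernel + (MPI_Wtime() - t0)

     if (mod(step, regrid_interval) == 0) then
        t0 = MPI_Wtime()
        call adapt_mesh()               ! stand-in for p4est refine/coarsen and repartitioning
        t_regrid = t_regrid + (MPI_Wtime() - t0)
     end if
  end do

  print '(a,3es12.4)', ' halo / kernel / regrid seconds:', t_halo, t_kernel, t_regrid
  call MPI_Finalize(ierr)

contains
  subroutine exchange_ghost_cells()
  end subroutine exchange_ghost_cells
  subroutine advance_all_blocks()
  end subroutine advance_all_blocks
  subroutine adapt_mesh()
  end subroutine adapt_mesh
end program phase_timing_sketch

Reporting the three accumulators per run (and per rank) would show directly whether halo exchange or launch overhead, rather than the stencil work itself, limits the small-block efficiency.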
Minor comments (2)
- [Methods] Notation for block sizes (e.g., 8^3) is used without an explicit definition of the underlying cell count or ghost-layer width in the methods section (a worked example of the resulting cell counts follows this list).
- [Results] The hardware configurations and compiler flags used for the OpenACC runs should be listed in a dedicated table for reproducibility.
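To make the notation concern concrete (the ghost-layer widths below are assumed for illustration, not taken from the paper): an 8^3 block with two ghost layers on each side stores

\[
(8 + 2\cdot 2)^3 = 12^3 = 1728 \ \text{cells, versus} \ 8^3 = 512 \ \text{interior cells},
\]

so roughly 70% of the stored cells are ghost cells; with a single layer the total is \(10^3 = 1000\) cells, still nearly double the interior count. The reported efficiency therefore depends on how these extra cells are counted and updated, which is why the definition belongs in the methods section.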
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address the major comments point by point below, with partial revisions to add clarifying discussion on performance overheads.
Point-by-point responses
-
Referee: [Results] The efficiency claim for 8^3/16^3 blocks (abstract and results) rests on overall wall-clock timings but does not isolate the relative cost of MPI halo exchanges or the increased number of OpenACC kernel launches that accompany adaptive regridding; without such a breakdown it is unclear whether the result generalizes beyond the chosen Euler benchmarks.
Authors: We agree that isolating MPI halo exchange costs and extra kernel launches from regridding would strengthen generalization claims. In the revised manuscript we have added text noting that regridding occurs infrequently (every 10-20 steps in the reported tests) relative to the explicit time-stepping loop. The static-mesh benchmarks, which incur no regridding overhead, exhibit comparable small-block efficiency, indicating that the result is not driven solely by the adaptive overheads present in the Euler cases. A full quantitative breakdown would require additional instrumentation not performed in the original study (a back-of-envelope amortization estimate follows these responses). revision: partial
-
Referee: [Results] The manuscript states that both static and adaptive meshes were tested, yet the performance tables do not report separate overheads for mesh adaptation steps versus the time-stepping loop, leaving open the possibility that the small-block efficiency is an artifact of the specific test problems rather than a general property of the foap4 design.
Authors: The tables report aggregate wall-clock times for complete runs. We have revised the results section to state explicitly the adaptation frequency used and to highlight that time-stepping dominates runtime in both static and adaptive configurations. Separate adaptation overheads are not tabulated because they were secondary to demonstrating overall feasibility with small blocks; however, the direct comparison between static and adaptive timings in the same Euler benchmarks shows that small-block performance persists when adaptation is absent. The chosen test problems are standard for explicit AMR gas dynamics and the framework design itself is not problem-specific. revision: partial
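A back-of-envelope framing of the amortization argument in the first response; only the 10-20 step interval is taken from the response, and the cost symbols are illustrative. If one adaptation cycle costs \(c_{\mathrm{adapt}}\) and one explicit time step costs \(c_{\mathrm{step}}\), regridding every \(N\) steps adds a relative overhead of

\[
\frac{c_{\mathrm{adapt}}}{N\, c_{\mathrm{step}}},
\]

so with \(N = 10\) an adaptation cycle as expensive as a full time step still contributes only about 10% of the runtime. Tabulating \(c_{\mathrm{adapt}}\) and \(c_{\mathrm{step}}\) separately would turn this qualitative claim into the quantitative breakdown the referee requests.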
Circularity Check
No circularity: empirical implementation and benchmark results
Full rationale
The paper describes the design and implementation of the foap4 AMR framework using Fortran, OpenACC, MPI, and p4est, followed by empirical performance measurements on benchmark problems solving Euler equations in 2D/3D with static and adaptive meshes. No derivation chain, first-principles predictions, or fitted parameters are present; all claims rest on direct timing results from hardware runs. No self-citations, ansatzes, or renamings reduce results to inputs by construction. The work stands on its own timing evidence, assessed on standard benchmark problems.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Aradi, B., 2026. Fypp – Python powered Fortran metaprogramming. https://github.com/aradi/fypp. Accessed: 2026-04-16.
- [2] Burstedde, C., Wilcox, L.C., Ghattas, O., 2011. p4est: Scalable Algorithms for Parallel Adaptive Mesh Refinement on Forests of Octrees. SIAM Journal on Scientific Computing 33, 1103–1133. doi:10.1137/100791634.
- [3] Caplan, R.M., Linker, J.A., Mikić, Z., Downs, C., Török, T., Titov, V.S., 2019. GPU Acceleration of an Established Solar MHD Code using OpenACC. Journal of Physics: Conference Series 1225, 012012. doi:10.1088/1742-6596/1225/1/012012.
- [4] Einfeldt, B., 1988. On Godunov-Type Methods for Gas Dynamics. SIAM Journal on Numerical Analysis 25, 294–318. doi:10.1137/0725021.
- [5] Gottlieb, S., Shu, C.W., Tadmor, E., 2001. Strong Stability-Preserving High-Order Time Discretization Methods. SIAM Review 43, 89–112. doi:10.1137/S003614450036757X.
- [6] Harten, A., Lax, P.D., Van Leer, B., 1997. On Upstream Differencing and Godunov-Type Schemes for Hyperbolic Conservation Laws, in: Hussaini, M.Y., Van Leer, B., Van Rosendale, J. (Eds.), Upwind and High-Resolution Schemes. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 53–79. doi:10.1007/978-3-642-60543-7_4.
- [7] Jiang, G.S., Shu, C.W., 1996. Efficient Implementation of Weighted ENO Schemes. Journal of Computational Physics 126, 202–228. doi:10.1006/jcph.1996.0130.
- [8] Keppens, R., Popescu Braileanu, B., Zhou, Y., Ruan, W., Xia, C., Guo, Y., Claes, N., Bacchini, F., 2023. MPI-AMRVAC 3.0: Updates to an open-source simulation framework. Astronomy & Astrophysics 673, A66. doi:10.1051/0004-6361/202245359.
- [9] Keppens, R., Teunissen, J., Xia, C., Porth, O., 2021. MPI-AMRVAC: A parallel, grid-adaptive PDE toolkit. Computers & Mathematics with Applications 81, 316–333. doi:10.1016/j.camwa.2020.03.023.
- [10] Koren, B., 1993. A robust upwind discretization method for advection, diffusion and source terms, in: Vreugdenhil, C.B., Koren, B. (Eds.), Numerical Methods for Advection-Diffusion Problems. Braunschweig/Wiesbaden: Vieweg, pp. 117–138.
- [11] Kraus, J., Schlottke, M., Adinetz, A., Pleiter, D., 2014. Accelerating a C++ CFD Code with OpenACC, in: 2014 First Workshop on Accelerator Programming Using Directives, IEEE, New Orleans, LA, USA, pp. 47–54. doi:10.1109/WACCPD.2014.11.
- [12] Lesur, G.R.J., Baghdadi, S., Wafflard-Fernandez, G., Mauxion, J., Robert, C.M.T., Van Den Bossche, M., 2023. IDEFIX: A versatile performance-portable Godunov code for astrophysical flows. Astronomy & Astrophysics 677, A9. doi:10.1051/0004-6361/202346005.
- [13] Liska, M.T.P., Chatterjee, K., Issa, D., Yoon, D., Kaaz, N., Tchekhovskoy, A., Van Eijnatten, D., Musoke, G., Hesp, C., Rohoza, V., Markoff, S., Ingram, A., Van Der Klis, M., 2022. H-AMR: A New GPU-accelerated GRMHD Code for Exascale Computing with 3D Adaptive Mesh Refinement and Local Adaptive Time Stepping. The Astrophysical Journal Supplement Series 26...
- [14] Löhner, R., 1987. An adaptive finite element scheme for transient problems in CFD. Computer Methods in Applied Mechanics and Engineering 61, 323–338. doi:10.1016/0045-7825(87)90098-3.
- [15] Marowka, A., 2022. On the Performance Portability of OpenACC, OpenMP, Kokkos and RAJA, in: International Conference on High Performance Computing in Asia-Pacific Region, ACM, Virtual Event, Japan, pp. 103–114. doi:10.1145/3492805.3492806.
- [16] McCall, A.J., Roy, C.J., 2017. A Multilevel Parallelism Approach with MPI and OpenACC for Complex CFD Codes, in: 23rd AIAA Computational Fluid Dynamics Conference, American Institute of Aeronautics and Astronautics, Denver, Colorado. doi:10.2514/6.2017-3293.
- [17] Netherlands eScience Center, 2026. Agile. https://research-software-directory.org/projects/agile. Accessed: 2026-04-16.
- [18] Rossazza, M., Mignone, A., Bugli, M., Truzzi, S., Riha, L., Panoc, T., Vysocky, O., Shukla, N., Romeo, A., Berta, V., 2025. The PLUTO Code on GPUs: A First Look at Eulerian MHD Methods. doi:10.48550/ARXIV.2511.20337.
- [19] Rusanov, V.V., 1961. The calculation of the interaction of non-stationary shock waves with barriers. Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki 1, 267–279.
- [20] Ruuth, S.J., Spiteri, R.J., 2004. High-Order Strong-Stability-Preserving Runge–Kutta Methods with Downwind-Biased Spatial Discretizations. SIAM Journal on Numerical Analysis 42, 974–996. doi:10.1137/S0036142902419284.
- [21] Schive, H.Y., ZuHone, J.A., Goldbaum, N.J., Turk, M.J., Gaspari, M., Cheng, C.Y., 2018. GAMER-2: A GPU-accelerated adaptive mesh refinement code – accuracy, performance, and scalability. Monthly Notices of the Royal Astronomical Society 481, 4815–4840. doi:10.1093/mnras/sty2586.
- [22] Stone, J.M., Mullen, P.D., Fielding, D., Grete, P., Guo, M., Kempski, P., Most, E.R., White, C.J., Wong, G.N., 2024. AthenaK: A Performance-Portable Version of the Athena++ AMR Framework. doi:10.48550/arXiv.2409.16053, arXiv:2409.16053.
- [23] Teunissen, J., Ebert, U., 2017. Simulating streamer discharges in 3D with the parallel adaptive Afivo framework. Journal of Physics D: Applied Physics 50, 474001. doi:10.1088/1361-6463/aa8faf.
- [24] Teunissen, J., Ebert, U., 2018. Afivo: A framework for quadtree/octree AMR with shared-memory parallelization and geometric multigrid methods. Computer Physics Communications 233, 156–166. doi:10.1016/j.cpc.2018.06.018.
- [25] Trott, C., Berger-Vergiat, L., Poliakoff, D., Rajamanickam, S., Lebrun-Grandie, D., Madsen, J., Al Awar, N., Gligoric, M., Shipman, G., Womeldorff, G., 2021. The Kokkos EcoSystem: Comprehensive Performance Portability for High Performance Computing. Computing in Science & Engineering 23, 10–18. doi:10.1109/MCSE.2021.3098509.
- [26] Van Leer, B., 1977. Towards the ultimate conservative difference scheme III. Upstream-centered finite-difference schemes for ideal compressible flow. Journal of Computational Physics 23, 263–275. doi:10.1016/0021-9991(77)90094-8.
- [27] Xia, C., Teunissen, J., Mellah, I.E., Chané, E., Keppens, R., 2018. MPI-AMRVAC 2.0 for Solar and Astrophysical Applications. The Astrophysical Journal Supplement Series 234, 30. doi:10.3847/1538-4365/aaa6c8.
- [28] Xue, W., Jackson, C.W., Roy, C.J., 2021. An improved framework of GPU computing for CFD applications on structured grids using OpenACC. Journal of Parallel and Distributed Computing 156, 64–. doi:10.1016/j.jpdc.2021.05.010.
- [29] Zhang, W., Almgren, A., Beckner, V., Bell, J., Blaschke, J., Chan, C., Day, M., Friesen, B., Gott, K., Graves, D., Katz, M., Myers, A., Nguyen, T., Nonaka, A., Rosso, M., Williams, S., Zingale, M., 2019. AMReX: A framework for block-structured adaptive mesh refinement. Journal of Open Source Software 4, 1370. doi:10.21105/joss.01370.