pith.

arxiv: 2606.18927 · v1 · pith:Y4SGE4TA · submitted 2026-06-17 · ⚛️ physics.flu-dyn

APU-Accelerated Large Eddy Simulation with the Discontinuous Galerkin Solver GALÆXI

Pith reviewed 2026-06-26 19:37 UTC · model grok-4.3

classification ⚛️ physics.flu-dyn
keywords discontinuous Galerkin spectral element method · large eddy simulation · wall-modeled LES · transonic compressor cascade · shock-wave boundary layer interaction · GPU acceleration · AMD MI300A APU · scaling performance

The pith

GALÆXI performs accurate wall-resolved large eddy simulation of a transonic compressor cascade on AMD MI300A APUs, capturing shock-wave and turbulent boundary-layer interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper advances GALÆXI, a discontinuous Galerkin spectral element method (DGSEM) solver for heterogeneous GPU systems, by optimizing it for AMD MI300A APUs. It analyzes strong and weak scaling, integrates wall-modeled large eddy simulation (WMLES) algorithms, and validates them on a turbulent channel flow. It then applies the solver to a wall-resolved large eddy simulation of a transonic compressor cascade. The work matters because it shows how high-order CFD methods can run on exascale hardware while handling complex engineering flows.
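Strong and weak scaling results such as those in Figure 1 are commonly condensed into a parallel efficiency. The sketch below assumes the textbook definitions, measured relative to the smallest device count; the function names are hypothetical, the timings are invented placeholders rather than measurements from the paper, and the paper may report performance with a different metric.

```python
def strong_scaling_efficiency(devices, times):
    """Strong scaling: the total problem size is fixed while devices are added.

    E(n) = (n0 * T(n0)) / (n * T(n)), measured against the first (smallest) entry.
    """
    n0, t0 = devices[0], times[0]
    return [(n0 * t0) / (n * t) for n, t in zip(devices, times)]


def weak_scaling_efficiency(times):
    """Weak scaling: the load per device is fixed, so the ideal wall time is constant.

    E(n) = T(n0) / T(n).
    """
    t0 = times[0]
    return [t0 / t for t in times]


if __name__ == "__main__":
    # Hypothetical APU counts and wall times per time step in seconds,
    # chosen only to illustrate the arithmetic (not data from the paper).
    apus = [1, 2, 4, 8]
    t_strong = [8.0, 4.1, 2.2, 1.25]
    t_weak = [1.00, 1.02, 1.05, 1.10]
    for n, e in zip(apus, strong_scaling_efficiency(apus, t_strong)):
        print(f"strong scaling, {n} APUs: efficiency {e:.3f}")
    for n, e in zip(apus, weak_scaling_efficiency(t_weak)):
        print(f"weak scaling,   {n} APUs: efficiency {e:.3f}")
```

An efficiency of 1.0 is ideal in both cases; in strong scaling it typically drops as the work per device shrinks and communication starts to dominate.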

Core claim

By linking hardware optimization on AMD MI300A APUs, software implementation of the DGSEM framework, and physical validation, the work shows that GALÆXI can accurately capture complex shock-wave/turbulent boundary-layer interactions in a wall-resolved large eddy simulation of a transonic compressor cascade.

What carries the argument

The architecture-agnostic DGSEM framework GALÆXI, which enables GPU acceleration on APUs and integration of wall-modeled LES algorithms while preserving high-order accuracy.
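The "local computational stencil" behind this argument is visible in DGSEM's core operator. In the common collocation variant, each element stores the solution at N + 1 Legendre-Gauss-Lobatto (LGL) nodes per direction (Gauss nodes are another frequent choice), and derivatives are taken with a small dense differentiation matrix that touches only that element's own nodes; neighbouring elements interact solely through numerical fluxes at shared faces. The pure-Python sketch below builds the 1D LGL nodes and the nodal differentiation matrix from barycentric weights. It is a generic textbook construction for illustration, not code from GALÆXI, and all function names are hypothetical.

```python
import math


def legendre_and_derivatives(n, x):
    """Return P_n(x), P_n'(x) and P_n''(x) for n >= 1 and |x| < 1."""
    p_prev, p = 1.0, x
    for k in range(1, n):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    # Legendre ODE: (1 - x^2) P'' - 2x P' + n(n + 1) P = 0
    d2p = (2.0 * x * dp - n * (n + 1) * p) / (1.0 - x * x)
    return p, dp, d2p


def lgl_nodes(n):
    """Legendre-Gauss-Lobatto nodes for degree n: -1, the roots of P_n', and +1."""
    nodes = [-1.0]
    for j in range(1, n):
        x = -math.cos(math.pi * j / n)  # Chebyshev-Gauss-Lobatto initial guess
        for _ in range(100):
            _, dp, d2p = legendre_and_derivatives(n, x)
            step = dp / d2p  # Newton step on P_n'
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append(x)
    nodes.append(1.0)
    return nodes


def diff_matrix(nodes):
    """Nodal differentiation matrix D[i][j] = l_j'(x_i) from barycentric weights."""
    m = len(nodes)
    w = [
        1.0 / math.prod(nodes[j] - nodes[k] for k in range(m) if k != j)
        for j in range(m)
    ]
    D = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                D[i][j] = (w[j] / w[i]) / (nodes[i] - nodes[j])
        # Each row must annihilate constants, which fixes the diagonal entry.
        D[i][i] = -sum(D[i][j] for j in range(m) if j != i)
    return D


def apply_derivative(D, values):
    """Differentiate nodal values on ONE element; no neighbour data is needed."""
    return [sum(d * v for d, v in zip(row, values)) for row in D]


if __name__ == "__main__":
    N = 4  # polynomial degree
    x = lgl_nodes(N)
    D = diff_matrix(x)
    du = apply_derivative(D, [xi ** N for xi in x])
    err = max(abs(a - N * xi ** (N - 1)) for a, xi in zip(du, x))
    print("LGL nodes:", [f"{xi:+.6f}" for xi in x])
    print(f"max error differentiating x^{N}: {err:.1e}")
```

Because the operator is a fixed small matrix applied independently to every element, the same kernel can be batched over many elements at once, which is one reason the method maps well onto GPUs and APUs.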

Load-bearing premise

The integration of wall-modeled LES algorithms into the GPU-accelerated DGSEM framework preserves the physical accuracy of the original method on the transonic compressor cascade without introducing unquantified numerical artifacts.

What would settle it

A direct comparison of predicted shock positions, wall shear stress distributions, or turbulence statistics from the compressor cascade simulation against experimental measurements that reveals discrepancies beyond expected numerical error would falsify the accuracy claim.

Figures

Figures reproduced from arXiv: 2606.18927 by Andrea Beck, Andreas Wanninger, Anna Schwarz, Johanna Hintz, Justin Du Plessis, Patrick Kopper, Rohan Kaushik, Spencer Starr.

Figure 1: Strong (left) and weak (right) scaling of GALÆXI on Hunter. The ideal …

Figure 2: Top: Comparison of instantaneous, streamwise body force …

Figure 3: Shock wave formation for a static pressure ratio (Case 2; left) and increased …
Original abstract

The exascale computing era, driven by heterogeneous GPU architectures, requires a fundamental redesign of traditional CFD solvers to fully leverage these systems. The discontinuous Galerkin spectral element method (DGSEM) provides an ideal foundation for this transition due to its high-order accuracy and local computational stencil. This work presents recent advances in the development and application of the architecture-agnostic DGSEM framework GALÆXI by linking hardware optimization, software implementation, and physical validation. First, the performance of GALÆXI on the AMD MI300A Accelerated Processing Units (APUs) featured on the Hunter supercomputer is analyzed. Specifically, the strong and weak scaling performance and the impact of the compute partitioning modes available on the AMD MI300As are evaluated. Second, the strategy used to integrate the algorithms necessary for wall-modeled large eddy simulations into the GPU-accelerated framework is outlined. Validation of those algorithms is presented in the form of a plane turbulent channel test case. Finally, the solver is applied to a demanding flow problem in the form of a wall-resolved large eddy simulation of a transonic compressor cascade. The results from this investigation demonstrate the capabilities of GALÆXI to accurately capture complex shock-wave/turbulent boundary-layer interactions.
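The abstract outlines the integration of wall-modeled LES algorithms but does not say which wall model is used. A wall model typically takes the LES velocity at some matching height, solves a near-wall law for the friction velocity u_τ, and imposes the resulting wall shear stress τ_w = ρ u_τ² at the wall instead of resolving the viscous sublayer. The sketch below assumes the simplest algebraic choice, the logarithmic law of the wall with κ = 0.41 and B = 5.2, solved by Newton's method; the function names are hypothetical, and GALÆXI's actual model, constants, and matching strategy may differ.

```python
import math

KAPPA = 0.41  # von Karman constant (assumed)
B = 5.2       # log-law intercept (assumed)


def log_law_residual(u_tau, u, y, nu):
    """f(u_tau) = u/u_tau - ln(y u_tau / nu)/kappa - B, zero when the log law holds."""
    return u / u_tau - math.log(y * u_tau / nu) / KAPPA - B


def solve_friction_velocity(u, y, nu, tol=1e-12, max_iter=50):
    """Solve the log law for u_tau given the LES velocity u > 0 at wall distance y."""
    u_tau = math.sqrt(nu * u / y)  # guess from the viscous-sublayer relation u+ = y+
    # f is decreasing and convex in u_tau, so Newton started where f > 0
    # (left of the root) converges monotonically; shrink the guess until f > 0.
    while log_law_residual(u_tau, u, y, nu) <= 0.0:
        u_tau *= 0.5
    for _ in range(max_iter):
        f = log_law_residual(u_tau, u, y, nu)
        df = -u / u_tau**2 - 1.0 / (KAPPA * u_tau)
        step = f / df
        u_tau -= step
        if abs(step) < tol * u_tau:
            break
    return u_tau


def wall_shear_stress(rho, u, y, nu):
    """Modeled wall shear stress tau_w = rho * u_tau**2 imposed at the wall."""
    return rho * solve_friction_velocity(u, y, nu) ** 2


if __name__ == "__main__":
    # Illustrative air-like values (hypothetical, not from the paper).
    rho, nu = 1.2, 1.5e-5  # density [kg/m^3], kinematic viscosity [m^2/s]
    u, y = 10.0, 1.0e-3    # LES velocity [m/s] at matching height [m]
    u_tau = solve_friction_velocity(u, y, nu)
    print(f"u_tau = {u_tau:.4f} m/s, y+ = {y * u_tau / nu:.1f}, "
          f"tau_w = {wall_shear_stress(rho, u, y, nu):.4f} Pa")
```

In a compressible DG solver, a modeled τ_w of this kind would replace the viscous flux computed from the no-slip velocity gradient at wall faces; practical models usually also blend in the viscous sublayer, which this sketch omits.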

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents advances in the GALÆXI DGSEM framework for APU-accelerated LES on AMD MI300A hardware. It reports strong/weak scaling and partitioning-mode performance, describes the integration of wall-modeled LES (WMLES) algorithms, validates those algorithms on a plane turbulent channel, and applies the solver (as wall-resolved LES) to a transonic compressor cascade, claiming that the results demonstrate accurate capture of complex shock-wave/turbulent boundary-layer interactions.

Significance. If the scaling results and accuracy claims hold, the work provides a practical demonstration of porting a high-order DGSEM solver to emerging APU architectures and applying it to a demanding turbomachinery flow. The explicit linkage of hardware optimization, WMLES implementation, and a complex-geometry application is a strength for exascale CFD efforts.

major comments (1)
  1. [Abstract and application section] Abstract (final sentence) and application paragraph: the central claim that the results demonstrate accurate capture of shock-wave/turbulent boundary-layer interactions in the transonic compressor cascade lacks support. The manuscript validates the WMLES integration only on the plane channel test case and then applies wall-resolved LES to the cascade; no quantitative comparisons to experimental data or reference simulations are described for cascade-specific quantities such as shock location, surface pressure distributions, or boundary-layer profiles.
minor comments (1)
  1. [Title and abstract] The notation GAL{Æ}XI in the title and abstract should be rendered consistently as GALÆXI throughout.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract and application section] Abstract (final sentence) and application paragraph: the central claim that the results demonstrate accurate capture of shock-wave/turbulent boundary-layer interactions in the transonic compressor cascade lacks support. The manuscript validates the WMLES integration only on the plane channel test case and then applies wall-resolved LES to the cascade; no quantitative comparisons to experimental data or reference simulations are described for cascade-specific quantities such as shock location, surface pressure distributions, or boundary-layer profiles.

    Authors: We agree that the manuscript validates the WMLES algorithms solely on the plane turbulent channel and performs a wall-resolved LES of the transonic compressor cascade without providing quantitative comparisons (e.g., shock location, surface pressure, or boundary-layer profiles) to experiments or reference data. The final sentence of the abstract therefore overstates what the presented results support. We will revise the abstract and the application-section paragraph to state that the cascade simulation illustrates the solver's capability on a complex geometry involving shock-turbulence interactions, while removing the claim of demonstrated accuracy for this specific case. revision: yes

Circularity Check

0 steps flagged

No circularity; performance and validation are direct external measurements

full rationale

The manuscript reports direct hardware scaling measurements on AMD MI300A APUs and validates the WMLES integration against the standard plane turbulent channel benchmark before applying the solver (as wall-resolved LES) to the cascade. No equations, fitted parameters, or predictions are presented that reduce by construction to the paper's own inputs or prior self-citations. The central claims rest on external benchmarks and standard test cases rather than self-referential definitions or load-bearing self-citation chains, making the work self-contained against external references.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper introduces no new free parameters, physical axioms, or invented entities; it relies on the standard mathematical properties of the discontinuous Galerkin method and established LES modeling assumptions already present in the literature.

axioms (1)
  • domain assumption: Standard mathematical properties of the discontinuous Galerkin spectral element method hold on the target hardware
    Invoked implicitly when claiming that the framework can be ported while retaining high-order accuracy.
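This assumption can be checked empirically after a port: a grid-convergence study on a smooth problem should recover the design order of the scheme, typically close to N + 1 for polynomial degree N. A minimal sketch of the experimental order of convergence (EOC) computation follows; the function name is hypothetical, and the errors are synthetic values generated from C·h^(N+1), standing in for measured L2 errors rather than results from the paper.

```python
import math


def experimental_order_of_convergence(h, err):
    """EOC between consecutive meshes: ln(e_k / e_{k+1}) / ln(h_k / h_{k+1})."""
    return [
        math.log(err[k] / err[k + 1]) / math.log(h[k] / h[k + 1])
        for k in range(len(h) - 1)
    ]


if __name__ == "__main__":
    N = 3                               # polynomial degree
    h = [1 / 4, 1 / 8, 1 / 16, 1 / 32]  # element sizes
    # Synthetic errors e = C * h^(N + 1) standing in for measured L2 errors.
    err = [2.0e-1 * hk ** (N + 1) for hk in h]
    for k, p in enumerate(experimental_order_of_convergence(h, err)):
        print(f"h = {h[k]:.4f} -> {h[k + 1]:.4f}: EOC = {p:.2f}")
```

Running the same study on CPU and APU builds and comparing the resulting rates would show whether the port preserved the scheme's order, independent of raw performance.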

pith-pipeline@v0.9.1-grok · 5784 in / 1232 out tokens · 22248 ms · 2026-06-26T19:37:23.825013+00:00 · methodology

