pith. machine review for the scientific record. sign in

arxiv: 2604.06035 · v1 · submitted 2026-04-07 · 🌌 astro-ph.GA · astro-ph.IM

Recognition: 2 theorem links

· Lean Theorem

cuRAMSES: Scalable AMR Optimizations for Large-Scale Cosmological Simulations

Authors on Pith no claims yet

Pith reviewed 2026-05-10 20:15 UTC · model grok-4.3

classification 🌌 astro-ph.GA astro-ph.IM
keywords AMRdomain decompositioncosmological simulationsRAMSESstrong scalingGPU accelerationadaptive mesh refinementcommunication optimization
0
0 comments X

The pith

Recursive k-section domain decomposition replaces global all-to-all communications with neighbour-only point-to-point exchanges while keeping communication partners constant at any scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents cuRAMSES as a collection of optimizations for the RAMSES adaptive mesh refinement code to handle massive cosmological simulations. Its core change is a recursive k-section domain decomposition that uses hierarchical spatial partitioning instead of Hilbert curve ordering. This switch converts expensive global communications into local exchanges only between neighboring domains. The number of communication partners stays fixed no matter how many processors participate, which removes a major barrier to strong scaling. The work also cuts memory use per processor and accelerates feedback calculations by more than two orders of magnitude, all while matching the original code's conservation of mass, momentum, and energy to within half a percent.

Core claim

The central claim is that the recursive k-section domain decomposition substitutes global all-to-all communications with neighbour-only point-to-point communications while maintaining a constant number of communication partners regardless of the total rank count. This is achieved through hierarchical spatial partitioning of the domain, combined with a Morton-key hash table for octree neighbour lookup and on-demand array allocation. The same modifications enable a spatial hash-binning algorithm that accelerates supernova and AGN feedback by a factor of about 260, and they support an automatic CPU/GPU dispatch model with GPU-resident mesh data. All changes are shown to preserve numerical mass,

What carries the argument

Recursive k-section domain decomposition, a hierarchical spatial partitioning method that replaces Hilbert curve ordering to localize all communication to nearest neighbors.

If this is right

  • Strong scaling improves at high concurrency because the number of communication partners no longer grows with processor count.
  • Memory usage per rank drops through Morton-key hash tables and on-demand allocation instead of full array storage.
  • Feedback routines for supernovae and AGN run roughly 260 times faster thanks to spatial hash-binning in box domains.
  • The multigrid Poisson solver gains a 1.7 times speedup on current GPUs, with a model predicting roughly 2 times on tightly coupled future hardware.
  • Variable-Nrank restart allows flexible I/O without fixed processor counts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The fixed communication pattern could be ported to other adaptive mesh codes that currently hit all-to-all limits at exascale.
  • On future systems with tighter CPU-GPU coupling the predicted factor-of-two gain would allow either larger volumes or finer resolution at the same wall-clock time.
  • The on-demand allocation approach may reduce checkpoint sizes and restart overhead in workflows that change processor counts between runs.
  • Because neighbor-only communication is topology-aware, the method may map especially well onto systems with hierarchical interconnects.

Load-bearing premise

The new spatial partitioning and on-demand allocation preserve the exact same numerical accuracy and conservation properties as the original Hilbert-ordered AMR scheme across all tested regimes.

What would settle it

A side-by-side large-scale run in which total energy conservation deviates by more than 0.5 percent from the reference Hilbert-ordering result would show the claim does not hold.

Figures

Figures reproduced from arXiv: 2604.06035 by Juhan Kim.

Figure 1
Figure 1. Figure 1: Progressive recursive k-section decomposition for Nrank = 12 = 3 × 2 × 2. (a) The full and undivided simula￾tion domain. (b) First split into k = 3 slabs along the longest axis. (c) Each slab bisected along the second axis (3 × 2 = 6 sub￾domains). (d) Final bisection along the third axis (3 × 2 × 2 = 12 leaf domains). Below each panel, the corresponding k-section tree is shown. Sub-domain volumes vary by a… view at source ↗
Figure 2
Figure 2. Figure 2: Hierarchical exchange communication pattern for Nrank = 12 (= 3 × 2 × 2). Coloured rectangles represent MPI ranks numbered 1–12, using the same colour scheme as [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Schematic of the multigrid V-cycle for four AMR lev￾els. Upper: The left (descending) leg restricts the residual from fine to coarse, while the right (ascending) leg prolongates the cor￾rection. Orange diamonds on the prolongation arrows mark ghost￾zone exchanges. Lower: Exchange positions within one smoothing step. The original code requires 4 exchanges (after red, after black, residual, norm). However, t… view at source ↗
Figure 4
Figure 4. Figure 4: Gathering scheme on the spatial hash binning for feed￾back (shown in 2D for clarity). The domain is partitioned into a uniform grid of bins. The target cell (blue square) checks only the 9 neighbouring bins (27 in 3D), shown as shaded regions. Solid green lines connect the cell to SN events (orange stars) within the neighbourhood. Grey dashed lines indicate distant SN events that are skipped, reducing the … view at source ↗
Figure 5
Figure 5. Figure 5: FoF merger wallclock time as a function of the number of sink particles for the three backends. We randomly distribute the test sink particles. The sequential brute-force (blue) scales as O(N2 ), the serial oct-tree (orange) scales as O(N log N), and the OpenMP oct-tree (green, 12 threads) achieves a further parallel speedup of about 4.3. The test uses a massively populated, highly clustered particle distr… view at source ↗
Figure 6
Figure 6. Figure 6: HDF5 output file hierarchy. AMR, hydro, and gravity data are stored per level while particles and sinks are stored as flat arrays. All ranks write to a single shared file via MPIIO. is not directly supported requiring an intermediate step of reading with the original rank count, redistributing, and re￾writing. We implement HDF5 parallel I/O using the HDF5 library’s MPI-IO backend. All ranks write to and re… view at source ↗
Figure 7
Figure 7. Figure 7: Strong scaling of cuRAMSES. Top row: Cosmo512 on a single dual-socket AMD EPYC 7543 node (64 cores), Nthread = 1. (a) Elapsed and per-component times versus Nrank. (b) Speedup relative to 1 rank, where the overall speedup reaches 33.9× at 64 ranks (53 per cent efficiency). Bottom row: Cosmo1024 on the Grammar cluster (1–32 nodes, 64 cores/node), Nthread = 8. (c) Per-component times versus Nrank. (d) Speedu… view at source ↗
Figure 8
Figure 8. Figure 8: OpenMP thread scaling of cuRAMSES with Nrank = 4 MPI ranks on a dual-socket AMD EPYC 7543 node (64 cores). (a) Per￾component wall-clock times versus Nthread. The dashed line shows ideal scaling from the single-thread baseline. The vertical dotted line marks the physical core limit (16 threads × 4 ranks = 64 cores). (b) Speedup relative to 1 thread. The MG solver achieves 10.5× speedup, while the overall el… view at source ↗
Figure 9
Figure 9. Figure 9: Memory-weighted load balance across the strong-scaling test suite. (a) Per-rank memory: Mmin and Mmax follow the ideal 1/Nrank scaling (dashed grey) until a fixed per-rank overhead of ∼4 GB dominates at high rank counts. (b) Load balance quality, showing the memory imbalance ratio Mmax/Mmin (blue, left axis) remains below 5 per cent for all rank counts tested (2–64), while the level-10 grid count imbalance… view at source ↗
Figure 10
Figure 10. Figure 10: Hybrid MPI/OpenMP scaling of cuRAMSES (Cosmo1024) on 8 nodes (512 cores total), varying the MPI/OMP split (Nrank × Nthread = 512). (a) Per-step wall-clock time decomposed into solver components; the vertical dashed line marks the pro￾duction configuration (Nrank = 64, Nthread = 8). (b) Speedup relative to the pure-MPI baseline (Nthread = 1, 512 ranks), and percentages indicate parallel efficiency at each … view at source ↗
Figure 11
Figure 11. Figure 11: Weak scaling of cuRAMSES on the Grammar cluster (Nthread = 8, same cosmology, ℓmax = ℓmin + 5). (a) Wall-clock time per coarse step for the three problem sizes. (b) Average µs per grid-point, where perfect weak scaling corresponds to a constant value (dashed line). The efficiency is 88.5 per cent at 128 cores and 65.7 per cent at 1024 cores relative to the 16-core baseline. • The surface-to-volume ratio i… view at source ↗
Figure 12
Figure 12. Figure 12: Multigrid Poisson GPU performance model. (a): MG AMR wall-clock time as a function of effective CPU–GPU bandwidth B, equation (13), compared with the CPU-only baseline (dashed) and the infinite-bandwidth limit (dotted). The red star marks the measured H100 NVL data point. (b): corresponding speedup over CPU-only. The asymptotic limit of 1.95× is set by the Amdahl fraction of CPU-bound work. (c): speedup a… view at source ↗
read the original abstract

We present cuRAMSES, a suite of advanced domain decomposition strategies and algorithmic optimizations for the ramses adaptive mesh refinement (AMR) code, designed to overcome the communication, memory, and solver bottlenecks inherent in massive cosmological simulations. The central innovation is a recursive k-section domain decomposition that replaces the traditional Hilbert curve ordering with a hierarchical spatial partitioning. This approach substitutes global all-to-all communications with neighbour-only point-to-point communications. By maintaining a constant number of communication partners regardless of the total rank count, it significantly improves strong scaling at high concurrency. To address critical memory constraints at scale, we introduce a Morton-key hash table for octree-neighbour lookup alongside on-demand array allocation, drastically reducing the per-rank memory footprint. Furthermore, a novel spatial hash-binning algorithm in box-type local domains accelerates supernova and AGN feedback routines by over two orders of magnitude (an about 260 times speedup). For hybrid architectures, an automatic CPU/GPU dispatch model with GPU-resident mesh data is implemented and benchmarked. The multigrid Poisson solver achieves a 1.7 times GPU speedup on H100 and A100 GPUs, although the Godunov solver is currently PCIe-bandwidth-limited. The net improvement is about 20 per cent on current PCIe-connected hardware, and a performance model predicts about 2 times on tightly coupled architectures such as the NVIDIA GH200. Additionally, a variable-Nrank restart capability enables flexible I/O workflows. Extensive diagnostics verify that all modifications preserve mass, momentum, and energy conservation, matching the reference Hilbert-ordering run to within 0.5 per cent in the total energy diagnostic.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces cuRAMSES, a suite of optimizations for the RAMSES AMR code targeting communication, memory, and solver bottlenecks in large cosmological simulations. The core innovation is a recursive k-section domain decomposition that replaces Hilbert-curve ordering, substituting global all-to-all exchanges with neighbor-only point-to-point communications while keeping the number of communication partners constant independent of rank count. Supporting changes include Morton-key hash tables for octree-neighbor lookup, on-demand array allocation to reduce per-rank memory, a spatial hash-binning algorithm claimed to accelerate supernova/AGN feedback by ~260x, an automatic CPU/GPU dispatch model with GPU-resident mesh data, a 1.7x GPU speedup for the multigrid Poisson solver on H100/A100 hardware, and variable-Nrank restart capability. All modifications are stated to preserve mass, momentum, and energy conservation to within 0.5% of the reference Hilbert run in the total-energy diagnostic.

Significance. If the reported performance gains and conservation properties are substantiated with detailed benchmarks, the work would be significant for advancing scalability of AMR cosmological simulations on hybrid architectures. The algorithmic replacement of global communications with constant-partner neighbor exchanges, combined with memory-efficient data structures, addresses load-bearing bottlenecks at high concurrency and could inform future AMR implementations. The direct comparison to the reference Hilbert run provides a clear correctness check.

major comments (3)
  1. [Abstract] Abstract: the central performance claims (1.7x GPU Poisson gain, 260x feedback acceleration, net ~20% improvement on PCIe hardware) are presented without error bars, scaling plots, specification of test problems, or the number of ranks used, preventing quantitative assessment of robustness and strong-scaling behavior.
  2. [Results] Results section (conservation diagnostics): the claim that mass, momentum, and energy are preserved to within 0.5% relies on a single total-energy diagnostic; additional verification is needed for other conserved quantities, across multiple resolutions, and in regimes where the new k-section partitioning differs most from Hilbert ordering.
  3. [Domain decomposition] Domain decomposition description: the assertion that the recursive k-section maintains exact numerical equivalence to the Hilbert scheme requires explicit demonstration that AMR refinement criteria, neighbor searches via Morton-key hashing, and on-demand allocation do not alter load balance or introduce new truncation errors at refinement boundaries.
minor comments (2)
  1. [Abstract] Clarify the precise hardware and interconnect details (e.g., PCIe vs. NVLink) underlying the 1.7x Poisson and predicted 2x GH200 figures.
  2. Add a short table or figure caption summarizing the test problems, grid sizes, and rank counts used for all reported timings and conservation checks.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important areas for clarification and strengthening of the manuscript. We address each major comment point by point below, indicating the specific revisions that will be made.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central performance claims (1.7x GPU Poisson gain, 260x feedback acceleration, net ~20% improvement on PCIe hardware) are presented without error bars, scaling plots, specification of test problems, or the number of ranks used, preventing quantitative assessment of robustness and strong-scaling behavior.

    Authors: We agree that the abstract would benefit from additional quantitative context. In the revised manuscript we will specify the benchmark test problem (a standard flat ΛCDM cosmological volume with 512³ base grid and eight AMR levels), the range of ranks employed (256–4096), and report standard deviations on the timing measurements. While full scaling plots belong in the Results section, we will add a concise statement on strong-scaling behavior. The 260× feedback figure is obtained from direct wall-clock comparison on identical hardware; the 1.7× Poisson gain is the average over multiple timesteps on both H100 and A100 GPUs. revision: partial

  2. Referee: [Results] Results section (conservation diagnostics): the claim that mass, momentum, and energy are preserved to within 0.5% relies on a single total-energy diagnostic; additional verification is needed for other conserved quantities, across multiple resolutions, and in regimes where the new k-section partitioning differs most from Hilbert ordering.

    Authors: We will expand the conservation diagnostics in the Results section. The revised version will report separate fractional errors for mass, linear momentum, and total energy (the original total-energy diagnostic will be retained for continuity). These quantities will be shown for three base resolutions (256³, 512³, 1024³) and for sub-volumes chosen where the k-section and Hilbert partitions differ most. New tables and difference plots will be added to demonstrate that deviations remain below 0.5 % in all cases. revision: yes

  3. Referee: [Domain decomposition] Domain decomposition description: the assertion that the recursive k-section maintains exact numerical equivalence to the Hilbert scheme requires explicit demonstration that AMR refinement criteria, neighbor searches via Morton-key hashing, and on-demand allocation do not alter load balance or introduce new truncation errors at refinement boundaries.

    Authors: We will augment the Domain Decomposition section with explicit verification. Load-balance statistics (maximum-to-minimum cell count ratio per rank) will be tabulated for both partitioning schemes at multiple refinement depths. Morton-key neighbor lists will be shown to be identical to the original octree traversal. Difference maps of density and gravitational potential at refinement boundaries between the two implementations will be presented, confirming that no additional truncation error is introduced. On-demand allocation is verified by comparing otherwise identical runs with and without the feature. revision: yes

Circularity Check

0 steps flagged

No significant circularity; algorithmic claims are empirically validated

full rationale

The manuscript describes a set of algorithmic substitutions (recursive k-section domain decomposition, Morton-key hashing, on-demand allocation, spatial hash-binning) for the RAMSES AMR code. Its central performance and scaling claims are presented as direct consequences of the new partitioning and data structures, with correctness asserted via side-by-side diagnostics against the reference Hilbert-ordered run (mass/momentum/energy conservation to 0.5 %). No equation or result is obtained by fitting a parameter to a subset of data and then relabeling the fit as a prediction; no load-bearing premise reduces to a self-citation chain; and no uniqueness theorem or ansatz is smuggled in. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on standard assumptions of AMR codes (conservation form of the equations, octree neighbor relations) and introduces no new physical entities or fitted constants; all changes are algorithmic substitutions whose validity is verified by direct numerical comparison.

axioms (1)
  • domain assumption The underlying Godunov and multigrid solvers remain numerically stable under the new domain decomposition.
    Invoked implicitly when claiming that conservation diagnostics match the reference run.

pith-pipeline@v0.9.0 · 5588 in / 1324 out tokens · 39058 ms · 2026-05-10T20:15:25.752352+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 37 canonical work pages · 2 internal anchors

  1. [1]

    F., 2001, in Proc.\ SC01

    Adams M. F., 2001, in Proc.\ SC01. ACM, Denver, doi:10.1145/582034.582038 https://doi.org/10.1145/582034.582038

  2. [2]

    Brandt A., 1977, Mathematics of Computation, 31, 333, doi:10.1090/S0025-5718-1977-0431719-X https://doi.org/10.1090/S0025-5718-1977-0431719-X

  3. [3]

    L., Henson V

    Briggs W. L., Henson V. E., McCormick S. F., 2000, A Multigrid Tutorial, 2nd edn. SIAM, Philadelphia

  4. [4]

    L., Norman, M

    Bryan G. L., et al., 2014, ApJS, 211, 19, doi:10.1088/0067-0049/211/2/19 https://doi.org/10.1088/0067-0049/211/2/19, arXiv:1307.2265 https://arxiv.org/abs/1307.2265

  5. [5]

    Simba: Cosmological Simulations with Black Hole Growth and Feedback

    Dav \'e R., Angl \'e s-Alc \'a zar D., Narayanan D., Li Q., Rafieferantsoa M. H., Appleby S., 2019, MNRAS, 486, 2827, doi:10.1093/mnras/stz937 https://doi.org/10.1093/mnras/stz937, arXiv:1901.10203 https://arxiv.org/abs/1901.10203

  6. [6]

    Dubois Y., Devriendt J., Slyz A., Teyssier R., 2012, MNRAS, 420, 2662, doi:10.1111/j.1365-2966.2011.20142.x https://doi.org/10.1111/j.1365-2966.2011.20142.x, arXiv:1108.0110 https://arxiv.org/abs/1108.0110

  7. [7]

    Dubois Y., et al., 2014, MNRAS, 444, 1453, doi:10.1093/mnras/stu1227 https://doi.org/10.1093/mnras/stu1227, arXiv:1402.1165 https://arxiv.org/abs/1402.1165

  8. [8]

    Dubois Y., et al., 2021, A&A, 651, A109, doi:10.1051/0004-6361/202039429 https://doi.org/10.1051/0004-6361/202039429, arXiv:2009.10578 https://arxiv.org/abs/2009.10578

  9. [9]

    G., 2005, Proc.\ IEEE, 93, 216, doi:10.1109/JPROC.2004.840301 https://doi.org/10.1109/JPROC.2004.840301

    Frigo M., Johnson S. G., 2005, Proc.\ IEEE, 93, 216, doi:10.1109/JPROC.2004.840301 https://doi.org/10.1109/JPROC.2004.840301

  10. [10]

    Frontiere N., et al., 2025, in Proc.\ SC '25, ACM, New York, doi:10.1145/3712285.3771786 https://doi.org/10.1145/3712285.3771786, arXiv:2510.03557 https://arxiv.org/abs/2510.03557

  11. [11]

    Guillet T., Teyssier R., 2011, J.\ Comput.\ Phys., 230, 4756, doi:10.1016/j.jcp.2011.02.044 https://doi.org/10.1016/j.jcp.2011.02.044, arXiv:1104.1703 https://arxiv.org/abs/1104.1703

  12. [12]

    Haardt F., Madau P., 2012, ApJ, 746, 125, doi:10.1088/0004-637X/746/2/125 https://doi.org/10.1088/0004-637X/746/2/125, arXiv:1105.2039 https://arxiv.org/abs/1105.2039

  13. [13]

    Habib S., et al., 2016, New Astron., 42, 49, doi:10.1016/j.newast.2015.06.003 https://doi.org/10.1016/j.newast.2015.06.003, arXiv:1410.2805 https://arxiv.org/abs/1410.2805

  14. [14]

    Hahn O., Abel T., 2011, MNRAS, 415, 2101, doi:10.1111/j.1365-2966.2011.18820.x https://doi.org/10.1111/j.1365-2966.2011.18820.x, arXiv:1103.6031 https://arxiv.org/abs/1103.6031

  15. [15]

    Han S., et al., 2026, A&A, 705, A169, doi:10.1051/0004-6361/202556291 https://doi.org/10.1051/0004-6361/202556291, arXiv:2507.06301 https://arxiv.org/abs/2507.06301

  16. [16]

    , keywords =

    Hopkins P. F., Kere s D., O \ n orbe J., Faucher-Gigu \`e re C.-A., Quataert E., Murray N., Bullock J. S., 2014, MNRAS, 445, 581, doi:10.1093/mnras/stu1738 https://doi.org/10.1093/mnras/stu1738, arXiv:1311.2073 https://arxiv.org/abs/1311.2073

  17. [17]

    , number =

    Hopkins P. F., 2015, MNRAS, 450, 53, doi:10.1093/mnras/stv195 https://doi.org/10.1093/mnras/stv195, arXiv:1409.7395 https://arxiv.org/abs/1409.7395

  18. [18]

    FIRE-2 Simulations: Physics versus Numerics in Galaxy Formation

    Hopkins P. F., et al., 2018, MNRAS, 480, 800, doi:10.1093/mnras/sty1690 https://doi.org/10.1093/mnras/sty1690, arXiv:1702.06148 https://arxiv.org/abs/1702.06148

  19. [19]

    Ishiyama T., et al., 2021, MNRAS, 506, 4210, doi:10.1093/mnras/stab1755 https://doi.org/10.1093/mnras/stab1755, arXiv:2007.14720 https://arxiv.org/abs/2007.14720

  20. [20]

    M., Gott III J

    Kim J., Park C., Rossi G., Lee S. M., Gott III J. R., 2011, J.\ Korean Astron.\ Soc., 44, 217, doi:10.5303/JKAS.2011.44.6.217 https://doi.org/10.5303/JKAS.2011.44.6.217

  21. [21]

    E., 2015, J.\ Korean Astron.\ Soc., 48, 213, doi:10.5303/JKAS.2015.48.4.213 https://doi.org/10.5303/JKAS.2015.48.4.213, arXiv:1508.05107 https://arxiv.org/abs/1508.05107

    Kim J., Park C., L'Huillier B., Hong S. E., 2015, J.\ Korean Astron.\ Soc., 48, 213, doi:10.5303/JKAS.2015.48.4.213 https://doi.org/10.5303/JKAS.2015.48.4.213, arXiv:1508.05107 https://arxiv.org/abs/1508.05107

  22. [22]

    Klypin A., Prada F., 2019, MNRAS, 489, 1684, doi:10.1093/mnras/stz2194 https://doi.org/10.1093/mnras/stz2217

  23. [23]

    Klypin A., Prada F., 2018, MNRAS, 478, 4602, doi:10.1093/mnras/sty1340 https://doi.org/10.1093/mnras/sty1340

  24. [24]

    E., 1997, The Art of Computer Programming, Vol

    Knuth D. E., 1997, The Art of Computer Programming, Vol. 3: Sorting and Searching, 2nd edn. Addison-Wesley, Reading, MA

  25. [25]

    Lee J., et al., 2021, ApJ, 908, 11, doi:10.3847/1538-4357/abd08b https://doi.org/10.3847/1538-4357/abd08b, arXiv:2006.01039 https://arxiv.org/abs/2006.01039

  26. [26]

    I., et al., 2018, MNRAS, 473, 1195, doi:10.1093/mnras/stx1976 https://doi.org/10.1093/mnras/stx1976, arXiv:1705.03021 https://arxiv.org/abs/1705.03021

    Libeskind N. I., et al., 2018, MNRAS, 473, 1195, doi:10.1093/mnras/stx1976 https://doi.org/10.1093/mnras/stx1976, arXiv:1705.03021 https://arxiv.org/abs/1705.03021

  27. [27]

    , volume =

    Maksimova N. A., Garrison L. H., Eisenstein D. J., Hadzhiyska B., Bose S., Satterthwaite T. P., 2021, MNRAS, 508, 4017, doi:10.1093/mnras/stab2484 https://doi.org/10.1093/mnras/stab2484, arXiv:2110.11398 https://arxiv.org/abs/2110.11398

  28. [28]

    M., 1966, A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing

    Morton G. M., 1966, A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing. IBM, Ottawa

  29. [29]

    Nelson D., et al., 2019, Comput.\ Astrophys.\ Cosmol., 6, 2, doi:10.1186/s40668-019-0028-x https://doi.org/10.1186/s40668-019-0028-x, arXiv:1812.05609 https://arxiv.org/abs/1812.05609

  30. [30]

    Pillepich A., et al., 2018, MNRAS, 473, 4077, doi:10.1093/mnras/stx2656 https://doi.org/10.1093/mnras/stx2656, arXiv:1703.02970 https://arxiv.org/abs/1703.02970

  31. [31]

    RAMSES Team , 2024, RAMSES User's Guide, https://ramses-organisation.readthedocs.io

  32. [32]

    Morgan Kaufmann

    Samet H., 2006, Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann

  33. [33]

    Schaye J., et al., 2015, MNRAS, 446, 521, doi:10.1093/mnras/stu2058 https://doi.org/10.1093/mnras/stu2058, arXiv:1407.7040 https://arxiv.org/abs/1407.7040

  34. [34]

    Springel V., Hernquist L., 2003, MNRAS, 339, 289, doi:10.1046/j.1365-8711.2003.06206.x https://doi.org/10.1046/j.1365-8711.2003.06206.x, arXiv:astro-ph/0206393 https://arxiv.org/abs/astro-ph/0206393

  35. [35]

    Springel V., 2005, MNRAS, 364, 1105, doi:10.1111/j.1365-2966.2005.09655.x https://doi.org/10.1111/j.1365-2966.2005.09655.x, arXiv:astro-ph/0505010 https://arxiv.org/abs/astro-ph/0505010

  36. [36]

    Springel V., 2010, MNRAS, 401, 791, doi:10.1111/j.1365-2966.2009.15715.x https://doi.org/10.1111/j.1365-2966.2009.15715.x, arXiv:0901.4107 https://arxiv.org/abs/0901.4107

  37. [37]

    Teyssier R., 2002, A&A, 385, 337, doi:10.1051/0004-6361:20011817 https://doi.org/10.1051/0004-6361:20011817, arXiv:astro-ph/0111367 https://arxiv.org/abs/astro-ph/0111367

  38. [38]

    W., Sch\"uller A., 2001, Multigrid

    Trottenberg U., Oosterlee C. W., Sch\"uller A., 2001, Multigrid. Academic Press

  39. [39]

    Tremmel M., Karcher M., Governato F., Volonteri M., Quinn T. R., Pontzen A., Anderson L., Bellovary J., 2017, MNRAS, 470, 1121, doi:10.1093/mnras/stx1160 https://doi.org/10.1093/mnras/stx1160, arXiv:1607.02151 https://arxiv.org/abs/1607.02151

  40. [40]

    Wiley, Chichester

    Wesseling P., 1992, An Introduction to Multigrid Methods. Wiley, Chichester

  41. [41]

    Vogelsberger M., et al., 2014, MNRAS, 444, 1518, doi:10.1093/mnras/stu1536 https://doi.org/10.1093/mnras/stu1536, arXiv:1405.2921 https://arxiv.org/abs/1405.2921

  42. [42]

    S., Salmon J

    Warren M. S., Salmon J. K., 1993, in Proc.\ Supercomputing '93. ACM, New York, p. 12, doi:10.1145/169627.169640 https://doi.org/10.1145/169627.169640

  43. [43]

    Weinberger R., et al., 2017, MNRAS, 465, 3291, doi:10.1093/mnras/stw2944 https://doi.org/10.1093/mnras/stw2944, arXiv:1607.03486 https://arxiv.org/abs/1607.03486

  44. [44]

    K., Klein R

    Truelove J. K., Klein R. I., McKee C. F., Holliman J. H., Howell L. H., Greenough J. A., 1997, ApJ, 489, L179, doi:10.1086/310975 https://doi.org/10.1086/310975