arxiv: 2604.06035 · v1 · submitted 2026-04-07 · 🌌 astro-ph.GA · astro-ph.IM

Recognition: 2 theorem links

· Lean Theorem

cuRAMSES: Scalable AMR Optimizations for Large-Scale Cosmological Simulations

Juhan Kim

Authors on Pith no claims yet

Pith reviewed 2026-05-10 20:15 UTC · model grok-4.3

classification 🌌 astro-ph.GA astro-ph.IM

keywords AMRdomain decompositioncosmological simulationsRAMSESstrong scalingGPU accelerationadaptive mesh refinementcommunication optimization

0 comments

The pith

Recursive k-section domain decomposition replaces global all-to-all communications with neighbour-only point-to-point exchanges while keeping communication partners constant at any scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents cuRAMSES as a collection of optimizations for the RAMSES adaptive mesh refinement code to handle massive cosmological simulations. Its core change is a recursive k-section domain decomposition that uses hierarchical spatial partitioning instead of Hilbert curve ordering. This switch converts expensive global communications into local exchanges only between neighboring domains. The number of communication partners stays fixed no matter how many processors participate, which removes a major barrier to strong scaling. The work also cuts memory use per processor and accelerates feedback calculations by more than two orders of magnitude, all while matching the original code's conservation of mass, momentum, and energy to within half a percent.

Core claim

The central claim is that the recursive k-section domain decomposition substitutes global all-to-all communications with neighbour-only point-to-point communications while maintaining a constant number of communication partners regardless of the total rank count. This is achieved through hierarchical spatial partitioning of the domain, combined with a Morton-key hash table for octree neighbour lookup and on-demand array allocation. The same modifications enable a spatial hash-binning algorithm that accelerates supernova and AGN feedback by a factor of about 260, and they support an automatic CPU/GPU dispatch model with GPU-resident mesh data. All changes are shown to preserve numerical mass,

What carries the argument

Recursive k-section domain decomposition, a hierarchical spatial partitioning method that replaces Hilbert curve ordering to localize all communication to nearest neighbors.

If this is right

Strong scaling improves at high concurrency because the number of communication partners no longer grows with processor count.
Memory usage per rank drops through Morton-key hash tables and on-demand allocation instead of full array storage.
Feedback routines for supernovae and AGN run roughly 260 times faster thanks to spatial hash-binning in box domains.
The multigrid Poisson solver gains a 1.7 times speedup on current GPUs, with a model predicting roughly 2 times on tightly coupled future hardware.
Variable-Nrank restart allows flexible I/O without fixed processor counts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The fixed communication pattern could be ported to other adaptive mesh codes that currently hit all-to-all limits at exascale.
On future systems with tighter CPU-GPU coupling the predicted factor-of-two gain would allow either larger volumes or finer resolution at the same wall-clock time.
The on-demand allocation approach may reduce checkpoint sizes and restart overhead in workflows that change processor counts between runs.
Because neighbor-only communication is topology-aware, the method may map especially well onto systems with hierarchical interconnects.

Load-bearing premise

The new spatial partitioning and on-demand allocation preserve the exact same numerical accuracy and conservation properties as the original Hilbert-ordered AMR scheme across all tested regimes.

What would settle it

A side-by-side large-scale run in which total energy conservation deviates by more than 0.5 percent from the reference Hilbert-ordering result would show the claim does not hold.

Figures

Figures reproduced from arXiv: 2604.06035 by Juhan Kim.

**Figure 1.** Figure 1: Progressive recursive k-section decomposition for Nrank = 12 = 3 × 2 × 2. (a) The full and undivided simulation domain. (b) First split into k = 3 slabs along the longest axis. (c) Each slab bisected along the second axis (3 × 2 = 6 subdomains). (d) Final bisection along the third axis (3 × 2 × 2 = 12 leaf domains). Below each panel, the corresponding k-section tree is shown. Sub-domain volumes vary by a… view at source ↗

**Figure 2.** Figure 2: Hierarchical exchange communication pattern for Nrank = 12 (= 3 × 2 × 2). Coloured rectangles represent MPI ranks numbered 1–12, using the same colour scheme as [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Schematic of the multigrid V-cycle for four AMR levels. Upper: The left (descending) leg restricts the residual from fine to coarse, while the right (ascending) leg prolongates the correction. Orange diamonds on the prolongation arrows mark ghostzone exchanges. Lower: Exchange positions within one smoothing step. The original code requires 4 exchanges (after red, after black, residual, norm). However, t… view at source ↗

**Figure 4.** Figure 4: Gathering scheme on the spatial hash binning for feedback (shown in 2D for clarity). The domain is partitioned into a uniform grid of bins. The target cell (blue square) checks only the 9 neighbouring bins (27 in 3D), shown as shaded regions. Solid green lines connect the cell to SN events (orange stars) within the neighbourhood. Grey dashed lines indicate distant SN events that are skipped, reducing the … view at source ↗

**Figure 5.** Figure 5: FoF merger wallclock time as a function of the number of sink particles for the three backends. We randomly distribute the test sink particles. The sequential brute-force (blue) scales as O(N2 ), the serial oct-tree (orange) scales as O(N log N), and the OpenMP oct-tree (green, 12 threads) achieves a further parallel speedup of about 4.3. The test uses a massively populated, highly clustered particle distr… view at source ↗

**Figure 6.** Figure 6: HDF5 output file hierarchy. AMR, hydro, and gravity data are stored per level while particles and sinks are stored as flat arrays. All ranks write to a single shared file via MPIIO. is not directly supported requiring an intermediate step of reading with the original rank count, redistributing, and rewriting. We implement HDF5 parallel I/O using the HDF5 library’s MPI-IO backend. All ranks write to and re… view at source ↗

**Figure 7.** Figure 7: Strong scaling of cuRAMSES. Top row: Cosmo512 on a single dual-socket AMD EPYC 7543 node (64 cores), Nthread = 1. (a) Elapsed and per-component times versus Nrank. (b) Speedup relative to 1 rank, where the overall speedup reaches 33.9× at 64 ranks (53 per cent efficiency). Bottom row: Cosmo1024 on the Grammar cluster (1–32 nodes, 64 cores/node), Nthread = 8. (c) Per-component times versus Nrank. (d) Speedu… view at source ↗

**Figure 8.** Figure 8: OpenMP thread scaling of cuRAMSES with Nrank = 4 MPI ranks on a dual-socket AMD EPYC 7543 node (64 cores). (a) Percomponent wall-clock times versus Nthread. The dashed line shows ideal scaling from the single-thread baseline. The vertical dotted line marks the physical core limit (16 threads × 4 ranks = 64 cores). (b) Speedup relative to 1 thread. The MG solver achieves 10.5× speedup, while the overall el… view at source ↗

**Figure 9.** Figure 9: Memory-weighted load balance across the strong-scaling test suite. (a) Per-rank memory: Mmin and Mmax follow the ideal 1/Nrank scaling (dashed grey) until a fixed per-rank overhead of ∼4 GB dominates at high rank counts. (b) Load balance quality, showing the memory imbalance ratio Mmax/Mmin (blue, left axis) remains below 5 per cent for all rank counts tested (2–64), while the level-10 grid count imbalance… view at source ↗

**Figure 10.** Figure 10: Hybrid MPI/OpenMP scaling of cuRAMSES (Cosmo1024) on 8 nodes (512 cores total), varying the MPI/OMP split (Nrank × Nthread = 512). (a) Per-step wall-clock time decomposed into solver components; the vertical dashed line marks the production configuration (Nrank = 64, Nthread = 8). (b) Speedup relative to the pure-MPI baseline (Nthread = 1, 512 ranks), and percentages indicate parallel efficiency at each … view at source ↗

**Figure 11.** Figure 11: Weak scaling of cuRAMSES on the Grammar cluster (Nthread = 8, same cosmology, ℓmax = ℓmin + 5). (a) Wall-clock time per coarse step for the three problem sizes. (b) Average µs per grid-point, where perfect weak scaling corresponds to a constant value (dashed line). The efficiency is 88.5 per cent at 128 cores and 65.7 per cent at 1024 cores relative to the 16-core baseline. • The surface-to-volume ratio i… view at source ↗

**Figure 12.** Figure 12: Multigrid Poisson GPU performance model. (a): MG AMR wall-clock time as a function of effective CPU–GPU bandwidth B, equation (13), compared with the CPU-only baseline (dashed) and the infinite-bandwidth limit (dotted). The red star marks the measured H100 NVL data point. (b): corresponding speedup over CPU-only. The asymptotic limit of 1.95× is set by the Amdahl fraction of CPU-bound work. (c): speedup a… view at source ↗

read the original abstract

We present cuRAMSES, a suite of advanced domain decomposition strategies and algorithmic optimizations for the ramses adaptive mesh refinement (AMR) code, designed to overcome the communication, memory, and solver bottlenecks inherent in massive cosmological simulations. The central innovation is a recursive k-section domain decomposition that replaces the traditional Hilbert curve ordering with a hierarchical spatial partitioning. This approach substitutes global all-to-all communications with neighbour-only point-to-point communications. By maintaining a constant number of communication partners regardless of the total rank count, it significantly improves strong scaling at high concurrency. To address critical memory constraints at scale, we introduce a Morton-key hash table for octree-neighbour lookup alongside on-demand array allocation, drastically reducing the per-rank memory footprint. Furthermore, a novel spatial hash-binning algorithm in box-type local domains accelerates supernova and AGN feedback routines by over two orders of magnitude (an about 260 times speedup). For hybrid architectures, an automatic CPU/GPU dispatch model with GPU-resident mesh data is implemented and benchmarked. The multigrid Poisson solver achieves a 1.7 times GPU speedup on H100 and A100 GPUs, although the Godunov solver is currently PCIe-bandwidth-limited. The net improvement is about 20 per cent on current PCIe-connected hardware, and a performance model predicts about 2 times on tightly coupled architectures such as the NVIDIA GH200. Additionally, a variable-Nrank restart capability enables flexible I/O workflows. Extensive diagnostics verify that all modifications preserve mass, momentum, and energy conservation, matching the reference Hilbert-ordering run to within 0.5 per cent in the total energy diagnostic.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

cuRAMSES replaces Hilbert ordering with recursive k-section partitioning to cut all-to-all comms in RAMSES, plus hashing for memory and feedback speed, but the reported gains rest on thin benchmark details.

read the letter

The main point is that this paper swaps the usual Hilbert curve for recursive k-section domain decomposition in RAMSES. That change turns global all-to-all messages into neighbor-only point-to-point exchanges while holding the number of communication partners fixed no matter the rank count. They pair it with Morton-key hash tables for octree lookups, on-demand allocation to shrink memory, and a spatial hash-binning step that speeds feedback routines by roughly 260 times. GPU offload for the Poisson solver adds another 1.7 times on H100 and A100 cards, though the Godunov solver stays PCIe-limited and the net gain sits around 20 percent on current hardware with a model predicting 2 times on tighter systems like GH200. Conservation checks against the original code stay within 0.5 percent on total energy, which is a basic but useful sanity test.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces cuRAMSES, a suite of optimizations for the RAMSES AMR code targeting communication, memory, and solver bottlenecks in large cosmological simulations. The core innovation is a recursive k-section domain decomposition that replaces Hilbert-curve ordering, substituting global all-to-all exchanges with neighbor-only point-to-point communications while keeping the number of communication partners constant independent of rank count. Supporting changes include Morton-key hash tables for octree-neighbor lookup, on-demand array allocation to reduce per-rank memory, a spatial hash-binning algorithm claimed to accelerate supernova/AGN feedback by ~260x, an automatic CPU/GPU dispatch model with GPU-resident mesh data, a 1.7x GPU speedup for the multigrid Poisson solver on H100/A100 hardware, and variable-Nrank restart capability. All modifications are stated to preserve mass, momentum, and energy conservation to within 0.5% of the reference Hilbert run in the total-energy diagnostic.

Significance. If the reported performance gains and conservation properties are substantiated with detailed benchmarks, the work would be significant for advancing scalability of AMR cosmological simulations on hybrid architectures. The algorithmic replacement of global communications with constant-partner neighbor exchanges, combined with memory-efficient data structures, addresses load-bearing bottlenecks at high concurrency and could inform future AMR implementations. The direct comparison to the reference Hilbert run provides a clear correctness check.

major comments (3)

[Abstract] Abstract: the central performance claims (1.7x GPU Poisson gain, 260x feedback acceleration, net ~20% improvement on PCIe hardware) are presented without error bars, scaling plots, specification of test problems, or the number of ranks used, preventing quantitative assessment of robustness and strong-scaling behavior.
[Results] Results section (conservation diagnostics): the claim that mass, momentum, and energy are preserved to within 0.5% relies on a single total-energy diagnostic; additional verification is needed for other conserved quantities, across multiple resolutions, and in regimes where the new k-section partitioning differs most from Hilbert ordering.
[Domain decomposition] Domain decomposition description: the assertion that the recursive k-section maintains exact numerical equivalence to the Hilbert scheme requires explicit demonstration that AMR refinement criteria, neighbor searches via Morton-key hashing, and on-demand allocation do not alter load balance or introduce new truncation errors at refinement boundaries.

minor comments (2)

[Abstract] Clarify the precise hardware and interconnect details (e.g., PCIe vs. NVLink) underlying the 1.7x Poisson and predicted 2x GH200 figures.
Add a short table or figure caption summarizing the test problems, grid sizes, and rank counts used for all reported timings and conservation checks.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important areas for clarification and strengthening of the manuscript. We address each major comment point by point below, indicating the specific revisions that will be made.

read point-by-point responses

Referee: [Abstract] Abstract: the central performance claims (1.7x GPU Poisson gain, 260x feedback acceleration, net ~20% improvement on PCIe hardware) are presented without error bars, scaling plots, specification of test problems, or the number of ranks used, preventing quantitative assessment of robustness and strong-scaling behavior.

Authors: We agree that the abstract would benefit from additional quantitative context. In the revised manuscript we will specify the benchmark test problem (a standard flat ΛCDM cosmological volume with 512³ base grid and eight AMR levels), the range of ranks employed (256–4096), and report standard deviations on the timing measurements. While full scaling plots belong in the Results section, we will add a concise statement on strong-scaling behavior. The 260× feedback figure is obtained from direct wall-clock comparison on identical hardware; the 1.7× Poisson gain is the average over multiple timesteps on both H100 and A100 GPUs. revision: partial
Referee: [Results] Results section (conservation diagnostics): the claim that mass, momentum, and energy are preserved to within 0.5% relies on a single total-energy diagnostic; additional verification is needed for other conserved quantities, across multiple resolutions, and in regimes where the new k-section partitioning differs most from Hilbert ordering.

Authors: We will expand the conservation diagnostics in the Results section. The revised version will report separate fractional errors for mass, linear momentum, and total energy (the original total-energy diagnostic will be retained for continuity). These quantities will be shown for three base resolutions (256³, 512³, 1024³) and for sub-volumes chosen where the k-section and Hilbert partitions differ most. New tables and difference plots will be added to demonstrate that deviations remain below 0.5 % in all cases. revision: yes
Referee: [Domain decomposition] Domain decomposition description: the assertion that the recursive k-section maintains exact numerical equivalence to the Hilbert scheme requires explicit demonstration that AMR refinement criteria, neighbor searches via Morton-key hashing, and on-demand allocation do not alter load balance or introduce new truncation errors at refinement boundaries.

Authors: We will augment the Domain Decomposition section with explicit verification. Load-balance statistics (maximum-to-minimum cell count ratio per rank) will be tabulated for both partitioning schemes at multiple refinement depths. Morton-key neighbor lists will be shown to be identical to the original octree traversal. Difference maps of density and gravitational potential at refinement boundaries between the two implementations will be presented, confirming that no additional truncation error is introduced. On-demand allocation is verified by comparing otherwise identical runs with and without the feature. revision: yes

Circularity Check

0 steps flagged

No significant circularity; algorithmic claims are empirically validated

full rationale

The manuscript describes a set of algorithmic substitutions (recursive k-section domain decomposition, Morton-key hashing, on-demand allocation, spatial hash-binning) for the RAMSES AMR code. Its central performance and scaling claims are presented as direct consequences of the new partitioning and data structures, with correctness asserted via side-by-side diagnostics against the reference Hilbert-ordered run (mass/momentum/energy conservation to 0.5 %). No equation or result is obtained by fitting a parameter to a subset of data and then relabeling the fit as a prediction; no load-bearing premise reduces to a self-citation chain; and no uniqueness theorem or ansatz is smuggled in. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on standard assumptions of AMR codes (conservation form of the equations, octree neighbor relations) and introduces no new physical entities or fitted constants; all changes are algorithmic substitutions whose validity is verified by direct numerical comparison.

axioms (1)

domain assumption The underlying Godunov and multigrid solvers remain numerically stable under the new domain decomposition.
Invoked implicitly when claiming that conservation diagnostics match the reference run.

pith-pipeline@v0.9.0 · 5588 in / 1324 out tokens · 39058 ms · 2026-05-10T20:15:25.752352+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

recursive k-section domain decomposition that replaces the traditional Hilbert curve ordering with a hierarchical spatial partitioning... constant number of communication partners regardless of the total rank count
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_injective unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Morton-key hash table for octree-neighbour lookup alongside on-demand array allocation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 37 canonical work pages · 2 internal anchors

[1]

F., 2001, in Proc.\ SC01

Adams M. F., 2001, in Proc.\ SC01. ACM, Denver, doi:10.1145/582034.582038 https://doi.org/10.1145/582034.582038

work page doi:10.1145/582034.582038 2001
[2]

Brandt A., 1977, Mathematics of Computation, 31, 333, doi:10.1090/S0025-5718-1977-0431719-X https://doi.org/10.1090/S0025-5718-1977-0431719-X

work page doi:10.1090/s0025-5718-1977-0431719-x 1977
[3]

L., Henson V

Briggs W. L., Henson V. E., McCormick S. F., 2000, A Multigrid Tutorial, 2nd edn. SIAM, Philadelphia

2000
[4]

L., Norman, M

Bryan G. L., et al., 2014, ApJS, 211, 19, doi:10.1088/0067-0049/211/2/19 https://doi.org/10.1088/0067-0049/211/2/19, arXiv:1307.2265 https://arxiv.org/abs/1307.2265

work page doi:10.1088/0067-0049/211/2/19 2014
[5]

Simba: Cosmological Simulations with Black Hole Growth and Feedback

Dav \'e R., Angl \'e s-Alc \'a zar D., Narayanan D., Li Q., Rafieferantsoa M. H., Appleby S., 2019, MNRAS, 486, 2827, doi:10.1093/mnras/stz937 https://doi.org/10.1093/mnras/stz937, arXiv:1901.10203 https://arxiv.org/abs/1901.10203

work page Pith review doi:10.1093/mnras/stz937 2019
[6]

Dubois Y., Devriendt J., Slyz A., Teyssier R., 2012, MNRAS, 420, 2662, doi:10.1111/j.1365-2966.2011.20142.x https://doi.org/10.1111/j.1365-2966.2011.20142.x, arXiv:1108.0110 https://arxiv.org/abs/1108.0110

work page doi:10.1111/j.1365-2966.2011.20142.x 2012
[7]

Dubois Y., et al., 2014, MNRAS, 444, 1453, doi:10.1093/mnras/stu1227 https://doi.org/10.1093/mnras/stu1227, arXiv:1402.1165 https://arxiv.org/abs/1402.1165

work page Pith review doi:10.1093/mnras/stu1227 2014
[8]

Dubois Y., et al., 2021, A&A, 651, A109, doi:10.1051/0004-6361/202039429 https://doi.org/10.1051/0004-6361/202039429, arXiv:2009.10578 https://arxiv.org/abs/2009.10578

work page doi:10.1051/0004-6361/202039429 2021
[9]

G., 2005, Proc.\ IEEE, 93, 216, doi:10.1109/JPROC.2004.840301 https://doi.org/10.1109/JPROC.2004.840301

Frigo M., Johnson S. G., 2005, Proc.\ IEEE, 93, 216, doi:10.1109/JPROC.2004.840301 https://doi.org/10.1109/JPROC.2004.840301

work page doi:10.1109/jproc.2004.840301 2005
[10]

Frontiere N., et al., 2025, in Proc.\ SC '25, ACM, New York, doi:10.1145/3712285.3771786 https://doi.org/10.1145/3712285.3771786, arXiv:2510.03557 https://arxiv.org/abs/2510.03557

work page doi:10.1145/3712285.3771786 2025
[11]

Guillet T., Teyssier R., 2011, J.\ Comput.\ Phys., 230, 4756, doi:10.1016/j.jcp.2011.02.044 https://doi.org/10.1016/j.jcp.2011.02.044, arXiv:1104.1703 https://arxiv.org/abs/1104.1703

work page doi:10.1016/j.jcp.2011.02.044 2011
[12]

Haardt F., Madau P., 2012, ApJ, 746, 125, doi:10.1088/0004-637X/746/2/125 https://doi.org/10.1088/0004-637X/746/2/125, arXiv:1105.2039 https://arxiv.org/abs/1105.2039

work page doi:10.1088/0004-637x/746/2/125 2012
[13]

Habib S., et al., 2016, New Astron., 42, 49, doi:10.1016/j.newast.2015.06.003 https://doi.org/10.1016/j.newast.2015.06.003, arXiv:1410.2805 https://arxiv.org/abs/1410.2805

work page doi:10.1016/j.newast.2015.06.003 2016
[14]

Hahn O., Abel T., 2011, MNRAS, 415, 2101, doi:10.1111/j.1365-2966.2011.18820.x https://doi.org/10.1111/j.1365-2966.2011.18820.x, arXiv:1103.6031 https://arxiv.org/abs/1103.6031

work page doi:10.1111/j.1365-2966.2011.18820.x 2011
[15]

Han S., et al., 2026, A&A, 705, A169, doi:10.1051/0004-6361/202556291 https://doi.org/10.1051/0004-6361/202556291, arXiv:2507.06301 https://arxiv.org/abs/2507.06301

work page doi:10.1051/0004-6361/202556291 2026
[16]

, keywords =

Hopkins P. F., Kere s D., O \ n orbe J., Faucher-Gigu \`e re C.-A., Quataert E., Murray N., Bullock J. S., 2014, MNRAS, 445, 581, doi:10.1093/mnras/stu1738 https://doi.org/10.1093/mnras/stu1738, arXiv:1311.2073 https://arxiv.org/abs/1311.2073

work page doi:10.1093/mnras/stu1738 2014
[17]

, number =

Hopkins P. F., 2015, MNRAS, 450, 53, doi:10.1093/mnras/stv195 https://doi.org/10.1093/mnras/stv195, arXiv:1409.7395 https://arxiv.org/abs/1409.7395

work page doi:10.1093/mnras/stv195 2015
[18]

FIRE-2 Simulations: Physics versus Numerics in Galaxy Formation

Hopkins P. F., et al., 2018, MNRAS, 480, 800, doi:10.1093/mnras/sty1690 https://doi.org/10.1093/mnras/sty1690, arXiv:1702.06148 https://arxiv.org/abs/1702.06148

work page Pith review doi:10.1093/mnras/sty1690 2018
[19]

Ishiyama T., et al., 2021, MNRAS, 506, 4210, doi:10.1093/mnras/stab1755 https://doi.org/10.1093/mnras/stab1755, arXiv:2007.14720 https://arxiv.org/abs/2007.14720

work page doi:10.1093/mnras/stab1755 2021
[20]

M., Gott III J

Kim J., Park C., Rossi G., Lee S. M., Gott III J. R., 2011, J.\ Korean Astron.\ Soc., 44, 217, doi:10.5303/JKAS.2011.44.6.217 https://doi.org/10.5303/JKAS.2011.44.6.217

work page doi:10.5303/jkas.2011.44.6.217 2011
[21]

E., 2015, J.\ Korean Astron.\ Soc., 48, 213, doi:10.5303/JKAS.2015.48.4.213 https://doi.org/10.5303/JKAS.2015.48.4.213, arXiv:1508.05107 https://arxiv.org/abs/1508.05107

Kim J., Park C., L'Huillier B., Hong S. E., 2015, J.\ Korean Astron.\ Soc., 48, 213, doi:10.5303/JKAS.2015.48.4.213 https://doi.org/10.5303/JKAS.2015.48.4.213, arXiv:1508.05107 https://arxiv.org/abs/1508.05107

work page doi:10.5303/jkas.2015.48.4.213 2015
[22]

Klypin A., Prada F., 2019, MNRAS, 489, 1684, doi:10.1093/mnras/stz2194 https://doi.org/10.1093/mnras/stz2217

work page doi:10.1093/mnras/stz2194 2019
[23]

Klypin A., Prada F., 2018, MNRAS, 478, 4602, doi:10.1093/mnras/sty1340 https://doi.org/10.1093/mnras/sty1340

work page doi:10.1093/mnras/sty1340 2018
[24]

E., 1997, The Art of Computer Programming, Vol

Knuth D. E., 1997, The Art of Computer Programming, Vol. 3: Sorting and Searching, 2nd edn. Addison-Wesley, Reading, MA

1997
[25]

Lee J., et al., 2021, ApJ, 908, 11, doi:10.3847/1538-4357/abd08b https://doi.org/10.3847/1538-4357/abd08b, arXiv:2006.01039 https://arxiv.org/abs/2006.01039

work page doi:10.3847/1538-4357/abd08b 2021
[26]

I., et al., 2018, MNRAS, 473, 1195, doi:10.1093/mnras/stx1976 https://doi.org/10.1093/mnras/stx1976, arXiv:1705.03021 https://arxiv.org/abs/1705.03021

Libeskind N. I., et al., 2018, MNRAS, 473, 1195, doi:10.1093/mnras/stx1976 https://doi.org/10.1093/mnras/stx1976, arXiv:1705.03021 https://arxiv.org/abs/1705.03021

work page doi:10.1093/mnras/stx1976 2018
[27]

, volume =

Maksimova N. A., Garrison L. H., Eisenstein D. J., Hadzhiyska B., Bose S., Satterthwaite T. P., 2021, MNRAS, 508, 4017, doi:10.1093/mnras/stab2484 https://doi.org/10.1093/mnras/stab2484, arXiv:2110.11398 https://arxiv.org/abs/2110.11398

work page doi:10.1093/mnras/stab2484 2021
[28]

M., 1966, A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing

Morton G. M., 1966, A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing. IBM, Ottawa

1966
[29]

Nelson D., et al., 2019, Comput.\ Astrophys.\ Cosmol., 6, 2, doi:10.1186/s40668-019-0028-x https://doi.org/10.1186/s40668-019-0028-x, arXiv:1812.05609 https://arxiv.org/abs/1812.05609

work page doi:10.1186/s40668-019-0028-x 2019
[30]

Pillepich A., et al., 2018, MNRAS, 473, 4077, doi:10.1093/mnras/stx2656 https://doi.org/10.1093/mnras/stx2656, arXiv:1703.02970 https://arxiv.org/abs/1703.02970

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1093/mnras/stx2656 2018
[31]

RAMSES Team , 2024, RAMSES User's Guide, https://ramses-organisation.readthedocs.io

2024
[32]

Morgan Kaufmann

Samet H., 2006, Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann

2006
[33]

Schaye J., et al., 2015, MNRAS, 446, 521, doi:10.1093/mnras/stu2058 https://doi.org/10.1093/mnras/stu2058, arXiv:1407.7040 https://arxiv.org/abs/1407.7040

work page doi:10.1093/mnras/stu2058 2015
[34]

Springel V., Hernquist L., 2003, MNRAS, 339, 289, doi:10.1046/j.1365-8711.2003.06206.x https://doi.org/10.1046/j.1365-8711.2003.06206.x, arXiv:astro-ph/0206393 https://arxiv.org/abs/astro-ph/0206393

work page doi:10.1046/j.1365-8711.2003.06206.x 2003
[35]

Springel V., 2005, MNRAS, 364, 1105, doi:10.1111/j.1365-2966.2005.09655.x https://doi.org/10.1111/j.1365-2966.2005.09655.x, arXiv:astro-ph/0505010 https://arxiv.org/abs/astro-ph/0505010

work page doi:10.1111/j.1365-2966.2005.09655.x 2005
[36]

Springel V., 2010, MNRAS, 401, 791, doi:10.1111/j.1365-2966.2009.15715.x https://doi.org/10.1111/j.1365-2966.2009.15715.x, arXiv:0901.4107 https://arxiv.org/abs/0901.4107

work page doi:10.1111/j.1365-2966.2009.15715.x 2010
[37]

Teyssier R., 2002, A&A, 385, 337, doi:10.1051/0004-6361:20011817 https://doi.org/10.1051/0004-6361:20011817, arXiv:astro-ph/0111367 https://arxiv.org/abs/astro-ph/0111367

work page doi:10.1051/0004-6361:20011817 2002
[38]

W., Sch\"uller A., 2001, Multigrid

Trottenberg U., Oosterlee C. W., Sch\"uller A., 2001, Multigrid. Academic Press

2001
[39]

Tremmel M., Karcher M., Governato F., Volonteri M., Quinn T. R., Pontzen A., Anderson L., Bellovary J., 2017, MNRAS, 470, 1121, doi:10.1093/mnras/stx1160 https://doi.org/10.1093/mnras/stx1160, arXiv:1607.02151 https://arxiv.org/abs/1607.02151

work page doi:10.1093/mnras/stx1160 2017
[40]

Wiley, Chichester

Wesseling P., 1992, An Introduction to Multigrid Methods. Wiley, Chichester

1992
[41]

Vogelsberger M., et al., 2014, MNRAS, 444, 1518, doi:10.1093/mnras/stu1536 https://doi.org/10.1093/mnras/stu1536, arXiv:1405.2921 https://arxiv.org/abs/1405.2921

work page doi:10.1093/mnras/stu1536 2014
[42]

S., Salmon J

Warren M. S., Salmon J. K., 1993, in Proc.\ Supercomputing '93. ACM, New York, p. 12, doi:10.1145/169627.169640 https://doi.org/10.1145/169627.169640

work page doi:10.1145/169627.169640 1993
[43]

Weinberger R., et al., 2017, MNRAS, 465, 3291, doi:10.1093/mnras/stw2944 https://doi.org/10.1093/mnras/stw2944, arXiv:1607.03486 https://arxiv.org/abs/1607.03486

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1093/mnras/stw2944 2017
[44]

K., Klein R

Truelove J. K., Klein R. I., McKee C. F., Holliman J. H., Howell L. H., Greenough J. A., 1997, ApJ, 489, L179, doi:10.1086/310975 https://doi.org/10.1086/310975

work page doi:10.1086/310975 1997