pith. machine review for the scientific record.

arxiv: 2605.10729 · v1 · submitted 2026-05-11 · 💻 cs.CE · physics.plasm-ph

Recognition: 2 theorem links

· Lean Theorem

On Distributed Parallelization Strategies for Particle-in-Fourier Schemes


Pith reviewed 2026-05-12 04:43 UTC · model grok-4.3

classification 💻 cs.CE physics.plasm-ph
keywords particle-in-Fourier · parallelization strategies · domain decomposition · particle decomposition · parareal algorithm · kinetic plasma simulations · MPI scaling · Landau damping

The pith

Three distributed parallelization strategies for particle-in-Fourier plasma schemes differ in their communication patterns, and which one scales best depends on the relative numbers of particles and Fourier modes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares domain decomposition of both particles and Fourier modes, particle-only decomposition where every rank holds all modes, and space-time decomposition that layers parareal time parallelism on top of particle decomposition. It details the resulting communication patterns, the regimes in which each strategy performs best, and their respective advantages and disadvantages. Scaling experiments on 3D-3V Landau damping and Penning trap problems, run inside the IPPL library on Alps and JUWELS, measure dominant component timings and flag targets for future optimization. A reader would care because these choices determine whether large-scale kinetic plasma simulations remain feasible on current supercomputers. The work supplies concrete guidance on when to choose one decomposition over another.

Core claim

We present and compare three distributed parallelization strategies for particle-in-Fourier schemes: domain decomposition, in which both particles and Fourier modes are split across MPI ranks; particle decomposition, in which only particles are split while each rank retains all modes; and space-time decomposition, in which parareal time parallelization is added to particle decomposition. We describe the distinct communication patterns of each approach, the parameter regimes in which they work best, and their advantages and disadvantages. Implemented within the performance-portable IPPL library, the strategies are tested through scaling studies on 3D-3V Landau damping and Penning trap cases on the Alps and JUWELS Booster supercomputers.
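To make the particle-decomposition communication pattern concrete, here is a minimal single-process sketch. It assumes a 1D periodic box rather than the paper's 3D-3V setting and uses a direct Fourier sum in place of the nonuniform FFT a production PIF code would use. The "ranks" are simulated lists, and `allreduce_sum` is a hypothetical stand-in for an MPI sum all-reduce; none of these names come from IPPL. Each rank deposits only its own particles, but onto every mode, and a single global reduction leaves every rank with the full charge-density spectrum without moving any particles between ranks.

```python
import cmath
import math
import random


def deposit_modes(positions, charges, modes, box_length):
    """Direct 1D Fourier deposition: rho_hat[k] = sum_p q_p * exp(-2*pi*i*k*x_p / L).

    A production PIF code would use a nonuniform FFT; the direct sum keeps the sketch short.
    """
    rho_hat = []
    for k in modes:
        total = 0j
        for x, q in zip(positions, charges):
            total += q * cmath.exp(-2j * math.pi * k * x / box_length)
        rho_hat.append(total)
    return rho_hat


def allreduce_sum(per_rank_arrays):
    """Hypothetical stand-in for an MPI sum all-reduce: every rank receives the elementwise sum."""
    summed = [sum(values) for values in zip(*per_rank_arrays)]
    return [list(summed) for _ in per_rank_arrays]


def particle_decomposition_deposit(positions, charges, modes, box_length, n_ranks):
    # Split only the particles (round-robin) across simulated ranks; modes are not split.
    chunks = [(positions[r::n_ranks], charges[r::n_ranks]) for r in range(n_ranks)]
    # Each rank deposits its local particles onto the full set of modes.
    partial = [deposit_modes(xs, qs, modes, box_length) for xs, qs in chunks]
    # One global reduction gives every rank the complete spectrum.
    return allreduce_sum(partial)


random.seed(0)
L = 1.0
xs = [random.random() * L for _ in range(200)]
qs = [1.0 / len(xs)] * len(xs)
modes = list(range(-4, 5))

serial = deposit_modes(xs, qs, modes, L)
per_rank = particle_decomposition_deposit(xs, qs, modes, L, n_ranks=4)
max_diff = max(abs(a - b) for rank in per_rank for a, b in zip(rank, serial))
print(f"ranks: {len(per_rank)}, max deviation from serial deposit: {max_diff:.1e}")
```

In a distributed run the reduction would be one collective over an array whose length is the total number of modes, independent of how many particles each rank holds.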

What carries the argument

The three parallelization strategies (domain decomposition, particle decomposition, and space-time decomposition) that control how particles and Fourier modes are distributed across ranks and thereby fix the communication volume and scaling limits.
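Under domain decomposition, assuming particles are split by spatial sub-domain, each particle is owned by the rank whose sub-domain contains it, so particles that cross a boundary during a push must be handed to a new owner. The sketch below assumes a periodic 1D box split into equal-width slabs; `owner_rank` and `migration_lists` are hypothetical helpers, and IPPL's actual layout may differ. It only identifies which particles would move between which ranks; the send/receive itself is omitted.

```python
def owner_rank(x, box_length, n_ranks):
    """Rank owning position x under an equal-width slab split of a periodic 1D box."""
    x = x % box_length  # wrap into [0, L) for periodic boundaries
    # min() guards against floating-point rounding landing exactly on n_ranks.
    return min(int(x / box_length * n_ranks), n_ranks - 1)


def migration_lists(old_x, new_x, box_length, n_ranks):
    """Group indices of particles whose owning rank changed, keyed by (source, destination)."""
    moves = {}
    for i, (x_old, x_new) in enumerate(zip(old_x, new_x)):
        src = owner_rank(x_old, box_length, n_ranks)
        dst = owner_rank(x_new, box_length, n_ranks)
        if src != dst:
            moves.setdefault((src, dst), []).append(i)
    return moves


old_positions = [0.10, 0.24, 0.26, 0.90, 0.99]
new_positions = [0.12, 0.26, 0.27, 0.95, 1.01]  # last particle wraps around the periodic box
moves = migration_lists(old_positions, new_positions, box_length=1.0, n_ranks=4)
print(moves)  # {(0, 1): [1], (3, 0): [4]}
```

The volume of this exchange depends on particle velocities and time step, which is one reason the preferred strategy can shift with problem parameters.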

If this is right

  • Domain decomposition reduces per-rank memory for Fourier modes when their count is large relative to the particle count.
  • Particle decomposition needs no particle exchange between ranks, since every rank holds all modes, but each step requires a global reduction (all-reduce) of the full mode array.
  • Space-time decomposition supplies an extra axis of parallelism that can be used once particle decomposition saturates.
  • Dominant timings shift with strategy, identifying communication or local computation as the next optimization target.
  • The strategies are realized in a single performance-portable library, allowing direct comparison across architectures.
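The memory argument in the first bullet can be checked with back-of-the-envelope arithmetic. This sketch assumes a cubic grid of Fourier modes stored as one complex-double array (16 bytes per mode) and 56 bytes per particle (three positions, three velocities and one charge in double precision). Real codes keep several field arrays and possibly oversampled NUFFT grids, so the absolute numbers are illustrative only. Both strategies split particles evenly, so only the mode term differs.

```python
def mode_bytes_per_rank(modes_per_dim, n_ranks, strategy, bytes_per_mode=16):
    """Per-rank storage for one 3D array of Fourier modes."""
    total_modes = modes_per_dim ** 3
    if strategy == "particle":
        # Particle decomposition: every rank keeps all modes.
        return total_modes * bytes_per_mode
    if strategy == "domain":
        # Domain decomposition: modes are divided (ceiling split) across ranks.
        return -(-total_modes // n_ranks) * bytes_per_mode
    raise ValueError(f"unknown strategy: {strategy!r}")


def particle_bytes_per_rank(n_particles, n_ranks, bytes_per_particle=56):
    """Per-rank particle storage; both strategies split particles evenly (ceiling split)."""
    return -(-n_particles // n_ranks) * bytes_per_particle


MiB = 2 ** 20
for strategy in ("particle", "domain"):
    modes_mib = mode_bytes_per_rank(256, 64, strategy) / MiB
    parts_mib = particle_bytes_per_rank(64 * 10**6, 64) / MiB
    print(f"{strategy:>8}: modes {modes_mib:8.1f} MiB, particles {parts_mib:8.1f} MiB per rank")
```

With 256^3 modes on 64 ranks, this model gives 256 MiB of modes per rank under particle decomposition versus 4 MiB under domain decomposition, while particle storage is identical; adding ranks shrinks only the domain-decomposition mode term.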

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decomposition choices could be applied to other spectral particle methods in fields such as astrophysics or beam physics.
  • Network topology and latency characteristics of future machines may shift the crossover points between strategies.
  • Hybrid MPI plus shared-memory or GPU versions of the same decompositions would be a direct next implementation step.
  • The parareal layer could be replaced by other time-parallel methods if the plasma problem permits coarser propagators.

Load-bearing premise

The communication patterns and scaling behaviors measured on the Landau damping and Penning trap benchmarks remain representative for other kinetic plasma problems and hardware platforms.

What would settle it

A scaling experiment on a different problem, such as plasma turbulence or the two-stream instability, that reverses the relative performance ordering of the three strategies would show that the reported regimes are not general.

read the original abstract

We present and compare distributed parallelization strategies for the particle-in-Fourier (PIF) schemes used in kinetic plasma simulations. The different strategies are i) domain decomposition, where both the particles and Fourier modes are split between the MPI ranks, ii) particle decomposition, where only the particles are split between the ranks and each rank carries all the modes, and iii) space-time decomposition, in which time parallelization based on the parareal algorithm is added on top of the particle decomposition. We describe the different communication patterns involved in each of the strategies, the parameter regimes where they work best, and explain their advantages and disadvantages. We implement the strategies within the open-source, performance-portable library IPPL and conduct scaling studies with 3D-3V Landau damping and Penning trap benchmark problems on the Alps and JUWELS Booster supercomputers. We analyze the dominant component timings in each of the strategies and identify areas for future optimizations.
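The parareal layer follows the standard predictor-corrector update U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) − G(U_n^k), where G is a cheap coarse propagator and F an accurate fine one over a single time slice. The sketch below applies it to the scalar test equation y' = λy rather than to a particle system; using forward Euler for both propagators (one step for G, many for F) is an assumption for illustration, not the coarse/fine pairing used in the paper. In the space-time strategy, each fine solve would presumably run on its own particle-decomposed group of ranks, which is where the extra parallelism comes from.

```python
def fine(y, dt, lam, substeps=100):
    """Accurate but expensive propagator: many small forward-Euler steps."""
    h = dt / substeps
    for _ in range(substeps):
        y = y + h * lam * y
    return y


def coarse(y, dt, lam):
    """Cheap, inaccurate propagator: a single forward-Euler step."""
    return y + dt * lam * y


def parareal(y0, t_end, n_slices, n_iters, lam):
    dt = t_end / n_slices
    # Iteration 0: a serial coarse sweep provides the initial guess at every slice boundary.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], dt, lam))
    for _ in range(n_iters):
        # Fine solves on all slices are independent: this is the time-parallel part.
        f_old = [fine(u[n], dt, lam) for n in range(n_slices)]
        g_old = [coarse(u[n], dt, lam) for n in range(n_slices)]
        new_u = [y0]
        for n in range(n_slices):
            # Serial correction sweep: U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k)
            new_u.append(coarse(new_u[n], dt, lam) + f_old[n] - g_old[n])
        u = new_u
    return u


def serial_fine(y0, t_end, n_slices, lam):
    """Reference: the fine propagator applied sequentially across all slices."""
    dt = t_end / n_slices
    u = [y0]
    for n in range(n_slices):
        u.append(fine(u[n], dt, lam))
    return u


y0, t_end, n_slices, lam = 1.0, 2.0, 8, -1.0
reference = serial_fine(y0, t_end, n_slices, lam)
for k in (0, 1, 2, n_slices):
    approx = parareal(y0, t_end, n_slices, k, lam)
    err = max(abs(a - b) for a, b in zip(approx, reference))
    print(f"iterations={k}: max error vs serial fine solve = {err:.2e}")
```

After k iterations the first k slice boundaries match the serial fine solution (up to round-off), so speedup requires convergence in far fewer iterations than there are slices.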

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents and compares three distributed parallelization strategies for particle-in-Fourier (PIF) schemes: (i) domain decomposition (splitting both particles and Fourier modes across MPI ranks), (ii) particle decomposition (splitting only particles while each rank holds all modes), and (iii) space-time decomposition (adding parareal time parallelization atop particle decomposition). It describes the associated communication patterns, identifies parameter regimes where each performs best, discusses advantages and disadvantages, implements the strategies in the open-source IPPL library, and reports strong-scaling studies plus timing breakdowns for 3D-3V Landau damping and Penning trap problems on the Alps and JUWELS booster machines.

Significance. If the reported communication patterns, timing breakdowns, and regime identifications are reproducible, the work supplies concrete, actionable guidance for scaling PIF-based kinetic plasma simulations on current HPC platforms. The open-source IPPL implementation and the explicit analysis of dominant costs (particle-grid interpolation, FFTs, MPI exchanges) constitute reusable assets that can accelerate adoption and further optimization in the field.

major comments (1)
  1. [scaling studies and benchmark results sections] The central claim that the strategies are compared with respect to 'the parameter regimes where they work best' rests on scaling data from only two 3D-3V benchmarks (Landau damping and Penning trap). These problems share relatively uniform particle distributions and modest load imbalance; the manuscript does not demonstrate that the reported crossover points or optimal regimes remain stable under changes in density gradients, particle-per-mode counts, or geometry that commonly arise in other kinetic problems. This limits the generality of the stated advantages and disadvantages.
minor comments (1)
  1. [Abstract and §1] The abstract states the 3D-3V phase-space dimensionality, but the abstract and introduction would benefit from an explicit statement of the number of particles and Fourier modes used in each benchmark, to allow readers to assess load-balance characteristics immediately.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The major comment raises an important point regarding the generality of our findings, which we address below. We will incorporate revisions to qualify our claims appropriately.

read point-by-point responses
  1. Referee: [scaling studies and benchmark results sections] The central claim that the strategies are compared with respect to 'the parameter regimes where they work best' rests on scaling data from only two 3D-3V benchmarks (Landau damping and Penning trap). These problems share relatively uniform particle distributions and modest load imbalance; the manuscript does not demonstrate that the reported crossover points or optimal regimes remain stable under changes in density gradients, particle-per-mode counts, or geometry that commonly arise in other kinetic problems. This limits the generality of the stated advantages and disadvantages.

    Authors: We agree that the two benchmarks (3D-3V Landau damping and Penning trap) feature relatively uniform particle distributions and modest load imbalance, and that our scaling studies do not include cases with strong density gradients, varying particle-per-mode ratios, or complex geometries. The identified parameter regimes, crossover points, and associated advantages/disadvantages are therefore specific to these standard test problems. In the revised manuscript, we will explicitly qualify the relevant claims in the scaling studies and benchmark results sections (and in the abstract and conclusions) to state that the reported regimes apply to the tested benchmarks. We will also add a brief discussion noting this limitation and identifying more complex kinetic problems as a direction for future validation. This change will ensure the claims are not overstated while preserving the concrete guidance provided by the current results and communication analysis. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparison of parallelization strategies with external benchmarks

full rationale

The manuscript is an empirical engineering study that implements three parallelization strategies (domain, particle, and space-time decomposition) inside the existing open-source IPPL library and measures their communication patterns and scaling on two standard 3D-3V kinetic benchmarks run on external supercomputers. No equations are derived, no parameters are fitted to the target results, and no uniqueness theorems or self-citations are invoked to justify the central claims. The reported regimes, timings, and trade-offs are direct observations from the performed runs; they do not reduce to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work relies on standard MPI communication assumptions and the correctness of the existing IPPL library and parareal algorithm; no new free parameters, axioms beyond domain standards, or invented entities are introduced.

axioms (1)
  • domain assumption Standard assumptions about MPI communication latency and bandwidth in distributed-memory parallel computing.
    Implicit when describing communication patterns for each decomposition strategy.
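The latency-bandwidth assumption can be made explicit with the usual alpha-beta (Hockney) model, T(m) = α + βm for one message of m bytes. Under that model a bandwidth-optimal ring all-reduce over P ranks costs about 2(P−1)α + 2((P−1)/P)mβ. The sketch applies it to the all-reduce of the full mode array that particle decomposition is expected to need each step; the α and β values are placeholders, not measurements from Alps or JUWELS, and real MPI libraries switch between several all-reduce algorithms.

```python
def ring_allreduce_time(message_bytes, n_ranks, alpha, beta):
    """Alpha-beta cost of a ring all-reduce (reduce-scatter followed by all-gather).

    alpha: per-message latency in seconds; beta: seconds per byte.
    """
    if n_ranks == 1:
        return 0.0  # nothing to communicate
    steps = 2 * (n_ranks - 1)
    return steps * alpha + steps / n_ranks * message_bytes * beta


modes_bytes = 256 ** 3 * 16   # one complex-double array of 256^3 modes
alpha, beta = 1e-6, 1e-10     # placeholder latency (1 us) and bandwidth (10 GB/s)
for p in (4, 64, 1024):
    t = ring_allreduce_time(modes_bytes, p, alpha, beta)
    print(f"P={p:5d}: ~{t * 1e3:.1f} ms per all-reduce")
```

The bandwidth term approaches 2mβ as P grows, so in this model adding ranks never shrinks the per-step mode reduction, which is consistent with particle decomposition eventually saturating in strong scaling.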

pith-pipeline@v0.9.0 · 5471 in / 1322 out tokens · 31499 ms · 2026-05-12T04:43:05.053541+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

  1. Birdsall, C.K., Langdon, A.B.: Plasma physics via computer simulation. CRC Press (2004)

  2. Hockney, R.W., Eastwood, J.W.: Computer simulation using particles. CRC Press (2021)

  3. Langdon, A.B.: Effects of the spatial grid in simulation plasmas. Journal of Computational Physics 6(2), 247–267 (1970)

  4. Huang, C.-K., Zeng, Y., Wang, Y., Meyers, M.D., Yi, S., Albright, B.J.: Finite grid instability and spectral fidelity of the electrostatic particle-in-cell algorithm. Computer Physics Communications 207, 123–135 (2016)

  5. Markidis, S., Lapenta, G.: The energy conserving particle-in-cell method. Journal of Computational Physics 230(18), 7037–7052 (2011)

  6. Chen, G., Chacón, L., Barnes, D.: An energy- and charge-conserving, implicit, electrostatic particle-in-cell algorithm. Journal of Computational Physics 230(18), 7018–7036 (2011)

  7. Squire, J., Qin, H., Tang, W.M.: Geometric integration of the Vlasov-Maxwell system with a variational particle-in-cell scheme. Physics of Plasmas 19(8) (2012)

  8. Jianyuan, X., Hong, Q., Jian, L.: Structure-preserving geometric particle-in-cell methods for Vlasov-Maxwell systems. Plasma Science and Technology 20(11), 110501 (2018)

  9. Campos Pinto, M., Ameres, J., Kormann, K., Sonnendrücker, E.: On variational Fourier particle methods. Journal of Scientific Computing 101(3), 68 (2024)

  10. Campos Pinto, M., Kormann, K., Sonnendrücker, E.: Variational framework for structure-preserving electromagnetic particle-in-cell methods. Journal of Scientific Computing 91(2), 46 (2022)

  11. He, Y., Sun, Y., Qin, H., Liu, J.: Hamiltonian particle-in-cell methods for Vlasov-Maxwell equations. Physics of Plasmas 23(9) (2016)

  12. Kraus, M., Kormann, K., Morrison, P.J., Sonnendrücker, E.: GEMPIC: geometric electromagnetic particle-in-cell methods. Journal of Plasma Physics 83(4), 905830401 (2017)

  13. Evstatiev, E.G., Shadwick, B.A.: Variational formulation of particle algorithms for kinetic plasma simulations. Journal of Computational Physics 245, 376–398 (2013)

  14. Mitchell, M.S., Miecnikowski, M.T., Beylkin, G., Parker, S.E.: Efficient Fourier basis particle simulation. Journal of Computational Physics 396, 837–847 (2019)

  15. Shen, C.N., Cerfon, A., Muralikrishnan, S.: A particle-in-Fourier method with semi-discrete energy conservation for non-periodic boundary conditions. Journal of Computational Physics 519, 113390 (2024)

  16. Walker, D.W.: Characterizing the parallel performance of a large-scale, particle-in-cell plasma simulation code. Concurrency: Practice and Experience 2(4), 257–288 (1990)

  17. Carmona, E.A., Chandler, L.J.: On parallel PIC versatility and the structure of parallel PIC approaches. Concurrency: Practice and Experience 9(12), 1377–1405 (1997)

  18. Di Martino, B., Briguglio, S., Vlad, G., Sguazzero, P.: Parallel PIC plasma simulation through particle decomposition techniques. Parallel Computing 27(3), 295–314 (2001)

  19. Vay, J.-L., Almgren, A., Bell, J., Ge, L., Grote, D., Hogan, M., Kononenko, O., Lehe, R., Myers, A., Ng, C., et al.: Warp-X: A new exascale computing platform for beam–plasma simulations. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 909, 476–479 (2018)

  20. Burau, H., Widera, R., Hönig, W., Juckeland, G., Debus, A., Kluge, T., Schramm, U., Cowan, T.E., Sauerbrey, R., Bussmann, M.: PIConGPU: A fully relativistic particle-in-cell code for a GPU cluster. IEEE Transactions on Plasma Science 38(10), 2831–2839 (2010)

  21. Muralikrishnan, S., Frey, M., Vinciguerra, A., Ligotino, M., Cerfon, A.J., Stoyanov, M., Gayatri, R., Adelmann, A.: Scaling and performance portability of the particle-in-cell scheme for plasma physics applications through mini-apps targeting exascale architectures. In: Proceedings of the 2024 SIAM Conference on Parallel Processing for Scientific Computing (PP), pp. ...

  22. Briguglio, S., Vlad, G., Di Martino, B., Fogaccia, G.: Parallelization of plasma simulation codes: gridless finite size particle versus particle in cell approach. Future Generation Computer Systems 16(5), 541–552 (2000)

  23. Muralikrishnan, S., Speck, R.: Error analysis and parallel scaling study of a parareal parallel-in-time integration algorithm for particle-in-Fourier schemes. SIAM Journal on Scientific Computing, 311–336 (2025)

  24. Pippig, M., Potts, D.: Parallel three-dimensional nonequispaced fast Fourier transforms and their application to particle simulation. SIAM Journal on Scientific Computing 35(4), 411–437 (2013)

  25. Pippig, M.: Massively parallel, fast Fourier transforms and particle-mesh methods. PhD thesis, Dissertation, Chemnitz, Technische Universität Chemnitz, 2015 (2016)

  26. Dutt, A., Rokhlin, V.: Fast Fourier transforms for nonequispaced data. SIAM Journal on Scientific Computing 14(6), 1368–1393 (1993)

  27. Dutt, A., Rokhlin, V.: Fast Fourier transforms for nonequispaced data, II. Applied and Computational Harmonic Analysis 2(1), 85–100 (1995)

  28. Potts, D., Steidl, G., Tasche, M.: Fast Fourier transforms for nonequispaced data: A tutorial. Modern Sampling Theory: Mathematics and Applications, 247–270 (2001)

  29. Barnett, A.H., Magland, J., Klinteberg, L.: A parallel nonuniform fast Fourier transform library based on an “exponential of semicircle” kernel. SIAM Journal on Scientific Computing 41(5), 479–504 (2019)

  30. Rowan, M.E., Gott, K.N., Deslippe, J., Huebl, A., Thévenet, M., Lehe, R., Vay, J.-L.: In-situ assessment of device-side compute work for dynamic load balancing in a GPU-accelerated PIC code. In: Proceedings of the Platform for Advanced Scientific Computing Conference, pp. 1–11 (2021)

  31. Nievergelt, J.: Parallel methods for integrating ordinary differential equations. Communications of the ACM 7(12), 731–733 (1964)

  32. Gander, M.J.: 50 years of time parallel time integration. In: Carraro, T., Geiger, M., Körkel, S., Rannacher, R. (eds.) Multiple Shooting and Time Domain Decomposition Methods, pp. 69–113. Springer, Cham (2015)

  33. Ong, B.W., Schroder, J.B.: Applications of time parallelization. Computing and Visualization in Science 23, 1–15 (2020)

  34. PinT-Community: Parallel-in-Time webpage. https://parallel-in-time.org/. Accessed: 13-01-2026

  35. Lions, J.-L., Maday, Y., Turinici, G.: Résolution d’EDP par un schéma en temps «pararéel». Comptes Rendus de l’Académie des Sciences - Series I - Mathematics 332(7), 661–668 (2001)

  36. Aubanel, E.: Scheduling of tasks in the parareal algorithm. Parallel Computing 37(3), 172–182 (2011)

  37. Nielsen, A.S., Brunner, G., Hesthaven, J.S.: Communication-aware adaptive parareal with application to a nonlinear hyperbolic system of partial differential equations. Journal of Computational Physics 371, 483–505 (2018)

  38. Götschel, S., Minion, M., Ruprecht, D., Speck, R.: Twelve ways to fool the masses when giving parallel-in-time results. In: Workshops on Parallel-in-Time Integration, pp. 81–94. Springer (2020)

  39. Frey, M., Vinciguerra, A., Muralikrishnan, S., Mayani, S., Montanaro, V., Sadr, M., Adelmann, A., Winkler, M., Schurk, F.: IPPL-framework/ippl: IPPL-3.2.0. Zenodo (2024)

  40. Shih, Y.-h., Wright, G., Andén, J., Blaschke, J., Barnett, A.H.: cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs. In: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 688–697. IEEE (2021)

  41. Keiner, J., Kunis, S., Potts, D.: Using NFFT 3—a software library for various nonequispaced fast Fourier transforms. ACM Transactions on Mathematical Software (TOMS) 36(4), 1–30 (2009)

  42. Fischill, P., Adelmann, A., Muralikrishnan, S.: A Performance-Portable, Massively Parallel Distributed Nonuniform FFT. Submitted to PASC Conference Proceedings 2026

  43. Falgout, R.D., Friedhoff, S., Kolev, T.V., MacLachlan, S.P., Schroder, J.B.: Parallel time integration with multigrid. SIAM Journal on Scientific Computing 36(6), 635–661 (2014)

[Fig. A1: Strong scaling of dominant component timings of the three parallelization strategies for Landau damping and Penning trap test cases on JUWELS booster. Panels: (a) Landau DD, (b) Penning DD, (c) Landau PD, (d) Penning PD, (e) Landau ST, (f) Penning ST. In the sub-figure labels, DD stands for...]