arxiv: 2605.10729 · v1 · submitted 2026-05-11 · cs.CE · physics.plasm-ph

On Distributed Parallelization Strategies for Particle-in-Fourier Schemes

Sriramkrishnan Muralikrishnan, Paul Fischill, Andreas Adelmann, Robert Speck

Pith reviewed 2026-05-12 04:43 UTC · model grok-4.3

keywords particle-in-Fourier, parallelization strategies, domain decomposition, particle decomposition, parareal algorithm, kinetic plasma simulations, MPI scaling, Landau damping

The pith

Three distributed parallelization strategies for particle-in-Fourier plasma schemes differ in communication patterns and scaling behavior depending on the relative numbers of particles and modes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares domain decomposition of both particles and Fourier modes, particle-only decomposition where every rank holds all modes, and space-time decomposition that layers parareal time parallelism on top of particle decomposition. It details the resulting communication patterns, the regimes in which each strategy performs best, and their respective advantages and disadvantages. Scaling experiments on 3D-3V Landau damping and Penning trap problems, run inside the IPPL library on Alps and JUWELS, measure dominant component timings and flag targets for future optimization. A reader would care because these choices determine whether large-scale kinetic plasma simulations remain feasible on current supercomputers. The work supplies concrete guidance on when to choose one decomposition over another.

Core claim

We present and compare three distributed parallelization strategies for particle-in-Fourier schemes: domain decomposition, in which both particles and Fourier modes are split across MPI ranks; particle decomposition, in which only particles are split while each rank retains all modes; and space-time decomposition, in which parareal time parallelization is added to particle decomposition. We describe the distinct communication patterns of each approach, the parameter regimes in which they work best, and their advantages and disadvantages. Implemented within the performance-portable IPPL library, the strategies are tested through scaling studies on 3D-3V Landau damping and Penning trap cases.

What carries the argument

The three parallelization strategies (domain decomposition, particle decomposition, and space-time decomposition) that control how particles and Fourier modes are distributed across ranks and thereby fix the communication volume and scaling limits.
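To make the particle decomposition idea concrete, the following is a minimal serial sketch in which "ranks" are simulated by slicing a particle list. It assumes a 1D periodic domain and evaluates the Fourier coefficients by a direct sum, rho_k = sum_p q_p exp(-2 pi i k x_p / L), as a stand-in for the nonuniform FFT a production code would use. The function names, the round-robin particle split, and the use of a plain sum in place of an MPI all-reduce are illustrative assumptions, not the paper's or IPPL's implementation.

```python
import cmath
import random


def mode_sums(positions, charges, modes, length=1.0):
    """Direct (slow) evaluation of rho_k = sum_p q_p * exp(-2*pi*i*k*x_p/L)."""
    two_pi = 2.0 * cmath.pi
    return [
        sum(q * cmath.exp(-1j * two_pi * k * x / length)
            for x, q in zip(positions, charges))
        for k in modes
    ]


def particle_decomposition(positions, charges, modes, n_ranks):
    """Every simulated rank owns a slice of the particles and ALL modes.

    The element-wise sum over ranks at the end plays the role that a
    global reduction would play in a distributed code (assumption).
    """
    partials = []
    for rank in range(n_ranks):
        local_x = positions[rank::n_ranks]   # round-robin split (hypothetical)
        local_q = charges[rank::n_ranks]
        partials.append(mode_sums(local_x, local_q, modes))
    return [sum(p[i] for p in partials) for i in range(len(modes))]


random.seed(0)
n_particles, n_ranks = 200, 4
positions = [random.random() for _ in range(n_particles)]
charges = [1.0 / n_particles] * n_particles
modes = list(range(-3, 4))                   # seven modes, k = -3..3

serial = mode_sums(positions, charges, modes)
distributed = particle_decomposition(positions, charges, modes, n_ranks)
max_diff = max(abs(a - b) for a, b in zip(serial, distributed))
print("largest difference between serial and decomposed sums:", max_diff)
```

Because the sum over particles is linear, the per-rank partial sums add up to the serial result regardless of how the particles are split. Domain decomposition would instead assign each rank a subset of `modes` as well as of the particles, which changes what has to be communicated.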

If this is right

Domain decomposition reduces per-rank memory for Fourier modes when their count is large relative to the particle count.
Particle decomposition avoids redistributing Fourier modes because every rank holds all of them, but the per-rank contributions to those modes must still be combined across ranks at each step.
Space-time decomposition supplies an extra axis of parallelism that can be used once spatial decomposition saturates.
Dominant timings shift with strategy, identifying communication or local computation as the next optimization target.
The strategies are realized in a single performance-portable library, allowing direct comparison across architectures.
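The extra axis of parallelism in the space-time strategy comes from parareal. As a hedged illustration only, here is a minimal parareal iteration for the scalar test equation dy/dt = lam * y, with explicit Euler as both the coarse and the fine propagator. The test equation, the propagators, and all parameter values are assumptions chosen for brevity; they are not the plasma propagators used in the paper.

```python
def euler(y, t0, t1, n_steps, lam=-1.0):
    """Explicit Euler for dy/dt = lam * y from t0 to t1."""
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        y = y + dt * lam * y
    return y


def parareal(y0, t_end, n_slices, n_iters, coarse_steps=1, fine_steps=100):
    """Return the parareal approximation at every time-slice boundary."""
    ts = [t_end * n / n_slices for n in range(n_slices + 1)]

    def coarse(y, n):
        return euler(y, ts[n], ts[n + 1], coarse_steps)

    def fine(y, n):
        return euler(y, ts[n], ts[n + 1], fine_steps)

    # Initial guess: one serial sweep with the cheap coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], n))

    for _ in range(n_iters):
        # The fine solves are independent per slice; this loop is the part
        # that would run concurrently across time ranks.
        f_old = [fine(u[n], n) for n in range(n_slices)]
        g_old = [coarse(u[n], n) for n in range(n_slices)]
        # Serial correction sweep: coarse prediction plus fine-minus-coarse.
        new = [y0]
        for n in range(n_slices):
            new.append(coarse(new[n], n) + f_old[n] - g_old[n])
        u = new
    return u


def serial_fine(y0, t_end, n_slices, fine_steps=100):
    """Reference: the fine propagator applied slice after slice."""
    y = y0
    for n in range(n_slices):
        y = euler(y, t_end * n / n_slices, t_end * (n + 1) / n_slices, fine_steps)
    return y


reference = serial_fine(1.0, 2.0, 8)
for k in (0, 1, 2, 8):
    err = abs(parareal(1.0, 2.0, 8, k)[-1] - reference)
    print("iterations:", k, "error vs serial fine:", err)
```

After as many iterations as there are time slices, parareal reproduces the serial fine solution up to round-off, so a speedup is only possible when far fewer iterations suffice.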

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition choices could be applied to other spectral particle methods in fields such as astrophysics or beam physics.
Network topology and latency characteristics of future machines may shift the crossover points between strategies.
Hybrid MPI plus shared-memory or GPU versions of the same decompositions would be a direct next implementation step.
The parareal layer could be replaced by other time-parallel methods if the plasma problem permits coarser propagators.

Load-bearing premise

The communication patterns and scaling behaviors measured on the Landau damping and Penning trap benchmarks remain representative for other kinetic plasma problems and hardware platforms.

What would settle it

A scaling experiment on a different problem such as plasma turbulence or two-stream instability that reverses the relative performance ordering of the three strategies would show the reported regimes are not general.

Original abstract

We present and compare distributed parallelization strategies for the particle-in-Fourier (PIF) schemes used in kinetic plasma simulations. The different strategies are i) domain decomposition, where both the particles and Fourier modes are split between the MPI ranks ii) particle decomposition, where only the particles are split between the ranks and each rank carries all the modes, and, iii) space-time decomposition, in which time parallelization based on the parareal algorithm is added on top of the particle decomposition. We describe the different communication patterns involved in each of the strategies, the parameter regimes where they work best, and explain their advantages and disadvantages. We implement the strategies within the open-source, performance portable library IPPL and conduct scaling studies with 3D-3V Landau damping and Penning trap benchmark problems on Alps and JUWELS booster supercomputers. We analyze the dominant component timings in each of the strategies and identify areas for future optimizations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper supplies measured scaling data and communication breakdowns for three parallel strategies in particle-in-Fourier plasma codes, implemented in open IPPL, but its advice on best parameter regimes rests on only two benchmarks.

This paper compares domain decomposition, particle decomposition, and a parareal-augmented space-time version for particle-in-Fourier schemes. The authors implement all three inside the IPPL library and time them on Landau damping and Penning trap problems run on Alps and JUWELS. They spell out the MPI patterns for each approach and break down which parts of the runtime dominate under different decompositions.

Referee Report

1 major / 1 minor

Summary. The manuscript presents and compares three distributed parallelization strategies for particle-in-Fourier (PIF) schemes: (i) domain decomposition (splitting both particles and Fourier modes across MPI ranks), (ii) particle decomposition (splitting only particles while each rank holds all modes), and (iii) space-time decomposition (adding parareal time parallelization atop particle decomposition). It describes the associated communication patterns, identifies parameter regimes where each performs best, discusses advantages and disadvantages, implements the strategies in the open-source IPPL library, and reports strong-scaling studies plus timing breakdowns for 3D-3V Landau damping and Penning trap problems on the Alps and JUWELS booster machines.

Significance. If the reported communication patterns, timing breakdowns, and regime identifications are reproducible, the work supplies concrete, actionable guidance for scaling PIF-based kinetic plasma simulations on current HPC platforms. The open-source IPPL implementation and the explicit analysis of dominant costs (particle-grid interpolation, FFTs, MPI exchanges) constitute reusable assets that can accelerate adoption and further optimization in the field.

major comments (1)

[scaling studies and benchmark results sections] The central claim that the strategies are compared with respect to 'the parameter regimes where they work best' rests on scaling data from only two 3D-3V benchmarks (Landau damping and Penning trap). These problems share relatively uniform particle distributions and modest load imbalance; the manuscript does not demonstrate that the reported crossover points or optimal regimes remain stable under changes in density gradients, particle-per-mode counts, or geometry that commonly arise in other kinetic problems. This limits the generality of the stated advantages and disadvantages.

minor comments (1)

[Abstract and §1] The abstract and introduction would benefit from an explicit statement of the velocity-space dimensionality and the number of Fourier modes retained in each benchmark, to allow readers to assess load-balance characteristics immediately.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The major comment raises an important point regarding the generality of our findings, which we address below. We will incorporate revisions to qualify our claims appropriately.

Referee: [scaling studies and benchmark results sections] The central claim that the strategies are compared with respect to 'the parameter regimes where they work best' rests on scaling data from only two 3D-3V benchmarks (Landau damping and Penning trap). These problems share relatively uniform particle distributions and modest load imbalance; the manuscript does not demonstrate that the reported crossover points or optimal regimes remain stable under changes in density gradients, particle-per-mode counts, or geometry that commonly arise in other kinetic problems. This limits the generality of the stated advantages and disadvantages.

Authors: We agree that the two benchmarks (3D-3V Landau damping and Penning trap) feature relatively uniform particle distributions and modest load imbalance, and that our scaling studies do not include cases with strong density gradients, varying particle-per-mode ratios, or complex geometries. The identified parameter regimes, crossover points, and associated advantages/disadvantages are therefore specific to these standard test problems. In the revised manuscript, we will explicitly qualify the relevant claims in the scaling studies and benchmark results sections (and in the abstract and conclusions) to state that the reported regimes apply to the tested benchmarks. We will also add a brief discussion noting this limitation and identifying more complex kinetic problems as a direction for future validation. This change will ensure the claims are not overstated while preserving the concrete guidance provided by the current results and communication analysis.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparison of parallelization strategies with external benchmarks

The manuscript is an empirical engineering study that implements three parallelization strategies (domain, particle, and space-time decomposition) inside the existing open-source IPPL library and measures their communication patterns and scaling on two standard 3D-3V kinetic benchmarks run on external supercomputers. No equations are derived, no parameters are fitted to the target results, and no uniqueness theorems or self-citations are invoked to justify the central claims. The reported regimes, timings, and trade-offs are direct observations from the performed runs; they do not reduce to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard MPI communication assumptions and the correctness of the existing IPPL library and parareal algorithm; no new free parameters, axioms beyond domain standards, or invented entities are introduced.

axioms (1)

Domain assumption: standard assumptions about MPI communication latency and bandwidth in distributed-memory parallel computing. These are implicit in the description of the communication patterns for each decomposition strategy.
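One common way to state such latency and bandwidth assumptions is the latency-bandwidth (alpha-beta) model, in which a message of n bytes costs roughly alpha + beta * n. The sketch below applies that model to a single message and to the textbook ring all-reduce estimate. The parameter values, the 64^3 mode count, and the choice of a ring algorithm are hypothetical illustrations, not measurements or settings from the paper.

```python
def point_to_point_time(n_bytes, alpha, beta):
    """Latency-bandwidth estimate for one message: alpha + beta * n_bytes."""
    return alpha + beta * n_bytes


def ring_allreduce_time(n_bytes, n_ranks, alpha, beta):
    """Textbook ring all-reduce estimate (reduce-scatter, then all-gather).

    Each of the 2 * (n_ranks - 1) steps sends a chunk of n_bytes / n_ranks.
    The arithmetic cost of the reduction itself is ignored.
    """
    if n_ranks < 2:
        return 0.0
    steps = 2 * (n_ranks - 1)
    return steps * (alpha + beta * n_bytes / n_ranks)


ALPHA = 2.0e-6           # hypothetical per-message latency in seconds
BETA = 1.0e-10           # hypothetical transfer time per byte in seconds
mode_bytes = 64**3 * 16  # hypothetical 64^3 complex double-precision modes

for n_ranks in (2, 16, 128):
    t = ring_allreduce_time(mode_bytes, n_ranks, ALPHA, BETA)
    print("ranks:", n_ranks, "estimated all-reduce time (s):", t)
```

In this model the bandwidth term of the all-reduce stays close to 2 * beta * n_bytes as the rank count grows, while the latency term grows linearly with the number of ranks.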

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: "We present and compare distributed parallelization strategies for the particle-in-Fourier (PIF) schemes... scaling studies with 3D-3V Landau damping and Penning trap benchmark problems"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "The different strategies are i) domain decomposition... ii) particle decomposition... iii) space-time decomposition"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
