arxiv: 2605.27282 · v1 · pith:ULKRENGVnew · submitted 2026-05-26 · ⚛️ physics.acc-ph · physics.comp-ph

The Memory Scaling of Reverse-Mode Differentiation in Particle Accelerator Simulations with Space Charge

Pith reviewed 2026-06-29 14:15 UTC · model grok-4.3

classification ⚛️ physics.acc-ph physics.comp-ph
keywords particle accelerator · space charge · reverse-mode differentiation · automatic differentiation · memory scaling · differentiable simulation · macroparticle · beam tracking

The pith

Memory use for reverse-mode differentiation in accelerator simulations scales linearly with the number of macroparticles and grid cells, and in proportion to the number of space charge kicks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies memory requirements when using reverse-mode automatic differentiation for particle accelerator simulations that include space charge effects. It implements space charge in the Cheetah code, which is built on PyTorch to support differentiation. Measurements show that memory usage increases linearly as the number of macroparticles or grid cells grows, and also increases in proportion to the number of space charge calculations performed. A sympathetic reader would care because this scaling law lets one determine in advance whether a planned differentiable simulation can run within the limits of available computer memory.

Core claim

The memory usage for reverse-mode differentiation grows linearly with the number of macroparticles and cells, and is proportional to the number of space charge kicks involved in the simulation. This general scaling can be used to evaluate whether a given differentiable simulation is feasible given hardware memory constraints.
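As a rough illustration of how such a scaling law could be used, the sketch below assumes the simplest linear form: each kick adds a fixed number of bytes per macroparticle and per cell. The function names are hypothetical, and the coefficients are placeholders borrowed from the charge-deposition figures quoted under Figure 5 (about 550 B per macroparticle and 8 B per cell), which cover only one step of a kick, so a real estimate would need the full per-kick coefficients.

```python
def estimate_tape_bytes(n_macroparticles, n_cells, n_kicks,
                        bytes_per_particle, bytes_per_cell):
    """Assumed linear model: every kick adds a fixed cost per macroparticle and per cell."""
    if min(n_macroparticles, n_cells, n_kicks) < 0:
        raise ValueError("counts must be non-negative")
    per_kick = bytes_per_particle * n_macroparticles + bytes_per_cell * n_cells
    return n_kicks * per_kick


def fits_in_memory(estimated_bytes, budget_bytes, headroom=0.8):
    """True if the estimate stays below a fraction `headroom` of the budget."""
    return estimated_bytes <= headroom * budget_bytes


if __name__ == "__main__":
    # Placeholder coefficients: charge deposition only, not a full kick.
    estimate = estimate_tape_bytes(
        n_macroparticles=1_000_000, n_cells=64**3, n_kicks=100,
        bytes_per_particle=550, bytes_per_cell=8,
    )
    print(f"estimated growth: {estimate / 1e9:.1f} GB")
    print("fits in an 80 GB budget:", fits_in_memory(estimate, budget_bytes=80e9))
```

The `headroom` factor is an arbitrary safety margin for memory that the linear model does not capture, such as the beam itself and temporary buffers.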

What carries the argument

The PyTorch-based implementation of space charge kicks in Cheetah, which builds a computational graph for reverse-mode automatic differentiation.
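To make the mechanism concrete, here is a minimal pure-Python sketch of a reverse-mode tape. It is not Cheetah or PyTorch code: the "kick" is a made-up map x -> x + s·sin(x) applied to every particle, chosen only because its backward pass needs the pre-kick coordinates. The forward pass therefore has to keep one saved state per kick, which is the source of the growth in memory.

```python
import math


def track(x0, strength, n_kicks):
    """Forward pass: apply n_kicks toy kicks, saving what the backward pass needs."""
    tape = []                 # one entry per kick: coordinates before that kick
    x = list(x0)
    for _ in range(n_kicks):
        tape.append(x)        # kept alive until the backward pass
        x = [xi + strength * math.sin(xi) for xi in x]
    return x, tape


def backward(tape, strength):
    """Reverse pass for loss = sum(x_final); returns d(loss)/d(strength)."""
    grad_x = [1.0] * len(tape[0])      # d(loss)/d(x_final)
    grad_strength = 0.0
    for saved in reversed(tape):
        # x_new = x + strength * sin(x): accumulate the direct dependence on strength,
        # then propagate the gradient to the coordinates before this kick.
        grad_strength += sum(g * math.sin(xi) for g, xi in zip(grad_x, saved))
        grad_x = [g * (1.0 + strength * math.cos(xi)) for g, xi in zip(grad_x, saved)]
    return grad_strength


def saved_floats(tape):
    """Number of floats the tape holds: n_particles * n_kicks for this toy kick."""
    return sum(len(entry) for entry in tape)


if __name__ == "__main__":
    x0 = [0.1 * i for i in range(1000)]
    for n_kicks in (10, 20, 40):
        _, tape = track(x0, 0.3, n_kicks)
        print(n_kicks, "kicks ->", saved_floats(tape), "saved floats")
```

Doubling either the number of particles or the number of kicks doubles the saved state in this toy, which matches the proportionality the paper reports, though the real per-kick cost also depends on the grid.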

If this is right

  • Memory requirements can be predicted from the number of macroparticles, cells, and space charge kicks before running the simulation.
  • Simulations become more feasible on limited hardware when fewer space charge kicks are used.
  • Differentiable modeling of accelerators with space charge is practical provided the linear growth stays within memory limits.
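Read the other way round, the same linear assumption gives the largest number of space charge kicks that a memory budget allows. The helper below is a hypothetical sketch: `baseline_bytes` stands for memory that does not grow with the number of kicks, and the coefficients in the example are the charge-deposition placeholders from the Figure 5 excerpt, not full per-kick values.

```python
def max_kicks_within_budget(budget_bytes, n_macroparticles, n_cells,
                            bytes_per_particle, bytes_per_cell,
                            baseline_bytes=0):
    """Largest kick count whose assumed linear memory growth fits in the budget."""
    per_kick = bytes_per_particle * n_macroparticles + bytes_per_cell * n_cells
    if per_kick <= 0:
        raise ValueError("per-kick cost must be positive")
    available = budget_bytes - baseline_bytes
    return max(0, available // per_kick)


if __name__ == "__main__":
    gib = 1024**3
    n_kicks = max_kicks_within_budget(
        budget_bytes=16 * gib, n_macroparticles=1_000_000, n_cells=64**3,
        bytes_per_particle=550, bytes_per_cell=8, baseline_bytes=2 * gib,
    )
    print("kicks that fit:", n_kicks)
```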

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Other implementations of differentiable space charge might exhibit the same linear scaling if they use similar reverse-mode methods.
  • Techniques like gradient checkpointing could further reduce memory use in long simulations.
  • The result highlights a trade-off between simulation accuracy from more macroparticles and available memory.
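The checkpointing idea in the list above can be sketched without any framework. The toy below uses a made-up kick x -> x + s·sin(x) rather than a real space charge kick: the forward pass keeps only the state at the start of each segment, and the backward pass recomputes one segment at a time. In PyTorch the corresponding built-in is `torch.utils.checkpoint`; this sketch only imitates the trade of extra computation for less stored state.

```python
import math


def kick(x, s):
    """Toy stand-in for a kick; its backward pass needs the input coordinates."""
    return [xi + s * math.sin(xi) for xi in x]


def grad_with_checkpoints(x0, s, n_kicks, segment):
    """Return (d sum(x_final)/ds, peak number of stored particle states)."""
    # Forward pass: store the state only at the start of each segment.
    checkpoints = []
    x = list(x0)
    for k in range(n_kicks):
        if k % segment == 0:
            checkpoints.append(x)
        x = kick(x, s)

    grad_x = [1.0] * len(x0)
    grad_s = 0.0
    peak_states = 0
    for seg in reversed(range(len(checkpoints))):
        start = seg * segment
        stop = min(start + segment, n_kicks)
        # Recompute this segment, saving the input of every kick in it.
        saved = []
        x = checkpoints[seg]
        for _ in range(start, stop):
            saved.append(x)
            x = kick(x, s)
        # Upper bound: saved[0] is the checkpoint itself and is counted twice.
        peak_states = max(peak_states, len(checkpoints) + len(saved))
        # Backward through the recomputed segment.
        for xs in reversed(saved):
            grad_s += sum(g * math.sin(xi) for g, xi in zip(grad_x, xs))
            grad_x = [g * (1.0 + s * math.cos(xi)) for g, xi in zip(grad_x, xs)]
    return grad_s, peak_states


if __name__ == "__main__":
    x0 = [0.2 * i for i in range(6)]
    print("no checkpointing:", grad_with_checkpoints(x0, 0.3, 16, segment=16))
    print("segments of 4:   ", grad_with_checkpoints(x0, 0.3, 16, segment=4))
```

Both calls return the same gradient, while the segmented run holds fewer particle states at once and pays for it with a second forward evaluation of every kick.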

Load-bearing premise

The linear scaling measured in this particular Cheetah implementation using PyTorch represents the general behavior for reverse-mode differentiation with space charge.

What would settle it

Implementing space charge in a different differentiable particle simulation framework and measuring whether memory usage follows the same linear dependence on macroparticles and kicks.
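One way such a comparison could be scored is to fit the measurements from the second framework to a linear model and inspect the residuals. The sketch below assumes a two-coefficient form, ΔM = a·N_part + b·N_cells, solved by ordinary least squares. The exact form of the paper's Eq. (1) is not reproduced on this page, so this form, the function names, and the synthetic sample data are all illustrative assumptions.

```python
def fit_linear_memory_model(samples):
    """Least-squares fit of delta_bytes = a * n_particles + b * n_cells.

    samples: iterable of (n_particles, n_cells, delta_bytes) measurements.
    """
    samples = list(samples)
    spp = sum(p * p for p, _, _ in samples)
    scc = sum(c * c for _, c, _ in samples)
    spc = sum(p * c for p, c, _ in samples)
    spm = sum(p * m for p, _, m in samples)
    scm = sum(c * m for _, c, m in samples)
    det = spp * scc - spc * spc
    if det == 0:
        raise ValueError("vary particles and cells independently to fit both terms")
    a = (spm * scc - scm * spc) / det   # bytes per macroparticle
    b = (scm * spp - spm * spc) / det   # bytes per cell
    return a, b


def max_relative_residual(samples, a, b):
    """Largest relative deviation of a measurement from the fitted model."""
    return max(abs(m - (a * p + b * c)) / m for p, c, m in samples)


if __name__ == "__main__":
    # Synthetic data generated from a known linear law, not measurements.
    synthetic = [(p, c, 500 * p + 10 * c)
                 for p in (10_000, 100_000, 1_000_000)
                 for c in (32**3, 64**3)]
    a, b = fit_linear_memory_model(synthetic)
    print(a, b, max_relative_residual(synthetic, a, b))
```

Large or systematically signed residuals on real measurements would point to a term the linear model misses, for example one that depends on the product of particles and cells.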

Figures

Figures reproduced from arXiv: 2605.27282 by Arjun Dhamrait, Axel Huebl, Chad E. Mitchell, Chenran Xu, Edoardo Zoni, Jan Kaiser, Jean-Luc Vay, Ji Qiang, Remi Lehe, Ryan Roussel.

Figure 1: Timeline of memory usage for a Cheetah simulation (…
Figure 3: Coefficients for the increase in memory usage dur…
Figure 2: Total change in memory usage over one space…
Figure 4: Increase in memory usage Δ incurred during the calculation of the Integrated Green Function (IGF), as part of one space charge kick (forward pass, with reverse-mode differentiation turned on), as a function of the number of macroparticles and cells. Eq. (1) is fitted to the data, and the corresponding coefficients are represented in…
Figure 5 (accompanying text): Figure 5 shows the increase in memory usage measured during charge deposition (using the profiling tools described in App. C) as part of the forward pass. As can be observed, the memory usage depends strongly on the number of macroparticles (≃ 550 B per macroparticle) and increases only mildly as a function of the number of cells (≃ 8 B per cell). We observe no evidence for a dependency of the form par…
Figure 6 (accompanying text): Again, we observe a strong dependency with…
Figure 6: Increase in memory usage Δ incurred during field gather, as part of one space charge kick (forward pass, with reverse-mode differentiation turned on), as a function of the number of macroparticles and cells. Eq. (1) is fitted to the data, and the corresponding coefficients are represented in…
Figure 7: Plots of the Lorentz force F = q(E + v × B) felt by particles of a relativistic beam in the lab frame, due to space charge, for various values of the Lorentz factor γ. The solid lines are obtained by evaluating Eq. (7) and Eq. (8), while the dots are from the space charge calculation within Cheetah.
Original abstract

The recent development of differentiable simulation codes for particle accelerators has enabled gradient-based workflows that promise finer control and more realistic modeling of accelerator facilities. However, when using reverse-mode automatic differentiation, the memory usage continuously increases during the simulation, and can potentially exceed the available hardware memory -- especially when costly space charge computation is included. To study the memory requirements for differentiable simulations, we have implemented space charge in Cheetah, a PyTorch-based beam tracking code that supports reverse-mode differentiation. We find that the memory usage for reverse-mode differentiation grows linearly with the number of macroparticles and cells, and that it is proportional to the number of space charge kicks involved in the simulation. This general scaling can be used to evaluate whether a given differentiable simulation is feasible given hardware memory constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript implements space charge in the Cheetah PyTorch-based differentiable beam tracking code and reports empirical measurements showing that reverse-mode automatic differentiation memory usage grows linearly with the number of macroparticles and grid cells and is proportional to the number of space charge kicks performed during the simulation. This scaling is presented as a practical guideline for assessing hardware feasibility of such simulations.

Significance. If the reported linear scaling is robust and not implementation-specific, the result supplies a concrete, directly measured rule of thumb that accelerator physicists can use to decide whether a planned differentiable simulation with space charge will fit in available GPU memory. The empirical character of the claim (rather than a fitted or derived expression) is a strength when the measurement protocol is fully documented.

major comments (2)
  1. [Abstract] Abstract and results sections: the central claim of linear scaling is stated without any description of the test cases, number of macroparticles/cells ranges, number of independent runs, error bars, or controls that would rule out artifacts from Cheetah's specific data structures or PyTorch autograd tape construction. This information is load-bearing for judging whether the observed linearity is general.
  2. [Results] The manuscript supplies no theoretical accounting of tape size (particle-to-grid deposition, field solve, grid-to-particle interpolation) and no cross-framework or cross-solver comparisons. Without these, the claim that the scaling is 'general' and can be used to evaluate 'a given differentiable simulation' remains specific to the Cheetah/PyTorch implementation and cannot yet be treated as implementation-independent.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments. We address each major comment below, indicating where revisions will be made to strengthen the documentation of our empirical results while clarifying the scope of the claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract and results sections: the central claim of linear scaling is stated without any description of the test cases, number of macroparticles/cells ranges, number of independent runs, error bars, or controls that would rule out artifacts from Cheetah's specific data structures or PyTorch autograd tape construction. This information is load-bearing for judging whether the observed linearity is general.

    Authors: We agree that additional details on the experimental protocol are required for readers to evaluate the robustness of the reported linearity. In the revised manuscript we will expand the Results section (and update the abstract accordingly) to describe the test cases, the specific ranges of macroparticles and grid cells examined, the number of independent runs, any error bars or statistical measures, and the controls used to check for implementation-specific artifacts in Cheetah or PyTorch autograd. revision: yes

  2. Referee: [Results] The manuscript supplies no theoretical accounting of tape size (particle-to-grid deposition, field solve, grid-to-particle interpolation) and no cross-framework or cross-solver comparisons. Without these, the claim that the scaling is 'general' and can be used to evaluate 'a given differentiable simulation' remains specific to the Cheetah/PyTorch implementation and cannot yet be treated as implementation-independent.

    Authors: The work is an empirical study of memory scaling inside the Cheetah/PyTorch implementation; we will revise the abstract and discussion to make this scope explicit rather than using the term 'general' without qualification. We will also add a concise paragraph explaining why linear scaling with macroparticles, cells, and kicks is expected from the structure of reverse-mode differentiation applied to standard particle-in-cell operations. A full theoretical derivation of tape size or cross-framework comparisons lie outside the present study and would require new analysis not performed here. revision: partial

standing simulated objections not resolved
  • A theoretical accounting of tape size for particle-to-grid deposition, field solve, and grid-to-particle interpolation together with cross-framework or cross-solver comparisons.

Circularity Check

0 steps flagged

No circularity: empirical measurement, not a derivation

Full rationale

The paper implements space charge in Cheetah (a PyTorch code) and reports direct measurements of memory usage during reverse-mode differentiation. The central claim is an observed linear scaling with macroparticles, cells, and space-charge kicks. No derived equations or self-citations are used to obtain the scaling; it is presented as a measurement result, and the only fit mentioned (Eq. (1), in the figure captions) is applied to the measured data. The claim therefore does not reduce to its own inputs by construction. This is the expected outcome for an empirical scaling study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no derivations, free parameters, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5692 in / 1165 out tokens · 24567 ms · 2026-06-29T14:15:34.739832+00:00 · methodology

