A Simple GPU-Accelerated Solver for the Schr\"odinger Operator with Applications to Ground States and Hamiltonian Simulation
Pith reviewed 2026-05-21 06:29 UTC · model grok-4.3
The pith
A tensor-product direct solver inverts the Schrödinger operator at near-linear cost for separable potentials and preconditions it with mesh-independent efficiency for bounded perturbations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When the potential admits a decomposition V = V1 + V2 with V1 separable and V2 bounded, the preconditioned operator (-Δ + V1)^{-1} (-Δ + V) has a bounded condition number and a spectrum clustered around one with only finitely many outliers, independently of the mesh size and, when V1 is confining, independently of the domain size. This spectral property explains the observed mesh- and domain-independent iteration counts in preconditioned conjugate gradient.
What carries the argument
The separable-potential decomposition V = V1 + V2 together with the tensor-product direct solver for the operator -Δ + V1, which is applied either as an exact solver for separable cases or as a preconditioner for general cases.
If this is right
- Ground-state computations via inverse iteration or gradient flow become practical in three dimensions with billions of degrees of freedom.
- Hamiltonian simulation via operator splitting reaches nine spatial dimensions on a single high-end GPU.
- Preconditioned conjugate gradient requires a number of iterations that does not grow with mesh refinement or domain enlargement.
- The same framework supplies both linear solvers and time propagators at comparable cost.
Where Pith is reading between the lines
- The decomposition strategy may extend to other elliptic operators that possess a dominant separable component.
- Clustering of the spectrum around unity suggests that additional acceleration techniques such as deflation could further reduce iteration counts.
- GPU implementations could be combined with distributed-memory parallelism to tackle even larger problems.
Load-bearing premise
The potential can be split into a separable component V1 for which a fast direct solver exists and a bounded remainder V2 whose size controls the perturbation.
What would settle it
Numerical experiments showing that the number of preconditioned conjugate gradient iterations grows without bound as the mesh is refined while keeping V2 bounded would disprove the claimed mesh-independent condition number.
read the original abstract
We extend the tensor-product direct solver from the Laplacian to the Schr\"odinger operator $-\Delta + V$. When the potential $V_1$ is separable, the operator $-\Delta + V_1$ is inverted or exponentiated at cost $O(N^{1+1/d})$ in $d$ dimensions via per-axis eigendecomposition. On a single NVIDIA A100 GPU, this costs less than one second for $10^9$ degrees of freedom in 3D. For non-separable potentials $V = V_1 + V_2$, the same solver provides a preconditioner $(-\Delta + V_1)^{-1}$ for the preconditioned conjugate gradient (PCG) method and a propagator for operator-splitting time integrators. For bounded $V_2$, we prove that the preconditioned operator has a bounded condition number and a clustered spectrum with at most finitely many outlier eigenvalues, independently of the mesh size, and also independently of the domain size when $V_1$ is a confining potential. This explains the mesh- and domain-independent PCG iteration counts observed in practice. We apply this method to ground state computation via inverse iteration for linear problems and via the $a_u$ gradient flow for Gross--Pitaevskii energy in 3D, and also Hamiltonian simulation via the approximated qHOP and Magnus-2 splitting methods from 3D to 9D on a single NVIDIA GH200 GPU.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript extends tensor-product direct solvers to the Schrödinger operator −Δ + V. For separable V1 the operator is inverted or exponentiated in O(N^{1+1/d}) time via per-axis eigendecompositions, achieving sub-second runtimes on an A100 for 10^9 degrees of freedom in 3D. For non-separable V = V1 + V2 with V2 bounded, (−Δ + V1)^{−1} is used as a preconditioner for PCG and as a propagator in operator-splitting schemes; the authors prove that the preconditioned operator has mesh-independent condition number and at most finitely many spectral outliers, with the same holding independent of domain size when V1 is confining. The method is applied to inverse iteration for linear ground states, au-gradient flow for the Gross–Pitaevskii equation in 3D, and approximated qHOP/Magnus-2 splitting for Hamiltonian simulation from 3D to 9D on a GH200 GPU.
Significance. If the spectral claims hold, the work supplies a practical, theoretically justified route to mesh- and domain-independent iteration counts for high-dimensional Schrödinger problems on current GPU hardware. The combination of an explicit fast direct solver for the separable part with a relatively-compact perturbation argument for the remainder is a clear strength; the reported timings and dimension range (up to 9D) further indicate immediate utility for quantum simulation tasks.
major comments (1)
- §3.2 (Preconditioning analysis): The statement that the discrete preconditioned operator inherits the continuous spectral bounds (bounded condition number and finitely many outliers) is asserted after invoking relatively-compact perturbation theory for I + P^{-1}V2. The manuscript should supply an explicit lemma or corollary showing how the finite-dimensional discretization preserves the essential spectrum and controls the number of outliers uniformly in h; without this step the mesh-independence claim for PCG rests on an unverified transfer from the continuous to the discrete setting.
minor comments (2)
- Notation: the symbol P is used both for the preconditioner (−Δ + V1) and for the number of grid points in some tables; a distinct symbol would improve readability.
- Figure 4 (PCG iteration counts): the legend does not indicate whether the plotted curves correspond to different values of ||V2||∞ or to different domain sizes; adding this information would make the mesh- and domain-independence claim visually immediate.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on the preconditioning analysis. We address the single major comment below and will revise the manuscript to incorporate the requested clarification.
read point-by-point responses
-
Referee: §3.2 (Preconditioning analysis): The statement that the discrete preconditioned operator inherits the continuous spectral bounds (bounded condition number and finitely many outliers) is asserted after invoking relatively-compact perturbation theory for I + P^{-1}V2. The manuscript should supply an explicit lemma or corollary showing how the finite-dimensional discretization preserves the essential spectrum and controls the number of outliers uniformly in h; without this step the mesh-independence claim for PCG rests on an unverified transfer from the continuous to the discrete setting.
Authors: We agree that an explicit bridge from the continuous relatively-compact perturbation result to the discrete setting would make the mesh-independence argument fully rigorous and self-contained. In the revised manuscript we will add a short corollary in §3.2. The corollary will invoke standard spectral approximation theory for self-adjoint elliptic operators under consistent finite-difference (or tensor-product spectral) discretizations: because V2 is bounded, the discrete perturbation remains relatively compact with respect to the discrete Laplacian, the essential spectrum is preserved, and the number of outliers outside any neighborhood of the essential spectrum is finite and independent of h. The same argument extends immediately to the confining case when the domain size is varied. This addition directly justifies the observed mesh- and domain-independent PCG iteration counts without altering any existing proofs or numerical results. revision: yes
Circularity Check
No significant circularity in the derivation chain
full rationale
The paper's central claim is a mathematical proof that for V = V1 + V2 with V1 separable and V2 bounded, the preconditioned operator P^{-1}A (P = -Δ + V1) has mesh-independent condition number and finitely many spectral outliers, obtained by rewriting it as I + P^{-1}V2 and invoking relatively compact perturbation theory on the essential spectrum (with domain-size independence when V1 confines). This relies on standard functional analysis and does not reduce to any fitted parameter, self-definition, or load-bearing self-citation. The tensor-product direct solver for separable cases follows from per-axis eigendecompositions at O(N^{1+1/d}) cost and is independent of the spectral bounds. The derivation is self-contained against external mathematical benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The potential V admits a decomposition V = V1 + V2 where V1 is separable and V2 is bounded (or confining when domain independence is claimed).
Reference graph
Works this paper leans on
-
[1]
D. An, D. F ang, and L. Lin , Time-dependent Hamiltonian simulation of highly oscillatory dynamics and superconvergence for Schr¨ odinger equation, Quantum, 6 (2022), p. 690
work page 2022
-
[2]
X. Antoine and R. Duboscq , Modeling and computation of Bose–Einstein condensates: sta- tionary states, nucleation, dynamics, stochasticity , in Nonlinear Optical and Atomic Sys- tems: At the Interface of Physics and Mathematics, Springer International Publishing, 2015, pp. 49–145
work page 2015
-
[3]
X. Antoine, A. Levitt, and Q. Tang , Efficient spectral computation of the stationary states of rotating Bose–Einstein condensates by preconditioned nonlinear conjugate gradient meth- ods, J. Comput. Phys., 343 (2017), pp. 92–109
work page 2017
- [4]
-
[5]
W. Bao, S. Jin, and P. A. Markowich , On time-splitting spectral approximations for the Schr¨ odinger equation in the semiclassical regime, J. Comput. Phys., 175 (2002), pp. 487– 524
work page 2002
-
[6]
T. J. Boerner, S. Deems, T. R. Furlani, S. L. Knuth, and J. Towns , ACCESS: Advanc- ing innovation: NSF’s advanced cyberinfrastructure coordination ecosystem: Services & support, in Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good, 2023, pp. 173–176
work page 2023
-
[7]
Y. Borns-Weil, D. F ang, and J. Zhang , Discrete superconvergence analysis for quantum Magnus algorithms of unbounded Hamiltonian simulation , Commun. Math. Phys., 407 (2026), p. 29
work page 2026
-
[8]
M. Caliari, F. Cassini, L. Einkemmer, A. Ostermann, and F. Zivcovich , A µ-mode inte- grator for solving evolution equations in Kronecker form , J. Comput. Phys., 455 (2022), p. 110989
work page 2022
-
[9]
Z. Chen, J. Lu, Y. Lu, and X. Zhang , Fully discretized Sobolev gradient flow for the Gross– Pitaevskii eigenvalue problem , Math. Comp., 94 (2025), pp. 2723–2760
work page 2025
-
[10]
A. M. Childs, J. Leng, T. Li, J.-P. Liu, and C. Zhang , Quantum simulation of real-space dynamics, Quantum, 6 (2022), p. 860
work page 2022
-
[11]
I. Danaila and B. Protas , Computation of ground states of the Gross–Pitaevskii functional via Riemannian optimization , SIAM J. Sci. Comput., 39 (2017), pp. B1102–B1129
work page 2017
- [12]
- [13]
-
[14]
D. F ang and J. Zhang , Superconvergence of high-order Magnus quantum algorithms , 2025. arXiv:2509.22897
- [15]
-
[16]
I. K. Gainullin and M. A. Sonkin , High-performance parallel solver for 3D time-dependent Schr¨ odinger equation for large-scale nanosystems, Comput. Phys. Commun., 188 (2015), pp. 68–75
work page 2015
-
[17]
A. Hahn, P. Hartung, D. Burgarth, P. F acchi, and K. Yuasa , Lower bounds for the Trotter error, Phys. Rev. A, 111 (2025), p. 022417. 26 X. LIU AND X. ZHANG
work page 2025
-
[18]
D. B. Haidvogel and T. Zang , The accurate solution of Poisson’s equation by expansion in Chebyshev polynomials, J. Comput. Phys., 30 (1979), pp. 167–180
work page 1979
-
[19]
W. Hao, S. Lee, and X. Zhang , An efficient quasi-Newton method with tensor product im- plementation for solving quasi-linear elliptic equations and systems , J. Sci. Comput., 103 (2025), p. 89
work page 2025
-
[20]
P. Henning and D. Peterseim , Sobolev gradient flow for the Gross–Pitaevskii eigenvalue problem: Global convergence and computational efficiency , SIAM J. Numer. Anal., 58 (2020), pp. 1744–1772
work page 2020
-
[21]
M. Hochbruck and C. Lubich , On Magnus integrators for time-dependent Schr¨ odinger equa- tions, SIAM J. Numer. Anal., 41 (2003), pp. 945–963
work page 2003
-
[22]
B. Jackson, J. F. McCann, and C. S. Adams , Vortex formation in dilute inhomogeneous Bose–Einstein condensates, Phys. Rev. Lett., 80 (1998), pp. 3903–3906
work page 1998
-
[23]
T. Jahnke and C. Lubich , Error bounds for exponential operator splittings , BIT Numer. Math., 40 (2000), pp. 735–744
work page 2000
-
[24]
S. Jin, P. A. Markowich, and C. Sparber , Mathematical and computational methods for semiclassical Schr¨ odinger equations, Acta Numer., 20 (2011), pp. 121–209
work page 2011
-
[25]
Y.-Y. Kwan and J. Shen , An efficient direct parallel spectral-element solver for separable elliptic problems, J. Comput. Phys., 225 (2007), pp. 1721–1735
work page 2007
-
[26]
C. Lasser and C. Lubich , Computing quantum dynamics in the semiclassical regime , Acta Numer., 29 (2020), pp. 229–401
work page 2020
-
[27]
H. Li, D. Appel ¨o, and X. Zhang , Accuracy of spectral element method for wave, parabolic, and Schr¨ odinger equations, SIAM J. Numer. Anal., 60 (2022), pp. 339–363
work page 2022
-
[28]
X. Liu, J. Shen, and X. Zhang , A simple GPU implementation of spectral-element methods for solving 3D Poisson type equations on rectangular domains and its applications , Commun. Comput. Phys., 36 (2024), pp. 1157–1185
work page 2024
-
[29]
R. E. Lynch, J. R. Rice, and D. H. Thomas , Direct solution of partial difference equations by tensor product methods , Numer. Math., 6 (1964), pp. 185–199
work page 1964
-
[30]
J. E. Osborn , Spectral approximation for compact operators, Math. Comp., 29 (1975), pp. 712– 725
work page 1975
-
[31]
M. Reed and B. Simon , Methods of Modern Mathematical Physics, Vol. I: Functional Analy- sis, Academic Press, New York, 1972
work page 1972
-
[32]
IV: Analysis of Operators , Academic Press, New York, 1978
, Methods of Modern Mathematical Physics, Vol. IV: Analysis of Operators , Academic Press, New York, 1978
work page 1978
-
[33]
Shen , Efficient spectral-Galerkin method I
J. Shen , Efficient spectral-Galerkin method I. Direct solvers of second- and fourth-order equa- tions using Legendre polynomials, SIAM J. Sci. Comput., 15 (1994), pp. 1489–1505
work page 1994
-
[34]
J. Shen, T. Tang, and L.-L. W ang , Spectral Methods: Algorithms, Analysis and Applications, vol. 41 of Springer Series in Computational Mathematics, Springer, Heidelberg, 2011
work page 2011
- [35]
-
[36]
M. Thalhammer , High-order exponential operator splitting methods for time-dependent Schr¨ odinger equations, SIAM J. Numer. Anal., 46 (2008), pp. 2022–2038
work page 2008
-
[37]
A. van der Sluis and H. A. van der Vorst , The rate of convergence of conjugate gradients , Numer. Math., 48 (1986), pp. 543–560
work page 1986
-
[38]
Yoshida , Construction of higher order symplectic integrators , Phys
H. Yoshida , Construction of higher order symplectic integrators , Phys. Lett. A, 150 (1990), pp. 262–268
work page 1990
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.