Recognition: unknown
Large-Scale Quantum Circuit Simulation on an Exascale System for QPU Benchmarking
Pith reviewed 2026-05-07 13:37 UTC · model grok-4.3
The pith
Exascale classical simulations benchmark a quantum processor and identify its coherent performance limit at 93 qubits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors perform large-scale noiseless simulations of linear ramp QAOA circuits on an exascale system for up to 48 qubits and 3,384 two-qubit gates to validate experimental results from the quantum processor. They determine that up to this size, the QPU samples cannot be distinguished from the noiseless simulation. Extending to larger sizes with experimental data and a mean-of-means resampling procedure using a 3σ threshold, the analysis shows coherent performance up to 93 qubits with 12,834 two-qubit gates, while at 95 qubits the outputs become statistically indistinguishable from random sampling.
What carries the argument
Linear ramp quantum approximate optimization algorithm (LR-QAOA) circuits combined with a mean-of-means resampling procedure and 3σ threshold for testing statistical distinguishability from random sampling.
If this is right
- Exascale simulations provide a reliable reference to confirm noise-tolerant operation in quantum processors up to 48 qubits.
- The statistical test extends the benchmarking to 95 qubits without requiring full classical simulation.
- A transition point is established where the processor shifts from producing distinguishable outputs to random-like behavior at 95 qubits.
- This method quantifies the practical limits of current quantum hardware for useful computation.
Where Pith is reading between the lines
- The benchmarking approach could be adapted to evaluate other quantum algorithms or hardware platforms for their noise tolerance boundaries.
- If the 3σ threshold proves robust, it might help in designing quantum circuits that maximize the coherent regime.
- Extending the simulations or tests to different circuit depths or gate types could reveal more about error accumulation in trapped-ion systems.
Load-bearing premise
The mean-of-means resampling procedure with a fixed 3σ threshold accurately identifies when experimental samples become indistinguishable from random sampling, assuming the exascale noiseless simulations are free of significant errors up to 48 qubits.
What would settle it
A repetition of the mean-of-means analysis showing that 93-qubit samples fall within the random threshold, or discovery of substantial discrepancies between the 48-qubit simulation and independent verification, would falsify the reported coherent regime.
Figures
read the original abstract
Recent advances in quantum computing have enabled the development of quantum processors with hundreds of qubits. However, noise continues to limit the amount of useful information that can be extracted from these systems, making it essential to identify the regime in which experimental outputs remain reliable. In this work, we benchmark Quantinuum Helios-1, a 98-qubit trapped-ion quantum processing unit, using the linear ramp quantum approximate optimization algorithm (LR-QAOA). To this end, we perform large-scale noiseless simulations on JUPITER, Europe's first exascale supercomputer, for circuits of up to 48 qubits and 3,384 two-qubit gates. These simulations, executed on 4,096 nodes equipped with 16,384 GH200 superchips and high-bandwidth CPU-GPU interconnects, provide a reference for validating experimental results at the edge of classical tractability. We find that, up to 48 qubits, Helios-1 remains in a noise-tolerant region, i.e., its samples cannot be clearly distinguished from those coming from a noiseless simulation. We then extend the analysis to larger system sizes using experimental data only, and apply a mean-of-means resampling procedure with a 3$\sigma$ threshold to determine whether the QPU output is statistically distinguishable from random sampling. This analysis identifies a regime of coherent performance up to 93 qubits (12,834 two-qubit gates), beyond which, at 95 qubits, the outputs become statistically indistinguishable from random sampling. These results demonstrate how exascale classical simulation can be used to validate quantum processors, and provide a quantitative boundary between noise-tolerant and random regimes in quantum processors.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper benchmarks the Quantinuum Helios-1 98-qubit trapped-ion QPU using linear-ramp QAOA circuits. It reports noiseless classical simulations up to 48 qubits (3,384 two-qubit gates) executed on the JUPITER exascale system (4,096 nodes, 16,384 GH200 superchips) that serve as reference; experimental samples up to 48 qubits are found statistically indistinguishable from these noiseless outputs. For larger sizes the authors apply a mean-of-means resampling procedure with a fixed 3σ threshold to experimental bitstrings only, concluding a coherent regime up to 93 qubits (12,834 two-qubit gates) and statistical indistinguishability from uniform random sampling at 95 qubits.
Significance. If the statistical procedure is shown to be calibrated and robust, the work supplies a concrete, quantitative boundary between noise-tolerant and random regimes for a near-term trapped-ion processor and demonstrates that exascale classical simulation can directly validate quantum hardware at the practical limit of classical tractability. The reported scale of the simulation (48 qubits, thousands of two-qubit gates on thousands of GPU nodes) is itself a technical achievement that strengthens the reference data.
major comments (2)
- [Statistical analysis] Statistical analysis (abstract and results sections): the central claim that experimental outputs remain distinguishable from random sampling up to 93 qubits (and become indistinguishable at 95 qubits) rests on a mean-of-means resampling procedure with an unvalidated fixed 3σ threshold. No calibration against controlled noisy data (e.g., noiseless LR-QAOA bitstrings with injected depolarizing or readout noise of varying strength), no reported false-positive rate, power analysis, or sensitivity to resampling count or output distribution is provided. This directly undermines the load-bearing distinction between the 93-qubit coherent regime and the 95-qubit random boundary.
- [Simulation methods and results] Simulation reference (methods and results up to 48 qubits): the noiseless exascale simulations are used to establish the noise-tolerant regime, yet the manuscript reports neither numerical error estimates, convergence diagnostics, nor floating-point precision checks for the 48-qubit runs. Without these, the statement that experimental samples “cannot be clearly distinguished” from the simulated reference lacks a quantified uncertainty.
minor comments (2)
- [Methods] The description of the mean-of-means procedure would be clearer if accompanied by an explicit equation or pseudocode block defining the resampling, the statistic being averaged, and the precise 3σ criterion.
- [Figures] Figure captions and axis labels for any plots showing the 3σ threshold or distinguishability metric should explicitly state the number of resamples and the exact test statistic used.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments identify key areas where additional validation and documentation will strengthen the manuscript's claims regarding the statistical distinction between coherent and random regimes and the reliability of the exascale simulation reference. We address each major comment below and outline the revisions we will make.
read point-by-point responses
-
Referee: [Statistical analysis] Statistical analysis (abstract and results sections): the central claim that experimental outputs remain distinguishable from random sampling up to 93 qubits (and become indistinguishable at 95 qubits) rests on a mean-of-means resampling procedure with an unvalidated fixed 3σ threshold. No calibration against controlled noisy data (e.g., noiseless LR-QAOA bitstrings with injected depolarizing or readout noise of varying strength), no reported false-positive rate, power analysis, or sensitivity to resampling count or output distribution is provided. This directly undermines the load-bearing distinction between the 93-qubit coherent regime and the 95-qubit random boundary.
Authors: We agree that explicit calibration of the mean-of-means procedure strengthens the central claim. In the revised manuscript we will add a dedicated subsection in the methods and results that applies the identical resampling protocol to synthetic datasets: noiseless LR-QAOA bitstrings with controlled depolarizing noise (p = 0.001–0.05) and readout error (ε = 0.001–0.02) injected at the circuit and measurement levels. For each noise strength we will report (i) the empirical false-positive rate at the fixed 3σ threshold, (ii) the power of the test as a function of the number of resamples (N = 100, 500, 1000), and (iii) the sensitivity of the 93-qubit versus 95-qubit boundary to modest changes in the threshold (2.5σ–3.5σ). This calibration will be performed on the same 48-qubit reference circuits already simulated, ensuring direct comparability with the experimental data. revision: yes
-
Referee: [Simulation methods and results] Simulation reference (methods and results up to 48 qubits): the noiseless exascale simulations are used to establish the noise-tolerant regime, yet the manuscript reports neither numerical error estimates, convergence diagnostics, nor floating-point precision checks for the 48-qubit runs. Without these, the statement that experimental samples “cannot be clearly distinguished” from the simulated reference lacks a quantified uncertainty.
Authors: The 48-qubit simulations were executed with double-precision (FP64) arithmetic throughout the state-vector evolution on the GH200 GPUs. To quantify numerical reliability we will add the following to the methods section: (1) a statement that all tensor contractions and gate applications used FP64 with no mixed-precision approximations; (2) convergence diagnostics obtained by repeating the 20-, 30-, and 40-qubit circuits with both the exascale code and an independent high-precision reference implementation (Qiskit Aer with exact state-vector mode), reporting the maximum total-variation distance between the two probability distributions (observed < 10^{-12}); (3) an a-posteriori error bound derived from the number of two-qubit gates and the machine epsilon, together with the empirical observation that increasing the number of shots in the simulation from 10^5 to 10^6 produced no statistically detectable change in the sampled bit-string distributions used for the indistinguishability test. These additions will supply the requested quantified uncertainty for the claim that experimental samples up to 48 qubits cannot be clearly distinguished from the simulated reference. revision: yes
Circularity Check
No significant circularity; analysis chain uses independent classical simulations and external statistical test
full rationale
The paper performs large-scale noiseless simulations on an exascale system up to 48 qubits to serve as an external reference, confirming that experimental QPU samples cannot be distinguished from noiseless outputs. For system sizes beyond this (up to 93-95 qubits), the analysis switches exclusively to experimental data and applies a mean-of-means resampling procedure with a fixed 3σ threshold to test statistical distinguishability from uniform random sampling. This procedure is not fitted to or derived from the simulation outputs, does not reduce to a self-definition, and involves no load-bearing self-citation or ansatz smuggling. The simulations act as an independent benchmark against which the experimental results are compared, and the statistical test compares to an external uniform distribution without tautologically forcing the coherent/random boundary from the paper's own inputs. The derivation chain is therefore self-contained against external references.
Axiom & Free-Parameter Ledger
free parameters (1)
- 3σ threshold
axioms (1)
- domain assumption Noiseless simulation on the exascale system accurately reproduces ideal quantum circuit outputs without numerical artifacts.
Reference graph
Works this paper leans on
-
[1]
Evidence for the utility of quantum computing before fault tolerance
Y . Kim, A. Eddins, S. Anand, K. X. Wei, E. van den Berg, S. Rosenblatt, H. Nayfeh, Y . Wu, M. Zaletel, K. Temme, and A. Kandala, “Evidence for the utility of quantum computing before fault tolerance,”Nature, vol. 618, pp. 500–505, 2023. [Online]. Available: https://doi.org/10.1038/s41586-023-06096-3
-
[2]
Helios: A 98-qubit trapped-ion quantum computer,
A. Ransford, M. S. Allman, J. Arkinstall, J. P. C. III, S. F. Cooper, R. D. Delaney, J. M. Dreiling, B. Estey, C. Figgatt, A. Hall, A. A. Husain, A. Isanaka, C. J. Kennedy, N. Kotibhaskar, I. S. Madjarov, K. Mayer, A. R. Milne, A. J. Park, A. P. Reed, R. Ancona, M. P. Andersen, P. Andres-Martinez, W. Angenent, L. Argueta, B. Arkin, L. Ascarrunz, W. Baker,...
-
[3]
Helios: A 98-qubit trapped-ion quantum computer,
[Online]. Available: https://doi.org/10.48550/arXiv.2511.05465
-
[4]
Quantum error correction below the surface code threshold,
R. Acharya, D. A. Abanin, L. Aghababaie-Beni, I. Aleiner, T. I. Andersen, M. Ansmann, F. Arute, K. Arya, A. Asfaw, N. Astrakhantsev, J. Atalaya, R. Babbush, D. Bacon, B. Ballard, J. C. Bardin, J. Bausch, A. Bengtsson, A. Bilmes, S. Blackwell, S. Boixo, G. Bortoli, A. Bourassa, J. Bovaird, L. Brill, M. Broughton, D. A. Browne, B. Buchea, B. B. Buckley, D. ...
-
[5]
Quantum error correction below the surface code threshold,
[Online]. Available: http://doi.org/10.1038/s41586-024-08449-y
-
[6]
D. Lall, A. Agarwal, W. Zhang, L. Lindoy, T. Lindstr ¨om, S. Webster, S. Hall, N. Chancellor, P. Wallden, R. Garcia-Patron, E. Kashefi, V . Kendon, J. Pritchard, A. Rossi, A. Datta, T. Kapourniotis, K. Georgopoulos, and I. Rungger, “A review and collection of metrics and benchmarks for quantum computers: definitions, methodologies and software,” 2025. [On...
-
[7]
Practical introduction to benchmarking and characterization of quantum computers,
A. Hashim, L. B. Nguyen, N. Goss, B. Marinelli, R. K. Naik, T. Chistolini, J. Hines, J. Marceaux, Y . Kim, P. Gokhale, T. Tomesh, S. Chen, L. Jiang, S. Ferracin, K. Rudinger, T. Proctor, K. C. Young, I. Siddiqi, and R. Blume-Kohout, “Practical introduction to benchmarking and characterization of quantum computers,”PRX Quantum, vol. 6, no. 3, 2025. [Online...
2025
-
[8]
Scalable noise estimation with random unitary operators,
J. Emerson, R. Alicki, and K. Zyczkowski, “Scalable noise estimation with random unitary operators,”Journal of Optics B: Quantum and Semiclassical Optics, vol. 7, no. 10, p. S347–S352, Sep. 2005. [Online]. Available: http://doi.org/10.1088/1464-4266/7/10/021
-
[9]
Benchmarking quantum processor performance at scale,
D. C. McKay, I. Hincks, E. J. Pritchett, M. Carroll, L. C. G. Govia, and S. T. Merkel, “Benchmarking quantum processor performance at scale,”
-
[10]
Available: https://doi.org/10.48550/arXiv.2311.05933
[Online]. Available: https://doi.org/10.48550/arXiv.2311.05933
-
[11]
A. W. Cross, L. S. Bishop, S. Sheldon, P. D. Nation, and J. M. Gambetta, “Validating quantum computers using randomized model circuits,”Phys. Rev. A, vol. 100, p. 032328, Sep 2019. [Online]. Available: https://doi.org/10.1103/PhysRevA.100.032328
-
[12]
Characterizing quantum supremacy in near-term devices,
S. Boixo, S. V . Isakov, V . N. Smelyanskiy, R. Babbush, N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and H. Neven, “Characterizing quantum supremacy in near-term devices,”Nature Physics, vol. 14, no. 6, pp. 595–600, 2018. [Online]. Available: https://doi.org/10.1038/ s41567-018-0124-x
2018
-
[13]
Evaluating the performance of quantum processing units at large width and depth,
J. A. Montanez-Barrera, K. Michielsen, and D. E. B. Neira, “Evaluating the performance of quantum processing units at large width and depth,”
-
[14]
[Online]. Available: https://doi.org/10.48550/arXiv.2502.06471
-
[15]
Measuring what matters: A scalable framework for application-level quantum benchmarking,
W. Aboumrad, C. Girotto, J. Goings, L. Zhao, M. A. Lopez-Ruiz, D. Zhu, A. Kaushik, S. Ray, S. Sekwao, J. Iaconis, A. Arrasmith, A. Maksymov, Y . de Sereville, F. Tripier, F. McKon, C. Collins, E. Epifanovsky, M. Yamada, and M. Roetteler, “Measuring what matters: A scalable framework for application-level quantum benchmarking,”
-
[16]
Measuring what matters: A scalable framework for application-level quantum benchmarking
[Online]. Available: https://arxiv.org/abs/2604.11781
work page internal anchor Pith review Pith/arXiv arXiv
-
[17]
A. Cosentino, C. Li, V . Russo, B. A. Chase, T. Lubinski, S. Niu, N. Patel, N. Shammah, and W. J. Zeng, “Metriq: A collaborative platform for benchmarking quantum computers,” 2026. [Online]. Available: https://arxiv.org/abs/2603.08680
-
[18]
H. D. Raedt, J. Kraus, A. Herten, V . Mehta, M. Bode, M. Hrywniak, K. Michielsen, and T. Lippert, “Universal quantum simulation of 50 qubits on europe‘s first exascale supercomputer harnessing its heterogeneous cpu-gpu architecture,” 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2511.03359
-
[19]
General framework for randomized benchmarking,
J. Helsen, I. Roth, E. Onorati, A. Werner, and J. Eisert, “General framework for randomized benchmarking,”PRX Quantum, vol. 3, no. 2, Jun. 2022. [Online]. Available: http://doi.org/10.1103/PRXQuantum.3. 020357
-
[20]
C. H. Baldwin, K. Mayer, N. C. Brown, C. Ryan-Anderson, and D. Hayes, “Re-examining the quantum volume test: Ideal distributions, compiler optimizations, confidence intervals, and scalable resource estimations,”Quantum, vol. 6, p. 707, May 2022. [Online]. Available: http://doi.org/10.22331/q-2022-05-09-707
-
[21]
Quantum volume in practice: What users can expect from nisq devices,
E. Pelofske, A. B ¨artschi, and S. Eidenbenz, “Quantum volume in practice: What users can expect from nisq devices,”IEEE Transactions on Quantum Engineering, vol. 3, p. 1–19, 2022. [Online]. Available: http://doi.org/10.1109/TQE.2022.3184764
-
[22]
A. Portik, O. Kalman, T. Monz, and Z. Zimboras, “Clifford volume and free fermion volume: Complementary scalable benchmarks for quantum computers,” 2025. [Online]. Available: https://doi.org/10.48550/arXiv. 2512.19413
work page internal anchor Pith review doi:10.48550/arxiv 2025
-
[23]
Benchmarking quantum coprocessors in an application-centric, hardware-agnostic, and scalable way,
S. Martiel, T. Ayral, and C. Allouche, “Benchmarking quantum coprocessors in an application-centric, hardware-agnostic, and scalable way,”IEEE Transactions on Quantum Engineering, vol. 2, p. 1–11,
-
[24]
Available: http://doi.org/10.1109/TQE.2021.3090207
[Online]. Available: http://doi.org/10.1109/TQE.2021.3090207
-
[25]
Benchmarking a trapped-ion quantum computer with 29 algorithmic qubits,
J.-S. Chen, E. Nielsen, M. Ebert, V . Inlek, K. Wright, V . Chaplin, A. Maksymov, E. P ´aez, A. Poudel, P. Maunz, and J. Gamble, “Benchmarking a trapped-ion quantum computer with 29 algorithmic qubits,”Quantum, 2023. [Online]. Available: https://doi.org/10.22331/ q-2024-11-07-1516
2023
-
[26]
A Quantum Approximate Optimization Algorithm
E. Farhi, J. Goldstone, and S. Gutmann, “A quantum approximate optimization algorithm,” 2014. [Online]. Available: https://doi.org/10. 48550/arXiv.1411.4028
work page internal anchor Pith review arXiv 2014
-
[27]
Quantum Computation by Adiabatic Evolution
E. Farhi, J. Goldstone, S. Gutmann, and M. Sipser, “Quantum computation by adiabatic evolution,” 2000. [Online]. Available: https://doi.org/10.48550/arXiv.quant-ph/0001106
work page Pith review doi:10.48550/arxiv.quant-ph/0001106 2000
-
[28]
https://doi.org/10.1103/PhysRevA.105.042406
V . R. Pascuzzi, A. He, C. W. Bauer, W. A. de Jong, and B. Nachman, “Computationally efficient zero-noise extrapolation for quantum-gate- error mitigation,”Phys. Rev. A, vol. 105, p. 042406, Apr 2022. [Online]. Available: https://doi.org/10.1103/PhysRevA.105.042406
-
[29]
Toward a linear-ramp qaoa protocol: evidence of a scaling advantage in solving some combinatorial optimization problems,
J. A. Monta ˜nez-Barrera and K. Michielsen, “Toward a linear-ramp qaoa protocol: evidence of a scaling advantage in solving some combinatorial optimization problems,”npj Quantum Information, vol. 11, no. 1, Aug
-
[30]
[Online]. Available: https://doi.org/10.1038/s41534-025-01082-1
-
[31]
Qiskit: An open-source framework for quantum computing,
The Qiskit Community, “Qiskit: An open-source framework for quantum computing,” 2023. [Online]. Available: https://qiskit.org/
2023
-
[32]
Cirq: A python framework for nisq-era quantum circuits,
The Cirq Developers, “Cirq: A python framework for nisq-era quantum circuits,” 2023. [Online]. Available: https://quantumai.google/cirq
2023
-
[33]
Qaptiva: Quantum application development platform,
Eviden, “Qaptiva: Quantum application development platform,” 2023. [Online]. Available: https://eviden.com/solutions/advanced-computing/ quantum-computing/qaptiva-hpc/
2023
-
[34]
Scaleqsim: Highly scalable quantum circuit simulation framework for exascale hpc systems,
C. Kim, E. Sohn, S. Kim, A. Sim, K. Wu, H. Tang, Y . Son, and S. Kim, “Scaleqsim: Highly scalable quantum circuit simulation framework for exascale hpc systems,”Proc. ACM Meas. Anal. Comput. Syst., vol. 9, no. 3, Dec. 2025. [Online]. Available: https://doi.org/10.1145/3771577
-
[35]
Massively parallel quantum computer simulator, eleven years later,
H. De Raedt, F. Jin, D. Willsch, M. Willsch, N. Yoshioka, N. Ito, S. Yuan, and K. Michielsen, “Massively parallel quantum computer simulator, eleven years later,”Comp. Phys. Comm., vol. 237, pp. 47 – 61, 2019. [Online]. Available: https://doi.org/10.1016/j.cpc.2018.11.005
-
[36]
Benchmarking the quantum approximate optimization algorithm,
M. Willsch, D. Willsch, F. Jin, H. De Raedt, and K. Michielsen, “Benchmarking the quantum approximate optimization algorithm,” Quantum Information Processing, vol. 19, no. 7, Jun. 2020. [Online]. Available: https://doi.org/10.1007/s11128-020-02692-8
-
[37]
Massively parallel quantum computer simulator,
K. De Raedt, K. Michielsen, H. De Raedt, B. Trieu, G. Arnold, M. Richter, Th. Lippert, H. Watanabe, and N. Ito, “Massively parallel quantum computer simulator,”Comp. Phys. Comm., vol. 176, pp. 121 – 136, 2007. [Online]. Available: https://doi.org/10.1016/j.cpc.2006.08. 007
-
[38]
Quantinuum systems’ workflow: Tracking usage with hardware quantum credits (HQCs),
Quantinuum Ltd., “Quantinuum systems’ workflow: Tracking usage with hardware quantum credits (HQCs),” 2025, accessed: 2026-04-08. [Online]. Available: https://docs.quantinuum.com/systems/user guide/ hardware user guide\protect\penalty\z@/workflow.html
2025
-
[39]
A Race-Track Trapped-Ion Quantum Processor
S. A. Moses, C. H. Baldwin, M. Allman, R. Ancona, L. Ascarrunz, C. Barnes, J. Bartolotta, B. Bjork, P. Blanchard, M. Bohn, J. Bohnet, N. Brown, N. Burdick, W. Burton, S. Campbell, J. Campora, C. Carron, J. Chambers, J. Chan, Y . Chen, A. Chernoguzov, E. Chertkov, J. Colina, J. Curtis, R. Daniel, M. DeCross, D. Deen, C. Delaney, J. Dreiling, C. Ertsgaard, ...
-
[40]
D. Alvarez, “JUWELS cluster and booster: Exascale pathfinder with modular supercomputing architecture at J ¨ulich Supercomputing Centre,” Journal of large-scale research facilities JLSRF, vol. 7, oct 2021. [Online]. Available: https://doi.org/10.17815/jlsrf-7-183
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.