pith. machine review for the scientific record.

arxiv: 2604.11478 · v1 · submitted 2026-04-13 · 🪐 quant-ph

Recognition: unknown

Accuracy-Cost Trade-offs for Reference VQE Calculations of H₂ on IBM Quantum Hardware

Pith reviewed 2026-05-10 16:17 UTC · model grok-4.3

classification 🪐 quant-ph
keywords variational quantum eigensolver · quantum chemistry · IBM Quantum hardware · hydrogen molecule · accuracy-cost trade-offs · tapered mappings · resilience levels · session execution

The pith

Tapered mappings deliver the most consistent accuracy gains for VQE ground-state calculations of H₂ on IBM quantum processors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes a hardware-validated reference dataset for variational ground-state energy calculations of the hydrogen molecule on multiple IBM Quantum processors available in 2026. Using a fixed workflow, the authors measure how shot count, backend selection, optimization choices, and execution modes affect energy accuracy relative to exact diagonalization. Tapered mappings for circuit simplification produce the steadiest accuracy improvements across the tested setups. Resilience level 1 raises accuracy but at a marked cost increase, while session-based runs show no systematic accuracy benefit over single-job runs despite substantially higher billed time. The dataset and analysis supply a concrete baseline for new users to anticipate trade-offs in quantum-chemistry applications on present hardware.

Core claim

Across the configurations studied, circuit simplification through tapered mappings provides the most consistent accuracy gains, resilience level 1 improves accuracy at a substantial cost premium, and session-based execution yields no systematic accuracy advantage over single-job execution despite markedly higher billed time.

What carries the argument

The standardized benchmarking workflow that systematically varies shot count, backend, optimization strategy including tapered mappings, resilience settings, and session versus single-job execution to quantify accuracy-cost trade-offs for VQE on H2.
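One of the knobs in this workflow, shot count, has a textbook effect that can be sketched without hardware: the statistical error of a sampled expectation value shrinks roughly as 1/√shots. The snippet below is a minimal illustration under stated assumptions, not the authors' workflow. The one-qubit Hamiltonian coefficients `A`, `B`, `C`, the single-parameter `Ry` ansatz, and all function names are hypothetical, and only sampling noise is modeled (no gate or readout errors).

```python
import numpy as np

# Hypothetical one-qubit Hamiltonian H = A*I + B*Z + C*X (illustrative values,
# not taken from the paper).
A, B, C = -1.0411, -0.7959, 0.1809


def exact_energy(theta):
    """Noise-free energy of Ry(theta)|0>, where <Z> = cos(theta), <X> = sin(theta)."""
    return A + B * np.cos(theta) + C * np.sin(theta)


def optimal_theta():
    """Angle that minimizes exact_energy; the minimum equals A - sqrt(B^2 + C^2)."""
    return np.arctan2(-C, -B)


def sampled_energy(theta, shots, rng):
    """Energy estimate when Z and X are each measured with `shots` samples."""
    p_z = (1.0 + np.cos(theta)) / 2.0  # probability of outcome +1 in the Z basis
    p_x = (1.0 + np.sin(theta)) / 2.0  # probability of outcome +1 in the X basis
    z_hat = 2.0 * rng.binomial(shots, p_z) / shots - 1.0
    x_hat = 2.0 * rng.binomial(shots, p_x) / shots - 1.0
    return A + B * z_hat + C * x_hat


def standard_error(theta, shots):
    """Analytic one-sigma sampling error of sampled_energy."""
    var_z = 1.0 - np.cos(theta) ** 2
    var_x = 1.0 - np.sin(theta) ** 2
    return np.sqrt((B**2 * var_z + C**2 * var_x) / shots)


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    theta = optimal_theta()
    for shots in (256, 1024, 4096):
        err = abs(sampled_energy(theta, shots, rng) - exact_energy(theta))
        print(shots, f"observed={err:.5f}", f"expected~{standard_error(theta, shots):.5f}")
```

Under these assumptions, moving from 256 to 4096 shots cuts the standard error by a factor of four while multiplying the sampling work by sixteen, which is the kind of accuracy-cost trade-off the paper measures on real backends.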

If this is right

  • Tapered mappings should be applied early in similar VQE workflows to obtain accuracy improvements without increasing shot counts.
  • Resilience level 1 can be chosen when accuracy is prioritized over cost, but the premium must be budgeted in advance.
  • Single-job execution remains sufficient for accuracy in this H2 case, making it preferable when minimizing billed time.
  • The released dataset lets practitioners estimate expected energy errors and run costs before launching their own calculations.
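The first bullet rests on the idea behind qubit tapering: when a Hamiltonian commutes with a Pauli symmetry, one can fix the symmetry sector and drop a qubit without changing the ground-state energy. A minimal NumPy sketch follows. The two-qubit H₂ coefficients are illustrative values of the kind seen in introductory VQE tutorials and are an assumption here, not data from the paper. The explicit sector projection is a stand-in for the Clifford-based tapering routine a library would apply, and all names are hypothetical.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
PAULI = {"I": I2, "X": X, "Z": Z}

# Assumed two-qubit H2 Hamiltonian in Hartree (illustrative, not from the paper).
COEFFS = {
    "II": -1.052373245772859,
    "IZ": 0.39793742484318045,
    "ZI": -0.39793742484318045,
    "ZZ": -0.01128010425623538,
    "XX": 0.18093119978423156,
}


def build_hamiltonian(coeffs):
    """Dense 4x4 matrix of a sum of two-qubit Pauli strings."""
    return sum(c * np.kron(PAULI[label[0]], PAULI[label[1]]) for label, c in coeffs.items())


def taper_by_sector(h, symmetry, eigenvalue):
    """Restrict h to the eigenspace of `symmetry` with the given eigenvalue."""
    vals, vecs = np.linalg.eigh(symmetry)
    basis = vecs[:, np.isclose(vals, eigenvalue)]
    return basis.T @ h @ basis


H_FULL = build_hamiltonian(COEFFS)
ZZ = np.kron(Z, Z)

# Tapering is only valid if the symmetry commutes with the Hamiltonian.
commutes = np.allclose(H_FULL @ ZZ, ZZ @ H_FULL)

# For these coefficients the ground state lies in the ZZ = -1 sector.
H_TAPERED = taper_by_sector(H_FULL, ZZ, -1.0)

e_full = np.linalg.eigvalsh(H_FULL)[0]
e_tapered = np.linalg.eigvalsh(H_TAPERED)[0]

if __name__ == "__main__":
    print("symmetry commutes:", commutes)
    print("two-qubit ground energy:", e_full)
    print("one-qubit ground energy:", e_tapered)
```

The tapered operator acts on one qubit instead of two while reproducing the same ground-state energy, so an ansatz for it needs fewer gates. Whether the paper's tapered mapper reduces H₂ in exactly this way is not stated in the material reviewed here.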

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending the workflow to slightly larger molecules would test whether the accuracy advantage of tapered mappings persists as circuit depth increases.
  • The lack of accuracy benefit from sessions suggests that hardware drift over the course of a run may not be the dominant error source in these VQE calculations.
  • Future reductions in hardware noise could shrink the cost gap for resilience level 1 and change the observed trade-off.

Load-bearing premise

That the standardized workflow and selected hardware configurations are representative of typical VQE usage in quantum chemistry, and that the observed variability generalizes beyond the tested backends and time periods.

What would settle it

Repeating the full set of VQE runs for H2 on an additional IBM backend or with a different small molecule such as LiH and checking whether tapered mappings still produce the largest accuracy gains would test the central claim.

Figures

Figures reproduced from arXiv: 2604.11478 by Jeanette Lorenz, Julen Larrucea, Marita Oliv.

Figure 1: Convergence of COBYLA and SPSA energies on Kingsto…
Figure 2: Cross-backend comparison at PT, COBYLA, 1024 shot…
Figure 3: Distribution of per-iteration quantum execution…
Figure 4: Backend comparison at fixed optimization depth (it…
Figure 5: Energy error and billed time for different numbers o…
Figure 6: Cell-wise early-convergence distributions (…
Figure 7: Mapper comparison on ibm_aachen (left) and ibm_brussels (right) at COBYLA, 1024 shots, resilience=0, session mode. Eerr = |E − Eexact| (reported as Eerr × 10 in a.u.); btime denotes billed time; bars span minimum to maximum value.
Figure 8: Energy error and billed time vs. number of shots for…
Figure 9: Energy-improvement for resilience level 1 (left)…
Figure 10: Resilience impact on btime at PT, COBYLA, 1024 shots, session mode. Eerr = |E − Eexact| (reported as Eerr × 10 in a.u.); btime denotes billed time and qtime denotes quantum execution time.
Figure 11: Session vs single-job comparison of Eerr across backends for workflow with 1024 shots and PT mapper without error mitigation. Eerr = |E − Eexact| (reported as Eerr × 10 in a.u.).
Figure 12: Session vs single-job timing comparison across b…
Figure 13: Shot sweep on ibm_brussels for PT, COBYLA, resilience=0, session mode. Eerr = |E − Eexact| (reported as Eerr × 10 in a.u.); btime denotes billed time and the vertical bars spread minimum to maximum value.
Figure 14: Shot sweep on ibm_kingston for PT, COBYLA, resilience=0, session mode. Eerr = |E − Eexact| (reported as Eerr × 10 in a.u.); btime denotes billed time and the vertical bars spread minimum to maximum value.
Figure 15: Cross-backend comparison at PT, COBYLA, 1024 sho…
Figure 16: Compact multi-backend shot-sweep view with per-…
Figure 17: Quantum time as function of number of shots aggreg…
Original abstract

We present a hardware-validated reference dataset for variational ground-state energy calculations of the hydrogen molecule H₂ on several IBM Quantum processors available in 2026. Using a standardized workflow, we benchmark the impact of shot count, backend choice, optimization strategy, and runtime variability on the achievable energy accuracy relative to exact diagonalization. The resulting dataset and analysis provide a transparent baseline for assessing the current capabilities and limitations of IBM Quantum hardware for quantum-chemistry applications, and are meant to ease the entry for new users by providing a comprehensive overview of choices and their effects as well as runtime efforts and costs that can be expected. Across the configurations studied here, circuit simplification through tapered mappings provides the most consistent accuracy gains, resilience level 1 improves accuracy at a substantial cost premium, and session-based execution yields no systematic accuracy advantage over single-job execution despite markedly higher billed time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript presents a hardware-validated reference dataset for variational quantum eigensolver (VQE) ground-state energy calculations of the H₂ molecule on several 2026 IBM Quantum processors. Using a standardized workflow, it benchmarks the effects of shot count, backend choice, optimization strategy, and runtime variability (including single-job vs. session execution) on energy accuracy relative to exact diagonalization. The central descriptive findings are that tapered mappings yield the most consistent accuracy gains, resilience level 1 improves accuracy at a substantial cost premium, and session-based execution provides no systematic accuracy advantage despite higher billed time. The work positions the dataset and analysis as a transparent baseline to assist new users with accuracy-cost trade-offs in quantum chemistry applications.

Significance. If the empirical observations hold under the stated scope, the paper supplies a useful, reproducible reference point for current IBM Quantum hardware performance in small-molecule VQE calculations. The explicit scoping of claims to the tested configurations, the standardized workflow, and the focus on both accuracy and billed-time costs are strengths that could ease entry for practitioners and support future benchmarking studies. No parameter-free derivations or machine-checked proofs are present, but the direct hardware measurements against exact diagonalization provide falsifiable, configuration-specific data.

major comments (2)
  1. [§4] §4 (Results) and associated figures/tables: The headline claim of 'no systematic accuracy advantage' for session-based execution over single-job execution is presented without statistical tests, confidence intervals, or quantitative error-bar analysis to support the absence of a difference, despite the manuscript noting runtime variability; visual or qualitative comparison alone is insufficient to substantiate this load-bearing observation.
  2. [Methods] Methods section: The standardized workflow is outlined at a high level, but the manuscript does not include raw data tables, complete error-bar details, the full list of tested configurations, or explicit criteria used to avoid post-hoc selection; these omissions prevent independent verification of the three headline claims on accuracy gains from tapered mappings, resilience level 1, and session execution.
minor comments (3)
  1. [Abstract] Abstract: The number of backends, total configurations, and specific shot counts or resilience levels tested could be stated explicitly to give readers immediate context for the scope of the findings.
  2. [Figures] Figure captions and axis labels: Ensure all panels clearly indicate the metric (e.g., energy error in Hartree), error representation (standard deviation, etc.), and backend identifiers so that the accuracy-cost plots are self-contained.
  3. [References] References: Include the exact IBM backend names and calibration dates used in 2026 to allow precise reproduction of the hardware conditions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive review of our manuscript. We address each major comment point by point below, indicating the revisions we will make to improve clarity, reproducibility, and statistical rigor.

Point-by-point responses
  1. Referee: [§4] §4 (Results) and associated figures/tables: The headline claim of 'no systematic accuracy advantage' for session-based execution over single-job execution is presented without statistical tests, confidence intervals, or quantitative error-bar analysis to support the absence of a difference, despite the manuscript noting runtime variability; visual or qualitative comparison alone is insufficient to substantiate this load-bearing observation.

    Authors: We agree that formal statistical support would strengthen the claim. In the revised manuscript we will add quantitative error bars (derived from the observed runtime variability) to the relevant figures in §4. We will also include a statistical comparison between single-job and session results, using a paired test appropriate to the data distribution (e.g., Wilcoxon signed-rank test) together with 95 % confidence intervals on the mean energy difference. The text will explicitly discuss the limitations imposed by runtime variability and state that the conclusion of “no systematic advantage” is now supported by both visual inspection and the statistical analysis. revision: yes

  2. Referee: [Methods] Methods section: The standardized workflow is outlined at a high level, but the manuscript does not include raw data tables, complete error-bar details, the full list of tested configurations, or explicit criteria used to avoid post-hoc selection; these omissions prevent independent verification of the three headline claims on accuracy gains from tapered mappings, resilience level 1, and session execution.

    Authors: We accept that additional methodological detail is required for independent verification. We will expand the Methods section with a more granular description of the workflow. In addition, we will deposit a Supplementary Information file containing (i) raw data tables for every configuration, (ii) the exact formulas and values used for all error bars, (iii) the complete enumerated list of tested configurations together with all circuit and runtime parameters, and (iv) the explicit, a-priori selection criteria employed to avoid post-hoc bias. Cross-references to this supplementary material will be inserted in both the Methods and Results sections so that readers can directly verify the three headline claims. revision: yes
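The statistical comparison promised in the first response can be sketched in a few lines. The snippet below is an illustration under stated assumptions, not the authors' code: it uses a paired sign-flip permutation test as a dependency-light stand-in for the Wilcoxon signed-rank test named in the rebuttal, a percentile bootstrap for the 95% confidence interval, and hypothetical function names. Inputs are paired energy errors, one pair per matched configuration.

```python
import numpy as np


def paired_mean_ci(session_err, single_err, n_boot=10_000, level=0.95, seed=0):
    """Mean paired difference (session - single) with a percentile-bootstrap CI."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(session_err, dtype=float) - np.asarray(single_err, dtype=float)
    # Resample the paired differences with replacement.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [(1.0 - level) / 2.0, (1.0 + level) / 2.0])
    return float(diff.mean()), float(lo), float(hi)


def sign_flip_pvalue(session_err, single_err, n_perm=10_000, seed=0):
    """Two-sided paired permutation test that randomly flips each difference's sign."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(session_err, dtype=float) - np.asarray(single_err, dtype=float)
    observed = abs(diff.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    permuted = np.abs((signs * diff).mean(axis=1))
    # Small tolerance guards against rounding; add-one keeps the p-value positive.
    count = int(np.sum(permuted >= observed - 1e-12))
    return (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Placeholder numbers for demonstration only; they are not measurements.
    session = [0.010, 0.012, 0.009, 0.011, 0.013, 0.008]
    single = [0.011, 0.010, 0.010, 0.012, 0.012, 0.009]
    print(paired_mean_ci(session, single))
    print(sign_flip_pvalue(session, single))
```

A confidence interval that straddles zero together with a large p-value would be consistent with "no systematic advantage", though with only a few paired runs such a test has limited power to detect small differences.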

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper is a purely empirical benchmarking study that reports hardware measurements of VQE energies for H2 against exact diagonalization. All central claims (accuracy gains from tapered mappings, cost trade-offs for resilience levels, and lack of session advantage) are direct observational comparisons from a standardized experimental workflow on specific IBM backends. No equations, derivations, fitted parameters, or self-citations are invoked as load-bearing steps that reduce any result to its own inputs by construction. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is an empirical benchmarking study and introduces no new theoretical entities or derivations. It relies on standard quantum-chemistry assumptions.

axioms (1)
  • domain assumption Exact diagonalization yields the true ground-state energy of H2
    Invoked implicitly when accuracy is measured relative to exact diagonalization; standard in quantum-chemistry benchmarks.
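To make the role of this assumption concrete, the reference energy and the error metric Eerr = |E − Eexact| from the figure captions can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the Pauli-string Hamiltonian for H₂ uses assumed tutorial-style coefficients, and the helper names are hypothetical.

```python
from functools import reduce

import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_matrix(label):
    """Dense matrix of a Pauli string such as 'XZ' (leftmost factor first)."""
    return reduce(np.kron, (PAULI[ch] for ch in label))


def exact_ground_energy(terms):
    """Lowest eigenvalue of sum_k c_k P_k, obtained by dense diagonalization."""
    h = sum(c * pauli_matrix(label) for label, c in terms.items())
    return float(np.linalg.eigvalsh(h)[0])


def energy_error(measured, exact):
    """Eerr = |E - Eexact|."""
    return abs(measured - exact)


# Assumed two-qubit H2 Hamiltonian in Hartree (illustrative, not from the paper).
H2_TERMS = {
    "II": -1.052373245772859,
    "IZ": 0.39793742484318045,
    "ZI": -0.39793742484318045,
    "ZZ": -0.01128010425623538,
    "XX": 0.18093119978423156,
}

E_EXACT = exact_ground_energy(H2_TERMS)

if __name__ == "__main__":
    print("Eexact =", E_EXACT)
    print("Eerr for a hypothetical measured value:", energy_error(-1.85, E_EXACT))
```

Note that "exact" here means exact for the qubit Hamiltonian being diagonalized, which itself depends on the chosen geometry and basis set. The reference is therefore exact within the model rather than relative to experiment.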

pith-pipeline@v0.9.0 · 5455 in / 1333 out tokens · 83485 ms · 2026-05-10T16:17:00.911862+00:00 · methodology

