pith.

arxiv: 2607.00516 · v1 · pith:GBDJHCNOnew · submitted 2026-07-01 · 💻 cs.SE

Auditing Empirical Comparisons in Quantum Software

Pith reviewed 2026-07-02 09:01 UTC · model grok-4.3

classification 💻 cs.SE
keywords: quantum software · empirical comparisons · auditing framework · materialization gap · reproducibility · benchmarking · CLAIMSTAB-QC · evidence classification

The pith

Only 8 of 455 reported quantum-software comparisons expose enough evidence for locked audit without proxy reconstruction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CLAIMSTAB-QC, a source-bounded auditing framework that records baselines, metrics, relations, and admissible evidence for a reported comparison, then locks the design before checking outcomes. When applied to 455 comparative claims drawn from 119 quantum-software papers, the framework shows a steep materialization gap: 175 claims can be represented for planning, 79 become scalar-directional records, 53 produce lockable designs, and just 8 supply matched evidence sufficient to audit the original claim. Of those 8, the outcomes split into 2 Sustained, 4 Unresolved, and 2 Reversed. Controlled diagnostics on 24 benchmark-relevant comparisons indicate that simpler checks often preserve directions whose support weakens once the audit scope is locked.
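The funnel reads more easily as stage-to-stage retention. The short Python sketch below does nothing but arithmetic on the counts quoted above; the stage labels are paraphrases, not the paper's terminology.

```python
# Stage counts as reported for the 455-claim corpus (labels paraphrased).
FUNNEL = [
    ("extracted comparative claims", 455),
    ("representable for audit planning", 175),
    ("scalar-directional planning records", 79),
    ("lockable audit or diagnostic designs", 53),
    ("proxy-free matched evidence", 8),
]


def funnel_rates(stages):
    """Return (label, count, share of previous stage, share of first stage)."""
    total = stages[0][1]
    prev = total
    rows = []
    for label, count in stages:
        rows.append((label, count, count / prev, count / total))
        prev = count
    return rows


for label, count, step, overall in funnel_rates(FUNNEL):
    print(f"{label:38s} {count:4d}  step {step:6.1%}  overall {overall:6.1%}")
```

On these counts about 38% of extracted claims are representable, and the last step is the steepest: only 8 of 53 lockable designs (about 15%) expose proxy-free matched evidence, roughly 1.8% of the 455 claims.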

Core claim

CLAIMSTAB-QC classifies strict scalar-directional comparisons as Sustained, Unresolved, or Reversed inside a locked audit scope. Evaluation on 455 claims yields a materialization gap in which only 8 records expose matched evidence without proxy reconstruction, producing 2 Sustained, 4 Unresolved, and 2 Reversed outcomes; diagnostics show simpler checks can retain apparent directions that locked designs weaken.
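The summary does not spell out the decision rule behind the three labels. One plausible shape, offered as an assumption rather than the paper's method, is to score each locked cell as a win or loss for the baseline the source paper reported as better, then compare a Wilson 95% interval for the win rate (the interval family Figure 3 uses) against a fixed threshold. The function names and the 0.5 threshold are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_interval(wins: int, n: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion wins / n."""
    if n <= 0:
        raise ValueError("need at least one locked cell")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half


def classify_direction(wins: int, n: int, threshold: float = 0.5) -> str:
    """Hypothetical rule: compare the interval for the claimed-better
    baseline's win rate across locked cells with a fixed threshold."""
    lo, hi = wilson_interval(wins, n)
    if lo > threshold:
        return "Sustained"   # interval lies entirely above the threshold
    if hi < threshold:
        return "Reversed"    # interval lies entirely below the threshold
    return "Unresolved"      # interval straddles the threshold
```

With 20 locked cells, for instance, 20 wins gives Sustained, 0 wins gives Reversed, and 12 wins (interval roughly 0.39 to 0.78) stays Unresolved.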

What carries the argument

CLAIMSTAB-QC, a source-bounded framework that records baselines, metric, relation, and admissible evidence, locks the comparison design, and reports a scoped relation outcome or an explicit evidence boundary.
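One way to make recording and locking concrete is an immutable record plus a content digest taken before any outcome is computed, in the spirit of preregistration. The sketch below is hypothetical: the `ClaimCard` fields and the `lock`/`verify` helpers are illustrative names, not the paper's schema, and SHA-256 over canonical JSON is just one reasonable locking mechanism.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ClaimCard:
    """Hypothetical fixed representation of one reported comparison."""
    source: str                           # where the claim appears
    baseline_a: str                       # tool reported as better
    baseline_b: str                       # comparator
    metric: str                           # e.g. "two-qubit gate count"
    relation: str                         # e.g. "A < B" (lower is better)
    scope: tuple[str, ...]                # benchmarks/backends covered
    admissible_evidence: tuple[str, ...]  # evidence kinds fixed in advance


def lock(card: ClaimCard) -> str:
    """SHA-256 digest of the card's canonical JSON encoding."""
    payload = json.dumps(asdict(card), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify(card: ClaimCard, digest: str) -> bool:
    """True only if the card still matches the digest recorded at lock time."""
    return lock(card) == digest


card = ClaimCard(
    source="hypothetical paper, Table 3",
    baseline_a="compiler A",
    baseline_b="compiler B",
    metric="two-qubit gate count",
    relation="A < B",
    scope=("benchmark set S",),
    admissible_evidence=("per-circuit counts",),
)
digest = lock(card)
print(verify(card, digest))  # True
widened = replace(card, scope=("benchmark set S", "benchmark set T"))
print(verify(widened, digest))  # False
```

Storing the digest with the audit plan means that any later change to the design, such as widening the scope once results are known, yields a different digest and fails `verify`.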

If this is right

  • Most reported performance edges between compilers, optimizers, backends, or ansatzes cannot be verified from the evidence the papers expose.
  • Published comparisons that appear directional under informal checks frequently become Unresolved or Reversed once the audit scope is locked.
  • Benchmark-relevant comparisons require explicit recording of admissible evidence and locked designs before outcomes are computed.
  • Simpler post-hoc checks tend to preserve directions whose support weakens under the stricter locked-audit procedure.
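A toy example of the pattern in the last bullet: a pooled check can keep a direction that a cell-by-cell view of pre-declared cells does not support. The numbers are synthetic, and both checks are simplified stand-ins chosen for illustration, not the paper's diagnostics.

```python
from statistics import mean

# Synthetic, illustrative values only (lower is better, e.g. gate count).
# Each key is one pre-declared locked cell: (tool A, tool B).
cells = {
    "bench1/seed0": (100, 400),
    "bench2/seed0": (52, 50),
    "bench3/seed0": (61, 60),
    "bench4/seed0": (45, 44),
}


def pooled_check(cells) -> bool:
    """Simple check: does tool A have the lower mean across all cells?"""
    a = mean(v[0] for v in cells.values())
    b = mean(v[1] for v in cells.values())
    return a < b


def cell_wins(cells) -> tuple[int, int]:
    """Per-cell view: in how many pre-declared cells does A beat B?"""
    wins = sum(1 for a, b in cells.values() if a < b)
    return wins, len(cells)


print(pooled_check(cells))  # True: pooled means favor A (64.5 vs 138.5)
print(cell_wins(cells))     # (1, 4): A wins only one locked cell
```

One outlying cell drives the pooled mean, so the simple check favors A even though A loses three of the four locked cells.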

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Journals and conferences could require authors to supply a locked-audit record alongside each comparative claim.
  • The same framework could be applied to other empirical domains where tool comparisons depend on benchmark scope and noise assumptions.
  • Reproducibility efforts would gain from treating the comparison design itself as an auditable artifact rather than only the code or data.
  • Extending CLAIMSTAB-QC to multi-metric or non-directional relations would cover a larger fraction of the 455 claims.

Load-bearing premise

That the 455 extracted comparative claims are representative of empirical comparisons in the quantum-software literature, and that CLAIMSTAB-QC's evidence-classification rules can be applied consistently from the information stated in the source papers.

What would settle it

Re-running the audit on the same 119 papers after authors supply the missing matched evidence for the 45 claims that reached lockable designs but lacked full evidence, and counting how many of the original directions remain Sustained.

Figures

Figures reproduced from arXiv: 2607.00516 by Arif Ali Khan, Boshuai Ye, Maryam Tavassoli Sabzevari, Peng Liang.

Figure 1. The CLAIMSTAB-QC workflow. A reported comparison is represented as a claim card, locked before outcome computation, evaluated through comparison records, and reported with an explicit evidence boundary.
Table II. Core CLAIMSTAB-QC concepts. Reported comparison: a source-paper statement comparing two baselines under a metric, scope, and outcome rule. Claim card: fixed representation of a report…
Figure 2. Corpus extraction and materialization funnel. The 53 lockable designs …
Figure 3. Wilson 95% confidence intervals for the eight Tier-1 proxy-free scoped …
Figure 4. EX-C7 locked-cell grid. Each cell aggregates five transpiler seeds and …
Original abstract

Empirical quantum-software papers often report that one compiler, optimizer, backend, or ansatz outperforms another. Such comparisons are not properties of a tool alone: they can change with benchmark scope, circuit construction, compilation, sampling, backend or noise assumptions, optimizer choices, and resource budgets. Existing testing, benchmarking, and reproducibility methods help assess programs, tools, executions, and platforms, but they do not directly audit whether the reported comparison itself is supported by the evidence exposed in the source paper or accompanying materials. We present CLAIMSTAB-QC, a source-bounded framework for auditing empirical comparisons in quantum software. Given a reported comparison, the framework records the baselines, metric, relation, and admissible evidence; locks the comparison design before outcomes are computed; and reports either a scoped relation outcome or an explicit evidence boundary. For strict scalar-directional comparisons, the reported direction is classified as Sustained, Unresolved, or Reversed within the locked audit scope. We evaluate CLAIMSTAB-QC on 455 comparative claims from 119 quantum-software papers. The central finding is a materialization gap: 175 claims can be represented for audit planning, 79 become scalar-directional planning records, 53 yield lockable audit or diagnostic designs, and only 8 expose enough matched evidence to audit the original comparison without proxy reconstruction. These 8 records yield 2 Sustained, 4 Unresolved, and 2 Reversed outcomes. Controlled diagnostics over 24 benchmark-relevant comparisons further show that simpler checks can preserve apparent directions whose support weakens under locked audit designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this section is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces CLAIMSTAB-QC, a source-bounded framework that records baselines, metrics, relations, and admissible evidence for empirical comparisons in quantum software, locks the audit design, and classifies strict scalar-directional outcomes as Sustained, Unresolved, or Reversed. Applied to 455 comparative claims extracted from 119 papers, it reports a materialization gap: 175 claims representable for planning, 79 scalar-directional, 53 lockable designs, and only 8 fully auditable without proxies, yielding 2 Sustained, 4 Unresolved, and 2 Reversed. Controlled diagnostics on 24 comparisons illustrate that simpler checks can preserve directions that weaken under locked audits.

Significance. If the sampled corpus is representative, the materialization gap would demonstrate that most reported comparisons in quantum software lack sufficient exposed evidence for direct verification, with implications for reproducibility and benchmarking practices in the field. The framework itself is a constructive contribution that separates planning from outcome computation and applies to external papers without circularity or self-referential parameters.

major comments (2)
  1. [Abstract and evaluation section] The selection of the 119 papers and extraction of the 455 claims is presented without any search strategy, inclusion/exclusion criteria, date bounds, database, or sampling justification (Abstract and the evaluation that produces the headline counts 175/79/53/8). This is load-bearing for the central claim of a literature-wide materialization gap, as the steep drop-off could be an artifact of an arbitrary or convenience corpus rather than a representative sample.
  2. [Abstract and framework application] The manuscript provides no information on claim selection criteria, inter-rater reliability, or how CLAIMSTAB-QC's evidence classification rules handle ambiguous cases when reducing 455 claims to the reported counts (Abstract). Without these details the precise materialization numbers cannot be independently verified or reproduced.
minor comments (1)
  1. [Abstract] The abstract uses the symbol 'o' in the chain 455 claims o 175 representable; this should be replaced by an explicit arrow or '→' for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify key gaps in methodological transparency. We address each major comment below and will revise the manuscript to incorporate the requested details.

Point-by-point responses
  1. Referee: [Abstract and evaluation section] The selection of the 119 papers and extraction of the 455 claims is presented without any search strategy, inclusion/exclusion criteria, date bounds, database, or sampling justification (Abstract and the evaluation that produces the headline counts 175/79/53/8). This is load-bearing for the central claim of a literature-wide materialization gap, as the steep drop-off could be an artifact of an arbitrary or convenience corpus rather than a representative sample.

    Authors: We agree that the absence of explicit corpus-construction details weakens the support for a literature-wide claim. The current manuscript describes the counts but does not document how the 119 papers were identified. In revision we will add a new subsection (likely 4.1) that reports: the database(s) queried (arXiv), the date range, the keyword combinations used to locate quantum-software papers containing empirical comparisons, the inclusion criteria applied to retain only papers with at least one explicit baseline-metric-relation statement, and any exclusion rules (e.g., purely theoretical or simulation-only works). We will also state that the sample is a convenience corpus of recent, publicly available papers rather than a probabilistically representative draw, and we will qualify the materialization-gap finding accordingly while retaining the illustrative value of the 8 fully auditable cases. revision: yes

  2. Referee: [Abstract and framework application] The manuscript provides no information on claim selection criteria, inter-rater reliability, or how CLAIMSTAB-QC's evidence classification rules handle ambiguous cases when reducing 455 claims to the reported counts (Abstract). Without these details the precise materialization numbers cannot be independently verified or reproduced.

    Authors: We concur that reproducibility of the headline counts requires documentation of the claim-extraction and classification process. The manuscript currently reports only the final tallies. In the revised version we will expand Section 4 to include: (i) the operational definition used to identify a “comparative claim” (explicit mention of two or more baselines, a scalar or directional metric, and a stated relation), (ii) whether extraction was performed by a single rater or multiple raters and, if the latter, any inter-rater agreement statistic, and (iii) concrete examples of ambiguous cases together with the exact rule from CLAIMSTAB-QC that resolved them (e.g., “when the paper states a direction but omits variance, the claim is classified as scalar-directional but not lockable”). These additions will allow an independent team to replicate the reduction from 455 to 8. revision: yes
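The rebuttal leaves the agreement statistic open. If two raters label the same claims (for example, lockable or not), Cohen's kappa is one common choice; the sketch below implements it, and the ten labels in the usage lines are invented for illustration only.

```python
from collections import Counter


def cohens_kappa(rater1: list[str], rater2: list[str]) -> float:
    """Cohen's kappa for two raters who labelled the same items in the same order."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("raters must label the same non-empty list of items")
    n = len(rater1)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[lab] * c2[lab] for lab in c1.keys() | c2.keys()) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


# Invented labels for ten claims ("lockable" vs "not"), for illustration only.
r1 = ["lockable", "lockable", "not", "not", "lockable", "not", "lockable", "not", "not", "not"]
r2 = ["lockable", "not", "not", "not", "lockable", "not", "lockable", "not", "lockable", "not"]
print(round(cohens_kappa(r1, r2), 3))  # 0.583
```

Kappa corrects raw agreement (8 of 10 here) for the agreement expected by chance from each rater's label frequencies (0.52 with these marginals).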

Circularity Check

0 steps flagged

No circularity: framework application to external corpus yields independent counts

Full rationale

The paper defines CLAIMSTAB-QC as a source-bounded auditing procedure and applies its classification rules (representable claims, scalar-directional records, lockable designs, matched evidence) directly to 455 claims extracted from 119 external quantum-software papers. The resulting materialization gap (175→79→53→8) is produced by those rule applications on outside data; no equations, fitted parameters, or self-citation chains reduce the reported outcomes to quantities defined inside the present work. The evaluation therefore does not reduce to its own definitions: its counts come from applying fixed rules to external material.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on the introduction of CLAIMSTAB-QC and the representativeness of the sampled claims; no free parameters, standard axioms, or independent evidence for the framework are supplied in the abstract.

invented entities (1)
  • CLAIMSTAB-QC (no independent evidence)
    purpose: Source-bounded framework for auditing empirical comparisons
    Newly defined in the paper to perform the audits described.

pith-pipeline@v0.9.1-grok · 5818 in / 1200 out tokens · 41497 ms · 2026-07-02T09:01:42.085609+00:00 · methodology

