QMutBench: A Dataset of Quantum Circuit Mutants
Pith reviewed 2026-05-10 08:46 UTC · model grok-4.3
The pith
QMutBench supplies over 700,000 quantum circuit mutants as standardized benchmarks for evaluating quantum software testing techniques.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
QMutBench is a dataset containing over 700,000 quantum circuit mutants that represent different faults; it is accessible through an online interface that supports selection by original circuit, desired survival rate, and mutation characteristics such as faulty gate type.
What carries the argument
The online interface and filtering criteria that let users retrieve subsets of mutants to serve as fault benchmarks.
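The selection criteria described here (source circuit, survival rate, mutation characteristics) amount to predicate filtering over mutant records. A minimal sketch in plain Python with a hypothetical record schema; the field names and function are illustrative, not the actual QMutBench interface:

```python
def select_mutants(records, circuit=None, max_survival=None, gate=None):
    """Filter mutant records by criteria like those the interface exposes.

    Each record is a dict; the field names here are assumptions made
    for illustration, not QMutBench's real schema.
    """
    out = []
    for r in records:
        if circuit is not None and r["circuit"] != circuit:
            continue
        if max_survival is not None and r["survival_rate"] > max_survival:
            continue
        if gate is not None and r["gate"] != gate:
            continue
        out.append(r)
    return out

records = [
    {"id": "m1", "circuit": "bell", "survival_rate": 0.10, "gate": "h"},
    {"id": "m2", "circuit": "bell", "survival_rate": 0.85, "gate": "cx"},
    {"id": "m3", "circuit": "grover", "survival_rate": 0.40, "gate": "x"},
]
# Mutants from the Bell circuit that are relatively easy to kill.
subset = select_mutants(records, circuit="bell", max_survival=0.5)
```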
If this is right
- Developers can now measure test-suite quality by counting how many mutants each suite detects.
- Different testing techniques become directly comparable on identical mutant collections.
- Researchers can create new testing methods guided by the mutation operators already present in the dataset.
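The first point reduces to a mutation score: the fraction of mutants a test suite detects (kills). A minimal sketch, assuming a hypothetical mutant structure rather than QMutBench's actual format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mutant:
    # Illustrative fields only; not QMutBench's real schema.
    mutant_id: str
    source_circuit: str
    operator: str  # e.g. "gate_replacement"

def mutation_score(mutants, killed_ids):
    """Fraction of mutants detected (killed) by a test suite."""
    if not mutants:
        return 0.0
    kills = sum(1 for m in mutants if m.mutant_id in killed_ids)
    return kills / len(mutants)

mutants = [
    Mutant("m1", "bell", "gate_replacement"),
    Mutant("m2", "bell", "gate_deletion"),
    Mutant("m3", "grover", "gate_insertion"),
]
# A suite that detects m1 and m3 but lets m2 survive.
score = mutation_score(mutants, killed_ids={"m1", "m3"})
```

Two suites run against the same mutant subset then become directly comparable by their scores, which is the comparison workflow the dataset is meant to enable.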
Where Pith is reading between the lines
- Widespread adoption could create a de-facto standard for reporting test effectiveness in quantum software papers.
- If the mutants prove unrepresentative of hardware noise, the dataset may need later calibration against real device error models.
- The same generation and hosting approach could be reused for other quantum programming languages or circuit representations.
Load-bearing premise
The generated mutants represent faults that are both representative of real quantum hardware errors and useful for distinguishing effective test suites from ineffective ones.
What would settle it
Apply several published quantum testing techniques to the same mutant subsets and measure whether the fraction of mutants killed consistently ranks the techniques in the same order as independent real-hardware fault-injection experiments.
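The proposed validation is a rank-agreement check: do mutation scores order testing techniques the same way independent hardware fault-injection results do? A sketch using a simple pairwise concordance count (Kendall-style), with made-up scores for hypothetical techniques:

```python
from itertools import combinations

def rank_agreement(scores_a, scores_b):
    """Fraction of technique pairs ordered the same way by both metrics.

    scores_a and scores_b map technique name -> score (higher = better).
    1.0 means the two metrics induce identical rankings.
    """
    pairs = list(combinations(scores_a, 2))
    concordant = sum(
        1 for x, y in pairs
        if (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y]) > 0
    )
    return concordant / len(pairs)

# Fictional numbers for illustration: fraction of mutants killed vs.
# fraction of injected hardware faults detected, per technique.
mutation = {"T1": 0.82, "T2": 0.55, "T3": 0.91}
hardware = {"T1": 0.78, "T2": 0.60, "T3": 0.88}
agreement = rank_agreement(mutation, hardware)
```

High agreement across many circuits would support the load-bearing premise; systematic disagreement would suggest the mutants need calibration against device error models.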
Original abstract
Quantum software testing has attracted interest in recent years, prompting the development of various techniques to automate the testing of quantum software. These techniques generate test cases that must be assessed for their effectiveness in detecting faults. Such an assessment requires benchmarks of faulty programs. However, there is a lack of benchmarks containing faults. In this data showcase, we propose QMutBench, a dataset that contains over 700,000 quantum circuit mutants representing different faults. The dataset is accessible via an online interface with selection criteria, such as the original quantum circuit(s) from which mutants are generated, the desired survival rate of the selected mutants, and other mutation characteristics (e.g., the type of faulty quantum gate). QMutBench provides quantum software developers and testers with an accessible online dataset to obtain benchmarks of mutants necessary to assess either the quality of the test cases generated by their testing technique or to compare different testing techniques. It also enables the development of new mutation-guided quantum software testing techniques.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents QMutBench, a dataset of over 700,000 quantum circuit mutants generated from original circuits to represent faults in quantum software. It describes an online interface allowing selection of mutants by criteria including the source circuit, survival rate, and mutation characteristics such as faulty gate type. The authors position the resource as a benchmark to evaluate the fault-detection effectiveness of test suites produced by quantum testing techniques, to compare different techniques, and to support development of mutation-guided testing methods.
Significance. If the mutants are shown to be representative of realistic faults and capable of distinguishing effective from ineffective test suites, the dataset would address a clear gap in quantum software testing benchmarks and enable reproducible empirical evaluations. The provision of an online selection interface is a practical strength that supports usability for the community.
Major comments (3)
- [Abstract and §3 (dataset generation)] Abstract and dataset construction section: the central utility claim—that the mutants serve as benchmarks to assess or compare test-suite quality—requires evidence that some mutants are killed by certain test suites but not others. No mutation-score experiments, survival-rate analysis, or comparison of detection rates across techniques are reported, leaving the discriminative power unverified.
- [Abstract and §4] Abstract and §4 (validation or realism): the mutants are asserted to represent 'different faults,' yet no comparison is provided against real quantum hardware error models (e.g., depolarizing noise, T1/T2 relaxation, or gate-error distributions from IBM or Rigetti devices). Without such grounding, it is unclear whether the >700k mutants correspond to faults that occur in practice.
- [§2] §2 (mutation operators): the specific operators used to generate mutants from the original circuits are not enumerated or formally defined. This omission prevents assessment of whether the mutation set is comprehensive, non-redundant, or aligned with known quantum fault models.
Minor comments (2)
- [§5] The online interface description would benefit from a screenshot or explicit list of all selectable fields to improve reproducibility for readers who cannot access the site immediately.
- [§3] Clarify the exact number of original circuits used as seeds and the distribution of mutant counts per seed circuit.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of QMutBench's potential utility and for the constructive major comments. We address each point below, indicating revisions where appropriate. As this is a data showcase paper, our focus is on releasing the dataset and interface rather than conducting full-scale empirical evaluations of testing techniques.
Point-by-point responses
-
Referee: Abstract and §3 (dataset generation): the central utility claim—that the mutants serve as benchmarks to assess or compare test-suite quality—requires evidence that some mutants are killed by certain test suites but not others. No mutation-score experiments, survival-rate analysis, or comparison of detection rates across techniques are reported, leaving the discriminative power unverified.
Authors: We agree that the manuscript does not report mutation-score experiments or direct comparisons of test-suite detection rates across techniques. As a data showcase, the paper's contribution is the release of the >700k mutants and the online interface that already supports selection by precomputed survival rate (among other criteria). This allows users to obtain mutant sets with desired killability for their own evaluations. To address the concern, we will add a short subsection in §3 with aggregate statistics on survival-rate distributions across the source circuits and an example of how the interface can be used to select benchmark sets for technique comparison. Full cross-technique experiments remain outside the scope of this data paper. revision: partial
-
Referee: Abstract and §4 (validation or realism): the mutants are asserted to represent 'different faults,' yet no comparison is provided against real quantum hardware error models (e.g., depolarizing noise, T1/T2 relaxation, or gate-error distributions from IBM or Rigetti devices). Without such grounding, it is unclear whether the >700k mutants correspond to faults that occur in practice.
Authors: The mutants are produced by applying syntactic mutation operators to quantum circuits drawn from established benchmarks; they are intended to represent programming-level faults rather than physical noise processes on specific hardware. We will revise the abstract and §4 to clarify this distinction and to note that the dataset does not claim to replicate hardware error distributions. A brief discussion of possible future extensions (e.g., weighting mutants by hardware error rates) will be added. No hardware-specific comparison data was collected for the current release. revision: yes
-
Referee: §2 (mutation operators): the specific operators used to generate mutants from the original circuits are not enumerated or formally defined. This omission prevents assessment of whether the mutation set is comprehensive, non-redundant, or aligned with known quantum fault models.
Authors: We thank the referee for pointing out this omission. Section 2 will be expanded to list and formally define every mutation operator (gate replacement, insertion, deletion, parameter perturbation, etc.), including the precise transformation rules and the source circuits to which they were applied. This addition will enable readers to evaluate coverage and alignment with quantum fault models. revision: yes
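The operator families the rebuttal names (gate replacement, insertion, deletion) can be made concrete on a toy circuit representation. A sketch assuming circuits as ordered lists of (gate, qubits) pairs; this is an illustration of the general technique, not QMutBench's actual generation code:

```python
import copy

# A Bell-state circuit as an ordered gate list -- a toy representation.
bell = [("h", (0,)), ("cx", (0, 1))]

# Illustrative replacement pool of single-qubit gates.
SINGLE_QUBIT_GATES = ["x", "y", "z", "h", "s", "t"]

def gate_replacement_mutants(circuit):
    """One mutant per (position, replacement gate) pair, for
    single-qubit gates only."""
    mutants = []
    for i, (gate, qubits) in enumerate(circuit):
        if len(qubits) != 1:
            continue
        for g in SINGLE_QUBIT_GATES:
            if g == gate:
                continue
            m = copy.deepcopy(circuit)
            m[i] = (g, qubits)
            mutants.append(m)
    return mutants

def gate_deletion_mutants(circuit):
    """One mutant per gate, with that gate removed."""
    return [circuit[:i] + circuit[i + 1:] for i in range(len(circuit))]

# For the Bell circuit: 5 replacements of "h" plus 2 deletions.
mutants = gate_replacement_mutants(bell) + gate_deletion_mutants(bell)
```

Enumerating operators this explicitly, as the authors promise for §2, is what lets readers judge coverage and redundancy of the mutant set.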
Circularity Check
No circularity: dataset release paper with no derivation or fitted results
Full rationale
The paper is a data showcase describing the construction and online release of QMutBench, a collection of >700k mutants generated from quantum circuits via mutation operators. No equations, predictions, first-principles derivations, or parameter-fitting steps are present, so none of the enumerated circularity patterns (self-definitional, fitted-input-as-prediction, self-citation load-bearing, etc.) can apply. The central claim that the dataset enables assessment of test suites rests on an untested assumption about mutant realism, but this is an external-validity issue rather than a logical loop in which any result reduces to its own inputs by construction. The work is therefore self-contained as an artifact contribution.
Reference graph
Works this paper leans on
-
[1]
Quantum software engineering: Roadmap and challenges ahead,
J. M. Murillo, J. Garcia-Alonso, E. Moguel, J. Barzen, F. Leymann, S. Ali, T. Yue, P. Arcaini, R. Pérez-Castillo, I. García-Rodríguez de Guzmán, M. Piattini, A. Ruiz-Cortés, A. Brogi, J. Zhao, A. Miranskyy, and M. Wimmer, "Quantum software engineering: Roadmap and challenges ahead," ACM Trans. Softw. Eng. Methodol., vol. 34, no. 5, May 2025
2025
-
[2]
Testing and debugging quantum programs: The road to 2030,
N. C. Leite Ramalho, H. Amario de Souza, and M. Lordello Chaim, "Testing and debugging quantum programs: The road to 2030," ACM Trans. Softw. Eng. Methodol., vol. 34, no. 5, May 2025
2025
-
[3]
Quantum program testing through commuting pauli strings on IBM's quantum computers,
A. Muqeet, S. Ali, and P. Arcaini, "Quantum program testing through commuting pauli strings on IBM's quantum computers," in Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE '24. New York, NY, USA: Association for Computing Machinery, 2024, pp. 2130–2141
2024
-
[4]
Assessing the effectiveness of input and output coverage criteria for testing quantum programs,
S. Ali, P. Arcaini, X. Wang, and T. Yue, "Assessing the effectiveness of input and output coverage criteria for testing quantum programs," in 2021 IEEE 14th International Conference on Software Testing, Validation and Verification (ICST), 2021, pp. 13–23
2021
-
[5]
Bugs4Q: A benchmark of real bugs for quantum programs,
P. Zhao, J. Zhao, Z. Miao, and S. Lan, "Bugs4Q: A benchmark of real bugs for quantum programs," in 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2021, pp. 1373–1376
2021
-
[6]
QBugs: A collection of reproducible bugs in quantum algorithms and a supporting infrastructure to enable controlled quantum software testing and debugging experiments,
J. Campos and A. Souto, "QBugs: A collection of reproducible bugs in quantum algorithms and a supporting infrastructure to enable controlled quantum software testing and debugging experiments," in 2021 IEEE/ACM 2nd International Workshop on Quantum Software Engineering (Q-SE). Los Alamitos, CA, USA: IEEE Computer Society, June 2021, pp. 28–32
2021
-
[7]
Muskit: A mutation analysis tool for quantum software testing,
E. Mendiluze Usandizaga, S. Ali, P. Arcaini, and T. Yue, "Muskit: A mutation analysis tool for quantum software testing," in Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE '21. IEEE Press, 2022, pp. 1266–1270
2022
-
[8]
QMutPy: A mutation testing tool for quantum algorithms and applications in Qiskit,
D. Fortunato, J. Campos, and R. Abreu, "QMutPy: A mutation testing tool for quantum algorithms and applications in Qiskit," in Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2022. New York, NY, USA: Association for Computing Machinery, 2022, pp. 797–800
2022
-
[9]
Quantum circuit mutants: Empirical analysis and recommendations,
E. Mendiluze Usandizaga, S. Ali, T. Yue, and P. Arcaini, "Quantum circuit mutants: Empirical analysis and recommendations," Empirical Software Engineering, vol. 30, no. 4, p. 100, Apr 2025
2025
-
[10]
Open quantum assembly language,
A. W. Cross, L. S. Bishop, J. A. Smolin, and J. M. Gambetta, "Open quantum assembly language," arXiv preprint arXiv:1707.03429, 2017
2017
-
[11]
Quantum computing for computer scientists,
N. S. Yanofsky and M. A. Mannucci, Quantum computing for computer scientists. Cambridge University Press, 2008
2008
-
[12]
IBM quantum composer,
IBM, "IBM quantum composer," 2025
2025
-
[13]
Chapter six - mutation testing advances: An analysis and survey,
M. Papadakis, M. Kintis, J. Zhang, Y. Jia, Y. Le Traon, and M. Harman, "Chapter six - mutation testing advances: An analysis and survey," ser. Advances in Computers, A. M. Memon, Ed. Elsevier, 2019, vol. 112, pp. 275–378
2019
-
[14]
MuJava: an automated class mutation system,
Y.-S. Ma, J. Offutt, and Y. R. Kwon, "MuJava: an automated class mutation system," Software Testing, Verification and Reliability, vol. 15, no. 2, pp. 97–133, 2005
2005