pith. machine review for the scientific record. sign in

arxiv: 2604.18837 · v1 · submitted 2026-04-20 · 🪐 quant-ph · cs.LG

Recognition: unknown

Benchmarking Quantum Kernel Support Vector Machines Against Classical Baselines on Tabular Data: A Rigorous Empirical Study with Hardware Validation

Authors on Pith no claims yet

Pith reviewed 2026-05-10 03:59 UTC · model grok-4.3

classification 🪐 quant-ph cs.LG
keywords quantum kernel methodssupport vector machinestabular data classificationbenchmarking studystatistical significance testingkernel eigenspectraclassical baselines
0
0 comments X

The pith

Quantum kernel support vector machines show no statistically significant advantage over classical kernels on nine tabular classification datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that quantum kernels for support vector machines offer no reliable performance gain over classical alternatives when applied to standard tabular data. It runs 970 experiments across nine binary classification tasks, four quantum feature maps, and three classical kernels, using strict nested cross-validation and multiple noise models. None of the 29 direct quantum-versus-classical comparisons reach significance at the 0.05 level. Dataset identity explains 73 percent of the observed variance in results, while kernel type explains only 9 percent. Spectral analysis of the kernels indicates that current quantum feature maps produce eigenspectra that are either too uniform or too peaked, unlike the balanced profile of the strongest classical kernel.

Core claim

Across the tested quantum feature maps and classical kernels, no pairwise comparison yields a statistically significant accuracy difference at alpha equals 0.05. Dataset choice dominates performance variance with an effect size of 0.73, while kernel choice contributes only 0.09. Current quantum kernels exhibit eigenspectra that are either too flat or too concentrated and therefore miss the intermediate eigenvalue distribution that characterizes the best classical radial basis function kernel. Quantum kernel training reaches one competitive result on the breast cancer dataset but at roughly two thousand times the computational cost of classical training.

What carries the argument

Kruskal-Wallis factorial analysis of performance variance combined with direct comparison of kernel eigenspectra.

If this is right

  • Dataset selection should be the primary focus when designing tabular classification experiments rather than kernel engineering.
  • Quantum feature maps require redesign to generate eigenspectra that more closely match those of effective classical kernels such as the radial basis function.
  • Quantum kernel training can produce isolated competitive results but incurs orders-of-magnitude higher computational cost.
  • Hardware execution of quantum kernels can maintain fidelity above 0.97 on current devices for the tested circuits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending the benchmark to datasets with explicit high-dimensional structure or known quantum-friendly correlations could reveal conditions under which the current spectral mismatch disappears.
  • Repeating the study with adaptive feature maps that target intermediate eigenvalue profiles would provide a direct test of the mechanistic explanation offered by the spectral analysis.

Load-bearing premise

The nine chosen datasets and four quantum feature maps are representative of the situations in which quantum kernel methods could show an advantage on tabular data.

What would settle it

A new dataset or feature map on which at least one quantum kernel achieves a statistically significant accuracy improvement over the strongest classical baseline under identical nested cross-validation.

Figures

Figures reproduced from arXiv: 2604.18837 by Christoph Strohmeyer, Michael Schlotter, Siavash Kakavand.

Figure 1
Figure 1. Figure 1: Balanced accuracy comparison across all nine datasets. For each dataset, bars show [PITH_FULL_IMAGE:figures/full_fig_p014_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Kruskal–Wallis epsilon-squared (ε 2 ) effect sizes for seven experimental factors. Dataset choice dominates performance variance at ε 2 = 0.73, while kernel family contributes only ε 2 = 0.089. Factorial analysis A Kruskal–Wallis analysis examines seven experimental factors on bal￾anced accuracy across all 222 extended-study experiments. Dataset choice dominates with ε 2 = 0.73 (p < 10−31), explaining 73% … view at source ↗
Figure 3
Figure 3. Figure 3: Kernel eigenspectra comparison across all extended-study configurations (3 datasets, [PITH_FULL_IMAGE:figures/full_fig_p016_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Hardware vs. simulation kernel entries for six experiments on IBM ibm [PITH_FULL_IMAGE:figures/full_fig_p018_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Dataset suitability criterion: non-linearity gap (RBF [PITH_FULL_IMAGE:figures/full_fig_p021_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Quantum circuits for the four evaluated feature maps, shown at [PITH_FULL_IMAGE:figures/full_fig_p027_6.png] view at source ↗
Figure 1
Figure 1. Figure 1: SVM balanced accuracy across all six hardware experiments. Solid bars: hardware [PITH_FULL_IMAGE:figures/full_fig_p033_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Kernel fidelity (Pearson r) of the hardware kernel matrix versus ideal statevector sim￾ulation for each experiment. All experiments exceed r = 0.97. Rot2dof (product-state circuits) consistently achieves higher fidelity than belis (entangling circuits), reflecting its shallower tran￾spiled depth on Heron r2. 33 [PITH_FULL_IMAGE:figures/full_fig_p033_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Kernel mean absolute error (MAE) versus transpiled circuit depth for all six hardware [PITH_FULL_IMAGE:figures/full_fig_p034_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Kernel matrix heatmaps for HW-01 (belis, breast [PITH_FULL_IMAGE:figures/full_fig_p034_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Kernel matrix heatmaps for HW-02 (rot2dof, breast [PITH_FULL_IMAGE:figures/full_fig_p035_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Computational cost comparison (log scale). Classical kernels complete in [PITH_FULL_IMAGE:figures/full_fig_p040_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Learning curves for all eight datasets across six training fractions (10%–100%). Each [PITH_FULL_IMAGE:figures/full_fig_p041_7.png] view at source ↗
read the original abstract

Quantum kernel methods have been proposed as a promising approach for leveraging near-term quantum computers for supervised learning, yet rigorous benchmarks against strong classical baselines remain scarce. We present a comprehensive empirical study of quantum kernel support vector machines (QSVMs) across nine binary classification datasets, four quantum feature maps, three classical kernels, and multiple noise models, totalling 970 experiments with strict nested cross-validation. Our analysis spans four phases: (i) statistical significance testing, revealing that none of 29 pairwise quantum-classical comparisons reach significance at $\alpha = 0.05$; (ii) learning curve analysis over six training fractions, showing steeper quantum slopes on six of eight datasets that nonetheless fail to close the gap to the best classical baseline; (iii) hardware validation on IBM ibm_fez (Heron r2), demonstrating kernel fidelity $r \geq 0.976$ across six experiments; and (iv) seed sensitivity analysis confirming reproducibility (mean CV 1.4%). A Kruskal-Wallis factorial analysis reveals that dataset choice dominates performance variance ($\varepsilon^2 = 0.73$), while kernel type accounts for only 9%. Spectral analysis offers a mechanistic explanation: current quantum feature maps produce eigenspectra that are either too flat or too concentrated, missing the intermediate profile of the best classical kernel, the radial basis function (RBF). Quantum kernel training (QKT) via kernel-target alignment yields the single competitive result -- balanced accuracy 0.968 on breast cancer -- but with ~2,000x computational overhead. Our findings provide actionable guidelines for quantum kernel research. The complete benchmark suite is publicly available to facilitate reproduction and extension.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript reports a large-scale empirical benchmark of quantum kernel support vector machines (QSVMs) on nine tabular binary classification datasets using four quantum feature maps, three classical kernels, and 970 total experiments under strict nested cross-validation. It finds that none of the 29 pairwise quantum-classical comparisons reach statistical significance at α=0.05, that dataset identity accounts for 73% of performance variance while kernel type accounts for only 9%, that learning curves show steeper but non-closing quantum slopes, and that hardware execution on IBM Heron yields kernel fidelity r≥0.976. Spectral analysis is offered to explain why current quantum maps fail to match the RBF kernel's intermediate eigenspectrum; quantum kernel training is noted as the sole competitive case but at ~2000× overhead. A public benchmark suite is released.

Significance. If the negative result holds, the work supplies a high-quality reference benchmark that tempers expectations for near-term quantum kernels on tabular data and supplies concrete spectral diagnostics plus reproducible code. The combination of nested CV, Kruskal-Wallis decomposition, learning-curve analysis, hardware fidelity checks, and seed-sensitivity verification constitutes a falsifiable, reproducible empirical contribution that the field can directly extend.

minor comments (3)
  1. §4.2 and Figure 3: the learning-curve panels would benefit from explicit shading for the standard deviation across the five outer folds rather than only mean lines, to make the overlap with classical baselines visually clearer.
  2. Table 2: the column headers for the four quantum feature maps are abbreviated (e.g., “ZZ”, “Pauli”) without a footnote reminding the reader of the exact circuit depth and entanglement structure used; adding this would aid reproducibility.
  3. §5.3: the statement that “dataset choice dominates” is supported by ε²=0.73, but the text does not report the corresponding partial η² or the full ANOVA table; including the complete factorial decomposition would strengthen the variance-partition claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive and thorough review, which accurately summarizes the scope, methodology, and findings of our benchmark study. The recommendation to accept is appreciated, as is the recognition of the reproducibility measures (nested CV, seed sensitivity, public code release) and the mechanistic spectral analysis. No major comments were raised that require rebuttal or revision.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper is a purely empirical benchmark study reporting results from 970 experiments on public datasets using nested cross-validation, Kruskal-Wallis variance decomposition, learning curves, and hardware fidelity checks. All performance claims (e.g., no pairwise significance at α=0.05, ε²=0.73 for dataset vs. 9% for kernel) are direct statistical outputs from external baselines and standard tests; none reduce by the paper's own equations or self-citations to fitted inputs or self-referential quantities. The design tests its stated scope without internal contradiction or load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central empirical claims rest on standard machine-learning evaluation assumptions and the representativeness of selected datasets and feature maps rather than derived quantities.

axioms (1)
  • domain assumption Performance metrics across folds and seeds follow distributions amenable to Kruskal-Wallis and pairwise significance testing
    Invoked for the statistical significance analysis in phase (i).

pith-pipeline@v0.9.0 · 5621 in / 1226 out tokens · 42409 ms · 2026-05-10T03:59:58.687019+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

17 extracted references · 16 canonical work pages · 1 internal anchor

  1. [1]

    Towards a realistic track reconstruction algorithm based on graph neural networks for the HL-LHC

    doi:10.1051/epjconf/202125103070. Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning.Nature, 549(7671):195–202,

  2. [2]

    Quantum machine learning.Nature, 549(7671):195–202, Sep 2017

    doi:10.1038/nature23474. Christopher M. Bishop. Training with noise is equivalent to Tikhonov regularization.Neural Computation, 7(1):108–116, 1995.doi:10.1162/neco.1995.7.1.108. Gavin C. Cawley and Nicola L. C. Talbot. On over-fitting in model selection and subsequent selection bias in performance evaluation.Journal of Machine Learning Research, 11:2079– 2107,

  3. [3]

    Machine Learning , year =

    Corinna Cortes and Vladimir Vapnik. Support-vector networks.Machine Learning, 20(3): 273–297, 1995.doi:10.1007/BF00994018. Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. Algorithms for learning kernels based on centered alignment.Journal of Machine Learning Research, 13:795–828,

  4. [4]

    ics.uci.edu/ml

    URLhttps://archive. ics.uci.edu/ml. Accessed: 2025-01-15. Milton Friedman. The use of ranks to avoid the assumption of normality implicit in the anal- ysis of variance.Journal of the American Statistical Association, 32(200):675–701,

  5. [5]

    Journal of the American Statistical Association , volume =

    doi:10.1080/01621459.1937.10503522. Jennifer R. Glick, Tanvi P. Gujarati, Antonio D. C´ orcoles, Youngseok Kim, Abhinav Kandala, Jay M. Gambetta, and Kristan Temme. Covariant quantum kernels for data with group structure.Nature Physics, 20:479–483, 2024.doi:10.1038/s41567-023-02340-9. Vojtˇ ech Havl´ ıˇ cek, Antonio D. C´ orcoles, Kristan Temme, Aram W. H...

  6. [6]

    43 Hsin-Yuan Huang, Michael Broughton, Masoud Mohseni, Ryan Babbush, Sergio Boixo, Hart- mut Neven, and Jarrod R

    doi:10.1103/PhysRevA.106.052421. 43 Hsin-Yuan Huang, Michael Broughton, Masoud Mohseni, Ryan Babbush, Sergio Boixo, Hart- mut Neven, and Jarrod R. McClean. Power of data in quantum machine learning.Nature Communications, 12(1):2631, 2021.doi:10.1038/s41467-021-22539-9. Thomas Hubregtsen, David Wierichs, Elies Gil-Fuster, Peter-Jan H. S. Derks, Paul K. Fae...

  7. [7]

    Training quantum embedding kernels on near-term quantum computers

    doi:10.1103/PhysRevA.106.042431. IBM Quantum. IBM quantum platform.https://quantum.ibm.com,

  8. [8]

    Quantum computing with Qiskit

    Ali Javadi-Abhari, Matthew Treinish, Kevin Krsulich, Christopher J. Wood, Jake Lish- man, Julien Gacon, Simon Martiel, Paul D. Nation, Lev S. Bishop, Andrew W. Cross, Blake R. Johnson, and Jay M. Gambetta. Quantum computing with Qiskit.arXiv preprint arXiv:2405.08810,

  9. [9]

    doi:10.1080/01621459.1952.10483441. Jonas M. K¨ ubler, Simon Buchholz, and Bernhard Sch¨ olkopf. The inductive bias of quantum kernels. InAdvances in Neural Information Processing Systems, volume 34, pages 12661– 12673, Red Hook, NY,

  10. [10]

    Learning the parts of objects by non-negative matrix factorization.Nature1999, 401, 788–791

    Curran Associates, Inc. Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization.Nature, 401(6755):788–791, 1999.doi:10.1038/44565. Yunchao Liu, Srinivasan Arunachalam, and Kristan Temme. A rigorous and robust quan- tum speed-up in supervised machine learning.Nature Physics, 17:1013–1017,

  11. [11]

    doi:10.1038/s41567-021-01287-z. Fabian Pedregosa, Ga¨ el Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and ´Edouard Duchesnay. Scikit-learn: Machine learning in Python.J...

  12. [12]

    Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd

    doi:10.1038/s41534-021-00498-9. Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd. Quantum support vector ma- chine for big data classification.Physical Review Letters, 113(13):130503,

  13. [13]

    Alona Sakhnenko, Corey O’Meara, Kumar J

    doi:10.1103/PhysRevLett.113.130503. Alona Sakhnenko, Corey O’Meara, Kumar J. B. Ghosh, Christian B. Mendl, Giorgio Cortiana, and Juan Bernab´ e-Moreno. Hybrid classical-quantum autoencoder for anomaly detection. Quantum Machine Intelligence, 4(2):27, 2022.doi:10.1007/s42484-022-00075-z. 44 Jan Schnabel and Marco Roth. Quantum kernel methods under scrutiny...

  14. [14]

    Quantum machine learning in feature Hilbert spaces

    Maria Schuld and Nathan Killoran. Quantum machine learning in feature Hilbert spaces.Phys- ical Review Letters, 122(4):040504, 2019.doi:10.1103/PhysRevLett.122.040504. John Shawe-Taylor and Nello Cristianini.Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK, 2004.doi:10.1017/CBO9780511809682. Ruslan Shaydulin and Stefan M. Wil...

  15. [15]

    Joaquin Vanschoren, Jan N

    doi:10.1038/s41467-024-49287-w. Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. OpenML: Networked science in machine learning.ACM SIGKDD Explorations Newsletter, 15(2):49–60,

  16. [16]

    URL https://dl.acm.org/doi/10.1145/2641190.2641198

    doi:10.1145/2641190.2641198. Vladimir N. Vapnik.Statistical Learning Theory. Wiley-Interscience, New York,

  17. [17]

    Bias in error estimation when using cross- validation for model selection,

    Sudhir Varma and Richard Simon. Bias in error estimation when using cross-validation for model selection.BMC Bioinformatics, 7(1):91, 2006.doi:10.1186/1471-2105-7-91. Frank Wilcoxon. Individual comparisons by ranking methods.Biometrics Bulletin, 1(6):80–83, 1945.doi:10.2307/3001968. 45