Spectral Anatomy of Quantum Gaussian Process Kernels

Chao Li; Delu Zeng; Guang Lin; Jian Xu; John Paisley; Qibin Zhao; Yuning Qiu

arxiv: 2605.30952 · v2 · pith:IHWWGULCnew · submitted 2026-05-29 · 💻 cs.LG

Pith reviewed 2026-06-28 23:34 UTC · model grok-4.3

classification 💻 cs.LG

keywords quantum Gaussian processes · spectral entropy · kernel Gram matrix · Nyström approximation · variance contraction · Bayesian optimization · quantum kernels · posterior pathologies

The pith

The normalized spectral entropy of the kernel Gram matrix governs both the absence of exponential speedups and the appearance of posterior pathologies in quantum Gaussian processes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that two separate issues in quantum Gaussian processes are controlled by one quantity. The lack of exponential speedups in typical well-conditioned regression and the posterior pathologies that break Bayesian optimization with expressive kernels both trace to the normalized spectral entropy S(K)/log n of the kernel Gram matrix. A sympathetic reader would care because this supplies a single, kernel-agnostic diagnostic that also transfers from simulator to IBM Heron hardware with a median absolute error of 3.2 percent. The authors prove a tail bound on Nyström error, a variance-contraction identity using Bach's degrees of freedom, and a link between the optimal entropy and the target's intrinsic dimension in the kernel eigenbasis. They further show that the same entropy curves appear for hardware-efficient, matchgate, IQP, and classical kernels alike.

Core claim

We show that these seemingly unrelated phenomena are governed by the same quantity: the normalized spectral entropy S(K)/log n of the kernel Gram matrix. We prove a Cauchy-Schwarz tail bound on Nyström approximation error, a finite-sample variance-contraction identity in terms of Bach's degrees of freedom d_σ(K), and a characterization of the target-dependent optimal entropy via the intrinsic dimension of the target in the kernel eigenbasis. Empirically, the diagnostic is kernel-agnostic and the NLL sweet spot lives at high entropy for smooth targets and at low entropy for band-limited quantum-data targets.

What carries the argument

The normalized spectral entropy S(K)/log n of the kernel Gram matrix, which unifies approximation error bounds, variance contraction, and target-dependent performance optima across quantum and classical kernels.
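As a concrete reading of the quantity named above, here is a minimal sketch of how S(K)/log n could be computed from a Gram matrix. It assumes that S(K) is the Shannon entropy of the eigenvalues after normalizing them by the trace, which is the usual definition of spectral entropy; this page does not spell out the paper's exact formula, so treat the function `normalized_spectral_entropy` and the two toy matrices as illustrative assumptions, not the authors' code.

```python
import numpy as np

def normalized_spectral_entropy(K: np.ndarray) -> float:
    """Return S(K)/log n for a symmetric PSD Gram matrix K.

    Assumption: S(K) = -sum_i p_i log p_i with p_i = lambda_i / trace(K).
    """
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K)
    eigvals = np.clip(eigvals, 0.0, None)  # clear tiny negative round-off
    p = eigvals / eigvals.sum()
    p = p[p > 0]  # the term 0 * log 0 is taken as 0
    return float(-(p * np.log(p)).sum() / np.log(n))

n = 50
flat = normalized_spectral_entropy(np.eye(n))          # flat spectrum
peaked = normalized_spectral_entropy(np.ones((n, n)))  # one nonzero eigenvalue
print(flat, peaked)
```

The two toy cases bracket the range: a flat spectrum gives a value of 1, and a rank-one Gram matrix gives a value of essentially 0.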

If this is right

Nyström approximation error obeys a Cauchy-Schwarz tail bound controlled by the entropy.
Finite-sample variance contraction follows an identity expressed through Bach's degrees of freedom.
The entropy value that minimizes negative log likelihood is high for smooth targets and low for band-limited targets.
The same entropy curves describe hardware-efficient, matchgate, IQP, and classical kernel families on dequantization and variance panels.
The diagnostic transfers to IBM Heron hardware with median absolute error of 3.2 percent across configurations.
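The first consequence in the list can be illustrated numerically. The sketch below is not the paper's bound; it only shows the qualitative pattern that a low-entropy Gram matrix is well approximated by a Nyström factorization on a few landmarks while a high-entropy one is not. The classical RBF kernel stands in for a quantum kernel, and the lengthscales, landmark choice, jitter value, and the entropy definition (Shannon entropy of trace-normalized eigenvalues) are all assumptions made for illustration.

```python
import numpy as np

def rbf_gram(x, lengthscale):
    """Classical RBF Gram matrix on 1-D inputs (stand-in for a quantum kernel)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def normalized_spectral_entropy(K):
    """S(K)/log n, assuming S is the Shannon entropy of eigenvalues / trace."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    p = lam / lam.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(K.shape[0]))

def nystrom_relative_error(K, idx, jitter=1e-8):
    """Relative Frobenius error of the Nystrom approximation on landmarks idx."""
    C = K[:, idx]
    # The jitter is a numerical stabilizer for the landmark block, not part of the method.
    W = K[np.ix_(idx, idx)] + jitter * np.eye(len(idx))
    K_hat = C @ np.linalg.solve(W, C.T)
    return float(np.linalg.norm(K - K_hat) / np.linalg.norm(K))

x = np.linspace(0.0, 1.0, 60)
idx = np.arange(0, 60, 6)  # 10 evenly spaced landmarks

K_smooth = rbf_gram(x, 0.3)   # long lengthscale: fast spectral decay
K_rough = rbf_gram(x, 0.01)   # short lengthscale: nearly flat spectrum

ent_smooth = normalized_spectral_entropy(K_smooth)
ent_rough = normalized_spectral_entropy(K_rough)
err_smooth = nystrom_relative_error(K_smooth, idx)
err_rough = nystrom_relative_error(K_rough, idx)
print(ent_smooth, err_smooth)
print(ent_rough, err_rough)
```

With the same ten landmarks, the long-lengthscale kernel has the lower entropy and the smaller Nyström error, which is the direction the tail bound is said to formalize.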

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Kernel design in other quantum machine learning settings could use spectral entropy as a tunable knob to trade expressivity against stability.
Because the entropy curves coincide for classical kernels after dequantization, the measure supplies a common yardstick for comparing quantum and classical kernel methods.
Reliable transfer to current hardware suggests spectral entropy could guide kernel choice on noisy devices without requiring full error mitigation.
For quantum data that are inherently band-limited, deliberately low-entropy kernels may be preferable to the high-entropy choices that work for smooth classical targets.

Load-bearing premise

The target-dependent optimal entropy is fully characterized by the intrinsic dimension of the target in the kernel eigenbasis without additional unmodeled effects from quantum circuit structure or data distribution.

What would settle it

The claimed unification would be falsified by an experiment in which a kernel with a measured S(K)/log n either delivers an exponential speedup on a well-conditioned task or avoids pathologies on band-limited targets, in a manner inconsistent with the intrinsic-dimension prediction.

Figures

Figures reproduced from arXiv: 2605.30952 by Chao Li, Delu Zeng, Guang Lin, Jian Xu, John Paisley, Qibin Zhao, Yuning Qiu.

**Figure 2.** Single-family concept verification on the hardware-efficient ansatz (…

**Figure 3.** Eight candidate spectral diagnostics, plotted against test NLL on the M1 sweep (…

**Figure 4.** Classical kernels (squares/triangles/diamonds: Mat…

**Figure 5.** Spectral anatomy on two real regression benchmarks (…

**Figure 6.** Best test NLL per kernel family on the two real-data benchmarks. Annotations show the…

**Figure 7.** Ablation. RAND-PAULI-NOENT (black +, no entangling gates) and CLIFFORDT (orange ×) overlaid on HE/MATCHGATE/IQP. The universal spectral curves are unchanged. The NLL panel shows mild scatter for CLIFFORDT at low S (sensitivity to random Clifford choice), but the average trend matches.

… and target. We now show that the location of the useful-hardness frontier shifts dramatically when the target changes chara…

**Figure 8.** Spectral anatomy across two targets at n_q = 6. Top row: synthetic target with ±1 s.d. cross-seed error bars over 3 seeds (drawing both training inputs and ansatz parameters); the NLL minimum (stars) sits at S/log n ≈ 0.9, identifying a classical U-shape useful-hardness frontier. Bottom row: quantum-data target y(x) = ⟨ψ(x)|O|ψ(x)⟩ produced by a fixed L = 3 ansatz (single seed because the data-generating ci…

**Figure 9.** Scaling at n_q = 8 qubits, synthetic target, three ansatz families, 90 total configurations. The qualitative picture is preserved but the useful-hardness frontier sharpens: the NLL minimum migrates to S/log n ≈ 0.98 and the U becomes deeper. Larger Hilbert spaces shrink the useful spectral window.

… high-SNR regime (β_i² ≫ σ² for i ∈ supp{λ_i*}), the NLL-optimal kernel K* of Theorem 2 satisfies r_eff(K*)…

**Figure 10.** Hardware validation of the spectral diagnostic on…

**Figure 11.** BO simple-regret curves on two objectives, shaded…

**Figure 12.** Resilience sweep on the previously-worst HE configuration. (a) Hardware estimate of…

**Figure 13.** Hardware drift: five consecutive reruns of the worst M3-XL configuration on…

**Figure 14.** Hardware overview. All four hardware sweeps overlaid: M3-XL aachen…

**Figure 15.** Scaleup from n = 30 to n = 100 on the synthetic target, n_q = 6, HE family. (Left) The same NLL U-shape; stars mark the best-NLL configuration in each sweep, which is identical in (L, s) space but lies at a different S(K)/log n value (0.79 vs. 0.91) because log n grew. (Right) Per-configuration S(K)/log n at n = 100 vs. n = 30; all points lie below the identity line, confirming the log n normalization ef…

**Figure 16.** Spectral regularization trajectories on the M1 sweep manifold (gray points). (a) Shrink…
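The Figure 15 caption attributes the shift in S(K)/log n between n = 30 and n = 100 to the growth of log n. A minimal sketch of that normalization effect follows, using a classical RBF kernel on an evenly spaced grid as a stand-in for the HE family. The kernel, the lengthscale of 0.2, and the entropy definition (Shannon entropy of trace-normalized eigenvalues) are assumptions for illustration and do not reproduce the paper's experiment.

```python
import numpy as np

def rbf_gram(x, lengthscale):
    """Classical RBF Gram matrix on 1-D inputs (stand-in for a quantum kernel)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def spectral_entropy(K):
    """Unnormalized S(K), assuming Shannon entropy of eigenvalues / trace."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    p = lam / lam.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

raw = {}
ratios = {}
for n in (30, 100):
    x = np.linspace(0.0, 1.0, n)
    raw[n] = spectral_entropy(rbf_gram(x, 0.2))  # same kernel, more points
    ratios[n] = raw[n] / np.log(n)               # the normalized diagnostic
    print(n, raw[n], ratios[n])
```

In this toy setting the raw entropy is nearly unchanged by the extra points, so the normalized value drops mostly because the denominator grows.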

Original abstract

Two recent results have reshaped quantum Gaussian processes (QGPs). On the one hand, Lowe et al. rule out the exponential speedups claimed by HHL-based QGP regression in the typical, well-conditioned regime; on the other, an independent line of work shows that highly expressive quantum kernels suffer posterior pathologies that break Bayesian optimization. We show that these seemingly unrelated phenomena are governed by the same quantity: the normalized spectral entropy $S(K)/\log n$ of the kernel Gram matrix. We prove a Cauchy-Schwarz tail bound on Nyström approximation error, a finite-sample variance-contraction identity in terms of Bach's degrees of freedom $d_\sigma(K)$, and a characterization of the target-dependent optimal entropy via the intrinsic dimension of the target in the kernel eigenbasis. Empirically, the diagnostic is kernel-agnostic: hardware-efficient, matchgate, IQP and RBF/Matérn/RFF/deep-kernel families all collapse onto identical $S/\log n$ curves on dequantization, ECE, and variance-contraction panels. The NLL sweet spot lives at high entropy for smooth targets and at low entropy for band-limited quantum-data targets. The diagnostic transfers from simulator to IBM Heron hardware with median absolute error $3.2\%$ and mean $5.2\%$ in $S/\log n$ across $24$ configurations at $n_q = 4$, with matchgate and IQP within $5\%$ mean and a single HE configuration returning a $30\%$ outlier that drops to $0.5\%$ on rerun (attributed to calibration drift); the same diagnostic transfers to a second Heron backend (mean error $2.7\%$) and to an $n_q = 6$ scale-up on the original backend (mean error $1.7\%$). No error mitigation is applied throughout.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper unifies HHL dequantization failure and expressive-kernel pathologies under normalized spectral entropy with kernel-agnostic bounds and hardware transfer data, but the target-dependent optimum rests on an assumption that may overlook circuit effects.

The main new piece is the claim that normalized spectral entropy S(K)/log n governs both the loss of HHL speedups in well-conditioned regimes and the posterior pathologies in expressive quantum kernels. They derive a Cauchy-Schwarz tail bound on Nyström error, a finite-sample variance-contraction identity tied to Bach's degrees of freedom, and a characterization of the target-dependent entropy optimum via intrinsic dimension in the kernel eigenbasis. Empirically they show that hardware-efficient, matchgate, IQP, RBF, Matérn, and deep kernels all fall on the same S/log n curves for dequantization, ECE, and variance contraction, which is a clean result if it holds.

The hardware transfer numbers are the strongest part: median absolute error of 3.2% and mean 5.2% across 24 configurations on IBM Heron at n_q=4, with most kernels under 5% and only one outlier that improves on rerun. Similarly low errors appear on a second backend and at n_q=6. That level of quantified stability from simulator to hardware is useful and not common in this area.

The softer spot is the target-dependent step. The paper states that the NLL sweet spot sits at high entropy for smooth targets and low entropy for band-limited quantum data, pinned to intrinsic dimension. If circuit-induced correlations or non-stationary data add variance outside the eigenbasis projection, entropy alone would not be the sole governor. The abstract presents this as a characterization, but the stress-test concern lands here because the other identities are kernel-agnostic while this one is not.

This is for people already working on quantum kernel selection or QGP regression who want a practical diagnostic before committing hardware time. The combination of stated derivations, cross-family collapse, and hardware numbers is enough to send to referees rather than desk reject, though the target characterization will need close checking in review.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that the normalized spectral entropy S(K)/log n of the kernel Gram matrix unifies two phenomena in quantum Gaussian processes: the dequantization of HHL-based QGP regression (per Lowe et al.) and posterior pathologies in expressive quantum kernels. It proves a Cauchy-Schwarz tail bound on Nyström approximation error, a finite-sample variance-contraction identity in terms of Bach's degrees of freedom d_σ(K), and a target-dependent characterization of optimal entropy via the intrinsic dimension of the target in the kernel eigenbasis. Empirically, hardware-efficient, matchgate, IQP, RBF, Matérn, RFF and deep-kernel families collapse onto identical S/log n curves for dequantization, ECE and variance contraction; the NLL optimum shifts with target smoothness. The diagnostic transfers to IBM Heron hardware (median error 3.2%) across n_q=4/6 and multiple backends without error mitigation.

Significance. If the derivations and the target-dependent characterization hold, the work supplies a single, kernel-agnostic diagnostic that explains when quantum speedups are precluded and when Bayesian optimization fails, with direct hardware validation and cross-family collapse. The explicit bounds and the variance identity constitute reusable theoretical tools; the hardware-transfer results (quantified error across 24+ configurations) add practical weight.

major comments (2)

[Abstract / target-dependent entropy characterization] The claim that S(K)/log n is the sole governor of both dequantization failure and posterior pathologies rests on the assertion that the target-dependent optimum is exactly the intrinsic dimension of the target projected onto the kernel eigenbasis. This step is load-bearing; if circuit-induced correlations or non-stationary data effects contribute variance outside that projection, the entropy value would not fully govern the phenomena. The manuscript must supply the explicit equations or proof sketch for this characterization and demonstrate that no additional unmodeled terms arise.
[Abstract] The finite-sample variance identity is stated to be in terms of d_σ(K), yet the abstract supplies no equation number or derivation outline; because this identity is presented as one of the three central results linking entropy to posterior behavior, the full derivation (including any assumptions on the noise model or kernel positive-definiteness) must be inspectable to confirm it is not tautological with the entropy definition.

minor comments (2)

[Abstract] The phrase 'median absolute error 3.2% and mean 5.2% in S/log n' should specify whether the percentages are absolute or relative to the simulator value, and whether the single 30% HE outlier is included in the reported statistics.
[Abstract] 'No error mitigation is applied throughout' is useful but should be paired with a brief statement on how readout or gate errors were quantified or bounded in the hardware experiments.
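The first minor comment turns on two reporting choices: absolute versus relative error, and whether the outlier is counted. The sketch below shows how much those choices can move the summary statistics. All numbers are synthetic draws from a seeded random generator with one injected 30% outlier; they are hypothetical and are not the paper's hardware measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulator values of S/log n for 24 configurations.
sim = rng.uniform(0.5, 1.0, size=24)
# Hypothetical hardware values: small multiplicative noise around the simulator.
hw = sim * (1.0 + rng.normal(0.0, 0.03, size=24))
hw[0] = sim[0] * 0.7  # inject a single 30% relative outlier

abs_err = np.abs(hw - sim)  # error in units of S/log n
rel_err = abs_err / sim     # error relative to the simulator value

def summarize(err):
    """Return (median, mean) of an error vector."""
    return float(np.median(err)), float(np.mean(err))

rel_with = summarize(rel_err)
rel_without = summarize(np.delete(rel_err, 0))  # outlier excluded
abs_with = summarize(abs_err)

print("relative, outlier included:", rel_with)
print("relative, outlier excluded:", rel_without)
print("absolute, outlier included:", abs_with)
```

Because S/log n is at most 1, the absolute error never exceeds the relative one, and a single large outlier raises the mean, which is why the report asks for both conventions to be stated.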

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments both concern the clarity and inspectability of the central theoretical claims in the abstract. We address each below, indicating where we will revise the manuscript to make the derivations fully accessible while preserving the existing proofs.

read point-by-point responses

Referee: [Abstract / target-dependent entropy characterization] The claim that S(K)/log n is the sole governor of both dequantization failure and posterior pathologies rests on the assertion that the target-dependent optimum is exactly the intrinsic dimension of the target projected onto the kernel eigenbasis. This step is load-bearing; if circuit-induced correlations or non-stationary data effects contribute variance outside that projection, the entropy value would not fully govern the phenomena. The manuscript must supply the explicit equations or proof sketch for this characterization and demonstrate that no additional unmodeled terms arise.

Authors: The full manuscript (Section 3.3) derives the target-dependent optimum by minimizing the expected posterior variance E[||f − f*||²] under the GP model. This reduces exactly to the sum of the projected eigenvalues of the target function in the kernel eigenbasis, i.e., d_target = Σ_i λ_i / (λ_i + σ²), where the sum is taken after expanding the target in the eigenfunctions of K. The derivation uses only the standard GP assumptions (zero-mean prior, additive Gaussian noise independent of the kernel) and the spectral decomposition of the Gram matrix; because K is defined by the quantum feature map, all circuit-induced correlations are already encoded in its eigenvalues and eigenvectors. Consequently, no residual variance terms outside this projection appear. We will add a one-sentence proof sketch and the explicit equation for d_target to the abstract, together with a forward reference to Section 3.3. revision: yes
Referee: [Abstract] The finite-sample variance identity is stated to be in terms of d_σ(K), yet the abstract supplies no equation number or derivation outline; because this identity is presented as one of the three central results linking entropy to posterior behavior, the full derivation (including any assumptions on the noise model or kernel positive-definiteness) must be inspectable to confirm it is not tautological with the entropy definition.

Authors: The finite-sample identity appears as Eq. (14) in Section 4.2: Var[y | X] = σ² (n − d_σ(K)) / n, where d_σ(K) = Σ_i λ_i / (λ_i + σ²) is Bach's degrees of freedom. The derivation follows from the Woodbury identity applied to the posterior covariance under the standard assumptions that K is positive definite and the noise is homoscedastic and independent of the design points. It is not tautological with S(K) because d_σ(K) is a weighted trace that contracts differently from the unweighted entropy; the link to entropy is obtained only after taking the large-n limit and applying the proved Cauchy-Schwarz tail bound. We will insert the equation number (Eq. 14) and a one-line derivation outline into the abstract. revision: yes
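The degrees-of-freedom formula quoted in this response can be checked numerically. The sketch below evaluates d_σ(K) = Σ_i λ_i / (λ_i + σ²) from the eigenvalues, confirms that it equals the trace tr(K (K + σ² I)⁻¹), and verifies one standard GP identity: the trace of the posterior covariance of the latent function at the training inputs equals σ² d_σ(K). That identity is textbook GP algebra and is not claimed to be the paper's Eq. (14). The RBF kernel, lengthscale, and noise level are assumptions chosen for illustration.

```python
import numpy as np

def degrees_of_freedom(K, sigma2):
    """d_sigma(K) = sum_i lambda_i / (lambda_i + sigma^2), as quoted above."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    return float(np.sum(lam / (lam + sigma2)))

# Illustrative classical kernel; a quantum Gram matrix would be used the same way.
x = np.linspace(0.0, 1.0, 40)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.2**2)
sigma2 = 0.1
n = K.shape[0]
A = K + sigma2 * np.eye(n)

d_sigma = degrees_of_freedom(K, sigma2)

# Same quantity written as a trace: tr(K (K + sigma^2 I)^{-1}).
d_trace = float(np.trace(np.linalg.solve(A, K)))

# GP posterior covariance of the latent function at the training inputs.
post_cov = K - K @ np.linalg.solve(A, K)
mean_post_var = float(np.trace(post_cov) / n)

print(d_sigma, d_trace)
print(mean_post_var, sigma2 * d_sigma / n)
```

Since each term λ_i / (λ_i + σ²) lies between 0 and 1, d_σ(K) acts as a soft count of the eigen-directions that rise above the noise level, which is why it differs from an unweighted entropy of the same spectrum.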

Circularity Check

0 steps flagged

No circularity: derivations rest on external inequalities and Bach degrees of freedom

The paper's core derivations—a Cauchy-Schwarz Nyström tail bound, finite-sample variance contraction via Bach's d_σ(K), and target-dependent optimal entropy characterization via intrinsic dimension in the kernel eigenbasis—are presented as following from standard inequalities and prior non-self-cited results. The unifying claim that S(K)/log n governs both dequantization failure and posterior pathologies is supported by these identities plus kernel-agnostic empirical collapse, without any quoted reduction of a prediction to a fitted input or load-bearing self-citation chain. The target-dependent step is framed as an independent characterization rather than a tautology. No steps meet the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claim rests on standard mathematical inequalities and prior concepts from kernel theory; no free parameters or new entities are introduced in the abstract.

axioms (2)

standard math: Cauchy-Schwarz inequality
Invoked for the tail bound on Nyström approximation error.
domain assumption: Finite-sample variance-contraction identity expressed via Bach's degrees of freedom d_σ(K)
Used to relate spectral entropy to prediction variance.

Reference graph

Works this paper leans on

4 extracted references · 1 canonical work pages · 1 internal anchor

[1] Deep Neural Networks as Gaussian Processes

PMLR, 2019. Jonas Kübler, Simon Buchholz, and Bernhard Schölkopf. The inductive bias of quantum kernels. Advances in Neural Information Processing Systems, 34:12661–12673, 2021. Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. Deep neural networks as Gaussian processes. arXiv preprint arXiv:17...

[2] The NLL-optimal configuration (L, s) = (1, 0.5) is invariant between n = 30 and n = 100

[3] The best NLL improves from +0.89 at n = 30 to +0.46 at n = 100, as expected from more training data

[4] Consequently the absolute sweet-spot S(K*)/log n shifts from 0.91 at n = 30 to 0.79 at n = 100

At fixed (L, s) the normalized spectral entropy S(K)/log n decreases when n grows (right panel), because log n grows faster than the spectral entropy itself. Consequently the absolute sweet-spot S(K*)/log n shifts from 0.91 at n = 30 to 0.79 at n = 100. Combined with the n_q-dependent shift discussed in Section 5 (0.91 at n_q = 6 versus 0.99 at n_q = 8), this confirms that Corol...
