Trainability of IQP Quantum Circuit Born Machines Under Gaussian Initialization

Gennaro De Luca

arxiv: 2606.10179 · v1 · pith:NX2DKD4Knew · submitted 2026-06-08 · 🪐 quant-ph · cs.LG

Pith reviewed 2026-06-27 15:59 UTC · model grok-4.3

keywords: IQP circuits · quantum circuit born machines · Gaussian initialization · Stein's lemma · barren plateaus · MMD loss · gradient variance · trainability

The pith

Gaussian initialization of IQP QCBMs yields an analytical lower bound on MMD gradient variance via Stein's lemma.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a closed-form lower bound on the variance of the gradient of the maximum mean discrepancy loss when IQP circuit parameters are drawn from a Gaussian distribution. It pairs this with a high-probability bound on how far any single gradient realization strays from its mean, obtained through Lipschitz concentration. These results identify regimes of initialization variance and circuit depth where exponential concentration is suppressed or encouraged. A reader cares because the bounds apply directly to a model class whose sampling is believed hard yet whose training expectations remain classically computable.

Core claim

For parameters initialized as independent Gaussians, the partial derivatives of the MMD loss with respect to those parameters satisfy the hypotheses of Stein's lemma, producing an explicit positive lower bound on gradient variance. A Lipschitz property of the loss further implies that the gradient concentrates around its expectation, with the probability of a deviation decaying exponentially in its size.

What carries the argument

Stein's lemma applied to the gradient of the MMD loss expressed in the parameters of an IQP circuit under Gaussian measure.
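As a quick numerical illustration of the identity this step relies on, the sketch below checks Stein's lemma in one dimension for X ~ N(0, sigma^2), where it reads E[X f(X)] = sigma^2 E[f'(X)]. The test function f(x) = sin(2x) and the names `stein_check` and `stein_exact` are assumptions chosen for illustration; they are not taken from the paper, and the MMD gradient itself is not evaluated here.

```python
import math
import random


def stein_check(sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimates of both sides of Stein's lemma for f(x) = sin(2x).

    Left side:  E[X f(X)]
    Right side: sigma^2 * E[f'(X)], with f'(x) = 2 cos(2x)
    """
    rng = random.Random(seed)
    lhs = 0.0
    rhs = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        lhs += x * math.sin(2.0 * x)
        rhs += sigma**2 * 2.0 * math.cos(2.0 * x)
    return lhs / n_samples, rhs / n_samples


def stein_exact(sigma):
    """Closed form of both sides, using E[cos(2X)] = exp(-2 sigma^2)."""
    return 2.0 * sigma**2 * math.exp(-2.0 * sigma**2)


lhs, rhs = stein_check(0.5)
print(lhs, rhs, stein_exact(0.5))
```

The two Monte Carlo estimates should agree with each other and with the closed form up to sampling noise. Whether the same identity applies to the actual MMD gradient depends on the regularity conditions discussed in the referee report.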

If this is right

Choice of Gaussian variance and circuit structure can be used to avoid or promote exponential gradient concentration.
Barren plateaus become more probable once the Gaussian variance exceeds a threshold set by circuit depth.
Classical simulation of training remains viable because the expectation values needed for the MMD loss are efficiently computable.
The analytic bound supplies a certificate of trainability that does not require Monte-Carlo estimation of the variance itself.
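The dependence on the Gaussian variance can be made concrete on a toy model. The sketch below assumes a product-state stand-in for an IQP expectation value, f(theta) = prod_j cos(2 theta_j), with i.i.d. theta_j ~ N(0, sigma^2); this toy, the function name `toy_partial_variance`, and the use of the parameter count n in place of circuit depth are all assumptions for illustration, not the paper's bound. For this toy the variance of the partial derivative with respect to theta_1 has an exact closed form.

```python
import math


def toy_partial_variance(n_params, sigma):
    """Exact Var[d f / d theta_1] for f = prod_j cos(2 theta_j).

    Assumes theta_j ~ N(0, sigma^2) i.i.d. The derivative is
    -2 sin(2 theta_1) * prod_{j>=2} cos(2 theta_j); its mean is zero,
    so the variance equals the second moment, which factorizes.
    """
    decay = math.exp(-8.0 * sigma**2)   # E[cos(4 theta)]
    mean_sin_sq = 0.5 * (1.0 - decay)   # E[sin^2(2 theta)]
    mean_cos_sq = 0.5 * (1.0 + decay)   # E[cos^2(2 theta)]
    return 4.0 * mean_sin_sq * mean_cos_sq ** (n_params - 1)


for n in (10, 50, 100):
    wide = toy_partial_variance(n, 1.0)                     # fixed, large variance
    narrow = toy_partial_variance(n, math.sqrt(1.0 / n))    # variance shrinking as 1/n
    print(n, wide, narrow)
```

In this toy, a fixed large sigma makes the variance shrink roughly like 2^(-n), while sigma^2 scaling as 1/n keeps the decay polynomial in n. This mirrors, under the stated assumptions, the qualitative threshold behaviour listed above.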

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same Stein-Lipschitz route may extend to other differentiable losses provided the requisite smoothness and moment conditions hold.
Numerical verification on small instances would reveal how tight the derived lower bound is in practice.
Relaxing the Gaussian assumption to other product distributions would require new concentration tools.

Load-bearing premise

The gradient of the MMD loss must obey the exact technical hypotheses of Stein's lemma and the Lipschitz concentration inequality with no additional circuit-specific approximations.

What would settle it

The claim would be refuted if, for a fixed small-depth IQP circuit and a chosen Gaussian variance, the gradient variance computed directly over many samples fell below the closed-form lower bound supplied by the lemma.
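A check of this kind could be organized as in the sketch below. It assumes a toy single-term loss C(theta) = (target - f(theta))^2 with f(theta) = prod_j cos(2 theta_j), which is a stand-in and not the paper's IQP MMD loss; the helper names and the `candidate_lower_bound` argument are hypothetical, and the paper's actual bound would have to be supplied by the reader.

```python
import math
import random


def toy_loss_grad(theta, target):
    """d C / d theta_1 for the toy loss C = (target - f)^2, f = prod_j cos(2 theta_j)."""
    cos_terms = [math.cos(2.0 * t) for t in theta]
    f = math.prod(cos_terms)
    rest = math.prod(cos_terms[1:])               # product over j >= 2 (1.0 if empty)
    df = -2.0 * math.sin(2.0 * theta[0]) * rest   # d f / d theta_1
    return -2.0 * (target - f) * df


def sampled_grad_variance(n_params, sigma, target, n_samples=100_000, seed=0):
    """Empirical variance of the toy gradient under theta_j ~ N(0, sigma^2) i.i.d."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        theta = [rng.gauss(0.0, sigma) for _ in range(n_params)]
        g = toy_loss_grad(theta, target)
        total += g
        total_sq += g * g
    mean = total / n_samples
    return total_sq / n_samples - mean**2


def violates_bound(empirical_variance, candidate_lower_bound, slack=0.0):
    """True if the sampled variance falls below the candidate bound (minus slack)."""
    return empirical_variance < candidate_lower_bound - slack


var = sampled_grad_variance(n_params=1, sigma=0.3, target=0.0)
print(var)
```

For n_params = 1 and target = 0 the toy gradient reduces to -2 sin(4 theta), whose variance is 2 (1 - exp(-32 sigma^2)), which gives a known reference value for sanity-checking the sampler. The `slack` argument is there so that sampling noise is not mistaken for a violation.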

Original abstract

Quantum Circuit Born Machines (QCBMs) offer a natural approach to generative machine learning by leveraging the Born rule. Recent work has provided a method to classically train QCBMs with Instantaneous Quantum Polynomial (IQP) circuits via the Maximum Mean Discrepancy (MMD) loss. Despite the assumed intractability of sampling from IQP circuits classically, their expectation values can be computed classically, enabling training of these IQP QCBMs. However, quantum machine learning (QML) models have various other challenges, including trainability issues caused by exponential concentration or barren plateaus. While these issues have been explored for parameters sampled from a uniform distribution, little work has been done to rigorously treat the use of arbitrary Gaussian initialization schemes. This work leverages Stein's lemma and Lipschitz concentration bounds for Gaussian random variables to provide an analytical lower bound of the variance of the gradient and a probabilistic concentration bound of the deviation of the gradient from its mean. It discusses strategies to either avoid or encourage exponential concentration, as well as the conditions under which barren plateaus are more likely to occur.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives new analytical variance lower bounds and concentration results for gradients in Gaussian-initialized IQP QCBMs via Stein's lemma, but applicability to the MMD loss needs verification.

The main thing here is that the paper derives a lower bound on gradient variance and a concentration bound around the mean for IQP QCBMs with MMD loss when parameters come from a Gaussian, using Stein's lemma and Lipschitz bounds on Gaussians. This extends earlier uniform-initialization results, which is the concrete new piece.

It does a clean job stating the problem and outlining how the bounds could inform initialization choices to avoid or encourage concentration. The discussion of barren plateau conditions is straightforward and ties back to the cited QML literature.

The soft spot is that the abstract states the bounds without showing steps or listing the exact assumptions on the loss and circuit. Stein's lemma requires almost-everywhere differentiability plus integrable derivative and controlled growth; the stress-test note is right that these are not automatically guaranteed for the MMD gradient on IQP circuits, where kernel terms or phase structure could interfere. Without the full derivation it is impossible to judge whether the bounds are unconditional or rest on extra restrictions.

The work is for researchers working on trainability of quantum generative models, especially those already using IQP circuits and classical training via MMD. A reader who wants analytical rather than purely numerical results on initialization would get something from it.

It deserves serious referee time because the question is practical and the method is standard concentration analysis applied to a new initialization regime. I would send it to peer review so the proofs and any numerical checks can be examined.

Referee Report

2 major / 2 minor

Summary. The paper analyzes trainability of IQP Quantum Circuit Born Machines (QCBMs) under Gaussian initialization when trained with the Maximum Mean Discrepancy (MMD) loss. It invokes Stein's lemma and Lipschitz concentration bounds on Gaussian random variables to derive an analytical lower bound on the variance of the gradient together with a probabilistic concentration bound on the deviation of the gradient from its mean; the manuscript also discusses initialization strategies that avoid or encourage exponential concentration and the conditions under which barren plateaus become likely.

Significance. If the derivations are valid, the work supplies the first explicit analytic treatment of gradient statistics for this classically simulable subclass of QCBMs under a common initialization distribution, thereby furnishing concrete, falsifiable criteria for when training is expected to succeed or fail.

major comments (2)

[Main analytic results (gradient variance and concentration bounds)] The central claims rest on direct application of Stein's lemma (E[X f(X)] = E[∇f(X)]) and Gaussian Lipschitz concentration to the MMD gradient; the manuscript supplies no explicit verification that the map from Gaussian parameters to the MMD loss satisfies the required almost-everywhere differentiability, integrability of the derivative, and controlled-growth hypotheses (see skeptic note on hidden non-differentiability or kernel growth).
[Abstract and opening paragraphs of §3] Abstract states the variance lower bound and concentration statement but the supplied text contains no derivation steps, explicit assumptions on the loss function, or circuit-specific checks that the Born-rule probabilities of the IQP circuit meet the lemma hypotheses without further approximation.

minor comments (2)

Define the precise form of the MMD kernel and the IQP circuit ansatz (including the number of layers and the support of the Gaussian) at the first appearance of the gradient expressions.
Add a short paragraph contrasting the Gaussian case with the uniform-initialization results already in the literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address each major comment below and will revise the manuscript to incorporate additional verifications and clarifications as outlined.

Referee: [Main analytic results (gradient variance and concentration bounds)] The central claims rest on direct application of Stein's lemma (E[X f(X)] = E[∇f(X)]) and Gaussian Lipschitz concentration to the MMD gradient; the manuscript supplies no explicit verification that the map from Gaussian parameters to the MMD loss satisfies the required almost-everywhere differentiability, integrability of the derivative, and controlled-growth hypotheses (see skeptic note on hidden non-differentiability or kernel growth).

Authors: We acknowledge that the manuscript applies Stein's lemma and Gaussian concentration bounds without an explicit appendix verifying the technical hypotheses. The MMD loss is a finite linear combination of kernel evaluations on expectation values of the IQP circuit; these expectations are trigonometric polynomials in the parameters and hence differentiable almost everywhere, while the Gaussian kernel ensures the requisite integrability and sub-exponential growth. Nevertheless, we agree that an explicit check strengthens the presentation. In the revised manuscript we will add a dedicated subsection (or appendix) that confirms almost-everywhere differentiability of the parameter-to-loss map, verifies the integrability condition, and bounds the growth of the gradient to justify direct application of Stein's lemma without further approximation. revision: yes
Referee: [Abstract and opening paragraphs of §3] Abstract states the variance lower bound and concentration statement but the supplied text contains no derivation steps, explicit assumptions on the loss function, or circuit-specific checks that the Born-rule probabilities of the IQP circuit meet the lemma hypotheses without further approximation.

Authors: The derivations appear in Sections 3–4, yet we concur that the abstract and the opening of §3 would benefit from greater transparency. We will revise the abstract to state the principal assumptions (Gaussian initialization, MMD loss with Gaussian kernel, and the polynomial form of IQP expectation values) and will insert a short paragraph at the beginning of §3 that lists the lemma hypotheses and confirms, via the explicit trigonometric structure of the Born-rule probabilities, that they hold without approximation for the IQP circuits under consideration. revision: yes

Circularity Check

0 steps flagged

No circularity: external lemmas applied to model gradients

The paper derives variance lower bounds and concentration results for MMD gradients on Gaussian-initialized IQP circuits by direct invocation of Stein's lemma and Lipschitz concentration bounds. These are independent external mathematical facts whose hypotheses are asserted to hold for the Born-rule expectation values; the derivation does not reduce any claimed prediction to a fitted parameter, self-citation chain, or definitional tautology. No self-citations appear load-bearing, no ansatz is smuggled, and no renaming of known results occurs. The central claims therefore remain non-circular and externally grounded.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the applicability of two standard probability results to the gradient of the MMD loss under Gaussian parameters; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract.

axioms (2)

domain assumption: Stein's lemma applies directly to the gradient random variable formed by the MMD loss on IQP circuits.
Invoked to obtain the analytical lower bound on gradient variance.
domain assumption: Lipschitz concentration bounds hold for the Gaussian random variables representing circuit parameters.
Used to derive the probabilistic bound on gradient deviation from its mean.
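For reference, the textbook Gaussian concentration inequality for Lipschitz functions, on which the second assumption rests, can be written as follows. This is a sketch of the standard statement, not the paper's own bound: the symbols F, L, m and t are notation assumed here, and the constants in the paper may differ.

```latex
% Assumed notation: \theta \in \mathbb{R}^m with i.i.d. N(0, \sigma^2) entries,
% F : \mathbb{R}^m \to \mathbb{R} an L-Lipschitz function (e.g. one gradient component).
\Pr\bigl( \lvert F(\theta) - \mathbb{E}\,F(\theta) \rvert \ge t \bigr)
  \;\le\; 2 \exp\!\left( -\frac{t^{2}}{2 \sigma^{2} L^{2}} \right),
\qquad t \ge 0 .
```

Read this way, a smaller product of sigma and L gives tighter concentration around the mean, so the initialization variance and the Lipschitz constant enter the bound together.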

pith-pipeline@v0.9.1-grok · 5712 in / 1250 out tokens · 23422 ms · 2026-06-27T15:59:04.997769+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Qudit extension of parameterized IQP circuits: A generative quantum machine learning approach to integer data
quant-ph · 2026-06 · unverdicted · novelty 5.0

A qudit extension of parameterized IQP circuits is proposed for generative modeling of integer data, with a loss function and covariance matrix, and is validated on electron shower energy deposits in the CLIC electromagnetic calorimeter.

Reference graph

Works this paper leans on

20 extracted references · 4 linked inside Pith · cited by 1 Pith paper

[1] Jin-Guo Liu and Lei Wang. “Differentiable learning of quantum circuit born machines”. In: Physical Review A 98.6 (2018), p. 062324.
[2] Brian Coyle et al. “The Born supremacy: quantum advantage and training of an Ising Born machine”. In: npj Quantum Information 6.1 (2020), p. 60.
[3] Kaitlin Gili et al. “Do quantum circuit born machines generalize?” In: Quantum Science and Technology 8.3 (2023), p. 035021.
[4] Kevin Shen et al. “Variational quantum generative modeling by sampling expectation values of tunable observables”. In: npj Quantum Information 11.1 (2025), p. 178.
[5] Ieva Čepaitė, Brian Coyle, and Elham Kashefi. “A continuous variable Born machine”. In: Quantum Machine Intelligence 4.1 (2022), p. 6.
[6] Javier Alcazar et al. “Enhancing combinatorial optimization with classical and quantum generative models”. In: Nature Communications 15.1 (2024), p. 2761.
[7] Kaitlin Gili, Marta Mauri, and Alejandro Perdomo-Ortiz. “Generalization metrics for practical quantum advantage in generative models”. In: Physical Review Applied 21.4 (2024), p. 044032.
[8] Mohamed Hibat-Allah et al. “A framework for demonstrating practical quantum advantage: comparing quantum against classical generative models”. In: Communications Physics 7.1 (2024), p. 68.
[9] Michael J Bremner, Ashley Montanaro, and Dan J Shepherd. “Average-case complexity versus approximate simulation of commuting quantum computations”. In: Physical Review Letters 117.8 (2016), p. 080501.
[10] Vojtěch Havlíček et al. “Supervised learning with quantum-enhanced feature spaces”. In: Nature 567.7747 (2019), pp. 209–212.
[11] M Nest. “Simulating quantum computers with probabilistic methods”. In: arXiv preprint arXiv:0911.1624 (2009). [Pith/arXiv]
[12] Erik Recio-Armengol, Shahnawaz Ahmed, and Joseph Bowles. “Train on classical, deploy on quantum: scaling generative quantum machine learning to a thousand qubits”. In: arXiv preprint arXiv:2503.02934 (2025).
[13] Erik Recio-Armengol and Joseph Bowles. “Iqpopt: Fast optimization of instantaneous quantum polynomial circuits in jax”. In: arXiv preprint arXiv:2501.04776 (2025). [Pith/arXiv]
[14] Jarrod R McClean et al. “Barren plateaus in quantum neural network training landscapes”. In: Nature Communications 9.1 (2018), p. 4812.
[15] Supanut Thanasilp et al. “Exponential concentration in quantum kernel methods”. In: Nature Communications 15.1 (2024), p. 5200.
[16] Kevin Shen et al. “Characterizing trainability of instantaneous quantum polynomial circuit born machines”. In: arXiv preprint arXiv:2602.11042 (2026).
[17] Sacha Lerch et al. “IQP Born Machines under Data-dependent and Agnostic Initialization Strategies”. In: arXiv preprint arXiv:2603.14576 (2026).
[18] Siddharth Krishna Kumar. “On weight initialization in deep neural networks”. In: arXiv preprint arXiv:1704.08863 (2017). [Pith/arXiv]
[19] Andrew M Saxe, James L McClelland, and Surya Ganguli. “Exact solutions to the nonlinear dynamics of learning in deep linear neural networks”. In: arXiv preprint arXiv:1312.6120 (2013). [Pith/arXiv]
[20] Kaining Zhang et al. “Escaping from the barren plateau via gaussian initializations in deep variational quantum circuits”. In: Advances in Neural Information Processing Systems 35 (2022), pp. 18612–18627.

A Derivation of the Gradient and Hessian

$\pi_a := \langle Z_a \rangle_p, \qquad f_a(\theta) := \langle Z_a \rangle_{q_\theta}$
$C(\theta) = \mathrm{MMD}(p, q_\theta) = \mathbb{E}_{a \sim P_\sigma} (\pi_a - f_a(\theta))^2$
$C(\theta) = \sum_a P_\sigma(a) (\pi_a - f_a(\theta))^2$
$f_a(\theta) = \mathbb{E}$ ...
