Shot-Based Quantum Encoding: A Data-Loading Paradigm for Quantum Neural Networks
Pith reviewed 2026-05-10 18:55 UTC · model grok-4.3
The pith
By treating shot counts as learnable parameters, Shot-Based Quantum Encoding generates mixed quantum states whose expectation values are linear in the input probabilities, permitting nonlinear activations in quantum neural networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SBQE is a data-embedding strategy that distributes the hardware's native resource, shots, according to a data-dependent classical distribution over multiple initial quantum states. By treating the shot counts as a learnable degree of freedom, SBQE produces a mixed-state representation whose expectation values are linear in the classical probabilities and can therefore be composed with non-linear activation functions. The authors show that SBQE is structurally equivalent to a multilayer perceptron whose weights are realised by quantum circuits, and they describe a hardware-compatible implementation protocol.
What carries the argument
The data-dependent distribution of shot counts over multiple initial quantum states, treated as learnable classical parameters that define a mixed state with linear expectation values.
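This mechanism can be made concrete with a small numerical sketch (dimensions, states, and the observable below are illustrative, not taken from the paper): a learnable probability vector p over K initial states defines ρ = ∑ p_i |ψ_i⟩⟨ψ_i|, whose expectation Tr(ρO) is exactly the p-weighted average of per-state expectations and can then be fed to a nonlinear activation.

```python
import numpy as np

# Minimal sketch of the SBQE mechanism (illustrative, not the paper's code):
# shot fractions p over K initial states define rho = sum_i p_i |psi_i><psi_i|,
# so Tr(rho O) is linear in p by construction.
rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def random_state(dim, rng):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

K, dim = 4, 8
states = [random_state(dim, rng) for _ in range(K)]   # initial states |psi_i>
O = np.diag(rng.uniform(-1.0, 1.0, dim))              # a diagonal observable

p = softmax(rng.normal(size=K))                       # learnable shot fractions

# Per-state expectations <psi_i|O|psi_i>: one circuit evaluation each.
per_state = np.array([np.real(np.vdot(s, O @ s)) for s in states])

# The mixed-state expectation is the p-weighted average: linear in p.
expectation = p @ per_state

# Cross-check against the explicit density matrix rho.
rho = sum(pi * np.outer(s, s.conj()) for pi, s in zip(p, states))
assert np.isclose(np.real(np.trace(rho @ O)), expectation)

# Linearity in p is what permits a nonlinear activation afterwards.
activated = np.tanh(expectation)
```

The cross-check makes the "linear in the classical probabilities" claim explicit: the density-matrix trace and the weighted sum of per-state expectations coincide identically.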
If this is right
- SBQE achieves 89.1% test accuracy on Semeion handwritten digits, reducing error by 5.3% relative to amplitude encoding.
- SBQE reaches 80.95% accuracy on Fashion MNIST, exceeding amplitude encoding by 2.0% and a linear multilayer perceptron by 1.3%.
- SBQE requires no data-encoding gates in the quantum circuit, reducing circuit depth to match NISQ coherence limits.
- SBQE is structurally equivalent to a multilayer perceptron whose weights are realized by quantum circuits.
Where Pith is reading between the lines
- Optimizing shot allocations as classical parameters may allow quantum neural networks to operate with shallower circuits and thereby preserve coherence longer on current hardware.
- The linearity of expectation values could be verified experimentally by scaling the total number of shots and confirming that classification performance improves predictably without unexpected variance.
- The same shot-distribution principle might be applied to other quantum machine-learning tasks such as regression or generative modeling that also benefit from linear embeddings followed by nonlinear processing.
Load-bearing premise
That treating finite shot counts as exact learnable classical probabilities produces a mixed state whose linearity enables nonlinear activations without introducing unaccounted bias or overhead from sampling noise and state preparation on real NISQ devices.
What would settle it
Running SBQE on a physical quantum processor with a limited number of shots per circuit and checking whether the measured test accuracy and linearity of expectation values match ideal simulation results or degrade measurably due to finite-shot statistics.
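The statistical side of this check can be prototyped classically. In the sketch below (a simplified single-observable model, not the paper's protocol), a ±1-valued measurement with exact expectation ⟨Z⟩ = 2q − 1 is estimated from N shots, and the estimator's spread shrinks as 1/√N, matching the 1/N variance scaling the finite-shot concern turns on.

```python
import numpy as np

# Hedged sketch of the finite-shot check: estimate <Z> for a qubit with
# P(+1) = q from N shots and confirm the estimator's spread shrinks
# roughly as 1/sqrt(N). The value of q is an illustrative choice.
rng = np.random.default_rng(1)
q = 0.7                      # probability of measuring +1 (illustrative)
exact = 2 * q - 1            # exact <Z>

def estimate(n_shots, trials=2000):
    # each trial: draw n_shots outcomes in {+1, -1} and average them
    counts = rng.binomial(n_shots, q, size=trials)
    return 2 * counts / n_shots - 1

std_small = estimate(64).std()
std_large = estimate(4096).std()
# 64 -> 4096 shots is a 64x increase, so the spread should drop ~8x.
ratio = std_small / std_large
```

If SBQE's accuracy on hardware tracks this predictable shrinkage as the shot budget grows, the finite-shot degradation is benign; systematic deviations would point to the bias discussed in the referee report.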
Figures
read the original abstract
Efficient data loading remains a bottleneck for near-term quantum machine-learning. Existing schemes (angle, amplitude, and basis encoding) either underuse the exponential Hilbert-space capacity or require circuit depths that exceed the coherence budgets of noisy intermediate-scale quantum hardware. We introduce Shot-Based Quantum Encoding (SBQE), a data embedding strategy that distributes the hardware's native resource, shots, according to a data-dependent classical distribution over multiple initial quantum states. By treating the shot counts as a learnable degree of freedom, SBQE produces a mixed-state representation whose expectation values are linear in the classical probabilities and can therefore be composed with non-linear activation functions. We show that SBQE is structurally equivalent to a multilayer perceptron whose weights are realised by quantum circuits, and we describe a hardware-compatible implementation protocol. Benchmarks on Fashion MNIST and Semeion handwritten digits, with ten independent initialisations per model, show that SBQE achieves 89.1% +/- 0.9% test accuracy on Semeion (reducing error by 5.3% relative to amplitude encoding and matching a width-matched classical network) and 80.95% +/- 0.10% on Fashion MNIST (exceeding amplitude encoding by +2.0% and a linear multilayer perceptron by +1.3%), all without any data-encoding gates.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Shot-Based Quantum Encoding (SBQE), a data-loading method for quantum neural networks that allocates shots according to a learnable classical distribution over multiple initial quantum states. This produces a mixed-state representation ρ = ∑ p_i |ψ_i⟩⟨ψ_i| whose expectation values Tr(ρ O) are linear in the classical probabilities p_i, enabling direct composition with non-linear activation functions. The approach is claimed to be structurally equivalent to a multilayer perceptron with weights realized by quantum circuits, admits a hardware-compatible implementation without data-encoding gates, and is benchmarked on Fashion MNIST and Semeion handwritten digits, reporting 80.95% ± 0.10% and 89.1% ± 0.9% test accuracy respectively (with relative improvements over amplitude encoding).
Significance. If the finite-shot and implementation details can be resolved, SBQE offers a concrete way to treat shot allocation as a trainable classical degree of freedom, potentially mitigating data-loading bottlenecks on NISQ hardware while preserving the ability to apply non-linear activations. The reported benchmarks include error bars from ten independent initializations and quantify relative gains (e.g., +2.0% over amplitude encoding on Fashion MNIST), which would constitute a useful empirical contribution if reproducible under realistic shot budgets.
major comments (3)
- [Abstract] Abstract and central claim on linearity: the statement that Tr(ρ O) is linear in the classical probabilities p_i and can therefore be composed with non-linear activations holds exactly only for infinite shots. With finite shots allocated according to the learned distribution, each expectation is replaced by a noisy estimator whose variance scales as 1/N_shots; for any non-linear σ the Jensen gap E[σ(estimate)] ≠ σ(E[estimate]) is nonzero in general. No analysis, bias correction, or shot-count dependence is provided, which directly undermines the hardware-compatibility claim.
- [Results / Benchmarks] Benchmark section (results paragraph): the reported accuracies (89.1% ± 0.9% on Semeion, 80.95% ± 0.10% on Fashion MNIST) are obtained from ten initializations, yet the manuscript supplies neither the number of shots per expectation, the explicit form of the learned shot-allocation probabilities, nor the circuit depths used. Without these parameters it is impossible to determine whether the gains arise under ideal simulation or under conditions that would be feasible on current NISQ devices.
- [Method / Equivalence] Equivalence claim (structural equivalence paragraph): the assertion that SBQE is structurally equivalent to a multilayer perceptron whose weights are realized by quantum circuits is presented without an explicit derivation or circuit diagram showing how the mixed-state expectations map onto the perceptron layers. Because the linearity is definitional once the mixed-state representation is adopted, the equivalence risks being circular unless the mapping from quantum observables to classical weights is shown to be non-trivial and hardware-realizable.
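The Jensen-gap objection in the first major comment is easy to exhibit numerically. The simulation below is purely classical and illustrative (the activation, true expectation, and shot counts are our choices, not the paper's): a noisy shot-based estimator passed through a nonlinear σ acquires a bias that shrinks with the shot budget but never vanishes.

```python
import numpy as np

# Numerical illustration of the Jensen gap: for nonlinear sigma,
# E[sigma(estimate)] != sigma(E[estimate]) at any finite shot count.
rng = np.random.default_rng(2)
true_expectation = 0.3               # illustrative exact <Z>
sigma = np.tanh                      # any nonlinear activation

def noisy_estimates(n_shots, trials=200_000):
    # model the shot-noise estimator as the mean of n_shots +/-1 outcomes
    q = (1 + true_expectation) / 2   # P(+1) implied by <Z> = 0.3
    counts = rng.binomial(n_shots, q, size=trials)
    return 2 * counts / n_shots - 1

gap_64 = abs(sigma(noisy_estimates(64)).mean() - sigma(true_expectation))
gap_4096 = abs(sigma(noisy_estimates(4096)).mean() - sigma(true_expectation))
# the bias decreases with the shot budget but is nonzero at any finite N
```

To leading order the gap is |σ″(μ)| · Var/2 with Var ∝ 1/N_shots, which is why an explicit shot-count analysis matters for the hardware-compatibility claim.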
minor comments (3)
- [Abstract] The abstract states that SBQE works 'without any data-encoding gates,' but the implementation protocol section does not clarify how the multiple initial states |ψ_i⟩ are prepared or whether their preparation circuits themselves constitute encoding overhead.
- [Results] Table or figure reporting the ten-initialization statistics should include the exact shot budget per expectation value and the optimizer used for learning the shot-allocation probabilities.
- [Introduction] A brief comparison to other mixed-state or probabilistic encoding schemes (e.g., those using density-matrix simulators) would help situate the novelty of treating shot counts as learnable parameters.
Simulated Author's Rebuttal
We thank the referee for the insightful comments and the opportunity to clarify and strengthen our manuscript. We address each major comment point by point below. We have made revisions to incorporate additional details and analysis as suggested.
read point-by-point responses
-
Referee: [Abstract] Abstract and central claim on linearity: the statement that Tr(ρ O) is linear in the classical probabilities p_i and can therefore be composed with non-linear activations holds exactly only for infinite shots. With finite shots allocated according to the learned distribution, each expectation is replaced by a noisy estimator whose variance scales as 1/N_shots; for any non-linear σ the Jensen gap E[σ(estimate)] ≠ σ(E[estimate]) is nonzero in general. No analysis, bias correction, or shot-count dependence is provided, which directly undermines the hardware-compatibility claim.
Authors: We agree that the exact linearity holds in the infinite-shot limit, and finite shots introduce statistical noise and potential bias when composed with non-linear activations due to the Jensen inequality. The original manuscript focuses on the ideal case to highlight the conceptual advantage. In the revision, we will add a dedicated paragraph discussing the finite-shot approximation, including the scaling of variance and an empirical study of accuracy versus shot count. This will better support the hardware-compatibility claim by specifying the shot budgets required for the reported performance. revision: yes
-
Referee: [Results / Benchmarks] Benchmark section (results paragraph): the reported accuracies (89.1% ± 0.9% on Semeion, 80.95% ± 0.10% on Fashion MNIST) are obtained from ten initializations, yet the manuscript supplies neither the number of shots per expectation, the explicit form of the learned shot-allocation probabilities, nor the circuit depths used. Without these parameters it is impossible to determine whether the gains arise under ideal simulation or under conditions that would be feasible on current NISQ devices.
Authors: The manuscript indeed omits the precise experimental hyperparameters for brevity. We will revise the results section to explicitly state the number of shots per expectation (1024), the parameterization of the shot-allocation probabilities (output of a classical softmax layer), and the circuit depths (typically 4-6 layers for the observables). With these additions, readers can assess the feasibility on NISQ devices, where shot budgets are limited but the absence of encoding gates reduces depth. revision: yes
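The parameterization the authors commit to (softmax probabilities, 1024 shots per expectation) still leaves one practical step unstated: hardware needs integer shot counts. A hedged sketch of one possible discretization, using largest-remainder rounding (our illustrative choice, not specified in the paper):

```python
import numpy as np

# Sketch of the allocation step from the rebuttal: a softmax layer yields
# shot fractions, which are rounded to integer counts under a fixed budget.
# The largest-remainder rounding scheme is our illustrative choice.
def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def allocate_shots(logits, budget=1024):
    p = softmax(logits)
    raw = p * budget
    shots = np.floor(raw).astype(int)
    # hand the leftover shots to the largest fractional remainders
    leftover = budget - shots.sum()
    order = np.argsort(raw - shots)[::-1]
    shots[order[:leftover]] += 1
    return shots

shots = allocate_shots([0.5, -1.2, 2.0, 0.1])   # illustrative logits
```

The rounding error this introduces is at most 1/budget per state, which is one more finite-shot effect a revised manuscript could bound explicitly.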
-
Referee: [Method / Equivalence] Equivalence claim (structural equivalence paragraph): the assertion that SBQE is structurally equivalent to a multilayer perceptron whose weights are realised by quantum circuits is presented without an explicit derivation or circuit diagram showing how the mixed-state expectations map onto the perceptron layers. Because the linearity is definitional once the mixed-state representation is adopted, the equivalence risks being circular unless the mapping from quantum observables to classical weights is shown to be non-trivial and hardware-realizable.
Authors: While the linearity follows from the mixed-state definition, the equivalence to an MLP is non-trivial because the 'features' are obtained from quantum circuit evaluations of different initial states, which can capture quantum advantages in expressivity. We will include an explicit mathematical derivation mapping the SBQE layers to perceptron layers and add a figure showing the quantum circuit for computing the observables. This demonstrates that the quantum components realize the weights in a hardware-efficient manner without data-encoding gates. revision: yes
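The promised MLP mapping can be sketched as follows (our reading of the structure, with illustrative names and dimensions, not the authors' derivation): evaluating each of M observables on each of K initial states yields a fixed matrix W with W[k, j] = ⟨ψ_j|O_k|ψ_j⟩, so an SBQE layer computes σ(W p), i.e., a perceptron layer whose weights are quantum circuit evaluations.

```python
import numpy as np

# Illustrative sketch of the claimed MLP equivalence: W[k, j] = <psi_j|O_k|psi_j>
# acts as a fixed "quantum-realised" weight matrix, and the SBQE layer maps
# shot fractions p to sigma(W @ p).
rng = np.random.default_rng(3)

def random_state(dim, rng):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

K, M, dim = 4, 3, 8
states = [random_state(dim, rng) for _ in range(K)]
observables = [np.diag(rng.uniform(-1.0, 1.0, dim)) for _ in range(M)]

# each entry of W is one circuit evaluation on one initial state
W = np.array([[np.real(np.vdot(s, O @ s)) for s in states]
              for O in observables])

p = np.array([0.1, 0.2, 0.3, 0.4])   # learned shot fractions (sum to 1)
layer_output = np.tanh(W @ p)        # linear-in-p expectations + activation
```

The non-triviality question the referee raises then becomes concrete: whether W, being constrained to quantum expectation values, spans a weight class that a classical layer cannot match at comparable cost.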
Circularity Check
SBQE linearity of expectations is definitional from mixed-state construction
specific steps
-
self definitional
[Abstract]
"By treating the shot counts as a learnable degree of freedom, SBQE produces a mixed-state representation whose expectation values are linear in the classical probabilities and can therefore be composed with non-linear activation functions."
A mixed state is defined as ρ = ∑ p_i |ψ_i⟩⟨ψ_i|, so Tr(ρ O) = ∑ p_i ⟨ψ_i|O|ψ_i⟩ holds identically by the linearity of the trace and the definition of the density operator. The linearity is therefore true by construction for any choice of p_i (including learned shot fractions) and does not constitute a derived property of the encoding scheme. The subsequent claim that this 'therefore' enables composition with non-linear activations is tautological.
full rationale
The paper's central theoretical claim reduces directly to the definition of a mixed state. The asserted structural equivalence to an MLP follows from feeding linear expectations into non-linear activations, which is the standard classical construction. Empirical benchmarks on Fashion MNIST and Semeion provide independent content and are not forced by the definition, but the load-bearing 'enables non-linear activations' step is tautological. No self-citation chains or fitted predictions are involved.
Axiom & Free-Parameter Ledger
free parameters (1)
- shot allocation probabilities
axioms (1)
- domain assumption: Expectation values of the mixed state are linear in the classical shot probabilities
Reference graph
Works this paper leans on
-
[1]
Amplitude Hybrid: baseline with deterministic amplitude embedding followed by variational layers with a Rot-CNOT ladder, identical to the template in [30]
-
[2]
Probabilistic Hybrid: our SBQE variant: an input dense layer produces a simplex-normalised probability vector, which is fed to the same variational circuit as above
-
[3]
Width-Matched Linear: a two-layer classical MLP whose hidden width is computed analytically so that its parameter count is at least that of the quantum hybrids but never exceeds it.
Circuit details. All quantum layers employ the Rot(α, β, γ) gate per qubit followed by linear-chain CNOTs; parameters are initialised from N(0, 0.02). Statevector back-p...
-
[4]
J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature 549, 195 (2017)
2017
-
[5]
M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, et al., Nature Reviews Physics 3, 625 (2021)
2021
-
[6]
A. Melnikov, M. Kordzanganeh, A. Alodjants, and R.-K. Lee, Advances in Physics: X 8, 2165452 (2023)
2023
-
[7]
V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, Nature 567, 209 (2019)
2019
-
[8]
M. Schuld and N. Killoran, Physical Review Letters 122, 040504 (2019)
2019
-
[9]
M. Benedetti, D. Garcia-Pintos, O. Perdomo, V. Leyton-Ortega, Y. Nam, and A. Perdomo-Ortiz, npj Quantum Information 5, 45 (2019)
2019
-
[10]
C. Zoufal, A. Lucchi, and S. Woerner, npj Quantum Information 5, 103 (2019)
2019
-
[11]
A Quantum Approximate Optimization Algorithm
E. Farhi, J. Goldstone, and S. Gutmann, arXiv preprint arXiv:1411.4028 (2014)
2014
-
[12]
D. J. Egger, J. Mareček, and S. Woerner, Quantum 5, 479 (2021)
2021
-
[13]
A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, Nature Communications 5, 4213 (2014)
2014
-
[14]
M. Schuld, R. Sweke, and J. J. Meyer, Physical Review A 103, 032430 (2021)
2021
-
[15]
A. Pérez-Salinas, A. Cervera-Lierta, E. Gil-Fuster, and J. I. Latorre, Quantum 4, 226 (2020)
2020
-
[16]
M. Schuld and F. Petruccione, Supervised Learning with Quantum Computers, Quantum Science and Technology (Springer, 2018)
2018
-
[17]
M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini, Quantum Science and Technology 4, 043001 (2019)
2019
-
[18]
A. Abbas, S. Andersson, A. Asfaw, A. Corcoles, L. Bello, et al., Learn quantum computation using Qiskit, https://qiskit.org/learn (2020), online textbook
2020
-
[19]
K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Physical Review A 98, 032309 (2018)
2018
-
[20]
J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature Communications 9, 4812 (2018)
2018
-
[21]
M. Möttönen, J. J. Vartiainen, V. Bergholm, and M. M. Salomaa, Quantum Information and Computation 5, 467 (2005)
2005
-
[22]
R. Iten, R. Colbeck, I. Kukuljan, J. Home, and M. Christandl, Physical Review A 93, 032318 (2016)
2016
-
[23]
B. Duan and C.-Y. Hsieh, Physical Review A 110, 012616 (2024)
2024
-
[24]
M. Larocca, P. Czarnik, K. Sharma, G. Muraleedharan, P. J. Coles, and M. Cerezo, Quantum 6, 824 (2022)
2022
-
[25]
S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, Nature Communications 12, 6961 (2021)
2021
-
[26]
H.-Y. Huang, M. Broughton, M. Mohseni, R. Babbush, S. Boixo, H. Neven, and J. R. McClean, Nature Communications 12, 2631 (2021)
2021
-
[27]
M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2010)
2010
-
[28]
M. Kordzanganeh, P. Sekatski, L. Fedichkin, and A. Melnikov, Machine Learning: Science and Technology 4, 035036 (2023)
2023
-
[29]
F. Rosenblatt, Psychological Review 65, 386 (1958)
1958
-
[30]
H. Xiao, K. Rasul, and R. Vollgraf, arXiv preprint arXiv:1708.07747 (2017)
2017
-
[31]
Semeion handwritten digit, UCI Machine Learning Repository (1998)
1998
-
[32]
K. Pearson, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2, 559 (1901)
1901
-
[33]
PennyLane: Automatic differentiation of hybrid quantum-classical computations
V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, S. Ahmed,et al., arXiv preprint arXiv:1811.04968 (2022)
2022
-
[34]
M. Lin, arXiv preprint arXiv:2311.18727 (2024)
2024
- [35]
-
[36]
C. Satbhaya, Semeion handwritten digit data set — CNN, https://github.com/ChiragSatbhaya/Semeion-Handwritten-Digit-Data-Set---CNN (2025), GitHub repository, last accessed July 29, 2025
2025
- [37]
-
[38]
A. Senokosov, A. Sedykh, A. Sagingalieva, B. Kyriacou, and A. Melnikov, Machine Learning: Science and Technology 5, 015040 (2024)
2024
-
[39]
Zalando Research, Fashion-MNIST dataset, https://www.kaggle.com/datasets/zalando-research/fashionmnist (2025), accessed July 29, 2025 via Kaggle
2025
-
[40]
V. Kuzmin, W. Somogyi, E. Pankovets, and A. Melnikov, Advanced Quantum Technologies 8, e00603 (2025)
2025
-
[41]
J. Bowles, A. Shahnawaz, and M. Schuld, arXiv preprint arXiv:2403.07059 (2024)
-
[42]
V. Patapovich, M. Periyasamy, M. Kordzanganeh, and A. Melnikov, arXiv preprint arXiv:2506.08749 (2025)