pith. machine review for the scientific record.

arxiv: 2605.13979 · v1 · submitted 2026-05-13 · 🪐 quant-ph · cs.LG · stat.ML

Recognition: no theorem link

Winning Lottery Tickets in Neural Networks via a Quantum-Inspired Classical Algorithm

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 05:50 UTC · model grok-4.3

classification 🪐 quant-ph · cs.LG · stat.ML
keywords lottery ticket hypothesis · neural network pruning · ridgelet transform · quantum-inspired algorithms · sparse subnetworks · dequantization · classical sampling · shallow neural networks

The pith

A classical algorithm samples from a ridgelet-defined distribution to select sparse neural subnetworks in time polynomial in the data dimension D.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a fully classical algorithm for the sampling step in a quantum machine learning method that finds lottery-ticket subnetworks in shallow neural nets. It uses the ridgelet transform to define an optimized probability distribution over hidden nodes and then shows how to draw samples from that distribution efficiently on ordinary computers. The resulting runtime is polynomial in the input dimension D rather than exponential, while numerical tests indicate the selected subnetworks achieve risk comparable to exact optimized sampling and better than uniform random selection. A sympathetic reader would care because the result demonstrates that a quantum-inspired idea can be turned into a practical classical procedure that scales to higher-dimensional data without requiring quantum hardware.
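To make the selection step concrete: hidden nodes are drawn in proportion to a ridgelet-derived weight rather than uniformly. Below is a minimal Python sketch of that structure; `ridgelet_weight` and `select_subnetwork` are illustrative names rather than the paper's API, the tanh profile is a placeholder, and the explicit loop over candidates is precisely the exp(O(D)) enumeration the paper's sampler is designed to avoid.

```python
import numpy as np

def ridgelet_weight(a, b, f, xs, psi=np.tanh):
    """Squared discrete ridgelet-style coefficient of the target f at a
    candidate hidden node (a, b); an illustrative stand-in for the
    optimized weights the paper samples from."""
    coeff = np.mean(f(xs) * psi(xs @ a - b))
    return coeff ** 2

def select_subnetwork(candidates, f, xs, m, rng):
    """Draw m hidden nodes with probability proportional to their
    ridgelet weight. Note: materializing `candidates` and scoring each
    node is the naive exp(O(D)) baseline; the paper's contribution is
    drawing from this distribution without that enumeration."""
    w = np.array([ridgelet_weight(a, b, f, xs) for a, b in candidates])
    idx = rng.choice(len(candidates), size=m, replace=True, p=w / w.sum())
    return [candidates[i] for i in idx]
```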

Core claim

The authors construct and analyze a quantum-inspired classical algorithm that samples hidden nodes from the probability distribution defined by the ridgelet transform. They prove the algorithm runs in O(poly(D)) time and demonstrate through simulations that the selected subnetworks attain empirical risk comparable to exact sampling from the optimized distribution while scaling exponentially better than the naive classical implementation that enumerates all candidate nodes.

What carries the argument

The ridgelet transform, which defines an optimized probability distribution over candidate hidden nodes that admits an efficient classical sampling procedure.
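For orientation, one common convention from the ridgelet literature for the transform and the induced sampling distribution; the notation here is assumed from prior work (e.g. Sonoda–Murata and the quantum ridgelet transform paper), not quoted from this manuscript, which additionally works over the finite domain Z_P rather than R^D:

```latex
% Ridgelet transform of f with respect to a profile \psi, and the
% optimized distribution over candidate hidden nodes (a, b), taken
% proportional to the squared coefficient magnitude.
\[
  (R_\psi f)(a, b) \;=\; \int_{\mathbb{R}^D} f(x)\,\overline{\psi(a \cdot x - b)}\,\mathrm{d}x,
  \qquad
  p(a, b) \;\propto\; \bigl|(R_\psi f)(a, b)\bigr|^{2}.
\]
```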

If this is right

  • Sparse subnetwork selection for shallow networks becomes feasible for data dimensions that are currently prohibitive for naive classical methods.
  • The selected subnetworks achieve empirical risk comparable to exact sampling from the ridgelet-optimized distribution.
  • Runtime scaling improves from exponential to polynomial in D, enabling the approach on conventional hardware.
  • The method supplies a classical baseline that any future quantum algorithm for the same task must beat.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar dequantization techniques may apply to other sampling-based quantum machine learning procedures that rely on ridgelet-style transforms.
  • The sampler could be integrated into existing lottery-ticket pruning pipelines to improve scalability on high-dimensional inputs.
  • If the polynomial scaling holds for deeper architectures, the approach might reduce the need for specialized quantum hardware in network compression tasks.

Load-bearing premise

The ridgelet transform produces a probability distribution over hidden nodes from which samples can be drawn classically with only polynomial dependence on the data dimension D.

What would settle it

An experiment that measures the wall-clock runtime of the sampler as the data dimension D increases, checking whether the observed scaling remains polynomial or reverts to exponential; or a direct comparison of test risk between subnetworks chosen by this sampler and those chosen by the original quantum sampler on the same dataset.
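A minimal harness for the first of those tests, assuming a hypothetical handle `sample_nodes(D)` to the sampler under test; the fit simply asks whether log-runtime grows linearly in log D (polynomial) or linearly in D (exponential).

```python
import time
import numpy as np

def runtime_scaling(sample_nodes, dims, n_repeats=5):
    """Wall-clock sample_nodes(D) across data dimensions and compare a
    polynomial model (log t linear in log D) against an exponential
    model (log t linear in D) by least-squares residual."""
    times = []
    for D in dims:
        start = time.perf_counter()
        for _ in range(n_repeats):
            sample_nodes(D)
        times.append((time.perf_counter() - start) / n_repeats)
    d, logt = np.asarray(dims, dtype=float), np.log(times)
    poly = np.polyfit(np.log(d), logt, 1)   # slope = fitted polynomial degree
    expo = np.polyfit(d, logt, 1)           # slope = fitted exponential rate
    poly_res = np.sum((np.polyval(poly, np.log(d)) - logt) ** 2)
    expo_res = np.sum((np.polyval(expo, d) - logt) ** 2)
    return {"fitted_degree": poly[0], "poly_residual": poly_res,
            "fitted_rate": expo[0], "exp_residual": expo_res}
```

If the sampler is genuinely polynomial, the fitted degree should stabilize and the polynomial residual should stay well below the exponential one as the range of D grows; this is also the fit the referee's second minor comment asks the authors to report for Figure 2.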

Figures

Figures reproduced from arXiv:2605.13979 by Hayata Yamasaki, Mio Murao, Natsuto Isogai, Sho Sonoda.

Figure 1. Empirical risk of sparse subnetworks constructed from …
Figure 2. Runtime comparison between the naive sampling algorithm (blue solid line) and our …
Figure 3. The function f_D to be learned and the activation function g, for dimension D = 1 and P = 7. The underlying experiments use the finite-domain setting over Z_P with P = 7, D ∈ {1, 2, 3, 4, 5, 6}, and M = 50D.
read the original abstract

Quantum machine learning (QML) aims to accelerate machine learning tasks by exploiting quantum computation. Previous work studied a QML algorithm for selecting sparse subnetworks from large shallow neural networks. Instead of directly solving an optimization problem over a large-scale network, this algorithm constructs a sparse subnetwork by sampling hidden nodes from an optimized probability distribution defined using the ridgelet transform. The quantum algorithm performs this sampling in time $O(D)$ in the data dimension $D$, whereas a naive classical implementation relies on handling exponentially many candidate nodes and hence takes $\exp[O(D)]$ time. In this work, we construct and analyze a quantum-inspired fully classical algorithm for the same sampling task. We show that our algorithm runs in time $O(\operatorname{poly}(D))$, thereby removing the exponential dependence on $D$ from the previous classical approach. Numerical simulations show that the proposed sampler achieves empirical risk comparable to exact sampling from the optimized distribution and substantially lower than sampling from the non-optimized uniform distribution, while also exhibiting exponentially improved runtime scaling compared with the conventional classical implementation. These successful dequantization results show that sparse subnetwork selection via optimized sampling can be achieved classically with polynomial data-dimension scaling on conventional computers without quantum hardware, providing an alternative to the existing quantum algorithm.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a quantum-inspired classical algorithm for selecting sparse subnetworks ('winning lottery tickets') from large shallow neural networks. It replaces the quantum sampling step of a prior QML algorithm—which samples hidden nodes from a ridgelet-transform-defined optimized probability distribution in O(D) time—with a fully classical procedure claimed to run in O(poly(D)) time, eliminating the exp(O(D)) cost of naive classical enumeration over candidate nodes. Numerical simulations are reported to match the empirical risk of exact sampling from the optimized distribution while exhibiting improved runtime scaling.

Significance. If the polynomial runtime claim holds with a rigorous complexity analysis, the result strengthens the dequantization literature by exhibiting a classical polynomial-time algorithm for a sampling task previously requiring either quantum hardware or exponential classical time. The work provides concrete evidence that ridgelet-based optimized sampling for lottery-ticket selection can be performed classically without quantum resources, and the reported numerical agreement with exact sampling supports practical relevance for moderate D.

major comments (2)
  1. [§3.2, §4] The central O(poly(D)) runtime claim for the classical sampler is load-bearing for the dequantization result, yet the manuscript provides only a high-level description of the ridgelet-transform sampling procedure without a complete complexity proof. It is not shown that the necessary integrals, weight computations, or matrix constructions can be performed (or approximated to sufficient precision) without incurring hidden exponential dependence on D, network width, or inverse error; a formal runtime analysis bounding all steps by poly(D) is required.
  2. [§4.1] Theorem 1 (or equivalent): the error analysis relating the approximate classical sampler to exact sampling from the ridgelet distribution is incomplete. No explicit bound is given on the total variation distance or on the number of samples needed to achieve a target approximation error as a function of D and precision parameters; without this, the claim that empirical risk matches exact sampling cannot be rigorously tied to the polynomial runtime.
minor comments (2)
  1. [Abstract] The notation exp[O(D)] should be standardized to exp(O(D)) for consistency with the complexity literature.
  2. [§5, Figure 2] The runtime-scaling plots in §5 and the Figure 2 caption would benefit from explicit labels indicating the fitted exponents and the range of D values used, to allow direct verification of the claimed polynomial versus exponential behavior.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the rigor of our dequantization result. We address each major comment below and will revise the manuscript to provide the requested formal analyses.

read point-by-point responses
  1. Referee: [§3.2, §4] The central O(poly(D)) runtime claim for the classical sampler is load-bearing for the dequantization result, yet the manuscript provides only a high-level description of the ridgelet-transform sampling procedure without a complete complexity proof. It is not shown that the necessary integrals, weight computations, or matrix constructions can be performed (or approximated to sufficient precision) without incurring hidden exponential dependence on D, network width, or inverse error; a formal runtime analysis bounding all steps by poly(D) is required.

    Authors: We agree that the current high-level description requires a complete formal complexity proof to rigorously support the O(poly(D)) claim. In the revision we will add a detailed analysis in §3.2 and §4 that explicitly bounds the time for (i) approximating the ridgelet integrals via quadrature or Monte Carlo with poly(D) samples (a minimal illustration of this step is sketched after this list), (ii) constructing the weight matrix and probability distribution, and (iii) performing the sampling step. Under standard Lipschitz and bounded-norm assumptions on the network and data, all steps are shown to be polynomial in D, network width, and inverse precision, with no hidden exponential dependence. revision: yes

  2. Referee: [§4.1] Theorem 1 (or equivalent): the error analysis relating the approximate classical sampler to exact sampling from the ridgelet distribution is incomplete. No explicit bound is given on the total variation distance or on the number of samples needed to achieve a target approximation error as a function of D and precision parameters; without this, the claim that empirical risk matches exact sampling cannot be rigorously tied to the polynomial runtime.

    Authors: We acknowledge the gap in the error analysis. In the revised §4.1 we will derive an explicit bound on the total variation distance between the approximate classical distribution and the exact ridgelet distribution, showing it decays as O(1/poly(D)) under the chosen approximation parameters. We will also state the number of samples required to achieve a target TV distance ε as a function of D and ε, thereby rigorously connecting the polynomial runtime to the observed empirical-risk agreement with exact sampling (an empirical check of this kind is sketched at the end of this list). revision: yes
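On point (i) of the first response: over the paper's finite domain Z_P, a Monte Carlo estimate of a single ridgelet coefficient replaces a P^D-term sum with a number of draws that is independent of D. A minimal sketch under assumed bounded f and ψ; the estimator form and the periodic argument reduction are illustrative, not the authors' construction.

```python
import numpy as np

def mc_ridgelet_coeff(f, psi, a, b, P, n_samples, rng):
    """Monte Carlo estimate of a finite-domain ridgelet-style coefficient,
        R(a, b) = mean over x in Z_P^D of f(x) * conj(psi((a @ x - b) mod P)),
    using uniform draws from Z_P^D instead of the full P**D-term sum.
    For bounded f and psi the standard error is O(1/sqrt(n_samples)),
    with no explicit dependence on the dimension D."""
    D = len(a)
    xs = rng.integers(0, P, size=(n_samples, D))  # uniform samples from Z_P^D
    vals = f(xs) * np.conj(psi((xs @ a - b) % P))
    return vals.mean()
```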
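On the second response: at the small dimensions where the exact distribution is still tractable (the paper's experiments use P = 7 and D up to 6), the promised total variation bound can be checked empirically. A hedged sketch; `sampler` is a hypothetical handle that returns node indices drawn from the approximate classical sampler.

```python
import numpy as np

def empirical_tv(sampler, exact_probs, n_draws, rng):
    """Plug-in estimate of the total variation distance between the
    approximate sampler's output distribution and a known exact
    distribution over a finite set of candidate nodes."""
    counts = np.bincount(sampler(n_draws, rng), minlength=len(exact_probs))
    empirical = counts / n_draws
    return 0.5 * np.abs(empirical - np.asarray(exact_probs)).sum()
```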

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

full rationale

The paper presents a new classical sampling algorithm constructed from the ridgelet transform definition, with an explicit complexity analysis claiming O(poly(D)) runtime that is independent of any fitted parameters or prior outputs. No equations or steps in the provided abstract reduce a prediction to a fitted input by construction, nor does the central claim rely on a self-citation chain that is itself unverified. The derivation is self-contained, resting on a fresh algorithmic reduction rather than re-deriving quantities from the quantum prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the existence of an efficient classical sampler for the ridgelet-defined distribution; no explicit free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: The ridgelet transform produces an optimized probability distribution over hidden nodes that can be sampled classically in polynomial time.
    This assumption is invoked to replace the exponential classical enumeration with the new poly(D) procedure.

pith-pipeline@v0.9.0 · 5539 in / 1174 out tokens · 59750 ms · 2026-05-15T05:50:22.825489+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 3 internal anchors

  1. [1]

     Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding

     Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding, 2016. URL https://arxiv.org/abs/1510.00149

  2. [2]

     SNIP: Single-shot network pruning based on connection sensitivity

     Namhoon Lee, Thalaiyasingam Ajanthan, and Philip Torr. SNIP: Single-shot network pruning based on connection sensitivity. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=B1VZqjAcYX

  3. [3]

     Pruning neural networks without any data by iteratively conserving synaptic flow

     Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, and Surya Ganguli. Pruning neural networks without any data by iteratively conserving synaptic flow. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS '20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546

  4. [4]

     Rigging the lottery: making all tickets winners

     Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, and Erich Elsen. Rigging the lottery: making all tickets winners. In Proceedings of the 37th International Conference on Machine Learning, ICML'20. JMLR.org, 2020

  5. [5]

     The lottery ticket hypothesis: Finding sparse, trainable neural networks

     Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=rJl-b3RcF7

  6. [6]

     A survey of lottery ticket hypothesis

     Bohan Liu, Zijie Zhang, Peixiong He, Zhensen Wang, Yang Xiao, Ruimeng Ye, Yang Zhou, Wei-Shinn Ku, and Bo Hui. A survey of lottery ticket hypothesis, 2024. URL https://arxiv.org/abs/2403.04861

  7. [7]

     Random features for large-scale kernel machines

     Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems, volume 20. Curran Associates, Inc., 2007. URL https://proceedings.neurips.cc/paper_files/paper/2007/file/013a006f03dbc5392effeb8f18fda755-Paper.pdf

  8. [8]

     Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning

     Ali Rahimi and Benjamin Recht. Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems, volume 21. Curran Associates, Inc., 2008. URL https://proceedings.neurips.cc/paper_files/paper/2008/file/0efe32849...

  9. [9]

     On the equivalence between kernel quadrature rules and random feature expansions

     Francis Bach. On the equivalence between kernel quadrature rules and random feature expansions. Journal of Machine Learning Research, 18(21):1–38, 2017. URL http://jmlr.org/papers/v18/15-178.html

  10. [10]

     Neural network with unbounded activation functions is universal approximator

     Sho Sonoda and Noboru Murata. Neural network with unbounded activation functions is universal approximator. Applied and Computational Harmonic Analysis, 43(2):233–268, 2017. ISSN 1063-5203. doi: 10.1016/j.acha.2015.12.005. URL https://www.sciencedirect.com/science/article/pii/S1063520315001748

  12. [12]

     Quantum ridgelet transform: Winning lottery ticket of neural networks with quantum computation

     Hayata Yamasaki, Sathyawageeswar Subramanian, Satoshi Hayakawa, and Sho Sonoda. Quantum ridgelet transform: Winning lottery ticket of neural networks with quantum computation. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Lear...

  13. [13]

     Quantum Machine Learning: What Quantum Computing Means to Data Mining

     Peter Wittek. Quantum Machine Learning: What Quantum Computing Means to Data Mining. Elsevier, 2014. URL https://www.sciencedirect.com/book/9780128009536/quantum-machine-learning

  14. [14]

     Quantum machine learning

     Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature, 549(7671):195–202, Sep 2017. ISSN 1476-4687. doi: 10.1038/nature23474. URL https://doi.org/10.1038/nature23474

  15. [15]

     Quantum algorithm for linear systems of equations

     Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations. Phys. Rev. Lett., 103:150502, Oct 2009. doi: 10.1103/PhysRevLett.103.150502. URL https://link.aps.org/doi/10.1103/PhysRevLett.103.150502

  16. [16]

     Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics

     András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, page 193–204, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450367059. doi: 1...

  17. [17]

     Quantum Computing in the NISQ era and beyond

     John Preskill. Quantum Computing in the NISQ era and beyond. Quantum, 2:79, August 2018. ISSN 2521-327X. doi: 10.22331/q-2018-08-06-79. URL https://doi.org/10.22331/q-2018-08-06-79

  18. [18]

     Rajeev Acharya, Dmitry A. Abanin, Laleh Aghababaie-Beni, Igor Aleiner, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Nikita Astrakhantsev, Juan Atalaya, Ryan Babbush, Dave Bacon, Brian Ballard, Joseph C. Bardin, Johannes Bausch, Andreas Bengtsson, Alexander Bilmes, Sam Blackwell, Sergio Boixo, Gina Bortoli, Alexandre Bourass...

  19. [19]

     Quantum Computation and Quantum Information: 10th Anniversary Edition

     Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information: 10th Anniversary Edition. Cambridge University Press, 10th edition, 2011. ISBN 9781107002173. URL https://www.cambridge.org/highereducation/books/quantum-computation-and-quantum-information/01E10196D0A682A6AEFFEA52D53BE9AE

  20. [20]

     Computational Complexity: A Modern Approach

     Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009. doi: 10.1017/CBO9780511804090

  21. [21]

     A quantum-inspired classical algorithm for recommendation systems

     Ewin Tang. A quantum-inspired classical algorithm for recommendation systems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, page 217–228, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450367059. doi: 10.1145/3313276.3316310. URL https://doi.org/10.1145/3313276.3316310

  22. [22]

     Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning

     Nai-Hui Chia, András Pal Gilyén, Tongyang Li, Han-Hsuan Lin, Ewin Tang, and Chunhao Wang. Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning. J. ACM, 69(5), October 2022. ISSN 0004-5411. doi: 10.1145/3549524. URL https://doi.org/10.1145/3549524

  23. [23]

     Robust dequantization of the quantum singular value transformation and quantum machine learning algorithms

     François Le Gall. Robust dequantization of the quantum singular value transformation and quantum machine learning algorithms. Computational Complexity, 34(1):2, 2025. doi: 10.1007/s00037-024-00262-3. URL https://link.springer.com/article/10.1007/s00037-024-00262-3

  24. [24]

     Universal approximation bounds for superpositions of a sigmoidal function

     A. R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, 1993. doi: 10.1109/18.256500

  25. [25]

     An integral representation of functions using three-layered networks and their approximation bounds

     Noboru Murata. An integral representation of functions using three-layered networks and their approximation bounds. Neural Networks, 9(6):947–956, 1996. ISSN 0893-6080. doi: 10.1016/0893-6080(96)00000-7. URL https://www.sciencedirect.com/science/article/pii/0893608096000007

  26. [26]

     Ridgelets: Theory and Applications

     Emmanuel J. Candès. Ridgelets: Theory and Applications. Ph.D. thesis, Stanford University, 1998. URL https://candes.su.domains/publications/downloads/Thesis.pdf

  28. [28]

     Forrelation: A problem that optimally separates quantum from classical computing

     Scott Aaronson and Andris Ambainis. Forrelation: A problem that optimally separates quantum from classical computing. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC '15, page 307–316, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450335362. doi: 10.1145/2746539.2746547. URL https://doi.org/10...

  29. [29]

     Creating superpositions that correspond to efficiently integrable probability distributions

     Lov Grover and Terry Rudolph. Creating superpositions that correspond to efficiently integrable probability distributions, 2002. URL https://arxiv.org/abs/quant-ph/0208112

  30. [30]

     Quantum Recommendation Systems

     Iordanis Kerenidis and Anupam Prakash. Quantum Recommendation Systems. In Christos H. Papadimitriou, editor, 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), volume 67 of Leibniz International Proceedings in Informatics (LIPIcs), pages 49:1–49:21, Dagstuhl, Germany, 2017. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. ISBN 978-3-...

  31. [31]

     Dequantizing the quantum singular value transformation: hardness and applications to quantum chemistry and the quantum PCP conjecture

     Sevag Gharibian and François Le Gall. Dequantizing the quantum singular value transformation: hardness and applications to quantum chemistry and the quantum PCP conjecture. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, page 19–32, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450392648. ...

  32. [32]

     Variable time amplitude amplification and quantum algorithms for linear algebra problems

     Andris Ambainis. Variable time amplitude amplification and quantum algorithms for linear algebra problems. In Christoph Dürr and Thomas Wilke, editors, 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012), volume 14 of Leibniz International Proceedings in Informatics (LIPIcs), pages 636–647, Dagstuhl, Germany, 2012. Schl...

  33. [33]

     Quantum algorithm for systems of linear equations with exponentially improved dependence on precision

     Andrew M. Childs, Robin Kothari, and Rolando D. Somma. Quantum algorithm for systems of linear equations with exponentially improved dependence on precision. SIAM Journal on Computing, 46(6):1920–1950, 2017. doi: 10.1137/16M1087072. URL https://doi.org/10.1137/16M1087072