Recognition: no theorem link
Winning Lottery Tickets in Neural Networks via a Quantum-Inspired Classical Algorithm
Pith reviewed 2026-05-15 05:50 UTC · model grok-4.3
The pith
A classical algorithm samples from a ridgelet-defined distribution to select sparse neural subnetworks in polynomial time in data dimension D.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors construct and analyze a quantum-inspired classical algorithm that samples hidden nodes from the probability distribution defined by the ridgelet transform. They prove the algorithm runs in O(poly(D)) time and demonstrate through simulations that the selected subnetworks attain empirical risk comparable to exact sampling from the optimized distribution while scaling exponentially better than the naive classical implementation that enumerates all candidate nodes.
What carries the argument
The ridgelet transform, which defines an optimized probability distribution over candidate hidden nodes that admits an efficient classical sampling procedure.
If this is right
- Sparse subnetwork selection for shallow networks becomes feasible for data dimensions that are currently prohibitive for naive classical methods.
- The selected subnetworks achieve empirical risk comparable to exact sampling from the ridgelet-optimized distribution.
- Runtime scaling improves from exponential to polynomial in D, enabling the approach on conventional hardware.
- The method supplies a classical baseline that any future quantum algorithm for the same task must beat.
Where Pith is reading between the lines
- Similar dequantization techniques may apply to other sampling-based quantum machine learning procedures that rely on ridgelet-style transforms.
- The sampler could be integrated into existing lottery-ticket pruning pipelines to improve scalability on high-dimensional inputs.
- If the polynomial scaling holds for deeper architectures, the approach might reduce the need for specialized quantum hardware in network compression tasks.
Load-bearing premise
The ridgelet transform produces a probability distribution over hidden nodes from which samples can be drawn classically with only polynomial dependence on the data dimension D.
What would settle it
An experiment that measures wall-clock runtime of the sampler while increasing the data dimension D and checks whether the observed scaling remains polynomial or reverts to exponential, or a direct comparison of test risk between subnetworks chosen by this sampler versus those chosen by the original quantum sampler on the same dataset.
Figures
read the original abstract
Quantum machine learning (QML) aims to accelerate machine learning tasks by exploiting quantum computation. Previous work studied a QML algorithm for selecting sparse subnetworks from large shallow neural networks. Instead of directly solving an optimization problem over a large-scale network, this algorithm constructs a sparse subnetwork by sampling hidden nodes from an optimized probability distribution defined using the ridgelet transform. The quantum algorithm performs this sampling in time $O(D)$ in the data dimension $D$, whereas a naive classical implementation relies on handling exponentially many candidate nodes and hence takes $\exp[O(D)]$ time. In this work, we construct and analyze a quantum-inspired fully classical algorithm for the same sampling task. We show that our algorithm runs in time $O(\operatorname{poly}(D))$, thereby removing the exponential dependence on $D$ from the previous classical approach. Numerical simulations show that the proposed sampler achieves empirical risk comparable to exact sampling from the optimized distribution and substantially lower than sampling from the non-optimized uniform distribution, while also exhibiting exponentially improved runtime scaling compared with the conventional classical implementation. These successful dequantization results show that sparse subnetwork selection via optimized sampling can be achieved classically with polynomial data-dimension scaling on conventional computers without quantum hardware, providing an alternative to the existing quantum algorithm.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a quantum-inspired classical algorithm for selecting sparse subnetworks ('winning lottery tickets') from large shallow neural networks. It replaces the quantum sampling step of a prior QML algorithm—which samples hidden nodes from a ridgelet-transform-defined optimized probability distribution in O(D) time—with a fully classical procedure claimed to run in O(poly(D)) time, eliminating the exp(O(D)) cost of naive classical enumeration over candidate nodes. Numerical simulations are reported to match the empirical risk of exact sampling from the optimized distribution while exhibiting improved runtime scaling.
Significance. If the polynomial runtime claim holds with a rigorous complexity analysis, the result strengthens the dequantization literature by exhibiting a classical polynomial-time algorithm for a sampling task previously requiring either quantum hardware or exponential classical time. The work provides concrete evidence that ridgelet-based optimized sampling for lottery-ticket selection can be performed classically without quantum resources, and the reported numerical agreement with exact sampling supports practical relevance for moderate D.
major comments (2)
- [§3.2, §4] §3.2 and §4: The central O(poly(D)) runtime claim for the classical sampler is load-bearing for the dequantization result, yet the manuscript provides only a high-level description of the ridgelet-transform sampling procedure without a complete complexity proof. It is not shown that the necessary integrals, weight computations, or matrix constructions can be performed (or approximated to sufficient precision) without incurring hidden exponential dependence on D, network width, or inverse error; a formal runtime analysis bounding all steps by poly(D) is required.
- [§4.1] §4.1, Theorem 1 (or equivalent): The error analysis relating the approximate classical sampler to exact sampling from the ridgelet distribution is incomplete. No explicit bound is given on the total variation distance or on the number of samples needed to achieve a target approximation error as a function of D and precision parameters; without this, the claim that empirical risk matches exact sampling cannot be rigorously tied to the polynomial runtime.
minor comments (2)
- [Abstract] Abstract: the notation 'exp[O(D)]' should be standardized to exp(O(D)) for consistency with complexity literature.
- [§5, Figure 2] Figure 2 caption and §5: the reported runtime scaling plots would benefit from explicit labels indicating the fitted exponents and the range of D values used, to allow direct verification of the claimed polynomial versus exponential behavior.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the rigor of our dequantization result. We address each major comment below and will revise the manuscript to provide the requested formal analyses.
read point-by-point responses
-
Referee: [§3.2, §4] §3.2 and §4: The central O(poly(D)) runtime claim for the classical sampler is load-bearing for the dequantization result, yet the manuscript provides only a high-level description of the ridgelet-transform sampling procedure without a complete complexity proof. It is not shown that the necessary integrals, weight computations, or matrix constructions can be performed (or approximated to sufficient precision) without incurring hidden exponential dependence on D, network width, or inverse error; a formal runtime analysis bounding all steps by poly(D) is required.
Authors: We agree that the current high-level description requires a complete formal complexity proof to rigorously support the O(poly(D)) claim. In the revision we will add a detailed analysis in §3.2 and §4 that explicitly bounds the time for (i) approximating the ridgelet integrals via quadrature or Monte Carlo with poly(D) samples, (ii) constructing the weight matrix and probability distribution, and (iii) performing the sampling step. Under standard Lipschitz and bounded-norm assumptions on the network and data, all steps are shown to be polynomial in D, network width, and inverse precision, with no hidden exponential dependence. revision: yes
-
Referee: [§4.1] §4.1, Theorem 1 (or equivalent): The error analysis relating the approximate classical sampler to exact sampling from the ridgelet distribution is incomplete. No explicit bound is given on the total variation distance or on the number of samples needed to achieve a target approximation error as a function of D and precision parameters; without this, the claim that empirical risk matches exact sampling cannot be rigorously tied to the polynomial runtime.
Authors: We acknowledge the gap in the error analysis. In the revised §4.1 we will derive an explicit bound on the total variation distance between the approximate classical distribution and the exact ridgelet distribution, showing it decays as O(1/poly(D)) under the chosen approximation parameters. We will also state the number of samples required to achieve a target TV distance ε as a function of D and ε, thereby rigorously connecting the polynomial runtime to the observed empirical-risk agreement with exact sampling. revision: yes
Circularity Check
No significant circularity detected in the derivation chain
full rationale
The paper presents a new classical sampling algorithm constructed from the ridgelet transform definition, with an explicit complexity analysis claiming O(poly(D)) runtime that is independent of any fitted parameters or prior outputs. No equations or steps in the provided abstract reduce a prediction to a fitted input by construction, nor does the central claim rely on a self-citation chain that is itself unverified. The derivation is self-contained, resting on a fresh algorithmic reduction rather than re-deriving quantities from the quantum prior work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The ridgelet transform produces an optimized probability distribution over hidden nodes that can be sampled classically in polynomial time.
Reference graph
Works this paper leans on
-
[1]
Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, 2016. URL https://arxiv. org/abs/1510.00149
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[2]
SNIP: SINGLE-SHOT NETWORK PRUNING BASED ON CONNECTION SENSITIVITY
Namhoon Lee, Thalaiyasingam Ajanthan, and Philip Torr. SNIP: SINGLE-SHOT NETWORK PRUNING BASED ON CONNECTION SENSITIVITY. InInternational Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=B1VZqjAcYX
work page 2019
-
[3]
Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, and Surya Ganguli. Pruning neural networks without any data by iteratively conserving synaptic flow. InProceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Red Hook, NY , USA, 2020. Curran Associates Inc. ISBN 9781713829546
work page 2020
-
[4]
Rigging the lottery: making all tickets winners
Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, and Erich Elsen. Rigging the lottery: making all tickets winners. InProceedings of the 37th International Conference on Machine Learning, ICML’20. JMLR.org, 2020
work page 2020
-
[5]
The lottery ticket hypothesis: Finding sparse, trainable neural networks
Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. InInternational Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=rJl-b3RcF7
work page 2019
-
[6]
A survey of lottery ticket hypothesis, 2024
Bohan Liu, Zijie Zhang, Peixiong He, Zhensen Wang, Yang Xiao, Ruimeng Ye, Yang Zhou, Wei-Shinn Ku, and Bo Hui. A survey of lottery ticket hypothesis, 2024. URL https://arxiv. org/abs/2403.04861
-
[7]
Random features for large-scale kernel machines
Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In J. Platt, D. Koller, Y . Singer, and S. Roweis, editors,Advances in Neural Information Processing Sys- tems, volume 20. Curran Associates, Inc., 2007. URL https://proceedings.neurips.cc/ paper_files/paper/2007/file/013a006f03dbc5392effeb8f18fda755-Paper.pdf
work page 2007
-
[8]
Weighted sums of random kitchen sinks: Replacing minimiza- tion with randomization in learning
Ali Rahimi and Benjamin Recht. Weighted sums of random kitchen sinks: Replacing minimiza- tion with randomization in learning. In D. Koller, D. Schuurmans, Y . Bengio, and L. Bottou, editors,Advances in Neural Information Processing Systems, volume 21. Curran Associates, Inc., 2008. URL https://proceedings.neurips.cc/paper_files/paper/2008/file/ 0efe32849...
work page 2008
-
[9]
Francis Bach. On the equivalence between kernel quadrature rules and random feature expansions.Journal of Machine Learning Research, 18(21):1–38, 2017. URL http: //jmlr.org/papers/v18/15-178.html
work page 2017
-
[10]
Sho Sonoda and Noboru Murata. Neural network with unbounded activation functions is universal approximator.Applied and Computational Harmonic Analysis, 43(2):233–268,
-
[11]
doi: https://doi.org/10.1016/j.acha.2015.12.005
ISSN 1063-5203. doi: https://doi.org/10.1016/j.acha.2015.12.005. URL https: //www.sciencedirect.com/science/article/pii/S1063520315001748
-
[12]
Quan- tum ridgelet transform: Winning lottery ticket of neural networks with quantum computation
Hayata Yamasaki, Sathyawageeswar Subramanian, Satoshi Hayakawa, and Sho Sonoda. Quan- tum ridgelet transform: Winning lottery ticket of neural networks with quantum computation. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors,Proceedings of the 40th International Conference on Machine Lear...
work page 2023
-
[13]
Peter Wittek.Quantum Machine Learning: What Quantum Computing Means to Data Min- ing. Elsevier, 2014. URL https://www.sciencedirect.com/book/9780128009536/ quantum-machine-learning
-
[14]
Quantum machine learning.Nature, 549(7671):195–202, Sep 2017
Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning.Nature, 549(7671):195–202, Sep 2017. ISSN 1476-4687. doi: 10.1038/nature23474. URLhttps://doi.org/10.1038/nature23474. 11
-
[15]
Harrow, Avinatan Hassidim, and Seth Lloyd
Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations.Phys. Rev. Lett., 103:150502, Oct 2009. doi: 10.1103/PhysRevLett.103.150502. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.103.150502
-
[16]
András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, page 193–204, New York, NY , USA, 2019. Association for Computing Machinery. ISBN 9781450367059. doi: 1...
-
[17]
Quantum Computing in the NISQ era and beyond.Quantum, 2:79, August 2018
John Preskill. Quantum Computing in the NISQ era and beyond.Quantum, 2:79, August 2018. ISSN 2521-327X. doi: 10.22331/q-2018-08-06-79. URL https://doi.org/10.22331/ q-2018-08-06-79
work page internal anchor Pith review doi:10.22331/q-2018-08-06-79 2018
-
[18]
Abanin, Laleh Aghababaie-Beni, Igor Aleiner, Trond I
Rajeev Acharya, Dmitry A. Abanin, Laleh Aghababaie-Beni, Igor Aleiner, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Nikita Astrakhantsev, Juan Atalaya, Ryan Babbush, Dave Bacon, Brian Ballard, Joseph C. Bardin, Johannes Bausch, An- dreas Bengtsson, Alexander Bilmes, Sam Blackwell, Sergio Boixo, Gina Bortoli, Alexandre Bourass...
-
[19]
Michael A. Nielsen and Isaac L. Chuang.Quantum Computation and Quan- tum Information: 10th Anniversary Edition. Cambridge University Press, 10th edition, 2011. ISBN 9781107002173. URL https://www.cambridge.org/ highereducation/books/quantum-computation-and-quantum-information/ 01E10196D0A682A6AEFFEA52D53BE9AE
work page 2011
-
[20]
Cambridge University Press, 2009
Sanjeev Arora and Boaz Barak.Computational complexity: A Modern Approach. Cambridge University Press, 2009. doi: 10.1017/CBO9780511804090
-
[21]
A quantum-inspired classical algorithm for recommendation systems
Ewin Tang. A quantum-inspired classical algorithm for recommendation systems. InPro- ceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, page 217–228, New York, NY , USA, 2019. Association for Computing Machinery. ISBN 9781450367059. doi: 10.1145/3313276.3316310. URL https://doi.org/10.1145/ 3313276.3316310
-
[22]
Nai-Hui Chia, András Pal Gilyén, Tongyang Li, Han-Hsuan Lin, Ewin Tang, and Chunhao Wang. Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning.J. ACM, 69(5), October 2022. ISSN 0004-5411. doi: 10.1145/3549524. URLhttps://doi.org/10.1145/3549524
-
[23]
François Le Gall. Robust dequantization of the quantum singular value transformation and quantum machine learning algorithms.computational complexity, 34(1):2, 2025. doi: 10.1007/s00037-024-00262-3. URL https://link.springer.com/article/10.1007/ s00037-024-00262-3
-
[24]
A.R. Barron. Universal approximation bounds for superpositions of a sigmoidal function.IEEE Transactions on Information Theory, 39(3):930–945, 1993. doi: 10.1109/18.256500
-
[25]
Noboru Murata. An integral representation of functions using three-layered networks and their approximation bounds.Neural Networks, 9(6):947–956, 1996. ISSN 0893-6080. doi: https://doi.org/10.1016/0893-6080(96)00000-7. URL https://www.sciencedirect.com/ science/article/pii/0893608096000007
-
[26]
Candès.Ridgelets: Theory and Applications
Emmanuel J. Candès.Ridgelets: Theory and Applications. Ph.d. thesis, Stanford University,
-
[27]
URLhttps://candes.su.domains/publications/downloads/Thesis.pdf
-
[28]
Forrelation: A problem that optimally separates quantum from classical computing
Scott Aaronson and Andris Ambainis. Forrelation: A problem that optimally separates quantum from classical computing. InProceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’15, page 307–316, New York, NY , USA, 2015. Association for Computing Machinery. ISBN 9781450335362. doi: 10.1145/2746539.2746547. URL https://doi.org/10...
-
[29]
Creating superpositions that correspond to efficiently integrable probability distributions
Lov Grover and Terry Rudolph. Creating superpositions that correspond to efficiently integrable probability distributions, 2002. URLhttps://arxiv.org/abs/quant-ph/0208112
work page internal anchor Pith review Pith/arXiv arXiv 2002
-
[30]
Quantum Recommendation Systems
Iordanis Kerenidis and Anupam Prakash. Quantum Recommendation Systems. In Christos H. Papadimitriou, editor,8th Innovations in Theoretical Computer Science Conference (ITCS 2017), volume 67 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 49:1– 49:21, Dagstuhl, Germany, 2017. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. ISBN 978-3-...
-
[31]
Sevag Gharibian and François Le Gall. Dequantizing the quantum singular value transformation: hardness and applications to quantum chemistry and the quantum pcp conjecture. InProceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, page 19–32, New York, NY , USA, 2022. Association for Computing Machinery. ISBN 9781450392648. ...
-
[32]
Variable time amplitude amplification and quantum algorithms for lin- ear algebra problems
Andris Ambainis. Variable time amplitude amplification and quantum algorithms for lin- ear algebra problems. In Christoph Dürr and Thomas Wilke, editors,29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012), volume 14 of Leibniz International Proceedings in Informatics (LIPIcs), pages 636–647, Dagstuhl, Ger- many, 2012. Schl...
-
[33]
Childs, Robin Kothari, and Rolando D
Andrew M. Childs, Robin Kothari, and Rolando D. Somma. Quantum algorithm for systems of linear equations with exponentially improved dependence on precision.SIAM Journal on Computing, 46(6):1920–1950, 2017. doi: 10.1137/16M1087072. URL https://doi.org/ 10.1137/16M1087072. 14 A Preliminaries A.1 Notations and general definitions Let N be the set of natural...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.