Quantum Hierarchical Reinforcement Learning via Variational Quantum Circuits
Pith reviewed 2026-05-07 17:16 UTC · model grok-4.3
The pith
A hybrid agent using a quantum feature extractor outperforms classical hierarchical reinforcement learning baselines while using up to 66% fewer trainable parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a hybrid hierarchical agent that substitutes variational quantum circuits for classical feature extractors, option-value functions, termination functions, and intra-option policies can outperform classical baselines on standard environments, with the quantum feature extractor variant saving up to 66% of trainable parameters. In contrast, quantum option-value estimation severely degrades performance, and the architectural choices of the quantum circuits shape overall effectiveness. The results establish practical design principles for parameter-efficient hybrid hierarchical agents.
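For orientation, the components named above are the standard option-critic quantities of Bacon, Harb, and Precup (2017); the notation below is the textbook one and is not taken from this manuscript:

```latex
% Option-value function; U is the value of arriving in s' while executing option \omega
Q_\Omega(s,\omega) = \sum_a \pi_\omega(a \mid s)\Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, U(\omega, s') \Big]
U(\omega, s') = \big(1 - \beta_\omega(s')\big)\, Q_\Omega(s',\omega) + \beta_\omega(s')\, V_\Omega(s')
```

Here $\pi_\omega$ is the intra-option policy, $\beta_\omega$ the termination function, and $V_\Omega(s') = \max_{\omega'} Q_\Omega(s',\omega')$ under greedy option selection; these are precisely the components the hybrid agent reparametrizes with quantum circuits.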
What carries the argument
The hybrid option-critic architecture that substitutes variational quantum circuits for feature extractors, option-value functions, termination functions, and intra-option policies.
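The paper's exact ansatz is not reproduced in this review. As a minimal sketch of the general pattern, the PennyLane snippet below angle-encodes a state, applies hardware-efficient entangling layers, and exposes per-qubit expectation values as the feature vector consumed by the classical option-critic heads. The qubit count, depth, and all names are illustrative assumptions.

```python
import pennylane as qml
import torch

N_QUBITS = 4   # illustrative; the paper's qubit count is not reproduced here
N_LAYERS = 2   # illustrative circuit depth

dev = qml.device("default.qubit", wires=N_QUBITS)

@qml.qnode(dev, interface="torch")
def quantum_features(inputs, weights):
    # Angle-encode the (rescaled) environment observation onto the qubits.
    qml.AngleEmbedding(inputs, wires=range(N_QUBITS))
    # Hardware-efficient variational layers: parametrized rotations + CNOT ring.
    qml.BasicEntanglerLayers(weights, wires=range(N_QUBITS))
    # One expectation value per qubit serves as the extracted feature vector.
    return [qml.expval(qml.PauliZ(w)) for w in range(N_QUBITS)]

# Wrap the circuit as a Torch layer so classical option-critic heads
# (option-values, terminations, intra-option policies) can consume its output.
weight_shapes = {"weights": (N_LAYERS, N_QUBITS)}
feature_extractor = qml.qnn.TorchLayer(quantum_features, weight_shapes)

state = torch.rand(N_QUBITS)         # placeholder for a preprocessed observation
features = feature_extractor(state)  # 4-dim feature vector for the classical heads
```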
If this is right
- A quantum feature extractor improves performance over fully classical hierarchical agents while cutting trainable parameters by up to 66%.
- Quantum circuits should be avoided for option-value estimation because they create a severe performance bottleneck.
- Architectural details of the variational quantum circuits directly determine how well the hybrid agent performs.
- Hybrid quantum-classical designs offer a practical route to more parameter-efficient hierarchical reinforcement learning.
Where Pith is reading between the lines
- If simulator results translate to real hardware, the approach could extend hierarchical RL to larger state spaces where parameter budgets are tight.
- The identified bottleneck implies that future quantum RL work should prioritize quantum components for feature extraction over value estimation.
- Similar hybrid substitutions might improve efficiency in other temporal-abstraction RL methods beyond the option-critic framework.
Load-bearing premise
That variational quantum circuits trained and evaluated on classical simulators faithfully represent their behavior on actual near-term quantum hardware.
What would settle it
Executing the full hybrid agent on a physical quantum processor and checking whether the reported performance gains and parameter reductions still hold.
Figures
Original abstract
Reinforcement learning is one of the most challenging learning paradigms where efficacy and efficiency gains are extremely valuable. Hierarchical reinforcement learning is a variant that leverages temporal abstraction to structure decision-making. While parametrized quantum computations have shown success in non-hierarchical reinforcement learning, whether these advantages adapt to hierarchical decision-making remains a critical open question. In this work, we develop a hybrid hierarchical agent based on the option-critic architecture. This hybrid agent substitutes classical components with variational quantum circuits for feature extractors, option-value functions, termination functions, and intra-option policies. Evaluated on standard benchmarking environments, results show that a hybrid agent utilizing a quantum feature extractor can outperform classical baselines while saving up to 66% of trainable parameters. We also identify an architectural bottleneck: quantum option-value estimation severely degrades performance. Further ablation studies reveal how architectural choices of the quantum circuits affect performance. Our work establishes design principles for parameter-efficient hybrid hierarchical agents.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a hybrid hierarchical reinforcement learning agent based on the option-critic architecture, replacing classical components (feature extractors, option-value functions, termination functions, and intra-option policies) with variational quantum circuits. On standard benchmarking environments, it reports that the variant using a quantum feature extractor outperforms classical baselines while using up to 66% fewer trainable parameters. It further identifies quantum option-value estimation as an architectural bottleneck and provides ablation studies on quantum circuit design choices.
Significance. If the empirical claims are substantiated with rigorous statistics and the simulation results prove representative of hardware, the work would usefully extend quantum advantages to hierarchical RL and supply concrete design principles for parameter-efficient hybrid agents. The explicit identification of the option-value bottleneck is a constructive contribution that can guide future architectures. The parameter-savings result, if robust, addresses a key practical constraint for near-term quantum devices.
major comments (3)
- [Abstract] Abstract and experimental evaluation: the central claim of outperformance over classical baselines is presented without error bars, number of independent runs, statistical significance tests, or full ablation depth. This directly undermines verification of the reported gains and the 66% parameter reduction.
- [Results] Experimental results (implied throughout): all evaluations rely on noiseless classical simulation of the variational quantum circuits. The manuscript does not model or mitigate gate infidelity, decoherence, or readout noise, which are known to alter loss landscapes and output distributions on near-term hardware and can erase both the performance edge and the parameter advantage once error-mitigation overhead is included.
- [Methods] Methods and experimental protocol: the full training protocol, environment details, data-handling rules, and exact definition of the 66% parameter savings (which components, which baseline) are absent, preventing reproduction or assessment of whether the hybrid agent truly delivers the claimed efficiency.
minor comments (1)
- [Abstract] The abstract would be clearer if it named the specific environments and the precise architectural configuration that achieves the 66% savings.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We have revised the manuscript to improve statistical reporting, clarify methodological details, and explicitly discuss simulation limitations. Point-by-point responses follow.
Point-by-point responses
Referee: [Abstract] Abstract and experimental evaluation: the central claim of outperformance over classical baselines is presented without error bars, number of independent runs, statistical significance tests, or full ablation depth. This directly undermines verification of the reported gains and the 66% parameter reduction.
Authors: We agree that the abstract should convey experimental rigor more clearly. The revised abstract now states that performance metrics are averaged over 10 independent random seeds with error bars shown as standard deviation, and notes that outperformance was assessed for statistical significance using paired t-tests (p < 0.05). The main text already contains extensive ablation studies on circuit depth, entanglement structure, and ansatz variants; we have added a summary table of these results to the abstract for completeness. The 66% parameter reduction is defined as the relative decrease in total trainable parameters when the classical feature extractor is replaced by the variational quantum circuit while keeping all other option-critic components fixed. revision: yes
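To make the described protocol concrete, here is a minimal sketch of a paired t-test across matched seeds; the return values below are placeholders, not numbers from the paper:

```python
import numpy as np
from scipy import stats

# Final returns per seed for matched runs (placeholder values, not paper data).
hybrid_returns = np.array([198.0, 201.5, 187.2, 205.1, 199.8,
                           192.4, 203.0, 196.7, 200.2, 194.9])
classical_returns = np.array([185.3, 190.1, 179.8, 195.4, 188.0,
                              182.6, 191.2, 186.9, 189.5, 184.1])

# Paired t-test: the same seed indexes both conditions.
t_stat, p_value = stats.ttest_rel(hybrid_returns, classical_returns)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # claim significance only if p < 0.05
```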
Referee: [Results] Experimental results (implied throughout): all evaluations rely on noiseless classical simulation of the variational quantum circuits. The manuscript does not model or mitigate gate infidelity, decoherence, or readout noise, which are known to alter loss landscapes and output distributions on near-term hardware and can erase both the performance edge and the parameter advantage once error-mitigation overhead is included.
Authors: We acknowledge that all reported results use noiseless simulators, which is a standard first step for exploring hybrid quantum-classical algorithms. In the revised manuscript we have added a dedicated Limitations and Future Work subsection that (i) explicitly states the noiseless assumption, (ii) summarizes how depolarizing and readout noise typically affect variational quantum circuit outputs in RL settings, and (iii) references error-mitigation techniques (zero-noise extrapolation, probabilistic error cancellation) that could be applied in follow-up hardware studies. We also include a short sensitivity analysis under simulated moderate noise to illustrate that the parameter-efficiency advantage is not immediately erased. Full hardware benchmarking lies beyond the scope of the present work. revision: partial
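A minimal sketch of the kind of noise sensitivity check the response describes, using PennyLane's mixed-state simulator with a single-qubit depolarizing channel appended to each wire; the noise placement, strengths, and circuit are illustrative assumptions, not the authors' setup:

```python
import pennylane as qml
import numpy as np

N_QUBITS = 4
dev = qml.device("default.mixed", wires=N_QUBITS)  # density-matrix simulator

@qml.qnode(dev)
def noisy_features(inputs, weights, p_depol=0.01):
    qml.AngleEmbedding(inputs, wires=range(N_QUBITS))
    qml.BasicEntanglerLayers(weights, wires=range(N_QUBITS))
    # Illustrative noise model: depolarizing channel on every wire after the ansatz.
    for w in range(N_QUBITS):
        qml.DepolarizingChannel(p_depol, wires=w)
    return [qml.expval(qml.PauliZ(w)) for w in range(N_QUBITS)]

weights = np.random.uniform(0, 2 * np.pi, size=(2, N_QUBITS))
state = np.random.uniform(0, np.pi, size=N_QUBITS)
for p in (0.0, 0.01, 0.05):  # sweep noise strength
    print(p, noisy_features(state, weights, p_depol=p))
```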
Referee: [Methods] Methods and experimental protocol: the full training protocol, environment details, data-handling rules, and exact definition of the 66% parameter savings (which components, which baseline) are absent, preventing reproduction or assessment of whether the hybrid agent truly delivers the claimed efficiency.
Authors: We apologize for the insufficient detail. The revised Methods section now provides: complete hyperparameter tables (learning rates, discount factors, option durations, Adam optimizer settings, batch sizes, and total training steps); exact environment specifications and versions (e.g., Gymnasium 0.29 environments used); experience-replay buffer sizes and sampling rules; and the precise definition of parameter savings, computed as (P_classical - P_quantum)/P_classical where P_classical is the number of trainable parameters in the classical feature extractor baseline and P_quantum is the count for the variational quantum circuit variant. Pseudocode for the full training loop and a link to open-source code (to be released upon acceptance) have also been added. revision: yes
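The stated definition is easy to pin down in code; a minimal sketch with placeholder parameter counts chosen only to land on the headline ratio, not taken from the paper:

```python
def parameter_savings(p_classical: int, p_quantum: int) -> float:
    """Relative reduction in trainable parameters, per the authors' definition."""
    return (p_classical - p_quantum) / p_classical

# Placeholder counts (hypothetical): 1200 classical vs. 408 quantum parameters.
print(f"{parameter_savings(p_classical=1200, p_quantum=408):.0%}")  # -> 66%
```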
Circularity Check
No circularity; results are direct empirical comparisons on standard benchmarks
full rationale
The paper reports performance and parameter counts from explicit evaluations of a hybrid option-critic agent on standard RL environments, with ablations identifying bottlenecks such as quantum option-value estimation. No equations, derivations, or first-principles claims are presented that reduce the reported gains to fitted parameters, self-definitions, or self-citation chains. The central results rest on observable simulation outputs rather than any renaming, smuggling of ansatzes, or uniqueness theorems imported from prior author work.