AlphaCNOT: Learning CNOT Minimization with Model-Based Planning
Pith reviewed 2026-05-10 13:06 UTC · model grok-4.3
The pith
A model-based reinforcement learning approach using Monte Carlo Tree Search minimizes the number of CNOT gates in quantum circuits more effectively than existing methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AlphaCNOT is a model-based RL framework that uses Monte Carlo Tree Search with a learned model to plan sequences of CNOT gates. By evaluating future trajectories, it finds shorter circuits than non-planning RL or heuristic methods. On linear reversible synthesis it achieves up to 32% reduction in gate count compared to the PMH baseline, and on constrained topologies with up to 8 qubits it consistently reduces gates relative to state-of-the-art RL solutions.
What carries the argument
The Monte Carlo Tree Search planner guided by a learned model of CNOT effects within a reinforcement learning setup for discovering minimal gate sequences.
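The state model behind this planner can be made concrete. A CNOT-only circuit implements an invertible matrix over GF(2); applying CNOT(control, target) XORs the control row into the target row, and synthesis means driving a target matrix to the identity in as few moves as possible. The sketch below (illustrative names, not the authors' code) implements this exact transition function, with a one-step greedy lookahead standing in for the paper's full MCTS:

```python
import numpy as np

def apply_cnot(state, control, target):
    """CNOT(control, target) XORs the control row into the target row over GF(2)."""
    new = state.copy()
    new[target] ^= new[control]
    return new

def cost(state):
    """Distance to the identity: number of entries that differ from I."""
    n = state.shape[0]
    return int(np.sum(state ^ np.eye(n, dtype=state.dtype)))

def greedy_synthesize(state, max_steps=64):
    """One-step lookahead: repeatedly pick the CNOT whose successor is closest to I."""
    n = state.shape[0]
    gates = []
    for _ in range(max_steps):
        if cost(state) == 0:
            break
        candidates = [(cost(apply_cnot(state, c, t)), c, t)
                      for c in range(n) for t in range(n) if c != t]
        best_cost, c, t = min(candidates)
        if best_cost >= cost(state):
            break  # greedy stall; the full MCTS searches deeper instead of giving up
        state = apply_cnot(state, c, t)
        gates.append((c, t))
    return gates
```

Because the transition is exact and deterministic, a learned model in this setting can be validated cheaply against the true dynamics; the full method replaces the one-step greedy choice with deep MCTS rollouts.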
If this is right
- Quantum circuits can be compiled with fewer CNOT operations, lowering the overall error rate on noisy hardware.
- The approach works for both unconstrained synthesis and cases where only certain qubit pairs can interact due to hardware topology.
- Similar model-based planning can be applied to minimize other sets of gates, such as in Clifford circuit optimization.
- Smaller gate counts enable deeper algorithms on current quantum devices before errors dominate.
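The last implication can be made concrete with a back-of-envelope calculation that is my own illustration, not a result from the paper: if each CNOT fails independently with probability p, a circuit's chance of surviving error-free is roughly (1 - p)^n, so a 32% gate reduction compounds multiplicatively:

```python
def survival(n_gates, p_error=0.01):
    """Probability that no CNOT fails, assuming independent errors of rate p."""
    return (1 - p_error) ** n_gates

baseline = survival(100)   # 100 CNOTs at a (hypothetical) 1% error rate each
optimized = survival(68)   # the same circuit after a 32% gate reduction
print(f"{baseline:.3f} -> {optimized:.3f}")
```

Under these assumed numbers the success probability rises from roughly 0.37 to roughly 0.50, which is the sense in which fewer gates buy deeper circuits.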
Where Pith is reading between the lines
- Extending the method beyond 8 qubits would require improving search efficiency or model accuracy.
- Integrating AlphaCNOT into standard quantum software libraries could make optimized circuits available to more users.
- Testing the optimized circuits on actual quantum hardware would reveal if the gate reductions translate to measurable improvements in fidelity.
- Combining this with other optimization passes like gate cancellation might yield even larger savings.
Load-bearing premise
The learned model combined with tree search can consistently identify shorter global sequences of CNOTs than non-planning methods across different sizes and qubit connection graphs.
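For the topology-constrained half of this premise, a coupling graph simply filters which CNOTs are legal actions for the planner. A minimal sketch (the helper name and example graph are assumptions for illustration):

```python
def legal_cnots(n_qubits, edges):
    """Enumerate the CNOT(control, target) pairs allowed by a coupling graph."""
    # Assume symmetric coupling: if (c, t) is an edge, (t, c) is also legal.
    allowed = set(edges) | {(t, c) for c, t in edges}
    return [(c, t) for c in range(n_qubits) for t in range(n_qubits)
            if c != t and (c, t) in allowed]

line_4q = [(0, 1), (1, 2), (2, 3)]   # path topology on 4 qubits
print(legal_cnots(4, line_4q))       # 6 directed pairs
# An all-to-all device would instead allow all 4 * 3 = 12 directed pairs.
```

The premise is that the same learned-model-plus-search combination keeps finding shorter sequences whether this action set is complete or restricted by the graph.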
What would settle it
Finding a specific circuit or topology with 8 or fewer qubits where AlphaCNOT produces a CNOT count equal to or higher than the best PMH or RL baseline would disprove the consistent improvement claim.
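A concrete harness for such a falsification test could compare any candidate synthesizer against a plain Gaussian-elimination CNOT counter over GF(2). This is a textbook upper bound that PMH tightens with block partitioning, so it is a sanity baseline, not the paper's benchmark code:

```python
import numpy as np

def gauss_cnot_count(matrix):
    """Reduce an invertible GF(2) matrix to I by row XORs; each XOR is one CNOT."""
    m = matrix.copy() % 2
    n = m.shape[0]
    count = 0
    for col in range(n):
        # Ensure a pivot 1 on the diagonal by XORing a lower row up if needed.
        if m[col, col] == 0:
            pivot = next(r for r in range(col + 1, n) if m[r, col] == 1)
            m[col] ^= m[pivot]
            count += 1
        # Clear every other 1 in this column.
        for row in range(n):
            if row != col and m[row, col] == 1:
                m[row] ^= m[col]
                count += 1
    return count
```

A circuit instance where AlphaCNOT's count meets or exceeds the stronger of this baseline and the RL baselines would be the kind of counterexample the claim rules out.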
Original abstract
Quantum circuit optimization is a central task in Quantum Computing, as current Noisy Intermediate Scale Quantum devices suffer from error propagation that often scales with the number of operations. Among quantum operations, the CNOT gate is of fundamental importance, being the only 2-qubit gate in the universal Clifford+T set. The problem of CNOT gate minimization has been addressed by heuristic algorithms such as the well-known Patel-Markov-Hayes (PMH) for linear reversible synthesis (i.e., CNOT minimization with no topological constraints), and more recently by Reinforcement Learning (RL) based strategies in the more complex case of topology-aware synthesis, where each CNOT can act only on a subset of all qubit pairs. In this work we introduce AlphaCNOT, an RL framework based on Monte Carlo Tree Search (MCTS) that effectively addresses the CNOT minimization problem by modeling it as a planning problem. In contrast to other RL-based solutions, our method is model-based, i.e. it can leverage lookahead search to evaluate future trajectories, thus finding more efficient sequences of CNOTs. Our method achieves a reduction of up to 32% in CNOT gate count compared to the PMH baseline on linear reversible synthesis, while in the constrained setting we report a consistent gate count reduction on a variety of topologies with up to 8 qubits, with respect to state-of-the-art RL-based solutions. Our results suggest that the combination of RL with search-based strategies can be applied to different circuit optimization tasks, such as Clifford minimization, thus fostering the transition toward the "quantum utility" era.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces AlphaCNOT, a model-based reinforcement learning framework that combines a learned dynamics model with Monte Carlo Tree Search (MCTS) lookahead to minimize CNOT gate counts. It addresses both unconstrained linear reversible synthesis (outperforming the PMH heuristic by up to 32%) and topology-constrained synthesis on circuits with up to 8 qubits (showing consistent reductions versus prior RL baselines). Training details, search hyperparameters, and results across multiple random instances and topologies are provided.
Significance. If the reported gains hold, the work demonstrates that model-based planning can discover shorter CNOT sequences than pure heuristics or non-planning RL methods. Explicit provision of training procedures, hyperparameters, and multi-instance experimental comparisons is a strength; the suggestion that the approach may extend to Clifford minimization is a natural implication worth exploring.
minor comments (3)
- [Abstract] The quantitative claims (32% reduction, 'up to 8 qubits') are stated without reference to the number of test instances, circuit sizes, or any statistical measures; a one-sentence summary of the experimental scale would improve readability.
- [Results] Tables or figures reporting CNOT counts should explicitly list the exact qubit numbers and topologies for each row/column so that the 'consistent reduction' claim can be verified at a glance.
- [Methods] The distinction between the learned model and the MCTS planner is clear, but a short paragraph contrasting the state representation used here with those of prior RL baselines would help readers unfamiliar with quantum circuit synthesis.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our manuscript, as well as for recommending minor revision. We are encouraged that the model-based planning aspects of AlphaCNOT, the reported gains over PMH and prior RL baselines, and the experimental details were viewed favorably.
Circularity Check
No significant circularity detected
full rationale
The manuscript describes an empirical RL framework (AlphaCNOT) that combines a learned model with MCTS lookahead for CNOT minimization. All performance claims are quantified via direct experimental comparisons against external baselines (PMH heuristic for unconstrained linear-reversible synthesis and prior RL methods for topology-constrained cases up to 8 qubits). No equations, fitted parameters, or first-principles derivations appear; reported reductions are measured on held-out random instances and topologies rather than self-referential quantities. The method description, training procedure, and search hyperparameters are stated explicitly, allowing independent reproduction and falsification against the cited external references. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling are invoked to support the central result.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The quantum circuit environment is fully observable and can be modeled accurately for lookahead search.
Forward citations
Cited by 1 Pith paper
- Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis: an equivariant RL agent synthesizes near-optimal Clifford circuits up to 30 qubits with lower two-qubit gate counts than Qiskit baselines.
Reference graph
Works this paper leans on
- [1] Daniel S. Abrams and Seth Lloyd. 1999. Quantum Algorithm Providing Exponential Speed Increase for Finding Eigenvalues and Eigenvectors. Physical Review Letters 83 (1999), 5162–5165.
- [2] Matthew Amy, Parsiad Azimzadeh, and Michele Mosca. 2018. On the controlled-NOT complexity of controlled-NOT–phase circuits. Quantum Science and Technology 4 (2018), 015002.
- [3] Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. 1983. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics SMC-13 (1983), 834–846.
- [4] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando G. S. L. Brandao, David A. Buell, et al. 2019. Quantum supremacy using a programmable superconducting processor. Nature 574 (2019), 505–510.
- [5] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning.
- [6] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. 2018. JAX: composable transformations of Python+NumPy programs.
- [7] Colin D. Bruzewicz, John Chiaverini, Robert McConnell, and Jeremy M. Sage. 2019. Trapped-ion quantum computing: Progress and challenges. Applied Physics Reviews 6 (2019).
- [8] Stephen A. Cook. 1971. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing.
- [9] Rémi Coulom. 2007. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Computers and Games: 5th International Conference, CG 2006, Turin, Italy, May 29-31, 2006, Revised Papers.
- [10] Timothée Goubault de Brugière, Marc Baboulin, Benoît Valiron, Simon Martiel, and Cyril Allouche. 2021. Gaussian elimination versus greedy methods for the synthesis of linear reversible circuits. ACM Transactions on Quantum Computing 2 (2021), 1–26.
- [11] DeepMind, Igor Babuschkin, Kate Baumli, Alison Bell, Surya Bhupatiraju, Jake Bruce, Peter Buchlovsky, David Budden, Trevor Cai, Aidan Clark, Ivo Danihelka, Antoine Dedieu, Claudio Fantacci, Jonathan Godwin, Chris Jones, Ross Hemsley, Tom Hennigan, Matteo Hessel, Shaobo Hou, Steven Kapturowski, Thomas Keck, Iurii Kemaev, Michael King, Markus Kunesch, Lena ... 2020. The DeepMind JAX Ecosystem.
- [12]
- [13] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. 2014. A Quantum Approximate Optimization Algorithm. arXiv:1411.4028 [quant-ph]. https://arxiv.org/abs/1411.4028
- [14] Martin Gebser, Roland Kaminski, Benjamin Kaufmann, and Torsten Schaub. 2013. Answer Set Solving in Practice. Springer International Publishing.
- [15] Harper R. Grimsley, Sophia E. Economou, Edwin Barnes, and Nicholas J. Mayhall. 2019. An adaptive variational algorithm for exact molecular simulations on a quantum computer. Nature Communications 10 (2019), 3007.
- [16] Nils Herrmann, Daanish Arya, Marcus W. Doherty, Angus Mingare, Jason C. Pillay, Florian Preis, and Stefan Prestel.
- [17] Quantum utility – definition and assessment of a practical quantum advantage. In 2023 IEEE International Conference on Quantum Software (QSW).
- [18] Minki Hhan, Takashi Yamakawa, and Aaram Yun. 2024. Quantum Complexity for Discrete Logarithms and Related Problems. In Advances in Cryptology – CRYPTO 2024.
- [19] Toshinari Itoko, Rudy Raymond, Takashi Imamichi, and Atsushi Matsuo. 2020. Optimization of quantum circuit mapping using gate transformation and commutation. Integration 70 (2020), 43–50.
- [20] Ali Javadi-Abhari, Matthew Treinish, Kevin Krsulich, Christopher J. Wood, Jake Lishman, Julien Gacon, Simon Martiel, Paul D. Nation, Lev S. Bishop, Andrew W. Cross, Blake R. Johnson, and Jay M. Gambetta. 2024. Quantum computing with Qiskit. arXiv:2405.08810 [quant-ph].
- [21] Jiaqing Jiang, Xiaoming Sun, Shang-Hua Teng, Bujiao Wu, Kewen Wu, and Jialin Zhang. 2020. Optimal Space-Depth Trade-Off of CNOT Circuits in Quantum Logic Synthesis. 213–229.
- [22] Petar Jurcevic, Ali Javadi-Abhari, Lev S. Bishop, Isaac Lauer, Daniela F. Bogorin, Markus Brink, Lauren Capelluto, Oktay Günlük, Toshinari Itoko, Naoki Kanazawa, Abhinav Kandala, George A. Keefe, Kevin Krsulich, William Landers, Eric P. Lewandowski, Douglas T. McClure, Giacomo Nannicini, Adinath Narasgond, Hasan M. Nayfeh, Emily Pritchett, Mary Beth Rothwell, S... 2021.
- [23] Levente Kocsis and Csaba Szepesvári. 2006. Bandit Based Monte-Carlo Planning. In Machine Learning: ECML 2006, 17th European Conference on Machine Learning.
- [24]
- [25]
- [26] Sergey Levine, Peter Pastor, Alex Krizhevsky, Julian Ibarz, and Deirdre Quillen. 2018. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research 37 (2018), 421–436.
- [27]
- [28] Seth Lloyd. 1996. Universal Quantum Simulators. Science 273 (1996), 1073–1078.
- [29] Giulia Meuli, Mathias Soeken, and Giovanni Micheli. 2018. SAT-based CNOT, T Quantum Circuit Synthesis. 10th International Conference, RC 2018, Leicester, UK, September 12-14, 2018, Proceedings. 175–188.
- [30] Kouhei Nakaji, Jonathan Wurtz, Haozhe Huang, Luis Mantilla Calderón, Karthik Panicker, Elica Kyoseva, and Alán Aspuru-Guzik. 2025. Quantum circuits as a game: A reinforcement learning agent for quantum compilation and its application to reconfigurable neutral atom arrays. arXiv:2506.05536 [quant-ph].
- [31] Ketan N. Patel, Igor L. Markov, and John P. Hayes. 2003. Efficient Synthesis of Linear Reversible Circuits. Quantum Information and Computation 8 (2003), 282–294.
- [32] Carla Piazza and Riccardo Romanello. 2023. Synthesis of CNOT minimal quantum circuits with topological constraints through ASP. In Proceedings of the International Workshop on AI for Quantum and Quantum for AI (AIQxQIA 2023) (CEUR Workshop Proceedings).
- [33] Carla Piazza, Riccardo Romanello, and Robert Wille. 2023. An ASP Approach for the Synthesis of CNOT Minimal Quantum Circuits. In Proceedings of the 38th Italian Conference on Computational Logic, 2023 (CEUR Workshop Proceedings).
- [34] John Preskill. 2018. Quantum Computing in the NISQ era and beyond. Quantum 2 (2018), 79.
- [35] Jordi Riu, Jan Nogué, Gerard Vilaplana, Artur Garcia-Saez, and Marta P. Estarellas. 2025. Reinforcement Learning Based Quantum Circuit Optimization via ZX-Calculus. Quantum 9 (2025), 1758.
- [36] Riccardo Romanello, Daniele Lizzio Bosco, Jacopo Cossio, Dusan Sutulovic, Giuseppe Serra, Carla Piazza, and Paolo Burelli. 2025. CNOT Minimal Circuit Synthesis: A Reinforcement Learning Approach. In 2025 IEEE International Conference on Quantum Artificial Intelligence (QAI).
- [37] Francisco J. R. Ruiz, Tuomas Laakkonen, Johannes Bausch, Matej Balog, Mohammadamin Barekatain, Francisco J. H. Heras, Alexander Novikov, Nathan Fitzpatrick, Bernardino Romera-Paredes, John van de Wetering, Alhussein Fawzi, Konstantinos Meichanetzidis, and Pushmeet Kohli. 2025. Quantum Circuit Optimization with AlphaTensor. Nature Machine Intelligence 7 (202...
- [38]
- [39] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. arXiv:1707.06347.
- [40] Peter W. Shor. 1994. Algorithms for quantum computation: discrete logarithms and factoring. In Proceedings 35th Annual Symposium on Foundations of Computer Science.
- [41] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv:1712.01815 (2017).
- [42] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. 2017. Mastering the game of Go without human knowledge. Nature 550 (2017), 354–359.
- [43] Richard S. Sutton, Andrew G. Barto, et al. 1998. Reinforcement Learning: An Introduction. Vol. 1.
- [44] Richard S. Sutton, David McAllester, Satinder Singh, and Yishay Mansour. 1999. Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems 12 (1999).
- [45] Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, and S. P. Zhao. 2024. Quantum Compiling with Reinforcement Learning on a Superconducting Processor. arXiv:2406.12195 [quant-ph].
- [46] Henry Zou, Matthew Treinish, Kevin Hartman, Alexander Ivrii, and Jake Lishman. 2024. LightSABRE: A lightweight and enhanced SABRE algorithm. arXiv:2409.08368 (2024).
discussion (0)