Recognition: unknown
Architecture-aware Unitary Synthesis
Pith reviewed 2026-05-08 06:15 UTC · model grok-4.3
The pith
An architecture-aware transpilation method for unitary synthesis reduces CNOT counts by up to 36 percent and speeds up transpilation by up to 553 times on superconducting hardware.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By integrating three techniques into the recursive block-ZXZ decomposition (greedy qubit mapping that minimizes pairwise distances, adaptive Gray-code selection combined with qubit swaps for uniformly controlled Rz gates, and a heuristic that exploits the structure of long-range CNOT ladders), the method reduces CNOT counts by up to 36 percent on the IQM Garnet square lattice and up to 34 percent on the IBM Marrakesh heavy-hex lattice relative to the best competing transpiler, and speeds up transpilation by up to 553 times. It is also the only tested approach that handles circuits beyond 10 qubits within a 30-minute limit.
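The greedy qubit mapping idea can be illustrated with a minimal Python sketch. It assumes a connected coupling graph given as an edge list, uses BFS hop counts as the distance, and tries every physical qubit as a seed. The helpers `grid_edges`, `hop_distances`, and `greedy_region` are hypothetical and are not the paper's implementation, whose seeding and tie-breaking rules are not described here.

```python
from collections import deque
from itertools import combinations


def grid_edges(rows, cols):
    """Edges of a rows x cols square lattice; qubit index = r * cols + c (hypothetical helper)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            q = r * cols + c
            if c + 1 < cols:
                edges.append((q, q + 1))
            if r + 1 < rows:
                edges.append((q, q + cols))
    return edges


def hop_distances(edges, n_phys):
    """All-pairs hop distances via one BFS per physical qubit (graph assumed connected)."""
    adj = {q: [] for q in range(n_phys)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for s in range(n_phys):
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def greedy_region(edges, n_phys, n_logical):
    """Grow a qubit set from each seed, always adding the qubit with the smallest
    summed distance to those already chosen; keep the cheapest set overall."""
    dist = hop_distances(edges, n_phys)
    best_cost, best_set = None, None
    for seed in range(n_phys):
        chosen = [seed]
        while len(chosen) < n_logical:
            free = [q for q in range(n_phys) if q not in chosen]
            # Ties are broken by the lower physical index.
            nxt = min(free, key=lambda q: (sum(dist[c][q] for c in chosen), q))
            chosen.append(nxt)
        cost = sum(dist[a][b] for a, b in combinations(chosen, 2))
        if best_cost is None or cost < best_cost:
            best_cost, best_set = cost, chosen
    return best_cost, best_set


cost, phys = greedy_region(grid_edges(3, 3), n_phys=9, n_logical=4)
print(cost, sorted(phys))  # → 8 [0, 1, 3, 4]
```

Logical qubit i is placed on `phys[i]`. On a 3×3 grid the best seed yields the 2×2 square {0, 1, 3, 4} with a pairwise-distance sum of 8, whereas growing from corner seed 0 ends in the T-shaped set {0, 1, 2, 4} with a sum of 9, which is why the sketch tries every seed.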
What carries the argument
Recursive embedding of architecture-aware choices inside the block-ZXZ decomposition, which lets mapping, Gray-code, and CNOT-ladder decisions occur at every recursion depth rather than in a separate post-processing stage.
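To make the recursion concrete, the sketch below assumes the block-ZXZ form of De Vos and De Baerdemacker [11], in which an n-qubit unitary is written as U = (A1 ⊕ B1) · ½[[I + C, I − C], [I − C, I + C]] · (A2 ⊕ B2) with (n−1)-qubit blocks. It does not compute the decomposition. It assembles the three factors from random blocks and checks numerically that the middle ("X-type") factor equals (H ⊗ I)(I ⊕ C)(H ⊗ I) and that the product is unitary. The function names are illustrative only.

```python
import numpy as np


def haar_unitary(dim, rng):
    """Random unitary from QR of a complex Gaussian matrix, with the phase fix that makes it Haar."""
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))


def z_block(a, b):
    """Block-diagonal ("Z-type") factor A ⊕ B: A acts when the top qubit is 0, B when it is 1."""
    zero = np.zeros_like(a)
    return np.block([[a, zero], [zero, b]])


def x_block(c):
    """Middle ("X-type") factor 1/2 [[I + C, I - C], [I - C, I + C]]."""
    eye = np.eye(c.shape[0])
    return 0.5 * np.block([[eye + c, eye - c], [eye - c, eye + c]])


n = 3
half = 2 ** (n - 1)
rng = np.random.default_rng(0)
a1, b1, c, a2, b2 = (haar_unitary(half, rng) for _ in range(5))

u = z_block(a1, b1) @ x_block(c) @ z_block(a2, b2)

h = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
hadamard_top = np.kron(h, np.eye(half))  # Hadamard on the top (most significant) qubit
print(np.allclose(x_block(c), hadamard_top @ z_block(np.eye(half), c) @ hadamard_top))  # True
print(np.allclose(u.conj().T @ u, np.eye(2 ** n)))  # True
```

Both outer factors are multiplexors controlled by the top qubit. In the standard quantum Shannon construction, each multiplexor is split into two (n−1)-qubit unitaries and a uniformly controlled Rz, and those smaller unitaries are decomposed again; these recursion levels are where the paper makes its mapping and Gray-code decisions.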
If this is right
- Circuits synthesized this way require fewer CNOT gates and are therefore shallower and less error-prone on the target hardware.
- Transpilation times drop enough that exact synthesis of unitaries becomes practical for problems that previously timed out.
- The same three techniques work on both square-lattice and heavy-hex topologies without architecture-specific redesign.
- Beyond 10 qubits, the method is the only one of those tested that finishes within the 30-minute cutoff.
- Simultaneous gains in gate count and speed are achieved rather than trading one for the other.
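The topology-independence of the Gray-code step can be illustrated under a simple, assumed cost model. In a cyclic reflected Gray code over k control bits, bit j (for j < k − 1) flips 2^(k−1−j) times and the top bit flips twice, and each flip becomes one CNOT from the corresponding control to the target. If a CNOT between qubits at hop distance d is charged 1 for d = 1 and 4(d − 1) for a nearest-neighbour ladder otherwise, then assigning the nearest controls to the most frequently flipping bits minimizes the total cost over all bit orderings (rearrangement inequality). Only distances enter, so nothing is specific to the square or heavy-hex lattice. The sketch is hypothetical: the cost model is an assumption, and the paper's adaptive selection also uses qubit swaps, which are not modelled.

```python
from itertools import permutations


def gray(i):
    return i ^ (i >> 1)


def flip_counts(k):
    """How often each control bit flips along the cyclic k-bit reflected Gray code."""
    n = 1 << k
    counts = [0] * k
    for i in range(n):
        changed = gray(i) ^ gray((i + 1) % n)  # exactly one bit differs
        counts[changed.bit_length() - 1] += 1
    return counts


def cnot_cost(d):
    """Assumed cost model: native CNOT if adjacent, else a 4(d - 1)-CNOT ladder."""
    return 1 if d == 1 else 4 * (d - 1)


def total_cost(assignment, counts, dists):
    """assignment maps Gray-code bit -> control qubit index."""
    return sum(counts[bit] * cnot_cost(dists[ctrl]) for bit, ctrl in assignment.items())


def adaptive_assignment(dists):
    """Frequently flipping bits get the controls closest to the target."""
    k = len(dists)
    counts = flip_counts(k)
    bits = sorted(range(k), key=lambda b: -counts[b])
    ctrls = sorted(range(k), key=lambda q: dists[q])
    assignment = dict(zip(bits, ctrls))
    return assignment, total_cost(assignment, counts, dists)


dists = [3, 1, 2]  # hypothetical hop distances of three controls to the target
assignment, cost = adaptive_assignment(dists)
naive = total_cost({b: b for b in range(3)}, flip_counts(3), dists)
brute = min(total_cost(dict(enumerate(p)), flip_counts(3), dists) for p in permutations(range(3)))
print(flip_counts(3), assignment, cost, naive, brute)  # [4, 2, 2] {0: 1, 1: 2, 2: 0} 28 42 28
```

The brute-force minimum over all bit orderings matches the sorted assignment, while the fixed ordering costs 42 under the same model.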
Where Pith is reading between the lines
- The recursive integration pattern could be ported to other decompositions that have a similar tree structure to gain similar efficiency on different gate sets.
- For variational algorithms that call many different unitaries, the cumulative time saving could become the dominant factor in overall runtime.
- Replacing the distance-based greedy mapper with one that also considers measured error rates might produce even lower effective error rates.
- Repeating the benchmarks on unitaries drawn from a wider distribution than the current test set would test whether the reported gains are tied to the specific benchmark instances.
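One hedged reading of the error-aware suggestion: keep the greedy mapper but replace hop counts with shortest-path lengths under edge weights −log(1 − ε), where ε is a two-qubit gate error rate, so that the length of a path is minus the log of the product of its gate fidelities. The error rates below are made up for illustration, and nothing here is taken from the paper.

```python
import heapq
import math


def error_weighted_distances(edge_errors, source):
    """Dijkstra from `source` with edge weight -log(1 - eps), eps = two-qubit gate error."""
    adj = {}
    for (a, b), eps in edge_errors.items():
        w = -math.log(1.0 - eps)
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical error rates: the direct 0-1 coupler is noisy, the detour via qubit 2 is clean.
errors = {(0, 1): 0.20, (0, 2): 0.01, (2, 1): 0.01}
dist = error_weighted_distances(errors, source=0)
print(round(math.exp(-dist[1]), 4))  # 0.9801: the two-hop path beats the single noisy hop
```

Substituting these distances for BFS hop counts in a greedy mapper would make it prefer well-calibrated couplers; whether that lowers effective error rates in practice is the open question raised in the note on error-aware mapping.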
Load-bearing premise
The performance advantages measured on 3-to-11-qubit benchmark instances will continue to hold for arbitrary unitaries and for qubit counts well beyond the tested range without further parameter tuning.
What would settle it
Run the method and the strongest competing transpiler on a random 15-qubit unitary on the IQM Garnet processor and check whether the CNOT count remains at least 20 percent lower and the runtime remains under the 30-minute limit.
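The proposed test reduces to two numeric checks. A minimal sketch, with made-up CNOT counts and runtime (the source reports no 15-qubit numbers): the CNOT reduction is taken relative to the competitor as 1 − ours/best, the same way the "up to 36 percent" figure is stated against the best competing transpiler, and the runtime must stay below 30 × 60 = 1800 seconds.

```python
def cnot_reduction(ours, best):
    """Fractional CNOT reduction relative to the competing transpiler."""
    return 1.0 - ours / best


def settles(ours_cnots, best_cnots, runtime_s, min_reduction=0.20, limit_s=30 * 60):
    """True if the CNOT count is at least 20% lower and the run finished under 30 minutes."""
    return cnot_reduction(ours_cnots, best_cnots) >= min_reduction and runtime_s < limit_s


# Hypothetical numbers for illustration only.
print(round(cnot_reduction(6400, 8200), 3), settles(6400, 8200, runtime_s=900))  # 0.22 True
print(settles(7000, 8200, runtime_s=900))  # False: only about 14.6% fewer CNOTs
```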
Original abstract
We present a novel architecture-aware transpilation method for exact general unitary gate synthesis on superconducting quantum hardware. Our approach is tightly integrated with the optimized block-ZXZ decomposition, exploiting its recursive structure to make hardware-aware decisions at each level of the recursion rather than treating transpilation as an independent post-processing step. The method introduces three key techniques: a greedy qubit mapping strategy that minimizes pairwise distances between physical qubits, an adaptive Gray code selection combined with qubit swapping that optimizes the construction of uniformly controlled Rz gates for the target topology, and a heuristic for reducing CNOT gates by exploiting the structure of long-range CNOT ladders. We benchmark our method against TKet, Qiskit, and Pennylane on the 20-qubit IQM Garnet (square lattice) and the 156-qubit IBM Marrakesh (heavy-hex) architectures with qubit counts ranging from 3 to 11. Our method achieves CNOT count reductions of up to 36 percent on the IQM Garnet and up to 34 percent on the IBM Marrakesh compared to the best competing transpiler, while simultaneously achieving transpilation speedups of up to 553x. Furthermore, our method is the only one capable of transpiling circuits beyond 10 qubits within a 30-minute time limit across both architectures.
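The uniformly controlled Rz gates that the Gray-code selection targets can be sketched along the lines of Möttönen et al. [21]: 2^k Rz rotations on the target interleaved with 2^k CNOTs, where the control of the CNOT after the i-th rotation is the bit that changes between Gray codes g_i and g_(i+1). For control state c the target sees Rz(Σ_i (−1)^(c·g_i) θ_i), so the angles solve a Walsh-type linear system whose matrix M satisfies MᵀM = 2^k I. This sketch uses the plain reflected Gray code and checks the result by simulating the target's 2×2 action for every control state. It does not include the paper's topology-dependent choice of code or its qubit swaps.

```python
import numpy as np


def gray(i):
    return i ^ (i >> 1)


def ucrz_angles(alphas):
    """Solve alpha = M theta, M[c, i] = (-1)^popcount(c & gray(i)); M^T M = 2^k I."""
    alphas = np.asarray(alphas, dtype=float)
    n = len(alphas)
    m = np.array([[(-1) ** bin(c & gray(i)).count("1") for i in range(n)] for c in range(n)])
    return m.T @ alphas / n


def cnot_controls(k):
    """Control bit of the CNOT after the i-th rotation: the bit flipping from gray(i) to gray(i+1)."""
    n = 1 << k
    return [(gray(i) ^ gray((i + 1) % n)).bit_length() - 1 for i in range(n)]


def rz(theta):
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])


def target_action(thetas, controls, c):
    """2x2 operator applied to the target when the controls are in basis state c."""
    x = np.array([[0, 1], [1, 0]], dtype=complex)
    u = np.eye(2, dtype=complex)
    for theta, bit in zip(thetas, controls):
        u = rz(theta) @ u  # rotation on the target
        if (c >> bit) & 1:  # the CNOT flips the target only if its control bit is 1
            u = x @ u
    return u


k = 3
rng = np.random.default_rng(1)
alphas = rng.uniform(-np.pi, np.pi, size=2 ** k)  # desired Rz angle for each control state
thetas = ucrz_angles(alphas)
controls = cnot_controls(k)
print(controls)  # [0, 1, 0, 2, 0, 1, 0, 2]
print(all(np.allclose(target_action(thetas, controls, c), rz(alphas[c])) for c in range(2 ** k)))  # True
```

Which control qubit each bit index refers to is a free choice; permuting that correspondence changes which physical CNOTs are needed, which is the degree of freedom an adaptive, topology-aware Gray-code selection can exploit.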
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a new architecture-aware transpilation technique for synthesizing general unitary gates on superconducting quantum processors. By embedding hardware topology considerations directly into the recursive structure of the block-ZXZ decomposition, the method employs a greedy qubit mapping to minimize distances, an adaptive Gray code with qubit swaps for controlled-Rz gates, and a pruning heuristic for long-range CNOT ladders. On benchmarks with 3 to 11 qubits using random unitaries, it reports CNOT count reductions of up to 36% on the IQM Garnet (square lattice) and 34% on the IBM Marrakesh (heavy-hex) relative to the best of TKet, Qiskit, and Pennylane, along with speedups up to 553 times and the ability to handle circuits larger than 10 qubits within a 30-minute limit where others cannot.
Significance. Should the empirical advantages prove robust under standardized competitor configurations, the work would represent a meaningful advance in quantum circuit compilation by demonstrating that architecture-specific decisions can be profitably integrated into the synthesis recursion rather than applied post hoc. This could lead to lower gate counts and faster compilation times for exact unitary synthesis on fixed-topology hardware, with potential implications for near-term quantum algorithms requiring dense unitaries. The dual-architecture evaluation strengthens the case for topology-aware methods.
major comments (2)
- [Benchmarking experiments (as described in the abstract and experimental section)] The paper does not detail the specific configurations used for the baseline transpilers, such as Qiskit's optimization_level, TKet's synthesis options, or whether Pennylane's methods were run with architecture-aware routing enabled. Given that the proposed method incorporates topology at every recursion level while baselines are general tools, this omission makes it difficult to confirm that the reported 36% CNOT reduction and 553x speedup reflect a fair comparison rather than differences in configuration.
- [Heuristic descriptions (likely §3)] The greedy mapping, adaptive Gray-code selection, and CNOT ladder pruning are presented as heuristics without accompanying analysis of their approximation ratios or performance on worst-case unitaries. Since the benchmarks use only random unitaries up to 11 qubits, it is unclear whether the gains generalize or if they depend on the particular structure of the test instances.
minor comments (2)
- [Abstract] The phrase 'the best competing transpiler' is used for the percentage reductions, but it is not stated whether this is the same competitor for both CNOT count and runtime metrics, or if it varies per architecture.
- [Introduction or methods] A brief reference or equation for the block-ZXZ decomposition would help readers unfamiliar with the prior work on which the recursion is based.
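The long-range CNOT ladders questioned in the report can be made concrete with a standard construction. This is an assumption, since the paper's own ladder form and pruning heuristic are not given in the source. A CNOT from q0 to qd along a path of qubits q0, q1, …, qd is built from two cascades of nearest-neighbour CNOTs plus their partial uncomputations, 4(d − 1) CNOTs in total, with every intermediate qubit returned to its original value. The sketch generates the gate list and checks it classically, since CNOT circuits act as XOR updates on computational basis states.

```python
from itertools import product


def long_range_cnot(d):
    """Nearest-neighbour gate list (control, target) for CNOT q0 -> qd on a path, d >= 2."""
    gates = [(i, i + 1) for i in range(d)]               # q_d ^= q_0 ^ ... ^ q_{d-1}
    gates += [(i, i + 1) for i in range(d - 2, -1, -1)]  # restore q_1 .. q_{d-1}
    gates += [(i, i + 1) for i in range(1, d)]           # q_d ^= q_1 ^ ... ^ q_{d-1}
    gates += [(i, i + 1) for i in range(d - 2, 0, -1)]   # restore q_2 .. q_{d-1}
    return gates


def apply_cnots(gates, bits):
    """Classical simulation: a CNOT XORs the control bit into the target bit."""
    bits = list(bits)
    for ctrl, tgt in gates:
        bits[tgt] ^= bits[ctrl]
    return bits


def is_long_range_cnot(gates, d):
    """Check the gate list against q_d ^= q_0 on every basis state of d + 1 qubits."""
    for bits in product([0, 1], repeat=d + 1):
        expected = list(bits)
        expected[d] ^= expected[0]
        if apply_cnots(gates, bits) != expected:
            return False
    return True


for d in range(2, 6):
    gates = long_range_cnot(d)
    swap_based = 6 * (d - 1) + 1  # SWAP control next to target, CNOT, SWAP back (3 CNOTs per SWAP)
    print(d, len(gates), swap_based, is_long_range_cnot(gates, d))
```

For comparison, the SWAP-based column counts moving the control next to the target with d − 1 SWAPs, applying one CNOT, and swapping back; the ladder needs fewer CNOTs for every d ≥ 2 and leaves the qubit placement unchanged.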
Simulated Author's Rebuttal
We thank the referee for their constructive comments and recommendation for major revision. We address each major comment point by point below, providing clarifications and indicating the changes we will incorporate into the revised manuscript.
Referee: The paper does not detail the specific configurations used for the baseline transpilers, such as Qiskit's optimization_level, TKet's synthesis options, or whether Pennylane's methods were run with architecture-aware routing enabled. Given that the proposed method incorporates topology at every recursion level while baselines are general tools, this omission makes it difficult to confirm that the reported 36% CNOT reduction and 553x speedup reflect a fair comparison rather than differences in configuration.
Authors: We agree that explicit details on baseline configurations are essential for verifying the fairness of the comparisons. In the revised manuscript, we will expand the Experimental Setup section to specify the exact parameters employed: Qiskit was run with optimization_level=3 and the target coupling map provided to enable architecture-aware routing and optimization; TKet used its default unitary synthesis passes combined with explicit architecture mapping to the device topology; and Pennylane's methods were executed with default settings while supplying the hardware connectivity graph for any routing operations. These additions will ensure full reproducibility and demonstrate that the reported improvements arise from our integrated architecture-aware synthesis rather than from differences in baseline setup. revision: yes
Referee: The greedy mapping, adaptive Gray-code selection, and CNOT ladder pruning are presented as heuristics without accompanying analysis of their approximation ratios or performance on worst-case unitaries. Since the benchmarks use only random unitaries up to 11 qubits, it is unclear whether the gains generalize or if they depend on the particular structure of the test instances.
Authors: We acknowledge that the proposed techniques are heuristics without derived approximation ratios, as a full theoretical analysis of their worst-case performance would constitute a separate and substantial theoretical contribution beyond the empirical focus of this work. Random unitaries constitute the standard benchmark for general unitary synthesis precisely because they are dense and lack special structure that could be exploited by specialized methods. Our evaluation across 100 independent random instances for each qubit count from 3 to 11 shows consistent CNOT reductions and speedups, indicating that the observed advantages are not artifacts of particular test cases. In the revision, we will add a dedicated paragraph in the Discussion section that explicitly characterizes the methods as heuristics, notes the lack of approximation guarantees, and recommends future evaluation on structured unitaries arising from concrete quantum algorithms to further assess generalization. revision: partial
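The rebuttal's benchmark set (100 independent random unitaries per qubit count from 3 to 11) can be reproduced in spirit with Haar-random unitaries. The sketch assumes Haar sampling via the QR decomposition of a complex Gaussian matrix with a diagonal-phase correction; the authors' actual sampler and seeds are not stated in the source, and the seed scheme in `benchmark_suite` is hypothetical. It also illustrates why such instances are dense: almost surely every matrix entry is nonzero.

```python
import numpy as np


def haar_unitary(n_qubits, seed=None):
    """Haar-random 2^n x 2^n unitary: QR of a complex Gaussian matrix, then fix column phases."""
    dim = 2 ** n_qubits
    rng = np.random.default_rng(seed)
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # without this phase fix the distribution is not Haar


def benchmark_suite(qubit_counts=range(3, 12), instances=100, base_seed=0):
    """Lazily yield (n, index, unitary); one 11-qubit instance is 2048 x 2048 complex128 (64 MiB)."""
    for n in qubit_counts:
        for idx in range(instances):
            yield n, idx, haar_unitary(n, seed=base_seed + 1000 * n + idx)


u = haar_unitary(3, seed=7)
print(u.shape, np.allclose(u.conj().T @ u, np.eye(8)), np.count_nonzero(np.abs(u) > 1e-12))
```

Generating instances lazily matters at the top of the range: 100 eleven-qubit unitaries would occupy about 6.25 GiB if materialized at once.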
Circularity Check
No circularity: performance claims rest on external benchmarks, not internal self-definition or fitted predictions.
full rationale
The paper describes a new recursive block-ZXZ-based transpilation algorithm incorporating greedy mapping, adaptive Gray-code swapping, and long-range CNOT heuristics, then reports empirical CNOT-count and runtime results on 3-11 qubit random unitaries for two specific hardware topologies. These results are obtained by direct comparison against independently implemented external tools (TKet, Qiskit, Pennylane) rather than by fitting parameters to the target metrics or re-deriving the same quantities from the method's own outputs. No equations, uniqueness theorems, or ansatzes are shown to reduce to self-citations or to the benchmark data itself; the derivation chain therefore remains self-contained against external reference implementations.
Reference graph
Works this paper leans on
-
[1]
Leonid Abdurakhimov et al. Technology and performance benchmarks of IQM's 20-qubit quantum computer, 2024. URL: https://arxiv.org/abs/2408.12433, arXiv:2408.12433
-
[2]
Matthew Amy, Parsiad Azimzadeh, and Michele Mosca. On the controlled-NOT complexity of controlled-NOT–phase circuits. Quantum Science and Technology, 4(1):015002, September 2018. URL: http://dx.doi.org/10.1088/2058-9565/aad8ca, doi:10.1088/2058-9565/aad8ca
-
[3]
Frank Arute et al. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, Oct 2019. doi:10.1038/s41586-019-1666-5
-
[4]
Adriano Barenco, Charles H. Bennett, Richard Cleve, David P. DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John A. Smolin, and Harald Weinfurter. Elementary gates for quantum computation. Phys. Rev. A, 52:3457–3467, Nov 1995. URL: https://link.aps.org/doi/10.1103/PhysRevA.52.3457, doi:10.1103/PhysRevA.52.3457
-
[5]
Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, Shahnawaz Ahmed, Vishnu Ajith, M. Sohaib Alam, Guillermo Alonso-Linaje, B. AkashNarayanan, et al. PennyLane: Automatic differentiation of hybrid quantum-classical computations. arXiv preprint arXiv:1811.04968, 2018. arXiv:1811.04968
-
[6]
Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. Quantum amplitude amplification and estimation, 2002. URL: http://dx.doi.org/10.1090/conm/305/05215, doi:10.1090/conm/305/05215
-
[7]
Kexin Cao, Xueyun Cheng, Xinyu Chen, and Zhijin Guan. Pre-processing of quantum circuit mapping based on circuit partitioning and unitary matrix decomposition. In 2024 4th International Conference on Electronics, Circuits and Information Engineering (ECIE), pages 30–34, 2024. doi:10.1109/ECIE61885.2024.10626814
-
[8]
Andrew M. Childs. Universal computation by quantum walk. Physical Review Letters, 102:180501, 2009. doi:10.1103/PhysRevLett.102.180501
-
[9]
Andrew M. Childs, Dmitri Maslov, Yunseong Nam, Neil J. Ross, and Yuan Su. Toward the first quantum simulation with quantum speedup. Proceedings of the National Academy of Sciences, 115(38):9456–9461, 2018. doi:10.1073/pnas.1801723115
-
[10]
Joseph Clark and Himanshu Thapliyal. Peephole optimization for quantum approximate synthesis. In 2024 25th International Symposium on Quality Electronic Design (ISQED), pages 1–8, 2024. doi:10.1109/ISQED60706.2024.10528701
-
[11]
A. De Vos and S. De Baerdemacker. Block-ZXZ synthesis of an arbitrary quantum circuit. Phys. Rev. A, 94:052317, Nov 2016. URL: https://link.aps.org/doi/10.1103/PhysRevA.94.052317, doi:10.1103/PhysRevA.94.052317
-
[12]
Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm. arXiv preprint arXiv:1411.4028, 2014. arXiv:1411.4028
-
[13]
András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), pages 193–204, 2019. doi:10.1145/3313276.3316366
-
[14]
Daniel Gottesman. Stabilizer codes and quantum error correction. PhD thesis, California Institute of Technology, 1997. doi:10.7907/rzr7-dt72
-
[15]
Lov K. Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC), pages 212–219, 1996. doi:10.1145/237814.237866
-
[16]
Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations. Physical Review Letters, 103:150502, 2009. doi:10.1103/PhysRevLett.103.150502
- [17]
-
[18]
Anna M. Krol and Zaid Al-Ars. Beyond quantum Shannon decomposition: Circuit construction for n-qubit gates based on block-ZXZ decomposition. Phys. Rev. Appl., 22:034019, Sep 2024. URL: https://link.aps.org/doi/10.1103/PhysRevApplied.22.034019, doi:10.1103/PhysRevApplied.22.034019
-
[19]
Seth Lloyd. Universal quantum simulators. Science, 273(5278):1073–1078, 1996. doi:10.1126/science.273.5278.1073
-
[20]
Guang Hao Low and Isaac L. Chuang. Hamiltonian simulation by qubitization. Quantum, 3:163, 2019. doi:10.22331/q-2019-07-12-163
-
[21]
Mikko Möttönen, Juha J. Vartiainen, Ville Bergholm, and Martti M. Salomaa. Quantum circuits for general multiqubit gates. Phys. Rev. Lett., 93:130502, Sep 2004. URL: https://link.aps.org/doi/10.1103/PhysRevLett.93.130502, doi:10.1103/PhysRevLett.93.130502
-
[22]
Thomas E. O’Brien, Brian Tarasinski, and Barbara M. Terhal. Quantum phase estimation of multiple eigenvalues for small-scale (noisy) experiments. New Journal of Physics, 21(2):023022, February 2019. URL: http://dx.doi.org/10.1088/1367-2630/aafb8e, doi:10.1088/1367-2630/aafb8e
-
[24]
Jennifer Paykin, Albert T. Schmitz, Mohannad Ibrahim, Xin-Chuan Wu, and A. Y. Matsuura. PCOAST: A Pauli-based quantum circuit optimization framework. In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), volume 01, pages 715–726, 2023. doi:10.1109/QCE57702.2023.00087
-
[25]
Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, and Jeremy L. O’Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5:4213, 2014. doi:10.1038/ncomms5213
-
[26]
Martin Plesch and Časlav Brukner. Quantum-state preparation with universal gate decompositions. Phys. Rev. A, 83:032302, Mar 2011. URL: https://link.aps.org/doi/10.1103/PhysRevA.83.032302, doi:10.1103/PhysRevA.83.032302
-
[27]
Qiskit contributors. Qiskit: An open-source framework for quantum computing. Zenodo, 2023. doi:10.5281/zenodo.2573505
-
[28]
Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. An introduction to quantum machine learning. Contemporary Physics, 56(2):172–185, 2015. doi:10.1080/00107514.2014.964942
-
[29]
Vivek V. Shende, Igor L. Markov, and Stephen S. Bullock. Minimal universal two-qubit controlled-NOT-based circuits. Phys. Rev. A, 69:062321, Jun 2004. URL: https://link.aps.org/doi/10.1103/PhysRevA.69.062321, doi:10.1103/PhysRevA.69.062321
-
[30]
V. V. Shende, S. S. Bullock, and I. L. Markov. Synthesis of quantum-logic circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(6):1000–1010, 2006. doi:10.1109/TCAD.2005.855930
-
[31]
Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington, and Ross Duncan. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology, 6(1):014003, 2021. doi:10.1088/2058-9565/ab8e92
-
[32]
Mathias Weiden, Justin Kalloor, John Kubiatowicz, Ed Younis, and Costin Iancu. Wide quantum circuit optimization with topology aware synthesis. In 2022 IEEE/ACM Third International Workshop on Quantum Computing Software (QCS), pages 1–11, 2022. doi:10.1109/QCS56647.2022.00006
-
[33]
Ed Younis, Costin C. Iancu, Wim Lavrijsen, Marc Davis, and Ethan Smith. Berkeley quantum synthesis toolkit (BQSKit) v1. [Computer Software] https://doi.org/10.11578/dc.20210603.2, Apr 2021. doi:10.11578/dc.20210603.2