pith. machine review for the scientific record.

arxiv: 2604.23777 · v2 · submitted 2026-04-26 · 🪐 quant-ph

Recognition: unknown

Architecture-aware Unitary Synthesis


Pith reviewed 2026-05-08 06:15 UTC · model grok-4.3

classification 🪐 quant-ph
keywords quantum transpilation · unitary synthesis · CNOT optimization · superconducting qubits · architecture-aware mapping · block-ZXZ decomposition · Gray code selection

The pith

An architecture-aware transpilation method for unitary synthesis reduces CNOT counts by up to 36 percent and speeds up synthesis by up to 553 times on superconducting hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that folds hardware topology knowledge directly into the recursive block-ZXZ decomposition instead of applying it afterward. At each recursion level it chooses qubit mappings, Gray codes, and CNOT placements to match the target processor's connectivity. This produces both fewer two-qubit gates and dramatically shorter runtimes than treating decomposition and transpilation as separate steps. A sympathetic reader cares because exact unitary synthesis is a basic building block for quantum algorithms, and these gains make it feasible on present-day devices with limited coherence.

Core claim

The method integrates three techniques inside the recursive block-ZXZ decomposition: greedy qubit mapping that minimizes pairwise distances, adaptive Gray code selection combined with swaps for uniformly controlled Rz gates, and a heuristic that exploits the structure of long-range CNOT ladders. Relative to the best competing transpiler, it achieves up to 36 percent CNOT reduction on the IQM Garnet square lattice and up to 34 percent on the IBM Marrakesh heavy-hex lattice, with transpilation speedups of up to 553 times. It is also the only benchmarked approach that transpiles circuits beyond 10 qubits within the 30-minute limit.
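The greedy mapping idea in the claim above can be illustrated with a minimal Python sketch. It assumes the coupling graph is an adjacency dict and grows a set of physical qubits by repeatedly adding the candidate with the smallest summed hop distance to the qubits already chosen. The names `bfs_distances`, `greedy_mapping`, and `grid_graph`, the seed rule, and the 3x3 grid are all hypothetical illustrations; the paper's actual mapping procedure and the real device coupling maps may differ.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def greedy_mapping(adj, n_logical):
    """Greedily pick n_logical physical qubits with small pairwise distances."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    # Seed rule (assumption): start from the most central qubit.
    seed = min(sorted(adj), key=lambda u: sum(dist[u].values()))
    chosen = [seed]
    while len(chosen) < n_logical:
        candidates = [u for u in sorted(adj) if u not in chosen]
        # Add the qubit that increases the pairwise distance sum the least.
        best = min(candidates, key=lambda u: sum(dist[u][c] for c in chosen))
        chosen.append(best)
    return chosen

def grid_graph(rows, cols):
    """Toy square lattice (a stand-in, not an actual device coupling map)."""
    adj = {(r, c): [] for r in range(rows) for c in range(cols)}
    for r, c in adj:
        for dr, dc in ((0, 1), (1, 0)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].append(nb)
                adj[nb].append((r, c))
    return adj

adj = grid_graph(3, 3)
chosen = greedy_mapping(adj, 3)
```

On the toy grid the selection starts at the central qubit and stays within a compact neighbourhood. A greedy rule like this gives no optimality guarantee, which matches the review's description of the technique as a heuristic.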

What carries the argument

Recursive embedding of architecture-aware choices inside the block-ZXZ decomposition, which lets mapping, Gray-code, and CNOT-ladder decisions occur at every recursion depth rather than in a separate post-processing stage.
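To make the Gray-code decision concrete, here is a small Python sketch under stated assumptions. In the standard construction of a uniformly controlled rotation, the control of each CNOT is the bit that flips between consecutive Gray code words, so relabelling the bits changes which control qubit is used most often. The cost model below (flip count times hop distance to the target) and the function names are assumptions for illustration only, not the paper's selection rule.

```python
from itertools import permutations

def brgc(k):
    """Binary reflected Gray code on k bits, as a list of integers."""
    return [i ^ (i >> 1) for i in range(2 ** k)]

def flip_sequence(code):
    """Index of the bit that changes between cyclically consecutive words."""
    n = len(code)
    return [(code[i] ^ code[(i + 1) % n]).bit_length() - 1 for i in range(n)]

def best_bit_assignment(k, control_distances):
    """Assign Gray-code bits to controls so frequent flips use near controls.

    control_distances[j] is the hop distance from control j to the target
    (hypothetical input). Returns (assignment, cost), where assignment[bit]
    is the control index that plays the role of that Gray-code bit.
    """
    flips = flip_sequence(brgc(k))
    counts = [flips.count(b) for b in range(k)]
    best = None
    for perm in permutations(range(k)):
        cost = sum(counts[b] * control_distances[perm[b]] for b in range(k))
        if best is None or cost < best[1]:
            best = (perm, cost)
    return best

assignment, cost = best_bit_assignment(3, [3, 1, 2])
```

With three controls at hop distances 3, 1, and 2, the sketch gives the most frequently flipping bit to the nearest control. Exhaustive search over permutations is only viable for small k and is used here for clarity.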

If this is right

  • Circuits synthesized this way require fewer CNOT gates and are therefore shallower and less error-prone on the target hardware.
  • Transpilation times drop enough that exact synthesis of unitaries becomes practical for problems that previously timed out.
  • The same three techniques work on both square-lattice and heavy-hex topologies without architecture-specific redesign.
  • Beyond 10 qubits, the method is the only one of those benchmarked that finishes inside the 30-minute cutoff.
  • Simultaneous gains in gate count and speed are achieved rather than trading one for the other.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The recursive integration pattern could be ported to other decompositions that have a similar tree structure to gain similar efficiency on different gate sets.
  • For variational algorithms that call many different unitaries the cumulative time saving could become the dominant factor in overall runtime.
  • Replacing the distance-based greedy mapper with one that also considers measured error rates might produce even lower effective error rates.
  • Repeating the benchmarks on unitaries drawn from a wider distribution than the current test set would test whether the reported gains are tied to the specific benchmark instances.

Load-bearing premise

The performance advantages measured on 3-to-11-qubit benchmark instances will continue to hold for arbitrary unitaries and for qubit counts well beyond the tested range without further parameter tuning.

What would settle it

Run the method and the strongest competing transpiler on a random 15-qubit unitary on the IQM Garnet processor and check whether the CNOT count remains at least 20 percent lower and the runtime remains under the 30-minute limit.
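The pass/fail criterion of this test is simple arithmetic and can be written down as a short Python sketch. The function names and the example numbers are hypothetical placeholders, not measurements from the paper or from any run.

```python
def cnot_reduction(ours, baseline):
    """Fractional CNOT reduction relative to the baseline count."""
    return (baseline - ours) / baseline

def passes_settling_test(ours_cnots, baseline_cnots, runtime_s,
                         min_reduction=0.20, time_limit_s=30 * 60):
    """True if the reduction is at least 20% and the run beat the time limit."""
    return (cnot_reduction(ours_cnots, baseline_cnots) >= min_reduction
            and runtime_s < time_limit_s)

# Placeholder inputs only: 750 vs 1000 CNOTs, 20 minutes of runtime.
ok = passes_settling_test(750, 1000, 20 * 60)
```

The reduction is measured against the strongest competing transpiler's count, so the baseline argument should be the best competitor result for the same unitary and device.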

Figures

Figures reproduced from arXiv: 2604.23777 by Arianne Meijer-van de Griend, Frans Perkkola, Jukka K. Nurminen.

Figure 1: A three-qubit unitary example of the CNOT merging optimization procedure. a) Decompose only the [caption truncated]
Figure 2: An example of a long-range CNOT_{v0,v3} gate. In this example, the shortest distance between the physical qubits v0 and v3 on the hardware is 3, and the shortest path is (v0, v1, v2, v3).
Figure 3: Four-qubit uniformly controlled Rz gates. a) The UC gate constructed using a binary reflected Gray code as in [21]. b) An equivalent construction for the UC gate using a different Gray code
Figure 5: (caption not recoverable from the extracted text)
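
The long-range CNOT of Figure 2 can be illustrated with one textbook nearest-neighbour construction, sketched in Python below. It is an assumption that this resembles the ladder the paper starts from; the paper's own circuit and its CNOT-reduction heuristic are not reproduced here. The sketch first XORs the control into every qubit on the path, then undoes that on the intermediate qubits, and checks the result on classical bit values, which suffices because CNOT circuits are permutations of basis states.

```python
def fanout(path):
    """CNOTs that XOR path[0] into every other qubit on the path."""
    d = len(path) - 1
    down = [(path[i], path[i + 1]) for i in range(d - 1, -1, -1)]
    up = [(path[i], path[i + 1]) for i in range(1, d)]
    return down + up

def long_range_cnot(path):
    """Nearest-neighbour ladder equal to a CNOT from path[0] to path[-1]."""
    # The second fanout removes the control bit from the intermediate qubits.
    return fanout(path) + fanout(path[:-1])

def apply_cnots(gates, bits):
    """Apply (control, target) CNOTs to a dict of classical bit values."""
    bits = dict(bits)
    for control, target in gates:
        bits[target] ^= bits[control]
    return bits

path = ["v0", "v1", "v2", "v3"]
ladder = long_range_cnot(path)
```

For the distance-3 path (v0, v1, v2, v3) this construction uses eight nearest-neighbour CNOTs, which gives a sense of why reducing ladder overhead matters on sparse topologies.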
Original abstract

We present a novel architecture-aware transpilation method for exact general unitary gate synthesis on superconducting quantum hardware. Our approach is tightly integrated with the optimized block-ZXZ decomposition, exploiting its recursive structure to make hardware-aware decisions at each level of the recursion rather than treating transpilation as an independent post-processing step. The method introduces three key techniques: a greedy qubit mapping strategy that minimizes pairwise distances between physical qubits, an adaptive Gray code selection combined with qubit swapping that optimizes the construction of uniformly controlled Rz gates for the target topology, and a heuristic for reducing CNOT gates by exploiting the structure of long-range CNOT ladders. We benchmark our method against TKet, Qiskit, and Pennylane on the 20-qubit IQM Garnet (square lattice) and the 156-qubit IBM Marrakesh (heavy-hex) architectures with qubit counts ranging from 3 to 11. Our method achieves CNOT count reductions of up to 36 percent on the IQM Garnet and up to 34 percent on the IBM Marrakesh compared to the best competing transpiler, while simultaneously achieving transpilation speedups of up to 553x. Furthermore, our method is the only one capable of transpiling circuits beyond 10 qubits within a 30-minute time limit across both architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a new architecture-aware transpilation technique for synthesizing general unitary gates on superconducting quantum processors. By embedding hardware topology considerations directly into the recursive structure of the block-ZXZ decomposition, the method employs a greedy qubit mapping to minimize pairwise distances, adaptive Gray code selection with qubit swaps for uniformly controlled Rz gates, and a pruning heuristic for long-range CNOT ladders. On benchmarks with 3 to 11 qubits using random unitaries, it reports CNOT count reductions of up to 36% on the IQM Garnet (square lattice) and up to 34% on the IBM Marrakesh (heavy-hex) relative to the best of TKet, Qiskit, and Pennylane, along with speedups of up to 553 times. It is also reported to be the only method that handles circuits larger than 10 qubits within the 30-minute limit.

Significance. Should the empirical advantages prove robust under standardized competitor configurations, the work would represent a meaningful advance in quantum circuit compilation by demonstrating that architecture-specific decisions can be profitably integrated into the synthesis recursion rather than applied post hoc. This could lead to lower gate counts and faster compilation times for exact unitary synthesis on fixed-topology hardware, with potential implications for near-term quantum algorithms requiring dense unitaries. The dual-architecture evaluation strengthens the case for topology-aware methods.

major comments (2)
  1. [Benchmarking experiments (as described in the abstract and experimental section)] The paper does not detail the specific configurations used for the baseline transpilers, such as Qiskit's optimization_level, TKet's synthesis options, or whether Pennylane's methods were run with architecture-aware routing enabled. Given that the proposed method incorporates topology at every recursion level while baselines are general tools, this omission makes it difficult to confirm that the reported 36% CNOT reduction and 553x speedup reflect a fair comparison rather than differences in configuration.
  2. [Heuristic descriptions (likely §3)] The greedy mapping, adaptive Gray-code selection, and CNOT ladder pruning are presented as heuristics without accompanying analysis of their approximation ratios or performance on worst-case unitaries. Since the benchmarks use only random unitaries up to 11 qubits, it is unclear whether the gains generalize or if they depend on the particular structure of the test instances.
minor comments (2)
  1. [Abstract] The phrase 'the best competing transpiler' is used for the percentage reductions, but it is not stated whether this is the same competitor for both CNOT count and runtime metrics, or if it varies per architecture.
  2. [Introduction or methods] A brief reference or equation for the block-ZXZ decomposition would help readers unfamiliar with the prior work on which the recursion is based.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and recommendation for major revision. We address each major comment point by point below, providing clarifications and indicating the changes we will incorporate into the revised manuscript.

Point-by-point responses
  1. Referee: The paper does not detail the specific configurations used for the baseline transpilers, such as Qiskit's optimization_level, TKet's synthesis options, or whether Pennylane's methods were run with architecture-aware routing enabled. Given that the proposed method incorporates topology at every recursion level while baselines are general tools, this omission makes it difficult to confirm that the reported 36% CNOT reduction and 553x speedup reflect a fair comparison rather than differences in configuration.

    Authors: We agree that explicit details on baseline configurations are essential for verifying the fairness of the comparisons. In the revised manuscript, we will expand the Experimental Setup section to specify the exact parameters employed: Qiskit was run with optimization_level=3 and the target coupling map provided to enable architecture-aware routing and optimization; TKet used its default unitary synthesis passes combined with explicit architecture mapping to the device topology; and Pennylane's methods were executed with default settings while supplying the hardware connectivity graph for any routing operations. These additions will ensure full reproducibility and demonstrate that the reported improvements arise from our integrated architecture-aware synthesis rather than from differences in baseline setup. revision: yes

  2. Referee: The greedy mapping, adaptive Gray-code selection, and CNOT ladder pruning are presented as heuristics without accompanying analysis of their approximation ratios or performance on worst-case unitaries. Since the benchmarks use only random unitaries up to 11 qubits, it is unclear whether the gains generalize or if they depend on the particular structure of the test instances.

    Authors: We acknowledge that the proposed techniques are heuristics without derived approximation ratios, as a full theoretical analysis of their worst-case performance would constitute a separate and substantial theoretical contribution beyond the empirical focus of this work. Random unitaries constitute the standard benchmark for general unitary synthesis precisely because they are dense and lack special structure that could be exploited by specialized methods. Our evaluation across 100 independent random instances for each qubit count from 3 to 11 shows consistent CNOT reductions and speedups, indicating that the observed advantages are not artifacts of particular test cases. In the revision, we will add a dedicated paragraph in the Discussion section that explicitly characterizes the methods as heuristics, notes the lack of approximation guarantees, and recommends future evaluation on structured unitaries arising from concrete quantum algorithms to further assess generalization. revision: partial

Circularity Check

0 steps flagged

No circularity: performance claims rest on external benchmarks, not internal self-definition or fitted predictions.

full rationale

The paper describes a new recursive block-ZXZ-based transpilation algorithm incorporating greedy mapping, adaptive Gray-code swapping, and long-range CNOT heuristics, then reports empirical CNOT-count and runtime results on 3-11 qubit random unitaries for two specific hardware topologies. These results are obtained by direct comparison against independently implemented external tools (TKet, Qiskit, Pennylane) rather than by fitting parameters to the target metrics or re-deriving the same quantities from the method's own outputs. No equations, uniqueness theorems, or ansatzes are shown to reduce to self-citations or to the benchmark data itself; the derivation chain therefore remains self-contained against external reference implementations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As an algorithmic and empirical paper on quantum transpilation, the abstract does not introduce mathematical axioms, free parameters, or new physical entities; the method relies on standard quantum computing assumptions and heuristic choices not detailed here.

pith-pipeline@v0.9.0 · 5532 in / 1320 out tokens · 108576 ms · 2026-05-08T06:15:10.957297+00:00 · methodology


Reference graph

Works this paper leans on

33 extracted references · 29 canonical work pages · 2 internal anchors

  [1] Leonid Abdurakhimov et al. Technology and performance benchmarks of IQM's 20-qubit quantum computer, 2024. URL: https://arxiv.org/abs/2408.12433, arXiv:2408.12433
  [2] Matthew Amy, Parsiad Azimzadeh, and Michele Mosca. On the controlled-NOT complexity of controlled-NOT–phase circuits. Quantum Science and Technology, 4(1):015002, September 2018. URL: http://dx.doi.org/10.1088/2058-9565/aad8ca, doi:10.1088/2058-9565/aad8ca
  [3] Frank Arute et al. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, Oct 2019. doi:10.1038/s41586-019-1666-5
  [4] Adriano Barenco, Charles H. Bennett, Richard Cleve, David P. DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John A. Smolin, and Harald Weinfurter. Elementary gates for quantum computation. Phys. Rev. A, 52:3457–3467, Nov 1995. URL: https://link.aps.org/doi/10.1103/PhysRevA.52.3457, doi:10.1103/PhysRevA.52.3457
  [5] Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, Shahnawaz Ahmed, Vishnu Ajith, M. Sohaib Alam, Guillermo Alonso-Linaje, B. AkashNarayanan, et al. PennyLane: Automatic differentiation of hybrid quantum-classical computations. arXiv preprint arXiv:1811.04968, 2018. arXiv:1811.04968
  [6] Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. Quantum amplitude amplification and estimation, 2002. URL: http://dx.doi.org/10.1090/conm/305/05215, doi:10.1090/conm/305/05215
  [7] Kexin Cao, Xueyun Cheng, Xinyu Chen, and Zhijin Guan. Pre-processing of quantum circuit mapping based on circuit partitioning and unitary matrix decomposition. In 2024 4th International Conference on Electronics, Circuits and Information Engineering (ECIE), pages 30–34, 2024. doi:10.1109/ECIE61885.2024.10626814
  [8] Andrew M. Childs. Universal computation by quantum walk. Physical Review Letters, 102:180501, 2009. doi:10.1103/PhysRevLett.102.180501
  [9] Andrew M. Childs, Dmitri Maslov, Yunseong Nam, Neil J. Ross, and Yuan Su. Toward the first quantum simulation with quantum speedup. Proceedings of the National Academy of Sciences, 115(38):9456–9461, 2018. doi:10.1073/pnas.1801723115
  [10] Joseph Clark and Himanshu Thapliyal. Peephole optimization for quantum approximate synthesis. In 2024 25th International Symposium on Quality Electronic Design (ISQED), pages 1–8, 2024. doi:10.1109/ISQED60706.2024.10528701
  [11] A. De Vos and S. De Baerdemacker. Block-ZXZ synthesis of an arbitrary quantum circuit. Phys. Rev. A, 94:052317, Nov 2016. URL: https://link.aps.org/doi/10.1103/PhysRevA.94.052317, doi:10.1103/PhysRevA.94.052317
  [12] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm. arXiv preprint arXiv:1411.4028, 2014. arXiv:1411.4028
  [13] András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), pages 193–204, 2019. doi:10.1145/3313276.3316366
  [14] Daniel Gottesman. Stabilizer codes and quantum error correction. PhD thesis, California Institute of Technology, 1997. doi:10.7907/rzr7-dt72
  [15] Lov K. Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC), pages 212–219, 1996. doi:10.1145/237814.237866
  [16] Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations. Physical Review Letters, 103:150502, 2009. doi:10.1103/PhysRevLett.103.150502
  [17] A. C. Hughes, R. Srinivas, C. M. Löschnauer, H. M. Knaack, R. Matt, C. J. Ballance, M. Malinowski, T. P. Harty, and R. T. Sutherland. Trapped-ion two-qubit gates with >99.99 [...] URL: https://arxiv.org/abs/2510.17286, arXiv:2510.17286
  [18] Anna M. Krol and Zaid Al-Ars. Beyond quantum Shannon decomposition: Circuit construction for n-qubit gates based on block-ZXZ decomposition. Phys. Rev. Appl., 22:034019, Sep 2024. URL: https://link.aps.org/doi/10.1103/PhysRevApplied.22.034019, doi:10.1103/PhysRevApplied.22.034019
  [19] Seth Lloyd. Universal quantum simulators. Science, 273(5278):1073–1078, 1996. doi:10.1126/science.273.5278.1073
  [20] Guang Hao Low and Isaac L. Chuang. Hamiltonian simulation by qubitization. Quantum, 3:163, 2019. doi:10.22331/q-2019-07-12-163
  [21] Mikko Möttönen, Juha J. Vartiainen, Ville Bergholm, and Martti M. Salomaa. Quantum circuits for general multiqubit gates. Phys. Rev. Lett., 93:130502, Sep 2004. URL: https://link.aps.org/doi/10.1103/PhysRevLett.93.130502, doi:10.1103/PhysRevLett.93.130502
  [22] Thomas E O'Brien, Brian Tarasinski, and Barbara M Terhal. Quantum phase estimation of multiple eigenvalues for small-scale (noisy) experiments. New Journal of Physics, 21(2):023022, February
  [23] URL: http://dx.doi.org/10.1088/1367-2630/aafb8e, doi:10.1088/1367-2630/aafb8e
  [24] Jennifer Paykin, Albert T. Schmitz, Mohannad Ibrahim, Xin-Chuan Wu, and A. Y. Matsuura. PCOAST: A Pauli-based quantum circuit optimization framework. In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), volume 01, pages 715–726, 2023. doi:10.1109/QCE57702.2023.00087
  [25] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, and Jeremy L. O'Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5:4213, 2014. doi:10.1038/ncomms5213
  [26] Martin Plesch and Časlav Brukner. Quantum-state preparation with universal gate decompositions. Phys. Rev. A, 83:032302, Mar 2011. URL: https://link.aps.org/doi/10.1103/PhysRevA.83.032302, doi:10.1103/PhysRevA.83.032302
  [27] Qiskit contributors. Qiskit: An open-source framework for quantum computing. Zenodo, 2023. doi:10.5281/zenodo.2573505
  [28] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. An introduction to quantum machine learning. Contemporary Physics, 56(2):172–185, 2015. doi:10.1080/00107514.2014.964942
  [29] Vivek V. Shende, Igor L. Markov, and Stephen S. Bullock. Minimal universal two-qubit controlled-NOT-based circuits. Phys. Rev. A, 69:062321, Jun 2004. URL: https://link.aps.org/doi/10.1103/PhysRevA.69.062321, doi:10.1103/PhysRevA.69.062321
  [30] V. V. Shende, S. S. Bullock, and I. L. Markov. Synthesis of quantum-logic circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(6):1000–1010, 2006. doi:10.1109/TCAD.2005.855930
  [31] Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington, and Ross Duncan. t|ket⟩: a retargetable compiler for NISQ devices. Quantum Science and Technology, 6(1):014003, 2021. doi:10.1088/2058-9565/ab8e92
  [32] Mathias Weiden, Justin Kalloor, John Kubiatowicz, Ed Younis, and Costin Iancu. Wide quantum circuit optimization with topology aware synthesis. In 2022 IEEE/ACM Third International Workshop on Quantum Computing Software (QCS), pages 1–11, 2022. doi:10.1109/QCS56647.2022.00006
  [33] Ed Younis, Costin C. Iancu, Wim Lavrijsen, Marc Davis, and Ethan Smith. Berkeley quantum synthesis toolkit (BQSKit) v1. [Computer Software] https://doi.org/10.11578/dc.20210603.2, Apr 2021. doi:10.11578/dc.20210603.2