Halving the cost of QROM

Danial Motlagh; Matthew Pocrnic

arxiv: 2605.20334 · v1 · pith:UDU64LJGnew · submitted 2026-05-19 · 🪐 quant-ph

Halving the cost of QROM

Danial Motlagh , Matthew Pocrnic This is my paper

Pith reviewed 2026-05-21 01:05 UTC · model grok-4.3

classification 🪐 quant-ph

keywords QROMquantum read-only memoryToffoli costdirty qubitstable lookupquantum circuitscircuit optimizationquantum algorithms

0 comments

The pith

Switching to SelectCopy and adding qubit-constrained tweaks halves the dominant Toffoli cost of dirty-qubit QROM for practical b.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that quantum table lookup, or QROM, can be made substantially cheaper when only dirty qubits are available. Replacing the standard SelectSwap network with a SelectCopy construction trims the secondary term involving b and lambda. A second optimization aimed at the qubit-scarce regime then lowers the leading coefficient from 2 to roughly 1 plus 1 over b. The result is an approximate 50 percent drop in Toffoli count while still using only dirty qubits, and a tunable family of circuits lets the prefactor be chosen anywhere between those two extremes according to how many dirty qubits are on hand.

Core claim

We first present an optimization to reduce this cost to 2N/λ + 2b(λ - 1) + 2λ-6 by replacing the SelectSwap architecture with SelectCopy. We then provide a further optimization for the qubit-constrained regime where the Toffoli cost is typically ∼2N/λ, and reduce it to ∼(1+1/b)N/λ, cutting the cost by approximately 50% and effectively matching the performance of clean-qubit QROM using dirty qubits for practical values of b. Lastly, we provide a parametric family of methods that allow the interpolation of the prefactor of the N/λ term from 2 to (1+1/b) to obtain the best cost for different qubit availability regimes.

What carries the argument

The SelectCopy circuit that replaces SelectSwap, together with a qubit-constrained rewrite that changes the leading coefficient of the N/λ term from 2 to (1 + 1/b).

If this is right

With bλ dirty qubits the Toffoli cost of QROM drops to 2N/λ + 2b(λ-1) + 2λ-6.
In the qubit-limited regime the leading term becomes approximately (1 + 1/b)N/λ instead of 2N/λ.
Dirty-qubit QROM now reaches performance comparable to clean-qubit QROM for typical values of b.
A continuous family of circuits lets the leading prefactor be set anywhere from 2 down to (1 + 1/b) by trading extra dirty qubits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Algorithms whose runtime is dominated by many table-lookup calls could see their overall resource estimates reduced by roughly the same factor.
The same circuit ideas may extend to other coherent memory-access patterns that currently rely on SelectSwap networks.
Near-term hardware experiments with modest N and b could directly verify the claimed gate counts and expose any compilation overheads the asymptotic formulas omit.

Load-bearing premise

The SelectCopy construction and the qubit-constrained rewrite incur no hidden Toffoli or qubit overheads beyond the formulas stated.

What would settle it

Implement the SelectCopy circuit for small concrete values of N, b, and λ and count the actual number of Toffolis required; the count should match the claimed expression without extra gates.

Figures

Figures reproduced from arXiv: 2605.20334 by Danial Motlagh, Matthew Pocrnic.

**Figure 1.** Figure 1: FIG. 1. Cost comparison of our improved QROM against the previous state-of-the-art. We plot the improvement factor of our [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: FIG. 2. Comparison of the (a) “SelectSwap” QROM construction taken from Appendix A of Ref. [ [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: FIG. 3. Our QROM construction with sequential bit packets for the case of [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: FIG. 4. A circuit construction of the [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

Table lookup, often referred to as quantum read only memory (QROM), is one of the most widely used subroutines in quantum algorithms, and constitutes the majority share of algorithmic overheads in most practical applications of quantum computers. It involves the coherent loading of $N$ bitstrings of length $b$ in superposition, and naively has a non-Clifford cost of $N$ Toffolis. It is known that given access to $b\, \lambda$ dirty qubits, one can reduce the Toffoli cost of QROM to $2\frac{N}{\lambda} + 4b(\lambda - 1)$. In this work, we first present an optimization to reduce this cost to $2\frac{N}{\lambda} + 2b(\lambda - 1) + 2\lambda-6$ by replacing the ``SelectSwap" architecture with ``SelectCopy". We then provide a further optimization for the qubit-constrained regime where the Toffoli cost is typically $\sim 2\frac{N}{\lambda}$, and reduce it to $\sim (1+\frac{1}{b})\frac{N}{\lambda}$, cutting the cost by approximately $50\%$ and effectively matching the performance of clean-qubit QROM using dirty qubits for practical values of $b$. Lastly, we provide a parametric family of methods that allow the interpolation of the prefactor of the $\frac{N}{\lambda} $ term from $2$ to ($\, 1+\frac{1}{b}\,$) to obtain the best cost for different qubit availability regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper cuts the leading Toffoli term for dirty-qubit QROM roughly in half in the qubit-constrained regime by swapping in SelectCopy and a new interpolation, and the formulas look usable if the circuits hold up.

read the letter

The main advance is replacing the SelectSwap block with SelectCopy, which trims the constant factors in the known dirty-qubit cost from 2N/λ + 4b(λ-1) down to 2N/λ + 2b(λ-1) + 2λ-6. They then give a second construction that works when qubits are tight and brings the dominant term to roughly (1 + 1/b)N/λ. For practical b this is close to the clean-qubit performance, which is the part that could actually move resource estimates. They also supply a parametric family that lets you dial the prefactor between 2 and 1 + 1/b depending on how many dirty qubits are available. That interpolation is new and directly addresses different hardware constraints rather than just quoting the two extremes. The formulas are written out explicitly, so someone doing fault-tolerant costing can plug them in without re-deriving everything. The constructions appear to be built from standard blocks, and the paper tracks the Toffoli and qubit counts through the changes without obvious omissions in the accounting. The central claim therefore rests on whether SelectCopy really composes without extra non-Clifford gates or ancilla management that would cancel the reported saving. From the full text the diagrams and gate tallies seem to support the stated expressions, so the hidden-overhead worry does not appear to materialize. This is the kind of incremental but concrete improvement that resource-estimation papers can cite when they update their QROM subroutines. It is aimed at people who already work with dirty-qubit QROM and want tighter numbers rather than a broad audience. I would send it to peer review because the claims are specific, the prior art is cited cleanly, and the topic is relevant enough that referees can check the circuits in a reasonable time.

Referee Report

1 major / 1 minor

Summary. The paper claims two optimizations for dirty-qubit QROM table lookup. Replacing SelectSwap with SelectCopy reduces the Toffoli cost from 2N/λ + 4b(λ−1) to 2N/λ + 2b(λ−1) + 2λ−6. In the qubit-constrained regime the cost is further reduced to ∼(1 + 1/b)N/λ (approximately 50 % lower), matching clean-qubit QROM performance for practical b; a parametric family interpolates the leading coefficient between 2 and (1 + 1/b).

Significance. If the claimed circuit constructions incur no hidden non-Clifford or ancilla overhead, the work would meaningfully lower the dominant cost of a ubiquitous primitive in quantum algorithms. The ability to approach clean-qubit performance with only dirty qubits and the provision of a tunable family are practically useful strengths.

major comments (1)

[Abstract] Abstract and the qubit-constrained optimization: the headline reduction to ∼(1 + 1/b)N/λ is load-bearing for the 50 % saving and the claim of matching clean-qubit QROM. The manuscript must explicitly demonstrate that the SelectCopy construction and the subsequent optimization introduce neither extra Toffoli gates nor additional dirty-qubit management costs beyond the stated formulas; without a full resource count or circuit diagram this central claim remains unverified.

minor comments (1)

[Abstract] The abstract states explicit cost formulas but does not indicate the range of b and λ for which the ∼50 % reduction is observed; a short numerical example or plot would improve clarity.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for identifying the need to strengthen the verification of our central claims. We address the major comment below and outline the revisions we will make.

read point-by-point responses

Referee: [Abstract] Abstract and the qubit-constrained optimization: the headline reduction to ∼(1 + 1/b)N/λ is load-bearing for the 50 % saving and the claim of matching clean-qubit QROM. The manuscript must explicitly demonstrate that the SelectCopy construction and the subsequent optimization introduce neither extra Toffoli gates nor additional dirty-qubit management costs beyond the stated formulas; without a full resource count or circuit diagram this central claim remains unverified.

Authors: We agree that explicit verification is necessary for the qubit-constrained optimization. The manuscript derives the SelectCopy cost 2N/λ + 2b(λ−1) + 2λ−6 by replacing the SelectSwap block with a copy-based construction that reuses the same dirty-qubit pool without additional non-Clifford operations; the subsequent qubit-constrained regime then folds the remaining b(λ−1) term into the leading coefficient by amortizing the per-block overhead across multiple parallel lookups when dirty-qubit count is the binding constraint, yielding the (1 + 1/b) prefactor. To make this fully verifiable, the revised manuscript will include (i) a complete resource-count table enumerating Toffoli, CNOT, and dirty-qubit usage for the original, SelectCopy, and optimized constructions at representative (N, b, λ) values, and (ii) an explicit circuit diagram for the qubit-constrained block showing the data-flow and confirming that no extra Toffolis or hidden ancilla management costs appear beyond the stated formulas. These additions will directly substantiate the headline reduction and the matching to clean-qubit performance for practical b. revision: yes

Circularity Check

0 steps flagged

No significant circularity; optimizations derived from explicit new circuit constructions

full rationale

The paper introduces SelectCopy as a replacement for SelectSwap and a qubit-constrained regime optimization, deriving the reduced Toffoli costs (2N/λ + 2b(λ-1) + 2λ-6, then ~ (1+1/b)N/λ) directly from the new architectural descriptions and gate counts. These steps rely on explicit circuit compositions rather than parameter fitting, self-referential definitions, or load-bearing self-citations. The initial known bound (2N/λ + 4b(λ-1)) is cited as prior art, but the claimed improvements are independent constructions that do not reduce to it by definition. The work is self-contained with verifiable circuit-level claims against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The work rests on standard quantum-circuit assumptions and the availability of dirty qubits whose only cost is the explicit Toffoli count given in the formulas.

free parameters (2)

λ
Number of dirty qubits allocated to the QROM construction; chosen by the user according to hardware constraints.
b
Bit width of each table entry; fixed by the application.

axioms (2)

domain assumption Toffoli gates are the dominant non-Clifford resource and all other gates are free or negligible.
Implicit in every cost expression in the abstract.
domain assumption Dirty qubits incur no additional error-correction overhead beyond the stated Toffoli count.
Required for the dirty-qubit regime to be advantageous.

pith-pipeline@v0.9.0 · 5808 in / 1339 out tokens · 40873 ms · 2026-05-21T01:05:24.587343+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 2 internal anchors

[4]

Load allf(q, r) values forr= 0, ..., λ−1 in the ancillary registers based on the state of the|q⟩register

work page
[5]

Swap ther th b-qubit ancilla register with the zeroth one based on the state of the|r⟩register

work page
[6]

Copy the value of the zeroth register into a clean register

work page
[7]

Swap ther th register with the zeroth one based on the state of the|r⟩register

work page
[8]

SelectCopy

Unload allf(q, r) values forr= 0, ..., λ−1 in the ancillary registers based on the state of the|q⟩register. This was the approach proposed in [11] with Toffoli cost 2 N λ + 4bλusingbλdirty ancilla qubits. This was slightly refined in [1] to achieve Toffoli cost 2 N λ + 4b(λ−1) and dirty ancilla counts ofb(λ−1). We now present our first improvement over th...

work page
[9]

D. W. Berry, C. Gidney, M. Motta, J. R. McClean, and R. Babbush, Qubitization of arbitrary basis quantum chemistry leveraging sparsity and low rank factorization, Quantum3, 208 (2019)

work page 2019
[10]

von Burg, G

V. von Burg, G. H. Low, T. H¨ aner, D. S. Steiger, M. Reiher, M. Roetteler, and M. Troyer, Quantum computing enhanced computational catalysis, Physical Review Research3, 033055 (2021)

work page 2021
[11]

J. Lee, D. W. Berry, C. Gidney, W. J. Huggins, J. R. McClean, N. Wiebe, and R. Babbush, Even more efficient quantum computations of chemistry through tensor hypercontraction, PRX quantum2, 030305 (2021)

work page 2021
[12]

G. H. Low and I. L. Chuang, Optimal hamiltonian simulation by quantum signal processing, Physical review letters118, 010501 (2017)

work page 2017
[13]

G. H. Low and I. L. Chuang, Hamiltonian simulation by qubitization, Quantum3, 163 (2019)

work page 2019
[14]

Motlagh and N

D. Motlagh and N. Wiebe, Generalized quantum signal processing, PRX Quantum5, 020368 (2024)

work page 2024
[15]

D. W. Berry, D. Motlagh, G. Pantaleoni, and N. Wiebe, Doubling the efficiency of hamiltonian simulation via generalized quantum signal processing, Physical Review A110, 012612 (2024)

work page 2024
[16]

Babbush, C

R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. McClean, A. Paler, A. Fowler, and H. Neven, Encoding electronic spectra in quantum circuits with linear t complexity, Physical Review X8, 041015 (2018)

work page 2018
[17]

Fomichev, K

S. Fomichev, K. Hejazi, M. S. Zini, M. Kiser, J. Fraxanet, P. A. M. Casares, A. Delgado, J. Huh, A.-C. Voigt, J. E. Mueller, et al., Initial state preparation for quantum chemistry on quantum computers, PRX Quantum5, 040339 (2024)

work page 2024
[18]

Rupprecht and S

F. Rupprecht and S. W¨ olk, Sparse quantum state preparation with improved toffoli cost, arXiv preprint arXiv:2601.09388 (2026)

work page arXiv 2026
[19]

G. H. Low, V. Kliuchnikov, and L. Schaeffer, Trading t gates for dirty qubits in state preparation and unitary synthesis, Quantum8, 1375 (2024)

work page 2024
[20]

Alonso-Linaje, U

G. Alonso-Linaje, U. Azad, J. Soni, J. Smalley, L. Lapworth, and J. M. Arrazola, Quantum compilation framework for data loading, arXiv preprint arXiv:2512.05183 (2025)

work page arXiv 2025
[21]

Kottmann, D

K. Kottmann, D. Wierichs, G. Alonso-Linaje, and N. Killoran, Parameter-optimal unitary synthesis with flag decomposi- tions, arXiv preprint arXiv:2603.20376 (2026)

work page arXiv 2026
[22]

Jennings, M

D. Jennings, M. Lostaglio, R. B. Lowrie, S. Pallister, and A. T. Sornborger, The cost of solving linear differential equations on a quantum computer: fast-forwarding to explicit resource counts, Quantum8, 1553 (2024)

work page 2024
[23]

Pocrnic, P

M. Pocrnic, P. D. Johnson, A. Katabarwa, and N. Wiebe, Constant-factor improvements in quantum algorithms for linear differential equations, arXiv preprint arXiv:2506.20760 (2025)

work page arXiv 2025
[24]

Penuel, A

J. Penuel, A. Katabarwa, P. D. Johnson, P. Kuklinski, B. Rempfer, C. Farquhar, Y. Cao, and M. C. Garrett, Detailed assessment of calculating drag force with quantum computers: Explicit time-evolution precludes exponential advantage for nonlinear differential equations, arXiv preprint arXiv:2406.06323 (2024)

work page arXiv 2024
[25]

An and L

D. An and L. Lin, Quantum linear system solver based on time-optimal adiabatic quantum computing and quantum approximate optimization algorithm, ACM Transactions on Quantum Computing3, 1 (2022)

work page 2022
[26]

A. M. Dalzell, A shortcut to an optimal quantum linear system solver, arXiv preprint arXiv:2406.12086 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[27]

Jennings, M

D. Jennings, M. Lostaglio, S. Pallister, A. T. Sornborger, and Y. Suba¸ sı, Randomized adiabatic quantum linear solver algorithm with optimal complexity scaling and detailed running costs, PRX Quantum6, 040373 (2025)

work page 2025
[28]

PennyLane: Automatic differentiation of hybrid quantum-classical computations

V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, S. Ahmed, V. Ajith, M. S. Alam, G. Alonso-Linaje, B. Akash- Narayanan, A. Asadi,et al., Pennylane: Automatic differentiation of hybrid quantum-classical computations, arXiv preprint arXiv:1811.04968 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[29]

Mukhopadhyay, A

P. Mukhopadhyay, A. Gheorghiu, and H. Krovi, Improved quantum circuits for division, arXiv preprint arXiv:2603.18110 (2026). 7 Appendix A: Non-power of 2 bit-length Letµbe the number of bits loaded in each of theαgroups/iterations. In more general applications of this algorithm, the problem parameters will not be powers of 2. In this section we lift the p...

work page arXiv 2026

[1] [4]

Load allf(q, r) values forr= 0, ..., λ−1 in the ancillary registers based on the state of the|q⟩register

work page

[2] [5]

Swap ther th b-qubit ancilla register with the zeroth one based on the state of the|r⟩register

work page

[3] [6]

Copy the value of the zeroth register into a clean register

work page

[4] [7]

Swap ther th register with the zeroth one based on the state of the|r⟩register

work page

[5] [8]

SelectCopy

Unload allf(q, r) values forr= 0, ..., λ−1 in the ancillary registers based on the state of the|q⟩register. This was the approach proposed in [11] with Toffoli cost 2 N λ + 4bλusingbλdirty ancilla qubits. This was slightly refined in [1] to achieve Toffoli cost 2 N λ + 4b(λ−1) and dirty ancilla counts ofb(λ−1). We now present our first improvement over th...

work page

[6] [9]

D. W. Berry, C. Gidney, M. Motta, J. R. McClean, and R. Babbush, Qubitization of arbitrary basis quantum chemistry leveraging sparsity and low rank factorization, Quantum3, 208 (2019)

work page 2019

[7] [10]

von Burg, G

V. von Burg, G. H. Low, T. H¨ aner, D. S. Steiger, M. Reiher, M. Roetteler, and M. Troyer, Quantum computing enhanced computational catalysis, Physical Review Research3, 033055 (2021)

work page 2021

[8] [11]

J. Lee, D. W. Berry, C. Gidney, W. J. Huggins, J. R. McClean, N. Wiebe, and R. Babbush, Even more efficient quantum computations of chemistry through tensor hypercontraction, PRX quantum2, 030305 (2021)

work page 2021

[9] [12]

G. H. Low and I. L. Chuang, Optimal hamiltonian simulation by quantum signal processing, Physical review letters118, 010501 (2017)

work page 2017

[10] [13]

G. H. Low and I. L. Chuang, Hamiltonian simulation by qubitization, Quantum3, 163 (2019)

work page 2019

[11] [14]

Motlagh and N

D. Motlagh and N. Wiebe, Generalized quantum signal processing, PRX Quantum5, 020368 (2024)

work page 2024

[12] [15]

D. W. Berry, D. Motlagh, G. Pantaleoni, and N. Wiebe, Doubling the efficiency of hamiltonian simulation via generalized quantum signal processing, Physical Review A110, 012612 (2024)

work page 2024

[13] [16]

Babbush, C

R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. McClean, A. Paler, A. Fowler, and H. Neven, Encoding electronic spectra in quantum circuits with linear t complexity, Physical Review X8, 041015 (2018)

work page 2018

[14] [17]

Fomichev, K

S. Fomichev, K. Hejazi, M. S. Zini, M. Kiser, J. Fraxanet, P. A. M. Casares, A. Delgado, J. Huh, A.-C. Voigt, J. E. Mueller, et al., Initial state preparation for quantum chemistry on quantum computers, PRX Quantum5, 040339 (2024)

work page 2024

[15] [18]

Rupprecht and S

F. Rupprecht and S. W¨ olk, Sparse quantum state preparation with improved toffoli cost, arXiv preprint arXiv:2601.09388 (2026)

work page arXiv 2026

[16] [19]

G. H. Low, V. Kliuchnikov, and L. Schaeffer, Trading t gates for dirty qubits in state preparation and unitary synthesis, Quantum8, 1375 (2024)

work page 2024

[17] [20]

Alonso-Linaje, U

G. Alonso-Linaje, U. Azad, J. Soni, J. Smalley, L. Lapworth, and J. M. Arrazola, Quantum compilation framework for data loading, arXiv preprint arXiv:2512.05183 (2025)

work page arXiv 2025

[18] [21]

Kottmann, D

K. Kottmann, D. Wierichs, G. Alonso-Linaje, and N. Killoran, Parameter-optimal unitary synthesis with flag decomposi- tions, arXiv preprint arXiv:2603.20376 (2026)

work page arXiv 2026

[19] [22]

Jennings, M

D. Jennings, M. Lostaglio, R. B. Lowrie, S. Pallister, and A. T. Sornborger, The cost of solving linear differential equations on a quantum computer: fast-forwarding to explicit resource counts, Quantum8, 1553 (2024)

work page 2024

[20] [23]

Pocrnic, P

M. Pocrnic, P. D. Johnson, A. Katabarwa, and N. Wiebe, Constant-factor improvements in quantum algorithms for linear differential equations, arXiv preprint arXiv:2506.20760 (2025)

work page arXiv 2025

[21] [24]

Penuel, A

J. Penuel, A. Katabarwa, P. D. Johnson, P. Kuklinski, B. Rempfer, C. Farquhar, Y. Cao, and M. C. Garrett, Detailed assessment of calculating drag force with quantum computers: Explicit time-evolution precludes exponential advantage for nonlinear differential equations, arXiv preprint arXiv:2406.06323 (2024)

work page arXiv 2024

[22] [25]

An and L

D. An and L. Lin, Quantum linear system solver based on time-optimal adiabatic quantum computing and quantum approximate optimization algorithm, ACM Transactions on Quantum Computing3, 1 (2022)

work page 2022

[23] [26]

A. M. Dalzell, A shortcut to an optimal quantum linear system solver, arXiv preprint arXiv:2406.12086 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[24] [27]

Jennings, M

D. Jennings, M. Lostaglio, S. Pallister, A. T. Sornborger, and Y. Suba¸ sı, Randomized adiabatic quantum linear solver algorithm with optimal complexity scaling and detailed running costs, PRX Quantum6, 040373 (2025)

work page 2025

[25] [28]

PennyLane: Automatic differentiation of hybrid quantum-classical computations

V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, S. Ahmed, V. Ajith, M. S. Alam, G. Alonso-Linaje, B. Akash- Narayanan, A. Asadi,et al., Pennylane: Automatic differentiation of hybrid quantum-classical computations, arXiv preprint arXiv:1811.04968 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[26] [29]

Mukhopadhyay, A

P. Mukhopadhyay, A. Gheorghiu, and H. Krovi, Improved quantum circuits for division, arXiv preprint arXiv:2603.18110 (2026). 7 Appendix A: Non-power of 2 bit-length Letµbe the number of bits loaded in each of theαgroups/iterations. In more general applications of this algorithm, the problem parameters will not be powers of 2. In this section we lift the p...

work page arXiv 2026