arxiv: 2603.04377 · v2 · submitted 2026-03-04 · 🪐 quant-ph · cs.DC · cs.ET


Benchmarking Quantum Computers via Protocols, Comparing IBM's Heron vs IBM's Eagle

Nitay Mayo, Tal Mor, Yossi Weinstein


Pith reviewed 2026-05-15 16:40 UTC · model grok-4.3

classification 🪐 quant-ph · cs.DC · cs.ET

keywords quantum benchmarking · IBM Heron · IBM Eagle · quantum advantage · protocol evaluation · quantum processors · error rates


The pith

Protocol benchmarks show substantial performance improvements in IBM's newer Heron over Eagle processors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper applies an existing protocol-based benchmarking method that uses quantumness thresholds to assess quantum processors. It evaluates the older Eagle architecture against the newer Heron generation at the level of complete protocols rather than individual gates. The method determines whether a processor or sub-chip can demonstrate practical quantum advantage by meeting defined thresholds. Results indicate clear gains in the Heron, exposing its operational strengths and remaining limitations.

Core claim

Applying protocol-level quantumness thresholds to IBM's Eagle and Heron processors reveals substantial performance improvements in the Heron generation, offering a transparent way to check for practical quantum advantage.

What carries the argument

Protocol-based benchmarking methodology that applies well-defined quantumness thresholds to evaluate entire protocols instead of gate-level metrics.

If this is right

The Heron architecture delivers genuine operational strengths over the Eagle in protocol performance.
Sub-chips within processors can be tested independently to locate pockets of quantum advantage.
Protocol-level assessment provides an intuitive guide for prioritizing hardware research.
Fair access policies enable objective third-party comparisons across quantum devices.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same protocol thresholds could be used to benchmark processors from other hardware vendors.
Hardware teams might focus future improvements on raising the fraction of sub-chips that cross the quantumness thresholds.
Protocol results could help users decide which machine or sub-chip to reserve for a given task.
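
The sub-chip selection idea in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's procedure: it assumes each sub-chip yields a single protocol score where higher is better, and that "crossing the threshold" means scoring at or above it. The function names, sub-chip labels, scores, and the threshold value are all hypothetical placeholders, not measured results.

```python
def passing_subchips(scores, threshold):
    """Return the names of sub-chips scoring at or above the threshold, best first.

    `scores` maps a sub-chip label to one protocol score (assumed: higher is better).
    """
    passing = [(name, s) for name, s in scores.items() if s >= threshold]
    passing.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in passing]


def pass_fraction(scores, threshold):
    """Fraction of sub-chips that meet the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return len(passing_subchips(scores, threshold)) / len(scores)


# Made-up placeholder values, used only to show the call pattern.
example_scores = {"subchip_a": 0.62, "subchip_b": 0.48, "subchip_c": 0.71}
example_threshold = 0.5

ranked = passing_subchips(example_scores, example_threshold)
fraction = pass_fraction(example_scores, example_threshold)
print(ranked, fraction)
```

A user choosing where to run a task would reserve the first entry of `ranked`; a hardware team would track `fraction` across device generations.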

Load-bearing premise

The quantumness thresholds correctly mark when a processor has achieved practical quantum advantage on these IBM devices.

What would settle it

Running the same protocols on Heron and Eagle sub-chips and finding no measurable improvement or a decline in meeting the quantumness thresholds would disprove the claim of substantial gains.
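
One way to make "no measurable improvement" concrete is a one-sided Fisher exact test on how many sub-chips cross the threshold on each device. The test is a choice made here for illustration; the paper is not known to use it. The sketch below assumes each sub-chip run reduces to a pass or fail against the threshold, and the function name and arguments are hypothetical.

```python
from math import comb


def fisher_one_sided(pass_new, total_new, pass_old, total_old):
    """One-sided Fisher exact p-value.

    Probability, under the null hypothesis that both devices pass at the same
    rate, of seeing at least `pass_new` passes on the newer device given the
    combined number of passes.
    """
    total_pass = pass_new + pass_old
    n = total_new + total_old
    denominator = comb(n, total_new)
    upper = min(total_new, total_pass)
    tail = 0
    for k in range(pass_new, upper + 1):
        # Hypergeometric term: k passes land among the newer device's runs.
        tail += comb(total_pass, k) * comb(n - total_pass, total_new - k)
    return tail / denominator
```

A small p-value is evidence that the newer device passes more often; a value near 1, or a lower observed pass rate on the newer device, would count against the claim of substantial gains.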

Original abstract

As quantum computing hardware rapidly advances, objectively evaluating the capabilities and error rates of new processors remains a critical challenge for the field. A clear and realistic understanding of current quantum performance is essential to guide research priorities and drive meaningful progress. In this work, we apply and extend a protocol-based benchmarking methodology (Meirom, Mor, Weinstein arXiv 2505.12441) that utilizes well-defined quantumness thresholds. By evaluating performance at the protocol level rather than the gate level, this approach provides a transparent and intuitive assessment of whether specific quantum processors, or isolated sub-chips within them, can demonstrate a practical quantum advantage. To illustrate the utility of this method, we compare two generations of IBM quantum computers: the older Eagle architecture and the newer Heron architecture. Our findings reveal the genuine operational strengths and limitations of these devices, demonstrating substantial performance improvements in the newer Heron generation. This work was made possible by IBM Quantum policies which enable independent and objective assessment on their quantum computers and sub-chips. We strongly encourage other companies to emulate the independent qubit availability and the fair pricing which allow researchers to perform such assessments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper runs the existing protocol on real IBM hardware and reports Heron clearly ahead of Eagle, but applies the thresholds without fresh calibration for these devices.


The core new piece is the direct comparison: Heron processors come out ahead of Eagle on the protocol-level quantumness metrics, with the authors pointing to meaningful operational gains in the newer generation. They also note IBM's open access to sub-chips as enabling this kind of independent check, which is a practical plus for the field. The work stays grounded in the cited Meirom et al. framework and does not claim to invent new thresholds or theory, so the value sits in the concrete hardware data point rather than methodological novelty.

That said, the central claim of substantial improvement rests on those fixed thresholds being the right ones for Eagle and Heron noise. The manuscript applies them directly without re-deriving or validating against the actual error rates, crosstalk, or readout behavior of these specific processors. If the dominant noise channels differ from what the original protocol assumed, the reported gap could shrink or shift. The abstract presents the findings as clear, but the strength of the quantitative support depends on the full data tables, error bars, and run statistics that are not visible here.

This is the sort of paper that people tracking current quantum hardware or running similar benchmarks will want to read. It supplies a transparent side-by-side that can inform priorities even if the threshold calibration needs more discussion. I would bring it to a reading group to talk through whether the cutoffs hold for these IBM chips. It deserves peer review because the comparison is timely and the method is explicit, though referees should press on the threshold justification and raw data presentation.

Referee Report

1 major / 1 minor

Summary. The manuscript applies and extends a protocol-based benchmarking methodology from Meirom et al. (arXiv:2505.12441) that employs fixed quantumness thresholds to evaluate IBM Eagle and Heron processors (including sub-chips). It claims that the Heron generation exhibits substantial performance improvements over Eagle, providing a transparent assessment of practical quantum advantage at the protocol level rather than gate level.

Significance. If the central claim holds after threshold validation, the work supplies a clear, protocol-level framework for hardware comparison that could guide research priorities and highlight genuine operational progress in newer quantum processors. The emphasis on independent access to sub-chips is a constructive contribution to reproducible benchmarking.

major comments (1)

[Abstract and methodology] Abstract and methodology section: The headline claim of substantial Heron improvements rests on quantumness thresholds taken directly from Meirom et al. (2505.12441) without device-specific re-derivation or calibration against the measured error rates, crosstalk, or readout noise of the IBM Eagle/Heron processors. If these fixed thresholds are misaligned with the dominant noise channels, the reported performance gap reduces to an artifact of the external cutoff rather than an intrinsic difference.
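
The concern about the cutoff can be probed with a simple sensitivity sweep: recompute the gap in pass rates between the two devices over a range of candidate thresholds and check whether it persists. The sketch below is an illustration under assumptions, not the manuscript's analysis. It assumes per-run scores where higher is better; the function names are hypothetical and no real device data is implied.

```python
def pass_rate(scores, threshold):
    """Fraction of scores at or above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s >= threshold for s in scores) / len(scores)


def gap_across_thresholds(scores_new, scores_old, thresholds):
    """Pass-rate gap (newer minus older device) at each candidate threshold."""
    return [
        (t, pass_rate(scores_new, t) - pass_rate(scores_old, t))
        for t in thresholds
    ]
```

A gap that stays positive across every plausible threshold points to an intrinsic difference between the devices. A gap that appears only in a narrow band around the imported cutoff would support the referee's worry that it is an artifact of where the line was drawn.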

minor comments (1)

[Abstract] Abstract: The phrase 'genuine operational strengths and limitations' is used without a corresponding quantitative breakdown of limitations in the Heron results; a brief table or paragraph summarizing failure modes would improve balance.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We provide a point-by-point response to the major comment below, defending the methodology on substantive grounds while offering clarifications in revision.


Referee: [Abstract and methodology] Abstract and methodology section: The headline claim of substantial Heron improvements rests on quantumness thresholds taken directly from Meirom et al. (2505.12441) without device-specific re-derivation or calibration against the measured error rates, crosstalk, or readout noise of the IBM Eagle/Heron processors. If these fixed thresholds are misaligned with the dominant noise channels, the reported performance gap reduces to an artifact of the external cutoff rather than an intrinsic difference.

Authors: We thank the referee for this insightful comment. The quantumness thresholds are intentionally adopted from Meirom et al. as they are theoretically derived criteria for establishing quantum advantage in the protocols, independent of specific hardware noise models. This fixed approach ensures a standardized and reproducible benchmarking framework applicable across different processors, allowing direct comparison of Eagle and Heron without confounding factors from customized thresholds. While we did not perform device-specific re-derivation, the substantial improvements observed in Heron are consistent across multiple protocols and sub-chips, indicating that the performance gap arises from intrinsic hardware advancements rather than threshold misalignment. In the revised version, we will expand the methodology section to elaborate on the theoretical basis of these thresholds and their robustness to variations in error rates, crosstalk, and readout noise typical of IBM quantum processors.

revision: partial

Circularity Check

1 step flagged

Minor self-citation to prior protocol thresholds; hardware benchmarking and comparison remain independent

specific steps

self-citation, load-bearing [Abstract]
"we apply and extend a protocol-based benchmarking methodology (Meirom, Mor, Weinstein arXiv 2505.12441) that utilizes well-defined quantumness thresholds. By evaluating performance at the protocol level rather than the gate level, this approach provides a transparent and intuitive assessment of whether specific quantum processors, or isolated sub-chips within them, can demonstrate a practical quantum advantage."

The thresholds that classify runs as demonstrating practical quantum advantage are imported from the overlapping-author prior paper and applied without re-derivation from the present IBM-device error rates or crosstalk models; the reported Heron improvements therefore inherit their interpretation from that external definition.

full rationale

The manuscript applies the protocol and fixed quantumness thresholds defined in the cited prior work (Meirom et al. 2505.12441) to new measurements on IBM Eagle and Heron processors. No parameters are fitted to the current data and then presented as predictions, no self-definitional equations appear, and the relative performance claims derive from direct experimental runs on external hardware rather than reducing to the citation by construction. The self-citation is therefore minor and non-load-bearing for the core empirical comparison.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of quantumness thresholds defined in the cited prior work and on the assumption that protocol-level success translates to practical advantage; no new free parameters or invented entities are introduced in this manuscript.

axioms (1)

domain assumption: Quantumness thresholds defined in Meirom, Mor, Weinstein (arXiv 2505.12441) accurately indicate practical quantum advantage for the tested protocols.
Invoked when interpreting protocol results as evidence of quantum capability rather than classical simulation.


