arxiv: 2605.12583 · v1 · submitted 2026-05-12 · 🪐 quant-ph · cs.MS

QuPort: Topology-, Port-, and Congestion-Aware Compilation for Modular Multi-QPU Quantum Systems

Soumyadip Sarkar , Subhasree Bhattacharjee

Pith reviewed 2026-05-14 20:41 UTC · model grok-4.3

keywords: quantum compilation, modular quantum processors, multi-QPU systems, graph partitioning, circuit mapping, interconnect congestion, port-aware routing, remote gate extraction

The pith

QuPort's TPCCAP finds circuit partitions for multi-QPU systems that jointly minimize cut distance, port overflow, and link congestion on a three-level graph.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a compiler framework that treats modular quantum machines as three coupled graphs: the logical interaction graph, the physical coupling map inside each QPU, and the interconnect graph between QPUs. TPCCAP partitions the circuit to reduce the weighted sum of long-range cuts, overloaded communication ports, and congested interconnect links, then applies clustering, annealing refinement, and local routing. A sympathetic reader would care because single-QPU compilers ignore the dominant cost of cross-QPU operations; a method that accounts for ports and congestion could let larger machines run without traffic hotspots. The framework extracts remote two-qubit gates, routes locally inside each QPU, and estimates schedules that reflect the chosen topology.
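To make the three-graph view concrete, here is a minimal sketch in plain Python. The class name `ModularMachine`, its field names, and the toy sizes are assumptions made for illustration; they are not taken from the QuPort code base, which builds on Qiskit.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ModularMachine:
    """Hypothetical container for the three graph views (illustrative only)."""
    logical_edges: dict   # {(i, j): weight} between logical qubits
    coupling_edges: list  # directed (u, v) pairs between physical qubits
    qpu_links: list       # undirected (a, b) pairs between QPUs

    def hop_distance(self, src, dst):
        """Breadth-first hop count between two QPUs on the interconnect graph."""
        neighbours = {}
        for a, b in self.qpu_links:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
        seen, queue = {src}, deque([(src, 0)])
        while queue:
            node, dist = queue.popleft()
            if node == dst:
                return dist
            for nxt in neighbours.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return None  # dst is unreachable from src


# Toy instance: four logical qubits, two 2-qubit devices, three QPUs in a line.
machine = ModularMachine(
    logical_edges={(0, 1): 3, (1, 2): 1, (2, 3): 2},
    coupling_edges=[(0, 1), (1, 0), (2, 3), (3, 2)],
    qpu_links=[("A", "B"), ("B", "C")],
)
print(machine.hop_distance("A", "C"))  # 2
```

Keeping the interconnect graph separate from the coupling map is what lets hop distance, traffic, and congestion be computed at QPU granularity rather than per physical qubit.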

Core claim

The central claim is that an explicit three-level model together with the TPCCAP objective (weighted cut distance plus port overflow plus routed link load) produces mappings that respect both intra-QPU topology and inter-QPU communication limits, and that the supporting steps of heavy-edge clustering, balanced greedy partitioning, simulated-annealing refinement, port-aware layout, and remote-gate extraction are sufficient to realize those mappings.
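Of the supporting steps listed here, simulated-annealing refinement is the easiest to illustrate in isolation. The sketch below is a generic Metropolis-style refinement over single-qubit moves, not QuPort's implementation: the function name `anneal_partition`, the geometric cooling schedule, the per-QPU `capacity` cap, and the use of plain cut weight as the cost are all assumptions.

```python
import math
import random


def anneal_partition(partition, qpus, capacity, cost, steps=2000,
                     t_start=1.0, t_end=0.01, seed=0):
    """Refine a qubit-to-QPU map by single-qubit moves under a Metropolis rule."""
    rng = random.Random(seed)
    current = dict(partition)
    current_cost = cost(current)
    best, best_cost = dict(current), current_cost
    qubits = sorted(current)
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        q = rng.choice(qubits)
        target = rng.choice(qpus)
        load = sum(1 for v in current.values() if v == target)
        if target == current[q] or load >= capacity:
            continue  # skip no-op moves and moves that would overfill a QPU
        old = current[q]
        current[q] = target
        new_cost = cost(current)
        delta = new_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current_cost = new_cost  # accept the move
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
        else:
            current[q] = old  # reject the move


    return best, best_cost


# Toy logical graph; the stand-in cost is the total weight of cross-QPU edges.
edges = {(0, 1): 3, (1, 2): 1, (2, 3): 2}
start = {0: "A", 1: "B", 2: "A", 3: "B"}


def cut(p):
    return sum(w for (i, j), w in edges.items() if p[i] != p[j])


best, best_cost = anneal_partition(start, ["A", "B"], capacity=3, cost=cut)
print(best, best_cost)
```

Because the best partition seen so far is tracked separately from the current one, the returned cost can never exceed the starting cost, whatever the random walk does.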

What carries the argument

TPCCAP, the partitioning routine that optimizes the combined objective of weighted cut distance, communication-port overflow, and routed link-load congestion on the three-level graph model.

If this is right

Fewer remote two-qubit operations need to be scheduled across QPUs.
Communication ports are less likely to exceed their capacity.
Interconnect links carry more balanced traffic loads.
Local routing inside each QPU can proceed independently once the partition is fixed.
Schedule estimates can incorporate the actual topology of both intra- and inter-QPU links.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same three-level objective could be reused for dynamic re-partitioning if hardware allows mid-circuit qubit movement.
Weight tuning experiments on synthetic benchmarks with varying port counts would reveal which term dominates for realistic interconnect densities.
Extending the model to include qubit reset or measurement costs could further tighten the link between compiler output and hardware runtime.
Classical distributed-memory compilers face analogous port and congestion constraints; the TPCCAP structure might transfer directly.

Load-bearing premise

That minimizing the weighted combination of cut distance, port overflow, and congestion on the abstract three-level graphs produces mappings that improve actual execution on modular hardware.

What would settle it

Compile the same circuit with TPCCAP and with a standard single-graph partitioner, then execute both versions on a calibrated multi-QPU simulator or hardware testbed and compare measured total runtime or error rate.

Figures

Figures reproduced from arXiv: 2605.12583 by Soumyadip Sarkar, Subhasree Bhattacharjee.

**Figure 1.** Compilation paths represented in QuPort. The global path invokes Qiskit over one directed physical coupling map. The distributed path preserves cross-QPU gates as remote events and routes only inside each QPU.

**Figure 2.** Three graph views used by QuPort. The logical graph stores circuit interaction weights, the physical coupling map is the directed graph passed to Qiskit, and the QPU graph is used for hop distance, traffic, congestion, and scheduling. The figure is illustrative.

4 Partitioning Objective. The basic remote-interaction cut of a partition is $\mathrm{cut}(\pi) = \sum_{(i,j) \in E_L} w_{ij}\, \mathbf{1}[\pi(i) \neq \pi(j)]$ (12). Cut weight alone does no…
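The cut in Eq. (12) can be evaluated directly from an edge-weight map and a partition. The sketch below assumes the logical graph is stored as a dict keyed by qubit pairs and the partition as a dict from qubit to QPU label; both representations and the function name `cut_weight` are illustrative choices, not QuPort's API.

```python
def cut_weight(logical_edges, partition):
    """Sum the weights of logical edges whose endpoints sit on different QPUs."""
    return sum(
        weight
        for (i, j), weight in logical_edges.items()
        if partition[i] != partition[j]
    )


edges = {(0, 1): 3, (1, 2): 1, (2, 3): 2}
pi = {0: "A", 1: "A", 2: "B", 3: "B"}
print(cut_weight(edges, pi))  # 1: only edge (1, 2) crosses QPUs
```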

**Figure 3.** Terms optimized by TPCCAP. Each term is computed from the logical interaction graph, the QPU partition, and the QPU-level interconnect graph.

5 Algorithms. This section describes the algorithms present in QuPort. The descriptions use mathematical names for clarity, but each algorithm corresponds to a concrete component of the implementation at the referenced repository snapshot [13]. 5.1 Heavy-edge cluster…
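The excerpt is cut off before the paper's heavy-edge clustering is described, so the following is only a generic heavy-edge heuristic of the kind used in multilevel partitioners [8]: visit edges from heaviest to lightest and merge their endpoints while a size cap allows it. The function name, the `max_size` cap, and the tie-breaking rule are assumptions, not details of QuPort.

```python
def heavy_edge_clusters(logical_edges, num_qubits, max_size):
    """Greedily merge the endpoints of heavy edges, subject to a size cap."""
    parent = list(range(num_qubits))
    size = [1] * num_qubits

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Visit edges from heaviest to lightest; ties are broken by the edge key.
    ordered = sorted(logical_edges.items(), key=lambda kv: (-kv[1], kv[0]))
    for (i, j), _weight in ordered:
        ri, rj = find(i), find(j)
        if ri != rj and size[ri] + size[rj] <= max_size:
            parent[rj] = ri
            size[ri] += size[rj]

    clusters = {}
    for q in range(num_qubits):
        clusters.setdefault(find(q), []).append(q)
    return sorted(clusters.values())


edges = {(0, 1): 5, (1, 2): 1, (2, 3): 4, (3, 4): 2}
print(heavy_edge_clusters(edges, num_qubits=5, max_size=2))
# [[0, 1], [2, 3], [4]]
```

Clustering strongly interacting qubits first shrinks the graph that the later partitioning and refinement steps have to search.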

**Figure 4.** Remote-round feasibility in the topology-aware estimator. A round must respect endpoint communication-port limits and link-capacity limits along the chosen QPU-network paths.

The scalar cost model used in the global compilation path is also abstract:

$C_{\mathrm{local}} = \tau_1 n_1 + \tau_2 n_2 + \tau_{\mathrm{swap}} n_{\mathrm{swap}}$, (22)
$C_{\mathrm{remote}} = n_{\mathrm{remote}} (\tau_E + \tau_C + \tau_R)$, (23)
$C_{\mathrm{total}} = C_{\mathrm{local}} + C_{\mathrm{remote}} + 0.1\, d_C\, \tau_2$, (24)

where $d_C$ is circuit depth. These equa…
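Both ideas in this caption are small enough to sketch. The first function transcribes Eqs. (22) to (24) term by term; the second checks whether a set of remote operations fits in one round under endpoint port limits and per-link capacities. Function names, argument names, the path-as-list representation, and all numeric values are placeholders chosen for illustration, not values or interfaces from the paper.

```python
from collections import Counter


def scalar_cost(n1, n2, n_swap, n_remote, depth,
                tau1, tau2, tau_swap, tau_e, tau_c, tau_r):
    """Abstract total cost following Eqs. (22)-(24)."""
    c_local = tau1 * n1 + tau2 * n2 + tau_swap * n_swap  # Eq. (22)
    c_remote = n_remote * (tau_e + tau_c + tau_r)        # Eq. (23)
    return c_local + c_remote + 0.1 * depth * tau2       # Eq. (24)


def round_is_feasible(paths, port_limit, link_capacity):
    """Check one round of remote operations.

    paths: one QPU path [src, ..., dst] per remote operation.
    port_limit: {qpu: ports available per round} (assumed representation).
    link_capacity: {frozenset({a, b}): operations per round} (assumed).
    """
    port_use, link_use = Counter(), Counter()
    for path in paths:
        # Only the two endpoints consume a communication port.
        port_use[path[0]] += 1
        port_use[path[-1]] += 1
        for a, b in zip(path, path[1:]):
            link_use[frozenset((a, b))] += 1
    ports_ok = all(n <= port_limit[q] for q, n in port_use.items())
    links_ok = all(n <= link_capacity[e] for e, n in link_use.items())
    return ports_ok and links_ok


ports = {"A": 1, "B": 2, "C": 1}
links = {frozenset(("A", "B")): 1, frozenset(("B", "C")): 1}
print(round_is_feasible([["A", "B"], ["B", "C"]], ports, links))       # True
print(round_is_feasible([["A", "B", "C"], ["A", "B"]], ports, links))  # False
```

In the second call QPU A would need two ports and link A-B would carry two operations, so the pair has to be split across rounds.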

Original abstract

Modular quantum processors require a compiler to reason about two resources at the same time: local device connectivity and communication across QPUs. A mapping that is acceptable on a single coupling graph may be unsuitable for a modular machine if it creates excessive cross-QPU traffic, concentrates that traffic on a small number of interconnect links, or assigns many boundary qubits to a QPU with few communication ports. This paper presents QuPort, a Python and Qiskit-based compilation framework that studies this setting through an explicit three-level model: a weighted logical interaction graph, a directed physical coupling map, and an undirected QPU-level interconnect graph. The main partitioning method, TPCCAP, optimizes the implemented objective formed by weighted cut distance, communication-port overflow, and routed link-load congestion. The framework also includes heavy-edge clustering, balanced greedy partitioning, simulated-annealing refinement, communication-port-aware layout, extraction of remote two-qubit operations, local-only routing of per-QPU circuits, and topology-aware schedule estimation. The model is a compiler-level abstraction. It does not claim a calibrated hardware runtime or an implementation of a physical remote-gate protocol.
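The extraction of remote two-qubit operations can be pictured with a toy gate list. The sketch below represents gates as `(name, qubits)` tuples rather than Qiskit instructions, which is a simplifying assumption, and the function name `extract_remote_gates` is hypothetical. Gates whose qubits share a QPU go to that QPU's local list; the rest are recorded as remote events with their original position.

```python
def extract_remote_gates(gates, partition):
    """Split a gate list into per-QPU local gate lists and cross-QPU remote events."""
    local = {}
    remote = []
    for index, (name, qubits) in enumerate(gates):
        qpus = {partition[q] for q in qubits}
        if len(qpus) == 1:
            # Every operand lives on one QPU, so the gate stays local.
            local.setdefault(qpus.pop(), []).append((name, qubits))
        else:
            # Keep the original index so a scheduler can restore ordering.
            remote.append((index, name, qubits))
    return local, remote


gates = [("h", (0,)), ("cx", (0, 1)), ("cx", (1, 2)), ("cx", (2, 3))]
partition = {0: "A", 1: "A", 2: "B", 3: "B"}
local, remote = extract_remote_gates(gates, partition)
print(local)   # {'A': [('h', (0,)), ('cx', (0, 1))], 'B': [('cx', (2, 3))]}
print(remote)  # [(2, 'cx', (1, 2))]
```

Each local list can then be routed on its own device, which matches the abstract's description of local-only routing of per-QPU circuits.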

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

QuPort gives a clean three-level graph model and TPCCAP optimizer for multi-QPU compilation, but the paper stops at the algorithm description with zero benchmarks or comparisons.

The paper's core contribution is a three-level modeling approach for compiling to modular quantum hardware: a weighted logical interaction graph, per-QPU physical coupling maps, and an interconnect graph between QPUs. TPCCAP then partitions while minimizing a weighted sum of cut distance, communication-port overflow, and routed link-load congestion. It layers on standard steps such as heavy-edge clustering, greedy balanced partitioning, simulated-annealing refinement, port-aware layout, remote-gate extraction, and local routing. This setup is more explicit than most single-device mappers in the literature it cites, and the objective is stated directly in graph terms without fitted parameters or circular self-references. That part is solid and worth noting for anyone building compilers in this space.

The main limitation is the total absence of results. The manuscript describes the framework and claims it optimizes the stated objective, but reports no circuit examples, no cost numbers, no baseline comparisons, and not even small-scale simulations. Without those, it is impossible to judge whether the model produces meaningfully better mappings or whether the chosen weights are reasonable. The authors correctly frame the work as a compiler abstraction rather than a hardware claim, so the missing validation does not create an internal contradiction. Still, for a compilation paper the evaluation gap is large.

This is the kind of work that belongs in a reading group for people actively working on multi-QPU compilers or scaling roadmaps. I would cite the modeling framework if I were extending similar ideas, but not any performance result. It deserves a serious referee because the three-level formulation is new and the problem it targets is central, provided the authors add concrete experiments in revision.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces QuPort, a Qiskit-based compilation framework for modular multi-QPU quantum systems. It defines a three-level graph model (weighted logical interaction graph, directed physical coupling map, undirected QPU interconnect graph) and presents TPCCAP as the main partitioning algorithm that optimizes an explicit objective combining weighted cut distance, communication-port overflow, and routed link-load congestion. The framework additionally incorporates heavy-edge clustering, balanced greedy partitioning, simulated-annealing refinement, port-aware layout, remote two-qubit operation extraction, local-only per-QPU routing, and topology-aware schedule estimation. The work is framed strictly as a compiler-level abstraction without claims of calibrated hardware runtimes or physical remote-gate protocols.

Significance. The explicit three-level model and the joint optimization of cut distance, port overflow, and congestion address a timely compiler challenge for scaling quantum systems via modularity. The formulation of the objective in terms of directly measurable graph quantities, with the term weights as its only free parameters, is a strength that could serve as a reusable foundation for future modular compilers. However, the absence of any quantitative benchmarks or baseline comparisons limits the demonstrated impact to the design description alone.

major comments (1)

[Evaluation section] The manuscript reports no quantitative benchmarks, comparisons to existing partitioners, or metrics (e.g., achieved objective values, port utilization, or congestion levels) on any circuit suite. Because the central claim is that TPCCAP optimizes the stated objective, the lack of even synthetic-graph results or ablation studies on the three components makes it impossible to verify that the implemented algorithm produces the intended improvements.

minor comments (2)

[Abstract] The phrase 'the implemented objective' is used without an equation or explicit weight values; adding a compact definition (e.g., min w1·cut + w2·overflow + w3·load) would improve immediate readability.
[Model definition] Notation: the three-level graph model is introduced with descriptive names but without a single consolidated table or figure that lists all symbols (G_L, G_P, G_Q, etc.) and their edge-weight interpretations; a dedicated notation table would reduce ambiguity for readers.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that the absence of quantitative benchmarks limits the ability to verify the effectiveness of TPCCAP and will revise the manuscript to include a dedicated Evaluation section with benchmarks, comparisons, and ablation studies.

Referee: [Evaluation section] The manuscript reports no quantitative benchmarks, comparisons to existing partitioners, or metrics (e.g., achieved objective values, port utilization, or congestion levels) on any circuit suite. Because the central claim is that TPCCAP optimizes the stated objective, the lack of even synthetic-graph results or ablation studies on the three components makes it impossible to verify that the implemented algorithm produces the intended improvements.

Authors: We acknowledge that this observation is correct and that the current manuscript is limited to a design description of the three-level model and TPCCAP without empirical results. In the revised version we will add a full Evaluation section containing: (i) experiments on synthetic interaction graphs that report the achieved values of the composite objective (weighted cut distance + port overflow + routed congestion), (ii) direct comparisons against baseline partitioners (e.g., METIS and a greedy multi-level variant) using the same three-level graph model, (iii) per-metric breakdowns of port utilization and link-load congestion, and (iv) ablation studies that isolate the contribution of each term in the objective. These additions will allow readers to verify that the implemented algorithm produces the intended improvements.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

The paper defines TPCCAP as a partitioning method that directly optimizes an explicitly constructed objective (weighted cut distance + port overflow + routed link-load congestion) on a three-level graph model. This is a standard algorithmic definition with no reduction of any claimed prediction or result to a fitted parameter, self-citation chain, or self-referential definition. No equations or steps are shown that rename a known result, smuggle an ansatz via prior work, or import uniqueness from the authors' own citations. The framework description (heavy-edge clustering, simulated annealing, etc.) is self-contained as a compiler abstraction and does not rely on external benchmarks or prior self-citations to establish its central claim.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework builds on standard graph-partitioning algorithms and Qiskit primitives; the main addition is the integrated multi-objective cost for modular systems. No new physical entities are postulated.

free parameters (1)

objective weights
The TPCCAP objective is a weighted sum of cut distance, port overflow, and congestion; the weights must be chosen or tuned.

axioms (1)

Domain assumption: communication costs in modular quantum systems can be accurately captured by graph-cut distance, port counts, and link-load congestion.
This assumption underpins the TPCCAP objective and the three-level model.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear

The implemented TPCCAP objective is J(π) = α Σ w_ij d(π(i),π(j)) + β Σ max(0,b_q−P)^2 + η Σ L_e^2
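A small worked evaluation helps read this formula. The sketch below rests on several assumptions that the passage does not confirm: d is taken as hop distance on the QPU interconnect graph, b_q as the number of boundary qubits on QPU q, P as a per-QPU port count (the passage writes a single P), and L_e as the total edge weight routed over link e along one breadth-first shortest path. The function names and toy values are hypothetical.

```python
from collections import deque


def shortest_path(links, src, dst):
    """One BFS shortest path between QPUs; assumes the QPU graph is connected."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def tpccap_objective(edges, partition, links, ports, alpha, beta, eta):
    """Evaluate J(pi) under the assumptions stated in the lead-in."""
    distance_term = 0.0
    boundary = {}  # QPU -> qubits with at least one cross-QPU edge
    load = {}      # link -> total routed weight
    for (i, j), w in edges.items():
        a, b = partition[i], partition[j]
        if a == b:
            continue  # local edges contribute nothing
        path = shortest_path(links, a, b)
        distance_term += w * (len(path) - 1)
        boundary.setdefault(a, set()).add(i)
        boundary.setdefault(b, set()).add(j)
        for u, v in zip(path, path[1:]):
            key = frozenset((u, v))
            load[key] = load.get(key, 0.0) + w
    overflow_term = sum(max(0, len(qs) - ports[q]) ** 2
                        for q, qs in boundary.items())
    congestion_term = sum(value ** 2 for value in load.values())
    return alpha * distance_term + beta * overflow_term + eta * congestion_term


links = [("A", "B"), ("B", "C")]
ports = {"A": 1, "B": 1, "C": 1}
edges = {(0, 2): 2, (1, 2): 1, (1, 3): 1}
partition = {0: "A", 1: "A", 2: "B", 3: "C"}
print(tpccap_objective(edges, partition, links, ports,
                       alpha=1.0, beta=10.0, eta=0.5))  # 23.5
```

In this toy case the distance term is 5, QPU A has two boundary qubits against one port (overflow 1), and the link loads are 4 and 1 (squared sum 17), giving 5 + 10·1 + 0.5·17 = 23.5.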

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

[1] Caleffi, M., Amoretti, M., Ferrari, D., Illiano, J., Manzalini, A., Cacciapuoti, A.S.: Distributed quantum computing: A survey. Computer Networks 254, 110672 (2024). https://doi.org/10.1016/j.comnet.2024.110672
[2] Cirac, J.I., Ekert, A.K., Huelga, S.F., Macchiavello, C.: Distributed quantum computation over noisy channels. Physical Review A 59, 4249–4254 (1999). https://doi.org/10.1103/PhysRevA.59.4249
[3] Davarzani, Z., Zomorodi, M., Houshmand, M.: A hierarchical approach for building distributed quantum systems. Scientific Reports 12, 15421 (2022). https://doi.org/10.1038/s41598-022-18989-w
[4] Ferrari, D., Cacciapuoti, A.S., Amoretti, M., Caleffi, M.: Compiler design for distributed quantum computing. IEEE Transactions on Quantum Engineering 2, 1–20 (2021). https://doi.org/10.1109/TQE.2021.3053921
[5] Ferrari, D., Carretta, S., Amoretti, M.: A modular quantum compilation framework for distributed quantum computing. IEEE Transactions on Quantum Engineering 4, 1–13 (2023). https://doi.org/10.1109/TQE.2023.3303935
[6] IBM Quantum: CouplingMap. Qiskit API Documentation (2026), https://quantum.cloud.ibm.com/docs/en/api/qiskit/2.3/qiskit.transpiler.CouplingMap
[7] IBM Quantum: Introduction to transpilation. IBM Quantum Documentation (2026), https://quantum.cloud.ibm.com/docs/en/guides/transpile
[8] Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing 20(1), 359–392 (1998). https://doi.org/10.1137/S1064827595287997
[9] Li, G., Ding, Y., Xie, Y.: Tackling the qubit mapping problem for NISQ-era quantum devices. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 1001–1014 (2019). https://doi.org/10.1145/3297858.3304023
[10] Main, D., Drmota, P., Nadlinger, D.P., Ainley, E.M., Agrawal, A., Nichol, B.C., Srinivas, R., Araneda, G., Lucas, D.M.: Distributed quantum computing across an optical network link. Nature 638, 383–388 (2025). https://doi.org/10.1038/s41586-024-08404-x
[11] Mollenhauer, M., Irfan, A., Cao, X., Mandal, S., Pfaff, W.: A high-efficiency elementary network of interchangeable superconducting qubit devices. Nature Electronics 8, 610–619 (2025). https://doi.org/10.1038/s41928-025-01404-3
[12] Monroe, C., Raussendorf, R., Ruthven, A., Brown, K.R., Maunz, P., Duan, L.M., Kim, J.: Large-scale modular quantum-computer architecture with atomic memory and photonic interconnects. Physical Review A 89, 022317 (2014). https://doi.org/10.1103/PhysRevA.89.022317
[13] Sarkar, S.: Quport: multi-qpu circuit mapping, routing, splitting, scheduling, and benchmarking. GitHub repository (2026), https://github.com/neuralsorcerer/quport
[14] Sivarajah, S., Dilkes, S., Cowtan, A., Simmons, W., Edgington, A., Duncan, R.: t|ket>: A retargetable compiler for NISQ devices. Quantum Science and Technology 6(1), 014003 (2020). https://doi.org/10.1088/2058-9565/ab8e92

A Notation

Table 1. Notation used in the manuscript.

Symbol: Meaning
Q_L: logical-qubit set
G_L: weighted logical interaction graph
G_Q: undir...