Incidence Constraints in Hypergraph Partitioning on GPU

Cristina Silvano; Marco Ronzani

arxiv: 2604.14411 · v1 · submitted 2026-04-15 · 💻 cs.DC

Incidence Constraints in Hypergraph Partitioning on GPU

Marco Ronzani , Cristina Silvano This is my paper

Pith reviewed 2026-05-10 11:40 UTC · model grok-4.3

classification 💻 cs.DC

keywords hypergraph partitioningGPU accelerationmulti-level algorithmincidence structurepartition size constraintsinbound hyperedge uniquenessset sparsityconnectivity quality

0 comments

The pith

Materializing the hypergraph incidence structure on GPU allows efficient enforcement of bounded partition size and distinct inbound hyperedge constraints during multi-level partitioning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a GPU version of a multi-level hypergraph partitioning algorithm built specifically around problems that demand fixed maximum sizes per part and unique inbound hyperedges. By storing the full incidence relations in memory and using the sparsity of the sets involved, the implementation handles the nested loops and set operations that arise when enforcing those rules. This design produces runtimes up to 940 times faster than a sequential multi-level partitioner while also returning partitions whose connectivity is 2 to 26 percent better. Readers interested in large-scale optimization or parallel graph algorithms would notice that the same incidence-materialization pattern could make previously intractable hypergraph instances solvable on commodity hardware.

Core claim

We implement a multi-level hypergraph partitioning algorithm on GPU targeting bounded per-partition size and distinct inbound hyperedges. Manipulating hypergraphs requires long orders of nested iterations, and enforcing these constraints introduces further set operations amidst them. We design algorithms around our problem's specifics, materializing the hypergraph's incidence structure in memory and exploiting set sparsity, yielding competitive speedups as high as 940x and 2-26% better results in connectivity over a sequential multi-level partitioner.

What carries the argument

The materialized incidence structure of the hypergraph, stored directly in GPU memory, which supports sparse set operations to enforce the bounded-size and distinct-inbound constraints inside the multi-level coarsening and refinement passes.

If this is right

Hypergraph instances that previously required hours on a CPU can be partitioned in minutes on a GPU while satisfying the size and inbound constraints.
The same sparsity-driven approach yields measurably higher-quality partitions than the sequential baseline on the tested problem class.
The algorithm remains practical only when the explicit incidence matrix does not exceed available GPU memory for the given hypergraph density.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The incidence-materialization pattern may transfer to other constrained partitioning tasks, such as those appearing in circuit design or social-network community detection, when run on parallel accelerators.
If hypergraphs exhibit even higher sparsity than the evaluated instances, the method could scale to graphs whose vertex and edge counts are an order of magnitude larger than current sequential limits.
Replacing the explicit incidence matrix with an on-the-fly sparse representation could further reduce memory pressure while preserving the observed speedups.

Load-bearing premise

Materializing the incidence structure and exploiting set sparsity on the GPU will enforce the bounded-size and distinct-inbound constraints efficiently without prohibitive memory usage or loss of solution quality.

What would settle it

Compare runtimes and final connectivity scores of the GPU implementation against the sequential multi-level partitioner on the same collection of hypergraphs; if speedups fall below 100x or connectivity degrades for any instance whose incidence matrix fits in GPU memory, the central efficiency claim does not hold.

Figures

Figures reproduced from arXiv: 2604.14411 by Cristina Silvano, Marco Ronzani.

**Figure 2.** Figure 2: Matching over the two-cycle pseudo-forest. [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Events-based moves validity check. favorable move is one that fully disconnects h-edges from the node’s current partition while introducing cheaper connections, if any, to the new partition. To find such moves, a node must count the number of pins each of its incident h-edges owns in every partition. Let 𝑝𝑖𝑛𝑠(𝑝, 𝑒) = |{𝑛 ∈ 𝑒 | 𝑛 ∈ 𝑝}| be the pins count held by h-edge 𝑒 in partition 𝑝. Then, for every node … view at source ↗

**Figure 4.** Figure 4: Partitioning results comparison across ten spiking neural network hypergraphs. [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

read the original abstract

Hypergraph partitioning is a pervasive NP-hard problem, and accelerating its computation on GPU can both slice time-to-solution and raise quality of results. In this work, we implement a multi-level hypergraph partitioning algorithm on GPU targeting a specific set of problem constraints: bounded per-partition size and distinct inbound hyperedges. Manipulating hypergraphs requires long orders of nested iterations, and enforcing these constraints introduces further set operations amidst them. Hence, we design algorithms around our problem's specifics, materializing the hypergraph's incidence structure in memory and exploiting set sparsity. Our results show competitive speedups as high as 940x and 2-26% better results in connectivity over a sequential multi-level partitioner.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GPU hypergraph partitioner materializes incidence for two constraints and claims big speedups, but experiments lack any setup details so the numbers and memory risks cannot be checked.

read the letter

The paper implements a multi-level hypergraph partitioning algorithm on GPU that directly targets bounded per-partition size and distinct inbound hyperedges. It does this by materializing the incidence structure and exploiting sparsity to avoid slow nested set operations on the device. That choice fits the hardware constraints and the specific problem class they care about. The reported outcomes are up to 940x speedup and 2-26% better connectivity than a sequential multi-level partitioner. Those are the concrete results on offer. The engineering focus on making the constraints GPU-friendly is the part that feels like real work rather than just porting existing code. Materialization lets them enforce the rules in parallel without repeated host-device traffic, and sparsity keeps the memory footprint manageable for the instances they ran. That is a legitimate, if narrow, extension of standard multi-level methods. The main problems sit in the evidence. The abstract states performance and quality numbers but supplies no hardware specs, no description of the hypergraphs tested, no baseline implementation details, no run counts, and no variance numbers. Without those, the 940x figure and the connectivity edge are impossible to assess for post-hoc tuning or missing controls. The stress-test concern about memory is reasonable on the information given. If the incidence matrix is not sparse enough on larger instances, materialization will either OOM or force approximations that change the feasible space and invalidate the quality comparison. This is an empirical implementation report, not a new framework or derivation, so the novelty stays at the level of hardware-aware tweaks for these two constraints. It is aimed at practitioners who already need fast partitioning under exactly these rules in distributed systems or scientific computing workloads. A reader who wants general techniques or reproducible benchmarks will get little from it. I would send it to peer review. The speedups are large enough that a referee should see the full experimental section and any memory profiling, but it clearly needs that scrutiny and probably code release before it is publishable.

Referee Report

2 major / 2 minor

Summary. The paper presents a GPU implementation of a multi-level hypergraph partitioning algorithm specialized to bounded per-partition sizes and distinct inbound hyperedges. It materializes the hypergraph incidence structure in memory and exploits set sparsity to handle the required nested iterations and set operations, reporting speedups up to 940x and 2-26% better connectivity relative to a sequential multi-level partitioner.

Significance. If the reported speedups and quality gains are reproducible on the evaluated instances, the work demonstrates a practical route for accelerating constrained hypergraph partitioning on GPU. Materializing the incidence matrix while respecting sparsity is a direct way to enforce the problem-specific constraints without custom data structures, which could be useful in domains such as circuit design or clustering where these constraints appear.

major comments (2)

[Results / Experimental Evaluation] The abstract and results claim up to 940x speedup and improved connectivity, yet the manuscript provides no description of the experimental setup, including hypergraph sizes, sparsity metrics, GPU memory footprint during incidence materialization, hardware specifications, or statistical analysis of the runs. This directly bears on whether the materialization approach scales without OOM or quality loss for the tested instances.
[Results] The comparison is drawn only against a sequential multi-level partitioner. Without additional baselines (e.g., other GPU or parallel hypergraph partitioners, or ablations that disable incidence materialization), it is difficult to isolate the contribution of the sparsity-exploiting design versus other implementation choices.

minor comments (2)

[Abstract] The abstract states 'competitive speedups' without defining the baseline or the metric used for 'competitive.'
[Algorithm Design] Notation for incidence matrix and set operations should be introduced with explicit definitions before the algorithm descriptions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the experimental evaluation. We address each major comment below and will revise the manuscript accordingly to improve reproducibility and strengthen the baselines.

read point-by-point responses

Referee: [Results / Experimental Evaluation] The abstract and results claim up to 940x speedup and improved connectivity, yet the manuscript provides no description of the experimental setup, including hypergraph sizes, sparsity metrics, GPU memory footprint during incidence materialization, hardware specifications, or statistical analysis of the runs. This directly bears on whether the materialization approach scales without OOM or quality loss for the tested instances.

Authors: We agree that a more complete description of the experimental setup is needed. In the revised manuscript we will expand the Experimental Evaluation section with a dedicated subsection that reports hypergraph sizes (vertices, hyperedges, and target partition counts for each benchmark), sparsity metrics of the incidence structure (average degree and materialized matrix density), peak GPU memory footprint during incidence materialization, hardware specifications (GPU model, VRAM capacity, CUDA version), and statistical analysis (means and standard deviations over ten independent runs with different random seeds). These additions will explicitly confirm that materialization remains within memory limits for the evaluated instances. revision: yes
Referee: [Results] The comparison is drawn only against a sequential multi-level partitioner. Without additional baselines (e.g., other GPU or parallel hypergraph partitioners, or ablations that disable incidence materialization), it is difficult to isolate the contribution of the sparsity-exploiting design versus other implementation choices.

Authors: We acknowledge that additional baselines would help isolate the contribution of the incidence-materialization approach. We will add an ablation study that disables incidence materialization while keeping all other GPU kernels unchanged, thereby quantifying its specific impact on runtime and connectivity. Regarding other GPU partitioners, we will insert a discussion noting that, to our knowledge, no publicly available GPU multi-level hypergraph partitioner supports the bounded per-partition size and distinct-inbound-hyperedge constraints required by our target applications; direct comparison would therefore require substantial re-engineering. As a practical parallel baseline we will also report results from a multi-threaded CPU implementation using OpenMP on the same host. revision: partial

Circularity Check

0 steps flagged

Empirical implementation report with no derivations or self-referential claims

full rationale

The paper describes a GPU implementation of a multi-level hypergraph partitioning algorithm that materializes the incidence structure to enforce bounded partition sizes and distinct inbound hyperedges, then reports measured speedups (up to 940x) and connectivity improvements (2-26%) versus a sequential baseline. No equations, predictions, fitted parameters, uniqueness theorems, or ansatzes are present; the central claims rest entirely on direct experimental timing and quality measurements on specific instances. Because the work contains no derivation chain that could reduce to its own inputs by construction, the circularity score is 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the standard assumption that multi-level coarsening-refinement is an effective heuristic for hypergraph partitioning and that GPU memory can hold the incidence structure for the problem sizes considered.

axioms (1)

domain assumption Multi-level coarsening, partitioning, and refinement is an effective heuristic for the NP-hard hypergraph partitioning problem.
Invoked implicitly as the base algorithm being ported to GPU.

pith-pipeline@v0.9.0 · 5405 in / 1228 out tokens · 40488 ms · 2026-05-10T11:40:30.135186+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Hypergraph Partitioning on GPU with Distinct Incident Hyperedges and Size Constraints
cs.DC 2026-05 unverdicted novelty 5.0

GPU algorithm for hypergraph partitioning with size and distinct hyperedge constraints achieves 380x speedup and 1.2-2.0x better connectivity than sequential methods.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

More recent advances in (hyper)graph partitioning,

U. Çatalyüreket al., “More recent advances in (hyper)graph partitioning, ”ACM Comput. Surv., vol. 55, no. 12, Mar. 2023

work page 2023
[2]

ghypart: Gpu-friendly end-to-end hypergraph partitioner,

Z. Wuet al., “ghypart: Gpu-friendly end-to-end hypergraph partitioner, ”ACM Trans. Archit. Code Optim., vol. 22, no. 1, Mar. 2025

work page 2025
[3]

A Case for Hypergraphs to Model and Map SNNs on Neuromorphic Hardware

M. Ronzani and C. Silvano, “A case for hypergraphs to model and map snns on neuromorphic hardware, ” 2026, https://arxiv.org/abs/2601.16118

work page internal anchor Pith review Pith/arXiv arXiv 2026
[4]

Mapping very large scale spiking neuron network to neuromorphic hardware,

O. Jinet al., “Mapping very large scale spiking neuron network to neuromorphic hardware, ” inProceedings of the 28th ASPLOS Conference. ACM, 2023, p. 419–432

work page 2023
[5]

Multilevel hypergraph partitioning: Applications in vlsi domain,

G. Karypiset al., “Multilevel hypergraph partitioning: Applications in vlsi domain, ” IEEE Trans. on VLSI Systems, vol. 7, no. 1, pp. 69–79, 1999

work page 1999
[6]

The decomposition and combination paradigms of chiplet-based integrated chips,

F. Liet al., “The decomposition and combination paradigms of chiplet-based integrated chips, ”Integrated Circuits and Systems, vol. 1, no. 1, pp. 18–30, 2024

work page 2024
[7]

Hyperg: Multilevel gpu-accelerated k-way hypergraph parti- tioner,

W. L. Leeet al., “Hyperg: Multilevel gpu-accelerated k-way hypergraph parti- tioner, ” inProceedings of the 30th ASP-DAC. ACM, 2025, p. 1031–1040

work page 2025
[8]

Multilevel k-way hypergraph partitioning,

G. Karypis and V. Kumar, “Multilevel k-way hypergraph partitioning, ” inProceed- ings of the 36th Annual ACM/IEEE DAC. ACM, 1999, p. 343–348

work page 1999
[9]

High-quality hypergraph partitioning,

S. Schlaget al., “High-quality hypergraph partitioning, ”ACM J. Exp. Algorithmics, vol. 27, Feb. 2023

work page 2023
[10]

A linear-time heuristic for improving network partitions,

C. Fiduccia and R. Mattheyses, “A linear-time heuristic for improving network partitions, ” in19th DAC, 1982, pp. 175–181

work page 1982
[11]

An accelerated procedure for hypergraph coarsening on the gpu,

L. Chenget al., “An accelerated procedure for hypergraph coarsening on the gpu, ” in2015 IEEE HPEC Conference, 2015, pp. 1–7

work page 2015
[12]

Cyganet al.,Parameterized Algorithms

M. Cyganet al.,Parameterized Algorithms. Springer International Pub., Jul. 2015

work page 2015
[13]

open-source artifact,

M. Ronzani, “open-source artifact, ” https://github.com/EMJzero/AxonCUDA, 2026

work page 2026

[1] [1]

More recent advances in (hyper)graph partitioning,

U. Çatalyüreket al., “More recent advances in (hyper)graph partitioning, ”ACM Comput. Surv., vol. 55, no. 12, Mar. 2023

work page 2023

[2] [2]

ghypart: Gpu-friendly end-to-end hypergraph partitioner,

Z. Wuet al., “ghypart: Gpu-friendly end-to-end hypergraph partitioner, ”ACM Trans. Archit. Code Optim., vol. 22, no. 1, Mar. 2025

work page 2025

[3] [3]

A Case for Hypergraphs to Model and Map SNNs on Neuromorphic Hardware

M. Ronzani and C. Silvano, “A case for hypergraphs to model and map snns on neuromorphic hardware, ” 2026, https://arxiv.org/abs/2601.16118

work page internal anchor Pith review Pith/arXiv arXiv 2026

[4] [4]

Mapping very large scale spiking neuron network to neuromorphic hardware,

O. Jinet al., “Mapping very large scale spiking neuron network to neuromorphic hardware, ” inProceedings of the 28th ASPLOS Conference. ACM, 2023, p. 419–432

work page 2023

[5] [5]

Multilevel hypergraph partitioning: Applications in vlsi domain,

G. Karypiset al., “Multilevel hypergraph partitioning: Applications in vlsi domain, ” IEEE Trans. on VLSI Systems, vol. 7, no. 1, pp. 69–79, 1999

work page 1999

[6] [6]

The decomposition and combination paradigms of chiplet-based integrated chips,

F. Liet al., “The decomposition and combination paradigms of chiplet-based integrated chips, ”Integrated Circuits and Systems, vol. 1, no. 1, pp. 18–30, 2024

work page 2024

[7] [7]

Hyperg: Multilevel gpu-accelerated k-way hypergraph parti- tioner,

W. L. Leeet al., “Hyperg: Multilevel gpu-accelerated k-way hypergraph parti- tioner, ” inProceedings of the 30th ASP-DAC. ACM, 2025, p. 1031–1040

work page 2025

[8] [8]

Multilevel k-way hypergraph partitioning,

G. Karypis and V. Kumar, “Multilevel k-way hypergraph partitioning, ” inProceed- ings of the 36th Annual ACM/IEEE DAC. ACM, 1999, p. 343–348

work page 1999

[9] [9]

High-quality hypergraph partitioning,

S. Schlaget al., “High-quality hypergraph partitioning, ”ACM J. Exp. Algorithmics, vol. 27, Feb. 2023

work page 2023

[10] [10]

A linear-time heuristic for improving network partitions,

C. Fiduccia and R. Mattheyses, “A linear-time heuristic for improving network partitions, ” in19th DAC, 1982, pp. 175–181

work page 1982

[11] [11]

An accelerated procedure for hypergraph coarsening on the gpu,

L. Chenget al., “An accelerated procedure for hypergraph coarsening on the gpu, ” in2015 IEEE HPEC Conference, 2015, pp. 1–7

work page 2015

[12] [12]

Cyganet al.,Parameterized Algorithms

M. Cyganet al.,Parameterized Algorithms. Springer International Pub., Jul. 2015

work page 2015

[13] [13]

open-source artifact,

M. Ronzani, “open-source artifact, ” https://github.com/EMJzero/AxonCUDA, 2026

work page 2026