pith. machine review for the scientific record.

arxiv: 2605.13690 · v1 · submitted 2026-05-13 · 💻 cs.LG · cs.AI

Recognition: no theorem link

The WidthWall: A Strict Expressivity Hierarchy for Hypergraph Neural Networks

Basel Alomair, Bhaskar Ramasubramanian, Fengqing Jiang, Kaiyuan Zheng, Linda Bushnell, Luyao Niu, Radha Poovendran, Yichen Feng, Yuetai Li


Pith reviewed 2026-05-14 20:22 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords hypergraph neural networks · expressivity hierarchy · homomorphism densities · hypertree width · width wall · higher-order interactions · motif counting · graph invariants

The pith

Hypergraph neural networks cannot represent invariants beyond a fixed hypertree width.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that homomorphism densities, which count how often small structural motifs appear, generate all continuous hypergraph invariants. These densities sort the invariants into a strict hierarchy indexed by hypertree width. The resulting Width Wall sets a hard bound: no fixed-depth HGNN, regardless of hidden dimension or training procedure, can capture invariants that require wider patterns. The framework unifies analysis across fifteen existing architectures and shows exactly what information is lost when hypergraphs are reduced to ordinary graphs by clique expansion.

Core claim

Homomorphism densities generate all continuous hypergraph invariants and organize them into a strict hierarchy indexed by hypertree width. This produces a Width Wall that no fixed-depth hypergraph neural network can cross, even with arbitrary hidden dimensions or training.

What carries the argument

Homomorphism densities that quantify motif frequencies, arranged into a hierarchy by hypertree width.
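For orientation, here is a minimal brute-force sketch of that central object (not the paper's algorithm; the homomorphism convention used here, where the image of every motif hyperedge must itself be a hyperedge, is one common choice among several):

    from itertools import product

    def hom_density(F_nodes, F_edges, H_nodes, H_edges):
        """Brute-force homomorphism density t(F, H).

        F_edges, H_edges: collections of frozensets of node labels. A map
        phi: V(F) -> V(H) counts as a homomorphism when the image of every
        hyperedge of F is a hyperedge of H; the density normalizes the count
        by |V(H)|^|V(F)|. Exponential in |V(F)|, so small motifs only.
        """
        H_edge_set = {frozenset(e) for e in H_edges}
        count = 0
        for images in product(H_nodes, repeat=len(F_nodes)):
            phi = dict(zip(F_nodes, images))
            if all(frozenset(phi[v] for v in e) in H_edge_set for e in F_edges):
                count += 1
        return count / len(H_nodes) ** len(F_nodes)

    # A single 3-node motif against a toy 3-uniform hypergraph:
    t = hom_density(["a", "b", "c"], [frozenset("abc")],
                    [1, 2, 3, 4], [frozenset({1, 2, 3}), frozenset({2, 3, 4})])
    # t == 12 / 64: six surjections onto each of the two hyperedges.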

If this is right

  • Clique expansion necessarily discards information carried by patterns wider than the chosen clique size (see the sketch after this list).
  • Fifteen distinct HGNN architectures receive a uniform characterization by the maximum hypertree width they can access.
  • Density-aware architectures can represent invariants lying above the width bound of ordinary message passing.
  • No increase in model size or training data overcomes the limit for invariants requiring wider hypertrees.
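The clique-expansion point has a standard two-line demonstration (a textbook example, not reproduced from the paper): one 3-way hyperedge and a triangle of pairwise edges are different hypergraphs with identical clique expansions, so any pipeline that sees only the expansion cannot tell them apart.

    from itertools import combinations

    def clique_expansion(hyperedges):
        """Replace each hyperedge by the clique on its vertices (pairs only)."""
        return {frozenset(p) for e in hyperedges for p in combinations(sorted(e), 2)}

    H1 = [{1, 2, 3}]               # one genuine 3-way interaction
    H2 = [{1, 2}, {2, 3}, {1, 3}]  # three pairwise interactions
    assert clique_expansion(H1) == clique_expansion(H2)  # same graph either way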

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Node-classification performance on real hypergraphs should degrade precisely when the task requires patterns above the model's width.
  • Explicit computation of homomorphism densities offers one route to bypass the wall without increasing depth.
  • The same width-based separation may appear in other relational models that rely on local aggregation over higher-order relations.

Load-bearing premise

Homomorphism densities together with invariant approximation fully capture the continuous invariants relevant to HGNN expressivity, and the resulting hierarchy is strict for every architecture considered.

What would settle it

A fixed-depth HGNN that accurately distinguishes or classifies hypergraphs differing only by an invariant whose minimal representation requires a hypertree width strictly larger than the model's message-passing width.

Figures

Figures reproduced from arXiv: 2605.13690 by Basel Alomair, Bhaskar Ramasubramanian, Fengqing Jiang, Kaiyuan Zheng, Linda Bushnell, Luyao Niu, Radha Poovendran, Yichen Feng, Yuetai Li.

Figure 1. Density lift vs. mean hyperedge size k̄ (strict ablation: identical recipe, density branch on/off). Filled markers: headroom datasets; open markers: ceiling. The density branch provides the largest lift on Gene-Disease (+4.3%) and House (+2.2%), the two highest-k̄ datasets, and is correctly suppressed on the Senate ceiling case.
Figure 2. Gate dynamics validate the profiler's regime prediction.
Original abstract

Hypergraphs provide a natural framework to model higher-order interactions in scientific, social, and biological systems. Hypergraph neural networks (HGNNs) aim to learn from such data, yet it remains unclear which higher-order structures these models can represent. We show that hypergraph expressivity is governed by which small patterns an architecture can detect and count. We formalize this via homomorphism densities, which measure how often a structural motif appears in a hypergraph. Combining classical homomorphism-count completeness with invariant approximation, we show that homomorphism densities generate all continuous hypergraph invariants and organize them into a strict hierarchy indexed by hypertree width. This yields a Width Wall: a fundamental architectural limit beyond which no hidden dimension, training procedure or fixed-depth HGNN can represent invariants requiring wider patterns. Our framework provides a unified characterization of 15 HGNN architectures, precisely identifies information lost by clique expansion, and motivates density-aware models that extend expressivity beyond bounded-width message passing. We experimentally validate this finding on an APPLICATION NODE CLASSIFICATION SUITE of real-world hypergraphs, where the Width Wall predicts when graph-reduction baselines fail and when density features help.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that hypergraph expressivity is governed by homomorphism densities, which measure motif occurrences. Combining classical homomorphism-count completeness with invariant approximation, it shows these densities generate all continuous hypergraph invariants and induce a strict hierarchy indexed by hypertree width. This establishes the Width Wall: no hidden dimension, training procedure, or fixed-depth HGNN can represent invariants requiring wider patterns. The framework unifies 15 HGNN architectures, identifies information lost by clique expansion, and is validated experimentally on a node classification suite of real-world hypergraphs where the wall predicts failure of graph-reduction baselines.

Significance. If the result holds, the work provides a fundamental, parameter-free characterization of HGNN limits via homomorphism densities and hypertree width. The unification of 15 architectures and precise identification of clique-expansion losses are notable strengths, as is the motivation for density-aware models. The experimental validation on real hypergraphs adds practical relevance by showing when the theoretical wall manifests in application node classification.

major comments (1)
  1. [Main theorem on invariant generation and Width Wall] The central derivation (combining classical homomorphism-count completeness with the invariant-approximation step) lacks explicit error bounds or full derivation details for the approximation. This is load-bearing for the claim that homomorphism densities generate all continuous invariants and that the resulting hypertree-width hierarchy is strict for every fixed-depth HGNN architecture.
minor comments (1)
  1. [Abstract] The abstract capitalizes 'APPLICATION NODE CLASSIFICATION SUITE' without defining it; clarify whether this is a named benchmark or rephrase for readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of the paper's contributions, including the unification of 15 HGNN architectures, the identification of clique-expansion losses, and the experimental validation on real-world hypergraphs. We address the major comment below and will revise the manuscript accordingly to strengthen the central derivation.

Point-by-point responses
  1. Referee: [Main theorem on invariant generation and Width Wall] The central derivation (combining classical homomorphism-count completeness with the invariant-approximation step) lacks explicit error bounds or full derivation details for the approximation. This is load-bearing for the claim that homomorphism densities generate all continuous invariants and that the resulting hypertree-width hierarchy is strict for every fixed-depth HGNN architecture.

    Authors: We agree that the proof of the main theorem would benefit from expanded details. In the revised manuscript we will provide a complete, self-contained derivation that first recalls the classical homomorphism-count completeness theorem and then explicitly constructs the invariant-approximation step. We will derive explicit error bounds using the uniform continuity of continuous invariants on the compact space of hypergraphs with bounded degree and size; these bounds will quantify the approximation error in terms of the modulus of continuity and the sampling density of the homomorphism densities. The revised proof will thereby rigorously establish both that homomorphism densities generate all continuous hypergraph invariants and that the induced hypertree-width hierarchy is strict for any fixed-depth HGNN, independent of hidden dimension or training procedure.

    revision: yes
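For readers wanting the shape of the promised argument, one standard route (a sketch under assumed graph-limit notation, not the authors' proof) runs through Stone–Weierstrass: densities multiply as disjoint unions of motifs, t(F₁, W) t(F₂, W) = t(F₁ ⊔ F₂, W), so they span an algebra containing the constants; if they also separate points of a compact limit space 𝒦, every continuous invariant f is uniformly approximable by a polynomial p in finitely many densities:

    \[
      \forall \varepsilon > 0 \;\exists\, p,\, F_1, \dots, F_m :\quad
      \sup_{W \in \mathcal{K}}
      \bigl| f(W) - p\bigl(t(F_1, W), \dots, t(F_m, W)\bigr) \bigr| < \varepsilon .
    \]
    % A quantitative version would pass through the modulus of continuity
    % \omega_f: if the chosen densities pin W down to accuracy \delta, the
    % approximation error is at most \omega_f(\delta).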

Circularity Check

0 steps flagged

Derivation self-contained via classical results

Full rationale

The paper combines classical homomorphism-count completeness with an invariant-approximation step to show that homomorphism densities generate all continuous hypergraph invariants and organize them into a strict hypertree-width hierarchy. No equations reduce the central claims (Width Wall, expressivity limits) to fitted parameters, self-citations, or definitional equivalences. The hierarchy is derived from standard expressivity theory applied to hypergraphs without internal reduction to the paper's own inputs or prior author work. Experimental validation is presented separately and does not bear the proof load.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim depends on the assumption that homomorphism densities generate all continuous hypergraph invariants and that the width-indexed hierarchy is strict; no free parameters or invented physical entities are introduced.

axioms (2)
  • domain assumption: Homomorphism densities generate all continuous hypergraph invariants.
    Invoked when combining classical homomorphism-count completeness with invariant approximation to obtain the hierarchy.
  • ad hoc to paper: The resulting hierarchy is strict and applies to all fixed-depth HGNN architectures.
    Required for the Width Wall to be a fundamental architectural limit.
invented entities (1)
  • Width Wall (no independent evidence)
    purpose: Name for the expressivity limit beyond which no fixed-depth HGNN can represent wider-pattern invariants.
    Conceptual label introduced to summarize the hierarchy result; no independent falsifiable prediction is attached.

pith-pipeline@v0.9.0 · 5531 in / 1383 out tokens · 55515 ms · 2026-05-14T20:22:38.410398+00:00 · methodology


Reference graph

Works this paper leans on

72 extracted references · 7 canonical work pages · 2 internal anchors

  1. P.-A. Absil, Robert Mahony, and Rodolphe Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
  2. Ryan P. Adams and Richard S. Zemel. Ranking via Sinkhorn propagation. arXiv preprint arXiv:1106.1925, 2011.
  3. Devanshu Arya, Deepak Kumar Gupta, Stevan Rudinac, and Marcel Worring. HyperSAGE: Generalizing inductive representation learning on hypergraphs. arXiv preprint arXiv:2010.04558, 2020.
  4. Chaewoon Bae, Doyun Choi, Jaehyun Lee, and Jaemin Yoo. Parameter-free hypergraph neural network for few-shot node classification. Advances in Neural Information Processing Systems (NeurIPS), 2025.
  5. Peter L. Bartlett, Dylan J. Foster, and Matus J. Telgarsky. Spectrally-normalized margin bounds for neural networks. Advances in Neural Information Processing Systems (NeurIPS), 2017.
  6. Federico Battiston, Giulia Cencetti, Iacopo Iacopini, Vito Latora, Maxime Lucas, Alice Patania, Jean-Gabriel Young, and Giovanni Petri. Networks beyond pairwise interactions: Structure and dynamics. Physics Reports, 874:1–92, 2020.
  7. Austin R. Benson, David F. Gleich, and Jure Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
  8. Cristian Bodnar, Fabrizio Frasca, Yu Guang Wang, Nina Otter, Guido Montufar, Pietro Liò, and Michael Bronstein. Weisfeiler and Lehman go topological: Message passing simplicial networks. International Conference on Machine Learning (ICML), 2021.
  9. Jan Böker. Color refinement, homomorphisms, and hypergraphs. Graph-Theoretic Concepts in Computer Science (WG), volume 11789 of LNCS, pages 338–350, 2019. doi: 10.1007/978-3-030-30786-8_26.
  10. Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478, 2021.
  11. Jin-Yi Cai, Martin Fürer, and Neil Immerman. An optimal lower bound on the number of variables for graph identification. Combinatorica, 12(4):389–410, 1992.
  12. Davi Castro-Silva, Fernando Mário de Oliveira Filho, Lucas Slot, and Frank Vallentin. A recursive theta body for hypergraphs. Combinatorica, 43:909–938, 2023.
  13. T.-H. Hubert Chan, Anand Louis, Zhihao Gavin Tang, and Chenzi Zhang. Spectral properties of hypergraph Laplacian and approximation algorithms. Journal of the ACM, 65(3), 2018.
  14. Eli Chien, Chao Pan, Jianhao Peng, and Olgica Milenkovic. You are AllSet: A multiset function framework for hypergraph neural networks. International Conference on Learning Representations (ICLR), 2022.
  15. Seongjin Choi, Gahee Kim, and Se-Young Yun. Weisfeiler and Lehman go categorical. arXiv preprint arXiv:2602.06787, 2026.
  16. Charles J. Colbourn and Jeffrey H. Dinitz. Handbook of Combinatorial Designs. CRC Press, 2nd edition, 2007.
  17. Chandler Davis. All convex invariant functions of Hermitian matrices. Archiv der Mathematik, 8:276–278, 1957.
  18. Etienne de Klerk and Renata Sotirov. Exploiting group symmetry in semidefinite programming relaxations of the quadratic assignment problem. Mathematical Programming, 122:225–246, 2010.
  19. Holger Dell, Martin Grohe, and Gaurav Rattan. Lovász meets Weisfeiler and Leman. International Colloquium on Automata, Languages, and Programming (ICALP), 2018.
  20. Yihe Dong, Will Sawin, and Yoshua Bengio. HNHN: Hypergraph networks with hyperedge neurons. ICML Workshop on Graph Representation Learning and Beyond (GRL+), 2020. arXiv:2006.12278.
  21. Iulia Duta, Giulia Cassarà, Fabrizio Silvestri, and Pietro Liò. Sheaf hypergraph networks. Advances in Neural Information Processing Systems (NeurIPS), 2023.
  22. Gábor Elek and Balázs Szegedy. A measure-theoretic approach to the theory of dense hypergraphs. Advances in Mathematics, 231(3–4):1731–1772, 2012.
  23. Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. AAAI Conference on Artificial Intelligence, 2019.
  24. James H. Fowler. Legislative cosponsorship networks in the US House and Senate. Social Networks, 28(4):454–465, 2006.
  25. Noah Golowich, Alexander Rakhlin, and Ohad Shamir. Size-independent sample complexity of neural networks. Proceedings of the 31st Conference on Learning Theory (COLT), 2018.
  26. Georg Gottlob, Nicola Leone, and Francesco Scarcello. Hypertree decompositions and tractable queries. Journal of Computer and System Sciences, 64(3):579–627, 2002.
  27. Mustafa Hajij, Ghada Zamzmi, Theodore Papamarkou, Nina Miolane, Aldo Guzmán-Sáenz, Karthikeyan Natesan Ramamurthy, et al. Topological deep learning: Going beyond graph data. arXiv preprint arXiv:2206.00606, 2022.
  28. Jing Huang and Jie Yang. UniGNN: A unified framework for graph and hypergraph neural networks. International Joint Conference on Artificial Intelligence (IJCAI), 2021.
  29. Nicolas Keriven and Gabriel Peyré. Universal invariant and equivariant graph neural networks. Advances in Neural Information Processing Systems (NeurIPS), 2019.
  30. Jinwoo Kim, Saeyoon Oh, Sungjun Cho, and Seunghoon Hong. Equivariant hypergraph neural networks. European Conference on Computer Vision (ECCV), 2022.
  31. Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations (ICLR), 2015.
  32. Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
  33. Adrian S. Lewis. The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis, 2:173–183, 1995.
  34. Pan Li and Olgica Milenkovic. Submodular hypergraphs: p-Laplacians, Cheeger inequalities and spectral clustering. International Conference on Machine Learning (ICML), 2018.
  35. Xiaoyu Li, Guangyu Tang, and Jiaojiao Jiang. Implicit hypergraph neural networks: A stable framework for higher-order relational learning with provable guarantees. arXiv preprint arXiv:2508.09427, 2025.
  36. László Lovász. Operations with structures. Acta Mathematica Hungarica, 18(3–4):321–328, 1967.
  37. László Lovász. Large Networks and Graph Limits, volume 60 of Colloquium Publications. American Mathematical Society, 2012.
  38. László Lovász and Balázs Szegedy. Limits of dense graph sequences. Journal of Combinatorial Theory, Series B, 96(6):933–957, 2006.
  39. Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. Advances in Neural Information Processing Systems (NeurIPS), 2019.
  40. Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. International Conference on Learning Representations (ICLR), 2019.
  41. Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. AAAI Conference on Artificial Intelligence, 2019.
  42. Christopher Morris, Yaron Lipman, Haggai Maron, Bastian Rieck, Nils M. Kriege, Martin Grohe, Matthias Fey, and Karsten Borgwardt. Weisfeiler and Leman go machine learning: The story so far. Journal of Machine Learning Research, 24(333):1–59, 2023.
  43. Benjamin Scheidt. On homomorphism indistinguishability and hypertree depth. International Colloquium on Automata, Languages, and Programming (ICALP), 2024.
  44. Benjamin Scheidt and Nicole Schweikardt. Counting homomorphisms from hypergraphs of bounded generalised hypertree width: A logical characterisation. International Symposium on Mathematical Foundations of Computer Science (MFCS), 2023.
  45. Richard Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matrices. Annals of Mathematical Statistics, 35:876–879, 1964.
  46. Bernd Sturmfels. Algorithms in Invariant Theory. Springer, 2nd edition, 2008.
  47. Bohan Tang, Zexi Liu, Keyue Jiang, Siheng Chen, and Xiaowen Dong. Training-free message passing for learning on hypergraphs. International Conference on Learning Representations (ICLR), 2025.
  48. Peihao Wang, Shenghao Yang, Yunyu Liu, Zhangyang Wang, and Pan Li. Equivariant hypergraph diffusion neural operators. International Conference on Learning Representations (ICLR), 2023.
  49. Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? International Conference on Learning Representations (ICLR), 2019.
  50. Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Niber, Anand Louis, and Partha Talukdar. HyperGCN: A new method for training graph convolutional networks on hypergraphs. Advances in Neural Information Processing Systems (NeurIPS), 2019.
  51. Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabás Póczos, Ruslan Salakhutdinov, and Alexander J. Smola. Deep sets. Advances in Neural Information Processing Systems (NeurIPS), 2017.
  52. Detian Zhang, Chengqiang Zhang, Yanghui Rao, Qing Li, and Chunjiang Zhu. Improved expressivity of hypergraph neural networks through high-dimensional generalized Weisfeiler-Leman algorithms. International Conference on Machine Learning (ICML), volume 267 of PMLR, pages 76880–76908, 2025.
  53. Yufei Zhao. Hypergraph limits: A regularity approach. Random Structures & Algorithms, 47(2):205–226, 2015.

  54. "✓" entries satisfy conditions (C1)–(C5); the "△" … for all A′₂ ∈ O(A₂). Applying the Reynolds operator: R(q)(A₁) = q(A₁) (since q is already correct at A₁ after averaging), and R(q)(A₂) is the average of q over O(A₂). More precisely, since O(A₁) and O(A₂) are finite disjoint sets in ℝ^d, there exists a polynomial q taking the value 1 on O(A₁) and 0 on O(A₂) (by Lagrange interpolation in high enough degree). …
  55. Identical clique expansions: both cover every pair exactly once, so ϕ(A₁) = ϕ(A₂) = J − I (= K₁₃).
  56. Non-isomorphic: H₁ has 13 Pasch configurations and automorphism group Z₁₃ ⋊ Z₃ (order 39); H₂ has 8 Pasch configurations and a smaller automorphism group. Since the Pasch count is an isomorphism invariant, the two systems are provably non-isomorphic.
  57. Separated by elementary invariant: Θ_{A₁}(A₁) = 156 > Θ_{A₁}(A₂), since no permutation maps H₂ to H₁. F.2 The CFI pair (native vs. all): we adapt the Cai–Fürer–Immerman construction [11] to 3-uniform hypergraphs. Let G = (V_G, E_G) be a connected 3-regular base graph with girth > 2k + 1. For each edge e = {u, v}, introduce auxiliary nodes a¹_e, a²_e and hyperedges: T…
  58. AllDeepSets backbone: a two-layer message-passing network with set-function aggregation:
      h^(0)_v = W_in x_v,  (8)
      m^(ℓ)_e = Σ_{u∈e} h^(ℓ−1)_u,   h̃^(ℓ)_v = Σ_{e∋v} m^(ℓ)_e,  (9)
      h^(ℓ)_v = LN( h^(ℓ−1)_v + MLP^(ℓ)_v( MLP^(ℓ)_e( h̃^(ℓ)_v ) ) ),  (10)
      where each MLP is a two-layer network with ReLU and dropout, and LN denotes LayerNorm. For datasets with large hyperedges (k̄ ≥ 20, e.g., House), we use mean normalization: m^(ℓ)_e ← m^(ℓ)_e / |e| and h̃^(ℓ)_v ← h̃^(ℓ)_v / deg(v). …
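Read as code, equations (8)–(10) above describe one set-aggregation round. A minimal numpy rendering follows (a sketch, not the authors' implementation: the two MLPs are passed in as callables and LayerNorm is reduced to per-row standardization):

    import numpy as np

    def alldeepsets_layer(h, hyperedges, mlp_e, mlp_v, mean_norm=False):
        """One AllDeepSets step per eqs. (9)-(10): pool node states into each
        hyperedge, pool edge messages back onto nodes, residual MLP update.
        h: (n, d) float node states; hyperedges: list of vertex-index lists;
        mlp_e, mlp_v: callables (d,) -> (d,); mean_norm: large-hyperedge variant."""
        n, _ = h.shape
        h_tilde = np.zeros_like(h)
        deg = np.zeros(n)
        for e in hyperedges:
            m_e = h[e].sum(axis=0)          # eq. (9): m_e = sum_{u in e} h_u
            if mean_norm:
                m_e = m_e / len(e)          # m_e <- m_e / |e|
            for v in e:
                h_tilde[v] += m_e           # eq. (9): h~_v = sum_{e : v in e} m_e
                deg[v] += 1
        if mean_norm:
            h_tilde = h_tilde / np.maximum(deg, 1)[:, None]  # h~_v <- h~_v / deg(v)
        out = h + np.array([mlp_v(mlp_e(row)) for row in h_tilde])  # eq. (10) residual
        mu = out.mean(axis=1, keepdims=True)   # stand-in for LayerNorm
        sd = out.std(axis=1, keepdims=True) + 1e-5
        return (out - mu) / sd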

  59. Density branch: for each node v, we estimate local pattern densities ρ_v = [t̂(F₁, star(v)), …, t̂(F_P, star(v))] via Monte Carlo sampling over the star neighborhood star(v) = {e ∈ E : v ∈ e}, using 200 Monte Carlo samples per node (selected from the grid {100, 200, 500}; Table 10). The pattern library {F₁, …, F_P} is selected adaptively by the DATASETPROFIL…
  60. Per-node gate: a learned gate controls the density contribution per node: g_v = σ(w^⊤ h^(L)_v + b_g),  (11) where σ is the sigmoid function, h^(L)_v is the backbone embedding, and b_g is initialized to gate_init, a hyperparameter set by the profiler: higher-order (k̄ ≥ 7): b_g = 0.0 (density active from epoch 0); mixed (4 ≤ k̄ < 7): b_g = −2.0 (density partially suppressed) …
  61. Concatenation fusion (cf. equation (11)): the gated density embedding is concatenated with the backbone output: ŷ_v = MLP_out( h^(L)_v ∥ g_v · z_v ),  (12) where MLP_out : ℝ^{2d_h} → ℝ^C is a two-layer classifier.
  62. Staged unfreezing: to prevent early noise from the density branch corrupting the backbone, we freeze the gate and density projection for an initial period: higher-order datasets unfreeze at epoch 0; mixed at epoch 30; pairwise at epoch 50. During the frozen phase, the model trains as a pure AllDeepSets backbone. H.2 Training recipe: all datasets share the s…
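Equations (11)–(12) above are similarly compact; here is a sketch of the gate-and-concatenate head under the same caveats (names and shapes are illustrative, not the authors' code):

    import numpy as np

    def gated_density_head(h_L, z, w, b_g, mlp_out):
        """Eq. (11): per-node gate g_v = sigmoid(w^T h_v + b_g).
        Eq. (12): classify the concatenation [h_v || g_v * z_v].
        h_L, z: (n, d_h) backbone and density embeddings; w: (d_h,) gate
        weights; b_g: profiler-set bias (0.0 higher-order, -2.0 mixed, per
        the extract above); mlp_out: callable (2*d_h,) -> (C,) class scores."""
        g = 1.0 / (1.0 + np.exp(-(h_L @ w + b_g)))             # eq. (11)
        fused = np.concatenate([h_L, g[:, None] * z], axis=1)  # eq. (12)
        return np.array([mlp_out(row) for row in fused])       # per-node logits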

  63. …refined this to hypertree depth. Our work synthesizes these into a complete HGNN expressivity characterization. GNN expressivity and the WL hierarchy: Xu et al. [49] and Morris et al. [41] established the connection between message-passing GNN expressivity and 1-WL. Higher-order networks match k-WL [39, 40]. For hypergraphs, Böker [9] extended color refin…
  64. …edge contribution… established the Deep Sets foundation. Duta et al. [21] introduced sheaf-theoretic structure. Topological and geometric deep learning: Bodnar et al. [8] extended message passing to simplicial complexes; Hajij et al. [27] proposed a general topological framework. Bronstein et al. [10] proposed a group-theoretic design framework. Our work instantiates this fo…
  65. r∗ ≥ 2 is necessary … decomposes into pattern densities of ghtw = O(ℓ) via (14); approximation at hierarchy level H_r can estimate moments up to degree L ≈ c·r. Combining with Theorem K.5: width r ⇒ spectral-gap accuracy ϵ ≈ O(Λ_r) (up to logarithmic factors). This is exactly the Jackson-type width–accuracy relationship flagged as open in §6, instantiated for the Hodge spectral gap. …
  66. MEAN normalization hurts small-edge datasets: we expected normalizing density features by edge count (replacing sum-pooling with mean-pooling over MC samples) to improve performance by removing scale dependence. Instead, it reduces Senate-Bills accuracy by 5.4 percentage points (90.5% → 85.1%) but improves House by 4.5 points. The explanation is structural: S…
  67. Cora-CA tuning does not transfer from Senate: applying the Senate-Bills recipe (hidden = 128, J = 4, dropout = 0.2) to Cora-CA produced 79.7 ± 0.7%, marginally below the default configuration (80.0 ± 1.1%). A grid search over hidden ∈ {128, 256} and J ∈ {4, 8} yielded at best 80.1%. The lesson: Cora-CA's small edges (k̄ = 4.28) leave little room for invariant-based fe…
  68. Additive fusion underperforms concatenation: an early DENSNET-D variant used additive fusion (h_v + g_v · z^density_v) instead of the concatenation-based design of Equation 5 in the main text. Additive fusion constrains density features to the same embedding space as the backbone, limiting the expressivity of the combined representation. Switching to concate…
  69. AllDeepSets and AllSetTransformer collapse on structured data: in an early baseline sweep (v3 code, different hyperparameters), AllDeepSets achieved only 50.2% on House, near chance for a binary task; the final tuned result in Table 1 is higher (67.8%). AllSetTransformer similarly underperformed on the Multi-Order IWS task: 69.7% (vs. 100% for HNHN/UniGNN…
  70. Gene-Disease: density lift revealed by strict ablation (+4.3%). Gene-Disease (n = 5,012, m = 2,009, k̄ = 14, 21 classes) was initially classified as "indifferent" based on the original ablation (AllDeepSets at hidden = 64 scoring 86.1% vs. DENSNET-D at hidden = 128 scoring 86.0%). The strict ablation (Table 4) reveals the true picture: AllDeepSets at the identical…
  71. Hierarchy scaling curve is flat: running PDN with pattern libraries restricted to v_max ∈ {3, 4, 5, 6} on all three IWS benchmarks yields 100% at every level. On CE-Hard this is expected (any native architecture suffices). On Native-Hard, AllDeepSets alone reaches only 92.4%, so the density branch is contributing, but even the simplest patterns (v_max = 3) suffice …
  72. Sinkhorn alignment is prohibitive at scale: INVNETPDN on Gene-Disease (n = 5,012) achieves only 64.7% (single seed), far below every baseline including MLP. The Sinkhorn alignment branch requires O(n²) pairwise distance computation and iterative normalization; at this scale, the optimization landscape becomes intractable and the invariant features degener…