arXiv: 2505.15325 · v3 · submitted 2025-05-21 · 💻 cs.CV

SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition

Pith reviewed 2026-05-22 14:12 UTC · model grok-4.3

classification 💻 cs.CV
keywords hypergraph neural networks · soft hyperedges · visual recognition · high-order interactions · continuous participation weights · learnable prototypes · sparse hyperedge selection

The pith

SoftHGNN assigns image features to hyperedges via continuous weights from similarities to learnable prototypes, enabling efficient high-order relation modeling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops SoftHGNN so that vision models can capture multi-way interactions among image tokens that pair-wise attention overlooks. It replaces rigid binary hyperedge links with soft ones: each token receives a continuous participation weight computed from its feature similarity to a small set of trainable prototypes. These weights drive message passing that mixes information across related groups of tokens in a differentiable way. A top-k selection step keeps only the most relevant hyperedges active, while a load-balancing term keeps prototypes from going unused. If this works, recognition models gain high-order context, and therefore better feature representations, at low extra cost inside existing pipelines.

Core claim

Soft hyperedges are formed by computing similarities between vertex features and a small set of learnable hyperedge prototypes to produce continuous and differentiable participation weights; these weights serve as the medium for message aggregation that enriches representations with high-order contextual associations, with top-k selection and load-balancing regularization added to control cost when the number of prototypes grows.
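
The construction in the core claim can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes cosine similarity, a temperature of 0.1, and a softmax over hyperedges for each vertex, none of which this review specifies. The function names (`participation_weights`, `message_passing`) and the random stand-in prototypes are hypothetical, and plain Python lists are used so the sketch runs without a tensor library.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def participation_weights(vertices, prototypes, temperature=0.1):
    """Return an N x M matrix of soft memberships (each row sums to 1).

    ASSUMPTION: cosine similarity plus a per-vertex softmax; the paper's
    exact similarity and normalisation may differ.
    """
    return [softmax([cosine(x, p) / temperature for p in prototypes])
            for x in vertices]


def message_passing(vertices, weights):
    """Vertices -> hyperedges -> vertices, with a residual connection."""
    n, m, d = len(vertices), len(weights[0]), len(vertices[0])
    # Gather: each hyperedge feature is the weight-averaged vertex feature.
    edges = []
    for e in range(m):
        mass = sum(weights[v][e] for v in range(n)) + 1e-12
        edges.append([sum(weights[v][e] * vertices[v][k] for v in range(n)) / mass
                      for k in range(d)])
    # Scatter: each vertex reads back from hyperedges in proportion to membership.
    out = []
    for v in range(n):
        ctx = [sum(weights[v][e] * edges[e][k] for e in range(m)) for k in range(d)]
        out.append([vertices[v][k] + ctx[k] for k in range(d)])
    return out


random.seed(0)
# Stand-ins: 6 token features of width 8, and 3 prototypes (random here,
# whereas the paper describes them as learnable).
tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]
prototypes = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]
A = participation_weights(tokens, prototypes)
enriched = message_passing(tokens, A)
```

In a trained model the prototypes would be learned parameters, and the computation would live in a tensor framework so that gradients reach them through the participation weights.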

What carries the argument

Soft hyperedges generated by continuous participation weights from feature-to-prototype similarities.

If this is right

  • Existing vision backbones receive high-order scene context through a lightweight late-stage module without redesigning early feature extraction.
  • Redundant hyperedge computation drops because only the top-k most relevant prototypes participate per input.
  • Balanced prototype usage prevents collapse to a few dominant hyperedges during training.
  • Performance rises on classification, detection, and related tasks across the tested datasets.
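
The top-k and balancing points above can be made concrete with a small sketch. It rests on assumptions the review does not confirm: a hyperedge is scored by its total participation mass over the vertices of one input, and the balancing term is the squared coefficient of variation of per-hyperedge usage across a batch, a form borrowed from mixture-of-experts routing rather than taken from the paper. All function names are hypothetical.

```python
def hyperedge_scores(weights):
    """Score each hyperedge by its total participation mass (ASSUMPTION)."""
    m = len(weights[0])
    return [sum(row[e] for row in weights) for e in range(m)]


def top_k_mask(scores, k):
    """Return a 0/1 mask that keeps the k highest-scoring hyperedges."""
    order = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    keep = set(order[:k])
    return [1.0 if e in keep else 0.0 for e in range(len(scores))]


def sparsify(weights, mask):
    """Zero out unselected hyperedges and renormalise each vertex's row."""
    out = []
    for row in weights:
        kept = [w * g for w, g in zip(row, mask)]
        total = sum(kept) + 1e-12
        out.append([w / total for w in kept])
    return out


def load_balance_penalty(batch_scores):
    """Squared coefficient of variation of hyperedge usage over a batch.

    ASSUMPTION: this MoE-style penalty stands in for the paper's regulariser.
    It is zero when every hyperedge is used equally and grows with skew.
    """
    m = len(batch_scores[0])
    usage = [sum(scores[e] for scores in batch_scores) for e in range(m)]
    mean = sum(usage) / m
    var = sum((u - mean) ** 2 for u in usage) / m
    return var / (mean ** 2 + 1e-12)


# Three vertices, four candidate hyperedges; keep the best two for this input.
weights = [[0.7, 0.2, 0.1, 0.0],
           [0.6, 0.1, 0.2, 0.1],
           [0.1, 0.1, 0.7, 0.1]]
mask = top_k_mask(hyperedge_scores(weights), k=2)
sparse = sparsify(weights, mask)
```

A hard mask like this one passes no gradient to the unselected hyperedges, which is one reason a separate balancing term on the soft scores is useful: it gives rarely chosen prototypes a training signal.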

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The prototype-based soft assignment could transfer to video or point-cloud data where relations also involve more than two elements at once.
  • Varying the number of prototypes per task might expose a practical limit on how many distinct high-order groupings are useful in typical scenes.
  • Layering this soft hypergraph block inside transformer stages could test whether early and late high-order mixing compound each other.

Load-bearing premise

Similarities between vertex features and a small set of learnable prototypes yield participation weights that correctly reflect semantically meaningful high-order visual groupings.

What would settle it

Replace the learned prototype similarities with uniform random weights and re-run the experiments on the five datasets. If the accuracy gains remain comparable, the semantic weighting mechanism is not essential to the reported improvements.
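
One way to build the random control is sketched below. It assumes the control should keep the same matrix shape and the same per-vertex normalisation as the learned weights, so that only the semantic content is removed; that design choice and the function name `random_participation` are ours, not the paper's.

```python
import random


def random_participation(num_vertices, num_hyperedges, rng):
    """Draw uniform random weights and normalise each vertex's row to sum to 1."""
    rows = []
    for _ in range(num_vertices):
        # The small offset guards against an all-zero row.
        raw = [rng.random() + 1e-9 for _ in range(num_hyperedges)]
        total = sum(raw)
        rows.append([w / total for w in raw])
    return rows


rng = random.Random(0)  # fixed seed so the control is reproducible
control = random_participation(num_vertices=5, num_hyperedges=3, rng=rng)
```

Feeding `control` into the same message-passing step, with everything else unchanged, isolates how much of the gain depends on the similarities being learned.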

Original abstract

Visual recognition relies on understanding the semantics of image tokens and their complex interactions. Mainstream self-attention methods, while effective at modeling global pair-wise relations, fail to capture high-order associations inherent in real-world scenes and often suffer from redundant computation. Hypergraphs extend conventional graphs by modeling high-order interactions and offer a promising framework for addressing these limitations. However, existing hypergraph neural networks typically rely on static and hard hyperedge assignments, which lead to redundant hyperedges and overlooking the continuity of visual semantics. In this work, we present Soft Hypergraph Neural Networks (SoftHGNN), a lightweight plug-and-play hypergraph computation method for late-stage semantic reasoning in existing vision pipelines. Our SoftHGNN introduces the concept of soft hyperedges, where each vertex is associated with hyperedges via continuous and differentiable participation weights rather than hard binary assignments. These weights are produced by measuring similarities between vertex features and a small set of learnable hyperedge prototypes, yielding input-adaptive and semantically rich soft hyperedges. Using soft hyperedges as the medium for message aggregation and dissemination, SoftHGNN enriches feature representations with high-order contextual associations. To further enhance efficiency when scaling up the number of soft hyperedges, we incorporate a sparse hyperedge selection mechanism that activates only the top-k important hyperedges, along with a load-balancing regularizer to ensure adequate and balanced hyperedge utilization. Experimental results across three tasks on five datasets demonstrate that SoftHGNN efficiently captures high-order associations in visual scenes, achieving significant performance improvements. The code is available at: https://github.com/Mengqi-Lei/SoftHGNN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces SoftHGNN, a lightweight plug-and-play hypergraph module for late-stage reasoning in vision models. Soft hyperedges are constructed via continuous, differentiable participation weights obtained by measuring similarities between vertex (token) features and a small set of learnable hyperedge prototypes, followed by top-k sparse selection and a load-balancing regularizer. The module is inserted into existing pipelines and evaluated on three visual recognition tasks across five datasets, with the central claim that the soft hyperedges enable efficient capture of high-order associations and yield significant performance gains over baselines.

Significance. If the results and attribution hold, SoftHGNN would offer a practical, scalable alternative to dense attention or static hypergraphs for modeling multi-way relations in scenes. The public code release is a clear strength that supports reproducibility and follow-up work.

major comments (3)
  1. [§3.2] §3.2 (Soft Hyperedge Construction): the participation weights are defined directly from cosine similarity (or equivalent) between features and a fixed set of learnable prototypes followed by top-k and balancing; this construction is mathematically equivalent to soft feature clustering/assignment and does not contain an explicit mechanism that enforces or verifies multi-way relational structure (e.g., specific co-occurrence patterns). This is load-bearing for the claim that the module 'captures high-order associations' rather than simply adding capacity via prototype-based aggregation.
  2. [§4] §4 (Experimental Results and Ablations): the reported gains are not isolated from confounding factors such as increased model capacity or the effect of the load-balancing regularizer alone. No control experiment replaces the soft-hyperedge aggregator with a comparably parameterized MLP or standard soft-clustering layer; without this, it remains unclear whether the performance delta is attributable to high-order modeling.
  3. [Table 2] Table 2 (or equivalent main results table): absolute improvements are shown, but the manuscript does not report run-to-run variance, statistical significance tests, or confidence intervals. This weakens the assertion of 'significant performance improvements' across tasks.
minor comments (2)
  1. [§3.2] The exact mathematical definition of the participation weight matrix (including temperature or normalization) and the load-balancing loss term should be written out explicitly with equation numbers for clarity.
  2. [Figure 1] Figure 1 would benefit from explicit labels on the prototype similarity and top-k selection steps to make the data flow unambiguous.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications and indicating the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3.2] the participation weights are defined directly from cosine similarity (or equivalent) between features and a fixed set of learnable prototypes followed by top-k and balancing; this construction is mathematically equivalent to soft feature clustering/assignment and does not contain an explicit mechanism that enforces or verifies multi-way relational structure (e.g., specific co-occurrence patterns). This is load-bearing for the claim that the module 'captures high-order associations' rather than simply adding capacity via prototype-based aggregation.

    Authors: We appreciate this observation. While the soft assignment uses prototype similarities, the subsequent hypergraph message passing aggregates features across multiple vertices per hyperedge via the incidence matrix. This step explicitly realizes multi-way interactions, because each hyperedge pools information from a variable set of vertices in one operation, which is distinct from independent soft clustering. We have revised §3.2 to clarify this distinction and the role of the hypergraph convolution in enforcing high-order structure.
    revision: partial

  2. Referee: [§4] the reported gains are not isolated from confounding factors such as increased model capacity or the effect of the load-balancing regularizer alone. No control experiment replaces the soft-hyperedge aggregator with a comparably parameterized MLP or standard soft-clustering layer; without this, it remains unclear whether the performance delta is attributable to high-order modeling.

    Authors: This is a valid concern about isolating the source of gains. We have added ablation experiments in the revised §4 that replace the hypergraph aggregator with (i) an MLP of matched parameter count and (ii) a soft-clustering layer lacking the hypergraph structure. We also ablate the load-balancing regularizer in isolation. The new results show that the complete SoftHGNN outperforms these controls, supporting that the gains arise from high-order modeling.
    revision: yes

  3. Referee: [Table 2] absolute improvements are shown, but the manuscript does not report run-to-run variance, statistical significance tests, or confidence intervals. This weakens the assertion of 'significant performance improvements' across tasks.

    Authors: We agree that variability and statistical measures would improve rigor. In the revised manuscript we now report standard deviations over multiple random seeds for the main results in Table 2 and include paired t-test p-values comparing SoftHGNN against baselines to substantiate the significance of the observed improvements.
    revision: yes
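
The statistics discussed in the third exchange, per-seed spread and a paired t-test, can be computed as sketched here. The accuracy values are made-up placeholders chosen for easy arithmetic, not results from the paper, and the variable names are hypothetical. Only the t statistic is computed, since the standard library has no t-distribution; a p-value would come from a t-distribution with n - 1 degrees of freedom.

```python
import math
import statistics

# PLACEHOLDER per-seed accuracies (same seeds for both models), not real results.
baseline = [70.0, 71.0, 69.0, 70.0, 70.0]
with_module = [71.0, 72.5, 70.0, 71.5, 70.5]

# Run-to-run spread for each model (sample standard deviation).
baseline_std = statistics.stdev(baseline)
module_std = statistics.stdev(with_module)

# Paired test: work on per-seed differences so seed effects cancel.
diffs = [m - b for m, b in zip(with_module, baseline)]
mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
t_stat = mean_diff / std_err
degrees_of_freedom = len(diffs) - 1

print(f"baseline    {statistics.mean(baseline):.2f} +/- {baseline_std:.2f}")
print(f"with module {statistics.mean(with_module):.2f} +/- {module_std:.2f}")
print(f"paired t = {t_stat:.2f} with {degrees_of_freedom} degrees of freedom")
```

Pairing by seed matters here: an unpaired comparison would fold seed-to-seed noise into the variance and could hide a consistent improvement.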

Circularity Check

0 steps flagged

No circularity: soft hyperedge weights are an explicit design choice validated externally

Full rationale

The paper defines soft hyperedges directly via a similarity-based construction between vertex features and learnable prototypes, followed by standard message passing. This is presented as a modeling decision in the method, not as a derived prediction or result that reduces to its own inputs by construction. No equations or claims show a fitted parameter being renamed as a prediction, nor does any load-bearing step rely on self-citation chains or imported uniqueness theorems. Experiments across independent datasets and tasks serve as external validation. The central mechanism does not collapse to tautology; concerns about whether the construction truly captures high-order relations are questions of modeling validity rather than circular derivation.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The method rests on the existence of a small fixed set of learnable prototypes that can represent the space of high-order visual semantics and on the assumption that top-k selection preserves the most relevant associations without introducing selection bias.

free parameters (2)
  • number of hyperedge prototypes
    A small set of learnable vectors whose count is chosen by the user and directly determines the number of soft hyperedges.
  • top-k value
    The number of most important soft hyperedges retained per input; controls sparsity and is a design choice.
axioms (2)
  • [domain assumption] Feature similarity to prototypes yields semantically continuous and differentiable participation weights.
    Invoked when constructing the soft hyperedge assignment matrix from vertex features.
  • [standard math] Standard back-propagation through the soft assignment and sparse selection operations is stable.
    Required for end-to-end training of the plug-and-play module.
invented entities (2)
  • soft hyperedge [no independent evidence]
    purpose: Continuous, input-adaptive high-order relation that replaces hard binary hyperedge membership.
    Central new construct that enables differentiable message passing over high-order groups.
  • hyperedge prototype [no independent evidence]
    purpose: Learnable reference vector used to compute participation weights for every vertex.
    New parameter set introduced to generate the soft assignments.

pith-pipeline@v0.9.0 · 5842 in / 1533 out tokens · 45364 ms · 2026-05-22T14:12:19.316354+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Sparse Hypergraph-Enhanced Frame-Event Object Detection with Fine-Grained MoE

    cs.CV 2026-04 unverdicted novelty 6.0

    Hyper-FEOD fuses RGB and event data via sparse hypergraph cross-modal fusion and region-specialized MoE experts to improve accuracy-efficiency in object detection.

  2. RACANet: Reliability-Aware Crowd Anchor Network for RGB-T Crowd Counting

    cs.CV 2026-04 unverdicted novelty 5.0

    RACANet proposes a reliability-aware two-stage fusion network with cross-modal pretraining and local anchor modules that outperforms prior RGB-T crowd counting methods on standard benchmarks.

9 Appendix

9.1 Pseudo Code

We provide the pseudo-code for SoftHGNN’s soft hyperedge generation, message passing on soft hyperedges, and sparse hyperedge selection, as shown in Algorithms 1, 2 and 3.

Algorithm 1 Soft Hyperedge Generation R...