SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition

Juan Wang; Mengqi Lei; Shaoyi Du; Siqi Li; Xinhu Zheng; Yihong Wu; Yue Gao

arxiv: 2505.15325 · v3 · submitted 2025-05-21 · 💻 cs.CV

SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition

Mengqi Lei , Yihong Wu , Siqi Li , Xinhu Zheng , Juan Wang , Shaoyi Du , Yue Gao This is my paper

Pith reviewed 2026-05-22 14:12 UTC · model grok-4.3

classification 💻 cs.CV

keywords hypergraph neural networkssoft hyperedgesvisual recognitionhigh-order interactionscontinuous participation weightslearnable prototypessparse hyperedge selection

0 comments

The pith

SoftHGNN assigns image features to hyperedges via continuous weights from similarities to learnable prototypes, enabling efficient high-order relation modeling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops SoftHGNN to let vision models capture multi-way interactions among image tokens that pair-wise attention overlooks. It replaces rigid binary hyperedge links with soft versions where each token receives a continuous participation weight computed from its feature similarity to a small collection of trainable prototypes. These weights support message passing that mixes information across related groups in a differentiable way. A top-k selection step keeps only the most relevant hyperedges active while a balancing term prevents underused prototypes. Readers would care if this yields better feature representations for recognition tasks because it adds high-order context at low extra cost inside existing pipelines.

Core claim

Soft hyperedges are formed by computing similarities between vertex features and a small set of learnable hyperedge prototypes to produce continuous and differentiable participation weights; these weights serve as the medium for message aggregation that enriches representations with high-order contextual associations, with top-k selection and load-balancing regularization added to control cost when the number of prototypes grows.

What carries the argument

Soft hyperedges generated by continuous participation weights from feature-to-prototype similarities.

If this is right

Existing vision backbones receive high-order scene context through a lightweight late-stage module without redesigning early feature extraction.
Redundant hyperedge computation drops because only the top-k most relevant prototypes participate per input.
Balanced prototype usage prevents collapse to a few dominant hyperedges during training.
Performance rises on classification, detection, and related tasks across the tested datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The prototype-based soft assignment could transfer to video or point-cloud data where relations also involve more than two elements at once.
Varying the number of prototypes per task might expose a practical limit on how many distinct high-order groupings are useful in typical scenes.
Layering this soft hypergraph block inside transformer stages could test whether early and late high-order mixing compound each other.

Load-bearing premise

Similarities between vertex features and a small set of learnable prototypes yield participation weights that correctly reflect semantically meaningful high-order visual groupings.

What would settle it

Replacing the learned prototype similarities with uniform random weights and re-running the experiments on the five datasets; if accuracy gains remain comparable, the semantic weighting mechanism would not be essential to the reported improvements.

read the original abstract

Visual recognition relies on understanding the semantics of image tokens and their complex interactions. Mainstream self-attention methods, while effective at modeling global pair-wise relations, fail to capture high-order associations inherent in real-world scenes and often suffer from redundant computation. Hypergraphs extend conventional graphs by modeling high-order interactions and offer a promising framework for addressing these limitations. However, existing hypergraph neural networks typically rely on static and hard hyperedge assignments, which lead to redundant hyperedges and overlooking the continuity of visual semantics. In this work, we present Soft Hypergraph Neural Networks (SoftHGNN), a lightweight plug-and-play hypergraph computation method for late-stage semantic reasoning in existing vision pipelines. Our SoftHGNN introduces the concept of soft hyperedges, where each vertex is associated with hyperedges via continuous and differentiable participation weights rather than hard binary assignments. These weights are produced by measuring similarities between vertex features and a small set of learnable hyperedge prototypes, yielding input-adaptive and semantically rich soft hyperedges. Using soft hyperedges as the medium for message aggregation and dissemination, SoftHGNN enriches feature representations with high-order contextual associations. To further enhance efficiency when scaling up the number of soft hyperedges, we incorporate a sparse hyperedge selection mechanism that activates only the top-k important hyperedges, along with a load-balancing regularizer to ensure adequate and balanced hyperedge utilization. Experimental results across three tasks on five datasets demonstrate that SoftHGNN efficiently captures high-order associations in visual scenes, achieving significant performance improvements. The code is available at: https://github.com/Mengqi-Lei/SoftHGNN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Soft hyperedges via prototypes offer a practical alternative to hard assignments but may not distinctly capture high-order relations beyond clustering.

read the letter

Hey, about the SoftHGNN paper, the main thing to know is that they propose generating soft hyperedges by computing similarities between image token features and a small set of learnable prototypes, then applying top-k selection and a load-balancing regularizer. This replaces the hard binary assignments common in earlier hypergraph neural nets. What they do well is keep the module lightweight and adaptable for adding high-order context at the end of vision models. The continuous weights allow for more nuanced participation, and the sparsity helps with efficiency when the number of hyperedges grows. Reporting gains on multiple datasets for different tasks, plus open-sourcing the code, gives it some practical value. The design seems coherent on paper for addressing redundancy in static hyperedges. Where it gets softer is in linking the mechanism to actual high-order visual relations. The prototype similarity approach is basically a soft assignment to clusters, and nothing in the setup forces or verifies that the resulting groups correspond to meaningful multi-way interactions like object parts or co-occurrences rather than broad feature similarity. It is possible the improvements come from the extra learnable parameters or the way messages are aggregated, not specifically from better high-order modeling. The reader's stress-test note hits on a real issue here, and the low soundness score from the abstract-only review makes sense until the full methods and ablations are checked. This would be relevant for a reading group on graph neural networks applied to vision or on efficient relational reasoning. Someone already working in that area might get value from the specific construction and the empirical results. I think it should go to peer review. The novelty in the soft construction is real, and the work is grounded enough to benefit from referee feedback on the attribution of results.

Referee Report

3 major / 2 minor

Summary. The paper introduces SoftHGNN, a lightweight plug-and-play hypergraph module for late-stage reasoning in vision models. Soft hyperedges are constructed via continuous, differentiable participation weights obtained by measuring similarities between vertex (token) features and a small set of learnable hyperedge prototypes, followed by top-k sparse selection and a load-balancing regularizer. The module is inserted into existing pipelines and evaluated on three visual recognition tasks across five datasets, with the central claim that the soft hyperedges enable efficient capture of high-order associations and yield significant performance gains over baselines.

Significance. If the results and attribution hold, SoftHGNN would offer a practical, scalable alternative to dense attention or static hypergraphs for modeling multi-way relations in scenes. The public code release is a clear strength that supports reproducibility and follow-up work.

major comments (3)

[§3.2] §3.2 (Soft Hyperedge Construction): the participation weights are defined directly from cosine similarity (or equivalent) between features and a fixed set of learnable prototypes followed by top-k and balancing; this construction is mathematically equivalent to soft feature clustering/assignment and does not contain an explicit mechanism that enforces or verifies multi-way relational structure (e.g., specific co-occurrence patterns). This is load-bearing for the claim that the module 'captures high-order associations' rather than simply adding capacity via prototype-based aggregation.
[§4] §4 (Experimental Results and Ablations): the reported gains are not isolated from confounding factors such as increased model capacity or the effect of the load-balancing regularizer alone. No control experiment replaces the soft-hyperedge aggregator with a comparably parameterized MLP or standard soft-clustering layer; without this, it remains unclear whether the performance delta is attributable to high-order modeling.
[Table 2] Table 2 (or equivalent main results table): absolute improvements are shown, but the manuscript does not report run-to-run variance, statistical significance tests, or confidence intervals. This weakens the assertion of 'significant performance improvements' across tasks.

minor comments (2)

[§3.2] The exact mathematical definition of the participation weight matrix (including temperature or normalization) and the load-balancing loss term should be written out explicitly with equation numbers for clarity.
[Figure 1] Figure 1 would benefit from explicit labels on the prototype similarity and top-k selection steps to make the data flow unambiguous.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications and indicating the revisions we will make to strengthen the manuscript.

read point-by-point responses

Referee: [§3.2] the participation weights are defined directly from cosine similarity (or equivalent) between features and a fixed set of learnable prototypes followed by top-k and balancing; this construction is mathematically equivalent to soft feature clustering/assignment and does not contain an explicit mechanism that enforces or verifies multi-way relational structure (e.g., specific co-occurrence patterns). This is load-bearing for the claim that the module 'captures high-order associations' rather than simply adding capacity via prototype-based aggregation.

Authors: We appreciate this observation. While the soft assignment uses prototype similarities, the subsequent hypergraph message passing aggregates features across multiple vertices per hyperedge via the incidence matrix. This step explicitly realizes multi-way interactions, as each hyperedge pools information from a variable set of vertices in one operation—distinct from independent soft clustering. We have revised §3.2 to clarify this distinction and the role of the hypergraph convolution in enforcing high-order structure. revision: partial
Referee: [§4] the reported gains are not isolated from confounding factors such as increased model capacity or the effect of the load-balancing regularizer alone. No control experiment replaces the soft-hyperedge aggregator with a comparably parameterized MLP or standard soft-clustering layer; without this, it remains unclear whether the performance delta is attributable to high-order modeling.

Authors: This is a valid concern about isolating the source of gains. We have added ablation experiments in the revised §4 that replace the hypergraph aggregator with (i) an MLP of matched parameter count and (ii) a soft-clustering layer lacking the hypergraph structure. We also ablate the load-balancing regularizer in isolation. The new results show that the complete SoftHGNN outperforms these controls, supporting that the gains arise from high-order modeling. revision: yes
Referee: [Table 2] absolute improvements are shown, but the manuscript does not report run-to-run variance, statistical significance tests, or confidence intervals. This weakens the assertion of 'significant performance improvements' across tasks.

Authors: We agree that variability and statistical measures would improve rigor. In the revised manuscript we now report standard deviations over multiple random seeds for the main results in Table 2 and include paired t-test p-values comparing SoftHGNN against baselines to substantiate the significance of the observed improvements. revision: yes

Circularity Check

0 steps flagged

No circularity: soft hyperedge weights are an explicit design choice validated externally

full rationale

The paper defines soft hyperedges directly via a similarity-based construction between vertex features and learnable prototypes, followed by standard message passing. This is presented as a modeling decision in the method, not as a derived prediction or result that reduces to its own inputs by construction. No equations or claims show a fitted parameter being renamed as a prediction, nor does any load-bearing step rely on self-citation chains or imported uniqueness theorems. Experiments across independent datasets and tasks serve as external validation. The central mechanism does not collapse to tautology; concerns about whether the construction truly captures high-order relations are questions of modeling validity rather than circular derivation.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The method rests on the existence of a small fixed set of learnable prototypes that can represent the space of high-order visual semantics and on the assumption that top-k selection preserves the most relevant associations without introducing selection bias.

free parameters (2)

number of hyperedge prototypes
A small set of learnable vectors whose count is chosen by the user and directly determines the number of soft hyperedges.
top-k value
The number of most important soft hyperedges retained per input; controls sparsity and is a design choice.

axioms (2)

domain assumption Feature similarity to prototypes yields semantically continuous and differentiable participation weights.
Invoked when constructing the soft hyperedge assignment matrix from vertex features.
standard math Standard back-propagation through the soft assignment and sparse selection operations is stable.
Required for end-to-end training of the plug-and-play module.

invented entities (2)

soft hyperedge no independent evidence
purpose: Continuous, input-adaptive high-order relation that replaces hard binary hyperedge membership.
Central new construct that enables differentiable message passing over high-order groups.
hyperedge prototype no independent evidence
purpose: Learnable reference vector used to compute participation weights for every vertex.
New parameter set introduced to generate the soft assignments.

pith-pipeline@v0.9.0 · 5842 in / 1533 out tokens · 45364 ms · 2026-05-22T14:12:19.316354+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

These weights are produced by measuring similarities between vertex features and a small set of learnable hyperedge prototypes, yielding input-adaptive and semantically rich soft hyperedges.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

A∈[0,1]^{N×M} is the participation matrix between vertices and hyperedges

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sparse Hypergraph-Enhanced Frame-Event Object Detection with Fine-Grained MoE
cs.CV 2026-04 unverdicted novelty 6.0

Hyper-FEOD fuses RGB and event data via sparse hypergraph cross-modal fusion and region-specialized MoE experts to improve accuracy-efficiency in object detection.
RACANet: Reliability-Aware Crowd Anchor Network for RGB-T Crowd Counting
cs.CV 2026-04 unverdicted novelty 5.0

RACANet proposes a reliability-aware two-stage fusion network with cross-modal pretraining and local anchor modules that outperforms prior RGB-T crowd counting methods on standard benchmarks.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · cited by 2 Pith papers · 4 internal anchors

[1]

Wanyan, Y., Yang, X., Dong, W., Xu, C.: A comprehensive review of few-shot action recognition. Int. J. Comput. Vis. (2025)

work page 2025
[2]

IEEE Trans

Chen, Y., Mancini, M., Zhu, X., Akata, Z.: Semi-supervised and unsupervised deep visual learning: A survey. IEEE Trans. Pat- tern Anal. Mach. Intell.46(3), 1327–1347 (2024)

work page 2024
[3]

IEEE Trans

Li, X., Ding, H., Yuan, H., Zhang, W., Pang, J., Cheng, G., Chen, K., Liu, Z., Loy, C.C.: Transformer-based visual segmentation: A survey. IEEE Trans. Pattern Anal. Mach. Intell.46(12), 10138–10163 (2024)

work page 2024
[4]

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S.,et al.: An image is worth 16×16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)

work page 2020
[5]

Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D., Lu, T., Luo, P., Shao, L.: Pyra- mid vision transformer: A versatile backbone for dense prediction without convolutions. In: Int. Conf. Comput. Vis., pp. 568–578 (2021)

work page 2021
[6]

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Trans- former: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis., pp. 10012–10022 (2021)

work page 2021
[7]

IEEE Trans

Zhang, S., Meng, N., Lam, E.Y.: LRT: An efficient low-light restoration Transformer for dark light field images. IEEE Trans. Image Process.32, 4314–4326 (2023)

work page 2023
[8]

Zhang, J., Li, X., Wang, Y., Wang, C., Yang, Y., Liu, Y., Tao, D.: Eatformer: Improving vision transformer inspired by evolutionary algorithm. Int. J. Comput. Vis.132(9), 3509– 3536 (2024)

work page 2024
[9]

Wang, T., Zhang, K., Shao, Z., Luo, W., 22 Stenger, B., Lu, T., Kim, T.-K., Liu, W., Li, H.: Gridformer: Residual dense transformer with grid structure for image restoration in adverse weather conditions. Int. J. Comput. Vis.132(10), 4541–4563 (2024)

work page 2024
[10]

Vaswani, A., Shazeer, N., Parmar, N., Uszko- reit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. Adv. in Neur. Info. Process. Sys. (2017)

work page 2017
[11]

Wang, P., Zheng, W., Chen, T., Wang, Z.: Anti-oversmoothing in deep Vision Trans- formers via the Fourier domain analysis: From theory to practice. In: Int. Conf. Learn. Represent. (2022)

work page 2022
[12]

Nguyen, T., Nguyen, T., Baraniuk, R.: Mit- igating over-smoothing in Transformers via regularized nonlocal functionals. Adv. Neu- ral Inform. Process. Syst.36, 80233–80256 (2023)

work page 2023
[13]

Zhai, S., Likhomanenko, T., Littwin, E., Bus- bridge, D., Ramapuram, J., Zhang, Y., Gu, J., Susskind, J.M.: Stabilizing Transformer training by preventing attention entropy col- lapse. In: Int. Conf. Learn. Represent., pp. 40770–40803 (2023)

work page 2023
[14]

Engineering (2024)

Gao, Y., Ji, S., Han, X., Dai, Q.: Hypergraph computation. Engineering (2024)

work page 2024
[15]

ACM Comp

Antelmi, A., Cordasco, G., Polato, M., Scarano, V., Spagnuolo, C., Yang, D.: A sur- vey on hypergraph representation learning. ACM Comp. Surv.56(1), 1–38 (2023)

work page 2023
[16]

In: AAAI, pp

Feng, Y., You, H., Zhang, Z., Ji, R., Gao, Y.: Hypergraph neural networks. In: AAAI, pp. 3558–3565 (2019)

work page 2019
[17]

arXiv preprint arXiv:2503.07959 (2025)

Yang, M., Xu, X.-J.: Recent advances in hypergraph neural networks. arXiv preprint arXiv:2503.07959 (2025)

work page arXiv 2025
[18]

IEEE Trans

Feng, Y., Huang, J., Du, S., Ying, S., Yong, J.-H., Li, Y., Ding, G., Ji, R., Gao, Y.: Hyper- YOLO: When Visual Object Detection Meets Hypergraph Computation. IEEE Trans. Pat- tern Anal. Mach. Intell.47(4), 2388–2401 (2025)

work page 2025
[19]

In: IJCAI, pp

Cai, D., Song, M., Sun, C., Zhang, B., Hong, S., Li, H.: Hypergraph structure learning for hypergraph neural networks. In: IJCAI, pp. 1923–1929 (2022)

work page 1923
[20]

In: IJCAI, pp

Zhang, Z., Lin, H., Gao, Y., BNRist, K.: Dynamic hypergraph structure learning. In: IJCAI, pp. 3162–3169 (2018)

work page 2018
[21]

IEEE Trans

Liu, Q., Sun, Y., Wang, C., Liu, T., Tao, D.: Elastic net hypergraph learning for image clustering and semi-supervised classification. IEEE Trans. Image Process.26(1), 452–463 (2016)

work page 2016
[22]

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.,et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis.115(3), 211–252 (2015)

work page 2015
[23]

In: IEEE Conf

Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y.: Single-image crowd counting via multi-column convolutional neural network. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 589–597 (2016)

work page 2016
[24]

Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll´ ar, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: Eur. Conf. Comput. Vis., pp. 740–755 (2014)

work page 2014
[25]

ACM Comp

Lee, G., Bu, F., Eliassi-Rad, T., Shin, K.: A survey on hypergraph mining: Patterns, tools, and generators. ACM Comp. Surv. (2024)

work page 2024
[26]

Di, D., Yang, J., Luo, C., Xue, Z., Chen, W., Yang, X., Gao, Y.: Hyper-3dg: Text-to- 3d gaussian generation via hypergraph. Int. J. Comput. Vis.133(5), 2886–2909 (2025)

work page 2025
[27]

In: ACM SIGKDD, pp

Kim, S., Lee, S.Y., Gao, Y., Antelmi, A., Polato, M., Shin, K.: A survey on hypergraph neural networks: An in-depth and step-by- step guide. In: ACM SIGKDD, pp. 6534–6544 (2024)

work page 2024
[28]

IEEE Trans

Gao, Y., Feng, Y., Ji, S., Ji, R.: HGNN+: General hypergraph neural networks. IEEE Trans. Pattern Anal. Mach. Intell.45(3), 23 3181–3199 (2022)

work page 2022
[29]

In: IJCAI, pp

Jiang, J., Wei, Y., Feng, Y., Cao, J., Gao, Y.: Dynamic hypergraph neural networks. In: IJCAI, pp. 2635–2641 (2019)

work page 2019
[30]

In: IEEE Conf

Kim, E.-S., Kang, W.Y., On, K.-W., Heo, Y.-J., Zhang, B.-T.: Hypergraph attention networks for multimodal learning. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 14581–14590 (2020)

work page 2020
[31]

ACM Trans

Li, M., Zhang, Y., Li, X., Zhang, Y., Yin, B.: Hypergraph Transformer neural networks. ACM Trans. Know. Disc. Data17(5), 1–22 (2023)

work page 2023
[32]

IEEE Trans

Zhu, J., Zhu, J., Ghosh, S., Wu, W., Yuan, J.: Social influence maximization in hypergraph in social networks. IEEE Trans. Net. Sci. Eng. 6(4), 801–811 (2018)

work page 2018
[33]

IEEE Trans

Yang, D., Qu, B., Yang, J., Cudr´ e-Mauroux, P.: LBSN2Vec++: Heterogeneous hyper- graph embedding for location-based social networks. IEEE Trans. Know. Data Eng. 34(4), 1843–1855 (2020)

work page 2020
[34]

In: AAAI, pp

Zeng, Y., Jin, Q., Bao, T., Li, W.: Multi- modal knowledge hypergraph for diverse image retrieval. In: AAAI, pp. 3376–3383 (2023)

work page 2023
[35]

In: AAAI, pp

Xia, X., Yin, H., Yu, J., Wang, Q., Cui, L., Zhang, X.: Self-supervised hypergraph con- volutional networks for session-based recom- mendation. In: AAAI, pp. 4503–4511 (2021)

work page 2021
[36]

IEEE Trans

La Gatta, V., Moscato, V., Pennone, M., Postiglione, M., Sperl´ ı, G.: Music recom- mendation via hypergraph embedding. IEEE Trans. Neur. Net. Learn. Sys.34(10), 7887– 7899 (2022)

work page 2022
[37]

IEEE Trans- actions on Image Processing33, 3301–3313 (2024)

Ma, N., Wu, Z., Feng, Y., Wang, C., Gao, Y.: Multi-view time-series hypergraph neural network for action recognition. IEEE Trans- actions on Image Processing33, 3301–3313 (2024)

work page 2024
[38]

IEEE Transactions on Image Processing30, 2263–2275 (2021)

Hao, X., Li, J., Guo, Y., Jiang, T., Yu, M.: Hypergraph neural network for skeleton- based action recognition. IEEE Transactions on Image Processing30, 2263–2275 (2021)

work page 2021
[39]

BMC Bioinfo.22(1), 287 (2021)

Feng, S., Heath, E., Jefferson, B., Joslyn, C., Kvinge, H., Mitchell, H.D., Praggastis, B., Eisfeld, A.J., Sims, A.C., Thackray, L.B.,et al.: Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC Bioinfo.22(1), 287 (2021)

work page 2021
[40]

Wang, Y., Wang, Z., Yu, X., Wang, X., Song, J., Yu, D.-J., Ge, F.: More: A multi- omics data-driven hypergraph integration network for biomedical data classification and biomarker identification. Brief. in Bioinfo. 26(1), 658 (2025)

work page 2025
[41]

IEEE Trans

Bai, J., Gong, B., Zhao, Y., Lei, F., Yan, C., Gao, Y.: Multi-scale representation learn- ing on hypergraph for 3D shape retrieval and recognition. IEEE Trans. Image Process.30, 5327–5338 (2021)

work page 2021
[42]

IEEE Access12, 42816–42833 (2024)

Hussain, M.: YOLOv1 to v8: Unveiling each variant-a comprehensive review of YOLO. IEEE Access12, 42816–42833 (2024)

work page 2024
[43]

Han, Y., Wang, P., Kundu, S., Ding, Y., Wang, Z.: Vision HGNN: An image is more than a graph of nodes. In: Int. Conf. Comput. Vis., pp. 19878–19888 (2023)

work page 2023
[44]

IEEE Trans

Wang, H., Zhang, S., Leng, B.: HGFormer: Topology-aware vision transformer with hypergraph learning. IEEE Trans. Multime- dia (2025)

work page 2025
[45]

In: ACM Int

Chen, L., Wang, Q., Li, Z., Yin, Y.: Hypergraph-guided intra-and inter-category relation modeling for fine-grained visual recognition. In: ACM Int. Conf. Multimedia, pp. 8043–8052 (2024)

work page 2024
[46]

In: IEEE Conf

Fixelle, J.: Hypergraph Vision Transform- ers: Images are more than nodes, more than edges. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 9751–9761 (2025)

work page 2025
[47]

A survey on mixture of experts

Cai, W., Jiang, J., Wang, F., Tang, J., Kim, S., Huang, J.: A survey on mixture of experts. arXiv preprint arXiv:2407.06204 (2024) 24

work page arXiv 2024
[48]

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Dai, D., Deng, C., Zhao, C., Xu, R., Gao, H., Chen, D., Li, J., Zeng, W., Yu, X., Wu, Y., et al.: DeepseekMoE: Towards ulti- mate expert specialization in mixture-of- experts language models. arXiv preprint arXiv:2401.06066 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[49]

Han, K., Wang, Y., Guo, J., Tang, Y., Wu, E.: Vision GNN: An image is worth graph of nodes. Adv. Neural Inform. Process. Syst.35, 8291–8303 (2022)

work page 2022
[50]

arXiv preprint arXiv:2109.14483 (2021)

Tian, Y., Chu, X., Wang, H.: CCTrans: Sim- plifying and improving crowd counting with transformer. arXiv preprint arXiv:2109.14483 (2021)

work page arXiv 2021
[51]

IEEE Trans

Liu, X., Li, G., Qi, Y., Han, Z., Hen- gel, A., Sebe, N., Yang, M.-H., Huang, Q.: Consistency-aware anchor pyramid network for crowd localization. IEEE Trans. Pattern Anal. Mach. Intell. (2024)

work page 2024
[52]

IEEE Trans

Sam, D.B., Peri, S.V., Sundararaman, M.N., Kamath, A., Babu, R.V.: Locate, size, and count: Accurately resolving people in dense crowds via detection. IEEE Trans. Pattern Anal. Mach. Intell.43(8), 2739–2751 (2021)

work page 2021
[53]

In: IEEE Conf

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 770–778 (2016)

work page 2016
[54]

Adam: A Method for Stochastic Optimization

Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[55]

https://github.com/ultralytics/ultralytics

Jocher, G., Qiu, J.: Ultralytics YOLO11. https://github.com/ultralytics/ultralytics

work page
[56]

YOLOv12: Attention-Centric Real-Time Object Detectors

Tian, Y., Ye, Q., Doermann, D.: YOLOv12: Attention-centric real-time object detectors. arXiv preprint arXiv:2502.12524 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[57]

In: IEEE Conf

Li, Y., Zhang, X., Chen, D.: CSRNet: Dilated convolutional neural networks for under- standing the highly congested scenes. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 1091–1100 (2018)

work page 2018
[58]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[59]

Ma, Z., Wei, X., Hong, X., Gong, Y.: Bayesian loss for crowd count estimation with point supervision. In: Int. Conf. Comput. Vis., pp. 6142–6151 (2019)

work page 2019
[60]

1595–1607 (2020)

Wang, B., Liu, H., Samaras, D., Nguyen, M.H.: Distribution matching for crowd count- ing, pp. 1595–1607 (2020)

work page 2020
[61]

In: AAAI, pp

Abousamra, S., Hoai, M., Samaras, D., Chen, C.: Localization in the crowd with topological constraints. In: AAAI, pp. 872–881 (2021)

work page 2021
[62]

Zeng, X., Hu, S., Wang, H., Zhang, J.: Joint contextual transformer and multi-scale infor- mation shared network for crowd counting. In: Int. Conf. Pattern Recog. Arti. Intell., pp. 412–417 (2022)

work page 2022
[63]

Liang, D., Xu, W., Bai, X.: An end-to-end transformer model for crowd localization. In: Eur. Conf. Comput. Vis., pp. 38–54 (2022)

work page 2022
[64]

IEEE Trans

Wang, J., Gao, J., Yuan, Y., Wang, Q.: Crowd localization from gaussian mixture scoped knowledge and scoped teacher. IEEE Trans. Image Process.32, 1802–1814 (2023)

work page 2023
[65]

IEEE Trans

Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X.,et al.: Deep high-resolution representation learning for visual recogni- tion. IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3349–3364 (2020)

work page 2020
[66]

IEEE Trans

Shu, W., Wan, J., Chan, A.B.: Generalized characteristic function loss for crowd anal- ysis in the frequency domain. IEEE Trans. Pattern Anal. Mach. Intell.46(5), 2882–2899 (2023)

work page 2023
[67]

In: AAAI, pp

Lin, H., Ma, Z., Hong, X., Shangguan, Q., Meng, D.: GramFormer: Learning crowd counting via graph-modulated transformer. In: AAAI, pp. 3395–3403 (2024)

work page 2024
[68]

In: IEEE Conf

Guo, M., Yuan, L., Yan, Z., Chen, B., Wang, Y., Ye, Q.: Regressor-segmenter mutual 25 prompt learning for crowd counting. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 28380–28389 (2024)

work page 2024
[69]

Chu, X., Tian, Z., Wang, Y., Zhang, B., Ren, H., Wei, X., Xia, H., Shen, C.: Twins: Revisiting the design of spatial attention in vision transformers. In: Adv. Neural Inform. Process. Syst., pp. 9355–9366 (2021)

work page 2021
[70]

Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D., Lu, T., Luo, P., Shao, L.: PVT v2: Improved baselines with pyramid vision transformer. Comput. Visual Media 8(3), 415–424 (2022)

work page 2022
[71]

Wang, C., He, W., Nie, Y., Guo, J., Liu, C., Wang, Y., Han, K.: Gold-YOLO: Efficient object detector via gather-and-distribute mechanism. Adv. Neural Inform. Process. Syst., 51094–51112 (2023)

work page 2023
[72]

https://github.com/ultralytics/ ultralytics

Jocher, G., Chaurasia, A., Qiu, J.: Ultralytics YOLOv8. https://github.com/ultralytics/ ultralytics

work page
[73]

Wang, C.-Y., Liao, H.-Y.M.: YOLOv9: Learning what you want to learn using pro- grammable gradient information (2024)

work page 2024
[74]

Ao, W., Hui, C., Lihao, L.: YOLOv10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458 (2024) 26 9 Appendix 9.1 Pseudo Code We provide the pseudo-code for SoftHGNN’s soft hyperedge generation, message passing on soft hyperedges, and sparse hyperedge selection, as shown in Algorithm 1, 2 and 3. 27 Algorithm 1Soft Hyperedge Generation R...

work page arXiv 2024

[1] [1]

Wanyan, Y., Yang, X., Dong, W., Xu, C.: A comprehensive review of few-shot action recognition. Int. J. Comput. Vis. (2025)

work page 2025

[2] [2]

IEEE Trans

Chen, Y., Mancini, M., Zhu, X., Akata, Z.: Semi-supervised and unsupervised deep visual learning: A survey. IEEE Trans. Pat- tern Anal. Mach. Intell.46(3), 1327–1347 (2024)

work page 2024

[3] [3]

IEEE Trans

Li, X., Ding, H., Yuan, H., Zhang, W., Pang, J., Cheng, G., Chen, K., Liu, Z., Loy, C.C.: Transformer-based visual segmentation: A survey. IEEE Trans. Pattern Anal. Mach. Intell.46(12), 10138–10163 (2024)

work page 2024

[4] [4]

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S.,et al.: An image is worth 16×16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)

work page 2020

[5] [5]

Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D., Lu, T., Luo, P., Shao, L.: Pyra- mid vision transformer: A versatile backbone for dense prediction without convolutions. In: Int. Conf. Comput. Vis., pp. 568–578 (2021)

work page 2021

[6] [6]

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Trans- former: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis., pp. 10012–10022 (2021)

work page 2021

[7] [7]

IEEE Trans

Zhang, S., Meng, N., Lam, E.Y.: LRT: An efficient low-light restoration Transformer for dark light field images. IEEE Trans. Image Process.32, 4314–4326 (2023)

work page 2023

[8] [8]

Zhang, J., Li, X., Wang, Y., Wang, C., Yang, Y., Liu, Y., Tao, D.: Eatformer: Improving vision transformer inspired by evolutionary algorithm. Int. J. Comput. Vis.132(9), 3509– 3536 (2024)

work page 2024

[9] [9]

Wang, T., Zhang, K., Shao, Z., Luo, W., 22 Stenger, B., Lu, T., Kim, T.-K., Liu, W., Li, H.: Gridformer: Residual dense transformer with grid structure for image restoration in adverse weather conditions. Int. J. Comput. Vis.132(10), 4541–4563 (2024)

work page 2024

[10] [10]

Vaswani, A., Shazeer, N., Parmar, N., Uszko- reit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. Adv. in Neur. Info. Process. Sys. (2017)

work page 2017

[11] [11]

Wang, P., Zheng, W., Chen, T., Wang, Z.: Anti-oversmoothing in deep Vision Trans- formers via the Fourier domain analysis: From theory to practice. In: Int. Conf. Learn. Represent. (2022)

work page 2022

[12] [12]

Nguyen, T., Nguyen, T., Baraniuk, R.: Mit- igating over-smoothing in Transformers via regularized nonlocal functionals. Adv. Neu- ral Inform. Process. Syst.36, 80233–80256 (2023)

work page 2023

[13] [13]

Zhai, S., Likhomanenko, T., Littwin, E., Bus- bridge, D., Ramapuram, J., Zhang, Y., Gu, J., Susskind, J.M.: Stabilizing Transformer training by preventing attention entropy col- lapse. In: Int. Conf. Learn. Represent., pp. 40770–40803 (2023)

work page 2023

[14] [14]

Engineering (2024)

Gao, Y., Ji, S., Han, X., Dai, Q.: Hypergraph computation. Engineering (2024)

work page 2024

[15] [15]

ACM Comp

Antelmi, A., Cordasco, G., Polato, M., Scarano, V., Spagnuolo, C., Yang, D.: A sur- vey on hypergraph representation learning. ACM Comp. Surv.56(1), 1–38 (2023)

work page 2023

[16] [16]

In: AAAI, pp

Feng, Y., You, H., Zhang, Z., Ji, R., Gao, Y.: Hypergraph neural networks. In: AAAI, pp. 3558–3565 (2019)

work page 2019

[17] [17]

arXiv preprint arXiv:2503.07959 (2025)

Yang, M., Xu, X.-J.: Recent advances in hypergraph neural networks. arXiv preprint arXiv:2503.07959 (2025)

work page arXiv 2025

[18] [18]

IEEE Trans

Feng, Y., Huang, J., Du, S., Ying, S., Yong, J.-H., Li, Y., Ding, G., Ji, R., Gao, Y.: Hyper- YOLO: When Visual Object Detection Meets Hypergraph Computation. IEEE Trans. Pat- tern Anal. Mach. Intell.47(4), 2388–2401 (2025)

work page 2025

[19] [19]

In: IJCAI, pp

Cai, D., Song, M., Sun, C., Zhang, B., Hong, S., Li, H.: Hypergraph structure learning for hypergraph neural networks. In: IJCAI, pp. 1923–1929 (2022)

work page 1923

[20] [20]

In: IJCAI, pp

Zhang, Z., Lin, H., Gao, Y., BNRist, K.: Dynamic hypergraph structure learning. In: IJCAI, pp. 3162–3169 (2018)

work page 2018

[21] [21]

IEEE Trans

Liu, Q., Sun, Y., Wang, C., Liu, T., Tao, D.: Elastic net hypergraph learning for image clustering and semi-supervised classification. IEEE Trans. Image Process.26(1), 452–463 (2016)

work page 2016

[22] [22]

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.,et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis.115(3), 211–252 (2015)

work page 2015

[23] [23]

In: IEEE Conf

Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y.: Single-image crowd counting via multi-column convolutional neural network. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 589–597 (2016)

work page 2016

[24] [24]

Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll´ ar, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: Eur. Conf. Comput. Vis., pp. 740–755 (2014)

work page 2014

[25] [25]

ACM Comp

Lee, G., Bu, F., Eliassi-Rad, T., Shin, K.: A survey on hypergraph mining: Patterns, tools, and generators. ACM Comp. Surv. (2024)

work page 2024

[26] [26]

Di, D., Yang, J., Luo, C., Xue, Z., Chen, W., Yang, X., Gao, Y.: Hyper-3dg: Text-to- 3d gaussian generation via hypergraph. Int. J. Comput. Vis.133(5), 2886–2909 (2025)

work page 2025

[27] [27]

In: ACM SIGKDD, pp

Kim, S., Lee, S.Y., Gao, Y., Antelmi, A., Polato, M., Shin, K.: A survey on hypergraph neural networks: An in-depth and step-by- step guide. In: ACM SIGKDD, pp. 6534–6544 (2024)

work page 2024

[28] [28]

IEEE Trans

Gao, Y., Feng, Y., Ji, S., Ji, R.: HGNN+: General hypergraph neural networks. IEEE Trans. Pattern Anal. Mach. Intell.45(3), 23 3181–3199 (2022)

work page 2022

[29] [29]

In: IJCAI, pp

Jiang, J., Wei, Y., Feng, Y., Cao, J., Gao, Y.: Dynamic hypergraph neural networks. In: IJCAI, pp. 2635–2641 (2019)

work page 2019

[30] [30]

In: IEEE Conf

Kim, E.-S., Kang, W.Y., On, K.-W., Heo, Y.-J., Zhang, B.-T.: Hypergraph attention networks for multimodal learning. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 14581–14590 (2020)

work page 2020

[31] [31]

ACM Trans

Li, M., Zhang, Y., Li, X., Zhang, Y., Yin, B.: Hypergraph Transformer neural networks. ACM Trans. Know. Disc. Data17(5), 1–22 (2023)

work page 2023

[32] [32]

IEEE Trans

Zhu, J., Zhu, J., Ghosh, S., Wu, W., Yuan, J.: Social influence maximization in hypergraph in social networks. IEEE Trans. Net. Sci. Eng. 6(4), 801–811 (2018)

work page 2018

[33] [33]

IEEE Trans

Yang, D., Qu, B., Yang, J., Cudr´ e-Mauroux, P.: LBSN2Vec++: Heterogeneous hyper- graph embedding for location-based social networks. IEEE Trans. Know. Data Eng. 34(4), 1843–1855 (2020)

work page 2020

[34] [34]

In: AAAI, pp

Zeng, Y., Jin, Q., Bao, T., Li, W.: Multi- modal knowledge hypergraph for diverse image retrieval. In: AAAI, pp. 3376–3383 (2023)

work page 2023

[35] [35]

In: AAAI, pp

Xia, X., Yin, H., Yu, J., Wang, Q., Cui, L., Zhang, X.: Self-supervised hypergraph con- volutional networks for session-based recom- mendation. In: AAAI, pp. 4503–4511 (2021)

work page 2021

[36] [36]

IEEE Trans

La Gatta, V., Moscato, V., Pennone, M., Postiglione, M., Sperl´ ı, G.: Music recom- mendation via hypergraph embedding. IEEE Trans. Neur. Net. Learn. Sys.34(10), 7887– 7899 (2022)

work page 2022

[37] [37]

IEEE Trans- actions on Image Processing33, 3301–3313 (2024)

Ma, N., Wu, Z., Feng, Y., Wang, C., Gao, Y.: Multi-view time-series hypergraph neural network for action recognition. IEEE Trans- actions on Image Processing33, 3301–3313 (2024)

work page 2024

[38] [38]

IEEE Transactions on Image Processing30, 2263–2275 (2021)

Hao, X., Li, J., Guo, Y., Jiang, T., Yu, M.: Hypergraph neural network for skeleton- based action recognition. IEEE Transactions on Image Processing30, 2263–2275 (2021)

work page 2021

[39] [39]

BMC Bioinfo.22(1), 287 (2021)

Feng, S., Heath, E., Jefferson, B., Joslyn, C., Kvinge, H., Mitchell, H.D., Praggastis, B., Eisfeld, A.J., Sims, A.C., Thackray, L.B.,et al.: Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC Bioinfo.22(1), 287 (2021)

work page 2021

[40] [40]

Wang, Y., Wang, Z., Yu, X., Wang, X., Song, J., Yu, D.-J., Ge, F.: More: A multi- omics data-driven hypergraph integration network for biomedical data classification and biomarker identification. Brief. in Bioinfo. 26(1), 658 (2025)

work page 2025

[41] [41]

IEEE Trans

Bai, J., Gong, B., Zhao, Y., Lei, F., Yan, C., Gao, Y.: Multi-scale representation learn- ing on hypergraph for 3D shape retrieval and recognition. IEEE Trans. Image Process.30, 5327–5338 (2021)

work page 2021

[42] [42]

IEEE Access12, 42816–42833 (2024)

Hussain, M.: YOLOv1 to v8: Unveiling each variant-a comprehensive review of YOLO. IEEE Access12, 42816–42833 (2024)

work page 2024

[43] [43]

Han, Y., Wang, P., Kundu, S., Ding, Y., Wang, Z.: Vision HGNN: An image is more than a graph of nodes. In: Int. Conf. Comput. Vis., pp. 19878–19888 (2023)

work page 2023

[44] [44]

IEEE Trans

Wang, H., Zhang, S., Leng, B.: HGFormer: Topology-aware vision transformer with hypergraph learning. IEEE Trans. Multime- dia (2025)

work page 2025

[45] [45]

In: ACM Int

Chen, L., Wang, Q., Li, Z., Yin, Y.: Hypergraph-guided intra-and inter-category relation modeling for fine-grained visual recognition. In: ACM Int. Conf. Multimedia, pp. 8043–8052 (2024)

work page 2024

[46] [46]

In: IEEE Conf

Fixelle, J.: Hypergraph Vision Transform- ers: Images are more than nodes, more than edges. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 9751–9761 (2025)

work page 2025

[47] [47]

A survey on mixture of experts

Cai, W., Jiang, J., Wang, F., Tang, J., Kim, S., Huang, J.: A survey on mixture of experts. arXiv preprint arXiv:2407.06204 (2024) 24

work page arXiv 2024

[48] [48]

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Dai, D., Deng, C., Zhao, C., Xu, R., Gao, H., Chen, D., Li, J., Zeng, W., Yu, X., Wu, Y., et al.: DeepseekMoE: Towards ulti- mate expert specialization in mixture-of- experts language models. arXiv preprint arXiv:2401.06066 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[49] [49]

Han, K., Wang, Y., Guo, J., Tang, Y., Wu, E.: Vision GNN: An image is worth graph of nodes. Adv. Neural Inform. Process. Syst.35, 8291–8303 (2022)

work page 2022

[50] [50]

arXiv preprint arXiv:2109.14483 (2021)

Tian, Y., Chu, X., Wang, H.: CCTrans: Sim- plifying and improving crowd counting with transformer. arXiv preprint arXiv:2109.14483 (2021)

work page arXiv 2021

[51] [51]

IEEE Trans

Liu, X., Li, G., Qi, Y., Han, Z., Hen- gel, A., Sebe, N., Yang, M.-H., Huang, Q.: Consistency-aware anchor pyramid network for crowd localization. IEEE Trans. Pattern Anal. Mach. Intell. (2024)

work page 2024

[52] [52]

IEEE Trans

Sam, D.B., Peri, S.V., Sundararaman, M.N., Kamath, A., Babu, R.V.: Locate, size, and count: Accurately resolving people in dense crowds via detection. IEEE Trans. Pattern Anal. Mach. Intell.43(8), 2739–2751 (2021)

work page 2021

[53] [53]

In: IEEE Conf

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 770–778 (2016)

work page 2016

[54] [54]

Adam: A Method for Stochastic Optimization

Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014

[55] [55]

https://github.com/ultralytics/ultralytics

Jocher, G., Qiu, J.: Ultralytics YOLO11. https://github.com/ultralytics/ultralytics

work page

[56] [56]

YOLOv12: Attention-Centric Real-Time Object Detectors

Tian, Y., Ye, Q., Doermann, D.: YOLOv12: Attention-centric real-time object detectors. arXiv preprint arXiv:2502.12524 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[57] [57]

In: IEEE Conf

Li, Y., Zhang, X., Chen, D.: CSRNet: Dilated convolutional neural networks for under- standing the highly congested scenes. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 1091–1100 (2018)

work page 2018

[58] [58]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014

[59] [59]

Ma, Z., Wei, X., Hong, X., Gong, Y.: Bayesian loss for crowd count estimation with point supervision. In: Int. Conf. Comput. Vis., pp. 6142–6151 (2019)

work page 2019

[60] [60]

1595–1607 (2020)

Wang, B., Liu, H., Samaras, D., Nguyen, M.H.: Distribution matching for crowd count- ing, pp. 1595–1607 (2020)

work page 2020

[61] [61]

In: AAAI, pp

Abousamra, S., Hoai, M., Samaras, D., Chen, C.: Localization in the crowd with topological constraints. In: AAAI, pp. 872–881 (2021)

work page 2021

[62] [62]

Zeng, X., Hu, S., Wang, H., Zhang, J.: Joint contextual transformer and multi-scale infor- mation shared network for crowd counting. In: Int. Conf. Pattern Recog. Arti. Intell., pp. 412–417 (2022)

work page 2022

[63] [63]

Liang, D., Xu, W., Bai, X.: An end-to-end transformer model for crowd localization. In: Eur. Conf. Comput. Vis., pp. 38–54 (2022)

work page 2022

[64] [64]

IEEE Trans

Wang, J., Gao, J., Yuan, Y., Wang, Q.: Crowd localization from gaussian mixture scoped knowledge and scoped teacher. IEEE Trans. Image Process.32, 1802–1814 (2023)

work page 2023

[65] [65]

IEEE Trans

Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X.,et al.: Deep high-resolution representation learning for visual recogni- tion. IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3349–3364 (2020)

work page 2020

[66] [66]

IEEE Trans

Shu, W., Wan, J., Chan, A.B.: Generalized characteristic function loss for crowd anal- ysis in the frequency domain. IEEE Trans. Pattern Anal. Mach. Intell.46(5), 2882–2899 (2023)

work page 2023

[67] [67]

In: AAAI, pp

Lin, H., Ma, Z., Hong, X., Shangguan, Q., Meng, D.: GramFormer: Learning crowd counting via graph-modulated transformer. In: AAAI, pp. 3395–3403 (2024)

work page 2024

[68] [68]

In: IEEE Conf

Guo, M., Yuan, L., Yan, Z., Chen, B., Wang, Y., Ye, Q.: Regressor-segmenter mutual 25 prompt learning for crowd counting. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 28380–28389 (2024)

work page 2024

[69] [69]

Chu, X., Tian, Z., Wang, Y., Zhang, B., Ren, H., Wei, X., Xia, H., Shen, C.: Twins: Revisiting the design of spatial attention in vision transformers. In: Adv. Neural Inform. Process. Syst., pp. 9355–9366 (2021)

work page 2021

[70] [70]

Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D., Lu, T., Luo, P., Shao, L.: PVT v2: Improved baselines with pyramid vision transformer. Comput. Visual Media 8(3), 415–424 (2022)

work page 2022

[71] [71]

Wang, C., He, W., Nie, Y., Guo, J., Liu, C., Wang, Y., Han, K.: Gold-YOLO: Efficient object detector via gather-and-distribute mechanism. Adv. Neural Inform. Process. Syst., 51094–51112 (2023)

work page 2023

[72] [72]

https://github.com/ultralytics/ ultralytics

Jocher, G., Chaurasia, A., Qiu, J.: Ultralytics YOLOv8. https://github.com/ultralytics/ ultralytics

work page

[73] [73]

Wang, C.-Y., Liao, H.-Y.M.: YOLOv9: Learning what you want to learn using pro- grammable gradient information (2024)

work page 2024

[74] [74]

Ao, W., Hui, C., Lihao, L.: YOLOv10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458 (2024) 26 9 Appendix 9.1 Pseudo Code We provide the pseudo-code for SoftHGNN’s soft hyperedge generation, message passing on soft hyperedges, and sparse hyperedge selection, as shown in Algorithm 1, 2 and 3. 27 Algorithm 1Soft Hyperedge Generation R...

work page arXiv 2024