SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition
Pith reviewed 2026-05-22 14:12 UTC · model grok-4.3
The pith
SoftHGNN assigns image features to hyperedges via continuous weights from similarities to learnable prototypes, enabling efficient high-order relation modeling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Soft hyperedges are formed by computing similarities between vertex features and a small set of learnable hyperedge prototypes, which yields continuous, differentiable participation weights. These weights are the medium for message aggregation that enriches representations with high-order contextual associations. Top-k selection and a load-balancing regularizer are added to control cost as the number of prototypes grows.
What carries the argument
Soft hyperedges generated by continuous participation weights from feature-to-prototype similarities.
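A minimal sketch of this mechanism in plain Python, under stated assumptions: cosine similarity as the similarity measure, a softmax over prototypes with a temperature `tau`, weighted-mean aggregation into hyperedges, and a residual update back to the vertices. None of these specifics is confirmed by the summary above; the function names are hypothetical and the paper's released code may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def participation(X, P, tau=0.1):
    """Participation matrix A (N x M); row i is a softmax over prototypes.

    Assumption: softmax normalisation with temperature tau.
    """
    return [softmax([cosine(x, p) / tau for p in P]) for x in X]

def message_pass(X, A):
    """Vertices -> hyperedges (weighted mean), then hyperedges -> vertices."""
    n, m, d = len(X), len(A[0]), len(X[0])
    E = []
    for j in range(m):
        weight = sum(A[i][j] for i in range(n)) + 1e-12
        E.append([sum(A[i][j] * X[i][k] for i in range(n)) / weight
                  for k in range(d)])
    # Each vertex reads back from the hyperedges it participates in
    # and adds the result to its own feature (assumed residual update).
    return [[X[i][k] + sum(A[i][j] * E[j][k] for j in range(m))
             for k in range(d)] for i in range(n)]
```

Every hyperedge pools from all vertices at once, weighted by its column of `A`, which is the sense in which the aggregation is multi-way rather than pairwise.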
If this is right
- Existing vision backbones receive high-order scene context through a lightweight late-stage module without redesigning early feature extraction.
- Redundant hyperedge computation drops because only the top-k most relevant prototypes participate per input.
- Balanced prototype usage prevents collapse to a few dominant hyperedges during training.
- Performance rises on classification, detection, and related tasks across the tested datasets.
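The sparsity and balance points above can be sketched as follows. This is an illustration under assumptions, not the paper's formulation: hyperedge importance is taken to be a per-input score vector, and the balance penalty is the squared coefficient of variation of total importance, a form borrowed from mixture-of-experts routing. `topk_mask` and `load_balance_penalty` are hypothetical names.

```python
def topk_mask(scores, k):
    """Return a 0/1 mask that keeps the k highest-scoring hyperedges."""
    keep = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    mask = [0.0] * len(scores)
    for j in keep:
        mask[j] = 1.0
    return mask

def load_balance_penalty(batch_scores):
    """Squared coefficient of variation of per-hyperedge total importance.

    batch_scores: one score vector per input, each of length M.
    The penalty is 0 when all hyperedges receive equal total importance
    and grows as usage concentrates on a few of them.
    """
    m = len(batch_scores[0])
    importance = [sum(scores[j] for scores in batch_scores) for j in range(m)]
    mean = sum(importance) / m
    var = sum((v - mean) ** 2 for v in importance) / m
    return var / (mean ** 2 + 1e-12)
```

Computing the penalty on soft scores rather than on the hard mask keeps it differentiable when the same arithmetic is written in an autograd framework.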
Where Pith is reading between the lines
- The prototype-based soft assignment could transfer to video or point-cloud data where relations also involve more than two elements at once.
- Varying the number of prototypes per task might expose a practical limit on how many distinct high-order groupings are useful in typical scenes.
- Layering this soft hypergraph block inside transformer stages could test whether early and late high-order mixing compound each other.
Load-bearing premise
Similarities between vertex features and a small set of learnable prototypes yield participation weights that correctly reflect semantically meaningful high-order visual groupings.
What would settle it
Replace the learned prototype similarities with uniform random weights and re-run the experiments on the five datasets. If the accuracy gains remain comparable, the semantic weighting mechanism is not essential to the reported improvements.
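One way such a control could be set up is sketched below: a participation matrix of the same shape as the learned one, filled with uniform random draws. Row normalisation is an assumption, made so the control matches a softmax-normalised learned matrix; `random_participation` is a hypothetical helper, not part of the released code.

```python
import random

def random_participation(n_vertices, n_hyperedges, seed=0):
    """Uniform random participation weights, each row normalised to sum to 1."""
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    A = []
    for _ in range(n_vertices):
        row = [rng.uniform(1e-6, 1.0) for _ in range(n_hyperedges)]
        total = sum(row)
        A.append([w / total for w in row])
    return A
```

Swapping this matrix in for the similarity-derived one leaves the aggregation path and parameter count unchanged, so any remaining gain cannot be credited to the learned weighting.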
Original abstract
Visual recognition relies on understanding the semantics of image tokens and their complex interactions. Mainstream self-attention methods, while effective at modeling global pair-wise relations, fail to capture high-order associations inherent in real-world scenes and often suffer from redundant computation. Hypergraphs extend conventional graphs by modeling high-order interactions and offer a promising framework for addressing these limitations. However, existing hypergraph neural networks typically rely on static and hard hyperedge assignments, which lead to redundant hyperedges and overlook the continuity of visual semantics. In this work, we present Soft Hypergraph Neural Networks (SoftHGNN), a lightweight plug-and-play hypergraph computation method for late-stage semantic reasoning in existing vision pipelines. Our SoftHGNN introduces the concept of soft hyperedges, where each vertex is associated with hyperedges via continuous and differentiable participation weights rather than hard binary assignments. These weights are produced by measuring similarities between vertex features and a small set of learnable hyperedge prototypes, yielding input-adaptive and semantically rich soft hyperedges. Using soft hyperedges as the medium for message aggregation and dissemination, SoftHGNN enriches feature representations with high-order contextual associations. To further enhance efficiency when scaling up the number of soft hyperedges, we incorporate a sparse hyperedge selection mechanism that activates only the top-k important hyperedges, along with a load-balancing regularizer to ensure adequate and balanced hyperedge utilization. Experimental results across three tasks on five datasets demonstrate that SoftHGNN efficiently captures high-order associations in visual scenes, achieving significant performance improvements. The code is available at: https://github.com/Mengqi-Lei/SoftHGNN.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SoftHGNN, a lightweight plug-and-play hypergraph module for late-stage reasoning in vision models. Soft hyperedges are constructed via continuous, differentiable participation weights obtained by measuring similarities between vertex (token) features and a small set of learnable hyperedge prototypes, followed by top-k sparse selection and a load-balancing regularizer. The module is inserted into existing pipelines and evaluated on three visual recognition tasks across five datasets, with the central claim that the soft hyperedges enable efficient capture of high-order associations and yield significant performance gains over baselines.
Significance. If the results and attribution hold, SoftHGNN would offer a practical, scalable alternative to dense attention or static hypergraphs for modeling multi-way relations in scenes. The public code release is a clear strength that supports reproducibility and follow-up work.
major comments (3)
- [§3.2] §3.2 (Soft Hyperedge Construction): the participation weights are defined directly from cosine similarity (or equivalent) between features and a fixed set of learnable prototypes followed by top-k and balancing; this construction is mathematically equivalent to soft feature clustering/assignment and does not contain an explicit mechanism that enforces or verifies multi-way relational structure (e.g., specific co-occurrence patterns). This is load-bearing for the claim that the module 'captures high-order associations' rather than simply adding capacity via prototype-based aggregation.
- [§4] §4 (Experimental Results and Ablations): the reported gains are not isolated from confounding factors such as increased model capacity or the effect of the load-balancing regularizer alone. No control experiment replaces the soft-hyperedge aggregator with a comparably parameterized MLP or standard soft-clustering layer; without this, it remains unclear whether the performance delta is attributable to high-order modeling.
- [Table 2] Table 2 (or equivalent main results table): absolute improvements are shown, but the manuscript does not report run-to-run variance, statistical significance tests, or confidence intervals. This weakens the assertion of 'significant performance improvements' across tasks.
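The capacity-matched control requested in the second major comment needs an MLP sized to a given parameter budget. A rough sketch of that sizing, assuming a two-layer MLP (dim to hidden to dim, with biases); the budget itself would come from counting the SoftHGNN module's parameters, which this page does not state. Both function names are hypothetical.

```python
def mlp_params(dim, hidden):
    """Parameter count of a dim -> hidden -> dim MLP with biases."""
    return dim * hidden + hidden + hidden * dim + dim

def matched_hidden_width(dim, budget):
    """Hidden width whose parameter count is closest to the budget.

    mlp_params is linear in hidden: hidden * (2 * dim + 1) + dim,
    so the nearest width is found by solving for hidden and rounding.
    """
    return max(1, round((budget - dim) / (2 * dim + 1)))
```

Matching the count this way removes raw capacity as an explanation, though it does not equalise compute or inductive bias.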
minor comments (2)
- [§3.2] The exact mathematical definition of the participation weight matrix (including temperature or normalization) and the load-balancing loss term should be written out explicitly with equation numbers for clarity.
- [Figure 1] Figure 1 would benefit from explicit labels on the prototype similarity and top-k selection steps to make the data flow unambiguous.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications and indicating the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: [§3.2] the participation weights are defined directly from cosine similarity (or equivalent) between features and a fixed set of learnable prototypes followed by top-k and balancing; this construction is mathematically equivalent to soft feature clustering/assignment and does not contain an explicit mechanism that enforces or verifies multi-way relational structure (e.g., specific co-occurrence patterns). This is load-bearing for the claim that the module 'captures high-order associations' rather than simply adding capacity via prototype-based aggregation.
  Authors: We appreciate this observation. While the soft assignment uses prototype similarities, the subsequent hypergraph message passing aggregates features across multiple vertices per hyperedge via the incidence matrix. This step explicitly realizes multi-way interactions, as each hyperedge pools information from a variable set of vertices in one operation, distinct from independent soft clustering. We have revised §3.2 to clarify this distinction and the role of the hypergraph convolution in enforcing high-order structure.
  Revision: partial
- Referee: [§4] the reported gains are not isolated from confounding factors such as increased model capacity or the effect of the load-balancing regularizer alone. No control experiment replaces the soft-hyperedge aggregator with a comparably parameterized MLP or standard soft-clustering layer; without this, it remains unclear whether the performance delta is attributable to high-order modeling.
  Authors: This is a valid concern about isolating the source of gains. We have added ablation experiments in the revised §4 that replace the hypergraph aggregator with (i) an MLP of matched parameter count and (ii) a soft-clustering layer lacking the hypergraph structure. We also ablate the load-balancing regularizer in isolation. The new results show that the complete SoftHGNN outperforms these controls, supporting that the gains arise from high-order modeling.
  Revision: yes
- Referee: [Table 2] absolute improvements are shown, but the manuscript does not report run-to-run variance, statistical significance tests, or confidence intervals. This weakens the assertion of 'significant performance improvements' across tasks.
  Authors: We agree that variability and statistical measures would improve rigor. In the revised manuscript we now report standard deviations over multiple random seeds for the main results in Table 2 and include paired t-test p-values comparing SoftHGNN against baselines to substantiate the significance of the observed improvements.
  Revision: yes
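The seed-level statistics mentioned in the last response can be computed along these lines. This is a sketch using only the standard library, assuming one score per random seed for each model and the same seeds for both. It returns the paired t statistic and degrees of freedom. Turning those into a p-value needs a t-distribution, for example via `scipy.stats.ttest_rel`. `paired_t` is a hypothetical name.

```python
import math
import statistics

def paired_t(baseline, treated):
    """Paired t-test statistic over per-seed scores.

    baseline, treated: equal-length lists, one score per random seed.
    Returns (mean difference, std of differences, t statistic, dof).
    """
    if len(baseline) != len(treated) or len(baseline) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [t - b for b, t in zip(baseline, treated)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return mean_d, sd_d, t_stat, n - 1
```

Pairing by seed removes seed-to-seed variation shared by both models, which is why it is preferred over an unpaired comparison when the seeds line up.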
Circularity Check
No circularity: soft hyperedge weights are an explicit design choice validated externally
full rationale
The paper defines soft hyperedges directly via a similarity-based construction between vertex features and learnable prototypes, followed by standard message passing. This is presented as a modeling decision in the method, not as a derived prediction or result that reduces to its own inputs by construction. No equations or claims show a fitted parameter being renamed as a prediction, nor does any load-bearing step rely on self-citation chains or imported uniqueness theorems. Experiments across independent datasets and tasks serve as external validation. The central mechanism does not collapse to tautology; concerns about whether the construction truly captures high-order relations are questions of modeling validity rather than circular derivation.
Axiom & Free-Parameter Ledger
free parameters (2)
- number of hyperedge prototypes
- top-k value
axioms (2)
- domain assumption: Feature similarity to prototypes yields semantically continuous and differentiable participation weights.
- standard math: Standard back-propagation through the soft assignment and sparse selection operations is stable.
invented entities (2)
- soft hyperedge: no independent evidence
- hyperedge prototype: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem:
  "These weights are produced by measuring similarities between vertex features and a small set of learnable hyperedge prototypes, yielding input-adaptive and semantically rich soft hyperedges."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem:
  "A∈[0,1]^{N×M} is the participation matrix between vertices and hyperedges"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- Sparse Hypergraph-Enhanced Frame-Event Object Detection with Fine-Grained MoE
  Hyper-FEOD fuses RGB and event data via sparse hypergraph cross-modal fusion and region-specialized MoE experts to improve accuracy-efficiency in object detection.
- RACANet: Reliability-Aware Crowd Anchor Network for RGB-T Crowd Counting
  RACANet proposes a reliability-aware two-stage fusion network with cross-modal pretraining and local anchor modules that outperforms prior RGB-T crowd counting methods on standard benchmarks.