Rethinking Air-Ground Collaboration: A Progressive Cross-Task Benchmark and Socialized Learning Framework
Pith reviewed 2026-06-26 21:45 UTC · model grok-4.3
The pith
Task-conditioned collaboration outperforms uniform fusion for heterogeneous air-ground perception by reducing negative transfer.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper argues that air-ground perception should be formulated as progressive cross-task collaboration. It supports this with the AGPC benchmark and implements it in the Socialized Co-Perception (SCP) framework, whose Dual-Layer Router (DLR) decouples input-side multi-scale expert selection from output-side task-conditioned modulation. The reported result is a 3.73% coevolutionary gain and a 7.86% improvement in average downstream performance over uniform fusion.
What carries the argument
The Dual-Layer Router, which separates input-side multi-scale expert selection from output-side task-conditioned modulation to enable selective cross-view and cross-task interaction.
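To make the two-layer split concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the top-k softmax gate, the FiLM-style per-task scale and shift, and every name (`input_side_select`, `output_side_modulate`, `task_params`) are assumptions chosen for illustration. The paper's abstract states only that input-side expert selection is decoupled from output-side task-conditioned modulation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def input_side_select(gate_logits, expert_outputs, k=2):
    # Layer 1 (assumed form): keep the k highest-scoring experts and
    # mix their outputs with softmax weights over the kept logits.
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for w, i in zip(weights, top):
        for d in range(dim):
            mixed[d] += w * expert_outputs[i][d]
    return mixed, top

def output_side_modulate(feature, task, task_params):
    # Layer 2 (assumed form): per-task elementwise scale and shift.
    scale, shift = task_params[task]
    return [s * f + b for f, s, b in zip(feature, scale, shift)]

def dual_layer_route(gate_logits, expert_outputs, task, task_params, k=2):
    # Selection does not depend on the task; modulation does.
    mixed, chosen = input_side_select(gate_logits, expert_outputs, k)
    return output_side_modulate(mixed, task, task_params), chosen

# Toy usage with made-up numbers: three "scale experts", two tasks.
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
gate_logits = [2.0, 2.0, -1.0]
task_params = {
    "localization": ([2.0, 2.0], [0.0, 0.0]),
    "parsing": ([1.0, 0.0], [0.0, 1.0]),
}
loc_out, chosen = dual_layer_route(gate_logits, expert_outputs,
                                   "localization", task_params)
parse_out, _ = dual_layer_route(gate_logits, expert_outputs,
                                "parsing", task_params)
```

In this sketch both tasks share the same selected experts but receive differently modulated features, which is one way the decoupling could limit interference between tasks.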
If this is right
- Task-conditioned routing reduces negative transfer across heterogeneous views.
- Aerial localization supplies useful priors for subsequent ground target association.
- Identity-aware parsing improves when it follows the prior cross-task stages.
- Average performance across localization, association, and parsing rises by 7.86%.
- The AGPC benchmark supplies a standardized testbed for evaluating progressive methods.
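The progressive ordering in the list above (aerial localization, then ground association, then identity-aware parsing) can be sketched as a chain in which each stage reads the outputs of earlier stages. This is a toy illustration only: the stage functions, the zone-matching rule, and the data layout are hypothetical and stand in for learned models.

```python
def run_progressive(stages, aerial, ground):
    # Run stages in order; each stage sees the outputs of all earlier stages.
    priors = {}
    for name, stage_fn in stages:
        priors[name] = stage_fn(aerial, ground, priors)
    return priors

# Hypothetical toy stages; real stages would be learned models.
def localize(aerial, ground, priors):
    # Coarse aerial localization: report which zone the target is in.
    return aerial["target_zone"]

def associate(aerial, ground, priors):
    # Ground association: keep ground detections in the localized zone.
    zone = priors["localization"]
    return [d for d in ground["detections"] if d["zone"] == zone]

def parse(aerial, ground, priors):
    # Identity-aware parsing: runs only on the associated targets.
    return [{"id": d["id"], "zone": d["zone"]} for d in priors["association"]]

stages = [
    ("localization", localize),
    ("association", associate),
    ("parsing", parse),
]
aerial = {"target_zone": "B"}
ground = {"detections": [{"id": 1, "zone": "A"}, {"id": 2, "zone": "B"}]}
result = run_progressive(stages, aerial, ground)
```

Because later stages only read from `priors`, reordering or swapping a stage is a change to the `stages` list, which is the kind of flexibility a learned ordering would need.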
Where Pith is reading between the lines
- The same router structure could be tested on other multi-view settings such as vehicle-to-infrastructure perception.
- Deployment in varying weather or lighting would reveal whether the selective routing remains stable.
- Adding temporal consistency constraints across video frames might further increase the coevolutionary gain.
- The progressive ordering could be learned rather than fixed if downstream tasks vary in priority.
Load-bearing premise
Differences in geometry, scale, and occlusion between aerial and ground views make uniform feature sharing prone to negative transfer.
What would settle it
A controlled test on the AGPC benchmark in which a uniform fusion baseline matches or exceeds the reported 3.73% coevolutionary gain and 7.86% downstream improvement would falsify the central claim.
Original abstract
Air-ground collaborative perception is crucial for robust visual understanding in real-world dynamic environments. However, existing studies typically formulate collaboration as single-task cross-view fusion, overlooking the functional dependencies among localization, target association, and fine-grained parsing. In addition, the heterogeneous nature of aerial and ground views introduces substantial geometric, scale, and occlusion discrepancies, making uniform feature sharing vulnerable to negative transfer. To tackle these issues, we model air-ground perception as a progressive cross-task collaboration task and construct the Air-Ground Progressive Collaboration (AGPC) benchmark, a spatio-temporally aligned benchmark comprising more than 745K raw video frames. Built upon this benchmark, we propose Socialized Co-Perception (SCP), a coarse-to-fine framework that organizes collaboration progressively from aerial global localization to ground target association and identity-aware parsing. Its core module, the Dual-Layer Router (DLR), decouples input-side multi-scale expert selection from output-side task-conditioned modulation, enabling selective cross-view and cross-task interaction while suppressing harmful interference. Extensive experiments demonstrate the effectiveness of SCP. It achieves a 3.73% coevolutionary gain and a 7.86% improvement in average downstream performance. These results show that task-conditioned collaboration is more effective than uniform fusion for heterogeneous air-ground perception. The code is available at https://github.com/g1136639260-spec/AGSCP.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Air-Ground Progressive Collaboration (AGPC) benchmark, a spatio-temporally aligned benchmark comprising more than 745K raw video frames, and proposes the Socialized Co-Perception (SCP) framework, whose core Dual-Layer Router (DLR) module decouples multi-scale expert selection from task-conditioned modulation. It claims that this progressive cross-task approach yields a 3.73% coevolutionary gain and a 7.86% improvement in average downstream performance over uniform fusion, thereby mitigating negative transfer arising from geometric, scale, and occlusion discrepancies between aerial and ground views.
Significance. If the empirical claims hold under rigorous controls, the work would contribute a large-scale aligned benchmark and a task-conditioned collaboration mechanism to air-ground perception, a growing area in computer vision. The public code release at the cited GitHub repository and the benchmark construction itself constitute concrete strengths that support reproducibility and further research.
major comments (1)
- [Abstract and Experiments] The central claim of a 3.73% coevolutionary gain and a 7.86% average downstream improvement is presented without any description of the baselines, number of runs, statistical tests, error bars, train/validation/test splits, or controls for confounds. This information is load-bearing for assessing whether the DLR-driven task-conditioned interaction genuinely outperforms uniform fusion on the AGPC benchmark.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on experimental rigor. We agree that the reported gains require explicit supporting details to allow proper evaluation and will revise the manuscript accordingly.
- Referee: [Abstract and Experiments] The central claim of a 3.73% coevolutionary gain and a 7.86% average downstream improvement is presented without any description of the baselines, number of runs, statistical tests, error bars, train/validation/test splits, or controls for confounds. This information is load-bearing for assessing whether the DLR-driven task-conditioned interaction genuinely outperforms uniform fusion on the AGPC benchmark.
  Authors: We acknowledge that the current manuscript does not provide sufficient detail on these experimental aspects. In the revised version we will expand the Experiments section to explicitly list all baselines and their configurations, report the number of independent runs together with statistical tests (e.g., paired t-tests) and error bars, document the precise train/validation/test splits on the AGPC benchmark, and include additional controls or ablations that address potential confounds arising from geometric, scale, and occlusion differences. We will also add a brief reference to these controls in the abstract. These additions will directly substantiate the claimed 3.73% coevolutionary gain and 7.86% downstream improvement over uniform fusion.
  Revision: yes
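The paired t-test mentioned in the response can be sketched with the Python standard library alone. The per-run scores below are placeholders invented for illustration; they are not results from the paper, and the function name `paired_t` is hypothetical. The sketch reports the mean difference, its standard error (usable as an error bar), and the t statistic; it does not compute a p-value.

```python
import math
import statistics

def paired_t(scores_a, scores_b):
    # Paired t statistic for matched runs (same seed or split per pair).
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean
    if sem == 0:
        raise ValueError("all differences are identical; t is undefined")
    return {"mean_diff": mean_diff, "sem": sem, "t": mean_diff / sem, "df": n - 1}

# Placeholder scores for five matched runs (NOT from the paper).
routed_runs = [71.0, 72.0, 73.0, 72.0, 72.0]
uniform_runs = [70.0, 70.0, 70.0, 70.0, 70.0]
stats = paired_t(routed_runs, uniform_runs)
```

The t statistic is then compared with the critical value for `df` degrees of freedom from a t table. If SciPy is available, `scipy.stats.ttest_rel` performs the same test and also returns a p-value.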
Circularity Check
No significant circularity
The paper presents an empirical study: it constructs the AGPC benchmark from more than 745K raw video frames and evaluates the SCP framework with its DLR module on downstream tasks, reporting measured gains (3.73% coevolutionary, 7.86% average) over uniform fusion baselines. No equations, parameter-fitting steps, or derivations appear in the abstract or described claims that reduce the reported improvements to quantities defined by the inputs themselves. The central claim rests on experimental comparison rather than any self-definitional, fitted-input, or self-citation chain that collapses by construction. The benchmark construction and code release supply independent grounding, making the derivation self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- Hyperparameters of SCP and DLR
axioms (1)
- Domain assumption: functional dependencies exist among localization, target association, and fine-grained parsing that justify progressive rather than single-task modeling.
invented entities (1)
- Dual-Layer Router (DLR): no independent evidence