Recognition: no theorem link
Weather-Conditioned Branch Routing for Robust LiDAR-Radar 3D Object Detection
Pith reviewed 2026-05-10 19:26 UTC · model grok-4.3
The pith
A weather-conditioned router dynamically weights pure LiDAR, pure radar, and fusion branches to adapt 3D detection to changing conditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that reformulating perception as weather-conditioned branch routing enables robust adaptation without branch collapse. The method maintains three parallel streams (pure LiDAR, pure 4D radar, and condition-gated fusion), aggregates them with a lightweight router driven by a condition token extracted from prompts, and trains with weather-supervised auxiliary classification plus diversity regularization. This yields explicit insight into modality shifts and outperforms prior fixed or weakly adaptive fusion methods.
What carries the argument
The condition-gated router that predicts sample-specific weights for the three parallel 3D feature streams using a condition token extracted from visual and semantic prompts.
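The routing step the review describes can be sketched in a few lines: a small gate maps the condition token to softmax weights over the three branches, which then softly aggregate the feature streams. This is a minimal illustration under assumed shapes and layer sizes, not the authors' implementation; all names are hypothetical.

```python
import numpy as np

def route_branches(cond_token, branch_feats, W1, b1, W2, b2):
    """Hypothetical sketch of the condition-gated router.

    cond_token:   (B, D) token from visual/semantic prompts (assumed shape).
    branch_feats: (B, 3, C) stacked [LiDAR, 4D radar, fusion] features.
    W1/b1, W2/b2: weights of a small two-layer gate (illustrative sizes).
    """
    # Two-layer gate: token -> hidden (ReLU) -> logits over 3 branches.
    h = np.maximum(cond_token @ W1 + b1, 0.0)
    logits = h @ W2 + b2
    # Numerically stable softmax gives sample-specific routing weights.
    logits = logits - logits.max(axis=-1, keepdims=True)
    w = np.exp(logits)
    w = w / w.sum(axis=-1, keepdims=True)            # (B, 3), rows sum to 1
    # Soft aggregation: weighted sum of the three parallel feature streams.
    fused = (w[:, :, None] * branch_feats).sum(axis=1)
    return fused, w
```

The key property is that the weights are per-sample, so the same trained model can lean on radar in fog and LiDAR in clear weather.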
Load-bearing premise
A condition token derived from visual and semantic prompts suffices for a lightweight router to predict effective sample-specific weights that avoid branch collapse when combined with weather supervision.
What would settle it
A falsifying test: run the model on K-Radar heavy-fog test scenes. If the router fails to increase the radar branch weight relative to LiDAR, yielding no accuracy gain over a static fusion baseline, the central claim does not hold.
Original abstract
Robust 3D object detection in adverse weather is highly challenging due to the varying reliability of different sensors. While existing LiDAR-4D radar fusion methods improve robustness, they predominantly rely on fixed or weakly adaptive pipelines, failing to dy-namically adjust modality preferences as environmental conditions change. To bridge this gap, we reformulate multi-modal perception as a weather-conditioned branch routing problem. Instead of computing a single fused output, our framework explicitly maintains three parallel 3D feature streams: a pure LiDAR branch, a pure 4D radar branch, and a condition-gated fusion branch. Guided by a condition token extracted from visual and semantic prompts, a lightweight router dynamically predicts sample-specific weights to softly aggregate these representations. Furthermore, to prevent branch collapse, we introduce a weather-supervised learning strategy with auxiliary classification and diversity regularization to enforce distinct, condition-dependent routing behaviors. Extensive experiments on the K-Radar benchmark demonstrate that our method achieves state-of-the-art performance. Furthermore, it provides explicit and highly interpretable insights into modality preferences, transparently revealing how adaptive routing robustly shifts reliance between LiDAR and 4D radar across diverse adverse-weather scenarios. The source code with be released.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes reformulating multi-modal 3D object detection as a weather-conditioned branch routing task for LiDAR and 4D radar fusion in adverse conditions. It maintains three parallel streams (pure LiDAR, pure 4D radar, condition-gated fusion) whose outputs are softly aggregated by sample-specific weights predicted by a lightweight router. The router is driven by a condition token extracted from visual and semantic prompts; weather-supervised auxiliary classification and diversity regularization are added to enforce distinct, non-collapsing routing behaviors. Experiments on the K-Radar benchmark are said to yield state-of-the-art detection performance together with interpretable modality-preference shifts across weather regimes.
Significance. If the reported gains and the claimed interpretability hold under scrutiny, the explicit three-branch design with auxiliary regularization offers a concrete mechanism for adaptive, transparent modality selection that fixed-fusion baselines lack. The emphasis on preventing branch collapse via diversity losses is a constructive technical choice. However, the overall significance is limited by the absence of visible quantitative support in the abstract and by the unresolved dependence on visual prompts in the target domain.
major comments (2)
- [Method (condition token and router)] The central routing mechanism relies on a condition token extracted from visual and semantic prompts (abstract and method description). In the adverse-weather regimes that constitute the target domain, camera images are degraded by rain, fog, or snow; any corruption of this token therefore directly undermines the router's ability to produce meaningful, condition-dependent weights. Because the weather-supervised auxiliary losses and diversity regularization act downstream of token extraction, they cannot retroactively correct an uninformative or weather-agnostic token. The manuscript must demonstrate either that the token remains robust under realistic visual degradation or that an alternative non-visual conditioning path is available.
- [Experiments] The abstract asserts state-of-the-art performance on K-Radar yet supplies no numerical results, baseline comparisons, per-weather ablations, or error analysis. Without these data it is impossible to judge whether the routing actually delivers the claimed gains or merely matches existing fusion pipelines. The full paper must include quantitative tables (e.g., mAP, NDS, or recall stratified by weather type) together with ablations that isolate the contribution of the router, the three branches, and the auxiliary losses.
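The auxiliary objectives the report discusses can be made concrete with a hedged sketch: a weather-classification cross-entropy on the condition pathway plus a diversity term that penalizes the router for collapsing onto one branch. The exact losses in the paper may differ; this only illustrates the mechanism, and all names and formulations here are assumptions.

```python
import numpy as np

def routing_regularizers(weights, weather_logits, weather_labels):
    """Illustrative auxiliary losses for the router (not the paper's exact form).

    weights:        (B, 3) routing weights over (LiDAR, radar, fusion).
    weather_logits: (B, K) logits of the auxiliary weather classifier.
    weather_labels: (B,) integer weather labels used for supervision.
    """
    # Auxiliary weather classification: standard cross-entropy.
    z = weather_logits - weather_logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    ce = -logp[np.arange(len(weather_labels)), weather_labels].mean()
    # Diversity regularizer: keep the batch-mean routing distribution
    # high-entropy; collapse onto a single branch drives entropy to zero.
    mean_w = weights.mean(axis=0)
    entropy = -(mean_w * np.log(mean_w + 1e-8)).sum()
    diversity_penalty = np.log(weights.shape[1]) - entropy  # 0 when uniform
    return ce, diversity_penalty
```

Under this formulation the penalty is zero when branch usage is balanced across the batch and approaches log 3 under full collapse, which is the behavior the review's "branch collapse" concern targets.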
minor comments (2)
- [Abstract] Abstract contains the typo 'The source code with be released' (should read 'will be released').
- [Abstract] Abstract shows an apparent line-break hyphen: 'dy-namically' should be rendered as 'dynamically'.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.
Point-by-point responses
Referee: [Method (condition token and router)] The central routing mechanism relies on a condition token extracted from visual and semantic prompts (abstract and method description). In the adverse-weather regimes that constitute the target domain, camera images are degraded by rain, fog, or snow; any corruption of this token therefore directly undermines the router's ability to produce meaningful, condition-dependent weights. Because the weather-supervised auxiliary losses and diversity regularization act downstream of token extraction, they cannot retroactively correct an uninformative or weather-agnostic token. The manuscript must demonstrate either that the token remains robust under realistic visual degradation or that an alternative non-visual conditioning path is available.
Authors: We appreciate the referee's point on potential degradation of visual prompts. Our condition token combines visual features with semantic prompts that encode weather conditions (e.g., textual or label-based descriptors such as 'heavy rain' or 'fog'), which are independent of camera image quality and can be sourced from external metadata or a lightweight non-visual classifier. We will add a new robustness subsection with experiments that artificially degrade the visual component of the prompts (simulating rain/fog corruption) and quantify the resulting routing stability and detection performance, confirming that the semantic path preserves meaningful condition-dependent weights. revision: yes
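The robustness check promised in this response can be sketched as a simple drift measurement: corrupt the visual portion of the condition token with noise (a crude stand-in for rain/fog degradation) and see how far the routing weights move. The token split, the noise model, and `route_fn` are assumptions for illustration, not the authors' protocol.

```python
import numpy as np

def routing_stability_under_degradation(route_fn, cond_tokens, noise_std, seed=0):
    """Mean L1 drift of routing weights when the (assumed) visual half of
    the condition token is corrupted by Gaussian noise. Small drift means
    the routing decision survives visual degradation."""
    rng = np.random.default_rng(seed)
    clean_w = route_fn(cond_tokens)
    half = cond_tokens.shape[1] // 2            # assume first half is visual
    corrupted = cond_tokens.copy()
    corrupted[:, :half] += rng.normal(0.0, noise_std, corrupted[:, :half].shape)
    noisy_w = route_fn(corrupted)
    return np.abs(clean_w - noisy_w).sum(axis=-1).mean()
```

A router that leans on the semantic (non-visual) part of the token should show near-zero drift here, which is the property the authors claim preserves condition-dependent weights under camera degradation.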
Referee: [Experiments] The abstract asserts state-of-the-art performance on K-Radar yet supplies no numerical results, baseline comparisons, per-weather ablations, or error analysis. Without these data it is impossible to judge whether the routing actually delivers the claimed gains or merely matches existing fusion pipelines. The full paper must include quantitative tables (e.g., mAP, NDS, or recall stratified by weather type) together with ablations that isolate the contribution of the router, the three branches, and the auxiliary losses.
Authors: We agree that explicit quantitative support is necessary for evaluating the claims. The full manuscript already contains Table 1 (overall mAP/NDS vs. baselines on K-Radar), Table 2 (per-weather stratified results), and Table 3 (ablations isolating the router, three branches, and auxiliary losses). We will update the abstract to report the key numerical gains (e.g., overall mAP improvement) and expand the error analysis to discuss the observed modality-preference shifts across weather regimes. revision: yes
Circularity Check
No circularity in derivation: learned router with auxiliary losses is standard supervised training
full rationale
The paper describes extracting a condition token from visual/semantic prompts, feeding it to a lightweight router that outputs sample-specific weights for soft aggregation of three parallel branches (pure LiDAR, pure 4D radar, condition-gated fusion), and training the whole system with weather-supervised auxiliary classification plus diversity regularization to avoid collapse. This is a conventional end-to-end neural architecture and loss design; the routing weights are outputs of a learned module, not algebraically defined in terms of themselves, and no performance metric is shown to reduce to a fitted parameter by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided description. The central claims rest on empirical results on the K-Radar benchmark rather than tautological reductions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A condition token extracted from visual and semantic prompts accurately represents weather conditions for routing decisions.
invented entities (1)
- condition-gated fusion branch (no independent evidence)