pith. machine review for the scientific record.

arxiv: 2604.08074 · v1 · submitted 2026-04-09 · 💻 cs.CV

Recognition: unknown

DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather


Pith reviewed 2026-05-10 17:22 UTC · model grok-4.3

classification 💻 cs.CV
keywords radar-camera fusion · adverse weather · object detection · DINOv3 · deformable cross-attention · multi-class detection · K-Radar · autonomous driving

The pith

DinoRADE fuses dense radar tensors with DINOv3 vision features via deformable cross-attention to improve multi-class object detection in adverse weather by 12.1 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DinoRADE as a radar-centered pipeline for object detection that remains effective when visibility is poor. It takes dense radar tensors as input and uses deformable cross-attention to gather relevant features from a DINOv3 vision foundation model at points obtained by transforming radar locations into the camera image plane. This design addresses the limited spatial resolution of radar alone, especially for smaller objects such as pedestrians and cyclists. The approach is evaluated on the K-Radar dataset across multiple weather conditions and delivers a 12.1 percent gain over recent radar-camera fusion baselines while reporting separate results for five object classes. A reader would care because current autonomous driving systems need reliable perception of vulnerable road users even in rain, snow, or fog.

Core claim

DinoRADE processes dense Radar tensors and aggregates vision features around transformed reference points in the camera perspective via deformable cross-attention, with vision features supplied by a DINOv3 Vision Foundation Model, yielding improved multi-class detection performance on the K-Radar dataset in all weather conditions and outperforming recent Radar-camera approaches by 12.1 percent.

What carries the argument

Deformable cross-attention that aggregates DINOv3 vision features around radar-transformed reference points projected into the camera view
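
To make the mechanism concrete, here is a minimal sketch of the two steps the abstract describes: projecting radar-derived 3D reference points into the camera plane, then aggregating vision features at learned offsets around each projected point. Everything below (function names, tensor shapes, a single attention head and feature level, the stride-8 feature map) is an illustrative assumption; the paper's actual offset networks, multi-scale sampling, and calibration handling are not specified in the text reproduced here.

```python
import torch
import torch.nn.functional as F

def project_radar_to_image(points_3d, K, R, t):
    """Project 3D radar points (N, 3) into pixel coordinates (N, 2).

    K: (3, 3) camera intrinsics; R, t: radar-to-camera extrinsics.
    Assumes every point lies in front of the camera.
    """
    cam = points_3d @ R.T + t        # radar frame -> camera frame
    uv = cam @ K.T                   # pinhole projection
    return uv[:, :2] / uv[:, 2:3]    # perspective divide

def deformable_sample(feat_map, ref_uv, offsets, weights, feat_hw):
    """Aggregate features around reference points (one head, one level).

    feat_map: (1, C, H, W); ref_uv: (N, 2) points in feature-map coordinates;
    offsets: (N, P, 2) learned sampling offsets; weights: (N, P), softmaxed.
    """
    h, w = feat_hw
    pts = ref_uv[:, None, :] + offsets                     # (N, P, 2) sampling locations
    # normalize to [-1, 1] for grid_sample (x along width, y along height)
    grid = torch.stack([pts[..., 0] / w, pts[..., 1] / h], dim=-1) * 2 - 1
    sampled = F.grid_sample(feat_map, grid[None], align_corners=False)  # (1, C, N, P)
    return (sampled[0] * weights[None]).sum(dim=-1).T      # (N, C) fused features

# Toy run: 5 radar queries, a 64-channel stride-8 feature map, 4 points each.
K = torch.tensor([[600., 0., 320.], [0., 600., 240.], [0., 0., 1.]])
R, t = torch.eye(3), torch.zeros(3)
radar_pts = torch.tensor([[2., 0., 20.], [-1., 0., 15.], [0.5, -0.2, 30.],
                          [3., 0.1, 25.], [-2., 0., 10.]])
ref = project_radar_to_image(radar_pts, K, R, t) / 8       # pixels -> stride-8 grid
fused = deformable_sample(torch.randn(1, 64, 60, 80), ref,
                          torch.randn(5, 4, 2),
                          torch.softmax(torch.randn(5, 4), dim=-1),
                          feat_hw=(60, 80))
print(fused.shape)  # torch.Size([5, 64])
```

In the full method these fused per-query features would condition the radar detection head; here they are only returned for inspection.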

If this is right

  • The pipeline enables separate performance reporting for five object classes including vulnerable road users on an adverse-weather dataset.
  • Radar-only limitations in fine spatial detail are mitigated by pulling in high-resolution vision features at radar reference locations.
  • The 12.1 percent gain over prior radar-camera methods holds across all weather conditions in the K-Radar evaluation.
  • Vision foundation model features can be incorporated into radar-centered detection without requiring full image processing at every step.
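
The last bullet is worth quantifying. The vision backbone still encodes the full image once, but the fusion step reads only N × P sampled locations per layer instead of attending densely over every feature-map cell, which is the standard efficiency argument for deformable attention. A rough count, with assumed sizes (ours, not the paper's):

```python
# Assumed sizes for illustration: 200 radar queries, 8 sampling points each,
# versus dense cross-attention over a 60 x 80 stride-8 feature map.
n_queries, n_points = 200, 8
feat_h, feat_w = 60, 80

sparse_reads = n_queries * n_points          # deformable: 1,600 feature reads
dense_reads  = n_queries * feat_h * feat_w   # dense attention: 960,000 reads
print(sparse_reads, dense_reads, dense_reads // sparse_reads)  # 1600 960000 600
```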

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar deformable-attention fusion could be tested with other vision foundation models to check whether gains are specific to DINOv3.
  • The method's reliance on accurate radar-to-camera projection suggests potential sensitivity to calibration drift in deployed vehicles (a numeric illustration follows this list).
  • If the performance lift generalizes, it could reduce the required radar resolution or sensor count in production autonomous driving stacks.
  • Extending the same reference-point mechanism to lidar-camera pairs might address low-visibility scenarios beyond radar.
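
On the calibration-drift point above, a quick back-of-the-envelope check (ours, not the paper's) shows why it matters: with a typical focal length around 600 px, even sub-degree extrinsic rotation errors displace the projected reference point by several pixels, on the scale of the local neighborhoods deformable attention samples from.

```python
import numpy as np

def project(p, K, R, t):
    """Pinhole projection of one 3D point from the radar frame into pixels."""
    cam = R @ p + t
    uv = K @ cam
    return uv[:2] / uv[2]

def yaw(deg):
    """Rotation about the vertical axis, modeling extrinsic yaw drift."""
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), 0, np.sin(a)],
                     [0, 1, 0],
                     [-np.sin(a), 0, np.cos(a)]])

K = np.array([[600., 0., 320.], [0., 600., 240.], [0., 0., 1.]])
t = np.zeros(3)
target = np.array([0., 0., 30.])     # an object 30 m ahead of the sensor

for err_deg in (0.1, 0.5, 1.0):
    shift = np.linalg.norm(project(target, K, yaw(err_deg), t)
                           - project(target, K, np.eye(3), t))
    print(f"{err_deg:.1f} deg yaw drift -> {shift:.1f} px reference-point shift")
# 0.1 deg -> ~1.0 px, 0.5 deg -> ~5.2 px, 1.0 deg -> ~10.5 px
```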

Load-bearing premise

The K-Radar dataset distribution and the chosen reference-point transformation accurately represent real-world radar-camera calibration and adverse-weather statistics, and DINOv3 features transfer without significant domain gap to radar-projected image regions.

What would settle it

Evaluating DinoRADE on an independent radar-camera dataset collected in adverse weather and observing no improvement or a drop in mean average precision for the five object classes would falsify the claimed performance advantage.

Figures

Figures reproduced from arXiv: 2604.08074 by Christof Leitgeb, Daniel Watzenig, Max Peter Ronecker, Thomas Puchleitner.

Figure 1. Overview of the DinoRADE architecture.
Figure 2. Reference points projected from 3D Radar queries to …
Figure 3. DinoRADE performance visualization in four different scenarios: (a) university campus, (b) alleyway, (c) highway, and (d) road.
Figure 4. Examples for partially occluded (1), heavily occluded …
read the original abstract

Reliable and weather-robust perception systems are essential for safe autonomous driving and typically employ multi-modal sensor configurations to achieve comprehensive environmental awareness. While recent automotive FMCW Radar-based approaches achieved remarkable performance on detection tasks in adverse weather conditions, they exhibited limitations in resolving fine-grained spatial details particularly critical for detecting smaller and vulnerable road users (VRUs). Furthermore, existing research has not adequately addressed VRU detection in adverse weather datasets such as K-Radar. We present DinoRADE, a Radar-centered detection pipeline that processes dense Radar tensors and aggregates vision features around transformed reference points in the camera perspective via deformable cross-attention. Vision features are provided by a DINOv3 Vision Foundation Model. We present a comprehensive performance evaluation on the K-Radar dataset in all weather conditions and are among the first to report detection performance individually for five object classes. Additionally, we compare our method with existing single-class detection approaches and outperform recent Radar-camera approaches by 12.1%. The code is available under https://github.com/chr-is-tof/RADE-Net.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces DinoRADE, a radar-centered detection pipeline that processes dense FMCW radar tensors and fuses them with features from a DINOv3 vision foundation model. Features are aggregated around radar-to-camera transformed reference points using deformable cross-attention. The method is evaluated on the K-Radar dataset across weather conditions, reporting multi-class results for five object categories (including VRUs) and claiming a 12.1% improvement over recent radar-camera fusion baselines.

Significance. If the reported gains are reproducible and attributable to the proposed fusion rather than dataset-specific factors, the work would usefully demonstrate how pre-trained vision foundation models can be integrated into radar-centric pipelines to improve spatial resolution for small objects in adverse weather. The emphasis on per-class metrics for five categories on K-Radar and the release of code are constructive contributions to the empirical literature on multi-modal adverse-weather perception.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (Experiments): The central claim of a 12.1% outperformance is stated without the underlying metric (mAP, AP@0.5, etc.), the numerical scores of the compared radar-camera baselines, or any ablation isolating the DINOv3 deformable-attention component from the radar-only backbone. This absence prevents verification that the gain stems from the claimed full-spectral fusion rather than implementation details or dataset tuning.
  2. [§3] §3 (Method): The architecture description contains no domain-adaptation layer, weather-conditioned normalization, or explicit handling of the domain gap between DINOv3’s clear-weather pre-training distribution and the fog/rain/snow subsets of K-Radar. The deformable cross-attention simply consumes whatever features DINOv3 produces on the projected regions; if those features degrade substantially, the reported multi-modal benefit may be overstated.
  3. [§4] §4 (Experiments): No per-weather-condition breakdowns, error analysis, or statistical significance tests are referenced for the five-class results. Without these, it is impossible to determine whether the method’s advantage holds uniformly across adverse conditions or is driven by easier subsets.
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from a concise table or sentence listing the exact prior radar-camera methods being compared and their reported scores on the same K-Radar split.
  2. [§3] Notation for the radar tensor representation and the reference-point transformation could be made more explicit (e.g., coordinate frames and calibration parameters) to aid reproducibility.
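
One standard way to write the transformation the referee asks to see spelled out, under a pinhole camera assumption (an illustrative convention, not notation taken from the paper):

```latex
% Reference-point transformation, radar frame -> image plane (illustrative)
\begin{align*}
  \mathbf{x}_{\mathrm{cam}} &= \mathbf{R}_{\mathrm{rc}}\,\mathbf{x}_{\mathrm{radar}} + \mathbf{t}_{\mathrm{rc}}
    && \text{extrinsics } (\mathbf{R}_{\mathrm{rc}}, \mathbf{t}_{\mathrm{rc}})\text{: radar} \to \text{camera frame} \\
  \tilde{\mathbf{u}} &= \mathbf{K}\,\mathbf{x}_{\mathrm{cam}},
  \quad
  \mathbf{u} = \bigl(\tilde{u}_1/\tilde{u}_3,\ \tilde{u}_2/\tilde{u}_3\bigr)^{\top}
    && \text{intrinsics } \mathbf{K} \text{ and perspective divide}
\end{align*}
```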

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating where we will revise the manuscript to improve clarity and completeness.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Experiments): The central claim of a 12.1% outperformance is stated without the underlying metric (mAP, AP@0.5, etc.), the numerical scores of the compared radar-camera baselines, or any ablation isolating the DINOv3 deformable-attention component from the radar-only backbone. This absence prevents verification that the gain stems from the claimed full-spectral fusion rather than implementation details or dataset tuning.

    Authors: We agree that the metric, baseline scores, and an isolating ablation are necessary for full verification. The 12.1% figure refers to the improvement in mAP at IoU=0.5 over the strongest radar-camera baseline on the full K-Radar test set. We will revise the abstract and add a results table in §4 that lists exact mAP scores for all compared methods. We will also insert an ablation subsection in §4 that removes the DINOv3 deformable-attention branch and reports the resulting drop relative to the full model. revision: yes (a worked example of this relative-gain arithmetic appears after the point-by-point responses)

  2. Referee: [§3] §3 (Method): The architecture description contains no domain-adaptation layer, weather-conditioned normalization, or explicit handling of the domain gap between DINOv3’s clear-weather pre-training distribution and the fog/rain/snow subsets of K-Radar. The deformable cross-attention simply consumes whatever features DINOv3 produces on the projected regions; if those features degrade substantially, the reported multi-modal benefit may be overstated.

    Authors: We acknowledge the domain-shift issue. Our current design freezes DINOv3 and applies no explicit adaptation or weather-conditioned normalization, relying on the foundation model’s reported robustness. We will expand §3 with a dedicated paragraph discussing the pre-training versus K-Radar distribution gap and its potential impact on feature quality. We will also add a short qualitative study of DINOv3 feature activation maps on adverse-weather images to the supplementary material. revision: partial

  3. Referee: [§4] §4 (Experiments): No per-weather-condition breakdowns, error analysis, or statistical significance tests are referenced for the five-class results. Without these, it is impossible to determine whether the method’s advantage holds uniformly across adverse conditions or is driven by easier subsets.

    Authors: We agree that condition-specific breakdowns strengthen the claims. K-Radar provides weather labels, so we will add a new table in §4 reporting mAP per weather subset (clear, fog, rain, snow) for the five classes. We will also include a brief error analysis highlighting common failure modes for VRUs and small objects, and report standard deviations across three random seeds to indicate variability. revision: yes
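
Two small sketches make the promised revisions concrete. First, the arithmetic behind a relative mAP gain such as the 12.1% figure discussed in response 1, with invented per-class AP@0.5 scores (the paper's actual table is not reproduced on this page; the class names are assumed typical K-Radar categories):

```python
import numpy as np

# Hypothetical per-class AP@IoU=0.5 scores -- NOT the paper's numbers.
classes     = ["sedan", "bus/truck", "pedestrian", "bicycle", "motorcycle"]
ap_baseline = np.array([55.0, 48.0, 30.0, 28.0, 26.0])  # strongest baseline
ap_ours     = np.array([60.0, 53.3, 35.5, 31.8, 29.0])  # full model

for c, a, b in zip(classes, ap_baseline, ap_ours):
    print(f"{c:>11}: {a:.1f} -> {b:.1f}")

map_base, map_ours = ap_baseline.mean(), ap_ours.mean()
rel_gain = 100.0 * (map_ours - map_base) / map_base
print(f"baseline mAP {map_base:.2f}, ours {map_ours:.2f}, "
      f"relative gain {rel_gain:.1f}%")   # -> relative gain 12.1%
```

Second, the shape of the per-weather table promised in response 3, with placeholder scores standing in for the three-seed runs:

```python
import numpy as np

rng = np.random.default_rng(0)
weather_subsets = ["clear", "fog", "rain", "snow"]   # K-Radar weather labels
n_seeds = 3

# Placeholder mAP per (subset, seed); real values would come from three runs.
scores = {w: rng.normal(loc=m, scale=0.8, size=n_seeds)
          for w, m in zip(weather_subsets, [45.0, 38.0, 40.0, 36.0])}

for w in weather_subsets:
    s = scores[w]
    print(f"{w:>5}: mAP {s.mean():.1f} +/- {s.std(ddof=1):.1f} over {n_seeds} seeds")
```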

Circularity Check

0 steps flagged

Empirical pipeline with no derivations or predictions by construction

full rationale

The paper presents DinoRADE as an architecture (dense radar tensor processing + deformable cross-attention to aggregate DINOv3 features) and reports empirical mAP gains on the K-Radar dataset. No equations, first-principles derivations, fitted parameters renamed as predictions, or uniqueness theorems appear in the provided text. Performance claims rest on direct experimental comparison rather than any reduction to self-defined inputs. Self-citations, if present, are not load-bearing for any claimed derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No load-bearing free parameters, axioms, or invented entities are identifiable from the abstract. The pipeline uses standard deep-learning components (deformable attention, foundation-model features) and an existing public dataset.

pith-pipeline@v0.9.0 · 5501 in / 1254 out tokens · 53685 ms · 2026-05-10T17:22:52.495862+00:00 · methodology


Reference graph

Works this paper leans on

54 extracted references · 8 canonical work pages · 2 internal anchors

  1. [1] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11618–11628, Los Alamitos, CA, USA, 2020.
  2. [3] Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Y. Qiao. Vision transformer adapter for dense predictions. ArXiv, abs/2205.08534, 2022.
  3. [4] Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, and Houqiang Li. Voxel R-CNN: Towards high performance voxel-based 3D object detection. Proceedings of the AAAI Conference on Artificial Intelligence, 35:1201–1209, 2021.
  4. [5] Sheng Feng, Xueying Cai, Limin Li, Weixing Wang, and Senang Ying. A review of research on vehicle detection in adverse weather environments. Journal of Traffic and Transportation Engineering (English Edition), 12(5):1452–1483, 2025.
  5. [6] F. Fent, A. Palffy, and H. Caesar. DPFT: Dual perspective fusion transformer for camera-radar-based object detection. IEEE Transactions on Intelligent Vehicles, 10(11):4929–4941, 2025.
  6. [7] Xiangyu Gao, Youchen Luo, Guanbin Xing, Sumit Roy, and Hui Liu. Raw ADC data of 77GHz MMWave radar for automotive object detection, 2022. Distributed by IEEE Dataport.
  7. [8] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3354–3361, 2012.
  8. [9] James Giroux, Martin Bouchard, and Robert Laganière. T-FFTRadNet: Object detection with Swin vision transformers from raw ADC radar signals. In 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pages 4032–4041, 2023.
  9. [10] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
  10. [11] Runwei Guan, Jianan Liu, Shaofeng Liang, Fangqiang Ding, Shanliang Yao, Xiaokai Bai, Daizong Liu, Tao Huang, Guoqiang Mao, and Hui Xiong. Wavelet-based multi-view fusion of 4D radar tensor and camera for robust 3D object detection, 2026.
  11. [12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  12. [13] Junjie Huang, Guan Huang, Zheng Zhu, Yun Ye, and Dalong Du. BEVDet: High-performance multi-camera 3D object detection in bird-eye-view, 2022. arXiv:2112.11790 [cs].
  13. [14] Xun Huang, Ziyu Xu, Hai Wu, Jinlong Wang, Qiming Xia, Yan Xia, Jonathan Li, Kyle Gao, Chenglu Wen, and Cheng Wang. L4DR: LiDAR-4DRadar fusion for weather-robust 3D object detection. Proceedings of the AAAI Conference on Artificial Intelligence, 39(4):3806–3814, 2025.
  14. [15] JunXin Jin, Wei Liu, Zuotao Ning, Qixi Zhao, Shuai Cheng, and Jun Hu. 3D object detection for autonomous driving: A survey. In 2024 36th Chinese Control and Decision Conference (CCDC), pages 3825–3832, 2024.
  15. [16] Seung-Hyun Kong, Dong-Hee Paek, and Sangyeong Lee. RTNH+: Enhanced 4D radar object detection network using two-level preprocessing and vertical encoding. IEEE Transactions on Intelligent Vehicles, 10(2):1427–1440, 2025.
  16. [17] Akhil M. Kurup and Jeremy P. Bos. DSOR: A scalable statistical filter for removing falling snow from LiDAR point clouds in severe winter weather. ArXiv, abs/2109.07078, 2021.
  17. [18] Christof Leitgeb, Thomas Puchleitner, Max Peter Ronecker, and Daniel Watzenig. RADE-Net: Robust attention network for radar-only object detection in adverse weather, 2026.
  18. [19] Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. BEVFormer: Learning bird's-eye-view representation from lidar-camera via spatiotemporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(3):2020–2036, 2025.
  19. [20] Teck Yian Lim, Spencer Markowitz, and Minh Do. RaDICaL: A synchronized FMCW radar, depth, IMU and RGB camera data dataset with low-level FMCW radar signals. IEEE Journal of Selected Topics in Signal Processing, 2021.
  20. [21] Zhiwei Lin, Zhe Liu, Zhongyu Xia, Xinhao Wang, Yongtao Wang, Shengxiang Qi, Yang Dong, Nan Dong, Le Zhang, and Ce Zhu. RCBEVDet: Radar-camera fusion in bird's eye view for 3D object detection. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14928–14937, Los Alamitos, CA, USA, 2024.
  21. [22] IEEE Computer Society.
  22. [23] Yang Liu, Feng Wang, Naiyan Wang, and Zhao-Xiang Zhang. Echoes beyond points: Unleashing the power of raw radar data in multi-modality fusion. In Advances in Neural Information Processing Systems, pages 53964–53982. Curran Associates, Inc., 2023.
  23. [24] Siqi Lu, Junlin Guo, James R. Zimmer-Dauphinee, Jordan M. Nieusma, Xiao Wang, Parker VanValkenburgh, Steven A. Wernke, and Yuankai Huo. Vision foundation models in remote sensing: A survey. IEEE Geoscience and Remote Sensing Magazine, 13(3):190–215, 2025.
  24. [25] Xinzhu Ma, Wanli Ouyang, Andrea Simonelli, and Elisa Ricci. 3D object detection from images for autonomous driving: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):3537–3556, 2024.
  25. [26] Michael Meyer and Georg Kuschk. Automotive radar dataset for deep learning based 3D object detection. In 2019 16th European Radar Conference (EuRAD), pages 129–132, 2019.
  26. [27] Alexander Musiat, Laurenz Reichardt, Michael Schulze, and Oliver Wasenmüller. RadarPillars: Efficient object detection from 4D radar point clouds. In 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), pages 1656–1663, 2024.
  27. [28] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russ Howes, Po-Yao (Bernie) Huang, Shang-Wen Li, Ishan Misra, Michael G. Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégo…
  28. [29] Arthur Ouaknine, Alasdair Newson, Julien Rebut, Florence Tupin, and Patrick Pérez. CARRADA dataset: Camera and automotive radar with range-angle-Doppler annotations. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 5068–5075, 2021.
  29. [30] Dong-Hee Paek, Seung-Hyun Kong, and Kevin Tirta Wijaya. K-Radar: 4D radar object detection for autonomous driving in various weather conditions. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022.
  30. [31] Andras Palffy, Ewoud Pool, Srimannarayana Baratam, Julian F. P. Kooij, and Dariu M. Gavrila. Multi-class road user detection with 3+1D radar in the View-of-Delft dataset. IEEE Robotics and Automation Letters, 7(2):4961–4968, 2022.
  31. [32] Julien Rebut, Arthur Ouaknine, Waqas Malik, and Patrick Pérez. Raw high-definition radar for multi-task learning. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17000–17009, Los Alamitos, CA, USA, 2022. IEEE Computer Society.
  32. [33] Julien Rebut, Arthur Ouaknine, Waqas Malik, and Patrick Pérez. Raw high-definition radar for multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17021–17030, 2022.
  33. [34] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, 2016.
  34. [35] Jonas Schramm, Niclas Vödisch, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Wolfram Burgard, and Abhinav Valada. BEVCar: Camera-radar fusion for BEV map and object segmentation. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1435–1442, 2024.
  35. [36] Linda Senigagliesi, Gianluca Ciattaglia, Deivis Disha, and Ennio Gambi. Classification of human activities based on automotive radar spectral images using machine learning techniques: A case study. In 2022 IEEE Radar Conference (RadarConf22), pages 1–6, 2022.
  36. [37] Marcel Simeonov, Andrei Kurdiumov, and Milan Dado. Real-time 3D scene understanding for road safety: Depth estimation and object detection for autonomous vehicle awareness. Vehicles, 8(2), 2026.
  37. [38] Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, …
  38. [39] Seung-Hyun Song, Dong-Hee Paek, Minh-Quan Dao, Ezio Malis, and Seung-Hyun Kong. Enhanced 3D object detection via diverse feature representations of 4D radar tensor. IEEE Sensors Journal, 2026.
  39. [40] Youyi Song, Zhen Yu, Teng Zhou, Jeremy Yuen-Chun Teoh, Baiying Lei, Kup-Sze Choi, and Jing Qin. Learning 3D features with 2D CNNs via surface projection for CT volume segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, pages 176–186, Cham, 2020.
  40. [41] Springer International Publishing.
  41. [42] Min-Hyeok Sun, Dong-Hee Paek, Seung-Hyun Song, and Seung-Hyun Kong. Efficient 4D radar data auto-labeling method using LiDAR-based object detection network. In 2024 IEEE Intelligent Vehicles Symposium (IV), pages 2616–2621, 2024.
  42. [43] Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell, and Kilian Q. Weinberger. Pseudo-LiDAR from visual depth estimation: Bridging the gap in 3D object detection for autonomous driving. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8437–8445, 2019.
  43. [44] Hai Wu, Jinhao Deng, Chenglu Wen, Xin Li, Cheng Wang, and Jonathan Li. CasA: A cascade attention network for 3-D object detection from LiDAR point clouds. IEEE Transactions on Geoscience and Remote Sensing, 60:1–11, 2022.
  44. [45] Hai Wu, Chenglu Wen, Wei Li, Xin Li, Ruigang Yang, and Cheng Wang. Transformation-equivariant 3D object detection for autonomous driving. Proceedings of the AAAI Conference on Artificial Intelligence, 37:2795–2802, 2023.
  45. [46] Yuhao Xiao, Xiaoqing Chen, Yingkai Wang, and Zhongliang Fu. Radar-camera fusion in perspective view and bird's eye view for 3D object detection. Sensors, 25(19), 2025.
  46. [47] Bo Yang, Ishan Khatri, Michael Happold, and Chulong Chen. ADCNet: Learning from raw radar data via distillation, 2023. arXiv:2303.11420 [eess].
  47. [48] Xue Yang, Junchi Yan, Qi Ming, Wentao Wang, Xiaopeng Zhang, and Qi Tian. Rethinking rotated object detection with Gaussian Wasserstein distance loss. CoRR, abs/2101.11952, 2021.
  48. [49] Shanliang Yao, Runwei Guan, Xiaoyu Huang, Zhuoxiao Li, Xiangyu Sha, Yong Yue, Eng Gee Lim, Hyungjoon Seo, Ka Lok Man, Xiaohui Zhu, and Yutao Yue. Radar-camera fusion for object detection and semantic segmentation in autonomous driving: A comprehensive review. IEEE Transactions on Intelligent Vehicles, 9(1):2094–2128, 2024.
  49. [50] Shanliang Yao, Runwei Guan, Zitian Peng, Chenhang Xu, Yilu Shi, Weiping Ding, Eng Gee Lim, Yong Yue, Hyungjoon Seo, Ka Lok Man, Jieming Ma, Xiaohui Zhu, and Yutao Yue. Exploring radar data representations in autonomous driving: A comprehensive review. IEEE Transactions on Intelligent Transportation Systems, 26(6):7401–7425, 2025.
  50. [51] Ao Zhang, Farzan Erlik Nowruzi, and Robert Laganière. RADDet: Range-azimuth-Doppler based radar object detection for dynamic road users. In 2021 18th Conference on Robots and Vision (CRV), pages 95–102, 2021.
  51. [52] Cheng Zhang, Hai Wang, Long Chen, Yicheng Li, and Yingfeng Cai. MixedFusion: An efficient multimodal data fusion framework for 3-D object detection and tracking. IEEE Transactions on Neural Networks and Learning Systems, 36(1):1842–1856, 2025.
  52. [53] Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl. Objects as points. arXiv preprint arXiv:1904.07850, 2019.
  53. [54] Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Qiuyu Mao, Houqiang Li, and Yanyong Zhang. VPFNet: Improving 3D object detection with virtual point based LiDAR and stereo data fusion. IEEE Transactions on Multimedia, 25:5291–5304, 2023.
  54. [55] Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable DETR: Deformable transformers for end-to-end object detection. ArXiv, abs/2010.04159, 2020.