pith. machine review for the scientific record.

arxiv: 2604.08074 · v1 · submitted 2026-04-09 · 💻 cs.CV

Recognition: unknown

DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather


Pith reviewed 2026-05-10 17:22 UTC · model grok-4.3

classification 💻 cs.CV
keywords radar-camera fusion · adverse weather · object detection · DINOv3 · deformable cross-attention · multi-class detection · K-Radar · autonomous driving

The pith

DinoRADE fuses dense radar tensors with DINOv3 vision features via deformable cross-attention to improve multi-class object detection in adverse weather by 12.1 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DinoRADE as a radar-centered pipeline for object detection that remains effective when visibility is poor. It takes dense radar tensors as input and uses deformable cross-attention to gather relevant features from a DINOv3 vision foundation model at points obtained by transforming radar locations into the camera image plane. This design addresses the limited spatial resolution of radar alone, especially for smaller objects such as pedestrians and cyclists. The approach is evaluated on the K-Radar dataset across multiple weather conditions and delivers a 12.1 percent gain over recent radar-camera fusion baselines while reporting separate results for five object classes. A reader would care because current autonomous driving systems need reliable perception of vulnerable road users even in rain, snow, or fog.

Core claim

DinoRADE processes dense Radar tensors and aggregates vision features around transformed reference points in the camera perspective via deformable cross-attention, with vision features supplied by a DINOv3 Vision Foundation Model, yielding improved multi-class detection performance on the K-Radar dataset in all weather conditions and outperforming recent Radar-camera approaches by 12.1 percent.

What carries the argument

Deformable cross-attention that aggregates DINOv3 vision features around radar-transformed reference points projected into the camera view
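
To make the mechanism concrete, here is a minimal sketch of the two steps the abstract describes: projecting radar-derived 3D reference points into the camera plane, then aggregating vision features at learned offsets around each projected point. Everything below (function names, tensor shapes, a single attention head and feature level, the stride-8 feature map) is an illustrative assumption; the paper's actual offset networks, multi-scale sampling, and calibration handling are not specified in the text reproduced here.

```python
import torch
import torch.nn.functional as F

def project_radar_to_image(points_3d, K, R, t):
    """Project 3D radar points (N, 3) into pixel coordinates (N, 2).

    K: (3, 3) camera intrinsics; R, t: radar-to-camera extrinsics.
    Assumes every point lies in front of the camera.
    """
    cam = points_3d @ R.T + t        # radar frame -> camera frame
    uv = cam @ K.T                   # pinhole projection
    return uv[:, :2] / uv[:, 2:3]    # perspective divide

def deformable_sample(feat_map, ref_uv, offsets, weights, feat_hw):
    """Aggregate features around reference points (one head, one level).

    feat_map: (1, C, H, W); ref_uv: (N, 2) points in feature-map coordinates;
    offsets: (N, P, 2) learned sampling offsets; weights: (N, P), softmaxed.
    """
    h, w = feat_hw
    pts = ref_uv[:, None, :] + offsets                     # (N, P, 2) sampling locations
    # normalize to [-1, 1] for grid_sample (x along width, y along height)
    grid = torch.stack([pts[..., 0] / w, pts[..., 1] / h], dim=-1) * 2 - 1
    sampled = F.grid_sample(feat_map, grid[None], align_corners=False)  # (1, C, N, P)
    return (sampled[0] * weights[None]).sum(dim=-1).T      # (N, C) fused features

# Toy run: 5 radar queries, a 64-channel stride-8 feature map, 4 points each.
K = torch.tensor([[600., 0., 320.], [0., 600., 240.], [0., 0., 1.]])
R, t = torch.eye(3), torch.zeros(3)
radar_pts = torch.tensor([[2., 0., 20.], [-1., 0., 15.], [0.5, -0.2, 30.],
                          [3., 0.1, 25.], [-2., 0., 10.]])
ref = project_radar_to_image(radar_pts, K, R, t) / 8       # pixels -> stride-8 grid
fused = deformable_sample(torch.randn(1, 64, 60, 80), ref,
                          torch.randn(5, 4, 2),
                          torch.softmax(torch.randn(5, 4), dim=-1),
                          feat_hw=(60, 80))
print(fused.shape)  # torch.Size([5, 64])
```

In the full method these fused per-query features would condition the radar detection head; here they are only returned for inspection.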

If this is right

  • The pipeline enables separate performance reporting for five object classes including vulnerable road users on an adverse-weather dataset.
  • Radar-only limitations in fine spatial detail are mitigated by pulling in high-resolution vision features at radar reference locations.
  • The 12.1 percent gain over prior radar-camera methods holds across all weather conditions in the K-Radar evaluation.
  • Vision foundation model features can be incorporated into radar-centered detection without requiring full image processing at every step.
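
The last bullet is worth quantifying. The vision backbone still encodes the full image once, but the fusion step reads only N × P sampled locations per layer instead of attending densely over every feature-map cell, which is the standard efficiency argument for deformable attention. A rough count, with assumed sizes (ours, not the paper's):

```python
# Assumed sizes for illustration: 200 radar queries, 8 sampling points each,
# versus dense cross-attention over a 60 x 80 stride-8 feature map.
n_queries, n_points = 200, 8
feat_h, feat_w = 60, 80

sparse_reads = n_queries * n_points          # deformable: 1,600 feature reads
dense_reads  = n_queries * feat_h * feat_w   # dense attention: 960,000 reads
print(sparse_reads, dense_reads, dense_reads // sparse_reads)  # 1600 960000 600
```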

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar deformable-attention fusion could be tested with other vision foundation models to check whether gains are specific to DINOv3.
  • The method's reliance on accurate radar-to-camera projection suggests potential sensitivity to calibration drift in deployed vehicles (a numeric illustration follows this list).
  • If the performance lift generalizes, it could reduce the required radar resolution or sensor count in production autonomous driving stacks.
  • Extending the same reference-point mechanism to lidar-camera pairs might address low-visibility scenarios beyond radar.
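
On the calibration-drift point above, a quick back-of-the-envelope check (ours, not the paper's) shows why it matters: with a typical focal length around 600 px, even sub-degree extrinsic rotation errors displace the projected reference point by several pixels, on the scale of the local neighborhoods deformable attention samples from.

```python
import numpy as np

def project(p, K, R, t):
    """Pinhole projection of one 3D point from the radar frame into pixels."""
    cam = R @ p + t
    uv = K @ cam
    return uv[:2] / uv[2]

def yaw(deg):
    """Rotation about the vertical axis, modeling extrinsic yaw drift."""
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), 0, np.sin(a)],
                     [0, 1, 0],
                     [-np.sin(a), 0, np.cos(a)]])

K = np.array([[600., 0., 320.], [0., 600., 240.], [0., 0., 1.]])
t = np.zeros(3)
target = np.array([0., 0., 30.])     # an object 30 m ahead of the sensor

for err_deg in (0.1, 0.5, 1.0):
    shift = np.linalg.norm(project(target, K, yaw(err_deg), t)
                           - project(target, K, np.eye(3), t))
    print(f"{err_deg:.1f} deg yaw drift -> {shift:.1f} px reference-point shift")
# 0.1 deg -> ~1.0 px, 0.5 deg -> ~5.2 px, 1.0 deg -> ~10.5 px
```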

Load-bearing premise

The K-Radar dataset distribution and the chosen reference-point transformation accurately represent real-world radar-camera calibration and adverse-weather statistics, and DINOv3 features transfer without significant domain gap to radar-projected image regions.

What would settle it

Evaluating DinoRADE on an independent radar-camera dataset collected in adverse weather and observing no improvement or a drop in mean average precision for the five object classes would falsify the claimed performance advantage.

Figures

Figures reproduced from arXiv: 2604.08074 by Christof Leitgeb, Daniel Watzenig, Max Peter Ronecker, Thomas Puchleitner.

Figure 1. Overview of the DinoRADE architecture.
Figure 2. Reference points projected from 3D Radar queries to …
Figure 3. DinoRADE performance visualization in four different scenarios: (a) university campus, (b) alleyway, (c) highway, and (d) road.
Figure 4. Examples for partially occluded (1), heavily occluded …
read the original abstract

Reliable and weather-robust perception systems are essential for safe autonomous driving and typically employ multi-modal sensor configurations to achieve comprehensive environmental awareness. While recent automotive FMCW Radar-based approaches achieved remarkable performance on detection tasks in adverse weather conditions, they exhibited limitations in resolving fine-grained spatial details particularly critical for detecting smaller and vulnerable road users (VRUs). Furthermore, existing research has not adequately addressed VRU detection in adverse weather datasets such as K-Radar. We present DinoRADE, a Radar-centered detection pipeline that processes dense Radar tensors and aggregates vision features around transformed reference points in the camera perspective via deformable cross-attention. Vision features are provided by a DINOv3 Vision Foundation Model. We present a comprehensive performance evaluation on the K-Radar dataset in all weather conditions and are among the first to report detection performance individually for five object classes. Additionally, we compare our method with existing single-class detection approaches and outperform recent Radar-camera approaches by 12.1%. The code is available under https://github.com/chr-is-tof/RADE-Net.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces DinoRADE, a radar-centered detection pipeline that processes dense FMCW radar tensors and fuses them with features from a DINOv3 vision foundation model. Features are aggregated around radar-to-camera transformed reference points using deformable cross-attention. The method is evaluated on the K-Radar dataset across weather conditions, reporting multi-class results for five object categories (including VRUs) and claiming a 12.1% improvement over recent radar-camera fusion baselines.

Significance. If the reported gains are reproducible and attributable to the proposed fusion rather than dataset-specific factors, the work would usefully demonstrate how pre-trained vision foundation models can be integrated into radar-centric pipelines to improve spatial resolution for small objects in adverse weather. The emphasis on per-class metrics for five categories on K-Radar and the release of code are constructive contributions to the empirical literature on multi-modal adverse-weather perception.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (Experiments): The central claim of a 12.1% outperformance is stated without the underlying metric (mAP, AP@0.5, etc.), the numerical scores of the compared radar-camera baselines, or any ablation isolating the DINOv3 deformable-attention component from the radar-only backbone. This absence prevents verification that the gain stems from the claimed full-spectral fusion rather than implementation details or dataset tuning.
  2. [§3] §3 (Method): The architecture description contains no domain-adaptation layer, weather-conditioned normalization, or explicit handling of the domain gap between DINOv3’s clear-weather pre-training distribution and the fog/rain/snow subsets of K-Radar. The deformable cross-attention simply consumes whatever features DINOv3 produces on the projected regions; if those features degrade substantially, the reported multi-modal benefit may be overstated.
  3. [§4] §4 (Experiments): No per-weather-condition breakdowns, error analysis, or statistical significance tests are referenced for the five-class results. Without these, it is impossible to determine whether the method’s advantage holds uniformly across adverse conditions or is driven by easier subsets.
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from a concise table or sentence listing the exact prior radar-camera methods being compared and their reported scores on the same K-Radar split.
  2. [§3] Notation for the radar tensor representation and the reference-point transformation could be made more explicit (e.g., coordinate frames and calibration parameters) to aid reproducibility.
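
One standard way to write the transformation the referee asks to see spelled out, under a pinhole camera assumption (an illustrative convention, not notation taken from the paper):

```latex
% Reference-point transformation, radar frame -> image plane (illustrative)
\begin{align*}
  \mathbf{x}_{\mathrm{cam}} &= \mathbf{R}_{\mathrm{rc}}\,\mathbf{x}_{\mathrm{radar}} + \mathbf{t}_{\mathrm{rc}}
    && \text{extrinsics } (\mathbf{R}_{\mathrm{rc}}, \mathbf{t}_{\mathrm{rc}})\text{: radar} \to \text{camera frame} \\
  \tilde{\mathbf{u}} &= \mathbf{K}\,\mathbf{x}_{\mathrm{cam}},
  \quad
  \mathbf{u} = \bigl(\tilde{u}_1/\tilde{u}_3,\ \tilde{u}_2/\tilde{u}_3\bigr)^{\top}
    && \text{intrinsics } \mathbf{K} \text{ and perspective divide}
\end{align*}
```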

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating where we will revise the manuscript to improve clarity and completeness.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Experiments): The central claim of a 12.1% outperformance is stated without the underlying metric (mAP, AP@0.5, etc.), the numerical scores of the compared radar-camera baselines, or any ablation isolating the DINOv3 deformable-attention component from the radar-only backbone. This absence prevents verification that the gain stems from the claimed full-spectral fusion rather than implementation details or dataset tuning.

    Authors: We agree that the metric, baseline scores, and an isolating ablation are necessary for full verification. The 12.1% figure refers to the improvement in mAP at IoU=0.5 over the strongest radar-camera baseline on the full K-Radar test set. We will revise the abstract and add a results table in §4 that lists exact mAP scores for all compared methods. We will also insert an ablation subsection in §4 that removes the DINOv3 deformable-attention branch and reports the resulting drop relative to the full model. revision: yes (a worked example of this relative-gain arithmetic appears after the point-by-point responses)

  2. Referee: [§3] §3 (Method): The architecture description contains no domain-adaptation layer, weather-conditioned normalization, or explicit handling of the domain gap between DINOv3’s clear-weather pre-training distribution and the fog/rain/snow subsets of K-Radar. The deformable cross-attention simply consumes whatever features DINOv3 produces on the projected regions; if those features degrade substantially, the reported multi-modal benefit may be overstated.

    Authors: We acknowledge the domain-shift issue. Our current design freezes DINOv3 and applies no explicit adaptation or weather-conditioned normalization, relying on the foundation model’s reported robustness. We will expand §3 with a dedicated paragraph discussing the pre-training versus K-Radar distribution gap and its potential impact on feature quality. We will also add a short qualitative study of DINOv3 feature activation maps on adverse-weather images to the supplementary material. revision: partial

  3. Referee: [§4] §4 (Experiments): No per-weather-condition breakdowns, error analysis, or statistical significance tests are referenced for the five-class results. Without these, it is impossible to determine whether the method’s advantage holds uniformly across adverse conditions or is driven by easier subsets.

    Authors: We agree that condition-specific breakdowns strengthen the claims. K-Radar provides weather labels, so we will add a new table in §4 reporting mAP per weather subset (clear, fog, rain, snow) for the five classes. We will also include a brief error analysis highlighting common failure modes for VRUs and small objects, and report standard deviations across three random seeds to indicate variability. revision: yes
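
Two small sketches make the promised revisions concrete. First, the arithmetic behind a relative mAP gain such as the 12.1% figure discussed in response 1, with invented per-class AP@0.5 scores (the paper's actual table is not reproduced on this page; the class names are assumed typical K-Radar categories):

```python
import numpy as np

# Hypothetical per-class AP@IoU=0.5 scores -- NOT the paper's numbers.
classes     = ["sedan", "bus/truck", "pedestrian", "bicycle", "motorcycle"]
ap_baseline = np.array([55.0, 48.0, 30.0, 28.0, 26.0])  # strongest baseline
ap_ours     = np.array([60.0, 53.3, 35.5, 31.8, 29.0])  # full model

for c, a, b in zip(classes, ap_baseline, ap_ours):
    print(f"{c:>11}: {a:.1f} -> {b:.1f}")

map_base, map_ours = ap_baseline.mean(), ap_ours.mean()
rel_gain = 100.0 * (map_ours - map_base) / map_base
print(f"baseline mAP {map_base:.2f}, ours {map_ours:.2f}, "
      f"relative gain {rel_gain:.1f}%")   # -> relative gain 12.1%
```

Second, the shape of the per-weather table promised in response 3, with placeholder scores standing in for the three-seed runs:

```python
import numpy as np

rng = np.random.default_rng(0)
weather_subsets = ["clear", "fog", "rain", "snow"]   # K-Radar weather labels
n_seeds = 3

# Placeholder mAP per (subset, seed); real values would come from three runs.
scores = {w: rng.normal(loc=m, scale=0.8, size=n_seeds)
          for w, m in zip(weather_subsets, [45.0, 38.0, 40.0, 36.0])}

for w in weather_subsets:
    s = scores[w]
    print(f"{w:>5}: mAP {s.mean():.1f} +/- {s.std(ddof=1):.1f} over {n_seeds} seeds")
```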

Circularity Check

0 steps flagged

Empirical pipeline with no derivations or predictions by construction

full rationale

The paper presents DinoRADE as an architecture (dense radar tensor processing + deformable cross-attention to aggregate DINOv3 features) and reports empirical mAP gains on the K-Radar dataset. No equations, first-principles derivations, fitted parameters renamed as predictions, or uniqueness theorems appear in the provided text. Performance claims rest on direct experimental comparison rather than any reduction to self-defined inputs. Self-citations, if present, are not load-bearing for any claimed derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No load-bearing free parameters, axioms, or invented entities are identifiable from the abstract. The pipeline uses standard deep-learning components (deformable attention, foundation-model features) and an existing public dataset.

pith-pipeline@v0.9.0 · 5501 in / 1254 out tokens · 53685 ms · 2026-05-10T17:22:52.495862+00:00 · methodology


Reference graph

Works this paper leans on

54 extracted references · 8 canonical work pages · 2 internal anchors

  1. [1] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11618–11628, Los Alamitos, CA, USA, 2020.
  2. [3] Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Y. Qiao. Vision transformer adapter for dense predictions. ArXiv, abs/2205.08534, 2022.
  3. [4] Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, and Houqiang Li. Voxel R-CNN: Towards high performance voxel-based 3D object detection. Proceedings of the AAAI Conference on Artificial Intelligence, 35:1201–1209, 2021.
  4. [5] Sheng Feng, Xueying Cai, Limin Li, Weixing Wang, and Senang Ying. A review of research on vehicle detection in adverse weather environments. Journal of Traffic and Transportation Engineering (English Edition), 12(5):1452–1483, 2025.
  5. [6] F. Fent, A. Palffy, and H. Caesar. DPFT: Dual perspective fusion transformer for camera-radar-based object detection. IEEE Transactions on Intelligent Vehicles, 10(11):4929–4941, 2025.
  6. [7] Xiangyu Gao, Youchen Luo, Guanbin Xing, Sumit Roy, and Hui Liu. Raw ADC data of 77GHz MMWave radar for automotive object detection, 2022. Distributed by IEEE Dataport.
  7. [8] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3354–3361, 2012.
  8. [9] James Giroux, Martin Bouchard, and Robert Laganière. T-FFTRadNet: Object detection with Swin vision transformers from raw ADC radar signals. In 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pages 4032–4041, 2023.
  9. [10] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
  10. [11] Runwei Guan, Jianan Liu, Shaofeng Liang, Fangqiang Ding, Shanliang Yao, Xiaokai Bai, Daizong Liu, Tao Huang, Guoqiang Mao, and Hui Xiong. Wavelet-based multi-view fusion of 4D radar tensor and camera for robust 3D object detection, 2026.
  11. [12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  12. [13] Junjie Huang, Guan Huang, Zheng Zhu, Yun Ye, and Dalong Du. BEVDet: High-performance multi-camera 3D object detection in bird-eye-view, 2022. arXiv:2112.11790 [cs].
  13. [14] Xun Huang, Ziyu Xu, Hai Wu, Jinlong Wang, Qiming Xia, Yan Xia, Jonathan Li, Kyle Gao, Chenglu Wen, and Cheng Wang. L4DR: LiDAR-4DRadar fusion for weather-robust 3D object detection. Proceedings of the AAAI Conference on Artificial Intelligence, 39(4):3806–3814, 2025.
  14. [15] JunXin Jin, Wei Liu, Zuotao Ning, Qixi Zhao, Shuai Cheng, and Jun Hu. 3D object detection for autonomous driving: A survey. In 2024 36th Chinese Control and Decision Conference (CCDC), pages 3825–3832, 2024.
  15. [16] Seung-Hyun Kong, Dong-Hee Paek, and Sangyeong Lee. RTNH+: Enhanced 4D radar object detection network using two-level preprocessing and vertical encoding. IEEE Transactions on Intelligent Vehicles, 10(2):1427–1440, 2025.
  16. [17] Akhil M. Kurup and Jeremy P. Bos. DSOR: A scalable statistical filter for removing falling snow from LiDAR point clouds in severe winter weather. ArXiv, abs/2109.07078, 2021.
  17. [18] Christof Leitgeb, Thomas Puchleitner, Max Peter Ronecker, and Daniel Watzenig. RADE-Net: Robust attention network for radar-only object detection in adverse weather, 2026.
  18. [19] Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. BEVFormer: Learning bird's-eye-view representation from lidar-camera via spatiotemporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(3):2020–2036, 2025.
  19. [20] Teck Yian Lim, Spencer Markowitz, and Minh Do. RaDICaL: A synchronized FMCW radar, depth, IMU and RGB camera data dataset with low-level FMCW radar signals. IEEE Journal of Selected Topics in Signal Processing, 2021.
  20. [21] Zhiwei Lin, Zhe Liu, Zhongyu Xia, Xinhao Wang, Yongtao Wang, Shengxiang Qi, Yang Dong, Nan Dong, Le Zhang, and Ce Zhu. RCBEVDet: Radar-camera fusion in bird's eye view for 3D object detection. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14928–14937, Los Alamitos, CA, USA, 2024.
  21. [22] IEEE Computer Society.
  22. [23] Yang Liu, Feng Wang, Naiyan Wang, and Zhao-Xiang Zhang. Echoes beyond points: Unleashing the power of raw radar data in multi-modality fusion. In Advances in Neural Information Processing Systems, pages 53964–53982. Curran Associates, Inc., 2023.
  23. [24] Siqi Lu, Junlin Guo, James R. Zimmer-Dauphinee, Jordan M. Nieusma, Xiao Wang, Parker VanValkenburgh, Steven A. Wernke, and Yuankai Huo. Vision foundation models in remote sensing: A survey. IEEE Geoscience and Remote Sensing Magazine, 13(3):190–215, 2025.
  24. [25] Xinzhu Ma, Wanli Ouyang, Andrea Simonelli, and Elisa Ricci. 3D object detection from images for autonomous driving: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):3537–3556, 2024.
  25. [26] Michael Meyer and Georg Kuschk. Automotive radar dataset for deep learning based 3D object detection. In 2019 16th European Radar Conference (EuRAD), pages 129–132, 2019.
  26. [27] Alexander Musiat, Laurenz Reichardt, Michael Schulze, and Oliver Wasenmüller. RadarPillars: Efficient object detection from 4D radar point clouds. In 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), pages 1656–1663, 2024.
  27. [28] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russ Howes, Po-Yao (Bernie) Huang, Shang-Wen Li, Ishan Misra, Michael G. Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégo…
  28. [29] Arthur Ouaknine, Alasdair Newson, Julien Rebut, Florence Tupin, and Patrick Pérez. CARRADA dataset: Camera and automotive radar with range-angle-Doppler annotations. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 5068–5075, 2021.
  29. [30] Dong-Hee Paek, Seung-Hyun Kong, and Kevin Tirta Wijaya. K-Radar: 4D radar object detection for autonomous driving in various weather conditions. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022.
  30. [31] Andras Palffy, Ewoud Pool, Srimannarayana Baratam, Julian F. P. Kooij, and Dariu M. Gavrila. Multi-class road user detection with 3+1D radar in the View-of-Delft dataset. IEEE Robotics and Automation Letters, 7(2):4961–4968, 2022.
  31. [32] Julien Rebut, Arthur Ouaknine, Waqas Malik, and Patrick Pérez. Raw high-definition radar for multi-task learning. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17000–17009, Los Alamitos, CA, USA, 2022. IEEE Computer Society.
  32. [33] Julien Rebut, Arthur Ouaknine, Waqas Malik, and Patrick Pérez. Raw high-definition radar for multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17021–17030, 2022.
  33. [34] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, 2016.
  34. [35] Jonas Schramm, Niclas Vödisch, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Wolfram Burgard, and Abhinav Valada. BEVCar: Camera-radar fusion for BEV map and object segmentation. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1435–1442, 2024.
  35. [36] Linda Senigagliesi, Gianluca Ciattaglia, Deivis Disha, and Ennio Gambi. Classification of human activities based on automotive radar spectral images using machine learning techniques: A case study. In 2022 IEEE Radar Conference (RadarConf22), pages 1–6, 2022.
  36. [37] Marcel Simeonov, Andrei Kurdiumov, and Milan Dado. Real-time 3D scene understanding for road safety: Depth estimation and object detection for autonomous vehicle awareness. Vehicles, 8(2), 2026.
  37. [38] Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, …
  38. [39] Seung-Hyun Song, Dong-Hee Paek, Minh-Quan Dao, Ezio Malis, and Seung-Hyun Kong. Enhanced 3D object detection via diverse feature representations of 4D radar tensor. IEEE Sensors Journal, 2026.
  39. [40] Youyi Song, Zhen Yu, Teng Zhou, Jeremy Yuen-Chun Teoh, Baiying Lei, Kup-Sze Choi, and Jing Qin. Learning 3D features with 2D CNNs via surface projection for CT volume segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, pages 176–186, Cham, 2020.
  40. [41] Springer International Publishing.
  41. [42] Min-Hyeok Sun, Dong-Hee Paek, Seung-Hyun Song, and Seung-Hyun Kong. Efficient 4D radar data auto-labeling method using LiDAR-based object detection network. In 2024 IEEE Intelligent Vehicles Symposium (IV), pages 2616–2621, 2024.
  42. [43] Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell, and Kilian Q. Weinberger. Pseudo-LiDAR from visual depth estimation: Bridging the gap in 3D object detection for autonomous driving. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8437–8445, 2019.
  43. [44] Hai Wu, Jinhao Deng, Chenglu Wen, Xin Li, Cheng Wang, and Jonathan Li. CasA: A cascade attention network for 3-D object detection from LiDAR point clouds. IEEE Transactions on Geoscience and Remote Sensing, 60:1–11, 2022.
  44. [45] Hai Wu, Chenglu Wen, Wei Li, Xin Li, Ruigang Yang, and Cheng Wang. Transformation-equivariant 3D object detection for autonomous driving. Proceedings of the AAAI Conference on Artificial Intelligence, 37:2795–2802, 2023.
  45. [46] Yuhao Xiao, Xiaoqing Chen, Yingkai Wang, and Zhongliang Fu. Radar-camera fusion in perspective view and bird's eye view for 3D object detection. Sensors, 25(19), 2025.
  46. [47] Bo Yang, Ishan Khatri, Michael Happold, and Chulong Chen. ADCNet: Learning from raw radar data via distillation, 2023. arXiv:2303.11420 [eess].
  47. [48] Xue Yang, Junchi Yan, Qi Ming, Wentao Wang, Xiaopeng Zhang, and Qi Tian. Rethinking rotated object detection with Gaussian Wasserstein distance loss. CoRR, abs/2101.11952, 2021.
  48. [49] Shanliang Yao, Runwei Guan, Xiaoyu Huang, Zhuoxiao Li, Xiangyu Sha, Yong Yue, Eng Gee Lim, Hyungjoon Seo, Ka Lok Man, Xiaohui Zhu, and Yutao Yue. Radar-camera fusion for object detection and semantic segmentation in autonomous driving: A comprehensive review. IEEE Transactions on Intelligent Vehicles, 9(1):2094–2128, 2024.
  49. [50] Shanliang Yao, Runwei Guan, Zitian Peng, Chenhang Xu, Yilu Shi, Weiping Ding, Eng Gee Lim, Yong Yue, Hyungjoon Seo, Ka Lok Man, Jieming Ma, Xiaohui Zhu, and Yutao Yue. Exploring radar data representations in autonomous driving: A comprehensive review. IEEE Transactions on Intelligent Transportation Systems, 26(6):7401–7425, 2025.
  50. [51] Ao Zhang, Farzan Erlik Nowruzi, and Robert Laganière. RADDet: Range-azimuth-Doppler based radar object detection for dynamic road users. In 2021 18th Conference on Robots and Vision (CRV), pages 95–102, 2021.
  51. [52] Cheng Zhang, Hai Wang, Long Chen, Yicheng Li, and Yingfeng Cai. MixedFusion: An efficient multimodal data fusion framework for 3-D object detection and tracking. IEEE Transactions on Neural Networks and Learning Systems, 36(1):1842–1856, 2025.
  52. [53] Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl. Objects as points. arXiv preprint arXiv:1904.07850, 2019.
  53. [54] Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Qiuyu Mao, Houqiang Li, and Yanyong Zhang. VPFNet: Improving 3D object detection with virtual point based LiDAR and stereo data fusion. IEEE Transactions on Multimedia, 25:5291–5304, 2023.
  54. [55] Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable DETR: Deformable transformers for end-to-end object detection. ArXiv, abs/2010.04159, 2020.