Radar-Guided Polynomial Fitting for Metric Depth Estimation

Alex Wong; Hyoungseob Park; Jeffrey Moon; Patrick Rim; Vadim Ezhov

arxiv: 2503.17182 · v4 · submitted 2025-03-21 · 💻 cs.CV

Radar-Guided Polynomial Fitting for Metric Depth Estimation

Patrick Rim , Hyoungseob Park , Vadim Ezhov , Jeffrey Moon , Alex Wong This is my paper

Pith reviewed 2026-05-22 22:32 UTC · model grok-4.3

classification 💻 cs.CV

keywords depth estimationradar guidancepolynomial fittingmonocular depthmetric depthcomputer vision

0 comments

The pith

Radar data predicts polynomial coefficients to convert scaleless monocular depth into metric depth by correcting misalignments between local regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that pretrained monocular depth models often capture reasonable local structure within objects but misalign entire regions relative to each other, so a simple scale-and-shift is insufficient when three or more regions are present. It shows that cheap radar returns can be used to predict the coefficients of a polynomial that applies non-uniform corrections across depth ranges, introducing inflection points where needed. A first-derivative regularizer is added during training to keep the adjusted depth map locally monotonic. This approach is presented as more efficient than complex new architectures while producing metric outputs.

Core claim

POLAR uses radar to predict polynomial coefficients that transform scaleless depth predictions from pretrained MDE models into metric depth maps, generalizing beyond affine transformations by allowing inflection points, while a novel training objective enforces local monotonicity via first-derivative regularization.

What carries the argument

Radar-guided polynomial fitting that predicts coefficients to adaptively adjust depth predictions non-uniformly across ranges while preserving structural consistency.

If this is right

Metric depth can be obtained from any pretrained monocular model without retraining the model itself.
The method runs with lower latency and computational cost than competing radar or fusion approaches.
Performance gains of roughly 25 percent MAE and 33 percent RMSE appear on three standard datasets.
The framework extends naturally to any sensor that supplies sparse but reliable depth cues for coefficient prediction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same polynomial-correction idea could be tested with lidar or stereo cues in place of radar to see whether the benefit is specific to radar sparsity patterns.
If the number of inflection points is learned rather than fixed, the method might adapt automatically to scene complexity.
Failure cases may appear in highly dynamic scenes where radar returns lag the image frame, suggesting a need for temporal consistency terms.

Load-bearing premise

Radar returns contain enough information to predict polynomial coefficients that realign multiple mis-scaled local depth regions without creating new inconsistencies.

What would settle it

A controlled test on scenes with four or more distinct objects where the polynomial-adjusted depth map violates local monotonicity or increases error relative to ground-truth metric depths despite the regularizer.

Figures

Figures reproduced from arXiv: 2503.17182 by Alex Wong, Hyoungseob Park, Jeffrey Moon, Patrick Rim, Vadim Ezhov.

**Figure 2.** Figure 2: Method Overview. POLAR transforms scaleless MDE predictions into metric depth using polynomial fitting guided by radar features. Learnable prototypes extract patterns in the configurations of radar point clouds and are used to aggregate spatially-informed radar features. The geometry-aware MDE features are fused with the radar features via a learnable soft-correspondence module to yield a unified scene rep… view at source ↗

**Figure 3.** Figure 3: Qualitative results on nuScenes. GET-UP and RadarCam-Depth (RC-D) fail to reconstruct entire regions, yielding objects with large depth errors. Raw MDE yields reasonable relative reconstructions but suffers from incorrect global scale and cross-object misalignments. POLAR leverages polynomial fitting to recover a global scale and correct these misalignments. terms enable correction of misalignments in rela… view at source ↗

**Figure 4.** Figure 4: POLAR leverages spatial information from radar points to [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: To quantify absolute improvement in our qualitative [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗

**Figure 6.** Figure 6: Additional qualitative comparison on VoD. POLAR more accurately predicts metric scale, and exhibits fewer global misalignments and geometric artifacts in comparison to other methods. Polynomial fitting, allowing POLAR to non-uniformly scale distinct local regions, results in notable improvements in metric depth estimation over initial MDE predictions. This is apparent in all MDE error maps: though object … view at source ↗

**Figure 7.** Figure 7: nuScenes dataset visualization. The elevation ambiguity of radar points results in erroneous projection onto the image plane that makes it challenging for depth completion methods to infer dense depth. In contrast, lidar points yield a denser, image-aligned projection, which is why accurate 3D scene reconstruction with depth completion methods is possible. dence, Tab. 12 shows that our method, evaluated ze… view at source ↗

read the original abstract

We propose POLAR, a novel radar-guided depth estimation method that introduces polynomial fitting to efficiently transform scaleless depth predictions from pretrained monocular depth estimation (MDE) models into metric depth maps. Unlike existing approaches that rely on complex architectures or expensive sensors, our method is grounded in a fundamental insight: although MDE models often infer reasonable local depth structure within each object or local region, they may misalign these regions relative to one another, making a linear scale and shift (affine) transformation insufficient given three or more of these regions. To address this limitation, we use polynomial coefficients predicted from cheap, ubiquitous radar data to adaptively adjust predictions non-uniformly across depth ranges. In this way, POLAR generalizes beyond affine transformations and is able to correct such misalignments by introducing inflection points. Importantly, our polynomial fitting framework preserves structural consistency through a novel training objective that enforces local monotonicity via first-derivative regularization. POLAR achieves state-of-the-art performance across three datasets, outperforming existing methods by an average of 24.9% in MAE and 33.2% in RMSE, while also achieving state-of-the-art efficiency in terms of latency and computational cost.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

POLAR shows radar can drive polynomial corrections for non-affine depth misalignments in monocular estimates, with reported gains that look worth checking in the full experiments.

read the letter

The paper's main contribution is replacing affine scale-shift with radar-predicted polynomials to realign multiple local depth regions that monocular models get internally consistent but globally offset. That framing is new relative to the affine baselines they cite, and the derivative regularizer is a direct way to keep the fitted curves from creating new crossings or reversals. If the radar returns really carry enough signal to set the coefficients without extra inconsistencies, this is a practical route to metric depth on cheap hardware. The efficiency numbers and the three-dataset SOTA claim would matter for robotics if they hold up under scrutiny. The work is grounded in a clear problem statement and uses an external sensor rather than trying to hallucinate metric scale from image alone. Credit for shipping a training objective that explicitly targets monotonicity. The soft spot is exactly the one the stress-test flags: with only a couple of free parameters and potentially sparse radar points, it is not obvious that the learned mapping will produce polynomials that stay consistent across region boundaries once you move off the training distribution. The abstract gives performance deltas but no visible ablation on residual monotonicity violations or on how sensitive the fit is to radar noise. If the full paper has those checks and they look clean, the central claim strengthens; if not, the gains could be partly dataset-specific. This is for people building depth pipelines that already have or can add low-cost radar. A reader who wants a concrete alternative to complex fusion networks would get something usable from it. The idea is coherent enough on its own terms to deserve referee time rather than a desk reject.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes POLAR, a radar-guided polynomial fitting method to convert scaleless depth predictions from pretrained monocular depth estimation models into metric depth maps. The key insight is that MDE models produce reasonable local structures but misalign multiple regions, requiring more than affine transformations; radar data is used to predict polynomial coefficients for non-uniform adjustments and inflection points, with a first-derivative regularization to enforce local monotonicity. The paper reports state-of-the-art results on three datasets with average gains of 24.9% in MAE and 33.2% in RMSE, plus efficiency advantages.

Significance. If the result holds, the significance is that the work offers an efficient radar-based approach to metric depth estimation that generalizes beyond affine transformations via polynomials, which could impact real-time applications in computer vision and robotics by reducing reliance on expensive sensors or complex models. The derivative regularizer for preserving local monotonicity is a constructive element if shown to be effective.

major comments (2)

[Abstract] Abstract: The abstract states performance numbers and a training objective but supplies no derivation details, dataset statistics, ablation results, or error analysis; therefore the math and data cannot be verified to support the claim as stated. This is load-bearing for the SOTA performance claim.
[Abstract] Abstract: The assumption that radar returns are sufficient to predict polynomial coefficients that correctly realign multiple mis-scaled local depth regions without introducing new inconsistencies or violating the local monotonicity enforced by the derivative regularizer is presented as the core mechanism but provides no quantitative check on residual violations or generalization beyond the training radar distribution. This is load-bearing for the central claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the two major comments point by point below, noting that the abstract serves as a high-level summary while detailed supporting material appears in the body of the paper.

read point-by-point responses

Referee: [Abstract] Abstract: The abstract states performance numbers and a training objective but supplies no derivation details, dataset statistics, ablation results, or error analysis; therefore the math and data cannot be verified to support the claim as stated. This is load-bearing for the SOTA performance claim.

Authors: Abstracts conventionally provide concise overviews of contributions and results rather than full derivations or statistics. The polynomial fitting formulation, first-derivative regularization objective, and associated math are derived in Section 3. Dataset statistics (including radar point densities and scene distributions) appear in Section 4.1. Ablation studies isolating the regularization term and polynomial degree are in Section 5.2, while error analysis and per-region breakdown are in Section 5.3. These sections contain the equations, tables, and figures needed to verify the reported 24.9% MAE and 33.2% RMSE gains. No changes to the abstract itself are required to maintain its brevity and standard format. revision: no
Referee: [Abstract] Abstract: The assumption that radar returns are sufficient to predict polynomial coefficients that correctly realign multiple mis-scaled local depth regions without introducing new inconsistencies or violating the local monotonicity enforced by the derivative regularizer is presented as the core mechanism but provides no quantitative check on residual violations or generalization beyond the training radar distribution. This is load-bearing for the central claim.

Authors: The manuscript validates the core mechanism through multiple quantitative checks. Section 5.1 reports end-to-end metric depth accuracy gains over affine baselines across three datasets, with qualitative examples showing correction of multi-region misalignments via polynomial inflection points. Table 3 ablates the first-derivative regularizer and shows measurable reduction in local depth inversions. Cross-dataset evaluation (Section 5.4) tests generalization under varying radar characteristics. While an explicit auxiliary metric for residual monotonicity violations on held-out radar distributions is not tabulated, the consistent SOTA performance and ablation results provide indirect but substantive support; we can add a targeted residual-violation table in revision if the referee deems it necessary. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation uses external radar inputs for empirical gains

full rationale

The paper's core mechanism predicts polynomial coefficients from radar measurements to non-uniformly rescale monocular depth outputs, with performance evaluated empirically on three datasets. No equations, self-citations, or fitted parameters are shown in the provided text that reduce the claimed MAE/RMSE improvements or the polynomial transformation itself to quantities defined by the inputs or by construction. The regularizer and radar-to-coefficient mapping are presented as independent components relying on external sensor data rather than tautological redefinitions or renamings of known results.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about monocular depth model behavior and radar utility that are not independently verified in the provided abstract; no free parameters or invented entities are explicitly quantified.

free parameters (2)

polynomial degree
Degree chosen to allow inflection points while remaining stable; value not stated in abstract.
derivative regularization weight
Weight balancing monotonicity against fitting accuracy; value not stated in abstract.

axioms (2)

domain assumption MDE models infer reasonable local depth structure but misalign regions relative to one another when three or more regions are present
Explicitly stated as the fundamental insight grounding the method.
domain assumption Radar data can be used to predict polynomial coefficients that adaptively correct depth misalignments
Core premise of the radar-guided fitting approach.

pith-pipeline@v0.9.0 · 5748 in / 1409 out tokens · 41736 ms · 2026-05-22T22:32:39.371594+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 2 internal anchors

[1]

Barlow, David J

Richard E. Barlow, David J. Bartholomew, J. Martin Bremner, and H. D. Brunk.Statistical Inference Under Order Restric- tions: The Theory and Application of Isotonic Regression. John Wiley & Sons, New York, 1972. 7

work page 1972
[2]

Adabins: Depth estimation using adaptive bins

Shariq Farooq Bhat, Ibraheem Alhashim, and Peter Wonka. Adabins: Depth estimation using adaptive bins. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4009–4018, 2021. 14

work page 2021
[3]

Unsuper- vised scale-consistent depth and ego-motion learning from monocular video.Advances in neural information processing systems, 32, 2019

Jiawang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, and Ian Reid. Unsuper- vised scale-consistent depth and ego-motion learning from monocular video.Advances in neural information processing systems, 32, 2019. 2

work page 2019
[4]

Bishop.Pattern Recognition and Machine Learning

Christopher M. Bishop.Pattern Recognition and Machine Learning. Springer, New York, 2006. 7

work page 2006
[5]

Richter, and Vladlen Koltun

Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Mar- cel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun. Depth pro: Sharp monocular metric depth in less than a second, 2024. 1, 2, 12, 15

work page 2024
[6]

nuScenes: A multimodal dataset for autonomous driving

Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A mul- timodal dataset for autonomous driving.arXiv preprint arXiv:1903.11027, 2019. 13

work page internal anchor Pith review Pith/arXiv arXiv 1903
[7]

Emerg- ing properties in self-supervised vision transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. InPro- ceedings of the IEEE/CVF international conference on com- puter vision, pages 9650–9660, 2021. 2

work page 2021
[8]

Uncle: Benchmarking unsupervised continual learning for depth completion.arXiv preprint arXiv:2410.18074, 2024

Xien Chen, Rit Gangopadhyay, Michael Chu, Patrick Rim, Hyoungseob Park, and Alex Wong. Uncle: Benchmarking unsupervised continual learning for depth completion.arXiv preprint arXiv:2410.18074, 2024. 2

work page arXiv 2024
[9]

Vision transformer adapter for dense predictions,

Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Yu Qiao. Vision transformer adapter for dense predictions.arXiv preprint arXiv:2205.08534, 2022. 16

work page arXiv 2022
[10]

Fixing the scale and shift in monocular depth for camera pose estimation.arXiv preprint arXiv:2501.07742, 2025

Yaqing Ding, Václav Vávra, Viktor Kocur, Jian Yang, Torsten Sattler, and Zuzana Kukelova. Fixing the scale and shift in monocular depth for camera pose estimation.arXiv preprint arXiv:2501.07742, 2025. 1, 3

work page arXiv 2025
[11]

An image is worth 16x16 words: Transform- ers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Transform- ers for image recognition at scale. InInternational Conference on Learning Representations, 2021. 2

work page 2021
[12]

Toyota drivers’ experiences with dynamic radar cruise control, pre-collision system, and lane-keeping assist.Journal of safety research, 56:67–73, 2016

Angela H Eichelberger and Anne T McCartt. Toyota drivers’ experiences with dynamic radar cruise control, pre-collision system, and lane-keeping assist.Journal of safety research, 56:67–73, 2016. 1, 2

work page 2016
[13]

All-day depth completion

Vadim Ezhov, Hyoungseob Park, Zhaoyang Zhang, Rishi Upadhyay, Howard Zhang, Chethan Chinder Chandrappa, Achuta Kadambi, Yunhao Ba, Julie Dorsey, and Alex Wong. All-day depth completion. In2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE,

work page
[14]

Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues.Array, 10:100057, 2021

Abhishek Gupta, Alagan Anpalagan, Ling Guan, and Ahmed Shaharyar Khwaja. Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues.Array, 10:100057, 2021. 1

work page 2021
[15]

4d millimeter- wave radar in autonomous driving: A survey.arXiv preprint arXiv:2306.04242, 2023

Zeyu Han, Jiahao Wang, Zikun Xu, Shuocheng Yang, Lei He, Shaobing Xu, Jianqiang Wang, and Keqiang Li. 4d millimeter- wave radar in autonomous driving: A survey.arXiv preprint arXiv:2306.04242, 2023. 1

work page arXiv 2023
[16]

Improving semi-supervised and domain-adaptive semantic segmentation with self-supervised depth estimation

Lukas Hoyer, Dengxin Dai, Qin Wang, Yuhua Chen, and Luc Van Gool. Improving semi-supervised and domain-adaptive semantic segmentation with self-supervised depth estimation. International Journal of Computer Vision, pages 1–27, 2023. 2

work page 2023
[17]

Lora: Low-rank adaptation of large language models.ICLR, 1(2):3,

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3,

work page
[18]

Penet: Towards precise and efficient image guided depth completion.2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13656–13662, 2021

Mu Hu, Shuling Wang, Bin Li, Shiyu Ning, Li Fan, and Xiao- jin Gong. Penet: Towards precise and efficient image guided depth completion.2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13656–13662, 2021. 2

work page 2021
[19]

Depthcrafter: Generating consistent long depth sequences for open-world videos

Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, and Ying Shan. Depthcrafter: Generating consistent long depth sequences for open-world videos.arXiv preprint arXiv:2409.02095, 2024. 1, 3

work page arXiv 2024
[20]

Rad- cloud: Real-time high-resolution point cloud generation us- ing low-cost radars for aerial and ground vehicles

David Hunt, Shaocheng Luo, Amir Khazraei, Xiao Zhang, Spencer Hallyburton, Tingjun Chen, and Miroslav Pajic. Rad- cloud: Real-time high-resolution point cloud generation us- ing low-cost radars for aerial and ground vehicles. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 12269–12275. IEEE, 2024. 2

work page 2024
[21]

Millimeter-wave microstrip array antenna for automotive radars.IEICE transactions on communications, 86(9):2728–2738, 2003

Hideo Iizuka, Toshiaki Watanabe, Kazuo Sato, and Kunitoshi Nishikawa. Millimeter-wave microstrip array antenna for automotive radars.IEICE transactions on communications, 86(9):2728–2738, 2003. 1

work page 2003
[22]

Sparse and dense data with cnns: Depth completion and semantic segmentation

Maximilian Jaritz, Raoul De Charette, Etienne Wirbel, Xavier Perrotton, and Fawzi Nashashibi. Sparse and dense data with cnns: Depth completion and semantic segmentation. In2018 International Conference on 3D Vision (3DV), pages 52–60. IEEE, 2018. 2

work page 2018
[23]

From big to small: Multi-scale local planar guidance for monocular depth estimation.arXiv preprint arXiv:1907.10326, 2019

Jin Han Lee, Myung-Kyu Han, Dong Wook Ko, and Il Hong Suh. From big to small: Multi-scale local planar guidance for monocular depth estimation.arXiv preprint arXiv:1907.10326, 2019. 14

work page arXiv 1907
[24]

A multi-scale guided cascade hourglass network for depth completion

Ang Li, Zejian Yuan, Yonggen Ling, Wanchao Chi, Sheng- hao Zhang, and Chong Zhang. A multi-scale guided cascade hourglass network for depth completion. In2020 IEEE Win- ter Conference on Applications of Computer Vision (WACV), pages 32–40, 2020. 2

work page 2020
[25]

Sparse beats dense: Rethinking supervision in radar-camera depth completion

Huadong Li, Minhao Jing, Wang Jin, Shichao Dong, Jiajun Liang, Haoqiang Fan, and Renhe Ji. Sparse beats dense: Rethinking supervision in radar-camera depth completion. In European Conference on Computer Vision, pages 127–143. Springer, 2024. 3, 7, 8, 14

work page 2024
[26]

Radarcam-depth: Radar-camera fusion for 9 depth estimation with learned metric scale

Han Li, Yukai Ma, Yaqing Gu, Kewei Hu, Yong Liu, and Xingxing Zuo. Radarcam-depth: Radar-camera fusion for 9 depth estimation with learned metric scale. In2024 IEEE In- ternational Conference on Robotics and Automation (ICRA), pages 10665–10672. IEEE, 2024. 2, 3, 4, 5, 7, 8, 13, 14

work page 2024
[27]

Depth estimation from monocular images and sparse radar data

Juan-Ting Lin, Dengxin Dai, and Luc Van Gool. Depth estimation from monocular images and sparse radar data. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10233–10240. IEEE, 2020. 3, 8, 14

work page 2020
[28]

Lin, Tao Cheng, Qianglong Zhong, Wending Zhou, and Huanhuan Yang

Yuan Qin. Lin, Tao Cheng, Qianglong Zhong, Wending Zhou, and Huanhuan Yang. Dynamic spatial propagation network for depth completion. InAAAI Conference on Artificial Intel- ligence, 2022. 2

work page 2022
[29]

Depth estimation from monocular images and sparse radar using deep ordinal regression network

Chen-Chou Lo and Patrick Vandewalle. Depth estimation from monocular images and sparse radar using deep ordinal regression network. In2021 IEEE International Conference on Image Processing (ICIP), pages 3343–3347. IEEE, 2021. 3, 14

work page 2021
[30]

Full-velocity radar returns by radar-camera fusion

Yunfei Long, Daniel Morris, Xiaoming Liu, Marcos Castro, Punarjay Chakravarty, and Praveen Narayanan. Full-velocity radar returns by radar-camera fusion. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 16198–16207, 2021. 3, 14

work page 2021
[31]

Sparse-to-dense: Depth prediction from sparse depth samples and a single image

Fangchang Ma and Sertac Karaman. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In 2018 IEEE International Conference on Robotics and Au- tomation (ICRA), pages 4796–4803, 2018. 2

work page 2018
[32]

Fangchang Ma, Guilherme Venturelli Cavalheiro, and Sertac Karaman. Self-supervised sparse-to-dense: Self-supervised depth completion from lidar and monocular camera.2019 In- ternational Conference on Robotics and Automation (ICRA), pages 3288–3295, 2018. 2

work page 2019
[33]

Real- time navigation in 3d environments based on depth camera data

Daniel Maier, Armin Hornung, and Maren Bennewitz. Real- time navigation in 3d environments based on depth camera data. In2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages 692–697. IEEE,

work page 2012
[34]

K-radar: 4d radar object detection for autonomous driving in various weather conditions.Advances in Neural Information Processing Systems, 35:3819–3829, 2022

Dong-Hee Paek, Seung-Hyun Kong, and Kevin Tirta Wijaya. K-radar: 4d radar object detection for autonomous driving in various weather conditions.Advances in Neural Information Processing Systems, 35:3819–3829, 2022. 2

work page 2022
[35]

Andras Palffy, Ewoud Pool, Srimannarayana Baratam, Julian F. P. Kooij, and Dariu M. Gavrila. Multi-class road user detection with 3+1d radar in the view-of-delft dataset.IEEE Robotics and Automation Letters, 7(2):4961–4968, 2022. 14

work page 2022
[36]

Non-local spatial propagation network for depth completion

Jinsun Park, Kyungdon Joo, Zhe Hu, Chi Liu, and In So Kweon. Non-local spatial propagation network for depth completion. InEuropean Conference on Computer Vision,

work page
[37]

P3depth: Monocular depth estimation with a piecewise planarity prior

Vaishakh Patil, Christos Sakaridis, Alexander Liniger, and Luc Van Gool. P3depth: Monocular depth estimation with a piecewise planarity prior. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1610–1621, 2022. 14

work page 2022
[38]

UniDepthV2: Universal monocular metric depth estimation made simpler, 2025

Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mat- tia Segu, Siyuan Li, Wim Abbeloos, and Luc Van Gool. UniDepthV2: Universal monocular metric depth estimation made simpler, 2025. 1, 2, 12, 15

work page 2025
[39]

On the uncertainty of self-supervised monocular depth estimation

Matteo Poggi, Filippo Aleotti, Fabio Tosi, and Stefano Mat- toccia. On the uncertainty of self-supervised monocular depth estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3227–3237,

work page
[40]

A survey on lidar scanning mechanisms.Electronics, 9(5):741, 2020

Thinal Raj, Fazida Hanim Hashim, Aqilah Baseri Huddin, Mohd Faisal Ibrahim, and Aini Hussain. A survey on lidar scanning mechanisms.Electronics, 9(5):741, 2020. 1, 2

work page 2020
[41]

René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer.IEEE transactions on pattern analysis and machine intelligence, 44(3):1623–1637, 2020. 2

work page 2020
[42]

Vi- sion transformers for dense prediction

René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vi- sion transformers for dense prediction. InProceedings of the IEEE/CVF international conference on computer vision, pages 12179–12188, 2021. 1, 12, 15

work page 2021
[43]

Guide- former: Transformers for image guided depth completion

Kyeongha Rho, Jinsung Ha, and Youngjung Kim. Guide- former: Transformers for image guided depth completion. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6240–6249, 2022. 2

work page 2022
[44]

Protodepth: Unsupervised continual depth completion with prototypes.arXiv preprint arXiv:2503.12745, 2025

Patrick Rim, Hyoungseob Park, S Gangopadhyay, Ziyao Zeng, Younjoon Chung, and Alex Wong. Protodepth: Unsupervised continual depth completion with prototypes.arXiv preprint arXiv:2503.12745, 2025. 2

work page arXiv 2025
[45]

Depth estimation from camera image and mmwave radar point cloud

Akash Deep Singh, Yunhao Ba, Ankur Sarker, Howard Zhang, Achuta Kadambi, Stefano Soatto, Mani Srivastava, and Alex Wong. Depth estimation from camera image and mmwave radar point cloud. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 9275–9285, 2023. 2, 3, 4, 5, 7, 8, 14

work page 2023
[46]

Monocular depth estimation using laplacian pyramid-based depth resid- uals.IEEE transactions on circuits and systems for video technology, 31(11):4381–4393, 2021

Minsoo Song, Seokjae Lim, and Wonjun Kim. Monocular depth estimation using laplacian pyramid-based depth resid- uals.IEEE transactions on circuits and systems for video technology, 31(11):4381–4393, 2021. 14

work page 2021
[47]

Unsupervised monocular estimation of depth and visual odometry uusing attention and depth-pose consistency loss.IEEE Transactions on Multime- dia, 2023

Xiaogang Song, Haoyue Hu, Li Liang, Weiwei Shi, Guo Xie, Xiaofeng Lu, and Xinhong Hei. Unsupervised monocular estimation of depth and visual odometry uusing attention and depth-pose consistency loss.IEEE Transactions on Multime- dia, 2023. 2

work page 2023
[48]

Radars for au- tonomous driving: A review of deep learning methods and challenges.IEEE Access, 11:97147–97168, 2023

Arvind Srivastav and Soumyajit Mandal. Radars for au- tonomous driving: A review of deep learning methods and challenges.IEEE Access, 11:97147–97168, 2023. 2

work page 2023
[49]

Cafnet: A confidence-driven framework for radar camera depth estimation

Huawei Sun, Hao Feng, Julius Ott, Lorenzo Servadei, and Robert Wille. Cafnet: A confidence-driven framework for radar camera depth estimation. In2024 IEEE/RSJ Interna- tional Conference on Intelligent Robots and Systems (IROS), pages 2734–2740. IEEE, 2024. 3, 14

work page 2024
[50]

Get-up: Geometric-aware depth estimation with radar points upsampling

Huawei Sun, Zixu Wang, Hao Feng, Julius Ott, Lorenzo Servadei, and Robert Wille. Get-up: Geometric-aware depth estimation with radar points upsampling. InProceedings of the Winter Conference on Applications of Computer Vision (WACV), pages 1850–1860, 2025. 3, 7, 8, 14

work page 2025
[51]

Bilateral propagation network for depth completion

Jie Tang, Fei-Peng Tian, Boshi An, Jian Li, and Ping Tan. Bilateral propagation network for depth completion. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9763–9772, 2024. 14 10

work page 2024
[52]

Marigold-dc: Zero-shot monocular depth completion with guided diffusion.arXiv preprint arXiv:2412.13389, 2024

Massimiliano Viola, Kevin Qu, Nando Metzger, Bingxin Ke, Alexander Becker, Konrad Schindler, and Anton Obukhov. Marigold-dc: Zero-shot monocular depth completion with guided diffusion.arXiv preprint arXiv:2412.13389, 2024. 1, 2, 3

work page arXiv 2024
[53]

Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation

Tsun-Hsuan Wang, Fu-En Wang, Juan-Ting Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, and Min Sun. Plug-and-play: Improve depth estimation via sparse data propagation.arXiv preprint arXiv:1812.08350, 2018. 14

work page internal anchor Pith review Pith/arXiv arXiv 2018
[54]

Tacodepth: Towards efficient radar-camera depth estimation with one-stage fusion

Yiran Wang, Jiaqi Li, Chaoyi Hong, Ruibo Li, Liusheng Sun, Xiao Song, Zhe Wang, Zhiguo Cao, and Guosheng Lin. Tacodepth: Towards efficient radar-camera depth estimation with one-stage fusion. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 10523–10533,

work page
[55]

Unsupervised depth completion from visual inertial odome- try.IEEE Robotics and Automation Letters, 5(2):1899–1906,

Alex Wong, Xiaohan Fei, Stephanie Tsuei, and Stefano Soatto. Unsupervised depth completion from visual inertial odome- try.IEEE Robotics and Automation Letters, 5(2):1899–1906,

work page 1906
[56]

Quadric representations for lidar odometry, mapping and localization.IEEE Robotics and Automation Letters, 8 (8):5023–5030, 2023

Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nan- ning Zheng, Kurt Keutzer, Masayoshi Tomizuka, and Wei Zhan. Quadric representations for lidar odometry, mapping and localization.IEEE Robotics and Automation Letters, 8 (8):5023–5030, 2023. 14

work page 2023
[57]

Deformable spatial propagation networks for depth completion.2020 IEEE In- ternational Conference on Image Processing (ICIP), pages 913–917, 2020

Zheyuan Xu, Hongche Yin, and Jian Yao. Deformable spatial propagation networks for depth completion.2020 IEEE In- ternational Conference on Image Processing (ICIP), pages 913–917, 2020. 2

work page 2020
[58]

Rignet: Repetitive image guided network for depth completion

Zhiqiang Yan, Kun Wang, Xiang Li, Zhenyu Zhang, Baobei Xu, Jun Li, and Jian Yang. Rignet: Repetitive image guided network for depth completion. InEuropean Conference on Computer Vision, 2021. 2

work page 2021
[59]

Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024

Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiao- gang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024. 1, 2, 12, 14, 15

work page 2024
[60]

Metric3d: Towards zero-shot metric 3d prediction from a single image

Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaix- uan Wang, Xiaozhi Chen, and Chunhua Shen. Metric3d: Towards zero-shot metric 3d prediction from a single image. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 9043–9053, 2023. 1, 3

work page 2023
[61]

Relative pose estimation through affine corrections of monocular depth priors.arXiv preprint arXiv:2501.05446, 2025

Yifan Yu, Shaohui Liu, Rémi Pautrat, Marc Pollefeys, and Viktor Larsson. Relative pose estimation through affine corrections of monocular depth priors.arXiv preprint arXiv:2501.05446, 2025

work page arXiv 2025
[62]

Rsa: Resolving scale ambiguities in monocu- lar depth estimators through language descriptions.Advances in neural information processing systems, 37:112684–112705,

Ziyao Zeng, Yangchao Wu, Hyoungseob Park, Daniel Wang, Fengyu Yang, Stefano Soatto, Dong Lao, Byung-Woo Hong, and Alex Wong. Rsa: Resolving scale ambiguities in monocu- lar depth estimators through language descriptions.Advances in neural information processing systems, 37:112684–112705,

work page
[63]

Completionformer: Depth completion with convolutions and vision transformers

Youmin Zhang, Xianda Guo, Matteo Poggi, Zheng Zhu, Guan Huang, and Stefano Mattoccia. Completionformer: Depth completion with convolutions and vision transformers. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18527–18536, 2023. 2

work page 2023
[64]

To- wards better generalization: Joint depth-pose learning without posenet

Wang Zhao, Shaohui Liu, Yezhi Shu, and Yong-Jin Liu. To- wards better generalization: Joint depth-pose learning without posenet. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9151–9161,

work page
[65]

2 11 Radar-Guided Polynomial Fitting for Metric Depth Estimation Supplementary Material nuScenes Method MAE RMSE DPT [42] 5188.2 6884.5 UniDepth [38] 2129.8 4887.7 Depth Anything [59] 2404.9 4851.1 Depth Pro [5] 3835.0 6600.3 RadarCam-Depth w/ DPT 1689.7 3948.0 RadarCam-Depth w/ UniDepth 1872.0 4321.2 RadarCam-Depth w/ Depth Anything 1953.6 5107.8 RadarCa...

work page arXiv 1953

[1] [1]

Barlow, David J

Richard E. Barlow, David J. Bartholomew, J. Martin Bremner, and H. D. Brunk.Statistical Inference Under Order Restric- tions: The Theory and Application of Isotonic Regression. John Wiley & Sons, New York, 1972. 7

work page 1972

[2] [2]

Adabins: Depth estimation using adaptive bins

Shariq Farooq Bhat, Ibraheem Alhashim, and Peter Wonka. Adabins: Depth estimation using adaptive bins. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4009–4018, 2021. 14

work page 2021

[3] [3]

Unsuper- vised scale-consistent depth and ego-motion learning from monocular video.Advances in neural information processing systems, 32, 2019

Jiawang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, and Ian Reid. Unsuper- vised scale-consistent depth and ego-motion learning from monocular video.Advances in neural information processing systems, 32, 2019. 2

work page 2019

[4] [4]

Bishop.Pattern Recognition and Machine Learning

Christopher M. Bishop.Pattern Recognition and Machine Learning. Springer, New York, 2006. 7

work page 2006

[5] [5]

Richter, and Vladlen Koltun

Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Mar- cel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun. Depth pro: Sharp monocular metric depth in less than a second, 2024. 1, 2, 12, 15

work page 2024

[6] [6]

nuScenes: A multimodal dataset for autonomous driving

Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A mul- timodal dataset for autonomous driving.arXiv preprint arXiv:1903.11027, 2019. 13

work page internal anchor Pith review Pith/arXiv arXiv 1903

[7] [7]

Emerg- ing properties in self-supervised vision transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. InPro- ceedings of the IEEE/CVF international conference on com- puter vision, pages 9650–9660, 2021. 2

work page 2021

[8] [8]

Uncle: Benchmarking unsupervised continual learning for depth completion.arXiv preprint arXiv:2410.18074, 2024

Xien Chen, Rit Gangopadhyay, Michael Chu, Patrick Rim, Hyoungseob Park, and Alex Wong. Uncle: Benchmarking unsupervised continual learning for depth completion.arXiv preprint arXiv:2410.18074, 2024. 2

work page arXiv 2024

[9] [9]

Vision transformer adapter for dense predictions,

Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Yu Qiao. Vision transformer adapter for dense predictions.arXiv preprint arXiv:2205.08534, 2022. 16

work page arXiv 2022

[10] [10]

Fixing the scale and shift in monocular depth for camera pose estimation.arXiv preprint arXiv:2501.07742, 2025

Yaqing Ding, Václav Vávra, Viktor Kocur, Jian Yang, Torsten Sattler, and Zuzana Kukelova. Fixing the scale and shift in monocular depth for camera pose estimation.arXiv preprint arXiv:2501.07742, 2025. 1, 3

work page arXiv 2025

[11] [11]

An image is worth 16x16 words: Transform- ers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Transform- ers for image recognition at scale. InInternational Conference on Learning Representations, 2021. 2

work page 2021

[12] [12]

Toyota drivers’ experiences with dynamic radar cruise control, pre-collision system, and lane-keeping assist.Journal of safety research, 56:67–73, 2016

Angela H Eichelberger and Anne T McCartt. Toyota drivers’ experiences with dynamic radar cruise control, pre-collision system, and lane-keeping assist.Journal of safety research, 56:67–73, 2016. 1, 2

work page 2016

[13] [13]

All-day depth completion

Vadim Ezhov, Hyoungseob Park, Zhaoyang Zhang, Rishi Upadhyay, Howard Zhang, Chethan Chinder Chandrappa, Achuta Kadambi, Yunhao Ba, Julie Dorsey, and Alex Wong. All-day depth completion. In2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE,

work page

[14] [14]

Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues.Array, 10:100057, 2021

Abhishek Gupta, Alagan Anpalagan, Ling Guan, and Ahmed Shaharyar Khwaja. Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues.Array, 10:100057, 2021. 1

work page 2021

[15] [15]

4d millimeter- wave radar in autonomous driving: A survey.arXiv preprint arXiv:2306.04242, 2023

Zeyu Han, Jiahao Wang, Zikun Xu, Shuocheng Yang, Lei He, Shaobing Xu, Jianqiang Wang, and Keqiang Li. 4d millimeter- wave radar in autonomous driving: A survey.arXiv preprint arXiv:2306.04242, 2023. 1

work page arXiv 2023

[16] [16]

Improving semi-supervised and domain-adaptive semantic segmentation with self-supervised depth estimation

Lukas Hoyer, Dengxin Dai, Qin Wang, Yuhua Chen, and Luc Van Gool. Improving semi-supervised and domain-adaptive semantic segmentation with self-supervised depth estimation. International Journal of Computer Vision, pages 1–27, 2023. 2

work page 2023

[17] [17]

Lora: Low-rank adaptation of large language models.ICLR, 1(2):3,

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3,

work page

[18] [18]

Penet: Towards precise and efficient image guided depth completion.2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13656–13662, 2021

Mu Hu, Shuling Wang, Bin Li, Shiyu Ning, Li Fan, and Xiao- jin Gong. Penet: Towards precise and efficient image guided depth completion.2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13656–13662, 2021. 2

work page 2021

[19] [19]

Depthcrafter: Generating consistent long depth sequences for open-world videos

Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, and Ying Shan. Depthcrafter: Generating consistent long depth sequences for open-world videos.arXiv preprint arXiv:2409.02095, 2024. 1, 3

work page arXiv 2024

[20] [20]

Rad- cloud: Real-time high-resolution point cloud generation us- ing low-cost radars for aerial and ground vehicles

David Hunt, Shaocheng Luo, Amir Khazraei, Xiao Zhang, Spencer Hallyburton, Tingjun Chen, and Miroslav Pajic. Rad- cloud: Real-time high-resolution point cloud generation us- ing low-cost radars for aerial and ground vehicles. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 12269–12275. IEEE, 2024. 2

work page 2024

[21] [21]

Millimeter-wave microstrip array antenna for automotive radars.IEICE transactions on communications, 86(9):2728–2738, 2003

Hideo Iizuka, Toshiaki Watanabe, Kazuo Sato, and Kunitoshi Nishikawa. Millimeter-wave microstrip array antenna for automotive radars.IEICE transactions on communications, 86(9):2728–2738, 2003. 1

work page 2003

[22] [22]

Sparse and dense data with cnns: Depth completion and semantic segmentation

Maximilian Jaritz, Raoul De Charette, Etienne Wirbel, Xavier Perrotton, and Fawzi Nashashibi. Sparse and dense data with cnns: Depth completion and semantic segmentation. In2018 International Conference on 3D Vision (3DV), pages 52–60. IEEE, 2018. 2

work page 2018

[23] [23]

From big to small: Multi-scale local planar guidance for monocular depth estimation.arXiv preprint arXiv:1907.10326, 2019

Jin Han Lee, Myung-Kyu Han, Dong Wook Ko, and Il Hong Suh. From big to small: Multi-scale local planar guidance for monocular depth estimation.arXiv preprint arXiv:1907.10326, 2019. 14

work page arXiv 1907

[24] [24]

A multi-scale guided cascade hourglass network for depth completion

Ang Li, Zejian Yuan, Yonggen Ling, Wanchao Chi, Sheng- hao Zhang, and Chong Zhang. A multi-scale guided cascade hourglass network for depth completion. In2020 IEEE Win- ter Conference on Applications of Computer Vision (WACV), pages 32–40, 2020. 2

work page 2020

[25] [25]

Sparse beats dense: Rethinking supervision in radar-camera depth completion

Huadong Li, Minhao Jing, Wang Jin, Shichao Dong, Jiajun Liang, Haoqiang Fan, and Renhe Ji. Sparse beats dense: Rethinking supervision in radar-camera depth completion. In European Conference on Computer Vision, pages 127–143. Springer, 2024. 3, 7, 8, 14

work page 2024

[26] [26]

Radarcam-depth: Radar-camera fusion for 9 depth estimation with learned metric scale

Han Li, Yukai Ma, Yaqing Gu, Kewei Hu, Yong Liu, and Xingxing Zuo. Radarcam-depth: Radar-camera fusion for 9 depth estimation with learned metric scale. In2024 IEEE In- ternational Conference on Robotics and Automation (ICRA), pages 10665–10672. IEEE, 2024. 2, 3, 4, 5, 7, 8, 13, 14

work page 2024

[27] [27]

Depth estimation from monocular images and sparse radar data

Juan-Ting Lin, Dengxin Dai, and Luc Van Gool. Depth estimation from monocular images and sparse radar data. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10233–10240. IEEE, 2020. 3, 8, 14

work page 2020

[28] [28]

Lin, Tao Cheng, Qianglong Zhong, Wending Zhou, and Huanhuan Yang

Yuan Qin. Lin, Tao Cheng, Qianglong Zhong, Wending Zhou, and Huanhuan Yang. Dynamic spatial propagation network for depth completion. InAAAI Conference on Artificial Intel- ligence, 2022. 2

work page 2022

[29] [29]

Depth estimation from monocular images and sparse radar using deep ordinal regression network

Chen-Chou Lo and Patrick Vandewalle. Depth estimation from monocular images and sparse radar using deep ordinal regression network. In2021 IEEE International Conference on Image Processing (ICIP), pages 3343–3347. IEEE, 2021. 3, 14

work page 2021

[30] [30]

Full-velocity radar returns by radar-camera fusion

Yunfei Long, Daniel Morris, Xiaoming Liu, Marcos Castro, Punarjay Chakravarty, and Praveen Narayanan. Full-velocity radar returns by radar-camera fusion. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 16198–16207, 2021. 3, 14

work page 2021

[31] [31]

Sparse-to-dense: Depth prediction from sparse depth samples and a single image

Fangchang Ma and Sertac Karaman. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In 2018 IEEE International Conference on Robotics and Au- tomation (ICRA), pages 4796–4803, 2018. 2

work page 2018

[32] [32]

Fangchang Ma, Guilherme Venturelli Cavalheiro, and Sertac Karaman. Self-supervised sparse-to-dense: Self-supervised depth completion from lidar and monocular camera.2019 In- ternational Conference on Robotics and Automation (ICRA), pages 3288–3295, 2018. 2

work page 2019

[33] [33]

Real- time navigation in 3d environments based on depth camera data

Daniel Maier, Armin Hornung, and Maren Bennewitz. Real- time navigation in 3d environments based on depth camera data. In2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages 692–697. IEEE,

work page 2012

[34] [34]

K-radar: 4d radar object detection for autonomous driving in various weather conditions.Advances in Neural Information Processing Systems, 35:3819–3829, 2022

Dong-Hee Paek, Seung-Hyun Kong, and Kevin Tirta Wijaya. K-radar: 4d radar object detection for autonomous driving in various weather conditions.Advances in Neural Information Processing Systems, 35:3819–3829, 2022. 2

work page 2022

[35] [35]

Andras Palffy, Ewoud Pool, Srimannarayana Baratam, Julian F. P. Kooij, and Dariu M. Gavrila. Multi-class road user detection with 3+1d radar in the view-of-delft dataset.IEEE Robotics and Automation Letters, 7(2):4961–4968, 2022. 14

work page 2022

[36] [36]

Non-local spatial propagation network for depth completion

Jinsun Park, Kyungdon Joo, Zhe Hu, Chi Liu, and In So Kweon. Non-local spatial propagation network for depth completion. InEuropean Conference on Computer Vision,

work page

[37] [37]

P3depth: Monocular depth estimation with a piecewise planarity prior

Vaishakh Patil, Christos Sakaridis, Alexander Liniger, and Luc Van Gool. P3depth: Monocular depth estimation with a piecewise planarity prior. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1610–1621, 2022. 14

work page 2022

[38] [38]

UniDepthV2: Universal monocular metric depth estimation made simpler, 2025

Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mat- tia Segu, Siyuan Li, Wim Abbeloos, and Luc Van Gool. UniDepthV2: Universal monocular metric depth estimation made simpler, 2025. 1, 2, 12, 15

work page 2025

[39] [39]

On the uncertainty of self-supervised monocular depth estimation

Matteo Poggi, Filippo Aleotti, Fabio Tosi, and Stefano Mat- toccia. On the uncertainty of self-supervised monocular depth estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3227–3237,

work page

[40] [40]

A survey on lidar scanning mechanisms.Electronics, 9(5):741, 2020

Thinal Raj, Fazida Hanim Hashim, Aqilah Baseri Huddin, Mohd Faisal Ibrahim, and Aini Hussain. A survey on lidar scanning mechanisms.Electronics, 9(5):741, 2020. 1, 2

work page 2020

[41] [41]

René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer.IEEE transactions on pattern analysis and machine intelligence, 44(3):1623–1637, 2020. 2

work page 2020

[42] [42]

Vi- sion transformers for dense prediction

René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vi- sion transformers for dense prediction. InProceedings of the IEEE/CVF international conference on computer vision, pages 12179–12188, 2021. 1, 12, 15

work page 2021

[43] [43]

Guide- former: Transformers for image guided depth completion

Kyeongha Rho, Jinsung Ha, and Youngjung Kim. Guide- former: Transformers for image guided depth completion. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6240–6249, 2022. 2

work page 2022

[44] [44]

Protodepth: Unsupervised continual depth completion with prototypes.arXiv preprint arXiv:2503.12745, 2025

Patrick Rim, Hyoungseob Park, S Gangopadhyay, Ziyao Zeng, Younjoon Chung, and Alex Wong. Protodepth: Unsupervised continual depth completion with prototypes.arXiv preprint arXiv:2503.12745, 2025. 2

work page arXiv 2025

[45] [45]

Depth estimation from camera image and mmwave radar point cloud

Akash Deep Singh, Yunhao Ba, Ankur Sarker, Howard Zhang, Achuta Kadambi, Stefano Soatto, Mani Srivastava, and Alex Wong. Depth estimation from camera image and mmwave radar point cloud. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 9275–9285, 2023. 2, 3, 4, 5, 7, 8, 14

work page 2023

[46] [46]

Monocular depth estimation using laplacian pyramid-based depth resid- uals.IEEE transactions on circuits and systems for video technology, 31(11):4381–4393, 2021

Minsoo Song, Seokjae Lim, and Wonjun Kim. Monocular depth estimation using laplacian pyramid-based depth resid- uals.IEEE transactions on circuits and systems for video technology, 31(11):4381–4393, 2021. 14

work page 2021

[47] [47]

Unsupervised monocular estimation of depth and visual odometry uusing attention and depth-pose consistency loss.IEEE Transactions on Multime- dia, 2023

Xiaogang Song, Haoyue Hu, Li Liang, Weiwei Shi, Guo Xie, Xiaofeng Lu, and Xinhong Hei. Unsupervised monocular estimation of depth and visual odometry uusing attention and depth-pose consistency loss.IEEE Transactions on Multime- dia, 2023. 2

work page 2023

[48] [48]

Radars for au- tonomous driving: A review of deep learning methods and challenges.IEEE Access, 11:97147–97168, 2023

Arvind Srivastav and Soumyajit Mandal. Radars for au- tonomous driving: A review of deep learning methods and challenges.IEEE Access, 11:97147–97168, 2023. 2

work page 2023

[49] [49]

Cafnet: A confidence-driven framework for radar camera depth estimation

Huawei Sun, Hao Feng, Julius Ott, Lorenzo Servadei, and Robert Wille. Cafnet: A confidence-driven framework for radar camera depth estimation. In2024 IEEE/RSJ Interna- tional Conference on Intelligent Robots and Systems (IROS), pages 2734–2740. IEEE, 2024. 3, 14

work page 2024

[50] [50]

Get-up: Geometric-aware depth estimation with radar points upsampling

Huawei Sun, Zixu Wang, Hao Feng, Julius Ott, Lorenzo Servadei, and Robert Wille. Get-up: Geometric-aware depth estimation with radar points upsampling. InProceedings of the Winter Conference on Applications of Computer Vision (WACV), pages 1850–1860, 2025. 3, 7, 8, 14

work page 2025

[51] [51]

Bilateral propagation network for depth completion

Jie Tang, Fei-Peng Tian, Boshi An, Jian Li, and Ping Tan. Bilateral propagation network for depth completion. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9763–9772, 2024. 14 10

work page 2024

[52] [52]

Marigold-dc: Zero-shot monocular depth completion with guided diffusion.arXiv preprint arXiv:2412.13389, 2024

Massimiliano Viola, Kevin Qu, Nando Metzger, Bingxin Ke, Alexander Becker, Konrad Schindler, and Anton Obukhov. Marigold-dc: Zero-shot monocular depth completion with guided diffusion.arXiv preprint arXiv:2412.13389, 2024. 1, 2, 3

work page arXiv 2024

[53] [53]

Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation

Tsun-Hsuan Wang, Fu-En Wang, Juan-Ting Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, and Min Sun. Plug-and-play: Improve depth estimation via sparse data propagation.arXiv preprint arXiv:1812.08350, 2018. 14

work page internal anchor Pith review Pith/arXiv arXiv 2018

[54] [54]

Tacodepth: Towards efficient radar-camera depth estimation with one-stage fusion

Yiran Wang, Jiaqi Li, Chaoyi Hong, Ruibo Li, Liusheng Sun, Xiao Song, Zhe Wang, Zhiguo Cao, and Guosheng Lin. Tacodepth: Towards efficient radar-camera depth estimation with one-stage fusion. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 10523–10533,

work page

[55] [55]

Unsupervised depth completion from visual inertial odome- try.IEEE Robotics and Automation Letters, 5(2):1899–1906,

Alex Wong, Xiaohan Fei, Stephanie Tsuei, and Stefano Soatto. Unsupervised depth completion from visual inertial odome- try.IEEE Robotics and Automation Letters, 5(2):1899–1906,

work page 1906

[56] [56]

Quadric representations for lidar odometry, mapping and localization.IEEE Robotics and Automation Letters, 8 (8):5023–5030, 2023

Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nan- ning Zheng, Kurt Keutzer, Masayoshi Tomizuka, and Wei Zhan. Quadric representations for lidar odometry, mapping and localization.IEEE Robotics and Automation Letters, 8 (8):5023–5030, 2023. 14

work page 2023

[57] [57]

Deformable spatial propagation networks for depth completion.2020 IEEE In- ternational Conference on Image Processing (ICIP), pages 913–917, 2020

Zheyuan Xu, Hongche Yin, and Jian Yao. Deformable spatial propagation networks for depth completion.2020 IEEE In- ternational Conference on Image Processing (ICIP), pages 913–917, 2020. 2

work page 2020

[58] [58]

Rignet: Repetitive image guided network for depth completion

Zhiqiang Yan, Kun Wang, Xiang Li, Zhenyu Zhang, Baobei Xu, Jun Li, and Jian Yang. Rignet: Repetitive image guided network for depth completion. InEuropean Conference on Computer Vision, 2021. 2

work page 2021

[59] [59]

Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024

Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiao- gang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024. 1, 2, 12, 14, 15

work page 2024

[60] [60]

Metric3d: Towards zero-shot metric 3d prediction from a single image

Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaix- uan Wang, Xiaozhi Chen, and Chunhua Shen. Metric3d: Towards zero-shot metric 3d prediction from a single image. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 9043–9053, 2023. 1, 3

work page 2023

[61] [61]

Relative pose estimation through affine corrections of monocular depth priors.arXiv preprint arXiv:2501.05446, 2025

Yifan Yu, Shaohui Liu, Rémi Pautrat, Marc Pollefeys, and Viktor Larsson. Relative pose estimation through affine corrections of monocular depth priors.arXiv preprint arXiv:2501.05446, 2025

work page arXiv 2025

[62] [62]

Rsa: Resolving scale ambiguities in monocu- lar depth estimators through language descriptions.Advances in neural information processing systems, 37:112684–112705,

Ziyao Zeng, Yangchao Wu, Hyoungseob Park, Daniel Wang, Fengyu Yang, Stefano Soatto, Dong Lao, Byung-Woo Hong, and Alex Wong. Rsa: Resolving scale ambiguities in monocu- lar depth estimators through language descriptions.Advances in neural information processing systems, 37:112684–112705,

work page

[63] [63]

Completionformer: Depth completion with convolutions and vision transformers

Youmin Zhang, Xianda Guo, Matteo Poggi, Zheng Zhu, Guan Huang, and Stefano Mattoccia. Completionformer: Depth completion with convolutions and vision transformers. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18527–18536, 2023. 2

work page 2023

[64] [64]

To- wards better generalization: Joint depth-pose learning without posenet

Wang Zhao, Shaohui Liu, Yezhi Shu, and Yong-Jin Liu. To- wards better generalization: Joint depth-pose learning without posenet. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9151–9161,

work page

[65] [65]

2 11 Radar-Guided Polynomial Fitting for Metric Depth Estimation Supplementary Material nuScenes Method MAE RMSE DPT [42] 5188.2 6884.5 UniDepth [38] 2129.8 4887.7 Depth Anything [59] 2404.9 4851.1 Depth Pro [5] 3835.0 6600.3 RadarCam-Depth w/ DPT 1689.7 3948.0 RadarCam-Depth w/ UniDepth 1872.0 4321.2 RadarCam-Depth w/ Depth Anything 1953.6 5107.8 RadarCa...

work page arXiv 1953