pith. sign in

arxiv: 2503.17182 · v4 · submitted 2025-03-21 · 💻 cs.CV

Radar-Guided Polynomial Fitting for Metric Depth Estimation

Pith reviewed 2026-05-22 22:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords depth estimationradar guidancepolynomial fittingmonocular depthmetric depthcomputer vision
0
0 comments X

The pith

Radar data predicts polynomial coefficients to convert scaleless monocular depth into metric depth by correcting misalignments between local regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that pretrained monocular depth models often capture reasonable local structure within objects but misalign entire regions relative to each other, so a simple scale-and-shift is insufficient when three or more regions are present. It shows that cheap radar returns can be used to predict the coefficients of a polynomial that applies non-uniform corrections across depth ranges, introducing inflection points where needed. A first-derivative regularizer is added during training to keep the adjusted depth map locally monotonic. This approach is presented as more efficient than complex new architectures while producing metric outputs.

Core claim

POLAR uses radar to predict polynomial coefficients that transform scaleless depth predictions from pretrained MDE models into metric depth maps, generalizing beyond affine transformations by allowing inflection points, while a novel training objective enforces local monotonicity via first-derivative regularization.

What carries the argument

Radar-guided polynomial fitting that predicts coefficients to adaptively adjust depth predictions non-uniformly across ranges while preserving structural consistency.

If this is right

  • Metric depth can be obtained from any pretrained monocular model without retraining the model itself.
  • The method runs with lower latency and computational cost than competing radar or fusion approaches.
  • Performance gains of roughly 25 percent MAE and 33 percent RMSE appear on three standard datasets.
  • The framework extends naturally to any sensor that supplies sparse but reliable depth cues for coefficient prediction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same polynomial-correction idea could be tested with lidar or stereo cues in place of radar to see whether the benefit is specific to radar sparsity patterns.
  • If the number of inflection points is learned rather than fixed, the method might adapt automatically to scene complexity.
  • Failure cases may appear in highly dynamic scenes where radar returns lag the image frame, suggesting a need for temporal consistency terms.

Load-bearing premise

Radar returns contain enough information to predict polynomial coefficients that realign multiple mis-scaled local depth regions without creating new inconsistencies.

What would settle it

A controlled test on scenes with four or more distinct objects where the polynomial-adjusted depth map violates local monotonicity or increases error relative to ground-truth metric depths despite the regularizer.

Figures

Figures reproduced from arXiv: 2503.17182 by Alex Wong, Hyoungseob Park, Jeffrey Moon, Patrick Rim, Vadim Ezhov.

Figure 1
Figure 1. Figure 1: If an MDE model predicts incorrect relative depths [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Method Overview. POLAR transforms scaleless MDE predictions into metric depth using polynomial fitting guided by radar features. Learnable prototypes extract patterns in the configurations of radar point clouds and are used to aggregate spatially-informed radar features. The geometry-aware MDE features are fused with the radar features via a learnable soft-correspondence module to yield a unified scene rep… view at source ↗
Figure 3
Figure 3. Figure 3: Qualitative results on nuScenes. GET-UP and RadarCam-Depth (RC-D) fail to reconstruct entire regions, yielding objects with large depth errors. Raw MDE yields reasonable relative reconstructions but suffers from incorrect global scale and cross-object misalignments. POLAR leverages polynomial fitting to recover a global scale and correct these misalignments. terms enable correction of misalignments in rela… view at source ↗
Figure 4
Figure 4. Figure 4: POLAR leverages spatial information from radar points to [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: To quantify absolute improvement in our qualitative [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Additional qualitative comparison on VoD. POLAR more accurately predicts metric scale, and exhibits fewer global misalign￾ments and geometric artifacts in comparison to other methods. Polynomial fitting, allowing POLAR to non-uniformly scale distinct local regions, results in notable improvements in metric depth estimation over initial MDE predictions. This is apparent in all MDE error maps: though object … view at source ↗
Figure 7
Figure 7. Figure 7: nuScenes dataset visualization. The elevation ambiguity of radar points results in erroneous projection onto the image plane that makes it challenging for depth completion methods to infer dense depth. In contrast, lidar points yield a denser, image-aligned projection, which is why accurate 3D scene reconstruction with depth completion methods is possible. dence, Tab. 12 shows that our method, evaluated ze… view at source ↗
read the original abstract

We propose POLAR, a novel radar-guided depth estimation method that introduces polynomial fitting to efficiently transform scaleless depth predictions from pretrained monocular depth estimation (MDE) models into metric depth maps. Unlike existing approaches that rely on complex architectures or expensive sensors, our method is grounded in a fundamental insight: although MDE models often infer reasonable local depth structure within each object or local region, they may misalign these regions relative to one another, making a linear scale and shift (affine) transformation insufficient given three or more of these regions. To address this limitation, we use polynomial coefficients predicted from cheap, ubiquitous radar data to adaptively adjust predictions non-uniformly across depth ranges. In this way, POLAR generalizes beyond affine transformations and is able to correct such misalignments by introducing inflection points. Importantly, our polynomial fitting framework preserves structural consistency through a novel training objective that enforces local monotonicity via first-derivative regularization. POLAR achieves state-of-the-art performance across three datasets, outperforming existing methods by an average of 24.9% in MAE and 33.2% in RMSE, while also achieving state-of-the-art efficiency in terms of latency and computational cost.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes POLAR, a radar-guided polynomial fitting method to convert scaleless depth predictions from pretrained monocular depth estimation models into metric depth maps. The key insight is that MDE models produce reasonable local structures but misalign multiple regions, requiring more than affine transformations; radar data is used to predict polynomial coefficients for non-uniform adjustments and inflection points, with a first-derivative regularization to enforce local monotonicity. The paper reports state-of-the-art results on three datasets with average gains of 24.9% in MAE and 33.2% in RMSE, plus efficiency advantages.

Significance. If the result holds, the significance is that the work offers an efficient radar-based approach to metric depth estimation that generalizes beyond affine transformations via polynomials, which could impact real-time applications in computer vision and robotics by reducing reliance on expensive sensors or complex models. The derivative regularizer for preserving local monotonicity is a constructive element if shown to be effective.

major comments (2)
  1. [Abstract] Abstract: The abstract states performance numbers and a training objective but supplies no derivation details, dataset statistics, ablation results, or error analysis; therefore the math and data cannot be verified to support the claim as stated. This is load-bearing for the SOTA performance claim.
  2. [Abstract] Abstract: The assumption that radar returns are sufficient to predict polynomial coefficients that correctly realign multiple mis-scaled local depth regions without introducing new inconsistencies or violating the local monotonicity enforced by the derivative regularizer is presented as the core mechanism but provides no quantitative check on residual violations or generalization beyond the training radar distribution. This is load-bearing for the central claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the two major comments point by point below, noting that the abstract serves as a high-level summary while detailed supporting material appears in the body of the paper.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The abstract states performance numbers and a training objective but supplies no derivation details, dataset statistics, ablation results, or error analysis; therefore the math and data cannot be verified to support the claim as stated. This is load-bearing for the SOTA performance claim.

    Authors: Abstracts conventionally provide concise overviews of contributions and results rather than full derivations or statistics. The polynomial fitting formulation, first-derivative regularization objective, and associated math are derived in Section 3. Dataset statistics (including radar point densities and scene distributions) appear in Section 4.1. Ablation studies isolating the regularization term and polynomial degree are in Section 5.2, while error analysis and per-region breakdown are in Section 5.3. These sections contain the equations, tables, and figures needed to verify the reported 24.9% MAE and 33.2% RMSE gains. No changes to the abstract itself are required to maintain its brevity and standard format. revision: no

  2. Referee: [Abstract] Abstract: The assumption that radar returns are sufficient to predict polynomial coefficients that correctly realign multiple mis-scaled local depth regions without introducing new inconsistencies or violating the local monotonicity enforced by the derivative regularizer is presented as the core mechanism but provides no quantitative check on residual violations or generalization beyond the training radar distribution. This is load-bearing for the central claim.

    Authors: The manuscript validates the core mechanism through multiple quantitative checks. Section 5.1 reports end-to-end metric depth accuracy gains over affine baselines across three datasets, with qualitative examples showing correction of multi-region misalignments via polynomial inflection points. Table 3 ablates the first-derivative regularizer and shows measurable reduction in local depth inversions. Cross-dataset evaluation (Section 5.4) tests generalization under varying radar characteristics. While an explicit auxiliary metric for residual monotonicity violations on held-out radar distributions is not tabulated, the consistent SOTA performance and ablation results provide indirect but substantive support; we can add a targeted residual-violation table in revision if the referee deems it necessary. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation uses external radar inputs for empirical gains

full rationale

The paper's core mechanism predicts polynomial coefficients from radar measurements to non-uniformly rescale monocular depth outputs, with performance evaluated empirically on three datasets. No equations, self-citations, or fitted parameters are shown in the provided text that reduce the claimed MAE/RMSE improvements or the polynomial transformation itself to quantities defined by the inputs or by construction. The regularizer and radar-to-coefficient mapping are presented as independent components relying on external sensor data rather than tautological redefinitions or renamings of known results.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about monocular depth model behavior and radar utility that are not independently verified in the provided abstract; no free parameters or invented entities are explicitly quantified.

free parameters (2)
  • polynomial degree
    Degree chosen to allow inflection points while remaining stable; value not stated in abstract.
  • derivative regularization weight
    Weight balancing monotonicity against fitting accuracy; value not stated in abstract.
axioms (2)
  • domain assumption MDE models infer reasonable local depth structure but misalign regions relative to one another when three or more regions are present
    Explicitly stated as the fundamental insight grounding the method.
  • domain assumption Radar data can be used to predict polynomial coefficients that adaptively correct depth misalignments
    Core premise of the radar-guided fitting approach.

pith-pipeline@v0.9.0 · 5748 in / 1409 out tokens · 41736 ms · 2026-05-22T22:32:39.371594+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 2 internal anchors

  1. [1]

    Barlow, David J

    Richard E. Barlow, David J. Bartholomew, J. Martin Bremner, and H. D. Brunk.Statistical Inference Under Order Restric- tions: The Theory and Application of Isotonic Regression. John Wiley & Sons, New York, 1972. 7

  2. [2]

    Adabins: Depth estimation using adaptive bins

    Shariq Farooq Bhat, Ibraheem Alhashim, and Peter Wonka. Adabins: Depth estimation using adaptive bins. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4009–4018, 2021. 14

  3. [3]

    Unsuper- vised scale-consistent depth and ego-motion learning from monocular video.Advances in neural information processing systems, 32, 2019

    Jiawang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, and Ian Reid. Unsuper- vised scale-consistent depth and ego-motion learning from monocular video.Advances in neural information processing systems, 32, 2019. 2

  4. [4]

    Bishop.Pattern Recognition and Machine Learning

    Christopher M. Bishop.Pattern Recognition and Machine Learning. Springer, New York, 2006. 7

  5. [5]

    Richter, and Vladlen Koltun

    Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Mar- cel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun. Depth pro: Sharp monocular metric depth in less than a second, 2024. 1, 2, 12, 15

  6. [6]

    nuScenes: A multimodal dataset for autonomous driving

    Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A mul- timodal dataset for autonomous driving.arXiv preprint arXiv:1903.11027, 2019. 13

  7. [7]

    Emerg- ing properties in self-supervised vision transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. InPro- ceedings of the IEEE/CVF international conference on com- puter vision, pages 9650–9660, 2021. 2

  8. [8]

    Uncle: Benchmarking unsupervised continual learning for depth completion.arXiv preprint arXiv:2410.18074, 2024

    Xien Chen, Rit Gangopadhyay, Michael Chu, Patrick Rim, Hyoungseob Park, and Alex Wong. Uncle: Benchmarking unsupervised continual learning for depth completion.arXiv preprint arXiv:2410.18074, 2024. 2

  9. [9]

    Vision transformer adapter for dense predictions,

    Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Yu Qiao. Vision transformer adapter for dense predictions.arXiv preprint arXiv:2205.08534, 2022. 16

  10. [10]

    Fixing the scale and shift in monocular depth for camera pose estimation.arXiv preprint arXiv:2501.07742, 2025

    Yaqing Ding, Václav Vávra, Viktor Kocur, Jian Yang, Torsten Sattler, and Zuzana Kukelova. Fixing the scale and shift in monocular depth for camera pose estimation.arXiv preprint arXiv:2501.07742, 2025. 1, 3

  11. [11]

    An image is worth 16x16 words: Transform- ers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Transform- ers for image recognition at scale. InInternational Conference on Learning Representations, 2021. 2

  12. [12]

    Toyota drivers’ experiences with dynamic radar cruise control, pre-collision system, and lane-keeping assist.Journal of safety research, 56:67–73, 2016

    Angela H Eichelberger and Anne T McCartt. Toyota drivers’ experiences with dynamic radar cruise control, pre-collision system, and lane-keeping assist.Journal of safety research, 56:67–73, 2016. 1, 2

  13. [13]

    All-day depth completion

    Vadim Ezhov, Hyoungseob Park, Zhaoyang Zhang, Rishi Upadhyay, Howard Zhang, Chethan Chinder Chandrappa, Achuta Kadambi, Yunhao Ba, Julie Dorsey, and Alex Wong. All-day depth completion. In2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE,

  14. [14]

    Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues.Array, 10:100057, 2021

    Abhishek Gupta, Alagan Anpalagan, Ling Guan, and Ahmed Shaharyar Khwaja. Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues.Array, 10:100057, 2021. 1

  15. [15]

    4d millimeter- wave radar in autonomous driving: A survey.arXiv preprint arXiv:2306.04242, 2023

    Zeyu Han, Jiahao Wang, Zikun Xu, Shuocheng Yang, Lei He, Shaobing Xu, Jianqiang Wang, and Keqiang Li. 4d millimeter- wave radar in autonomous driving: A survey.arXiv preprint arXiv:2306.04242, 2023. 1

  16. [16]

    Improving semi-supervised and domain-adaptive semantic segmentation with self-supervised depth estimation

    Lukas Hoyer, Dengxin Dai, Qin Wang, Yuhua Chen, and Luc Van Gool. Improving semi-supervised and domain-adaptive semantic segmentation with self-supervised depth estimation. International Journal of Computer Vision, pages 1–27, 2023. 2

  17. [17]

    Lora: Low-rank adaptation of large language models.ICLR, 1(2):3,

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3,

  18. [18]

    Penet: Towards precise and efficient image guided depth completion.2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13656–13662, 2021

    Mu Hu, Shuling Wang, Bin Li, Shiyu Ning, Li Fan, and Xiao- jin Gong. Penet: Towards precise and efficient image guided depth completion.2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13656–13662, 2021. 2

  19. [19]

    Depthcrafter: Generating consistent long depth sequences for open-world videos

    Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, and Ying Shan. Depthcrafter: Generating consistent long depth sequences for open-world videos.arXiv preprint arXiv:2409.02095, 2024. 1, 3

  20. [20]

    Rad- cloud: Real-time high-resolution point cloud generation us- ing low-cost radars for aerial and ground vehicles

    David Hunt, Shaocheng Luo, Amir Khazraei, Xiao Zhang, Spencer Hallyburton, Tingjun Chen, and Miroslav Pajic. Rad- cloud: Real-time high-resolution point cloud generation us- ing low-cost radars for aerial and ground vehicles. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 12269–12275. IEEE, 2024. 2

  21. [21]

    Millimeter-wave microstrip array antenna for automotive radars.IEICE transactions on communications, 86(9):2728–2738, 2003

    Hideo Iizuka, Toshiaki Watanabe, Kazuo Sato, and Kunitoshi Nishikawa. Millimeter-wave microstrip array antenna for automotive radars.IEICE transactions on communications, 86(9):2728–2738, 2003. 1

  22. [22]

    Sparse and dense data with cnns: Depth completion and semantic segmentation

    Maximilian Jaritz, Raoul De Charette, Etienne Wirbel, Xavier Perrotton, and Fawzi Nashashibi. Sparse and dense data with cnns: Depth completion and semantic segmentation. In2018 International Conference on 3D Vision (3DV), pages 52–60. IEEE, 2018. 2

  23. [23]

    From big to small: Multi-scale local planar guidance for monocular depth estimation.arXiv preprint arXiv:1907.10326, 2019

    Jin Han Lee, Myung-Kyu Han, Dong Wook Ko, and Il Hong Suh. From big to small: Multi-scale local planar guidance for monocular depth estimation.arXiv preprint arXiv:1907.10326, 2019. 14

  24. [24]

    A multi-scale guided cascade hourglass network for depth completion

    Ang Li, Zejian Yuan, Yonggen Ling, Wanchao Chi, Sheng- hao Zhang, and Chong Zhang. A multi-scale guided cascade hourglass network for depth completion. In2020 IEEE Win- ter Conference on Applications of Computer Vision (WACV), pages 32–40, 2020. 2

  25. [25]

    Sparse beats dense: Rethinking supervision in radar-camera depth completion

    Huadong Li, Minhao Jing, Wang Jin, Shichao Dong, Jiajun Liang, Haoqiang Fan, and Renhe Ji. Sparse beats dense: Rethinking supervision in radar-camera depth completion. In European Conference on Computer Vision, pages 127–143. Springer, 2024. 3, 7, 8, 14

  26. [26]

    Radarcam-depth: Radar-camera fusion for 9 depth estimation with learned metric scale

    Han Li, Yukai Ma, Yaqing Gu, Kewei Hu, Yong Liu, and Xingxing Zuo. Radarcam-depth: Radar-camera fusion for 9 depth estimation with learned metric scale. In2024 IEEE In- ternational Conference on Robotics and Automation (ICRA), pages 10665–10672. IEEE, 2024. 2, 3, 4, 5, 7, 8, 13, 14

  27. [27]

    Depth estimation from monocular images and sparse radar data

    Juan-Ting Lin, Dengxin Dai, and Luc Van Gool. Depth estimation from monocular images and sparse radar data. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10233–10240. IEEE, 2020. 3, 8, 14

  28. [28]

    Lin, Tao Cheng, Qianglong Zhong, Wending Zhou, and Huanhuan Yang

    Yuan Qin. Lin, Tao Cheng, Qianglong Zhong, Wending Zhou, and Huanhuan Yang. Dynamic spatial propagation network for depth completion. InAAAI Conference on Artificial Intel- ligence, 2022. 2

  29. [29]

    Depth estimation from monocular images and sparse radar using deep ordinal regression network

    Chen-Chou Lo and Patrick Vandewalle. Depth estimation from monocular images and sparse radar using deep ordinal regression network. In2021 IEEE International Conference on Image Processing (ICIP), pages 3343–3347. IEEE, 2021. 3, 14

  30. [30]

    Full-velocity radar returns by radar-camera fusion

    Yunfei Long, Daniel Morris, Xiaoming Liu, Marcos Castro, Punarjay Chakravarty, and Praveen Narayanan. Full-velocity radar returns by radar-camera fusion. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 16198–16207, 2021. 3, 14

  31. [31]

    Sparse-to-dense: Depth prediction from sparse depth samples and a single image

    Fangchang Ma and Sertac Karaman. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In 2018 IEEE International Conference on Robotics and Au- tomation (ICRA), pages 4796–4803, 2018. 2

  32. [32]

    Fangchang Ma, Guilherme Venturelli Cavalheiro, and Sertac Karaman. Self-supervised sparse-to-dense: Self-supervised depth completion from lidar and monocular camera.2019 In- ternational Conference on Robotics and Automation (ICRA), pages 3288–3295, 2018. 2

  33. [33]

    Real- time navigation in 3d environments based on depth camera data

    Daniel Maier, Armin Hornung, and Maren Bennewitz. Real- time navigation in 3d environments based on depth camera data. In2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages 692–697. IEEE,

  34. [34]

    K-radar: 4d radar object detection for autonomous driving in various weather conditions.Advances in Neural Information Processing Systems, 35:3819–3829, 2022

    Dong-Hee Paek, Seung-Hyun Kong, and Kevin Tirta Wijaya. K-radar: 4d radar object detection for autonomous driving in various weather conditions.Advances in Neural Information Processing Systems, 35:3819–3829, 2022. 2

  35. [35]

    Andras Palffy, Ewoud Pool, Srimannarayana Baratam, Julian F. P. Kooij, and Dariu M. Gavrila. Multi-class road user detection with 3+1d radar in the view-of-delft dataset.IEEE Robotics and Automation Letters, 7(2):4961–4968, 2022. 14

  36. [36]

    Non-local spatial propagation network for depth completion

    Jinsun Park, Kyungdon Joo, Zhe Hu, Chi Liu, and In So Kweon. Non-local spatial propagation network for depth completion. InEuropean Conference on Computer Vision,

  37. [37]

    P3depth: Monocular depth estimation with a piecewise planarity prior

    Vaishakh Patil, Christos Sakaridis, Alexander Liniger, and Luc Van Gool. P3depth: Monocular depth estimation with a piecewise planarity prior. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1610–1621, 2022. 14

  38. [38]

    UniDepthV2: Universal monocular metric depth estimation made simpler, 2025

    Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mat- tia Segu, Siyuan Li, Wim Abbeloos, and Luc Van Gool. UniDepthV2: Universal monocular metric depth estimation made simpler, 2025. 1, 2, 12, 15

  39. [39]

    On the uncertainty of self-supervised monocular depth estimation

    Matteo Poggi, Filippo Aleotti, Fabio Tosi, and Stefano Mat- toccia. On the uncertainty of self-supervised monocular depth estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3227–3237,

  40. [40]

    A survey on lidar scanning mechanisms.Electronics, 9(5):741, 2020

    Thinal Raj, Fazida Hanim Hashim, Aqilah Baseri Huddin, Mohd Faisal Ibrahim, and Aini Hussain. A survey on lidar scanning mechanisms.Electronics, 9(5):741, 2020. 1, 2

  41. [41]

    René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer.IEEE transactions on pattern analysis and machine intelligence, 44(3):1623–1637, 2020. 2

  42. [42]

    Vi- sion transformers for dense prediction

    René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vi- sion transformers for dense prediction. InProceedings of the IEEE/CVF international conference on computer vision, pages 12179–12188, 2021. 1, 12, 15

  43. [43]

    Guide- former: Transformers for image guided depth completion

    Kyeongha Rho, Jinsung Ha, and Youngjung Kim. Guide- former: Transformers for image guided depth completion. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6240–6249, 2022. 2

  44. [44]

    Protodepth: Unsupervised continual depth completion with prototypes.arXiv preprint arXiv:2503.12745, 2025

    Patrick Rim, Hyoungseob Park, S Gangopadhyay, Ziyao Zeng, Younjoon Chung, and Alex Wong. Protodepth: Unsupervised continual depth completion with prototypes.arXiv preprint arXiv:2503.12745, 2025. 2

  45. [45]

    Depth estimation from camera image and mmwave radar point cloud

    Akash Deep Singh, Yunhao Ba, Ankur Sarker, Howard Zhang, Achuta Kadambi, Stefano Soatto, Mani Srivastava, and Alex Wong. Depth estimation from camera image and mmwave radar point cloud. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 9275–9285, 2023. 2, 3, 4, 5, 7, 8, 14

  46. [46]

    Monocular depth estimation using laplacian pyramid-based depth resid- uals.IEEE transactions on circuits and systems for video technology, 31(11):4381–4393, 2021

    Minsoo Song, Seokjae Lim, and Wonjun Kim. Monocular depth estimation using laplacian pyramid-based depth resid- uals.IEEE transactions on circuits and systems for video technology, 31(11):4381–4393, 2021. 14

  47. [47]

    Unsupervised monocular estimation of depth and visual odometry uusing attention and depth-pose consistency loss.IEEE Transactions on Multime- dia, 2023

    Xiaogang Song, Haoyue Hu, Li Liang, Weiwei Shi, Guo Xie, Xiaofeng Lu, and Xinhong Hei. Unsupervised monocular estimation of depth and visual odometry uusing attention and depth-pose consistency loss.IEEE Transactions on Multime- dia, 2023. 2

  48. [48]

    Radars for au- tonomous driving: A review of deep learning methods and challenges.IEEE Access, 11:97147–97168, 2023

    Arvind Srivastav and Soumyajit Mandal. Radars for au- tonomous driving: A review of deep learning methods and challenges.IEEE Access, 11:97147–97168, 2023. 2

  49. [49]

    Cafnet: A confidence-driven framework for radar camera depth estimation

    Huawei Sun, Hao Feng, Julius Ott, Lorenzo Servadei, and Robert Wille. Cafnet: A confidence-driven framework for radar camera depth estimation. In2024 IEEE/RSJ Interna- tional Conference on Intelligent Robots and Systems (IROS), pages 2734–2740. IEEE, 2024. 3, 14

  50. [50]

    Get-up: Geometric-aware depth estimation with radar points upsampling

    Huawei Sun, Zixu Wang, Hao Feng, Julius Ott, Lorenzo Servadei, and Robert Wille. Get-up: Geometric-aware depth estimation with radar points upsampling. InProceedings of the Winter Conference on Applications of Computer Vision (WACV), pages 1850–1860, 2025. 3, 7, 8, 14

  51. [51]

    Bilateral propagation network for depth completion

    Jie Tang, Fei-Peng Tian, Boshi An, Jian Li, and Ping Tan. Bilateral propagation network for depth completion. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9763–9772, 2024. 14 10

  52. [52]

    Marigold-dc: Zero-shot monocular depth completion with guided diffusion.arXiv preprint arXiv:2412.13389, 2024

    Massimiliano Viola, Kevin Qu, Nando Metzger, Bingxin Ke, Alexander Becker, Konrad Schindler, and Anton Obukhov. Marigold-dc: Zero-shot monocular depth completion with guided diffusion.arXiv preprint arXiv:2412.13389, 2024. 1, 2, 3

  53. [53]

    Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation

    Tsun-Hsuan Wang, Fu-En Wang, Juan-Ting Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, and Min Sun. Plug-and-play: Improve depth estimation via sparse data propagation.arXiv preprint arXiv:1812.08350, 2018. 14

  54. [54]

    Tacodepth: Towards efficient radar-camera depth estimation with one-stage fusion

    Yiran Wang, Jiaqi Li, Chaoyi Hong, Ruibo Li, Liusheng Sun, Xiao Song, Zhe Wang, Zhiguo Cao, and Guosheng Lin. Tacodepth: Towards efficient radar-camera depth estimation with one-stage fusion. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 10523–10533,

  55. [55]

    Unsupervised depth completion from visual inertial odome- try.IEEE Robotics and Automation Letters, 5(2):1899–1906,

    Alex Wong, Xiaohan Fei, Stephanie Tsuei, and Stefano Soatto. Unsupervised depth completion from visual inertial odome- try.IEEE Robotics and Automation Letters, 5(2):1899–1906,

  56. [56]

    Quadric representations for lidar odometry, mapping and localization.IEEE Robotics and Automation Letters, 8 (8):5023–5030, 2023

    Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nan- ning Zheng, Kurt Keutzer, Masayoshi Tomizuka, and Wei Zhan. Quadric representations for lidar odometry, mapping and localization.IEEE Robotics and Automation Letters, 8 (8):5023–5030, 2023. 14

  57. [57]

    Deformable spatial propagation networks for depth completion.2020 IEEE In- ternational Conference on Image Processing (ICIP), pages 913–917, 2020

    Zheyuan Xu, Hongche Yin, and Jian Yao. Deformable spatial propagation networks for depth completion.2020 IEEE In- ternational Conference on Image Processing (ICIP), pages 913–917, 2020. 2

  58. [58]

    Rignet: Repetitive image guided network for depth completion

    Zhiqiang Yan, Kun Wang, Xiang Li, Zhenyu Zhang, Baobei Xu, Jun Li, and Jian Yang. Rignet: Repetitive image guided network for depth completion. InEuropean Conference on Computer Vision, 2021. 2

  59. [59]

    Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024

    Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiao- gang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024. 1, 2, 12, 14, 15

  60. [60]

    Metric3d: Towards zero-shot metric 3d prediction from a single image

    Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaix- uan Wang, Xiaozhi Chen, and Chunhua Shen. Metric3d: Towards zero-shot metric 3d prediction from a single image. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 9043–9053, 2023. 1, 3

  61. [61]

    Relative pose estimation through affine corrections of monocular depth priors.arXiv preprint arXiv:2501.05446, 2025

    Yifan Yu, Shaohui Liu, Rémi Pautrat, Marc Pollefeys, and Viktor Larsson. Relative pose estimation through affine corrections of monocular depth priors.arXiv preprint arXiv:2501.05446, 2025

  62. [62]

    Rsa: Resolving scale ambiguities in monocu- lar depth estimators through language descriptions.Advances in neural information processing systems, 37:112684–112705,

    Ziyao Zeng, Yangchao Wu, Hyoungseob Park, Daniel Wang, Fengyu Yang, Stefano Soatto, Dong Lao, Byung-Woo Hong, and Alex Wong. Rsa: Resolving scale ambiguities in monocu- lar depth estimators through language descriptions.Advances in neural information processing systems, 37:112684–112705,

  63. [63]

    Completionformer: Depth completion with convolutions and vision transformers

    Youmin Zhang, Xianda Guo, Matteo Poggi, Zheng Zhu, Guan Huang, and Stefano Mattoccia. Completionformer: Depth completion with convolutions and vision transformers. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18527–18536, 2023. 2

  64. [64]

    To- wards better generalization: Joint depth-pose learning without posenet

    Wang Zhao, Shaohui Liu, Yezhi Shu, and Yong-Jin Liu. To- wards better generalization: Joint depth-pose learning without posenet. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9151–9161,

  65. [65]

    2 11 Radar-Guided Polynomial Fitting for Metric Depth Estimation Supplementary Material nuScenes Method MAE RMSE DPT [42] 5188.2 6884.5 UniDepth [38] 2129.8 4887.7 Depth Anything [59] 2404.9 4851.1 Depth Pro [5] 3835.0 6600.3 RadarCam-Depth w/ DPT 1689.7 3948.0 RadarCam-Depth w/ UniDepth 1872.0 4321.2 RadarCam-Depth w/ Depth Anything 1953.6 5107.8 RadarCa...