LuMon: A Comprehensive Benchmark and Development Suite with Novel Datasets for Lunar Monocular Depth Estimation
Pith reviewed 2026-05-10 17:14 UTC · model grok-4.3
The pith
Monocular depth models show little gain on real lunar images after sim-to-real fine-tuning
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that sim-to-real domain adaptation for monocular depth estimation produces substantial improvements when tested on data similar to the fine-tuning set, yet delivers only marginal benefits on real lunar imagery from Chang'e-3. This persistent cross-domain transfer gap limits reliable autonomous navigation on the Moon.
What carries the argument
The LuMon framework: novel datasets with stereo-derived metric depth maps from the Chang'e-3 mission and the CHERI dark analog, enabling systematic zero-shot and adapted evaluation of MDE architectures against lunar-specific challenges.
If this is right
- Current state-of-the-art MDE networks cannot be directly deployed on lunar rovers without risking inaccurate depth maps in critical areas.
- Domain adaptation techniques require significant improvements to bridge the gap between synthetic and real extraterrestrial data.
- The benchmark highlights specific failure modes such as extreme shading and textureless regolith that future models must address.
- Establishing this standard allows consistent comparison of new methods for planetary perception tasks.
Where Pith is reading between the lines
- Incorporating physical models of lunar lighting into adaptation pipelines could reduce the observed transfer gap.
- The new datasets enable development of networks that generalize to other planetary surfaces with similar regolith and illumination.
- Limited real lunar data collected during early mission phases could support targeted fine-tuning to move beyond the current baseline.
- Persistent monocular limitations may push designs toward multi-sensor fusion for safe rover operation.
Load-bearing premise
That the Chang'e-3 stereo data and CHERI analog supply accurate, representative metric ground truth that captures mission-critical lunar conditions such as extreme shading and textureless regolith.
What would settle it
A fine-tuned model achieving large accuracy gains on held-out real Chang'e-3 stereo pairs, compared with its zero-shot performance, would falsify the claim of a persistent transfer gap.
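The falsification test above hinges on standard MDE error metrics computed against the stereo-derived ground truth. A minimal sketch of the usual AbsRel, RMSE, and δ1 measures (this is illustrative, not code from the paper; all names are hypothetical):

```python
import numpy as np

def depth_metrics(pred, gt, valid_min=0.1):
    """Standard MDE error metrics over valid ground-truth pixels."""
    mask = gt > valid_min                        # ignore invalid/zero-depth pixels
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)    # mean absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))    # root-mean-square error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)               # fraction within 25% of GT
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

The transfer-gap claim would be undermined if `depth_metrics(finetuned_pred, gt)` showed large gains over `depth_metrics(zero_shot_pred, gt)` on held-out real Chang'e-3 pairs.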
Original abstract
Monocular Depth Estimation (MDE) is crucial for autonomous lunar rover navigation using electro-optical cameras. However, deploying terrestrial MDE networks to the Moon brings a severe domain gap due to harsh shadows, textureless regolith, and zero atmospheric scattering. Existing evaluations rely on analogs that fail to replicate these conditions and lack actual metric ground truth. To address this, we present LuMon, a comprehensive benchmarking framework to evaluate MDE methods for lunar exploration. We introduce novel datasets featuring high-quality stereo ground truth depth from the real Chang'e-3 mission and the CHERI dark analog dataset. Utilizing this framework, we conduct a systematic zero-shot evaluation of state-of-the-art architectures across synthetic, analog, and real datasets. We rigorously assess performance against mission critical challenges like craters, rocks, extreme shading, and varying depth ranges. Furthermore, we establish a sim-to-real domain adaptation baseline by fine tuning a foundation model on synthetic data. While this adaptation yields drastic in-domain performance gains, it exhibits minimal generalization to authentic lunar imagery, highlighting a persistent cross-domain transfer gap. Our extensive analysis reveals the inherent limitations of current networks and sets a standard foundation to guide future advancements in extraterrestrial perception and domain adaptation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces LuMon, a benchmarking framework for monocular depth estimation (MDE) aimed at lunar rover navigation. It contributes two new datasets providing metric ground truth: stereo-derived depths from the Chang'e-3 mission and the CHERI dark analog dataset. The authors perform systematic zero-shot evaluations of state-of-the-art MDE architectures across synthetic, analog, and real lunar data, focusing on mission-critical challenges such as craters, rocks, extreme shading, and textureless regolith. They also establish a sim-to-real domain adaptation baseline by fine-tuning a foundation model on synthetic data, reporting drastic in-domain gains but minimal generalization to authentic lunar imagery and thereby highlighting a persistent cross-domain transfer gap.
Significance. If the ground-truth depths prove reliable, the work would usefully document the limitations of current terrestrial MDE models and domain-adaptation techniques under lunar conditions, while supplying publicly usable real-mission data. The emphasis on zero-shot transfer and the explicit identification of a generalization gap could guide future extraterrestrial perception research. However, the significance is tempered by the absence of any reported validation of the stereo ground truth.
major comments (2)
- The abstract states that the Chang'e-3 stereo data supplies 'high-quality' metric ground truth, yet no quantitative error analysis, multi-view consistency checks, per-pixel confidence maps, or external validation (e.g., against laser altimetry or known crater depths) is described. In textureless, high-contrast lunar regions this omission directly undermines the reliability of all reported performance numbers and the central claim of a 'persistent cross-domain transfer gap'.
- The headline result—that fine-tuning yields 'drastic in-domain performance gains' but 'minimal generalization'—is presented without accompanying numerical values, error bars, or statistical tests in the abstract. If the full manuscript likewise omits these details or fails to report them for the real lunar test set, the magnitude and robustness of the claimed generalization failure cannot be assessed.
minor comments (2)
- The abstract would be strengthened by including at least one or two key quantitative metrics (e.g., RMSE or AbsRel on the real lunar set) to support the qualitative claims.
- Clarify the exact procedure used to convert Chang'e-3 stereo disparities into metric depths, including any assumptions about camera calibration or baseline.
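The conversion the second comment asks about follows, in the standard rectified pinhole-stereo model, the relation depth = f·B/d for focal length f (pixels), baseline B (metres), and disparity d (pixels). A minimal sketch under that assumption — the actual Chang'e-3 calibration and rectification procedure are exactly what the revision should specify, and all names here are illustrative:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Convert rectified stereo disparity (pixels) to metric depth (metres).

    Assumes a calibrated, rectified pinhole pair: depth = f * B / d.
    Pixels with non-positive disparity are marked invalid (depth 0).
    """
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

With a hypothetical focal length of 1000 px and a 0.27 m baseline, a 10 px disparity maps to 27 m of depth; small disparity errors at long range thus translate into large metric depth errors, which is why the calibration details matter for the ground truth's reliability.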
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable comments on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.
Point-by-point responses
-
Referee: The abstract states that the Chang'e-3 stereo data supplies 'high-quality' metric ground truth, yet no quantitative error analysis, multi-view consistency checks, per-pixel confidence maps, or external validation (e.g., against laser altimetry or known crater depths) is described. In textureless, high-contrast lunar regions this omission directly undermines the reliability of all reported performance numbers and the central claim of a 'persistent cross-domain transfer gap'.
Authors: We agree that providing explicit validation for the stereo-derived ground truth is essential to substantiate the 'high-quality' descriptor and the overall findings. The current version of the manuscript describes the dataset construction but does not include a dedicated error analysis. In the revised manuscript, we will add a new subsection under the dataset description that reports quantitative metrics such as reprojection error statistics, multi-view consistency measures, and comparisons with available external references like known crater dimensions from lunar orbiter data. This will directly address the concerns about reliability in challenging lunar regions and support the reported performance evaluations. revision: yes
-
Referee: The headline result—that fine-tuning yields 'drastic in-domain performance gains' but 'minimal generalization'—is presented without accompanying numerical values, error bars, or statistical tests in the abstract. If the full manuscript likewise omits these details or fails to report them for the real lunar test set, the magnitude and robustness of the claimed generalization failure cannot be assessed.
Authors: The full manuscript reports detailed quantitative results for all evaluations, including on the real lunar test set, in the experimental sections and associated tables. These include specific performance metrics (e.g., RMSE, δ1 accuracy) with comparisons between zero-shot and fine-tuned models, along with error bars from repeated experiments and statistical analysis. The abstract summarizes these findings at a high level without numbers to maintain brevity. To improve clarity, we will revise the abstract to include key numerical values illustrating the in-domain gains and the limited generalization, such as the percentage improvement in-domain versus the small change on real data. revision: partial
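Error bars of the kind the response promises can be produced, for example, by a percentile bootstrap over per-image scores. This is a generic sketch, not necessarily the authors' statistical procedure; all names are illustrative:

```python
import numpy as np

def bootstrap_ci(per_image_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of
    per-image error scores (e.g., AbsRel on each test image)."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_image_scores, dtype=np.float64)
    n = scores.size
    # Resample images with replacement and record each resample's mean
    means = np.array([
        rng.choice(scores, size=n, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), (lo, hi)
```

Reporting such intervals for both the in-domain and the real Chang'e-3 test sets would let readers judge whether the "minimal generalization" gap exceeds the statistical noise.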
Circularity Check
No circularity: empirical benchmarking with independent measurements
Full rationale
The paper introduces new datasets with stereo-derived ground truth and reports direct empirical evaluations of MDE models in zero-shot and fine-tuned settings. Performance numbers are measured outcomes against the supplied ground truth rather than quantities derived by construction from fitted parameters or self-referential definitions. No mathematical derivation chain, uniqueness theorem, or ansatz is presented that reduces to the inputs; the domain-adaptation baseline is a standard procedure whose results are reported as observed rather than forced. The work does not lean on external benchmarks for its claims and contains no load-bearing self-citation that substitutes for independent evidence.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Chang'e-3 stereo imagery supplies high-quality metric ground-truth depth usable for monocular depth estimation benchmarking.
- domain assumption: The CHERI dark analog dataset sufficiently replicates lunar surface conditions, including harsh shadows and textureless regolith.