ShotcreteDepth: A Bi-modal Dataset for Robust Robotic Depth Perception in Shotcrete Construction Environments

Jakub Gregorek; Jonas Flink Bentzen; Lars Arnold Dethlefsen; Lazaros Nalpantidis; Mads Essenbæk; Patrick Schmidt

arxiv: 2606.23152 · v1 · pith:4ZL4MUFU · submitted 2026-06-22 · 💻 cs.RO


Jakub Gregorek, Lars Arnold Dethlefsen, Patrick Schmidt, Mads Essenbæk, Jonas Flink Bentzen, Lazaros Nalpantidis

Pith reviewed 2026-06-26 08:31 UTC · model grok-4.3

classification 💻 cs.RO

keywords: ShotcreteDepth, dataset, depth estimation, stereo matching, LiDAR, construction robotics, depth completion, shotcrete


The pith

ShotcreteDepth provides a bi-modal dataset of synchronized stereo RGB and LiDAR data from active shotcrete construction under high turbidity and poor illumination.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ShotcreteDepth to supply realistic sensor data for testing robotic depth perception in construction settings. It comprises 11,252 temporally synchronized samples, each pairing stereo RGB imagery with a LiDAR point cloud, recorded both during active shotcreting and at general construction sites, where turbidity and low light produce noisy, incomplete measurements. The release also includes a lightweight LiDAR annotation tool and 220 annotated samples for evaluation. Together, these support work on stereo matching, depth completion, and depth estimation under conditions that match industrial operation.

Core claim

The paper establishes the ShotcreteDepth dataset as a collection of 11,252 temporally synchronized stereo RGB images and LiDAR point clouds acquired in real-world shotcrete construction environments that feature high turbidity and poor illumination, accompanied by a lightweight annotation tool for LiDAR point clouds and 220 annotated samples for evaluation in stereo matching, depth completion, and depth estimation tasks.

What carries the argument

The ShotcreteDepth bi-modal dataset of stereo RGB imagery paired with LiDAR point clouds collected under harsh construction conditions.

If this is right

Stereo matching algorithms can be evaluated on imagery degraded by construction turbidity.
Depth completion techniques gain a testbed for recovering structure from sparse, noisy LiDAR returns in low light.
Depth estimation research obtains examples that reflect the incomplete observations typical of industrial robotics.
Autonomous construction systems can be trained and validated against data that includes both active spraying and static site conditions.
The annotation tool enables rapid expansion of labeled point clouds for further experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same data collection approach could be replicated for other dusty or low-visibility industrial tasks such as mining or tunneling.
Models pretrained on general outdoor datasets may require fine-tuning on this data to handle domain-specific noise patterns.
The dataset highlights sensor-fusion needs that could drive new hardware designs for construction robots.
Release of the annotation tool may accelerate labeling efforts in other point-cloud-heavy robotics domains.

Load-bearing premise

The collected stereo RGB imagery and LiDAR point clouds are temporally synchronized and accurately represent the high turbidity and poor illumination of active construction environments.

What would settle it

If depth estimation or completion models that succeed on the 220 annotated samples fail to carry that performance over to independent recordings from the same shotcrete sites, the dataset's claimed representativeness would be refuted.

Figures

Figures reproduced from arXiv: 2606.23152 by Jakub Gregorek, Jonas Flink Bentzen, Lars Arnold Dethlefsen, Lazaros Nalpantidis, Mads Essenbæk, Patrick Schmidt.

**Figure 1.** Data samples from our ShotcreteDepth dataset. RGB images come from the Roboception rc_visard 160c camera, the disparity and confidence maps are computed internally by the camera, and the point cloud from the Velodyne LiDAR is projected onto the image plane. Red denotes the smallest values and blue the largest.
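Figure 1 overlays the Velodyne point cloud on the camera image. A minimal sketch of such a projection follows, assuming a rectified pinhole camera with intrinsic matrix `K` and a rigid LiDAR-to-camera transform `T_cam_lidar` from extrinsic calibration. Neither the calibration format nor the function name `project_lidar_to_depth` comes from the paper or its repository; treat both as hypothetical.

```python
import numpy as np


def project_lidar_to_depth(points_lidar, T_cam_lidar, K, height, width):
    """Project LiDAR points into a sparse depth map of shape (height, width).

    points_lidar: (N, 3) points in the LiDAR frame.
    T_cam_lidar:  (4, 4) rigid transform from the LiDAR frame to the camera frame.
    K:            (3, 3) pinhole intrinsics of the rectified left camera.
    Pixels without a LiDAR return are set to 0.
    """
    pts = np.asarray(points_lidar, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    # Homogeneous coordinates -> camera frame.
    pts_cam = (T_cam_lidar @ np.hstack([pts, ones]).T).T[:, :3]

    # Discard points at or behind the camera plane (Z <= 0).
    pts_cam = pts_cam[pts_cam[:, 2] > 0]

    # Pinhole projection: u = fx * X / Z + cx, v = fy * Y / Z + cy.
    uvw = (K @ pts_cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts_cam[:, 2]

    # Keep only points that land inside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # When several points hit one pixel, keep the nearest (smallest depth).
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth.astype(np.float32)
```

Using `np.minimum.at` keeps the nearest return deterministically when points collide on a pixel, which plain fancy-index assignment does not guarantee. The resulting sparse map is the kind of input that the depth completion methods discussed below consume.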

**Figure 3.** Our 3D-printed dust-proof sensor housing containing…

**Figure 4.** The annotation tool we are releasing with the dataset. Upper image: 3D view of the LiDAR point cloud. Lower image: the point cloud overlaid on the left camera image. Green points are kept, while purple points mark the dust cloud, which is excluded from the evaluation data.

…point contained in the sliding window by a given threshold are removed. Considering the lower amount of LiDAR scan line…
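The caption describes a binary decision per LiDAR point: kept points form the evaluation data and dust points are excluded. Assuming the tool exports one integer label per point (a hypothetical encoding with `1` for kept and `0` for dust; the tool's real export format is not described here), splitting an annotated scan might look like this. The function name is likewise hypothetical.

```python
import numpy as np

# Hypothetical label encoding; the tool's actual export format may differ.
KEEP = 1
DUST = 0


def split_annotated_scan(points, labels):
    """Split one annotated LiDAR scan into evaluation points and excluded points.

    Any label other than KEEP (DUST or an unexpected value) is treated as excluded.
    Returns (kept_points, excluded_points, excluded_fraction).
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    if points.shape[0] != labels.shape[0]:
        raise ValueError("expected exactly one label per point")
    keep = labels == KEEP
    excluded_fraction = float(1.0 - keep.mean()) if labels.size else 0.0
    return points[keep], points[~keep], excluded_fraction
```

The excluded fraction is returned because, across annotated scans, it could serve as a rough label-based indication of how much of each view the dust cloud occupied, though it is not a substitute for a measured turbidity value.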

**Figure 5.** Comparing stereo matching methods. From top to bottom: left RGB image, then disparity maps computed by rc_visard stereo matching, RAFT-Stereo [40], FoundationStereo [60] and Stereo Anywhere [4].

TABLE III: Evaluation of three depth completion methods: Marigold-DC [56] for a single run (1st value) and an ensemble of 10 runs (2nd value), Marigold-SSD [17] and VPP4DC [3].
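Figure 5 compares disparity maps, while the LiDAR measures metric depth, so a comparison needs the standard rectified-stereo relation Z = f·B/d, with focal length f in pixels and baseline B in metres. The sketch below assumes both camera parameters are known and reports MAE and RMSE over pixels where both LiDAR ground truth and a prediction exist. The metrics actually used in Table III are cut off in this excerpt, so these are common choices rather than the paper's protocol, and the function names are hypothetical.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) to metric depth via Z = f * B / d.

    Non-positive disparities are treated as invalid matches and mapped to 0.
    """
    disparity = np.asarray(disparity, dtype=float)
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


def depth_errors(pred, gt):
    """MAE and RMSE (same units as gt) over pixels where gt > 0 and pred > 0.

    Also reports coverage: the fraction of ground-truth pixels with a prediction.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    has_gt = gt > 0
    mask = has_gt & (pred > 0)
    if not mask.any():
        raise ValueError("no pixels with both ground truth and a prediction")
    err = pred[mask] - gt[mask]
    return {
        "mae": float(np.abs(err).mean()),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "coverage": float(mask.sum() / has_gt.sum()),
    }
```

Coverage is reported alongside the errors because a matcher that leaves holes in hard regions (such as dust clouds) can score a low MAE simply by skipping them, whereas dense networks are scored everywhere.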

**Figure 6.** Qualitative results for depth completion methods.

**Figure 7.** Qualitative results for depth estimation methods: (a) RGB, (b) Marigold-E2E, (c) Depth Anything v3, (d) MoGe-2.

V. DISCUSSION AND CONCLUSION. Our experiments in Sec. IV have shown that depth perception is indeed possible in the challenging conditions of shotcreting environments. Each of the tested approaches (3 stereo matching approaches, 3 depth completion methods, and 3 depth estimation methods) has exhibited merits in the corresponding task. While computation…
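Monocular depth estimators often predict depth defined only up to an unknown scale and shift (affine-invariant depth; Marigold [28] is one example). A common evaluation practice is to fit that scale and shift to the ground truth by least squares before computing errors; Ranftl et al. [47] do this in inverse-depth space, while the sketch below aligns depth directly for simplicity. It assumes sparse LiDAR depth as ground truth with zeros marking missing returns. Whether ShotcreteDepth's evaluation uses this alignment is not stated in this summary, and `align_scale_shift` is a hypothetical name.

```python
import numpy as np


def align_scale_shift(pred, gt):
    """Fit s, t minimizing sum((s * pred + t - gt)^2) over pixels with gt > 0.

    Returns (aligned_pred, s, t). Requires at least two valid pixels.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mask = gt > 0
    if mask.sum() < 2:
        raise ValueError("need at least two ground-truth pixels")
    x = pred[mask]
    A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, gt[mask], rcond=None)
    return s * pred + t, float(s), float(t)
```

Models that already output metric depth can be scored without this step; aligning them anyway would hide exactly the scale errors a metric evaluation is meant to expose.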

Original abstract

We introduce ShotcreteDepth, a bi-modal dataset from the construction domain that captures both an active shotcreting process and general construction environments. The dataset comprises stereo RGB imagery and LiDAR point clouds acquired under harsh real-world conditions, including high turbidity and poor illumination. Such conditions adversely affect sensor measurements, leading to incomplete and noisy observations that pose significant challenges for perception systems in autonomous applications. Alongside the dataset, we release a lightweight annotation tool designed for time-efficient labeling of LiDAR point clouds. ShotcreteDepth consists of 11,252 temporally synchronized data samples, of which 220 are annotated for evaluation purposes. The dataset supports research in stereo matching, depth completion, and depth estimation under conditions that closely reflect the operational complexities found in industrial settings. Project repository: https://github.com/dtu-pas/shotcrete-depth

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a basic dataset release for depth perception in shotcrete and construction scenes, but the abstract gives no numbers or methods to back up the synchronization and harsh-condition claims.


The paper's main contribution is ShotcreteDepth, a collection of 11,252 stereo RGB and LiDAR pairs from active shotcrete sites and other construction areas. It also ships a simple annotation tool for the point clouds. That fills a narrow gap—most public depth datasets skip these dusty, poorly lit industrial settings—so the release itself is the new part.

What works is the scale and the domain focus. Eleven thousand synchronized samples is enough to train or test stereo matching and depth completion models, and the 220 annotated frames give a starting point for evaluation. Releasing the tool lowers the barrier for others who want to add labels.

The soft spot is the missing verification. The abstract states the data were captured under high turbidity and poor illumination and that the pairs are temporally synchronized, yet it reports no lux readings, turbidity measurements, hardware trigger details, or measured time offsets. Without those, it is hard to know how faithfully the data match the claimed conditions. The small annotated set also means most users will still need to label a lot themselves.

This is for people already working on construction robotics or robust depth estimation who need domain-specific data. If the full paper or repo adds the collection logs and metrics, the dataset could support incremental experiments. I would send it to peer review because dataset papers in applied robotics often need referee input on documentation and utility even when the core claim is just the release.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces ShotcreteDepth, a bi-modal dataset comprising 11,252 temporally synchronized stereo RGB images and LiDAR point clouds captured in shotcrete construction environments (including active shotcreting) under harsh conditions of high turbidity and poor illumination, along with 220 annotated samples and a lightweight LiDAR annotation tool. The dataset is positioned to support research in stereo matching, depth completion, and depth estimation for industrial robotic applications.

Significance. Release of synchronized multi-modal sensor data from a real industrial construction domain, together with an annotation tool, would address a gap in publicly available datasets for perception under challenging conditions; if the synchronization and environmental fidelity claims are substantiated, the resource could enable targeted algorithm development and benchmarking for autonomous systems in similar settings.

major comments (1)

[Abstract] The central claims that the 11,252 samples are 'temporally synchronized' and were 'acquired under harsh real-world conditions, including high turbidity and poor illumination' are load-bearing for all stated use cases (stereo matching, depth completion, depth estimation), yet the manuscript provides no description of the synchronization hardware or protocol, no measured time-offset statistics, and no quantitative environmental metrics (turbidity, lux, or equivalent) to confirm the conditions across the collection.

Simulated Author's Rebuttal

1 response · 2 unresolved

We thank the referee for the constructive feedback on our manuscript. The recommendation for major revision is noted, and we address the concerns regarding substantiation of the synchronization and environmental condition claims below. We will revise the manuscript to strengthen these aspects where feasible.


Referee: [Abstract] The central claims that the 11,252 samples are 'temporally synchronized' and were 'acquired under harsh real-world conditions, including high turbidity and poor illumination' are load-bearing for all stated use cases (stereo matching, depth completion, depth estimation), yet the manuscript provides no description of the synchronization hardware or protocol, no measured time-offset statistics, and no quantitative environmental metrics (turbidity, lux, or equivalent) to confirm the conditions across the collection.

Authors: We agree that additional details are needed to support these claims. In the revised manuscript, we will expand the methods section to describe the synchronization hardware (including the specific trigger mechanism and cabling) and the protocol employed to achieve temporal alignment between the stereo RGB cameras and LiDAR sensor. We will also provide any available supporting information on the collection setup. However, time-offset statistics were not measured during acquisition, and quantitative environmental metrics such as turbidity or lux values were not recorded. We will explicitly note these limitations and enhance the qualitative description of the harsh conditions based on the operational context of active shotcreting. revision: partial

standing simulated objections not resolved

Provision of measured time-offset statistics for synchronization, as these data were not collected during the original dataset acquisition.
Provision of quantitative environmental metrics (turbidity, lux, or equivalent), as these were not recorded during data collection.
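The first unresolved objection concerns time offsets. If the release stores per-sensor timestamps for each camera frame and LiDAR scan (an assumption; the summary does not say so), offset statistics could still be computed after the fact by matching each scan to its nearest camera frame. The sketch below does that with `np.searchsorted`; the function names and the tolerance are hypothetical. Such post-hoc numbers describe timestamp alignment only, not clock drift or trigger latency that the timestamps themselves do not capture.

```python
import numpy as np


def nearest_offsets(cam_ts, lidar_ts):
    """Signed offset (s) from each LiDAR timestamp to the nearest camera timestamp.

    Positive values mean the LiDAR scan was stamped after its matched camera frame.
    """
    cam_ts = np.sort(np.asarray(cam_ts, dtype=float))
    lidar_ts = np.asarray(lidar_ts, dtype=float)
    if cam_ts.size == 0:
        raise ValueError("no camera timestamps")
    idx = np.searchsorted(cam_ts, lidar_ts)  # first camera stamp >= scan stamp
    prev = cam_ts[np.clip(idx - 1, 0, cam_ts.size - 1)]
    nxt = cam_ts[np.clip(idx, 0, cam_ts.size - 1)]
    d_prev = lidar_ts - prev
    d_next = lidar_ts - nxt
    return np.where(np.abs(d_prev) <= np.abs(d_next), d_prev, d_next)


def offset_summary(offsets, tolerance_s):
    """Summary statistics (milliseconds) and the share of pairs within tolerance."""
    abs_off = np.abs(np.asarray(offsets, dtype=float))
    return {
        "mean_abs_ms": float(1e3 * abs_off.mean()),
        "p95_abs_ms": float(1e3 * np.percentile(abs_off, 95)),
        "max_abs_ms": float(1e3 * abs_off.max()),
        "within_tolerance": float(np.mean(abs_off <= tolerance_s)),
    }
```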

Circularity Check

0 steps flagged

No circularity: dataset release paper with no derivations

full rationale

This manuscript introduces and describes a new bi-modal dataset (ShotcreteDepth) consisting of synchronized stereo RGB and LiDAR samples collected in construction environments. It contains no equations, no fitted parameters, no predictions derived from models, and no derivation chain of any kind. All content is descriptive reporting of data acquisition, annotation, and intended uses for downstream tasks such as stereo matching. The paper cites the authors' earlier work (e.g., [16], [17], [49], [50]), but no self-citation, ansatz, or uniqueness claim is used in a way that could create circularity. The central claims rest on the existence and properties of the released data itself rather than any reduction to prior inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Dataset release paper with no mathematical modeling; relies only on standard assumptions about sensor data acquisition and synchronization.

axioms (1)

domain assumption Stereo RGB imagery and LiDAR point clouds can be temporally synchronized under the described harsh construction conditions.
Invoked implicitly when stating that the 11,252 samples are temporally synchronized.

pith-pipeline@v0.9.1-grok · 5698 in / 1275 out tokens · 21405 ms · 2026-06-26T08:31:46.665269+00:00 · methodology


Reference graph

Works this paper leans on

72 extracted references · 2 canonical work pages · 2 internal anchors

[1]

Dense-Haze: A Benchmark for Image Dehazing with Dense-Haze and Haze-Free Images

Codruta O. Ancuti, Cosmin Ancuti, Mateu Sbert, and Radu Timofte. Dense-Haze: A Benchmark for Image Dehazing with Dense-Haze and Haze-Free Images. In 2019 IEEE International Conference on Image Processing (ICIP), pages 1014–1018, 2019

2019
[2]

Robot active vision-based path planning for localization improvement in indoor environments

Sotirios Barlakas, Dimitrios Alexiou, Kosmas Tsiakas, Dimitrios Katsatos, Ioannis Kostavelis, Dimitrios Giakoumis, Antonios Gasteratos, and Dimitrios Tzovaras. Robot active vision-based path planning for localization improvement in indoor environments. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10686–10693. ...

2024
[3]

Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization

Luca Bartolomei, Matteo Poggi, Andrea Conti, Fabio Tosi, and Stefano Mattoccia. Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization. In 2024 International Conference on 3D Vision (3DV), pages 1360–1370, Los Alamitos, CA, USA, Mar. 2024. IEEE Computer Society

2024
[4]

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail

Luca Bartolomei, Fabio Tosi, Matteo Poggi, and Stefano Mattoccia. Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail. pages 1013–1027, 2025

2025
[5]

ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias Müller. ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth, 2023

2023
[6]

Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather

Mario Bijelic, Tobias Gruber, Fahim Mannan, Florian Kraus, Werner Ritter, Klaus Dietmayer, and Felix Heide. Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

2020
[7]

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R Richter, and Vladlen Koltun. Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. arXiv preprint arXiv:2410.02073, 2024

2024
[8]

MoCha-Stereo: Motif Channel Attention Network for Stereo Matching

Ziyang Chen, Wei Long, He Yao, Yongjun Zhang, Bingshu Wang, Yongbin Qin, and Jia Wu. MoCha-Stereo: Motif Channel Attention Network for Stereo Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27768–27777, June 2024

2024
[9]

Unsupervised confidence for LiDAR depth maps and applications

Andrea Conti, Matteo Poggi, Filippo Aleotti, and Stefano Mattoccia. Unsupervised confidence for LiDAR depth maps and applications. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8352–8359, 2022

2022
[10]

Deformation measurement of tunnel shotcrete liner using the multiepoch LiDAR point clouds

Li-Zhuang Cui, Jian Liu, Hongzheng Luo, Jianhong Wang, Xiao Zhang, Gaohang Lv, and Quanyi Xie. Deformation measurement of tunnel shotcrete liner using the multiepoch LiDAR point clouds. Journal of Construction Engineering and Management, 150(6):04024049, 2024

2024
[11]

A virtual construction vehicles and workers dataset with three-dimensional annotations

Yuexiong Ding and Xiaowei Luo. A virtual construction vehicles and workers dataset with three-dimensional annotations. Engineering Applications of Artificial Intelligence, 133:107964, 2024

2024
[12]

AREPO: Uncertainty-Aware Robot Ensemble Learning Under Extreme Partial Observability

Yurui Du, Louis Hanut, Herman Bruyninckx, and Renaud Detry. AREPO: Uncertainty-Aware Robot Ensemble Learning Under Extreme Partial Observability. IEEE Robotics and Automation Letters, 10(6):5737–5744, 2025

2025
[13]

Indoor FireRescue Radar: 4D Indoor Millimeter Wave Dataset and Analysis for Hazardous Environment Perception

Kangkang Duan, Zehao Zhu, and Zhengbo Zou. Indoor FireRescue Radar: 4D Indoor Millimeter Wave Dataset and Analysis for Hazardous Environment Perception. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 18620–18627, 2025

2025
[14]

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, and Bastian Leibe. Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think. In Proceedings of the Winter Conference on Applications of Computer Vision (WACV), pages 753–762, February 2025

2025
[15]

Vision meets Robotics: The KITTI Dataset

Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets Robotics: The KITTI Dataset. International Journal of Robotics Research (IJRR), 2013

2013
[16]

SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps

Jakub Gregorek and Lazaros Nalpantidis. SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 13304–13311, 2025

2025
[17]

Need for Speed: Zero-Shot Depth Completion with Single-Step Diffusion

Jakub Gregorek, Paraskevas Pegios, Nando Metzger, Konrad Schindler, Theodora Kontogianni, and Lazaros Nalpantidis. Need for Speed: Zero-Shot Depth Completion with Single-Step Diffusion, 2026

2026
[18]

DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching

Ming Gui, Johannes Schusterbauer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, and Björn Ommer. DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 3203–3211, 2025

2025
[19]

Robotic Framework for Iterative and Adaptive Profile Grading of Sand

Louis Hanut, Yurui Du, Andrew Vande Moere, Renaud Detry, and Herman Bruyninckx. Robotic Framework for Iterative and Adaptive Profile Grading of Sand. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 10387–10393, 2025

2025
[20]

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu, and Ying-Cong Chen. Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction. In The Thirteenth International Conference on Learning Representations, 2025

2025
[21]

Accurate and efficient stereo processing by semi-global matching and mutual information

H. Hirschmuller. Accurate and efficient stereo processing by semi-global matching and mutual information. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 807–814 vol. 2, 2005

2005
[22]

BuildingWorld: A Structured 3D Building Dataset for Urban Foundation Models

Shangfeng Huang, Ruisheng Wang, and Xin Wang. BuildingWorld: A Structured 3D Building Dataset for Urban Foundation Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7):5085–5094, Mar. 2026

2026
[23]

Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior

Lee Hyoseok, Kyeong Seon Kim, Kwon Byung-Ki, and Tae-Hyun Oh. Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior. Proceedings of the AAAI Conference on Artificial Intelligence, 39(4):3877–3885, Apr. 2025

2025
[24]

Annotation Tool and Urban Dataset for 3D Point Cloud Semantic Segmentation

Muhammad Ibrahim, Naveed Akhtar, Michael Wise, and Ajmal Mian. Annotation Tool and Urban Dataset for 3D Point Cloud Semantic Segmentation. IEEE Access, 9:35984–35996, 2021

2021
[25]

Test-Time Prompt Tuning for Zero-Shot Depth Completion

Chanhwi Jeong, Inhwan Bae, Jin-Hwi Park, and Hae-Gon Jeon. Test-Time Prompt Tuning for Zero-Shot Depth Completion. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9443–9454, October 2025

2025
[26]

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Hualie Jiang, Zhiqiang Lou, Laiyan Ding, Rui Xu, Minglang Tan, Wenjie Jiang, and Rui Huang. DEFOM-Stereo: Depth Foundation Model Based Stereo Matching. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2025

2025
[27]

Semantic 3D Reconstruction for Volumetric Modeling of Defects in Construction Sites

Dimitrios Katsatos, Paschalis Charalampous, Patrick Schmidt, Ioannis Kostavelis, Dimitrios Giakoumis, Lazaros Nalpantidis, and Dimitrios Tzovaras. Semantic 3D Reconstruction for Volumetric Modeling of Defects in Construction Sites. Robotics, 13(7), 2024

2024
[28]

Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, and Konrad Schindler. Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

2024
[29]

Depth Completion as Parameter-Efficient Test-Time Adaptation

Bingxin Ke, Qunjie Zhou, Jiahui Huang, Xuanchi Ren, Tianchang Shen, Konrad Schindler, Laura Leal-Taixé, and Shengyu Huang. Depth Completion as Parameter-Efficient Test-Time Adaptation, 2026

2026
[30]

SemanticBridge—A dataset for 3D semantic segmentation of bridges and domain gap analysis

Maximilian Kellner, Mariana Ferrandon Cervantes, Yuandong Pan, Ruodan Lu, Ioannis Brilakis, and Alexander Reiterer. SemanticBridge—A dataset for 3D semantic segmentation of bridges and domain gap analysis. Developments in the Built Environment, 26:100912, 2026

2026
[31]

End-To-End Learning of Geometry and Context for Deep Stereo Regression

Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, Ryan Kennedy, Abraham Bachrach, and Adam Bry. End-To-End Learning of Geometry and Context for Deep Stereo Regression. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017

2017
[32]

Evaluation of CNN-based Single-Image Depth Estimation Methods

Tobias Koch, Lukas Liebel, Friedrich Fraundorfer, and Marco Korner. Evaluation of CNN-based Single-Image Depth Estimation Methods. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pages 0–0, 2018

2018
[33]

I Kostavelis, L Nalpantidis, R Detry, H Bruyninckx, A Billard, S Christian, M Bosch, K Andronikidis, H Lund-Nielsen, P Yosefipor, U Wajid, R Tomar, FL Martínez, F Fugaroli, D Papargyriou, N Mehandjiev, G Bhullar, E Gonçalves, J Bentzen, M Essenbæk, C Cremona, M Wong, M Sanchez, D Giakoumis, and D Tzovaras. RoBétArmé Project: Human-robot collaborativ...

2024
[34]

ConPR: Ongoing Construction Site Dataset for Place Recognition

Dongjae Lee, Minwoo Jung, and Ayoung Kim. ConPR: Ongoing Construction Site Dataset for Place Recognition, 2024

2024
[35]

Distilling Monocular Foundation Model for Fine-grained Depth Completion

Yingping Liang, Yutao Hu, Wenqi Shao, and Ying Fu. Distilling Monocular Foundation Model for Fine-grained Depth Completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22254–22265, June 2025

2025
[36]

Any-stereo: Arbitrary scale disparity estimation for iterative stereo matching

Zhaohuai Liang and Changhe Li. Any-stereo: Arbitrary scale disparity estimation for iterative stereo matching. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 3333–3341, 2024

2024
[37]

Depth Anything 3: Recovering the Visual Space from Any Views

Haotong Lin, Sili Chen, Junhao Liew, Donny Y. Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth Anything 3: Recovering the Visual Space from Any Views, 2025

2025
[38]

Depth Anything 3: Recovering the Visual Space from Any Views

Haotong Lin, Sili Chen, Jun Hao Liew, Donny Y. Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth Anything 3: Recovering the visual space from any views. arXiv preprint arXiv:2511.10647, 2025

2025
[39]

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, and Bingyi Kang. Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17070–17080, June 2025

2025
[40]

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

Lahav Lipson, Zachary Teed, and Jia Deng. RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching. In 2021 International conference on 3D vision (3DV), pages 218–227. IEEE, 2021

2021
[41]

A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation

Nikolaus Mayer, Eddy Ilg, Philip Hausser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

2016
[42]

Review of stereo vision algorithms: from software to hardware

Lazaros Nalpantidis, Georgios Ch. Sirakoulis, and Antonios Gasteratos. Review of stereo vision algorithms: from software to hardware. International Journal of Optomechatronics, 2(4):435–462, 2008

2008
[43]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Pa...

2024
[44]

SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation

Duc-Hai Pham, Tung Do, Phong Nguyen, Binh-Son Hua, Khoi Nguyen, and Rang Nguyen. SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 17060–17069, 2025

2025
[45]

UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler

Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mattia Segu, Siyuan Li, Wim Abbeloos, and Luc Van Gool. UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler. IEEE Transactions on Pattern Analysis and Machine Intelligence, 48(3):2354–2367, 2026

2026
[46]

UniDepth: Universal Monocular Metric Depth Estimation

Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, and Fisher Yu. UniDepth: Universal Monocular Metric Depth Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10106–10116, June 2024

2024
[47]

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer

René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1623–1637, 2022

2022
[48]

A Virtual Reality-Based Learning Environment for Human—Robot Collaboration Training in Construction 4.0

Khadija Sabiri, Luís Afonso, Caio Camargo, Estefânia Gonçalves, and Rui Fernandes. A Virtual Reality-Based Learning Environment for Human—Robot Collaboration Training in Construction 4.0. In 2025 International Conference on Robotic Computing and Communication (RoboticCC), pages 54–61, 2025

2025
[49]

Towards autonomous shotcrete construction: semantic 3D reconstruction for concrete deposition using stereo vision and deep learning

Patrick Schmidt, Dimitrios Katsatos, Dimitrios Alexiou, Ioannis Kostavelis, Dimitrios Giakoumis, Dimitrios Tzovaras, and Lazaros Nalpantidis. Towards autonomous shotcrete construction: semantic 3D reconstruction for concrete deposition using stereo vision and deep learning. In Proceedings of the 41st International Symposium on Automation and Robotics in Co...

2024
[50]

Segmentation dataset for reinforced concrete construction

Patrick Schmidt and Lazaros Nalpantidis. Segmentation dataset for reinforced concrete construction. Automation in Construction, 171:105990, 2025

2025
[51]

Efficient Test-Time Optimization for Depth Completion via Low-Rank Decoder Adaptation

Minseok Seo, Wonjun Lee, Jaehyuk Jang, and Changick Kim. Efficient Test-Time Optimization for Depth Completion via Low-Rank Decoder Adaptation, 2026

2026
[52]

CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching

Zhelun Shen, Yuchao Dai, and Zhibo Rao. CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13906–13915, June 2021

2021
[53] Zhelun Shen, Yuchao Dai, Xibin Song, Zhibo Rao, Dingfu Zhou, and Liangjun Zhang. PCW-Net: Pyramid Combination and Warping Cost Volume for Stereo Matching. In Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, and Tal Hassner, editors, Computer Vision – ECCV 2022, pages 280–297, Cham, 2022. Springer Nature Switzerland.

[54] Andreas Sjölander, Erik Nordström, and Anders Ansell. Dataset for evaluation and numerical modelling of structural performance of fibre-reinforced shotcrete with fibres of steel, synthetic and basalt. Data in Brief, 61:111684, 2025.

[55] Maciej Trzeciak, Kacper Pluta, Yasmin Fathy, Lucio Alcalde, Stanley Chee, Antony Bromley, Ioannis Brilakis, and Pierre Alliez. ConSLAM: Periodically Collected Real-World Construction Dataset for SLAM and Progress Monitoring. In Leonid Karlinsky, Tomer Michaeli, and Ko Nishino, editors, Computer Vision – ECCV 2022 Workshops, pages 317–331, Cham, 2023. Spr...

[56] Massimiliano Viola, Kevin Qu, Nando Metzger, Bingxin Ke, Alexander Becker, Konrad Schindler, and Anton Obukhov. Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5359–5370, October 2025.

[57] Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, and Jiaolong Yang. MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5261–5271, June 2025.

[58] Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, and Jiaolong Yang. MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

[59] Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, Gabriela Csurka, Leonid Antsfeld, Boris Chidlovskii, and Jérôme Revaud. CroCo v2: Improved cross-view completion pre-training for stereo matching and optical flow. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 17969–17980, 2023.

[60] Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, and Stan Birchfield. FoundationStereo: Zero-Shot Stereo Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5249–5260, June 2025.

[61] Chao Xie and Aladdin Alwisy. Advancing robotic automation in wood-framed construction using vision-driven adaptive control. Automation in Construction, 185:106858, 2026.

[62] Gangwei Xu, Haotong Lin, Hongcheng Luo, Xianqi Wang, Jingfeng Yao, Lianghui Zhu, Yuechuan Pu, Cheng Chi, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Sida Peng, and Xin Yang. Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers, 2025.

[63] Gangwei Xu, Xianqi Wang, Xiaohuan Ding, and Xin Yang. Iterative geometry encoding volume for stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21919–21928, 2023.

[64] Haofei Xu and Juyong Zhang. AANet: Adaptive Aggregation Network for Efficient Stereo Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.

[65] Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10371–10381, June 2024.

[66] Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything V2. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 21875–21911. Curran Associates, Inc., 2024.

[67] Mohammad Reza Yazdi Samadi, Ralf Waspe, and Christian Schlette. Physics-based particle system modeling of shotcrete process for robotic placement. Construction Robotics, 9(2):30, 2025.

[68] Mohammad Reza Yazdi Samadi, Rui Wu, Soheil Gholami, Ralf Waspe, Ali Muhammad, Aude Billard, and Christian Schlette. From Human to Height-Field: Predictive Shotcrete Simulation with a Physics-Informed Particle System. In 2025 IEEE International Conference on Advanced Robotics (ICAR), pages 826–832, 2025.

[69] Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, and Chunhua Shen. Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9043–9053, October 2023.

[70] Xiang Zhang, Bingxin Ke, Hayko Riemenschneider, Nando Metzger, Anton Obukhov, Markus Gross, Konrad Schindler, and Christopher Schroers. BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Process..., 2024.

[71] Yiming Zuo and Jia Deng. OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations. In Computer Vision – ECCV 2024, pages 78–95, Cham, 2025. Springer Nature Switzerland.

[72] Yiming Zuo, Willow Yang, Zeyu Ma, and Jia Deng. OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9287–9297, October 2025.
