
arxiv: 2606.23152 · v1 · pith:4ZL4MUFUnew · submitted 2026-06-22 · 💻 cs.RO

ShotcreteDepth: A Bi-modal Dataset for Robust Robotic Depth Perception in Shotcrete Construction Environments

Pith reviewed 2026-06-26 08:31 UTC · model grok-4.3

classification 💻 cs.RO
keywords ShotcreteDepth · dataset · depth estimation · stereo matching · LiDAR · construction robotics · depth completion · shotcrete

The pith

ShotcreteDepth provides a bi-modal dataset of synchronized stereo RGB and LiDAR data from active shotcrete construction under high turbidity and poor illumination.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ShotcreteDepth to supply realistic sensor data for testing robotic depth perception systems in construction settings. It collects 11,252 temporally synchronized stereo images and LiDAR point clouds from both active shotcreting processes and general sites, where turbidity and low light create noisy, incomplete measurements. The release includes a lightweight annotation tool for LiDAR and 220 annotated samples to support evaluation. This setup allows work on stereo matching, depth completion, and depth estimation that matches industrial operating conditions.

Core claim

The paper establishes the ShotcreteDepth dataset as a collection of 11,252 temporally synchronized stereo RGB images and LiDAR point clouds acquired in real-world shotcrete construction environments that feature high turbidity and poor illumination, accompanied by a lightweight annotation tool for LiDAR point clouds and 220 annotated samples for evaluation in stereo matching, depth completion, and depth estimation tasks.

What carries the argument

The ShotcreteDepth bi-modal dataset of stereo RGB imagery paired with LiDAR point clouds collected under harsh construction conditions.

If this is right

  • Stereo matching algorithms can be evaluated on imagery degraded by construction turbidity.
  • Depth completion techniques gain a testbed for recovering structure from sparse, noisy LiDAR returns in low light.
  • Depth estimation research obtains examples that reflect the incomplete observations typical of industrial robotics.
  • Autonomous construction systems can be trained and validated against data that includes both active spraying and static site conditions.
  • The annotation tool enables rapid expansion of labeled point clouds for further experiments.
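
Evaluating stereo output against LiDAR depth, as the first bullets imply, usually requires converting disparity to metric depth with the standard rectified-stereo relation depth = f · B / d. The sketch below illustrates that conversion only; the function name, the focal length, and the baseline are illustrative assumptions, not the dataset's calibration.

```python
import math


def disparity_to_depth(disparity_px, focal_px, baseline_m, min_disparity=1e-6):
    """Convert disparities (pixels) to depth (metres); invalid values become NaN."""
    depths = []
    for d in disparity_px:
        if d is None or math.isnan(d) or d <= min_disparity:
            # Zero or missing disparity carries no depth information.
            depths.append(math.nan)
        else:
            depths.append(focal_px * baseline_m / d)
    return depths


# Placeholder calibration values, chosen only for illustration.
depths = disparity_to_depth([32.0, 0.0, 8.0], focal_px=1000.0, baseline_m=0.16)
print(depths)
```

Marking invalid disparities as NaN rather than zero keeps them from being mistaken for real measurements in later error computations.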

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same data collection approach could be replicated for other dusty or low-visibility industrial tasks such as mining or tunneling.
  • Models pretrained on general outdoor datasets may require fine-tuning on this data to handle domain-specific noise patterns.
  • The dataset highlights sensor-fusion needs that could drive new hardware designs for construction robots.
  • Release of the annotation tool may accelerate labeling efforts in other point-cloud-heavy robotics domains.

Load-bearing premise

The collected stereo RGB imagery and LiDAR point clouds are temporally synchronized and accurately represent the high turbidity and poor illumination of active construction environments.
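
One way to probe this premise, assuming each sample carries a camera timestamp and a LiDAR timestamp, is to pair every camera frame with its nearest LiDAR scan and summarize the residual offsets. The page does not describe the dataset's file layout or synchronization method, so the function names and timestamps below are hypothetical.

```python
from bisect import bisect_left
from statistics import mean


def nearest_offsets(camera_ts, lidar_ts):
    """Signed offset (seconds) from each camera stamp to its nearest LiDAR stamp."""
    if not lidar_ts:
        raise ValueError("need at least one LiDAR timestamp")
    lidar_sorted = sorted(lidar_ts)
    offsets = []
    for t in camera_ts:
        i = bisect_left(lidar_sorted, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = lidar_sorted[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda s: abs(s - t))
        offsets.append(nearest - t)
    return offsets


def summarize(offsets):
    """Mean and maximum absolute offset in seconds."""
    magnitudes = [abs(o) for o in offsets]
    return {"mean_abs_s": mean(magnitudes), "max_abs_s": max(magnitudes)}


# Hypothetical timestamps in seconds, for illustration only.
camera_ts = [0.00, 0.10, 0.20]
lidar_ts = [0.01, 0.098, 0.23]
stats = summarize(nearest_offsets(camera_ts, lidar_ts))
print(stats)
```

Statistics of this kind only bound the pairing error between recorded timestamps; they say nothing about clock drift between the sensors themselves.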

What would settle it

If depth estimation or completion models that perform well on the 220 annotated samples do not perform comparably on independent recordings from the same shotcrete sites, the dataset's claimed representativeness would be refuted.
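
A comparison of this kind needs an error measure restricted to trustworthy LiDAR points, since the Figure 4 caption notes that dust returns are excluded from evaluation data. The following is a minimal sketch of masked MAE and RMSE, assuming per-point predicted depth, reference depth, and a boolean "kept" label; the function name and values are hypothetical and this is not the paper's evaluation code.

```python
import math


def masked_depth_errors(pred, target, keep):
    """Return (MAE, RMSE) in metres over points flagged as kept and finite."""
    errors = [
        p - t
        for p, t, k in zip(pred, target, keep)
        if k and not math.isnan(p) and not math.isnan(t)
    ]
    if not errors:
        raise ValueError("no valid points to evaluate")
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Illustrative values: the third point is labelled as dust and is ignored.
pred = [2.0, 4.0, 9.0]
target = [1.0, 4.0, 3.0]
keep = [True, True, False]
mae, rmse = masked_depth_errors(pred, target, keep)
print(mae, rmse)
```

Running the same function on the annotated samples and on an independent recording would give directly comparable numbers for the test described above.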

Figures

Figures reproduced from arXiv: 2606.23152 by Jakub Gregorek, Jonas Flink Bentzen, Lars Arnold Dethlefsen, Lazaros Nalpantidis, Mads Essenbæk, Patrick Schmidt.

Figure 1: Data samples from our ShotcreteDepth dataset. RGB images originate from the Roboception rc_visard 160c camera, the disparity and confidence are computed internally by the camera, and the point cloud from the Velodyne LiDAR is projected to the image plane. Red denotes the smallest and blue the largest values.
Figure 3: Our 3D printed dust-proof sensor housing containing …
Figure 4: The Annotation Tool we are releasing with the dataset. Upper image: 3D view of the LiDAR point cloud. Lower image: point cloud overlaid on top of the left camera image. The green points are “kept” while the purple color denotes the dust cloud which is to be excluded from evaluation data.
Adjacent body text (fragment): … point contained in the sliding window by a given threshold are removed. Considering the lower amount of LiDAR scan line…
Figure 5: Comparing stereo matching methods. From top to bottom: left RGB image, disparity maps computed by rc_visard stereo matching, RAFT-Stereo [40], FoundationStereo [60] and Stereo Anywhere [4].
Table III: Evaluation of three depth completion methods: Marigold-DC [56] for a single run (1st value) and ensemble of 10 runs (2nd value), Marigold-SSD [17] and VPP4DC [3]. …
Figure 6: Qualitative results for depth completion methods. (a) RGB (b) Marigold-E2E (c) Depth Anything v3 (d) MoGe-2
Figure 7: Qualitative results for depth estimation methods.
Adjacent body text (V. Discussion and Conclusion, fragment): Our experiments in Sec. IV have shown that depth perception is indeed possible in the challenging conditions of shotcreting environments. Each one of the tested approaches (3 stereo matching approaches, 3 depth completion methods and 3 depth estimation methods) has exhibited merits in the corresponding task. While computation…
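
The Figure 1 caption mentions LiDAR points projected to the image plane. A minimal pinhole-projection sketch of that step follows, assuming the points have already been transformed into the camera frame (x right, y down, z forward) and that lens distortion is negligible; the intrinsics and function name are placeholders, not the dataset's calibration.

```python
def project_points(points_cam, fx, fy, cx, cy, width, height):
    """Project 3D camera-frame points (metres) to (u, v, depth) pixel hits."""
    hits = []
    for x, y, z in points_cam:
        if z <= 0:
            # Points behind the camera cannot appear in the image.
            continue
        u = fx * x / z + cx
        v = fy * y / z + cy
        if 0 <= u < width and 0 <= v < height:
            hits.append((u, v, z))
    return hits


# Hypothetical intrinsics and points, for illustration only.
points = [(0.0, 0.0, 5.0), (1.0, 0.0, 10.0), (0.0, 0.0, -1.0), (100.0, 0.0, 1.0)]
hits = project_points(points, fx=1000.0, fy=1000.0, cx=640.0, cy=480.0,
                      width=1280, height=960)
print(hits)
```

Each hit keeps its depth, so the result can be rasterized into a sparse depth map for comparison with stereo or monocular predictions.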
Abstract

We introduce ShotcreteDepth, a bi-modal dataset from the construction domain that captures both an active shotcreting process and general construction environments. The dataset comprises stereo RGB imagery and LiDAR point clouds acquired under harsh real-world conditions, including high turbidity and poor illumination. Such conditions adversely affect sensor measurements, leading to incomplete and noisy observations that pose significant challenges for perception systems in autonomous applications. Alongside the dataset, we release a lightweight annotation tool designed for time-efficient labeling of LiDAR point clouds. ShotcreteDepth consists of 11,252 temporally synchronized data samples, of which 220 are annotated for evaluation purposes. The dataset supports research in stereo matching, depth completion, and depth estimation under conditions that closely reflect the operational complexities found in industrial settings. Project repository: https://github.com/dtu-pas/shotcrete-depth

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces ShotcreteDepth, a bi-modal dataset comprising 11,252 temporally synchronized stereo RGB images and LiDAR point clouds captured in shotcrete construction environments (including active shotcreting) under harsh conditions of high turbidity and poor illumination, along with 220 annotated samples and a lightweight LiDAR annotation tool. The dataset is positioned to support research in stereo matching, depth completion, and depth estimation for industrial robotic applications.

Significance. Release of synchronized multi-modal sensor data from a real industrial construction domain, together with an annotation tool, would address a gap in publicly available datasets for perception under challenging conditions; if the synchronization and environmental fidelity claims are substantiated, the resource could enable targeted algorithm development and benchmarking for autonomous systems in similar settings.

major comments (1)
  1. [Abstract] Abstract: The central claims that the 11,252 samples are 'temporally synchronized' and were 'acquired under harsh real-world conditions, including high turbidity and poor illumination' are load-bearing for all stated use cases (stereo matching, depth completion, depth estimation), yet the manuscript provides no description of the synchronization hardware or protocol, no measured time-offset statistics, and no quantitative environmental metrics (turbidity, lux, or equivalent) to confirm the conditions across the collection.

Simulated Author's Rebuttal

1 response · 2 unresolved

We thank the referee for the constructive feedback on our manuscript. The recommendation for major revision is noted, and we address the concerns regarding substantiation of the synchronization and environmental condition claims below. We will revise the manuscript to strengthen these aspects where feasible.

point-by-point responses
  1. Referee: [Abstract] Abstract: The central claims that the 11,252 samples are 'temporally synchronized' and were 'acquired under harsh real-world conditions, including high turbidity and poor illumination' are load-bearing for all stated use cases (stereo matching, depth completion, depth estimation), yet the manuscript provides no description of the synchronization hardware or protocol, no measured time-offset statistics, and no quantitative environmental metrics (turbidity, lux, or equivalent) to confirm the conditions across the collection.

    Authors: We agree that additional details are needed to support these claims. In the revised manuscript, we will expand the methods section to describe the synchronization hardware (including the specific trigger mechanism and cabling) and the protocol employed to achieve temporal alignment between the stereo RGB cameras and LiDAR sensor. We will also provide any available supporting information on the collection setup. However, time-offset statistics were not measured during acquisition, and quantitative environmental metrics such as turbidity or lux values were not recorded. We will explicitly note these limitations and enhance the qualitative description of the harsh conditions based on the operational context of active shotcreting. revision: partial

standing simulated objections not resolved
  • Provision of measured time-offset statistics for synchronization, as these data were not collected during the original dataset acquisition.
  • Provision of quantitative environmental metrics (turbidity, lux, or equivalent), as these were not recorded during data collection.

Circularity Check

0 steps flagged

No circularity: dataset release paper with no derivations

full rationale

This manuscript introduces and describes a new bi-modal dataset (ShotcreteDepth) consisting of synchronized stereo RGB and LiDAR samples collected in construction environments. It contains no equations, no fitted parameters, no predictions derived from models, and no derivation chain of any kind. All content is descriptive reporting of data acquisition, annotation, and intended uses for downstream tasks such as stereo matching. No self-citations, ansatzes, or uniqueness claims appear that could create circularity. The central claims rest on the existence and properties of the released data itself rather than any reduction to prior inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Dataset release paper with no mathematical modeling; relies only on standard assumptions about sensor data acquisition and synchronization.

axioms (1)
  • domain assumption: Stereo RGB imagery and LiDAR point clouds can be temporally synchronized under the described harsh construction conditions.
    Invoked implicitly when stating that the 11,252 samples are temporally synchronized.



Reference graph

Works this paper leans on

72 extracted references · 2 canonical work pages · 2 internal anchors

  1. [1] Codruta O. Ancuti, Cosmin Ancuti, Mateu Sbert, and Radu Timofte. Dense-Haze: A Benchmark for Image Dehazing with Dense-Haze and Haze-Free Images. In 2019 IEEE International Conference on Image Processing (ICIP), pages 1014–1018, 2019

  2. [2] Sotirios Barlakas, Dimitrios Alexiou, Kosmas Tsiakas, Dimitrios Katsatos, Ioannis Kostavelis, Dimitrios Giakoumis, Antonios Gasteratos, and Dimitrios Tzovaras. Robot active vision-based path planning for localization improvement in indoor environments. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10686–10693. ...

  3. [3] Luca Bartolomei, Matteo Poggi, Andrea Conti, Fabio Tosi, and Stefano Mattoccia. Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization. In 2024 International Conference on 3D Vision (3DV), pages 1360–1370, Los Alamitos, CA, USA, Mar. 2024. IEEE Computer Society

  4. [4] Luca Bartolomei, Fabio Tosi, Matteo Poggi, and Stefano Mattoccia. Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail. pages 1013–1027, 2025

  5. [5] Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias Müller. ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth, 2023

  6. [6] Mario Bijelic, Tobias Gruber, Fahim Mannan, Florian Kraus, Werner Ritter, Klaus Dietmayer, and Felix Heide. Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

  7. [7] Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R Richter, and Vladlen Koltun. Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. arXiv preprint arXiv:2410.02073, 2024

  8. [8] Ziyang Chen, Wei Long, He Yao, Yongjun Zhang, Bingshu Wang, Yongbin Qin, and Jia Wu. MoCha-Stereo: Motif Channel Attention Network for Stereo Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27768–27777, June 2024

  9. [9] Andrea Conti, Matteo Poggi, Filippo Aleotti, and Stefano Mattoccia. Unsupervised confidence for LiDAR depth maps and applications. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8352–8359, 2022

  10. [10] Li-Zhuang Cui, Jian Liu, Hongzheng Luo, Jianhong Wang, Xiao Zhang, Gaohang Lv, and Quanyi Xie. Deformation measurement of tunnel shotcrete liner using the multiepoch LiDAR point clouds. Journal of Construction Engineering and Management, 150(6):04024049, 2024

  11. [11] Yuexiong Ding and Xiaowei Luo. A virtual construction vehicles and workers dataset with three-dimensional annotations. Engineering Applications of Artificial Intelligence, 133:107964, 2024

  12. [12] Yurui Du, Louis Hanut, Herman Bruyninckx, and Renaud Detry. AREPO: Uncertainty-Aware Robot Ensemble Learning Under Extreme Partial Observability. IEEE Robotics and Automation Letters, 10(6):5737–5744, 2025

  13. [13] Kangkang Duan, Zehao Zhu, and Zhengbo Zou. Indoor FireRescue Radar: 4D Indoor Millimeter Wave Dataset and Analysis for Hazardous Environment Perception. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 18620–18627, 2025

  14. [14] Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, and Bastian Leibe. Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think. In Proceedings of the Winter Conference on Applications of Computer Vision (WACV), pages 753–762, February 2025

  15. [15] Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets Robotics: The KITTI Dataset. International Journal of Robotics Research (IJRR), 2013

  16. [16] Jakub Gregorek and Lazaros Nalpantidis. SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 13304–13311, 2025

  17. [17] Jakub Gregorek, Paraskevas Pegios, Nando Metzger, Konrad Schindler, Theodora Kontogianni, and Lazaros Nalpantidis. Need for Speed: Zero-Shot Depth Completion with Single-Step Diffusion, 2026

  18. [18] Ming Gui, Johannes Schusterbauer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, and Björn Ommer. DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 3203–3211, 2025

  19. [19] Louis Hanut, Yurui Du, Andrew Vande Moere, Renaud Detry, and Herman Bruyninckx. Robotic Framework for Iterative and Adaptive Profile Grading of Sand. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 10387–10393, 2025

  20. [20] Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu, and Ying-Cong Chen. Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction. In The Thirteenth International Conference on Learning Representations, 2025

  21. [21] H. Hirschmuller. Accurate and efficient stereo processing by semi-global matching and mutual information. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 807–814 vol. 2, 2005

  22. [22] Shangfeng Huang, Ruisheng Wang, and Xin Wang. BuildingWorld: A Structured 3D Building Dataset for Urban Foundation Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7):5085–5094, Mar. 2026

  23. [23] Lee Hyoseok, Kyeong Seon Kim, Kwon Byung-Ki, and Tae-Hyun Oh. Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior. Proceedings of the AAAI Conference on Artificial Intelligence, 39(4):3877–3885, Apr. 2025

  24. [24] Muhammad Ibrahim, Naveed Akhtar, Michael Wise, and Ajmal Mian. Annotation Tool and Urban Dataset for 3D Point Cloud Semantic Segmentation. IEEE Access, 9:35984–35996, 2021

  25. [25] Chanhwi Jeong, Inhwan Bae, Jin-Hwi Park, and Hae-Gon Jeon. Test-Time Prompt Tuning for Zero-Shot Depth Completion. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9443–9454, October 2025

  26. [26] Hualie Jiang, Zhiqiang Lou, Laiyan Ding, Rui Xu, Minglang Tan, Wenjie Jiang, and Rui Huang. DEFOM-Stereo: Depth Foundation Model Based Stereo Matching. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  27. [27] Dimitrios Katsatos, Paschalis Charalampous, Patrick Schmidt, Ioannis Kostavelis, Dimitrios Giakoumis, Lazaros Nalpantidis, and Dimitrios Tzovaras. Semantic 3D Reconstruction for Volumetric Modeling of Defects in Construction Sites. Robotics, 13(7), 2024

  28. [28] Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, and Konrad Schindler. Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  29. [29] Bingxin Ke, Qunjie Zhou, Jiahui Huang, Xuanchi Ren, Tianchang Shen, Konrad Schindler, Laura Leal-Taixé, and Shengyu Huang. Depth Completion as Parameter-Efficient Test-Time Adaptation, 2026

  30. [30] Maximilian Kellner, Mariana Ferrandon Cervantes, Yuandong Pan, Ruodan Lu, Ioannis Brilakis, and Alexander Reiterer. SemanticBridge - A dataset for 3D semantic segmentation of bridges and domain gap analysis. Developments in the Built Environment, 26:100912, 2026

  31. [31] Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, Ryan Kennedy, Abraham Bachrach, and Adam Bry. End-To-End Learning of Geometry and Context for Deep Stereo Regression. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017

  32. [32] Tobias Koch, Lukas Liebel, Friedrich Fraundorfer, and Marco Korner. Evaluation of CNN-based Single-Image Depth Estimation Methods. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pages 0–0, 2018

  33. [33] I Kostavelis, L Nalpantidis, R Detry, H Bruyninckx, A Billard, S Christian, M Bosch, K Andronikidis, H Lund-Nielsen, P Yosefipor, U Wajid, R Tomar, FL Martínez, F Fugaroli, D Papargyriou, N Mehandjiev, G Bhullar, E Gonçalves, J Bentzen, M Essenbæk, C Cremona, M Wong, M Sanchez, D Giakoumis, and D Tzovaras. RoBétArmé Project: Human-robot collaborativ...

  34. [34] Dongjae Lee, Minwoo Jung, and Ayoung Kim. ConPR: Ongoing Construction Site Dataset for Place Recognition, 2024

  35. [35] Yingping Liang, Yutao Hu, Wenqi Shao, and Ying Fu. Distilling Monocular Foundation Model for Fine-grained Depth Completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22254–22265, June 2025

  36. [36] Zhaohuai Liang and Changhe Li. Any-stereo: Arbitrary scale disparity estimation for iterative stereo matching. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 3333–3341, 2024

  37. [37] Haotong Lin, Sili Chen, Junhao Liew, Donny Y. Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth Anything 3: Recovering the Visual Space from Any Views, 2025

  38. [38] Haotong Lin, Sili Chen, Jun Hao Liew, Donny Y. Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth Anything 3: Recovering the visual space from any views. arXiv preprint arXiv:2511.10647, 2025

  39. [39] Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, and Bingyi Kang. Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17070–17080, June 2025

  40. [40] Lahav Lipson, Zachary Teed, and Jia Deng. RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching. In 2021 International conference on 3D vision (3DV), pages 218–227. IEEE, 2021

  41. [41] Nikolaus Mayer, Eddy Ilg, Philip Hausser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

  42. [42] Lazaros Nalpantidis, Georgios Ch. Sirakoulis, and Antonios Gasteratos. Review of stereo vision algorithms: from software to hardware. International Journal of Optomechatronics, 2(4):435–462, 2008

  43. [43] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Pa... DINOv2: Learning Robust Visual Features without Supervision, 2024

  44. [44] Duc-Hai Pham, Tung Do, Phong Nguyen, Binh-Son Hua, Khoi Nguyen, and Rang Nguyen. SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 17060–17069, 2025

  45. [45] Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mattia Segu, Siyuan Li, Wim Abbeloos, and Luc Van Gool. UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler. IEEE Transactions on Pattern Analysis and Machine Intelligence, 48(3):2354–2367, 2026

  46. [46] Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, and Fisher Yu. UniDepth: Universal Monocular Metric Depth Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10106–10116, June 2024

  47. [47] René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1623–1637, 2022

  48. [48] Khadija Sabiri, Luís Afonso, Caio Camargo, Estefânia Gonçalves, and Rui Fernandes. A Virtual Reality-Based Learning Environment for Human-Robot Collaboration Training in Construction 4.0. In 2025 International Conference on Robotic Computing and Communication (RoboticCC), pages 54–61, 2025

  49. [49] Patrick Schmidt, Dimitrios Katsatos, Dimitrios Alexiou, Ioannis Kostavelis, Dimitrios Giakoumis, Dimitrios Tzovaras, and Lazaros Nalpantidis. Towards autonomous shotcrete construction: semantic 3D reconstruction for concrete deposition using stereo vision and deep learning. In Proceedings of the 41st International Symposium on Automation and Robotics in Co...

  50. [50] Patrick Schmidt and Lazaros Nalpantidis. Segmentation dataset for reinforced concrete construction. Automation in Construction, 171:105990, 2025

  51. [51]

    Efficient Test-Time Optimization for Depth Completion via Low-Rank Decoder Adaptation, 2026

    Minseok Seo, Wonjun Lee, Jaehyuk Jang, and Changick Kim. Efficient Test-Time Optimization for Depth Completion via Low-Rank Decoder Adaptation, 2026

  52. [52]

    CFNet: Cascade and Fused Cost V olume for Robust Stereo Matching

    Zhelun Shen, Yuchao Dai, and Zhibo Rao. CFNet: Cascade and Fused Cost V olume for Robust Stereo Matching. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13906–13915, June 2021

  53. [53]

    PCW-Net: Pyramid Combination and Warping Cost V olume for Stereo Matching

    Zhelun Shen, Yuchao Dai, Xibin Song, Zhibo Rao, Dingfu Zhou, and Liangjun Zhang. PCW-Net: Pyramid Combination and Warping Cost V olume for Stereo Matching. In Shai Avidan, Gabriel Brostow, Moustapha Ciss ´e, Giovanni Maria Farinella, and Tal Hassner, editors, Computer Vision – ECCV 2022, pages 280–297, Cham, 2022. Springer Nature Switzerland

  54. [54]

    Dataset for evaluation and numerical modelling of structural performance of fibre- reinforced shotcrete with fibres of steel, synthetic and basalt.Data in Brief, 61:111684, 2025

    Andreas Sj ¨olander, Erik Nordstr ¨om, and Anders Ansell. Dataset for evaluation and numerical modelling of structural performance of fibre- reinforced shotcrete with fibres of steel, synthetic and basalt.Data in Brief, 61:111684, 2025

  55. [55]

    ConSLAM: Periodically Collected Real-World Construction Dataset for SLAM and Progress Monitoring

    Maciej Trzeciak, Kacper Pluta, Yasmin Fathy, Lucio Alcalde, Stan- ley Chee, Antony Bromley, Ioannis Brilakis, and Pierre Alliez. ConSLAM: Periodically Collected Real-World Construction Dataset for SLAM and Progress Monitoring. In Leonid Karlinsky, Tomer Michaeli, and Ko Nishino, editors,Computer Vision – ECCV 2022 Workshops, pages 317–331, Cham, 2023. Spr...

  56. [56]

    Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion

    Massimiliano Viola, Kevin Qu, Nando Metzger, Bingxin Ke, Alexander Becker, Konrad Schindler, and Anton Obukhov. Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5359–5370, October 2025

  57. [57]

    MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

    Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, and Jiaolong Yang. MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5261–5271, June 2025

  58. [58]

    MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details

    Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, and Jiaolong Yang. MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  59. [59]

    Croco v2: Improved cross-view completion pre-training for stereo matching and optical flow

    Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, Gabriela Csurka, Leonid Antsfeld, Boris Chidlovskii, and Jérôme Revaud. Croco v2: Improved cross-view completion pre-training for stereo matching and optical flow. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17969–17980, 2023

  60. [60]

    FoundationStereo: Zero-Shot Stereo Matching

    Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, and Stan Birchfield. FoundationStereo: Zero-Shot Stereo Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5249–5260, June 2025

  61. [61]

    Advancing robotic automation in wood-framed construction using vision-driven adaptive control. Automation in Construction, 185:106858, 2026

    Chao Xie and Aladdin Alwisy. Advancing robotic automation in wood-framed construction using vision-driven adaptive control. Automation in Construction, 185:106858, 2026

  62. [62]

    Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers, 2025

    Gangwei Xu, Haotong Lin, Hongcheng Luo, Xianqi Wang, Jingfeng Yao, Lianghui Zhu, Yuechuan Pu, Cheng Chi, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Sida Peng, and Xin Yang. Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers, 2025

  63. [63]

    Iterative geometry encoding volume for stereo matching

    Gangwei Xu, Xianqi Wang, Xiaohuan Ding, and Xin Yang. Iterative geometry encoding volume for stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21919–21928, 2023

  64. [64]

    AANet: Adaptive Aggregation Network for Efficient Stereo Matching

    Haofei Xu and Juyong Zhang. AANet: Adaptive Aggregation Network for Efficient Stereo Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

  65. [65]

    Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

    Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10371–10381, June 2024

  66. [66]

    Depth Anything V2

    Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything V2. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 21875–21911. Curran Associates, Inc., 2024

  67. [67]

    Physics-based particle system modeling of shotcrete process for robotic placement. Construction Robotics, 9(2):30, 2025

    Mohammad Reza Yazdi Samadi, Ralf Waspe, and Christian Schlette. Physics-based particle system modeling of shotcrete process for robotic placement. Construction Robotics, 9(2):30, 2025

  68. [68]

    From Human to Height-Field: Predictive Shotcrete Simulation with a Physics-Informed Particle System

    Mohammad Reza Yazdi Samadi, Rui Wu, Soheil Gholami, Ralf Waspe, Ali Muhammad, Aude Billard, and Christian Schlette. From Human to Height-Field: Predictive Shotcrete Simulation with a Physics-Informed Particle System. In 2025 IEEE International Conference on Advanced Robotics (ICAR), pages 826–832, 2025

  69. [69]

    Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

    Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, and Chunhua Shen. Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9043–9053, October 2023

  70. [70]

    BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

    Xiang Zhang, Bingxin Ke, Hayko Riemenschneider, Nando Metzger, Anton Obukhov, Markus Gross, Konrad Schindler, and Christopher Schroers. BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Process...

  71. [71]

    OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations

    Yiming Zuo and Jia Deng. OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations. In Computer Vision – ECCV 2024, pages 78–95, Cham, 2025. Springer Nature Switzerland

  72. [72]

    OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration

    Yiming Zuo, Willow Yang, Zeyu Ma, and Jia Deng. OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9287–9297, October 2025