AerialMetric: Benchmarking and Adapting UAV Monocular Metric Depth Estimation in the Real World
Pith reviewed 2026-06-30 06:43 UTC · model grok-4.3
The pith
AerialMetric supplies 68K image-depth pairs that let fine-tuned models close the domain gap and reach state-of-the-art metric depth accuracy from UAV viewpoints.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AerialMetric provides 52K real and 16K synthetic image-depth pairs with metric ground truth across four complementary subsets that jointly cover photogrammetry data, controlled aerial acquisition, photorealistic synthetic scenes, and in-the-wild imagery. Evaluation of existing models under aerial conditions reveals the size of the domain gap and the separate effects of viewpoint, altitude, and camera parameters. Fine-tuning representative models on the dataset establishes a comprehensive aerial benchmark and delivers state-of-the-art metric depth performance on diverse UAV imagery.
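The review does not say which training objective the authors use when fine-tuning. As a hedged sketch, assuming a common choice from the metric depth literature, the scale-invariant log (SILog) loss of Eigen et al. (2014) is shown below. The function name and the weight lam=0.85 are illustrative assumptions. With lam=1 the loss ignores any global scale offset, so only lam<1 keeps pressure on absolute metric scale, which is the property aerial adaptation needs.

```python
import numpy as np

def silog_loss(pred, gt, lam=0.85, eps=1e-6):
    """Scale-invariant log loss over pixels with valid (positive) ground truth.

    lam=1.0 makes the loss blind to a global scale factor; lam<1 keeps a
    fraction of the squared mean log error, so a mis-scaled prediction is penalized.
    """
    mask = gt > 0                      # only pixels with metric ground truth
    d = np.log(pred[mask] + eps) - np.log(gt[mask] + eps)
    value = np.mean(d ** 2) - lam * np.mean(d) ** 2
    return float(np.sqrt(max(value, 0.0)))  # clamp tiny negative rounding errors
```

In a real training loop the same expression would be written with the framework's tensor operations; NumPy is used here only to keep the sketch self-contained.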
What carries the argument
The AerialMetric dataset of 68K image-depth pairs with reliable metric ground truth collected under UAV viewpoints.
If this is right
- Existing models exhibit clear performance drops when applied to aerial viewpoints.
- Viewpoint, altitude, and camera parameters each measurably affect metric depth accuracy.
- Fine-tuning on AerialMetric creates a new public benchmark for aerial monocular metric depth.
- Adapted models reach state-of-the-art results across the four aerial subsets.
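Claims such as "clear performance drops" and "state-of-the-art results" rest on per-pixel error metrics. The review does not list which ones the paper reports; the sketch below assumes three that are standard in metric depth evaluation: AbsRel, RMSE in meters, and the δ<1.25 accuracy. The function name and the optional max_depth cap are hypothetical.

```python
import numpy as np

def depth_metrics(pred, gt, max_depth=None):
    """AbsRel, RMSE (m) and delta<1.25 accuracy over pixels with valid ground truth."""
    mask = gt > 0
    if max_depth is not None:
        mask &= gt <= max_depth        # optionally ignore very distant pixels
    p = np.maximum(pred[mask], 1e-6)   # guard against zero predictions in the ratio
    g = gt[mask]
    ratio = np.maximum(p / g, g / p)
    return {
        "abs_rel": float(np.mean(np.abs(p - g) / g)),
        "rmse": float(np.sqrt(np.mean((p - g) ** 2))),
        "delta1": float(np.mean(ratio < 1.25)),
    }
```

Running the same function on zero-shot and fine-tuned predictions for each subset gives the before/after comparison that the domain-gap and state-of-the-art claims depend on.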
Where Pith is reading between the lines
- UAV navigation and mapping systems could adopt the fine-tuned models for more reliable obstacle avoidance and terrain reconstruction.
- The same real-plus-synthetic collection strategy may transfer to other robotics settings that face large viewpoint shifts.
- Public release of the pairs, code, and weights could accelerate metric depth work for satellite or underwater imagery.
- Testing the adapted models on live UAV video with changing lighting or motion would reveal whether the benchmark gains survive dynamic flight.
Load-bearing premise
The image-depth pairs supply accurate and representative metric ground truth for real UAV operating conditions.
What would settle it
Independent real UAV flights with LiDAR-verified depths showing that fine-tuned models produce no accuracy gain over untuned baselines would falsify the adaptation claim.
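The falsification test above needs a decision rule. A hedged sketch, assuming per-scene errors (for example AbsRel against LiDAR depth) are available for an untuned baseline and the fine-tuned model on the same independent flights, is a paired percentile bootstrap on the per-scene error reduction. If the interval includes zero or lies below it, the flights provide no evidence of an adaptation gain. The function name, resample count, and seed are illustrative choices, not the paper's protocol.

```python
import numpy as np

def paired_bootstrap_gain(err_base, err_tuned, n_boot=10_000, alpha=0.05, seed=0):
    """Mean per-scene error reduction (base - tuned) with a percentile bootstrap CI."""
    diff = np.asarray(err_base, float) - np.asarray(err_tuned, float)
    rng = np.random.default_rng(seed)
    # Resample scenes with replacement, keeping each baseline/tuned pair together.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), float(lo), float(hi)
```

Resampling whole scenes rather than pixels respects the fact that pixels within one flight are strongly correlated.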
Original abstract
This paper addresses the problem of monocular metric depth estimation in aerial UAV imagery. Although recent data-driven methods have achieved remarkable progress in ground-level scenarios, models trained primarily on street-view and indoor datasets exhibit significant domain gaps when applied to aerial viewpoints. To tackle these challenges, we introduce AerialMetric, a benchmark dataset designed to evaluate and facilitate the adaptation of monocular metric depth estimation under UAV aerial viewpoints. The dataset consists of four complementary subsets collected from different sources, jointly covering real-world photogrammetry data, controlled aerial acquisition settings, photorealistic synthetic scenes, and in-the-wild Internet imagery. Totally, AerialMetric provides 52K real-world and 16K synthetic image-depth pairs with reliable metric ground truth. Based on this dataset, we conduct systematic evaluations of existing state-of-the-art models under aerial settings and investigate the impact of viewpoint, altitude, and camera parameters on metric depth prediction. In addition, by fine-tuning representative metric depth model on our dataset, we establish a comprehensive aerial benchmark and achieve state-of-the-art performance across diverse aerial imagery. Our dataset, code, and model weight are publicly available at https://kuieless.github.io/AerialMetric-ECCV2026-page/.
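The abstract names viewpoint, altitude, and camera parameters as factors in metric depth prediction. A simple flat-ground pinhole model (an illustrative assumption, not the paper's analysis) shows why each one matters. For a camera at altitude h, pitched down by θ from horizontal, with focal length f in pixels and principal-point row c_y, the ground seen at image row v lies at optical-axis depth Z(v) = h / (sin θ + ((v - c_y)/f) cos θ). Depth scales linearly with altitude, the pitch sets the range between nadir views (Z = h everywhere) and near-horizontal views (depth diverging toward the horizon), and the focal length controls how quickly depth changes across rows. The helper below is hypothetical.

```python
import numpy as np

def ground_plane_depth(height_m, pitch_deg, focal_px, img_h, img_w):
    """Z-depth (along the optical axis) of a flat ground plane seen by a pinhole camera.

    Camera frame: x right, y down, z forward; principal point at the image center.
    pitch_deg is the downward tilt of the optical axis from horizontal (90 = nadir).
    Rows at or above the horizon never hit the ground and are set to inf.
    """
    theta = np.deg2rad(pitch_deg)
    cy = (img_h - 1) / 2.0
    v = np.arange(img_h, dtype=float)[:, None]  # row indices as a column vector
    denom = np.sin(theta) + (v - cy) / focal_px * np.cos(theta)
    with np.errstate(divide="ignore"):
        depth_rows = np.where(denom > 0, height_m / denom, np.inf)
    # Depth of a ground plane depends only on the row, so repeat it across columns.
    return np.broadcast_to(depth_rows, (img_h, img_w)).copy()
```

For example, at 100 m altitude and a 30° pitch the image center sees ground 200 m away, twice the nadir depth, which is one reason a model tuned to a single flight profile can mis-scale another.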
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AerialMetric, a benchmark dataset for monocular metric depth estimation under UAV aerial viewpoints. It comprises four subsets (real photogrammetry, controlled aerial acquisition, photorealistic synthetic, and in-the-wild imagery) totaling 52K real and 16K synthetic image-depth pairs claimed to have reliable metric ground truth. The work evaluates existing SOTA models, analyzes effects of viewpoint/altitude/camera parameters, and reports that fine-tuning representative models on the dataset yields state-of-the-art performance across diverse aerial imagery, with dataset, code, and weights released publicly.
Significance. If the absolute metric scale of the real photogrammetry pairs is independently validated and representative of UAV conditions, the dataset would address a clear domain gap between ground-level and aerial depth estimation, providing a reproducible benchmark that enables systematic adaptation and evaluation. The public release of data, code, and weights is a clear strength supporting reproducibility.
major comments (1)
- [Dataset Construction] Dataset section (photogrammetry subset description): the assertion of 'reliable metric ground truth' for the 52K real-world pairs is load-bearing for the fine-tuning and SOTA claims, yet the manuscript provides no explicit description of absolute scale recovery (e.g., RTK-GPS fusion, known baselines, or barometric altitude) nor any error statistics against an external reference. Standard SfM/MVS pipelines recover depths only up to scale, so without this the reported domain-gap closure may optimize for pseudo-metric rather than true metric depth.
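The referee's concern can be made concrete with a toy example built on synthetic numbers, not data from the paper. If the photogrammetric ground truth carries an unanchored global scale factor s, a model fine-tuned to reproduce it scores zero error on the benchmark while being off by |s - 1| in AbsRel against true metric depth. Only an external reference such as RTK-GPS, surveyed control points, or LiDAR can expose the gap. All arrays and the value s = 1.3 are hypothetical.

```python
import numpy as np

def abs_rel(pred, ref):
    """Mean absolute relative error over pixels with a positive reference depth."""
    mask = ref > 0
    return float(np.mean(np.abs(pred[mask] - ref[mask]) / ref[mask]))

rng = np.random.default_rng(0)
true_depth = rng.uniform(20.0, 200.0, size=(64, 64))  # hypothetical true metric depths (m)

# Hypothetical pseudo-metric ground truth: correct structure, but from an SfM/MVS
# reconstruction whose global scale was never anchored (off by s = 1.3).
s = 1.3
gt_pseudo = s * true_depth

# A model fine-tuned to reproduce gt_pseudo exactly.
pred = gt_pseudo.copy()

err_vs_dataset = abs_rel(pred, gt_pseudo)  # what the benchmark would report
err_vs_truth = abs_rel(pred, true_depth)   # what an external reference would show
print(err_vs_dataset, err_vs_truth)        # 0.0 and approximately 0.3
```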
minor comments (1)
- [Abstract] Abstract: 'Totally, AerialMetric provides' should be rephrased to 'In total, AerialMetric provides' for standard academic English.
Simulated Author's Rebuttal
We thank the referee for their constructive review and for recognizing the potential value of AerialMetric as a benchmark. We respond point-by-point to the single major comment below.
Point-by-point responses
- Referee: [Dataset Construction] Dataset section (photogrammetry subset description): the assertion of 'reliable metric ground truth' for the 52K real-world pairs is load-bearing for the fine-tuning and SOTA claims, yet the manuscript provides no explicit description of absolute scale recovery (e.g., RTK-GPS fusion, known baselines, or barometric altitude) nor any error statistics against an external reference. Standard SfM/MVS pipelines recover depths only up to scale, so without this the reported domain-gap closure may optimize for pseudo-metric rather than true metric depth.
Authors: We agree that the manuscript would be strengthened by an explicit description of absolute scale recovery for the photogrammetry subset. The 52K pairs were sourced from professional surveying pipelines that incorporate RTK-GPS, known camera intrinsics/extrinsics, and barometric altitude constraints to produce metric reconstructions; however, this process was summarized only briefly rather than detailed. In the revised manuscript we will add a dedicated paragraph (and, if space permits, a supplementary figure) describing the scale-recovery pipeline, the role of ground control points, and quantitative error statistics obtained by cross-validation against independent total-station measurements on a held-out subset of scenes. This addition will directly address the concern that the ground truth may be only pseudo-metric. revision: yes
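The rebuttal names RTK-GPS, ground control points, and total-station checks but not the alignment algorithm. One standard way to anchor an up-to-scale SfM reconstruction is a least-squares similarity fit (Umeyama, 1991) between SfM camera centers and their RTK-GPS positions, followed by an RMSE on independently surveyed checkpoints. The sketch below assumes that pipeline; it is not the authors' implementation, and the function names are hypothetical. Because each camera's z-depth is unchanged by a global rotation and translation, multiplying SfM depth maps by the recovered s converts them to meters.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares similarity (s, R, t) with dst ≈ s * R @ src + t (Umeyama, 1991).

    src: (n, 3) SfM camera centers in arbitrary units.
    dst: (n, 3) matching RTK-GPS positions in a local Cartesian frame (meters),
         e.g. east-north-up, not latitude/longitude.
    """
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)             # cross-covariance of the centered sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                     # force a proper rotation, not a reflection
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_s   # metric scale factor
    t = mu_d - s * R @ mu_s
    return s, R, t

def checkpoint_rmse(points_sfm, points_ref, s, R, t):
    """3D RMSE (meters) of aligned SfM points against independent survey points."""
    aligned = s * (np.asarray(points_sfm, float) @ R.T) + t
    return float(np.sqrt(np.mean(np.sum((aligned - points_ref) ** 2, axis=1))))
```

Reporting checkpoint_rmse on held-out control points, rather than on the points used for the fit, gives the kind of external error statistic the referee asks for.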
Circularity Check
No circularity: empirical dataset and benchmarking contribution
Full rationale
The paper introduces AerialMetric as a new benchmark dataset with image-depth pairs claimed to have reliable metric ground truth, then evaluates existing models and fine-tunes them to report SOTA performance. The provided text contains no derivation chain, no equations, no fitted parameters presented as predictions, and no load-bearing self-citations. The contribution is data collection and empirical evaluation against external models and imagery, so no result reduces to its own inputs by construction; the work is self-contained against external benchmarks.