pith. machine review for the scientific record. sign in

arxiv: 2604.22482 · v1 · submitted 2026-04-24 · 💻 cs.CV · cs.GR

Recognition: unknown

Holo360D: A Large-Scale Real-World Dataset with Continuous Trajectories for Advancing Panoramic 3D Reconstruction and Beyond

Hui Xiong, Jing Ou, Jinjing Zhu, Shuai Zhang, Tongyan Hua, Wufan Zhao, Yinrui Ren, Zhuoxiao Li, Zidong Cao

Pith reviewed 2026-05-08 12:26 UTC · model grok-4.3

classification 💻 cs.CV cs.GR
keywords panoramic 3D reconstructioncontinuous trajectoriesdepth maps360 datasetSLAMlaser scanningbenchmarkfeed-forward models
0
0 comments X

The pith

Holo360D supplies the first large-scale dataset of continuous panoramic sequences with aligned high-completeness depth maps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Holo360D to fix a key gap: existing panoramic 3D datasets capture 360 images only from fixed, separate locations, so they lack the continuous trajectories needed for multi-view training. Current feed-forward reconstruction models already lose accuracy on panoramas because of spherical distortions, and the missing continuity makes multi-view learning even harder. The new dataset records 109,495 panoramas along smooth paths using a 3D laser scanner paired with a 360 camera, then runs online and offline SLAM followed by geometry denoising, mesh hole filling, and region-specific remeshing to produce registered point clouds, meshes, and depth maps. Fine-tuning experiments on Holo360D show that models receive stronger training signals and establish a practical benchmark for panoramic 3D work.

Core claim

Holo360D is the first large-scale real-world dataset that supplies continuous panoramic sequences paired with accurately aligned high-completeness depth maps, registered point clouds, meshes, and camera poses. Raw data are captured with a 3D laser scanner and 360 camera, refined through SLAM systems, and cleaned by a post-processing pipeline of geometry denoising, mesh hole filling, and region-specific remeshing. Fine-tuning 3D reconstruction models on the dataset yields superior training signals compared with prior discrete-location collections.

What carries the argument

The Holo360D dataset of continuous panoramic sequences with SLAM-aligned high-completeness depth maps produced by laser scanning and a tailored post-processing pipeline.

If this is right

  • Panoramic feed-forward 3D reconstruction models gain stronger multi-view training signals from continuous trajectories.
  • The dataset functions as a standardized benchmark for evaluating and advancing panoramic 3D reconstruction methods.
  • Fine-tuned models exhibit improved handling of spherical distortions when trained on the aligned depth maps.
  • Public release of the data and code supports further development of related panoramic vision applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The focus on trajectory continuity implies that similar capture and processing choices could improve datasets for other wide-field sensors.
  • Better panoramic reconstruction models trained this way may directly aid indoor mapping and navigation systems that use consumer 360 cameras.
  • The post-processing steps could be tested on other large-scale 3D capture projects to reduce artifacts in depth maps.

Load-bearing premise

The post-processing pipeline of geometry denoising, mesh hole filling, and region-specific remeshing combined with online and offline SLAM produces sufficiently accurate alignments and high-completeness depth maps without major artifacts or biases.

What would settle it

If models fine-tuned on Holo360D show no accuracy improvement or even worse results on panoramic 3D reconstruction tasks than models trained on existing discrete panoramic datasets, the claim of superior training signals would be falsified.

Figures

Figures reproduced from arXiv: 2604.22482 by Hui Xiong, Jing Ou, Jinjing Zhu, Shuai Zhang, Tongyan Hua, Wufan Zhao, Yinrui Ren, Zhuoxiao Li, Zidong Cao.

Figure 1
Figure 1. Figure 1: We present Holo360D, the first large-scale real-world panoramic 3D dataset, containing 109,495 panoramas paired with LiDAR￾derived ground truth, including precise meshes, point clouds, depth maps, and camera poses. More importantly, Holo360D is the first panoramic dataset to offer accurately aligned high-completeness depth maps with continuous camera trajectories over long sequences. Abstract While feed-fo… view at source ↗
Figure 2
Figure 2. Figure 2: Comparison of depth maps across different panoramic view at source ↗
Figure 3
Figure 3. Figure 3: Dataset creation pipeline consisting of (i) data collection, (ii) offline reconstruction, and (iii) data post-processing. view at source ↗
Figure 4
Figure 4. Figure 4: Data post-processing pipeline consisting of (i) data denoising, (ii) mesh hole filling, and (iii) region-specific remeshing. view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of reconstructed mesh models on Matterport3D and Holo360D. Holo360D meshes exhibit higher completeness in view at source ↗
Figure 6
Figure 6. Figure 6: Reference dimensions used to evaluate point cloud re view at source ↗
Figure 7
Figure 7. Figure 7: Visualization of fine-tuning performance with different view at source ↗
Figure 8
Figure 8. Figure 8: Visualization results comparing different view configurations and depth supervision types. view at source ↗
Figure 9
Figure 9. Figure 9: View decomposition strategies. The 8 views consists view at source ↗
Figure 10
Figure 10. Figure 10: Visualization of baseline models fine-tuned on Holo360D. The blue arrows indicate viewpoints selected for zoom-in views. view at source ↗
Figure 11
Figure 11. Figure 11: Comparison of reconstructions in glass regions before and after finetuning. The finetuned view at source ↗
Figure 12
Figure 12. Figure 12: Finetuning π 3 on different datasets. Fine-tuning on Holo360D enables more accurate and complete reconstruction re￾sults than finetuning on Matterport3D. its re-rendered version also offers continuous three-view sequences, making it an ideal reference for comparison. As shown in view at source ↗
Figure 13
Figure 13. Figure 13: Challenging scenes. Our dataset includes (a) low-texture and repetitive-texture scenes, (b) large, long-sequence scenes, and (c) view at source ↗
Figure 14
Figure 14. Figure 14: Comparison of single-frame point clouds. view at source ↗
Figure 15
Figure 15. Figure 15: Qualitative comparison of sparse-view panoramic 3D reconstruction results. After finetuning with our dataset, the view at source ↗
Figure 16
Figure 16. Figure 16: Qualitative comparison of single-view panoramic 3D reconstruction results. After finetuning with our dataset, the model view at source ↗
Figure 17
Figure 17. Figure 17: Comparison between the advanced panoramic monocular depth estimation models (DA view at source ↗
Figure 18
Figure 18. Figure 18: Degradation of reconstruction quality in distant regions. view at source ↗
read the original abstract

While feed-forward 3D reconstruction models have advanced rapidly, they still exhibit degraded performance on panoramas due to spherical distortions. Moreover, existing panoramic 3D datasets are predominantly collected with 360 cameras fixed at discrete locations, resulting in discontinuous trajectories. These limitations critically hinder the development of panoramic feed-forward 3D reconstruction, especially for the multi-view setting. In this paper, we present Holo360D, a comprehensive dataset containing 109,495 panoramas paired with registered point clouds, meshes, and aligned camera poses. To our knowledge, Holo360D is the first large-scale dataset that provides continuous panoramic sequences with accurately aligned high-completeness depth maps. The raw data are initially collected using a 3D laser scanner coupled with a 360 camera. Subsequently, the raw data are processed with both online and offline SLAM systems. Furthermore, to enhance the 3D data quality, a post-processing pipeline tailored for the 360 dataset is proposed, including geometry denoising, mesh hole filling, and region-specific remeshing. Finally, we establish a new benchmark by fine-tuning 3D reconstruction models on Holo360D, providing key insights into effective fine-tuning strategies. Our results demonstrate that Holo360D delivers superior training signals and provides a comprehensive benchmark for advancing panoramic 3D reconstruction models. Datasets and Code will be made publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Holo360D, a dataset of 109,495 continuous panoramic sequences paired with registered point clouds, meshes, and camera poses. Raw data are captured with a 3D laser scanner and 360° camera, then processed via online and offline SLAM followed by a post-processing pipeline (geometry denoising, mesh hole filling, region-specific remeshing). The authors fine-tune existing 3D reconstruction models on the dataset to create a benchmark and claim that Holo360D supplies superior training signals for panoramic feed-forward reconstruction.

Significance. A high-quality, large-scale continuous-trajectory panoramic dataset with aligned depth would address a clear gap in multi-view 3D reconstruction research, where existing datasets are limited to discrete viewpoints. Public release of data and code is a concrete strength that could enable reproducible progress on spherical-distortion handling.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Benchmark): the central claim that Holo360D 'delivers superior training signals' rests on fine-tuning results, yet no quantitative metrics (RMSE, completeness percentages, alignment error distributions, or controlled ablations against prior panoramic datasets) are reported. This absence directly undermines verification of the 'accurately aligned high-completeness depth maps' assertion.
  2. [§3.3] §3.3 (Post-processing pipeline): the description of geometry denoising, hole filling, and region-specific remeshing contains no before/after quantitative validation or error analysis. Given that the pipeline is load-bearing for the 'high-completeness' and 'artifact-free' properties, the lack of such evidence leaves the weakest assumption untested.
minor comments (2)
  1. [Abstract] The abstract states 'Datasets and Code will be made publicly available' without a specific URL or repository link; this should be added for reproducibility.
  2. [§2] Notation for 'continuous trajectories' versus 'discontinuous' baselines could be clarified with a short diagram or table comparing trajectory properties across datasets.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and for recognizing the potential value of Holo360D in addressing gaps in panoramic 3D reconstruction. We address each major comment below and will revise the manuscript accordingly to provide stronger quantitative support.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Benchmark): the central claim that Holo360D 'delivers superior training signals' rests on fine-tuning results, yet no quantitative metrics (RMSE, completeness percentages, alignment error distributions, or controlled ablations against prior panoramic datasets) are reported. This absence directly undermines verification of the 'accurately aligned high-completeness depth maps' assertion.

    Authors: We agree that the current version of the manuscript does not include the requested quantitative metrics or ablations in §4. In the revised manuscript we will expand the benchmark section to report RMSE, completeness percentages, alignment error distributions, and controlled comparisons against prior panoramic datasets. These additions will directly substantiate the claim of superior training signals and allow verification of the alignment and completeness properties. revision: yes

  2. Referee: [§3.3] §3.3 (Post-processing pipeline): the description of geometry denoising, hole filling, and region-specific remeshing contains no before/after quantitative validation or error analysis. Given that the pipeline is load-bearing for the 'high-completeness' and 'artifact-free' properties, the lack of such evidence leaves the weakest assumption untested.

    Authors: We concur that quantitative before-and-after validation is required for the post-processing pipeline. We will augment §3.3 with error analyses and metrics quantifying the effects of geometry denoising, hole-filling success rates, and region-specific remeshing accuracy. These additions will provide concrete evidence supporting the high-completeness and artifact-free characteristics of the final data. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents an empirical dataset construction effort using laser scanning, 360 cameras, online/offline SLAM, and a post-processing pipeline of denoising, hole filling, and remeshing. No equations, parameter fittings, or mathematical derivations are described that could reduce to self-defined inputs or fitted quantities by construction. Claims of being the 'first large-scale dataset' with continuous trajectories and high-completeness depth maps rest on the described data collection process rather than any self-referential logic, self-citation chains, or renamed known results. The contribution is data release and benchmarking, with no load-bearing steps that collapse into their own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on the unverified accuracy of the custom post-processing pipeline and SLAM alignment for producing usable training data; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption Online and offline SLAM systems can produce accurate camera pose alignment between 360 images and laser-scanned point clouds in real-world environments.
    Invoked in the data processing step described in the abstract.
  • domain assumption The proposed post-processing steps (denoising, hole filling, region-specific remeshing) improve 3D data quality without introducing new errors that affect downstream model training.
    Stated as part of the pipeline to enhance 3D data quality.

pith-pipeline@v0.9.0 · 5587 in / 1591 out tokens · 37098 ms · 2026-05-08T12:26:46.147137+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

47 extracted references · 7 canonical work pages · 1 internal anchor

  1. [1]

    A survey of represen- tation learning, optimization strategies, and applications for omnidirectional vision.International Journal of Computer Vision, pages 1–40, 2025

    Hao Ai, Zidong Cao, and Lin Wang. A survey of represen- tation learning, optimization strategies, and applications for omnidirectional vision.International Journal of Computer Vision, pages 1–40, 2025. 2

  2. [2]

    Pano3d: A holistic benchmark and a solid baseline for 360deg depth estimation

    Georgios Albanis, Nikolaos Zioulis, Petros Drakoulis, Vasileios Gkitsas, Vladimiros Sterzentsenko, Federico Al- varez, Dimitrios Zarpalas, and Petros Daras. Pano3d: A holistic benchmark and a solid baseline for 360deg depth estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3727– 3737, 2021. 2

  3. [3]

    Mapillary planet-scale depth dataset

    Manuel L ´opez Antequera, Pau Gargallo, Markus Hofinger, Samuel Rota Bulo, Yubin Kuang, and Peter Kontschieder. Mapillary planet-scale depth dataset. InEuropean Confer- ence on Computer Vision, pages 589–604. Springer, 2020. 4

  4. [4]

    Joint 2D-3D-Semantic Data for Indoor Scene Understanding

    Iro Armeni, Sasha Sax, Amir R Zamir, and Silvio Savarese. Joint 2d-3d-semantic data for indoor scene understanding. arXiv preprint arXiv:1702.01105, 2017. 2, 3

  5. [5]

    Neural rgb-d surface reconstruction

    Dejan Azinovi ´c, Ricardo Martin-Brualla, Dan B Goldman, Matthias Nießner, and Justus Thies. Neural rgb-d surface reconstruction. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6290– 6301, 2022. 8

  6. [6]

    St2360d: Spatial-to-temporal consis- tency for training-free 360 monocular depth estimation

    Zidong Cao, Jinjing Zhu, Hao Ai, Lutao Jiang, Yuanhuiyi Lyu, and Hui Xiong. St2360d: Spatial-to-temporal consis- tency for training-free 360 monocular depth estimation. In 10 The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. 2

  7. [7]

    Panda: To- wards panoramic depth anything with unlabeled panoramas and mobius spatial augmentation

    Zidong Cao, Jinjing Zhu, Weiming Zhang, Hao Ai, Hao- tian Bai, Hengshuang Zhao, and Lin Wang. Panda: To- wards panoramic depth anything with unlabeled panoramas and mobius spatial augmentation. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 982–992, 2025. 2

  8. [8]

    Matterport3D: Learning from RGB-D Data in Indoor Environments

    Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3d: Learning from rgb-d data in indoor environments.arXiv preprint arXiv:1709.06158, 2017. 2, 3, 9

  9. [9]

    Panogrf: Generalizable spherical radiance fields for wide-baseline panoramas.Ad- vances in Neural Information Processing Systems, 36:6961– 6985, 2023

    Zheng Chen, Yan-Pei Cao, Yuan-Chen Guo, Chen Wang, Ying Shan, and Song-Hai Zhang. Panogrf: Generalizable spherical radiance fields for wide-baseline panoramas.Ad- vances in Neural Information Processing Systems, 36:6961– 6985, 2023. 2

  10. [10]

    Splatter-360: Generalizable 360 gaussian splatting for wide- baseline panoramic images

    Zheng Chen, Chenming Wu, Zhelun Shen, Chen Zhao, We- icai Ye, Haocheng Feng, Errui Ding, and Song-Hai Zhang. Splatter-360: Generalizable 360 gaussian splatting for wide- baseline panoramic images. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 21590– 21599, 2025. 2

  11. [11]

    Hsfm: Hybrid structure-from-motion

    Hainan Cui, Xiang Gao, Shuhan Shen, and Zhanyi Hu. Hsfm: Hybrid structure-from-motion. InProceedings of the IEEE conference on computer vision and pattern recogni- tion, pages 1212–1221, 2017. 3

  12. [12]

    Qi Feng, Hubert P. H. Shum, and Shigeo Morishima. 360 depth estimation in the wild - the depth360 dataset and the segfuse network.2022 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pages 664–673, 2022. 3

  13. [13]

    Cambridge university press,

    Richard Hartley and Andrew Zisserman.Multiple view ge- ometry in computer vision. Cambridge university press,

  14. [14]

    360loc: A dataset and benchmark for omnidirectional visual localization with cross-device queries

    Huajian Huang, Changkun Liu, Yipeng Zhu, Hui Cheng, Tristan Braud, and Sai-Kit Yeung. 360loc: A dataset and benchmark for omnidirectional visual localization with cross-device queries. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 22314–22324, 2024. 2, 3

  15. [15]

    Deepmvs: Learning multi- view stereopsis

    Po-Han Huang, Kevin Matzen, Johannes Kopf, Narendra Ahuja, and Jia-Bin Huang. Deepmvs: Learning multi- view stereopsis. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2821–2830,

  16. [16]

    Im360: Textured mesh reconstruction for large- scale indoor mapping with 360 cameras.arXiv preprint arXiv:2502.12545, 2025

    Dongki Jung, Jaehoon Choi, Yonghan Lee, and Dinesh Manocha. Im360: Textured mesh reconstruction for large- scale indoor mapping with 360 cameras.arXiv preprint arXiv:2502.12545, 2025. 3

  17. [17]

    Ground- ing image matching in 3d with mast3r

    Vincent Leroy, Yohann Cabon, and J´erˆome Revaud. Ground- ing image matching in 3d with mast3r. InEuropean Confer- ence on Computer Vision, pages 71–91. Springer, 2024. 2, 3

  18. [18]

    Mode: Multi-view omnidirectional depth esti- mation with 360 cameras

    Ming Li, Xueqian Jin, Xuejiao Hu, Jingzhao Dai, Sidan Du, and Yang Li. Mode: Multi-view omnidirectional depth esti- mation with 360 cameras. InEuropean Conference on Com- puter Vision, pages 197–213. Springer, 2022. 2, 3

  19. [19]

    Megadepth: Learning single- view depth prediction from internet photos

    Zhengqi Li and Noah Snavely. Megadepth: Learning single- view depth prediction from internet photos. InProceed- ings of the IEEE conference on computer vision and pattern recognition, pages 2041–2050, 2018. 4

  20. [20]

    Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):3292–3310, 2022

    Yiyi Liao, Jun Xie, and Andreas Geiger. Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):3292–3310, 2022. 2, 3

  21. [21]

    Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision

    Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22160–22169, 2024. 4

  22. [22]

    Slam3r: Real- time dense scene reconstruction from monocular rgb videos

    Yuzheng Liu, Siyan Dong, Shuzhe Wang, Yingda Yin, Yan- chao Yang, Qingnan Fan, and Baoquan Chen. Slam3r: Real- time dense scene reconstruction from monocular rgb videos. InProceedings of the Computer Vision and Pattern Recogni- tion Conference, pages 16651–16662, 2025. 4

  23. [23]

    Aria digital twin: A new benchmark dataset for egocentric 3d machine perception

    Xiaqing Pan, Nicholas Charron, Yongqian Yang, Scott Pe- ters, Thomas Whelan, Chen Kong, Omkar Parkhi, Richard Newcombe, and Yuheng Carl Ren. Aria digital twin: A new benchmark dataset for egocentric 3d machine perception. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20133–20143, 2023. 4

  24. [24]

    Com- mon objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction

    Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Com- mon objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. InProceedings of the IEEE/CVF international conference on computer vision, pages 10901–10911, 2021. 4

  25. [25]

    Yoloe: Real-time seeing anything.arXiv preprint arXiv:2503.07465, 2025

    Ao Wang, Lihao Liu, Hui Chen, Zijia Lin, Jungong Han, and Guiguang Ding. Yoloe: Real-time seeing anything.arXiv preprint arXiv:2503.07465, 2025. 6

  26. [26]

    arXiv preprint arXiv:2408.16061 (2024)

    Hengyi Wang and Lourdes Agapito. 3d reconstruction with spatial memory.arXiv preprint arXiv:2408.16061, 2024. 4, 8

  27. [27]

    Posediffusion: Solving pose estimation via diffusion-aided bundle adjustment

    Jianyuan Wang, Christian Rupprecht, and David Novotny. Posediffusion: Solving pose estimation via diffusion-aided bundle adjustment. InProceedings of the IEEE/CVF Inter- national Conference on Computer Vision, pages 9773–9783,

  28. [28]

    Vggt: Vi- sual geometry grounded transformer

    Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Vi- sual geometry grounded transformer. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 5294–5306, 2025. 2, 4, 8

  29. [29]

    Depth any- where: Enhancing 360 monocular depth estimation via perspective distillation and unlabeled data augmentation

    Ning-Hsu Albert Wang and Yu-Lun Liu. Depth any- where: Enhancing 360 monocular depth estimation via perspective distillation and unlabeled data augmentation. Advances in Neural Information Processing Systems, 37: 127739–127764, 2024. 2

  30. [30]

    Continuous 3d per- ception model with persistent state

    Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A Efros, and Angjoo Kanazawa. Continuous 3d per- ception model with persistent state. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 10510–10522, 2025. 4, 8 11

  31. [31]

    Moge: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision

    Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, and Jiaolong Yang. Moge: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 5261–5271, 2025. 2

  32. [32]

    Dust3r: Geometric 3d vi- sion made easy

    Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vi- sion made easy. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20697– 20709, 2024. 2, 3, 8

  33. [33]

    $\pi^3$: Permutation-Equivariant Visual Geometry Learning

    Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, and Tong He. Permutation-equivariant visual geome- try learning.arXiv preprint arXiv:2507.13347, 2025. 2, 4, 8

  34. [34]

    Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass

    Jianing Yang, Alexander Sax, Kevin J Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, and Matt Feiszli. Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 21924–21935,

  35. [35]

    Helvipad: A real-world dataset for om- nidirectional stereo depth estimation

    Mehdi Zayene, Jannik Endres, Albias Havolli, Charles Corbi`ere, Salim Cherkaoui, Alexandre Kontouli, and Alexandre Alahi. Helvipad: A real-world dataset for om- nidirectional stereo depth estimation. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 26975–26984, 2025. 7

  36. [36]

    Pansplat: 4k panorama synthesis with feed-forward gaussian splatting

    Cheng Zhang, Haofei Xu, Qianyi Wu, Camilo Cruz Gam- bardella, Dinh Phung, and Jianfei Cai. Pansplat: 4k panorama synthesis with feed-forward gaussian splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11437–11447, 2025. 2

  37. [37]

    arXiv preprint arXiv:2410.03825 (2024)

    Junyi Zhang, Charles Herrmann, Junhwa Hur, Varun Jam- pani, Trevor Darrell, Forrester Cole, Deqing Sun, and Ming- Hsuan Yang. Monst3r: A simple approach for estimat- ing geometry in the presence of motion.arXiv preprint arXiv:2410.03825, 2024. 2, 8

  38. [38]

    Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views

    Shangzhan Zhang, Jianyuan Wang, Yinghao Xu, Nan Xue, Christian Rupprecht, Xiaowei Zhou, Yujun Shen, and Gor- don Wetzstein. Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21936–21947, 2025. 2

  39. [39]

    Particlesfm: Exploiting dense point trajecto- ries for localizing moving cameras in the wild

    Wang Zhao, Shaohui Liu, Hengkai Guo, Wenping Wang, and Yong-Jin Liu. Particlesfm: Exploiting dense point trajecto- ries for localizing moving cameras in the wild. InEuropean Conference on Computer Vision, pages 523–542. Springer,

  40. [40]

    Structured3d: A large photo-realistic dataset for structured 3d modeling

    Jia Zheng, Junfei Zhang, Jing Li, Rui Tang, Shenghua Gao, and Zihan Zhou. Structured3d: A large photo-realistic dataset for structured 3d modeling. InComputer Vision– ECCV 2020: 16th European Conference, Glasgow, UK, Au- gust 23–28, 2020, Proceedings, Part IX 16, pages 519–535. Springer, 2020. 2, 3

  41. [41]

    Omnidepth: Dense depth estimation for indoors spherical panoramas

    Nikolaos Zioulis, Antonis Karakottas, Dimitrios Zarpalas, and Petros Daras. Omnidepth: Dense depth estimation for indoors spherical panoramas. InProceedings of the Euro- pean Conference on Computer Vision (ECCV), pages 448– 465, 2018. 2 12 Supplementary Material

  42. [42]

    The LiDAR offers a 360° × 270° (Horizontal × Vertical) field of view, with a sensing range from 0.05 m to 120 m

    Equipment Details The data acquisition device integrates a LiDAR, RTK- GNSS, IMU, three pinhole cameras, and a 360° camera. The LiDAR offers a 360° × 270° (Horizontal × Vertical) field of view, with a sensing range from 0.05 m to 120 m. It captures point clouds at 320,000 points per second, achiev- ing an absolute precision of 5 cm and a relative precisio...

  43. [43]

    Challenging Scenes As shown in Fig

    Dataset Details 7.1. Challenging Scenes As shown in Fig. 13, our dataset includes several challeng- ing scenes, including (a) low-texture and repetitive-texture scenes, (b) large, long-sequence scenes, as well as (c) low- light and overexposed scenes. These challenging environ- ments provide a robust basis for thoroughly evaluating the performance of pano...

  44. [44]

    We samplenpanoramic images (n∈[3,6]) from a ran- domly selected window of a sequence and decompose each panorama into eight perspective views

    Experiment Settings During training, we adopt a dynamic batch size following π3. We samplenpanoramic images (n∈[3,6]) from a ran- domly selected window of a sequence and decompose each panorama into eight perspective views. Thus, each training batch contains 24–48 perspective images, with at most 48 images processed on each GPU. We train each model us- in...

  45. [45]

    4.3 of the main paper, all models show improved quantitative and qualitative performance af- ter finetuning on our dataset

    More Results As discussed in Sec. 4.3 of the main paper, all models show improved quantitative and qualitative performance af- ter finetuning on our dataset. To complement these findings, we further assess the qualitative performance of the fine- tunedπ 3 model under diverse evaluation settings, including sparse-view and single-view panoramic reconstructi...

  46. [46]

    As shown in Fig

    and PanDA [2]). As shown in Fig. 17, we observe that the finetunedπ 3 approach achieves higher geometric accu- racy than other depth estimation methods. We attribute this improvement to two main factors. First, our dataset pro- vides outdoor and long-range indoor scenes, enabling fine- tunedπ 3 to generalize better to such environments. Sec- ond, although...

  47. [47]

    As shown in Fig

    Limitations and Future Work Limitations.Although our dataset surpasses existing ones in both scale and quality and significantly improves the per- formance of fine-tuned models, one limitation remains: re- construction quality degrades in distant regions. As shown in Fig. 18, while the finetuned model performs well in near regions, the geometric accuracy ...