PanoImager: Geometry-Guided Novel View Synthesis and Reconstruction from Sparse Panoramic Views

Takeshi Oishi; Zhisong Xu

arxiv: 2606.27071 · v1 · pith:JD5ETI2Vnew · submitted 2026-06-25 · 💻 cs.CV

Pith reviewed 2026-06-26 05:05 UTC · model grok-4.3

classification 💻 cs.CV

keywords: panoramic images, 3D reconstruction, novel view synthesis, 3D Gaussian splatting, diffusion models, sparse views, SfM-free, view completion

The pith

PanoImager reconstructs 3D scenes from a few panoramic images by generating synthetic auxiliary views that guide 3D Gaussian splatting without SfM.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard SfM and SLAM pipelines often fail to initialize under rotation-dominant panoramic capture because parallax is too weak to produce reliable camera poses. PanoImager decomposes each panorama into local perspective images, applies feed-forward networks for initial pose and depth estimates, and then uses a geometry-conditioned diffusion model to synthesize additional views that fill gaps in the sparse evidence. These auxiliary observations are fed into depth-guided 3D Gaussian splatting optimization, which produces more consistent geometry across views. The pipeline is positioned as an offline tool for map refinement in cases where conventional SfM/SLAM pipelines fail to initialize.
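The decomposition step can be pictured with a minimal sketch that samples one pinhole view from an equirectangular panorama. The function name, the yaw/pitch parameterisation, the axis conventions (x right, y down, z forward) and the nearest-neighbour lookup are illustrative assumptions; this page does not describe the paper's actual sampling code.

```python
import numpy as np

def equirect_to_perspective(pano, fov_deg, yaw_deg, pitch_deg, out_size):
    """Hypothetical helper: sample a pinhole view from an equirectangular panorama."""
    h_pano, w_pano = pano.shape[:2]
    w, h = out_size
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    # Pixel grid of the virtual camera, centred on the principal point.
    u, v = np.meshgrid(np.arange(w) - (w - 1) / 2, np.arange(h) - (h - 1) / 2)
    # Camera-frame ray directions: x right, y down, z forward.
    rays = np.stack([u, v, np.full_like(u, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Rotate by pitch about x (positive tilts the view upward), then yaw about y.
    p, y = np.radians(pitch_deg), np.radians(yaw_deg)
    rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    d = rays @ (ry @ rx).T
    # Longitude/latitude of each ray, then panorama pixel indices (nearest neighbour).
    lon = np.arctan2(d[..., 0], d[..., 2])          # 0 = panorama centre column
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))  # positive = below the horizon
    col = ((lon / (2 * np.pi) + 0.5) * w_pano).astype(int) % w_pano
    row = np.clip(((lat / np.pi + 0.5) * h_pano).astype(int), 0, h_pano - 1)
    return pano[row, col]
```

Bilinear interpolation would reduce aliasing. Nearest-neighbour lookup only keeps the sketch short.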

Core claim

Given only a few panoramic images, PanoImager decomposes them into local perspective views, synthesizes auxiliary observations to enrich sparse evidence, and stabilizes Gaussian optimization for improved cross-view consistency in an SfM-free framework that combines feed-forward pose/depth priors, geometry-conditioned diffusion view completion, and depth-guided 3DGS optimization.
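A depth-guided 3DGS objective is often written as a photometric term plus a weighted penalty toward a depth prior. The sketch below is hypothetical. The L1 terms, the weights `lambda_depth` and `synthetic_weight`, and the choice to down-weight synthesized views are assumptions, not values or design decisions reported by the paper. Standard 3DGS also mixes in a D-SSIM term, which is omitted here.

```python
import numpy as np

def depth_guided_loss(rendered_rgb, target_rgb, rendered_depth, prior_depth,
                      is_synthetic, lambda_depth=0.1, synthetic_weight=0.5):
    """Hypothetical per-view loss: photometric L1 plus an L1 pull toward a depth prior."""
    photometric = float(np.mean(np.abs(rendered_rgb - target_rgb)))
    valid = prior_depth > 0  # pixels where the feed-forward prior supplies depth
    if valid.any():
        depth_term = float(np.mean(np.abs(rendered_depth[valid] - prior_depth[valid])))
    else:
        depth_term = 0.0
    loss = photometric + lambda_depth * depth_term
    # Assumption: auxiliary (diffusion-synthesized) views are trusted less than real ones.
    return synthetic_weight * loss if is_synthetic else loss
```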

What carries the argument

Geometry-conditioned diffusion view completion that uses feed-forward pose and depth priors to generate auxiliary observations supporting depth-guided 3D Gaussian splatting optimization.
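Geometry conditioning of this kind usually rests on reprojecting observed pixels into the target view using the estimated pose and depth. Figure 4's caption describes reference observations being reprojected for each target view. The sketch below illustrates that step under several assumptions: a shared pinhole intrinsic matrix `K`, a 4×4 reference-to-target transform `T_ref_to_tgt`, and nearest-pixel splatting with a z-buffer. The function name and these choices are illustrative and do not come from PanoImager's implementation.

```python
import numpy as np

def reproject(depth_ref, K, T_ref_to_tgt, hw_tgt):
    """Hypothetical forward warp of a reference depth map into a target camera."""
    h, w = depth_ref.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = depth_ref > 0
    # Homogeneous pixel coordinates (3 x N) of pixels that carry depth.
    pix = np.stack([u[valid], v[valid], np.ones(valid.sum())], axis=0)
    pts_ref = np.linalg.inv(K) @ pix * depth_ref[valid]  # back-project to 3D
    R, t = T_ref_to_tgt[:3, :3], T_ref_to_tgt[:3, 3:]
    pts_tgt = R @ pts_ref + t                             # move into target frame
    front = pts_tgt[2] > 1e-6                             # keep points in front of camera
    proj = K @ pts_tgt[:, front]
    uu = np.round(proj[0] / proj[2]).astype(int)
    vv = np.round(proj[1] / proj[2]).astype(int)
    zz = pts_tgt[2, front]
    ht, wt = hw_tgt
    inside = (uu >= 0) & (uu < wt) & (vv >= 0) & (vv < ht)
    out = np.full(hw_tgt, np.inf)
    # z-buffer: when several points land on one pixel, keep the nearest.
    np.minimum.at(out, (vv[inside], uu[inside]), zz[inside])
    out[np.isinf(out)] = 0.0  # 0 marks holes with no reprojected evidence
    return out
```

Pixels left at 0 are holes with no reprojected evidence. In a pipeline like the one described, those are the regions a view-completion model would be asked to fill.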

If this is right

Reconstruction stability holds under extreme sparsity where standard SfM initialization is ill-conditioned.
The system can act as an offline background component for map refinement after SfM or SLAM fails to initialize.
Cross-view consistency in the final 3D model improves because sparse evidence is enriched with synthesized observations.
Decomposition of panoramas into local perspective views allows conventional perspective-based optimization techniques to be applied directly.
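This page does not say how viewing directions are chosen for the decomposition. Figure 2 only states that each panorama is sampled over the viewing sphere. As an assumed illustration, a Fibonacci lattice spreads n directions roughly uniformly over the sphere, and each direction could then serve as the optical axis of one local pinhole view. The lattice and the yaw/pitch convention below are not taken from the paper.

```python
import numpy as np

def fibonacci_directions(n):
    """n roughly uniform unit viewing directions from a Fibonacci (golden-angle) lattice."""
    i = np.arange(n) + 0.5
    y = 1.0 - 2.0 * i / n                      # evenly spaced heights in (-1, 1); y is "up"
    r = np.sqrt(1.0 - y ** 2)                  # radius of the horizontal circle at height y
    theta = np.pi * (3.0 - np.sqrt(5.0)) * i   # golden angle, about 2.39996 rad per step
    return np.stack([r * np.sin(theta), y, r * np.cos(theta)], axis=-1)

def to_yaw_pitch_deg(dirs):
    """Yaw about the up axis (0 = +z forward) and pitch above the horizon, in degrees."""
    yaw = np.degrees(np.arctan2(dirs[:, 0], dirs[:, 2]))
    pitch = np.degrees(np.arcsin(np.clip(dirs[:, 1], -1.0, 1.0)))
    return yaw, pitch
```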

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same feed-forward-plus-diffusion pattern could be tested on other wide-field sensors such as fisheye or multi-camera rigs when parallax is similarly limited.
A faster approximation of the diffusion step could allow the method to serve as a fallback module inside live SLAM systems rather than only offline.
If the synthesized views prove reliable, they could also supply training data for subsequent learning-based depth or pose estimators in panoramic domains.

Load-bearing premise

The auxiliary views produced by the diffusion model must contain geometry accurate enough and artifacts low enough that they improve rather than degrade the subsequent 3D Gaussian splatting step.
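One way to probe this premise is to gate each synthesized view on geometric agreement before it enters optimization. The check below is a hypothetical sketch. It assumes depth can be re-estimated on the synthesized image (for example by the same feed-forward model) and compared against the prior depth at the same pose. The relative tolerance and minimum inlier fraction are illustrative thresholds, not values from the paper.

```python
import numpy as np

def accept_synthetic_view(synth_depth, prior_depth, rel_tol=0.05, min_inlier_frac=0.8):
    """Hypothetical gate: keep a synthesized view only if its depth mostly agrees with the prior."""
    valid = (synth_depth > 0) & (prior_depth > 0)
    if not valid.any():
        return False  # no overlapping depth evidence, so reject
    rel_err = np.abs(synth_depth[valid] - prior_depth[valid]) / prior_depth[valid]
    inlier_frac = np.mean(rel_err < rel_tol)
    return bool(inlier_frac >= min_inlier_frac)
```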

What would settle it

Apply the full pipeline and a baseline 3DGS run without synthesis to the same set of two or three panoramic images on a dataset with ground-truth geometry, then compare final reconstruction metrics such as PSNR or depth error to see whether the synthesis stage produces a measurable gain.
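The proposed test reduces to two numbers per scene. The sketch below assumes rendered images scaled to [0, 1] and metric ground-truth depth, with zeros marking missing values. Absolute relative error (abs-rel) is one common depth metric, assumed here rather than taken from the paper.

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((pred - gt) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(max_val ** 2 / mse))

def abs_rel(pred_depth, gt_depth):
    """Mean absolute relative depth error over pixels with ground truth."""
    valid = gt_depth > 0
    return float(np.mean(np.abs(pred_depth[valid] - gt_depth[valid]) / gt_depth[valid]))

def synthesis_gain(with_synth, without_synth, gt_rgb, gt_depth):
    """Each argument pair is (rendered_rgb, rendered_depth); positive values favour synthesis."""
    (rgb_a, d_a), (rgb_b, d_b) = with_synth, without_synth
    return {
        "psnr_gain_db": psnr(rgb_a, gt_rgb) - psnr(rgb_b, gt_rgb),
        "abs_rel_reduction": abs_rel(d_b, gt_depth) - abs_rel(d_a, gt_depth),
    }
```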

Figures

Figures reproduced from arXiv: 2606.27071 by Takeshi Oishi, Zhisong Xu.

**Figure 1.** Overview of our panoramic reconstruction pipeline. Starting from sparse panoramic images in Ω1, we first sample them into a perspective observation set in Ω2. A visual foundation model then predicts camera poses and depth on Ω2, providing geometric priors. Conditioned on these priors, we extend the observations from Ω2 to Ω3 via novel view synthesis, and map the generated views back to Ω2 to enrich the obs…

**Figure 2.** Each panorama is sampled over the viewing sphere and decomposed…

**Figure 3.** Panoramic view completion via geometry-guided sampling. A…

**Figure 4.** For each target view, reference observations are reprojected…

**Figure 5.** Qualitative comparison on OmniScenes and 360Roam. From left to right: GT, OmniSplat, ODGS, SPaGS and…

**Figure 6.** We sample several local pinhole views from the panoramic view set and compare the diffusion-synthesized views with GT renderings at the same camera poses. In addition, we show synthesized panoramas on real-world scenes for qualitative evaluation of panoramic consistency and visual realism. E_reproj is 0.32 m, and the Free-space IoU reaches 0.76 under ground-truth poses, suggesting improved geometric consist…
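The caption does not define Free-space IoU. One possible reading is the intersection-over-union of voxel (or pixel) masks labelled as free space in the reconstruction and in the ground truth. Under that assumption, the metric reduces to the sketch below. How those masks are built (for example by ray casting against depth) is left open, and returning 1.0 when both masks are empty is a convention chosen here.

```python
import numpy as np

def free_space_iou(free_pred, free_gt):
    """IoU of two boolean masks marking cells observed as empty (free) space."""
    free_pred = np.asarray(free_pred, dtype=bool)
    free_gt = np.asarray(free_gt, dtype=bool)
    inter = np.logical_and(free_pred, free_gt).sum()
    union = np.logical_or(free_pred, free_gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0
```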

Original abstract

Panoramic sensing offers wide field-of-view coverage, yet 3D reconstruction from sparse panoramas remains challenging under rotation-dominant, weak-parallax motion. In such regimes, SfM/SLAM initialization is often ill-conditioned and unreliable. We present PanoImager, an SfM-free framework that combines feed-forward pose/depth priors, geometry-conditioned diffusion view completion, and depth-guided 3DGS optimization. Given only a few panoramic images, PanoImager decomposes them into local perspective views, synthesizes auxiliary observations to enrich sparse evidence, and stabilizes Gaussian optimization for improved cross-view consistency. Experiments on multiple benchmarks show improved stability under extreme sparsity, suggesting PanoImager as an offline/background component for map refinement when SfM/SLAM fails to initialize.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The PanoImager abstract describes a diffusion-plus-3DGS pipeline for sparse panoramic reconstruction but gives no numbers showing that the generative step actually helps.

PanoImager decomposes a few panoramic images into perspective views, runs feed-forward pose and depth priors, uses geometry-conditioned diffusion to synthesize extra observations, and then runs depth-guided 3D Gaussian splatting. The target is rotation-dominant capture where SfM initialization fails due to weak parallax.

The combination is aimed at a concrete practical gap: enriching evidence without relying on SfM. That focus is reasonable for offline map refinement tasks.

The load-bearing assumption is that the diffusion outputs will be geometrically accurate enough not to inject depth errors or view inconsistencies that degrade the later optimization. Under extreme sparsity this is not automatic, and any hallucinated structure could increase cross-view error rather than reduce it. The abstract asserts improved stability on benchmarks yet supplies no quantitative results, baselines, ablations, or error breakdowns, so the claim cannot be checked.

The paper is aimed at researchers working on panoramic novel-view synthesis and 3D reconstruction in constrained capture settings. A reader already working on diffusion-conditioned 3DGS might extract an idea or two, but the absence of evidence makes it hard to judge whether the pipeline delivers a net gain.

I would not send this to peer review on the current text because the central empirical claim is unsupported. If the full manuscript contains solid metrics and controls that address the diffusion quality issue, that could change.

Referee Report

2 major / 0 minor

Summary. The paper proposes PanoImager, an SfM-free framework for novel view synthesis and 3D reconstruction from sparse panoramic images under rotation-dominant weak-parallax conditions. It decomposes input panoramas into local perspective views, employs feed-forward pose/depth priors, uses a geometry-conditioned diffusion model to synthesize auxiliary observations, and performs depth-guided 3D Gaussian Splatting (3DGS) optimization to stabilize reconstruction and improve cross-view consistency. The abstract claims that experiments on multiple benchmarks demonstrate improved stability under extreme sparsity, positioning the method as an offline component for map refinement when SfM/SLAM fails.

Significance. If the central claims hold with supporting evidence, the work would address a practically relevant gap in panoramic 3D reconstruction where traditional initialization is ill-conditioned. The pipeline's integration of feed-forward priors with generative view completion for subsequent geometry-guided optimization represents a potentially useful direction for handling extreme sparsity. However, the absence of any quantitative results, baselines, ablations, or error metrics in the provided text prevents assessment of whether the synthesized views actually improve rather than degrade 3DGS performance.

major comments (2)

[Abstract] The claim that 'Experiments on multiple benchmarks show improved stability under extreme sparsity' is not supported by any quantitative metrics, baselines, ablation results, or error analysis, so the central claim of stabilization via auxiliary views cannot be evaluated.
[Abstract] The pipeline's core assumption (that feed-forward pose/depth priors plus the geometry-conditioned diffusion model produce auxiliary perspective views whose geometry is accurate and artifact-free enough to improve, rather than degrade, depth-guided 3DGS under rotation-dominant weak parallax) is stated without verification, analysis of failure modes, or quality thresholds for the generative step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the feedback. We recognize that the abstract's claims require empirical backing that is not present in the provided manuscript text, and we will revise accordingly to strengthen the submission.

Referee: [Abstract] The claim that 'Experiments on multiple benchmarks show improved stability under extreme sparsity' is not supported by any quantitative metrics, baselines, ablation results, or error analysis, so the central claim of stabilization via auxiliary views cannot be evaluated.

Authors: We agree that the claim in the abstract is unsupported by any quantitative evidence in the current manuscript text. In the revised version we will add a dedicated experimental section reporting metrics (e.g., PSNR, SSIM, depth error) on the cited benchmarks, direct comparisons against relevant baselines, and ablation studies that isolate the contribution of the auxiliary views to 3DGS stability under extreme sparsity. (Revision: yes.)
Referee: [Abstract] The pipeline's core assumption (that feed-forward pose/depth priors plus the geometry-conditioned diffusion model produce auxiliary perspective views whose geometry is accurate and artifact-free enough to improve, rather than degrade, depth-guided 3DGS under rotation-dominant weak parallax) is stated without verification, analysis of failure modes, or quality thresholds for the generative step.

Authors: We accept that the manuscript provides no verification or analysis of this assumption. The revision will include quantitative evaluation of the synthesized views (geometric consistency metrics, artifact detection), explicit discussion of observed failure modes in weak-parallax regimes, and any filtering thresholds applied before feeding views into 3DGS, so that readers can assess whether the generative step reliably improves rather than degrades optimization. (Revision: yes.)

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper presents a descriptive pipeline (feed-forward priors + geometry-conditioned diffusion + depth-guided 3DGS) without any equations, derivations, or fitted parameters that reduce outputs to inputs by construction. No self-definitional steps, no predictions that are statistically forced by fitting, and no load-bearing self-citations are present. Claims rest on empirical benchmark results rather than internal redefinitions. This matches the default expectation of a non-circular method description.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, parameters, or explicit assumptions beyond the high-level pipeline description.

pith-pipeline@v0.9.1-grok · 5663 in / 1073 out tokens · 18320 ms · 2026-06-26T05:05:41.778289+00:00

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MyGO-Splat: Multi-Objective Closed-Loop Geometric Feedback for RGB-Only Gaussian SLAM
cs.RO 2026-06 unverdicted novelty 6.0

MyGO-Splat is a closed-loop RGB-only Gaussian SLAM system that rasterizes depth and normals from the map to supervise pose optimization and align monocular depth priors for scale consistency.

Reference graph

Works this paper leans on

28 extracted references · 8 canonical work pages · cited by 1 Pith paper · 5 internal anchors

[1]

Robot homing by exploiting panoramic vision

A. A. Argyros, K. E. Bekris, S. C. Orphanoudakis, and L. E. Kavraki, “Robot homing by exploiting panoramic vision,” Autonomous Robots, vol. 19, no. 1, pp. 7–25, 2005

2005

[2]

PanoSLAM: Panoptic 3D scene reconstruction via Gaussian SLAM

R. Chen, Z. Wang, J. Wang, Y. Ma, M. Gong, W. Wang, and T. Liu, “PanoSLAM: Panoptic 3D scene reconstruction via Gaussian SLAM,” arXiv preprint arXiv:2501.00352, 2024

arXiv 2024

[3]

360ORB-SLAM: A visual SLAM system for panoramic images with depth completion network

Y. Chen, Y. Pan, R. Liu, H. Zhang, G. Zhang, B. Sun, and J. Zhang, “360ORB-SLAM: A visual SLAM system for panoramic images with depth completion network,” in 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD). IEEE, 2024, pp. 717–722

2024

[4]

INF: Implicit neural fusion for LiDAR and camera

S. Zhou, S. Xie, R. Ishikawa, K. Sakurada, M. Onishi, and T. Oishi, “INF: Implicit neural fusion for LiDAR and camera,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023, pp. 10918–10925

2023

[5]

360 video viewing dataset in head-mounted virtual reality

W.-C. Lo, C.-L. Fan, J. Lee, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu, “360 video viewing dataset in head-mounted virtual reality,” in Proceedings of the 8th ACM on Multimedia Systems Conference, 2017, pp. 211–216

2017

[6]

NeRF: Representing scenes as neural radiance fields for view synthesis

B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “NeRF: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021

2021

[7]

3D Gaussian splatting for real-time radiance field rendering

B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3D Gaussian splatting for real-time radiance field rendering,” ACM Trans. Graph., vol. 42, no. 4, art. 139, 2023

2023

[8]

ODGS: 3D scene reconstruction from omnidirectional images with 3D Gaussian splattings

S. Lee, J. Chung, J. Huh, and K. M. Lee, “ODGS: 3D scene reconstruction from omnidirectional images with 3D Gaussian splattings,” Advances in Neural Information Processing Systems, vol. 37, pp. 57050–57075, 2024

2024

[9]

Balanced spherical grid for egocentric view synthesis

C. Choi, S. M. Kim, and Y. M. Kim, “Balanced spherical grid for egocentric view synthesis,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 16590–16599

2023

[10]

SPaGS: Fast and accurate 3D Gaussian splatting for spherical panoramas

J. Li, F. Hahlbohm, T. Scholz, M. Eisemann, J. Tauscher, and M. Magnor, “SPaGS: Fast and accurate 3D Gaussian splatting for spherical panoramas,” in Computer Graphics Forum, vol. 44, no. 4. Wiley Online Library, 2025, p. e70171

2025

[11]

DUSt3R: Geometric 3D vision made easy

S. Wang, V. Leroy, Y. Cabon, B. Chidlovskii, and J. Revaud, “DUSt3R: Geometric 3D vision made easy,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 20697–20709

2024

[12]

VGGT: Visual geometry grounded transformer

J. Wang, M. Chen, N. Karaev, A. Vedaldi, C. Rupprecht, and D. Novotny, “VGGT: Visual geometry grounded transformer,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 5294–5306

2025

[13]

FrameVGGT: Coherence-Preserving Memory for Bounded Streaming Geometry

Z. Xu and T. Oishi, “FrameVGGT: Frame evidence rolling memory for streaming VGGT,” arXiv preprint arXiv:2603.07690, 2026

arXiv 2026

[14]

OmniSplat: Taming feed-forward 3D Gaussian splatting for omnidirectional images with editable capabilities

S. Lee, J. Chung, K. Kim, J. Huh, G. Lee, M. Lee, and K. M. Lee, “OmniSplat: Taming feed-forward 3D Gaussian splatting for omnidirectional images with editable capabilities,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 16356–16365

2025

[15]

PanSplat: 4K panorama synthesis with feed-forward Gaussian splatting

C. Zhang, H. Xu, Q. Wu, C. C. Gambardella, D. Phung, and J. Cai, “PanSplat: 4K panorama synthesis with feed-forward Gaussian splatting,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 11437–11447

2025

[16]

Splatter-360: Generalizable 360 Gaussian splatting for wide-baseline panoramic images

Z. Chen, C. Wu, Z. Shen, C. Zhao, W. Ye, H. Feng, E. Ding, and S.-H. Zhang, “Splatter-360: Generalizable 360 Gaussian splatting for wide-baseline panoramic images,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 21590–21599

2025

[17]

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

A. Blattmann, T. Dockhorn, S. Kulal, D. Mendelevitch, M. Kilian, D. Lorenz, Y. Levi, Z. English, V. Voleti, A. Letts et al., “Stable video diffusion: Scaling latent video diffusion models to large datasets,” arXiv preprint arXiv:2311.15127, 2023

arXiv 2023

[18]

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

W. Yu, J. Xing, L. Yuan, W. Hu, X. Li, Z. Huang, X. Gao, T.-T. Wong, Y. Shan, and Y. Tian, “ViewCrafter: Taming video diffusion models for high-fidelity novel view synthesis,” arXiv preprint arXiv:2409.02048, 2024

arXiv 2024

[19]

ReconX: Reconstruct any scene from sparse views with video diffusion model

F. Liu, W. Sun, H. Wang, Y. Wang, H. Sun, J. Ye, J. Zhang, and Y. Duan, “ReconX: Reconstruct any scene from sparse views with video diffusion model,” arXiv preprint arXiv:2408.16767, 2024

arXiv 2024

[20]

MVGenMaster: Scaling multi-view generation from any image via 3D priors enhanced diffusion model

C. Cao, C. Yu, S. Liu, F. Wang, X. Xue, and Y. Fu, “MVGenMaster: Scaling multi-view generation from any image via 3D priors enhanced diffusion model,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 6045–6056

2025

[21]

YUTO MMS: A comprehensive SLAM dataset for urban mobile mapping with tilted LiDAR and panoramic camera integration

Y. Zhang, S. Ahmadi, J. Kang, Z. Arjmandi, and G. Sohn, “YUTO MMS: A comprehensive SLAM dataset for urban mobile mapping with tilted LiDAR and panoramic camera integration,” The International Journal of Robotics Research, vol. 44, no. 1, pp. 3–21, 2025

2025

[22]

UPSLAM: Union of panoramas SLAM

A. Cowley, I. D. Miller, and C. J. Taylor, “UPSLAM: Union of panoramas SLAM,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 1103–1109

2021

[23]

MGSO: Monocular real-time photometric SLAM with efficient 3D Gaussian splatting

Y. S. Hu, N. Abboud, M. Q. Ali, A. S. Yang, I. Elhajj, D. Asmar, Y. Chen, and J. S. Zelek, “MGSO: Monocular real-time photometric SLAM with efficient 3D Gaussian splatting,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 11061–11067

2025

[24]

FGO-SLAM: Enhancing Gaussian SLAM with globally consistent opacity radiance field

F. Zhu, Y. Zhao, Z. Chen, B. Yu, and H. Zhu, “FGO-SLAM: Enhancing Gaussian SLAM with globally consistent opacity radiance field,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 11075–11081

2025

[25]

MMD-SLAM: Structure-Enhanced Multi-Meta Gaussian Distribution-Guided Visual SLAM

Z. Fan, C. Ziyu, L. Peichen, Z. Yifan, X. Zhisong, Z. Hui, Z. Hongxing, L. Sixun, and J. Chunmao, “MMD-SLAM: Structure-enhanced multi-meta Gaussian distribution-guided visual SLAM,” arXiv preprint arXiv:2606.19874, 2026

arXiv 2026

[26]

The Replica Dataset: A Digital Replica of Indoor Spaces

J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, J. J. Engel, R. Mur-Artal, C. Ren, S. Verma et al., “The Replica dataset: A digital replica of indoor spaces,” arXiv preprint arXiv:1906.05797, 2019

arXiv 2019

[27]

Omni-Scene: Omni-Gaussian representation for ego-centric sparse-view scene reconstruction

D. Wei, Z. Li, and P. Liu, “Omni-Scene: Omni-Gaussian representation for ego-centric sparse-view scene reconstruction,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 22317–22327

2025

[28]

360Roam: Real-Time Indoor Roaming Using Geometry-Aware 360° Radiance Fields

H. Huang, Y. Chen, T. Zhang, and S.-K. Yeung, “360Roam: Real-Time Indoor Roaming Using Geometry-Aware 360° Radiance Fields,” arXiv preprint arXiv:2208.02705, 2022

arXiv 2022
