Recognition: 2 theorem links · Lean Theorem
PanoPlane: Plane-Aware Panoramic Completion for Sparse-View Indoor 3D Gaussian Splatting
Pith reviewed 2026-05-15 05:00 UTC · model grok-4.3
The pith
PanoPlane achieves high-fidelity indoor novel view synthesis from sparse inputs by using plane-aware panoramic completion to supervise 3D Gaussian Splatting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that anchoring the attention maps of a pre-trained diffusion model to the planar layouts detected in observed panoramic regions keeps the generated completions geometrically consistent across the full 360 degrees. This in turn allows 3D Gaussian Splatting to synthesize high-quality novel views without any fine-tuning of the generative model, as validated by superior PSNR, SSIM, and LPIPS metrics on Replica, ScanNet++, and Matterport3D.
What carries the argument
Layout Anchored Attention Steering: a training-free mechanism that, at inference time, steers attention within the diffusion model's internal representation toward the scene's detected planar surfaces to enforce geometric consistency in extrapolations.
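The paper does not publish the steering rule itself, but the core operation, biasing attention from unobserved-region queries toward keys lying on detected planes, can be sketched. Everything below (the additive log-weight bias on the logits, the token masks, the function name, the 1.2 default) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def steered_attention(Q, K, V, plane_mask, unobserved_mask, weight=1.2):
    """Sketch of layout-anchored attention steering (hypothetical form).

    Q, K, V: (n, d) query/key/value matrices from one attention layer.
    plane_mask: (n,) bool, True for key tokens on detected planar surfaces.
    unobserved_mask: (n,) bool, True for query tokens in regions to complete.
    weight: pre-softmax boost for plane-anchored keys, applied as an
        additive log-weight bias on the attention logits.
    """
    d = Q.shape[1]
    logits = Q @ K.T / np.sqrt(d)                  # (n, n) attention logits
    # Bias unobserved queries toward keys lying on detected planes.
    boost = np.outer(unobserved_mask, plane_mask)  # (n, n) bool
    logits = np.where(boost, logits + np.log(weight), logits)
    # Numerically stable softmax over keys.
    logits -= logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ V
```

With `weight=1.0` the bias vanishes and standard attention is recovered, which gives a natural ablation: observed-region queries are untouched either way.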
If this is right
- Supports accurate novel-view synthesis with as few as three input views.
- Delivers up to 17.8 percent PSNR gains over prior state-of-the-art methods on indoor benchmarks.
- Requires no training or fine-tuning of the underlying diffusion model.
- Enables reconstruction of closed room geometry through full panoramic completion.
- Grounds the generative process in observed planar structures to minimize artifacts.
Where Pith is reading between the lines
- The technique might generalize to other generative tasks where geometric priors can guide inference-time behavior.
- It points toward hybrid pipelines that combine classical geometry detection with modern diffusion models for 3D vision.
- Future work could test the method on dynamic scenes or with moving objects to assess robustness.
Load-bearing premise
Steering attention in the diffusion model toward detected planar surfaces will reliably produce geometrically consistent extrapolations in unobserved regions without artifacts or inconsistencies.
What would settle it
A test scene where the synthesized novel views show systematic geometric errors, such as warped walls or mismatched floor planes that contradict the input measurements, would disprove the claim.
Original abstract
We present PanoPlane, an approach for high-fidelity sparse-view indoor novel view synthesis that reconstructs closed room geometry via panoramic scene completion. Unlike perspective-based methods that generate training views from limited fields of view, PanoPlane leverages $360^{\circ}$ panoramic completion to condition the generative process on the full spatial layout. We propose Layout Anchored Attention Steering, a training-free mechanism that steers attention within the diffusion model's internal representation toward scene's detected planar surfaces at inference time. By directing each unobserved region's attention toward geometrically consistent observed content, our method replaces unconstrained hallucination with grounded surface extrapolation. The resulting panoramic completions provide supervision for 3D Gaussian Splatting, enabling accurate novel-view synthesis across unobserved regions from as few as three input views. Experiments on Replica, ScanNet++, and Matterport3D demonstrate state-of-the-art novel view synthesis quality across 3, 6, and 9 input views, achieving up to $+17.8\%$ improvement in PSNR over the current state-of-the-art baseline without any training or fine-tuning of the diffusion model.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PanoPlane for sparse-view indoor novel view synthesis via 360° panoramic scene completion. It proposes Layout Anchored Attention Steering, a training-free mechanism that directs a diffusion model's attention toward detected planar surfaces at inference time to produce grounded extrapolations, which then supervise 3D Gaussian Splatting. Experiments on Replica, ScanNet++, and Matterport3D claim state-of-the-art results for 3/6/9 input views, with up to +17.8% PSNR gains over baselines without diffusion fine-tuning.
Significance. If the geometric consistency of the steered completions holds, the approach would be significant for training-free sparse-view indoor reconstruction: it leverages full panoramic layout context and existing diffusion models to reduce hallucination in unobserved regions, offering a practical way to improve 3DGS supervision from very few views.
major comments (2)
- [Experiments] Experiments section: only novel-view PSNR/SSIM/LPIPS are reported; no direct metrics (multi-view depth consistency, plane-normal error, or reprojection error on completed regions) are provided to verify that Layout Anchored Attention Steering produces 3D-geometrically consistent extrapolations rather than plausible 2D textures.
- [Method] Method section (Layout Anchored Attention Steering): the mechanism is described at a high level but lacks precise implementation details on attention-layer selection, steering weights, and integration with plane detection, which are load-bearing for reproducing the claimed geometric grounding.
minor comments (1)
- [Abstract] Abstract: the '+17.8% PSNR' claim does not specify the exact baseline method or input-view count at which the maximum gain occurs.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of geometric consistency and reproducibility.
Point-by-point responses
-
Referee: [Experiments] Experiments section: only novel-view PSNR/SSIM/LPIPS are reported; no direct metrics (multi-view depth consistency, plane-normal error, or reprojection error on completed regions) are provided to verify that Layout Anchored Attention Steering produces 3D-geometrically consistent extrapolations rather than plausible 2D textures.
Authors: We agree that direct geometric metrics on the completed regions would provide stronger validation of the 3D consistency achieved by Layout Anchored Attention Steering. In the revised manuscript, we will add quantitative evaluations including multi-view depth consistency (measured via reprojection error across held-out views) and plane-normal error on extrapolated surfaces, using ground-truth geometry from Replica and ScanNet++ where available. These will be reported alongside the existing NVS metrics to directly address the concern. revision: yes
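A depth-consistency metric of the kind promised here can be sketched as reprojection of one view's depth map into another. The function below is a generic construction under assumed pinhole intrinsics `K` and a known relative pose, not the authors' evaluation code:

```python
import numpy as np

def depth_reprojection_error(depth_a, K, pose_a_to_b, depth_b):
    """Sketch of a multi-view depth-consistency metric (our construction):
    unproject view A's depth, rigid-transform into view B's frame, and
    compare the projected depth with B's depth map at the landing pixels.

    depth_a, depth_b: (h, w) depth maps.
    K: (3, 3) pinhole intrinsics shared by both views.
    pose_a_to_b: (4, 4) rigid transform from camera A to camera B.
    """
    h, w = depth_a.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Unproject to 3D in camera A, then transform into camera B.
    pts_a = (np.linalg.inv(K) @ pix.T) * depth_a.reshape(1, -1)
    pts_b = pose_a_to_b[:3, :3] @ pts_a + pose_a_to_b[:3, 3:4]
    proj = K @ pts_b
    z = proj[2]
    uu = np.round(proj[0] / z).astype(int)
    vv = np.round(proj[1] / z).astype(int)
    # Keep points in front of the camera that land inside the image.
    valid = (z > 0) & (uu >= 0) & (uu < w) & (vv >= 0) & (vv < h)
    err = np.abs(z[valid] - depth_b[vv[valid], uu[valid]])
    return err.mean() if err.size else np.nan
```

For perfectly consistent geometry (identical depth, identity pose) the error is zero; on completed regions a large value would flag extrapolations that are plausible 2D textures rather than consistent 3D surfaces.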
-
Referee: [Method] Method section (Layout Anchored Attention Steering): the mechanism is described at a high level but lacks precise implementation details on attention-layer selection, steering weights, and integration with plane detection, which are load-bearing for reproducing the claimed geometric grounding.
Authors: We acknowledge that additional implementation specifics are needed for full reproducibility. The revised Method section will detail the exact attention layers targeted (specifically the mid-level cross-attention blocks in the diffusion U-Net), the steering weight schedule (fixed at 1.2 for planar regions with linear decay), and the precise integration pipeline with the plane detection module, including pseudocode and hyperparameter values used in all experiments. revision: yes
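Under the stated schedule (initial steering weight 1.2 with linear decay), one plausible reading is a per-denoising-step linear interpolation. The endpoint `w_final` and the exact parameterization below are our assumptions, since the rebuttal does not fix them:

```python
def steering_weight(step, total_steps, w0=1.2, w_final=1.0):
    """Sketch of a linear-decay steering schedule: start at w0 (the quoted
    1.2) and decay linearly toward w_final over the denoising steps.
    w_final=1.0 (i.e., no steering at the end) is an assumed endpoint."""
    t = step / max(total_steps - 1, 1)  # normalized progress in [0, 1]
    return w0 + (w_final - w0) * t
```

A weight of 1.0 corresponds to unmodified attention, so this schedule steers strongly early, when the global layout is decided, and fades out as fine texture is denoised.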
Circularity Check
No circularity: the method applies an external pre-trained diffusion model with new inference-time steering and validates against independent benchmarks.
Full rationale
The derivation chain begins with standard plane detection on sparse input panoramas, applies a training-free attention-steering rule (Layout Anchored Attention Steering) inside an off-the-shelf diffusion model, produces completed panoramas, and feeds those images as supervision to 3D Gaussian Splatting. All quantitative claims are empirical comparisons (PSNR/SSIM/LPIPS on Replica, ScanNet++, Matterport3D) against external baselines; no parameter is fitted to the target metric and then re-labeled as a prediction, no uniqueness theorem is imported from the authors' prior work, and no equation reduces to a self-definition. The approach therefore remains self-contained against external benchmarks and receives a score of 0.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · tagged unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
Layout Anchored Attention Steering... steers attention within the diffusion model's internal representation toward scene's detected planar surfaces at inference time
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Self-rectifying diffusion sampling with perturbed-attention guidance
Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, and Seungryong Kim. Self-rectifying diffusion sampling with perturbed-attention guidance. In European Conference on Computer Vision (ECCV), pages 1–17. Springer, 2024. doi: 10.1007/978-3-031-73464-9_1
-
[2]
Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu, Chenglong Liu, Yang Liu, Dayiheng Liu, Shixuan ...
-
[3]
Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing
Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, Xiaohu Qie, and Yinqiang Zheng. Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 22560–22570, 2023
-
[4]
Matterport3D: Learning from RGB-D Data in Indoor Environments
Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3d: Learning from rgb-d data in indoor environments. arXiv preprint arXiv:1709.06158, 2017
-
[5]
Quantifying and alleviating co-adaptation in sparse-view 3d gaussian splatting
Kangjie Chen, Yingji Zhong, Zhihao Li, Jiaqi Lin, Youyu Chen, Minghan Qin, and Haoqian Wang. Quantifying and alleviating co-adaptation in sparse-view 3d gaussian splatting. In Advances in Neural Information Processing Systems, 2025
-
[6]
FlashAttention: Fast and memory-efficient exact attention with IO-awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Flashattention: Fast and memory-efficient exact attention with io-awareness. In Advances in Neural Information Processing Systems, volume 35, pages 16344–16359, 2022
-
[7]
MASt3r-sfm: a fully-integrated solution for unconstrained structure-from-motion
Bardienus Pieter Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, and Jerome Revaud. MASt3r-sfm: a fully-integrated solution for unconstrained structure-from-motion. In International Conference on 3D Vision 2025, 2025. URL https://openreview.net/forum?id=5uw1GRBFoT
-
[8]
Instantsplat: Sparse-view gaussian splatting in seconds
Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, et al. Instantsplat: Sparse-view gaussian splatting in seconds. arXiv preprint arXiv:2403.20309, 2024
-
[9]
Haoran Feng, Dizhe Zhang, Xiangtai Li, Bo Du, and Lu Qi. Dit360: High-fidelity panoramic image generation via hybrid training. arXiv preprint arXiv:2510.11712, 2025
-
[10]
Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981. doi: 10.1145/358669.358692
-
[11]
Yasutaka Furukawa, Brian Curless, Steven M. Seitz, and Richard Szeliski. Manhattan-world stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1422–1429, 2009. doi: 10.1109/CVPRW.2009.5206867
-
[12]
Matcha gaussians: Atlas of charts for high-quality geometry and photorealism from sparse views
Antoine Guedon, Tomoki Ichikawa, Kohei Yamashita, and Ko Nishino. Matcha gaussians: Atlas of charts for high-quality geometry and photorealism from sparse views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6001–6011, June 2025
-
[13]
Igl-nav: Incremental 3d gaussian localization for image-goal navigation
Wenxuan Guo, Xiuwei Xu, Hang Yin, Ziwei Wang, Jianjiang Feng, Jie Zhou, and Jiwen Lu. Igl-nav: Incremental 3d gaussian localization for image-goal navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6808–6817, October 2025
-
[14]
Prompt-to-prompt image editing with cross attention control
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-prompt image editing with cross attention control. In International Conference on Learning Representations (ICLR), 2023
-
[15]
Gsplatvnm: Point-of-view synthesis for visual navigation models using gaussian splatting
Kohei Honda, Takeshi Ishita, Yasuhiro Yoshimura, and Ryo Yonetani. Gsplatvnm: Point-of-view synthesis for visual navigation models using gaussian splatting. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 20869–20876, 2025. doi: 10.1109/IROS60139.2025.11246997
-
[16]
Improving sample quality of diffusion models using self-attention guidance
Susung Hong, Gyuseong Lee, Wooseok Jang, and Seungryong Kim. Improving sample quality of diffusion models using self-attention guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7462–7471, 2023
-
[17]
2d gaussian splatting for geometrically accurate radiance fields
Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. In SIGGRAPH 2024 Conference Papers. Association for Computing Machinery, 2024. doi: 10.1145/3641519.3657428
-
[18]
GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction
Yi-Chuan Huang, Hao-Jen Chien, Chin-Yang Lin, Ying-Huan Chen, and Yu-Lun Liu. Gamo: Geometry-aware multi-view diffusion outpainting for sparse-view 3d reconstruction. arXiv preprint arXiv:2512.25073, 2025
-
[19]
Team HY-World. Hy-world 2.0: A multi-modal world model for reconstructing, generating, and simulating 3d worlds. arXiv preprint, 2026
-
[20]
Comapgs: Covisibility map-based gaussian splatting for sparse novel view synthesis
Youngkyoon Jang and Eduardo Pérez-Pellitero. Comapgs: Covisibility map-based gaussian splatting for sparse novel view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 26779–26788, June 2025
-
[21]
Liren Jin, Xingguang Zhong, Yue Pan, Jens Behley, Cyrill Stachniss, and Marija Popović. ActiveGS: Active Scene Reconstruction Using Gaussian Splatting. IEEE Robotics and Automation Letters, 10(5):4866–4873, 2025. doi: 10.1109/LRA.2025.3555149
-
[22]
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023
-
[23]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4015–4026, 2023. doi: 10.1109/ICCV51070.2023.00371
-
[24]
FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context image ...
-
[25]
Geometric reasoning for single image structure recovery
David C. Lee, Martial Hebert, and Takeo Kanade. Geometric reasoning for single image structure recovery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2136–2143, 2009. doi: 10.1109/CVPR.2009.5206872
-
[26]
Xiaohan Lei, Min Wang, Wengang Zhou, and Houqiang Li. Gaussnav: Gaussian splatting for visual navigation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5): 4108–4121, 2025. doi: 10.1109/TPAMI.2025.3538496
-
[27]
Guandong Li and Mengxia Ye. Dual-channel attention guidance for training-free image editing control in diffusion transformers, 2026. URL https://arxiv.org/abs/2602.18022
-
[28]
Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normalization
Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, and Lin Gu. Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20775–20785, 2024
-
[29]
Freecontrol: Efficient, training-free structural control via one-step attention extraction
Jiang Lin, Xinyu Chen, Song Wu, Zhiqiu Zhang, Jizhi Zhang, Ye Wang, Qiang Tang, Qian Wang, Jian Yang, and Zili Yi. Freecontrol: Efficient, training-free structural control via one-step attention extraction. In Advances in Neural Information Processing Systems, 2025
-
[30]
Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations (ICLR), 2023. URL https://openreview.net/forum?id=PqvMRDCJT9t
-
[31]
PlaneRCNN: 3d plane detection and reconstruction from a single image
Chen Liu, Kihwan Kim, Jinwei Gu, Yasutaka Furukawa, and Jan Kautz. PlaneRCNN: 3d plane detection and reconstruction from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4450–4459, 2019. doi: 10.1109/CVPR.2019.00458
-
[32]
Reconx: Reconstruct any scene from sparse views with video diffusion model
Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, and Yueqi Duan. Reconx: Reconstruct any scene from sparse views with video diffusion model. IEEE Transactions on Image Processing, 35:2305–2319, 2026. doi: 10.1109/TIP.2026.3666733
-
[33]
Xinhang Liu, Jiaben Chen, Shiu-Hong Kao, Yu-Wing Tai, and Chi-Keung Tang. Deceptive-nerf/3dgs: Diffusion-generated pseudo-observations for high-quality sparse-view reconstruction. In Computer Vision – ECCV 2024, pages 337–355. Springer Nature Switzerland, 2024. doi: 10.1007/978-3-031-72640-8_19
-
[34]
Omniroam: World wandering via long-horizon panoramic video generation
Yuheng Liu, Xin Lin, Xinke Li, Baihan Yang, Chen Wang, Kalyan Sunkavalli, Yannick Hold-Geoffroy, Hao Tan, Kai Zhang, Xiaohui Xie, Zifan Shi, and Yiwei Hu. Omniroam: World wandering via long-horizon panoramic video generation. SIGGRAPH, 2026
-
[35]
You see it, you got it: Learning 3d creation on pose-free videos at scale
Baorui Ma, Huachen Gao, Haoge Deng, Zhengxiong Luo, Tiejun Huang, Lulu Tang, and Xinlong Wang. You see it, you got it: Learning 3d creation on pose-free videos at scale. In IEEE/CVF conference on computer vision and pattern recognition, 2025
-
[36]
Splatfields: Neural gaussian splats for sparse 3d and 4d reconstruction
Marko Mihajlovic, Sergey Prokudin, Siyu Tang, Robert Maier, Federica Bogo, Tony Tung, and Edmond Boyer. Splatfields: Neural gaussian splats for sparse 3d and 4d reconstruction. In Computer Vision – ECCV 2024, pages 313–332. Springer, 2024. doi: 10.1007/978-3-031-72627-9_18
-
[37]
G4splat: Geometry-guided gaussian splatting with generative prior
Junfeng Ni, Yixin Chen, Zhifei Yang, Yu Liu, Ruijie Lu, Song-Chun Zhu, and Siyuan Huang. G4splat: Geometry-guided gaussian splatting with generative prior. In The Fourteenth International Conference on Learning Representations, 2026
-
[38]
RegNeRF: Regularizing neural radiance fields for view synthesis from sparse inputs
Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Mehdi S. M. Sajjadi, Andreas Geiger, and Noha Radwan. RegNeRF: Regularizing neural radiance fields for view synthesis from sparse inputs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5470–5480, 2022. doi: 10.1109/CVPR52688.2022.00540
-
[39]
Ri3d: Few-shot gaussian splatting with repair and inpainting diffusion priors
Avinash Paliwal, Xilong Zhou, Wei Ye, Jinhui Xiong, Rakesh Ranjan, and Nima Khademi Kalantari. Ri3d: Few-shot gaussian splatting with repair and inpainting diffusion priors. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
-
[40]
Dropgaussian: Structural regularization for sparse-view gaussian splatting
Hyunwoo Park, Gun Ryu, and Wonjun Kim. Dropgaussian: Structural regularization for sparse-view gaussian splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21600–21609, 2025
-
[41]
Ad-gs: Alternating densification for sparse-input 3d gaussian splatting
Gurutva Patle, Nilay Girgaonkar, Nagabhushan Somraj, and Rajiv Soundararajan. Ad-gs: Alternating densification for sparse-input 3d gaussian splatting. In Proceedings of the SIGGRAPH Asia 2025 Conference Papers. Association for Computing Machinery, 2025. doi: 10.1145/3757377.3763993
-
[42]
Semantic image inversion and editing using rectified stochastic differential equations
L Rout, Y Chen, N Ruiz, C Caramanis, S Shakkottai, and W Chu. Semantic image inversion and editing using rectified stochastic differential equations. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=Hu0FSOSEyS
-
[43]
Meixi Song, Xin Lin, Dizhe Zhang, Haodong Li, Xiangtai Li, Bo Du, and Lu Qi. D²GS: Depth-and-density guided gaussian splatting for stable and accurate sparse-view reconstruction. In International Conference on Learning Representations, 2026
-
[44]
The Replica Dataset: A Digital Replica of Indoor Spaces
Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, et al. The replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797, 2019
-
[45]
HorizonNet: Learning room layout with 1d representation and pano stretch data augmentation
Cheng Sun, Chi-Wei Hsiao, Min Sun, and Hwann-Tzong Chen. HorizonNet: Learning room layout with 1d representation and pano stretch data augmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1047–1056, 2019. doi: 10.1109/CVPR.2019.00114
-
[47]
Wenhao Sun, Xue-Mei Dong, Benlei Cui, and Jingqun Tang. Attentive eraser: Unleashing diffusion model's object removal potential via self-attention redirection guidance. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 20734–20742, 2025. URL https://ojs.aaai.org/index.php/AAAI/article/view/34285
-
[48]
Oraclegs: Grounding generative priors for sparse-view gaussian splatting
Atakan Topaloğlu, Kunyi Li, Michael Niemeyer, Nassir Navab, A. Murat Tekalp, and Federico Tombari. Oraclegs: Grounding generative priors for sparse-view gaussian splatting. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 77–87, March 2026
-
[49]
Plug-and-play diffusion features for text-driven image-to-image translation
Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. Plug-and-play diffusion features for text-driven image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1921–1930, 2023
-
[50]
Dn-splatter: Depth and normal priors for gaussian splatting and meshing
Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, and Juho Kannala. Dn-splatter: Depth and normal priors for gaussian splatting and meshing. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2421–2431, 2025. doi: 10.1109/WACV61041.2025.00241
-
[51]
Guangcong Wang, Zhaoxi Chen, Chen Change Loy, and Ziwei Liu. SparseNeRF: Distilling depth ranking for few-shot novel view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9065–9076, October 2023. doi: 10.1109/ICCV51070.2023.00832
-
[52]
Dust3r: Geometric 3d vision made easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20697–20709, 2024. doi: 10.1109/CVPR52733.2024.01956. URL https://openaccess.thecvf.com/content/CVPR2024/html/Wang_DUSt3R_Geometric_3D_Vis...
-
[53]
π3: Permutation-equivariant visual geometry learning
Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, and Tong He. π3: Permutation-equivariant visual geometry learning. In International Conference on Learning Representations (ICLR), 2026
-
[54]
Image quality assessment: From error visibility to structural similarity
Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600–612, 2004. doi: 10.1109/TIP.2003.819861
-
[55]
PanoDiffusion: 360-degree panorama outpainting via diffusion
Tianhao Wu, Chuanxia Zheng, and Tat-Jen Cham. PanoDiffusion: 360-degree panorama outpainting via diffusion. In The Twelfth International Conference on Learning Representations, 2024
-
[56]
Ziyi Wu, Daniel Watson, Andrea Tagliasacchi, David J. Fleet, Marcus A. Brubaker, and Saurabh Saxena. 360Anything: Geometry-free lifting of images and videos to 360°. arXiv, 2026
-
[57]
Sparse view synthesis using 3d gaussian splatting
Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, and Achuta Kadambi. Sparse view synthesis using 3d gaussian splatting. arXiv preprint arXiv:2312.00206, 2025. doi: 10.48550/arXiv.2312.00206
-
[58]
Scannet++: A high-fidelity dataset of 3d indoor scenes
Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, and Angela Dai. Scannet++: A high-fidelity dataset of 3d indoor scenes. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12–22, 2023
-
[59]
Fewviewgs: Gaussian splatting with few view matching and multi-stage training
Ruihong Yin, Vladimir Yugay, Yue Li, Sezer Karaoglu, and Theo Gevers. Fewviewgs: Gaussian splatting with few view matching and multi-stage training. In Advances in Neural Information Processing Systems, volume 37, 2024
-
[60]
Fregs: 3d gaussian splatting with progressive frequency regularization
Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, and Eric Xing. Fregs: 3d gaussian splatting with progressive frequency regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21424–21433, 2024
-
[61]
Adding conditional control to text-to-image diffusion models
Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3836–3847, 2023
-
[62]
The unreasonable effectiveness of deep features as a perceptual metric
Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 586–595, 2018. doi: 10.1109/CVPR.2018.00068
-
[63]
Wancai Zheng, Hao Chen, Xianlong Lu, Linlin Ou, and Xinyi Yu. 3dgsnav: Enhancing vision-language model reasoning for object navigation via active 3d gaussian splatting, 2026
-
[64]
Nexusgs: Sparse view synthesis with epipolar depth priors in 3d gaussian splatting
Yulong Zheng, Zicheng Jiang, Shengfeng He, Yandu Sun, Junyu Dong, Huaidong Zhang, and Yong Du. Nexusgs: Sparse view synthesis with epipolar depth priors in 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
-
[65]
Yingji Zhong, Zhihao Li, Dave Zhenyu Chen, Lanqing Hong, and Dan Xu. Taming video diffusion prior with scene-grounding guidance for 3d gaussian splatting from sparse inputs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6133–6143, 2025
-
[66]
Learning to reconstruct 3d manhattan wireframes from a single image
Yichao Zhou, Haozhi Qi, Yuexiang Zhai, Qi Sun, Zhili Chen, Li-Yi Wei, and Yi Ma. Learning to reconstruct 3d manhattan wireframes from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7698–7707, 2019. doi: 10.1109/ICCV.2019.00779
-
[67]
Fsgs: Real-time few-shot view synthesis using gaussian splatting
Zehao Zhu, Zhiwen Fan, Yifan Jiang, and Zhangyang Wang. Fsgs: Real-time few-shot view synthesis using gaussian splatting. In Computer Vision – ECCV 2024, pages 145–163. Springer, 2024. doi: 10.1007/978-3-031-72933-1_9
Appendix (excerpt)
In this appendix, we provide additional discussion, experimental results, and technical details: implementation details (Sec. A), failure cases (Sec. B), and additional qualitative and quantitative results (Sec. D, C). A Implementation Details. General Configuration: All experiments are conducted on a NVIDIA A600...
The appendix also quotes the prompts used to label detected planes with a vision-language model:
- "What does this region look like?"
- "Is this region located on a wall, floor, ceiling or some other surface?"
- "Give your final answer as a single word: wall, floor, ceiling, bed, table, shelf, cabinet, window, door, or other."
The model generates up to 200 tokens of reasoning. We parse the response by finding the last occurrence of any label keyword; if the final keyword is "wall", "floor", or "ceiling", the plane is labeled layout, otherwise non-layout. Using the las...
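The last-keyword parsing rule can be sketched directly; the `"none"` fallback for responses containing no label keyword is our assumption, since the quoted text does not say what happens in that case:

```python
LABELS = ["wall", "floor", "ceiling", "bed", "table", "shelf",
          "cabinet", "window", "door", "other"]
LAYOUT = {"wall", "floor", "ceiling"}

def classify_plane(response: str) -> str:
    """Label a plane from a VLM response by the LAST label keyword mentioned:
    'layout' if it is wall/floor/ceiling, otherwise 'non-layout'
    ('none' if no keyword appears at all -- an assumed fallback)."""
    text = response.lower()
    last_label, last_pos = None, -1
    for label in LABELS:
        pos = text.rfind(label)       # position of last occurrence, -1 if absent
        if pos > last_pos:
            last_pos, last_label = pos, label
    if last_label is None:
        return "none"
    return "layout" if last_label in LAYOUT else "non-layout"
```

Taking the last occurrence matches the described behavior for chain-of-thought responses, where earlier keywords may be hypotheses that the model later discards.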