Pith · machine review for the scientific record

arXiv: 2605.09688 · v1 · submitted 2026-05-10 · 💻 cs.CV


ConFixGS: Learning to Fix Feedforward 3D Gaussian Splatting with Confidence-Aware Diffusion Priors in Driving Scenes

Jiaqi Ma, Markus Gross, Olaf Wysocki, Rui Song, Tianhui Cai, Xingcheng Zhou, Zewei Zhou, Zhiyu Huang


Pith reviewed 2026-05-12 03:31 UTC · model grok-4.3

classification 💻 cs.CV
keywords: 3D Gaussian Splatting · diffusion priors · novel view synthesis · driving scenes · confidence maps · feedforward reconstruction · sparse views · reprojection consistency

The pith

ConFixGS repairs feedforward 3D Gaussian Splatting in driving scenes by validating diffusion enhancements against support-view consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ConFixGS as a plug-and-play fix for feedforward 3D Gaussian Splatting models that produce poor results when only sparse trajectory views are available in driving data. It generates local pseudo-targets enhanced by diffusion models and then applies reprojection cross-checking from neighboring support views to create dense confidence maps. These maps direct which enhanced details to incorporate during refinement, keeping consistent content while discarding hallucinations. A sympathetic reader would care because accurate novel view synthesis from limited moving-camera feeds directly affects the reliability of 3D scene understanding in real-world navigation tasks.
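The paper describes this pipeline only at this level of detail. As a minimal, illustrative reading of how a reprojection cross-check could score each diffusion-enhanced pixel against support views, the sketch below assumes known intrinsics, camera-to-world poses, and a depth map rendered from the initial 3DGS; all function and variable names are ours, not the authors'.

```python
import numpy as np

def reprojection_confidence(enhanced, depth, K, T_tgt,
                            support_imgs, support_poses, tau=0.1):
    """Toy confidence map: warp each pixel of the diffusion-enhanced
    pseudo-target into every support view via the rendered depth and
    score it by photometric agreement. Pixels that no support view can
    corroborate receive low confidence. Shapes: enhanced (H, W, 3) in
    [0, 1], depth (H, W), K (3, 3); poses are 4x4 world-from-camera."""
    H, W, _ = enhanced.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T  # 3 x HW
    # Back-project target pixels to world space using the rendered depth.
    cam_pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    world = T_tgt @ np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])
    conf = np.zeros(H * W)
    for img, T_sup in zip(support_imgs, support_poses):
        # Project the world points into the support camera.
        cam = (np.linalg.inv(T_sup) @ world)[:3]
        z = np.clip(cam[2], 1e-6, None)
        uv = (K @ (cam / z))[:2]
        ui = np.clip(np.round(uv[0]).astype(int), 0, W - 1)
        vi = np.clip(np.round(uv[1]).astype(int), 0, H - 1)
        err = np.abs(img[vi, ui] - enhanced.reshape(-1, 3)).mean(-1)
        valid = (cam[2] > 0).astype(float)  # ignore points behind the camera
        conf += valid * np.exp(-err / tau)  # high agreement -> high confidence
    return (conf / len(support_imgs)).reshape(H, W)
```

Occlusions, depth-test gating, and sub-pixel sampling would all matter in a real implementation; this sketch only shows the cross-checking idea itself.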

Core claim

ConFixGS begins with a pretrained feedforward 3DGS model, produces diffusion-enhanced local pseudo-targets, and validates them through reprojection-based cross-checking against support views to build dense confidence maps. The maps then guide refinement so that reliable details from the priors are kept while hallucinated or inconsistent evidence is suppressed. On Waymo, nuScenes, and KITTI, this yields improved novel view synthesis, with PSNR gains of up to 3.68 dB and FID reduced by nearly half.

What carries the argument

The confidence-aware fusion pipeline that creates diffusion-enhanced pseudo-targets and filters them via reprojection cross-checking to produce dense maps that control refinement.
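As a rough illustration of what "maps that control refinement" could mean in practice, here is a hedged sketch of a confidence-modulated photometric objective; the mixing weight `lam` and the small unweighted term are our assumptions, not details taken from the paper.

```python
import torch

def confidence_weighted_loss(rendered, pseudo_target, confidence, lam=0.8):
    """Hypothetical confidence-modulated repair objective. Pixels whose
    diffusion-enhanced pseudo-target is corroborated by support views
    (confidence ~ 1) pull the rendering toward the enhanced target;
    uncorroborated pixels (confidence ~ 0) contribute little, so
    hallucinated detail cannot dominate the refinement. Tensors are
    (B, 3, H, W), except confidence, which is (B, 1, H, W)."""
    l1 = (rendered - pseudo_target).abs()
    weighted = (confidence * l1).mean()
    # A small unweighted term keeps gradients alive in low-confidence
    # regions instead of freezing them entirely (our assumption; the
    # abstract does not specify this).
    return lam * weighted + (1 - lam) * l1.mean()
```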

If this is right

  • Feedforward 3DGS models become usable for challenging sparse-view driving reconstructions without per-scene optimization.
  • Diffusion priors can be safely integrated into geometric reconstruction pipelines when filtered by view-consistency checks.
  • Novel view synthesis quality improves measurably on standard autonomous-driving benchmarks.
  • The same confidence-guided principle can be applied to other generative priors beyond diffusion.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may generalize to indoor or handheld sparse-view settings if the cross-checking remains robust to different motion patterns.
  • Reducing hallucinations in this way could lower the camera density required for acceptable 3D driving maps.
  • If the refinement step is made efficient, the method could support online map updates from vehicle fleets.

Load-bearing premise

Reprojection cross-checking against support views can reliably separate useful diffusion-enhanced details from hallucinated or inconsistent content in trajectory-based sparse-view driving scenes.

What would settle it

Running the refinement step without the confidence maps and measuring whether PSNR on held-out novel views drops below the reported gains or whether visual artifacts increase in regions the maps previously down-weighted.
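A minimal harness for that ablation might look as follows; `refine` and `render` are hypothetical stand-ins for the paper's (unreleased) refinement and rendering steps.

```python
import numpy as np

def psnr(img, ref):
    """PSNR in dB for images in [0, 1]."""
    mse = np.mean((img - ref) ** 2)
    return float("inf") if mse == 0 else -10.0 * np.log10(mse)

def ablate_confidence(scene, refine, render, heldout_views):
    """Compare refinement with real confidence maps against refinement
    with the maps forced to all-ones (i.e., trusting every enhanced
    pixel equally)."""
    full = refine(scene, use_confidence=True)
    flat = refine(scene, use_confidence=False)  # confidence == 1 everywhere
    gains = []
    for view in heldout_views:
        gains.append(psnr(render(full, view.pose), view.image) -
                     psnr(render(flat, view.pose), view.image))
    return np.mean(gains)  # > 0 would support the load-bearing premise
```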

Figures

Figures reproduced from arXiv: 2605.09688 by Jiaqi Ma, Markus Gross, Olaf Wysocki, Rui Song, Tianhui Cai, Xingcheng Zhou, Zewei Zhou, Zhiyu Huang.

Figure 1. ConFixGS: a plug-and-play repair of feedforward 3DGS in sparse driving scenes. Left: the confidence-guided method enhances state-of-the-art feedforward backbones, yielding better novel view rendering and consistent gains across metrics. Right: the repaired 3D Gaussians generalize well beyond the original camera trajectory, supporting novel view synthesis under large lateral offsets and elevated drone-like …
Figure 2. ConFixGS Framework Overview. ConFixGS consists of three stages: (i) initial feedforward reconstruction with local pseudo-view episodes, (ii) input-observation-guided confidence estimation through reprojection, and (iii) confidence-modulated global repair optimization. (Body text spilled into this caption notes that $c_i \in \mathbb{R}^3$ is the RGB color and that the repair stage adopts the standard logit/log reparameterization $\hat{o}_i = \sigma^{-1}(o_i)$, $\hat{s}_i = \log s_i$.)
Figure 3. Visual ablation study. The effects of individual components are visualized by removing one module at a time, with WorldMirror [71] as the feedforward backbone.
Figure 4. Visual comparison. DepthSplat [69] is used as the feedforward backbone; more results are shown in Figs. 7 and 8, with additional backbone comparisons in Figs. 9, 10, and 11.
Figure 5. Comparison of global and local feedforward 3DGS rendering on Waymo, nuScenes, and KITTI. For each scene, the global feedforward (FW) rendering of G0, which has to reconstruct the entire trajectory from sparse, weakly overlapping support views, is compared against the local FW rendering produced from a small subset around each novel view, together with the diffusion-enhanced version. Across all three datasets, …
Figure 6. Additional novel view synthesis results under more challenging viewpoint shifts, including lateral offsets of 1 m and 3 m, as well as a drone-style viewpoint at approximately 2.5 m height with a 20° downward pitch.
Figure 7. Qualitative comparison of DepthSplat [69] before and after ConFixGS enhancement – I.
Figure 8. Qualitative comparison of DepthSplat [69] before and after ConFixGS enhancement – II.
Figure 9. Qualitative comparison of WorldMirror [71] before and after ConFixGS enhancement.
Figure 10. Qualitative comparison of AnySplat [70] before and after ConFixGS enhancement.
Figure 11. Qualitative comparison of DrivingForward [72] before and after ConFixGS enhancement.
Figure 12. Qualitative comparison with Difix3D+ [17]. DepthSplat [69] is used as the feedforward backbone.
Original abstract

Feedforward 3D Gaussian Splatting (3DGS) often struggles in trajectory-based sparse-view driving scenes. Existing Gaussian repair methods mainly target optimization-based 3DGS, while diffusion-based repair is typically restricted to iterative refinement near observed viewpoints, leaving feedforward 3DGS repair underexplored. We propose ConFixGS, a plug-and-play method that learns to fix feedforward 3DGS with confidence-aware diffusion priors. Starting from a pretrained feedforward model, ConFixGS generates diffusion-enhanced local pseudo-targets and validates them through reprojection-based cross-checking against support views. The resulting dense confidence maps guide refinement, enhancing reliable details while suppressing hallucinated or inconsistent evidence. On Waymo, nuScenes, and KITTI, ConFixGS improves challenging novel view synthesis, with PSNR gains of up to 3.68 dB and FID reduced by nearly half. Our results highlight confidence-aware fusion of generative priors and support-view consistency as a key principle for robust feedforward 3D driving scene reconstruction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces ConFixGS, a plug-and-play refinement method for feedforward 3D Gaussian Splatting in trajectory-based sparse-view driving scenes. It generates diffusion-enhanced local pseudo-targets from pretrained diffusion models and validates them via reprojection-based cross-checking against support views to produce dense confidence maps. These maps then guide refinement of the initial 3DGS output, aiming to retain reliable details while suppressing hallucinations. The authors report quantitative gains on Waymo, nuScenes, and KITTI, with PSNR improvements up to 3.68 dB and FID reduced by nearly half for novel view synthesis.

Significance. If the confidence-aware filtering step proves reliable, the work would offer a practical advance for feedforward 3D reconstruction in autonomous driving by integrating generative priors without per-scene optimization. The focus on geometric consistency checks to control diffusion outputs addresses a relevant limitation in sparse-view settings and could influence hybrid reconstruction pipelines. The reported metrics indicate potential utility, though the absence of supporting implementation details and targeted validation limits immediate assessment of broader impact.

major comments (1)
  1. [Method (confidence map generation and refinement)] The core claim rests on the reprojection cross-checking step (described in the method overview and confidence map generation) reliably distinguishing useful diffusion-enhanced content from hallucinations. In the low-parallax, near-collinear trajectory regimes of the evaluated datasets, this check may fail to expose geometrically coherent but incorrect syntheses (e.g., fabricated lane markings or foliage) that reproject consistently across the limited support views. This is load-bearing for the reported PSNR/FID gains, as high-confidence erroneous regions would be incorporated into the refined 3DGS. The manuscript provides no dedicated analysis, ablation on baseline distance, or controlled hallucination tests to substantiate the filtering efficacy. (A back-of-envelope illustration of this low-parallax insensitivity follows the minor comments.)
minor comments (2)
  1. [Abstract] The abstract states concrete PSNR and FID numbers without specifying the exact baseline feedforward model, dataset splits, or comparison methods used to compute the 'up to 3.68 dB' gain.
  2. [Method and Experiments] No implementation details, diffusion model architecture, or training procedure for the confidence predictor are provided, which hinders reproducibility of the plug-and-play claim.
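To make the referee's low-parallax point concrete: under a pinhole model, a depth error ΔZ at depth Z displaces a reprojected pixel by roughly f·B·ΔZ/Z² for baseline B and focal length f, so small baselines make the cross-check weakly sensitive to depth-inconsistent content. A toy calculation with illustrative numbers (none taken from the paper):

```python
# Reprojection displacement caused by a depth error, pinhole model:
# delta_uv ≈ f * B * dZ / Z**2  (small-baseline approximation).
f, B = 1000.0, 0.5        # focal length (px), baseline between support views (m)
Z, dZ = 30.0, 3.0         # true depth (m) and a 10% depth error (m)
shift = f * B * dZ / Z**2
print(f"{shift:.2f} px")  # ~1.67 px: a 10% depth error barely moves the
                          # reprojection, so a geometrically wrong but
                          # plausible texture can still pass the check.
```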

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below and will incorporate additional validation in the revised manuscript.

Point-by-point responses
  1. Referee: [Method (confidence map generation and refinement)] The core claim rests on the reprojection cross-checking step (described in the method overview and confidence map generation) reliably distinguishing useful diffusion-enhanced content from hallucinations. In the low-parallax, near-collinear trajectory regimes of the evaluated datasets, this check may fail to expose geometrically coherent but incorrect syntheses (e.g., fabricated lane markings or foliage) that reproject consistently across the limited support views. This is load-bearing for the reported PSNR/FID gains, as high-confidence erroneous regions would be incorporated into the refined 3DGS. The manuscript provides no dedicated analysis, ablation on baseline distance, or controlled hallucination tests to substantiate the filtering efficacy.

    Authors: We agree that the reliability of the reprojection-based cross-checking is central to our approach, particularly in the challenging low-parallax conditions typical of driving trajectories. While our experiments on Waymo, nuScenes, and KITTI demonstrate consistent improvements in PSNR and FID, indicating that the confidence maps effectively filter hallucinations in practice, we acknowledge the lack of targeted ablations. In the revision, we will add an analysis of the confidence map generation, including ablations varying the baseline distance between support views and controlled tests using synthetic hallucinations to validate the filtering efficacy. This will strengthen the evidence for the method's robustness. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation; method is externally grounded

Full rationale

The paper presents ConFixGS as a plug-and-play refinement pipeline that starts from an independently pretrained feedforward 3DGS model, applies an external diffusion prior to generate pseudo-targets, and uses geometric reprojection against support views to produce confidence maps. No equations, fitted parameters, or self-referential definitions appear in the abstract or description; the central claims rest on the empirical performance of these independent components rather than on any construction that reduces outputs back to its own inputs. The approach is grounded in external benchmarks and does not invoke load-bearing self-citations or ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are specified in the provided text.

pith-pipeline@v0.9.0 · 5512 in / 1179 out tokens · 50729 ms · 2026-05-12T03:31:01.655502+00:00 · methodology



Reference graph

Works this paper leans on

110 extracted references · 110 canonical work pages · 3 internal anchors

[1] Hongyu Zhou, Longzhong Lin, Jiabao Wang, Yichong Lu, Dongfeng Bai, Bingbing Liu, Yue Wang, Andreas Geiger, and Yiyi Liao. Hugsim: A real-time, photo-realistic and closed-loop simulator for autonomous driving. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
[2] Wei Cao, Marcel Hallgarten, Tianyu Li, Daniel Dauner, Xunjiang Gu, Caojun Wang, Yakov Miron, Marco Aiello, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, and Kashyap Chitta. Pseudo-simulation for autonomous driving. In Conference on Robot Learning (CoRL), 2025.
[3] Seth Z Zhao, Luobin Wang, Hongwei Ruan, Yuxin Bao, Yilan Chen, Ziyang Leng, Abhijit Ravichandran, Honglin He, Zewei Zhou, Xu Han, et al. BridgeSim: Unveiling the OL-CL gap in end-to-end autonomous driving. arXiv preprint arXiv:2604.10856, 2026.
[4] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), July 2023.
[5] David Charatan, Sizhe Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In CVPR, 2024.
[6] Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In European Conference on Computer Vision, pages 370–386. Springer, 2024.
[7] Yunsong Wang, Tianxin Huang, Hanlin Chen, and Gim Hee Lee. Freesplat: Generalizable 3d gaussian splatting towards free view synthesis of indoor scenes. Advances in Neural Information Processing Systems, 37:107326–107349, 2024.
[8] Sunghwan Hong, Jaewoo Jung, Heeseong Shin, Jisang Han, Jiaolong Yang, Chong Luo, and Seungryong Kim. PF3plat: Pose-free feed-forward 3d gaussian splatting for novel view synthesis. In Forty-second International Conference on Machine Learning, 2025.
[9] Botao Ye, Sifei Liu, Haofei Xu, Xueting Li, Marc Pollefeys, Ming-Hsuan Yang, and Songyou Peng. No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images. arXiv preprint arXiv:2410.24207, 2024.
[10] Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, and Jan Eric Lenssen. latentsplat: Autoencoding variational gaussians for fast generalizable 3d reconstruction. In European Conference on Computer Vision, pages 456–473. Springer, 2024.
[11] Brandon Smart, Chuanxia Zheng, Iro Laina, and Victor Adrian Prisacariu. Splatt3r: Zero-shot gaussian splatting from uncalibrated image pairs. arXiv preprint arXiv:2408.13912, 2024.
[12] Botao Ye, Boqi Chen, Haofei Xu, Daniel Barath, and Marc Pollefeys. Yonosplat: You only need one model for feedforward 3d gaussian splatting. arXiv preprint arXiv:2511.07321, 2025.
[13] Qirui Hou, Wenzhang Sun, Chang Zeng, Chunfeng Wang, Hao Li, and Jianxun Cui. Drivingscene: A multi-task online feed-forward 3d gaussian splatting method for dynamic driving scenes. In ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 10287–10291. IEEE, 2026.
[14] Rui Song, Tianhui Cai, Markus Gross, Yun Zhang, Walter Zimmer, Zhiyu Huang, Olaf Wysocki, and Jiaqi Ma. Energs: Energy-based gaussian splatting with partial geometric priors. arXiv preprint arXiv:2604.26238, 2026.
[15] Junhong Lin, Kangli Wang, Shunzhou Wang, Songlin Fan, Ge Li, and Wei Gao. Vgd: Visual geometry gaussian splatting for feed-forward surround-view driving reconstruction. arXiv preprint arXiv:2510.19578, 2025.
[16] Huasong Han, Kaixuan Zhou, Xiaoxiao Long, Yusen Wang, and Chunxia Xiao. Ggs: Generalizable gaussian splatting for lane switching in autonomous driving. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 3329–3337, 2025.
[17] Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, and Huan Ling. Difix3d+: Improving 3d reconstructions with single-step diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26024–26035, 2025.
[18] Xi Liu, Chaoyi Zhou, and Siyu Huang. 3dgs-enhancer: Enhancing unbounded 3d gaussian splatting with view-consistent 2d diffusion priors. Advances in Neural Information Processing Systems, 37:133305–133327, 2024.
[19] Zhaorui Wang, Yi Gu, Deming Zhou, and Renjing Xu. Fixinggs: Enhancing 3d gaussian splatting via training-free score distillation. arXiv preprint arXiv:2509.18759, 2025.
[20] Jiaxin Wei, Stefan Leutenegger, and Simon Schaefer. Gsfix3d: Diffusion-guided repair of novel views in gaussian splatting. arXiv preprint arXiv:2508.14717, 2025.
[21] Xingyilang Yin, Qi Zhang, Jiahao Chang, Ying Feng, Qingnan Fan, Xi Yang, Chi-Man Pun, Huaqi Zhang, and Xiaodong Cun. Gsfixer: Improving 3d gaussian splatting with reference-guided video diffusion priors. arXiv preprint arXiv:2508.09667, 2025.
[22] Avinash Paliwal, Xilong Zhou, Wei Ye, Jinhui Xiong, Rakesh Ranjan, and Nima Khademi Kalantari. Ri3d: Few-shot gaussian splatting with repair and inpainting diffusion priors. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 25094–25103, 2025.
[23] Hanyang Yu, Xiaoxiao Long, and Ping Tan. Lm-gaussian: Boost sparse-view 3d gaussian splatting with large model priors. arXiv preprint arXiv:2409.03456, 2024.
[24] Yunzhi Yan, Zhen Xu, Haotong Lin, Haian Jin, Haoyu Guo, Yida Wang, Kun Zhan, Xianpeng Lang, Hujun Bao, Xiaowei Zhou, et al. Streetcrafter: Street view synthesis with controllable video diffusion models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 822–832, 2025.
[25] Junyoung Seo, Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, Chieh-Hsin Lai, Seungryong Kim, and Yuki Mitsufuji. Genwarp: Single image to novel views with semantic-preserving generative warping. Advances in Neural Information Processing Systems, 37:80220–80243, 2024.
[26] Norman Müller, Katja Schwarz, Barbara Rössle, Lorenzo Porzi, Samuel Rota Bulo, Matthias Nießner, and Peter Kontschieder. Multidiff: Consistent novel view synthesis from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10258–10268, 2024.
[27] Zeyu Yang, Zijie Pan, Yuankun Yang, Xiatian Zhu, and Li Zhang. Driving view synthesis on free-form trajectories with generative prior. arXiv preprint arXiv:2412.01717, 2024.
[28] Lue Fan, Hao Zhang, Qitai Wang, Hongsheng Li, and Zhaoxiang Zhang. Freesim: Toward free-viewpoint camera simulation in driving scenes. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12004–12014, 2025.
[29] Qitai Wang, Lue Fan, Yuqi Wang, Yuntao Chen, and Zhaoxiang Zhang. Freevs: Generative view synthesis on free driving trajectory. arXiv preprint arXiv:2410.18079, 2024.
[30] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
[31] Jonathan T Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
[32] Kai Zhang, Gernot Riegler, Noah Snavely, and Vladlen Koltun. Nerf++: Analyzing and improving neural radiance fields. arXiv preprint arXiv:2010.07492, 2020.
[33] Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5479, 2022.
[34] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19447–19456, 2024.
[35] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20654–20664, 2024.
[36] Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. In SIGGRAPH 2024 Conference Papers. Association for Computing Machinery, 2024.
[37] Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, and Guofeng Zhang. Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction. IEEE Transactions on Visualization and Computer Graphics, 2024.
[38] Antoine Guédon and Vincent Lepetit. Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5354–5363, 2024.
[39] Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, and Xuejin Chen. Gaussianpro: 3d gaussian splatting with progressive propagation. In Forty-first International Conference on Machine Learning, 2024.
[40] Saswat Subhajyoti Mallick, Rahul Goel, Bernhard Kerbl, Markus Steinberger, Francisco Vicente Carrasco, and Fernando De La Torre. Taming 3dgs: High-quality radiance fields with limited resources. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024.
[41] Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, and Federico Tombari. Geogaussian: Geometry-aware gaussian splatting for scene rendering. In European Conference on Computer Vision, pages 441–457. Springer, 2024.
[42] Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, and Juho Kannala. Dn-splatter: Depth and normal priors for gaussian splatting and meshing. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2421–2431. IEEE, 2025.
[43] Jaeyoung Chung, Jeongtaek Oh, and Kyoung Mu Lee. Depth-regularized optimization for 3d gaussian splatting in few-shot images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 811–820, 2024.
[44] Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, and Lin Gu. Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20775–20785, 2024.
[45] Han Huang, Yulun Wu, Chao Deng, Ge Gao, Ming Gu, and Yu-Shen Liu. Fatesgs: Fast and accurate sparse-view surface reconstruction using gaussian splatting with depth-feature consistency. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 3644–3652, 2025.
[46] Qilin Zhang, Olaf Wysocki, Steffen Urban, and Boris Jutzi. Cdgs: Confidence-aware depth regularization for 3d gaussian splatting. arXiv preprint arXiv:2502.14684, 2025.
[47] Zexu Huang, Min Xu, and Stuart Perry. Det-gs: Depth- and edge-aware regularization for high-fidelity 3d gaussian splatting. arXiv preprint arXiv:2508.04099, 2025.
[48] Shuangkang Fang, I Shen, Takeo Igarashi, Yufeng Wang, ZeSheng Wang, Yi Yang, Wenrui Ding, Shuchang Zhou, et al. Nerf is a valuable assistant for 3d gaussian splatting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 26230–26240, 2025.
[49] Jianfei Guo, Nianchen Deng, Xinyang Li, Yeqi Bai, Botian Shi, Chiyu Wang, Chenjing Ding, Dongliang Wang, and Yikang Li. Streetsurf: Extending multi-view implicit surface reconstruction to street views. arXiv preprint arXiv:2306.04988, 2023.
[50] Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, and Ming-Hsuan Yang. Drivinggaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21634–21643, 2024.
[51] Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. Street gaussians: Modeling dynamic urban scenes with gaussian splatting. In European Conference on Computer Vision, pages 156–173. Springer, 2024.
[52] Changjian Jiang, Ruilan Gao, Kele Shao, Yue Wang, Rong Xiong, and Yu Zhang. Li-gs: Gaussian splatting with lidar incorporated for accurate large-scale reconstruction. IEEE Robotics and Automation Letters, 2024.
[53] Jian Shen, Huai Yu, Ji Wu, Wen Yang, and Gui-Song Xia. Lidar-enhanced 3d gaussian splatting mapping. arXiv preprint arXiv:2503.05425, 2025.
[54] Jaewon Lee, Mangyu Kong, Minseong Park, and Euntai Kim. Geomgs: Lidar-guided geometry-aware gaussian splatting for robot localization. arXiv preprint arXiv:2501.13417, 2025.
[55] Kejing Xia, Jidong Jia, Ke Jin, Yucai Bai, Li Sun, Dacheng Tao, and Youjian Zhang. D$^2$GS: Dense depth regularization for lidar-free urban scene reconstruction. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026.
[56] Xiao Cui, Weicai Ye, Yifan Wang, Guofeng Zhang, Wengang Zhou, Tong He, and Houqiang Li. Streetsurfgs: Scalable urban street surface reconstruction with planar-based gaussian splatting. IEEE Transactions on Circuits and Systems for Video Technology, 2025.
[57] Sheng Miao, Jiaxin Huang, Dongfeng Bai, Xu Yan, Hongyu Zhou, Yue Wang, Bingbing Liu, Andreas Geiger, and Yiyi Liao. Evolsplat: Efficient volume-based gaussian splatting for urban view synthesis. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11286–11296, 2025.
[58] Georg Hess, Carl Lindström, Maryam Fatemi, Christoffer Petersson, and Lennart Svensson. Splatad: Real-time lidar and camera rendering with 3d gaussian splatting for autonomous driving. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11982–11992, 2025.
[59] Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20310–20320, 2024.
[60] Jiawei Xu, Kai Deng, Zexin Fan, Shenlong Wang, Jin Xie, and Jian Yang. Ad-gs: Object-aware b-spline gaussian splatting for self-supervised autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 24770–24779, 2025.
[61] Yurui Chen, Chun Gu, Junzhe Jiang, Xiatian Zhu, and Li Zhang. Periodic vibration gaussian: Dynamic urban scene reconstruction and real-time rendering. International Journal of Computer Vision, 134(3):83, 2026.
[62] Jiawei Yang, Jiahui Huang, Yuxiao Chen, Yan Wang, Boyi Li, Yurong You, Apoorva Sharma, Maximilian Igl, Peter Karkus, Danfei Xu, et al. Storm: Spatio-temporal reconstruction model for large-scale outdoor scenes. arXiv preprint arXiv:2501.00602, 2024.
[63] Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, and Jianfei Cai. Mvsplat360: Feed-forward 360 scene synthesis from sparse views. Advances in Neural Information Processing Systems, 37:107064–107086, 2024.
[64] Chuanrui Zhang, Yingshuang Zou, Zhuoling Li, Minmin Yi, and Haoqian Wang. Transplat: Generalizable 3d gaussian splatting from sparse multi-view images with transformers. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 9869–9877, 2025.
[65] Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, and Song-Hai Zhang. Triplane meets gaussian splatting: Fast and generalizable single-view 3d reconstruction with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10324–10335, 2024.
[66] Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. Lgm: Large multi-view gaussian model for high-resolution 3d content creation. In European Conference on Computer Vision, pages 1–18. Springer, 2024.
[67] Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. Gs-lrm: Large reconstruction model for 3d gaussian splatting. In European Conference on Computer Vision, pages 1–19. Springer, 2024.
[68] Sunghwan Hong, Jaewoo Jung, Heeseong Shin, Jisang Han, Jiaolong Yang, Chong Luo, and Seungryong Kim. Pf3plat: Pose-free feed-forward 3d gaussian splatting. arXiv preprint arXiv:2410.22128, 2024.
[69] Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. Depthsplat: Connecting gaussian splatting and depth. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 16453–16463, 2025.
[70] Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. Anysplat: Feed-forward 3d gaussian splatting from unconstrained views. ACM Transactions on Graphics (TOG), 44(6):1–16, 2025.
[71] Yifan Liu, Zhiyuan Min, Zhenwei Wang, Junta Wu, Tengfei Wang, Yixuan Yuan, Yawei Luo, and Chunchao Guo. Worldmirror: Universal 3d world reconstruction with any-prior prompting. arXiv preprint arXiv:2510.10726, 2025.
[72] Qijian Tian, Xin Tan, Yuan Xie, and Lizhuang Ma. Drivingforward: Feed-forward 3d gaussian splatting for driving scene reconstruction from flexible surround-view input. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025.
[73] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[74] Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18392–18402, 2023.
[75] Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. Zero-1-to-3: Zero-shot one image to 3d object. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9298–9309, 2023.
[76] Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, and Wenping Wang. Syncdreamer: Generating multiview-consistent images from a single-view image. arXiv preprint arXiv:2309.03453, 2023.
[77] Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, and Xiao Yang. MVDream: Multi-view diffusion for 3d generation. In The Twelfth International Conference on Learning Representations, 2024.
[78] Thiemo Alldieck, Nikos Kolotouros, and Cristian Sminchisescu. Score distillation sampling with learned manifold corrective. In European Conference on Computer Vision, pages 1–18. Springer, 2024.
[79] Ayaan Haque, Matthew Tancik, Alexei Efros, Aleksander Holynski, and Angjoo Kanazawa. Instruct-nerf2nerf: Editing 3d scenes with instructions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[80] Jiatao Gu, Alex Trevithick, Kai-En Lin, Joshua M. Susskind, Christian Theobalt, Lingjie Liu, and Ravi Ramamoorthi. NerfDiff: Single-image view synthesis with NeRF-guided distillation from 3D-aware diffusion. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th Internatio…

Showing first 80 references.