arXiv preprint arXiv:2506.10981 (2025)

2 Pith papers cite this work.

abstract

Generative models have shown great promise for novel view synthesis (NVS) by leveraging strong image generation priors. However, existing approaches typically follow a 2D inpainting paradigm, first completing missing image regions and then performing 3D reconstruction. This strategy often causes geometry distortion and appearance drift, as 2D inpainting models cannot reliably infer the underlying 3D structure required for cross-view consistent generation. In this paper, we propose SceneCompleter, a geometry-aware framework that reformulates generative NVS as dense 3D scene completion. Instead of hallucinating isolated 2D views, SceneCompleter jointly completes geometry and appearance through a geometry-appearance dual-stream diffusion model in a spatially aligned RGBD latent space. To provide holistic scene context, we further introduce a Scene Embedder that conditions generation on global semantic and stylistic information from reference images. The completed RGBD predictions are then aligned and integrated into an expandable 3D scene representation, enabling iterative and coherent scene completion. Extensive experiments on in-domain and out-of-distribution datasets demonstrate that SceneCompleter produces visually plausible and geometrically consistent novel views across diverse scenarios. Project Page: https://chen-wl20.github.io/SceneCompleter
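The abstract says completed RGBD predictions are "aligned and integrated into an expandable 3D scene representation" but does not say how. As a hedged illustration only, the sketch below assumes one common recipe, not necessarily SceneCompleter's: fit a per-view scale and shift that maps predicted depth onto depth rendered from the existing scene at already-observed pixels (closed-form least squares), then unproject only the newly completed pixels with an assumed pinhole camera and append them to a growing colored point list. The names `fit_scale_shift`, `PointScene`, `integrate`, the `(fx, fy, cx, cy)` intrinsics tuple and the 3x4 `cam_to_world` matrix are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


def fit_scale_shift(pred: Sequence[float], ref: Sequence[float],
                    mask: Sequence[bool]) -> Tuple[float, float]:
    """Closed-form least squares for (s, t) minimizing sum((s*pred + t - ref)^2)
    over pixels where mask is True (assumed: pixels already covered by the scene)."""
    pairs = [(p, r) for p, r, m in zip(pred, ref, mask) if m]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two overlapping pixels")
    sp = sum(p for p, _ in pairs)
    sr = sum(r for _, r in pairs)
    spp = sum(p * p for p, _ in pairs)
    spr = sum(p * r for p, r in pairs)
    denom = n * spp - sp * sp
    if denom == 0:
        raise ValueError("predicted depth is constant on the overlap")
    s = (n * spr - sp * sr) / denom
    t = (sr - s * sp) / n
    return s, t


@dataclass
class PointScene:
    """Hypothetical expandable scene: colored 3D points in world coordinates."""
    points: List[Tuple[float, float, float]] = field(default_factory=list)
    colors: List[Tuple[int, int, int]] = field(default_factory=list)

    def integrate(self, rgb, depth, width, intrinsics, cam_to_world, keep):
        """Unproject row-major RGBD pixels where keep is True and append them.

        intrinsics = (fx, fy, cx, cy) of a pinhole camera (assumed);
        cam_to_world = 3x4 [R | t] matrix as nested lists (assumed).
        """
        fx, fy, cx, cy = intrinsics
        for i, (z, color, k) in enumerate(zip(depth, rgb, keep)):
            if not k or z <= 0:
                continue  # skip already-covered or invalid pixels
            u, v = i % width, i // width
            # Pinhole back-projection into the camera frame.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Apply the rigid camera-to-world transform: R @ cam + t.
            world = tuple(
                sum(cam_to_world[r][j] * cam[j] for j in range(3)) + cam_to_world[r][3]
                for r in range(3)
            )
            self.points.append(world)
            self.colors.append(color)


if __name__ == "__main__":
    pred = [1.0, 2.0, 3.0, 4.0]           # toy completed depth for a 2x2 view
    rendered = [2.0, 4.0, 6.0, 0.0]       # toy depth rendered from the current scene
    observed = [True, True, True, False]  # pixel 3 is newly completed
    s, t = fit_scale_shift(pred, rendered, observed)
    aligned = [s * d + t for d in pred]
    scene = PointScene()
    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    scene.integrate([(0, 0, 0)] * 3 + [(255, 0, 0)], aligned, 2,
                    (1.0, 1.0, 0.5, 0.5), identity, [not m for m in observed])
    print(s, t, scene.points)
```

In this toy run the fit recovers s = 2 and t = 0, and only the single newly completed pixel is added, at (4, 4, 8). Integrating only uncovered pixels is a design assumption meant to avoid duplicating existing geometry; the abstract does not specify which representation or merging rule the paper actually uses.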

representative citing papers

  • TivTok: Broadcasting Time-Invariant Tokens for Scalable Video Tokenization · cs.CV · 2026-06-16 · unverdicted · ref 72

    TivTok factorizes video clips into reusable time-invariant tokens and frame-specific time-variant tokens via Scope-Induced Factorization and Invariant Broadcasting, reporting 2.91x better compression on 128-frame video benchmarks.

  • AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model · cs.CV · 2026-04-21 · unverdicted · ref 4

    AnyRecon enables scalable 3D reconstruction from arbitrary sparse unordered views by combining video diffusion with explicit global geometric memory and retrieval to maintain consistency across large viewpoint changes.