arXiv preprint arXiv:2506.10981 (2025) 4

Chen, W · 2025 · cs.CV · arXiv 2506.10981

2 Pith papers cite this work. Polarity classification is still indexing.



abstract

Generative models have shown great promise for novel view synthesis (NVS) by leveraging strong image generation priors. However, existing approaches typically follow a 2D inpainting paradigm, first completing missing image regions and then performing 3D reconstruction. This strategy often causes geometry distortion and appearance drift, as 2D inpainting models cannot reliably infer the underlying 3D structure required for cross-view consistent generation. In this paper, we propose SceneCompleter, a geometry-aware framework that reformulates generative NVS as dense 3D scene completion. Instead of hallucinating isolated 2D views, SceneCompleter jointly completes geometry and appearance through a geometry-appearance dual-stream diffusion model in a spatially aligned RGBD latent space. To provide holistic scene context, we further introduce a Scene Embedder that conditions generation on global semantic and stylistic information from reference images. The completed RGBD predictions are then aligned and integrated into an expandable 3D scene representation, enabling iterative and coherent scene completion. Extensive experiments on in-domain and out-of-distribution datasets demonstrate that SceneCompleter produces visually plausible and geometrically consistent novel views across diverse scenarios. Project Page: https://chen-wl20.github.io/SceneCompleter

representative citing papers

TivTok: Broadcasting Time-Invariant Tokens for Scalable Video Tokenization

cs.CV · 2026-06-16 · unverdicted · novelty 6.0

TivTok factorizes video clips into reusable time-invariant tokens and frame-specific time-variant tokens via Scope-Induced Factorization and Invariant Broadcasting, achieving 2.91x better compression for 128-frame videos on benchmarks.

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

AnyRecon enables scalable 3D reconstruction from arbitrary sparse unordered views by combining video diffusion with explicit global geometric memory and retrieval to maintain consistency across large viewpoint changes.

citing papers explorer


TivTok: Broadcasting Time-Invariant Tokens for Scalable Video Tokenization cs.CV · 2026-06-16 · unverdicted · none · ref 72 · internal anchor
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model cs.CV · 2026-04-21 · unverdicted · none · ref 4 · internal anchor
