More: 3d visual geometry reconstruction meets mixture-of-experts
4 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
fields: cs.CV (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (1)
polarities: background (1)
Citing papers
- Walking in the Implicit: Interactive World Exploration via Neural Scene Representation
  NeuWorld uses a transformer VAE to learn compact Neural Implicit Scenes from sparse posed frames and a diffusion transformer to evolve them conditioned on camera trajectories for consistent interactive exploration.
- Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
  The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
- Towards Consistent Video Geometry Estimation
  ViGeo is a feed-forward transformer for video geometry that introduces dynamic chunking attention and a completion-based data refinement framework to achieve SOTA on depth, normals, and point map estimation.
- Large Depth Completion Model from Sparse Observations
  LDCM achieves state-of-the-art metric depth completion from sparse observations by combining foundation-model initialization with a point-map regression head that removes the need for camera intrinsics.