VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction

Bohan Zhuang; Donny Y. Chen; Feng Chen; Haoxiao Wang; Hengyu Liu; Jia-Wang Bian; Weijie Wang; Wenkang Qin; Yeqing Chen; Zeyu Zhang

arxiv: 2509.19297 · v3 · pith:4NLAZEHLnew · submitted 2025-09-23 · 💻 cs.CV


Weijie Wang, Yeqing Chen, Zeyu Zhang, Hengyu Liu, Haoxiao Wang, Zhiyuan Feng, Wenkang Qin, Feng Chen, Jia-Wang Bian, Zheng Zhu, Donny Y. Chen, Bohan Zhuang


classification 💻 cs.CV

keywords gaussian · volsplat · alignment · feed-forward · gaussians · pixel · consistency · density


Abstract

Feed-forward 3D Gaussian Splatting (3DGS) has emerged as a highly effective solution for novel view synthesis. Existing methods predominantly rely on a pixel-aligned Gaussian prediction paradigm, where each 2D pixel is mapped to a 3D Gaussian. We rethink this widely adopted formulation and identify several inherent limitations: it renders the reconstructed 3D models heavily dependent on the number of input views, leads to view-biased density distributions, and introduces alignment errors, particularly when source views contain occlusions or low texture. To address these challenges, we introduce VolSplat, a new multi-view feed-forward paradigm that replaces pixel alignment with voxel-aligned Gaussians. By directly predicting Gaussians from a predicted 3D voxel grid, it overcomes pixel alignment's reliance on error-prone 2D feature matching, ensuring robust multi-view consistency. Furthermore, it enables adaptive control over density based on 3D scene complexity, yielding more faithful Gaussians, improved geometric consistency, and enhanced novel-view rendering quality. Experiments on widely used benchmarks demonstrate that VolSplat achieves state-of-the-art performance, while producing more plausible and view-consistent results. The video results, code and trained models are available on our project page: https://lhmd.top/volsplat.
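The contrast the abstract draws between the two paradigms can be illustrated with a toy count of primitives. The sketch below is not the VolSplat implementation: the function names, the flat test surface, and the voxel size are all assumptions made for illustration. It only shows that a one-Gaussian-per-pixel scheme grows with the number of input views, while the number of occupied voxels is bounded by the scene extent.

```python
import math
import random


def pixel_aligned_count(num_views, height, width):
    """One Gaussian per pixel per view, so the count grows linearly with views."""
    return num_views * height * width


def voxelize(points, voxel_size):
    """Return the set of integer voxel indices occupied by the 3D points."""
    return {
        (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        for x, y, z in points
    }


def sample_surface_points(num_views, points_per_view, seed=0):
    """Hypothetical stand-in for unprojected pixels.

    Every view is assumed to observe the same unit square on the plane z = 1.
    """
    rng = random.Random(seed)
    total = num_views * points_per_view
    return [(rng.random(), rng.random(), 1.0) for _ in range(total)]


if __name__ == "__main__":
    voxel_size = 0.125  # assumed resolution, chosen only for the example
    for views in (2, 4, 8):
        points = sample_surface_points(views, points_per_view=64 * 64)
        occupied = voxelize(points, voxel_size)
        print(views, pixel_aligned_count(views, 64, 64), len(occupied))
```

In this toy setting the pixel-aligned count doubles each time the number of views doubles, whereas the occupied-voxel count cannot exceed the 64 cells that cover the observed surface, regardless of how many views contribute points.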

This paper has not been read by Pith yet.


Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learn2Splat: Extending the Horizon of Learned 3DGS Optimization
cs.CV 2026-05 unverdicted novelty 7.0

A meta-learned optimizer for 3DGS that extends the optimization horizon via checkpoint buffers and latent gradient-scale encoding, delivering better early novel-view quality and long-term stability with zero-shot gene...

AdaptSplat: Adapting Vision Foundation Models for Feed-Forward 3D Gaussian Splatting
cs.CV 2026-05 unverdicted novelty 7.0

AdaptSplat adds a Frequency-Preserving Adapter to vision foundation models to boost high-frequency fidelity and cross-domain performance in feed-forward 3D Gaussian Splatting.

SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis
cs.CV 2026-05 unverdicted novelty 7.0

SplatWeaver uses cardinality Gaussian experts and pixel-level routing to dynamically allocate varying numbers of Gaussian primitives for generalizable novel view synthesis.

SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis
cs.CV 2026-05 unverdicted novelty 7.0

SplatWeaver dynamically allocates Gaussian primitives via cardinality experts and pixel-level routing guided by high-frequency cues for improved generalizable novel view synthesis.

Mix3R: Mixing Feed-forward Reconstruction and Generative 3D Priors for Joint Multi-view Aligned 3D Reconstruction and Pose Estimation
cs.CV 2026-05 unverdicted novelty 7.0

Mix3R mixes feed-forward reconstruction and generative 3D priors via Mixture-of-Transformers and overlap-based attention bias to achieve better-aligned 3D shapes and more accurate poses than either approach alone.

GSCompleter: A Distillation-Free Plugin for Metric-Aware 3D Gaussian Splatting Completion in Seconds
cs.CV 2026-04 unverdicted novelty 7.0

GSCompleter completes sparse 3D Gaussian Splatting scenes via a distillation-free generate-then-register pipeline using Stereo-Anchor lifting and Ray-Constrained Registration, delivering SOTA results on three benchmarks.

GSCompleter: A Distillation-Free Plugin for Metric-Aware 3D Gaussian Splatting Completion in Seconds
cs.CV 2026-04 unverdicted novelty 7.0

GSCompleter completes 3DGS scenes from sparse viewpoints using a generate-then-register workflow with stereo-anchor view selection and ray-constrained registration to achieve metric-aware results and SOTA performance ...

GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens
cs.CV 2026-04 unverdicted novelty 7.0

GlobalSplat achieves competitive novel-view synthesis on RealEstate10K and ACID using only 16K Gaussians via global scene tokens and coarse-to-fine training, with a 4MB footprint and under 78ms inference.

AnchorSplat: Feed-Forward 3D Gaussian Splatting with 3D Geometric Priors
cs.CV 2026-04 unverdicted novelty 7.0

AnchorSplat uses anchor-aligned 3D Gaussians guided by geometric priors for feed-forward scene reconstruction, achieving SOTA novel view synthesis on ScanNet++ with fewer primitives and better view consistency.

Free-Range Gaussians: Non-Grid-Aligned Generative 3D Gaussian Reconstruction
cs.CV 2026-04 unverdicted novelty 7.0

Free-Range Gaussians uses flow matching over Gaussian parameters to predict non-grid-aligned 3D Gaussians from multi-view images, enabling synthesis of plausible content in unobserved regions with fewer primitives tha...

Latent Spatial Memory for Video World Models
cs.CV 2026-06 unverdicted novelty 6.0

Mirage stores and queries 3D scene information in diffusion latent space via depth-guided lifting and warping, yielding 10.57× faster generation and 55× smaller memory than explicit RGB point-cloud baselines while rea...

Last-Layer-Centric Feature Recombination: Unleashing 3D Geometric Knowledge in DINOv3 for Monocular Depth Estimation
cs.CV 2026-04 unverdicted novelty 6.0

Layer analysis of DINOv3 shows non-uniform 3D geometric knowledge concentrated in deeper layers, enabling a last-layer-centric recombination module that improves monocular depth estimation accuracy to state-of-the-art levels.

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
cs.CV 2026-04 unverdicted novelty 6.0

The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temp...

C3G: Learning Compact 3D Representations with 2K Gaussians
cs.CV 2025-12 unverdicted novelty 6.0

C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.

PhysRAG: Enhancing Physics-Awareness in Video Generation via Retrieval-Augmented Generation
cs.CV 2026-06 unverdicted novelty 5.0

PhysRAG curates 7K videos from WISA-80K, builds a physical video database, and injects knowledge via learnable queries into a diffusion model to reach SOTA visual quality and physical compliance on PhyGenBench and VBench.

AdaptSplat: Adapting Vision Foundation Models for Feed-Forward 3D Gaussian Splatting
cs.CV 2026-05 unverdicted novelty 5.0

AdaptSplat adds a lightweight Frequency-Preserving Adapter to vision foundation models that extracts direction-aware high-frequency priors and integrates them via positional encodings and residual modulation to improv...

UniMesh: Unifying 3D Mesh Understanding and Generation
cs.CV 2026-04 unverdicted novelty 5.0

UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.