pith. machine review for the scientific record

arxiv: 2512.04021 · v2 · submitted 2025-12-03 · 💻 cs.CV

Recognition: unknown

C3G: Learning Compact 3D Representations with 2K Gaussians

Authors on Pith: no claims yet
classification: 💻 cs.CV
keywords: feature, gaussian, understanding, compact, features, gaussians, novel, scene
abstract

Reconstructing and understanding 3D scenes from unposed sparse views in a feed-forward manner remains a challenging task in 3D computer vision. Recent approaches use per-pixel 3D Gaussian Splatting for reconstruction, followed by a 2D-to-3D feature lifting stage for scene understanding. However, they generate excessive redundant Gaussians, causing high memory overhead and sub-optimal multi-view feature aggregation, leading to degraded novel view synthesis and scene understanding performance. We propose C3G, a novel feed-forward framework that estimates compact 3D Gaussians only at essential spatial locations, minimizing redundancy while enabling effective feature lifting. We introduce learnable tokens that aggregate multi-view features through self-attention to guide Gaussian generation, ensuring each Gaussian integrates relevant visual features across views. We then exploit the learned attention patterns for Gaussian decoding to efficiently lift features. Extensive experiments on pose-free novel view synthesis, 3D open-vocabulary segmentation, and view-invariant feature aggregation demonstrate our approach's effectiveness. Results show that a compact yet geometrically meaningful representation is sufficient for high-quality scene reconstruction and understanding, achieving superior memory efficiency and feature fidelity compared to existing methods.
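To make the mechanism described in the abstract concrete, here is a minimal PyTorch sketch of token-guided Gaussian generation with attention-based feature lifting. It is an illustration under assumptions, not the authors' implementation: the module name TokenGuidedGaussianHead, all dimensions, the 14-parameter Gaussian layout (mean, scale, rotation, opacity, color), and the single attention layer (written here as cross-attention from learnable tokens to flattened multi-view tokens, standing in for the paper's self-attention aggregation) are hypothetical.

```python
# Hypothetical sketch of C3G's token-guided generation + feature lifting.
# Module structure, dimensions, and parameterization are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenGuidedGaussianHead(nn.Module):
    """Learnable tokens attend over multi-view image tokens; each token
    decodes one compact 3D Gaussian, and the attention weights are reused
    to lift per-view 2D features onto the Gaussians."""

    def __init__(self, num_gaussians=2048, dim=256, feat_dim=512, heads=8):
        super().__init__()
        # One learnable query token per Gaussian (the "2K Gaussians" budget).
        self.tokens = nn.Parameter(torch.randn(num_gaussians, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Decode each token into 14 Gaussian parameters:
        # 3 (mean) + 3 (log-scale) + 4 (quaternion) + 1 (opacity) + 3 (color).
        self.decode = nn.Linear(dim, 14)
        self.feat_proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, view_tokens, view_feats):
        # view_tokens: (B, N, dim)      flattened multi-view image tokens
        # view_feats:  (B, N, feat_dim) 2D features to lift (e.g. VLM features)
        B = view_tokens.shape[0]
        q = self.tokens.unsqueeze(0).expand(B, -1, -1)           # (B, G, dim)
        out, attn_w = self.attn(q, view_tokens, view_tokens,
                                need_weights=True,
                                average_attn_weights=True)       # attn_w: (B, G, N)
        h = self.norm(q + out)
        params = self.decode(h)                                  # (B, G, 14)
        means = params[..., :3]
        scales = F.softplus(params[..., 3:6])                    # positive scales
        rots = F.normalize(params[..., 6:10], dim=-1)            # unit quaternions
        opacity = torch.sigmoid(params[..., 10:11])
        colors = torch.sigmoid(params[..., 11:14])
        # Feature lifting: reuse the attention pattern so each Gaussian pools
        # exactly the 2D features it attended to across views.
        gaussian_feats = attn_w @ self.feat_proj(view_feats)     # (B, G, feat_dim)
        return means, scales, rots, opacity, colors, gaussian_feats

# Example shapes: 2 views of 1024 patch tokens each, batch of 2 scenes.
head = TokenGuidedGaussianHead()
img_tokens = torch.randn(2, 2 * 1024, 256)
img_feats = torch.randn(2, 2 * 1024, 512)
means, scales, rots, opacity, colors, gfeats = head(img_tokens, img_feats)
```

The point the sketch captures is that the same attention weights that decide where each Gaussian aggregates appearance are reused to pool 2D semantic features, so the lifting step adds essentially no extra cost and stays consistent with the geometry.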

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work, sorted by Pith novelty score.

  1. PointForward: Feedforward Driving Reconstruction through Point-Aligned Representations

    cs.CV · 2026-05 · unverdicted · novelty 7.0

    PointForward uses sparse world-space 3D queries and scene graphs to deliver consistent single-pass reconstruction of dynamic driving scenes via point-aligned representations.

  2. SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis

    cs.CV · 2026-05 · unverdicted · novelty 7.0

    SplatWeaver dynamically allocates Gaussian primitives via cardinality experts and pixel-level routing guided by high-frequency cues for improved generalizable novel view synthesis.

  3. GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

    cs.CV · 2026-04 · unverdicted · novelty 7.0

    GlobalSplat achieves competitive novel-view synthesis on RealEstate10K and ACID using only 16K Gaussians via global scene tokens and coarse-to-fine training, with a 4MB footprint and under 78ms inference.

  4. Entropy-Gradient Grounding: Training-Free Evidence Retrieval in Vision-Language Models

    cs.CV · 2026-04 · unverdicted · novelty 6.0

    Entropy-gradient grounding uses model uncertainty to retrieve evidence regions in VLMs, improving performance on detail-critical and compositional tasks across multiple architectures.