AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views
Canonical reference. 71% of citing Pith papers cite this work as background.
Representative citing papers (verdicts: 20 unverdicted)
- GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction
  GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.
- No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos
  NoPo4D is the first feed-forward system for dynamic 4D Gaussian splatting from unposed multi-view videos, using velocity decomposition supervised by optical flow and a bidirectional motion encoder.
- Mix3R: Mixing Feed-forward Reconstruction and Generative 3D Priors for Joint Multi-view Aligned 3D Reconstruction and Pose Estimation
  Mix3R mixes feed-forward reconstruction and generative 3D priors via Mixture-of-Transformers and overlap-based attention bias to achieve better-aligned 3D shapes and more accurate poses than either approach alone.
- VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis
  VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.
- AnchorSplat: Feed-Forward 3D Gaussian Splatting with 3D Geometric Priors
  AnchorSplat uses anchor-aligned 3D Gaussians guided by geometric priors for feed-forward scene reconstruction, achieving SOTA novel view synthesis on ScanNet++ with fewer primitives and better view consistency.
- SparseSplat: Towards Applicable Feed-Forward 3D Gaussian Splatting with Pixel-Unaligned Prediction
  SparseSplat uses entropy-based probabilistic sampling and a specialized point cloud network to generate compact 3D Gaussian maps that retain high rendering quality with far fewer Gaussians than prior feed-forward methods.
- TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction
  TriSplat predicts oriented triangle primitives from images in one forward pass to produce simulation-ready 3D meshes with competitive rendering quality.
- Cross-View Splatter: Feed-Forward View Synthesis with Georeferenced Images
  A feed-forward model aligns ground and satellite features to predict Gaussian splats for improved novel-view synthesis on georeferenced outdoor scenes.
- Z-Order Transformer for Feed-Forward Gaussian Splatting
  A Z-order transformer organizes unstructured Gaussians for sparse attention, enabling feed-forward prediction of high-quality 3D splats with fewer primitives.
- Geometric 4D Stitching for Grounded 4D Generation
  Geometric 4D Stitching explicitly complements missing geometric regions in 4D generated scenes with grounded stitches to achieve consistent 4D representations in under 10 minutes on a single GPU.
- Sat3R: Satellite DSM Reconstruction via RPC-Aware Depth Fine-tuning
  Sat3R adapts Depth Anything V2 via RPC-aware metric depth fine-tuning to deliver satellite DSM reconstruction with 38% lower MAE than zero-shot baselines and over 300x speedup versus optimization methods.
- Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
  The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
- FreeScale: Scaling 3D Scenes via Certainty-Aware Free-View Generation
  FreeScale generates scalable high-quality training data for generalizable novel view synthesis by certainty-aware sampling from imperfect scene reconstructions, delivering 2.7 dB PSNR gains on out-of-distribution tests.
- FLEG: Feed-Forward Language Embedded Gaussian Splatting from Any Views via Compact Semantic Representation
  FLEG reconstructs language-embedded 3D Gaussians from arbitrary input views using a dual-branch distillation framework and a sparse set of semantic Gaussians that requires only 5% of prior embeddings.
- C3G: Learning Compact 3D Representations with 2K Gaussians
  C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.
- Lumos3D: A Single-Forward Framework for Low-Light 3D Scene Restoration
  Lumos3D enables pose-free single-forward restoration of low-light 3D scenes via cross-illumination distillation from a teacher network and a custom Lumos loss on 3D Gaussians.
- Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images
  UniSplat learns consistent 3D geometry, appearance, and semantics from unposed images using dual masking, progressive Gaussian splatting, and recalibration to align predictions across tasks.
- FF3R: Feedforward Feature 3D Reconstruction from Unconstrained views
  FF3R unifies geometric and semantic 3D reconstruction in a single annotation-free feed-forward network trained solely via RGB and feature rendering supervision.
- Large Depth Completion Model from Sparse Observations
  LDCM achieves state-of-the-art metric depth completion from sparse observations by combining foundation-model initialization with a point-map regression head that removes the need for camera intrinsics.
- HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
  HY-World 2.0 generates and reconstructs high-fidelity navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules, with released code claiming open-source SOTA performance.