DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
Mixed citation behavior. Most common role is background (62%).
abstract
Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been exhibited, these methods often suffer from slow per-sample optimization, limiting their practical usage. In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space. In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks. To further enhance the texture quality and facilitate downstream applications, we introduce an efficient algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details. Extensive experiments demonstrate the superior efficiency and competitive generation quality of our proposed approach. Notably, DreamGaussian produces high-quality textured meshes in just 2 minutes from a single-view image, achieving approximately 10 times acceleration compared to existing methods.
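The abstract's "progressive densification of 3D Gaussians" can be illustrated with a small sketch. It follows the gradient-thresholded clone/split rule from the original 3D Gaussian Splatting work, not DreamGaussian's actual code. The `Gaussian` class, the `densify` function, the isotropic `scale`, the `grad_norm` field, and all threshold values are hypothetical names and simplifications introduced here. Real implementations sample the positions of split children from the parent Gaussian, where this sketch uses a fixed offset so the result is deterministic.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    position: Vec3
    scale: float      # isotropic scale (simplification; real Gaussians are anisotropic)
    grad_norm: float  # accumulated positional gradient magnitude (hypothetical field)


def densify(
    gaussians: List[Gaussian],
    grad_threshold: float,
    scale_threshold: float,
    shrink: float = 1.6,  # scale divisor for split children, as in 3D Gaussian Splatting
) -> List[Gaussian]:
    """One densification pass: clone small high-gradient Gaussians, split large ones."""
    out: List[Gaussian] = []
    for g in gaussians:
        if g.grad_norm <= grad_threshold:
            # Well-fitted region: keep the Gaussian unchanged.
            out.append(g)
        elif g.scale <= scale_threshold:
            # Under-reconstructed region: clone, resetting the gradient accumulator.
            out.append(Gaussian(g.position, g.scale, 0.0))
            out.append(Gaussian(g.position, g.scale, 0.0))
        else:
            # Over-reconstructed region: replace with two smaller Gaussians.
            x, y, z = g.position
            for sign in (-1.0, 1.0):
                # Fixed offset along x is an assumption made for determinism.
                out.append(Gaussian((x + sign * g.scale, y, z), g.scale / shrink, 0.0))
    return out


cloud = [
    Gaussian((0.0, 0.0, 0.0), 0.1, 0.0),  # low gradient: kept
    Gaussian((1.0, 0.0, 0.0), 0.1, 0.5),  # high gradient, small: cloned
    Gaussian((2.0, 0.0, 0.0), 1.0, 0.5),  # high gradient, large: split
]
dense = densify(cloud, grad_threshold=0.2, scale_threshold=0.5)
print(len(dense))  # 5
```

Running a pass like this periodically during optimization grows the number of primitives only where the gradient signal suggests the current set is insufficient, which is the sense in which densification is "progressive".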
citing papers explorer
- ReConText3D: Replay-based Continual Text-to-3D Generation
ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.
- Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo
SplatShot is a training-free method that inserts per-step 3DGS refitting and photometric feedback into diffusion denoising to enforce multi-view consistency for single-photo 3D face avatars.
- AdvScene: Rethinking Adversarial Patch Evaluation Through Scene Robustness
AdvScene is a scene-grounded evaluation method using Adversarial Patch-to-Scene Embedding (APSE) to map the operational envelope of physical adversarial patches in reconstructed real environments.
- CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation
CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.
- R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
- SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis
SplatWeaver uses cardinality Gaussian experts and pixel-level routing to dynamically allocate varying numbers of Gaussian primitives for generalizable novel view synthesis.
- HairOrbit: Multi-view Aware 3D Hair Modeling from Single Portraits
HairOrbit leverages video generation priors and a neural orientation extractor to achieve state-of-the-art strand-level 3D hair reconstruction from single-view portraits in visible and invisible regions.
- THOM: Generating Physically Plausible Hand-Object Meshes From Text
THOM is a training-free two-stage framework that generates physically plausible hand-object 3D meshes directly from text by combining text-guided Gaussians with contact-aware physics optimization and VLM refinement.
- VRGaussianAvatar: Integrating 3D Gaussian Avatars into VR
VRGaussianAvatar enables real-time full-body 3D Gaussian Splatting avatars in VR from HMD tracking alone via inverse kinematics and binocular batching for efficient stereo rendering, outperforming mesh baselines in performance and user ratings.
- SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting
SV-GS estimates a time-dependent skeleton pose plus fine deformations to enable 4D Gaussian splatting from sparse views, outperforming prior sparse methods by up to 34% PSNR on synthetic data and matching dense monocular baselines on real data with far fewer frames.
- SVG360: Editable Multiview Vector Graphics from a Single SVG
SVG360 lifts a single SVG to a view-conditioned representation, uses spatial memory to propagate consistent parts across views, and applies structure-aware vectorization to produce editable multiview SVGs.
- UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
UniEdit-Flow presents tuning-free Uni-Inv and Uni-Edit methods for inversion and editing in flow models that achieve accurate reconstruction and robust region-preserving edits across generative models.
- PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions
PhyGenHOI couples a motion diffusion model for humans with material point method simulation for objects on 3D Gaussians, using attraction loss, contact re-simulation, and masked video-SDS to produce physically consistent dynamic interactions from text.
- Sparse Code Uplifting for Efficient 3D Language Gaussian Splatting
SCOUP decouples 2D sparse code learning from 3D Gaussian optimization to deliver up to 400x training speedup and 3x better memory efficiency while matching accuracy on open-vocabulary 3D queries.
- Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion
DiLAST optimizes 3D latents via guidance from a 2D diffusion model to enable generalizable style transfer for OOD styles in 3D asset generation.
- REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement
REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.
- DualSplat: Robust 3D Gaussian Splatting via Pseudo-Mask Bootstrapping from Reconstruction Failures
DualSplat bootstraps object-level pseudo-masks from initial 3DGS reconstruction failures using residuals and SAM2 to enable robust second-pass optimization in transient-heavy scenes.
- DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics
DailyArt recovers full joint parameters of articulated objects from a single static image by synthesizing an opened state and comparing discrepancies, supporting downstream part-level novel state synthesis.
- 3D Gaussian Splatting for Annular Dark Field Scanning Transmission Electron Microscopy Tomography Reconstruction
DenZa-Gaussian adapts 3D Gaussian Splatting for ADF-STEM tomography by modeling scattering as a learnable scalar field, adding tilt-angle normalization, and using a Fourier amplitude loss to improve sparse-view 3D reconstructions.
- HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance
HandDreamer is the first zero-shot text-to-3D method for hands that uses MANO initialization, skeleton-guided diffusion, and corrective shape guidance to produce view-consistent models.
- HOIGS: Human-Object Interaction Gaussian Splatting
HOIGS adds a cross-attention HOI module to Gaussian Splatting that combines HexPlane human features with Cubic Hermite Spline object features to model interaction-induced deformations.
- Motion-2-To-3: Leveraging 2D Motion Data for 3D Motion Generations
A framework disentangles local joint motion from global movement, trains a 2D local generator on text-2D pairs, then fine-tunes on 3D data to output view-consistent 3D motions.
- CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
CamCo equips image-to-video generators with Plücker-coordinate camera inputs and epipolar attention to improve 3D consistency and camera controllability.
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models
A multi-view diffusion model generates consistent novel views from sparse images to enable fast 3D scene reconstruction.
- Feed-Forward Gaussian Splatting from Sparse Aerial Views
AnyCity reconstructs coherent 3D Gaussian urban scenes from sparse aerial views in one feed-forward pass by anchoring observation-supported geometry and applying gated residual updates conditioned on an aerial-adapted video diffusion prior.
- DecoRec: Decomposed 3D Scene Reconstruction from Single-View Images via Object-Level Diffusion
DecoRec decomposes single-view 3D scene reconstruction into per-object diffusion reconstructions followed by a differentiable rendering and diffusion-guided merging pipeline.
- Pose-Aware Diffusion for 3D Generation
PAD synthesizes 3D geometry in observation space, using depth unprojection as an anchor to eliminate pose ambiguity in image-to-3D generation.
- Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images
Unposed-to-3D learns simulation-ready 3D vehicle models from unposed real images by predicting camera parameters for photometric self-supervision, then adding scale prediction and harmonization.
- DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.
- Hestia: Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction
Hestia improves generalizable next-best-view planning for 3D reconstruction via hierarchical action search, diverse data, close-greedy strategy, and face-aware voxel design, yielding higher coverage and lower Chamfer distance than prior RL-based methods.
- ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation
ConsDreamer refines conditional and unconditional terms in score distillation via view disentanglement and geometric consistency loss to reduce the Janus problem in zero-shot text-to-3D.
- Diffusion Models are Secretly Zero-Shot 3DGS Harmonizers
D3DR optimizes inserted 3DGS objects with a DDS-inspired diffusion objective plus a new personalization step to match scene lighting, reporting a 2 dB PSNR gain over prior methods.
- TripoSR: Fast 3D Object Reconstruction from a Single Image
TripoSR generates 3D meshes from single images in under 0.5 seconds using a transformer architecture that improves on LRM.
- Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
Zero123++ produces high-quality 3D-consistent multi-view images from a single input by fine-tuning Stable Diffusion with targeted conditioning and training methods.
- AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation
AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantically accurate, temporally coherent animations in seconds.
- Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
Hunyuan3D 2.0 scales flow-based diffusion transformers and texture synthesis models to generate high-resolution textured 3D assets that outperform prior state-of-the-art in geometry, alignment, and texture quality.
- LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation
This review organizes literature on large multimodal models and object-centric vision into four themes (understanding, referring segmentation, editing, and generation) while summarizing paradigms, strategies, and challenges such as instance permanence and consistent interaction.
- Stream3D: Sequential Multi-View 3D Generation via Evidential Memory