
CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan · 2024 · cs.CV · arXiv 2405.10314

Canonical reference. 100% of citing Pith papers cite this work as background.

37 Pith papers citing it

Background: 100% of classified citations


abstract

Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent novel views of a scene. These generated views can be used as input to robust 3D reconstruction techniques to produce 3D representations that can be rendered from any viewpoint in real-time. CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation. See our project page for results and interactive demos at https://cat3d.github.io.


citation-role summary

background 8

citation-polarity summary

background 8

representative citing papers

Walking in the Implicit: Interactive World Exploration via Neural Scene Representation

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

NeuWorld uses a transformer VAE to learn compact Neural Implicit Scenes from sparse posed frames and a diffusion transformer to evolve them conditioned on camera trajectories for consistent interactive exploration.

FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation

cs.CV · 2026-06-23 · unverdicted · novelty 7.0

FLAT maps compressed video diffusion latents to explicit triangle splats via ray-centered rotation parameterization and a product window function, reporting better geometric accuracy than 3D Gaussian baselines under identical training.

GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.

CRePE: Curved Ray Expectation Positional Encoding for Unified-Camera-Controlled Video Generation

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

CRePE supplies depth-aware positional distributions along curved rays for stable unified-camera control in frozen video DiT models.

GSCompleter: A Distillation-Free Plugin for Metric-Aware 3D Gaussian Splatting Completion in Seconds

cs.CV · 2026-04-22 · unverdicted · novelty 7.0 · 2 refs

GSCompleter completes 3DGS scenes from sparse viewpoints using a generate-then-register workflow with stereo-anchor view selection and ray-constrained registration to achieve metric-aware results and SOTA performance on benchmarks.

Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.

Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

A video generation approach conditions a base model with multi-scale 3D latent features and a cross-attention adapter to produce geometrically realistic and consistent orbital videos from one image.

Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

A 3D-grounded autoencoder and diffusion transformer allow direct generation of 3D scenes in an implicit latent space using a fixed 1K-token representation for arbitrary views and resolutions.

Novel View Synthesis as Video Completion

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.

DPPE: Rethinking Camera-Based Positional Encoding for Scaling Multi-View Transformers

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

DPPE decouples rotation and translation in camera positional encodings for multi-view transformers to resolve late-stage training stagnation and improve generalization in novel view synthesis.

GeoFace: Consistent Multi-View Face Generation with Geometry-Constrained Diffusion

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

GeoFace generates consistent multi-view face images and 3D geometry from one input via a dual-stream diffusion framework with geometry-guided attention alignment.

Error-Conditioned Neural Solvers

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

Error-Conditioned Neural Solvers improve PDE prediction accuracy by using the residual field as network input for learned corrections, outperforming residual-minimization methods by up to 10x on turbulent flows and generalizing better under distribution shifts.

SatSplatDiff: Geometry-preserving generative refinement for high-fidelity satellite Gaussian Splatting

cs.CV · 2026-06-25 · unverdicted · novelty 6.0 · 2 refs

SatSplatDiff combines depth supervision and shadow-guided generative refinement with 2DGS to reduce geometric MAE by up to 18% and improve visual fidelity by 28-45% on satellite datasets while enabling 5x resolution enhancement.

FLUX3D: High-Fidelity 3D Gaussian Generation with Diffusion-Aligned Sparse Representation

cs.CV · 2026-06-23 · unverdicted · novelty 6.0

FLUX3D introduces Diffusion-Aligned Structured Latents (DA-SLAT) and Sparse-structure Multimodal Diffusion Transformer (SMDiT) with MARoPE to address representation and alignment bottlenecks in sparse-voxel 3DGS generation.

Lighting-Consistent Object Transfer Across Radiance Fields

cs.GR · 2026-06-21 · unverdicted · novelty 6.0

This work uses diffusion-based per-view harmonization to transfer objects between 3DGS scenes with consistent lighting, relying on heterogeneous training data and a final 3D consolidation step.

VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

cs.LG · 2026-06-11 · unverdicted · novelty 6.0

VideoMDM learns coherent 3D motion manifolds from 2D supervision alone by using a pretrained lifter as noisy teacher, depth-weighted 2D reprojection loss, and adapted regularizers, nearly matching fully 3D-supervised performance on HumanML3D.

Prisma-World: Camera-Controllable Multi-Agent Video World Model

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

Prisma-World is a diffusion-based multi-agent video model that uses joint full-attention, multi-agent RoPE, and relative camera geometry injection plus curriculum training to produce consistent cross-view videos from flexible agent counts.

Property-Informed Diffusion-Based Text-to-Microstructure Generation

cs.CV · 2026-06-06 · unverdicted · novelty 6.0

A property-informed diffusion network generates 3D microstructures from text prompts via contrastive text-structure alignment and test-time reward-guided alignment.

Streaming Video Generation with Streaming Force Control

cs.CV · 2026-06-05 · unverdicted · novelty 6.0

StreamForce presents a unified causal model for force-controllable streaming video generation using a new force representation and distillation pipeline, claiming SOTA force adherence and 16.6 FPS performance.

SimuScene: Simulation-Ready Compositional 3D Scene Reconstruction from a Single Image

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

SimuScene feeds physics simulation diagnostics back into shape and layout estimation to correct geometric errors and output simulation-ready compositional scenes from single images.

GeoFlow: Enforcing Implicit Geometric Consistency in Video Generation

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

GeoFlow adds a geometry-consistency reward based on rigid camera flow and object appearance preservation, integrated via reinforcement fine-tuning to improve geometric coherence in video generation.

HAD: Hallucination-Aware Diffusion Priors for 3D Reconstruction

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

HAD uses multi-view reasoning from a pre-trained feedforward NVS network to estimate and mask hallucination scores in diffusion priors, reducing artifacts and achieving SOTA novel view synthesis in sparse-view 3D reconstruction.

FurnSet: Exploiting Repeats for 3D Scene Reconstruction

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

FurnSet improves single-view 3D scene reconstruction by using per-object CLS tokens and set-aware self-attention to group and jointly reconstruct repeated object instances, with added scene-object conditioning and layout optimization.

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.

citing papers explorer

Showing 8 of 8 citing papers after filters.

Geometrically Consistent Multi-View Scene Generation from Freehand Sketches · cs.CV · 2026-04-15 · unverdicted · none · ref 12
A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.

Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale · cs.CV · 2026-04-13 · unverdicted · none · ref 20
A 3D-grounded autoencoder and diffusion transformer allow direct generation of 3D scenes in an implicit latent space using a fixed 1K-token representation for arbitrary views and resolutions.

Novel View Synthesis as Video Completion · cs.CV · 2026-04-09 · unverdicted · none · ref 10
Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective · cs.CV · 2026-04-15 · unverdicted · none · ref 253
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.

NavCrafter: Exploring 3D Scenes from a Single Image · cs.CV · 2026-04-03 · unverdicted · none · ref 16
NavCrafter generates controllable novel-view videos from one image via video diffusion, geometry-aware expansion, and enhanced 3D Gaussian Splatting to achieve state-of-the-art synthesis under large viewpoint changes.

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis · cs.CV · 2024-09-03 · unverdicted · none · ref 63
ViewCrafter tames video diffusion models with point-based 3D guidance and iterative trajectory planning to produce high-fidelity novel views from single or sparse images.

Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation · cs.CV · 2026-04-20 · unverdicted · none · ref 43
Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples sparse-view multiview generation with 3D Gaussian lifting.

UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models · cs.CV · 2026-04-19 · unreviewed · ref 26