CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Canonical reference. 100% of citing Pith papers cite this work as background.

24 Pith papers citing it

Background: 100% of classified citations

abstract

Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent novel views of a scene. These generated views can be used as input to robust 3D reconstruction techniques to produce 3D representations that can be rendered from any viewpoint in real-time. CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation. See our project page for results and interactive demos at https://cat3d.github.io.

citation-role summary

background 8

citation-polarity summary

background 8

fields

cs.CV 23 · cs.GR 1

representative citing papers

Novel View Synthesis as Video Completion

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.

HAD: Hallucination-Aware Diffusion Priors for 3D Reconstruction

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

HAD uses multi-view reasoning from a pre-trained feedforward NVS network to estimate and mask hallucination scores in diffusion priors, reducing artifacts and achieving SOTA novel view synthesis in sparse-view 3D reconstruction.

FurnSet: Exploiting Repeats for 3D Scene Reconstruction

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

FurnSet improves single-view 3D scene reconstruction by using per-object CLS tokens and set-aware self-attention to group and jointly reconstruct repeated object instances, with added scene-object conditioning and layout optimization.

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.

NavCrafter: Exploring 3D Scenes from a Single Image

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

NavCrafter generates controllable novel-view videos from one image via video diffusion, geometry-aware expansion, and enhanced 3D Gaussian Splatting to achieve state-of-the-art synthesis under large viewpoint changes.

Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples sparse-view multiview generation with 3D Gaussian lifting.

InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model

cs.CV · 2026-03-12 · unverdicted · novelty 5.0

InSpatio-WorldFM is a frame-independent generative model that uses explicit 3D anchors and spatial memory to deliver real-time, multi-view-consistent spatial intelligence, trained via a three-stage pipeline that starts from pretrained diffusion models.

ViPE: Video Pose Engine for 3D Geometric Perception

cs.CV · 2025-08-12 · unverdicted · novelty 5.0

ViPE estimates camera intrinsics, motion, and dense near-metric depth from uncalibrated videos, outperforms baselines on TUM and KITTI, and releases annotations for 96M frames across real and generated videos.
