Camco: Camera-Controllable 3D-Consistent Image-to-Video Generation
15 Pith papers cite this work.
Fields: cs.CV
Representative citing papers
-
PicoEyes: Unified Gaze Estimation Framework for Mixed Reality with a Large-Scale Multi-View Dataset
PicoEyes delivers a unified end-to-end model for full 3D gaze estimation, covering eye parameters, axes, segmentation, and depth, from monocular or binocular near-eye images, supported by a new large-scale multi-view dataset.
-
Reshoot-Anything: A Self-Supervised Model for In-the-Wild Video Reshooting
Reshoot-Anything trains a diffusion transformer on pseudo multi-view triplets created by cropping and warping monocular videos to achieve temporally consistent video reshooting with robust camera control on dynamic scenes.
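The summary does not detail the warping pipeline, but the core trick of carving pseudo "views" from a single monocular frame can be illustrated with plain cropping: two offset crops of the same image approximate a small camera translation. A toy sketch (function name and shapes are assumptions, not from the paper):

```python
import numpy as np

def pseudo_view_pair(frame, crop_hw, shift):
    """Simulate a small camera move by taking two offset crops of one frame.

    frame: (H, W[, C]) image; crop_hw: (h, w) crop size;
    shift: (dy, dx) offset of the second crop relative to the first.
    Returns two pseudo 'views' of the same underlying content.
    """
    h, w = crop_hw
    dy, dx = shift
    view_a = frame[:h, :w].copy()          # reference "camera"
    view_b = frame[dy:dy + h, dx:dx + w].copy()  # shifted "camera"
    return view_a, view_b
```

The actual method additionally warps the crops; the point here is only that pseudo multi-view supervision can be minted from monocular footage with no extra capture.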
-
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models
UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.
-
Geometrically Consistent Multi-View Scene Generation from Freehand Sketches
A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and a Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.
-
OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control
OmniCamera disentangles video content and camera motion for multi-task generation with arbitrary camera control via the OmniCAM hybrid dataset and Dual-level Curriculum Co-Training.
-
SparseCam4D: Spatio-Temporally Consistent 4D Reconstruction from Sparse Cameras
SparseCam4D achieves spatio-temporally consistent high-fidelity 4D reconstruction from sparse cameras via a Spatio-Temporal Distortion Field that corrects inconsistencies in generative observations.
-
RealCam: Real-Time Novel-View Video Generation with Interactive Camera Control
RealCam is a causal autoregressive model for real-time camera-controlled video-to-video generation, using cross-frame in-context teacher distillation and loop-closed data augmentation to achieve high fidelity and consistency.
-
PhyCo: Learning Controllable Physical Priors for Generative Motion
PhyCo adds continuous physical control to video diffusion models via physics-supervised fine-tuning on a large simulation dataset and VLM-guided rewards, yielding measurable gains in physical realism on the Physics-IQ benchmark.
-
Lyra 2.0: Explorable Generative 3D Worlds
Lyra 2.0 produces persistent 3D-consistent video sequences for large explorable worlds by using per-frame geometry for information routing and self-augmented training to correct temporal drift.
-
Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories
A video diffusion model learns a joint distribution over videos and camera trajectories by representing cameras as pixel-aligned ray encodings (raxels) denoised jointly with video frames via decoupled attention.
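The raxel encoding itself is only named here, but a pixel-aligned ray representation starts from standard pinhole geometry: each pixel maps to a world-space ray origin and direction. A minimal sketch, assuming a 3x3 intrinsics matrix `K` and a 4x4 camera-to-world pose (the function name is illustrative, not from the paper):

```python
import numpy as np

def pixel_rays(K, cam_to_world, height, width):
    """Per-pixel ray origins and directions in world space (pinhole model).

    K: 3x3 intrinsics; cam_to_world: 4x4 camera-to-world pose.
    Returns (origins, directions), each of shape (H, W, 3).
    """
    # Pixel-center grid in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)          # (H, W, 3)
    # Back-project through the inverse intrinsics, then rotate to world.
    dirs_cam = pix @ np.linalg.inv(K).T                        # (H, W, 3)
    dirs_world = dirs_cam @ cam_to_world[:3, :3].T
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    origins = np.broadcast_to(cam_to_world[:3, 3], dirs_world.shape)
    return origins, dirs_world
```

Stacking such per-pixel rays channel-wise alongside the video frames is one natural way a model could denoise cameras and pixels jointly.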
-
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
INSPATIO-WORLD is a real-time framework for high-fidelity 4D scene generation and navigation from monocular videos, via a STAR architecture with implicit caching, explicit geometric constraints, and distribution-matching distillation.
-
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
ViewCrafter tames video diffusion models with point-based 3D guidance and iterative trajectory planning to produce high-fidelity novel views from single or sparse images.
-
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.
-
Pose-Aware Diffusion for 3D Generation
PAD synthesizes 3D geometry in observation space, using depth unprojection as an anchor to eliminate pose ambiguity in image-to-3D generation.
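Depth unprojection, the anchoring step named above, is a standard operation: lift each pixel's depth through the inverse intrinsics into a camera-frame point, then transform to world space. A minimal sketch, assuming metric depth along the camera z-axis and a pinhole camera (helper name is an assumption):

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift a depth map to world-space 3D points (the geometric anchor).

    depth: (H, W) metric depth along the camera z-axis;
    K: 3x3 intrinsics; cam_to_world: 4x4 camera-to-world pose.
    Returns points of shape (H, W, 3).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)           # (H, W, 3)
    pts_cam = (pix @ np.linalg.inv(K).T) * depth[..., None]    # camera frame
    pts_world = pts_cam @ cam_to_world[:3, :3].T + cam_to_world[:3, 3]
    return pts_world
```

Because the resulting points live in the observation camera's own frame (pose can be identity), they fix the scene's scale and orientation without requiring any external pose estimate.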
-
Cosmos World Foundation Model Platform for Physical AI
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.