hub

Stereo Magnification: Learning View Synthesis using Multiplane Images

Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, Noah Snavely · 2018 · cs.CV · arXiv 1805.09817

28 Pith papers cite this work. Polarity classification is still indexing.

28 Pith papers citing it

open full Pith review browse 28 citing papers arXiv PDF

abstract

The view synthesis problem--generating novel views of a scene from known imagery--has garnered recent attention due in part to compelling applications in virtual and augmented reality. In this paper, we explore an intriguing scenario for view synthesis: extrapolating views from imagery captured by narrow-baseline stereo cameras, including VR cameras and now-widespread dual-lens camera phones. We call this problem stereo magnification, and propose a learning framework that leverages a new layered representation that we call multiplane images (MPIs). Our method also uses a massive new data source for learning view extrapolation: online videos on YouTube. Using data mined from such videos, we train a deep network that predicts an MPI from an input stereo image pair. This inferred MPI can then be used to synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline. We show that our method compares favorably with several recent view synthesis methods, and demonstrate applications in magnifying narrow-baseline stereo images.

hub tools

JSON dossier citing papers JSON arXiv source

representative citing papers

GTA: Advancing Image-to-3D World Generation via Geometry Then Appearance Video Diffusion

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

GTA generates 3D worlds from single images via a two-stage video diffusion process that prioritizes geometry before appearance to improve structural consistency.

$h$-control: Training-Free Camera Control via Block-Conditional Gibbs Refinement

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

h-control introduces block-conditional pseudo-Gibbs refinement for training-free camera control in flow-matching video generators, achieving superior FVD scores on RealEstate10K and DAVIS benchmarks.

SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

SplatWeaver dynamically allocates Gaussian primitives via cardinality experts and pixel-level routing guided by high-frequency cues for improved generalizable novel view synthesis.

AirZoo: A Unified Large-Scale Dataset for Grounding Aerial Geometric 3D Vision

cs.CV · 2026-04-29 · conditional · novelty 7.0

AirZoo is a new large-scale synthetic dataset for aerial 3D vision that improves state-of-the-art models on image retrieval, cross-view matching, and 3D reconstruction when used for fine-tuning.

GSCompleter: A Distillation-Free Plugin for Metric-Aware 3D Gaussian Splatting Completion in Seconds

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

GSCompleter completes sparse 3D Gaussian Splatting scenes via a distillation-free generate-then-register pipeline using Stereo-Anchor lifting and Ray-Constrained Registration, delivering SOTA results on three benchmarks.

URoPE: Universal Relative Position Embedding across Geometric Spaces

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

URoPE is a parameter-free relative position embedding for transformers that works across arbitrary geometric spaces by ray sampling and projection, yielding consistent gains on novel view synthesis, 3D detection, tracking, and depth estimation.

UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models

cs.CV · 2026-04-19 · unverdicted · novelty 7.0 · 2 refs

UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.

GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

GlobalSplat achieves competitive novel-view synthesis on RealEstate10K and ACID using only 16K Gaussians via global scene tokens and coarse-to-fine training, with a 4MB footprint and under 78ms inference.

TokenGS: Decoupling 3D Gaussian Prediction from Pixels with Learnable Tokens

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

TokenGS uses learnable Gaussian tokens in an encoder-decoder architecture to regress 3D means directly, achieving SOTA feed-forward reconstruction on static and dynamic scenes with better robustness.

CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

CT-1 transfers spatial reasoning from vision-language models to estimate camera trajectories, which are then used in a video diffusion model with wavelet regularization to produce controllable videos, claiming 25.7% better accuracy than prior methods.

Novel View Synthesis as Video Completion

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.

Mem3R: Streaming 3D Reconstruction with Hybrid Memory via Test-Time Training

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

Mem3R achieves better long-sequence 3D reconstruction by decoupling tracking and mapping with a hybrid memory of TTT-updated MLP and explicit tokens, reducing model size and trajectory errors.

OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

OmniCamera disentangles video content and camera motion for multi-task generation with arbitrary camera control via the OmniCAM hybrid dataset and Dual-level Curriculum Co-Training.

$\pi^3$: Permutation-Equivariant Visual Geometry Learning

cs.CV · 2025-07-17 · conditional · novelty 7.0

π³ is a feed-forward network with full permutation equivariance that outputs affine-invariant poses and scale-invariant local point maps without reference frames, reaching state-of-the-art on camera pose, depth, and dense reconstruction benchmarks.

Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

M²-REPA decouples modality-specific features inside a diffusion model and aligns each to its matching expert foundation model via an alignment loss plus a decoupling regularizer, yielding better visual quality and long-term consistency in multi-modal video generation.

Long-tail Internet photo reconstruction

cs.CV · 2026-04-24 · unverdicted · novelty 6.0

Finetuning 3D foundation models on simulated sparse subsets from MegaDepth-X produces robust reconstructions from extremely sparse, noisy internet photos while preserving performance on dense benchmarks.

Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

A decoupled memory branch with hybrid cues, cross-attention, and gating improves spatial consistency and data efficiency in long-horizon camera-trajectory video generation.

Self-Improving 4D Perception via Self-Distillation

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

SelfEvo enables pretrained 4D perception models to self-improve on unlabeled videos via self-distillation, delivering up to 36.5% relative gains in video depth estimation and 20.1% in camera estimation across eight benchmarks.

SurfelSplat: Learning Efficient and Generalizable Gaussian Surfel Representations for Sparse-View Surface Reconstruction

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

A feed-forward model regresses accurate Gaussian surfel geometry from sparse views using Nyquist-guided cross-view feature aggregation, achieving 100x speedup over optimization-based 3DGS surface methods on DTU benchmarks.

INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

INSPATIO-WORLD is a real-time framework for high-fidelity 4D scene generation and navigation from monocular videos via STAR architecture with implicit caching, explicit geometric constraints, and distribution-matching distillation.

NavCrafter: Exploring 3D Scenes from a Single Image

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

NavCrafter generates controllable novel-view videos from one image via video diffusion, geometry-aware expansion, and enhanced 3D Gaussian Splatting to achieve state-of-the-art synthesis under large viewpoint changes.

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

cs.CV · 2024-04-02 · unverdicted · novelty 6.0

CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.

RoSplat: Robust Feed-Forward Pixel-wise Gaussian Splatting for Varying Input Views and High-Resolution Rendering

cs.CV · 2026-05-13 · unverdicted · novelty 5.0

RoSplat adds alpha normalization for brightness consistency across varying input views and a 3D sampling regularizer to mitigate hole artifacts in high-resolution feed-forward Gaussian splatting.

AdaptSplat: Adapting Vision Foundation Models for Feed-Forward 3D Gaussian Splatting

cs.CV · 2026-05-11 · unverdicted · novelty 5.0

AdaptSplat adds a lightweight Frequency-Preserving Adapter to vision foundation models that extracts direction-aware high-frequency priors and integrates them via positional encodings and residual modulation to improve feed-forward 3D Gaussian Splatting fidelity and cross-domain performance.

citing papers explorer

Showing 28 of 28 citing papers.

GTA: Advancing Image-to-3D World Generation via Geometry Then Appearance Video Diffusion cs.CV · 2026-05-13 · unverdicted · none · ref 81 · internal anchor
GTA generates 3D worlds from single images via a two-stage video diffusion process that prioritizes geometry before appearance to improve structural consistency.
$h$-control: Training-Free Camera Control via Block-Conditional Gibbs Refinement cs.CV · 2026-05-12 · unverdicted · none · ref 54 · internal anchor
h-control introduces block-conditional pseudo-Gibbs refinement for training-free camera control in flow-matching video generators, achieving superior FVD scores on RealEstate10K and DAVIS benchmarks.
SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis cs.CV · 2026-05-08 · unverdicted · none · ref 77 · internal anchor
SplatWeaver dynamically allocates Gaussian primitives via cardinality experts and pixel-level routing guided by high-frequency cues for improved generalizable novel view synthesis.
AirZoo: A Unified Large-Scale Dataset for Grounding Aerial Geometric 3D Vision cs.CV · 2026-04-29 · conditional · none · ref 60 · internal anchor
AirZoo is a new large-scale synthetic dataset for aerial 3D vision that improves state-of-the-art models on image retrieval, cross-view matching, and 3D reconstruction when used for fine-tuning.
GSCompleter: A Distillation-Free Plugin for Metric-Aware 3D Gaussian Splatting Completion in Seconds cs.CV · 2026-04-22 · unverdicted · none · ref 54 · internal anchor
GSCompleter completes sparse 3D Gaussian Splatting scenes via a distillation-free generate-then-register pipeline using Stereo-Anchor lifting and Ray-Constrained Registration, delivering SOTA results on three benchmarks.
URoPE: Universal Relative Position Embedding across Geometric Spaces cs.CV · 2026-04-20 · unverdicted · none · ref 43 · internal anchor
URoPE is a parameter-free relative position embedding for transformers that works across arbitrary geometric spaces by ray sampling and projection, yielding consistent gains on novel view synthesis, 3D detection, tracking, and depth estimation.
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models cs.CV · 2026-04-19 · unverdicted · none · ref 95 · 2 links · internal anchor
UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.
GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens cs.CV · 2026-04-16 · unverdicted · none · ref 43 · internal anchor
GlobalSplat achieves competitive novel-view synthesis on RealEstate10K and ACID using only 16K Gaussians via global scene tokens and coarse-to-fine training, with a 4MB footprint and under 78ms inference.
TokenGS: Decoupling 3D Gaussian Prediction from Pixels with Learnable Tokens cs.CV · 2026-04-16 · unverdicted · none · ref 56 · internal anchor
TokenGS uses learnable Gaussian tokens in an encoder-decoder architecture to regress 3D means directly, achieving SOTA feed-forward reconstruction on static and dynamic scenes with better robustness.
CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation cs.CV · 2026-04-10 · unverdicted · none · ref 44 · internal anchor
CT-1 transfers spatial reasoning from vision-language models to estimate camera trajectories, which are then used in a video diffusion model with wavelet regularization to produce controllable videos, claiming 25.7% better accuracy than prior methods.
Novel View Synthesis as Video Completion cs.CV · 2026-04-09 · unverdicted · none · ref 56 · internal anchor
Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.
Mem3R: Streaming 3D Reconstruction with Hybrid Memory via Test-Time Training cs.CV · 2026-04-08 · unverdicted · none · ref 79 · internal anchor
Mem3R achieves better long-sequence 3D reconstruction by decoupling tracking and mapping with a hybrid memory of TTT-updated MLP and explicit tokens, reducing model size and trajectory errors.
OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control cs.CV · 2026-04-07 · unverdicted · none · ref 45 · internal anchor
OmniCamera disentangles video content and camera motion for multi-task generation with arbitrary camera control via the OmniCAM hybrid dataset and Dual-level Curriculum Co-Training.
$\pi^3$: Permutation-Equivariant Visual Geometry Learning cs.CV · 2025-07-17 · conditional · none · ref 18 · internal anchor
π³ is a feed-forward network with full permutation equivariance that outputs affine-invariant poses and scale-invariant local point maps without reference frames, reaching state-of-the-art on camera pose, depth, and dense reconstruction benchmarks.
Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models cs.CV · 2026-05-03 · unverdicted · none · ref 63 · internal anchor
M²-REPA decouples modality-specific features inside a diffusion model and aligns each to its matching expert foundation model via an alignment loss plus a decoupling regularizer, yielding better visual quality and long-term consistency in multi-modal video generation.
Long-tail Internet photo reconstruction cs.CV · 2026-04-24 · unverdicted · none · ref 51 · internal anchor
Finetuning 3D foundation models on simulated sparse subsets from MegaDepth-X produces robust reconstructions from extremely sparse, noisy internet photos while preserving performance on dense benchmarks.
Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation cs.CV · 2026-04-20 · unverdicted · none · ref 62 · internal anchor
A decoupled memory branch with hybrid cues, cross-attention, and gating improves spatial consistency and data efficiency in long-horizon camera-trajectory video generation.
Self-Improving 4D Perception via Self-Distillation cs.CV · 2026-04-09 · unverdicted · none · ref 81 · internal anchor
SelfEvo enables pretrained 4D perception models to self-improve on unlabeled videos via self-distillation, delivering up to 36.5% relative gains in video depth estimation and 20.1% in camera estimation across eight benchmarks.
SurfelSplat: Learning Efficient and Generalizable Gaussian Surfel Representations for Sparse-View Surface Reconstruction cs.CV · 2026-04-09 · unverdicted · none · ref 64 · internal anchor
A feed-forward model regresses accurate Gaussian surfel geometry from sparse views using Nyquist-guided cross-view feature aggregation, achieving 100x speedup over optimization-based 3DGS surface methods on DTU benchmarks.
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling cs.CV · 2026-04-08 · unverdicted · none · ref 110 · internal anchor
INSPATIO-WORLD is a real-time framework for high-fidelity 4D scene generation and navigation from monocular videos via STAR architecture with implicit caching, explicit geometric constraints, and distribution-matching distillation.
NavCrafter: Exploring 3D Scenes from a Single Image cs.CV · 2026-04-03 · unverdicted · none · ref 29 · internal anchor
NavCrafter generates controllable novel-view videos from one image via video diffusion, geometry-aware expansion, and enhanced 3D Gaussian Splatting to achieve state-of-the-art synthesis under large viewpoint changes.
CameraCtrl: Enabling Camera Control for Text-to-Video Generation cs.CV · 2024-04-02 · unverdicted · none · ref 170 · internal anchor
CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.
RoSplat: Robust Feed-Forward Pixel-wise Gaussian Splatting for Varying Input Views and High-Resolution Rendering cs.CV · 2026-05-13 · unverdicted · none · ref 30 · internal anchor
RoSplat adds alpha normalization for brightness consistency across varying input views and a 3D sampling regularizer to mitigate hole artifacts in high-resolution feed-forward Gaussian splatting.
AdaptSplat: Adapting Vision Foundation Models for Feed-Forward 3D Gaussian Splatting cs.CV · 2026-05-11 · unverdicted · none · ref 43 · internal anchor
AdaptSplat adds a lightweight Frequency-Preserving Adapter to vision foundation models that extracts direction-aware high-frequency priors and integrates them via positional encodings and residual modulation to improve feed-forward 3D Gaussian Splatting fidelity and cross-domain performance.
ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation cs.CV · 2026-05-08 · unverdicted · none · ref 34 · internal anchor
ST-Gen4D uses a world model that fuses global appearance and local dynamic graphs into a 4D cognition representation to guide consistent 4D Gaussian generation.
Video Generation with Predictive Latents cs.CV · 2026-05-04 · unverdicted · none · ref 69 · internal anchor
PV-VAE improves video latent spaces for generation by unifying reconstruction with future-frame prediction, reporting 52% faster convergence and 34.42 FVD gain over Wan2.2 VAE on UCF101.
Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images cs.CV · 2026-04-12 · unverdicted · none · ref 88 · internal anchor
UniSplat learns consistent 3D geometry, appearance, and semantics from unposed images using dual masking, progressive Gaussian splatting, and recalibration to align predictions across tasks.
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory cs.CV · 2026-04-10 · unverdicted · none · ref 58 · internal anchor
Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.

Stereo Magnification: Learning View Synthesis using Multiplane Images

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer