ZoeDepth combines relative depth pre-training on many datasets with metric depth fine-tuning and automatic head routing to achieve strong zero-shot generalization while preserving metric scale.
Irs: A large naturalistic indoor robotics stereo dataset to train deep models for dis- parity and surface normal estimation
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 6roles
dataset 2polarities
use dataset 2representative citing papers
GemDepth adds explicit camera-pose geometry embeddings and an alternating spatio-temporal transformer to produce sharper, more temporally consistent video depth maps than prior smoothing-based methods.
Lotus-2 is a two-stage deterministic adaptation of diffusion priors that achieves state-of-the-art monocular depth estimation with only 59K training samples.
Lite Any Stereo delivers top-ranked zero-shot accuracy on four real-world stereo benchmarks using a lightweight backbone, hybrid cost aggregation, and three-stage training on million-scale data, at less than 1% of typical computational cost.
DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.
MoGe-2 recovers metric-scale 3D point maps with fine details from single images via data refinement and extension of affine-invariant predictions.
citing papers explorer
-
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth
ZoeDepth combines relative depth pre-training on many datasets with metric depth fine-tuning and automatic head routing to achieve strong zero-shot generalization while preserving metric scale.
-
GemDepth: Geometry-Embedded Features for 3D-Consistent Video Depth
GemDepth adds explicit camera-pose geometry embeddings and an alternating spatio-temporal transformer to produce sharper, more temporally consistent video depth maps than prior smoothing-based methods.
-
Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model
Lotus-2 is a two-stage deterministic adaptation of diffusion priors that achieves state-of-the-art monocular depth estimation with only 59K training samples.
-
Lite Any Stereo: Efficient Zero-Shot Stereo Matching
Lite Any Stereo delivers top-ranked zero-shot accuracy on four real-world stereo benchmarks using a lightweight backbone, hybrid cost aggregation, and three-stage training on million-scale data, at less than 1% of typical computational cost.
-
Depth Anything 3: Recovering the Visual Space from Any Views
DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.
-
MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details
MoGe-2 recovers metric-scale 3D point maps with fine details from single images via data refinement and extension of affine-invariant predictions.