pith. machine review for the scientific record. sign in

hub

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

21 Pith papers cite this work. Polarity classification is still indexing.

21 Pith papers citing it
abstract

We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction, a training protocol that combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing, dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image. Extensive experiments analyze specific design choices and demonstrate that Depth Pro outperforms prior work along multiple dimensions. We release code and weights at https://github.com/apple/ml-depth-pro

hub tools

citation-role summary

background 1

citation-polarity summary

years

2026 19 2025 2

roles

background 1

polarities

background 1

representative citing papers

Globally Optimal Pose from Orthographic Silhouettes

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

A search-based algorithm achieves globally optimal pose estimation from silhouettes alone by querying precomputed area response surfaces and auxiliary ellipse aspect ratios for any shape.

3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

3D-Fixer performs in-place 3D asset completion from single-view partial point clouds via coarse-to-fine generation with ORFA conditioning, plus a new ARSG-110K dataset, to achieve higher geometric accuracy than MIDI and Gen3DSR while keeping diffusion efficiency.

GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

GeoQuery replaces corrupted rendering features with geometry-aligned proxy queries and restricts cross-view attention to local windows, enabling robust diffusion-based refinement under extreme view sparsity.

Depth Anything 3: Recovering the Visual Space from Any Views

cs.CV · 2025-11-13 · unverdicted · novelty 6.0

DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.

The Midas Touch for Metric Depth

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

MTD turns relative depth into metric depth via segment-wise sparse graph optimization and discontinuity-aware geodesic pixel refinement, claiming better accuracy and generalization than prior depth methods.

Sapiens2

cs.CV · 2026-04-23 · unverdicted · novelty 5.0

Sapiens2 improves pretraining, data scale, and architecture over its predecessor to set new state-of-the-art results on human pose estimation, body-part segmentation, normal estimation, and new tasks like pointmap and albedo estimation.

Qwen-Image Technical Report

cs.CV · 2025-08-04 · unverdicted · novelty 5.0

Qwen-Image is a foundation model that reaches state-of-the-art results in image generation and editing by combining a large-scale text-focused data pipeline with curriculum learning and dual semantic-reconstructive encoding for editing consistency.

citing papers explorer

Showing 21 of 21 citing papers.