Depth Anything V2. Advances in Neural Information Processing Systems, 37:21875–21911.
3 Pith papers cite this work. Polarity classification is still indexing.
Citing years: 2026. Verdicts: 3 (unverdicted). Representative citing papers: 3.
Citing papers explorer
-
VEGA: Visual Encoder Grounding Alignment for Spatially-Aware Vision-Language-Action Models
VEGA improves spatial reasoning in VLA models for robotics by aligning visual encoder features with 3D-supervised DINOv2 representations via a temporary projector and cosine similarity loss (see the first sketch after this list).
-
Focusable Monocular Depth Estimation
FocusDepth is a prompt-conditioned framework that fuses SAM3 features into Depth Anything models via Multi-Scale Spatial-Aligned Fusion to improve target-region depth accuracy on the new FDE-Bench (see the fusion sketch after this list).
-
SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images
SpatialForge synthesizes 10 million spatial QA pairs from in-the-wild 2D images to train VLMs for better depth ordering, layout, and viewpoint-dependent reasoning (see the QA-generation sketch after this list).
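A minimal sketch of the feature-alignment idea summarized for VEGA: a temporary projector maps the policy's visual-encoder tokens into the space of frozen, 3D-supervised DINOv2 features, and a cosine-similarity loss pulls the two together. Dimensions, module names, and the loss weighting are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentHead(nn.Module):
    """Auxiliary head: align encoder tokens with frozen DINOv2 targets (illustrative dims)."""
    def __init__(self, encoder_dim: int = 768, dino_dim: int = 1024):
        super().__init__()
        # Temporary projector: used only to compute the auxiliary loss, discardable afterwards.
        self.projector = nn.Linear(encoder_dim, dino_dim)

    def forward(self, encoder_tokens: torch.Tensor, dino_tokens: torch.Tensor) -> torch.Tensor:
        # encoder_tokens: (B, N, encoder_dim); dino_tokens: (B, N, dino_dim), treated as frozen targets.
        projected = self.projector(encoder_tokens)
        cos = F.cosine_similarity(projected, dino_tokens.detach(), dim=-1)
        # 1 - cosine similarity, averaged over tokens and batch.
        return (1.0 - cos).mean()

# Usage sketch: add the alignment term to the policy's action loss during training,
# e.g. total_loss = action_loss + lambda_align * align_head(vis_tokens, dino_tokens)
```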
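A hypothetical sketch of the fusion idea summarized for FocusDepth: prompt-conditioned segmentation features (e.g. from a SAM-style encoder) are spatially aligned to each decoder scale of a monocular depth model and merged with the depth features. Module names, channel sizes, and the additive fusion are illustrative assumptions, not the paper's Multi-Scale Spatial-Aligned Fusion as published.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Fuse one segmentation feature map into multi-scale depth decoder features (illustrative)."""
    def __init__(self, depth_dims=(256, 128, 64), seg_dim=256):
        super().__init__()
        # One 1x1 conv per scale projects segmentation features into that scale's depth channels.
        self.proj = nn.ModuleList(nn.Conv2d(seg_dim, d, kernel_size=1) for d in depth_dims)

    def forward(self, depth_feats, seg_feat):
        # depth_feats: list of (B, C_i, H_i, W_i); seg_feat: (B, seg_dim, H, W).
        fused = []
        for proj, f in zip(self.proj, depth_feats):
            # Resize the segmentation features to this decoder scale, then add the projection.
            aligned = F.interpolate(seg_feat, size=f.shape[-2:], mode="bilinear", align_corners=False)
            fused.append(f + proj(aligned))
        return fused
```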
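An illustrative sketch of the kind of 2D-to-spatial-QA bootstrapping summarized for SpatialForge: given a monocular depth map and two detected object boxes, emit a depth-ordering question/answer pair. The function, box format, and wording are assumptions, not the paper's actual pipeline.

```python
import numpy as np

def depth_ordering_qa(depth: np.ndarray, box_a, box_b, name_a: str, name_b: str):
    """depth: (H, W) metric depth map (smaller values are closer); boxes are (x0, y0, x1, y1) in pixels."""
    def median_depth(box):
        x0, y0, x1, y1 = box
        return float(np.median(depth[y0:y1, x0:x1]))
    # Compare median depth inside each box to decide which object is nearer the camera.
    closer = name_a if median_depth(box_a) < median_depth(box_b) else name_b
    question = f"Which object is closer to the camera, the {name_a} or the {name_b}?"
    return {"question": question, "answer": f"The {closer} is closer to the camera."}

# Example with synthetic data:
# qa = depth_ordering_qa(np.random.rand(480, 640), (10, 10, 100, 100), (200, 200, 300, 300), "cup", "chair")
```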