Vision transformers for dense prediction

René Ranftl, Alexey Bochkovskiy, Vladlen Koltun · 2021

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography

cs.CV · 2026-05-06 · conditional · novelty 7.0

CARD is a new multi-modal driving dataset delivering ~500K dense depth pixels per frame from challenging road topographies using stereo cameras and fused LiDARs over 110 km.

VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation

cs.CV · 2026-04-15 · unverdicted · novelty 6.0 · 2 refs

VGGT-Segmentor achieves new SOTA cross-view segmentation on Ego-Exo4D (67.7% Ego-to-Exo, 68.0% Exo-to-Ego IoU) via geometry-enhanced features, a three-stage segmentation head, and correspondence-free pretraining.

LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

LangFlash introduces a feed-forward model for 3D language Gaussian splatting from sparse unposed images, claiming superior novel view synthesis and semantic consistency via enriched training data and sparse semantic encoding.

citing papers explorer

Showing 3 of 3 citing papers.

CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography
cs.CV · 2026-05-06 · conditional · none · ref 40

VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation
cs.CV · 2026-04-15 · unverdicted · none · ref 39 · 2 links

LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images
cs.CV · 2026-05-22 · unverdicted · none · ref 35
