Vision transformers for dense prediction
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
citing papers explorer
- CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography
  CARD is a new multi-modal driving dataset delivering ~500K dense depth pixels per frame from challenging road topographies using stereo cameras and fused LiDARs over 110 km.
- VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation
  VGGT-Segmentor achieves new SOTA cross-view segmentation on Ego-Exo4D (67.7% Ego-to-Exo, 68.0% Exo-to-Ego IoU) via geometry-enhanced features, a three-stage segmentation head, and correspondence-free pretraining.
- LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images
  LangFlash introduces a feed-forward model for 3D language Gaussian splatting from sparse unposed images, claiming superior novel view synthesis and semantic consistency via enriched training data and sparse semantic encoding.