OpenAD: Open-world autonomous driving benchmark for 3D object detection
3 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
SearchAD is a large-scale semantic image retrieval benchmark for rare driving scenarios that supports text-to-image and image-to-image tasks and shows text-based methods outperform image-based ones while overall performance stays limited.
DSAA boosts fine-grained OVD by injecting attribute priors into text embeddings via APA, modulating K/V in BERT, and using an attribute-aware contrastive loss, with gains reported on the FG-OVD benchmark.
citing papers explorer
- 3DVLA: Enhancing Vision-Language-Action Models via 3D Spatial and Instance Understanding
3DVLA is a plug-and-play framework that enhances pretrained VLAs with pervasive 3D feature encoding using multi-view consistency and Spatially-Conditioned Geometry Aggregation, an instance estimation module, and a masked self-supervised 3D branch, yielding gains on LIBERO-Plus and RoboTwin 2.0.