FreeOcc enables training-free open-vocabulary 3D occupancy prediction from RGB-D sequences by combining SLAM, dense Gaussian maps, off-the-shelf vision-language models, and probabilistic projection, achieving over 2x gains on benchmarks and zero-shot transfer to novel scenes.
Harnessing vision foundation models for high-performance, training- free open vocabulary segmentation
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
RADIO-ViPE performs online open-vocabulary semantic SLAM directly from monocular RGB video in dynamic environments by tightly coupling vision-language embeddings from foundation models with geometric factor-graph optimization using adaptive robust kernels.
CAMFusion fuses multiview 2D vision-language embeddings via cross-attention and multiview consistency self-supervision to produce better 3D semantic and instance representations, outperforming averaging and reaching SOTA on benchmarks including zero-shot out-of-domain cases.
citing papers explorer
-
FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction
FreeOcc enables training-free open-vocabulary 3D occupancy prediction from RGB-D sequences by combining SLAM, dense Gaussian maps, off-the-shelf vision-language models, and probabilistic projection, achieving over 2x gains on benchmarks and zero-shot transfer to novel scenes.
-
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
RADIO-ViPE performs online open-vocabulary semantic SLAM directly from monocular RGB video in dynamic environments by tightly coupling vision-language embeddings from foundation models with geometric factor-graph optimization using adaptive robust kernels.
-
Cross-Attentive Multiview Fusion of Vision-Language Embeddings
CAMFusion fuses multiview 2D vision-language embeddings via cross-attention and multiview consistency self-supervision to produce better 3D semantic and instance representations, outperforming averaging and reaching SOTA on benchmarks including zero-shot out-of-domain cases.