pith. sign in

Har- nessing vision foundation models for high-performance, training-free open vocabulary segmentation

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

fields

cs.CV 5 cs.RO 1

years

2026 5 2025 1

verdicts

UNVERDICTED 6

clear filters

representative citing papers

SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation

cs.CV · 2026-05-17 · unverdicted · novelty 6.0 · 2 refs

SegRAG is a training-free retrieval-augmented framework that extracts class-specific point prompts from a filtered DINOv3 feature bank to boost SAM3 semantic segmentation performance on standard and agricultural benchmarks.

FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

FreeOcc enables training-free open-vocabulary 3D occupancy prediction from RGB-D sequences by combining SLAM, dense Gaussian maps, off-the-shelf vision-language models, and probabilistic projection, achieving over 2x gains on benchmarks and zero-shot transfer to novel scenes.

Cross-Attentive Multiview Fusion of Vision-Language Embeddings

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

CAMFusion fuses multiview 2D vision-language embeddings via cross-attention and multiview consistency self-supervision to produce better 3D semantic and instance representations, outperforming averaging and reaching SOTA on benchmarks including zero-shot out-of-domain cases.

Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes

cs.CV · 2026-02-26 · unverdicted · novelty 6.0

A 3D Language-Embedded Gaussians framework with opacity-aware Poisson volumetric aggregation and progressive temperature decay achieves 59.50 IoU and 21.05 mIoU on Occ-ScanNet for open-vocabulary indoor occupancy.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation cs.CV · 2026-05-17 · unverdicted · none · ref 80 · 2 links

    SegRAG is a training-free retrieval-augmented framework that extracts class-specific point prompts from a filtered DINOv3 feature bank to boost SAM3 semantic segmentation performance on standard and agricultural benchmarks.

  • FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction cs.RO · 2026-04-30 · unverdicted · none · ref 62

    FreeOcc enables training-free open-vocabulary 3D occupancy prediction from RGB-D sequences by combining SLAM, dense Gaussian maps, off-the-shelf vision-language models, and probabilistic projection, achieving over 2x gains on benchmarks and zero-shot transfer to novel scenes.

  • RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments cs.CV · 2026-04-28 · unverdicted · none · ref 40

    RADIO-ViPE performs online open-vocabulary semantic SLAM directly from monocular RGB video in dynamic environments by tightly coupling vision-language embeddings from foundation models with geometric factor-graph optimization using adaptive robust kernels.

  • Cross-Attentive Multiview Fusion of Vision-Language Embeddings cs.CV · 2026-04-14 · unverdicted · none · ref 31

    CAMFusion fuses multiview 2D vision-language embeddings via cross-attention and multiview consistency self-supervision to produce better 3D semantic and instance representations, outperforming averaging and reaching SOTA on benchmarks including zero-shot out-of-domain cases.

  • Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes cs.CV · 2026-02-26 · unverdicted · none · ref 35

    A 3D Language-Embedded Gaussians framework with opacity-aware Poisson volumetric aggregation and progressive temperature decay achieves 59.50 IoU and 21.05 mIoU on Occ-ScanNet for open-vocabulary indoor occupancy.