Lisa: Reasoning segmentation via large language model
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
representative citing papers
WebEye benchmark and Pixel-Searcher agent enable visual perception tasks by using web search to resolve object identities before precise localization or answering.
B-GRTO extends GRPO by reusing rollouts to optimize auxiliary segmentation decoder objectives, yielding substantial gains over plain GRPO on referring segmentation tasks.
citing papers explorer
- SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction
  SetCon achieves state-of-the-art open-ended referring segmentation by using LVLM-generated set-level concepts for joint mask decoding, with gains increasing for multi-target cases on image and video benchmarks.
- From Web to Pixels: Bringing Agentic Search into Visual Perception
  WebEye benchmark and Pixel-Searcher agent enable visual perception tasks by using web search to resolve object identities before precise localization or answering.
- B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation
  B-GRTO extends GRPO by reusing rollouts to optimize auxiliary segmentation decoder objectives, yielding substantial gains over plain GRPO on referring segmentation tasks.