Learning transferable visual models from natural language supervision
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
fields: cs.CV 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
GETok partitions images with grid tokens and refines locations via offset tokens to enable better native 2D spatial reasoning in MLLMs.
citing papers explorer
- Determined by User Needs: A Salient Object Detection Rationale Beyond Conventional Visual Stimuli
The authors advocate UserSOD, a salient object detection task that identifies objects matching users' proactive needs rather than strongest visual stimuli.
- Grounding Everything in Tokens for Multimodal Large Language Models
GETok partitions images with grid tokens and refines locations via offset tokens to enable better native 2D spatial reasoning in MLLMs.