Evaluator VLMs frequently fail to detect quality-degrading perturbations in I2T and T2I outputs, with failure rates exceeding 50% in some cases.
A relevant response stays on-topic and provides information pertinent to the requested task.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models