Evaluator VLMs frequently fail to detect quality-degrading perturbations in I2T and T2I outputs, with failure rates exceeding 50% in some cases.
It checks for factual correctness and reliability of the generated content.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models