A comparison of some conformal quantile regression methods
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 2
Citing papers
- VLM Judges Can Rank but Cannot Score: Task-Dependent Uncertainty in Multimodal Evaluation
  VLM judges exhibit task-dependent uncertainty in their scores, with conformal prediction revealing wide intervals for complex tasks and a decoupling between good ranking performance and poor absolute scoring reliability.
- Conformal prediction after data-dependent model selection
  Novel methods for valid conformal prediction after data-dependent model selection without additional sample splitting, with finite-sample guarantees and asymptotic optimality under regularity conditions.