Category-Level Statistical Summary. Table 9 shows statistics for all 25 top-level categories under the Simple prompt, aggregated across all models. For each category, it reports the mean score, standard deviation (across models), minimum and maximum scores (model variance), Cannot-Answer rate, and question count.
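As a rough illustration of how such a per-category summary could be computed, the following is a minimal sketch, not the cited work's actual procedure. It assumes a hypothetical record layout (`category`, `model`, `question_id`, `score`, `cannot_answer`), assumes each model's category score is the mean of its per-question scores, and uses the population standard deviation; the source does not say whether sample or population deviation is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_categories(records):
    """Aggregate per-category statistics across models.

    `records` is an iterable of dicts with hypothetical keys:
    'category', 'model', 'question_id', 'score', 'cannot_answer'.
    """
    per_model = defaultdict(lambda: defaultdict(list))  # category -> model -> scores
    questions = defaultdict(set)                        # category -> question ids
    cannot = defaultdict(list)                          # category -> 0/1 flags

    for r in records:
        per_model[r["category"]][r["model"]].append(r["score"])
        questions[r["category"]].add(r["question_id"])
        cannot[r["category"]].append(1 if r["cannot_answer"] else 0)

    summary = {}
    for cat, by_model in per_model.items():
        # One score per model: the mean over that model's answers in the category.
        model_scores = [mean(scores) for scores in by_model.values()]
        summary[cat] = {
            "mean": mean(model_scores),
            "std": pstdev(model_scores),   # assumption: population std across models
            "min": min(model_scores),
            "max": max(model_scores),
            "cannot_answer_rate": mean(cannot[cat]),
            "n_questions": len(questions[cat]),
        }
    return summary
```

Swapping `pstdev` for `statistics.stdev` gives the sample standard deviation instead, if that is the convention the table follows.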

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

CaptionQA: Is Your Caption as Useful as the Image Itself?

cs.CV · 2025-11-26 · conditional · novelty 7.0

CaptionQA is a new benchmark with 33,027 questions across natural, document, e-commerce, and embodied AI domains that measures how much utility model-generated captions retain compared to original images when used by LLMs for downstream tasks.
