The upper bag has a black base color with a bold geometric pattern of interlocking diamond and flower-like shapes in white, light blue, and dark blue.

[Figure: Detail-Description (Robustness) example. Panels: Unbiased Image, Biased Image. Input Data (Unbiased), Q: “Compared to the larger black luggage, what common personal electronic device is the pattern …]

1 Pith paper cites this work.

representative citing papers

MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

MM-JudgeBias benchmark shows that many MLLM judges neglect modalities and produce unstable evaluations under small input changes, based on tests of 26 models with over 1,800 samples.
