LLMs score 0.96 on standard probability exercises but only 0.59 on counterintuitive ones, and their scores drop further under biased wording or misleading cues, indicating that they are not genuine probabilistic reasoners.
Slater, Ali Ziaee and Morgan Nguyen
2 Pith papers cite this work.
Citing papers by year: 2026 (2).
Citing papers:
- How reliable are LLMs when it comes to playing dice?
  LLMs score 0.96 on standard probability exercises but only 0.59 on counterintuitive ones, and their scores drop further under biased wording or misleading cues, indicating that they are not genuine probabilistic reasoners.
- Don't Look at the Numbers: Visual Anchoring Bias and Layer-wise Representation in VLMs
  Numeric anchors embedded in images bias VLM quality judgments more strongly than severe visual degradation does, and layer-wise probing shows that anchor-saturated layers are suboptimal for quality prediction.