None of the 11 frontier LLMs evaluated reaches 90% coverage on QuantSightBench prediction intervals; the top models reach 75-79% and show overconfidence, especially at extreme magnitudes.
PREDICTION REQUIREMENTS: Provide a {probability level} prediction interval (lower, median, upper) in the same units as the resolution criteria
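As a rough illustration of what a coverage figure means, the sketch below computes empirical coverage for a set of (lower, median, upper) intervals. It assumes coverage is the fraction of resolved outcomes that fall inside [lower, upper]; the benchmark's exact scoring rule is not given here. The `Interval` type, the `empirical_coverage` helper, and all numbers are hypothetical and chosen only for the example.

```python
from typing import NamedTuple, Sequence


class Interval(NamedTuple):
    """A forecast in the (lower, median, upper) format; hypothetical type."""
    lower: float
    median: float
    upper: float


def empirical_coverage(intervals: Sequence[Interval],
                       outcomes: Sequence[float]) -> float:
    """Fraction of outcomes that land inside their [lower, upper] interval.

    Assumption: bounds are inclusive and every question has resolved.
    """
    if len(intervals) != len(outcomes):
        raise ValueError("intervals and outcomes must have the same length")
    if not intervals:
        raise ValueError("need at least one forecast")
    hits = sum(1 for iv, y in zip(intervals, outcomes)
               if iv.lower <= y <= iv.upper)
    return hits / len(intervals)


# Made-up forecasts and resolved values, for illustration only.
intervals = [
    Interval(10.0, 15.0, 20.0),
    Interval(100.0, 150.0, 200.0),
    Interval(0.5, 1.0, 1.5),
    Interval(1000.0, 2000.0, 3000.0),
]
outcomes = [12.0, 250.0, 1.2, 2500.0]

coverage = empirical_coverage(intervals, outcomes)
print(f"empirical coverage = {coverage:.2f}")
```

If the requested probability level were 90%, an empirical coverage well below 0.90 would indicate intervals that are too narrow, which is the sense of overconfidence described above.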
QuantSightBench: Evaluating LLM Quantitative Forecasting with Prediction Intervals