More capable LLMs produce worse distributional forecasts on superlinear-growth time series with tail risk of regime change, with the error concentrated in the upper tail; on conventional threshold metrics the pattern reverses.
Journal of Applied Meteorology
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
TRIAGE evaluates LLMs on prospective metacognitive control by requiring a single plan for task selection, sequencing, and token allocation under a calibrated budget, revealing substantial gaps in current models across math, science, code, and knowledge tasks.
A new 2x2 diagnostic matrix classifies probabilistic classifiers into Eagles, Bulls, Sloths, and Moles by calibration and discrimination, with empirical archetype assignments and a proof that post-hoc calibration cannot add discriminatory power.
citing papers explorer
- The Manokhin Probability Matrix: A Diagnostic Framework for Classifier Probability Quality