After length correction, reasoning-trained language models exhibit distinct hidden-state trajectory geometries on harder problems compared to instruction-tuned baselines, with the strongest effect in code domains.
LLMs encode how difficult problems are. arXiv preprint arXiv:2510.18147
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- Latent Confidence Alignment for LLM Self-Assessment
LCAE is introduced as a Rasch-model metric that aligns LLM self-reported confidence with latent error probability derived from ability and item difficulty, shown to improve calibration on a medical dataset across 20 models.