Trace length is a simple uncertainty signal in reasoning models. arXiv preprint arXiv:2510.10409

Trace length is a simple uncertainty signal in reasoning models · 2025 · arXiv 2510.10409

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

cs.LG · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

POISE trains a lightweight probe on the actor's internal states to predict expected rewards for RLVR, matching DAPO performance on math benchmarks with lower compute by avoiding extra rollouts or critic models.

How Language Models Fail: Token-Level Signatures of Committed and Persistent Reasoning Failures

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

LLM reasoning failures split into committed (early lock-in) and persistent-uncertainty modes with distinct token-level signatures that hold across 23 model-dataset pairs in 20 of 23 falsifiable tests.

VERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inference

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

VERDI derives three structural confidence signals from decomposed LLM verification traces and calibrates them with Platt-scaled logistic regression to achieve AUROC 0.72-0.91 on GPT models and 0.56-0.70 on Qwen models where log-probabilities fail.

SELFDOUBT: Uncertainty Quantification for Reasoning LLMs via the Hedge-to-Verify Ratio

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

SELFDOUBT introduces the Hedge-to-Verify Ratio from reasoning traces as a single-pass uncertainty signal, with no-hedge traces correct 96% of the time and outperforming semantic entropy at 10x lower cost.

How do LLMs Compute Verbal Confidence

cs.CL · 2026-03-18 · unverdicted · novelty 6.0

Mechanistic experiments on Gemma 3 27B, Qwen 2.5 7B and Magistral Small 24B show verbal confidence is cached at post-answer positions from answer tokens and captures richer answer-quality information beyond token log-probabilities.

citing papers explorer

Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

cs.LG · 2026-05-08 · unverdicted · none · ref 5 · 2 links
