Closing the confidence-faithfulness gap in large language models
5 Pith papers cite this work.
Representative citing papers
-
The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations
Temporal knowledge drift is encoded as a geometrically orthogonal direction in LLM residual streams, independent of correctness and uncertainty.
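A minimal sketch of the kind of analysis this claim suggests, not the paper's code: fit linear probes for a drift label and a correctness label on residual-stream activations and compare the resulting directions. The activation matrix, layer choice, and labels below are hypothetical stand-ins.

```python
# Minimal sketch (not the paper's method): probe for a "temporal drift" direction and a
# "correctness" direction in residual-stream activations, then check near-orthogonality.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 256                       # prompts x residual-stream dimensions (toy sizes)
X = rng.normal(size=(n, d))            # hypothetical activations at one layer
y_drift = (X[:, 0] > 0).astype(int)    # 1 = answer relies on stale (drifted) knowledge
y_correct = (X[:, 1] > 0).astype(int)  # 1 = model answered correctly

def probe_direction(X, y):
    """Return the unit-norm weight vector of a linear probe for labels y."""
    w = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    return w / np.linalg.norm(w)

drift_dir = probe_direction(X, y_drift)
correct_dir = probe_direction(X, y_correct)

# A cosine similarity near 0 would indicate the drift direction is close to orthogonal
# to the correctness direction, i.e. an independent axis in the representation.
print("cos(drift, correctness) =", float(drift_dir @ correct_dir))
```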
-
Hypothesis generation and updating in large language models
LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.
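For readers unfamiliar with the term, a toy illustration of the size principle that "strong-sampling bias" refers to; the hypothesis sets, priors, and data below are illustrative and not taken from the paper.

```python
# Toy strong-sampling Bayesian update: observations are assumed to be drawn uniformly
# from the true hypothesis's extension, so likelihood is (1/|h|)**n and smaller
# hypotheses are favoured when the data fit both.
def posterior(hypotheses, prior, data):
    scores = {}
    for h, members in hypotheses.items():
        if all(x in members for x in data):
            scores[h] = prior[h] * (1.0 / len(members)) ** len(data)
        else:
            scores[h] = 0.0
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}

hypotheses = {"multiples of 10": set(range(10, 101, 10)),
              "even numbers": set(range(2, 101, 2))}
prior = {"multiples of 10": 0.5, "even numbers": 0.5}
# All three observations fit both hypotheses, but strong sampling shifts mass to the smaller set.
print(posterior(hypotheses, prior, [10, 30, 60]))
```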
-
Process Supervision of Confidence Margin for Calibrated LLM Reasoning
RLCM trains LLMs with a margin-enhanced process reward that widens the gap between correct and incorrect reasoning steps, improving calibration on math, code, logic, and science tasks without hurting accuracy.
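RLCM's exact reward is not reproduced here; the sketch below shows one plausible form of a margin-enhanced process reward, where the per-step verifier scores and the margin value are assumptions made for illustration.

```python
# Hypothetical margin-enhanced process reward: push rewards for correct reasoning steps up
# and incorrect steps down by a fixed margin, widening the gap between the two classes.
from typing import List

def margin_process_reward(step_scores: List[float],
                          step_correct: List[bool],
                          margin: float = 0.2) -> List[float]:
    """Shift each step's base score by +/- margin depending on correctness, clipped to [0, 1]."""
    rewards = []
    for score, ok in zip(step_scores, step_correct):
        shifted = score + margin if ok else score - margin
        rewards.append(min(1.0, max(0.0, shifted)))
    return rewards

# Example: a three-step reasoning chain with one incorrect middle step.
print(margin_process_reward([0.7, 0.6, 0.8], [True, False, True]))
```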
-
Verbal Confidence Saturation in 3-9B Open-Weight Instruction-Tuned LLMs: A Pre-Registered Psychometric Validity Screen
Seven 3-9B instruction-tuned LLMs produce verbal confidence that saturates at high values and fails psychometric validity criteria for Type-2 discrimination under minimal elicitation.
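A small illustration of the Type-2 discrimination metric such a validity screen would compute: the AUROC of stated confidence against answer correctness. The toy data below is illustrative and shows why confidence pinned near the top of the scale yields near-chance discrimination.

```python
# Type-2 discrimination as AUROC of verbal confidence vs. correctness; saturated,
# correctness-independent confidences give an AUROC near 0.5 (no discrimination).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
correct = rng.integers(0, 2, size=500)                 # 1 = answer was correct
# Saturated verbal confidence: nearly everything reported as 90-100, regardless of correctness.
confidence = np.clip(rng.normal(95, 3, size=500), 0, 100)

auroc = roc_auc_score(correct, confidence)
print(f"Type-2 AUROC under saturated confidence: {auroc:.3f}")
```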
-
Distilling Self-Consistency into Verbal Confidence: A Pre-Registered Negative Result and Post-Hoc Rescue on Gemma 3 4B
Fine-tuning Gemma 3 4B on unfiltered self-consistency targets produces a binary verbal correctness discriminator with AUROC 0.774 on TriviaQA, outperforming logit entropy after the pre-registered modal-filtered approach failed.
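A hedged sketch of what "unfiltered self-consistency targets" could look like in practice: sample several answers per question and use the modal answer's agreement rate as the confidence target for fine-tuning. The function names and sampling setup are assumptions, not the paper's pipeline.

```python
# Build a self-consistency target for one question: majority answer plus its agreement rate.
# "Unfiltered" here means keeping every question, not only those with a dominant modal answer.
from collections import Counter
from typing import List, Tuple

def self_consistency_target(samples: List[str]) -> Tuple[str, float]:
    """Return the modal sampled answer and its agreement rate in [0, 1]."""
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)

# Example: 8 sampled answers to one TriviaQA-style question.
samples = ["Paris", "Paris", "Lyon", "Paris", "Paris", "Paris", "Marseille", "Paris"]
print(self_consistency_target(samples))   # -> ('Paris', 0.75)
```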