A Survey of Confidence Estimation and Calibration in Large Language Models
13 Pith papers cite this work.
representative citing papers
Three code-specific uncertainty axes (lexical, algorithmic, functional) yield an ensemble that raises average AUROC from 0.696 to 0.776 across five code LLMs, with one single-pass signal matching multi-pass baselines at lower cost.
Introduces the DOSEBENCH benchmark and shows that four LLMs often fail at rolling 24-hour dose calculations and constraint adherence in OTC dosing decisions despite appearing confident.
A new framework quantifies faithful confidence expression in large reasoning models by comparing linguistic decisiveness to token probabilities, hidden states, and response consistency, revealing it as a persistent challenge.
LLMs correct only 34.8% of zero-shot annotation errors via prompting, and Definition-Specific Familiarity correlates positively with performance (partial r = +0.41) while memorization metrics do not.
BAG prompts LLMs to reason over K sampled responses for strategy selection in multi-turn ambiguous QA, improving accuracy and faithfulness to uncertainty over baselines across six models.
Adapts multi-layer token-level Mahalanobis distance with supervised linear regression to yield improved uncertainty scores for LLM truthfulness tasks.
A thermodynamic-inspired information-geometric framework defines a composite LLM stability score that outperforms a utility-entropy baseline by 0.0299 on average across 80 observations, with gains increasing at higher entropy.
TRACE aggregates answer consistency and confidence trajectory over multiple reasoning steps to decide when to halt inference, reducing token usage by 25-30% while keeping accuracy within 1-2% of full reasoning.
Verbalized confidence from small LMs enables cost-effective cascade routing for automated educational scoring, matching large-model accuracy at 76% lower cost when discrimination is strong.
Simple supervision improves LLM distributional alignment with diverse population groups on three datasets, with evaluation across multiple models and prompts providing a benchmark.
citing papers explorer
Calibrating Overconfidence Without Sacrificing Confidence: Probe-Conditioned Head Intervention for LLMs
PCHI uses a frozen probe to detect likely wrong-but-confident LLM responses and conditionally intervenes on attention heads during confidence generation, converting 82.2% of wrong high-confidence outputs to low confidence while damaging only 5.1% of correct ones and lowering ECE from 21.9% to 9.2%.