On calibration of modern neural networks
8 Pith papers cite this work.
Citation roles: background (1)
Citation polarities: background (1)
Representative citing papers:
LiBaGS scores and selects synthetic data near decision boundaries using proximity, uncertainty, density, and validity, with boundary-gap allocation and marginal stopping to improve training accuracy.
Multimodal LLMs exhibit central tendency bias when scoring ordinal clinical images, over-predicting low scores and under-predicting high scores even after prompt ablations.
Conditional optimal transport is used to turn raw PRM outputs into monotonic quantile functions that improve calibration and downstream Best-of-N performance on MATH-500 and AIME.
SATTC improves top-k accuracy in cross-subject EEG-to-image retrieval by fusing geometric whitening and structural nearest-neighbor experts on the similarity matrix without labels.
GrACE is a fine-tuned generative method that uses similarity to a special token embedding for real-time calibrated confidence in LLMs and enables efficient confidence-based test-time scaling.
Max-plus neural networks enable tracing each output to one dominant neuron, allowing a pixel fragility measure that provides more useful explanations than SHAP or Integrated Gradients on medical images.
Neural activation coverage can be adapted to provide uncertainty estimates in regression that the authors' experiments show are more meaningful than Monte-Carlo Dropout.
Citing papers explorer
- Retrieval-Augmented Linguistic Calibration
  Presents a distributional model of linguistic confidence, a Faithfulness Divergence metric, and the RALC pipeline, which improves faithfulness and calibration on QA benchmarks across LLM families.
- GrACE: A Generative Approach to Better Confidence Elicitation and Efficient Test-Time Scaling in Large Language Models
  GrACE is a fine-tuned generative method that uses similarity to a special token embedding for real-time calibrated confidence in LLMs and enables efficient confidence-based test-time scaling.