Posterior calibration and exploratory analysis for natural language processing models
Many models in natural language processing define probability distributions over linguistic structures. We argue that (1) the quality of a model's posterior distribution can and should be directly evaluated, as to whether its probabilities correspond to empirical frequencies, and (2) NLP uncertainty can be projected not only to pipeline components, but also to exploratory data analysis, telling a user when to trust and not trust the NLP analysis. We present a method to analyze calibration, and apply it to compare the miscalibration of several commonly used models. We also contribute a coreference sampling algorithm that can create confidence intervals for a political event extraction task.
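The calibration evaluation the abstract describes can be sketched as binning predicted probabilities and comparing each bin's average confidence to the empirical frequency of correct outcomes. The functions below are an illustrative sketch of that idea, not the paper's actual implementation; all names are hypothetical.

```python
def calibration_bins(probs, labels, n_bins=10):
    """Group (predicted probability, binary outcome) pairs into equal-width
    bins; return (mean predicted probability, empirical frequency, count)
    for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            out.append((mean_p, freq, len(b)))
    return out

def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted average gap between mean confidence and empirical
    frequency across bins; 0 means perfectly calibrated on this sample."""
    n = len(probs)
    return sum(count * abs(mean_p - freq)
               for mean_p, freq, count in calibration_bins(probs, labels, n_bins)) / n
```

For example, a model that says 0.8 and is right 8 times out of 10 is perfectly calibrated at that confidence level, and the error above is (numerically) zero.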
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
- Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification
  GenCE is a strictly proper loss obtained by normalizing each sample's softmax against the batch predictions, outperforming cross-entropy in low-data and imbalanced regimes with better calibration and OOD detection.
- Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification
  Generative Cross-Entropy loss improves both accuracy and calibration over standard cross-entropy by augmenting it with a generative p(x|y) term, especially on long-tailed data, and pairs with adaptive temperature scaling.
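Temperature scaling, mentioned above, is a standard post-hoc recalibration step: divide the logits by a scalar T before the softmax. A minimal sketch of the fixed-temperature version follows (the cited paper's adaptive variant, which is not shown here, would choose T per input rather than globally):

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Softmax over logits / T. T > 1 flattens the distribution (lower
    confidence); T < 1 sharpens it. Uses the max-subtraction trick for
    numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Raising T leaves the argmax (and hence accuracy) unchanged while reducing overconfidence, which is why it is a popular calibration fix.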
discussion (0)