Posterior calibration and exploratory analysis for natural language processing models
Many models in natural language processing define probability distributions over linguistic structures. We argue that (1) the quality of a model's posterior distribution can and should be directly evaluated, as to whether its probabilities correspond to empirical frequencies, and (2) NLP uncertainty can be projected not only to pipeline components, but also to exploratory data analysis, telling a user when to trust and not trust the NLP analysis. We present a method to analyze calibration, and apply it to compare the miscalibration of several commonly used models. We also contribute a coreference sampling algorithm that can create confidence intervals for a political event extraction task.
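The calibration evaluation the abstract describes can be sketched as binning predicted probabilities and comparing each bin's average confidence to the empirical frequency of correct outcomes. The functions below are an illustrative sketch of that idea, not the paper's actual implementation; all names are hypothetical.

```python
def calibration_bins(probs, labels, n_bins=10):
    """Group (predicted probability, binary outcome) pairs into equal-width
    bins; return (mean predicted probability, empirical frequency, count)
    for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            out.append((mean_p, freq, len(b)))
    return out

def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted average gap between mean confidence and empirical
    frequency across bins; 0 means perfectly calibrated on this sample."""
    n = len(probs)
    return sum(count * abs(mean_p - freq)
               for mean_p, freq, count in calibration_bins(probs, labels, n_bins)) / n
```

For example, a model that says 0.8 and is right 8 times out of 10 is perfectly calibrated at that confidence level, and the error above is (numerically) zero.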
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
- Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification
  GenCE is a strictly proper loss obtained by normalizing each sample's softmax against the batch predictions, outperforming cross-entropy in low-data and imbalanced regimes with better calibration and OOD detection.
- Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification
  Generative Cross-Entropy loss improves both accuracy and calibration over standard cross-entropy by augmenting it with a generative p(x|y) term, especially on long-tailed data, and pairs with adaptive temperature scaling.
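Temperature scaling, mentioned above, is a standard post-hoc recalibration step: divide the logits by a scalar T before the softmax. A minimal sketch of the fixed-temperature version follows (the cited paper's adaptive variant, which is not shown here, would choose T per input rather than globally):

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Softmax over logits / T. T > 1 flattens the distribution (lower
    confidence); T < 1 sharpens it. Uses the max-subtraction trick for
    numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Raising T leaves the argmax (and hence accuracy) unchanged while reducing overconfidence, which is why it is a popular calibration fix.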
discussion (0)