arXiv preprint arXiv:2601.19944
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Citing papers
- CalArena: A Large-Scale Post-Hoc Calibration Benchmark
  CalArena is a large-scale benchmark that evaluates dozens of post-hoc calibration methods using Post-Hoc Improvement (PHI) in proper scoring rules and finds that smooth functions outperform binning while dedicated multiclass methods are required in high-dimensional settings.
- The Manokhin Probability Matrix: A Diagnostic Framework for Classifier Probability Quality
  A new 2x2 diagnostic matrix classifies probabilistic classifiers into Eagles, Bulls, Sloths, and Moles by calibration and discrimination, with empirical archetype assignments and a proof that post-hoc calibration cannot add discriminatory power.