CalArena is a large-scale benchmark that evaluates dozens of post-hoc calibration methods using Post-Hoc Improvement (PHI) in proper scoring rules and finds that smooth functions outperform binning while dedicated multiclass methods are required in high-dimensional settings.
Beyond overconfidence: foundation models redefine calibration in deep neural networks.arXiv preprint arXiv:2506.09593, 2025
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
CalArena: A Large-Scale Post-Hoc Calibration Benchmark
CalArena is a large-scale benchmark that evaluates dozens of post-hoc calibration methods using Post-Hoc Improvement (PHI) in proper scoring rules and finds that smooth functions outperform binning while dedicated multiclass methods are required in high-dimensional settings.