Global calibration metrics like ECE are confounded by accuracy; the proposed ACE framework with three accuracy-controlled views shows many prior calibration advantages weaken or reverse.
Sayself: Teaching llms to express con- fidence with self-reflective rationales.arXiv preprint arXiv:2405.20974
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2representative citing papers
citing papers explorer
-
When Calibration Rankings Reverse: Accuracy-Controlled Evaluation for Fair Comparison of LLMs
Global calibration metrics like ECE are confounded by accuracy; the proposed ACE framework with three accuracy-controlled views shows many prior calibration advantages weaken or reverse.
- Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards