Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning

He, Qianxi; Ren, Qingyu; Lei, Shanzhe; Wang, Xuhong; Wang, Yingchun · 2025 · DOI 10.18653/v1/2025.emnlp-main.1385

1 Pith paper cites this work.



representative citing papers

Scaling with Confidence: Calibrating Confidence of LLMs for Adaptive Test Time Scaling

cs.AI · 2026-07-02 · unverdicted · novelty 5.0 · ref 56

C3RL is a new RL algorithm combining correctness, calibration, and reference accuracy rewards to improve LLM confidence calibration, enabling CAS to outperform majority voting with up to 12.33x lower inference cost.

