Online Label Refinement lets LLMs learn robust reasoning from noisy supervision by correcting labels when majority answers show rising rollout success and stable history, delivering 3-4% gains on math and reasoning benchmarks even at high noise levels.
Making deep neural networks robust to label noise: A loss correction approach
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.LG 1years
2026 1verdicts
CONDITIONAL 1roles
background 1polarities
background 1representative citing papers
citing papers explorer
-
Can LLMs Learn to Reason Robustly under Noisy Supervision?
Online Label Refinement lets LLMs learn robust reasoning from noisy supervision by correcting labels when majority answers show rising rollout success and stable history, delivering 3-4% gains on math and reasoning benchmarks even at high noise levels.