arXiv preprint arXiv:2508.13167

10 Pith papers cite this work. Polarity classification is still indexing.

hub tools

citation-role summary

background: 3 · baseline: 1

citation-polarity summary

years

2026: 8 · 2025: 2

representative citing papers

ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

ICRL jointly trains a solver and a critic with RL, using distribution-calibration re-weighting and role-wise advantage estimation, so that the benefit of critique is internalized into the LLM's unassisted performance. It yields gains of 6.4 points on agentic tasks and 7.0 points on math reasoning with Qwen3 models.

citing papers explorer
