
BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning, 2025

6 Pith papers cite this work. Polarity classification is still indexing.


Citing papers by year: 2026 (5), 2025 (1)

representative citing papers

ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

ICRL jointly trains a solver and a critic with RL, using distribution-calibration re-weighting and role-wise advantage estimation, so that the benefits of critique are internalized into the LLM's unassisted performance. With Qwen3 models, it yields gains of 6.4 points on agentic tasks and 7.0 points on math reasoning.
