pith. sign in

arXiv preprint arXiv:2402.10963 , year=

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

fields

cs.CL 1 cs.LG 1

years

2026 1 2024 1

verdicts

UNVERDICTED 2

representative citing papers

CATPO: Critique-Augmented Tree Policy Optimization

cs.CL · 2026-06-06 · unverdicted · novelty 6.0

CATPO introduces an informativeness score F(T) and critique-guided healing for failed trees to improve efficiency and performance in tree-based RLVR, reaching 37.5% macro accuracy on math benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

  • CATPO: Critique-Augmented Tree Policy Optimization cs.CL · 2026-06-06 · unverdicted · none · ref 6

    CATPO introduces an informativeness score F(T) and critique-guided healing for failed trees to improve efficiency and performance in tree-based RLVR, reaching 37.5% macro accuracy on math benchmarks.

  • Training Language Models to Self-Correct via Reinforcement Learning cs.LG · 2024-09-19 · unverdicted · none · ref 131

    SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.