Pith

arXiv preprint arXiv:2509.21016

3 Pith papers cite this work. Polarity classification is still indexing.

Fields: cs.LG (2), cs.CL (1)

Years: 2026 (3)

Verdicts: unverdicted (3)

representative citing papers

VeriGate: Verifier-Gated Step-Level Supervision for GRPO

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

VeriGate adds verifier-gated step-level supervision to GRPO via cumulative PRM rewards and group-normalized token advantages, raising accuracy by 20% and 12% on 1.5B and 7B models, respectively, on MATH and six benchmarks.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • The Hidden Bias of Process Reward Models: PRISM for Rewarding the Right Reasoning cs.LG · 2026-06-08 · unverdicted · none · ref 11

    PRISM is a contrastive, policy-aware training framework for process reward models that reduces false positives by 22% on PRMBench and boosts downstream accuracy by up to 33% in Best-of-N selection by learning reliable relative comparisons instead of pointwise labels.

  • VeriGate: Verifier-Gated Step-Level Supervision for GRPO cs.LG · 2026-05-28 · unverdicted · none · ref 15

    VeriGate adds verifier-gated step-level supervision to GRPO via cumulative PRM rewards and group-normalized token advantages, raising accuracy by 20% and 12% on 1.5B and 7B models, respectively, on MATH and six benchmarks.