Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs

7 Pith papers cite this work. Polarity classification is still indexing.

Citation-role summary: method (1)

Citation-polarity summary

Years: 2026 (7)

Verdicts: unverdicted (7)

Roles: method (1)

Polarities: use method (1)

representative citing papers

Not only where, But when: Temporal Scheduling for RLVR

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

Temporal scheduling of credit allocation criteria over RLVR training, using trajectory percentiles to target heterogeneous behaviors, yields more stable policy entropy and better reasoning benchmark results than static allocation.

One-Way Policy Optimization for Self-Evolving LLMs

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

OWPO decouples optimization direction from magnitude via asymmetric reweighting (Accelerated Alignment for inferior deviations, Gain Locking for superior ones) and uses iterative references to create a ratchet effect for continuous LLM improvement.
