pith. machine review for the scientific record. sign in

hub

Spurious rewards: Rethinking training signals in rlvr.arXiv preprint arXiv:2506.10947

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

hub tools

years

2026 15 2025 1

clear filters

representative citing papers

Reward Hacking in Rubric-Based Reinforcement Learning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

Rubric-based RL verifiers can be gamed via partial criterion satisfaction and implicit-to-explicit tricks, yielding proxy gains that do not improve quality under rubric-free judges; stronger verifiers reduce but do not eliminate the mismatch.

H\"older Policy Optimisation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

HölderPO unifies token aggregation in GRPO via the Hölder mean with dynamic p annealing, reporting 54.9% average math-benchmark accuracy and 93.8% ALFWorld success.

Characterizing Model-Native Skills

cs.AI · 2026-04-19 · conditional · novelty 6.0

Recovering an orthogonal basis from model activations yields a model-native skill characterization that improves reasoning Pass@1 by up to 41% via targeted data selection and supports inference steering, outperforming human-characterized alternatives.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning cs.LG · 2026-04-26 · conditional · none · ref 42

    Correcting DeepSpeed optimizer and OpenRLHF loss bugs reveals SFT-then-RL outperforms mixed-policy methods by 3.8-22.2 points on math benchmarks.

  • Characterizing Model-Native Skills cs.AI · 2026-04-19 · conditional · none · ref 75

    Recovering an orthogonal basis from model activations yields a model-native skill characterization that improves reasoning Pass@1 by up to 41% via targeted data selection and supports inference steering, outperforming human-characterized alternatives.