pith. machine review for the scientific record. sign in

arXiv preprint arXiv:2502.18770 , year=

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

years

2026 7 2025 1

roles

background 1

polarities

background 1

clear filters

representative citing papers

Variance-aware Reward Modeling with Anchor Guidance

stat.ML · 2026-05-12 · unverdicted · novelty 7.0

Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.

Reward Hacking in Rubric-Based Reinforcement Learning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

Rubric-based RL verifiers can be gamed via partial criterion satisfaction and implicit-to-explicit tricks, yielding proxy gains that do not improve quality under rubric-free judges; stronger verifiers reduce but do not eliminate the mismatch.

Optimal Transport for LLM Reward Modeling from Noisy Preference

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

SelectiveRM applies optimal transport with a joint consistency discrepancy and partial mass relaxation to produce reward models that optimize a tighter upper bound on clean risk while autonomously dropping noisy preference samples.

citing papers explorer

Showing 8 of 8 citing papers.