pith. machine review for the scientific record.

Helping or herding? Reward model ensembles mitigate but do not eliminate reward hacking. arXiv preprint arXiv:2312.09244

6 Pith papers cite this work. Polarity classification is still indexing.
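The ensembling idea named in the cited paper's title can be sketched in a few lines. This is an illustrative sketch only, not the paper's method: the two "reward models" below are hypothetical toy scorers, and the aggregation rules (mean, or a conservative min) merely illustrate why a response that games one scorer rarely games all members of an ensemble at once.

```python
# Minimal sketch of reward-model ensembling. The reward models here are
# hypothetical stand-ins (simple scoring functions over text), not the
# paper's actual models.
from statistics import mean

def ensemble_reward(response: str, reward_models, aggregate="mean"):
    """Score a response with several reward models and aggregate."""
    scores = [rm(response) for rm in reward_models]
    if aggregate == "min":  # conservative: pessimistic under disagreement
        return min(scores)
    return mean(scores)

# Hypothetical reward models: a length-based and a keyword-based scorer.
rm_length = lambda r: min(len(r) / 100, 1.0)
rm_polite = lambda r: 1.0 if "please" in r.lower() else 0.2

# Worst-case aggregation: the response scores 1.0 on politeness but only
# 0.3 on length, so the ensemble returns 0.3.
score = ensemble_reward("Please see the attached proof.",
                        [rm_length, rm_polite], aggregate="min")
```

The `min` aggregate is the pessimistic choice: a hacked response must fool every member to get a high score, which is the intuition (though not the mechanism) behind the mitigation the title describes.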

6 Pith papers citing it

years: 2026 (6)

verdicts: unverdicted (6)

representative citing papers

How Far Are Video Models from True Multimodal Reasoning?

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

Current video models succeed on basic understanding but achieve under 25% success on logically grounded generation and near 0% on interactive generation, exposing gaps in multimodal reasoning.

FUSE: Ensembling Verifiers with Zero Labeled Data

stat.ML · 2026-04-20 · unverdicted · novelty 6.0

FUSE ensembles verifiers without labeled data, controlling their conditional dependencies to improve spectral ensembling algorithms; it matches or exceeds semi-supervised baselines on benchmarks including GPQA Diamond and Humanity's Last Exam.

citing papers explorer

Showing 6 of 6 citing papers.

  • Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring cs.SE · 2026-05-01 · unverdicted · none · ref 5 · 2 links

    Themis introduces the largest open code preference dataset with over 350k pairs and trains multilingual reward models from 600M to 32B parameters that support flexible multi-criteria scoring, with experiments showing scaling trends and cross-lingual transfer.

  • Beyond Semantic Manipulation: Token-Space Attacks on Reward Models cs.LG · 2026-04-03 · unverdicted · none · ref 4

    TOMPA performs black-box adversarial optimization in token space to discover non-linguistic patterns that nearly double the reward scores of GPT-5 answers on Skywork-Reward-V2 while producing gibberish text.

  • Response Time Enhances Alignment with Heterogeneous Preferences cs.LG · 2026-05-07 · unverdicted · none · ref 16

    Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.

  • Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation cs.LG · 2026-05-06 · unverdicted · none · ref 126

    The power distribution is the target of power sampling, the closed-form solution to self-reward KL-regularized RL, and the basis for power self-distillation that matches sampling performance at lower cost.

  • How Far Are Video Models from True Multimodal Reasoning? cs.CV · 2026-04-21 · unverdicted · none · ref 13

    Current video models succeed on basic understanding but achieve under 25% success on logically grounded generation and near 0% on interactive generation, exposing gaps in multimodal reasoning.

  • FUSE: Ensembling Verifiers with Zero Labeled Data stat.ML · 2026-04-20 · unverdicted · none · ref 28

FUSE ensembles verifiers without labeled data, controlling their conditional dependencies to improve spectral ensembling algorithms; it matches or exceeds semi-supervised baselines on benchmarks including GPQA Diamond and Humanity's Last Exam.
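The power-distribution entry above rests on a standard KL-regularized RL identity. As a sketch, assuming the self-reward is the model's own log-likelihood, r(y) = log p(y), with regularization strength β (my notation, not necessarily the paper's):

```latex
\max_{q} \; \mathbb{E}_{y \sim q}[r(y)] - \beta \, \mathrm{KL}(q \,\|\, p)
\quad\Longrightarrow\quad
q^{*}(y) \propto p(y)\, e^{r(y)/\beta} = p(y)^{\,1 + 1/\beta},
```

i.e., the optimizer is a power of the base distribution, consistent with the summary's claim that the power distribution is the closed-form solution to self-reward KL-regularized RL.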
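The token-space attack entry above (TOMPA) describes black-box adversarial optimization over token ids. A minimal sketch under stand-in assumptions: the toy reward function, the vocabulary, and the plain random-mutation hill climbing below are all hypothetical illustrations, not the paper's optimizer or the real reward model.

```python
# Minimal sketch of black-box token-space adversarial search. The
# "reward model" is a toy function over token ids (it arbitrarily
# prefers even ids), queried only as a black box.
import random

VOCAB = list(range(1000))  # hypothetical token-id vocabulary

def toy_reward(tokens):
    """Stand-in reward model: fraction of even token ids (arbitrary)."""
    return sum(1 for t in tokens if t % 2 == 0) / len(tokens)

def hill_climb(reward, length=8, steps=200, seed=0):
    """Random-mutation hill climbing over a fixed-length token sequence."""
    rng = random.Random(seed)
    best = [rng.choice(VOCAB) for _ in range(length)]
    best_score = reward(best)
    for _ in range(steps):
        cand = best[:]
        cand[rng.randrange(length)] = rng.choice(VOCAB)  # mutate one token
        score = reward(cand)
        if score > best_score:  # keep only strictly improving mutations
            best, best_score = cand, score
    return best, best_score

tokens, score = hill_climb(toy_reward)
```

The point of the sketch is that the sequences found this way need not be linguistic at all; the search only ever sees the reward signal, which is how non-linguistic high-reward patterns like those in the entry above can arise.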