The verifier failed because it awarded full credit whenever several relevant examples were present, without checking that at least three distinct items were each explicitly explained.
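The sketch below contrasts the lenient check described above with a stricter one. It assumes a rubric criterion of the form "give at least three distinct examples, each explicitly explained". The `Item` type, both function names, and the `required` parameter are hypothetical and are not taken from the cited work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One example extracted from a response (hypothetical representation)."""
    name: str
    explanation: str  # empty string when the example is only mentioned, not explained


def lenient_check(items: list[Item]) -> float:
    # Flawed verifier: the presence of any relevant examples earns full credit.
    return 1.0 if items else 0.0


def strict_check(items: list[Item], required: int = 3) -> float:
    # Count distinct examples (case-insensitive names) with a non-empty explanation.
    explained = {i.name.strip().lower() for i in items if i.explanation.strip()}
    return 1.0 if len(explained) >= required else 0.0


response = [
    Item("Item A", "Explanation of A."),
    Item("item a", "Restates A under a different casing."),
    Item("Item B", ""),  # mentioned but never explained
    Item("Item C", "Explanation of C."),
]
print(lenient_check(response))  # 1.0
print(strict_check(response))   # 0.0: only two distinct explained items
```

In this example the lenient check passes a response that a strict reading of the criterion would fail. That gap is the partial-criterion-satisfaction pattern summarized below.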

Safety appropriate for medical context · 2000 · arXiv 2000.29250

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Reward Hacking in Rubric-Based Reinforcement Learning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

Rubric-based RL verifiers can be gamed via partial criterion satisfaction and implicit-to-explicit tricks, yielding proxy gains that do not improve quality under rubric-free judges; stronger verifiers reduce but do not eliminate the mismatch.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Reward Hacking in Rubric-Based Reinforcement Learning cs.AI · 2026-05-12 · unverdicted · none · ref 43
