The paper identifies confounds in RLVR evaluations that inflate apparent gains and proposes a minimum standard for budget-matched, contamination-aware assessment with calibration tracking.
Mitigating many-shot jailbreaking
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
Dual-space semantic-character mutations on prompts achieve higher misuse success rates against DeepSeek than single-space attacks alone.
citing papers explorer
-
Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards
The paper identifies confounds in RLVR evaluations that inflate apparent gains and proposes a minimum standard for budget-matched, contamination-aware assessment with calibration tracking.
-
DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection
Dual-space semantic-character mutations on prompts achieve higher misuse success rates against DeepSeek than single-space attacks alone.