QUBRIC co-designs queries and rubrics via teacher key points, contrastive generation, and learnability filtering to support GRPO training, yielding +5.5 on ArenaHard and +6.3 average transfer to legal/moral/narrative benchmarks.
arXiv preprint arXiv:2601.18533 , year=
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
QUBRIC: Co-Designing Queries and Rubrics for RL Beyond Verifiable Rewards
QUBRIC co-designs queries and rubrics via teacher key points, contrastive generation, and learnability filtering to support GRPO training, yielding +5.5 on ArenaHard and +6.3 average transfer to legal/moral/narrative benchmarks.