ComplexConstraints and Beyond: Expert Rubrics for RLVR

2026 · cs.AI · arXiv 2606.09118

1 Pith paper cites this work.



abstract

As LLM capabilities advance rapidly, the evaluation methods used to assess them increasingly lag behind. Traditional benchmarks relied on programmatic verification of narrow, surface-level constraints, but real-world instruction following and agentic tasks demand assessment of nuanced, context-dependent behaviors that resist simple scripted checks. We present a systematic analysis of expert-curated rubric-based evaluation as an alternative paradigm, drawing on empirical evidence from two domains: complex instruction following and enterprise agentic tasks. We first articulate five design principles for constructing high-quality rubrics, including Maximum Viable Atomicity, intent-aware criterion design, and iterative LLM-judge calibration. To validate these principles, we introduce ComplexConstraints, a new expert-curated instruction-following dataset in which each prompt is paired with 10-40 atomic rubric criteria. We demonstrate that these expert rubrics are not only better evaluation instruments but also highly effective training signals: training on approximately 1,000 ComplexConstraints examples yields a +15.5% improvement in instruction following for a 4B-parameter model and +12.2% for a 235B-parameter model, while single-epoch RL training on a rubric-graded enterprise environment produces gains that transfer to out-of-distribution benchmarks the model was never trained on (+4.5% BFCL, +7.4% Tau2-Bench, +6.8% Tool-Decathlon). Our findings establish that expert-authored rubrics improve both the measurement and the development of frontier LLM capabilities, serving as effective evaluation and RL training signals.
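The abstract describes prompts paired with 10-40 atomic rubric criteria, each checked by an LLM judge, with the resulting grades used as an RL reward. It does not say how per-criterion verdicts are combined into a scalar reward, so the sketch below assumes one simple choice: the weighted fraction of satisfied criteria. The names `Criterion`, `rubric_reward`, and `toy_judge` are hypothetical, and `toy_judge` is a rule-based stand-in for the calibrated LLM judge the paper describes, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Criterion:
    """One atomic rubric item (hypothetical structure, not from the paper)."""
    description: str
    weight: float = 1.0


# A judge returns True if the response satisfies the criterion.
# In the paper's setting this would be an LLM judge; here it is any callable.
Judge = Callable[[str, Criterion], bool]


def rubric_reward(response: str, rubric: Sequence[Criterion], judge: Judge) -> float:
    """Assumed aggregation: weighted fraction of satisfied criteria, in [0, 1]."""
    if not rubric:
        raise ValueError("rubric must contain at least one criterion")
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("criterion weights must sum to a positive value")
    earned = sum(c.weight for c in rubric if judge(response, c))
    return earned / total


# Toy rubric with three atomic criteria; the last one is weighted double.
rubric = [
    Criterion("response is under 50 words"),
    Criterion("response mentions Paris"),
    Criterion("response ends with a question", weight=2.0),
]

# Hypothetical rule-based checks standing in for LLM-judge verdicts.
toy_checks = {
    "response is under 50 words": lambda r: len(r.split()) < 50,
    "response mentions Paris": lambda r: "paris" in r.lower(),
    "response ends with a question": lambda r: r.rstrip().endswith("?"),
}


def toy_judge(response: str, criterion: Criterion) -> bool:
    return toy_checks[criterion.description](response)


if __name__ == "__main__":
    # Satisfies the first two criteria (weight 2 of 4), so the reward is 0.5.
    print(rubric_reward("The capital of France is Paris.", rubric, toy_judge))
```

Keeping each criterion atomic means a partially correct response earns partial credit rather than a single pass/fail bit, which is one plausible reason rubric grades give a denser RL signal than holistic judgments.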

representative citing papers

From Holistic Evaluation to Structured Criteria: Rubrics Across the Evolving LLM Landscape

cs.CL · 2026-06-07 · unverdicted · novelty 3.0

The paper frames rubrics as a recurring structured-criteria approach that decomposes holistic judgments at evaluative, training, and intrinsic levels in LLM research.
