Strategic candidacy in generative AI arenas. arXiv preprint arXiv:2603.26891, 2026.
1 Pith paper cites this work (field: cs.LG; year: 2026).
Representative citing paper:
Welfare, Improvability, and Variance: A Principal-Agent Approach to Optimal Benchmark Item Aggregation
Models benchmarking as a principal-agent game, derives welfare loss from welfare alignment, improvability, and variance, and applies an audit framework to OLMES items.