3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Roles: baseline (1)
Polarities: baseline (1)
citing papers explorer
- How Hard is it to Rig a Benchmark? A Social Choice Analysis of Leaderboard Robustness
  Benchmark-specific training maps to shift bribery and is NP-hard under Borda and mean win rate; mean win rate has the highest instance-level robustness (median 22 tasks on BBH) among tested aggregation rules.
- Nash without Numbers: A Social Choice Approach to Mixed Equilibria in Context-Ordinal Games
  Context-ordinal Nash equilibria are defined via social choice aggregation of ordinal preferences, shown to exist under mild conditions, with regularization, approximation, regret notions, complexity results, and learning rules developed.
- Who Defines "Best"? Towards Interactive, User-Defined Evaluation of LLM Leaderboards
  Analysis of the LMArena dataset reveals heavy topic skew and varying model rankings, leading to an interactive visualization tool for users to define custom evaluation priorities on LLM leaderboards.