Nash without Numbers: A Social Choice Approach to Mixed Equilibria in Context-Ordinal Games
Pith reviewed 2026-05-11 02:34 UTC · model grok-4.3
The pith
Nash equilibrium can be defined and computed using only ordinal action rankings aggregated via social choice rules.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A mixed strategy profile is a context-ordinal Nash equilibrium when each player's mixed strategy is a best response under a social choice aggregation of their ordinal preference ranking over pure actions, given the opponents' mixed strategies. Such equilibria exist whenever the aggregation satisfies standard mild conditions drawn from social choice theory. The authors also define regularized and approximate versions together with regret measures and give learning dynamics that converge to these equilibria.
What carries the argument
Context-ordinal best response formed by applying a social choice aggregation function to a player's ordinal ranking of actions given opponents' joint mixed strategy.
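As an illustration of this mechanism, here is a minimal sketch of an aggregated best response. It assumes a Copeland-style rule (pairwise majority contests, with each context weighted by its probability under the opponents' mixed strategy). The summary above does not say which aggregator the authors use, so the rule, the function name `copeland_best_response`, and the toy rankings are all hypothetical.

```python
def copeland_best_response(rankings, context_probs, tol=1e-12):
    """Return the actions with the highest probability-weighted Copeland score.

    rankings: dict mapping each opponent context to the player's strict
        ranking of their own actions, most preferred first.
    context_probs: dict mapping each context to its probability under the
        opponents' joint mixed strategy.
    """
    actions = list(next(iter(rankings.values())))
    # Position lookup: a smaller index means more preferred in that context.
    pos = {ctx: {a: i for i, a in enumerate(r)} for ctx, r in rankings.items()}

    scores = {a: 0.0 for a in actions}
    for a in actions:
        for b in actions:
            if a == b:
                continue
            # Probability mass of contexts ranking a above b, and b above a.
            mass_a = sum(p for ctx, p in context_probs.items()
                         if pos[ctx][a] < pos[ctx][b])
            mass_b = sum(p for ctx, p in context_probs.items()
                         if pos[ctx][b] < pos[ctx][a])
            if mass_a > mass_b + tol:
                scores[a] += 1.0   # a wins the weighted pairwise contest
            elif abs(mass_a - mass_b) <= tol:
                scores[a] += 0.5   # tie: half a point
    best = max(scores.values())
    return {a for a, s in scores.items() if s == best}


# Hypothetical player: prefers U when the opponent plays L, D when they play R.
rankings = {"L": ["U", "D"], "R": ["D", "U"]}
print(copeland_best_response(rankings, {"L": 0.7, "R": 0.3}))  # {'U'}
print(copeland_best_response(rankings, {"L": 0.5, "R": 0.5}))  # both actions tie
```

No numerical payoff appears anywhere: the only inputs are rankings and the opponents' mixing probabilities.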
If this is right
- Equilibria can be identified from elicited rankings without any numerical utility elicitation.
- Regularized and approximate equilibria remain well-defined and computable when rankings contain noise or omissions.
- Learning rules based on the aggregated best-response condition can be run directly on ordinal feedback.
- Complexity results for simple games bound the cost of computing exact or approximate equilibria under common aggregation methods.
Where Pith is reading between the lines
- The framework reduces the cost of running equilibrium-based experiments with human subjects by replacing numerical payoff entry with simple rankings.
- It supplies a direct route for applying equilibrium analysis to multi-agent systems where only qualitative preference data are observed.
- Extensions to incomplete or cyclic rankings would immediately enlarge the set of games to which the method applies.
Load-bearing premise
Each player can supply a consistent ordinal ranking of their own actions for any fixed joint action of the others, and the chosen aggregation rule satisfies conditions that guarantee an equilibrium exists.
What would settle it
A concrete finite game in which every player supplies transitive ordinal rankings over actions in every context yet no mixed strategy profile satisfies the aggregated best-response condition for the chosen aggregation rule.
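The search for such a game can be sketched as a brute-force check. The example below is a minimal sketch under strong assumptions: two players, two actions each, a weighted-majority aggregator (what a Copeland-style rule reduces to with two actions), and a finite grid of mixed strategies. All function names are hypothetical, and the game is an ordinal version of matching pennies chosen for illustration.

```python
from fractions import Fraction


def best_response_set(top_choice, opp_prob_action0):
    """Weighted-majority best response in a two-action game.

    top_choice[c] is the player's most preferred own action (0 or 1) when the
    opponent plays pure action c; opp_prob_action0 is the probability that the
    opponent plays action 0.
    """
    weights = {0: opp_prob_action0, 1: 1 - opp_prob_action0}
    mass0 = sum(w for c, w in weights.items() if top_choice[c] == 0)
    if mass0 > Fraction(1, 2):
        return {0}
    if mass0 < Fraction(1, 2):
        return {1}
    return {0, 1}


def support(prob_action0):
    """Pure actions played with positive probability."""
    return {a for a, w in ((0, prob_action0), (1, 1 - prob_action0)) if w > 0}


def find_equilibria(top1, top2, steps):
    """Grid profiles where each player's support lies in their best-response set."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    found = []
    for p in grid:        # p: probability player 1 plays action 0
        for q in grid:    # q: probability player 2 plays action 0
            ok1 = support(p) <= best_response_set(top1, q)
            ok2 = support(q) <= best_response_set(top2, p)
            if ok1 and ok2:
                found.append((p, q))
    return found


# Ordinal matching pennies: player 1 wants to match, player 2 to mismatch.
top1 = {0: 0, 1: 1}
top2 = {0: 1, 1: 0}
print(find_equilibria(top1, top2, steps=4))
```

On this grid the only profile that passes is the uniform mix for both players. An empty result on a grid would not by itself settle the question, since a coarse grid can miss equilibria (here, `steps=3` skips the 1/2 point); a genuine counterexample needs an argument covering every mixed profile.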
Original abstract
Nash equilibrium serves as a fundamental mathematical tool in economics and game theory. However, it classically assumes knowledge of player utilities, whereas economics generally regards preferences as more fundamental. To leverage equilibrium analysis in strategic scenarios, one must first elicit numerical utilities consistent with player preferences, a delicate and time-consuming process. In this work, we forgo precise utilities and generalize the Nash equilibrium to a setting where we only assume a player is capable of providing an ordinal ranking of their actions within the context of other players' joint actions. The key technical challenge is to rethink the definition of a best-response. While the classical definition identifies actions maximizing expected payoff, we naturally look towards social choice theory for how to aggregate preferences to identify the most preferred actions. We define this generalized notion of a context-ordinal Nash equilibrium, establish its existence under mild conditions on aggregation methods, introduce notions of regularization, approximation, and regret, explore complexity for simple settings, and develop learning rules for computing such equilibria. In doing so, we provide a generalization of Nash equilibrium and demonstrate its direct applicability to elicited preferences in human experiments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces context-ordinal Nash equilibria as a generalization of classical Nash equilibrium for games in which players supply only ordinal rankings of their pure actions conditional on other players' pure joint-action profiles. Best responses are redefined via social-choice aggregation of these rankings rather than expected-utility maximization; the manuscript claims existence of mixed-strategy equilibria under mild conditions on the aggregator, and develops accompanying notions of regularization, approximation, regret, complexity results for simple cases, and learning dynamics for computing the equilibria. The stated goal is to enable equilibrium analysis directly from elicited ordinal preferences without cardinal utility elicitation.
Significance. If the existence argument and mixed-strategy extension are made rigorous, the work would provide a parameter-free bridge between game theory and social choice that permits equilibrium analysis in domains where only ordinal data are available, such as human-subject experiments. The additional development of approximation, regret, and learning rules would further increase applicability.
major comments (1)
- [Definition of context-ordinal Nash equilibrium and existence theorem] The central existence claim for mixed equilibria rests on a best-response correspondence defined via aggregation of ordinal rankings supplied for pure contexts. No extension rule is supplied for how these rankings induce preferences (or aggregated choices) when the context is itself a mixed strategy profile, i.e., a lottery over pure joint actions. Without such a rule (e.g., via stochastic dominance, expected rank, or an explicit lottery extension), the best-response set is either empty or undefined for non-degenerate mixed profiles, rendering the fixed-point argument inapplicable.
minor comments (1)
- [Existence theorem] The abstract asserts existence 'under mild conditions on aggregation methods' but does not list the precise axioms; the main text should state them explicitly (e.g., neutrality, monotonicity, or continuity properties) at the point where the existence theorem is proved.
Simulated Author's Rebuttal
We thank the referee for the careful review and constructive feedback. The positive assessment of the work's potential significance is appreciated, and we agree that the technical point on extending the best-response definition to mixed profiles requires clarification to make the existence argument fully rigorous. We address the major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Definition of context-ordinal Nash equilibrium and existence theorem] The central existence claim for mixed equilibria rests on a best-response correspondence defined via aggregation of ordinal rankings supplied for pure contexts. No extension rule is supplied for how these rankings induce preferences (or aggregated choices) when the context is itself a mixed strategy profile, i.e., a lottery over pure joint actions. Without such a rule (e.g., via stochastic dominance, expected rank, or an explicit lottery extension), the best-response set is either empty or undefined for non-degenerate mixed profiles, rendering the fixed-point argument inapplicable.
  Authors: We agree that the manuscript as submitted does not explicitly supply a lottery extension rule for ordinal rankings when the context is a mixed strategy profile, which leaves the best-response correspondence formally undefined for non-pure profiles and weakens the fixed-point argument. In the revision we will add a new subsection that defines such an extension. Specifically, we will extend each player's ordinal ranking over pure actions (conditional on a pure joint-action context) to a mixed context by applying the aggregator to the vector of expected ranks induced by the lottery; this preserves the aggregator's mild conditions (e.g., continuity and monotonicity) and ensures the resulting best-response correspondence is non-empty, convex-valued, and upper hemicontinuous on the mixed-strategy simplex. We will then verify that Kakutani's fixed-point theorem applies directly, thereby establishing existence of context-ordinal Nash equilibria. This addition will be parameter-free and consistent with the social-choice aggregation approach already used for pure contexts.
  Revision: yes
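The expected-rank extension described in the response can be sketched in a few lines. This is a minimal sketch, assuming the aggregator simply selects the actions with the lowest expected rank; the rebuttal does not specify the aggregator, and the function names and the toy rank table are hypothetical. Exact fractions are used so that ties are detected without a floating-point tolerance.

```python
from fractions import Fraction


def expected_ranks(rank_table, context_probs):
    """Expected rank of each action under a lottery over pure contexts.

    rank_table[ctx][action] is the rank position (1 = most preferred) the
    player assigns to `action` in pure context `ctx`.
    """
    actions = list(next(iter(rank_table.values())))
    return {
        a: sum(p * rank_table[ctx][a] for ctx, p in context_probs.items())
        for a in actions
    }


def expected_rank_best_responses(rank_table, context_probs):
    """Assumed aggregator: keep every action whose expected rank is minimal."""
    ranks = expected_ranks(rank_table, context_probs)
    best = min(ranks.values())
    return {a for a, r in ranks.items() if r == best}


# Hypothetical player with three actions and two opponent contexts.
rank_table = {
    "x": {"A": 1, "B": 2, "C": 3},
    "y": {"A": 3, "B": 1, "C": 2},
}
even = {"x": Fraction(1, 2), "y": Fraction(1, 2)}
skewed = {"x": Fraction(2, 3), "y": Fraction(1, 3)}

print(expected_ranks(rank_table, even))
print(expected_rank_best_responses(rank_table, even))    # {'B'}
print(expected_rank_best_responses(rank_table, skewed))  # A and B tie
```

Expected rank is linear in the context probabilities, which is the property a continuity argument of the kind the authors outline would rely on. Whether their chosen aggregator preserves it is for the revised manuscript to show.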
Circularity Check
No circularity: context-ordinal Nash equilibrium defined from external social choice axioms
The paper introduces a new equilibrium concept by replacing expected-utility best responses with social-choice aggregation of ordinal rankings supplied over pure action profiles. Existence is asserted under 'mild conditions on aggregation methods,' which are treated as independent inputs drawn from social choice theory rather than derived or fitted within the paper. No equations reduce a claimed prediction or existence result to a parameter that was itself calibrated on the target quantity; no self-citation chain is invoked to justify the core fixed-point argument; and the construction does not rename or smuggle in a known result under new coordinates. The derivation therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Aggregation methods satisfy mild conditions sufficient for existence of context-ordinal Nash equilibrium
- domain assumption: Players can provide ordinal rankings of actions given others' joint actions