Aggregating Elo Ratings: An Axiomatization
Pith reviewed 2026-05-12 01:56 UTC · model grok-4.3
The pith
The only way to reduce a vector of Elo ratings to a single Elo-scale rating that meets three consistency conditions is to convert each to its underlying strength, take the weighted arithmetic mean, and convert back.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The unique rating rule satisfying same-scale normalization, recursive consistency, and marginal Elo-strength consistency converts each component rating to its Elo strength, takes the weighted arithmetic mean of those strengths, and converts the result back to the Elo scale.
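The rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's own code: it assumes the standard Elo strength map q(r) = 10^(r/400), consistent with the Theorem 1 statement quoted in the Lean section of this page, and the function names `elo_strength`, `strength_to_elo`, and `aggregate_elo` are hypothetical.

```python
import math
from typing import Sequence


def elo_strength(rating: float) -> float:
    """Map an Elo rating to its underlying strength, assuming q(r) = 10**(r/400)."""
    return 10.0 ** (rating / 400.0)


def strength_to_elo(strength: float) -> float:
    """Inverse of elo_strength: map a positive strength back to the Elo scale."""
    return 400.0 * math.log10(strength)


def aggregate_elo(ratings: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean of strengths, converted back to an Elo rating."""
    if len(ratings) != len(weights) or len(ratings) == 0:
        raise ValueError("ratings and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    mean_strength = sum(
        w * elo_strength(r) for r, w in zip(ratings, weights)
    ) / total_weight
    return strength_to_elo(mean_strength)
```

Because the weights are divided by their sum, only their relative sizes matter in this sketch; a uniform profile such as `aggregate_elo([1500, 1500, 1500], [1, 2, 3])` returns 1500 up to floating-point error, which is the same-scale normalization condition.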
What carries the argument
The marginal Elo-strength consistency condition, which equates the ratio of marginal contributions to the combined rating (for two equally weighted coordinates) with the ordinary Elo odds ratio between the two inputs.
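To see why the strength-averaging rule meets this condition, one can differentiate it directly. The derivation below is a sketch that assumes the closed form $C_{(1,1)}(x,y) = 400\log_{10}\big((10^{x/400} + 10^{y/400})/2\big)$ for two equally weighted ratings, as in the Theorem 1 statement quoted in the Lean section of this page.

```latex
\begin{align*}
\frac{\partial C_{(1,1)}}{\partial x}(x,y)
  &= \frac{400}{\ln 10}\cdot
     \frac{\frac{\ln 10}{400}\,10^{x/400}}{10^{x/400} + 10^{y/400}}
   = \frac{10^{x/400}}{10^{x/400} + 10^{y/400}}, \\
\frac{\partial C_{(1,1)}}{\partial y}(x,y)
  &= \frac{10^{y/400}}{10^{x/400} + 10^{y/400}}, \\
\frac{\partial_x C_{(1,1)}(x,y)}{\partial_y C_{(1,1)}(x,y)}
  &= 10^{(x-y)/400}.
\end{align*}
```

Under this closed form, each partial derivative coincides with the usual Elo expected score of one input against the other, so the two marginal contributions sum to one.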
If this is right
- The resulting rule is distinct from both direct arithmetic averaging of the rating numbers and from treating the combined rating as the outcome of a random-format lottery.
- Each of the three axioms is independent of the others, so dropping any one allows additional aggregation rules.
- The rule supplies a concrete method for combining classical, rapid, and blitz chess ratings into a single overall rating.
Where Pith is reading between the lines
- The same conversion-to-strength step could be applied to other rating systems that share the logistic or exponential relationship between rating difference and win probability.
- Platforms could adopt the rule to guarantee that the order in which sub-ratings are grouped does not alter the final scalar rating.
- The construction treats Elo ratings as logarithmic measures of strength, suggesting analogous aggregation procedures whenever performance is scored on an odds-ratio scale.
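The grouping-invariance point can be checked numerically. The sketch below assumes the strength map 10^(r/400) and uses hypothetical ratings and weights chosen only for illustration; `aggregate_elo` is an illustrative name, not something defined by the paper.

```python
import math


def aggregate_elo(ratings, weights):
    """Weighted arithmetic mean of Elo strengths 10**(r/400), mapped back to Elo."""
    total = sum(weights)
    mean_strength = sum(
        w * 10.0 ** (r / 400.0) for r, w in zip(ratings, weights)
    ) / total
    return 400.0 * math.log10(mean_strength)


# Hypothetical ratings and weights, chosen only for illustration.
classical, rapid, blitz = 2700.0, 2600.0, 2500.0
w_classical, w_rapid, w_blitz = 3.0, 2.0, 1.0

# Aggregate all three formats in one step.
direct = aggregate_elo([classical, rapid, blitz], [w_classical, w_rapid, w_blitz])

# Aggregate the two faster formats first; the block carries their total weight.
fast_block = aggregate_elo([rapid, blitz], [w_rapid, w_blitz])
blockwise = aggregate_elo([classical, fast_block], [w_classical, w_rapid + w_blitz])

print(f"direct:    {direct:.6f}")
print(f"blockwise: {blockwise:.6f}")
```

The two printed values should agree up to floating-point error, because the block's strength times its total weight equals the weighted sum of its members' strengths. Plain weighted averaging of the rating numbers passes this particular check as well, so recursive consistency alone does not single out the strength-based rule.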
Load-bearing premise
The marginal Elo-strength consistency condition requires that for any two equally weighted ratings the ratio of their marginal contributions to the combined rating equals the standard Elo odds between them.
What would settle it
A concrete counter-example consisting of two or three specific rating values and weights for which either the strength-averaging rule violates one of the three axioms or some other aggregation rule satisfies all three axioms yet produces a numerically different combined rating.
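A search for such a counter-example can be approximated numerically. The sketch below tests the marginal Elo-strength condition for two candidate rules using central finite differences. The function names, the sampling range, the step size `h`, and the tolerance are all assumptions made for illustration. A search of this kind can only fail to find a violation; it cannot prove that an axiom holds.

```python
import math
import random


def strength_mean(ratings, weights):
    """Candidate rule: weighted mean of strengths 10**(r/400), mapped back to Elo."""
    total = sum(weights)
    mean_strength = sum(
        w * 10.0 ** (r / 400.0) for r, w in zip(ratings, weights)
    ) / total
    return 400.0 * math.log10(mean_strength)


def arithmetic_mean(ratings, weights):
    """Competing rule: weighted mean taken directly on the rating scale."""
    return sum(w * r for r, w in zip(ratings, weights)) / sum(weights)


def marginal_ratio(rule, x, y, h=0.01):
    """Ratio of finite-difference partials of rule at (x, y) with equal weights."""
    dx = (rule([x + h, y], [1, 1]) - rule([x - h, y], [1, 1])) / (2 * h)
    dy = (rule([x, y + h], [1, 1]) - rule([x, y - h], [1, 1])) / (2 * h)
    return dx / dy


def find_violation(rule, trials=200, seed=0, rel_tol=1e-4):
    """Return a sampled (x, y) where the marginal ratio misses the Elo odds, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(2000.0, 2800.0)
        y = rng.uniform(2000.0, 2800.0)
        elo_odds = 10.0 ** ((x - y) / 400.0)
        if abs(marginal_ratio(rule, x, y) - elo_odds) > rel_tol * elo_odds:
            return (x, y)
    return None


print("strength mean violation:  ", find_violation(strength_mean))
print("arithmetic mean violation:", find_violation(arithmetic_mean))
```

Under these assumptions the arithmetic mean fails as soon as the two sampled ratings differ noticeably, since its marginal contributions are always equal. The strength mean should pass every sampled pair within the tolerance.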
Original abstract
Many environments assign several Elo ratings to the same agent: a chess player has classical, rapid, and blitz ratings; an online platform may rate by time control, mode, or format; an evaluator may rate performance across tasks or roles. This paper axiomatizes when such a vector of ratings can be reduced to a single scalar rating that is itself on the Elo scale. We impose three substantive conditions: same-scale normalization (a uniform profile keeps its rating), recursive consistency (aggregating in blocks gives the same answer as aggregating directly, provided each block carries the total weight of its members), and marginal Elo-strength consistency (for two equally weighted coordinates, the ratio of marginal contributions to the combined rating equals the ordinary Elo odds). The unique rating rule satisfying these conditions converts each component to its Elo strength, takes a weighted arithmetic mean of strengths, and converts back. We show how this rule differs from a random-format lottery and from rating-scale averaging, prove the axioms are independent, and illustrate the rule on combining classical, rapid, and blitz ratings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript axiomatizes the aggregation of multiple Elo ratings for the same agent into a single scalar rating that remains on the Elo scale. It imposes three axioms: same-scale normalization (a uniform profile of ratings is preserved), recursive consistency (block-wise aggregation with preserved total weights equals direct aggregation), and marginal Elo-strength consistency (for any two equally weighted coordinates, the ratio of their marginal contributions to the aggregate equals the standard Elo odds ratio). The paper proves that the unique rule satisfying these axioms maps each rating to its underlying strength via the Elo transformation, computes the weighted arithmetic mean of the strengths, and maps the result back to the Elo scale. It further establishes independence of the three axioms and shows that the rule differs from both a random-format lottery and direct averaging on the rating scale, with an illustration on combining classical, rapid, and blitz chess ratings.
Significance. If the uniqueness result holds, the paper supplies a clean, parameter-free aggregation rule derived directly from three externally motivated axioms without circularity or ad-hoc fitting. This is useful for chess federations and online platforms that maintain multiple ratings per agent. The independence proofs and explicit differentiation from intuitive alternatives (lottery, scale averaging) add robustness. The approach aligns with standard functional-equation techniques in economic theory and yields a falsifiable prediction for how aggregated ratings should behave under the stated conditions.
minor comments (2)
- The manuscript would benefit from an explicit statement of the Elo strength transformation function (including the base-10 logarithm and scaling constant) at the first point of use, rather than assuming familiarity.
- A short numerical example with concrete Elo values (e.g., 2700 classical, 2600 rapid) showing the numerical difference from scale averaging would strengthen the illustration section.
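Both comments can be illustrated with a short worked calculation. The sketch below assumes the strength map $q(r) = 10^{r/400}$ from the Theorem 1 statement quoted in the Lean section of this page, takes the two rating values from the second comment, and assumes equal weights, which the comment does not specify. The numerical values are rounded.

```latex
\begin{align*}
q(r) &= 10^{r/400}, \qquad q^{-1}(s) = 400\,\log_{10} s, \\
C_{(1,1)}(2700, 2600)
  &= 400\,\log_{10}\!\left(\frac{10^{2700/400} + 10^{2600/400}}{2}\right) \\
  &= 2600 + 400\,\log_{10}\!\left(\frac{1 + 10^{0.25}}{2}\right)
   \approx 2600 + 400\,\log_{10}(1.3891) \approx 2657.1, \\
\text{arithmetic mean} &= \frac{2700 + 2600}{2} = 2650.
\end{align*}
```

Under these assumptions the strength-based rating sits roughly 7 points above the arithmetic mean. The gap has this sign for any non-uniform profile because $q$ is convex.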
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our manuscript, which correctly identifies the three axioms and the uniqueness result for the Elo-strength aggregation rule. We appreciate the recognition of its potential usefulness for chess federations and rating platforms, as well as the note on axiom independence and differentiation from alternatives. Since the report contains no specific major comments or requested changes, we have no point-by-point revisions to address at this stage.
Circularity Check
No significant circularity
The paper presents a standard axiomatic characterization: three externally motivated conditions (same-scale normalization, recursive consistency, and marginal Elo-strength consistency) are stated as independent primitives, shown to be independent, and used to derive a unique functional form (Elo-to-strength conversion, weighted arithmetic mean on the strength scale, and return to Elo). No step reduces a derived quantity to a fitted parameter, renames an input, or relies on a self-citation chain; the marginal consistency axiom directly encodes the target odds ratio without presupposing the final aggregator. The derivation is therefore self-contained against the stated axioms and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption: same-scale normalization (a uniform profile keeps its rating)
- domain assumption: recursive consistency (aggregating in blocks gives the same answer as aggregating directly, provided each block carries the total weight of its members)
- domain assumption: marginal Elo-strength consistency (for two equally weighted coordinates, the ratio of marginal contributions to the combined rating equals the ordinary Elo odds)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (J uniqueness) [unclear]
  Theorem 1: $C_\lambda(R) = 400 \log_{10}\!\left( \frac{\sum_i \lambda_i \, 10^{R_i/400}}{\sum_i \lambda_i} \right)$; equivalently, $q(C_\lambda(R))$ is the weighted arithmetic mean of the $q(R_i)$, with $q(r) = 10^{r/400}$.
- IndisputableMonolith/Foundation/BranchSelection.lean: branch_selection (coupling combiner forces bilinear J branch) [unclear]
  Assumption 4 (marginal Elo-strength consistency): $\partial_x C_{(1,1)}(x,y) \,/\, \partial_y C_{(1,1)}(x,y) = 10^{(x-y)/400}$