arxiv: 2605.08989 · v1 · submitted 2026-05-09 · 💰 econ.TH · math.OC

Aggregating Elo Ratings: An Axiomatization

Mehmet Mars Seven

Pith reviewed 2026-05-12 01:56 UTC · model grok-4.3

keywords: Elo ratings, rating aggregation, axiomatic characterization, consistency conditions, chess ratings, strength averaging, recursive aggregation, marginal consistency

The pith

The only way to reduce a vector of Elo ratings to a single Elo-scale rating that meets three consistency conditions is to convert each to its underlying strength, take the weighted arithmetic mean, and convert back.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to characterize a principled method for collapsing several Elo ratings that describe the same agent into one overall rating that remains on the Elo scale. It requires that any such rule obey same-scale normalization (identical inputs produce the identical output rating), recursive consistency (aggregating ratings in successive blocks with preserved total weights yields the same result as aggregating all at once), and marginal Elo-strength consistency (the ratio of marginal effects on the combined rating for two equally weighted inputs equals the standard Elo odds ratio between them). Under these requirements the unique rule is the one that maps each rating to its Elo strength, computes the weighted arithmetic average of those strengths, and maps the average back to the Elo scale. This matters for chess players with classical, rapid, and blitz ratings, for online platforms that rate by time control or format, and for any evaluator that scores performance across multiple tasks or roles.

Core claim

The unique rating rule satisfying same-scale normalization, recursive consistency, and marginal Elo-strength consistency converts each component rating to its Elo strength, takes the weighted arithmetic mean of those strengths, and converts the result back to the Elo scale.
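
The three steps of the rule can be sketched in a few lines. This is a minimal illustration, not the author's code: it uses the strength map q(r) = 10^(r/400) quoted in the Theorem 1 link further down this page, and the function names, the positive-weight check, and the sample ratings are assumptions made here.

```python
import math

ELO_SCALE = 400.0  # scaling constant in q(r) = 10**(r / 400)


def strength(rating):
    """Map an Elo rating to its strength, q(r) = 10**(r / 400)."""
    return 10.0 ** (rating / ELO_SCALE)


def rating_from_strength(q):
    """Inverse map: the Elo rating whose strength is q."""
    return ELO_SCALE * math.log10(q)


def aggregate_elo(ratings, weights):
    """Weighted arithmetic mean on the strength scale, mapped back to Elo."""
    if not ratings or len(ratings) != len(weights):
        raise ValueError("ratings and weights must be non-empty and of equal length")
    if any(w <= 0 for w in weights):
        # Assumption of this sketch: every weight is strictly positive.
        raise ValueError("weights must be positive")
    total = sum(weights)
    mean_strength = sum(w * strength(r) for r, w in zip(ratings, weights)) / total
    return rating_from_strength(mean_strength)


# Hypothetical profile: classical, rapid, blitz with equal weights.
combined = aggregate_elo([2700.0, 2600.0, 2500.0], [1.0, 1.0, 1.0])
# A uniform profile should keep its rating (same-scale normalization).
uniform = aggregate_elo([2500.0, 2500.0, 2500.0], [1.0, 2.0, 3.0])
```

Because the strength map is convex, the combined rating of a non-uniform profile lands above the plain weighted average of the rating numbers.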

What carries the argument

The marginal Elo-strength consistency condition, which equates the ratio of marginal contributions to the combined rating (for two equally weighted coordinates) with the ordinary Elo odds ratio between the two inputs.

If this is right

The resulting rule is distinct from both direct arithmetic averaging of the rating numbers and from treating the combined rating as the outcome of a random-format lottery.
Each of the three axioms is independent of the others, so dropping any one allows additional aggregation rules.
The rule supplies a concrete method for combining classical, rapid, and blitz chess ratings into a single overall rating.
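
The contrast drawn in the points above can be made concrete with a small comparison. This is a sketch under stated assumptions: the three ratings and equal weights are hypothetical, and the "lottery" here is one possible reading (a format is drawn with probability proportional to its weight and the game is scored with that format's rating), which may not match the paper's formal definition.

```python
import math


def strength(r):
    return 10.0 ** (r / 400.0)


def strength_average(ratings, weights):
    """Strength-scale weighted mean, mapped back to the Elo scale."""
    q_bar = sum(w * strength(r) for r, w in zip(ratings, weights)) / sum(weights)
    return 400.0 * math.log10(q_bar)


def scale_average(ratings, weights):
    """Plain weighted mean of the rating numbers."""
    return sum(w * r for r, w in zip(ratings, weights)) / sum(weights)


def win_prob(r, opponent):
    """Standard Elo expected score of rating r against `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - r) / 400.0))


def lottery_implied_rating(ratings, weights, opponent):
    """Rating that reproduces the lottery's average win probability
    against one fixed opponent (assumed reading of the lottery)."""
    p = sum(w * win_prob(r, opponent) for r, w in zip(ratings, weights)) / sum(weights)
    return opponent + 400.0 * math.log10(p / (1.0 - p))


ratings = [2700.0, 2600.0, 2500.0]  # hypothetical classical, rapid, blitz
weights = [1.0, 1.0, 1.0]

by_strength = strength_average(ratings, weights)
by_scale = scale_average(ratings, weights)
by_lottery = {
    opp: lottery_implied_rating(ratings, weights, opp)
    for opp in (2000.0, 2600.0, 3200.0)
}
```

Under these assumptions the lottery-implied rating moves with the opponent's rating, so it does not define a single scalar, while the strength average is fixed and sits above the scale average of 2600.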

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same conversion-to-strength step could be applied to other rating systems that share the logistic or exponential relationship between rating difference and win probability.
Platforms could adopt the rule to guarantee that the order in which sub-ratings are grouped does not alter the final scalar rating.
The construction treats Elo ratings as logarithmic measures of strength, suggesting analogous aggregation procedures whenever performance is scored on an odds-ratio scale.
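
The grouping-invariance point can be checked numerically for the strength-averaging rule. The ratings and weights below are hypothetical, and the helper name is an assumption of this sketch; the only ingredient taken from the page is that each block carries the total weight of its members.

```python
import math


def aggregate_elo(ratings, weights):
    """Strength-scale weighted mean, mapped back to the Elo scale."""
    q_bar = sum(w * 10.0 ** (r / 400.0) for r, w in zip(ratings, weights)) / sum(weights)
    return 400.0 * math.log10(q_bar)


ratings = [2700.0, 2600.0, 2500.0]  # hypothetical classical, rapid, blitz
weights = [3.0, 2.0, 1.0]           # hypothetical weights

# Aggregate all three at once.
direct = aggregate_elo(ratings, weights)

# Group {rapid, blitz} first; the block carries weight 2 + 1 = 3.
fast_block = aggregate_elo(ratings[1:], weights[1:])
two_stage = aggregate_elo([ratings[0], fast_block], [weights[0], sum(weights[1:])])

# Group {classical, rapid} first instead; the block carries weight 3 + 2 = 5.
slow_block = aggregate_elo(ratings[:2], weights[:2])
other_grouping = aggregate_elo([slow_block, ratings[2]], [sum(weights[:2]), weights[2]])
```

All three routes agree up to floating-point rounding, which is what recursive consistency asks for.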

Load-bearing premise

The marginal Elo-strength consistency condition requires that for any two equally weighted ratings the ratio of their marginal contributions to the combined rating equals the standard Elo odds between them.
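
A finite-difference check shows what the condition demands of the strength-averaging rule. This is a numerical sketch, not a proof: the step size, function names, and sample ratings are assumptions made here.

```python
import math


def combine_equal(x, y):
    """Strength-averaging rule for two equally weighted ratings."""
    return 400.0 * math.log10((10.0 ** (x / 400.0) + 10.0 ** (y / 400.0)) / 2.0)


def marginal_ratio(x, y, h=1e-3):
    """Ratio of the two partial derivatives, estimated by central differences."""
    d_x = (combine_equal(x + h, y) - combine_equal(x - h, y)) / (2.0 * h)
    d_y = (combine_equal(x, y + h) - combine_equal(x, y - h)) / (2.0 * h)
    return d_x / d_y


def elo_odds(x, y):
    """Standard Elo odds of a player rated x against a player rated y."""
    return 10.0 ** ((x - y) / 400.0)


# Hypothetical pair: the two quantities should agree up to numerical error.
ratio = marginal_ratio(2700.0, 2600.0)
odds = elo_odds(2700.0, 2600.0)
```

For rating-scale averaging the same ratio is always 1, so that rule meets the condition only when the two ratings coincide.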

What would settle it

A concrete counter-example consisting of two or three specific rating values and weights for which either the strength-averaging rule violates one of the three axioms or some other aggregation rule satisfies all three axioms yet produces a numerically different combined rating.
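
A search for such a counter-example can be sketched as a randomized harness that tests any candidate rule against numerical versions of the three axioms. Everything here is an assumption of the sketch (sampling ranges, tolerances, three-coordinate profiles, and the finite-difference form of the marginal condition); a clean run is evidence, not a proof.

```python
import math
import random


def strength_mean(ratings, weights):
    """Candidate 1: strength-scale weighted mean, mapped back to Elo."""
    q_bar = sum(w * 10.0 ** (r / 400.0) for r, w in zip(ratings, weights)) / sum(weights)
    return 400.0 * math.log10(q_bar)


def rating_mean(ratings, weights):
    """Candidate 2: plain weighted mean of the rating numbers."""
    return sum(w * r for r, w in zip(ratings, weights)) / sum(weights)


def find_violations(rule, trials=200, tol=1e-6, seed=0):
    """Return the set of axioms that `rule` violates on random profiles."""
    rng = random.Random(seed)
    violated = set()
    for _ in range(trials):
        r = [rng.uniform(2200.0, 2800.0) for _ in range(3)]
        w = [rng.uniform(0.5, 2.0) for _ in range(3)]
        # Same-scale normalization: a uniform profile keeps its rating.
        if abs(rule([r[0]] * 3, w) - r[0]) > tol:
            violated.add("normalization")
        # Recursive consistency: block the last two coordinates first.
        block = rule(r[1:], w[1:])
        if abs(rule(r, w) - rule([r[0], block], [w[0], w[1] + w[2]])) > tol:
            violated.add("recursive")
        # Marginal consistency for equal weights, via central differences.
        x, y, h = r[0], r[1], 1e-2
        d_x = (rule([x + h, y], [1.0, 1.0]) - rule([x - h, y], [1.0, 1.0])) / (2.0 * h)
        d_y = (rule([x, y + h], [1.0, 1.0]) - rule([x, y - h], [1.0, 1.0])) / (2.0 * h)
        odds = 10.0 ** ((x - y) / 400.0)
        if abs(d_x / d_y - odds) > tol * max(1.0, odds):
            violated.add("marginal")
    return violated


violations_strength = find_violations(strength_mean)
violations_rating = find_violations(rating_mean)
```

In this sketch the strength-averaging rule passes every check, while rating-scale averaging passes normalization and recursive consistency but fails the marginal condition.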

Original abstract

Many environments assign several Elo ratings to the same agent: a chess player has classical, rapid, and blitz ratings; an online platform may rate by time control, mode, or format; an evaluator may rate performance across tasks or roles. This paper axiomatizes when such a vector of ratings can be reduced to a single scalar rating that is itself on the Elo scale. We impose three substantive conditions: same-scale normalization (a uniform profile keeps its rating), recursive consistency (aggregating in blocks gives the same answer as aggregating directly, provided each block carries the total weight of its members), and marginal Elo-strength consistency (for two equally weighted coordinates, the ratio of marginal contributions to the combined rating equals the ordinary Elo odds). The unique rating rule satisfying these conditions converts each component to its Elo strength, takes a weighted arithmetic mean of strengths, and converts back. We show how this rule differs from a random-format lottery and from rating-scale averaging, prove the axioms are independent, and illustrate the rule on combining classical, rapid, and blitz ratings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper gives a clean axiomatic characterization of how to aggregate multiple Elo ratings into one, with uniqueness under three conditions.

This paper axiomatizes the aggregation of multiple Elo ratings into one scalar rating. The unique rule that satisfies the three conditions is to map each rating to its underlying strength, take a weighted average of those strengths, and map back to the Elo scale. The main result is that this is the only rule meeting same-scale normalization, recursive consistency across blocks, and marginal Elo-strength consistency for equal-weight pairs. It differs from direct scale averaging or a random-format lottery, and the axioms are shown to be independent. That is the core contribution, and it is new as a characterization in this setting.

The work does well in motivating the axioms from practical needs like preserving the rating scale and ensuring consistent aggregation no matter the order or grouping. The chess example with classical, rapid, and blitz ratings illustrates the rule without overclaiming. The derivation follows standard functional-equation steps and lands on the expected exponential-strength average, which aligns with how Elo is usually interpreted probabilistically.

The soft spots are limited. The marginal consistency axiom is quite specific and tailored to force the arithmetic mean on the strength scale, so its appeal depends on whether one finds that condition natural. Without seeing the full independence proofs and any regularity conditions used, there is some uncertainty about edge cases, but nothing in the abstract suggests a gap. The paper stays theoretical and does not test the rule on large datasets, which is fine for an axiomatization but leaves empirical fit open.

This is for researchers working on rating systems, sports analytics, or axiomatic decision theory. A reader who needs a justified way to combine multi-format scores would find the uniqueness result useful. It deserves peer review because the axioms are clearly stated, the result is focused, and the independence claim adds value even if the domain is narrow. I would send it to referees rather than desk reject.

Referee Report

0 major / 2 minor

Summary. The manuscript axiomatizes the aggregation of multiple Elo ratings for the same agent into a single scalar rating that remains on the Elo scale. It imposes three axioms: same-scale normalization (a uniform profile of ratings is preserved), recursive consistency (block-wise aggregation with preserved total weights equals direct aggregation), and marginal Elo-strength consistency (for any two equally weighted coordinates, the ratio of their marginal contributions to the aggregate equals the standard Elo odds ratio). The paper proves that the unique rule satisfying these axioms maps each rating to its underlying strength via the Elo transformation, computes the weighted arithmetic mean of the strengths, and maps the result back to the Elo scale. It further establishes independence of the three axioms and shows that the rule differs from both a random-format lottery and direct averaging on the rating scale, with an illustration on combining classical, rapid, and blitz chess ratings.

Significance. If the uniqueness result holds, the paper supplies a clean, parameter-free aggregation rule derived directly from three externally motivated axioms without circularity or ad-hoc fitting. This is useful for chess federations and online platforms that maintain multiple ratings per agent. The independence proofs and explicit differentiation from intuitive alternatives (lottery, scale averaging) add robustness. The approach aligns with standard functional-equation techniques in economic theory and yields a falsifiable prediction for how aggregated ratings should behave under the stated conditions.

minor comments (2)

The manuscript would benefit from an explicit statement of the Elo strength transformation function (including the base-10 logarithm and scaling constant) at the first point of use, rather than assuming familiarity.
A short numerical example with concrete Elo values (e.g., 2700 classical, 2600 rapid) showing the numerical difference from scale averaging would strengthen the illustration section.
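
Both suggestions can be illustrated with a short worked computation. This is a sketch of what such an example could look like, not material from the manuscript: it uses the strength map q(r) = 10^(r/400) quoted in the Theorem 1 link on this page, the referee's two sample values, and equal weights, which are an assumption made here.

```latex
% Strength transformation: q(r) = 10^{r/400}.
q(2700) = 10^{6.75}, \qquad q(2600) = 10^{6.5}

% Equal weights (assumed); factor 10^{6.5} out of the mean:
C = 400 \log_{10}\!\left( \frac{10^{6.75} + 10^{6.5}}{2} \right)
  = 2600 + 400 \log_{10}\!\left( \frac{1 + 10^{0.25}}{2} \right)
  \approx 2600 + 400 \times 0.1427
  \approx 2657.1

% Rating-scale averaging, for comparison:
\frac{2700 + 2600}{2} = 2650
```

With these inputs the strength-averaging rule lands roughly 7 points above the plain average of the two rating numbers.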

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our manuscript, which correctly identifies the three axioms and the uniqueness result for the Elo-strength aggregation rule. We appreciate the recognition of its potential usefulness for chess federations and rating platforms, as well as the note on axiom independence and differentiation from alternatives. Since the report contains no specific major comments or requested changes, we have no point-by-point revisions to address at this stage.

Circularity Check

0 steps flagged

No significant circularity

The paper presents a standard axiomatic characterization: three externally motivated conditions (same-scale normalization, recursive consistency, and marginal Elo-strength consistency) are stated as independent primitives, shown to be independent, and used to derive a unique functional form (Elo-to-strength conversion, weighted arithmetic mean on the strength scale, and return to Elo). No step reduces a derived quantity to a fitted parameter, renames an input, or relies on a self-citation chain; the marginal consistency axiom directly encodes the target odds ratio without presupposing the final aggregator. The derivation is therefore self-contained against the stated axioms and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claim rests on the three substantive conditions stated in the abstract; no free parameters or invented entities are introduced.

axioms (3)

domain assumption: same-scale normalization (a uniform profile keeps its rating)
Imposed as one of the three substantive conditions.
domain assumption: recursive consistency (aggregating in blocks gives the same answer as aggregating directly, provided each block carries the total weight of its members)
Imposed as one of the three substantive conditions.
domain assumption: marginal Elo-strength consistency (for two equally weighted coordinates, the ratio of marginal contributions to the combined rating equals the ordinary Elo odds)
Imposed as one of the three substantive conditions.

pith-pipeline@v0.9.0 · 5470 in / 1296 out tokens · 39919 ms · 2026-05-12T01:56:19.395033+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (J uniqueness) · unclear
Theorem 1: $C_\lambda(R) = 400 \log_{10}\!\left( \frac{\sum_i \lambda_i \, 10^{R_i/400}}{\sum_i \lambda_i} \right)$; equivalently, $q(C_\lambda(R))$ is the weighted arithmetic mean of the $q(R_i)$, with $q(r) = 10^{r/400}$
IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection (coupling combiner forces bilinear J branch) · unclear
Assumption 4 (marginal Elo-strength consistency): $\partial_x C_{(1,1)}(x,y) \,/\, \partial_y C_{(1,1)}(x,y) = 10^{(x-y)/400}$
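
The two quoted statements fit together: specializing the Theorem 1 formula to equal weights and differentiating recovers the Assumption 4 identity. The short derivation below covers only this direction (the rule satisfies the condition), not the uniqueness claim, and its notation follows the quoted formulas.

```latex
% Theorem 1 with equal weights \lambda = (1,1):
C_{(1,1)}(x,y) = 400 \log_{10}\!\left( \frac{10^{x/400} + 10^{y/400}}{2} \right)

% Chain rule, using d/du \log_{10} u = 1/(u \ln 10)
% and d/dx\, 10^{x/400} = (\ln 10 / 400)\, 10^{x/400}:
\partial_x C_{(1,1)}(x,y) = \frac{10^{x/400}}{10^{x/400} + 10^{y/400}}, \qquad
\partial_y C_{(1,1)}(x,y) = \frac{10^{y/400}}{10^{x/400} + 10^{y/400}}

% The common denominator cancels in the ratio:
\frac{\partial_x C_{(1,1)}(x,y)}{\partial_y C_{(1,1)}(x,y)}
  = \frac{10^{x/400}}{10^{y/400}} = 10^{(x-y)/400}
```

The two partial derivatives sum to 1, and each equals the standard Elo expected score of one rating against the other.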
