pith. machine review for the scientific record.

arxiv: 2605.08989 · v1 · submitted 2026-05-09 · 💰 econ.TH · math.OC

Aggregating Elo Ratings: An Axiomatization

Mehmet Mars Seven

Pith reviewed 2026-05-12 01:56 UTC · model grok-4.3

classification 💰 econ.TH math.OC
keywords Elo ratings, rating aggregation, axiomatic characterization, consistency conditions, chess ratings, strength averaging, recursive aggregation, marginal consistency

The pith

The only way to reduce a vector of Elo ratings to a single Elo-scale rating while meeting three consistency conditions is to convert each rating to its underlying strength, take the weighted arithmetic mean of those strengths, and convert the result back.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to characterize a principled method for collapsing several Elo ratings that describe the same agent into one overall rating that remains on the Elo scale. It requires that any such rule obey same-scale normalization (identical inputs produce the identical output rating), recursive consistency (aggregating ratings in successive blocks with preserved total weights yields the same result as aggregating all at once), and marginal Elo-strength consistency (the ratio of marginal effects on the combined rating for two equally weighted inputs equals the standard Elo odds ratio between them). Under these requirements the unique rule is the one that maps each rating to its Elo strength, computes the weighted arithmetic average of those strengths, and maps the average back to the Elo scale. This matters for chess players with classical, rapid, and blitz ratings, for online platforms that rate by time control or format, and for any evaluator that scores performance across multiple tasks or roles.

Core claim

The unique rating rule satisfying same-scale normalization, recursive consistency, and marginal Elo-strength consistency converts each component rating to its Elo strength, takes the weighted arithmetic mean of those strengths, and converts the result back to the Elo scale.
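
A minimal sketch of the rule as stated, assuming the standard Elo convention in which a 400-point gap corresponds to 10:1 odds, so that strength is 10^(r/400). The paper's exact normalization is not reproduced on this page; the names to_strength, to_rating, and aggregate, and the sample ratings and weights, are illustrative.

```python
import math

ELO_SCALE = 400.0  # assumption: standard Elo convention (400 points = 10x odds)

def to_strength(rating: float) -> float:
    """Map an Elo rating to a strength, assuming s = 10**(r / 400)."""
    return 10.0 ** (rating / ELO_SCALE)

def to_rating(strength: float) -> float:
    """Inverse map: strength back to the Elo scale."""
    return ELO_SCALE * math.log10(strength)

def aggregate(ratings, weights):
    """Weighted arithmetic mean on the strength scale, mapped back to Elo."""
    total = sum(weights)
    mean_strength = sum(w * to_strength(r) for r, w in zip(ratings, weights)) / total
    return to_rating(mean_strength)

# Hypothetical ratings and weights, for illustration only.
combined = aggregate([2750.0, 2700.0, 2650.0], [0.5, 0.3, 0.2])
print(round(combined, 1))
```

Because the mean is taken on the strength scale, the combined rating lies between the smallest and largest inputs and, for non-uniform profiles, sits above the weighted average of the rating numbers themselves.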

What carries the argument

The marginal Elo-strength consistency condition. It equates the ratio of marginal contributions to the combined rating (for two equally weighted coordinates) with the ordinary Elo odds ratio between the two inputs.
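
As a check that strength averaging satisfies this condition, its marginal effects can be worked out directly. The derivation below assumes the standard Elo convention (strength 10^{r/400}); the symbols F, r_k, and w_k are notation chosen here and may differ from the paper's. It shows only that the rule meets the condition, not that it is the unique rule doing so.

```latex
\begin{align*}
F(r_1,\dots,r_n) &= 400 \log_{10}\!\Big(\sum_{k} w_k\, 10^{r_k/400}\Big),
  \qquad \sum_{k} w_k = 1,\\[4pt]
\frac{\partial F}{\partial r_i} &= \frac{w_i\, 10^{r_i/400}}{\sum_{k} w_k\, 10^{r_k/400}},\\[4pt]
\left.\frac{\partial F/\partial r_i}{\partial F/\partial r_j}\right|_{w_i = w_j}
  &= 10^{(r_i - r_j)/400}.
\end{align*}
```

The last line is the usual Elo odds of a player rated r_i against a player rated r_j, since the expected score 1/(1 + 10^{(r_j - r_i)/400}) has odds 10^{(r_i - r_j)/400}.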

If this is right

  • The resulting rule is distinct both from direct arithmetic averaging of the rating numbers and from treating the combined rating as the outcome of a random-format lottery.
  • Each of the three axioms is independent of the others, so dropping any one allows additional aggregation rules.
  • The rule supplies a concrete method for combining classical, rapid, and blitz chess ratings into a single overall rating.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same conversion-to-strength step could be applied to other rating systems that share the logistic or exponential relationship between rating difference and win probability.
  • Platforms could adopt the rule to guarantee that the order in which sub-ratings are grouped does not alter the final scalar rating.
  • The construction treats Elo ratings as logarithmic measures of strength, suggesting analogous aggregation procedures whenever performance is scored on an odds-ratio scale.
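
The grouping point can be illustrated numerically. The sketch below assumes the strength-averaging rule under the standard 400-point, base-10 Elo convention; the ratings, weights, and the function name aggregate are hypothetical.

```python
import math

def aggregate(ratings, weights):
    """Strength averaging, assuming strength = 10**(rating / 400)."""
    mean = sum(w * 10.0 ** (r / 400.0) for r, w in zip(ratings, weights)) / sum(weights)
    return 400.0 * math.log10(mean)

# Hypothetical sub-ratings and weights.
r = {"classical": 2700.0, "rapid": 2600.0, "blitz": 2550.0}
w = {"classical": 0.5, "rapid": 0.3, "blitz": 0.2}

# All three formats at once.
direct = aggregate([r["classical"], r["rapid"], r["blitz"]],
                   [w["classical"], w["rapid"], w["blitz"]])

# Grouping A: merge rapid and blitz first; the block carries their total weight.
fast = aggregate([r["rapid"], r["blitz"]], [w["rapid"], w["blitz"]])
grouped_a = aggregate([r["classical"], fast], [w["classical"], w["rapid"] + w["blitz"]])

# Grouping B: merge classical and rapid first.
slow = aggregate([r["classical"], r["rapid"]], [w["classical"], w["rapid"]])
grouped_b = aggregate([slow, r["blitz"]], [w["classical"] + w["rapid"], w["blitz"]])

print(direct, grouped_a, grouped_b)
```

The three printed values should agree up to floating-point rounding, provided each block is passed on with the summed weight of its members.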

Load-bearing premise

The marginal Elo-strength consistency condition requires that for any two equally weighted ratings the ratio of their marginal contributions to the combined rating equals the standard Elo odds between them.

What would settle it

A concrete counter-example consisting of two or three specific rating values and weights for which either the strength-averaging rule violates one of the three axioms or some other aggregation rule satisfies all three axioms yet produces a numerically different combined rating.
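
One way to hunt for such a counter-example is a numerical spot check of the three conditions at a chosen profile. The sketch below is an assumption-laden harness, not the paper's procedure: it assumes the 400-point, base-10 Elo convention, approximates marginal contributions with central finite differences, and uses hypothetical names (strength_rule, scale_average, check_axioms) and tolerances. Passing at a few points proves nothing; a failure at any point would be informative.

```python
import math

def strength_rule(ratings, weights):
    """Strength averaging, assuming strength = 10**(rating / 400)."""
    mean = sum(w * 10.0 ** (r / 400.0) for r, w in zip(ratings, weights)) / sum(weights)
    return 400.0 * math.log10(mean)

def scale_average(ratings, weights):
    """Direct weighted average of the rating numbers."""
    return sum(w * r for r, w in zip(ratings, weights)) / sum(weights)

def check_axioms(rule, r1, r2, r3, w=(0.25, 0.25, 0.5), h=1e-3):
    """Spot-check the three conditions at one profile; coordinates 1 and 2 share a weight."""
    full = rule([r1, r2, r3], w)
    # Same-scale normalization: a uniform profile keeps its rating.
    normalization = abs(rule([r1, r1, r1], w) - r1) < 1e-9
    # Recursive consistency: merge coordinates 1 and 2, pass on their total weight.
    block = rule([r1, r2], w[:2])
    recursive = abs(rule([block, r3], [w[0] + w[1], w[2]]) - full) < 1e-9
    # Marginal consistency: ratio of finite-difference marginals vs. Elo odds.
    d1 = (rule([r1 + h, r2, r3], w) - rule([r1 - h, r2, r3], w)) / (2 * h)
    d2 = (rule([r1, r2 + h, r3], w) - rule([r1, r2 - h, r3], w)) / (2 * h)
    elo_odds = 10.0 ** ((r1 - r2) / 400.0)
    marginal = abs(d1 / d2 - elo_odds) < 1e-6 * elo_odds
    return {"normalization": normalization, "recursive": recursive, "marginal": marginal}

for rule in (strength_rule, scale_average):
    print(rule.__name__, check_axioms(rule, 2700.0, 2600.0, 2500.0))
```

At this profile the direct scale average passes the first two checks and fails the third, because its marginal contributions are equal for equally weighted inputs while the Elo odds between 2700 and 2600 are not 1.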

Original abstract

Many environments assign several Elo ratings to the same agent: a chess player has classical, rapid, and blitz ratings; an online platform may rate by time control, mode, or format; an evaluator may rate performance across tasks or roles. This paper axiomatizes when such a vector of ratings can be reduced to a single scalar rating that is itself on the Elo scale. We impose three substantive conditions: same-scale normalization (a uniform profile keeps its rating), recursive consistency (aggregating in blocks gives the same answer as aggregating directly, provided each block carries the total weight of its members), and marginal Elo-strength consistency (for two equally weighted coordinates, the ratio of marginal contributions to the combined rating equals the ordinary Elo odds). The unique rating rule satisfying these conditions converts each component to its Elo strength, takes a weighted arithmetic mean of strengths, and converts back. We show how this rule differs from a random-format lottery and from rating-scale averaging, prove the axioms are independent, and illustrate the rule on combining classical, rapid, and blitz ratings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript axiomatizes the aggregation of multiple Elo ratings for the same agent into a single scalar rating that remains on the Elo scale. It imposes three axioms: same-scale normalization (a uniform profile of ratings is preserved), recursive consistency (block-wise aggregation with preserved total weights equals direct aggregation), and marginal Elo-strength consistency (for any two equally weighted coordinates, the ratio of their marginal contributions to the aggregate equals the standard Elo odds ratio). The paper proves that the unique rule satisfying these axioms maps each rating to its underlying strength via the Elo transformation, computes the weighted arithmetic mean of the strengths, and maps the result back to the Elo scale. It further establishes independence of the three axioms and shows that the rule differs from both a random-format lottery and direct averaging on the rating scale, with an illustration on combining classical, rapid, and blitz chess ratings.

Significance. If the uniqueness result holds, the paper supplies a clean, parameter-free aggregation rule derived directly from three externally motivated axioms without circularity or ad-hoc fitting. This is useful for chess federations and online platforms that maintain multiple ratings per agent. The independence proofs and explicit differentiation from intuitive alternatives (lottery, scale averaging) add robustness. The approach aligns with standard functional-equation techniques in economic theory and yields a falsifiable prediction for how aggregated ratings should behave under the stated conditions.

minor comments (2)
  1. The manuscript would benefit from an explicit statement of the Elo strength transformation function (including the base-10 logarithm and scaling constant) at the first point of use, rather than assuming familiarity.
  2. A short numerical example with concrete Elo values (e.g., 2700 classical, 2600 rapid) showing the numerical difference from scale averaging would strengthen the illustration section.
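
For orientation, the example requested in comment 2 might look like the sketch below. It assumes equal weights and the standard Elo convention that strength is 10^(r/400); neither is taken from the manuscript, and the function names are illustrative.

```python
import math

def strength_average(ratings, weights):
    """Strength averaging, assuming strength = 10**(rating / 400)."""
    mean = sum(w * 10.0 ** (r / 400.0) for r, w in zip(ratings, weights)) / sum(weights)
    return 400.0 * math.log10(mean)

def scale_average(ratings, weights):
    """Direct weighted average of the rating numbers."""
    return sum(w * r for r, w in zip(ratings, weights)) / sum(weights)

ratings = [2700.0, 2600.0]  # classical, rapid (values from the referee's comment)
weights = [0.5, 0.5]        # assumption: equal weights

print(round(strength_average(ratings, weights), 1))  # 2657.1
print(scale_average(ratings, weights))               # 2650.0
```

Under these assumptions the strength-averaged rating is 2600 + 400·log10((1 + 10^0.25)/2), roughly 7 points above the scale average, because averaging on the strength scale gives more pull to the higher rating.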

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our manuscript, which correctly identifies the three axioms and the uniqueness result for the Elo-strength aggregation rule. We appreciate the recognition of its potential usefulness for chess federations and rating platforms, as well as the note on axiom independence and differentiation from alternatives. Since the report contains no major comments, we have no point-by-point revisions to address at this stage.

Circularity Check

0 steps flagged

No significant circularity

The paper presents a standard axiomatic characterization: three externally motivated conditions (same-scale normalization, recursive consistency, and marginal Elo-strength consistency) are stated as independent primitives, shown to be independent, and used to derive a unique functional form (Elo-to-strength conversion, weighted arithmetic mean on the strength scale, and return to Elo). No step reduces a derived quantity to a fitted parameter, renames an input, or relies on a self-citation chain; the marginal consistency axiom directly encodes the target odds ratio without presupposing the final aggregator. The derivation is therefore self-contained against the stated axioms and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claim rests on the three substantive conditions stated in the abstract; no free parameters or invented entities are introduced.

axioms (3)
  • domain assumption same-scale normalization (a uniform profile keeps its rating)
    Imposed as one of the three substantive conditions.
  • domain assumption recursive consistency (aggregating in blocks gives the same answer as aggregating directly, provided each block carries the total weight of its members)
    Imposed as one of the three substantive conditions.
  • domain assumption marginal Elo-strength consistency (for two equally weighted coordinates, the ratio of marginal contributions to the combined rating equals the ordinary Elo odds)
    Imposed as one of the three substantive conditions.

pith-pipeline@v0.9.0 · 5470 in / 1296 out tokens · 39919 ms · 2026-05-12T01:56:19.395033+00:00 · methodology
