arxiv: 2605.04867 · v1 · submitted 2026-05-06 · 🧮 math.PR

Recognition: unknown

First server effect on the expected number of games in tennis

Ali Mohammadi

Pith reviewed 2026-05-08 16:44 UTC · model grok-4.3

classification 🧮 math.PR

keywords tennis · expected games · first server · serve probability · constant probability · tennis modeling · match margin

The pith

Knowing who serves first changes the expected number of games in a tennis match by at most one.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that information on the first server affects the expected total number of games and the final margin in a tennis set or match, even when each player wins points on serve with fixed probabilities. It maps the exact combinations of those probabilities where the first-server effect is large enough to notice and shows through calculation that the shift never exceeds one game. An empirical check against professional match records supports treating the serve-win probabilities as constant for the purpose of predicting game counts. Readers interested in tennis modeling or match forecasting would care because the result refines a standard assumption without requiring large corrections.

Core claim

Under the assumption that player A wins points on serve with probability p and player B with probability q, the expected number of games and the expected margin both depend on which player serves first; the dependence is non-negligible only inside certain regions of the (p, q) plane, yet the absolute difference remains at most one game at both the set and match level.

What carries the argument

The difference in conditional expectations for total games, obtained by modeling the set and match as sequences of games whose outcomes depend on the current server and the fixed point-win probabilities.
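The set-level part of this machinery can be sketched in Python as a recursion over game scores. This is a minimal illustration under stated assumptions, not the paper's own code: it assumes a tiebreak set in which the tiebreak at 6-6 counts as one game, it uses the standard closed form for the probability of holding serve, and the function names `hold_probability`, `expected_set_games`, and `first_server_effect` are hypothetical.

```python
from functools import lru_cache


def hold_probability(s: float) -> float:
    """Probability that the server wins a game when winning each point with probability s."""
    r = 1.0 - s
    # Server wins to 0, 15 or 30 without passing through deuce.
    before_deuce = s**4 * (1 + 4 * r + 10 * r**2)
    # Reach 3-3 (deuce), then win two points in a row before the returner does.
    from_deuce = 20 * s**3 * r**3 * (s**2 / (s**2 + r**2))
    return before_deuce + from_deuce


def expected_set_games(p: float, q: float, a_serves_first: bool) -> float:
    """Expected games in one tiebreak set; the tiebreak at 6-6 is counted as one game."""
    hold_a, hold_b = hold_probability(p), hold_probability(q)

    @lru_cache(maxsize=None)
    def remaining(a: int, b: int) -> float:
        if (a >= 6 or b >= 6) and abs(a - b) >= 2:
            return 0.0  # set already decided (6-0 to 6-4, or 7-5)
        if a == 6 and b == 6:
            return 1.0  # exactly one more game, whoever wins the tiebreak
        # Servers alternate every game, so parity of games played picks the server.
        a_serving = ((a + b) % 2 == 0) == a_serves_first
        a_wins = hold_a if a_serving else 1.0 - hold_b
        return 1.0 + a_wins * remaining(a + 1, b) + (1.0 - a_wins) * remaining(a, b + 1)

    return remaining(0, 0)


def first_server_effect(p: float, q: float) -> float:
    """Expected set games with A serving first minus expected set games with B serving first."""
    return expected_set_games(p, q, True) - expected_set_games(p, q, False)
```

Evaluating `first_server_effect(p, q)` over a range of values is one way to explore where the set-level difference becomes noticeable. The match-level quantity would need a further recursion over sets, which is not shown here.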

Load-bearing premise

Each player's probability of winning a point on serve stays constant throughout the match.
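To see why this premise carries so much weight, consider the probability of holding serve. If the server wins every point independently with the same probability s, the game-win probability has a standard closed form. The symbols G, D, and s below are notation chosen for this sketch, and the paper's own notation may differ.

```latex
% D(s): probability of winning the game from deuce.
% From deuce the server either wins two points in a row, or the score returns to deuce.
\[
  D(s) = s^2 + 2s(1-s)\,D(s)
  \quad\Longrightarrow\quad
  D(s) = \frac{s^2}{1 - 2s(1-s)} = \frac{s^2}{s^2 + (1-s)^2}.
\]
% G(s): probability of holding serve (win to 0, 15 or 30, or reach deuce and win from there).
\[
  G(s) = s^4\bigl(1 + 4(1-s) + 10(1-s)^2\bigr) + 20\,s^3(1-s)^3\,D(s).
\]
```

The first identity only closes because the same s applies at every return to deuce. If s drifted during the match, neither the geometric step nor the fixed per-game probabilities used in the set and match recursions would be available in this form.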

What would settle it

A large collection of professional matches in which the estimated serve-win probabilities fall inside the identified non-negligible region and the observed average game count differs by more than one game according to which player served first.
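The comparison described here could be computed along the following lines. This is a rough sketch only: the record layout `(first_server, total_games)` and the function name `observed_first_server_gap` are hypothetical, the matches are assumed to have been filtered beforehand to those whose estimated serve-win probabilities fall in the region of interest, and "A" is assumed to be assigned by a consistent rule across matches.

```python
from statistics import mean


def observed_first_server_gap(records):
    """Mean total games when A served first minus mean total games when B served first.

    `records` holds (first_server, total_games) pairs, with first_server equal to
    "A" or "B". This layout is an assumption made for illustration.
    """
    records = list(records)  # allow generators, since the data is scanned twice
    a_first = [games for server, games in records if server == "A"]
    b_first = [games for server, games in records if server == "B"]
    if not a_first or not b_first:
        raise ValueError("need at least one match for each first server")
    return mean(a_first) - mean(b_first)
```

An observed gap whose magnitude stays above one game on a large sample would count against the bound; sampling error would have to be quantified separately before drawing that conclusion.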

Figures

Figures reproduced from arXiv: 2605.04867 by Ali Mohammadi.

**Figure 1.** Expected total games differential caused by serve order. (a) Match-level effect on margin (b) Match and set margin effect difference

**Figure 2.** Expected margin differential caused by serve order

**Figure 3.** Total games model residual structures.

Remark 4.1. Write M(x, y) for the function [5, equation 48], such that the probability of A winning a match against B is M_AB = M(P_A, P_B). A useful observation in [3] is that M(P_A, P_B) ≈ M(P_A + ε, P_B + ε) for sufficiently small values of ε > 0 (see [3, …]).

Original abstract

We show that information on the first server influences the expected total number of games and margin in a tennis match under the standard assumption that each player's serve point win probability remains constant, and identify the exact regions, in terms of these probabilities, in which this effect is non-negligible. We confirm numerically that this effect is bounded by at most one game at both the set and match level. We complement the analysis with an empirical comparison on professional match data, illustrating the adequacy of the constant-probability assumption for modelling the total number of games.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper maps the exact regions where knowing the first server shifts the expected number of games in a tennis match by a noticeable amount and checks numerically that the shift never exceeds one game, but the bound lacks an analytical proof.

The key point is that information about who serves first changes the expected number of games in a tennis match by no more than one game, and the paper gives the precise ranges of serve win probabilities where this change is noticeable.

The authors work with the usual model where each player has fixed probabilities p and q of winning a point on their serve. They set up the recursive expectations for the number of games in a set and then in a match, and they compare the cases with and without knowing the first server. From this they extract the exact regions in the p-q plane where the difference in expected games or margin is non-negligible. They also run a numerical check over many values of p and q to verify that the maximum difference stays at or below one game, both for sets and full matches. On top of that, they look at some professional match data to see how well the constant-probability assumption fits the observed total games.

The identification of those exact regions is the genuinely new piece. It is a straightforward but careful calculation that previous tennis probability papers do not seem to have done. The empirical comparison is small but directly addresses whether the model is adequate for the quantity they care about.

The main limitation is that the one-game bound comes from numerical verification rather than a closed-form proof. Tennis scoring includes tiebreaks and can go to advantage sets, so the state space is large. Even a fine grid could overlook narrow intervals near p or q equal to zero or one where the effect might behave differently. An analytical demonstration that the difference cannot exceed one would strengthen the central claim considerably.

This paper is for readers who already work with detailed probabilistic models of tennis or similar sports. It will not interest people looking for broad new theory, but it gives a precise tool for anyone who needs accurate expected game counts in simulations or betting models.

I would recommend sending it out for peer review. The calculations look honest and the empirical check is a nice touch, so referees can evaluate the numerical coverage and suggest improvements if needed.

Referee Report

1 major / 2 minor

Summary. The paper derives recursive expressions for the expected number of games (and win margin) in a tennis set and best-of-n match under constant point-win probabilities p and q, distinguishing the cases where the first server is known versus unknown. It analytically identifies the regions of the (p,q) unit square in which the first-server information produces a non-negligible difference in these expectations, and numerically verifies that the absolute difference is at most one game at both set and match level. The analysis is complemented by an empirical comparison against professional match data that supports the adequacy of the constant-probability assumption for modeling total games.

Significance. If the numerical bound holds, the work supplies a precise, parameter-region-specific quantification of a previously unexamined modeling choice (first-server information) that affects expected match length by a bounded but non-zero amount. The recursive expectation framework on the finite tennis scoring tree is standard yet cleanly executed here, and the explicit delineation of negligible versus non-negligible regions offers falsifiable guidance for analysts. The empirical check, while secondary, directly addresses the practical relevance of the constant-p,q assumption for the quantity of interest.

major comments (1)

[§4] §4 (Numerical confirmation of the bound): The central claim that the first-server effect is bounded by at most one game at both set and match level rests entirely on numerical evaluation over a finite grid of (p,q) values. Because the scoring tree admits arbitrarily long deuce/advantage sequences when p or q approaches 0 or 1, and because tie-break and advantage-set rules introduce additional discontinuities, a discrete grid may miss narrow parameter bands in which the difference exceeds 1. An analytical upper bound or a convergence argument with explicit error control on the grid is required to make the headline bound rigorous.

minor comments (2)

[§§2–3] The recursive definitions for expected games (e.g., the conditioning on the first point or first game) are introduced without a compact notation for the two cases (known vs. unknown first server); a single pair of symbols such as E_A and E_U would improve readability in §§2–3.
[Figure 2] Figure 2 (or equivalent region plot) lacks an explicit color-bar scale for the magnitude of the difference; readers cannot immediately judge whether the plotted 'non-negligible' regions correspond to differences of 0.1 or 0.9 games.
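One way to realise the compact notation suggested in the first minor comment is sketched below. It assumes that "unknown first server" means each player is equally likely to serve first, which is an assumption of this sketch and not something the report states; the symbol E_B is likewise introduced here for illustration.

```latex
% E_A, E_B: expected total games when A (resp. B) is known to serve first.
% E_U: expected total games when the first server is unknown (assumed 50/50).
\[
  E_U(p,q) = \tfrac12\bigl(E_A(p,q) + E_B(p,q)\bigr),
  \qquad
  E_A(p,q) - E_U(p,q) = \tfrac12\bigl(E_A(p,q) - E_B(p,q)\bigr).
\]
```

Under that convention the value of first-server information is half the gap between the two serve orders, so which of the two differences a bound refers to should be stated explicitly.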

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive report and the positive assessment of the paper's contributions. We address the single major comment below and will revise the manuscript accordingly.

Referee: §4 (Numerical confirmation of the bound): The central claim that the first-server effect is bounded by at most one game at both set and match level rests entirely on numerical evaluation over a finite grid of (p,q) values. Because the scoring tree admits arbitrarily long deuce/advantage sequences when p or q approaches 0 or 1, and because tie-break and advantage-set rules introduce additional discontinuities, a discrete grid may miss narrow parameter bands in which the difference exceeds 1. An analytical upper bound or a convergence argument with explicit error control on the grid is required to make the headline bound rigorous.

Authors: We agree that a finite grid, however dense, cannot by itself rigorously exclude exceedances in narrow unsampled regions, particularly near the boundaries where long deuce sequences occur and at the tie-break discontinuities. In the revision we will replace the current numerical confirmation with a two-part argument: (i) results on a grid with spacing 0.001 together with a direct computation of the difference function at all grid points, and (ii) an explicit error-control argument showing that the difference between the two recursive expectations is Lipschitz continuous on each compact sub-rectangle away from the tie-break thresholds, with a computable Lipschitz constant derived from the depth of the recursion. This supplies the required convergence control and confirms that the supremum remains at most one game.

revision: yes
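The error-control step (ii) in the response can be made concrete with a standard grid argument. The following is a sketch under stated assumptions, not the authors' actual proof: Δ denotes the difference between the two recursive expectations, R is one of the compact sub-rectangles, L is a Lipschitz constant for Δ on R in the Euclidean norm, and G_h is a square grid of spacing h whose cells tile R. All of these symbols are introduced here for illustration.

```latex
Suppose $|\Delta(x) - \Delta(y)| \le L\,\lVert x - y\rVert_2$ for all $x, y \in R$.
Every $x \in R$ lies in a grid cell of side $h$, hence within $h/\sqrt{2}$ of one of its corners, so
\[
  \sup_{x \in R} |\Delta(x)| \;\le\; \max_{g \in G_h} |\Delta(g)| + \frac{L\,h}{\sqrt{2}} .
\]
Consequently $\sup_{R} |\Delta| \le 1$ is certified on $R$ whenever
\[
  \max_{g \in G_h} |\Delta(g)| \;\le\; 1 - \frac{L\,h}{\sqrt{2}} .
\]
```

With h = 0.001 the argument only certifies the bound if the grid maximum sits strictly below one by the margin Lh/√2. If the grid maximum is too close to one, a finer grid or a sharper Lipschitz constant would be needed on that sub-rectangle.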

Circularity Check

0 steps flagged

No significant circularity; standard recursive expectations with separate numerical verification

The paper computes the first-server effect on expected games via recursive expectations over the finite tennis scoring tree (points to games to sets to match) under fixed p and q. These recursions are direct applications of standard Markovian expectation methods and do not reduce to any fitted parameter or self-referential definition. Regions of non-negligible effect follow analytically from comparing the two server-order formulas. The 'at most one game' bound is stated as a numerical confirmation on a grid, not as a derived equality or prediction forced by construction. No self-citations are load-bearing for the central claims, no ansatz is smuggled, and the empirical data comparison is presented as external validation rather than part of the derivation. The analysis is therefore self-contained against external probabilistic benchmarks.
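An independent way to cross-check a recursive expectation of this kind is direct simulation. The Monte Carlo sketch below is illustrative only and assumes a tiebreak set in which the tiebreak counts as the thirteenth game. It takes game-level hold probabilities `hold_a` and `hold_b` as inputs instead of point-level p and q, and the function names are hypothetical.

```python
import random


def simulate_set_games(hold_a, hold_b, a_serves_first, rng):
    """Play one tiebreak set game by game and return the number of games played."""
    a = b = 0
    while True:
        if (a >= 6 or b >= 6) and abs(a - b) >= 2:
            return a + b  # set decided at 6-0 to 6-4, or 7-5
        if a == 6 and b == 6:
            return 13  # the tiebreak is counted as the thirteenth game
        # Servers alternate every game.
        a_serving = ((a + b) % 2 == 0) == a_serves_first
        a_wins_prob = hold_a if a_serving else 1.0 - hold_b
        if rng.random() < a_wins_prob:
            a += 1
        else:
            b += 1


def mean_set_games(hold_a, hold_b, a_serves_first, trials=100_000, seed=0):
    """Average number of games per set over `trials` simulated sets."""
    rng = random.Random(seed)  # fixed seed so repeated runs are reproducible
    total = sum(
        simulate_set_games(hold_a, hold_b, a_serves_first, rng) for _ in range(trials)
    )
    return total / trials
```

Comparing `mean_set_games(hold_a, hold_b, True)` with `mean_set_games(hold_a, hold_b, False)` gives a noisy estimate of the set-level first-server effect; the sampling error shrinks roughly like one over the square root of `trials`.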

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption of constant serve win probabilities; no numerical parameters are fitted inside the derivation itself and no new entities are postulated.

axioms (1)

domain assumption: Each player's serve point win probability remains constant throughout the match.
Explicitly stated in the abstract as the modeling premise that enables the closed-form regions and the one-game bound.

Reference graph

Works this paper leans on

6 extracted references

[1] T. J. Barnett, Mathematical modelling in hierarchical games with specific reference to tennis, Ph.D. Thesis, Swinburne University of Technology, Melbourne, Australia, 2006.

[2] T. J. Barnett and S. R. Clarke, Combining player statistics to predict outcomes of tennis matches, IMA Journal of Management Mathematics 16(2) (2005), 113–120.

[3] W. J. Knottenbelt, D. Spanias and A. M. Madurska, A common-opponent stochastic model for predicting the outcome of professional tennis matches, Computers and Mathematics with Applications 64(12) (2012), 3820–3827.

[4] A. J. O’Malley, Probability formulas and statistical analysis in tennis, Journal of Quantitative Analysis in Sports 4(2) (2008), 15.

[5] P. K. Newton and J. B. Keller, Probability of winning at tennis I. Theory and data, Studies in Applied Mathematics 114(3) (2005), 241–269.

[6] J. Sackmann, Tennis Match Results Data, https://github.com/jeffsackmann, Accessed: May 2026.