
arxiv: 2403.05077 · v3 · submitted 2024-03-08 · 🧮 math.PR

A refinement of the Ewens sampling formula

Pith reviewed 2026-05-24 03:33 UTC · model grok-4.3

classification 🧮 math.PR
keywords Ewens sampling formula · refined Ewens sampling formula · population genetics · allelic model · mutation rates · Poisson approximation · limit theorems · neutral model

The pith

The joint distribution of the entries in the allelic composition matrix follows the refined Ewens sampling formula in a model with multiple mutation-rate classes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper considers an infinitely-many neutral allelic model in which alleles are partitioned into a finite number of classes, each with its own fixed mutation rate. For samples drawn from a very large population, the allelic composition is encoded as a random matrix, and the refined Ewens sampling formula supplies the joint distribution of the matrix entries. This extends the classical Ewens formula, which assumes a single mutation rate for all alleles. The work also discusses a Poisson approximation for the refined formula and derives limit theorems for the numbers of alleles in several asymptotic regimes.
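For orientation, here is the classical Ewens sampling formula that the refined version generalizes, in standard textbook notation (not taken from the paper). For a sample of n genes with mutation parameter θ > 0, let a_j be the number of alleles represented exactly j times in the sample:

```latex
\mathbb{P}(a_1,\dots,a_n)
  = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)}
    \prod_{j=1}^{n} \frac{(\theta/j)^{a_j}}{a_j!},
\qquad \sum_{j=1}^{n} j\,a_j = n .
```

For θ = 1 and n = 3, the configurations (a_1, a_2, a_3) = (3, 0, 0), (1, 1, 0) and (0, 0, 1) receive probabilities 1/6, 1/2 and 1/3. This is the cycle-type distribution of a uniformly random permutation of three elements.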

Core claim

In an infinitely-many neutral allelic model where all alleles are divided into a finite number of classes, each characterized by its own mutation rate, the joint distribution of the entries of the random matrix that describes the allelic composition of a sample taken from a very large population of genes is given by the refined Ewens sampling formula.

What carries the argument

The refined Ewens sampling formula, a direct generalization of the classical Ewens sampling formula that incorporates distinct fixed mutation rates for each allele class.
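The abstract does not reproduce the refined formula, so the sketch below is not the paper's formula. It instead assumes a hypothetical multi-class version of Hoppe's urn, in which each new allele independently joins class k with probability θ_k/θ, where θ = θ_1 + … + θ_K. Under that assumption, the count matrix m = (m_{k,j}) has probability n!/(θ(θ+1)⋯(θ+n−1)) ∏_{k,j} (θ_k/j)^{m_{k,j}}/m_{k,j}!, where m_{k,j} is the number of class-k alleles seen exactly j times. With K = 1 this collapses to the classical Ewens sampling formula. The function names are illustrative only.

```python
from fractions import Fraction
from math import factorial


def rising_factorial(x, n):
    """x (x + 1) ... (x + n - 1), computed exactly."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def multiclass_esf(m, thetas):
    """Probability of the count matrix m under the ASSUMED multi-class urn.

    m[k][j - 1] is the number of class-k alleles seen exactly j times;
    thetas[k] is the (hypothetical) mutation parameter of class k.
    """
    thetas = [Fraction(t) for t in thetas]
    n = sum(j * c for row in m for j, c in enumerate(row, start=1))
    p = Fraction(factorial(n)) / rising_factorial(sum(thetas), n)
    for row, th in zip(m, thetas):
        for j, c in enumerate(row, start=1):
            p *= (th / j) ** c / factorial(c)
    return p


def all_matrices(n, num_classes):
    """Yield every count matrix m with sum over k, j of j * m[k][j - 1] == n."""
    types = [(k, j) for k in range(num_classes) for j in range(1, n + 1)]

    def extend(idx, remaining, counts):
        if remaining == 0:  # sample size reached; all later entries are 0
            m = [[0] * n for _ in range(num_classes)]
            for (k, j), c in zip(types, counts):
                m[k][j - 1] = c
            yield m
            return
        if idx == len(types):
            return
        _, j = types[idx]
        for c in range(remaining // j + 1):
            yield from extend(idx + 1, remaining - c * j, counts + [c])

    yield from extend(0, n, [])
```

Exact `Fraction` arithmetic lets the total over `all_matrices(n, K)` be checked to equal exactly 1. With a single class and θ = 1, the function returns the classical values 1/6, 1/2 and 1/3 for n = 3.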

If this is right

  • The refined formula admits a Poisson approximation.
  • Limit theorems hold for the numbers of alleles of each type in different asymptotic regimes.
  • The formula can be derived by several distinct methods.
  • When all mutation rates are equal, the refined formula reduces to the classical Ewens sampling formula.
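The Poisson approximation in the list has a well-known classical counterpart, due to Arratia, Barbour and Tavaré. Under the classical Ewens sampling formula, the counts C_j of alleles seen exactly j times are approximately independent Poisson(θ/j) variables for small j when θ is fixed and n → ∞. The sketch below checks this numerically for the classical case only, and only for C_1. It enumerates all partitions of n and computes the exact total variation distance to Poisson(θ). It makes no claim about the refined formula's approximation, whose precise form the abstract does not state. The function names are illustrative.

```python
from math import exp, factorial, lgamma, log


def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing lists."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def esf_prob(parts, theta):
    """Classical Ewens sampling formula probability of an allelic partition."""
    n = sum(parts)
    # log of n! / (theta (theta+1) ... (theta+n-1)), via Gamma functions
    log_p = lgamma(n + 1) + lgamma(theta) - lgamma(theta + n)
    for j in set(parts):
        a_j = parts.count(j)  # number of alleles seen exactly j times
        log_p += a_j * log(theta / j) - lgamma(a_j + 1)
    return exp(log_p)


def singleton_law(n, theta):
    """Exact law of C_1, the number of alleles seen once, under the ESF."""
    law = [0.0] * (n + 1)
    for parts in partitions(n):
        law[parts.count(1)] += esf_prob(parts, theta)
    return law


def tv_to_poisson(law, mean):
    """Total variation distance between law (on 0..n) and Poisson(mean)."""
    pois = [exp(-mean) * mean**k / factorial(k) for k in range(len(law))]
    tail = max(0.0, 1.0 - sum(pois))  # Poisson mass above n, where law is 0
    return 0.5 * (sum(abs(p - q) for p, q in zip(law, pois)) + tail)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, tv_to_poisson(singleton_law(n, 1.0), 1.0))
```

For θ = 1, the classical formula is the cycle-type law of a uniform random permutation, so C_1 counts fixed points. The distance to Poisson(1) then shrinks very quickly as n grows.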

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The refined formula could be used to infer class-specific mutation rates from observed genetic samples.
  • Analogous refinements may exist for other classical sampling formulas in combinatorial probability.
  • Finite-population corrections to the refined formula would be a natural next step for practical genetic data.
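The first extension above, inferring class-specific rates, can be made concrete only under stated assumptions. For the classical formula, the number of distinct alleles K_n is sufficient for θ. Ewens' estimator solves K_n = E_θ[K_n] = Σ_{i=0}^{n−1} θ/(θ+i), a mean that grows like θ log n. The per-class step is hypothetical. It assumes a multi-class urn in which each new allele joins class k with probability θ_k/θ, so that the class labels are multinomial given K_n, which gives θ̂_k = θ̂·K_{n,k}/K_n. None of this is taken from the paper, and the function names are illustrative.

```python
def expected_alleles(theta, n):
    """E[K_n] under the classical ESF: sum over i < n of theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))


def ewens_theta_mle(k_obs, n, lo=1e-9, hi=1e9, iters=200):
    """Ewens' estimator: the theta solving E[K_n](theta) = k_obs.

    E[K_n] increases with theta from 1 towards n, so bisection works
    whenever 1 < k_obs < n.
    """
    if not 1 < k_obs < n:
        raise ValueError("need 1 < k_obs < n for a finite positive estimate")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_alleles(mid, n) < k_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def class_rates(class_counts, n):
    """HYPOTHETICAL per-class estimates theta_k = theta_hat * K_{n,k} / K_n.

    Valid only under the assumed multi-class urn in which each new allele
    joins class k with probability theta_k / theta, independently.
    """
    k_total = sum(class_counts)
    theta_hat = ewens_theta_mle(k_total, n)
    return [theta_hat * c / k_total for c in class_counts]
```

For example, `class_rates([6, 4], 50)` splits the classical estimate for 10 alleles in a sample of 50 in the ratio 6 : 4.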

Load-bearing premise

The model assumes an infinitely-many neutral allelic framework in which alleles are partitioned into a finite number of classes, each class having its own fixed mutation rate, and the sample comes from a very large population.

What would settle it

Simulate many independent samples from the multi-class mutation model and check whether the observed frequencies of each possible matrix of allele counts match the probabilities prescribed by the refined Ewens sampling formula.
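The check proposed above can be sketched as a Monte Carlo experiment. The abstract does not state the refined formula, so this sketch compares simulations against the matrix probabilities implied by a hypothetical multi-class Hoppe urn. In that urn, each new allele joins class k with probability θ_k/θ, where θ = Σ_k θ_k. An actual test of the paper would replace `matrix_prob` with the refined Ewens sampling formula and the sampler with the paper's population model. All function names are illustrative.

```python
import random
from collections import Counter
from math import factorial


def sample_alleles(n, thetas, rng):
    """One sample of size n from a HYPOTHETICAL multi-class Hoppe urn.

    Returns a sorted tuple of (class, multiplicity) pairs, one per allele;
    it encodes the count matrix m[k][j] without storing the zeros.
    """
    theta = sum(thetas)
    sizes, classes = [], []
    for i in range(n):  # i genes are already in the sample
        if rng.random() < theta / (theta + i):
            # mutation: a brand-new allele, class k with prob thetas[k] / theta
            sizes.append(1)
            classes.append(rng.choices(range(len(thetas)), weights=thetas)[0])
        else:
            # copy the allele of a uniformly chosen existing gene
            a = rng.choices(range(len(sizes)), weights=sizes)[0]
            sizes[a] += 1
    return tuple(sorted(zip(classes, sizes)))


def matrix_prob(key, thetas):
    """Probability of key implied by the same assumed urn (see lead-in)."""
    n = sum(j for _, j in key)
    rising = 1.0
    for i in range(n):
        rising *= sum(thetas) + i
    p = factorial(n) / rising
    for (k, j), m in Counter(key).items():
        p *= (thetas[k] / j) ** m / factorial(m)
    return p


def monte_carlo_tv(n, thetas, reps, seed=0):
    """Total variation distance between simulated and predicted frequencies."""
    rng = random.Random(seed)
    counts = Counter(sample_alleles(n, thetas, rng) for _ in range(reps))
    diff = sum(abs(c / reps - matrix_prob(key, thetas)) for key, c in counts.items())
    unseen = 1.0 - sum(matrix_prob(key, thetas) for key in counts)
    return 0.5 * (diff + max(unseen, 0.0))


if __name__ == "__main__":
    print(monte_carlo_tv(4, [0.5, 1.5], reps=20_000))
```

A small distance that keeps shrinking as `reps` grows is what agreement looks like. A distance that stays bounded away from zero would indicate a mismatch between the sampler and the formula.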

Figures

Figures reproduced from arXiv: 2403.05077 by Eugene Strahov.

Figure 1: An analogue of the Chinese restaurant process. … multiplication of the balls in a cycle gives a red ball, then the color of the cycle is red. In order to obtain an element of G ≀ S(n) from an element of G ≀ S(n − 1), add a new ball to the configuration describing the chosen element of G ≀ S(n − 1). The new ball can create a new cycle (with probability t1/(t1 + t2 + 2n) provided the new ball is red and with probabil…
Original abstract

We consider an infinitely-many neutral allelic model of population genetics where all alleles are divided into a finite number of classes, and each class is characterized by its own mutation rate. For this model the allelic composition of a sample taken from a very large population of genes is characterized by a random matrix, and the problem is to describe the joint distribution of the matrix entries. The answer is given by a new generalization of the classical Ewens sampling formula called the refined Ewens sampling formula in the present paper. We discuss a Poisson approximation for the refined Ewens sampling formula, and present its derivation by several methods. As an application we obtain limit theorems for the numbers of alleles in different asymptotic regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript derives a refined Ewens sampling formula that gives the joint distribution of the entries of a random matrix of allelic counts arising from an infinitely-many neutral alleles model in which alleles are partitioned into a finite number of classes, each with its own fixed mutation rate. The formula is obtained by several methods, a Poisson approximation is discussed, and limit theorems are derived for the numbers of alleles of each type in various asymptotic regimes.

Significance. If the derivations hold, the result supplies a concrete, usable generalization of the classical Ewens sampling formula to a multi-class setting that is already standard in population-genetics modeling. The multi-method derivation and the accompanying Poisson and limit results would make the formula immediately applicable to inference on samples drawn from populations with heterogeneous mutation rates.

minor comments (2)
  1. The definition of the random matrix of allelic counts (presumably in §2) should be accompanied by an explicit statement of the sample size n and the number of classes K; without these the transition from the classical Ewens formula to the refined version is harder to follow.
  2. The Poisson approximation section would benefit from a short statement of the total-variation or Wasserstein distance that is being bounded, together with the precise regime (e.g., θ fixed, n→∞) under which the bound holds.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our work on the refined Ewens sampling formula, as well as the recommendation for minor revision. The report raises no major comments, so no point-by-point rebuttal is needed at this time. We remain available to incorporate the requested minor clarifications and any further feedback.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from model assumptions

full rationale

The paper constructs the refined Ewens sampling formula directly from the infinitely-many neutral allelic model with class-specific mutation rates and derives the joint distribution of the allelic count matrix by several methods. The Poisson approximation and the limit theorems are then obtained from the formula, and the results reduce to the classical Ewens formula in special cases. No steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the result is presented as an independent generalization with external reductions to known formulas.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract alone; the model description invokes standard neutral-population assumptions whose precise mathematical content is not visible.

axioms (1)
  • domain assumption: infinitely-many neutral allelic model with a finite number of mutation-rate classes
    Stated in the first sentence of the abstract as the setting for the sampling problem.

pith-pipeline@v0.9.0 · 5628 in / 1140 out tokens · 24690 ms · 2026-05-24T03:33:36.268800+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

  1. Aldous, D. J. Exchangeability and related topics. École d’été de probabilités de Saint-Flour XIII, 1983, 1–198, Lecture Notes in Math. 1117, Springer, Berlin, 1985.
  2. Antoniak, C. E. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist. 2 (1974), 1152–1174.
  3. Arratia, R.; Barbour, A.; Tavaré, S. Poisson process approximation for the Ewens sampling formula. Ann. Appl. Probab. 2 (1992), no. 3, 519–535.
  4. Arratia, R.; Barbour, A.; Tavaré, S. Logarithmic combinatorial structures: a probabilistic approach. European Mathematical Society, Zurich, 2003.
  5. Crane, H. The Ubiquitous Ewens Sampling Formula. Statistical Science 31 (2016), 1–19.
  6. Durrett, R. Probability Models for DNA Sequence Evolution. Probability and Its Applications. Springer, 2008.
  7. Etheridge, A. Some Mathematical Models from Population Genetics. Lecture Notes in Mathematics. Springer, 2012.
  8. Ewens, W. J. The sampling theory of selectively neutral alleles. Theoret. Population Biol. 3 (1972).
  9. Ewens, W. J. Mathematical Population Genetics. Lecture Notes. Cornell University, 2006. Available online: https://services.math.duke.edu/ rtd/CPSS2006/cornelllect.pdf
  10. Feng, S. Large deviations associated with Poisson-Dirichlet distribution and Ewens sampling formula. Ann. Appl. Probab. 17 (2007), no. 5–6, 1570–1595.
  11. Feng, S. The Poisson-Dirichlet Distribution and Related Topics. Models and Asymptotic Behaviors. Springer, 2010.
  12. Ghosal, S.; van der Vaart, A. Fundamentals of nonparametric Bayesian inference. Cambridge Series in Statistical and Probabilistic Mathematics 44. Cambridge University Press, Cambridge, 2017.
  13. Gnedin, A. V. Three sampling formulas. Combin. Probab. Comput. 13 (2004), no. 2, 185–193.
  14. Griffiths, R. C.; Lessard, S. Ewens’ sampling formula and related formulae: combinatorial proofs, extension to a variable population size and applications to ages of alleles. Theor. Population Biology 68 (2005), 167–177.
  15. Hong, E. P.; Park, J. W. Sample Size and Statistical Power Calculation in Genetic Association Studies. Genomics Inf. 10 (2012), no. 2, 117–122.
  16. Hoppe, F. M. Pólya-like urns and the Ewens’ sampling formula. J. Math. Biol. 20 (1984), no. 1, 91–94.
  17. Hoppe, F. M. The sampling theory of neutral alleles and an urn model in population genetics. J. Math. Biol. 25 (1987), no. 2, 123–159.
  18. Kingman, J. F. C. Random partitions in population genetics. Proc. R. Soc. London A 361 (1978), 1–20.
  19. Kingman, J. F. C. The representation of partition structures. J. London Math. Soc. 18 (1978), 374–380.
  20. Korwar, R. M.; Hollander, M. Contributions to the theory of Dirichlet processes. Ann. Probability 1 (1973), 705–711.
  21. Kotz, S.; Balakrishnan, N.; Johnson, N. L. Continuous multivariate distributions. Vol. 1. Models and applications. Second edition. Wiley Series in Probability and Statistics: Applied Probability and Statistics.
  22. Macdonald, I. Symmetric Functions and Hall Polynomials. Second edition. Oxford Mathematical Monographs, Oxford University Press, 1995.
  23. Mahmoud, H. M. Pólya Urn Models. Texts in Statistical Sciences, 2009.
  24. Pitman, J. The two-parameter generalization of Ewens’ random partition structure. Technical Report 345, Dept. of Statistics, University of California, Berkeley, 1992.
  25. Pitman, J. Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102 (1995), no. 2, 145–158.
  26. Pitman, J. Combinatorial Stochastic Processes. École d’Été de Probabilités de Saint-Flour XXXII, 2002. Lecture Notes in Mathematics 1875, Springer, 2002.
  27. O’Reilly, R.; Elphick, H. E. Development, clinical utility, and place of ivacaftor in the treatment of cystic fibrosis. Drug Design, Development and Therapy 7 (2013), 929–937.
  28. Strahov, E. Multiple partition structures and harmonic functions on branching graphs. Adv. in Appl. Math. 153 (2024), Paper No. 102617, 49 pp.
  29. Strahov, E. Generalized regular representations of big wreath products. Israel J. Math. (2025), to appear.
  30. Tavaré, S. Ancestral Inference in Population Genetics. Lectures on Probability Theory and Statistics. École d’Été de Probabilités de Saint-Flour XXXI (2001).
  31. Tavaré, S. The magical Ewens sampling formula. Bull. London Math. Soc. 53 (2021), 1563–1582.
  32. Tsukuda, K. Estimating the large mutation parameter of the Ewens sampling formula. J. Appl. Probability 54 (2017), 42–54.
  33. Tsukuda, K. On Poisson approximations for the Ewens sampling formula when the mutation parameter grows with the sample size. Ann. Appl. Probab. 29 (2019), no. 2, 1188–1232.
  34. Watterson, G. A. The stationary distribution of the infinitely-many neutral alleles diffusion model. J. Appl. Probability 13 (1976), no. 4, 639–651.

Department of Mathematics, The Hebrew University of Jerusalem, Givat Ram, Jerusalem 91904, Israel. Email address: strahov@math.huji.ac.il