A refinement of the Ewens sampling formula
Pith reviewed 2026-05-24 03:33 UTC · model grok-4.3
The pith
The joint distribution of the entries in the allelic composition matrix follows the refined Ewens sampling formula in a model with multiple mutation-rate classes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In an infinitely-many neutral allelic model where all alleles are divided into a finite number of classes each characterized by its own mutation rate, the joint distribution of the entries of the random matrix that describes the allelic composition of a sample taken from a very large population of genes is given by the refined Ewens sampling formula.
What carries the argument
The refined Ewens sampling formula, a direct generalization of the classical Ewens sampling formula that incorporates distinct fixed mutation rates for each allele class.
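For orientation, the classical formula can be written out explicitly. The block below states it in standard notation, followed by a product-form candidate for the multi-class case. The symbols $a_j$, $a_j^{(l)}$, $\theta_l$, $K$ and $\Theta$ are notation introduced here for illustration. The second display is an assumption (it is what a Hoppe-type urn with class-labelled mutations would give), not a formula quoted from the paper.

```latex
% Classical Ewens sampling formula: a_j = number of alleles seen exactly j times.
\[
\mathbb{P}(a_1,\dots,a_n)
  = \frac{n!}{(\theta)_n}\prod_{j=1}^{n}\frac{1}{a_j!}\left(\frac{\theta}{j}\right)^{a_j},
\qquad (\theta)_n=\theta(\theta+1)\cdots(\theta+n-1),
\qquad \sum_{j=1}^{n} j\,a_j=n .
\]

% ASSUMED multi-class analogue: a_j^{(l)} = number of class-l alleles seen exactly j times.
\[
\mathbb{P}\bigl(a_j^{(l)}:\,1\le l\le K,\ 1\le j\le n\bigr)
  = \frac{n!}{(\Theta)_n}\prod_{l=1}^{K}\prod_{j=1}^{n}
    \frac{1}{a_j^{(l)}!}\left(\frac{\theta_l}{j}\right)^{a_j^{(l)}},
\qquad \Theta=\sum_{l=1}^{K}\theta_l,
\qquad \sum_{l=1}^{K}\sum_{j=1}^{n} j\,a_j^{(l)}=n .
\]
```

With $K=1$ the second display coincides with the first, which is the minimal consistency requirement for any generalization of this kind.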
If this is right
- The refined formula admits a Poisson approximation.
- Limit theorems hold for the numbers of alleles of each type in different asymptotic regimes.
- The formula can be derived by several distinct methods.
- When all mutation rates are equal the refined formula reduces to the classical Ewens sampling formula.
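Since every item above is measured against the classical formula, a small numerical check of that baseline may be useful. The following is a minimal sketch that evaluates the classical Ewens sampling formula in exact rational arithmetic and confirms that it sums to 1 over all partitions of n. The function names `partitions`, `rising` and `esf_prob` are hypothetical helpers chosen here, not names from the paper.

```python
from fractions import Fraction
from math import factorial


def partitions(n, max_part=None):
    """Yield every partition of n as a non-increasing tuple of block sizes."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest


def rising(theta, n):
    """Rising factorial theta * (theta + 1) * ... * (theta + n - 1)."""
    out = Fraction(1)
    for i in range(n):
        out *= theta + i
    return out


def esf_prob(parts, theta):
    """Classical Ewens probability of the partition `parts` for mutation parameter theta."""
    n = sum(parts)
    counts = {}  # counts[j] = a_j, the number of alleles observed exactly j times
    for p in parts:
        counts[p] = counts.get(p, 0) + 1
    prob = Fraction(factorial(n)) / rising(theta, n)
    for j, a in counts.items():
        prob *= Fraction(theta) ** a / (Fraction(j) ** a * factorial(a))
    return prob


if __name__ == "__main__":
    theta = Fraction(3, 2)
    total = sum(esf_prob(p, theta) for p in partitions(6))
    print(total)  # normalization check over all partitions of 6
```

Exact fractions are used so that the normalization check is an equality rather than a floating-point comparison.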
Where Pith is reading between the lines
- The refined formula could be used to infer class-specific mutation rates from observed genetic samples.
- Analogous refinements may exist for other classical sampling formulas in combinatorial probability.
- Finite-population corrections to the refined formula would be a natural next step for practical genetic data.
Load-bearing premise
The model assumes an infinitely-many neutral allelic framework in which alleles are partitioned into a finite number of classes, each class having its own fixed mutation rate, and the sample comes from a very large population.
What would settle it
Simulate many independent samples from the multi-class mutation model and check whether the observed frequencies of each possible matrix of allele counts match the probabilities prescribed by the refined Ewens sampling formula.
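The check described above can be sketched in a few lines. Everything model-specific here is an assumption made for illustration: the sampler is a Hoppe-type urn in which the (i+1)-th gene is a new allele of class l with probability θ_l/(Θ+i) and otherwise copies a uniformly chosen earlier gene, and `assumed_refined_prob` uses a product-form candidate for the refined formula, n!/(Θ)_n times the product of (θ_l/j)^a / a! over cells. Neither is quoted from the paper, and all function names are hypothetical.

```python
import random
from collections import Counter
from math import factorial, prod


def sample_matrix(n, thetas, rng):
    """Draw one sample of size n from the ASSUMED multi-class Hoppe-type urn.

    Returns a hashable encoding of the count matrix: a sorted tuple of
    ((class_index, allele_size), number_of_such_alleles).
    """
    total = sum(thetas)
    alleles = []  # each entry is [class_index, current_count]
    owner = []    # owner[g] = index in `alleles` of the allele carried by gene g
    for i in range(n):
        if rng.random() * (total + i) < total:
            # New allele; its class is chosen proportionally to theta_l.
            cls = rng.choices(range(len(thetas)), weights=thetas)[0]
            alleles.append([cls, 1])
            owner.append(len(alleles) - 1)
        else:
            # Copy the allele (and hence the class) of a uniform earlier gene.
            k = owner[rng.randrange(i)]
            alleles[k][1] += 1
            owner.append(k)
    counts = Counter((cls, size) for cls, size in alleles)
    return tuple(sorted(counts.items()))


def assumed_refined_prob(key, n, thetas):
    """Probability of an encoded matrix under the ASSUMED product-form formula."""
    total = sum(thetas)
    p = factorial(n) / prod(total + i for i in range(n))
    for (cls, size), a in key:
        p *= (thetas[cls] / size) ** a / factorial(a)
    return p


def compare(n=4, thetas=(0.5, 2.0), trials=100_000, seed=0):
    """Map each observed matrix to (empirical frequency, assumed probability)."""
    rng = random.Random(seed)
    freq = Counter(sample_matrix(n, thetas, rng) for _ in range(trials))
    return {k: (c / trials, assumed_refined_prob(k, n, thetas)) for k, c in freq.items()}


if __name__ == "__main__":
    for key, (empirical, theoretical) in sorted(compare().items()):
        print(key, round(empirical, 4), round(theoretical, 4))
```

Agreement between the two columns only shows that the assumed urn and the assumed formula are consistent with each other. Testing the paper's claim would require substituting its own model definition and its own formula.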
Original abstract
We consider an infinitely-many neutral allelic model of population genetics where all alleles are divided into a finite number of classes, and each class is characterized by its own mutation rate. For this model the allelic composition of a sample taken from a very large population of genes is characterized by a random matrix, and the problem is to describe the joint distribution of the matrix entries. The answer is given by a new generalization of the classical Ewens sampling formula called the refined Ewens sampling formula in the present paper. We discuss a Poisson approximation for the refined Ewens sampling formula, and present its derivation by several methods. As an application we obtain limit theorems for the numbers of alleles in different asymptotic regimes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives a refined Ewens sampling formula that gives the joint distribution of the entries of a random matrix of allelic counts arising from an infinitely-many neutral alleles model in which alleles are partitioned into a finite number of classes, each with its own fixed mutation rate. The formula is obtained by several methods, a Poisson approximation is discussed, and limit theorems are derived for the numbers of alleles of each type in various asymptotic regimes.
Significance. If the derivations hold, the result supplies a concrete, usable generalization of the classical Ewens sampling formula to a multi-class setting that is already standard in population-genetics modeling. The multi-method derivation and the accompanying Poisson and limit results would make the formula immediately applicable to inference on samples drawn from populations with heterogeneous mutation rates.
minor comments (2)
- The definition of the random matrix of allelic counts (presumably in §2) should be accompanied by an explicit statement of the sample size n and the number of classes K; without these the transition from the classical Ewens formula to the refined version is harder to follow.
- The Poisson approximation section would benefit from a short statement of the total-variation or Wasserstein distance that is being bounded, together with the precise regime (e.g., θ fixed, n→∞) under which the bound holds.
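To make the second comment concrete, the block below recalls the definition of total-variation distance for discrete laws and the well-known classical statement it would be compared against. The notation $C_j^{(n)}$ and $Z_j$ is introduced here for illustration, and the limit shown is for the classical one-parameter formula only. Nothing in it is a bound taken from the manuscript.

```latex
% Total-variation distance between two probability laws on a countable set S.
\[
d_{\mathrm{TV}}(\mu,\nu)
  = \sup_{A\subseteq S}\bigl|\mu(A)-\nu(A)\bigr|
  = \frac{1}{2}\sum_{x\in S}\bigl|\mu(x)-\nu(x)\bigr| .
\]

% Classical case, theta fixed, b fixed, n -> infinity:
% C_j^{(n)} = number of alleles represented j times in a sample of size n,
% Z_1, Z_2, ... independent with Z_j ~ Poisson(theta / j).
\[
\bigl(C_1^{(n)},\dots,C_b^{(n)}\bigr)
  \xrightarrow[n\to\infty]{d}
  \bigl(Z_1,\dots,Z_b\bigr).
\]
```

Because the vectors are integer-valued and b is fixed, this convergence in distribution is equivalent to convergence in total variation. A quantitative version would state how fast the distance decays and how b may grow with n.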
Simulated Author's Rebuttal
We thank the referee for the positive summary and significance assessment of our work on the refined Ewens sampling formula, as well as the recommendation for minor revision. No specific major comments were provided in the report, so we have no points to address point-by-point at this time. We remain available to incorporate any additional feedback or minor clarifications if supplied.
Circularity Check
No significant circularity; derivation self-contained from model assumptions
full rationale
The paper constructs the refined Ewens sampling formula directly from the infinitely-many neutral allelic model with class-specific mutation rates and derives the joint distribution of the allelic count matrix by several methods. The Poisson approximation and the limit theorems are then obtained from the formula, which reduces to the classical Ewens formula in special cases. No steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the result is presented as an independent generalization with external reductions to known formulas.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: infinitely-many neutral allelic model with a finite number of mutation-rate classes
Author address: Eugene Strahov, Department of Mathematics, The Hebrew University of Jerusalem, Givat Ram, Jerusalem 91904, Israel. Email address: strahov@math.huji.ac.il