pith. sign in

arxiv: 2606.13554 · v1 · pith:ZYSEYTQLnew · submitted 2026-06-11 · 🧮 math.ST · stat.ME· stat.TH

Asymptotic regimes for maximum likelihood estimation in the Ewens--Pitman model: When the strength parameter matters

Pith reviewed 2026-06-27 04:55 UTC · model grok-4.3

classification 🧮 math.ST stat.MEstat.TH
keywords Ewens-Pitman modelmaximum likelihood estimationasymptotic regimesfrequency spectrumrandom partitionsinfinite exchangeabilityscaled model
0
0 comments X

The pith

The MLE for Ewens-Pitman parameters (α, θ) displays four distinct asymptotic regimes based on the frequency spectrum limit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the large-sample behavior of the maximum likelihood estimator for the discount and strength parameters in the Ewens-Pitman model of random partitions. It identifies four regimes that depend on the limiting behavior of the frequency spectrum, showing that the strength parameter θ can influence the asymptotics in ways not captured by prior studies. This arises because previous work relied on infinite exchangeability, which imposes a strict relation tying the number of blocks to the spectrum. The authors introduce a scaled version of the model allowing θ to grow with sample size n to handle a broader class of spectra, supported by evidence from real data.

Core claim

Four distinct regimes arise for the maximum likelihood estimator of (α, θ) depending on the limiting behaviour of the frequency spectrum; in contrast with previous work, θ may play a crucial role asymptotically. The restriction to two regimes in the literature stems from infinite exchangeability constraints, which can be overcome by the scaled Ewens-Pitman model in which θ grows with n.

What carries the argument

The frequency spectrum and its limiting regimes, which classify the asymptotic behavior of the MLE and are constrained under infinite exchangeability.

If this is right

  • The MLE for θ can be asymptotically relevant in regimes not covered by standard infinite exchangeability.
  • Only two of the four regimes are accessible under the classical Ewens-Pitman model due to the rigid structural relation between number of distinct blocks and frequency spectrum.
  • The scaled Ewens-Pitman model extends the framework by letting θ grow with sample size n.
  • Real-world frequency spectra may fall outside the classical framework, as shown by empirical evidence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This implies that analyses assuming infinite exchangeability may miss important asymptotic behaviors in finite samples.
  • Extensions like the scaled model could be applied to other partition models to increase flexibility.
  • Data analysts might check the empirical frequency spectrum to select the appropriate regime for inference.

Load-bearing premise

The analysis assumes mild conditions on the data-generating mechanism that guarantee the frequency spectrum possesses a well-defined limiting behavior capable of distinguishing the four regimes.

What would settle it

A dataset whose frequency spectrum converges to a limit that places it outside the two regimes allowed by infinite exchangeability, yet whose MLE for θ follows the behavior of one of the two classical regimes, would falsify the claim of four regimes.

Figures

Figures reproduced from arXiv: 2606.13554 by Filippo Ascolani, Mario Beraha, Stefano Favaro.

Figure 1
Figure 1. Figure 1: Estimated 𝜃ˆ 𝑛 (solid line) and 95% pointwise confidence intervals (shaded area) as a function of the sample size in three different real datasets. See Section 4 for more details. 2 Maximum likelihood estimation in the Ewens–Pitman model 2.1 The asymptotic behaviour of the Maximum Likelihood Estimators Consider a sample 𝑋1:𝑛 = (𝑋1, . . . , 𝑋𝑛) featuring 𝐾𝑛 distinct values, with multiplicities (or block siz… view at source ↗
Figure 2
Figure 2. Figure 2: Empirical and estimated frequency-of-frequency spectrum [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
read the original abstract

We study the large sample asymptotic behaviour of the Maximum Likelihood Estimator of the discount and strength parameters $(\alpha,\theta)$ in the Ewens--Pitman model for random partitions, under mild assumptions on the data-generating mechanism. We show that four distinct regimes arise, depending on the limiting behaviour of the frequency spectrum. In particular, in contrast with previous work, we find that $\theta$ may play a crucial role asymptotically. We further show that the existing literature implicitly focuses on only two of these regimes, and we relate this restriction to the constraints imposed by infinite exchangeability. Under the latter, indeed, the number of distinct blocks and the frequency spectrum are necessarily tied by a rigid structural relation. We prove that this lack of flexibility can be overcome through what we call the scaled Ewens--Pitman model, in which $\theta$ is allowed to grow with the sample size $n$. Finally, we provide empirical evidence from real-world data showing that such extensions are needed to capture frequency spectra that fall outside the classical Ewens--Pitman framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript analyzes the large-sample asymptotics of the MLE for the discount and strength parameters (α, θ) in the Ewens–Pitman model for random partitions. Under mild conditions on the data-generating process, it identifies four distinct regimes determined by the limiting behavior of the frequency spectrum. The authors show that θ can play a non-negligible role in some regimes (in contrast to prior work), relate the restriction to only two regimes in the existing literature to the structural constraints of infinite exchangeability, introduce a scaled Ewens–Pitman model in which θ is permitted to grow with n, and supply empirical illustrations from real data.

Significance. If the regime classification and associated limit theorems hold, the paper supplies a more complete asymptotic theory for inference in the Ewens–Pitman family, clarifying when each parameter is identifiable and when the classical model is misspecified. The explicit link between exchangeability and the admissible regimes, together with the scaled extension, offers a principled way to enlarge the model class while retaining the partition structure; the empirical examples indicate that the additional regimes are observable in practice.

major comments (2)
  1. [§3, Theorem 3.1] §3, Theorem 3.1 and the regime definitions that follow: the four regimes are stated to be distinguished by the limiting frequency spectrum, yet the proof that the MLE converges to different limits in each regime appears to rely on the spectrum limit being known a priori; it is not shown that the regimes remain distinguishable when the spectrum limit must itself be estimated from the same data.
  2. [§5.1, Proposition 5.2] §5.1, Proposition 5.2: the claim that infinite exchangeability forces the number of blocks K_n and the frequency spectrum to satisfy a rigid relation is central to the motivation for the scaled model, but the argument only treats the two-parameter Ewens–Pitman case; it is unclear whether the same rigidity persists under the mild conditions stated in Assumption 2.1 or whether additional regimes become admissible even without scaling.
minor comments (3)
  1. [§2] Notation for the frequency spectrum (e.g., the definition of the empirical measure μ_n) is introduced in §2 but used with varying normalizations in later sections; a single consolidated definition would improve readability.
  2. [§6] The empirical section (§6) reports point estimates and regime assignments but does not include standard errors or bootstrap intervals for the MLE; adding these would strengthen the claim that the observed spectra fall outside the classical regimes.
  3. [Introduction] Several references to earlier work on the Ewens–Pitman MLE (e.g., the papers cited for the two-regime case) are listed in the bibliography but not discussed in the introduction; a brief comparison paragraph would clarify the precise advance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below with clarifications and indicate the revisions that will be incorporated.

read point-by-point responses
  1. Referee: [§3, Theorem 3.1] §3, Theorem 3.1 and the regime definitions that follow: the four regimes are stated to be distinguished by the limiting frequency spectrum, yet the proof that the MLE converges to different limits in each regime appears to rely on the spectrum limit being known a priori; it is not shown that the regimes remain distinguishable when the spectrum limit must itself be estimated from the same data.

    Authors: The regimes are defined as properties of the true data-generating process under Assumption 2.1, which fixes the limiting frequency spectrum. Theorem 3.1 establishes the asymptotic behavior of the MLE conditional on the true regime. The spectrum limit characterizes the underlying distribution rather than being an observed quantity for the theorem statement. In practice, the regime can be diagnosed by separately estimating the frequency spectrum, but the theoretical convergence results hold conditionally. We will add a short clarifying remark in Section 3 distinguishing the theoretical regime classification from practical identification. revision: partial

  2. Referee: [§5.1, Proposition 5.2] §5.1, Proposition 5.2: the claim that infinite exchangeability forces the number of blocks K_n and the frequency spectrum to satisfy a rigid relation is central to the motivation for the scaled model, but the argument only treats the two-parameter Ewens–Pitman case; it is unclear whether the same rigidity persists under the mild conditions stated in Assumption 2.1 or whether additional regimes become admissible even without scaling.

    Authors: Proposition 5.2 uses the two-parameter Ewens–Pitman model as a concrete illustration, but the rigidity between K_n and the frequency spectrum is a direct consequence of infinite exchangeability of the partition. Assumption 2.1 maintains this exchangeability structure while relaxing the parametric form; the same structural relation therefore persists, and additional regimes remain inadmissible without scaling θ with n. We will revise the text in Section 5.1 to state explicitly that the rigidity arises from exchangeability and applies under the general conditions of Assumption 2.1. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation classifies four asymptotic regimes for the MLE of (α, θ) according to the limiting behavior of the frequency spectrum under mild external assumptions on the data-generating mechanism. These regimes are not defined in terms of the fitted parameters or by construction from the MLE itself; the scaled Ewens-Pitman extension is introduced to relax exchangeability constraints rather than to rename or refit existing quantities. No load-bearing self-citation, self-definitional step, or reduction of a claimed prediction to an input fit is present in the argument structure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the existence of limiting frequency spectrum behavior under mild data-generating assumptions and on standard properties of the Ewens-Pitman model; the scaled model is introduced as an extension without independent external validation in the abstract.

axioms (1)
  • domain assumption Mild assumptions on the data-generating mechanism guarantee a limiting frequency spectrum that distinguishes regimes
    Invoked in the abstract as the basis for the four-regime classification.
invented entities (1)
  • scaled Ewens-Pitman model no independent evidence
    purpose: Allow θ to grow with sample size n to capture frequency spectra outside the classical exchangeable framework
    Introduced to overcome the rigid relation between number of blocks and frequency spectrum imposed by infinite exchangeability.

pith-pipeline@v0.9.1-grok · 5726 in / 1311 out tokens · 27701 ms · 2026-06-27T04:55:37.588992+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

33 extracted references · 7 canonical work pages

  1. [1]

    Conditional formulae for Gibbs-type exchangeable random partitions , volume =

    Favaro, Stefano and Lijoi, Antonio and Pr. Conditional formulae for Gibbs-type exchangeable random partitions , volume =. The Annals of Applied Probability , number =

  2. [2]

    The asymptotic expansion of a ratio of gamma functions , volume =

    Erd. The asymptotic expansion of a ratio of gamma functions , volume =. Pacific Journal of Mathematics , number =

  3. [3]

    A martingale approach to Gaussian fluctuations and laws of iterated logarithm for Ewens-Pitman model , volume =

    Bercu, Bernard and Favaro, Stefano , date-added =. A martingale approach to Gaussian fluctuations and laws of iterated logarithm for Ewens-Pitman model , volume =. Stochastic Processes and their Applications , pages =

  4. [4]

    Power-law distributions in empirical data , volume =

    Clauset, Aaron and Shalizi, Cosma Rohilla and Newman, Mark EJ , journal =. Power-law distributions in empirical data , volume =

  5. [5]

    Edge exchangeable models for interaction networks , volume =

    Crane, Harry and Dempsey, Walter , journal =. Edge exchangeable models for interaction networks , volume =

  6. [6]

    Cereda, Giulia and Corradi, Fabio and Viscardi, Cecilia , journal =

  7. [7]

    Generalized hypergeometric, digamma and trigamma distributions , volume =

    Sibuya, Masaaki , journal =. Generalized hypergeometric, digamma and trigamma distributions , volume =

  8. [8]

    Bercu, Bernard and Favaro, Stefano , journal =

  9. [9]

    Central limit theorems for certain infinite urn schemes , volume =

    Karlin, Samuel , journal =. Central limit theorems for certain infinite urn schemes , volume =

  10. [10]

    The number of small blocks in exchangeable random partitions , volume =

    Schweinsberg, Jason , journal =. The number of small blocks in exchangeable random partitions , volume =

  11. [11]

    Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws , volume =

    Gnedin, Alexander and Hansen, Ben and Pitman, Jim , journal =. Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws , volume =

  12. [12]

    Koriyama, Takuya and Matsuda, Takeru and Komaki, Fumiyasu , journal =

  13. [13]

    Asymptotic statistics , volume =

    van der Vaart, Aad , publisher =. Asymptotic statistics , volume =

  14. [14]

    Regular variation , volume =

    Bingham, Nicholas H and Goldie, Charles M and Teugels, Jef L , publisher =. Regular variation , volume =

  15. [15]

    Bayesian nonparametric inference for ``species-sampling'' problems , volume =

    Balocchi, Cecilia and Favaro, Stefano and Naulet, Zacharie , journal =. Bayesian nonparametric inference for ``species-sampling'' problems , volume =

  16. [16]

    Franssen, SEMP and van der Vaart, AW , journal =

  17. [17]

    The Annals of Probability , number =

    Pitman, Jim and Yor, Marc , doi =. The Annals of Probability , number =. 1997 , Bdsk-Url-1 =

  18. [18]

    , journal =

    Ishwaran, Hemant and James, Lancelot F. , journal =

  19. [19]

    2006 , Bdsk-Url-1 =

    Pitman, Jim , doi =. 2006 , Bdsk-Url-1 =

  20. [20]

    2020 , Bdsk-Url-1 =

    Lijoi, Antonio and Pr. 2020 , Bdsk-Url-1 =. doi:10.1093/biomet/asaa030 , journal =

  21. [21]

    2008 , Bdsk-Url-1 =

    Lijoi, Antonio and Pr. 2008 , Bdsk-Url-1 =. doi:10.1214/07-AAP495 , journal =

  22. [22]

    Journal of Mathematical Sciences , pages =

    Gnedin, Alexander and Pitman, Jim , doi =. Journal of Mathematical Sciences , pages =. 2006 , Bdsk-Url-1 =

  23. [23]

    Bayesian Nonparametrics , doi =

    Lijoi, Antonio and Pr. Bayesian Nonparametrics , doi =. 2010 , Bdsk-Url-1 =

  24. [24]

    2015 , Bdsk-Url-1 =

    De Blasi, Pierpaolo and Favaro, Stefano and Lijoi, Antonio and Mena, Rams. 2015 , Bdsk-Url-1 =. doi:10.1109/TPAMI.2013.217 , journal =

  25. [25]

    Bayesian nonparametric estimation of the probability of discovering new species , volume =

    Lijoi, Antonio and Mena, Rams. Bayesian nonparametric estimation of the probability of discovering new species , volume =. 2007 , Bdsk-Url-1 =. doi:10.1093/biomet/asm061 , journal =

  26. [26]

    2007 , Bdsk-Url-1 =

    Lijoi, Antonio and Mena, Rams. 2007 , Bdsk-Url-1 =. doi:10.1186/1471-2105-8-339 , journal =

  27. [27]

    Particle

    Favaro, Stefano and Lijoi, Antonio and Mena, Rams. 2009 , Bdsk-Url-1 =. doi:10.1111/j.1467-9868.2009.00717.x , journal =

  28. [28]

    and Johnson, Mark , booktitle =

    Goldwater, Sharon and Griffiths, Thomas L. and Johnson, Mark , booktitle =. Interpolating between types and tokens by estimating power-law generators , year =

  29. [29]

    2006 , Bdsk-Url-1 =

    Teh, Yee Whye , booktitle =. 2006 , Bdsk-Url-1 =. doi:10.3115/1220175.1220299 , pages =

  30. [30]

    Beraha, Mario and Favaro, Stefano , journal =

  31. [31]

    Gerlach, Martin and Altmann, Eduardo G. , doi =. Physical Review X , number =. 2013 , Bdsk-Url-1 =

  32. [32]

    Heaps' law, statistics of shared components, and temporal patterns from a sample-space-reducing process , volume =

    Mazzolini, Andrea and Colliva, Alberto and Caselle, Michele and Osella, Matteo , doi =. Heaps' law, statistics of shared components, and temporal patterns from a sample-space-reducing process , volume =. Physical Review E , number =. 2018 , Bdsk-Url-1 =

  33. [33]

    Collective dynamics of social annotation , volume =

    Cattuto, Ciro and Barrat, Alain and Baldassarri, Andrea and Schehr, Gregory and Loreto, Vittorio , doi =. Collective dynamics of social annotation , volume =. Proceedings of the National Academy of Sciences of the United States of America , number =. 2009 , Bdsk-Url-1 =