pith. machine review for the scientific record.

arxiv: 2605.12475 · v1 · submitted 2026-05-12 · 🧮 math.PR

Central limit theorem for the homozygosity of the hierarchical Pitman-Yor process

J. E. Paguyo, Shui Feng

Pith reviewed 2026-05-13 02:52 UTC · model grok-4.3

classification 🧮 math.PR
keywords: central limit theorem · Pitman-Yor process · hierarchical model · homozygosity · power sum polynomials · asymptotic variance · Bayesian nonparametrics

The pith

The hierarchical Pitman-Yor process obeys a central limit theorem for homozygosity and related power-sum statistics on its weights as concentration parameters tend to infinity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper sets out to establish a central limit theorem for the homozygosity and more generally the power sum symmetric polynomials computed from the weights of the hierarchical Pitman-Yor process. The limit is taken as the concentration parameters in the hierarchy tend to infinity, yielding Gaussian limiting distributions with explicitly computable variances. A reader would care because these asymptotics govern the sampling formulas used in Bayesian nonparametric models for grouped data that follow power laws. The explicit variances make visible how the different layers of the hierarchy contribute to overall variability. Compared to the hierarchical Dirichlet process, the Pitman-Yor version requires more involved calculations but exposes the power-law aspects more clearly.

Core claim

We prove a central limit theorem for the family of power sum symmetric polynomials in the weights of the hierarchical Pitman-Yor process when the concentration parameters tend to infinity. Explicit formulas are derived for the asymptotic variances, which display the separate influence of each level in the hierarchical construction. These results are obtained via moment calculations based on the exchangeable partition probability function of the process and are more demanding than the corresponding statements for the hierarchical Dirichlet process while making the power-law features of the model more apparent.

What carries the argument

The exchangeable partition probability function of the hierarchical Pitman-Yor process, together with recursive moment calculations for the weight vectors at each level of the hierarchy.
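
As a worked illustration of this kind of moment calculation, the simplest case is the first moment of the homozygosity. The notation is assumed here for illustration and is not taken from the paper: $(\alpha_0, \theta_0)$ are the discount and concentration parameters of the top-level process, $(\alpha, \theta)$ those of one group-level process, $(p_i)$ are single-level Pitman-Yor weights, and $(\pi_i)$ are the weights a group-level measure puts on distinct atoms. A two-level hierarchy with a diffuse base measure is also assumed. The single-level identity is the standard probability that the second customer joins the first customer's table in the two-parameter Chinese restaurant process.

```latex
% Power sum of order m; the homozygosity is the case m = 2.
H_m = \sum_{i \ge 1} \pi_i^{m}, \qquad m \ge 2 .

% Single level, weights (p_i) of a Pitman-Yor(\alpha, \theta) process:
\mathbb{E}\Big[ \sum_{i \ge 1} p_i^{2} \Big] = \frac{1-\alpha}{\theta+1} .

% Two levels: two draws from a group either share a group-level stick,
% or sit on different sticks whose labels are two independent draws
% from the top-level measure.
\mathbb{E}[H_2]
  = \frac{1-\alpha}{\theta+1}
  + \frac{\theta+\alpha}{\theta+1} \cdot \frac{1-\alpha_0}{\theta_0+1} .
```

Both terms vanish as the concentration parameters grow, so a rescaling is needed before any Gaussian limit can appear. The paper's actual normalization and variance formulas are not reproduced here.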

If this is right

  • The asymptotic sampling formulas for the process are Gaussian after normalization.
  • Each component of the hierarchy contributes additively or through specific products to the asymptotic variance.
  • The results apply directly to understanding fluctuations in clustered data models with power-law cluster sizes.
  • The approach extends previous work on the hierarchical Dirichlet process to the more general Pitman-Yor setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These variance formulas might be used to construct asymptotic confidence intervals for estimated homozygosity in finite samples from hierarchical models.
  • Similar CLTs could hold for other functionals of the weights beyond power sums.
  • In applications to population genetics, the theorem would predict the distribution of homozygosity measures under hierarchical Pitman-Yor priors.

Load-bearing premise

The hierarchical Pitman-Yor process uses its standard two-parameter construction at each level, and the limit is taken with all concentration parameters diverging to infinity while the discount parameters remain fixed.

What would settle it

A simulation study with increasingly large concentration parameters, in which the normalized homozygosity statistic fails to approach a normal distribution with the predicted variance, would cast serious doubt on the claimed central limit theorem or its variance formula.
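
A minimal sketch of such a simulation follows, using only the Python standard library. It is not the authors' code. It assumes a two-level hierarchy with a diffuse base measure, approximates each Pitman-Yor process by truncated stick-breaking (with renormalization after truncation), and uses hypothetical names throughout (`py_stick_weights`, `hpyp_group_weights`, `simulate_homozygosity`, and the parameters `alpha0`, `theta0`, `alpha`, `theta`). Because the paper's variance formula is not reproduced here, the only built-in check is the exact mean of the homozygosity, which follows from conditioning on whether two draws share a group-level stick.

```python
import math
import random


def py_stick_weights(alpha, theta, n_atoms, rng):
    """Truncated stick-breaking weights of a Pitman-Yor(alpha, theta) process."""
    weights, remaining = [], 1.0
    for k in range(1, n_atoms + 1):
        v = rng.betavariate(1.0 - alpha, theta + k * alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    total = math.fsum(weights)
    return [w / total for w in weights]  # renormalize after truncation


def hpyp_group_weights(alpha0, theta0, alpha, theta, n_atoms, rng):
    """Weights that one group-level process puts on the top-level atoms."""
    top = py_stick_weights(alpha0, theta0, n_atoms, rng)    # top-level weights
    sticks = py_stick_weights(alpha, theta, n_atoms, rng)   # group-level sticks
    # Each group-level stick is labelled by an independent draw from the top level.
    labels = rng.choices(range(n_atoms), weights=top, k=n_atoms)
    merged = [0.0] * n_atoms
    for label, w in zip(labels, sticks):
        merged[label] += w  # sticks sharing a label merge into one atom
    return merged


def power_sum(weights, m=2):
    """Power sum of order m; m = 2 is the homozygosity."""
    return math.fsum(w ** m for w in weights)


def simulate_homozygosity(alpha0, theta0, alpha, theta,
                          n_reps=200, n_atoms=500, seed=0):
    """Independent replicates of the (truncated) group-level homozygosity."""
    rng = random.Random(seed)
    return [
        power_sum(hpyp_group_weights(alpha0, theta0, alpha, theta, n_atoms, rng))
        for _ in range(n_reps)
    ]


def expected_homozygosity(alpha0, theta0, alpha, theta):
    """Exact mean of H_2 in the untruncated two-level model (assumed setup)."""
    a = (1.0 - alpha) / (theta + 1.0)      # two draws share a group-level stick
    b = (1.0 - alpha0) / (theta0 + 1.0)    # two top-level draws coincide
    return a + (1.0 - a) * b


if __name__ == "__main__":
    params = dict(alpha0=0.25, theta0=5.0, alpha=0.25, theta=5.0)
    samples = simulate_homozygosity(**params)
    mean = math.fsum(samples) / len(samples)
    print("empirical mean:", mean)
    print("exact mean:    ", expected_homozygosity(**params))
```

To probe the theorem itself, one would rerun this with growing `theta0` and `theta` (raising `n_atoms` accordingly, since more sticks carry non-negligible mass), standardize the samples with the paper's normalization, and compare the empirical variance and a normal Q-Q plot against the predicted Gaussian limit.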

Original abstract

The hierarchical Pitman-Yor process is a discrete random measure used as a prior in Bayesian nonparametrics. It is motivated by the study of groups of clustered data exhibiting power law behavior. Our focus in this paper is on the Gaussian behavior of a family of statistics, namely the power sum symmetric polynomials for the vector of weights of the process, as the concentration parameters tend to infinity. We establish a central limit theorem and obtain explicit representations for the asymptotic variance, with the latter clearly showing the impact of each component in the hierarchical structure. These results are crucial for understanding the asymptotic behavior of the sampling formulas associated with the process. In comparison with the known results for the hierarchical Dirichlet process, the results for the hierarchical Pitman-Yor process are mathematically more challenging and structurally more revealing of power law behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper establishes a central limit theorem for the power-sum symmetric polynomials (including homozygosity) of the weights of the hierarchical Pitman-Yor process as all concentration parameters tend to infinity. Explicit asymptotic variance formulas are derived that decompose the contributions from each level of the hierarchy. The argument relies on the exchangeable partition probability function of the process together with direct moment calculations, and the results are presented as more technically demanding than the corresponding statements for the hierarchical Dirichlet process due to the power-law features.

Significance. If the central limit theorem and variance expressions hold, the work is significant for Bayesian nonparametrics because it supplies Gaussian fluctuations and interpretable variances for key functionals of hierarchical random measures that exhibit power-law cluster sizes. The explicit separation of hierarchical contributions in the variance is a concrete strength that can inform the analysis of sampling formulas and statistical procedures built on these priors. The extension beyond the Dirichlet case is a natural and useful step in the literature on exchangeable random partitions.

minor comments (2)
  1. [Abstract] The abstract refers to 'a family of statistics, namely the power sum symmetric polynomials' but does not list the precise collection considered (e.g., which exponents p are treated); adding an explicit enumeration or reference to the relevant definition would improve clarity.
  2. [Introduction] The comparison with the hierarchical Dirichlet process is mentioned but would benefit from a short paragraph in the introduction that recalls the known variance formulas for the Dirichlet case and highlights the new technical obstacles introduced by the Pitman-Yor discount parameter.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and for recommending minor revision. The referee's summary accurately captures the paper's contribution: a central limit theorem for the power-sum symmetric polynomials (including homozygosity) of the weights in the hierarchical Pitman-Yor process, together with explicit asymptotic variances that separate the hierarchical contributions and reflect the power-law features. We appreciate the recognition that these results extend the Dirichlet case in a technically more demanding setting and that the variance decomposition is useful for Bayesian nonparametrics.

Circularity Check

0 steps flagged

No significant circularity; the derivation is self-contained, proceeding from definitions and standard theorems.

The paper derives a CLT for power-sum symmetric polynomials (including homozygosity) of the hierarchical Pitman-Yor weights in the regime where concentration parameters tend to infinity. It starts from the explicit EPPF of the hierarchical process and performs direct moment calculations on the weights, invoking standard limit theorems for exchangeable partitions. No step reduces by construction to a fitted parameter renamed as prediction, a self-definitional loop, or a load-bearing self-citation whose content is itself unverified. The asymptotic variance formula separates hierarchical contributions via explicit computation rather than by ansatz or renaming. The result is independent of its own outputs and aligns with known single-level cases without internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The result rests on the standard construction of the hierarchical Pitman-Yor process via stick-breaking or exchangeable partition probability functions, plus classical central limit theorems for dependent variables.

axioms (2)
  • domain assumption: The hierarchical Pitman-Yor process admits an exchangeable partition probability function with power-law behavior controlled by discount and concentration parameters.
    Invoked implicitly when stating the process and its weights.
  • standard math: Moments of the power sum polynomials admit asymptotic expansions as concentration parameters tend to infinity.
    Required for the CLT statement and variance calculation.
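
For orientation, the single-level case shows what such an expansion looks like. This is a standard identity for the two-parameter model, not a formula quoted from the paper, and the symbols are assumed for illustration: $(p_i)$ are the weights of a single Pitman-Yor process with discount $\alpha$ and concentration $\theta$. The hierarchical expansions used in the paper are more involved and are not reproduced here.

```latex
% Probability that m customers all sit at one table in the
% two-parameter Chinese restaurant process:
\mathbb{E}\Big[ \sum_{i \ge 1} p_i^{m} \Big]
  = \prod_{j=1}^{m-1} \frac{j-\alpha}{\theta+j}
  = \frac{\Gamma(m-\alpha)}{\Gamma(1-\alpha)} \cdot \frac{\Gamma(\theta+1)}{\Gamma(\theta+m)} .

% Expansion as \theta \to \infty with \alpha and m fixed:
\mathbb{E}\Big[ \sum_{i \ge 1} p_i^{m} \Big]
  = \frac{\Gamma(m-\alpha)}{\Gamma(1-\alpha)} \, \theta^{-(m-1)}
    \Big( 1 - \frac{m(m-1)}{2\theta} + O(\theta^{-2}) \Big) .
```

The leading term decays like $\theta^{-(m-1)}$, with the discount parameter entering only through the constant $\Gamma(m-\alpha)/\Gamma(1-\alpha)$.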

pith-pipeline@v0.9.0 · 5433 in / 1233 out tokens · 83754 ms · 2026-05-13T02:52:52.170425+00:00 · methodology
