Mathematical Characterization of Private and Public Immune Repertoire Sequences
Pith reviewed 2026-05-24 12:05 UTC · model grok-4.3
The pith
A general probabilistic model for clone abundances yields exact formulas for the mean and variance of immune receptor richness and overlap across individuals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a general probabilistic model for T/B cell receptor clone abundances to define publicness or privateness and information-theoretic measures for comparing the frequency of sampled sequences observed across different individuals, we derive mathematical formulae to quantify the mean and the variances of clone richness and overlap. Our results can be used to evaluate the effect of different sampling protocols on abundances of clones within an individual as well as the commonality of clones across individuals. Using synthetic and empirical TCR amino acid sequence data, we perform simulations to study expected clonal commonalities across multiple individuals and compare them with the analytically predicted mean and variances of the repertoire overlap.
What carries the argument
A general probabilistic model for clone abundances that permits independent sampling across individuals, allowing overlap statistics to be derived without specifying the exact VDJ recombination mechanism.
If this is right
- The formulas quantify how different sampling protocols change the observed abundances of clones within and across individuals.
- Explicit closed-form expressions for richness and its uncertainty are available when clone abundances follow a single-parameter truncated power-law distribution.
- The information loss incurred by grouping receptor sequences together, as in spectratyping, can be calculated directly from the model.
- Simulations with synthetic and real TCR data confirm that the analytical predictions match observed clonal commonalities.
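The richness statements in the list above can be illustrated with a minimal Python sketch. It is not the paper's derivation: it assumes multinomial sampling (n_cells draws with replacement) from clone probabilities p_k proportional to k^(-alpha) for k = 1..num_clones, and the function names truncated_power_law and richness_mean_var are hypothetical.

```python
import numpy as np

def truncated_power_law(num_clones: int, alpha: float) -> np.ndarray:
    """Clone probabilities p_k proportional to k**(-alpha), k = 1..num_clones.

    Assumed parametrization for illustration; the paper's may differ.
    """
    ranks = np.arange(1, num_clones + 1, dtype=float)
    weights = ranks ** (-alpha)
    return weights / weights.sum()

def richness_mean_var(p: np.ndarray, n_cells: int) -> tuple[float, float]:
    """Mean and variance of the number of distinct clones in a sample.

    Assumes multinomial sampling of n_cells cells from probabilities p.
    """
    miss = (1.0 - p) ** n_cells                      # P(clone i is absent)
    mean = float(np.sum(1.0 - miss))                 # linearity of expectation
    # For i != j, P(clones i and j both absent) = (1 - p_i - p_j)**n_cells.
    both_miss = np.clip(1.0 - p[:, None] - p[None, :], 0.0, None) ** n_cells
    cov = both_miss - miss[:, None] * miss[None, :]  # covariance of indicators
    np.fill_diagonal(cov, miss * (1.0 - miss))       # variance of each indicator
    return mean, float(cov.sum())

p = truncated_power_law(num_clones=50, alpha=1.5)
mean_r, var_r = richness_mean_var(p, n_cells=200)
print(f"expected richness {mean_r:.2f}, variance {var_r:.2f}")
```

The pairwise covariance matrix costs memory quadratic in the number of clones, so this form only suits small illustrative repertoires.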
Where Pith is reading between the lines
- The separation of abundance statistics from the generation mechanism could let researchers test whether observed public clones exceed what is expected from sampling alone.
- Variance formulas supply a direct route to statistical tests for whether sharing between patient groups differs from healthy controls.
- The same expressions could be reused to compare repertoires collected before and after vaccination once an abundance model is fitted.
Load-bearing premise
Clone abundances follow a probabilistic distribution that permits independent sampling across individuals.
What would settle it
Fit an abundance distribution to TCR data from many individuals, then check whether the measured mean and variance of sequence overlap in a new cohort fall within the predicted ranges given the fitted distribution.
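A hedged sketch of such a check, for two individuals only: it assumes both individuals share the same clone probabilities p and that each sample is an independent multinomial draw, which is a simplification and not necessarily the paper's setup. The names overlap_mean_var and simulate_overlap are hypothetical.

```python
import numpy as np

def _joint_presence(p: np.ndarray, n: int) -> np.ndarray:
    """Matrix of P(clones i and j both observed) in one multinomial sample of size n."""
    miss = (1.0 - p) ** n
    both_miss = np.clip(1.0 - p[:, None] - p[None, :], 0.0, None) ** n
    joint = 1.0 - miss[:, None] - miss[None, :] + both_miss  # inclusion-exclusion
    np.fill_diagonal(joint, 1.0 - miss)                      # i == j: P(clone i observed)
    return joint

def overlap_mean_var(p: np.ndarray, n_a: int, n_b: int) -> tuple[float, float]:
    """Predicted mean and variance of the number of clones seen in both samples."""
    joint_a = _joint_presence(p, n_a)
    joint_b = _joint_presence(p, n_b)
    mean = float(np.sum(np.diag(joint_a) * np.diag(joint_b)))
    # Independence across individuals factorizes the second moment.
    second_moment = float(np.sum(joint_a * joint_b))
    return mean, second_moment - mean ** 2

def simulate_overlap(p: np.ndarray, n_a: int, n_b: int, trials: int, seed: int = 0) -> np.ndarray:
    """Monte Carlo overlap counts under the same assumptions."""
    rng = np.random.default_rng(seed)
    seen_a = rng.multinomial(n_a, p, size=trials) > 0
    seen_b = rng.multinomial(n_b, p, size=trials) > 0
    return (seen_a & seen_b).sum(axis=1)

p = np.full(20, 0.05)  # toy repertoire: 20 equally likely clones
mean_o, var_o = overlap_mean_var(p, n_a=30, n_b=40)
sims = simulate_overlap(p, n_a=30, n_b=40, trials=5000)
print(f"predicted {mean_o:.2f} +/- {var_o ** 0.5:.2f}; simulated mean {sims.mean():.2f}")
```

With real data, p would be replaced by the fitted abundance distribution, and the measured overlap in a held-out cohort would be compared against the predicted mean and standard deviation.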
Original abstract
Diverse T and B cell repertoires play an important role in mounting effective immune responses against a wide range of pathogens and malignant cells. The number of unique T and B cell clones is characterized by T and B cell receptors (TCRs and BCRs), respectively. Although receptor sequences are generated probabilistically by recombination processes, clinical studies found a high degree of sharing of TCRs and BCRs among different individuals. In this work, we use a general probabilistic model for T/B cell receptor clone abundances to define "publicness" or "privateness" and information-theoretic measures for comparing the frequency of sampled sequences observed across different individuals. We derive mathematical formulae to quantify the mean and the variances of clone richness and overlap. Our results can be used to evaluate the effect of different sampling protocols on abundances of clones within an individual as well as the commonality of clones across individuals. Using synthetic and empirical TCR amino acid sequence data, we perform simulations to study expected clonal commonalities across multiple individuals. Based on our formulae, we compare these simulated results with the analytically predicted mean and variances of the repertoire overlap. Complementing the results on simulated repertoires, we derive explicit expressions for the richness and its uncertainty for specific, single-parameter truncated power-law probability distributions. Finally, the information loss associated with grouping together certain receptor sequences, as is done in spectratyping, is also evaluated. Our approach can be, in principle, applied under more general and mechanistically realistic clone generation models.
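The information loss from grouping sequences can be sketched as follows. This is an illustration under a stated assumption: the loss is taken to be the drop in Shannon entropy when clone probabilities are summed within groups (for example, by CDR3 length, as in spectratyping). The paper's exact measure may differ, and the function names are hypothetical.

```python
import numpy as np

def shannon_entropy(p) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def grouping_information_loss(p, group_ids) -> float:
    """Entropy of the fine-grained distribution minus entropy after grouping.

    group_ids[i] labels the group (e.g. CDR3 length) that clone i falls into.
    """
    p = np.asarray(p, dtype=float)
    _, inverse = np.unique(np.asarray(group_ids), return_inverse=True)
    grouped = np.bincount(inverse, weights=p)  # total probability per group
    return shannon_entropy(p) - shannon_entropy(grouped)

# Four equally likely clones merged into two groups of two.
print(grouping_information_loss([0.25, 0.25, 0.25, 0.25], [12, 12, 13, 13]))
```

Because grouping is a deterministic map, the grouped entropy can never exceed the fine-grained entropy, so the returned loss is non-negative.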
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a general probabilistic model for T/B cell receptor clone abundances that treats sampling across individuals as independent. It derives closed-form expressions for the mean and variance of clone richness and overlap using linearity of expectation and indicator variables, validates the formulas via Monte Carlo simulations on both synthetic draws from the model and empirical TCR frequency data, specializes the results to single-parameter truncated power-law distributions, and quantifies information loss from sequence grouping as in spectratyping.
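The indicator-variable argument mentioned in the summary can be written out as a short worked derivation. The notation here is assumed for illustration and is not taken from the paper: K clones with probabilities p_i, a multinomial sample of n cells per individual, and I_i the indicator that clone i appears at least once.

```latex
\begin{align}
R &= \sum_{i=1}^{K} I_i, \qquad
q_i \equiv \mathbb{E}[I_i] = 1 - (1 - p_i)^{n}, \\
\mathbb{E}[R] &= \sum_{i=1}^{K} q_i, \\
\operatorname{Var}[R] &= \sum_{i=1}^{K} q_i (1 - q_i)
  + \sum_{i \neq j} \left[ (1 - p_i - p_j)^{n} - (1 - p_i)^{n} (1 - p_j)^{n} \right], \\
\mathbb{E}[O] &= \sum_{i=1}^{K} q_i^{(1)} q_i^{(2)}
  \quad \text{(overlap of two independently sampled individuals).}
\end{align}
```

The covariance term follows because the probability that clones i and j are both absent from a multinomial sample is (1 - p_i - p_j)^n, and the overlap mean factorizes only because sampling is assumed independent across individuals.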
Significance. If the derivations hold, the work supplies analytical, parameter-light tools for quantifying public versus private repertoire sequences and the effects of sampling depth on observed richness and overlap. These expressions can directly inform experimental design in immunology without requiring a full mechanistic model of VDJ recombination, and the simulation checks provide a concrete falsifiability route for the formulas.
minor comments (2)
- [Model definition] The model definition paragraph states that sampling across individuals is independent, but the text does not explicitly list the precise independence assumptions used when deriving the overlap variance (e.g., whether clone abundances are drawn once per individual or re-sampled). Adding a short enumerated list of the independence statements would remove any ambiguity for readers applying the formulas.
- [Truncated power-law specialization] In the section presenting explicit expressions for the truncated power-law case, the normalization constant for the truncated distribution is left implicit; writing the closed form for the normalizing factor (even if standard) would make the richness and variance formulas fully self-contained.
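For reference, the normalizing factor requested in the second comment can be written in closed form under an assumed parametrization. The discrete form over ranks k = 1..K and the continuous form on [x_min, x_max] are both shown; neither is confirmed to be the paper's choice, and the symbols are hypothetical.

```latex
\begin{align}
p_k &= \frac{k^{-\alpha}}{Z(\alpha, K)}, \qquad
Z(\alpha, K) = \sum_{k=1}^{K} k^{-\alpha}
  = \zeta(\alpha) - \zeta(\alpha, K + 1) \quad (\alpha > 1), \\
f(x) &= \frac{x^{-\alpha}}{Z_c}, \qquad
Z_c = \int_{x_{\min}}^{x_{\max}} x^{-\alpha} \, dx
  = \frac{x_{\max}^{1-\alpha} - x_{\min}^{1-\alpha}}{1 - \alpha} \quad (\alpha \neq 1).
\end{align}
```

Here \zeta(\alpha) is the Riemann zeta function and \zeta(\alpha, K+1) the Hurwitz zeta function; the finite sum itself is valid for any \alpha.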
Simulated Author's Rebuttal
We thank the referee for their positive summary, significance assessment, and recommendation for minor revision. No specific major comments were provided in the report, so we have no individual points requiring point-by-point rebuttal or clarification. The manuscript stands as submitted, and we are prepared to address any minor editorial suggestions that may arise.
Circularity Check
No significant circularity
The central derivations apply linearity of expectation and variance formulas for indicator variables to an explicitly stated probabilistic model of clone abundances that treats sampling across individuals as independent. The resulting closed-form expressions for the mean and variance of richness and overlap are compared against independent Monte Carlo simulations on synthetic draws and empirical frequencies. None of the reported quantities reduces by construction to fitted parameters or to self-citation chains. The model is defined without embedding the target statistics, and the explicit expressions for specific truncated power-law distributions are likewise free of circularity.
Axiom & Free-Parameter Ledger
free parameters (1)
- parameter of the truncated power-law distribution
axioms (2)
- domain assumption: T/B cell receptor clone abundances follow a general probabilistic model that can be sampled across individuals
- standard math: Standard rules of probability for computing expectations and variances apply to the clone abundance distribution