Estimating Clonality
Challenges of assessing complexity and clonality in populations of mixed species arise in diverse areas of modern biology, including estimating diversity and clonality in microbiome populations, measuring patterns of T and B cell clonality, and determining the underlying tumor cell population structure in cancer. Here we address the problem of quantifying such populations, with our analysis directed toward systems for which previously defined algorithms allow the sequence-based identification of clonal subpopulations. Data come from replicate sequencing libraries generated from a single sample, potentially with very different depths. While certain properties of the underlying clonal distribution (most notably the total number of clones) are difficult to estimate accurately from data representing a small fraction of the total population, the population-level "clonality" metric, defined as the sum of squared probabilities of the respective species, can still be calculated. (This is the sum of squared entries of a high-dimensional vector $p$ of relative frequencies, $\sum_i p_i^2$.) The clonality score is the probability of a clonal relationship between two randomly chosen members of the population of interest. A principal takeaway message is that knowing a functional of $p$ well may not depend on knowing $p$ itself very well. Our work has led to software, which we call lymphclon; it has been deposited in CRAN.
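The clonality definition in the abstract can be illustrated with a minimal sketch. The code below is not the lymphclon implementation and does not reproduce the paper's estimator; the function names `clonality` and `cross_replicate_clonality` are hypothetical. The first function evaluates the sum of squared frequencies for a known vector $p$. The second is a simple cross-replicate estimate, under the assumption that two replicate libraries are independent multinomial samples from the same population: in that case the chance that one read drawn from each replicate belongs to the same clone has expectation equal to the sum of squared frequencies, even when the two depths differ.

```python
def clonality(p):
    """Sum of squared relative frequencies for a known frequency vector p."""
    return sum(x * x for x in p)


def cross_replicate_clonality(counts_a, counts_b):
    """Estimate clonality from two replicate libraries (illustrative only).

    counts_a, counts_b: dicts mapping a clone identifier to its read count
    in each replicate. Assumes the replicates are independent multinomial
    samples from the same population (an assumption of this sketch).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each replicate needs at least one read")
    # Only clones observed in both replicates contribute to the sum.
    shared = counts_a.keys() & counts_b.keys()
    coincidences = sum(counts_a[c] * counts_b[c] for c in shared)
    # Fraction of cross-replicate read pairs that hit the same clone.
    return coincidences / (total_a * total_b)


if __name__ == "__main__":
    # Hypothetical toy data: clone identifiers and counts are made up.
    replicate_a = {"clone1": 3, "clone2": 1}
    replicate_b = {"clone1": 2, "clone3": 2}
    print(clonality([0.5, 0.3, 0.2]))
    print(cross_replicate_clonality(replicate_a, replicate_b))
```

Pairing reads across replicates rather than within one replicate avoids counting a read against itself, which is one reason replicate designs are convenient for this functional; how best to combine many replicates of unequal depth is the subject of the paper and is not addressed by this sketch.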
Forward citations
Cited by 1 Pith paper
- Reddit's Globalization over Twenty Years: Inferring Community Time Zone from Activity Timestamps
A 4 a.m. activity minimum heuristic infers community time zones from timestamps with sub-hour accuracy on Reddit, enabling scalable analysis of the platform's globalization without user location data.