Estimating Clonality

Yi Liu, Andrew Z. Fire, Scott Boyd, Richard A. Olshen

classification: stat.ME, q-bio.PE

keywords: clonality, clonal, population, populations, cell, data, estimating, knowing

Challenges of assessing complexity and clonality in populations of mixed species arise in diverse areas of modern biology, including estimating diversity and clonality in microbiome populations, measuring patterns of T and B cell clonality, and determining the underlying tumor cell population structure in cancer. Here we address the problem of quantifying populations, with our analysis directed toward systems for which previously defined algorithms allow the sequence-based identification of clonal subpopulations. Data come from replicate sequencing libraries generated from a sample, potentially with very different depths. While certain properties of the underlying clonal distribution (most notably the total number of clones) are difficult to estimate accurately from data representing a small fraction of the total population, the population-level "clonality" metric that is the sum of squared probabilities of the respective species can be calculated. (This is the sum of squared entries of a high-dimensional vector $p$ of relative frequencies.) The clonality score is the probability of a clonal relationship between two randomly chosen members of the population of interest. A principal takeaway message is that knowing a functional of $p$ well may not depend on knowing $p$ itself very well. Our work has led to software, which we call lymphclon; it has been deposited in the CRAN library.
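
The clonality definition in the abstract can be illustrated with a minimal Python sketch. It is not the lymphclon implementation and does not reproduce the paper's estimator; the function names and the toy counts are hypothetical. It assumes each replicate is an independent multinomial sample from the same population, in which case the product of counts for a clone across two replicates, divided by the product of the two library sizes, is an unbiased estimate of the sum of squared frequencies. Averaging all replicate pairs with equal weight is a simplification that ignores differences in sequencing depth.

```python
from itertools import combinations


def clonality(p):
    """Clonality of a known frequency vector p: the sum of squared entries."""
    return sum(x * x for x in p)


def pairwise_estimate(counts_a, counts_b):
    """Estimate clonality from two replicates given as {clone_id: read_count}.

    Assumption (not taken from the paper): the replicates are independent
    multinomial samples, so E[a_i * b_i] = N_a * N_b * p_i**2.
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    shared = counts_a.keys() & counts_b.keys()  # clones seen in both replicates
    return sum(counts_a[k] * counts_b[k] for k in shared) / (n_a * n_b)


def replicate_estimate(replicates):
    """Unweighted mean of the pairwise estimates over all replicate pairs."""
    pairs = list(combinations(replicates, 2))
    return sum(pairwise_estimate(a, b) for a, b in pairs) / len(pairs)


# Toy data: three replicates with different clone compositions.
toy_replicates = [
    {"A": 2, "B": 2},
    {"A": 1, "C": 3},
    {"A": 4},
]
toy_score = replicate_estimate(toy_replicates)
```

Only clones observed in at least two replicates contribute to the estimate, which is consistent with the abstract's point that a functional of $p$ can be estimated without estimating $p$ itself well.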

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reddit's Globalization over Twenty Years: Inferring Community Time Zone from Activity Timestamps
cs.SI 2026-05 unverdicted novelty 6.0

A 4 a.m. activity minimum heuristic infers community time zones from timestamps with sub-hour accuracy on Reddit, enabling scalable analysis of the platform's globalization without user location data.