pith. machine review for the scientific record.

arxiv: 2605.03498 · v1 · submitted 2026-05-05 · 🧬 q-bio.PE

Connecting IBD tracts and runs of homozygosity: A coalescent framework for inferring effective population size

Enrique Santiago

Pith reviewed 2026-05-08 18:37 UTC · model grok-4.3

classification 🧬 q-bio.PE
keywords identity by descent · runs of homozygosity · coalescent · effective population size · background selection · autozygosity · marker density · lactase persistence

The pith

A coalescent framework unifies IBD tracts with runs of homozygosity by deriving closed-form length distributions for inferring effective population size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a coalescent framework that starts from the Wright-Fisher model to derive closed-form probability density functions for the lengths of identity-by-descent tracts. It extends these distributions to the observable lengths of runs of homozygosity by modeling the displacement from true recombination breakpoints to the nearest heterozygous marker. This unification incorporates mutation, gene conversion, and marker density effects, allowing demographic information from both IBD and ROH data to be combined for inferring effective population size. The framework further incorporates background selection to show that selection creates a systematic upward bias in tract lengths, so that no single effective population size value can explain the full length distribution.

Core claim

Starting from a Wright-Fisher model, closed-form probability density functions are derived for IBD tract lengths and extended to the observable distribution of ROH lengths by explicitly modelling the displacement of ROH limits from true recombination breakpoints to the nearest heterozygous marker site. Mutation, gene conversion, finite marker density, and variable marker heterozygosity are incorporated as parameters. The chromosome segment homozygosity statistic emerges as a special case. This enables demographic information from IBD tracts and ROHs to be combined into a framework for inferring effective population size. Incorporating the quantitative genetic theory of background selection into the IBD length distribution shows that selection introduces a systematic upward bias in apparent tract lengths.

What carries the argument

The closed-form probability density functions for IBD tract lengths, extended to ROH lengths by modeling displacement to the nearest heterozygous marker site and incorporating background selection.
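As a rough illustration of this two-step construction, the sketch below samples IBD tract lengths around a focal site and then pads each tract out to the nearest heterozygous marker. It is not the paper's derivation. It assumes a constant-size diploid Wright-Fisher population, recombination as the only process that ends a tract (rate 2t per Morgan on each side after t generations), and heterozygous markers beyond each breakpoint laid down as a Poisson process with a hypothetical density `het_density` per Morgan. Mutation, gene conversion, and variable heterozygosity are left out, and the function and parameter names are invented for the example.

```python
import numpy as np

def sample_ibd_and_roh(n_pairs, pop_size, het_density, rng):
    """Return (t, ibd, roh): coalescence times and tract lengths in Morgans."""
    # Coalescence time in generations: geometric with probability 1/(2N) per generation.
    t = rng.geometric(1.0 / (2.0 * pop_size), size=n_pairs)
    # 2t meioses separate the two copies, so the distance from the focal site
    # to the nearest breakpoint on each side is exponential with rate 2t.
    scale = 1.0 / (2.0 * t)
    ibd = rng.exponential(scale) + rng.exponential(scale)
    # Assumed marker model: the gap between a breakpoint and the next
    # heterozygous marker is exponential with rate het_density, on each side.
    overshoot = rng.exponential(1.0 / het_density, size=(n_pairs, 2)).sum(axis=1)
    return t, ibd, ibd + overshoot

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t, ibd, roh = sample_ibd_and_roh(100_000, pop_size=1000, het_density=500.0, rng=rng)
    print("mean IBD length (cM):", 100 * ibd.mean())
    print("mean ROH length (cM):", 100 * roh.mean())
```

Under these assumptions every observed ROH is at least as long as its underlying IBD tract, and the average padding is `2 / het_density` Morgans, so sparse or weakly heterozygous marker panels inflate short tracts proportionally more than long ones.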

If this is right

  • Demographic information from IBD tracts and ROH can be combined for joint inference of effective population size.
  • The model allows detection of selection signatures by identifying systematic deviations from expected length distributions, as shown for the lactase persistence locus.
  • The chromosome segment homozygosity statistic is recovered as a limiting case of the unified length distribution framework.
  • Selection produces an upward bias in apparent tract lengths that cannot be captured by any single effective population size parameter.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same approach could be applied to other genomic regions with known selection to quantify consistent biases in demographic estimates.
  • Combining IBD and ROH data may improve accuracy when reconstructing population history in the presence of heterogeneous selection pressures.
  • The framework could be extended to time-varying population sizes or additional demographic events while retaining the closed-form length expressions.

Load-bearing premise

The derivation starts from a Wright-Fisher population model and accounts for the displacement of observed ROH limits to the nearest heterozygous marker site.

What would settle it

Observing IBD or ROH length distributions across a genome region under known background selection that can be fully explained by one constant effective population size value would contradict the predicted upward bias.
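One way to picture such a test is to ask, length by length, which constant population size would reproduce the observed cumulative distribution. The sketch below does this for a toy case and is only an illustration under stated assumptions: the closed form `(4Nx / (1 + 4Nx))**2` is the CDF obtained when the coalescence time is exponential with mean 2N generations and a tract spanning the focal site is the sum of two exponential flanks with rate 2t; it is not a formula quoted from the paper. Heterogeneity is represented by a hypothetical 50/50 mixture of genomic positions with local sizes 500 and 2000, which stands in for, but is not, the paper's background selection model.

```python
import numpy as np

def tract_cdf(x, pop_size):
    """P(length <= x Morgans) for a tract spanning a focal site, constant N."""
    z = 4.0 * pop_size * np.asarray(x, dtype=float)
    return (z / (1.0 + z)) ** 2

def implied_ne(x, cdf_value):
    """Invert tract_cdf at length x: the constant N that gives cdf_value there."""
    s = np.sqrt(cdf_value)
    return s / (4.0 * np.asarray(x, dtype=float) * (1.0 - s))

lengths = np.array([1e-4, 1e-3, 1e-2, 1e-1])  # tract lengths in Morgans

# Homogeneous genome: one N everywhere.
single = tract_cdf(lengths, 1000)
# Toy heterogeneous genome: half the positions at N=500, half at N=2000.
mixed = 0.5 * tract_cdf(lengths, 500) + 0.5 * tract_cdf(lengths, 2000)

ne_single = implied_ne(lengths, single)
ne_mixed = implied_ne(lengths, mixed)

print("single N :", np.round(ne_single, 1))
print("mixture  :", np.round(ne_mixed, 1))
```

In this toy case the single-N row is flat at 1000, while the mixture row falls as tracts get longer, so a value fitted to short tracts would underpredict the frequency of long ones. A flat row in real data from a region under known background selection is the kind of observation that would count against the predicted bias.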

Original abstract

Identity by descent (IBD) tracts and runs of homozygosity (ROH) are related concepts that refer to the autozygosity in chromosome segments. However the formal relationship between their length distributions remains to be established. Here we present a coalescent framework that unifies these two concepts within a single analytical development. Starting from a Wright-Fisher model, we derive closed-form probability density functions for IBD tract lengths and extend these to the observable distribution of ROH lengths. This is achieved by explicitly modelling the displacement of ROH limits from true recombination breakpoints to the nearest heterozygous marker site. Mutation, gene conversion, finite marker density, and variable marker heterozygosity are incorporated as parameters in the theory that link IBD tracts to ROH. We show that the chromosome segment homozygosity (CSH) statistic emerges as a special case. This enables demographic information from IBD tracts and ROHs to be combined into a framework for inferring effective population size. Finally, we incorporate the quantitative genetic theory of background selection into the IBD length distribution, to show how selection introduces a systematic upward bias in apparent tract lengths. This demonstrates that no single Ne value can account for the entire IBD length distribution under selection. The application of this theory to the detection of selection signatures in the genome is illustrated using the example of the local selective sweep associated with lactase persistence in human populations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a coalescent framework starting from the Wright-Fisher model to derive closed-form probability density functions for IBD tract lengths, extends these analytically to the observable distribution of ROH lengths by modeling the displacement of tract limits to the nearest heterozygous marker, incorporates parameters for mutation, gene conversion, finite marker density and variable heterozygosity, identifies the chromosome segment homozygosity statistic as a special case, and supplies a unified approach for inferring effective population size. It further folds quantitative-genetic background selection into the IBD length distribution to demonstrate a systematic upward bias in apparent tract lengths, concluding that no single Ne can describe the full distribution under selection, and illustrates the approach with the lactase-persistence selective sweep in humans.

Significance. If the closed-form derivations are free of gaps and the incorporation of background selection is performed by re-deriving the tract-length density from position-specific coalescence rates rather than by global rescaling, the work would supply a parameter-light analytical bridge between IBD and ROH data that strengthens demographic inference and selection-signature detection in population genetics. The explicit treatment of marker effects and the unification with CSH are concrete strengths that could be directly useful for empirical studies.

major comments (2)
  1. [Background selection section] The central demonstration that background selection produces an upward bias precluding any single-Ne neutral model (abstract and the section on selection) rests on folding quantitative-genetic background selection into the previously derived neutral IBD length PDF. It is unclear from the provided description whether this is achieved by rescaling a global Ne or recombination rate within the neutral closed-form expression or by constructing a new position-dependent coalescence-rate density; the former would not automatically preserve the claimed systematic shift and could render the “no single Ne” conclusion artifactual.
  2. [Methods / Results on ROH extension] The soundness assessment notes that the abstract asserts closed-form derivations and explicit marker-effect modeling, yet no numerical checks or simulation validations against the analytic PDFs are referenced. Without such checks (e.g., forward simulations matching the derived ROH length distribution under known Ne and marker density), it is impossible to confirm absence of derivation gaps or post-hoc adjustments in the extension from IBD to observable ROH.
minor comments (2)
  1. [Abstract] The abstract states that mutation, gene conversion, marker density and heterozygosity are incorporated as parameters, but the manuscript should explicitly list which of these remain free versus fixed when the Ne-inference framework is applied.
  2. [CSH unification paragraph] The claim that CSH emerges as a special case would benefit from a short derivation or limiting-case statement showing how the general ROH PDF reduces to the CSH expression.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify key aspects of our framework. We address each major comment point by point below, indicating the revisions we will implement.

Point-by-point responses
  1. Referee: [Background selection section] The central demonstration that background selection produces an upward bias precluding any single-Ne neutral model (abstract and the section on selection) rests on folding quantitative-genetic background selection into the previously derived neutral IBD length PDF. It is unclear from the provided description whether this is achieved by rescaling a global Ne or recombination rate within the neutral closed-form expression or by constructing a new position-dependent coalescence-rate density; the former would not automatically preserve the claimed systematic shift and could render the “no single Ne” conclusion artifactual.

    Authors: We thank the referee for identifying this potential ambiguity. In the manuscript, background selection is incorporated by using the quantitative-genetic model to obtain position-specific coalescence rates (reflecting local reductions in effective population size due to linked selection), which are then integrated into the derivation of the IBD tract length density. This is not a global rescaling of Ne or recombination rate but a position-dependent adjustment to the coalescent process. To eliminate any uncertainty, we will revise the selection section to include the explicit mathematical formulation: the position-dependent coalescence probability is substituted into the integral expression for the tract length PDF, preserving the heterogeneous rate structure that produces the upward bias in apparent lengths. This will also reinforce why no single neutral Ne can fit the full distribution.
    revision: yes

  2. Referee: [Methods / Results on ROH extension] The soundness assessment notes that the abstract asserts closed-form derivations and explicit marker-effect modeling, yet no numerical checks or simulation validations against the analytic PDFs are referenced. Without such checks (e.g., forward simulations matching the derived ROH length distribution under known Ne and marker density), it is impossible to confirm absence of derivation gaps or post-hoc adjustments in the extension from IBD to observable ROH.

    Authors: We agree that direct numerical validation strengthens confidence in the analytic results. Although the ROH extension follows rigorously from modeling the displacement of tract endpoints to the nearest heterozygous marker (with parameters for mutation, gene conversion, and variable heterozygosity), we will add explicit simulation checks in the revised manuscript. This will include forward simulations under known Ne, marker densities, and heterozygosity levels, with direct comparisons of simulated ROH length histograms to the closed-form PDFs, confirming agreement and the absence of gaps or adjustments in the IBD-to-ROH mapping.
    revision: yes
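A check of the kind discussed in these responses can be sketched in a few lines. The example below is a simplified stand-in, not the validation the authors describe: it samples directly from a continuous-time coalescent rather than running forward simulations, covers only neutral IBD tracts spanning a focal site (no marker displacement, mutation, or gene conversion), and compares against the CDF `(4Nx / (1 + 4Nx))**2`, which follows from those assumptions and is not taken from the paper. All function names are hypothetical.

```python
import numpy as np

def analytic_cdf(x, pop_size):
    """Assumed closed-form CDF of tract length (Morgans) under constant N."""
    z = 4.0 * pop_size * np.asarray(x, dtype=float)
    return (z / (1.0 + z)) ** 2

def simulate_tracts(n, pop_size, rng):
    """Draw n tract lengths: exponential coalescence time, two Exp(2t) flanks."""
    t = rng.exponential(2.0 * pop_size, size=n)          # mean 2N generations
    return rng.gamma(shape=2.0, scale=1.0 / (2.0 * t))   # sum of two Exp(rate=2t)

def ks_distance(sample, pop_size):
    """Largest gap between the empirical CDF and analytic_cdf."""
    x = np.sort(sample)
    n = x.size
    f = analytic_cdf(x, pop_size)
    upper = np.arange(1, n + 1) / n - f
    lower = f - np.arange(0, n) / n
    return max(upper.max(), lower.max())

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    tracts = simulate_tracts(50_000, pop_size=1000, rng=rng)
    print("distance at true N    :", ks_distance(tracts, 1000))
    print("distance at N = 2000  :", ks_distance(tracts, 2000))
```

The distance at the true N should sit near the sampling noise level, while a mismatched N gives a clearly larger gap. The same comparison could be repeated with marker displacement switched on once the corresponding analytic ROH density is available.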

Circularity Check

0 steps flagged

Derivation from Wright-Fisher model is analytically self-contained with no circular reductions.

Full rationale

The paper begins from the standard neutral Wright-Fisher coalescent and derives closed-form PDFs for IBD tract lengths, then extends them to observable ROH lengths via explicit modeling of marker displacement, mutation, gene conversion, and finite density. The chromosome segment homozygosity statistic is shown to emerge as a special case, and background selection is folded in to demonstrate that no constant Ne fits the full distribution. No step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain, or an ansatz smuggled via prior work; all load-bearing relations follow directly from the initial WF assumptions and the stated extensions. The framework therefore remains independent of its target quantities.

Axiom & Free-Parameter Ledger

3 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard Wright-Fisher coalescent as the starting point, plus explicit modeling of marker displacement, mutation, and gene conversion as linking parameters; no new entities are introduced.

free parameters (3)
  • mutation rate
    Incorporated as a parameter that links IBD tracts to observable ROH.
  • gene conversion rate
    Incorporated as a parameter affecting tract lengths.
  • marker density and heterozygosity
    Used to model displacement of ROH limits from true breakpoints.
axioms (1)
  • domain assumption: Wright-Fisher model (constant population size, random mating, no structure)
    Explicitly stated as the starting point for the coalescent derivations.
