Bayesian Credible Sets for Phylogenetic Tree Topologies with Applications to Coverage Analysis and Cross-Model Comparison

Alexei J. Drummond; Jonathan Klawitter

arXiv: 2505.14532 · v2 · submitted 2025-05-20 · 💻 cs.DS · q-bio.PE

Pith reviewed 2026-05-22 14:04 UTC · model grok-4.3

keywords Bayesian phylogenetics · credible sets · tree topologies · Conditional Clade Distribution · posterior approximation · coverage analysis · model calibration

The pith

New methods using Conditional Clade Distributions compute credible levels for any phylogenetic tree topology.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops efficient ways to find credible sets for tree topologies in Bayesian phylogenetics, where the space of possible trees is enormous and samples often contain mostly unique trees. Standard frequency counts from MCMC samples break down for diffuse posteriors, so the authors turn to tractable approximations called Conditional Clade Distributions. They define an alpha credible CCD as the smallest collection of trees whose probabilities sum to a target alpha, and supply algorithms that compute the credible level of any given tree or subtree. These tools also let researchers run calibration checks and compare how well different models recover true trees in simulations.
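To give a sense of how enormous the space of tree topologies is, the following minimal sketch counts rooted binary topologies on n labelled taxa using the standard double-factorial formula (2n - 3)!!. The function name is hypothetical and the snippet is an illustration, not something taken from the paper.

```python
def num_rooted_binary_topologies(n_taxa: int) -> int:
    """Count rooted binary tree topologies on n labelled taxa: (2n - 3)!!."""
    if n_taxa < 1:
        raise ValueError("need at least one taxon")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


for n in (4, 10, 12, 50):
    print(n, num_rooted_binary_topologies(n))
```

With only a few dozen taxa the count already dwarfs any feasible MCMC sample size, which is why a diffuse posterior tends to yield samples in which most trees appear once.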

Core claim

Credible levels of individual tree topologies and subtrees can be estimated directly from a Conditional Clade Distribution without relying on raw sample frequencies, and an alpha credible CCD is the set of highest-probability trees that together carry exactly alpha posterior probability mass.
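The two notions in this claim can be illustrated on a tiny, fully enumerated distribution. The sketch below is a brute-force illustration only: it assumes the topology probabilities are available as an explicit dictionary, whereas the paper's algorithms work on a CCD without enumerating trees. The function names, the tie-handling convention (trees of equal probability count toward the level), and the "mass at least alpha" stopping rule are assumptions made for this example.

```python
from typing import Dict, List


def credible_level(probs: Dict[str, float], tree: str) -> float:
    """Total probability of all trees at least as probable as `tree`."""
    p = probs[tree]
    return sum(q for q in probs.values() if q >= p)


def credible_set(probs: Dict[str, float], alpha: float) -> List[str]:
    """Highest-probability trees, added in descending order until mass >= alpha."""
    chosen: List[str] = []
    mass = 0.0
    for tree, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        if mass >= alpha:
            break
        chosen.append(tree)
        mass += p
    return chosen


# Toy posterior over the three rooted topologies on taxa A, B, C.
toy = {"((A,B),C)": 0.5, "((A,C),B)": 0.3, "((B,C),A)": 0.2}
print(credible_level(toy, "((A,C),B)"))
print(credible_set(toy, 0.7))
```

Under these conventions a tree lies in the alpha credible set roughly when its credible level does not exceed alpha, which is what makes the per-tree level a convenient summary.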

What carries the argument

Conditional Clade Distribution (CCD), a factorized probability model over tree topologies built from conditional clade probabilities that remains computationally tractable.
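A minimal sketch of such a factorization follows, assuming one common parameterization in which each clade's split into two child clades is estimated by its relative frequency among sampled trees, and a tree's probability is the product of these conditional split probabilities. Trees are written as nested tuples of taxon names; all function names are hypothetical and this is not the paper's implementation.

```python
from collections import Counter


def clade_splits(tree):
    """Return (clade, [(parent clade, split), ...]) for a nested-tuple binary tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    left, right = tree
    left_clade, left_splits = clade_splits(left)
    right_clade, right_splits = clade_splits(right)
    parent = left_clade | right_clade
    split = frozenset([left_clade, right_clade])
    return parent, left_splits + right_splits + [(parent, split)]


def fit_ccd(sample):
    """Count how often each clade, and each split of a clade, occurs in the sample."""
    clade_counts, split_counts = Counter(), Counter()
    for tree in sample:
        for parent, split in clade_splits(tree)[1]:
            clade_counts[parent] += 1
            split_counts[(parent, split)] += 1
    return clade_counts, split_counts


def ccd_probability(tree, clade_counts, split_counts):
    """Product of conditional split probabilities; 0.0 if any split was never seen."""
    prob = 1.0
    for parent, split in clade_splits(tree)[1]:
        seen = split_counts[(parent, split)]
        if seen == 0:
            return 0.0
        prob *= seen / clade_counts[parent]
    return prob


t1 = ((("A", "B"), "C"), (("D", "E"), "F"))
t2 = ((("A", "C"), "B"), (("D", "F"), "E"))
mixed = ((("A", "B"), "C"), (("D", "F"), "E"))  # never sampled
counts = fit_ccd([t1, t2])
print(ccd_probability(t1, *counts), ccd_probability(mixed, *counts))
```

In this toy sample each of the two trees has sample frequency 0.5, yet the factorized model spreads the mass over four trees, including the unsampled recombination `mixed`. That is the property that lets a CCD assign sensible probabilities when nearly every sampled tree is unique.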

If this is right

Any sampled tree topology can be assigned a numeric credible level in linear time relative to the number of clades.
Credible sets can be constructed for subtrees, allowing focused uncertainty statements on particular clades.
Rank-uniformity checks become possible by plotting the empirical distribution of credible levels against the uniform distribution.
Different CCD parameterizations can be ranked by how well their credible sets achieve nominal coverage in repeated simulations.
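The rank-uniformity check in the list above can be sketched as follows, assuming one already has the credible level of the true tree from each of many simulation replicates. If the credible levels are calibrated, the fraction of replicates with level at most a should be close to a (up to the discreteness of tree space). The function names and the simple maximum-deviation summary are illustrative choices, not the paper's procedure.

```python
from typing import List, Sequence


def ecdf(levels: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Fraction of credible levels that are <= each grid value."""
    n = len(levels)
    return [sum(1 for x in levels if x <= a) / n for a in grid]


def max_deviation_from_uniform(levels: Sequence[float], grid: Sequence[float]) -> float:
    """Largest gap between the ECDF and the diagonal over the grid."""
    return max(abs(f - a) for f, a in zip(ecdf(levels, grid), grid))


grid = [0.25, 0.5, 0.75, 1.0]
well_spread = [i / 10 for i in range(1, 11)]  # levels spread evenly over (0, 1]
overconfident = [0.9, 0.95, 0.99, 1.0]        # true trees always far out in the tail

print(ecdf(well_spread, grid), max_deviation_from_uniform(well_spread, grid))
print(ecdf(overconfident, grid), max_deviation_from_uniform(overconfident, grid))
```

Plotting the ECDF values against the grid gives the kind of ECDF plot the abstract mentions; a curve well below the diagonal suggests the model concentrates mass on the wrong trees.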

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same machinery could supply topology-aware diagnostics when comparing non-nested phylogenetic models.
Reporting a credible CCD alongside the maximum a posteriori tree would give readers a direct sense of remaining topological uncertainty.
Extensions to time-calibrated trees might allow credible sets on divergence times conditional on topology.

Load-bearing premise

The Conditional Clade Distribution must approximate the true posterior over tree topologies closely enough that its probability assignments remain meaningful for credible-set construction.

What would settle it

On simulated data where the true posterior over topologies is known exactly, the estimated credible levels for topologies would fail to match their nominal coverage if the approximation is too coarse.
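A toy version of this test can be computed exactly when both distributions are small enough to write down. The sketch below builds an alpha credible set from an approximating distribution q and measures how much mass the true distribution p places on it; the names `p`, `q_coarse`, `credible_set`, and `true_coverage` are hypothetical, and the numbers are invented for illustration rather than drawn from the paper.

```python
from typing import Dict, List


def credible_set(q: Dict[str, float], alpha: float) -> List[str]:
    """Highest-probability trees under q, added until their q-mass reaches alpha."""
    chosen: List[str] = []
    mass = 0.0
    for tree, prob in sorted(q.items(), key=lambda kv: -kv[1]):
        if mass >= alpha:
            break
        chosen.append(tree)
        mass += prob
    return chosen


def true_coverage(p: Dict[str, float], q: Dict[str, float], alpha: float) -> float:
    """Mass that the true distribution p assigns to q's alpha credible set."""
    return sum(p[tree] for tree in credible_set(q, alpha))


p = {"T1": 0.5, "T2": 0.3, "T3": 0.2}         # assumed true posterior
q_coarse = {"T1": 0.2, "T2": 0.3, "T3": 0.5}  # approximation that mis-ranks trees

print(true_coverage(p, p, 0.5))         # set built from the true posterior
print(true_coverage(p, q_coarse, 0.5))  # set built from the coarse approximation
```

When q equals p the set carries at least the nominal mass; when q mis-ranks the trees, the nominal 50% set captures far less of the true posterior, which is the failure mode the test is designed to expose.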

Original abstract

Credible intervals and credible sets, such as highest posterior density (HPD) intervals, form an integral statistical tool in Bayesian phylogenetics, both for phylogenetic analyses and for development. Readily available for continuous parameters such as base frequencies and clock rates, the vast and complex space of tree topologies poses significant challenges for defining analogous credible sets. Traditional frequency-based approaches are inadequate for diffuse posteriors where sampled trees are often unique. To address this, we introduce novel and efficient methods for estimating the credible level of individual tree topologies using tractable tree distributions, specifically Conditional Clade Distribution (CCD). Furthermore, we propose a new concept called $\alpha$ credible CCD, which encapsulates a CCD whose trees collectively make up $\alpha$ probability. We present algorithms to compute these credible CCDs efficiently and to determine credible levels of tree topologies as well as of subtrees. We evaluate the accuracy of these credible set methods leveraging simulated and real datasets. Furthermore, to demonstrate the utility of our methods, we use well-calibrated simulation studies to evaluate the performance of different CCD models. In particular, we show how the credible set methods can be used to conduct rank-uniformity validation and produce Empirical Cumulative Distribution Function (ECDF) plots, supplementing standard coverage analyses for continuous parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces α-credible CCDs as a way to build credible sets over tree topologies when the posterior is too diffuse for frequency counts, and it gives algorithms plus rank-uniformity checks that look workable.

The core contribution is a set of methods that treat Conditional Clade Distributions as a tractable proxy for the posterior over topologies. From there they define α-credible CCDs and give ways to compute the credible level of any single topology or subtree. That directly tackles the problem the abstract flags: when MCMC samples are mostly unique, simple frequency-based credible sets break down. The algorithms for extracting these sets and the use of ECDF plots for rank-uniformity validation are the concrete new pieces, and they seem efficient enough to be practical. The evaluation on both simulated and real data is a plus, and showing how the same machinery can supplement standard coverage analysis for continuous parameters is a reasonable extension.

Credit where due: the work builds on existing CCD machinery without pretending to reinvent it, and the focus on subtrees as well as full topologies adds some extra utility.

The main soft spot is the validation strategy. The checks are internal to the CCD family and the reported accuracy and uniformity diagnostics stay within that model. We do not yet see direct head-to-head comparisons against an independently obtained ground-truth posterior, such as exhaustive enumeration on small taxon sets or very long MCMC runs with high effective sample sizes. If higher-order clade dependencies that the CCD factorization misses are common in diffuse regimes, the credible levels could be mis-calibrated even while the internal ECDFs look good. That is not a fatal flaw, but it is the part that needs the clearest evidence in a revision.

This paper is aimed at Bayesian phylogeneticists who already work with CCDs or similar approximations and need better tools for summarizing tree posteriors or comparing models. A reader who cares about coverage diagnostics or credible-set construction in tree space will find usable methods here. It is worth sending to a serious referee because the problem is genuine, the proposed fix is concrete, and the authors have done enough empirical work to make discussion productive.

Referee Report

2 major / 2 minor

Summary. The manuscript addresses challenges in defining Bayesian credible sets for phylogenetic tree topologies, where traditional frequency-based methods fail in diffuse posteriors with many unique sampled trees. It introduces efficient methods to estimate credible levels of individual topologies via tractable Conditional Clade Distributions (CCD), proposes the new concept of an α-credible CCD (a CCD whose trees sum to α probability mass), develops algorithms for computing these sets and per-topology/subtree credible levels, evaluates accuracy on simulated and real datasets, and demonstrates utility for rank-uniformity validation and ECDF plots to supplement coverage analyses and enable cross-model comparison of CCD variants.

Significance. If the CCD approximation holds, the work offers a practical advance for quantifying uncertainty over tree topologies in Bayesian phylogenetics, particularly for large or diffuse posteriors. The algorithmic focus on efficient computation and the application to model validation (via rank-uniformity) are strengths; the latter provides a falsifiable check on CCD assumptions that could improve downstream phylogenetic analyses. Reproducible simulation studies and ECDF diagnostics add value for the field.

major comments (2)

[Simulation studies and results sections] The accuracy evaluation and rank-uniformity validation (described in the simulation studies and results sections) are performed entirely within the CCD model family, using CCD-derived probabilities both to generate data and to compute credible levels/ECDFs. This does not constitute an external check against an independent ground-truth posterior (e.g., exhaustive enumeration for ≤12 taxa or converged MCMC with ESS >10^5 unique topologies). If unmodeled higher-order clade dependencies cause CCD to mis-rank or mis-assign mass to topologies, the reported credible levels and coverage diagnostics will be mis-calibrated even if ECDFs appear uniform internally.
[Methods (definition of α-credible CCD and credible level estimation)] The central construction of α-credible CCD sets and per-topology credible levels (introduced after the abstract and formalized in the methods) treats CCD marginal probabilities as stand-ins for true posterior probabilities. The manuscript reports good performance on simulated/real data but provides no direct calibration test in diffuse regimes where the CCD factorization assumption is most likely to break; this is load-bearing for the claim that the methods support meaningful credible sets and cross-model comparison.

minor comments (2)

[Abstract and introduction] Clarify in the abstract and introduction whether the CCD models used for credible-set estimation are the same as those being validated in the cross-model comparison, to avoid any appearance of circularity in the experimental design.
[Algorithms section] Add a brief discussion of computational complexity or runtime scaling for the proposed algorithms on larger taxon sets, as this is relevant for the cs.DS audience.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the scope and limitations of our proposed methods for computing credible sets and levels using Conditional Clade Distributions. We address each major comment below and indicate where revisions will be made to improve clarity and acknowledge assumptions.

Referee: [Simulation studies and results sections] The accuracy evaluation and rank-uniformity validation (described in the simulation studies and results sections) are performed entirely within the CCD model family, using CCD-derived probabilities both to generate data and to compute credible levels/ECDFs. This does not constitute an external check against an independent ground-truth posterior (e.g., exhaustive enumeration for ≤12 taxa or converged MCMC with ESS >10^5 unique topologies). If unmodeled higher-order clade dependencies cause CCD to mis-rank or mis-assign mass to topologies, the reported credible levels and coverage diagnostics will be mis-calibrated even if ECDFs appear uniform internally.

Authors: We appreciate this point regarding the internal nature of the validation. The simulation framework is deliberately constructed within the CCD family to isolate and test the rank-uniformity property as a diagnostic for different CCD variants, allowing direct assessment of whether the assigned probabilities produce the expected uniform ranks under the model. This serves as a falsifiable check on the CCD assumptions themselves, which is useful for cross-model comparison as described in the manuscript. We acknowledge that this does not constitute an external validation against an independent ground truth and that higher-order dependencies could lead to miscalibration relative to the true posterior. We will revise the simulation studies and discussion sections to explicitly state this scope and limitation, including a note that the methods are intended for use when CCD provides a reasonable approximation, and we will suggest small-tree exhaustive enumeration as a direction for future external calibration studies.

Revision: partial

Referee: [Methods (definition of α-credible CCD and credible level estimation)] The central construction of α-credible CCD sets and per-topology credible levels (introduced after the abstract and formalized in the methods) treats CCD marginal probabilities as stand-ins for true posterior probabilities. The manuscript reports good performance on simulated/real data but provides no direct calibration test in diffuse regimes where the CCD factorization assumption is most likely to break; this is load-bearing for the claim that the methods support meaningful credible sets and cross-model comparison.

Authors: We thank the referee for identifying this foundational assumption. The α-credible CCD and per-topology credible levels are explicitly defined with respect to the CCD probabilities, and the real-data experiments illustrate practical application even when the true posterior is inaccessible. The rank-uniformity and ECDF diagnostics provide an indirect calibration mechanism by verifying consistency properties that should hold if the CCD marginals are well-specified. We agree that direct tests in highly diffuse regimes are difficult to perform at scale. We will revise the methods section to more prominently articulate the CCD factorization assumption, the conditions under which the credible sets are expected to be meaningful, and the role of the validation diagnostics in supporting cross-model comparisons of CCD variants.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation of α-credible CCD sets

The paper defines α-credible CCD sets and per-topology credible levels by direct application of the existing Conditional Clade Distribution factorization to the topology probability space. Credible levels are computed from the CCD marginals, and accuracy is assessed via separate simulation studies and real-data ECDF plots that compare against the generating model rather than re-using the same fitted CCD quantities as both input and output. No quoted step reduces a claimed prediction or uniqueness result to a self-citation or to a parameter fitted from the target quantity itself. The central constructions therefore retain independent content relative to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central approach rests on the domain assumption that CCD provides a suitable tractable distribution for defining credible sets in the tree topology space.

axioms (1)

domain assumption: The Conditional Clade Distribution (CCD) provides a tractable approximation to the posterior distribution over tree topologies.
Invoked to enable efficient computation of credible levels where frequency-based methods fail.
