Recognition: unknown
Polynomial-time completion of phylogenetic tree sets
Pith reviewed 2026-05-07 17:24 UTC · model grok-4.3
The pith
A polynomial-time algorithm completes phylogenetic tree sets with partial taxon overlap while preserving distances and producing order-independent results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper demonstrates a polynomial-time algorithm for completing sets of phylogenetic trees that have partial taxon overlap. It identifies maximal completion subtrees that recur across the source trees, assembles a weighted majority-rule consensus of those subtrees, derives branch-length scaling rates from the common leaves, and inserts each consensus subtree into the target tree at the unique position that minimizes the quadratic distance error measured against source-tree information, restricting candidate positions to the original branches of the target tree. The resulting completion preserves distances among the taxa that were already present and is independent of the order in which the
What carries the argument
Extraction of maximal completion subtrees, construction of a weighted majority-rule consensus, and insertion of each consensus subtree at the position that minimizes quadratic distance error while restricting candidates to the original branches of the target tree.
If this is right
- The algorithm completes any input set of trees in polynomial time.
- Distances among taxa that appear in the original trees remain exactly the same after completion.
- The final completed tree set is unique and does not change if the order of the input trees is altered.
- On the amphibian, mammal, shark, and squamate data sets the completed trees are closer to the reference subset trees than those produced by competing methods, both in topology and in branch lengths.
Where Pith is reading between the lines
- Order independence removes the need to canonicalize input order before running the procedure, which simplifies its use inside larger automated pipelines.
- If the quadratic-error placements avoid artifacts on the tested clades, the same insertion rule may transfer to other clades or to trees with greater overlap variation.
- Because distances are preserved, downstream methods that treat branch lengths as evolutionary rates can be applied directly to the completed trees without additional correction steps.
Load-bearing premise
That restricting subtree insertions to the original branches of each target tree and choosing the position that minimizes quadratic distance error against the source trees will produce biologically accurate completions without creating topological artifacts or distorting distances.
What would settle it
A concrete data set of source trees for which the completed output exhibits a larger topological or branch-length distance to the reference subset trees than at least one alternative completion method, or for which distances among the originally shared taxa have changed.
read the original abstract
Comparative analyses of phylogenetic trees typically require identical taxon sets, however, in practice, trees often include distinct but overlapping taxa. Pruning non-shared leaves discards phylogenetic signal, whereas tree completion can preserve both taxa and branch-length information. This work introduces a polynomial-time algorithm for set-wide completion of phylogenetic trees with partial taxon overlap. The proposed method identifies and extracts maximal completion subtrees that frequently appear across the source trees and constructs a weighted majority-rule consensus. Branch lengths are scaled using rates derived from common leaves. Each consensus subtree is inserted at the position that minimizes the quadratic distance error measured against information from the source trees, with candidate positions restricted to the original branches of the target tree. We demonstrate that the algorithm runs in polynomial time and preserves distances among the original taxa, yielding a unique completion that is order-independent with respect to the processing order of target trees. An experimental evaluation on amphibians, mammals, sharks, and squamates shows that the proposed method consistently achieves the lowest distance to the subset reference trees across subsets among all methods, in both topology and branch lengths. An open-source Python implementation of the proposed algorithm and the biological datasets utilized in this study are publicly available at: https://github.com/tahiri-lab/overlap-treeset-completion/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a polynomial-time algorithm for completing phylogenetic tree sets with partial taxon overlap. It extracts maximal completion subtrees appearing across source trees, constructs a weighted majority-rule consensus, scales branch lengths from common leaves, and inserts each consensus subtree into a target tree at the position on existing branches that minimizes quadratic distance error to source-tree information. The method is claimed to preserve distances among original taxa, produce a unique order-independent completion, run in polynomial time, and outperform alternatives on four biological datasets (amphibians, mammals, sharks, squamates) in both topology and branch lengths. An open-source Python implementation is provided.
Significance. If the polynomial-time guarantee, distance preservation, and uniqueness/order-independence hold, the work would enable more complete use of overlapping but incomplete phylogenetic trees without discarding signal via pruning. The public code and datasets strengthen reproducibility. The empirical superiority on real datasets suggests practical value for comparative analyses, though the theoretical claims are the primary contribution.
major comments (2)
- [Method description (insertion step)] The central claim of uniqueness and order-independence (abstract) rests on the insertion procedure. Sequential insertion of multiple subtrees can cause one insertion to alter effective distances for the next; if the quadratic error surface admits ties or if the order of processing target trees affects the final topology or lengths, the uniqueness guarantee fails. The restriction of candidate positions to original branches of the target tree further risks forcing distortions when overlap patterns require new attachment points.
- [Complexity analysis] The polynomial-time claim (abstract) requires explicit complexity analysis. The time for extracting maximal subtrees, building the weighted majority-rule consensus, and performing quadratic minimization over all branches for each subtree must be shown to be polynomial in the total number of taxa and trees; without this breakdown, the runtime guarantee cannot be verified.
minor comments (2)
- [Experimental evaluation] The experimental section should specify how the subset reference trees were constructed and whether the same subsetting protocol was applied uniformly to all compared methods.
- [Notation and definitions] Define the quadratic distance error measure precisely, including how source-tree information is aggregated when multiple source trees contribute to a given subtree.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the opportunity to address the concerns regarding the insertion procedure and the complexity analysis. We respond to each major comment below.
read point-by-point responses
-
Referee: [Method description (insertion step)] The central claim of uniqueness and order-independence (abstract) rests on the insertion procedure. Sequential insertion of multiple subtrees can cause one insertion to alter effective distances for the next; if the quadratic error surface admits ties or if the order of processing target trees affects the final topology or lengths, the uniqueness guarantee fails. The restriction of candidate positions to original branches of the target tree further risks forcing distortions when overlap patterns require new attachment points.
Authors: The quadratic distance error for each consensus subtree is computed directly against the fixed source-tree information rather than against an intermediate completed tree. Consequently, the minimization for each subtree is independent of prior insertions, and the final positions are determined on the original target tree. This ensures that the result is independent of the order in which target trees are processed. When ties occur in the quadratic error, the algorithm applies a deterministic tie-breaker (lowest branch index in a depth-first traversal of the target tree) to guarantee uniqueness. The restriction of candidate positions to the original branches of each target tree is deliberate: it is required to preserve exact distances among the taxa already present in that target tree, which is a core property stated in the manuscript. We will add a short subsection clarifying these independence and tie-breaking mechanisms. revision: partial
-
Referee: [Complexity analysis] The polynomial-time claim (abstract) requires explicit complexity analysis. The time for extracting maximal subtrees, building the weighted majority-rule consensus, and performing quadratic minimization over all branches for each subtree must be shown to be polynomial in the total number of taxa and trees; without this breakdown, the runtime guarantee cannot be verified.
Authors: We agree that an explicit breakdown is needed. In the revised manuscript we will insert a dedicated complexity paragraph showing that (i) extraction of maximal completion subtrees is O(T N^2), (ii) construction of the weighted majority-rule consensus is O(T N^2) using standard algorithms, and (iii) for each of the O(N) subtrees the quadratic minimization examines O(N) candidate branches with O(1) distance evaluations per branch, yielding an overall polynomial bound in the combined size of the input (number of taxa and number of trees). revision: yes
Circularity Check
No circularity: algorithmic assembly with external validation
full rationale
The paper describes a constructive polynomial-time algorithm that extracts maximal subtrees from source trees, forms a weighted majority-rule consensus, scales branch lengths from common leaves, and inserts subtrees by quadratic-error minimization restricted to existing branches. Claims of distance preservation, uniqueness, and order-independence are presented as direct consequences of this explicit procedure rather than derived predictions; the experimental section evaluates the output against independent reference trees on real biological datasets (amphibians, mammals, etc.), providing external benchmarks. No load-bearing step reduces a claimed result to a tautological restatement of the inputs by construction, self-citation, or fitted-parameter renaming.
Axiom & Free-Parameter Ledger
free parameters (1)
- quadratic distance error measure
axioms (2)
- domain assumption Phylogenetic trees admit completion by subtree insertion that preserves distances among originally shared taxa.
- domain assumption Maximal completion subtrees that appear across source trees capture the relevant phylogenetic signal for consensus construction.
Reference graph
Works this paper leans on
-
[1]
Systematic Biology 21(4):390–397 Akanni W A, Wilkinson M, Creevey CJ, et al (2015) Implementing and testing bayesian and maximum-likelihood supertree methods in phylogenetics
Adams III EN (1972) Consensus techniques and the comparison of taxonomic trees. Systematic Biology 21(4):390–397 Akanni W A, Wilkinson M, Creevey CJ, et al (2015) Implementing and testing bayesian and maximum-likelihood supertree methods in phylogenetics. Royal S ociety open science 2(8):140436 Bansal MS (2020) Linear-time algorithms for phylogenetic tree...
1972
-
[2]
Systematic Biology 58(1) :35–54 Dong J, Fern´ andez-Baca D, McMorris F, et al (2010) Majority-r ule (+) consensus trees
Springer Degnan JH, DeGiorgio M, Bryant D, et al (2009) Properties of cons ensus methods for inferring species trees from gene trees. Systematic Biology 58(1) :35–54 Dong J, Fern´ andez-Baca D, McMorris F, et al (2010) Majority-r ule (+) consensus trees. Mathematical Biosciences 228(1):10–15 Garey MR, Johnson DS (2002) Computers and intractability, vol 29...
2009
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.