Recognition: unknown
Average-Tree Phylogenetic Diversity Parameterized by Scanwidth and Invisibility
Pith reviewed 2026-05-07 04:59 UTC · model grok-4.3
The pith
A subset of taxa with maximum average-tree phylogenetic diversity can be found in polynomial time for phylogenetic networks of scanwidth at most 2.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors prove that finding a taxon subset of maximum average-tree phylogenetic diversity is solvable in polynomial time on rooted phylogenetic networks of scanwidth at most 2 and NP-hard on networks of scanwidth 3. They also present an algorithm that computes the APD value of any given taxon set in O(2^sw n) time. Finally, they give a linear-time algorithm when the induced subnetwork on the chosen taxa is reticulation-visible and a polynomial-time algorithm when each biconnected component of that induced network contains only constantly many invisible reticulations.
What carries the argument
Scanwidth, a width parameter on directed acyclic graphs that bounds the maximum number of taxa visible across any cut in a linear scan of the network and enables dynamic programming.
If this is right
- Networks whose scanwidth is bounded by a small constant allow exact maximum-APD taxon selection without exponential blow-up.
- APD values for arbitrary fixed taxon sets become computable in time linear in the number of taxa once scanwidth is fixed.
- Reticulation-visible networks or networks with bounded invisible reticulations per block admit even faster exact APD computation without needing the full scanwidth parameterization.
- The jump from polynomial to NP-hard between scanwidth 2 and 3 indicates that further restrictions beyond scanwidth are required for tractability on wider networks.
Where Pith is reading between the lines
- Scanwidth may serve as a useful parameterization for other optimization problems on phylogenetic networks that currently lack efficient algorithms.
- Real biological networks that are nearly tree-like will typically have small scanwidth, making the polynomial-time results directly applicable to conservation or biodiversity datasets.
- The visibility-based speed-ups suggest that preprocessing to detect and handle invisible reticulations could be a practical first step before invoking the general scanwidth algorithm.
Load-bearing premise
The input is a rooted phylogenetic network for which scanwidth is finite and well-defined and the induced subnetwork on any chosen taxa preserves the visibility properties needed to apply the dynamic-programming states.
What would settle it
A concrete rooted phylogenetic network of scanwidth 2 together with an enumeration of all possible taxon subsets showing that the claimed polynomial-time algorithm returns a suboptimal APD value, or a network of scanwidth 3 together with a polynomial-time algorithm solving the maximum-APD problem on it.
read the original abstract
We investigate parameterized algorithms for computing the average-tree phylogenetic diversity (APD) in rooted phylogenetic networks, studying the problem under different structural parameters that capture the deviation of a network from a tree. Our primary parameter is the scanwidth, a measure of the tree-likeness of a given directed acyclic graph. We show that a subset of taxa with maximum APD can be found in polynomial time in phylogenetic networks of scanwidth at most 2, but becomes NP-hard in networks of scanwidth 3. Further, we design an algorithm that computes the APD of a given set of taxa in O(2^sw n) time, where sw denotes the scanwidth and n the number of taxa in the input network. Finally, we give a linear-time algorithm for computing the APD of a given set of taxa if the network induced by these taxa is reticulation-visible. We generalize this algorithm to still run in polynomial time if each biconnected component of the induced network has only constantly many invisible reticulations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies the Average-Tree Phylogenetic Diversity (APD) problem on rooted phylogenetic networks, parameterized primarily by scanwidth. It claims that selecting a subset of taxa maximizing APD is solvable in polynomial time for scanwidth at most 2 but NP-hard for scanwidth 3. It further presents an FPT algorithm computing APD for a given taxon set in O(2^sw n) time, a linear-time algorithm when the induced network is reticulation-visible, and a polynomial-time generalization when each biconnected component has only constantly many invisible reticulations.
Significance. If the claims hold, the work establishes a useful complexity dichotomy for APD using the scanwidth parameter on DAGs, together with practical FPT and special-case algorithms. The threshold at scanwidth 2 versus 3 is a clean structural result that aligns with known width-based techniques for networks and could inform algorithms for related diversity measures. The reticulation-visibility results add immediate applicability for networks arising in biology.
major comments (1)
- The NP-hardness claim for APD maximization at scanwidth 3 is load-bearing for the central dichotomy. The reduction (presumably detailed in the hardness section) must be verified to produce networks whose scanwidth is exactly 3 while preserving the correspondence between maximum-APD subsets and solutions of the source problem; any gap in the construction or in the proof that scanwidth remains 3 would undermine the threshold result.
minor comments (3)
- The abstract and introduction should include a one-sentence reminder of the definitions of APD and scanwidth (or explicit citations to the standard sources) so that the parameterized results are immediately accessible.
- Clarify the precise relationship between the scanwidth of the input network and the scanwidth of the induced subnetwork on the chosen taxa; this is used in the polynomial-time claim for scanwidth 2 and should be stated explicitly.
- In the O(2^sw n) algorithm section, confirm that the dynamic-programming states correctly aggregate the average over spanning trees without an extra exponential factor in the number of trees.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for their positive evaluation of the significance of the complexity dichotomy and algorithmic results. The major comment raises a valid point about the need for explicit verification of the scanwidth in the hardness reduction. We address this below and will revise the manuscript accordingly to strengthen the presentation of the proof.
read point-by-point responses
-
Referee: The NP-hardness claim for APD maximization at scanwidth 3 is load-bearing for the central dichotomy. The reduction (presumably detailed in the hardness section) must be verified to produce networks whose scanwidth is exactly 3 while preserving the correspondence between maximum-APD subsets and solutions of the source problem; any gap in the construction or in the proof that scanwidth remains 3 would undermine the threshold result.
Authors: We appreciate the referee highlighting the centrality of this result. The reduction appears in Section 4, where we reduce from the NP-complete Maximum Independent Set problem on 3-regular graphs. The network is constructed by starting from a base tree whose leaves correspond to vertices and attaching a constant number of reticulations per edge to encode adjacency constraints. We prove scanwidth exactly 3 in two directions: (i) an explicit linear scan sequence of width 3 is given that processes the network by handling the three 'parallel' reticulation bundles one at a time after the root; (ii) a lower-bound argument shows that any scan must retain at least three unresolved taxa or reticulation states simultaneously because of three vertex-disjoint paths that each require an independent 'slot' in the scan state (formalized via a minor argument that forbids scanwidth 2). The correspondence between maximum-APD subsets and maximum independent sets is established by showing that the average-tree diversity contribution of each taxon is additive except for a penalty term that is incurred precisely when two adjacent vertices are both selected; this penalty is realized uniformly across all spanning trees. To eliminate any possible ambiguity, the revised version will contain an expanded subsection with a formal lemma stating 'the constructed network has scanwidth 3' together with a self-contained proof, an illustrative figure depicting the scan sequence on a small gadget, and a short example that computes both the APD optimum and the independent-set optimum side-by-side. These additions will make the verification fully self-contained without altering any claims. revision: yes
Circularity Check
No significant circularity identified
full rationale
The paper establishes a polynomial-time algorithm for maximum-APD subset selection on scanwidth-2 networks, NP-hardness for scanwidth 3 via reduction, an FPT algorithm running in O(2^sw n) time, and linear-time special-case algorithms for reticulation-visible networks (generalized to bounded invisible reticulations per biconnected component). These results are obtained through constructive dynamic programming over the scanwidth decomposition and standard parameterized reductions; none of the central claims reduce by definition or self-citation to the input quantities themselves. The derivation relies on external graph-theoretic machinery for DAGs and phylogenetic networks rather than re-expressing APD or scanwidth in terms of each other. No self-definitional, fitted-input, or load-bearing self-citation steps are present.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The input is a rooted phylogenetic network (a DAG with a single root and leaves labeled by taxa).
- domain assumption Scanwidth is a well-defined integer parameter measuring deviation from tree structure.
Reference graph
Works this paper leans on
-
[1]
Scanning phylogenetic networks is NP-hard
1 Vincent Berry , Celine Scornavacca, and Mathias Weller. Scanning phylogenetic networks is NP-hard. InProceedings of the 46th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2020), pages 519–530. Springer, 2020.doi:10.1007/978-3-030-38919-2_42. 2 Magnus Bordewich, Charles Semple, and Kristina Wicke. On the co...
-
[2]
doi:10.1016/j.tcs.2022.03.012. 3 Sebastian Bruchhold. On (Node-)Scanwidth and Its Algorithmic Applications. Master’s thesis, Technische Universität Berlin, 2024.doi:10.14279/depositonce-21448. 4 Sebastian Bruchhold and Mathias Weller. Exploiting Low Scanwidth to Resolve Soft Polytomies. In Proceedings of the 51st International Conference on Current Trends...
-
[3]
7 Niels Holtgrefe, Leo van Iersel, and Mark Jones
URL: https://resolver.tudelft.nl/uuid: 9c82fd2a-5841-4aac-8e40-d4d22542cdf5. 7 Niels Holtgrefe, Leo van Iersel, and Mark Jones. Exact and heuristic computation of the scanwidth of directed acyclic graphs. 2024.arXiv:2403.12734. 8 Niels Holtgrefe, Leo van Iersel, Ruben Meuwese, Yukihiro Murakami, and Jannik Schestag. PaNDA: Efficient Optimization of Phylog...
-
[4]
URL: https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs
Schloss Dagstuhl – Leibniz-Zentrum für Informatik.doi:10.4230/LIPIcs. IPEC.2023.30. 11 Mark Jones and Jannik Schestag. Parameterized Algorithms for Diversity of Networks with Ecological Dependencies. In20th International Symposium on Parameterized and Exact Computation (IPEC 2025), volume 358 ofLeibniz International Proceedings in Informatics (LIPIcs), pa...
-
[5]
Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.IPEC. 2025.11. 12 Christian Komusiewicz and Jannik Schestag. A multivariate complexity analysis of the generalized noah’s ark problem.Discrete Applied Mathematics, 382:137–154,
-
[6]
doi:10.1016/J.DAM.2025. 11.037. 13 Fabio Pardi and Nick Goldman. Species Choice for Comparative Genomics: Being Greedy Works. PLoS Genetics, 1(6):e71, 2005.doi:10.1371/journal.pgen.0010071. 14 William J. Ripple, Christopher Wolf, Thomas M. Newsome, Mauro Galetti, Mohammed Alamgir, Eileen Crist, Mahmoud I. Mahmoud, William F . Laurance, and 364 scientist s...
-
[7]
15 Jannik Schestag and Norbert Zeh
World Scientists’ Warning to Humanity: A Second Notice.BioScience, 67(12):1026–1028, 11 2017.doi:10.1093/biosci/bix125. 15 Jannik Schestag and Norbert Zeh. The First Known Problem That Is FPT with Respect to Node Scanwidth but Not Treewidth. 2026.arXiv:2602.06903. 16 Mike Steel. Phylogenetic Diversity and the greedy algorithm.Systematic Biology, 54(4):527...
-
[8]
Schloss Dagstuhl – Leibniz-Zentrum für Informatik.doi:10.4230/ LIPIcs.WABI.2025.15. 14 Average-Tree Phylogenetic Diversity Parameterized by Scanwidth and Invisibility 18 Leo van Iersel, Mark Jones, Jannik Schestag, Celine Scornavacca, and Mathias Weller. Phylogenetic Network Diversity Parameterized by Reticulation Number and Beyond. InProceedings of the 2...
-
[9]
20 Kristina Wicke and Mareike Fischer
doi:10.2307/ 2999617. 20 Kristina Wicke and Mareike Fischer. Phylogenetic diversity and biodiversity indices on phylogenetic networks.Mathematical Biosciences, 298:80–90, 2018.doi:10.1016/j.mbs.2018.02.005. 21 Kristina Wicke, Arne Mooers, and Mike Steel. Formal links between feature diversity and phylogenetic diversity .Systematic Biology, 70(3):480–490, ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.