arxiv: 2604.27745 · v1 · submitted 2026-04-30 · 💻 cs.DS

Recognition: unknown

Average-Tree Phylogenetic Diversity Parameterized by Scanwidth and Invisibility

Leo van Iersel , Mark Jones , Jannik Schestag , Celine Scornavacca , Mathias Weller

Authors on Pith no claims yet

Pith reviewed 2026-05-07 04:59 UTC · model grok-4.3

classification 💻 cs.DS

keywords phylogenetic networksaverage-tree phylogenetic diversityscanwidthparameterized algorithmsNP-hardnessreticulation visibilitydynamic programmingtaxon selection

0 comments

The pith

A subset of taxa with maximum average-tree phylogenetic diversity can be found in polynomial time for phylogenetic networks of scanwidth at most 2.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops algorithms for the maximum average-tree phylogenetic diversity problem on rooted phylogenetic networks, using scanwidth as the main structural parameter that measures deviation from tree structure. It proves that the problem admits a polynomial-time algorithm when scanwidth is at most 2, becomes NP-hard already at scanwidth 3, and is solvable in O(2^sw n) time for any fixed set of taxa when parameterized by scanwidth. Additional results give linear time when the induced network is reticulation-visible and polynomial time when each biconnected component has only a constant number of invisible reticulations. A reader would care because phylogenetic networks capture reticulate evolution more accurately than trees, and these results make diversity-based taxon selection computationally feasible on large but nearly tree-like networks.

Core claim

The authors prove that finding a taxon subset of maximum average-tree phylogenetic diversity is solvable in polynomial time on rooted phylogenetic networks of scanwidth at most 2 and NP-hard on networks of scanwidth 3. They also present an algorithm that computes the APD value of any given taxon set in O(2^sw n) time. Finally, they give a linear-time algorithm when the induced subnetwork on the chosen taxa is reticulation-visible and a polynomial-time algorithm when each biconnected component of that induced network contains only constantly many invisible reticulations.

What carries the argument

Scanwidth, a width parameter on directed acyclic graphs that bounds the maximum number of taxa visible across any cut in a linear scan of the network and enables dynamic programming.

If this is right

Networks whose scanwidth is bounded by a small constant allow exact maximum-APD taxon selection without exponential blow-up.
APD values for arbitrary fixed taxon sets become computable in time linear in the number of taxa once scanwidth is fixed.
Reticulation-visible networks or networks with bounded invisible reticulations per block admit even faster exact APD computation without needing the full scanwidth parameterization.
The jump from polynomial to NP-hard between scanwidth 2 and 3 indicates that further restrictions beyond scanwidth are required for tractability on wider networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Scanwidth may serve as a useful parameterization for other optimization problems on phylogenetic networks that currently lack efficient algorithms.
Real biological networks that are nearly tree-like will typically have small scanwidth, making the polynomial-time results directly applicable to conservation or biodiversity datasets.
The visibility-based speed-ups suggest that preprocessing to detect and handle invisible reticulations could be a practical first step before invoking the general scanwidth algorithm.

Load-bearing premise

The input is a rooted phylogenetic network for which scanwidth is finite and well-defined and the induced subnetwork on any chosen taxa preserves the visibility properties needed to apply the dynamic-programming states.

What would settle it

A concrete rooted phylogenetic network of scanwidth 2 together with an enumeration of all possible taxon subsets showing that the claimed polynomial-time algorithm returns a suboptimal APD value, or a network of scanwidth 3 together with a polynomial-time algorithm solving the maximum-APD problem on it.

read the original abstract

We investigate parameterized algorithms for computing the average-tree phylogenetic diversity (APD) in rooted phylogenetic networks, studying the problem under different structural parameters that capture the deviation of a network from a tree. Our primary parameter is the scanwidth, a measure of the tree-likeness of a given directed acyclic graph. We show that a subset of taxa with maximum APD can be found in polynomial time in phylogenetic networks of scanwidth at most 2, but becomes NP-hard in networks of scanwidth 3. Further, we design an algorithm that computes the APD of a given set of taxa in O(2^sw n) time, where sw denotes the scanwidth and n the number of taxa in the input network. Finally, we give a linear-time algorithm for computing the APD of a given set of taxa if the network induced by these taxa is reticulation-visible. We generalize this algorithm to still run in polynomial time if each biconnected component of the induced network has only constantly many invisible reticulations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Scanwidth draws a clean line for APD: polynomial at 2, NP-hard at 3, with an FPT algorithm and a linear-time case for reticulation-visible networks.

read the letter

The paper's core contribution is a parameterized complexity split for finding a maximum-APD subset of taxa: polynomial time when scanwidth is at most 2, NP-hard at scanwidth 3, plus an O(2^sw n) FPT algorithm for computing APD on a given set and a linear-time algorithm when the induced network is reticulation-visible (extended to polynomial time with constantly many invisible reticulations per biconnected component). These results organize tractability for this diversity measure under a width parameter that measures how tree-like the DAG is. The approach follows standard dynamic programming on DAGs with bounded width, and the visibility special case reuses known linear-time techniques for phylogenetic networks without obvious reinvention. The hardness reduction at scanwidth 3 is the key new statement, and the abstract presents it as a direct construction, which is the right way to establish the threshold. No circularity appears in the claims. The work is incremental rather than foundational; it applies existing ideas to APD rather than introducing a new framework that would affect broader evolutionary biology. Impact stays inside parameterized algorithms for biological networks. The main soft spot is that the abstract alone leaves the precise definition of scanwidth and the details of the hardness reduction unverified, so any gaps in how the network gadgets preserve APD values would only surface in the full proofs. No experiments or implementation details are mentioned, which is typical for this style of paper but means practical running times on real networks remain untested. The input assumptions (rooted phylogenetic networks with well-defined scanwidth) are standard and not overreached. This paper is for people already working on width parameters in phylogenetic networks or on diversity measures in DAGs. A reader in that niche gets a useful classification and two concrete algorithms. It is coherent on its own terms and shows clear engagement with the literature on visibility and scanwidth, so it deserves a serious referee even if revisions are needed on the proof details.

Referee Report

1 major / 3 minor

Summary. The manuscript studies the Average-Tree Phylogenetic Diversity (APD) problem on rooted phylogenetic networks, parameterized primarily by scanwidth. It claims that selecting a subset of taxa maximizing APD is solvable in polynomial time for scanwidth at most 2 but NP-hard for scanwidth 3. It further presents an FPT algorithm computing APD for a given taxon set in O(2^sw n) time, a linear-time algorithm when the induced network is reticulation-visible, and a polynomial-time generalization when each biconnected component has only constantly many invisible reticulations.

Significance. If the claims hold, the work establishes a useful complexity dichotomy for APD using the scanwidth parameter on DAGs, together with practical FPT and special-case algorithms. The threshold at scanwidth 2 versus 3 is a clean structural result that aligns with known width-based techniques for networks and could inform algorithms for related diversity measures. The reticulation-visibility results add immediate applicability for networks arising in biology.

major comments (1)

The NP-hardness claim for APD maximization at scanwidth 3 is load-bearing for the central dichotomy. The reduction (presumably detailed in the hardness section) must be verified to produce networks whose scanwidth is exactly 3 while preserving the correspondence between maximum-APD subsets and solutions of the source problem; any gap in the construction or in the proof that scanwidth remains 3 would undermine the threshold result.

minor comments (3)

The abstract and introduction should include a one-sentence reminder of the definitions of APD and scanwidth (or explicit citations to the standard sources) so that the parameterized results are immediately accessible.
Clarify the precise relationship between the scanwidth of the input network and the scanwidth of the induced subnetwork on the chosen taxa; this is used in the polynomial-time claim for scanwidth 2 and should be stated explicitly.
In the O(2^sw n) algorithm section, confirm that the dynamic-programming states correctly aggregate the average over spanning trees without an extra exponential factor in the number of trees.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for their positive evaluation of the significance of the complexity dichotomy and algorithmic results. The major comment raises a valid point about the need for explicit verification of the scanwidth in the hardness reduction. We address this below and will revise the manuscript accordingly to strengthen the presentation of the proof.

read point-by-point responses

Referee: The NP-hardness claim for APD maximization at scanwidth 3 is load-bearing for the central dichotomy. The reduction (presumably detailed in the hardness section) must be verified to produce networks whose scanwidth is exactly 3 while preserving the correspondence between maximum-APD subsets and solutions of the source problem; any gap in the construction or in the proof that scanwidth remains 3 would undermine the threshold result.

Authors: We appreciate the referee highlighting the centrality of this result. The reduction appears in Section 4, where we reduce from the NP-complete Maximum Independent Set problem on 3-regular graphs. The network is constructed by starting from a base tree whose leaves correspond to vertices and attaching a constant number of reticulations per edge to encode adjacency constraints. We prove scanwidth exactly 3 in two directions: (i) an explicit linear scan sequence of width 3 is given that processes the network by handling the three 'parallel' reticulation bundles one at a time after the root; (ii) a lower-bound argument shows that any scan must retain at least three unresolved taxa or reticulation states simultaneously because of three vertex-disjoint paths that each require an independent 'slot' in the scan state (formalized via a minor argument that forbids scanwidth 2). The correspondence between maximum-APD subsets and maximum independent sets is established by showing that the average-tree diversity contribution of each taxon is additive except for a penalty term that is incurred precisely when two adjacent vertices are both selected; this penalty is realized uniformly across all spanning trees. To eliminate any possible ambiguity, the revised version will contain an expanded subsection with a formal lemma stating 'the constructed network has scanwidth 3' together with a self-contained proof, an illustrative figure depicting the scan sequence on a small gadget, and a short example that computes both the APD optimum and the independent-set optimum side-by-side. These additions will make the verification fully self-contained without altering any claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper establishes a polynomial-time algorithm for maximum-APD subset selection on scanwidth-2 networks, NP-hardness for scanwidth 3 via reduction, an FPT algorithm running in O(2^sw n) time, and linear-time special-case algorithms for reticulation-visible networks (generalized to bounded invisible reticulations per biconnected component). These results are obtained through constructive dynamic programming over the scanwidth decomposition and standard parameterized reductions; none of the central claims reduce by definition or self-citation to the input quantities themselves. The derivation relies on external graph-theoretic machinery for DAGs and phylogenetic networks rather than re-expressing APD or scanwidth in terms of each other. No self-definitional, fitted-input, or load-bearing self-citation steps are present.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the standard definitions of rooted phylogenetic networks, the APD measure, and the scanwidth parameter; no free parameters are fitted, no new entities are invented, and the axioms are standard graph-theoretic assumptions about DAGs and biconnected components.

axioms (2)

domain assumption The input is a rooted phylogenetic network (a DAG with a single root and leaves labeled by taxa).
Invoked throughout the abstract when defining APD and induced subnetworks.
domain assumption Scanwidth is a well-defined integer parameter measuring deviation from tree structure.
Used as the primary parameterization; its exact definition is presupposed.

pith-pipeline@v0.9.0 · 5488 in / 1467 out tokens · 31804 ms · 2026-05-07T04:59:06.267061+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages

[1]

Scanning phylogenetic networks is NP-hard

1 Vincent Berry , Celine Scornavacca, and Mathias Weller. Scanning phylogenetic networks is NP-hard. InProceedings of the 46th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2020), pages 519–530. Springer, 2020.doi:10.1007/978-3-030-38919-2_42. 2 Magnus Bordewich, Charles Semple, and Kristina Wicke. On the co...

work page doi:10.1007/978-3-030-38919-2_42 2020
[2]

3 Sebastian Bruchhold

doi:10.1016/j.tcs.2022.03.012. 3 Sebastian Bruchhold. On (Node-)Scanwidth and Its Algorithmic Applications. Master’s thesis, Technische Universität Berlin, 2024.doi:10.14279/depositonce-21448. 4 Sebastian Bruchhold and Mathias Weller. Exploiting Low Scanwidth to Resolve Soft Polytomies. In Proceedings of the 51st International Conference on Current Trends...

work page doi:10.1016/j.tcs.2022.03.012 2022
[3]

7 Niels Holtgrefe, Leo van Iersel, and Mark Jones

URL: https://resolver.tudelft.nl/uuid: 9c82fd2a-5841-4aac-8e40-d4d22542cdf5. 7 Niels Holtgrefe, Leo van Iersel, and Mark Jones. Exact and heuristic computation of the scanwidth of directed acyclic graphs. 2024.arXiv:2403.12734. 8 Niels Holtgrefe, Leo van Iersel, Ruben Meuwese, Yukihiro Murakami, and Jannik Schestag. PaNDA: Efficient Optimization of Phylog...

work page doi:10.1101/2025.11.14.688467 2024
[4]

URL: https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs

Schloss Dagstuhl – Leibniz-Zentrum für Informatik.doi:10.4230/LIPIcs. IPEC.2023.30. 11 Mark Jones and Jannik Schestag. Parameterized Algorithms for Diversity of Networks with Ecological Dependencies. In20th International Symposium on Parameterized and Exact Computation (IPEC 2025), volume 358 ofLeibniz International Proceedings in Informatics (LIPIcs), pa...

work page doi:10.4230/lipics 2023
[5]

In: Dell, H., Nederlof, J

Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.IPEC. 2025.11. 12 Christian Komusiewicz and Jannik Schestag. A multivariate complexity analysis of the generalized noah’s ark problem.Discrete Applied Mathematics, 382:137–154,

work page doi:10.4230/lipics.ipec 2025
[6]

doi:10.1016/J.DAM.2025. 11.037. 13 Fabio Pardi and Nick Goldman. Species Choice for Comparative Genomics: Being Greedy Works. PLoS Genetics, 1(6):e71, 2005.doi:10.1371/journal.pgen.0010071. 14 William J. Ripple, Christopher Wolf, Thomas M. Newsome, Mauro Galetti, Mohammed Alamgir, Eileen Crist, Mahmoud I. Mahmoud, William F . Laurance, and 364 scientist s...

work page doi:10.1016/j.dam.2025 2025
[7]

15 Jannik Schestag and Norbert Zeh

World Scientists’ Warning to Humanity: A Second Notice.BioScience, 67(12):1026–1028, 11 2017.doi:10.1093/biosci/bix125. 15 Jannik Schestag and Norbert Zeh. The First Known Problem That Is FPT with Respect to Node Scanwidth but Not Treewidth. 2026.arXiv:2602.06903. 16 Mike Steel. Phylogenetic Diversity and the greedy algorithm.Systematic Biology, 54(4):527...

work page doi:10.1093/biosci/bix125 2017
[8]

14 Average-Tree Phylogenetic Diversity Parameterized by Scanwidth and Invisibility 18 Leo van Iersel, Mark Jones, Jannik Schestag, Celine Scornavacca, and Mathias Weller

Schloss Dagstuhl – Leibniz-Zentrum für Informatik.doi:10.4230/ LIPIcs.WABI.2025.15. 14 Average-Tree Phylogenetic Diversity Parameterized by Scanwidth and Invisibility 18 Leo van Iersel, Mark Jones, Jannik Schestag, Celine Scornavacca, and Mathias Weller. Phylogenetic Network Diversity Parameterized by Reticulation Number and Beyond. InProceedings of the 2...

work page doi:10.1007/978-3-031-94928-9_7 2025
[9]

20 Kristina Wicke and Mareike Fischer

doi:10.2307/ 2999617. 20 Kristina Wicke and Mareike Fischer. Phylogenetic diversity and biodiversity indices on phylogenetic networks.Mathematical Biosciences, 298:80–90, 2018.doi:10.1016/j.mbs.2018.02.005. 21 Kristina Wicke, Arne Mooers, and Mike Steel. Formal links between feature diversity and phylogenetic diversity .Systematic Biology, 70(3):480–490, ...

work page doi:10.1016/j.mbs.2018.02.005 2018