arxiv: 2606.08409 · v1 · pith:RFTVL246new · submitted 2026-06-07 · 📊 stat.ME · q-bio.PE

Matrix representations and distance metrics for unlabeled ranked phylogenetic networks

Pith reviewed 2026-06-27 18:17 UTC · model grok-4.3

classification 📊 stat.ME q-bio.PE
keywords phylogenetic networks · distance metrics · matrix representations · ranked networks · unlabeled networks · reticulate evolution · hybridization · ancestral recombination graphs

The pith

A bijective triangular matrix representation turns comparisons of ranked phylogenetic networks into standard matrix norm calculations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops distance metrics for rooted ranked unlabeled phylogenetic networks by mapping each network to a unique triangular matrix that records the temporal sequence of speciations and hybridizations. The distance between two networks is then an ordinary matrix norm of the difference between their matrices. The construction handles networks that differ in the number of reticulation events and works for both isochronous and heterochronous tip sampling. The resulting distances are shown to distinguish biologically plausible histories in simulated data and in posterior samples of viral networks.

Core claim

Rooted ranked unlabeled phylogenetic networks admit a bijective triangular matrix representation whose entries encode the order of internal events, speciations, and hybridizations; matrix norms on these matrices therefore supply distances that compare topologies, timed networks, and networks with unequal numbers of hybridizations, and that apply equally to isochronous and heterochronous cases.
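
As a rough illustration of what "matrix norms supply distances" can mean in practice, the sketch below computes an entrywise L1 distance and a Frobenius (L2) distance between two lower-triangular integer matrices. It does not reproduce the paper's encoding: the matrices `A` and `B` are invented, the function names are hypothetical, and both inputs are assumed to have the same size already.

```python
import math

def lower_triangle(m):
    """Return the entries on and below the diagonal of a square matrix, row by row."""
    n = len(m)
    if any(len(row) != n for row in m):
        raise ValueError("matrix must be square")
    return [m[i][j] for i in range(n) for j in range(i + 1)]

def d1(a, b):
    """Entrywise L1 distance between two same-sized lower-triangular matrices."""
    xa, xb = lower_triangle(a), lower_triangle(b)
    if len(xa) != len(xb):
        raise ValueError("matrices must have the same size")
    return sum(abs(x - y) for x, y in zip(xa, xb))

def d2(a, b):
    """Frobenius (entrywise L2) distance between two same-sized lower-triangular matrices."""
    xa, xb = lower_triangle(a), lower_triangle(b)
    if len(xa) != len(xb):
        raise ValueError("matrices must have the same size")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xa, xb)))

# Hypothetical 3x3 encodings, invented purely for illustration;
# they are NOT derived from any real network or from the paper's construction.
A = [[2, 0, 0],
     [1, 3, 0],
     [0, 2, 4]]
B = [[2, 0, 0],
     [1, 3, 0],
     [1, 1, 4]]

print(d1(A, B))  # 2
print(d2(A, B))  # 1.4142135623730951
```

Only the lower triangle is read, so any values above the diagonal are ignored; that is a design choice of this sketch, not something the source specifies.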

What carries the argument

A bijective triangular matrix representation that captures the temporal order of internal events, speciations, and hybridizations.

If this is right

  • Network topologies can be compared quantitatively using efficient matrix operations.
  • Networks with different numbers of hybridizations become directly comparable.
  • Both isochronous and heterochronous networks are handled by the same distance.
  • The metrics can be applied to posterior distributions from Bayesian inference of viral networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The matrix encoding may support clustering or summarization of large sets of inferred networks from Bayesian analyses.
  • Branch-length or timing information could be incorporated into the matrix entries in future extensions.
  • Analogous triangular representations might be developed for other classes of reticulate graphs.
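
One way a set of sampled networks might be summarized, assuming each has already been encoded as a same-sized matrix, is to compute all pairwise distances and take the medoid: the sample with the smallest total distance to the others (the caption of Figure 8 mentions medoid networks). The function names and the toy 2x2 matrices below are hypothetical, and the Frobenius norm is only one possible choice.

```python
import math

def frobenius(a, b):
    """Frobenius distance between two matrices of identical shape."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

def pairwise(mats, dist):
    """Symmetric matrix of distances between every pair of encoded samples."""
    n = len(mats)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(mats[i], mats[j])
    return d

def medoid_index(d):
    """Index of the sample whose summed distance to all others is smallest."""
    totals = [sum(row) for row in d]
    return min(range(len(d)), key=totals.__getitem__)

# Toy "posterior sample" of three encoded networks (values are made up).
sample = [
    [[2, 0], [1, 3]],
    [[2, 0], [2, 3]],
    [[2, 0], [5, 3]],
]

D = pairwise(sample, frobenius)
print(medoid_index(D))  # 1
```

The same pairwise matrix `D` could be passed to a clustering or multidimensional scaling routine; that step is left out here to keep the sketch free of third-party dependencies.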

Load-bearing premise

The map from rooted ranked unlabeled phylogenetic networks to triangular matrices is a bijection: every network has exactly one matrix, no two distinct networks share a matrix, and the matrix fully encodes the temporal order of the network's events.
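
The reason this premise is load-bearing can be spelled out in a few lines. The notation is an assumption made here for illustration: $\phi$ denotes the encoding map and $\lVert\cdot\rVert$ any matrix norm, and the two encoded matrices are taken to have the same dimensions.

```latex
% Distance induced by a norm on the encoded matrices
d(N_1, N_2) = \lVert \phi(N_1) - \phi(N_2) \rVert

% Symmetry and the triangle inequality are inherited from the norm:
d(N_1, N_2) = d(N_2, N_1), \qquad
d(N_1, N_3) \le d(N_1, N_2) + d(N_2, N_3)

% Identity of indiscernibles needs injectivity of \phi:
d(N_1, N_2) = 0
  \iff \phi(N_1) = \phi(N_2)
  \iff N_1 = N_2 \quad \text{(last step: } \phi \text{ injective)}
```

Without injectivity, $d$ would only be a pseudometric: two different networks could sit at distance zero.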

What would settle it

Two distinct rooted ranked unlabeled phylogenetic networks that produce identical triangular matrices, or a valid network that cannot be represented by any such matrix.
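
A brute-force search for the first kind of counterexample could look like the sketch below. It assumes the candidate networks are supplied in some canonical, deduplicated form and that an `encode` function returning a matrix (list of rows) is available. Both encoders shown are toy stand-ins invented for this example, not the paper's construction: networks are reduced to tuples of event labels (`"b"` for branching, `"h"` for hybridization).

```python
from collections import defaultdict

def find_collisions(networks, encode):
    """Group networks by their encoded matrix and return every group
    in which more than one network produced the same matrix."""
    buckets = defaultdict(list)
    for net in networks:
        key = tuple(tuple(row) for row in encode(net))  # hashable copy of the matrix
        buckets[key].append(net)
    return [group for group in buckets.values() if len(group) > 1]

def lossy_encode(events):
    """Toy encoder that keeps only event counts and so forgets event order."""
    return [[events.count("b"), 0],
            [events.count("h"), len(events)]]

def ordered_encode(events):
    """Toy encoder with ragged lower-triangular rows: row i holds codes
    (1 for 'b', 2 for 'h') of the first i + 1 events, so order is preserved."""
    codes = [1 if e == "b" else 2 for e in events]
    return [codes[: i + 1] for i in range(len(codes))]

# Hypothetical candidate set: three distinct event sequences.
nets = [("b", "h", "b"), ("b", "b", "h"), ("b", "b", "b")]

print(find_collisions(nets, lossy_encode))
# [[('b', 'h', 'b'), ('b', 'b', 'h')]]
print(find_collisions(nets, ordered_encode))
# []
```

An empty result over an exhaustive enumeration for small sizes would be evidence for injectivity at those sizes only; it would not replace a proof.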

Figures

Figures reproduced from arXiv: 2606.08409 by Claudia Solís-Lemus, Jiayang Wang, Julia A. Palacios.

Figure 1: Example of a phylogenetic network and its corresponding ...
Figure 2: Complete list of unlabeled ranked phylogenetic networks with ...
Figure 3: A ranked phylogenetic network exemplifying notation used in the proof of Theorem 3.2. In (A), ...
Figure 4: (A) A heterochronous phylogenetic network ...
Figure 5: (A) $P^{(1)}_{n,m_1}$ and (B) $P^{(2)}_{n,m_2}$ are two phylogenetic networks with the same number of leaves but different number of hybridization events. Specifically, they both have five leaves, but $P^{(1)}_{n,m_1}$ has one hybridization while $P^{(2)}_{n,m_2}$ has two. Each event is labeled as b, a branching event, or h, a hybridization event.
Figure 6: Alignment of event vectors of two networks. Branching events (b) are aligned first and then ...
Figure 7: Aligned ranked phylogenetic networks (original networks in Figure 5). Each event is labeled as ...
Figure 8: Top: Medoid networks with 100 tips corresponding to different ...
Figure 9: Multidimensional scaling (MDS) representation of distances between 500 simulated isochronous ...
Figure 10: Comparison of the evolutionary histories of A/H1N1 influenza viruses across the United States, ...
Figure 11: Comparison of the evolutionary histories of seasonal influenza A/H1N1, pandemic influenza ...
Abstract

Phylogenetic networks are graphs inferred from molecular sequence data that represent ancestral histories shaped by reticulate processes such as recombination, hybridization, and horizontal gene transfer. We introduce a family of distance metrics for rooted, ranked, unlabeled phylogenetic networks, extending a previously developed distance for ranked trees. Our approach relies on a bijective triangular matrix representation of phylogenetic networks that captures the temporal order of internal events, speciations, and hybridizations. Our metrics, defined as standard matrix norms, allow efficient quantitative comparisons of network topologies, timed networks and networks with differing numbers of hybridizations. Our distance can be used for both isochronous networks where all tips are sampled at one time point, and heterochronous networks where tips are allowed to be sampled at different time points. We show that our metrics capture biologically meaningful differences among evolutionary histories in both simulations and empirical posterior distributions of viral phylogenetic networks. These tools fill a methodological gap, enabling principled comparisons of ranked, unlabeled phylogenetic networks, including ancestral recombination graphs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a family of distance metrics for rooted, ranked, unlabeled phylogenetic networks by defining a bijective triangular matrix representation that encodes the temporal order of speciations and hybridizations. Distances are then taken as standard matrix norms, claimed to enable comparisons of topologies, timed networks, and networks with differing reticulation counts for both isochronous and heterochronous cases. Utility is illustrated via simulations and empirical posterior distributions of viral networks.

Significance. If the bijectivity and uniqueness claims hold, the work supplies a concrete, computable framework for quantitative network comparison that extends existing tree metrics and addresses a clear methodological gap; the simulation and empirical demonstrations provide initial evidence of biological relevance.

major comments (2)
  1. [Abstract] Abstract and introduction: the central claim that a bijective triangular matrix representation exists and uniquely captures temporal order for all rooted ranked unlabeled networks (including variable hybridization counts) is asserted without an explicit construction, injectivity proof, or surjectivity argument; this is load-bearing for the subsequent matrix-norm distances.
  2. [Abstract] The assertion that the metrics distinguish networks with differing numbers of hybridizations relies on the matrix representation being well-defined and bijective across heterochronous cases, yet no verification or counter-example analysis is supplied to confirm this.
minor comments (1)
  1. Notation for the triangular matrix entries and the precise mapping from network events to matrix positions should be defined earlier and with an example.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our central claims. We address each major comment below.

  1. Referee: [Abstract] Abstract and introduction: the central claim that a bijective triangular matrix representation exists and uniquely captures temporal order for all rooted ranked unlabeled networks (including variable hybridization counts) is asserted without an explicit construction, injectivity proof, or surjectivity argument; this is load-bearing for the subsequent matrix-norm distances.

    Authors: The explicit construction of the bijective triangular matrix representation appears in Section 2, with the mapping defined via the temporal ordering of internal nodes. Injectivity is established in Theorem 3.1 and surjectivity in Theorem 3.2; both theorems explicitly cover networks with arbitrary reticulation counts. We agree that the abstract and introduction would be strengthened by forward references to these results. We will revise both sections to briefly describe the construction and cite the theorems. revision: yes

  2. Referee: [Abstract] The assertion that the metrics distinguish networks with differing numbers of hybridizations relies on the matrix representation being well-defined and bijective across heterochronous cases, yet no verification or counter-example analysis is supplied to confirm this.

    Authors: Section 2.3 defines the representation for heterochronous networks by augmenting the matrix with tip-time information, preserving bijectivity and thereby ensuring distinct reticulation counts map to distinct matrices. While the simulation studies include heterochronous examples, we acknowledge the absence of a dedicated verification subsection. We will add such a subsection containing explicit checks and a short counter-example search confirming that the bijectivity property holds without collision in the heterochronous setting. revision: yes

Circularity Check

0 steps flagged

No circularity: constructive definitions of matrix representation and matrix-norm distances

The paper defines a triangular matrix representation for rooted ranked unlabeled phylogenetic networks and applies standard matrix norms to obtain distances. This is a direct constructive mapping and metric definition, not a derivation that reduces to its own inputs by construction, fitted parameters renamed as predictions, or load-bearing self-citations. The bijectivity claim is presented as a property of the representation rather than an assumption that circularly justifies the distances. No equations or steps in the abstract or description exhibit the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of specific free parameters, axioms, or invented entities; none are explicitly named in the provided text.

pith-pipeline@v0.9.1-grok · 5707 in / 938 out tokens · 20822 ms · 2026-06-27T18:17:26.389199+00:00 · methodology

