pith. machine review for the scientific record. sign in

arxiv: 2604.12164 · v1 · submitted 2026-04-14 · 🧬 q-bio.PE · cs.DS· math.OC

Recognition: unknown

Phylogenetic Inference under the Balanced Minimum Evolution Criterion via Semidefinite Programming

P. Skums

Pith reviewed 2026-05-10 14:36 UTC · model grok-4.3

classification 🧬 q-bio.PE cs.DSmath.OC
keywords phylogeneticsbalanced minimum evolutionsemidefinite programmingphylogenetic inferencedistance-based methodstree topologyconvex relaxationrounding algorithm
0
0 comments X

The pith

Semidefinite programming relaxes the balanced minimum evolution problem to recover accurate phylogenetic trees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an algorithm that casts the balanced minimum evolution problem as a semidefinite program and recovers valid tree topologies through iterative rounding. It tests the method on simulated and empirical datasets, showing that the resulting trees match known structures at high accuracy. The approach treats phylogenetics as a convex optimization task rather than a purely combinatorial search. If the relaxation stays tight, it supplies a practical new route for distance-based tree inference that can be generalized beyond the single criterion studied here.

Core claim

An SDP relaxation of the BME objective, combined with an iterative rounding scheme, converts relaxed matrix solutions into valid tree topologies and produces accurate phylogenetic reconstructions on both simulated and empirical data.

What carries the argument

The SDP relaxation of the balanced minimum evolution objective function together with an iterative rounding procedure that maps positive-semidefinite matrices back to tree topologies.

If this is right

  • The SDP-plus-rounding pipeline reconstructs phylogenies accurately on both simulated and real data.
  • The same convex-relaxation strategy can be extended to other phylogenetic optimization problems.
  • Distance-based inference gains access to the same tight convex bounds already used in many combinatorial settings outside biology.
  • Larger or noisier distance matrices become tractable once the problem is moved into the positive-semidefinite cone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the rounding step proves stable, the framework could replace heuristic search in pipelines that currently rely on neighbor-joining or fastME variants.
  • The approach invites direct comparison with integer-programming formulations of the same BME objective to quantify the price of the convex relaxation.
  • Because SDP solvers scale with matrix size rather than with the number of tree topologies, the method may open the door to inference on hundreds of taxa where exhaustive search remains impossible.

Load-bearing premise

The SDP relaxation of the BME objective stays sufficiently tight that the rounding procedure recovers high-quality or optimal trees without frequent failure or prohibitive cost.

What would settle it

On standard benchmark datasets the method would return trees whose Robinson-Foulds distance to the true topology exceeds the distance achieved by established BME heuristics.

Figures

Figures reproduced from arXiv: 2604.12164 by P. Skums.

Figure 1
Figure 1. Figure 1: Performance profiles of SDP relaxation and rounding under different algorithm settings. [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Left: Joint comparison of four methods on simulated data. The y-axis shows the percentage of instances where a method achieved the best objective value among the four methods (ties allowed). Right: Number of SPR moves for FastME (random initial tree) and SDPTree after SDP rounding for the Hamming dataset nestedness and balance constraints, yielding solutions that are more coherent and easier to round. It a… view at source ↗
Figure 3
Figure 3. Figure 3: Left: Joint comparison of four methods on Phylobench data. The y-axis shows the percentage of instances where a method achieved the best objective value among the four methods (ties allowed). Right: running time of the single SDP relaxation + rounding step (in minutes) 4 Discussion In this study, we proposed a novel algorithmic approach to phylogenetic inference based on Semidefinite Programming (SDP) and … view at source ↗
read the original abstract

In this study, we investigate the application of Semidefinite Programming (SDP) to phylogenetics. SDP is a powerful optimization framework that seeks to optimize a linear objective function over the cone of positive semidefinite matrices. As a convex optimization problem, SDP generalizes linear programming and provides tight relaxations for many combinatorial optimization problems. However, despite its many applications, SDP remains largely unused in computational biology. We argue that SDP relaxations are particularly well suited for phylogenetic inference. As a proof of concept, we focus on the Balanced Minimum Evolution (BME) problem, a widely used model in distance-based phylogenetics. We propose an algorithm combining an SDP relaxation with a rounding scheme that iteratively converts relaxed solutions into valid tree topologies. Experiments on simulated and empirical datasets show that the method enables accurate phylogenetic reconstruction. The approach is sufficiently general to be extendable to other phylogenetic problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes an SDP relaxation of the Balanced Minimum Evolution (BME) objective for distance-based phylogenetic inference, combined with a deterministic iterative rounding procedure that projects relaxed solutions onto valid tree topologies. Experiments on simulated data (recovery of known topologies) and empirical datasets (comparisons to NJ and other methods) report Robinson-Foulds distances and runtimes that scale to a few dozen taxa, with the claim that the framework is extensible to other phylogenetic problems.

Significance. If the reported performance holds, the work supplies a convex-optimization route to a standard phylogenetic criterion that is distinct from existing heuristics and admits a standard SDP formulation whose tightness is directly tested by the rounding success rate. The explicit reporting of RF distances, runtime scaling, and deterministic rounding constitute reproducible evidence that addresses the central practical question of whether the relaxation remains useful after projection.

minor comments (3)
  1. [Abstract] Abstract: the claim of 'accurate phylogenetic reconstruction' is not accompanied by any numerical summary (e.g., mean RF distance or success rate); a single sentence with the key quantitative result would make the abstract self-contained.
  2. [Methods] The description of the iterative rounding scheme (projection steps, termination criterion, and handling of numerical tolerance) is only sketched; a short pseudocode block or explicit list of the projection operations would remove ambiguity for readers wishing to re-implement the method.
  3. [Experiments] Table or figure captions for the empirical results should explicitly state the number of taxa, number of replicates, and the precise distance matrix source for each dataset so that the reported RF values can be directly compared with future work.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work, recognition of its significance as a convex-optimization approach to BME, and recommendation for minor revision. The referee's description accurately reflects the SDP relaxation, deterministic iterative rounding, and the reported experiments on simulated and empirical data.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper formulates an SDP relaxation of the standard BME objective (linear function over PSD matrices with tree topology constraints) and applies a deterministic iterative rounding procedure to recover topologies. This follows conventional convex optimization techniques with no reduction of outputs to fitted inputs, self-definitional equations, or load-bearing self-citations. Validation on simulated and empirical data (Robinson-Foulds distances, runtime scaling) is independent of the formulation itself, confirming the derivation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are specified in the provided text. The method relies on standard SDP theory and rounding techniques from the optimization literature.

pith-pipeline@v0.9.0 · 5449 in / 1097 out tokens · 51438 ms · 2026-05-10T14:36:09.405006+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references

  1. [1]

    Determining protein structures from noesy distance constraints by semidefinite programming

    Babak Alipanahi, Nathan Krislock, Ali Ghodsi, Henry Wolkowicz, Logan Donaldson, and Ming Li. Determining protein structures from noesy distance constraints by semidefinite programming. Journal of Computational Biology, 20(4):296–310, 2013

  2. [2]

    The performance of neighbor-joining methods of phylogenetic reconstruction

    Kevin Atteson. The performance of neighbor-joining methods of phylogenetic reconstruction. Algorithmica, 25(2):251–278, 1999

  3. [3]

    SIAM, 2001

    Aharon Ben-Tal and Arkadi Nemirovski.Lectures on modern convex optimization: analysis, algorithms, and engineering applications. SIAM, 2001

  4. [4]

    Cambridge university press, 2004

    Stephen Boyd, Stephen P Boyd, and Lieven Vandenberghe.Convex optimization. Cambridge university press, 2004

  5. [5]

    A method for deducing branching sequences in phylogeny

    Joseph H Camin and Robert R Sokal. A method for deducing branching sequences in phylogeny. Evolution, 19(3):311–326, September 1965

  6. [6]

    New heuristics for phylogeny estimation under the balanced minimum evolution criterion.Bioinformatics, 41(7):btaf361, 2025

    Daniele Catanzaro, Henri Dehaybe, and Raffaele Pesenti. New heuristics for phylogeny estimation under the balanced minimum evolution criterion.Bioinformatics, 41(7):btaf361, 2025

  7. [7]

    The bal- anced minimum evolution problem.INFORMS Journal on Computing, 24(2):276–294, 2012

    Daniele Catanzaro, Martine Labb´ e, Raffaele Pesenti, and Juan-Jos´ e Salazar-Gonz´ alez. The bal- anced minimum evolution problem.INFORMS Journal on Computing, 24(2):276–294, 2012

  8. [8]

    Optimizing over path-length matrices of unrooted binary trees.Mathematical Programming, 215(1):453–505, 2026

    Daniele Catanzaro, Raffaele Pesenti, Allan Sapucaia, and Laurence Wolsey. Optimizing over path-length matrices of unrooted binary trees.Mathematical Programming, 215(1):453–505, 2026

  9. [9]

    On the balanced minimum evolution polytope.Discrete Optimization, 36:100570, 2020

    Daniele Catanzaro, Raffaele Pesenti, and Laurence Wolsey. On the balanced minimum evolution polytope.Discrete Optimization, 36:100570, 2020

  10. [10]

    Maximum likelihood of evolutionary trees: hardness and approxi- mation.Bioinformatics, 21(suppl 1):i97–i106, 2005

    Benny Chor and Tamir Tuller. Maximum likelihood of evolutionary trees: hardness and approxi- mation.Bioinformatics, 21(suppl 1):i97–i106, 2005

  11. [11]

    John Wiley & Sons, 1999

    Thomas M Cover.Elements of information theory. John Wiley & Sons, 1999

  12. [12]

    Computational complexity of inferring phylogenies from dissimilarity matrices

    William HE Day. Computational complexity of inferring phylogenies from dissimilarity matrices. Bulletin of mathematical biology, 49(4):461–467, 1987

  13. [13]

    Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle.Journal of computational biology, 9(5):687–705, 2002

    Richard Desper and Olivier Gascuel. Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle.Journal of computational biology, 9(5):687–705, 2002

  14. [14]

    Theoretical foundation of the balanced minimum evolu- tion method of phylogenetic inference and its relationship to weighted least-squares tree fitting

    Richard Desper and Olivier Gascuel. Theoretical foundation of the balanced minimum evolu- tion method of phylogenetic inference and its relationship to weighted least-squares tree fitting. Molecular Biology and Evolution, 21(3):587–598, 2004

  15. [15]

    Benchmarking optimization software with performance profiles.Mathematical programming, 91(2):201–213, 2002

    Elizabeth D Dolan and Jorge J Mor´ e. Benchmarking optimization software with performance profiles.Mathematical programming, 91(2):201–213, 2002

  16. [16]

    Origin and evolution of the octoploid strawberry genome.Nature genetics, 51(3):541–547, 2019

    Patrick P Edger, Thomas J Poorten, Robert VanBuren, Michael A Hardigan, Marivi Colle, Michael R McKain, Ronald D Smith, Scott J Teresi, Andrew DL Nelson, Ching Man Wai, et al. Origin and evolution of the octoploid strawberry genome.Nature genetics, 51(3):541–547, 2019

  17. [17]

    Convergence assessment for bayesian phylogenetic analysis using mcmc simulation.Methods in Ecology and Evolution, 13(1):77–90, 2022

    Luiza Guimaraes Fabreti and Sebastian H¨ ohna. Convergence assessment for bayesian phylogenetic analysis using mcmc simulation.Methods in Ecology and Evolution, 13(1):77–90, 2022

  18. [18]

    Approximating the balanced minimum evolution problem

    Samuel Fiorini and Gwena¨ el Joret. Approximating the balanced minimum evolution problem. Operations research letters, 40(1):31–35, 2012

  19. [19]

    Facets of the balanced minimal evolution poly- tope.Journal of mathematical biology, 73:447–468, 2016

    Stefan Forcey, Logan Keefe, and William Sands. Facets of the balanced minimal evolution poly- tope.Journal of mathematical biology, 73:447–468, 2016. 10

  20. [20]

    On the approximability of the fixed-tree balanced minimum evolution problem

    Martin Frohn. On the approximability of the fixed-tree balanced minimum evolution problem. Optimization Letters, 15(6):2321–2329, 2021

  21. [21]

    Neighbor-joining revealed.Molecular biology and evolution, 23(11):1997–2000, 2006

    Olivier Gascuel and Mike Steel. Neighbor-joining revealed.Molecular biology and evolution, 23(11):1997–2000, 2006

  22. [22]

    An evolution strategy approach for the balanced minimum evolution problem.Bioinformatics, 39(11):btad660, 2023

    Andrea Gasparin, Federico Julian Camerota Verd` u, Daniele Catanzaro, and Lorenzo Castelli. An evolution strategy approach for the balanced minimum evolution problem.Bioinformatics, 39(11):btad660, 2023

  23. [23]

    Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming.Journal of the ACM (JACM), 42(6):1115–1145, 1995

    Michel X Goemans and David P Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming.Journal of the ACM (JACM), 42(6):1115–1145, 1995

  24. [24]

    Cambridge University Press, 2019

    Dan Gusfield.Integer linear programming in computational and systems biology. Cambridge University Press, 2019

  25. [25]

    Haplotype phasing using semidefinite program- ming

    Konstantinos Kalpakis and Parag Namjoshi. Haplotype phasing using semidefinite program- ming. InFifth IEEE Symposium on Bioinformatics and Bioengineering (BIBE’05), pages 145–152. IEEE, 2005

  26. [26]

    On the genealogy of large populations.Journal of applied probability, 19(A):27– 43, 1982

    John FC Kingman. On the genealogy of large populations.Journal of applied probability, 19(A):27– 43, 1982

  27. [27]

    Cellphy: accurate and fast probabilistic inference of single-cell phylogenies from scdna-seq data.Genome biology, 23(1):37, 2022

    Alexey Kozlov, Joao M Alves, Alexandros Stamatakis, and David Posada. Cellphy: accurate and fast probabilistic inference of single-cell phylogenies from scdna-seq data.Genome biology, 23(1):37, 2022

  28. [28]

    Fastme 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program.Molecular biology and evolution, 32(10):2798– 2800, 2015

    Vincent Lefort, Richard Desper, and Olivier Gascuel. Fastme 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program.Molecular biology and evolution, 32(10):2798– 2800, 2015

  29. [29]

    The origins and spread of domestic horses from the western eurasian steppes.Nature, 598(7882):634– 640, 2021

    Pablo Librado, Naveed Khan, Antoine Fages, Mariya A Kusliy, Tomasz Suchan, Laure Tonasso- Calvi` ere, St´ ephanie Schiavinato, Duha Alioglu, Aurore Fromentier, Aude Perdereau, et al. The origins and spread of domestic horses from the western eurasian steppes.Nature, 598(7882):634– 640, 2021

  30. [30]

    High accuracy meets high throughput for near full-length 16s ribosomal rna amplicon sequencing on the nanopore platform.PNAS nexus, 3(10):pgae411, 2024

    Xuan Lin, Katherine Waring, Hans Ghezzi, Carolina Tropini, John Tyson, and Ryan M Ziels. High accuracy meets high throughput for near full-length 16s ribosomal rna amplicon sequencing on the nanopore platform.PNAS nexus, 3(10):pgae411, 2024

  31. [31]

    The influence of the number of tree searches on maximum likelihood inference in phylogenomics.Systematic Biology, 73(5):807–822, 2024

    Chao Liu, Xiaofan Zhou, Yuanning Li, Chris Todd Hittinger, Ronghui Pan, Jinyan Huang, Xue- xin Chen, Antonis Rokas, Yun Chen, and Xing-Xing Shen. The influence of the number of tree searches on maximum likelihood inference in phylogenomics.Systematic Biology, 73(5):807–822, 2024

  32. [32]

    L¨ ofberg

    J. L¨ ofberg. Yalmip : A toolbox for modeling and optimization in matlab. InIn Proceedings of the CACSD Conference, Taipei, Taiwan, 2004

  33. [33]

    Semidefinite programs and combinatorial optimization

    L´ aszl´ o Lov´ asz. Semidefinite programs and combinatorial optimization. InRecent advances in algorithms and combinatorics, pages 137–194. Springer, 2003

  34. [34]

    Global optimization in protein docking using clustering, underestimation and semidefinite programming.Optimisation Methods and Software, 22(5):803–811, 2007

    Roummel F Marcia, Julie C Mitchell, and Stephen J Wright. Global optimization in protein docking using clustering, underestimation and semidefinite programming.Optimisation Methods and Software, 22(5):803–811, 2007

  35. [35]

    Computability of global solutions to factorable nonconvex programs: Part i—convex underestimating problems.Mathematical programming, 10(1):147–175, 1976

    Garth P McCormick. Computability of global solutions to factorable nonconvex programs: Part i—convex underestimating problems.Mathematical programming, 10(1):147–175, 1976

  36. [36]

    Direct calculation of a tree length using a distance matrix.Journal of Molecular Evolution, 51:41–47, 2000

    Yves Pauplin. Direct calculation of a tree length using a distance matrix.Journal of Molecular Evolution, 51:41–47, 2000. 11

  37. [37]

    The neighbor-joining method: a new method for reconstructing phylogenetic trees.Mol

    N Saitou and M Nei. The neighbor-joining method: a new method for reconstructing phylogenetic trees.Mol. Biol. Evol., 4(4):406–425, 1987

  38. [38]

    Cyclic permutations and evolutionary trees.Advances in Applied Mathematics, 32(4):669–680, 2004

    Charles Semple and Mike Steel. Cyclic permutations and evolutionary trees.Advances in Applied Mathematics, 32(4):669–680, 2004

  39. [39]

    IOP Publishing, 2023

    Paul Skrzypczyk and Daniel Cavalcanti.Semidefinite programming in quantum information sci- ence. IOP Publishing, 2023

  40. [40]

    Phy- lobench: A benchmark for evaluating phylogenetic programs.Molecular Biology and Evolution, 41(6):msae084, 2024

    Sergey Spirin, Andrey Sigorskikh, Aleksei Efremov, Dmitry Penzar, and Anna Karyagina. Phy- lobench: A benchmark for evaluating phylogenetic programs.Molecular Biology and Evolution, 41(6):msae084, 2024

  41. [41]

    Semidefinite programming.SIAM review, 38(1):49–95, 1996

    Lieven Vandenberghe and Stephen Boyd. Semidefinite programming.SIAM review, 38(1):49–95, 1996

  42. [42]

    Trycycler: consensus long-read assemblies for bacterial genomes.Genome biology, 22(1):266, 2021

    Ryan R Wick, Louise M Judd, Louise T Cerdeira, Jane Hawkey, Guillaume M´ eric, Ben Vezina, Kelly L Wyres, and Kathryn E Holt. Trycycler: consensus long-read assemblies for bacterial genomes.Genome biology, 22(1):266, 2021

  43. [43]

    Springer Science & Business Media, 2012

    Henry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe.Handbook of semidefinite program- ming: theory, algorithms, and applications. Springer Science & Business Media, 2012

  44. [44]

    Evaluating fast maxi- mum likelihood-based phylogenetic programs using empirical phylogenomic data sets.Molecular biology and evolution, 35(2):486–503, 2018

    Xiaofan Zhou, Xing-Xing Shen, Chris Todd Hittinger, and Antonis Rokas. Evaluating fast maxi- mum likelihood-based phylogenetic programs using empirical phylogenomic data sets.Molecular biology and evolution, 35(2):486–503, 2018. 12