Tree-Structured Orthonormal Decomposition of the Aitchison Simplex
Pith reviewed 2026-06-27 11:05 UTC · model grok-4.3
The pith
PolyILR produces a canonical orthonormal decomposition of the Aitchison tangent space aligned with any tree topology.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PolyILR yields a canonical orthonormal decomposition of the Aitchison tangent space aligned with any tree topology. The construction defines a weighted local geometry at each internal node that captures the full branching structure, then lifts these local geometries to a global orthonormal basis in which every coordinate corresponds to a specific tree location. On microbiome and single-cell benchmarks, PolyILR yields stable, interpretable features and enables inference at multiple tree resolutions. The paper also establishes a novel theoretical connection to softmax classifiers.
What carries the argument
The PolyILR construction, which defines a weighted local geometry at each internal node to capture the full branching structure and lifts these local geometries to a global orthonormal basis that preserves the Aitchison inner product.
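The mechanism can be illustrated with a short sketch. It is our reading, not the paper's published algorithm: it assumes the weight of child i at a node is its leaf count n_i, picks a local orthonormal basis with a Helmert-style ordering of the children, and lifts each local vector to clr space by spreading value c_i over the leaves of child i. The nested-tuple tree encoding and the names `leaves`, `local_contrasts`, `tree_basis`, and `ilr` are hypothetical.

```python
import numpy as np

# Hypothetical tree encoding: a leaf is an int index in 0..D-1 and an
# internal node is a tuple of two or more children.
def leaves(node):
    """List the leaf indices below a node."""
    if isinstance(node, int):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def local_contrasts(sizes):
    """Orthonormal basis of {c : sum_i n_i c_i = 0} under the weighted
    inner product <c, d> = sum_i n_i c_i d_i, with n_i = sizes[i].
    Helmert-style: child j is contrasted with the pooled children 0..j-1."""
    k = len(sizes)
    basis = []
    for j in range(1, k):
        r = sum(sizes[:j])   # leaves in the pooled earlier children
        s = sizes[j]         # leaves in child j
        scale = np.sqrt(r * s / (r + s))
        c = np.zeros(k)
        c[:j] = scale / r
        c[j] = -scale / s
        basis.append(c)
    return basis


def tree_basis(tree, D):
    """Lift every node's local contrasts to R^D (clr space): the lifted
    vector equals c_i on the leaves of child i and 0 elsewhere.
    Returns the basis matrix and a (node path, contrast index) label per row."""
    rows, labels = [], []

    def visit(node, path):
        if isinstance(node, int):
            return
        groups = [leaves(child) for child in node]
        sizes = [len(g) for g in groups]
        for m, c in enumerate(local_contrasts(sizes)):
            v = np.zeros(D)
            for value, group in zip(c, groups):
                v[group] = value
            rows.append(v)
            labels.append((path, m))
        for i, child in enumerate(node):
            visit(child, path + (i,))

    visit(tree, ())
    return np.array(rows), labels


def ilr(x, V):
    """Tree-aligned coordinates of a strictly positive composition x."""
    logx = np.log(x)
    return V @ (logx - logx.mean())


# Example: a root with three children, one of which is itself a polytomy.
tree = ((0, 1, 2), (3, 4), 5)
V, labels = tree_basis(tree, D=6)
x = np.array([0.10, 0.20, 0.30, 0.15, 0.05, 0.20])
z = ilr(x, V)
```

Each row of `V` is labeled by the node it came from, which is the sense in which every coordinate corresponds to a tree location. The Helmert ordering depends on how the children are listed, so this sketch is not canonical in the paper's sense; it only shows that weighted local bases can lift to one global orthonormal basis on a tree with a three-way split.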
If this is right
- Every coordinate in the resulting basis corresponds to a specific location on the input tree.
- The features remain stable across different tree resolutions for microbiome and single-cell data.
- Inference becomes possible at multiple scales of the same tree without discarding geometry or structure.
- A direct theoretical link appears between the decomposition and softmax classifiers.
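One elementary identity that a softmax connection could rest on (we do not claim it is the paper's theorem): the clr transform of softmax(z) equals z minus its mean, so for any orthonormal basis V of the clr tangent space, the coordinates of a softmax output are simply V z. The sketch below checks this numerically, with a generic Helmert basis standing in for a tree-aligned one; `helmert_basis` and the other names are illustrative.

```python
import numpy as np


def softmax(z):
    """Softmax with the usual max-shift for numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()


def clr(p):
    """Centered log-ratio of a strictly positive composition."""
    logp = np.log(p)
    return logp - logp.mean()


def helmert_basis(D):
    """Rows form an orthonormal basis of {y in R^D : sum(y) = 0}."""
    V = np.zeros((D - 1, D))
    for j in range(1, D):
        V[j - 1, :j] = 1.0 / j
        V[j - 1, j] = -1.0
        V[j - 1] *= np.sqrt(j / (j + 1))
    return V


rng = np.random.default_rng(0)
logits = rng.normal(size=5)
V = helmert_basis(5)

# clr undoes softmax up to centering, so the ILR coordinates of the predicted
# probabilities are a fixed linear map of the logits (V @ 1 = 0 removes the shift).
coords_from_probs = V @ clr(softmax(logits))
coords_from_logits = V @ logits
```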
Where Pith is reading between the lines
- The same lifting procedure might extend to other hierarchical structures such as ontologies or phylogenies outside the tested domains.
- The connection to softmax suggests the coordinates could serve as a drop-in replacement for standard inputs in probabilistic models that already use trees.
- Because the basis is canonical for any tree, it could support direct comparison of models trained on different but related hierarchies.
Load-bearing premise
A weighted local geometry defined at each internal node can be lifted to a single global orthonormal basis while preserving the Aitchison inner product and capturing the full branching structure for arbitrary trees.
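One way this premise can hold, sketched under an assumption that is our reading rather than the paper's stated definition: the local weights at an internal node v are the leaf counts n_i = |G_i| of its children's leaf sets G_1, ..., G_k. Work in clr coordinates, where the Aitchison inner product of two compositions is the Euclidean inner product of their clr vectors.

```latex
% Local space at node v with the leaf-count-weighted inner product
H_v = \{ c \in \mathbb{R}^k : \textstyle\sum_{i=1}^{k} n_i c_i = 0 \}, \qquad
\langle c, d \rangle_v = \sum_{i=1}^{k} n_i \, c_i d_i .

% Lift: constant value c_i on the leaves of child i, zero elsewhere
L_v(c) = \sum_{i=1}^{k} c_i \, \mathbf{1}_{G_i} \in \mathbb{R}^D .

% The lift is an isometry into the clr tangent space
\langle L_v c, L_v d \rangle
  = \sum_{i=1}^{k} \sum_{\ell \in G_i} c_i d_i
  = \sum_{i=1}^{k} n_i \, c_i d_i = \langle c, d \rangle_v ,
\qquad
\mathbf{1}^{\top} L_v(c) = \sum_{i=1}^{k} n_i c_i = 0 .
```

Under this assumption, any orthonormal basis of H_v (dimension k - 1) lifts to k - 1 orthonormal vectors of the tangent space for every branching factor k >= 2.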
What would settle it
An explicit counterexample tree (binary or non-binary) for which the lifted coordinates fail to remain orthonormal under the Aitchison inner product or fail to span the full tangent space.
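This test is easy to run numerically. A minimal sketch, assuming the candidate basis is available as a matrix whose rows are clr vectors: the hypothetical helper `check_basis` reports the four properties a counterexample would violate. The example bases are generic stand-ins, not PolyILR output; probing the claim would mean feeding in rows produced for binary and polytomous trees.

```python
import numpy as np


def check_basis(V):
    """Flags for a candidate basis V (one row per tree coordinate) of the
    clr tangent space {y in R^D : sum(y) = 0}."""
    m, D = V.shape
    return {
        "count": m == D - 1,
        "tangent": bool(np.allclose(V @ np.ones(D), 0.0)),
        "orthonormal": bool(np.allclose(V @ V.T, np.eye(m))),
        "spans": int(np.linalg.matrix_rank(V)) == D - 1,
    }


def successive_contrasts(D):
    """Rows e_i - e_{i+1}: tangent and spanning, but not orthonormal."""
    V = np.zeros((D - 1, D))
    for i in range(D - 1):
        V[i, i], V[i, i + 1] = 1.0, -1.0
    return V


D = 6
bad = successive_contrasts(D)
# Orthonormalizing the same rows (reduced QR of the transpose) repairs them.
Q, _ = np.linalg.qr(bad.T)
good = Q.T

print(check_basis(bad))
print(check_basis(good))
```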
Original abstract
Compositional data -- vectors encoding relative proportions -- arise across scientific domains, including ecology, geochemistry, and genomics. The features in these data often come with known hierarchical structure (e.g., taxonomies, phylogenies, ontologies), yet existing methods either ignore this structure, discard the intrinsic Aitchison geometry, are designed for binary trees, or yield incomplete coordinate systems. We describe PolyILR, a canonical orthonormal decomposition of the Aitchison tangent space aligned with any tree topology. Our construction defines a weighted local geometry at each internal node capturing full branching structure, then lifts these to a global orthonormal basis where every coordinate corresponds to a specific tree location. On microbiome and single-cell benchmarks, PolyILR yields stable, interpretable features and enables inference at multiscale tree resolution. We also establish a novel theoretical connection to softmax classifiers, suggesting possible applications to probabilistic modeling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PolyILR as a canonical orthonormal decomposition of the Aitchison tangent space aligned with arbitrary tree topologies for compositional data. It defines a weighted local geometry at each internal node that captures the full branching structure and lifts these local bases to a single global orthonormal basis in which each coordinate corresponds to a specific tree location. The work reports stable interpretable features on microbiome and single-cell benchmarks, multiscale inference capability, and a theoretical link to softmax classifiers.
Significance. If the lifting step is shown to preserve the Aitchison inner product and produce a complete basis for any tree (including nodes with branching factor >2), the result would supply a principled, tree-aligned coordinate system for compositional data that respects the intrinsic geometry while enabling hierarchical analysis; this would be a useful contribution to compositional data methods in ecology, genomics, and related fields.
major comments (1)
- [Main construction (lifting step)] The central construction asserts that weighted local geometries defined at internal nodes can be lifted to a global orthonormal basis of the (D-1)-dimensional Aitchison tangent space while preserving the inner product for arbitrary (including non-binary) trees. The manuscript must supply an explicit verification or inductive argument showing that the lifted vectors remain orthogonal and span the full space when an internal node has three or more children; without this, the claim that the decomposition is canonical and complete for general tree topologies is not secured.
minor comments (2)
- [Abstract] The abstract asserts that the construction exists and reports benchmark outcomes but supplies no derivation outline, error analysis, or data-exclusion criteria; the full manuscript should include a concise proof sketch or algorithmic pseudocode for the lifting procedure.
- [Methods] Notation for the weighted local inner product at each node and the precise definition of the lifting map should be introduced with explicit equations rather than descriptive text only.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying a point where the theoretical justification can be strengthened. We agree that an explicit inductive argument for the lifting step on non-binary trees will improve the manuscript and will add it in revision.
Point-by-point responses
- Referee: [Main construction (lifting step)] The central construction asserts that weighted local geometries defined at internal nodes can be lifted to a global orthonormal basis of the (D-1)-dimensional Aitchison tangent space while preserving the inner product for arbitrary (including non-binary) trees. The manuscript must supply an explicit verification or inductive argument showing that the lifted vectors remain orthogonal and span the full space when an internal node has three or more children; without this, the claim that the decomposition is canonical and complete for general tree topologies is not secured.
Authors: We acknowledge that while the manuscript states the construction holds for arbitrary trees and provides the local weighted geometry and lifting procedure, an explicit inductive verification of orthogonality and completeness for nodes with branching factor greater than two is not supplied in detail. In the revised manuscript we will insert a new subsection (or appendix) containing a short inductive argument: the base case for binary nodes follows directly from the local orthonormal construction; the inductive step for a node with k>2 children shows that the weighted local basis vectors remain mutually orthogonal under the Aitchison inner product, that their lifts are orthogonal to all previously lifted vectors from other subtrees, and that the resulting set spans the full (D-1)-dimensional tangent space. This addition will make the canonicity claim fully rigorous without altering any other results or claims. revision: yes
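An argument of the kind promised can be sketched directly, without induction on k, under an assumption that is our reading of the construction rather than its stated form: each internal node v with child leaf sets G_1, ..., G_{k_v} lifts a local vector c to L_v(c) = sum_i c_i 1_{G_i}, with sum_i |G_i| c_i = 0 and local bases orthonormal under the leaf-count-weighted inner product.

```latex
% Cross-node orthogonality. If u and w have disjoint leaf sets, the lifts
% have disjoint supports. If w is a proper descendant of u, then
% leaves(w) \subseteq G_i for one child i of u, where L_u c is constant:
\langle L_u c, L_w d \rangle
  = c_i \sum_{\ell \in \mathrm{leaves}(w)} (L_w d)_\ell
  = c_i \, \mathbf{1}^{\top} L_w(d) = 0 .

% Dimension count. With I internal nodes and D leaves, the tree has
% I + D nodes and I + D - 1 edges, each running from an internal node
% to one of its children, so \sum_v k_v = I + D - 1 and
\sum_{v \ \text{internal}} (k_v - 1) = (I + D - 1) - I = D - 1 .
```

The D - 1 lifted vectors are therefore mutually orthonormal, hence linearly independent, hence a basis of the (D-1)-dimensional tangent space. Neither step uses k_v = 2.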
Circularity Check
No circularity: construction presented as novel without reduction to inputs or self-citations
Full rationale
The provided abstract and description outline a construction that defines weighted local geometry at internal nodes and lifts it to a global orthonormal basis of the Aitchison tangent space. No equations, fitted parameters, or self-citations are shown that would make any claimed prediction or uniqueness equivalent to the inputs by definition. The central claim of a canonical decomposition aligned with arbitrary tree topologies is presented as a new method without load-bearing reliance on prior author work or renaming of known results. This is the typical case of a self-contained derivation.