From Persistence to Survival: Hypothesis Testing, Effect Sizes and Vectorisation for Topological Features
Pith reviewed 2026-06-27 08:16 UTC · model grok-4.3
The pith
Treating persistence values in diagrams as survival times produces a unified representation for statistical tests, effect sizes, and stable feature vectors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
STRAND treats persistence diagrams as survival data, with each topological feature's persistence serving as a time-to-event datum and the survival function S(t) = P(p > t) as the central object; from this representation the method derives a non-parametric two-sample test with calibrated Type I error, interpretable effect sizes, and a 1-Wasserstein-stable feature vector for machine learning.
What carries the argument
The persistence survival function S(t) = P(p > t), which encodes the distribution of persistence values across diagrams and serves as the source for both the test statistic and the feature vector.
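As a minimal sketch of the object described above, the empirical version of S(t) can be computed directly from the persistence values p = d - b of one or more diagrams. The function names, the toy diagram, and the choice to sample S on a fixed grid to obtain a vector are illustrative assumptions, not the paper's implementation.

```python
from bisect import bisect_right

def persistences(diagram):
    """Persistence p = d - b for each (birth, death) pair."""
    return [d - b for b, d in diagram]

def survival_function(values):
    """Return the empirical S(t) = fraction of values strictly greater than t."""
    xs = sorted(values)
    n = len(xs)

    def S(t):
        # bisect_right counts the values <= t, so n minus that count is #{p > t}
        return (n - bisect_right(xs, t)) / n

    return S

# Hypothetical toy diagram with persistences 1, 3, 1, 3
toy_diagram = [(0.0, 1.0), (0.0, 3.0), (1.0, 2.0), (2.0, 5.0)]
S = survival_function(persistences(toy_diagram))

# Assumed vectorisation: sample S on a fixed grid of thresholds
grid = [0.0, 1.0, 2.0, 3.0]
feature_vector = [S(t) for t in grid]
print(feature_vector)  # [1.0, 0.5, 0.5, 0.0]
```

Pooling the persistence values of several diagrams before calling `survival_function` gives the survival curve of a collection in the same way.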
If this is right
- A non-parametric test becomes available that controls Type I error and achieves high power even with few diagrams.
- Effect sizes derived from the survival function supply an interpretable measure of difference between diagram collections.
- The resulting feature vector remains stable under 1-Wasserstein perturbations and can be fed directly into standard machine learning pipelines.
- The same representation applies to functional brain connectivity analysis in fMRI data.
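The summary does not say which effect sizes the paper defines. As a hedged illustration of what a survival-based effect size could look like, the sketch below computes two candidates: the gap in mean persistence (for non-negative values, the area under S(t) equals the mean) and the largest vertical gap between two empirical survival curves. Both quantities and all names are assumptions made for illustration.

```python
from bisect import bisect_right

def survival(sorted_vals, t):
    """Empirical S(t) = fraction of values strictly greater than t."""
    return (len(sorted_vals) - bisect_right(sorted_vals, t)) / len(sorted_vals)

def mean_persistence_gap(a, b):
    """Difference in mean persistence, i.e. in area under the two survival curves."""
    return sum(a) / len(a) - sum(b) / len(b)

def max_survival_gap(a, b):
    """Largest |S_a(t) - S_b(t)|.

    Both curves are step functions that change only at observed values,
    so it is enough to evaluate the gap at those values.
    """
    sa, sb = sorted(a), sorted(b)
    return max(abs(survival(sa, t) - survival(sb, t)) for t in set(sa) | set(sb))

# Hypothetical pooled persistence values for two groups of diagrams
group_a = [1.0, 1.0, 3.0, 3.0]
group_b = [2.0, 2.0, 4.0, 4.0]
print(mean_persistence_gap(group_a, group_b))  # -1.0
print(max_survival_gap(group_a, group_b))      # 0.5
```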
Where Pith is reading between the lines
- The survival framing may allow direct borrowing of censoring or competing-risks techniques when some topological features are only partially observed.
- Because the vector is 1-Wasserstein stable, it could be combined with existing Wasserstein-based kernels without additional stability proofs.
- The approach might generalize to other topological summaries that admit a natural ordering or magnitude, such as persistence landscapes or silhouettes.
Load-bearing premise
Treating each topological feature's persistence value as a fully observed time-to-event datum preserves all information needed for valid statistical comparison and stable vectorization.
What would settle it
Running the proposed two-sample test on synthetic manifolds with controlled topology differences and checking whether the empirical Type I error rate stays at the nominal level (for example 0.05) across repeated small-sample draws would directly test the calibration claim.
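The check described above can be sketched as a small simulation. The test used here is a stand-in, not STRAND's procedure: a permutation test that shuffles whole diagrams between groups and uses the area between pooled empirical survival curves as its statistic. The null model (exponentially distributed persistence values), the sample sizes, and all function names are assumptions chosen only to show the shape of the experiment.

```python
import random
from bisect import bisect_right

def survival(sorted_vals, t):
    """Empirical S(t) = fraction of values strictly greater than t."""
    return (len(sorted_vals) - bisect_right(sorted_vals, t)) / len(sorted_vals)

def survival_area(a, b):
    """Integral over t >= 0 of |S_a(t) - S_b(t)| for two samples."""
    sa, sb = sorted(a), sorted(b)
    area, prev = 0.0, 0.0
    for t in sorted(set(sa) | set(sb)):
        # Both curves are constant on [prev, t), so evaluate them at prev
        area += abs(survival(sa, prev) - survival(sb, prev)) * (t - prev)
        prev = t
    return area

def pooled(diagrams):
    """Flatten a list of diagrams (lists of persistence values)."""
    return [p for diagram in diagrams for p in diagram]

def permutation_pvalue(group_a, group_b, n_perm, rng):
    """Permutation p-value; labels are shuffled at the diagram level."""
    observed = survival_area(pooled(group_a), pooled(group_b))
    diagrams = group_a + group_b
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(diagrams)
        if survival_area(pooled(diagrams[:n_a]), pooled(diagrams[n_a:])) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)

def null_diagram(rng, n_features=15):
    """Assumed null model: persistence values drawn from Exp(1)."""
    return [rng.expovariate(1.0) for _ in range(n_features)]

def type1_rate(n_rep=100, n_diag=5, n_perm=99, alpha=0.05, seed=0):
    """Fraction of null replications in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        a = [null_diagram(rng) for _ in range(n_diag)]
        b = [null_diagram(rng) for _ in range(n_diag)]
        if permutation_pvalue(a, b, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_rep

if __name__ == "__main__":
    # Under a calibrated test this should sit near (or below) alpha = 0.05
    print(type1_rate())
```

Shuffling whole diagrams rather than individual features keeps features from the same diagram together, which matters if features within a diagram are not independent. A real calibration study would replace `null_diagram` with persistence diagrams computed from the synthetic manifolds and use many more replications.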
Original abstract
Persistence diagrams are common representations in topological data analysis, but they do not naturally live in a vector space, and the statistical tools developed for comparing them have largely evolved separately from those used for downstream prediction. We introduce STRAND (Survival Topological Representation ANalysis of Diagrams), which treats (collections of) PDs as survival data: each topological feature with persistence value $p = d - b$ is a fully observed time-to-event, and the persistence survival function $S(t) = \mathbb{P}(p > t)$ is the central object for comparing diagrams. From this single representation we derive (i) a non-parametric two-sample test with calibrated Type I error and high power from a small number of diagrams; (ii) interpretable effect sizes; and (iii) a 1-Wasserstein-stable feature vector for downstream machine learning. We validate calibration and power on synthetic manifolds with controlled topology, demonstrate competitive vectorisation across 14 graph and 3D point cloud benchmarks, and apply the method to study functional brain connectivity in fMRI/neuroscience data. To our knowledge, STRAND is the first method to provide hypothesis testing and vectorisation for persistence diagrams from a single coherent and interpretable representation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces STRAND, which models collections of persistence diagrams as survival data by treating each topological feature's persistence value p = d - b as a fully observed time-to-event observation. The persistence survival function S(t) = P(p > t) is positioned as the central object, from which the authors derive (i) a non-parametric two-sample test with calibrated Type I error and high power, (ii) interpretable effect sizes, and (iii) a 1-Wasserstein-stable feature vector for downstream machine learning. The approach is validated on synthetic manifolds with controlled topology, 14 graph and 3-D point cloud benchmarks, and an fMRI neuroscience application, with the claim that it is the first method to unify hypothesis testing and vectorization from a single coherent representation.
Significance. If the representation proves sufficient for both valid inference and stable vectorization, the work would provide a unified, interpretable bridge between topological data analysis and survival analysis tools, potentially simplifying statistical practice in TDA while offering effect-size interpretability that is currently rare in diagram comparisons.
major comments (2)
- [Abstract] Abstract: the claim that the derived feature vector is '1-Wasserstein-stable' does not follow from the construction. The survival function S(t) is computed solely from the multiset of persistence values p = d - b; however, the 1-Wasserstein distance on persistence diagrams is defined via optimal matching in the (b, d) plane (with diagonal projection), so two diagrams can share identical persistence multisets yet exhibit arbitrarily large W1 distance when their birth coordinates differ. This directly challenges the stability assertion for the vectorization component.
- [Abstract] Abstract (paragraph on the survival representation): the assumption that the marginal distribution of p is a sufficient statistic for both the hypothesis test and the vectorization is load-bearing but unexamined. Because standard PD distances and stability results operate on the full (b, d) coordinates, treating the persistence values p as i.i.d. time-to-event data risks invalidating the claimed Type I error calibration and the 1-Wasserstein stability when birth times carry topological information.
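The kind of pair described in the first major comment can be made concrete with a minimal sketch. The diagrams below are hypothetical, and the brute-force matcher is an illustration for tiny diagrams only. It assumes the L-infinity ground metric with a diagonal cost of p/2 per unmatched point, which the source does not specify.

```python
from itertools import permutations

def w1_bruteforce(d1, d2):
    """1-Wasserstein distance between two tiny persistence diagrams.

    Each diagram is padded with diagonal slots so that any point may be
    matched to the diagonal; all matchings are then enumerated.
    """
    n, m = len(d1), len(d2)
    size = n + m

    def cost(i, j):
        a = d1[i] if i < n else None  # None stands for a diagonal slot
        b = d2[j] if j < m else None
        if a is None and b is None:
            return 0.0
        if a is None:
            return (b[1] - b[0]) / 2  # send b to the diagonal
        if b is None:
            return (a[1] - a[0]) / 2  # send a to the diagonal
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

    return min(
        sum(cost(i, perm[i]) for i in range(size))
        for perm in permutations(range(size))
    )

def persistence_multiset(diagram):
    return sorted(d - b for b, d in diagram)

# Same persistence values, different birth coordinates
D1 = [(0.0, 4.0)]
D2 = [(4.0, 8.0)]
print(persistence_multiset(D1) == persistence_multiset(D2))  # True
print(w1_bruteforce(D1, D2))  # 4.0
```

For this pair the persistence multisets coincide, so any summary built from S(t) alone is identical for both diagrams, while the matching distance between them is nonzero. Since sending every point to the diagonal is always a feasible matching, the distance between two diagrams is at most the sum of their half-persistences, so how far apart such a pair can be depends on the persistence values themselves.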
minor comments (1)
- [Abstract] The abstract states 'competitive vectorisation across 14 graph and 3D point cloud benchmarks' without naming the baselines, metrics, or exclusion rules; these details are needed to assess the performance claims.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on the manuscript. We respond to each major comment below.
- Referee: [Abstract] Abstract: the claim that the derived feature vector is '1-Wasserstein-stable' does not follow from the construction. The survival function S(t) is computed solely from the multiset of persistence values p = d - b; however, the 1-Wasserstein distance on persistence diagrams is defined via optimal matching in the (b, d) plane (with diagonal projection), so two diagrams can share identical persistence multisets yet exhibit arbitrarily large W1 distance when their birth coordinates differ. This directly challenges the stability assertion for the vectorization component.
  Authors: We agree with the referee's analysis. The feature vector is obtained from the survival function of the persistence values p alone and therefore cannot be guaranteed to satisfy stability with respect to the 1-Wasserstein distance on the full (b, d) diagrams. We will revise the abstract to remove the specific claim of 1-Wasserstein stability.
  Revision: yes
- Referee: [Abstract] Abstract (paragraph on the survival representation): the assumption that the marginal distribution of p is a sufficient statistic for both the hypothesis test and the vectorization is load-bearing but unexamined. Because standard PD distances and stability results operate on the full (b, d) coordinates, treating the persistence values p as i.i.d. time-to-event data risks invalidating the claimed Type I error calibration and the 1-Wasserstein stability when birth times carry topological information.
  Authors: The referee correctly notes that the method relies on the marginal distribution of persistence values. While the synthetic experiments demonstrate calibrated Type I error under this modeling choice, birth coordinates can carry additional information in some applications. We will add a clarifying paragraph in the discussion section acknowledging this modeling assumption and its scope, without altering the core procedure.
  Revision: partial
Circularity Check
No circularity: STRAND applies standard survival methods to a new representation
The paper introduces the survival function S(t) = P(p > t) computed from persistence values p = d - b as an explicit modeling choice, then derives the two-sample test, effect sizes, and 1-Wasserstein-stable vector by applying established non-parametric survival techniques (e.g., log-rank or Kaplan-Meier style estimators) to this representation. No equation reduces any claimed output to a fitted parameter or input quantity by construction, and the provided abstract and description contain no self-citations that serve as load-bearing uniqueness theorems or ansatzes. The derivation chain is therefore self-contained: the statistical properties follow from the external validity of survival analysis once the representation is adopted, rather than from re-labeling or re-fitting the same data.
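One step the rationale leaves implicit: because every persistence value is treated as fully observed (no censoring), a Kaplan-Meier estimator reduces to the plain empirical survival function. The notation below (distinct ordered values $t_{(j)}$, event counts $d_j$, at-risk counts $n_j$, sample size $n$) is introduced here for illustration and is not taken from the paper.

```latex
\begin{align*}
\hat S_{\mathrm{KM}}(t)
  &= \prod_{j:\, t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right),
  \qquad n_1 = n, \quad n_{j+1} = n_j - d_j \ \text{(no censoring)} \\
  &= \prod_{j:\, t_{(j)} \le t} \frac{n_{j+1}}{n_j}
   = \frac{n - \sum_{j:\, t_{(j)} \le t} d_j}{n}
   = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{p_i > t\}.
\end{align*}
```

The product telescopes, leaving the fraction of persistence values that exceed $t$, which is the empirical counterpart of $S(t) = P(p > t)$.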
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: persistence values p = d - b can be treated as fully observed time-to-event data for constructing a survival function S(t) = P(p > t).