k-Nearest Neighbors in Gromov--Wasserstein Space
Pith reviewed 2026-06-27 12:01 UTC · model grok-4.3
The pith
The k-nearest neighbors classifier is universally consistent under the Gromov-Wasserstein distance when applied to graphs represented as metric measure spaces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove the universal consistency of the GW-k-NN classifier on the space of equivalence classes of metric measure spaces with finite support and uniform probability measure. By viewing graphs as finitely supported metric measure spaces equipped with the pairwise distance metric and a uniform probability measure on the nodes, we obtain universal consistency of GW-k-NN for the space of graphs. Likewise for fGW-k-NN, we prove universal consistency on the space of weak isomorphism classes of structured objects consisting of metric measure spaces with finite support and uniform probability measure and feature maps into Euclidean space.
What carries the argument
The Gromov-Wasserstein distance between equivalence classes of finite metric measure spaces, which defines a metric on the quotient space and thereby turns the set of graphs into a metric space on which k-NN can be run directly.
If this is right
- GW-k-NN is consistent for any classification problem whose data are finite graphs.
- fGW-k-NN is consistent for any classification problem whose data are finite node-attributed graphs.
- Consistency holds without first embedding the graphs into a Euclidean vector space.
- The same consistency guarantee applies to any data type that can be faithfully cast as a finite metric measure space with uniform measure.
Where Pith is reading between the lines
- The result supplies a justification for trying other nonparametric classifiers inside the same GW metric space.
- It raises the question whether similar consistency statements hold when the uniform measure on nodes is replaced by a non-uniform measure derived from node degrees or other graph statistics.
- The framework immediately suggests testing whether the same k-NN rule remains consistent when the underlying distance is replaced by other optimal-transport-type distances on graphs.
Load-bearing premise
Representing a graph by its nodes equipped with pairwise distances and the uniform probability measure on nodes preserves all information needed for the classification task.
What would settle it
A concrete sequence of probability distributions on graphs (with labels) for which the GW-k-NN error rate remains bounded away from the Bayes error for arbitrarily large sample sizes.
Figures
read the original abstract
The Gromov--Wasserstein (GW) distance provides a framework for comparing metric measure spaces, regardless of their underlying structure or geometry. For network-based data, it enables direct comparisons of graphs with different numbers of nodes, without requiring an embedding or other abstraction. Furthermore, through a variant of GW known as fused Gromov--Wasserstein (fGW), it is also possible to incorporate node features in addition to graph structure. In this work, we implement $k$-nearest neighbors ($k$-NN) classification using the GW and fGW distances. We prove the universal consistency of the GW-$k$-NN classifier on the space of equivalence classes of metric measure spaces with finite support and uniform probability measure. By viewing graphs as finitely supported metric measure spaces equipped with the pairwise distance metric and a uniform probability measure on the nodes, we obtain universal consistency of GW-$k$-NN for the space of graphs. Likewise for fGW-$k$-NN, we prove universal consistency on the space of weak isomorphism classes of structured objects consisting of metric measure spaces with finite support and uniform probability measure and feature maps into Euclidean space, thus establishing universal consistency on the space of node-attributed graphs. Our numerical experiments show that GW-$k$-NN and fGW-$k$-NN consistently perform well across multiple graph datasets, suggesting that metric classifiers such as $k$-NN work well in the GW framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper implements k-NN classification using Gromov-Wasserstein (GW) and fused GW (fGW) distances on metric measure spaces. It proves universal consistency of GW-k-NN on the quotient space of equivalence classes of finite-support uniform mm-spaces, then identifies graphs with these objects via pairwise distances and uniform node measures to claim consistency for graph classification; an analogous result holds for fGW-k-NN on node-attributed graphs. Experiments on several graph datasets are reported to show competitive performance.
Significance. If the consistency theorems are correctly established, the result supplies a nonparametric consistency guarantee for a metric classifier directly on the GW space of mm-spaces and, by the stated identification, on graphs. This is a substantive theoretical contribution because it avoids embedding or alignment steps and applies to graphs of varying size. The experiments provide supporting empirical evidence, though without reported variance or detailed baselines their strength is limited.
major comments (2)
- [§5] §4 (proof of universal consistency) and §5 (extension to graphs): the central claim that consistency on the quotient space of finite-support uniform mm-spaces immediately yields consistency for graph classification rests on the unstated assumption that every graph distribution and label function factors through the pairwise-distance + uniform-measure representation without loss; no theorem is given showing that labels depending on directedness, edge attributes, or other non-metric information remain measurable functions on the GW quotient.
- [§5.1] §5.1 (fGW case): the analogous claim for node-attributed graphs via fGW likewise identifies structured objects with mm-spaces plus Euclidean feature maps, but does not address whether the weak-isomorphism classes preserve label-relevant information when node features interact with non-metric graph properties.
minor comments (2)
- [Experiments] Experiments section: error bars, standard deviations, or multiple random seeds are not reported, and baseline details (e.g., which competing methods and hyper-parameter ranges) are only sketched.
- [§3] Notation: the precise definition of the quotient space (equivalence classes under GW) and the topology used for universal consistency should be stated explicitly before the theorem, rather than referenced only in passing.
Simulated Author's Rebuttal
We thank the referee for their careful reading and valuable comments. We respond to the major comments below and will make revisions to clarify the scope of our results.
read point-by-point responses
-
Referee: [§5] §4 (proof of universal consistency) and §5 (extension to graphs): the central claim that consistency on the quotient space of finite-support uniform mm-spaces immediately yields consistency for graph classification rests on the unstated assumption that every graph distribution and label function factors through the pairwise-distance + uniform-measure representation without loss; no theorem is given showing that labels depending on directedness, edge attributes, or other non-metric information remain measurable functions on the GW quotient.
Authors: The paper establishes universal consistency for the GW-k-NN classifier on the quotient space of finite-support uniform mm-spaces. The extension to graphs is made by identifying undirected graphs (without edge attributes) with such spaces via their pairwise distance metric and uniform node measure. This identification is the standard one used in the GW literature for graphs. For label functions that depend on properties not encoded in this representation, such as directedness or edge attributes, the GW distance cannot distinguish them, and our consistency result does not apply to those cases. We did not intend to claim universality over all possible graph distributions. We will revise the manuscript to explicitly delimit the class of graphs considered and to note that the relevant label functions are measurable with respect to the quotient space under this identification. revision: yes
-
Referee: [§5.1] §5.1 (fGW case): the analogous claim for node-attributed graphs via fGW likewise identifies structured objects with mm-spaces plus Euclidean feature maps, but does not address whether the weak-isomorphism classes preserve label-relevant information when node features interact with non-metric graph properties.
Authors: Analogously, the fGW result applies to the weak isomorphism classes of mm-spaces equipped with feature maps. For node-attributed graphs, we use this representation assuming that the node features and the metric structure capture the relevant information for the labels. If node features interact with non-metric properties in a way not preserved, the result would not hold. We will add a clarifying statement in §5.1 regarding the scope. revision: yes
Circularity Check
No circularity: mathematical consistency proof is self-contained
full rationale
The central claim is a proof of universal consistency of GW-k-NN on the quotient space of finite-support uniform mm-spaces (and likewise for fGW on structured objects). This is a direct extension of classical k-NN consistency theorems in metric spaces to the GW metric; the derivation chain consists of standard measure-theoretic arguments and does not reduce any quantity to a fitted parameter, self-definition, or load-bearing self-citation. The modeling step that identifies graphs with pairwise-distance mm-spaces is an explicit assumption stated in the abstract, not a hidden circular reduction. No equations or steps in the provided text exhibit the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Graphs are faithfully modeled as metric measure spaces with finite support and uniform probability measure on nodes.
- standard math Universal consistency of k-NN holds in the GW metric on the space of equivalence classes of finite-support mm-spaces.
Reference graph
Works this paper leans on
-
[1]
Sur la distance de Nagata.C
Patrice Assouad. Sur la distance de Nagata.C. R. Acad. Sci. Paris, 294(1):31–34, 1982
1982
-
[2]
Recouvrements, Derivation Des Measures et Dimensions
Patrice Assouad and Thierry Quentin de Gromard. Recouvrements, Derivation Des Measures et Dimensions. Rev. Mat. Iberoam, 22(3):893–953, 2006
2006
-
[3]
The Z-Gromov-Wasserstein Distance.Journal of Machine Learning Research, 26:1–57, 2025
Martin Bauer, Facundo M´emoli, Tom Needham, and Mao Nishino. The Z-Gromov-Wasserstein Distance.Journal of Machine Learning Research, 26:1–57, 2025
2025
-
[4]
On a Linear Gromov-Wasserstein Distance.IEEE Transactions on Image Processing, 31:7292–7305, 2022
Florian Beier, Robert Beinert, and Gabriele Steidl. On a Linear Gromov-Wasserstein Distance.IEEE Transactions on Image Processing, 31:7292–7305, 2022
2022
-
[5]
Borgwardt and Hans-Peter Kriegel
Karsten M. Borgwardt and Hans-Peter Kriegel. Shortest-path kernels on graphs. InProceedings of the Fifth IEEE International Conference on Data Mining, ICDM ’05, page 74–81, USA, 2005. IEEE Computer Society. ISBN 0769522785. doi: 10.1109/ICDM.2005.132. URLhttps://doi.org/10.1109/ICDM.2005.132. 16
-
[6]
Borgwardt, Cheng Soon Ong, Stefan Sch¨onauer, S
Karsten M. Borgwardt, Cheng Soon Ong, Stefan Sch¨onauer, S. V. N. Vishwanathan, Alex J. Smola, and Hans-Peter Kriegel. Protein function prediction via graph kernels.Bioinformatics, 21(1):47–56, January 2005. ISSN 1367-
2005
-
[7]
URLhttps://doi.org/10.1093/bioinformatics/bti1007
doi: 10.1093/bioinformatics/bti1007. URLhttps://doi.org/10.1093/bioinformatics/bti1007
-
[8]
Classification with Imperfect Training Labels
Timothy Cannings, Yingying Fan, and Richard Samworth. Classification with Imperfect Training Labels. Biometrika, 107(2):311–330, 2020
2020
-
[9]
Nearest Neighbor Classification in Infinite Dimension.ESAIM: Probability and Statistics, 10:340–355, 2006
Fr ´ed´eric C´erou and Arnaud Guyader. Nearest Neighbor Classification in Infinite Dimension.ESAIM: Probability and Statistics, 10:340–355, 2006
2006
-
[10]
Rates of convergence for nearest neighbor classification
Kamalika Chaudhuri and Sanjoy Dasgupta. Rates of convergence for nearest neighbor classification. In Z. Ghahra- mani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors,Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URLhttps://proceedings.neurips.cc/ paper_files/paper/2014/file/2b764b803acec2d590...
2014
-
[11]
Benoit Collins, Sushma Kumari, and Vladimir Pestov. Universal consistency of the𝑘-nn rule in metric spaces and nagata dimension.Probability and Statistics, 24:914–934, 2020. doi: 10.1051/ps/2020018. URLhttps: //www.numdam.org/articles/10.1051/ps/2020018/
-
[12]
Cover and P
T. Cover and P. Hart. Nearest neighbor pattern classification.IEEE Transactions on Information Theory, 13(1): 21–27, 1967
1967
-
[14]
Springer, 1996
Luc Devroye, L ´aszl´o Gy¨orfi, and Gabor Lugosi.A Probabilistic Theory of Pattern Recognition. Springer, 1996
1996
-
[15]
Paul Dobson and Andrew Doig. Distinguishing enzyme structures from non-enzymes without alignments.Journal of Molecular Biology, 330(4):771–783, 2003. ISSN 0022-2836. doi: https://doi.org/10.1016/S0022-2836(03) 00628-4. URLhttps://www.sciencedirect.com/science/article/pii/S0022283603006284
-
[16]
Rate of Convergence of k-Nearest-Neighbor Classification Rule
Maik D ¨oring, L´aszl´o Gy¨orfi, and Harro Walk. Rate of Convergence of k-Nearest-Neighbor Classification Rule. Journal of Machine Learning Research, 18:1–16, 2018
2018
-
[17]
Scalable kernels for graphs with continuous attributes
Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, and Karsten Borgwardt. Scalable kernels for graphs with continuous attributes. InProceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, NIPS’13, page 216–224, Red Hook, NY, USA, 2013. Curran Associates Inc
2013
-
[18]
Alaya, Aur´elie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, L´eo Gautheron, Nathalie T.H
R ´emi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aur´elie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, L´eo Gautheron, Nathalie T.H. Gayraud, Hicham Janati, Alain Rakotomamonjy, Ievgen Redko, Antoine Rolet, Antony Schutz, Vivien Seguy, Danica J. Sutherland, Romain Tavenard, Alexander T...
2021
-
[19]
Scalable gromov–wasserstein based comparison of biological time series.Bulletin of Mathematical Biology, 85(77), 2023
Natalia Kravtsova, Reginald McGee, and Adriana Dawes. Scalable gromov–wasserstein based comparison of biological time series.Bulletin of Mathematical Biology, 85(77), 2023
2023
-
[20]
PhD thesis, Kyoto University, 2018
Sushma Kumari.Topics in Random Matrices and Statistical Machine Learning. PhD thesis, Kyoto University, 2018
2018
-
[21]
Nagata dimension, quasisymmetric embeddings, and lipschitz extensions
Urs Lang and Thilo Schlichenmaier. Nagata dimension, quasisymmetric embeddings, and lipschitz extensions. International Mathematics Research Notices, 2005(58):3625–3655, 01 2005. ISSN 1073-7928. doi: 10.1155/ IMRN.2005.3625. URLhttps://doi.org/10.1155/IMRN.2005.3625
-
[22]
The pharmacophore kernel for virtual screening with support vector machines.Journal of Chemical Information and Modeling, 46(5):2003–14, August
Pierre Mah ´e, Liva Ralaivola, V ´eronique Stoven, and Jean-Philippe Vert. The pharmacophore kernel for virtual screening with support vector machines.Journal of Chemical Information and Modeling, 46(5):2003–14, August
2003
-
[23]
URLhttps://hal.science/hal-00433580
doi: 10.1021/ci060138m. URLhttps://hal.science/hal-00433580. 17
-
[24]
Gromov-Wasserstein Distances and the Metric Approach to Object Matching.Foundations of Computational Mathematics, 11:417–487, 2011
Facundo M ´emoli. Gromov-Wasserstein Distances and the Metric Approach to Object Matching.Foundations of Computational Mathematics, 11:417–487, 2011
2011
-
[25]
Note on Dimension Theory for Metric Spaces.Fundamenta Mathematicae, 45(1):143–181, 1958
Jun-iti Nagata. Note on Dimension Theory for Metric Spaces.Fundamenta Mathematicae, 45(1):143–181, 1958
1958
-
[26]
Marion Neumann, Roman Garnett, Christian Bauckhage, and Kristian Kersting. Propagation kernels: Ef- ficient graph kernels from propagated information.Machine Learning, 102:209–245, 2015. doi: 10.1007/ s10994-015-5517-9. URLhttps://link.springer.com/article/10.1007/s10994-015-5517-9
-
[27]
Gromov-Wasserstein Averaging of Kernel and Distance Matrices.Proceedings of the 33rd International Conference on Machine Learning, 48, 2016
Gabriel Peyr ´e, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein Averaging of Kernel and Distance Matrices.Proceedings of the 33rd International Conference on Machine Learning, 48, 2016
2016
-
[28]
A novel sliced fused gromov-wasserstein distance, 2025
Moritz Piening and Robert Beinert. A novel sliced fused gromov-wasserstein distance, 2025
2025
-
[29]
Universal Consistency of Wasserstein k-NN Classifier: A Negative and Some Positive Results.Information and Inference: A Journal of the IMA, 12(3), 2023
Donlapark Ponnoprat. Universal Consistency of Wasserstein k-NN Classifier: A Negative and Some Positive Results.Information and Inference: A Journal of the IMA, 12(3), 2023
2023
-
[30]
Dimension of Metrics and Differentiation of Measures.General Topology and Its Relations to Modern Analysis and Algebra, V (Prague, 1981), 3:565–568, 1983
David Preiss. Dimension of Metrics and Differentiation of Measures.General Topology and Its Relations to Modern Analysis and Algebra, V (Prague, 1981), 3:565–568, 1983
1981
-
[31]
Entropic gromov-wasserstein distances: stability and algorithms
Gabriel Rioux, Ziv Goldfeld, and Kengo Kato. Entropic gromov-wasserstein distances: stability and algorithms. J. Mach. Learn. Res., 25(1), January 2024. ISSN 1532-4435
2024
-
[32]
Modelling Convex Shape Priors and Matching Based on the Gromov-Wasserstein Distance.J
Bernhard Schmitzer and Christoph Schn ¨orr. Modelling Convex Shape Priors and Matching Based on the Gromov-Wasserstein Distance.J. Math. Imaging Vis., 46(1):143–159, May 2013. ISSN 0924-9907. doi: 10.1007/s10851-012-0375-6. URLhttps://doi.org/10.1007/s10851-012-0375-6
-
[33]
Efficient graphlet kernels for large graph comparison
Nino Shervashidze, SVN Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten Borgwardt. Efficient graphlet kernels for large graph comparison. In David van Dyk and Max Welling, editors,Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, volume 5 ofProceedings of Machine Learning Research, pages 488–495, Hilton C...
2009
-
[34]
Consistent Nonparametric Regression.The Annals of Statistics, 5(4):595–620, 1977
Charles Stone. Consistent Nonparametric Regression.The Annals of Statistics, 5(4):595–620, 1977
1977
-
[35]
Number 1443 in Volume 290
Karl-Theodor Sturm.The Space of Spaces: Curvature Bounds and Gradient Flows on the Space of Metric Measure Spaces. Number 1443 in Volume 290. American Mathematical Society, 2023
2023
-
[36]
Spline-Fitting with a Genetic Algorithm: A Method for Develping Classification Structure-Activity Relationships.Journal of Chemical Information and Computer Science, 43(6), 2003
Jeffrey Sutherland, Lee O’Brian, and Donald Weaver. Spline-Fitting with a Genetic Algorithm: A Method for Develping Classification Structure-Activity Relationships.Journal of Chemical Information and Computer Science, 43(6), 2003
2003
-
[37]
Fused Gromov-Wasserstein Distance for Structured Objects.Algorithms, 2020
Titouan Vayer, Laetitia Chapel, R´emi Flamary, Romain Tavenard, and Nicolas Courty. Fused Gromov-Wasserstein Distance for Structured Objects.Algorithms, 2020. URLhttps://hal.science/hal-02971153/document
2020
-
[38]
Pinar Yanardag and S.V.N. Vishwanathan. Deep graph kernels. InProceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, page 1365–1374, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450336642. doi: 10.1145/2783258.2783417. URLhttps://doi.org/10.1145/2783258.2783417. Appendix Due t...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.