Recognition: unknown
Forward--Backward Green Cosine Geometry for Directed Community Detection and Overlap Expansion
Pith reviewed 2026-05-07 12:55 UTC · model grok-4.3
The pith
The paper introduces forward-backward Green cosine geometry to improve directed community detection and overlap expansion using centered diffusive profiles from random walks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The key observation is that hitting-time information is natural for directed graphs, but raw hitting-time vectors are not well suited for cosine comparison: they contain a source-independent stationary baseline, whereas cosine similarity is not translation-invariant. We therefore replace raw hitting-time profiles by centered Green profiles of the directed random walk and use the diffusive part of the truncated Green profile, excluding the time-zero self-spike. To account for asymmetry, we concatenate the Green profile of the original walk with the corresponding profile on the edge-reversed graph, yielding forward--backward Green coordinates.
Load-bearing premise
That centered Green profiles (diffusive part, truncated, forward-backward concatenated) from the directed random walk provide an embedding in which cosine similarity reliably reflects community structure, as required for both the KMeans partitioning and the overlap expansion rule to succeed.
read the original abstract
Community detection in directed graphs is challenging because edge asymmetry induces non-reversible diffusion, direction-dependent accessibility, and distinct source and target roles. This paper develops a Green-based cosine geometry for directed community detection and for expanding a disjoint partition into an overlapping cover. The key observation is that hitting-time information is natural for directed graphs, but raw hitting-time vectors are not well suited for cosine comparison: they contain a source-independent stationary baseline, whereas cosine similarity is not translation-invariant. We therefore replace raw hitting-time profiles by centered Green profiles of the directed random walk and use the diffusive part of the truncated Green profile, excluding the time-zero self-spike. To account for asymmetry, we concatenate the Green profile of the original walk with the corresponding profile on the edge-reversed graph, yielding forward--backward Green coordinates. The framework gives two algorithms. Di-Green-FB-cosine-KMeans clusters vertices in the Green cosine space to obtain a disjoint directed partition. Di-Green-FB-Cosine Overlap expands an initial partition into an overlapping cover using a community-adaptive cosine rule. The initial partition can be supplied by any disjoint method; in the main pipeline it is produced by Di-Green-FB-cosine-KMeans. Experiments on synthetic directed benchmarks show that the proposed geometry improves over raw hitting-time cosine variants and is competitive with directed spectral and flow-based baselines. Real-network experiments, evaluated by directed modularity as an internal quality measure, indicate that the same geometry produces coherent directed partitions. Synthetic overlap experiments further show that the method recovers additional memberships effectively, especially in moderately and weakly separated directed networks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a forward-backward Green cosine geometry for directed community detection. Raw hitting-time vectors are replaced by centered Green profiles of the directed random walk (using only the diffusive part of the truncated profile, excluding the time-zero self-spike). Forward and backward profiles are concatenated to handle asymmetry. The resulting coordinates support Di-Green-FB-cosine-KMeans for disjoint partitions and a cosine-based rule for expanding an initial partition into an overlapping cover. Experiments claim improvements over raw hitting-time cosine on synthetic directed benchmarks, competitiveness with directed spectral and flow baselines, coherent real-network partitions under directed modularity, and effective overlap recovery especially in moderately/weakly separated networks.
Significance. If the centered truncated forward-backward Green embedding reliably aligns cosine similarity with community structure, the approach would supply a geometrically motivated method for non-reversible diffusion on directed graphs that corrects the translation issue of raw hitting times and incorporates directionality. The overlap-expansion component adds practical value. Empirical competitiveness with established baselines indicates potential utility, though the contribution remains heuristic without supporting analysis.
major comments (2)
- [Abstract and framework] Abstract (key observation paragraph) and the framework description: the claim that centered, truncated, forward-backward Green profiles form an embedding in which cosine similarity reflects community membership lacks any derivation, bound, or analysis showing that intra-community similarities exceed inter-community ones after centering (to remove the source-independent stationary baseline) and truncation. This unproven alignment is load-bearing for both the KMeans partitioning and the overlap-expansion rule.
- [Experiments] Experiments section: the abstract states improvements on synthetic benchmarks and competitiveness with directed spectral/flow methods, yet supplies no quantitative metrics, error bars, data-selection criteria, or statistical tests. Without these, it is impossible to assess whether the reported gains support the central geometric claim.
minor comments (2)
- [Methods] Define the Green function, truncation threshold, and exact concatenation operation with consistent notation in the methods; the current description leaves the precise construction of the diffusive part ambiguous.
- [Overlap expansion] Clarify how the initial partition is obtained when it is not supplied by Di-Green-FB-cosine-KMeans, and state the community-adaptive cosine threshold explicitly.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive critique of our manuscript. We respond point-by-point to the major comments and indicate the revisions we will make.
read point-by-point responses
-
Referee: [Abstract and framework] Abstract (key observation paragraph) and the framework description: the claim that centered, truncated, forward-backward Green profiles form an embedding in which cosine similarity reflects community membership lacks any derivation, bound, or analysis showing that intra-community similarities exceed inter-community ones after centering (to remove the source-independent stationary baseline) and truncation. This unproven alignment is load-bearing for both the KMeans partitioning and the overlap-expansion rule.
Authors: We agree that the manuscript presents the alignment between cosine similarity in the centered forward-backward Green coordinates and community structure as a key observation without supplying a formal derivation, concentration bound, or proof that intra-community similarities are guaranteed to exceed inter-community ones. The construction is motivated by the fact that raw hitting-time vectors contain a source-independent stationary component that violates the translation invariance implicitly required by cosine similarity; centering subtracts this baseline while truncation retains only the transient diffusive portion of the Green function. Forward-backward concatenation then encodes asymmetry. These steps are heuristic, and the paper relies on empirical validation rather than theoretical guarantees. In the revised manuscript we will explicitly label the construction as a geometrically motivated heuristic in both the abstract and the framework section, expand the discussion of the centering and truncation rationale, and note the absence of supporting analysis as a limitation. revision: partial
-
Referee: [Experiments] Experiments section: the abstract states improvements on synthetic benchmarks and competitiveness with directed spectral/flow methods, yet supplies no quantitative metrics, error bars, data-selection criteria, or statistical tests. Without these, it is impossible to assess whether the reported gains support the central geometric claim.
Authors: We accept that the current experimental presentation is insufficiently quantitative. The manuscript reports results via comparative figures and qualitative statements of improvement and competitiveness, but does not tabulate specific metrics, report variability across runs, or apply statistical tests. In the revision we will augment the experiments section with concrete performance numbers (e.g., NMI or ARI on synthetic directed benchmarks), averages and standard deviations over multiple graph realizations or random seeds, explicit criteria for benchmark selection, and, where appropriate, statistical comparisons against the baselines to substantiate the claims. revision: yes
Circularity Check
No circularity in the proposed Green cosine geometry
full rationale
The paper defines a coordinate embedding from standard directed random-walk quantities (hitting times replaced by centered truncated Green profiles, forward-backward concatenation) and then applies cosine similarity plus KMeans/overlap rules. These are explicit algorithmic constructions, not derivations whose outputs are forced to equal their inputs by definition or by a self-citation chain. Experiments report empirical performance on benchmarks; no load-bearing claim reduces to a fitted parameter renamed as prediction or to an unverified self-citation. The central geometry is therefore self-contained against external random-walk theory.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Directed random walk hitting times and Green functions capture community-relevant structure when properly centered and concatenated forward-backward.
- domain assumption Cosine similarity in the resulting Green space will group nodes according to directed community membership.
Reference graph
Works this paper leans on
-
[1]
InProceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035, Philadelphia, PA, 2007
David Arthur and Sergei Vassilvitskii.k-means++: The advantages of careful seeding. InProceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035, Philadelphia, PA, 2007. Society for Industrial and Applied Mathematics
2007
-
[2]
Detecting overlapping commu- nities of weighted networks via a local algorithm.Physica A: Statistical Mechanics and its Applications, 389(19):4177–4187, 2010
Duanbing Chen, Mingsheng Shang, Zehua Lv, and Yan Fu. Detecting overlapping commu- nities of weighted networks via a local algorithm.Physica A: Statistical Mechanics and its Applications, 389(19):4177–4187, 2010
2010
-
[3]
Laplacians and the cheeger inequality for directed graphs.Annals of Combi- natorics, 9(1):1–19, 2005
Fan Chung. Laplacians and the cheeger inequality for directed graphs.Annals of Combi- natorics, 9(1):1–19, 2005
2005
-
[4]
Community detection in directed graphs using stationary distribution and hitting times methods.Social Network Analysis and Mining, 13(80), 2023
Tien Dat Dang, Duy Hieu Do, and Thi Ha Duong Phan. Community detection in directed graphs using stationary distribution and hitting times methods.Social Network Analysis and Mining, 13(80), 2023. 40
2023
-
[5]
Dhillon and Dharmendra S
Inderjit S. Dhillon and Dharmendra S. Modha. Concept decompositions for large sparse text data using clustering.Machine Learning, 42(1–2):143–175, 2001
2001
-
[6]
Improving the DF-Louvain algo- rithm through random walk-based refinement
Duy Hieu Do, Dung Nguyen, and Thi Ha Duong Phan. Improving the DF-Louvain algo- rithm through random walk-based refinement. InProceedings of the 2025 RIVF Interna- tional Conference on Computing and Communication Technologies (RIVF), pages 932–937, 2025
2025
-
[7]
Detecting communities in large networks using the extended Walktrap algorithm
Duy Hieu Do and Thi Ha Duong Phan. Detecting communities in large networks using the extended Walktrap algorithm. InProceedings of the 2022 RIVF International Conference on Computing and Communication Technologies (RIVF), pages 100–105, 2022
2022
-
[8]
An improvement on the Louvain algorithm using random walks.Journal of Combinatorial Optimization, 50(2):14, 2025
Duy Hieu Do and Thi Ha Duong Phan. An improvement on the Louvain algorithm using random walks.Journal of Combinatorial Optimization, 50(2):14, 2025
2025
-
[9]
Overlapping community detection algorithms using modularity and the cosine.Advances in Complex Systems, 28(03):2550006, 2025
Duy Hieu Do and Thi Ha Duong Phan. Overlapping community detection algorithms using modularity and the cosine.Advances in Complex Systems, 28(03):2550006, 2025
2025
-
[10]
Community detection in graphs.Physics Reports, 486(3–5):75–174, 2010
Santo Fortunato. Community detection in graphs.Physics Reports, 486(3–5):75–174, 2010
2010
-
[11]
node2vec: Scalable feature learning for networks
Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864, 2016
2016
-
[12]
Community detection in large-scale networks: A survey and empirical evaluation.WIREs Computational Statistics, 6(6):426–439, 2014
Steve Harenberg, Gonzalo Bello, Lucas Gjeltema, Stephen Ranshous, Jitendra Harlalka, Ramona Seay, Kanchana Padmanabhan, and Nagiza Samatova. Community detection in large-scale networks: A survey and empirical evaluation.WIREs Computational Statistics, 6(6):426–439, 2014
2014
-
[13]
Spherical k-means clus- tering.Journal of Statistical Software, 50(10):1–22, 2012
Kurt Hornik, Ingo Feinerer, Martin Kober, and Christian Buchta. Spherical k-means clus- tering.Journal of Statistical Software, 50(10):1–22, 2012
2012
-
[14]
Comparing partitions.Journal of Classification, 2:193–218, 1985
Lawrence Hubert and Phipps Arabie. Comparing partitions.Journal of Classification, 2:193–218, 1985
1985
-
[15]
Jeffrey J. Hunter. Generalized inverses and their application to applied probability prob- lems.Linear Algebra and its Applications, 45:157–198, 1982
1982
-
[16]
Community detection algorithm evaluation with ground-truth data.Physica A: Statistical Mechanics and its Applications, 492:651–706, 2018
Malek Jebabli, Hocine Cherifi, Chantal Cherifi, and Atef Hamouda. Community detection algorithm evaluation with ground-truth data.Physica A: Statistical Mechanics and its Applications, 492:651–706, 2018
2018
-
[17]
Brian Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks.Physical Review E, 83(1):016107, 2011
2011
-
[18]
Gen- eralization of a Fundamental Matrix
John G. Kemeny and J. Laurie Snell.Finite Markov Chains: With a New Appendix “Gen- eralization of a Fundamental Matrix”. Undergraduate Texts in Mathematics. Springer, New York, 1976
1976
-
[19]
Sungmin Kim and Tao Shi. Scalable spectral algorithms for community detection in di- rected networks.arXiv preprint arXiv:1211.6807, 2012
-
[20]
Srijan Kumar, Francesca Spezzano, V. S. Subrahmanian, and Christos Faloutsos. Edge weight prediction in weighted signed networks. InIEEE International Conference on Data Mining, pages 221–230, 2016
2016
-
[21]
E. A. Leicht and M. E. J. Newman. Community structure in directed networks.Physical Review Letters, 100(11):118703, 2008. 41
2008
-
[22]
Graph evolution: Densification and shrinking diameters.ACM Transactions on Knowledge Discovery from Data, 1(1), 2007
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters.ACM Transactions on Knowledge Discovery from Data, 1(1), 2007
2007
-
[23]
Digraph laplacian and the degree of asymmetry.Internet Mathematics, 8(4):381–401, 2012
Yanhua Li and Zhi-Li Zhang. Digraph laplacian and the degree of asymmetry.Internet Mathematics, 8(4):381–401, 2012
2012
-
[24]
Link prediction in complex networks: A survey.Physica A: Statistical Mechanics and its Applications, 390(6):1150–1170, 2011
Linyuan Lü and Tao Zhou. Link prediction in complex networks: A survey.Physica A: Statistical Mechanics and its Applications, 390(6):1150–1170, 2011
2011
-
[25]
Malliaros and Michalis Vazirgiannis
Fragkiskos D. Malliaros and Michalis Vazirgiannis. Clustering and community detection in directed networks: A survey.Physics Reports, 533(4):95–142, 2013
2013
-
[26]
McDaid, Derek Greene, and Neil Hurley
Aaron F. McDaid, Derek Greene, and Neil Hurley. Normalized mutual information to evaluate overlapping community finding algorithms, 2011
2011
-
[27]
ThePageRankcitation ranking: Bringing order to the web
LawrencePage, SergeyBrin, RajeevMotwani, andTerryWinograd. ThePageRankcitation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab, 1999. Stanford InfoLab publication 422
1999
-
[28]
Pietro Panzarasa, Tore Opsahl, and Kathleen M. Carley. Patterns and dynamics of users’ behavior and interaction: Network analysis of an online community.Journal of the Amer- ican Society for Information Science and Technology, 60(5):911–932, 2009
2009
-
[29]
Larremore, and Aaron Clauset
Leto Peel, Daniel B. Larremore, and Aaron Clauset. The ground truth about metadata and community detection in networks.Science Advances, 3(5):e1602548, 2017
2017
-
[30]
Deepwalk: Online learning of social representations
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. InProceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710, 2014
2014
-
[31]
Ponomarenko, L
A. Ponomarenko, L. Pitsoulis, and M. Shamshetdinov. Overlapping community detection in networks based on link partitioning and partitioning around medoids.PLOS ONE, 16(8):e0255717, 2021
2021
-
[32]
Computing communities in large networks using random walks.Journal of Graph Algorithms and Applications, 10(2):191–218, 2006
Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks.Journal of Graph Algorithms and Applications, 10(2):191–218, 2006
2006
-
[33]
Regularized spectral clustering under the degree-corrected stochas- tic blockmodel
Tai Qin and Karl Rohe. Regularized spectral clustering under the degree-corrected stochas- tic blockmodel. InAdvances in Neural Information Processing Systems, volume 26, 2013
2013
-
[34]
Mapping the Gnutella network: Prop- erties of large-scale peer-to-peer systems and implications for system design.IEEE Internet Computing, 6(1):50–57, 2002
Matei Ripeanu, Ian Foster, and Adriana Iamnitchi. Mapping the Gnutella network: Prop- erties of large-scale peer-to-peer systems and implications for system design.IEEE Internet Computing, 6(1):50–57, 2002
2002
-
[35]
Bergstrom
Martin Rosvall and Carl T. Bergstrom. Maps of random walks on complex networks reveal community structure.Proceedings of the National Academy of Sciences, 105(4):1118–1123, 2008
2008
-
[36]
Collegemsg temporal network dataset.https://snap
Stanford Network Analysis Project. Collegemsg temporal network dataset.https://snap. stanford.edu/data/CollegeMsg.html. Accessed 2026
2026
-
[37]
Email-eu-core network dataset.https://snap
Stanford Network Analysis Project. Email-eu-core network dataset.https://snap. stanford.edu/data/email-Eu-core.html. Accessed 2026
2026
-
[38]
Gnutella peer-to-peer network, august 4 2002.https: //snap.stanford.edu/data/p2p-Gnutella04.html
Stanford Network Analysis Project. Gnutella peer-to-peer network, august 4 2002.https: //snap.stanford.edu/data/p2p-Gnutella04.html. Accessed May 2026. 42
2002
-
[39]
Gnutella peer-to-peer network, august 8 2002.https: //snap.stanford.edu/data/p2p-Gnutella08.html
Stanford Network Analysis Project. Gnutella peer-to-peer network, august 8 2002.https: //snap.stanford.edu/data/p2p-Gnutella08.html. Accessed May 2026
2002
-
[40]
Wikipedia vote network dataset.https://snap
Stanford Network Analysis Project. Wikipedia vote network dataset.https://snap. stanford.edu/data/wiki-Vote.html. Accessed 2026
2026
-
[41]
Cluster ensembles—a knowledge reuse framework for combining multiple partitions.Journal of Machine Learning Research, 3:583–617, 2002
Alexander Strehl and Joydeep Ghosh. Cluster ensembles—a knowledge reuse framework for combining multiple partitions.Journal of Machine Learning Research, 3:583–617, 2002
2002
-
[42]
Community detection in networks using graph embeddings.Physical Review E, 103(2):022316, 2021
Aditya Tandon, Aiiad Albeshri, Vijey Thayananthan, Wadee Alhalabi, Filippo Radicchi, and Santo Fortunato. Community detection in networks using graph embeddings.Physical Review E, 103(2):022316, 2021
2021
-
[43]
LINE: Large-scale information network embedding
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. InProceedings of the 24th International Con- ference on World Wide Web, pages 1067–1077, 2015
2015
-
[44]
Spectral algorithms for community detection in directed networks.Journal of Machine Learning Research, 21(153):1–45, 2020
Zhe Wang, Yingbin Liang, and Pengsheng Ji. Spectral algorithms for community detection in directed networks.Journal of Machine Learning Research, 21(153):1–45, 2020
2020
-
[45]
Paskov, Jure Leskovec, and Christopher Potts
Robert West, Hristo S. Paskov, Jure Leskovec, and Christopher Potts. Exploiting social network structure for person-to-person sentiment analysis.Transactions of the Association for Computational Linguistics, 2:297–310, 2014
2014
-
[46]
Detecting cohesive and 2-mode commu- nitiesindirectedandundirectednetworks
Jaewon Yang, Julian McAuley, and Jure Leskovec. Detecting cohesive and 2-mode commu- nitiesindirectedandundirectednetworks. InProceedings of the Seventh ACM International Conference on Web Search and Data Mining, pages 323–332, 2014
2014
-
[47]
Benson, Jure Leskovec, and David F
Hao Yin, Austin R. Benson, Jure Leskovec, and David F. Gleich. Local higher-order graph clustering. InProceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 555–564. ACM, 2017
2017
-
[48]
Community detection indirectednetworksbasedonnetwork embeddings.Chaos, Solitons & Fractals, 189:115630, 2024
Guihai Yu, Yang Jiao, Matthias Dehmer, and Frank Emmert-Streib. Community detection indirectednetworksbasedonnetwork embeddings.Chaos, Solitons & Fractals, 189:115630, 2024. 43
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.