arxiv: 2605.03318 · v1 · submitted 2026-05-05 · 💻 cs.SI

Recognition: unknown

Forward--Backward Green Cosine Geometry for Directed Community Detection and Overlap Expansion

Duy Hieu Do

Pith reviewed 2026-05-07 12:55 UTC · model grok-4.3

classification 💻 cs.SI

keywords directedcosinegreengeometryhitting-timepartitioncommunitydetection

0 comments

The pith

The paper introduces forward-backward Green cosine geometry to improve directed community detection and overlap expansion using centered diffusive profiles from random walks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Directed graphs have one-way edges, making community detection hard because random walks do not reverse easily and nodes have different source and target roles. The authors observe that hitting times from random walks are useful but raw versions include a constant baseline that distorts cosine similarity, which is not shift-invariant. They address this by centering the Green function profiles to isolate the diffusive part, truncating to exclude the time-zero self-spike, and concatenating the forward profile with the one from the reversed graph. This creates forward-backward Green coordinates for each node. The method then applies KMeans clustering in this space for disjoint partitions and uses an adaptive cosine rule to expand into overlapping covers. Experiments on synthetic benchmarks and real networks are reported to show gains over raw hitting-time variants and competitiveness with spectral and flow baselines.

Core claim

The key observation is that hitting-time information is natural for directed graphs, but raw hitting-time vectors are not well suited for cosine comparison: they contain a source-independent stationary baseline, whereas cosine similarity is not translation-invariant. We therefore replace raw hitting-time profiles by centered Green profiles of the directed random walk and use the diffusive part of the truncated Green profile, excluding the time-zero self-spike. To account for asymmetry, we concatenate the Green profile of the original walk with the corresponding profile on the edge-reversed graph, yielding forward--backward Green coordinates.

Load-bearing premise

That centered Green profiles (diffusive part, truncated, forward-backward concatenated) from the directed random walk provide an embedding in which cosine similarity reliably reflects community structure, as required for both the KMeans partitioning and the overlap expansion rule to succeed.

read the original abstract

Community detection in directed graphs is challenging because edge asymmetry induces non-reversible diffusion, direction-dependent accessibility, and distinct source and target roles. This paper develops a Green-based cosine geometry for directed community detection and for expanding a disjoint partition into an overlapping cover. The key observation is that hitting-time information is natural for directed graphs, but raw hitting-time vectors are not well suited for cosine comparison: they contain a source-independent stationary baseline, whereas cosine similarity is not translation-invariant. We therefore replace raw hitting-time profiles by centered Green profiles of the directed random walk and use the diffusive part of the truncated Green profile, excluding the time-zero self-spike. To account for asymmetry, we concatenate the Green profile of the original walk with the corresponding profile on the edge-reversed graph, yielding forward--backward Green coordinates. The framework gives two algorithms. Di-Green-FB-cosine-KMeans clusters vertices in the Green cosine space to obtain a disjoint directed partition. Di-Green-FB-Cosine Overlap expands an initial partition into an overlapping cover using a community-adaptive cosine rule. The initial partition can be supplied by any disjoint method; in the main pipeline it is produced by Di-Green-FB-cosine-KMeans. Experiments on synthetic directed benchmarks show that the proposed geometry improves over raw hitting-time cosine variants and is competitive with directed spectral and flow-based baselines. Real-network experiments, evaluated by directed modularity as an internal quality measure, indicate that the same geometry produces coherent directed partitions. Synthetic overlap experiments further show that the method recovers additional memberships effectively, especially in moderately and weakly separated directed networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a clean geometric construction for directed graphs by centering and concatenating truncated Green profiles, but the claim that cosine in this space tracks communities rests on an unproven alignment.

read the letter

The core idea is to replace raw hitting-time vectors with centered Green profiles from the directed random walk, drop the time-zero spike, and concatenate the forward and reversed versions so that cosine similarity becomes usable for partitioning. This directly tackles the translation problem with stationary baselines and the asymmetry of directions in one step, which is a practical adjustment not spelled out in the usual hitting-time or spectral baselines for directed graphs. The two algorithms follow straightforwardly: KMeans in the resulting space for a hard partition, then a cosine-based rule to expand it into overlaps using any initial partition as seed. On synthetic directed benchmarks the method reportedly beats simpler hitting-time cosine versions and stays competitive with flow and spectral alternatives, while real-network checks with directed modularity look coherent and the overlap tests recover extra memberships in moderately separated cases. That is useful empirical grounding for a targeted tool. The soft spot is exactly the one the stress-test flags: there is no derivation, bound, or even a clear argument showing why the diffusive part after centering and truncation should make intra-community cosines reliably larger than inter-community ones. The construction is internally consistent and parameter-free, but the embedding property is taken as given and supported only by the benchmark outcomes rather than analysis that would explain when or why it holds. Without that, the gains remain case-specific and harder to trust for new graphs. The paper is aimed at network researchers who already work with directed data and need a geometric option that respects asymmetry without heavy fitting. It is coherent enough on its own terms to deserve a serious referee, even if the theory side needs strengthening and the reported results would benefit from more detail on variance and data choices.

Referee Report

2 major / 2 minor

Summary. The paper develops a forward-backward Green cosine geometry for directed community detection. Raw hitting-time vectors are replaced by centered Green profiles of the directed random walk (using only the diffusive part of the truncated profile, excluding the time-zero self-spike). Forward and backward profiles are concatenated to handle asymmetry. The resulting coordinates support Di-Green-FB-cosine-KMeans for disjoint partitions and a cosine-based rule for expanding an initial partition into an overlapping cover. Experiments claim improvements over raw hitting-time cosine on synthetic directed benchmarks, competitiveness with directed spectral and flow baselines, coherent real-network partitions under directed modularity, and effective overlap recovery especially in moderately/weakly separated networks.

Significance. If the centered truncated forward-backward Green embedding reliably aligns cosine similarity with community structure, the approach would supply a geometrically motivated method for non-reversible diffusion on directed graphs that corrects the translation issue of raw hitting times and incorporates directionality. The overlap-expansion component adds practical value. Empirical competitiveness with established baselines indicates potential utility, though the contribution remains heuristic without supporting analysis.

major comments (2)

[Abstract and framework] Abstract (key observation paragraph) and the framework description: the claim that centered, truncated, forward-backward Green profiles form an embedding in which cosine similarity reflects community membership lacks any derivation, bound, or analysis showing that intra-community similarities exceed inter-community ones after centering (to remove the source-independent stationary baseline) and truncation. This unproven alignment is load-bearing for both the KMeans partitioning and the overlap-expansion rule.
[Experiments] Experiments section: the abstract states improvements on synthetic benchmarks and competitiveness with directed spectral/flow methods, yet supplies no quantitative metrics, error bars, data-selection criteria, or statistical tests. Without these, it is impossible to assess whether the reported gains support the central geometric claim.

minor comments (2)

[Methods] Define the Green function, truncation threshold, and exact concatenation operation with consistent notation in the methods; the current description leaves the precise construction of the diffusive part ambiguous.
[Overlap expansion] Clarify how the initial partition is obtained when it is not supplied by Di-Green-FB-cosine-KMeans, and state the community-adaptive cosine threshold explicitly.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive critique of our manuscript. We respond point-by-point to the major comments and indicate the revisions we will make.

read point-by-point responses

Referee: [Abstract and framework] Abstract (key observation paragraph) and the framework description: the claim that centered, truncated, forward-backward Green profiles form an embedding in which cosine similarity reflects community membership lacks any derivation, bound, or analysis showing that intra-community similarities exceed inter-community ones after centering (to remove the source-independent stationary baseline) and truncation. This unproven alignment is load-bearing for both the KMeans partitioning and the overlap-expansion rule.

Authors: We agree that the manuscript presents the alignment between cosine similarity in the centered forward-backward Green coordinates and community structure as a key observation without supplying a formal derivation, concentration bound, or proof that intra-community similarities are guaranteed to exceed inter-community ones. The construction is motivated by the fact that raw hitting-time vectors contain a source-independent stationary component that violates the translation invariance implicitly required by cosine similarity; centering subtracts this baseline while truncation retains only the transient diffusive portion of the Green function. Forward-backward concatenation then encodes asymmetry. These steps are heuristic, and the paper relies on empirical validation rather than theoretical guarantees. In the revised manuscript we will explicitly label the construction as a geometrically motivated heuristic in both the abstract and the framework section, expand the discussion of the centering and truncation rationale, and note the absence of supporting analysis as a limitation. revision: partial
Referee: [Experiments] Experiments section: the abstract states improvements on synthetic benchmarks and competitiveness with directed spectral/flow methods, yet supplies no quantitative metrics, error bars, data-selection criteria, or statistical tests. Without these, it is impossible to assess whether the reported gains support the central geometric claim.

Authors: We accept that the current experimental presentation is insufficiently quantitative. The manuscript reports results via comparative figures and qualitative statements of improvement and competitiveness, but does not tabulate specific metrics, report variability across runs, or apply statistical tests. In the revision we will augment the experiments section with concrete performance numbers (e.g., NMI or ARI on synthetic directed benchmarks), averages and standard deviations over multiple graph realizations or random seeds, explicit criteria for benchmark selection, and, where appropriate, statistical comparisons against the baselines to substantiate the claims. revision: yes

Circularity Check

0 steps flagged

No circularity in the proposed Green cosine geometry

full rationale

The paper defines a coordinate embedding from standard directed random-walk quantities (hitting times replaced by centered truncated Green profiles, forward-backward concatenation) and then applies cosine similarity plus KMeans/overlap rules. These are explicit algorithmic constructions, not derivations whose outputs are forced to equal their inputs by definition or by a self-citation chain. Experiments report empirical performance on benchmarks; no load-bearing claim reduces to a fitted parameter renamed as prediction or to an unverified self-citation. The central geometry is therefore self-contained against external random-walk theory.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about random-walk hitting times and Green functions being suitable for community geometry in directed graphs. No explicit free parameters or invented entities are stated in the abstract.

axioms (2)

domain assumption Directed random walk hitting times and Green functions capture community-relevant structure when properly centered and concatenated forward-backward.
This underpins the replacement of raw hitting-time vectors and the construction of the cosine coordinates.
domain assumption Cosine similarity in the resulting Green space will group nodes according to directed community membership.
Required for both the KMeans step and the overlap-expansion rule.

pith-pipeline@v0.9.0 · 5584 in / 1508 out tokens · 83164 ms · 2026-05-07T12:55:05.448391+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

48 extracted references · 1 canonical work pages

[1]

InProceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035, Philadelphia, PA, 2007

David Arthur and Sergei Vassilvitskii.k-means++: The advantages of careful seeding. InProceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035, Philadelphia, PA, 2007. Society for Industrial and Applied Mathematics

2007
[2]

Detecting overlapping commu- nities of weighted networks via a local algorithm.Physica A: Statistical Mechanics and its Applications, 389(19):4177–4187, 2010

Duanbing Chen, Mingsheng Shang, Zehua Lv, and Yan Fu. Detecting overlapping commu- nities of weighted networks via a local algorithm.Physica A: Statistical Mechanics and its Applications, 389(19):4177–4187, 2010

2010
[3]

Laplacians and the cheeger inequality for directed graphs.Annals of Combi- natorics, 9(1):1–19, 2005

Fan Chung. Laplacians and the cheeger inequality for directed graphs.Annals of Combi- natorics, 9(1):1–19, 2005

2005
[4]

Community detection in directed graphs using stationary distribution and hitting times methods.Social Network Analysis and Mining, 13(80), 2023

Tien Dat Dang, Duy Hieu Do, and Thi Ha Duong Phan. Community detection in directed graphs using stationary distribution and hitting times methods.Social Network Analysis and Mining, 13(80), 2023. 40

2023
[5]

Dhillon and Dharmendra S

Inderjit S. Dhillon and Dharmendra S. Modha. Concept decompositions for large sparse text data using clustering.Machine Learning, 42(1–2):143–175, 2001

2001
[6]

Improving the DF-Louvain algo- rithm through random walk-based refinement

Duy Hieu Do, Dung Nguyen, and Thi Ha Duong Phan. Improving the DF-Louvain algo- rithm through random walk-based refinement. InProceedings of the 2025 RIVF Interna- tional Conference on Computing and Communication Technologies (RIVF), pages 932–937, 2025

2025
[7]

Detecting communities in large networks using the extended Walktrap algorithm

Duy Hieu Do and Thi Ha Duong Phan. Detecting communities in large networks using the extended Walktrap algorithm. InProceedings of the 2022 RIVF International Conference on Computing and Communication Technologies (RIVF), pages 100–105, 2022

2022
[8]

An improvement on the Louvain algorithm using random walks.Journal of Combinatorial Optimization, 50(2):14, 2025

Duy Hieu Do and Thi Ha Duong Phan. An improvement on the Louvain algorithm using random walks.Journal of Combinatorial Optimization, 50(2):14, 2025

2025
[9]

Overlapping community detection algorithms using modularity and the cosine.Advances in Complex Systems, 28(03):2550006, 2025

Duy Hieu Do and Thi Ha Duong Phan. Overlapping community detection algorithms using modularity and the cosine.Advances in Complex Systems, 28(03):2550006, 2025

2025
[10]

Community detection in graphs.Physics Reports, 486(3–5):75–174, 2010

Santo Fortunato. Community detection in graphs.Physics Reports, 486(3–5):75–174, 2010

2010
[11]

node2vec: Scalable feature learning for networks

Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864, 2016

2016
[12]

Community detection in large-scale networks: A survey and empirical evaluation.WIREs Computational Statistics, 6(6):426–439, 2014

Steve Harenberg, Gonzalo Bello, Lucas Gjeltema, Stephen Ranshous, Jitendra Harlalka, Ramona Seay, Kanchana Padmanabhan, and Nagiza Samatova. Community detection in large-scale networks: A survey and empirical evaluation.WIREs Computational Statistics, 6(6):426–439, 2014

2014
[13]

Spherical k-means clus- tering.Journal of Statistical Software, 50(10):1–22, 2012

Kurt Hornik, Ingo Feinerer, Martin Kober, and Christian Buchta. Spherical k-means clus- tering.Journal of Statistical Software, 50(10):1–22, 2012

2012
[14]

Comparing partitions.Journal of Classification, 2:193–218, 1985

Lawrence Hubert and Phipps Arabie. Comparing partitions.Journal of Classification, 2:193–218, 1985

1985
[15]

Jeffrey J. Hunter. Generalized inverses and their application to applied probability prob- lems.Linear Algebra and its Applications, 45:157–198, 1982

1982
[16]

Community detection algorithm evaluation with ground-truth data.Physica A: Statistical Mechanics and its Applications, 492:651–706, 2018

Malek Jebabli, Hocine Cherifi, Chantal Cherifi, and Atef Hamouda. Community detection algorithm evaluation with ground-truth data.Physica A: Statistical Mechanics and its Applications, 492:651–706, 2018

2018
[17]

Brian Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks.Physical Review E, 83(1):016107, 2011

2011
[18]

Gen- eralization of a Fundamental Matrix

John G. Kemeny and J. Laurie Snell.Finite Markov Chains: With a New Appendix “Gen- eralization of a Fundamental Matrix”. Undergraduate Texts in Mathematics. Springer, New York, 1976

1976
[19]

Scalable spectral algorithms for community detection in di- rected networks.arXiv preprint arXiv:1211.6807, 2012

Sungmin Kim and Tao Shi. Scalable spectral algorithms for community detection in di- rected networks.arXiv preprint arXiv:1211.6807, 2012

work page arXiv 2012
[20]

Srijan Kumar, Francesca Spezzano, V. S. Subrahmanian, and Christos Faloutsos. Edge weight prediction in weighted signed networks. InIEEE International Conference on Data Mining, pages 221–230, 2016

2016
[21]

E. A. Leicht and M. E. J. Newman. Community structure in directed networks.Physical Review Letters, 100(11):118703, 2008. 41

2008
[22]

Graph evolution: Densification and shrinking diameters.ACM Transactions on Knowledge Discovery from Data, 1(1), 2007

Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters.ACM Transactions on Knowledge Discovery from Data, 1(1), 2007

2007
[23]

Digraph laplacian and the degree of asymmetry.Internet Mathematics, 8(4):381–401, 2012

Yanhua Li and Zhi-Li Zhang. Digraph laplacian and the degree of asymmetry.Internet Mathematics, 8(4):381–401, 2012

2012
[24]

Link prediction in complex networks: A survey.Physica A: Statistical Mechanics and its Applications, 390(6):1150–1170, 2011

Linyuan Lü and Tao Zhou. Link prediction in complex networks: A survey.Physica A: Statistical Mechanics and its Applications, 390(6):1150–1170, 2011

2011
[25]

Malliaros and Michalis Vazirgiannis

Fragkiskos D. Malliaros and Michalis Vazirgiannis. Clustering and community detection in directed networks: A survey.Physics Reports, 533(4):95–142, 2013

2013
[26]

McDaid, Derek Greene, and Neil Hurley

Aaron F. McDaid, Derek Greene, and Neil Hurley. Normalized mutual information to evaluate overlapping community finding algorithms, 2011

2011
[27]

ThePageRankcitation ranking: Bringing order to the web

LawrencePage, SergeyBrin, RajeevMotwani, andTerryWinograd. ThePageRankcitation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab, 1999. Stanford InfoLab publication 422

1999
[28]

Pietro Panzarasa, Tore Opsahl, and Kathleen M. Carley. Patterns and dynamics of users’ behavior and interaction: Network analysis of an online community.Journal of the Amer- ican Society for Information Science and Technology, 60(5):911–932, 2009

2009
[29]

Larremore, and Aaron Clauset

Leto Peel, Daniel B. Larremore, and Aaron Clauset. The ground truth about metadata and community detection in networks.Science Advances, 3(5):e1602548, 2017

2017
[30]

Deepwalk: Online learning of social representations

Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. InProceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710, 2014

2014
[31]

Ponomarenko, L

A. Ponomarenko, L. Pitsoulis, and M. Shamshetdinov. Overlapping community detection in networks based on link partitioning and partitioning around medoids.PLOS ONE, 16(8):e0255717, 2021

2021
[32]

Computing communities in large networks using random walks.Journal of Graph Algorithms and Applications, 10(2):191–218, 2006

Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks.Journal of Graph Algorithms and Applications, 10(2):191–218, 2006

2006
[33]

Regularized spectral clustering under the degree-corrected stochas- tic blockmodel

Tai Qin and Karl Rohe. Regularized spectral clustering under the degree-corrected stochas- tic blockmodel. InAdvances in Neural Information Processing Systems, volume 26, 2013

2013
[34]

Mapping the Gnutella network: Prop- erties of large-scale peer-to-peer systems and implications for system design.IEEE Internet Computing, 6(1):50–57, 2002

Matei Ripeanu, Ian Foster, and Adriana Iamnitchi. Mapping the Gnutella network: Prop- erties of large-scale peer-to-peer systems and implications for system design.IEEE Internet Computing, 6(1):50–57, 2002

2002
[35]

Bergstrom

Martin Rosvall and Carl T. Bergstrom. Maps of random walks on complex networks reveal community structure.Proceedings of the National Academy of Sciences, 105(4):1118–1123, 2008

2008
[36]

Collegemsg temporal network dataset.https://snap

Stanford Network Analysis Project. Collegemsg temporal network dataset.https://snap. stanford.edu/data/CollegeMsg.html. Accessed 2026

2026
[37]

Email-eu-core network dataset.https://snap

Stanford Network Analysis Project. Email-eu-core network dataset.https://snap. stanford.edu/data/email-Eu-core.html. Accessed 2026

2026
[38]

Gnutella peer-to-peer network, august 4 2002.https: //snap.stanford.edu/data/p2p-Gnutella04.html

Stanford Network Analysis Project. Gnutella peer-to-peer network, august 4 2002.https: //snap.stanford.edu/data/p2p-Gnutella04.html. Accessed May 2026. 42

2002
[39]

Gnutella peer-to-peer network, august 8 2002.https: //snap.stanford.edu/data/p2p-Gnutella08.html

Stanford Network Analysis Project. Gnutella peer-to-peer network, august 8 2002.https: //snap.stanford.edu/data/p2p-Gnutella08.html. Accessed May 2026

2002
[40]

Wikipedia vote network dataset.https://snap

Stanford Network Analysis Project. Wikipedia vote network dataset.https://snap. stanford.edu/data/wiki-Vote.html. Accessed 2026

2026
[41]

Cluster ensembles—a knowledge reuse framework for combining multiple partitions.Journal of Machine Learning Research, 3:583–617, 2002

Alexander Strehl and Joydeep Ghosh. Cluster ensembles—a knowledge reuse framework for combining multiple partitions.Journal of Machine Learning Research, 3:583–617, 2002

2002
[42]

Community detection in networks using graph embeddings.Physical Review E, 103(2):022316, 2021

Aditya Tandon, Aiiad Albeshri, Vijey Thayananthan, Wadee Alhalabi, Filippo Radicchi, and Santo Fortunato. Community detection in networks using graph embeddings.Physical Review E, 103(2):022316, 2021

2021
[43]

LINE: Large-scale information network embedding

Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. InProceedings of the 24th International Con- ference on World Wide Web, pages 1067–1077, 2015

2015
[44]

Spectral algorithms for community detection in directed networks.Journal of Machine Learning Research, 21(153):1–45, 2020

Zhe Wang, Yingbin Liang, and Pengsheng Ji. Spectral algorithms for community detection in directed networks.Journal of Machine Learning Research, 21(153):1–45, 2020

2020
[45]

Paskov, Jure Leskovec, and Christopher Potts

Robert West, Hristo S. Paskov, Jure Leskovec, and Christopher Potts. Exploiting social network structure for person-to-person sentiment analysis.Transactions of the Association for Computational Linguistics, 2:297–310, 2014

2014
[46]

Detecting cohesive and 2-mode commu- nitiesindirectedandundirectednetworks

Jaewon Yang, Julian McAuley, and Jure Leskovec. Detecting cohesive and 2-mode commu- nitiesindirectedandundirectednetworks. InProceedings of the Seventh ACM International Conference on Web Search and Data Mining, pages 323–332, 2014

2014
[47]

Benson, Jure Leskovec, and David F

Hao Yin, Austin R. Benson, Jure Leskovec, and David F. Gleich. Local higher-order graph clustering. InProceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 555–564. ACM, 2017

2017
[48]

Community detection indirectednetworksbasedonnetwork embeddings.Chaos, Solitons & Fractals, 189:115630, 2024

Guihai Yu, Yang Jiao, Matthias Dehmer, and Frank Emmert-Streib. Community detection indirectednetworksbasedonnetwork embeddings.Chaos, Solitons & Fractals, 189:115630, 2024. 43

2024