Clustering as Reasoning: A $k$-Means Interpretation of Chain-of-Thought Graph Learning

Bingheng Li; Xingtong Yu; Xuanting Xie; Yuan Fang; Zhaochen Guo; Zhao Kang; Zhifei Liao

arxiv: 2605.24867 · v1 · pith:6OQM45U2new · submitted 2026-05-24 · 💻 cs.AI · cs.CL · cs.NI

Pith reviewed 2026-06-30 11:36 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.NI

keywords chain-of-thought · k-means · graph learning · transformer · reasoning · clustering · text-attributed graphs

The pith

The paper claims that a Transformer block corresponds mathematically to one iteration of the k-means algorithm, which recasts chain-of-thought on graphs as iterative clustering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that chain-of-thought reasoning over text-attributed graphs can be reframed as a clustering process. It starts from the observation that current graph CoT approaches use separate modules and unchanging graph encodings, which blocks ongoing interaction between meaning and structure. By deriving a direct mathematical link between the operations inside a Transformer block and the assignment-plus-update cycle of k-means, the authors build a single model called KCoT that makes each reasoning step explicit as a clustering move. The resulting method adds a prompt that writes out the assignment and update phases and an alignment step that keeps topological information in the evolving representations. Experiments on standard graph benchmarks show steady gains over prior CoT and graph methods.

Core claim

Our key theoretical result reveals a formal mathematical correspondence between a Transformer block and the k-means algorithm, allowing reasoning to be interpreted as iterative assignment and update steps. Based on this insight, we introduce a Semantic Discriminating Prompt that explicitly formulates these steps as structured CoT reasoning, together with a structure-grounded alignment strategy to fuse topological priors with evolving thought-conditioned representations.

What carries the argument

The formal mathematical correspondence between a Transformer block and the k-means algorithm, which recasts each reasoning step as a cluster assignment followed by a centroid update.
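For readers who want the two phases in front of them, here is a minimal NumPy sketch of one textbook k-means iteration (hard assignment, then centroid update). It is not KCoT and not the paper's derivation; the function name `kmeans_step` and the toy data are chosen here for illustration.

```python
import numpy as np

def kmeans_step(points, centroids):
    """One k-means iteration: hard assignment, then centroid update."""
    # Assignment: squared Euclidean distance from every point to every centroid.
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = dists.argmin(axis=1)
    # Update: each centroid moves to the mean of the points assigned to it.
    new_centroids = centroids.copy()
    for j in range(centroids.shape[0]):
        members = points[labels == j]
        if len(members) > 0:  # keep the old centroid if a cluster is empty
            new_centroids[j] = members.mean(axis=0)
    return labels, new_centroids

# Toy data (illustrative only): two well-separated groups in 2-D.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[1.0, 1.0], [9.0, 9.0]])
labels, centroids = kmeans_step(points, centroids)
```

Under the paper's reading, each reasoning step would play the role of one such call; whether a Transformer block actually computes this is the claim under review.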

If this is right

Reasoning becomes an explicit sequence of assignment and update operations that can be inspected at each step.
Semantic and topological information interact continuously inside one model instead of through separate modules.
The same clustering view supplies a concrete prompt template and an alignment loss that together improve accuracy on graph reasoning benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same block-to-algorithm mapping could be checked in Transformers used for non-graph sequential reasoning tasks.
If the equivalence is tight, replacing k-means with other iterative partitioning methods inside the same framework would be a direct next test.

Load-bearing premise

Existing graph CoT methods are limited by disjoint architectures and fixed representations, and the Transformer-k-means correspondence directly supplies a unified framework with explicit semantic-topological interaction.

What would settle it

Run the proposed KCoT model on a small labeled graph while recording the exact cluster assignments and centroid shifts at each Transformer layer; if these do not match the standard k-means iterations on the same node features and adjacency, the claimed correspondence does not hold.
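A sketch of how the comparison step of such a test could be scored, assuming the per-layer cluster assignments have already been extracted from the model (the source does not say how) and a reference k-means run has been recorded on the same inputs. The helper names and the toy label lists are hypothetical. The Rand index is used because it ignores how clusters happen to be numbered.

```python
import numpy as np

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree (permutation-invariant)."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)  # count each unordered pair once
    return float((same_a[upper] == same_b[upper]).mean())

def compare_trajectories(layer_labels, kmeans_labels):
    """Per-step agreement between model-derived and k-means assignments."""
    return [rand_index(m, k) for m, k in zip(layer_labels, kmeans_labels)]

# Hypothetical recorded assignments for 4 nodes over 2 steps.
layer_labels = [[0, 0, 1, 1], [1, 1, 0, 0]]
kmeans_labels = [[0, 1, 1, 1], [0, 0, 1, 1]]
scores = compare_trajectories(layer_labels, kmeans_labels)
```

A score of 1.0 at every step would support the correspondence on that graph; scores well below 1.0 would count against it. Centroid shifts would need a separate numerical comparison.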

Figures

Figures reproduced from arXiv: 2605.24867 by Bingheng Li, Xingtong Yu, Xuanting Xie, Yuan Fang, Zhaochen Guo, Zhao Kang, Zhifei Liao.

**Figure 1.** A toy example of the proposed prompt on Cora. The model effectively filters irrelevant neighbor C and focuses on identifying the salient semantic features, like “Dirichlet Mixtures” and “Hidden Markov Models”.

**Figure 2.** The overall framework of KCoT. It synergizes iterative CoT reasoning with graph representation learning. Specifically, we design a Semantic Discriminating Prompt that reformulates the k-means into explicit reasoning to refine node semantics. Simultaneously, a Structure-grounded Thoughts Construction strategy and a Condition-Net module dynamically fuse these evolving semantic thoughts with fixed topological…

**Figure 3.** LLM-driven replication of the k-means. We achieve fine-grained alignment between the k-means algorithm and the proposed prompt. Adjacent paper text (truncated): "… evolving semantic clusters. This enables iterative refinement of latent groupings through language. Moreover, prior work (Diaz-Rodriguez, 2025) has revealed that LLMs possess a superior capability to extract semantic centroids compared to traditional k-means algorithms, which is …"

**Figure 4.** Impact of Thought Length t. Panels: (a) Node Classification and (b) Link Prediction, each plotting Accuracy (%) for Cora and Products; x-axis ticks: 2, 4, 8, 16, 32, 64, 128.

**Figure 5.** Impact of hidden dimension d in the condition-net.

**Figure 6.** t-SNE visualization of node embeddings (different colors represent different classes) and evolution of the inter/intra class distance ratio during training on Cora (two steps and 400 epochs are visualized).
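The Figure 6 caption mentions an inter/intra class distance ratio but the excerpt does not define it. One common reading, assumed here, is the mean pairwise distance between nodes of different classes divided by the mean pairwise distance between nodes of the same class. The function name and toy embeddings below are illustrative only.

```python
import numpy as np

def inter_intra_ratio(embeddings, labels):
    """Mean inter-class pairwise distance divided by mean intra-class pairwise distance.

    Assumed definition; the paper may compute the ratio differently.
    """
    x = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    upper = np.triu_indices(len(y), k=1)  # each unordered pair once
    same = (y[:, None] == y[None, :])[upper]
    pair_d = dists[upper]
    return float(pair_d[~same].mean() / pair_d[same].mean())

# Toy embeddings: two tight classes placed far apart.
embeddings = [[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]]
labels = [0, 0, 1, 1]
ratio = inter_intra_ratio(embeddings, labels)
```

A ratio that grows over training indicates classes pulling apart relative to their internal spread, which is the trend such a plot is usually meant to show.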

Original abstract

Chain-of-Thought (CoT) prompting has shown promise in enhancing the reasoning capabilities of large language models (LLMs) on text-attributed graphs (TAGs). This work reframes CoT-based graph learning through the principle of clustering as reasoning, offering a $k$-means interpretation of how iterative reasoning operates over graph-structured data. We observe that existing graph CoT methods rely on disjoint architectures and fixed graph representations, limiting step-by-step semantic-topological interaction and interpretability. To overcome this limitation, we propose a unified framework named KCoT that integrates CoT reasoning with graph representation learning. Our key theoretical result reveals a formal mathematical correspondence between a Transformer block and the $k$-means algorithm, allowing reasoning to be interpreted as iterative assignment and update steps. Based on this insight, we introduce a Semantic Discriminating Prompt that explicitly formulates these steps as structured CoT reasoning, together with a structure-grounded alignment strategy to fuse topological priors with evolving thought-conditioned representations. Experiments on standard benchmarks demonstrate consistent improvements over state-of-the-art methods, validating clustering as a principled mechanism for CoT-based graph learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The k-means view of CoT on graphs is a usable framing, but the central Transformer correspondence is asserted without equations or a derivation.

The key takeaway is that this paper gives chain-of-thought on graphs a clustering interpretation through k-means, which might help with designing more interpretable reasoning steps, but the claimed formal link between Transformer blocks and k-means needs to be shown explicitly to be convincing.

The authors start from the observation that current graph CoT approaches use separate architectures for the graph part and the reasoning part, which keeps semantic and topological information from interacting closely across steps. KCoT aims to bring them together in one framework. The theoretical contribution is the mapping that turns reasoning into assignment and update phases, as in k-means. From there they build the Semantic Discriminating Prompt to make those steps part of the CoT process, plus an alignment method that incorporates graph structure into the evolving representations. The experiments section reports better results than prior methods on the usual benchmarks.

This framing is new in how directly it applies the clustering view to graph CoT. It takes ideas from both areas and combines them into a practical prompt and alignment strategy. The benchmark gains suggest the approach has some practical merit.

The soft spot is the central theoretical claim. The abstract presents a formal mathematical correspondence, but without the actual equations or proof steps visible, it's difficult to assess whether the Transformer operations truly equate to k-means iterations or whether this is a looser analogy in which soft attention stands in for hard assignment. The stress-test note is right to flag this: if extra assumptions are needed about how CoT tokens map to centroids, the whole interpretation of reasoning as clustering iterations weakens. A minor issue is that the paper should detail how the alignment strategy is implemented and show that it does not introduce new hyperparameters that go uncontrolled in the experiments.

Overall, this is for folks working on reasoning methods for text-attributed graphs and LLMs. Someone looking for new ways to structure CoT prompts could find the Semantic Discriminating Prompt useful to try out. The paper shows clear thinking in connecting the dots between clustering and reasoning, even if the math needs verification.

I would recommend sending it for peer review so the derivation can be examined in detail.
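The soft-versus-hard distinction raised in the letter can be made concrete with a small sketch. It assumes a softmax over negative squared distances with a temperature, which is one standard way to write a soft k-means assignment; it is not taken from the paper, and `soft_assign` and the toy points are hypothetical.

```python
import numpy as np

def soft_assign(points, centroids, temperature):
    """Softmax over negative squared distances: a soft k-means assignment."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -dists / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

points = np.array([[0.0, 0.0], [2.0, 0.0]])
centroids = np.array([[0.0, 0.0], [3.0, 0.0]])

# High temperature: weight is spread across both centroids.
soft = soft_assign(points, centroids, temperature=10.0)
# Low temperature: weights collapse toward a one-hot (hard) assignment.
sharp = soft_assign(points, centroids, temperature=0.01)
```

Only in the low-temperature limit do the weights behave like the hard nearest-centroid step of classic k-means. A derivation claiming formal equivalence would need to say which regime it covers.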

Referee Report

1 major / 0 minor

Summary. The paper reframes Chain-of-Thought (CoT) prompting on text-attributed graphs as a clustering process, asserting a formal mathematical correspondence between a Transformer block and the k-means algorithm that interprets iterative reasoning as assignment and centroid-update steps. It proposes the KCoT unified framework incorporating a Semantic Discriminating Prompt to formulate these steps explicitly and a structure-grounded alignment strategy to fuse topological priors with evolving representations, reporting consistent benchmark gains over prior graph CoT methods.

Significance. If the claimed correspondence is rigorously derived, the work supplies a principled theoretical basis for viewing CoT as clustering, which could unify graph representation learning with step-by-step reasoning and improve both interpretability and semantic-topological interaction. The reported empirical improvements would then constitute supporting evidence for the framework's practical value.

major comments (1)

[Abstract (key theoretical result)] The manuscript states a 'formal mathematical correspondence between a Transformer block and the k-means algorithm' but supplies no equations, derivation, or explicit mapping (e.g., equating attention to hard nearest-centroid assignment or FFN layers to mean updates). Without this, the equivalence remains interpretive rather than formal, which is load-bearing for the interpretation of CoT steps as clustering iterations and for the construction of the Semantic Discriminating Prompt.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the theoretical foundation of our work. We address the major comment point by point below and will revise the manuscript accordingly.

Referee: Abstract (key theoretical result): The manuscript states a 'formal mathematical correspondence between a Transformer block and the k-means algorithm' but supplies no equations, derivation, or explicit mapping (e.g., equating attention to hard nearest-centroid assignment or FFN layers to mean updates). Without this, the equivalence remains interpretive rather than formal, which is load-bearing for the interpretation of CoT steps as clustering iterations and for the construction of the Semantic Discriminating Prompt.

Authors: We agree that the abstract and main text would benefit from an explicit derivation to substantiate the claimed formal correspondence. In the revised manuscript we will add a new subsection (Section 3.2) that derives the mapping: self-attention as hard nearest-centroid assignment via the argmax operation on similarity scores, and the FFN as the mean-update step for centroid recomputation. We will also include the full set of equations linking one Transformer block iteration to one k-means iteration, making the equivalence rigorous rather than interpretive. This addition will directly support the Semantic Discriminating Prompt construction.

revision: yes
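For orientation, the identities such a derivation would have to build on can be written down from standard facts alone. The block below is not the promised Section 3.2, which is not available here; the symbols $x_i$, $\mu_j$, $a_i$, $q_i$, $k_j$, $\alpha_{ij}$ and $d$ are chosen for illustration. It shows that hard nearest-centroid assignment is already an argmax over a dot-product score with a per-centroid bias, whereas scaled dot-product attention applies a softmax to such scores.

```latex
% Hard k-means assignment rewritten as an argmax over dot-product scores
% (expand the square and drop \lVert x_i \rVert^2, which is constant in j).
\begin{align}
a_i &= \arg\min_{j} \lVert x_i - \mu_j \rVert^2
     = \arg\max_{j} \Bigl( x_i^\top \mu_j - \tfrac{1}{2} \lVert \mu_j \rVert^2 \Bigr), \\
% Standard centroid update: mean of the points assigned to cluster j.
\mu_j &\leftarrow \frac{1}{\lvert \{ i : a_i = j \} \rvert} \sum_{i \,:\, a_i = j} x_i, \\
% Scaled dot-product attention weights, for comparison (soft, not hard).
\alpha_{ij} &= \frac{\exp\bigl( q_i^\top k_j / \sqrt{d} \bigr)}{\sum_{l} \exp\bigl( q_i^\top k_l / \sqrt{d} \bigr)} .
\end{align}
```

The gap a formal proof must close is visible here: the bias term and the softmax have no counterpart in a plain argmax over similarity scores unless further assumptions are stated, for example equal-norm centroids or a vanishing temperature.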

Circularity Check

0 steps flagged

No significant circularity; theoretical correspondence presented as independent derivation

The abstract and available text present the core result as a 'formal mathematical correspondence between a Transformer block and the k-means algorithm' derived to interpret reasoning steps, with no shown equations, self-citations, or fitted parameters that reduce the claim to its own inputs by construction. No load-bearing steps match the enumerated circularity patterns; the framework builds on the claimed insight without tautological redefinition or renaming of known results. This is the expected non-finding for a paper whose central claim is offered as a derived theoretical mapping rather than a statistical fit or self-referential premise.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests primarily on the unelaborated theoretical correspondence and the assumption that clustering steps can be explicitly prompted and aligned with graph structure; no free parameters or invented entities are described in the abstract.

axioms (1)

Domain assumption: a formal mathematical correspondence exists between a Transformer block and the k-means algorithm.
Invoked as the key theoretical result enabling the clustering-as-reasoning interpretation.

pith-pipeline@v0.9.1-grok · 5752 in / 1210 out tokens · 29212 ms · 2026-06-30T11:36:27.393178+00:00 · methodology

Reference graph

Works this paper leans on

11 extracted references · 8 canonical work pages

[1] arXiv preprint arXiv:2309.15402
Chen, R., Zhao, T., Jaiswal, A. K., Shah, N., and Wang, Z. LLaGA: Large language and graph assistant. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pp. 7809–7823. PMLR, 21–27 Jul 2024a. Chen, Z., Mao, H., Li, H., Jin, W., Wen, H., Wei, X., Wang, S., Yin, D., Fan, W., Liu,...

[2] Diaz-Rodriguez, J. Summaries as centroids for interpretable and scalable text clustering. arXiv preprint arXiv:2502.09667, 2025.

[3] Guo, J., Du, L., Liu, H., Zhou, M., He, X., and Han, S. Gpt4graph: Can large language models understand graph structured data? an empirical evaluation and benchmarking. arXiv preprint arXiv:2305.15066.

[4] Guo, Z., Shen, Z., Xie, X., Wen, L., and Kang, Z. Disentangling homophily and heterophily in multimodal graph clustering. In Proceedings of the 33rd ACM International Conference on Multimedia, pp. 2044–2053.

[5] Huang, J., Zhang, X., Mei, Q., and Ma, J. Can llms effectively leverage graph structural information through prompts, and why? arXiv preprint arXiv:2309.16595, 2023.

[6] Jia, R., Wu, M., Ding, Y., Lu, J., and Zhang, Y. Hetgcot: Heterogeneous graph-enhanced chain-of-thought llm reasoning for academic question answering. In Findings of the Association for Computational Linguistics: EMNLP 2025, pp. 15950–15963, 2025.

[7] Jin, B., Xie, C., Zhang, J., Roy, K. K., Zhang, Y., Li, Z., Li, R., Tang, X., Wang, S., Meng, Y., et al. Graph chain-of-thought: Augmenting large language models by reasoning on graphs. arXiv preprint arXiv:2404.07103, 2024.

[8] Liu, C. and Wu, B. Evaluating large language models on graphs: Performance insights and comparative analysis. arXiv preprint arXiv:2308.11224.

[9] Luo, Z., Song, X., Huang, H., Lian, J., Zhang, C., Jiang, J., Xie, X., and Jin, H. Graphinstruct: Empowering large language models with graph understanding and reasoning capability. arXiv preprint arXiv:2403.04483, 2024.

[10] Wang, H., Feng, S., He, T., Tan, Z., Han, X., and Tsvetkov, Y. Can language models solve graph problems in natural language? Advances in Neural Information Processing Systems, 36:30840–30861, 2023a. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., and Zhou, D. Self-consistency improves chain of thought reasoning in language...

[11] Ye, R., Zhang, C., Wang, R., Xu, S., and Zhang, Y. Language is all a graph needs. In Findings of the Association for Computational Linguistics: EACL 2024, pp. 1955–1973, 2024.
