S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs

HaoPeng Zhang; Jiaqi Yu; Ruijie Wang; Xiao Wang; Xinyu Zhao; Yibo Ding; Yuhang Liu; Yuhan Wang; Ziwei Zhang

arxiv: 2605.18579 · v3 · pith:XFEMTSZ6new · submitted 2026-05-18 · 💻 cs.LG


Pith reviewed 2026-05-21 07:50 UTC · model grok-4.3

classification 💻 cs.LG

keywords sparse text-attributed graphs · LLM-as-Aligner · graph pre-training · structure-semantic decoupling · cross-domain risk balancing · consistency control · transferable graph models


The pith

S2Aligner decouples semantic alignment from structural modeling to pre-train on sparse text-attributed graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents S2Aligner as a method for building transferable graph foundation models when node texts are missing, noisy, or uneven across domains. It separates graph-text alignment into independent semantic and structural components so that topology signals can strengthen representations without mixing into the shared semantic space. Structure-oriented reconstruction with consistency control supplies reliable topology cues while suppressing inconsistent signals under sparsity, and a cross-domain risk balancing step uses global density ratios plus graph reliability estimates to downweight unreliable samples. Theoretical analysis shows the objective reduces generalization gaps by controlling domain risk discrepancy. Experiments across multiple domains, sparsity levels, and tasks indicate consistent gains over prior LLM-as-Aligner baselines.

Core claim

S2Aligner decomposes graph-text representations into semantic and structural components, applies structure-oriented reconstruction with consistency control to inject reliable topology cues into text representations, and introduces sparsity-aware cross-domain risk balancing that calibrates domain risks through a global-domain density ratio and downweights unreliable sparse samples via graph reliability estimation. Theoretical analysis shows that this objective reduces cross-domain generalization gaps by controlling domain risk discrepancy.
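The risk-balancing step can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so everything here is an assumption: the function names, the self-normalized weighting, and the variance penalty (borrowed from risk-extrapolation-style objectives such as [13]) are hypothetical stand-ins for "calibrate domain risks and downweight unreliable samples".

```python
from statistics import pvariance


def weighted_domain_risk(losses, density_ratios, reliabilities):
    """Self-normalized weighted mean of per-sample losses for one domain.

    Each sample weight is (density ratio) * (reliability); both inputs are
    assumed to be precomputed and non-negative.
    """
    weights = [w * r for w, r in zip(density_ratios, reliabilities)]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no reliable sample in this domain
    return sum(wt * loss for wt, loss in zip(weights, losses)) / total


def balanced_objective(domains, penalty=1.0):
    """Mean weighted risk plus a variance penalty across source domains.

    The variance term is an assumed way to push weighted risks toward
    equality; it is not taken from the paper.
    """
    risks = [
        weighted_domain_risk(d["loss"], d["ratio"], d["reliability"])
        for d in domains
    ]
    mean_risk = sum(risks) / len(risks)
    return mean_risk + penalty * pvariance(risks), risks


# Toy data: the second domain has one high-loss sample judged unreliable.
domains = [
    {"loss": [0.2, 0.4], "ratio": [1.0, 1.0], "reliability": [1.0, 1.0]},
    {"loss": [0.3, 2.0], "ratio": [1.0, 1.0], "reliability": [1.0, 0.0]},
]
objective, risks = balanced_objective(domains)
```

In this toy case the unreliable sample receives zero weight, so both domain risks come out near 0.3 and the penalty term is close to zero. Without the reliability weight, the second domain's risk would be dominated by the noisy sample.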

What carries the argument

Decomposition of representations into semantic and structural components combined with structure-oriented reconstruction under consistency control and sparsity-aware cross-domain risk balancing via global-domain density ratio and graph reliability estimation.
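One way to picture consistency-controlled injection is a gate that admits a structural cue only when it agrees with the semantic vector. The sketch below is hypothetical: mean-of-neighbors stands in for structure-oriented reconstruction, cosine agreement stands in for the consistency measure, and `threshold` and `alpha` are illustrative parameters that do not come from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def reconstruct_from_neighbors(neighbor_vecs):
    """Mean of neighbor embeddings (assumed stand-in for reconstruction)."""
    dim = len(neighbor_vecs[0])
    count = len(neighbor_vecs)
    return [sum(v[i] for v in neighbor_vecs) / count for i in range(dim)]


def inject_structure(semantic, neighbor_vecs, threshold=0.5, alpha=0.5):
    """Return (enhanced vector, gate); the semantic input is never modified."""
    if not neighbor_vecs:
        return list(semantic), 0.0
    structural = reconstruct_from_neighbors(neighbor_vecs)
    agreement = cosine(semantic, structural)
    # Suppress the structural cue entirely when agreement is below threshold.
    gate = agreement if agreement >= threshold else 0.0
    enhanced = [s + alpha * gate * t for s, t in zip(semantic, structural)]
    return enhanced, gate


semantic = [1.0, 0.0]
consistent, gate_ok = inject_structure(semantic, [[1.0, 0.0], [0.8, 0.2]])
suppressed, gate_off = inject_structure(semantic, [[0.0, 1.0], [0.0, 1.0]])
```

The sketch also exposes the weak point raised later in the referee report: when the node text is missing, `semantic` carries little signal, and an agreement score computed against it cannot be trusted to separate good structural cues from bad ones.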

If this is right

Structure-semantics correspondence becomes more reliable when textual anchors are absent or uneven.
Cross-domain generalization gaps shrink when domain risks are calibrated by density ratios and reliability estimates.
Downstream task accuracy improves across graph domains and sparsity regimes.
Pre-training remains stable even when node texts provide only weak supervision.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decoupling and risk-calibration steps could be tested on graphs with evolving sparsity, such as citation networks that gain or lose edges over time.
If reliability estimation proves robust, the approach may extend to other weak-supervision settings like partially labeled multimodal data.
Comparing the density-ratio calibration against simpler reweighting schemes on the same sparse benchmarks would clarify how much of the gain comes from the global-domain term.
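The comparison suggested in the last point can be set up with a small sketch. It assumes the global distribution is the prior-weighted mixture of the source domains, in which case Bayes' rule gives p_global(x) / p_d(x) = π_d / P(d | x), so a probabilistic domain classifier is enough to estimate the ratio. The function names, the `floor` clamp, and the inverse-size baseline are illustrative choices, not the paper's estimator.

```python
def global_density_ratio(posteriors, priors, domain, floor=1e-6):
    """Per-sample ratio p_global(x) / p_domain(x) from classifier posteriors.

    posteriors[i][d] is an assumed estimate of P(domain d | sample i) under
    the pooled data; priors[d] is the share of domain d in the pool.
    """
    return [priors[domain] / max(p[domain], floor) for p in posteriors]


def inverse_size_weight(domain_sizes, domain):
    """Simpler baseline: one shared weight per domain, inverse to its size."""
    total = sum(domain_sizes)
    return total / (len(domain_sizes) * domain_sizes[domain])


# Two samples from domain 0: one typical of the pool, one domain-specific.
posteriors = [[0.5, 0.5], [0.9, 0.1]]
ratios = global_density_ratio(posteriors, priors=[0.5, 0.5], domain=0)
baseline = inverse_size_weight([100, 300], domain=0)
```

The density ratio varies per sample (the domain-specific sample is downweighted), while the baseline assigns one weight to the whole domain. Running both on the same sparse benchmarks would isolate the contribution of the per-sample term.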

Load-bearing premise

The argument rests on two assumptions: that structure-oriented reconstruction with consistency control can inject reliable topology cues into text representations without contaminating the shared semantic space, and that the global-domain density ratio plus graph reliability estimation can effectively calibrate and downweight unreliable sparse samples.

What would settle it

An ablation study on a sparse TAG dataset that removes the consistency control from the structure-oriented reconstruction and measures whether cross-domain transfer performance and generalization gap reductions disappear.

Figures

Figures reproduced from arXiv: 2605.18579 by HaoPeng Zhang, Jiaqi Yu, Ruijie Wang, Xiao Wang, Xinyu Zhao, Yibo Ding, Yuhang Liu, Yuhan Wang, Ziwei Zhang.

**Figure 1.** LLM-as-Aligner. Panel (a), Uncertainty vs. Sparsity: markers per 1K tokens and uncertain summaries (%) at text levels Full, 10%, 5%, 3%, and 1%. Panel (b), Structural supplementation: Graph-to-Text MRR (%) for Semantic and +Struct under Full and Sparse text. A further panel plots MRR, R@1, R@5, and R@10 (T2N…).

**Figure 3.** The overall framework of S2Aligner is shown in the figure above. It encodes sparse text-attributed graphs into content and structural components and applies latent reconstruction on the structural branch to reduce negative transfer from sparse text. We further introduce Sparse-aware Cross-domain Risk Balancing, aligning multi-source domain risks via density estimation and reliability weighting to learn dom…

**Figure 5.** Performance-efficiency trade-off under varying text sparsity levels. The plot shows average accuracy (%) for the Acad., Com., and Web groups with Small (23M), Mid (110M), and Large (0.6B) models.

**Figure 7.** The study results.

Related Work (Section 5 of the paper, excerpt). Graph–Text Alignment. Inspired by CLIP [24], contrastive dual-encoder frameworks have become the dominant paradigm for graph–text alignment on text-attributed graphs. Methods such as GraphCLIP [43], G2P2 [37], and GRENADE [15] construct graph-text positive pairs and map them into a shared space. However, they rely on fixed one-to-one alignment, limiting their ability to c…

**Figure 8.** Hyperparameter sensitivity analysis of α, μ, and ν.

**Figure 9.** Embedding visualization of Cora. Circles (…
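Figure 1 reports graph-to-text retrieval quality as MRR and R@k. For readers unfamiliar with these metrics, the sketch below computes them from a similarity matrix, assuming that row i is a graph query and column i is its matching text. The function names and the tie-handling rule are illustrative and are not taken from the paper's evaluation code.

```python
def retrieval_ranks(similarity):
    """1-based rank of the matching text (the diagonal entry) for each query.

    Ties are resolved optimistically: only strictly higher scores count
    as ranked above the target.
    """
    ranks = []
    for i, row in enumerate(similarity):
        target = row[i]
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks, k):
    """Fraction of queries whose match appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


similarity = [
    [0.9, 0.1, 0.2],  # match ranked first
    [0.8, 0.5, 0.1],  # match ranked second
    [0.3, 0.4, 0.2],  # match ranked third
]
ranks = retrieval_ranks(similarity)
mrr = mean_reciprocal_rank(ranks)
r_at_1 = recall_at_k(ranks, 1)
```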

Original abstract

Pre-training on text-attributed graphs (TAGs) is central to building transferable graph foundation models, where LLM-as-Aligner methods align graph and text representations through the semantic knowledge of large language models. However, these methods usually assume that node texts provide sufficient and reliable supervision, an assumption often violated in real-world sparse TAGs. When textual anchors are missing, noisy, or uneven across domains, graph structures must be aligned with weak semantic evidence, leading to unreliable structure-semantics correspondence and sparsity-induced transfer bias. This paper presents S2Aligner, a sparsity-aware and structure-enhanced LLM-as-Aligner framework for graph-text pre-training on sparse TAGs. The key idea is to decouple semantic alignment from structural modeling, allowing topology-aware signals to enhance alignment without contaminating the shared semantic space. Specifically, S2Aligner decomposes graph-text representations into semantic and structural components, uses structure-oriented reconstruction with consistency control to inject reliable topology cues into text representations, and suppresses inconsistent structural signals under textual sparsity. Moreover, S2Aligner introduces sparsity-aware cross-domain risk balancing, which calibrates domain risks through a global-domain density ratio and downweights unreliable sparse samples via graph reliability estimation. Theoretical analysis shows that this objective reduces cross-domain generalization gaps by controlling domain risk discrepancy. Extensive experiments across diverse graph domains, sparsity levels, and downstream tasks demonstrate that S2Aligner consistently outperforms existing baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

S2Aligner decouples semantic and structural signals with risk balancing to target sparsity in TAG pre-training, but the contamination risk in weak-text regimes needs direct verification.


The main takeaway is that S2Aligner splits graph-text alignment into separate semantic and structural components, then adds consistency-controlled reconstruction plus a sparsity-aware risk balancer that uses domain density ratios and sample reliability scores. This directly tackles the common failure mode where missing or noisy node text breaks structure-semantics correspondence in real TAGs. The approach is a clear step beyond standard LLM-as-Aligner setups that assume strong textual anchors.

The risk-balancing piece looks like the freshest part; it tries to down-weight unreliable sparse nodes and control cross-domain discrepancy in a way that prior work did not formalize. Experiments are claimed to show gains across domains, sparsity levels, and tasks, which is the right test bed. The theoretical claim that the objective shrinks generalization gaps by managing domain risk discrepancy is stated plainly.

That said, the central assumption, that structure-oriented reconstruction injects topology without leaking graph-specific bias into the shared semantic space, remains the softest point, especially under the extreme sparsity the paper targets. When textual cues are absent, the consistency control may not fully isolate the signals, and the reliability estimator itself could be biased. The abstract gives no equations or ablation numbers, so it is impossible to judge how tightly the theory tracks the implementation or whether the balancing actually reduces the claimed gaps.

The work is aimed at people building transferable graph models for sparse, text-attributed data. Anyone already working on LLM-graph alignment or foundation models for real-world graphs would find the concrete techniques useful to examine. It is coherent on its own terms and engages the relevant prior literature without obvious circularity. I would send it to peer review so referees can check the proofs, the estimator derivations, and whether the experiments isolate the sparse regime properly.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes S2Aligner, a sparsity-aware LLM-as-Aligner framework for pre-training on sparse text-attributed graphs. It decouples semantic alignment from structural modeling, employs structure-oriented reconstruction with consistency control to inject topology cues into text representations while suppressing inconsistent signals, and introduces sparsity-aware cross-domain risk balancing via a global-domain density ratio and graph reliability estimation to calibrate risks and downweight unreliable samples. Theoretical analysis claims this objective reduces cross-domain generalization gaps by controlling domain risk discrepancy, with experiments showing consistent outperformance over baselines across diverse domains, sparsity levels, and downstream tasks.

Significance. If the central claims hold, the work would meaningfully advance transferable graph foundation models by tackling sparsity-induced transfer bias in TAGs, a practical limitation of prior LLM-as-Aligner approaches. The explicit decoupling of semantic and structural components and the risk-balancing mechanism are targeted innovations that could inform future pre-training designs; the theoretical analysis linking the objective to generalization gaps strengthens the contribution beyond purely empirical results.

major comments (2)

[§3.2] Sparsity-Aware Cross-Domain Risk Balancing: The claim that the global-domain density ratio combined with graph reliability estimation reliably controls domain risk discrepancy and reduces generalization gaps is load-bearing for the transferability results. Under the extreme sparsity regimes targeted by the paper, these estimators may themselves become biased when textual anchors are missing or noisy, undermining the calibration of unreliable samples. A formal bound or targeted ablation isolating estimator behavior at high sparsity levels is required to substantiate the theoretical analysis.
[§4.1] Structure-Oriented Reconstruction with Consistency Control: The central assumption that consistency control injects reliable topology cues into text representations without contaminating the shared semantic space is least secure precisely where the method is most needed. When node texts are absent or weak, the mechanism for suppressing inconsistent structural signals lacks sufficient verification (e.g., via quantitative leakage metrics or failure-case analysis), which directly affects the claimed structure-semantics correspondence and downstream transfer performance.

minor comments (2)

[Abstract] The abstract states that S2Aligner 'consistently outperforms existing baselines' but does not quantify the number of domains, sparsity ratios, or task types; adding these specifics would improve clarity without altering the technical content.
[§3] Notation for the density ratio and reliability estimator should be introduced with explicit definitions at first use to avoid ambiguity when readers cross-reference the theoretical and algorithmic sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment point by point below and indicate the revisions planned for the next manuscript version.


Referee: [§3.2] Sparsity-Aware Cross-Domain Risk Balancing: The claim that the global-domain density ratio combined with graph reliability estimation reliably controls domain risk discrepancy and reduces generalization gaps is load-bearing for the transferability results. Under the extreme sparsity regimes targeted by the paper, these estimators may themselves become biased when textual anchors are missing or noisy, undermining the calibration of unreliable samples. A formal bound or targeted ablation isolating estimator behavior at high sparsity levels is required to substantiate the theoretical analysis.

Authors: We appreciate the referee's point on the potential sensitivity of the estimators. Section 3.2 already derives a bound showing that the proposed objective controls domain risk discrepancy under the stated assumptions on the density ratio and reliability estimates. To directly address behavior under extreme sparsity, we will add a targeted ablation in the revised manuscript that isolates estimator bias and variance at high sparsity levels (including cases with missing or noisy textual anchors) and reports their effect on risk calibration and downstream transfer. Revision: yes.

Referee: [§4.1] Structure-Oriented Reconstruction with Consistency Control: The central assumption that consistency control injects reliable topology cues into text representations without contaminating the shared semantic space is least secure precisely where the method is most needed. When node texts are absent or weak, the mechanism for suppressing inconsistent structural signals lacks sufficient verification (e.g., via quantitative leakage metrics or failure-case analysis), which directly affects the claimed structure-semantics correspondence and downstream transfer performance.

Authors: We agree that explicit verification of signal suppression is valuable in the sparse regime. The current experiments already show improved transfer across sparsity levels, supporting the overall design. In the revision we will add quantitative leakage metrics (e.g., semantic consistency scores before/after consistency control) together with a focused failure-case analysis for nodes with absent or weak text, to be placed in the updated §4.1 and experimental sections. Revision: yes.
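The rebuttal does not specify the promised leakage metric. One hypothetical form of a "semantic consistency score" is the mean cosine similarity between text-only embeddings and the same embeddings after structural injection, compared with and without the consistency gate. The names and the choice of cosine similarity below are assumptions for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def semantic_consistency(before, after):
    """Mean cosine similarity between embeddings before and after injection."""
    scores = [cosine(b, a) for b, a in zip(before, after)]
    return sum(scores) / len(scores)


def leakage_gap(before, after_gated, after_ungated):
    """Positive values mean the gate preserved more of the semantic direction."""
    gated = semantic_consistency(before, after_gated)
    ungated = semantic_consistency(before, after_ungated)
    return gated - ungated


# Toy embeddings for two nodes (hypothetical values).
before = [[1.0, 0.0], [0.0, 1.0]]
after_gated = [[1.0, 0.1], [0.0, 1.0]]
after_ungated = [[1.0, 1.0], [1.0, 1.0]]
gap = leakage_gap(before, after_gated, after_ungated)
```

Reporting this gap separately for nodes with full, weak, and absent text would speak directly to the referee's concern, since the metric is least informative exactly where the text-only embedding is itself unreliable.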

Circularity Check

0 steps flagged

No significant circularity; theoretical claim presented as independent analysis of proposed objective


The provided abstract and context describe S2Aligner as introducing a sparsity-aware objective with structure-oriented reconstruction and cross-domain risk balancing, followed by a separate theoretical analysis asserting that this objective reduces generalization gaps via domain risk discrepancy control. No equations, self-citations, or derivations are quoted that reduce the claimed result to a fitted parameter, renamed input, or self-referential definition by construction. The theoretical statement is framed as an analysis of the objective rather than a tautological restatement of its design, and the paper's performance claims rest on experimental comparisons rather than purely internal reductions. This is the common case of an independent supporting argument, yielding a normal non-finding of circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on domain assumptions about sparsity in TAGs and the effectiveness of the proposed decoupling and balancing mechanisms; no explicit free parameters or invented entities are named.

axioms (2)

domain assumption: Node texts provide sufficient and reliable supervision in non-sparse TAGs, but this is often violated in real-world sparse settings. (Stated directly in the abstract as the motivation for the work.)
ad hoc to paper: Decoupling semantic alignment from structural modeling allows topology-aware signals to enhance alignment without contaminating the shared semantic space. (Core design choice described in the abstract.)

pith-pipeline@v0.9.0 · 5817 in / 1386 out tokens · 35093 ms · 2026-05-21T07:50:40.714571+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear
Paper passage: S2Aligner decomposes graph–text representations into semantic and structural components, uses structure-oriented reconstruction with consistency control... sparsity-aware cross-domain risk balancing... global-domain density ratio

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear
Paper passage: Theorem 3.1 (Sparse-aware weighted risk equalization)... density-ratio weighted risks are equal across all source domains

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 5 internal anchors

[1] Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.
[2] William Brannon, Wonjune Kang, Suyash Fulay, Hang Jiang, Brandon Roy, Deb Roy, and Jad Kabbara. ConGraT: Self-supervised contrastive pretraining for joint graph and text embeddings. In Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing, pages 19–39, Bangkok, Thailand, August 2024. Association for Computational Linguistics.
[3] Hongxu Chen, Hongzhi Yin, Xiangguo Sun, Tong Chen, Bogdan Gabrys, and Katarzyna Musial. Multi-level graph convolutional networks for cross-platform anchor link prediction. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pages 1503–1511, 2020.
[4] Junru Chen, Yang Yang, Tao Yu, Yingying Fan, Xiaolong Mo, and Carl Yang. Brainnet: Epileptic wave detection from seeg with hierarchical graph diffusion learning. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, pages 2741–2751, 2022.
[5] Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, and Zhangyang Wang. Llaga: Large language and graph assistant. arXiv preprint arXiv:2402.08170, 2024.
[6] Zhikai Chen, Haitao Mao, Jingzhe Liu, Yu Song, Bingheng Li, Wei Jin, Bahare Fatemi, Anton Tsitsulin, Bryan Perozzi, Hui Liu, et al. Text-space graph foundation models: Comprehensive benchmarks and new insights. Advances in Neural Information Processing Systems, 37:7464–7492, 2024.
[7] Jiarui Feng, Hao Liu, Lecheng Kong, Mingfang Zhu, Yixin Chen, and Muhan Zhang. Taglas: An atlas of text-attributed graph datasets in the era of large graph and language models. arXiv preprint arXiv:2406.14683, 2024.
[8] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. Journal of machine learning research, 17(59):1–35, 2016.
[9] Xiaoxin He, Xavier Bresson, Thomas Laurent, and Bryan Hooi. Explanations as features: Llm-based features for text-attributed graphs. CoRR, abs/2305.19523, 2023.
[10] Zhenyu Hou, Xiao Liu, Yukuo Cen, Yuxiao Dong, Hongxia Yang, Chunjie Wang, and Jie Tang. Graphmae: Self-supervised masked graph autoencoders. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, pages 594–604, 2022.
[11] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems, 33:22118–22133, 2020.
[12] Xuanwen Huang, Kaiqiao Han, Yang Yang, Dezheng Bao, Quanjin Tao, Ziwei Chai, and Qi Zhu. Can gnn be good adapter for llms? In Proceedings of the ACM web conference 2024, pages 893–904, 2024.
[13] David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, and Aaron Courville. Out-of-distribution generalization via risk extrapolation (rex). In International conference on machine learning, pages 5815–5826. PMLR, 2021.
[14] Haoyang Li, Ziwei Zhang, Xin Wang, and Wenwu Zhu. Learning invariant graph representations for out-of-distribution generalization. Advances in Neural Information Processing Systems, 35:11828–11841, 2022.
[15] Yichuan Li, Kaize Ding, and Kyumin Lee. GRENADE: Graph-centric language model for self-supervised representation learning on text-attributed graphs. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 2745–2757, Singapore, December 2023. Association for Computational Linguistics.
[16] Yuhan Li, Peisong Wang, Zhixun Li, Jeffrey Xu Yu, and Jia Li. Zerog: Investigating cross-dataset zero-shot transferability in graphs. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1725–1735, 2024.
[17] Hao Liu, Jiarui Feng, Lecheng Kong, Ningyue Liang, Dacheng Tao, Yixin Chen, and Muhan Zhang. One for all: Towards training one graph model for all classification tasks. arXiv preprint arXiv:2310.00149, 2023.
[18] Jiawei Liu, Cheng Yang, Zhiyuan Lu, Junze Chen, Yibo Li, Mengmei Zhang, Ting Bai, Yuan Fang, Lichao Sun, Philip S. Yu, and Chuan Shi. Towards graph foundation models: A survey and beyond. CoRR, abs/2310.11829, 2023.
[19] Yuhang Liu, Minglai Shao, Zengyi Wo, Yunlong Chu, Bing Hao, Shengzhong Liu, Ruijie Wang, and Jianxin Li. Learning noise-resilient and transferable graph-text alignment via dynamic quality assessment. arXiv preprint arXiv:2510.19384, 2025.
[20] Mingsheng Long, Yue Cao, Jianmin Wang, and Michael Jordan. Learning transferable features with deep adaptation networks. In International conference on machine learning, pages 97–105. PMLR, 2015.
[21] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
[22] Péter Mernyei and Cătălina Cangea. Wiki-cs: A wikipedia-based benchmark for graph neural networks. arXiv preprint arXiv:2007.02901, 2020.
[23] Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(5):947–1012, 2016.
[24] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
[25] Ladislav Rampášek, Michael Galkin, Vijay Prakash Dwivedi, Anh Tuan Luu, Guy Wolf, and Dominique Beaini. Recipe for a general, powerful, scalable graph transformer. Advances in Neural Information Processing Systems, 35:14501–14515, 2022.
[26] Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pages 3982–3992, 2019.
[27] Xubin Ren, Jiabin Tang, Dawei Yin, Nitesh Chawla, and Chao Huang. A survey of large language models for graphs. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 6616–6626, 2024.
[28] Shiori Sagawa, Pang Wei Koh, Tatsunori B Hashimoto, and Percy Liang. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv preprint arXiv:1911.08731, 2019.
[29] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. Collective classification in network data. AI magazine, 29(3):93–93, 2008.
[30] Baochen Sun and Kate Saenko. Deep coral: Correlation alignment for deep domain adaptation. In European conference on computer vision, pages 443–450. Springer, 2016.
[31] Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, and Chao Huang. Graphgpt: Graph instruction tuning for large language models. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 491–500, 2024.
[32] Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Rémi Munos, Petar Veličković, and Michal Valko. Bootstrapped representation learning on graphs. In ICLR 2021 workshop on geometrical and topological representation learning, pages 1–14. OpenReview.net, 2021.
[33]

Visualizing data using t-sne.Journal of Machine Learning Research, 9(86):2579–2605, 2008

Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne.Journal of Machine Learning Research, 9(86):2579–2605, 2008

work page 2008
[34]

Deep graph infomax.stat, 1050:21, 2018

Petar Velickovic, William Fedus, William L Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. Deep graph infomax.stat, 1050:21, 2018

work page 2018
[35]

Text Embeddings by Weakly-Supervised Contrastive Pre-training

Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. Text embeddings by weakly-supervised contrastive pre-training. arXiv preprint arXiv:2212.03533, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[36]

Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers.Advances in neural information processing systems, 33:5776–5788, 2020

Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers.Advances in neural information processing systems, 33:5776–5788, 2020

work page 2020
[37] Zhihao Wen and Yuan Fang. Augmenting low-resource text classification with graph-grounded pre-training and prompting. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 506–516, 2023.
[38] Gina Wong, Joshua Gleason, Rama Chellappa, Yoav Wald, and Anqi Liu. Weighted risk invariance: Domain generalization under invariant feature shift. arXiv preprint arXiv:2407.18428, 2024.
[39] Qitian Wu, Hengrui Zhang, Junchi Yan, and David Wipf. Handling distribution shifts on graphs: An invariance perspective. arXiv preprint arXiv:2202.02466, 2022.
[40] Hao Yan, Chaozhuo Li, Ruosong Long, Chao Yan, Jianan Zhao, Wenwen Zhuang, Jun Yin, Peiyan Zhang, Weihao Han, Hao Sun, et al. A comprehensive study on text-attributed graphs: Benchmarking and rethinking. Advances in Neural Information Processing Systems, 36:17238–17264, 2023.
[41] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.
[42] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. Deep graph contrastive representation learning. arXiv preprint arXiv:2006.04131, 2020.
[43] Yun Zhu, Haizhou Shi, Xiaotang Wang, Yongchao Liu, Yaoke Wang, Boci Peng, Chuntao Hong, and Siliang Tang. Graphclip: Enhancing transferability in graph foundation models for text-attributed graphs. In Proceedings of the ACM on Web Conference 2025, pages 2183–2197, 2025.

A Appendix Overview

The appendix is structured as follows:
• Section B discusses lim...
• Feature decomposition: inputs consist of invariant $Z$ and domain-specific $V_e$.
• Conditional independence: $Y \perp\!\!\!\perp V_e \mid Z$.
• Invariant label condition: $p_{e_i}(Y \mid Z) = p_{e_j}(Y \mid Z)$ for all $e_i, e_j \in \mathcal{E}$.
• Non-degenerate density: $p_e(Z) > 0$ for all $e \in \mathcal{E}$.

Proposition 1. If the assumptions hold and the predictor $f$ depends only on $Z$, then the weighted risks are equal across all domains.

Proof. The weighted risk for domain $e$ is:

$$R_e = \iiint_{\mathcal{Y}\,\mathcal{Z}\,\mathcal{V}_e} p_e(Z, V_e, Y) \cdot \ell(f(Z), Y) \cdot r_e(Z) \, dV_e \, dZ \, dY.$$

Marginalizing out $V_e$:

$$R_e = \iint_{\mathcal{Y}\,\mathcal{Z}} \ell(f(Z), Y) \cdot r_e(Z) \cdot p_e(Z, Y) \, dZ \, dY.$$

Substituting $p_e(Z, Y)$ ...
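The invariance stated in Proposition 1 can be checked numerically on a toy discrete problem. The sketch below is illustrative only and rests on an assumption that the excerpt above does not confirm: the weight is taken to be a density ratio $r_e(Z) = \bar{p}(Z) / p_e(Z)$, where $\bar{p}$ is the uniform mixture of the domain marginals. The two domains, the predictor, and the 0-1 loss are all hypothetical. $V_e$ is omitted because the predictor ignores it and $Y$ is conditionally independent of it given $Z$.

```python
# Toy check: with a shared p(Y | Z) and a predictor that depends only on Z,
# density-ratio weighting equalizes the per-domain risks.
# ASSUMPTION: r_e(Z) = p_bar(Z) / p_e(Z); this form is not taken from the paper.
# All numbers, the predictor, and the loss are hypothetical.

Z_VALUES = [0, 1, 2]
Y_VALUES = [0, 1]

# Domain-specific marginals p_e(Z), strictly positive (non-degenerate density).
P_Z = {
    "e1": [0.7, 0.2, 0.1],
    "e2": [0.1, 0.3, 0.6],
}

# Shared conditional p(Y = 1 | Z), identical in every domain (invariant label condition).
P_Y1_GIVEN_Z = [0.9, 0.5, 0.2]


def predictor(z):
    """Hypothetical predictor that depends only on Z."""
    return 1 if z == 0 else 0


def loss(y_hat, y):
    """0-1 loss."""
    return float(y_hat != y)


def global_marginal(p_z):
    """Uniform mixture of the domain marginals, used as the reference density."""
    n_domains = len(p_z)
    return [sum(p[i] for p in p_z.values()) / n_domains for i in range(len(Z_VALUES))]


def risk(domain, weighted):
    """Exact risk of `predictor` in `domain`, summed over all (Z, Y) pairs."""
    p_bar = global_marginal(P_Z)
    total = 0.0
    for i, z in enumerate(Z_VALUES):
        p_z = P_Z[domain][i]
        r = p_bar[i] / p_z if weighted else 1.0
        for y in Y_VALUES:
            p_y = P_Y1_GIVEN_Z[i] if y == 1 else 1.0 - P_Y1_GIVEN_Z[i]
            total += p_z * p_y * r * loss(predictor(z), y)
    return total


plain = {e: risk(e, weighted=False) for e in P_Z}
weighted = {e: risk(e, weighted=True) for e in P_Z}
print("unweighted risks:", plain)
print("weighted risks:  ", weighted)
```

In this toy setting the unweighted risks differ between the two domains because their marginals over $Z$ differ, while the weighted risks coincide: the factor $p_e(Z)$ cancels against the denominator of the assumed ratio, leaving an expression that no longer depends on $e$.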
Text-only Language Models: These approaches independently process the raw sentences of each entity while completely ignoring topological connections.
• SBERT [26]: We incorporate standard sentence embedding architectures to produce dense semantic representations, specifically evaluating the all-MiniLM-L6-v2 and multi-qa-distilbert-cos-v1 variants.
• Qwen3 (...
Text-Attributed Graph (TAG) & LLM-based Methods: These recent strategies strive to fuse structural patterns with the semantic comprehension capabilities of language models.
• GraphGPT [31]: This architecture maps topological properties into discrete tokens and employs a dual-stage instruction fine-tuning process to synchronize GNN outputs with an LLM’s sem...
Graph Self-Supervised Learning (SSL) Models: Focusing primarily on topology and dense features, these methodologies leverage traditional graph neural networks.
• DGI [34]: A foundational self-supervised strategy that maximizes mutual information by distinguishing authentic node-graph representations from artificially corrupted counterparts.
• GRACE [4...
State-of-the-Art Graph-Text Aligners: Serving as our primary zero-shot competitors, these methods focus explicitly on synchronizing semantic and structural spaces.
• GraphCLIP [43]: This framework relies heavily on contrastive alignment objectives. It synthesizes subgraph summaries to align embedding spaces, providing strong zero-shot graph–text alignment ...
