pith. machine review for the scientific record.

arxiv: 2605.12685 · v1 · submitted 2026-05-12 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links · Lean Theorem

A Unified Perspective for Learning Graph Representations Across Multi-Level Abstractions

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 21:05 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI
keywords Graph Self-Supervised Learning · Multi-level Graph Representations · Contrastive Learning · Self-weighting Mechanism · Node Classification · Link Prediction · Clustering

The pith

A unified contrastive framework learns graph representations by linearly combining node-level, proximity-level, cluster-level, and graph-level signals with a parameter-free self-weighting mechanism.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that graph self-supervised learning can simultaneously target multiple abstraction levels by treating each level's positive-pair similarity and negative-pair dissimilarity as additive terms in a single loss. It further claims that a self-weighting rule, which boosts scores farthest from their targets without introducing any tunable coefficients, removes the need for manual balancing across tasks or levels. Experiments on standard benchmarks indicate consistent gains in node classification, clustering, and link prediction whether the model operates at one level or several. The approach therefore promises both higher performance and lower engineering cost compared with prior multi-task GSSL methods that require per-level hyperparameter search.

Core claim

A single contrastive objective integrates node-level, proximity-level, cluster-level, and graph-level information through a linear combination of similarity scores on positive pairs and dissimilarity scores on negative pairs; a parameter-free self-weighting term then adaptively emphasizes individual scores that deviate most from their ideal values, yielding representations that improve downstream tasks without level-specific tuning.

What carries the argument

Linear combination of per-level similarity and dissimilarity scores modulated by a parameter-free self-weighting rule that raises the influence of scores farthest from their targets.
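The paper's equations are not reproduced on this page, so the sketch below is only one plausible reading of the mechanism described above, written in the spirit of the Circle loss the paper cites: each positive or negative score is re-weighted by its detached gap to a target value, and the per-level terms are summed with fixed unit coefficients. The function and parameter names (`self_weighted_multilevel_loss`, `o_p`, `o_n`, `gamma`) are illustrative assumptions, not the paper's notation.

```python
import torch
import torch.nn.functional as F


def self_weighted_multilevel_loss(pos_scores, neg_scores, o_p=1.0, o_n=0.0, gamma=32.0):
    """Illustrative self-weighted multi-level contrastive loss (not the paper's exact form).

    pos_scores / neg_scores map a level name ("node", "proximity", "cluster",
    "graph") to a 1-D tensor of similarity scores in [0, 1]. Each score is
    re-weighted by its gap to a target (o_p for positives, o_n for negatives);
    the gaps are detached so they modulate gradients without adding trainable
    parameters.
    """
    total = torch.zeros(())
    for level, s_pos in pos_scores.items():
        s_neg = neg_scores[level]
        # Larger weight for positives far below target and negatives far above it.
        w_pos = torch.clamp(o_p - s_pos, min=0.0).detach()
        w_neg = torch.clamp(s_neg - o_n, min=0.0).detach()
        # Circle-loss-style aggregation of the weighted scores for this level;
        # every level enters the total with a fixed unit coefficient.
        level_loss = F.softplus(
            torch.logsumexp(gamma * w_neg * (s_neg - o_n), dim=0)
            + torch.logsumexp(-gamma * w_pos * (s_pos - o_p), dim=0)
        )
        total = total + level_loss
    return total
```

Called with dictionaries holding a single entry, the same function reduces to a single-level objective; with four entries it becomes the multi-level objective, which is how the "works without modification" point below would cash out.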

If this is right

  • The same objective works without modification for both single-level and multi-level training regimes.
  • No grid search over task or level weights is required, removing a major source of computational overhead.
  • Representations improve simultaneously on classification, clustering, and link prediction tasks.
  • The weighting rule remains effective across real-world graphs of varying size and density.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The self-weighting rule may transfer directly to other contrastive settings that combine heterogeneous objectives, such as multi-modal or multi-view learning.
  • Because the method produces a single embedding space usable at every scale, downstream models could query the same vectors for both local and global tasks without retraining.
  • If the linear combination assumption holds, extending the framework to temporal or heterogeneous graphs would require only the definition of new positive-pair and negative-pair generators at each additional level.

Load-bearing premise

A linear combination of per-level similarity and dissimilarity scores, modulated by the proposed self-weighting, captures complementary multi-level information without destructive interference or the need for level-specific tuning.

What would settle it

Train the method on a graph dataset engineered so that strong performance at one abstraction level systematically degrades performance at another; if the combined objective then yields lower downstream accuracy than the best single-level baseline, the claim of non-destructive integration would be refuted.
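One concrete way to stage that experiment, sketched under assumptions that are not in the paper (the `conflicting_sbm` name, block sizes, edge probabilities, and feature dimension are all illustrative), is a stochastic block model whose node features are drawn from Gaussians shuffled across blocks, so feature-level similarity and community structure disagree by construction.

```python
import numpy as np
import networkx as nx


def conflicting_sbm(n_per_block=200, n_blocks=4, p_in=0.15, p_out=0.01, dim=16, seed=0):
    """Synthetic graph whose structure and features disagree by design.

    Communities come from a stochastic block model (a cluster-level, structural
    signal), while node features are Gaussians whose block assignment is shuffled
    across nodes, so feature-level similarity conflicts with community membership.
    All parameter values are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    sizes = [n_per_block] * n_blocks
    probs = [[p_in if i == j else p_out for j in range(n_blocks)] for i in range(n_blocks)]
    graph = nx.stochastic_block_model(sizes, probs, seed=seed)
    communities = np.repeat(np.arange(n_blocks), n_per_block)   # structural labels
    feature_block = rng.permutation(communities)                # feature labels ignore structure
    centers = 3.0 * rng.normal(size=(n_blocks, dim))
    features = centers[feature_block] + rng.normal(size=(communities.size, dim))
    return graph, features, communities
```

Training the unified objective on such a graph and comparing a feature-aligned task (classifying the feature blocks) against a structure-aligned task (recovering the communities), relative to the best single-level baseline, would directly probe the non-interference claim.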

Figures

Figures reproduced from arXiv: 2605.12685 by Abdoulaye Baniré Diallo, Mohamed Bouguessa, Mohamed Mahmoud Amar, Nairouz Mrabah.

Figure 1: Top row: Average performance of our single-level …
Figure 2: Illustration of our multi-task approach, LSW-ML-GSSL. The positive and negative similarity scores are denoted by …
Figure 3: The gradient magnitude of L w.r.t. s+ …
Figure 4: The gradient magnitude of L w.r.t. s− …
Figure 5: The gradient magnitude of L w.r.t. s+ …
Figure 6: Execution time in seconds of LSW-ML-GSSL and …
Figure 7: Sensitivity of LSW-ML-GSSL to m and γ.
Figure 8: Sensitivity of LSW-ML-GSSL to the neighborhood size and the number of clusters.
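Figures 3 through 5 concern gradient magnitudes of the loss with respect to the positive and negative scores. Under the gap-weighting reading sketched earlier on this page, which is an assumption about the mechanism rather than the paper's stated equations, those magnitudes take a simple self-modulated form:

```latex
% Hedged sketch: gradient magnitudes of one gap-weighted pair term, treating
% w^+ = [O_p - s^+]_+ and w^- = [s^- - O_n]_+ as stop-gradient factors.
\[
  \left|\frac{\partial \mathcal{L}}{\partial s^{+}}\right| \propto w^{+} = \bigl[\,O_p - s^{+}\,\bigr]_{+},
  \qquad
  \left|\frac{\partial \mathcal{L}}{\partial s^{-}}\right| \propto w^{-} = \bigl[\,s^{-} - O_n\,\bigr]_{+}.
\]
```

Scores already at their targets would then receive vanishing gradients while the worst violators dominate the update, which is the qualitative pattern the gradient-magnitude panels appear to illustrate.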
Original abstract

Graph Self-Supervised Learning (GSSL) has emerged as a powerful paradigm for generating high-quality representations for graph-structured data. While multi-scale graph contrastive learning has received increasing attention, many existing methods still predominantly focus on a single graph abstraction level. To address this limitation, we propose a unified contrastive framework that can target node-level, proximity-level, cluster-level, and graph-level information and integrate them through a linear combination of similarity scores on positive pairs and dissimilarity scores (i.e., similarity scores on negative pairs). Furthermore, current approaches typically assign uniform penalty strengths to all examples, which reduces optimization flexibility and leads to ambiguous convergence status. To overcome this, we introduce a novel parameter-free fine-grained self-weighting mechanism that adaptively assigns weights to individual similarity and dissimilarity scores. The proposed mechanism emphasizes the scores that deviate significantly from their target values. Our approach not only enhances optimization flexibility but also eliminates the computational overhead of hyperparameter tuning in conventional multi-task GSSL methods. Comprehensive experiments on real-world datasets show that our methods consistently outperform state-of-the-art approaches across downstream tasks, including classification, clustering, and link prediction, in both single-level and multi-level scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce a unified contrastive framework for multi-level graph self-supervised learning that combines node-, proximity-, cluster-, and graph-level signals via a linear combination of similarity and dissimilarity scores, enhanced by a parameter-free self-weighting scheme that up-weights deviating scores. It asserts that this eliminates hyperparameter tuning and consistently outperforms SOTA on classification, clustering, and link prediction in single- and multi-level settings.

Significance. Should the central claims regarding the parameter-free nature and lack of destructive interference hold, this could simplify the design of multi-task GSSL methods and provide a more flexible optimization approach. The work addresses a relevant gap, namely the tendency of prior methods to focus on a single abstraction level, by offering a unified perspective.

major comments (2)
  1. [Abstract] The claim of eliminating hyperparameter tuning is load-bearing but rests on the assumption that the linear combination of per-level scores requires no inter-level balancing; the self-weighting is described as acting on individual scores, leaving open whether fixed coefficients suffice without performance sensitivity.
  2. [§3] The integration of multi-level information through linear combination lacks a demonstration that the fixed inter-level weights avoid destructive interference between abstraction levels, which is central to the unified framework's validity.
minor comments (1)
  1. [Abstract] Specific dataset names and baseline methods are not mentioned, reducing clarity on the experimental setup.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our unified multi-level graph contrastive framework. The concerns about hyperparameter sensitivity and potential destructive interference are well-taken; we provide clarifications below and will strengthen the manuscript with additional empirical analysis to demonstrate robustness.

Point-by-point responses
  1. Referee: [Abstract] The claim of eliminating hyperparameter tuning is load-bearing but rests on the assumption that the linear combination of per-level scores requires no inter-level balancing; the self-weighting is described as acting on individual scores, leaving open whether fixed coefficients suffice without performance sensitivity.

    Authors: We appreciate this observation. Our framework employs fixed inter-level coefficients (uniformly set to 1) in the linear combination of per-level contrastive terms, with the parameter-free self-weighting applied individually to each similarity and dissimilarity score to emphasize deviations from targets. This design avoids the need for inter-level hyperparameter search. To substantiate that fixed coefficients suffice without sensitivity, we will add an ablation study in the revised manuscript varying the inter-level coefficients across a range (e.g., 0.5 to 2.0) and reporting downstream task performance, which we expect to show minimal variance due to the adaptive weighting. revision: yes

  2. Referee: [§3] The integration of multi-level information through linear combination lacks a demonstration that the fixed inter-level weights avoid destructive interference between abstraction levels, which is central to the unified framework's validity.

    Authors: We agree that explicit validation strengthens the central claim. In §3, the total objective is formulated as a sum of weighted per-level terms, where the self-weighting mechanism (detailed in Eq. 4-6) dynamically scales contributions based on score deviation, thereby reducing the risk of one level dominating or interfering destructively with others. We will revise §3 to include a new paragraph with empirical evidence: plots of per-level weight distributions during training and an analysis showing stable convergence across levels on representative datasets, confirming the absence of destructive interference. revision: yes
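Both responses promise empirical checks that are straightforward to specify. The sketch below outlines the coefficient sweep from response 1 and the per-level weight logging from response 2; `train_and_evaluate`, `log_per_level_weights`, and the level names are placeholders, and only the 0.5 to 2.0 range is taken from the rebuttal text.

```python
import itertools

LEVELS = ["node", "proximity", "cluster", "graph"]
COEFF_GRID = [0.5, 1.0, 2.0]  # range quoted in the rebuttal


def coefficient_sensitivity(train_and_evaluate):
    """Sweep fixed inter-level coefficients and collect downstream metrics.

    train_and_evaluate(coeffs) is a placeholder that trains the model with the
    given inter-level coefficients and returns, e.g., classification accuracy,
    clustering NMI, and link-prediction AUC.
    """
    results = {}
    for combo in itertools.product(COEFF_GRID, repeat=len(LEVELS)):
        results[combo] = train_and_evaluate(dict(zip(LEVELS, combo)))
    return results


def log_per_level_weights(history, w_pos, w_neg, epoch):
    """Record mean self-weighting magnitudes per level at a given epoch.

    w_pos / w_neg map level names to the current gap-weight tensors; a stable,
    non-collapsing trajectory across levels would support the claim that no
    level destructively dominates the others.
    """
    history[epoch] = {
        level: (float(w_pos[level].mean()), float(w_neg[level].mean()))
        for level in LEVELS
    }
```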

Circularity Check

0 steps flagged

No load-bearing circularity; framework presented as direct construction

Full rationale

The paper defines a linear combination of node-, proximity-, cluster-, and graph-level similarity/dissimilarity scores, then modulates each term with an explicitly parameter-free self-weighting that up-weights deviations from targets. No equation reduces the claimed elimination of hyperparameter tuning to a fitted coefficient or to a self-citation chain; the inter-level weights are presented as fixed by the construction itself rather than derived from data. Empirical results on downstream tasks are offered as external validation, not as tautological confirmation of the weighting scheme. This yields at most minor self-citation risk without affecting the central claim.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard contrastive-learning assumptions for graphs; no new fitted parameters or invented entities are introduced in the abstract.

axioms (2)
  • Domain assumption: Multi-level graph information can be integrated via a linear combination of per-level contrastive scores.
    Stated as the core of the unified framework.
  • Domain assumption: Adaptive weighting based on deviation from target values improves optimization without introducing new hyperparameters.
    Central to the parameter-free self-weighting claim.

pith-pipeline@v0.9.0 · 5518 in / 1269 out tokens · 52292 ms · 2026-05-14T21:05:19.597474+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

63 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1] A. Shaheen, N. Mrabah, R. Ksantini, and A. Alqaddoumi, "Rethinking deep clustering paradigms: Self-supervision is all you need," Neural Networks, vol. 181, p. 106773, 2025.
  2. [2] T. Chen, S. Kornblith, K. Swersky, M. Norouzi, and G. E. Hinton, "Big self-supervised models are strong semi-supervised learners," in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 22243–22255.
  3. [3] N. Mrabah, M. M. Amar, M. Bouguessa, and A. B. Diallo, "Toward convex manifolds: A geometric perspective for deep graph clustering of single-cell RNA-seq data," in International Joint Conference on Artificial Intelligence (IJCAI), 2023, pp. 4855–4863.
  4. [4] N. Mrabah, M. M. Amar, M. Bouguessa, and A. B. Diallo, "Exploring the interaction between local and global latent configurations for clustering single-cell RNA-seq: A unified perspective," in Association for the Advancement of Artificial Intelligence (AAAI), vol. 37, no. 8, 2023, pp. 9235–9242.
  5. [5] Y. Liu, M. Jin, S. Pan, C. Zhou, Y. Zheng, F. Xia, and S. Y. Philip, "Graph self-supervised learning: A survey," IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 35, no. 6, pp. 5879–5900, 2022.
  6. [6] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, "Deep graph contrastive representation learning," in International Conference on Machine Learning (ICML) Workshop on Graph Representation Learning and Beyond, 2020.
  7. [7] S. Thakoor, C. Tallec, M. G. Azar, M. Azabou, E. L. Dyer, R. Munos, P. Veličković, and M. Valko, "Large-scale representation learning on graphs via bootstrapping," in International Conference on Learning Representations (ICLR), 2022.
  8. [8] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2016, pp. 855–864.
  9. [9] T. N. Kipf and M. Welling, "Variational graph auto-encoders," in Advances in Neural Information Processing Systems Workshop (NeurIPS Workshop), 2016, pp. 1–3.
  10. [10] N. Mrabah, M. Bouguessa, and R. Ksantini, "A contrastive variational graph auto-encoder for node clustering," Pattern Recognition, vol. 149, p. 110209, 2024.
  11. [11] N. Mrabah, M. Bouguessa, and R. Ksantini, "Beyond the evidence lower bound: Dual variational graph auto-encoders for node clustering," in Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), 2023, pp. 100–108.
  12. [12] P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, "Deep graph infomax," in International Conference on Learning Representations (ICLR), 2019.
  13. [13] K. Hassani and A. H. Khasahmadi, "Contrastive multi-view representation learning on graphs," in International Conference on Machine Learning (ICML), 2020, pp. 4116–4126.
  14. [14] W. Jin, X. Liu, X. Zhao, Y. Ma, N. Shah, and J. Tang, "Automated self-supervised learning for graphs," in International Conference on Learning Representations (ICLR), 2022.
  15. [15] M. Ju, T. Zhao, Q. Wen, W. Yu, N. Shah, Y. Ye, and C. Zhang, "Multi-task self-supervised graph neural networks enable stronger task generalization," in International Conference on Learning Representations (ICLR), 2023.
  16. [16] Y. Sun, C. Cheng, Y. Zhang, C. Zhang, L. Zheng, Z. Wang, and Y. Wei, "Circle loss: A unified perspective of pair similarity optimization," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 6398–6407.
  17. [17] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," in Proceedings of the 37th International Conference on Machine Learning (ICML), vol. 119, 2020, pp. 1597–1607.
  18. [18] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, "Graph contrastive learning with augmentations," in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 5812–5823.
  19. [19] A. v. d. Oord, Y. Li, and O. Vinyals, "Representation learning with contrastive predictive coding," arXiv preprint arXiv:1807.03748, 2018.
  20. [20] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, "Graph contrastive learning with adaptive augmentation," in International World Wide Web Conference (WWW), 2021, pp. 2069–2080.
  21. [21] B. Perozzi, R. Al-Rfou, and S. Skiena, "DeepWalk: Online learning of social representations," in ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2014, pp. 701–710.
  22. [22] C. Wang, S. Pan, G. Long, X. Zhu, and J. Jiang, "MGAE: Marginalized graph autoencoder for graph clustering," in ACM Conference on Information and Knowledge Management (CIKM), 2017, pp. 889–898.
  23. [23] J. Li, R. Wu, W. Sun, L. Chen, S. Tian, L. Zhu, C. Meng, Z. Zheng, and W. Wang, "What's behind the mask: Understanding masked graph modeling for graph autoencoders," in ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2023, pp. 1268–1279.
  24. [24] B. Fatemi, L. El Asri, and S. M. Kazemi, "SLAPS: Self-supervision improves structure learning for graph neural networks," Advances in Neural Information Processing Systems (NeurIPS), vol. 34, pp. 22667–22681, 2021.
  25. [25] J. Lu, Y. Xu, H. Wang, Y. Bai, and Y. Fu, "Latent graph inference with limited supervision," Advances in Neural Information Processing Systems (NeurIPS), vol. 36, pp. 32521–32538, 2023.
  26. [26] X. Shen, D. Sun, S. Pan, X. Zhou, and L. T. Yang, "Neighbor contrastive learning on learnable graph augmentation," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 8, 2023, pp. 9782–9791.
  27. [27] C. Wang, S. Pan, R. Hu, G. Long, J. Jiang, and C. Zhang, "Attributed graph clustering: A deep attentional embedding approach," in International Joint Conference on Artificial Intelligence (IJCAI), 2019, pp. 3670–3676.
  28. [28] B. Hui, P. Zhu, and Q. Hu, "Collaborative graph convolutional networks: Unsupervised learning meets semi-supervised learning," in Association for the Advancement of Artificial Intelligence (AAAI), vol. 34, no. 04, 2020, pp. 4215–4222.
  29. [29] B. Fatemi, J. Halcrow, and B. Perozzi, "Talk like a graph: Encoding graphs for large language models," in The Twelfth International Conference on Learning Representations (ICLR), 2024.
  30. [30] J. Lu, Y. Liu, Y. Zhang, and Y. Fu, "Scale-free graph-language models," in The Thirteenth International Conference on Learning Representations (ICLR), 2025. Available: https://openreview.net/forum?id=nFcgay1Yo9
  31. [31] Y. Jiao, Y. Xiong, J. Zhang, Y. Zhang, T. Zhang, and Y. Zhu, "Sub-graph contrast for scalable self-supervised graph representation learning," in 2020 IEEE International Conference on Data Mining (ICDM), 2020, pp. 222–231.
  32. [32] M. Jin, Y. Liu, Y. Zheng, L. Chi, Y.-F. Li, and S. Pan, "ANEMONE: Graph anomaly detection with multi-scale contrastive learning," in Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM), 2021, pp. 3122–3126.
  33. [33] M. Jin, Y. Zheng, Y.-F. Li, C. Gong, C. Zhou, and S. Pan, "Multi-scale contrastive siamese networks for self-supervised graph representation learning," in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), 2021, pp. 1477…
  34. [34] Y. Liu, Y. Zhao, X. Wang, L. Geng, and Z. Xiao, "Multi-scale subgraph contrastive learning," in Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23), 2023, pp. 2215–2223. Available: https://doi.org/…
  35. [35] D. Li, W. Wang, M. Shao, and C. Zhao, "Contrastive representation learning based on multiple node-centered subgraphs," in Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM), 2023, pp. 1338–1347.
  36. [36] M. J. Li, G. Zhao, S. Huang, Q. Zhang, J. Liu, M. Li, and J. Li, "Anomaly detection in attributed networks via local multi-order contrastive learning and global topology awareness," Neurocomputing, p. 130829, 2025.
  37. [37] O. Sener and V. Koltun, "Multi-task learning as multi-objective optimization," in Advances in Neural Information Processing Systems (NeurIPS), vol. 31, 2018, pp. 527–538.
  38. [38] D. Mahapatra and V. Rajan, "Multi-task learning with user preferences: Gradient descent with controlled ascent in Pareto optimization," in International Conference on Machine Learning (ICML), 2020, pp. 6597–6607.
  39. [39] X. Liu, X. Tong, and Q. Liu, "Profiling Pareto front with multi-objective Stein variational gradient descent," in Advances in Neural Information Processing Systems (NeurIPS), 2021.
  40. [40] A. Navon, A. Shamsian, G. Chechik, and E. Fetaya, "Learning the Pareto front with hypernetworks," in International Conference on Learning Representations (ICLR), 2021.
  41. [41] T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn, "Gradient surgery for multi-task learning," in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 5824–5836.
  42. [42] C. Doersch and A. Zisserman, "Multi-task self-supervised visual learning," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2051–2060.
  43. [43] M.-I. Georgescu, A. Barbalau, R. T. Ionescu, F. S. Khan, M. Popescu, and M. Shah, "Anomaly detection in video via self-supervised and multi-task learning," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 12742–12752.
  44. [44] W. Yu, H. Xu, Z. Yuan, and J. Wu, "Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis," in Association for the Advancement of Artificial Intelligence (AAAI), vol. 35, no. 12, 2021, pp. 10790–10797.
  45. [45] P. Zhu, Q. Wang, Y. Wang, J. Li, and Q. Hu, "Every node is different: Dynamically fusing self-supervised tasks for attributed graph clustering," in Association for the Advancement of Artificial Intelligence (AAAI), vol. 38, no. 15, 2024, pp. 17184–17192.
  46. [46] T. Fang, W. Zhou, Y. Sun, K. Han, L. Ma, and Y. Yang, "Exploring correlations of self-supervised tasks for graphs," in International Conference on Machine Learning (ICML), 2024.
  47. [47] T. Fan, L. Wu, Y. Huang, H. Lin, C. Tan, Z. Gao, and S. Z. Li, "Decoupling weighing and selecting for integrating multiple graph pre-training tasks," in International Conference on Learning Representations (ICLR), 2024.
  48. [48] A. Kendall, Y. Gal, and R. Cipolla, "Multi-task learning using uncertainty to weigh losses for scene geometry and semantics," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7482–7491.
  49. [49] Z. Chen, V. Badrinarayanan, C.-Y. Lee, and A. Rabinovich, "GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks," in International Conference on Machine Learning (ICML), PMLR, 2018, pp. 794–803.
  50. [50] B. Liu, X. Liu, X. Jin, P. Stone, and Q. Liu, "Conflict-averse gradient descent for multi-task learning," Advances in Neural Information Processing Systems (NeurIPS), vol. 34, pp. 18878–18890, 2021.
  51. [51] E. Yang, J. Pan, X. Wang, H. Yu, L. Shen, X. Chen, L. Xiao, J. Jiang, and G. Guo, "AdaTask: A task-aware adaptive learning rate approach to multi-task learning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 9, 2023, pp. 10745–10753.
  52. [52] Y. Dai, N. Fei, and Z. Lu, "Improvable gap balancing for multi-task learning," in Uncertainty in Artificial Intelligence (UAI), PMLR, 2023, pp. 496–506.
  53. [53] Z. Zhong and D. Mottin, "Automatic auxiliary task selection and adaptive weighting boost molecular property prediction," in Advances in Neural Information Processing Systems (NeurIPS), 2025.
  54. [54] Y. Wang, X. Yan, C. Hu, Q. Xu, C. Yang, F. Fu, W. Zhang, H. Wang, B. Du, and J. Jiang, "Generative and contrastive paradigms are complementary for graph self-supervised learning," in 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024, pp. 3364–3378.
  55. [55] N. Mrabah, M. Bouguessa, M. F. Touati, and R. Ksantini, "Rethinking graph auto-encoder models for attributed graph clustering," IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 35, no. 9, pp. 9037–9053, 2022.
  56. [56] A. Bhowmick, M. Kosan, Z. Huang, A. Singh, and S. Medya, "DGCluster: A neural framework for attributed graph clustering via modularity maximization," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 10, 2024, pp. 11069–11077.
  57. [57] K. Yang, Y. Liu, Z. Zhao, P. Ding, and W. Zhao, "Local structure-aware graph contrastive representation learning," Neural Networks, vol. 172, p. 106083, 2024.
  58. [58] A. K. McCallum, K. Nigam, J. Rennie, and K. Seymore, "Automating the construction of internet portals with machine learning," Information Retrieval, vol. 3, pp. 127–163, 2000.
  59. [59] C. L. Giles, K. D. Bollacker, and S. Lawrence, "CiteSeer: An automatic citation indexing system," in ACM Conference on Digital Libraries (DL), 1998, pp. 89–98.
  60. [60] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, "Collective classification in network data," AI Magazine, vol. 29, no. 3, pp. 93–93, 2008.
  61. [61] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, "ArnetMiner: Extraction and mining of academic social networks," in ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2008, pp. 990–998.
  62. [62] O. Shchur, M. Mumme, A. Bojchevski, and S. Günnemann, "Pitfalls of graph neural network evaluation," in International Conference on Learning Representations (ICLR Workshop on the Pitfalls of Limited Data and Computation for Trustworthy ML), 2023.
  63. [63] E. Hoffer and N. Ailon, "Deep metric learning using triplet network," in International Workshop on Similarity-Based Pattern Recognition, Springer, 2015, pp. 84–92.