GCIB: Graph Contrastive Information Bottleneck for Multi-Behavior Recommendation
Pith reviewed 2026-06-29 20:24 UTC · model grok-4.3
The pith
GCIB applies a graph information bottleneck to remove noise from auxiliary user behaviors and uses cross-behavior contrastive learning to strengthen sparse target-behavior signals in recommender systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GCIB employs a Graph Information Bottleneck objective that maximizes mutual information between the compressed auxiliary graph and the target-behavior graph while minimizing mutual information with the raw auxiliary graph; it then applies cross-behavior graph contrastive learning so that denoised auxiliary features and target features serve as complementary views for users and items, thereby producing noise-resilient and target-aware embeddings.
What carries the argument
Graph Information Bottleneck (GIB) objective that compresses auxiliary graphs to retain only target-aligned structure, paired with cross-behavior Graph Contrastive Learning (GCL) that contrasts denoised auxiliary and target features as positive pairs.
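The contrastive half of this mechanism can be illustrated with a minimal sketch. It assumes an InfoNCE-style loss (the excerpt mentions an InfoNCE temperature but not the exact formulation), in which row i of the denoised auxiliary embeddings and row i of the target embeddings form the positive pair and all other rows act as negatives. The function name `info_nce`, the temperature value, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def info_nce(aux, tgt, temperature=0.2):
    """One-directional InfoNCE: aux[i] and tgt[i] are a positive pair.

    Hypothetical helper for illustration; the temperature is an assumed value.
    """
    # L2-normalize rows so the dot product equals cosine similarity.
    aux = aux / np.linalg.norm(aux, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    logits = aux @ tgt.T / temperature            # (n, n) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
tgt = rng.normal(size=(8, 16))                       # toy target-behavior embeddings
aligned = tgt + 0.01 * rng.normal(size=(8, 16))      # auxiliary view that agrees with the target
shuffled = aligned[::-1].copy()                      # same vectors, pairing broken

loss_aligned = info_nce(aligned, tgt)
loss_shuffled = info_nce(shuffled, tgt)
```

In this toy setup the loss is lower when the auxiliary view agrees with the target view row by row, which is the behavior the cross-behavior contrast is meant to reward.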
If this is right
- Auxiliary behavior graphs can be treated as noisy observations that are compressible to a target-relevant subgraph.
- Target-behavior representations become denser when contrasted against denoised auxiliary features.
- The same framework can be applied whenever one behavior type is sparse and others are abundant but partially irrelevant.
- Structural denoising and feature-level contrast operate independently and can be combined without mutual interference.
Where Pith is reading between the lines
- If the GIB compression step proves stable, the same principle could extend to other multi-view graph tasks such as multi-modal recommendation.
- The method implicitly assumes that target behavior provides a sufficient anchor for mutual-information calculations; relaxing this could require new regularization terms.
- Performance gains on sparse target behaviors suggest the framework may help cold-start users who have only auxiliary interactions recorded.
Load-bearing premise
The GIB objective can reliably select task-relevant structural patterns from auxiliary graphs even though it receives no direct labels from the target behavior.
What would settle it
An ablation that removes the GIB denoising term and measures whether recommendation accuracy on the target behavior drops or stays the same across multiple datasets.
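Such an ablation could be organized as in the sketch below. It assumes the training loss is an additive combination of a recommendation loss, a GIB term weighted by β, and a contrastive term weighted by λ; the excerpt names both weights but does not state the exact combination, so the function `total_loss`, its flags, and the numeric values are hypothetical.

```python
def total_loss(l_rec, l_gib, l_gcl, beta=1.0, lam=0.1, use_gib=True, use_gcl=True):
    """Combine loss terms; the additive form is an assumption, not the paper's stated objective."""
    loss = l_rec
    if use_gib:
        loss += beta * l_gib   # structural denoising term
    if use_gcl:
        loss += lam * l_gcl    # cross-behavior contrastive term
    return loss

# Ablation variants: drop exactly one term, keep everything else fixed.
variants = {
    "full": dict(use_gib=True, use_gcl=True),
    "w/o GIB": dict(use_gib=False, use_gcl=True),
    "w/o GCL": dict(use_gib=True, use_gcl=False),
}

# Placeholder loss values, used only to show how the variants differ.
losses = {
    name: total_loss(1.0, 0.5, 2.0, beta=2.0, lam=0.1, **flags)
    for name, flags in variants.items()
}
```

Each variant would then be trained and evaluated on the target behavior across the same datasets, so that any accuracy gap can be attributed to the removed term.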
Original abstract
With the rapid emergence of multi-behavior learning in recommender systems, leveraging auxiliary user behaviors has proven effective for mitigating target-behavior data sparsity. Yet auxiliary behavior graphs frequently contain noisy or irrelevant interactions that do not align with the target task, impeding the learning of accurate user and item embeddings. Moreover, the scarcity of direct supervision from the target behavior complicates the extraction of informative collaborative signals. In this paper, we introduce GCIB (Graph Contrastive Information Bottleneck), a novel framework that denoises auxiliary behavior information and enriches target behavior representations at both the structural and feature levels. At the structural level, GCIB employs a Graph Information Bottleneck (GIB) objective to maximize mutual information between the denoised auxiliary graph and the target-behavior graph while minimizing mutual information with the original auxiliary graph. This formulation preserves task-relevant structural patterns and suppresses spurious interactions. At the feature level, we propose a cross-behavior Graph Contrastive Learning (GCL) scheme in which denoised auxiliary features and target-behavior features serve as complementary views for both users and items. By contrasting these views, GCIB enriches sparse target-behavior representations with semantics distilled from auxiliary behaviors. Extensive experiments demonstrate that GCIB outperforms state-of-the-art baselines, highlighting its ability to learn noise-resilient and target-aware representations for multi-behavior recommendation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes GCIB, a framework for multi-behavior recommendation that applies a Graph Information Bottleneck (GIB) objective at the structural level to denoise auxiliary behavior graphs—maximizing mutual information with the target-behavior graph while minimizing it with the original auxiliary graph—and a cross-behavior Graph Contrastive Learning (GCL) scheme at the feature level to enrich sparse target representations. It claims this yields noise-resilient, target-aware embeddings and superior empirical performance over state-of-the-art baselines.
Significance. If the claimed denoising and enrichment effects hold under rigorous validation, the work would offer a principled information-theoretic approach to leveraging noisy auxiliary behaviors in recommender systems, addressing a common practical challenge in multi-behavior settings.
major comments (2)
- [Abstract / Method] Abstract and method description: the central claim that the GIB objective 'preserves task-relevant structural patterns and suppresses spurious interactions' rests on the assumption that mutual-information estimation can reliably identify relevant structure from a sparse target-behavior graph alone. No analysis is supplied of the MI estimator (variational or otherwise), its sample complexity, or behavior under low target density, leaving the denoising guarantee unverified and load-bearing for the noise-resilience claim.
- [Abstract] Abstract: the paper states that auxiliary graphs 'frequently contain noisy or irrelevant interactions' yet provides no quantitative characterization of noise levels in the datasets or ablation isolating the contribution of the GIB term versus the GCL term, making it impossible to attribute performance gains specifically to the proposed bottleneck.
minor comments (1)
- [Abstract] Notation for the GIB objective (max I(denoised_aux, target) − I(denoised_aux, orig_aux)) should be formalized with explicit definitions of the random variables and the estimator used.
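One possible formalization, under assumed notation, is sketched here. Let $G_a$ be the original auxiliary-behavior graph, $\tilde{G}_a$ the denoised auxiliary graph, and $G_t$ the target-behavior graph, each treated as a random variable. The trade-off weight $\beta$ is an assumption based on the information bottleneck loss weight listed among the hyperparameters; setting $\beta = 1$ recovers the unweighted expression quoted in the comment. The estimator of $I(\cdot\,;\cdot)$ is left unspecified, since the excerpt does not name one.

```latex
\max_{\tilde{G}_a} \; I(\tilde{G}_a; G_t) \;-\; \beta \, I(\tilde{G}_a; G_a),
\qquad \beta > 0
% equivalently, as a loss to be minimized:
\mathcal{L}_{\mathrm{GIB}} \;=\; -\, I(\tilde{G}_a; G_t) \;+\; \beta \, I(\tilde{G}_a; G_a)
```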
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below with clarifications and indicate the revisions we will make to strengthen the manuscript.
- Referee: [Abstract / Method] Abstract and method description: the central claim that the GIB objective 'preserves task-relevant structural patterns and suppresses spurious interactions' rests on the assumption that mutual-information estimation can reliably identify relevant structure from a sparse target-behavior graph alone. No analysis is supplied of the MI estimator (variational or otherwise), its sample complexity, or behavior under low target density, leaving the denoising guarantee unverified and load-bearing for the noise-resilience claim.
  Authors: We acknowledge that the manuscript does not include a dedicated analysis of the MI estimator's sample complexity or its behavior specifically under low target density. The GIB formulation follows the standard variational lower-bound approach from prior information-bottleneck literature, and the empirical gains across four datasets with varying densities provide supporting evidence. In the revision we will add: (i) an explicit description of the variational MI estimator and its implementation, (ii) an empirical study of estimator stability across target densities, and (iii) a brief discussion of practical sample-complexity considerations. A formal denoising guarantee remains outside the scope of the current work, but the added analysis will make the assumptions more transparent.
  Revision: yes
- Referee: [Abstract] Abstract: the paper states that auxiliary graphs 'frequently contain noisy or irrelevant interactions' yet provides no quantitative characterization of noise levels in the datasets or ablation isolating the contribution of the GIB term versus the GCL term, making it impossible to attribute performance gains specifically to the proposed bottleneck.
  Authors: We agree that quantitative noise characterization and component-wise ablations would strengthen attribution of the gains. The current experiments report overall performance improvements but do not isolate the GIB term or measure noise explicitly. In the revised manuscript we will include: (i) quantitative noise metrics (e.g., Jaccard overlap and KL divergence between auxiliary and target interaction sets) for each dataset, and (ii) an ablation table that removes the GIB objective or the GCL objective individually while keeping all other components fixed. These additions will allow readers to assess the individual contributions.
  Revision: yes
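The two noise metrics named in the response could be computed roughly as follows. This is a sketch under stated assumptions: interactions are (user, item) pairs, Jaccard overlap is taken over the pair sets, and KL divergence is taken between smoothed item-frequency distributions. The rebuttal names the metrics but not these choices, and the helper names and toy data are hypothetical.

```python
import math
from collections import Counter

def jaccard(aux_pairs, tgt_pairs):
    """Jaccard overlap between two sets of (user, item) interactions."""
    aux, tgt = set(aux_pairs), set(tgt_pairs)
    union = aux | tgt
    return len(aux & tgt) / len(union) if union else 0.0

def item_distribution(pairs, items, eps=1e-9):
    """Item-frequency distribution with additive smoothing (eps is an assumed value)."""
    counts = Counter(item for _, item in pairs)
    total = sum(counts.values()) + eps * len(items)
    return [(counts[i] + eps) / total for i in items]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy data: auxiliary interactions (e.g. views) and target interactions (e.g. purchases).
aux = [(0, "a"), (0, "b"), (1, "a"), (1, "c")]
tgt = [(0, "a"), (1, "c")]
items = ["a", "b", "c"]

overlap = jaccard(aux, tgt)
kl = kl_divergence(item_distribution(tgt, items), item_distribution(aux, items))
```

A low overlap or a large divergence would suggest that much of the auxiliary graph is unrelated to the target behavior, which is the quantity the referee asks to see reported per dataset.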
Circularity Check
No significant circularity detected
The provided abstract and description define GCIB via standard GIB (max I(denoised_aux, target) − I(denoised_aux, orig_aux)) and cross-behavior GCL objectives applied to multi-behavior graphs. No equations or steps are shown that reduce a claimed prediction or result to a fitted parameter or self-citation by construction; the framework is presented as an application of established MI and contrastive techniques without self-definitional or load-bearing self-referential elements. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Mutual information between graphs can be effectively optimized via the GIB objective on recommendation graphs.
Reference graph
Works this paper leans on
- [1] Cheng, Z., Han, S., Liu, F., Zhu, L., Gao, Z., and Peng, Y. Multi-behavior recommendation with cascading graph convolution networks. In Proceedings of the ACM Web Conference 2023, pp. 1181–1189, 2023.
- [2] Gu, S., Wang, X., Shi, C., and Xiao, D. Self-supervised graph neural networks for multi-behavior recommendation. In IJCAI, pp. 2052–2058.
- [3] Guo, L., Hua, L., Jia, R., Zhao, B., Wang, X., and Cui, B. Buying or browsing?: Predicting real-time purchasing intent using attention-based deep network with multiple behavior. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1984–1992.
- [4] Sun, F.-Y., Hoffmann, J., Verma, V., and Tang, J. Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv preprint arXiv:1908.01000.
- [5] Wang, M., Zheng, D., Ye, Z., Gan, Q., Li, M., Song, X., Zhou, J., Ma, C., Yu, L., Gai, Y., et al. Deep graph library: A graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315, 2019a.
  Wang, X., He, X., Cao, Y., Liu, M., and Chua, T.-S. Kgat: Knowledge graph attention network for recommendation. In Proceedings of...
- [6] Yang, Y., Wu, L., He, Z., Wu, Z., Hong, R., and Wang, M. Less is more: Information bottleneck denoised multimedia recommendation. arXiv preprint arXiv:2501.12175.
- [7] A. Related Work. Graph-based Recommendation. Collaborative filtering has long been a foundational technique in recommendation systems. Traditional approaches, such as matrix factorization (Zhao et al., 2015), aim to learn user and item latent representations from observed in...
- [8] ... applied IB to enhance graph embedding expressiveness, laying the foundation for subsequent advancements that integrate IB with contrastive learning, such as CGI (Wei et al., 2022), which uses graph augmentation to improve representation robustness by leveraging contrastive signals. To address noise from irrelevant graph connections, Graph Information Bottleneck (GIB) models extend the IB framework to structured graph data...
- [9] The number of global encoding layers LG is selected from {1, 2, 3, 4}, and the number of message-passing layers LM for both auxiliary and target behaviors is selected from {1, 2, 3, 4}. The information bottleneck loss weight β is selected from [1, 100], and the contrastive loss weight λ is selected from [0.01, 0.5]. The temperature parameter for InfoNCE l...
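The search ranges quoted in entry [9] can be written down as a small configuration sketch. The ranges for LG, LM, β, and λ come from the excerpt; the log-uniform sampling of the two continuous weights, the key names, and the random-search loop are assumptions made for illustration, and the InfoNCE temperature is omitted because its range is truncated in the excerpt.

```python
import math
import random

# Ranges taken from the excerpt; key names are illustrative.
SEARCH_SPACE = {
    "L_G": [1, 2, 3, 4],        # global encoding layers
    "L_M": [1, 2, 3, 4],        # message-passing layers (auxiliary and target)
    "beta": (1.0, 100.0),       # information bottleneck loss weight
    "lambda": (0.01, 0.5),      # contrastive loss weight
}

def sample_config(rng):
    """Draw one configuration; log-uniform sampling is an assumed choice."""
    def log_uniform(lo, hi):
        value = math.exp(rng.uniform(math.log(lo), math.log(hi)))
        return min(hi, max(lo, value))  # guard against floating-point overshoot
    return {
        "L_G": rng.choice(SEARCH_SPACE["L_G"]),
        "L_M": rng.choice(SEARCH_SPACE["L_M"]),
        "beta": log_uniform(*SEARCH_SPACE["beta"]),
        "lambda": log_uniform(*SEARCH_SPACE["lambda"]),
    }

# Five reproducible draws, one per seed.
configs = [sample_config(random.Random(seed)) for seed in range(5)]
```

Log-uniform sampling is a common choice when a weight spans two orders of magnitude, as β does here; a plain grid over the same ranges would work equally well for a small search.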