GCIB: Graph Contrastive Information Bottleneck for Multi-Behavior Recommendation

Haipeng Yang; Jianxin Zhang; Lei Zhang; Likang Wu; Sangqi Zhu; Yuanyuan Ge; Zihao Chen

arxiv: 2605.25690 · v1 · pith:R7LLTHHGnew · submitted 2026-05-25 · 💻 cs.IR

Likang Wu, Zihao Chen, Jianxin Zhang, Sangqi Zhu, Yuanyuan Ge, Haipeng Yang, Lei Zhang

Pith reviewed 2026-06-29 20:24 UTC · model grok-4.3

classification 💻 cs.IR

keywords multi-behavior recommendation · graph information bottleneck · graph contrastive learning · auxiliary behavior denoising · recommender systems · noise-resilient embeddings

The pith

GCIB applies a graph information bottleneck to remove noise from auxiliary user behaviors and uses cross-behavior contrastive learning to strengthen sparse target-behavior signals in recommender systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GCIB to address noise in auxiliary behavior graphs that hurts target-behavior recommendation performance. It uses a Graph Information Bottleneck objective to keep only structural patterns from auxiliary graphs that align with the target task while discarding irrelevant interactions. At the feature level, it treats denoised auxiliary embeddings and target embeddings as positive views in a contrastive setup to enrich the scarcer target representations. Experiments show this combination produces more accurate user and item embeddings than prior multi-behavior methods. The approach matters because real-world platforms often have abundant but noisy auxiliary signals such as clicks or views alongside sparse purchases.

Core claim

GCIB employs a Graph Information Bottleneck objective that maximizes mutual information between the compressed auxiliary graph and the target-behavior graph while minimizing mutual information with the raw auxiliary graph; it then applies cross-behavior graph contrastive learning so that denoised auxiliary features and target features serve as complementary views for users and items, thereby producing noise-resilient and target-aware embeddings.

What carries the argument

Graph Information Bottleneck (GIB) objective that compresses auxiliary graphs to retain only target-aligned structure, paired with cross-behavior Graph Contrastive Learning (GCL) that contrasts denoised auxiliary and target features as positive pairs.
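
The contrastive half of this machinery can be illustrated with a small InfoNCE loss. The paper's hyperparameter notes mention an InfoNCE temperature, but everything else here is an assumption made for illustration: the function name `info_nce`, the use of cosine similarity, in-batch negatives, and the default temperature of 0.2. It is a minimal NumPy sketch, not the authors' implementation.

```python
import numpy as np


def info_nce(aux: np.ndarray, tgt: np.ndarray, temperature: float = 0.2) -> float:
    """InfoNCE where row i of `aux` and row i of `tgt` form the positive pair.

    `aux` stands in for denoised auxiliary embeddings and `tgt` for
    target-behavior embeddings of the same users (or items). Every other
    row of `aux` is used as an in-batch negative (an assumption).
    """
    # L2-normalise so the dot product below is a cosine similarity.
    aux = aux / np.linalg.norm(aux, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)

    logits = tgt @ aux.T / temperature  # shape (n, n)

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))
```

The loss is small when each target embedding is closest to its own auxiliary counterpart and grows when the pairing is scrambled, which is the alignment pressure the cross-behavior contrast is meant to provide.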

If this is right

Auxiliary behavior graphs can be treated as noisy observations that are compressible to a target-relevant subgraph.
Target-behavior representations become denser when contrasted against denoised auxiliary features.
The same framework can be applied whenever one behavior type is sparse and others are abundant but partially irrelevant.
Structural denoising and feature-level contrast operate independently and can be combined without mutual interference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the GIB compression step proves stable, the same principle could extend to other multi-view graph tasks such as multi-modal recommendation.
The method implicitly assumes that target behavior provides a sufficient anchor for mutual-information calculations; relaxing this could require new regularization terms.
Performance gains on sparse target behaviors suggest the framework may help cold-start users who have only auxiliary interactions recorded.

Load-bearing premise

The GIB objective can reliably select task-relevant structural patterns from auxiliary graphs even though it receives no direct labels from the target behavior.

What would settle it

An ablation that removes the GIB denoising term and measures whether recommendation accuracy on the target behavior drops or stays the same across multiple datasets.
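
Such an ablation amounts to switching off one loss term at a time. The sketch below assumes the training objective is a weighted sum of a recommendation loss, an information-bottleneck loss weighted by β, and a contrastive loss weighted by λ. The paper names β and λ as loss weights, but the additive form and the names `LossWeights`, `total_loss`, and `ablation_variants` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    beta: float  # weight on the information-bottleneck (GIB) term
    lam: float   # weight on the contrastive (GCL) term


def total_loss(rec: float, ib: float, cl: float, w: LossWeights) -> float:
    """Assumed objective: L_rec + beta * L_IB + lambda * L_CL."""
    return rec + w.beta * ib + w.lam * cl


def ablation_variants(full: LossWeights) -> dict[str, LossWeights]:
    """Return the full model plus one variant per removed loss term."""
    return {
        "full": full,
        "w/o GIB": LossWeights(beta=0.0, lam=full.lam),
        "w/o GCL": LossWeights(beta=full.beta, lam=0.0),
    }
```

Zeroing β only removes the loss term. A faithful "w/o GIB" run would presumably also feed the raw auxiliary graph to the encoder instead of the denoised one, which this sketch does not model.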

Figures

Figures reproduced from arXiv: 2605.25690 by Haipeng Yang, Jianxin Zhang, Lei Zhang, Likang Wu, Sangqi Zhu, Yuanyuan Ge, Zihao Chen.

**Figure 1.** Performance comparison of LightGCN under different graph usage settings on the Tmall dataset.

**Figure 2.** Difference between traditional methods and GCIB in information utilization.

**Figure 3.** The overall structure of the presented GCIB model.

**Figure 4.** Analysis of global encoding layers LG and auxiliary/target behavior layers LM across multiple datasets.

**Figure 5.** Analysis of loss weights β and λ. (a)-(b) Tmall; (c)-(d) Taobao; (e)-(f) Yelp. Information bottleneck loss weight β: as shown in Figures 5a, 5c, and 5e, we examine the impact of β on Tmall, Taobao, and Yelp. The performance of GCIB first increases as β grows, then decreases when β becomes too large. An appropriate value of β helps the model extract more effective denoised signals from auxiliary behavior …

**Figure 6.** Visualization of user embedding distributions on Tmall: (a) initial state vs. (b) after GCIB training.

Original abstract

With the rapid emergence of multi-behavior learning in recommender systems, leveraging auxiliary user behaviors has proven effective for mitigating target-behavior data sparsity. Yet auxiliary behavior graphs frequently contain noisy or irrelevant interactions that do not align with the target task, impeding the learning of accurate user and item embeddings. Moreover, the scarcity of direct supervision from the target behavior complicates the extraction of informative collaborative signals. In this paper, we introduce GCIB (Graph Contrastive Information Bottleneck), a novel framework that denoises auxiliary behavior information and enriches target behavior representations at both the structural and feature levels. At the structural level, GCIB employs a Graph Information Bottleneck (GIB) objective to maximize mutual information between the denoised auxiliary graph and the target-behavior graph while minimizing mutual information with the original auxiliary graph. This formulation preserves task-relevant structural patterns and suppresses spurious interactions. At the feature level, we propose a cross-behavior Graph Contrastive Learning (GCL) scheme in which denoised auxiliary features and target-behavior features serve as complementary views for both users and items. By contrasting these views, GCIB enriches sparse target-behavior representations with semantics distilled from auxiliary behaviors. Extensive experiments demonstrate that GCIB outperforms state-of-the-art baselines, highlighting its ability to learn noise-resilient and target-aware representations for multi-behavior recommendation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

GCIB combines GIB denoising anchored on the target graph with cross-behavior contrastive enrichment, but the sparse target signal creates a real risk that the bottleneck won't reliably separate noise from signal.

The paper takes the standard graph information bottleneck idea and uses it to clean auxiliary behavior graphs in recommendation: it maximizes mutual information with the target graph while minimizing it with the raw auxiliary graph. It then adds a cross-behavior contrastive learning step to pull extra features into the sparse target side.

What the work does reasonably well is name the practical problem of noisy auxiliary interactions that don't align with the target task and sketch a two-level fix: structural denoising via GIB plus feature-level view contrast. The formulation is straightforward and the abstract claims it leads to noise-resilient representations, which matches a known pain point in multi-behavior settings.

The soft spot is exactly the one in the stress-test note. The target graph is acknowledged to be sparse, so the MI term that is supposed to select relevant structure has very few reliable anchor points. A variational or other estimator could easily either fail to suppress spurious edges or collapse the representation when it overfits the few observed target interactions. The abstract gives no implementation details on the estimator, no ablation isolating the GIB term, and no sensitivity checks on target sparsity, so the central claim rests on an untested assumption about signal strength.

This is for people working on graph-based multi-behavior recommenders who already follow GIB and GCL papers. A reader in that niche might pick up the specific combination and the experimental claims, but the paper does not look like it would move the broader field.

I would send it to peer review. The problem is real and the approach is coherent enough that referees can check whether the MI estimation actually works on the data they care about.

Referee Report

2 major / 1 minor

Summary. The paper proposes GCIB, a framework for multi-behavior recommendation that applies a Graph Information Bottleneck (GIB) objective at the structural level to denoise auxiliary behavior graphs—maximizing mutual information with the target-behavior graph while minimizing it with the original auxiliary graph—and a cross-behavior Graph Contrastive Learning (GCL) scheme at the feature level to enrich sparse target representations. It claims this yields noise-resilient, target-aware embeddings and superior empirical performance over state-of-the-art baselines.

Significance. If the claimed denoising and enrichment effects hold under rigorous validation, the work would offer a principled information-theoretic approach to leveraging noisy auxiliary behaviors in recommender systems, addressing a common practical challenge in multi-behavior settings.

major comments (2)

[Abstract / Method] Abstract and method description: the central claim that the GIB objective 'preserves task-relevant structural patterns and suppresses spurious interactions' rests on the assumption that mutual-information estimation can reliably identify relevant structure from a sparse target-behavior graph alone. No analysis is supplied of the MI estimator (variational or otherwise), its sample complexity, or behavior under low target density, leaving the denoising guarantee unverified and load-bearing for the noise-resilience claim.
[Abstract] Abstract: the paper states that auxiliary graphs 'frequently contain noisy or irrelevant interactions' yet provides no quantitative characterization of noise levels in the datasets or ablation isolating the contribution of the GIB term versus the GCL term, making it impossible to attribute performance gains specifically to the proposed bottleneck.

minor comments (1)

[Abstract] Notation for the GIB objective (max I(denoised_aux, target) − I(denoised_aux, orig_aux)) should be formalized with explicit definitions of the random variables and the estimator used.
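
One way the requested formalization might look is given below. The symbols are assumptions, not the paper's notation: $G_a$ is the original auxiliary graph, $\tilde{G}_a$ its denoised version, $G_t$ the target-behavior graph, and $\gamma > 0$ a hypothetical trade-off coefficient. Setting $\gamma = 1$ recovers the expression as written in the comment above. Whether the paper's loss weight β plays exactly this role is not stated in the abstract.

```latex
\begin{aligned}
\tilde{G}_a^{\star}
  &= \arg\max_{\tilde{G}_a}\;
     I\bigl(\tilde{G}_a ; G_t\bigr) \;-\; \gamma\, I\bigl(\tilde{G}_a ; G_a\bigr), \\
\mathcal{L}_{\mathrm{IB}}
  &= -\,I\bigl(\tilde{G}_a ; G_t\bigr) \;+\; \gamma\, I\bigl(\tilde{G}_a ; G_a\bigr).
\end{aligned}
```

The first term rewards structure in $\tilde{G}_a$ that is predictive of the target graph, and the second penalizes information retained from the raw auxiliary graph. Both mutual-information terms are intractable in general and would need an estimator or bound, which is the detail the referee asks the authors to specify.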

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below with clarifications and indicate the revisions we will make to strengthen the manuscript.

Referee: [Abstract / Method] Abstract and method description: the central claim that the GIB objective 'preserves task-relevant structural patterns and suppresses spurious interactions' rests on the assumption that mutual-information estimation can reliably identify relevant structure from a sparse target-behavior graph alone. No analysis is supplied of the MI estimator (variational or otherwise), its sample complexity, or behavior under low target density, leaving the denoising guarantee unverified and load-bearing for the noise-resilience claim.

Authors: We acknowledge that the manuscript does not include a dedicated analysis of the MI estimator's sample complexity or its behavior specifically under low target density. The GIB formulation follows the standard variational lower-bound approach from prior information-bottleneck literature, and the empirical gains across four datasets with varying densities provide supporting evidence. In the revision we will add: (i) an explicit description of the variational MI estimator and its implementation, (ii) an empirical study of estimator stability across target densities, and (iii) a brief discussion of practical sample-complexity considerations. A formal denoising guarantee remains outside the scope of the current work, but the added analysis will make the assumptions more transparent. revision: yes
Referee: [Abstract] Abstract: the paper states that auxiliary graphs 'frequently contain noisy or irrelevant interactions' yet provides no quantitative characterization of noise levels in the datasets or ablation isolating the contribution of the GIB term versus the GCL term, making it impossible to attribute performance gains specifically to the proposed bottleneck.

Authors: We agree that quantitative noise characterization and component-wise ablations would strengthen attribution of the gains. The current experiments report overall performance improvements but do not isolate the GIB term or measure noise explicitly. In the revised manuscript we will include: (i) quantitative noise metrics (e.g., Jaccard overlap and KL divergence between auxiliary and target interaction sets) for each dataset, and (ii) an ablation table that removes the GIB objective or the GCL objective individually while keeping all other components fixed. These additions will allow readers to assess the individual contributions. revision: yes
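
The two noise metrics named in this response could be computed roughly as follows. The response does not define them, so this sketch makes its own assumptions: interactions are `(user_id, item_id)` pairs, Jaccard overlap is taken over those pairs, and the KL divergence compares item-popularity distributions with a small additive smoothing constant. The function names are hypothetical.

```python
import math
from collections import Counter

Interaction = tuple[int, int]  # (user_id, item_id)


def jaccard_overlap(aux: set[Interaction], tgt: set[Interaction]) -> float:
    """Share of distinct interactions that appear in both behaviors."""
    union = aux | tgt
    return len(aux & tgt) / len(union) if union else 0.0


def item_kl_divergence(
    tgt: set[Interaction], aux: set[Interaction], eps: float = 1e-9
) -> float:
    """KL(P_tgt || P_aux) over smoothed item-popularity distributions."""
    items = sorted({i for _, i in tgt} | {i for _, i in aux})
    if not items:
        return 0.0
    tgt_counts = Counter(i for _, i in tgt)
    aux_counts = Counter(i for _, i in aux)
    # Additive smoothing keeps both distributions strictly positive.
    tgt_total = sum(tgt_counts.values()) + eps * len(items)
    aux_total = sum(aux_counts.values()) + eps * len(items)
    kl = 0.0
    for item in items:
        p = (tgt_counts[item] + eps) / tgt_total
        q = (aux_counts[item] + eps) / aux_total
        kl += p * math.log(p / q)
    return kl
```

A low Jaccard overlap or a high divergence would suggest that the auxiliary behavior carries many interactions unrelated to the target, though neither number by itself separates noise from legitimately different intent.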

Circularity Check

0 steps flagged

No significant circularity detected

The provided abstract and description define GCIB via standard GIB (max I(denoised_aux, target) − I(denoised_aux, orig_aux)) and cross-behavior GCL objectives applied to multi-behavior graphs. No equations or steps are shown that reduce a claimed prediction or result to a fitted parameter or self-citation by construction; the framework is presented as an application of established MI and contrastive techniques without self-definitional or load-bearing self-referential elements. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; ledger entries are therefore minimal and provisional.

axioms (1)

domain assumption: Mutual information between graphs can be effectively optimized via the GIB objective on recommendation graphs.
Standard assumption in information bottleneck literature applied here to auxiliary behavior graphs.

pith-pipeline@v0.9.1-grok · 5782 in / 1157 out tokens · 30532 ms · 2026-06-29T20:24:35.572043+00:00 · methodology

Reference graph

Works this paper leans on

9 extracted references · 1 linked inside Pith

[1] Cheng, Z., Han, S., Liu, F., Zhu, L., Gao, Z., and Peng, Y. Multi-behavior recommendation with cascading graph convolution networks. In Proceedings of the ACM Web Conference 2023, pp. 1181–1189, 2023.

[2] Gu, S., Wang, X., Shi, C., and Xiao, D. Self-supervised graph neural networks for multi-behavior recommendation. In IJCAI, pp. 2052–2058,

[3] Guo, L., Hua, L., Jia, R., Zhao, B., Wang, X., and Cui, B. Buying or browsing?: Predicting real-time purchasing intent using attention-based deep network with multiple behavior. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp. 1984–1992,

[4] Sun, F.-Y., Hoffmann, J., Verma, V., and Tang, J. Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv preprint arXiv:1908.01000,

[5] Wang, M., Zheng, D., Ye, Z., Gan, Q., Li, M., Song, X., Zhou, J., Ma, C., Yu, L., Gai, Y., et al. Deep graph library: A graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315, 2019a. Wang, X., He, X., Cao, Y., Liu, M., and Chua, T.-S. Kgat: Knowledge graph attention network for recommendation. In Proceedings of...

[6] Yang, Y., Wu, L., He, Z., Wu, Z., Hong, R., and Wang, M. Less is more: Information bottleneck denoised multimedia recommendation. arXiv preprint arXiv:2501.12175,

[7] A. Related Work. Graph-based Recommendation. Collaborative filtering has long been a foundational technique in recommendation systems. Traditional approaches, such as matrix factorization (Zhao et al., 2015), aim to learn user and item latent representations from observed in...

[8] applied IB to enhance graph embedding expressiveness, laying the foundation for subsequent advancements that integrate IB with contrastive learning—such as CGI (Wei et al., 2022), which uses graph augmentation to improve representation robustness by leveraging contrastive signals. To address noise from irrelevant graph connections, Graph Information Bottl...

[9] The number of global encoding layers LG is selected from {1, 2, 3, 4}, and the number of message-passing layers LM for both auxiliary and target behaviors is selected from {1, 2, 3, 4}. The information bottleneck loss weight β is selected from [1, 100], and the contrastive loss weight λ is selected from [0.01, 0.5]. The temperature parameter for InfoNCE l...
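
The search ranges quoted in entry [9] can be turned into an explicit grid. In the sketch below the layer counts come from the text, while the specific grid points chosen inside [1, 100] for β and [0.01, 0.5] for λ are assumptions, as are the names `SEARCH_SPACE` and `iter_configs`. The excerpt does not say how the authors sampled those intervals.

```python
from itertools import product
from typing import Iterator

SEARCH_SPACE: dict[str, list[float]] = {
    "L_G": [1, 2, 3, 4],       # global encoding layers, from the text
    "L_M": [1, 2, 3, 4],       # message-passing layers, from the text
    "beta": [1, 10, 100],      # assumed grid points inside [1, 100]
    "lam": [0.01, 0.1, 0.5],   # assumed grid points inside [0.01, 0.5]
}


def iter_configs(space: dict[str, list[float]]) -> Iterator[dict[str, float]]:
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))
```

With these assumed grid points the space contains 4 × 4 × 3 × 3 = 144 configurations, small enough for an exhaustive sweep per dataset.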
