Beyond Isolated Clients: Integrating Graph-Based Embeddings into Event Sequence Models
Pith reviewed 2026-05-10 17:11 UTC · model grok-4.3
The pith
Integrating graph-based embeddings into event sequence models improves accuracy by up to 2.3% AUC.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that adding structural information from the user-item interaction graph to contrastive self-supervised event sequence models consistently raises accuracy, with observed gains reaching 2.3% AUC, and that graph density determines which of the three integration strategies works best.
What carries the argument
Three model-agnostic integration strategies—enriching event embeddings, aligning client representations with graph embeddings, and adding a structural pretext task—that inject global graph information into temporal contrastive learning.
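The three routes can be pictured as small, model-agnostic operations wrapped around an existing sequence encoder. The following is a minimal numpy sketch, not the paper's implementation; all function names, the concatenation choice, and the margin value are illustrative assumptions:

```python
import numpy as np

def enrich_event_embeddings(event_emb, item_graph_emb):
    # Strategy 1 (sketch): concatenate each event's embedding with the
    # graph embedding of the item the event touches.
    return np.concatenate([event_emb, item_graph_emb], axis=-1)

def alignment_loss(client_repr, client_graph_emb):
    # Strategy 2 (sketch): pull the sequence model's client representation
    # toward the client's graph node embedding (mean squared distance).
    return float(np.mean((client_repr - client_graph_emb) ** 2))

def structural_pretext_loss(client_repr, pos_pairs, neg_pairs):
    # Strategy 3 (sketch): a link-prediction-style pretext task --
    # graph-adjacent clients should score higher (dot product) than
    # non-adjacent ones, enforced with a margin loss.
    def score(pairs):
        return np.array([client_repr[i] @ client_repr[j] for i, j in pairs])
    margin = 1.0  # illustrative value
    return float(np.mean(np.maximum(0.0, margin - score(pos_pairs) + score(neg_pairs))))
```

Each piece leaves the sequence encoder untouched: the first changes its inputs, the other two add loss terms, which is what makes the strategies model-agnostic.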
If this is right
- Fraud prevention and recommendation systems can achieve higher accuracy by incorporating global interaction structure.
- The choice of integration method should be guided by measured graph density rather than applied uniformly.
- Existing sequence models can be upgraded without internal changes by using one of the three external integration routes.
- Performance benefits appear across both financial and e-commerce event datasets.
Where Pith is reading between the lines
- The same integration patterns could be tested on sequential data in other domains that also have sparse-to-dense interaction graphs, such as social media or medical event logs.
- If graph density is confirmed as the dominant selector, practitioners could develop simple density-based rules to pick the integration strategy before training.
- The work leaves open whether dynamic or time-evolving graphs would require additional handling beyond the static embeddings used here.
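If density really is the dominant selector, the pre-training rule could be as simple as the sketch below. The density formula for a bipartite user-item graph is standard; the threshold and the density-to-strategy mapping are hypothetical, not taken from the paper:

```python
def graph_density(num_edges, num_users, num_items):
    # Density of a bipartite user-item graph: fraction of all possible
    # user-item edges that are actually present.
    return num_edges / (num_users * num_items)

def pick_strategy(density, threshold=1e-3):
    # Hypothetical rule: dense graphs -> embedding enrichment,
    # sparse graphs -> structural pretext task. Both the threshold and
    # the mapping are illustrative placeholders.
    return "enrich_embeddings" if density > threshold else "structural_pretext"
```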
Load-bearing premise
The observed accuracy gains come from the added graph structure rather than from extra model capacity, hyperparameter tuning, or dataset-specific effects.
What would settle it
An ablation experiment that adds the same number of extra parameters and training steps but without any graph information and checks whether the AUC improvements disappear.
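Such a capacity-matched control could look like the following sketch: the integration head and its parameters are identical in both arms, and only the side input changes from graph embeddings to shape-matched noise. All shapes and names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def integrate(client_repr, side_input, W):
    # Shared integration head: project the side input and add it to the
    # client representation; the parameters W are identical in both arms.
    return client_repr + side_input @ W

clients   = rng.normal(size=(16, 64))
graph_emb = rng.normal(size=(16, 32))   # embeddings from the interaction graph
noise_emb = rng.normal(size=(16, 32))   # graph-free control with the same shape
W = rng.normal(size=(32, 64))

with_graph = integrate(clients, graph_emb, W)   # treatment arm
control    = integrate(clients, noise_emb, W)   # capacity-matched control arm
```

If the AUC gap between the two arms closes, the gain came from capacity rather than structure; if it persists, the graph information is doing real work.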
Original abstract
Large-scale digital platforms generate billions of timestamped user-item interactions (events) that are crucial for predicting user attributes in, e.g., fraud prevention and recommendations. While self-supervised learning (SSL) effectively models the temporal order of events, it typically overlooks the global structure of the user-item interaction graph. To bridge this gap, we propose three model-agnostic strategies for integrating this structural information into contrastive SSL: enriching event embeddings, aligning client representations with graph embeddings, and adding a structural pretext task. Experiments on four financial and e-commerce datasets demonstrate that our approach consistently improves the accuracy (up to a 2.3% AUC) and reveals that graph density is a key factor in selecting the optimal integration strategy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes three model-agnostic strategies for integrating graph-based structural information from user-item interaction graphs into contrastive self-supervised learning models for event sequences: (1) enriching event embeddings, (2) aligning client representations with graph embeddings, and (3) adding a structural pretext task. Experiments across four financial and e-commerce datasets show consistent accuracy gains (up to 2.3% AUC) and indicate that graph density influences the choice of optimal integration strategy.
Significance. If the attribution of gains to graph structure holds, the work usefully connects temporal SSL sequence modeling with global graph structure for large-scale interaction data, with potential impact on fraud detection and recommendations. The model-agnostic framing and multi-dataset evaluation are positive features that support broader applicability.
major comments (3)
- [§4 (Experiments)] The central claim of AUC improvements (up to 2.3%) due to the three graph-integration strategies lacks capacity-matched baselines that add equivalent parameters or loss terms without graph structure, as well as random-graph ablations. Without these, it is not possible to isolate the contribution of structural information from incidental increases in model capacity or optimization objectives.
- [§5.3 (Discussion on graph density)] The assertion that graph density is the dominant factor for selecting the integration strategy is not supported by controls for confounding variables such as event sparsity or label distribution. These factors could explain performance differences across datasets independently of density.
- [Results tables (e.g., Table 1 or equivalent)] No error bars, standard deviations, or statistical significance tests (such as paired t-tests across runs) are reported for the AUC gains. This weakens the claim of consistent improvement across the four datasets.
minor comments (1)
- [Abstract] The abstract would be clearer if it named the four datasets and the specific baseline models used for the reported AUC comparisons.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects for strengthening the empirical claims in our work on integrating graph-based embeddings into event sequence SSL models. We agree that additional controls and statistical reporting will improve the manuscript and will revise accordingly. We address each major comment below.
Point-by-point responses
- Referee: [§4 (Experiments)] The central claim of AUC improvements (up to 2.3%) due to the three graph-integration strategies lacks capacity-matched baselines that add equivalent parameters or loss terms without graph structure, as well as random-graph ablations. Without these, it is not possible to isolate the contribution of structural information from incidental increases in model capacity or optimization objectives.
Authors: We acknowledge that isolating the specific contribution of graph structure requires controls beyond the current baselines. In the revised manuscript, we will introduce capacity-matched baselines that add equivalent parameters or auxiliary loss terms without incorporating any graph information. We will also add random-graph ablations, where the user-item interaction graph is replaced with a randomized version preserving degree distribution, to demonstrate that gains arise from meaningful structural signals rather than added model capacity or objectives. revision: yes
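The randomization the authors describe can be sketched as follows. Permuting the item column of a bipartite edge list preserves both degree sequences exactly, which is one simple way to realize a degree-preserving random graph (the function name and edge-list format are illustrative):

```python
import random
from collections import Counter

def degree_preserving_shuffle(edges, seed=0):
    # Randomize a bipartite user-item edge list while preserving both
    # degree sequences: permuting the item column leaves every user's and
    # every item's edge count unchanged. Parallel edges may appear and
    # would need an extra rewiring pass to remove.
    rng = random.Random(seed)
    users = [u for u, _ in edges]
    items = [i for _, i in edges]
    rng.shuffle(items)
    return list(zip(users, items))
```

Training on the shuffled graph keeps every marginal statistic the integration strategies see, while destroying the actual user-item co-occurrence structure.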
- Referee: [§5.3] The assertion that graph density is the dominant factor for selecting the integration strategy is not supported by controls for confounding variables such as event sparsity or label distribution. These factors could explain performance differences across datasets independently of density.
Authors: We agree that event sparsity and label distribution are potential confounders that could influence strategy selection independently of graph density. In the revision, we will add controlled analyses, including dataset subsampling to match sparsity levels across datasets and explicit discussion of label distribution statistics, to better isolate graph density as the key factor. revision: yes
- Referee: [Results tables (e.g., Table 1 or equivalent)] No error bars, standard deviations, or statistical significance tests (such as paired t-tests across runs) are reported for the AUC gains. This weakens the claim of consistent improvement across the four datasets.
Authors: We recognize the need for statistical rigor in reporting. We will rerun all experiments with multiple random seeds (at least 5), report mean AUC values with standard deviations in the updated tables, and include paired t-tests or similar significance tests to substantiate the consistency of the observed improvements. revision: yes
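The proposed reporting reduces to computing a paired t statistic over per-seed AUC differences. A minimal stdlib sketch, with made-up AUC numbers that are purely illustrative (not results from the paper):

```python
import math
from statistics import mean, stdev

def paired_t(baseline, treatment):
    # Paired t statistic over per-seed AUC differences (df = n - 1).
    diffs = [t - b for b, t in zip(baseline, treatment)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Hypothetical per-seed AUCs across 5 random seeds.
base_auc  = [0.801, 0.795, 0.803, 0.799, 0.800]
graph_auc = [0.818, 0.812, 0.824, 0.815, 0.819]

t_stat, df = paired_t(base_auc, graph_auc)
# The two-sided critical value for df = 4 at alpha = 0.05 is 2.776;
# |t| above it supports a consistent per-seed improvement.
```

Tables would then report mean ± standard deviation per dataset alongside the test outcome.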
Circularity Check
No circularity; experimental claims rest on independent dataset comparisons
Full rationale
The paper proposes three integration strategies (embedding enrichment, representation alignment, structural pretext) for graph embeddings in event sequence models and reports AUC gains on four external financial/e-commerce datasets. No derivation chain, equations, or first-principles results appear in the provided text. No self-citations are used to justify uniqueness theorems, ansatzes, or load-bearing premises. Performance attribution is presented as empirical outcome rather than a fitted parameter renamed as prediction or a self-definitional equivalence. The work is self-contained against external benchmarks with no reduction of claims to inputs by construction.