arxiv: 2604.02899 · v1 · submitted 2026-04-03 · 💻 cs.LG

Recognition: no theorem link

Extracting Money Laundering Transactions from Quasi-Temporal Graph Representation

Haseeb Tariq , Marwan Hassani

Authors on Pith no claims yet

Pith reviewed 2026-05-13 20:49 UTC · model grok-4.3

classification 💻 cs.LG

keywords money laundering detectionanti-money launderinggraph representationsupervised learningfinancial transactionsF1 scorequasi-temporal graphssuspicious transaction detection

0 comments

The pith

A quasi-temporal graph framework detects money laundering transactions more accurately than existing AML models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ExSTraQt, a supervised learning approach that turns financial transaction records into quasi-temporal graphs to flag suspicious money laundering activity. This representation lets a simple classifier achieve higher F1 scores than current anti-money laundering systems while using fewer parameters and less memory. A reader would care because banks handle billions of transactions each day and waste large resources on false alerts from rule-based tools. The method reports consistent gains on both a real dataset and multiple synthetic ones, and the authors state it can run alongside existing bank systems.

Core claim

ExSTraQt extracts suspicious transactions from a quasi-temporal graph representation of financial data using supervised learning and outperforms state-of-the-art AML detection models in transaction-level accuracy, with F1 score uplifts of up to 1% on a real dataset and more than 8% on one synthetic dataset, while requiring minimal parameters and computing resources.

What carries the argument

Quasi-temporal graph representation of transactions that feeds a supervised classifier to label suspicious activity.

Load-bearing premise

The quasi-temporal graph representation must sufficiently encode the evolving patterns used by criminal organizations so that a supervised model trained on the given datasets generalizes to unseen real-world transaction streams.

What would settle it

Running the framework on a fresh real-world transaction dataset and finding no F1 improvement or a drop relative to baseline AML models would challenge the performance claim.

Figures

Figures reproduced from arXiv: 2604.02899 by Haseeb Tariq, Marwan Hassani.

**Figure 1.** Figure 1: Types of ML flow accounts. Blue node represents the node that is being defined. 3.2 Challenges Transaction labels are highly imbalanced. Usually, legitimate transactions outnumber illicit ones, which appear in ratios as low as 0.05% (see [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: We want to stress here that, while having 9 stages, the [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 2.** Figure 2: Overview of the framework [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Example of dispense flow calculation. from the aggregated graph [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 5.** Figure 5: Execution times for the distributed graph features [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Execution times for the flow-based features gener [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: Execution times for the temporal flow-based fea [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

**Figure 8.** Figure 8: Execution times for different steps, with increasing [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗

read the original abstract

Money laundering presents a persistent challenge for financial institutions worldwide, while criminal organizations constantly evolve their tactics to bypass detection systems. Traditional anti-money laundering approaches mainly rely on predefined risk-based rules, leading to resource-intensive investigations and high numbers of false positive alerts. In order to restrict operational costs from exploding, while billions of transactions are being processed every day, financial institutions are investing in more sophisticated mechanisms to improve existing systems. In this paper, we present ExSTraQt (EXtract Suspicious TRAnsactions from Quasi-Temporal graph representation), an advanced supervised learning approach to detect money laundering (or suspicious) transactions in financial datasets. Our proposed framework excels in performance, when compared to the state-of-the-art AML (Anti Money Laundering) detection models. The key strengths of our framework are sheer simplicity, in terms of design and number of parameters; and scalability, in terms of the computing and memory requirements. We evaluated our framework on transaction-level detection accuracy using a real dataset; and a set of synthetic financial transaction datasets. We consistently achieve an uplift in the F1 score for most datasets, up to 1% for the real dataset; and more than 8% for one of the synthetic datasets. We also claim that our framework could seamlessly complement existing AML detection systems in banks. Our code and datasets are available at https://github.com/mhaseebtariq/exstraqt.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ExSTraQt is a lightweight supervised classifier on quasi-temporal graph features for AML that reports small F1 gains and ships code, but the abstract leaves the key modeling choices and ablations opaque.

read the letter

The paper introduces ExSTraQt, a supervised model that converts transaction streams into quasi-temporal graphs and classifies individual transactions as suspicious. It claims consistent F1 improvements over prior AML detectors, reaching 1% on a real dataset and more than 8% on one synthetic set, while stressing low parameter count and modest compute needs. The GitHub release of code and datasets is a clear plus for anyone who wants to reproduce or extend the work. That openness makes the practical angle credible: banks could slot this in without heavy infrastructure changes. The emphasis on simplicity over complex graph neural nets also fits the domain, where interpretability and speed matter more than marginal accuracy jumps. The main weakness is that the abstract supplies almost no information on how the quasi-temporal edges are defined, which node or edge features are fed to the classifier, or what the exact baselines and data splits look like. Without those details or an ablation that isolates the temporal component, the reported uplifts could stem from standard feature engineering rather than the graph representation itself. The 1% real-data gain is small enough that statistical significance and variance across runs would need checking. This is incremental applied work rather than a conceptual advance, but the open artifacts and focus on real constraints make it worth a referee's time. Practitioners in financial fraud detection would get the most from it; a serious editor should send it out so reviewers can verify the implementation directly.

Referee Report

3 major / 2 minor

Summary. The paper proposes ExSTraQt, a supervised learning framework that extracts suspicious money-laundering transactions from a quasi-temporal graph representation of financial transaction data. It claims superior F1-score performance over state-of-the-art AML models (up to 1% uplift on a real dataset and >8% on one synthetic dataset), while highlighting the framework's simplicity (few parameters) and scalability.

Significance. If the performance gains are reproducible and attributable to the quasi-temporal encoding, the work could provide a lightweight, practical complement to existing rule-based AML systems in high-volume banking environments. The public release of code and datasets supports reproducibility, which is a strength.

major comments (3)

[Abstract] Abstract: the stated F1 uplifts are presented without any description of model architecture, feature construction from the quasi-temporal graph, baseline implementations, statistical testing, or data-split procedures. These omissions make it impossible to verify whether the reported improvements support the central claim of consistent outperformance.
[Framework and Evaluation] Framework and Evaluation sections: no ablation experiments are reported that isolate the contribution of the quasi-temporal edges or time-derived features versus a static graph or non-graph feature set. Without such controls, it remains unclear whether the quasi-temporal representation is the load-bearing element driving the observed F1 gains or whether the results could be reproduced with simpler feature sets.
[Evaluation] Evaluation section: the supervised model is described as having a low parameter count, yet no concrete architecture, training procedure, or hyper-parameter details are supplied. This prevents assessment of whether the claimed simplicity is genuine or whether the performance numbers depend on undisclosed implementation choices.

minor comments (2)

[Abstract] The abstract asserts 'sheer simplicity' and 'scalability' but supplies no quantitative runtime, memory, or parameter-count figures in the results; adding these metrics would strengthen the practical-utility claim.
[Methodology] Notation for the quasi-temporal graph construction (e.g., how temporal edges are defined and weighted) is introduced without a formal definition or pseudocode; a small diagram or equation would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that additional methodological details and controls would strengthen the paper and will incorporate them in the revision. We respond to each major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the stated F1 uplifts are presented without any description of model architecture, feature construction from the quasi-temporal graph, baseline implementations, statistical testing, or data-split procedures. These omissions make it impossible to verify whether the reported improvements support the central claim of consistent outperformance.

Authors: We acknowledge the abstract's brevity due to length limits. The full manuscript describes the supervised classifier (gradient boosting on quasi-temporal features), feature extraction (node/edge attributes plus temporal encodings), baselines (standard AML graph and rule-based models), 5-fold cross-validation for splits, and paired t-tests for significance. In revision we will expand the abstract to include a one-sentence summary of these elements while preserving the performance claims. revision: yes
Referee: [Framework and Evaluation] Framework and Evaluation sections: no ablation experiments are reported that isolate the contribution of the quasi-temporal edges or time-derived features versus a static graph or non-graph feature set. Without such controls, it remains unclear whether the quasi-temporal representation is the load-bearing element driving the observed F1 gains or whether the results could be reproduced with simpler feature sets.

Authors: The current experiments compare ExSTraQt against published SOTA AML models that employ static graphs and non-graph features, showing consistent gains. We agree that explicit ablations would isolate the quasi-temporal contribution more clearly. We will add these experiments (static vs. quasi-temporal edges, with/without time features) to the revised Evaluation section. revision: yes
Referee: [Evaluation] Evaluation section: the supervised model is described as having a low parameter count, yet no concrete architecture, training procedure, or hyper-parameter details are supplied. This prevents assessment of whether the claimed simplicity is genuine or whether the performance numbers depend on undisclosed implementation choices.

Authors: The model is an XGBoost classifier with default hyperparameters (max_depth=6, n_estimators=100, learning_rate=0.1) trained via standard 5-fold CV on the extracted features; parameter count is under 10k weights. The GitHub repository contains the exact training script. In revision we will add an explicit subsection listing architecture, training procedure, and all hyperparameters to the Evaluation section. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical performance on held-out data

full rationale

The paper presents ExSTraQt as a supervised classifier on quasi-temporal graph features, with all reported results consisting of F1-score uplifts measured on separate real and synthetic transaction datasets. No equations, parameter-fitting steps, or derivations are described that would reduce the claimed performance gains to quantities defined by or fitted to the evaluation data itself. The evaluation protocol is presented as standard held-out testing against baselines, with no self-citation load-bearing the central claim and no renaming of known results as novel derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the unproven effectiveness of the quasi-temporal representation for capturing laundering tactics and on standard supervised-learning generalization assumptions; no explicit free parameters or invented physical entities are described.

axioms (1)

domain assumption Labeled transaction data exists and is representative enough for supervised models to learn generalizable patterns of money laundering.
Implicit in any supervised detection claim on real and synthetic financial datasets.

invented entities (1)

Quasi-temporal graph representation no independent evidence
purpose: To encode financial transaction sequences in a graph form that preserves temporal ordering while remaining computationally lightweight.
Introduced as the core modeling choice of the ExSTraQt framework.

pith-pipeline@v0.9.0 · 5547 in / 1271 out tokens · 40416 ms · 2026-05-13T20:49:50.123005+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 4 internal anchors

[1]

Erik Altman, Jovan Blanuša, Luc Von Niederhäusern, Béni Egressy, Andreea Anghel, and Kubilay Atasu. 2023. Realistic synthetic financial transactions for anti-money laundering models.Advances in Neural Information Processing Systems36 (2023), 29851–29874

work page 2023
[2]

Relational inductive biases, deep learning, and graph networks

Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinícius Flores Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Ra- poso, Adam Santoro, Ryan Faulkner, Çaglar Gülçehre, H. Francis Song, An- drew J. Ballard, Justin Gilmer, George E. Dahl, Ashish Vaswani, Kelsey R. Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicol...

work page internal anchor Pith review Pith/arXiv arXiv 2018
[3]

Luca Bionducci, Alessio Botta, Philip Bruno, Olivier Denecker, Carolyne Gathinji, Reema Jain, Marie-Claude Nadeau, and Bharath Sattanathan

work page
[4]

https://www.mckinsey.com/industries/financial-services/our-insights/ the-2023-mckinsey-global-payments-report

On the cusp of the next payments era: Future opportunities for Banks. https://www.mckinsey.com/industries/financial-services/our-insights/ the-2023-mckinsey-global-payments-report

work page 2023
[5]

Jovan Blanuša, Maximo Cravero Baraja, Andreea Anghel, Luc von Niederhäusern, Erik Altman, Haris Pozidis, and Kubilay Atasu. 2024. Graph Feature Preproces- sor: Real-time Extraction of Subgraph-based Features from Transaction Graphs. arXiv:2402.08593 [cs.LG]

work page arXiv 2024
[6]

Xavier Bresson and Thomas Laurent. 2017. Residual Gated Graph ConvNets. (11 2017). doi:10.48550/arXiv.1711.07553

work page doi:10.48550/arxiv.1711.07553 2017
[7]

Mário Cardoso, Pedro Saleiro, and Pedro Bizarro. 2022. LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering. arXiv:2210.14360 [cs.LG]

work page arXiv 2022
[8]

Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(San Francisco, California, USA)(KDD ’16). Association for Computing Machinery, New York, NY, USA, 785–794. doi:10. 1145/2939672.2939785

work page arXiv 2016
[9]

Gabriele Corso, Luca Cavalleri, Dominique Beaini, Pietro Liò, and Petar Veličković. 2020. Principal Neighbourhood Aggregation for Graph Nets. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ran- zato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 13260–13271. https://proceedings.neurips.cc/paper_file...

work page 2020
[10]

Ahmad Naser Eddin, Jacopo Bono, David Aparício, David Polido, João Tiago As- censão, Pedro Bizarro, and Pedro Ribeiro. 2022. Anti-Money Laundering Alert Op- timization Using Machine Learning with Graphs. (2022). arXiv:2112.07508 [cs.LG] https://arxiv.org/abs/2112.07508

work page arXiv 2022
[11]

Béni Egressy, Luc von Niederhäusern, Jovan Blanuša, Erik Altman, Roger Wat- tenhofer, and Kubilay Atasu. 2024. Provably powerful graph neural networks for directed multigraphs. InProceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Sympo...

work page doi:10.1609/aaai.v38i10.29069 2024
[12]

FIU Nederland. 2024. What is Money Laundering? https://www.fiu-nederland. nl/en/home/what-is-money-laundering

work page 2024
[13]

https://www.fincen.gov/. 2025. FinCEN Suspicious Activity Report (FinCEN SAR) Electronic Filing Instructions. https://www.fincen.gov/ sites/default/files/shared/FinCEN%20SAR%20ElectronicFilingInstructions- %20Stand%20Alone%20doc.pdf

work page 2025
[14]

Weihua Hu*, Bowen Liu*, Joseph Gomes, Marinka Zitnik, Percy Liang, Vi- jay Pande, and Jure Leskovec. 2020. Strategies for Pre-training Graph Neu- ral Networks. InInternational Conference on Learning Representations. https: //openreview.net/forum?id=HJlWWJSFDH

work page 2020
[15]

Guillaume Jaume, An-Phi Nguyen, Maria Rodriguez Martinez, Jean-Philippe Thiran, and Maria Gabrani. 2019. edGNN: a Simple and Powerful GNN for Directed Labeled Graphs. (04 2019). doi:10.48550/arXiv.1904.08745

work page doi:10.48550/arxiv.1904.08745 2019
[16]

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: a highly efficient gradient boost- ing decision tree. InProceedings of the 31st International Conference on Neural Information Processing Systems(Long Beach, California, USA)(NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 3149–3157

work page 2017
[17]

Kipf et al.Variational Graph Auto- Encoders

Thomas N. Kipf and Max Welling. 2016. Variational Graph Auto-Encoders. arXiv:1611.07308 [stat.ML] https://arxiv.org/abs/1611.07308

work page arXiv 2016
[18]

Xurui Li, Xiang Cao, Xuetao Qiu, Jintao Zhao, and Jianbin Zheng. 2017. Intelligent Anti-Money Laundering Solution Based upon Novel Community Detection in Massive Transaction Networks on Spark. In2017 Fifth International Conference on Advanced Cloud and Big Data (CBD). 176–181. doi:10.1109/CBD.2017.38

work page doi:10.1109/cbd.2017.38 2017
[19]

Xiangfeng Li, Shenghua Liu, Zifeng Li, Xiaotian Han, Chuan Shi, Bryan Hooi, He Huang, and Xueqi Cheng. 2020. Flowscope: Spotting money laundering based on graphs. InProceedings of the AAAI conference on artificial intelligence, Vol. 34. 4731–4738

work page 2020
[20]

Junhong Lin, Xiaojie Guo, Yada Zhu, Samuel Mitchell, Erik Altman, and Julian Shun. 2024. FraudGT: A Simple, Effective, and Efficient Graph Transformer for Financial Fraud Detection. InProceedings of the 5th ACM International Conference on AI in Finance(Brooklyn, NY, USA)(ICAIF ’24). Association for Computing Machinery, New York, NY, USA, 292–300. doi:10.1...

work page doi:10.1145/3677052.3698648 2024
[21]

Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation Forest. In2008 Eighth IEEE International Conference on Data Mining. 413–422. doi:10.1109/ICDM. 2008.17

work page doi:10.1109/icdm 2008
[22]

Junliang Luo and Xue Liu. 2025. Optimizing Blockchain Analysis: Tackling Temporality and Scalability with an Incremental Approach with Metropolis- Hastings Random Walks. InProceedings of WSDM. 410–418

work page 2025
[23]

Berkan Oztas, Deniz Cetinkaya, Festus Adedoyin, Marcin Budka, Gokhan Aksu, and Huseyin Dogan. 2024. Transaction monitoring in anti-money laundering: A qualitative analysis and points of view from industry.Future Generation Computer Systems159 (2024), 161–171. doi:10.1016/j.future.2024.05.027

work page doi:10.1016/j.future.2024.05.027 2024
[24]

1999.The PageRank Citation Ranking: Bringing Order to the Web.Technical Report 1999-

Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999.The PageRank Citation Ranking: Bringing Order to the Web.Technical Report 1999-

work page 1999
[25]

http://ilpubs.stanford.edu:8090/422/ Previous number = SIDL-WP-1999-0120

Stanford InfoLab. http://ilpubs.stanford.edu:8090/422/ Previous number = SIDL-WP-1999-0120

work page 1999
[26]

2019.Approximation ratios of graph neural networks for combinatorial problems

Ryoma Sato, Makoto Yamada, and Hisashi Kashima. 2019.Approximation ratios of graph neural networks for combinatorial problems. Curran Associates Inc., Red Hook, NY, USA

work page 2019
[27]

Michele Starnini, Charalampos E. Tsourakakis, Maryam Zamanipour, André Panisson, Walter Allasia, Marco Fornasiero, Laura Li Puma, Valeria Ricci, Silvia Ronchiadin, Angela Ugrinoska, Marco Varetto, and Dario Moncalvo. 2021.Smurf- Based Anti-money Laundering in Time-Evolving Transaction Networks. Springer International Publishing, 171–186. doi:10.1007/978-3...

work page doi:10.1007/978-3-030-86514-6_11 2021
[28]

Haseeb Tariq and Marwan Hassani. 2023. Topology-Agnostic Detection of Temporal Money Laundering Flows in Billion-Scale Transactions. InMachine Learning and Principles and Practice of Knowledge Discovery in Databases. 402– 419

work page 2023
[29]

Haseeb Tariq, Alen Kaja, and Marwan Hassani. 2026. Detecting Complex Money Laundering Patterns with Incremental and Distributed Graph Modeling. arXiv:2604.01315 [cs.LG] https://arxiv.org/abs/2604.01315

work page internal anchor Pith review Pith/arXiv arXiv 2026
[30]

2025.SparkR: R Front End for ’Apache Spark’

The Apache Software Foundation. 2025.SparkR: R Front End for ’Apache Spark’. https://www.apache.org https://spark.apache.org

work page 2025
[31]

Scientific Reports9(1), 5233 (2019) https://doi

V. Traag, L. Waltman, and Nees Jan van Eck. 2019. From Louvain to Leiden: guaranteeing well-connected communities.Scientific Reports9 (03 2019), 5233. doi:10.1038/s41598-019-41695-z

work page doi:10.1038/s41598-019-41695-z 2019
[32]

United Nations Office on Drugs and Crime. 2024. Overview - Money Laundering. https://www.unodc.org/unodc/en/money-laundering/overview.html

work page 2024
[33]

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. arXiv:1710.10903 [stat.ML] https://arxiv.org/abs/1710.10903

work page internal anchor Pith review Pith/arXiv arXiv 2018
[34]

Weidele, Claudio Bellei, Tom Robinson, and Charles E

Mark Weber, Giacomo Domeniconi, Jie Chen, Daniel Karl I. Weidele, Claudio Bellei, Tom Robinson, and Charles E. Leiserson. 2019. Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics. arXiv:1908.02591 [cs.SI] https://arxiv.org/abs/1908.02591

work page arXiv 2019
[35]

withpersona.com. 2025. The most mind-blowing money laundering statis- tics of 2023. https://withpersona.com/blog/the-most-mind-blowing-money- laundering-statistics-of-2022

work page 2025
[36]

XBlock. [n. d.].Ethereum Phishing Transaction Network. https://www.kaggle. com/datasets/xblock/ethereum-phishing-transaction-network

work page
[37]

Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How Powerful are Graph Neural Networks? arXiv:1810.00826 [cs.LG] https://arxiv.org/abs/ 1810.00826

work page internal anchor Pith review Pith/arXiv arXiv 2019
[38]

Jiaxuan You, Jonathan M Gomes-Selman, Rex Ying, and Jure Leskovec. 2021. Identity-aware Graph Neural Networks.Proceedings of the AAAI Conference on Artificial Intelligence35, 12 (May 2021), 10737–10745. doi:10.1609/aaai.v35i12. 17283

work page doi:10.1609/aaai.v35i12 2021