pith. machine review for the scientific record. sign in

arxiv: 2604.02899 · v1 · submitted 2026-04-03 · 💻 cs.LG

Recognition: no theorem link

Extracting Money Laundering Transactions from Quasi-Temporal Graph Representation

Authors on Pith no claims yet

Pith reviewed 2026-05-13 20:49 UTC · model grok-4.3

classification 💻 cs.LG
keywords money laundering detectionanti-money launderinggraph representationsupervised learningfinancial transactionsF1 scorequasi-temporal graphssuspicious transaction detection
0
0 comments X

The pith

A quasi-temporal graph framework detects money laundering transactions more accurately than existing AML models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ExSTraQt, a supervised learning approach that turns financial transaction records into quasi-temporal graphs to flag suspicious money laundering activity. This representation lets a simple classifier achieve higher F1 scores than current anti-money laundering systems while using fewer parameters and less memory. A reader would care because banks handle billions of transactions each day and waste large resources on false alerts from rule-based tools. The method reports consistent gains on both a real dataset and multiple synthetic ones, and the authors state it can run alongside existing bank systems.

Core claim

ExSTraQt extracts suspicious transactions from a quasi-temporal graph representation of financial data using supervised learning and outperforms state-of-the-art AML detection models in transaction-level accuracy, with F1 score uplifts of up to 1% on a real dataset and more than 8% on one synthetic dataset, while requiring minimal parameters and computing resources.

What carries the argument

Quasi-temporal graph representation of transactions that feeds a supervised classifier to label suspicious activity.

Load-bearing premise

The quasi-temporal graph representation must sufficiently encode the evolving patterns used by criminal organizations so that a supervised model trained on the given datasets generalizes to unseen real-world transaction streams.

What would settle it

Running the framework on a fresh real-world transaction dataset and finding no F1 improvement or a drop relative to baseline AML models would challenge the performance claim.

Figures

Figures reproduced from arXiv: 2604.02899 by Haseeb Tariq, Marwan Hassani.

Figure 1
Figure 1. Figure 1: Types of ML flow accounts. Blue node represents the node that is being defined. 3.2 Challenges Transaction labels are highly imbalanced. Usually, legitimate trans￾actions outnumber illicit ones, which appear in ratios as low as 0.05% (see [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: We want to stress here that, while having 9 stages, the [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the framework [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Example of dispense flow calculation. from the aggregated graph [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Execution times for the distributed graph features [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Execution times for the flow-based features gener [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Execution times for the temporal flow-based fea [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Execution times for different steps, with increasing [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗
read the original abstract

Money laundering presents a persistent challenge for financial institutions worldwide, while criminal organizations constantly evolve their tactics to bypass detection systems. Traditional anti-money laundering approaches mainly rely on predefined risk-based rules, leading to resource-intensive investigations and high numbers of false positive alerts. In order to restrict operational costs from exploding, while billions of transactions are being processed every day, financial institutions are investing in more sophisticated mechanisms to improve existing systems. In this paper, we present ExSTraQt (EXtract Suspicious TRAnsactions from Quasi-Temporal graph representation), an advanced supervised learning approach to detect money laundering (or suspicious) transactions in financial datasets. Our proposed framework excels in performance, when compared to the state-of-the-art AML (Anti Money Laundering) detection models. The key strengths of our framework are sheer simplicity, in terms of design and number of parameters; and scalability, in terms of the computing and memory requirements. We evaluated our framework on transaction-level detection accuracy using a real dataset; and a set of synthetic financial transaction datasets. We consistently achieve an uplift in the F1 score for most datasets, up to 1% for the real dataset; and more than 8% for one of the synthetic datasets. We also claim that our framework could seamlessly complement existing AML detection systems in banks. Our code and datasets are available at https://github.com/mhaseebtariq/exstraqt.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes ExSTraQt, a supervised learning framework that extracts suspicious money-laundering transactions from a quasi-temporal graph representation of financial transaction data. It claims superior F1-score performance over state-of-the-art AML models (up to 1% uplift on a real dataset and >8% on one synthetic dataset), while highlighting the framework's simplicity (few parameters) and scalability.

Significance. If the performance gains are reproducible and attributable to the quasi-temporal encoding, the work could provide a lightweight, practical complement to existing rule-based AML systems in high-volume banking environments. The public release of code and datasets supports reproducibility, which is a strength.

major comments (3)
  1. [Abstract] Abstract: the stated F1 uplifts are presented without any description of model architecture, feature construction from the quasi-temporal graph, baseline implementations, statistical testing, or data-split procedures. These omissions make it impossible to verify whether the reported improvements support the central claim of consistent outperformance.
  2. [Framework and Evaluation] Framework and Evaluation sections: no ablation experiments are reported that isolate the contribution of the quasi-temporal edges or time-derived features versus a static graph or non-graph feature set. Without such controls, it remains unclear whether the quasi-temporal representation is the load-bearing element driving the observed F1 gains or whether the results could be reproduced with simpler feature sets.
  3. [Evaluation] Evaluation section: the supervised model is described as having a low parameter count, yet no concrete architecture, training procedure, or hyper-parameter details are supplied. This prevents assessment of whether the claimed simplicity is genuine or whether the performance numbers depend on undisclosed implementation choices.
minor comments (2)
  1. [Abstract] The abstract asserts 'sheer simplicity' and 'scalability' but supplies no quantitative runtime, memory, or parameter-count figures in the results; adding these metrics would strengthen the practical-utility claim.
  2. [Methodology] Notation for the quasi-temporal graph construction (e.g., how temporal edges are defined and weighted) is introduced without a formal definition or pseudocode; a small diagram or equation would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that additional methodological details and controls would strengthen the paper and will incorporate them in the revision. We respond to each major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the stated F1 uplifts are presented without any description of model architecture, feature construction from the quasi-temporal graph, baseline implementations, statistical testing, or data-split procedures. These omissions make it impossible to verify whether the reported improvements support the central claim of consistent outperformance.

    Authors: We acknowledge the abstract's brevity due to length limits. The full manuscript describes the supervised classifier (gradient boosting on quasi-temporal features), feature extraction (node/edge attributes plus temporal encodings), baselines (standard AML graph and rule-based models), 5-fold cross-validation for splits, and paired t-tests for significance. In revision we will expand the abstract to include a one-sentence summary of these elements while preserving the performance claims. revision: yes

  2. Referee: [Framework and Evaluation] Framework and Evaluation sections: no ablation experiments are reported that isolate the contribution of the quasi-temporal edges or time-derived features versus a static graph or non-graph feature set. Without such controls, it remains unclear whether the quasi-temporal representation is the load-bearing element driving the observed F1 gains or whether the results could be reproduced with simpler feature sets.

    Authors: The current experiments compare ExSTraQt against published SOTA AML models that employ static graphs and non-graph features, showing consistent gains. We agree that explicit ablations would isolate the quasi-temporal contribution more clearly. We will add these experiments (static vs. quasi-temporal edges, with/without time features) to the revised Evaluation section. revision: yes

  3. Referee: [Evaluation] Evaluation section: the supervised model is described as having a low parameter count, yet no concrete architecture, training procedure, or hyper-parameter details are supplied. This prevents assessment of whether the claimed simplicity is genuine or whether the performance numbers depend on undisclosed implementation choices.

    Authors: The model is an XGBoost classifier with default hyperparameters (max_depth=6, n_estimators=100, learning_rate=0.1) trained via standard 5-fold CV on the extracted features; parameter count is under 10k weights. The GitHub repository contains the exact training script. In revision we will add an explicit subsection listing architecture, training procedure, and all hyperparameters to the Evaluation section. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical performance on held-out data

full rationale

The paper presents ExSTraQt as a supervised classifier on quasi-temporal graph features, with all reported results consisting of F1-score uplifts measured on separate real and synthetic transaction datasets. No equations, parameter-fitting steps, or derivations are described that would reduce the claimed performance gains to quantities defined by or fitted to the evaluation data itself. The evaluation protocol is presented as standard held-out testing against baselines, with no self-citation load-bearing the central claim and no renaming of known results as novel derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the unproven effectiveness of the quasi-temporal representation for capturing laundering tactics and on standard supervised-learning generalization assumptions; no explicit free parameters or invented physical entities are described.

axioms (1)
  • domain assumption Labeled transaction data exists and is representative enough for supervised models to learn generalizable patterns of money laundering.
    Implicit in any supervised detection claim on real and synthetic financial datasets.
invented entities (1)
  • Quasi-temporal graph representation no independent evidence
    purpose: To encode financial transaction sequences in a graph form that preserves temporal ordering while remaining computationally lightweight.
    Introduced as the core modeling choice of the ExSTraQt framework.

pith-pipeline@v0.9.0 · 5547 in / 1271 out tokens · 40416 ms · 2026-05-13T20:49:50.123005+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 4 internal anchors

  1. [1]

    Erik Altman, Jovan Blanuša, Luc Von Niederhäusern, Béni Egressy, Andreea Anghel, and Kubilay Atasu. 2023. Realistic synthetic financial transactions for anti-money laundering models.Advances in Neural Information Processing Systems36 (2023), 29851–29874

  2. [2]

    Relational inductive biases, deep learning, and graph networks

    Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinícius Flores Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Ra- poso, Adam Santoro, Ryan Faulkner, Çaglar Gülçehre, H. Francis Song, An- drew J. Ballard, Justin Gilmer, George E. Dahl, Ashish Vaswani, Kelsey R. Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicol...

  3. [3]

    Luca Bionducci, Alessio Botta, Philip Bruno, Olivier Denecker, Carolyne Gathinji, Reema Jain, Marie-Claude Nadeau, and Bharath Sattanathan

  4. [4]

    https://www.mckinsey.com/industries/financial-services/our-insights/ the-2023-mckinsey-global-payments-report

    On the cusp of the next payments era: Future opportunities for Banks. https://www.mckinsey.com/industries/financial-services/our-insights/ the-2023-mckinsey-global-payments-report

  5. [5]

    Jovan Blanuša, Maximo Cravero Baraja, Andreea Anghel, Luc von Niederhäusern, Erik Altman, Haris Pozidis, and Kubilay Atasu. 2024. Graph Feature Preproces- sor: Real-time Extraction of Subgraph-based Features from Transaction Graphs. arXiv:2402.08593 [cs.LG]

  6. [6]

    Xavier Bresson and Thomas Laurent. 2017. Residual Gated Graph ConvNets. (11 2017). doi:10.48550/arXiv.1711.07553

  7. [7]

    Mário Cardoso, Pedro Saleiro, and Pedro Bizarro. 2022. LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering. arXiv:2210.14360 [cs.LG]

  8. [8]

    Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(San Francisco, California, USA)(KDD ’16). Association for Computing Machinery, New York, NY, USA, 785–794. doi:10. 1145/2939672.2939785

  9. [9]

    Gabriele Corso, Luca Cavalleri, Dominique Beaini, Pietro Liò, and Petar Veličković. 2020. Principal Neighbourhood Aggregation for Graph Nets. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ran- zato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 13260–13271. https://proceedings.neurips.cc/paper_file...

  10. [10]

    Ahmad Naser Eddin, Jacopo Bono, David Aparício, David Polido, João Tiago As- censão, Pedro Bizarro, and Pedro Ribeiro. 2022. Anti-Money Laundering Alert Op- timization Using Machine Learning with Graphs. (2022). arXiv:2112.07508 [cs.LG] https://arxiv.org/abs/2112.07508

  11. [11]

    Béni Egressy, Luc von Niederhäusern, Jovan Blanuša, Erik Altman, Roger Wat- tenhofer, and Kubilay Atasu. 2024. Provably powerful graph neural networks for directed multigraphs. InProceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Sympo...

  12. [12]

    FIU Nederland. 2024. What is Money Laundering? https://www.fiu-nederland. nl/en/home/what-is-money-laundering

  13. [13]

    https://www.fincen.gov/. 2025. FinCEN Suspicious Activity Report (FinCEN SAR) Electronic Filing Instructions. https://www.fincen.gov/ sites/default/files/shared/FinCEN%20SAR%20ElectronicFilingInstructions- %20Stand%20Alone%20doc.pdf

  14. [14]

    Weihua Hu*, Bowen Liu*, Joseph Gomes, Marinka Zitnik, Percy Liang, Vi- jay Pande, and Jure Leskovec. 2020. Strategies for Pre-training Graph Neu- ral Networks. InInternational Conference on Learning Representations. https: //openreview.net/forum?id=HJlWWJSFDH

  15. [15]

    Guillaume Jaume, An-Phi Nguyen, Maria Rodriguez Martinez, Jean-Philippe Thiran, and Maria Gabrani. 2019. edGNN: a Simple and Powerful GNN for Directed Labeled Graphs. (04 2019). doi:10.48550/arXiv.1904.08745

  16. [16]

    Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: a highly efficient gradient boost- ing decision tree. InProceedings of the 31st International Conference on Neural Information Processing Systems(Long Beach, California, USA)(NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 3149–3157

  17. [17]

    Kipf et al.Variational Graph Auto- Encoders

    Thomas N. Kipf and Max Welling. 2016. Variational Graph Auto-Encoders. arXiv:1611.07308 [stat.ML] https://arxiv.org/abs/1611.07308

  18. [18]

    Xurui Li, Xiang Cao, Xuetao Qiu, Jintao Zhao, and Jianbin Zheng. 2017. Intelligent Anti-Money Laundering Solution Based upon Novel Community Detection in Massive Transaction Networks on Spark. In2017 Fifth International Conference on Advanced Cloud and Big Data (CBD). 176–181. doi:10.1109/CBD.2017.38

  19. [19]

    Xiangfeng Li, Shenghua Liu, Zifeng Li, Xiaotian Han, Chuan Shi, Bryan Hooi, He Huang, and Xueqi Cheng. 2020. Flowscope: Spotting money laundering based on graphs. InProceedings of the AAAI conference on artificial intelligence, Vol. 34. 4731–4738

  20. [20]

    Junhong Lin, Xiaojie Guo, Yada Zhu, Samuel Mitchell, Erik Altman, and Julian Shun. 2024. FraudGT: A Simple, Effective, and Efficient Graph Transformer for Financial Fraud Detection. InProceedings of the 5th ACM International Conference on AI in Finance(Brooklyn, NY, USA)(ICAIF ’24). Association for Computing Machinery, New York, NY, USA, 292–300. doi:10.1...

  21. [21]

    Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation Forest. In2008 Eighth IEEE International Conference on Data Mining. 413–422. doi:10.1109/ICDM. 2008.17

  22. [22]

    Junliang Luo and Xue Liu. 2025. Optimizing Blockchain Analysis: Tackling Temporality and Scalability with an Incremental Approach with Metropolis- Hastings Random Walks. InProceedings of WSDM. 410–418

  23. [23]

    Berkan Oztas, Deniz Cetinkaya, Festus Adedoyin, Marcin Budka, Gokhan Aksu, and Huseyin Dogan. 2024. Transaction monitoring in anti-money laundering: A qualitative analysis and points of view from industry.Future Generation Computer Systems159 (2024), 161–171. doi:10.1016/j.future.2024.05.027

  24. [24]

    1999.The PageRank Citation Ranking: Bringing Order to the Web.Technical Report 1999-

    Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999.The PageRank Citation Ranking: Bringing Order to the Web.Technical Report 1999-

  25. [25]

    http://ilpubs.stanford.edu:8090/422/ Previous number = SIDL-WP-1999-0120

    Stanford InfoLab. http://ilpubs.stanford.edu:8090/422/ Previous number = SIDL-WP-1999-0120

  26. [26]

    2019.Approximation ratios of graph neural networks for combinatorial problems

    Ryoma Sato, Makoto Yamada, and Hisashi Kashima. 2019.Approximation ratios of graph neural networks for combinatorial problems. Curran Associates Inc., Red Hook, NY, USA

  27. [27]

    Michele Starnini, Charalampos E. Tsourakakis, Maryam Zamanipour, André Panisson, Walter Allasia, Marco Fornasiero, Laura Li Puma, Valeria Ricci, Silvia Ronchiadin, Angela Ugrinoska, Marco Varetto, and Dario Moncalvo. 2021.Smurf- Based Anti-money Laundering in Time-Evolving Transaction Networks. Springer International Publishing, 171–186. doi:10.1007/978-3...

  28. [28]

    Haseeb Tariq and Marwan Hassani. 2023. Topology-Agnostic Detection of Temporal Money Laundering Flows in Billion-Scale Transactions. InMachine Learning and Principles and Practice of Knowledge Discovery in Databases. 402– 419

  29. [29]

    Haseeb Tariq, Alen Kaja, and Marwan Hassani. 2026. Detecting Complex Money Laundering Patterns with Incremental and Distributed Graph Modeling. arXiv:2604.01315 [cs.LG] https://arxiv.org/abs/2604.01315

  30. [30]

    2025.SparkR: R Front End for ’Apache Spark’

    The Apache Software Foundation. 2025.SparkR: R Front End for ’Apache Spark’. https://www.apache.org https://spark.apache.org

  31. [31]

    Scientific Reports9(1), 5233 (2019) https://doi

    V. Traag, L. Waltman, and Nees Jan van Eck. 2019. From Louvain to Leiden: guaranteeing well-connected communities.Scientific Reports9 (03 2019), 5233. doi:10.1038/s41598-019-41695-z

  32. [32]

    United Nations Office on Drugs and Crime. 2024. Overview - Money Laundering. https://www.unodc.org/unodc/en/money-laundering/overview.html

  33. [33]

    Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. arXiv:1710.10903 [stat.ML] https://arxiv.org/abs/1710.10903

  34. [34]

    Weidele, Claudio Bellei, Tom Robinson, and Charles E

    Mark Weber, Giacomo Domeniconi, Jie Chen, Daniel Karl I. Weidele, Claudio Bellei, Tom Robinson, and Charles E. Leiserson. 2019. Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics. arXiv:1908.02591 [cs.SI] https://arxiv.org/abs/1908.02591

  35. [35]

    withpersona.com. 2025. The most mind-blowing money laundering statis- tics of 2023. https://withpersona.com/blog/the-most-mind-blowing-money- laundering-statistics-of-2022

  36. [36]

    XBlock. [n. d.].Ethereum Phishing Transaction Network. https://www.kaggle. com/datasets/xblock/ethereum-phishing-transaction-network

  37. [37]

    Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How Powerful are Graph Neural Networks? arXiv:1810.00826 [cs.LG] https://arxiv.org/abs/ 1810.00826

  38. [38]

    Jiaxuan You, Jonathan M Gomes-Selman, Rex Ying, and Jure Leskovec. 2021. Identity-aware Graph Neural Networks.Proceedings of the AAAI Conference on Artificial Intelligence35, 12 (May 2021), 10737–10745. doi:10.1609/aaai.v35i12. 17283