arxiv: 2606.17555 · v2 · pith:QWVH7EX7new · submitted 2026-06-16 · 💻 cs.CR · cs.AI · cs.CE · cs.ET

An AI Security Agent for Banking: Multi-Vector Fraud and AML Detection Across Retail and Corporate Accounts

Pith reviewed 2026-06-30 11:04 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.CE · cs.ET
keywords fraud detection · AML · banking security · LSTM · graph analysis · AI agent · synthetic data · multi-vector detection

The pith

A three-component AI agent fuses an LSTM, statistical monitors, and graph analysis to detect banking fraud and money laundering more accurately than rule-based or LSTM-only systems on synthetic data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an AI security agent for retail and corporate banking that handles two parallel streams of events: transactions covering card fraud, ACH/wire fraud and AML, and sessions covering account takeover and hijacking. Each stream runs a fusion of an LSTM model for per-account behavior sequences, a statistical velocity and threshold monitor, and a graph module that tracks counterparty patterns such as fan-in, fan-out and pass-through ratios to surface laundering. On a synthetic dataset of 237,669 transactions and 113,508 sessions spanning 13 threat categories and 3,470 accounts, the fused agent records F1 scores of 0.787 for transactions and 0.867 for sessions, exceeding both a rule-based baseline and an LSTM-only baseline. The architecture also supplies a customer verification chatbot and an analyst case-summary assistant while maintaining sub-millisecond critical-tier latency.
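The statistical velocity and threshold monitor mentioned above can be pictured as a per-account sliding window. The following is a minimal sketch under stated assumptions, not the paper's implementation: the class name `VelocityMonitor`, the window length, and both limits are hypothetical, since the abstract does not specify them.

```python
from collections import defaultdict, deque


class VelocityMonitor:
    """Flags an account when too many events, or too much value, land in a sliding window.

    All limits here are illustrative assumptions, not values from the paper.
    """

    def __init__(self, window_s=60.0, max_count=5, max_amount=10_000.0):
        self.window_s = window_s
        self.max_count = max_count
        self.max_amount = max_amount
        self._events = defaultdict(deque)  # account -> deque of (timestamp, amount)

    def observe(self, account, timestamp, amount):
        """Record one event and return True if the account now breaches a limit."""
        q = self._events[account]
        q.append((timestamp, amount))
        # Drop events that have fallen out of the window.
        while q and q[0][0] <= timestamp - self.window_s:
            q.popleft()
        total = sum(a for _, a in q)
        return len(q) > self.max_count or total > self.max_amount


monitor = VelocityMonitor(window_s=60.0, max_count=3, max_amount=5_000.0)
# Four events within 30 seconds: the fourth exceeds the count limit of 3.
flags = [monitor.observe("acct-1", t, 100.0) for t in (0, 10, 20, 30)]
# A later event arrives after the window has emptied, so it is not flagged.
late_flag = monitor.observe("acct-1", 200, 100.0)
```

A monitor like this reacts to bursts but says nothing about who the counterparties are, which is the gap the graph module is described as filling.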

Core claim

The agent improves detection of both signature-based fraud and behavioral financial crimes by running LSTM sequence models, statistical velocity/threshold monitors, and graph modules on parallel transaction and session streams, delivering F1 scores of 0.787 and 0.867 on synthetic logs versus 0.562/0.733 for rules and 0.655/0.713 for LSTM alone.
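To make the size of the reported gains concrete, the short calculation below derives the absolute and relative F1 lift of the fused agent over each baseline. It uses only the six F1 values quoted above; the helper name `lift` is an illustrative choice.

```python
# F1 scores as reported in the paper's abstract: (transaction, session).
f1 = {
    "fused": (0.787, 0.867),
    "rules": (0.562, 0.733),
    "lstm_only": (0.655, 0.713),
}
streams = ("transaction", "session")


def lift(baseline, stream_index):
    """Return (absolute, relative) F1 gain of the fused agent over a baseline."""
    base = f1[baseline][stream_index]
    gain = f1["fused"][stream_index] - base
    return round(gain, 3), round(gain / base, 3)


lifts = {
    (baseline, stream): lift(baseline, i)
    for baseline in ("rules", "lstm_only")
    for i, stream in enumerate(streams)
}
for key, (absolute, relative) in lifts.items():
    print(key, f"+{absolute:.3f} F1", f"({relative:.1%} relative)")
```

On these figures the LSTM-only baseline scores below the rule-based baseline on sessions (0.713 versus 0.733), which suggests the session-stream gain comes from combining components rather than from the sequence model by itself.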

What carries the argument

Three-component fusion architecture that processes transaction and session streams with an LSTM sequence model of per-account behavior, a statistical velocity/threshold monitor, and a graph module for account-counterparty patterns.
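The abstract does not say how the three component outputs are combined. As one plausible reading, the sketch below fuses them with a weighted average and a single alert threshold. The weights, the threshold, and the names `ComponentScores` and `fuse` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComponentScores:
    lstm: float   # sequence-model anomaly score in [0, 1]
    stats: float  # velocity/threshold monitor score in [0, 1]
    graph: float  # counterparty-pattern score in [0, 1]


def fuse(scores, weights=(0.4, 0.3, 0.3), threshold=0.5):
    """Weighted-average fusion; weights and threshold are illustrative assumptions."""
    w_lstm, w_stats, w_graph = weights
    total = w_lstm + w_stats + w_graph
    fused = (
        w_lstm * scores.lstm + w_stats * scores.stats + w_graph * scores.graph
    ) / total
    return fused, fused >= threshold


# Moderate sequence score, low velocity score, high graph score.
score, alert = fuse(ComponentScores(lstm=0.5, stats=0.2, graph=0.9))
```

In this toy case the velocity monitor alone would stay silent, while the fused score of about 0.53 crosses the threshold because the graph signal contributes. A learned combiner or per-threat weights would be equally consistent with the abstract.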

If this is right

  • The graph module enables detection of layering and mule networks that resemble legitimate activity at the individual level.
  • The customer-facing chatbot reaches 96.6 percent identity-verification accuracy and an 86.8 percent mass-reset detection rate.
  • The analyst case-summary assistant reaches 99.3 percent action-recommendation F1, and Critical-tier response latency stays under 0.43 ms at the 95th percentile.
  • Performance gains hold across both retail and corporate accounts and across 13 distinct threat categories.
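The 95th-percentile latency figure in the list above is the kind of number a simple timing harness can check. The sketch below uses a nearest-rank percentile. The function `score_event` is a hypothetical stand-in for the critical-tier scoring path, and the abstract does not describe how the authors measured latency.

```python
import math
import time


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p percent at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls_ms(fn, n=1000):
    """Wall-clock latency of n calls to fn, in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter_ns()
        fn()
        latencies.append((time.perf_counter_ns() - start) / 1e6)
    return latencies


def score_event():
    # Hypothetical stand-in for the critical-tier scoring path.
    return sum(range(100))


latencies_ms = time_calls_ms(score_event)
p95_ms = percentile(latencies_ms, 95)
print(f"p95 latency: {p95_ms:.4f} ms")
```

The printed value depends on the machine and says nothing about the paper's system. The point is only that a p95 claim needs a stated workload, sample count, and percentile definition to be reproducible.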

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the synthetic data distribution matches real banking traffic, banks could replace or augment static rules with this fused approach to reduce missed behavioral crimes.
  • The same fusion pattern could be tested on other high-velocity domains such as insurance claims or securities trading where both individual sequences and network patterns matter.
  • Adding an online learning loop to the LSTM component might allow the agent to adapt to new fraud tactics without full retraining.

Load-bearing premise

The synthetic transaction and session logs sufficiently capture the statistical and graph properties of real-world fraud and AML patterns so that performance on the synthetic set predicts production performance.

What would settle it

Deploy the agent on a real production banking log containing labeled fraud and AML cases and measure whether transaction and session F1 scores remain above the rule-based and LSTM-only baselines.
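The check described above reduces to computing F1 on labeled production events and comparing it against the two baselines. A minimal sketch follows, assuming binary labels (1 for fraud or AML, 0 for benign). The toy labels and predictions are invented for illustration, and the abstract does not state whether its "overall F1" is micro- or macro-averaged across the 13 categories.

```python
def f1_score(y_true, y_pred):
    """Binary F1 from parallel lists of 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def beats_baselines(agent_f1, baseline_f1s):
    """True only if the agent's F1 is strictly above every baseline F1."""
    return all(agent_f1 > b for b in baseline_f1s)


# Invented toy data: 3 true positives, 1 false positive, 1 false negative.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
agent_preds = [1, 1, 0, 0, 0, 1, 1, 0]
agent_f1 = f1_score(labels, agent_preds)

# Transaction-stream baselines as reported in the abstract (rules, LSTM-only).
settled = beats_baselines(agent_f1, (0.562, 0.655))
```

In a real test the baselines would have to be re-scored on the same production log, since their synthetic-data F1 values need not carry over either.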

Figures

Figures reproduced from arXiv: 2606.17555 by Joseph Walusimbi, Joshua Benjamin Ssentongo.

Figure 1: Modular architecture of the AI security agent for banking. Blue: [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]
Original abstract

Banks face two threat families with fundamentally different detection requirements: signature-based fraud (card-not-present attacks, account takeover, ATM cloning) and behavioural financial crime (structuring, layering, mule networks, business email compromise). Static rule engines catch high-velocity events but remain blind to BEC payment redirection, session hijacking, and laundering layering, which are engineered to resemble legitimate activity at the individual level. This paper presents an AI security agent for retail and corporate banking using a three-component fusion architecture across two parallel event streams: transactions (card fraud, ACH/wire fraud, AML) and sessions (account takeover, hijacking, SIM-swap, insider abuse). Each stream combines an LSTM sequence model of per-account behaviour, a statistical velocity/threshold monitor, and a graph module capturing account-counterparty patterns (fan-in, fan-out, pass-through ratio) for laundering detection. Experiments on a synthetic log of 237,669 transactions and 113,508 sessions across 13 threat categories and 3,470 accounts show overall F1 of 0.787 (transaction) and 0.867 (session), versus 0.562/0.733 for a rule-based baseline and 0.655/0.713 for an LSTM-only baseline. The agent also includes a customer-facing verification chatbot (96.6% identity accuracy, 86.8% mass-reset detection) and an analyst case-summary assistant (99.3% action recommendation F1), with Critical-tier response latency under 0.43 ms at the 95th percentile.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents an AI security agent for banking fraud and AML detection that processes parallel transaction and session streams using a three-component architecture: LSTM sequence models for per-account behavior, statistical velocity/threshold monitors, and graph modules capturing counterparty patterns such as fan-in, fan-out, and pass-through ratios. It reports overall F1 scores of 0.787 (transactions) and 0.867 (sessions) on a synthetic dataset of 237,669 transactions and 113,508 sessions spanning 13 threat categories and 3,470 accounts, outperforming rule-based (0.562/0.733) and LSTM-only (0.655/0.713) baselines, and additionally describes a customer verification chatbot and analyst case-summary assistant.

Significance. If the synthetic data were shown to reproduce the relevant real-world joint distributions (velocity profiles, amount histograms, and directed graph properties of mule networks and layering), the fused architecture could offer a practical advance over static rules for behavioral threats like BEC and structuring. The low reported latency and multi-stream design address operational constraints in retail and corporate banking. However, without validation of the synthetic generator, the numerical gains do not yet establish production utility or generalizability.

major comments (2)
  1. [Abstract] Abstract: The headline F1 results (0.787 transaction, 0.867 session) and all comparative claims rest exclusively on a synthetic log of 237,669 transactions and 113,508 sessions generated by the authors. No section supplies the generation procedure, the calibration targets (e.g., per-account velocity distributions, amount histograms conditioned on fraud type, or graph metrics such as fan-in/out and pass-through ratios), or any quantitative match to real banking telemetry. This absence directly undermines evaluation of whether the reported lift over baselines reflects architectural merit or properties of the synthetic threat injection.
  2. [Abstract] Abstract (experiments paragraph): The manuscript states that the synthetic data covers “13 threat categories” yet provides neither the precise definitions of those categories nor any hold-out or external benchmark set. Without these, it is impossible to determine whether the three-component fusion genuinely improves detection of the behavioral patterns (layering, mule networks) that the introduction identifies as the primary motivation for the graph module.
minor comments (1)
  1. [Abstract] Abstract: The 13 threat categories and the precise definitions of the graph features (fan-in, fan-out, pass-through ratio) are referenced but not enumerated or formalized; adding a short table or explicit list would improve clarity.
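To illustrate the kind of formalization the minor comment asks for, here is one possible set of definitions for the three graph features. These are assumptions, not the authors' definitions: fan-in is taken as the number of distinct payers into an account, fan-out as the number of distinct payees, and pass-through ratio as the smaller of total inflow and total outflow divided by the larger.

```python
from collections import defaultdict


def graph_features(transfers):
    """transfers: iterable of (sender, receiver, amount). Returns per-account features."""
    senders = defaultdict(set)    # account -> distinct accounts that paid it
    receivers = defaultdict(set)  # account -> distinct accounts it paid
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for src, dst, amount in transfers:
        senders[dst].add(src)
        receivers[src].add(dst)
        inflow[dst] += amount
        outflow[src] += amount

    features = {}
    for acct in set(inflow) | set(outflow):
        money_in, money_out = inflow[acct], outflow[acct]
        larger = max(money_in, money_out)
        features[acct] = {
            "fan_in": len(senders[acct]),
            "fan_out": len(receivers[acct]),
            # Assumed definition: near 1.0 means almost all money received is forwarded.
            "pass_through": min(money_in, money_out) / larger if larger > 0 else 0.0,
        }
    return features


# Toy mule pattern: three payers feed "mule", which forwards almost everything to "sink".
transfers = [
    ("a", "mule", 100.0),
    ("b", "mule", 100.0),
    ("c", "mule", 100.0),
    ("mule", "sink", 290.0),
]
features = graph_features(transfers)
```

Under these definitions the toy "mule" account has fan-in 3, fan-out 1, and a pass-through ratio near 0.97, while each individual transfer looks unremarkable. A time window over the transfers would be a natural addition that the sketch omits.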

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the thorough review and for highlighting the importance of synthetic data transparency. We agree that additional details on the data generator and threat definitions will strengthen the paper and will incorporate them in the revision. Our responses to the major comments follow.

point-by-point responses
  1. Referee: [Abstract] Abstract: The headline F1 results (0.787 transaction, 0.867 session) and all comparative claims rest exclusively on a synthetic log of 237,669 transactions and 113,508 sessions generated by the authors. No section supplies the generation procedure, the calibration targets (e.g., per-account velocity distributions, amount histograms conditioned on fraud type, or graph metrics such as fan-in/out and pass-through ratios), or any quantitative match to real banking telemetry. This absence directly undermines evaluation of whether the reported lift over baselines reflects architectural merit or properties of the synthetic threat injection.

    Authors: We acknowledge that the manuscript does not currently include a detailed description of the synthetic data generation procedure or its calibration targets. In the revised version we will add a new subsection (approximately 3.1) that specifies the generator design, including how velocity distributions, amount histograms conditioned on each fraud type, and graph metrics (fan-in, fan-out, pass-through ratios) were set using publicly reported banking-fraud statistics and domain-expert heuristics. We will also report quantitative similarity measures (e.g., Kolmogorov-Smirnov distances on marginals and graph-property comparisons) between the synthetic logs and the reference distributions used for calibration. Because the underlying real-world telemetry remains proprietary, we cannot publish direct numerical matches to any single bank’s production data; the added section will instead make the calibration process fully reproducible from open sources. revision: yes

  2. Referee: [Abstract] Abstract (experiments paragraph): The manuscript states that the synthetic data covers “13 threat categories” yet provides neither the precise definitions of those categories nor any hold-out or external benchmark set. Without these, it is impossible to determine whether the three-component fusion genuinely improves detection of the behavioral patterns (layering, mule networks) that the introduction identifies as the primary motivation for the graph module.

    Authors: We will add a table (new Table 1) that gives precise, operational definitions for each of the 13 threat categories together with the transaction- and session-level features that instantiate them. The revised experiments section will also state the train/validation/test split ratios used on the synthetic corpus and confirm that all reported F1 scores are computed on the held-out test portion. No external real-world benchmark set is available owing to regulatory and privacy restrictions on banking telemetry; the synthetic generator is explicitly constructed to reproduce the joint distributions of the behavioral patterns (layering, mule networks, BEC redirection) that motivate the graph module. We believe the combination of explicit category definitions and documented calibration will allow readers to assess whether the observed lift is attributable to the fusion architecture. revision: partial
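The first response proposes Kolmogorov-Smirnov distances between synthetic marginals and the reference distributions used for calibration. The sketch below computes the two-sample KS statistic in plain Python to show what such a similarity measure involves. The sample values are invented, and the simulated rebuttal does not say which implementation would be used.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Invented toy amounts: the samples agree except for one large reference value.
synthetic_amounts = [10.0, 20.0, 30.0, 40.0]
reference_amounts = [10.0, 20.0, 30.0, 400.0]
distance = ks_statistic(synthetic_amounts, reference_amounts)
```

A statistic of 0 means the empirical distributions coincide and 1 means they do not overlap. Because KS compares one marginal at a time, it cannot confirm the joint distributions or graph structure that the referee's concern is about.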

standing simulated objections not resolved
  • Direct quantitative validation against any bank’s proprietary real-world telemetry is precluded by data-privacy and regulatory constraints; only publicly reported aggregate statistics can be used for calibration.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The manuscript presents a three-component fusion architecture (LSTM + statistical monitor + graph module) for fraud/AML detection and reports empirical F1 scores on a described synthetic transaction/session log. No load-bearing step reduces a claimed result to its own inputs by construction: the architecture is not defined in terms of the reported metrics, no parameter is fitted on a subset and then renamed as a prediction on a related quantity, and no uniqueness theorem or ansatz is imported via self-citation. The central performance numbers are direct measurements on the provided synthetic corpus rather than algebraic identities or self-referential fits. Concerns about synthetic-data realism pertain to external validity and are outside the circularity criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no equations, parameter lists, or explicit assumptions are visible beyond the implicit claim that synthetic data is representative.

pith-pipeline@v0.9.1-grok · 5827 in / 1331 out tokens · 27353 ms · 2026-06-30T11:04:08.047672+00:00 · methodology

