pith. machine review for the scientific record.

arxiv: 2605.01892 · v1 · submitted 2026-05-03 · 💻 cs.AI · cs.CR · cs.IR

Recognition: unknown

CyberAId: AI-Driven Cybersecurity for Financial Service Providers

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 17:25 UTC · model grok-4.3

classification 💻 cs.AI · cs.CR · cs.IR

keywords cybersecurity · multi-agent systems · large language models · financial services · SIEM · regulatory compliance · hybrid AI · federated defense
0 comments

The pith

A hybrid multi-agent system layers specialized LLM subagents over existing SIEM telemetry to deliver auditable cybersecurity for financial institutions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

European financial institutions struggle with alert volumes their SOC teams cannot investigate and incomplete coverage of attack techniques, even though they possess the raw data. Frontier LLMs perform well on isolated tasks such as vulnerability exploitation or intrusion detection, yet no single model yet constitutes a full platform that persists state, maps outputs to regulations, and passes audits. The paper argues the solution lies in a hybrid architecture where LLM subagents reason over classical SIEM and XDR telemetry instead of replacing it, federate accumulated knowledge across institutions without sharing raw data, and plug into complementary tools such as quantum authentication or digital twins. This approach is instantiated in the CyberAId platform, which coordinates subagents through a main agent layer under bounded human oversight and follows four falsifiable design principles. The authors plan to validate the system in four concrete financial scenarios and treat each deployment as a step toward collective, continuously improving defense.

Core claim

The right unit of construction is a hybrid multi-agent system in which specialised LLM subagents reason over classical SIEM/XDR telemetry rather than replacing it, share accumulated agent state across institutions through privacy-preserving federation, and can connect to complementary capability packs such as quantum-based authentication, digital twins for adversarial validation, and eBPF-based kernel telemetry. CyberAId is a model-agnostic, on-premise-deployable platform in which a Main Agent coordination layer, a Reporting capability, and specialist subagents operate within a shared runtime under bounded human-in-the-loop autonomy, organised around four falsifiable design principles and aligned with relevant regulations.

What carries the argument

The CyberAId hybrid multi-agent platform, in which a Main Agent coordinates specialist LLM subagents that process SIEM/XDR telemetry, maintain federated state, produce regulatory-mapped reports, and integrate additional capability modules inside a shared runtime with human oversight.
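The coordination pattern described above can be sketched minimally. The class and function names below (`MainAgent`, `phishing_agent`, the category-based routing) are illustrative assumptions for this review, not CyberAId's actual API; a real subagent would invoke an LLM over the alert context rather than return a canned assessment.

```python
# Hypothetical sketch of a Main Agent routing SIEM/XDR alerts to specialist
# subagents, with unmatched alerts deferred to a human queue (the bounded
# human-in-the-loop fallback the paper describes). All names are illustrative.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Alert:
    source: str      # e.g. "siem" or "xdr"
    category: str    # e.g. "phishing", "lateral-movement"
    payload: dict

def phishing_agent(alert: Alert) -> dict:
    # Placeholder: a real subagent would reason over alert.payload with an LLM.
    return {"verdict": "investigate", "technique": "T1566"}

def lateral_movement_agent(alert: Alert) -> dict:
    return {"verdict": "escalate", "technique": "T1021"}

class MainAgent:
    """Routes each alert to its registered specialist subagent."""
    def __init__(self) -> None:
        self.routes: dict[str, Callable[[Alert], dict]] = {}
        self.human_queue: list[Alert] = []

    def register(self, category: str, subagent: Callable[[Alert], dict]) -> None:
        self.routes[category] = subagent

    def handle(self, alert: Alert) -> Optional[dict]:
        subagent = self.routes.get(alert.category)
        if subagent is None:
            # No specialist available: defer to human analysts.
            self.human_queue.append(alert)
            return None
        return subagent(alert)

main = MainAgent()
main.register("phishing", phishing_agent)
main.register("lateral-movement", lateral_movement_agent)
result = main.handle(Alert("siem", "phishing", {"sender": "spoofed@example.com"}))
```

The point of the sketch is the layering: the subagents consume telemetry that existing SIEM/XDR tooling already produces, and the coordinator adds reasoning capacity on top rather than replacing the pipeline.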

If this is right

  • SOC teams gain capacity to investigate a larger share of generated alerts by delegating initial reasoning to the subagents.
  • Coverage of techniques in frameworks such as MITRE ATT&CK rises without discarding prior investments in SIEM and XDR systems.
  • Security findings become directly mappable to regulatory regimes, supporting audit survival and compliance reporting.
  • Institutions can accumulate and share defensive knowledge across boundaries through privacy-preserving federation.
  • Each real-world deployment contributes to a growing collective defense via skill-based agent adaptation.
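The federation bullet above can be made concrete with a minimal sketch: each institution exports only aggregated per-technique counts, never raw events, and a merge step combines them into a collective view. The function names and the simple counting scheme are assumptions for illustration; the paper does not specify its federation protocol.

```python
# Minimal sketch of privacy-preserving federation: raw telemetry stays local,
# only aggregated technique counts are shared and merged across institutions.
from collections import Counter

def local_summary(events: list) -> Counter:
    """Aggregate raw events into technique counts inside the institution."""
    return Counter(e["technique"] for e in events)

def federate(summaries: list) -> Counter:
    """Merge per-institution summaries into a shared collective prior."""
    merged = Counter()
    for summary in summaries:
        merged.update(summary)
    return merged

bank_a = local_summary([{"technique": "T1566"}, {"technique": "T1566"},
                        {"technique": "T1021"}])
bank_b = local_summary([{"technique": "T1021"}])
shared = federate([bank_a, bank_b])
# `shared` reflects both institutions without either exposing raw events.
```

A production scheme would add differential privacy or secure aggregation on top of this; the sketch only shows the boundary — aggregation happens before anything crosses the institutional perimeter.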

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same hybrid layering pattern could extend to regulated sectors such as healthcare or energy, where telemetry already exists but reasoning capacity is limited.
  • Skill-based adaptation across deployments offers a practical route to turning isolated installations into an evolving shared defense resource.
  • Integration with quantum and digital-twin modules points toward a modular way to incorporate future technologies without rebuilding the core agent layer.
  • Bounded human-in-the-loop autonomy provides a template for other high-stakes domains that need both automation and accountability.

Load-bearing premise

Frontier large language models can be composed into a persistent, auditable multi-tenant platform that maps its findings to regulatory regimes and survives audits when layered on top of existing telemetry.
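What "maps its findings to regulatory regimes" could look like in the simplest case: a static lookup that annotates each finding with the regulations it touches, so a report carries its own audit trail. The regulation names come from the paper's abstract (DORA, NIS2, the AI Act); the mapping table and finding kinds are placeholders, not the paper's actual mapping.

```python
# Illustrative finding-to-regulation mapping; table contents are assumptions.
REGULATORY_MAP = {
    "incident-report": ["DORA (EU) 2022/2554", "NIS2 (EU) 2022/2555"],
    "ai-system-change": ["AI Act (EU) 2024/1689"],
}

def map_finding(finding: dict) -> dict:
    """Annotate a finding with applicable regulations (empty list if unmapped)."""
    return {**finding, "regulations": REGULATORY_MAP.get(finding["kind"], [])}

report = map_finding({"kind": "incident-report",
                      "summary": "credential stuffing detected"})
```

The load-bearing question is whether this mapping can stay correct and complete enough to survive an audit when the findings are generated by LLM subagents rather than hand-written rules.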

What would settle it

A deployment of CyberAId in one of the four target use cases (such as anti-money-laundering for payment service providers) that produces reports rejected during a regulatory audit, or that fails to increase the fraction of investigated alerts, would show the architecture does not meet its requirements.

Figures

Figures reproduced from arXiv: 2605.01892 by Amin Babazadeh, Bruno Almeida, Christos Gkizelis, Christos Xenakis, Despina Tomkou, Dimosthenis Kyriazis, Ernstjan de Gooyert, George Fatouros, George Kousiouris, Georgios Makridis, Giannis Chouchoulis, Giannis Ledakis, John Soldatos, Konstantina Tripodi, Konstantinos Ilias, Kostas Metaxas, Kostis Mavrogiorgos, Louiza Kachrimani, Panagiotis Rizomiliotis, Pedro Malo, Pepi Paraskevoulakou.

Figure 1: Conceptual architecture of the CyberAId platform. Specialist LLM agents share a single multi-agent runtime that reasons over classical SIEM/XDR … (view at source ↗)
Figure 2: Agent orchestration and trust envelope. The Security Context flows … (view at source ↗)
Figure 3: Client-impersonation use case (UC-1). A suspicious order is wrapped … (view at source ↗)
Original abstract

European financial institutions face mounting regulatory pressure while their security operations centres remain constrained not by data or staffing but by reasoning capacity: enterprise SIEMs cover only a fraction of MITRE ATT&CK techniques, two thirds of SOC teams cannot keep pace with alert volumes, and the majority of breaches are preceded by alerts that are generated but never investigated. Frontier large language models now achieve state-of-the-art results on isolated cybersecurity tasks (one-day vulnerability exploitation, code-level patching, intrusion detection) yet no narrow win constitutes a platform that can compose across functions, persist multi-tenant state, map findings to regulatory regimes and survive an audit. This position paper argues that the right unit of construction is a hybrid multi-agent system in which specialised LLM subagents reason over classical SIEM/XDR telemetry rather than replacing it, share accumulated agent state across institutions through privacy-preserving federation, and can connect to complementary capability packs such as quantum-based authentication, digital twins for adversarial validation, and eBPF-based kernel telemetry. We present CyberAId, a model-agnostic, on-premise-deployable platform in which a Main Agent coordination layer, a Reporting capability, and specialist subagents operate within a shared runtime under bounded human-in-the-loop autonomy, organised around four falsifiable design principles, and aligned with relevant regulations. CyberAId will be validated at four representative financial use cases (client impersonation, anti-money-laundering for payment service providers, retail-banking incident response, and high-frequency-trading resilience), and we propose skill-based agent adaptation as the most promising research direction for turning each deployment into a contribution to a continuously refined collective defence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper proposes CyberAId, a model-agnostic, on-premise hybrid multi-agent platform in which a Main Agent coordination layer orchestrates specialized LLM subagents that reason over classical SIEM/XDR telemetry rather than replacing it. The architecture supports privacy-preserving federation for sharing agent state across institutions, integration with complementary capability packs, bounded human-in-the-loop autonomy, and mapping of findings to regulatory regimes, all organized around four falsifiable design principles. Validation is planned for four financial use cases (client impersonation, anti-money-laundering, retail-banking incident response, and high-frequency-trading resilience), with skill-based agent adaptation proposed as a path to collective defense.

Significance. If the proposed architecture can be realized with the claimed properties, it would offer a concrete path to composing isolated LLM cybersecurity capabilities into a persistent, auditable, multi-tenant platform that augments rather than supplants existing telemetry tools. The emphasis on privacy-preserving federation, regulatory alignment, and falsifiable design principles provides a structured framework that could improve SOC reasoning capacity while preserving auditability and compliance in regulated financial environments.

minor comments (3)
  1. [Abstract and CyberAId platform section] The abstract and platform description refer to 'four falsifiable design principles' without enumerating or briefly stating them; adding an explicit list would allow readers to assess their falsifiability directly.
  2. [Use-case validation paragraph] The four representative financial use cases are named but receive no high-level scenario descriptions or mappings to specific subagent roles; including one-sentence illustrations per use case would strengthen the grounding of the proposal.
  3. [Introduction] References to frontier LLM performance on isolated tasks (vulnerability exploitation, code patching, intrusion detection) are cited without specific citations or performance metrics; adding 1-2 key references would support the composition argument.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their thorough summary of our position paper and for the positive assessment of CyberAId's potential to address reasoning-capacity constraints in financial SOCs while preserving auditability and regulatory alignment. We are pleased that the referee recognizes the value of the hybrid multi-agent architecture, privacy-preserving federation, and falsifiable design principles. Given the recommendation for minor revision and the absence of specific major comments requiring changes, we believe the manuscript is already well-positioned.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper is a position paper that presents a conceptual architecture proposal without equations, derivations, fitted parameters, or any claimed first-principles results. All central claims (hybrid multi-agent LLM layer over SIEM/XDR, privacy-preserving federation, regulatory mapping) are framed as design principles to be validated in future work rather than derived from prior steps within the paper. No load-bearing argument reduces to self-definition, fitted inputs renamed as predictions, or self-citation chains. The derivation chain is empty by construction, making the proposal self-contained as a forward-looking design document.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The proposal rests on domain assumptions about LLM capabilities and the feasibility of hybrid integration rather than new axioms or entities with independent evidence.

axioms (2)
  • domain assumption Frontier large language models achieve state-of-the-art results on isolated cybersecurity tasks
    Invoked in the abstract as the foundation for composing subagents into a platform.
  • ad hoc to paper A hybrid multi-agent system can persist multi-tenant state and map findings to regulatory regimes while surviving audit
    Central design claim presented without supporting derivation or evidence.
invented entities (1)
  • CyberAId platform with Main Agent coordination layer and specialist subagents no independent evidence
    purpose: To serve as the model-agnostic on-premise system that composes LLM reasoning with classical telemetry
    The core proposed construct; no implementation or falsifiable test results are provided in the paper.

pith-pipeline@v0.9.0 · 5713 in / 1480 out tokens · 40166 ms · 2026-05-09T17:25:36.366160+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

29 extracted references · 7 canonical work pages · 1 internal anchor

  1. [1] European Banking Authority, "Report on artificial intelligence in the banking sector," EBA, Paris, France, Tech. Rep., 2024. Available: https://www.eba.europa.eu
  2. [2] IBM Security, "Cost of a data breach report 2024," IBM, Armonk, NY, USA, Tech. Rep., 2024, accessed 2026-04-01. Available: https://www.ibm.com/think/insights/cost-of-a-data-breach-2024-financial-industry
  3. [3] S. Srinivas, B. Kirk, J. Zendejas, M. Espino, M. Boskovich, A. Bari, K. Dajani, and N. Alzahrani, "AI-augmented SOC: A survey of LLMs and agents for security automation," Journal of Cybersecurity and Privacy, vol. 5, no. 4, 2025. Available: https://www.mdpi.com/2624-800X/5/4/95
  4. [4] R. Fang, R. Bindu, A. Gupta, and D. Kang, "LLM agents can autonomously exploit one-day vulnerabilities," arXiv preprint arXiv:2404.08144, 2024.
  5. [5] Y. Zhu, A. Kellermann, A. Gupta, P. Li, R. Fang, R. Bindu, and D. Kang, "Teams of LLM agents can exploit zero-day vulnerabilities," in Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), 2026, pp. 23–35.
  6. [6] Z. Li, S. Dutta, and M. Naik, "IRIS: LLM-assisted static analysis for detecting security vulnerabilities," arXiv preprint arXiv:2405.17238, 2024, accepted at ICLR 2025.
  7. [7] Y. Li, Z. Xiang, N. D. Bastian, D. Song, and B. Li, "IDS-Agent: An LLM agent for explainable intrusion detection in IoT networks," in NeurIPS 2024 Workshop on Open-World Agents: Synergizing Reasoning and Decision-Making in Open-World Environments (OWA-2024), 2024, poster.
  8. [8] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, "ReAct: Synergizing reasoning and acting in language models," in Proceedings of the International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 2023.
  9. [9] L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y. Lin et al., "A survey on large language model based autonomous agents," Frontiers of Computer Science, vol. 18, no. 6, p. 186345, 2024.
  10. [10] Q. Wu, G. Bansal, J. Zhang, Y. Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu et al., "AutoGen: Enabling next-gen LLM applications via multi-agent conversations," in First Conference on Language Modeling, 2024.
  11. [11] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., "Chain-of-thought prompting elicits reasoning in large language models," Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022.
  12. [12] G. Fatouros and K. Metaxas, "Signal or noise in multi-agent LLM-based stock recommendations?" arXiv preprint arXiv:2604.17327, 2026.
  13. [13] T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, and T. Scialom, "Toolformer: Language models can teach themselves to use tools," Advances in Neural Information Processing Systems, vol. 36, pp. 68539–68551, 2023.
  14. [14] P. Drammeh, "Multi-agent LLM orchestration achieves deterministic, high-quality decision support for incident response," arXiv preprint arXiv:2511.15755, 2025.
  15. [15] B. Wei, Y. S. Tay, H. Liu, J. Pan, K. Luo, Z. Zhu, and C. Jordan, "CORTEX: Collaborative LLM agents for high-stakes alert triage," arXiv preprint arXiv:2510.00311, 2025.
  16. [16] F. Blefari, C. Cosentino, F. A. Pironti, A. Furfaro, and F. Marozzo, "CyberRAG: An agentic RAG cyber attack classification and reporting tool," Future Generation Computer Systems, p. 108186, 2025.
  17. [17] L. Huang and X. Xiao, "CTIKG: LLM-powered knowledge graph construction from cyber threat intelligence," in Proceedings of the First Conference on Language Modeling (COLM 2024), Philadelphia, PA, USA, 2024.
  18. [18] Y. Li, C. Huang, S. Deng, M. L. Lock, T. Cao, N. Oo, H. W. Lim, and B. Hooi, "KnowPhish: Large language models meet multimodal knowledge graphs for enhancing reference-based phishing detection," in 33rd USENIX Security Symposium (USENIX Security 24). Philadelphia, PA: USENIX Association, 2024, pp. 793–810. Available: https://www.usenix.org/con...
  19. [19] European Parliament and Council, "Regulation (EU) 2022/2554 on digital operational resilience for the financial sector (DORA)," Brussels, Belgium, 2022.
  20. [20] EU, "Directive (EU) 2022/2555 on measures for a high common level of cybersecurity across the union (NIS2)," Brussels, Belgium, 2022.
  21. [21] European Parliament and Council, "Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act)," Brussels, Belgium, 2024.
  22. [22] T. Wu, S. Yang, S. Liu, D. Nguyen, S. Jang, and A. Abuadbba, "ThreatModeling-LLM: Automating threat modeling using large language models for banking system," arXiv preprint arXiv:2411.17058, 2024.
  23. [23] F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu, "The emerged security and privacy of LLM agent: A survey with case studies," ACM Computing Surveys, 2025.
  24. [24] M. A. Ferrag, N. Tihanyi, D. Hamouda, L. Maglaras, A. Lakas, and M. Debbah, "From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows," ICT Express, 2025.
  25. [25] B. Challita and P. Parrend, "RedTeamLLM: An agentic AI framework for offensive security," arXiv preprint arXiv:2505.06913, 2025.
  26. [26] MITRE Corporation, "MITRE ATT&CK Enterprise Matrix, Version 14," McLean, VA, USA, 2023. Available: https://attack.mitre.org (accessed 1 April 2026).
  27. [27] S. Chippagiri and A. Ramesh, "PCI DSS: A critical analysis of implementation, effectiveness, and legislative impact in payment card security," 2025.
  28. [28] D. Zhan, Z. Yu, X. Yu, H. Zhang, and L. Ye, "Shrinking the kernel attack surface through static and dynamic syscall limitation," IEEE Transactions on Services Computing, vol. 16, no. 2, pp. 1431–1443, 2022.
  29. [29] P. Schiansky, J. Kalb, E. Sztatecsny, M.-C. Roehsner, T. Guggemos, A. Trenti, M. Bozzio, and P. Walther, "Demonstration of quantum-digital payments," Nature Communications, vol. 14, no. 1, p. 3849, 2023.