CyberAId: AI-Driven Cybersecurity for Financial Service Providers
Pith reviewed 2026-05-09 17:25 UTC · model grok-4.3
The pith
A hybrid multi-agent system layers specialized LLM subagents over existing SIEM telemetry to deliver auditable cybersecurity for financial institutions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The right unit of construction is a hybrid multi-agent system in which specialised LLM subagents reason over classical SIEM/XDR telemetry rather than replacing it, share accumulated agent state across institutions through privacy-preserving federation, and can connect to complementary capability packs such as quantum-based authentication, digital twins for adversarial validation, and eBPF-based kernel telemetry. CyberAId is a model-agnostic, on-premise-deployable platform in which a Main Agent coordination layer, a Reporting capability, and specialist subagents operate within a shared runtime under bounded human-in-the-loop autonomy, organised around four falsifiable design principles, and aligned with relevant regulations.
What carries the argument
The CyberAId hybrid multi-agent platform, in which a Main Agent coordinates specialist LLM subagents that process SIEM/XDR telemetry, maintain federated state, produce regulatory-mapped reports, and integrate additional capability modules inside a shared runtime with human oversight.
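The paper describes this coordination pattern but publishes no implementation. As a hypothetical sketch (every class, method, and technique ID below is our own illustration, not taken from the paper), the Main Agent's routing of SIEM/XDR alerts to specialist subagents might look like:

```python
from dataclasses import dataclass, field

@dataclass
class Alert:
    source: str      # originating SIEM/XDR feed
    technique: str   # e.g. a MITRE ATT&CK technique ID such as "T1566"
    payload: dict = field(default_factory=dict)

class Subagent:
    """A specialist reasoner; in CyberAId this would wrap an LLM."""
    def __init__(self, name, techniques):
        self.name = name
        self.techniques = set(techniques)

    def handles(self, alert):
        return alert.technique in self.techniques

    def investigate(self, alert):
        # Placeholder for LLM reasoning over the telemetry payload.
        return {"agent": self.name, "alert": alert.technique,
                "verdict": "needs_review"}

class MainAgent:
    """Routes each alert to the first specialist that claims it;
    unclaimed alerts fall through to a human analyst."""
    def __init__(self, subagents):
        self.subagents = subagents

    def triage(self, alert):
        for agent in self.subagents:
            if agent.handles(alert):
                return agent.investigate(alert)
        return {"agent": None, "alert": alert.technique,
                "verdict": "escalate_to_human"}
```

The key property the sketch preserves is that subagents consume existing telemetry rather than replacing it: an `Alert` is whatever the SIEM already emits, and the fallback verdict keeps a human in the loop for anything no specialist claims.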
If this is right
- SOC teams gain capacity to investigate a larger share of generated alerts by delegating initial reasoning to the subagents.
- Coverage of techniques in frameworks such as MITRE ATT&CK rises without discarding prior investments in SIEM and XDR systems.
- Security findings become directly mappable to regulatory regimes, supporting audit survival and compliance reporting.
- Institutions can accumulate and share defensive knowledge across boundaries through privacy-preserving federation.
- Each real-world deployment contributes to a growing collective defense via skill-based agent adaptation.
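The paper names privacy-preserving federation as a design goal but does not specify a mechanism. One illustrative possibility (entirely our assumption, not the authors' design) is that institutions share only noised per-technique detection counts, in the style of differential privacy:

```python
import random

def privatized_counts(counts, epsilon=1.0, rng=None):
    """Add Laplace(0, 1/epsilon) noise to per-technique detection
    counts before sharing them across institutions.

    Illustrative sketch only: the paper does not specify the
    federation mechanism, and real deployments would need a full
    privacy analysis (sensitivity, composition, clipping)."""
    rng = rng or random.Random()
    noisy = {}
    for technique, n in counts.items():
        # Difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon).
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        # Clamp to a non-negative integer for downstream aggregation.
        noisy[technique] = max(0, round(n + noise))
    return noisy
```

For example, `privatized_counts({"T1566": 40, "T1110": 12})` yields counts close to the originals while masking any single institution's exact figures, which is the kind of property a privacy-preserving federation layer would have to guarantee.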
Where Pith is reading between the lines
- The same hybrid layering pattern could extend to regulated sectors such as healthcare or energy, where telemetry already exists but reasoning capacity is limited.
- Skill-based adaptation across deployments offers a practical route to turning isolated installations into an evolving shared defense resource.
- Integration with quantum and digital-twin modules points toward a modular way to incorporate future technologies without rebuilding the core agent layer.
- Bounded human-in-the-loop autonomy provides a template for other high-stakes domains that need both automation and accountability.
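Bounded human-in-the-loop autonomy can be pictured as a policy gate in front of every subagent action. The threshold, risk score, and action whitelist below are illustrative assumptions, not values taken from the paper:

```python
from enum import Enum

class Decision(Enum):
    AUTO_EXECUTE = "auto_execute"
    AWAIT_APPROVAL = "await_approval"

def autonomy_gate(action, risk_score, threshold=0.5,
                  approved_kinds=frozenset({"enrich", "tag"})):
    """Bounded autonomy: only whitelisted, low-risk actions run
    without a human; everything else is queued for analyst
    approval. All parameters here are hypothetical."""
    if action in approved_kinds and risk_score < threshold:
        return Decision.AUTO_EXECUTE
    return Decision.AWAIT_APPROVAL
```

The design point such a gate captures is auditability: every decision is a pure function of the action kind and a risk score, so the boundary between automated and human-approved behaviour can itself be inspected during an audit.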
Load-bearing premise
Frontier large language models can be composed into a persistent, auditable multi-tenant platform that maps its findings to regulatory regimes and survives audits when layered on top of existing telemetry.
What would settle it
A deployment of CyberAId in one of the four target use cases, such as anti-money-laundering for payment service providers, that produces reports rejected during a regulatory audit or fails to increase the fraction of investigated alerts would show the architecture does not meet its requirements.
Original abstract
European financial institutions face mounting regulatory pressure while their security operations centres remain constrained not by data or staffing but by reasoning capacity: enterprise SIEMs cover only a fraction of MITRE ATT&CK techniques, two thirds of SOC teams cannot keep pace with alert volumes, and the majority of breaches are preceded by alerts that are generated but never investigated. Frontier large language models now achieve state-of-the-art results on isolated cybersecurity tasks (one-day vulnerability exploitation, code-level patching, intrusion detection) yet no narrow win constitutes a platform that can compose across functions, persist multi-tenant state, map findings to regulatory regimes and survive an audit. This position paper argues that the right unit of construction is a hybrid multi-agent system in which specialised LLM subagents reason over classical SIEM/XDR telemetry rather than replacing it, share accumulated agent state across institutions through privacy-preserving federation, and can connect to complementary capability packs such as quantum-based authentication, digital twins for adversarial validation, and eBPF-based kernel telemetry. We present CyberAId, a model-agnostic, on-premise-deployable platform in which a Main Agent coordination layer, a Reporting capability, and specialist subagents operate within a shared runtime under bounded human-in-the-loop autonomy, organised around four falsifiable design principles, and aligned with relevant regulations. CyberAId will be validated at four representative financial use cases (client impersonation, anti-money-laundering for payment service providers, retail-banking incident response, and high-frequency-trading resilience) and propose skill-based agent adaptation as the most promising research direction for turning each deployment into a contribution to a continuously refined collective defence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CyberAId, a model-agnostic, on-premise hybrid multi-agent platform in which a Main Agent coordination layer orchestrates specialized LLM subagents that reason over classical SIEM/XDR telemetry rather than replacing it. The architecture supports privacy-preserving federation for sharing agent state across institutions, integration with complementary capability packs, bounded human-in-the-loop autonomy, and mapping of findings to regulatory regimes, all organized around four falsifiable design principles. Validation is planned for four financial use cases (client impersonation, anti-money-laundering, retail-banking incident response, and high-frequency-trading resilience), with skill-based agent adaptation proposed as a path to collective defense.
Significance. If the proposed architecture can be realized with the claimed properties, it would offer a concrete path to composing isolated LLM cybersecurity capabilities into a persistent, auditable, multi-tenant platform that augments rather than supplants existing telemetry tools. The emphasis on privacy-preserving federation, regulatory alignment, and falsifiable design principles provides a structured framework that could improve SOC reasoning capacity while preserving auditability and compliance in regulated financial environments.
minor comments (3)
- [Abstract and CyberAId platform section] The abstract and platform description refer to 'four falsifiable design principles' without enumerating or briefly stating them; adding an explicit list would allow readers to assess their falsifiability directly.
- [Use-case validation paragraph] The four representative financial use cases are named but receive no high-level scenario descriptions or mappings to specific subagent roles; including one-sentence illustrations per use case would strengthen the grounding of the proposal.
- [Introduction] References to frontier LLM performance on isolated tasks (vulnerability exploitation, code patching, intrusion detection) are cited without specific citations or performance metrics; adding 1-2 key references would support the composition argument.
Simulated Author's Rebuttal
We thank the referee for their thorough summary of our position paper and for the positive assessment of CyberAId's potential to address reasoning-capacity constraints in financial SOCs while preserving auditability and regulatory alignment. We are pleased that the referee recognizes the value of the hybrid multi-agent architecture, privacy-preserving federation, and falsifiable design principles. Given the recommendation for minor revision and the absence of specific major comments requiring changes, we believe the manuscript is already well-positioned.
Circularity Check
No significant circularity identified
full rationale
The paper is a position paper that presents a conceptual architecture proposal without equations, derivations, fitted parameters, or any claimed first-principles results. All central claims (hybrid multi-agent LLM layer over SIEM/XDR, privacy-preserving federation, regulatory mapping) are framed as design principles to be validated in future work rather than derived from prior steps within the paper. No load-bearing argument reduces to self-definition, fitted inputs renamed as predictions, or self-citation chains. The derivation chain is empty by construction, making the proposal self-contained as a forward-looking design document.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Frontier large language models achieve state-of-the-art results on isolated cybersecurity tasks
- ad hoc to paper A hybrid multi-agent system can persist multi-tenant state and map findings to regulatory regimes while surviving audit
invented entities (1)
- CyberAId platform with Main Agent coordination layer and specialist subagents (no independent evidence)
Reference graph
Works this paper leans on
-
[1]
Report on artificial intelligence in the banking sector,
European Banking Authority, “Report on artificial intelligence in the banking sector,” EBA, Paris, France, Tech. Rep., 2024. [Online]. Available: https://www.eba.europa.eu
2024
-
[2]
Cost of a data breach report 2024,
IBM Security, “Cost of a data breach report 2024,” IBM, Armonk, NY, USA, Tech. Rep., 2024, accessed: 2026-04-01. [Online]. Available: https://www.ibm.com/think/insights/cost-of-a-data-breach-2024-financial-industry
2024
-
[3]
Ai-augmented soc: A survey of llms and agents for security automation,
S. Srinivas, B. Kirk, J. Zendejas, M. Espino, M. Boskovich, A. Bari, K. Dajani, and N. Alzahrani, “Ai-augmented soc: A survey of llms and agents for security automation,” Journal of Cybersecurity and Privacy, vol. 5, no. 4, 2025. [Online]. Available: https://www.mdpi.com/2624-800X/5/4/95
2025
-
[4]
LLM agents can autonomously exploit one-day vulnerabilities,
R. Fang, R. Bindu, A. Gupta, and D. Kang, “LLM agents can autonomously exploit one-day vulnerabilities,” arXiv preprint arXiv:2404.08144, 2024
-
[5]
Teams of llm agents can exploit zero-day vulnerabilities,
Y. Zhu, A. Kellermann, A. Gupta, P. Li, R. Fang, R. Bindu, and D. Kang, “Teams of llm agents can exploit zero-day vulnerabilities,” in Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), 2026, pp. 23–35
2026
-
[6]
IRIS: LLM-assisted static analysis for detecting security vulnerabilities,
Z. Li, S. Dutta, and M. Naik, “IRIS: LLM-assisted static analysis for detecting security vulnerabilities,” arXiv preprint arXiv:2405.17238, 2024, accepted at ICLR 2025
-
[7]
Ids-agent: An llm agent for explainable intrusion detection in iot networks,
Y. Li, Z. Xiang, N. D. Bastian, D. Song, and B. Li, “Ids-agent: An llm agent for explainable intrusion detection in iot networks,” in NeurIPS 2024 Workshop on Open-World Agents: Synergizing Reasoning and Decision-Making in Open-World Environments (OWA-2024), 2024, poster
2024
-
[8]
ReAct: Synergizing reasoning and acting in language models,
S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing reasoning and acting in language models,” in Proceedings of the International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 2023
2023
-
[9]
A survey on large language model based autonomous agents,
L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y. Lin et al., “A survey on large language model based autonomous agents,” Frontiers of Computer Science, vol. 18, no. 6, p. 186345, 2024
2024
-
[10]
Autogen: Enabling next-gen llm applications via multi-agent conversations,
Q. Wu, G. Bansal, J. Zhang, Y. Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu et al., “Autogen: Enabling next-gen llm applications via multi-agent conversations,” in First Conference on Language Modeling, 2024
2024
-
[11]
Chain-of-thought prompting elicits reasoning in large language models,
J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022
2022
-
[12]
Signal or Noise in Multi-Agent LLM-based Stock Recommendations?
G. Fatouros and K. Metaxas, “Signal or noise in multi-agent llm-based stock recommendations?” arXiv preprint arXiv:2604.17327, 2026
-
[13]
Toolformer: Language models can teach themselves to use tools,
T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,” Advances in Neural Information Processing Systems, vol. 36, pp. 68539–68551, 2023
2023
-
[14]
Multi-agent llm orchestration achieves deterministic, high-quality decision support for incident response,
P. Drammeh, “Multi-agent llm orchestration achieves deterministic, high-quality decision support for incident response,” arXiv preprint arXiv:2511.15755, 2025
-
[15]
CORTEX: Collaborative LLM Agents for High-Stakes Alert Triage,
B. Wei, Y. S. Tay, H. Liu, J. Pan, K. Luo, Z. Zhu, and C. Jordan, “CORTEX: Collaborative LLM agents for high-stakes alert triage,” arXiv preprint arXiv:2510.00311, 2025
-
[16]
Cyberrag: An agentic rag cyber attack classification and reporting tool,
F. Blefari, C. Cosentino, F. A. Pironti, A. Furfaro, and F. Marozzo, “Cyberrag: An agentic rag cyber attack classification and reporting tool,” Future Generation Computer Systems, p. 108186, 2025
2025
-
[17]
CTIKG: LLM-powered knowledge graph construction from cyber threat intelligence,
L. Huang and X. Xiao, “CTIKG: LLM-powered knowledge graph construction from cyber threat intelligence,” in Proceedings of the First Conference on Language Modeling (COLM 2024), Philadelphia, PA, USA, 2024
2024
-
[18]
KnowPhish: Large language models meet multimodal knowledge graphs for enhancing reference-based phishing detection,
Y. Li, C. Huang, S. Deng, M. L. Lock, T. Cao, N. Oo, H. W. Lim, and B. Hooi, “KnowPhish: Large language models meet multimodal knowledge graphs for enhancing reference-based phishing detection,” in 33rd USENIX Security Symposium (USENIX Security 24). Philadelphia, PA: USENIX Association, 2024, pp. 793–810. [Online]. Available: https://www.usenix.org/con...
2024
-
[19]
Regulation (EU) 2022/2554 on digital operational resilience for the financial sector (DORA),
European Parliament and Council, “Regulation (EU) 2022/2554 on digital operational resilience for the financial sector (DORA),” Brussels, Belgium, 2022
2022
-
[20]
Directive (EU) 2022/2555 on measures for a high common level of cybersecurity across the union (NIS2),
EU, “Directive (EU) 2022/2555 on measures for a high common level of cybersecurity across the union (NIS2),” Brussels, Belgium, 2022
2022
-
[21]
Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act),
European Parliament and Council, “Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act),” Brussels, Belgium, 2024
2024
-
[22]
Threatmodeling-llm: Automating threat modeling using large language models for banking system,
T. Wu, S. Yang, S. Liu, D. Nguyen, S. Jang, and A. Abuadbba, “Threatmodeling-llm: Automating threat modeling using large language models for banking system,” arXiv preprint arXiv:2411.17058, 2024
-
[23]
The emerged security and privacy of LLM agent: A survey with case studies,
F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu, “The emerged security and privacy of LLM agent: A survey with case studies,” ACM Computing Surveys, 2025
2025
-
[24]
From prompt injections to protocol exploits: Threats in llm-powered ai agents workflows,
M. A. Ferrag, N. Tihanyi, D. Hamouda, L. Maglaras, A. Lakas, and M. Debbah, “From prompt injections to protocol exploits: Threats in llm-powered ai agents workflows,” ICT Express, 2025
2025
-
[25]
RedTeamLLM: An agentic AI framework for offensive security,
B. Challita and P. Parrend, “RedTeamLLM: An agentic AI framework for offensive security,” arXiv preprint arXiv:2505.06913, 2025
-
[26]
MITRE ATT&CK Enterprise Matrix, Version 14,
MITRE Corporation, “MITRE ATT&CK Enterprise Matrix, Version 14,” McLean, VA, USA, 2023, available online: https://attack.mitre.org (accessed on 1 April 2026)
2023
-
[27]
Pci dss: A critical analysis of implementation, effectiveness, and legislative impact in payment card security,
S. Chippagiri and A. Ramesh, “Pci dss: A critical analysis of implementation, effectiveness, and legislative impact in payment card security,” 2025
2025
-
[28]
Shrinking the kernel attack surface through static and dynamic syscall limitation,
D. Zhan, Z. Yu, X. Yu, H. Zhang, and L. Ye, “Shrinking the kernel attack surface through static and dynamic syscall limitation,” IEEE Transactions on Services Computing, vol. 16, no. 2, pp. 1431–1443, 2022
2022
-
[29]
Demonstration of quantum-digital payments,
P. Schiansky, J. Kalb, E. Sztatecsny, M.-C. Roehsner, T. Guggemos, A. Trenti, M. Bozzio, and P. Walther, “Demonstration of quantum-digital payments,” Nature Communications, vol. 14, no. 1, p. 3849, 2023
2023