arxiv: 2605.30604 · v1 · pith:OZYSEKTKnew · submitted 2026-05-28 · 💻 cs.CR · cs.AI · cs.CL · cs.IR

An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

Pith reviewed 2026-06-29 06:23 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.CL · cs.IR
keywords LLM agents · cybersecurity operations · security context · SIEM · XDR · human-in-the-loop · auditability · regulated workflows

The pith

A typed Security Context created at every entry point and enforced at every boundary organizes LLM agents for auditable regulated cybersecurity operations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a runtime architecture that keeps LLM agents within organization scope when they handle cybersecurity tasks. It does this by generating a typed Security Context at every starting point, such as incoming SIEM or XDR alerts treated as primary triggers, and then checking that context at each component handoff. The design adds a shared runtime core, specialist subagents, a controlled tool adapter for SIEM and XDR actions, structured evidence-linked findings, tiered human review points, and permanent audit records. The architecture stays independent of any specific model and can run locally while integrating with existing security stacks rather than replacing them. A sympathetic reader would care because current LLM agent work on isolated tasks does not yet supply the policy and audit substrate needed when one analyst action can bind an entire regulated organization.

Core claim

The central claim is that a typed Security Context instantiated at every entry point, including SIEM/XDR notifications as first-class triggers, and enforced at every component boundary, together with a shared Runtime Core, logical specialist subagents, a governed Tool Adapter Layer, structured findings with evidence references, tiered human-in-the-loop gates, and append-only audit, supplies the missing runtime substrate for organization-scoped, model-agnostic, locally deployable LLM agent operations in financial cybersecurity.
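
To make the "append-only audit" component concrete, here is a minimal Python sketch. The paper does not say how append-only behavior is achieved; the hash chain below is one common technique, swapped in for illustration. The `AuditLog` class, the `GENESIS` constant, and the event fields are all hypothetical names.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed placeholder hash that precedes the first record


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous record's hash together with this event's canonical JSON."""
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    """Hypothetical audit log: records are only appended, and each one is chained to the last."""

    def __init__(self) -> None:
        self._records: List[dict] = []

    def append(self, event: dict) -> dict:
        prev_hash = self._records[-1]["hash"] if self._records else GENESIS
        record = {
            "event": dict(event),  # copy so later caller mutations do not leak in
            "prev": prev_hash,
            "hash": _digest(prev_hash, event),
        }
        self._records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed, or reordered record breaks it."""
        prev_hash = GENESIS
        for record in self._records:
            if record["prev"] != prev_hash:
                return False
            if _digest(prev_hash, record["event"]) != record["hash"]:
                return False
            prev_hash = record["hash"]
        return True

    def __len__(self) -> int:
        return len(self._records)
```

A chain like this only makes tampering detectable. It does not prevent it, so a real deployment would still need write-once storage or external anchoring.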

What carries the argument

The typed Security Context, which is created at entry points and carries organization scope and policy to be checked at every boundary and tool call.
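
A typed context of this kind could look roughly like the Python sketch below. The paper does not publish a schema, so every field name (`org_id`, `principal`, `trigger`, `allowed_tools`, `context_id`) and both helper functions are assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable


class Trigger(Enum):
    """Entry-point kinds; alerts are treated the same way as analyst requests."""
    ANALYST_REQUEST = "analyst_request"
    SIEM_ALERT = "siem_alert"
    XDR_ALERT = "xdr_alert"


class ScopeViolation(Exception):
    """Raised when a boundary check fails."""


@dataclass(frozen=True)
class SecurityContext:
    """Immutable carrier of organization scope and policy (hypothetical fields)."""
    org_id: str
    principal: str
    trigger: Trigger
    allowed_tools: FrozenSet[str]
    context_id: str


def new_context(org_id: str, principal: str, trigger: Trigger,
                allowed_tools: Iterable[str]) -> SecurityContext:
    """Create a context at an entry point; refuse to start work without an org scope."""
    if not org_id:
        raise ValueError("org_id is required at every entry point")
    return SecurityContext(org_id, principal, trigger,
                           frozenset(allowed_tools), uuid.uuid4().hex)


def check_boundary(ctx: object, resource_org_id: str) -> None:
    """Fail closed: a missing context or a cross-organization access is rejected."""
    if not isinstance(ctx, SecurityContext):
        raise ScopeViolation("no Security Context attached")
    if ctx.org_id != resource_org_id:
        raise ScopeViolation(f"context {ctx.context_id} is not scoped to {resource_org_id}")
```

Freezing the dataclass is a deliberate choice in this sketch. A component that receives the context can read the scope but cannot widen it in place.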

If this is right

  • SIEM and XDR notifications become direct triggers that start agent workflows under the same context rules as analyst-initiated tasks.
  • All tool calls to query, enrich, or respond through SIEM or XDR systems occur only through a single governed adapter layer that applies uniform policy and logging.
  • Findings are produced in structured form that always references the original evidence and can be traced through the audit trail.
  • Tiered human review gates can be inserted at defined points without changing the underlying agent logic.
  • The entire system remains usable with any LLM backend and can be deployed inside the organization's own infrastructure.
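
The single governed adapter and the human review gate from the list above can be sketched together in Python. This is an illustration under stated assumptions, not the paper's design: the `ToolAdapter` class, the tool names `siem.query` and `xdr.isolate_host`, the decision strings, and the one-level approval gate (standing in for the paper's tiers) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class SecurityContext:
    """Reduced context for this sketch: organization scope plus permitted tools."""
    org_id: str
    allowed_tools: FrozenSet[str]


@dataclass
class ToolAdapter:
    """Hypothetical single choke point for every SIEM/XDR call."""
    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    gated: Set[str] = field(default_factory=set)     # tools that need human approval
    audit: List[dict] = field(default_factory=list)  # one record per attempted call

    def register(self, name: str, fn: Callable[..., Any],
                 needs_approval: bool = False) -> None:
        self.tools[name] = fn
        if needs_approval:
            self.gated.add(name)

    def call(self, ctx: SecurityContext, name: str,
             approved_by: Optional[str] = None, **kwargs: Any) -> Any:
        # The same policy decision is applied to every tool, whatever the backend.
        if name not in self.tools:
            decision = "deny:unknown_tool"
        elif name not in ctx.allowed_tools:
            decision = "deny:out_of_scope"
        elif name in self.gated and approved_by is None:
            decision = "hold:needs_human_approval"
        else:
            decision = "allow"
        # Log before acting so that denied and held attempts are recorded too.
        self.audit.append({"org_id": ctx.org_id, "tool": name,
                           "decision": decision, "approved_by": approved_by})
        if decision != "allow":
            raise PermissionError(decision)
        # The organization scope is injected by the adapter, never by the caller.
        return self.tools[name](org_id=ctx.org_id, **kwargs)
```

Because the gate lives in the adapter, a tool can be moved behind human review by changing its registration, which leaves the agent logic untouched.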

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the context enforcement works, organizations could treat the agent runtime as an additional controlled user of their existing security tools rather than a separate analytical layer.
  • The same boundary checks might later support optional extensions such as graph-based retrieval or federated knowledge sharing without altering the core enforcement model.
  • A working implementation would make it possible to measure policy compliance as a first-class metric alongside task accuracy.
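
As a rough illustration of the last point, policy compliance could be computed from audit events as the share of boundary checks whose enforced decision matches the decision the policy prescribes. The event fields `enforced` and `expected` are hypothetical; the paper's own metric definitions belong to its evaluation plan and are not reproduced here.

```python
from typing import Iterable, Mapping


def policy_compliance_rate(events: Iterable[Mapping[str, str]]) -> float:
    """Fraction of boundary checks where the enforced decision equals the expected one."""
    events = list(events)
    if not events:
        raise ValueError("no audit events to score")
    matches = sum(1 for event in events if event["enforced"] == event["expected"])
    return matches / len(events)
```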

Load-bearing premise

That this typed Security Context and its enforcement mechanisms can be implemented and kept consistent across different SIEM and XDR systems and different LLM backends without adding unacceptable delay or new ways for policy to be bypassed.

What would settle it

A test run in which an LLM agent action that violates the stated organization policy succeeds because the Security Context was not created or checked at one of the component boundaries.
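
Such a test could be organized as a probe harness: send deliberately violating requests to every component boundary and report any boundary that lets one through. The Python sketch below assumes a simplified one-field context and hypothetical names (`guarded`, `find_bypasses`); it illustrates the idea and is not the paper's evaluation harness.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SecurityContext:
    """Reduced context for this sketch: organization scope only."""
    org_id: str


def guarded(fn: Callable) -> Callable:
    """Wrap a boundary handler so it fails closed on a missing or mismatched context."""
    def wrapper(ctx, resource_org_id):
        if not isinstance(ctx, SecurityContext) or ctx.org_id != resource_org_id:
            raise PermissionError("boundary check failed")
        return fn(ctx, resource_org_id)
    return wrapper


Probe = Tuple[Optional[SecurityContext], str]


def find_bypasses(boundaries: Dict[str, Callable], probes: List[Probe]) -> List[str]:
    """Return the names of boundaries where at least one violating probe succeeded."""
    bypassed = []
    for name, handler in boundaries.items():
        for ctx, resource_org_id in probes:
            try:
                handler(ctx, resource_org_id)
            except PermissionError:
                continue  # denied as required; try the next probe
            bypassed.append(name)  # the violating call went through
            break
    return bypassed
```

In this sketch, a non-empty result from `find_bypasses` is exactly the falsifying observation described above.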

Figures

Figures reproduced from arXiv: 2605.30604 by Dimosthenis Kyriazis, George Fatouros, George Kousiouris, Georgios Makridis, John Soldatos.

Figure 1: Three deployment patterns for LLM agents acting on behalf of a user.
Figure 2: Organization-scoped LLM agent runtime architecture. Entry points
Figure 3: Security Context propagation through one workflow. Each boundary check (Table III) emits an audit event; the dashed branch shows a denied retrieval

Original abstract

Regulated cybersecurity workflows lack a runtime substrate that enforces organization-level scope across retrieval, tool calls, memory, findings, reports, and audit while remaining model-agnostic and locally deployable. Recent large language model (LLM) agent systems report strong results on isolated cybersecurity tasks, yet they do not by themselves define an auditable platform architecture for regulated security operations centre (SOC) and compliance workflows, where a single analyst may trigger actions that bind the organization, and where the runtime must integrate with existing SIEM/XDR stacks as a primary source of context and alert-driven triggers rather than operate as a standalone analytical layer. This paper proposes an organization-scoped LLM agent runtime architecture for financial cybersecurity. The contribution is a typed Security Context that is created at every entry point, including SIEM/XDR notifications ingested as first-class triggers, and enforced at every component boundary, combined with a shared Runtime Core, logical specialist subagents, a governed Tool Adapter Layer exposing SIEM/XDR query, enrichment, and response primitives under uniform policy and audit, structured findings with evidence references, tiered human-in-the-loop (HITL) gates, and append-only audit. Model Context Protocol (MCP), extended telemetry, digital twins for pentesting, graph retrieval, and federated knowledge sharing are treated as optional extension paths rather than mandatory runtime assumptions. We describe an implementable slice as the architecture's testability surface, and we propose a falsifiable evaluation plan with metric-level pass criteria for architecture readiness, security-policy enforcement, evidence traceability, output quality, and operational observability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes an organization-scoped LLM agent runtime architecture for regulated financial cybersecurity operations. The central contribution is a typed Security Context created at every entry point (including SIEM/XDR notifications as first-class triggers) and enforced at every component boundary. This is combined with a shared Runtime Core, logical specialist subagents, a governed Tool Adapter Layer exposing SIEM/XDR primitives under uniform policy and audit, structured findings with evidence references, tiered human-in-the-loop gates, and append-only audit. Model Context Protocol and other extensions are treated as optional. The paper describes an implementable slice as a testability surface and proposes a falsifiable evaluation plan with metric-level pass criteria for architecture readiness, security-policy enforcement, evidence traceability, output quality, and operational observability.

Significance. If the enforcement claims hold under implementation, the architecture would address a genuine gap by providing an auditable, model-agnostic substrate for LLM agents in regulated SOC environments that integrates directly with existing SIEM/XDR stacks rather than operating standalone. The proposal of a falsifiable evaluation plan with explicit pass criteria is a positive element that could support future verification. The model-agnostic and locally deployable emphasis is also a strength for practical adoption.

major comments (2)
  1. [Abstract] Abstract (contribution paragraph): The claim that a typed Security Context 'is created at every entry point... and enforced at every component boundary' is load-bearing for the organization-scoped guarantee, yet the architecture description provides no concrete enforcement primitives, isolation guarantees, or handling for LLM-specific risks such as prompt injection or tool misuse that could violate scope. This absence prevents assessment of whether the central claim can be realized across heterogeneous SIEM/XDR and LLM backends.
  2. [Abstract] Abstract: The manuscript presents only a high-level design proposal and evaluation plan with no implemented system, prototype data, formal invariants, or checked properties that could be used to verify internal consistency or correctness of the enforcement mechanisms.
minor comments (1)
  1. [Abstract] The abstract is information-dense; clearer enumeration or a diagram of the core components (Security Context, Runtime Core, Tool Adapter Layer) would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of the proposed architecture to address gaps in regulated SOC environments. We address each major comment below.

  1. Referee: [Abstract] Abstract (contribution paragraph): The claim that a typed Security Context 'is created at every entry point... and enforced at every component boundary' is load-bearing for the organization-scoped guarantee, yet the architecture description provides no concrete enforcement primitives, isolation guarantees, or handling for LLM-specific risks such as prompt injection or tool misuse that could violate scope. This absence prevents assessment of whether the central claim can be realized across heterogeneous SIEM/XDR and LLM backends.

    Authors: We agree that the current description of enforcement remains at the architectural level without low-level primitives. In revision we will expand the Runtime Core and Tool Adapter Layer sections with concrete mechanisms, including typed context objects serialized with policy metadata, explicit boundary validation checks, context sanitization to mitigate prompt injection, and tool-call authorization lists derived from the Security Context. These additions will illustrate how the scoping guarantee can be realized across backends while preserving the model-agnostic stance.
    Revision: yes

  2. Referee: [Abstract] Abstract: The manuscript presents only a high-level design proposal and evaluation plan with no implemented system, prototype data, formal invariants, or checked properties that could be used to verify internal consistency or correctness of the enforcement mechanisms.

    Authors: The manuscript is intentionally a design proposal that defines an organization-scoped runtime model together with a falsifiable evaluation plan and metric-level pass criteria. This scope is appropriate for establishing the substrate before implementation; the evaluation plan is provided precisely so that future prototypes can verify the claims. We therefore maintain the current contribution type and do not plan to add an implementation or formal proofs.
    Revision: no
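
Two of the mechanisms promised in the first response, context objects serialized with policy metadata and tool-call authorization lists checked at a boundary, can be sketched in Python as follows. The HMAC seal is an assumption added here so that a receiving component can detect a context altered in transit; the paper does not name an integrity mechanism. The function names, the JSON layout, and the `policy.allowed_tools` field are hypothetical, and prompt-injection sanitization is out of scope for this sketch.

```python
import hashlib
import hmac
import json


def seal(context: dict, key: bytes) -> str:
    """Serialize a context with its policy metadata and attach an HMAC-SHA256 tag."""
    payload = json.dumps(context, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"payload": payload, "tag": tag})


def open_sealed(token: str, key: bytes) -> dict:
    """Boundary validation: reject any context whose tag does not match its payload."""
    envelope = json.loads(token)
    expected = hmac.new(key, envelope["payload"].encode("utf-8"),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["tag"]):
        raise ValueError("Security Context failed integrity check")
    return json.loads(envelope["payload"])


def authorize_tool_call(token: str, key: bytes, tool: str) -> dict:
    """Allow a tool call only if the verified context lists the tool."""
    context = open_sealed(token, key)
    if tool not in context["policy"]["allowed_tools"]:
        raise PermissionError(f"{tool} is not authorized for {context['org_id']}")
    return context
```

A shared symmetric key keeps the example short. Components that do not trust each other would more plausibly use asymmetric signatures, which is again an assumption and not something the paper specifies.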

Circularity Check

0 steps flagged

No circularity in architecture design proposal

The manuscript is a design proposal for an organization-scoped LLM agent runtime. It defines a typed Security Context created at entry points and enforced at boundaries, along with a Runtime Core, subagents, Tool Adapter Layer, HITL gates, and audit mechanisms. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text. The contribution is presented as an architectural description rather than a reduction of outputs to prior inputs or self-citations. No load-bearing steps match the enumerated circularity patterns; the work is self-contained as a conceptual specification with a proposed evaluation plan.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The proposal rests on domain assumptions about the feasibility of context enforcement and integration with existing security stacks; no free parameters or invented entities with independent evidence are introduced in the abstract.

axioms (2)
  • domain assumption: Existing SIEM/XDR systems can expose query, enrichment, and response primitives under uniform policy without loss of fidelity or security.
    Invoked in the description of the governed Tool Adapter Layer.
  • domain assumption: A typed Security Context can be created and enforced at every boundary without introducing new vulnerabilities or unacceptable overhead.
    Central to the contribution statement.
invented entities (1)
  • typed Security Context (no independent evidence)
    Purpose: To carry and enforce organization-level scope across all agent components and triggers.
    New construct introduced as the load-bearing mechanism; no independent falsifiable evidence provided in the abstract.

pith-pipeline@v0.9.1-grok · 5841 in / 1565 out tokens · 21594 ms · 2026-06-29T06:23:04.090731+00:00 · methodology
