Probabilistic Agents in Deterministic Audits: Evaluating Multi-Agent Systems for Automated Audits Based on the German IT-Grundschutz

Lea Roxanne Muth; Marian Margraf

arxiv: 2606.25622 · v1 · pith:XUMG72KJnew · submitted 2026-06-24 · 💻 cs.CR · cs.AI

Pith reviewed 2026-06-25 20:39 UTC · model grok-4.3

keywords: multi-agent systems · IT-Grundschutz · automated audits · HybridRAG · compliance automation · LLM evaluation · risk management

The pith

A multi-agent system with hybrid retrieval automates semantic steps in German IT-Grundschutz audits but struggles in deterministic logical phases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a multi-agent system combined with HybridRAG to partially automate the resource-intensive German IT-Grundschutz certification required under NIS-2. It adds a Hypothesis-Verification Loop that cross-checks agent inferences against a knowledge graph and a Decoupled Reasoning Pipeline that keeps semantic extraction separate from deterministic inheritance rules. Tested end-to-end on the BSI RecPlast GmbH reference case, the system delivers high precision and recall in structural analysis and modeling yet shows clear shortfalls in protection needs assessment and the final IT-GS check where probabilistic outputs clash with required deterministic logic.

Core claim

The multi-agent system with the Hypothesis-Verification Loop and Decoupled Reasoning Pipeline achieves high efficacy in semantic tasks of the IT-GS process by automating information extraction, but the probabilistic nature of LLMs limits its ability to meet the deterministic requirements in logical reasoning phases such as PNA and IT-GS Check.

What carries the argument

The Hypothesis-Verification Loop in Structural Analysis that cross-references agent-inferred dependencies against the Knowledge Graph, together with the Decoupled Reasoning Pipeline that separates semantic extraction from deterministic protection-need inheritance.
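
The cross-referencing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Dependency` type, the `propose` callback standing in for the LLM agent, the `max_rounds` limit, and all asset names are assumptions made for the example, and the Knowledge Graph is reduced to a plain set of edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """One inferred edge, e.g. ("ERP", "runs_on", "Server-1"). Hypothetical schema."""
    source: str
    relation: str
    target: str


def hypothesis_verification_loop(propose, knowledge_graph, max_rounds=3):
    """Accept only agent hypotheses that are present in the knowledge graph.

    `propose` is a stand-in for the agent: it receives the hypotheses rejected
    so far and returns new candidate dependencies.
    """
    accepted, rejected = set(), set()
    for _ in range(max_rounds):
        # Skip anything that has already been decided in an earlier round.
        candidates = set(propose(frozenset(rejected))) - accepted - rejected
        if not candidates:
            break
        for dep in candidates:
            if dep in knowledge_graph:
                accepted.add(dep)   # verified against the graph
            else:
                rejected.add(dep)   # treated as a possible hallucination
    return accepted, rejected


# Hypothetical data: a tiny graph and an agent stub with fixed guesses.
knowledge_graph = {
    Dependency("ERP", "runs_on", "Server-1"),
    Dependency("Server-1", "located_in", "ServerRoom"),
}


def stub_agent(rejected_so_far):
    # A real agent would call an LLM here; the guesses are hard-coded.
    return [
        Dependency("ERP", "runs_on", "Server-1"),
        Dependency("ERP", "runs_on", "Laptop-7"),
    ]


accepted, rejected = hypothesis_verification_loop(stub_agent, knowledge_graph)
```

In this sketch a rejected hypothesis is simply set aside. Whether the paper's loop discards such edges, re-prompts the agent, or escalates to a human is not stated in this summary.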

If this is right

Manual effort drops sharply in structural analysis and modeling through automated extraction of dependencies and assets.
Quantitative metrics show the architecture meets expert-level precision and recall in semantic tasks but falls short in phases that demand strict logical inheritance.
The two added mechanisms (Hypothesis-Verification Loop and Decoupled Reasoning Pipeline) are presented as necessary to enforce compliance rigor inside an otherwise probabilistic agent system.
Current LLMs cannot yet replace human auditors for the full deterministic compliance verification required by IT-GS.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The semantic-versus-logic split observed here could be tested on other deterministic audit frameworks such as ISO 27001 to check whether the same pattern appears.
Hybrid systems that route logical inheritance steps to symbolic solvers rather than LLMs may close the performance gap left open by the current design.
Broader validation across multiple company sizes and sectors would be needed before claiming the reported effort reductions apply generally.

Load-bearing premise

The single BSI RecPlast GmbH case study supplies a representative reference dataset sufficient to measure performance across every phase of the IT-GS process.

What would settle it

Running the identical architecture on two or more additional independent BSI-provided IT-GS case studies and checking whether the performance gap between semantic phases and logical phases remains consistent.

Figures

Figures reproduced from arXiv: 2606.25622 by Lea Roxanne Muth, Marian Margraf.

Original abstract

The NIS-2 Directive mandates robust Risk Management from thousands of small and medium enterprises. To ensure compliance, companies rely on established standards such as the German IT-Grundschutz (IT-GS) of the Federal Office for Information Security. However, IT-GS certification is resource-intensive and requires a high level of manual effort for documentation, validation, and revision, making scalable implementation difficult and expensive. Building upon our previous conceptual framework, this paper presents the technical implementation and empirical evaluation of a Multi-Agent System (MAS) architecture combined with Hybrid Retrieval Augmented Generation (HybridRAG) for the partial automation of IT-GS certification. We introduce two novel technical contributions to the MAS architecture to enforce the compliance rigor. The Hypothesis-Verification Loop in the Structural Analysis (SA) phase that cross-references agent-inferred dependencies against the Knowledge Graph to reduce hallucinations, and a Decoupled Reasoning Pipeline that separates agent-driven semantic extraction from the deterministic protection need inheritance. We utilize the BSI's "RecPlast GmbH" case study as a human expert-generated reference data set for end-to-end evaluation of the architecture and to quantify Precision, Recall, and F1-scores. The performance of the system is investigated across the phases of SA, Protection Needs Assessment (PNA), Modeling, and IT-GS Check. The empirical results reveal noticeable differences throughout the different steps of IT-GS. While the MAS demonstrates high efficacy in semantic tasks (SA and Modeling), significantly reducing manual effort through automated information extraction, quantitative results reveal limitations in logical reasoning phases (PNA and IT-GS Check) as the probabilistic nature of current LLMs struggles to meet the deterministic rigor required by IT-GS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

MAS for IT-Grundschutz works on extraction but shows reasoning limits on a single case study.

The paper builds a multi-agent system for partial IT-Grundschutz automation on top of prior conceptual work. It adds a Hypothesis-Verification Loop in structural analysis and a Decoupled Reasoning Pipeline that keeps semantic extraction separate from deterministic inheritance. The authors test the system end-to-end on the BSI RecPlast GmbH reference case and report phase-by-phase precision, recall, and F1 scores.

The results line up with what they claim: solid numbers on semantic tasks like SA and modeling, weaker performance on PNA and the final IT-GS check. The authors are straightforward that current LLMs have trouble with the deterministic parts of the standard, and the numbers support that observation.

The evaluation uses an external expert reference dataset, which is better than self-generated ground truth. The two new mechanisms look like practical attempts to cut hallucinations and enforce structure.

The main limitation is the single case study. One company's documentation and protection profile is not enough to treat the reasoning gaps as a general property of LLMs rather than something specific to this scenario. No repeated runs or error bars are reported, so the size of the differences is hard to judge.
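
To make the point about repeated runs concrete, here is a rough sketch of how per-phase F1 scores from several independent runs could be summarized with a mean and a sample standard deviation. The function names, the crude "gap larger than combined spread" check, and every number below are placeholders invented for illustration. None of them come from the paper, which reports no repeated runs.

```python
from statistics import mean, stdev


def summarize_runs(f1_scores):
    """Return (mean, sample standard deviation) of F1 over repeated runs."""
    if len(f1_scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(f1_scores), stdev(f1_scores)


def gap_exceeds_noise(semantic_runs, logical_runs):
    """Crude heuristic: is the mean gap larger than the two spreads combined?

    This is not a significance test, only a quick sanity check.
    """
    mean_sem, sd_sem = summarize_runs(semantic_runs)
    mean_log, sd_log = summarize_runs(logical_runs)
    return (mean_sem - mean_log) > (sd_sem + sd_log)


# Placeholder values, NOT results from the paper.
semantic_f1 = [0.90, 0.88, 0.92]
logical_f1 = [0.60, 0.70, 0.50]

gap_is_clear = gap_exceeds_noise(semantic_f1, logical_f1)
```

A proper analysis would use more runs and a real statistical test. The sketch only shows what kind of evidence would let a reader judge the size of the phase differences.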

This is worth refereeing for readers working on applied AI for compliance and security standards. It supplies concrete implementation details and real numbers on where automation helps and where it still falls short. I would send it to review.

Referee Report

2 major / 2 minor

Summary. The paper presents the technical implementation of a Multi-Agent System (MAS) with HybridRAG for partial automation of German IT-Grundschutz (IT-GS) certification under NIS-2. It introduces a Hypothesis-Verification Loop for the Structural Analysis (SA) phase and a Decoupled Reasoning Pipeline separating semantic extraction from deterministic protection need inheritance. The system is evaluated end-to-end against the BSI-provided 'RecPlast GmbH' expert-generated reference dataset, reporting precision, recall, and F1 scores across SA, Protection Needs Assessment (PNA), Modeling, and IT-GS Check phases. Results indicate high performance on semantic tasks but limitations on logical reasoning phases, attributed to the probabilistic nature of LLMs versus the deterministic requirements of IT-GS.

Significance. If the phase-specific performance differences generalize, the work supplies quantitative evidence on where LLM-based agents can reduce manual effort in compliance documentation (semantic extraction) versus where they fall short (logical reasoning under deterministic standards). The use of an external BSI case study as ground truth supplies independent grounding for the metrics and builds directly on the authors' prior conceptual framework.

major comments (2)

Evaluation section (results across phases, single 'RecPlast GmbH' reference): The central claim that 'quantitative results reveal limitations in logical reasoning phases (PNA and IT-GS Check) as the probabilistic nature of current LLMs struggles to meet the deterministic rigor required by IT-GS' rests on performance numbers from only one case study. No error bars, multiple independent runs, sensitivity analysis, or cross-scenario replication are reported, so the observed gaps cannot be distinguished from case-specific factors such as documentation style, knowledge-graph coverage, or annotation conventions.
Methods section describing the Decoupled Reasoning Pipeline: The separation of agent-driven semantic extraction from deterministic protection-need inheritance is presented as a novel contribution, yet the manuscript provides no pseudocode, formal specification, or ablation showing that the deterministic component is strictly enforced rather than still relying on LLM output. This detail is load-bearing for the claim that the architecture enforces compliance rigor.

minor comments (2)

Define all phase abbreviations (SA, PNA, IT-GS) at first use in the main text rather than assuming reader familiarity from the abstract.
Add a summary table of precision/recall/F1 scores by phase to improve readability of the quantitative results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the evaluation and methods. We address each major comment below with honest responses and indicate planned revisions.

Referee: Evaluation section (results across phases, single 'RecPlast GmbH' reference): The central claim that 'quantitative results reveal limitations in logical reasoning phases (PNA and IT-GS Check) as the probabilistic nature of current LLMs struggles to meet the deterministic rigor required by IT-GS' rests on performance numbers from only one case study. No error bars, multiple independent runs, sensitivity analysis, or cross-scenario replication are reported, so the observed gaps cannot be distinguished from case-specific factors such as documentation style, knowledge-graph coverage, or annotation conventions.

Authors: We agree that the reported phase differences rest on a single BSI-provided case study and that the lack of error bars, multiple runs, or cross-scenario replication prevents strong claims of generalizability. The 'RecPlast GmbH' reference remains valuable as an independent expert-generated dataset, but we will revise the Evaluation section to qualify the central claim, add an explicit limitations paragraph on threats to validity (including case-specific factors), and outline future multi-case validation. No new statistical analyses or runs will be added, as they were not performed.
Revision: partial

Referee: Methods section describing the Decoupled Reasoning Pipeline: The separation of agent-driven semantic extraction from deterministic protection-need inheritance is presented as a novel contribution, yet the manuscript provides no pseudocode, formal specification, or ablation showing that the deterministic component is strictly enforced rather than still relying on LLM output. This detail is load-bearing for the claim that the architecture enforces compliance rigor.

Authors: The Decoupled Reasoning Pipeline is implemented such that semantic extraction uses agents while protection-need inheritance applies deterministic, rule-based logic directly to the extracted entities, bypassing LLM generation entirely. To strengthen this, we will add pseudocode and a formal specification of the pipeline in the revised Methods section, explicitly documenting the separation and confirming the deterministic step does not invoke the LLM. An ablation isolating the component is noted as desirable but may exceed space constraints.
Revision: yes
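
As an illustration of what a purely rule-based inheritance step might look like, the sketch below applies the maximum principle from the BSI methodology: an IT system takes the highest protection need of any application that depends on it. It is an assumption that the paper's pipeline uses this rule in this form. The cumulation and distribution effects that the methodology also describes are not modeled, each asset carries a single level rather than separate values per protection goal, and the data structures and asset names are hypothetical.

```python
# Ordered IT-Grundschutz protection need categories.
LEVELS = {"normal": 0, "high": 1, "very high": 2}


def inherit_protection_needs(app_needs, runs_on):
    """Derive system-level needs from application-level needs.

    app_needs: {application: level}, e.g. produced by the extraction agents.
    runs_on:   {application: [systems it depends on]}.
    No LLM is involved here: the result is a pure function of the inputs.
    """
    system_needs = {}
    for app, systems in runs_on.items():
        for system in systems:
            current = system_needs.get(system, "normal")
            # Maximum principle: keep the higher of the two levels.
            system_needs[system] = max(
                current, app_needs[app], key=LEVELS.__getitem__
            )
    return system_needs


# Hypothetical extracted entities.
app_needs = {"ERP": "high", "Intranet": "normal", "Payroll": "very high"}
runs_on = {
    "ERP": ["Server-1"],
    "Intranet": ["Server-1", "Server-2"],
    "Payroll": ["Server-2"],
}

system_needs = inherit_protection_needs(app_needs, runs_on)
```

Because the function is deterministic, repeated calls on the same extracted entities always give the same result. Any remaining variance would then come from the extraction step alone, which is the property the referee asks the authors to demonstrate.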

Circularity Check

0 steps flagged

No significant circularity; evaluation grounded in external BSI case study

The paper's central empirical claims rest on direct computation of precision, recall, and F1-scores against the external BSI-provided 'RecPlast GmbH' human-expert reference dataset across SA, PNA, Modeling, and IT-GS Check phases. The reference to a prior conceptual framework supplies architectural context but does not define or force the reported performance numbers. No equations, fitted parameters renamed as predictions, uniqueness theorems, or ansatzes reduce any result to its own inputs by construction. The derivation chain is therefore self-contained against an independent external benchmark.
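
For reference, the metrics named above can be computed from a predicted set and an expert reference set as in the sketch below. Exact set membership is assumed as the matching criterion. The paper's actual matching rules (for example fuzzy matching of asset names) are not described in this summary, and the item labels are placeholders.

```python
def precision_recall_f1(predicted, reference):
    """Set-based precision, recall, and F1 against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder labels: four system outputs, five reference items, three shared.
predicted_items = {"A", "B", "C", "D"}
reference_items = {"A", "B", "C", "E", "F"}

precision, recall, f1 = precision_recall_f1(predicted_items, reference_items)
# precision = 3/4, recall = 3/5, f1 = 2/3
```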

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the domain assumption that the single RecPlast case study is representative ground truth and that F1 scores on semantic versus logical phases generalize beyond this instance.

axioms (1)

Domain assumption: The BSI RecPlast GmbH case study serves as valid and representative ground truth for end-to-end evaluation of the MAS across SA, PNA, Modeling, and IT-GS Check phases.
Used as the reference dataset to compute precision, recall, and F1 scores.
